+++
title = 'Handy tracing tools with eBPF'
date = 2024-11-17T14:36:07+01:00
draft = true
tags = ['ebpf', 'tracing', 'network', 'kernel']
+++

eBPF lets small, event-driven programs run inside the kernel, attached to pre-defined hooks such as syscalls, function invocations, and network events. The programs are loaded from user space and checked by a verifier before they run, which makes it possible to build many tools that previously required kernel changes or a kernel module.

While researching the technology and scoping out potentially interesting use-cases, I've discovered that the BCC project ships a collection of simple but useful eBPF-based tracing tools. I suspect that I'll be reaching for these frequently in the future, especially for those tricky bugs where more traditional debugging techniques fail to deliver.

The introspection provided by many, if not all, of these tools was previously achievable in other ways. However, that often meant tooling such as strace, which can slow the traced process considerably and is not always a convenient choice. One caveat on portability: eBPF is primarily a Linux technology. There is an eBPF for Windows project, but macOS does not support eBPF (its closest equivalent is DTrace), and the bcc tools covered here are Linux-only.

Installing bcc-tools

The tooling is provided by the bcc-tools package. To install on Arch Linux:

% sudo pacman -S bcc bcc-tools python-bcc linux-headers

# required for bashreadline:
% sudo pacman -S python-pyelftools

The tools will be installed to /usr/share/bcc/tools. Inconveniently, this is probably outside of your search path. It's easy to add this path to your PATH variable:

% export PATH=/usr/share/bcc/tools:$PATH
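
If this export ends up in a shell startup file, a small guard avoids stacking the same directory onto PATH every time the file is re-sourced. Here's a minimal sketch, assuming a POSIX-style shell; path_prepend is a hypothetical helper name of mine, not something bcc provides:

```shell
# Hypothetical helper: prepend a directory to PATH only if it is not already there.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, nothing to do
        *) PATH="$1:$PATH" ;;     # otherwise put it at the front
    esac
}

path_prepend /usr/share/bcc/tools
export PATH
```

Wrapping PATH in colons before matching means the check only hits whole entries, so a directory like /usr/share/bcc/tools-old would not be mistaken for a match.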

Now, let's try the execsnoop command. This tool traces the creation of new processes.

% execsnoop
bpf: Failed to load program: Operation not permitted

Traceback (most recent call last):
  File "/usr/share/bcc/tools/execsnoop", line 268, in <module>
    b.attach_kprobe(event=execve_fnname, fn_name="syscall__execve")
  File "/usr/lib/python3.12/site-packages/bcc/__init__.py", line 851, in attach_kprobe
    fn = self.load_func(fn_name, BPF.KPROBE)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/bcc/__init__.py", line 523, in load_func
    raise Exception("Need super-user privileges to run")
Exception: Need super-user privileges to run

Loading eBPF programs requires root privileges or suitable capabilities (CAP_BPF, and for tracing tools typically CAP_PERFMON as well). When running outside of containerized environments this probably means running bcc tools with sudo. However, when I tried to run the execsnoop tool with sudo:

% sudo execsnoop
sudo: execsnoop: command not found

Of course, this is a common issue with sudo. Many Linux distributions configure it to reset environment variables, including $PATH, when running commands. I've lived with a number of unsatisfying workarounds over the years, but the prospect of running this tooling with minimal friction finally pushed me to find a better way. This was achieved with a simple alias:

alias sudop='sudo env "PATH=$PATH"'

After adding the alias, it's easy to run a bcc tool: 🎉

% sudop execsnoop
PCOMM            PID     PPID    RET ARGS
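
The alias is fine at an interactive prompt, but bash does not expand aliases in non-interactive shells by default, so a function may be the safer choice if the helper is ever sourced from a script. A minimal sketch of the same idea as a function (define one or the other, not both under the same name):

```shell
# Function form of the sudop helper: run a command under sudo while
# keeping the caller's PATH. "$@" forwards every argument unchanged,
# so arguments containing spaces survive intact.
sudop() {
    sudo env "PATH=$PATH" "$@"
}
```

Usage is the same as with the alias, for example `sudop execsnoop -x`.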

Useful commands

This section only scratches the surface of what's available out-of-the-box with bcc-tools. Most of my inspiration came from Brendan Gregg's blog post.

Tracing newly created processes

As mentioned above, you can watch process creation with execsnoop, which is named after the exec family of syscalls (it hooks execve, as the traceback earlier shows):

% sudop execsnoop
PCOMM            PID     PPID    RET ARGS
sh               497415  2580      0 /bin/sh -c tmbatinfo
tmbatinfo        497415  2580      0 /home/rob/script/tmbatinfo
bash             497415  2580      0 /usr/bin/bash /home/rob/script/tmbatinfo
batinfo          497417  497415    0 /home/rob/script/batinfo
bash             497417  497415    0 /usr/bin/bash /home/rob/script/batinfo
sh               497416  2580      0 /bin/sh -c sysinfo
sysinfo          497416  2580      0 /home/rob/script/sysinfo

This outputs columns:

  • PCOMM: some googling suggests this should be the parent command name, but it seems to me to be the name of the launched (child) command.
  • PID: the PID of the launched process.
  • PPID: the PID of the parent process.
  • RET: I assume this is the return value of the syscall.
  • ARGS: the arguments provided to the syscall, which at least in my experiments seem to include the file or path of the command as well as its arguments.

Importantly, the -x flag can be passed to make execsnoop show failed syscall attempts as well as those that are successful.
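
With -x the failures are mixed in with the successes, so a small filter helps when only the failures matter. This is a rough sketch that assumes the default column layout shown above, with RET as the fourth whitespace-separated field; failed_execs is a hypothetical name of mine:

```shell
# Keep the header line plus any exec whose RET column is non-zero.
# Assumption: RET is the 4th whitespace-separated field, which no longer
# holds if a command name itself contains spaces.
failed_execs() {
    awk 'NR == 1 || $4 != 0'
}
```

It reads execsnoop output on stdin, for example `sudop execsnoop -x | failed_execs`.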

Tracing open files

Similarly, opensnoop traces the open family of syscalls, which are used to open files.

I used to rely heavily on the pre-eBPF, DTrace-based implementation of opensnoop back in my Mac days, and it's nice to discover that I can call upon it from Linux.

% sudop opensnoop
PID    COMM               FD ERR PATH
347357 ThreadPoolForeg    27   0 /home/rob/.cache/google-chrome/Default/Cache/Cache_Data/1b2c2ddd2d7ef7c5_0
347357 Chrome_ChildIOT    32   0 /dev/shm/.com.google.Chrome.ZkPQQO
347312 ThreadPoolForeg   103   0 /home/rob/.config/google-chrome/Default/.com.google.Chrome.lWY8pP
347312 ThreadPoolForeg   114   0 /dev/shm/.com.google.Chrome.aLGFXG
347312 ThreadPoolForeg   192   0 /proc/347426/stat
347312 ThreadPoolSingl   192   0 /proc/347426/task/347426/status
347312 chrome            192   0 /dev/shm/.com.google.Chrome.ddntL8
347312 ThreadPoolForeg   109   0 /proc/347426/stat
347312 ThreadPoolForeg   103   0 /proc/347426/stat
347312 ThreadPoolForeg   103   0 /home/rob/.config/google-chrome/Default/Extensions/fmkadmapgofadopljbjfkapdkoienihi/6.0.1_0/build/proxy.js
347312 ThreadPoolForeg   103   0 /home/rob/.config/google-chrome/Default/Extensions/fmkadmapgofadopljbjfkapdkoienihi/6.0.1_0/build/fileFetcher.js

The columns are similar to execsnoop:

  • PID: the PID of the process calling open.
  • COMM: the name of the calling process.
  • FD: the per-process file descriptor returned by the call.
  • ERR: error code, 0 for success.
  • PATH: the path of the file being opened, which may of course live on a virtual filesystem such as /proc.
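
On a busy desktop this output scrolls past quickly, so it can help to capture it to a file and rank the paths afterwards. A minimal sketch, assuming the default columns above (PID COMM FD ERR PATH); top_opened is a hypothetical helper name of mine:

```shell
# Rank the most frequently opened paths from opensnoop output on stdin.
# The optional first argument limits how many rows are shown (default 10).
# Assumption: the path starts at the 5th whitespace-separated field.
top_opened() {
    awk 'NR > 1 {
        path = $5
        for (i = 6; i <= NF; i++) path = path " " $i   # re-join paths containing spaces
        if (path != "") count[path]++
    }
    END { for (p in count) print count[p], p }' | sort -rn | head -n "${1:-10}"
}
```

I'd capture first (`sudop opensnoop > opens.txt`, then Ctrl-C) and run `top_opened 5 < opens.txt` afterwards, since interrupting a live pipeline would likely kill awk before it prints its summary.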

Tracing outgoing TCP connections

With tcpconnect we can trace outgoing TCP connections (using the connect syscall). In this case the connection was triggered by a curl https://google.com from my local network address (192.168.1.147) to Google's server at 172.217.17.14:443.

% sudop tcpconnect
Tracing connect ... Hit Ctrl-C to end
PID     COMM         IP SADDR            DADDR            DPORT
515773  curl         4  192.168.1.147    172.217.17.14    443

Tracing incoming TCP connections

Similarly, with tcpaccept we can trace incoming TCP connections (using the accept syscall). In this case, I used Ruby to spin up an HTTP server on port 8001:

% ruby -run -ehttpd . -p8001

And then again used curl to make a request which was traced successfully:

% sudop tcpaccept
PID     COMM         IP RADDR            RPORT LADDR            LPORT
516996  ruby         6  ::1              41200 ::1              8001

Conclusions

I am just starting to explore the world of eBPF, but I'm excited to have discovered a suite of lightweight system tracing utilities that I can see being a genuinely useful addition to my toolkit.

Once again, don't forget to check out the eBPF website and Brendan Gregg's blog to dive deeper.