+++
title = 'Handy tracing tools with eBPF'
date = 2024-11-17T14:36:07+01:00
draft = true
tags = ['ebpf', 'tracing', 'network', 'kernel']
+++

[eBPF](https://ebpf.io/) allows small, event-driven programs to be loaded from user space and run safely inside the
kernel, attached to pre-defined hooks such as syscalls, function invocations, and network events. The technology makes
it possible to build many tools which previously required kernel changes or a kernel module.

While researching the technology and scoping out potentially interesting use-cases, I've discovered that the eBPF
ecosystem includes a collection of simple but useful tracing tools. I suspect that I'll be reaching for these frequently
in the future, especially for those tricky bugs where more traditional debugging techniques fail to deliver.

The introspection provided by many, if not all, of these tools was previously achievable in other ways. However,
this often meant using tooling such as [strace](https://man7.org/linux/man-pages/man1/strace.1.html), which carries a
noticeable performance overhead and is not always a convenient choice. Note that eBPF is a Linux kernel technology, so
these tools are not portable to macOS, where DTrace fills a similar role.

## Installing bcc-tools

The tooling is provided by the [bcc-tools](https://github.com/iovisor/bcc) package. To install on Arch Linux:

```bash
% pacman -S bcc bcc-tools python-bcc linux-headers

# required for bashreadline:
% pacman -S python-pyelftools
```

The tools will be installed to `/usr/share/bcc/tools`. Inconveniently, this is probably outside of your search path.
It's easy to add this path to your `PATH` variable:

```bash
% export PATH=/usr/share/bcc/tools:$PATH
```
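If the `export` line ends up in a shell startup file, a slightly more defensive variant avoids duplicate entries and
skips the directory when bcc is not installed. This is only a minimal sketch; `path_prepend` is a hypothetical helper
name, not something provided by bcc:

```bash
# Prepend a directory to PATH only if it exists and is not already listed.
# `path_prepend` is a hypothetical helper, not part of bcc.
path_prepend() {
    [ -d "$1" ] || return 0        # skip directories that do not exist
    case ":$PATH:" in
        *":$1:"*) ;;               # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
}

path_prepend /usr/share/bcc/tools
export PATH
```

Because the function checks for an existing entry, sourcing the startup file repeatedly should not grow `PATH`.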

Now, let's try the `execsnoop` command. This tool traces the creation of new processes.

```sh
% execsnoop
bpf: Failed to load program: Operation not permitted

Traceback (most recent call last):
  File "/usr/share/bcc/tools/execsnoop", line 268, in <module>
    b.attach_kprobe(event=execve_fnname, fn_name="syscall__execve")
  File "/usr/lib/python3.12/site-packages/bcc/__init__.py", line 851, in attach_kprobe
    fn = self.load_func(fn_name, BPF.KPROBE)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/bcc/__init__.py", line 523, in load_func
    raise Exception("Need super-user privileges to run")
Exception: Need super-user privileges to run
```

Running code which installs eBPF hooks requires root privileges or equivalent capabilities such as `CAP_BPF`. When
running outside of containerized environments this probably means running bcc tools with `sudo`. However, when I tried
to run the `execsnoop` tool with sudo:

```bash
% sudo execsnoop
sudo: execsnoop: command not found
```

Of course, this is a common issue with sudo. Many Linux distributions configure sudo to reset environment variables,
including `$PATH` (often via the `secure_path` option in sudoers), when running commands. Somehow I've managed to live
with this over the years with a number of unsatisfying workarounds, but the prospect of being able to run this tooling
with minimal friction finally provided the inspiration for me to find a better way. This [was
achieved](https://git.netflux.io/rob/dotfiles/commit/c39ec29b21751744a645b9bef5ba06d5fabee9bf) with a simple alias:

```bash
alias sudop='sudo env PATH=$PATH'
```
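Aliases are normally only expanded in interactive shells, so a function form may be handy if the same trick is needed
in scripts. The following is a sketch of one possible variant, meant to be used in place of the alias rather than
alongside it. It is not what the linked commit contains:

```bash
# Hypothetical function form of the `sudop` alias.
# Quoting "PATH=$PATH" keeps a PATH containing spaces as a single argument,
# and "$@" forwards the command and its arguments unchanged.
sudop() {
    sudo env "PATH=$PATH" "$@"
}

# Usage (requires sudo rights):
#   sudop execsnoop
```

Either form passes the caller's `PATH` to the root process, which is worth keeping in mind if your `PATH` contains
directories writable by non-root users.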

After adding the alias, it's easy to run a bcc tool: :tada:

```bash
% sudop execsnoop
PCOMM            PID     PPID    RET ARGS
```

## Useful commands

This section only scratches the surface of what's available out-of-the-box with bcc-tools. Most of my inspiration came
from Brendan Gregg's [blog post](https://www.brendangregg.com/ebpf.html).

### Tracing newly created processes

As mentioned above, you can watch process creation with `execsnoop`, which is named after the `exec` family of syscalls:

```bash
% sudop execsnoop
PCOMM            PID     PPID    RET ARGS
sh               497415  2580      0 /bin/sh -c tmbatinfo
tmbatinfo        497415  2580      0 /home/rob/script/tmbatinfo
bash             497415  2580      0 /usr/bin/bash /home/rob/script/tmbatinfo
batinfo          497417  497415    0 /home/rob/script/batinfo
bash             497417  497415    0 /usr/bin/bash /home/rob/script/batinfo
sh               497416  2580      0 /bin/sh -c sysinfo
sysinfo          497416  2580      0 /home/rob/script/sysinfo
```

This outputs the following columns:

* `PCOMM`: some googling suggests this should be the _parent_ command name, but in my experiments it appears to be the
  name of the launched (child) command.
* `PID`: the PID of the launched process.
* `PPID`: the PID of the parent process.
* `RET`: the return value of the syscall, with 0 indicating success.
* `ARGS`: the arguments provided to the syscall, which at least in my experiments seem to include the path of the
  command as well as its arguments.

Importantly, the `-x` flag can be passed to make `execsnoop` show failed syscall attempts as well as those that are
successful.
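When running with `-x` on a busy machine, the failures can get lost among the successful calls. A small filter on the
`RET` column can help. This is a rough sketch: `failed_execs` is a hypothetical helper name, and it assumes the
command name in the first column contains no whitespace:

```bash
# Keep the header plus any row whose RET column (4th field) is non-zero.
# Assumes PCOMM contains no whitespace; `failed_execs` is a hypothetical name.
failed_execs() {
    awk 'NR == 1 || $4 != 0'
}

# Usage:
#   sudop execsnoop -x | failed_execs
```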

### Tracing open files

Similarly, `opensnoop` traces the `open` family of syscalls, which are used to open files.

I used to rely heavily on the pre-eBPF, DTrace-based implementation of `opensnoop` back in my Mac days, and it's nice
to discover that I can call upon it from Linux.

```bash
% sudop opensnoop
PID    COMM               FD ERR PATH
347357 ThreadPoolForeg    27   0 /home/rob/.cache/google-chrome/Default/Cache/Cache_Data/1b2c2ddd2d7ef7c5_0
347357 Chrome_ChildIOT    32   0 /dev/shm/.com.google.Chrome.ZkPQQO
347312 ThreadPoolForeg   103   0 /home/rob/.config/google-chrome/Default/.com.google.Chrome.lWY8pP
347312 ThreadPoolForeg   114   0 /dev/shm/.com.google.Chrome.aLGFXG
347312 ThreadPoolForeg   192   0 /proc/347426/stat
347312 ThreadPoolSingl   192   0 /proc/347426/task/347426/status
347312 chrome            192   0 /dev/shm/.com.google.Chrome.ddntL8
347312 ThreadPoolForeg   109   0 /proc/347426/stat
347312 ThreadPoolForeg   103   0 /proc/347426/stat
347312 ThreadPoolForeg   103   0 /home/rob/.config/google-chrome/Default/Extensions/fmkadmapgofadopljbjfkapdkoienihi/6.0.1_0/build/proxy.js
347312 ThreadPoolForeg   103   0 /home/rob/.config/google-chrome/Default/Extensions/fmkadmapgofadopljbjfkapdkoienihi/6.0.1_0/build/fileFetcher.js
```

The columns are similar to `execsnoop`:

* `PID`: the PID of the process calling `open`.
* `COMM`: the name of the calling process.
* `FD`: the process-scoped file descriptor returned for the opened file.
* `ERR`: error code, 0 for success.
* `PATH`: the path of the file being opened, which may of course be on a virtual filesystem such as `/proc`.
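Since `opensnoop` output can scroll past quickly, it may be easier to save a trace to a file and summarise it
afterwards. The sketch below counts opens per PID; `opens_by_pid` and `opens.log` are hypothetical names chosen for
illustration:

```bash
# Count traced opens per PID, busiest process first.
# Reads opensnoop output on stdin; only rows starting with a numeric PID are
# counted, which skips the header. `opens_by_pid` is a hypothetical helper.
opens_by_pid() {
    awk '$1 ~ /^[0-9]+$/ { n[$1]++ }
         END { for (p in n) print n[p], p }' | sort -rn
}

# Usage: capture a trace, stop it with Ctrl-C, then summarise:
#   sudop opensnoop > opens.log
#   opens_by_pid < opens.log
```

Each output line is a count followed by the PID, which can then be cross-referenced with the `COMM` column.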

### Tracing outgoing TCP connections

With `tcpconnect` we can trace outgoing TCP connections (using the `connect` syscall), in this case triggered by
a `curl https://google.com` from my local network address (`192.168.1.147`) to Google's server at `172.217.17.14:443`.

```bash
% sudop tcpconnect
Tracing connect ... Hit Ctrl-C to end
PID     COMM         IP SADDR            DADDR            DPORT
515773  curl         4  192.168.1.147    172.217.17.14    443
```
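For a longer trace, a per-destination summary can be more useful than the raw stream. The following sketch groups a
saved trace by `DADDR:DPORT`; `connects_by_dest` and `connects.log` are hypothetical names, and the last two columns
are addressed relative to the end of the line in case `COMM` contains spaces:

```bash
# Count traced connections per destination (DADDR:DPORT), most frequent first.
# Rows that do not start with a numeric PID (banner and header) are skipped.
# `connects_by_dest` is a hypothetical helper.
connects_by_dest() {
    awk '$1 ~ /^[0-9]+$/ { n[$(NF-1) ":" $NF]++ }
         END { for (d in n) print n[d], d }' | sort -rn
}

# Usage: capture a trace, stop it with Ctrl-C, then summarise:
#   sudop tcpconnect > connects.log
#   connects_by_dest < connects.log
```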

### Tracing incoming TCP connections

Similarly, with `tcpaccept` we can trace incoming TCP connections (using the `accept` syscall). In this case, I used
Ruby to spin up an HTTP server on port 8001:

```bash
% ruby -run -ehttpd . -p8001
```

I then used `curl` again to make a request, which was traced successfully:

```bash
% sudop tcpaccept
PID     COMM         IP RADDR            RPORT LADDR            LPORT
516996  ruby         6  ::1              41200 ::1              8001
```
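On a machine that accepts connections on several ports, the output can be narrowed to a single listener with a small
filter on the `LPORT` column. This is a sketch only; `accepts_on_port` is a hypothetical helper name, and it assumes
the header is the first line of output as in the trace above:

```bash
# Keep the header plus rows whose last column (LPORT) matches the given port.
# `accepts_on_port` is a hypothetical helper.
accepts_on_port() {
    awk -v port="$1" 'NR == 1 || $NF == port'
}

# Usage:
#   sudop tcpaccept | accepts_on_port 8001
```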

## Conclusions

I am just starting to explore the world of eBPF, but I'm excited to discover a suite of lightweight system tracing
utilities that I can see being a genuinely useful addition to my toolkit.

Once again, don't forget to check out the [eBPF website](https://ebpf.io) and [Brendan Gregg's
blog](https://brendangregg.com/ebpf.html) to dive deeper.