+++
title = 'Handy tracing tools with eBPF'
date = 2024-11-17T14:36:07+01:00
draft = false
tags = ['ebpf', 'tracing', 'network', 'kernel']
+++
[eBPF](https://ebpf.io/) allows event-driven programs, written in high-level languages, to be attached to
pre-defined hooks such as syscalls, function invocations, and network events. The technology makes it possible to build,
from user space, many tools which previously required a kernel implementation or module.

While researching the technology and scoping out potentially interesting use-cases, I discovered that the eBPF
ecosystem includes a collection of simple but useful tracing tools. I suspect that I'll be reaching for these frequently
in the future, especially for those tricky bugs where more traditional debugging techniques fail to deliver.

The introspection provided by many, if not all, of these tools was previously achievable in other ways. However, that
often meant reaching for tooling such as [strace](https://man7.org/linux/man-pages/man1/strace.1.html), which adds
significant overhead to the traced process and is not always a convenient choice. Additionally, eBPF itself is not
limited to Linux (there is, for example, an eBPF implementation for Windows), although the bcc tools covered here are
Linux-specific and macOS still relies on DTrace for comparable tooling.
## Installing bcc-tools
The tooling is provided by the [bcc-tools](https://github.com/iovisor/bcc) package. To install on Arch Linux:
```bash
% pacman -S bcc bcc-tools python-bcc linux-headers
# required for bashreadline:
% pacman -S python-pyelftools
```
The tools will be installed to `/usr/share/bcc/tools`. Inconveniently, this is probably outside of your search path.
It's easy to add this path to your `PATH` variable:
```bash
% export PATH=/usr/share/bcc/tools:$PATH
```
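If the `export` line ends up in a shell startup file, re-sourcing that file will keep prepending the same directory. A minimal sketch of an idempotent variant follows. Note that `path_prepend` is a hypothetical helper name chosen for illustration, not something bcc provides.

```bash
# path_prepend DIR: put DIR at the front of PATH unless it is already listed.
# (Hypothetical helper for illustration; not part of bcc.)
path_prepend() {
  case ":$PATH:" in
    *":$1:"*) ;;              # already present: leave PATH untouched
    *) PATH="$1:$PATH" ;;     # otherwise prepend it
  esac
}

path_prepend /usr/share/bcc/tools
export PATH
```

Calling the helper a second time with the same directory leaves `PATH` unchanged.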
Now, let's try the `execsnoop` command. This tool traces the creation of new processes.
```sh
% execsnoop
bpf: Failed to load program: Operation not permitted
Traceback (most recent call last):
  File "/usr/share/bcc/tools/execsnoop", line 268, in <module>
    b.attach_kprobe(event=execve_fnname, fn_name="syscall__execve")
  File "/usr/lib/python3.12/site-packages/bcc/__init__.py", line 851, in attach_kprobe
    fn = self.load_func(fn_name, BPF.KPROBE)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/bcc/__init__.py", line 523, in load_func
    raise Exception("Need super-user privileges to run")
Exception: Need super-user privileges to run
```
Running code which installs eBPF hooks requires root privileges or the `CAP_BPF` capability. When running outside of
containerized environments, this probably means running bcc tools with `sudo`. However, when I tried to run the
`execsnoop` tool with sudo:
```bash
% sudo execsnoop
sudo: execsnoop: command not found
```
Of course, this is a common issue with sudo. Many Linux distributions configure sudo to reset environment variables,
including `$PATH`, when running commands (typically via the `secure_path` setting in `sudoers`). Over the years I've
lived with this using a number of unsatisfying workarounds, but the prospect of being able to run this tooling with
minimal friction finally provided the inspiration for me to find a better way. This [was
achieved](https://git.netflux.io/rob/dotfiles/commit/c39ec29b21751744a645b9bef5ba06d5fabee9bf) with a simple alias:
```bash
alias sudop='sudo env PATH=$PATH'
```
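One caveat: in bash, aliases are not expanded in non-interactive shells by default, so the alias will not help inside scripts. A shell function is one possible alternative. The sketch below is an assumption about how that could look, not what the linked dotfiles commit does.

```bash
# sudop CMD [ARGS...]: run CMD under sudo while keeping the caller's PATH.
# Quoting "PATH=$PATH" keeps directories containing spaces intact.
sudop() {
  sudo env "PATH=$PATH" "$@"
}
```

It is used exactly like the alias, for example `sudop execsnoop -x`.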
After adding the alias, it's easy to run a bcc tool: :tada:
```bash
% sudop execsnoop
PCOMM PID PPID RET ARGS
```
## Useful commands
This section only scratches the surface of what's available out-of-the-box with bcc-tools. Most of my inspiration came
from Brendan Gregg's [blog post](https://www.brendangregg.com/ebpf.html).
### Tracing newly created processes
As mentioned above, you can watch process creation with `execsnoop`, which is named after the `exec` family of syscalls (it traces `execve`):
```bash
% sudop execsnoop
PCOMM PID PPID RET ARGS
sh 497415 2580 0 /bin/sh -c tmbatinfo
tmbatinfo 497415 2580 0 /home/rob/script/tmbatinfo
bash 497415 2580 0 /usr/bin/bash /home/rob/script/tmbatinfo
batinfo 497417 497415 0 /home/rob/script/batinfo
bash 497417 497415 0 /usr/bin/bash /home/rob/script/batinfo
sh 497416 2580 0 /bin/sh -c sysinfo
sysinfo 497416 2580 0 /home/rob/script/sysinfo
```
The output has the following columns:
* `PCOMM`: some googling suggests this should be the _parent_ command name, but in my experiments it is the name of the
launched (child) command.
* `PID`: the PID of the launched process.
* `PPID`: the PID of the parent process.
* `RET`: the return value of the `execve` syscall (0 on success).
* `ARGS`: the arguments provided to the syscall, which at least in my experiments seem to include the path of the
command as well as its arguments.

Importantly, the `-x` flag can be passed to make `execsnoop` show failed syscall attempts as well as those that are
successful.
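Because the output is plain whitespace-separated columns, it lends itself to post-processing. As a rough sketch, the hypothetical `count_execs` helper below tallies executions per command from saved `execsnoop` output read on standard input. It assumes the default column layout shown above and command names without spaces, neither of which is guaranteed.

```bash
# count_execs: read execsnoop output on stdin and print "COUNT PCOMM" pairs,
# most frequent first. Hypothetical helper; assumes the default columns and
# command names that contain no spaces.
count_execs() {
  awk '$1 != "PCOMM" && NF >= 4 { seen[$1]++ }   # skip header and blank lines
       END { for (c in seen) print seen[c], c }' |
    sort -k1,1nr -k2,2
}
```

Given output saved to a file (the name `execs.log` is only an example), `count_execs < execs.log` prints one line per command.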
### Tracing open files
Similarly, `opensnoop` traces the `open` family of syscalls (such as `open` and `openat`), which are used to open files.
I used to rely heavily on the pre-eBPF, dtrace-based implementation of `opensnoop` back in my Mac days, and it's nice
to discover that I can call upon it from Linux.
```bash
% sudop opensnoop
PID COMM FD ERR PATH
347357 ThreadPoolForeg 27 0 /home/rob/.cache/google-chrome/Default/Cache/Cache_Data/1b2c2ddd2d7ef7c5_0
347357 Chrome_ChildIOT 32 0 /dev/shm/.com.google.Chrome.ZkPQQO
347312 ThreadPoolForeg 103 0 /home/rob/.config/google-chrome/Default/.com.google.Chrome.lWY8pP
347312 ThreadPoolForeg 114 0 /dev/shm/.com.google.Chrome.aLGFXG
347312 ThreadPoolForeg 192 0 /proc/347426/stat
347312 ThreadPoolSingl 192 0 /proc/347426/task/347426/status
347312 chrome 192 0 /dev/shm/.com.google.Chrome.ddntL8
347312 ThreadPoolForeg 109 0 /proc/347426/stat
347312 ThreadPoolForeg 103 0 /proc/347426/stat
347312 ThreadPoolForeg 103 0 /home/rob/.config/google-chrome/Default/Extensions/fmkadmapgofadopljbjfkapdkoienihi/6.0.1_0/build/proxy.js
347312 ThreadPoolForeg 103 0 /home/rob/.config/google-chrome/Default/Extensions/fmkadmapgofadopljbjfkapdkoienihi/6.0.1_0/build/fileFetcher.js
```
The columns are similar to `execsnoop`:
* `PID`: the PID of the process calling `open`.
* `COMM`: the name of the calling process.
* `FD`: the process-scoped file descriptor returned by the call.
* `ERR`: error code, 0 for success.
* `PATH`: the path of the file being opened, which may of course be on a virtual filesystem such as `/proc`.
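On a busy system this output scrolls by quickly, so it can help to aggregate it. The sketch below defines a hypothetical `top_paths` helper that counts how often each path appears in saved `opensnoop` output. It assumes the five default columns listed above and `COMM` values without spaces. Everything from the fifth field onwards is treated as the path.

```bash
# top_paths: read opensnoop output on stdin and print "COUNT PATH" pairs,
# most frequently opened first. Hypothetical helper; assumes the default
# PID/COMM/FD/ERR/PATH columns and COMM values that contain no spaces.
top_paths() {
  awk 'NF >= 5 && $1 != "PID" {
         path = $5
         # re-join paths containing spaces (runs of spaces collapse to one)
         for (i = 6; i <= NF; i++) path = path " " $i
         hits[path]++
       }
       END { for (p in hits) print hits[p], p }' |
    sort -k1,1nr -k2
}
```

For the sample output above, `/proc/347426/stat` would come out on top with three opens.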
### Tracing outgoing TCP connections
With `tcpconnect` we can trace outgoing TCP connections (using the `connect` syscall). In this case, the connection was
triggered by a `curl https://google.com` from my local network address (`192.168.1.147`) to Google's server at
`172.217.17.14:443`.
```bash
% sudop tcpconnect
Tracing connect ... Hit Ctrl-C to end
PID COMM IP SADDR DADDR DPORT
515773 curl 4 192.168.1.147 172.217.17.14 443
```
### Tracing incoming TCP connections
Similarly, with `tcpaccept` we can trace incoming TCP connections (using the `accept` syscall). In this case, I used
Ruby to spin up an HTTP server on port 8001:
```bash
% ruby -run -ehttpd . -p8001
```
I then used `curl` again to make a request, which was traced successfully:
```bash
% sudop tcpaccept
PID COMM IP RADDR RPORT LADDR LPORT
516996 ruby 6 ::1 41200 ::1 8001
```
## Conclusions
I am just starting to explore the world of eBPF but I'm excited to discover a suite of lightweight system tracing
utilities that I can see being a genuinely useful addition to my toolkit.
Once again, don't forget to check out the [eBPF website](https://ebpf.io) and [Brendan Gregg's
blog](https://brendangregg.com/ebpf.html) to dive deeper.