Tracepoints
Note
Full code for the example in this chapter is available on GitHub.
What are Tracepoints in eBPF?
Tracepoints are static probing points inserted at specific locations in the Linux kernel source code. They provide a stable and efficient way to monitor kernel events without the overhead of dynamic tracing methods like kprobes.
Common tracepoint categories include:
- System calls (syscalls)
- Scheduler events (sched)
- Network events (net)
- File system events (vfs)
The available events are listed under /sys/kernel/tracing/events or can be
enumerated with bpftrace, e.g.:
sudo bpftrace -l 'tracepoint:syscalls:sys_enter_*'
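The same listing can be produced without bpftrace by reading tracefs directly. The following is a minimal sketch, assuming tracefs is mounted at /sys/kernel/tracing and the process is allowed to read it (usually this requires root). `list_events` is a hypothetical helper name used for illustration; it is not part of Aya.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns the sorted names of all tracepoints in `category` whose name
/// starts with `prefix`, by listing `<events_root>/<category>`.
/// `list_events` is a hypothetical helper for illustration only.
fn list_events(events_root: &Path, category: &str, prefix: &str) -> io::Result<Vec<String>> {
    let mut names = Vec::new();
    for entry in fs::read_dir(events_root.join(category))? {
        let entry = entry?;
        // Each tracepoint is a directory; plain files such as `enable` and
        // `filter` apply to the whole category and are skipped.
        if !entry.file_type()?.is_dir() {
            continue;
        }
        let name = entry.file_name().to_string_lossy().into_owned();
        if name.starts_with(prefix) {
            names.push(name);
        }
    }
    names.sort();
    Ok(names)
}
```

Calling `list_events(Path::new("/sys/kernel/tracing/events"), "syscalls", "sys_enter_")` should return the same event names as the bpftrace command above, without the `tracepoint:syscalls:` prefix.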
Example project
Let’s create a tracepoint program that monitors system calls using the
sys_enter_execve tracepoint, which fires when a new process is executed.
Design
We’re going to:
- Create a tracepoint program that attaches to sys_enter_execve
- Use a per-CPU array buffer to read the filename from userspace
- Log the command name and filename being executed
eBPF code
The eBPF program reads the filename pointer from the tracepoint context, copies the string it points to out of user memory, and logs information about the execve system call:
#![no_std]
#![no_main]
// TODO(https://github.com/rust-lang/rust/issues/139984): remove.
#![feature(cstr_display)]

use aya_ebpf::{
    EbpfContext,
    helpers::bpf_probe_read_user_str_bytes,
    macros::{map, tracepoint},
    maps::PerCpuArray,
    programs::TracePointContext,
};
use aya_log_ebpf::info;

#[repr(C)]
pub struct Buf {
    pub buf: [u8; 4096],
}

#[map]
pub static FILENAME_BUF: PerCpuArray<Buf> = PerCpuArray::with_max_entries(1, 0);

#[tracepoint]
pub fn tracepoint_execve(ctx: TracePointContext) -> u32 {
    match try_tracepoint_execve(ctx) {
        Ok(ret) => ret,
        Err(ret) => ret as u32,
    }
}

fn try_tracepoint_execve(ctx: TracePointContext) -> Result<u32, i32> {
    // To get the offset, see
    // /sys/kernel/debug/tracing/events/syscalls/sys_enter_execve/format
    const FILENAME_OFFSET: usize = 16;
    let filename: *const u8 = unsafe { ctx.read_at(FILENAME_OFFSET)? };

    let Buf { buf } = unsafe {
        let ptr = FILENAME_BUF.get_ptr_mut(0).ok_or(0)?;
        &mut *ptr
    };
    let filename = unsafe {
        core::str::from_utf8_unchecked(bpf_probe_read_user_str_bytes(
            filename, buf,
        )?)
    };

    let command = ctx.command()?;
    let command = core::ffi::CStr::from_bytes_until_nul(&command)
        .map_err(|core::ffi::FromBytesUntilNulError { .. }| -1)?;
    // We have no reasonable way to log `CStr` today.
    let command = unsafe { core::str::from_utf8_unchecked(command.to_bytes()) };

    info!(
        &ctx,
        "Tracepoint sys_enter_execve called by: {}, filename: {}",
        command,
        filename
    );

    Ok(0)
}

#[cfg(not(test))]
#[panic_handler]
fn panic(_info: &core::panic::PanicInfo) -> ! {
    loop {}
}

#[unsafe(link_section = "license")]
#[unsafe(no_mangle)]
static LICENSE: [u8; 13] = *b"Dual MIT/GPL\0";
Key points in the eBPF code:
- Per-CPU Buffer: We use PerCpuArray<Buf> to store the filename string, because the eBPF stack is limited to 512 bytes and cannot hold a 4096-byte buffer.
- Context Reading: The filename pointer is read from a specific offset in the tracepoint context (offset 16 for sys_enter_execve). The layout of the context is described in the file /sys/kernel/debug/tracing/events/syscalls/sys_enter_execve/format.
- Userspace Memory: We use bpf_probe_read_user_str_bytes to safely read the filename string from userspace memory.
- Logging: We log both the command name and filename using the info! macro.
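To see where offset 16 comes from, the format file can be parsed for the `filename` field. The sketch below is a minimal illustration with plain string handling: `field_offset` is a hypothetical helper, and `SAMPLE_FORMAT` is an abbreviated copy of what the format file typically contains on x86_64. The contents may differ on other architectures or kernel versions, so check the file on the target system.

```rust
/// Abbreviated sample of
/// /sys/kernel/debug/tracing/events/syscalls/sys_enter_execve/format
/// as typically seen on x86_64 (assumption: your kernel may differ).
const SAMPLE_FORMAT: &str = "\
name: sys_enter_execve
format:
\tfield:unsigned short common_type;\toffset:0;\tsize:2;\tsigned:0;
\tfield:unsigned char common_flags;\toffset:2;\tsize:1;\tsigned:0;
\tfield:unsigned char common_preempt_count;\toffset:3;\tsize:1;\tsigned:0;
\tfield:int common_pid;\toffset:4;\tsize:4;\tsigned:1;

\tfield:int __syscall_nr;\toffset:8;\tsize:4;\tsigned:1;
\tfield:const char * filename;\toffset:16;\tsize:8;\tsigned:0;
\tfield:const char *const * argv;\toffset:24;\tsize:8;\tsigned:0;
\tfield:const char *const * envp;\toffset:32;\tsize:8;\tsigned:0;
";

/// Looks up the byte offset of `field_name` in the text of a tracepoint
/// `format` file. `field_offset` is a hypothetical helper for illustration.
fn field_offset(format: &str, field_name: &str) -> Option<usize> {
    for line in format.lines() {
        // Field lines are `;`-separated: `field:<C decl>; offset:N; size:N; ...`
        let mut parts = line.trim().split(';').map(str::trim);
        let Some(decl) = parts.next().and_then(|p| p.strip_prefix("field:")) else {
            continue;
        };
        // The field name is the last token of the C declaration, minus any
        // leading `*` or trailing array suffix such as `[16]`.
        let name = decl
            .split_whitespace()
            .last()
            .map(|tok| tok.trim_start_matches('*'))
            .and_then(|tok| tok.split('[').next());
        if name != Some(field_name) {
            continue;
        }
        return parts
            .find_map(|p| p.strip_prefix("offset:"))
            .and_then(|n| n.parse::<usize>().ok());
    }
    None
}
```

With the sample text, `field_offset(SAMPLE_FORMAT, "filename")` yields `Some(16)`, which matches the `FILENAME_OFFSET` constant used in the eBPF program. The 8 bytes of common fields are followed by the 4-byte syscall number and 4 bytes of padding, so the 8-byte pointer is aligned.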
Userspace code
The userspace code loads the eBPF program and attaches it to the tracepoint:
use aya::programs::TracePoint;
use env_logger::Env;
#[rustfmt::skip]
use log::{info, debug, warn};
use tokio::signal;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    env_logger::Builder::from_env(Env::default().default_filter_or("info"))
        .init();

    // Bump the memlock rlimit. This is needed for older kernels that don't use the
    // new memcg based accounting, see https://lwn.net/Articles/837122/
    let rlim = libc::rlimit {
        rlim_cur: libc::RLIM_INFINITY,
        rlim_max: libc::RLIM_INFINITY,
    };
    let ret = unsafe { libc::setrlimit(libc::RLIMIT_MEMLOCK, &rlim) };
    if ret != 0 {
        debug!("remove limit on locked memory failed, ret is: {ret}");
    }

    // This will include your eBPF object file as raw bytes at compile-time and load it at
    // runtime. This approach is recommended for most real-world use cases. If you would
    // like to specify the eBPF program at runtime rather than at compile-time, you can
    // reach for `Ebpf::load_file` instead.
    let mut ebpf = aya::Ebpf::load(aya::include_bytes_aligned!(concat!(
        env!("OUT_DIR"),
        "/tracepoint"
    )))?;
    match aya_log::EbpfLogger::init(&mut ebpf) {
        Err(e) => {
            // This can happen if you remove all log statements from your eBPF program.
            warn!("failed to initialize eBPF logger: {e}");
        }
        Ok(logger) => {
            let mut logger = tokio::io::unix::AsyncFd::with_interest(
                logger,
                tokio::io::Interest::READABLE,
            )?;
            tokio::task::spawn(async move {
                loop {
                    let mut guard = logger.readable_mut().await.unwrap();
                    guard.get_inner_mut().flush();
                    guard.clear_ready();
                }
            });
        }
    }

    let program: &mut TracePoint =
        ebpf.program_mut("tracepoint_execve").unwrap().try_into()?;
    program.load()?;
    program.attach("syscalls", "sys_enter_execve")?;

    let ctrl_c = signal::ctrl_c();
    info!("Waiting for Ctrl-C...");
    ctrl_c.await?;
    info!("Exiting...");

    Ok(())
}
Steps in the userspace code:
- Memory Limit: Remove the memlock limit for older kernels
- Load Program: Load the compiled eBPF object file
- Logger Setup: Initialize the eBPF logger to receive log messages
- Attach Tracepoint: Attach the program to the syscalls/sys_enter_execve tracepoint
- Signal Handling: Wait for Ctrl-C to exit gracefully
Running the program
$ cargo run
[INFO tracepoint] Tracepoint sys_enter_execve called by: zsh, filename: /usr/bin/git
[INFO tracepoint] Tracepoint sys_enter_execve called by: zsh, filename: /usr/bin/wc
[INFO tracepoint] Tracepoint sys_enter_execve called by: zsh, filename: /usr/bin/tail
[INFO tracepoint] Tracepoint sys_enter_execve called by: zsh, filename: /usr/bin/ls
The program will now log every new process execution on the system, showing which command started the process and what binary is being executed.
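The command name in this output comes from the kernel's fixed 16-byte, NUL-padded comm buffer, so names longer than 15 bytes appear truncated. The sketch below mirrors, in ordinary userspace Rust, how the eBPF program trims that buffer at the first NUL byte. `comm_to_str` is a hypothetical helper, and unlike the eBPF code it validates UTF-8 instead of using the unchecked conversion.

```rust
use std::ffi::CStr;

/// Size of the kernel's task command buffer (TASK_COMM_LEN).
const TASK_COMM_LEN: usize = 16;

/// Converts a NUL-padded comm buffer into a string slice.
/// Returns `None` if the buffer has no NUL terminator or is not valid UTF-8.
/// `comm_to_str` is a hypothetical helper for illustration only.
fn comm_to_str(comm: &[u8; TASK_COMM_LEN]) -> Option<&str> {
    CStr::from_bytes_until_nul(comm).ok()?.to_str().ok()
}
```

For a buffer holding `zsh` followed by NUL padding, `comm_to_str` returns `Some("zsh")`, which is the value shown after "called by:" in the log lines above.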