Tracepoints

Note

Full code for the example in this chapter is available on GitHub.

What are Tracepoints in eBPF?

Tracepoints are static probe points that kernel developers have placed at specific locations in the Linux kernel source code. Because they are compiled into the kernel and expose a documented set of fields, they are more stable across kernel versions than dynamic tracing methods like kprobes, and they avoid the overhead those methods add.

Common tracepoint categories include:

System calls (syscalls)
Scheduler events (sched)
Network events (net)
File system events (vfs)

The available events are listed under /sys/kernel/tracing/events or can be enumerated with bpftrace, e.g.:

sudo bpftrace -l 'tracepoint:syscalls:sys_enter_*'
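The same listing can be obtained without bpftrace by reading the tracefs directory directly. The following is a minimal sketch, assuming tracefs is mounted at /sys/kernel/tracing and that each event appears as a subdirectory of its category; `list_events` is a hypothetical helper written for this illustration, not part of Aya.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns the sorted names of the events in `category_dir` that start with
/// `prefix`. Hypothetical helper for illustration only.
fn list_events(category_dir: &Path, prefix: &str) -> io::Result<Vec<String>> {
    let mut names = Vec::new();
    for entry in fs::read_dir(category_dir)? {
        let entry = entry?;
        // Events are directories; control files such as `enable` are skipped.
        if !entry.file_type()?.is_dir() {
            continue;
        }
        if let Some(name) = entry.file_name().to_str() {
            if name.starts_with(prefix) {
                names.push(name.to_owned());
            }
        }
    }
    names.sort();
    Ok(names)
}

fn main() {
    // Assumption: tracefs is mounted here. Reading it usually requires root.
    let dir = Path::new("/sys/kernel/tracing/events/syscalls");
    match list_events(dir, "sys_enter_") {
        Ok(names) => {
            for name in names {
                println!("tracepoint:syscalls:{name}");
            }
        }
        Err(e) => eprintln!("cannot read {}: {e}", dir.display()),
    }
}
```

The category and event names printed here are the two arguments later passed to `attach` in the userspace code.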

Example project

Let’s create a tracepoint program that monitors system calls using the sys_enter_execve tracepoint, which fires on entry to the execve system call, that is, whenever a process asks the kernel to execute a new program.

Design

We’re going to:

Create a tracepoint program that attaches to sys_enter_execve
Use a per-CPU array as a scratch buffer into which the filename is copied from userspace
Log the command name and filename being executed

eBPF code

The eBPF program reads the pointer to the filename from the tracepoint context, copies the string it points to out of userspace memory, and logs information about the execve system call:

#![no_std]
#![no_main]
// TODO(https://github.com/rust-lang/rust/issues/139984): remove.
#![feature(cstr_display)]

use aya_ebpf::{
    EbpfContext,
    helpers::bpf_probe_read_user_str_bytes,
    macros::{map, tracepoint},
    maps::PerCpuArray,
    programs::TracePointContext,
};
use aya_log_ebpf::info;

#[repr(C)]
pub struct Buf {
    pub buf: [u8; 4096],
}

#[map]
pub static FILENAME_BUF: PerCpuArray<Buf> = PerCpuArray::with_max_entries(1, 0);

#[tracepoint]
pub fn tracepoint_execve(ctx: TracePointContext) -> u32 {
    match try_tracepoint_execve(ctx) {
        Ok(ret) => ret,
        Err(ret) => ret as u32,
    }
}

fn try_tracepoint_execve(ctx: TracePointContext) -> Result<u32, i32> {
    // To get the offset, see
    // /sys/kernel/debug/tracing/events/syscalls/sys_enter_execve/format
    const FILENAME_OFFSET: usize = 16;
    let filename: *const u8 = unsafe { ctx.read_at(FILENAME_OFFSET)? };
    let Buf { buf } = unsafe {
        let ptr = FILENAME_BUF.get_ptr_mut(0).ok_or(0)?;
        &mut *ptr
    };
    let filename = unsafe {
        core::str::from_utf8_unchecked(bpf_probe_read_user_str_bytes(
            filename, buf,
        )?)
    };
    let command = ctx.command()?;
    let command = core::ffi::CStr::from_bytes_until_nul(&command)
        .map_err(|core::ffi::FromBytesUntilNulError { .. }| -1)?;
    // We have no reasonable way to log `CStr` today.
    let command = unsafe { core::str::from_utf8_unchecked(command.to_bytes()) };
    info!(
        &ctx,
        "Tracepoint sys_enter_execve called by: {}, filename: {}",
        command,
        filename
    );

    Ok(0)
}

#[cfg(not(test))]
#[panic_handler]
fn panic(_info: &core::panic::PanicInfo) -> ! {
    loop {}
}

#[unsafe(link_section = "license")]
#[unsafe(no_mangle)]
static LICENSE: [u8; 13] = *b"Dual MIT/GPL\0";

Key points in the eBPF code:

Per-CPU Buffer: We use PerCpuArray<Buf> to store the filename string, because the eBPF stack is limited to 512 bytes and cannot hold the 4096-byte buffer.
Context Reading: The pointer to the filename is read from a fixed offset in the tracepoint context (offset 16 for sys_enter_execve). The layout of the context is described in the file /sys/kernel/debug/tracing/events/syscalls/sys_enter_execve/format (also reachable under /sys/kernel/tracing/events/ when tracefs is mounted there).
Userspace Memory: The filename pointer refers to userspace memory, so we copy the string with bpf_probe_read_user_str_bytes, which returns an error instead of faulting if the address cannot be read.
Logging: We log both the command name and the filename using the info! macro.
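To make the origin of the offset concrete, here is a minimal sketch that extracts a field's offset from the text of a format file. It is plain userspace Rust, separate from the eBPF program. `SAMPLE_FORMAT` is an abridged, assumed example of what the file typically contains on x86_64 (the real file has more fields and can differ between kernels), and `field_offset` is a hypothetical helper written for this illustration.

```rust
/// Abridged, assumed sample of a `format` file for sys_enter_execve.
/// Real files contain more fields and may differ between kernels.
const SAMPLE_FORMAT: &str = "\
name: sys_enter_execve
format:
\tfield:unsigned short common_type;\toffset:0;\tsize:2;\tsigned:0;
\tfield:int common_pid;\toffset:4;\tsize:4;\tsigned:1;

\tfield:int __syscall_nr;\toffset:8;\tsize:4;\tsigned:1;
\tfield:const char * filename;\toffset:16;\tsize:8;\tsigned:0;
\tfield:const char *const * argv;\toffset:24;\tsize:8;\tsigned:0;
";

/// Returns the byte offset of `field` as declared in `format`, if present.
/// Handles simple declarations only (the name must be the last token).
fn field_offset(format: &str, field: &str) -> Option<usize> {
    for line in format.lines() {
        let mut parts = line.split(';').map(str::trim);
        // The first part looks like `field:const char * filename`.
        let Some(decl) = parts.next().and_then(|p| p.strip_prefix("field:")) else {
            continue;
        };
        // The field name is the last whitespace-separated token.
        if decl.split_whitespace().last() != Some(field) {
            continue;
        }
        // Find the `offset:N` part among the remaining parts and parse N.
        return parts
            .find_map(|p| p.strip_prefix("offset:"))
            .and_then(|n| n.trim().parse().ok());
    }
    None
}

fn main() {
    // On a real system, read the text from the format file instead (needs root).
    println!("{:?}", field_offset(SAMPLE_FORMAT, "filename"));
}
```

For the sample text, the lookup for `filename` yields `Some(16)`, matching the `FILENAME_OFFSET` constant in the eBPF code. Checking the format file on the target machine is still the safer option, since the sample is only an assumption about the layout.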

Userspace code

The userspace code loads the eBPF program and attaches it to the tracepoint:

use aya::programs::TracePoint;
use env_logger::Env;
#[rustfmt::skip]
use log::{info, debug, warn};
use tokio::signal;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    env_logger::Builder::from_env(Env::default().default_filter_or("info"))
        .init();

    // Bump the memlock rlimit. This is needed for older kernels that don't use the
    // new memcg based accounting, see https://lwn.net/Articles/837122/
    let rlim = libc::rlimit {
        rlim_cur: libc::RLIM_INFINITY,
        rlim_max: libc::RLIM_INFINITY,
    };
    let ret = unsafe { libc::setrlimit(libc::RLIMIT_MEMLOCK, &rlim) };
    if ret != 0 {
        debug!("remove limit on locked memory failed, ret is: {ret}");
    }

    // This will include your eBPF object file as raw bytes at compile-time and load it at
    // runtime. This approach is recommended for most real-world use cases. If you would
    // like to specify the eBPF program at runtime rather than at compile-time, you can
    // reach for `Ebpf::load_file` instead.
    let mut ebpf = aya::Ebpf::load(aya::include_bytes_aligned!(concat!(
        env!("OUT_DIR"),
        "/tracepoint"
    )))?;
    match aya_log::EbpfLogger::init(&mut ebpf) {
        Err(e) => {
            // This can happen if you remove all log statements from your eBPF program.
            warn!("failed to initialize eBPF logger: {e}");
        }
        Ok(logger) => {
            let mut logger = tokio::io::unix::AsyncFd::with_interest(
                logger,
                tokio::io::Interest::READABLE,
            )?;
            tokio::task::spawn(async move {
                loop {
                    let mut guard = logger.readable_mut().await.unwrap();
                    guard.get_inner_mut().flush();
                    guard.clear_ready();
                }
            });
        }
    }
    let program: &mut TracePoint =
        ebpf.program_mut("tracepoint_execve").unwrap().try_into()?;
    program.load()?;
    program.attach("syscalls", "sys_enter_execve")?;

    let ctrl_c = signal::ctrl_c();
    info!("Waiting for Ctrl-C...");
    ctrl_c.await?;
    info!("Exiting...");

    Ok(())
}

Steps in the userspace code:

Memory Limit: Remove the memlock limit for older kernels
Load Program: Load the compiled eBPF object file
Logger Setup: Initialize the eBPF logger to receive log messages
Attach Tracepoint: Attach the program to the syscalls/sys_enter_execve tracepoint
Signal Handling: Wait for Ctrl-C to exit gracefully

Running the program

$ cargo run
[INFO  tracepoint] Tracepoint sys_enter_execve called by: zsh, filename: /usr/bin/git
[INFO  tracepoint] Tracepoint sys_enter_execve called by: zsh, filename: /usr/bin/wc
[INFO  tracepoint] Tracepoint sys_enter_execve called by: zsh, filename: /usr/bin/tail
[INFO  tracepoint] Tracepoint sys_enter_execve called by: zsh, filename: /usr/bin/ls

The program now logs every execve call on the system, showing the name of the command that made the call and the path of the binary being executed.
