Cgroup SKB

Source Code

Full code for the example in this chapter is available here

What is Cgroup SKB?

Cgroup SKB programs are attached to v2 cgroups and are triggered by network traffic (egress or ingress) associated with processes inside the given cgroup. They make it possible to intercept and filter the traffic of particular cgroups, and therefore of containers.

What's the difference between Cgroup SKB and Classifiers?

Both Cgroup SKB programs and Classifiers receive the same type of context: SkBuffContext.

The difference lies in the attachment point: Classifiers are attached to a network interface, while Cgroup SKB programs are attached to a cgroup.
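One practical consequence of the different attachment points is where the packet data begins. The sketch below assumes (this is not stated above) that a classifier on an Ethernet interface sees the packet starting at the Ethernet header, while a Cgroup SKB program sees it starting at the IP header. The helper names are hypothetical and only illustrate the resulting offsets of the IPv4 destination address.

```rust
/// Length of an Ethernet header without VLAN tags (6 + 6 + 2 bytes).
const ETH_HDR_LEN: usize = 14;
/// Offset of the destination address inside an IPv4 header.
const IPV4_DADDR_OFFSET: usize = 16;

/// Hypothetical helper: offset of the IPv4 destination address as seen by a
/// classifier, assuming the packet data begins with an Ethernet header.
pub const fn daddr_offset_classifier() -> usize {
    ETH_HDR_LEN + IPV4_DADDR_OFFSET
}

/// Hypothetical helper: the same offset as seen by a Cgroup SKB program,
/// assuming the packet data begins directly with the IP header.
pub const fn daddr_offset_cgroup_skb() -> usize {
    IPV4_DADDR_OFFSET
}
```

This matches the eBPF code later in this chapter, which loads the destination address at the offset of daddr within iphdr without adding an Ethernet header length.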

Example project

This example is similar to the Classifier example: a program that drops egress traffic to selected destinations, but only for a specific cgroup.

Design

We're going to:

Create a HashMap that will act as a blocklist.
Check the destination IP address from the packet against the HashMap to make a policy decision (pass or drop).
Add entries to the blocklist from userspace.
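Before touching eBPF, the design can be modelled in plain userspace Rust. This is only a sketch: the Blocklist type and its methods are hypothetical, and std::collections::HashMap stands in for the eBPF map. The verdict values follow the convention used later in this chapter (1 passes, 0 drops).

```rust
use std::collections::HashMap;
use std::net::Ipv4Addr;

/// Verdicts as used by the example: 1 passes the packet, 0 drops it.
const PASS: i32 = 1;
const DROP: i32 = 0;

/// Hypothetical stand-in for the eBPF map: keys are IPv4 addresses as u32,
/// values are unused.
pub struct Blocklist {
    entries: HashMap<u32, u32>,
}

impl Blocklist {
    pub fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    /// Add an entry, as the userspace program would.
    pub fn block(&mut self, addr: Ipv4Addr) {
        self.entries.insert(u32::from(addr), 0);
    }

    /// Look up the destination address and make the policy decision.
    pub fn verdict(&self, destination: Ipv4Addr) -> i32 {
        if self.entries.contains_key(&u32::from(destination)) {
            DROP
        } else {
            PASS
        }
    }
}
```

Only the presence of a key matters here, which is why the value stored alongside each address is a dummy 0.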

Generating bindings to vmlinux.h

In this example, we are going to use one kernel structure called iphdr, which represents the IP protocol header. We need to generate Rust bindings to it.

First, we must make sure that bindgen is installed.

cargo install bindgen-cli

Let's use xtask to automate the process of generating bindings so we can easily reproduce it in the future by adding the following code:

xtask/src/codegen.rs
use aya_tool::generate::InputFile;
use std::{fs::File, io::Write, path::PathBuf};

pub fn generate() -> Result<(), anyhow::Error> {
    let dir = PathBuf::from("cgroup-skb-egress-ebpf/src");
    let names: Vec<&str> = vec!["iphdr"];
    let bindings = aya_tool::generate(
        InputFile::Btf(PathBuf::from("/sys/kernel/btf/vmlinux")),
        &names,
        &[],
    )?;
    // Write the bindings to cgroup-skb-egress-ebpf/src/bindings.rs.
    let mut out = File::create(dir.join("bindings.rs"))?;
    write!(out, "{bindings}")?;
    Ok(())
}

xtask/Cargo.toml
[package]
name = "xtask"
version = "0.1.0"
edition = "2021"

[dependencies]
anyhow = "1"
clap = { version = "4.1", features = ["derive"] }
aya-tool = { git = "https://github.com/aya-rs/aya" }

xtask/src/main.rs
mod build_ebpf;
mod codegen;
mod run;

use std::process::exit;

use clap::Parser;

#[derive(Debug, Parser)]
pub struct Options {
    #[clap(subcommand)]
    command: Command,
}

#[derive(Debug, Parser)]
enum Command {
    BuildEbpf(build_ebpf::Options),
    Run(run::Options),
    Codegen,
}

fn main() {
    let opts = Options::parse();

    use Command::*;
    let ret = match opts.command {
        BuildEbpf(opts) => build_ebpf::build_ebpf(opts),
        Run(opts) => run::run(opts),
        Codegen => codegen::generate(),
    };

    if let Err(e) = ret {
        eprintln!("{e:#}");
        exit(1);
    }
}

Once we've generated the file by running cargo xtask codegen from the root of the project, we can access it from the eBPF code by declaring mod bindings.
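To get a feeling for what the bindings give us, here is a simplified, hand-written approximation of the iphdr structure. It is an assumption for illustration only: the real generated file differs in detail (for example, the version and header-length bitfield is emitted through bindgen helper types), and the field name version_ihl is made up. The sketch uses std::mem::offset_of!, whereas the eBPF code in this chapter uses the memoffset crate.

```rust
use std::mem::{offset_of, size_of};

/// Simplified approximation of the generated `iphdr` binding (assumption:
/// the real bindgen output is laid out the same way but written differently).
#[repr(C)]
#[allow(non_camel_case_types)]
pub struct iphdr {
    pub version_ihl: u8, // two 4-bit fields packed into one byte
    pub tos: u8,
    pub tot_len: u16,
    pub id: u16,
    pub frag_off: u16,
    pub ttl: u8,
    pub protocol: u8,
    pub check: u16,
    pub saddr: u32,
    pub daddr: u32,
}

/// Size of an IPv4 header without options.
pub const IPHDR_LEN: usize = size_of::<iphdr>();
/// Byte offset of the destination address within the header.
pub const DADDR_OFFSET: usize = offset_of!(iphdr, daddr);
```

With this layout the destination address sits 16 bytes into a 20-byte header, which is the offset the eBPF program needs when it reads the address from the packet.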

eBPF code

The program starts with the definition of the BLOCKLIST map. To enforce the policy, the program looks up the destination IP address in that map. If an entry for that address exists, we drop the packet by returning 0. Otherwise, we accept it by returning 1.

Here's what the eBPF code looks like:

cgroup-skb-egress-ebpf/src/main.rs
#![no_std]
#![no_main]

use aya_ebpf::{
    macros::{cgroup_skb, map},
    maps::{HashMap, PerfEventArray},
    programs::SkBuffContext,
};
use memoffset::offset_of;

use cgroup_skb_egress_common::PacketLog;

#[allow(non_upper_case_globals)]
#[allow(non_snake_case)]
#[allow(non_camel_case_types)]
#[allow(dead_code)]
mod bindings;
use bindings::iphdr;

#[map]
static EVENTS: PerfEventArray<PacketLog> =
    PerfEventArray::with_max_entries(1024, 0);

#[map] // (1)
static BLOCKLIST: HashMap<u32, u32> = HashMap::with_max_entries(1024, 0);

#[cgroup_skb]
pub fn cgroup_skb_egress(ctx: SkBuffContext) -> i32 {
    match try_cgroup_skb_egress(ctx) {
        Ok(ret) => ret,
        Err(_) => 0,
    }
}

// (2)
fn block_ip(address: u32) -> bool {
    unsafe { BLOCKLIST.get(&address).is_some() }
}

fn try_cgroup_skb_egress(ctx: SkBuffContext) -> Result<i32, i64> {
    let protocol = unsafe { (*ctx.skb.skb).protocol };
    if protocol != ETH_P_IP {
        return Ok(1);
    }

    let destination = u32::from_be(ctx.load(offset_of!(iphdr, daddr))?);

    // (3)
    let action = if block_ip(destination) { 0 } else { 1 };

    let log_entry = PacketLog {
        ipv4_address: destination,
        action,
    };
    EVENTS.output(&ctx, &log_entry, 0);
    Ok(action)
}

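// ETH_P_IP is 0x0800; the protocol field holds it in network byte order,
// which reads as 8 on a little-endian host.
const ETH_P_IP: u32 = 8;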

#[panic_handler]
fn panic(_info: &core::panic::PanicInfo) -> ! {
    unsafe { core::hint::unreachable_unchecked() }
}

(1) Create our map.
(2) Check if we should allow or deny our packet.
(3) Return the correct action.
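The load of daddr followed by u32::from_be can be tried out in userspace on a plain byte slice. This is a sketch, not the eBPF API: destination_from_header and header_with_destination are hypothetical helpers, and the offset of 16 assumes an IPv4 header at the start of the data.

```rust
use std::net::Ipv4Addr;

/// Offset of `daddr` within an IPv4 header.
const DADDR_OFFSET: usize = 16;

/// Hypothetical userspace stand-in for loading a u32 at the `daddr` offset
/// and applying `u32::from_be`: reads four bytes and interprets them as a
/// big-endian (network byte order) integer.
pub fn destination_from_header(header: &[u8]) -> Option<u32> {
    let b = header.get(DADDR_OFFSET..DADDR_OFFSET + 4)?;
    Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}

/// Builds a minimal 20-byte IPv4 header with only the destination filled in.
pub fn header_with_destination(dst: Ipv4Addr) -> [u8; 20] {
    let mut header = [0u8; 20];
    header[0] = 0x45; // version 4, header length of 5 32-bit words
    header[DADDR_OFFSET..].copy_from_slice(&dst.octets());
    header
}
```

The resulting integer has the same numeric value as u32::from(Ipv4Addr), so keys inserted from userspace compare equal to the value the eBPF program looks up.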

Userspace code

The purpose of the userspace code is to load the eBPF program, attach it to the cgroup and then populate the map with an address to block.

In this example, we'll block all egress traffic going to 1.1.1.1.

Here's what the code looks like:

cgroup-skb-egress/src/main.rs
use std::net::Ipv4Addr;

use aya::{
    include_bytes_aligned,
    maps::{perf::AsyncPerfEventArray, HashMap},
    programs::{CgroupSkb, CgroupSkbAttachType},
    util::online_cpus,
    Bpf,
};
use bytes::BytesMut;
use clap::Parser;
use log::info;
use tokio::{signal, task};

use cgroup_skb_egress_common::PacketLog;

#[derive(Debug, Parser)]
struct Opt {
    #[clap(short, long, default_value = "/sys/fs/cgroup/unified")]
    cgroup_path: String,
}

#[tokio::main]
async fn main() -> Result<(), anyhow::Error> {
    let opt = Opt::parse();

    env_logger::init();

    // This will include your eBPF object file as raw bytes at compile-time and load it at
    // runtime. This approach is recommended for most real-world use cases. If you would
    // like to specify the eBPF program at runtime rather than at compile-time, you can
    // reach for `Bpf::load_file` instead.
    #[cfg(debug_assertions)]
    let mut bpf = Bpf::load(include_bytes_aligned!(
        "../../target/bpfel-unknown-none/debug/cgroup-skb-egress"
    ))?;
    #[cfg(not(debug_assertions))]
    let mut bpf = Bpf::load(include_bytes_aligned!(
        "../../target/bpfel-unknown-none/release/cgroup-skb-egress"
    ))?;
    let program: &mut CgroupSkb =
        bpf.program_mut("cgroup_skb_egress").unwrap().try_into()?;
    let cgroup = std::fs::File::open(opt.cgroup_path)?;
    // (1)
    program.load()?;
    // (2)
    program.attach(cgroup, CgroupSkbAttachType::Egress)?;

    let mut blocklist: HashMap<_, u32, u32> =
        HashMap::try_from(bpf.map_mut("BLOCKLIST").unwrap())?;

    let block_addr: u32 = Ipv4Addr::new(1, 1, 1, 1).into();

    // (3)
    blocklist.insert(block_addr, 0, 0)?;

    let mut perf_array =
        AsyncPerfEventArray::try_from(bpf.take_map("EVENTS").unwrap())?;

    for cpu_id in online_cpus()? {
        let mut buf = perf_array.open(cpu_id, None)?;

        task::spawn(async move {
            let mut buffers = (0..10)
                .map(|_| BytesMut::with_capacity(1024))
                .collect::<Vec<_>>();

            loop {
                let events = buf.read_events(&mut buffers).await.unwrap();
                for buf in buffers.iter_mut().take(events.read) {
                    let ptr = buf.as_ptr() as *const PacketLog;
                    let data = unsafe { ptr.read_unaligned() };
                    let dst_addr = Ipv4Addr::from(data.ipv4_address);
                    info!("LOG: DST {}, ACTION {}", dst_addr, data.action);
                }
            }
        });
    }

    info!("Waiting for Ctrl-C...");
    signal::ctrl_c().await?;
    info!("Exiting...");

    Ok(())
}

(1) Loading the eBPF program.
(2) Attaching it to the given cgroup.
(3) Populating the map with the remote IP addresses to which we want to block egress traffic.

The third step is done by getting a reference to the BLOCKLIST map and calling blocklist.insert. Rust's Ipv4Addr type lets us write the IP address in its human-readable form and convert it to u32, which is an appropriate key type for eBPF maps.
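The conversion between the human-readable form and the u32 key can be sketched with the standard library alone. The helper names blocklist_key and key_to_addr are hypothetical; only the Ipv4Addr conversions themselves come from std.

```rust
use std::net::{AddrParseError, Ipv4Addr};

/// Parses dotted-decimal text into the u32 key format used by the blocklist.
pub fn blocklist_key(text: &str) -> Result<u32, AddrParseError> {
    let addr: Ipv4Addr = text.parse()?;
    Ok(u32::from(addr))
}

/// Converts a key back into a printable address.
pub fn key_to_addr(key: u32) -> Ipv4Addr {
    Ipv4Addr::from(key)
}
```

A helper like this could be used to fill the blocklist from command-line arguments or a configuration file instead of a hard-coded address.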

Testing the program

First, check where cgroup v2 is mounted:

$ mount | grep cgroup2
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot)

The most common locations are either /sys/fs/cgroup or /sys/fs/cgroup/unified.

Inside that location, we need to create our new cgroup (as root):

# mkdir /sys/fs/cgroup/foo

Then run the program with:

RUST_LOG=info cargo xtask run

And then, in a separate terminal, as root, try to access 1.1.1.1:

# bash -c "echo \$$ >> /sys/fs/cgroup/foo/cgroup.procs && curl 1.1.1.1"

That command should hang and the logs of our program should look like:

LOG: DST 1.1.1.1, ACTION 0
LOG: DST 1.1.1.1, ACTION 0

On the other hand, accessing any other address should be successful, for example:

# bash -c "echo \$$ >> /sys/fs/cgroup/foo/cgroup.procs && curl google.com"
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>

And should result in the following logs:

LOG: DST 192.168.88.10, ACTION 1
LOG: DST 192.168.88.10, ACTION 1
LOG: DST 172.217.19.78, ACTION 1
LOG: DST 172.217.19.78, ACTION 1
LOG: DST 172.217.19.78, ACTION 1
LOG: DST 172.217.19.78, ACTION 1
LOG: DST 172.217.19.78, ACTION 1
LOG: DST 172.217.19.78, ACTION 1
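Each of these log lines is rendered from one PacketLog record read from the perf buffer. The common crate is not shown in this chapter, so the struct below is an assumed shape inferred from how the fields are used (a u32 address and an i32 action). The decode and format_log helpers are hypothetical, and decode adds a length check that the example loop does not perform.

```rust
use std::net::Ipv4Addr;

/// Assumed shape of `PacketLog`: two plain fields with a C layout, so the
/// eBPF program and userspace agree on the byte representation.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct PacketLog {
    pub ipv4_address: u32,
    pub action: i32,
}

/// Decodes one record from a raw buffer using `read_unaligned`, after
/// checking that the buffer is long enough.
pub fn decode(buf: &[u8]) -> Option<PacketLog> {
    if buf.len() < std::mem::size_of::<PacketLog>() {
        return None;
    }
    // SAFETY: the buffer holds at least size_of::<PacketLog>() bytes and
    // PacketLog consists only of integers, so any bit pattern is valid.
    Some(unsafe { (buf.as_ptr() as *const PacketLog).read_unaligned() })
}

/// Formats a record in the same style as the log lines of the example.
pub fn format_log(entry: &PacketLog) -> String {
    format!(
        "LOG: DST {}, ACTION {}",
        Ipv4Addr::from(entry.ipv4_address),
        entry.action
    )
}
```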