Suchakra: Analyzing KVM Hypercalls with eBPF Tracing

I still visit my research lab quite often. It’s always nice to be in the zone where boundaries of knowledge are being pushed further and excitement is in the air. I like this ritual as this is a place where one can discuss linux kernel code and philosophy of life all in a single sentence while we quietly sip our hipster coffee cups.

In the lab, Abder is working these days on some cool hypercall stuff. The exact details of his work are quite complex, but he calls his tool hypertrace. I think I am sold on the name already. Hypercalls are like syscalls, but instead of such calls going to a kernel, they are made from guest to the hypervisor with arguments and an identifier value (just like syscalls). The hypervisor then brokers the privileged operation for the guest. Pretty neat. Here is some kernel documentation on hypercalls.

So Abder steered one recent discussion towards internals of KVM and wanted to know the latency caused by a hypercall he was introducing for his experiments (as I said he is introducing new hypercall with an id – for example 42). His analysis usecase was quite specific – he wanted to trace the kvm_exit -> kvm_hypercall -> kvm_entry sequence to know the exact latency he was causing for a given duration. In addition the tracing overhead needs to be minimum and is for a short duration only. This is somewhat trivial. These 3 tracepoints are there in the kernel already and he could latch on to them. Essentially, he needs to look for exit_reason argument of the kvm_exit tracepoint and it should be a VMCALL (18), which would denote that a hypercall is coming up next. Then he could look at the next kvm_exit event and find the time-delta to get the latency. Even though it is possible by traditional tracing such as LTTng and Ftrace to record events, Abder was only interested in his specific hypercall (nr = 42) along with the kvm_exit that happened before (with exit_reason = 18) and kvm_entry after that. This is not straightforward. It’s not possible to do such specific tracing with traditional tracers at a high speed and low overhead. This means the selection of events should not just be a simple filter, but should be stateful. Just when Abder was about to embark on a journey of kprobes and kernel modules, I once more got the opportunity of being Morpheus and said “What if I told you…

The rest is history. (dramatic pause)

Jokes apart, here is a small demo of eBPF/BCC script that allows us to hook onto the 3 aforementioned tracepoints in the Linux kernel and conditionally record the trace events:

from __future__ import print_function
from bcc import BPF

# load BPF program
b = BPF(text="""
#define EXIT_REASON 18
BPF_HASH(start, u8, u8);

TRACEPOINT_PROBE(kvm, kvm_exit) {
    u8 e = EXIT_REASON;
    u8 one = 1;
    if (args->exit_reason == EXIT_REASON) {
        bpf_trace_printk("KVM_EXIT exit_reason : %dn", args->exit_reason);
        start.update(&e, &one);
    }
    return 0;
}

TRACEPOINT_PROBE(kvm, kvm_entry) {
    u8 e = EXIT_REASON;
    u8 zero = 0;
    u8 *s = start.lookup(&e);
    if (s != NULL && *s == 1) {
        bpf_trace_printk("KVM_ENTRY vcpu_id : %un", args->vcpu_id);
        start.update(&e, &zero);
    }
    return 0;
}

TRACEPOINT_PROBE(kvm, kvm_hypercall) {
    u8 e = EXIT_REASON;
    u8 zero = 0;
    u8 *s = start.lookup(&e);
    if (s != NULL && *s == 1) {
        bpf_trace_printk("HYPERCALL nr : %dn", args->nr);
    }
    return 0;
};
""")

# header
print("%-18s %-16s %-6s %s" % ("TIME(s)", "COMM", "PID", "EVENT"))

# format output
while 1:
    try:
        (task, pid, cpu, flags, ts, msg) = b.trace_fields()
    except ValueError:
        continue

    print("%-18.9f %-16s %-6d %s" % (ts, task, pid, msg))

The TRACEPOINT_PROBE() interface in BCC allows us to use static tracepoints in the kernel. For example, whenever a kvm_exit occurs in the kernel, the first probe is executed and it records the event if the exit reason was VMCALL. At the same time it updates a BPF hash map, which basically acts like a flag here for other events. I recommend you to check out Lesson 12 from the BCC Python Developer Tutorial if this seems interesting to you 🙂 In addition, perhaps the reference guide lists the most important C and Python APIs for BCC.

To test the above example out, we can introduce our own hypercall in the VM using this small test program :

#define do_hypercall(nr, p1, p2, p3, p4) 
__asm__ __volatile__(".byte 0x0F,0x01,0xC1n"::"a"(nr), 
    "b"(p1), 
    "c"(p2), 
    "d"(p3), 
    "S"(p4))

void main()
{
    do_hypercall(42, 0, 0, 0, 0);
}

While the BPF program is running, if we do a hypercall, we get the following output :

TIME(s)            COMM             PID    EVENT
1795.472072000     CPU 0/KVM        7288   KVM_EXIT exit_reason : 18
1795.472112000     CPU 0/KVM        7288   HYPERCALL nr : 42
1795.472119000     CPU 0/KVM        7288   KVM_ENTRY vcpu_id : 0

So we see how in a few minutes we could precisely gather only those events that were of interest to us – saving us from the hassle of setting up other traces or kernel modules. eBPF/BCC based analysis allowed us to conditionally trace only a certain subsection of events instead of the huge flow of events that we would have had to analyze offline. KVM  internals are like a dark dungeon and I feel as if I am embarking on a quest here. There are a lot more upcoming KVM related analysis we are doing with eBPF/BCC. Stay tuned for updates! If you find any more interesting usecases for eBPF in the meantime, let me know. I would love to try them out! As always, comments, corrections and criticisms are welcome.


Source From: fedoraplanet.org.
Original article title: Suchakra: Analyzing KVM Hypercalls with eBPF Tracing.
This full article can be read at: Suchakra: Analyzing KVM Hypercalls with eBPF Tracing.

Advertisement


Random Article You May Like

Leave a Reply

Your email address will not be published. Required fields are marked *

*
*