- Original post can be found at NetBSD Blog
For GSoC 2018, I’m working on the Kernel Undefined Behavior Sanitizer (KUBSan) project, which integrates Undefined Behavior regression testing into the amd64 kernel.
This article summarizes what has been done up to this point (Phase 1 Evaluation), future goals and a brief introduction to Undefined Behavior.
First things first, let’s get started.
The mailing list project presentation
For Turing-complete languages we cannot reliably decide offline whether a program has the potential to execute an error; we have to run it and see.
Undefined Behavior in C is, roughly, whatever the C standard imposes no requirements on. Code containing Undefined Behavior can look perfectly valid: it compiles, the compiler doesn’t moan about it, and yet it causes real trouble. In programming terms, once it is triggered the program may do anything at all, which results in run-time bugs that are hard to locate.
The C FAQ defines “Undefined Behavior” like this:
Anything at all can happen; the Standard imposes no requirements. The program may fail to compile, or it may execute incorrectly (either crashing or silently generating incorrect results), or it may fortuitously do exactly what the programmer intended.
- A great blog post explaining more than mere mortals might need
The important and scary thing to realize is that just about any optimization based on undefined behavior can start being triggered on buggy code at any time in the future. Inlining, loop unrolling, memory promotion and other optimizations will keep getting better, and a significant part of their reason for existing is to expose secondary optimizations like the ones above.
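To make this concrete, here is a minimal sketch built around signed integer overflow, one of the best-known cases of Undefined Behavior. A test such as `x + 1 > x` on a signed `int` may legally be folded to "always true" by an optimizer, because overflow is assumed never to happen. The helper names below (`add_would_overflow`, `checked_add`) are mine, purely for illustration; they check the operands before adding, so no overflowing addition is ever performed.

```c
#include <limits.h>

/*
 * Returns 1 if a + b would overflow an int, 0 otherwise.
 * Only comparisons and non-overflowing subtractions are used.
 */
int add_would_overflow(int a, int b)
{
	if (b > 0 && a > INT_MAX - b)
		return 1;	/* result would exceed INT_MAX */
	if (b < 0 && a < INT_MIN - b)
		return 1;	/* result would fall below INT_MIN */
	return 0;
}

/*
 * Stores a + b in *sum and returns 0 when the addition is well defined.
 * Returns -1 and leaves *sum untouched when it would overflow.
 */
int checked_add(int a, int b, int *sum)
{
	if (add_would_overflow(a, b))
		return -1;
	*sum = a + b;
	return 0;
}
```

Calling `checked_add(INT_MAX, 1, &s)` reports the problem instead of silently entering undefined territory, which is exactly the kind of situation a sanitizer flags at run time.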
What we can do to find undefined behavior errors in our code is use a Sanitizer.
Fortunately, both Clang and GCC have taken care of such “dream” tools, covering the majority of undefined behavior cases in a meaningful manner.
They allow us to pass the -fsanitize=undefined option when we build our code, and the instrumented program then “spits out” simple warnings at run time for us to see.
- The Clang supported flags (the same as GCC’s, whose docs are less extensive).
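As a rough illustration of how the option is used, consider a left shift. The file name `demo.c` and the helper names are assumptions of mine, and `cc` stands for whichever of the two compilers is installed. When built with the sanitizer, a call to `shift_left` with an invalid shift amount should produce a run-time diagnostic instead of passing silently.

```c
/*
 * demo.c (hypothetical file name).
 * Build with the sanitizer enabled, for example:
 *   cc -fsanitize=undefined -c demo.c
 */
#include <limits.h>

/*
 * Plain left shift. Undefined if value is negative, if amount is
 * negative or not smaller than the width of int, or if the result
 * does not fit in an int.
 */
int shift_left(int value, int amount)
{
	return value << amount;
}

/* Returns 1 if shift_left(value, amount) is well defined, 0 otherwise. */
int shift_is_defined(int value, int amount)
{
	const int width = (int)(sizeof(int) * CHAR_BIT);

	if (value < 0 || amount < 0 || amount >= width)
		return 0;
	/* The result must still be representable as an int. */
	return value <= (INT_MAX >> amount);
}
```

The predicate spells out the rule the sanitizer enforces for us automatically; in instrumented code we would simply call `shift_left` and read the report.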
My first deliverable for the integration of KUBSan was a set of ATF tests. The concept was to include tests in which simple C programs exhibit Undefined Behavior, such as overflows, erroneous shifting and out-of-bounds array accesses (VLAs actually).
The ATF framework is not a real “sweetheart” to learn, so this preliminary step took me longer than expected. The good news was that I had enough time to understand Undefined Behavior in reasonable depth and to research ideas extensively.
- The initial commit of the tests cleaned up and submitted by my mentor Kamil Rytarowski.
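One of the cases mentioned above, an out-of-bounds access on a VLA, might look roughly like the sketch below. This is not the code from the committed tests; the function names are hypothetical and the ATF scaffolding is left out. Whether a particular access is reported depends on the compiler and on which checks -fsanitize=undefined enables.

```c
#include <stddef.h>

/*
 * Fills a VLA of n ints with 0..n-1 and returns the element at idx.
 * Undefined if n == 0 (a VLA must have a positive size) or if
 * idx >= n (read past the end of the array).
 */
int vla_element(size_t n, size_t idx)
{
	int values[n];

	for (size_t i = 0; i < n; i++)
		values[i] = (int)i;
	return values[idx];
}

/* Returns 1 if vla_element(n, idx) stays within defined behavior. */
int vla_index_ok(size_t n, size_t idx)
{
	return n > 0 && idx < n;
}
```

A test program would call `vla_element` with an index for which `vla_index_ok` returns 0 and then check that the sanitizer reported it.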
Next on our roadmap was understanding NetBSD’s loadable kernel modules. For this, I created a kernel module that reads a string from a device named /dev/panic and calls the kernel’s panic(9) with it as the argument, after syncing the system. This took a long time, but in the process I had the privilege of reading
- FreeBSD Device Drivers: A Guide for the Intrepid, which is the only book that closely resembles our kernel module infrastructure.
- The panic_string module commit revised, corrected and uploaded by Kamil.
We compiled the kernel with the proper option to catch UB bugs and got one, which was reported to the tech-kern mailing list in this Thread.
Finally, our last deliverable for GSoC’s first evaluation was getting the amd64 kernel to boot with the KUBSan option enabled.
This was tricky. We needed the appropriate dummy functions, so that the kernel build could resolve them as symbols at link time.
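The dummy functions could look roughly like the sketch below. The `__ubsan_handle_*` names are the entry points that Clang and GCC emit calls to when code is built with -fsanitize=undefined; only three of them are shown. The parameter types are simplified to `void *`, and the counter and the `kubsan_stub_*` helpers are hypothetical additions of mine so that the stubs do something observable. This is not the code from my fork.

```c
/* Hypothetical counter: how many reports the stubs have swallowed. */
static unsigned long kubsan_stub_hits;

/* Shared body: ignore the report details and just count the call. */
static void kubsan_stub_report(void *data)
{
	(void)data;
	kubsan_stub_hits++;
}

/* Called by instrumented code on signed addition overflow. */
void __ubsan_handle_add_overflow(void *data, void *lhs, void *rhs)
{
	(void)lhs;
	(void)rhs;
	kubsan_stub_report(data);
}

/* Called by instrumented code on an invalid shift. */
void __ubsan_handle_shift_out_of_bounds(void *data, void *lhs, void *rhs)
{
	(void)lhs;
	(void)rhs;
	kubsan_stub_report(data);
}

/* Called by instrumented code on an out-of-bounds array index. */
void __ubsan_handle_out_of_bounds(void *data, void *index)
{
	(void)index;
	kubsan_stub_report(data);
}

/* Hypothetical accessor for the counter. */
unsigned long kubsan_stub_count(void)
{
	return kubsan_stub_hits;
}
```

With stubs like these present, the linker can resolve every handler the instrumented objects refer to. A full implementation would decode `data` and print a meaningful report instead of just counting.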
At first I created KUBSan as a loadable kernel module, but the chaotic structure of our codebase was too much for me. I spent 4 whole days searching for a way to link the exported symbols into the kernel build and was unsuccessful :(
But everything happens for a reason: that one failure prompted me to search for all the available UBSan implementations, and I was able to locate the initial support of the KUBSan functionality for:
This, in turn, made me realise that the module was not necessary, since I could include the KUBSan functionality in our /sys infrastructure. I did so, it worked, and it allowed me to boot and run a fully KUBSan-ed kernel.
It hasn’t been uploaded to upstream yet, but you can have a look at my local (and totally messy) fork.
This first month of GSoC has been a great experience. Last year I also participated, with a project trying to “revamp” support for Scheme R7RS in the Eclipse IDE (we later tried to create a Kawa-Scheme Language Server-LSP, but that’s a sad story), and my experience was not the best (I had to quit mid-July).
This time the collaboration has been much friendlier, more cooperative and more productive.
I’m incredibly happy about that.
A brief summary: the kernel booted with KUBSan, and I now know all the tools needed to extend that functionality.
That’s all ye need to know up to this point.
Future goals include:
- Making a full implementation of KUBSan, aiming to surpass other existing implementations,
- Clearing up any license issues,
- Finishing the amd64 implementation and switching focus to i386,
- Spreading the NetBSD hype
Finally, I would like to thank my mentors Kamil and Christos for their advice and help with the project, but most of all for their incredible attitude towards the problems I went through this past month.
Much love :)