Oops! I Fixed the Linux Kernel

Linux development

When Linux crashes, users don’t get a Blue Screen like they do on Windows. Instead, Linux generates an “oops” — a crash signature that can help developers to figure out what went wrong.

The feature may have a silly name, but it’s increasingly serious business.

Keeping track of the “oopses” is the duty of the Kerneloops.org project, and according to supporters, its efforts have improved kernel quality and fixed a large number of bugs — a thrust that’s critical for Linux as it angles for even greater adoption in the enterprise and elsewhere.

“Linux calls it ‘oops,’ but it’s basically equivalent to a Windows ‘Blue Screen,’” Arjan van de Ven, of Intel’s Open Source Technology Center, told InternetNews.com. “It’s kind of the same thing in terms of what causes it and what it does, except we don’t make it blue; we just print the message.”

Van de Ven runs the Kerneloops.org project site himself, although the collection mechanisms of oops detection and reporting are mostly automated. Kerneloops chiefly collects oops records from a client installation that is available to Fedora, OpenSUSE and Debian users.

Such features are growing ever more useful as the market for Linux grows and the OS continues finding its way into the hands of non-technical business and consumer users. Red Hat’s Fedora Linux includes the oops client by default, for instance.

“A lot of people don’t know how to detect a problem, what to send, or where to send it to,” Van de Ven explained. “If you can just click a button, it’s so much easier and people are more inclined to do it.”

Fedora Project leader Paul Frields told InternetNews.com that the Kerneloops package automatically delivers the messages the kernel dumps into a repository the kernel maintainers can use to prioritize, diagnose and fix problems.

“Fedora is involved because we track the kernel very aggressively,” Frields said. “The Kerneloops capability also supports our dedication to a healthy cooperation with upstream software providers like the kernel developer community. It leverages the widespread use of Fedora for the direct benefit of that community, who can see measurable results of their work and shift resources as needed to target frequent or important issues.”

Kerneloops also collects records from the Linux Kernel Mailing List (LKML) — the key technical discussion list for kernel bugs and design. The project also sends out a list of the top problems to the LKML on a periodic basis.

Van de Ven noted that once enough data accumulates about a specific problem, kernel developers tend to go after it and fix it.

“In general, kernel developers are open to Kerneloops, since the more reports I have, the more data they have,” Van de Ven said. “If you have one report, it could just be a fluke, but if you’ve got 500 reports of the same pattern, you know it’s a real bug.”
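The counting Van de Ven describes can be illustrated with a minimal sketch. It assumes each report has already been reduced to a short signature string; the signature names, the `rank_oopses` function and the `min_reports` threshold are hypothetical and are not taken from the Kerneloops code.

```python
from collections import Counter


def rank_oopses(signatures, min_reports=2):
    """Count reports per signature and return the repeated ones, most frequent first.

    `signatures` is an iterable of strings, one per submitted report.
    Signatures seen fewer than `min_reports` times are treated as possible
    flukes and left out of the ranking.
    """
    counts = Counter(signatures)
    return [(sig, n) for sig, n in counts.most_common() if n >= min_reports]


# Hypothetical sample data: one signature string per report.
reports = [
    "usb_disconnect",
    "usb_disconnect",
    "ext3_write",
    "usb_disconnect",
    "iwl_scan",
]

for signature, count in rank_oopses(reports):
    print(f"{count:4d}  {signature}")
```

A real collector would have to derive the signature from the crash text itself; that step is left out here.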

As a result, Van de Ven sees Linux developers fixing bugs thanks to those reports — thereby making an impact on overall kernel quality. The exact numbers are difficult to quantify, however, as the number of reports that Kerneloops.org gets on any given kernel release varies, as does the occurrence of repeating oops reports.

“We are fixing the bugs that have a lot of people hitting them,” Van de Ven said. “If you look at the number of unique bugs, the numbers can be confusing. For the 2.6.25 kernel, there were 1,300 bugs, only half of which only happened once. We do fix a lot of bugs, but if you look at what we fix, it’s the ones that actually matter.”

On the current 2.6.27 Linux kernel release candidate, Van de Ven is already seeing some early trends on the top oopses. The big one right now involves a problem when a USB drive is removed while still in use, he said — a condition currently responsible for up to five of the top 20 oopses.

“At this point, it’s the hottest bug,” he said. “In a few months from now, it might be something different.”

The effort to improve overall Linux kernel quality has increasingly found itself in the limelight. Recent efforts by the Linux Foundation aimed to simplify the task of contributing to the kernel in the hopes of improving the quality of driver code from vendors, among other things.

Still, not everyone is in agreement on how best to avoid buggy code: whether effort is best spent on ensuring quality code during development or in fixing bugs afterward.

“What Kerneloops is doing is wonderful in helping the upstream kernel developers track what is going wrong and what needs to be fixed,” Greg Kroah-Hartman, a Novell fellow in the company’s Open Platform Solutions, told InternetNews.com. “Of course, more users testing the nightly kernel releases of the main kernel.org tree is very helpful in finding problems that happen during the development cycle. But that being said, the Linux kernel has a very good record of quality and stability over time.”

Van de Ven, meanwhile, said he’d prefer to see more people working on fixing bugs.

At the same time, the Linux Foundation takes a wider view — looking at why the bugs occur in the first place.

“It’s important to track bugs and get visibility into what is causing them,” Amanda McPherson, vice president of marketing and developer programs at the Linux Foundation, told InternetNews.com. “But it’s also just one arrow in our quality-improvement quiver.”

McPherson argued that preventing buggy code from finding its way into Linux must also be a key initiative for kernel development. She noted that much of the focus on quality, particularly in efforts by kernel maintainers Andrew Morton and Linus Torvalds, has centered on keeping bugs from entering in the first place.

She also said the Foundation is looking at other ways of encouraging better-quality code.

“Our technical advisory board and Andrew Morton recently started tracking the ‘Reported-by’ tag in Git,” she said, referring to the kernel’s version control system. “We plan on publishing a paper in the future that ranks the greatest bug finders and fixers by name.”

“So just like we recently wrote a ‘who writes the Linux kernel’ paper that tracked contributions, we’ll also have a paper that calls out great achievements in quality improvements.”
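As a rough illustration of how such a tally might be produced, the sketch below counts “Reported-by:” lines in a list of commit messages. It assumes the messages have already been extracted from the repository as plain strings; the `count_reporters` function and the sample names are hypothetical and do not represent the Foundation’s actual tooling.

```python
import re
from collections import Counter

# Matches a "Reported-by:" line anywhere in a commit message
# and captures the name/address that follows it.
REPORTED_BY = re.compile(r"^Reported-by:[ \t]*(.+?)[ \t]*$", re.MULTILINE)


def count_reporters(commit_messages):
    """Return a Counter mapping each reporter to the number of commits crediting them."""
    counts = Counter()
    for message in commit_messages:
        for reporter in REPORTED_BY.findall(message):
            counts[reporter] += 1
    return counts


# Hypothetical commit messages, used for illustration only.
messages = [
    "usb: fix disconnect race\n\n"
    "Reported-by: Alice Example <alice@example.com>\n"
    "Signed-off-by: Bob Example <bob@example.com>\n",
    "net: fix scan crash\n\n"
    "Reported-by: Alice Example <alice@example.com>\n",
    "docs: typo fix\n",
]

for reporter, n in count_reporters(messages).most_common():
    print(f"{n:3d}  {reporter}")
```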
