February 18, 2005

I’ve been thinking about static analysis for finding bugs off and on for the past 18 months or so; recently, I’ve been looking for a good open source static analysis tool. Unless I’ve managed to miss it, ISTM there isn’t one.

Uno is the closest I’ve found, but it is pretty unpolished, and I don’t believe it is DFSG free.

sparse, Linus’ checker, may or may not be cool; I’ve tried to see what it’s capable of, but wasn’t able to make it catch anything more significant than minor stylistic errors in C code (e.g. extern function definitions, 0 used in a pointer context (rather than NULL), that sort of thing). (Side note: sparse doesn’t even have a website, and it’s primarily available via bk. Does Linus not want people to use his software?) I’ll definitely take a closer look at this, anyway.

There are some more academic tools — like Uno only even less practical). There’s also Splint, but last I tried it, it emitted way too many bogus error reports, and required tons of annotations to be of any use.

Some random thoughts about the design of an open source static analysis tool:

  • A tool that hides a handful of legitimate error reports within thousands of bogus ones is essentially useless. Given the choice, it is better to miss a few problems than to warn the user about everything that might be bogus — false positives are bad.
    • A reasonable substitute would be some effective means of sorting error reports by their likelyhood of legitimacy; if the tool generates thousands of bogus errors but places the legitimate errors at the top of the list, I’d be willing to live with it.
  • It ought to be easy to check an arbitrary base of code. That means understanding C99 plus all the GNU extensions, and providing an easy way to run the checker on some source (while getting header #includes right, running the necessary tools to generate derived files, and so on). Perhaps the easiest way to do that is have the checker mimick the standard $CC command-line arguments; then the user could run the checker via make CC=checker.
    • This also means no annotations. They are ugly, they tie the source into one specific analysis tool, and they are labour intensive; the whole point is to find bugs with the minimum of labour by the programmer.
  • It ought to be possible to write user-defined extensions to the checker, to check domain-specific properties of the source. I’ve got no problem with annotations in this context — that’s a sensible way to inform your checker extension about domain-specific properties.
  • The theory behind Dawson Engler’s work on MC is a good place to start; it is more or less the start of the art, AFAIK. Unfortunately the tool they developed was never released publicly (from what I’ve heard it was somewhat of a kludge anyway, implementation-wise), and Engler’s now commercialized the research at Coverity.
  • ckit might be worth using. Countless people have implemented C compiler frontends in the past, so it would be nice to avoid needing to reinvent that particular wheel.

Speaking of tools for finding bugs, I’ve got to find some time to make valgrind understand region-based memory allocation.


Leave a comment

Filed under Advogato, PostgreSQL

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s