There is an apocryphal story in math circles about a graduate student who writes a Ph.D. thesis in analysis. He has discovered an interesting new class of functions for which he is able to prove many very strong theorems. The exact class of functions seems to be lost to the mists of time. During his thesis defense, somebody asks him for an example of such a function; after much deliberation, it turns out the only such functions are boring old constants.
Presumably the defense does not end well.
True or not, the story does induce a certain amount of queasiness for anybody working in a theoretical field — or, like me, working on concrete tools built upon lots of theory.
Static analysis: trivial or impossible?
I came to GrammaTech along a nonstandard trajectory. My Ph.D. is in pure mathematics, with no whiff of a connection to program analysis. I studied geometric transformations of partial differential equations with Dr. Hubbard at Cornell. Hubbard is well-known for asking questions which cut away the cruft and get right to the heart of the matter (much to the horror of many a first-year graduate student attempting to give a presentation).
Our CEO got to experience this after my interview at GrammaTech, when he decided to call Hubbard as a reference. Hubbard didn’t know what I was applying for ahead of time, so he was a bit surprised:
Hubbard: So Matt is applying for a software engineering job? Software engineering doesn’t seem very interesting – after all, isn’t software development a completely trivial problem?
CEO: Well, no! There are many interesting problems to solve! For example, we write software to analyze other programs for bugs. That is…
Hubbard: What?! No. I don’t believe it for a second. By the halting problem, everybody knows that is completely impossible!
Mission… impossible?
So who is right? Is nontrivial static analysis really impossible?
In a sense, perfect static analysis really is impossible. This is codified in Rice’s Theorem, which says that for just about any interesting property of a program’s behavior you can think of, there is no algorithm which decides whether or not programs have that property. The usual proof is by reduction from the halting problem, but a simple diagonalization trick gives the right intuition.
For example, suppose we want to prove that a program will never crash by dereferencing NULL when given a certain input. Any static analysis company worth its salt would love to sell you a function never_dereferences_NULL(f, x) that can tell you whether or not running the program f on the input x will result in a null pointer dereference.
All would be well and good until some wise guy inevitably comes along with this program:
def tricky(f):
    if never_dereferences_NULL(f, f):
        x = NULL;
        y = *x; // oops, I dereferenced null!
    return
Mr. Wise Guy sits down and asks our precious tool if his program has a null pointer dereference when run on its own source code:
<wiseguy@miskatonic ~> never_dereferences_NULL(tricky, tricky)
What answer should he expect?
- If it returns True, then the tool is claiming that computing tricky(tricky) will not cause a null pointer dereference. But then the if-condition in tricky(tricky) is satisfied, its body gets executed, and a null pointer dereference happens after all. Sounds like a contradiction.
- If it returns False, then the tool is claiming that computing tricky(tricky) will cause a null pointer dereference. But in that case the if-branch is skipped, so all tricky(tricky) ever does is call out to never_dereferences_NULL, which we have been told is 100% accurate (and presumably bug-free), and then return. No null pointer dereference ever happens — a contradiction again: either the tool is lying, or it is buggy! (The little sketch after this list spells out the flip.)
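To see the flip mechanically, here is a small, hypothetical sketch in plain Python that plays out both cases. The function tricky_behavior is something I am introducing purely for illustration, not part of any real tool: it just records what tricky(tricky) would actually do given each possible answer from the oracle.

# Hypothetical demonstration (not a real analysis): whatever answer the
# oracle gives about tricky(tricky), the program's actual behavior is the opposite.
def tricky_behavior(oracle_says_safe):
    """Return True if tricky(tricky) actually dereferences null, given the oracle's claim."""
    if oracle_says_safe:   # oracle claims "no null dereference"...
        return True        # ...so tricky takes the branch and dereferences null
    return False           # oracle claims "null dereference"; tricky skips the branch

for claim in (True, False):
    actually_safe = not tricky_behavior(claim)
    verdict = "right" if claim == actually_safe else "wrong"
    print("oracle says safe =", claim, "-> oracle is", verdict)
# Both lines report "wrong": no answer the oracle gives can be correct.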
All of this demonstrates that there is no perfect never_dereferences_NULL tool out there waiting to be discovered by static analysis companies. So is Hubbard vindicated? Is GrammaTech about to vanish in a puff of mathematical paradox?
Open the escape hatch!
Luckily, there is an escape hatch built into the contradiction at the heart of Rice’s theorem.
Let us first take a step back and think about how we got into this mess: we posited the existence of a perfect tool which always correctly told us whether or not a program contained a null pointer dereference. That got us in trouble with the analysis of Mr. Wise Guy’s tricky(tricky) program.
But do we really need our tool to be able to analyze every conceivable program, even those monstrous eldritch things which would require more storage than there are atoms in the universe? What if we open the escape hatch and allow for the possibility that the tool may — sometimes, perhaps very infrequently — not always give the correct answer?
Once we weaken the requirement that the tool always be correct, the contradiction evaporates. There doesn’t seem to be anything stopping us from writing an approximation to never_dereferences_NULL which is as accurate as you please, even though we know it can never be perfect.
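To make “imperfect but useful” concrete, here is a minimal, hypothetical sketch of what opening the escape hatch can look like: a checker that is allowed to answer “maybe” instead of committing to True or False. The string-matching heuristics below are stand-ins invented for this example — a real analysis is vastly more sophisticated — but the shape of the answer is the point.

# A deliberately imperfect checker: rather than promising a perfect yes/no
# answer, it may give up and say "maybe".  The diagonal argument needs a
# definite answer to turn against us, so "maybe" is how we dodge it.
def might_dereference_null(source):
    """Return 'no', 'yes', or 'maybe' for the given program text (toy heuristics)."""
    if "NULL" not in source and "*" not in source:
        return "no"       # nothing pointer-like in sight: provably safe
    if "x = NULL" in source and "*x" in source:
        return "yes"      # an obviously unsafe pattern
    return "maybe"        # refuse to guess rather than be provably wrong

print(might_dereference_null("return 42"))           # 'no'
print(might_dereference_null("x = NULL; y = *x;"))   # 'yes'
print(might_dereference_null("y = *p;"))             # 'maybe'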
Granting that any static analysis tool must occasionally go wrong, we can classify the sort of mistakes it might make:
- False positive: Tool says there is a problem, but there really is not.
- False negative: Tool says there is not a problem, but there really is.
High-quality static analysis tools minimize both kinds of errors. Too many false positives mean that real bugs get lost among erroneous reports. The “too many false negatives” scenario is harder to spot, but it is just as bad — it means that too many real bugs are silently missed.
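One common way to summarize the two failure modes is with precision (how many reports are real) and recall (how many real bugs get reported). The functions and numbers below are a made-up illustration of the bookkeeping, not measurements of any actual tool.

# Hypothetical bookkeeping for a batch of analysis results (illustrative only).
def precision(true_pos, false_pos):
    """Fraction of reported warnings that are real bugs (dragged down by false positives)."""
    return true_pos / (true_pos + false_pos)

def recall(true_pos, false_neg):
    """Fraction of real bugs that were reported (dragged down by false negatives)."""
    return true_pos / (true_pos + false_neg)

# Made-up example: 40 real bugs reported, 10 spurious warnings, 8 real bugs missed.
print(precision(40, 10))   # 0.8   -- 80% of the warnings are worth reading
print(recall(40, 8))       # ~0.83 -- about 83% of the real bugs were caught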
Luckily, there doesn’t seem to be any sort of limit to how precise a static analysis tool is allowed to get. By designing clever new techniques, static analysis tools can continue to find more hidden bugs (reducing false negatives) and simultaneously reduce the noise of false positives. We know (with proof!) that we can never achieve perfection, but we can get close. Call it the wabi-sabi of static analysis.