This is the fourth in a series of posts about adding language bindings for the CodeSonar API.
Example #4
Create and use an AST pattern that matches “x + K” for any integer literal K, and print K.
```c
// C
cs_ast_pattern *pat;
cs_result r;
const char *err, *err_location;
cs_ast_binding bindings[1];

r = cs_ast_pattern_compile( "(c:+ :2 (c:integer-value :value ?K))",
                            &pat, &err, &err_location );
if( r != CS_SUCCESS )
{
    fprintf( stderr, "oops: %s near `%s'\n", err, err_location );
    fflush(NULL);
    abort();
}

r = cs_ast_match( node, pat, bindings, sizeof(bindings), &bn );
switch( r )
{
case CS_SUCCESS:
    if( bn == sizeof(bindings) && bindings[0].f.type == csft_int32 )
        printf( "Matched; K=%d\n", bindings[0].f._.int32 );
    else
        printf( "Type not int32?\n" );
    break;
case CS_ELEMENT_NOT_PRESENT:
    printf( "No match!\n" );
    break;
default:
    abort();
}
```
```cpp
// C++
// Make it static so we only compile it once. Could be in top-level scope.
static ast_pattern pat("(c:+ :2 (c:integer-value :value ?K))");
ast_bindings bs = pat.match_with_bindings(node);
if( !bs )
    cout << "No match!" << endl;
else
    cout << "Matched; K=" << bs["K"].as_int32() << endl;
```
```python
# Python
pat = ast_pattern("(c:+ :2 (c:integer-value :value ?K))")
bs = pat.match(node)
if bs:
    print "Matched; K=", bs["K"]
else:
    print "No match!"
```
```java
// Java
ast_pattern pat = new ast_pattern("(c:+ :2 (c:integer-value :value ?K))");
ast_bindings bs = pat.match_with_bindings(node);
if( !bs.matched() )
    System.out.println("No match!");
else
    // Java has no [] operator: read bs["K"] as shorthand for looking up binding "K".
    System.out.println("Matched; K=" + bs["K"].as_int32());
```
Fuzzing
Testing an API of more than 1000 methods by hand would be time-consuming, so I created a fuzzer to exercise the interface with a wide variety of inputs.
Fuzzing is the art of automatically producing semi-random inputs to test a piece of software. The Python API is particularly amenable to fuzzing because reflection is easy in Python. To test it, I wrote a Python plugin for CodeSonar that uses reflection to discover the API, works out every method signature (including overloads), and then hammers the methods until each one has been exercised and has produced non-empty, non-exceptional results. Reaching this coverage usually takes somewhere between 100 and 500 calls to every function in the API, which often takes less than a minute. The longer the fuzzer runs, the more inputs it tests.
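One way to work out signatures, including overloads, from a SWIG-wrapped API is to provoke its error messages and parse them. The sketch below is a minimal illustration, not the plugin's actual code: it assumes SWIG reports a failed overload resolution with a list of candidate C++ prototypes, and a single bad argument with a message of the form "in method 'f', argument N of type 'T'". The exact wording varies between SWIG versions, and the helper names scrape_overloads and scrape_argument_type are hypothetical.

```python
import re

# Assumed shape of the message SWIG raises when no overload accepts the
# supplied arguments (wording varies between SWIG versions):
#   Wrong number or type of arguments for overloaded function 'f'.
#     Possible C/C++ prototypes are:
#       f(int)
#       f(char const *,int)
_PROTOTYPE = re.compile(r"^\s+[\w:~<>]+\((.*)\)")

# Assumed shape of the message for a single bad argument:
#   in method 'f', argument 2 of type 'int'
_ARG_TYPE = re.compile(r"in method '([^']+)', argument (\d+) of type '([^']+)'")


def scrape_overloads(message):
    """Return one list of parameter types per prototype listed in message."""
    lines = message.splitlines()
    marker = "Possible C/C++ prototypes are:"
    starts = [i for i, line in enumerate(lines) if marker in line]
    if not starts:
        return []  # not an overload-resolution failure
    signatures = []
    for line in lines[starts[0] + 1:]:
        m = _PROTOTYPE.match(line)
        if m is None:
            continue
        params = m.group(1).strip()
        # Naive split on commas: fine for flat types, wrong for templates
        # such as std::pair<int,int>, which a real scraper must handle.
        signatures.append([p.strip() for p in params.split(",")] if params else [])
    return signatures


def scrape_argument_type(message):
    """Return (function name, argument index, C type), or None if no match."""
    m = _ARG_TYPE.search(message)
    if m is None:
        return None
    return m.group(1), int(m.group(2)), m.group(3)
```

The parsed type names are what a fuzzer needs in order to decide which calls it can already make with the objects it holds.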
The fuzzer exercises the Python, C++, and C APIs, since the Python API wraps the C++ API, which in turn wraps the C API. It does not test the Java or C# APIs: I have manually tested the portions of those APIs most likely to have language-specific problems, and I rely on SWIG to be consistent for the rest.
How it works
The fuzzer starts by invoking Python’s dir builtin on the module and on each class. It then attempts to invoke every function, and screen-scrapes the raised exceptions to determine the parameter types. The fuzzer maintains a pool of objects for every type. At first, only the pools for primitive types such as int and str are non-empty. The fuzzer invokes a function only when it has instances of all the required input types. Each time a method returns a new instance of a type, that instance may be added to the corresponding pool. Pool sizes are capped, and old elements are aged out. Whenever an instance of a type is needed, it is selected semi-randomly from the appropriate pool. Composite types such as pairs and vectors are assembled by repeatedly drawing values of their leaf types.
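The pool mechanism described above can be sketched roughly as follows. This is a minimal illustration under assumptions, not the plugin's code: type names are plain strings, composite types are written as tuples such as ("pair", "int", "str"), selection is uniform rather than whatever "semi-random" bias the real tool uses, and the class TypePools and its methods are hypothetical.

```python
import random
from collections import deque


class TypePools:
    """Capped, per-type pools of sample values for a reflection-driven fuzzer.

    Each pool is a deque with maxlen=cap, so appending to a full pool
    evicts its oldest element: that is the "aging out" step.
    """

    def __init__(self, cap=32, seed=None):
        self.cap = cap
        self.rng = random.Random(seed)
        self.pools = {}  # type name -> deque of values

    def add(self, type_name, value):
        self.pools.setdefault(type_name, deque(maxlen=self.cap)).append(value)

    def ready(self, spec):
        """True if every leaf type in spec has at least one pooled value.

        spec is a type name, or a tuple such as ("pair", A, B) or ("vector", A).
        """
        if isinstance(spec, tuple):
            return all(self.ready(leaf) for leaf in spec[1:])
        return bool(self.pools.get(spec))

    def build(self, spec):
        """Draw a value for spec, assembling composites from leaf pools."""
        if isinstance(spec, tuple):
            kind, *parts = spec
            if kind == "pair":
                return (self.build(parts[0]), self.build(parts[1]))
            if kind == "vector":
                return [self.build(parts[0]) for _ in range(self.rng.randint(0, 4))]
            raise ValueError("unknown composite kind: %r" % (kind,))
        # Uniform choice (an assumption; the original bias is not described).
        return self.rng.choice(self.pools[spec])

    def try_call(self, fn, param_specs):
        """Call fn only if all parameter types are available; pool its result.

        Exceptions raised by fn propagate so the caller can record them.
        """
        if not all(self.ready(spec) for spec in param_specs):
            return None
        result = fn(*[self.build(spec) for spec in param_specs])
        if result is not None:
            self.add(type(result).__name__, result)
        return result
```

Using a deque with maxlen set keeps the cap and the aging-out policy in one place: no explicit eviction code is needed.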
Bugs!
The fuzzer discovered many bugs.
The vast majority were in the new C++ code:
- Places where templates had not been instantiated for SWIG
- Places where templates had not been instantiated in the correct order for SWIG
- Various problems lifting exceptions and directors
- gcc 4.4.1 and 4.2.1 (Apple variant) miscompiled ((size_t)inner) < ((size_t)7) as if it were ((size_t)inner) != ((size_t)7) in some contexts
- Validation errors
- Some corner cases with iterators
- A few silly errors, like using the wrong enum value here and there
Latent bugs in the old C API:
- Various corner cases where, for example, passing MAX_INT to a function did not end well
- Some performance explosions on certain inputs
- A serious error in CodeSurfer’s chop algorithm that affected both correctness and stability, but did not affect CodeSonar
- Requesting line padding on whole-analysis warnings that do not correspond to any specific source code caused a crash
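Bugs like the MAX_INT failures surface only if values at the edges of C integer ranges actually reach the API. The post does not say how the primitive pools are seeded; one plausible approach, assumed here, is to seed them with boundary values and pathological strings (the long string echoes the kilobyte-long warning class names listed below). The constants and the helpers boundary_ints and boundary_strings are hypothetical.

```python
# Hypothetical seed values for the primitive pools. Edge values of common
# C integer types are where off-by-one and overflow bugs tend to hide.
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
UINT32_MAX = 2**32 - 1
INT64_MAX = 2**63 - 1


def boundary_ints():
    """Each limit plus its two neighbours, e.g. INT32_MAX - 1 .. INT32_MAX + 1."""
    values = set()
    for edge in (0, INT32_MIN, INT32_MAX, UINT32_MAX, INT64_MAX):
        values.update((edge - 1, edge, edge + 1))
    return sorted(values)


def boundary_strings(long_len=2048):
    """Empty, tiny, over-long, and malformed-XML strings."""
    return ["", "a", "x" * long_len, "<unclosed>", "&bogus;", "a\x00b"]
```

Values one past a limit (such as INT32_MAX + 1) are deliberately included: they check that the bindings reject out-of-range input cleanly instead of silently wrapping it.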
Many fatal errors were made less severe, so that mistakes by API users no longer terminate the analysis or cause blocks of warnings to be discarded:
- Warning class names over a kilobyte long caused errors when PostgreSQL attempted to index them
- Invalid XML in certain strings that claimed to be valid XML had overly draconian consequences
- Retracting the same warning multiple times had overly draconian consequences
- Various CodeSurfer-only functions now fail softly if invoked in CodeSonar
- Reporting warnings using nonsensical CFG paths, such as a sequence of the same exit vertex over and over, caused assertion failures
The fuzzer now runs as part of our build system’s “check” target every time someone makes a commit. Since its introduction, it has caught regressions, portability issues, and many bugs in newly introduced APIs.
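The coverage criterion from earlier in this post (keep calling until every function has produced a non-empty, non-exceptional result) also suggests how such a fuzzer can gate a “check” target. The sketch below is hypothetical: run_until_covered, is_non_empty, the round budget, and the idea of failing the build on uncovered functions are assumptions, not a description of the actual build integration.

```python
import random


def is_non_empty(result):
    """Treat None and zero-length results as empty; anything else counts."""
    if result is None:
        return False
    try:
        return len(result) > 0
    except TypeError:
        return True  # unsized values such as ints or opaque handles


def run_until_covered(functions, make_args, max_rounds=500, seed=0):
    """Repeatedly call every function until each has returned at least one
    non-empty, non-exceptional result, or the round budget is exhausted.

    functions: dict mapping a name to a callable.
    make_args: callable(name, rng) returning the argument tuple for one call.
    Returns the set of names that were never covered.
    """
    rng = random.Random(seed)
    uncovered = set(functions)
    for _ in range(max_rounds):
        if not uncovered:
            break
        for name in sorted(uncovered):  # sorted() copies, so discarding is safe
            try:
                result = functions[name](*make_args(name, rng))
            except Exception:
                continue  # an exception does not count toward coverage
            if is_non_empty(result):
                uncovered.discard(name)
    return uncovered
```

A check target could then exit with a non-zero status whenever run_until_covered returns a non-empty set, so that a function which stops producing usable results fails the build.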
Continue to the final post, about documenting the new APIs.