I had the privilege of attending the European Dependable Computing Conference (EDCC 2014) this month in Newcastle upon Tyne, where GrammaTech had a paper co-authored with our University of Virginia collaborators.
One standout speaker was Ian Phillips, Principal Staff Engineer at ARM. His full talk, Where Did All The Errors Go?, is certainly worth watching. In a startling moment near the end of the presentation, he concluded that reusing unreliable commercial components in dependable systems is inevitable, and that we need to “get over it” in both senses of the phrase. Besides being highly relevant in the context of modern cyber-security, the talk was a spirited and accessible pep talk for the engineering profession.
Although my camera sadly misbehaved in Newcastle, I was able to capture Edinburgh Castle, 90 minutes north by rail.
Of the many technical papers presented, one that I found especially interesting was On the Soundness of Silence: Investigating Silent Failures Using Fault Injection Experiments by Erik van der Kouwe, Cristiano Giuffrida, and Andrew S. Tanenbaum. Erik presented how he used fault injection experiments to investigate “silent” failures, defined as deviant program behavior which does not result in a fail-stop condition (program hang, signal, or error code).
To summarize, they used a fault-injection front-end (EDFI) to inject a variety of faults into a collection of base programs exercised by test suites. They used ptrace to intercept system calls and remove certain sources of nondeterminism between “golden” runs and fault-injected runs, and a matching strategy to detect true deviance in behavior between the runs. The approach worked well for the base programs they studied (which included an editor, shells, compression utilities, and web servers), and the overall finding was that 13.8% of activated faults resulted in silent failures, showing that the fail-stop assumption is unsound.
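To make the golden-run comparison concrete, here is a minimal sketch of the classification idea in Python. It is only an illustration under simplifying assumptions, not the authors' tooling: it does not use EDFI or ptrace, it compares only exit status and standard output, and the names `RunResult`, `run`, and `classify` are hypothetical helpers invented for this example.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class RunResult:
    returncode: Optional[int]  # None means the run timed out (treated as a hang)
    stdout: bytes


def run(cmd: Sequence[str], timeout: float = 10.0) -> RunResult:
    """Run a command and record its exit status and standard output."""
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return RunResult(returncode=None, stdout=b"")
    return RunResult(returncode=proc.returncode, stdout=proc.stdout)


def classify(golden: RunResult, faulty: RunResult) -> str:
    """Compare a fault-injected run against the golden (fault-free) run."""
    if faulty.returncode is None:
        return "fail-stop (hang)"
    if faulty.returncode < 0:
        # On POSIX, subprocess reports death by signal N as returncode -N.
        return "fail-stop (signal)"
    if faulty.returncode != golden.returncode:
        return "fail-stop (error code)"
    if faulty.stdout != golden.stdout:
        # Same exit status, different observable behavior: a silent failure.
        return "silent failure"
    return "no visible effect"


if __name__ == "__main__":
    # Stand-in for fault injection: the "faulty" program has an operator flipped.
    golden = run([sys.executable, "-c", "print(2 + 2)"])
    faulty = run([sys.executable, "-c", "print(2 - 2)"])
    print(classify(golden, faulty))
```

In this toy case the "faulty" program exits cleanly but prints a different result, so it is classified as a silent failure. A real experiment also has to filter out benign differences between runs (timestamps, process IDs, and so on), which is the job the paper assigns to system call interception and its matching strategy.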
Of course, it’s no surprise that silent failures happen, but this paper made a quantitative, methodical contribution to characterizing them, which was particularly interesting to me as a software engineer at GrammaTech. Our static analysis software helps reveal the subtle silent errors that may not readily manifest in test suites, and we are also in the midst of research in automated test generation, where silent failures are directly relevant.
The famous Gateshead Millennium Bridge over the Tyne to Newcastle, a remarkable pedestrian suspension tilt bridge. I did not catch it in the act of tilting, so this photo credit goes to Wikimedia.
I learned much from the many papers presented across a variety of disciplines, and am thankful to the EDCC organizers and participants for putting this successful event together, as well as to GrammaTech Research, for providing me with these extraordinary opportunities to learn from great minds around the world. If you ever get a chance to visit Northumberland, or EDCC 2016 in Paris, definitely go!