Team TechX won second place in the DARPA Cyber Grand Challenge (CGC) last week. TechX is a collaboration between GrammaTech and the University of Virginia, and we are all very proud. Over the next few weeks, we will be writing about the technologies we leveraged and analyzing some of the results.
This Whole Capture-the-Flag Concept
To kick off, it seems useful to outline what people actually mean when they say that the competition is a “Capture The Flag contest.” I, for one, never played Capture The Flag (CTF) as a child. Probably this is because I grew up in New Zealand and spent most of my formative years fleeing orcs and zombie sheep instead. I’m sure many others are in similar positions, and many more played as children but have since forgotten how it works. So let’s start with a brief overview of that game.
A classic CTF game out here in the physical world involves two teams. Each team has a ‘territory’ containing a ‘base’ in which they fly a flag; the winning team is the one that is first to obtain the other team’s flag and bring it back to base. Team members who are tagged in the other team’s territory are ‘jailed’ — though they may subsequently be freed by a mechanism that depends on local customs (for example, by being tagged by one of their teammates, or by waiting until an agreed time limit has elapsed). The game thus necessarily involves a heavy strategy component, including decisions about how many team members should be allocated to each activity (capturing the other team’s flag, protecting your own flag, freeing teammates from jail where applicable) and how those allocations should evolve in response to the other team’s behavior.
There are lots of variations of this game, evolving in response to factors like the local terrain and the interests of the competitors. If someone says “a CTF game with pumpkins instead of flags” or “a CTF game underwater,” that’s easy enough to picture. It’s not necessarily so obvious what “a CTF game with computer network services” could mean.
Capture-the-Flag… at a Laundromat?
Let’s think first about a physical-world service: say, a laundromat. Simplifying away issues like detergent, and scientists with freeze rays and frozen yogurt, a laundromat is a place where a person can turn up with some dirty clothes and a pocket full of quarters and leave an hour or so later with the same clothes in a clean state. A laundromat is not useful if your clothes don’t get clean, or if the process takes years, or if it imposes too many additional requirements. Nobody wants to turn up to the laundromat and stand in a TSA-style line for the fun and personal fulfillment to be found inside, or to find out once in the door that the machines only take $2 bills.
What would a CTF-type game look like, transferred to a laundromat? It turns out that there are quite a lot of things that might be considered analogous to capturing a flag. You could break into an opponent’s machines and steal all their quarters. You could sneakily post a fake “closed” sign so all their customers go to your laundromat instead. If they live above the store, you might even be able to find a way through the connecting door to their home, whereupon you could break their chair and eat their porridge. Meanwhile, you would have to protect your own quarters/signs/porridge from reciprocal incursions.
Teams, Strategy, and a Scoring Rubric for Laundromats (but really also for CGC)
Choosing a team for this kind of contest is an art in itself. You might want a structural engineer to analyze the current state of the building and design improvements. A builder who can implement these plans. A bit of muscle to repel incursions. Someone with lots of experience in finding security holes: maybe from a criminal background, maybe just a puzzle fan, maybe both. Someone to hold everything together. The total? Could be one person, could be dozens.
For standardized scoring, a governing body might set up two identical laundromats and assign one to each team. In fact, there’s no particular requirement that there be two: there can be N laundromats and N teams. These standard laundromats might include some deliberate security weaknesses, so that a team’s score can depend, at least in part, on how many of those weaknesses they manage to find and leverage. Standardized laundromats have an interesting consequence: if you find a weakness in your own laundromat, you know that the other teams started out with that weakness too, and if they haven’t found and fixed it yet you can use it to attack them.
The governing body can – of course – set up whatever scoring rules they want. Technically, they can set up whatever scoring rules I want, and since what I want is to illustrate what went on in CGC, we’ll look at the laundromat equivalents of those scoring metrics. There are three:
- Security: Finding and eliminating the vulnerabilities in your own laundromat.
- Evaluation: Finding and exploiting vulnerabilities in competing laundromats.
- Availability: Keeping your laundromat open and your customers satisfied.
A scoring rubric for laundromat Security might have questions like, “What percentage of attacks from competitors have been successful?” and “What percentage of the ‘reference attacks’ designed by the competition organizers would be successful?” The fewer successful attacks there have been against your laundromat, the higher your Security score.
Similarly, a scoring rubric for Evaluation might ask, “How many other laundromats has this team successfully attacked?” Attacks against a larger number of opposing laundromats give a higher Evaluation score.
Meanwhile, a scoring rubric for laundromat Availability might have rather more questions. Could I get in the door? Was I in there for a reasonable amount of time? Has my net worth been adjusted by the correct number of quarters? Do I have clean clothes? Are they MY clean clothes? Are they ALL of my clothes??
Answering “no” to one or more of these questions would indicate that something is wrong with laundromat operations, either because of a successful attack by another team or because the laundromat’s owners accidentally messed something up while upgrading their security.
“Could I get in the door?” is an especially interesting question because in order to prevent other teams from exploiting a security hole you have to fix it, and fixing it frequently entails closing the shop. Teams must carefully plan their security upgrades: patch too late and they may be successfully attacked (lowering their Security score); patch too early and they will lose Availability points for a problem other teams may never even have noticed.
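To make the timing trade-off concrete, here is a minimal back-of-the-envelope sketch. All the numbers are invented purely for illustration, and for simplicity it just compares expected point losses, ignoring exactly how the scoring components combine (more on that below):

```python
# Hypothetical, invented numbers purely to illustrate the patch-timing
# trade-off; a real team would estimate these per service, per round.
p_exploit = 0.3        # chance an opponent finds and uses the hole this round
security_hit = 0.25    # Security-score loss if the hole is exploited
downtime_hit = 0.10    # Availability-score loss from closing the shop to patch

expected_loss_waiting = p_exploit * security_hit  # 0.075 if we hold off
loss_patching_now = downtime_hit                  # 0.100 if we patch immediately

# With these numbers, waiting one more round is the better bet; if
# p_exploit climbs (say, after spotting probing attacks), patching wins.
best = "wait" if expected_loss_waiting < loss_patching_now else "patch now"
print(best)  # -> "wait"
```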
Following CGC’s scheme, the overall score is computed as a product: Availability × Security × Evaluation.
This scoring system eliminates certain strategies altogether. You can’t win by just pulling down the security gate and eliminating access to your laundromat while you carry out attacks against everyone else, even if those attacks are wildly successful, because then Availability is always zero and so the overall score is also zero.
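Here is a minimal sketch of that multiplicative structure (in Python, with made-up component values; the real CGC formula computes and normalizes each component in its own way):

```python
def round_score(availability: float, security: float, evaluation: float) -> float:
    """Toy CGC-style round score: the product of the three components.

    The real CGC scoring defines each component on its own scale; this
    sketch only illustrates the multiplicative structure.
    """
    return availability * security * evaluation

# A team that stays open, defends reasonably, and lands some attacks:
print(round_score(availability=0.95, security=0.80, evaluation=0.60))  # 0.456

# A team that "pulls down the security gate": perfect defense and
# offense cannot compensate for zero availability.
print(round_score(availability=0.0, security=1.0, evaluation=1.0))     # 0.0
```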
Making More Work for the Organizers
If this isn’t a challenging enough competition, we could graduate to assigning a whole standardized town to each competitor. This means more targets to attack and defend, and nonidentical targets at that. A strategy designed for a laundromat will not apply perfectly to a carwash (although it’s quite possibly better than nothing). A town isn’t just a jumble of services sitting in the countryside, either. It has infrastructure. Setting up an entire standardized town will involve building roads as well as laundromats; water lines as well as coffee shops; street lights as well as gas stations. A more exciting contest for competitors means more work (a lot more work) for the organizers.
And finally, we can transform this into a computer-based CTF game. The town becomes a computer; the town services become network services, such as web servers and email servers; the team members become experts in software analysis, security, and so forth. They might also take on names that include numbers as well as letters, but this is not strictly compulsory.
There is a long tradition of such games in the hacker community, with a number of variations, especially with respect to exactly what constitutes a ‘flag’ and a ‘capture.’ The annual DEF CON convention has a particularly famous CTF contest.
The DARPA Cyber Grand Challenge was about taking the next step along this path: a computerized CTF whose competitors were not – or not directly – teams of humans, but pieces of software. Each of the Cyber Reasoning Systems played all of the roles that the members of a human team would play: analyzing software components, designing and implementing security fixes for its own system, designing and implementing attacks against other systems, and strategizing.
DARPA and their collaborators had the enormous task of setting up the competition environment (town) and challenge problems (services), building in multiple layers of redundancy and some super-snazzy tools for monitoring what was going on without interfering.
And they did it with style.
Stay tuned to read about the relation of formal methods to CGC.