GrammaTalk

Our AI Journey at GrammaTech: Machine Learning, LLMs, and Beyond

At GrammaTech, AI has been an integral part of our journey in advancing state-of-the-art software analysis and security research. Our AI work spans a wide range of traditional AI/ML and statistical analysis-based methods, as well as more contemporary generative AI/LLM-based approaches. Our expertise in applying AI to complex software challenges has enabled us to stay at the forefront of research and innovation in this field.

Machine Learning for Binary Analysis

Binary code, the fundamental language of computers, is deceptively simple – just ones and zeros. But analyzing or reverse engineering binaries to extract meaningful information is a challenging task. When the source code of a program is not available – due to proprietary restrictions, legacy systems, or malicious intent – binaries are the only way to understand how the software operates. Analyzing binaries can help us uncover security vulnerabilities that may be exploited by attackers and identify code that behaves maliciously. Binary rewriting has proven to be an effective way to harden binaries against security vulnerabilities.

Unlike some other domains such as images and audio, executable binary code is highly structured so that the machine can interpret and execute it correctly. Researchers applying ML approaches to binary analysis must be cognizant of this structure, and ML approaches are often most useful in conjunction with static and dynamic analysis techniques that can extract useful features for evaluation and modeling – the kinds of techniques that have historically been core strengths for GrammaTech.

Decompilation, the process of recovering source code from a binary, is useful for generating source code that can assist in understanding code semantics. We published a paper describing a novel decompilation technique that uses Recurrent Neural Networks (RNNs) to recover source code from binary code snippets. The technique generates decompiled code that is more readable and closer to human-written code, addressing a weakness of mainstream decompilers, which often produce hard-to-understand output. To date, the GrammaTech method has been trained and evaluated on C source code snippets, but it is language-agnostic and requires minimal domain knowledge, making it adaptable to various languages and use cases.

Analyzing the composition of binary code and identifying the similarity between two pieces of code is an important but challenging problem with applications in many security-critical fields, such as reverse engineering software, detecting and patching vulnerabilities in third-party libraries, and malware clustering and attribution. GrammaTech developed Discover, a binary composition analysis tool that scans software binaries for known libraries in order to flag the presence of known vulnerable components. Discover combines lightweight binary analysis with machine learning: a Siamese neural network is trained to produce high-dimensional function embeddings in which semantically similar functions lie closer to each other than semantically dissimilar ones. It has proven highly effective in commercial application.
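The distance-based property described above can be illustrated with a minimal sketch. This is not Discover's actual network: the fixed random projection stands in for a learned Siamese encoder, the feature vectors are toy stand-ins for real function features, and the contrastive loss is shown only to indicate how such an encoder would be trained.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(features: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Map a function's feature vector to an L2-normalized embedding.
    A real Siamese network would learn W; here it is a fixed projection."""
    z = np.tanh(features @ W)
    return z / np.linalg.norm(z)

def contrastive_loss(d: float, same: bool, margin: float = 1.0) -> float:
    """Training objective sketch: pull similar pairs together, push
    dissimilar pairs at least `margin` apart."""
    return d**2 if same else max(0.0, margin - d) ** 2

# Toy feature vectors for three binary functions (e.g. opcode histograms).
W = rng.normal(size=(8, 4))
f_a = rng.normal(size=8)
f_b = f_a + 0.01 * rng.normal(size=8)   # near-duplicate of f_a
f_c = rng.normal(size=8)                # unrelated function

e_a, e_b, e_c = (embed(f, W) for f in (f_a, f_b, f_c))
d_ab = np.linalg.norm(e_a - e_b)        # small: similar functions
d_ac = np.linalg.norm(e_a - e_c)        # larger: dissimilar functions
print(f"d(similar)={d_ab:.3f}  d(dissimilar)={d_ac:.3f}")
```

With a trained encoder, nearest-neighbor search over such embeddings is what lets a tool match functions in a stripped binary against a corpus of known library functions.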

Binary rewriters modify the behavior of binary executables without access to their source code, but evaluating them is challenging and time-consuming due to their varying capabilities and high computational costs. We have also published work defining a machine learning-based approach to predict the success of binary rewriters for x86-64 binaries, enabling faster and more informed decisions on their effectiveness. The research offers insights into learning algorithms, feature representations, and the generalizability of models for predicting rewriting success, aiding users, tool developers, and researchers in selecting appropriate rewriters and understanding their strengths and weaknesses.
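The core idea – train a classifier on cheap-to-extract binary features, then predict whether an expensive rewriting run will succeed – can be sketched in a few lines. The features and data below are hypothetical illustrations, and plain logistic regression stands in for the learning algorithms the research actually compares.

```python
import math

# Hypothetical per-binary features: [code size (MB), uses C++ exceptions,
# is stripped, has overlapping instructions]; label 1 = rewriter succeeded.
train = [
    ([0.5, 0, 0, 0], 1), ([1.2, 0, 1, 0], 1), ([0.8, 1, 0, 0], 1),
    ([3.5, 1, 1, 1], 0), ([2.9, 0, 1, 1], 0), ([4.1, 1, 0, 1], 0),
]

def predict(w, b, x):
    """Logistic model: probability that rewriting this binary succeeds."""
    return 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))

# Plain stochastic gradient descent on log-loss.
w, b, lr = [0.0] * 4, 0.0, 0.1
for _ in range(3000):
    for x, y in train:
        g = predict(w, b, x) - y       # gradient of log-loss w.r.t. logit
        w = [wi - lr * g * xi for wi, xi in zip(w, x)]
        b -= lr * g

easy = predict(w, b, [0.6, 0, 0, 0])   # small, plain binary
hard = predict(w, b, [3.8, 1, 1, 1])   # large binary with tricky constructs
print(f"P(success | easy)={easy:.2f}  P(success | hard)={hard:.2f}")
```

In practice such a predictor lets a user skip rewriters that are unlikely to handle a given binary, saving the cost of running them to failure.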

GrammaTech also has ongoing work that extracts features for malware classification of Windows binaries using binary analysis. We perform static analysis using DDisasm, our state-of-the-art tool for binary disassembly and rewriting, and dynamic analysis using TBDisasm, a tool that monitors a target binary’s execution to produce runtime disassembly. We extract a multitude of features for malware data sets at scale and use these to train and evaluate a variety of models, including gradient boosted trees and graph neural networks. The trained models are then analyzed using various explainable AI approaches, such as SHAP, to identify important features for malware detection and classification.
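Our pipeline uses SHAP for model explanation; as a simpler stand-in, the sketch below uses permutation importance, which shares the same goal of ranking features by their contribution to a classifier's decisions. The features, the synthetic data, and the trivial one-rule "model" are all illustrative assumptions, not our actual malware models.

```python
import random

random.seed(7)

def make_sample():
    """Synthetic sample: feature 0 (suspicious-API flag) is 90% predictive
    of the malicious label; feature 1 (section count) is pure noise."""
    y = random.random() < 0.5
    api_flag = y if random.random() < 0.9 else not y
    n_sections = random.randrange(3, 12)
    return [int(api_flag), n_sections], int(y)

data = [make_sample() for _ in range(500)]

def model(x):
    """Stand-in classifier: predicts malicious iff the API flag is set."""
    return x[0]

def accuracy(dataset):
    return sum(model(x) == y for x, y in dataset) / len(dataset)

base = accuracy(data)

def permutation_importance(feature):
    """Shuffle one feature column; the resulting accuracy drop measures
    how much the model relies on that feature."""
    col = [x[feature] for x, _ in data]
    random.shuffle(col)
    shuffled = [([*x[:feature], v, *x[feature + 1:]], y)
                for (x, y), v in zip(data, col)]
    return base - accuracy(shuffled)

imp_api = permutation_importance(0)
imp_sections = permutation_importance(1)
print(f"importance(api_flag)   = {imp_api:+.3f}")
print(f"importance(n_sections) = {imp_sections:+.3f}")
```

Shuffling the informative flag destroys most of the model's accuracy, while shuffling the noise feature changes nothing – the same kind of signal SHAP surfaces, per-prediction and with stronger theoretical grounding.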

LLMs at GrammaTech

In recent years, the rise of generative AI approaches and large language models (LLMs) has changed the landscape of AI and the kinds of applications that can be targeted using AI. LLMs have introduced new possibilities for translating human intent from natural language into code and understanding code semantics, in a way that was previously unattainable. At GrammaTech, we have been quick to explore how this technology can enhance software research.

GrammaTech, along with researchers from Carnegie Mellon University and the University of Virginia, was selected as one of the winners of the DARPA AI Cyber Challenge (AIxCC) Small Business Track Competition. Our team, VERSATIL, scored in the top seven. The team received $1 million in prize money and the distinction of competing in the AIxCC Semifinal Competition (ASC) as a Small Business Track competitor. The system uses LLMs to find potential vulnerabilities, guide and customize vulnerability discovery analyses, and generate patches to fix the vulnerabilities. The system is intended to automatically find and repair vulnerabilities at scale.

We have also performed small-scale studies examining the use of LLMs for program analysis and reverse engineering. These include:

  1. LLMs for extracting structure and traceability from software development artifacts, as part of Spec-Map, a project funded by DARPA under the OPS-5G contract.
  2. LLMs for language-to-language source code translation, as part of CRAM, our technology funded by DARPA under the LiLaC contract to automatically migrate highly vulnerable C++ code to memory-safe(r) Rust.
  3. LLMs for genetic programming, as part of the BREW (Binary Rewriting Evolution Workbench) and MEEN (Malware Evolutionary Engine) projects.

What’s Next for AI at GrammaTech?

The progress we have made with AI so far excites us, but we are far from done. Looking ahead, we envision a continuation of our research efforts into AI, in particular for addressing challenges in software security and analysis, and a deeper integration of AI into our tools and services. There are several areas where we aspire to grow:

  1. Expanding Code Similarity Detection: Detecting the similarity between two snippets of code is a challenging task, particularly in the binary domain where semantically similar code may have widely different compiled representations. We’re furthering our efforts into code similarity, including cross-architectural similarity detection, authorship attribution, firmware analysis, and the use of LLMs to generate function signatures.
  2. Fuzzing and Vulnerability Detection using LLMs: Fuzzing is a widely used tool in software testing, but efficient fuzzing often requires an understanding of the code being tested. LLMs offer the possibility of understanding code semantics and generating code automatically, and we plan to further examine the use of LLMs for the task of generating fuzzing harnesses and inputs to trigger vulnerabilities.
  3. Handling Concept Drift: Studies show that the accuracy of ML models may degrade over time as the real-world distribution of data evolves, a phenomenon known as concept drift. This problem affects models across domains, from malware analysis to medical devices. We aim to use a combination of approaches, including continuous monitoring and ensemble modeling, to identify and mitigate concept drift.
  4. Model Assurance: Models can be trained to learn rich feature representations and model complex phenomena, but their use in many security-critical applications requires an understanding of their decision-making and guarantees about their behavior. We are investigating approaches to reason about complex models and to generate assurances that their behavior conforms to user intent.
  5. Improving Efficiency: We aim to examine approaches that make AI applications efficient enough for resource-constrained settings, particularly edge computing scenarios. These include leveraging data sparsity as well as new hardware architectures for efficiently performing AI computations on hardware accelerators.
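To make the concept-drift item concrete, here is a minimal sketch of one common monitoring technique, the Population Stability Index (PSI), which compares a feature's training-time distribution against what a deployed model currently sees. The feature, thresholds, and synthetic data are illustrative assumptions, not taken from our production monitoring.

```python
import math
import random

random.seed(0)

def psi(expected, actual, bins=10):
    """Population Stability Index between a training-time distribution and
    a live one. Common rule of thumb: < 0.1 stable, 0.1-0.25 moderate
    drift, > 0.25 significant drift."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins

    def hist(xs):
        counts = [0] * bins
        for x in xs:
            counts[min(int((x - lo) / width), bins - 1)] += 1
        # Laplace-smooth so empty bins keep the log term defined.
        return [(c + 0.5) / (len(xs) + 0.5 * bins) for c in counts]

    e, a = hist(expected), hist(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Training-time feature (e.g. mean section entropy of observed samples).
train_dist = [random.gauss(5.0, 1.0) for _ in range(2000)]
# Two hypothetical production snapshots: one stable, one shifted.
live_same = [random.gauss(5.0, 1.0) for _ in range(2000)]
live_drift = [random.gauss(6.5, 1.5) for _ in range(2000)]

print(f"PSI(no drift) = {psi(train_dist, live_same):.3f}")
print(f"PSI(drift)    = {psi(train_dist, live_drift):.3f}")
```

A monitor like this flags when incoming data has moved away from the training distribution, signaling that retraining or ensemble reweighting may be needed before model accuracy silently degrades.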

At GrammaTech, we remain committed to driving research and innovation in software analysis and security. As we continue to explore new horizons, we look forward to seeing how we can use AI to help us achieve our goals of building more secure and resilient software in the future. Stay tuned as we dive deeper into this exciting field and continue to push the boundaries of what is possible with AI!
