Originally published by the High Confidence Software and Systems Conference
High Confidence Software and Systems Conference, Annapolis, MD, May 8-10, 2023
Authors:
Thomas Wahl
Abstract:
We present work performed at GrammaTech on semi-automated migration of legacy C++ code to Rust. A widely used systems programming language, C++ offers a historically rich set of language features, including low-level memory manipulation capabilities inherited from C. While such features can give rise to efficient programs, memory managed freely by the programmer often results in hard-to-debug program crashes and vulnerabilities.
The Rust programming language regulates memory manipulation by the programmer. It targets a sweet spot between eliminating the potential for memory corruption and retaining the usability of the language. A key concept is that of ownership: each piece of heap-allocated memory is managed by exactly one variable, whose lifetime determines the validity of that memory. Rust’s memory safety guarantees make automated garbage collection unnecessary, paving the way for safe and efficient systems programming of the future. The question remains how to deal with legacy C++ code in active use. Manually rewriting all such code into Rust is not practical.
To address this apparent impasse, GrammaTech is developing an open-source tool called CRAM for the semi-automated migration of C++ code bases into Rust. The goal is to generate Rust code that is not only functionally equivalent to the C++ source, but also human-readable and in fact Rust-idiomatic, enabling further development by human programmers. This combination of goals makes language migration a particularly hard problem. To ensure feasibility, we assume well-designed C++ code as input, where form reflects—not disguises—intent. We solicit assistance from the user in cases where the extraction of intent from the C++ code is ambiguous. Finally, we are developing methods to assure the programmer of the correctness of the migration, ranging from migrated test cases to automated equivalence proofs for code fragments.
CRAM divides the migration process into two stages. The first stage remains entirely in C++ and refactors the code so as to enforce Rust’s memory usage rules in C++. For example, CRAM breaks up nests of mutable (non-const) aliases to the same memory location in C++, preparing the code for migration into a language where such nests are forbidden. These refactorings implement our vision that, while C++ does not enforce safe programming, it does not prohibit it either. Experienced C++ programmers will avoid reference nests. Rust elevates this recommendation to a rule, which we now enforce on the C++ side. Pre-migration code refactoring has significant additional advantages:
- It creates a valuable migration by-product: a hardened version of the original C++ code.
- It permits intra-language assurance analysis: the refactoring does not cross language boundaries.
During the second stage, migration libraries developed as part of the CRAM project map formal descriptions of programmer intent lifted from C++ code segments to human-readable, idiomatic Rust code. We envision a programmatic but language-agnostic notion of intent. For example, CRAM recognizes loops iterating over data containers, how many iterators they use, whether they iterate forward or backward, etc. Such intent representations can easily be retargeted idiomatically to a language like Rust.
CRAM currently handles a subset of (well-designed) C++, in a largely automated fashion. In experiments, both the refactorings of the C++ code and the migration into Rust have resulted in efficiency improvements of the code. Notably, one Rust program obtained via manual migration by an expert exhibited about the same performance as the CRAM-generated Rust program, although the two programs were rather different.