Programmed by computers
ETH Professor Martin Vechev is automating the computer programming process: he is one of the first researchers ever to teach computers how to write their own software. This makes him one of the founders of a new field of research that is growing rapidly.
Computer software written by computers: a fascinating idea. And a realistic one at that, says Martin Vechev, Professor of Computer Science. He’s one of the founders of a new field of research in which computer scientists are seeking to largely automate the programming process. Assistance programs are already making software developers’ work easier, and it won’t be long before such programs enable ordinary developers to program to standards that today are achieved only by leading programming experts, says Vechev. “In ten years’ time, the degree of automation will be advanced enough for computers to be able to write short programs autonomously,” he predicts.
This is made possible by machine learning and by the huge public software databases that already exist today. “The way we see it, programming doesn’t have to involve reinventing the wheel each time. Rather, it’s about learning from good examples and tapping into the large pool of these existing examples,” Vechev explains.
Public databases offer access to millions of computer programs containing several billion lines of program code, a colossal collection Vechev refers to as “big code”. Such inconceivably large volumes of data can be overwhelming for software developers, but with the help of computers, this data can be analysed and processed to make it fit for practical use.
Computers can recognise patterns in existing code and learn which ones are used in which context. In doing so, they capture not only individual characters and commands, but also their meanings and the rules for how they are used. The way computers learn these rules is similar to machine translation, of which Google Translate is a well-known example. “Translation tools also use intelligent machine learning technology to analyse words in their given context and then draw conclusions about their meaning and usage and about grammar rules,” Vechev explains.
In the future, assistance programs for developers will work in a similar way to the auto-complete functions we use today for writing text messages on smartphones. For example, a software developer writes the first hundred lines of code, which the assistance program then analyses and compares with the existing code in the database. Based on the results, the computer then makes suggestions for how to continue the code, which the developer can either accept or reject. The computer also uses this feedback to understand the programmer’s objectives and to continuously improve the suggestions it makes.
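The suggest, accept or reject, and improve cycle described above can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration: the class name CompletionAssistant, the line-by-line matching, and the feedback weights of +1.0 and -0.5 are hypothetical and are not taken from any tool built by Vechev's group. Real assistance programs analyse far more context than the single preceding line used here.

```python
from collections import Counter, defaultdict


class CompletionAssistant:
    """Toy assistant that suggests the next line of code from the previous one."""

    def __init__(self, corpus):
        # corpus: a list of programs, each given as a list of code lines.
        # weights[prev][nxt] starts as the number of times `nxt` followed `prev`.
        self.weights = defaultdict(Counter)
        for program in corpus:
            for prev, nxt in zip(program, program[1:]):
                self.weights[prev][nxt] += 1.0

    def suggest(self, last_line, k=3):
        # Rank the known follow-up lines by weight, highest first.
        followers = self.weights.get(last_line, {})
        positive = [item for item in followers.items() if item[1] > 0]
        ranked = sorted(positive, key=lambda item: item[1], reverse=True)
        return [line for line, _ in ranked[:k]]

    def feedback(self, last_line, suggestion, accepted):
        # Hypothetical update rule: accepted suggestions gain weight,
        # rejected ones lose weight but never drop below zero.
        delta = 1.0 if accepted else -0.5
        current = self.weights[last_line][suggestion]
        self.weights[last_line][suggestion] = max(current + delta, 0.0)


# A made-up "database" of three short programs.
corpus = [
    ["f = open(path)", "data = f.read()", "f.close()"],
    ["f = open(path)", "data = f.read()", "f.close()"],
    ["f = open(path)", "for line in f:", "print(line)"],
]
assistant = CompletionAssistant(corpus)
print(assistant.suggest("f = open(path)"))

# The developer twice prefers the loop; the ranking adapts accordingly.
assistant.feedback("f = open(path)", "for line in f:", accepted=True)
assistant.feedback("f = open(path)", "for line in f:", accepted=True)
print(assistant.suggest("f = open(path)"))
```

In this sketch the developer's choices simply shift the weights, so a suggestion that is accepted often enough overtakes one that was more common in the original collection.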
At the core of these new kinds of assistance programs are what are known as probabilistic models. These models are learned from a vast number of existing programs and program fragments, which allows the assistance program to present the user with the most probable continuations. The ETH professor is working to come up with ever-better probabilistic models. His group recently developed one such model, named PHOG, which is currently the most precise code analysis model there is. The model can also be applied to data sets other than code, for example natural language. Moreover, in contrast to other models, it not only provides answers but also makes the choice of those answers comprehensible to users. “Anyone interested in building such assistance tools can use the PHOG model as a basis,” says Vechev.
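To give a rough feel for what "learning a probabilistic model from existing code" can mean, here is a minimal sketch of a bigram model over code tokens. It is not PHOG and is far simpler than it: it only estimates how likely each token is to follow the previous one. The function names and the four training snippets are invented for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_model(token_sequences):
    """Count, for every token, which tokens follow it in the training code."""
    counts = defaultdict(Counter)
    for tokens in token_sequences:
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def most_probable_continuations(counts, prev_token, k=3):
    """Return up to k (token, probability) pairs, most probable first."""
    followers = counts.get(prev_token)
    if not followers:
        return []  # token never seen in training: no prediction
    total = sum(followers.values())
    return [(tok, n / total) for tok, n in followers.most_common(k)]


# Invented, pre-tokenised training snippets standing in for "big code".
training_code = [
    ["for", "i", "in", "range", "(", "n", ")", ":"],
    ["for", "x", "in", "items", ":"],
    ["for", "i", "in", "range", "(", "10", ")", ":"],
    ["if", "x", "in", "seen", ":"],
]
model = train_bigram_model(training_code)

# After the token "in", "range" was seen in 2 of 4 cases, so it ranks first.
for token, probability in most_probable_continuations(model, "in"):
    print(f"{token}: {probability:.2f}")
```

Because each prediction comes with the counts behind it, even this toy model can show why it ranked a continuation first, which hints at the kind of comprehensibility the article attributes to PHOG.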
Vechev and his team also develop such assistance programs themselves. Two examples are JS Nice and APK Deguard, which are freely available online. Both are a kind of correction program: developers can use them to check their programs and to obtain suggestions on how to transform them so that they are more easily understood by outsiders. The tools can also be used to decipher code that was deliberately obfuscated, for example in order to hide malware. More than 200,000 developers and IT security professionals worldwide have used JS Nice since its release.
Last year, Vechev and his former PhD student Veselin Raychev founded DeepCode, an ETH spin-off. The company has set itself the task of creating new assistance programs for developers based on a direction explored in Vechev’s SRL laboratory. This opens up opportunities for further applications down the line, such as programs that detect programming errors and propose fixes.
“We’re seeking to develop software capable of solving challenges in software better than people can,” says Vechev. “A few years ago, we were one of the first groups to set ourselves the goal of learning from ‘big code’. Today, this area is attracting interest from many colleagues as well as from various software companies. It’s an interesting and fast-growing field of research.”
This article appears in the current issue of Globe.