Mathieu Nayrolles’ Keynote at the ASE conference in San Diego

We had the opportunity to present a keynote talk at the 34th IEEE/ACM International Conference on Automated Software Engineering (ASE’19). The annotated slide deck is available here.

As a world leader in video game production, we know how much effort and time it takes to produce AAA games. It often requires hundreds of developers, testers, artists, and sound engineers who collectively produce tens of thousands of commits scattered across hundreds of thousands of files.

A video showcasing the kind of games Ubisoft produces (Slide 2)

Production pipelines to create these games are in constant evolution to meet our millions of players’ evolving expectations. For example, the Games as a Service (GaaS) paradigm requires the constant addition of new high-quality content that cannot disrupt the code or create new bugs while being implemented.

To improve our pipelines, we are using known techniques from the software as a service (SaaS) world, landmarks of the software engineering scientific literature, and our own research (MSR’18). We are also actively collaborating with several Canadian universities (Concordia Montreal, Polytechnique Montreal, ETS, McGill, UQAC) and the Mozilla Foundation to further automate our production pipeline.

Bugs and regressions can cause a massive slowdown in this complex workflow, so eliminating them as fast as possible has become a critical activity.

During the software development process, commits (or code contributions) are created, and bugs are found and fixed. To fix a bug, we usually have to modify existing code. One way to represent code that goes beyond plain text is the Abstract Syntax Tree (AST). The AST is a tree that captures the syntactic structure of your code. Here are the C++ and C# ASTs of int foo = 3;.

[Figure: C++ and C# ASTs of int foo = 3;]

While the code is the same, the AST representation differs from language to language because each programming language is different. Thanks to Ubisoft’s 30 years of development experience, we have a gigantic dataset of AST transformations from which we can learn how bugs were fixed in the past.
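To make the idea of an AST concrete, here is a minimal sketch using Python's standard `ast` module. It is only an illustration in a third language: the slides show C++ and C#, and the Python statement `foo = 3` is an assumed analogue of `int foo = 3;`. The helper `node_types` is a hypothetical name introduced for this example.

```python
import ast

# Parse a one-line Python assignment, used here as an analogue of `int foo = 3;`.
tree = ast.parse("foo = 3")


def node_types(node):
    """Return the node type names of the tree in depth-first (pre-order) order."""
    names = [type(node).__name__]
    for child in ast.iter_child_nodes(node):
        names.extend(node_types(child))
    return names


names = node_types(tree)
print(names)                    # the node types that make up the tree
print(ast.dump(tree.body[0]))   # the full structure of the assignment node
```

The exact node names depend on the language and on the parser version, which is the same effect described above for C++ and C#: one statement, several possible tree shapes.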

A video showcasing AST-fixes (Slide 61)

While we might intuitively think that fixing a bug takes only one AST transformation, we can see that fixes can require several attempts. These attempts are required either because the bug is hard to fix or because the fix itself introduces a new regression that has to be … fixed.

Several techniques exist in academia and industry that aim to learn from these transformations, and we proposed one last year called CLEVER (Combining Levels of Bug Prevention and Resolution techniques) [1].

[Figure: overview of the CLEVER approach, from MSR’18]

From MSR’18: *Figures 1, 2 and 3 show an overview of the CLEVER approach, which consists of two parallel processes. In the first process (Figures 1 and 2), CLEVER manages events happening on project tracking systems to extract fault-introducing commits and commits and their corresponding fixes. For simplicity reasons, in the rest of this paper, we refer to commits that are used to fix defects as fix-commits. We use the term defect-commit to mean a commit that introduces a fault.

The project tracking component of CLEVER listens to bug (or issue) closing events of Ubisoft projects. Currently, CLEVER is tested on 12 large Ubisoft projects. These projects share many dependencies. We clustered them based on their dependencies with the aim to improve the accuracy of CLEVER. This clustering step is important in order to identify faults that may exist due to dependencies while enhancing the quality of the proposed fixes. In the second process (Figure 3), CLEVER intercepts incoming commits […] Once the commit is intercepted, we compute code and process metrics associated with this commit. […]. The result is a feature vector (Step 4) that is used for classifying the commit as risky or non-risky. If the commit is classified as non-risky, then the process stops, and the commit can be transferred from the developer’s workstation to the central repository. Risky commits, on the other hand, are further analyzed in order to reduce the number of false positives (healthy commits that are detected as risky). We achieve this by first extracting the code blocks that are modified by the developer and then compare them to code blocks of known fault-introducing commits.*
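The second process quoted above can be sketched as a two-stage check: classify a commit from its feature vector, then compare the code blocks of risky commits against known fault-introducing ones. The sketch below is a toy under stated assumptions, not CLEVER itself: the metric names, the hand-written scoring rule, the token-overlap similarity, and the 0.6 threshold are all hypothetical stand-ins for the classifier and clone comparison described in [1].

```python
from dataclasses import dataclass


@dataclass
class Commit:
    commit_id: str
    lines_changed: int       # hypothetical code metric
    files_touched: int       # hypothetical code metric
    author_experience: int   # hypothetical process metric (prior commits by author)
    code_block: str          # text of the block modified by the developer


def feature_vector(commit):
    """Collect the (assumed) code and process metrics of a commit."""
    return [commit.lines_changed, commit.files_touched, commit.author_experience]


def is_risky(features):
    """Toy stand-in for a trained classifier: a hand-written linear score."""
    lines, files, experience = features
    return 0.01 * lines + 0.1 * files - 0.005 * experience > 1.0


def token_similarity(a, b):
    """Jaccard overlap of whitespace-separated tokens (stand-in for clone comparison)."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def review(commit, known_defects, threshold=0.6):
    """Return (label, fix_commit_id); known_defects is a list of (defect_block, fix_id)."""
    if not is_risky(feature_vector(commit)):
        return ("non-risky", None)            # the process stops here
    # Second stage: keep the "risky" label only if a known defect looks similar.
    for defect_block, fix_id in known_defects:
        if token_similarity(commit.code_block, defect_block) >= threshold:
            return ("risky", fix_id)
    return ("non-risky", None)                # treated as a false positive


known = [("if ( ptr == null ) return ptr . value ;", "fix-42")]
big = Commit("c1", 200, 5, 10, "if ( ptr == null ) return ptr . value ;")
small = Commit("c2", 5, 1, 100, "x = 1 ;")
print(review(big, known))
print(review(small, known))
```

The second stage is what links a flagged commit to a past fix: the label alone says a commit is suspicious, while the matched defect points to how a similar problem was resolved before.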

Other researchers have looked at using neural machine translation (NMT) to learn from buggy code and translate it into fixed code, very much like translating French into English. One good example was published by Michele Tufano et al. [2].

Because of the foundation they rely on, we believe that NMTs are ill-suited to learning how to fix bugs, at least complex video game bugs, according to our internal experiments. Indeed, NMTs only consider two versions of the code, the buggy one and the fixed one, and not chains of fixes.

We suggest a new way of using deep learning for handling chains of fixes. The approach would work as follows: we take an AST, transform it into a graph, and embed it. Then, in the latent space of our deep-learning encoder, we can run a k-nearest-neighbors (KNN) search and find known bugs that are similar to the proposed AST. It operates somewhat like a near-miss deep-learning clone finder that only focuses on bug-introducing commits. Then, we can follow the fix chains we found and apply them to the proposed AST.

[Figure: the proposed deep-learning approach for handling chains of fixes]
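The lookup step of this proposal can be sketched in a few lines. This is a minimal illustration under heavy assumptions, not the proposed system: the "embedding" is a plain count of AST node types rather than the output of a learned graph encoder, the code is Python rather than C++ or C#, and `KNOWN_BUGS`, its identifiers, and its fix chains are made-up data for the example.

```python
import ast
import math
from collections import Counter


def embed(source):
    """Stand-in embedding: counts of AST node types (a real system would use a learned encoder)."""
    return Counter(type(node).__name__ for node in ast.walk(ast.parse(source)))


def cosine_distance(a, b):
    """Cosine distance between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0


def nearest_bugs(source, known_bugs, k=1):
    """Return the k known bugs whose embeddings are closest to the proposed code."""
    query = embed(source)
    ranked = sorted(known_bugs, key=lambda bug: cosine_distance(query, embed(bug["code"])))
    return ranked[:k]


# Hypothetical history: each known bug carries the chain of fixes that resolved it.
KNOWN_BUGS = [
    {"id": "BUG-1", "code": "total = items[len(items)]", "fix_chain": ["fix-1a", "fix-1b"]},
    {"id": "BUG-2", "code": "while True:\n    pass", "fix_chain": ["fix-2a"]},
]

match = nearest_bugs("last = values[len(values)]", KNOWN_BUGS, k=1)[0]
print(match["id"], match["fix_chain"])
```

Because the match is a concrete past bug rather than a generated patch, the whole fix chain attached to it can be followed, which is the property that a translation from one buggy version to one fixed version does not provide.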

An added benefit is the traceability of the suggested fixes, as they are mined from and linked to human fixes. We can, therefore, link past bugs and their fixes to any patch recommendation. This would allow developers to understand why a given patch is proposed and the context surrounding its creation.

Another way of identifying regressions in games is to train bots to play the games and report their findings. This will be the topic of an upcoming in-depth article.

The annotated slides presented at ASE are below, and a playlist of the videos contained in the slides can be watched here:

References:

[1] Mathieu Nayrolles and Abdelwahab Hamou-Lhadj. 2018. CLEVER: combining code metrics with clone detection for just-in-time fault prevention and resolution in large industrial projects. In Proceedings of the 15th International Conference on Mining Software Repositories (MSR ’18). ACM, New York, NY, USA, 153-164. DOI: https://doi.org/10.1145/3196398.3196438

[2] Michele Tufano, Jevgenija Pantiuchina, Cody Watson, Gabriele Bavota, and Denys Poshyvanyk. 2019. On learning meaningful code changes via neural machine translation. In Proceedings of the 41st International Conference on Software Engineering (ICSE ’19). IEEE Press, Piscataway, NJ, USA, 25-36. DOI: https://doi.org/10.1109/ICSE.2019.00021

Author
Mathieu Nayrolles (Ubisoft La Forge)