7/19/2019

Dev Blog: Exploit Postmortem

In this Dev Blog, we will take a look at the exploits, how we resolved them, and what that fix means for future exploits.

TIMELINE

MAY 11-12

  • In early May, three exploits gained traction within the Rainbow Six Siege community: the Clash shield exploit, the Claymore exploit, and the Deployable Shield exploit.
  • As the team monitored the situation over the weekend, it became clear to us early on that this was a priority issue.

MAY 13

  • The team began investigating the exploits. Our first step was to accurately and consistently reproduce the exploits in-house to understand the problem, and then to begin identifying the root cause.
  • The initial investigation revealed that we would need time to understand the scope of the problem and a quick fix would not be possible.
  • As the exploits continued to spread throughout the community, we began to discuss plans to mitigate the impact in the short term.

MAY 15

  • Rainbow Six Siege’s project leadership sat down to discuss the exploits. We needed to determine, quantitatively through data, the exact scope and impact of the exploits in-game.

MAY 16

  • The morning of the 16th, after reviewing the data, we made the decision to disable Clash, Claymores, and Deployable Shields.
  • Various teams put their typical work on hold as we explored options to create and deploy a solution that would disable Clash, Claymores, and Deployable Shields.
  • We began testing each possible solution as it was completed. We identified five potential ways to approach the issue and, after a number of failures, finally found success with our last approach.

MAY 17

  • After an expedited first and second round of tests came back successful, we activated the switches that disabled Clash, Claymores, and Deployable Shields as an emergency first response.

MAY 24

  • We realized that a full fix would require more time as it touched upon core system processes, primarily the order in which packets are sent to the server. However, with the understanding that the removal of Clash and key gadgets severely impacted gameplay, teams had begun to work in parallel to create various fail-safes that could be safely introduced.

JUNE

  • Over the next few weeks, exploit reports once again began to surface.
  • However, with the switches already in place, we were able to react quickly to the new reports and disable Clash and IQ.
  • At this point the team had already finalized a complete and operational fix for the underlying cause of the exploits. Internal testing had already shown positive results, but we needed to push it onto the Test Server first for more large-scale testing before we could confidently deploy it on live.

JUNE-JULY

  • In late June, Global Ordering went out to PC first, and later to Console in July.
  • During this time, we monitored the performance of the Global Ordering change and checked for signs of any regression.

SWITCHES + FAIL-SAFES

While part of the team worked on a fix, another part had begun development on several short-term fail-safes early in the process, with the goal of creating a more sustainable short-term solution. These switches and fail-safes, which targeted the exploit methods only, were never intended to be a full fix. Instead, releasing them was a call made by the team after weighing the cost that removing Clash and two gadgets had on gameplay health. They were released to the live servers soon after, with a challenge to the community to report any further occurrences.

The decision to remove Operators and configure loadouts is not one we make lightly. We need to respect the time and effort players have spent unlocking in-game content, and be cognizant of the massive impact that removing core gameplay mechanics can have on the meta and ecosystem of Rainbow Six Siege. However, given the situation, we felt that waiting for the full fix was not an acceptable solution. We needed to nullify the exploits and normalize gameplay as safely and quickly as possible for the players.

The fail-safe solutions needed to satisfy several requirements. They needed to be surgical, so that they would not cause any regression or unintended collateral damage; easily modifiable, for rapid response; and quick to deliver, requiring minimal testing. The resulting Operator and gadget switches were our first response to rapidly address the exploits, with the understanding that they were a last-resort and extremely short-term solution as we prepared and tested our fail-safes.
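To make the idea of an Operator and gadget switch concrete, here is a minimal sketch in Python. It is purely illustrative and is not how the switches are implemented in the game: the `LiveSwitches` class, the `filter_loadout` function, and the placeholder item name are all hypothetical. It only shows why a switch can be surgical and easy to modify: disabling an item is a single data change, and nothing else in the loadout is touched.

```python
from dataclasses import dataclass, field


@dataclass
class LiveSwitches:
    """Hypothetical registry of content that is currently switched off."""
    disabled: set = field(default_factory=set)

    def disable(self, name: str) -> None:
        self.disabled.add(name)

    def enable(self, name: str) -> None:
        # discard() does nothing if the name was never disabled
        self.disabled.discard(name)

    def is_available(self, name: str) -> bool:
        return name not in self.disabled


def filter_loadout(loadout, switches):
    """Return only the loadout entries that are not switched off."""
    return [item for item in loadout if switches.is_available(item)]


switches = LiveSwitches()
switches.disable("Claymore")
switches.disable("Deployable Shield")

# "Some Other Gadget" is a placeholder name, not real game content.
loadout = ["Claymore", "Deployable Shield", "Some Other Gadget"]
available = filter_loadout(loadout, switches)

# Re-enabling is the same kind of one-line change.
switches.enable("Claymore")
available_after_reenable = filter_loadout(loadout, switches)
```

Because the switch is only data, flipping it does not require changing the logic around it, which is the property that keeps testing short.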

WHAT IS GLOBAL ORDERING?

Each Operator, gadget, grenade, shield, camera, wall and plant in Rainbow Six Siege is an object. Every object sends a message (packet) to the server when any step of an action is performed. For example, when you use your gadget, your Operator will send the messages “I am equipping my gadget” and “I am using my gadget” in sequence. At the same time, your gadget is sending its own messages to the server, such as “gadget activated” and “gadget deployed”. The server sends these messages to the other players in your match.

Previously, the order in which these messages were delivered to the server was only guaranteed per game object. This meant that there was a chance of failure when messages went missing or were delivered out of order. The exploits took advantage of that flaw in our network protocol by spamming actions simultaneously, which increased the probability that these network messages would fail to be properly received. The result was incorrect replication through the server, so players’ games would fail to display the intended action.
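The per-object guarantee can be illustrated with a small Python sketch. This is a simplified model under our own assumptions, not the game's network code: the `PerObjectReceiver` class and the per-object sequence numbers are hypothetical. Each object's messages are delivered in that object's order, but nothing fixes the order between the Operator's messages and the gadget's messages, so two players can end up with different overall sequences.

```python
from collections import defaultdict


class PerObjectReceiver:
    """Orders messages within each object, but not across objects."""

    def __init__(self):
        self.next_seq = defaultdict(int)   # next expected number, per object
        self.pending = defaultdict(dict)   # early arrivals, per object
        self.delivered = []                # overall order this player sees

    def receive(self, obj, seq, text):
        self.pending[obj][seq] = text
        # Deliver every message of this object that is now in order.
        while self.next_seq[obj] in self.pending[obj]:
            current = self.next_seq[obj]
            self.delivered.append((obj, self.pending[obj].pop(current)))
            self.next_seq[obj] += 1


def deliver(arrivals):
    receiver = PerObjectReceiver()
    for obj, seq, text in arrivals:
        receiver.receive(obj, seq, text)
    return receiver.delivered


def per_object(delivered, obj):
    return [text for name, text in delivered if name == obj]


# The same four messages, arriving in two different network orders.
player_a = deliver([
    ("operator", 0, "equip gadget"),
    ("gadget", 0, "gadget activated"),
    ("operator", 1, "use gadget"),
    ("gadget", 1, "gadget deployed"),
])
player_b = deliver([
    ("gadget", 1, "gadget deployed"),
    ("gadget", 0, "gadget activated"),
    ("operator", 0, "equip gadget"),
    ("operator", 1, "use gadget"),
])

print(player_a == player_b)  # the overall sequences differ
```

In this model each object's own messages arrive in the right order for both players, yet player B sees the gadget deployed before the Operator has equipped it. That cross-object gap is the kind of inconsistency the paragraph above describes.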

Our solution to the problem, Global Ordering, modifies the network engine to globally order all messages. Messages are also sent in multiples to ensure data is sent reliably. This means that every player’s game now receives the same sequence of messages, and replication is synchronous across each player’s game.
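As a rough illustration of the idea, the Python sketch below stamps every message with one shared sequence number and sends each message twice. It is a simplified model under stated assumptions, not the actual network engine: `GlobalSender`, `GlobalReceiver`, the choice of two copies, and the simulated loss of one packet are all hypothetical. The receiver ignores duplicates and only delivers a message once every earlier message has arrived.

```python
import random


class GlobalSender:
    """Stamps every message, from any object, with one shared counter."""

    def __init__(self, copies=2):
        self.seq = 0
        self.copies = copies  # assumption: each message is sent this many times

    def send(self, obj, text):
        packet = (self.seq, obj, text)
        self.seq += 1
        return [packet] * self.copies


class GlobalReceiver:
    """Delivers messages strictly in global sequence, dropping duplicates."""

    def __init__(self):
        self.next_seq = 0
        self.pending = {}
        self.delivered = []

    def receive(self, packet):
        seq, obj, text = packet
        if seq < self.next_seq or seq in self.pending:
            return  # duplicate copy, already handled
        self.pending[seq] = (obj, text)
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


MESSAGES = [
    ("operator", "equip gadget"),
    ("gadget", "gadget activated"),
    ("operator", "use gadget"),
    ("gadget", "gadget deployed"),
]


def simulate(seed):
    sender = GlobalSender(copies=2)
    packets = []
    for obj, text in MESSAGES:
        packets.extend(sender.send(obj, text))
    rng = random.Random(seed)
    rng.shuffle(packets)  # the network reorders packets
    packets.pop()         # and loses one copy of one message
    receiver = GlobalReceiver()
    for packet in packets:
        receiver.receive(packet)
    return receiver.delivered


# Two players with different network conditions see the same sequence.
print(simulate(1) == simulate(2) == MESSAGES)
```

The trade-off is visible even in this toy version: sending every message more than once costs extra bandwidth, in exchange for every receiver converging on the same sequence.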

When we recognized that all the ongoing exploits were related to this core process, we had to be careful in how we approached Global Ordering. We needed to ensure there would be no harmful side effects from the change. Ultimately, while the change to Global Ordering imposes some additional burden on bandwidth, we felt that the benefits were worth the small increase in bandwidth costs and made the decision to proceed.

WHAT THIS MEANS FOR THE FUTURE

With the new Global Ordering in place, actions are uniformly replicated for all players across a match as they are now ordered at the global level, as opposed to client level. We have evaluated the additional bandwidth cost and, while bandwidth use will increase slightly, it will not affect your gameplay experience. This should prevent any further issues with out-of-order delivery of messages and any similar exploits.

The exploits and their impact on the community have also highlighted the value of Operator and gadget switches, and the need for us to have greater control over Operator and loadout configuration on the live servers. We have since begun to dedicate more resources towards fulfilling this need to ensure that we can respond quickly to the situation in the event of future exploits.
