Chaos Dev Diaries Episode 1:

The Phantom Timer Menace

The release of Chaos Agents v1.8.6 has been hotly anticipated by both the Chaos Community and the Popularium team for the many “blockbuster” features that it will bring.

Now that it’s been a few weeks since the announcement, it’s natural to wonder: “What’s taking so long?” “Why aren’t we partying on v1.8.6 already?” “Where are my sweet, sweet drop locations?” “When can we stop hearing match hosts say ‘everybody ready…?’ for the 50th time while the 30s timer is at 50s?” “When, for that matter, can we start our own matches?”

And most importantly, how much longer must we endure Agents on Survive strategy making ‘brilliant’ decisions that cause them to turn around and run right into a “Random Attacker” or, even better, Hamfist?

Great questions! The Chaos Agents dev team wanted to give you an inside look into the many trials and tribulations along the exciting journey to shipping v1.8.6, while also providing you with a classic example of how software development, despite best intentions and efforts, can sometimes devolve into a domino of causes and effects that, while being painful for us devs, can provide wonderful entertainment for the rest of the community. So grab your popcorn and get ready for a ride!

You’re probably aware that the software of Chaos Agents is split into 2 main applications: the server application (and supporting services, together called the MSA) and the client application (or the MCA). The MCA is developed in Unity and is flexible enough to run on Windows, macOS, Linux, or in a Web browser using WebGL/WebAssembly (which is the build we use for the Playtest).

We typically like to handle 1 major feature per release (for example, 1.8.5 was focused on stability, given the frequent crashes with the 1.8.4x versions), but for v1.8.6 we decided to take on a few “semi-large” features that were all clearly burning needs for our community and its growth:

Drop Location Selection: Select where your Agent drops on the map at the start of a match, while being able to view shard distribution for Round 1.
Automated Round Timers: Automated timers for Prep Phases so hosts don’t have to manually start the Battle Phase.
Better Survive AI Pathfinding: Please, o please dear Agent, don’t run into an enemy when you’re trying to survive?
Survive AI Stat Buffs: Giving the Agent buffs when they switch to Survive strategy (base health regen, armor etc.)
Start and Join Matches: Enable players to join matches that they haven’t been invited to, and to start their own (private and public) matches.

Our thinking behind this plan was that each of the above features builds on existing well-tested and modularized code, so we should be able to develop these in parallel without impacting any of the other features.

And to our great joy, things were mostly on course during development, where the core code for the features came together efficiently and on time. But then, we started deploying the code to live environments, and that’s when the real fun started.

During development, our code travels from Github repositories to:

A Development (Dev) environment: This is where basic testing is done.
A Staging environment: This is a pre-production environment where the build is stabilized.
Production environment: This is where players can access the build.

The challenges that we have faced with shipping v1.8.6 were primarily due to the fact that the set of features we were developing, while clearly separated from a code perspective, had cascading effects on each other that were not apparent till we deployed the build to the Dev environment — and even worse, in some cases not until the build went to the Staging environment. And because the issues were due to a combination of factors, they resulted in our having to go back into some of the core plumbing of the Chaos Agents code base to fix them.

So what should have been a 7-10 day testing schedule turned into additional development work that by itself could have been an entire release!

The good news is that we see the light at the end of the tunnel — the current build on staging seems to be running like a champ, so this is a great time to look back and learn from the amusing and bemusing series of events that led to the v1.8.6 adventure.

But before we do that, we wanted to thank the Chaos Community from the bottom of our hearts. Your bug reports, feedback, and engagement with us on Discord and during matches have played the most critical role in helping us track down and resolve some incredibly complex issues that, without your support, would have taken several more devs or several more weeks to resolve.

We cannot thank you enough for your support and also for your patience and encouragement as we worked through these issues to bring you what we think is going to be a banger of an update that we’ll enjoy for many weeks to come.

“v1.8.6 Launch Community Award” — Every single playtester Trainee who has been actively involved in the community over the past 2 months, and has helped us very tangibly with v1.8.6 deserves not just our gratitude and appreciation, but also tangible rewards! So, just reach out to Catalyst, RocketLunch, or Schumaniac to claim your “v1.8.6 Launch Community Award” (it’s a surprise, and we want you to ping us for it as it’ll be slightly different for each community member!)

Now, if you want to learn more about the various scary bosses slowing down v1.8.6 and the strategies used to beat them, read on!

- Schumaniac, Raventhon, Draknab, and Catalyst

Chapter 1: Start of the Adventure — Automated Timer Synchronization

The main culprit for the delay of 1.8.6 has been the implementation of the automatic round timer, which has caused a series of issues that were not apparent till we ran the timer code in a live environment with multiple clients.

We originally thought we’d be able to handle it in a simple way that would let the MSA (server app) and MCA (client app) keep their own individual timers and then sync up at the end of the Prep Phase. However, after initial testing, it quickly became clear that keeping the server time in sync with the client timer required a rewrite of the code to make the server the definitive timekeeping authority in all cases.
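To make "the server is the definitive timekeeping authority" concrete, here is a minimal sketch in Python. It is not our actual code: the names PhaseTimer, remaining_on_client, and clock_offset are hypothetical, and it assumes the server tells clients when a phase ends so that each client derives its display from that deadline instead of running its own countdown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhaseTimer:
    """Server-side record of when the current phase ends (hypothetical name)."""
    phase: str
    ends_at: float  # deadline on the server clock, in seconds


def remaining_on_client(timer: PhaseTimer, client_now: float, clock_offset: float) -> float:
    """Seconds left, derived from the server deadline rather than a local countdown.

    clock_offset is the client's estimate of (server clock - client clock).
    """
    server_now = client_now + clock_offset
    return max(0.0, timer.ends_at - server_now)


# The server starts a 30-second Prep Phase at server time 1000.0.
timer = PhaseTimer(phase="prep", ends_at=1000.0 + 30.0)

# A client whose clock runs 2.5 s behind the server still shows the right value.
print(remaining_on_client(timer, client_now=1007.5, clock_offset=2.5))  # 20.0
```

With this shape, a client that stalls or reconnects can never drift: it recomputes the remaining time from the same server deadline as everyone else.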

The primary reason behind this was that when Chaos Agents v1.0 was being architected (in late 2022), we wanted to get to a proof-of-concept for the core game design and technology. We also wanted to ensure that the game could run in a browser, which led to the creation of an intermediary React application that acts as a broker between the Unity-based MCA and the Python/C++ based MSA.

Due to these two facts, the MCA implemented a polling (or “pull”) mechanism where it would query the server when it needed data (for example, the next batch of actions that an Agent should perform over the next 20 seconds). This was quicker to set up, but not as scalable as an “event-based” or “push” architecture where the server (MSA) would send data to the client (MCA) when it was ready (for example, with the next batch of actions for an Agent).
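The difference between "pull" and "push" can be shown with a small in-memory sketch. Everything here is illustrative: PollingServer, PushServer, and the "agent-1" id are made-up names, and there is no real networking involved.

```python
from typing import Callable, Dict, List, Optional


class PollingServer:
    """Pull model: the client asks, and may be told 'nothing yet'."""

    def __init__(self) -> None:
        self._batches: Dict[str, List[str]] = {}

    def publish(self, agent_id: str, actions: List[str]) -> None:
        self._batches[agent_id] = actions

    def get_actions(self, agent_id: str) -> Optional[List[str]]:
        # Returns None when the batch is not ready, so the client must ask again.
        return self._batches.pop(agent_id, None)


class PushServer:
    """Push model: the client subscribes once and is called when data exists."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[List[str]], None]]] = {}

    def subscribe(self, agent_id: str, callback: Callable[[List[str]], None]) -> None:
        self._subscribers.setdefault(agent_id, []).append(callback)

    def publish(self, agent_id: str, actions: List[str]) -> None:
        for callback in self._subscribers.get(agent_id, []):
            callback(actions)


# Pull: the client keeps asking until the batch shows up.
polling = PollingServer()
wasted_polls = 0
polled = polling.get_actions("agent-1")
while polled is None:
    wasted_polls += 1
    if wasted_polls == 3:  # the server finishes computing the batch "later"
        polling.publish("agent-1", ["move", "attack"])
    polled = polling.get_actions("agent-1")

# Push: the client registers once and receives the batch when it is ready.
received: List[List[str]] = []
push = PushServer()
push.subscribe("agent-1", received.append)
push.publish("agent-1", ["move", "attack"])

print(wasted_polls, polled, received)
```

In the pull version the server has no idea what each client has seen until that client asks again, which is the property that makes timer synchronization awkward.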

The polling mechanism, however, interacted poorly with the automated timer system: it made it very hard for the MSA to know when all the clients were ready (or not) to start the next Battle Phase, among several other related issues.
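One way to picture the "are all clients ready?" problem is as a gate on the server that opens either when every expected client has reported in or when the timer runs out. This is only a sketch under that assumption; ReadyGate and its methods are hypothetical names, not part of the MSA.

```python
from typing import Set


class ReadyGate:
    """Decides when the server may start the next Battle Phase (illustrative)."""

    def __init__(self, expected: Set[str], deadline: float) -> None:
        self.expected = set(expected)
        self.ready: Set[str] = set()
        self.deadline = deadline  # server clock, in seconds

    def mark_ready(self, client_id: str) -> None:
        # Ignore clients that are not part of this match.
        if client_id in self.expected:
            self.ready.add(client_id)

    def should_start(self, server_now: float) -> bool:
        # Start early if everyone is ready, otherwise wait for the deadline.
        return self.ready == self.expected or server_now >= self.deadline


gate = ReadyGate({"alice", "bob"}, deadline=30.0)
gate.mark_ready("alice")
print(gate.should_start(server_now=10.0))  # False: bob has not reported yet
gate.mark_ready("bob")
print(gate.should_start(server_now=12.0))  # True: everyone is ready
```

With polling, the server only learns about readiness whenever a client happens to make its next request, so a gate like this is always working from slightly stale information.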

Chapter 2: The First Boss appears, in the form of Drop Location Selection

To make matters worse, the addition of the Drop Location Selection feature added a new, additional phase that the automated timer needed to keep in sync with the rest of the match.

With a Drop Location phase that also had to be driven by an automated timer, the polling mechanism described above was pulled into even more esoteric interactions.

This led to a situation in early January where every fix for one complex interaction bug revealed another bug, turning our development process into a giant game of whack-a-mole.
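A rough sketch of why one extra phase matters: every phase boundary is another deadline that the server and all clients have to agree on. The phase names follow this post, but the durations and the schedule helper below are hypothetical.

```python
from enum import Enum
from typing import List, Tuple


class Phase(Enum):
    DROP_LOCATION = "drop_location"
    PREP = "prep"
    BATTLE = "battle"


# Hypothetical durations in seconds; the real values may differ.
DURATIONS = {Phase.DROP_LOCATION: 30.0, Phase.PREP: 30.0, Phase.BATTLE: 20.0}


def next_phase(current: Phase) -> Phase:
    # Drop Location happens once at match start; then Prep and Battle alternate.
    if current is Phase.PREP:
        return Phase.BATTLE
    return Phase.PREP


def schedule(start: float, first: Phase, count: int) -> List[Tuple[Phase, float]]:
    """Return (phase, ends_at) pairs computed on the server clock."""
    timeline = []
    phase, t = first, start
    for _ in range(count):
        t += DURATIONS[phase]
        timeline.append((phase, t))
        phase = next_phase(phase)
    return timeline


for phase, ends_at in schedule(0.0, Phase.DROP_LOCATION, 4):
    print(phase.value, ends_at)
```

Each line printed is a transition that a polling client might notice late, which is where the "esoteric interactions" come from.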

Chapter 3: An Epic Weapon is revealed: The legendary orange-tier SSE (Server-Sent Events)!

After a long and thorough review, we made the call in mid-January to switch to a system where the server broadcasts events (SSE) to the clients, instead of continuing the fiddly, hard-to-debug process of evolving the polling architecture.
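For readers who haven't met SSE before: in the standard browser flavor (the one consumed by the EventSource API), each event is a small block of text made of "field: value" lines followed by a blank line. The sketch below formats one such message. Whether our builds use exactly this wire format is an assumption here, and the event name and payload fields are made up for illustration.

```python
import json
from typing import Optional


def format_sse(event: str, payload: dict, event_id: Optional[int] = None) -> str:
    """Build one text/event-stream message: field lines plus a blank line."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    lines.append(f"event: {event}")
    # json.dumps emits a single line by default, so one data field is enough.
    lines.append(f"data: {json.dumps(payload, sort_keys=True)}")
    return "\n".join(lines) + "\n\n"


# Hypothetical event telling clients that the Battle Phase has begun.
message = format_sse("phase_started", {"phase": "battle", "ends_at": 1050.0}, event_id=7)
print(message, end="")
```

The id field is what lets a reconnecting client tell the server which event it saw last.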

Our dev team was able to undertake and finish this extremely complex project in a matter of days. On deploying the first SSE builds to the dev and staging environments, we were delighted to find that SSE solved almost all of the thorny issues that were being caused by the automated timer and its cascading effects.

Chapter 4: A Second Boss joins the first! In the form of Finding Matches and Agent Selection!

In response to substantial player feedback, we’ve taken our first steps towards a more full-featured match-joining and Agent selection experience. This enables several key features for 1.8.6, like the ability to assign Agents to specific users and an improved, more flexible method of joining matches. The latter alleviates the current substantial downtime in tournament play while the playtest managers explicitly add users to each pod.

This has introduced additional complexity that only surfaced once we were in a multi-user test environment. Pre-1.8.6, once the user finished loading into the match, the MCA requested the current state of the match - which users are connected and which agents they’re using.

For 1.8.6, given the switch to the new architecture, the MSA server now handles broadcasting match state to users whenever it updates.

Additionally, to ensure that the same Agent doesn’t get selected multiple times for the same match, the server now receives the match update information as soon as the user hits ‘join match’ in the launcher. This caused users to receive match updates only after another user joined or when the match was started, causing further synchronization issues among clients prior to the start of a match.
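Here is a toy model of that situation. It reserves an Agent at join time and broadcasts the match state on every change, and it shows how a client that starts listening after its own join hears nothing until the next change. Sending a snapshot on subscribe is one common remedy, shown here as an option. It is not necessarily the fix we shipped, and MatchLobby and its methods are hypothetical names.

```python
from typing import Callable, Dict, List

MatchState = Dict[str, str]  # user -> agent


class MatchLobby:
    def __init__(self) -> None:
        self.agents_by_user: MatchState = {}
        self._listeners: List[Callable[[MatchState], None]] = []

    def join(self, user: str, agent: str) -> bool:
        """Reserve an Agent when the user hits 'join match'; reject duplicates."""
        if agent in self.agents_by_user.values():
            return False
        self.agents_by_user[user] = agent
        self._broadcast()
        return True

    def subscribe(self, listener: Callable[[MatchState], None],
                  send_snapshot: bool) -> None:
        """Called once a client has finished loading into the match."""
        self._listeners.append(listener)
        if send_snapshot:
            listener(dict(self.agents_by_user))

    def _broadcast(self) -> None:
        for listener in self._listeners:
            listener(dict(self.agents_by_user))


lobby = MatchLobby()
print(lobby.join("alice", "agent-A"))  # True: the Agent was free
print(lobby.join("bob", "agent-A"))    # False: already taken

late_client: List[MatchState] = []
lobby.subscribe(late_client.append, send_snapshot=False)
print(late_client)  # []: nothing arrives until someone else joins

lobby.join("bob", "agent-B")
print(late_client)  # now contains the state with both users
```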

Chapter 5: … And here comes a Mini-boss in the form of “Survive” strategy changes!

A side effect of the switch to SSE and the other major code changes was that the code changing the behavior of Agents under Survive strategy (to produce better “escape paths” and to give on-the-fly buffs to Agents on Survive) could not be tested in the live environment until after the SSE code had been implemented and stabilized.

Why so, in the name of the Imperious Bonecrusher, you ask? This is because the AI and Effects logic that needs to be tested interacts in interesting ways with the Prep Phase and Battle Phase transitions, and would also be affected by the automated timer.

So, while we could test these features in isolation, it was not possible to start testing them on the Staging environment until after the SSE code had been deployed and stabilized.

Chapter 6: The Final Boss and Upgrades to the Epic SSE Weapon

But the adventure was not yet over. The change to SSE solved several problems, but caused a few of its own due to the major nature of the transition:

Server-broadcast events weren’t properly handling disconnects that happened during battle phases, resulting in lost agent action data. Because the pre-1.8.6 design required the frontend to receive and process all battle phase data before advancing to the prep phase, matches in some cases refused to advance, stalling with 0:01 or 0:02 left on the battle phase clock.
As a result of the switch to server-broadcasted events, we’ve had to update the methods that we use to process data within Unity. The WebGL build we all use to play the game on our servers uses the custom React app as a broker interface between the frontend and the backend, but when we’re developing and testing purely within Unity, we don’t have access to any of the React logic.
To help with the Unity testing problem described above, we originally attempted to maintain our past polling structure (where the MCA sends calls to the MSA and receives responses containing the data to render, which is what we’ve been using in Unity) alongside the new broadcast events. Unfortunately, this caused some issues where data would be processed in an incorrect order, causing hangs. To fix this, we’ve implemented a new parser for the MCA and we’re planning on making further improvements for the next release as well.
When selecting drop locations for units, network delay caused units to be drawn in the right position at first and then briefly rubber-band back to their previous location.
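The "data processed in an incorrect order" problem from the list above is a classic one, and a common way to handle it is to number events and hold back anything that arrives early. The sketch below shows that general technique. It is not a description of the new MCA parser, and it assumes each event carries a sequence number, which is an invention for this example.

```python
from typing import Dict, List


class OrderedEventBuffer:
    """Releases events strictly in sequence order (illustrative only)."""

    def __init__(self, first_seq: int = 1) -> None:
        self._next = first_seq
        self._pending: Dict[int, str] = {}

    def receive(self, seq: int, event: str) -> List[str]:
        """Store an event and return every event that is now ready, in order."""
        if seq < self._next:
            return []  # duplicate or stale event; ignore it
        self._pending[seq] = event
        ready = []
        while self._next in self._pending:
            ready.append(self._pending.pop(self._next))
            self._next += 1
        return ready


buffer = OrderedEventBuffer()
processed: List[str] = []
# Events arrive out of order: 2 shows up before 1.
for seq, event in [(2, "battle_actions"), (1, "prep_ended"), (3, "battle_ended")]:
    processed.extend(buffer.receive(seq, event))

print(processed)  # ['prep_ended', 'battle_actions', 'battle_ended']
```

The trade-off is that a single missing event holds up everything behind it, so a real system also needs a way to re-request or skip gaps.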

In the past 48 hours, we have implemented and tested fixes for all of the above. So, while v1.8.6 is not quite ready yet, unless there’s a “Surprise End Boss” hidden somewhere, we should be on the home stretch of our grand adventure.

Once again — thanks to each and every member of the Chaos community for helping us navigate these epic quests to bring home some epic rewards for us all!