Multi-Armed Bandits - Best for Mobile Game Optimization

Andre Cohen |

Multi-Armed Bandits - Best for Mobile Game Optimization

For game developers, A/B testing is part of the standard process for deploying new game features. With a well designed A/B test, developers can get insights about the impact of said feature, how players respond, and if the feature takes the game in the intended direction. However, practically speaking, A/B testing leaves a lot of money on the table because it fundamentally assumes that there is a “best” variant. And, if there is a “best” variant, it must consequentially be applied to the entire user base.

This industry-wide problem is particularly apparent in testing offers and promotions in games. Testing whether the “Thanksgiving Sale” at $9.99 is better than the “Black Friday Sale” for $4.99 using traditional A/B testing methods does not accomplish the goal of maximizing revenue for the developer. First, the A/B test very likely won't reach statistical significance for at least 6 weeks (making the test completely irrelevant). Second, while conducting the test, revenue is not being maximized, costing the developer thousands of dollars by offering poor converting variants. Third, the test assumes that there is a “best” offer to show all players in the game regardless of spending and playing behavior.

The Multi-Armed Bandits (MAB) framework solves these three shortcomings by not assuming that there is a best variant and reaching optimal performance faster than traditional A/B tests. Here is why Gondola migrated to MAB:

Learn vs. Earn

What is often forgotten is the opportunity cost of A/B tests. While an A/B test is running, there is a cost associated with offering less optimal variants as often as the best variants. MAB, on the other hand, continuously switches between the learning phase (used to learn how different variants perform) and earning phase where optimal variants are used to maximize performance.

More Than One Solution

Anyone that has tried to run an A/B test to compare similar (but different) offers associated with an in-game event knows that the results are not binary. Though while planning an experiment it was idealized that players would all gravitate towards one of two variants, the reality is that the difference in conversion between the variants is less than 5%. There are a hundred reasons why this is the case. The most common being that two large player segments with very different preferences for variants cancel each other out in an A/B testing scenario (in mobile games the textbook example is how Android and iOS users show different preferences for offers). The fact of the matter is, there are two good variants, and it is unreasonable to expect the game’s Product Manager (PM) to choose just one. This is where MAB comes in because it never phases out a variant completely. Contextual MAB, which considers the player’s profile, can learn to channel different variants to different players.

Instead of treating every player the same way, contextual MAB attempts to make the best decision regarding learn vs. earn and which variant to present each player based on their attributes/behaviors. Player attributes can include session count, lifetime spending, player progress in the game, previously purchased offers, country, and pretty much any analytics data the game has been recording.


The conversion funnel that occurs in the first 10 minutes of gameplay is thoroughly optimized at launch. However, games are “living creatures" that are continuously growing with new levels, additional game modes, and in-game content for purchase. Externally, the game market is also evolving with new games competing for the same players, User Acquisition (UA) sources coming and going, and the fluctuating quality of players acquired. How optimal are the results from the original A/B test a year down the road? Yes, the game might still be converting players, but chances are that the funnel is no longer optimal.

With MAB, there is never a point when the algorithm determines the absolute best variant in an optimization. This characteristic is a big difference from A/B testing, where there is a point at which “statistical significance” is reached, and the PM can choose the variant with the most favorable outcome with certainty (not a chance occurrence). While optimizing a feature or offer with MAB, there is some traffic allocated to “double check” if non-optimal variants are performing better. This process is captured by the concept of regret: the difference between the total potential reward (ex. revenue) if the optimal variant was selected and the reward accrued from the actual choices made thus far in the optimization. Since MAB is always trying to minimize regret, it naturally limits the amount of traffic allocated to learning.


The game industry is at the forefront of game optimization. The reason A/B testing is widespread is that it offers the ability to reduce the risk of releasing game features and offers while at the same time delivering assurance that the results are not by chance. However, A/B testing is hard to execute due to the volatile player base, low conversion rates (often increasing the minimum number of players to impractical levels), and a long time to reach statistical significance. Today, MAB benefits from the decades of research and algorithms that deliver faster times to optimal performance, do not require the assumption that there is one best variant for all players, and continuously reevaluate to ensure the right variant is used at all times.