Multi-Armed Bandit in C#

By on 6/1/2012

After reading this fantastic blog post about how to do better than A/B testing (http://stevehanov.ca/blog/index.php?id=132), I decided to create an implementation of the algorithm in C#. As I already had a library dealing with statistical decision making on GitHub, I've added this functionality the nBayes library, which you can find here:

https://github.com/joelmartinez/nBayes - specifically, in the Optimization namespace.

I won't reiterate the pros/cons, and explanation of this algorithm, as Steve does a fantastic job over on his blog ... so I'll just present the sample program I wrote to go along with it. This program uses the API to create three options (the same used in the blog post), and then proceeds to simulate users who prefer to act on one option, then partway (35%) through the exercise changing preference. And you can see how the algorithm correctly begins to primarily show the new choice about halfway through.
/// <summary>This sample program has 3 options, and will use the the optimizer
/// to run a test scenario. Two options are picked, and we will simulate user interest in one,
/// then user sentiment changing to favor the second option.</summary>
private static void TestOptimizer()
{
  var optimizer = new Optimizer();

// define the available options Option[] options = new Option[] { Option.Named("Orange"), Option.Named("Green"), Option.Named("White") }; optimizer.Add(options[0]); optimizer.Add(options[1]); optimizer.Add(options[2]);

// pick two options, and define when user interest will change var firstWinner = options[2]; // white var secondWinner = options[1]; // green var switchRatio = .35f; // 35% of the way through the test set int tries = 100; // the test set

for (int i = 0; i < tries; i++) { Option selected = optimizer.Choose().Result; // don't care about asyncrony in this particular example Console.WriteLine("Choosing: {0}", selected);

// decide which chosen option 'the users' will prefer bool isFirstBatch = (float)tries * switchRatio > i; if (isFirstBatch && selected == firstWinner) firstWinner.IncrementSuccesses(); else if (!isFirstBatch && selected == secondWinner) secondWinner.IncrementSuccesses(); }

Console.WriteLine("\nResults! We expect that ({0}) will have the highest success rate, and ({1}) will be in second place", secondWinner, firstWinner); Console.WriteLine("\nThis is the final result after {0} tries\n{1}", tries, optimizer); }
There are multiple strategies to solve the 'multi-armed bandit' problem. This particular implementation is using 'Epsilon-Greedy'. It's still early, and I have other ideas on how to structure the code in order to facilitate multiple strategies ... but for now, let me know if you've got any thoughts or feedback on this bit of code. Thanks! :-)

See more in the archives