Why A/B Testing is Confusing to Me

Let’s say you have two versions of a design and you want to see which one converts users better. You show “Design A” to half the users and “Design B” to the other half. Say “Design B” does better. Now you have your answer: use “Design B”.

But let’s say you have three designs. You start out:

A vs. B = B wins

so you pit the winner against C:

B vs. C = C wins

Is C automatically the best because it beat the winner of the first test? Probably. Well. Maybe? To find out, you pit A against C.

A vs. C = A wins

Uh oh, now all three designs have one victory. Which is best?

Well, you probably wouldn’t use win/lose as the criterion; you’d compare actual conversion numbers. So you’d run all three designs for equal lengths of time, and whichever had the highest conversion rate is the best of the three. That means you can’t run one test and then later do another. You really need to do A/B/C testing so all the data comes from the same period and fewer unknowable external factors are involved. The length of time also needs to be long enough that randomness is mitigated.
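Something like this little sketch is what I mean by comparing rates instead of pairwise wins. The visitor and conversion counts here are made up, just to show the shape of it:

```python
# Hypothetical results from running A, B, and C over the same period.
# "visitors" = people who saw the design, "conversions" = people who converted.
results = {
    "A": {"visitors": 5120, "conversions": 154},
    "B": {"visitors": 5087, "conversions": 171},
    "C": {"visitors": 5103, "conversions": 163},
}

# Compare conversion rates directly instead of win/lose verdicts.
rates = {
    name: data["conversions"] / data["visitors"]
    for name, data in results.items()
}

for name, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"Design {name}: {rate:.2%}")

best = max(rates, key=rates.get)
print(f"Best of the three (by raw rate): Design {best}")
```

In practice you’d also want some kind of significance test before declaring a winner, since a small difference in rates can just be noise if the test didn’t run long enough.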

So yeah, not really “confusing” so much as A/B testing isn’t the right term when comparing more than two things. More like A/B/…/X testing, or “Let’s compare the effectiveness of some designs and not screw up the testing.”