Simpson's paradox

Simpson's paradox is a statistical paradox first described by E. H. Simpson in 1951, in which two sets of data both separately support a certain hypothesis, but when considered together support the opposite hypothesis.

As an example, suppose we have two people, Ann and Bob. In the first test, Ann and Bob are let loose on Wikipedia: Ann edits 100 articles, improving 60 of them, while Bob edits 10 articles, improving 9 of them. In the second test, Ann and Bob are again let loose on Wikipedia, and this time Ann edits 10 articles, improving 1 of them, while Bob edits 100 articles, improving 30 of them. So in the first test Ann improved 60% of the articles she edited, while Bob's success rate was 90%, and in the second test Ann managed 10% while Bob achieved 30%. Both sets of data therefore support the hypothesis that Bob's edits are more likely to be beneficial than Ann's. But if we combine the two sets of data, we see that Ann and Bob each edited 110 articles, and that Ann improved 61 (about 55%) while Bob improved only 39 (about 35%). The combined data therefore support the opposite hypothesis, that Ann's edits are more likely to be beneficial than Bob's.
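The arithmetic in this example can be checked with a short script. This is only a minimal sketch: the counts are the ones given above, while the names `results`, `rate` and `combined` are invented here for illustration.

```python
from fractions import Fraction

# (articles improved, articles edited) per editor and test, from the example above
results = {
    "Ann": {"test 1": (60, 100), "test 2": (1, 10)},
    "Bob": {"test 1": (9, 10), "test 2": (30, 100)},
}

def rate(improved, edited):
    """Success rate as an exact fraction."""
    return Fraction(improved, edited)

def combined(editor):
    """Total (improved, edited) counts for one editor over both tests."""
    improved = sum(i for i, _ in results[editor].values())
    edited = sum(e for _, e in results[editor].values())
    return improved, edited

# Per-test comparison: Bob has the higher rate in each test.
for test in ("test 1", "test 2"):
    ann = rate(*results["Ann"][test])
    bob = rate(*results["Bob"][test])
    print(f"{test}: Ann {float(ann):.0%}, Bob {float(bob):.0%}")

# Combined comparison: Ann has the higher rate overall.
for editor in ("Ann", "Bob"):
    improved, edited = combined(editor)
    print(f"combined: {editor} improved {improved} of {edited}")
```

Exact fractions are used instead of floating-point numbers so that the comparisons cannot be affected by rounding.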

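As with many other paradoxes, the contradiction here is merely the result of incorrect reasoning: percentages cannot always be compared without information on the size of the data sets they come from. In the first test, Ann's 60% is based on 100 articles while Bob's 90% is based on only 10; in the second test the sizes are reversed. The correct conclusion is that Ann's edits are more likely to be beneficial than Bob's.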
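One way to see why the data set sizes matter is to write the combined rate as a weighted average of the per-test rates. The symbols below are introduced only for this sketch and are not part of the original example: n_1 and n_2 are the numbers of articles edited in the two tests, and p_1 and p_2 are the corresponding success rates.

```latex
\[
  p_{\text{combined}} = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}
\]
\[
  \text{Ann: } \frac{100 \cdot 0.6 + 10 \cdot 0.1}{110} = \frac{61}{110} \approx 0.55,
  \qquad
  \text{Bob: } \frac{10 \cdot 0.9 + 100 \cdot 0.3}{110} = \frac{39}{110} \approx 0.35
\]
```

Ann's combined rate is dominated by her 100-article test, where she scored 60%, while Bob's is dominated by his 100-article test, where he scored 30%.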

As another example, consider two hospitals. In the first one, 3% of all patients die, while in the second one, only 2% of all patients die. Which hospital is better? These figures alone are not sufficient to decide. It is possible that the first hospital handles many more severe cases than the second one, and in fact handles both severe and non-severe cases better than the second hospital.
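The hospital scenario can be made concrete with invented numbers. In the sketch below every patient count is hypothetical, chosen only so that the overall death rates come out at 3% and 2%; none of these figures comes from real hospital data, and the names `hospitals`, `death_rate` and `overall` are likewise made up for illustration.

```python
from fractions import Fraction

# Hypothetical (deaths, patients) counts, invented for illustration only.
hospitals = {
    "first": {"severe": (25, 500), "non-severe": (5, 500)},
    "second": {"severe": (10, 100), "non-severe": (10, 900)},
}

def death_rate(deaths, patients):
    """Death rate as an exact fraction."""
    return Fraction(deaths, patients)

def overall(name):
    """Death rate over all patients of one hospital."""
    deaths = sum(d for d, _ in hospitals[name].values())
    patients = sum(p for _, p in hospitals[name].values())
    return death_rate(deaths, patients)

for name in hospitals:
    parts = ", ".join(
        f"{group} {float(death_rate(*counts)):.1%}"
        for group, counts in hospitals[name].items()
    )
    print(f"{name} hospital: overall {float(overall(name)):.1%} ({parts})")
```

With these assumed counts the first hospital has the higher overall death rate, yet the lower death rate within each severity group, because half of its patients are severe cases compared with a tenth of the second hospital's.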

All Wikipedia text is available under the terms of the GNU Free Documentation License
