How to do data sampling without replacement?

Sampling without replacement means that once an item has been chosen, it is not returned to the population before the next item is chosen. In other words, you don’t replace the first item you choose before you choose a second. This changes the probabilities of choosing later sample items: the probabilities for the second pick are affected by the result of the first pick. The events are therefore dependent (not independent).

Example: Let’s say you had a population of 7 people, and you wanted to sample 2. Their names are:

  • John
  • Jack
  • Qiu
  • Tina
  • Hatty
  • Jacques
  • Des

You would have the same list of names to choose two people from, and your list of results would be similar, except you couldn’t choose the same person twice:

  • John, Jack
  • John, Qiu
  • Jack, Qiu
  • Jack, Tina…
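The full list of possible results can be enumerated in code. The following is a minimal sketch in Python using the standard library's `itertools.combinations`; the variable names are illustrative and not part of the original example.

```python
from itertools import combinations

names = ["John", "Jack", "Qiu", "Tina", "Hatty", "Jacques", "Des"]

# Every unordered pair of distinct people; nobody is paired with themselves.
pairs = list(combinations(names, 2))

# Show the first few pairs.
for first, second in pairs[:4]:
    print(f"{first}, {second}")

print(len(pairs))  # 21 pairs: 7 * 6 / 2
```

Because `combinations` never repeats an element within a tuple, a result such as "John, John" cannot appear, which is exactly the restriction that sampling without replacement imposes.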

But now, your two items are dependent, or linked to each other. When you choose the first item, you have a 1/7 probability of picking any particular name. But then, assuming you don’t replace the name, you only have six names to pick from. That gives you a 1/6 chance of picking any particular second name. The probability of drawing each pair in the order listed becomes:

  • P(John, Jack) = (1/7) * (1/6) = .024.
  • P(John, Qiu) = (1/7) * (1/6) = .024.
  • P(Jack, Qiu) = (1/7) * (1/6) = .024.
  • P(Jack, Tina) = (1/7) * (1/6) = .024…
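The arithmetic above can be checked with a short Python sketch. It uses `fractions.Fraction` to keep the result exact; the variable names are assumptions made for illustration.

```python
from fractions import Fraction

population_size = 7

# First pick: one specific name out of 7.
p_first = Fraction(1, population_size)
# Second pick: one specific name out of the 6 that remain.
p_second = Fraction(1, population_size - 1)

# Probability of drawing one specific pair in one specific order.
p_ordered_pair = p_first * p_second
print(p_ordered_pair)                   # 1/42
print(round(float(p_ordered_pair), 3))  # 0.024
```

Note that 1/42 is the probability of a pair in a given order (John first, then Jack). If the order does not matter, the same two people can also be drawn the other way round, so the probability doubles to 2/42 = 1/21.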

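As you can probably figure out, I’ve only sampled two items here, so the probabilities only change a little: with replacement, each pair would have a probability of (1/7) * (1/7) = .020 instead of .024. But larger samples taken from small populations can show more dramatic differences.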
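To actually draw a sample without replacement, and to see how the gap grows with sample size, something like the following Python sketch could be used. `random.sample` is the standard library call for drawing without replacement; the helper `p_specific_sequence` is a hypothetical function written for this illustration.

```python
import random
from fractions import Fraction

names = ["John", "Jack", "Qiu", "Tina", "Hatty", "Jacques", "Des"]
n = len(names)

# random.sample draws without replacement, so no name appears twice.
sample = random.sample(names, 2)
print(sample)

def p_specific_sequence(n, k, replace):
    """Probability of drawing one specific ordered sequence of k distinct items from n."""
    p = Fraction(1)
    for i in range(k):
        # With replacement the pool stays at n; without, it shrinks by one per draw.
        remaining = n if replace else n - i
        p *= Fraction(1, remaining)
    return p

for k in (2, 4, 6):
    without = p_specific_sequence(n, k, replace=False)
    with_repl = p_specific_sequence(n, k, replace=True)
    # Ratio of the two probabilities for the same sequence.
    print(k, float(without / with_repl))
```

The printed ratio starts close to 1 for a sample of 2 and grows as the sample size approaches the population size, which is the "more dramatic" effect described above.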