Sampling requires that we make a statistical inference about the population from a small set of observations.
We can generalize properties from the sample to the population. This process of estimation and generalization is much faster than working with all possible observations, but the resulting estimates will contain errors. In many cases, we can quantify the uncertainty of our estimates and add error bars, such as confidence intervals.
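As a rough illustration of quantifying that uncertainty, the sketch below estimates a population mean from a small random sample and attaches an approximate 95% confidence interval. The population (Gaussian, mean 50, standard deviation 5), the sample size of 200, and the normal approximation for the interval are all assumptions chosen for the example, not a prescription.

```python
import random
import statistics

random.seed(1)

# Hypothetical population: 100,000 Gaussian values (mean 50, std 5).
population = [random.gauss(50, 5) for _ in range(100_000)]

# Draw a small simple random sample without replacement.
sample = random.sample(population, 200)

sample_mean = statistics.mean(sample)
# Standard error of the mean, using the sample standard deviation.
std_error = statistics.stdev(sample) / len(sample) ** 0.5

# Critical value for a two-sided 95% interval under a normal approximation.
z = statistics.NormalDist().inv_cdf(0.975)
lower = sample_mean - z * std_error
upper = sample_mean + z * std_error

print(f"population mean: {statistics.mean(population):.3f}")
print(f"sample mean:     {sample_mean:.3f}")
print(f"95% interval:    [{lower:.3f}, {upper:.3f}]")
```

The interval describes the uncertainty of the estimate under repeated sampling; any single interval may or may not contain the true population mean.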
There are many ways to introduce errors into your data sample.
The two main types of error are selection bias and sampling error.
- Selection Bias. Caused when the method of drawing observations skews the sample in some way.
- Sampling Error. Caused when the random nature of drawing observations skews the sample in some way.
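The difference between the two can be seen in a small simulation. This is a minimal sketch under assumed conditions: the Gaussian population, the sample size of 50, and the "only values above 50 can be observed" rule are hypothetical choices made to produce a visible bias.

```python
import random
import statistics

random.seed(1)

# Hypothetical population: 100,000 Gaussian values (mean 50, std 5).
population = [random.gauss(50, 5) for _ in range(100_000)]
pop_mean = statistics.mean(population)

# Sampling error: each random sample gives a slightly different mean,
# but the means scatter around the population mean.
random_means = [
    statistics.mean(random.sample(population, 50)) for _ in range(1000)
]

# Selection bias: a hypothetical collection method that can only
# observe values above 50, so every sample is skewed the same way.
reachable = [x for x in population if x > 50]
biased_means = [
    statistics.mean(random.sample(reachable, 50)) for _ in range(1000)
]

avg_random = statistics.mean(random_means)
avg_biased = statistics.mean(biased_means)

print(f"population mean:              {pop_mean:.3f}")
print(f"average of random-sample means: {avg_random:.3f}")
print(f"spread of random-sample means:  {statistics.stdev(random_means):.3f}")
print(f"average of biased-sample means: {avg_biased:.3f}")
```

Averaging over many random samples cancels out sampling error, whereas the biased samples stay offset from the population mean no matter how many are drawn.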
Other types of errors may be present, such as systematic errors in the way observations or measurements are made.
In these cases and more, the statistical properties of the sample may be different from what would be expected in the idealized population, which in turn may impact the properties of the population that are being estimated.
Simple methods, such as reviewing raw observations, summary statistics, and visualizations, can help expose simple errors, such as measurement corruption and the over- or underrepresentation of a class of observations.
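One such simple check is to compare class proportions in a sample against what is expected. The sketch below assumes a hypothetical labeled population (70% class "a", 30% class "b") stored in sorted order, and a flawed collection method that takes only the first 500 records; the helper `class_proportions` is an illustrative name, not part of any library.

```python
import random
from collections import Counter

random.seed(1)

# Hypothetical population stored in sorted order: 70% "a", 30% "b".
population = ["a"] * 7000 + ["b"] * 3000
labels = ["a", "b"]


def class_proportions(observations, labels):
    """Return the fraction of observations belonging to each label."""
    counts = Counter(observations)
    return {label: counts[label] / len(observations) for label in labels}


# A simple random sample should roughly preserve the class balance.
random_sample = random.sample(population, 500)

# A flawed method: taking the first 500 records of the sorted data.
head_sample = population[:500]

expected = class_proportions(population, labels)
random_props = class_proportions(random_sample, labels)
head_props = class_proportions(head_sample, labels)

print("population:   ", expected)
print("random sample:", random_props)
print("head sample:  ", head_props)
```

Here the summary alone shows that class "b" is missing from the head sample, which would be hard to notice from any downstream estimate.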
Nevertheless, care must be taken both when sampling and when drawing conclusions about the population from a sample.