Statistics & Probability · Medium · Asked at Amazon, Meta, Nielsen

What are the main sampling methods and how can sampling introduce bias?

For Data Scientist, Data Analyst, ML Engineer

The short answer

The main probability sampling methods are simple random sampling, stratified sampling, cluster sampling, and systematic sampling. Bias enters when some units have zero probability of selection, or a selection probability that differs systematically and is not corrected for, as in convenience sampling, survivorship bias, or non-response bias. The sample is then unrepresentative of the target population regardless of its size.

How to think about it

Cover the four standard designs briefly, then spend time on bias sources — that is what distinguishes textbook knowledge from applied thinking in an interview.

Probability sampling methods

Simple random sampling (SRS): Every unit has an equal probability of selection, and every possible sample of size n is equally likely. Unbiased and easy to analyse, but requires a complete sampling frame and may capture too few members of rare subgroups unless the sample is large.
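A minimal sketch of SRS using only the Python standard library. The frame of 1,000 unit IDs and the function name `simple_random_sample` are hypothetical, chosen for illustration.

```python
import random

def simple_random_sample(frame, n, seed=0):
    """Draw n units without replacement; every subset of size n is equally likely."""
    rng = random.Random(seed)  # seeded only so the example is reproducible
    return rng.sample(frame, n)

# Hypothetical sampling frame: a complete list of 1,000 unit IDs.
frame = list(range(1, 1001))
sample = simple_random_sample(frame, 50)
```

Note that the function needs the full `frame` as input, which is the practical constraint mentioned above: without a complete list of units, SRS cannot be carried out.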

Stratified sampling: Divide the population into strata (e.g., age groups, regions), then draw an SRS within each stratum. Ensures representation of small subgroups and typically reduces variance compared to SRS, especially when strata are internally homogeneous.
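One way to sketch stratified sampling is proportional allocation, where the same sampling fraction is applied inside every stratum. This is only one allocation rule among several; the population, the region labels, and the helper `stratified_sample` below are assumptions made for the example.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, seed=0):
    """Proportional allocation: draw the same fraction by SRS within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for label in sorted(strata):
        members = strata[label]
        k = max(1, round(fraction * len(members)))  # keep at least one unit per stratum
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population of (unit_id, region) pairs with one small region.
population = [(i, "north") for i in range(900)] + [(i, "south") for i in range(900, 1000)]
sample = stratified_sample(population, stratum_of=lambda u: u[1], fraction=0.1)
counts = {r: sum(1 for _, region in sample if region == r) for r in ("north", "south")}
```

Here the small "south" stratum is guaranteed its share of the sample, whereas an SRS of the same total size could by chance include very few of its members.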

Cluster sampling: Divide the population into clusters (e.g., schools, cities), randomly sample clusters, then survey all or a random subset of members within selected clusters. Cost-effective when a frame of individuals is unavailable, but members of the same cluster tend to resemble each other (intra-cluster correlation), which inflates variance relative to an SRS of the same size.
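The sketch below shows a one-stage cluster sample and a common approximation for the variance inflation, the design effect 1 + (m - 1) * ICC, which assumes equal cluster sizes m. The school data, the ICC value of 0.05, and the function names are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=0):
    """One-stage cluster sample: pick whole clusters at random, keep every member."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

def design_effect(cluster_size, icc):
    """Approximate variance inflation versus SRS, assuming equal cluster sizes."""
    return 1 + (cluster_size - 1) * icc

# Hypothetical frame: 50 schools with 20 students each.
clusters = {f"school_{i}": [f"s{i}_{j}" for j in range(20)] for i in range(50)}
sample = cluster_sample(clusters, n_clusters=5)  # 5 schools, 100 students

# Assumed intra-cluster correlation of 0.05 for illustration.
deff = design_effect(cluster_size=20, icc=0.05)
n_effective = 1000 / deff  # SRS-equivalent size of a 1,000-unit cluster sample
```

Under these assumed numbers the design effect is about 1.95, so 1,000 clustered observations carry roughly the information of 513 independent ones.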

Systematic sampling: Select every k-th unit after a random start. Efficient and simple, but can introduce periodicity bias if the population has a hidden periodic structure aligned with the step size.
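To illustrate periodicity bias, the sketch below builds a hypothetical daily sales series with a weekly cycle and compares a step size of 7 (aligned with the cycle) against a step size of 10 (not aligned). The sales figures and helper names are invented for the example.

```python
import random

def systematic_sample(frame, k, seed=0):
    """Take every k-th unit after a random start in [0, k)."""
    rng = random.Random(seed)
    start = rng.randrange(k)
    return frame[start::k]

def mean(values):
    return sum(values) / len(values)

# Hypothetical series: 52 weeks of daily sales, weekends (day index 5 and 6) sell double.
sales = [200 if day % 7 in (5, 6) else 100 for day in range(364)]
true_mean = mean(sales)

# Step 7 always lands on the same weekday, so each start sees only one sales level.
means_k7 = [mean(sales[start::7]) for start in range(7)]
# Step 10 is not aligned with the weekly cycle, so every start mixes weekdays.
means_k10 = [mean(sales[start::10]) for start in range(10)]

one_draw = systematic_sample(sales, k=7)
```

With step 7, every possible start yields a sample mean of exactly 100 or 200, never close to the true mean of roughly 128.6. With step 10, all starts land within a few units of it.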

Non-probability sampling and bias

Convenience sampling: Units are selected because they are easy to reach, such as volunteers, web survey respondents, or street intercepts. In most cases, whether a unit ends up in the sample is systematically related to the outcome of interest.

Survivorship bias: Analysing only units that survived a filtering process (e.g., companies still operating, patients who completed treatment) ignores those that were eliminated, distorting conclusions.
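A small simulation can make the distortion concrete. It assumes a hypothetical set of funds whose true average return is 0%, with any fund returning below -2% closed and dropped from the data. All numbers are invented for illustration.

```python
import random
from statistics import mean

rng = random.Random(42)

# Hypothetical funds: average returns (in %) drawn around a true mean of 0.
funds = [rng.gauss(0.0, 5.0) for _ in range(10_000)]

# Assumed filter: funds returning below -2% were closed and vanish from the dataset.
survivors = [r for r in funds if r > -2.0]

mean_all = mean(funds)            # close to the true mean of 0
mean_survivors = mean(survivors)  # noticeably positive
```

An analyst who sees only `survivors` would conclude the typical fund earns a clearly positive return, even though the full population averages about zero.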

Non-response bias: Occurs when people who respond differ systematically from those who don’t. A 10% response rate is dangerous not because it is small but because the 90% who declined may hold different views.
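The following sketch simulates a survey in which response propensity depends on the answer itself. It assumes 70% of a hypothetical customer base is satisfied, and that dissatisfied customers are five times more likely to respond. These rates are illustrative, not empirical.

```python
import random

rng = random.Random(7)

# Hypothetical population: True means the customer is satisfied (70% are).
population = [rng.random() < 0.7 for _ in range(100_000)]

# Assumed response propensities: 5% if satisfied, 25% if dissatisfied.
respondents = [s for s in population if rng.random() < (0.05 if s else 0.25)]

response_rate = len(respondents) / len(population)
true_rate = sum(population) / len(population)
observed_rate = sum(respondents) / len(respondents)
```

The overall response rate comes out near 11%, yet the roughly 11,000 respondents suggest that only about a third of customers are satisfied, far from the true 70%. The problem is who responded, not how many.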

Self-selection bias: Users who opt in to a product feature, study, or platform are not representative of the full population.

Increasing sample size does not cure bias

A larger sample from a biased design produces a more precise wrong answer. The 1936 Literary Digest poll predicted a Landon landslide over Roosevelt using 2.4 million responses and was catastrophically wrong, because its sampling frame (car owners, phone subscribers) skewed toward wealthier, Republican-leaning voters.
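The same effect can be sketched with a simulated electorate. The figures below are hypothetical and are not the actual 1936 data: 30% of voters are assumed wealthy and support candidate A at 40%, while the rest support A at 70%.

```python
import random

rng = random.Random(1936)

# Hypothetical electorate of (wealthy, supports_a) pairs.
voters = []
for _ in range(200_000):
    wealthy = rng.random() < 0.3
    supports_a = rng.random() < (0.4 if wealthy else 0.7)
    voters.append((wealthy, supports_a))

true_share = sum(s for _, s in voters) / len(voters)

# Biased frame: only wealthy voters can be reached (about 60,000 responses).
biased_frame = [s for w, s in voters if w]
biased_est = sum(biased_frame) / len(biased_frame)

# Small SRS of 1,000 drawn from the whole electorate.
small_srs = [s for _, s in rng.sample(voters, 1_000)]
srs_est = sum(small_srs) / len(small_srs)
```

Under these assumptions the biased frame is about sixty times larger than the SRS, yet its estimate sits near 40% while the true share is about 61%. The 1,000-person SRS typically lands within a few points of the truth.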
