Reservoir Sampling
Reservoir Sampling is a family of randomized algorithms for randomly choosing a sample of k items from a list S containing n items, where n is either a very large or unknown number. Typically, n is large enough that the list doesn't fit into memory.
Essentials
Problem:
Choose
kentries fromnnumbers. Make sure each number is selected with the probability ofk/n
Basic idea:
Choose
1, 2, 3, ..., kfirst and put them into the reservoir.For
k+1, pick it with a probability ofk/(k+1), and randomly replace a number in the reservoir.For
k+i, pick it with a probability ofk/(k+i), and randomly replace a number in the reservoir.Repeat until
k+ireachesn
Proof:
For
k+i, the probability that it is selected and will replace a number in the reservoir isk/(k+i)For a number in the reservoir before (let's say
X), the probability that it keeps staying in the reservoir isP(X was in the reservoir last time)xP(X is not replaced by k+i)=
P(X was in the reservoir last time)x(1 - P(k+i is selected and replaces X))=
k/(k+i-1)x(1 - k/(k+i) x 1/k)=
k/(k+i)
When
k+ireachesn, the probability of each number staying in the reservoir isk/n
Example
Choose
3numbers from[111, 222, 333, 444]. Make sure each number is selected with a probability of3/4First, choose
[111, 222, 333]as the initial reservior.Then choose
444with a probability of3/4For 111 , it stays with a probability of
P(444 is not selected)+P(444 is selected but it replaces 222 or 333)=
1/4+3/4*2/3=
3/4
The same case with
222and333Now all the numbers have the probability of
3/4to be picked
经典题目
382 Linked List Random Node 398 Random Pick Index
Last updated