Reservoir Sampling

Reservoir Sampling is a family of randomized algorithms for randomly choosing a sample of k items from a list S containing n items, where n is either a very large or unknown number. Typically, n is large enough that the list doesn't fit into memory.

Essentials

Problem:

  • Choose k entries from n numbers. Make sure each number is selected with the probability of k/n

Basic idea:

  • Choose 1, 2, 3, ..., k first and put them into the reservoir.

  • For k+1 , pick it with a probability of k/(k+1) , and randomly replace a number in the reservoir.

  • For k+i , pick it with a probability of k/(k+i) , and randomly replace a number in the reservoir.

  • Repeat until k+i reaches n

Proof:

  • For k+i , the probability that it is selected and will replace a number in the reservoir is k/(k+i)

  • For a number in the reservoir before (let's say X), the probability that it keeps staying in the reservoir is

    • P(X was in the reservoir last time) x P(X is not replaced by k+i)

    • = P(X was in the reservoir last time) x (1 - P(k+i is selected and replaces X))

    • = k/(k+i-1) x (1 - k/(k+i) x 1/k)

    • = k/(k+i)

  • When k+i reaches n , the probability of each number staying in the reservoir is k/n

Example

  • Choose 3 numbers from [111, 222, 333, 444]. Make sure each number is selected with a probability of 3/4

  • First, choose [111, 222, 333] as the initial reservior.

  • Then choose 444 with a probability of 3/4

  • For 111 , it stays with a probability of

    • P(444 is not selected) + P(444 is selected but it replaces 222 or 333)

    • = 1/4 + 3/4 * 2/3

    • = 3/4

  • The same case with 222 and 333

  • Now all the numbers have the probability of 3/4 to be picked

经典题目

382 Linked List Random Node 398 Random Pick Index

class Solution {
    private ListNode head;
    /** @param head The linked list's head.
        Note that the head is guaranteed to be not null, so it contains at least one node. */
    public Solution(ListNode head) {
        this.head = head;
    }
    /** Returns a random node's value. */
    public int getRandom() {
        ListNode cur = head;
        int ans = cur.val;
        int i = 0;
        Random random = new Random();
        for (; cur != null; i++) {
            if (random.nextInt(i + 1) == i) {
                ans = cur.val;
            }
            cur = cur.next;
        }
        return ans;
    }
}

Last updated