Piico의 일상

Pairwise Independence and Universal Hash Functions: An Introduction 본문

수학 (Mathematics)/이산확률 (Discrete Probability)

Pairwise Independence and Universal Hash Functions: An Introduction

Piico 2025. 3. 31. 05:01

Today we talk about what Pairwise independence, mutual independence, k-wise independence is, and discuss how we can artificially create $2^b-1$ pairwise independent random variables using only $b$ coin flips.

 

Definitions: MI, PI, k-wise independence
Independence between every random variable is known as mutual independence. 
Recall that MI is:
\[
    \mathbb{P}[\bigcap_{i \in I} E_i] = \prod_{i \in I}\mathbb{P}[E_i]
\]
where $E_1, ..., E_n$ are a set of events for any subset $I \subseteq [1,n]$.

Mutual independence written in the form of r.v.s is:
\[
    \mathbb{P}[\bigcap_{i\in I} X_i = x_i] = \prod_{i \in I}\mathbb{P}[X_i = x_i]
\]

The simpler form we are familiar with is pairwise independence.
\[
    \mathbb{P}[(X_i=a) \cap (X_j=b)] = \mathbb{P}[X_i=a] \cdot \mathbb{P}[X_j=b]
\]

The more general k-wise independence is defined as:
\[
    \mathbb{P}[\bigcap_{i\in I} X_i = x_i] = \prod_{i \in I}\mathbb{P}[X_i = x_i]
\]
with the condition that $|I| \leq k$.


Construction of Pairwise independence bits
The key idea of pairwise indep. bits is that with only $b$ bits chosen u.a.r., we can create $2^b-1$ pairwise independent bits. This "expansion trick" can be used to reduce time and space complexity of programs that only require pairwise independence.

We first create an ordering of the bits ${1, 2, ..., b}$. These bits are r.v.s $X_1, ..., X_b$, so their values are not yet chosen, but we give an ordering and its usefulness will be apparent soon. Then, create a power set of excluding the empty set $\mathcal{P}(\{1, ..., b\})$ \ $\{\}$, let's call this set S, of which there are $2^b - 1$ subsets as elements: $S = \{S_1, ..., S_{2^b-1}\}$.

Now, we XOR ($\oplus$) all the bits corresponding to the indices in each subset. Set
\[
    Y_j = \bigoplus_{i \in S_j} X_i = \sum_{i \in S_j}X_i \text{ mod 2}, \forall_{j \in \{1, ..., 2^b - 1\}}
\]


Lemma: The $Y_j$ are pairwise independent uniform bits.
Proof: 
Consider the two cases:
Case 1: If the bit $Y_i$ and $Y_j$ don't share any indices (i.e. $S_i \cap S_j = \varnothing$), then they are pairwise independent since they don't depend on the same random bits $X$, and are uniform since each bit $X$ is chosen u.a.r.
Case 2: If they share indices in their corresponding subsets (i.e. $S_i \cap S_j \neq \varnothing$), we use the Principle of Deferred Decisions (PDD). WLOG suppose $S_i$ has an index, call $z$, not in $S_j$. Formally, let $z \in S_i \text{-} \{S_i \cap S_j\}$, then
\[
    Y_i = \left( \bigoplus_{k \in S_i \text{-} \{z\}} X_k \right) \oplus X_z
\]
By PDD, we fix all $X_k$ for $k \neq z$, then since $X_z$ is independent of $X_k$ for $k$ and uniform from these fixed bits, $Y_i$ is uniform.
Now for pairwise independence we WTS: for any distinct $i, j \in \{1,...,2^b-1\}$,
\[
    \mathbb{P}[(Y_i=a) \cap (Y_j=b)] = \mathbb{P}[Y_i=a] \cdot \mathbb{P}[Y_j=b]
\]
By definition of conditional probability and for any $a, b \in \{0, 1\}$,
\[
    \mathbb{P}[(Y_i=a) \cap (Y_j=b)] = \mathbb{P}[Y_i=a|Y_j=b] \cdot \mathbb{P}[Y_j=b]
\]

Continuing the WLOG assumption from above, since $z \in S_i$ and $z \notin S_j$, $X_z$ only affects $Y_i$ and not $Y_j$, so
$\mathbb{P}[Y_i=a|Y_j=b] =  \mathbb{P}[Y_i=a]$ and
\[
    \mathbb{P}[Y_i=a|Y_j=b] \cdot \mathbb{P}[Y_j=b] = \mathbb{P}[Y_i=a] \cdot \mathbb{P}[Y_j=b] = \frac{1}{4}
\]
The same can be said of $Y_j$, and therefore $Y_i, Y_j$ are pairwise independent.


In future discussions, we will see how pairwise independence can be used to derandomize algorithms as well how it is used in hash functions to resolve fundamental issues in the Perfect hashing model.

 

References:

- MU Ch 15.1
- user_194421 (https://math.stackexchange.com/users/351579/user-194421), Representing pairwise-independent but not independent occurrences with venn diagram, URL (version: 2016-07-16): https://math.stackexchange.com/q/1859220