The whole point of sampling a pixel is to estimate the pixel's average colour, $C$. A sampling method is an estimator for $C$ and the value that it produces is an estimate of $C$.
For example, the "uniform random" estimator samples at $n$ uniformly randomly chosen positions, $x_1, x_2, \ldots, x_n$ within $P$ and takes the average:
$\widehat{C} = {1 \over n} \sum_{i=1}^n C(x_i)$
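The uniform random estimator can be sketched in a few lines of Python. The colour function `C` below is a stand-in of my own: a pixel split by a vertical edge at $x = 0.5$, rather than any particular scene.

```python
import random

def C(x, y):
    """Hypothetical colour function for a pixel on an object edge:
    black (0.0) left of x = 0.5, white (1.0) to the right."""
    return 1.0 if x >= 0.5 else 0.0

def uniform_random_estimate(n):
    """The 'uniform random' estimator: average the colour at n
    uniformly random positions within the unit-square pixel."""
    return sum(C(random.random(), random.random()) for _ in range(n)) / n
```

For this pixel the true average colour is 0.5, and the estimate converges on it as $n$ grows.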
The mean and variance of an estimator are important:
The colour, $C$, is a random variable. It can take on values in $\{C_1, C_2, \ldots, C_m\}$. Let $P(C_i)$ be the probability that a uniform random sample in the pixel produces the colour $C_i$.
The mean, or "expectation", of $C$ is the weighted-average value
$E(C) = \sum_i P(C_i) \; C_i$
and the variance of $C$ is the weighted-average squared difference from the mean
$\begin{array}{rl} V(C) & = E( (C - E(C))^2 ) \\ & = \sum_i P(C_i) \; (C_i - E(C))^2 \end{array}$
In this analysis, we'll use the example of a pixel on the edge of an object. The pixel is half black ($C_0 = 0$) and half white ($C_1 = 1$). Then $P(C_0) = P(C_1) = {1 \over 2}$. The mean colour is $\mu = {1 \over 2}$.
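These two definitions are easy to compute directly. A small sketch in Python (the function name is my own):

```python
def mean_and_variance(colours, probs):
    """Mean E(C) and variance V(C) of a discrete colour distribution."""
    mu = sum(p * c for c, p in zip(colours, probs))
    var = sum(p * (c - mu) ** 2 for c, p in zip(colours, probs))
    return mu, var

# The half-black/half-white pixel: C0 = 0, C1 = 1, each with probability 1/2.
mu, var = mean_and_variance([0.0, 1.0], [0.5, 0.5])  # mu = 0.5, var = 0.25
```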
Consider an estimator that makes a single sample at a uniformly randomly chosen position, $x_1$:
$\widehat{C} = C(x_1)$
$\widehat{C}$ can take on values 0 or 1.
The expected value of this estimator is
$\begin{array}{rl} E(\widehat{C}) & = \sum_i P(C_i) \; C_i \\ & = {1 \over 2} C_0 + {1 \over 2} C_1 \\ & = {1 \over 2} 0 + {1 \over 2} 1 \\ & = {1 \over 2} \end{array}$
Since $E(\widehat{C}) = \mu$, this estimator is unbiased.
The variance of this estimator is the average squared error of the estimator:
$\begin{array}{rl} V(\widehat{C}) & = \sum_i P(C_i) \; (C_i - \mu)^2 \\ & = {1 \over 2} (C_0 - \mu)^2 + {1 \over 2} (C_1 - \mu)^2 \\ & = {1 \over 2} (0 - {1 \over 2})^2 + {1 \over 2} (1 - {1 \over 2})^2 \\ & = {1 \over 4} \end{array}$
Since $V(\widehat{C})$ is the average squared error of the estimate, the average error of the estimate will be $\sqrt{V(\widehat{C})} = {1 \over 2}$. That is, on average, the estimate will be ${1 \over 2 }$ away from the mean.
Consider an estimator that computes the average of $n$ samples:
$\widehat{C} = {1 \over n} \sum_{i=1}^n C(x_i)$
The expected value of this estimator is
$\begin{array}{rl} E(\widehat{C}) & = E\!\left( {1 \over n} \sum_{i=1}^n C(x_i) \right) \\ & = {1 \over n} \sum_{i=1}^n E(C(x_i)) \qquad \textrm{by "linearity of expectation"} \\ & = {1 \over n} \cdot n \, E(C) \qquad \textrm{since each sample has the same distribution as } C \\ & = \mu \qquad \textrm{from above} \end{array}$
Since $E(\widehat{C}) = \mu$, this estimator is unbiased.
The variance of this estimator is:
$\begin{array}{rl} V(\widehat{C}) & = V\!\left( {1 \over n} \sum_{i=1}^n C(x_i) \right) \\ & = {1 \over n^2} V\!\left( \sum_{i=1}^n C(x_i) \right) \qquad \textrm{since } V(kC) = k^2 V(C) \\ & = {1 \over n^2} \sum_{i=1}^n V( C(x_i) ) \qquad \textrm{by the Bienaymé formula, since the samples are independent} \\ & = {1 \over n} V(C) \\ \end{array}$
So the variance is reduced by a factor of $n$ with $n$ samples.
But the average error (i.e. the standard deviation, which is the square root of the variance) is reduced by only a factor of $\sqrt{n}$ with $n$ samples.
So taking 9 samples reduces the average error by only a factor of 3, from 0.5 to 0.17.
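This $1/\sqrt{n}$ behaviour can be checked empirically. A sketch, assuming the same half-black/half-white pixel (the trial count and seed are arbitrary):

```python
import random
import statistics

def estimate(n):
    """n-sample uniform random estimate of the half-black/half-white pixel."""
    return sum(1.0 if random.random() >= 0.5 else 0.0 for _ in range(n)) / n

random.seed(42)
for n in (1, 9, 100):
    estimates = [estimate(n) for _ in range(20000)]
    # The standard deviation of the estimator should be close to 0.5 / sqrt(n):
    print(n, round(statistics.pstdev(estimates), 3))
```

For $n = 1$, 9, and 100 this prints standard deviations near 0.5, 0.17, and 0.05 respectively.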
Suppose that the $n$ samples are taken from $n$ different regions (or "strata") having different variances. In the example of a pixel on the edge of an object, some regions would have zero variance.
Given $n$ regions, $r_1, r_2, \ldots, r_n$ with corresponding variances $v_1, v_2, \ldots, v_n$, the estimator is
$\widehat{C} = {1 \over n} \sum_{i=1}^n C(x_i) \quad \textrm{for } x_i \textrm{ restricted to region } r_i$
and the variance of this estimator is
$\begin{array}{rl} V(\widehat{C}) & = {1 \over n^2} \sum_{i=1}^n v_i \end{array}$
In the example, suppose we make 9 regions and sample randomly in each region. Given the average value in each region shown below
the variance in each region can be calculated as
For example, sampling in the upper-left corner will always give a value of 1.0, so the variance of the upper-left corner is 0.0. Sampling in the upper-right corner will give a value of 0.0 80% of the time, and a value of 1.0 20% of the time. So the variance in the upper-right corner is
$\begin{array}{rl} V(C) & = \sum_i P(C_i) \; (C_i - \mu)^2 \\ & = P(C_0) \; (C_0 - \mu)^2 + P(C_1) \; (C_1 - \mu)^2 \\ & = 0.8 \; (0.0 - 0.2)^2 + 0.2 \; (1.0 - 0.2)^2 \\ & = 0.16 \\ \end{array}$
Using the formula for $V(\widehat{C})$ above, the variance of the 9-sample stratified estimator is 0.0093, which corresponds to an average error of 0.096 (= $\small \sqrt{0.0093}$).
This is much better than the $0.17$ average error of the 9-sample uniform random estimator in this particular example.
This occurs because each stratum (i.e. each separate sampling region) has a lower variance than the whole pixel.
If the strata align with the features in the pixel (e.g. if they align with the edge crossing the pixel) such that the variance is zero within each stratum, the overall variance of the estimator will be zero!
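An empirical sketch of stratified sampling, assuming a vertical edge at $x = 0.5$ rather than the figure's layout: only the middle column of a 3×3 grid of strata then straddles the edge, each such stratum has variance 0.25, and the predicted estimator variance is $3 \times 0.25 / 81 \approx 0.0093$.

```python
import random

def C(x, y):
    # Hypothetical edge: black (0.0) left of x = 0.5, white (1.0) to the right.
    return 1.0 if x >= 0.5 else 0.0

def stratified_estimate():
    """One uniform random sample in each cell of a 3x3 grid of strata."""
    total = 0.0
    for i in range(3):
        for j in range(3):
            total += C((i + random.random()) / 3, (j + random.random()) / 3)
    return total / 9

random.seed(7)
estimates = [stratified_estimate() for _ in range(20000)]
mu = sum(estimates) / len(estimates)
var = sum((e - mu) ** 2 for e in estimates) / len(estimates)
# var should come out near 0.0093, versus 0.25 / 9 = 0.028 for the
# unstratified 9-sample estimator.
```

If the strata columns were instead chosen to meet exactly at $x = 0.5$, every stratum would be a constant colour and the variance would drop to zero, as claimed above.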
So use strata that align with sharp colour changes in the pixel. But you have to know, or guess, where those sharp changes are.