Original notebook: statisticalTools.ipynb

Common Statistical Tools

Author: Yannick Copin y.copin@ipnl.in2p3.fr

Abstract: A restricted number of statistical tools are presented, with emphasis on details sometimes overlooked in the literature. This may be too simple, or too technical, or whatever: comments and suggestions are welcome.

[1]:
# NB: this cell includes technical LaTeX definitions.

# $
# \renewcommand{\d}{\mathrm{d}}
# \renewcommand{\v}[1]{\boldsymbol{#1}} % Vector
# \newcommand{\m}[1]{\mathsf{#1}} % Matrix
# %\DeclareMathOperator{\Var}{Var}
# \DeclareMathOperator{\Cov}{Cov}
# %\DeclareMathOperator{\med}{med}
# $
[2]:
import numpy as np
from scipy import stats
from scipy import linalg
import matplotlib.pyplot as plt
[3]:
%matplotlib inline

Sample statistics

We define two test datasets:

Case without intrinsic dispersion: \(\boldsymbol{x} = (x_1,\ldots, x_N)\) is a sample of \(N\) measurements of the same null quantity, with (usually estimated) Gaussian measurement uncertainties \(\d x_{i}\) drawn from a log-normal distribution (i.e., \(\log\d x_i^{2}\) follows a normal distribution, \(\mathcal{N}(0,\,0.3^2)\) in the example below).

Case with intrinsic dispersion: \(\boldsymbol{y} = (y_1,\ldots, y_N)\) is an intrinsically dispersed \(\mathcal{N}(0,\varsigma^2)\) sample, measured with the same Gaussian measurement uncertainties \(\d x_{i}\) as above. Unfortunately, the intrinsic dispersion \(\varsigma\) is usually not known a priori.

[4]:
n = 100

rng = np.random.default_rng(12345)              # Random generator

dx = np.sqrt(rng.lognormal(sigma=0.3, size=n))  # Log-normally distributed uncertainties
x = rng.normal(loc=0, scale=dx)                 # Pure gaussian errors
y = x + rng.normal(size=n)                      # Gaussian uncertainties + intrinsic dispersion ς=1
[5]:
fig, ax = plt.subplots()

ax.errorbar(np.arange(n), x, yerr=dx, fmt='bo', label='x: no intrinsic dispersion (ς = 0)')
ax.errorbar(np.arange(n) + 0.2, y, yerr=dx, fmt='rs', label='y: x + intrinsic dispersion ς > 0')
ax.set(xlabel="Experiment", ylabel="Value")
ax.legend();
../_images/Cours_statisticalTools_8_1.png

Maximum-likelihood estimators

Sample location

Arithmetic average:

\[\begin{split}\begin{align} \hat{\mu} &= \frac{1}{N}\sum_i x_i \\ \text{Var}(\hat{\mu}) &= \frac{\hat{\sigma}^{2}}{N} \end{align}\end{split}\]
[6]:
mu = np.mean(x)                       # Mean
dmu = np.std(x, ddof=1) / np.sqrt(n)  # Standard error on mean
print(f"Arithmetic mean: µ = {mu:.4f} ± {dmu:.4f}")
Arithmetic mean: µ = 0.0204 ± 0.1067

Weighted (« optimal ») average:

\[\begin{split}\begin{align} \hat{\mu}_w &= \frac{\sum_i w_i x_i}{\sum_i w_i} \quad\text{with}\quad w_i = \frac{1}{\d x_{i}^{2} + \varsigma^2} \\ \text{Var}(\hat{\mu}_w) &= \frac{1}{\sum_i w_i} \end{align}\end{split}\]
[7]:
muw, sumw = np.average(x, weights=dx ** -2, returned=True)  # "Optimally" (inverse-variance) weighted mean
dmuw = np.sqrt(1 / sumw)                                    # Uncertainty on weighted mean
print(f"x weighted mean:               µ = {muw:.4f} ± {dmuw:.4f}")
print(f"y weighted mean (wrong disp.): µ = {np.average(y, weights=dx ** -2):.4f} ± {dmuw:.4f}")
muyw, sumwy = np.average(y, weights=1 / (1 + dx ** 2), returned=True)
dmuyw = np.sqrt(1 / sumwy)
print(f"y weighted mean (true disp.):  µ = {muyw:.4f} ± {dmuyw:.4f}")
x weighted mean:               µ = 0.0455 ± 0.0976
y weighted mean (wrong disp.): µ = 0.0441 ± 0.0976
y weighted mean (true disp.):  µ = 0.0252 ± 0.1411

Attention:

  • In the optimally weighted mean, \(\sigma_i^2 = 1/w_{i}\) is supposed to be the total variance of normally-distributed \(x_i\), therefore including both measurement error \(\d x_i\) and intrinsic distribution dispersion \(\varsigma\) if any: \(\sigma_i^2 = \d x_i^2 + \varsigma^2\).

    If the weights are not the true inverse variances, the weighted average remains unbiased but is no longer optimal (minimal variance), and \(1/\sum_i w_i\) becomes a biased estimate of its true variance.

  • The reduced \(\chi^2\) around the estimated mean, defined as \(\chi^2_\mu = 1/(N-1) \times \sum_i (x_i - \mu_w)^2 / \sigma_i^2\), should not be significantly different from one if \(\sigma_i\) is compatible with the total (observed) dispersion.

    If this is not the case, \(\sigma_i\) has probably been misestimated, either because it ignores the intrinsic dispersion \(\varsigma\), or because the measurement uncertainties \(\d x_i\) themselves are misestimated, or both.

[8]:
print(f"DoF: k = {n - 1}")
chi2 = np.sum(((x - muw) / dx) ** 2)
p = stats.distributions.chi2.sf(chi2, n - 1)
z = max(0, stats.distributions.norm.isf(p))    # z-value
print(f"x (no disp.):    χ² = {chi2:6.2f}, p-value = {p:.3f} ({z:.1f}-σ)")
chi2 = np.sum(((y - np.average(y, weights=dx ** -2))/dx) ** 2)  # Wrong dispersion: sigma_i**2 = dx**2
p = stats.distributions.chi2.sf(chi2, n - 1)
z = max(0, stats.distributions.norm.isf(p))    # z-value
print(f"y (wrong disp.): χ² = {chi2:6.2f}, p-value = {p:.3f} ({z:.1f}-σ)")
chi2 = np.sum(((y - muyw) / np.hypot(1, dx)) ** 2)              # Correct dispersion: sigma_i**2 = sigma**2 + dx**2
p = stats.distributions.chi2.sf(chi2, n - 1)
z = max(0, stats.distributions.norm.isf(p))    # z-value
print(f"y (true disp.):  χ² = {chi2:6.2f}, p-value = {p:.3f} ({z:.1f}-σ)")
DoF: k = 99
x (no disp.):    χ² =  93.89, p-value = 0.626 (0.0-σ)
y (wrong disp.): χ² = 220.03, p-value = 0.000 (6.5-σ)
y (true disp.):  χ² = 109.42, p-value = 0.223 (0.8-σ)
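As an illustration of the points above, the unknown intrinsic dispersion \(\varsigma\) can be estimated by requiring the reduced \(\chi^2\) around the weighted mean to equal one. The following is a minimal sketch of this common heuristic (the helper estimate_intrinsic_dispersion and the use of scipy.optimize.brentq are illustrative choices, not part of the original notebook):

from scipy import optimize

def estimate_intrinsic_dispersion(val, err, smax=10.):
    """Estimate the intrinsic dispersion ς such that the reduced χ² around
    the weighted mean equals one (simple heuristic, no uncertainty on ς)."""

    def red_chi2(s):
        w = 1 / (err ** 2 + s ** 2)               # Trial total inverse variance
        muw = np.average(val, weights=w)          # Weighted mean for trial ς
        return np.sum(w * (val - muw) ** 2) / (len(val) - 1)

    if red_chi2(0) <= 1:                          # Data compatible with ς = 0
        return 0.
    return optimize.brentq(lambda s: red_chi2(s) - 1, 0, smax * np.std(val))

sig_int = estimate_intrinsic_dispersion(y, dx)
print(f"Estimated intrinsic dispersion: ς ≈ {sig_int:.3f}")  # Simulated value: ς = 1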

Sample scale

Unweighted: The (supposedly) unbiased sample variance with respect to the unknown mean \(\mu\) (but see the discussion in Oliphant, 2006) is:

\[\begin{split}\begin{align} \hat{\sigma}^2 &= \frac{1}{N-1} \sum_i (x_i - \hat{\mu})^2 \\ \text{Var}(\hat{\sigma}^2) &= \frac{1}{N}\left(m^4 - \frac{N-3}{N-1}\sigma^4\right) \\ &\simeq \frac{1}{N} \times 2\hat{\sigma}^4 \quad\text{in the normal approximation} \\ \text{therefore}\quad \text{Var}(\hat{\sigma}) &\simeq \frac{1}{2N}\hat{\sigma}^2 \\ \text{Ahn \& Fessler (2003) suggest:}\quad \text{Var}(\hat{\sigma}) &= \frac{1}{2(N-1)}\hat{\sigma}^2 \quad\text{for}\quad N > 10. \end{align}\end{split}\]
[9]:
sig = np.std(x, ddof=1)
dsig = sig / np.sqrt(2 * (n-1))  # Normal approximation, n>10
print(f"Dispersion: σ = {sig:.4f} ± {dsig:.4f}")
Dispersion: σ = 1.0666 ± 0.0758

Weighted:

\[\hat{\sigma}_w^2 = \frac{\sum w_i}{\left(\sum w_i\right)^2 - \sum w_i^2} \times \sum_i w_i (x_i - \hat{\mu}_w)^2.\]
[10]:
w = dx ** -2
sumw = np.sum(w)
sigw = np.sqrt( np.sum(w * (x - muw) ** 2) / sumw )
print(f"x Weighted dispersion (biased):                σ = {sigw:.4f}")
sigw = np.sqrt( np.sum(w * (x - muw) ** 2) * sumw / (sumw ** 2 - np.dot(w, w)) )
print(f"x Weighted dispersion (unbiased):              σ = {sigw:.4f}")
sigyw = np.sqrt( np.sum(w * (y - np.average(y, weights=w)) ** 2) * sumw / (sumw ** 2 - np.dot(w, w)) )
print(f"y Weighted dispersion (wrong disp., unbiased): σ = {sigyw:.4f}")
wy = 1 / (1 + dx ** 2)  # Weights from total variance (incl. intrinsic ς = 1)
sigyw = np.sqrt( np.sum(wy * (y - muyw) ** 2) * sumwy / (sumwy ** 2 - np.dot(wy, wy)) )
print(f"y Weighted dispersion (true disp., unbiased):  σ = {sigyw:.4f}")
x Weighted dispersion (biased):                σ = 0.9461
x Weighted dispersion (unbiased):              σ = 0.9513
y Weighted dispersion (wrong disp., unbiased): σ = 1.4563
y Weighted dispersion (true disp., unbiased):  σ = 1.4838

Weighted RMS is usually defined as:

[11]:
def wrms(x, dx):
    w = 1/dx**2
    muw,sumw = np.average(x, weights=w, returned=True)
    return np.sqrt( np.sum(w*(x-muw)**2)/sumw )

wrmsx = wrms(x,dx)
print(f"WRMS = {wrmsx:.4f}")
WRMS = 0.9461

This strictly corresponds to the biased weighted variance. When using inverse-variance weighting \(w_i = 1/\sigma_i^2\), one has \(\mathrm{WRMS}^2 = \chi^2 \times \sigma_{\mu_w}^2\). Under the assumption that the intrinsic scatter is null (\(\varsigma\equiv 0\)), the WRMS can be used as a proxy for the rescaled uncertainty on the weighted mean when the errors \(\d x_i\) were misestimated by a multiplicative factor:

\[\sigma_{\mu_w}^{*2} = \chi^2_\nu \times \sigma_{\mu_w}^2 = \mathrm{WRMS}^2/(N-1).\]
[12]:
muw, sumw = np.average(x, weights=(0.1*dx) ** -2, returned=True)         # Errors are wrong by a factor 10
dmuw = np.sqrt(1 / sumw)
print(f"x Weighted mean (wrong errors):    µ = {muw:.4f} ± {dmuw:.4f}")  # Wrong weighted mean uncertainty
dmur = wrms(x, 0.1 * dx) / np.sqrt(n - 1)
print(f"x Weighted mean (rescaled errors): µ = {muw:.4f} ± {dmur:.4f}")  # Rescaled weighted mean uncertainty
x Weighted mean (wrong errors):    µ = 0.0455 ± 0.0098
x Weighted mean (rescaled errors): µ = 0.0455 ± 0.0951

Note: the weighted mean \(\hat{\mu}_{w}\) is unaffected by a global rescaling of the errors \(\d x_{i}\).

Bayesian estimates

They are studied in Oliphant, 2006, and implemented in scipy.stats.mvsdist (Bayesian estimates of the PDFs of the sample mean, variance and standard deviation) or scipy.stats.bayes_mvs (Bayesian confidence intervals).

[13]:
(mu, (lmu, hmu)), (var, (lvar, hvar)), (sig, (lsig, hsig)) = stats.bayes_mvs(x, alpha=0.95)  # 95%
print("Bayesian estimates: µ = %.4f +%.4f -%.4f (95%%)" % (mu, hmu-mu, mu-lmu))
print("Bayesian estimates: σ = %.4f +%.4f -%.4f (95%%)" % (sig, hsig-sig, sig-lsig))
Bayesian estimates: µ = 0.0204 +0.2116 -0.2116 (95%)
Bayesian estimates: σ = 1.0747 +0.1643 -0.1383 (95%)
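Alternatively, scipy.stats.mvsdist returns the three posterior distributions as frozen scipy distributions, from which any statistic or interval can be extracted. A minimal sketch (assuming only the frozen-distribution interface, .mean() and .interval()):

mdist, vdist, sdist = stats.mvsdist(x)  # Frozen PDFs of mean, variance, std. deviation
print(f"µ posterior: mean = {mdist.mean():.4f}, 95% interval = {np.round(mdist.interval(0.95), 4)}")
print(f"σ posterior: mean = {sdist.mean():.4f}, 95% interval = {np.round(sdist.interval(0.95), 4)}")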

Robust estimators

Median-based estimators are robust (up to 50% of the points can be corrupted), but have a low statistical efficiency. More pertinent estimators can be defined.

Location

The variance of the median \(m\) for large samples and normal distributions is:

\[\text{Var}(m) = \frac{\pi}{2}\frac{s^{2}}{n},\]

where \(s\) is a (presumably robust) estimate of the scale.

Scale

Median Absolute Deviation:

\[\mathrm{MAD} = \text{med}(| x - \text{med}(x) |)\]

Inter-Quartile Range: this is the 25%-trimmed range, i.e. the difference between the 75th and the 25th percentiles of a sample.

Both estimators have an optimal breakdown point of 0.5: up to half the points of the sample can be made arbitrarily large before the estimate itself becomes arbitrarily large. (By comparison, the sample standard deviation has a breakdown point of 0.)

For a normal distribution, these are related to standard deviation \(\sigma\) by:

\[\begin{split}\begin{align} \hat{\sigma} &= \frac{1}{\Phi^{-1}(3/4)}\mathrm{MAD} \simeq 1.4826\times\mathrm{MAD} \\ % 1/stats.norm.ppf(0.75) = 1.4826 &= \frac{1}{2\Phi^{-1}(3/4)} \mathrm{IQR} \simeq 0.7413\times\mathrm{IQR} % 2*sqrt(2)*scipy.special.erfinv(0.5) = 2*stats.norm.ppf(0.75) = 1.349 \end{align}\end{split}\]

where \(\Phi\) is the standard normal cumulative distribution function (scipy.stats.norm.cdf), and its inverse \(\Phi^{-1}\) is known as the normal quantile function, or probit function (scipy.stats.norm.ppf). Note that the MAD is biased for small samples (see Croux & Rousseeuw, 1992).

Unfortunately, \(\mathrm{MAD}\) is only 37% as efficient as the sample standard deviation.

[14]:
med = np.median(x)
nmad = stats.median_abs_deviation(x, scale='normal') # Normalized Median Absolute Deviation (for normal distributions)
dmed = nmad / np.sqrt(2 * n / np.pi)                 # For large normal samples
print(f"Median: {med:.4f} ± {dmed:.4f}")
Median: 0.0599 ± 0.1182
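The IQR-based estimate mentioned above can be obtained directly from scipy.stats.iqr, whose scale='normal' option applies the \(1/(2\Phi^{-1}(3/4))\) normalization (a minimal sketch):

niqr = stats.iqr(x, scale='normal')  # IQR / (2 Φ⁻¹(3/4)) ≈ 0.7413 × IQR
print(f"Normalized IQR scale estimate: σ ≈ {niqr:.4f}")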

Other robust estimators of scale, with maximal breakdown point but higher efficiency, are presented in Croux and Rousseeuw (1992).

Absolute pairwise differences: defined in Rousseeuw and Croux (1993) as

\[\begin{split}\begin{align} S_n &= 1.1926 \,\text{med}_i \left( \text{med}_j\left| x_i - x_j \right| \right), \\ Q_n &= c_n \times \text{first quartile of } \left\{ \left| x_i - x_j \right| : i < j \right\}. \end{align}\end{split}\]
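A naive \(O(N^2)\) sketch of the \(S_n\) estimator, transcribing the formula above directly (no finite-sample correction, and the inner median is taken over all \(j\), including \(j = i\)):

def sn_scale(a):
    """Naive Sn robust scale estimator (Rousseeuw & Croux 1993):
    1.1926 × med_i( med_j |x_i - x_j| ), without finite-sample correction."""

    a = np.asarray(a)
    pairwise = np.abs(a[:, None] - a[None, :])  # |x_i - x_j| for all pairs
    return 1.1926 * np.median(np.median(pairwise, axis=1))

print(f"Sn robust scale estimate: σ ≈ {sn_scale(x):.4f}")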

Covariance matters

\(\chi^2\) minimisation

Consider a column vector \(\boldsymbol{y}\) of \(N\) measurements \(y_i\), each of which is assumed to be Gaussian distributed with mean \(F_i(\boldsymbol{\theta})\) (hereafter the model \(\boldsymbol{F}(\boldsymbol{\theta})\), depending on \(M\) parameters \(\theta_k\)). The maximum-likelihood estimate \(\hat{\boldsymbol{\theta}}\) of the set of parameters \(\boldsymbol{\theta}\) is the one minimizing the \(\chi^2\) defined by:

\[\chi^2(\boldsymbol{\theta}) = (\boldsymbol{y} - \boldsymbol{F}(\boldsymbol{\theta}))^T \cdot \m{V}^{-1} \cdot (\boldsymbol{y} - \boldsymbol{F}(\boldsymbol{\theta})),\]

where \(\m{V}\) is the (supposedly known) covariance matrix between the measurements: \(V_{ij} = \text{Cov}(y_i, y_j)\).

In that case, the minimum \(\chi^2(\hat{\boldsymbol{\theta}})\) follows a \(\chi^2\)-distribution with \(k=N-M\) degrees of freedom. In particular:

  • The goodness-of-fit (level of agreement between the measurements and the model) can be assessed from the minimum value \(\chi^2_\min = \chi^2(\hat{\boldsymbol{\theta}})\).

  • \(\chi^2 = \chi^2_\min + 1\) defines the \(1\,\sigma\)-isocontours around the maximum-likelihood estimate \(\hat{\boldsymbol{\theta}}\). Using a quadratic Taylor expansion of \(\chi^2(\boldsymbol{\theta})\) around its minimum, one gets an approximation of the best-fit parameter covariance matrix:

    \[\Cov(\hat{\boldsymbol{\theta}}) \simeq 2\,\m{H}^{-1}\]

    where \(\m{H}\) is the Hessian matrix of the \(\chi^2\) evaluated at \(\hat{\boldsymbol{\theta}}\):

    \[\begin{split}\begin{align} H_{ij} &= \left.\frac{\partial^2\chi^2}{\partial\theta_i\partial\theta_j}\right|_{\boldsymbol{\theta}=\hat{\boldsymbol{\theta}}} \\ &= 2 \left.\frac{\partial\boldsymbol{F}}{\partial\theta_i}\right|_{\hat{\boldsymbol{\theta}}}^T \cdot \m{V}^{-1} \cdot \left.\frac{\partial\boldsymbol{F}}{\partial\theta_j}\right|_{\hat{\boldsymbol{\theta}}}. \end{align}\end{split}\]

If the measurements are uncorrelated, the covariance matrix \(\m{V}\) is purely diagonal, with \(V_{ii} = \sigma_i^2\) the (supposedly known) variance of measurement \(y_i\), and the previous equations can be rewritten:

\[\begin{split}\begin{align} \chi^2(\boldsymbol{\theta}) &= \sum_{i=1}^N \frac{(y_i - F_i(\boldsymbol{\theta}))^2}{\sigma_i^2}, \\ H_{ij} &= 2 \sum_{k=1}^N \frac{1}{\sigma_k^2} \frac{\partial F_k}{\partial\theta_i}\frac{\partial F_k}{\partial\theta_j}. \end{align}\end{split}\]
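For a linear model \(\boldsymbol{F}(\boldsymbol{\theta}) = \m{A}\cdot\boldsymbol{\theta}\) (design matrix \(\m{A}\)), these expressions have a closed-form solution. A minimal sketch (the helper linear_chi2_fit is an illustration, not part of the original notebook):

def linear_chi2_fit(y, A, V):
    """Generalized least-squares fit of the linear model F(θ) = A·θ.

    Returns the maximum-likelihood parameters θ̂ and their covariance
    Cov(θ̂) = 2 H⁻¹ = (Aᵀ·V⁻¹·A)⁻¹, exact for a linear model."""

    iV = linalg.pinvh(V)              # V⁻¹ (pseudo-inverse for robustness)
    half_hessian = A.T @ iV @ A       # = H / 2
    cov_theta = linalg.pinvh(half_hessian)
    theta = cov_theta @ A.T @ iV @ y  # Maximum-likelihood solution
    return theta, cov_theta

Applied to the constant model A = np.ones((N, 1)), this reduces to the weighted mean discussed below.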

Example: weighted mean

We define two datasets:

  • an uncorrelated repeated-measurement sample \(\boldsymbol{x}\) (\(N=100\)) of a null quantity, with log-normal measurement uncertainties,

  • a correlated sample \(\boldsymbol{y}\), derived from \(\boldsymbol{x}\) with an exponential correlation length of \(\tau = 5\) px and the same measurement uncertainties.

[15]:
def vec2corr(vec):
    """Define n×n correlation matrix from *vec* of length n, such that
    corr[0,:] = corr[:,0] = vec."""

    n = len(vec)
    tmp = np.concatenate((vec[:0:-1],vec))
    corr = np.array([ tmp[n-(i+1):2*n-(i+1)] for i in range(n) ])

    return corr
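Note: for a correlation depending only on \(|i - j|\), this is equivalent to building a symmetric Toeplitz matrix, e.g. with scipy.linalg.toeplitz (a quick check, for illustration):

vec = np.exp(-np.arange(10) / 3.)
assert np.allclose(vec2corr(vec), linalg.toeplitz(vec))  # Same symmetric Toeplitz matrix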
[16]:
n = 100
tau = 5. # Exponential correlation length

rng = np.random.default_rng(12345)

sig = np.sqrt(rng.lognormal(sigma=0.3, size=n)) # Log-normally distributed uncertainties
cor = vec2corr(np.exp(-np.arange(n)/tau)) # Correlation matrix
cov = cor * np.outer(sig, sig)           # Covariance matrix
x = rng.normal(loc=0, scale=sig) # Uncorrelated gaussian measurements
y = np.dot(linalg.sqrtm(cor), x)           # Correlated gaussian measurements
[17]:
fig, ax = plt.subplots()
ax.errorbar(np.arange(n), x, yerr=sig, fmt='b.', label='x: uncorrelated')
ax.errorbar(np.arange(n)+0.2, y, yerr=sig, fmt='rs', label='y: correlated')
ax.legend();
../_images/Cours_statisticalTools_41_3.png

The goodness-of-fit with respect to constant value \(\mu = 0\) is given by:

[18]:
icov = linalg.pinvh(cov)

def chi2(mu, x):
    """Estimate chi2 *without* accounting for covariance."""
    return np.sum(((x - mu)/sig)**2)

def chi2Cov(mu, y):
    """Estimate chi2 accounting for covariance."""
    return np.dot(y - mu, np.dot(icov, y - mu))

print(f"DoF: k={n}")
c = chi2(0, x)                         # χ²
p = stats.distributions.chi2.sf(c, n)  # p-value
z = max(0, stats.distributions.norm.isf(p))    # z-value
print(f"x (uncorrelated):         χ² = {c:6.2f}, p-value={p:.3f} ({z:+.1f}-σ)")
c = chi2(0, y)                   # No account for covariance
p = stats.distributions.chi2.sf(c, n)           # p-value
z = max(0, stats.distributions.norm.isf(p))    # z-value
print(f"y (correlated, w/o cov.): χ² = {c:6.2f}, p-value={p:.3f} ({z:+.1f}-σ)")
c = chi2Cov(0, y)                # Account of covariance
p = stats.distributions.chi2.sf(c, n)           # p-value
z = max(0, stats.distributions.norm.isf(p))    # z-value
print(f"y (correlated, w/ cov.):  χ² = {c:6.2f}, p-value={p:.3f} ({z:+.1f}-σ)")
DoF: k=100
x (uncorrelated):         χ² =  94.10, p-value=0.647 (+0.0-σ)
y (correlated, w/o cov.): χ² = 115.53, p-value=0.137 (+1.1-σ)
y (correlated, w/ cov.):  χ² =  89.93, p-value=0.755 (+0.0-σ)

In this particular case, disregarding the covariance of the correlated dataset turns a perfectly acceptable agreement into a mild (here \(\approx 1\,\sigma\)) spurious tension with the true average \(\mu = 0\); depending on the realization, the effect can be much stronger.

The objective is now to derive the maximum-likelihood estimate of the average value; the model is simply the 1-parameter constant function \(\boldsymbol{F}(\mu) = (\mu, \ldots, \mu)\). In the presence of a covariance matrix \(\m{V}\), one then has \(\sigma_{\hat{\mu}}^2 = \left(\sum_{ij} (\m{V}^{-1})_{ij}\right)^{-1}\). For uncorrelated measurements, this reduces to \(\sigma_{\hat{\mu}}^2 = \left(\sum_{i} 1/\sigma_i^2\right)^{-1}\) (uncertainty on the inverse-variance weighted mean, see above).
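These expressions can also be evaluated in closed form, as a cross-check of the grid minimization performed in the next cell (a minimal sketch, reusing the icov matrix computed above):

ones = np.ones(n)                    # Design vector of the constant model
norm = ones @ icov @ ones            # Σ_ij (V⁻¹)_ij
mu_exact = ones @ icov @ y / norm    # Closed-form ML mean with covariance
dmu_exact = np.sqrt(1 / norm)        # Its uncertainty
print(f"y ML mean (w/ cov, closed form): µ = {mu_exact:+.4f} ± {dmu_exact:.4f}")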

[19]:
mus = np.linspace(-1, 1, 1000)
print(f"DoF: k={n - 1}")

def p2z(p):
    """Express the input one-sided *p*-value as a sigma equivalent
    significance from a normal distribution (the so-called *z*-value)."""

    return np.where(p >= 0.5, 0, stats.distributions.norm.isf(p))

xchi2s = [ chi2(mu, x) for mu in mus ]
iminx = np.argmin(xchi2s) # Poor man's minimization!
mux = mus[iminx]
dmux = np.sqrt(1/np.sum(1/sig**2))
print("x ML mean:           µ = %+.4f ± %.4f (%.1f-sigma)" % (mux, dmux, np.abs(mux)/dmux))
c = chi2(mux, x)                 # χ²
p = stats.distributions.chi2.sf(c, n-1)         # p-value
z = p2z(p)                       # z-value
print(f"                     χ² = {c:6.2f}, p-value={p:.3f} ({z:+.1f}-σ)")

ychi2s = [ chi2(mu, y) for mu in mus ]
iminy = np.argmin(ychi2s) # Poor man's minimization!
muy = mus[iminy]
dmuy = np.sqrt(1/np.sum(1/sig**2))
print("y ML mean (w/o cov): µ = %+.4f ± %.4f (%.1f-sigma)" % (muy, dmuy, np.abs(muy)/dmuy))
c = chi2(muy, y)                 # No account for covariance
p = stats.distributions.chi2.sf(c, n-1)         # p-value
z = p2z(p)                       # z-value
print(f"                     χ² = {c:6.2f}, p-value={p:.3f} ({z:+.1f}-σ)")

ychi2Covs = [ chi2Cov(mu, y) for mu in mus ]
iminyCov = np.argmin(ychi2Covs)  # Poor man's minimization!
muyCov = mus[iminyCov]
dmuyCov = np.sqrt(1/np.sum(icov))  # Simple expression of dµ derived from model
print("y ML mean (w/ cov):  µ = %+.4f ± %.4f (%.1f-sigma)" % (muyCov, dmuyCov, np.abs(muyCov)/dmuyCov))
c = chi2Cov(muyCov, y)           # Account of covariance
p = stats.distributions.chi2.sf(c, n-1)         # p-value
z = p2z(p)                       # z-value
print(f"                     χ² = {c:6.2f}, p-value={p:.3f} ({z:+.1f}-σ)")
DoF: k=99
x ML mean:           µ = +0.0450 ± 0.0976 (0.5-sigma)
                     χ² =  93.89, p-value=0.626 (+0.0-σ)
y ML mean (w/o cov): µ = +0.1031 ± 0.0976 (1.1-sigma)
                     χ² = 114.40, p-value=0.138 (+1.1-σ)
y ML mean (w/ cov):  µ = +0.2112 ± 0.2146 (1.0-sigma)
                     χ² =  88.96, p-value=0.755 (+0.0-σ)

Neglecting the covariance of correlated data also affects the ML estimate of the mean: the quoted uncertainty is substantially underestimated (here ±0.098 instead of ±0.215), so that the estimate can appear spuriously significant, while the \(\chi^2_\min\) gives no warning that anything is wrong.

[20]:
fig, ax = plt.subplots()

ax.plot(mus, xchi2s, 'g:')
ax.plot(mus, ychi2s, 'r--')
ax.plot(mus, ychi2Covs, 'b-')
ax.errorbar([mux],[xchi2s[iminx]], xerr=dmux, fmt='g^', label='UNcorrelated')
ax.errorbar([muy],[ychi2s[iminy]], xerr=dmuy, fmt='rs', label='Correlated (w/o cov)')
ax.errorbar([muyCov],[ychi2Covs[iminyCov]], xerr=dmuyCov, fmt='bo', label='Correlated (w/ cov)')
ax.set(xlabel="µ", ylabel="χ²")
ax.legend(numpoints=1)
ax.grid()
../_images/Cours_statisticalTools_47_1.png

Propagation of uncertainties

Consider a set of \(N\) quantities (measurements, parameter estimates, etc.) \(\boldsymbol{y} = (y_1, \ldots, y_N)\) associated with a covariance matrix \(V_{ij} = \Cov(y_i, y_j)\) (estimated from the measurement or adjustment procedure), and a set of \(M\) functions \(\boldsymbol{f} = (f_1, \ldots, f_M)\). The covariance matrix of the propagated values \(\boldsymbol{f}(\boldsymbol{y})\) can be estimated from a Taylor expansion around \(\boldsymbol{y}\). To first order, one has:

\[\Cov(\boldsymbol{f}(\boldsymbol{y})) \simeq \m{J}\cdot\m{V}\cdot{\m{J}}^T\]

where the matrix of derivatives (the « Jacobian ») is given by: \(J_{ij} = \left.\frac{\partial f_i}{\partial y_j}\right|_{\boldsymbol{y}}\).

This first-order expression is exact if \(\boldsymbol{f}\) is linear: \(\boldsymbol{f}(\boldsymbol{y}) = \m{J}\cdot\boldsymbol{y}\). If this is not the case, one has to make sure the functions are reasonably linear over the domain \(\boldsymbol{y} \pm \boldsymbol{\sigma}_{\boldsymbol{y}}\), or expand the Taylor series to higher orders.

The first-order expression can be rewritten in non-matrix form:

\[\Cov(f_i, f_j) \simeq \sum_{k,l = 1}^{N} \frac{\partial f_i}{\partial y_k}\frac{\partial f_j}{\partial y_l} V_{kl}.\]

In the case of uncorrelated input quantities, \(V_{ij} = \sigma_i^2 \delta_{ij}\) and one recovers the traditional uncertainty propagation expression:

\[\sigma_{f_i}^2 = \sum_{k=1}^{N} \left(\frac{\partial f_i}{\partial y_k}\right)^2 \sigma_k^2.\]
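A minimal numerical sketch of this first-order propagation, using a forward-difference Jacobian (the helper propagate_cov and its default step size are illustrative choices):

def propagate_cov(f, y, V, eps=1e-6):
    """First-order propagation of the covariance V of y through the vector
    function f, using a simple forward-difference Jacobian."""

    y = np.asarray(y, dtype=float)
    f0 = np.atleast_1d(f(y))
    J = np.empty((len(f0), len(y)))           # Jacobian J_ij = ∂f_i/∂y_j
    for j in range(len(y)):
        dy = np.zeros_like(y)
        dy[j] = eps * max(1., abs(y[j]))      # Crude step size
        J[:, j] = (np.atleast_1d(f(y + dy)) - f0) / dy[j]
    return J @ V @ J.T

# Example: cartesian-to-polar conversion of (3, 4) with unit, uncorrelated variances
f = lambda v: np.array([np.hypot(v[0], v[1]), np.arctan2(v[1], v[0])])
print(propagate_cov(f, [3., 4.], np.eye(2)))  # ≈ diag(1, 1/25)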

Example: synthetic photometry

Consider a spectrum made of \(N\) flux measurements \(\boldsymbol{y} = (y_1, \ldots, y_N)\) associated with a covariance matrix \(V_{ij} = \Cov(y_i, y_j)\). One wants to compute the synthetic photometry (i.e. the integrated flux) in bands \(A\) and \(B\),

\[\begin{split}\begin{align} A &= \sum_i \alpha_i y_i \\ B &= \sum_i \beta_i y_i \end{align}\end{split}\]

where the fixed coefficients \(\alpha_i\) and \(\beta_i\) define the filter bandpasses, as well as the resulting variances and covariances.

In this simple linear case, fluctuations \(\delta A\) are directly related to flux fluctuations \(\delta y_i\):

\[\delta A = \sum_i \alpha_i \delta y_i\]

and therefore

\[\begin{split}\begin{align} \sigma_A^2 &= \langle \delta^2 A \rangle \\ &= \sum_{i,j} \alpha_i \alpha_j \langle \delta y_i \delta y_j \rangle \\ &= \boldsymbol{\alpha} \cdot \m{V} \cdot \boldsymbol{\alpha}^T \\ \Cov(A, B) &= \langle \delta A \delta B \rangle \\ &= \sum_{i,j} \alpha_i \beta_j \langle \delta y_i \delta y_j \rangle \\ &= \boldsymbol{\alpha} \cdot \m{V} \cdot \boldsymbol{\beta}^T \end{align}\end{split}\]
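A minimal sketch of this propagation (the top-hat bandpasses and the diagonal spectrum covariance below are arbitrary illustrative choices, not from the original notebook):

npix = 100
pix = np.arange(npix)                                # Pixel (wavelength) index
alpha = np.where((pix >= 20) & (pix < 40), 1., 0.)   # Top-hat band A
beta = np.where((pix >= 35) & (pix < 60), 1., 0.)    # Top-hat band B, overlapping A

V = np.diag(np.full(npix, 0.01))                     # Spectrum covariance (here diagonal)
J = np.vstack([alpha, beta])                         # Jacobian of (A, B) wrt. the fluxes
covAB = J @ V @ J.T                                  # Cov(A, B) = J·V·Jᵀ

sigA, sigB = np.sqrt(np.diag(covAB))
rhoAB = covAB[0, 1] / (sigA * sigB)                  # Correlation induced by the band overlap
print(f"σ_A = {sigA:.3f}, σ_B = {sigB:.3f}, ρ(A, B) = {rhoAB:.3f}")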

References

Besides standard statistics handbooks, see the works cited in the text: Oliphant (2006), Ahn & Fessler (2003), Croux & Rousseeuw (1992), and Rousseeuw & Croux (1993).
