
Mathematical Statistics

The application of probability theory to statistics.


Statistical data collection is concerned with the planning of studies, especially with the design of randomized experiments and with the planning of surveys using random sampling. The initial analysis of the data often follows the study protocol specified before the study was conducted. The data from a study can also be analyzed to consider secondary hypotheses inspired by the initial results, or to suggest new studies. A secondary analysis of the data from a planned study uses tools from data analysis, and the process of doing this is mathematical statistics.

Data analysis is divided into:

- descriptive statistics – the part of statistics that describes data, i.e. summarises the data and their typical properties.
- inferential statistics – the part of statistics that draws conclusions from data (using some model for the data). For example, inferential statistics involves selecting a model for the data, checking whether the data fulfill the conditions of a particular model, and quantifying the involved uncertainty (e.g. using confidence intervals).

While the tools of data analysis work best on data from randomized studies, they are also applied to other kinds of data – for example, from natural experiments and observational studies – in which case the inference is dependent on the model chosen by the statistician, and so subjective.

- ^ Freedman, D.A. (2005). Statistical Models: Theory and Practice. Cambridge University Press. ISBN 978-0-521-67105-7.
- ^ Freedman, David A. (2010). Collier, David; Sekhon, Jasjeet S.; Stark, Philip B. (eds.). Statistical Models and Causal Inference: A Dialogue with the Social Sciences. Cambridge University Press. ISBN 978-0-521-12390-7.
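The split between descriptive and inferential statistics can be sketched in a few lines of Python. The sample below is illustrative (not from the text): the first half summarises the data, and the second half quantifies uncertainty with a normal-approximation 95% confidence interval for the mean.

```python
import math

# Hypothetical sample of 10 measurements (illustrative data only)
data = [4.8, 5.1, 4.9, 5.3, 5.0, 4.7, 5.2, 5.1, 4.9, 5.0]
n = len(data)

# Descriptive statistics: summarise the data and their typical properties
mean = sum(data) / n
variance = sum((x - mean) ** 2 for x in data) / (n - 1)  # sample variance
sd = math.sqrt(variance)

# Inferential statistics: quantify uncertainty with a 95% confidence
# interval for the population mean, assuming approximate normality
z = 1.96  # standard normal quantile for 95% coverage
half_width = z * sd / math.sqrt(n)
ci = (mean - half_width, mean + half_width)

print(f"mean={mean:.3f}, sd={sd:.3f}, 95% CI=({ci[0]:.3f}, {ci[1]:.3f})")
```

The descriptive part makes no modelling assumptions; the confidence interval is the inferential step, and it is only as good as the normality assumption behind the 1.96 quantile.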
Let $(\Omega, \mathcal{F}, P)$ be a complete probability space. Consider a statistical model indexed by a parameter $\theta \in \Theta \subset \mathbb{R}^k$, where $\Theta$ is the parameter space. We assume that an observed sample $X = (X_1, \dots, X_n)$ is drawn independently and identically distributed (i.i.d.) according to a probability measure $P_\theta$ parameterized by $\theta$. The core of mathematical statistics is the rigorous development of inference procedures. Specifically, given the likelihood function $L(\theta \mid X) = \prod_{i=1}^{n} f(X_i \mid \theta)$, the field provides the theoretical foundation for:

1. **Estimation:** constructing an estimator $\hat{\theta} : \mathcal{X} \to \Theta$ that minimizes the expected risk $R(\theta, \hat{\theta}) = E[\ell(\theta, \hat{\theta})]$, where $\ell$ is a loss function (distinct from the likelihood $L$). For instance, the maximum likelihood estimator (MLE) $\hat{\theta}_{MLE}$ is defined by maximizing the log-likelihood function:

$$\hat{\theta}_{MLE} = \arg\max_{\theta \in \Theta} \log L(\theta \mid X) = \arg\max_{\theta \in \Theta} \sum_{i=1}^{n} \log f(X_i \mid \theta)$$

2. **Hypothesis testing:** formulating a test statistic $T(X)$ and a rejection region $\mathcal{R}$ such that the p-value $p = P(T(X) \ge t_{\mathrm{obs}} \mid H_0)$ allows a decision regarding the null hypothesis $H_0 : \theta \in \Theta_0$. This involves establishing asymptotic distributions, e.g. $\sqrt{n}\,(\hat{\theta} - \theta_0) \xrightarrow{d} \mathcal{N}(0, I^{-1}(\theta_0))$, and controlling the Type I error rate $\alpha$.
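Both steps can be made concrete with a minimal sketch, assuming an exponential model $f(x \mid \lambda) = \lambda e^{-\lambda x}$; the data and the null value `lam0` below are illustrative, not from the text. Estimation uses the closed-form exponential MLE $\hat{\lambda} = n / \sum_i x_i$, and testing uses the asymptotic normality above with Fisher information $I(\lambda) = 1/\lambda^2$.

```python
import math

# Hypothetical i.i.d. sample under an exponential model (illustrative data)
x = [0.5, 1.2, 0.8, 2.1, 0.3, 1.7, 0.9, 1.1]
n = len(x)

def log_likelihood(lam):
    # log L(lam | x) = sum_i [log(lam) - lam * x_i]
    return sum(math.log(lam) - lam * xi for xi in x)

# Estimation: the exponential MLE has the closed form lam_hat = n / sum(x)
lam_hat = n / sum(x)

# Hypothesis testing: Wald-type statistic for H0: lam = lam0, using
# sqrt(n) * (lam_hat - lam0) -> N(0, I^{-1}(lam0)) with I(lam) = 1 / lam^2,
# so the standard error under H0 is lam0 / sqrt(n)
lam0 = 1.0
z = math.sqrt(n) * (lam_hat - lam0) / lam0

# Two-sided p-value from the standard normal CDF (via math.erf)
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
print(f"lam_hat={lam_hat:.3f}, z={z:.3f}, p={p_value:.3f}")
```

The assertion-style check that the log-likelihood really peaks at `lam_hat` is worth doing whenever a closed-form MLE is used in place of numerical maximization.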