辞别 posted on 2025-5-26 10:07:00

Machine Learning: Probability and Statistics Fundamentals (Random Variables)

<h2 id="第零章-积分">Chapter 0: Integrals</h2>
<ul>
<li>
<p><strong>Integral with variable limits</strong>: for an integral of the form <span class="math inline">\(\boxed{I(x)=\int_{v(x)}^{u(x)}f(t,x)\text{d}t}\)</span>, the derivative of <span class="math inline">\(I(x)\)</span> is given by the Leibniz rule:</p>
<p></p><div class="math display">\[\boxed{\frac{\text{d}I}{\text{d}x} = f(u(x), x) \cdot u'(x) - f(v(x), x) \cdot v'(x) + \int_{v(x)}^{u(x)} \frac{\partial f}{\partial x}(t, x) \text{d}t}
\]</div><p></p></li>
<li>
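<p><strong>Double integral</strong>: for an X-type region <span class="math inline">\(D=\{(x,y)\mid a\leq x\leq b,\ \phi_1(x)\leq y\leq \phi_2(x)\}\)</span>, <span class="math inline">\(\boxed{\iint_Df(x,y)\text{d}\sigma=\int_a^b\left[\int_{\phi_1(x)}^{\phi_2(x)}f(x,y)\text{d}y\right]\text{d}x}\)</span>.</p>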
<ul>
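<li><strong>Geometric meaning (volume)</strong>: for <span class="math inline">\(f\geq 0\)</span>, the volume of the solid with base <span class="math inline">\(D\)</span> and curved top surface <span class="math inline">\(z=f(x,y)\)</span>.</li>
<li><strong>Physical meaning (mass)</strong>: the mass of a thin plate occupying <span class="math inline">\(D\)</span> with areal density <span class="math inline">\(f(x,y)\)</span>.</li>
<li><strong>Mnemonic</strong>: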
<ul>
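<li><strong>Fix the constant limits of the outer integral first</strong>: find the constant bounds (e.g. <span class="math inline">\(a\leq x\leq b\)</span>); this makes <span class="math inline">\(\text{d}x\)</span> the outer (last) integration and <span class="math inline">\(\text{d}y\)</span> the inner (first) one. Then, for each fixed <span class="math inline">\(x\)</span>, write the range of the inner variable, <span class="math inline">\(\phi_1(x)\leq y \leq \phi_2(x)\)</span>. Finally, write the outer <span class="math inline">\(\int_a^b\text{d}x\)</span> followed by the inner <span class="math inline">\(\int_{\phi_1(x)}^{\phi_2(x)}f(x,y)\text{d}y\)</span>.</li>
<li><strong>Draw a line for the inner variable inside the limits</strong>: for example, if region <span class="math inline">\(D\)</span> is bounded on the left and right by <span class="math inline">\(x=a\)</span> and <span class="math inline">\(x=b\)</span>, draw a vertical line through the region from bottom to top.</li>
<li><strong>The curve the line crosses first gives the lower limit</strong>: <span class="math inline">\(y=\phi_1(x)\)</span> is the lower limit.</li>
<li><strong>The curve the line crosses last gives the upper limit</strong>: <span class="math inline">\(y=\phi_2(x)\)</span> is the upper limit.</li>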
</ul>
</li>
</ul>
</li>
</ul>
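<p>As a quick sanity check of the Leibniz rule above, the sketch below compares a central finite difference of <code>I(x)</code> with the right-hand side of the formula. The integrand <code>f(t, x) = sin(t x)</code>, the limits <code>v(x) = x</code> and <code>u(x) = x**2</code>, and the helper <code>simpson</code> are illustrative assumptions, not part of the notes; this is a numerical sketch, not a proof.</p>

```python
import math

def simpson(g, a, b, n=2000):
    """Composite Simpson rule for the integral of g over [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3

# Illustrative choices (assumptions, not from the notes).
def f(t, x):
    return math.sin(t * x)

def df_dx(t, x):
    return t * math.cos(t * x)   # partial derivative of f with respect to x

def v(x):
    return x                     # lower limit v(x)

def dv(x):
    return 1.0

def u(x):
    return x * x                 # upper limit u(x)

def du(x):
    return 2.0 * x

def I_x(x):
    """I(x) = integral of f(t, x) dt from v(x) to u(x)."""
    return simpson(lambda t: f(t, x), v(x), u(x))

def leibniz_rhs(x):
    """f(u(x), x) u'(x) - f(v(x), x) v'(x) + integral of df/dx over [v(x), u(x)]."""
    return (f(u(x), x) * du(x) - f(v(x), x) * dv(x)
            + simpson(lambda t: df_dx(t, x), v(x), u(x)))

x0, h = 1.3, 1e-5
finite_diff = (I_x(x0 + h) - I_x(x0 - h)) / (2 * h)   # central difference
print(finite_diff, leibniz_rhs(x0))
```

<p>The finite difference and the formula should agree to many digits; swapping the roles of <code>u</code> and <code>v</code> in the boundary terms flips the sign of the result, which is an easy mistake to catch this way.</p>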
<h2 id="第一章-随机事件的概率">Chapter 1: Probability of Random Events</h2>
<h3 id="一-随机试验与随机事件">1 Random Experiments and Random Events</h3>
<ul>
<li>
<p><strong>Experiment</strong>: an observation of some characteristic of a phenomenon.</p>
</li>
<li>
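<p><strong>Random experiment</strong>: an experiment that satisfies the following three conditions, denoted <span class="math inline">\(E\)</span>:</p>
<ul>
<li><strong>Repeatability</strong>: it can be repeated under the same conditions.</li>
<li><strong>Known outcomes</strong>: each trial has more than one possible outcome, and the set of all possible outcomes is known in advance.</li>
<li><strong>Uncertainty</strong>: which outcome a given trial will produce cannot be known before the trial.</li>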
</ul>
</li>
<li>
<p><strong>Sample space</strong>: the set of all elementary events (outcomes) of experiment <span class="math inline">\(E\)</span>, denoted <span class="math inline">\(\Omega\)</span>.</p>
</li>
<li>
<p><strong>Sample point</strong>: an element of the sample space.</p>
</li>
<li>
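<p><strong>Random event</strong>: a result that may or may not occur in a random experiment; formally, a subset of <span class="math inline">\(\Omega\)</span>. Denoted <span class="math inline">\(A_1\)</span>, <span class="math inline">\(A_2\)</span>, etc.</p>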
</li>
<li>
<p><strong>Elementary event</strong>: a single indivisible outcome of a random experiment, denoted <span class="math inline">\(a\)</span>, <span class="math inline">\(b\)</span>, etc.</p>
</li>
<li>
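<p><strong>Certain and impossible events</strong>: the event that always occurs is the certain event <span class="math inline">\(\Omega\)</span>; the event that never occurs is the impossible event <span class="math inline">\(\emptyset\)</span>.</p>
<ul>
<li>Note: an event with probability 1 is not necessarily the certain event (likewise, an event with probability 0 is not necessarily impossible).</li>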
</ul>
</li>
</ul>
<h3 id="二-随机事件的运算">2 Operations on Random Events</h3>
<ul>
<li>
<p><strong>Operations on events</strong>:</p>
<ul>
<li>
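<p><strong>Inclusion</strong>: <span class="math inline">\(A \subset B\)</span> (whenever <span class="math inline">\(A\)</span> occurs, <span class="math inline">\(B\)</span> occurs)</p>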
</li>
<li>
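<p><strong>Sum (union)</strong>: <span class="math inline">\(A + B\)</span> (at least one of <span class="math inline">\(A\)</span>, <span class="math inline">\(B\)</span> occurs)</p>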
</li>
<li>
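<p><strong>Difference</strong>: <span class="math inline">\(A - B\)</span> (<span class="math inline">\(A\)</span> occurs but <span class="math inline">\(B\)</span> does not)</p>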
</li>
<li>
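<p><strong>Product (intersection)</strong>: <span class="math inline">\(AB\)</span> (both <span class="math inline">\(A\)</span> and <span class="math inline">\(B\)</span> occur)</p>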
</li>
<li>
<p><strong>Events <span class="math inline">\(A_1, A_2\)</span> are mutually exclusive (incompatible)</strong>: <span class="math inline">\(A_1A_2 = \emptyset\)</span></p>
</li>
<li>
<p><strong>Events <span class="math inline">\(A_1, A_2, \cdots, A_n\)</span> are pairwise mutually exclusive</strong>: <span class="math inline">\(A_iA_j = \emptyset\ (i\neq j)\)</span></p>
</li>
<li>
<p><strong>Events <span class="math inline">\(A_1, A_2\)</span> are complementary (opposite)</strong>: <span class="math inline">\(A_1+A_2 = \Omega\)</span> and <span class="math inline">\(A_1A_2 = \emptyset\)</span>, i.e. <span class="math inline">\(A_1=\overline{A_2}\)</span></p>
</li>
<li>
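<p><strong>Commutative, associative and distributive laws; De Morgan's laws</strong>: <span class="math inline">\(\overline{A+B}=\overline{A}\,\overline{B}\)</span>, <span class="math inline">\(\overline{AB}=\overline{A}+\overline{B}\)</span></p>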
</li>
</ul>
</li>
<li>
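<p><strong>Classical probability model</strong>: for an experiment <span class="math inline">\(E\)</span> with <span class="math inline">\(\Omega=\{e_1,e_2,\cdots,e_n\}\)</span> whose <strong>finitely many elementary events are equally likely</strong>, <span class="math inline">\(P(A)=\dfrac{\text{number of elementary events in }A}{\text{total number of elementary events }n}\)</span>. This probability has the following properties:</p>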
<ul>
<li><strong>Boundedness</strong>: <span class="math inline">\(0 \leq P(A) \leq 1\)</span></li>
<li><strong>Normalization</strong>: <span class="math inline">\(P(\Omega)=1\)</span>, <span class="math inline">\(P(\emptyset)=0\)</span></li>
<li><strong>Monotonicity</strong>: if <span class="math inline">\(A \subset B\)</span>, then <span class="math inline">\(P(A) \leq P(B)\)</span></li>
<li><strong>Finite additivity</strong>: if <span class="math inline">\(A_1, A_2, \cdots, A_n\)</span> are pairwise mutually exclusive, then <span class="math inline">\(P(A_1 + A_2 + \cdots + A_n) = P(A_1) + P(A_2) + \cdots + P(A_n)\)</span></li>
</ul>
</li>
<li>
<p><strong>Corollaries</strong>:</p>
<ul>
<li><strong>Addition formula</strong>: <span class="math inline">\(P(A+B)=P(A)+P(B)-P(AB)\)</span></li>
<li><strong>Subtraction formula</strong>: <span class="math inline">\(P(A-B)=P(A\overline{B})=P(A)-P(AB)\)</span></li>
<li><strong>Complement rule</strong>: <span class="math inline">\(P(\overline{A})=1-P(A)\)</span></li>
</ul>
</li>
</ul>
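<p>A minimal sketch of the classical model, assuming two fair dice as the experiment. Events are Python sets of sample points, so set union (<code>|</code>) plays the role of <code>A + B</code>, intersection (<code>&amp;</code>) of <code>AB</code>, and set difference of <code>A - B</code>. The events <code>A</code> and <code>B</code> are hypothetical examples.</p>

```python
from fractions import Fraction
from itertools import product

# Sample space for rolling two fair dice: 36 equally likely sample points (i, j).
omega = set(product(range(1, 7), repeat=2))

def P(event):
    """Classical probability |A| / |Omega|; valid only because outcomes are equally likely."""
    return Fraction(len(event), len(omega))

A = {w for w in omega if w[0] + w[1] == 7}   # "the sum is 7"
B = {w for w in omega if w[0] == 1}          # "the first die shows 1"
not_B = omega - B                            # complement of B

assert P(A | B) == P(A) + P(B) - P(A & B)           # addition formula
assert P(A - B) == P(A & not_B) == P(A) - P(A & B)  # subtraction formula
assert P(omega - A) == 1 - P(A)                     # complement rule
assert omega - (A | B) == (omega - A) & not_B       # De Morgan's law
print(P(A), P(A | B), P(A - B))                     # 1/6 11/36 5/36
```

<p>Using <code>Fraction</code> keeps every probability exact, so the identities can be checked with <code>==</code> instead of a tolerance.</p>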
<h3 id="三-条件概率与全概率公式">3 Conditional Probability and the Law of Total Probability</h3>
<ul>
<li>
<p><strong>Conditional probability</strong>: for <span class="math inline">\(P(B)&gt;0\)</span>, <span class="math inline">\(P(A|B)=\dfrac{P(AB)}{P(B)}\)</span>.</p>
<ul>
<li><strong>Addition formula</strong>: <span class="math inline">\(P(A+B|C)=P(A|C)+P(B|C)-P(AB|C)\)</span> (for <span class="math inline">\(P(C)&gt;0\)</span>)</li>
</ul>
</li>
<li>
<p><strong>Multiplication rule</strong>: if <span class="math inline">\(P(B)&gt;0\)</span>, then <span class="math inline">\(P(AB)=P(A|B)P(B)\)</span>.</p>
</li>
<li>
<p><strong>Independence of events</strong>: events <span class="math inline">\(A\)</span> and <span class="math inline">\(B\)</span> are independent if <span class="math inline">\(P(AB)=P(A)P(B)\)</span>.</p>
<ul>
<li>
<p>If <span class="math inline">\(P(B)&gt;0\)</span>, then <span class="math inline">\(A, B\)</span> are independent <span class="math inline">\(\iff P(A|B)=P(A)\)</span>.</p>
</li>
<li>
<p>If events <span class="math inline">\(A_1,A_2,\cdots,A_n\)</span> are mutually independent, then:</p>
<p></p><div class="math display">\[\begin{aligned} P(A_1+A_2+\cdots+A_n) &amp;= 1-P(\overline{A_1+A_2+\cdots+A_n}) \\ &amp;= 1-P(\overline{A_1}\,\overline{A_2}\cdots\overline{A_n}) \\ &amp;= 1-P(\overline{A_1})\cdot P(\overline{A_2})\cdots P(\overline{A_n}) \end{aligned}
\]</div><p></p></li>
</ul>
</li>
<li>
<p><strong>Law of total probability</strong>: let <span class="math inline">\(B_1, B_2,\cdots ,B_n\)</span> be a complete set of events (a partition) of <span class="math inline">\(\Omega\)</span> with <span class="math inline">\(P(B_i)&gt;0\ (i=1,2,\cdots,n)\)</span>. Then <span class="math inline">\(P(A)=\sum_{i=1}^nP(A|B_i)P(B_i)\)</span>.</p>
<ul>
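<li><strong>Complete set of events</strong>: (1) the <span class="math inline">\(B_i\)</span> are pairwise mutually exclusive, <span class="math inline">\(B_iB_j=\emptyset\ (i\neq j)\)</span>; (2) <span class="math inline">\(B_1+B_2+\cdots+B_n=\Omega\)</span>.</li>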
</ul>
</li>
<li>
<p><strong>Bayes' formula</strong>: let <span class="math inline">\(B_1, B_2,\cdots ,B_n\)</span> be a complete set of events of <span class="math inline">\(\Omega\)</span> with <span class="math inline">\(P(B_i)&gt;0\ (i=1,2,\cdots,n)\)</span>. Then for any event <span class="math inline">\(A\)</span> with <span class="math inline">\(P(A)&gt;0\)</span>: <span class="math inline">\(P(B_i|A)=\dfrac{P(AB_i)}{P(A)}=\dfrac{P(A|B_i)P(B_i)}{\sum_{j=1}^nP(A|B_j)P(B_j)}\)</span>.</p>
</li>
</ul>
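<p>The law of total probability and Bayes' formula can be checked with exact fractions. The sketch below assumes a hypothetical factory example: three machines <code>B_1, B_2, B_3</code> produce 50%, 30% and 20% of the items with defect rates 1%, 2% and 3% (made-up numbers), and <code>A</code> is "a randomly chosen item is defective".</p>

```python
from fractions import Fraction as Fr

# Hypothetical example: the machines B_1, B_2, B_3 form a partition of Omega.
prior = [Fr(5, 10), Fr(3, 10), Fr(2, 10)]          # P(B_i)
likelihood = [Fr(1, 100), Fr(2, 100), Fr(3, 100)]  # P(A | B_i)

def total_probability(prior, likelihood):
    """P(A) = sum_i P(A | B_i) P(B_i)."""
    if sum(prior) != 1:
        raise ValueError("the B_i must form a partition (probabilities summing to 1)")
    return sum(lik * p for lik, p in zip(likelihood, prior))

def bayes(prior, likelihood):
    """P(B_i | A) = P(A | B_i) P(B_i) / sum_j P(A | B_j) P(B_j)."""
    p_a = total_probability(prior, likelihood)
    return [lik * p / p_a for lik, p in zip(likelihood, prior)]

print(total_probability(prior, likelihood))   # 17/1000
print(bayes(prior, likelihood))               # posteriors 5/17, 6/17, 6/17
```

<p>Note that the denominator of Bayes' formula is exactly the law of total probability, which is why <code>bayes</code> reuses <code>total_probability</code>.</p>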
<h2 id="第二章-一维随机变量">Chapter 2: One-Dimensional Random Variables</h2>
<h3 id="一-随机变量与分布函数">1 Random Variables and Distribution Functions</h3>
<ul>
<li>
<p><strong>Family of sets</strong>: a subset of the power set, i.e. a "set of sets".</p>
</li>
<li>
<p><strong>Event field (σ-algebra)</strong>: let <span class="math inline">\(F\)</span> be a family of subsets of the sample space <span class="math inline">\(\Omega\)</span>. If it satisfies the following three conditions, <span class="math inline">\(F\)</span> is called an <strong>event field</strong> on <span class="math inline">\(\Omega\)</span>:</p>
<ul>
<li><strong>Contains the empty set and the whole space</strong>: <span class="math inline">\(\emptyset \in F\)</span>, <span class="math inline">\(\Omega \in F\)</span></li>
<li><strong>Closed under complement</strong>: <span class="math inline">\(A \in F \Longrightarrow \overline{A} \in F\)</span></li>
<li><strong>Closed under countable union</strong>: for any finite or countable sequence <span class="math inline">\(A_1, A_2, \cdots \in F\)</span>, <span class="math inline">\(A_1+A_2+\cdots \in F\)</span>.</li>
</ul>
</li>
<li>
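<p><strong>Probability measure</strong>: given a sample space <span class="math inline">\(\Omega\)</span> and an event field <span class="math inline">\(F\)</span> on it, a probability measure is a mapping <span class="math inline">\(P: F \rightarrow [0,1]\)</span> from <span class="math inline">\(F\)</span> to the interval <span class="math inline">\([0,1]\)</span> that satisfies the following three probability axioms:</p>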
<ul>
<li><strong>Boundedness</strong>: for every event <span class="math inline">\(A\in F\)</span>, <span class="math inline">\(0 \leq P(A) \leq 1\)</span></li>
<li><strong>Normalization</strong>: <span class="math inline">\(P(\emptyset)=0\)</span>, <span class="math inline">\(P(\Omega)=1\)</span></li>
<li><strong>Countable additivity</strong>: for any countable sequence of pairwise mutually exclusive events <span class="math inline">\(A_1,A_2,\cdots,A_n,\cdots\)</span>: <span class="math inline">\(P(A_1+A_2+\cdots+A_n+\cdots)=P(A_1)+P(A_2)+\cdots+P(A_n)+\cdots\)</span></li>
</ul>
</li>
<li>
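<p><strong>Axiomatic definition of probability</strong>: a probability measure is a function <span class="math inline">\(P\)</span> defined on an event field <span class="math inline">\(F\)</span> that satisfies the three properties above; the probability of an event <span class="math inline">\(A \in F\)</span> is <span class="math inline">\(P(A)\)</span>.</p>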
</li>
<li>
<p><strong>Probability space</strong>: the triple <span class="math inline">\((\Omega, F, P)\)</span> consisting of the <strong>sample space, the event field, and the probability measure</strong>.</p>
</li>
<li>
<p><strong>Random variable</strong>: let <span class="math inline">\((\Omega, F, P)\)</span> be a probability space. A random variable is a <strong>function</strong> <span class="math inline">\(X: \Omega \rightarrow \mathbb{R}\)</span> from the sample space to the real numbers that satisfies the following condition:</p>
<ul>
<li>
<p><strong>Measurability</strong>: <span class="math inline">\(\forall x \in \mathbb{R}, \{\omega \in \Omega \mid X(\omega) \leq x\} \in F\)</span></p>
</li>
<li>
<p>Shorthand: <span class="math inline">\(\forall x \in \mathbb{R}, \{X \leq x\} \in F\)</span></p>
</li>
<li>
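<p><strong>What measurability means</strong>: the set <span class="math inline">\(\{X \leq x\}\)</span>, i.e. "all sample points whose function value is at most <span class="math inline">\(x\)</span>", is an event, so it can be passed to the probability measure <span class="math inline">\(P\)</span> as an argument. This is what makes it meaningful to talk about <span class="math inline">\(P(X\leq x)\)</span> and, later, about integrals.</p>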
</li>
<li>
<p><strong>Note</strong>: <strong>a random variable is a function that maps sample points to numbers</strong>.</p>
</li>
</ul>
</li>
<li>
<p><strong>Distribution function</strong>: let <span class="math inline">\(X\)</span> be a random variable on the probability space <span class="math inline">\((\Omega, F, P)\)</span>. Its <strong>cumulative distribution function</strong> (CDF) is:</p>
<p></p><div class="math display">\[F_X(x)=P(X\leq x)=P(\{\omega \in \Omega \mid X(\omega) \leq x\}),\quad x\in \mathbb{R}
\]</div><p></p><p>That is, for every real <span class="math inline">\(x\)</span>, <span class="math inline">\(F_X(x)\)</span> is the probability of the set of sample points <span class="math inline">\(\omega\)</span> with <span class="math inline">\(X(\omega) \leq x\)</span>.</p>
</li>
<li>
<p><strong>Necessary and sufficient conditions for a distribution function</strong>:</p>
<ul>
<li><strong>Boundedness</strong>: <span class="math inline">\(0 \leq F_X(x) \leq 1\)</span></li>
<li><strong>Normalization</strong>: <span class="math inline">\(\lim_{x \to -\infty} F_X(x) = 0,\quad \lim_{x \to +\infty} F_X(x) = 1\)</span></li>
<li><strong>Non-decreasing</strong>: if <span class="math inline">\(x_1 &lt; x_2\)</span>, then <span class="math inline">\(F_X(x_1) \le F_X(x_2)\)</span>.</li>
<li><strong>Right-continuous</strong>: <span class="math inline">\(\lim_{x \to x_0^+} F_X(x) = F_X(x_0+0) = F_X(x_0)\)</span></li>
</ul>
</li>
<li>
<p><strong>Useful identities</strong>: <strong>"less than or equal to" gives the function value; "strictly less than" gives the left limit</strong></p>
<ul>
<li>
<p><span class="math inline">\(P(X\leq a) = F(a)\)</span></p>
</li>
<li>
<p><span class="math inline">\(P(X &lt; a)=F(a-0)=\lim_{x \to a^-} F_X(x)\)</span></p>
</li>
<li>
<p><span class="math inline">\(P(X=a)=F(a)-F(a-0)\)</span></p>
</li>
<li>
<p><span class="math inline">\(P(a&lt;X\leq b)=P(X\leq b)-P(X\leq a) = F(b)-F(a)\)</span></p>
</li>
</ul>
</li>
</ul>
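<p>The identities above can be tried on a fair six-sided die, whose CDF is a step function. In this sketch the helper <code>left_limit</code> is a hypothetical shortcut that approximates <code>F(a - 0)</code> by evaluating at <code>a - eps</code>; this is exact here only because the jumps sit at integers.</p>

```python
import math
from fractions import Fraction

def F(x):
    """CDF of a fair six-sided die: F(x) = P(X <= x) = floor(x)/6, clipped to [0, 1]."""
    return Fraction(min(max(math.floor(x), 0), 6), 6)

def left_limit(cdf, a, eps=Fraction(1, 10**9)):
    """Approximates cdf(a - 0) by cdf(a - eps); exact here because jumps occur only at integers."""
    return cdf(a - eps)

a, b = 3, 5
print(F(a))                      # P(X <= 3) = 1/2
print(left_limit(F, a))          # P(X < 3)  = 1/3
print(F(a) - left_limit(F, a))   # P(X = 3)  = 1/6
print(F(b) - F(a))               # P(3 < X <= 5) = 1/3
```
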
<h3 id="二-离散型随机变量">2 Discrete Random Variables</h3>
<ul>
<li>
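<p><strong>Discrete random variable</strong>: a random variable that takes only finitely many or countably infinitely many values, <span class="math inline">\(X: \Omega \rightarrow S\)</span> (where <span class="math inline">\(S\)</span> is a finite or countably infinite set).</p>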
</li>
<li>
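<p><strong>Distribution law</strong>: if the possible values of a discrete random variable <span class="math inline">\(X\)</span> are <span class="math inline">\(x_1,x_2,\cdots\)</span>, its distribution law is the <strong>probability mass function</strong> <span class="math inline">\(p(x_i)=P(X=x_i)\)</span>, with <span class="math inline">\(p(x_i)\geq 0\)</span> and <span class="math inline">\(\sum_i p(x_i)=1\)</span>. It can also be written as a table:</p>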
<table>
<thead>
<tr>
<th><span class="math inline">\(X\)</span></th>
<th><span class="math inline">\(x_1\)</span></th>
<th><span class="math inline">\(x_2\)</span></th>
<th><span class="math inline">\(\cdots\)</span></th>
</tr>
</thead>
<tbody>
<tr>
<td><span class="math inline">\(P(X=x_i)\)</span></td>
<td><span class="math inline">\(p(x_1)\)</span></td>
<td><span class="math inline">\(p(x_2)\)</span></td>
<td><span class="math inline">\(\cdots\)</span></td>
</tr>
</tbody>
</table>
</li>
</ul>
<h3 id="三-连续型随机变量">3 Continuous Random Variables</h3>
<ul>
<li>
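<p><strong>Intuition</strong>: a continuous random variable can take any real value in some interval (or union of intervals); the precise definition uses a density, as below.</p>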
</li>
<li>
<p><strong>Distribution function</strong>: <span class="math inline">\(F_X(x)=P(X\leq x)\)</span>.</p>
</li>
<li>
<p><strong>Probability density function</strong>: if there is a non-negative function <span class="math inline">\(f_X(x)\)</span> such that <span class="math inline">\(F_X(x)=\int_{-\infty}^{x}f_X(t)\text{d}t\)</span> for every real <span class="math inline">\(x\)</span>, then <strong><span class="math inline">\(X\)</span> is a continuous random variable and <span class="math inline">\(f_X(x)\)</span> is its probability density function.</strong> Conversely, a function <span class="math inline">\(f(x)\)</span> is the density of some continuous random variable if and only if it has the following three properties:</p>
<ul>
<li><strong>Integrability</strong>: <span class="math inline">\(f(x)\)</span> need not be continuous, but it must be integrable.</li>
<li><strong>Non-negativity</strong>: <span class="math inline">\(\forall x\in \mathbb{R}, f(x)\geq0\)</span>.</li>
<li><strong>Normalization</strong>: <span class="math inline">\(\int_{-\infty}^{+\infty}f(x)\text{d}x=F(+\infty)=1\)</span>.</li>
</ul>
</li>
<li>
<p>Let <span class="math inline">\(X\)</span> be a continuous random variable with distribution function <span class="math inline">\(F_X(x)\)</span> and density <span class="math inline">\(f_X(x)\)</span>. Then:</p>
<ul>
<li><strong>The distribution function is continuous</strong>: it is an integral with a variable upper limit, <span class="math inline">\(F_X(x)=\int_{-\infty}^{x}f_X(t)\text{d}t\)</span></li>
<li><strong>The distribution function is differentiable wherever the density is continuous</strong>: if <span class="math inline">\(f_X(x)\)</span> is continuous at <span class="math inline">\(x_0\)</span>, then <span class="math inline">\(F_X(x)\)</span> is differentiable at <span class="math inline">\(x_0\)</span> and <span class="math inline">\(F_X'(x_0)=f_X(x_0)\)</span></li>
<li><strong>Single points have probability zero</strong>: <span class="math inline">\(\forall x\in \mathbb{R}, P(X=x)=F_X(x)-F_X(x-0)=0\)</span> (because <span class="math inline">\(F_X(x)\)</span> is continuous)</li>
<li><strong>Interval probability</strong>: <span class="math inline">\(P(a&lt;X&lt;b)=\int_a^bf_X(x)\text{d}x\)</span>, and the same value is obtained whether the interval is open, closed, or half-open.</li>
<li><strong>Generalized form of the density</strong>: if a set <span class="math inline">\(A\)</span> is a union of countably many pairwise disjoint intervals, <span class="math inline">\(A=\bigcup_{i=1}^{\infty}(a_i,b_i]\)</span> (each endpoint may be open or closed, since single points have probability zero), then countable additivity of probability and additivity of the integral give <span class="math inline">\(P(X\in \bigcup_{i=1}^{\infty}(a_i,b_i])=\sum_{i=1}^{\infty}\int_{a_i}^{b_i}f_X(t)\text{d}t\)</span>. Measure theory extends this to every Borel set <span class="math inline">\(A\)</span>: <span class="math inline">\(\boxed{P(X\in A)=\int_Af_X(x)\text{d}x}\)</span>.</li>
</ul>
</li>
<li>
<p><strong>Table of integrals</strong>:</p>
<ul>
<li><span class="math inline">\(\int a^x\text{d}x = \dfrac{a^x}{\ln a}+C(a&gt;0,a\neq 1)\)</span></li>
<li><span class="math inline">\(\int e^{\lambda x}\text{d}x = \dfrac{e^{\lambda x}}{\lambda}+C\ (\lambda\neq 0)\)</span></li>
<li><span class="math inline">\(\int e^{-\lambda x}\text{d}x = -\dfrac{e^{-\lambda x}}{\lambda}+C\ (\lambda\neq 0)\)</span></li>
</ul>
</li>
</ul>
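<p>A rough numerical check of the density properties, assuming an exponential density with rate 2 (an arbitrary choice) and a simple midpoint rule: the density integrates to 1, and its integral over an interval matches <code>F(b) - F(a)</code>. The helper <code>integrate</code> is a hypothetical name, not a library function.</p>

```python
import math

def integrate(g, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

lam = 2.0   # rate parameter, an arbitrary choice

def pdf(x):
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def cdf(x):
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

# Normalization: the density is 0 for x < 0, and the tail beyond x = 20 is about e^{-40}.
total = integrate(pdf, 0.0, 20.0)
# Interval probability: the integral of the density over [a, b] equals F(b) - F(a).
a, b = 0.5, 1.5
p_interval = integrate(pdf, a, b)
print(total, p_interval, cdf(b) - cdf(a))
```
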
<h3 id="四-常见随机变量分布">4 Common Distributions</h3>
<table>
<thead>
<tr>
<th>Distribution</th>
<th>Type</th>
<th>PMF <span class="math inline">\(P(X=x)\)</span> / PDF <span class="math inline">\(f(x)\)</span></th>
<th>CDF <span class="math inline">\(F(x)\)</span></th>
<th>Mean <span class="math inline">\(E(X)\)</span></th>
<th>Variance <span class="math inline">\(\text{Var}(X)\)</span></th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Two-point distribution</strong><br>(Bernoulli)</td>
<td>Discrete</td>
<td><span class="math inline">\(\begin{array}{l} P(X=1)=p,\\ P(X=0)=1-p\end{array}\)</span></td>
<td>Step function:<br><span class="math inline">\(F(x) = \begin{cases} 0 &amp; x &lt; 0 \\ 1-p &amp; 0 \le x &lt; 1 \\ 1 &amp; x \ge 1 \end{cases}\)</span></td>
<td><span class="math inline">\(p\)</span></td>
<td><span class="math inline">\(p(1-p)\)</span></td>
</tr>
<tr>
<td><strong>Binomial</strong><br><span class="math inline">\(B(n,p)\)</span></td>
<td>Discrete</td>
<td><span class="math inline">\(\begin{array}{l} \binom{n}{k} p^k (1-p)^{n-k},\\ k=0,1,\dots,n\end{array}\)</span></td>
<td><span class="math inline">\(F(x) = \sum_{k=0}^{\lfloor x \rfloor} \binom{n}{k} p^k (1-p)^{n-k}\)</span></td>
<td><span class="math inline">\(np\)</span></td>
<td><span class="math inline">\(np(1-p)\)</span></td>
</tr>
<tr>
<td><strong>Poisson</strong><br><span class="math inline">\(P(\lambda)\)</span></td>
<td>Discrete</td>
<td><span class="math inline">\(\dfrac{\lambda^k e^{-\lambda}}{k!},\ k=0,1,2,\dots\)</span></td>
<td><span class="math inline">\(F(x) = \sum_{k=0}^{\lfloor x \rfloor} \dfrac{\lambda^k e^{-\lambda}}{k!}\)</span></td>
<td><span class="math inline">\(\lambda\)</span></td>
<td><span class="math inline">\(\lambda\)</span></td>
</tr>
<tr>
<td><strong>Hypergeometric</strong><br><span class="math inline">\(H(N,K,n)\)</span></td>
<td>Discrete</td>
<td><span class="math inline">\(\begin{array}{l}\dfrac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}},\\ k=0,1,\dots,\min(n,K)\end{array}\)</span></td>
<td>No closed form; obtained by summing the PMF</td>
<td><span class="math inline">\(n\cdot\frac{K}{N}\)</span></td>
<td><span class="math inline">\(n\cdot\frac{K}{N}\cdot\left(1-\frac{K}{N}\right)\cdot\frac{N-n}{N-1}\)</span></td>
</tr>
<tr>
<td><strong>Uniform</strong><br><span class="math inline">\(U(a,b)\)</span></td>
<td>Continuous</td>
<td><span class="math inline">\(f(x) = \dfrac{1}{b-a},\ a \le x \le b\)</span></td>
<td><span class="math inline">\(F(x) = \begin{cases} 0 &amp; x &lt; a \\ \dfrac{x-a}{b-a} &amp; a \le x \le b \\ 1 &amp; x &gt; b \end{cases}\)</span></td>
<td><span class="math inline">\(\frac{a+b}{2}\)</span></td>
<td><span class="math inline">\(\frac{(b-a)^2}{12}\)</span></td>
</tr>
<tr>
<td><strong>Exponential</strong><br><span class="math inline">\(Exp(\lambda)\)</span></td>
<td>Continuous</td>
<td><span class="math inline">\(f(x) = \lambda e^{-\lambda x},\ x \ge 0\)</span></td>
<td><span class="math inline">\(F(x) = 1 - e^{-\lambda x},\ x \ge 0\)</span></td>
<td><span class="math inline">\(\frac{1}{\lambda}\)</span></td>
<td><span class="math inline">\(\frac{1}{\lambda^2}\)</span></td>
</tr>
<tr>
<td><strong>Gamma</strong><br><span class="math inline">\(\Gamma(k,\theta)\)</span></td>
<td>Continuous</td>
<td><span class="math inline">\(\begin{array}{l}f(x) = \dfrac{x^{k-1}e^{-x/\theta}}{\theta^k \Gamma(k)},\\ x &gt; 0\end{array}\)</span></td>
<td>No elementary closed form in general (it is the regularized incomplete gamma function); computed numerically</td>
<td><span class="math inline">\(k\theta\)</span></td>
<td><span class="math inline">\(k\theta^2\)</span></td>
</tr>
<tr>
<td><strong>Normal</strong><br><span class="math inline">\(N(\mu,\sigma^2)\)</span></td>
<td>Continuous</td>
<td><span class="math inline">\(f(x) = \dfrac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\)</span></td>
<td><span class="math inline">\(F(x) = \Phi\left( \dfrac{x - \mu}{\sigma} \right)\)</span>, where <span class="math inline">\(\Phi\)</span> is the standard normal CDF</td>
<td><span class="math inline">\(\mu\)</span></td>
<td><span class="math inline">\(\sigma^2\)</span></td>
</tr>
</tbody>
</table>
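<p>The mean and variance columns can be reproduced directly from the PMF using <code>E(X)</code> and <code>D(X) = E(X^2) - E(X)^2</code>. This sketch does it exactly with fractions for the binomial and hypergeometric rows; the parameter values are arbitrary, and the hypergeometric number of draws is named <code>m</code> in the code to avoid a clash with the binomial <code>n</code>.</p>

```python
from fractions import Fraction
from math import comb

def mean_var(pmf):
    """Exact mean and variance of a discrete law given as {value: probability}."""
    mean = sum(k * p for k, p in pmf.items())
    second_moment = sum(k * k * p for k, p in pmf.items())
    return mean, second_moment - mean ** 2      # D(X) = E(X^2) - E(X)^2

# Binomial B(n, p)
n, p = 10, Fraction(3, 10)
binom = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

# Hypergeometric H(N, K, m): population N, K "successes" in it, m draws without replacement
N, K, m = 20, 7, 5
hyper = {k: Fraction(comb(K, k) * comb(N - K, m - k), comb(N, m))
         for k in range(min(m, K) + 1)}

print(mean_var(binom))   # compare with np = 3 and np(1-p) = 21/10
print(mean_var(hyper))   # compare with mK/N and mK/N (1 - K/N)(N - m)/(N - 1)
```
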
<ul>
<li>
<p><strong>Properties of the exponential distribution</strong>:</p>
<ul>
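<li><strong>Distribution function</strong>: for <span class="math inline">\(x\geq 0\)</span>, <span class="math inline">\(F(x)=\int_0^x\lambda e^{-\lambda t}\text{d}t=[-e^{-\lambda t}]_0^x=1-e^{-\lambda x}\)</span> (note that the lower limit is 0, because the density vanishes for <span class="math inline">\(t&lt;0\)</span>).</li>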
<li><span class="math inline">\(P(X&gt;a)=e^{-\lambda a}(a&gt;0)\)</span>。</li>
<li><strong>Memorylessness</strong>: <span class="math inline">\(P(X&gt;s+t|X&gt;s)=P(X&gt;t)\)</span> for <span class="math inline">\(s,t&gt;0\)</span>.</li>
</ul>
</li>
<li>
<p><strong>Properties of the normal distribution</strong>: if <span class="math inline">\(X\sim N(\mu, \sigma^2)\)</span>, then:</p>
<ul>
<li><strong>Symmetry</strong>: <span class="math inline">\(P(X&gt;\mu)=P(X&lt;\mu)=\dfrac{1}{2}\)</span></li>
<li><strong>Linear transformation</strong>: for <span class="math inline">\(a\neq 0\)</span>, <span class="math inline">\(Y=aX+b\sim N(a\mu+b, a^2\sigma^2)\)</span></li>
<li><strong>Standardization</strong>: <span class="math inline">\(Z=\dfrac{X-\mu}{\sigma}\sim N(0,1)\)</span>, hence <span class="math inline">\(P(a&lt;X\leq b)=\Phi(\dfrac{b-\mu}{\sigma})-\Phi(\dfrac{a-\mu}{\sigma})\)</span>.</li>
<li><strong>Linear combinations of independent normal variables are normal</strong>: if <span class="math inline">\(X\sim N(\mu_1, \sigma_1^2)\)</span>, <span class="math inline">\(Y\sim N(\mu_2, \sigma_2^2)\)</span>, and <span class="math inline">\(X\)</span>, <span class="math inline">\(Y\)</span> are independent, then for <span class="math inline">\(a, b\)</span> not both zero, <span class="math inline">\(aX+bY\sim N(a\mu_1+b\mu_2,a^2\sigma_1^2+b^2\sigma_2^2)\)</span>.</li>
</ul>
</li>
</ul>
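<p>Both properties are easy to check numerically. The sketch below assumes the standard identity Φ(z) = (1 + erf(z/√2))/2 to evaluate the standard normal CDF with <code>math.erf</code>; the parameter values are arbitrary.</p>

```python
import math

def exp_survival(x, lam):
    """P(X > x) = e^{-lam x} for X ~ Exp(lam), x >= 0."""
    return math.exp(-lam * x)

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Memorylessness: {X > s + t} is contained in {X > s}, so
# P(X > s + t | X > s) = P(X > s + t) / P(X > s), which should equal P(X > t).
lam, s, t = 0.7, 2.0, 1.5
conditional = exp_survival(s + t, lam) / exp_survival(s, lam)
print(conditional, exp_survival(t, lam))

# Standardization: P(a < X <= b) = Phi((b - mu)/sigma) - Phi((a - mu)/sigma).
mu, sigma = 100.0, 15.0
a, b = mu - sigma, mu + sigma          # one standard deviation on each side
print(Phi((b - mu) / sigma) - Phi((a - mu) / sigma))   # about 0.6827
```
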
<h3 id="五-的分布">5 Distribution of <span class="math inline">\(Y=g(X)\)</span></h3>
<ul>
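<li><strong>Discrete case</strong>: add a row <span class="math inline">\(g(x_i)\)</span> to the distribution table, then merge the columns that share the same value of <span class="math inline">\(g(x_i)\)</span> by adding their probabilities.</li>
<li><strong>Continuous case</strong>: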
<ul>
<li><strong>Start from the definition of the CDF</strong>: <span class="math inline">\(F_Y(y)=P(Y\leq y)=P(g(X)\leq y)=\int_{g(x)\leq y}f_X(x)\text{d}x\)</span>.</li>
<li><strong>Solve the inequality</strong>: solve <span class="math inline">\(g(x)\leq y\)</span> for <span class="math inline">\(x\)</span> and denote the solution set <span class="math inline">\(\{x\mid g(x)\leq y\}\)</span> by <span class="math inline">\(A\)</span>.
<ul>
<li>If <span class="math inline">\(g\)</span> is strictly increasing, with inverse function <span class="math inline">\(h(y)\)</span>: <span class="math inline">\(F_Y(y)=P(X\leq h(y))=F_X(h(y))\)</span>.</li>
<li>If <span class="math inline">\(g\)</span> is strictly decreasing, with inverse function <span class="math inline">\(h(y)\)</span>: <span class="math inline">\(F_Y(y)=P(X\geq h(y))=1-P(X&lt; h(y))=1-F_X(h(y))\)</span> (the last step uses <span class="math inline">\(P(X=h(y))=0\)</span> for continuous <span class="math inline">\(X\)</span>).</li>
<li>If <span class="math inline">\(g\)</span> is not monotone, split the range of <span class="math inline">\(x\)</span> into intervals on which <span class="math inline">\(g\)</span> is monotone and handle each piece separately.</li>
</ul>
</li>
<li><strong>Another view</strong>: <span class="math inline">\(P(g(X)\leq y)=\boxed{P(\{\omega\in \Omega\mid g(X(\omega))\leq y\})=P(X\in\{x\in \mathbb{R}\mid g(x)\leq y\})=\int_{g(x)\leq y}f_X(x)\text{d}x}\)</span>.
<ul>
<li><strong>Why the boxed identity holds</strong>: since <span class="math inline">\(X(\omega)\)</span> is a real number, <span class="math inline">\(\boxed{ \{ \omega\in \Omega\mid g(X(\omega))\leq y \} = \{ \omega\in\Omega\mid X(\omega)\in\{x\in\mathbb{R}\mid g(x)\leq y \} \} }\)</span>. The two sides are the same event, so the probability measure assigns them the same value: <span class="math inline">\(P(g(X)\leq y)=P(X\in \{x\in\mathbb{R}\mid g(x)\leq y\})\)</span>. By the generalized form of the density, <span class="math inline">\(P(X\in A)=\int_Af_X(x)\text{d}x\)</span>; substituting <span class="math inline">\(A=\{x\in\mathbb{R}\mid g(x)\leq y\}\)</span> gives the formula above.</li>
<li><strong>Note</strong>: strictly speaking, the boxed integral is a Lebesgue integral, <strong>but when <span class="math inline">\(A\)</span> is an interval or a finite union of intervals it reduces to an ordinary Riemann integral</strong>.</li>
</ul>
</li>
</ul>
</li>
</ul>
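<p>A minimal sketch of the decreasing case, assuming X ~ U(0,1) and g(x) = -ln x (an illustrative choice). The formula gives F_Y(y) = 1 - F_X(e^{-y}) = 1 - e^{-y}, which the code compares with a Monte Carlo estimate; the seed and sample size are arbitrary.</p>

```python
import math
import random

def F_X(x):
    """CDF of X ~ U(0, 1)."""
    return min(max(x, 0.0), 1.0)

def F_Y(y):
    """CDF of Y = -ln X. g(x) = -ln x is strictly decreasing on (0, 1) with inverse
    h(y) = e^{-y}, so F_Y(y) = P(X >= h(y)) = 1 - F_X(h(y)) for y >= 0."""
    return 1.0 - F_X(math.exp(-y)) if y >= 0 else 0.0

random.seed(0)
# 1 - random.random() lies in (0, 1], which avoids log(0) and is still uniform.
samples = [-math.log(1.0 - random.random()) for _ in range(200_000)]
y0 = 1.0
empirical = sum(s <= y0 for s in samples) / len(samples)
print(empirical, F_Y(y0))   # both near 1 - e^{-1}: Y turns out to be Exp(1)
```
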
<h2 id="第三章-二维随机变量">Chapter 3: Two-Dimensional Random Variables</h2>
<h3 id="一-二维随机变量联合分布函数与边缘分布函数">1 Two-Dimensional Random Variables, Joint and Marginal Distribution Functions</h3>
<ul>
<li>
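<p><strong>Two-dimensional random variable</strong>: if <span class="math inline">\(X, Y\)</span> are two random variables defined on the same probability space <span class="math inline">\((\Omega, F, P)\)</span>, then <span class="math inline">\((X,Y): \Omega\rightarrow\mathbb{R}^2\)</span> is called a two-dimensional random variable.</p>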
</li>
<li>
<p><strong>Joint distribution function</strong>: the joint distribution function of <span class="math inline">\((X,Y)\)</span> is <span class="math inline">\(F_{X,Y}(x,y)=P(X\leq x, Y\leq y),\ x,y\in \mathbb{R}\)</span>, the probability that <span class="math inline">\(X\)</span> does not exceed <span class="math inline">\(x\)</span> and, at the same time, <span class="math inline">\(Y\)</span> does not exceed <span class="math inline">\(y\)</span>.</p>
<ul>
<li>
<p><strong>Boundedness</strong>: <span class="math inline">\(0\leq F_{X,Y}(x,y)\leq 1\)</span>.</p>
</li>
<li>
<p><strong>Normalization</strong>: the following four limits hold:</p>
<ul>
<li><span class="math inline">\(F_{X,Y}(+\infty,+\infty)=1\)</span></li>
<li><span class="math inline">\(F_{X,Y}(x,-\infty)=0\)</span></li>
<li><span class="math inline">\(F_{X,Y}(-\infty,y)=0\)</span></li>
<li><span class="math inline">\(F_{X,Y}(-\infty,-\infty)=0\)</span></li>
</ul>
</li>
<li>
<p><strong>Non-decreasing in each argument</strong>:</p>
<ul>
<li>If <span class="math inline">\(x_1&lt;x_2\)</span>, then <span class="math inline">\(F_{X,Y}(x_1,y)\leq F_{X,Y}(x_2,y)\)</span>.</li>
<li>If <span class="math inline">\(y_1&lt;y_2\)</span>, then <span class="math inline">\(F_{X,Y}(x,y_1)\leq F_{X,Y}(x,y_2)\)</span>.</li>
</ul>
</li>
<li>
<p><strong>Right-continuous in each argument</strong>:</p>
<ul>
<li>
<p><span class="math inline">\(F_{X,Y}(x+0,y)=F_{X,Y}(x,y)\)</span></p>
</li>
<li>
<p><span class="math inline">\(F_{X,Y}(x,y+0)=F_{X,Y}(x,y)\)</span></p>
</li>
</ul>
</li>
</ul>
</li>
<li>
<p><strong>Useful identities</strong>:</p>
<ul>
<li><span class="math inline">\(P(x_1&lt;X\leq x_2,y_1&lt;Y\leq y_2) = F_{X,Y}(x_2,y_2)-F_{X,Y}(x_1,y_2)-F_{X,Y}(x_2,y_1)+F_{X,Y}(x_1,y_1)\)</span></li>
<li>For example: <span class="math inline">\(P(X&gt;x_1, Y&gt;y_1)=1-F_{X,Y}(x_1,+\infty)-F_{X,Y}(+\infty,y_1)+F_{X,Y}(x_1,y_1)\)</span>.</li>
</ul>
</li>
<li>
<p><strong>Marginal distribution function</strong>: the distribution of one variable on its own, ignoring the other.</p>
<ul>
<li>Let the joint distribution function of <span class="math inline">\((X,Y)\)</span> be <span class="math inline">\(F_{X,Y}(x,y)\)</span>. Then:</li>
<li>The marginal distribution function of <span class="math inline">\(X\)</span> is <span class="math inline">\(F_X(x)=P(X\leq x)=P(X\leq x,Y &lt; +\infty)=F_{X,Y}(x,+\infty)\)</span>.</li>
<li>The marginal distribution function of <span class="math inline">\(Y\)</span> is <span class="math inline">\(F_Y(y)=P(Y\leq y)=P(X &lt; +\infty,Y\leq y)=F_{X,Y}(+\infty,y)\)</span>.</li>
</ul>
</li>
</ul>
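<p>The rectangle formula can be checked exactly on a simple joint CDF. This sketch assumes X and Y are independent U(0,1), so F(x, y) is the product of the clipped coordinates; <code>BIG</code> is a hypothetical stand-in for +∞, valid because F stops changing once an argument exceeds 1.</p>

```python
from fractions import Fraction

def clip01(t):
    return min(max(t, Fraction(0)), Fraction(1))

def F(x, y):
    """Joint CDF of two independent U(0,1) variables: F(x, y) = clip(x) * clip(y)."""
    return clip01(Fraction(x)) * clip01(Fraction(y))

BIG = Fraction(2)   # plays the role of +infinity: F(x, y) = F(x, 1) for every y >= 1

x1, x2 = Fraction(1, 4), Fraction(3, 4)
y1, y2 = Fraction(1, 5), Fraction(1, 2)
rect = F(x2, y2) - F(x1, y2) - F(x2, y1) + F(x1, y1)   # P(x1 < X <= x2, y1 < Y <= y2)
upper = 1 - F(x1, BIG) - F(BIG, y1) + F(x1, y1)        # P(X > x1, Y > y1)
print(rect, upper)   # 3/20 3/5, i.e. (x2-x1)(y2-y1) and (1-x1)(1-y1)
```
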
<h3 id="二-二维离散型随机变量">2 Two-Dimensional Discrete Random Variables</h3>
<ul>
<li>
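<p><strong>Two-dimensional discrete random variable</strong>: if <span class="math inline">\(X, Y\)</span> are two discrete random variables on the same probability space <span class="math inline">\((\Omega, F, P)\)</span>, then <span class="math inline">\((X,Y)\)</span> is called a two-dimensional discrete random variable; it takes at most countably many value pairs <span class="math inline">\((x_i,y_j)\)</span>.</p>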
</li>
<li>
<p><strong>Joint distribution law</strong>: <span class="math inline">\(p_{ij}=P(X=x_i,Y=y_j)\)</span>, or equivalently a table.</p>
</li>
<li>
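<p><strong>Marginal distribution law</strong>: in the table form, add a column on the right holding the row sums and a row at the bottom holding the column sums.</p>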
<ul>
<li><span class="math inline">\(p_{i\cdot}=P(X=x_i)=\sum_{j}P(X=x_i,Y=y_j)=\sum_j p_{ij}\)</span>.</li>
<li><span class="math inline">\(p_{\cdot j}=P(Y=y_j)=\sum_{i}P(X=x_i,Y=y_j)=\sum_i p_{ij}\)</span>.</li>
</ul>
</li>
<li>
<p><strong>Conditional distribution law</strong> (defined where the denominator is positive):</p>
<ul>
<li>
<p><span class="math inline">\(P(X=x_i|Y=y_j)=\dfrac{P(X=x_i,Y=y_j)}{P(Y=y_j)}\)</span>.</p>
</li>
<li>
<p><span class="math inline">\(P(Y=y_j|X=x_i)=\dfrac{P(X=x_i,Y=y_j)}{P(X=x_i)}\)</span>.</p>
</li>
</ul>
</li>
<li>
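<p><strong>Checking independence</strong>: <span class="math inline">\(X\)</span> and <span class="math inline">\(Y\)</span> are independent if and only if <span class="math inline">\(P(X=x_i,Y=y_j)=P(X=x_i)P(Y=y_j)\)</span> for all <span class="math inline">\(i,j\)</span>; equivalently, the rows (or columns) of the joint table are proportional to each other.</p>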
</li>
</ul>
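<p>The table operations above translate directly into code. The joint table below is a made-up example; the sketch computes both marginals (row and column sums), one conditional law, and the product test for independence.</p>

```python
from fractions import Fraction as Fr

# Hypothetical joint table p_ij = P(X = x_i, Y = y_j): rows are x values, columns y values.
xs, ys = [0, 1], [0, 1, 2]
p = [[Fr(1, 10), Fr(2, 10), Fr(1, 10)],
     [Fr(1, 10), Fr(2, 10), Fr(3, 10)]]

p_x = [sum(row) for row in p]                                          # row sums
p_y = [sum(p[i][j] for i in range(len(xs))) for j in range(len(ys))]   # column sums

def cond_x_given_y(j):
    """P(X = x_i | Y = y_j) = p_ij / P(Y = y_j) for each i (requires P(Y = y_j) > 0)."""
    return [p[i][j] / p_y[j] for i in range(len(xs))]

independent = all(p[i][j] == p_x[i] * p_y[j]
                  for i in range(len(xs)) for j in range(len(ys)))
print(p_x, p_y)
print(cond_x_given_y(2), independent)   # p_00 = 1/10 but P(X=0)P(Y=0) = 2/25: not independent
```
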
<h3 id="三-二维连续型随机变量">3 Two-Dimensional Continuous Random Variables</h3>
<ul>
<li>
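<p><strong>Two-dimensional continuous random variable</strong>: informally, <span class="math inline">\((X,Y): \Omega\rightarrow\mathbb{R}^2\)</span> takes values continuously (uncountably many points) over some region of the plane; precisely, it is continuous when it has a joint density, as defined below.</p>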
</li>
<li>
<p><strong>Joint distribution function</strong>: <span class="math inline">\(F_{X,Y}(x,y)=P(X\leq x, Y\leq y),\ x,y\in \mathbb{R}\)</span>.</p>
</li>
<li>
<p><strong>Joint probability density function</strong>: if there is a non-negative function <span class="math inline">\(f_{X,Y}(x,y)\)</span> such that <span class="math inline">\(F_{X,Y}(x,y)=\int_{-\infty}^{y}\int_{-\infty}^{x}f_{X,Y}(u,v)\text{d}u\text{d}v\)</span> for all <span class="math inline">\(x,y\in \mathbb{R}\)</span>, then <span class="math inline">\(f_{X,Y}(x,y)\)</span> is called the joint probability density function of <span class="math inline">\((X,Y)\)</span>.</p>
<ul>
<li><strong>Generalized form</strong>: for any (measurable) region <span class="math inline">\(D\)</span> of the plane, <span class="math inline">\(P((X,Y)\in D)=\iint_{D}f_{X,Y}(x,y)\text{d}x\text{d}y\)</span>.</li>
</ul>
</li>
<li>
<p><strong>Marginal distribution function</strong>: the cumulative probability of one variable alone, ignoring the other.</p>
<ul>
<li>Marginal distribution function of <span class="math inline">\(X\)</span>: <span class="math inline">\(F_X(x) = P(X \le x) = \lim_{y \to +\infty} F_{X,Y}(x, y)\)</span>;</li>
<li>Marginal distribution function of <span class="math inline">\(Y\)</span>: <span class="math inline">\(F_Y(y) = P(Y \le y) = \lim_{x \to +\infty} F_{X,Y}(x, y)\)</span>.</li>
</ul>
</li>
<li>
<p><strong>Marginal density</strong>: keep the variable of interest and "integrate out" the other one.</p>
<ul>
<li>Marginal density of <span class="math inline">\(X\)</span> (integrate along a vertical line at fixed <span class="math inline">\(x\)</span>): <span class="math inline">\(f_X(x)=\int_{-\infty}^{+\infty} f_{X,Y}(x, y)\text{d}y\)</span>.</li>
<li>Marginal density of <span class="math inline">\(Y\)</span> (integrate along a horizontal line at fixed <span class="math inline">\(y\)</span>): <span class="math inline">\(f_Y(y) = \int_{-\infty}^{+\infty} f_{X,Y}(x, y)\text{d}x\)</span>.</li>
<li>Proof: since <span class="math inline">\(F_X(x) = \lim_{y \to +\infty}F_{X,Y}(x,y)= \int_{-\infty}^{x} \int_{-\infty}^{+\infty} f_{X,Y}(u, v)\text{d}v\text{d}u=\int_{-\infty}^{x}\left( \int_{-\infty}^{+\infty} f_{X,Y}(u, v)\text{d}v \right) \text{d}u\)</span>, we have <span class="math inline">\(f_X(x)=\dfrac{\text{d}}{\text{d}x}\left[\int_{-\infty}^{x}\left( \int_{-\infty}^{+\infty} f_{X,Y}(u, v)\text{d}v \right) \text{d}u\right]\)</span>. Writing <span class="math inline">\(g(u)= \int_{-\infty}^{+\infty} f_{X,Y}(u, v)\text{d}v\)</span>, differentiation of an integral with a variable upper limit gives <span class="math inline">\(f_X(x)=\dfrac{\text{d}}{\text{d}x}\left[\int_{-\infty}^{x}g(u) \text{d}u\right]=g(x)\)</span> at every point where <span class="math inline">\(g\)</span> is continuous.</li>
</ul>
</li>
<li>
<p><strong>Conditional distribution functions and conditional densities</strong> (defined where the denominator is positive):</p>
<ul>
<li>Conditional distribution function of <span class="math inline">\(X\)</span> given <span class="math inline">\(Y=y\)</span>: <span class="math inline">\(F_{X|Y}(x|y)=P(X\leq x|Y=y)=\dfrac{\int_{-\infty}^x f_{X,Y}(u,y)\text{d}u}{f_Y(y)}\)</span>.</li>
<li>Conditional distribution function of <span class="math inline">\(Y\)</span> given <span class="math inline">\(X=x\)</span>: <span class="math inline">\(F_{Y|X}(y|x)=P(Y\leq y|X=x)=\dfrac{\int_{-\infty}^y f_{X,Y}(x,v)\text{d}v}{f_X(x)}\)</span>.</li>
<li>Conditional density of <span class="math inline">\(X\)</span> given <span class="math inline">\(Y=y\)</span>: <span class="math inline">\(f_{X|Y}(x|y)=\dfrac{f_{X,Y}(x,y)}{f_Y(y)}\)</span>.</li>
<li>Conditional density of <span class="math inline">\(Y\)</span> given <span class="math inline">\(X=x\)</span>: <span class="math inline">\(f_{Y|X}(y|x)=\dfrac{f_{X,Y}(x,y)}{f_X(x)}\)</span>.</li>
</ul>
</li>
<li>
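<p><strong>Checking whether two continuous random variables are independent</strong>: either of the following conditions suffices:</p>
<ul>
<li><span class="math inline">\(F_{X,Y}(x, y) = F_X(x) \cdot F_Y(y),\quad \forall x, y \in \mathbb{R}\)</span>.</li>
<li><span class="math inline">\(f_{X,Y}(x, y) = f_X(x) \cdot f_Y(y)\)</span> for all <span class="math inline">\((x,y)\)</span> except possibly on a set of measure zero.</li>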
</ul>
</li>
<li>
<p><strong>Definition of independence for two random variables</strong>: <span class="math inline">\(P(X\in A,\ Y\in B)=P(X\in A)P(Y \in B)\)</span> for all (Borel) sets <span class="math inline">\(A,B\subseteq \mathbb{R}\)</span>.</p>
<ul>
<li>Corollary: if <span class="math inline">\(X\)</span> and <span class="math inline">\(Y\)</span> are independent and <span class="math inline">\(f(x)\)</span>, <span class="math inline">\(g(y)\)</span> are measurable functions, then <span class="math inline">\(f(X)\)</span> and <span class="math inline">\(g(Y)\)</span> are independent.</li>
</ul>
</li>
</ul>
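<p>A numerical sketch of marginal and conditional densities, assuming the illustrative joint density f(x, y) = x + y on the unit square, whose marginal is f_X(x) = x + 1/2. The midpoint-rule helper <code>integrate</code> is a hypothetical name, not a library function.</p>

```python
def f_xy(x, y):
    """Illustrative joint density f(x, y) = x + y on the unit square, 0 elsewhere."""
    return x + y if 0 <= x <= 1 and 0 <= y <= 1 else 0.0

def integrate(g, a, b, n=2000):
    """Midpoint rule on [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

def f_x(x):
    """Marginal density of X: integrate out y along the vertical line at x."""
    return integrate(lambda y: f_xy(x, y), 0.0, 1.0)   # f_xy is 0 for y outside [0, 1]

x0 = 0.3
fx0 = f_x(x0)                     # exact value x0 + 1/2 = 0.8
# Conditional density f_{Y|X}(y | x0) = f(x0, y) / f_X(x0); it must integrate to 1 in y.
cond_total = integrate(lambda y: f_xy(x0, y) / fx0, 0.0, 1.0)
print(fx0, cond_total)
```

<p>The marginal is computed once and reused inside the conditional density, which keeps the inner loop from re-integrating for every <code>y</code>.</p>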
<h3 id="四-的函数的分布">4 Distribution of a Function <span class="math inline">\(Z=g(X,Y)\)</span></h3>
<p>Starting from <span class="math inline">\(F_Z(z)=P(Z\leq z)=P(g(X,Y)\leq z)\)</span>, rewrite the event on the right-hand side:</p>
<p></p><div class="math display">\[\{\omega\in\Omega\mid g(X(\omega),Y(\omega))\leq z\}=\{\omega\in\Omega\mid (X(\omega),Y(\omega))\in\{(x,y)\in\mathbb{R}^2\mid g(x,y)\leq z\}\}
\]</div><p></p><p>Both sides are the same event, so the probability measure gives them the same value. Writing <span class="math inline">\(A_z=\{(x,y)\in\mathbb{R}^2\mid g(x,y)\leq z\}\)</span>, we get <span class="math inline">\(P(g(X,Y)\leq z)=P((X,Y)\in A_z)\)</span>. By the generalized form of the joint density:</p>
<p></p><div class="math display">\[P((X,Y)\in A_z)=\iint_{A_z}f_{X,Y}(x,y)\text{d}x\text{d}y
\]</div><p></p><p>Therefore:</p>
<p></p><div class="math display">\[\boxed{F_Z(z)=\iint_{g(x,y)\leq z}f_{X,Y}(x,y)\text{d}x\text{d}y}
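\]</div><p></p><p>Again, this is strictly a Lebesgue integral, but <strong>for a continuous density and a regular region it can be treated as a two-dimensional Riemann integral</strong>.</p>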
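<p>To see the boxed formula at work, take the sum Z = X + Y of two independent U(0,1) variables (an assumed example). The sketch approximates the double integral over the region x + y ≤ z by counting grid cells and compares it with the known result F_Z(z) = z²/2 for 0 ≤ z ≤ 1 and 1 - (2 - z)²/2 for 1 ≤ z ≤ 2.</p>

```python
def F_Z_grid(z, n=400):
    """Approximate F_Z(z) = double integral of f_{X,Y} over {x + y <= z} on an n-by-n grid.

    Assumes X, Y independent U(0, 1), so f_{X,Y} = 1 on the unit square: each cell whose
    midpoint satisfies g(x, y) = x + y <= z contributes its area h * h.
    """
    h = 1.0 / n
    mids = [(k + 0.5) * h for k in range(n)]
    inside = sum(1 for x in mids for y in mids if x + y <= z)
    return inside * h * h

def F_Z_exact(z):
    """Known CDF of the sum of two independent U(0, 1) variables."""
    if z <= 0:
        return 0.0
    if z <= 1:
        return z * z / 2
    if z <= 2:
        return 1 - (2 - z) ** 2 / 2
    return 1.0

for z in (0.5, 1.0, 1.5):
    print(z, F_Z_grid(z), F_Z_exact(z))
```

<p>The grid count is crude (its error shrinks like 1/n), but it mirrors the formula literally: integrate the joint density over the set where g(x, y) ≤ z.</p>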
<p>In addition, if <span class="math inline">\(X_1,X_2,\cdots,X_n\)</span> are mutually independent, then:</p>
<p>the distribution function of <span class="math inline">\(Z=\max(X_1,X_2,\cdots,X_n)\)</span> is <span class="math inline">\(F_{\max}(z)=F_{X_1}(z)F_{X_2}(z)\cdots F_{X_n}(z)\)</span>;</p>
<p>the distribution function of <span class="math inline">\(Z=\min(X_1,X_2,\cdots,X_n)\)</span> is <span class="math inline">\(F_{\min}(z)=1-[1-F_{X_1}(z)][1-F_{X_2}(z)]\cdots[1-F_{X_n}(z)]\)</span>.</p>
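<p>A Monte Carlo sketch of the max/min formulas, assuming independent exponential variables with arbitrary rates. For the minimum it uses F_min(z) = 1 - ∏(1 - F_{X_i}(z)), the probability that not every X_i exceeds z; for exponentials this simplifies to an exponential CDF whose rate is the sum of the rates.</p>

```python
import math
import random

lams = [0.5, 1.0, 2.0]   # rates of independent exponential variables (arbitrary choice)

def cdf(x, lam):
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

def F_max(z):
    """P(max X_i <= z): every X_i must be <= z, so multiply the CDFs."""
    return math.prod(cdf(z, lam) for lam in lams)

def F_min(z):
    """P(min X_i <= z) = 1 - P(every X_i > z) = 1 - prod(1 - F_i(z))."""
    return 1.0 - math.prod(1.0 - cdf(z, lam) for lam in lams)

random.seed(1)
trials = 100_000
draws = [[random.expovariate(lam) for lam in lams] for _ in range(trials)]
z = 0.4
emp_max = sum(max(d) <= z for d in draws) / trials
emp_min = sum(min(d) <= z for d in draws) / trials
print(emp_max, F_max(z))
print(emp_min, F_min(z))   # F_min(z) also equals 1 - e^{-(0.5 + 1 + 2) z}
```
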
<h2 id="第四章-数字特征">Chapter 4: Numerical Characteristics</h2>
<h3 id="一-数学期望">1 Mathematical Expectation</h3>
<ul>
<li><strong>Expectation of a discrete random variable</strong>: if the distribution law of <span class="math inline">\(X\)</span> is <span class="math inline">\(P(X=x_i)=p_i\)</span>, then <span class="math inline">\(\boxed{E(X)=\sum_{i}x_ip_i}\)</span> (provided the series converges absolutely).</li>
<li><strong>Expectation of a function of a discrete random variable</strong>: if the distribution law of <span class="math inline">\(X\)</span> is <span class="math inline">\(P(X=x_i)=p_i\)</span> and <span class="math inline">\(g(x)\)</span> is a real-valued function, then <span class="math inline">\(\boxed{E(g(X))=\sum_ig(x_i)p_i}\)</span>.</li>
<li><strong>Expectation of a continuous random variable</strong>: if <span class="math inline">\(X\)</span> has density <span class="math inline">\(f_X(x)\)</span>, then <span class="math inline">\(\boxed{E(X)=\int_{-\infty}^{+\infty}xf_X(x)\text{d}x}\)</span> (provided the integral converges absolutely).</li>
<li><strong>Expectation of a function of a continuous random variable</strong>: if <span class="math inline">\(X\)</span> has density <span class="math inline">\(f_X(x)\)</span> and <span class="math inline">\(g(x)\)</span> is a real-valued function, then <span class="math inline">\(\boxed{E(g(X))=\int_{-\infty}^{+\infty}g(x)f_X(x)\text{d}x}\)</span>.</li>
<li><strong>Expectation of a function of a two-dimensional random variable</strong>: let <span class="math inline">\(Z=g(X,Y)\)</span> be a real-valued function of the two-dimensional random variable <span class="math inline">\((X,Y)\)</span>.
<ul>
<li><strong>Discrete case</strong>: if <span class="math inline">\(P(X=x_i,Y=y_j)=p_{ij}\)</span>, then the expectation of <span class="math inline">\(Z\)</span> is <span class="math inline">\(\boxed{E(Z)=E(g(X,Y))=\sum_i\sum_jg(x_i,y_j)p_{ij}}\)</span>.</li>
<li><strong>Continuous case</strong>: if <span class="math inline">\((X,Y)\)</span> has joint density <span class="math inline">\(f_{X,Y}(x,y)\)</span>, then the expectation of <span class="math inline">\(Z\)</span> is <span class="math inline">\(\boxed{E(Z)=\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty}g(x,y)f_{X,Y}(x,y)\text{d}x\text{d}y}\)</span>.</li>
</ul>
</li>
<li><strong>Properties of expectation</strong>:
<ul>
<li><span class="math inline">\(E(aX+b)=aE(X)+b\)</span></li>
<li><span class="math inline">\(E(X\pm Y)=E(X)\pm E(Y)\)</span></li>
<li>If <span class="math inline">\(X,Y\)</span> are mutually independent, then <span class="math inline">\(E(XY)=E(X)E(Y)\)</span></li>
</ul>
</li>
</ul>
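<p>The discrete formulas and properties above can be checked exactly with rational arithmetic. Below is a minimal sketch, assuming a fair six-sided die <span class="math inline">\(X\)</span> (<span class="math inline">\(p_i=1/6\)</span>) as the example and storing a distribution law as a Python dict <code>{value: probability}</code>; the helper names are illustrative, not from any library.</p>

```python
from fractions import Fraction
from itertools import product

# Distribution law of a fair die: P(X = i) = 1/6 for i = 1..6
die = {i: Fraction(1, 6) for i in range(1, 7)}


def expectation(law, g=lambda x: x):
    """E(g(X)) = sum_i g(x_i) p_i for a discrete law {x_i: p_i}."""
    return sum(g(x) * p for x, p in law.items())


def joint_independent(law_x, law_y):
    """Joint law p_ij = P(X = x_i) P(Y = y_j) of two independent discrete variables."""
    return {(x, y): px * py for (x, px), (y, py) in product(law_x.items(), law_y.items())}


def expectation2(joint, g):
    """E(g(X, Y)) = sum_i sum_j g(x_i, y_j) p_ij."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


ex = expectation(die)                            # E(X) = 7/2
ex2 = expectation(die, lambda x: x * x)          # E(X^2) = 91/6
lin = expectation(die, lambda x: 2 * x + 3)      # E(2X+3) = 2E(X)+3 = 10

joint = joint_independent(die, die)              # two independent dice
exy = expectation2(joint, lambda x, y: x * y)    # E(XY) = E(X)E(Y) = 49/4
print(ex, ex2, lin, exy)
```

<p>Using <code>Fraction</code> instead of floats keeps every result exact, so identities such as <span class="math inline">\(E(XY)=E(X)E(Y)\)</span> can be compared with <code>==</code>.</p>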
<h3 id="二-方差">二 方差</h3>
<ul>
<li><span class="math inline">\(D(x)\)</span>实际上求<span class="math inline">\(X\)</span>的函数<span class="math inline">\(Y=(X-E(X))^2\)</span>的数学期望。
<ul>
<li><strong>Discrete case</strong>: if <span class="math inline">\(P(X=x_i)=p_i\)</span>, then <span class="math inline">\(\boxed{D(X)=E((X-E(X))^2)=\sum_i(x_i-E(X))^2p_i}\)</span>.</li>
<li><strong>Continuous case</strong>: if the probability density function is <span class="math inline">\(f_X(x)\)</span>, then <span class="math inline">\(\boxed{D(X)=E((X-E(X))^2)=\int_{-\infty}^{+\infty}(x-E(X))^2f_X(x)\text{d}x}\)</span>.</li>
<li>Equivalent computational formula: <span class="math inline">\(\boxed{D(X)=E(X^2)-(E(X))^2}\)</span>.</li>
</ul>
</li>
<li><strong>Properties of variance</strong>:
<ul>
<li><span class="math inline">\(D(aX+b)=a^2D(X)\)</span></li>
<li>If <span class="math inline">\(X,Y\)</span> are mutually independent, then <span class="math inline">\(D(X\pm Y)=D(X)+D(Y)\)</span> (note that the right-hand side always has a <strong>plus sign</strong>)</li>
</ul>
</li>
</ul>
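<p>Using a fair six-sided die as an assumed example, the sketch below computes <span class="math inline">\(D(X)\)</span> both from the definition <span class="math inline">\(E((X-E(X))^2)\)</span> and from <span class="math inline">\(E(X^2)-(E(X))^2\)</span>, then checks <span class="math inline">\(D(aX+b)=a^2D(X)\)</span> and the additivity of variance for two independent dice. All helper names are illustrative.</p>

```python
from fractions import Fraction
from itertools import product

die = {i: Fraction(1, 6) for i in range(1, 7)}  # fair die, P(X = i) = 1/6


def mean(law):
    return sum(x * p for x, p in law.items())


def var_definition(law):
    """D(X) = E((X - E(X))^2) = sum_i (x_i - E(X))^2 p_i."""
    mu = mean(law)
    return sum((x - mu) ** 2 * p for x, p in law.items())


def var_shortcut(law):
    """D(X) = E(X^2) - (E(X))^2."""
    return sum(x * x * p for x, p in law.items()) - mean(law) ** 2


def transform(law, g):
    """Distribution law of g(X); probabilities of equal values are merged."""
    out = {}
    for x, p in law.items():
        out[g(x)] = out.get(g(x), 0) + p
    return out


def sum_independent(law_x, law_y):
    """Distribution law of X + Y for independent X and Y."""
    out = {}
    for (x, px), (y, py) in product(law_x.items(), law_y.items()):
        out[x + y] = out.get(x + y, 0) + px * py
    return out


d = var_definition(die)                                        # 35/12
d_lin = var_definition(transform(die, lambda x: -2 * x + 5))   # (-2)^2 * 35/12 = 35/3
d_sum = var_definition(sum_independent(die, die))              # 35/12 + 35/12 = 35/6
print(d, var_shortcut(die), d_lin, d_sum)
```

<p>The shift <span class="math inline">\(b=5\)</span> has no effect on the result, and the negative factor <span class="math inline">\(a=-2\)</span> enters squared, as the property states.</p>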
<h3 id="三-协方差和相关系数">三 协方差和相关系数</h3>
<ul>
<li>
<p><strong>Covariance</strong>: let <span class="math inline">\(X,Y\)</span> be two random variables with expectations <span class="math inline">\(E(X)\)</span> and <span class="math inline">\(E(Y)\)</span>. Their covariance is <span class="math inline">\(\boxed{\text{Cov}(X,Y)=E[(X-E(X))(Y-E(Y))]=E(XY)-E(X)E(Y)}\)</span>.</p>
</li>
<li>
<p><strong>Properties of covariance</strong>:</p>
<ul>
<li>
<p><span class="math inline">\(\text{Cov}(aX+b,cY+d)=ac\text{Cov}(X,Y)\)</span>、<span class="math inline">\(\text{Cov}(X,X)=D(X)\)</span></p>
<ul>
<li><span class="math inline">\(\text{Cov}(X,Y)=\text{Cov}(Y,X)\)</span>、<span class="math inline">\(\text{Cov}(X,C)=0\)</span></li>
<li><span class="math inline">\(\text{Cov}(X,Y+Z)=\text{Cov}(X,Y)+\text{Cov}(X,Z)\)</span>、<span class="math inline">\(\text{Cov}(X+Y,Z)=\text{Cov}(X,Z)+\text{Cov}(Y,Z)\)</span></li>
<li><span class="math inline">\(D(X\pm Y)=D(X)+D(Y)\pm 2\text{Cov}(X,Y)\)</span>(知三求一)</li>
</ul>
</li>
<li>
<p>If <span class="math inline">\(X,Y\)</span> are mutually independent, then <span class="math inline">\(\text{Cov}(X,Y)=0\)</span>; <strong>the converse does not hold</strong>.</p>
</li>
</ul>
</li>
<li>
<p><strong>Correlation coefficient</strong>: let <span class="math inline">\(X,Y\)</span> be two random variables with expectations <span class="math inline">\(E(X)\)</span> and <span class="math inline">\(E(Y)\)</span>, variances <span class="math inline">\(D(X)&gt;0\)</span> and <span class="math inline">\(D(Y)&gt;0\)</span>, and covariance <span class="math inline">\(\text{Cov}(X,Y)\)</span>. Then: <span class="math inline">\(\boxed{\rho_{XY}=\dfrac{\text{Cov}(X,Y)}{\sqrt{D(X)D(Y)}}}\)</span>.</p>
</li>
<li>
<p><strong>Properties of the correlation coefficient</strong>:</p>
<ul>
<li>
<p>If <span class="math inline">\(\rho_{XY}=0\)</span>, then <strong><span class="math inline">\(X\)</span> and <span class="math inline">\(Y\)</span> are said to be uncorrelated</strong>.</p>
<ul>
<li>
<p>Uncorrelatedness only means that <span class="math inline">\(X,Y\)</span> have no linear relationship, whereas independence means that they have no relationship at all.</p>
</li>
<li>
<p><span class="math inline">\(X,Y\)</span>独立则一定不相关,而<strong>不相关不能推出独立</strong>。</p>
</li>
</ul>
</li>
<li>
<p><span class="math inline">\(|\rho_{XY}|\leq 1\)</span>、<span class="math inline">\(\rho_{XY}=\rho_{YX}\)</span>、<span class="math inline">\(\rho_{XX}=1\)</span>。</p>
</li>
<li>
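<p><strong>Four equivalent statements of uncorrelatedness</strong>: <strong>for two variables, zero covariance, a zero correlation coefficient, the expectation of the product equalling the product of the expectations, and the variance of a sum or difference splitting into the sum of the variances are all equivalent</strong> (assuming <span class="math inline">\(D(X),D(Y)&gt;0\)</span> so that <span class="math inline">\(\rho_{XY}\)</span> is defined).</p>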
<p></p><div class="math display">\[\boxed{\text{Cov}(X,Y)=0\iff\rho_{XY}=0\iff E(XY)=E(X)E(Y)\iff D(X\pm Y)=D(X)+D(Y)}
\]</div><p></p></li>
</ul>
</li>
</ul>
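<p>A standard counterexample shows why uncorrelatedness does not imply independence. In the sketch below (an assumed example, not from the original notes), <span class="math inline">\(X\)</span> is uniform on <span class="math inline">\(\{-1,0,1\}\)</span> and <span class="math inline">\(Y=X^2\)</span>: the covariance and the correlation coefficient are exactly 0, yet <span class="math inline">\(P(X=0,Y=0)\neq P(X=0)P(Y=0)\)</span>, so <span class="math inline">\(X\)</span> and <span class="math inline">\(Y\)</span> are dependent. The helpers are illustrative and work on a joint distribution law stored as a dict.</p>

```python
import math
from fractions import Fraction

# Joint law of (X, Y) with X uniform on {-1, 0, 1} and Y = X^2
joint = {(x, x * x): Fraction(1, 3) for x in (-1, 0, 1)}


def e(joint, g):
    """E(g(X, Y)) = sum over (x, y) of g(x, y) * p."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


def cov(joint):
    """Cov(X, Y) = E(XY) - E(X)E(Y)."""
    return e(joint, lambda x, y: x * y) - e(joint, lambda x, y: x) * e(joint, lambda x, y: y)


def corr(joint):
    """rho_XY = Cov(X, Y) / sqrt(D(X) D(Y)); assumes D(X), D(Y) > 0."""
    dx = e(joint, lambda x, y: x * x) - e(joint, lambda x, y: x) ** 2
    dy = e(joint, lambda x, y: y * y) - e(joint, lambda x, y: y) ** 2
    return float(cov(joint)) / math.sqrt(dx * dy)


def marginal(joint, axis):
    """Marginal law of X (axis=0) or Y (axis=1)."""
    out = {}
    for xy, p in joint.items():
        out[xy[axis]] = out.get(xy[axis], 0) + p
    return out


px, py = marginal(joint, 0), marginal(joint, 1)
print("Cov(X,Y) =", cov(joint), " rho =", corr(joint))
# Independence would require P(X=0, Y=0) == P(X=0) P(Y=0)
print(joint[(0, 0)], "vs", px[0] * py[0])  # 1/3 vs 1/9, so X and Y are dependent
```

<p>Here <span class="math inline">\(Y\)</span> is a deterministic, but nonlinear, function of <span class="math inline">\(X\)</span>; the correlation coefficient only detects the linear part of the relationship.</p>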
<h3 id="四-协方差矩阵">四 协方差矩阵</h3>
<ul>
<li>
<p><strong>Covariance matrix of the random vector <span class="math inline">\(\boldsymbol{X}=(X_1,X_2,\cdots,X_n)^T\)</span></strong>: let</p>
<p></p><div class="math display">\[\boldsymbol{X} =
\begin{bmatrix}
X_1 \\
X_2 \\
\vdots \\
X_n
\end{bmatrix},\quad
E(\boldsymbol{X}) = \boldsymbol{\mu} =
\begin{bmatrix}
\mu_1 \\
\mu_2 \\
\vdots \\
\mu_n
\end{bmatrix}
\]</div><p></p><p>Then:</p>
<p></p><div class="math display">\[\text{Cov}(\boldsymbol{X}) = E \left((\boldsymbol{X} - \boldsymbol{\mu})(\boldsymbol{X} - \boldsymbol{\mu})^T \right)
\]</div><p></p><p>Written out entry by entry:</p>
<p></p><div class="math display">\[\boldsymbol{\Sigma} = \text{Cov}(\boldsymbol{X}) =
\begin{bmatrix}
\text{Cov}(X_1,X_1) &amp; \text{Cov}(X_1,X_2) &amp; \cdots &amp; \text{Cov}(X_1,X_n) \\
\text{Cov}(X_2,X_1) &amp; \text{Cov}(X_2,X_2) &amp; \cdots &amp; \text{Cov}(X_2,X_n) \\
\vdots &amp; \vdots &amp; \ddots &amp; \vdots \\
\text{Cov}(X_n,X_1) &amp; \text{Cov}(X_n,X_2) &amp; \cdots &amp; \text{Cov}(X_n,X_n)
\end{bmatrix}
\]</div><p></p><p>For a two-dimensional random variable <span class="math inline">\(\boldsymbol{Z}=(X,Y)^T\)</span>, since <span class="math inline">\(\text{Cov}(X,Y)=\text{Cov}(Y,X)\)</span>, <span class="math inline">\(\text{Cov}(X,X)=D(X)\)</span> and <span class="math inline">\(\text{Cov}(Y,Y)=D(Y)\)</span>, the covariance matrix is:</p>
<p></p><div class="math display">\[\text{Cov}(\boldsymbol{Z}) =
\begin{bmatrix}
D(X) &amp; \text{Cov}(X,Y) \\
\text{Cov}(X,Y) &amp; D(Y)
\end{bmatrix}
\]</div><p></p></li>
<li>
<p><strong>Special property</strong>:</p>
<ul>
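<li>If the random variables <span class="math inline">\(X_1, X_2, \cdots, X_n\)</span> are mutually independent, then every pair is uncorrelated (independence implies zero covariance), i.e. <span class="math inline">\(\text{Cov}(X_i,X_j)=0\)</span> for <span class="math inline">\(i\neq j\)</span>, so the covariance matrix is diagonal:<p></p><div class="math display">\[\boldsymbol{\Sigma} = \text{Cov}(\boldsymbol{X}) =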
   \begin{bmatrix}
   D(X_1) &amp; 0 &amp; \cdots &amp; 0 \\
   0 &amp; D(X_2) &amp; \cdots &amp; 0 \\
   \vdots &amp; \vdots &amp; \ddots &amp; \vdots \\
   0 &amp; 0 &amp; \cdots &amp; D(X_n)
   \end{bmatrix}
\]</div><p></p></li>
</ul>
</li>
</ul>
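<p>In practice <span class="math inline">\(\boldsymbol{\Sigma}\)</span> is usually estimated from data. The sketch below computes the sample covariance matrix <span class="math inline">\(\hat{\boldsymbol{\Sigma}}=\frac{1}{m-1}\sum_{k=1}^{m}(\boldsymbol{x}_k-\bar{\boldsymbol{x}})(\boldsymbol{x}_k-\bar{\boldsymbol{x}})^T\)</span> of <span class="math inline">\(m\)</span> observations in pure Python; the unbiased <span class="math inline">\(1/(m-1)\)</span> normalization, the simulated data and the function name are assumptions chosen for illustration. With independently generated components the off-diagonal entries come out close to 0, matching the diagonal form above.</p>

```python
import random


def sample_cov_matrix(rows):
    """Unbiased sample covariance matrix of m observations (each a list of n values)."""
    m, n = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / m for j in range(n)]
    centered = [[r[j] - means[j] for j in range(n)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (m - 1) for j in range(n)]
        for i in range(n)
    ]


rng = random.Random(42)
# Three independent components with standard deviations 1, 2 and 3
data = [[rng.gauss(0, 1), rng.gauss(0, 2), rng.gauss(0, 3)] for _ in range(20_000)]
sigma = sample_cov_matrix(data)
for row in sigma:
    print(["%.3f" % v for v in row])  # diagonal close to 1, 4, 9; off-diagonal close to 0
```

<p>Dividing by <span class="math inline">\(m-1\)</span> rather than <span class="math inline">\(m\)</span> gives the unbiased estimator; for large <span class="math inline">\(m\)</span> the difference is negligible.</p>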
<h3 id="五-二维正态分布的数字特征">五 二维正态分布的数字特征</h3>
<p>二维正态分布 <span class="math inline">\((X, Y)\)</span> 的概率密度函数为:</p>
<p></p><div class="math display">\[f_{X,Y}(x,y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}
\exp\left\{
-\frac{1}{2(1-\rho^2)}\left[
\left(\frac{x - \mu_X}{\sigma_X}\right)^2
- 2\rho\left(\frac{x - \mu_X}{\sigma_X}\right)\left(\frac{y - \mu_Y}{\sigma_Y}\right)
+ \left(\frac{y - \mu_Y}{\sigma_Y}\right)^2
\right]
\right\}
\]</div><p></p><p>where the parameters are:</p>
<ul>
<li><span class="math inline">\(\mu_X\)</span>:<span class="math inline">\(X\)</span> 的数学期望;</li>
<li><span class="math inline">\(\mu_Y\)</span>:<span class="math inline">\(Y\)</span> 的数学期望;</li>
<li><span class="math inline">\(\sigma_X &gt; 0\)</span>:<span class="math inline">\(X\)</span> 的标准差;</li>
<li><span class="math inline">\(\sigma_Y &gt; 0\)</span>:<span class="math inline">\(Y\)</span> 的标准差;</li>
<li><span class="math inline">\(-1 &lt; \rho &lt; 1\)</span>:<span class="math inline">\(X\)</span> 与 <span class="math inline">\(Y\)</span> 的相关系数。</li>
</ul>
<p>This is written as:</p>
<p></p><div class="math display">\[(X, Y) \sim N(\mu_X, \mu_Y, \sigma_X^2, \sigma_Y^2, \rho)
\]</div><p></p><table>
<thead>
<tr>
<th>Characteristic</th>
<th>Expression</th>
</tr>
</thead>
<tbody>
<tr>
<td>Joint distribution</td>
<td><span class="math inline">\((X, Y) \sim N(\mu_X, \mu_Y, \sigma_X^2, \sigma_Y^2, \rho)\)</span></td>
</tr>
<tr>
<td>Marginal distributions</td>
<td><span class="math inline">\(X \sim N(\mu_X, \sigma_X^2),\quad Y \sim N(\mu_Y, \sigma_Y^2)\)</span></td>
</tr>
<tr>
<td>Expectations</td>
<td><span class="math inline">\(E(X) = \mu_X,\quad E(Y) = \mu_Y\)</span></td>
</tr>
<tr>
<td>Variances</td>
<td><span class="math inline">\(D(X) = \sigma_X^2,\quad D(Y) = \sigma_Y^2\)</span></td>
</tr>
<tr>
<td>Covariance</td>
<td><span class="math inline">\(\text{Cov}(X, Y) = \rho \sigma_X \sigma_Y\)</span></td>
</tr>
<tr>
<td>Correlation coefficient</td>
<td><span class="math inline">\(\rho_{XY} = \rho\)</span></td>
</tr>
<tr>
<td>Independence condition</td>
<td><span class="math inline">\(X\)</span> and <span class="math inline">\(Y\)</span> are independent if and only if <span class="math inline">\(\rho = 0\)</span></td>
</tr>
<tr>
<td>Distribution of a linear combination</td>
<td><span class="math inline">\(aX + bY \sim N(a\mu_X + b\mu_Y,\ a^2\sigma_X^2 + b^2\sigma_Y^2 + 2ab\rho\sigma_X\sigma_Y)\)</span></td>
</tr>
</tbody>
</table>
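<p>Several rows of the table can be checked by simulation. The sketch below assumes the standard construction <span class="math inline">\(X=\mu_X+\sigma_X Z_1\)</span>, <span class="math inline">\(Y=\mu_Y+\sigma_Y(\rho Z_1+\sqrt{1-\rho^2}\,Z_2)\)</span> with <span class="math inline">\(Z_1,Z_2\)</span> independent <span class="math inline">\(N(0,1)\)</span>, which yields <span class="math inline">\((X,Y)\sim N(\mu_X,\mu_Y,\sigma_X^2,\sigma_Y^2,\rho)\)</span>. It compares the sample correlation with <span class="math inline">\(\rho\)</span>, and the sample mean and variance of <span class="math inline">\(aX+bY\)</span> with the last row of the table. The parameter values and helper names are arbitrary illustrative choices.</p>

```python
import math
import random


def sample_bivariate_normal(mu_x, mu_y, s_x, s_y, rho, m, seed=0):
    """Draw m pairs from N(mu_x, mu_y, s_x^2, s_y^2, rho) using two independent N(0, 1)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(m):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mu_x + s_x * z1
        y = mu_y + s_y * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
        pairs.append((x, y))
    return pairs


def mean(v):
    return sum(v) / len(v)


def var(v):
    mu = mean(v)
    return sum((t - mu) ** 2 for t in v) / (len(v) - 1)


def corr(xs, ys):
    mx, my = mean(xs), mean(ys)
    c = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return c / math.sqrt(var(xs) * var(ys))


mu_x, mu_y, s_x, s_y, rho = 1.0, -2.0, 2.0, 0.5, 0.6
a, b = 3.0, -1.0
pairs = sample_bivariate_normal(mu_x, mu_y, s_x, s_y, rho, m=50_000)
xs, ys = [p[0] for p in pairs], [p[1] for p in pairs]
combo = [a * x + b * y for x, y in pairs]

# Table row: aX + bY ~ N(a mu_X + b mu_Y, a^2 s_X^2 + b^2 s_Y^2 + 2ab rho s_X s_Y)
theory_var = a**2 * s_x**2 + b**2 * s_y**2 + 2 * a * b * rho * s_x * s_y
print("rho:", corr(xs, ys), "theory:", rho)
print("E(aX+bY):", mean(combo), "theory:", a * mu_x + b * mu_y)
print("D(aX+bY):", var(combo), "theory:", theory_var)
```

<p>Note that with <span class="math inline">\(ab&lt;0\)</span> and <span class="math inline">\(\rho&gt;0\)</span> the cross term <span class="math inline">\(2ab\rho\sigma_X\sigma_Y\)</span> reduces the variance, which the simulation reproduces.</p>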
<h3 id="六-多维正态分布">六 多维正态分布</h3>
<ul>
<li>
<p><strong>Definition of the multivariate normal distribution</strong>: let <span class="math inline">\(\boldsymbol{X} = (X_1, X_2, \ldots, X_n)^T\)</span> be an <span class="math inline">\(n\)</span>-dimensional random vector with expectation:</p>
<p></p><div class="math display">\[\boldsymbol{\mu} = E(\boldsymbol{X}) =
\begin{bmatrix}
E(X_1) \\
E(X_2) \\
\vdots \\
E(X_n)
\end{bmatrix}
=
\begin{bmatrix}
\mu_1 \\
\mu_2 \\
\vdots \\
\mu_n
\end{bmatrix}
\]</div><p></p><p>and covariance matrix:</p>
<p></p><div class="math display">\[\boldsymbol{\Sigma} = \text{Cov}(\boldsymbol{X}) = E\left((\boldsymbol{X} - \boldsymbol{\mu})(\boldsymbol{X} - \boldsymbol{\mu})^T\right)
\]</div><p></p><p>If the joint probability density function of <span class="math inline">\(\boldsymbol{X}\)</span> is:</p>
<p></p><div class="math display">\[f_{\boldsymbol{X}}(\boldsymbol{x}) = \frac{1}{(2\pi)^{n/2}\text{det}(\boldsymbol{\Sigma})^{1/2}}
\exp\left\{
-\frac{1}{2} (\boldsymbol{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\boldsymbol{x} - \boldsymbol{\mu})
\right\}
\]</div><p></p><p>where:</p>
<ul>
<li><span class="math inline">\(\boldsymbol{x} = (x_1, x_2, \ldots, x_n)^T\)</span> 是实数向量;</li>
<li><span class="math inline">\(\boldsymbol{\Sigma}\)</span> 是 <span class="math inline">\(n \times n\)</span> 协方差矩阵,必须是<strong>对称正定矩阵</strong>;</li>
<li><span class="math inline">\(\text{det}(\boldsymbol{\Sigma})\)</span> 表示矩阵 <span class="math inline">\(\boldsymbol{\Sigma}\)</span> 的行列式;</li>
</ul>
<p>then <span class="math inline">\(\boldsymbol{X}\)</span> is said to follow the <strong><span class="math inline">\(n\)</span>-dimensional normal distribution</strong>, written:</p>
<p></p><div class="math display">\[\boldsymbol{X} \sim N_n(\boldsymbol{\mu}, \boldsymbol{\Sigma})
\]</div><p></p><p>In AI papers it is commonly written as <span class="math inline">\(\boldsymbol{X} \sim \mathcal{N}_n(\boldsymbol{\mu}, \boldsymbol{\Sigma})\)</span>.</p>
</li>
<li>
<p><strong>Representation of the bivariate normal distribution</strong>:</p>
<p></p><div class="math display">\[\boldsymbol{X} = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}
\sim \mathcal{N}_2\left(
\begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix},
\begin{bmatrix}
\sigma_1^2 &amp; \rho \sigma_1 \sigma_2 \\
\rho \sigma_1 \sigma_2 &amp; \sigma_2^2
\end{bmatrix}
\right)
\]</div><p></p><p>Writing the vector as <span class="math inline">\(\boldsymbol{Z}=(X,Y)^T\)</span> instead, this becomes:</p>
<p></p><div class="math display">\[\boldsymbol{Z} = \begin{bmatrix} X \\ Y \end{bmatrix}
\sim \mathcal{N}_2\left(
\begin{bmatrix} \mu_X \\ \mu_Y \end{bmatrix},
\begin{bmatrix}
\sigma_X^2 &amp; \rho_{XY} \sigma_X \sigma_Y \\
\rho_{XY} \sigma_X \sigma_Y &amp; \sigma_Y^2
\end{bmatrix}
\right)
\]</div><p></p></li>
<li>
<p><strong>Standard normal distribution</strong>: consider an <span class="math inline">\(n\)</span>-dimensional random vector</p>
<p></p><div class="math display">\[\boldsymbol{X} = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{bmatrix}
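\]</div><p></p><p>If each component <span class="math inline">\(X_i \sim N(0, 1)\)</span> and the components are mutually independent, the random vector is said to follow the <strong><span class="math inline">\(n\)</span>-dimensional standard normal distribution</strong>, written:</p>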
<p></p><div class="math display">\[\boldsymbol{Z} \sim \mathcal{N}_n(\boldsymbol{0}, \boldsymbol{I}_n)
\]</div><p></p><p>where:</p>
<ul>
<li><span class="math inline">\(\boldsymbol{0}\)</span> 是 $ n $ 维零向量(均值为0);</li>
<li><span class="math inline">\(\boldsymbol{I}_n\)</span> 是 $ n \times n $ 的单位矩阵(协方差矩阵是对角线为1、其余为0的矩阵),表示各个维度相互独立且方差为1。</li>
</ul>
</li>
</ul>
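<p>To see that the <span class="math inline">\(n\)</span>-dimensional density reduces to the bivariate formula of Section 5, the sketch below evaluates <span class="math inline">\(f_{\boldsymbol{X}}(\boldsymbol{x})\)</span> for <span class="math inline">\(n=2\)</span> with an explicit <span class="math inline">\(2\times2\)</span> determinant and inverse, using <span class="math inline">\(\boldsymbol{\Sigma}\)</span> built from <span class="math inline">\(\sigma_1,\sigma_2,\rho\)</span> as above, and compares it with the closed-form bivariate density. The function names and parameter values are illustrative assumptions.</p>

```python
import math


def mvn_pdf_2d(x, mu, cov):
    """n-dimensional normal density for n = 2:
    exp(-(x - mu)^T Sigma^{-1} (x - mu) / 2) / ((2 pi)^(n/2) det(Sigma)^(1/2))."""
    n = 2
    (a, b), (c, d) = cov
    det = a * d - b * c
    # A symmetric 2x2 matrix is positive definite iff a > 0 and det > 0
    if a <= 0 or det <= 0:
        raise ValueError("Sigma must be symmetric positive definite")
    inv = [[d / det, -b / det], [-c / det, a / det]]  # explicit 2x2 inverse
    dx = [x[0] - mu[0], x[1] - mu[1]]
    quad = sum(dx[i] * inv[i][j] * dx[j] for i in range(n) for j in range(n))
    return math.exp(-0.5 * quad) / ((2 * math.pi) ** (n / 2) * math.sqrt(det))


def bivariate_pdf(x, y, mu_x, mu_y, s_x, s_y, rho):
    """Closed-form bivariate normal density with correlation rho."""
    u, v = (x - mu_x) / s_x, (y - mu_y) / s_y
    q = (u * u - 2 * rho * u * v + v * v) / (1 - rho ** 2)
    return math.exp(-0.5 * q) / (2 * math.pi * s_x * s_y * math.sqrt(1 - rho ** 2))


mu_x, mu_y, s_x, s_y, rho = 0.5, -1.0, 1.5, 0.8, -0.3
cov = [[s_x ** 2, rho * s_x * s_y], [rho * s_x * s_y, s_y ** 2]]
for point in [(0.0, 0.0), (1.2, -0.7), (-2.0, 1.5)]:
    general = mvn_pdf_2d(point, (mu_x, mu_y), cov)
    closed = bivariate_pdf(point[0], point[1], mu_x, mu_y, s_x, s_y, rho)
    print(point, general, closed)  # the two values agree up to rounding
```

<p>Algebraically, <span class="math inline">\(\det\boldsymbol{\Sigma}=\sigma_1^2\sigma_2^2(1-\rho^2)\)</span>, which is exactly why the bivariate formula requires <span class="math inline">\(-1&lt;\rho&lt;1\)</span>.</p>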
<h3 id="七-矩-常用不等式">七 矩 常用不等式</h3>
<ul>
<li>
<p><strong>k-th raw moment</strong> (moment about the origin): <span class="math inline">\(\mu_k'=E(X^k)\)</span>. <strong>The expectation is the first raw moment</strong>.</p>
</li>
<li>
<p><strong>k-th central moment</strong>: <span class="math inline">\(\mu_k=E((X-E(X))^k)\)</span>. <strong>The variance is the second central moment</strong>.</p>
</li>
<li>
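<p><strong>Moment generating function</strong>: the moment generating function of a random variable <span class="math inline">\(X\)</span> is defined as <span class="math inline">\(M_X(t)=E(e^{tX})\)</span>; for a continuous <span class="math inline">\(X\)</span> with density <span class="math inline">\(f_X(x)\)</span>, <span class="math inline">\(\boxed{M_X(t) = E(e^{tX}) = \int_{-\infty}^{\infty} e^{tx} f_X(x)\text{d}x}\)</span>.</p>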
<ul>
<li>
<p>If <span class="math inline">\(M_X(t)\)</span> exists and is differentiable in some neighborhood of <span class="math inline">\(t=0\)</span>, then for every positive integer <span class="math inline">\(k\)</span>: <span class="math inline">\(\boxed{E(X^k)=\dfrac{\text{d}^k}{\text{d}t^k}M_X(t)\Bigg|_{t=0}}\)</span></p>
</li>
<li>
<p>That is, the <span class="math inline">\(k\)</span>-th derivative of <span class="math inline">\(M_X(t)\)</span> evaluated at <span class="math inline">\(t=0\)</span> equals <span class="math inline">\(E(X^k)\)</span>.</p>
</li>
</ul>
</li>
<li>
<p><strong>Moment generating function of the normal distribution</strong>:</p>
<ul>
<li>Substituting <span class="math inline">\(f_X(x) = \dfrac{1}{\sqrt{2\pi}\sigma} \exp\left( -\dfrac{(x - \mu)^2}{2\sigma^2} \right)\)</span> into <span class="math inline">\(M_X(t) = E(e^{tX}) = \int_{-\infty}^{\infty} e^{tx} f_X(x)\text{d}x\)</span> and combining the exponents gives:</li>
<li><span class="math inline">\(M_X(t) = \int_{-\infty}^{\infty} e^{tx} \cdot \dfrac{1}{\sqrt{2\pi}\sigma} \exp\left( -\dfrac{(x - \mu)^2}{2\sigma^2} \right) \text{d}x= \dfrac{1}{\sqrt{2\pi}\sigma} \int_{-\infty}^{\infty} \exp\left( tx - \dfrac{(x - \mu)^2}{2\sigma^2} \right) \text{d}x\)</span>. Completing the square in <span class="math inline">\(x\)</span> gives:</li>
<li><span class="math inline">\(M_X(t)=\dfrac{1}{\sqrt{2\pi}\sigma} \int_{-\infty}^{\infty} \exp\left( -\dfrac{1}{2\sigma^2}(x - (\mu + \sigma^2 t))^2 + \dfrac{(\mu + \sigma^2 t)^2 - \mu^2}{2\sigma^2} \right) \text{d}x\)</span>. Moving the term that does not depend on <span class="math inline">\(x\)</span> outside the integral gives:</li>
<li><span class="math inline">\(M_X(t) = \exp\left( \dfrac{(\mu + \sigma^2 t)^2 - \mu^2}{2\sigma^2} \right) \cdot \int_{-\infty}^{\infty} \dfrac{1}{\sqrt{2\pi}\sigma} \cdot \exp\left( -\dfrac{(x - (\mu + \sigma^2 t))^2}{2\sigma^2} \right) \text{d}x\)</span>.</li>
<li>The integrand is exactly the density of the normal distribution <span class="math inline">\(N(\mu+\sigma^2t,\sigma^2)\)</span>, and every density integrates to 1 (<span class="math inline">\(\int_{-\infty}^{+\infty}f_X(x)\text{d}x=1\)</span>), so:</li>
<li><span class="math inline">\(M_X(t) = \exp\left( \dfrac{(\mu + \sigma^2 t)^2 - \mu^2}{2\sigma^2} \right)\)</span>. Since <span class="math inline">\(\dfrac{(\mu + \sigma^2 t)^2 - \mu^2}{2\sigma^2}=\dfrac{2\mu\sigma^2t+\sigma^4t^2}{2\sigma^2}=\mu t+\dfrac{1}{2}\sigma^2t^2\)</span>, this simplifies to <span class="math inline">\(\boxed{M_X(t) = \exp\left( \mu t + \frac{1}{2} \sigma^2 t^2 \right)}\)</span>.</li>
</ul>
</li>
<li>
<p><strong>Raw and central moments of the normal distribution</strong> (differentiate the moment generating function and substitute <span class="math inline">\(t=0\)</span>):</p>
<ul>
<li><strong>First raw moment</strong>: <span class="math inline">\(M'_X(t) = (\mu + \sigma^2 t) e^{\mu t + \frac{1}{2}\sigma^2 t^2} \Rightarrow E(X) = \mu\)</span>.</li>
<li><strong>Second raw moment</strong>: <span class="math inline">\(M''_X(t) = [(\mu + \sigma^2 t)^2 + \sigma^2] e^{\mu t + \frac{1}{2}\sigma^2 t^2} \Rightarrow E(X^2) = \mu^2 + \sigma^2\)</span>.</li>
<li><strong>Third raw moment</strong>: <span class="math inline">\(M'''_X(t) = [(\mu + \sigma^2 t)^3 + 3\sigma^2(\mu + \sigma^2 t)] e^{\mu t + \frac{1}{2}\sigma^2 t^2} \Rightarrow E(X^3) = \mu^3 + 3\mu\sigma^2\)</span>.</li>
<li><strong>Fourth raw moment</strong>: <span class="math inline">\(M''''_X(t) = [(\mu + \sigma^2 t)^4 + 6\sigma^2(\mu + \sigma^2 t)^2 + 3\sigma^4] e^{\mu t + \frac{1}{2}\sigma^2 t^2} \Rightarrow E(X^4) = \mu^4 + 6\mu^2\sigma^2 + 3\sigma^4\)</span>.</li>
<li><strong>Odd central moments</strong>: <span class="math inline">\(\mu_k=E((X-E(X))^k)=\int_{-\infty}^{+\infty} (x-\mu)^kf_X(x)\text{d}x=\dfrac{1}{\sqrt{2\pi}\sigma}\int_{-\infty}^{+\infty} (x-\mu)^ke^{-\frac{(x-\mu)^2}{2\sigma^2}}\text{d}x\)</span>. When <span class="math inline">\(k\)</span> is odd, the integrand is an odd function of <span class="math inline">\(x-\mu\)</span> (antisymmetric about <span class="math inline">\(x=\mu\)</span>), so the integral is 0, i.e. <span class="math inline">\(\mu_k=0\)</span>.</li>
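<li><strong>Even central moments</strong>: <span class="math inline">\(\mu_k=\sigma^k(k-1)!!\)</span> (obtained by substitution; the calculation is tedious, so only the result is given here). For example, <span class="math inline">\(\mu_2=\sigma^2\)</span> and <span class="math inline">\(\mu_4=3\sigma^4\)</span>.</li>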
<li><strong>Variance of <span class="math inline">\(X^k\)</span></strong>: <span class="math inline">\(D(X^k)=E(X^{2k})-(E(X^k))^2=\mu_{2k}'-(\mu_{k}')^2\)</span>; for example, <span class="math inline">\(D(X^2)=E(X^4)-(E(X^2))^2=4\mu^2\sigma^2+2\sigma^4\)</span>.</li>
</ul>
</li>
<li>
<p><strong>Common inequalities</strong>:</p>
<ul>
<li><strong>Jensen's inequality</strong>: if <span class="math inline">\(g''(x)\geq0\)</span> (<span class="math inline">\(g\)</span> is convex), then <span class="math inline">\(E(g(X))\geq g(E(X))\)</span>; if <span class="math inline">\(g''(x)\leq0\)</span> (<span class="math inline">\(g\)</span> is concave), then <span class="math inline">\(E(g(X))\leq g(E(X))\)</span>.</li>
<li><strong>Cauchy-Schwarz inequality</strong>: <span class="math inline">\((E(XY))^2\leq E(X^2)E(Y^2)\)</span>.</li>
<li><strong>Bound on the covariance</strong>: <span class="math inline">\(|\text{Cov}(X,Y)|\leq \sqrt{D(X)D(Y)}\)</span>.</li>
</ul>
</li>
</ul><br><br>
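<p>The normal-distribution moment formulas can be checked by numerical integration. The sketch below approximates <span class="math inline">\(E(g(X))=\int g(x)f_X(x)\,\text{d}x\)</span> for <span class="math inline">\(X\sim N(\mu,\sigma^2)\)</span> with composite Simpson's rule on <span class="math inline">\([\mu-12\sigma,\mu+12\sigma]\)</span> (the truncation range and grid size are assumptions; the tails beyond it are negligible here) and compares the results with <span class="math inline">\(E(X^3)\)</span>, <span class="math inline">\(E(X^4)\)</span>, the central moments <span class="math inline">\(\sigma^k(k-1)!!\)</span>, <span class="math inline">\(D(X^2)\)</span> and the moment generating function derived above. The helper names and parameter values are illustrative.</p>

```python
import math


def normal_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)


def expect(g, mu, sigma, half_width=12.0, steps=4000):
    """E(g(X)) for X ~ N(mu, sigma^2) via composite Simpson's rule on mu +/- half_width*sigma."""
    if steps % 2:
        raise ValueError("Simpson's rule needs an even number of steps")
    lo, hi = mu - half_width * sigma, mu + half_width * sigma
    h = (hi - lo) / steps
    total = g(lo) * normal_pdf(lo, mu, sigma) + g(hi) * normal_pdf(hi, mu, sigma)
    for i in range(1, steps):
        x = lo + i * h
        total += (4 if i % 2 else 2) * g(x) * normal_pdf(x, mu, sigma)
    return total * h / 3


def double_factorial(k):
    """k!! = k (k-2) (k-4) ... down to 1 or 2, with 0!! = 1."""
    return math.prod(range(k, 0, -2)) if k > 0 else 1


mu, sigma = 1.5, 0.7
print("E(X^3):", expect(lambda x: x ** 3, mu, sigma), mu**3 + 3 * mu * sigma**2)
print("E(X^4):", expect(lambda x: x ** 4, mu, sigma), mu**4 + 6 * mu**2 * sigma**2 + 3 * sigma**4)
for k in range(1, 7):
    central = expect(lambda x: (x - mu) ** k, mu, sigma)
    theory = 0.0 if k % 2 else sigma**k * double_factorial(k - 1)
    print(f"mu_{k}:", central, theory)
d_x2 = expect(lambda x: x ** 4, mu, sigma) - expect(lambda x: x ** 2, mu, sigma) ** 2
print("D(X^2):", d_x2, 4 * mu**2 * sigma**2 + 2 * sigma**4)
t = 0.8
print("M_X(t):", expect(lambda x: math.exp(t * x), mu, sigma), math.exp(mu * t + 0.5 * sigma**2 * t**2))
```

<p>For a smooth, rapidly decaying integrand like this one, Simpson's rule on a few thousand points is far more accurate than needed, so any visible mismatch would point to an error in a formula rather than in the quadrature.</p>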
Source: https://www.cnblogs.com/kirina-official/p/18896426