40+ Common Activation Functions in Neural Networks (Use Cases + Formulas + Code + Plots)
<h2 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box unset; border: 1px none rgba(0, 0, 0, 1); border-radius: 0; box-shadow: none; display: block; flex-direction: unset; float: unset; height: auto; justify-content: unset; line-height: 1.5em; overflow: unset; padding: 0; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; visibility: visible" data-tool="mdnice编辑器"><span style="font-size: 18px; color: rgba(89, 89, 89, 1); line-height: 1.8em; letter-spacing: 0; padding: 0 0 0 10px; border-top: 1px none rgba(0, 0, 0, 1); border-right: 1px none rgba(0, 0, 0, 1); border-bottom: 1px none rgba(0, 0, 0, 1); border-left: 5px solid rgba(222, 198, 251, 1); border-radius: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box unset; box-shadow: none; display: block; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; margin: 0; overflow: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; visibility: visible"><span style="visibility: visible">What Is an Activation Function</span></span></h2><p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; margin: 0; padding: 8px 0; visibility: visible" data-tool="mdnice编辑器"><span style="visibility: visible">An activation function is a component of a neural network: each neuron applies it to the weighted sum of its inputs to produce its output.</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; margin: 0; padding: 8px 0; visibility: visible" data-tool="mdnice编辑器"><span style="visibility: visible">An activation function acts like a neuron's switch: it decides whether an input signal is passed on, and in what form.</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; margin: 0; padding: 8px 0; visibility: visible" data-tool="mdnice编辑器"><span style="visibility: visible">Many activation functions have been developed to suit different scenarios. What they share is that each gives signal propagation a particular kind of nonlinearity; without it, any stack of layers would reduce to a single linear transformation, so nonlinearity is what lets a neural network represent richer relationships.</span></p>
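<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>To make the "switch" analogy concrete, here is a minimal sketch of a single neuron: it first computes a weighted sum z = w·x + b, then passes z through an activation function. The helper names (neuron, step) and the input, weight, and bias values are hypothetical, chosen only for illustration. The step function behaves like a hard on/off switch, while sigmoid is a smooth, graded one.</span></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px">import numpy as np

def neuron(x, w, b, activation):
    """Forward pass of one neuron: weighted sum followed by an activation."""
    z = np.dot(w, x) + b          # pre-activation (the linear part)
    return activation(z)          # non-linear "switch" applied to z

def step(z):
    # Hard switch: the signal either passes (1) or is blocked (0)
    return np.where(z > 0, 1.0, 0.0)

def sigmoid(z):
    # Soft switch: smoothly maps z into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0, 2.0])    # illustrative inputs
w = np.array([0.8, 0.2, -0.5])    # illustrative weights
b = 0.1                           # illustrative bias

print(neuron(x, w, b, step))      # hard on/off output
print(neuron(x, w, b, sigmoid))   # graded output between 0 and 1
</code></pre>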
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; margin: 0; padding: 8px 0; visibility: visible" data-tool="mdnice编辑器"><span style="visibility: visible">This article surveys more than 40 common activation functions (plus a few classic output-layer functions).</span></p>
<h2 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box unset; border: 1px none rgba(0, 0, 0, 1); border-radius: 0; box-shadow: none; display: block; flex-direction: unset; float: unset; height: auto; justify-content: unset; line-height: 1.5em; overflow: unset; padding: 0; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; visibility: visible" data-tool="mdnice编辑器"><span style="font-size: 18px; color: rgba(89, 89, 89, 1); line-height: 1.8em; letter-spacing: 0; padding: 0 0 0 10px; border-top: 1px none rgba(0, 0, 0, 1); border-right: 1px none rgba(0, 0, 0, 1); border-bottom: 1px none rgba(0, 0, 0, 1); border-left: 5px solid rgba(222, 198, 251, 1); border-radius: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box unset; box-shadow: none; display: block; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; margin: 0; overflow: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; visibility: visible"><span style="visibility: visible">Notes</span></span></h2>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; margin: 0; padding: 8px 0; visibility: visible" data-tool="mdnice编辑器"><span style="visibility: visible">For each activation function, this article briefly introduces the concept and typical use cases, gives the mathematical formula, and visualizes it in Python. The final section compares more than 20 of the most classic activation functions across several dimensions in a table, as a reference for choosing among them.</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; margin: 0; padding: 8px 0; visibility: visible" data-tool="mdnice编辑器"><span style="visibility: visible">All code was written and run in Jupyter Notebook; interested readers can leave a message on the author's official account to obtain the complete .ipynb file.</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; margin: 0; padding: 8px 0; visibility: visible" data-tool="mdnice编辑器"><span style="visibility: visible">To keep the code for each activation function short, we first do some setup, such as importing the required Python libraries and defining plotting helpers:</span></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px; visibility: visible"><span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px; visibility: visible"><span style="visibility: visible"># -*- coding: utf-8 -*-</span></span><span style="visibility: visible"><br style="visibility: visible"></span><span style="visibility: visible"><br style="visibility: visible"></span><span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"><span># Import required libraries</span></span><span><br></span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>import</span></span><span> numpy </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>as</span></span><span> np</span><span><br></span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>import</span></span><span> matplotlib.pyplot </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>as</span></span><span> plt</span><span><br></span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>from</span></span><span> scipy.special </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>import</span></span><span> expit  </span><span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"><span># SciPy's numerically stable sigmoid (not aliased to 'sigmoid', which is defined by hand below)</span></span><span><br></span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>import</span></span><span> warnings</span><span><br></span><span>warnings.filterwarnings(</span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>"ignore"</span></span><span>)  </span><span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"><span># silences all warnings, including overflow warnings from np.exp</span></span><span><br></span><span><br></span><span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"><span># Apply the plot style first so that it cannot override the font settings below</span></span><span><br></span><span>plt.style.use(</span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>'seaborn-v0_8'</span></span><span>)  </span><span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"><span># style name used since Matplotlib 3.6; older versions call it 'seaborn'</span></span><span><br></span><span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"><span># SimHei renders Chinese text; the other fonts are fallbacks</span></span><span><br></span><span>plt.rcParams[</span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>'font.sans-serif'</span></span><span>] = [</span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>'SimHei'</span></span><span>, </span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>'Arial Unicode MS'</span></span><span>, </span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>'DejaVu Sans'</span></span><span>]</span><span><br></span><span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"><span># Use an ASCII hyphen for minus signs, since SimHei lacks the Unicode minus glyph</span></span><span><br></span><span>plt.rcParams[</span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>'axes.unicode_minus'</span></span><span>] = </span><span style="color: rgba(86, 182, 194, 1); line-height: 26px"><span>False</span></span><span><br></span></code></pre>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px"><span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"><span># Input range shared by all plots</span></span><span><br></span><span>x = np.linspace(</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>-10</span></span><span>, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>10</span></span><span>, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1000</span></span><span>)</span><span><br></span><span><br></span><span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"><span># Plotting helper (single figure): draws func and its derivative over the global x</span></span><span><br></span><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>plot_activation</span></span><span style="line-height: 26px"><span>(func, grad_func, name)</span></span><span>:</span></span><span><br></span><span>    y = func(x)</span><span><br></span><span>    dy = grad_func(x)</span><span><br></span><span>    plt.figure(figsize=(</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>8</span></span><span>, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>5</span></span><span>))</span><span><br></span><span>    plt.plot(x, y, label=name, linewidth=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1.5</span></span><span>)</span><span><br></span><span>    plt.plot(x, dy, label=</span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>f"</span><span style="color: rgba(224, 108, 117, 1); line-height: 26px"><span>{name}</span></span><span>'s derivative"</span></span><span>, linestyle=</span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>'--'</span></span><span>, linewidth=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1.5</span></span><span>)</span><span><br></span><span>    plt.title(</span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>f'</span><span style="color: rgba(224, 108, 117, 1); line-height: 26px"><span>{name}</span></span><span> Function and Its Derivative'</span></span><span>)</span><span><br></span><span>    plt.legend()</span><span><br></span><span>    plt.grid(</span><span style="color: rgba(86, 182, 194, 1); line-height: 26px"><span>True</span></span><span>)</span><span><br></span><span>    plt.axhline(</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0</span></span><span>, color=</span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>'black'</span></span><span>, linewidth=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0.5</span></span><span>)</span><span><br></span><span>    plt.axvline(</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0</span></span><span>, color=</span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>'black'</span></span><span>, linewidth=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0.5</span></span><span>)</span><span><br></span><span>    plt.show()</span><span><br></span><span><br></span><span><br></span><span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"><span># Plotting helper (multiple curves, for comparing parameter settings);</span></span><span><br></span><span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"><span># functions is a list of (func, grad_func, name) tuples</span></span><span><br></span><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>plot_activations</span></span><span style="line-height: 26px"><span>(functions, x)</span></span><span>:</span></span><span><br></span><span>    plt.figure(figsize=(</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>10</span></span><span>, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>7</span></span><span>))</span><span><br></span><span>    </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>for</span></span><span> func, grad_func, name </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>in</span></span><span> functions:</span><span><br></span><span>        y = func(x)</span><span><br></span><span>        dy = grad_func(x)</span><span><br></span><span>        plt.plot(x, y, label=name, linewidth=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1.5</span></span><span>)</span><span><br></span><span>        plt.plot(x, dy, label=</span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>f"</span><span style="color: rgba(224, 108, 117, 1); line-height: 26px"><span>{name}</span></span><span>'s derivative"</span></span><span>, linestyle=</span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>'--'</span></span><span>, linewidth=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1.5</span></span><span>)</span><span><br></span><span><br></span><span>    plt.title(</span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>'Activation Functions and Their Derivatives'</span></span><span>)</span><span><br></span><span>    plt.legend()</span><span><br></span><span>    plt.grid(</span><span style="color: rgba(86, 182, 194, 1); line-height: 26px"><span>True</span></span><span>)</span><span><br></span><span>    plt.axhline(</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0</span></span><span>, color=</span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>'black'</span></span><span>, linewidth=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0.5</span></span><span>)</span><span><br></span><span>    plt.axvline(</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0</span></span><span>, color=</span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>'black'</span></span><span>, linewidth=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0.5</span></span><span>)</span><span><br></span><span>    plt.show()</span><span><br></span></code></pre>
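<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Every section below passes a hand-written derivative (grad_func) to plot_activation, so it can be worth sanity-checking those derivatives numerically. The sketch below compares an analytic derivative against a central finite difference. check_grad is a hypothetical helper, and the step h and tolerance tol are assumptions that suit smooth functions on this input range. For functions with kinks, such as ReLU at 0, the kink has to be excluded from x.</span></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px">import numpy as np

def check_grad(func, grad_func, x, h=1e-5, tol=1e-6):
    """Compare an analytic derivative with a central finite difference."""
    numeric = (func(x + h) - func(x - h)) / (2 * h)   # central difference
    analytic = grad_func(x)
    max_err = np.max(np.abs(numeric - analytic))
    return max_err, bool(tol >= max_err)

# Example: the tanh derivative used in the Tanh section
x = np.linspace(-10, 10, 1000)
err, ok = check_grad(np.tanh, lambda v: 1 - np.tanh(v) ** 2, x)
print(f"max abs error = {err:.2e}, passed = {ok}")
</code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>A central difference is used because its error shrinks with h squared, compared with h for a one-sided difference, so a fairly tight tolerance still works.</span></p>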
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Now, let's get started!</span></p>
<h2 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box unset; border-radius: 0; box-shadow: none; display: block; flex-direction: unset; float: unset; height: auto; justify-content: unset; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 18px; color: rgba(89, 89, 89, 1); line-height: 1.8em; letter-spacing: 0; padding: 0 0 0 10px; border-top: 1px none rgba(0, 0, 0, 1); border-bottom: 1px none rgba(0, 0, 0, 1); border-left: 5px solid rgba(222, 198, 251, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box unset; box-shadow: none; display: block; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; margin: 0"><span>Classic Activation Functions</span></span></h2>
<h3 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-radius: 0; box-shadow: none; display: flex; flex-direction: unset; float: unset; height: auto; justify-content: center; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 17px; color: rgba(89, 89, 89, 1); border-bottom: 2px solid rgba(222, 198, 251, 1); line-height: 1.5em; letter-spacing: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-top: 1px none rgba(0, 0, 0, 1); border-left: 1px none rgba(0, 0, 0, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; box-shadow: none; display: inline; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; padding: 0; margin: 0"><span>Sigmoid</span></span></h3>
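<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Suited to the output layer in binary classification, where it squashes the output into the interval (0, 1) so that it can be read as a probability. It is not recommended for hidden layers: its derivative is at most 0.25 and approaches 0 for large |x|, which easily leads to vanishing gradients.</span></p>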
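<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>The vanishing-gradient remark can be quantified. σ'(x) = σ(x)(1 − σ(x)) peaks at 0.25 when x = 0 and decays quickly as |x| grows. The sketch below assumes a simplified chain in which each layer contributes one sigmoid-derivative factor to the backpropagated gradient, ignoring the weights. Even in the best case, the product of these factors shrinks geometrically with depth.</span></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px">import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

# The derivative is largest at x = 0 and nearly zero in the saturated regions
for v in [0.0, 2.0, 5.0, 10.0]:
    print(f"sigmoid'({v:>4}) = {sigmoid_grad(v):.6f}")

# Best case: every layer sits at x = 0, so each contributes a factor of 0.25
for depth in [1, 5, 10, 20]:
    print(f"{depth:>2} layers -> gradient factor at most {0.25 ** depth:.2e}")
</code></pre>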
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Formula</span></strong></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>$$\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \sigma'(x) = \sigma(x)\bigl(1 - \sigma(x)\bigr)$$</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Implementation</span></strong></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px"><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>sigmoid</span></span><span style="line-height: 26px"><span>(x)</span></span><span>:</span></span><span><br></span><span>    </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1</span></span><span> / (</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1</span></span><span> + np.exp(-x))</span><span><br></span><span><br></span><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>sigmoid_grad</span></span><span style="line-height: 26px"><span>(x)</span></span><span>:</span></span><span><br></span><span>    s = sigmoid(x)</span><span><br></span><span>    </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> s * (</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1</span></span><span> - s)</span><span><br></span><span><br></span><span>plot_activation(sigmoid, sigmoid_grad, </span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>'Sigmoid'</span></span><span>)</span><span><br></span></code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Plot</span></strong></p>
<p><img src="https://img2024.cnblogs.com/blog/609124/202509/609124-20250928123455226-311035028.png" alt="image" loading="lazy" style="display: block; margin-left: auto; margin-right: auto"></p>
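<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>The 1 / (1 + np.exp(-x)) form above is fine on the plotting range [-10, 10]. For large negative inputs, however, np.exp(-x) overflows and NumPy emits a RuntimeWarning, which the setup cell hides with warnings.filterwarnings("ignore"). One common stable variant is sketched below as an alternative, not as the article's method. It only ever evaluates exp on non-positive arguments. stable_sigmoid is a hypothetical name, and scipy.special.expit (which the setup cell imports) serves as a ready-made reference.</span></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px">import numpy as np
from scipy.special import expit

def stable_sigmoid(x):
    """Sigmoid that never calls exp on a large positive argument."""
    x = np.asarray(x, dtype=float)
    # exp(-|x|) always lies in (0, 1], so it cannot overflow
    e = np.exp(-np.abs(x))
    # x >= 0 uses 1 / (1 + e^{-x}); negative x uses e^{x} / (1 + e^{x})
    return np.where(x >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

x = np.array([-1000.0, -10.0, 0.0, 10.0, 1000.0])
print(stable_sigmoid(x))
print(np.allclose(stable_sigmoid(x), expit(x)))   # agrees with SciPy
</code></pre>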
<h3 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-radius: 0; box-shadow: none; display: flex; flex-direction: unset; float: unset; height: auto; justify-content: center; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 17px; color: rgba(89, 89, 89, 1); border-bottom: 2px solid rgba(222, 198, 251, 1); line-height: 1.5em; letter-spacing: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-top: 1px none rgba(0, 0, 0, 1); border-left: 1px none rgba(0, 0, 0, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; box-shadow: none; display: inline; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; padding: 0; margin: 0"><span>Tanh (Hyperbolic Tangent)</span></span></h3>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Tanh's output is zero-centered. Sigmoid's outputs are all positive, which pushes the gradients of the next layer's weights toward the same sign. Tanh avoids this, so gradient updates are better balanced in direction and convergence is usually faster. This makes it generally preferable to Sigmoid in hidden layers, and it is still used in RNNs (for example, inside LSTM and GRU cells). However, it also saturates for large |x|, so vanishing gradients can still occur.</span></p>
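<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Two facts behind this comparison can be checked numerically. First, tanh is a rescaled, shifted sigmoid, tanh(x) = 2σ(2x) − 1, so it maps into (−1, 1) centered on 0. Second, its derivative peaks at 1 at x = 0, versus 0.25 for sigmoid, so it shrinks gradients less per layer, although both still saturate. The symmetric input grid below is an illustrative choice.</span></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px">import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5, 5, 1001)   # symmetric grid around 0

# Identity: tanh(x) = 2 * sigmoid(2x) - 1
print(np.allclose(np.tanh(x), 2 * sigmoid(2 * x) - 1))

# Zero-centered output: on a symmetric grid tanh averages to about 0,
# while sigmoid averages to about 0.5 because its outputs are always positive
print(f"mean tanh    = {np.tanh(x).mean():+.4f}")
print(f"mean sigmoid = {sigmoid(x).mean():+.4f}")

# Peak derivatives, both attained at x = 0
print(f"tanh'(0)    = {1 - np.tanh(0.0) ** 2}")
s0 = sigmoid(0.0)
print(f"sigmoid'(0) = {s0 * (1 - s0)}")
</code></pre>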
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Formula</span></strong></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}, \qquad \tanh'(x) = 1 - \tanh^{2}(x)$$</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Implementation</span></strong></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px"><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>tanh</span></span><span style="line-height: 26px"><span>(x)</span></span><span>:</span></span><span><br></span><span>    </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> np.tanh(x)</span><span><br></span><span><br></span><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>tanh_grad</span></span><span style="line-height: 26px"><span>(x)</span></span><span>:</span></span><span><br></span><span>    </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1</span></span><span> - np.tanh(x)**</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>2</span></span><span><br></span><span><br></span><span>plot_activation(tanh, tanh_grad, </span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>'Tanh'</span></span><span>)</span><span><br></span></code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Plot</span></strong></p>
<p><img src="https://img2024.cnblogs.com/blog/609124/202509/609124-20250928123553415-1604024155.png" alt="image" loading="lazy" style="display: block; margin-left: auto; margin-right: auto"></p>
<h3 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-radius: 0; box-shadow: none; display: flex; flex-direction: unset; float: unset; height: auto; justify-content: center; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 17px; color: rgba(89, 89, 89, 1); border-bottom: 2px solid rgba(222, 198, 251, 1); line-height: 1.5em; letter-spacing: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-top: 1px none rgba(0, 0, 0, 1); border-left: 1px none rgba(0, 0, 0, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; box-shadow: none; display: inline; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; padding: 0; margin: 0"><span>Linear</span></span></h3>
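<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Mainly used in the output layer of regression tasks, where it keeps the output as an unbounded real number and applies no nonlinear transformation.</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>It is not suitable for hidden layers. A composition of linear layers is itself a linear map, so the whole network would be equivalent to a single-layer linear model and could not learn nonlinear features.</span></p>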
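<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>The collapse argument can be verified directly. With identity activations, two stacked layers W2(W1·x + b1) + b2 equal a single layer with W = W2·W1 and b = W2·b1 + b2. The layer sizes and random weights below are arbitrary illustrative choices. Inserting tanh between the layers breaks the equivalence.</span></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px">import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)   # layer 1: 3 -> 4
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)   # layer 2: 4 -> 2
x = rng.normal(size=3)

# Two layers with a linear (identity) activation
two_layer = W2 @ (W1 @ x + b1) + b2

# The equivalent single linear layer
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ x + b
print(np.allclose(two_layer, one_layer))     # the linear stack collapses

# With a non-linear activation in between, the collapse no longer holds
nonlinear = W2 @ np.tanh(W1 @ x + b1) + b2
print(np.allclose(nonlinear, one_layer))
</code></pre>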
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>It may also appear in some specific models, such as the middle (bottleneck) layer of an autoencoder or a policy network.</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Formula</span></strong></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>$$f(x) = x, \qquad f'(x) = 1$$</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Implementation</span></strong></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px"><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>linear</span></span><span style="line-height: 26px"><span>(x)</span></span><span>:</span></span><span><br></span><span>    </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> x</span><span><br></span><span><br></span><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>linear_grad</span></span><span style="line-height: 26px"><span>(x)</span></span><span>:</span></span><span><br></span><span>    </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> np.ones_like(x)</span><span><br></span><span><br></span><span>plot_activation(linear, linear_grad, </span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>'Linear'</span></span><span>)</span><span><br></span></code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Plot</span></strong></p>
<p><img src="https://img2024.cnblogs.com/blog/609124/202509/609124-20250928123545071-988419985.png" alt="image" loading="lazy" style="display: block; margin-left: auto; margin-right: auto"></p>
<h3 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-radius: 0; box-shadow: none; display: flex; flex-direction: unset; float: unset; height: auto; justify-content: center; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 17px; color: rgba(89, 89, 89, 1); border-bottom: 2px solid rgba(222, 198, 251, 1); line-height: 1.5em; letter-spacing: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-top: 1px none rgba(0, 0, 0, 1); border-left: 1px none rgba(0, 0, 0, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; box-shadow: none; display: inline; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; padding: 0; margin: 0"><span>Softmax</span></span></h3>
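<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>The standard activation for the output layer in multi-class classification. It converts a vector of raw scores (logits) into a probability distribution whose entries are positive and sum to 1. It is not used in hidden layers.</span></p>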
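<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>The properties claimed here can be checked on illustrative logits. The outputs are positive and sum to 1. Subtracting the maximum logit, as the implementation below does, leaves the result unchanged while preventing overflow. For two classes, softmax over [z, 0] reduces to sigmoid(z), which links it to the Sigmoid output layer above. softmax_1d is a hypothetical 1-D helper written only for this check.</span></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px">import numpy as np

def softmax_1d(z):
    """Softmax of a 1-D vector of logits, shifted by the max for stability."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

logits = np.array([2.0, 1.0, 0.1])
p = softmax_1d(logits)
print(p, p.sum())                       # a probability distribution

# Shift invariance: adding the same constant to every logit changes nothing,
# so very large logits are handled without overflow thanks to the max shift
big = logits + 1000.0
print(np.allclose(softmax_1d(big), p))

# Two classes: softmax([z, 0])[0] equals sigmoid(z)
z = 1.5
print(np.isclose(softmax_1d(np.array([z, 0.0]))[0], sigmoid(z)))
</code></pre>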
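<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Formula</span></strong></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>$$\mathrm{softmax}(\mathbf{x})_i = \frac{e^{x_i}}{\sum_{j=1}^{K} e^{x_j}}, \quad i = 1, \dots, K$$</span></p>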
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Implementation</span></strong></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>import</span></span><span> numpy </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>as</span></span><span> np</span><span><br></span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>import</span></span><span> matplotlib.pyplot </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>as</span></span><span> plt</span><span><br></span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>from</span></span><span> mpl_toolkits.mplot3d </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>import</span></span><span> Axes3D  </span><span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"><span># registers the '3d' projection on older Matplotlib versions</span></span><span><br></span><span><br></span><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>softmax</span></span><span style="line-height: 26px"><span>(x)</span></span><span>:</span></span><span><br></span><span>    exp_x = np.exp(x - np.max(x, axis=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0</span></span><span>, keepdims=</span><span style="color: rgba(86, 182, 194, 1); line-height: 26px"><span>True</span></span><span>))  </span><span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"><span># subtract the max for numerical stability</span></span><span><br></span><span>    </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> exp_x / np.sum(exp_x, axis=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0</span></span><span>, keepdims=</span><span style="color: rgba(86, 182, 194, 1); line-height: 26px"><span>True</span></span><span>)</span><span><br></span><span><br></span><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>softmax_grad</span></span><span style="line-height: 26px"><span>(x)</span></span><span>:</span></span><span><br></span><span>    s = softmax(x).reshape(</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>-1</span></span><span>, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1</span></span><span>)</span><span><br></span><span>    </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> np.diagflat(s) - np.dot(s, s.T)  </span><span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"><span># Jacobian matrix: diag(s) - s s^T</span></span><span><br></span><span><br></span><span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"><span># Generate 2-D input data (easy to visualize)</span></span><span><br></span><span>x = np.linspace(</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>-10</span></span><span>, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>10</span></span><span>, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>100</span></span><span>)</span><span><br></span><span>y = np.linspace(</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>-10</span></span><span>, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>10</span></span><span>, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>100</span></span><span>)</span><span><br></span><span>X, Y = np.meshgrid(x, y)</span><span><br></span><span>inputs = np.vstack([X.ravel(), Y.ravel()]).T</span><span><br></span><span><br></span><span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"><span># Softmax output: plot the first component P(x1), since the output is a probability distribution</span></span><span><br></span><span>outputs = np.array([softmax(p)[</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0</span></span><span>] </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>for</span></span><span> p </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>in</span></span><span> inputs]).reshape(X.shape)</span><span><br></span><span><br></span><span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"><span># Gradient: first diagonal element of the Jacobian, ∂P(x1)/∂x1</span></span><span><br></span><span>gradients = np.array([softmax_grad(p)[</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0</span></span><span>, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0</span></span><span>] </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>for</span></span><span> p </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>in</span></span><span> inputs]).reshape(X.shape)</span><span><br></span><span><br></span><span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"><span># Plot the Softmax function</span></span><span><br></span><span>fig = plt.figure(figsize=(</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>12</span></span><span>, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>5</span></span><span>))</span><span><br></span><span><br></span><span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"><span># 1. Softmax surface</span></span><span><br></span><span>ax1 = fig.add_subplot(</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>121</span></span><span>, projection=</span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>'3d'</span></span><span>)</span><span><br></span><span>ax1.plot_surface(X, Y, outputs, cmap=</span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>'viridis'</span></span><span>, alpha=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0.8</span></span><span>)</span><span><br></span><span>ax1.set_title(</span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>'Softmax (First Output Dimension)'</span></span><span>)</span><span><br></span><span>ax1.set_xlabel(</span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>'x1'</span></span><span>)</span><span><br></span><span>ax1.set_ylabel(</span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>'x2'</span></span><span>)</span><span><br></span><span>ax1.set_zlabel(</span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>'P(x1)'</span></span><span>)</span><span><br></span><span><br></span><span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"><span># 2. Softmax gradient surface</span></span><span><br></span><span>ax2 = fig.add_subplot(</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>122</span></span><span>, projection=</span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>'3d'</span></span><span>)</span><span><br></span><span>ax2.plot_surface(X, Y, gradients, cmap=</span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>'plasma'</span></span><span>, alpha=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0.8</span></span><span>)</span><span><br></span><span>ax2.set_title(</span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>'Gradient of Softmax (∂P(x1)/∂x1)'</span></span><span>)</span><span><br></span><span>ax2.set_xlabel(</span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>'x1'</span></span><span>)</span><span><br></span><span>ax2.set_ylabel(</span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>'x2'</span></span><span>)</span><span><br></span><span>ax2.set_zlabel(</span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>'Gradient'</span></span><span>)</span><span><br></span><span><br></span><span>plt.tight_layout()</span><span><br></span><span>plt.show()</span><span><br></span></code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Plot</span></strong></p>
<span><img src="https://img2024.cnblogs.com/blog/609124/202509/609124-20250928123545071-988419985.png" alt="Image" class="rich_pages wxw-img js_img_placeholder wx_img_placeholder" style="display: block; margin: 0 auto; max-width: 100%; border: 3px none rgba(0, 0, 0, 0.4); border-radius: 8px; object-fit: fill; box-shadow: 0 0 rgba(0, 0, 0, 0); width: 677px !important; height: 316.355px !important" data-ratio="0.4672897196261682" data-type="png" data-w="1070" data-imgfileid="100003030" data-src="https://mmbiz.qpic.cn/mmbiz_png/J8ufl5q3zgOJgw7WsL0LtID9Ov9wldTE5U1u6e59T0wzKzIicuNyZ7I1AZKzafTria3TltC0XWq1NKFBL0gWQ1iaw/640?wx_fmt=png&from=appmsg&watermark=1#imgIndex=3" data-original-style="display: block;margin: 0px auto;max-width: 100%;border-style: none;border-width: 3px;border-color: rgba(0, 0, 0, 0.4);border-radius: 8px;object-fit: fill;box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;height: auto !important;" data-index="4"></span>
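<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>The <code>softmax_grad</code> function in the implementation returns the full Jacobian, diag(s) - s sᵀ, instead of an element-wise derivative. You can sanity-check such a formula by comparing it against central finite differences. The sketch below does this for a single 1-D input. The helper names <code>softmax_jacobian</code> and <code>numerical_jacobian</code>, the test point, and the tolerance are illustrative choices, not part of the original code. The sketch also checks that every column of the Jacobian sums to zero, which follows from the probabilities always summing to 1.</span></p>

```python
import numpy as np

def softmax(x):
    exp_x = np.exp(x - np.max(x))
    return exp_x / exp_x.sum()

def softmax_jacobian(x):
    # J[i, j] = s_i * (delta_ij - s_j), i.e. diag(s) - s s^T
    s = softmax(x).reshape(-1, 1)
    return np.diagflat(s) - s @ s.T

def numerical_jacobian(f, x, eps=1e-6):
    # Central differences: column j approximates d f / d x_j
    n = x.size
    J = np.zeros((n, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = eps
        J[:, j] = (f(x + step) - f(x - step)) / (2 * eps)
    return J

x = np.array([0.5, -1.2, 2.0])
J = softmax_jacobian(x)
print(np.allclose(J, numerical_jacobian(softmax, x), atol=1e-8))  # True
print(np.allclose(J.sum(axis=0), 0.0))                            # True
```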
<h2 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box unset; border-radius: 0; box-shadow: none; display: block; flex-direction: unset; float: unset; height: auto; justify-content: unset; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 18px; color: rgba(89, 89, 89, 1); line-height: 1.8em; letter-spacing: 0; padding: 0 0 0 10px; border-top: 1px none rgba(0, 0, 0, 1); border-bottom: 1px none rgba(0, 0, 0, 1); border-left: 5px solid rgba(222, 198, 251, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box unset; box-shadow: none; display: block; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; margin: 0"><span>ReLU and Its Variants</span></span></h2>
<h3 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-radius: 0; box-shadow: none; display: flex; flex-direction: unset; float: unset; height: auto; justify-content: center; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 17px; color: rgba(89, 89, 89, 1); border-bottom: 2px solid rgba(222, 198, 251, 1); line-height: 1.5em; letter-spacing: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-top: 1px none rgba(0, 0, 0, 1); border-left: 1px none rgba(0, 0, 0, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; box-shadow: none; display: inline; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; padding: 0; margin: 0"><span>ReLU (Rectified Linear Unit)</span></span></h3>
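<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>ReLU is also called the rectified linear function. It is one of the most commonly used activation functions in neural networks. Mathematically, it is the ramp function.</span></p>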
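<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Because ReLU coincides with the ramp function, it can also be written without a branch as (x + |x|) / 2. The small NumPy check below confirms this identity. The name <code>ramp</code> is used only for illustration.</span></p>

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def ramp(x):
    # Ramp function without a branch: (x + |x|) / 2
    return (x + np.abs(x)) / 2

x = np.linspace(-5, 5, 11)
print(np.allclose(relu(x), ramp(x)))     # True
print(relu(np.array([-2.0, 0.0, 3.0])))  # [0. 0. 3.]
```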
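<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Formula</span></strong></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>$$\mathrm{ReLU}(x) = \max(0, x)$$</span></p>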
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Implementation</span></strong></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px"><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>relu</span></span><span style="line-height: 26px"><span>(x)</span></span><span>:</span></span><span><br></span><span>    </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> np.maximum(</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0</span></span><span>, x)</span><span><br></span><span><br></span><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>relu_grad</span></span><span style="line-height: 26px"><span>(x)</span></span><span>:</span></span><span><br></span><span>    </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> (x > </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0</span></span><span>).astype(float)</span><span><br></span><span><br></span><span>plot_activation(relu, relu_grad, </span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>'ReLU'</span></span><span>)</span><span><br></span></code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Plot</span></strong></p>
<span><img src="https://img2024.cnblogs.com/blog/609124/202509/609124-20250928123545071-988419985.png" alt="Image" class="rich_pages wxw-img js_img_placeholder wx_img_placeholder" style="display: block; margin: 0 auto; max-width: 100%; border: 3px none rgba(0, 0, 0, 0.4); border-radius: 8px; object-fit: fill; box-shadow: 0 0 rgba(0, 0, 0, 0); width: 665px !important; height: 449px !important" data-ratio="0.675187969924812" data-type="png" data-w="665" data-imgfileid="100003029" data-src="https://mmbiz.qpic.cn/mmbiz_png/J8ufl5q3zgOJgw7WsL0LtID9Ov9wldTE7NjITMMH4vhUXCC7l2oiaEQWqYickcHwibqN85oUeL8pgn4nSTcnknFGw/640?wx_fmt=png&from=appmsg&watermark=1#imgIndex=4" data-original-style="display: block;margin: 0px auto;max-width: 100%;border-style: none;border-width: 3px;border-color: rgba(0, 0, 0, 0.4);border-radius: 8px;object-fit: fill;box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;height: auto !important;" data-index="5"></span>
<h3 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-radius: 0; box-shadow: none; display: flex; flex-direction: unset; float: unset; height: auto; justify-content: center; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 17px; color: rgba(89, 89, 89, 1); border-bottom: 2px solid rgba(222, 198, 251, 1); line-height: 1.5em; letter-spacing: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-top: 1px none rgba(0, 0, 0, 1); border-left: 1px none rgba(0, 0, 0, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; box-shadow: none; display: inline; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; padding: 0; margin: 0"><span>ReLU6</span></span></h3>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>ReLU6 is a bounded version of ReLU whose output is clipped to the interval [0, 6].</span></p>
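<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>It is used mainly in mobile and lightweight networks, such as MobileNet and the quantization-oriented EfficientNet-Lite variants. Its bounded output range helps keep low-precision inference, such as running quantized models, stable.</span></p>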
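<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>It also appears in some reinforcement learning models, such as DQN-style networks, where it bounds activation values and reduces training instability.</span></p>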
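<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>The sketch below makes the quantization argument concrete under a deliberately simplified assumption: uniform unsigned 8-bit quantization over the fixed range [0, 6], with a step of 6/255. This is not how any specific framework implements quantization, and the helpers <code>quantize_uint8</code> and <code>dequantize</code> are hypothetical. The sketch shows that because ReLU6 outputs never leave [0, 6], one fixed scale covers every possible activation, and the round-trip error stays within half a quantization step.</span></p>

```python
import numpy as np

def relu6(x):
    return np.minimum(np.maximum(0, x), 6)

def quantize_uint8(y, y_max=6.0):
    # Simplified uniform 8-bit quantization over the fixed range [0, y_max]
    scale = y_max / 255
    q = np.clip(np.round(y / scale), 0, 255).astype(np.uint8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float64) * scale

x = np.array([-3.0, 0.5, 2.0, 5.9, 40.0])
y = relu6(x)                      # every value lies in [0, 6]
q, scale = quantize_uint8(y)
max_err = np.max(np.abs(dequantize(q, scale) - y))
print(q)
print(f"step = {scale:.4f}, max round-trip error = {max_err:.4f}")
```

<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>With plain ReLU, the input of 40 would force a range of at least [0, 40]. That means a step of about 40/255 ≈ 0.157 instead of 6/255 ≈ 0.024, so small activations would be represented much more coarsely.</span></p>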
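<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Formula</span></strong></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>$$\mathrm{ReLU6}(x) = \min(\max(0, x), 6)$$</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>or, in piecewise form:</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>$$\mathrm{ReLU6}(x) = \begin{cases} 0, &amp; x \le 0 \\ x, &amp; 0 &lt; x &lt; 6 \\ 6, &amp; x \ge 6 \end{cases}$$</span></p>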
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Implementation</span></strong></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px"><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>relu6</span></span><span style="line-height: 26px"><span>(x)</span></span><span>:</span></span><span><br></span><span>    </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> np.minimum(np.maximum(</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0</span></span><span>, x), </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>6</span></span><span>)</span><span><br></span><span><br></span><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>relu6_grad</span></span><span style="line-height: 26px"><span>(x)</span></span><span>:</span></span><span><br></span><span>    dx = np.zeros_like(x)</span><span><br></span><span>    dx[(x > </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0</span></span><span>) &amp; (x &lt; </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>6</span></span><span>)] = </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1</span></span><span><br></span><span>    </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> dx</span><span><br></span><span><br></span><span>plot_activation(relu6, relu6_grad, </span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>'ReLU6'</span></span><span>)</span><span><br></span></code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Plot</span></strong></p>
<span><img src="https://img2024.cnblogs.com/blog/609124/202509/609124-20250928123545071-988419985.png" alt="Image" class="rich_pages wxw-img js_img_placeholder wx_img_placeholder" style="display: block; margin: 0 auto; max-width: 100%; border: 3px none rgba(0, 0, 0, 0.4); border-radius: 8px; object-fit: fill; box-shadow: 0 0 rgba(0, 0, 0, 0); width: 657px !important; height: 449px !important" data-ratio="0.6834094368340944" data-type="png" data-w="657" data-imgfileid="100003032" data-src="https://mmbiz.qpic.cn/mmbiz_png/J8ufl5q3zgOJgw7WsL0LtID9Ov9wldTEAlWxYic3dUpxDOkpEInz7ib7mmics19tUgwLlSsKb0P240aD5JuE2x2AA/640?wx_fmt=png&from=appmsg&watermark=1#imgIndex=5" data-original-style="display: block;margin: 0px auto;max-width: 100%;border-style: none;border-width: 3px;border-color: rgba(0, 0, 0, 0.4);border-radius: 8px;object-fit: fill;box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;height: auto !important;" data-index="6"></span>
<h3 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-radius: 0; box-shadow: none; display: flex; flex-direction: unset; float: unset; height: auto; justify-content: center; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 17px; color: rgba(89, 89, 89, 1); border-bottom: 2px solid rgba(222, 198, 251, 1); line-height: 1.5em; letter-spacing: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-top: 1px none rgba(0, 0, 0, 1); border-left: 1px none rgba(0, 0, 0, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; box-shadow: none; display: inline; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; padding: 0; margin: 0"><span>Leaky ReLU</span></span></h3>
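<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Leaky ReLU is a refinement of the standard ReLU that targets the "dying ReLU" problem. In that failure mode, a neuron gets stuck outputting zero for every input. Its gradient is then zero as well, so it may never activate again.</span></p>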
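<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>The toy example below shows the difference. The setup is an illustrative assumption: a single neuron with random Gaussian inputs, hand-picked weights, and a bias pushed far negative, for example by one oversized update. It counts how many samples still send a non-zero gradient back through the activation.</span></p>

```python
import numpy as np

def relu_grad(z):
    return (z > 0).astype(float)

def leaky_relu_grad(z, alpha=0.01):
    return np.where(z > 0, 1.0, alpha)

# Toy setup (assumption): one neuron, random inputs, and a bias
# that has been pushed far negative
rng = np.random.default_rng(0)
inputs = rng.normal(size=(1000, 3))
w = np.array([0.5, -0.3, 0.2])
b = -20.0
z = inputs @ w + b  # pre-activations; all negative for this setup

# Fraction of samples that send a non-zero gradient back to w and b
print("ReLU:      ", np.mean(relu_grad(z) != 0))
print("Leaky ReLU:", np.mean(leaky_relu_grad(z) != 0))
```

<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Under ReLU, no sample produces a gradient, so gradient descent receives no signal through this neuron to move w or b. Under Leaky ReLU, every sample still contributes a gradient scaled by α, so the neuron can recover.</span></p>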
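<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Formula</span></strong></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>$$\mathrm{LeakyReLU}(x) = \begin{cases} x, &amp; x > 0 \\ \alpha x, &amp; x \le 0 \end{cases}$$</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Here α is a small constant, commonly fixed at α = 0.01.</span></p>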
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Implementation</span></strong></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px"><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>leaky_relu</span></span><span style="line-height: 26px"><span>(x, alpha=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0.01</span></span><span>)</span></span><span>:</span></span><span><br></span><span>    </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> np.where(x > </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0</span></span><span>, x, x * alpha)</span><span><br></span><span><br></span><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>leaky_relu_grad</span></span><span style="line-height: 26px"><span>(x, alpha=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0.01</span></span><span>)</span></span><span>:</span></span><span><br></span><span>    dx = np.ones_like(x, dtype=float)</span><span><br></span><span>    dx[x &lt;= </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0</span></span><span>] = alpha  </span><span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"><span># slope alpha on the non-positive side</span></span><span><br></span><span>    </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> dx</span><span><br></span><span><br></span><span>plot_activation(leaky_relu, leaky_relu_grad, </span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>'Leaky ReLU'</span></span><span>)</span><span><br></span></code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Plot</span></strong></p>
<span><img src="https://img2024.cnblogs.com/blog/609124/202509/609124-20250928123545071-988419985.png" alt="Image" class="rich_pages wxw-img js_img_placeholder wx_img_placeholder" style="display: block; margin: 0 auto; max-width: 100%; border: 3px none rgba(0, 0, 0, 0.4); border-radius: 8px; object-fit: fill; box-shadow: 0 0 rgba(0, 0, 0, 0); width: 665px !important; height: 449px !important" data-ratio="0.675187969924812" data-type="png" data-w="665" data-imgfileid="100003033" data-src="https://mmbiz.qpic.cn/mmbiz_png/J8ufl5q3zgOJgw7WsL0LtID9Ov9wldTEbRwA8SNJsTMasvibh7H8muwRAKJwZlcgu61GbFeYLoRljAuR93eO3Eg/640?wx_fmt=png&from=appmsg&watermark=1#imgIndex=6" data-original-style="display: block;margin: 0px auto;max-width: 100%;border-style: none;border-width: 3px;border-color: rgba(0, 0, 0, 0.4);border-radius: 8px;object-fit: fill;box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;height: auto !important;" data-index="7"></span>
<h3 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-radius: 0; box-shadow: none; display: flex; flex-direction: unset; float: unset; height: auto; justify-content: center; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 17px; color: rgba(89, 89, 89, 1); border-bottom: 2px solid rgba(222, 198, 251, 1); line-height: 1.5em; letter-spacing: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-top: 1px none rgba(0, 0, 0, 1); border-left: 1px none rgba(0, 0, 0, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; box-shadow: none; display: inline; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; padding: 0; margin: 0"><span>PReLU (Parametric ReLU)</span></span></h3>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Leaky ReLU from the previous section uses a fixed small slope for negative inputs. PReLU instead makes that slope a learnable parameter, which gives it more expressive power.</span></p>
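<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Compared with Leaky ReLU, the only new ingredient is the gradient with respect to the slope: ∂f/∂α = x for x ≤ 0, and 0 otherwise. The toy sketch below shows the slope being learned. Several parts of the setup are assumptions made for illustration: a single shared scalar α, plain gradient descent on a mean squared error, and synthetic targets generated by a PReLU with α = 0.3. The helper <code>prelu_grad_alpha</code> is a name introduced here.</span></p>

```python
import numpy as np

def prelu(x, alpha):
    return np.where(x > 0, x, alpha * x)

def prelu_grad_alpha(x):
    # Derivative of PReLU with respect to alpha: x where x <= 0, 0 where x > 0
    return np.where(x > 0, 0.0, x)

# Toy data (assumption): targets generated by a PReLU whose true slope is 0.3
rng = np.random.default_rng(0)
x = rng.normal(size=500)
target = prelu(x, 0.3)

alpha, lr = 0.01, 0.5  # start from a Leaky-ReLU-style slope
for _ in range(200):
    residual = prelu(x, alpha) - target
    # Chain rule on the mean squared error: d(MSE)/d(alpha)
    grad_alpha = np.mean(2 * residual * prelu_grad_alpha(x))
    alpha -= lr * grad_alpha

print(f"learned alpha = {alpha:.4f}")
```

<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnicees编辑器"><span>In a real network, α is usually updated together with the weights by the same optimizer. It can be shared across a layer or learned per channel.</span></p>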
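<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Formula</span></strong></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>$$\mathrm{PReLU}(x) = \begin{cases} x, &amp; x > 0 \\ \alpha x, &amp; x \le 0 \end{cases}$$</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Here α is a parameter learned during training rather than a fixed constant.</span></p>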
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Implementation</span></strong></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px"><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>prelu</span></span><span style="line-height: 26px"><span>(x, alpha=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0.25</span></span><span>)</span></span><span>:</span></span><span><br></span><span>    </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> np.where(x > </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0</span></span><span>, x, alpha * x)</span><span><br></span><span><br></span><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>prelu_grad</span></span><span style="line-height: 26px"><span>(x, alpha=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0.25</span></span><span>)</span></span><span>:</span></span><span><br></span><span>    </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> np.where(x > </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0</span></span><span>, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1</span></span><span>, alpha)</span><span><br></span><span><br></span><span>functions_to_plot = [</span><span><br></span><span>    (</span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>lambda</span></span><span> x: prelu(x, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0.1</span></span><span>), </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>lambda</span></span><span> x: prelu_grad(x, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0.1</span></span><span>), </span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>'PReLU α=0.1'</span></span><span>),</span><span><br></span><span>    (</span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>lambda</span></span><span> x: prelu(x, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0.25</span></span><span>), </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>lambda</span></span><span> x: prelu_grad(x, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0.25</span></span><span>), </span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>'PReLU α=0.25'</span></span><span>),</span><span><br></span><span>    (</span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>lambda</span></span><span> x: prelu(x, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0.5</span></span><span>), </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>lambda</span></span><span> x: prelu_grad(x, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0.5</span></span><span>), </span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>'PReLU α=0.5'</span></span><span>)</span><span><br></span><span>]</span><span><br></span><span>plot_activations(functions_to_plot, x)</span><span><br></span></code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Plot</span></strong></p>
<span><img src="https://img2024.cnblogs.com/blog/609124/202509/609124-20250928123545071-988419985.png" alt="Image" class="rich_pages wxw-img js_img_placeholder wx_img_placeholder" style="display: block; margin: 0 auto; max-width: 100%; border: 3px none rgba(0, 0, 0, 0.4); border-radius: 8px; object-fit: fill; box-shadow: 0 0 rgba(0, 0, 0, 0); width: 677px !important; height: 497.843px !important" data-ratio="0.7353658536585366" data-type="png" data-w="820" data-imgfileid="100003034" data-src="https://mmbiz.qpic.cn/mmbiz_png/J8ufl5q3zgOJgw7WsL0LtID9Ov9wldTE0RcictS9WvjCPGAYSQ9dp10QLR21eveiaxM6FUvIeZc0kNcCS8U8xbXA/640?wx_fmt=png&from=appmsg&watermark=1#imgIndex=7" data-original-style="display: block;margin: 0px auto;max-width: 100%;border-style: none;border-width: 3px;border-color: rgba(0, 0, 0, 0.4);border-radius: 8px;object-fit: fill;box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;height: auto !important;" data-index="8"></span>
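<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>In PReLU, α is not a fixed constant but a parameter learned together with the network weights; the code above only fixes a few α values to draw the curves. The minimal NumPy sketch below shows the extra gradient that makes this learning possible: ∂f/∂α equals x where x ≤ 0 and 0 elsewhere, summed over all elements when α is a single shared scalar. The names prelu_forward and prelu_backward, the toy loss and the learning rate are hypothetical, chosen only for illustration; a deep-learning framework would compute this gradient automatically.</span></p>

```python
import numpy as np

def prelu_forward(x, alpha):
    """PReLU: x for x > 0, alpha * x otherwise (alpha is a learnable scalar)."""
    return np.where(x > 0, x, alpha * x)

def prelu_backward(x, alpha, grad_out):
    """Return (gradient w.r.t. x, gradient w.r.t. alpha) given the upstream gradient."""
    grad_x = grad_out * np.where(x > 0, 1.0, alpha)
    # df/dalpha = x where x <= 0, else 0; alpha is shared, so sum over all elements
    grad_alpha = np.sum(grad_out * np.where(x > 0, 0.0, x))
    return grad_x, grad_alpha

# One hypothetical gradient-descent step on alpha (toy loss: sum of the outputs)
x_demo = np.array([-2.0, -0.5, 1.0, 3.0])
alpha, lr = 0.25, 0.1
grad_out = np.ones_like(x_demo)
_, grad_alpha = prelu_backward(x_demo, alpha, grad_out)
alpha -= lr * grad_alpha
print(grad_alpha, alpha)
```

<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Here only the two negative inputs contribute to the α gradient (-2 + (-0.5) = -2.5), so one step with lr = 0.1 moves α from 0.25 to 0.5.</span></p>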
<h3 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-radius: 0; box-shadow: none; display: flex; flex-direction: unset; float: unset; height: auto; justify-content: center; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 17px; color: rgba(89, 89, 89, 1); border-bottom: 2px solid rgba(222, 198, 251, 1); line-height: 1.5em; letter-spacing: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-top: 1px none rgba(0, 0, 0, 1); border-left: 1px none rgba(0, 0, 0, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; box-shadow: none; display: inline; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; padding: 0; margin: 0"><span>RReLU (Randomized ReLU)</span></span></h3>
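<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>RReLU is a ReLU variant that uses a randomly sampled slope for negative inputs during training and a fixed slope at test time. Its main purpose is to reduce overfitting and to mitigate the "dying ReLU" problem.</span></p>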
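<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Because RReLU draws its slope at random from an interval during training but uses a fixed value at test time, the code here uses a deterministic slope for simplicity: the average slope of the training-time range, (lower + upper) / 2.</span></p>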
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>The code below implements RReLU and its derivative, substituting a fixed slope between lower and upper for the random sampling so that the curves can be plotted.</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>In actual training, the slope applied to each negative input is sampled at random from the given range; at test or inference time, the mean of that range is typically used instead.</span></p>
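<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>A minimal NumPy sketch of the two modes described above, assuming one independently sampled slope per element during training and the midpoint (lower + upper) / 2 at inference. The function name rrelu_train_eval and its training flag are illustrative only, not a library API; the default range matches the code in this section.</span></p>

```python
import numpy as np

def rrelu_train_eval(x, lower=1/8., upper=1/3., training=True, rng=None):
    """RReLU sketch: random negative slope per element in training, fixed mean slope in eval."""
    x = np.asarray(x, dtype=float)
    if training:
        rng = np.random.default_rng() if rng is None else rng
        # One slope per element, drawn uniformly from [lower, upper)
        a = rng.uniform(lower, upper, size=x.shape)
    else:
        # Inference: use the mean of the sampling range
        a = (lower + upper) / 2
    return np.where(x >= 0, x, a * x)

x_demo = np.array([-4.0, -1.0, 0.0, 2.0])
rng = np.random.default_rng(0)
print(rrelu_train_eval(x_demo, training=True, rng=rng))  # negatives scaled by random slopes
print(rrelu_train_eval(x_demo, training=False))          # negatives scaled by the mean slope
```

<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>In evaluation mode the output is deterministic: every negative input is multiplied by (1/8 + 1/3) / 2 = 11/48 ≈ 0.229. In training mode, repeated calls give different outputs for the negative entries.</span></p>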
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Implementation</span></strong></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px"><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>rrelu</span></span><span style="line-height: 26px"><span>(x, lower=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1</span></span><span>/</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>8.</span></span><span>, upper=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1</span></span><span>/</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>3.</span></span><span>)</span></span><span>:</span></span><span><br></span><span>    </span><span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"><span># In practice, a is drawn at random from [lower, upper] during training</span></span><span><br></span><span>    </span><span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"><span># For plotting, the midpoint of that range is used as a fixed a</span></span><span><br></span><span>    a = (lower + upper) / </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>2</span></span><span><br></span><span>    </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> np.where(x >= </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0</span></span><span>, x, a * x)</span><span><br></span><span><br></span><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>rrelu_grad</span></span><span style="line-height: 26px"><span>(x, lower=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1</span></span><span>/</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>8.</span></span><span>, upper=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1</span></span><span>/</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>3.</span></span><span>)</span></span><span>:</span></span><span><br></span><span>    a = (lower + upper) / </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>2</span></span><span><br></span><span>    </span><span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"><span># Derivative: 1 where x >= 0, a elsewhere</span></span><span><br></span><span>    </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> np.where(x >= </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0</span></span><span>, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1.0</span></span><span>, a)</span><span><br></span><span><br></span><span>plot_activation(</span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>lambda</span></span><span> x: rrelu(x), </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>lambda</span></span><span> x: rrelu_grad(x), </span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>'RReLU'</span></span><span>)</span><span><br></span></code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Plot</span></strong></p>
<span><img src="https://img2024.cnblogs.com/blog/609124/202509/609124-20250928123545071-988419985.png" alt="Image" class="rich_pages wxw-img js_img_placeholder wx_img_placeholder" style="display: block; margin: 0 auto; max-width: 100%; border: 3px none rgba(0, 0, 0, 0.4); border-radius: 8px; object-fit: fill; box-shadow: 0 0 rgba(0, 0, 0, 0); width: 665px !important; height: 449px !important" data-ratio="0.675187969924812" data-type="png" data-w="665" data-imgfileid="100003031" data-src="https://mmbiz.qpic.cn/mmbiz_png/J8ufl5q3zgOJgw7WsL0LtID9Ov9wldTE9IwcMe0H1d7KbUI9mXT0e5tvkDSSEhAS1fPhQVzYhHibiawIdHROKvbA/640?wx_fmt=png&from=appmsg&watermark=1#imgIndex=8" data-original-style="display: block;margin: 0px auto;max-width: 100%;border-style: none;border-width: 3px;border-color: rgba(0, 0, 0, 0.4);border-radius: 8px;object-fit: fill;box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;height: auto !important;" data-index="9"></span>
<h3 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-radius: 0; box-shadow: none; display: flex; flex-direction: unset; float: unset; height: auto; justify-content: center; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 17px; color: rgba(89, 89, 89, 1); border-bottom: 2px solid rgba(222, 198, 251, 1); line-height: 1.5em; letter-spacing: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-top: 1px none rgba(0, 0, 0, 1); border-left: 1px none rgba(0, 0, 0, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; box-shadow: none; display: inline; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; padding: 0; margin: 0"><span>ELU (Exponential Linear Unit)</span></span></h3>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>ELU is designed to address problems that traditional activation functions can run into in deep neural networks, such as vanishing gradients and "dead neurons".</span></p>
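<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Because ELU outputs negative values for negative inputs, it pulls the mean activation closer to zero, which can speed up convergence. It suits deep networks and often trains more stably than ReLU, though it is slightly slower to compute because it evaluates an exponential.</span></p>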
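<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>The zero-mean claim can be checked numerically. This sketch (NumPy only; the sample size and seed are arbitrary choices) feeds standard-normal pre-activations through ReLU and through ELU with α = 1 and compares the mean outputs.</span></p>

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

rng = np.random.default_rng(0)
z = rng.standard_normal(100_000)  # zero-mean, unit-variance pre-activations

relu_mean = relu(z).mean()
elu_mean = elu(z).mean()
print(f"ReLU mean: {relu_mean:.3f}, ELU mean: {elu_mean:.3f}")
```

<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>For N(0, 1) inputs the expected means are 1/√(2π) ≈ 0.40 for ReLU and roughly 0.16 for ELU, because ELU's negative branch offsets part of the positive mass instead of clipping it to zero.</span></p>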
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Formula</span></strong></p>
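<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Written to match the elu and elu_grad functions in the implementation, with α the alpha parameter:</span></p>

```latex
\mathrm{ELU}(x) =
\begin{cases}
x, & x > 0 \\
\alpha\,(e^{x} - 1), & x \le 0
\end{cases}
\qquad
\mathrm{ELU}'(x) =
\begin{cases}
1, & x > 0 \\
\alpha\,e^{x} = \mathrm{ELU}(x) + \alpha, & x \le 0
\end{cases}
```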
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Implementation</span></strong></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px"><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>elu</span></span><span style="line-height: 26px"><span>(x, alpha=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1.0</span></span><span>)</span></span><span>:</span></span><span><br></span><span> </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> np.where(x > </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0</span></span><span>, x, alpha * (np.exp(x) - </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1</span></span><span>))</span><span><br></span><span><br></span><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>elu_grad</span></span><span style="line-height: 26px"><span>(x, alpha=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1.0</span></span><span>)</span></span><span>:</span></span><span><br></span><span> </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> np.where(x > </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0</span></span><span>, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1</span></span><span>, elu(x, alpha) + alpha)</span><span><br></span><span><br></span><span>functions_to_plot = [</span><span><br></span><span> (</span><span style="color: rgba(198, 120, 221, 1); line-height: 
26px"><span>lambda</span></span><span> x: elu(x, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0.1</span></span><span>), </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>lambda</span></span><span> x: elu_grad(x, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0.1</span></span><span>), </span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>'ELU α=0.1'</span></span><span>),</span><span><br></span><span> (</span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>lambda</span></span><span> x: elu(x, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0.25</span></span><span>), </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>lambda</span></span><span> x: elu_grad(x, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0.25</span></span><span>), </span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>'ELU α=0.25'</span></span><span>),</span><span><br></span><span> (</span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>lambda</span></span><span> x: elu(x, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0.5</span></span><span>), </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>lambda</span></span><span> x: elu_grad(x, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0.5</span></span><span>), </span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>'ELU α=0.5'</span></span><span>),</span><span><br></span><span> (</span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>lambda</span></span><span> x: elu(x, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1</span></span><span>), </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>lambda</span></span><span> x: elu_grad(x, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1</span></span><span>), </span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>'ELU α=1'</span></span><span>),</span><span><br></span><span> (</span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>lambda</span></span><span> x: elu(x, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>2</span></span><span>), </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>lambda</span></span><span> x: elu_grad(x, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>2</span></span><span>), </span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>'ELU α=2'</span></span><span>)</span><span><br></span><span>]</span><span><br></span><span>plot_activations(functions_to_plot, x)</span><span><br></span><span><br></span></code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Plot</span></strong></p>
<span><img src="https://img2024.cnblogs.com/blog/609124/202509/609124-20250928123545071-988419985.png" alt="Image" class="rich_pages wxw-img js_img_placeholder wx_img_placeholder" style="display: block; margin: 0 auto; max-width: 100%; border: 3px none rgba(0, 0, 0, 0.4); border-radius: 8px; object-fit: fill; box-shadow: 0 0 rgba(0, 0, 0, 0); width: 677px !important; height: 497.843px !important" data-ratio="0.7353658536585366" data-type="png" data-w="820" data-imgfileid="100003035" data-src="https://mmbiz.qpic.cn/mmbiz_png/J8ufl5q3zgOJgw7WsL0LtID9Ov9wldTEgOfmkF4KvN0SWLrh6rmpEAk3FIgdqhdksETNibpdR4r65qvibeYFZ08A/640?wx_fmt=png&from=appmsg&watermark=1#imgIndex=9" data-original-style="display: block;margin: 0px auto;max-width: 100%;border-style: none;border-width: 3px;border-color: rgba(0, 0, 0, 0.4);border-radius: 8px;object-fit: fill;box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;height: auto !important;" data-index="10"></span>
<h3 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-radius: 0; box-shadow: none; display: flex; flex-direction: unset; float: unset; height: auto; justify-content: center; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 17px; color: rgba(89, 89, 89, 1); border-bottom: 2px solid rgba(222, 198, 251, 1); line-height: 1.5em; letter-spacing: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-top: 1px none rgba(0, 0, 0, 1); border-left: 1px none rgba(0, 0, 0, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; box-shadow: none; display: inline; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; padding: 0; margin: 0"><span>SELU (Scaled Exponential Linear Unit)</span></span></h3>
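<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>SELU is a self-normalizing activation function: under certain conditions (the paper assumes, among other things, standardized inputs and weights initialized with mean 0 and variance 1/n), the activations of a network automatically tend toward zero mean and unit variance. This helps speed up training and may improve model performance.</span></p>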
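<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>A rough numerical check of this self-normalizing behaviour, under the assumptions the paper relies on: standardized inputs, LeCun-normal weights (mean 0, variance 1/fan_in) and no bias. Width, depth, batch size and seed are arbitrary choices for this sketch, and the constants are the ones PyTorch documents for torch.nn.SELU. Because the network is random, the printed statistics are only approximately 0 and 1.</span></p>

```python
import numpy as np

# SELU constants (as documented for PyTorch's torch.nn.SELU)
SELU_LAMBDA = 1.0507009873554804934193349852946
SELU_ALPHA = 1.6732632423543772848170429916717

def selu_act(x):
    return SELU_LAMBDA * np.where(x > 0, x, SELU_ALPHA * (np.exp(x) - 1))

rng = np.random.default_rng(0)
width, depth, batch = 512, 20, 500

h = rng.standard_normal((batch, width))  # standardized inputs
for _ in range(depth):
    # LeCun-normal init: weights ~ N(0, 1/fan_in); no bias term
    W = rng.standard_normal((width, width)) / np.sqrt(width)
    h = selu_act(h @ W)

print(f"after {depth} layers: mean={h.mean():.3f}, var={h.var():.3f}")
```

<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdniceIf the weight scale is changed (for example, dividing by a larger or smaller factor than np.sqrt(width)), the variance is no longer held near 1, which illustrates why the paper ties SELU to this specific initialization.</span></p>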
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>SELU was proposed by Günter Klambauer et al. in the 2017 paper "Self-Normalizing Neural Networks".</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Formula</span></strong></p>
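<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Matching the selu and selu_grad functions in the implementation, where λ ≈ 1.0507 is lambda_ and α ≈ 1.6733 is alpha:</span></p>

```latex
\mathrm{SELU}(x) = \lambda
\begin{cases}
x, & x > 0 \\
\alpha\,(e^{x} - 1), & x \le 0
\end{cases}
\qquad
\mathrm{SELU}'(x) = \lambda
\begin{cases}
1, & x > 0 \\
\alpha\,e^{x}, & x \le 0
\end{cases}
```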
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Implementation</span></strong></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px"><span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"><span># SELU parameters (values recommended in the paper)</span></span><span><br></span><span>lambda_s = </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1.0507009873554804934193349852946</span></span><span><br></span><span>alpha_s = </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1.6732632423543772848170429916717</span></span><span><br></span><span><br></span><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>selu</span></span><span style="line-height: 26px"><span>(x, lambda_=lambda_s, alpha=alpha_s)</span></span><span>:</span></span><span><br></span><span>    </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> lambda_ * np.where(x > </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0</span></span><span>, x, alpha * (np.exp(x) - </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1</span></span><span>))</span><span><br></span><span><br></span><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>selu_grad</span></span><span style="line-height: 26px"><span>(x, lambda_=lambda_s, alpha=alpha_s)</span></span><span>:</span></span><span><br></span><span>    </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> lambda_ * np.where(x > </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0</span></span><span>, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1</span></span><span>, alpha * np.exp(x))</span><span><br></span><span><br></span><span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"><span># Plot SELU and its derivative with plot_activation</span></span><span><br></span><span>plot_activation(</span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>lambda</span></span><span> x: selu(x), </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>lambda</span></span><span> x: selu_grad(x), </span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>'SELU'</span></span><span>)</span><span><br></span></code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Plot</span></strong></p>
<span><img src="https://img2024.cnblogs.com/blog/609124/202509/609124-20250928123545071-988419985.png" alt="Image" class="rich_pages wxw-img js_img_placeholder wx_img_placeholder" style="display: block; margin: 0 auto; max-width: 100%; border: 3px none rgba(0, 0, 0, 0.4); border-radius: 8px; object-fit: fill; box-shadow: 0 0 rgba(0, 0, 0, 0); width: 665px !important; height: 449px !important" data-ratio="0.675187969924812" data-type="png" data-w="665" data-imgfileid="100003037" data-src="https://mmbiz.qpic.cn/mmbiz_png/J8ufl5q3zgOJgw7WsL0LtID9Ov9wldTEQCEq7PYDuOgFQTyhwj0icsicEvY3CHwomajNkzDsBiaJCpdOvfickerdNA/640?wx_fmt=png&from=appmsg&watermark=1#imgIndex=10" data-original-style="display: block;margin: 0px auto;max-width: 100%;border-style: none;border-width: 3px;border-color: rgba(0, 0, 0, 0.4);border-radius: 8px;object-fit: fill;box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;height: auto !important;" data-index="11"></span>
<h3 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-radius: 0; box-shadow: none; display: flex; flex-direction: unset; float: unset; height: auto; justify-content: center; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 17px; color: rgba(89, 89, 89, 1); border-bottom: 2px solid rgba(222, 198, 251, 1); line-height: 1.5em; letter-spacing: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-top: 1px none rgba(0, 0, 0, 1); border-left: 1px none rgba(0, 0, 0, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; box-shadow: none; display: inline; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; padding: 0; margin: 0"><span>CELU (Continuously Differentiable Exponential Linear Unit)</span></span></h3>
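<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>CELU is a refinement of ELU that is continuously differentiable at x = 0 for every α (ELU is only continuously differentiable there when α = 1). This makes it smoother than ELU and can help optimization stability.</span></p>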
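<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Like ELU, it produces negative activations that keep the mean neuron output close to zero, which suits training deep networks. In tasks that benefit from smoother gradients it can replace ReLU or ELU, at a slightly higher compute cost than ReLU.</span></p>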
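<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>A quick finite-difference check of the smoothness claim: the sketch compares the slopes just left and just right of x = 0 for ELU and CELU at several α values (the step size h is an arbitrary choice). For ELU the left slope equals α, so it only matches the right slope of 1 when α = 1; for CELU both sides are 1 for every α.</span></p>

```python
import numpy as np

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

def celu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x / alpha) - 1))

def one_sided_slopes(f, alpha, h=1e-6):
    """Finite-difference slopes just left and just right of x = 0."""
    left = (f(np.array(0.0), alpha) - f(np.array(-h), alpha)) / h
    right = (f(np.array(h), alpha) - f(np.array(0.0), alpha)) / h
    return float(left), float(right)

for alpha in (0.5, 1.0, 2.0):
    print(f"alpha={alpha}: ELU {one_sided_slopes(elu, alpha)}, "
          f"CELU {one_sided_slopes(celu, alpha)}")
```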
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Formula</span></strong></p>
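<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Consistent with the celu function in the implementation, with α > 0 the alpha parameter:</span></p>

```latex
\mathrm{CELU}(x) = \max(0, x) + \min\!\left(0,\ \alpha\,\bigl(e^{x/\alpha} - 1\bigr)\right)
\qquad
\mathrm{CELU}'(x) =
\begin{cases}
1, & x > 0 \\
e^{x/\alpha}, & x \le 0
\end{cases}
```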
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Implementation</span></strong></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px"><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>celu</span></span><span style="line-height: 26px"><span>(x, alpha=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1.0</span></span><span>)</span></span><span>:</span></span><span><br></span><span>    </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> np.where(x > </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0</span></span><span>, x, alpha * (np.exp(x / alpha) - </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1</span></span><span>))</span><span><br></span><span><br></span><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>celu_grad</span></span><span style="line-height: 26px"><span>(x, alpha=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1.0</span></span><span>)</span></span><span>:</span></span><span><br></span><span>    </span><span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"><span># 1 where x > 0, exp(x / alpha) elsewhere; both pieces equal 1 at x = 0</span></span><span><br></span><span>    </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> np.where(x > </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0</span></span><span>, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1.0</span></span><span>, np.exp(x / alpha))</span><span><br></span><span><br></span><span>plot_activation(celu, celu_grad, </span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>"CELU"</span></span><span>)</span><span><br></span></code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Plot</span></strong></p>
<span><img src="https://img2024.cnblogs.com/blog/609124/202509/609124-20250928123545071-988419985.png" alt="Image" class="rich_pages wxw-img js_img_placeholder wx_img_placeholder" style="display: block; margin: 0 auto; max-width: 100%; border: 3px none rgba(0, 0, 0, 0.4); border-radius: 8px; object-fit: fill; box-shadow: 0 0 rgba(0, 0, 0, 0); width: 665px !important; height: 449px !important" data-ratio="0.675187969924812" data-type="png" data-w="665" data-imgfileid="100003038" data-src="https://mmbiz.qpic.cn/mmbiz_png/J8ufl5q3zgOJgw7WsL0LtID9Ov9wldTEuziabCJXxNe4xWG8NYjJictsiaLMuvh8Jfgyr8E8g45xc5ktpicoCSqPjA/640?wx_fmt=png&from=appmsg&watermark=1#imgIndex=11" data-original-style="display: block;margin: 0px auto;max-width: 100%;border-style: none;border-width: 3px;border-color: rgba(0, 0, 0, 0.4);border-radius: 8px;object-fit: fill;box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;height: auto !important;" data-index="12"></span>
<h3 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-radius: 0; box-shadow: none; display: flex; flex-direction: unset; float: unset; height: auto; justify-content: center; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 17px; color: rgba(89, 89, 89, 1); border-bottom: 2px solid rgba(222, 198, 251, 1); line-height: 1.5em; letter-spacing: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-top: 1px none rgba(0, 0, 0, 1); border-left: 1px none rgba(0, 0, 0, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; box-shadow: none; display: inline; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; padding: 0; margin: 0"><span>GELU (Gaussian Error Linear Unit)</span></span></h3>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>GELU is the standard activation function in Transformer-based architectures such as BERT. It is smooth and non-monotonic, and it is widely used in NLP and large models. In these settings it often performs better than ReLU, and it has gradually become a common replacement for ReLU.</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>GELU was proposed by Dan Hendrycks and Kevin Gimpel in their 2016 paper "Gaussian Error Linear Units (GELUs)".</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Formula</span></strong></p>
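<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>$$\mathrm{GELU}(x) = x\,\Phi(x)$$</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>where Φ(x) is the cumulative distribution function of the standard normal distribution:</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>$$\Phi(x) = \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{x}{\sqrt{2}}\right)\right]$$</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Φ(x) can be approximated with the hyperbolic tangent (tanh); a common form is:</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>$$\Phi(x) \approx \frac{1}{2}\left[1 + \tanh\left(\sqrt{\frac{2}{\pi}}\left(x + 0.044715\,x^{3}\right)\right)\right]$$</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>In practice, GELU is therefore often computed with the approximation:</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>$$\mathrm{GELU}(x) \approx 0.5\,x\left[1 + \tanh\left(\sqrt{\frac{2}{\pi}}\left(x + 0.044715\,x^{3}\right)\right)\right]$$</span></p>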
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Implementation</span></strong></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>gelu</span></span><span>(x):</span><span><br></span><span>    </span><span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"><span># tanh approximation of GELU</span></span><span><br></span><span>    </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0.5</span></span><span> * x * (</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1</span></span><span> + np.tanh(np.sqrt(</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>2</span></span><span> / np.pi) * (x + </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0.044715</span></span><span> * np.power(x, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>3</span></span><span>))))</span><span><br></span><span><br></span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>gelu_grad</span></span><span>(x):</span><span><br></span><span>    </span><span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"><span># Exact derivative of the tanh approximation (0.134145 = 3 * 0.044715)</span></span><span><br></span><span>    c = np.sqrt(</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>2</span></span><span> / np.pi)</span><span><br></span><span>    t = np.tanh(c * (x + </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0.044715</span></span><span> * np.power(x, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>3</span></span><span>)))</span><span><br></span><span>    </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0.5</span></span><span> * (</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1</span></span><span> + t) + </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0.5</span></span><span> * x * (</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1</span></span><span> - t**</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>2</span></span><span>) * c * (</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1</span></span><span> + </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0.134145</span></span><span> * np.power(x, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>2</span></span><span>))</span><span><br></span><span><br></span><span>plot_activation(gelu, gelu_grad, </span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>"GELU"</span></span><span>)</span><span><br></span></code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Plot</span></strong></p>
<span><img src="https://img2024.cnblogs.com/blog/609124/202509/609124-20250928123545071-988419985.png" alt="Image" class="rich_pages wxw-img js_img_placeholder wx_img_placeholder" style="display: block; margin: 0 auto; max-width: 100%; border: 3px none rgba(0, 0, 0, 0.4); border-radius: 8px; object-fit: fill; box-shadow: 0 0 rgba(0, 0, 0, 0); width: 665px !important; height: 449px !important" data-ratio="0.675187969924812" data-type="png" data-w="665" data-imgfileid="100003036" data-src="https://mmbiz.qpic.cn/mmbiz_png/J8ufl5q3zgOJgw7WsL0LtID9Ov9wldTEQncsp78Oayuiayxeb9Va72piao2A4ibJ44KrncyqJk7LHb2ALbbt0rwgQ/640?wx_fmt=png&from=appmsg&watermark=1#imgIndex=12" data-original-style="display: block;margin: 0px auto;max-width: 100%;border-style: none;border-width: 3px;border-color: rgba(0, 0, 0, 0.4);border-radius: 8px;object-fit: fill;box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;height: auto !important;" data-index="13"></span>
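<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>To see how close the tanh approximation is to the exact definition GELU(x) = x·Φ(x), the two forms can be evaluated side by side. This is a minimal sketch assuming only NumPy and Python's standard math.erf; the names gelu_exact and gelu_tanh are illustrative, not from the original article.</span></p>

```python
import math
import numpy as np

def gelu_exact(x):
    # Exact GELU: x * Phi(x), with Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    erf = np.vectorize(math.erf)
    return 0.5 * x * (1.0 + erf(x / math.sqrt(2.0)))

def gelu_tanh(x):
    # tanh approximation of GELU
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + np.tanh(c * (x + 0.044715 * x**3)))

x = np.linspace(-6, 6, 2001)
max_err = np.max(np.abs(gelu_exact(x) - gelu_tanh(x)))
print(f"max |exact - tanh approx| on [-6, 6]: {max_err:.1e}")
```

<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>The gap is small, which is why the tanh form is widely used; some frameworks expose both variants (for example, PyTorch's torch.nn.GELU takes an approximate argument that can be 'none' or 'tanh').</span></p>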
<h2 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box unset; border-radius: 0; box-shadow: none; display: block; flex-direction: unset; float: unset; height: auto; justify-content: unset; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 18px; color: rgba(89, 89, 89, 1); line-height: 1.8em; letter-spacing: 0; padding: 0 0 0 10px; border-top: 1px none rgba(0, 0, 0, 1); border-bottom: 1px none rgba(0, 0, 0, 1); border-left: 5px solid rgba(222, 198, 251, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box unset; box-shadow: none; display: block; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; margin: 0"><span>Modern High-Performance Activation Functions</span></span></h2>
<h3 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-radius: 0; box-shadow: none; display: flex; flex-direction: unset; float: unset; height: auto; justify-content: center; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 17px; color: rgba(89, 89, 89, 1); border-bottom: 2px solid rgba(222, 198, 251, 1); line-height: 1.5em; letter-spacing: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-top: 1px none rgba(0, 0, 0, 1); border-left: 1px none rgba(0, 0, 0, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; box-shadow: none; display: inline; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; padding: 0; margin: 0"><span>Swish</span></span></h3>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Swish was proposed by researchers at Google. It has been reported to outperform ReLU in some deep models and has proved effective in attention mechanisms and mobile models in particular.</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Formula</span></strong></p>
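<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>$$\mathrm{Swish}(x) = x \cdot \sigma(\beta x)$$</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>where σ is the sigmoid function:</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>$$\sigma(z) = \frac{1}{1 + e^{-z}}$$</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>and β can be either a fixed constant or a learnable parameter.</span></p>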
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Implementation</span></strong></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>swish</span></span><span>(x, beta=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1</span></span><span>):</span><span><br></span><span>    </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> x / (</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1</span></span><span> + np.exp(-beta * x))</span><span><br></span><span><br></span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>swish_grad</span></span><span>(x, beta=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1</span></span><span>):</span><span><br></span><span>    s = </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1</span></span><span> / (</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1</span></span><span> + np.exp(-beta * x))  </span><span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"><span># sigmoid(beta * x)</span></span><span><br></span><span>    f = x * s  </span><span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"><span># swish(x)</span></span><span><br></span><span>    </span><span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"><span># d/dx [x * sigmoid(beta * x)] = s + beta * f * (1 - s)</span></span><span><br></span><span>    </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> s + beta * f * (</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1</span></span><span> - s)</span><span><br></span><span><br></span><span>plot_activation(swish, swish_grad, </span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>"Swish"</span></span><span>)</span><span><br></span></code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Plot</span></strong></p>
<span><img src="https://img2024.cnblogs.com/blog/609124/202509/609124-20250928123545071-988419985.png" alt="Image" class="rich_pages wxw-img js_img_placeholder wx_img_placeholder" style="display: block; margin: 0 auto; max-width: 100%; border: 3px none rgba(0, 0, 0, 0.4); border-radius: 8px; object-fit: fill; box-shadow: 0 0 rgba(0, 0, 0, 0); width: 665px !important; height: 449px !important" data-ratio="0.675187969924812" data-type="png" data-w="665" data-imgfileid="100003039" data-src="https://mmbiz.qpic.cn/mmbiz_png/J8ufl5q3zgOJgw7WsL0LtID9Ov9wldTEGLbN7JZtk8DL2UZKmPXavPnk2cqlQxQQ3p7FeUddpdVHaicVWv2tmeg/640?wx_fmt=png&from=appmsg&watermark=1#imgIndex=13" data-original-style="display: block;margin: 0px auto;max-width: 100%;border-style: none;border-width: 3px;border-color: rgba(0, 0, 0, 0.4);border-radius: 8px;object-fit: fill;box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;height: auto !important;" data-index="14"></span>
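<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>The role of β can be checked numerically. With β = 0 the sigmoid is constant at 0.5, so Swish reduces to the linear function x/2; as β grows, σ(βx) approaches a step function and Swish approaches ReLU. The sketch below uses only NumPy; the relu helper and the particular β values are illustrative choices.</span></p>

```python
import numpy as np

def swish(x, beta=1.0):
    # Swish(x) = x * sigmoid(beta * x) = x / (1 + exp(-beta * x))
    return x / (1.0 + np.exp(-beta * x))

def relu(x):
    return np.maximum(0.0, x)

x = np.linspace(-5, 5, 101)

# beta = 0: sigmoid(0) = 0.5, so Swish is exactly x / 2
print("beta=0 gives x/2:", np.allclose(swish(x, beta=0.0), x / 2))

# Larger beta pushes sigmoid(beta * x) toward a step, so Swish approaches ReLU
gaps = []
for beta in (1.0, 10.0, 100.0):
    gap = np.max(np.abs(swish(x, beta) - relu(x)))
    gaps.append(gap)
    print(f"beta={beta:g}: max |Swish - ReLU| = {gap:.2e}")
```

<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>This is why Swish is often described as a smooth interpolation between a linear function and ReLU, with β controlling where it sits between the two.</span></p>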
<h3 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-radius: 0; box-shadow: none; display: flex; flex-direction: unset; float: unset; height: auto; justify-content: center; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 17px; color: rgba(89, 89, 89, 1); border-bottom: 2px solid rgba(222, 198, 251, 1); line-height: 1.5em; letter-spacing: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-top: 1px none rgba(0, 0, 0, 1); border-left: 1px none rgba(0, 0, 0, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; box-shadow: none; display: inline; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; padding: 0; margin: 0"><span>SiLU (Sigmoid Linear Unit)</span></span></h3>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>SiLU is the special case of Swish with β = 1, i.e. SiLU(x) = x·σ(x).</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>For its implementation and plot, see Swish above.</span></p>
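<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Because SiLU is just Swish with β fixed at 1, it is also easy to write directly. The following minimal NumPy sketch (sigmoid, silu, silu_grad and swish are redefined here so the snippet runs on its own; the names are illustrative) confirms that the two definitions agree.</span></p>

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def silu(x):
    # SiLU(x) = x * sigmoid(x), i.e. Swish with beta fixed at 1
    return x * sigmoid(x)

def silu_grad(x):
    # d/dx [x * sigmoid(x)] = sigmoid(x) * (1 + x * (1 - sigmoid(x)))
    s = sigmoid(x)
    return s * (1.0 + x * (1.0 - s))

def swish(x, beta=1.0):
    return x / (1.0 + np.exp(-beta * x))

x = np.linspace(-6, 6, 121)
print("SiLU == Swish(beta=1):", np.allclose(silu(x), swish(x, beta=1.0)))
```

<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>In PyTorch, the same function is available as torch.nn.SiLU (and torch.nn.functional.silu).</span></p>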
<h3 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-radius: 0; box-shadow: none; display: flex; flex-direction: unset; float: unset; height: auto; justify-content: center; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 17px; color: rgba(89, 89, 89, 1); border-bottom: 2px solid rgba(222, 198, 251, 1); line-height: 1.5em; letter-spacing: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-top: 1px none rgba(0, 0, 0, 1); border-left: 1px none rgba(0, 0, 0, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; box-shadow: none; display: inline; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; padding: 0; margin: 0"><span>E-Swish</span></span></h3>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>E-Swish is a scaled version of SiLU, E-Swish(x) = β·x·σ(x), in which the hyperparameter β strengthens its nonlinear expressive power.</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Implementation</span></strong></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Assuming β = 1.5:</span></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px"><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>eswish</span></span><span style="line-height: 26px"><span>(x, beta=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1.5</span></span><span>)</span></span><span>:</span></span><span><br></span><span> </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> beta * x * sigmoid(x)</span><span><br></span><span><br></span><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>eswish_grad</span></span><span style="line-height: 26px"><span>(x, beta=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1.5</span></span><span>)</span></span><span>:</span></span><span><br></span><span> s = sigmoid(x)</span><span><br></span><span> </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> beta * s * (</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1</span></span><span> + x * (</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1</span></span><span> - s))</span><span><br></span><span><br></span><span>plot_activation(</span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>lambda</span></span><span> x: eswish(x, beta=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1.5</span></span><span>), </span><span style="color: rgba(198, 120, 221, 1); line-height: 
26px"><span>lambda</span></span><span> x: eswish_grad(x, beta=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1.5</span></span><span>), </span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>'E-Swish'</span></span><span>)</span><span><br></span></code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Plot</span></strong></p>
<span><img src="https://img2024.cnblogs.com/blog/609124/202509/609124-20250928123545071-988419985.png" alt="Image" class="rich_pages wxw-img js_img_placeholder wx_img_placeholder" style="display: block; margin: 0 auto; max-width: 100%; border: 3px none rgba(0, 0, 0, 0.4); border-radius: 8px; object-fit: fill; box-shadow: 0 0 rgba(0, 0, 0, 0); width: 665px !important; height: 449px !important" data-ratio="0.675187969924812" data-type="png" data-w="665" data-imgfileid="100003040" data-src="https://mmbiz.qpic.cn/mmbiz_png/J8ufl5q3zgOJgw7WsL0LtID9Ov9wldTEp3unrywe1riavwfYbSLbmoGW1H0fRy8rld6oVic4agA5dbNNNsWiaEjbg/640?wx_fmt=png&from=appmsg&watermark=1#imgIndex=14" data-original-style="display: block;margin: 0px auto;max-width: 100%;border-style: none;border-width: 3px;border-color: rgba(0, 0, 0, 0.4);border-radius: 8px;object-fit: fill;box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;height: auto !important;" data-index="15"></span>
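<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Two properties follow directly from the definition and can be checked in a few lines: E-Swish equals β times SiLU, and for large positive x its slope tends to β rather than 1. This is a minimal NumPy sketch with β = 1.5 as above; the helpers are redefined so it runs on its own.</span></p>

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def silu(x):
    return x * sigmoid(x)

def eswish(x, beta=1.5):
    # E-Swish(x) = beta * x * sigmoid(x)
    return beta * x * sigmoid(x)

def eswish_grad(x, beta=1.5):
    # beta * s * (1 + x * (1 - s)), with s = sigmoid(x)
    s = sigmoid(x)
    return beta * s * (1.0 + x * (1.0 - s))

x = np.linspace(-6, 6, 121)
beta = 1.5

# E-Swish is SiLU scaled by beta
print("E-Swish == beta * SiLU:", np.allclose(eswish(x, beta), beta * silu(x)))

# For large positive x, sigmoid(x) -> 1, so the slope tends to beta instead of 1
print("slope at x=20:", eswish_grad(np.array(20.0), beta))
```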
<h3 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-radius: 0; box-shadow: none; display: flex; flex-direction: unset; float: unset; height: auto; justify-content: center; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 17px; color: rgba(89, 89, 89, 1); border-bottom: 2px solid rgba(222, 198, 251, 1); line-height: 1.5em; letter-spacing: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-top: 1px none rgba(0, 0, 0, 1); border-left: 1px none rgba(0, 0, 0, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; box-shadow: none; display: inline; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; padding: 0; margin: 0"><span>Mish</span></span></h3>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Mish is a self-gated, non-monotonic activation function proposed by Diganta Misra in the 2019 paper "Mish: A Self Regularized Non-Monotonic Neural Activation Function".</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>It performs well in deep learning, especially on tasks such as image classification, where it is often reported to outperform ReLU and related activations such as Leaky ReLU and Swish.</span></p>
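<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Formula</span></strong></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>$$\mathrm{Mish}(x) = x \cdot \tanh\left(\mathrm{softplus}(x)\right) = x \cdot \tanh\left(\ln\left(1 + e^{x}\right)\right)$$</span></p>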
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Implementation</span></strong></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px"><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>mish</span></span><span style="line-height: 26px"><span>(x)</span></span><span>:</span></span><span><br></span><span> </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> x * np.tanh(np.log(</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1</span></span><span> + np.exp(x)))</span><span><br></span><span><br></span><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>mish_grad</span></span><span style="line-height: 26px"><span>(x)</span></span><span>:</span></span><span><br></span><span> sp = np.log(</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1</span></span><span> + np.exp(x))</span><span><br></span><span> tanh_sp = np.tanh(sp)</span><span><br></span><span> sech2_sp = </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1</span></span><span> - tanh_sp**</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>2</span></span><span><br></span><span> </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> tanh_sp + x * sech2_sp * sigmoid(x)</span><span><br></span><span><br></span><span>plot_activation(mish, mish_grad, </span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>'Mish'</span></span><span>)</span><span><br></span></code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Plot</span></strong></p>
<span><img src="https://img2024.cnblogs.com/blog/609124/202509/609124-20250928123545071-988419985.png" alt="Image" class="rich_pages wxw-img js_img_placeholder wx_img_placeholder" style="display: block; margin: 0 auto; max-width: 100%; border: 3px none rgba(0, 0, 0, 0.4); border-radius: 8px; object-fit: fill; box-shadow: 0 0 rgba(0, 0, 0, 0); width: 665px !important; height: 449px !important" data-ratio="0.675187969924812" data-type="png" data-w="665" data-imgfileid="100003041" data-src="https://mmbiz.qpic.cn/mmbiz_png/J8ufl5q3zgOJgw7WsL0LtID9Ov9wldTEUgBpKtS3NiaCBlHDOcxvTJMq670ibTz7XMuzIibtGawd8xwAbVeppxa0w/640?wx_fmt=png&from=appmsg&watermark=1#imgIndex=15" data-original-style="display: block;margin: 0px auto;max-width: 100%;border-style: none;border-width: 3px;border-color: rgba(0, 0, 0, 0.4);border-radius: 8px;object-fit: fill;box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;height: auto !important;" data-index="16"></span>
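<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Computing softplus as np.log(1 + np.exp(x)) overflows in the intermediate exp for large positive inputs, and NumPy emits an overflow RuntimeWarning. A numerically safer variant can use np.logaddexp(0, x), which evaluates log(e^0 + e^x) without forming e^x directly. This is a sketch under that assumption, not the article's original code; the sigmoid is computed through the identity σ(x) = (1 + tanh(x/2)) / 2 for the same reason.</span></p>

```python
import numpy as np

def softplus(x):
    # log(1 + exp(x)) computed as logaddexp(0, x) to avoid overflow
    return np.logaddexp(0.0, x)

def mish(x):
    # Mish(x) = x * tanh(softplus(x))
    return x * np.tanh(softplus(x))

def mish_grad(x):
    # d/dx Mish = tanh(sp) + x * sech^2(sp) * sigmoid(x), with sp = softplus(x)
    t = np.tanh(softplus(x))
    sig = 0.5 * (1.0 + np.tanh(0.5 * x))  # overflow-free sigmoid
    return t + x * (1.0 - t**2) * sig

x = np.array([-1000.0, -5.0, 0.0, 5.0, 1000.0])
print(mish(x))
print(mish_grad(x))
```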
<h3 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-radius: 0; box-shadow: none; display: flex; flex-direction: unset; float: unset; height: auto; justify-content: center; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 17px; color: rgba(89, 89, 89, 1); border-bottom: 2px solid rgba(222, 198, 251, 1); line-height: 1.5em; letter-spacing: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-top: 1px none rgba(0, 0, 0, 1); border-left: 1px none rgba(0, 0, 0, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; box-shadow: none; display: inline; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; padding: 0; margin: 0"><span>SQNL (Square Nonlinearity)</span></span></h3>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>SQNL uses a square operator to introduce the required nonlinearity, which keeps the number of arithmetic operations low. It is reported to converge faster on multilayer perceptron (MLP) architecture problems. In addition, its derivative is piecewise linear, so gradients are cheap to compute.</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>SQNL was proposed by Adedamola Wuraola et al. in the 2018 paper "SQNL: A New Computationally Efficient Activation Function".</span></p>
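<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>The claim that the SQNL derivative is linear can be made concrete: differentiating each piece gives 1 − x/2 on [0, 2], 1 + x/2 on [−2, 0), and 0 outside [−2, 2], so the gradient needs only a multiply and an add per element. The sketch below (NumPy only; sqnl_derivative and numeric_grad are illustrative names, not the article's functions) checks this against a central finite difference.</span></p>

```python
import numpy as np

def sqnl(x):
    # Piecewise SQNL, saturating at +1 / -1 outside [-2, 2]
    return np.where(x > 2, 1.0,
                    np.where(x >= 0, x - x**2 / 4,
                             np.where(x >= -2, x + x**2 / 4, -1.0)))

def sqnl_derivative(x):
    # Derivative of each piece: linear inside [-2, 2], zero outside
    return np.where(x > 2, 0.0,
                    np.where(x >= 0, 1 - x / 2,
                             np.where(x >= -2, 1 + x / 2, 0.0)))

def numeric_grad(f, x, h=1e-6):
    # Central finite difference as a reference value
    return (f(x + h) - f(x - h)) / (2 * h)

x = np.linspace(-4, 4, 801)
err = np.max(np.abs(sqnl_derivative(x) - numeric_grad(sqnl, x)))
print(f"max abs error vs finite difference: {err:.1e}")
```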
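<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Formula</span></strong></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>$$\mathrm{SQNL}(x) = \begin{cases} 1, &amp; x \gt 2 \\ x - \dfrac{x^{2}}{4}, &amp; 0 \le x \le 2 \\ x + \dfrac{x^{2}}{4}, &amp; -2 \le x \lt 0 \\ -1, &amp; x \lt -2 \end{cases}$$</span></p>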
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Implementation</span></strong></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px"><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>sqnl</span></span><span style="line-height: 26px"><span>(x)</span></span><span>:</span></span><span><br></span><span> </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> np.where(x > </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>2</span></span><span>, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1</span></span><span>,</span><span><br></span><span> np.where(x >= </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0</span></span><span>, x - (x**</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>2</span></span><span>)/</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>4</span></span><span>,</span><span><br></span><span> np.where(x >= </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>-2</span></span><span>, x + (x**</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>2</span></span><span>)/</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>4</span></span><span>, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>-1</span></span><span>)))</span><span><br></span><span><br></span><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>sqnl_grad</span></span><span style="line-height: 
26px"><span>(x)</span></span><span>:</span></span><span><br></span><span> </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> np.where(x > </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>2</span></span><span>, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0</span></span><span>,</span><span><br></span><span> np.where(x >= </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0</span></span><span>, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1</span></span><span> - x/</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>2</span></span><span>,</span><span><br></span><span> np.where(x >= </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>-2</span></span><span>, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1</span></span><span> + x/</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>2</span></span><span>, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0</span></span><span>)))</span><span><br></span><span><br></span><span>plot_activation(sqnl, sqnl_grad, </span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>'SQNL'</span></span><span>)</span><span><br></span></code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Plot</span></strong></p>
<span><img src="https://img2024.cnblogs.com/blog/609124/202509/609124-20250928123545071-988419985.png" alt="Image" class="rich_pages wxw-img js_img_placeholder wx_img_placeholder" style="display: block; margin: 0 auto; max-width: 100%; border: 3px none rgba(0, 0, 0, 0.4); border-radius: 8px; object-fit: fill; box-shadow: 0 0 rgba(0, 0, 0, 0); width: 677px !important; height: 446.363px !important" data-ratio="0.6593245227606461" data-type="png" data-w="681" data-imgfileid="100003043" data-src="https://mmbiz.qpic.cn/mmbiz_png/J8ufl5q3zgOJgw7WsL0LtID9Ov9wldTEOAWBMguxYz3rxQuE7Ir2rmlrakSQU83mvvqm1R5IoNSl0rKCFMGZqg/640?wx_fmt=png&from=appmsg&watermark=1#imgIndex=16" data-original-style="display: block;margin: 0px auto;max-width: 100%;border-style: none;border-width: 3px;border-color: rgba(0, 0, 0, 0.4);border-radius: 8px;object-fit: fill;box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;height: auto !important;" data-index="17"></span>
<h3 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-radius: 0; box-shadow: none; display: flex; flex-direction: unset; float: unset; height: auto; justify-content: center; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 17px; color: rgba(89, 89, 89, 1); border-bottom: 2px solid rgba(222, 198, 251, 1); line-height: 1.5em; letter-spacing: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-top: 1px none rgba(0, 0, 0, 1); border-left: 1px none rgba(0, 0, 0, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; box-shadow: none; display: inline; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; padding: 0; margin: 0"><span>Bent Identity</span></span></h3>
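<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Bent Identity is a smooth, differentiable, monotonically increasing activation function that is unbounded in both directions. Its output stays close to the input value but with a slight nonlinear bend (hence "bent").</span></p>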
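<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>It suits regression tasks and the hidden layers of autoencoders, especially when the input structure should be largely preserved while a mild nonlinear transformation is added. Its derivative always lies between 0.5 and 1.5, which avoids vanishing gradients and makes it a fit for shallow networks or training runs that need stable gradients.</span></p>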
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>However, because it involves a square root, it is slower to compute and is rarely used in large-scale deep networks.</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>See also: https://www.gabormelli.com/RKB/Bent_Identity_Activation_Function</span></p>
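<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>A quick numerical sketch (NumPy only; the helper names are our own) of the derivative bound: because |x| / sqrt(x² + 1) is always below 1, the derivative 1 + x / (2·sqrt(x² + 1)) stays strictly between 0.5 and 1.5, so Bent Identity is strictly increasing and never saturates.</span></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px">import numpy as np

def bent_identity_np(x):
    """Bent Identity: (sqrt(x^2 + 1) - 1) / 2 + x."""
    x = np.asarray(x, dtype=float)
    return (np.sqrt(x * x + 1.0) - 1.0) / 2.0 + x

def bent_identity_grad_np(x):
    """Derivative: x / (2 * sqrt(x^2 + 1)) + 1, strictly inside (0.5, 1.5)."""
    x = np.asarray(x, dtype=float)
    return x / (2.0 * np.sqrt(x * x + 1.0)) + 1.0

xs = np.linspace(-50.0, 50.0, 10001)
g = bent_identity_grad_np(xs)
print(g.min(), g.max())                           # both inside (0.5, 1.5)
print(np.all(np.diff(bent_identity_np(xs)) > 0))  # strictly increasing on the grid
print(bent_identity_np(0.0))                      # passes through the origin
</code></pre>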
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Formula</span></strong></p>
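<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Matching the bent_identity and bent_identity_grad functions in the implementation below, the function and its derivative are:</span></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px">f(x) = \frac{\sqrt{x^2 + 1} - 1}{2} + x,
\qquad
f'(x) = \frac{x}{2\sqrt{x^2 + 1}} + 1
</code></pre>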
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Implementation</span></strong></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px"><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>bent_identity</span></span><span style="line-height: 26px"><span>(x)</span></span><span>:</span></span><span><br></span><span> </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> (np.sqrt(x**</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>2</span></span><span> + </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1</span></span><span>) - </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1</span></span><span>) / </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>2</span></span><span> + x</span><span><br></span><span><br></span><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>bent_identity_grad</span></span><span style="line-height: 26px"><span>(x)</span></span><span>:</span></span><span><br></span><span> </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> x / (</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>2</span></span><span> * np.sqrt(x**</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>2</span></span><span> + </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1</span></span><span>)) + </span><span style="color: rgba(209, 154, 102, 1); line-height: 
26px"><span>1</span></span><span><br></span><span><br></span><span>plot_activation(bent_identity, bent_identity_grad, </span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>'Bent Identity'</span></span><span>)</span><span><br></span></code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Plot</span></strong></p>
<span><img src="https://img2024.cnblogs.com/blog/609124/202509/609124-20250928123545071-988419985.png" alt="Image" class="rich_pages wxw-img js_img_placeholder wx_img_placeholder" style="display: block; margin: 0 auto; max-width: 100%; border: 3px none rgba(0, 0, 0, 0.4); border-radius: 8px; object-fit: fill; box-shadow: 0 0 rgba(0, 0, 0, 0); width: 676px !important; height: 449px !important" data-ratio="0.6642011834319527" data-type="png" data-w="676" data-imgfileid="100003044" data-src="https://mmbiz.qpic.cn/mmbiz_png/J8ufl5q3zgOJgw7WsL0LtID9Ov9wldTE5cXZhMB6HLbicbPvKMrKvGQDQGMhNAV8BE9Hrst8eXSLLWCibczZZL6g/640?wx_fmt=png&from=appmsg&watermark=1#imgIndex=17" data-original-style="display: block;margin: 0px auto;max-width: 100%;border-style: none;border-width: 3px;border-color: rgba(0, 0, 0, 0.4);border-radius: 8px;object-fit: fill;box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;height: auto !important;" data-index="18"></span>
<h2 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box unset; border-radius: 0; box-shadow: none; display: block; flex-direction: unset; float: unset; height: auto; justify-content: unset; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 18px; color: rgba(89, 89, 89, 1); line-height: 1.8em; letter-spacing: 0; padding: 0 0 0 10px; border-top: 1px none rgba(0, 0, 0, 1); border-bottom: 1px none rgba(0, 0, 0, 1); border-left: 5px solid rgba(222, 198, 251, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box unset; box-shadow: none; display: block; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; margin: 0"><span>Gated and Combined Activation Functions</span></span></h2>
<h3 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-radius: 0; box-shadow: none; display: flex; flex-direction: unset; float: unset; height: auto; justify-content: center; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 17px; color: rgba(89, 89, 89, 1); border-bottom: 2px solid rgba(222, 198, 251, 1); line-height: 1.5em; letter-spacing: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-top: 1px none rgba(0, 0, 0, 1); border-left: 1px none rgba(0, 0, 0, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; box-shadow: none; display: inline; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; padding: 0; margin: 0"><span>GLU (Gated Linear Unit)</span></span></h3>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>GLU is a gated activation function: one part of the input acts as a "gate" that modulates the output of the other part, which increases the model's expressive power.</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>GLU is widely used in Transformer variants (e.g. the feed-forward variants studied in "GLU Variants Improve Transformer"), in sequence models (e.g. CNN-based NLP models) and in speech tasks.</span></p>
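<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Compared with conventional activation functions, GLU controls the flow of information more flexibly, which improves modeling capacity. Common variants include SwiGLU and ReGLU; SwiGLU in particular has performed well in large models such as the Llama series.</span></p>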
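<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>A minimal NumPy sketch of the gating idea, assuming the same convention as PyTorch's torch.nn.functional.glu: split the last axis into halves a (value) and b (gate) and return a ⊗ gate(b). In a Transformer feed-forward block the two halves would come from learned projections, which are omitted here. Swapping the gate nonlinearity gives ReGLU (ReLU gate) and SwiGLU (Swish/SiLU gate); the helper names gated, glu, reglu and swiglu are our own.</span></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px">import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated(x, gate_fn):
    """Split the last axis in half: value a, gate b; return a * gate_fn(b)."""
    a, b = np.split(x, 2, axis=-1)
    return a * gate_fn(b)

def glu(x):
    return gated(x, sigmoid)                          # a * sigmoid(b)

def reglu(x):
    return gated(x, lambda b: np.maximum(b, 0.0))     # a * ReLU(b)

def swiglu(x):
    return gated(x, lambda b: b * sigmoid(b))         # a * Swish(b), beta = 1

x = np.array([[1.0, -2.0, 0.0, 3.0]])   # a = [1, -2], b = [0, 3]
print(glu(x).shape)                      # the feature dimension is halved
print(glu(x))
print(reglu(x))
print(swiglu(x))
</code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Note that every GLU-style layer halves the feature dimension it gates, so the preceding projection must produce twice the desired width.</span></p>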
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Formula</span></strong></p>
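<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>In the general form of Dauphin et al. (2017), W, V, b and c are learned parameters, σ is the sigmoid and ⊗ is element-wise multiplication. The second line is the two-input special case a·σ(b) that the visualization code below plots, together with its partial derivatives.</span></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px">\mathrm{GLU}(x) = (xW + b) \otimes \sigma(xV + c)

\mathrm{GLU}(a, b) = a\,\sigma(b),
\qquad
\frac{\partial\,\mathrm{GLU}}{\partial a} = \sigma(b),
\qquad
\frac{\partial\,\mathrm{GLU}}{\partial b} = a\,\sigma(b)\bigl(1 - \sigma(b)\bigr)
</code></pre>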
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Implementation</span></strong></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px">import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older Matplotlib


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def glu_2d(x):
    """GLU for a two-component input: x[..., 0] is the value a, x[..., 1] is the gate b."""
    a, b = x[..., 0], x[..., 1]  # split the two input components
    return a * sigmoid(b)

def plot_glu_2d():
    # Build a 2-D input grid
    x = np.linspace(-4, 4, 50)
    y = np.linspace(-4, 4, 50)
    X, Y = np.meshgrid(x, y)
    xy = np.stack([X, Y], axis=-1)  # input of shape (50, 50, 2)

    # GLU output
    Z = glu_2d(xy)

    # 3-D visualization
    fig = plt.figure(figsize=(12, 6))

    # 1. Activation surface
    ax1 = fig.add_subplot(121, projection='3d')
    ax1.plot_surface(X, Y, Z, cmap='viridis', alpha=0.8)
    ax1.set_title('GLU: a * σ(b)')
    ax1.set_xlabel('Input a')
    ax1.set_ylabel('Input b')
    ax1.set_zlabel('Output')

    # 2. Gradient slices at b = 0, as functions of a
    ax2 = fig.add_subplot(122)
    s0 = sigmoid(0.0)                # σ(0) = 0.5
    grad_a = np.full_like(x, s0)     # ∂(aσ(b))/∂a = σ(b), constant in a
    grad_b = x * s0 * (1 - s0)       # ∂(aσ(b))/∂b = a·σ(b)(1 - σ(b)) = 0.25a
    ax2.plot(x, grad_a, label='∂GLU/∂a at b=0', color='blue')
    ax2.plot(x, grad_b, label='∂GLU/∂b at b=0', color='red')
    ax2.set_title('Gradient Slices at b=0')
    ax2.set_xlabel('Input a')
    ax2.legend()
    ax2.grid(True)

    plt.tight_layout()
    plt.show()

# Run the visualization
plot_glu_2d()
</code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Plot</span></strong></p>
<span><img src="https://img2024.cnblogs.com/blog/609124/202509/609124-20250928123545071-988419985.png" alt="Image" class="rich_pages wxw-img js_img_placeholder wx_img_placeholder" style="display: block; margin: 0 auto; max-width: 100%; border: 3px none rgba(0, 0, 0, 0.4); border-radius: 8px; object-fit: fill; box-shadow: 0 0 rgba(0, 0, 0, 0); width: 677px !important; height: 341.007px !important" data-ratio="0.5037037037037037" data-type="png" data-w="1080" data-imgfileid="100003045" data-src="https://mmbiz.qpic.cn/mmbiz_png/J8ufl5q3zgOJgw7WsL0LtID9Ov9wldTE2DOWHgSicnU7qtmwP0OLF1vUqCVjFdMrr0Zl0QOgAerH6E6eDc9sMOg/640?wx_fmt=png&from=appmsg&watermark=1#imgIndex=18" data-original-style="display: block;margin: 0px auto;max-width: 100%;border-style: none;border-width: 3px;border-color: rgba(0, 0, 0, 0.4);border-radius: 8px;object-fit: fill;box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;height: auto !important;" data-index="19"></span>
<h3 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-radius: 0; box-shadow: none; display: flex; flex-direction: unset; float: unset; height: auto; justify-content: center; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 17px; color: rgba(89, 89, 89, 1); border-bottom: 2px solid rgba(222, 198, 251, 1); line-height: 1.5em; letter-spacing: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-top: 1px none rgba(0, 0, 0, 1); border-left: 1px none rgba(0, 0, 0, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; box-shadow: none; display: inline; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; padding: 0; margin: 0"><span>Maxout</span></span></h3>
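<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Maxout is a piecewise-linear activation defined as the maximum of several linear transformations. Because those transformations are learned, it is a learnable piecewise-linear activation with strong expressive power: in theory, given enough pieces, it can approximate any convex function.</span></p>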
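<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice同编辑器"><span>To illustrate the convex-approximation claim, a minimal sketch with scalar input (helper names maxout_1d and tangent_pieces are our own): use k tangent lines of the convex function x² as the Maxout pieces. Their maximum is a piecewise-linear lower envelope of x², and the worst-case gap on [-2, 2] shrinks as k grows. Here the pieces are chosen by hand, whereas in a real Maxout unit the slopes and offsets are learned.</span></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px">import numpy as np

def maxout_1d(x, w, b):
    """Scalar-input Maxout: max over k affine pieces w[j] * x + b[j]."""
    x = np.asarray(x, dtype=float)[..., None]   # shape (..., 1) to broadcast against k pieces
    return np.max(w * x + b, axis=-1)

def tangent_pieces(k, lo=-2.0, hi=2.0):
    """k tangent lines of the convex target x**2 at evenly spaced points t."""
    t = np.linspace(lo, hi, k)
    return 2.0 * t, -t ** 2                     # tangent at t: y = 2t*x - t^2

xs = np.linspace(-2.0, 2.0, 401)
for k in (2, 4, 8, 16):
    w, b = tangent_pieces(k)
    err = np.max(np.abs(maxout_1d(xs, w, b) - xs ** 2))
    print(k, round(float(err), 4))              # the error shrinks as k grows
</code></pre>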
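<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>It works well in combination with Dropout and was once widely used in fully connected networks. However, each Maxout unit needs k times as many parameters (k sets of W_i, b_i), so its parameter count and compute cost are high, and it is rarely used in modern CNNs or large models. It suits research tasks that demand high expressive power but are not sensitive to compute cost.</span></p>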
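<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>A minimal sketch of a fully connected Maxout layer (the shapes, the Gaussian initialization and the helper names init_maxout and maxout_layer are illustrative assumptions): each of the d_out units keeps k separate affine maps, so the parameter count is exactly k times that of an ordinary linear layer of the same width.</span></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px">import numpy as np

rng = np.random.default_rng(0)

def init_maxout(d_in, d_out, k):
    """k affine maps per output unit: W has shape (k, d_in, d_out), b has shape (k, d_out)."""
    W = rng.normal(0.0, 0.1, size=(k, d_in, d_out))
    b = np.zeros((k, d_out))
    return W, b

def maxout_layer(x, W, b):
    """x: (batch, d_in) -> (batch, d_out), taking the max over the k pieces."""
    z = np.einsum('ni,kio->nko', x, W) + b   # (batch, k, d_out)
    return z.max(axis=1)

d_in, d_out = 256, 128
for k in (1, 2, 4):                          # k = 1 is just a plain linear layer
    W, b = init_maxout(d_in, d_out, k)
    print(k, W.size + b.size)                # grows linearly with k

x = rng.normal(size=(8, d_in))
print(maxout_layer(x, W, b).shape)
</code></pre>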
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Formula</span></strong></p>
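<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Following the notation of Goodfellow et al. (2013), hidden unit i takes the maximum over k affine pieces. The second line is the scalar k = 2 case used in the implementation below, with ties assigned to the first piece as in maxout_grad.</span></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px">h_i(x) = \max_{j \in \{1, \dots, k\}} z_{ij},
\qquad
z_{ij} = x^\top W_{\cdot ij} + b_{ij}

f(x) = \max(w_1 x + b_1,\; w_2 x + b_2),
\qquad
f'(x) = \begin{cases}
w_1, &amp; w_1 x + b_1 \ge w_2 x + b_2 \\
w_2, &amp; \text{otherwise}
\end{cases}
</code></pre>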
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Implementation</span></strong></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px"><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>maxout</span></span><span style="line-height: 26px"><span>(x, w1=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1.0</span></span><span>, w2=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>-1.0</span></span><span>, b1=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0.0</span></span><span>, b2=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0.0</span></span><span>)</span></span><span>:</span></span><span><br></span><span> </span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>"""</span><span><br></span><span> Simplified Maxout (k=2) for visualization:</span><span><br></span><span> f(x) = max(w1*x + b1, w2*x + b2)</span><span><br></span><span> </span><span><br></span><span> Example setting: w1=1, w2=-1 → f(x) = max(x, -x) = |x| (absolute value)</span><span><br></span><span> """</span></span><span><br></span><span> </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> np.maximum(w1 * x + b1, w2 * x + b2)</span><span><br></span><span><br></span><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>maxout_grad</span></span><span style="line-height: 26px"><span>(x, w1=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1.0</span></span><span>, w2=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>-1.0</span></span><span>, b1=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0.0</span></span><span>, b2=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0.0</span></span><span>)</span></span><span>:</span></span><span><br></span><span> </span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>"""</span><span><br></span><span> Maxout gradient w.r.t. x: the weight of whichever linear piece is active (ties go to w1)</span><span><br></span><span> """</span></span><span><br></span><span> linear1 = w1 * x + b1</span><span><br></span><span> linear2 = w2 * x + b2</span><span><br></span><span> </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> np.where(linear1 >= linear2, w1, w2)</span><span><br></span><span><br></span><span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"><span># Visualize: f(x) = max(x, -x) = |x|</span></span><span><br></span><span>plot_activation(</span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>lambda</span></span><span> x: maxout(x, w1=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1.0</span></span><span>, w2=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>-1.0</span></span><span>),</span><span><br></span><span> </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>lambda</span></span><span> x: maxout_grad(x, w1=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1.0</span></span><span>, w2=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>-1.0</span></span><span>),</span><span><br></span><span> </span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>'Maxout (k=2, |x|)'</span></span><span>)</span><span><br></span></code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Plot</span></strong></p>
<span><img src="https://img2024.cnblogs.com/blog/609124/202509/609124-20250928123545071-988419985.png" alt="Image" class="rich_pages wxw-img js_img_placeholder wx_img_placeholder" style="display: block; margin: 0 auto; max-width: 100%; border: 3px none rgba(0, 0, 0, 0.4); border-radius: 8px; object-fit: fill; box-shadow: 0 0 rgba(0, 0, 0, 0); width: 665px !important; height: 449px !important" data-ratio="0.675187969924812" data-type="png" data-w="665" data-imgfileid="100003042" data-src="https://mmbiz.qpic.cn/mmbiz_png/J8ufl5q3zgOJgw7WsL0LtID9Ov9wldTEby7Jm9qUOCYdia6JSvZmE9Qicxxeq0JvYuvzv8CX6XmtnPtbqINP7dwQ/640?wx_fmt=png&from=appmsg&watermark=1#imgIndex=19" data-original-style="display: block;margin: 0px auto;max-width: 100%;border-style: none;border-width: 3px;border-color: rgba(0, 0, 0, 0.4);border-radius: 8px;object-fit: fill;box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;height: auto !important;" data-index="20"></span>
<h3 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-radius: 0; box-shadow: none; display: flex; flex-direction: unset; float: unset; height: auto; justify-content: center; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 17px; color: rgba(89, 89, 89, 1); border-bottom: 2px solid rgba(222, 198, 251, 1); line-height: 1.5em; letter-spacing: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-top: 1px none rgba(0, 0, 0, 1); border-left: 1px none rgba(0, 0, 0, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; box-shadow: none; display: inline; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; padding: 0; margin: 0"><span>SReLU (S-shaped Rectified Linear Unit)</span></span></h3>
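<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>SReLU is a parameter-adaptive, S-shaped activation function built from three linear segments whose thresholds and slopes are learned from data. The shape of the activation curve therefore adapts during training, combining a linear middle region with outer regions that can approach saturation. It suits fully connected or convolutional networks that need a flexible nonlinear transform, and in some image classification and regression tasks it has been reported to outperform ReLU and ELU.</span></p>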
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>It is designed to mimic the response characteristics of biological neurons, which can increase the expressive power of deep models.</span></p>
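<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>However, it introduces four learnable parameters (the thresholds tl, tr and slopes al, ar, learned either per channel or shared), which increases model complexity and training cost. As a result it is currently much less widely used than ReLU or GELU.</span></p>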
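<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>To see where the extra training cost comes from, the sketch below computes the partial derivatives of SReLU with respect to its four parameters, using the same piecewise definition and default values as the srelu function in this section. It is a minimal NumPy illustration under the assumption tl &lt; tr, not the original paper's implementation, and the helper name srelu_param_grads is hypothetical. Note that each parameter only receives gradient from inputs on its own outer segment; the identity segment in the middle contributes nothing.</span></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px">import numpy as np

def srelu(x, tl=0.0, al=0.01, tr=1.0, ar=0.01):
    # Same piecewise definition and defaults as srelu in this section
    return np.where(x <= tl, tl + al * (x - tl),
                    np.where(x < tr, x, tr + ar * (x - tr)))

def srelu_param_grads(x, tl=0.0, al=0.01, tr=1.0, ar=0.01):
    """Hypothetical helper: partial derivatives of SReLU w.r.t. tl, al, tr, ar.

    Assumes tl < tr. Returns a dict of arrays shaped like x.
    """
    left = x <= tl     # left segment:  f = tl + al * (x - tl)
    right = x >= tr    # right segment: f = tr + ar * (x - tr)
    return {
        "tl": np.where(left, 1.0 - al, 0.0),
        "al": np.where(left, x - tl, 0.0),
        "tr": np.where(right, 1.0 - ar, 0.0),
        "ar": np.where(right, x - tr, 0.0),
    }

x = np.array([-2.0, 0.5, 3.0])   # one sample input per segment
grads = srelu_param_grads(x)

# Central-difference check for tl (valid because no input sits on a breakpoint)
h = 1e-6
fd_tl = (srelu(x, tl=h) - srelu(x, tl=-h)) / (2 * h)
print(grads["tl"], fd_tl)</code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>In a real network these four gradients are accumulated over every input routed through each channel (or through the shared parameters), which is the additional bookkeeping that plain ReLU avoids.</span></p>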
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Formula</span></strong></p>
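<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Written out to match the srelu and srelu_grad functions in the implementation below, with thresholds t_l, t_r and slopes a_l, a_r corresponding to the parameters tl, tr, al, ar, the function and its derivative are:</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>$$f(x) = \begin{cases} t_l + a_l\,(x - t_l), \quad x \le t_l \\ x, \quad t_l < x < t_r \\ t_r + a_r\,(x - t_r), \quad x \ge t_r \end{cases}$$ $$f'(x) = \begin{cases} a_l, \quad x \le t_l \\ 1, \quad t_l < x < t_r \\ a_r, \quad x \ge t_r \end{cases}$$</span></p>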
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Implementation</span></strong></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px"><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>srelu</span></span><span style="line-height: 26px"><span>(x, tl=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0.0</span></span><span>, al=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0.01</span></span><span>, tr=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1.0</span></span><span>, ar=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0.01</span></span><span>)</span></span><span>:</span></span><span><br></span><span> </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> np.where(x <= tl, tl + al * (x - tl),</span><span><br></span><span> np.where(x < tr, x,</span><span><br></span><span> tr + ar * (x - tr)))</span><span><br></span><span><br></span><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>srelu_grad</span></span><span style="line-height: 26px"><span>(x, tl=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0.0</span></span><span>, al=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0.01</span></span><span>, tr=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1.0</span></span><span>, ar=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0.01</span></span><span>)</span></span><span>:</span></span><span><br></span><span> 
</span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> np.where(x <= tl, al,</span><span><br></span><span> np.where(x < tr, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1.0</span></span><span>, ar))</span><span><br></span><span><br></span><span>plot_activation(</span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>lambda</span></span><span> x: srelu(x, tl=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0.0</span></span><span>, al=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0.01</span></span><span>, tr=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1.0</span></span><span>, ar=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0.01</span></span><span>),</span><span><br></span><span> </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>lambda</span></span><span> x: srelu_grad(x, tl=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0.0</span></span><span>, al=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0.01</span></span><span>, tr=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1.0</span></span><span>, ar=</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0.01</span></span><span>),</span><span><br></span><span> </span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>'SReLU'</span></span><span>)</span><span><br></span></code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Plot</span></strong></p>
<span><img src="https://img2024.cnblogs.com/blog/609124/202509/609124-20250928123545071-988419985.png" alt="Image" class="rich_pages wxw-img js_img_placeholder wx_img_placeholder" style="display: block; margin: 0 auto; max-width: 100%; border: 3px none rgba(0, 0, 0, 0.4); border-radius: 8px; object-fit: fill; box-shadow: 0 0 rgba(0, 0, 0, 0); width: 669px !important; height: 449px !important" data-ratio="0.6711509715994021" data-type="png" data-w="669" data-imgfileid="100003046" data-src="https://mmbiz.qpic.cn/mmbiz_png/J8ufl5q3zgOJgw7WsL0LtID9Ov9wldTEM6h2OVpWa0f9swo0c30bicbg6mJWqmjmNqrgEKOxuEDad06CGMwyP1Q/640?wx_fmt=png&from=appmsg&watermark=1#imgIndex=20" data-original-style="display: block;margin: 0px auto;max-width: 100%;border-style: none;border-width: 3px;border-color: rgba(0, 0, 0, 0.4);border-radius: 8px;object-fit: fill;box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;height: auto !important;" data-index="21"></span>
<h3 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-radius: 0; box-shadow: none; display: flex; flex-direction: unset; float: unset; height: auto; justify-content: center; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 17px; color: rgba(89, 89, 89, 1); border-bottom: 2px solid rgba(222, 198, 251, 1); line-height: 1.5em; letter-spacing: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-top: 1px none rgba(0, 0, 0, 1); border-left: 1px none rgba(0, 0, 0, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; box-shadow: none; display: inline; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; padding: 0; margin: 0"><span>CReLU (Concatenated ReLU)</span></span></h3>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>CReLU is an improved ReLU variant inspired by the observation that convolution filters in CNN models tend to come in pairs: filters in the lower layers often have a counterpart with the opposite phase. It was proposed by Wenling Shang et al. in the 2016 paper "Understanding and Improving Convolutional Neural Networks via Concatenated Rectified Linear Units".</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Formula</span></strong></p>
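<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Consistent with the implementation below, CReLU keeps both the positive and the negative phase of the input by concatenating two ReLU responses, where [·, ·] denotes concatenation along the last (feature) axis. The second half has a simple derivative with respect to x:</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>$$\mathrm{CReLU}(x) = \big[\, \mathrm{ReLU}(x),\ \mathrm{ReLU}(-x) \,\big]$$ $$\frac{\partial}{\partial x}\mathrm{ReLU}(-x) = \begin{cases} -1, \quad x < 0 \\ 0, \quad x \ge 0 \end{cases}$$</span></p>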
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Implementation</span></strong></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px">import numpy as np
import matplotlib.pyplot as plt

# relu() and relu_grad() are assumed to be the ones defined in the ReLU section earlier

def crelu(x):
    """
    CReLU: concatenate ReLU(x) and ReLU(-x); the output dimension doubles
    """
    return np.concatenate([relu(x), relu(-x)], axis=-1)

def crelu_grad(x):
    """
    CReLU gradient: returns [d/dx ReLU(x), d/dx ReLU(-x)]

    Note: the derivative of ReLU(-x) with respect to x is
      - if x < 0: ReLU(-x) = -x, so the derivative is -1
      - if x >= 0: ReLU(-x) = 0, so the derivative is 0
    i.e. d/dx ReLU(-x) = -1 if x < 0 else 0
    """
    grad_positive = relu_grad(x)                # gradient of ReLU(x): 1 if x > 0 else 0
    grad_negative = np.where(x < 0, -1.0, 0.0)  # gradient of ReLU(-x): -1 if x < 0 else 0
    return np.concatenate([grad_positive, grad_negative], axis=-1)

def plot_crelu_separate():
    x = np.linspace(-3, 3, 1000)
    y = crelu(x)
    grad = crelu_grad(x)
    n = len(x)  # first n entries come from ReLU(x), the last n from ReLU(-x)

    plt.figure(figsize=(12, 5))

    plt.subplot(1, 2, 1)
    plt.plot(x, y[:n], label='ReLU(x)')
    plt.plot(x, y[n:], label='ReLU(-x)')
    plt.title('CReLU: [ReLU(x), ReLU(-x)]')
    plt.legend()
    plt.grid(True)

    plt.subplot(1, 2, 2)
    plt.plot(x, grad[:n], label="d/dx ReLU(x)", linestyle='--')
    plt.plot(x, grad[n:], label="d/dx ReLU(-x)", linestyle='--')
    plt.title('CReLU Gradient')
    plt.legend()
    plt.grid(True)

    plt.tight_layout()
    plt.show()

plot_crelu_separate()</code></pre>
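<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>The crelu function above concatenates along the last axis, which is enough for 1-D inputs. For convolutional feature maps the two halves are stacked along the channel axis instead, so C channels become 2C. The minimal sketch below assumes an (N, C, H, W) layout (channel axis 1); the helper names crelu_channels and crelu_reconstruct are hypothetical. It also checks a useful property: because ReLU(x) - ReLU(-x) = x, no information about the input is lost.</span></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px">import numpy as np

def crelu_channels(x, axis=1):
    """Hypothetical CReLU for feature maps: stack ReLU(x) and ReLU(-x) along `axis`.

    axis=1 assumes an (N, C, H, W) layout, so C input channels become 2C.
    """
    return np.concatenate([np.maximum(x, 0.0), np.maximum(-x, 0.0)], axis=axis)

def crelu_reconstruct(y, axis=1):
    """Recover x from a CReLU output using ReLU(x) - ReLU(-x) = x."""
    pos, neg = np.split(y, 2, axis=axis)
    return pos - neg

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 4, 4))           # made-up batch: N=2, C=3, H=W=4
y = crelu_channels(x)
print(y.shape)                                  # (2, 6, 4, 4)
print(np.allclose(crelu_reconstruct(y), x))     # True</code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>The doubled channel count is the practical cost of CReLU: the next layer sees twice as many input channels, so its weight count grows accordingly.</span></p>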
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Plot</span></strong></p>
<span><img src="https://img2024.cnblogs.com/blog/609124/202509/609124-20250928123545071-988419985.png" alt="Image" class="rich_pages wxw-img js_img_placeholder wx_img_placeholder" style="display: block; margin: 0 auto; max-width: 100%; border: 3px none rgba(0, 0, 0, 0.4); border-radius: 8px; object-fit: fill; box-shadow: 0 0 rgba(0, 0, 0, 0); width: 677px !important; height: 278.949px !important" data-ratio="0.41203703703703703" data-type="png" data-w="1080" data-imgfileid="100003049" data-src="https://mmbiz.qpic.cn/mmbiz_png/J8ufl5q3zgOJgw7WsL0LtID9Ov9wldTEaN29O6NczIl1YV6pfAcyiateIHjFy1qEKx17dq1aaQDUvYr3VpPGibibw/640?wx_fmt=png&from=appmsg&watermark=1#imgIndex=21" data-original-style="display: block;margin: 0px auto;max-width: 100%;border-style: none;border-width: 3px;border-color: rgba(0, 0, 0, 0.4);border-radius: 8px;object-fit: fill;box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;height: auto !important;" data-index="22"></span>
<h2 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box unset; border-radius: 0; box-shadow: none; display: block; flex-direction: unset; float: unset; height: auto; justify-content: unset; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 18px; color: rgba(89, 89, 89, 1); line-height: 1.8em; letter-spacing: 0; padding: 0 0 0 10px; border-top: 1px none rgba(0, 0, 0, 1); border-bottom: 1px none rgba(0, 0, 0, 1); border-left: 5px solid rgba(222, 198, 251, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box unset; box-shadow: none; display: block; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; margin: 0"><span>Special-Purpose and Research-Oriented Functions</span></span></h2>
<h3 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-radius: 0; box-shadow: none; display: flex; flex-direction: unset; float: unset; height: auto; justify-content: center; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 17px; color: rgba(89, 89, 89, 1); border-bottom: 2px solid rgba(222, 198, 251, 1); line-height: 1.5em; letter-spacing: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-top: 1px none rgba(0, 0, 0, 1); border-left: 1px none rgba(0, 0, 0, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; box-shadow: none; display: inline; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; padding: 0; margin: 0"><span>Softplus</span></span></h3>
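<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Softplus is a smooth approximation of ReLU: its output is always positive and it is continuously differentiable everywhere. For large positive x it approaches x, and for large negative x it approaches 0.</span></p>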
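<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>The sense in which Softplus approximates ReLU can be stated exactly. For x ≥ 0, ln(1 + e^x) - x = ln(1 + e^(-x)); for x &lt; 0 the ReLU term is 0. In both cases the gap is the same expression, largest at x = 0 (where it equals ln 2 ≈ 0.693) and shrinking toward 0 as |x| grows:</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>$$\mathrm{Softplus}(x) - \mathrm{ReLU}(x) = \ln\!\left(1 + e^{-|x|}\right) \in (0,\ \ln 2]$$</span></p>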
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>It suits scenarios that need a smooth, nonlinear, non-saturating (unbounded above) activation, for example:</span></p>
<ul class="list-paddingleft-1" style="list-style-type: circle; margin: 8px 0; padding: 0 0 0 25px; color: rgba(0, 0, 0, 1)">
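<li><span>generating variance parameters in variational autoencoders (VAEs), where the output must be positive;</span></li>
<li><span>the output layer of policy networks in reinforcement learning;</span></li>
<li><span>tasks that need to avoid ReLU's "dying neuron" problem while keeping soft saturation on one side.</span></li>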
</ul>
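<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>The VAE use case amounts to passing an unconstrained network output through Softplus before treating it as a scale. The following is a minimal sketch of that pattern, assuming a hypothetical encoder head that produces raw values raw_mean and raw_scale; the name gaussian_head and the min_std floor are illustrative choices, not part of any particular library.</span></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px">import numpy as np

def softplus(x):
    # np.logaddexp(0, x) = log(exp(0) + exp(x)) = log(1 + exp(x)), computed without overflow
    return np.logaddexp(0.0, x)

def gaussian_head(raw_mean, raw_scale, min_std=1e-4):
    """Hypothetical VAE encoder head: unconstrained outputs -> (mean, std).

    The mean is left unconstrained; Softplus maps the scale to a positive
    value, and the small min_std floor (an assumed value) keeps it away from 0.
    """
    std = softplus(raw_scale) + min_std
    return raw_mean, std

raw_scale = np.array([-50.0, -1.0, 0.0, 3.0, 50.0])   # made-up raw network outputs
mu, std = gaussian_head(np.zeros_like(raw_scale), raw_scale)
print(std)   # every entry is strictly positive</code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Compared with exponentiating the raw output, another common choice, Softplus grows only linearly for large inputs, so a single large raw value does not blow up the variance.</span></p>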
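<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Its main drawbacks are a higher computational cost (it involves an exponential and a logarithm) and possible numerical overflow of exp(x) when x is large, so it needs a numerically stable implementation. For example, torch.nn.Softplus switches to the linear function x once beta * x exceeds its threshold parameter (20 by default).</span></p>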
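<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>The overflow problem and a standard remedy fit in a few lines. This is a minimal NumPy sketch assuming the rewrite log(1 + e^z) = max(z, 0) + log1p(e^(-|z|)); its beta and threshold arguments mirror the documented parameters of torch.nn.Softplus, but the function softplus_stable itself is hypothetical and is not PyTorch's code.</span></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px">import numpy as np

def softplus_stable(x, beta=1.0, threshold=20.0):
    """Hypothetical overflow-free Softplus: (1 / beta) * log(1 + exp(beta * x)).

    Uses log(1 + e^z) = max(z, 0) + log1p(e^(-|z|)), so exp() is only ever
    applied to non-positive numbers. Like torch.nn.Softplus, it returns x
    unchanged once beta * x exceeds `threshold`.
    """
    x = np.asarray(x, dtype=float)
    z = beta * x
    smooth = (np.maximum(z, 0.0) + np.log1p(np.exp(-np.abs(z)))) / beta
    return np.where(z > threshold, x, smooth)

x = np.array([-1000.0, -5.0, 0.0, 5.0, 1000.0])
print(softplus_stable(x))           # all entries finite

with np.errstate(over="ignore"):    # silence the overflow warning to show the result
    print(np.log(1 + np.exp(x)))    # naive form: last entry is inf because exp(1000) overflows</code></pre>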
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Because of its good theoretical properties, it is commonly used in probabilistic modeling and generative models.</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Formula</span></strong></p>
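<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Matching softplus and softplus_grad in the implementation below, the function and its derivative are as follows; differentiating shows why the gradient is exactly the Sigmoid function σ(x):</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>$$\mathrm{Softplus}(x) = \ln\!\left(1 + e^{x}\right)$$ $$\frac{d}{dx}\mathrm{Softplus}(x) = \frac{e^{x}}{1 + e^{x}} = \frac{1}{1 + e^{-x}} = \sigma(x)$$</span></p>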
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Implementation</span></strong></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px"><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>softplus</span></span><span style="line-height: 26px"><span>(x)</span></span><span>:</span></span><span><br></span><span> </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> np.log(</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1</span></span><span> + np.exp(x))</span><span><br></span><span><br></span><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>softplus_grad</span></span><span style="line-height: 26px"><span>(x)</span></span><span>:</span></span><span><br></span><span> </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> sigmoid(x)</span><span><br></span><span><br></span><span>plot_activation(softplus, softplus_grad, </span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>'Softplus'</span></span><span>)</span><span><br></span></code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Plot</span></strong></p>
<span><img src="https://img2024.cnblogs.com/blog/609124/202509/609124-20250928123545071-988419985.png" alt="Image" class="rich_pages wxw-img js_img_placeholder wx_img_placeholder" style="display: block; margin: 0 auto; max-width: 100%; border: 3px none rgba(0, 0, 0, 0.4); border-radius: 8px; object-fit: fill; box-shadow: 0 0 rgba(0, 0, 0, 0); width: 665px !important; height: 449px !important" data-ratio="0.675187969924812" data-type="png" data-w="665" data-imgfileid="100003047" data-src="https://mmbiz.qpic.cn/mmbiz_png/J8ufl5q3zgOJgw7WsL0LtID9Ov9wldTEibNkrcoicYiahnjAvElsYXlrcfnNmJf86YG1Ub1t3GYfP4AhQHg483eZg/640?wx_fmt=png&from=appmsg&watermark=1#imgIndex=22" data-original-style="display: block;margin: 0px auto;max-width: 100%;border-style: none;border-width: 3px;border-color: rgba(0, 0, 0, 0.4);border-radius: 8px;object-fit: fill;box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;height: auto !important;" data-index="23"></span>
<h3 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-radius: 0; box-shadow: none; display: flex; flex-direction: unset; float: unset; height: auto; justify-content: center; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 17px; color: rgba(89, 89, 89, 1); border-bottom: 2px solid rgba(222, 198, 251, 1); line-height: 1.5em; letter-spacing: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-top: 1px none rgba(0, 0, 0, 1); border-left: 1px none rgba(0, 0, 0, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; box-shadow: none; display: inline; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; padding: 0; margin: 0"><span>Softsign</span></span></h3>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Softsign is an alternative to Tanh: its output range is (−1, 1) and it has a smooth S-shaped curve, but it is simpler to compute.</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Main use cases:</span></p>
<ul class="list-paddingleft-1" style="list-style-type: circle; margin: 8px 0; padding: 0 0 0 25px; color: rgba(0, 0, 0, 1)">
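<li><span>Replacing Tanh/Sigmoid: when a smooth, saturating activation is needed (e.g. in RNNs or generative models).</span></li>
<li><span>Mitigating vanishing gradients: for large |x| the gradient decays polynomially, as 1/(1+|x|)^2, rather than exponentially as with Tanh, which can help in deeper networks.</span></li>
<li><span>Low-precision training: no exponentials are involved, which can make it friendlier to quantization.</span></li>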
</ul>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Advantages:</span></p>
<ul class="list-paddingleft-1" style="list-style-type: circle; margin: 8px 0; padding: 0 0 0 25px; color: rgba(0, 0, 0, 1)">
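<li><span>Cheap to compute: only an absolute value, an addition, and a division, with no exponentials, so it is usually cheaper than Tanh (the actual speedup depends on hardware and library).</span></li>
<li><span>Gentle gradient: the maximum gradient is 1 at x = 0, the same as Tanh and four times Sigmoid's 0.25, and for large |x| it decays more slowly than Tanh's, which eases vanishing gradients.</span></li>
<li><span>Bounded output: inputs are naturally squashed into (−1, 1), which prevents numerical blow-up.</span></li>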
</ul>
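<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>The gradient claims above can be checked numerically. This minimal sketch re-defines the two derivatives locally so that it runs on its own, and prints them side by side at a few sample points chosen for illustration.</span></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px">import numpy as np

def softsign_grad(x):
    # d/dx [x / (1 + |x|)] = 1 / (1 + |x|)^2
    return 1.0 / (1.0 + np.abs(x)) ** 2

def tanh_grad(x):
    # d/dx tanh(x) = 1 - tanh(x)^2
    return 1.0 - np.tanh(x) ** 2

for x in [0.0, 1.0, 2.0, 5.0, 10.0]:
    print(f"x={x:5.1f}  softsign'={softsign_grad(x):.6f}  tanh'={tanh_grad(x):.6f}")</code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Both gradients equal 1 at the origin. Near the origin Tanh's gradient is actually the larger one (about 0.42 versus 0.25 at x = 1); the curves cross at roughly |x| ≈ 1.6, and beyond that Softsign's gradient dominates, e.g. about 0.028 versus 0.00018 at x = 5. The "slower decay" advantage is therefore an asymptotic one.</span></p>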
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Disadvantages:</span></p>
<ul class="list-paddingleft-1" style="list-style-type: circle; margin: 8px 0; padding: 0 0 0 25px; color: rgba(0, 0, 0, 1)">
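<li><span>Gradient still vanishes in the saturation regions: as |x| → ∞ the gradient tends to 0, which can slow training.</span></li>
<li><span>Zero-centered but not normalized: Softsign is an odd function, so unlike Sigmoid its output is centered at zero (as with Tanh); the scale of activations is not controlled, however, so it may still benefit from BatchNorm.</span></li>
<li><span>Limited expressiveness: its nonlinearity is generally considered weaker than that of newer activation functions such as Swish.</span></li>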
</ul>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Formula</span></strong></p>
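<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Matching softsign and softsign_grad in the implementation below: for x > 0 the quotient rule gives ((1 + x) - x)/(1 + x)^2, and for x &lt; 0, where the function is x/(1 - x), it gives ((1 - x) + x)/(1 - x)^2. Both cases collapse to a single expression:</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>$$\mathrm{Softsign}(x) = \frac{x}{1 + |x|}$$ $$\frac{d}{dx}\mathrm{Softsign}(x) = \frac{1}{(1 + |x|)^{2}}$$</span></p>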
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Implementation</span></strong></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px"><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>softsign</span></span><span style="line-height: 26px"><span>(x)</span></span><span>:</span></span><span><br></span><span> </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> x / (</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1</span></span><span> + np.abs(x))</span><span><br></span><span><br></span><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>softsign_grad</span></span><span style="line-height: 26px"><span>(x)</span></span><span>:</span></span><span><br></span><span> </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1</span></span><span> / ((</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1</span></span><span> + np.abs(x)) ** </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>2</span></span><span>)</span><span><br></span><span><br></span><span>plot_activation(softsign, softsign_grad, </span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>"Softsign"</span></span><span>)</span><span><br></span></code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Plot</span></strong></p>
<span><img src="https://img2024.cnblogs.com/blog/609124/202509/609124-20250928123545071-988419985.png" alt="Image" class="rich_pages wxw-img js_img_placeholder wx_img_placeholder" style="display: block; margin: 0 auto; max-width: 100%; border: 3px none rgba(0, 0, 0, 0.4); border-radius: 8px; object-fit: fill; box-shadow: 0 0 rgba(0, 0, 0, 0); width: 677px !important; height: 446.363px !important" data-ratio="0.6593245227606461" data-type="png" data-w="681" data-imgfileid="100003048" data-src="https://mmbiz.qpic.cn/mmbiz_png/J8ufl5q3zgOJgw7WsL0LtID9Ov9wldTEYMMpNqpnR0vYNbqEXqyLicx5E9rCARAHicQONn85hiaHWJZZjuPSP2ojw/640?wx_fmt=png&from=appmsg&watermark=1#imgIndex=23" data-original-style="display: block;margin: 0px auto;max-width: 100%;border-style: none;border-width: 3px;border-color: rgba(0, 0, 0, 0.4);border-radius: 8px;object-fit: fill;box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;height: auto !important;" data-index="24"></span>
<h3 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-radius: 0; box-shadow: none; display: flex; flex-direction: unset; float: unset; height: auto; justify-content: center; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 17px; color: rgba(89, 89, 89, 1); border-bottom: 2px solid rgba(222, 198, 251, 1); line-height: 1.5em; letter-spacing: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-top: 1px none rgba(0, 0, 0, 1); border-left: 1px none rgba(0, 0, 0, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; box-shadow: none; display: inline; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; padding: 0; margin: 0"><span>Sine</span></span></h3>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Sine is a periodic, bounded, smoothly oscillating activation function. Unlike traditional activations such as ReLU and Sigmoid, it has infinitely many extrema and zeros, which lets it model periodic or high-frequency signals naturally.</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Main use cases:</span></p>
<ul class="list-paddingleft-1" style="list-style-type: circle; margin: 8px 0; padding: 0 0 0 25px; color: rgba(0, 0, 0, 1)">
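<li><span>Neural implicit representations, such as SIREN (Sinusoidal Representation Networks), which represent continuous signals like images, audio, and 3D shapes;</span></li>
<li><span>Function approximation, especially modeling physical systems with periodic or oscillatory behavior (e.g., wave functions, mechanical vibrations);</span></li>
<li><span>Scenarios that require reconstructing high-frequency detail (e.g., super-resolution, variants of neural radiance fields (NeRF)).</span></li>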
</ul>
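<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>It is not suited to general-purpose deep classification networks, but it performs well in specific scientific-computing and representation-learning tasks.</span></p>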
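<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>To make the SIREN use case concrete, here is a minimal NumPy sketch of a sine layer, y = sin(ω0·(xW + b)). It assumes the initialization scheme described in the SIREN paper (ω0 = 30; first-layer weights drawn from U(-1/n, 1/n), later layers from U(-√(6/n)/ω0, √(6/n)/ω0), where n is the fan-in). The helper names siren_init and sine_layer, the bias initialization, and the layer sizes are illustrative assumptions, not code from this article.</span></p>

```python
import numpy as np

def siren_init(fan_in, fan_out, omega_0=30.0, is_first=False, rng=None):
    """Uniform initialization following the SIREN scheme (assumed, see lead-in)."""
    rng = np.random.default_rng(0) if rng is None else rng
    if is_first:
        bound = 1.0 / fan_in                        # first layer: U(-1/n, 1/n)
    else:
        bound = np.sqrt(6.0 / fan_in) / omega_0     # hidden layers: U(-sqrt(6/n)/w0, sqrt(6/n)/w0)
    W = rng.uniform(-bound, bound, size=(fan_in, fan_out))
    b = rng.uniform(-bound, bound, size=fan_out)    # bias init: a simplifying assumption
    return W, b

def sine_layer(x, W, b, omega_0=30.0):
    """One sine layer: sin(omega_0 * (x @ W + b))."""
    return np.sin(omega_0 * (x @ W + b))

rng = np.random.default_rng(0)
coords = np.linspace(-1.0, 1.0, 256).reshape(-1, 1)   # 1-D input coordinates in [-1, 1]
W1, b1 = siren_init(1, 64, is_first=True, rng=rng)
W2, b2 = siren_init(64, 64, rng=rng)
W_out, b_out = siren_init(64, 1, rng=rng)             # linear output layer (no sine)

h = sine_layer(coords, W1, b1)
h = sine_layer(h, W2, b2)
y = h @ W_out + b_out
print(y.shape)  # (256, 1)
```

<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Dividing the hidden-layer bound by ω0 compensates for the ω0 factor inside the sine, which keeps the pre-activations of deeper layers at a moderate scale.</span></p>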
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Formula</span></strong></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>$$f(x) = \sin(x), \qquad f'(x) = \cos(x)$$</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Implementation</span></strong></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px"><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>sine</span></span><span style="line-height: 26px"><span>(x)</span></span><span>:</span></span><span><br></span><span> </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> np.sin(x)</span><span><br></span><span><br></span><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>sine_grad</span></span><span style="line-height: 26px"><span>(x)</span></span><span>:</span></span><span><br></span><span> </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> np.cos(x)</span><span><br></span><span><br></span><span>plot_activation(sine, sine_grad, </span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>"Sine"</span></span><span>)</span><span><br></span></code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Plot</span></strong></p>
<span><img src="https://img2024.cnblogs.com/blog/609124/202509/609124-20250928123545071-988419985.png" alt="Image" class="rich_pages wxw-img js_img_placeholder wx_img_placeholder" style="display: block; margin: 0 auto; max-width: 100%; border: 3px none rgba(0, 0, 0, 0.4); border-radius: 8px; object-fit: fill; box-shadow: 0 0 rgba(0, 0, 0, 0); width: 677px !important; height: 446.363px !important" data-ratio="0.6593245227606461" data-type="png" data-w="681" data-imgfileid="100003050" data-src="https://mmbiz.qpic.cn/mmbiz_png/J8ufl5q3zgOJgw7WsL0LtID9Ov9wldTE5K3lNMoeUvhkgTD0S74xSviaTicx4X1PDicM5pb9Mic1G86FeXpjPQa1WQ/640?wx_fmt=png&from=appmsg&watermark=1#imgIndex=24" data-original-style="display: block;margin: 0px auto;max-width: 100%;border-style: none;border-width: 3px;border-color: rgba(0, 0, 0, 0.4);border-radius: 8px;object-fit: fill;box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;height: auto !important;" data-index="25"></span>
<h3 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-radius: 0; box-shadow: none; display: flex; flex-direction: unset; float: unset; height: auto; justify-content: center; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 17px; color: rgba(89, 89, 89, 1); border-bottom: 2px solid rgba(222, 198, 251, 1); line-height: 1.5em; letter-spacing: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-top: 1px none rgba(0, 0, 0, 1); border-left: 1px none rgba(0, 0, 0, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; box-shadow: none; display: inline; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; padding: 0; margin: 0"><span>Cosine</span></span></h3>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Cosine is a periodic, bounded, even activation function. Like Sine, its output oscillates within [-1, 1], and it is smooth and infinitely differentiable.</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Although it is not used as a general-purpose activation in standard neural networks, it is useful in the following specific scenarios:</span></p>
<ul class="list-paddingleft-1" style="list-style-type: circle; margin: 8px 0; padding: 0 0 0 25px; color: rgba(0, 0, 0, 1)">
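<li><span>Periodic signal modeling: in function approximation, to represent continuous signals with a fixed period (e.g., audio, electromagnetic waves);</span></li>
<li><span>An alternative or complement to positional encoding: in Transformers or neural implicit fields, used together with Sine to build a richer periodic basis;</span></li>
<li><span>Similarity modeling in contrastive learning: cos(x) is at the heart of cosine similarity (the cosine of the angle between two vectors), and some custom layers may apply cos(x) directly as a nonlinear transform;</span></li>
<li><span>Neural implicit fields: together with Sine, used to build high-frequency basis functions, e.g., as part of the input mapping in Fourier Feature Networks.</span></li>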
</ul>
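<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>The Fourier feature mapping in the last item pairs cos and sin of random projections of the input coordinates, γ(v) = [cos(2πBv), sin(2πBv)]. The NumPy sketch below assumes a Gaussian random matrix B with an illustrative scale σ = 10 and 128 frequencies; fourier_features is a hypothetical helper name, and its output would then be fed to an ordinary MLP.</span></p>

```python
import numpy as np

def fourier_features(v, B):
    """Map coordinates v of shape (N, d) to [cos(2*pi*v@B), sin(2*pi*v@B)], shape (N, 2m)."""
    proj = 2.0 * np.pi * (v @ B)                      # (N, m) random projections
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=-1)

rng = np.random.default_rng(0)
sigma = 10.0                                          # assumed scale of the random frequencies
B = rng.normal(0.0, sigma, size=(2, 128))             # d = 2 input dims, m = 128 frequencies
v = rng.uniform(0.0, 1.0, size=(1000, 2))             # e.g. normalized 2-D pixel coordinates
gamma = fourier_features(v, B)
print(gamma.shape)  # (1000, 256)
```

<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>The scale σ controls how high the sampled frequencies are: larger values let the downstream network fit finer detail, but too large a value tends to produce noisy fits, so it is usually tuned per task.</span></p>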
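<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>As with sin(x), cos(x) as an activation is sensitive to weight initialization, and its global oscillation can destabilize training. It is therefore likewise unsuitable for the hidden layers of general-purpose feedforward networks and is used only in specific architectures or representation-learning tasks.</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Difference from Sine: cos(x) = sin(x + π/2), i.e., cosine is a phase-shifted sine. The two have equivalent modeling capacity, since a neuron's bias can absorb the π/2 shift, but cos(0) = 1 while sin(0) = 0, so cos(x) responds most strongly at zero and suits scenarios that call for a high response centered on the origin.</span></p>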
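<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>The phase-shift relation is easy to check numerically. The sketch below also shows why the two are interchangeable inside a neuron: a sine unit whose bias is increased by π/2 reproduces a cosine unit with the original bias. The weight and bias values are arbitrary examples.</span></p>

```python
import numpy as np

x = np.linspace(-10.0, 10.0, 1001)

# cos(x) = sin(x + pi/2): cosine is a phase-shifted sine
print(np.allclose(np.cos(x), np.sin(x + np.pi / 2)))  # True

# Inside a neuron the shift can be absorbed by the bias
w, b = 1.7, 0.3                                    # arbitrary example weight and bias
cos_unit = np.cos(w * x + b)
sine_unit = np.sin(w * x + (b + np.pi / 2))
print(np.allclose(cos_unit, sine_unit))            # True

# The difference lies in the response at zero
print(np.cos(0.0), np.sin(0.0))                    # 1.0 0.0
```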
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Implementation</span></strong></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px"><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>cosine</span></span><span style="line-height: 26px"><span>(x)</span></span><span>:</span></span><span><br></span><span> </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> np.cos(x)</span><span><br></span><span><br></span><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>cosine_grad</span></span><span style="line-height: 26px"><span>(x)</span></span><span>:</span></span><span><br></span><span> </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> -np.sin(x)</span><span><br></span><span><br></span><span>plot_activation(cosine, cosine_grad, </span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>"Cosine"</span></span><span>)</span><span><br></span></code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Plot</span></strong></p>
<span><img src="https://img2024.cnblogs.com/blog/609124/202509/609124-20250928123545071-988419985.png" alt="Image" class="rich_pages wxw-img js_img_placeholder wx_img_placeholder" style="display: block; margin: 0 auto; max-width: 100%; border: 3px none rgba(0, 0, 0, 0.4); border-radius: 8px; object-fit: fill; box-shadow: 0 0 rgba(0, 0, 0, 0); width: 677px !important; height: 446.363px !important" data-ratio="0.6593245227606461" data-type="png" data-w="681" data-imgfileid="100003053" data-src="https://mmbiz.qpic.cn/mmbiz_png/J8ufl5q3zgOJgw7WsL0LtID9Ov9wldTEdS7xUqEGnbicCMdvgbR6yNFL7h8TiaOCrETL3gAl0oy0hy7zF4iaibhO5g/640?wx_fmt=png&from=appmsg&watermark=1#imgIndex=25" data-original-style="display: block;margin: 0px auto;max-width: 100%;border-style: none;border-width: 3px;border-color: rgba(0, 0, 0, 0.4);border-radius: 8px;object-fit: fill;box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;height: auto !important;" data-index="26"></span>
<h3 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-radius: 0; box-shadow: none; display: flex; flex-direction: unset; float: unset; height: auto; justify-content: center; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 17px; color: rgba(89, 89, 89, 1); border-bottom: 2px solid rgba(222, 198, 251, 1); line-height: 1.5em; letter-spacing: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-top: 1px none rgba(0, 0, 0, 1); border-left: 1px none rgba(0, 0, 0, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; box-shadow: none; display: inline; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; padding: 0; margin: 0"><span>Sinc (normalized or unnormalized)</span></span></h3>
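<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Sinc is an oscillating, decaying activation function: it has infinite support, but its amplitude shrinks as |x| grows. Its properties come from signal processing, where sinc is the impulse response of the ideal low-pass filter and the kernel of ideal interpolation.</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Its defining feature is a main peak centered at 0, flanked on both sides by oscillations whose amplitude gradually decays.</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Although it is rarely used in standard deep learning, it has potential value in the following specific areas:</span></p>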
<ul class="list-paddingleft-1" style="list-style-type: circle; margin: 8px 0; padding: 0 0 0 25px; color: rgba(0, 0, 0, 1)">
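<li><span>Signal and image reconstruction: in neural implicit representations, to model band-limited signals; in theory, sinc interpolation can perfectly reconstruct a signal whose frequency content lies below the Nyquist frequency;</span></li>
<li><span>Interpolation networks: as a prior-guided activation in networks designed for upsampling or super-resolution;</span></li>
<li><span>Physics-informed neural networks (PINNs): when solving differential equations that must satisfy specific frequency-domain constraints, the band-limited spectrum of Sinc (its Fourier transform is a rectangular function) may be an advantage;</span></li>
<li><span>Fourier-related architectures: as part of the input feature mapping, to strengthen the model's sensitivity to periodic and frequency structure.</span></li>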
</ul>
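<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>The band-limited reconstruction in the first item refers to the Whittaker-Shannon interpolation formula, x(t) = Σ_n x[n]·sinc((t - nT)/T), where T is the sampling period and sinc is the normalized form. The sketch below assumes a pure 1.3 Hz tone sampled every 0.1 s (Nyquist frequency 5 Hz); sinc_interpolate is a hypothetical helper built on NumPy's np.sinc. With finitely many samples the reconstruction is only approximate, so the error is measured away from the edges of the sample window.</span></p>

```python
import numpy as np

def sinc_interpolate(samples, T, t):
    """Whittaker-Shannon interpolation: x(t) = sum_n x[n] * sinc((t - n*T) / T).

    np.sinc is the normalized sinc, sin(pi*u) / (pi*u), with np.sinc(0) == 1.
    """
    n = np.arange(len(samples))
    kernel = np.sinc((t[:, None] - n[None, :] * T) / T)   # shape (len(t), len(samples))
    return kernel @ samples

T = 0.1                                    # sampling period: 10 Hz, Nyquist frequency 5 Hz
f0 = 1.3                                   # tone frequency, below the Nyquist frequency
n = np.arange(400)
samples = np.sin(2 * np.pi * f0 * n * T)   # 400 samples covering t in [0, 39.9]

t = np.linspace(10.0, 30.0, 801)           # evaluate away from the window edges
recon = sinc_interpolate(samples, T, t)
max_err = np.max(np.abs(recon - np.sin(2 * np.pi * f0 * t)))
print(max_err)                             # small, limited by the finite sample window
```

<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Near the window edges the error grows, because the samples that are missing outside the window still matter: the sinc tails decay only like 1/|t|.</span></p>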
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>There are also some caveats:</span></p>
<ul class="list-paddingleft-1" style="list-style-type: circle; margin: 8px 0; padding: 0 0 0 25px; color: rgba(0, 0, 0, 1)">
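<li><span>The expression sin(πx)/(πx) evaluates to 0/0 at x = 0, so that point needs special handling in code (the function itself is smooth there: sinc(0) = 1 and its derivative is 0); its many zeros and oscillations can also make gradients unstable;</span></li>
<li><span>It is relatively expensive to compute (it involves a sine and a division), and its gradient approaches zero as |x| grows (decaying roughly like 1/|x|), which can make training difficult;</span></li>
<li><span>It is still a research-stage activation function and is not widely used in mainstream models.</span></li>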
</ul>
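<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Because the Sinc derivative is derived by hand and needs special handling at x = 0, a finite-difference gradient check is a cheap safeguard. The sketch below compares an analytic derivative of the normalized form against central differences of np.sinc; the helper names normalized_sinc_grad and grad_check are illustrative. Dropping the chain-rule factor π (differentiating with respect to πx instead of x) is an easy mistake, and the check exposes it immediately.</span></p>

```python
import numpy as np

def normalized_sinc_grad(x):
    """d/dx [sin(pi x) / (pi x)], using the limit value 0 at x = 0."""
    x = np.asarray(x, dtype=float)
    small = np.abs(x) < 1e-7
    safe_x = np.where(small, 1.0, x)                       # placeholder avoids 0/0 at x = 0
    u = np.pi * safe_x
    grad = np.pi * (u * np.cos(u) - np.sin(u)) / u ** 2    # chain rule: du/dx = pi
    return np.where(small, 0.0, grad)

def grad_check(f, f_grad, x, h=1e-5):
    """Max abs difference between f_grad and the central difference (f(x+h) - f(x-h)) / 2h."""
    numeric = (f(x + h) - f(x - h)) / (2.0 * h)
    return np.max(np.abs(f_grad(x) - numeric))

x = np.linspace(-5.0, 5.0, 1001)                           # includes points at and near x = 0
err_ok = grad_check(np.sinc, normalized_sinc_grad, x)
err_missing_pi = grad_check(np.sinc, lambda t: normalized_sinc_grad(t) / np.pi, x)
print(err_ok, err_missing_pi)   # the first is tiny, the second is of order 1
```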
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Overall, Sinc has attractive theoretical properties but is hard to train. It suits scientific-computing tasks that demand high signal fidelity, not general-purpose deep networks.</span></p>
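<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Formula</span></strong></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Normalized form:</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>$$f(x) = \frac{\sin(\pi x)}{\pi x}\ (x \neq 0), \qquad f(0) = 1$$</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>$$f'(x) = \frac{\pi x \cos(\pi x) - \sin(\pi x)}{\pi x^{2}}\ (x \neq 0), \qquad f'(0) = 0$$</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Unnormalized form:</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>$$f(x) = \frac{\sin x}{x}\ (x \neq 0), \qquad f(0) = 1$$</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>$$f'(x) = \frac{x \cos x - \sin x}{x^{2}}\ (x \neq 0), \qquad f'(0) = 0$$</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>The unnormalized form is common in mathematics and physics, while signal processing (especially digital signal processing) usually uses the normalized form. The normalized form is zero at every nonzero integer; the unnormalized form is zero at every nonzero multiple of π.</span></p>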
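<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>The two forms differ only by a rescaling of the input: sin(x)/x equals the normalized sinc evaluated at x/π. NumPy's built-in np.sinc computes the normalized form (with np.sinc(0) == 1), so a minimal sketch of both forms needs one call each; the function names below are illustrative.</span></p>

```python
import numpy as np

def sinc_normalized(x):
    # np.sinc(x) = sin(pi*x) / (pi*x), defined as 1 at x = 0
    return np.sinc(x)

def sinc_unnormalized(x):
    # sin(x)/x = sinc_normalized(x / pi), so rescale the input before calling np.sinc
    return np.sinc(np.asarray(x, dtype=float) / np.pi)

x = np.array([0.0, 1.0, 2.0, np.pi, 2 * np.pi])
print(np.round(sinc_normalized(x), 6))    # zeros at the nonzero integers 1 and 2
print(np.round(sinc_unnormalized(x), 6))  # zeros at pi and 2*pi
```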
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Implementation (normalized form)</span></strong></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px"><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>sinc</span></span><span style="line-height: 26px"><span>(x)</span></span><span>:</span></span><span><br></span><span>    </span><span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"><span># sin(πx)/(πx) is 0/0 at x = 0; its limit there is 1 (np.sinc computes the same normalized form)</span></span><span><br></span><span>    small = np.abs(x) &lt; </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1e-7</span></span><span><br></span><span>    pi_x = np.pi * np.where(small, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1.0</span></span><span>, x)  </span><span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"><span># placeholder near x = 0 avoids 0/0 warnings</span></span><span><br></span><span>    </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> np.where(small, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1.0</span></span><span>, np.sin(pi_x) / pi_x)</span><span><br></span><span><br></span><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>sinc_grad</span></span><span style="line-height: 26px"><span>(x)</span></span><span>:</span></span><span><br></span><span>    </span><span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"><span># sinc(x) = sin(u) / u with u = πx</span></span><span><br></span><span>    </span><span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"><span># Quotient rule: d/du [sin(u)/u] = (u*cos(u) - sin(u)) / u^2</span></span><span><br></span><span>    </span><span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"><span># Chain rule: du/dx = π, so the result must be multiplied by π</span></span><span><br></span><span>    small = np.abs(x) &lt; </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1e-7</span></span><span><br></span><span>    pi_x = np.pi * np.where(small, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1.0</span></span><span>, x)  </span><span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"><span># placeholder near x = 0 avoids 0/0 warnings</span></span><span><br></span><span>    sin_pi_x = np.sin(pi_x)</span><span><br></span><span>    cos_pi_x = np.cos(pi_x)</span><span><br></span><span><br></span><span>    </span><span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"><span># Derivative for x != 0 (including the chain-rule factor π)</span></span><span><br></span><span>    grad = np.pi * (pi_x * cos_pi_x - sin_pi_x) / (pi_x ** </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>2</span></span><span>)</span><span><br></span><span><br></span><span>    </span><span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"><span># sinc is smooth and even, so its derivative at x = 0 is 0</span></span><span><br></span><span>    grad = np.where(small, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0.0</span></span><span>, grad)</span><span><br></span><span><br></span><span>    </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> grad</span><span><br></span><span><br></span><span>plot_activation(sinc, sinc_grad, </span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>"Sinc"</span></span><span>)</span><span><br></span></code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Plot (normalized form)</span></strong></p>
<span><img src="https://img2024.cnblogs.com/blog/609124/202509/609124-20250928123545071-988419985.png" alt="Image" class="rich_pages wxw-img js_img_placeholder wx_img_placeholder" style="display: block; margin: 0 auto; max-width: 100%; border: 3px none rgba(0, 0, 0, 0.4); border-radius: 8px; object-fit: fill; box-shadow: 0 0 rgba(0, 0, 0, 0); width: 673px !important; height: 449px !important" data-ratio="0.6671619613670133" data-type="png" data-w="673" data-imgfileid="100003051" data-src="https://mmbiz.qpic.cn/mmbiz_png/J8ufl5q3zgOJgw7WsL0LtID9Ov9wldTEiaAkZehRtE6hRb9ajpjJLWz4U7zM7ELaPnNFoxcVObL5iaVrUvgzya8A/640?wx_fmt=png&from=appmsg&watermark=1#imgIndex=26" data-original-style="display: block;margin: 0px auto;max-width: 100%;border-style: none;border-width: 3px;border-color: rgba(0, 0, 0, 0.4);border-radius: 8px;object-fit: fill;box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;height: auto !important;" data-index="27"></span>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Implementation (unnormalized form)</span></strong></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px"><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>sinc_unscaled</span></span><span style="line-height: 26px"><span>(x)</span></span><span>:</span></span><span><br></span><span>    </span><span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"><span># sin(x)/x is 0/0 at x = 0; its limit there is 1</span></span><span><br></span><span>    small = np.abs(x) &lt; </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1e-7</span></span><span><br></span><span>    safe_x = np.where(small, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1.0</span></span><span>, x)  </span><span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"><span># placeholder near x = 0 avoids 0/0 warnings</span></span><span><br></span><span>    </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> np.where(small, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1.0</span></span><span>, np.sin(safe_x) / safe_x)</span><span><br></span><span><br></span><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>sinc_unscaled_grad</span></span><span style="line-height: 26px"><span>(x)</span></span><span>:</span></span><span><br></span><span>    </span><span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"><span># Quotient rule: (sin(x)/x)' = (x*cos(x) - sin(x)) / x^2</span></span><span><br></span><span>    small = np.abs(x) &lt; </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1e-7</span></span><span><br></span><span>    safe_x = np.where(small, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1.0</span></span><span>, x)  </span><span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"><span># placeholder near x = 0 avoids 0/0 warnings</span></span><span><br></span><span>    sin_x = np.sin(safe_x)</span><span><br></span><span>    cos_x = np.cos(safe_x)</span><span><br></span><span><br></span><span>    </span><span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"><span># Limit at x = 0: the derivative is 0</span></span><span><br></span><span>    grad = np.where(small, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>0.0</span></span><span>, (safe_x * cos_x - sin_x) / (safe_x ** </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>2</span></span><span>))</span><span><br></span><span><br></span><span>    </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> grad</span><span><br></span><span><br></span><span>plot_activation(sinc_unscaled, sinc_unscaled_grad, </span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>"Sinc_unscaled"</span></span><span>)</span><span><br></span></code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Plot (unnormalized form)</span></strong></p>
<span><img src="https://img2024.cnblogs.com/blog/609124/202509/609124-20250928123545071-988419985.png" alt="Image" class="rich_pages wxw-img js_img_placeholder wx_img_placeholder" style="display: block; margin: 0 auto; max-width: 100%; border: 3px none rgba(0, 0, 0, 0.4); border-radius: 8px; object-fit: fill; box-shadow: 0 0 rgba(0, 0, 0, 0); width: 673px !important; height: 449px !important" data-ratio="0.6671619613670133" data-type="png" data-w="673" data-imgfileid="100003055" data-src="https://mmbiz.qpic.cn/mmbiz_png/J8ufl5q3zgOJgw7WsL0LtID9Ov9wldTEggh9zW0juXz2rDwDXqRbC9ymPToJB1JxPgMTNicUOry2X7LJ7jljuiaA/640?wx_fmt=png&from=appmsg&watermark=1#imgIndex=27" data-original-style="display: block;margin: 0px auto;max-width: 100%;border-style: none;border-width: 3px;border-color: rgba(0, 0, 0, 0.4);border-radius: 8px;object-fit: fill;box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;height: auto !important;" data-index="28"></span>
<h3 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-radius: 0; box-shadow: none; display: flex; flex-direction: unset; float: unset; height: auto; justify-content: center; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 17px; color: rgba(89, 89, 89, 1); border-bottom: 2px solid rgba(222, 198, 251, 1); line-height: 1.5em; letter-spacing: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-top: 1px none rgba(0, 0, 0, 1); border-left: 1px none rgba(0, 0, 0, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; box-shadow: none; display: inline; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; padding: 0; margin: 0"><span>ArcTan</span></span></h3>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>ArcTan is a bounded, smooth, monotonically increasing activation function with output range (-π/2, π/2); as it approaches saturation, its gradient tends to zero.</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Its characteristics include:</span></p>
<ul class="list-paddingleft-1" style="list-style-type: circle; margin: 8px 0; padding: 0 0 0 25px; color: rgba(0, 0, 0, 1)">
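<li><span>Its output is automatically confined to a bounded interval, which helps stabilize training;</span></li>
<li><span>It is continuous and differentiable everywhere, with no sharp corners;</span></li>
<li><span>It saturates more slowly than Tanh: its gradient 1/(1 + x²) decays polynomially rather than exponentially, so large-magnitude inputs such as outliers still receive a usable gradient.</span></li>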
</ul>
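<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>The slower saturation can be quantified by comparing derivatives: d/dx arctan(x) = 1/(1 + x²) decays polynomially, while d/dx tanh(x) = 1 - tanh²(x) = sech²(x) decays exponentially. A small comparison sketch (function names are illustrative):</span></p>

```python
import numpy as np

def arctan_grad(x):
    return 1.0 / (1.0 + x ** 2)

def tanh_grad(x):
    # 1 - tanh(x)^2 written as sech(x)^2 = 1 / cosh(x)^2 to avoid cancellation
    return 1.0 / np.cosh(x) ** 2

for x in [0.0, 2.0, 5.0, 10.0]:
    print(f"x = {x:4.1f}   arctan' = {arctan_grad(x):.2e}   tanh' = {tanh_grad(x):.2e}")
```

<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>At x = 5, for example, the ArcTan gradient is 1/26 ≈ 0.038, roughly 200 times the Tanh gradient (about 1.8e-4).</span></p>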
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Use cases:</span></p>
<ul class="list-paddingleft-1" style="list-style-type: circle; margin: 8px 0; padding: 0 0 0 25px; color: rgba(0, 0, 0, 1)">
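<li><span>The output layer of regression tasks, when the output must be bounded but need not lie in [-1, 1] (the range (-π/2, π/2) ≈ (-1.57, 1.57) is wider than Tanh's);</span></li>
<li><span>As a hidden-layer activation in RBF networks or other function-approximation systems, to model smooth nonlinear mappings;</span></li>
<li><span>Policy networks in reinforcement learning that output continuous actions within a limited range;</span></li>
<li><span>Modeling certain physical systems where the output must respond to input changes without blowing up.</span></li>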
</ul>
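<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>For the reinforcement-learning case, one possible way to turn an unbounded network output into an action inside a range [low, high] is to rescale ArcTan: arctan(z)/π + 1/2 lies in (0, 1). The sketch below is a hypothetical output head (arctan_action and the example range are illustrative assumptions), not a standard recipe.</span></p>

```python
import numpy as np

def arctan_action(z, low, high):
    """Map unbounded outputs z to actions in (low, high) (open interval, mathematically).

    arctan(z) lies in (-pi/2, pi/2), so arctan(z) / pi + 0.5 lies in (0, 1).
    """
    unit = np.arctan(z) / np.pi + 0.5
    return low + (high - low) * unit

z = np.array([-1e6, -1.0, 0.0, 1.0, 1e6])   # e.g. raw policy-network outputs
actions = arctan_action(z, low=-2.0, high=2.0)
print(actions)  # approx. [-2, -1, 0, 1, 2]; the extreme inputs land just inside the bounds
```

<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>The gradient of the action with respect to z is (high - low)/(π(1 + z²)), so even large pre-activations keep a polynomially decaying, rather than exponentially vanishing, learning signal.</span></p>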
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Formula</span></strong></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>$$f(x) = \arctan(x), \qquad f'(x) = \frac{1}{1 + x^{2}}$$</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Implementation</span></strong></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px"><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>arctan</span></span><span style="line-height: 26px"><span>(x)</span></span><span>:</span></span><span><br></span><span> </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> np.arctan(x)</span><span><br></span><span><br></span><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>def</span></span><span> </span><span style="color: rgba(97, 174, 238, 1); line-height: 26px"><span>arctan_grad</span></span><span style="line-height: 26px"><span>(x)</span></span><span>:</span></span><span><br></span><span> </span><span style="color: rgba(198, 120, 221, 1); line-height: 26px"><span>return</span></span><span> </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1</span></span><span> / (</span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>1</span></span><span> + np.power(x, </span><span style="color: rgba(209, 154, 102, 1); line-height: 26px"><span>2</span></span><span>))</span><span><br></span><span><br></span><span>plot_activation(arctan, arctan_grad, </span><span style="color: rgba(152, 195, 121, 1); line-height: 26px"><span>"ArcTan"</span></span><span>)</span><span><br></span></code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Plot</span></strong></p>
<span><img src="https://img2024.cnblogs.com/blog/609124/202509/609124-20250928123545071-988419985.png" alt="Image" class="rich_pages wxw-img js_img_placeholder wx_img_placeholder" style="display: block; margin: 0 auto; max-width: 100%; border: 3px none rgba(0, 0, 0, 0.4); border-radius: 8px; object-fit: fill; box-shadow: 0 0 rgba(0, 0, 0, 0); width: 673px !important; height: 449px !important" data-ratio="0.6671619613670133" data-type="png" data-w="673" data-imgfileid="100003052" data-src="https://mmbiz.qpic.cn/mmbiz_png/J8ufl5q3zgOJgw7WsL0LtID9Ov9wldTETl7nBd4O8kjvEZyzz8acuqI1fTDq7gib8jXQCriaWaR1RUpj9G3FXcaQ/640?wx_fmt=png&from=appmsg&watermark=1#imgIndex=28" data-original-style="display: block;margin: 0px auto;max-width: 100%;border-style: none;border-width: 3px;border-color: rgba(0, 0, 0, 0.4);border-radius: 8px;object-fit: fill;box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;height: auto !important;" data-index="29"></span>
<h3 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-radius: 0; box-shadow: none; display: flex; flex-direction: unset; float: unset; height: auto; justify-content: center; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 17px; color: rgba(89, 89, 89, 1); border-bottom: 2px solid rgba(222, 198, 251, 1); line-height: 1.5em; letter-spacing: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-top: 1px none rgba(0, 0, 0, 1); border-left: 1px none rgba(0, 0, 0, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; box-shadow: none; display: inline; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; padding: 0; margin: 0"><span>LogSigmoid</span></span></h3>
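<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>LogSigmoid is the logarithm of Sigmoid. Its main value lies in numerically stable loss computation, for example binary cross-entropy evaluated directly on logits, which makes it an important building block inside deep learning frameworks. Frameworks do expose it (PyTorch, for instance, provides torch.nn.LogSigmoid), but it is rarely used as a hidden-layer activation.</span></p>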
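<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>LogSigmoid's role in stable loss computation can be illustrated with binary cross-entropy on a raw logit z. Using log(1 - σ(z)) = log σ(-z), the whole loss is expressed through LogSigmoid and never evaluates log(0). This is a minimal scalar sketch using only Python's math module; the function names are illustrative and not a framework API.</span></p>

```python
import math


def log_sigmoid(x):
    # Stable log(sigmoid(x)) = min(x, 0) - log1p(exp(-|x|)).
    # exp only receives non-positive arguments, so it never overflows.
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def bce_with_logits(z, y):
    # Binary cross-entropy for logit z and label y in {0, 1}:
    #   -[y * log(sigmoid(z)) + (1 - y) * log(1 - sigmoid(z))]
    # rewritten with log(1 - sigmoid(z)) = log(sigmoid(-z)).
    return -(y * log_sigmoid(z) + (1 - y) * log_sigmoid(-z))


def bce_naive(z, y):
    # Textbook form: squash to a probability first, then take logs.
    p = 1.0 / (1.0 + math.exp(-z))
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


print(bce_with_logits(2.0, 1), bce_naive(2.0, 1))  # agree for moderate logits
print(bce_with_logits(-800.0, 1))  # 800.0
# bce_naive(-800.0, 1) would raise OverflowError inside math.exp(800.0)
```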
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Formula</span></strong></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>f(x) = log σ(x) = log(1 / (1 + e<sup>-x</sup>)) = -log(1 + e<sup>-x</sup>)</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Implementation</span></strong></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px">def</span> <span style="color: rgba(97, 174, 238, 1); line-height: 26px">log_sigmoid</span>(x):
    <span style="color: rgba(152, 195, 121, 1); line-height: 26px">"""
    Equivalent to: f(x) = log(sigmoid(x)) = -softplus(-x)
    Output range: (-inf, 0)
    Note: naive form; np.exp(-x) overflows for large negative x.
    """</span>
    <span style="color: rgba(198, 120, 221, 1); line-height: 26px">return</span> -np.log(1 + np.exp(-x))

<span style="color: rgba(198, 120, 221, 1); line-height: 26px">def</span> <span style="color: rgba(97, 174, 238, 1); line-height: 26px">log_sigmoid_stable</span>(x):
    <span style="color: rgba(152, 195, 121, 1); line-height: 26px">"""
    Numerically stable LogSigmoid. Equivalent to the piecewise form
        x &gt;= 0: -log(1 + exp(-x))
        x &lt; 0:  x - log(1 + exp(x))
    written without branches as min(x, 0) - log1p(exp(-|x|)), so exp never
    receives a positive argument (np.where would evaluate both branches
    and still overflow in the unused one).
    """</span>
    <span style="color: rgba(198, 120, 221, 1); line-height: 26px">return</span> np.minimum(x, 0) - np.log1p(np.exp(-np.abs(x)))

<span style="color: rgba(198, 120, 221, 1); line-height: 26px">def</span> <span style="color: rgba(97, 174, 238, 1); line-height: 26px">log_sigmoid_grad</span>(x):
    <span style="color: rgba(152, 195, 121, 1); line-height: 26px">"""
    Gradient of LogSigmoid: 1 - sigmoid(x) = sigmoid(-x)
    Derivation:
        f(x)  = log(sigmoid(x)) = -log(1 + exp(-x))
        f'(x) = exp(-x) / (1 + exp(-x)) = 1 / (1 + exp(x)) = sigmoid(-x)
    """</span>
    <span style="color: rgba(198, 120, 221, 1); line-height: 26px">return</span> sigmoid(-x)  <span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"># sigmoid(-x) = 1 - sigmoid(x)</span>

plot_activation(log_sigmoid_stable, log_sigmoid_grad, <span style="color: rgba(152, 195, 121, 1); line-height: 26px">'LogSigmoid'</span>)</code></pre>
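<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Because d/dx log σ(x) = σ'(x)/σ(x) = σ(x)(1 - σ(x))/σ(x) = 1 - σ(x), the gradient of LogSigmoid is σ(-x), not σ(x): it tends to 1 for very negative inputs, where LogSigmoid ≈ x, and to 0 for large positive inputs, where LogSigmoid flattens toward 0. The scalar sketch below (standard library only; `numeric_grad` is a hypothetical helper for a central-difference check) confirms this numerically.</span></p>

```python
import math


def sigmoid(x):
    # Stable logistic: exp(min(x, 0)) / (1 + exp(-|x|)), no overflow for any x.
    return math.exp(min(x, 0.0)) / (1.0 + math.exp(-abs(x)))


def log_sigmoid(x):
    # Stable log(sigmoid(x)).
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def numeric_grad(f, x, h=1e-5):
    # Central finite difference, truncation error O(h^2).
    return (f(x + h) - f(x - h)) / (2.0 * h)


for x in (-6.0, -1.0, 0.0, 1.0, 6.0):
    print(f"x={x:+.1f}  numeric={numeric_grad(log_sigmoid, x):.6f}  "
          f"sigmoid(-x)={sigmoid(-x):.6f}  sigmoid(x)={sigmoid(x):.6f}")
```

The numeric column tracks sigmoid(-x) at every point and departs from sigmoid(x) everywhere except x = 0.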
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Plot</span></strong></p>
<span><img src="https://img2024.cnblogs.com/blog/609124/202509/609124-20250928123545071-988419985.png" alt="Image" class="rich_pages wxw-img js_img_placeholder wx_img_placeholder" style="display: block; margin: 0 auto; max-width: 100%; border: 3px none rgba(0, 0, 0, 0.4); border-radius: 8px; object-fit: fill; box-shadow: 0 0 rgba(0, 0, 0, 0); width: 669px !important; height: 449px !important" data-ratio="0.6711509715994021" data-type="png" data-w="669" data-imgfileid="100003054" data-src="https://mmbiz.qpic.cn/mmbiz_png/J8ufl5q3zgOJgw7WsL0LtID9Ov9wldTEVzDrMtpCrbx3PB1mlgM7PDEOgVkhsiaiaReavGxFVia6eic7kNbAAy81LA/640?wx_fmt=png&from=appmsg&watermark=1#imgIndex=29" data-original-style="display: block;margin: 0px auto;max-width: 100%;border-style: none;border-width: 3px;border-color: rgba(0, 0, 0, 0.4);border-radius: 8px;object-fit: fill;box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;height: auto !important;" data-index="30"></span>
<h2 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box unset; border-radius: 0; box-shadow: none; display: block; flex-direction: unset; float: unset; height: auto; justify-content: unset; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 18px; color: rgba(89, 89, 89, 1); line-height: 1.8em; letter-spacing: 0; padding: 0 0 0 10px; border-top: 1px none rgba(0, 0, 0, 1); border-bottom: 1px none rgba(0, 0, 0, 1); border-left: 5px solid rgba(222, 198, 251, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box unset; box-shadow: none; display: block; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; margin: 0"><span>Automated Search and Structural Innovation</span></span></h2>
<h3 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-radius: 0; box-shadow: none; display: flex; flex-direction: unset; float: unset; height: auto; justify-content: center; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 17px; color: rgba(89, 89, 89, 1); border-bottom: 2px solid rgba(222, 198, 251, 1); line-height: 1.5em; letter-spacing: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-top: 1px none rgba(0, 0, 0, 1); border-left: 1px none rgba(0, 0, 0, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; box-shadow: none; display: inline; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; padding: 0; margin: 0"><span>TanhExp (Tanh Exponential Activation)</span></span></h3>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>TanhExp is a self-gated activation function that combines the exponential with the hyperbolic tangent. Like ReLU, it is almost the identity for positive inputs, but it stays smooth and slightly negative for negative inputs, which adds nonlinear expressiveness. It suits vision tasks with demanding performance requirements.</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>TanhExp was proposed by Xinyu Liu et al. in 2020 in the paper "TanhExp: A smooth activation function with high convergence speed for lightweight neural networks".</span></p>
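<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice electric" hidden></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>The ReLU-like behaviour is easy to see by evaluating both functions at a few points: for positive x the factor tanh(e<sup>x</sup>) approaches 1 quickly, so the output approaches x, while for negative x the output is a small negative value that decays back toward 0. This scalar sketch uses only the standard library; capping the exponent at 30 is an assumption of the sketch, added only so math.exp cannot raise OverflowError.</span></p>

```python
import math


def tanhexp(x):
    # f(x) = x * tanh(e^x). The exponent is capped at 30 because tanh(e^x)
    # already equals 1.0 in float64 long before that, and math.exp raises
    # OverflowError for arguments above roughly 709.
    return x * math.tanh(math.exp(min(x, 30.0)))


def relu(x):
    return max(x, 0.0)


for x in (-5.0, -2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 5.0):
    print(f"x={x:+.1f}  relu={relu(x):+.4f}  tanhexp={tanhexp(x):+.4f}")
```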
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Formula</span></strong></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>f(x) = x · tanh(e<sup>x</sup>)</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Implementation</span></strong></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px">def</span> <span style="color: rgba(97, 174, 238, 1); line-height: 26px">tanhexp</span>(x):
    <span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"># f(x) = x * tanh(exp(x)). The argument of exp is capped at 30: tanh(e^x) is</span>
    <span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"># already 1.0 in float64 well below that, so the cap only prevents overflow.</span>
    <span style="color: rgba(198, 120, 221, 1); line-height: 26px">return</span> x * np.tanh(np.exp(np.minimum(x, 30)))

<span style="color: rgba(198, 120, 221, 1); line-height: 26px">def</span> <span style="color: rgba(97, 174, 238, 1); line-height: 26px">tanhexp_grad</span>(x):
    <span style="color: rgba(152, 195, 121, 1); line-height: 26px">"""
    TanhExp gradient (chain rule):
    f(x)  = x * tanh(exp(x))
    f'(x) = tanh(exp(x)) + x * sech^2(exp(x)) * exp(x)
    """</span>
    exp_x = np.exp(np.minimum(x, 30))  <span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"># same cap: avoids x * 0 * inf = nan for large x</span>
    tanh_e = np.tanh(exp_x)
    sech2_e = 1 - tanh_e**2  <span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"># sech^2(x) = 1 - tanh^2(x)</span>
    <span style="color: rgba(198, 120, 221, 1); line-height: 26px">return</span> tanh_e + x * sech2_e * exp_x

plot_activation(tanhexp, tanhexp_grad, <span style="color: rgba(152, 195, 121, 1); line-height: 26px">'TanhExp'</span>)</code></pre>
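<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>The chain-rule derivative f'(x) = tanh(e<sup>x</sup>) + x·e<sup>x</sup>·sech²(e<sup>x</sup>) can be sanity-checked against central finite differences. The scalar sketch below uses the standard math module; the exponent cap at 30 and the helper name `numeric_grad` are assumptions of this sketch.</span></p>

```python
import math


def tanhexp(x):
    return x * math.tanh(math.exp(min(x, 30.0)))


def tanhexp_grad(x):
    # f'(x) = tanh(e^x) + x * e^x * sech^2(e^x), with sech^2 = 1 - tanh^2.
    # For large x the cap gives t = 1.0 and sech^2 = 0.0, so the result is 1.0.
    e = math.exp(min(x, 30.0))
    t = math.tanh(e)
    return t + x * e * (1.0 - t * t)


def numeric_grad(f, x, h=1e-5):
    # Central finite difference.
    return (f(x + h) - f(x - h)) / (2.0 * h)


for x in (-4.0, -1.0, 0.0, 0.5, 2.0):
    print(x, tanhexp_grad(x), numeric_grad(tanhexp, x))
print(tanhexp_grad(1000.0))  # 1.0
```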
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Plot</span></strong></p>
<span><img src="https://img2024.cnblogs.com/blog/609124/202509/609124-20250928123545071-988419985.png" alt="Image" class="rich_pages wxw-img js_img_placeholder wx_img_placeholder" style="display: block; margin: 0 auto; max-width: 100%; border: 3px none rgba(0, 0, 0, 0.4); border-radius: 8px; object-fit: fill; box-shadow: 0 0 rgba(0, 0, 0, 0); width: 665px !important; height: 449px !important" data-ratio="0.675187969924812" data-type="png" data-w="665" data-imgfileid="100003056" data-src="https://mmbiz.qpic.cn/mmbiz_png/J8ufl5q3zgOJgw7WsL0LtID9Ov9wldTEAT7f1PBLB0iblKWtoxpLDZPh8lsQqso4tib2wSIQlKLPFFVr1ibsTYic2A/640?wx_fmt=png&from=appmsg&watermark=1#imgIndex=30" data-original-style="display: block;margin: 0px auto;max-width: 100%;border-style: none;border-width: 3px;border-color: rgba(0, 0, 0, 0.4);border-radius: 8px;object-fit: fill;box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;height: auto !important;" data-index="31"></span>
<h3 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-radius: 0; box-shadow: none; display: flex; flex-direction: unset; float: unset; height: auto; justify-content: center; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 17px; color: rgba(89, 89, 89, 1); border-bottom: 2px solid rgba(222, 198, 251, 1); line-height: 1.5em; letter-spacing: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-top: 1px none rgba(0, 0, 0, 1); border-left: 1px none rgba(0, 0, 0, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; box-shadow: none; display: inline; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; padding: 0; margin: 0"><span>PAU (Power Activation Unit)</span></span></h3>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>PAU is a learnable activation function built from power functions.</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>It is mainly suited to research settings: its computational cost is high and its numerical stability is poor, so it is not recommended for mainstream deep learning models or large-scale networks.</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>In practice, efficient and stable activation functions such as Swish or GELU are the better choice.</span></p>
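<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>"Learnable" means the coefficients a<sub>k</sub> and exponents b<sub>k</sub> are updated by backpropagation, so training needs ∂f/∂a<sub>k</sub> and ∂f/∂b<sub>k</sub> in addition to ∂f/∂x. Assuming the odd extension sign(x)·|x|<sup>b</sup> that this section's implementation uses for negative inputs, ∂f/∂a<sub>k</sub> = sign(x)|x|<sup>b<sub>k</sub></sup> and ∂f/∂b<sub>k</sub> = a<sub>k</sub>·sign(x)|x|<sup>b<sub>k</sub></sup>·ln|x|. The scalar sketch below computes both; the names `signed_power` and `pau_param_grads` are hypothetical.</span></p>

```python
import math


def signed_power(x, b):
    # Odd extension sign(x) * |x|**b: stays real for non-integer b.
    return math.copysign(abs(x) ** b, x)


def pau(x, a, b):
    # f(x) = sum_k a_k * sign(x) * |x|**b_k
    return sum(ak * signed_power(x, bk) for ak, bk in zip(a, b))


def pau_param_grads(x, a, b):
    # df/da_k = sign(x) * |x|**b_k
    # df/db_k = a_k * sign(x) * |x|**b_k * ln|x|
    # (taken as 0 at x = 0, which is the limit when b_k is positive)
    log_abs = math.log(abs(x)) if x != 0 else 0.0
    grad_a = [signed_power(x, bk) for bk in b]
    grad_b = [ak * signed_power(x, bk) * log_abs for ak, bk in zip(a, b)]
    return grad_a, grad_b


a, b = [1.0, 0.1], [1.0, 2.5]
print(pau(-0.8, a, b))
print(pau_param_grads(-0.8, a, b))
```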
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Formula</span></strong></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>f(x) = Σ<sub>k=1</sub><sup>K</sup> a<sub>k</sub> · x<sup>b<sub>k</sub></sup>, where a<sub>k</sub> and b<sub>k</sub> are learnable parameters (the implementation below uses K = 2).</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Implementation</span></strong></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px">def</span> <span style="color: rgba(97, 174, 238, 1); line-height: 26px">pau</span>(x, a1=1.0, a2=0.1, b1=1.0, b2=2.0):
    <span style="color: rgba(152, 195, 121, 1); line-height: 26px">"""
    Simplified PAU with K = 2:
    f(x) = a1 * x^b1 + a2 * x^b2

    Notes:
    - For x &lt; 0 and non-integer b_k, x^b_k is complex.
    - To keep the output real, each power is computed as sign(x) * |x|^b_k
      (an odd extension: for b2 = 2 this gives x * |x| rather than x^2).
    """</span>
    <span style="color: rgba(198, 120, 221, 1); line-height: 26px">def</span> <span style="color: rgba(97, 174, 238, 1); line-height: 26px">safe_power</span>(x, b):
        <span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"># sign(x) * |x|^b is real for any exponent b</span>
        <span style="color: rgba(198, 120, 221, 1); line-height: 26px">return</span> np.sign(x) * np.power(np.abs(x), b)

    term1 = a1 * safe_power(x, b1)
    term2 = a2 * safe_power(x, b2)
    <span style="color: rgba(198, 120, 221, 1); line-height: 26px">return</span> term1 + term2

<span style="color: rgba(198, 120, 221, 1); line-height: 26px">def</span> <span style="color: rgba(97, 174, 238, 1); line-height: 26px">pau_grad</span>(x, a1=1.0, a2=0.1, b1=1.0, b2=2.0):
    <span style="color: rgba(152, 195, 121, 1); line-height: 26px">"""
    Derivative of pau() with respect to x:
    d/dx [a * sign(x) * |x|^b] = a * b * |x|^(b-1)   (x != 0)
    |x| is clamped to 1e-7 so that b &lt; 1 does not divide by zero at x = 0.
    """</span>
    <span style="color: rgba(198, 120, 221, 1); line-height: 26px">def</span> <span style="color: rgba(97, 174, 238, 1); line-height: 26px">safe_grad</span>(x, a, b):
        <span style="color: rgba(198, 120, 221, 1); line-height: 26px">if</span> b == 1:
            <span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"># Linear term: the derivative is the constant a</span>
            <span style="color: rgba(198, 120, 221, 1); line-height: 26px">return</span> np.ones_like(x) * a
        <span style="color: rgba(198, 120, 221, 1); line-height: 26px">return</span> a * b * np.power(np.maximum(np.abs(x), 1e-7), b - 1)

    <span style="color: rgba(198, 120, 221, 1); line-height: 26px">return</span> safe_grad(x, a1, b1) + safe_grad(x, a2, b2)

plot_activation(<span style="color: rgba(198, 120, 221, 1); line-height: 26px">lambda</span> x: pau(x, a1=1.0, a2=0.1, b1=1.0, b2=2.0),
                <span style="color: rgba(198, 120, 221, 1); line-height: 26px">lambda</span> x: pau_grad(x, a1=1.0, a2=0.1, b1=1.0, b2=2.0),
                <span style="color: rgba(152, 195, 121, 1); line-height: 26px">'PAU (Power Activation Unit)'</span>)</code></pre>
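<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>The reason for computing powers as sign(x)·|x|<sup>b</sup> is visible in plain Python: raising a negative float to a non-integer power returns a complex number, while the odd extension stays real. Note that for an integer exponent such as b = 2 the extension gives x·|x| rather than x², so the term becomes odd instead of even. A small scalar sketch (helper names are illustrative) that also checks the derivative b·|x|<sup>b-1</sup> used for the gradient:</span></p>

```python
import math


def signed_power(x, b):
    # sign(x) * |x|**b, the same odd extension as safe_power above
    return math.copysign(abs(x) ** b, x)


def signed_power_grad(x, b):
    # d/dx [sign(x) * |x|**b] = b * |x|**(b - 1) for x != 0
    return b * abs(x) ** (b - 1)


print((-3.0) ** 0.5)            # plain power: a complex number
print(signed_power(-3.0, 0.5))  # real: -sqrt(3)
print((-3.0) ** 2, signed_power(-3.0, 2.0))  # 9.0 -9.0
```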
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Plot</span></strong></p>
<span><img src="https://img2024.cnblogs.com/blog/609124/202509/609124-20250928123545071-988419985.png" alt="Image" class="rich_pages wxw-img js_img_placeholder wx_img_placeholder" style="display: block; margin: 0 auto; max-width: 100%; border: 3px none rgba(0, 0, 0, 0.4); border-radius: 8px; object-fit: fill; box-shadow: 0 0 rgba(0, 0, 0, 0); width: 669px !important; height: 449px !important" data-ratio="0.6711509715994021" data-type="png" data-w="669" data-imgfileid="100003058" data-src="https://mmbiz.qpic.cn/mmbiz_png/J8ufl5q3zgOJgw7WsL0LtID9Ov9wldTEgpibuy7Vs5fYHoTdgRhmxDHhgtk84otqQATOdEnCiawqO1BhHOOLjXBw/640?wx_fmt=png&from=appmsg&watermark=1#imgIndex=31" data-original-style="display: block;margin: 0px auto;max-width: 100%;border-style: none;border-width: 3px;border-color: rgba(0, 0, 0, 0.4);border-radius: 8px;object-fit: fill;box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;height: auto !important;" data-index="32"></span>
<h3 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-radius: 0; box-shadow: none; display: flex; flex-direction: unset; float: unset; height: auto; justify-content: center; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 17px; color: rgba(89, 89, 89, 1); border-bottom: 2px solid rgba(222, 198, 251, 1); line-height: 1.5em; letter-spacing: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-top: 1px none rgba(0, 0, 0, 1); border-left: 1px none rgba(0, 0, 0, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; box-shadow: none; display: inline; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; padding: 0; margin: 0"><span>Learnable Sigmoid</span></span></h3>
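<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Learnable Sigmoid is a learnable extension of the standard Sigmoid: it computes σ(αx + β), where the slope α and the offset β are trained together with the network weights instead of being fixed at α = 1, β = 0.</span></p>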
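<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>In σ(αx + β) the midpoint (output 0.5) lies at x = -β/α and the slope there is α/4, so α sets the steepness and β shifts the curve; for α &gt; 0, a positive β moves it to the left. Since both are learnable, backpropagation also needs ∂f/∂α = x·f(1 - f) and ∂f/∂β = f(1 - f). A scalar sketch using only the standard library (function names are hypothetical):</span></p>

```python
import math


def learnable_sigmoid(x, alpha=1.0, beta=0.0):
    # sigmoid(alpha * x + beta), written in an overflow-free form
    z = alpha * x + beta
    return math.exp(min(z, 0.0)) / (1.0 + math.exp(-abs(z)))


def learnable_sigmoid_param_grads(x, alpha=1.0, beta=0.0):
    # Returns (df/dalpha, df/dbeta) = (x * s * (1 - s), s * (1 - s))
    s = learnable_sigmoid(x, alpha, beta)
    ds = s * (1.0 - s)
    return x * ds, ds


alpha, beta = 2.0, 1.0
midpoint = -beta / alpha
print(midpoint, learnable_sigmoid(midpoint, alpha, beta))  # -0.5 0.5
print(learnable_sigmoid_param_grads(1.0, alpha, beta))
```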
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Implementation</span></strong></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px">def</span> <span style="color: rgba(97, 174, 238, 1); line-height: 26px">learnable_sigmoid</span>(x, alpha=1.0, beta=0.0):
    <span style="color: rgba(152, 195, 121, 1); line-height: 26px">"""
    Learnable Sigmoid: f(x) = 1 / (1 + exp(-(alpha * x + beta)))

    Parameters:
        x: input
        alpha: slope (for alpha &gt; 0: &gt; 1 is steeper, &lt; 1 is flatter)
        beta: offset; the midpoint f(x) = 0.5 lies at x = -beta / alpha, so for
              alpha &gt; 0, beta &gt; 0 shifts the curve left and beta &lt; 0 shifts it right

    Output range: (0, 1)
    """</span>
    <span style="color: rgba(198, 120, 221, 1); line-height: 26px">return</span> 1 / (1 + np.exp(-(alpha * x + beta)))

<span style="color: rgba(198, 120, 221, 1); line-height: 26px">def</span> <span style="color: rgba(97, 174, 238, 1); line-height: 26px">learnable_sigmoid_grad</span>(x, alpha=1.0, beta=0.0):
    <span style="color: rgba(152, 195, 121, 1); line-height: 26px">"""
    Gradient of Learnable Sigmoid with respect to x:
    f(x)  = sigmoid(alpha*x + beta)
    f'(x) = alpha * f(x) * (1 - f(x))
    """</span>
    s = learnable_sigmoid(x, alpha, beta)
    <span style="color: rgba(198, 120, 221, 1); line-height: 26px">return</span> alpha * s * (1 - s)

plot_activation(<span style="color: rgba(198, 120, 221, 1); line-height: 26px">lambda</span> x: learnable_sigmoid(x, alpha=2.0, beta=0.0),
                <span style="color: rgba(198, 120, 221, 1); line-height: 26px">lambda</span> x: learnable_sigmoid_grad(x, alpha=2.0, beta=0.0),
                <span style="color: rgba(152, 195, 121, 1); line-height: 26px">'Learnable Sigmoid (α=2.0)'</span>)</code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Plot</span></strong></p>
<span><img src="https://img2024.cnblogs.com/blog/609124/202509/609124-20250928123545071-988419985.png" alt="Image" class="rich_pages wxw-img js_img_placeholder wx_img_placeholder" style="display: block; margin: 0 auto; max-width: 100%; border: 3px none rgba(0, 0, 0, 0.4); border-radius: 8px; object-fit: fill; box-shadow: 0 0 rgba(0, 0, 0, 0); width: 669px !important; height: 449px !important" data-ratio="0.6711509715994021" data-type="png" data-w="669" data-imgfileid="100003057" data-src="https://mmbiz.qpic.cn/mmbiz_png/J8ufl5q3zgOJgw7WsL0LtID9Ov9wldTEO7lKoH3mN9yrFDkia5PZDqXvkjqiciacx65icWyfibP6H4ibyUhXUpux0qicA/640?wx_fmt=png&from=appmsg&watermark=1#imgIndex=32" data-original-style="display: block;margin: 0px auto;max-width: 100%;border-style: none;border-width: 3px;border-color: rgba(0, 0, 0, 0.4);border-radius: 8px;object-fit: fill;box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;height: auto !important;" data-index="33"></span>
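<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>The implementation above differentiates only with respect to x. What makes the function "learnable" is that α and β also receive gradients during training. The following is a minimal standalone sketch, assuming plain gradient descent on a mean-squared-error loss against synthetic targets; the helpers learnable_sigmoid_param_grads, mse and fit_alpha_beta are hypothetical names, not taken from any library.</span></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px">import numpy as np

def learnable_sigmoid(x, alpha=1.0, beta=0.0):
    # f(x) = sigmoid(alpha * x + beta)
    return 1 / (1 + np.exp(-(alpha * x + beta)))

def learnable_sigmoid_param_grads(x, alpha=1.0, beta=0.0):
    # With z = alpha * x + beta and s = sigmoid(z), ds/dz = s * (1 - s), so
    # df/dalpha = x * s * (1 - s) and df/dbeta = s * (1 - s)
    s = learnable_sigmoid(x, alpha, beta)
    ds = s * (1 - s)
    return x * ds, ds

def mse(x, y, alpha, beta):
    # Loss L = 0.5 * mean((f(x) - y)^2)
    return 0.5 * np.mean((learnable_sigmoid(x, alpha, beta) - y) ** 2)

def fit_alpha_beta(x, y, alpha=1.0, beta=0.0, lr=0.5, steps=3000):
    # Plain gradient descent on the MSE loss: dL/dp = mean(err * df/dp)
    for _ in range(steps):
        err = learnable_sigmoid(x, alpha, beta) - y
        d_alpha, d_beta = learnable_sigmoid_param_grads(x, alpha, beta)
        alpha -= lr * np.mean(err * d_alpha)
        beta -= lr * np.mean(err * d_beta)
    return alpha, beta

# Synthetic targets generated with alpha = 2, beta = -1 (illustrative only)
x = np.linspace(-5, 5, 201)
y = learnable_sigmoid(x, alpha=2.0, beta=-1.0)
alpha_hat, beta_hat = fit_alpha_beta(x, y)
print("loss before:", mse(x, y, 1.0, 0.0), "loss after:", mse(x, y, alpha_hat, beta_hat))
</code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>In a real network the same two partial derivatives would be accumulated by autograd; writing them out by hand only makes explicit that the learnable parameters get gradients through the same factor s(1 - s) as the input.</span></p>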
<h3 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-radius: 0; box-shadow: none; display: flex; flex-direction: unset; float: unset; height: auto; justify-content: center; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 17px; color: rgba(89, 89, 89, 1); border-bottom: 2px solid rgba(222, 198, 251, 1); line-height: 1.5em; letter-spacing: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-top: 1px none rgba(0, 0, 0, 1); border-left: 1px none rgba(0, 0, 0, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; box-shadow: none; display: inline; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; padding: 0; margin: 0"><span>Parametric Softplus</span></span></h3>
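<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Parametric Softplus is a learnable extension of the standard Softplus function: it adds an input scale α and an output scale β, and with α = β = 1 it reduces to the standard Softplus log(1 + e^x).</span></p>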
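<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>A quick numerical check of how this generalizes the standard Softplus. With α = β = 1 the function is exactly log(1 + e^x). With α = β = t, the gap to ReLU equals (1/t)·log(1 + e^(-t|x|)), which is largest at x = 0, where it is log(2)/t; a larger t therefore gives a sharper, more ReLU-like curve. This standalone sketch re-implements the function with np.logaddexp; the grid and the values of t are arbitrary choices.</span></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px">import numpy as np

def parametric_softplus(x, alpha=1.0, beta=1.0):
    # f(x) = (1/beta) * log(1 + exp(alpha * x)); np.logaddexp(0, z) = log(1 + e^z) without overflow
    return np.logaddexp(0.0, alpha * x) / beta

x = np.linspace(-5, 5, 1001)   # grid that contains x = 0
relu = np.maximum(x, 0.0)

# alpha = beta = 1: the standard Softplus log(1 + e^x)
standard_gap = np.max(np.abs(parametric_softplus(x) - np.log1p(np.exp(x))))
print("alpha = beta = 1, max |f - softplus|:", standard_gap)

# alpha = beta = t: the largest gap to ReLU sits at x = 0 and equals log(2) / t
for t in (1.0, 4.0, 16.0):
    gap = np.max(np.abs(parametric_softplus(x, alpha=t, beta=t) - relu))
    print(f"t = {t:>4}: max |f - ReLU| = {gap:.6f}, log(2)/t = {np.log(2) / t:.6f}")
</code></pre>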
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Implementation</span></strong></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px"><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1)">def</span> <span style="color: rgba(97, 174, 238, 1)">parametric_softplus</span>(x, alpha=1.0, beta=1.0):
    <span style="color: rgba(152, 195, 121, 1)">"""
    Parametric Softplus: f(x) = (1/β) * log(1 + exp(α * x))

    Parameters:
        x: input
        alpha: input scale (&gt;1 steeper, &lt;1 flatter)
        beta: output scale / temperature (&gt;1 flatter, &lt;1 steeper);
              for large x the slope approaches α/β

    Output range: (0, ∞) for β &gt; 0

    Note:
        - evaluating exp(alpha*x) directly overflows when alpha*x is large
        - np.logaddexp(0, z) computes log(exp(0) + exp(z)) = log(1 + exp(z))
          in a numerically stable way, without overflow for large z
    """</span>
    z = alpha * x
    <span style="color: rgba(198, 120, 221, 1)">return</span> np.logaddexp(0, z) / beta

<span style="color: rgba(198, 120, 221, 1)">def</span> <span style="color: rgba(97, 174, 238, 1)">parametric_softplus_grad</span>(x, alpha=1.0, beta=1.0):
    <span style="color: rgba(152, 195, 121, 1)">"""
    Parametric Softplus gradient with respect to x:
        f(x) = log(1 + exp(αx)) / β
        f'(x) = (α / β) * sigmoid(αx)
    """</span>
    z = alpha * x
    <span style="color: rgba(92, 99, 112, 1); font-style: italic"># sigmoid(z) = exp(z - log(1 + exp(z))); this form also avoids overflow</span>
    sigmoid_alpha_x = np.exp(z - np.logaddexp(0, z))
    <span style="color: rgba(198, 120, 221, 1)">return</span> (alpha / beta) * sigmoid_alpha_x

plot_activation(<span style="color: rgba(198, 120, 221, 1)">lambda</span> x: parametric_softplus(x, alpha=2.0, beta=0.5),
                <span style="color: rgba(198, 120, 221, 1)">lambda</span> x: parametric_softplus_grad(x, alpha=2.0, beta=0.5),
                <span style="color: rgba(152, 195, 121, 1)">'Parametric Softplus (α=2.0, β=0.5)'</span>)
</span></code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Plot</span></strong></p>
<span><img src="https://img2024.cnblogs.com/blog/609124/202509/609124-20250928123545071-988419985.png" alt="Image" class="rich_pages wxw-img js_img_placeholder wx_img_placeholder" style="display: block; margin: 0 auto; max-width: 100%; border: 3px none rgba(0, 0, 0, 0.4); border-radius: 8px; object-fit: fill; box-shadow: 0 0 rgba(0, 0, 0, 0); width: 665px !important; height: 449px !important" data-ratio="0.675187969924812" data-type="png" data-w="665" data-imgfileid="100003060" data-src="https://mmbiz.qpic.cn/mmbiz_png/J8ufl5q3zgOJgw7WsL0LtID9Ov9wldTE4EtKPuaQyOYgd7ueUMQj7X75R68rgOVd8AicVj0BP7jBcDIdeBYP2cA/640?wx_fmt=png&from=appmsg&watermark=1#imgIndex=33" data-original-style="display: block;margin: 0px auto;max-width: 100%;border-style: none;border-width: 3px;border-color: rgba(0, 0, 0, 0.4);border-radius: 8px;object-fit: fill;box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;height: auto !important;" data-index="34"></span>
<h3 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-radius: 0; box-shadow: none; display: flex; flex-direction: unset; float: unset; height: auto; justify-content: center; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 17px; color: rgba(89, 89, 89, 1); border-bottom: 2px solid rgba(222, 198, 251, 1); line-height: 1.5em; letter-spacing: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-top: 1px none rgba(0, 0, 0, 1); border-left: 1px none rgba(0, 0, 0, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; box-shadow: none; display: inline; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; padding: 0; margin: 0"><span>Dynamic ReLU</span></span></h3>
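<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Dynamic ReLU (DY-ReLU) is a content-aware activation function: its parameters (the slopes and intercepts of its linear pieces) are generated dynamically from the input by a small hyper function, instead of being fixed and shared across all inputs.</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>It was designed with lightweight networks in mind (it was evaluated on MobileNet models, for example) and noticeably improves accuracy while adding only a small amount of computation.</span></p>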
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Formula</span></strong></p>
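<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>For reference, the general form given in the Dynamic ReLU paper (Chen et al., 2020) computes each channel c as the maximum of K linear pieces whose slopes a and intercepts b are produced from the whole input x by a hyper function θ(x):</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>$$y_c = f_{\theta(x)}(x_c) = \max_{1 \le k \le K}\left\{ a_c^k(x)\, x_c + b_c^k(x) \right\}$$</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>With K = 2 and fixed coefficients a_c^1 = 1, b_c^1 = 0, a_c^2 = b_c^2 = 0, this reduces to the ordinary ReLU max(x_c, 0).</span></p>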
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Implementation</span></strong></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Here we implement a simplified version: instead of a hyper function, the slope and bias for negative inputs are derived from a single scalar context value.</span></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px"><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1)">def</span> <span style="color: rgba(97, 174, 238, 1)">dynamic_relu</span>(x, global_context=0.0, a_min=0.01, a_max=0.2, b_scale=0.1):
    <span style="color: rgba(152, 195, 121, 1)">"""
    Dynamic ReLU (simplified version, for visualization)

    'global_context' stands in for a statistic of the input (e.g., its mean or max);
    it is used to generate the slope a and bias b of the negative half-axis.

    f(x) =
        a * x + b,  x &lt; 0
        x,          x &gt;= 0

    Parameters:
        x: input
        global_context: simulated global context (e.g., the batch mean)
        a_min, a_max: range of the dynamic slope
        b_scale: scale of the dynamic bias

    Note: in the real DY-ReLU, a and b are produced by a small network and the output
    is a max over linear pieces; this simplified form jumps by b at x = 0.
    """</span>
    <span style="color: rgba(92, 99, 112, 1); font-style: italic"># Simulate dynamic parameter generation (a small network in the real version)</span>
    s = 1 / (1 + np.exp(-global_context))     <span style="color: rgba(92, 99, 112, 1); font-style: italic"># sigmoid of the context</span>
    a = a_min + (a_max - a_min) * s           <span style="color: rgba(92, 99, 112, 1); font-style: italic"># a ∈ (a_min, a_max)</span>
    b = b_scale * np.tanh(global_context)     <span style="color: rgba(92, 99, 112, 1); font-style: italic"># b ∈ (-b_scale, b_scale)</span>

    <span style="color: rgba(198, 120, 221, 1)">return</span> np.where(x &lt; 0, a * x + b, x)

<span style="color: rgba(198, 120, 221, 1)">def</span> <span style="color: rgba(97, 174, 238, 1)">dynamic_relu_grad</span>(x, global_context=0.0, a_min=0.01, a_max=0.2, b_scale=0.1):
    <span style="color: rgba(152, 195, 121, 1)">"""
    Dynamic ReLU gradient with respect to x.
    Note: a and b depend on global_context but are treated as constants
    when differentiating with respect to x.
    """</span>
    s = 1 / (1 + np.exp(-global_context))
    a = a_min + (a_max - a_min) * s

    <span style="color: rgba(198, 120, 221, 1)">return</span> np.where(x &lt; 0, a, 1.0)

<span style="color: rgba(92, 99, 112, 1); font-style: italic"># Visualization: fix global_context = 1.0 so that the dynamic bias b is non-zero</span>
plot_activation(<span style="color: rgba(198, 120, 221, 1)">lambda</span> x: dynamic_relu(x, global_context=1.0),
                <span style="color: rgba(198, 120, 221, 1)">lambda</span> x: dynamic_relu_grad(x, global_context=1.0),
                <span style="color: rgba(152, 195, 121, 1)">'Dynamic ReLU (context=1.0)'</span>)
</span></code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Plot</span></strong></p>
<span><img src="https://img2024.cnblogs.com/blog/609124/202509/609124-20250928123545071-988419985.png" alt="Image" class="rich_pages wxw-img js_img_placeholder wx_img_placeholder" style="display: block; margin: 0 auto; max-width: 100%; border: 3px none rgba(0, 0, 0, 0.4); border-radius: 8px; object-fit: fill; box-shadow: 0 0 rgba(0, 0, 0, 0); width: 665px !important; height: 449px !important" data-ratio="0.675187969924812" data-type="png" data-w="665" data-imgfileid="100003059" data-src="https://mmbiz.qpic.cn/mmbiz_png/J8ufl5q3zgOJgw7WsL0LtID9Ov9wldTEXGV5QxFZCiaH2T85KHD1jxk4qhJic62RvnZYH3ibJLKxWBYZsJ0Lg6Ylg/640?wx_fmt=png&from=appmsg&watermark=1#imgIndex=34" data-original-style="display: block;margin: 0px auto;max-width: 100%;border-style: none;border-width: 3px;border-color: rgba(0, 0, 0, 0.4);border-radius: 8px;object-fit: fill;box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;height: auto !important;" data-index="35"></span>
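<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>The simplified version above derives one slope and one bias from a single scalar. Below is a less simplified sketch of the channel-wise, max-over-K-pieces form. It assumes a hyper function made of global average pooling, a two-layer MLP with ReLU, and a 2·sigmoid − 1 squashing, with the resulting residuals added to ReLU-like initial coefficients (a^1 = 1, all other a^k and b^k equal to 0) and scaled by λ_a = 1.0 and λ_b = 0.5. This loosely follows the paper's channel-wise DY-ReLU-B design; the function name dy_relu_b and the weights w1, w2 are hypothetical, untrained values.</span></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px">import numpy as np

def dy_relu_b(x, w1, w2, k=2, lam_a=1.0, lam_b=0.5):
    """
    Channel-wise Dynamic ReLU sketch for one sample x of shape (C, H, W).
    w1: (R, C) and w2: (2*K*C, R) are the hyper function's weights (assumed layout).
    Returns y with y[c] = max over k of (a[c, k] * x[c] + b[c, k]).
    """
    c = x.shape[0]
    # Hyper function: global average pooling -> FC -> ReLU -> FC -> 2*sigmoid - 1
    pooled = x.mean(axis=(1, 2))                              # (C,)
    hidden = np.maximum(w1 @ pooled, 0.0)                     # (R,)
    residual = 2.0 / (1.0 + np.exp(-(w2 @ hidden))) - 1.0    # (2*K*C,), values in (-1, 1)
    residual = residual.reshape(c, 2 * k)
    # ReLU-like initial coefficients: slope 1 for the first piece, 0 for the rest
    init_a = np.zeros(k)
    init_a[0] = 1.0
    a = init_a + lam_a * residual[:, :k]                      # slopes, (C, K)
    b = lam_b * residual[:, k:]                               # intercepts, (C, K)
    # Evaluate all K linear pieces per channel, then take the maximum
    pieces = a[:, :, None, None] * x[:, None, :, :] + b[:, :, None, None]   # (C, K, H, W)
    return pieces.max(axis=1)

rng = np.random.default_rng(0)
C, H, W, K, R = 4, 3, 3, 2, 2        # toy sizes; R is the hidden width (assumed)
x = rng.standard_normal((C, H, W))
w1 = 0.1 * rng.standard_normal((R, C))            # hypothetical, untrained weights
w2 = 0.1 * rng.standard_normal((2 * K * C, R))
y = dy_relu_b(x, w1, w2, k=K)
print(y.shape)
</code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>If w2 is all zeros, every residual is 0 and the layer reduces to a plain ReLU, which makes a convenient sanity check. Unlike the simplified version, the max over linear pieces keeps the output continuous at every point.</span></p>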
<h3 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-radius: 0; box-shadow: none; display: flex; flex-direction: unset; float: unset; height: auto; justify-content: center; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 17px; color: rgba(89, 89, 89, 1); border-bottom: 2px solid rgba(222, 198, 251, 1); line-height: 1.5em; letter-spacing: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-top: 1px none rgba(0, 0, 0, 1); border-left: 1px none rgba(0, 0, 0, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; box-shadow: none; display: inline; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; padding: 0; margin: 0"><span>EvoNorm</span></span></h3>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>EvoNorm is not an activation function in the traditional sense but a family of layers that combine normalization and activation, proposed by Google Research in the paper "Evolving Normalization-Activation Functions".</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Its goal is to replace the separate normalization and activation pair (such as BN + ReLU or BN + Swish) with a single layer. The B-series variants (e.g., EvoNorm-B0) still use batch statistics, while the S-series variants (e.g., EvoNorm-S0) use only per-sample statistics, so they remain stable without Batch Normalization, even at small batch sizes.</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Implementation</span></strong></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Because EvoNorm depends on statistics of its input (variances computed over the batch and/or spatial dimensions), the implementation here is a simplified, element-wise illustration rather than the exact layer.</span></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px"><span style="line-height: 26px"><span style="color: rgba(198, 120, 221, 1)">def</span> <span style="color: rgba(97, 174, 238, 1)">evonorm_b0</span>(x, gamma=1.0, beta=0.0, v=0.1, eps=1e-5):
    <span style="color: rgba(152, 195, 121, 1)">"""
    Simplified, EvoNorm-B0-inspired form (an illustration, not the exact layer from the paper):
        f(x) = gamma * x / sqrt(v * x^2 + (1-v) * running_v + eps) + beta * x

    Parameters:
        v: weight of the per-element (dynamic) variance term (learnable)
        running_v: running variance (updated during training)

    Simplification: v is fixed and running_v is simulated from the current input
    """</span>
    <span style="color: rgba(92, 99, 112, 1); font-style: italic"># Simulate the running variance (an EMA update in the real layer)</span>
    running_v = np.mean(x**2)  <span style="color: rgba(92, 99, 112, 1); font-style: italic"># second moment of the current input (its variance when x has zero mean)</span>
    var_dynamic = v * x**2 + (1 - v) * running_v
    x_normalized = x / np.sqrt(var_dynamic + eps)
    <span style="color: rgba(198, 120, 221, 1)">return</span> gamma * x_normalized + beta * x

<span style="color: rgba(198, 120, 221, 1)">def</span> <span style="color: rgba(97, 174, 238, 1)">evonorm_b0_grad</span>(x, gamma=1.0, beta=0.0, v=0.1, eps=1e-5):
    <span style="color: rgba(152, 195, 121, 1)">"""
    Gradient of the simplified EvoNorm-B0 with respect to x, treating running_v as a
    constant statistic (as at inference time). With K = (1-v) * running_v + eps:
        d/dx [x / sqrt(v * x^2 + K)] = K / (v * x^2 + K)^(3/2)
        f'(x) = gamma * K / (v * x^2 + K)^(3/2) + beta
    """</span>
    running_v = np.mean(x**2)
    k = (1 - v) * running_v + eps
    denom = np.sqrt(v * x**2 + k)
    <span style="color: rgba(198, 120, 221, 1)">return</span> gamma * k / denom**3 + beta

plot_activation(<span style="color: rgba(198, 120, 221, 1)">lambda</span> x: evonorm_b0(x, gamma=2.0, beta=1.0, v=0.5),
                <span style="color: rgba(198, 120, 221, 1)">lambda</span> x: evonorm_b0_grad(x, gamma=2.0, beta=1.0, v=0.5),
                <span style="color: rgba(152, 195, 121, 1)">'EvoNorm-B0'</span>)
</span></code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Plot</span></strong></p>
<span><img src="https://img2024.cnblogs.com/blog/609124/202509/609124-20250928123545071-988419985.png" alt="Image" class="rich_pages wxw-img js_img_placeholder wx_img_placeholder" style="display: block; margin: 0 auto; max-width: 100%; border: 3px none rgba(0, 0, 0, 0.4); border-radius: 8px; object-fit: fill; box-shadow: 0 0 rgba(0, 0, 0, 0); width: 669px !important; height: 449px !important" data-ratio="0.6711509715994021" data-type="png" data-w="669" data-imgfileid="100003061" data-src="https://mmbiz.qpic.cn/mmbiz_png/J8ufl5q3zgOJgw7WsL0LtID9Ov9wldTER6TGyxdROmWnlOFlOicB9MYP4u8iagbluF7sNkOOZa5FsMibeicYk7vrlg/640?wx_fmt=png&from=appmsg&watermark=1#imgIndex=35" data-original-style="display: block;margin: 0px auto;max-width: 100%;border-style: none;border-width: 3px;border-color: rgba(0, 0, 0, 0.4);border-radius: 8px;object-fit: fill;box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;height: auto !important;" data-index="36"></span>
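<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>For comparison with the element-wise illustration, the EvoNorm-B0 layer in the paper divides x by the larger of a batch standard deviation and an instance-dependent term v1·x plus the instance standard deviation. Below is a minimal NumPy sketch of its training-mode forward pass, assuming NCHW layout and per-channel parameters; the function name evonorm_b0_train is illustrative, and the running statistics used at inference are omitted.</span></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px">import numpy as np

def evonorm_b0_train(x, gamma, beta, v1, eps=1e-5):
    """
    EvoNorm-B0 forward pass in training mode for x of shape (N, C, H, W):
        y = x / max(sqrt(s2_bwh + eps), v1 * x + sqrt(s2_wh + eps)) * gamma + beta
    s2_bwh: per-channel variance over the batch and spatial axes (a batch statistic)
    s2_wh:  per-sample, per-channel variance over the spatial axes (an instance statistic)
    gamma, beta, v1: per-channel parameters of shape (1, C, 1, 1)
    """
    batch_var = x.var(axis=(0, 2, 3), keepdims=True)      # (1, C, 1, 1)
    instance_var = x.var(axis=(2, 3), keepdims=True)      # (N, C, 1, 1)
    # The first argument is always positive, so the denominator never reaches 0
    denom = np.maximum(np.sqrt(batch_var + eps), v1 * x + np.sqrt(instance_var + eps))
    return x / denom * gamma + beta

rng = np.random.default_rng(0)
N, C, H, W = 2, 3, 4, 4
x = rng.standard_normal((N, C, H, W))
gamma = np.ones((1, C, 1, 1))
beta = np.zeros((1, C, 1, 1))
v1 = np.ones((1, C, 1, 1))
y = evonorm_b0_train(x, gamma, beta, v1)
print(y.shape)
</code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice²编辑器"><span>Reading the formula: for negative x (with v1 &gt; 0) the batch term wins and the layer is a plain rescaling, while for large positive x the denominator grows like v1·x and the output saturates toward γ/v1 + β; the max in the denominator is where the nonlinearity comes from.</span></p>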
<h2 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box unset; border-radius: 0; box-shadow: none; display: block; flex-direction: unset; float: unset; height: auto; justify-content: unset; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 18px; color: rgba(89, 89, 89, 1); line-height: 1.8em; letter-spacing: 0; padding: 0 0 0 10px; border-top: 1px none rgba(0, 0, 0, 1); border-bottom: 1px none rgba(0, 0, 0, 1); border-left: 5px solid rgba(222, 198, 251, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box unset; box-shadow: none; display: block; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; margin: 0"><span>For Transformers and Large Models</span></span></h2>
<h3 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-radius: 0; box-shadow: none; display: flex; flex-direction: unset; float: unset; height: auto; justify-content: center; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 17px; color: rgba(89, 89, 89, 1); border-bottom: 2px solid rgba(222, 198, 251, 1); line-height: 1.5em; letter-spacing: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-top: 1px none rgba(0, 0, 0, 1); border-left: 1px none rgba(0, 0, 0, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; box-shadow: none; display: inline; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; padding: 0; margin: 0"><span>GeGLU(Gated Linear Unit using GELU)</span></span></h3>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>GeGLU is an important variant of the GLU family that replaces the Sigmoid gate of the original GLU with GELU. Unlike a Sigmoid gate, whose values lie in (0, 1), the GELU gate is unbounded above and can be slightly negative.</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>It is mainly used in the following scenarios:</span></p>
<ul class="list-paddingleft-1" style="list-style-type: circle; margin: 8px 0; padding: 0 0 0 25px; color: rgba(0, 0, 0, 1)">
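<li><span>the feed-forward network (FFN) of Transformers and their variants, replacing the Linear -> Activation -> Linear structure;</span></li>
<li><span>large language models: T5 v1.1 (an updated version of Google's T5) and Google's Gemma use GeGLU, while PaLM and LLaMA use the closely related SwiGLU;</span></li>
<li><span>some vision Transformers and multimodal models.</span></li>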
</ul>
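<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>To make the FFN replacement concrete, this sketch compares a standard Linear -> ReLU -> Linear FFN with a GeGLU FFN, in which one projection is passed through GELU and gates a second projection element-wise. Biases are omitted, the toy sizes and random weights are hypothetical, and GELU is computed exactly through math.erf (gelu_exact is an illustrative name).</span></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px">import math
import numpy as np

# Exact GELU(x) = x * Phi(x), with Phi the standard normal CDF written via math.erf
_erf = np.vectorize(math.erf)

def gelu_exact(x):
    return 0.5 * x * (1.0 + _erf(x / math.sqrt(2.0)))

def ffn_relu(x, w1, w2):
    # Standard Transformer FFN: Linear -> ReLU -> Linear (biases omitted)
    return np.maximum(x @ w1, 0.0) @ w2

def ffn_geglu(x, w, v, w2):
    # GeGLU FFN: (GELU(x W) * (x V)) W2, with * the element-wise gate (biases omitted)
    return (gelu_exact(x @ w) * (x @ v)) @ w2

rng = np.random.default_rng(0)
n_tokens, d_model, d_ff = 5, 8, 16               # toy sizes (assumed)
x = rng.standard_normal((n_tokens, d_model))
# Hypothetical random weights, scaled by 1/sqrt(fan_in)
w = rng.standard_normal((d_model, d_ff)) / math.sqrt(d_model)
v = rng.standard_normal((d_model, d_ff)) / math.sqrt(d_model)
w1 = rng.standard_normal((d_model, d_ff)) / math.sqrt(d_model)
w2 = rng.standard_normal((d_ff, d_model)) / math.sqrt(d_ff)

y_relu = ffn_relu(x, w1, w2)
y_geglu = ffn_geglu(x, w, v, w2)
print(y_relu.shape, y_geglu.shape)
</code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>The GeGLU FFN has three weight matrices instead of two; in the GLU-variants paper (Shazeer, 2020), d_ff was reduced from 3072 to 2048 so that parameter count and computation stay roughly equal to the ReLU baseline.</span></p>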
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Formula</span></strong></p>
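<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Following the definition in the GLU-variants paper (Shazeer, 2020), with input x, weight matrices W, V, W_2 and biases b, c:</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>$$\mathrm{GeGLU}(x, W, V, b, c) = \mathrm{GELU}(xW + b) \otimes (xV + c)$$</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>$$\mathrm{FFN}_{\mathrm{GeGLU}}(x, W, V, W_2) = \left(\mathrm{GELU}(xW) \otimes xV\right) W_2$$</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Here ⊗ is the element-wise product and GELU(x) = x·Φ(x), with Φ the standard normal CDF. In the simplified code of this section, x2 plays the role of xW (the gated branch) and x1 the role of xV.</span></p>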
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>实现</span></strong></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px">def geglu(x):
    """
    GeGLU: f(x) = x1 * GELU(x2)

    Simplified for plotting:
    - In a real model, x1 and x2 are two different linear projections of the input.
    - Here both are replaced by the raw input x, so the curve shows f(x) = x * GELU(x).
    """
    x1 = x
    x2 = x
    return x1 * gelu(x2)  # gelu() is the GELU function defined earlier in this article

def geglu_grad(x):
    """
    Gradient of the simplified GeGLU:
    f(x)  = x * GELU(x)
    f'(x) = GELU(x) + x * GELU'(x)
    """
    return gelu(x) + x * gelu_grad(x)  # gelu_grad() is also defined earlier

# The simplified form is element-wise, so it can be plotted like any other activation
plot_activation(geglu, geglu_grad, 'GeGLU (x * GELU(x))')
</code></pre>
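<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>The plotting code collapses x1 and x2 into the raw input. As a rough sketch of what GeGLU looks like inside a Transformer FFN, the NumPy snippet below keeps two separate projections. The function name geglu_ffn, the weight names W, V, W2, the omission of biases and the use of the tanh approximation of GELU are all assumptions for illustration, not any specific library's API.</span></p>

```python
import numpy as np

def gelu_tanh(z):
    # tanh approximation of GELU (assumption: the exact form uses erf)
    return 0.5 * z * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (z + 0.044715 * z ** 3)))

def geglu_ffn(x, W, V, W2):
    """Hypothetical GeGLU feed-forward block (biases omitted).

    x:  (batch, d_model) input
    W:  (d_model, d_ff) projection feeding the GELU gate
    V:  (d_model, d_ff) projection for the linear (value) branch
    W2: (d_ff, d_model) output projection
    """
    gate = gelu_tanh(x @ W)      # GELU(xW)
    value = x @ V                # xV
    return (gate * value) @ W2   # (GELU(xW) * xV) W2, element-wise product then projection

rng = np.random.default_rng(0)
d_model, d_ff, batch = 8, 16, 4
x = rng.standard_normal((batch, d_model))
W = rng.standard_normal((d_model, d_ff)) * 0.1
V = rng.standard_normal((d_model, d_ff)) * 0.1
W2 = rng.standard_normal((d_ff, d_model)) * 0.1

y = geglu_ffn(x, W, V, W2)
print(y.shape)  # (4, 8)
```

<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>If W, V and W2 are all identity matrices, geglu_ffn reduces to x * GELU(x), which is exactly the simplified curve being plotted.</span></p>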
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Plot</span></strong></p>
<span><img src="https://img2024.cnblogs.com/blog/609124/202509/609124-20250928123545071-988419985.png" alt="Image" class="rich_pages wxw-img js_img_placeholder wx_img_placeholder" style="display: block; margin: 0 auto; max-width: 100%; border: 3px none rgba(0, 0, 0, 0.4); border-radius: 8px; object-fit: fill; box-shadow: 0 0 rgba(0, 0, 0, 0); width: 672px !important; height: 449px !important" data-ratio="0.6681547619047619" data-type="png" data-w="672" data-imgfileid="100003065" data-src="https://mmbiz.qpic.cn/mmbiz_png/J8ufl5q3zgOJgw7WsL0LtID9Ov9wldTELtSyeT6RiaJumqBuibz20rWEiaEO1dSXmEh9icr1I9vO7wASdAgWfy4Svw/640?wx_fmt=png&from=appmsg&watermark=1#imgIndex=36" data-original-style="display: block;margin: 0px auto;max-width: 100%;border-style: none;border-width: 3px;border-color: rgba(0, 0, 0, 0.4);border-radius: 8px;object-fit: fill;box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;height: auto !important;" data-index="37"></span>
<h3 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-radius: 0; box-shadow: none; display: flex; flex-direction: unset; float: unset; height: auto; justify-content: center; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 17px; color: rgba(89, 89, 89, 1); border-bottom: 2px solid rgba(222, 198, 251, 1); line-height: 1.5em; letter-spacing: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-top: 1px none rgba(0, 0, 0, 1); border-left: 1px none rgba(0, 0, 0, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; box-shadow: none; display: inline; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; padding: 0; margin: 0"><span>SwiGLU</span></span></h3>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>SwiGLU is one of the most widely used gated structures in current large language models. It is a high-performing variant of the GLU family that uses Swish (called SiLU when β = 1) as the gating function, combining a gating mechanism with a smooth nonlinearity.</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Main applications:</span></p>
<ul class="list-paddingleft-1" style="list-style-type: circle; margin: 8px 0; padding: 0 0 0 25px; color: rgba(0, 0, 0, 1)">
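<li><span>In the FFN of the Llama series (LLaMA, Llama 2, Llama 3), where it is the core activation structure;</span></li>
<li><span>Also used in other modern large language models such as PaLM and Mistral;</span></li>
<li><span>Replacing the conventional Linear -> ReLU -> Linear FFN or the GeGLU structure to improve model expressiveness;</span></li>
<li><span>Especially widespread in autoregressive language models.</span></li>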
</ul>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Advantages:</span></p>
<ul class="list-paddingleft-1" style="list-style-type: circle; margin: 8px 0; padding: 0 0 0 25px; color: rgba(0, 0, 0, 1)">
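<li><span>The gate lets the model control information flow dynamically, strengthening its nonlinear expressiveness;</span></li>
<li><span>Swish is smooth, unbounded above and bounded below (its minimum is about -0.28). Its gradient is generally better behaved than ReLU's (no hard zero for negative inputs) and Sigmoid's (no saturation for large positive inputs);</span></li>
<li><span>In the experiments of "GLU Variants Improve Transformer", with parameter count and compute held fixed, SwiGLU and GeGLU both outperform the ReLU and GELU FFN baselines, and the two perform roughly on par.</span></li>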
</ul>
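<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>"Same parameter count" is worth checking concretely: a gated FFN has three weight matrices instead of two, so fair comparisons usually shrink the hidden size d_ff by a factor of 2/3. The helper names below are hypothetical, biases are ignored, and the sizes d_model = 768 and d_ff = 3072 (reduced to 2048 for the gated variant) are chosen to mirror the T5-base-sized setting of the paper above.</span></p>

```python
def ffn_params(d_model, d_ff):
    # Standard FFN: W1 (d_model x d_ff) and W2 (d_ff x d_model), biases ignored
    return 2 * d_model * d_ff

def gated_ffn_params(d_model, d_ff):
    # Gated FFN (GeGLU / SwiGLU / ReGLU): W and V (d_model x d_ff) plus W2 (d_ff x d_model)
    return 3 * d_model * d_ff

d_model, d_ff = 768, 3072
d_ff_gated = d_ff * 2 // 3  # shrink the hidden size by 2/3 to pay for the extra matrix

print(ffn_params(d_model, d_ff))               # 4718592
print(gated_ffn_params(d_model, d_ff_gated))   # 4718592
```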
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Formula</span></strong></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>$$\mathrm{SwiGLU}(x) = \mathrm{Swish}_\beta(xW + b) \otimes (xV + c), \quad \mathrm{Swish}_\beta(z) = z \cdot \sigma(\beta z)$$ With β = 1, Swish is identical to SiLU, the form used in models such as Llama.</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Implementation</span></strong></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px">def swiglu(x):
    """
    SwiGLU: f(x) = x1 * Swish(x2)
    Simplified for plotting: assume x1 == x2 == x, so f(x) = x * Swish(x)
    """
    return x * swish(x)  # swish() is defined earlier in this article

def swiglu_grad(x):
    """
    Gradient of the simplified SwiGLU:
    f(x)  = x * swish(x)
    f'(x) = swish(x) + x * swish'(x),  with swish'(x) = s + x * s * (1 - s), s = sigmoid(x)
    """
    s = sigmoid(x)
    swish_val = x * s
    swish_grad = s + x * s * (1 - s)
    return swish_val + x * swish_grad

plot_activation(swiglu, swiglu_grad, 'SwiGLU')
</code></pre>
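<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>A hand-derived gradient such as swiglu_grad is easy to sanity-check against a finite-difference estimate. The sketch below redefines sigmoid and swish locally so that it runs on its own; the helper names are hypothetical and mirror, rather than import, the article's functions.</span></p>

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def swish(x):
    return x * sigmoid(x)

def swiglu_simplified(x):
    # f(x) = x * swish(x), the x1 == x2 == x simplification used for plotting
    return x * swish(x)

def swiglu_simplified_grad(x):
    # Analytic derivative: swish(x) + x * swish'(x), with swish'(x) = s + x*s*(1-s)
    s = sigmoid(x)
    return x * s + x * (s + x * s * (1 - s))

def central_diff(f, x, h=1e-5):
    # Numerical derivative via central differences, used only as a sanity check
    return (f(x + h) - f(x - h)) / (2 * h)

xs = np.linspace(-6, 6, 121)
err = np.max(np.abs(swiglu_simplified_grad(xs) - central_diff(swiglu_simplified, xs)))
print(err < 1e-6)  # True
```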
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Plot</span></strong></p>
<span><img src="https://img2024.cnblogs.com/blog/609124/202509/609124-20250928123545071-988419985.png" alt="Image" class="rich_pages wxw-img js_img_placeholder wx_img_placeholder" style="display: block; margin: 0 auto; max-width: 100%; border: 3px none rgba(0, 0, 0, 0.4); border-radius: 8px; object-fit: fill; box-shadow: 0 0 rgba(0, 0, 0, 0); width: 672px !important; height: 449px !important" data-ratio="0.6681547619047619" data-type="png" data-w="672" data-imgfileid="100003062" data-src="https://mmbiz.qpic.cn/mmbiz_png/J8ufl5q3zgOJgw7WsL0LtID9Ov9wldTEMtc3icmNCqh9gAMKGibV0gzWPUBuqg1IkgCUUg5oeREvwadd8woNRKQQ/640?wx_fmt=png&from=appmsg&watermark=1#imgIndex=37" data-original-style="display: block;margin: 0px auto;max-width: 100%;border-style: none;border-width: 3px;border-color: rgba(0, 0, 0, 0.4);border-radius: 8px;object-fit: fill;box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;height: auto !important;" data-index="38"></span>
<h3 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-radius: 0; box-shadow: none; display: flex; flex-direction: unset; float: unset; height: auto; justify-content: center; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 17px; color: rgba(89, 89, 89, 1); border-bottom: 2px solid rgba(222, 198, 251, 1); line-height: 1.5em; letter-spacing: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-top: 1px none rgba(0, 0, 0, 1); border-left: 1px none rgba(0, 0, 0, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; box-shadow: none; display: inline; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; padding: 0; margin: 0"><span>ReGLU</span></span></h3>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>ReGLU is a variant of the GLU (Gated Linear Unit) family that uses ReLU as the gating function, combining the gating mechanism with ReLU's sparse nonlinearity.</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>It was proposed in the Google paper "GLU Variants Improve Transformer" (Noam Shazeer, 2020).</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Main applications:</span></p>
<ul class="list-paddingleft-1" style="list-style-type: circle; margin: 8px 0; padding: 0 0 0 25px; color: rgba(0, 0, 0, 1)">
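<li><span>In the Transformer feed-forward network (FFN), as an improvement on the conventional Linear -> ReLU -> Linear structure;</span></li>
<li><span>In some efficiency-oriented models and in early research on GLU variants;</span></li>
<li><span>Alongside GeGLU and SwiGLU, as one of the baselines for comparing how different gating functions perform.</span></li>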
</ul>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Advantages:</span></p>
<ul class="list-paddingleft-1" style="list-style-type: circle; margin: 8px 0; padding: 0 0 0 25px; color: rgba(0, 0, 0, 1)">
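<li><span>The gate lets the model control information flow dynamically, increasing expressiveness;</span></li>
<li><span>ReLU is simple and fast to compute, with no exponential or erf evaluation;</span></li>
<li><span>Compared with the standard FFN, the element-wise product of two projections introduces multiplicative feature interactions, which improves modeling capacity.</span></li>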
</ul>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Disadvantages:</span></p>
<ul class="list-paddingleft-1" style="list-style-type: circle; margin: 8px 0; padding: 0 0 0 25px; color: rgba(0, 0, 0, 1)">
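<li><span>Like ReLU, it can suffer from the "dying neuron" problem: a gate channel whose pre-activation stays negative outputs zero, receives zero gradient, and may stay closed permanently;</span></li>
<li><span>It is less smooth than GeGLU or SwiGLU and tends to be slightly less stable in training;</span></li>
<li><span>In reported comparisons, ReGLU usually performs slightly worse than SwiGLU and GeGLU.</span></li>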
</ul>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Formula</span></strong></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>$$\mathrm{ReGLU}(x) = \max(0,\ xW + b) \otimes (xV + c)$$</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Implementation</span></strong></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px">def reglu(x):
    """
    ReGLU: f(x) = x1 * ReLU(x2)
    Simplified for plotting: x1 == x2 == x, so f(x) = x * max(0, x)
    """
    return x * relu(x)  # relu() is defined earlier in this article

def reglu_grad(x):
    """
    Gradient of the simplified ReGLU:
    f(x)  = x * max(0, x)           (x^2 for x &gt; 0, 0 otherwise)
    f'(x) = ReLU(x) + x * 1[x &gt; 0]  (2x for x &gt; 0, 0 otherwise)
    """
    return np.where(x &gt; 0, 2 * x, 0.0)

plot_activation(reglu, reglu_grad, 'ReGLU')
</code></pre>
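<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>The "sparse nonlinearity" and "dying neuron" points can be observed numerically. The sketch below applies only the ReLU gate of a hypothetical ReGLU layer to random inputs; the sizes, the random weights and the deliberately large negative bias on channel 0 are illustrative assumptions.</span></p>

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

rng = np.random.default_rng(0)
x = rng.standard_normal((1000, 16))        # 1000 random inputs, d_model = 16
W = rng.standard_normal((16, 32)) * 0.25   # gate projection, d_ff = 32
b = np.zeros(32)
b[0] = -100.0                              # hypothetical: push channel 0 far into the negative region

pre = x @ W + b
gate = relu(pre)                           # ReLU(xW + b), the ReGLU gate
zero_frac = np.mean(gate == 0.0)           # overall sparsity of the gate
dead = np.all(gate == 0.0, axis=0)         # channels that never open on this batch

# d ReLU / d pre-activation is 1 where pre > 0 and 0 elsewhere
gate_grad = (pre > 0).astype(float)

print(f"fraction of zero gate outputs: {zero_frac:.2f}")
print(dead[0], gate_grad[:, 0].sum())      # True 0.0
```

<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Roughly half of the gate outputs are exactly zero, which is the sparsity. Channel 0 never opens and its local gradient is zero everywhere, so gradient descent has no signal to revive it.</span></p>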
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Plot</span></strong></p>
<span><img src="https://img2024.cnblogs.com/blog/609124/202509/609124-20250928123545071-988419985.png" alt="Image" class="rich_pages wxw-img js_img_placeholder wx_img_placeholder" style="display: block; margin: 0 auto; max-width: 100%; border: 3px none rgba(0, 0, 0, 0.4); border-radius: 8px; object-fit: fill; box-shadow: 0 0 rgba(0, 0, 0, 0); width: 672px !important; height: 449px !important" data-ratio="0.6681547619047619" data-type="png" data-w="672" data-imgfileid="100003063" data-src="https://mmbiz.qpic.cn/mmbiz_png/J8ufl5q3zgOJgw7WsL0LtID9Ov9wldTEFZW8oVGVT6icgStdhT2C1gjscdIv87XqNoZu64ciadGW9ASJlCA32u9w/640?wx_fmt=png&from=appmsg&watermark=1#imgIndex=38" data-original-style="display: block;margin: 0px auto;max-width: 100%;border-style: none;border-width: 3px;border-color: rgba(0, 0, 0, 0.4);border-radius: 8px;object-fit: fill;box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;height: auto !important;" data-index="39"></span>
<h2 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box unset; border-radius: 0; box-shadow: none; display: block; flex-direction: unset; float: unset; height: auto; justify-content: unset; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 18px; color: rgba(89, 89, 89, 1); line-height: 1.8em; letter-spacing: 0; padding: 0 0 0 10px; border-top: 1px none rgba(0, 0, 0, 1); border-bottom: 1px none rgba(0, 0, 0, 1); border-left: 5px solid rgba(222, 198, 251, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box unset; box-shadow: none; display: block; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; margin: 0"><span>For Lightweight Models and Edge Devices</span></span></h2>
<h3 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-radius: 0; box-shadow: none; display: flex; flex-direction: unset; float: unset; height: auto; justify-content: center; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 17px; color: rgba(89, 89, 89, 1); border-bottom: 2px solid rgba(222, 198, 251, 1); line-height: 1.5em; letter-spacing: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-top: 1px none rgba(0, 0, 0, 1); border-left: 1px none rgba(0, 0, 0, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; box-shadow: none; display: inline; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; padding: 0; margin: 0"><span>Hard Swish</span></span></h3>
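<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Hard Swish approximates Swish by replacing its sigmoid factor with a piecewise-linear function (a hard sigmoid). Because it needs no exponential or sigmoid evaluation, it is cheap to compute, which makes it well suited to deep networks on mobile and embedded devices; it was introduced in MobileNetV3.</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>It stays close to Swish in accuracy while substantially reducing compute cost, and is typically used as a hidden-layer activation in resource-constrained settings.</span></p>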
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Formula</span></strong></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>$$\mathrm{HardSwish}(x) = x \cdot \frac{\mathrm{ReLU6}(x + 3)}{6}, \quad \mathrm{ReLU6}(z) = \min(\max(0, z), 6)$$ i.e. 0 for x ≤ -3, x for x ≥ 3, and x(x + 3)/6 in between.</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Implementation</span></strong></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px">def hard_swish(x):
    # h-swish(x) = x * ReLU6(x + 3) / 6
    return x * np.clip(x + 3, 0, 6) / 6

def hard_swish_grad(x):
    # Piecewise derivative: 0 for x &lt;= -3, (2x + 3) / 6 for -3 &lt; x &lt; 3, 1 for x &gt;= 3
    cond1 = x &lt;= -3
    cond2 = x &lt; 3
    return np.where(cond1, 0.0, np.where(cond2, (2 * x + 3) / 6, 1.0))

plot_activation(hard_swish, hard_swish_grad, 'Hard Swish')
</code></pre>
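<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>To put a number on "close to Swish", the sketch below evaluates both functions on a dense grid and reports the largest absolute gap. The grid range and resolution are arbitrary choices for illustration.</span></p>

```python
import numpy as np

def swish(x):
    return x / (1.0 + np.exp(-x))   # equivalent to x * sigmoid(x)

def hard_swish(x):
    return x * np.clip(x + 3, 0, 6) / 6

xs = np.linspace(-8, 8, 1601)       # step 0.01
diff = np.abs(hard_swish(xs) - swish(xs))
print(f"max |h-swish - swish| = {diff.max():.3f} at x = {xs[diff.argmax()]:.2f}")
```

<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>The largest gap, about 0.14 (exactly 3·σ(-3)), occurs at x = ±3 where the clip takes effect; elsewhere the two curves are closer.</span></p>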
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Plot</span></strong></p>
<span><img src="https://img2024.cnblogs.com/blog/609124/202509/609124-20250928123545071-988419985.png" alt="Image" class="rich_pages wxw-img js_img_placeholder wx_img_placeholder" style="display: block; margin: 0 auto; max-width: 100%; border: 3px none rgba(0, 0, 0, 0.4); border-radius: 8px; object-fit: fill; box-shadow: 0 0 rgba(0, 0, 0, 0); width: 665px !important; height: 449px !important" data-ratio="0.675187969924812" data-type="png" data-w="665" data-imgfileid="100003064" data-src="https://mmbiz.qpic.cn/mmbiz_png/J8ufl5q3zgOJgw7WsL0LtID9Ov9wldTEbRicqsmMibL4jPzH4gzsvTV61bIiaLVf2kZE5icBUdOajIk2OdmSKj0gNg/640?wx_fmt=png&from=appmsg&watermark=1#imgIndex=39" data-original-style="display: block;margin: 0px auto;max-width: 100%;border-style: none;border-width: 3px;border-color: rgba(0, 0, 0, 0.4);border-radius: 8px;object-fit: fill;box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;height: auto !important;" data-index="40"></span>
<h3 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-radius: 0; box-shadow: none; display: flex; flex-direction: unset; float: unset; height: auto; justify-content: center; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 17px; color: rgba(89, 89, 89, 1); border-bottom: 2px solid rgba(222, 198, 251, 1); line-height: 1.5em; letter-spacing: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-top: 1px none rgba(0, 0, 0, 1); border-left: 1px none rgba(0, 0, 0, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; box-shadow: none; display: inline; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; padding: 0; margin: 0"><span>Hard Sigmoid</span></span></h3>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Hard Sigmoid is a piecewise-linear approximation of the standard Sigmoid. It is cheaper to compute, and its output range is [0, 1].</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Formula</span></strong></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>$$\mathrm{HardSigmoid}(x) = \min(\max(0,\ 0.2x + 0.5),\ 1)$$ It is linear with slope 0.2 on (-2.5, 2.5) and saturates at 0 and 1 outside that interval.</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Implementation</span></strong></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px">def hard_sigmoid(x):
    # Piecewise-linear approximation of sigmoid: clip(0.2 * x + 0.5, 0, 1)
    return np.clip(0.2 * x + 0.5, 0.0, 1.0)

def hard_sigmoid_grad(x):
    # Slope 0.2 on the linear segment (-2.5, 2.5), 0 in the saturated regions
    return np.where((x &gt; -2.5) &amp; (x &lt; 2.5), 0.2, 0.0)

plot_activation(hard_sigmoid, hard_sigmoid_grad, 'Hard Sigmoid')
</code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Plot</span></strong></p>
<span><img src="https://img2024.cnblogs.com/blog/609124/202509/609124-20250928123545071-988419985.png" alt="Image" class="rich_pages wxw-img js_img_placeholder wx_img_placeholder" style="display: block; margin: 0 auto; max-width: 100%; border: 3px none rgba(0, 0, 0, 0.4); border-radius: 8px; object-fit: fill; box-shadow: 0 0 rgba(0, 0, 0, 0); width: 669px !important; height: 449px !important" data-ratio="0.6711509715994021" data-type="png" data-w="669" data-imgfileid="100003067" data-src="https://mmbiz.qpic.cn/mmbiz_png/J8ufl5q3zgOJgw7WsL0LtID9Ov9wldTEEZQLXPzictpOCTpliaav6cXRHLp8TiaoEepUyia0HaicP75iaYg6fibQRJsMA/640?wx_fmt=png&from=appmsg&watermark=1#imgIndex=40" data-original-style="display: block;margin: 0px auto;max-width: 100%;border-style: none;border-width: 3px;border-color: rgba(0, 0, 0, 0.4);border-radius: 8px;object-fit: fill;box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;height: auto !important;" data-index="41"></span>
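<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>The implementation above uses a slope of 0.2 and saturates at ±2.5. That is one common convention, used for example by Keras 2's hard_sigmoid, but other libraries choose a different slope. PyTorch's torch.nn.Hardsigmoid, for instance, computes x/6 + 1/2 clipped to [0, 1], so it saturates at ±3. The NumPy sketch below compares both piecewise-linear variants with the exact sigmoid. The function names hard_sigmoid_02 and hard_sigmoid_16 are illustrative only.</span></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px">import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hard_sigmoid_02(x):
    # Slope 0.2, saturates outside [-2.5, 2.5] (same as hard_sigmoid above)
    return np.clip(0.2 * x + 0.5, 0.0, 1.0)

def hard_sigmoid_16(x):
    # Slope 1/6, saturates outside [-3, 3] (the x/6 + 1/2 variant)
    return np.clip(x / 6.0 + 0.5, 0.0, 1.0)

x = np.linspace(-10, 10, 2001)
err_02 = np.max(np.abs(hard_sigmoid_02(x) - sigmoid(x)))
err_16 = np.max(np.abs(hard_sigmoid_16(x) - sigmoid(x)))
print(f"max |error|, slope 0.2: {err_02:.4f}")
print(f"max |error|, slope 1/6: {err_16:.4f}")
</code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>On this grid both variants stay within 0.08 of the exact sigmoid. For the 0.2-slope version, the largest gap occurs at the kinks x = ±2.5. At those points the approximation already saturates to exactly 0 or 1, whereas σ(2.5) ≈ 0.924.</span></p>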
<h3 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-radius: 0; box-shadow: none; display: flex; flex-direction: unset; float: unset; height: auto; justify-content: center; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 17px; color: rgba(89, 89, 89, 1); border-bottom: 2px solid rgba(222, 198, 251, 1); line-height: 1.5em; letter-spacing: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-top: 1px none rgba(0, 0, 0, 1); border-left: 1px none rgba(0, 0, 0, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; box-shadow: none; display: inline; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; padding: 0; margin: 0"><span>QuantReLU(Quantized ReLU)</span></span></h3>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>QuantReLU is not a "new" nonlinearity. It is ReLU combined with a quantization step, and it is used to simulate or implement low-precision neural networks (e.g., INT8, INT4, or even binarized networks).</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Main applications:</span></p>
<ul class="list-paddingleft-1" style="list-style-type: circle; margin: 8px 0; padding: 0 0 0 25px; color: rgba(0, 0, 0, 1)">
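<li><span>Model compression and acceleration: deploying lightweight models on mobile, embedded, and edge devices;</span></li>
<li><span>Quantization-aware training (QAT): simulating quantization error during training so that the model stays accurate after it is quantized;</span></li>
<li><span>Low-bit neural networks: improving inference efficiency on hardware with fixed-point (integer) arithmetic, such as TPUs and NPUs.</span></li>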
</ul>
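<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>The bit widths mentioned above map directly onto the number of quantization steps. An unsigned b-bit code can represent 2^b values, which corresponds to 2^b - 1 steps over the clipping range. The sketch below assumes unsigned uniform quantization with a fixed clipping value x_max. The helper quant_relu_bits and both of its parameters are hypothetical and illustrative, not part of any library. It returns the integer codes that low-bit hardware would store, together with the dequantized floats that fake quantization passes to the next layer.</span></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px">import numpy as np

def quant_relu_bits(x, bits=4, x_max=1.0):
    """Hypothetical helper: ReLU followed by unsigned uniform quantization to `bits` bits.

    Returns the integer codes and the dequantized float values.
    """
    levels = 2 ** bits - 1               # e.g. 4 bits: 15 steps, 16 codes
    scale = x_max / levels               # width of one quantization step
    x_clipped = np.clip(x, 0.0, x_max)   # ReLU, then clip to [0, x_max]
    codes = np.round(x_clipped / scale).astype(np.int32)
    return codes, codes * scale

x = np.linspace(-1.0, 2.0, 1001)
for bits in (2, 4, 8):
    codes, y = quant_relu_bits(x, bits=bits, x_max=1.0)
    print(bits, "bits:", len(np.unique(codes)), "distinct codes, max code", codes.max())
</code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>With bits=2 the output has only four possible values, while bits=8 yields the 256 codes of an unsigned 8-bit activation. In quantization-aware training the next layer receives the dequantized values, so the rounding error is already present while the model trains.</span></p>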
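<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Formula</span></strong></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>$$\mathrm{QuantReLU}(x) = \frac{1}{L}\,\mathrm{round}\left(L \cdot \mathrm{clip}(x, 0, 1)\right)$$</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Here $L$ is the number of quantization steps (levels in the code) and $\mathrm{clip}(x, 0, 1) = \min(\max(x, 0), 1)$. The output therefore takes one of the $L + 1$ values $0, \frac{1}{L}, \frac{2}{L}, \dots, 1$.</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Implementation</span></strong></p>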
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px">def</span> <span style="color: rgba(97, 174, 238, 1); line-height: 26px">quant_relu</span>(x, levels=<span style="color: rgba(209, 154, 102, 1); line-height: 26px">4</span>):
    <span style="color: rgba(152, 195, 121, 1); line-height: 26px">"""
    Quantized ReLU: restricts the ReLU output to a finite set of discrete values.
    The interval [0, 1] is split evenly into levels + 1 discrete values
    (e.g. levels=4 gives 0, 0.25, 0.5, 0.75, 1.0).

    Args:
        x: input array
        levels: number of quantization steps (e.g. 4 means a step size of 0.25)
    Returns:
        the quantized ReLU output
    """</span>
    x_clipped = np.clip(x, <span style="color: rgba(209, 154, 102, 1); line-height: 26px">0</span>, <span style="color: rgba(209, 154, 102, 1); line-height: 26px">1</span>)  <span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"># apply ReLU, then clip to [0, 1]</span>
    <span style="color: rgba(198, 120, 221, 1); line-height: 26px">return</span> np.round(x_clipped * levels) / levels

<span style="color: rgba(198, 120, 221, 1); line-height: 26px">def</span> <span style="color: rgba(97, 174, 238, 1); line-height: 26px">quant_relu_grad</span>(x, levels=<span style="color: rgba(209, 154, 102, 1); line-height: 26px">4</span>):
    <span style="color: rgba(152, 195, 121, 1); line-height: 26px">"""
    Gradient of QuantReLU. Rounding has zero derivative almost everywhere, so
    backpropagation usually relies on the STE (Straight-Through Estimator):
    the gradient is treated as 1 inside the unclipped interval (0, 1) and 0 elsewhere.
    """</span>
    <span style="color: rgba(198, 120, 221, 1); line-height: 26px">return</span> ((x &gt; <span style="color: rgba(209, 154, 102, 1); line-height: 26px">0</span>) &amp; (x &lt; <span style="color: rgba(209, 154, 102, 1); line-height: 26px">1</span>)).astype(float)

plot_activation(<span style="color: rgba(198, 120, 221, 1); line-height: 26px">lambda</span> x: quant_relu(x, levels=<span style="color: rgba(209, 154, 102, 1); line-height: 26px">4</span>), <span style="color: rgba(198, 120, 221, 1); line-height: 26px">lambda</span> x: quant_relu_grad(x, levels=<span style="color: rgba(209, 154, 102, 1); line-height: 26px">4</span>), <span style="color: rgba(152, 195, 121, 1); line-height: 26px">'QuantReLU'</span>)
</code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Plot</span></strong></p>
<span><img src="https://img2024.cnblogs.com/blog/609124/202509/609124-20250928123545071-988419985.png" alt="Image" class="rich_pages wxw-img js_img_placeholder wx_img_placeholder" style="display: block; margin: 0 auto; max-width: 100%; border: 3px none rgba(0, 0, 0, 0.4); border-radius: 8px; object-fit: fill; box-shadow: 0 0 rgba(0, 0, 0, 0); width: 669px !important; height: 449px !important" data-ratio="0.6711509715994021" data-type="png" data-w="669" data-imgfileid="100003066" data-src="https://mmbiz.qpic.cn/mmbiz_png/J8ufl5q3zgOJgw7WsL0LtID9Ov9wldTEEdkVDfKJiaJSYT3Mz7bNO9P9Icew120Fe3Xyrib8ic7xzxCVBgVHnRLxw/640?wx_fmt=png&from=appmsg&watermark=1#imgIndex=41" data-original-style="display: block;margin: 0px auto;max-width: 100%;border-style: none;border-width: 3px;border-color: rgba(0, 0, 0, 0.4);border-radius: 8px;object-fit: fill;box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;height: auto !important;" data-index="42"></span>
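<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Quantized activations need a surrogate gradient such as the STE because the true derivative of the rounding step is zero almost everywhere. Without a surrogate, plain backpropagation would send no signal back through the activation. The small numerical check below makes this concrete. It re-declares the quant_relu forward pass so that the snippet runs on its own. The name ste_grad is illustrative, and it follows the clipped-STE convention, in which the gradient is 1 only inside (0, 1).</span></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px">import numpy as np

def quant_relu(x, levels=4):
    # Same forward pass as above: ReLU, clip to [0, 1], round to levels + 1 values
    return np.round(np.clip(x, 0, 1) * levels) / levels

def ste_grad(x):
    # Clipped straight-through estimator: treat round() as the identity,
    # so the gradient is 1 inside (0, 1) and 0 where clipping is active
    return np.logical_and(np.greater(x, 0), np.less(x, 1)).astype(float)

# Points chosen away from the rounding thresholds (0.125, 0.375, 0.625, 0.875 for levels=4)
x = np.array([-0.5, 0.05, 0.3, 0.55, 0.8, 1.5])
h = 1e-4
numeric = (quant_relu(x + h) - quant_relu(x - h)) / (2 * h)
print("finite-difference gradient:", numeric)
print("STE gradient:              ", ste_grad(x))
</code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Every finite difference comes out as exactly zero, while the STE still passes a gradient of 1 through the unclipped region. That difference is why QAT trains with the estimator rather than the true derivative.</span></p>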
<h3 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-radius: 0; box-shadow: none; display: flex; flex-direction: unset; float: unset; height: auto; justify-content: center; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 17px; color: rgba(89, 89, 89, 1); border-bottom: 2px solid rgba(222, 198, 251, 1); line-height: 1.5em; letter-spacing: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); border-top: 1px none rgba(0, 0, 0, 1); border-left: 1px none rgba(0, 0, 0, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; box-shadow: none; display: inline; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; padding: 0; margin: 0"><span>LUT-based Activation</span></span></h3>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>LUT-based Activation is a general and flexible way to model nonlinearity. It treats the activation function as a "black-box" mapping and evaluates it by table lookup instead of an analytic expression.</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Main applications:</span></p>
<ul class="list-paddingleft-1" style="list-style-type: circle; margin: 8px 0; padding: 0 0 0 25px; color: rgba(0, 0, 0, 1)">
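<li><span>Hardware acceleration and edge computing: on FPGAs, ASICs, and low-power chips, a table lookup is faster and more energy-efficient than evaluating functions such as exp, tanh, or erf;</span></li>
<li><span>Model compression: replacing expensive activations (e.g., GELU, Swish) with a small LUT to reduce compute cost;</span></li>
<li><span>Learnable activations: letting the network learn the shape of its own nonlinearity (e.g., PAU, Padé Activation Units);</span></li>
<li><span>Neuromorphic computing: emulating the nonlinear response of biological neurons.</span></li>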
</ul>
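<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice electric编辑器"><span>To make the model-compression use case concrete, the sketch below samples Swish, x·σ(x), into tables of several sizes. The tables cover an assumed input range of [-8, 8]. For each size it measures the worst-case error of linearly interpolated lookup (via np.interp) on inputs inside that range. The range, the table sizes, and the helper names build_lut and lut_lookup are illustrative assumptions. Outside the range, the table simply clamps to its end values.</span></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px">import numpy as np

def swish(x):
    return x / (1.0 + np.exp(-x))   # x * sigmoid(x)

def build_lut(f, lo, hi, n):
    # Sample f at n evenly spaced knots covering [lo, hi]
    knots = np.linspace(lo, hi, n)
    return knots, f(knots)

def lut_lookup(x, knots, table):
    # Linear interpolation between neighboring entries; np.interp clamps
    # inputs outside [knots[0], knots[-1]] to the first/last table value
    return np.interp(x, knots, table)

x_test = np.linspace(-8, 8, 100001)
errors = {}
for n in (16, 64, 256):
    knots, table = build_lut(swish, -8.0, 8.0, n)
    errors[n] = np.max(np.abs(lut_lookup(x_test, knots, table) - swish(x_test)))
    print(f"{n:4d} entries: max |error| = {errors[n]:.2e}")
</code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>For a smooth function, linear-interpolation error shrinks roughly with the square of the knot spacing. A table of a few hundred entries is therefore usually enough to approximate an activation like Swish closely inside the covered range.</span></p>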
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Advantages:</span></p>
<ul class="list-paddingleft-1" style="list-style-type: circle; margin: 8px 0; padding: 0 0 0 25px; color: rgba(0, 0, 0, 1)">
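<li><span>Efficient: an O(1) lookup per element, well suited to resource-constrained devices;</span></li>
<li><span>Expressive: with enough entries it can approximate any continuous one-dimensional function over the covered input range;</span></li>
<li><span>Easy to implement in hardware: it needs only memory and indexing logic (plus a little arithmetic when interpolating);</span></li>
<li><span>Learnable: the LUT entries can be trained as parameters.</span></li>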
</ul>
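<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>When the input is already an integer code, as in an INT8 pipeline, no interpolation is needed. The table can hold one precomputed output for every possible input code, so inference reduces to a single array index. This is the case where the O(1) and "memory plus indexing" points above apply most directly. The minimal sketch below assumes symmetric INT8 inputs, a hypothetical dequantization scale of 0.05, and tanh as the target function. Real hardware would usually quantize the table outputs as well.</span></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px">import numpy as np

IN_SCALE = 0.05                       # assumed dequantization scale: real value = code * IN_SCALE
codes = np.arange(-128, 128)          # every possible int8 input code
table = np.tanh(codes * IN_SCALE)     # 256 precomputed outputs, one per code (2 KB as float64)

def int8_lut_tanh(q):
    # q: int8 codes. Widen to int16 before adding the offset so that 127 + 128
    # cannot overflow int8; code -128 then maps to table index 0.
    return table[q.astype(np.int16) + 128]

q = np.array([-128, -20, 0, 20, 127], dtype=np.int8)
print(table.size, "entries")
print(int8_lut_tanh(q))
print(np.tanh(q * IN_SCALE))          # direct evaluation, for comparison
</code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>The whole table has 256 entries: 2 KB as float64, or 256 bytes if the outputs are stored as int8. Each element costs one index operation, however expensive the original function is.</span></p>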
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Disadvantages:</span></p>
<ul class="list-paddingleft-1" style="list-style-type: circle; margin: 8px 0; padding: 0 0 0 25px; color: rgba(0, 0, 0, 1)">
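<li><span>Memory cost: a high-resolution LUT needs many entries, and its size doubles with every extra input bit (e.g., 256 entries for 8-bit inputs, 1024 for 10-bit);</span></li>
<li><span>Curse of dimensionality: the table grows exponentially with the number of inputs, so LUTs are hard to extend to multi-input activations (e.g., a mapping of (x, y)) and are in practice limited to element-wise operations;</span></li>
<li><span>Interpolation error: inputs that fall between table entries must be interpolated (linearly or by nearest neighbor), which introduces approximation error;</span></li>
<li><span>Training difficulty: each input updates only the one or two entries it touches, so gradients are sparse, and nearest-neighbor lookup is not differentiable. Training therefore relies on an STE (straight-through estimator) or on smooth interpolation.</span></li>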
</ul>
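<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>The last two drawbacks show up directly in a learnable LUT. With linear interpolation, each input touches only its two neighboring entries, with weights 1 - w and w, where w is the fractional position between the knots. The gradient of one output with respect to the table is therefore nonzero in at most two places. The sketch below fits a 17-entry table over [-4, 4] to tanh by plain gradient descent on a mean-squared error, using np.add.at to accumulate the sparse per-sample gradients. The target function, table size, learning rate, and step count are all illustrative assumptions.</span></p>
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px">import numpy as np

rng = np.random.default_rng(0)
lo, hi, n = -4.0, 4.0, 17
table = np.zeros(n)                       # learnable LUT entries, start from all zeros

def lut_forward(x, table):
    # Fractional index t in [0, n - 1]; i = left knot, w = weight of the right knot
    t = (np.clip(x, lo, hi) - lo) / (hi - lo) * (n - 1)
    i = np.minimum(np.floor(t).astype(int), n - 2)
    w = t - i
    return (1 - w) * table[i] + w * table[i + 1], i, w

lr = 2.0
for _ in range(1000):
    x = rng.uniform(lo, hi, size=256)
    y, i, w = lut_forward(x, table)
    err = y - np.tanh(x)                  # d(MSE)/dy, up to a constant factor of 2
    grad = np.zeros(n)
    # Sparse update: each sample only contributes to entries i and i + 1
    np.add.at(grad, i, err * (1 - w))
    np.add.at(grad, i + 1, err * w)
    table -= lr * grad / len(x)

x_test = np.linspace(lo, hi, 1001)
max_err = np.max(np.abs(lut_forward(x_test, table)[0] - np.tanh(x_test)))
print(f"max |error| after training: {max_err:.3f}")
</code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>An entry that no training input ever lands near would never move, which is the sparse-gradient problem in practice. Here the inputs are sampled uniformly over the whole table range, so every entry receives updates.</span></p>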
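<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Formula</span></strong></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>$$\tilde{x} = \mathrm{clip}(x, a, b), \qquad t = \frac{\tilde{x} - a}{b - a}\,(N - 1)$$</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>$$i = \min(\lfloor t \rfloor, N - 2), \qquad w = t - i, \qquad f(x) = (1 - w)\,T[i] + w\,T[i + 1]$$</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>Here $T$ is a table of $N$ entries sampled at evenly spaced points of the input range $[a, b]$ (lut and input_range in the code), and $w$ is the linear-interpolation weight. This is the computation that np.interp performs in the implementation.</span></p>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Implementation</span></strong></p>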
<pre data-tool="mdnice编辑器"><code style="overflow-x: auto; padding: 15px 16px 16px; color: rgba(171, 178, 191, 1); background: rgba(40, 44, 52, 1); border-radius: 5px; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px"><span style="color: rgba(198, 120, 221, 1); line-height: 26px">def</span> <span style="color: rgba(97, 174, 238, 1); line-height: 26px">lut_activation</span>(x, lut=<span style="color: rgba(86, 182, 194, 1); line-height: 26px">None</span>, input_range=(-5, 5)):
    <span style="color: rgba(152, 195, 121, 1); line-height: 26px">"""
    Activation function based on a lookup table (LUT).

    Args:
        x: input array
        lut: lookup table of shape (N,), holding the outputs at N evenly spaced points of input_range
        input_range: input range covered by the LUT; inputs outside it are clipped to its ends

    Returns:
        activation values obtained by linear interpolation
    """</span>
    <span style="color: rgba(198, 120, 221, 1); line-height: 26px">if</span> lut <span style="color: rgba(198, 120, 221, 1); line-height: 26px">is</span> <span style="color: rgba(86, 182, 194, 1); line-height: 26px">None</span>:
        <span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"># By default, use a Sigmoid-like table as an example</span>
        lut = np.array([0.00, 0.05, 0.12, 0.20, 0.30, 0.40, 0.50,
                        0.60, 0.70, 0.80, 0.88, 0.95, 1.00])

    x_clipped = np.clip(x, input_range[0], input_range[1])
    x_norm = (x_clipped - input_range[0]) / (input_range[1] - input_range[0])  <span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"># normalize to [0, 1]</span>
    indices = x_norm * (len(lut) - 1)  <span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"># fractional table index in [0, N - 1]</span>

    <span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"># Linear interpolation between neighboring table entries</span>
    <span style="color: rgba(198, 120, 221, 1); line-height: 26px">return</span> np.interp(indices, np.arange(len(lut)), lut)

<span style="color: rgba(198, 120, 221, 1); line-height: 26px">def</span> <span style="color: rgba(97, 174, 238, 1); line-height: 26px">lut_activation_grad</span>(x, lut=<span style="color: rgba(86, 182, 194, 1); line-height: 26px">None</span>, input_range=(-5, 5)):
    <span style="color: rgba(152, 195, 121, 1); line-height: 26px">"""
    Approximate gradient of the LUT activation (central finite difference)
    """</span>
    h = 1e-5
    <span style="color: rgba(198, 120, 221, 1); line-height: 26px">return</span> (lut_activation(x + h, lut, input_range) - lut_activation(x - h, lut, input_range)) / (2 * h)

<span style="color: rgba(92, 99, 112, 1); font-style: italic; line-height: 26px"># An example LUT (simulating a nonlinear transform)</span>
example_lut = np.array([0.0, 0.1, 0.15, 0.25, 0.4, 0.6, 0.75, 0.85, 0.9, 0.95, 1.0])

plot_activation(
    <span style="color: rgba(198, 120, 221, 1); line-height: 26px">lambda</span> x: lut_activation(x, lut=example_lut, input_range=(-5, 5)),
    <span style="color: rgba(198, 120, 221, 1); line-height: 26px">lambda</span> x: lut_activation_grad(x, lut=example_lut, input_range=(-5, 5)),
    <span style="color: rgba(152, 195, 121, 1); line-height: 26px">'LUT-based Activation'</span>
)
</code></pre>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><strong style="color: rgba(145, 109, 213, 1); font-weight: bold; background: none left top / auto no-repeat scroll padding-box border-box rgba(0, 0, 0, 0); width: auto; height: auto; border-radius: 0; border: 3px none rgba(0, 0, 0, 0.4); padding: 0; margin: 0"><span>Plot</span></strong></p>
<span><img src="https://img2024.cnblogs.com/blog/609124/202509/609124-20250928123545071-988419985.png" alt="Image" class="rich_pages wxw-img js_img_placeholder wx_img_placeholder" style="display: block; margin: 0 auto; max-width: 100%; border: 3px none rgba(0, 0, 0, 0.4); border-radius: 8px; object-fit: fill; box-shadow: 0 0 rgba(0, 0, 0, 0); width: 669px !important; height: 449px !important" data-ratio="0.6711509715994021" data-type="png" data-w="669" data-imgfileid="100003069" data-src="https://mmbiz.qpic.cn/mmbiz_png/J8ufl5q3zgOJgw7WsL0LtID9Ov9wldTEatLEKBP2UrrKwa5ictB5ovkKicrL9KdlhIacelaiaY5aF3gsnUghacovg/640?wx_fmt=png&from=appmsg&watermark=1#imgIndex=42" data-original-style="display: block;margin: 0px auto;max-width: 100%;border-style: none;border-width: 3px;border-color: rgba(0, 0, 0, 0.4);border-radius: 8px;object-fit: fill;box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;height: auto !important;" data-index="43"></span>
<h2 style="margin: 30px 0 15px; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box unset; border-radius: 0; box-shadow: none; display: block; flex-direction: unset; float: unset; height: auto; justify-content: unset; line-height: 1.5em; overflow-x: unset; overflow-y: unset; text-align: left; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; border: 1px none rgba(0, 0, 0, 1); padding: 0" data-tool="mdnice编辑器"><span style="font-size: 18px; color: rgba(89, 89, 89, 1); line-height: 1.8em; letter-spacing: 0; padding: 0 0 0 10px; border-top: 1px none rgba(0, 0, 0, 1); border-bottom: 1px none rgba(0, 0, 0, 1); border-left: 5px solid rgba(222, 198, 251, 1); border-right: 1px none rgba(0, 0, 0, 1); border-radius: 0; align-items: unset; background: none left top / auto no-repeat scroll padding-box border-box unset; box-shadow: none; display: block; font-weight: bold; flex-direction: unset; float: unset; height: auto; justify-content: unset; overflow-x: unset; overflow-y: unset; text-align: left; text-indent: 0; text-shadow: none; transform: none; width: auto; -webkit-box-reflect: unset; margin: 0"><span>Overall Comparison</span></span></h2>
<p style="color: rgba(89, 89, 89, 1); font-size: 14px; line-height: 1.8em; letter-spacing: 0.02em; text-align: left; text-indent: 0; padding: 8px 0; margin: 0" data-tool="mdnice编辑器"><span>The table below summarizes the properties of more than 20 commonly used activation functions. It compares them by mathematical form, output range, differentiability, zero-centeredness, computational cost, typical use cases, advantages, and disadvantages.</span></p>
<p><img src="https://img2024.cnblogs.com/blog/609124/202509/609124-20250928123428662-633691049.png" alt="image" loading="lazy"></p>
</div>
<div id="MySignature" role="contentinfo">
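[Note: This article comes from the cnblogs blog "小溪的博客" (Xiaoxi's Blog). Unless otherwise stated, all content is original. Please do not use it for commercial purposes, and credit the source when reposting: http://www.cnblogs.com/xiaoxi666/]<br><br>
Source: https://www.cnblogs.com/xiaoxi666/p/19116433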