数据分析师的SQL功底该学到什么程度?
<p><img title="数据分析师的SQL功底该学到什么程度?" alt="数据分析师的SQL功底该学到什么程度?" border="0" height="auto" src="https://zhuji.jb51.net/uploads/img/202305/0210f51302d5a883b6f28fc443495dad.jpg" width="auto"></p>
<p>
常有朋友问,数据分析师的SQL功底该学到什么程度。今天就先谈谈 T-SQL 中的 Window Function.</p>
<p>
Window Function 包含了 4 个大类。分别是:</p>
<ul>
<li>
1 - Rank Function</li>
<li>
2 - Aggregate Function</li>
<li>
3 - Offset Function</li>
<li>
4 - Distribution Function.</li>
</ul>
<h3>
1 - Rank Function 平常用到最多</h3>
<ul>
<li>
1.1 Rank() Over()</li>
<li>
1.2 Row_Number() Over()</li>
<li>
1.3 Dense_Rank() Over()</li>
<li>
1.4 NTILE(N) Over()</li>
</ul>
<p>
这四个函数,要注意的地方有两点:</p>
<p>
<strong>a. Rank() Over() 与 Row_Number() Over() :</strong></p>
<p>
两者唯一的区别,就在于Row_Number() Over() 真正实现了相同条件的两条或者多条记录是用唯一值来区别的</p>
<p>
<strong>b. Rank() Over() 与 Dense_Rank() Over() :</strong></p>
<p>
这两者的区别,在于他们对位于相同排名之后的名次,是接着相同排名的连续数(Rank) 还是相隔 N 个相同记录个数之后的连续数(Dense_Rank)。</p>
<p>
所以 Rank 出来的结果都是连续数字,而 Dense_Rank 出来的结果有可能有跳格数。</p>
<p>
<strong>c. 除了有用法上的区别外,顺带说说分页的实现:</strong></p>
<p>
第一种,我们平常用 Row_Number() 加 Top (N) 来实现 :</p>
<ol class="dp-sql">
<li class="alt">
<span><span class="keyword">select</span><span> </span><span class="keyword">top</span><span>(100) * </span></span>
</li>
<li>
<span> </span>
</li>
<li class="alt">
<span> <span class="keyword">from</span><span> ( </span><span class="keyword">select</span><span> </span></span>
</li>
<li>
<span> </span>
</li>
<li class="alt">
<span> OrderId </span>
</li>
<li>
<span> </span>
</li>
<li class="alt">
<span> , OrderMonth </span>
</li>
<li>
<span> </span>
</li>
<li class="alt">
<span> , OrderAmount </span>
</li>
<li>
<span> </span>
</li>
<li class="alt">
<span> , Row_Number() Over( </span>
</li>
<li>
<span> </span>
</li>
<li class="alt">
<span> OrderBy OrderAmount <span class="keyword">DESC</span><span>) </span></span>
</li>
<li>
<span> </span>
</li>
<li class="alt">
<span> <span class="keyword">AS</span><span> Amt_Order </span></span>
</li>
<li>
<span> </span>
</li>
<li class="alt">
<span> <span class="keyword">from</span><span> FctSales) tmp </span></span>
</li>
<li>
<span> </span>
</li>
<li class="alt">
<span> <span class="keyword">Where</span><span> Amt_Order </span><span class="op">between</span><span> 2000 </span><span class="op">and</span><span> 3000 </span></span>
</li>
</ol>
<p>
第二种,SQL Server 2012 之后的新功能:</p>
<ol class="dp-sql">
<li class="alt">
<span><span class="keyword">Select</span><span> OrderId </span></span>
</li>
<li>
<span> </span>
</li>
<li class="alt">
<span> , OrderMonth </span>
</li>
<li>
<span> </span>
</li>
<li class="alt">
<span> , OrderAmount </span>
</li>
<li>
<span> </span>
</li>
<li class="alt">
<span> <span class="keyword">From</span><span> FctSales </span></span>
</li>
<li>
<span> </span>
</li>
<li class="alt">
<span> <span class="keyword">Order</span><span> </span><span class="keyword">by</span><span> OrderAmount </span><span class="keyword">Desc</span><span> </span></span>
</li>
<li>
<span> </span>
</li>
<li class="alt">
<span> </span>
</li>
<li>
<span> </span>
</li>
<li class="alt">
<span> OffSet 2000 <span class="keyword">ROWS</span><span> </span></span>
</li>
<li>
<span> </span>
</li>
<li class="alt">
<span> <span class="keyword">Fetch</span><span> </span><span class="keyword">Next</span><span> 100 </span><span class="keyword">ROWS</span><span> </span><span class="keyword">Only</span><span> </span></span>
</li>
</ol>
<p>
按照量的大小倒序排,取第 2000 条后的记录中前 100 条。</p>
<h3>
2 - Aggregate Function. 聚合数据</h3>
<ul>
<li>
2.1 - Sum() Over()</li>
<li>
2.2 - Count() Over()</li>
<li>
2.3 - AVG() Over()</li>
<li>
2.4 - MIN() Over()</li>
<li>
2.5 - MAX() Over()</li>
</ul>
<p>
在使用 Aggregation 函数的时候,唯一要注意的地方就是 Order 子句。</p>
<ol class="dp-sql">
<li class="alt">
<span><span>function_name(<arguments>) Over( </span></span>
</li>
<li>
<span> </span>
</li>
<li class="alt">
<span> [ <window partition clause>] </span>
</li>
<li>
<span> </span>
</li>
<li class="alt">
<span> [ <window <span class="keyword">Order</span><span> clause> </span></span>
</li>
<li>
<span> </span>
</li>
<li class="alt">
<span> [ <window frame clause>] </span>
</li>
<li>
<span> </span>
</li>
<li class="alt">
<span> ]) </span>
</li>
</ol>
<p>
Over::</p>
<ol class="dp-sql">
<li class="alt">
<span><span>Over( </span></span>
</li>
<li>
<span> </span>
</li>
<li class="alt">
<span> [ <PARTITION <span class="keyword">BY</span><span> clause> ] </span></span>
</li>
<li>
<span> </span>
</li>
<li class="alt">
<span> [ <<span class="keyword">ORDER</span><span> </span><span class="keyword">BY</span><span> clause> ] </span></span>
</li>
<li>
<span> </span>
</li>
<li class="alt">
<span> [ <ROW <span class="op">or</span><span> RANGE clause> ] </span></span>
</li>
<li>
<span> </span>
</li>
<li class="alt">
<span> ) </span>
</li>
</ol>
<p>
::窗口中的窗口</p>
<ol class="dp-sql">
<li class="alt">
<span><span class="keyword">ROWS</span><span> | RANGE </span></span>
</li>
<li>
<span> </span>
</li>
<li class="alt">
<span> <span class="op">BETWEEN</span><span> </span></span>
</li>
<li>
<span> </span>
</li>
<li class="alt">
<span> UNBOUNDED PRECDEDING | </span>
</li>
<li>
<span> </span>
</li>
<li class="alt">
<span> <N> PRECEDING | </span>
</li>
<li>
<span> </span>
</li>
<li class="alt">
<span> <N> FOLLOWING | </span>
</li>
<li>
<span> </span>
</li>
<li class="alt">
<span> <span class="keyword">CURRENT</span><span> ROW </span></span>
</li>
<li>
<span> </span>
</li>
<li class="alt">
<span> <span class="op">AND</span><span> </span></span>
</li>
<li>
<span> </span>
</li>
<li class="alt">
<span> UNBOUNDED FOLLOWING | </span>
</li>
<li>
<span> </span>
</li>
<li class="alt">
<span> <N> PRECEDING | </span>
</li>
<li>
<span> </span>
</li>
<li class="alt">
<span> <N> FOLLOWING | </span>
</li>
<li>
<span> </span>
</li>
<li class="alt">
<span> <span class="keyword">CURRENT</span><span> ROW </span></span>
</li>
</ol>
<p>
举一个例子:</p>
<ol class="dp-sql">
<li class="alt">
<span><span class="keyword">select</span><span> custid </span></span>
</li>
<li>
<span> </span>
</li>
<li class="alt">
<span> , ordermonth </span>
</li>
<li>
<span> </span>
</li>
<li class="alt">
<span> , ordervolume </span>
</li>
<li>
<span> </span>
</li>
<li class="alt">
<span> , <span class="func">sum</span><span>(ordervolume) </span></span>
</li>
<li>
<span> </span>
</li>
<li class="alt">
<span> over( partition <span class="keyword">by</span><span> custid </span></span>
</li>
<li>
<span> </span>
</li>
<li class="alt">
<span> <span class="keyword">order</span><span> </span><span class="keyword">by</span><span> ordermonth </span><span class="keyword">asc</span><span> </span></span>
</li>
<li>
<span> </span>
</li>
<li class="alt">
<span> <span class="keyword">rows</span><span> </span><span class="op">between</span><span> </span></span>
</li>
<li>
<span> </span>
</li>
<li class="alt">
<span> unbounded preceding </span>
</li>
<li>
<span> </span>
</li>
<li class="alt">
<span> <span class="op">and</span><span> </span><span class="keyword">current</span><span> row) </span></span>
</li>
<li>
<span> </span>
</li>
<li class="alt">
<span> <span class="keyword">as</span><span> cumulatedVolume </span></span>
</li>
<li>
<span> </span>
</li>
<li class="alt">
<span> <span class="keyword">from</span><span> FctSales </span></span>
</li>
</ol>
<p>
统计了截止到目前为止,每一天的累计总量。</p>
<h3>
3 - Offset Function:定位记录</h3>
<ul>
<li>
3.1 Lead()</li>
<li>
3.2 LAG()</li>
<li>
3.3 First_Value()</li>
<li>
3.4 Last_Value()</li>
<li>
3.5 Nth_Value()</li>
</ul>
<p>
这一类比较好理解,根据当前的记录,获取前后 N 条数据。</p>
<h3>
4 - Distribution Function: 分布函数</h3>
<ul>
<li>
4.1- PERCENT_RANK()</li>
<li>
4.2 - CUME_DIST()</li>
<li>
4.3 - PERCENT_COUNT()-</li>
<li>
4.4 - PERCENT_DISC()</li>
</ul>
<p>
这一类应用,到目前为止,未用过。适用于财会类的统计。</p>
<p>
原文链接:https://mp.weixin.qq.com/s/vnmeSZWEKThfOmHsusDGjA</p>
頁:
[1]