舟越大地 發表於 2020-4-13 17:44:00

【Mongodb】聚合查询 && 固定集合

<p class="ws-title">概述</p>
<div class="ws-content">
<p class="line" data-line="2">数据存储是为了可查询,统计。若数据只需存储,不需要查询,这种数据也没有多大价值</p>
<p class="line" data-line="4">本篇介绍Mongodb</p>
<ul>
<li>聚合查询(Aggregation)</li>
<li>固定集合(Capped Collections)</li>
</ul>
<p>&nbsp;</p>
</div>
<p class="ws-title">准备工作</p>
<div class="ws-content">
<p class="line" data-line="11">准备10000条数据</p>
<div class="cnblogs_code">
<pre><span style="color: rgba(0, 0, 255, 1)">var</span> orders = <span style="color: rgba(0, 0, 255, 1)">new</span><span style="color: rgba(0, 0, 0, 1)"> Array();
</span><span style="color: rgba(0, 0, 255, 1)">for</span> (<span style="color: rgba(0, 0, 255, 1)">var</span> i = 10000; i &lt; 20000; i++<span style="color: rgba(0, 0, 0, 1)">) {
    orders </span>=<span style="color: rgba(0, 0, 0, 1)"> {
      orderNo: i </span>+ Math.random().toString().substr(3, 3<span style="color: rgba(0, 0, 0, 1)">),
      price: Math.round(Math.random() </span>* 10000) / 100<span style="color: rgba(0, 0, 0, 1)">,
      qty: Math.floor(Math.random() </span>* 10) + 1<span style="color: rgba(0, 0, 0, 1)">,
      orderTime: </span><span style="color: rgba(0, 0, 255, 1)">new</span> Date(<span style="color: rgba(0, 0, 255, 1)">new</span> Date().setSeconds(Math.floor(Math.random() * 10000<span style="color: rgba(0, 0, 0, 1)">)))
    };
}
db.order.insert(orders);</span></pre>
</div>
<p>&nbsp;</p>
<p>&nbsp;</p>
</div>
<p class="ws-title">聚合查询</p>
<div class="ws-content">
<p class="line" data-line="27">Mongodb的聚合函数操作都在db.collection.aggregate,通过定义聚合管道(一组规则),达到分组,统计等功能,下面介绍常用的几种聚合函数</p>
<p>&nbsp;</p>
<p><strong>分组管道($group)</strong></p>
<p>格式</p>
<div class="cnblogs_code">
<pre><span style="color: rgba(0, 0, 0, 1)">{
$group:
    {
      _id: </span>&lt;expression&gt;, <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> Group By Expression</span>
      &lt;field1&gt;: { &lt;accumulator1&gt; : &lt;expression1&gt;<span style="color: rgba(0, 0, 0, 1)"> },
      ...
    }
}</span></pre>
</div>
<p>_id 是分组字段,若指定_id = null 或常量字段,就是将整个结果集分组。</p>
<p class="line" data-line="41">分组统计字段格式{ &lt;accumulator1&gt; : &lt;expression1&gt; }</p>
<p>累计器操作(Accumulator Operator)参考Accumulator Operator</p>
<p>&nbsp;</p>
<p class="line" data-line="47">假设现在需要统计每天每个小时的订单总价格,平均价格,最大,最小,总订单数等</p>
<div class="cnblogs_code">
<pre><span style="color: rgba(0, 0, 0, 1)">db.order.aggregate([
    {
      $group: {
            </span><span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)">分组字段,这里用到$dateToString格式化,这里按小时统计</span>
            _id: { $dateToString: { format: "%Y-%m-%d %H", date: "$orderTime"<span style="color: rgba(0, 0, 0, 1)"> } },
            </span><span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)">总价格</span>
            totalPrice: { $sum: "$price"<span style="color: rgba(0, 0, 0, 1)"> },
            </span><span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)">分组第一个订单</span>
            firstOrder: { $first: "$orderNo"<span style="color: rgba(0, 0, 0, 1)"> },
            </span><span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)">分组最后一个订单</span>
            lastOrder: { $last: "$orderNo"<span style="color: rgba(0, 0, 0, 1)"> },
            </span><span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)">平均价格</span>
            averagePrice: { $avg: "$price"<span style="color: rgba(0, 0, 0, 1)"> },
            </span><span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)">最大价格</span>
            maxPrice: { $max: "$price"<span style="color: rgba(0, 0, 0, 1)"> },
            </span><span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)">最小价格</span>
            minPrice: { $min: "$price"<span style="color: rgba(0, 0, 0, 1)"> },
            </span><span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)">总订单数</span>
            totalOrders: { $sum: 1<span style="color: rgba(0, 0, 0, 1)"> },
      }
    }
])</span></pre>
</div>
<p class="line" data-line="73">返回结果</p>
<div class="cnblogs_code">
<pre>{ "_id" : "2020-04-12 15", "totalPrice" : 172813.68, "firstOrder" : "10000263", "lastOrder" : "19999275", "averagePrice" : 49.20662870159453, "maxPrice" : 99.94, "minPrice" : 0.01, "totalOrders" : 3512<span style="color: rgba(0, 0, 0, 1)"> }
{ </span>"_id" : "2020-04-12 13", "totalPrice" : 80943.98, "firstOrder" : "10004484", "lastOrder" : "19991554", "averagePrice" : 50.780414052697616, "maxPrice" : 99.81, "minPrice" : 0.08, "totalOrders" : 1594<span style="color: rgba(0, 0, 0, 1)"> }
{ </span>"_id" : "2020-04-12 14", "totalPrice" : 181710.15, "firstOrder" : "10001745", "lastOrder" : "19998830", "averagePrice" : 49.76996713229252, "maxPrice" : 99.93, "minPrice" : 0.01, "totalOrders" : 3651<span style="color: rgba(0, 0, 0, 1)"> }
{ </span>"_id" : "2020-04-12 16", "totalPrice" : 63356.12, "firstOrder" : "10002711", "lastOrder" : "19995793", "averagePrice" : 50.97032984714401, "maxPrice" : 99.95, "minPrice" : 0.01, "totalOrders" : 1243 }</pre>
</div>
<p>&nbsp;</p>
<p><strong>筛选管道($match)</strong></p>
<p>格式</p>
<div class="cnblogs_code">
<pre>{ $match: { &lt;query&gt; } }</pre>
</div>
<p>这个比较简单,就是筛选数据</p>
<p class="line" data-line="86">假设我现在需要筛选金额在(10,15)之间的</p>
<div class="cnblogs_code">
<pre><span style="color: rgba(0, 0, 0, 1)">db.orders.aggregate([
    {
      $match: {
            </span>"price": { $gt: 10, $lt: 15<span style="color: rgba(0, 0, 0, 1)"> }
      }
    }
])</span></pre>
</div>
<p>&nbsp;</p>
<p><strong>排序管道($sort)</strong></p>
<p>格式</p>
<div class="cnblogs_code">
<pre>{ $sort: { &lt;field1&gt;: &lt;sort order&gt;, &lt;field2&gt;: &lt;sort order&gt; ... } }</pre>
</div>
<p class="line" data-line="102">指定字段排序,1:升序,-1:倒序</p>
<p>&nbsp;</p>
<p><strong>限制条数($limit)</strong></p>
<p>格式</p>
<div class="cnblogs_code">
<pre>{ $limit: &lt;positive integer&gt; }</pre>
</div>
<p class="line" data-line="109">&nbsp;</p>
<p class="line" data-line="109">Mongodb的聚合管道有很多,具体不一一列出,参考Aggregation Pipeline Stages — MongoDB Manual</p>
<p>&nbsp;<img src="https://img2020.cnblogs.com/blog/413875/202004/413875-20200413172622810-829850374.png" alt=""></p>
<p>&nbsp;</p>
<p>带有(aggregation)就是都可以用于聚合管道</p>
<p>说了那么多,其实都没有使用Mongodb聚合函数最强大的功能,就是组合管道使用,查询我们需要数据,因为Mongodb提供的聚合管道函数非常多,所以组合起来使用是非常强大。</p>
值得注意是管道的顺序,Mongodb是按你定义的顺序,将每一步执行的结果集传给下一个管道处理,输出是最后一个管道的结果集,所以不同的管道顺序会有可能得到不是预期的结果,甚至报错(这种情况报错甚至比得到不是预期的结果可能还好)</div>
<div class="ws-content">假设现在按每天小时统计符合下列条件的订单</div>
<div class="ws-content">
<ul>
<li>订单金额大于10元 小于 50元 &amp;&amp; 数量小于等于5 and</li>
<li>去掉金额最小的50条订单 &amp;&amp; 去掉金额最大的50条订单 and</li>
<li>统计每个小时内订单数量,订单金额</li>
<li>按订单金额升序输出</li>
</ul>
<div class="cnblogs_code"><img id="code_img_closed_d60183a9-033a-4e62-907a-4a5043c56ab5" class="code_img_closed" src="https://images.cnblogs.com/OutliningIndicators/ContractedBlock.gif" alt=""><img id="code_img_opened_d60183a9-033a-4e62-907a-4a5043c56ab5" class="code_img_opened" style="display: none" src="https://images.cnblogs.com/OutliningIndicators/ExpandedBlockStart.gif" alt="">
<div id="cnblogs_code_open_d60183a9-033a-4e62-907a-4a5043c56ab5" class="cnblogs_code_hide">
<pre><span style="color: rgba(0, 0, 0, 1)">db.order.aggregate([
    {
      $match: { </span>"price": { $gt: 10, $lt: 50 }, "qty": { $lte: 5<span style="color: rgba(0, 0, 0, 1)"> } }
    },
    {
      $sort: {
            </span>"price": -1<span style="color: rgba(0, 0, 0, 1)">
      }
    },
    {
      $skip: </span>50<span style="color: rgba(0, 0, 0, 1)">
    },
    {
      $sort: {
            </span>"price": 1<span style="color: rgba(0, 0, 0, 1)">
      }
    },
    {
      $skip: </span>50<span style="color: rgba(0, 0, 0, 1)">
    },
    {
      $group: {
            _id: { $dateToString: { format: </span>"%Y-%m-%d %H", date: "$orderTime"<span style="color: rgba(0, 0, 0, 1)"> } },
            totalPrice: { $sum: </span>"$price"<span style="color: rgba(0, 0, 0, 1)"> },
            totalOrders: { $sum: </span>1<span style="color: rgba(0, 0, 0, 1)"> }

      }
    },
    {
      $sort: {
            </span>"totalPrice": 1<span style="color: rgba(0, 0, 0, 1)">
      }
    }
])</span></pre>
</div>
<span class="cnblogs_code_collapse">View Code</span></div>
<p><img src="https://img2020.cnblogs.com/blog/413875/202004/413875-20200413172825319-111607599.png" alt=""></p>
<p class="line" data-line="160">解决思路</p>
<ol>
<li>筛选符合条件的记录($match)</li>
<li>按金额倒序($sort:-1)</li>
<li>跳过金额最大的50条记录($skip:50)</li>
<li>按金额升序($sort:1)</li>
<li>跳过金额最小的50条记录($skip:50)</li>
<li>按每天每小时统计($group)</li>
<li>统计结果总金额升序($sort:1)</li>
</ol>
<p>&nbsp;</p>
</div>
<p class="ws-title">固定集合</p>
<div class="ws-content">
<p><strong>概述</strong></p>
<blockquote>
<p>capped-collection are fixed-size collections that support high-throughput operations that insert and retrieve documents based on insertion order. Capped collections work in a way similar to circular buffers: once a collection fills its allocated space, it makes room for new documents by overwriting the oldest documents in the collection.</p>
</blockquote>
<p class="line" data-line="174">从上面定义可以看出固定集合具有几个特性</p>
<ul>
<li>固定大小</li>
<li>高吞吐量</li>
<li>根据插入顺序检索文档</li>
<li>超过限制大小覆盖旧的文档</li>
</ul>
<p class="line" data-line="180">根据固定集合特性,固定集合适合用于以下场景</p>
<ul>
<li>只需保留最近的日志查询系统</li>
<li>缓存数据(热点数据)</li>
<li>等等</li>
</ul>
<p class="line" data-line="185">固定集合限制</p>
<ul>
<li>固定集合的大小创建之后不能修改</li>
<li>不能删除固定集合里的文档,只能删除集合再重新建固定集合</li>
<li>固定集合不能使用固定分区</li>
<li>聚合管道$out不能使用在固定集合</li>
</ul>
<p><strong>固定集合使用</strong></p>
<p>&nbsp;1. 创建固定集合</p>
<div class="cnblogs_code">
<pre>db.createCollection("log", { capped : <span style="color: rgba(0, 0, 255, 1)">true</span>, size : 4096, max : 5000 } )</pre>
</div>
<table>
<thead>
<tr><th>字段</th><th>必须</th><th>说明</th></tr>
</thead>
<tbody>
<tr>
<td>capped</td>
<td>是&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</td>
<td>是否创建固定集合</td>
</tr>
<tr>
<td>size</td>
<td>是</td>
<td>固定集合大小,单位:字节</td>
</tr>
<tr>
<td>max</td>
<td>否</td>
<td>文档数量大小限制</td>
</tr>
</tbody>
</table>
<p class="line" data-line="202">size 和 max 是或关系,超出其中一个限制都会覆盖旧文档</p>
<p class="line" data-line="202">&nbsp;</p>
<p class="line" data-line="202">2.&nbsp;检查集合是否固定集合</p>
<div class="cnblogs_code">
<pre>db.collection.isCapped()</pre>
</div>
<p>&nbsp;</p>
<p>3. 将一个非固定的集合转换固定集合</p>
<div class="cnblogs_code">
<pre>db.runCommand({"convertToCapped": "mycoll", size: 100000});</pre>
</div>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p><strong>测试固定集合</strong></p>
<p>1.&nbsp;超过限制文档数</p>
<div class="cnblogs_code">
<pre><span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> 1. 创建固定集合,大小1M,最大文档数量10</span>
db.createCollection("log", { capped: <span style="color: rgba(0, 0, 255, 1)">true</span>, size: 1024 * 1024, max: 10<span style="color: rgba(0, 0, 0, 1)"> });

</span><span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> 2. 插入200条数据</span>
<span style="color: rgba(0, 0, 255, 1)">for</span> (<span style="color: rgba(0, 0, 255, 1)">var</span> i = 0; i &lt; 200; i++<span style="color: rgba(0, 0, 0, 1)">) {
    db.log.insertOne({
      </span>"_id": i + 1<span style="color: rgba(0, 0, 0, 1)">,
      </span>"userId": Math.floor(Math.random() * 1000<span style="color: rgba(0, 0, 0, 1)">),
      </span>"content": "登录" + ("0000" + i).slice(-4<span style="color: rgba(0, 0, 0, 1)">),
      </span>"createTime": <span style="color: rgba(0, 0, 255, 1)">new</span><span style="color: rgba(0, 0, 0, 1)"> Date(),
    });
}</span></pre>
</div>
<p>&nbsp;</p>
<p class="line" data-line="231">再查询现在Mongodb存储情况</p>
<div class="cnblogs_code">
<pre>db.log.stats()</pre>
</div>
<p>&nbsp;<img src="https://img2020.cnblogs.com/blog/413875/202004/413875-20200413173317235-745569612.png" alt=""></p>
<p class="line" data-line="238">可以看出每个对象都是占有78个字节,因为字段都是定长的</p>
<p class="line" data-line="238">&nbsp;</p>
<p>2. 验证操作存储大小</p>
<blockquote>
<p>If the size field is less than or equal to 4096, then the collection will have a cap of 4096 bytes. Otherwise, MongoDB will raise the provided size to make it an integer multiple of 256.</p>
</blockquote>
<p>如果size的字段设置小于4096,Mongodb将会提供一个256的倍数的数据存储大小</p>
<p class="line" data-line="245">假设256的大小,256 / 78 = 3.282051282051282,应该能存3个文档</p>
<div class="cnblogs_code">
<pre><span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> 1. 删除之前固定集合</span>
<span style="color: rgba(0, 0, 0, 1)">db.log.drop();

</span><span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> 2. 创建固定集合,size &lt; 78 , 验证是否创建一个256的大小</span>
db.createCollection("log", { capped: <span style="color: rgba(0, 0, 255, 1)">true</span>, size: 78<span style="color: rgba(0, 0, 0, 1)"> });

</span><span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> 2. 插入200条数据</span>
<span style="color: rgba(0, 0, 255, 1)">for</span> (<span style="color: rgba(0, 0, 255, 1)">var</span> i = 0; i &lt; 200; i++<span style="color: rgba(0, 0, 0, 1)">) {
    db.log.insertOne({
      </span>"_id": i + 1<span style="color: rgba(0, 0, 0, 1)">,
      </span>"userId": Math.floor(Math.random() * 1000<span style="color: rgba(0, 0, 0, 1)">),
      </span>"content": "登录" + ("0000" + i).slice(-4<span style="color: rgba(0, 0, 0, 1)">),
      </span>"createTime": <span style="color: rgba(0, 0, 255, 1)">new</span><span style="color: rgba(0, 0, 0, 1)"> Date(),
    });
}</span></pre>
</div>
<p class="line" data-line="265">查看集合统计</p>
<div class="cnblogs_code">
<pre>db.log.stats()</pre>
</div>
<p><img src="https://img2020.cnblogs.com/blog/413875/202004/413875-20200413173524948-1202438745.png" alt=""></p>
<p>可以看出log集合使用了234个字节(78 * 3),也即3个文档的大小,最大能使用大小是256</p>
<p>&nbsp;</p>
<p>3. 查询固定集合</p>
<p>Mongodb若没指定排序字段,是按存入顺序检索,可以使用.sort( { $natural: -1 } )改变输出顺序</p>
<div class="cnblogs_code">
<pre>db.log.find({}).sort( { $natural: -1 } )</pre>
</div>
<p>&nbsp;</p>
<p>&nbsp;4.&nbsp;将非固定集合转换固定集合</p>
<p>将order转换试试</p>
<div class="cnblogs_code">
<pre>db.runCommand({"convertToCapped": "order", size: 8096});</pre>
</div>
<p>查看order集合统计</p>
<p>&nbsp;<img src="https://img2020.cnblogs.com/blog/413875/202004/413875-20200413173834681-1432528710.png" alt=""></p>
<p>&nbsp;</p>
<p>&nbsp;只剩下90条数据</p>
<p>&nbsp;</p>
</div>

<div class="ws-copyright">
<p>转发请标明出处:https://www.cnblogs.com/WilsonPan/p/12692642.html</p>
</div>
<div class="ws-content">
<p>参考文章</p>
<p>Aggregation — MongoDB Manual<br>Capped Collections — MongoDB Manual<br></p>
</div>
<p></p><br><br>
来源:https://www.cnblogs.com/WilsonPan/p/12692642.html
頁: [1]
查看完整版本: 【Mongodb】聚合查询 && 固定集合