MongoDB 数据库 — 查询方法 - Powered by Discuz! Archiver

窗外清风 發表於 2019-6-17 15:40:00

MongoDB 数据库 — 查询方法

接上篇，本篇专门整理 MongoDB 查询方法。
<h1 id="4-基本查询">4 基本查询</h1>
<ul>
<li>你可以在数据库中使用 find 或者 findOne 函数来执行专门的查询；</li>
<li>你可以查询范围、集合、不等式，也可以使用 $-条件 执行更多的操作；</li>
<li>查询结果是一个数据库游标（cursor），当需要的时候返回你需要的文档。</li>
<li>你可以在 cursor 上执行许多元操作（metaoperations），包括 skipping 一定数量的结果，limiting 返回结果的数量，和 sorting 结果。</li>
</ul>
<h2 id="41-find">4.1 find</h2>
<pre><code class="language-python"># find 的第一个参数指定了查询准则
db.users.find()#匹配集合中的所有文档
db.users.find({"age": 27})

# 多条件查询可以通过增加更多的 key/value 对，可以解释为 *condition1* AND *condition2* AND ... AND *conditionN*
db.users.find({"username": "joe", "age": 27)
</code></pre>
<h3 id="411-指定返回的键">4.1.1 指定返回的键</h3>
find（或者 findOne）的第二个参数指定返回的键，虽然 "_id" 键没有被指定，但是默认返回。也可以指定需要排除的 key/value 对。
<pre><code class="language-python"># "_id" 键默认返回
> db.users.find({}, {"username": 1, "email": 1})

# 结果
{
"_id" : ObjectId("4ba0f0dfd22aa494fd523620"),
"username" : "joe",
"email" : "joe@example.com"
}

# 阻止 "_id" 键返回
> db.users.find({}, {"username": 1, "email": 1, "_id": 0})

# 结果
{
"username" : "joe",
"email" : "joe@example.com"
}
</code></pre>
<h3 id="412-限制limitations">4.1.2 限制（Limitations）</h3>
数据库所关心的查询文档的值必须是常量，也就是不能引用文档中其他键的值。例如，要想保持库存，有原库存 "in_stock" 和 "num_sold" 两个键，我们不能像下面这样比较两者的值：
<pre><code>> db.stock.find({"in_stock" : "this.num_sold"})// doesn't work
</code></pre>
<h2 id="42-查询准则criteria">4.2 查询准则（Criteria）</h2>
<h3 id="421-查询条件">4.2.1 查询条件</h3>
比较操作："$\$$lt"，"$\$$lte"，"$\$$gt"，"$\$$gte"，"$\$$ne"
<pre><code>> db.users.find({"age" : {"$gte" : 18, "$lte" : 30}})

> db.users.find({"username" : {"$ne" : "joe"}})
</code></pre>
<h3 id="422-or-查询">4.2.2 OR 查询</h3>
MongoDB 中有两种 OR 查询。"$\$$in" 用作对一个 key 查询多个值；"$\$$or" 用作查询多个 keys 给定的值。
<pre><code class="language-python"># 单个键，一种类型的值
> db.raffle.find({"ticket_no" : {"$in" : }})

# 单个键，多种类型的值
> db.users.find({"user_id" : {"$in" : })

# "$nin"
> db.raffle.find({"ticket_no" : {"$nin" : }})

# 多个键
> db.raffle.find({"$or" : [{"ticket_no" : 725}, {"winner" : true}]})

# 多个键，带有条件
> db.raffle.find({"$or" : [{"ticket_no" : {"$in" : }}, {"winner" : true}]})
</code></pre>
<h3 id="423-not">4.2.3 $not</h3>
"$\$$not" 是元条件句，可以用在任何条件之上。例如，取模运算符 "$\$$mod" 会将查询的值除以第一个给定的值，若余数等于第二个给定值，则返回该结果：
<pre><code>db.users.find({"id_num" : {"$mod" : }})
</code></pre>
上面的查询会返回 "id_num" 值为 1、6、11、16 等的用户，但要返回 "id_num" 为 2、3、4、6、7、8 等的用户，就要用 "$not" 了：
<pre><code>> db.users.find({"id_num" : {"$not" : {"$mod" : }}})
</code></pre>
"$not" 与正则表达式联合使用的时候极为有用，用来查找那些与特定模式不符的文档。
<h3 id="424-条件句的规则">4.2.4 条件句的规则</h3>
比较更新修改器和查询文档，会发现以 $ 开头的键处在不同的位置。条件句是内层文档的键，而修改器是外层文档的键。可以对一个键应用多个条件，但是一个键不能对应多个修改器。
<pre><code class="language-python">
> db.users.find({"age" : {"$lt" : 30, "$gt" : 20}})

# 修改了年龄两次，错误
> db.users.find({"$inc" : {"age" : 1}, "$set" : {age : 40}})
</code></pre>
但是，也有一些元操作可以用在外层文档："$\$$and"，"$\$$or"，和 "$\$$nor"：
<pre><code>> db.users.find({"$and" : [{"x" : {"$lt" : 1}}, {"x" : 4}]})
</code></pre>
看上去这个条件相矛盾，但是如果 x 是一个数组的话：{"x" : } 是符合的。
<h2 id="43-特定类型的查询">4.3 特定类型的查询</h2>
<h3 id="431-null">4.3.1 null</h3>
null 表现起来有些奇怪，它不但匹配它自己，而且能匹配 "does not exist"，所以查询一个值为 null 的键，会返回缺乏那个键的所有文档。
<pre><code class="language-python">> db.c.find()

{ "_id" : ObjectId("4ba0f0dfd22aa494fd523621"), "y" : null }
{ "_id" : ObjectId("4ba0f0dfd22aa494fd523622"), "y" : 1 }
{ "_id" : ObjectId("4ba0f148d22aa494fd523623"), "y" : 2 }

> db.c.find({"y" : null})

{ "_id" : ObjectId("4ba0f0dfd22aa494fd523621"), "y" : null }

> db.c.find({"z" : null})

{ "_id" : ObjectId("4ba0f0dfd22aa494fd523621"), "y" : null }
{ "_id" : ObjectId("4ba0f0dfd22aa494fd523622"), "y" : 1 }
{ "_id" : ObjectId("4ba0f148d22aa494fd523623"), "y" : 2 }
</code></pre>
如果我们想要找到值为 null 的键，我们可以使用 "$\$$exists" 检查键是 null，并且存在。如下，不幸的是，没有 "$\$$eq"操作，但是带有一个元素的 "$\$$in" 和它等价。
<pre><code>> db.c.find({"z" : {"$in" : , "$exists" : true}})
</code></pre>
<h3 id="432-正则表达式">4.3.2 正则表达式</h3>
<ol>
<li>MongoDB 使用 $regex 操作符来设置匹配字符串的正则表达式。</li>
<li>MongoDB 使用 Perl Compatible Regular Expression (PCRE) 库匹配正则表达式；任何在 PCRE 中可以使用的正则表达式语法都可以在 MongoDB 中使用。在使用正则表达式之前可以先在 JavaScript shell 中检查语法，看是否是你想要匹配的。</li>
</ol>
<pre><code class="language-python"># 文档结构
{
"post_text": "enjoy the mongodb articles on runoob",
"tags": [
 "mongodb",
 "runoob"
]
}

# 使用正则表达式查找包含 runoob 字符串的文章
> db.posts.find({post_text:{$regex:"runoob"}})

# 以上操作还可以表示如下
> db.posts.find({post_text:/runoob/})

# 不区分大小写的正则表达式
# 如果为 $options: "$s"，表示允许点字符（dot character，即 .）匹配包括换行字符（newline characters）在内的所有字符
> db.posts.find({post_text:{"$regex":"runoob", "$options":"$i"}})
# 或者
> db.posts.find({post_text: /runoob/i})
</code></pre>
<h3 id="433-查询内嵌文档">4.3.3 查询内嵌文档</h3>
有两种方式查询内嵌文档：查询整个文档，或者只针对它的键/值对进行查询。
<pre><code>{
"name" : {
 "first" : "Joe",
 "last" : "Schmoe"
 },
"age" : 45
}
</code></pre>
可以使用如下方式进行查：
<pre><code>> db.people.find({"name" : {"first" : "Joe", "last" : "Schmoe"}})
</code></pre>
但是，对于子文档的查询必须精确匹配子文档。如果 Joe 决定去添加一个中间的名字，这个查询就将不起作用，这类查询也是 order-sensitive 的，{"last" : "Schmoe", "first" : "Joe"}将不能匹配。
仅仅查询内嵌文档特定的键通常是一个好主意。这样的话，如果你的数据模式改变，也不会导致所有查询突然失效，因为他们不再是精确匹配。可以通过使用 dot-记号 查询内嵌的键：
<pre><code>> db.people.find({"name.first" : "Joe", "name.last" : "Schmoe"})
</code></pre>
当文档更加复杂的时候，内嵌文档的匹配有些技巧。例如，假设有博客文章若干，要找到由 Joe 发表的 5 分以上的评论。博客文章的结构如下所示：
<pre><code>> db.blog.find()
{
 "content" : "...",
 "comments" : [
 {
 "author" : "joe",
 "score" : 3,
 "comment" : "nice post"
 },
 {
 "author" : "mary",
 "score" : 6,
 "comment" : "terrible post"
 }
]
}
</code></pre>
查询的方式如下：
<pre><code class="language-python"># 错误，内嵌的文档必须匹配整个文档，这个没有匹配 "comment" 键
> db.blog.find({"comments" : {"author" : "joe", "score" : {"$gte" : 5}}})

# 错误，因为符合 author 条件的评论和符合 score 条件的评论可能不是同一条评论
> db.blog.find({"comments.author" : "joe", "comments.score" : {"$gte" : 5}})

# 正确，"$elemMatch" 将限定条件进行分组，仅当对一个内嵌文档的多个键进行操作时才会用到
> db.blog.find({"comments" : {"$elemMatch" : {"author" : "joe", "score" : {"$gte" : 5}}}})
</code></pre>
<h1 id="5-聚合aggregation">5 聚合（aggregation）</h1>
将数据存储在 MongoDB 中后，我们就可以进行检索，然而，我们可能想在它上面做更多的分析工作。
<h2 id="51-聚合框架the-aggregation-framework">5.1 聚合框架（The aggregation framework）</h2>
聚合框架可以在一个集合中转化（transform）和混合（combine）文档。基本的，你可以通过几个创建模块（filtering, projecting, grouping, sorting, limiting, and skipping）来建立处理一批文档的管道。
例如，如果有一个杂志文章的集合，你可能想找出谁是最多产的作者。假设每一篇文章都作为一个文档存储在 MongoDB 中，你可以通过以下几步来创建一个管道：
<pre><code class="language-python"># 1. 将每篇文章文档的作者映射出来
{"$project" : {"author" : 1}}

# 2. 通过名字将作者分组，统计文档的数量
{"$group" : {"_id" : "$author", "count" : {"$sum" : 1}}}

# 3. 通过文章数量，降序排列作者
{"$sort" : {"count" : -1}}

# 4. 限制前5个结果
{"$limit" : 5}

# 在 MonoDB 中，将每个操作传递给 aggregate() 函数
> db.articles.aggregate( {"$project" : {"author" : 1}},
 {"$group" : {"_id" : "$author", "count" : {"$sum" : 1}}},
 {"$sort" : {"count" : -1}},
 {"$limit" : 5}
 )

# 输出结果，返回一个结果文档数组
{
 "result" : [
 {
 "_id" : "R. L. Stine",
 "count" : 430
 },
 {
 "_id" : "Edgar Wallace",
 "count" : 175
 },
 {
 "_id" : "Nora Roberts",
 "count" : 145
 },
 {
 "_id" : "Erle Stanley Gardner",
 "count" : 140
 },
 {
 "_id" : "Agatha Christie",
 "count" : 85
 }
 ],
 "ok" : 1
}
</code></pre>
<blockquote>
注：aggregate 框架不会写入到集合，所以所有的结果必须返回客户端。因此，aggregation 返回的数据结果限制在 16MB。
</blockquote>
<h2 id="52-管道操作pipeline-operations">5.2 管道操作（Pipeline Operations）</h2>
<h3 id="521-match">5.2.1 $match</h3>
$\$$match 过滤文档，以致于你可以在文档子集上运行聚合操作。通常，尽可能的将 "$\$$match" 操作放到管道操作的前面。这样做主要有两个优点：1. 可以快速过滤掉不需要的文档（留下管道操作需要执行的文档），2. 可以在 projections 和 groupings 之前使用 indexes 查询。
<h3 id="522-project">5.2.2 $project</h3>
映射在管道中操作比在“标准的”查询语言中（find函数的第二个参数）更加强有力。
<pre><code class="language-python"># 映射，"_id" 总是默认返回，此处指定不返回
> db.articles.aggregate({"$project" : {"author" : 1, "_id" : 0}})

# 重命名被映射的域 "_id"
> db.users.aggregate({"$project" : {"userId" : "$_id", "_id" : 0}})

# 如果 originalFieldname 是索引，则在重命名之后就不再默认为索引了
> db.articles.aggregate({"$project" : {"newFieldname" : "$originalFieldname"}},
 {"$sort" : {"newFieldname" : 1}})
</code></pre>
"$\$$fieldname" 语法被用来在 aggregation framework 中引用 fieldname 的值。比如上面例子中，"$\$$_id" 将会被 _id 域的内容取代。当然，如果重命名了，则就不要返回两次了，正如上例所示，当 "_id" 被重命名之后就不再返回。
管道表达式
最简单的 "$project" 表达式是包含、排除和域名重命名。也可以使用其它的表达式。
数学表达式
"$\$$add", "$\$$subtract", "$\$$multiply", "$\$$divide", "$\$$mod"
<pre><code class="language-python"># 域 "salary" 和域 "bonus" 相加
> db.employees.aggregate(
 {
 "$project" : {
 "totalPay" : {
 "$add" : ["$salary", "$bonus"]
 }
 }
 })

# "$subtract" 表达式，减掉 401k
> db.employees.aggregate(
 {
 "$project" : {
 "totalPay" : {
 "$subtract" : [{"$add" : ["$salary", "$bonus"]}, "$401k"]
 }
 }
 })
</code></pre>
日期表达式
aggregation 有一个可以提取日期信息的表达式集合： "$\$$year"， "$\$$month"， "$\$$week"，"$\$$dayOfMonth"， "$\$$dayOfWeek"， "$\$$dayOfYear"， "$\$$hour"， "$\$$minute" 和 "$\$$second"。
<pre><code class="language-python"># 返回每个员工被雇佣的月
> db.employees.aggregate(
 {
 "$project" : {
 "hiredIn" : {"$month" : "$hireDate"}
 }
 })

# 计算员工在公司工作的年数
> db.employees.aggregate(
 {
 "$project" : {
 "tenure" : {
 "$subtract" : [{"$year" : new Date()}, {"$year" : "$hireDate"}] }
 }
 }
 }
</code></pre>
字符串表达式
<ul>
<li>
"$substr" : 返回第一个参数的子串，起始于第 startOffset 个字节，包含 numToReturn 个字节（注意，这个以字节测量，而不是字符，所以多字节编码需要小心）。
</li>
<li>
"$concat" : ]连接每一个给定的字符串。
</li>
<li>
"$toLower" : expr以小写的形式返回字符串。
</li>
<li>
"$toUpper" : expr以大写的形式返回字符串。
</li>
</ul>
<pre><code class="language-python">> db.employees.aggregate(
 {
 "$project" : {
 "email" : {
 "$concat" : [
 {"$substr" : ["$firstName", 0, 1]},
 ".",
 "$lastName",
 "@example.com"
 ]
 }
 }
 })
</code></pre>
逻辑表达式
比较表达式
<ul>
<li>
"$cmp" : 比较表达式 expr1 和 expr2，如果相等返回 0，如果 expr1 小于 expr2 返回负值，如果 expr2 小于 expr1 返回正值。
</li>
<li>
"$strcasecmp" : 比较 string1 和 string2，必须为罗马字符。
</li>
<li>
"$\$$eq"，"$\$$ne"， "$\$$gt"， "$\$$gte"， "$\$$lt"， "$\$$lte" : 
</li>
</ul>
布尔表达式：
<ul>
<li>
"$and" : ]
</li>
<li>
"$or" : ]
</li>
<li>
"$not" : expr
</li>
</ul>
控制语句：
<ul>
<li>
"$cond" : booleanExpr 为 true 时返回 trueExpr，否则返回 falseExpr。
</li>
<li>
"$ifNull" : 如果 expr 为空返回 replacementExpr，否则返回 expr。
</li>
</ul>
一个例子
<pre><code class="language-python">> db.students.aggregate(
 {
 "$project" : {
 "grade" : {
 "$cond" : [
 "$teachersPet",
 100, // if
 { // else
 "$add" : [
 {"$multiply" : [.1, "$attendanceAvg"]},
 {"$multiply" : [.3, "$quizzAvg"]},
 {"$multiply" : [.6, "$testAvg"]}
 ]
 }
 ]
 }
 }
 })
</code></pre>
<h3 id="523-group">5.2.3 $group</h3>
算数操作符
<pre><code class="language-python"># 在多个国家销售数据的集合，计算每个国家的总收入
> db.sales.aggregate(
 {
 "$group" : {
 "_id" : "$country",
 "totalRevenue" : {"$sum" : "$revenue"}
 }
 })

# 返回每个国家的平均收入和销售的数量
> db.sales.aggregate(
 {
 "$group" : {
 "_id" : "$country",
 "totalRevenue" : {"$average" : "$revenue"},
 "numSales" : {"$sum" : 1}
 }
 })
</code></pre>
极端操作符（Extreme operators）
如果你的数据已经排序好了，使用 $\$$first 和 $\$$last 比 $\$$min 和 $\$$max 更有效率。如果数据事先没有排序，则使用 $\$$min 和 $\$$max 比先排序然后 $\$$first 和 $\$$last 更有效率。
<pre><code class="language-python"># 在一次测验中学生分数的集合，找出每个年级的局外点
> db.scores.aggregate(
 {
 "$group" : {
 "_id" : "$grade",
 "lowestScore" : {"$min" : "$score"},
 "highestScore" : {"$max" : "$score"}
 }
 }

# 或者
> db.scores.aggregate(
 {
 "$sort" : {"score" : 1}
 },
 {
 "$group" : {
 "_id" : "$grade",
 "lowestScore" : {"$first" : "$score"},
 "highestScore" : {"$last" : "$score"}
 }
 })
</code></pre>
数组操作符（Array operators）
<ul>
<li>
"$addToSet": expr保持一个数组，如果 expr 不在数组中，添加它。每一个值在数组中最多出现一次，不一定按照顺序。
</li>
<li>
"$push": expr不加区分的将每一个看到的值添加到数组，返回包含所有值得数组。
</li>
</ul>
<h3 id="524-unwind展开">5.2.4 $unwind（展开）</h3>
unwind 将数组的每个域转化为一个单独的文档。例如，如果我们有一个有多条评论的博客，我们可以使用 unwind 将每个评论转化为自己的文档。
<pre><code class="language-python">> db.blog.findOne()
 {
 "_id" : ObjectId("50eeffc4c82a5271290530be"),
 "author" : "k",
 "post" : "Hello, world!",
 "comments" : [
 {
 "author" : "mark",
 "date" : ISODate("2013-01-10T17:52:04.148Z"),
 "text" : "Nice post"
 },
 {
 "author" : "bill",
 "date" : ISODate("2013-01-10T17:52:04.148Z"),
 "text" : "I agree"
 }
 ]
}

# unwind
> db.blog.aggregate({"$unwind" : "$comments"})
 {
 "results" :
 {
 "_id" : ObjectId("50eeffc4c82a5271290530be"),
 "author" : "k",
 "post" : "Hello, world!",
 "comments" : {
 "author" : "mark",
 "date" : ISODate("2013-01-10T17:52:04.148Z"),
 "text" : "Nice post"
 }
 },
 {
 "_id" : ObjectId("50eeffc4c82a5271290530be"),
 "author" : "k",
 "post" : "Hello, world!",
 "comments" : {
 "author" : "bill",
 "date" : ISODate("2013-01-10T17:52:04.148Z"),
 "text" : "I agree"
 }
 }
 "ok" : 1
 }
</code></pre>
<h3 id="525-sort">5.2.5 $sort</h3>
<pre><code class="language-python"># 1 是 ascending，-1 是 descending
> db.employees.aggregate(
 {
 "$project" : {
 "compensation" : {
 "$add" : ["$salary", "$bonus"]
 },
 "name" : 1
 }
 },
 {
 "$sort" : {"compensation" : -1, "name" : 1}
 }
)
</code></pre>
<h3 id="526-limit">5.2.6 $limit</h3>
$limit 接收数值 n，然后返回前 n 个结果文档。
<h3 id="527-skip">5.2.7 $skip</h3>
$limit 接收数值 n，然后从结果集中剔除前 n 个文档。对于标准查询，一个大的 skips 效率比较低，因为它必须找出所有匹配被 skipped 的文档，然后剔除它们。
<h3 id="528-使用管道">5.2.8 使用管道</h3>
在使用 "$\$$project"、"$\$$group" 或者 "$\$$unwind" 操作之前，最好尽可能过滤出更多的文档（和更多的域）。一旦管道不使用直接来自集合中的数据，索引（index）就不再能够帮助取过滤（filter）和排序（sort）。如果可能的话，聚合管道试图为你重新排序这些操作，以便能使用索引。
MongoDB 不允许单一聚合操作使用超过一定比例的系统内存：如果它计算得到一个聚合操作占用超过 20% 的内存，聚合就会出错。允许输出被输送到一个集合中（这样可以最小化所需内存的数量）是为将来作计划。
<h1 id="参考资料">参考资料</h1>
<ol>
<li>
MongoDB: The Definitive Guide, Second Edition
</li>
<li>
MongoDB 正则表达式
</li>
<li>
dateToString
</li>
</ol> 
来源：https://www.cnblogs.com/shaocf/p/10988551.html

頁: [1]

圆梦公社's Archiver

MongoDB 数据库 — 查询方法