MongoDB Exception: MongoCursorNotFoundException

Posted by 羊城之友 on 2020-12-4 16:47:00

<blockquote>
While testing data export yesterday, I found that exporting several times in a row triggers the following exception:
</blockquote>
<img src="https://img2020.cnblogs.com/blog/1890572/202012/1890572-20201204164659142-139320102.png">
com.mongodb.MongoCursorNotFoundException: Query failed with error code -5
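The message says the query failed because the MongoDB cursor could not be found.
Most of the fixes suggested online fall into the following categories: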
<ol>
<li>noCursorTimeout: disables the cursor's timeout
<blockquote>
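With this option the cursor must be cleaned up manually once the query finishes. If an exception or a network failure prevents that, the cursor stays open indefinitely, so this approach is not recommended.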
</blockquote>
</li>
<li>batchSize: specifies the number of documents to return in each batch of the response from the MongoDB instance
<blockquote>
https://www.docs4dev.com/docs/zh/mongodb/v3.6/reference/reference-method-cursor.batchSize.html
</blockquote>
</li>
</ol>
<h4 id="官方文档">Official documentation:</h4>
<ul>
<li>Notes on cursors
<img src="https://img2020.cnblogs.com/blog/1890572/202012/1890572-20201204164712761-986399438.png"></li>
</ul>
<hr>
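After reading the fixes above it is tempting to think the mystery is solved. But do not overlook the condition in the opening sentence: the exception appears when exporting several times in a row.
That suggests our exception is not caused by the cursor expiring, so why does "cursor not found" occur?
Let's first look at part of the source code behind find: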
<pre><code class="language-java">/**
 * Internal method using callback to do queries against the datastore that requires reading a collection of objects.
 * It will take the following steps
 * <ol>
 * <li>Execute the given {@link ConnectionCallback} for a {@link DBCursor}.</li>
 * <li>Prepare that {@link DBCursor} with the given {@link CursorPreparer} (will be skipped if {@link CursorPreparer}
 * is {@literal null}</li>
 * <li>Iterate over the {@link DBCursor} and applies the given {@link DocumentCallback} to each of the
 * {@link Document}s collecting the actual result {@link List}.</li>
 * <ol>
 *
 * @param <T>
 * @param collectionCallback the callback to retrieve the {@link DBCursor} with
 * @param preparer the {@link CursorPreparer} to potentially modify the {@link DBCursor} before iterating over it
 * @param objectCallback the {@link DocumentCallback} to transform {@link Document}s into the actual domain type
 * @param collectionName the collection to be queried
 * @return
 */
private <T> List<T> executeFindMultiInternal(CollectionCallback<FindIterable<Document>> collectionCallback,
        @Nullable CursorPreparer preparer, DocumentCallback<T> objectCallback, String collectionName) {

    try {

        MongoCursor<Document> cursor = null;

        try {

            FindIterable<Document> iterable = collectionCallback
                    .doInCollection(getAndPrepareCollection(doGetDatabase(), collectionName));

            if (preparer != null) {
                iterable = preparer.prepare(iterable);
            }

            cursor = iterable.iterator();

            List<T> result = new ArrayList<>();

            while (cursor.hasNext()) {
                Document object = cursor.next();
                result.add(objectCallback.doWith(object));
            }

            return result;
        } finally {

            if (cursor != null) {
                cursor.close();
            }
        }
    } catch (RuntimeException e) {
        throw potentiallyConvertRuntimeException(e, exceptionTranslator);
    }
}
</code></pre>
Quoted from a blog post online:
<blockquote>
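When we query MongoDB with db.collection.find(), what comes back directly is not the data itself but a cursor. Each cursor has a cursor ID, which the server records. The data is actually retrieved by iterating over the cursor, mainly through hasNext() and next(), used just like an iterator. (The command-line client appears to return data straight from find() only because it iterates the cursor for you and displays 20 documents by default.) The client does not fetch documents from the server one at a time but in batches, which improves I/O performance. Each batch is cached in client memory; once next() has consumed the batch, the client calls getMore() to fetch the next batch from the server. That request must carry the cursor ID, which the server uses to work out which data to return. If the server does not have that cursor ID, the cursor-not-found error occurs.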
</blockquote>
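The batch-by-batch flow described in the quote can be sketched with a toy in-memory model. This is only an illustration: `FakeServer`, `Batch`, `readAll` and `forget` are hypothetical names invented here, not part of MongoDB or its Java driver, and the real wire protocol is more involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of batched cursor iteration; not the real MongoDB driver or wire protocol. */
class CursorBatchSketch {

    /** Reply to find/getMore: one batch of documents plus the cursor id (0 means exhausted). */
    static final class Batch {
        final long cursorId;
        final List<Integer> docs;

        Batch(long cursorId, List<Integer> docs) {
            this.cursorId = cursorId;
            this.docs = docs;
        }
    }

    /** Hypothetical in-memory "server" that remembers its open cursors by id. */
    static final class FakeServer {
        private final List<Integer> data;
        private final Map<Long, Integer> openCursors = new HashMap<>(); // cursor id -> next offset
        private long nextId = 1;

        FakeServer(List<Integer> data) {
            this.data = data;
        }

        /** Opens a cursor and returns the first batch together with the cursor id. */
        Batch find(int batchSize) {
            long id = nextId++;
            openCursors.put(id, 0);
            return getMore(id, batchSize);
        }

        /** Returns the next batch, or fails if this server does not know the cursor id. */
        Batch getMore(long cursorId, int batchSize) {
            Integer offset = openCursors.get(cursorId);
            if (offset == null) {
                throw new IllegalStateException("cursor " + cursorId + " not found");
            }
            int end = Math.min(offset + batchSize, data.size());
            List<Integer> docs = new ArrayList<>(data.subList(offset, end));
            if (end == data.size()) {
                openCursors.remove(cursorId); // exhausted: the server drops the cursor
                return new Batch(0, docs);
            }
            openCursors.put(cursorId, end);
            return new Batch(cursorId, docs);
        }

        /** Simulates the server losing the cursor (timeout, restart, and so on). */
        void forget(long cursorId) {
            openCursors.remove(cursorId);
        }
    }

    /** Drains the whole result set batch by batch, as a client-side cursor would. */
    static List<Integer> readAll(FakeServer server, int batchSize) {
        Batch batch = server.find(batchSize);
        List<Integer> result = new ArrayList<>(batch.docs);
        while (batch.cursorId != 0) {
            batch = server.getMore(batch.cursorId, batchSize);
            result.addAll(batch.docs);
        }
        return result;
    }

    public static void main(String[] args) {
        FakeServer server = new FakeServer(Arrays.asList(1, 2, 3, 4, 5));
        System.out.println(readAll(server, 2));

        Batch first = server.find(2);
        server.forget(first.cursorId); // the server no longer has this cursor id
        try {
            server.getMore(first.cursorId, 2);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point of the sketch is that every follow-up fetch depends on state held by the server that opened the cursor; the client only carries the id.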
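From this we know that find relies on the batchSize setting and fetches its results over several round trips. So what if the cursor has not expired at all, but simply cannot be found on one of the later fetches?
Reading further, we found the code in the MongoDB driver that obtains the connection: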
<pre><code class="language-java">// Excerpt from the com.mongodb.operation.QueryBatchCursor class
private void getMore() {
    Connection connection = connectionSource.getConnection();
    try {
        if (serverIsAtLeastVersionThreeDotTwo(connection.getDescription())) {
            try {
                initFromCommandResult(connection.command(namespace.getDatabaseName(),
                        asGetMoreCommandDocument(),
                        NO_OP_FIELD_NAME_VALIDATOR,
                        ReadPreference.primary(),
                        CommandResultDocumentCodec.create(decoder, "nextBatch"),
                        connectionSource.getSessionContext()));
            } catch (MongoCommandException e) {
                throw translateCommandException(e, serverCursor);
            }
        } else {
            QueryResult<T> getMore = connection.getMore(namespace, serverCursor.getId(),
                    getNumberToReturn(limit, batchSize, count), decoder);
            initFromQueryResult(getMore);
        }
        if (limitReached()) {
            killCursor(connection);
        }
        if (serverCursor == null) {
            this.connectionSource.release();
            this.connectionSource = null;
        }
    } finally {
        connection.release();
    }
}
</code></pre>
<ul>
<li>
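As the code shows, connection.release() runs in the finally block, so the connection is released after every getMore call rather than being held for the lifetime of the cursor;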
</li>
<li>
So with a MongoDB cluster, could a later fetch end up connected to a different machine?
From what I found:
<blockquote>
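Normally, when using a MongoDB cluster, we pass all the MongoDB servers to the driver in the form IP1:PORT1,IP2:PORT2,IP3:PORT3. The driver then handles load balancing automatically and keeps a session's requests going to the same server, so no problem arises;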
</blockquote>
</li>
<li>
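Once we implement load balancing ourselves, that is, put the servers behind a single domain name or IP that distributes connections, successive connections may land on different machines. The machine that receives the next fetch does not have the cursor, and MongoCursorNotFoundException is thrown;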
</li>
<li>
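Of course, if the custom load balancer distributes by client IP, so that requests from the same IP always reach the same machine, this problem does not occur either;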
</li>
</ul>
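The routing argument above can be illustrated with a small simulation. It is a sketch under stated assumptions, not a model of any real proxy: `Node`, `runQuery`, `roundRobin` and `stickyByClientIp` are hypothetical names, the cursor id and client IP are arbitrary, and each request is assumed to be routed independently.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.IntSupplier;

/** Toy model of getMore requests routed through a load balancer; not real MongoDB code. */
class CursorRoutingSketch {

    /** Hypothetical backend node that only knows the cursors it opened itself. */
    static final class Node {
        private final Set<Long> openCursors = new HashSet<>();

        void open(long cursorId) {
            openCursors.add(cursorId);
        }

        void getMore(long cursorId) {
            if (!openCursors.contains(cursorId)) {
                throw new IllegalStateException("cursor " + cursorId + " not found");
            }
        }
    }

    static List<Node> newNodes(int count) {
        List<Node> nodes = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            nodes.add(new Node());
        }
        return nodes;
    }

    /** Opens a cursor, then issues getMore calls; the router picks a node for every request. */
    static boolean runQuery(List<Node> nodes, IntSupplier router, int getMoreCalls) {
        long cursorId = 42L; // arbitrary id, for illustration only
        nodes.get(router.getAsInt()).open(cursorId);
        try {
            for (int i = 0; i < getMoreCalls; i++) {
                nodes.get(router.getAsInt()).getMore(cursorId);
            }
            return true;
        } catch (IllegalStateException e) {
            return false; // some node did not know the cursor
        }
    }

    /** Round-robin routing: each request goes to the next node in turn. */
    static IntSupplier roundRobin(int nodeCount) {
        int[] next = {0};
        return () -> next[0]++ % nodeCount;
    }

    /** Sticky routing: the node is derived from the client IP, so it never changes for one client. */
    static IntSupplier stickyByClientIp(String clientIp, int nodeCount) {
        int node = Math.floorMod(clientIp.hashCode(), nodeCount);
        return () -> node;
    }

    public static void main(String[] args) {
        System.out.println("round robin, 2 getMore calls: "
                + runQuery(newNodes(3), roundRobin(3), 2));
        System.out.println("round robin, 0 getMore calls: "
                + runQuery(newNodes(3), roundRobin(3), 0));
        System.out.println("sticky by IP, 2 getMore calls: "
                + runQuery(newNodes(3), stickyByClientIp("10.0.0.7", 3), 2));
    }
}
```

In this model the round-robin query fails as soon as one follow-up fetch is needed, while a query that fits in its first batch, or one routed stickily, succeeds.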
Later I asked our DBA and confirmed that the MongoDB cluster does sit behind a self-implemented load balancer and has exactly this problem.
<h4 id="结论">Conclusion:</h4>
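Knowing the cause, we can see that the earlier batchSize change does not really fix anything. It avoided the problem only because the batchSize was large enough that the cursor never had to be fetched from more than once.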
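To see why a large batchSize hides the problem, here is a rough count of follow-up fetches. It assumes a simplified model in which every batch, including the first, holds exactly batchSize documents; the real server's batching rules differ in detail, and `getMoreCalls` is a hypothetical helper, not a driver method.

```java
/** Counts follow-up fetches under a simplified model where every batch holds batchSize documents. */
class GetMoreCountSketch {

    static long getMoreCalls(long totalDocs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        if (totalDocs <= 0) {
            return 0;
        }
        long batches = (totalDocs + batchSize - 1) / batchSize; // ceiling division
        return batches - 1; // the first batch arrives with the find reply itself
    }

    public static void main(String[] args) {
        // 5000 documents with batches of 1000: several follow-up fetches, each a chance to hit another node.
        System.out.println(getMoreCalls(5000, 1000));
        // 5000 documents with a batch of 10000: everything fits in the first reply.
        System.out.println(getMoreCalls(5000, 10000));
    }
}
```

With zero follow-up fetches there is no later request that could be routed to a machine without the cursor, which is why the symptom disappears while the cause remains.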
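The solution is therefore to change the MongoDB configuration to the real IPs of the MongoDB machines, passed to the driver in the form IP1:PORT1,IP2:PORT2,IP3:PORT3, and let the driver do the load balancing itself.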
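A minimal sketch of building such a seed list into a standard `mongodb://` connection string follows. The host addresses and the database name `exportdb` are made-up placeholders, and `connectionString` is a hypothetical helper, not part of the driver; credentials and options are left out.

```java
import java.util.Arrays;
import java.util.List;

/** Builds a mongodb:// URI that lists every real host instead of a single proxy address. */
class SeedListSketch {

    static String connectionString(List<String> hostsAndPorts, String database) {
        if (hostsAndPorts.isEmpty()) {
            throw new IllegalArgumentException("at least one host:port is required");
        }
        return "mongodb://" + String.join(",", hostsAndPorts) + "/" + database;
    }

    public static void main(String[] args) {
        // Placeholder addresses; replace them with the real MongoDB machines.
        List<String> hosts = Arrays.asList("10.0.0.1:27017", "10.0.0.2:27017", "10.0.0.3:27017");
        System.out.println(connectionString(hosts, "exportdb"));
    }
}
```

The resulting string would then go wherever the application currently configures the single load-balanced address.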

</div>
<div id="MySignature" role="contentinfo">
Source: https://www.cnblogs.com/heyouxin/p/14086643.html
