李贵德 posted on 2020-5-6 13:15:00

MongoDB: the oplog

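<p>At first I assumed the oplog was MongoDB's counterpart to the MySQL binlog, and that turns out to be roughly right. In a replica set the Primary records its write operations in the oplog, and the Secondaries read it to synchronize, which keeps the data consistent.</p>
<p>I recently ran into an accidentally dropped database (dropping it and running away was not an option), so I experimented many times with restoring data from the oplog.</p>
<p>I am writing the steps down here for future reference.</p>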
<div class="cnblogs_code">
<p><span style="color: rgba(0, 128, 0, 1)"># ------------------------------ oplog ---------------------------------</span><br><span style="color: rgba(0, 128, 0, 1)">## 1. In a replica set, the following command shows the state of the oplog:</span><br>rpset1:PRIMARY&gt; <span style="color: rgba(0, 0, 255, 1)">rs.printReplicationInfo()</span><br>configured oplog size:   10240MB<br>log length start to end: 149092secs (41.41hrs)<br>oplog first event time:  Sun Apr 26 2020 20:25:46 GMT+0800 (CST)<br>oplog last event time:   Tue Apr 28 2020 13:50:38 GMT+0800 (CST)<br>now:                     Tue Apr 28 2020 13:50:38 GMT+0800 (CST)</p>
<p>rpset1:SECONDARY&gt; <span style="color: rgba(0, 0, 255, 1)">rs.printReplicationInfo()</span><br>configured oplog size:   10240MB<br>log length start to end: 149937secs (41.65hrs)<br>oplog first event time:  Sun Apr 26 2020 20:10:59 GMT+0800 (CST)<br>oplog last event time:   Tue Apr 28 2020 13:49:56 GMT+0800 (CST)<br>now:                     Tue Apr 28 2020 13:49:56 GMT+0800 (CST)</p>
<p>rpset1:SECONDARY&gt; <span style="color: rgba(0, 0, 255, 1)">rs.printReplicationInfo()</span><br>configured oplog size:   10240MB<br>log length start to end: 148635secs (41.29hrs)<br>oplog first event time:  Sun Apr 26 2020 20:32:00 GMT+0800 (CST)<br>oplog last event time:   Tue Apr 28 2020 13:49:15 GMT+0800 (CST)<br>now:                     Tue Apr 28 2020 13:49:16 GMT+0800 (CST)</p>
<p><span style="color: rgba(0, 128, 0, 1)"># oplogSizeMB in the configuration file conf/slave.conf</span><br>replication:<br>      oplogSizeMB: 10240<br>      replSetName: rpset1</p>
</div>
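<p>The output above shows that this replica set's oplog covers about 41 hours, and this MongoDB deployment is backed up on a daily schedule, so the window is comfortably large enough.</p>
<p>According to the official documentation, restoring data with oplogReplay requires a special privilege.</p>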
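<p>As a rough check of that reasoning, the sketch below converts the "log length start to end" value to hours and compares it with the backup interval. The function names and the 24-hour interval are assumptions made for illustration; they are not part of any MongoDB tool.</p>

```shell
# Convert an oplog window given in seconds to hours, with two decimals.
oplog_window_hours() {
  awk -v secs="$1" 'BEGIN { printf "%.2f\n", secs / 3600 }'
}

# Succeed when the oplog window (seconds) is longer than the backup interval (hours).
oplog_covers_interval() {
  awk -v secs="$1" -v hours="$2" 'BEGIN { exit !(secs > hours * 3600) }'
}

# 149092 is the PRIMARY's "log length start to end" from the output above.
oplog_window_hours 149092
if oplog_covers_interval 149092 24; then
  echo "oplog window covers the backup interval"
else
  echo "oplog window is shorter than the backup interval"
fi
```

<p>If the window ever drops below the backup interval, operations written between two backups could already have been overwritten in the oplog by the time a restore is needed.</p>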
<div class="cnblogs_code">
<pre><span style="color: rgba(0, 128, 0, 1)">## 2. Create a dedicated role for oplogReplay; the role must have anyAction on anyResource
# The backup does not need this privilege, but the restore does; without it the restore fails without reporting any error.</span>
use admin
db.createRole(
   {
    "role" : "sysadmin",
    "privileges" : [{ "resource" : {"anyResource" : true}, "actions" : ["anyAction"] }],
    "roles" : []
   }
)

<span style="color: rgba(0, 128, 0, 1)"># Create a dedicated user that holds this role</span>
db.createUser({user:"admin", pwd:"admin", roles:[{role:"sysadmin", db:"admin"}]})
<span style="color: rgba(0, 128, 0, 1)"># Or grant the role to an existing user</span>
db.grantRolesToUser( "root" , [ { role: "sysadmin", db: "admin" } ])</pre>
</div>
<p>&nbsp;</p>
<p>I checked the scheduled backup job for the database and found this command:</p>
<div class="cnblogs_code">
<pre><span style="color: rgba(0, 128, 0, 1)">## 3. Daily full backup</span>
./mongodump -h 10.170.6.116:27017 -u admin -p admin --authenticationDatabase admin --gzip -o /data/tmp/rs0

<span style="color: rgba(0, 128, 0, 1)"># If the backup is taken with the --oplog option, the output directory contains an oplog.bson file</span>
<span style="color: rgba(0, 128, 0, 1)"># ./mongodump -h 10.170.6.116:27000 -u rsroot -p abcd1234 --authenticationDatabase admin --oplog -o /data/tmp/rs0</span></pre>
</div>
<p>&nbsp;</p>
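<p>Because the backup was taken without the --oplog option, the restore is done by restoring the backup first and then running oplogReplay, as described in step 9 below.</p>
<p>Steps 4 to 8 show the alternative: replaying the oplog as part of restoring the backup itself.</p>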
<div class="cnblogs_code">
<pre><span style="color: rgba(0, 128, 0, 1)">## 4. Suppose an accidental deletion happened some time after the last daily backup; oplogReplay is then needed to recover the data written in that interval
# First check the point in time of the last daily backup (a dump taken with --oplog has an oplog.bson file; if there is none, see step 9):</span>
./bsondump /data/tmp/rs0/oplog.bson &gt; /data/tmp/0
cat /data/tmp/0
<span style="color: rgba(0, 128, 0, 1)"># Find the first line: {"ts":{"$timestamp":{"t":1588138496,"i":1}}, ...
# Meaning of the fields:</span></pre>
<div class="cnblogs_code">
<pre>ts: time at which the operation happened; t: Unix timestamp, i: ordinal of the operation within that same second
h: unique ID of the entry
v: version information
op: type of write operation
   n: no-op
   c: db cmd
   i: insert
   u: update
   d: delete

ns: namespace of the operation, i.e. database.collection
o: document that the operation applies
o2: the "where" condition of an update; present only for updates</pre>
</div>
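<p>Since the "t" part of ts is a plain Unix timestamp, it helps to convert it to a readable time when comparing it with the backup schedule. A minimal sketch, assuming GNU date; the function name ts_to_utc is made up for illustration:</p>

```shell
# Print the UTC wall-clock time for the "t" part of an oplog timestamp.
# Assumes GNU date (the -d @SECONDS form); BSD/macOS date uses -r SECONDS instead.
ts_to_utc() {
  date -u -d "@$1" '+%Y-%m-%d %H:%M:%S'
}

# The "t" value from the first line of oplog.bson quoted above.
ts_to_utc 1588138496
```

<p>For 1588138496 this gives 2020-04-29 05:34:56 UTC, which is 13:34:56 in the GMT+0800 zone used by the server output earlier in this post.</p>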
<pre><span style="color: rgba(0, 128, 0, 1)"># The start timestamp can be chosen freely; it does not have to match an entry in the oplog. Anything slightly earlier than the point you need will do.</span>
./mongodump -h 192.168.6.116:27017 -u admin -p admin --authenticationDatabase admin -d local -c oplog.rs -q '{"ts":{"$gt": {"$timestamp":{"t":1588138300,"i":1}}}}' -o /data/tmp/rs1
</pre>
<pre><span style="color: rgba(0, 128, 0, 1)">## 5. Export the current local/oplog.rs; pay attention to the JSON format of the -q option
# Dumping the whole local/oplog.rs would be too large and would take too long to restore, so filter by a start time:</span>
./mongodump -h 192.168.6.116:27017 -u admin -p admin --authenticationDatabase admin -d local -c oplog.rs -q '{"ts":{"$gt": {"$timestamp":{"t":1588138393,"i":1}}}}' -o /data/tmp/rs1
<span style="color: rgba(0, 128, 0, 1)"># An end time can be given as well:</span>
./mongodump -h 192.168.6.116:27017 -u rsroot -p abcd1234 --authenticationDatabase admin -d local -c oplog.rs -q '{"ts":{"$lte": {"$timestamp":{"t":1588142111,"i":1}}, "$gte": {"$timestamp":{"t":1588138393,"i":1}}}}' -o /data/tmp/rs2
<span style="color: rgba(0, 128, 0, 1)"># A query file can also be passed with --queryFile=./n.json (versions below 4.0.7 may report an error); example file content:</span></pre>
<div class="cnblogs_code">
<pre>{"ts":{"$gte": {"$timestamp":{"t":1589042338,"i":1}}}, "ns":{"$not": {"$regex": "test.names"}}}</pre>
</div>
<p># Examples of the -q option:</p>
<div class="cnblogs_code">
<pre> -q <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">{"ts":{"$gte": {"$timestamp":{"t":1589342458,"i":1}}}, "ns":{"$nin":["test.tlog","config.system.sessions"]}}</span><span style="color: rgba(128, 0, 0, 1)">'</span>

-q <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">{"ts":{"$gte": {"$timestamp":{"t":1589342458,"i":1}}}, "lsid":{"$exists": false }}</span><span style="color: rgba(128, 0, 0, 1)">'</span></pre>
</div>
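<p>Quoting these JSON filters by hand is error-prone, so the query string can be generated instead. The sketch below is one way to do it; the function name oplog_query is hypothetical, and it fixes "i" at 1 as the examples above do.</p>

```shell
# Build the JSON for mongodump -q from a start time and an optional end time.
# Both arguments are Unix timestamps in seconds; "i" is fixed at 1.
oplog_query() {
  if [ -n "${2:-}" ]; then
    printf '{"ts":{"$gte":{"$timestamp":{"t":%d,"i":1}},"$lte":{"$timestamp":{"t":%d,"i":1}}}}\n' "$1" "$2"
  else
    printf '{"ts":{"$gte":{"$timestamp":{"t":%d,"i":1}}}}\n' "$1"
  fi
}

# Start time only, then start and end time (values taken from step 5).
oplog_query 1588138393
oplog_query 1588138393 1588142111
```

<p>The result can then be passed on the command line as -q "$(oplog_query 1588138393)", which keeps the shell from touching the $ signs inside the JSON.</p>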
<p>&nbsp;</p>
<pre><span style="color: rgba(0, 128, 0, 1)">## 6. Inspect oplog.rs.bson and find the timestamp of the accidental deletion by hand:</span>
./bsondump /data/tmp/rs1/local/oplog.rs.bson &gt; /data/tmp/1
<span style="color: rgba(0, 128, 0, 1)"># Open /data/tmp/1 and search it manually: a dropped collection or database shows up as a drop entry, and deleted documents show up as "op":"d" entries</span>


<span style="color: rgba(0, 128, 0, 1)">## 7. Replace the oplog.bson in the daily full backup</span>
rm -rf /data/tmp/rs0/oplog.bson
mv /data/tmp/rs1/local/oplog.rs.bson /data/tmp/rs0/oplog.bson


<span style="color: rgba(0, 128, 0, 1)">## 8. Run the restore (mind the user privileges)</span>
./mongorestore -h 192.168.6.116:27017 -u admin -p admin --authenticationDatabase admin --oplogReplay --oplogLimit "1588232764:1" --dir /data/tmp/rs0/
<span style="color: rgba(0, 128, 0, 1)"># 1588232764 is the "t" of $timestamp and 1 is its "i". With this setting the oplog is replayed
# only up to entries before this point, which skips the first delete operation and everything after it and leaves the database in its pre-disaster state</span>


<span style="color: rgba(0, 128, 0, 1)">## 9. If the daily backup was taken without --oplog and with --gzip, restore that backup first,
# then run oplogReplay against the standalone oplog.rs.bson file.</span>
./mongorestore -h 192.168.6.116:27017 -u admin -p admin --authenticationDatabase admin /data/tmp/rs0/ --gzip
./mongorestore -h 192.168.6.116:27017 -u admin -p admin --authenticationDatabase admin --oplogReplay --oplogLimit "1588232764:1" /data/tmp/rs1/local/oplog.rs.bson
<span style="color: rgba(0, 128, 0, 1)"># This restore may fail with "applyOps field: no such field"; in that case the only option is to try the approach in step 8 above.</span></pre>
</div>
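<p>The manual search in step 6 can be narrowed with grep. The sketch below prints the timestamp of the first drop or delete entry in the "t:i" form that --oplogLimit expects. It assumes the bsondump text has one compact JSON document per line, beginning with {"ts":{"$timestamp":...}} as in the line quoted in step 4; the function name and the sample entries are made up, and the pattern is only a rough filter, so check the matched line by eye before using the value.</p>

```shell
# Print "t:i" for the first drop command or delete entry in a bsondump text file.
# If the matched line does not start with the expected ts prefix, it is printed unchanged.
first_destructive_ts() {
  grep -E '"op":"d"|"drop(Database)?":' "$1" | head -n 1 |
    sed -E 's/^\{"ts":\{"\$timestamp":\{"t":([0-9]+),"i":([0-9]+)\}\}.*$/\1:\2/'
}

# Made-up sample entries, only to show the shape of the input.
sample=$(mktemp)
cat > "$sample" <<'EOF'
{"ts":{"$timestamp":{"t":1588232700,"i":1}},"op":"i","ns":"test.names","o":{"_id":1}}
{"ts":{"$timestamp":{"t":1588232764,"i":1}},"op":"c","ns":"test.$cmd","o":{"drop":"names"}}
{"ts":{"$timestamp":{"t":1588232800,"i":2}},"op":"d","ns":"test.other","o":{"_id":2}}
EOF
first_destructive_ts "$sample"
rm -f "$sample"
```

<p>On a real dump the call would be first_destructive_ts /data/tmp/1, and the printed value is a candidate for the --oplogLimit argument in steps 8 and 9.</p>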
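<p>There is no need to worry about corrupting the data. Because oplog entries are idempotent, replaying them several times does not produce duplicate data. A document whose _id already exists is not restored, even if its other fields differ; a document whose _id does not exist is restored.</p>
<p>Of course, the backup and the oplog can also be restored to a standalone machine first, and the data then moved into production by export and import.</p>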
<p>&nbsp;</p>
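<p>When I tried restoring to a standalone instance, the same command sometimes failed and sometimes succeeded across runs, and I do not know why. In practice the only remedy was to retry a few times.</p>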
<p><img src="https://img2020.cnblogs.com/blog/1371859/202005/1371859-20200507112613516-714876790.png"></p>
<p>&nbsp;</p><br><br>
Source: https://www.cnblogs.com/frx9527/p/oplog.html