Python: The pickle Module Explained
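<p><span style="font-size: 16px"><strong>The pickle module in detail</strong></span></p><p>The <code class="xref py py-mod docutils literal notranslate"><span class="pre">pickle</span></code> module implements binary protocols for serializing and deserializing a Python object structure. "Pickling" is the process of converting a Python object hierarchy into a byte stream, and "unpickling" is the inverse operation, in which a byte stream (from a <span class="xref std std-term">binary file</span> or a <span class="xref std std-term">bytes-like object</span>) is converted back into an object hierarchy. The <code class="xref py py-mod docutils literal notranslate"><span class="pre">pickle</span></code> module is not secure against erroneous or maliciously constructed data: never unpickle data received from an untrusted or unauthenticated source.<span style="font-size: 15px"><br></span></p>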
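<p><span style="font-size: 15px">To make the security warning concrete, here is a minimal sketch of restricting which globals an unpickler may load, modeled on the approach described in the official documentation. The names <code>SAFE_BUILTINS</code>, <code>RestrictedUnpickler</code> and <code>restricted_loads</code> are illustrative, and the whitelist is an assumption, not a recommendation. This narrows the attack surface; it does not make untrusted pickles safe in general.</span></p>

```python
import builtins
import io
import pickle

# Illustrative whitelist (an assumption for this sketch): only these
# builtins may be looked up while unpickling.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle stream asks for a global object.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        # Refuse everything else before it is imported or called.
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    """Like pickle.loads(), but only whitelisted globals are allowed."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


# A harmless pickle loads normally.
print(restricted_loads(pickle.dumps([1, 2, range(15)])))

# A hand-written pickle that would call os.system() is rejected.
malicious = b"cos\nsystem\n(S'echo hello world'\ntR."
try:
    restricted_loads(malicious)
except pickle.UnpicklingError as exc:
    print("rejected:", exc)
```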
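<p><span style="font-size: 15px">Differences between the pickle protocols and JSON (JavaScript Object Notation):</span></p>
<p><span style="font-size: 15px"> 1. JSON is a text serialization format (it outputs unicode text, although most of the time it is then encoded to <code class="docutils literal notranslate"><span class="pre">utf-8</span></code>), while pickle is a binary serialization format;</span></p>
<p><span style="font-size: 15px"> 2. JSON is human-readable, while pickle is not;</span></p>
<p><span style="font-size: 15px"> 3. JSON is interoperable and widely used outside the Python ecosystem, while pickle is Python-specific;</span></p>
<p><span style="font-size: 15px"> 4. By default, JSON can only represent a subset of Python's built-in types and no custom classes; pickle can represent an extremely large number of Python types (many of them automatically, by clever use of Python's introspection facilities; complex cases can be handled by implementing specific object APIs).</span></p>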
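<p id="index-1"><span style="font-size: 15px">The pickle data format is specific to Python. Its advantage is that it is free of restrictions imposed by external standards such as JSON or XDR (which cannot represent pointer sharing); the drawback is that non-Python programs may not be able to reconstruct pickled Python objects.</span></p>
<p><span style="font-size: 15px">By default, the <code class="xref py py-mod docutils literal notranslate"><span class="pre">pickle</span></code> data format uses a relatively compact binary representation. If you need optimal size characteristics, pickled data can be compressed efficiently.</span></p>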
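<p><span style="font-size: 15px">The points above can be sketched in a few lines. The sample data and variable names are made up for illustration; <code>zlib</code> is just one of several standard-library compressors that could be used here.</span></p>

```python
import json
import pickle
import zlib

shared = [1, 2, 3]
# Two keys refer to the same list object; the set, bytes and complex
# values are built-in types that JSON cannot represent by default.
original = {
    "first": shared,
    "second": shared,
    "tags": {"a", "b"},
    "raw": b"\x00\x01",
    "z": 1 + 2j,
}

restored = pickle.loads(pickle.dumps(original))
assert restored == original
# Object sharing survives the round trip: still one list, referenced twice.
assert restored["first"] is restored["second"]

try:
    json.dumps(original)
except TypeError as exc:
    print("JSON cannot encode this object:", exc)

# Pickled data is an ordinary byte string, so it compresses like one.
records = [{"id": i, "name": "item"} for i in range(1000)]
payload = pickle.dumps(records)
compressed = zlib.compress(payload)
assert pickle.loads(zlib.decompress(compressed)) == records
print(len(payload), "bytes pickled,", len(compressed), "bytes compressed")
```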
<p><strong><span style="font-family: 'Microsoft YaHei'; font-size: 16px">Module Interface</span></strong></p>
<p><span style="font-size: 15px">To serialize an object hierarchy, simply call the <code class="xref py py-func docutils literal notranslate"><span class="pre">dumps()</span></code> function. Similarly, to deserialize a data stream, call the <code class="xref py py-func docutils literal notranslate"><span class="pre">loads()</span></code> function. If you want more control over serialization and deserialization, you can create a <code class="xref py py-class docutils literal notranslate"><span class="pre">Pickler</span></code> or an <code class="xref py py-class docutils literal notranslate"><span class="pre">Unpickler</span></code> object, respectively.</span></p>
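<p><span style="font-size: 15px">As a small illustration of the object-based interface, the sketch below writes two objects to one in-memory stream with a single <code>Pickler</code> and reads them back in order with a single <code>Unpickler</code>. The sample objects and variable names are arbitrary.</span></p>

```python
import io
import pickle

buffer = io.BytesIO()

# One Pickler can write several objects, one after another, to the same stream.
pickler = pickle.Pickler(buffer, protocol=pickle.HIGHEST_PROTOCOL)
pickler.dump({"step": 1})
pickler.dump([1, 2, 3])

# Rewind, then read the objects back in the order they were written.
buffer.seek(0)
unpickler = pickle.Unpickler(buffer)
first = unpickler.load()
second = unpickler.load()
print(first, second)
```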
<p><span style="font-size: 15px">The <code class="xref py py-mod docutils literal notranslate"><span class="pre">pickle</span></code> module provides the following constants:</span></p>
<dl class="data"><dt id="pickle.HIGHEST_PROTOCOL"><span style="font-size: 15px"><code class="descclassname">pickle.</code><code class="descname">HIGHEST_PROTOCOL</code></span></dt><dd>
<p><span style="font-size: 15px">An integer, the highest protocol version available. This value can be passed as the protocol argument to the functions <code class="xref py py-func docutils literal notranslate"><span class="pre">dump()</span></code> and <code class="xref py py-func docutils literal notranslate"><span class="pre">dumps()</span></code> as well as to the <code class="xref py py-class docutils literal notranslate"><span class="pre">Pickler</span></code> constructor.</span></p>
</dd></dl><dl class="data"><dt id="pickle.DEFAULT_PROTOCOL"><span style="font-size: 15px"><code class="descclassname">pickle.</code><code class="descname">DEFAULT_PROTOCOL</code></span></dt><dd>
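<p><span style="font-size: 15px">An integer, the default protocol version used for pickling. It may be less than <code class="xref py py-data docutils literal notranslate"><span class="pre">HIGHEST_PROTOCOL</span></code>. In Python 3.7 and earlier the default protocol is 3, a protocol designed for Python 3; Python 3.8 raised the default to 4, and later releases may raise it further.</span></p>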
</dd></dl>
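<p><span style="font-size: 15px">A quick way to see these constants in action is sketched below. The printed values depend on the Python version in use, so no specific output is shown.</span></p>

```python
import pickle

# Both constants are plain integers; their values depend on the interpreter.
print("default:", pickle.DEFAULT_PROTOCOL, "highest:", pickle.HIGHEST_PROTOCOL)

data = {"a": 123}

default_bytes = pickle.dumps(data)  # uses DEFAULT_PROTOCOL
highest_bytes = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)
negative_bytes = pickle.dumps(data, protocol=-1)  # negative selects HIGHEST_PROTOCOL

# Protocol 0 is the original ASCII-only format.
text_bytes = pickle.dumps(data, protocol=0)
print(text_bytes)

# Whatever protocol was used for writing, loads() detects it automatically.
for blob in (default_bytes, highest_bytes, negative_bytes, text_bytes):
    assert pickle.loads(blob) == data
```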
<p><span style="font-size: 15px">The <code class="xref py py-mod docutils literal notranslate"><span class="pre">pickle</span></code> module provides the following functions to make the pickling process more convenient:</span></p>
<dl class="function"><dt id="pickle.dump"><span style="font-size: 15px"><code class="descclassname">pickle.</code><code class="descname">dump</code><span class="sig-paren">(obj,file,protocol = None,*,fix_imports = True <span class="sig-paren">)</span></span></span></dt><dd>
<p><span style="font-size: 15px">Write the pickled representation of the object obj to the open file object file. This is equivalent to <code class="docutils literal notranslate"><span class="pre">Pickler(file,<span class="pre">protocol).dump(obj)</span></span></code>.</span></p>
<p><span style="font-size: 15px">The optional protocol argument is an integer that tells the pickler which protocol version to use; supported protocols are 0 to <code class="xref py py-data docutils literal notranslate"><span class="pre">HIGHEST_PROTOCOL</span></code>. If not specified, the default is <code class="xref py py-data docutils literal notranslate"><span class="pre">DEFAULT_PROTOCOL</span></code>. If a negative number is specified, <code class="xref py py-data docutils literal notranslate"><span class="pre">HIGHEST_PROTOCOL</span></code> is selected.</span></p>
<p><span style="font-size: 15px">The file argument must have a write() method that accepts a single bytes argument. It can therefore be an on-disk file opened for binary writing, an <code class="xref py py-class docutils literal notranslate"><span class="pre">io.BytesIO</span></code> instance, or any other custom object that meets this interface.</span></p>
<p><span style="font-size: 15px">If fix_imports is true and protocol is less than 3, pickle will try to map the new Python 3 names to the old module names used in Python 2, so that the pickle data stream is readable with Python 2.</span></p>
</dd></dl><dl class="function"><dt id="pickle.dumps"><span style="font-size: 15px"><code class="descclassname">pickle.</code><code class="descname">dumps</code><span class="sig-paren">(obj,protocol = None,*,fix_imports = True <span class="sig-paren">)</span></span></span></dt><dd>
<p><span style="font-size: 15px">Return the pickled representation of the object as a <code class="xref py py-class docutils literal notranslate"><span class="pre">bytes</span></code> object, instead of writing it to a file.</span></p>
<p><span style="font-size: 15px">The arguments protocol and fix_imports have the same meaning as in <code class="xref py py-func docutils literal notranslate"><span class="pre">dump()</span></code>.</span></p>
</dd></dl><dl class="function"><dt id="pickle.load"><span style="font-size: 15px"><code class="descclassname">pickle.</code><code class="descname">load</code><span class="sig-paren">(file, *, fix_imports=True, encoding="ASCII", errors="strict"<span class="sig-paren">)</span></span></span></dt><dd>
<p><span style="font-size: 15px">Read a pickled object representation from the open file object file and return the reconstituted object hierarchy specified therein. This is equivalent to <code class="docutils literal notranslate"><span class="pre">Unpickler(file).load()</span></code>.</span></p>
<p><span style="font-size: 15px">The protocol version of the pickle is detected automatically, so no protocol argument is needed. Bytes past the pickled representation of the object are ignored.</span></p>
<p><span style="font-size: 15px">The file argument must have two methods: a read() method that takes an integer argument, and a readline() method that takes no arguments. Both methods should return bytes. The file can therefore be an on-disk file opened for binary reading, an <code class="xref py py-class docutils literal notranslate"><span class="pre">io.BytesIO</span></code> object, or any other custom object that meets this interface.</span></p>
<p><span style="font-size: 15px">The optional keyword arguments fix_imports, encoding and errors control compatibility support for pickle streams generated by Python 2. If fix_imports is true, pickle will try to map the old Python 2 names to the new names used in Python 3. encoding and errors tell pickle how to decode 8-bit string instances pickled by Python 2; they default to 'ASCII' and 'strict', respectively. The encoding can be 'bytes' to read these 8-bit string instances as bytes objects. Using <code class="docutils literal notranslate"><span class="pre">encoding='latin1'</span></code> is required for unpickling NumPy arrays and instances of <code class="xref py py-class docutils literal notranslate"><span class="pre">datetime</span></code>, <code class="xref py py-class docutils literal notranslate"><span class="pre">date</span></code> and <code class="xref py py-class docutils literal notranslate"><span class="pre">time</span></code> pickled by Python 2.</span></p>
</dd></dl><dl class="function"><dt id="pickle.loads"><span style="font-size: 15px"><code class="descclassname">pickle.</code><code class="descname">loads</code><span class="sig-paren">(bytes_object, *, fix_imports=True, encoding="ASCII", errors="strict"<span class="sig-paren">)</span></span></span></dt><dd>
<p><span style="font-size: 15px">Read a pickled object hierarchy from a <code class="xref py py-class docutils literal notranslate"><span class="pre">bytes</span></code> object and return the reconstituted object hierarchy specified therein.</span></p>
<p><span style="font-size: 15px">The protocol version of the pickle is detected automatically, so no protocol argument is needed. Bytes past the pickled representation of the object are ignored.</span></p>
</dd></dl>
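<p><span style="font-size: 15px">The behaviour described for these functions can be checked in memory, without touching the disk. In the sketch below, <code>py2_style</code> is a hand-written protocol 0 pickle that imitates what Python 2 produces for an 8-bit <code>str</code>; it is an illustrative stand-in, not data taken from a real Python 2 program.</span></p>

```python
import io
import pickle

data = {"a": 123, "b": "ads"}

# dumps()/loads() work on bytes objects directly.
payload = pickle.dumps(data)
assert pickle.loads(payload) == data

# Bytes past the end of the pickled object are ignored.
assert pickle.loads(payload + b"trailing junk") == data

# dump()/load() accept any object with the right methods, e.g. io.BytesIO.
buffer = io.BytesIO()
pickle.dump(data, buffer)
buffer.seek(0)
assert pickle.load(buffer) == data

# Hand-written protocol 0 pickle of an 8-bit string (assumed stand-in
# for a value pickled by Python 2).
py2_style = b"S'abc'\n."
as_text = pickle.loads(py2_style)                     # decoded with 'ASCII'
as_bytes = pickle.loads(py2_style, encoding="bytes")  # kept as bytes
print(repr(as_text), repr(as_bytes))
```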
<div class="cnblogs_code">
<pre>import pickle

if __name__ == '__main__':
    path = 'test'
    # NOTE: the items of the list under 'c' were lost in the source;
    # [1, 2] is a placeholder.
    data = {'a': 123, 'b': 'ads', 'c': [1, 2]}
    # Write the pickled dict to a file opened in binary mode.
    with open(path, 'wb') as f:
        pickle.dump(data, f)
    # Read it back; the protocol version is detected automatically.
    with open(path, 'rb') as f1:
        data1 = pickle.load(f1)
    print(data1)</pre>
</div>
<p><img src="https://img2018.cnblogs.com/blog/1636554/201906/1636554-20190605214431479-1478128997.png" alt="" width="807" height="88"></p>
<p><span style="font-size: 15px">Datasets distributed in Python (pickle) format can be loaded with pickle. The following example reads and loads the CIFAR-10 dataset:</span></p>
<div class="cnblogs_code">
<pre>import pickle

import numpy as np
import matplotlib.pyplot as plt

# Raw strings keep the Windows backslashes literal.
path1 = r'D:\tmp\cifar10_data\cifar-10-batches-py\data_batch_1'
path2 = r'D:\tmp\cifar10_data\cifar-10-batches-py\data_batch_2'
path3 = r'D:\tmp\cifar10_data\cifar-10-batches-py\data_batch_3'
path4 = r'D:\tmp\cifar10_data\cifar-10-batches-py\data_batch_4'
path5 = r'D:\tmp\cifar10_data\cifar-10-batches-py\data_batch_5'
path6 = r'D:\tmp\cifar10_data\cifar-10-batches-py\test_batch'

if __name__ == '__main__':
    with open(path1, 'rb') as fo:
        # The batches were pickled by Python 2; load its 8-bit strings as bytes.
        data = pickle.load(fo, encoding='bytes')
    # The dict keys are bytes objects. The subscripts below were lost in the
    # source and are reconstructed from the CIFAR-10 batch layout.
    # print(data[b'batch_label'])
    # print(data[b'labels'])
    # print(data[b'data'])
    # print(data[b'filenames'])
    print(data[b'data'].shape)       # (10000, 3072)
    images_batch = np.array(data[b'data'])
    images = images_batch.reshape([-1, 3, 32, 32])
    print(images.shape)              # (10000, 3, 32, 32)
    # Pick one image; index 0 is an arbitrary choice (the original index was lost).
    imgs = images[0].reshape([3, 32, 32])
    # Stack the R, G and B planes along the last axis: height x width x channel.
    img = np.stack((imgs[0], imgs[1], imgs[2]), 2)
    print(img.shape)                 # (32, 32, 3)
    plt.imshow(img)
    plt.axis('off')
    plt.show()</pre>
</div>
<p><span style="font-size: 15px">Output:</span></p>
<p><img src="https://img2018.cnblogs.com/blog/1636554/201906/1636554-20190608095130249-1829743208.png" alt=""></p>
<p><img src="https://img2018.cnblogs.com/blog/1636554/201906/1636554-20190608095148798-1674177438.png" alt="" width="480" height="440"></p>
<p>The data can now be read and used for training.</p>
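<p><span style="font-size: 15px">For training, it is usually more convenient to convert a whole batch at once than to stack one image at a time. The following is a minimal sketch under stated assumptions: <code>batch_to_images</code> is a hypothetical helper, and <code>fake_batch</code> is synthetic data that only mimics the batch layout (an N x 3072 <code>uint8</code> array under <code>b'data'</code>, a label list under <code>b'labels'</code>) so the example runs without the dataset.</span></p>

```python
import io
import pickle

import numpy as np


def batch_to_images(batch):
    """Convert an N x 3072 batch into an N x 32 x 32 x 3 image array.

    Each row holds 1024 red, then 1024 green, then 1024 blue values,
    so we reshape to (N, 3, 32, 32) and move the channel axis last.
    """
    data = np.asarray(batch[b"data"], dtype=np.uint8)
    images = data.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)
    labels = np.asarray(batch[b"labels"])
    return images, labels


# Synthetic stand-in for a real batch file (an assumption of this sketch).
raw = (np.arange(2 * 3072) % 256).astype(np.uint8).reshape(2, 3072)
fake_batch = {b"data": raw, b"labels": [3, 7]}

# Round-trip through pickle, the same way a real batch file would be read.
buffer = io.BytesIO(pickle.dumps(fake_batch))
loaded = pickle.load(buffer, encoding="bytes")

images, labels = batch_to_images(loaded)
print(images.shape, labels.shape)
```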
Source: https://www.cnblogs.com/baby-lily/p/10990026.html