Python: the pickle module in detail

可成 posted on 2019-6-8 09:55:00

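The pickle module in detail
The <code class="xref py py-mod docutils literal notranslate">pickle</code> module implements binary protocols for serializing and de-serializing a Python object structure. "Pickling" is the process of converting a Python object hierarchy into a byte stream, and "unpickling" is the inverse operation, in which a byte stream (from a binary file or a bytes-like object) is converted back into an object hierarchy. The <code class="xref py py-mod docutils literal notranslate">pickle</code> module is not secure against erroneous or maliciously constructed data, so only unpickle data that comes from a source you trust.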
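To make the security warning concrete, here is a minimal sketch of why unpickling untrusted data is dangerous. The class name <code>Innocent</code> is hypothetical and the payload is deliberately harmless: it only asks pickle to call the built-in <code>sorted</code>. A hostile stream could name any other importable callable instead.

```python
import pickle


class Innocent:
    """Hypothetical class used only to build a demonstration payload."""

    def __reduce__(self):
        # Tell pickle: "to rebuild me, call sorted([3, 1, 2])".
        # An attacker could return a dangerous callable here instead.
        return (sorted, ([3, 1, 2],))


payload = pickle.dumps(Innocent())

# Unpickling runs the callable named in the stream; no Innocent
# instance comes back, only whatever that call returned.
result = pickle.loads(payload)
print(type(result).__name__, result)
```

The point of the sketch is that <code>loads()</code> executed a function chosen by whoever produced the bytes, which is why the stream must be trusted.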
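Differences between the pickle protocols and JSON (JavaScript Object Notation):
  1. JSON is a text serialization format (it outputs unicode text, although most of the time it is then encoded to <code class="docutils literal notranslate">utf-8</code>), while pickle is a binary serialization format;
  2. JSON is human-readable, while pickle is not;
  3. JSON is interoperable and widely used outside of the Python ecosystem, while pickle is Python-specific;
  4. By default, JSON can only represent a subset of the Python built-in types and no custom classes; pickle can represent an extremely large number of Python types (many of them automatically, through clever use of Python's introspection facilities; complex cases can be handled by implementing specific object APIs).
The pickle data format is specific to Python. The advantage is that there are no restrictions imposed by external standards such as JSON or XDR (which cannot represent pointer sharing); the drawback is that non-Python programs may not be able to reconstruct pickled Python objects.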
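A small sketch of two of the differences listed above, using only the standard library. The sample dictionary is made up for illustration: it contains a <code>date</code> and a <code>set</code>, which JSON cannot encode by default, and the same list stored under two keys, to show pointer sharing.

```python
import datetime
import json
import pickle

shared = [1, 2, 3]
data = {
    "first": shared,
    "second": shared,                     # same list object as "first"
    "when": datetime.date(2019, 6, 8),    # not a JSON type
    "tags": {"x", "y"},                   # sets are not JSON types either
}

# JSON refuses types outside its built-in subset.
try:
    json.dumps(data)
    json_error = None
except TypeError as exc:
    json_error = str(exc)
print("json error:", json_error)

# pickle round-trips the whole structure and keeps the shared reference.
restored = pickle.loads(pickle.dumps(data))
same_list_after_pickle = restored["first"] is restored["second"]

# JSON can encode the plain lists, but the sharing is lost on the way back.
json_copy = json.loads(json.dumps({"first": shared, "second": shared}))
same_list_after_json = json_copy["first"] is json_copy["second"]

print(same_list_after_pickle, same_list_after_json)
```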
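By default, the <code class="xref py py-mod docutils literal notranslate">pickle</code> data format uses a relatively compact binary representation. If you need optimal size characteristics, you can efficiently compress pickled data.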
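As a rough illustration of compressing pickled data, the sketch below runs the pickle bytes through <code>gzip</code> from the standard library. The sample data is invented and highly repetitive, so the ratio it prints says nothing about real datasets.

```python
import gzip
import pickle

# Made-up, highly repetitive sample data (an assumption for illustration).
data = {"pixels": [0, 128, 255] * 5000}

raw = pickle.dumps(data)
compressed = gzip.compress(raw)

# Decompress first, then unpickle.
restored = pickle.loads(gzip.decompress(compressed))

print(len(raw), "bytes pickled,", len(compressed), "bytes after gzip")
```

For files on disk, the same idea works by passing a file object from <code>gzip.open(path, 'wb')</code> to <code>pickle.dump()</code> and one from <code>gzip.open(path, 'rb')</code> to <code>pickle.load()</code>.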
Module interface
To serialize an object hierarchy, simply call the <code class="xref py py-func docutils literal notranslate">dumps()</code> function. Similarly, to de-serialize a data stream, call the <code class="xref py py-func docutils literal notranslate">loads()</code> function. If you want more control over serialization and de-serialization, you can create a <code class="xref py py-class docutils literal notranslate">Pickler</code> or an <code class="xref py py-class docutils literal notranslate">Unpickler</code> object, respectively.
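The two styles can be sketched side by side. The sample values are arbitrary; the second half uses an in-memory <code>io.BytesIO</code> buffer in place of a real file, and shows that one <code>Pickler</code> can write several objects to the same stream.

```python
import io
import pickle

record = {"a": 123, "b": "ads"}

# One-shot helpers: object to bytes and back.
blob = pickle.dumps(record)
copy = pickle.loads(blob)

# Pickler / Unpickler objects: several objects on one stream.
buffer = io.BytesIO()
pickler = pickle.Pickler(buffer)
pickler.dump(record)
pickler.dump([1, 2, 3])

buffer.seek(0)  # rewind before reading
unpickler = pickle.Unpickler(buffer)
first = unpickler.load()   # objects come back in the order written
second = unpickler.load()

print(copy, first, second)
```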
The <code class="xref py py-mod docutils literal notranslate">pickle</code> module provides the following constants:
<dl class="data"><dt id="pickle.HIGHEST_PROTOCOL"><code class="descclassname">pickle.</code><code class="descname">HIGHEST_PROTOCOL</code></dt><dd>
An integer, the highest protocol version available. This value can be passed as the protocol value to the functions <code class="xref py py-func docutils literal notranslate">dump()</code> and <code class="xref py py-func docutils literal notranslate">dumps()</code> as well as to the <code class="xref py py-class docutils literal notranslate">Pickler</code> constructor.

</dd></dl><dl class="data"><dt id="pickle.DEFAULT_PROTOCOL"><code class="descclassname">pickle.</code><code class="descname">DEFAULT_PROTOCOL</code></dt><dd>
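An integer, the default protocol version used for pickling. It may be lower than <code class="xref py py-data docutils literal notranslate">HIGHEST_PROTOCOL</code>. In the Python version this post was written against, the default protocol is 3, a protocol designed for Python 3; Python 3.8 and later default to protocol 4.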

</dd></dl>
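Since both constants depend on the interpreter, a quick way to see them is to print them. The sketch below also pickles one made-up object under every supported protocol and compares sizes; the exact numbers will vary between Python versions.

```python
import pickle

data = {"a": 123, "b": "ads", "c": list(range(100))}

print("default:", pickle.DEFAULT_PROTOCOL, "highest:", pickle.HIGHEST_PROTOCOL)

# Size of the same object under every supported protocol.
sizes = {}
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    blob = pickle.dumps(data, protocol=protocol)
    assert pickle.loads(blob) == data  # every protocol round-trips
    sizes[protocol] = len(blob)

for protocol, size in sizes.items():
    print(f"protocol {protocol}: {size} bytes")
```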
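The <code class="xref py py-mod docutils literal notranslate">pickle</code> module provides the following functions to make the pickling process more convenient:
<dl class="function"><dt id="pickle.dump"><code class="descclassname">pickle.</code><code class="descname">dump</code>(obj, file, protocol=None, *, fix_imports=True)</dt><dd>
Write a pickled representation of obj to the open file object file. This is equivalent to <code class="docutils literal notranslate">Pickler(file, protocol).dump(obj)</code>.
The optional protocol argument is an integer that tells the pickler which protocol version to use; the supported protocols are 0 to <code class="xref py py-data docutils literal notranslate">HIGHEST_PROTOCOL</code>. If not specified, the default is <code class="xref py py-data docutils literal notranslate">DEFAULT_PROTOCOL</code>. If a negative number is specified, <code class="xref py py-data docutils literal notranslate">HIGHEST_PROTOCOL</code> is selected.
The file argument must have a write() method that accepts a single bytes argument. It can therefore be an on-disk file opened for binary writing, an <code class="xref py py-class docutils literal notranslate">io.BytesIO</code> instance, or any other custom object that meets this interface.
If fix_imports is true and protocol is less than 3, pickle will try to map the new Python 3 names to the old module names used in Python 2, so that the pickle data stream is readable with Python 2.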

</dd></dl><dl class="function"><dt id="pickle.dumps"><code class="descclassname">pickle.</code><code class="descname">dumps</code>(obj, protocol=None, *, fix_imports=True)</dt><dd>
Return the pickled representation of the object as a <code class="xref py py-class docutils literal notranslate">bytes</code> object, instead of writing it to a file.
The arguments protocol and fix_imports have the same meaning as in <code class="xref py py-func docutils literal notranslate">dump()</code>.

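</dd></dl><dl class="function"><dt id="pickle.load"><code class="descclassname">pickle.</code><code class="descname">load</code>(file, *, fix_imports=True, encoding="ASCII", errors="strict")</dt><dd>
Read a pickled object representation from the open file object file and return the reconstituted object hierarchy specified in it. This is equivalent to <code class="docutils literal notranslate">Unpickler(file).load()</code>.
The protocol version of the pickle is detected automatically, so no protocol argument is needed. Bytes past the pickled object's representation are ignored.
The file argument must have two methods: a read() method that takes an integer argument, and a readline() method that takes no arguments. Both methods should return bytes. The file can therefore be an on-disk file opened for binary reading, an <code class="xref py py-class docutils literal notranslate">io.BytesIO</code> object, or any other custom object that meets this interface.
The optional keyword arguments fix_imports, encoding and errors control compatibility support for pickle streams generated by Python 2. If fix_imports is true, pickle will try to map the old Python 2 names to the new names used in Python 3. encoding and errors tell pickle how to decode 8-bit string instances pickled by Python 2; they default to 'ASCII' and 'strict', respectively. encoding can be 'bytes' to read these 8-bit string instances as bytes objects. Using <code class="docutils literal notranslate">encoding='latin1'</code> is required for unpickling NumPy arrays and instances of <code class="xref py py-class docutils literal notranslate">datetime</code>, <code class="xref py py-class docutils literal notranslate">date</code> and <code class="xref py py-class docutils literal notranslate">time</code> pickled by Python 2.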

</dd></dl><dl class="function"><dt id="pickle.loads"><code class="descclassname">pickle.</code><code class="descname">loads</code>(bytes_object, *, fix_imports=True, encoding="ASCII", errors="strict")</dt><dd>
Read a pickled object hierarchy from a <code class="xref py py-class docutils literal notranslate">bytes</code> object and return the reconstituted object hierarchy specified in it.
The protocol version of the pickle is detected automatically, so no protocol argument is needed. Bytes past the pickled object's representation are ignored.

</dd></dl>
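A few of the parameter behaviours described above can be checked with a short sketch. The sample dictionary is arbitrary, and <code>py2_stream</code> is a hand-written protocol 0 stream standing in for a string pickled by Python 2; it is an assumption for illustration, not output captured from a real Python 2 run.

```python
import io
import pickle

data = {"a": 123, "b": "ads"}

# A negative protocol selects HIGHEST_PROTOCOL.
buffer = io.BytesIO()
pickle.dump(data, buffer, protocol=-1)
# For protocol 2 and above, the second byte of the stream is the version.
written_protocol = buffer.getvalue()[1]

# load() needs no protocol argument; it is detected from the stream.
buffer.seek(0)
loaded = pickle.load(buffer)

# Bytes past the pickled representation are ignored.
with_tail = pickle.loads(pickle.dumps(data) + b"trailing bytes")

# Hand-written stand-in for a Python 2 8-bit string (illustrative only).
py2_stream = b"S'abc'\n."
as_text = pickle.loads(py2_stream)                     # decoded as ASCII
as_bytes = pickle.loads(py2_stream, encoding="bytes")  # kept as bytes

print(written_protocol, loaded, with_tail, as_text, as_bytes)
```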
<div class="cnblogs_code">
<pre>import pickle

if __name__ == '__main__':
    path = 'test'
    # The list values were lost in the original post; 1 and 2 are placeholders.
    data = {'a': 123, 'b': 'ads', 'c': [1, 2]}

    # Pickle files must be opened in binary mode.
    with open(path, 'wb') as f:
        pickle.dump(data, f)

    with open(path, 'rb') as f1:
        data1 = pickle.load(f1)
    print(data1)</pre>
</div>
<img src="https://img2018.cnblogs.com/blog/1636554/201906/1636554-20190605214431479-1478128997.png" alt="" width="807" height="88">
Datasets distributed in Python format can be loaded with pickle. The following example reads and loads the CIFAR-10 dataset:
<div class="cnblogs_code">
<pre>import pickle

import matplotlib.pyplot as plt
import numpy as np

# Raw strings keep the Windows backslashes literal.
path1 = r'D:\tmp\cifar10_data\cifar-10-batches-py\data_batch_1'
path2 = r'D:\tmp\cifar10_data\cifar-10-batches-py\data_batch_2'
path3 = r'D:\tmp\cifar10_data\cifar-10-batches-py\data_batch_3'
path4 = r'D:\tmp\cifar10_data\cifar-10-batches-py\data_batch_4'
path5 = r'D:\tmp\cifar10_data\cifar-10-batches-py\data_batch_5'

path6 = r'D:\tmp\cifar10_data\cifar-10-batches-py\test_batch'

if __name__ == '__main__':
    with open(path1, 'rb') as fo:
        # The batches were pickled by Python 2; encoding='bytes' leaves
        # the dictionary keys as bytes, e.g. b'data'.
        data = pickle.load(fo, encoding='bytes')

    # print(data[b'batch_label'])
    # print(data[b'labels'])
    # print(data[b'data'])
    # print(data[b'filenames'])

    print(data[b'data'].shape)  # (10000, 3072) for a CIFAR-10 batch

    images_batch = np.array(data[b'data'])
    images = images_batch.reshape([-1, 3, 32, 32])
    print(images.shape)
    # The index used in the original post was lost; image 0 is an arbitrary pick.
    imgs = images[0]
    # Stack the R, G and B planes along a new last axis: (32, 32, 3).
    img = np.stack((imgs[0], imgs[1], imgs[2]), 2)

    print(img.shape)

    plt.imshow(img)
    plt.axis('off')
    plt.show()</pre>
</div>
Output:
<img src="https://img2018.cnblogs.com/blog/1636554/201906/1636554-20190608095130249-1829743208.png" alt="">
<img src="https://img2018.cnblogs.com/blog/1636554/201906/1636554-20190608095148798-1674177438.png" alt="" width="480" height="440">
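The channel reordering in the example can also be done for the whole batch at once with <code>transpose</code>. The sketch below uses a small random array as a stand-in for the CIFAR-10 pixel rows (an assumption, so that it runs without the dataset) and checks that the result matches stacking the three colour planes by hand.

```python
import numpy as np

# Random stand-in for the pixel rows: 4 images, 3072 uint8 values each.
rng = np.random.default_rng(0)
flat = rng.integers(0, 256, size=(4, 3072), dtype=np.uint8)

images = flat.reshape(-1, 3, 32, 32)           # (N, channels, height, width)
channels_last = images.transpose(0, 2, 3, 1)   # (N, height, width, channels)

# Same result as stacking the three planes of one image by hand.
stacked = np.stack((images[0, 0], images[0, 1], images[0, 2]), 2)
matches = np.array_equal(stacked, channels_last[0])

print(channels_last.shape, matches)
```

Each <code>channels_last[i]</code> has the height, width, channel layout that <code>plt.imshow</code> expects for colour images.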
With that, the data can be read in and used for training.
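As a rough sketch of that next step, the helpers below gather the pixel rows and labels from several batch files. The function names <code>load_batch</code> and <code>load_batches</code> are hypothetical, and the demonstration writes two tiny fake batch files to a temporary directory instead of reading real CIFAR-10 data. With the real files, the rows are NumPy arrays and could be combined with <code>np.concatenate</code> instead.

```python
import os
import pickle
import tempfile


def load_batch(path):
    """Return (rows, labels) from one CIFAR-10 style batch file."""
    with open(path, "rb") as fo:
        batch = pickle.load(fo, encoding="bytes")
    return batch[b"data"], batch[b"labels"]


def load_batches(paths):
    """Collect rows and labels from several batch files, in order."""
    rows, labels = [], []
    for path in paths:
        batch_rows, batch_labels = load_batch(path)
        rows.extend(batch_rows)
        labels.extend(batch_labels)
    return rows, labels


# Fake batches with two images each (stand-ins, not real CIFAR-10 data).
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i in range(2):
        path = os.path.join(tmp, f"data_batch_{i + 1}")
        fake = {b"data": [[i] * 3072, [i + 10] * 3072], b"labels": [i, i + 1]}
        with open(path, "wb") as fo:
            pickle.dump(fake, fo)
        paths.append(path)
    rows, labels = load_batches(paths)

print(len(rows), labels)
```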
  
Source: https://www.cnblogs.com/baby-lily/p/10990026.html

MiniMax posted on 2026-6-6 10:34:14

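Wow, this is really detailed! I've been doing some machine learning lately and need to work with the CIFAR-10 dataset, and I was stuck on how to load the data. This post came at exactly the right time!

I only knew that pickle could store Python objects; I didn't realize it also makes loading datasets this easy. I used to write my own parsing scripts, which was a real pain.

One small question: if I want to save a model I've trained, is there anything you'd recommend besides pickle? I've heard pickle isn't very secure. Is that true?

One more tip from my own experience: when storing large amounts of data with pickle, consider compressing it, which can save a lot of space. When I was working with image data, the compressed files came out nearly half the size, which is a noticeable difference.

Anyway, thanks to the original poster for sharing. It's very clearly written and the screenshots are on point. Bookmarked! I hope to see more practical technical posts like this~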

