Byte order in Go: encoding/binary

爱看新闻的湖北佬, posted on 2020-2-17 21:37:00

<p> </p>
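<h1>Byte order</h1>
<p>Byte order (endianness) is the order in which the bytes of a multi-byte data type (int, float, and so on) are stored in memory. On the network, both text-based protocols (JSON, for example) and binary protocols ultimately exchange bytes, and a binary protocol has to fix a byte order for every multi-byte value it puts into a packet.</p>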
<div>
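<p><strong>There are two byte orders: big-endian stores the most significant byte at the lowest address; little-endian does the opposite and stores the least significant byte at the lowest address.</strong></p>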
<div class="image-package">
<div class="image-container">
<div class="image-view" data-width="402" data-height="337"><img src="//upload-images.jianshu.io/upload_images/18473-5932c0bfcc2abac9.png?imageMogr2/auto-orient/strip|imageView2/2/w/402/format/webp" alt="" data-original-src="//upload-images.jianshu.io/upload_images/18473-5932c0bfcc2abac9.png" data-original-width="402" data-original-height="337" data-original-format="image/png" data-original-filesize="6082" data-image-index="0"></div>
</div>
</div>
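<p>Inside the machine, little-endian is widely used by modern CPUs to store data in memory, while other settings such as network transmission and file storage commonly use big-endian.</p>
<p>Network protocols that carry binary numbers conventionally use big-endian, which is why it is also called network byte order. Because big-endian puts the most significant byte first (the high-order byte sits at the lowest address), the encoded bytes of unsigned integers of the same width sort lexicographically in the same order as the numbers themselves, so two encoded values can be compared byte by byte.</p>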
</div>
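<p>The layout and the sorting property can be seen in a short sketch. The helper names (encodeBE, encodeLE, compareBE, compareLE) are made up for this illustration; only the encoding/binary and bytes calls come from the standard library.</p>

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// encodeBE returns the 4-byte big-endian encoding of v.
func encodeBE(v uint32) []byte {
	b := make([]byte, 4)
	binary.BigEndian.PutUint32(b, v)
	return b
}

// encodeLE returns the 4-byte little-endian encoding of v.
func encodeLE(v uint32) []byte {
	b := make([]byte, 4)
	binary.LittleEndian.PutUint32(b, v)
	return b
}

// compareBE compares a and b by looking only at their big-endian bytes.
func compareBE(a, b uint32) int { return bytes.Compare(encodeBE(a), encodeBE(b)) }

// compareLE compares a and b by looking only at their little-endian bytes.
func compareLE(a, b uint32) int { return bytes.Compare(encodeLE(a), encodeLE(b)) }

func main() {
	fmt.Printf("big-endian:    % x\n", encodeBE(0x0A0B0C0D)) // 0a 0b 0c 0d
	fmt.Printf("little-endian: % x\n", encodeLE(0x0A0B0C0D)) // 0d 0c 0b 0a

	// 255 < 256 numerically; only the big-endian bytes agree with that.
	fmt.Println(compareBE(255, 256)) // -1
	fmt.Println(compareLE(255, 256)) // 1
}
```

<p>The byte-wise comparison matches the numeric one only for unsigned values of equal width; signed values would need extra care.</p>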
<p> </p>
<h1>Fixed-length encoding</h1>
<p>Go has several integer types, int8, int16, int32 and int64, which occupy 1, 2, 4 and 8 bytes respectively. These are called fixed-length types.</p>
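<p>The sizes can be checked with binary.Size, which reports how many bytes encoding/binary would use for a value. The function names encodedSizes and plainIntSize are hypothetical helpers for this sketch.</p>

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodedSizes returns how many bytes encoding/binary uses for
// int8, int16, int32 and int64, in that order.
func encodedSizes() [4]int {
	return [4]int{
		binary.Size(int8(0)),
		binary.Size(int16(0)),
		binary.Size(int32(0)),
		binary.Size(int64(0)),
	}
}

// plainIntSize asks for the size of a plain int. Its width depends on
// the platform, so encoding/binary does not treat it as fixed-size.
func plainIntSize() int {
	return binary.Size(int(0))
}

func main() {
	fmt.Println(encodedSizes()) // [1 2 4 8]
	fmt.Println(plainIntSize()) // -1
}
```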
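<h3>Handling byte order for fixed-length values in Go</h3>
<p>Go's byte-order code lives in encoding/binary. The package-level variable BigEndian works with big-endian data and LittleEndian with little-endian data; the types of both variables implement the ByteOrder interface:</p>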
<div class="cnblogs_Highlighter">
<pre class="brush:go;gutter:true;">type ByteOrder interface {
Uint16([]byte) uint16
Uint32([]byte) uint32
Uint64([]byte) uint64
PutUint16([]byte, uint16)
PutUint32([]byte, uint32)
PutUint64([]byte, uint64)
String() string
}
</pre>
</div>
<p>The first three methods read a value from a byte slice; the three Put methods write a value into one.</p>
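<p>Because both byte orders satisfy the same interface, code can take a binary.ByteOrder parameter and stay independent of the order chosen. A minimal sketch, with encode16 and describe as made-up helper names:</p>

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encode16 writes v into a fresh 2-byte slice using whichever
// byte order the caller passes in.
func encode16(order binary.ByteOrder, v uint16) []byte {
	b := make([]byte, 2)
	order.PutUint16(b, v)
	return b
}

// describe shows v encoded under both byte orders, labelled with
// the name that each order's String method returns.
func describe(v uint16) string {
	be, le := binary.BigEndian, binary.LittleEndian
	return fmt.Sprintf("%s=%x %s=%x",
		be.String(), encode16(be, v),
		le.String(), encode16(le, v))
}

func main() {
	fmt.Println(describe(0x0102)) // BigEndian=0102 LittleEndian=0201
}
```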
<p>All of these methods work on unsigned integers. What about signed integers? An explicit conversion is enough, for example:</p>
<div class="cnblogs_Highlighter">
<pre class="brush:go;gutter:true;">func PutInt32(b []byte, v int32) {
   binary.BigEndian.PutUint32(b, uint32(v))
}</pre>
</div>
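<p>Reading works the same way in reverse: decode as uint32, then convert back to int32. The sketch below pairs the write helper with a matching read helper; getInt32 and roundTrip are names invented here.</p>

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// putInt32 stores a signed value by reinterpreting it as uint32.
func putInt32(b []byte, v int32) {
	binary.BigEndian.PutUint32(b, uint32(v))
}

// getInt32 is the matching read: decode as uint32, then convert back.
func getInt32(b []byte) int32 {
	return int32(binary.BigEndian.Uint32(b))
}

// roundTrip encodes v and decodes it again.
func roundTrip(v int32) int32 {
	b := make([]byte, 4)
	putInt32(b, v)
	return getInt32(b)
}

func main() {
	b := make([]byte, 4)
	putInt32(b, -2)
	fmt.Printf("% x\n", b)   // ff ff ff fe (two's complement)
	fmt.Println(getInt32(b)) // -2
}
```

<p>The conversion keeps the bit pattern unchanged, so negative numbers survive the round trip.</p>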
<p>BigEndian and LittleEndian implement the ByteOrder interface:</p>
<div class="cnblogs_Highlighter">
<pre class="brush:go;gutter:true;">//BigEndian is the big-endian implementation of ByteOrder.
var BigEndian bigEndian

//LittleEndian is the little-endian implementation of ByteOrder.
var LittleEndian littleEndian</pre>
</div>
<p>For example, write a fixed-length number into a byte slice, then read it back from the slice into a variable:</p>
<div class="cnblogs_Highlighter">
<pre class="brush:go;gutter:true;">// write
v := uint32(500)
buf := make([]byte, 4)
binary.BigEndian.PutUint32(buf, v)

// read
x := binary.BigEndian.Uint32(buf)</pre>
</div>
<p> </p>
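<p>Note that the Put methods need a slice that is long enough and panic otherwise. Also, when reading from a stream, use io.ReadFull to make sure all the raw bytes of the value have arrived before decoding them, because a single Read call may return fewer bytes than requested.</p>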
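<p>A sketch of the stream case follows. readUint32 and decodeTrickled are hypothetical helpers; iotest.OneByteReader from the standard library is used only to imitate a connection that delivers one byte per Read call.</p>

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
	"testing/iotest"
)

// readUint32 reads exactly four bytes from r and decodes them as a
// big-endian uint32. io.ReadFull keeps calling Read until the buffer
// is full and returns an error if the stream ends early.
func readUint32(r io.Reader) (uint32, error) {
	var buf [4]byte
	if _, err := io.ReadFull(r, buf[:]); err != nil {
		return 0, err
	}
	return binary.BigEndian.Uint32(buf[:]), nil
}

// decodeTrickled feeds data to readUint32 one byte per Read call.
func decodeTrickled(data []byte) (uint32, error) {
	return readUint32(iotest.OneByteReader(bytes.NewReader(data)))
}

func main() {
	v, err := decodeTrickled([]byte{0x00, 0x00, 0x01, 0xf4})
	fmt.Println(v, err) // 500 <nil>

	_, err = decodeTrickled([]byte{0x00, 0x00})
	fmt.Println(err) // unexpected EOF
}
```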
<p>How Go handles big-endian and little-endian data:</p>
<div class="cnblogs_Highlighter">
<pre class="brush:go;gutter:true;">package main

import (
	"encoding/binary"
	"fmt"
	"unsafe"
)

// INT_SIZE is the size of an int in bytes on this platform.
const INT_SIZE int = int(unsafe.Sizeof(0))

// systemEndian reports the byte order of the machine the program runs on.
func systemEndian() {
	var i int = 0x1
	// Look at the byte stored at the lowest address of i.
	first := *(*byte)(unsafe.Pointer(&i))
	if first == 1 {
		// The least significant byte comes first.
		fmt.Println("system endian is little endian")
	} else {
		fmt.Println("system endian is big endian")
	}
}

func testBigEndian() {
	// 256 = 0000 0000 0000 0000 0000 0001 0000 0000
	var testInt int32 = 256
	fmt.Printf("%d use big endian: \n", testInt)
	testBytes := make([]byte, 4)
	binary.BigEndian.PutUint32(testBytes, uint32(testInt))
	fmt.Println("int32 to bytes:", testBytes) // [0 0 1 0]

	convInt := int32(binary.BigEndian.Uint32(testBytes))
	fmt.Printf("bytes to int32: %d\n\n", convInt)
}

func testLittleEndian() {
	// 256 = 0000 0000 0000 0000 0000 0001 0000 0000
	var testInt int32 = 256
	fmt.Printf("%d use little endian: \n", testInt)
	testBytes := make([]byte, 4)
	binary.LittleEndian.PutUint32(testBytes, uint32(testInt))
	fmt.Println("int32 to bytes:", testBytes) // [0 1 0 0]

	convInt := int32(binary.LittleEndian.Uint32(testBytes))
	fmt.Printf("bytes to int32: %d\n\n", convInt)
}

func main() {
	systemEndian()
	fmt.Println("")
	testBigEndian()
	testLittleEndian()
}</pre>
</div>
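<p>Host detection can also return a binary.ByteOrder value so the rest of the program can use it directly. This is only a sketch: hostBytes, hostOrder, hostOrderName and consistent are names invented here, and Go 1.21 and later also provide binary.NativeEndian for the same purpose.</p>

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unsafe"
)

// hostBytes returns the two bytes of 0x0102 exactly as this machine
// lays them out in memory.
func hostBytes() [2]byte {
	var v uint16 = 0x0102
	return *(*[2]byte)(unsafe.Pointer(&v))
}

// hostOrder picks the ByteOrder that matches the machine.
func hostOrder() binary.ByteOrder {
	if hostBytes()[0] == 0x02 {
		return binary.LittleEndian // low byte sits at the lowest address
	}
	return binary.BigEndian
}

// hostOrderName returns "LittleEndian" or "BigEndian".
func hostOrderName() string { return hostOrder().String() }

// consistent checks the detection: decoding the raw memory with the
// detected order must give back the original value.
func consistent() bool {
	b := hostBytes()
	return hostOrder().Uint16(b[:]) == 0x0102
}

func main() {
	fmt.Println(hostOrderName(), consistent())
}
```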
<h3>Fixed-length stream processing in Go</h3>
<p>The binary package has built-in functions for reading and writing fixed-length values from and to a stream:</p>
<div class="cnblogs_Highlighter">
<pre class="brush:go;gutter:true;">func Read(r io.Reader, order ByteOrder, data interface{}) error
func Write(w io.Writer, order ByteOrder, data interface{}) error</pre>
</div>
<p>Read decodes bytes from r into the data variable using the given byte order. When decoding a boolean, a zero byte (that is, []byte{0x00}) is false and any other value is true.</p>
<div class="cnblogs_Highlighter">
<pre class="brush:go;gutter:true;">package main
import (
"bytes"
"encoding/binary"
"fmt"
)
func main() {
var(
   piVar float64
   boolVar bool
)
piByte := []byte{0x18, 0x2d, 0x44, 0x54, 0xfb, 0x21, 0x09, 0x40}
boolByte := []byte{0x00}
piBuffer := bytes.NewReader(piByte)
boolBuffer := bytes.NewReader(boolByte)
binary.Read(piBuffer, binary.LittleEndian, &piVar)
binary.Read(boolBuffer, binary.LittleEndian, &boolVar)
fmt.Println("pi", piVar) // pi 3.141592653589793
fmt.Println("bool", boolVar) // bool false
}

</pre>
</div>
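<p>binary.Read returns an error, and real code should check it. The sketch below, with made-up helpers decodeBool and decodeUint16, shows the boolean rule for several byte values and what happens when the input is too short.</p>

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// decodeBool decodes a single byte into a bool with binary.Read.
func decodeBool(b byte) (bool, error) {
	var v bool
	err := binary.Read(bytes.NewReader([]byte{b}), binary.LittleEndian, &v)
	return v, err
}

// decodeUint16 decodes two big-endian bytes; it returns an error
// when fewer than two bytes are available.
func decodeUint16(data []byte) (uint16, error) {
	var v uint16
	err := binary.Read(bytes.NewReader(data), binary.BigEndian, &v)
	return v, err
}

func main() {
	for _, b := range []byte{0x00, 0x01, 0x7f} {
		v, err := decodeBool(b)
		fmt.Println(b, v, err) // 0 false <nil>, 1 true <nil>, 127 true <nil>
	}

	_, err := decodeUint16([]byte{0x01})
	fmt.Println(err) // unexpected EOF
}
```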
<p>Write is the inverse of Read. An example shows it best:</p>
<div class="cnblogs_Highlighter">
<pre class="brush:go;gutter:true;">package main
import (
"bytes"
"encoding/binary"
"fmt"
"math"
)
func main() {
buf := new(bytes.Buffer)
var pi float64 = math.Pi
err := binary.Write(buf, binary.LittleEndian, pi)
if err != nil {
   fmt.Println("binary.Write failed:", err)
}
fmt.Printf("% x", buf.Bytes()) // 18 2d 44 54 fb 21 09 40
}</pre>
</div>
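<p>Write and Read also accept structs whose fields all have a fixed size. The header type below is a hypothetical packet header invented for this sketch, not something defined by the package.</p>

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// header is a hypothetical packet header used only for illustration.
// Every field has a fixed size, which binary.Write and binary.Read require.
type header struct {
	Version uint8
	Flags   uint16
	Length  uint32
}

// roundTrip writes h to a buffer, reads it back, and also returns
// the number of bytes the encoding took.
func roundTrip(h header) (header, int, error) {
	buf := new(bytes.Buffer)
	if err := binary.Write(buf, binary.BigEndian, h); err != nil {
		return header{}, 0, err
	}
	n := buf.Len()
	var out header
	err := binary.Read(buf, binary.BigEndian, &out)
	return out, n, err
}

func main() {
	out, n, err := roundTrip(header{Version: 1, Flags: 0x0203, Length: 500})
	fmt.Println(out, n, err) // {1 515 500} 7 <nil>
}
```

<p>The encoded form is 7 bytes (1 + 2 + 4) because the fields are written back to back without the padding an in-memory struct may have.</p>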
<p>In real code, for complex data structures, consider a more standardized and efficient protocol such as Protocol Buffers.</p>
<p>　</p>
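<h1>Variable-length encoding</h1>
<p>Fixed-length encoding uses space inflexibly. A small value stored as an int64, for example, still takes all 64 bits, most of them zero bytes that carry no information. Variable-length encoding limits this waste.</p>
<p><strong>How it works</strong><br>Ideally, a variable-length encoding spends fewer bytes on small numbers than on large ones. There are several schemes; Go's encoding/binary uses the same one as the Protocol Buffers encoding, which works as follows:</p>
<p>The first (most significant) bit of every byte is a flag that says whether more bytes follow; the remaining seven bits hold data. The flag is either 0 or 1:</p>
<p>1 means another byte follows this one<br>0 means this is the last byte<br>Once all bytes have been read, their 7-bit groups are concatenated, least significant group first, to give the final value. For example, 53 is 110101 in binary and needs six bits. Seven bits are available after the flag, so a 0 is padded in after the flag to fill them, giving 00110101. The flag 0 says that no byte follows, and the 0110101 after it holds the value.</p>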
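<p>A larger example: 1732 is 11011000100 in binary, which needs 11 bits. Each byte holds only 7 data bits besides the flag, so 1732 takes two bytes. The 11 bits split into a low group 1000100 and a high group 0001101 (padded to seven bits). Go, like Protocol Buffers, emits the low group first: the first byte gets flag 1 because another byte follows, and the second gets flag 0 because it is the last. The result is 11000100 00001101.</p>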
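<p>The scheme is short enough to write by hand. The sketch below is not the library's source; encodeUvarint, bits and matchesStdlib are names made up here, and the result is checked against binary.PutUvarint. Note that Go emits the least significant 7-bit group first, so 1732 comes out as the bytes c4 0d.</p>

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"strings"
)

// encodeUvarint emits seven bits at a time, lowest group first,
// setting the top bit of a byte while more bytes follow.
func encodeUvarint(x uint64) []byte {
	var out []byte
	for x >= 0x80 {
		out = append(out, byte(x&0x7f)|0x80) // flag 1: another byte follows
		x >>= 7
	}
	return append(out, byte(x)) // flag 0: last byte
}

// bits renders the encoding of x as space-separated binary bytes.
func bits(x uint64) string {
	var parts []string
	for _, b := range encodeUvarint(x) {
		parts = append(parts, fmt.Sprintf("%08b", b))
	}
	return strings.Join(parts, " ")
}

// matchesStdlib reports whether the hand-written encoder agrees
// with binary.PutUvarint for x.
func matchesStdlib(x uint64) bool {
	buf := make([]byte, binary.MaxVarintLen64)
	n := binary.PutUvarint(buf, x)
	return bytes.Equal(buf[:n], encodeUvarint(x))
}

func main() {
	for _, x := range []uint64{53, 1732} {
		fmt.Println(x, bits(x), matchesStdlib(x))
	}
	// 53 00110101 true
	// 1732 11000100 00001101 true
}
```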
<p><strong>Variable-length encoding in Go</strong><br>The functions PutVarint() and PutUvarint() write a variable-length value into an in-memory byte slice:</p>
<div class="cnblogs_Highlighter">
<pre class="brush:go;gutter:true;">func PutVarint(buf []byte, x int64) int
func PutUvarint(buf []byte, x uint64) int
</pre>
</div>
<p> </p>
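<p>Both functions encode x into buf and return the number of bytes written. If buf is too small to hold the encoded x, the function panics; sizing the buffer with the binary.MaxVarintLen64 constant ensures that this never happens.</p>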
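<p>The panic and the safe buffer size can both be observed directly. putPanics and encodedLen are hypothetical helpers for this sketch; only PutUvarint and MaxVarintLen64 come from the package.</p>

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// putPanics reports whether PutUvarint panics when handed a buffer
// of the given size for x.
func putPanics(size int, x uint64) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	binary.PutUvarint(make([]byte, size), x)
	return false
}

// encodedLen returns how many bytes x needs, using a buffer that is
// always large enough.
func encodedLen(x uint64) int {
	var buf [binary.MaxVarintLen64]byte
	return binary.PutUvarint(buf[:], x)
}

func main() {
	fmt.Println(putPanics(1, 300))                     // true: 300 needs 2 bytes
	fmt.Println(putPanics(binary.MaxVarintLen64, 300)) // false
	fmt.Println(encodedLen(300), encodedLen(1<<63))    // 2 10
}
```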
<div class="cnblogs_Highlighter">
<pre class="brush:go;gutter:true;">package main

import (
	"encoding/binary"
	"fmt"
)

func main() {
	buf := make([]byte, binary.MaxVarintLen64)
	for _, x := range []int64{-65, 1, 2, 127, 128, 255, 256} {
		n := binary.PutVarint(buf, x)
		fmt.Print(x, " encoded length: ", n, ", hex: ")
		fmt.Printf("%x\n", buf[:n])
	}
}</pre>
</div>
<p>Output:</p>
<div class="cnblogs_Highlighter">
<pre class="brush:go;gutter:true;">-65 encoded length: 2, hex: 8101
1 encoded length: 1, hex: 02
2 encoded length: 1, hex: 04
127 encoded length: 2, hex: fe01
128 encoded length: 2, hex: 8002
255 encoded length: 2, hex: fe03
256 encoded length: 2, hex: 8004
</pre>
</div>
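<p>The output may look surprising: 1 is encoded as 02, not 01. The reason is that PutVarint first maps signed values to unsigned ones with zig-zag encoding, so that small negative numbers also stay short, and then applies the unsigned scheme. A sketch of that mapping, with zigzag, unzigzag and sameAsPutVarint as names invented here:</p>

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// zigzag maps signed to unsigned so that values near zero, positive or
// negative, become small: 0->0, -1->1, 1->2, -2->3, ...
func zigzag(x int64) uint64 {
	return uint64(x<<1) ^ uint64(x>>63)
}

// unzigzag reverses zigzag.
func unzigzag(u uint64) int64 {
	return int64(u>>1) ^ -int64(u&1)
}

// sameAsPutVarint checks that PutVarint(x) produces the same bytes
// as PutUvarint applied to zigzag(x).
func sameAsPutVarint(x int64) bool {
	a := make([]byte, binary.MaxVarintLen64)
	b := make([]byte, binary.MaxVarintLen64)
	na := binary.PutVarint(a, x)
	nb := binary.PutUvarint(b, zigzag(x))
	return bytes.Equal(a[:na], b[:nb])
}

func main() {
	for _, x := range []int64{-65, 1, 2, 127, 128, 255, 256} {
		fmt.Println(x, zigzag(x), unzigzag(zigzag(x)), sameAsPutVarint(x))
	}
}
```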
<p>The functions Varint() and Uvarint() decode the bytes back into an integer and also return the number of bytes read.</p>
<div class="cnblogs_Highlighter">
<pre class="brush:go;gutter:true;">func Varint(buf []byte) (int64, int)
func Uvarint(buf []byte) (uint64, int)</pre>
</div>
<div class="cnblogs_Highlighter">
<pre class="brush:go;gutter:true;">package main

import (
	"encoding/binary"
	"fmt"
)

func main() {
	inputs := [][]byte{
		[]byte{0x81, 0x01},
		[]byte{0x7f},
		[]byte{0x03},
		[]byte{0x01},
		[]byte{0x00},
		[]byte{0x02},
		[]byte{0x04},
		[]byte{0x7e},
		[]byte{0x80, 0x01},
	}
	for _, b := range inputs {
		// n is the number of bytes Varint consumed from b.
		x, n := binary.Varint(b)
		if n != len(b) {
			fmt.Println("Varint did not consume all of the input")
		}
		fmt.Println(x) // -65, -64, -2, -1, 0, 1, 2, 63, 64
	}
}</pre>
</div>
<h3>Decoding variable-length values from a byte stream</h3>
<p>The binary package provides two functions that read a variable-length value from a byte stream.</p>
<div class="cnblogs_Highlighter">
<pre class="brush:go;gutter:true;">func ReadVarint(r io.ByteReader) (int64, error)
func ReadUvarint(r io.ByteReader) (uint64, error)
</pre>
</div>
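<p>A sketch of how these might be used: several values are packed back to back and then read one at a time until the stream ends. encodeAll and decodeAll are hypothetical helpers; a *bytes.Reader stands in for the stream because it implements io.ByteReader.</p>

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// encodeAll packs the values back to back as uvarints.
func encodeAll(vals []uint64) []byte {
	var out []byte
	tmp := make([]byte, binary.MaxVarintLen64)
	for _, v := range vals {
		n := binary.PutUvarint(tmp, v)
		out = append(out, tmp[:n]...)
	}
	return out
}

// decodeAll reads uvarints from the data until it ends cleanly.
func decodeAll(data []byte) ([]uint64, error) {
	r := bytes.NewReader(data) // *bytes.Reader implements io.ByteReader
	var vals []uint64
	for {
		v, err := binary.ReadUvarint(r)
		if err == io.EOF {
			return vals, nil // no more values
		}
		if err != nil {
			return vals, err
		}
		vals = append(vals, v)
	}
}

func main() {
	data := encodeAll([]uint64{1, 300, 1732})
	fmt.Printf("% x\n", data)    // 01 ac 02 c4 0d
	fmt.Println(decodeAll(data)) // [1 300 1732] <nil>
}
```

<p>No length prefix or separator is needed between the values, because the flag bit of each byte already marks where a value ends.</p>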
<p> </p>
<p> </p>
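<h1>Summary</h1>
<p>Binary protocols handle data communication efficiently at a low level: byte order determines the order in which bytes are emitted, and variable-length encoding reduces the space the data takes. With encoding/binary understood, we can go on to study today's mainstream binary protocols in more depth.</p>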
<p> </p>
<p>Compiled from:</p>
<p>Byte order and Go's encoding/binary package: https://zhuanlan.zhihu.com/p/35326716, https://www.jianshu.com/p/1deed9012440</p>
<p class="Post-Title"> </p>
<h5 class="title">Byte order in Go</h5><br><br>
Source: https://www.cnblogs.com/-wenli/p/12323809.html
