三胎宝妈时刻准备逆袭中 發表於 2019-12-22 14:59:00

使用python实现哈希表、字典、集合

<h2>哈希表</h2>
<p>哈希表(<strong>Hash Table</strong>, 又称为散列表),是一种线性表的存储结构。哈希表由一个<strong>直接寻址表</strong>和一个<strong>哈希函数</strong>组成。哈希函数<strong>h(k)</strong>将元素关键字<strong>k</strong>作为自变量,返回元素的存储下标。</p>
<p>简单哈希函数:</p>
<ol>
<li>除法哈希:<strong>h(k) = k mod m</strong></li>
<li>乘法哈希:<strong>h(k) = floor(m(kA mod 1)) 0&lt;A&lt;1</strong></li>
</ol>
<p>假设有一个长度为7的数组,哈希函数<strong>h(k) = k mod 7</strong>,元素集合<strong>{14, 22, 3, 5}</strong>的存储方式如下图:</p>
<p><img src="https://img2018.cnblogs.com/common/1414912/201912/1414912-20191222141939350-697136046.png" alt=""></p>
<h3>哈希冲突</h3>
<p>由于哈希表的大小是有限的,而要存储的值的总数量是无限的,因此对于任何哈希函数,都会出现两个不同的元素映射到同一个位置上的情况,这种情况叫做哈希冲突。</p>
<p>比如:<strong>h(k) = k mod 7, h(0) = h(7) = h(14) = ...</strong></p>
<p>&nbsp;</p>
<p><strong>解决哈希冲突--开放寻址法</strong></p>
<p>开放寻址法:如果哈希函数返回的位置已经有值,则可以向后探查新的位置来存储这个值</p>
<ol>
<li>线性探查:如果位置<strong>i</strong>被占用,则探查<strong>i+1, i+2,...</strong></li>
<li>二次探查:如果位置<strong>i</strong>被占用,则探查<strong>i+1<sup>2</sup>, i-1<sup>2</sup>, i+2<sup>2</sup>, i-2<sup>2</sup>,...</strong></li>
<li>二度哈希:有<strong>n</strong>个哈希函数,当使用第一个哈希函数<strong>h1</strong>发生冲突时,则尝试使用<strong>h2, h3,...</strong></li>
</ol>
<p>&nbsp;</p>
<p><strong>解决哈希冲突--拉链法</strong></p>
<p>拉链法:哈希表每一个位置都连接一个链表,当冲突发生时,冲突的元素将被加到该位置链表的最后。</p>
<p><img src="https://img2018.cnblogs.com/common/1414912/201912/1414912-20191222143215344-2107366338.png" alt=""></p>
<h3>哈希表的实现</h3>
<div class="cnblogs_code">
<pre><span style="color: rgba(0, 0, 255, 1)">class</span><span style="color: rgba(0, 0, 0, 1)"> Array(object):

    </span><span style="color: rgba(0, 0, 255, 1)">def</span> <span style="color: rgba(128, 0, 128, 1)">__init__</span>(self, size=32, init=<span style="color: rgba(0, 0, 0, 1)">None):
      self._size </span>=<span style="color: rgba(0, 0, 0, 1)"> size
      self._items </span>= *<span style="color: rgba(0, 0, 0, 1)"> size

    </span><span style="color: rgba(0, 0, 255, 1)">def</span> <span style="color: rgba(128, 0, 128, 1)">__getitem__</span><span style="color: rgba(0, 0, 0, 1)">(self, index):
      </span><span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> self._items

    </span><span style="color: rgba(0, 0, 255, 1)">def</span> <span style="color: rgba(128, 0, 128, 1)">__setitem__</span><span style="color: rgba(0, 0, 0, 1)">(self, index, value):
      self._items </span>=<span style="color: rgba(0, 0, 0, 1)"> value

    </span><span style="color: rgba(0, 0, 255, 1)">def</span> <span style="color: rgba(128, 0, 128, 1)">__len__</span><span style="color: rgba(0, 0, 0, 1)">(self):
      </span><span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> self._size

    </span><span style="color: rgba(0, 0, 255, 1)">def</span> clear(self, value=<span style="color: rgba(0, 0, 0, 1)">None):
      </span><span style="color: rgba(0, 0, 255, 1)">for</span> i <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> range(len(self._items)):
            self._items </span>=<span style="color: rgba(0, 0, 0, 1)"> value

    </span><span style="color: rgba(0, 0, 255, 1)">def</span> <span style="color: rgba(128, 0, 128, 1)">__iter__</span><span style="color: rgba(0, 0, 0, 1)">(self):
      </span><span style="color: rgba(0, 0, 255, 1)">for</span> item <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> self._items:
            </span><span style="color: rgba(0, 0, 255, 1)">yield</span><span style="color: rgba(0, 0, 0, 1)"> item


</span><span style="color: rgba(0, 0, 255, 1)">class</span><span style="color: rgba(0, 0, 0, 1)"> Slot(object):
    </span><span style="color: rgba(128, 0, 0, 1)">"""</span><span style="color: rgba(128, 0, 0, 1)">
    定义一个 hash 表数组的槽(slot 这里指的就是数组的一个位置)
    hash table 就是一个数组,每个数组的元素(也叫slot槽)是一个对象,对象包含两个属性 key 和 value。

    注意,一个槽有三种状态,看你能否想明白。相比链接法解决冲突,探查法删除一个 key 的操作稍微复杂。
    1.从未使用 HashMap.UNUSED。此槽没有被使用和冲突过,查找时只要找到 UNUSED 就不用再继续探查了
    2.使用过但是 remove 了,此时是 HashMap.EMPTY,该探查点后边的元素仍然可能是有key的,需要继续查找
    3.槽正在使用 Slot 节点
    </span><span style="color: rgba(128, 0, 0, 1)">"""</span>

    <span style="color: rgba(0, 0, 255, 1)">def</span> <span style="color: rgba(128, 0, 128, 1)">__init__</span><span style="color: rgba(0, 0, 0, 1)">(self, key, value):
      self.key, self.value </span>=<span style="color: rgba(0, 0, 0, 1)"> key, value


</span><span style="color: rgba(0, 0, 255, 1)">class</span><span style="color: rgba(0, 0, 0, 1)"> HashTable(object):
    UNUSED </span>= None<span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 没被使用过</span>
    EMPTY = Slot(None, None)<span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 使用却被删除过</span>

    <span style="color: rgba(0, 0, 255, 1)">def</span> <span style="color: rgba(128, 0, 128, 1)">__init__</span><span style="color: rgba(0, 0, 0, 1)">(self):
      self._table </span>= Array(8, init=HashTable.UNUSED)<span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 保持 2*i 次方</span>
      self.length =<span style="color: rgba(0, 0, 0, 1)"> 0

    @property
    </span><span style="color: rgba(0, 0, 255, 1)">def</span><span style="color: rgba(0, 0, 0, 1)"> _load_factor(self):
      </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> load_factor 超过 0.8 重新分配</span>
      <span style="color: rgba(0, 0, 255, 1)">return</span> self.length /<span style="color: rgba(0, 0, 0, 1)"> float(len(self._table))

    </span><span style="color: rgba(0, 0, 255, 1)">def</span> <span style="color: rgba(128, 0, 128, 1)">__len__</span><span style="color: rgba(0, 0, 0, 1)">(self):
      </span><span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> self.length

    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 进行哈希</span>
    <span style="color: rgba(0, 0, 255, 1)">def</span><span style="color: rgba(0, 0, 0, 1)"> _hash(self, key):
      </span><span style="color: rgba(0, 0, 255, 1)">return</span> abs(hash(key)) %<span style="color: rgba(0, 0, 0, 1)"> len(self._table)

    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 查找key</span>
    <span style="color: rgba(0, 0, 255, 1)">def</span><span style="color: rgba(0, 0, 0, 1)"> _find_key(self, key):
      </span><span style="color: rgba(128, 0, 0, 1)">"""</span><span style="color: rgba(128, 0, 0, 1)">
      解释一个 slot 为 UNUSED 和 EMPTY 的区别
      因为使用的是二次探查的方式,假如有两个元素 A,B 冲突了,
      首先A hash 得到是 slot 下标5,A 放到了第5个槽,之后插入 B 因为冲突了,所以继续根据二次探查方式放到了 slot下标8。
      然后删除 A,槽 5 被置为 EMPTY。然后我去查找 B,
      第一次 hash 得到的是 槽5,但是这个时候我还是需要第二次计算 hash 才能找到 B。
      但是如果槽是 UNUSED 我就不用继续找了,我认为 B 就是不存在的元素。这个就是 UNUSED 和 EMPTY 的区别。
      </span><span style="color: rgba(128, 0, 0, 1)">"""</span><span style="color: rgba(0, 0, 0, 1)">
      origin_index </span>= index = self._hash(key)<span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> origin_index 判断是否又走到了起点,如果查找一圈了都找不到则无此元素</span>
      _len =<span style="color: rgba(0, 0, 0, 1)"> len(self._table)
      </span><span style="color: rgba(0, 0, 255, 1)">while</span> self._table <span style="color: rgba(0, 0, 255, 1)">is</span> <span style="color: rgba(0, 0, 255, 1)">not</span><span style="color: rgba(0, 0, 0, 1)"> HashTable.UNUSED:
            </span><span style="color: rgba(0, 0, 255, 1)">if</span> self._table <span style="color: rgba(0, 0, 255, 1)">is</span> HashTable.EMPTY:<span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 注意如果是 EMPTY,继续寻找下一个槽</span>
                index = (index * 5 + 1) %<span style="color: rgba(0, 0, 0, 1)"> _len
                </span><span style="color: rgba(0, 0, 255, 1)">if</span> index ==<span style="color: rgba(0, 0, 0, 1)"> origin_index:
                  </span><span style="color: rgba(0, 0, 255, 1)">break</span>
                <span style="color: rgba(0, 0, 255, 1)">continue</span>
            <span style="color: rgba(0, 0, 255, 1)">if</span> self._table.key == key:<span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 找到了key</span>
                <span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> index
            </span><span style="color: rgba(0, 0, 255, 1)">else</span><span style="color: rgba(0, 0, 0, 1)">:
                index </span>= (index * 5 + 1) % _len<span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 没有找到继续找下一个位置</span>
                <span style="color: rgba(0, 0, 255, 1)">if</span> index ==<span style="color: rgba(0, 0, 0, 1)"> origin_index:
                  </span><span style="color: rgba(0, 0, 255, 1)">break</span>

      <span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> None

    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 找能插入的槽</span>
    <span style="color: rgba(0, 0, 255, 1)">def</span><span style="color: rgba(0, 0, 0, 1)"> _find_slot_for_insert(self, key):
      index </span>=<span style="color: rgba(0, 0, 0, 1)"> self._hash(key)
      _len </span>=<span style="color: rgba(0, 0, 0, 1)"> len(self._table)
      </span><span style="color: rgba(0, 0, 255, 1)">while</span> <span style="color: rgba(0, 0, 255, 1)">not</span> self._slot_can_insert(index):<span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 直到找到一个可以用的槽</span>
            index = (index * 5 + 1) %<span style="color: rgba(0, 0, 0, 1)"> _len
      </span><span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> index

    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 槽是否能插入</span>
    <span style="color: rgba(0, 0, 255, 1)">def</span><span style="color: rgba(0, 0, 0, 1)"> _slot_can_insert(self, index):
      </span><span style="color: rgba(0, 0, 255, 1)">return</span> self._table <span style="color: rgba(0, 0, 255, 1)">is</span> HashTable.EMPTY <span style="color: rgba(0, 0, 255, 1)">or</span> self._table <span style="color: rgba(0, 0, 255, 1)">is</span><span style="color: rgba(0, 0, 0, 1)"> HashTable.UNUSED

    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> in operator,实现之后可以使用 in 操作符判断</span>
    <span style="color: rgba(0, 0, 255, 1)">def</span> <span style="color: rgba(128, 0, 128, 1)">__contains__</span><span style="color: rgba(0, 0, 0, 1)">(self, key):
      index </span>=<span style="color: rgba(0, 0, 0, 1)"> self._find_key(key)
      </span><span style="color: rgba(0, 0, 255, 1)">return</span> index <span style="color: rgba(0, 0, 255, 1)">is</span> <span style="color: rgba(0, 0, 255, 1)">not</span><span style="color: rgba(0, 0, 0, 1)"> None

    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 添加元素</span>
    <span style="color: rgba(0, 0, 255, 1)">def</span><span style="color: rgba(0, 0, 0, 1)"> add(self, key, value):
      </span><span style="color: rgba(0, 0, 255, 1)">if</span> key <span style="color: rgba(0, 0, 255, 1)">in</span> self:<span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> update</span>
            index =<span style="color: rgba(0, 0, 0, 1)"> self._find_key(key)
            self._table.value </span>=<span style="color: rgba(0, 0, 0, 1)"> value
            </span><span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> False
      </span><span style="color: rgba(0, 0, 255, 1)">else</span><span style="color: rgba(0, 0, 0, 1)">:
            index </span>=<span style="color: rgba(0, 0, 0, 1)"> self._find_slot_for_insert(key)
            self._table </span>=<span style="color: rgba(0, 0, 0, 1)"> Slot(key, value)
            self.length </span>+= 1
            <span style="color: rgba(0, 0, 255, 1)">if</span> self._load_factor &gt;= 0.8<span style="color: rgba(0, 0, 0, 1)">:
                self._rehash()
            </span><span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> True

    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 槽不够时,重哈希</span>
    <span style="color: rgba(0, 0, 255, 1)">def</span><span style="color: rgba(0, 0, 0, 1)"> _rehash(self):
      old_table </span>=<span style="color: rgba(0, 0, 0, 1)"> self._table
      newsize </span>= len(self._table) * 2<span style="color: rgba(0, 0, 0, 1)">
      self._table </span>=<span style="color: rgba(0, 0, 0, 1)"> Array(newsize, HashTable.UNUSED)

      self.length </span>=<span style="color: rgba(0, 0, 0, 1)"> 0

      </span><span style="color: rgba(0, 0, 255, 1)">for</span> slot <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> old_table:
            </span><span style="color: rgba(0, 0, 255, 1)">if</span> slot <span style="color: rgba(0, 0, 255, 1)">is</span> <span style="color: rgba(0, 0, 255, 1)">not</span> HashTable.UNUSED <span style="color: rgba(0, 0, 255, 1)">and</span> slot <span style="color: rgba(0, 0, 255, 1)">is</span> <span style="color: rgba(0, 0, 255, 1)">not</span><span style="color: rgba(0, 0, 0, 1)"> HashTable.EMPTY:
                index </span>=<span style="color: rgba(0, 0, 0, 1)"> self._find_slot_for_insert(slot.key)
                self._table </span>=<span style="color: rgba(0, 0, 0, 1)"> slot
                self.length </span>+= 1

    <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 获取值</span>
    <span style="color: rgba(0, 0, 255, 1)">def</span> get(self, key, default=<span style="color: rgba(0, 0, 0, 1)">None):
      index </span>=<span style="color: rgba(0, 0, 0, 1)"> self._find_key(key)
      </span><span style="color: rgba(0, 0, 255, 1)">if</span> index <span style="color: rgba(0, 0, 255, 1)">is</span><span style="color: rgba(0, 0, 0, 1)"> None:
            </span><span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> default
      </span><span style="color: rgba(0, 0, 255, 1)">else</span><span style="color: rgba(0, 0, 0, 1)">:
            </span><span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> self._table.value

    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 移除</span>
    <span style="color: rgba(0, 0, 255, 1)">def</span><span style="color: rgba(0, 0, 0, 1)"> remove(self, key):
      index </span>=<span style="color: rgba(0, 0, 0, 1)"> self._find_key(key)
      </span><span style="color: rgba(0, 0, 255, 1)">if</span> index <span style="color: rgba(0, 0, 255, 1)">is</span><span style="color: rgba(0, 0, 0, 1)"> None:
            </span><span style="color: rgba(0, 0, 255, 1)">raise</span><span style="color: rgba(0, 0, 0, 1)"> KeyError()
      value </span>=<span style="color: rgba(0, 0, 0, 1)"> self._table.value
      self.length </span>-= 1<span style="color: rgba(0, 0, 0, 1)">
      self._table </span>=<span style="color: rgba(0, 0, 0, 1)"> HashTable.EMPTY
      </span><span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> value

    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 遍历</span>
    <span style="color: rgba(0, 0, 255, 1)">def</span> <span style="color: rgba(128, 0, 128, 1)">__iter__</span><span style="color: rgba(0, 0, 0, 1)">(self):
      </span><span style="color: rgba(0, 0, 255, 1)">for</span> slot <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> self._table:
            </span><span style="color: rgba(0, 0, 255, 1)">if</span> slot <span style="color: rgba(0, 0, 255, 1)">not</span> <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> (HashTable.EMPTY, HashTable.UNUSED):
                </span><span style="color: rgba(0, 0, 255, 1)">yield</span> slot.key</pre>
</div>
<h3>哈希表的使用</h3>
<div class="cnblogs_code">
<pre>h =<span style="color: rgba(0, 0, 0, 1)"> HashTable()
h.add(</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">a</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">, 0)
h.add(</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">b</span><span style="color: rgba(128, 0, 0, 1)">'</span>, 1<span style="color: rgba(0, 0, 0, 1)">)
h.add(</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">c</span><span style="color: rgba(128, 0, 0, 1)">'</span>, 2<span style="color: rgba(0, 0, 0, 1)">)
</span><span style="color: rgba(0, 0, 255, 1)">print</span>(len(h)) <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 3</span>
<span style="color: rgba(0, 0, 255, 1)">print</span>(h.get(<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">a</span><span style="color: rgba(128, 0, 0, 1)">'</span>)) <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 0</span>
<span style="color: rgba(0, 0, 255, 1)">print</span>(h.get(<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">b</span><span style="color: rgba(128, 0, 0, 1)">'</span>)) <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 1</span>
<span style="color: rgba(0, 0, 255, 1)">print</span>(h.get(<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">hehe</span><span style="color: rgba(128, 0, 0, 1)">'</span>)) <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> None</span>
h.remove(<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">a</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)
</span><span style="color: rgba(0, 0, 255, 1)">print</span>(h.get(<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">a</span><span style="color: rgba(128, 0, 0, 1)">'</span>)) <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> None</span>
<span style="color: rgba(0, 0, 255, 1)">print</span>(sorted(list(h))) <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> ['b', 'c']</span></pre>
</div>
<h2>字典</h2>
<p>字典是另一种可变容器模型,且可存储任意类型对象。</p>
<p>字典的每个键值&nbsp;<span class="marked"><strong>key=&gt;value</strong>&nbsp;对用冒号&nbsp;<span class="marked">:&nbsp;分割,每个键值对之间用逗号&nbsp;<span class="marked">,&nbsp;分割,整个字典包括在花括号&nbsp;<span class="marked">{}&nbsp;中 ,格式如下所示:</span></span></span></span></p>
<div class="cnblogs_code">
<pre>d = {key1 : value1, key2 : value2 }</pre>
</div>
<p><img src="https://img2018.cnblogs.com/common/1414912/201912/1414912-20191222144552808-1702137323.gif" alt=""></p>
<h3>基于哈希表实现字典</h3>
<div class="cnblogs_code">
<pre><span style="color: rgba(0, 0, 255, 1)">class</span><span style="color: rgba(0, 0, 0, 1)"> Array(object):

    </span><span style="color: rgba(0, 0, 255, 1)">def</span> <span style="color: rgba(128, 0, 128, 1)">__init__</span>(self, size=32, init=<span style="color: rgba(0, 0, 0, 1)">None):
      self._size </span>=<span style="color: rgba(0, 0, 0, 1)"> size
      self._items </span>= *<span style="color: rgba(0, 0, 0, 1)"> size

    </span><span style="color: rgba(0, 0, 255, 1)">def</span> <span style="color: rgba(128, 0, 128, 1)">__getitem__</span><span style="color: rgba(0, 0, 0, 1)">(self, index):
      </span><span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> self._items

    </span><span style="color: rgba(0, 0, 255, 1)">def</span> <span style="color: rgba(128, 0, 128, 1)">__setitem__</span><span style="color: rgba(0, 0, 0, 1)">(self, index, value):
      self._items </span>=<span style="color: rgba(0, 0, 0, 1)"> value

    </span><span style="color: rgba(0, 0, 255, 1)">def</span> <span style="color: rgba(128, 0, 128, 1)">__len__</span><span style="color: rgba(0, 0, 0, 1)">(self):
      </span><span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> self._size

    </span><span style="color: rgba(0, 0, 255, 1)">def</span> clear(self, value=<span style="color: rgba(0, 0, 0, 1)">None):
      </span><span style="color: rgba(0, 0, 255, 1)">for</span> i <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> range(len(self._items)):
            self._items </span>=<span style="color: rgba(0, 0, 0, 1)"> value

    </span><span style="color: rgba(0, 0, 255, 1)">def</span> <span style="color: rgba(128, 0, 128, 1)">__iter__</span><span style="color: rgba(0, 0, 0, 1)">(self):
      </span><span style="color: rgba(0, 0, 255, 1)">for</span> item <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> self._items:
            </span><span style="color: rgba(0, 0, 255, 1)">yield</span><span style="color: rgba(0, 0, 0, 1)"> item


</span><span style="color: rgba(0, 0, 255, 1)">class</span><span style="color: rgba(0, 0, 0, 1)"> Slot(object):
    </span><span style="color: rgba(128, 0, 0, 1)">"""</span><span style="color: rgba(128, 0, 0, 1)">
    定义一个 hash 表数组的槽(slot 这里指的就是数组的一个位置)
    hash table 就是一个数组,每个数组的元素(也叫slot槽)是一个对象,对象包含两个属性 key 和 value。

    注意,一个槽有三种状态,看你能否想明白。相比链接法解决冲突,探查法删除一个 key 的操作稍微复杂。
    1.从未使用 HashMap.UNUSED。此槽没有被使用和冲突过,查找时只要找到 UNUSED 就不用再继续探查了
    2.使用过但是 remove 了,此时是 HashMap.EMPTY,该探查点后边的元素仍然可能是有key的,需要继续查找
    3.槽正在使用 Slot 节点
    </span><span style="color: rgba(128, 0, 0, 1)">"""</span>

    <span style="color: rgba(0, 0, 255, 1)">def</span> <span style="color: rgba(128, 0, 128, 1)">__init__</span><span style="color: rgba(0, 0, 0, 1)">(self, key, value):
      self.key, self.value </span>=<span style="color: rgba(0, 0, 0, 1)"> key, value


</span><span style="color: rgba(0, 0, 255, 1)">class</span><span style="color: rgba(0, 0, 0, 1)"> HashTable(object):
    UNUSED </span>= None<span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 没被使用过</span>
    EMPTY = Slot(None, None)<span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 使用却被删除过</span>

    <span style="color: rgba(0, 0, 255, 1)">def</span> <span style="color: rgba(128, 0, 128, 1)">__init__</span><span style="color: rgba(0, 0, 0, 1)">(self):
      self._table </span>= Array(8, init=HashTable.UNUSED)<span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 保持 2*i 次方</span>
      self.length =<span style="color: rgba(0, 0, 0, 1)"> 0

    @property
    </span><span style="color: rgba(0, 0, 255, 1)">def</span><span style="color: rgba(0, 0, 0, 1)"> _load_factor(self):
      </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> load_factor 超过 0.8 重新分配</span>
      <span style="color: rgba(0, 0, 255, 1)">return</span> self.length /<span style="color: rgba(0, 0, 0, 1)"> float(len(self._table))

    </span><span style="color: rgba(0, 0, 255, 1)">def</span> <span style="color: rgba(128, 0, 128, 1)">__len__</span><span style="color: rgba(0, 0, 0, 1)">(self):
      </span><span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> self.length

    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 进行哈希</span>
    <span style="color: rgba(0, 0, 255, 1)">def</span><span style="color: rgba(0, 0, 0, 1)"> _hash(self, key):
      </span><span style="color: rgba(0, 0, 255, 1)">return</span> abs(hash(key)) %<span style="color: rgba(0, 0, 0, 1)"> len(self._table)

    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 查找key</span>
    <span style="color: rgba(0, 0, 255, 1)">def</span><span style="color: rgba(0, 0, 0, 1)"> _find_key(self, key):
      </span><span style="color: rgba(128, 0, 0, 1)">"""</span><span style="color: rgba(128, 0, 0, 1)">
      解释一个 slot 为 UNUSED 和 EMPTY 的区别
      因为使用的是二次探查的方式,假如有两个元素 A,B 冲突了,
      首先A hash 得到是 slot 下标5,A 放到了第5个槽,之后插入 B 因为冲突了,所以继续根据二次探查方式放到了 slot下标8。
      然后删除 A,槽 5 被置为 EMPTY。然后我去查找 B,
      第一次 hash 得到的是 槽5,但是这个时候我还是需要第二次计算 hash 才能找到 B。
      但是如果槽是 UNUSED 我就不用继续找了,我认为 B 就是不存在的元素。这个就是 UNUSED 和 EMPTY 的区别。
      </span><span style="color: rgba(128, 0, 0, 1)">"""</span><span style="color: rgba(0, 0, 0, 1)">
      origin_index </span>= index = self._hash(key)<span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> origin_index 判断是否又走到了起点,如果查找一圈了都找不到则无此元素</span>
      _len =<span style="color: rgba(0, 0, 0, 1)"> len(self._table)
      </span><span style="color: rgba(0, 0, 255, 1)">while</span> self._table <span style="color: rgba(0, 0, 255, 1)">is</span> <span style="color: rgba(0, 0, 255, 1)">not</span><span style="color: rgba(0, 0, 0, 1)"> HashTable.UNUSED:
            </span><span style="color: rgba(0, 0, 255, 1)">if</span> self._table <span style="color: rgba(0, 0, 255, 1)">is</span> HashTable.EMPTY:<span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 注意如果是 EMPTY,继续寻找下一个槽</span>
                index = (index * 5 + 1) %<span style="color: rgba(0, 0, 0, 1)"> _len
                </span><span style="color: rgba(0, 0, 255, 1)">if</span> index ==<span style="color: rgba(0, 0, 0, 1)"> origin_index:
                  </span><span style="color: rgba(0, 0, 255, 1)">break</span>
                <span style="color: rgba(0, 0, 255, 1)">continue</span>
            <span style="color: rgba(0, 0, 255, 1)">if</span> self._table.key == key:<span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 找到了key</span>
                <span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> index
            </span><span style="color: rgba(0, 0, 255, 1)">else</span><span style="color: rgba(0, 0, 0, 1)">:
                index </span>= (index * 5 + 1) % _len<span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 没有找到继续找下一个位置</span>
                <span style="color: rgba(0, 0, 255, 1)">if</span> index ==<span style="color: rgba(0, 0, 0, 1)"> origin_index:
                  </span><span style="color: rgba(0, 0, 255, 1)">break</span>

      <span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> None

    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 找能插入的槽</span>
    <span style="color: rgba(0, 0, 255, 1)">def</span><span style="color: rgba(0, 0, 0, 1)"> _find_slot_for_insert(self, key):
      index </span>=<span style="color: rgba(0, 0, 0, 1)"> self._hash(key)
      _len </span>=<span style="color: rgba(0, 0, 0, 1)"> len(self._table)
      </span><span style="color: rgba(0, 0, 255, 1)">while</span> <span style="color: rgba(0, 0, 255, 1)">not</span> self._slot_can_insert(index):<span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 直到找到一个可以用的槽</span>
            index = (index * 5 + 1) %<span style="color: rgba(0, 0, 0, 1)"> _len
      </span><span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> index

    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 槽是否能插入</span>
    <span style="color: rgba(0, 0, 255, 1)">def</span><span style="color: rgba(0, 0, 0, 1)"> _slot_can_insert(self, index):
      </span><span style="color: rgba(0, 0, 255, 1)">return</span> self._table <span style="color: rgba(0, 0, 255, 1)">is</span> HashTable.EMPTY <span style="color: rgba(0, 0, 255, 1)">or</span> self._table <span style="color: rgba(0, 0, 255, 1)">is</span><span style="color: rgba(0, 0, 0, 1)"> HashTable.UNUSED

    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> in operator,实现之后可以使用 in 操作符判断</span>
    <span style="color: rgba(0, 0, 255, 1)">def</span> <span style="color: rgba(128, 0, 128, 1)">__contains__</span><span style="color: rgba(0, 0, 0, 1)">(self, key):
      index </span>=<span style="color: rgba(0, 0, 0, 1)"> self._find_key(key)
      </span><span style="color: rgba(0, 0, 255, 1)">return</span> index <span style="color: rgba(0, 0, 255, 1)">is</span> <span style="color: rgba(0, 0, 255, 1)">not</span><span style="color: rgba(0, 0, 0, 1)"> None

    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 添加元素</span>
    <span style="color: rgba(0, 0, 255, 1)">def</span><span style="color: rgba(0, 0, 0, 1)"> add(self, key, value):
      </span><span style="color: rgba(0, 0, 255, 1)">if</span> key <span style="color: rgba(0, 0, 255, 1)">in</span> self:<span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> update</span>
            index =<span style="color: rgba(0, 0, 0, 1)"> self._find_key(key)
            self._table.value </span>=<span style="color: rgba(0, 0, 0, 1)"> value
            </span><span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> False
      </span><span style="color: rgba(0, 0, 255, 1)">else</span><span style="color: rgba(0, 0, 0, 1)">:
            index </span>=<span style="color: rgba(0, 0, 0, 1)"> self._find_slot_for_insert(key)
            self._table </span>=<span style="color: rgba(0, 0, 0, 1)"> Slot(key, value)
            self.length </span>+= 1
            <span style="color: rgba(0, 0, 255, 1)">if</span> self._load_factor &gt;= 0.8<span style="color: rgba(0, 0, 0, 1)">:
                self._rehash()
            </span><span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> True

    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 槽不够时,重哈希</span>
    <span style="color: rgba(0, 0, 255, 1)">def</span><span style="color: rgba(0, 0, 0, 1)"> _rehash(self):
      old_table </span>=<span style="color: rgba(0, 0, 0, 1)"> self._table
      newsize </span>= len(self._table) * 2<span style="color: rgba(0, 0, 0, 1)">
      self._table </span>=<span style="color: rgba(0, 0, 0, 1)"> Array(newsize, HashTable.UNUSED)

      self.length </span>=<span style="color: rgba(0, 0, 0, 1)"> 0

      </span><span style="color: rgba(0, 0, 255, 1)">for</span> slot <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> old_table:
            </span><span style="color: rgba(0, 0, 255, 1)">if</span> slot <span style="color: rgba(0, 0, 255, 1)">is</span> <span style="color: rgba(0, 0, 255, 1)">not</span> HashTable.UNUSED <span style="color: rgba(0, 0, 255, 1)">and</span> slot <span style="color: rgba(0, 0, 255, 1)">is</span> <span style="color: rgba(0, 0, 255, 1)">not</span><span style="color: rgba(0, 0, 0, 1)"> HashTable.EMPTY:
                index </span>=<span style="color: rgba(0, 0, 0, 1)"> self._find_slot_for_insert(slot.key)
                self._table </span>=<span style="color: rgba(0, 0, 0, 1)"> slot
                self.length </span>+= 1

    <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 获取值</span>
    <span style="color: rgba(0, 0, 255, 1)">def</span> get(self, key, default=<span style="color: rgba(0, 0, 0, 1)">None):
      index </span>=<span style="color: rgba(0, 0, 0, 1)"> self._find_key(key)
      </span><span style="color: rgba(0, 0, 255, 1)">if</span> index <span style="color: rgba(0, 0, 255, 1)">is</span><span style="color: rgba(0, 0, 0, 1)"> None:
            </span><span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> default
      </span><span style="color: rgba(0, 0, 255, 1)">else</span><span style="color: rgba(0, 0, 0, 1)">:
            </span><span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> self._table.value

    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 移除</span>
    <span style="color: rgba(0, 0, 255, 1)">def</span><span style="color: rgba(0, 0, 0, 1)"> remove(self, key):
      index </span>=<span style="color: rgba(0, 0, 0, 1)"> self._find_key(key)
      </span><span style="color: rgba(0, 0, 255, 1)">if</span> index <span style="color: rgba(0, 0, 255, 1)">is</span><span style="color: rgba(0, 0, 0, 1)"> None:
            </span><span style="color: rgba(0, 0, 255, 1)">raise</span><span style="color: rgba(0, 0, 0, 1)"> KeyError()
      value </span>=<span style="color: rgba(0, 0, 0, 1)"> self._table.value
      self.length </span>-= 1<span style="color: rgba(0, 0, 0, 1)">
      self._table </span>=<span style="color: rgba(0, 0, 0, 1)"> HashTable.EMPTY
      </span><span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> value

    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 遍历</span>
    <span style="color: rgba(0, 0, 255, 1)">def</span> <span style="color: rgba(128, 0, 128, 1)">__iter__</span><span style="color: rgba(0, 0, 0, 1)">(self):
      </span><span style="color: rgba(0, 0, 255, 1)">for</span> slot <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> self._table:
            </span><span style="color: rgba(0, 0, 255, 1)">if</span> slot <span style="color: rgba(0, 0, 255, 1)">not</span> <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> (HashTable.EMPTY, HashTable.UNUSED):
                </span><span style="color: rgba(0, 0, 255, 1)">yield</span><span style="color: rgba(0, 0, 0, 1)"> slot.key


</span><span style="color: rgba(0, 0, 255, 1)">class</span><span style="color: rgba(0, 0, 0, 1)"> DictADT(HashTable):
    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 执行dict=value时执行</span>
    <span style="color: rgba(0, 0, 255, 1)">def</span> <span style="color: rgba(128, 0, 128, 1)">__setitem__</span><span style="color: rgba(0, 0, 0, 1)">(self, key, value):
      self.add(key, value)

    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 执行dict时执行</span>
    <span style="color: rgba(0, 0, 255, 1)">def</span> <span style="color: rgba(128, 0, 128, 1)">__getitem__</span>(self, key, default=<span style="color: rgba(0, 0, 0, 1)">None):
      </span><span style="color: rgba(0, 0, 255, 1)">if</span> key <span style="color: rgba(0, 0, 255, 1)">not</span> <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> self:
            </span><span style="color: rgba(0, 0, 255, 1)">raise</span><span style="color: rgba(0, 0, 0, 1)"> KeyError()
      </span><span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> self.get(key, default)

    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 遍历时执行</span>
    <span style="color: rgba(0, 0, 255, 1)">def</span><span style="color: rgba(0, 0, 0, 1)"> _iter_slot(self):
      </span><span style="color: rgba(0, 0, 255, 1)">for</span> slot <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> self._table:
            </span><span style="color: rgba(0, 0, 255, 1)">if</span> slot <span style="color: rgba(0, 0, 255, 1)">not</span> <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> (self.UNUSED, self.EMPTY):
                </span><span style="color: rgba(0, 0, 255, 1)">yield</span><span style="color: rgba(0, 0, 0, 1)"> slot

    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 实现items方法</span>
    <span style="color: rgba(0, 0, 255, 1)">def</span><span style="color: rgba(0, 0, 0, 1)"> items(self):
      </span><span style="color: rgba(0, 0, 255, 1)">for</span> slot <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> self._iter_slot():
            </span><span style="color: rgba(0, 0, 255, 1)">yield</span><span style="color: rgba(0, 0, 0, 1)"> (slot.key, slot.value)

    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 实现keys方法</span>
    <span style="color: rgba(0, 0, 255, 1)">def</span><span style="color: rgba(0, 0, 0, 1)"> keys(self):
      </span><span style="color: rgba(0, 0, 255, 1)">for</span> slot <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> self._iter_slot():
            </span><span style="color: rgba(0, 0, 255, 1)">yield</span><span style="color: rgba(0, 0, 0, 1)"> slot.key

    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 实现values方法</span>
    <span style="color: rgba(0, 0, 255, 1)">def</span><span style="color: rgba(0, 0, 0, 1)"> values(self):
      </span><span style="color: rgba(0, 0, 255, 1)">for</span> slot <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> self._iter_slot():
            </span><span style="color: rgba(0, 0, 255, 1)">yield</span> slot.value</pre>
</div>
<h3>字典的使用</h3>
<div class="cnblogs_code">
<pre>d =<span style="color: rgba(0, 0, 0, 1)"> DictADT()
d[</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">a</span><span style="color: rgba(128, 0, 0, 1)">'</span>] = 1
<span style="color: rgba(0, 0, 255, 1)">print</span>(d[<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">a</span><span style="color: rgba(128, 0, 0, 1)">'</span>]) <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 1<br></span></pre>
</div>
<h2>集合</h2>
<p>集合是一种不包含重复元素的数据结构,经常用来判断是否重复这种操作,或者集合中是否存在一个元素。</p>
<p class="md-end-block md-p"><span class="md-plain md-expand">集合可能最常用的就是去重,判断是否存在一个元素等,但是 set 相比 dict 有更丰富的操作,主要是数学概念上的。</span></p>
<p class="md-end-block md-p"><span class="md-plain md-expand"><span class="md-softbreak"> <span class="md-plain">如果你学过《离散数学》中集合相关的概念,基本上是一致的。 python 的 set 提供了如下基本的集合操作,<span class="md-softbreak"> <span class="md-plain">假设有两个集合 A,B,有以下操作:</span></span></span></span></span></p>
<ul class="ul-list" data-mark="-">
<li class="md-list-item">
<p class="md-end-block md-p"><span class="md-plain">交集: A &amp; B,表示同时在 A 和 B 中的元素。 python 中重载 <span><code>__and__</code><span class="md-plain"> 实现</span></span></span></p>
</li>
<li class="md-list-item">
<p class="md-end-block md-p"><span class="md-plain">并集: A | B,表示在 A 或者 B 中的元素,两个集合相加。python 中重载 <span><code>__or__</code><span class="md-plain"> 实现</span></span></span></p>
</li>
<li class="md-list-item">
<p class="md-end-block md-p"><span class="md-plain">差集: A - B,表示在 A 中但是不在 B 中的元素。 python 中重载 <span><code>__sub__</code><span class="md-plain"> 实现</span></span></span></p>
</li>
</ul>
<h3><span class="md-plain"><span><span class="md-plain">基于哈希表实现集合</span></span></span></h3>
<div class="cnblogs_code">
<pre><span style="color: rgba(0, 0, 255, 1)">class</span><span style="color: rgba(0, 0, 0, 1)"> Array(object):

    </span><span style="color: rgba(0, 0, 255, 1)">def</span> <span style="color: rgba(128, 0, 128, 1)">__init__</span>(self, size=32, init=<span style="color: rgba(0, 0, 0, 1)">None):
      self._size </span>=<span style="color: rgba(0, 0, 0, 1)"> size
      self._items </span>= *<span style="color: rgba(0, 0, 0, 1)"> size

    </span><span style="color: rgba(0, 0, 255, 1)">def</span> <span style="color: rgba(128, 0, 128, 1)">__getitem__</span><span style="color: rgba(0, 0, 0, 1)">(self, index):
      </span><span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> self._items

    </span><span style="color: rgba(0, 0, 255, 1)">def</span> <span style="color: rgba(128, 0, 128, 1)">__setitem__</span><span style="color: rgba(0, 0, 0, 1)">(self, index, value):
      self._items </span>=<span style="color: rgba(0, 0, 0, 1)"> value

    </span><span style="color: rgba(0, 0, 255, 1)">def</span> <span style="color: rgba(128, 0, 128, 1)">__len__</span><span style="color: rgba(0, 0, 0, 1)">(self):
      </span><span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> self._size

    </span><span style="color: rgba(0, 0, 255, 1)">def</span> clear(self, value=<span style="color: rgba(0, 0, 0, 1)">None):
      </span><span style="color: rgba(0, 0, 255, 1)">for</span> i <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> range(len(self._items)):
            self._items </span>=<span style="color: rgba(0, 0, 0, 1)"> value

    </span><span style="color: rgba(0, 0, 255, 1)">def</span> <span style="color: rgba(128, 0, 128, 1)">__iter__</span><span style="color: rgba(0, 0, 0, 1)">(self):
      </span><span style="color: rgba(0, 0, 255, 1)">for</span> item <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> self._items:
            </span><span style="color: rgba(0, 0, 255, 1)">yield</span><span style="color: rgba(0, 0, 0, 1)"> item


</span><span style="color: rgba(0, 0, 255, 1)">class</span><span style="color: rgba(0, 0, 0, 1)"> Slot(object):
    </span><span style="color: rgba(128, 0, 0, 1)">"""</span><span style="color: rgba(128, 0, 0, 1)">
    定义一个 hash 表数组的槽(slot 这里指的就是数组的一个位置)
    hash table 就是一个数组,每个数组的元素(也叫slot槽)是一个对象,对象包含两个属性 key 和 value。

    注意,一个槽有三种状态,看你能否想明白。相比链接法解决冲突,探查法删除一个 key 的操作稍微复杂。
    1.从未使用 HashMap.UNUSED。此槽没有被使用和冲突过,查找时只要找到 UNUSED 就不用再继续探查了
    2.使用过但是 remove 了,此时是 HashMap.EMPTY,该探查点后边的元素仍然可能是有key的,需要继续查找
    3.槽正在使用 Slot 节点
    </span><span style="color: rgba(128, 0, 0, 1)">"""</span>

    <span style="color: rgba(0, 0, 255, 1)">def</span> <span style="color: rgba(128, 0, 128, 1)">__init__</span><span style="color: rgba(0, 0, 0, 1)">(self, key, value):
      self.key, self.value </span>=<span style="color: rgba(0, 0, 0, 1)"> key, value


</span><span style="color: rgba(0, 0, 255, 1)">class</span><span style="color: rgba(0, 0, 0, 1)"> HashTable(object):
    UNUSED </span>= None<span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 没被使用过</span>
    EMPTY = Slot(None, None)<span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 使用却被删除过</span>

    <span style="color: rgba(0, 0, 255, 1)">def</span> <span style="color: rgba(128, 0, 128, 1)">__init__</span><span style="color: rgba(0, 0, 0, 1)">(self):
      self._table </span>= Array(8, init=HashTable.UNUSED)<span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 保持 2*i 次方</span>
      self.length =<span style="color: rgba(0, 0, 0, 1)"> 0

    @property
    </span><span style="color: rgba(0, 0, 255, 1)">def</span><span style="color: rgba(0, 0, 0, 1)"> _load_factor(self):
      </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> load_factor 超过 0.8 重新分配</span>
      <span style="color: rgba(0, 0, 255, 1)">return</span> self.length /<span style="color: rgba(0, 0, 0, 1)"> float(len(self._table))

    </span><span style="color: rgba(0, 0, 255, 1)">def</span> <span style="color: rgba(128, 0, 128, 1)">__len__</span><span style="color: rgba(0, 0, 0, 1)">(self):
      </span><span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> self.length

    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 进行哈希</span>
    <span style="color: rgba(0, 0, 255, 1)">def</span><span style="color: rgba(0, 0, 0, 1)"> _hash(self, key):
      </span><span style="color: rgba(0, 0, 255, 1)">return</span> abs(hash(key)) %<span style="color: rgba(0, 0, 0, 1)"> len(self._table)

    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 查找key</span>
    <span style="color: rgba(0, 0, 255, 1)">def</span><span style="color: rgba(0, 0, 0, 1)"> _find_key(self, key):
      </span><span style="color: rgba(128, 0, 0, 1)">"""</span><span style="color: rgba(128, 0, 0, 1)">
      解释一个 slot 为 UNUSED 和 EMPTY 的区别
      因为使用的是二次探查的方式,假如有两个元素 A,B 冲突了,
      首先A hash 得到是 slot 下标5,A 放到了第5个槽,之后插入 B 因为冲突了,所以继续根据二次探查方式放到了 slot下标8。
      然后删除 A,槽 5 被置为 EMPTY。然后我去查找 B,
      第一次 hash 得到的是 槽5,但是这个时候我还是需要第二次计算 hash 才能找到 B。
      但是如果槽是 UNUSED 我就不用继续找了,我认为 B 就是不存在的元素。这个就是 UNUSED 和 EMPTY 的区别。
      </span><span style="color: rgba(128, 0, 0, 1)">"""</span><span style="color: rgba(0, 0, 0, 1)">
      origin_index </span>= index = self._hash(key)<span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> origin_index 判断是否又走到了起点,如果查找一圈了都找不到则无此元素</span>
      _len =<span style="color: rgba(0, 0, 0, 1)"> len(self._table)
      </span><span style="color: rgba(0, 0, 255, 1)">while</span> self._table <span style="color: rgba(0, 0, 255, 1)">is</span> <span style="color: rgba(0, 0, 255, 1)">not</span><span style="color: rgba(0, 0, 0, 1)"> HashTable.UNUSED:
            </span><span style="color: rgba(0, 0, 255, 1)">if</span> self._table <span style="color: rgba(0, 0, 255, 1)">is</span> HashTable.EMPTY:<span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 注意如果是 EMPTY,继续寻找下一个槽</span>
                index = (index * 5 + 1) %<span style="color: rgba(0, 0, 0, 1)"> _len
                </span><span style="color: rgba(0, 0, 255, 1)">if</span> index ==<span style="color: rgba(0, 0, 0, 1)"> origin_index:
                  </span><span style="color: rgba(0, 0, 255, 1)">break</span>
                <span style="color: rgba(0, 0, 255, 1)">continue</span>
            <span style="color: rgba(0, 0, 255, 1)">if</span> self._table.key == key:<span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 找到了key</span>
                <span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> index
            </span><span style="color: rgba(0, 0, 255, 1)">else</span><span style="color: rgba(0, 0, 0, 1)">:
                index </span>= (index * 5 + 1) % _len<span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 没有找到继续找下一个位置</span>
                <span style="color: rgba(0, 0, 255, 1)">if</span> index ==<span style="color: rgba(0, 0, 0, 1)"> origin_index:
                  </span><span style="color: rgba(0, 0, 255, 1)">break</span>

      <span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> None

    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 找能插入的槽</span>
    <span style="color: rgba(0, 0, 255, 1)">def</span><span style="color: rgba(0, 0, 0, 1)"> _find_slot_for_insert(self, key):
      index </span>=<span style="color: rgba(0, 0, 0, 1)"> self._hash(key)
      _len </span>=<span style="color: rgba(0, 0, 0, 1)"> len(self._table)
      </span><span style="color: rgba(0, 0, 255, 1)">while</span> <span style="color: rgba(0, 0, 255, 1)">not</span> self._slot_can_insert(index):<span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 直到找到一个可以用的槽</span>
            index = (index * 5 + 1) %<span style="color: rgba(0, 0, 0, 1)"> _len
      </span><span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> index

    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 槽是否能插入</span>
    <span style="color: rgba(0, 0, 255, 1)">def</span><span style="color: rgba(0, 0, 0, 1)"> _slot_can_insert(self, index):
      </span><span style="color: rgba(0, 0, 255, 1)">return</span> self._table <span style="color: rgba(0, 0, 255, 1)">is</span> HashTable.EMPTY <span style="color: rgba(0, 0, 255, 1)">or</span> self._table <span style="color: rgba(0, 0, 255, 1)">is</span><span style="color: rgba(0, 0, 0, 1)"> HashTable.UNUSED

    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> in operator,实现之后可以使用 in 操作符判断</span>
    <span style="color: rgba(0, 0, 255, 1)">def</span> <span style="color: rgba(128, 0, 128, 1)">__contains__</span><span style="color: rgba(0, 0, 0, 1)">(self, key):
      index </span>=<span style="color: rgba(0, 0, 0, 1)"> self._find_key(key)
      </span><span style="color: rgba(0, 0, 255, 1)">return</span> index <span style="color: rgba(0, 0, 255, 1)">is</span> <span style="color: rgba(0, 0, 255, 1)">not</span><span style="color: rgba(0, 0, 0, 1)"> None

    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 添加元素</span>
    <span style="color: rgba(0, 0, 255, 1)">def</span><span style="color: rgba(0, 0, 0, 1)"> add(self, key, value):
      </span><span style="color: rgba(0, 0, 255, 1)">if</span> key <span style="color: rgba(0, 0, 255, 1)">in</span> self:<span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> update</span>
            index =<span style="color: rgba(0, 0, 0, 1)"> self._find_key(key)
            self._table.value </span>=<span style="color: rgba(0, 0, 0, 1)"> value
            </span><span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> False
      </span><span style="color: rgba(0, 0, 255, 1)">else</span><span style="color: rgba(0, 0, 0, 1)">:
            index </span>=<span style="color: rgba(0, 0, 0, 1)"> self._find_slot_for_insert(key)
            self._table </span>=<span style="color: rgba(0, 0, 0, 1)"> Slot(key, value)
            self.length </span>+= 1
            <span style="color: rgba(0, 0, 255, 1)">if</span> self._load_factor &gt;= 0.8<span style="color: rgba(0, 0, 0, 1)">:
                self._rehash()
            </span><span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> True

    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 槽不够时,重哈希</span>
    <span style="color: rgba(0, 0, 255, 1)">def</span><span style="color: rgba(0, 0, 0, 1)"> _rehash(self):
      old_table </span>=<span style="color: rgba(0, 0, 0, 1)"> self._table
      newsize </span>= len(self._table) * 2<span style="color: rgba(0, 0, 0, 1)">
      self._table </span>=<span style="color: rgba(0, 0, 0, 1)"> Array(newsize, HashTable.UNUSED)

      self.length </span>=<span style="color: rgba(0, 0, 0, 1)"> 0

      </span><span style="color: rgba(0, 0, 255, 1)">for</span> slot <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> old_table:
            </span><span style="color: rgba(0, 0, 255, 1)">if</span> slot <span style="color: rgba(0, 0, 255, 1)">is</span> <span style="color: rgba(0, 0, 255, 1)">not</span> HashTable.UNUSED <span style="color: rgba(0, 0, 255, 1)">and</span> slot <span style="color: rgba(0, 0, 255, 1)">is</span> <span style="color: rgba(0, 0, 255, 1)">not</span><span style="color: rgba(0, 0, 0, 1)"> HashTable.EMPTY:
                index </span>=<span style="color: rgba(0, 0, 0, 1)"> self._find_slot_for_insert(slot.key)
                self._table </span>=<span style="color: rgba(0, 0, 0, 1)"> slot
                self.length </span>+= 1

    <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 获取值</span>
    <span style="color: rgba(0, 0, 255, 1)">def</span> get(self, key, default=<span style="color: rgba(0, 0, 0, 1)">None):
      index </span>=<span style="color: rgba(0, 0, 0, 1)"> self._find_key(key)
      </span><span style="color: rgba(0, 0, 255, 1)">if</span> index <span style="color: rgba(0, 0, 255, 1)">is</span><span style="color: rgba(0, 0, 0, 1)"> None:
            </span><span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> default
      </span><span style="color: rgba(0, 0, 255, 1)">else</span><span style="color: rgba(0, 0, 0, 1)">:
            </span><span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> self._table.value

    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 移除</span>
    <span style="color: rgba(0, 0, 255, 1)">def</span><span style="color: rgba(0, 0, 0, 1)"> remove(self, key):
      index </span>=<span style="color: rgba(0, 0, 0, 1)"> self._find_key(key)
      </span><span style="color: rgba(0, 0, 255, 1)">if</span> index <span style="color: rgba(0, 0, 255, 1)">is</span><span style="color: rgba(0, 0, 0, 1)"> None:
            </span><span style="color: rgba(0, 0, 255, 1)">raise</span><span style="color: rgba(0, 0, 0, 1)"> KeyError()
      value </span>=<span style="color: rgba(0, 0, 0, 1)"> self._table.value
      self.length </span>-= 1<span style="color: rgba(0, 0, 0, 1)">
      self._table </span>=<span style="color: rgba(0, 0, 0, 1)"> HashTable.EMPTY
      </span><span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> value

    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 遍历</span>
    <span style="color: rgba(0, 0, 255, 1)">def</span> <span style="color: rgba(128, 0, 128, 1)">__iter__</span><span style="color: rgba(0, 0, 0, 1)">(self):
      </span><span style="color: rgba(0, 0, 255, 1)">for</span> slot <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> self._table:
            </span><span style="color: rgba(0, 0, 255, 1)">if</span> slot <span style="color: rgba(0, 0, 255, 1)">not</span> <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> (HashTable.EMPTY, HashTable.UNUSED):
                </span><span style="color: rgba(0, 0, 255, 1)">yield</span><span style="color: rgba(0, 0, 0, 1)"> slot.key


</span><span style="color: rgba(0, 0, 255, 1)">class</span><span style="color: rgba(0, 0, 0, 1)"> SetADT(HashTable):
    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 添加元素</span>
    <span style="color: rgba(0, 0, 255, 1)">def</span><span style="color: rgba(0, 0, 0, 1)"> add(self, key):
      super().add(key, True)
   
    </span><span style="color: rgba(0, 0, 255, 1)">def</span> <span style="color: rgba(128, 0, 128, 1)">__and__</span><span style="color: rgba(0, 0, 0, 1)">(self, other_set):
      </span><span style="color: rgba(128, 0, 0, 1)">"""</span><span style="color: rgba(128, 0, 0, 1)">交集 A&amp;B</span><span style="color: rgba(128, 0, 0, 1)">"""</span><span style="color: rgba(0, 0, 0, 1)">
      new_set </span>=<span style="color: rgba(0, 0, 0, 1)"> SetADT()
      </span><span style="color: rgba(0, 0, 255, 1)">for</span> element_a <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> self:
            </span><span style="color: rgba(0, 0, 255, 1)">if</span> element_a <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> other_set:
                new_set.add(element_a)
      </span><span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> new_set

    </span><span style="color: rgba(0, 0, 255, 1)">def</span> <span style="color: rgba(128, 0, 128, 1)">__sub__</span><span style="color: rgba(0, 0, 0, 1)">(self, other_set):
      </span><span style="color: rgba(128, 0, 0, 1)">"""</span><span style="color: rgba(128, 0, 0, 1)">差集 A-B</span><span style="color: rgba(128, 0, 0, 1)">"""</span><span style="color: rgba(0, 0, 0, 1)">
      new_set </span>=<span style="color: rgba(0, 0, 0, 1)"> SetADT()
      </span><span style="color: rgba(0, 0, 255, 1)">for</span> element_a <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> self:
            </span><span style="color: rgba(0, 0, 255, 1)">if</span> element_a <span style="color: rgba(0, 0, 255, 1)">not</span> <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> other_set:
                new_set.add(element_a)
      </span><span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> new_set

    </span><span style="color: rgba(0, 0, 255, 1)">def</span> <span style="color: rgba(128, 0, 128, 1)">__or__</span><span style="color: rgba(0, 0, 0, 1)">(self, other_set):
      </span><span style="color: rgba(128, 0, 0, 1)">"""</span><span style="color: rgba(128, 0, 0, 1)">并集 A|B</span><span style="color: rgba(128, 0, 0, 1)">"""</span><span style="color: rgba(0, 0, 0, 1)">
      new_set </span>=<span style="color: rgba(0, 0, 0, 1)"> SetADT()
      </span><span style="color: rgba(0, 0, 255, 1)">for</span> element_a <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> self:
            new_set.add(element_a)
      </span><span style="color: rgba(0, 0, 255, 1)">for</span> element_b <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> other_set:
            new_set.add(element_b)
      </span><span style="color: rgba(0, 0, 255, 1)">return</span> new_set</pre>
</div>
<h3>集合的使用</h3>
<div class="cnblogs_code">
<pre>sa =<span style="color: rgba(0, 0, 0, 1)"> SetADT()
sa.add(</span>1<span style="color: rgba(0, 0, 0, 1)">)
sa.add(</span>2<span style="color: rgba(0, 0, 0, 1)">)
sa.add(</span>3<span style="color: rgba(0, 0, 0, 1)">)

sb </span>=<span style="color: rgba(0, 0, 0, 1)"> SetADT()
sb.add(</span>3<span style="color: rgba(0, 0, 0, 1)">)
sb.add(</span>4<span style="color: rgba(0, 0, 0, 1)">)
sb.add(</span>5<span style="color: rgba(0, 0, 0, 1)">)

</span><span style="color: rgba(0, 0, 255, 1)">print</span>(sorted(list(sa &amp; sb))) <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> </span>
<span style="color: rgba(0, 0, 255, 1)">print</span>(sorted(list(sa - sb))) <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> </span>
<span style="color: rgba(0, 0, 255, 1)">print</span>(sorted(list(sa | sb))) <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> </span></pre>
</div>
<p><span style="color: rgba(0, 255, 255, 1)"><strong>~&gt;.&lt;~</strong></span></p><br><br>
来源:https://www.cnblogs.com/pungchur/p/12079853.html
頁: [1]
查看完整版本: 使用python实现哈希表、字典、集合