難得糊涂 posted on 2021-11-11 11:56:00

EDG Wins Worlds! Analyzing 223,000 Danmaku Comments with Python: The Fans Went Wild!

<blockquote>
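<h2>Original content takes real effort. Plagiarism and reposting of this article are prohibited, and infringements will be pursued!</h2>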
</blockquote>
<h2>1. <strong>EDG's Championship</strong></h2>
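<p><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px">On November 6, in the final of the League of Legends World Championship, EDG beat the Korean team DWG KIA <span style="color: rgba(51, 102, 255, 1)">3:2</span> to win the <span style="color: rgba(51, 102, 255, 1)">2021 League of Legends World Championship</span>. The match drew enormous attention on every major platform:</span></p>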
<p>&nbsp;</p>
<p><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px">1. <span style="color: rgba(51, 102, 255, 1)">#1</span> on the <span style="color: rgba(51, 102, 255, 1)">Weibo</span> trending list, with views in the <span style="color: rgba(51, 102, 255, 1)">hundreds of millions</span> as of 2021-11-10; EDG's Weibo follower count reached <span style="color: rgba(51, 102, 255, 1)">6.384 million</span></span></p>
<p><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px"><span style="color: rgba(51, 102, 255, 1)"><img src="https://img2020.cnblogs.com/blog/2225056/202111/2225056-20211111111449468-1968820800.png"></span></span></p>
<p>&nbsp;</p>
<p><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px; color: rgba(0, 0, 0, 1)"><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px">2. On <span style="color: rgba(51, 102, 255, 1)">Bilibili</span>, the videos' popularity count is in the <span style="color: rgba(51, 102, 255, 1)">hundreds of millions</span>, they carry <span style="color: rgba(51, 102, 255, 1)">223,000</span> danmaku comments in total, they peaked at <span style="color: rgba(51, 102, 255, 1)">#2</span> on the site-wide ranking, and EDG's Bilibili account has <span style="color: rgba(51, 102, 255, 1)">2.199 million</span> followers</span></span></p>
<p><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px"><span style="color: rgba(51, 102, 255, 1)"><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px; color: rgba(0, 0, 0, 1)"><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px"><span style="color: rgba(51, 102, 255, 1)"><img src="https://img2020.cnblogs.com/blog/2225056/202111/2225056-20211111111628685-1316199418.png"></span></span></span></span></span></p>
<p>&nbsp;</p>
<p>&nbsp;<img src="https://img2020.cnblogs.com/blog/2225056/202111/2225056-20211111111654020-1907004601.png"></p>
<p>&nbsp;</p>
<p><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px">3. On video platforms such as <span style="color: rgba(51, 102, 255, 1)">Tencent Video</span>, <span style="color: rgba(51, 102, 255, 1)">iQiyi</span> and <span style="color: rgba(51, 102, 255, 1)">Youku</span>, <span style="color: rgba(51, 102, 255, 1)">8 million</span> people have watched</span></p>
<p><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px">&nbsp;</span></p>
<p><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px">4. Live-streaming platforms such as <span style="color: rgba(51, 102, 255, 1)">Huya</span> also saw consistently high traffic</span></p>
<p><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px">&nbsp;</span></p>
<p><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px">5. <span style="color: rgba(51, 102, 255, 1)">CCTV News</span> also posted on Weibo <span style="color: rgba(51, 102, 255, 1)">to celebrate EDG's championship</span></span></p>
<p>&nbsp;</p>
<p><img src="https://img2020.cnblogs.com/blog/2225056/202111/2225056-20211111111813645-556722073.png"></p>
<p>&nbsp;</p>
<p>&nbsp;<img src="https://img2020.cnblogs.com/blog/2225056/202111/2225056-20211111111826938-1003812549.png"></p>
<p>&nbsp;</p>
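<p><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px">Since the match drew so much attention, this article takes Bilibili as its data source: we scrape the 223,000 danmaku (bullet comments overlaid on the video) from EDG's championship match videos on Bilibili, then analyze them with Python to get a feel for the fans' enthusiasm</span></p>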
<p>&nbsp;</p>
<hr>
<p>&nbsp;</p>
<h2>2. <strong>Project Goals</strong></h2>
<h3><strong>2.1&nbsp;Web Scraping</strong></h3>
<p><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px">Use web scraping to collect the 223,000 danmaku comments on EDG's championship match videos on Bilibili</span></p>
<p><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px"><img src="https://img2020.cnblogs.com/blog/2225056/202111/2225056-20211112154701681-883159492.png"></span></p>
<p><img src="https://img2020.cnblogs.com/blog/2225056/202111/2225056-20211112171259850-9591407.png"></p>
<p id="1636708380376">&nbsp;</p>
<h3><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px">2.2 Data Visualization</span></h3>
<p><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px">Analyze and visualize the scraped danmaku data with Python libraries such as jieba and numpy</span></p>
<p><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px"><img src="https://img2020.cnblogs.com/blog/2225056/202111/2225056-20211111113546444-218410341.png"></span></p>
<p>&nbsp;</p>
<p><img src="https://img2020.cnblogs.com/blog/2225056/202111/2225056-20211111113711956-255118540.png"></p>
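<p>Frequency-based charts like these start from a word count. Below is a minimal sketch of that counting step, assuming each comment has already been segmented into words (for example with <code>jieba.lcut(text)</code>, which returns a list of words). Segmentation is not run here so the sketch has no third-party dependency; the function name <code>top_terms</code> and the rule of dropping single-character tokens are illustrative assumptions, not the author's code.</p>

```python
from collections import Counter


def top_terms(segmented, n=3, min_len=2):
    """Count words across segmented comments and return the n most common.

    segmented: a list of comments, each already split into a list of words.
    Tokens shorter than min_len characters are skipped (an assumed heuristic
    for dropping particles and single-character noise).
    """
    counts = Counter(
        word
        for words in segmented
        for word in words
        if len(word) >= min_len
    )
    return counts.most_common(n)


if __name__ == "__main__":
    # Hypothetical pre-segmented comments, for illustration only.
    sample = [["EDG", "冠军"], ["EDG", "牛", "冠军"], ["恭喜", "EDG"]]
    print(top_terms(sample))  # [('EDG', 3), ('冠军', 2), ('恭喜', 1)]
```

<p>The resulting (word, count) pairs are the kind of input a word-cloud or bar-chart library expects.</p>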
<p>&nbsp;</p>
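<h3>2.3 Natural Language Processing (Sentiment Analysis)</h3>
<p>Use pandas together with natural language processing (NLP) to run sentiment analysis on the danmaku from EDG's championship match videos, and draw some conclusions from the results</p>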
<p>&nbsp;<img src="https://img2020.cnblogs.com/blog/2225056/202111/2225056-20211112171327759-249634957.png"></p>
<p id="1636708407552">&nbsp;</p>
<p><img src="https://img2020.cnblogs.com/blog/2225056/202111/2225056-20211112171359889-836568478.png"></p>
<p>&nbsp;<img src="https://img2020.cnblogs.com/blog/2225056/202111/2225056-20211112184012472-1981832756.png"></p>
<p id="1636713612982">&nbsp;</p>
<p>&nbsp;</p>
<hr>
<p>&nbsp;</p>
<h2>3. <strong>Bilibili API Analysis</strong></h2>
<p><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px">First, open the URL of EDG's championship match video:</span></p>
<p><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px">https://www.bilibili.com/video/BV1EP4y1j7kV?p=1</span></p>
<p><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px">&nbsp;</span></p>
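<p><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px">Bilibili has already collected EDG's matches into one upload with 7 videos (parts P1 to P7), from the opening ceremony to the championship moment</span></p>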
<p><img src="https://img2020.cnblogs.com/blog/2225056/202111/2225056-20211111112126423-1102009083.png"></p>
<p>&nbsp;</p>
<p><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px; color: rgba(51, 102, 255, 1)">Bilibili danmaku data API:</span></p>
<p><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px; color: rgba(0, 255, 0, 1)">http://api.bilibili.com/x/v1/dm/list.so?oid=XXX</span></p>
<p><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px">This is Bilibili's dedicated danmaku API, and we can use it directly. The oid parameter can be thought of as the unique identifier of each video: it is a number, and every video has exactly one. So once we find a video's oid, we can request the danmaku API for that match video and scrape its danmaku</span></p>
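<p>The URL pattern above can be wrapped in a small helper. This is a minimal sketch: the helper name <code>danmaku_url</code> is hypothetical, and it only builds the URL string without sending any request. The example oid 437586584 is the cid of the first game in the pagelist response examined in this section.</p>

```python
from urllib.parse import urlencode

# Danmaku endpoint described above; the numeric oid goes in the query string.
DANMAKU_ENDPOINT = "http://api.bilibili.com/x/v1/dm/list.so"


def danmaku_url(oid):
    """Return the danmaku API URL for one video (part), given its numeric oid."""
    oid = int(oid)  # accept "437586584" as well as 437586584
    if oid <= 0:
        raise ValueError("oid must be a positive integer")
    return f"{DANMAKU_ENDPOINT}?{urlencode({'oid': oid})}"


if __name__ == "__main__":
    print(danmaku_url(437586584))
    # http://api.bilibili.com/x/v1/dm/list.so?oid=437586584
```

<p>Using <code>urlencode</code> instead of plain string concatenation keeps the query string correctly escaped if more parameters are added later.</p>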
<p><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px">&nbsp;</span></p>
<p><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px; color: rgba(51, 102, 255, 1)">Getting the oid</span></p>
<p><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px">Open the browser's developer tools, switch to the <span style="color: rgba(51, 102, 255, 1)">Network</span> tab, and find the request whose name starts with <span style="color: rgba(51, 102, 255, 1)">pagelist</span></span></p>
<p><img src="https://img2020.cnblogs.com/blog/2225056/202111/2225056-20211111112235226-471884587.png"></p>
<p>&nbsp;</p>
<p><span style="font-size: 15px; font-family: &quot;Microsoft YaHei&quot;">Next, copy this request's Request URL and open it directly in a new browser window, as shown below:</span></p>
<p><span style="font-size: 15px; font-family: &quot;Microsoft YaHei&quot;"><img src="https://img2020.cnblogs.com/blog/2225056/202111/2225056-20211111112323680-43762921.png"></span></p>
<p>&nbsp;</p>
<p><span style="font-size: 15px; font-family: &quot;Microsoft YaHei&quot;">Requesting this API directly returns <span style="color: rgba(51, 102, 255, 1)">JSON</span> data, and the cid fields in it are the oids we need:</span></p>
<div class="cnblogs_code">
<pre>{"code":0,"message":"0","ttl":1,"data":[
  {"cid":437586584,"page":1,"from":"vupload","part":"第一局 4K","duration":2952,"vid":"","weblink":"","dimension":{"width":1920,"height":1080,"rotate":0}},
  {"cid":437626309,"page":2,"from":"vupload","part":"第二局 4K","duration":3031,"vid":"","weblink":"","dimension":{"width":1920,"height":1080,"rotate":0}},
  {"cid":437659159,"page":3,"from":"vupload","part":"第三局 4K","duration":3406,"vid":"","weblink":"","dimension":{"width":1920,"height":1080,"rotate":0}},
  {"cid":437727348,"page":4,"from":"vupload","part":"第四局 4K","duration":3212,"vid":"","weblink":"","dimension":{"width":1920,"height":1080,"rotate":0}},
  {"cid":437729555,"page":5,"from":"vupload","part":"第五局 4K","duration":3478,"vid":"","weblink":"","dimension":{"width":1920,"height":1080,"rotate":0}},
  {"cid":437550300,"page":6,"from":"vupload","part":"开幕式","duration":984,"vid":"","weblink":"","dimension":{"width":1920,"height":1080,"rotate":0}},
  {"cid":437717574,"page":7,"from":"vupload","part":"夺冠时刻","duration":2017,"vid":"","weblink":"","dimension":{"width":1920,"height":1080,"rotate":0}}
]}</pre>
</div>
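<p>Pulling the cids out of this response takes only a few lines. Below is a minimal sketch, assuming the response is already available as a string: <code>SAMPLE</code> is a trimmed copy of the JSON above (two of the seven entries, and only the fields used), and the helper name <code>extract_parts</code> is hypothetical. Treating a non-zero <code>code</code> as an error is an assumption based on the successful response carrying <code>"code":0</code>.</p>

```python
import json

# Trimmed copy of the pagelist response shown above (2 of 7 entries, few fields).
SAMPLE = """{"code":0,"message":"0","ttl":1,"data":[
  {"cid":437586584,"page":1,"part":"第一局 4K","duration":2952},
  {"cid":437550300,"page":6,"part":"开幕式","duration":984}
]}"""


def extract_parts(raw):
    """Return a list of (page, part title, cid) tuples from a pagelist response."""
    payload = json.loads(raw)
    if payload.get("code") != 0:
        # Assumption: a non-zero "code" signals an API-level error.
        raise ValueError(
            f"pagelist returned code {payload.get('code')}: {payload.get('message')}"
        )
    return [(item["page"], item["part"], item["cid"]) for item in payload["data"]]


if __name__ == "__main__":
    for page, part, cid in extract_parts(SAMPLE):
        print(page, part, cid)
```

<p>Keeping the part title next to each cid makes it easy to tell later which game a batch of danmaku came from.</p>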
<p>&nbsp;</p>
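<p>Alternatively, click the <span style="color: rgba(51, 102, 255, 1)">Preview</span> tab and expand data. Preview displays the JSON as a <span style="color: rgba(51, 102, 255, 1)">collapsible</span> tree, and expanding it reveals every field, including cid, as shown below:</p>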
<p><img src="https://img2020.cnblogs.com/blog/2225056/202111/2225056-20211111112507841-476607944.png"></p>
<p>&nbsp;</p>
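<p><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px">As you can see, each cid corresponds to one match video. The <span style="color: rgba(51, 102, 255, 1)">Response</span> tab, by contrast, shows the raw response body without any collapsing, identical to the JSON returned when you request the Request URL directly</span></p>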
<p>&nbsp;</p>
<hr>
<p>&nbsp;</p>
<h2>4. <strong>Coding</strong></h2>
<h3><strong>4.1 Scraping the Data</strong></h3>
<p><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px"><span style="color: rgba(0, 255, 0, 1)">Define a function to get the cids</span></span></p>
<div class="cnblogs_code">
<pre>import json

import requests


def get_cid():
    """Request the pagelist API and return the raw JSON text, or None on failure."""
    url = 'https://api.bilibili.com/x/player/pagelist?bvid=BV1EP4y1j7kV&amp;jsonp=jsonp'
    try:
        # a finite timeout so a stalled connection cannot hang the script
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # treat 4xx/5xx responses as failures
        return response.text
    except requests.RequestException as e:
        print(e.args)
        return None


if __name__ == '__main__':
    data = get_cid()
    if data is not None:
        json_data = json.loads(data)
        # each entry in "data" describes one part (P1 to P7) of the video
        for cid_datas in json_data['data']:
            cid = cid_datas.get('cid')
            print(cid)</pre>
</div>
<p>&nbsp;</p>
<p><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px">Console output:</span></p>
<p><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px"><img src="https://img2020.cnblogs.com/blog/2225056/202111/2225056-20211111112839949-862321133.png"></span></p>
<p>&nbsp;</p>
<p><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px; color: rgba(0, 255, 0, 1)"><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px">Build the danmaku API URLs</span></span></p>
<div class="cnblogs_code">
<pre>if __name__ == '__main__':
    data = get_cid()
    json_data = json.loads(data)
    base_api = 'http://api.bilibili.com/x/v1/dm/list.so?oid='
    for cid_datas in json_data['data']:
        cid = cid_datas.get('cid')
        # the cid is used as the oid query parameter of the danmaku API
        detail_api = base_api + str(cid)
        print(detail_api)</pre>
</div>
<p>&nbsp;</p>
<p><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px">Console output:</span></p>
<p><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px"><img src="https://img2020.cnblogs.com/blog/2225056/202111/2225056-20211111112943680-2104996142.png"></span></p>
<p>&nbsp;</p>
<p><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px"><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px">That gives 7 URLs, one for the danmaku data of each of the 7 EDG match videos. Let's open the first one:</span></span></p>
<p><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px"><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px"><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px"><img src="https://img2020.cnblogs.com/blog/2225056/202111/2225056-20211111113012931-389787030.png"></span></span></span></p>
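<p>In this XML response each comment sits in its own &lt;d&gt; element, with the comment text between the opening and closing tags. Below is a minimal sketch of regex extraction, run on a made-up two-comment sample (the <code>p</code> attribute values are placeholders, not real metadata). The helper name <code>extract_danmaku</code> is hypothetical, and <code>html.unescape</code> turns entities such as <code>&amp;amp;</code> back into characters. For anything beyond a quick script, a real XML parser such as <code>xml.etree.ElementTree</code> is the more robust choice.</p>

```python
import html
import re

# Matches <d ...>text</d>; group 1 is the comment text. re.S lets "." span newlines.
D_TAG = re.compile(r"<d\b[^>]*>(.*?)</d>", re.S)

# Hypothetical sample shaped like the list.so response (placeholder attributes).
SAMPLE_XML = (
    '<?xml version="1.0" encoding="UTF-8"?><i>'
    '<d p="placeholder">EDG冠军</d>'
    '<d p="placeholder">加油 &amp; 恭喜</d>'
    "</i>"
)


def extract_danmaku(xml_text):
    """Return the decoded text of every <d> element in a danmaku XML string."""
    return [html.unescape(text) for text in D_TAG.findall(xml_text)]


if __name__ == "__main__":
    print(extract_danmaku(SAMPLE_XML))  # ['EDG冠军', '加油 & 恭喜']
```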
<p>&nbsp;</p>
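<p><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px; color: rgba(0, 255, 0, 1)">Scraping the danmaku data</span></p>
<p><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px">As the previous screenshot shows, each danmaku comment sits inside its own &lt;d&gt; tag. Which parsing tool suits this format? For text wrapped in one simple, repeated tag, a regular expression is the lightest option. Next we fetch all 223,000 comments from the 7 match videos with the following code:</span></p>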
<div class="cnblogs_code">
<pre>import re

# json_data and get_api_data() come from the earlier steps of this article
base_api = 'http://api.bilibili.com/x/v1/dm/list.so?oid='
all_api = []
for cid_datas in json_data['data']:
    cid = cid_datas.get('cid')
    detail_api = base_api + str(cid)
    all_api.append(detail_api)

for api in all_api:
    edg_datas = get_api_data(api)   # fetch the danmaku XML of this video
    # Capture the text inside every &lt;d ...&gt;...&lt;/d&gt; element
    edg_datas = re.findall('&lt;d.*?&gt;(.*?)&lt;/d&gt;', edg_datas, re.S)
    with open('EDG.txt', 'a', encoding='utf-8') as f:
        for edg_data in edg_datas:
            print(edg_data)
            f.write(edg_data + '\n')</pre>
</div>
<p>&nbsp;</p>
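<p><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px">To avoid garbled characters (mojibake), set the response encoding detected by chardet right after the request is made:</span></p>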
<div class="cnblogs_code">
<pre>import chardet  # third-party: pip install chardet
response.encoding = chardet.detect(response.content)['encoding']</pre>
</div>
<p>&nbsp;</p>
<p><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px">Console output:</span></p>
<p><img src="https://img2020.cnblogs.com/blog/2225056/202111/2225056-20211111113328374-487905899.png"></p>
<p>&nbsp;</p>
<p><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px">With 223,000 danmaku in total, only part of EDG.txt is shown below:</span></p>
<p><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px"><img src="https://img2020.cnblogs.com/blog/2225056/202111/2225056-20211111113400197-1857964932.png"></span></p>
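<p>The decoding and regex steps above can be combined into one stdlib-only function. This is a sketch under stated assumptions, not the article's code: the function name parse_danmaku and the sample XML are hypothetical, it reads the encoding from the XML declaration instead of guessing it with chardet (falling back to UTF-8), the \b in the pattern stops it from matching other tags that merely start with "d", and html.unescape turns escaped characters such as &amp;amp; back into plain text.</p>

```python
import html
import re

# Encoding named in an XML declaration such as <?xml version="1.0" encoding="UTF-8"?>
DECL_RE = re.compile(rb'<\?xml[^>]*encoding="([A-Za-z0-9_.-]+)"')
# Text inside each <d ...>...</d>; \b keeps tags like <data> from matching
D_TAG_RE = re.compile(r'<d\b[^>]*>(.*?)</d>', re.S)


def parse_danmaku(raw):
    """Decode raw XML bytes and return the comment texts in document order."""
    match = DECL_RE.search(raw)
    encoding = match.group(1).decode('ascii') if match else 'utf-8'
    text = raw.decode(encoding, errors='replace')
    # Comment text is XML-escaped (&amp;, &lt;, ...), so unescape it
    return [html.unescape(c) for c in D_TAG_RE.findall(text)]


# Hypothetical sample payload (attribute values are placeholders)
sample = ('<?xml version="1.0" encoding="UTF-8"?><i>'
          '<d p="x">EDG冠军!</d><d p="y">A &amp; B</d></i>').encode('utf-8')
print(parse_danmaku(sample))  # ['EDG冠军!', 'A & B']
```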
<p>&nbsp;</p>
<h3>4.2 Data Visualization (Word Cloud)</h3>
<p><span style="color: rgba(0, 255, 0, 1); font-family: &quot;Microsoft YaHei&quot;; font-size: 15px">Building the word cloud</span></p>
<p><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px">With the danmaku data scraped, the next step is a word cloud shaped by an EDG background image:</span></p>
<p><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px"><img src="https://img2020.cnblogs.com/blog/2225056/202111/2225056-20211111113430001-22965755.png"></span></p>
<p>&nbsp;</p>
<p><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px">The code is as follows:</span></p>
<div class="cnblogs_code">
<pre>import jieba
from wordcloud import WordCloud
import matplotlib.pyplot as plt
from PIL import Image
import numpy as np

def do_wordcloud():
    with open('EDG.txt', 'r', encoding='utf-8') as f:
        text = f.read()
    # Remove line breaks and full-width spaces before segmentation
    text = text.replace('\n', '').replace('\u3000', '')
    text_cut = jieba.lcut(text)      # segment the Chinese text into words
    text_cut = ' '.join(text_cut)    # space-separate the words so WordCloud can tokenize them

    # Filter out words unrelated to the topic (punctuation, pronouns, particles)
    stop_words = ['“', ',', ' ', '我', '的', '是', '了', ':', '?', '!', '啊', '你', '吗', '。', '我们']

    background = Image.open("EDG.jpg")
    graph = np.array(background)

    word_cloud = WordCloud(font_path='simsun.ttc',   # a font with Chinese glyphs (SimSun)
                           background_color='white',
                           mask=graph,               # shape of the word cloud
                           stopwords=stop_words)

    word_cloud.generate(text_cut)
    plt.subplots(figsize=(12, 8))
    plt.imshow(word_cloud)
    plt.axis('off')
    plt.show()
    word_cloud.to_file('edg.png')
    return text_cut   # reused for keyword extraction in 5.5

text_cut = do_wordcloud()</pre>
</div>
<p>&nbsp;</p>
<p><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px">Output:</span></p>
<p><img src="https://img2020.cnblogs.com/blog/2225056/202111/2225056-20211111113546444-218410341.png"></p>
<p>&nbsp;</p>
<p><span style="font-size: 15px; font-family: &quot;Microsoft YaHei&quot;">Let's make one with an <span style="color: rgba(51, 102, 255, 1)">Ultraman Tiga</span> background image too, hahaha!</span></p>
<p><span style="font-size: 15px; font-family: &quot;Microsoft YaHei&quot;"><img src="https://img2020.cnblogs.com/blog/2225056/202111/2225056-20211111113623230-323421479.png"></span></p>
<p>&nbsp;</p>
<p><span style="font-size: 15px; font-family: &quot;Microsoft YaHei&quot;">Here is the word cloud in the shape of <span style="color: rgba(51, 102, 255, 1)">Ultraman Tiga</span>:</span></p>
<p><span style="font-size: 15px; font-family: &quot;Microsoft YaHei&quot;"><img src="https://img2020.cnblogs.com/blog/2225056/202111/2225056-20211111113711956-255118540.png"></span></p>
<p>&nbsp;</p>
<p><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px">You could of course build it with <span style="color: rgba(51, 102, 255, 1)">pyecharts/echarts</span> instead, and shape it as any image you like. If you have worked with <span style="color: rgba(51, 102, 255, 1)">sentiment analysis</span>, these danmaku are good material for that too.</span></p>
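<p>A word cloud sizes each word by how often it occurs. The counting step can be sketched with the standard library, assuming space-separated jieba output like text_cut above; the sample text, the stop-word set and the choice to drop single-character tokens are illustrative assumptions, not WordCloud's exact tokenizer. A dict like the one returned here can also be passed to WordCloud's generate_from_frequencies() instead of raw text.</p>

```python
import re
from collections import Counter


def word_frequencies(text, stopwords, min_len=2):
    """Count space-separated words, skipping stop words and tokens shorter than min_len."""
    # \w-based tokens also drop punctuation-only pieces such as '!!'
    tokens = re.findall(r"\w[\w']*", text)
    return Counter(t for t in tokens if len(t) >= min_len and t not in stopwords)


# Hypothetical segmented text, space-joined like text_cut in do_wordcloud()
text_cut = 'EDG 冠军 我们 是 冠军 的 翻译 小姐姐 冠军 EDG 啊'
stop_words = {'我', '的', '是', '了', '啊', '你', '吗', '我们'}

freqs = word_frequencies(text_cut, stop_words)
print(freqs.most_common(3))  # [('冠军', 3), ('EDG', 2), ('翻译', 1)]
```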
<p>&nbsp;</p>
<hr>
<p>&nbsp;</p>
<h2>5. Natural Language Processing (NLP)</h2>
<h3>5.1 Data Import</h3>
<div class="cnblogs_code">
<pre>import pandas as pd

data = pd.read_csv('EDG.csv')
print(data.head())   # preview the first 5 rows</pre>
</div>
<p>&nbsp;</p>
<p>Console output:</p>
<p><img src="https://img2020.cnblogs.com/blog/2225056/202111/2225056-20211112154408039-810407098.png"></p>
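<p>The code above expects an EDG.csv with id and content columns, but the article does not show how that file was produced from EDG.txt. One possible way, as a minimal sketch assuming one comment per line in EDG.txt and sequential ids starting at 1 (the helper names txt_to_rows and write_csv are hypothetical):</p>

```python
import csv
import io


def txt_to_rows(lines):
    """Turn one-comment-per-line text into (id, content) rows, skipping blank lines."""
    rows = []
    for line in lines:
        content = line.strip()
        if content:
            rows.append((len(rows) + 1, content))
    return rows


def write_csv(rows, f):
    """Write an 'id,content' header plus rows to any text file object."""
    writer = csv.writer(f)
    writer.writerow(['id', 'content'])
    writer.writerows(rows)


# With the article's files:
# with open('EDG.txt', encoding='utf-8') as src, \
#         open('EDG.csv', 'w', encoding='utf-8', newline='') as dst:
#     write_csv(txt_to_rows(src), dst)

buf = io.StringIO()
write_csv(txt_to_rows(['EDG冠军\n', '\n', '翻译小姐姐, 加油\n']), buf)
print(buf.getvalue())
```

<p>Comments containing commas are quoted automatically by the csv module, so pd.read_csv can read them back without splitting the text.</p>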
<p id="1636703048347">&nbsp;</p>
<h3>5.2 Data Preprocessing</h3>
<div class="cnblogs_code">
<pre>data = pd.read_csv('EDG.csv')
data = data[['id', 'content']]   # keep only the comment id and text
print(data.head(10))             # preview 10 rows; data itself keeps every row</pre>
</div>
<p>&nbsp;</p>
<p>Console output:</p>
<p><img src="https://img2020.cnblogs.com/blog/2225056/202111/2225056-20211112155705549-333639840.png"></p>
<p id="1636703826223">&nbsp;</p>
<h3>5.3 Sentiment Analysis</h3>
<p><span style="font-size: 15px">First, install the Python library used for sentiment analysis:</span></p>
<div class="cnblogs_code">
<pre>pip install snownlp -i https://pypi.doubanio.com/simple</pre>
</div>
<p>&nbsp;</p>
<p><span style="font-size: 15px">Result:</span></p>
<p><img src="https://img2020.cnblogs.com/blog/2225056/202111/2225056-20211112161409482-140487807.png"></p>
<p id="1636704849466">&nbsp;</p>
<p><span style="font-size: 15px">Sentiment scoring:</span></p>
<div class="cnblogs_code">
<pre>import pandas as pd
from snownlp import SnowNLP

# data1: id/content columns of the full dataset (not the head(10) preview), empty rows dropped
data1 = pd.read_csv('EDG.csv')[['id', 'content']].dropna().copy()
# SnowNLP(text).sentiments is the probability (0 to 1) that the text is positive
data1['emotion'] = data1['content'].astype(str).apply(lambda x: SnowNLP(x).sentiments)
print(data1.head())</pre>
</div>
<p>&nbsp;</p>
<p>Console output:</p>
<p><img src="https://img2020.cnblogs.com/blog/2225056/202111/2225056-20211112161552379-1610844797.png"></p>
<p id="1636704952164">&nbsp;</p>
<p><span style="font-size: 15px">Descriptive statistics of the sentiment scores:</span></p>
<div class="cnblogs_code">
<pre>print(data1.describe())   # print the summary without overwriting data1</pre>
</div>
<p>&nbsp;</p>
<p>Console output:</p>
<p><img src="https://img2020.cnblogs.com/blog/2225056/202111/2225056-20211112164035607-977196132.png"></p>
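<p><span style="color: rgba(51, 102, 255, 1)"><strong>Notes on the data</strong></span>: <span style="font-size: 15px; font-family: &quot;Microsoft YaHei&quot;">the mean emotion score is 0.63, the median 0.67 and the 25th percentile 0.49. Because the mean sits below the median, the distribution is skewed toward low scores: a minority of comments (under a quarter) with low scores pulls the overall mean down noticeably. The bottom of the screenshot also shows that the sentiment analysis took 48.8 s, a sign of how large the dataset is.</span></p>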
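<p>Why a mean below the median points to a minority of low scores can be checked on a toy example. The scores below are made up for illustration and are not the article's data; statistics.quantiles with method='inclusive' uses the same linear interpolation as pandas' describe().</p>

```python
import statistics

# Made-up sentiment scores for illustration only (not the article's data):
# most comments score high, two score near 0
scores = [0.9, 0.85, 0.8, 0.95, 0.7, 0.75, 0.88, 0.92, 0.05, 0.1]

mean = statistics.mean(scores)
median = statistics.median(scores)
# 'inclusive' matches the linear interpolation used by pandas describe()
q1, q2, q3 = statistics.quantiles(scores, n=4, method='inclusive')
print(f'mean={mean:.3f} median={median:.3f} q1={q1:.4f}')

# Drop the lowest 20% of scores and see how far the mean rises
trimmed = sorted(scores)[len(scores) // 5:]
print(f'mean without the lowest 20%: {statistics.mean(trimmed):.3f}')
```

<p>Two low scores out of ten are enough to drag the mean well below the median, while the median barely notices them.</p>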
<p>&nbsp;</p>
<h3>5.4 Sentiment Histogram</h3>
<div class="cnblogs_code">
<pre>import numpy as np
import matplotlib.pyplot as plt

plt.rcParams['font.sans-serif'] = ['SimHei']   # a font with Chinese glyphs for the labels
plt.rcParams['axes.unicode_minus'] = False     # render the minus sign correctly with SimHei

bins = np.arange(0, 1.1, 0.1)    # bin edges 0.0, 0.1, ..., 1.0 (ten bins)
plt.hist(data1['emotion'], bins, color='#4F94CD', alpha=0.9)
plt.xlim(0, 1)
plt.xlabel('情感分析')          # "sentiment score"
plt.ylabel('数量')              # "count"
plt.title('情感分析直方图')     # "sentiment histogram"
plt.show()</pre>
</div>
<p>&nbsp;</p>
<p><span style="font-size: 15px">Console output:</span></p>
<p><img src="https://img2020.cnblogs.com/blog/2225056/202111/2225056-20211112171831788-970059785.png"></p>
<p><span style="font-size: 15px">Notes on the data:</span></p>
<ul>
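<li><span style="font-size: 15px">Overall, the bars grow taller toward the high end of the score range, so most comments lean positive: fans were excited and thrilled about EDG's title;</span></li>
<li><span style="font-size: 15px">About 4,500 comments have a sentiment score in the interval …, where fans are at their most ecstatic, probably around the moment EDG won, hahaha!</span></li>
<li><span style="font-size: 15px">Going from interval … to …, and from … to …, the number of comments drops, possibly because of problems during the match or because the match had ended</span></li>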
</ul>
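<p>Each bar counts the comments whose score falls into one 0.1-wide interval. A stdlib sketch of that counting rule (bins are half-open except the last, which also includes 1.0, the convention matplotlib's hist inherits from numpy.histogram); the helper name bin_counts and the scores are hypothetical:</p>

```python
def bin_counts(scores, n_bins=10, lo=0.0, hi=1.0):
    """Count scores per equal-width bin: [lo, lo+w), ..., with the last bin closed at hi."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for s in scores:
        if s < lo or s > hi:
            continue  # out-of-range values are not counted
        idx = min(int((s - lo) / width), n_bins - 1)  # s == hi goes to the last bin
        counts[idx] += 1
    return counts


# Made-up scores for illustration
scores = [0.03, 0.15, 0.45, 0.55, 0.97, 1.0, 0.99]
print(bin_counts(scores))  # [1, 1, 0, 0, 1, 1, 0, 0, 0, 3]
```

<p>Note that the bins group comments by score, not by time, so the histogram describes how the scores are distributed rather than how the mood changed during the match.</p>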
<p>&nbsp;</p>
<h3>5.5 Keyword Extraction</h3>
<div class="cnblogs_code">
<pre>from jieba import analyse

# text_cut: the space-joined segmented text (a str) built in do_wordcloud() in 4.2
# extract_tags ranks words by TF-IDF weight; allowPOS=() means no part-of-speech filter
key_words = analyse.extract_tags(sentence=text_cut, topK=10, withWeight=True, allowPOS=())
print(key_words)</pre>
</div>
<p>&nbsp;</p>
<p>Console output:</p>
<p><img src="https://img2020.cnblogs.com/blog/2225056/202111/2225056-20211112180601884-1100683097.png"></p>
<p><span style="font-size: 15px">Notes on the data:</span></p>
<ul>
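<li><span style="font-size: 15px">The keywords show that "冠军" (champion) carries the highest weight in the fans' danmaku, followed by "翻译" (interpreter), "我们" (we), "卧槽" (roughly "holy crap"), "小姐姐" (young lady), "EDG", "泪目" (moved to tears), "圣枪哥", "贺电" (congratulatory telegram) and "edg". EDG is clearly very popular, and so is the female interpreter ("翻译小姐姐"), which the word cloud above also reflects.</span></li>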
</ul>
<p>&nbsp;</p>
<p><span style="font-size: 15px">Parameters:</span></p>
<ul>
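<li><span style="font-size: 15px">sentence: the text to extract keywords from; it must be a str, not a list</span></li>
<li><span style="font-size: 15px">topK: how many top keywords to return</span></li>
<li><span style="font-size: 15px">withWeight: whether to return each keyword together with its weight</span></li>
<li><span style="font-size: 15px">allowPOS: the parts of speech to keep; for extract_tags the default () applies no filter (the default of place names (ns), nouns (n), verbal nouns (vn) and verbs (v) belongs to analyse.textrank)</span></li>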
</ul>
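<p>Because extract_tags ranks words by TF-IDF (a word's share of the text times its inverse document frequency from jieba's bundled IDF table, with the median IDF used for unknown words), a frequent but common word can rank below a rarer, more distinctive one. A simplified stdlib sketch of that scoring with a made-up IDF table; the function name tfidf_top_k and all values are hypothetical, and jieba's real implementation differs in details:</p>

```python
from collections import Counter


def tfidf_top_k(tokens, idf, default_idf, top_k=3, stopwords=frozenset()):
    """Rank words by (count / total) * IDF, the idea behind jieba's extract_tags.

    Simplified sketch: idf maps word -> IDF weight; unseen words get default_idf.
    Like jieba, it skips tokens shorter than 2 characters and stop words.
    """
    words = [t for t in tokens if len(t.strip()) >= 2 and t.lower() not in stopwords]
    tf = Counter(words)
    total = len(words)
    scores = {w: n / total * idf.get(w, default_idf) for w, n in tf.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Hypothetical IDF table: common words get a low IDF, rarer ones a higher IDF
idf = {'冠军': 6.0, '我们': 2.0, '翻译': 7.0}
tokens = ['冠军', '冠军', '冠军', '我们', '我们', '我们', '翻译', '的', 'EDG']
print(tfidf_top_k(tokens, idf, default_idf=5.0))  # [('冠军', 2.25), ('翻译', 0.875), ('我们', 0.75)]
```

<p>Here "我们" appears as often as "冠军" but ranks last of the three, because its low IDF marks it as a common word.</p>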
<p>&nbsp;</p>
<h3>5.6 Positive vs. Negative Danmaku</h3>
<p><span style="font-size: 15px">Count the positive and negative danmaku:</span></p>
<div class="cnblogs_code">
<pre># Scores of 0.5 or above count as positive, below 0.5 as negative
pos, neg = 0, 0
for i in data1['emotion']:
    if i &gt;= 0.5:
        pos += 1
    else:
        neg += 1
print(f'Positive danmaku: {pos}\nNegative danmaku: {neg}')</pre>
</div>
<p>&nbsp;</p>
<p><span style="font-size: 15px">Console output:</span></p>
<p><img src="https://img2020.cnblogs.com/blog/2225056/202111/2225056-20211112182727112-876717277.png"></p>
<p><span style="font-size: 15px; font-family: &quot;Microsoft YaHei&quot;">Positive danmaku: 17941</span><br><span style="font-size: 15px; font-family: &quot;Microsoft YaHei&quot;">Negative danmaku: 6054</span></p>
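<p>The same count can be written without explicit counters. A small sketch (the helper name count_polarity is hypothetical); with pandas the one-liner (data1['emotion'] &gt;= 0.5).sum() gives the positive count directly.</p>

```python
def count_polarity(scores, threshold=0.5):
    """Return (positive, negative) counts for a sequence of scores (list or pandas Series).

    A score equal to the threshold counts as positive, as in the loop above.
    """
    pos = sum(1 for s in scores if s >= threshold)
    neg = sum(1 for s in scores if s < threshold)
    return pos, neg


print(count_polarity([0.9, 0.5, 0.49, 0.1]))  # (2, 2)
```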
<p>&nbsp;</p>
<h3>5.7 Pie Chart Analysis</h3>
<div class="cnblogs_code">
<pre>import matplotlib.pyplot as plt

plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False

pie_labels = 'positive', 'negative'
# pos and neg are the counts from 5.6; autopct prints each share with two decimals
plt.pie([pos, neg], labels=pie_labels, autopct='%1.2f%%', shadow=True)

plt.show()</pre>
</div>
<p>&nbsp;</p>
<p><span style="font-size: 15px; font-family: &quot;Microsoft YaHei&quot;">Console output:</span></p>
<p><img src="https://img2020.cnblogs.com/blog/2225056/202111/2225056-20211112184157520-2934732.png"></p>
<p id="1636713717407"><span style="font-size: 15px">As the chart shows, 74.77% of the danmaku are positive and 25.23% are negative; overall, positive comments clearly outnumber negative ones.</span></p>
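<p>plt.pie divides each value by the sum of all values, and autopct='%1.2f%%' formats each share with two decimals. Recomputing the two percentages from the counts in 5.6 reproduces the chart labels:</p>

```python
pos, neg = 17941, 6054  # counts reported in 5.6
total = pos + neg

# Same formatting as autopct='%1.2f%%': each value's share of the total, two decimals
shares = {label: '%1.2f%%' % (100 * n / total)
          for label, n in (('positive', pos), ('negative', neg))}
print(total, shares)  # 23995 {'positive': '74.77%', 'negative': '25.23%'}
```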
<p>&nbsp;</p>
<h3>5.8 Negative Danmaku Analysis</h3>
<p><span style="font-size: 15px">Pull out some of the negative danmaku:</span></p>
<div class="cnblogs_code">
<pre># Keep only rows whose sentiment score is below 0.5
data2 = data1[data1['emotion'] &lt; 0.5]
print(data2.head())</pre>
</div>
<p>&nbsp;</p>
<p><span style="font-size: 15px">Console output:</span></p>
<p><img src="https://img2020.cnblogs.com/blog/2225056/202111/2225056-20211112190826957-1907847734.png"></p>
<p id="1636715307257">&nbsp;</p>
<p><span style="font-size: 15px; font-family: &quot;Microsoft YaHei&quot;">Notes on the data:</span></p>
<ul>
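<li><span style="font-size: 15px; font-family: &quot;Microsoft YaHei&quot;">Negative comments in the screenshot such as "回血" (regaining health) and "求生欲" (will to survive) may stem from poor play by EDG or the Korean team, or simply from SnowNLP misreading gaming slang, since its bundled sentiment model was trained mainly on shopping reviews.</span></li>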
</ul>
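<p>To look at the most negative comments first rather than an arbitrary five, sort by score. A stdlib sketch with placeholder comments and made-up scores (the helper name most_negative is hypothetical); in pandas the equivalent is data1[data1['emotion'] &lt; 0.5].sort_values('emotion').head().</p>

```python
def most_negative(rows, threshold=0.5, k=3):
    """Return up to k (content, score) pairs scoring below threshold, lowest first."""
    negatives = [(text, score) for text, score in rows if score < threshold]
    return sorted(negatives, key=lambda pair: pair[1])[:k]


# Placeholder comments with made-up scores (not the article's data)
rows = [('comment a', 0.98), ('comment b', 0.21), ('comment c', 0.35),
        ('comment d', 0.62), ('comment e', 0.08)]
print(most_negative(rows))  # [('comment e', 0.08), ('comment b', 0.21), ('comment c', 0.35)]
```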
<p>&nbsp;</p>
<hr>
<p>&nbsp;</p>
<h2>6. <strong>Summary</strong></h2>
<p><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px">This project used the following libraries:</span></p>
<ul>
<li><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px">PIL</span></li>
<li><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px">jieba</span></li>
<li><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px">numpy</span></li>
<li><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px">pandas</span></li>
<li><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px">requests</span></li>
<li><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px">wordcloud</span></li>
<li><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px">matplotlib</span></li>
<li><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px">json, re, chardet</span></li>
<li><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px">snownlp (sentiment analysis)</span></li>
</ul>
<p>&nbsp;</p>
<hr>
<p>&nbsp;</p>
<h2>7. Full Project and Source Code Download</h2>
<p><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px">Get the full project (including source code) here: Download<br></span></p>
<p><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px">You can also add my WeChat ID to get the full project: <span style="color: rgba(51, 102, 255, 1)">MakerChen66</span></span></p>
<p><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px">Original post on my WeChat official account: Read the original</span></p>
<p><span style="font-family: &quot;Microsoft YaHei&quot;; font-size: 15px">My CSDN blog: Read the original</span></p>
<p>&nbsp;</p>
<blockquote>
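<p><span style="font-size: 16px">Original work takes real effort. If you found this fun, please give it a like. Many thanks, everyone!</span></p>
<p><span style="font-size: 16px">I recently found that many people on CSDN have been plagiarizing my blog posts, and their copies even get more attention than mine. Sigh! My account was only created in October, after all, so it has less visibility and fewer followers than others.</span></p>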
</blockquote>
<p>&nbsp;</p>
<hr>
<p>&nbsp;</p>
<h2>8. About the Author</h2>
<blockquote>
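<p><span style="font-size: 16px">Author: 南柯树下. Goal: make programming more fun!</span><br><br><span style="font-size: 16px">Original WeChat official account: 『<strong>小鸿星空科技</strong>』, focusing on algorithms, web scraping, website and game development, data analysis, natural language processing, AI and more. Follow along and let's grow and code together!</span><br><br><span style="font-size: 16px">Copyright notice: plagiarizing or reposting this article is prohibited; infringement will be pursued.</span></p>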
</blockquote>
<p>&nbsp;</p>
<hr>
<p>&nbsp;<span style="font-size: 16px"><strong>For more <span style="color: rgba(255, 0, 0, 1)">exclusive</span> content,&nbsp;</strong>&nbsp;<strong><span style="color: rgba(255, 0, 0, 1)">scan the QR code</span> to follow my WeChat official account. <strong>Let's</strong><strong> grow together, code together, and make programming more fun!</strong></strong></span></p>
<hr>
<p>&nbsp;</p>
<p>——&nbsp; ——&nbsp; ——&nbsp; ——&nbsp; —&nbsp; END&nbsp; ——&nbsp; ——&nbsp; ——&nbsp; ——&nbsp; ————&nbsp;</p>
<p>       &nbsp;&nbsp;Scan the QR code to follow my WeChat official account</p>
<p>         <strong> 小鸿星空科技</strong></p>
<p>  &nbsp; &nbsp; &nbsp;<img src="https://img2020.cnblogs.com/blog/2225056/202011/2225056-20201125000206527-414454055.png"></p>
<p>&nbsp;</p><br><br>
Source: https://www.cnblogs.com/makerchen/p/15539183.html