无限流 發表於 2020-5-11 11:10:00

基于情感词典的python情感分析

<p>近期老师给我们安排了一个大作业,要求根据情感词典对微博语料进行情感分析。于是在网上狂找资料,看相关书籍,终于搞出了这个任务。现在做做笔记,总结一下本次的任务,同时也给遇到有同样需求的人,提供一点帮助。</p>
<h1>1、情感分析含义</h1>
<p>情感分析指的是对新闻报道、商品评论、电影影评等文本信息进行观点提取、主题分析、情感挖掘。情感分析常用于对某一篇新闻报道积极消极分析、淘宝商品评论情感打分、股评情感分析、电影评论情感挖掘。情感分析的内容包括:情感的持有者分析、态度持有者分析、态度类型分析(一系列类型如喜欢(like),讨厌(hate),珍视(value),渴望(desire)等;或着简单的加权极性如积极(positive),消极(negative)和中性(neutral)并可用具体的权重修饰)、态度的范围分析(包含每句话,某一段、或者全文)。因此,情感分析的目的可以分为:<em><strong>初级</strong></em>:文章的整体感情是积极/消极的;<em><strong>进阶</strong></em>:对文章的态度从1-5打分;<em><strong>高级</strong></em>:检测态度的目标,持有者和类型。</p>
<p>总的来说,<strong>情感分析就是对文本信息进行情感倾向挖掘</strong>。</p>
<h1>2、情感挖掘方法</h1>
<p>情感挖掘目前主要使用的方法是使用情感词典,对文本进行情感词匹配,汇总情感词进行评分,最后得到文本的情感倾向。本次我主要使用了两种方法进行情感分析。第一种:基于BosonNLP情感词典。该情感词典是由波森自然语言处理公司推出的一款已经做好标注的情感词典。词典中对每个情感词进行情感值评分,bosanNLP情感词典如下图所示:</p>
<p><img src="https://img2020.cnblogs.com/blog/1951785/202005/1951785-20200511092647302-1358816982.png"></p>
<p>&nbsp;</p>
<p>第二种,采用的是知网推出的情感词典,以及极性表进行情感分析。知网提供的情感词典共用12个文件,分为英文和中文。其中中文情感词典包括:评价、情感、主张、程度(正面、负面)的情感文本。本文将评价和情感词整合作为情感词典使用,程度词表中含有的程度词,按照等级区分,分为:most(最高)-very(很、非常)-more(更多、更)-ish(稍、一点点)-insufficiently(欠、不)-over(过多、多分、多)六个情感程度词典。</p>
<p>&nbsp;知网情感词典下载地址:-&nbsp;http://www.keenage.com/html/c_bulletin_2007.htm</p>
<p><img src="https://img2020.cnblogs.com/blog/1951785/202005/1951785-20200511094444328-1419039549.png"></p>
<p>&nbsp;<img src="https://img2020.cnblogs.com/blog/1951785/202005/1951785-20200511094618965-537399517.png"></p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;<img src="https://img2020.cnblogs.com/blog/1951785/202005/1951785-20200511094712865-318393687.png"></p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<h1>&nbsp;3、原理介绍</h1>
<h2>3.1 基于BosonNLP情感分析原理</h2>
<p>基于BosonNLP情感词典的情感分析较为简单。首先,需要对文本进行分句、分词,本文选择的分词工具为哈工大的pyltp。其次,将分词好的列表数据对应BosonNLp词典进行逐个匹配,并记录匹配到的情感词分值。最后,统计计算分值总和,如果分值大于0,表示情感倾向为积极的;如果小于0,则表示情感倾向为消极的。原理框图如下:</p>
<p><img src="https://img2020.cnblogs.com/blog/1951785/202005/1951785-20200511095305941-1868946211.png"></p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<h2>3.2 基于BosonNLP情感分析代码:</h2>
<div class="cnblogs_code">
<pre><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> -*- coding:utf-8 -*-</span>
<span style="color: rgba(0, 0, 255, 1)">import</span><span style="color: rgba(0, 0, 0, 1)"> pandas as pd
</span><span style="color: rgba(0, 0, 255, 1)">import</span><span style="color: rgba(0, 0, 0, 1)"> jieba

</span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">基于波森情感词典计算情感值</span>
<span style="color: rgba(0, 0, 255, 1)">def</span><span style="color: rgba(0, 0, 0, 1)"> getscore(text):
    df </span>= pd.read_table(r<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">BosonNLP_dict\BosonNLP_sentiment_score.txt</span><span style="color: rgba(128, 0, 0, 1)">"</span>, sep=<span style="color: rgba(128, 0, 0, 1)">"</span> <span style="color: rgba(128, 0, 0, 1)">"</span>, names=[<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">key</span><span style="color: rgba(128, 0, 0, 1)">'</span>, <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">score</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">])
    key </span>= df[<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">key</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">].values.tolist()
    score </span>= df[<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">score</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">].values.tolist()
    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> jieba分词</span>
    segs = jieba.lcut(text,cut_all = False) <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">返回list</span>
    <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 计算得分</span>
    score_list = <span style="color: rgba(0, 0, 255, 1)">for</span> x <span style="color: rgba(0, 0, 255, 1)">in</span> segs <span style="color: rgba(0, 0, 255, 1)">if</span>(x <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> key)]
    </span><span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> sum(score_list)

</span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">读取文件</span>
<span style="color: rgba(0, 0, 255, 1)">def</span><span style="color: rgba(0, 0, 0, 1)"> read_txt(filename):
    with open(filename,</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">r</span><span style="color: rgba(128, 0, 0, 1)">'</span>,encoding=<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">utf-8</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)as f:
      txt </span>=<span style="color: rgba(0, 0, 0, 1)"> f.read()
    </span><span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> txt
</span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">写入文件</span>
<span style="color: rgba(0, 0, 255, 1)">def</span><span style="color: rgba(0, 0, 0, 1)"> write_data(filename,data):
    with open(filename,</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">a</span><span style="color: rgba(128, 0, 0, 1)">'</span>,encoding=<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">utf-8</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)as f:
      f.write(data)


</span><span style="color: rgba(0, 0, 255, 1)">if</span> <span style="color: rgba(128, 0, 128, 1)">__name__</span>==<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">__main__</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">:
    text </span>= read_txt(<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">test_data\微博.txt</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)
    lists</span>= text.split(<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">\n</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)

    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> al_senti = ['无','积极','消极','消极','中性','消极','积极','消极','积极','积极','积极',</span>
    <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">             '无','积极','积极','中性','积极','消极','积极','消极','积极','消极','积极',</span>
    <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">             '无','中性','消极','中性','消极','积极','消极','消极','消极','消极','积极'</span>
    <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">             ]</span>
    al_senti = read_txt(r<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">test_data\人工情感标注.txt</span><span style="color: rgba(128, 0, 0, 1)">'</span>).split(<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">\n</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)
    i </span>=<span style="color: rgba(0, 0, 0, 1)"> 0
    </span><span style="color: rgba(0, 0, 255, 1)">for</span> list <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> lists:
      </span><span style="color: rgba(0, 0, 255, 1)">if</span> list!= <span style="color: rgba(128, 0, 0, 1)">''</span><span style="color: rgba(0, 0, 0, 1)">:
            </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> print(list)</span>
            sentiments = round(getscore(list),2<span style="color: rgba(0, 0, 0, 1)">)
            </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">情感值为正数,表示积极;为负数表示消极</span>
            <span style="color: rgba(0, 0, 255, 1)">print</span><span style="color: rgba(0, 0, 0, 1)">(list)
            </span><span style="color: rgba(0, 0, 255, 1)">print</span>(<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">情感值:</span><span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(0, 0, 0, 1)">,sentiments)
            </span><span style="color: rgba(0, 0, 255, 1)">print</span>(<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">人工标注情感倾向:</span><span style="color: rgba(128, 0, 0, 1)">'</span>+<span style="color: rgba(0, 0, 0, 1)">al_senti)
            </span><span style="color: rgba(0, 0, 255, 1)">if</span> sentiments &gt;<span style="color: rgba(0, 0, 0, 1)"> 0:
                </span><span style="color: rgba(0, 0, 255, 1)">print</span>(<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">机器标注情感倾向:积极\n</span><span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(0, 0, 0, 1)">)
                s </span>= <span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">机器判断情感倾向:积极\n</span><span style="color: rgba(128, 0, 0, 1)">"</span>
            <span style="color: rgba(0, 0, 255, 1)">else</span><span style="color: rgba(0, 0, 0, 1)">:
                </span><span style="color: rgba(0, 0, 255, 1)">print</span>(<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">机器标注情感倾向:消极\n</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)
                s </span>= <span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">机器判断情感倾向:消极</span><span style="color: rgba(128, 0, 0, 1)">"</span>+<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">\n</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">
            sentiment </span>= <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">情感值:</span><span style="color: rgba(128, 0, 0, 1)">'</span>+str(sentiments)+<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">\n</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">
            al_sentiment</span>= <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">人工标注情感倾向:</span><span style="color: rgba(128, 0, 0, 1)">'</span>+al_senti+<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">\n</span><span style="color: rgba(128, 0, 0, 1)">'</span>
            <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">文件写入</span>
            filename = <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">result_data\BosonNLP情感分析结果.txt</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">
            write_data(filename,</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">情感分析文本:</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)
            write_data(filename,list</span>+<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">\n</span><span style="color: rgba(128, 0, 0, 1)">'</span>) <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">写入待处理文本</span>
            write_data(filename,sentiment) <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">写入情感值</span>
            write_data(filename,al_sentiment) <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">写入机器判断情感倾向</span>
            write_data(filename,s+<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">\n</span><span style="color: rgba(128, 0, 0, 1)">'</span>) <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">写入人工标注情感</span>
            i = i+1</pre>
</div>
<p>相关文件:</p>
<p>BosonNLp情感词典下载地址:https://bosonnlp.com/dev/resource</p>
<p>微博语料:<em>链接:https://pan.baidu.com/s/1Pskzw7bg9qTnXD_QKF-4sg&nbsp; &nbsp;提取码:15bu</em></p>
<p>输出结果:</p>
<p><img src="https://img2020.cnblogs.com/blog/1951785/202005/1951785-20200511104420782-1785778392.png"></p>
<p>&nbsp;</p>
<h2>&nbsp;3.3 基于知网情感词典的情感挖掘原理</h2>
<p>基于知网情感词典的情感分析原理分为以下几步:</p>
<p>1、首先,需要对文本分句,分句,得到分词分句后的文本语料,并将结果与哈工大的停用词表比对,去除停用词;</p>
<p>2、其次,对每一句话进行情感分析,分析的方法主要为:判断这段话中的情感词数目,含有积极词,则积极词数目加1,含有消极词,则消极词数目加1。并且再统计的过程中还需要判断该情感词前面是否存在程度副词,如果存在,则需要根据程度副词的种类赋予不同的权重,乘以情感词数。如果句尾存在?!等符号,则情感词数目增加一定值,因为!与?这类的标点往往表示情感情绪的加强,因此需要进行一定处理。</p>
<p>3、接着统计计算整段话的情感值(积极词值-消极词值),得到该段文本的情感倾向。</p>
<p>4、最后,统计每一段的情感值,相加得到文章的情感值。</p>
<p>整体流程框图如下:</p>
<p><img src="https://img2020.cnblogs.com/blog/1951785/202005/1951785-20200511105452038-599476551.png"></p>
<p>&nbsp;</p>
<h2>3.4 基于知网情感词典的情感分析代码:</h2>
<div class="cnblogs_code">
<pre><span style="color: rgba(0, 0, 255, 1)">import</span><span style="color: rgba(0, 0, 0, 1)"> pyltp
</span><span style="color: rgba(0, 0, 255, 1)">from</span> pyltp <span style="color: rgba(0, 0, 255, 1)">import</span><span style="color: rgba(0, 0, 0, 1)"> Segmentor
</span><span style="color: rgba(0, 0, 255, 1)">from</span> pyltp <span style="color: rgba(0, 0, 255, 1)">import</span><span style="color: rgba(0, 0, 0, 1)"> SentenceSplitter
</span><span style="color: rgba(0, 0, 255, 1)">from</span> pyltp <span style="color: rgba(0, 0, 255, 1)">import</span><span style="color: rgba(0, 0, 0, 1)"> Postagger
</span><span style="color: rgba(0, 0, 255, 1)">import</span><span style="color: rgba(0, 0, 0, 1)"> numpy as np

</span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">读取文件,文件读取函数</span>
<span style="color: rgba(0, 0, 255, 1)">def</span><span style="color: rgba(0, 0, 0, 1)"> read_file(filename):
    withopen(filename, </span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">r</span><span style="color: rgba(128, 0, 0, 1)">'</span>,encoding=<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">utf-8</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)as f:
      text </span>=<span style="color: rgba(0, 0, 0, 1)"> f.read()
      </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">返回list类型数据</span>
      text = text.split(<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">\n</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)
    </span><span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> text

</span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">将数据写入文件中</span>
<span style="color: rgba(0, 0, 255, 1)">def</span><span style="color: rgba(0, 0, 0, 1)"> write_data(filename,data):
    with open(filename,</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">a</span><span style="color: rgba(128, 0, 0, 1)">'</span>,encoding=<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">utf-8</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)as f:
      f.write(str(data))

</span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">文本分句</span>
<span style="color: rgba(0, 0, 255, 1)">def</span><span style="color: rgba(0, 0, 0, 1)"> cut_sentence(text):
    sentences </span>=<span style="color: rgba(0, 0, 0, 1)"> SentenceSplitter.split(text)
    sentence_list </span>= [ w <span style="color: rgba(0, 0, 255, 1)">for</span> w <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> sentences]
    </span><span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> sentence_list

</span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">文本分词</span>
<span style="color: rgba(0, 0, 255, 1)">def</span><span style="color: rgba(0, 0, 0, 1)"> tokenize(sentence):
    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">加载模型</span>
    segmentor = Segmentor()<span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 初始化实例</span>
    <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 加载模型</span>
    segmentor.load(r<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">E:\tool\python\Lib\site-packages\pyltp-0.2.1.dist-info\ltp_data\cws.model</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)
    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 产生分词,segment分词函数</span>
    words =<span style="color: rgba(0, 0, 0, 1)"> segmentor.segment(sentence)
    words </span>=<span style="color: rgba(0, 0, 0, 1)"> list(words)
    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 释放模型</span>
<span style="color: rgba(0, 0, 0, 1)">    segmentor.release()
    </span><span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> words

</span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">词性标注</span>
<span style="color: rgba(0, 0, 255, 1)">def</span><span style="color: rgba(0, 0, 0, 1)"> postagger(words):
    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 初始化实例</span>
    postagger =<span style="color: rgba(0, 0, 0, 1)"> Postagger()
    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 加载模型</span>
    postagger.load(r<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">E:\tool\python\Lib\site-packages\pyltp-0.2.1.dist-info\ltp_data\pos.model</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)
    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 词性标注</span>
    postags =<span style="color: rgba(0, 0, 0, 1)"> postagger.postag(words)
    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 释放模型</span>
<span style="color: rgba(0, 0, 0, 1)">    postagger.release()
    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">返回list</span>
    postags =
    </span><span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> postags

</span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 分词,词性标注,词和词性构成一个元组</span>
<span style="color: rgba(0, 0, 255, 1)">def</span><span style="color: rgba(0, 0, 0, 1)"> intergrad_word(words,postags):
    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">拉链算法,两两匹配</span>
    pos_list =<span style="color: rgba(0, 0, 0, 1)"> zip(words,postags)
    pos_list </span>= [ w <span style="color: rgba(0, 0, 255, 1)">for</span> w <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> pos_list]
    </span><span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> pos_list

</span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">去停用词函数</span>
<span style="color: rgba(0, 0, 255, 1)">def</span><span style="color: rgba(0, 0, 0, 1)"> del_stopwords(words):
    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 读取停用词表</span>
    stopwords = read_file(r<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">test_data\stopwords.txt</span><span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(0, 0, 0, 1)">)
    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 去除停用词后的句子</span>
    new_words =<span style="color: rgba(0, 0, 0, 1)"> []
    </span><span style="color: rgba(0, 0, 255, 1)">for</span> word <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> words:
      </span><span style="color: rgba(0, 0, 255, 1)">if</span> word <span style="color: rgba(0, 0, 255, 1)">not</span> <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> stopwords:
            new_words.append(word)
    </span><span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> new_words

</span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 获取六种权值的词,根据要求返回list,这个函数是为了配合Django的views下的函数使用</span>
<span style="color: rgba(0, 0, 255, 1)">def</span><span style="color: rgba(0, 0, 0, 1)"> weighted_value(request):
    result_dict </span>=<span style="color: rgba(0, 0, 0, 1)"> []
    </span><span style="color: rgba(0, 0, 255, 1)">if</span> request == <span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">one</span><span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(0, 0, 0, 1)">:
      result_dict </span>= read_file(r<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">E:\学习笔记\NLP学习\NLP code\情感分析3\degree_dict\most.txt</span><span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(0, 0, 0, 1)">)
    </span><span style="color: rgba(0, 0, 255, 1)">elif</span> request == <span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">two</span><span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(0, 0, 0, 1)">:
      result_dict </span>= read_file(r<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">E:\学习笔记\NLP学习\NLP code\情感分析3\degree_dict\very.txt</span><span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(0, 0, 0, 1)">)
    </span><span style="color: rgba(0, 0, 255, 1)">elif</span> request == <span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">three</span><span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(0, 0, 0, 1)">:
      result_dict </span>= read_file(r<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">E:\学习笔记\NLP学习\NLP code\情感分析3\degree_dict\more.txt</span><span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(0, 0, 0, 1)">)
    </span><span style="color: rgba(0, 0, 255, 1)">elif</span> request == <span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">four</span><span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(0, 0, 0, 1)">:
      result_dict </span>= read_file(r<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">E:\学习笔记\NLP学习\NLP code\情感分析3\degree_dict\ish.txt</span><span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(0, 0, 0, 1)">)
    </span><span style="color: rgba(0, 0, 255, 1)">elif</span> request == <span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">five</span><span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(0, 0, 0, 1)">:
      result_dict </span>= read_file(r<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">E:\学习笔记\NLP学习\NLP code\情感分析3\degree_dict\insufficiently.txt</span><span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(0, 0, 0, 1)">)
    </span><span style="color: rgba(0, 0, 255, 1)">elif</span> request == <span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">six</span><span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(0, 0, 0, 1)">:
      result_dict </span>= read_file(r<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">E:\学习笔记\NLP学习\NLP code\情感分析3\degree_dict\inverse.txt</span><span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(0, 0, 0, 1)">)
    </span><span style="color: rgba(0, 0, 255, 1)">elif</span> request == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">posdict</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">:
      result_dict </span>= read_file(r<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">E:\学习笔记\NLP学习\NLP code\情感分析3\emotion_dict\pos_all_dict.txt</span><span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(0, 0, 0, 1)">)
    </span><span style="color: rgba(0, 0, 255, 1)">elif</span> request == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">negdict</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">:
      result_dict </span>= read_file(r<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">E:\学习笔记\NLP学习\NLP code\情感分析3\emotion_dict\neg_all_dict.txt</span><span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(0, 0, 0, 1)">)
    </span><span style="color: rgba(0, 0, 255, 1)">else</span><span style="color: rgba(0, 0, 0, 1)">:
      </span><span style="color: rgba(0, 0, 255, 1)">pass</span>
    <span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> result_dict

</span><span style="color: rgba(0, 0, 255, 1)">print</span>(<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">reading sentiment dict .......</span><span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(0, 0, 0, 1)">)
</span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">读取情感词典</span>
posdict = weighted_value(<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">posdict</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)
negdict </span>= weighted_value(<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">negdict</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)
</span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 读取程度副词词典</span><span style="color: rgba(0, 128, 0, 1)">
#</span><span style="color: rgba(0, 128, 0, 1)"> 权值为2</span>
mostdict = weighted_value(<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">one</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)
</span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 权值为1.75</span>
verydict = weighted_value(<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">two</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)
</span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 权值为1.50</span>
moredict = weighted_value(<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">three</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)
</span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 权值为1.25</span>
ishdict = weighted_value(<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">four</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)
</span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 权值为0.25</span>
insufficientdict = weighted_value(<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">five</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)
</span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 权值为-1</span>
inversedict = weighted_value(<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">six</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)

</span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">程度副词处理,对不同的程度副词给予不同的权重</span>
<span style="color: rgba(0, 0, 255, 1)">def</span><span style="color: rgba(0, 0, 0, 1)"> match_adverb(word,sentiment_value):
    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">最高级权重为</span>
    <span style="color: rgba(0, 0, 255, 1)">if</span> word <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> mostdict:
      sentiment_value </span>*= 8
    <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">比较级权重</span>
    <span style="color: rgba(0, 0, 255, 1)">elif</span> word <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> verydict:
      sentiment_value </span>*= 6
    <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">比较级权重</span>
    <span style="color: rgba(0, 0, 255, 1)">elif</span> word <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> moredict:
      sentiment_value </span>*= 4
    <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">轻微程度词权重</span>
    <span style="color: rgba(0, 0, 255, 1)">elif</span> word <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> ishdict:
      sentiment_value </span>*= 2
    <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">相对程度词权重</span>
    <span style="color: rgba(0, 0, 255, 1)">elif</span> word <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> insufficientdict:
      sentiment_value </span>*= 0.5
    <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">否定词权重</span>
    <span style="color: rgba(0, 0, 255, 1)">elif</span> word <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> inversedict:
      sentiment_value </span>*= -1
    <span style="color: rgba(0, 0, 255, 1)">else</span><span style="color: rgba(0, 0, 0, 1)">:
      sentiment_value </span>*= 1
    <span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> sentiment_value

</span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">对每一条微博打分</span>
<span style="color: rgba(0, 0, 255, 1)">def</span><span style="color: rgba(0, 0, 0, 1)"> single_sentiment_score(text_sent):
    sentiment_scores </span>=<span style="color: rgba(0, 0, 0, 1)"> []
    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">对单条微博分句</span>
    sentences =<span style="color: rgba(0, 0, 0, 1)"> cut_sentence(text_sent)
    </span><span style="color: rgba(0, 0, 255, 1)">for</span> sent <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> sentences:
      </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">查看分句结果</span>
      <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> print('分句:',sent)</span>
      <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">分词</span>
      words =<span style="color: rgba(0, 0, 0, 1)"> tokenize(sent)
      seg_words </span>=<span style="color: rgba(0, 0, 0, 1)"> del_stopwords(words)
      </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">i,s 记录情感词和程度词出现的位置</span>
      i = 0   <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">记录扫描到的词位子</span>
      s = 0   <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">记录情感词的位置</span>
      poscount = 0 <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">记录积极情感词数目</span>
      negcount = 0 <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">记录消极情感词数目</span>
      <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">逐个查找情感词</span>
      <span style="color: rgba(0, 0, 255, 1)">for</span> word <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> seg_words:
            </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">如果为积极词</span>
            <span style="color: rgba(0, 0, 255, 1)">if</span> word <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> posdict:
                poscount </span>+= 1<span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">情感词数目加1</span>
            <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">在情感词前面寻找程度副词</span>
                <span style="color: rgba(0, 0, 255, 1)">for</span> w <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> seg_words:
                  poscount </span>=<span style="color: rgba(0, 0, 0, 1)"> match_adverb(w,poscount)
                s </span>= i+1 <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">记录情感词位置</span>
            <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 如果是消极情感词</span>
            <span style="color: rgba(0, 0, 255, 1)">elif</span> word <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> negdict:
                negcount </span>+=1
                <span style="color: rgba(0, 0, 255, 1)">for</span> w <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> seg_words:
                  negcount </span>=<span style="color: rgba(0, 0, 0, 1)"> match_adverb(w,negcount)
                s </span>= i+1
            <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">如果结尾为感叹号或者问号,表示句子结束,并且倒序查找感叹号前的情感词,权重+4</span>
            <span style="color: rgba(0, 0, 255, 1)">elif</span> word ==<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">!</span><span style="color: rgba(128, 0, 0, 1)">'</span> <span style="color: rgba(0, 0, 255, 1)">or</span>word ==<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">!</span><span style="color: rgba(128, 0, 0, 1)">'</span> <span style="color: rgba(0, 0, 255, 1)">or</span> word ==<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">?</span><span style="color: rgba(128, 0, 0, 1)">'</span> <span style="color: rgba(0, 0, 255, 1)">or</span> word == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">?</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">:
                </span><span style="color: rgba(0, 0, 255, 1)">for</span> w2 <span style="color: rgba(0, 0, 255, 1)">in</span> seg_words[::-1<span style="color: rgba(0, 0, 0, 1)">]:
                  </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">如果为积极词,poscount+2</span>
                  <span style="color: rgba(0, 0, 255, 1)">if</span> w2 <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> posdict:
                        poscount </span>+= 4
                        <span style="color: rgba(0, 0, 255, 1)">break</span>
                  <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">如果是消极词,negcount+2</span>
                  <span style="color: rgba(0, 0, 255, 1)">elif</span> w2 <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> negdict:
                        negcount </span>+= 4
                        <span style="color: rgba(0, 0, 255, 1)">break</span><span style="color: rgba(0, 0, 0, 1)">
            i </span>+= 1 <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">定位情感词的位置</span>
      <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">计算情感值</span>
      sentiment_score = poscount -<span style="color: rgba(0, 0, 0, 1)"> negcount
      sentiment_scores.append(sentiment_score)
      </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">查看每一句的情感值</span>
      <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> print('分句分值:',sentiment_score)</span>
    sentiment_sum =<span style="color: rgba(0, 0, 0, 1)"> 0
    </span><span style="color: rgba(0, 0, 255, 1)">for</span> s <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> sentiment_scores:
      </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">计算出一条微博的总得分</span>
      sentiment_sum +=<span style="color: rgba(0, 0, 0, 1)">s
    </span><span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> sentiment_sum

</span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 分析test_data.txt 中的所有微博,返回一个列表,列表中元素为(分值,微博)元组</span>
<span style="color: rgba(0, 0, 255, 1)">def</span><span style="color: rgba(0, 0, 0, 1)"> run_score(contents):
    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 待处理数据</span>
    scores_list =<span style="color: rgba(0, 0, 0, 1)"> []
    </span><span style="color: rgba(0, 0, 255, 1)">for</span> content <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> contents:
      </span><span style="color: rgba(0, 0, 255, 1)">if</span> content !=<span style="color: rgba(128, 0, 0, 1)">''</span><span style="color: rgba(0, 0, 0, 1)">:
            score </span>= single_sentiment_score(content)<span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 对每条微博调用函数求得打分</span>
            scores_list.append((score, content)) <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> 形成(分数,微博)元组</span>
    <span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> scores_list

</span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">主程序</span>
<span style="color: rgba(0, 0, 255, 1)">if</span> <span style="color: rgba(128, 0, 128, 1)">__name__</span> == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">__main__</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">:
    </span><span style="color: rgba(0, 0, 255, 1)">print</span>(<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">Processing........</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)
    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">测试</span>
    <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> sentence = '要怎么说呢! 我需要的恋爱不是现在的样子, 希望是能互相鼓励的勉励, 你现在的样子让我觉得很困惑。 你到底能不能陪我一直走下去, 你是否有决心?'</span>
    <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)"> sentence = '转有用吗?这个事本来就是要全社会共同努力的,公交公司有没有培训到位?公交车上地铁站内有没有放足够的宣传标语?我现在转一下微博,没有多大的意义。'</span>
    sentences = read_file(r<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">test_data\微博.txt</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)
    scores </span>=<span style="color: rgba(0, 0, 0, 1)"> run_score(sentences)
    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">人工标注情感词典</span>
    man_sentiment = read_file(r<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">test_data\人工情感标注.txt</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)
    al_sentiment </span>=<span style="color: rgba(0, 0, 0, 1)"> []
    </span><span style="color: rgba(0, 0, 255, 1)">for</span> score <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> scores:
      </span><span style="color: rgba(0, 0, 255, 1)">print</span>(<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">情感分值:</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">,score)
      </span><span style="color: rgba(0, 0, 255, 1)">if</span> score &lt;<span style="color: rgba(0, 0, 0, 1)"> 0:
            </span><span style="color: rgba(0, 0, 255, 1)">print</span>(<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">情感倾向:消极</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)
            s </span>= <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">消极</span><span style="color: rgba(128, 0, 0, 1)">'</span>
      <span style="color: rgba(0, 0, 255, 1)">elif</span> score ==<span style="color: rgba(0, 0, 0, 1)"> 0:
            </span><span style="color: rgba(0, 0, 255, 1)">print</span>(<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">情感倾向:中性</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)
            s </span>= <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">中性</span><span style="color: rgba(128, 0, 0, 1)">'</span>
      <span style="color: rgba(0, 0, 255, 1)">else</span><span style="color: rgba(0, 0, 0, 1)">:
            </span><span style="color: rgba(0, 0, 255, 1)">print</span>(<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">情感倾向:积极</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)
            s </span>= <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">积极</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">
      al_sentiment.append(s)
      </span><span style="color: rgba(0, 0, 255, 1)">print</span>(<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">情感分析文本:</span><span style="color: rgba(128, 0, 0, 1)">'</span>,score)
    i </span>=<span style="color: rgba(0, 0, 0, 1)"> 0
    </span><span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">写入文件中</span>
    filename = r<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">result_data\result_data.txt</span><span style="color: rgba(128, 0, 0, 1)">'</span>
    <span style="color: rgba(0, 0, 255, 1)">for</span> score <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> scores:
      write_data(filename, </span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">情感分析文本:{}</span><span style="color: rgba(128, 0, 0, 1)">'</span>.format(str(score))+<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">\n</span><span style="color: rgba(128, 0, 0, 1)">'</span>) <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">写入情感分析文本</span>
      write_data(filename,<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">情感分值:{}</span><span style="color: rgba(128, 0, 0, 1)">'</span>.format(str(score))+<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">\n</span><span style="color: rgba(128, 0, 0, 1)">'</span>) <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">写入情感分值</span>
      write_data(filename,<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">人工标注情感:{}</span><span style="color: rgba(128, 0, 0, 1)">'</span>.format(str(man_sentiment))+<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">\n</span><span style="color: rgba(128, 0, 0, 1)">'</span>) <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">写入人工标注情感</span>
      write_data(filename, <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">机器情感标注:{}</span><span style="color: rgba(128, 0, 0, 1)">'</span>.format(str(al_sentiment))+<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">\n</span><span style="color: rgba(128, 0, 0, 1)">'</span>) <span style="color: rgba(0, 128, 0, 1)">#</span><span style="color: rgba(0, 128, 0, 1)">写入机器情感标注</span>
      write_data(filename,<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">\n</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)
      i </span>+=1
    <span style="color: rgba(0, 0, 255, 1)">print</span>(<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">succeed.......</span><span style="color: rgba(128, 0, 0, 1)">'</span>)</pre>
</div>
<p>&nbsp;</p>
<p>&nbsp;输出结果:</p>
<p><img src="https://img2020.cnblogs.com/blog/1951785/202005/1951785-20200511105647413-507269323.png"></p>
<p>&nbsp;</p>
<h1>4、小结</h1>
<p>&nbsp;本次的情感分析程序完成简单的情感倾向判断,准确率上基于BosonNLP的情感分析较低,其情感分析准确率为:56.67%;而基于知网情感词典的情感分析准确率达到90%,效果上还是不错的。但是,这两个程序都还只是情感分析简单使用,并未涉及到更深奥的算法,如果想要更加精确,或者再更大样本中获得更高精度,这两个情感分析模型还是不够的。但是用来练习学习还是不错的选择。要想更深入的理解和弄得情感分析,还需要继续学习。</p>
<p>&nbsp;</p>
<p>&nbsp;急需模型和情感词典资料的可以转到这篇博文下载资源:https://blog.csdn.net/maxMikexu</p>
<p>不急的留下邮箱,博主有空的时候会给你发送模型和数据,可能需要等1-3天。</p>

</div>
<div id="MySignature" role="contentinfo">
    学习、思考、阅读、旅游、爱人—— 一生所求<br><br>
来源:https://www.cnblogs.com/maxxu11/p/12867850.html
頁: [1]
查看完整版本: 基于情感词典的python情感分析