Python Web Scraper: Fetching Web Page Data, Parsing It, and Writing It to a Local File
<p><strong><span style="font-size: 18px">I had never learned Python before, but a personal project recently called for a small web scraper, so I gathered some material, read other people's code, and am now writing down the pitfalls I ran into along the way.</span></strong></p><p><strong><span style="font-size: 18px">If you are completely new to Python and want to get a scraper working quickly, this article should suit you.</span></strong></p>
<p><strong><span style="font-size: 18px">I started from this article:</span></strong></p>
<p><strong><span style="font-size: 18px">https://mp.weixin.qq.com/s/ET9HP2n3905PxBy4ZLmZNw</span></strong></p>
<p><strong><span style="font-size: 18px">It scrapes Dangdang's list of the top 500 five-star-rated books.</span></strong></p>
<p><strong><span style="font-size: 18px">The source code is available on GitHub:</span></strong></p>
<p><strong><span style="font-size: 18px">https://github.com/wistbean/learn_python3_spider/blob/master/dangdang_top_500.py</span></strong></p>
<p><strong><span style="font-size: 18px">However, when I ran that code, the CPU was almost fully loaded and <strong>there was no output at all.</strong></span></strong></p>
<p><strong><span style="font-size: 18px"><strong>Let's analyze the source code and fix it.</strong></span></strong></p>
<p><span style="font-size: large"><strong>First, the problematic source code:</strong></span></p>
<p> </p>
<div class="cnblogs_code">
<pre> 1 import requests
 2 import re
 3 import json
 4
 5
 6 def request_dandan(url):
 7     try:
 8         response = requests.get(url)
 9         if response.status_code == 200:
10             return response.text
11     except requests.RequestException:
12         return None
13
14
15 def parse_result(html):
16     pattern = re.compile(
17         '<li>.*?list_num.*?(\d+).</div>.*?<img src="(.*?)".*?class="name".*?title="(.*?)">.*?class="star">.*?class="tuijian">(.*?)</span>.*?class="publisher_info">.*?target="_blank">(.*?)</a>.*?class="biaosheng">.*?<span>(.*?)</span></div>.*?<p><span\sclass="price_n">¥(.*?)</span>.*?</li>',
18         re.S)
19     items = re.findall(pattern, html)
20
21     for item in items:
22         yield {
23             'range': item,
24             'iamge': item,
25             'title': item,
26             'recommend': item,
27             'author': item,
28             'times': item,
29             'price': item
30         }
31
32
33 def write_item_to_file(item):
34     print('Start writing data ====> ' + str(item))
35     with open('book.txt', 'a', encoding='UTF-8') as f:
36         f.write(json.dumps(item, ensure_ascii=False) + '\n')
37
38
39 def main(page):
40     url = 'http://bang.dangdang.com/books/fivestars/01.00.00.00.00.00-recent30-0-0-1-' + str(page)
41     html = request_dandan(url)
42     items = parse_result(html)  # parse the page and keep only the information we want
43     for item in items:
44         write_item_to_file(item)
45
46
47 if __name__ == "__main__":
48     for i in range(1, 26):
49         main(i)</pre>
</div>
<p> </p>
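<p><strong><span style="font-size: 18px">Does it look a bit messy? Don't worry, we will go through it step by step. (If you would rather skip the long analysis, jump to the end, where I give the modified code with full comments.)</span></strong></p>
<p><strong><span style="font-size: 18px">Python executes a file from top to bottom. Everything before line 47 is a function definition, which only creates a function without calling it, so the first code that actually does any work is lines 47-49:</span></strong></p>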
<div class="cnblogs_code">
<pre>if __name__ == "__main__":
    for i in range(1, 26):
        main(i)</pre>
</div>
<p> </p>
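<p><strong><span style="font-size: 18px">This evidently calls the main function (defined on line 39). But what is __name__?</span></strong></p>
<p><strong><span style="font-size: 18px">__name__ is a variable that the interpreter sets for every module. When the file containing main is run directly, __name__ has the value "__main__", so the loop that calls main is executed. When the same file is imported as a module by another program, __name__ holds the module's name instead, and the loop is skipped.</span></strong></p>
<p><strong><span style="font-size: 18px">For a deeper understanding, this article explains it very clearly:</span></strong></p>
<p><strong><span style="font-size: 18px">https://www.cnblogs.com/keguo/p/9760361.html</span></strong></p>
<p><strong><span style="font-size: 18px">Our program <strong>runs the file that contains main directly, so the value of __name__ is "__main__".</strong></span></strong></p>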
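<p><strong><span style="font-size: 18px">To watch the guard behave both ways without creating two files, here is a minimal sketch. The helper guard_fires, the SCRIPT text and the module name dangdang_top_500 are made up for illustration; the sketch simply executes the same tiny script twice with different values of __name__.</span></strong></p>

```python
# Hypothetical helper (not part of the crawler): run a tiny script with a
# chosen value of __name__ and report whether its main guard fired.
SCRIPT = """
ran_main = False
if __name__ == "__main__":
    ran_main = True
"""


def guard_fires(module_name):
    namespace = {"__name__": module_name}
    exec(SCRIPT, namespace)  # the script sees module_name as its __name__
    return namespace["ran_main"]


print(guard_fires("__main__"))          # simulates running the file directly
print(guard_fires("dangdang_top_500"))  # simulates importing it as a module
```

<p><strong><span style="font-size: 18px">The first call reports that the guarded code ran and the second reports that it was skipped, which is the behavior described above.</span></strong></p>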
<p> </p>
<p><strong><span style="font-size: 18px">One more small detail is worth noting:</span></strong></p>
<p><strong><span style="font-size: 18px">In a language like Lua, a function ends with an end keyword, and statements such as if and for have their own end markers as well. Python has none of these; it uses indentation to mark where each block begins and ends. <strong>Let's modify lines 47-49 slightly to illustrate:</strong></span></strong></p>
<p><strong><span style="font-size: 18px">Create a new Python file and enter:</span></strong></p>
<div class="cnblogs_code">
<pre>if __name__ == "__main__":
    for i in range(1, 5):
        print("inner")
    print("outer")</pre>
</div>
<p><span style="font-size: 18px"><strong>The for statement is indented further than the if statement, so it belongs to the if block. Likewise, print("inner") belongs to the for block; since range(1, 5) produces 1, 2, 3 and 4, it is printed 4 times. print("outer") is no longer inside the for block but is still inside the if block, so it is printed only once. The output is as follows:</strong></span></p>
<p> <img src="https://img2018.cnblogs.com/blog/1539443/201909/1539443-20190907174049183-1633311573.png"></p>
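<p><span style="font-size: 18px"><strong>So lines 47-49 call main 25 times in a loop (range includes its start value and excludes its stop value, so range(1, 26) yields 1 through 25). Why 25? Because each page of Dangdang's five-star ranking lists 20 books, so covering 500 books takes 25 requests.</strong></span></p>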
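<p><span style="font-size: 18px"><strong>The page arithmetic can be checked with a short sketch. The URL prefix is copied from main in the listing above; the helper name page_urls and the constant names are my own, chosen only for this illustration.</strong></span></p>

```python
# URL prefix copied from main() in the listing; helper and constant names are illustrative.
BASE_URL = "http://bang.dangdang.com/books/fivestars/01.00.00.00.00.00-recent30-0-0-1-"
BOOKS_PER_PAGE = 20  # as stated above: each ranking page lists 20 books


def page_urls(first_page=1, last_page=25):
    # range() excludes its stop value, hence last_page + 1
    return [BASE_URL + str(page) for page in range(first_page, last_page + 1)]


urls = page_urls()
print(len(urls))                   # number of requests
print(len(urls) * BOOKS_PER_PAGE)  # number of books covered
print(urls[0])
print(urls[-1])
```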
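<p><span style="font-size: 18px"><strong>Let's look at what main (lines 39-44) does:</strong></span></p>
<p><span style="font-size: 18px"><strong>First it builds the URL by string concatenation; each call passes a different page, corresponding to pages 1 through 25. It then calls request_dandan to send the request. Here is what <strong>request_dandan (lines 6-12) does:</strong></strong></span></p>
<p><span style="font-size: 18px"><strong><strong>It uses the requests module to send a GET request to the server, which is why requests is imported at the top of the program (line 1). The GET request fetches the page at the given URL, and the status code is then checked: 200 means success, in which case the response text is returned; for any other status code the function falls through and implicitly returns None. Note that requests.get is a synchronous call: the program blocks there until the server responds (or an exception is raised) and only then moves on to the next line.</strong></strong></span></p>
<p><span style="font-size: 18px"><strong>Next, main calls parse_result (lines 15-30) to parse <strong>the fetched HTML text</strong> and extract the book information. Before analyzing that code, we need to understand the format of the returned HTML:</strong></span></p>
<p><span style="font-size: 18px"><strong>In Chrome's developer tools we can inspect the HTML returned for this request. Mine looks like this:</strong></span><strong> </strong></p>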
<p><img src="https://img2018.cnblogs.com/blog/1539443/201909/1539443-20190907200706435-1523104222.png"></p>
<p><strong style="font-size: 18px"> <strong>Taking the first book, "有话说出来", as an example, use Command+F (on a Mac) to quickly find the information for the book we want to scrape:</strong></strong></p>
<p><img src="https://img2018.cnblogs.com/blog/1539443/201909/1539443-20190907201036017-1875257866.png"></p>
<p><span style="font-size: 18px"><strong style="font-size: 14px"><strong> <span style="font-size: 18px">The information for each book has this format:</span></strong></strong></span></p>
<div class="cnblogs_code">
<pre><li>
<div <span style="color: rgba(0, 0, 255, 1)">class</span>=<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">list_num red</span><span style="color: rgba(128, 0, 0, 1)">"</span>>1.</div>
<div <span style="color: rgba(0, 0, 255, 1)">class</span>=<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">pic</span><span style="color: rgba(128, 0, 0, 1)">"</span>><a href=<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">http://product.dangdang.com/25345988.html</span><span style="color: rgba(128, 0, 0, 1)">"</span> target=<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">_blank</span><span style="color: rgba(128, 0, 0, 1)">"</span>><img src=<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">http://img3m8.ddimg.cn/8/26/25345988-1_l_1.jpg</span><span style="color: rgba(128, 0, 0, 1)">"</span> alt=<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">有话说出来!(彻底颠覆社会人脉的固有方式,社交电池帮你搞定社交。社交恐惧症患者必须拥有的一本实用社交指南,初入大学和职场的必备“攻略”,拿起这本书,你也是“魏璎珞”)纤阅出品</span><span style="color: rgba(128, 0, 0, 1)">"</span>title=<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">有话说出来!(彻底颠覆社会人脉的固有方式,社交电池帮你搞定社交。社交恐惧症患者必须拥有的一本实用社交指南,初入大学和职场的必备“攻略”,拿起这本书,你也是“魏璎珞”)纤阅出品</span><span style="color: rgba(128, 0, 0, 1)">"</span>/></a></div>
<div <span style="color: rgba(0, 0, 255, 1)">class</span>=<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">name</span><span style="color: rgba(128, 0, 0, 1)">"</span>><a href=<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">http://product.dangdang.com/25345988.html</span><span style="color: rgba(128, 0, 0, 1)">"</span> target=<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">_blank</span><span style="color: rgba(128, 0, 0, 1)">"</span> title=<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">有话说出来!(彻底颠覆社会人脉的固有方式,社交电池帮你搞定社交。社交恐惧症患者必须拥有的一本实用社交指南,初入大学和职场的必备“攻略”,拿起这本书,你也是“魏璎珞”)纤阅出品</span><span style="color: rgba(128, 0, 0, 1)">"</span>>有话说出来!(彻底颠覆社会人脉的固有方式,社交电池帮你搞定社<span <span style="color: rgba(0, 0, 255, 1)">class</span>=<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">dot</span><span style="color: rgba(128, 0, 0, 1)">'</span>>...</span></a></div>
<div <span style="color: rgba(0, 0, 255, 1)">class</span>=<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">star</span><span style="color: rgba(128, 0, 0, 1)">"</span>><span <span style="color: rgba(0, 0, 255, 1)">class</span>=<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">level</span><span style="color: rgba(128, 0, 0, 1)">"</span>><span style=<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">width: 100%;</span><span style="color: rgba(128, 0, 0, 1)">"</span>></span></span><a href=<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">http://product.dangdang.com/25345988.html?point=comment_point</span><span style="color: rgba(128, 0, 0, 1)">"</span> target=<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">_blank</span><span style="color: rgba(128, 0, 0, 1)">"</span>>17757条评论</a><span <span style="color: rgba(0, 0, 255, 1)">class</span>=<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">tuijian</span><span style="color: rgba(128, 0, 0, 1)">"</span>>100%推荐</span></div>
<div <span style="color: rgba(0, 0, 255, 1)">class</span>=<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">publisher_info</span><span style="color: rgba(128, 0, 0, 1)">"</span>>【美】<a href=<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">http://search.dangdang.com/?key=帕特里克·金</span><span style="color: rgba(128, 0, 0, 1)">"</span> title=<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">【美】帕特里克·金 著,张捷/李旭阳 译</span><span style="color: rgba(128, 0, 0, 1)">"</span> target=<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">_blank</span><span style="color: rgba(128, 0, 0, 1)">"</span>>帕特里克·金</a> 著,<a href=<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">http://search.dangdang.com/?key=张捷</span><span style="color: rgba(128, 0, 0, 1)">"</span> title=<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">【美】帕特里克·金 著,张捷/李旭阳 译</span><span style="color: rgba(128, 0, 0, 1)">"</span> target=<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">_blank</span><span style="color: rgba(128, 0, 0, 1)">"</span>>张捷</a>/<a href=<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">http://search.dangdang.com/?key=李旭阳</span><span style="color: rgba(128, 0, 0, 1)">"</span> title=<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">【美】帕特里克·金 著,张捷/李旭阳 译</span><span style="color: rgba(128, 0, 0, 1)">"</span> target=<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">_blank</span><span style="color: rgba(128, 0, 0, 1)">"</span>>李旭阳</a> 译</div>
<div <span style="color: rgba(0, 0, 255, 1)">class</span>=<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">publisher_info</span><span style="color: rgba(128, 0, 0, 1)">"</span>><span>2018-08-01</span>&nbsp;<a href=<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">http://search.dangdang.com/?key=天津人民出版社</span><span style="color: rgba(128, 0, 0, 1)">"</span> target=<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">_blank</span><span style="color: rgba(128, 0, 0, 1)">"</span>>天津人民出版社</a></div>
<div <span style="color: rgba(0, 0, 255, 1)">class</span>=<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">biaosheng</span><span style="color: rgba(128, 0, 0, 1)">"</span>>五星评分:<span>16273次</span></div>
<div <span style="color: rgba(0, 0, 255, 1)">class</span>=<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">price</span><span style="color: rgba(128, 0, 0, 1)">"</span>>
<p><span <span style="color: rgba(0, 0, 255, 1)">class</span>=<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">price_n</span><span style="color: rgba(128, 0, 0, 1)">"</span>>&yen;30.40</span>
<span <span style="color: rgba(0, 0, 255, 1)">class</span>=<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">price_r</span><span style="color: rgba(128, 0, 0, 1)">"</span>>&yen;42.00</span>(<span <span style="color: rgba(0, 0, 255, 1)">class</span>=<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">price_s</span><span style="color: rgba(128, 0, 0, 1)">"</span>>7.2折</span><span style="color: rgba(0, 0, 0, 1)">)
</span></p>
<p <span style="color: rgba(0, 0, 255, 1)">class</span>=<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">price_e</span><span style="color: rgba(128, 0, 0, 1)">"</span>></p>
<div <span style="color: rgba(0, 0, 255, 1)">class</span>=<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">buy_button</span><span style="color: rgba(128, 0, 0, 1)">"</span>>
<a ddname=<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">加入购物车</span><span style="color: rgba(128, 0, 0, 1)">"</span> name=<span style="color: rgba(128, 0, 0, 1)">""</span> href=<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">javascript:AddToShoppingCart('25345988');</span><span style="color: rgba(128, 0, 0, 1)">"</span> <span style="color: rgba(0, 0, 255, 1)">class</span>=<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">listbtn_buy</span><span style="color: rgba(128, 0, 0, 1)">"</span>>加入购物车</a>
<a ddname=<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">加入收藏</span><span style="color: rgba(128, 0, 0, 1)">"</span> id=<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">addto_favorlist_25345988</span><span style="color: rgba(128, 0, 0, 1)">"</span> name=<span style="color: rgba(128, 0, 0, 1)">""</span> href=<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">javascript:showMsgBox('addto_favorlist_25345988',encodeURIComponent('25345988&platform=3'), 'http://myhome.dangdang.com/addFavoritepop');</span><span style="color: rgba(128, 0, 0, 1)">"</span> <span style="color: rgba(0, 0, 255, 1)">class</span>=<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">listbtn_collect</span><span style="color: rgba(128, 0, 0, 1)">"</span>>收藏</a>
</div>
</div></pre>
</div>
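<p><strong><strong><span style="font-size: 18px"><strong>Messy again? No rush, we will take it slowly. First we need to decide which pieces of information to extract for each book. Here we will scrape its:</strong></span></strong></strong></p>
<p><span style="font-size: large"><strong>rank, cover image URL, title, recommendation rate, author, number of five-star ratings, and price.</strong></span></p>
<p><span style="font-size: large"><strong>How do we extract each book's information from such a large block of HTML? With a regular expression. parse_result first builds the pattern (lines 16-18) and then matches it against the HTML passed in to collect the results (line 19). This requires the re module (imported on line 2). re.compile wraps the pattern string in a reusable pattern object; passing the pattern string directly, as in re.findall(pattern string, text), gives the same result. The code uses re.findall rather than re.match because re.match only attempts a match at the very beginning of the string and returns at most one match, whereas re.findall scans the whole text and returns every non-overlapping match, which is what we need for a page that lists 20 books. The second argument to re.compile, re.S, deals with line breaks: by default . matches any single character except the newline \n, and since each book's markup spans several lines, we need re.S so that . matches newlines too.</strong></span></p>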
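<p><span style="font-size: large"><strong>The difference between re.match and re.findall, and the effect of re.S, can be tried on a tiny string. This is only a sketch; the three-line text below is invented for the demonstration and has nothing to do with the real page.</strong></span></p>

```python
import re

text = "rank 1\nrank 2\nrank 3"  # made-up sample with line breaks

# re.match only tries to match at the very start of the string
# and returns a single match object (or None).
first = re.match(r"rank (\d)", text)
print(first.group(1))

# re.findall scans the whole string and returns every non-overlapping match.
all_ranks = re.findall(r"rank (\d)", text)
print(all_ranks)

# Without re.S, "." stops at a newline; with re.S it can cross lines.
single_line = re.findall(r"1.*3", text)
across_lines = re.findall(r"1.*3", text, re.S)
print(single_line)
print(across_lines)
```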
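<p><span style="font-size: large"><strong>The description above may not be entirely clear. For the details of the re module, please look them up and experiment yourself, which is the only way to really understand it; I can only give an outline here. Also, we will not explain regular expressions from scratch; <strong>we only analyze the expression used in this code. To learn more about regular expressions you will need to search for material yourself;</strong> after all, the ability to teach yourself matters a great deal for a programmer.</strong></span></p>
<p><span style="font-size: 18px"> <strong>OK, let's look at the regular expression used in the code:</strong></span></p>
<p><span style="font-size: 18px"> <strong>We will analyze it piece by piece. First:</strong></span></p>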
<div class="cnblogs_code">
<pre><span style="color: rgba(0, 0, 255, 1)"><</span><span style="color: rgba(128, 0, 0, 1)">li</span><span style="color: rgba(0, 0, 255, 1)">></span>.*?list_num.*?(\d+).<span style="color: rgba(0, 0, 255, 1)"></</span><span style="color: rgba(128, 0, 0, 1)">div</span><span style="color: rgba(0, 0, 255, 1)">></span></pre>
</div>
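<p><span style="font-size: 18px"> <strong>. matches any character except the newline \n (unless re.S is set), * means zero or more repetitions, and a ? placed right after a quantifier (here *) switches it to non-greedy matching, because regular expressions are greedy by default. Take this code, for example:</strong></span></p>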
<div class="cnblogs_code">
<pre><span style="color: rgba(0, 0, 255, 1)">import</span><span style="color: rgba(0, 0, 0, 1)"> re
content </span>= <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">abcabc</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">
res </span>= re.match(<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">a.*c</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">,content)
</span><span style="color: rgba(0, 0, 255, 1)">print</span>(res.group())</pre>
</div>
<p><strong><span style="font-size: 18px">Here the match is as long as possible, so the output is abcabc. If a.*c is changed to a.*?c, the match becomes non-greedy and is as short as possible, so the output is abc.</span></strong></p>
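<p><strong><span style="font-size: 18px">Where the ? goes matters, so here is a small side-by-side sketch on the same string. The variable names are mine; the third pattern is included only to show a common slip, where the ? ends up after c and merely makes that c optional.</span></strong></p>

```python
import re

content = "abcabc"

greedy = re.match(r"a.*c", content).group()  # longest possible match
lazy = re.match(r"a.*?c", content).group()   # shortest possible match
# With the ? after c, the c becomes optional and .* is still greedy.
optional_c = re.match(r"a.*c?", content).group()

print(greedy)
print(lazy)
print(optional_c)
```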
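<p><strong><span style="font-size: 18px">Next, \d matches a single digit and + means one or more repetitions. The expression above therefore matches the part of the HTML shown in the figure below:</span></strong></p>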
<p> <img src="https://img2018.cnblogs.com/blog/1539443/201909/1539443-20190907235257264-359315990.png"></p>
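<p><span style="font-size: 18px"><strong>Note that \d+ is wrapped in parentheses, which makes it a capture group: the text it matches (the digit 1 in the figure) is captured and stored as one element of a tuple. So the first element of the tuple for this match (that is, of item) is 1, the rank.</strong></span></p>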
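<p><span style="font-size: 18px"><strong>Here is a minimal sketch of that first capture group at work. The fragment is a shortened, hand-written piece in the same shape as the page source shown earlier, not the real response.</strong></span></p>

```python
import re

# A shortened, hand-written fragment in the same shape as the page source.
fragment = '<li>\n<div class="list_num red">1.</div>\n<div class="pic">...</div>'

match = re.search(r'<li>.*?list_num.*?(\d+).</div>', fragment, re.S)
print(match.group(0))  # the whole matched text
print(match.group(1))  # only what the parentheses captured
```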
<p><span style="font-size: 18px"><strong>Next comes:</strong></span></p>
<div class="cnblogs_code">
<pre>.*?<img src=<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">(.*?)</span><span style="color: rgba(128, 0, 0, 1)">"</span></pre>
</div>
<p><strong><span style="font-size: 18px">Clearly it matches this part:</span></strong></p>
<p><strong><span style="font-size: 18px"><img src="https://img2018.cnblogs.com/blog/1539443/201909/1539443-20190908000509370-2072335720.png"><img src="https://img2018.cnblogs.com/blog/1539443/201909/1539443-20190908000532401-1127900882.png"></span></strong></p>
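<p><strong><span style="font-size: 18px">The image URL matched inside the parentheses is stored as the second element of the tuple.</span></strong></p>
<p><strong><span style="font-size: 18px">The rest of the pattern works on the same principle and has nothing noteworthy, so I will not go through it piece by piece. The whole regular expression has 7 pairs of parentheses, so each tuple holds 7 elements, corresponding in order to the rank, cover image URL, title, recommendation rate, author, number of five-star ratings, and price.</span></strong></p>
<p><strong><span style="font-size: 18px">After matching, re.findall returns a list, items, holding every successful match. Each item corresponds to one successful match of the regular expression, that is, the 7-element tuple described above.</span></strong></p>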
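<p><strong><span style="font-size: 18px">To see the shape of what re.findall returns when a pattern has more than one pair of parentheses, here is a small sketch. The HTML string is hand-written and far simpler than the real page, and the pattern is only the first two pieces of the full expression.</span></strong></p>

```python
import re

# Hand-written sample with two entries; real pages are far longer.
html = (
    '<li><div class="list_num">1.</div><img src="a.jpg"/></li>'
    '<li><div class="list_num">2.</div><img src="b.jpg"/></li>'
)

pattern = re.compile(r'<li>.*?list_num.*?(\d+).</div>.*?<img src="(.*?)"', re.S)
items = pattern.findall(html)

print(items)
for rank, image in items:  # each item is a tuple, so it can be unpacked
    print(rank, image)
```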
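<p><strong><span style="font-size: 18px">The program then loops over items, packs the data of each item into a dict, and hands the dicts back one at a time with yield.</span></strong></p>
<p><strong><span style="font-size: 18px">This part deserves a closer look. Because parse_result contains yield, calling it does not return a single item; it returns a generator, and the for loop in main (lines 43-44) receives one dict per matched book from it and passes each dict to write_item_to_file. What does look wrong in the listing is the dict itself: every key is assigned the whole item rather than one of its seven fields, so each record would repeat all seven values under every key.</span></strong></p>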
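<p><strong><span style="font-size: 18px">How yield hands values back to a for loop can be tried in isolation. The following is a minimal sketch with made-up data; parse_rows and the two sample rows are illustrative names, not part of the crawler.</span></strong></p>

```python
def parse_rows(rows):
    # Because of yield, calling parse_rows() runs nothing yet;
    # it returns a generator that produces one dict per row on demand.
    for row in rows:
        yield {"rank": row[0], "title": row[1]}


rows = [("1", "Book A"), ("2", "Book B")]  # made-up stand-ins for regex matches
records = parse_rows(rows)

for record in records:  # each loop step receives one complete dict
    print(record)
```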
<p><strong><span style="font-size: 18px">Let's look at write_item_to_file:</span></strong></p>
<p><strong><span style="font-size: 18px">Line 35:</span></strong></p>
<div class="cnblogs_code">
<pre> with open(<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">book.txt</span><span style="color: rgba(128, 0, 0, 1)">'</span>, <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">a</span><span style="color: rgba(128, 0, 0, 1)">'</span>, encoding=<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">UTF-8</span><span style="color: rgba(128, 0, 0, 1)">'</span>) as f:</pre>
</div>
<p><strong><span style="font-size: 18px">is shorthand that is roughly equivalent to:</span></strong></p>
<div class="cnblogs_code">
<pre>f = open('book.txt', 'a', encoding='UTF-8')
try:
    # the body of the with block runs here
    f.write(json.dumps(item, ensure_ascii=False) + '\n')
finally:
    # always close the file, even if the write raises an exception
    f.close()</pre>
</div>
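<p><strong><span style="font-size: 18px">For details, see: https://www.cnblogs.com/yizhenfeng/p/7554620.html</span></strong></p>
<p> <strong><span style="font-size: 18px">After book.txt is opened (mode 'a' appends, and the file is created automatically if it does not exist), write stores item in the file as a JSON string. json.dumps serializes a Python object such as a dict, list or tuple into a JSON-formatted string, and ensure_ascii=False keeps Chinese characters as they are instead of escaping them as \uXXXX sequences.</span></strong></p>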
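<p><strong><span style="font-size: 18px">As a rough illustration of the write step, the sketch below appends the same record twice and reads the file back. The record and the temporary directory are assumptions made for the demo; the crawler itself writes to book.txt in the current directory.</span></strong></p>

```python
import json
import os
import tempfile

record = {"rank": "1", "title": "有话说出来"}  # sample record for the demo

# By default non-ASCII characters are escaped as \uXXXX;
# ensure_ascii=False keeps them readable in the output file.
escaped = json.dumps(record)
readable = json.dumps(record, ensure_ascii=False)
print(escaped)

# Append one JSON document per line; a temporary directory keeps the demo tidy.
path = os.path.join(tempfile.mkdtemp(), "book.txt")
for _ in range(2):
    with open(path, "a", encoding="UTF-8") as f:  # "a" creates the file if missing
        f.write(readable + "\n")

with open(path, encoding="UTF-8") as f:
    lines = f.read().splitlines()
print(len(lines))
```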
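<p><strong><span style="font-size: 18px">Since json.dumps accepts both dicts and tuples, write_item_to_file can serialize whatever main hands it. I chose to simplify parse_result: instead of looping over the matches and yielding a dict for each one, it returns items (all matches of the regular expression) directly. main then loops over items, takes out each item (the 7-element tuple for one book) and passes it to write_item_to_file, which converts it to a JSON array and writes it as one line. One more difference between the two listings matters: the pattern in the first listing contains a literal ¥, whereas the page source shown earlier encodes the price as &yen;. A pattern that can never match makes the regex engine backtrack through every .*? segment across the whole page, which is a likely cause of the high CPU load and the missing output, so the modified pattern uses &yen; instead.</span></strong></p>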
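<p><strong><span style="font-size: 18px">The effect of the currency symbol on matching can be checked with a short sketch. The snippet below is written by hand in the same form as the price markup in the page source shown earlier; whether the server always returns the entity rather than the symbol is an assumption based on that sample.</span></strong></p>

```python
import re

# Price markup in the same form as the page source shown earlier.
snippet = '<p><span class="price_n">&yen;30.40</span>'

with_symbol = re.search(r'<span\sclass="price_n">¥(.*?)</span>', snippet)
with_entity = re.search(r'<span\sclass="price_n">&yen;(.*?)</span>', snippet)

print(with_symbol is None)   # the text holds the entity, so the symbol pattern finds nothing
print(with_entity.group(1))  # the captured price
```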
<p><strong><span style="font-size: 18px"><strong><strong>Here is the modified code:</strong></strong></span></strong></p>
<div class="cnblogs_code">
<pre> 1 import requests
 2 import re
 3 import json
 4
 5 def request_dandan(url):
 6     try:
 7         # synchronous (blocking) request
 8         response = requests.get(url)
 9         if response.status_code == 200:
10             return response.text
11     except requests.RequestException:
12         return None
13
14 def parse_result(html):
15     pattern = re.compile('<li>.*?list_num.*?(\d+).</div>.*?<img src="(.*?)".*?class="name".*?title="(.*?)">.*?class="star">.*?class="tuijian">(.*?)</span>.*?class="publisher_info">.*?target="_blank">(.*?)</a>.*?class="biaosheng">.*?<span>(.*?)</span></div>.*?<p><span\sclass="price_n">&yen;(.*?)</span>.*?</li>', re.S)
16     items = re.findall(pattern, html)
<span style="color: rgba(255, 0, 0, 1)">17     return items
18     # for item in items:
19     #     yield {
20     #         'range': item,
21     #         'iamge': item,
22     #         'title': item,
23     #         'recommend': item,
24     #         'author': item,
25     #         'times': item,
26     #         'price': item
27     #     }</span>
28
29 def write_item_to_file(item):
30     print('Start writing data ====> ' + str(item))
31     with open('book.txt', 'a', encoding='UTF-8') as f:
32         f.write(json.dumps(item, ensure_ascii=False) + '\n')
33
<span style="color: rgba(0, 128, 128, 1)">34</span> <span style="color: rgba(0, 0, 255, 1)">def</span><span style="color: rgba(0, 0, 0, 1)"> main(page):
</span><span style="color: rgba(0, 128, 128, 1)">35</span> url = <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">http://bang.dangdang.com/books/fivestars/01.00.00.00.00.00-recent30-0-0-1-</span><span style="color: rgba(128, 0, 0, 1)">'</span> +<span style="color: rgba(0, 0, 0, 1)"> str(page)
</span><span style="color: rgba(0, 128, 128, 1)">36</span> html =<span style="color: rgba(0, 0, 0, 1)"> request_dandan(url)
</span><span style="color: rgba(0, 128, 128, 1)">37</span> items =<span style="color: rgba(0, 0, 0, 1)"> parse_result(html)
</span><span style="color: rgba(0, 128, 128, 1)">38</span> <span style="color: rgba(0, 0, 255, 1)">for</span> item <span style="color: rgba(0, 0, 255, 1)">in</span><span style="color: rgba(0, 0, 0, 1)"> items:
</span><span style="color: rgba(0, 128, 128, 1)">39</span> <span style="color: rgba(0, 0, 0, 1)"> write_item_to_file(item)
</span><span style="color: rgba(0, 128, 128, 1)">40</span>
<span style="color: rgba(0, 128, 128, 1)">41</span> <span style="color: rgba(0, 0, 255, 1)">if</span> <span style="color: rgba(128, 0, 128, 1)">__name__</span> == <span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">__main__</span><span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(0, 0, 0, 1)">:
</span><span style="color: rgba(0, 128, 128, 1)">42</span> <span style="color: rgba(0, 0, 255, 1)">for</span> i <span style="color: rgba(0, 0, 255, 1)">in</span> range(1,5<span style="color: rgba(0, 0, 0, 1)">):
</span><span style="color: rgba(0, 128, 128, 1)">43</span> main(i) </pre>
</div>
<p><strong><span style="font-size: 18px"> The part marked in red is what was changed. Give it a try; the newly generated book.txt file now contains the following: </span></strong></p>
<p><img src="https://img2018.cnblogs.com/blog/1539443/201909/1539443-20190908005136976-174955328.png"></p>
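<p><strong><span style="font-size: 18px"> Since re.findall returns a list of tuples when the pattern has several groups, each line written to book.txt is a JSON array without field names. Below is a minimal sketch of one way to keep names like those in the commented-out code, by zipping them with each tuple. The two-group pattern and the sample HTML are made up for illustration and are not taken from the Dangdang page.</span></strong></p>

```python
import json
import re

# Hypothetical field names and pattern, for illustration only.
FIELDS = ("range", "title")
PATTERN = re.compile(r'<li>(\d+)\..*?title="(.*?)"', re.S)


def parse_result(html):
    """Return one dict per match instead of a bare tuple."""
    return [dict(zip(FIELDS, match)) for match in PATTERN.findall(html)]


# Made-up sample input standing in for a downloaded page.
sample = '<li>1. <a title="Book A"></a></li><li>2. <a title="Book B"></a></li>'
books = parse_result(sample)
for book in books:
    # Each dict serializes to a JSON object with named fields.
    print(json.dumps(book, ensure_ascii=False))
```

<p><strong><span style="font-size: 18px"> Because this version builds the whole list before returning, it behaves like the fixed code above rather than like a generator.</span></strong></p>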
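<p><strong><span style="font-size: 18px"> With that, the code is fixed. Below are some additional notes for anyone who is interested:</span></strong></p>
<p><strong><span style="font-size: 18px"> The original, faulty code used yield. I had met the yield keyword before when writing coroutines in C#, so how is yield used in Python?</span></strong></p>
<p><strong><span style="font-size: 18px"> In fact, once a function body contains a yield statement, the function behaves differently: calling it does not run the body at all. The call only creates and returns a generator object, which you can think of as a suspended copy of the function. To actually run the code you have to drive the generator with next() (a for loop does this for you). Each call to next() runs the body until it reaches a yield, which behaves like a return in that it hands a value back to the caller. Unlike return, though, the generator remembers where it stopped, and the following call to next() resumes from that yield and continues with the remaining code.</span></strong></p>
<p><strong><span style="font-size: 18px"> Does that feel a bit abstract? Don't worry: I have picked out a reference for you. Read the article below and you should quickly understand how yield is used:</span></strong></p>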
<p><strong><span style="font-size: 18px"> https://www.cnblogs.com/gausstu/p/9545519.html</span></strong></p>
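<p><strong><span style="font-size: 18px"> As a small illustration of the behaviour described above, here is a minimal sketch. The countdown function and the log list are made-up names, not part of the crawler. It records when the body actually runs, to show that calling the function runs nothing and that each next() resumes from the previous yield.</span></strong></p>

```python
log = []  # records when the generator body actually executes


def countdown(n):
    """Generator function: the body does not run until next() is called."""
    log.append("started")
    while n > 0:
        yield n  # hand n back to the caller and pause here
        log.append(f"resumed after {n}")
        n -= 1


gen = countdown(2)          # nothing has run yet; gen is a generator object
ran_on_call = list(log)     # still empty at this point

first = next(gen)           # runs the body up to the first yield
second = next(gen)          # resumes right after the first yield
rest = list(gen)            # runs to the end; no more values are produced

print(ran_on_call, first, second, rest)
print(log)
```

<p><strong><span style="font-size: 18px"> A for loop over countdown(2) would call next() in the same way until the generator is exhausted.</span></strong></p>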
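<p> <strong><span style="font-size: 18px">That covers most of what I wanted to say. One last small point: when writing code, the opening brace that follows return or yield cannot be moved to the next line. For example, in Python:</span></strong></p>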
<div class="cnblogs_code">
<pre><span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> {
</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">range</span><span style="color: rgba(128, 0, 0, 1)">'</span>: 1<span style="color: rgba(0, 0, 0, 1)">,
</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">image</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">:here
}</span></pre>
</div>
<div class="cnblogs_code">
<pre><span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)">
{
</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">range</span><span style="color: rgba(128, 0, 0, 1)">'</span>: 1<span style="color: rgba(0, 0, 0, 1)">,
</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">image</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">:here
}</span></pre>
</div>
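<p><strong><span style="font-size: 18px">The only difference between the two is that the second puts the opening brace on the following line, but in Python that breaks the code. The statement ends at the line break after return, so the function returns None and the dictionary is never returned; if the indentation of the brace does not match the surrounding block, Python raises an IndentationError instead. Keep this in mind.</span></strong></p>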
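<p><strong><span style="font-size: 18px"> The difference can be checked with a short sketch. The two function names below are made up for illustration; the second function is valid syntax when the brace is indented to match the block, yet it quietly returns None.</span></strong></p>

```python
def same_line():
    # The brace opens on the same line, so the dict is the return value.
    return {
        'range': 1,
    }


def next_line():
    # "return" ends at the line break and returns None;
    # the dict below is unreachable code.
    return
    {
        'range': 1,
    }


print(same_line())   # {'range': 1}
print(next_line())   # None
```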
<p> </p>
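<p> <strong><span style="font-size: 18px">When I used to search other people's blogs for answers, I found fault with all of them: this one was unclear, that one was too brief. Now I finally understand why. Writing a blog post really is a lot of work. For example, something you can absorb in one day basically takes another full day to write up clearly enough for others to follow. That used to be fine: I was in graduate school and had plenty of time to record things. Now that I am working I have far less. I learn a lot at work but have no time to write it down; after all, if all my spare time went into blogging, there would be none left for learning more.</span></strong></p>
<p><strong><span style="font-size: 18px"> Many people compensate by lowering the quality of their posts: a problem with many details gets brushed off in a sentence or two, so nobody but the author can follow it, and anyone who tries the steps hits one pitfall after another. Honestly, this is rather selfish. For the person taking notes, a few lines that recall the key points may really be enough. But for people who hit a problem and go searching for a solution, it is torture. I am sure everyone has been through this: you run into a problem in a project, search online for a fix, and half a day later all you have found is unreliable, incomplete answers, having wasted both time and energy.</span></strong></p>
<p><strong><span style="font-size: 18px"> In my view, even given the reasons above, this practice is inexcusable. As junk information keeps piling up, the cost of finding useful information rises for everyone, and in the end everybody suffers.</span></strong></p>
<p><strong><span style="font-size: 18px"> Unfortunately I cannot change any of this. All I can do is describe my own exploration as clearly as possible on my blog and show every detail to those who come to read it. After all, you have spent your time reading my blog, and I should not let you down.</span></strong></p>
<p><strong><span style="font-size: 18px"> All right, that was just a bit of grumbling, so don't mind it. When I have time I will gradually write up the problems I run into at work and how I solved them. It may be slow, but at least it will be detailed.</span></strong></p>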
<p> </p>
<p> </p><br><br>
Source: https://www.cnblogs.com/czw52460183/p/11484001.html