Day 01: Data Analysis with pandas
<p> </p><div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<h3 id="%E5%B1%B1%E9%A1%B6%E4%BC%9A%E8%AF%BE%E7%A8%8B%E6%A6%82%E8%BF%B0">山顶会 Course Overview</h3>
<ul>
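<li>Theme: data analysis plus development, to suit taking on many kinds of freelance work</li>
<li>Course scheduling</li>
<li>Outline of the course content</li>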
</ul>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<h3 id="%E6%95%B0%E6%8D%AE%E5%88%86%E6%9E%90%E5%9F%BA%E6%9C%AC%E6%A6%82%E8%BF%B0">Data Analysis: A Basic Overview</h3>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<h4 id="%E4%BB%80%E4%B9%88%E6%98%AF%E6%95%B0%E6%8D%AE%E5%88%86%E6%9E%90%EF%BC%9F">What Is Data Analysis?</h4>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
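<p>Data analysis means applying effective methods and tools to collected data in order to uncover its key trends or patterns, so that sound decisions can be made and targeted recommendations proposed. Put simply, data analysis is about finding useful information in data to help us make wiser, more accurate decisions. <img src="https://img2024.cnblogs.com/blog/2867340/202504/2867340-20250413191806011-1102061161.png" alt="image-2.png"></p>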
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
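<p>In short, data analysis is the process of turning data into understandable information and insight that informs business decisions, helps solve problems, and improves efficiency. In essence, every decision, strategy, and plan needs to be driven by data, and data analysis plays a prominent role in that process.</p>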
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<p>Data analysis in one sentence: it maximizes the value of data!</p>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<h3 id="%E6%95%B0%E6%8D%AE%E5%88%86%E6%9E%90%E7%9A%84%E6%8A%80%E6%9C%AF%E5%AE%9E%E7%8E%B0">Technical Approaches to Data Analysis</h3>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<h4 id="%E4%B8%8D%E5%86%99%E4%BB%A3%E7%A0%81%E7%9A%84%E5%AE%9E%E7%8E%B0">Approaches Without Code</h4>
<p>Suited to analysis tasks with simple or moderately difficult business logic.</p>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<ul>
<li>Excel</li>
<li>MySQL</li>
<li>BI tools (Power BI or Tableau)</li>
</ul>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<h4 id="%E5%86%99%E4%BB%A3%E7%A0%81%E7%9A%84%E5%AE%9E%E7%8E%B0">Approaches With Code</h4>
<p>Suited to analysis tasks of moderate difficulty as well as complex business logic.</p>
<ul>
<li>Python
<ul>
<li>The three mainstays of data analysis: NumPy, pandas, and Matplotlib</li>
</ul>
</li>
</ul>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<ul>
<li>Installing the development environment:
<ul>
<li>Link: https://pan.baidu.com/s/1xI-RafNRZKDQPI7WMSmd2A?pwd=6x2v Extraction code: 6x2v</li>
<li>Anaconda: an integrated environment for data analysis (it bundles a wide range of data analysis modules)</li>
</ul>
</li>
</ul>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<h3 id="Numpy">NumPy</h3>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
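<p>NumPy (Numerical Python) is an important Python package for data analysis, machine learning, and scientific computing. It is one of the foundational libraries for scientific computing in Python and is used mostly for numerical operations. <img src="https://img2024.cnblogs.com/blog/2867340/202504/2867340-20250413191806029-1221175317.png" alt="image.png"></p>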
</div>
</div>
</div>
</div>
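<p>As a rough illustration of the numerical operations mentioned above, here is a minimal sketch, assuming NumPy is installed. The array name and the sales figures are invented for this example and do not come from the course material.</p>

```python
import numpy as np

# Hypothetical sample data: daily sales figures (values invented for illustration)
sales = np.array([120, 135, 150, 110, 160])

# Vectorized arithmetic: the multiplication applies to every element at once
sales_with_tax = sales * 1.1

# Common aggregate statistics computed over the whole array
total = sales.sum()
average = sales.mean()
highest = sales.max()

print(total, average, highest)  # 675 135.0 160
```

<p>The point of the sketch is that no explicit loop is needed: arithmetic and aggregation work on the whole array in one call.</p>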
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<h3 id="Pandas%EF%BC%88%E9%87%8D%E7%82%B9%EF%BC%89">Pandas (Key Topic)</h3>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
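<p>pandas is a free, open-source third-party Python library and an essential tool for data analysis and machine learning in Python. It provides high-performance, easy-to-use data structures for data analysis, namely Series and DataFrame. Since its release, pandas has been applied in many fields, such as finance, statistics, social science, and construction engineering.</p>
<p>pandas is built on top of the NumPy library, so it works well alongside Python's other scientific computing libraries. Its two data structures, Series (a one-dimensional array structure) and DataFrame (a two-dimensional tabular structure), greatly strengthen its data analysis capabilities.</p>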
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
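<p>About data structures: data is stored in containers represented by different data structures, and the properties of each container determine which kinds of computation can be performed on the data along its different dimensions.</p>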
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<p><img src="https://img2024.cnblogs.com/blog/2867340/202504/2867340-20250413191806022-1482106819.png" alt="image.png"></p>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<h3 id="Matplotlib">Matplotlib</h3>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
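<p>Matplotlib is a Python library for creating visualizations. It offers a broad set of features for drawing line charts, scatter plots, bar charts, pie charts, contour plots, heat maps, and many other chart types.</p>
<p>Matplotlib is powerful and flexible, and it is widely used in data visualization, scientific computing, engineering drawing, and similar fields.</p>
<p>In data analysis, charts are used mainly for exploring data and for presenting analysis results.</p>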
</div>
</div>
</div>
</div>
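<p>To make the description above concrete, here is a minimal sketch of a line chart, assuming Matplotlib is installed. The month labels and revenue numbers are made up, and the non-interactive <code>Agg</code> backend plus an in-memory buffer are choices made for this example so that it runs without a display; in a notebook you would more likely call <code>plt.show()</code>.</p>

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt

# Hypothetical data, invented for illustration
months = ["Jan", "Feb", "Mar", "Apr"]
revenue = [10, 12, 9, 15]

fig, ax = plt.subplots()
ax.plot(months, revenue, marker="o")  # draw one line with a marker at each point
ax.set_title("Monthly revenue (made-up data)")
ax.set_xlabel("Month")
ax.set_ylabel("Revenue")

# Render the figure as PNG bytes into memory instead of writing a file
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```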
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<p><img src="https://img2024.cnblogs.com/blog/2867340/202504/2867340-20250413191727989-999956350.png" alt=""></p>
<p> </p>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
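<h2 id="pandas%E6%95%B0%E6%8D%AE%E5%88%86%E6%9E%90%E5%BA%93">The pandas Data Analysis Library</h2>
<p>pandas provides two data structures, Series (a one-dimensional array structure) and DataFrame (a two-dimensional array structure), which greatly strengthen its data analysis capabilities.</p>
<h3 id="Series">Series</h3>
<h4 id="%E6%A6%82%E8%BF%B0">Overview</h4>
<p>A Series is an object similar to a one-dimensional array, made up of two parts:</p>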
<ul>
<li>values: a set of data</li>
<li>index: the index labels associated with the data</li>
</ul>
<h4 id="%E5%B8%B8%E8%A7%81%E6%93%8D%E4%BD%9C">Common Operations</h4>
<ul>
<li>Ways to create a Series
<ul>
<li>From a list</li>
<li>From a dictionary</li>
</ul>
</li>
</ul>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>import pandas as pd
from pandas import Series,DataFrame</code></pre>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>s1 = Series(data=[3, 3, 6, 6, 8, 8, 9, 9])
s1</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedText jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>0 3
1 3
2 6
3 6
4 8
5 8
6 9
7 9
dtype: int64</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>s2 = Series(data={'name':'bobo','salary':10000,'age':30})
s2</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedText jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>name bobo
salary 10000
age 30
dtype: object</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<ul>
<li>
<p>Series indexes</p>
<ul>
<li>Implicit index: the default index (0, 1, 2, ...)</li>
<li>Explicit index: a custom index, set through the index parameter
<ul>
<li>Purpose of an explicit index: it makes the data more readable</li>
</ul>
</li>
</ul>
</li>
</ul>
</div>
</div>
</div>
</div>
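<p>The difference between the two kinds of index can be sketched as follows, assuming pandas is installed. The variable names, scores, and subject labels are hypothetical and only serve the illustration.</p>

```python
import pandas as pd

# Without an index argument, pandas assigns the implicit index 0, 1, 2, ...
scores_implicit = pd.Series(data=[90, 85, 77])

# With index=..., each value gets a readable label (labels are hypothetical)
scores_explicit = pd.Series(data=[90, 85, 77], index=["math", "english", "physics"])

print(scores_implicit.index.tolist())  # [0, 1, 2]
print(scores_explicit["english"])      # 85, looked up by label
print(scores_explicit.iloc[1])         # 85, the same element looked up by position
```

<p>Setting an explicit index does not remove positional access: <code>.iloc</code> still reaches elements by position.</p>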
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>s1[[0, 2, 4]]  # access elements by implicit index</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedText jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>0 3
2 6
4 8
dtype: int64</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>s2['name']</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedText jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>'bobo'</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<ul>
<li>Indexing and slicing a Series</li>
</ul>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>s1[0:3]  # slice by implicit index; the end position is excluded</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedText jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>0 3
1 3
2 6
dtype: int64</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>s2['name':'age']</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedText jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>name bobo
salary 10000
age 30
dtype: object</code></pre>
</div>
</div>
</div>
</div>
</div>
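<p>Slicing by label, as in <code>s2['name':'age']</code> above, includes the end label, whereas slicing by position excludes the end position. The following is a minimal sketch with made-up data, using <code>.loc</code> and <code>.iloc</code> to make the choice explicit; the Series <code>s</code> is hypothetical and not part of the original notebook.</p>

```python
import pandas as pd

# Hypothetical Series with an explicit string index
s = pd.Series(data=[10, 20, 30, 40], index=["a", "b", "c", "d"])

# Position-based slice: the end position is excluded, as with Python lists
by_position = s.iloc[0:2]

# Label-based slice: the end label is included
by_label = s.loc["a":"c"]

print(by_position.tolist())  # [10, 20]
print(by_label.tolist())     # [10, 20, 30]
```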
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<ul>
<li>
<p>Common Series methods</p>
<ul>
<li>head(),tail()</li>
<li>unique(),nunique()</li>
<li>value_counts()</li>
<li>isnull(),notnull()</li>
</ul>
</li>
</ul>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>s1.head(3),s1.tail(2)  # show the first or the last few elements</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedText jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>(0 3
1 3
2 6
dtype: int64,
6 9
7 9
dtype: int64)</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>s1.unique()  # distinct values</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedText jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>array([3, 6, 8, 9])</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>s1.nunique()  # number of distinct values</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedText jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>4</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>s1.value_counts()  # how many times each value occurs</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedText jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>3 2
6 2
8 2
9 2
dtype: int64</code></pre>
</div>
</div>
</div>
</div>
</div>
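<p>The method list above also names <code>isnull()</code> and <code>notnull()</code>. A minimal sketch of how they might be used, assuming pandas and NumPy are installed; the Series <code>s</code> and its values are invented for this example.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical Series containing two missing values
s = pd.Series(data=[1.0, np.nan, 3.0, np.nan])

# isnull() marks missing entries with True; notnull() is its inverse
missing_mask = s.isnull()
present_mask = s.notnull()

# Summing a boolean mask counts the True entries
missing_count = int(missing_mask.sum())

# A boolean mask can be used to keep only the non-missing entries
clean = s[present_mask]

print(missing_count)   # 2
print(clean.tolist())  # [1.0, 3.0]
```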
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># Series arithmetic
s1 + 100  # add 100 to every element of s1</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedText jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>0 103
1 103
2 106
3 106
4 108
5 108
6 109
7 109
dtype: int64</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>s3 = Series(data=[1, 2, 3],index=['a','b','c'])
s3</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedText jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>a 1
b 2
c 3
dtype: int64</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>s4 = Series(data=[1, 2, 3],index=['a','b','d'])
s4</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedText jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>a 1
b 2
d 3
dtype: int64</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>s3 + s4  # NaN marks a missing value
# In Series arithmetic, only elements with matching index labels are combined; all other labels get NaN</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedText jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>a 2.0
b 4.0
c NaN
d NaN
dtype: float64</code></pre>
</div>
</div>
</div>
</div>
</div>
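<p>The NaN results above come from labels that exist in only one of the two Series. If a missing label should count as 0 instead, one option is <code>Series.add</code> with <code>fill_value</code>. This is a sketch under that assumption, not something the original notebook does; <code>left</code> and <code>right</code> are hypothetical names.</p>

```python
import pandas as pd

# Two hypothetical Series whose indexes only partly overlap
left = pd.Series(data=[1, 2, 3], index=["a", "b", "c"])
right = pd.Series(data=[1, 2, 3], index=["a", "b", "d"])

# The + operator yields NaN for labels present on only one side
plain_sum = left + right

# add() with fill_value=0 treats a label missing from one side as 0
filled_sum = left.add(right, fill_value=0)

print(int(plain_sum.isnull().sum()))  # 2
print(filled_sum.to_dict())           # {'a': 2.0, 'b': 4.0, 'c': 3.0, 'd': 3.0}
```

<p>Whether filling with 0 is appropriate depends on the data: for counts it is often reasonable, while for measurements a missing value may need to stay missing.</p>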
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<h3 id="DataFrame(%E9%87%8D%E8%A6%81)">DataFrame (Important)</h3>
<h4 id="%E6%A6%82%E8%BF%B0">Overview</h4>
<ul>
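<li>A DataFrame is a tabular data structure made up of multiple columns of data arranged in a fixed order. It was designed to extend the Series use case from one dimension to multiple dimensions. A DataFrame has both a row index and a column index.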
<ul>
<li>Row index: index</li>
<li>Column index: columns</li>
<li>Values: values</li>
</ul>
</li>
</ul>
<h4 id="DataFrame%E7%9A%84%E5%88%9B%E5%BB%BA">Creating a DataFrame</h4>
<ul>
<li>From a dictionary</li>
</ul>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>dic = {
'name':['Tom','Jerry','Jay'],
'age':[10, 20, 30],
'salary':[2000, 3000, 4000]
}
table = DataFrame(data=dic)
table</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>name</th><th>age</th><th>salary</th></tr>
</thead>
<tbody>
<tr><th>0</th>
<td>Tom</td>
<td>10</td>
<td>2000</td>
</tr>
<tr><th>1</th>
<td>Jerry</td>
<td>20</td>
<td>3000</td>
</tr>
<tr><th>2</th>
<td>Jay</td>
<td>30</td>
<td>4000</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
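<p>The table above uses the default row index 0, 1, 2. As a hedged sketch of how the row index and columns fit together, the same data can be given explicit row labels and then read back column by column. The labels <code>r1</code> to <code>r3</code> and the names <code>records</code>, <code>staff</code>, and <code>ages</code> are hypothetical.</p>

```python
import pandas as pd

# Same values as the table shown above
records = {
    "name": ["Tom", "Jerry", "Jay"],
    "age": [10, 20, 30],
    "salary": [2000, 3000, 4000],
}

# index=... replaces the default 0, 1, 2 row labels (labels here are hypothetical)
staff = pd.DataFrame(data=records, index=["r1", "r2", "r3"])

# Selecting one column returns a Series that shares the row index
ages = staff["age"]

print(staff.shape)          # (3, 3)
print(ages["r2"])           # 20
print(list(staff.columns))  # ['name', 'age', 'salary']
```

<p>Each dictionary key becomes a column name, and each list supplies that column's values in row order.</p>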
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<h4 id="DataFrame%E7%9A%84%E5%B8%B8%E7%94%A8%E5%B1%9E%E6%80%A7">Common DataFrame Attributes</h4>
<ul>
<li>values,columns,index,shape</li>
</ul>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>table.shape  # shape of the table as (rows, columns)</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedText jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>(3, 3)</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>table.values  # all values in the table, as a NumPy array</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedText jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>array([['Tom', 10, 2000],
['Jerry', 20, 3000],
['Jay', 30, 4000]], dtype=object)</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>table.index  # row index</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedText jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>RangeIndex(start=0, stop=3, step=1)</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>table.columns  # column index</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedText jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>Index(['name', 'age', 'salary'], dtype='object')</code></pre>
</div>
</div>
</div>
</div>
</div>
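<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<p>The four attributes above can be combined in one short sketch. The table below is rebuilt here only so the snippet runs on its own; it assumes the same three columns (name, age, salary) and values as the displayed output, and the variable names are illustrative.</p>

```python
import pandas as pd

# Hypothetical reconstruction of the table shown above
table = pd.DataFrame({
    "name": ["Tom", "Jerry", "Jay"],
    "age": [10, 20, 30],
    "salary": [2000, 3000, 4000],
})

n_rows, n_cols = table.shape          # shape is a (rows, columns) tuple
column_names = list(table.columns)    # column labels as a plain list
row_labels = list(table.index)        # the default RangeIndex yields 0, 1, 2
first_row = table.values[0].tolist()  # values is a 2-D NumPy array

print(n_rows, n_cols)
print(column_names)
print(row_labels)
print(first_row)
```

<p>Because the columns mix strings and integers, values falls back to dtype=object, which is why the array output above reports that dtype.</p>
</div>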
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<h4 id="%E7%B4%A2%E5%BC%95%E6%93%8D%E4%BD%9C%EF%BC%88%E9%87%8D%E7%82%B9%EF%BC%89">Indexing (key topic)¶</h4>
<ul>
<li>Indexing rows</li>
<li>Indexing columns</li>
<li>Indexing individual elements</li>
</ul>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>dic = {'names':['jay','tom','jerry'],
       'salary':[1000,2000,3000],
       'age':[30,40,50]}
df = DataFrame(data=dic,index=['a','b','c'])
df</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>names</th><th>salary</th><th>age</th></tr>
</thead>
<tbody>
<tr><th>a</th>
<td>jay</td>
<td>1000</td>
<td>30</td>
</tr>
<tr><th>b</th>
<td>tom</td>
<td>2000</td>
<td>40</td>
</tr>
<tr><th>c</th>
<td>jerry</td>
<td>3000</td>
<td>50</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># Select a single column: df[col]
df['age']</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedText jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>a 30
b 40
c 50
Name: age, dtype: int64</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># Select multiple columns: df[[col1, col2]]
df[['age','names']]</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>age</th><th>names</th></tr>
</thead>
<tbody>
<tr><th>a</th>
<td>30</td>
<td>jay</td>
</tr>
<tr><th>b</th>
<td>40</td>
<td>tom</td>
</tr>
<tr><th>c</th>
<td>50</td>
<td>jerry</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># Select a single row: df.loc[label] or df.iloc[position]
df.loc['a']</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedText jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>names jay
salary 1000
age 30
Name: a, dtype: object</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>df.iloc[0]  # iloc takes implicit (positional) indices; loc takes explicit (label) indices</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedText jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>names jay
salary 1000
age 30
Name: a, dtype: object</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># Select multiple rows: df.loc[[labels]] or df.iloc[[positions]]
df.loc[['b','a']]</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>names</th><th>salary</th><th>age</th></tr>
</thead>
<tbody>
<tr><th>b</th>
<td>tom</td>
<td>2000</td>
<td>40</td>
</tr>
<tr><th>a</th>
<td>jay</td>
<td>1000</td>
<td>30</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
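<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<p>To make the loc/iloc distinction concrete, here is a minimal sketch that fetches the same rows both ways. It assumes the df defined above (rebuilt here so the snippet is self-contained); the variable names are illustrative.</p>

```python
import pandas as pd

df = pd.DataFrame(
    {"names": ["jay", "tom", "jerry"],
     "salary": [1000, 2000, 3000],
     "age": [30, 40, 50]},
    index=["a", "b", "c"],
)

by_label = df.loc["a"]     # explicit index: the row labelled 'a'
by_position = df.iloc[0]   # implicit index: the first row, whatever its label

# Both expressions return the same row as a Series
same_row = by_label.equals(by_position)
print(same_row)

# A list of labels or positions returns a DataFrame, in the order requested
rows_by_label = df.loc[["b", "a"]]
rows_by_position = df.iloc[[1, 0]]
print(rows_by_label.equals(rows_by_position))
```

<p>A single label or position yields a Series, while a list (even a one-element list) yields a DataFrame.</p>
</div>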
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>df</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>names</th><th>salary</th><th>age</th></tr>
</thead>
<tbody>
<tr><th>a</th>
<td>jay</td>
<td>1000</td>
<td>30</td>
</tr>
<tr><th>b</th>
<td>tom</td>
<td>2000</td>
<td>40</td>
</tr>
<tr><th>c</th>
<td>jerry</td>
<td>3000</td>
<td>50</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># Select a single element
df.loc['b','names']  # row label to the left of the comma, column label to the right</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedText jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>'tom'</code></pre>
</div>
</div>
</div>
</div>
</div>
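<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<p>The same row, column pattern works with positions as well as labels. The sketch below assumes the df from this section (rebuilt so it runs standalone) and shows three equivalent ways to reach one cell, plus a small sub-table selection; at is pandas' scalar-only accessor and is included as an optional alternative, not something the notes above rely on.</p>

```python
import pandas as pd

df = pd.DataFrame(
    {"names": ["jay", "tom", "jerry"],
     "salary": [1000, 2000, 3000],
     "age": [30, 40, 50]},
    index=["a", "b", "c"],
)

cell_by_label = df.loc["b", "names"]   # row label, column label
cell_by_position = df.iloc[1, 0]       # row position, column position
cell_by_at = df.at["b", "names"]       # scalar-only label lookup

# Lists on both sides of the comma select a sub-table
sub_table = df.loc[["a", "b"], ["names", "age"]]

print(cell_by_label, cell_by_position, cell_by_at)
print(sub_table.shape)
```
</div>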
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<h4 id="%E5%88%87%E7%89%87%E6%93%8D%E4%BD%9C">Slicing¶</h4>
<ul>
<li>Slicing a range of rows</li>
<li>Slicing a range of columns</li>
</ul>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># Slice rows (a label slice includes both endpoints)
df['a':'c']</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>names</th><th>salary</th><th>age</th></tr>
</thead>
<tbody>
<tr><th>a</th>
<td>jay</td>
<td>1000</td>
<td>30</td>
</tr>
<tr><th>b</th>
<td>tom</td>
<td>2000</td>
<td>40</td>
</tr>
<tr><th>c</th>
<td>jerry</td>
<td>3000</td>
<td>50</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># Slice columns
df.loc[:,'names':'salary']</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>names</th><th>salary</th></tr>
</thead>
<tbody>
<tr><th>a</th>
<td>jay</td>
<td>1000</td>
</tr>
<tr><th>b</th>
<td>tom</td>
<td>2000</td>
</tr>
<tr><th>c</th>
<td>jerry</td>
<td>3000</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
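<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<p>One detail that is easy to miss in the slices above: label slices include the stop label, whereas positional slices exclude the stop position, as ordinary Python slices do. A minimal sketch, assuming the same df (rebuilt here so the snippet runs on its own):</p>

```python
import pandas as pd

df = pd.DataFrame(
    {"names": ["jay", "tom", "jerry"],
     "salary": [1000, 2000, 3000],
     "age": [30, 40, 50]},
    index=["a", "b", "c"],
)

label_rows = df.loc["a":"b"]    # label slice: 'b' is included
position_rows = df.iloc[0:2]    # positional slice: position 2 is excluded

label_cols = df.loc[:, "names":"salary"]  # 'salary' is included
position_cols = df.iloc[:, 0:2]           # column position 2 is excluded

print(list(label_rows.index), list(position_rows.index))
print(list(label_cols.columns), list(position_cols.columns))
```

<p>Both row slices return rows a and b, and both column slices return names and salary, even though the stop values differ.</p>
</div>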
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<h4 id="%E6%95%B0%E6%8D%AE%E6%9F%A5%E7%9C%8B">Inspecting data¶</h4>
<ul>
<li>View an overview and summary statistics of a DataFrame
<ul>
<li>head()</li>
<li>tail()</li>
<li>info()</li>
<li>describe()</li>
</ul>
</li>
</ul>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>df.info()  # basic information about the table</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt"> </div>
<div class="jp-RenderedText jp-OutputArea-output" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>&lt;class 'pandas.core.frame.DataFrame'&gt;
Index: 3 entries, a to c
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   names   3 non-null      object
 1   salary  3 non-null      int64
 2   age     3 non-null      int64
dtypes: int64(2), object(1)
memory usage: 204.0+ bytes</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>df.describe()  # summary statistics for the numeric columns</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>salary</th><th>age</th></tr>
</thead>
<tbody>
<tr><th>count</th>
<td>3.0</td>
<td>3.0</td>
</tr>
<tr><th>mean</th>
<td>2000.0</td>
<td>40.0</td>
</tr>
<tr><th>std</th>
<td>1000.0</td>
<td>10.0</td>
</tr>
<tr><th>min</th>
<td>1000.0</td>
<td>30.0</td>
</tr>
<tr><th>25%</th>
<td>1500.0</td>
<td>35.0</td>
</tr>
<tr><th>50%</th>
<td>2000.0</td>
<td>40.0</td>
</tr>
<tr><th>75%</th>
<td>2500.0</td>
<td>45.0</td>
</tr>
<tr><th>max</th>
<td>3000.0</td>
<td>50.0</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
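<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<p>head() and tail() are listed above but not demonstrated. The sketch below uses a hypothetical ten-row table (not part of the course data) to show their default and explicit row counts, and how to pull a single statistic out of the describe() result.</p>

```python
import pandas as pd

# Hypothetical ten-row table, used only for illustration
scores = pd.DataFrame({"score": range(10)})

first_rows = scores.head()    # first 5 rows by default
last_rows = scores.tail(3)    # last 3 rows

stats = scores.describe()     # a DataFrame indexed by statistic name
mean_score = stats.loc["mean", "score"]

print(len(first_rows), list(last_rows["score"]))
print(mean_score)
```

<p>Since describe() returns an ordinary DataFrame, the indexing techniques from the previous sections apply to it directly.</p>
</div>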
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<h4 id="%E6%95%B0%E6%8D%AE%E4%BF%9D%E5%AD%98%E4%B8%8E%E5%8A%A0%E8%BD%BD">Saving and loading data¶</h4>
<h5 id="csv">csv¶</h5>
<ul>
<li>to_csv() &amp; read_csv()</li>
</ul>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># Build the df that will be written to a file
dic = {'names':['jay','tom','jerry'],
       'salary':[1000,2000,3000],
       'age':[30,40,50]}
df = pd.DataFrame(data=dic,index=['a','b','c'])
df</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>names</th><th>salary</th><th>age</th></tr>
</thead>
<tbody>
<tr><th>a</th>
<td>jay</td>
<td>1000</td>
<td>30</td>
</tr>
<tr><th>b</th>
<td>tom</td>
<td>2000</td>
<td>40</td>
</tr>
<tr><th>c</th>
<td>jerry</td>
<td>3000</td>
<td>50</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>df.to_csv('./df.csv')</code></pre>
</div>
</div>
</div>
</div>
</div>
</div>
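<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<p>By default to_csv() also writes the row index as the first column, so reading the file back needs index_col=0 to restore it. A minimal round-trip sketch, assuming the df above; it writes to a temporary directory rather than ./df.csv so that it leaves no files behind, and the variable names are illustrative.</p>

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame(
    {"names": ["jay", "tom", "jerry"],
     "salary": [1000, 2000, 3000],
     "age": [30, 40, 50]},
    index=["a", "b", "c"],
)

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "df.csv")
    df.to_csv(csv_path)                            # the index is written as the first column
    naive = pd.read_csv(csv_path)                  # the index comes back as an unnamed column
    restored = pd.read_csv(csv_path, index_col=0)  # the first column becomes the index again

print(list(naive.columns))
print(list(restored.index))
```

<p>If the index carries no information, passing index=False to to_csv() avoids the extra column altogether.</p>
</div>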
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># Read the external file data/透视表-篮球赛.csv into a DataFrame
ball = pd.read_csv('data/透视表-篮球赛.csv')
ball</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>对手</th><th>胜负</th><th>主客场</th><th>命中</th><th>投篮数</th><th>投篮命中率</th><th>3分命中率</th><th>篮板</th><th>助攻</th><th>得分</th></tr>
</thead>
<tbody>
<tr><th>0</th>
<td>勇士</td>
<td>胜</td>
<td>客</td>
<td>10</td>
<td>23</td>
<td>0.435</td>
<td>0.444</td>
<td>6</td>
<td>11</td>
<td>27</td>
</tr>
<tr><th>1</th>
<td>国王</td>
<td>胜</td>
<td>客</td>
<td>8</td>
<td>21</td>
<td>0.381</td>
<td>0.286</td>
<td>3</td>
<td>9</td>
<td>27</td>
</tr>
<tr><th>2</th>
<td>小牛</td>
<td>胜</td>
<td>主</td>
<td>10</td>
<td>19</td>
<td>0.526</td>
<td>0.462</td>
<td>3</td>
<td>7</td>
<td>29</td>
</tr>
<tr><th>3</th>
<td>灰熊</td>
<td>负</td>
<td>主</td>
<td>8</td>
<td>20</td>
<td>0.400</td>
<td>0.250</td>
<td>5</td>
<td>8</td>
<td>22</td>
</tr>
<tr><th>4</th>
<td>76人</td>
<td>胜</td>
<td>客</td>
<td>10</td>
<td>20</td>
<td>0.500</td>
<td>0.250</td>
<td>3</td>
<td>13</td>
<td>27</td>
</tr>
<tr><th>5</th>
<td>黄蜂</td>
<td>胜</td>
<td>客</td>
<td>8</td>
<td>18</td>
<td>0.444</td>
<td>0.400</td>
<td>10</td>
<td>11</td>
<td>27</td>
</tr>
<tr><th>6</th>
<td>灰熊</td>
<td>负</td>
<td>客</td>
<td>6</td>
<td>19</td>
<td>0.316</td>
<td>0.222</td>
<td>4</td>
<td>8</td>
<td>20</td>
</tr>
<tr><th>7</th>
<td>76人</td>
<td>负</td>
<td>主</td>
<td>8</td>
<td>21</td>
<td>0.381</td>
<td>0.429</td>
<td>4</td>
<td>7</td>
<td>29</td>
</tr>
<tr><th>8</th>
<td>尼克斯</td>
<td>胜</td>
<td>客</td>
<td>9</td>
<td>23</td>
<td>0.391</td>
<td>0.353</td>
<td>5</td>
<td>9</td>
<td>31</td>
</tr>
<tr><th>9</th>
<td>老鹰</td>
<td>胜</td>
<td>客</td>
<td>8</td>
<td>15</td>
<td>0.533</td>
<td>0.545</td>
<td>3</td>
<td>11</td>
<td>29</td>
</tr>
<tr><th>10</th>
<td>爵士</td>
<td>胜</td>
<td>主</td>
<td>19</td>
<td>25</td>
<td>0.760</td>
<td>0.875</td>
<td>2</td>
<td>13</td>
<td>56</td>
</tr>
<tr><th>11</th>
<td>骑士</td>
<td>胜</td>
<td>主</td>
<td>8</td>
<td>21</td>
<td>0.381</td>
<td>0.429</td>
<td>11</td>
<td>13</td>
<td>35</td>
</tr>
<tr><th>12</th>
<td>灰熊</td>
<td>胜</td>
<td>主</td>
<td>11</td>
<td>25</td>
<td>0.440</td>
<td>0.429</td>
<td>4</td>
<td>8</td>
<td>38</td>
</tr>
<tr><th>13</th>
<td>步行者</td>
<td>胜</td>
<td>客</td>
<td>9</td>
<td>21</td>
<td>0.429</td>
<td>0.250</td>
<td>5</td>
<td>15</td>
<td>26</td>
</tr>
<tr><th>14</th>
<td>猛龙</td>
<td>负</td>
<td>主</td>
<td>8</td>
<td>25</td>
<td>0.320</td>
<td>0.273</td>
<td>6</td>
<td>11</td>
<td>38</td>
</tr>
<tr><th>15</th>
<td>太阳</td>
<td>胜</td>
<td>客</td>
<td>12</td>
<td>22</td>
<td>0.545</td>
<td>0.545</td>
<td>2</td>
<td>7</td>
<td>48</td>
</tr>
<tr><th>16</th>
<td>灰熊</td>
<td>胜</td>
<td>客</td>
<td>9</td>
<td>20</td>
<td>0.450</td>
<td>0.500</td>
<td>5</td>
<td>7</td>
<td>29</td>
</tr>
<tr><th>17</th>
<td>掘金</td>
<td>胜</td>
<td>主</td>
<td>6</td>
<td>16</td>
<td>0.375</td>
<td>0.143</td>
<td>8</td>
<td>9</td>
<td>21</td>
</tr>
<tr><th>18</th>
<td>尼克斯</td>
<td>胜</td>
<td>主</td>
<td>12</td>
<td>27</td>
<td>0.444</td>
<td>0.385</td>
<td>2</td>
<td>10</td>
<td>37</td>
</tr>
<tr><th>19</th>
<td>篮网</td>
<td>胜</td>
<td>主</td>
<td>13</td>
<td>20</td>
<td>0.650</td>
<td>0.615</td>
<td>10</td>
<td>8</td>
<td>37</td>
</tr>
<tr><th>20</th>
<td>步行者</td>
<td>胜</td>
<td>主</td>
<td>8</td>
<td>22</td>
<td>0.364</td>
<td>0.333</td>
<td>8</td>
<td>10</td>
<td>29</td>
</tr>
<tr><th>21</th>
<td>湖人</td>
<td>胜</td>
<td>客</td>
<td>13</td>
<td>22</td>
<td>0.591</td>
<td>0.444</td>
<td>4</td>
<td>9</td>
<td>36</td>
</tr>
<tr><th>22</th>
<td>爵士</td>
<td>胜</td>
<td>客</td>
<td>8</td>
<td>19</td>
<td>0.421</td>
<td>0.333</td>
<td>5</td>
<td>3</td>
<td>29</td>
</tr>
<tr><th>23</th>
<td>开拓者</td>
<td>胜</td>
<td>客</td>
<td>16</td>
<td>29</td>
<td>0.552</td>
<td>0.571</td>
<td>8</td>
<td>3</td>
<td>48</td>
</tr>
<tr><th>24</th>
<td>鹈鹕</td>
<td>胜</td>
<td>主</td>
<td>8</td>
<td>16</td>
<td>0.500</td>
<td>0.400</td>
<td>1</td>
<td>17</td>
<td>26</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<h5 id="Excel">Excel¶</h5>
<p>Environment setup:</p>
<p>pip install xlrd -i https://pypi.tuna.tsinghua.edu.cn/simple</p>
<p>pip install xlwt -i https://pypi.tuna.tsinghua.edu.cn/simple</p>
<p>pip install openpyxl -i https://pypi.tuna.tsinghua.edu.cn/simple</p>
<ul>
<li>to_excel(filePath, sheet_name) &amp; read_excel(filePath, sheet_name)
<ul>
<li>sheet_name: the name of the worksheet to write or read</li>
</ul>
</li>
</ul>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># Read Excel data from data/运营商数据.xlsx
opt = pd.read_excel('data/运营商数据.xlsx')
opt</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>用户号码</th><th>用户套餐月租</th><th>入网时间</th><th>近6个月平均话费</th><th>近6个月平均使用流量</th><th>近6个月平均使用语音</th><th>优惠名称</th><th>号码品牌</th><th>用户年龄</th><th>用户性别</th><th>是否订购</th><th>是否参与活动</th><th>活动开始时间</th><th>活动结束时间</th><th>外呼团队</th><th>外呼时间</th><th>外呼分钟数</th></tr>
</thead>
<tbody>
<tr><th>0</th>
<td>1</td>
<td>56</td>
<td>20020209</td>
<td>146.2050</td>
<td>9090.910500</td>
<td>398.3167</td>
<td>送3个月会员</td>
<td>4G</td>
<td>55</td>
<td>男</td>
<td>否</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>201911</td>
<td>91</td>
</tr>
<tr><th>1</th>
<td>2</td>
<td>50</td>
<td>20060424</td>
<td>50.0000</td>
<td>3980.592767</td>
<td>86.9000</td>
<td>送3个月会员</td>
<td>4G</td>
<td>51</td>
<td>男</td>
<td>否</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>201909</td>
<td>28</td>
</tr>
<tr><th>2</th>
<td>3</td>
<td>50</td>
<td>20111206</td>
<td>67.1125</td>
<td>1706.841767</td>
<td>453.0833</td>
<td>送3个月会员</td>
<td>4G</td>
<td>36</td>
<td>女</td>
<td>是</td>
<td>会员赠送3个月</td>
<td>201909.0</td>
<td>202008.0</td>
<td>团队D</td>
<td>201909</td>
<td>128</td>
</tr>
<tr><th>3</th>
<td>4</td>
<td>56</td>
<td>20120412</td>
<td>99.0000</td>
<td>2872.303067</td>
<td>41.3500</td>
<td>送3个月会员</td>
<td>4G</td>
<td>35</td>
<td>女</td>
<td>是</td>
<td>会员赠送3个月</td>
<td>201909.0</td>
<td>202008.0</td>
<td>团队D</td>
<td>201909</td>
<td>91</td>
</tr>
<tr><th>4</th>
<td>5</td>
<td>88</td>
<td>20150503</td>
<td>88.0000</td>
<td>28222.901100</td>
<td>326.3500</td>
<td>送3个月会员</td>
<td>4G</td>
<td>57</td>
<td>男</td>
<td>是</td>
<td>会员赠送3个月</td>
<td>201909.0</td>
<td>202008.0</td>
<td>团队D</td>
<td>201909</td>
<td>99</td>
</tr>
<tr><th>...</th>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr><th>16493</th>
<td>16494</td>
<td>49</td>
<td>20041014</td>
<td>49.0500</td>
<td>50.793967</td>
<td>57.2000</td>
<td>送3个月会员</td>
<td>4G</td>
<td>23</td>
<td>女</td>
<td>是</td>
<td>会员赠送3个月</td>
<td>201910.0</td>
<td>202009.0</td>
<td>团队D</td>
<td>201910</td>
<td>84</td>
</tr>
<tr><th>16494</th>
<td>16495</td>
<td>9</td>
<td>20060310</td>
<td>15.4250</td>
<td>554.286000</td>
<td>56.7667</td>
<td>送3个月会员</td>
<td>4G</td>
<td>47</td>
<td>女</td>
<td>否</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>201911</td>
<td>0</td>
</tr>
<tr><th>16495</th>
<td>16496</td>
<td>28</td>
<td>20020417</td>
<td>64.7350</td>
<td>0.002900</td>
<td>111.8833</td>
<td>送3个月会员</td>
<td>2G</td>
<td>61</td>
<td>男</td>
<td>否</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>201910</td>
<td>34</td>
</tr>
<tr><th>16496</th>
<td>16497</td>
<td>15</td>
<td>20121001</td>
<td>18.1750</td>
<td>186.963833</td>
<td>21.8333</td>
<td>送3个月会员</td>
<td>2G</td>
<td>28</td>
<td>男</td>
<td>否</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>201910</td>
<td>34</td>
</tr>
<tr><th>16497</th>
<td>16498</td>
<td>19</td>
<td>20171103</td>
<td>36.2250</td>
<td>3839.240000</td>
<td>149.6667</td>
<td>送3个月会员</td>
<td>4G</td>
<td>37</td>
<td>女</td>
<td>是</td>
<td>会员赠送3个月</td>
<td>201910.0</td>
<td>202009.0</td>
<td>团队D</td>
<td>201910</td>
<td>63</td>
</tr>
</tbody>
</table>
<p>16498 rows × 17 columns</p>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>opt.shape</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedText jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>(16498, 17)</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># Write data to an Excel file
dic = {'names':['jay','tom','jerry'],
       'salary':[1000,2000,3000],
       'age':[30,40,50]}
df = pd.DataFrame(data=dic,index=['a','b','c'])
df</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>names</th><th>salary</th><th>age</th></tr>
</thead>
<tbody>
<tr><th>a</th>
<td>jay</td>
<td>1000</td>
<td>30</td>
</tr>
<tr><th>b</th>
<td>tom</td>
<td>2000</td>
<td>40</td>
</tr>
<tr><th>c</th>
<td>jerry</td>
<td>3000</td>
<td>50</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>df.to_excel('dic.xlsx')</code></pre>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<ul>
<li>Why load data from external files into a DataFrame?
<ul>
<li>Once the data is in a DataFrame, we can use the DataFrame's own features to compute on and process the data along different dimensions.</li>
</ul>
</li>
</ul>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<h5 id="sql">SQL¶</h5>
<p>Environment setup:</p>
<p>pip install sqlalchemy -i https://pypi.tuna.tsinghua.edu.cn/simple</p>
<p>pip install pymysql -i https://pypi.tuna.tsinghua.edu.cn/simple</p>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<ul>
<li>Write data to a database
<ul>
<li>from sqlalchemy import create_engine</li>
<li>Create a connection (engine) object:
<ul>
<li>conn = create_engine('mysql+pymysql://root:boboadmin@127.0.0.1:3306/new_spider?charset=UTF8MB4')</li>
</ul>
</li>
</ul>
</li>
</ul>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>dic = {'names':['jay','tom','jerry'],
       'salary':[1000,2000,3000],
       'age':[30,40,50]}
df = pd.DataFrame(data=dic)
df</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>names</th><th>salary</th><th>age</th></tr>
</thead>
<tbody>
<tr><th>0</th>
<td>jay</td>
<td>1000</td>
<td>30</td>
</tr>
<tr><th>1</th>
<td>tom</td>
<td>2000</td>
<td>40</td>
</tr>
<tr><th>2</th>
<td>jerry</td>
<td>3000</td>
<td>50</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>from sqlalchemy import create_engine
# Create an engine (connection) object
# URL format: mysql+pymysql://username:password@ip:port/dbName?charset=UTF8MB4
conn = create_engine('mysql+pymysql://root:boboadmin@127.0.0.1:3306/new_spider?charset=UTF8MB4')
df.to_sql(name='tb_df_new',con=conn)</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedText jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>3</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<ul>
<li>Read data from the database</li>
</ul>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>import pymysql
conn = pymysql.Connect(
host = '127.0.0.1', # database server address
port = 3306, # database port
user = 'root', # database username
password = 'boboadmin', # password
db = 'new_spider' # database name
)
ret = pd.read_sql('select * from dep',conn)
ret</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt"> </div>
<div class="jp-RenderedText jp-OutputArea-output" data-mime-type="application/vnd.jupyter.stderr">
<pre class="highlighter-hljs"><code>/Users/zhangxiaobo/opt/arm-anaconda/anaconda3/lib/python3.9/site-packages/pandas/io/sql.py:761: UserWarning: pandas only support SQLAlchemy connectable(engine/connection) ordatabase string URI or sqlite3 DBAPI2 connectionother DBAPI2 objects are not tested, please consider using SQLAlchemy
warnings.warn(</code></pre>
</div>
</div>
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>id</th><th>name</th></tr>
</thead>
<tbody>
<tr><th>0</th>
<td>200</td>
<td>技术</td>
</tr>
<tr><th>1</th>
<td>201</td>
<td>人力资源</td>
</tr>
<tr><th>2</th>
<td>202</td>
<td>销售</td>
</tr>
<tr><th>3</th>
<td>203</td>
<td>运营</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
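<p>The write and read steps above need a running MySQL server. As a rough sketch for trying the same <code>to_sql</code> / <code>read_sql</code> round trip without one, the example below swaps in an in-memory SQLite database from the standard library. This is an assumption made for illustration, not the setup used in these notes, and the table name <code>tb_df_demo</code> is hypothetical.</p>

```python
import sqlite3

import pandas as pd

# Assumption: an in-memory SQLite database stands in for the MySQL server.
conn = sqlite3.connect(":memory:")

df_demo = pd.DataFrame({"names": ["jay", "tom", "jerry"],
                        "salary": [1000, 2000, 3000],
                        "age": [30, 40, 50]})

# index=False skips writing the row index as an extra column;
# if_exists="replace" overwrites the table if it is already there.
df_demo.to_sql(name="tb_df_demo", con=conn, index=False, if_exists="replace")

# Read the table back with a plain SQL query.
ret = pd.read_sql("select * from tb_df_demo", conn)
print(ret)

conn.close()
```

<p>pandas accepts a <code>sqlite3</code> connection directly, so this version should not trigger the DBAPI2 warning shown above. For MySQL, passing the SQLAlchemy engine to <code>read_sql</code> instead of the raw <code>pymysql</code> connection is the route that warning recommends.</p>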
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<h4 id="%E8%82%A1%E7%A5%A8%E5%88%86%E6%9E%90%E6%A1%88%E4%BE%8B">Stock analysis case study¶</h4>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>import pandas as pd
# Read the local file ./data/600519.xlsx into df
df = pd.read_excel('data/600519.xlsx')
df.head()</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>Unnamed: 0</th><th>date</th><th>open</th><th>close</th><th>high</th><th>low</th><th>volume</th><th>code</th></tr>
</thead>
<tbody>
<tr><th>0</th>
<td>0</td>
<td>2015-01-05</td>
<td>24.096</td>
<td>35.823</td>
<td>37.387</td>
<td>23.250</td>
<td>94515.0</td>
<td>600519</td>
</tr>
<tr><th>1</th>
<td>1</td>
<td>2015-01-06</td>
<td>33.532</td>
<td>31.560</td>
<td>35.860</td>
<td>29.914</td>
<td>55020.0</td>
<td>600519</td>
</tr>
<tr><th>2</th>
<td>2</td>
<td>2015-01-07</td>
<td>29.932</td>
<td>27.114</td>
<td>33.078</td>
<td>24.432</td>
<td>54797.0</td>
<td>600519</td>
</tr>
<tr><th>3</th>
<td>3</td>
<td>2015-01-08</td>
<td>28.078</td>
<td>26.041</td>
<td>28.550</td>
<td>24.569</td>
<td>40525.0</td>
<td>600519</td>
</tr>
<tr><th>4</th>
<td>4</td>
<td>2015-01-09</td>
<td>24.805</td>
<td>24.723</td>
<td>29.687</td>
<td>24.541</td>
<td>53982.0</td>
<td>600519</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># Drop the unneeded column
df.drop(columns='Unnamed: 0',inplace=True) # inplace=True applies the drop to the original DataFrame</code></pre>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>df.shape</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedText jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>(2153, 7)</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>df.head()</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>date</th><th>open</th><th>close</th><th>high</th><th>low</th><th>volume</th><th>code</th></tr>
</thead>
<tbody>
<tr><th>0</th>
<td>2015-01-05</td>
<td>24.096</td>
<td>35.823</td>
<td>37.387</td>
<td>23.250</td>
<td>94515.0</td>
<td>600519</td>
</tr>
<tr><th>1</th>
<td>2015-01-06</td>
<td>33.532</td>
<td>31.560</td>
<td>35.860</td>
<td>29.914</td>
<td>55020.0</td>
<td>600519</td>
</tr>
<tr><th>2</th>
<td>2015-01-07</td>
<td>29.932</td>
<td>27.114</td>
<td>33.078</td>
<td>24.432</td>
<td>54797.0</td>
<td>600519</td>
</tr>
<tr><th>3</th>
<td>2015-01-08</td>
<td>28.078</td>
<td>26.041</td>
<td>28.550</td>
<td>24.569</td>
<td>40525.0</td>
<td>600519</td>
</tr>
<tr><th>4</th>
<td>2015-01-09</td>
<td>24.805</td>
<td>24.723</td>
<td>29.687</td>
<td>24.541</td>
<td>53982.0</td>
<td>600519</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>df.info()</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt"> </div>
<div class="jp-RenderedText jp-OutputArea-output" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>&lt;class 'pandas.core.frame.DataFrame'&gt;
RangeIndex: 2153 entries, 0 to 2152
Data columns (total 7 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   date    2153 non-null   object
 1   open    2153 non-null   float64
 2   close   2153 non-null   float64
 3   high    2153 non-null   float64
 4   low     2153 non-null   float64
 5   volume  2153 non-null   float64
 6   code    2153 non-null   int64
dtypes: float64(5), int64(1), object(1)
memory usage: 117.9+ KB</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
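<pre class="highlighter-hljs"><code># Convert the date column to a datetime type
df['date'] = df['date'].astype('datetime64[ns]') # astype performs the type conversion; recent pandas versions require an explicit unit such as [ns]</code></pre>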
</div>
</div>
</div>
</div>
</div>
</div>
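<p>As an alternative to <code>astype</code>, <code>pd.to_datetime</code> is often used for this conversion because it can report or tolerate malformed values. The sketch below uses made-up strings, not the stock data, to show the <code>errors="coerce"</code> behaviour.</p>

```python
import pandas as pd

# Hypothetical date strings; the last one is deliberately malformed.
raw = pd.Series(["2015-01-05", "2015-01-06", "not a date"])

# errors="coerce" turns unparseable values into NaT instead of raising.
parsed = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")

print(parsed)
print(parsed.isna().sum())  # number of values that failed to parse
```

<p>Counting the <code>NaT</code> values afterwards is a cheap way to check whether a column contained dates in an unexpected format.</p>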
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>df.info()</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt"> </div>
<div class="jp-RenderedText jp-OutputArea-output" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>&lt;class 'pandas.core.frame.DataFrame'&gt;
RangeIndex: 2153 entries, 0 to 2152
Data columns (total 7 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   date    2153 non-null   datetime64[ns]
 1   open    2153 non-null   float64
 2   close   2153 non-null   float64
 3   high    2153 non-null   float64
 4   low     2153 non-null   float64
 5   volume  2153 non-null   float64
 6   code    2153 non-null   int64
dtypes: datetime64[ns](1), float64(5), int64(1)
memory usage: 117.9 KB</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># Advantage of an explicit index: it makes the data more readable
# Use the date column as the row index
df.set_index('date',inplace=True)</code></pre>
</div>
</div>
</div>
</div>
</div>
</div>
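<p>Beyond readability, a datetime row index also allows date-based selection with <code>.loc</code>. The sketch below illustrates this on a tiny made-up table; the prices are hypothetical and only the structure mirrors the stock data.</p>

```python
import pandas as pd

# Hypothetical prices; only the structure mirrors the stock table.
demo = pd.DataFrame(
    {"date": ["2015-01-05", "2015-01-06", "2015-02-02"],
     "close": [35.8, 31.6, 27.1]}
)
demo["date"] = pd.to_datetime(demo["date"])
demo = demo.set_index("date")

# Partial-string indexing: select every row from January 2015.
january = demo.loc["2015-01"]

# Label-based slicing between two dates (both ends inclusive).
first_two_days = demo.loc["2015-01-05":"2015-01-06"]

print(january)
print(first_two_days)
```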
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>df.head()</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>open</th><th>close</th><th>high</th><th>low</th><th>volume</th><th>code</th></tr>
<tr><th>date</th><th> </th><th> </th><th> </th><th> </th><th> </th><th> </th></tr>
</thead>
<tbody>
<tr><th>2015-01-05</th>
<td>24.096</td>
<td>35.823</td>
<td>37.387</td>
<td>23.250</td>
<td>94515.0</td>
<td>600519</td>
</tr>
<tr><th>2015-01-06</th>
<td>33.532</td>
<td>31.560</td>
<td>35.860</td>
<td>29.914</td>
<td>55020.0</td>
<td>600519</td>
</tr>
<tr><th>2015-01-07</th>
<td>29.932</td>
<td>27.114</td>
<td>33.078</td>
<td>24.432</td>
<td>54797.0</td>
<td>600519</td>
</tr>
<tr><th>2015-01-08</th>
<td>28.078</td>
<td>26.041</td>
<td>28.550</td>
<td>24.569</td>
<td>40525.0</td>
<td>600519</td>
</tr>
<tr><th>2015-01-09</th>
<td>24.805</td>
<td>24.723</td>
<td>29.687</td>
<td>24.541</td>
<td>53982.0</td>
<td>600519</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>df.shape</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedText jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>(2153, 6)</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<ul>
<li>
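<p>Compute the stock's daily return and 7-day volatility: with these two measures we can assess the stock's risk and return.</p>
<ul>
<li>Daily return: (today's closing price - previous day's closing price) / previous day's closing price
<ul>
<li>shift(): shifts a series of values forward or backward by a given number of positions</li>
</ul>
</li>
<li>7-day volatility: the rolling variance of the daily returns over a 7-day window
<ul>
<li>rolling(): defines the rolling window</li>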
</ul>
</li>
</ul>
</li>
</ul>
</div>
</div>
</div>
</div>
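<p>To make the two definitions above concrete, here is a small sketch on four made-up closing prices, with a window of 2 instead of 7 so the arithmetic can be checked by hand. The numbers are hypothetical. <code>pct_change()</code> is shown only as an equivalent shortcut for the shift-based formula.</p>

```python
import pandas as pd

# Hypothetical closing prices, chosen so the arithmetic is easy to follow.
close = pd.Series([100.0, 110.0, 99.0, 99.0])

# shift(1) moves every value down one position, so each row sees the
# previous row's value; the first row has no predecessor and becomes NaN.
prev_close = close.shift(1)
daily_return = (close - prev_close) / prev_close

# pct_change() computes the same quantity in a single call.
same_return = close.pct_change()

# rolling(2).var() is the sample variance over each window of 2 values;
# windows that contain a NaN or too few values yield NaN.
rolling_var = daily_return.rolling(2).var()

print(daily_return.round(4).tolist())
print(rolling_var.round(4).tolist())
```

<p>Because the first return is <code>NaN</code>, the first complete window of returns ends one row later than the first complete window of prices. This is why the 7-day result in the stock data is <code>NaN</code> for the first seven rows rather than the first six.</p>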
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>df['close'].shift(1) # previous day's closing price</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedText jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>date
2015-01-05 NaN
2015-01-06 35.823
2015-01-07 31.560
2015-01-08 27.114
2015-01-09 26.041
...
2023-11-03 1779.500
2023-11-06 1811.240
2023-11-07 1812.000
2023-11-08 1791.170
2023-11-09 1798.340
Name: close, Length: 2153, dtype: float64</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># Daily return: (today's closing price - previous day's closing price) / previous day's closing price
day_rate = (df['close'] - df['close'].shift(1)) / df['close'].shift(1)
day_rate</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedText jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>date
2015-01-05 NaN
2015-01-06 -0.119002
2015-01-07 -0.140875
2015-01-08 -0.039574
2015-01-09 -0.050612
...
2023-11-03 0.017836
2023-11-06 0.000420
2023-11-07 -0.011496
2023-11-08 0.004003
2023-11-09 -0.002352
Name: close, Length: 2153, dtype: float64</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># 7-day volatility
day_7_rolling_rate = day_rate.rolling(7).var() # var computes the variance of the values in each window
day_7_rolling_rate </code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedText jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>date
2015-01-05 NaN
2015-01-06 NaN
2015-01-07 NaN
2015-01-08 NaN
2015-01-09 NaN
...
2023-11-03 0.000457
2023-11-06 0.000442
2023-11-07 0.000513
2023-11-08 0.000510
2023-11-09 0.000525
Name: close, Length: 2153, dtype: float64</code></pre>
</div>
</div>
</div>
</div>
</div>
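<p>The cell above measures volatility with the rolling variance. Volatility is also often reported as the rolling standard deviation, which is simply the square root of that variance. The sketch below, on made-up returns and with a window of 3 chosen only to keep the example short, shows both side by side and why the leading entries are NaN.</p>

```python
import pandas as pd

# Hypothetical daily returns, used only for illustration
returns = pd.Series([0.01, -0.02, 0.03, 0.00, -0.01])

window = 3  # assumed window size for this sketch (the notebook uses 7)

rolling_var = returns.rolling(window).var()  # variance of each window
rolling_std = returns.rolling(window).std()  # standard deviation of each window

# The first (window - 1) entries are NaN because the window is not yet full
print(pd.DataFrame({"var": rolling_var, "std": rolling_std}))
```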
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<ul>
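<li>Find the days on which the stock's market value was largest and smallest
<ul>
<li>Market value here = closing price * volume (strictly speaking, price times volume is a proxy for the day's traded value rather than market capitalization)</li>
<li>Find the index labels of the maximum and minimum of the market-value series (the dates of the largest and smallest market value)
<ul>
<li>idxmax() and idxmin()</li>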
</ul>
</li>
</ul>
</li>
</ul>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># Daily market value (closing price * volume)
day_values = df['close'] * df['volume']
day_values</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedText jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>date
2015-01-05 3.385811e+06
2015-01-06 1.736431e+06
2015-01-07 1.485766e+06
2015-01-08 1.055312e+06
2015-01-09 1.334597e+06
...
2023-11-03 5.465598e+07
2023-11-06 4.624949e+07
2023-11-07 3.507469e+07
2023-11-08 2.615865e+07
2023-11-09 2.296461e+07
Length: 2153, dtype: float64</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># Find the index labels of the largest and smallest market value (i.e. the dates)
day_values.idxmax()  # index label of the maximum element</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedText jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>Timestamp('2021-09-27 00:00:00')</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>day_values.idxmin()  # index label of the minimum element</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedText jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>Timestamp('2015-02-02 00:00:00')</code></pre>
</div>
</div>
</div>
</div>
</div>
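<p>Because <code>idxmax()</code> and <code>idxmin()</code> return index labels rather than positions, the result can be passed straight back to <code>.loc</code> to read the value on that day. A minimal sketch on made-up numbers:</p>

```python
import pandas as pd

# Hypothetical daily values indexed by date, used only for illustration
values = pd.Series(
    [3.0, 9.5, 1.2, 7.0],
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"]),
)

max_day = values.idxmax()  # label (date) of the largest value
min_day = values.idxmin()  # label (date) of the smallest value

# The returned labels can be used with .loc to fetch the values themselves
print(max_day, values.loc[max_day])
print(min_day, values.loc[min_day])
```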
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<ul>
<li>List every date on which the stock closed more than 3% above its open
<ul>
<li>(close - open) / open &gt; 0.03</li>
</ul>
</li>
</ul>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>ex = (df['close'] - df['open']) / df['open'] > 0.03
ex  # Boolean Series; the goal is the index labels where it is True</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedText jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>date
2015-01-05 True
2015-01-06 False
2015-01-07 False
2015-01-08 False
2015-01-09 False
...
2023-11-03 False
2023-11-06 False
2023-11-07 False
2023-11-08 False
2023-11-09 False
Length: 2153, dtype: bool</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># A Boolean Series can be used as a row indexer: rows where it is True are kept, rows where it is False are dropped
df.loc[ex]  # the rows that satisfy the condition</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>open</th><th>close</th><th>high</th><th>low</th><th>volume</th><th>code</th></tr>
<tr><th>date</th><th> </th><th> </th><th> </th><th> </th><th> </th><th> </th></tr>
</thead>
<tbody>
<tr><th>2015-01-05</th>
<td>24.096</td>
<td>35.823</td>
<td>37.387</td>
<td>23.250</td>
<td>94515.0</td>
<td>600519</td>
</tr>
<tr><th>2015-01-15</th>
<td>18.887</td>
<td>20.869</td>
<td>21.169</td>
<td>17.605</td>
<td>48585.0</td>
<td>600519</td>
</tr>
<tr><th>2015-01-20</th>
<td>11.732</td>
<td>13.605</td>
<td>15.805</td>
<td>8.987</td>
<td>61022.0</td>
<td>600519</td>
</tr>
<tr><th>2015-01-21</th>
<td>13.778</td>
<td>17.496</td>
<td>17.987</td>
<td>12.805</td>
<td>52674.0</td>
<td>600519</td>
</tr>
<tr><th>2015-01-23</th>
<td>15.460</td>
<td>16.278</td>
<td>18.332</td>
<td>15.450</td>
<td>33084.0</td>
<td>600519</td>
</tr>
<tr><th>...</th>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr><th>2022-11-15</th>
<td>1484.179</td>
<td>1540.179</td>
<td>1551.149</td>
<td>1473.179</td>
<td>56318.0</td>
<td>600519</td>
</tr>
<tr><th>2023-01-05</th>
<td>1711.089</td>
<td>1775.089</td>
<td>1775.089</td>
<td>1707.089</td>
<td>47943.0</td>
<td>600519</td>
</tr>
<tr><th>2023-02-20</th>
<td>1795.089</td>
<td>1849.089</td>
<td>1852.889</td>
<td>1791.289</td>
<td>29669.0</td>
<td>600519</td>
</tr>
<tr><th>2023-05-22</th>
<td>1664.099</td>
<td>1720.089</td>
<td>1726.089</td>
<td>1664.089</td>
<td>41284.0</td>
<td>600519</td>
</tr>
<tr><th>2023-07-28</th>
<td>1832.000</td>
<td>1897.000</td>
<td>1900.000</td>
<td>1828.010</td>
<td>39018.0</td>
<td>600519</td>
</tr>
</tbody>
</table>
<p>252 rows × 6 columns</p>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>df.loc[ex].index  # index labels (dates) of the rows that satisfy the condition</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedText jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>DatetimeIndex(['2015-01-05', '2015-01-15', '2015-01-20', '2015-01-21',
'2015-01-23', '2015-01-26', '2015-02-03', '2015-02-09',
'2015-02-11', '2015-02-16',
...
'2022-06-10', '2022-06-17', '2022-08-31', '2022-11-01',
'2022-11-04', '2022-11-15', '2023-01-05', '2023-02-20',
'2023-05-22', '2023-07-28'],
dtype='datetime64', name='date', length=252, freq=None)</code></pre>
</div>
</div>
</div>
</div>
</div>
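<p>The Boolean-mask pattern used above can be tried on a tiny frame. This is a sketch with made-up prices; the names <code>prices</code>, <code>mask</code> and <code>rising_days</code> are illustrative and do not come from the notebook. It also shows that indexing the index directly with the mask gives the same dates without building the intermediate frame.</p>

```python
import pandas as pd

# Hypothetical open/close prices, used only for illustration
prices = pd.DataFrame(
    {"open": [10.0, 20.0, 30.0], "close": [10.5, 20.2, 29.0]},
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
)

# True where the close is more than 3% above the open
mask = (prices["close"] - prices["open"]) / prices["open"] > 0.03

# Keep only the rows where the mask is True, then read their dates
rising_days = prices.loc[mask].index

# Equivalent shortcut: index the DatetimeIndex with the mask directly
rising_days_short = prices.index[mask]

print(rising_days)
```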
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<ul>
<li>
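<p>Suppose Zhang San, starting on January 1, 2015, buys 1 lot (100 shares) on the first trading day of every month and sells all of his holdings on the last trading day of every year. What is his total profit to date?</p>
<ul>
<li>Analysis:
<ul>
<li>Buying
<ul>
<li>In a complete year he buys 12 lots, i.e. 1,200 shares. Each purchase is priced at that day's opening price.</li>
</ul>
</li>
<li>Selling
<ul>
<li>In a complete year he sells 1,200 shares, priced at the closing price.</li>
</ul>
</li>
<li>Special case:
<ul>
<li>The final year is special: its last trading day has not arrived yet, so he can only buy, not sell. The shares still in hand must nevertheless be counted toward the total profit.</li>
</ul>
</li>
</ul>
</li>
<li>About <code>resample</code>: the <code>resample</code> method in <code>pandas</code> re-samples time-series data to a different frequency, for example from daily to weekly or monthly. Its common form is:
<ul>
<li>df.resample(rule, ...).func()</li>
<li>Here df is a DataFrame with a datetime index, rule is a frequency string (H for hour, W for week, M for month, A for year, and so on; pandas 2.2 and later prefer h, ME and YE for hour, month-end and year-end), and func is the aggregation applied to each bin (sum, mean, etc.). For example: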
<ul>
<li>df.resample('H').mean()</li>
<li>df.resample('W').sum()</li>
<li>df.resample('M').max()</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>
</div>
</div>
</div>
</div>
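<p>To make the <code>resample</code> description concrete, here is a minimal sketch on made-up daily data covering two full weeks. It uses the weekly rule <code>'W'</code> (weeks ending on Sunday by default). Each bin is labelled with its end date, which is why the resulting index shows Sundays.</p>

```python
import pandas as pd

# Hypothetical daily series: Monday 2024-01-01 through Sunday 2024-01-14
idx = pd.date_range("2024-01-01", periods=14, freq="D")
daily = pd.Series(range(1, 15), index=idx)

# Aggregate each calendar week (ending Sunday) with different functions
weekly_sum = daily.resample("W").sum()
weekly_max = daily.resample("W").max()

print(weekly_sum)
print(weekly_max)
```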
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<ul>
<li>How much did Zhang San spend on buying the stock in total?</li>
</ul>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
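<pre class="highlighter-hljs"><code># Find the opening price on the first trading day of each month
monthly = df.resample('M').first()  # first row (first trading day) within each month
monthly.head(5)
# The index shows month-end dates rather than the actual first trading days because
# resample labels each monthly bin by its end date; the row values themselves are correct</code></pre>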
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>open</th><th>close</th><th>high</th><th>low</th><th>volume</th><th>code</th></tr>
<tr><th>date</th><th> </th><th> </th><th> </th><th> </th><th> </th><th> </th></tr>
</thead>
<tbody>
<tr><th>2015-01-31</th>
<td>24.096</td>
<td>35.823</td>
<td>37.387</td>
<td>23.250</td>
<td>94515.0</td>
<td>600519</td>
</tr>
<tr><th>2015-02-28</th>
<td>11.269</td>
<td>10.641</td>
<td>12.078</td>
<td>9.441</td>
<td>33983.0</td>
<td>600519</td>
</tr>
<tr><th>2015-03-31</th>
<td>25.169</td>
<td>25.105</td>
<td>27.896</td>
<td>23.532</td>
<td>31098.0</td>
<td>600519</td>
</tr>
<tr><th>2015-04-30</th>
<td>29.923</td>
<td>29.296</td>
<td>31.314</td>
<td>28.514</td>
<td>76875.0</td>
<td>600519</td>
</tr>
<tr><th>2015-05-31</th>
<td>81.405</td>
<td>83.505</td>
<td>87.169</td>
<td>79.005</td>
<td>54739.0</td>
<td>600519</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
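<p>If the actual first trading dates are needed rather than the month-end labels, one possible approach is to group by calendar month and keep the first row of each group, which preserves the original index. This is a sketch on made-up data, not what the notebook does; <code>df_demo</code> and <code>first_days</code> are illustrative names.</p>

```python
import pandas as pd

# Hypothetical trading days spread over three months
idx = pd.to_datetime(
    ["2024-01-02", "2024-01-03", "2024-02-01", "2024-02-02", "2024-03-04"]
)
df_demo = pd.DataFrame({"open": [10.0, 11.0, 12.0, 13.0, 14.0]}, index=idx)

# Group rows by calendar month and keep the first row of each group;
# head(1) keeps the original index, so the real trading date survives
first_days = df_demo.groupby(df_demo.index.to_period("M")).head(1)

print(first_days)
```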
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># Total cost of buying: one lot (100 shares) at each month's first opening price
total_cost = monthly['open'].sum() * 100
total_cost</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedText jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>10173374.9</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<ul>
<li>How much money comes in from selling the stock?</li>
</ul>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>yearly = df.resample('A').last()  # 'A' is the year-end frequency: last trading day of each year
yearly</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>open</th><th>close</th><th>high</th><th>low</th><th>volume</th><th>code</th></tr>
<tr><th>date</th><th> </th><th> </th><th> </th><th> </th><th> </th><th> </th></tr>
</thead>
<tbody>
<tr><th>2015-12-31</th>
<td>73.910</td>
<td>73.880</td>
<td>75.190</td>
<td>73.510</td>
<td>19673.0</td>
<td>600519</td>
</tr>
<tr><th>2016-12-31</th>
<td>188.471</td>
<td>196.011</td>
<td>197.151</td>
<td>188.471</td>
<td>34687.0</td>
<td>600519</td>
</tr>
<tr><th>2017-12-31</th>
<td>586.648</td>
<td>566.138</td>
<td>595.148</td>
<td>560.248</td>
<td>76038.0</td>
<td>600519</td>
</tr>
<tr><th>2018-12-31</th>
<td>442.947</td>
<td>469.657</td>
<td>476.047</td>
<td>439.647</td>
<td>63678.0</td>
<td>600519</td>
</tr>
<tr><th>2019-12-31</th>
<td>1077.186</td>
<td>1077.186</td>
<td>1082.186</td>
<td>1070.696</td>
<td>22588.0</td>
<td>600519</td>
</tr>
<tr><th>2020-12-31</th>
<td>1852.211</td>
<td>1909.211</td>
<td>1910.191</td>
<td>1850.211</td>
<td>38860.0</td>
<td>600519</td>
</tr>
<tr><th>2021-12-31</th>
<td>2000.504</td>
<td>1980.504</td>
<td>2003.484</td>
<td>1958.504</td>
<td>29665.0</td>
<td>600519</td>
</tr>
<tr><th>2022-12-31</th>
<td>1710.089</td>
<td>1701.089</td>
<td>1727.079</td>
<td>1701.089</td>
<td>25333.0</td>
<td>600519</td>
</tr>
<tr><th>2023-12-31</th>
<td>1790.110</td>
<td>1794.110</td>
<td>1799.000</td>
<td>1783.000</td>
<td>12800.0</td>
<td>600519</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
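<pre class="highlighter-hljs"><code># Proceeds: 1,200 shares valued at each year's last close. Note that this also counts the
# incomplete final year as 1,200 shares, although fewer than 12 lots were bought in it.
recv = yearly['close'].sum() * 1200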
recv</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedText jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>11721343.2</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># Total profit
recv - total_cost</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedText jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>1547968.2999999989</code></pre>
</div>
</div>
</div>
</div>
</div>
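<p>The calculation above multiplies every year's last close by 1,200 shares, including the final year, in which fewer than twelve purchases have been made. A slightly more careful variant counts the shares actually bought in each year. The sketch below runs on made-up constant prices (not the notebook's data) so the arithmetic is easy to check by hand; all variable names are illustrative.</p>

```python
import pandas as pd

# Hypothetical daily prices: all of 2022 plus January to March 2023 (business days)
idx = pd.bdate_range("2022-01-03", "2023-03-31")
prices = pd.DataFrame({"open": 10.0, "close": 12.0}, index=idx)

LOT = 100  # shares per lot

# First trading day of every month: buy one lot at that day's open
buys = prices.groupby(prices.index.to_period("M")).head(1)
total_cost = buys["open"].sum() * LOT

# Shares accumulated in each calendar year (one lot per month with data)
shares_per_year = buys.groupby(buys.index.year).size() * LOT

# Last available close in each calendar year
last_close = prices.groupby(prices.index.year)["close"].last()

# Complete years are sold at the year-end close; the final, incomplete year
# is valued at its latest close using only the shares actually bought
proceeds = (shares_per_year * last_close).sum()
profit = proceeds - total_cost

print(total_cost, proceeds, profit)
```

<p>With 15 monthly purchases at 10.0 and every share valued at 12.0, the cost is 15,000, the proceeds are 18,000 and the profit is 3,000.</p>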
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
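<h3 id="%E6%95%B0%E6%8D%AE%E6%B8%85%E6%B4%97">Data cleaning¶</h3>
<h4 id="%E6%A6%82%E8%BF%B0">Overview¶</h4>
<p>Data cleaning means processing and transforming raw data to remove invalid, duplicated, missing, or erroneous values so that the data meets the requirements of the analysis.</p>
<h4 id="%E4%BD%9C%E7%94%A8%E5%92%8C%E6%84%8F%E4%B9%89">Purpose and significance¶</h4>
<ul>
<li>Improve data quality:
<ul>
<li>Cleaning raises the quality of the data and reduces the risk of faulty analysis and faulty decisions.</li>
</ul>
</li>
<li>Increase data usability:
<ul>
<li>Cleaned data is more regular and easier to work with, which improves its usability and readability.</li>
</ul>
</li>
</ul>
<h4 id="%E6%B8%85%E6%B4%97%E7%BB%B4%E5%BA%A6">Cleaning dimensions¶</h4>
<ul>
<li>Missing values:
<ul>
<li>For missing data, either drop the rows or columns that contain missing values, or fill the missing values in.</li>
</ul>
</li>
<li>Duplicate values:
<ul>
<li>Identify and remove duplicated rows so that they do not distort the analysis.</li>
</ul>
</li>
<li>Outliers:
<ul>
<li>Detect outliers and decide whether to delete, replace, or keep them.</li>
</ul>
</li>
</ul>
<h4 id="%E7%BC%BA%E5%A4%B1%E5%80%BC%E6%B8%85%E6%B4%97">Cleaning missing values¶</h4>
<h5 id="%E7%BC%BA%E5%A4%B1%E5%80%BC/%E7%A9%BA%E5%80%BC%E7%9A%84%E5%88%A0%E9%99%A4">Deleting missing/null values¶</h5>
<ul>
<li>Prepare sample data that contains missing values</li>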
</ul>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>import pandas as pd
from pandas import DataFrame,Series
df = pd.read_csv('./data/none.csv',index_col=0)
df  # NaN marks a missing (empty) value</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>0</th><th>1</th><th>2</th><th>3</th><th>4</th></tr>
</thead>
<tbody>
<tr><th>0</th>
<td>22</td>
<td>6</td>
<td>44.0</td>
<td>NaN</td>
<td>11</td>
</tr>
<tr><th>1</th>
<td>98</td>
<td>88</td>
<td>20.0</td>
<td>85.0</td>
<td>16</td>
</tr>
<tr><th>2</th>
<td>19</td>
<td>83</td>
<td>NaN</td>
<td>84.0</td>
<td>46</td>
</tr>
<tr><th>3</th>
<td>93</td>
<td>64</td>
<td>76.0</td>
<td>NaN</td>
<td>85</td>
</tr>
<tr><th>4</th>
<td>7</td>
<td>63</td>
<td>20.0</td>
<td>21.0</td>
<td>45</td>
</tr>
<tr><th>5</th>
<td>36</td>
<td>19</td>
<td>36.0</td>
<td>NaN</td>
<td>82</td>
</tr>
<tr><th>6</th>
<td>53</td>
<td>98</td>
<td>7.0</td>
<td>89.0</td>
<td>1</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
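<p>If <code>./data/none.csv</code> is not at hand, a comparable frame can be built in memory. The sketch below reuses the first three rows shown in the table above and marks the gaps with <code>np.nan</code>; the name <code>demo</code> is illustrative. It also shows why columns 2 and 3 display as floats: a column that holds NaN is stored as <code>float64</code> by default.</p>

```python
import numpy as np
import pandas as pd

# The first three rows of the table above, with np.nan standing in for the gaps
demo = pd.DataFrame(
    [
        [22, 6, 44.0, np.nan, 11],
        [98, 88, 20.0, 85.0, 16],
        [19, 83, np.nan, 84.0, 46],
    ]
)

print(demo.dtypes)          # columns containing NaN are float64
print(demo.isnull().sum())  # number of missing values per column
```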
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<ul>
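<li>Detecting and deleting missing values; the relevant methods:
<ul>
<li>isnull(): checks every element of df and returns True where the element is missing, False otherwise</li>
<li>notnull(): checks every element of df and returns True where the element is present, False otherwise</li>
<li>any(): checks whether a row or column of Boolean values contains at least one True; returns True if so, False otherwise</li>
<li>all(): checks whether every value in a row or column of Boolean values is True; returns True if so, False otherwise</li>
<li>dropna(): drops the rows or columns that contain missing values</li>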
</ul>
</li>
</ul>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># Flag every element that is missing (True = missing)
df.isnull()</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>0</th><th>1</th><th>2</th><th>3</th><th>4</th></tr>
</thead>
<tbody>
<tr><th>0</th>
<td>False</td>
<td>False</td>
<td>False</td>
<td>True</td>
<td>False</td>
</tr>
<tr><th>1</th>
<td>False</td>
<td>False</td>
<td>False</td>
<td>False</td>
<td>False</td>
</tr>
<tr><th>2</th>
<td>False</td>
<td>False</td>
<td>True</td>
<td>False</td>
<td>False</td>
</tr>
<tr><th>3</th>
<td>False</td>
<td>False</td>
<td>False</td>
<td>True</td>
<td>False</td>
</tr>
<tr><th>4</th>
<td>False</td>
<td>False</td>
<td>False</td>
<td>False</td>
<td>False</td>
</tr>
<tr><th>5</th>
<td>False</td>
<td>False</td>
<td>False</td>
<td>True</td>
<td>False</td>
</tr>
<tr><th>6</th>
<td>False</td>
<td>False</td>
<td>False</td>
<td>False</td>
<td>False</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>df.notnull()</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>0</th><th>1</th><th>2</th><th>3</th><th>4</th></tr>
</thead>
<tbody>
<tr><th>0</th>
<td>True</td>
<td>True</td>
<td>True</td>
<td>False</td>
<td>True</td>
</tr>
<tr><th>1</th>
<td>True</td>
<td>True</td>
<td>True</td>
<td>True</td>
<td>True</td>
</tr>
<tr><th>2</th>
<td>True</td>
<td>True</td>
<td>False</td>
<td>True</td>
<td>True</td>
</tr>
<tr><th>3</th>
<td>True</td>
<td>True</td>
<td>True</td>
<td>False</td>
<td>True</td>
</tr>
<tr><th>4</th>
<td>True</td>
<td>True</td>
<td>True</td>
<td>True</td>
<td>True</td>
</tr>
<tr><th>5</th>
<td>True</td>
<td>True</td>
<td>True</td>
<td>False</td>
<td>True</td>
</tr>
<tr><th>6</th>
<td>True</td>
<td>True</td>
<td>True</td>
<td>True</td>
<td>True</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
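<pre class="highlighter-hljs"><code># Determine which columns contain missing values
df.isnull().any(axis=0)
# axis=0 applies any() down each column (one result per column)
# axis=1 applies any() across each row (one result per row)</code></pre>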
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedText jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>0 False
1 False
2 True
3 True
4 False
dtype: bool</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>df.notnull().all(axis=0)</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedText jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>0 True
1 True
2 False
3 False
4 True
dtype: bool</code></pre>
</div>
</div>
</div>
</div>
</div>
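<p>The same checks work row-wise with <code>axis=1</code>, and the resulting Boolean Series can be used as a row filter. The sketch below uses a small made-up frame (the names <code>demo</code>, <code>rows_with_nan</code>, <code>complete_rows</code> and <code>clean</code> are illustrative) and shows that keeping the rows where <code>notnull().all(axis=1)</code> is True gives the same result as <code>dropna()</code>.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a few gaps
demo = pd.DataFrame(
    {"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0], "c": [np.nan, 8.0, 9.0]}
)

# True for rows that contain at least one missing value
rows_with_nan = demo.isnull().any(axis=1)

# True for rows in which every value is present
complete_rows = demo.notnull().all(axis=1)

# Keep only the complete rows by using the Boolean Series as a row indexer
clean = demo.loc[complete_rows]

print(rows_with_nan.tolist())
print(clean)
```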
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<ul>
<li>Use dropna() to detect and filter out missing values</li>
</ul>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>df.dropna()  # returns a new DataFrame without the rows that contain nulls; the original data is not modified</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>0</th><th>1</th><th>2</th><th>3</th><th>4</th></tr>
</thead>
<tbody>
<tr><th>1</th>
<td>98</td>
<td>88</td>
<td>20.0</td>
<td>85.0</td>
<td>16</td>
</tr>
<tr><th>4</th>
<td>7</td>
<td>63</td>
<td>20.0</td>
<td>21.0</td>
<td>45</td>
</tr>
<tr><th>6</th>
<td>53</td>
<td>98</td>
<td>7.0</td>
<td>89.0</td>
<td>1</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
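<p>dropna() also accepts parameters that change what gets removed. The following is a minimal sketch, assuming the same hand-rebuilt sample data as the tables in this section; the variable names are hypothetical. It shows dropping columns instead of rows (axis=1), restricting the null check to chosen columns (subset), and dropping only rows that are entirely null (how='all').</p>

```python
import numpy as np
import pandas as pd

# Sample data rebuilt by hand from the tables displayed in this section
# (an assumption for illustration; NaN marks a missing value)
df = pd.DataFrame([
    [22, 6, 44, np.nan, 11],
    [98, 88, 20, 85, 16],
    [19, 83, np.nan, 84, 46],
    [93, 64, 76, np.nan, 85],
    [7, 63, 20, 21, 45],
    [36, 19, 36, np.nan, 82],
    [53, 98, 7, 89, 1],
])

# drop every column that contains at least one null
cols_dropped = df.dropna(axis=1)

# only look at column 2 when deciding which rows to drop
subset_dropped = df.dropna(subset=[2])

# drop a row only if all of its values are null (no such row here)
all_null_dropped = df.dropna(how='all')

print(cols_dropped.columns.tolist())
print(subset_dropped.index.tolist())
print(len(all_null_dropped))
```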
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<ul>
<li>Compute the number and the proportion of missing values in each column of df</li>
</ul>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
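<pre class="highlighter-hljs"><code>for col in df.columns:
    # a positive count means column col contains null values
    if df[col].isnull().sum() > 0:
        # number of nulls in this column
        null_count = df[col].isnull().sum()
        # proportion of nulls in this column: null count / number of elements in the column
        p = format(null_count / df[col].size, '.2%')
        print(col, null_count, p)</code></pre>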
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt"> </div>
<div class="jp-RenderedText jp-OutputArea-output" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>2 1 14.29%
3 3 42.86%</code></pre>
</div>
</div>
</div>
</div>
</div>
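<p>The loop above can also be written without iterating over the columns. This is a sketch under the assumption that the data matches the tables shown in this section (rebuilt by hand below); null_count, null_ratio and summary are names chosen for the example. It relies on the fact that the mean of a boolean column is the fraction of True values.</p>

```python
import numpy as np
import pandas as pd

# Sample data rebuilt by hand from the tables displayed in this section
# (an assumption for illustration; NaN marks a missing value)
df = pd.DataFrame([
    [22, 6, 44, np.nan, 11],
    [98, 88, 20, 85, 16],
    [19, 83, np.nan, 84, 46],
    [93, 64, 76, np.nan, 85],
    [7, 63, 20, 21, 45],
    [36, 19, 36, np.nan, 82],
    [53, 98, 7, 89, 1],
])

null_count = df.isnull().sum()    # number of nulls per column
null_ratio = df.isnull().mean()   # fraction of nulls per column

# combine both measures and keep only the columns that have nulls
summary = pd.DataFrame({'null_count': null_count, 'null_ratio': null_ratio})
summary = summary[summary['null_count'] > 0]
print(summary)
```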
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<ul>
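<li>Filling missing (null) values
<ul>
<li>fillna(value, method, axis)</li>
<li>Parameters:
<ul>
<li>value: the value used to replace nulls</li>
<li>method: the fill strategy, either ffill (forward fill: copy the previous valid value into the null) or bfill (backward fill: copy the next valid value into the null)</li>
<li>axis: the axis along which to fill</li>
</ul>
</li>
</ul>
</li>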
</ul>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<ul>
<li>Fill nulls with an arbitrary value</li>
</ul>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>df.fillna(value=666)</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>0</th><th>1</th><th>2</th><th>3</th><th>4</th></tr>
</thead>
<tbody>
<tr><th>0</th>
<td>22</td>
<td>6</td>
<td>44.0</td>
<td>666.0</td>
<td>11</td>
</tr>
<tr><th>1</th>
<td>98</td>
<td>88</td>
<td>20.0</td>
<td>85.0</td>
<td>16</td>
</tr>
<tr><th>2</th>
<td>19</td>
<td>83</td>
<td>666.0</td>
<td>84.0</td>
<td>46</td>
</tr>
<tr><th>3</th>
<td>93</td>
<td>64</td>
<td>76.0</td>
<td>666.0</td>
<td>85</td>
</tr>
<tr><th>4</th>
<td>7</td>
<td>63</td>
<td>20.0</td>
<td>21.0</td>
<td>45</td>
</tr>
<tr><th>5</th>
<td>36</td>
<td>19</td>
<td>36.0</td>
<td>666.0</td>
<td>82</td>
</tr>
<tr><th>6</th>
<td>53</td>
<td>98</td>
<td>7.0</td>
<td>89.0</td>
<td>1</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<ul>
<li>Fill nulls with neighboring values</li>
</ul>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
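<pre class="highlighter-hljs"><code>df.fillna(axis=0,method='ffill').fillna(axis=0,method='bfill')
# axis=0 fills vertically, down each column: ffill copies the previous valid value into each null,
# then the chained bfill fills any leading nulls that have no previous value</code></pre>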
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>0</th><th>1</th><th>2</th><th>3</th><th>4</th></tr>
</thead>
<tbody>
<tr><th>0</th>
<td>22</td>
<td>6</td>
<td>44.0</td>
<td>85.0</td>
<td>11</td>
</tr>
<tr><th>1</th>
<td>98</td>
<td>88</td>
<td>20.0</td>
<td>85.0</td>
<td>16</td>
</tr>
<tr><th>2</th>
<td>19</td>
<td>83</td>
<td>20.0</td>
<td>84.0</td>
<td>46</td>
</tr>
<tr><th>3</th>
<td>93</td>
<td>64</td>
<td>76.0</td>
<td>84.0</td>
<td>85</td>
</tr>
<tr><th>4</th>
<td>7</td>
<td>63</td>
<td>20.0</td>
<td>21.0</td>
<td>45</td>
</tr>
<tr><th>5</th>
<td>36</td>
<td>19</td>
<td>36.0</td>
<td>21.0</td>
<td>82</td>
</tr>
<tr><th>6</th>
<td>53</td>
<td>98</td>
<td>7.0</td>
<td>89.0</td>
<td>1</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
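<p>In recent pandas releases the method argument of fillna() is deprecated, and the dedicated ffill() and bfill() methods are the suggested replacement. The sketch below assumes the same sample data as the tables in this section (rebuilt by hand) and is meant to produce the same result as the chained call above; treat it as an equivalent spelling rather than the notebook's own code.</p>

```python
import numpy as np
import pandas as pd

# Sample data rebuilt by hand from the tables displayed in this section
# (an assumption for illustration; NaN marks a missing value)
df = pd.DataFrame([
    [22, 6, 44, np.nan, 11],
    [98, 88, 20, 85, 16],
    [19, 83, np.nan, 84, 46],
    [93, 64, 76, np.nan, 85],
    [7, 63, 20, 21, 45],
    [36, 19, 36, np.nan, 82],
    [53, 98, 7, 89, 1],
])

# forward fill down each column, then backward fill what is still null
# (the null in row 0 of column 3 has no previous value, so only bfill can reach it)
filled = df.ffill().bfill()
print(filled)
```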
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<ul>
<li>Fill nulls with a summary statistic</li>
</ul>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
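<pre class="highlighter-hljs"><code># nulls can be filled with a statistic of their own column, such as the mean or the median
for col in df.columns:
    if df[col].isnull().sum() > 0:
        # mean of the column that contains nulls
        mean_value = df[col].mean()
        # assign the result back so that df itself is updated
        df[col] = df[col].fillna(value=mean_value)</code></pre>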
</div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>df</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>0</th><th>1</th><th>2</th><th>3</th><th>4</th></tr>
</thead>
<tbody>
<tr><th>0</th>
<td>22</td>
<td>6</td>
<td>44.000000</td>
<td>69.75</td>
<td>11</td>
</tr>
<tr><th>1</th>
<td>98</td>
<td>88</td>
<td>20.000000</td>
<td>85.00</td>
<td>16</td>
</tr>
<tr><th>2</th>
<td>19</td>
<td>83</td>
<td>33.833333</td>
<td>84.00</td>
<td>46</td>
</tr>
<tr><th>3</th>
<td>93</td>
<td>64</td>
<td>76.000000</td>
<td>69.75</td>
<td>85</td>
</tr>
<tr><th>4</th>
<td>7</td>
<td>63</td>
<td>20.000000</td>
<td>21.00</td>
<td>45</td>
</tr>
<tr><th>5</th>
<td>36</td>
<td>19</td>
<td>36.000000</td>
<td>69.75</td>
<td>82</td>
</tr>
<tr><th>6</th>
<td>53</td>
<td>98</td>
<td>7.000000</td>
<td>89.00</td>
<td>1</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
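<p>fillna() also accepts a Series whose index holds column labels, so the per-column loop can be collapsed into a single call. A minimal sketch, again assuming the hand-rebuilt sample data from this section; mean_filled and median_filled are hypothetical names.</p>

```python
import numpy as np
import pandas as pd

# Sample data rebuilt by hand from the tables displayed in this section
# (an assumption for illustration; NaN marks a missing value)
df = pd.DataFrame([
    [22, 6, 44, np.nan, 11],
    [98, 88, 20, 85, 16],
    [19, 83, np.nan, 84, 46],
    [93, 64, 76, np.nan, 85],
    [7, 63, 20, 21, 45],
    [36, 19, 36, np.nan, 82],
    [53, 98, 7, 89, 1],
])

# df.mean() is a Series indexed by column label; each column's nulls
# are filled with that column's own mean
mean_filled = df.fillna(df.mean())

# the same pattern works with other statistics, for example the median
median_filled = df.fillna(df.median())

print(mean_filled)
print(median_filled)
```

<p>The median is less sensitive to extreme values than the mean, which can matter when a column also contains outliers.</p>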
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
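<p><strong>Note: when cleaning null values, prefer deleting them; fall back to filling only when deletion is too costly (for example, when it would discard too much data).</strong></p>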
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<h4 id="%E9%87%8D%E5%A4%8D%E5%80%BC%E6%B8%85%E6%B4%97">Cleaning duplicate values</h4>
<ul>
<li>Load a data source that contains duplicate rows</li>
</ul>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>df = pd.read_csv('data/repeat.csv',index_col=0)
df</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>0</th><th>1</th><th>2</th><th>3</th><th>4</th></tr>
</thead>
<tbody>
<tr><th>0</th>
<td>7</td>
<td>68</td>
<td>20</td>
<td>14</td>
<td>95</td>
</tr>
<tr><th>1</th>
<td>70</td>
<td>85</td>
<td>37</td>
<td>72</td>
<td>86</td>
</tr>
<tr><th>2</th>
<td>79</td>
<td>6</td>
<td>92</td>
<td>24</td>
<td>5</td>
</tr>
<tr><th>3</th>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr><th>4</th>
<td>56</td>
<td>46</td>
<td>25</td>
<td>14</td>
<td>49</td>
</tr>
<tr><th>5</th>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr><th>6</th>
<td>66</td>
<td>28</td>
<td>98</td>
<td>14</td>
<td>1</td>
</tr>
<tr><th>7</th>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<ul>
<li>Use duplicated() to detect duplicate rows</li>
</ul>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>df.duplicated().sum()</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedText jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>2</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<ul>
<li>Use drop_duplicates() to detect and remove duplicate rows</li>
</ul>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>df.drop_duplicates()</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>0</th><th>1</th><th>2</th><th>3</th><th>4</th></tr>
</thead>
<tbody>
<tr><th>0</th>
<td>7</td>
<td>68</td>
<td>20</td>
<td>14</td>
<td>95</td>
</tr>
<tr><th>1</th>
<td>70</td>
<td>85</td>
<td>37</td>
<td>72</td>
<td>86</td>
</tr>
<tr><th>2</th>
<td>79</td>
<td>6</td>
<td>92</td>
<td>24</td>
<td>5</td>
</tr>
<tr><th>3</th>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr><th>4</th>
<td>56</td>
<td>46</td>
<td>25</td>
<td>14</td>
<td>49</td>
</tr>
<tr><th>6</th>
<td>66</td>
<td>28</td>
<td>98</td>
<td>14</td>
<td>1</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
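<p>Which copy of a repeated row survives is controlled by the keep parameter of drop_duplicates(). The sketch below rebuilds the table shown above by hand (an assumption made so the example runs without data/repeat.csv) and compares the three settings; the variable names are made up for illustration.</p>

```python
import pandas as pd

# Table rebuilt by hand from the output shown above (stand-in for data/repeat.csv)
df = pd.DataFrame([
    [7, 68, 20, 14, 95],
    [70, 85, 37, 72, 86],
    [79, 6, 92, 24, 5],
    [0, 0, 0, 0, 0],
    [56, 46, 25, 14, 49],
    [0, 0, 0, 0, 0],
    [66, 28, 98, 14, 1],
    [0, 0, 0, 0, 0],
])

# True for every row that repeats an earlier row
mask = df.duplicated()

first_kept = df.drop_duplicates()              # default keep='first'
last_kept = df.drop_duplicates(keep='last')    # keep the last copy instead
none_kept = df.drop_duplicates(keep=False)     # drop every copy of a repeated row

print(mask[mask].index.tolist())
print(first_kept.index.tolist())
print(last_kept.index.tolist())
print(none_kept.index.tolist())
```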
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
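<h4 id="%E5%BC%82%E5%B8%B8%E5%80%BC%E6%B8%85%E6%B4%97">Cleaning outliers</h4>
<p>"Outlier" is a term that analysts and data scientists use often, because outliers need close attention: left unhandled, they can lead to wrong estimates. Put simply, an outlier is an observation that lies far outside the overall pattern of the sample.</p>
<p>In statistics the full name is "suspected outlier", and outlier analysis is the corresponding technique. Outliers are the "extreme values" in a sample: values that look abnormally large or abnormally small and clearly deviate from the rest of the observations. Outlier analysis checks whether the data contains values that do not make sense.</p>
<ul>
<li>Handling outliers under a given condition
<ul>
<li>Take a data source of 1000 rows and 3 columns (A, B, C) with values in the range 0 to 1, then clean out the outliers in column C, defined here as values greater than twice the standard deviation of that column</li>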
</ul>
</li>
</ul>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>data = pd.read_csv('./data/outlier.csv',index_col=0)
data</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>A</th><th>B</th><th>C</th></tr>
</thead>
<tbody>
<tr><th>0</th>
<td>0.794514</td>
<td>0.337913</td>
<td>0.299290</td>
</tr>
<tr><th>1</th>
<td>0.596259</td>
<td>0.512930</td>
<td>0.554369</td>
</tr>
<tr><th>2</th>
<td>0.115003</td>
<td>0.401490</td>
<td>0.669573</td>
</tr>
<tr><th>3</th>
<td>0.773007</td>
<td>0.547263</td>
<td>0.780857</td>
</tr>
<tr><th>4</th>
<td>0.469255</td>
<td>0.316957</td>
<td>0.214900</td>
</tr>
<tr><th>...</th>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr><th>995</th>
<td>0.650119</td>
<td>0.042532</td>
<td>0.405112</td>
</tr>
<tr><th>996</th>
<td>0.704271</td>
<td>0.317155</td>
<td>0.779764</td>
</tr>
<tr><th>997</th>
<td>0.138225</td>
<td>0.493625</td>
<td>0.152215</td>
</tr>
<tr><th>998</th>
<td>0.273130</td>
<td>0.763846</td>
<td>0.031242</td>
</tr>
<tr><th>999</th>
<td>0.536671</td>
<td>0.674845</td>
<td>0.004224</td>
</tr>
</tbody>
</table>
<p>1000 rows × 3 columns</p>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># twice the standard deviation of column C
twice_std = data['C'].std() * 2
twice_std</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedText jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>0.5705417437083701</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># flag the outliers
ex = data['C'] > twice_std
ex  # True marks an outlier, False marks a normal value</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedText jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>0 False
1 False
2 True
3 True
4 False
...
995 False
996 True
997 False
998 False
999 False
Name: C, Length: 1000, dtype: bool</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>data.loc[ex]  # select the rows where ex is True (the rows that hold outliers)</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>A</th><th>B</th><th>C</th></tr>
</thead>
<tbody>
<tr><th>2</th>
<td>0.115003</td>
<td>0.401490</td>
<td>0.669573</td>
</tr>
<tr><th>3</th>
<td>0.773007</td>
<td>0.547263</td>
<td>0.780857</td>
</tr>
<tr><th>7</th>
<td>0.013230</td>
<td>0.419507</td>
<td>0.960728</td>
</tr>
<tr><th>8</th>
<td>0.858091</td>
<td>0.805964</td>
<td>0.586865</td>
</tr>
<tr><th>10</th>
<td>0.158810</td>
<td>0.095586</td>
<td>0.775476</td>
</tr>
<tr><th>...</th>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr><th>981</th>
<td>0.803646</td>
<td>0.791588</td>
<td>0.782859</td>
</tr>
<tr><th>989</th>
<td>0.534287</td>
<td>0.734984</td>
<td>0.701372</td>
</tr>
<tr><th>991</th>
<td>0.258987</td>
<td>0.039801</td>
<td>0.751450</td>
</tr>
<tr><th>993</th>
<td>0.002957</td>
<td>0.939943</td>
<td>0.673207</td>
</tr>
<tr><th>996</th>
<td>0.704271</td>
<td>0.317155</td>
<td>0.779764</td>
</tr>
</tbody>
</table>
<p>424 rows × 3 columns</p>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>drop_indexs = data.loc[ex].index  # row labels of the rows that hold outliers
drop_indexs</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedText jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>Int64Index([2, 3, 7, 8,10,11,13,14,20,21,
...
964, 965, 972, 975, 978, 981, 989, 991, 993, 996],
dtype='int64', length=424)</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># drop the rows that hold outliers from the table
data.drop(index=drop_indexs)</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>A</th><th>B</th><th>C</th></tr>
</thead>
<tbody>
<tr><th>0</th>
<td>0.794514</td>
<td>0.337913</td>
<td>0.299290</td>
</tr>
<tr><th>1</th>
<td>0.596259</td>
<td>0.512930</td>
<td>0.554369</td>
</tr>
<tr><th>4</th>
<td>0.469255</td>
<td>0.316957</td>
<td>0.214900</td>
</tr>
<tr><th>5</th>
<td>0.539357</td>
<td>0.107476</td>
<td>0.187495</td>
</tr>
<tr><th>6</th>
<td>0.385599</td>
<td>0.561930</td>
<td>0.377683</td>
</tr>
<tr><th>...</th>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr><th>994</th>
<td>0.494248</td>
<td>0.558101</td>
<td>0.128541</td>
</tr>
<tr><th>995</th>
<td>0.650119</td>
<td>0.042532</td>
<td>0.405112</td>
</tr>
<tr><th>997</th>
<td>0.138225</td>
<td>0.493625</td>
<td>0.152215</td>
</tr>
<tr><th>998</th>
<td>0.273130</td>
<td>0.763846</td>
<td>0.031242</td>
</tr>
<tr><th>999</th>
<td>0.536671</td>
<td>0.674845</td>
<td>0.004224</td>
</tr>
</tbody>
</table>
<p>576 rows × 3 columns</p>
</div>
</div>
</div>
</div>
</div>
</div>
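<p>The select-index-then-drop steps above can be shortened by inverting the boolean mask. This is a sketch only: outlier.csv is not available here, so the data is generated with NumPy using an arbitrary seed, and the row counts will differ from the 424 and 576 shown above. The names rng and cleaned are assumptions for the example.</p>

```python
import numpy as np
import pandas as pd

# Stand-in for ./data/outlier.csv: 1000 rows, columns A, B, C, values in [0, 1)
rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
data = pd.DataFrame(rng.random((1000, 3)), columns=['A', 'B', 'C'])

# same condition as in the text: values in C above twice the column's standard deviation
twice_std = data['C'].std() * 2
ex = data['C'] > twice_std

# ~ex flips the mask, so this keeps only the rows that are not flagged
cleaned = data[~ex]
print(len(data), int(ex.sum()), len(cleaned))
```

<p>Filtering with the inverted mask avoids building an intermediate index and returns the same rows as data.drop(index=drop_indexs).</p>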
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<h4 id="map%E6%98%A0%E5%B0%84">Mapping with map</h4>
<ul>
<li>Mapping means binding a fixed value to each element in a set of data</li>
</ul>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>df = pd.read_csv('./data/map.csv').drop(columns='Unnamed: 0')
df</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>name</th><th>salary</th></tr>
</thead>
<tbody>
<tr><th>0</th>
<td>张三</td>
<td>10000</td>
</tr>
<tr><th>1</th>
<td>李四</td>
<td>15000</td>
</tr>
<tr><th>2</th>
<td>王五</td>
<td>21000</td>
</tr>
<tr><th>3</th>
<td>张三</td>
<td>10000</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># Give each person an English name and store it as a new column in the table
dic = {
    '张三': 'Tom',
    '李四': 'Jerry',
    '王五': 'Jay'
}  # mapping table
df['ename'] = df['name'].map(dic)
df</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>name</th><th>salary</th><th>ename</th></tr>
</thead>
<tbody>
<tr><th>0</th>
<td>张三</td>
<td>10000</td>
<td>Tom</td>
</tr>
<tr><th>1</th>
<td>李四</td>
<td>15000</td>
<td>Jerry</td>
</tr>
<tr><th>2</th>
<td>王五</td>
<td>21000</td>
<td>Jay</td>
</tr>
<tr><th>3</th>
<td>张三</td>
<td>10000</td>
<td>Tom</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
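<p>A minimal sketch of dictionary-based mapping, using a small hand-built table rather than <code>./data/map.csv</code> (the rows, and the extra name 赵六, are assumptions for illustration). It shows that a value with no entry in the dictionary maps to NaN, which is worth checking for after a <code>map</code> call.</p>

```python
import pandas as pd

# Hypothetical stand-in for ./data/map.csv; 赵六 is deliberately missing from the mapping
df_demo = pd.DataFrame({
    'name': ['张三', '李四', '王五', '赵六'],
    'salary': [10000, 15000, 21000, 8000],
})

# Mapping table: Chinese name to English name
name_map = {'张三': 'Tom', '李四': 'Jerry', '王五': 'Jay'}

# map looks each element up in the dict; keys that are not found become NaN
df_demo['ename'] = df_demo['name'].map(name_map)

# Rows whose name had no entry in the mapping table
unmapped = df_demo[df_demo['ename'].isna()]
print(df_demo)
print(unmapped['name'].tolist())
```

<p>Checking <code>isna()</code> on the new column is a cheap way to spot keys that the mapping table does not cover.</p>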
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<h4 id="map%E5%85%85%E5%BD%93%E8%BF%90%E7%AE%97%E5%B7%A5%E5%85%B7">Using map as a computation tool</h4>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># Compute each person's after-tax salary: the portion above 5000 is taxed at 25%
def after_sal(s):  # s receives each person's salary in turn
    return s - (s - 5000) * 0.25
df['after_sal'] = df['salary'].map(after_sal)
df</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>name</th><th>salary</th><th>ename</th><th>after_sal</th></tr>
</thead>
<tbody>
<tr><th>0</th>
<td>张三</td>
<td>10000</td>
<td>Tom</td>
<td>8750.0</td>
</tr>
<tr><th>1</th>
<td>李四</td>
<td>15000</td>
<td>Jerry</td>
<td>12500.0</td>
</tr>
<tr><th>2</th>
<td>王五</td>
<td>21000</td>
<td>Jay</td>
<td>17000.0</td>
</tr>
<tr><th>3</th>
<td>张三</td>
<td>10000</td>
<td>Tom</td>
<td>8750.0</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
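<p>The function above assumes every salary exceeds 5000; for a lower salary it would compute a negative tax and raise the take-home pay. The variant below is a sketch that assumes income at or below the threshold is simply untaxed. The threshold and rate come from the cell above; the sample salaries and the name <code>after_sal_safe</code> are made up for illustration.</p>

```python
import pandas as pd

THRESHOLD = 5000   # untaxed portion, from the example above
RATE = 0.25        # tax rate on the portion above the threshold

def after_sal_safe(s):
    """Return take-home pay; only the part of s above THRESHOLD is taxed."""
    taxable = max(s - THRESHOLD, 0)   # never negative
    return s - taxable * RATE

salary = pd.Series([3000, 10000, 15000, 21000])  # illustrative values
after = salary.map(after_sal_safe)

# Vectorized equivalent: clip the taxable part at zero instead of calling a function per row
after_vec = salary - (salary - THRESHOLD).clip(lower=0) * RATE
print(after.tolist())
print(after_vec.tolist())
```

<p>On large tables the vectorized form is usually faster than <code>map</code> with a Python function, because the arithmetic runs on whole columns.</p>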
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<h3 id="%E6%8E%92%E5%BA%8F">Sorting</h3>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>data = pd.read_csv('./data/outlier.csv',index_col=0)
data.head()</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>A</th><th>B</th><th>C</th></tr>
</thead>
<tbody>
<tr><th>0</th>
<td>0.794514</td>
<td>0.337913</td>
<td>0.299290</td>
</tr>
<tr><th>1</th>
<td>0.596259</td>
<td>0.512930</td>
<td>0.554369</td>
</tr>
<tr><th>2</th>
<td>0.115003</td>
<td>0.401490</td>
<td>0.669573</td>
</tr>
<tr><th>3</th>
<td>0.773007</td>
<td>0.547263</td>
<td>0.780857</td>
</tr>
<tr><th>4</th>
<td>0.469255</td>
<td>0.316957</td>
<td>0.214900</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>data.sort_values(by='C')  # by default, sorts the rows by column C in ascending order</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>A</th><th>B</th><th>C</th></tr>
</thead>
<tbody>
<tr><th>647</th>
<td>0.102826</td>
<td>0.268895</td>
<td>0.000036</td>
</tr>
<tr><th>521</th>
<td>0.491587</td>
<td>0.767086</td>
<td>0.000680</td>
</tr>
<tr><th>599</th>
<td>0.560323</td>
<td>0.884960</td>
<td>0.001386</td>
</tr>
<tr><th>17</th>
<td>0.475333</td>
<td>0.968809</td>
<td>0.002639</td>
</tr>
<tr><th>717</th>
<td>0.561099</td>
<td>0.596751</td>
<td>0.002810</td>
</tr>
<tr><th>...</th>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr><th>913</th>
<td>0.575918</td>
<td>0.155275</td>
<td>0.995703</td>
</tr>
<tr><th>91</th>
<td>0.914415</td>
<td>0.738960</td>
<td>0.996564</td>
</tr>
<tr><th>273</th>
<td>0.746750</td>
<td>0.470466</td>
<td>0.996640</td>
</tr>
<tr><th>67</th>
<td>0.803291</td>
<td>0.959692</td>
<td>0.996780</td>
</tr>
<tr><th>329</th>
<td>0.728317</td>
<td>0.810622</td>
<td>0.998517</td>
</tr>
</tbody>
</table>
<p>1000 rows × 3 columns</p>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>data.sort_values(by='C', ascending=False)  # sort in descending order</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>A</th><th>B</th><th>C</th></tr>
</thead>
<tbody>
<tr><th>329</th>
<td>0.728317</td>
<td>0.810622</td>
<td>0.998517</td>
</tr>
<tr><th>67</th>
<td>0.803291</td>
<td>0.959692</td>
<td>0.996780</td>
</tr>
<tr><th>273</th>
<td>0.746750</td>
<td>0.470466</td>
<td>0.996640</td>
</tr>
<tr><th>91</th>
<td>0.914415</td>
<td>0.738960</td>
<td>0.996564</td>
</tr>
<tr><th>913</th>
<td>0.575918</td>
<td>0.155275</td>
<td>0.995703</td>
</tr>
<tr><th>...</th>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr><th>717</th>
<td>0.561099</td>
<td>0.596751</td>
<td>0.002810</td>
</tr>
<tr><th>17</th>
<td>0.475333</td>
<td>0.968809</td>
<td>0.002639</td>
</tr>
<tr><th>599</th>
<td>0.560323</td>
<td>0.884960</td>
<td>0.001386</td>
</tr>
<tr><th>521</th>
<td>0.491587</td>
<td>0.767086</td>
<td>0.000680</td>
</tr>
<tr><th>647</th>
<td>0.102826</td>
<td>0.268895</td>
<td>0.000036</td>
</tr>
</tbody>
</table>
<p>1000 rows × 3 columns</p>
</div>
</div>
</div>
</div>
</div>
</div>
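<p>A small sketch of two details the cells above rely on: <code>sort_values</code> returns a new DataFrame and leaves the original untouched, and <code>by</code> accepts a list of columns with a matching list for <code>ascending</code>. The data here are made up for illustration and are not taken from <code>./data/outlier.csv</code>.</p>

```python
import pandas as pd

# Illustrative data with ties in column A
demo = pd.DataFrame({
    'A': [2, 1, 2, 1],
    'B': [0.3, 0.9, 0.7, 0.1],
})

# Sort by A ascending, then break ties by B descending
result = demo.sort_values(by=['A', 'B'], ascending=[True, False])

# The original row labels travel with the rows; reset_index renumbers them
renumbered = result.reset_index(drop=True)

print(result)
print(demo.index.tolist())   # demo itself is unchanged
```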
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># axis=0 sorts the row labels, axis=1 sorts the column labels
data.sort_index(axis=1,ascending=False)</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>C</th><th>B</th><th>A</th></tr>
</thead>
<tbody>
<tr><th>0</th>
<td>0.299290</td>
<td>0.337913</td>
<td>0.794514</td>
</tr>
<tr><th>1</th>
<td>0.554369</td>
<td>0.512930</td>
<td>0.596259</td>
</tr>
<tr><th>2</th>
<td>0.669573</td>
<td>0.401490</td>
<td>0.115003</td>
</tr>
<tr><th>3</th>
<td>0.780857</td>
<td>0.547263</td>
<td>0.773007</td>
</tr>
<tr><th>4</th>
<td>0.214900</td>
<td>0.316957</td>
<td>0.469255</td>
</tr>
<tr><th>...</th>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr><th>995</th>
<td>0.405112</td>
<td>0.042532</td>
<td>0.650119</td>
</tr>
<tr><th>996</th>
<td>0.779764</td>
<td>0.317155</td>
<td>0.704271</td>
</tr>
<tr><th>997</th>
<td>0.152215</td>
<td>0.493625</td>
<td>0.138225</td>
</tr>
<tr><th>998</th>
<td>0.031242</td>
<td>0.763846</td>
<td>0.273130</td>
</tr>
<tr><th>999</th>
<td>0.004224</td>
<td>0.674845</td>
<td>0.536671</td>
</tr>
</tbody>
</table>
<p>1000 rows × 3 columns</p>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># Manually reorder the columns; indices gives the new order (positional indices only)
# axis=0 refers to rows, axis=1 refers to columns
data.take(indices=[1, 0, 2], axis=1)</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>B</th><th>A</th><th>C</th></tr>
</thead>
<tbody>
<tr><th>0</th>
<td>0.337913</td>
<td>0.794514</td>
<td>0.299290</td>
</tr>
<tr><th>1</th>
<td>0.512930</td>
<td>0.596259</td>
<td>0.554369</td>
</tr>
<tr><th>2</th>
<td>0.401490</td>
<td>0.115003</td>
<td>0.669573</td>
</tr>
<tr><th>3</th>
<td>0.547263</td>
<td>0.773007</td>
<td>0.780857</td>
</tr>
<tr><th>4</th>
<td>0.316957</td>
<td>0.469255</td>
<td>0.214900</td>
</tr>
<tr><th>...</th>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr><th>995</th>
<td>0.042532</td>
<td>0.650119</td>
<td>0.405112</td>
</tr>
<tr><th>996</th>
<td>0.317155</td>
<td>0.704271</td>
<td>0.779764</td>
</tr>
<tr><th>997</th>
<td>0.493625</td>
<td>0.138225</td>
<td>0.152215</td>
</tr>
<tr><th>998</th>
<td>0.763846</td>
<td>0.273130</td>
<td>0.031242</td>
</tr>
<tr><th>999</th>
<td>0.674845</td>
<td>0.536671</td>
<td>0.004224</td>
</tr>
</tbody>
</table>
<p>1000 rows × 3 columns</p>
</div>
</div>
</div>
</div>
</div>
</div>
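<p><code>take</code> selects by integer position, so the same call can reorder columns or rows. Below is a minimal sketch on made-up data. The random shuffle with <code>np.random.permutation</code> is an addition for illustration and is not part of the cells above.</p>

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# Positions 1, 0, 2 along axis=1 give the column order B, A, C
reordered = demo.take(indices=[1, 0, 2], axis=1)

# Along axis=0 the positions refer to rows; a random permutation shuffles them
positions = np.random.permutation(len(demo))
shuffled = demo.take(indices=positions, axis=0)

print(reordered.columns.tolist())
print(shuffled)
```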
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<h4 id="map%E8%BF%90%E7%AE%97">Computing with map</h4>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># Compute each person's after-tax salary for the table below: the portion above 3000 is taxed at 50%
# Load the data
df = pd.read_csv('./data/fruits.csv').drop(columns='Unnamed: 0')
df</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>item</th><th>price</th><th>color</th><th>weight</th></tr>
</thead>
<tbody>
<tr><th>0</th>
<td>Apple</td>
<td>4.0</td>
<td>red</td>
<td>12</td>
</tr>
<tr><th>1</th>
<td>Banana</td>
<td>3.0</td>
<td>yellow</td>
<td>20</td>
</tr>
<tr><th>2</th>
<td>Orange</td>
<td>3.0</td>
<td>yellow</td>
<td>50</td>
</tr>
<tr><th>3</th>
<td>Banana</td>
<td>2.5</td>
<td>green</td>
<td>30</td>
</tr>
<tr><th>4</th>
<td>Orange</td>
<td>4.0</td>
<td>green</td>
<td>20</td>
</tr>
<tr><th>5</th>
<td>Apple</td>
<td>2.0</td>
<td>green</td>
<td>44</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<h4 id="%E5%88%86%E7%BB%84%E8%81%9A%E5%90%88">Grouping and aggregation</h4>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<ul>
<li>The core of processing data by category:
<ul>
<li>the groupby() function</li>
<li>the groups attribute, for inspecting how the rows were grouped</li>
</ul>
</li>
</ul>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># Load the data
df = pd.read_csv('./data/fruits.csv').drop(columns='Unnamed: 0')
df</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>item</th><th>price</th><th>color</th><th>weight</th></tr>
</thead>
<tbody>
<tr><th>0</th>
<td>Apple</td>
<td>4.0</td>
<td>red</td>
<td>12</td>
</tr>
<tr><th>1</th>
<td>Banana</td>
<td>3.0</td>
<td>yellow</td>
<td>20</td>
</tr>
<tr><th>2</th>
<td>Orange</td>
<td>3.0</td>
<td>yellow</td>
<td>50</td>
</tr>
<tr><th>3</th>
<td>Banana</td>
<td>2.5</td>
<td>green</td>
<td>30</td>
</tr>
<tr><th>4</th>
<td>Orange</td>
<td>4.0</td>
<td>green</td>
<td>20</td>
</tr>
<tr><th>5</th>
<td>Apple</td>
<td>2.0</td>
<td>green</td>
<td>44</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># Group the data by fruit type
df.groupby(by='item').groups  # after grouping with groupby, use groups to inspect the result</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedText jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>{'Apple': [0, 5], 'Banana': [1, 3], 'Orange': [2, 4]}</code></pre>
</div>
</div>
</div>
</div>
</div>
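<p>To experiment without the CSV file, the table can be rebuilt by hand. The sketch below assumes the six rows shown above and iterates over the groups, which is often easier to read than <code>groups</code> alone. The names <code>fruits</code> and <code>row_labels</code> are introduced here for illustration.</p>

```python
import pandas as pd

# Same six rows as the table above, typed in by hand
fruits = pd.DataFrame({
    'item': ['Apple', 'Banana', 'Orange', 'Banana', 'Orange', 'Apple'],
    'price': [4.0, 3.0, 3.0, 2.5, 4.0, 2.0],
    'color': ['red', 'yellow', 'yellow', 'green', 'green', 'green'],
    'weight': [12, 20, 50, 30, 20, 44],
})

grouped = fruits.groupby(by='item')

# groups maps each key to the row labels that belong to it
row_labels = {key: list(labels) for key, labels in grouped.groups.items()}
print(row_labels)

# Iterating yields (key, sub-DataFrame) pairs
for key, part in grouped:
    print(key, part['price'].tolist())
```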
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># Compute the average price of each fruit
df.groupby(by='item')['price']  # select only the price column of each group
mean_price = df.groupby(by='item')['price'].mean()
mean_price</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedText jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>item
Apple     3.00
Banana    2.75
Orange    3.50
Name: price, dtype: float64</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>mean_price.to_dict()</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedText jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>{'Apple': 3.0, 'Banana': 2.75, 'Orange': 3.5}</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># Add each fruit's average price to the original table
dic = {
    'Apple': 3.00,
    'Banana': 2.75,
    'Orange': 3.50
}
# Equivalent, without retyping the values: dic = mean_price.to_dict()
df['mean_price'] = df['item'].map(dic)
df</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>item</th><th>price</th><th>color</th><th>weight</th><th>mean_price</th></tr>
</thead>
<tbody>
<tr><th>0</th>
<td>Apple</td>
<td>4.0</td>
<td>red</td>
<td>12</td>
<td>3.00</td>
</tr>
<tr><th>1</th>
<td>Banana</td>
<td>3.0</td>
<td>yellow</td>
<td>20</td>
<td>2.75</td>
</tr>
<tr><th>2</th>
<td>Orange</td>
<td>3.0</td>
<td>yellow</td>
<td>50</td>
<td>3.50</td>
</tr>
<tr><th>3</th>
<td>Banana</td>
<td>2.5</td>
<td>green</td>
<td>30</td>
<td>2.75</td>
</tr>
<tr><th>4</th>
<td>Orange</td>
<td>4.0</td>
<td>green</td>
<td>20</td>
<td>3.50</td>
</tr>
<tr><th>5</th>
<td>Apple</td>
<td>2.0</td>
<td>green</td>
<td>44</td>
<td>3.00</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># Compute the maximum weight of the fruit of each color
color_max_weight = df.groupby(by='color')['weight'].max()
color_max_weight</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedText jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/plain">
<pre class="highlighter-hljs"><code>color
green     44
red       12
yellow    50
Name: weight, dtype: int64</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>df['max_weight'] = df['color'].map(color_max_weight.to_dict())
df</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>item</th><th>price</th><th>color</th><th>weight</th><th>mean_price</th><th>max_weight</th></tr>
</thead>
<tbody>
<tr><th>0</th>
<td>Apple</td>
<td>4.0</td>
<td>red</td>
<td>12</td>
<td>3.00</td>
<td>12</td>
</tr>
<tr><th>1</th>
<td>Banana</td>
<td>3.0</td>
<td>yellow</td>
<td>20</td>
<td>2.75</td>
<td>50</td>
</tr>
<tr><th>2</th>
<td>Orange</td>
<td>3.0</td>
<td>yellow</td>
<td>50</td>
<td>3.50</td>
<td>50</td>
</tr>
<tr><th>3</th>
<td>Banana</td>
<td>2.5</td>
<td>green</td>
<td>30</td>
<td>2.75</td>
<td>44</td>
</tr>
<tr><th>4</th>
<td>Orange</td>
<td>4.0</td>
<td>green</td>
<td>20</td>
<td>3.50</td>
<td>44</td>
</tr>
<tr><th>5</th>
<td>Apple</td>
<td>2.0</td>
<td>green</td>
<td>44</td>
<td>3.00</td>
<td>44</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
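<p>The aggregate-then-<code>map</code> pattern above can also be written with <code>transform</code>, which returns a result aligned to the original rows. This is a sketch on the same six rows, rebuilt by hand. <code>transform</code> is offered as an alternative, not as the method used in the cells above.</p>

```python
import pandas as pd

fruits = pd.DataFrame({
    'item': ['Apple', 'Banana', 'Orange', 'Banana', 'Orange', 'Apple'],
    'price': [4.0, 3.0, 3.0, 2.5, 4.0, 2.0],
    'color': ['red', 'yellow', 'yellow', 'green', 'green', 'green'],
    'weight': [12, 20, 50, 30, 20, 44],
})

# Aggregate, then map: a Series can be passed to map directly, no to_dict() needed
mean_price = fruits.groupby(by='item')['price'].mean()
fruits['mean_price'] = fruits['item'].map(mean_price)

# transform broadcasts each group's result back onto its rows in one step
fruits['max_weight'] = fruits.groupby(by='color')['weight'].transform('max')

print(fruits)
```

<p>Both columns match the tables shown above; <code>transform</code> simply skips the intermediate lookup table.</p>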
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<ul>
<li>Apply several different aggregations to the grouped result.</li>
</ul>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># Mean, maximum, and minimum price for each fruit
df.groupby(by='item')['price'].agg(['mean','max','min'])</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>mean</th><th>max</th><th>min</th></tr>
<tr><th>item</th><th> </th><th> </th><th> </th></tr>
</thead>
<tbody>
<tr><th>Apple</th>
<td>3.00</td>
<td>4.0</td>
<td>2.0</td>
</tr>
<tr><th>Banana</th>
<td>2.75</td>
<td>3.0</td>
<td>2.5</td>
</tr>
<tr><th>Orange</th>
<td>3.50</td>
<td>4.0</td>
<td>3.0</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
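<p>The same summary can also be written with named aggregation, which gives each output column an explicit name. This is a minimal sketch: the rows are copied from the fruit table shown earlier, and the output names <code>mean_price</code>, <code>max_price</code>, and <code>min_price</code> are chosen here for illustration.</p>

```python
import pandas as pd

# Toy data copied from the fruit table shown earlier in this section.
df = pd.DataFrame({
    'item': ['Apple', 'Banana', 'Orange', 'Banana', 'Orange', 'Apple'],
    'price': [4.0, 3.0, 3.0, 2.5, 4.0, 2.0],
})

# Named aggregation: keyword = output column name, value = aggregation to apply.
summary = df.groupby('item')['price'].agg(
    mean_price='mean',
    max_price='max',
    min_price='min',
)
print(summary)
```

<p>Named columns are convenient when the result is merged back into another table, because the column names no longer collide with generic names such as <code>mean</code>.</p>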
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
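<h4 id="%E9%80%8F%E8%A7%86%E8%A1%A8">Pivot tables¶</h4>
<p>A pivot table is a table format that rearranges data dynamically and summarizes it by category. Many people have used pivot tables in Excel and know how powerful they are; in pandas the feature is called pivot_table.</p>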
</div>
</div>
</div>
</div>
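<p>To make the idea concrete before loading the course file, here is a minimal sketch on a hypothetical toy table (the column names <code>venue</code>, <code>result</code>, and <code>points</code> are invented for illustration). It shows that a one-key pivot table produces the same numbers as the corresponding groupby.</p>

```python
import pandas as pd

# Hypothetical toy data, not taken from the course files.
games = pd.DataFrame({
    'venue': ['home', 'away', 'home', 'away'],
    'result': ['win', 'win', 'loss', 'win'],
    'points': [30, 25, 18, 27],
})

# index   -> column whose values become the row groups
# values  -> column(s) to aggregate
# aggfunc -> how each group is aggregated
pivot = games.pivot_table(index='venue', values='points', aggfunc='sum')

# The equivalent groupby expression yields the same totals.
grouped = games.groupby('venue')['points'].sum()

print(pivot)
print(grouped)
```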
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>df = pd.read_csv('./data/透视表-篮球赛.csv')
df.head(3)</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>对手</th><th>胜负</th><th>主客场</th><th>命中</th><th>投篮数</th><th>投篮命中率</th><th>3分命中率</th><th>篮板</th><th>助攻</th><th>得分</th></tr>
</thead>
<tbody>
<tr><th>0</th>
<td>勇士</td>
<td>胜</td>
<td>客</td>
<td>10</td>
<td>23</td>
<td>0.435</td>
<td>0.444</td>
<td>6</td>
<td>11</td>
<td>27</td>
</tr>
<tr><th>1</th>
<td>国王</td>
<td>胜</td>
<td>客</td>
<td>8</td>
<td>21</td>
<td>0.381</td>
<td>0.286</td>
<td>3</td>
<td>9</td>
<td>27</td>
</tr>
<tr><th>2</th>
<td>小牛</td>
<td>胜</td>
<td>主</td>
<td>10</td>
<td>19</td>
<td>0.526</td>
<td>0.462</td>
<td>3</td>
<td>7</td>
<td>29</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
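<pre class="highlighter-hljs"><code># Group the rows by the 对手 (opponent) column and average every numeric column in each group.
# Note: on pandas 2.x the text columns (胜负, 主客场) may need to be excluded via values=[...] first.
df.pivot_table(index='对手',aggfunc='mean')</code></pre>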
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>3分命中率</th><th>助攻</th><th>命中</th><th>得分</th><th>投篮命中率</th><th>投篮数</th><th>篮板</th></tr>
<tr><th>对手</th><th> </th><th> </th><th> </th><th> </th><th> </th><th> </th><th> </th></tr>
</thead>
<tbody>
<tr><th>76人</th>
<td>0.33950</td>
<td>10.00</td>
<td>9.0</td>
<td>28.00</td>
<td>0.4405</td>
<td>20.5</td>
<td>3.5</td>
</tr>
<tr><th>勇士</th>
<td>0.44400</td>
<td>11.00</td>
<td>10.0</td>
<td>27.00</td>
<td>0.4350</td>
<td>23.0</td>
<td>6.0</td>
</tr>
<tr><th>国王</th>
<td>0.28600</td>
<td>9.00</td>
<td>8.0</td>
<td>27.00</td>
<td>0.3810</td>
<td>21.0</td>
<td>3.0</td>
</tr>
<tr><th>太阳</th>
<td>0.54500</td>
<td>7.00</td>
<td>12.0</td>
<td>48.00</td>
<td>0.5450</td>
<td>22.0</td>
<td>2.0</td>
</tr>
<tr><th>小牛</th>
<td>0.46200</td>
<td>7.00</td>
<td>10.0</td>
<td>29.00</td>
<td>0.5260</td>
<td>19.0</td>
<td>3.0</td>
</tr>
<tr><th>尼克斯</th>
<td>0.36900</td>
<td>9.50</td>
<td>10.5</td>
<td>34.00</td>
<td>0.4175</td>
<td>25.0</td>
<td>3.5</td>
</tr>
<tr><th>开拓者</th>
<td>0.57100</td>
<td>3.00</td>
<td>16.0</td>
<td>48.00</td>
<td>0.5520</td>
<td>29.0</td>
<td>8.0</td>
</tr>
<tr><th>掘金</th>
<td>0.14300</td>
<td>9.00</td>
<td>6.0</td>
<td>21.00</td>
<td>0.3750</td>
<td>16.0</td>
<td>8.0</td>
</tr>
<tr><th>步行者</th>
<td>0.29150</td>
<td>12.50</td>
<td>8.5</td>
<td>27.50</td>
<td>0.3965</td>
<td>21.5</td>
<td>6.5</td>
</tr>
<tr><th>湖人</th>
<td>0.44400</td>
<td>9.00</td>
<td>13.0</td>
<td>36.00</td>
<td>0.5910</td>
<td>22.0</td>
<td>4.0</td>
</tr>
<tr><th>灰熊</th>
<td>0.35025</td>
<td>7.75</td>
<td>8.5</td>
<td>27.25</td>
<td>0.4015</td>
<td>21.0</td>
<td>4.5</td>
</tr>
<tr><th>爵士</th>
<td>0.60400</td>
<td>8.00</td>
<td>13.5</td>
<td>42.50</td>
<td>0.5905</td>
<td>22.0</td>
<td>3.5</td>
</tr>
<tr><th>猛龙</th>
<td>0.27300</td>
<td>11.00</td>
<td>8.0</td>
<td>38.00</td>
<td>0.3200</td>
<td>25.0</td>
<td>6.0</td>
</tr>
<tr><th>篮网</th>
<td>0.61500</td>
<td>8.00</td>
<td>13.0</td>
<td>37.00</td>
<td>0.6500</td>
<td>20.0</td>
<td>10.0</td>
</tr>
<tr><th>老鹰</th>
<td>0.54500</td>
<td>11.00</td>
<td>8.0</td>
<td>29.00</td>
<td>0.5330</td>
<td>15.0</td>
<td>3.0</td>
</tr>
<tr><th>骑士</th>
<td>0.42900</td>
<td>13.00</td>
<td>8.0</td>
<td>35.00</td>
<td>0.3810</td>
<td>21.0</td>
<td>11.0</td>
</tr>
<tr><th>鹈鹕</th>
<td>0.40000</td>
<td>17.00</td>
<td>8.0</td>
<td>26.00</td>
<td>0.5000</td>
<td>16.0</td>
<td>1.0</td>
</tr>
<tr><th>黄蜂</th>
<td>0.40000</td>
<td>11.00</td>
<td>8.0</td>
<td>27.00</td>
<td>0.4440</td>
<td>18.0</td>
<td>10.0</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
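<p>The output above contains only numeric columns. One way to make that choice explicit, instead of relying on pandas to skip text columns, is to pass the numeric columns through <code>values</code>. A minimal sketch on hypothetical data (the table and its column names are invented for illustration):</p>

```python
import pandas as pd

# Hypothetical toy data; 'result' is a text column for which a mean is not meaningful.
stats = pd.DataFrame({
    'opponent': ['A', 'A', 'B'],
    'result': ['win', 'loss', 'win'],
    'points': [20, 30, 40],
    'rebounds': [5, 7, 9],
})

# Collect the numeric columns and aggregate only those.
numeric_cols = stats.select_dtypes(include='number').columns.tolist()
means = stats.pivot_table(index='opponent', values=numeric_cols, aggfunc='mean')
print(means)
```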
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># Group by 胜负 (win/loss) and sum the 篮板 (rebounds) and 得分 (points) columns
df.pivot_table(index='胜负',values=['篮板','得分'],aggfunc='sum')</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>得分</th><th>篮板</th></tr>
<tr><th>胜负</th><th> </th><th> </th></tr>
</thead>
<tbody>
<tr><th>胜</th>
<td>692</td>
<td>108</td>
</tr>
<tr><th>负</th>
<td>109</td>
<td>19</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># Group by 主客场 (home/away), then take the max of 得分 (points), the mean of 篮板 (rebounds), and the sum of 助攻 (assists)
df.pivot_table(index='主客场',aggfunc={'得分':'max','篮板':'mean','助攻':'sum'})</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>助攻</th><th>得分</th><th>篮板</th></tr>
<tr><th>主客场</th><th> </th><th> </th><th> </th></tr>
</thead>
<tbody>
<tr><th>主</th>
<td>121</td>
<td>56</td>
<td>5.333333</td>
</tr>
<tr><th>客</th>
<td>116</td>
<td>48</td>
<td>4.846154</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
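<p>Passing a dictionary as <code>aggfunc</code> applies a different aggregation to each column. The sketch below repeats the pattern on a small hypothetical table (English column names invented here) so that every number can be checked by hand.</p>

```python
import pandas as pd

# Hypothetical toy data, two home games and two away games.
games = pd.DataFrame({
    'venue': ['home', 'home', 'away', 'away'],
    'points': [30, 18, 25, 27],
    'rebounds': [4, 6, 3, 5],
    'assists': [7, 9, 8, 10],
})

# One aggregation per column: max of points, mean of rebounds, sum of assists.
per_venue = games.pivot_table(
    index='venue',
    aggfunc={'points': 'max', 'rebounds': 'mean', 'assists': 'sum'},
)
print(per_venue)
```

<p>Only the columns named in the dictionary appear in the result, so the dictionary also acts as the column selection.</p>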
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># Total points across all opponents, split by home and away games
df.pivot_table(index='主客场',values='得分',aggfunc='sum')</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>得分</th></tr>
<tr><th>主客场</th><th> </th></tr>
</thead>
<tbody>
<tr><th>主</th>
<td>397</td>
</tr>
<tr><th>客</th>
<td>404</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># Break the home/away point totals down by opponent; combinations with no game show up as NaN
df.pivot_table(index='主客场',values='得分',aggfunc='sum',columns='对手')</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th>对手</th><th>76人</th><th>勇士</th><th>国王</th><th>太阳</th><th>小牛</th><th>尼克斯</th><th>开拓者</th><th>掘金</th><th>步行者</th><th>湖人</th><th>灰熊</th><th>爵士</th><th>猛龙</th><th>篮网</th><th>老鹰</th><th>骑士</th><th>鹈鹕</th><th>黄蜂</th></tr>
<tr><th>主客场</th><th> </th><th> </th><th> </th><th> </th><th> </th><th> </th><th> </th><th> </th><th> </th><th> </th><th> </th><th> </th><th> </th><th> </th><th> </th><th> </th><th> </th><th> </th></tr>
</thead>
<tbody>
<tr><th>主</th>
<td>29.0</td>
<td>NaN</td>
<td>NaN</td>
<td>NaN</td>
<td>29.0</td>
<td>37.0</td>
<td>NaN</td>
<td>21.0</td>
<td>29.0</td>
<td>NaN</td>
<td>60.0</td>
<td>56.0</td>
<td>38.0</td>
<td>37.0</td>
<td>NaN</td>
<td>35.0</td>
<td>26.0</td>
<td>NaN</td>
</tr>
<tr><th>客</th>
<td>27.0</td>
<td>27.0</td>
<td>27.0</td>
<td>48.0</td>
<td>NaN</td>
<td>31.0</td>
<td>48.0</td>
<td>NaN</td>
<td>26.0</td>
<td>36.0</td>
<td>49.0</td>
<td>29.0</td>
<td>NaN</td>
<td>NaN</td>
<td>29.0</td>
<td>NaN</td>
<td>NaN</td>
<td>27.0</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># Same breakdown, with fill_value=0 replacing the NaN for combinations with no game
df.pivot_table(index='主客场',values='得分',aggfunc='sum',columns='对手',fill_value=0)</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th>对手</th><th>76人</th><th>勇士</th><th>国王</th><th>太阳</th><th>小牛</th><th>尼克斯</th><th>开拓者</th><th>掘金</th><th>步行者</th><th>湖人</th><th>灰熊</th><th>爵士</th><th>猛龙</th><th>篮网</th><th>老鹰</th><th>骑士</th><th>鹈鹕</th><th>黄蜂</th></tr>
<tr><th>主客场</th><th> </th><th> </th><th> </th><th> </th><th> </th><th> </th><th> </th><th> </th><th> </th><th> </th><th> </th><th> </th><th> </th><th> </th><th> </th><th> </th><th> </th><th> </th></tr>
</thead>
<tbody>
<tr><th>主</th>
<td>29</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>29</td>
<td>37</td>
<td>0</td>
<td>21</td>
<td>29</td>
<td>0</td>
<td>60</td>
<td>56</td>
<td>38</td>
<td>37</td>
<td>0</td>
<td>35</td>
<td>26</td>
<td>0</td>
</tr>
<tr><th>客</th>
<td>27</td>
<td>27</td>
<td>27</td>
<td>48</td>
<td>0</td>
<td>31</td>
<td>48</td>
<td>0</td>
<td>26</td>
<td>36</td>
<td>49</td>
<td>29</td>
<td>0</td>
<td>0</td>
<td>29</td>
<td>0</td>
<td>0</td>
<td>27</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
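<p>The effect of <code>columns</code> and <code>fill_value</code> is easier to see on a table small enough to check by hand. A minimal sketch on hypothetical data (the column names are invented for illustration):</p>

```python
import pandas as pd

# Hypothetical toy data: there is no home game against opponent 'B'.
games = pd.DataFrame({
    'venue': ['home', 'away', 'away'],
    'opponent': ['A', 'A', 'B'],
    'points': [30, 25, 27],
})

# columns= spreads the opponents across the columns; missing combinations become NaN.
wide = games.pivot_table(index='venue', columns='opponent',
                         values='points', aggfunc='sum')

# fill_value replaces those missing combinations in the result.
filled = games.pivot_table(index='venue', columns='opponent',
                           values='points', aggfunc='sum', fill_value=0)

print(wide)
print(filled)
```

<p>Note that after filling, a 0 no longer distinguishes "no game was played" from "no points were scored", so the unfilled version is worth keeping when that difference matters.</p>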
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># Summarize by several keys at once: home/away first, then opponent
df.pivot_table(index=['主客场','对手'],values='得分',aggfunc='sum')</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th> </th><th>得分</th></tr>
<tr><th>主客场</th><th>对手</th><th> </th></tr>
</thead>
<tbody>
<tr><th rowspan="11" valign="top">主</th><th>76人</th>
<td>29</td>
</tr>
<tr><th>小牛</th>
<td>29</td>
</tr>
<tr><th>尼克斯</th>
<td>37</td>
</tr>
<tr><th>掘金</th>
<td>21</td>
</tr>
<tr><th>步行者</th>
<td>29</td>
</tr>
<tr><th>灰熊</th>
<td>60</td>
</tr>
<tr><th>爵士</th>
<td>56</td>
</tr>
<tr><th>猛龙</th>
<td>38</td>
</tr>
<tr><th>篮网</th>
<td>37</td>
</tr>
<tr><th>骑士</th>
<td>35</td>
</tr>
<tr><th>鹈鹕</th>
<td>26</td>
</tr>
<tr><th rowspan="12" valign="top">客</th><th>76人</th>
<td>27</td>
</tr>
<tr><th>勇士</th>
<td>27</td>
</tr>
<tr><th>国王</th>
<td>27</td>
</tr>
<tr><th>太阳</th>
<td>48</td>
</tr>
<tr><th>尼克斯</th>
<td>31</td>
</tr>
<tr><th>开拓者</th>
<td>48</td>
</tr>
<tr><th>步行者</th>
<td>26</td>
</tr>
<tr><th>湖人</th>
<td>36</td>
</tr>
<tr><th>灰熊</th>
<td>49</td>
</tr>
<tr><th>爵士</th>
<td>29</td>
</tr>
<tr><th>老鹰</th>
<td>29</td>
</tr>
<tr><th>黄蜂</th>
<td>27</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
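<p>A list passed to <code>index</code> produces a result with a two-level row index. The sketch below, on hypothetical data with invented column names, shows how to select one outer group and how to turn the inner level back into columns.</p>

```python
import pandas as pd

# Hypothetical toy data: two home games against 'A', none at home against 'B'.
games = pd.DataFrame({
    'venue': ['home', 'home', 'away', 'away'],
    'opponent': ['A', 'A', 'A', 'B'],
    'points': [30, 10, 25, 27],
})

# Two grouping keys give a MultiIndex on the rows.
result = games.pivot_table(index=['venue', 'opponent'],
                           values='points', aggfunc='sum')

# Select every row of one outer group.
home_only = result.loc['home']

# Move the inner level (opponent) into the columns, filling the gaps with 0.
wide = result['points'].unstack(fill_value=0)

print(result)
print(wide)
```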
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<ul>
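<li>Keyboard shortcuts (Jupyter command mode):
<ul>
<li>Insert a cell: a (above), b (below)</li>
<li>Delete a cell: x (cuts the selected cell)</li>
<li>Run a cell: shift+enter</li>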
</ul>
</li>
</ul>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<h3 id="%E4%BB%8A%E6%97%A5%E9%87%8D%E7%82%B9%EF%BC%9A%E6%95%B0%E6%8D%AE%E6%B8%85%E6%B4%97%E3%80%81map%E6%98%A0%E5%B0%84%E5%92%8Cmap%E5%85%85%E5%BD%93%E8%BF%90%E7%AE%97%E5%B7%A5%E5%85%B7%E3%80%81groupby%E5%88%86%E7%BB%84%E8%81%9A%E5%90%88%E3%80%81pivot_table%E9%80%8F%E8%A7%86">Today's key points: data cleaning, map for value mapping and as a computation tool, groupby aggregation, and pivot_table¶</h3>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In [ ]:</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># PDFs that contain tables: pandas + ...</code></pre>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<h4 id="%E6%89%8B%E6%9C%BA%E9%94%80%E9%87%8F%E5%88%86%E6%9E%90%E6%A1%88%E4%BE%8B">Case study: phone sales analysis¶</h4>
<ul>
<li>Practice the groupby aggregation operations</li>
</ul>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># Load the data
import pandas as pd
data = pd.read_excel('./data/Phone.xlsx')</code></pre>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># Handle missing values</code></pre>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># Total units sold and total revenue per brand, sorted by units sold in descending order</code></pre>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># Sales by month: which months sell best?</code></pre>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># Purchasing power by age group</code></pre>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># Purchasing power by city</code></pre>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># Highest and lowest price of each model within each brand</code></pre>
</div>
</div>
</div>
</div>
</div>
</div>
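<p>The exercise cells above are left empty in the notebook. As a hedged starting point, the sketch below works on a hypothetical stand-in table: the column names <code>brand</code>, <code>units</code>, <code>revenue</code>, and <code>age</code>, as well as the age bins, are assumptions, and the real columns of Phone.xlsx may differ.</p>

```python
import pandas as pd

# Hypothetical stand-in for Phone.xlsx; the real column names may differ.
sales = pd.DataFrame({
    'brand': ['A', 'B', 'A', 'C', 'B'],
    'units': [3, 5, 2, 1, 4],
    'revenue': [3000.0, 4500.0, 2000.0, 1500.0, 3600.0],
    'age': [22, 35, 41, 58, 29],
})

# Total units and revenue per brand, largest unit count first.
by_brand = (
    sales.groupby('brand')[['units', 'revenue']]
    .sum()
    .sort_values('units', ascending=False)
)

# Bin the ages into groups (bins are left-closed: [0, 30), [30, 45), [45, 120)),
# then sum the revenue per group as a rough measure of purchasing power.
sales['age_group'] = pd.cut(
    sales['age'],
    bins=[0, 30, 45, 120],
    labels=['under 30', '30-44', '45+'],
    right=False,
)
by_age = sales.groupby('age_group', observed=True)['revenue'].sum()

print(by_brand)
print(by_age)
```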
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
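<p><strong>US election political donations analysis:</strong></p>
<ul>
<li>Load the data</li>
<li>Inspect the basic information of the data</li>
<li>Keep only the following columns and discard the rest
<ul>
<li>cand_nm: candidate name</li>
<li>contbr_nm: contributor name</li>
<li>contbr_st: contributor's state</li>
<li>contbr_employer: contributor's employer</li>
<li>contbr_occupation: contributor's occupation</li>
<li>contb_receipt_amt: donation amount (USD)</li>
<li>contb_receipt_dt: donation date</li>
</ul>
</li>
<li>Get an overview of the new data and check whether any values are missing</li>
<li>Use summary statistics to describe the numeric columns quickly.</li>
<li>Missing values. Some fields are empty, perhaps because they were not filled in or were withheld; fill them with NOT PROVIDE</li>
<li>Outliers. Drop the rows whose donation amount is less than or equal to 0</li>
<li>Add a new column, party, holding each candidate's party</li>
<li>List the distinct values in the party column</li>
<li>Count how often each value occurs in the party column</li>
<li>Total donations (contb_receipt_amt) received by each party</li>
<li>Total donations (contb_receipt_amt) received by each party on each day</li>
<li>Convert the dates in the table to the 'yyyy-mm-dd' format.</li>
<li>Find out which candidate the contributors whose occupation is DISABLED VETERAN mainly support</li>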
</ul>
</div>
</div>
</div>
</div>
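<p>The two cleaning steps in the list above (filling empty fields with NOT PROVIDE and dropping non-positive amounts) can be sketched as follows. The three rows are hypothetical sample data that only reuse the column names listed above.</p>

```python
import pandas as pd

# Hypothetical sample rows using the column names from the task list.
donations = pd.DataFrame({
    'cand_nm': ['Obama, Barack', 'Romney, Mitt', 'Paul, Ron'],
    'contbr_employer': ['ACME', None, 'SELF'],
    'contbr_occupation': [None, 'ENGINEER', 'DISABLED VETERAN'],
    'contb_receipt_amt': [100.0, -50.0, 25.0],
})

# Check how many values are missing in each column.
print(donations.isnull().sum())

# Fill only the text columns, so a numeric column never receives a string.
text_cols = ['contbr_employer', 'contbr_occupation']
donations[text_cols] = donations[text_cols].fillna('NOT PROVIDE')

# Keep only the rows with a positive donation amount.
donations = donations.loc[donations['contb_receipt_amt'] > 0]
print(donations)
```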
<div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>parties = {
'Bachmann, Michelle': 'Republican',
'Romney, Mitt': 'Republican',
'Obama, Barack': 'Democrat',
"Roemer, Charles E. 'Buddy' III": 'Reform',
'Pawlenty, Timothy': 'Republican',
'Johnson, Gary Earl': 'Libertarian',
'Paul, Ron': 'Republican',
'Santorum, Rick': 'Republican',
'Cain, Herman': 'Republican',
'Gingrich, Newt': 'Republican',
'McCotter, Thaddeus G': 'Republican',
'Huntsman, Jon': 'Republican',
'Perry, Rick': 'Republican'
}
months = {'JAN' : 1, 'FEB' : 2, 'MAR' : 3, 'APR' : 4, 'MAY' : 5, 'JUN' : 6,
'JUL' : 7, 'AUG' : 8, 'SEP' : 9, 'OCT': 10, 'NOV': 11, 'DEC' : 12}</code></pre>
</div>
</div>
</div>
</div>
</div>
</div>
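<p>The <code>parties</code> dictionary is meant to be used with <code>map</code> to derive the party column. A minimal sketch, assuming a three-entry subset of the dictionary and a few hypothetical donation rows:</p>

```python
import pandas as pd

# Subset of the parties dictionary defined above, kept short for the example.
parties = {
    'Obama, Barack': 'Democrat',
    'Romney, Mitt': 'Republican',
    'Paul, Ron': 'Republican',
}

# Hypothetical donation rows.
donations = pd.DataFrame({
    'cand_nm': ['Obama, Barack', 'Romney, Mitt', 'Paul, Ron', 'Obama, Barack'],
    'contb_receipt_amt': [100.0, 250.0, 50.0, 75.0],
})

# map looks every candidate name up in the dictionary.
donations['party'] = donations['cand_nm'].map(parties)

distinct_parties = donations['party'].unique()      # distinct values
party_counts = donations['party'].value_counts()    # rows per party
party_totals = donations.groupby('party')['contb_receipt_amt'].sum()

print(distinct_parties)
print(party_counts)
print(party_totals)
```

<p>A candidate name that is missing from the dictionary maps to NaN, so checking <code>donations['party'].isnull().sum()</code> after the mapping is a quick sanity test.</p>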
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># Load the data: usa_election.txt
df = pd.read_csv('./data/usa_election.txt').drop(columns='Unnamed: 0')
df.head()</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell-outputWrapper">
<div class="jp-Collapser jp-OutputCollapser jp-Cell-outputCollapser"> </div>
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-OutputPrompt jp-OutputArea-prompt">Out:</div>
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"> </div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell ">
<div class="jp-Cell-outputWrapper">
<div class="jp-OutputArea jp-Cell-outputArea">
<div class="jp-OutputArea-child">
<div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html">
<div>
<table class="dataframe" border="1">
<thead>
<tr style="text-align: right"><th> </th><th>cand_nm</th><th>contbr_nm</th><th>contbr_st</th><th>contbr_employer</th><th>contbr_occupation</th><th>contb_receipt_amt</th><th>contb_receipt_dt</th></tr>
</thead>
<tbody>
<tr><th>0</th>
<td>Bachmann, Michelle</td>
<td>HARVEY, WILLIAM</td>
<td>AL</td>
<td>RETIRED</td>
<td>RETIRED</td>
<td>250.0</td>
<td>20-JUN-11</td>
</tr>
<tr><th>1</th>
<td>Bachmann, Michelle</td>
<td>HARVEY, WILLIAM</td>
<td>AL</td>
<td>RETIRED</td>
<td>RETIRED</td>
<td>50.0</td>
<td>23-JUN-11</td>
</tr>
<tr><th>2</th>
<td>Bachmann, Michelle</td>
<td>SMITH, LANIER</td>
<td>AL</td>
<td>INFORMATION REQUESTED</td>
<td>INFORMATION REQUESTED</td>
<td>250.0</td>
<td>05-JUL-11</td>
</tr>
<tr><th>3</th>
<td>Bachmann, Michelle</td>
<td>BLEVINS, DARONDA</td>
<td>AR</td>
<td>NONE</td>
<td>RETIRED</td>
<td>250.0</td>
<td>01-AUG-11</td>
</tr>
<tr><th>4</th>
<td>Bachmann, Michelle</td>
<td>WARDENBURG, HAROLD</td>
<td>AR</td>
<td>NONE</td>
<td>RETIRED</td>
<td>300.0</td>
<td>20-JUN-11</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
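<p>The dates in the output above look like 20-JUN-11. One way to convert them to the 'yyyy-mm-dd' format is to use <code>map</code> as a computation tool together with the <code>months</code> dictionary. In this sketch the helper name <code>to_iso</code> is invented for illustration, and it assumes that every two-digit year belongs to the 2000s.</p>

```python
import pandas as pd

months = {'JAN': 1, 'FEB': 2, 'MAR': 3, 'APR': 4, 'MAY': 5, 'JUN': 6,
          'JUL': 7, 'AUG': 8, 'SEP': 9, 'OCT': 10, 'NOV': 11, 'DEC': 12}

def to_iso(date_str):
    """Convert 'dd-MON-yy' to 'yyyy-mm-dd' (assumes years 2000-2099)."""
    day, mon, year = date_str.split('-')
    return f"20{year}-{months[mon]:02d}-{day}"

# Sample values copied from the contb_receipt_dt column shown above.
dates = pd.Series(['20-JUN-11', '05-JUL-11', '01-AUG-11'])

# map applies the function to every element of the Series.
iso_dates = dates.map(to_iso)
print(iso_dates.tolist())
```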
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<h3 id="Matplotlib%E7%BB%98%E5%9B%BE%E6%93%8D%E4%BD%9C">Plotting with Matplotlib¶</h3>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<ul>
<li>Why plot during data analysis:
<ul>
<li>1. Plots present the conclusions of an analysis visually.</li>
<li>2. Plotting is itself a way to explore the data.</li>
</ul>
</li>
</ul>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif'] = ['PingFang HK']  # macOS: a font that can render Chinese labels
# plt.rcParams['font.sans-serif'] = ['Microsoft YaHei']  # Windows: use the Microsoft YaHei font instead</code></pre>
</div>
</div>
</div>
</div>
</div>
</div>
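<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<p>If the same notebook has to run on both macOS and Windows, the font can be picked from the operating system instead of commenting lines in and out. The cell below is only a sketch: <code>FONT_BY_OS</code> is a hypothetical helper name, the two font names are the ones used above and must actually be installed, and the fallback font has no Chinese glyphs. Setting <code>axes.unicode_minus</code> to <code>False</code> is a common companion setting, because some CJK fonts lack the Unicode minus sign.</p>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>import platform
import matplotlib.pyplot as plt

# Hypothetical mapping from platform.system() to a font name.
# The fonts must be installed on the machine for this to take effect.
FONT_BY_OS = {
    'Darwin': 'PingFang HK',       # macOS
    'Windows': 'Microsoft YaHei',  # Windows
}

# Fall back to Matplotlib's bundled font (no Chinese glyphs) elsewhere.
font = FONT_BY_OS.get(platform.system(), 'DejaVu Sans')
plt.rcParams['font.sans-serif'] = [font]

# Draw minus signs as ASCII hyphens so they render with CJK fonts.
plt.rcParams['axes.unicode_minus'] = False</code></pre>
</div>
</div>
</div>
</div>
</div>
</div>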
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<ul>
<li>Line chart
<ul>
<li>Shows how the data trends as a variable changes.</li>
</ul>
</li>
</ul>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>import pandas as pd
df = pd.read_csv('data/CD_Sale.csv').drop(columns='Unnamed: 0')</code></pre>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># Total number of purchases per month across all users</code></pre>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># Number of distinct buyers per month</code></pre>
</div>
</div>
</div>
</div>
</div>
</div>
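<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<p>The two monthly statistics above could be drawn as line charts roughly as follows. The columns of <code>CD_Sale.csv</code> are not shown in these notes, so this sketch runs on a small synthetic table instead; the names <code>sales</code>, <code>user_id</code>, <code>order_dt</code> and <code>month</code> are assumptions, not the real schema. In a notebook the figure renders when the cell finishes; in a script, call <code>plt.show()</code> at the end.</p>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for the sales table: one row per purchase.
# Column names are assumptions made for this illustration.
sales = pd.DataFrame({
    'user_id': [1, 1, 2, 3, 2, 3, 3, 1],
    'order_dt': pd.to_datetime([
        '1997-01-05', '1997-01-20', '1997-01-11', '1997-02-02',
        '1997-02-15', '1997-02-28', '1997-03-03', '1997-03-09',
    ]),
})

# Label every purchase with its calendar month, e.g. '1997-01'.
sales['month'] = sales['order_dt'].dt.to_period('M').astype(str)

# Total number of purchases per month (rows per month).
orders_per_month = sales.groupby('month')['user_id'].count()
# Number of distinct buyers per month (unique user ids per month).
buyers_per_month = sales.groupby('month')['user_id'].nunique()

fig, ax = plt.subplots()
ax.plot(orders_per_month.index, orders_per_month.values,
        marker='o', label='purchases')
ax.plot(buyers_per_month.index, buyers_per_month.values,
        marker='s', label='distinct buyers')
ax.set_xlabel('month')
ax.set_ylabel('count')
ax.legend()</code></pre>
</div>
</div>
</div>
</div>
</div>
</div>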
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<ul>
<li>Scatter plot
<ul>
<li>Shows the relationship between two numeric variables.</li>
</ul>
</li>
</ul>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># Relationship between each user's total spend and total number of products bought</code></pre>
</div>
</div>
</div>
</div>
</div>
</div>
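<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<p>One possible way to fill in the scatter plot is to sum both quantities per user and plot one against the other. The table below is synthetic, and the column names <code>user_id</code>, <code>order_products</code> and <code>order_amount</code> are assumed for illustration rather than taken from the course data.</p>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>import pandas as pd
import matplotlib.pyplot as plt

# Synthetic purchases; the column names are assumptions.
orders = pd.DataFrame({
    'user_id': [1, 1, 2, 3, 3, 3],
    'order_products': [1, 2, 5, 1, 1, 3],
    'order_amount': [12.0, 25.5, 60.0, 10.0, 11.0, 40.0],
})

# One row per user: total products bought and total amount spent.
per_user = orders.groupby('user_id')[['order_products', 'order_amount']].sum()

fig, ax = plt.subplots()
# Each point is one user.
ax.scatter(per_user['order_products'], per_user['order_amount'])
ax.set_xlabel('total products bought')
ax.set_ylabel('total amount spent')</code></pre>
</div>
</div>
</div>
</div>
</div>
</div>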
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<ul>
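<li>Histogram
<ul>
<li>Shows how a set of values is distributed. The value range is split into adjacent intervals (bins), the count or relative frequency of the values falling in each bin is computed, and each result is drawn as a vertical rectangle; the adjoining rectangles together form the histogram.</li>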
</ul>
</li>
</ul>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code># Distribution of the number of purchases per user</code></pre>
</div>
</div>
</div>
</div>
</div>
</div>
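<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<p>A histogram of purchases per user might look like the sketch below. The data is synthetic and <code>user_id</code> is an assumed column name. The bin edges are written out explicitly so that each bar is centred on a whole number of purchases; that is a choice made for this example, not something the course prescribes.</p>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>import pandas as pd
import matplotlib.pyplot as plt

# Synthetic purchase log: one row per purchase (assumed column name).
purchases = pd.DataFrame({
    'user_id': [1, 1, 1, 2, 3, 3, 4, 5, 5, 5, 5, 6],
})

# Number of purchases made by each user.
times_per_user = purchases.groupby('user_id').size()

fig, ax = plt.subplots()
# Bins 0.5-1.5, 1.5-2.5, ... so each bar covers one integer count.
counts, edges, _ = ax.hist(times_per_user, bins=[0.5, 1.5, 2.5, 3.5, 4.5])
ax.set_xlabel('purchases per user')
ax.set_ylabel('number of users')</code></pre>
</div>
</div>
</div>
</div>
</div>
</div>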
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<ul>
<li>Bar chart:
<ul>
<li>Compares values across different categories or groups.</li>
</ul>
</li>
</ul>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>GDP = ...  # values are missing from the source; supply one number per city
city = ['Beijing','Shanghai','Tianjin','Chongqing']</code></pre>
</div>
</div>
</div>
</div>
</div>
</div>
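<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<p>Since the GDP figures are not given in these notes, the following sketch draws the bar chart with arbitrary placeholder numbers. <code>demo_values</code> is a made-up list used only to give the bars a height; it is not real GDP data and should be replaced with actual figures.</p>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>import matplotlib.pyplot as plt

city = ['Beijing', 'Shanghai', 'Tianjin', 'Chongqing']
# Placeholder heights, NOT real GDP figures: one value per city.
demo_values = [4, 5, 2, 3]

fig, ax = plt.subplots()
# One bar per category; categories go on the x axis.
bars = ax.bar(city, demo_values)
ax.set_xlabel('city')
ax.set_ylabel('value (placeholder units)')</code></pre>
</div>
</div>
</div>
</div>
</div>
</div>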
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<ul>
<li>Pie chart
<ul>
<li>Shows the share of each category or component in the whole, conveying how the data is split proportionally.</li>
</ul>
</li>
</ul>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
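<pre class="highlighter-hljs"><code># A simple pie chart
education = ...  # values are missing from the source; supply one number per label
labels = ['Primary school', 'Junior high school', 'Senior high school', 'University', 'Postgraduate and above']</code></pre>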
</div>
</div>
</div>
</div>
</div>
</div><br><br>
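<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt"> </div>
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
<p>The education counts are likewise missing from these notes, so this sketch of the pie chart uses invented placeholder numbers. <code>demo_counts</code> is hypothetical, and the labels are English renderings of the five education levels. <code>autopct</code> prints each wedge's percentage, and <code>ax.axis('equal')</code> keeps the pie circular.</p>
</div>
</div>
</div>
</div>
<div class="jp-Cell jp-CodeCell jp-Notebook-cell jp-mod-noOutputs">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser"> </div>
<div class="jp-InputArea jp-Cell-inputArea">
<div class="jp-InputPrompt jp-InputArea-prompt">In :</div>
<div class="jp-CodeMirrorEditor jp-Editor jp-InputArea-editor" data-type="inline">
<div class="CodeMirror cm-s-jupyter">
<div class=" highlight hl-ipython3">
<pre class="highlighter-hljs"><code>import matplotlib.pyplot as plt

labels = ['Primary school', 'Junior high school', 'Senior high school',
          'University', 'Postgraduate and above']
# Placeholder counts, NOT real survey data: one value per label.
demo_counts = [10, 20, 30, 25, 15]

fig, ax = plt.subplots()
# Wedge sizes are proportional to each count's share of the total.
wedges = ax.pie(demo_counts, labels=labels, autopct='%.1f%%')[0]
ax.axis('equal')  # equal aspect ratio so the pie is drawn as a circle</code></pre>
</div>
</div>
</div>
</div>
</div>
</div>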
Source: https://www.cnblogs.com/fuminer/p/18823565