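This repository contains a Chinese translation of the Python Data Science Handbook, which keeps the original's Jupyter notebook format. Because the translator's command of the material is limited, the original English text is kept beneath each Chinese passage to avoid misleading readers; the Chinese is only meant to help you read faster. The translation follows the author's style where possible, but it is a free rather than literal translation. As each chapter is finished, its entry in the table of contents below is replaced with the chapter's title; some titles do not translate well into Chinese and keep their English form.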
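The book was written and tested with Python 3.5, and the author notes that it also runs without much trouble under Python 2.7. As translator, I checked the code under Python 2.7 and changed the parts that do not work there (every such change is clearly annotated).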
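The book covers in detail the most important Python libraries for data analysis, data processing, and machine learning: IPython, NumPy, Pandas, Matplotlib, Scikit-Learn, and other related libraries. It assumes some familiarity with the Python language. If you need a general introduction to the language, see A Whirlwind Tour of Python, an introductory Python book aimed at researchers and scientists.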
The table of contents below links to nbviewer:
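- Help and Documentation in IPython
- Keyboard Shortcuts in the IPython Shell
- IPython Magic Commands
- Input and Output History
- IPython and Shell Commands
- Errors and Debugging
- Profiling and Timing Code
- More IPython Resources
- Understanding Data Types in Python
- The Basics of NumPy Arrays
- Computation on NumPy Arrays: Universal Functions
- Aggregations: Min, Max, and Everything In Between
- Computation on Arrays: Broadcasting
- Comparisons, Masks, and Boolean Logic
- Fancy Indexing
- Sorting Arrays
- Structured Data: NumPy's Structured Arrays
- Introducing Pandas Objects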
- Data Indexing and Selection
- Operating on Data in Pandas
- Handling Missing Data
- Hierarchical Indexing
- Combining Datasets: Concat and Append
- Combining Datasets: Merge and Join
- Aggregation and Grouping
- Pivot Tables
- Vectorized String Operations
- Working with Time Series
- High-Performance Pandas: eval() and query()
- Further Resources
- Simple Line Plots
- Simple Scatter Plots
- Visualizing Errors
- Density and Contour Plots
- Histograms, Binnings, and Density
- Customizing Plot Legends
- Customizing Colorbars
- Multiple Subplots
- Text and Annotation
- Customizing Ticks
- Customizing Matplotlib: Configurations and Stylesheets
- Three-Dimensional Plotting in Matplotlib
- Geographic Data with Basemap
- Visualization with Seaborn
- Further Resources
- What Is Machine Learning?
- Introducing Scikit-Learn
- Hyperparameters and Model Validation
- Feature Engineering
- In-Depth: Naive Bayes Classification
- In-Depth: Linear Regression
- In-Depth: Support Vector Machines
- In-Depth: Decision Trees and Random Forests
- In-Depth: Principal Component Analysis
- In-Depth: Manifold Learning
- In-Depth: k-Means Clustering
- In-Depth: Gaussian Mixture Models
- In-Depth: Kernel Density Estimation
- Application: A Face Detection Pipeline
- Further Machine Learning Resources
The code in the book was tested with Python 3.5, though most (but not all) will also work correctly with Python 2.7 and other older Python versions.
The packages I used to run the code in the book are listed in requirements.txt (Note that some of these exact version numbers may not be available on your platform: you may have to tweak them for your own use). To install the requirements using conda, run the following at the command-line:
$ conda install --file requirements.txt
To create a stand-alone environment named PDSH with Python 3.5 and all the required package versions, run the following:
$ conda create -n PDSH python=3.5 --file requirements.txt
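After installing the requirements or creating the PDSH environment, you can use a short script to confirm that the main libraries import cleanly. This is a minimal sketch and is not part of the book's repository. The function name `installed_versions` and the module list are illustrative assumptions. The script only reports the versions it finds and does not compare them against requirements.txt. It uses `str.format` rather than f-strings so that it also runs on Python 3.5 and 2.7.

```python
import importlib
import sys


def installed_versions(module_names):
    """Map each import name to its __version__, "unknown", or None if it cannot be imported.

    Hypothetical helper for illustration; it is not part of the book's code.
    """
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None  # not importable in the active environment
            continue
        # Most scientific packages expose __version__; fall back if one does not.
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    print("Python {}".format(sys.version.split()[0]))
    # These are import names, not conda package names (scikit-learn is imported as sklearn).
    core = ["IPython", "numpy", "pandas", "matplotlib", "sklearn", "seaborn"]
    for name, version in sorted(installed_versions(core).items()):
        print("{:<12} {}".format(name, version if version is not None else "NOT INSTALLED"))
```

Run the script with the PDSH environment active. Any library reported as `NOT INSTALLED` is missing from the interpreter you launched.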
You can read more about using conda environments in the Managing Environments section of the conda documentation.
The code in this repository, including all code samples in the notebooks listed above, is released under the MIT license. Read more at the Open Source Initiative.
The text content of the book is released under the CC-BY-NC-ND license. Read more at Creative Commons.