*This notebook contains an excerpt from the [Python Data Science Handbook](http://shop.oreilly.com/product/0636920034919.do) by Jake VanderPlas; the content is available [on GitHub](https://github.com/jakevdp/PythonDataScienceHandbook).*

The text is released under the CC-BY-NC-ND license, and code is released under the MIT license. If you find this content useful, please consider supporting the work by buying the book!

Profiling and Timing Code

In the process of developing code and creating data processing pipelines, there are often trade-offs you can make between various implementations. Early in developing your algorithm, it can be counterproductive to worry about such things. As Donald Knuth famously quipped, "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil."

But once you have your code working, it can be useful to dig into its efficiency a bit. Sometimes you want to check the execution time of a given command or set of commands; other times you want to dig into a multiline process and determine where the bottleneck lies in some complicated series of operations. IPython provides access to a wide array of functionality for this kind of timing and profiling of code. Here we'll discuss the following IPython magic commands:

%time: Time the execution of a single statement
%timeit: Time repeated execution of a single statement for more accuracy
%prun: Run code with the profiler
%lprun: Run code with the line-by-line profiler
%memit: Measure the memory use of a single statement
%mprun: Run code with the line-by-line memory profiler

The last three commands (%lprun, %memit, and %mprun) are not bundled with IPython; you'll need to get the line_profiler and memory_profiler extensions, which we will discuss in the following sections.

Timing Code Snippets: `%timeit` and `%time`

We saw the %timeit line magic and %%timeit cell magic in the introduction to magic functions in IPython Magic Commands; they can be used to time the repeated execution of snippets of code:

In [1]:

%timeit sum(range(100))

100000 loops, best of 3: 1.54 µs per loop

Note that because this operation is so fast, %timeit automatically does a large number of repetitions. For slower commands, %timeit will automatically adjust and perform fewer repetitions:

In [2]:

%%timeit
total = 0
for i in range(1000):
    for j in range(1000):
        total += i * (-1) ** j

1 loops, best of 3: 407 ms per loop
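Outside IPython, the standard-library timeit module offers a comparable auto-scaling loop. The following is a minimal sketch of the idea, not a reproduction of what %timeit does internally; the helper name `best_time_per_loop` and the second, slower statement are hypothetical choices made for illustration.

```python
import timeit

def best_time_per_loop(stmt, repeat=3):
    """Return (loops, best seconds per loop) for a statement string."""
    timer = timeit.Timer(stmt)
    # autorange() picks a loop count large enough for the total run
    # to take at least 0.2 seconds, so fast statements get many loops.
    loops, _ = timer.autorange()
    # Repeat the whole measurement and keep the fastest run.
    best = min(timer.repeat(repeat=repeat, number=loops))
    return loops, best / loops

fast_loops, fast_time = best_time_per_loop("sum(range(100))")
slow_loops, slow_time = best_time_per_loop("sum(i * i for i in range(200000))")

print(f"fast: {fast_loops} loops, {fast_time * 1e6:.2f} µs per loop")
print(f"slow: {slow_loops} loops, {slow_time * 1e3:.2f} ms per loop")
```

The fast statement ends up with far more loops than the slow one, which mirrors the difference between the two %timeit outputs above.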

Sometimes repeating an operation is not the best option. For example, if we have a list that we'd like to sort, we might be misled by a repeated operation. The first call sorts the list in place, so every later repetition sorts an already sorted list, which is much faster than sorting an unsorted one. The repetition therefore skews the result:

In [3]:

import random
L = [random.random() for i in range(100000)]
%timeit L.sort()

100 loops, best of 3: 1.9 ms per loop
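One way to see the size of this skew is to give every timed run a fresh copy of the data. The sketch below uses the standard-library timeit module rather than the %timeit magic, and the names `base` and `presorted` are introduced here purely for illustration.

```python
import random
import timeit

base = [random.random() for _ in range(100000)]

# With number=1, the setup statement runs before every timed run,
# so each sort sees a fresh, unsorted copy of the data.
unsorted_times = timeit.repeat(
    "data.sort()",
    setup="data = base.copy()",
    globals={"base": base},
    repeat=5,
    number=1,
)

# For comparison, time sorting a copy of an already sorted list.
presorted = sorted(base)
presorted_times = timeit.repeat(
    "data.sort()",
    setup="data = presorted.copy()",
    globals={"presorted": presorted},
    repeat=5,
    number=1,
)

print(f"unsorted:  {min(unsorted_times) * 1e3:.2f} ms")
print(f"presorted: {min(presorted_times) * 1e3:.2f} ms")
```

The copy happens in the setup statement, so it is not included in the measured time.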

For cases like this, the %time magic function, which runs the statement only once, may be a better choice. It is also a good choice for longer-running commands, when short, system-related delays are unlikely to affect the result. Let's time the sorting of an unsorted and a presorted list:

In [4]:

import random
L = [random.random() for i in range(100000)]
print("sorting an unsorted list:")
%time L.sort()

sorting an unsorted list:
CPU times: user 40.6 ms, sys: 896 µs, total: 41.5 ms
Wall time: 41.5 ms

In [5]:

print("sorting an already sorted list:")
%time L.sort()

sorting an already sorted list:
CPU times: user 8.18 ms, sys: 10 µs, total: 8.19 ms
Wall time: 8.24 ms

Notice how much faster the presorted list is to sort, but notice also how much longer the timing takes with %time versus %timeit, even for the presorted list! This is a result of the fact that %timeit does some clever things under the hood to prevent system calls from interfering with the timing. For example, it prevents cleanup of unused Python objects (known as garbage collection) which might otherwise affect the timing. For this reason, %timeit results are usually noticeably faster than %time results.
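The garbage-collection behavior can be observed directly with the standard-library timeit module. This sketch assumes that %timeit behaves like that module in this respect; the list name `observed` is hypothetical.

```python
import gc
import timeit

observed = []

# Default behavior: timeit turns the garbage collector off while timing.
timeit.timeit(
    "observed.append(gc.isenabled())",
    globals={"observed": observed, "gc": gc},
    number=1,
)

# Re-enable collection inside the timed run via the setup statement.
timeit.timeit(
    "observed.append(gc.isenabled())",
    setup="gc.enable()",
    globals={"observed": observed, "gc": gc},
    number=1,
)

print(observed)  # → [False, True]
```

Passing `setup="gc.enable()"` is useful when garbage collection is an important part of the cost of the code being measured.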

For %time as with %timeit, using the double-percent-sign cell magic syntax allows timing of multiline scripts:

In [6]:

%%time
total = 0
for i in range(1000):
    for j in range(1000):
        total += i * (-1) ** j

CPU times: user 504 ms, sys: 979 µs, total: 505 ms
Wall time: 505 ms
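The two figures that %%time reports, CPU time and wall time, can be approximated in plain Python with the standard-library time module. This is a rough sketch of the distinction, not the code IPython itself runs; unlike %%time, it does not split CPU time into user and system components.

```python
import time

wall_start = time.perf_counter()   # wall-clock time
cpu_start = time.process_time()    # user + system CPU time of this process

total = 0
for i in range(1000):
    for j in range(1000):
        total += i * (-1) ** j

cpu_elapsed = time.process_time() - cpu_start
wall_elapsed = time.perf_counter() - wall_start

print(f"CPU time:  {cpu_elapsed * 1e3:.0f} ms")
print(f"Wall time: {wall_elapsed * 1e3:.0f} ms")
```

For CPU-bound code like this loop the two numbers are close; they diverge when the code sleeps or waits on I/O.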

For more information on %time and %timeit, as well as their available options, use the IPython help functionality (e.g., type %time? at the IPython prompt).

Profiling Full Scripts: `%prun`

A program is made of many single statements, and sometimes timing these statements in context is more important than timing them on their own. Python contains a built-in code profiler (which you can read about in the Python documentation), but IPython offers a much more convenient way to use this profiler, in the form of the magic function %prun.

By way of example, we'll define a simple function that does some calculations:

In [1]:

def sum_of_lists(N):
    total = 0
    for i in range(5):
        L = [j ^ (j >> i) for j in range(N)]
        total += sum(L)
    return total

Now we can call %prun with a function call to see the profiled results:

In [2]:

%prun sum_of_lists(1000000)

In the notebook, the output is printed to the pager, and looks something like this:

14 function calls in 0.714 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        5    0.599    0.120    0.599    0.120 <ipython-input-19>:4(<listcomp>)
        5    0.064    0.013    0.064    0.013 {built-in method sum}
        1    0.036    0.036    0.699    0.699 <ipython-input-19>:1(sum_of_lists)
        1    0.014    0.014    0.714    0.714 <string>:1(<module>)
        1    0.000    0.000    0.714    0.714 {built-in method exec}

The result is a table sorted by internal time (the tottime column, which excludes time spent in sub-calls), showing where the execution is spending the most time. In this case, the bulk of execution time is in the list comprehension inside sum_of_lists. From here, we could start thinking about what changes we might make to improve the performance of the algorithm.
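A similar table can be produced outside IPython with the standard-library cProfile and pstats modules. The sketch below is one plausible way to do it, not necessarily how %prun is implemented; it uses a smaller N than the example above to keep the run short. On Python 3.12 and later, where comprehensions are inlined, the separate `<listcomp>` row may not appear and its time is attributed to sum_of_lists itself.

```python
import cProfile
import io
import pstats

def sum_of_lists(N):
    total = 0
    for i in range(5):
        L = [j ^ (j >> i) for j in range(N)]
        total += sum(L)
    return total

profiler = cProfile.Profile()
profiler.enable()
result = sum_of_lists(100000)
profiler.disable()

# Sort by internal time ("tottime"), as in the %prun table,
# and keep only the five most expensive entries.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("tottime").print_stats(5)
report = buffer.getvalue()
print(report)
```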

For more information on %prun, as well as its available options, use the IPython help functionality (e.g., type %prun? at the IPython prompt).

Line-By-Line Profiling with `%lprun`

The function-by-function profiling of %prun is useful, but sometimes it's more convenient to have a line-by-line profile report. This is not built into Python or IPython, but there is a line_profiler package available for installation that can do this. Start by using Python's packaging tool, pip, to install the line_profiler package:

$ pip install line_profiler

Next, you can use IPython to load the line_profiler IPython extension, offered as part of this package:

In [9]:

%load_ext line_profiler

Now the %lprun command will do a line-by-line profiling of any function. We need to tell it explicitly, with the -f option, which functions we're interested in profiling:

In [10]:

%lprun -f sum_of_lists sum_of_lists(5000)

As before, the notebook sends the result to the pager, but it looks something like this:

Timer unit: 1e-06 s

Total time: 0.009382 s
File: <ipython-input-19-fa2be176cc3e>
Function: sum_of_lists at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     1                                           def sum_of_lists(N):
     2         1            2      2.0      0.0      total = 0
     3         6            8      1.3      0.1      for i in range(5):
     4         5         9001   1800.2     95.9          L = [j ^ (j >> i) for j in range(N)]
     5         5          371     74.2      4.0          total += sum(L)
     6         1            0      0.0      0.0      return total

The information at the top gives us the key to reading the results: the time is reported in microseconds, and the % Time column shows where the program is spending the most time. Here the list comprehension on line 4 accounts for about 96% of the total. At this point, we may be able to use this information to modify aspects of the script and make it perform better for our desired use case.

For more information on %lprun, as well as its available options, use the IPython help functionality (e.g., type %lprun? at the IPython prompt).

Profiling Memory Use: `%memit` and `%mprun`

Another aspect of profiling is the amount of memory an operation uses. This can be evaluated with another IPython extension, the memory_profiler. As with the line_profiler, we start by pip-installing the extension:

$ pip install memory_profiler

Then we can use IPython to load the extension:

In [12]:

%load_ext memory_profiler

The memory profiler extension contains two useful magic functions: the %memit magic (which offers a memory-measuring equivalent of %timeit) and the %mprun function (which offers a memory-measuring equivalent of %lprun). The %memit function can be used rather simply:

In [13]:

%memit sum_of_lists(1000000)

peak memory: 100.08 MiB, increment: 61.36 MiB

We see that the process reaches a peak of about 100 MiB of memory while running this function, roughly 61 MiB above where it started.
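If memory_profiler is not available, the standard-library tracemalloc module gives a rough alternative. Note that this is a different technique: tracemalloc counts memory allocated by Python objects, not the total memory of the process, so its numbers will not match the %memit output. The sketch uses a smaller N than the example above to keep the run short.

```python
import tracemalloc

def sum_of_lists(N):
    total = 0
    for i in range(5):
        L = [j ^ (j >> i) for j in range(N)]
        total += sum(L)
    return total

tracemalloc.start()
result = sum_of_lists(100000)
# get_traced_memory() returns (current, peak) in bytes since start().
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"current: {current / 2**20:.2f} MiB, peak: {peak / 2**20:.2f} MiB")
```

The peak is much larger than the current value because the temporary lists are released by the time the function returns.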

For a line-by-line description of memory use, we can use the %mprun magic. Unfortunately, this magic works only for functions defined in separate modules rather than the notebook itself, so we'll start by using the %%file magic to create a simple module called mprun_demo.py. It contains our sum_of_lists function with one addition, an explicit del L, that will make our memory profiling results clearer:

In [14]:

%%file mprun_demo.py
def sum_of_lists(N):
    total = 0
    for i in range(5):
        L = [j ^ (j >> i) for j in range(N)]
        total += sum(L)
        del L # remove reference to L
    return total

Overwriting mprun_demo.py

We can now import the new version of this function and run the memory line profiler:

In [15]:

from mprun_demo import sum_of_lists
%mprun -f sum_of_lists sum_of_lists(1000000)

The result, printed to the pager, gives us a summary of the memory use of the function, and looks something like this:

Filename: ./mprun_demo.py

Line #    Mem usage    Increment   Line Contents
================================================
     4     71.9 MiB      0.0 MiB           L = [j ^ (j >> i) for j in range(N)]


Filename: ./mprun_demo.py

Line #    Mem usage    Increment   Line Contents
================================================
     1     39.0 MiB      0.0 MiB   def sum_of_lists(N):
     2     39.0 MiB      0.0 MiB       total = 0
     3     46.5 MiB      7.5 MiB       for i in range(5):
     4     71.9 MiB     25.4 MiB           L = [j ^ (j >> i) for j in range(N)]
     5     71.9 MiB      0.0 MiB           total += sum(L)
     6     46.5 MiB    -25.4 MiB           del L # remove reference to L
     7     39.1 MiB     -7.4 MiB       return total

Here the Increment column tells us how much each line affects the total memory budget: observe that creating the list L adds about 25 MiB of memory usage, and deleting it releases the same amount. This is on top of the background memory usage from the Python interpreter itself.
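The effect of creating and deleting a large list can also be checked with the standard-library tracemalloc module. This is a sketch using a different measurement technique from %mprun (Python-level allocations rather than process memory), so the absolute numbers will differ; the variable names `baseline`, `with_list`, and `after_del` are hypothetical.

```python
import tracemalloc

tracemalloc.start()
baseline, _ = tracemalloc.get_traced_memory()

L = [j ^ (j >> 1) for j in range(200000)]
with_list, _ = tracemalloc.get_traced_memory()

del L  # remove the only reference, so the list can be freed
after_del, _ = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"added by creating L:   {(with_list - baseline) / 2**20:.2f} MiB")
print(f"released by deleting L: {(with_list - after_del) / 2**20:.2f} MiB")
```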

For more information on %memit and %mprun, as well as their available options, use the IPython help functionality (i.e., type %memit? at the IPython prompt).
