The text is released under the CC-BY-NC-ND license, and code is released under the MIT license. If you find this content useful, please consider supporting the work by buying the book!
In the process of developing code and creating data processing pipelines, there are often trade-offs you can make between various implementations. Early in developing your algorithm, it can be counterproductive to worry about such things. As Donald Knuth famously quipped, "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil."
But once you have your code working, it can be useful to dig into its efficiency a bit. Sometimes it's useful to check the execution time of a given command or set of commands; other times it's useful to dig into a multiline process and determine where the bottleneck lies in some complicated series of operations. IPython provides access to a wide array of functionality for this kind of timing and profiling of code. Here we'll discuss the following IPython magic commands:
%time: Time the execution of a single statement
%timeit: Time repeated execution of a single statement for more accuracy
%prun: Run code with the profiler
%lprun: Run code with the line-by-line profiler
%memit: Measure the memory use of a single statement
%mprun: Run code with the line-by-line memory profiler
The last three commands are not bundled with IPython. %lprun comes from the line_profiler extension, and %memit and %mprun come from the memory_profiler extension. We will discuss both extensions in the following sections.
Timing Code Snippets: %timeit and %time
We saw the %timeit line magic and the %%timeit cell magic in the introduction to magic functions in IPython Magic Commands. They can be used to time the repeated execution of snippets of code:
%timeit sum(range(100))
100000 loops, best of 3: 1.54 µs per loop
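IPython's %timeit is built on the standard library's timeit module, which you can also call directly in a script. The sketch below approximates the "best of 3" report above using timeit.repeat. Each trial runs the statement `number` times. The fastest trial is then divided by `number` to give a per-loop time. Here the loop and trial counts are fixed by hand, whereas %timeit chooses them itself.

```python
import timeit

# Each trial runs the statement `number` times; repeat the trial `repeat` times.
number, repeat = 100_000, 3
trials = timeit.repeat("sum(range(100))", number=number, repeat=repeat)

# Keep the fastest trial (least disturbed by other activity) and scale per loop.
best_per_loop = min(trials) / number
print(f"{number} loops, best of {repeat}: {best_per_loop * 1e6:.2f} µs per loop")
```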
Note that because this operation is so fast, %timeit automatically does a large number of repetitions. For slower commands, %timeit will automatically adjust and perform fewer repetitions:
%%timeit
total = 0
for i in range(1000):
    for j in range(1000):
        total += i * (-1) ** j
1 loops, best of 3: 407 ms per loop
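The standard library's Timer.autorange() method (Python 3.6+) uses the same idea of adapting the loop count to the speed of the statement. It tries 1, 2, 5, 10, 20, 50, ... loops and stops once a batch takes at least 0.2 seconds. IPython's own heuristic may differ in detail. The sketch applies it to sum(range(100)) and to a reduced 300 x 300 version of the nested loop above. The loop is smaller only to keep the run short.

```python
import timeit

fast = timeit.Timer("sum(range(100))")
slow = timeit.Timer(
    "total = 0\n"
    "for i in range(300):\n"
    "    for j in range(300):\n"
    "        total += i * (-1) ** j"
)

# autorange() returns (number_of_loops, seconds_for_that_batch).
fast_loops, fast_time = fast.autorange()
slow_loops, slow_time = slow.autorange()
print(f"fast statement: {fast_loops} loops in {fast_time:.3f} s")
print(f"slow statement: {slow_loops} loops in {slow_time:.3f} s")
```

The slow statement ends up with far fewer loops, just as %timeit reported "1 loops" for the full nested loop.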
Sometimes repeating an operation is not the best option. For example, if we have a list that we'd like to sort, we might be misled by a repeated operation. Sorting a pre-sorted list is much faster than sorting an unsorted list, so the repetition will skew the result:
import random
L = [random.random() for i in range(100000)]
%timeit L.sort()
100 loops, best of 3: 1.9 ms per loop
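If you still want repeated measurements, one option with the standard library's timeit module is to build the list in the setup code. timeit runs the setup again before every trial, so with number=1 each trial sorts exactly one newly generated list. This is a sketch for comparison only; it is not what %timeit does by default.

```python
import timeit

setup_fresh = "import random; L = [random.random() for i in range(100000)]"
setup_sorted = setup_fresh + "; L.sort()"

# number=1: each trial sorts once; setup re-creates L before every trial.
fresh = timeit.repeat("L.sort()", setup=setup_fresh, number=1, repeat=5)
presorted = timeit.repeat("L.sort()", setup=setup_sorted, number=1, repeat=5)
print(f"unsorted input:  best {min(fresh) * 1e3:.2f} ms")
print(f"presorted input: best {min(presorted) * 1e3:.2f} ms")
```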
For this, the %time magic function may be a better choice. It is also a good choice for longer-running commands, where short, system-related delays are unlikely to affect the result. Let's time the sorting of an unsorted and a presorted list:
import random
L = [random.random() for i in range(100000)]
print("sorting an unsorted list:")
%time L.sort()
sorting an unsorted list:
CPU times: user 40.6 ms, sys: 896 µs, total: 41.5 ms
Wall time: 41.5 ms
print("sorting an already sorted list:")
%time L.sort()
sorting an already sorted list:
CPU times: user 8.18 ms, sys: 10 µs, total: 8.19 ms
Wall time: 8.24 ms
Notice how much faster the presorted list is to sort. Notice also how much longer the timing is with %time than with %timeit, even for the presorted list (8.24 ms of wall time versus 1.9 ms per loop)! This is because %timeit does some clever things under the hood to prevent system calls from interfering with the timing. For example, it prevents cleanup of unused Python objects (known as garbage collection) that might otherwise affect the timing. For this reason, %timeit results are usually noticeably faster than %time results.
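You can observe this behaviour directly in the standard library's timeit module, on which IPython's %timeit is built. It switches the garbage collector off while the timed statement runs. The sketch below records gc.isenabled() from inside the timed statement. As the timeit documentation suggests, passing setup="gc.enable()" turns collection back on when you want it included in the measurement.

```python
import gc
import timeit

# The timed statement appends the collector's state to a list we can inspect.
seen_default = []
timeit.timeit("seen.append(gc.isenabled())",
              globals={"seen": seen_default, "gc": gc}, number=1)

# setup runs inside the timed function after timeit disables GC, re-enabling it.
seen_with_gc = []
timeit.timeit("seen.append(gc.isenabled())", setup="gc.enable()",
              globals={"seen": seen_with_gc, "gc": gc}, number=1)

print(seen_default)  # [False]
print(seen_with_gc)  # [True]
```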
For %time, as with %timeit, using the double-percent-sign cell magic syntax allows timing of multiline scripts:
%%time
total = 0
for i in range(1000):
    for j in range(1000):
        total += i * (-1) ** j
CPU times: user 504 ms, sys: 979 µs, total: 505 ms
Wall time: 505 ms
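%time reports two kinds of figures: CPU time (user plus system) and wall-clock time. Both can be approximated with the standard library. time.process_time() returns the CPU time of the current process, and time.perf_counter() is a high-resolution wall clock. This sketch is not necessarily how %time measures them internally. The sleep is there to show that waiting adds wall time but almost no CPU time.

```python
import time

wall_start = time.perf_counter()
cpu_start = time.process_time()

time.sleep(0.2)                           # waiting: counts toward wall time only
total = sum(i * i for i in range(10**6))  # computing: counts toward both

cpu = time.process_time() - cpu_start
wall = time.perf_counter() - wall_start
print(f"CPU time: {cpu * 1e3:.1f} ms, Wall time: {wall * 1e3:.1f} ms")
```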
For more information on %time and %timeit, as well as their available options, use the IPython help functionality (e.g., type %time? at the IPython prompt).
Profiling Full Scripts: %prun
A program is made of many single statements, and sometimes timing these statements in context is more important than timing them on their own. Python contains a built-in code profiler (the cProfile module, which you can read about in the Python documentation). IPython offers a much more convenient way to use this profiler in the form of the magic function %prun.
By way of example, we'll define a simple function that does some calculations:
def sum_of_lists(N):
    total = 0
    for i in range(5):
        L = [j ^ (j >> i) for j in range(N)]
        total += sum(L)
    return total
Now we can call %prun with a function call to see the profiled results:
%prun sum_of_lists(1000000)
In the notebook, the output is printed to the pager, and looks something like this:
         14 function calls in 0.714 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        5    0.599    0.120    0.599    0.120 <ipython-input-19>:4(<listcomp>)
        5    0.064    0.013    0.064    0.013 {built-in method sum}
        1    0.036    0.036    0.699    0.699 <ipython-input-19>:1(sum_of_lists)
        1    0.014    0.014    0.714    0.714 <string>:1(<module>)
        1    0.000    0.000    0.714    0.714 {built-in method exec}
The result is a table that shows where the execution spends the most time. It is sorted by the internal time spent in each function (the tottime column). In this case, the bulk of the execution time (0.599 of 0.714 seconds) is in the list comprehension inside sum_of_lists. From here, we could start thinking about what changes we might make to improve the performance of the algorithm.
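%prun drives Python's built-in profiler for you. Outside IPython, you can produce a roughly equivalent report with cProfile and pstats directly. The sketch below profiles a copy of sum_of_lists and prints the five most expensive entries, sorted by internal time as in the table above. On Python 3.12 and later, list comprehensions are inlined into the enclosing function. The <listcomp> row therefore does not appear, and its time is charged to sum_of_lists instead.

```python
import cProfile
import io
import pstats


def sum_of_lists(N):
    total = 0
    for i in range(5):
        L = [j ^ (j >> i) for j in range(N)]
        total += sum(L)
    return total


profiler = cProfile.Profile()
profiler.enable()
result = sum_of_lists(1_000_000)
profiler.disable()

# Write the report into a string, sorted by internal time ("tottime"),
# keeping only the five most expensive entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("tottime").print_stats(5)
report = stream.getvalue()
print(report)
```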
For more information on %prun, as well as its available options, use the IPython help functionality (e.g., type %prun? at the IPython prompt).
Line-by-Line Profiling with %lprun
The function-by-function profiling of %prun is useful, but sometimes it's more convenient to have a line-by-line profile report. This is not built into Python or IPython, but the line_profiler package, available for installation, can do this. Start by using Python's packaging tool, pip, to install the line_profiler package:
$ pip install line_profiler
Next, you can use IPython to load the line_profiler IPython extension, offered as part of this package:
%load_ext line_profiler
Now the %lprun command will do line-by-line profiling of any function. We need to use the -f option to tell it explicitly which functions we're interested in profiling:
%lprun -f sum_of_lists sum_of_lists(5000)
As before, the notebook sends the result to the pager, but it looks something like this:
Timer unit: 1e-06 s

Total time: 0.009382 s
File: <ipython-input-19-fa2be176cc3e>
Function: sum_of_lists at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     1                                           def sum_of_lists(N):
     2         1            2      2.0      0.0      total = 0
     3         6            8      1.3      0.1      for i in range(5):
     4         5         9001   1800.2     95.9          L = [j ^ (j >> i) for j in range(N)]
     5         5          371     74.2      4.0          total += sum(L)
     6         1            0      0.0      0.0      return total
The information at the top gives us the key to reading the results: the time is reported in microseconds. We can also see where the program spends the most time; here, 95.9% of it goes to the list comprehension on line 4. At this point, we may be able to use this information to modify aspects of the script and make it perform better for our desired use case.
For more information on %lprun, as well as its available options, use the IPython help functionality (e.g., type %lprun? at the IPython prompt).
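The idea behind a line-by-line report can be sketched in pure Python with sys.settrace. The sketch takes a timestamp at every "line" event in the profiled function's frame and charges the elapsed time to the line that was running. The helper time_lines below is hypothetical and written only for illustration. It adds heavy overhead, ignores recursion and threads, and is not how line_profiler is implemented, so use line_profiler for real measurements. Calls made by the traced function, such as sum, are not traced themselves. Their time is charged to the calling line, as in the report above.

```python
import sys
import time
from collections import defaultdict


def sum_of_lists(N):
    total = 0
    for i in range(5):
        L = [j ^ (j >> i) for j in range(N)]
        total += sum(L)
    return total


def time_lines(func, *args):
    """Hypothetical helper: run func(*args) and return (result, seconds),
    where seconds maps absolute line numbers in func to wall time spent."""
    target = func.__code__
    seconds = defaultdict(float)
    current = {"line": None, "start": 0.0}

    def local_tracer(frame, event, arg):
        now = time.perf_counter()
        if current["line"] is not None:
            # Charge the time since the previous event to the line that ran.
            seconds[current["line"]] += now - current["start"]
        # A "line" event starts timing a new line; "return"/"exception" stop it.
        current["line"] = frame.f_lineno if event == "line" else None
        current["start"] = time.perf_counter()
        return local_tracer

    def global_tracer(frame, event, arg):
        # Trace only frames executing func itself, not the functions it calls.
        return local_tracer if frame.f_code is target else None

    sys.settrace(global_tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)
    return result, dict(seconds)


result, seconds = time_lines(sum_of_lists, 50_000)
first = sum_of_lists.__code__.co_firstlineno
total_time = sum(seconds.values())
for lineno in sorted(seconds):
    share = 100 * seconds[lineno] / total_time
    print(f"line {lineno - first + 1}: {seconds[lineno] * 1e3:8.2f} ms {share:5.1f} %")
```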
Profiling Memory Use: %memit and %mprun
Another aspect of profiling is the amount of memory an operation uses. This can be evaluated with another IPython extension, the memory_profiler. As with the line_profiler, we start by pip-installing the extension:
$ pip install memory_profiler
Then we can use IPython to load the extension:
%load_ext memory_profiler
The memory profiler extension contains two useful magic functions. The %memit magic offers a memory-measuring equivalent of %timeit, and the %mprun magic offers a memory-measuring equivalent of %lprun. The %memit function can be used rather simply:
%memit sum_of_lists(1000000)
peak memory: 100.08 MiB, increment: 61.36 MiB
We see that the peak memory use of the process during this call is about 100 MB, about 61 MB more than before the call. For a line-by-line description of memory use, we can use the %mprun magic. Unfortunately, this magic works only for functions defined in separate modules rather than in the notebook itself. So we'll start by using the %%file magic to create a simple module called mprun_demo.py. It contains our sum_of_lists function, with one addition that will make our memory profiling results clearer:
%%file mprun_demo.py
def sum_of_lists(N):
    total = 0
    for i in range(5):
        L = [j ^ (j >> i) for j in range(N)]
        total += sum(L)
        del L  # remove reference to L
    return total
Overwriting mprun_demo.py
We can now import the new version of this function and run the memory line profiler:
from mprun_demo import sum_of_lists
%mprun -f sum_of_lists sum_of_lists(1000000)
The result, printed to the pager, gives us a summary of the memory use of the function, and looks something like this:
Filename: ./mprun_demo.py

Line #    Mem usage    Increment   Line Contents
================================================
     4     71.9 MiB      0.0 MiB           L = [j ^ (j >> i) for j in range(N)]

Filename: ./mprun_demo.py

Line #    Mem usage    Increment   Line Contents
================================================
     1     39.0 MiB      0.0 MiB   def sum_of_lists(N):
     2     39.0 MiB      0.0 MiB       total = 0
     3     46.5 MiB      7.5 MiB       for i in range(5):
     4     71.9 MiB     25.4 MiB           L = [j ^ (j >> i) for j in range(N)]
     5     71.9 MiB      0.0 MiB           total += sum(L)
     6     46.5 MiB    -25.4 MiB           del L # remove reference to L
     7     39.1 MiB     -7.4 MiB       return total
Here the Increment column tells us how much each line affects the total memory budget. Creating the list L adds about 25 MB of memory usage, and deleting it releases the same amount. This is on top of the background memory usage from the Python interpreter itself. For more information on %memit and %mprun, as well as their available options, use the IPython help functionality (e.g., type %memit? at the IPython prompt).
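You can also check the effect of the del L line with the standard library's tracemalloc module. tracemalloc counts only memory allocated through Python's allocators, not the process-level figures shown above, so its absolute numbers will differ. The sketch compares the peak traced memory of two hypothetical variants of sum_of_lists. keep_list keeps the previous list alive while the next one is being built, and drop_list deletes it first, as in mprun_demo.py. The helper peak_mib is also hypothetical.

```python
import tracemalloc


def keep_list(N):
    """Hypothetical variant without `del L`: the old list lives until L is rebound."""
    total = 0
    for i in range(5):
        L = [j ^ (j >> i) for j in range(N)]
        total += sum(L)
    return total


def drop_list(N):
    """Hypothetical variant with `del L`, as in mprun_demo.py."""
    total = 0
    for i in range(5):
        L = [j ^ (j >> i) for j in range(N)]
        total += sum(L)
        del L  # remove reference to L
    return total


def peak_mib(func, *args):
    """Hypothetical helper: peak traced memory (MiB) while func(*args) runs."""
    tracemalloc.start()
    try:
        func(*args)
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    finally:
        tracemalloc.stop()
    return peak / 2**20


N = 100_000
keep_peak = peak_mib(keep_list, N)
drop_peak = peak_mib(drop_list, N)
print(f"without del L: peak {keep_peak:.1f} MiB")
print(f"with del L:    peak {drop_peak:.1f} MiB")
```

Without del L, two lists briefly coexist while the next one is built, so the peak should be noticeably higher.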