
*This notebook contains an excerpt from the [Python Data Science Handbook](http://shop.oreilly.com/product/0636920034919.do) by Jake VanderPlas; the content is available [on GitHub](https://github.com/jakevdp/PythonDataScienceHandbook).*

The text is released under the CC-BY-NC-ND license, and code is released under the MIT license. If you find this content useful, please consider supporting the work by buying the book!


Computation on Arrays: Broadcasting

We saw in the previous section how NumPy's universal functions can be used to vectorize operations and thereby remove slow Python loops. Another means of vectorizing operations is to use NumPy's broadcasting functionality. Broadcasting is a set of rules for applying binary ufuncs (e.g., addition, subtraction, multiplication) to arrays of different shapes.

Introducing Broadcasting

Recall that for arrays of the same size, binary operations are performed on an element-by-element basis:

In [1]:

import numpy as np

In [2]:

a = np.array([0, 1, 2])
b = np.array([5, 5, 5])
a + b

Out[2]:

array([5, 6, 7])

Broadcasting allows these types of binary operations to be performed on arrays of different sizes. For example, we can just as easily add a scalar (think of it as a zero-dimensional array) to an array:

In [3]:

a + 5

Out[3]:

array([5, 6, 7])

We can think of this as an operation that stretches or duplicates the value 5 into the array [5, 5, 5] and then adds the two arrays. The advantage of NumPy's broadcasting is that this duplication of values does not actually take place, but it is a useful mental model as we think about broadcasting.

We can similarly extend this to arrays of higher dimension. Observe the result when we add a one-dimensional array to a two-dimensional array:

In [4]:

M = np.ones((3, 3))
M

Out[4]:

array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])

In [5]:

M + a

Out[5]:

array([[ 1.,  2.,  3.],
       [ 1.,  2.,  3.],
       [ 1.,  2.,  3.]])

Here the one-dimensional array a is stretched, or broadcast, along the first dimension (its single row is repeated down the rows) in order to match the shape of M.

While these examples are relatively easy to understand, more complicated cases can involve broadcasting of both arrays. Consider the following example:

In [6]:

a = np.arange(3)
b = np.arange(3)[:, np.newaxis]

print(a)
print(b)

[0 1 2]
[[0]
 [1]
 [2]]

In [7]:

a + b

Out[7]:

array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4]])

Just as before we stretched or broadcast one value to match the shape of the other, here we've stretched both a and b to match a common shape, and the result is a two-dimensional array. The geometry of these examples is visualized in the following figure (code to produce this plot can be found in the appendix, and is adapted from source published in the astroML documentation; used by permission).

[Figure: Broadcasting Visual]

The light boxes represent the broadcast values: again, this extra memory is not actually allocated in the course of the operation, but it can be useful conceptually to imagine that it is.
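The statement that no extra memory is allocated can be checked directly. The following is a minimal sketch, assuming `np.broadcast_to` as a way to build the stretched array explicitly; ordinary arithmetic never requires calling it, and it is used here only for inspection:

```python
import numpy as np

a = np.arange(3)

# Explicitly build the "stretched" (3, 3) version of a as a read-only view
stretched = np.broadcast_to(a, (3, 3))

print(stretched.shape)                 # (3, 3)
print(stretched.strides[0])            # 0: every row reuses the same memory
print(np.shares_memory(stretched, a))  # True: no copy was made
```

A stride of zero along the first axis means that stepping from one row to the next does not move in memory at all, which is how the repeated rows exist without being stored.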

Rules of Broadcasting

Broadcasting in NumPy follows a strict set of rules to determine the interaction between the two arrays:

Rule 1: If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading (left) side.
Rule 2: If the shapes of the two arrays do not match in some dimension, the array whose size is 1 in that dimension is stretched to match the other array's size.
Rule 3: If in any dimension the sizes disagree and neither is equal to 1, an error is raised.
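The three rules can be written out as a short function that operates on shape tuples alone. This is only an illustrative sketch: `broadcast_shape` is a hypothetical helper written for this explanation, not a NumPy function, and its result is cross-checked against the shape NumPy itself reports.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the three broadcasting rules to two shapes."""
    ndim = max(len(shape_a), len(shape_b))
    # Rule 1: pad the shorter shape with ones on its left side
    shape_a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    shape_b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    for size_a, size_b in zip(shape_a, shape_b):
        if size_a == size_b or size_b == 1:
            result.append(size_a)      # sizes agree, or Rule 2 stretches b
        elif size_a == 1:
            result.append(size_b)      # Rule 2 stretches a
        else:
            # Rule 3: sizes disagree and neither is 1
            raise ValueError(f"incompatible shapes {shape_a} and {shape_b}")
    return tuple(result)

print(broadcast_shape((2, 3), (3,)))   # (2, 3)
print(broadcast_shape((3, 1), (3,)))   # (3, 3)

# Cross-check against NumPy's own result for the same shapes
print(np.broadcast(np.empty((3, 1)), np.empty((3,))).shape)  # (3, 3)
```

Calling `broadcast_shape((3, 2), (3,))` raises a `ValueError`, because after padding the trailing sizes are 2 and 3 and neither is 1.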

To make these rules clear, let's consider a few examples in detail.

Broadcasting example 1

Let's look at adding a two-dimensional array to a one-dimensional array:

In [8]:

M = np.ones((2, 3))
a = np.arange(3)

Let's consider an operation on these two arrays. The shapes of the arrays are:

M.shape = (2, 3)
a.shape = (3,)

We see by rule 1 that the array a has fewer dimensions, so we pad it on the left with ones:

M.shape -> (2, 3)
a.shape -> (1, 3)

By rule 2, we now see that the first dimension disagrees, so we stretch this dimension of a to match:

M.shape -> (2, 3)
a.shape -> (2, 3)

The shapes match, and we see that the final shape will be (2, 3):

In [9]:

M + a

Out[9]:

array([[ 1.,  2.,  3.],
       [ 1.,  2.,  3.]])

Broadcasting example 2

Let's take a look at an example where both arrays need to be broadcast:

In [10]:

a = np.arange(3).reshape((3, 1))
b = np.arange(3)

Again, we'll start by writing out the shapes of the arrays:

a.shape = (3, 1)
b.shape = (3,)

Rule 1 says we must pad the shape of b on the left with ones:

a.shape -> (3, 1)
b.shape -> (1, 3)

And rule 2 tells us that we stretch each of these ones to match the corresponding size of the other array:

a.shape -> (3, 3)
b.shape -> (3, 3)

Because the resulting shapes match, the arrays are compatible. We can see this here:

In [11]:

a + b

Out[11]:

array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4]])
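To see the two stretched arrays from this example side by side, one option is `np.broadcast_arrays`, which returns both inputs expanded to their common shape. This is a small sketch for inspection only; the addition itself does not need it.

```python
import numpy as np

a = np.arange(3).reshape((3, 1))
b = np.arange(3)

# Expand both inputs to the common (3, 3) shape (as views, not copies)
A, B = np.broadcast_arrays(a, b)

print(A.shape, B.shape)   # (3, 3) (3, 3)
print(A)                  # each column repeats a's single column
print(B)                  # each row repeats b
print(np.array_equal(A + B, a + b))   # True
```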

Broadcasting example 3

Now let's take a look at an example in which the two arrays are not compatible:

In [10]:

M = np.ones((3, 2))
a = np.arange(3)

This is just a slightly different situation than in the first example: the matrix M is transposed. How does this affect the calculation? The shapes of the arrays are:

M.shape = (3, 2)
a.shape = (3,)

Again, rule 1 tells us that we must pad the shape of a on the left with ones:

M.shape -> (3, 2)
a.shape -> (1, 3)

By rule 2, the first dimension of a is stretched to match that of M:

M.shape -> (3, 2)
a.shape -> (3, 3)

Now we hit rule 3: the final shapes do not match (the second dimension has size 2 in M and size 3 in a, and neither is 1), so these two arrays are incompatible, as we can observe by attempting this operation:

In [11]:

M + a

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-11-9e16e9f98da6> in <module>()
----> 1 M + a

ValueError: operands could not be broadcast together with shapes (3,2) (3,)

Note the potential confusion here: you could imagine making a and M compatible by, say, padding a's shape with ones on the right rather than the left. But this is not how the broadcasting rules work! That sort of flexibility might be useful in some cases, but it would lead to potential areas of ambiguity. If right-side padding is what you'd like, you can do this explicitly by reshaping the array (we'll use the np.newaxis keyword introduced in The Basics of NumPy Arrays):

In [12]:

a[:, np.newaxis].shape

Out[12]:

(3, 1)

In [13]:

M + a[:, np.newaxis]

Out[13]:

array([[ 1.,  1.],
       [ 2.,  2.],
       [ 3.,  3.]])
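Indexing with `np.newaxis` is one of several ways to add the trailing axis. As a brief sketch, the following compares it with two alternatives, `reshape` and `np.expand_dims`; which one reads best is a matter of taste, and all three produce the same (3, 1) column here.

```python
import numpy as np

M = np.ones((3, 2))
a = np.arange(3)

col_newaxis = a[:, np.newaxis]          # the form used above
col_reshape = a.reshape(-1, 1)          # -1 lets NumPy infer the length
col_expand = np.expand_dims(a, axis=1)  # insert a new axis at position 1

print(col_newaxis.shape, col_reshape.shape, col_expand.shape)
# (3, 1) (3, 1) (3, 1)

print(np.array_equal(M + col_reshape, M + col_newaxis))  # True
```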

Also note that while we've been focusing on the + operator here, these broadcasting rules apply to any binary ufunc. For example, here is the logaddexp(a, b) function, which computes log(exp(a) + exp(b)) with more precision than the naive approach:

In [14]:

np.logaddexp(M, a[:, np.newaxis])

Out[14]:

array([[ 1.31326169,  1.31326169],
       [ 1.69314718,  1.69314718],
       [ 2.31326169,  2.31326169]])
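The precision advantage of `logaddexp` is easiest to see with large negative inputs. The sketch below uses an illustrative value of -1000 (chosen here, not taken from the text), for which `exp` underflows to zero in double precision:

```python
import numpy as np

x = -1000.0

# Naive approach: exp(-1000) underflows to 0.0, so the log evaluates to -inf
with np.errstate(divide="ignore", under="ignore"):
    naive = np.log(np.exp(x) + np.exp(x))

stable = np.logaddexp(x, x)   # mathematically x + log(2)

print(naive)    # -inf
print(stable)   # approximately -999.3069
```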

For more information on the many available universal functions, refer to Computation on NumPy Arrays: Universal Functions.

Broadcasting in Practice

Broadcasting operations form the core of many examples we'll see throughout this book. We'll now take a look at a couple of simple examples of where they can be useful.

Centering an array

In the previous section, we saw that ufuncs allow a NumPy user to remove the need to explicitly write slow Python loops. Broadcasting extends this ability. One commonly seen example is centering an array of data. Imagine you have an array of 10 observations, each of which consists of 3 values. Using the standard convention (see Data Representation in Scikit-Learn), we'll store this in a $10 \times 3$ array:

In [15]:

X = np.random.random((10, 3))

We can compute the mean of each feature using the mean aggregate across the first dimension:

In [16]:

Xmean = X.mean(0)
Xmean

Out[16]:

array([ 0.37574859,  0.58518436,  0.46515223])

And now we can center the X array by subtracting the mean (this is a broadcasting operation):

In [19]:

X_centered = X - Xmean

To double-check that we've done this correctly, we can check that the centered array has near-zero mean:

In [20]:

X_centered.mean(0)

Out[20]:

array([  2.22044605e-17,  -7.77156117e-17,  -1.66533454e-17])

To within machine precision, the mean is now zero.
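Centering along the other axis is a common variation in which the broadcasting rules matter. Subtracting `X.mean(1)`, of shape (10,), from a (10, 3) array would fail under rule 3, since the trailing sizes are 3 and 10. A minimal sketch of one way around this, assuming a seeded generator for reproducibility, is to keep the reduced axis with `keepdims=True`:

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded generator (an assumption, for reproducibility)
X = rng.random((10, 3))

# Per-row means: keepdims=True keeps shape (10, 1) instead of (10,)
row_mean = X.mean(axis=1, keepdims=True)
X_row_centered = X - row_mean    # (10, 3) - (10, 1) broadcasts across columns

print(row_mean.shape)                                # (10, 1)
print(np.allclose(X_row_centered.mean(axis=1), 0))   # True
```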

Plotting a two-dimensional function

One place that broadcasting is very useful is in displaying images based on two-dimensional functions. If we want to define a function $z = f(x, y)$, broadcasting can be used to compute the function across the grid:

In [18]:

# x and y have 50 steps from 0 to 5
x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 50)[:, np.newaxis]

z = np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)

We'll use Matplotlib to plot this two-dimensional array (these tools will be discussed in full in Density and Contour Plots):

In [19]:

%matplotlib inline
import matplotlib.pyplot as plt

In [23]:

plt.imshow(z, origin='lower', extent=[0, 5, 0, 5],
           cmap='viridis')
plt.colorbar();

The result is a compelling visualization of the two-dimensional function.
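For comparison, the same grid of values can be computed from explicit coordinate arrays built with `np.meshgrid`. The sketch below wraps the expression in a hypothetical helper `f` (a name introduced here for illustration) and checks that the two approaches agree; broadcasting avoids constructing the two full coordinate grids.

```python
import numpy as np

x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 50)

def f(x, y):
    """Hypothetical helper wrapping the expression used for z above."""
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)

# Broadcasting: (50,) with (50, 1) gives a (50, 50) result
z_broadcast = f(x, y[:, np.newaxis])

# Explicit grids: meshgrid builds two full (50, 50) coordinate arrays
X, Y = np.meshgrid(x, y)
z_meshgrid = f(X, Y)

print(z_broadcast.shape)                     # (50, 50)
print(np.allclose(z_broadcast, z_meshgrid))  # True
```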
