NumPy

The NumPy and Pandas libraries are central to data science in Python. NumPy allows for the efficient analysis and processing of data arrays with varying sizes, shapes, and number of dimensions while Pandas allows for reading in and working with data tables. In this section, we will focus on NumPy.

After working through this module you will be able to:

  1. describe and use NumPy data types.
  2. describe the data type, size, shape, and number of dimensions in an array.
  3. create, reshape, and slice NumPy arrays.
  4. perform numeric and comparison operations on arrays.

Creating NumPy Arrays

The NumPy library allows for creating and working with arrays. It is particularly valuable when you want to perform mathematical operations on values stored in arrays. It is also fast and memory efficient because its core routines are implemented in C and every element in an array shares a single data type. As mentioned in the prior modules, arrays are similar to lists in that they store a series of values or elements. However, arrays can be expanded to include many dimensions. For example, an image could be represented as an array with 3 dimensions: height, width, and channels. If you work with deep learning, tensors are the primary data model used to read and manipulate data and are essentially multidimensional arrays that can be stored in RAM or within GPU memory for faster computation. In short, array-based calculations and manipulations are essential to data science, so NumPy is an important library to learn if you work in the Python environment.

The complete documentation for NumPy can be found at https://numpy.org/doc/.

Before you can use NumPy, you must make sure that it is installed into your Anaconda environment, as demonstrated in the set-up module. Once NumPy is installed, you will need to import it in order to use it in your code. It is common to assign NumPy an alias name of "np" to simplify your code.

import numpy as np

Lists can be converted to NumPy arrays using the np.array() function. Once the list is converted to an array, its type is numpy.ndarray, which indicates that it is a NumPy array. Since this array has only one dimension, it is called a vector.

lst1 = [3, 6, 7, 8, 9]
arr1 = np.array(lst1)
print(type(lst1))
print(type(arr1))
print(arr1)
<class 'list'>
<class 'numpy.ndarray'>
[3 6 7 8 9]
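
As a minimal sketch of why this conversion matters, the snippet below contrasts how a list and an array respond to the same operator. The variable names (values, arr, doubled_list, doubled_arr) are illustrative only and are not part of the example above.

```python
import numpy as np

values = [3, 6, 7, 8, 9]  # illustrative data, same values as lst1 above
arr = np.array(values)

# Multiplying a list by an integer repeats the list
doubled_list = values * 2

# Multiplying an array by an integer multiplies every element
doubled_arr = arr * 2

print(doubled_list)
print(doubled_arr)
```

The list grows to ten items while the array keeps its five elements, each doubled. This element-wise behavior is what makes arrays convenient for numeric work.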

A two dimensional array is known as a matrix. In the example below, I am generating a matrix array from a list of lists.

lst2 = [[3, 6, 7, 8, 9], [3, 6, 7, 8, 9], [3, 6, 7, 8, 9]]
arr2 = np.array(lst2)
print(arr2)
[[3 6 7 8 9]
 [3 6 7 8 9]
 [3 6 7 8 9]]

Again, one of the powerful advantages of NumPy arrays is the ability to store data in arrays with many dimensions. In the example below, I am creating a three dimensional array from a list of lists of lists. This would be similar to an image with dimensions image height, image width, and image channels (for example, red, green, and blue). A four dimensional array could represent a time series (height, width, channels, and time) or a video containing multiple frames (frame height, frame width, channels, and frame number).

lst3 = [[[3, 6, 7, 8, 9], [3, 6, 7, 8, 9], [3, 6, 7, 8, 9]], 
         [[3, 6, 7, 8, 9], [3, 6, 7, 8, 9], [3, 6, 7, 8, 9]], 
         [[3, 6, 7, 8, 9], [3, 6, 7, 8, 9], [3, 6, 7, 8, 9]]]
arr3 = np.array(lst3)
print(arr3)
[[[3 6 7 8 9]
  [3 6 7 8 9]
  [3 6 7 8 9]]

 [[3 6 7 8 9]
  [3 6 7 8 9]
  [3 6 7 8 9]]

 [[3 6 7 8 9]
  [3 6 7 8 9]
  [3 6 7 8 9]]]
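
To make the image analogy concrete, here is a small sketch that builds an array shaped like a tiny image. The image size (4 by 6 pixels), the channel order (red, green, blue), and the variable names are assumptions made for illustration. The index notation used to set one channel is covered later in this section.

```python
import numpy as np

# Hypothetical 4 x 6 pixel image with 3 channels (red, green, blue)
height, width, channels = 4, 6, 3
image = np.zeros((height, width, channels), dtype="uint8")

# Set the red channel (index 0 of the last dimension) to its maximum value
image[:, :, 0] = 255

print(image.shape)   # lengths of the height, width, and channel dimensions
print(image.ndim)    # number of dimensions
print(image[0, 0])   # the three channel values of the top-left pixel
```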

The cell below provides some examples of NumPy functions for generating arrays.

The arange() function returns an array of evenly spaced values and accepts start, stop, step, and data type parameters. In the example, I have created an array of evenly spaced values from 0 to 100 with a step size of 5. I used 101 as opposed to 100 as the stop value since the stop value itself is not included. I explicitly define the data type as integer, but NumPy can infer a data type if one is not provided.

The linspace() function is similar to arange(); however, a number of samples is specified as opposed to a step size, and the stop value is included by default. In the example, since 5 samples are requested, 5 evenly spaced values between 0 and 100 are returned.

The ones() function returns an array of 1s. In the example, I have generated a three-dimensional array where the first dimension has a length of 3, the second a length of 4, and the third a length of 4. The shape of the array is specified using a tuple.

Similar to ones(), zeros() generates an array of zeros.

It is also possible to generate random values between 0 and 1 (random.rand()) and a specified number of random integers between a lower bound and an upper bound, where the upper bound itself is excluded (random.randint()).

arr4 = np.arange(0, 101, 5, dtype="int")
print(arr4)

arr5 = np.linspace(0, 100, 5, dtype="int")
print(arr5)

arr6 = np.ones((3, 4, 4))
print(arr6)

arr7 = np.zeros((3, 4, 2))
print(arr7)

arr8 = np.random.rand(3, 4, 5)
print(arr8)

arr9 = np.random.randint(1, 200, 7)
print(arr9)
[  0   5  10  15  20  25  30  35  40  45  50  55  60  65  70  75  80  85
  90  95 100]
[  0  25  50  75 100]
[[[1. 1. 1. 1.]
  [1. 1. 1. 1.]
  [1. 1. 1. 1.]
  [1. 1. 1. 1.]]

 [[1. 1. 1. 1.]
  [1. 1. 1. 1.]
  [1. 1. 1. 1.]
  [1. 1. 1. 1.]]

 [[1. 1. 1. 1.]
  [1. 1. 1. 1.]
  [1. 1. 1. 1.]
  [1. 1. 1. 1.]]]
[[[0. 0.]
  [0. 0.]
  [0. 0.]
  [0. 0.]]

 [[0. 0.]
  [0. 0.]
  [0. 0.]
  [0. 0.]]

 [[0. 0.]
  [0. 0.]
  [0. 0.]
  [0. 0.]]]
[[[0.31120543 0.90775916 0.14999824 0.9939034  0.94277863]
  [0.53387805 0.4711267  0.84033979 0.86675965 0.31429746]
  [0.5859617  0.32628012 0.15007969 0.79497275 0.25216347]
  [0.73599019 0.40059352 0.48377569 0.10046393 0.88812258]]

 [[0.13383815 0.10238955 0.58979035 0.68648671 0.08067008]
  [0.17418803 0.71397615 0.70455953 0.8254091  0.26178698]
  [0.30589628 0.59390863 0.72168191 0.04365877 0.52859414]
  [0.43429996 0.49355565 0.43217276 0.37844727 0.03878966]]

 [[0.79762046 0.22412081 0.72762178 0.30752609 0.53454478]
  [0.74531978 0.5899111  0.28801858 0.86724007 0.90602446]
  [0.65379145 0.43549822 0.82364164 0.51252386 0.34444615]
  [0.42689812 0.87619397 0.32682973 0.90393277 0.94198736]]]
[ 60  19  79 193  58 169  24]
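
The sketch below highlights two details from the examples above: how arange() and linspace() treat the stop value, and how random values can be made repeatable. Note that it uses np.random.default_rng() and its integers() method, which is a different interface from the np.random.randint() call shown above. The seed value of 42 and the variable names are arbitrary choices for illustration.

```python
import numpy as np

# arange() excludes the stop value, so 100 is not returned here
a = np.arange(0, 100, 25)

# linspace() includes the stop value by default
b = np.linspace(0, 100, 5)

print(a)
print(b)

# Two generators created with the same seed produce the same random values
rng1 = np.random.default_rng(42)
rng2 = np.random.default_rng(42)
r1 = rng1.integers(1, 200, 7)  # 7 integers from 1 up to, but not including, 200
r2 = rng2.integers(1, 200, 7)
print(np.array_equal(r1, r2))
```

Seeding is useful when you want someone else to be able to reproduce your results exactly.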

NumPy Data Types

NumPy provides additional and more specific data types in comparison to base Python. Here, I provide a brief explanation of commonly used data types.

  • bool_: Boolean True or False
  • int8: 8-bit signed integer (-128 to 127)
  • int16: 16-bit signed integer (-32,768 to 32,767)
  • int32: 32-bit signed integer (-2,147,483,648 to 2,147,483,647)
  • int64: 64-bit signed integer (-9,223,372,036,854,775,808 to 9,223,372,036,854,775,807)
  • uint8: 8-bit unsigned integer (0 to 255)
  • uint16: 16-bit unsigned integer (0 to 65,535)
  • uint32: 32-bit unsigned integer (0 to 4,294,967,295)
  • uint64: 64-bit unsigned integer (0 to 18,446,744,073,709,551,615)
  • float16: half precision float
  • float32: single precision float
  • float64: double precision float

Signed integers can differentiate positive and negative values while unsigned integers cannot. Float data can store decimal values while integer data cannot. There are also data types for complex numbers, which we will not discuss here.
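
The ranges listed above can be checked directly with np.iinfo(). The sketch below also shows what can happen when a result does not fit in the chosen data type. The values 200 and 100 and the variable names are arbitrary choices for illustration.

```python
import numpy as np

# np.iinfo() reports the range of an integer data type
info8 = np.iinfo(np.int8)
infou8 = np.iinfo(np.uint8)
print(info8.min, info8.max)
print(infou8.min, infou8.max)

# Array arithmetic that exceeds the range wraps around rather than raising an error
x = np.array([200], dtype="uint8")
y = np.array([100], dtype="uint8")
total = x + y  # 300 does not fit in uint8, so the result is 300 - 256 = 44
print(total)
```

This wrap-around is silent for array operations, so it is worth confirming that a data type can hold the largest value you expect before choosing it.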

Below I have demonstrated how to define the data type with the dtype parameter. In all cases I am using np.ones() to create an array with three elements. For both int8 and int16, 1 is returned as an integer value. When the data type is float16, 1 is returned as a float value (1.). Lastly, when the type is set to bool_, Boolean True is returned since 1 indicates True and 0 indicates False. Note that the data type affects the amount of memory needed. For example, an int8 array requires half the memory of an int16 array with the same number of elements. I generally try to use the data type that can represent the data with the least amount of memory unless a specific data type is needed in an analysis.

arr1 = np.ones((3), dtype="int8")
print(arr1)
print(arr1.dtype)

arr1 = np.ones((3), dtype="int16")
print(arr1)
print(arr1.dtype)

arr1 = np.ones((3), dtype="float16")
print(arr1)
print(arr1.dtype)

arr1 = np.ones((3), dtype="bool_")
print(arr1)
print(arr1.dtype)
[1 1 1]
int8
[1 1 1]
int16
[1. 1. 1.]
float16
[ True  True  True]
bool
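
To illustrate the memory point, the following sketch compares the storage used by arrays that hold the same number of elements with different data types. The array length of 1,000 and the variable names are arbitrary choices for illustration.

```python
import numpy as np

n = 1000  # illustrative number of elements
small = np.ones(n, dtype="int8")
medium = np.ones(n, dtype="int16")
large = np.ones(n, dtype="float64")

# itemsize is the number of bytes per element; nbytes is the total for the array
print(small.itemsize, small.nbytes)
print(medium.itemsize, medium.nbytes)
print(large.itemsize, large.nbytes)

# astype() returns a new array converted to the requested data type
converted = large.astype("float32")
print(converted.dtype, converted.nbytes)
```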

Understanding and Manipulating Array Shape and Dimensions

Let's spend some time discussing the dimensions and shape of an array. The shape of an array describes the length of each dimension. The len() function returns the length of the first dimension only (in this case 3). To obtain a tuple of the lengths of all dimensions, use the shape property. The array generated below has three dimensions with lengths of 3, 4, and 4, respectively. The number of dimensions is returned with the ndim property. The size property returns the total number of elements in the array. There are 48 elements in the example array: 3 × 4 × 4 = 48. The dtype property provides the data type.

arr6 = np.ones((3, 4, 4))

print("Length of first dimension: " + str(len(arr6)))
print("Shape of array: " + str(arr6.shape))
print("Number of dimensions: " + str(arr6.ndim))
print("Size of array: " + str(arr6.size))
print("Data type of array: " + str(arr6.dtype))
Length of first dimension: 3
Shape of array: (3, 4, 4)
Number of dimensions: 3
Size of array: 48
Data type of array: float64

NumPy has a built-in method for changing the shape of an array: .reshape(). Note that the number of elements, or size, of the array must exactly fill the new shape. In the first example, I am maintaining the number of dimensions but changing the length of each dimension. In the next two examples, I am converting the three-dimensional array to two-dimensional arrays. Lastly, I convert the array to a one-dimensional array, or vector, with a length of 48.

arr6b = arr6.reshape(4, 4, 3)
arr6c = arr6.reshape(4, 12)
arr6d = arr6.reshape(12, 4)
arr6e = arr6.reshape(48)
print(arr6b)
print(arr6c)
print(arr6d)
print(arr6e)
[[[1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]]

 [[1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]]

 [[1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]]

 [[1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]]]
[[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]]
[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
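
Because an array of ones looks the same in any arrangement, the sketch below uses distinct values so that the effect of .reshape() is visible. It also shows what happens when the requested shape cannot be filled exactly. The array length of 12 and the variable names are arbitrary choices for illustration.

```python
import numpy as np

arr = np.arange(12)       # values 0 through 11
grid = arr.reshape(3, 4)  # by default the new shape is filled row by row
print(grid)

# 12 elements cannot fill a 5 x 3 array, so NumPy raises a ValueError
try:
    arr.reshape(5, 3)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("Reshape failed:", err)
```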

As a second example, here I am converting a vector into a two-dimensional array, or matrix.

arr10 = np.random.randint(1, 1200, 100)
arr10b = arr10.reshape(10, 10)
print(arr10b)
[[ 101  359   27   65   22  306  398  899  487  573]
 [  56  791  270  303  655  186  610  803 1179 1030]
 [1005  608 1144  809   88  450 1198 1084  733  552]
 [ 939  148  449  504 1082 1032  841  409  273  191]
 [ 136  288  393  213  402  818 1183  773  182  850]
 [ 571  280  326  869   51  454  360   64  247  176]
 [ 925  909  146  558 1009  276  690  821   36  233]
 [ 606  307  545 1168  433  963  691   37  156  673]
 [ 942  249  130  643  248  740  586 1036  614  197]
 [ 809   97  445  426  282  736  765  339  735 1018]]

When reshaping, it is possible to have NumPy determine the appropriate size of a single dimension to fill an array with the available elements. This is accomplished using -1 in the array dimension location when applying the .reshape() method.

arr10 = np.random.randint(1, 1200, 1000)
arr10b = arr10.reshape(-1, 10, 10)
arr10c = arr10.reshape(10, -1, 10)
arr10d = arr10.reshape(10, 10, -1)
print(arr10.shape)
print(arr10b.shape)
print(arr10c.shape)
print(arr10d.shape)
(1000,)
(10, 10, 10)
(10, 10, 10)
(10, 10, 10)
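
Two further details about -1 are sketched below: it can be used on its own to flatten an array, and only one dimension can be left for NumPy to infer. The array length of 24 and the variable names are arbitrary choices for illustration.

```python
import numpy as np

arr = np.arange(24)

flat_to_grid = arr.reshape(-1, 6)        # NumPy infers 4 rows because 24 / 6 = 4
back_to_flat = flat_to_grid.reshape(-1)  # a single -1 flattens to one dimension
print(flat_to_grid.shape)
print(back_to_flat.shape)

# Only one dimension may be left for NumPy to infer
try:
    arr.reshape(-1, -1)
    two_unknowns_ok = True
except ValueError as err:
    two_unknowns_ok = False
    print("Reshape failed:", err)
```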

NumPy Array Indexing

Similar to lists, NumPy arrays are indexed. So, values from the array can be extracted or referenced using their associated index. Since arrays often have multiple dimensions, indexes must also extend into multiple dimensions. See the comments below for general array indexing rules. Remember that indexing starts at 0, the first index provided is included, and the last index provided is not included. Extracting portions of an array is known as slicing.

arr11 = np.linspace(0, 50, 50, dtype="int")
arr12 = arr11.reshape(2,5,5)
print("Original array")
print(arr12)
print("All values in first index of first dimension")
print(arr12[0]) #This will extract just the values from the first index in the first dimension
print("All values in second index of first dimension")
print(arr12[1]) #This will extract just the values from the second index in the first dimension
print("All values in first index of first dimension and first index of second dimension")
print(arr12[0][0]) #This will extract all values occurring in the first index of both the first and second dimensions
print("A single value specified with three indexes, one for each dimension")
print(arr12[1, 3, 3]) #This will extract a specific value based on an index in all three dimensions
print("Incorporating ranges")
print(arr12[1, 0:2, 0:2]) #All values in the second index of the first dimension that also fall within the first two indexes of the second and third dimensions
print("Using colons")
print(arr12[:, 0:2, 0:2]) #Only a colon means select all values in a dimension
print(arr12[:,2:,0:2]) #Can also use colons to select all values before or after an index
Original array
[[[ 0  1  2  3  4]
  [ 5  6  7  8  9]
  [10 11 12 13 14]
  [15 16 17 18 19]
  [20 21 22 23 24]]

 [[25 26 27 28 29]
  [30 31 32 33 34]
  [35 36 37 38 39]
  [40 41 42 43 44]
  [45 46 47 48 50]]]
All values in first index of first dimension
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]
 [20 21 22 23 24]]
All values in second index of first dimension
[[25 26 27 28 29]
 [30 31 32 33 34]
 [35 36 37 38 39]
 [40 41 42 43 44]
 [45 46 47 48 50]]
All values in first index of first dimension and first index of second dimension
[0 1 2 3 4]
A single value specified with three indexes, one for each dimension
43
43
Incorporating ranges
[[25 26]
 [30 31]]
Using colons
[[[ 0  1]
  [ 5  6]]

 [[25 26]
  [30 31]]]
[[[10 11]
  [15 16]
  [20 21]]

 [[35 36]
  [40 41]
  [45 46]]]

Once values have been selected using index notation they can be changed. In the example below I have converted all values in the first index of the first dimension and the first index of the second dimension to 0.

arr12[0][0] = 0
print(arr12)
[[[ 0  0  0  0  0]
  [ 5  6  7  8  9]
  [10 11 12 13 14]
  [15 16 17 18 19]
  [20 21 22 23 24]]

 [[25 26 27 28 29]
  [30 31 32 33 34]
  [35 36 37 38 39]
  [40 41 42 43 44]
  [45 46 47 48 50]]]
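
The sketch below shows that chained brackets and a single comma-separated index select the same values, and that a range of positions can be assigned in one step. It builds its own array with np.arange() rather than reusing arr12, and the replacement value of -1 is an arbitrary choice for illustration.

```python
import numpy as np

arr = np.arange(50).reshape(2, 5, 5)  # illustrative array with values 0 through 49

# Chained brackets and a single comma-separated index select the same values
same = np.array_equal(arr[0][0], arr[0, 0])
print(same)

# Assign to a block: first two rows and first two columns of the second index
arr[1, 0:2, 0:2] = -1
print(arr[1])
```

Only the four selected positions change; the rest of the array is left as it was.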

Boolean Arrays

It is also possible to create arrays of Boolean values as demonstrated below.

arr13 = np.array([True, False, True, False, True, False, True, False, False])
arr13b = arr13.reshape(3, 3)
print(arr13b)
[[ True False  True]
 [False  True False]
 [ True False False]]

Comparison operators can be used to compare each value in an array to a single value. The Boolean result for each element is stored at the corresponding position in a new array.

arr10 = np.random.randint(1, 1200, 100)
arr10b = arr10.reshape(10, 10)
print(arr10b)
arr10bool = arr10b > 150
print(arr10bool)
[[ 421 1174  908 1126  575  904  557  619  712  164]
 [ 295  594  912  601  561  470   87  866  127 1065]
 [1042   78  571 1078  265  996  422  557  198  988]
 [ 476 1066  510   25  315 1134  657   98  237  593]
 [ 837  172   99  437  810  700   83  441  909 1023]
 [ 528  421  184  453  729  902  871 1089  238 1181]
 [ 942  585  735  770  125 1090  810  567  838  419]
 [ 505 1171  758 1148  490  455  192  753  462    4]
 [ 567  360  658  820  256  915  926  934  787  822]
 [ 211  381  113  978    6  378  772  617  991  110]]
[[ True  True  True  True  True  True  True  True  True  True]
 [ True  True  True  True  True  True False  True False  True]
 [ True False  True  True  True  True  True  True  True  True]
 [ True  True  True False  True  True  True False  True  True]
 [ True  True False  True  True  True False  True  True  True]
 [ True  True  True  True  True  True  True  True  True  True]
 [ True  True  True  True False  True  True  True  True  True]
 [ True  True  True  True  True  True  True  True  True False]
 [ True  True  True  True  True  True  True  True  True  True]
 [ True  True False  True False  True  True  True  True False]]
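
A Boolean array produced by a comparison is often used as a mask. The sketch below shows three common uses: counting the True values, selecting the matching values, and replacing the non-matching values. The small array and the threshold of 150 are illustrative, and np.where() is one of several ways to perform the replacement.

```python
import numpy as np

arr = np.array([[421, 87, 908],
                [127, 594, 78]])  # illustrative values

mask = arr > 150
count = mask.sum()    # True counts as 1, so the sum is the number of True values
selected = arr[mask]  # one-dimensional array of the values where the mask is True

# np.where() keeps values where the condition holds and substitutes 0 elsewhere
thresholded = np.where(mask, arr, 0)

print(count)
print(selected)
print(thresholded)
```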

Copy vs. View

In the first module, I explained how, for mutable data types, setting a variable equal to another variable will result in a reference to the original object or data in memory. So, changes to the original object or the new object will change both since they reference the same data.

The behavior for NumPy arrays is similar. Thus, two methods are available for creating a new variable relative to an existing variable: copy() and view().

When using view(), the variable will reference the same data or object in memory. So, changes to the original variable or the new variable created using view() will result in changes to both. Also using view(), you can reference portions of an array. This allows you to work with a subset of the data values without copying or replicating the data in memory.

In contrast to view(), copy() will copy the data in memory, so changes made to the original object will not impact the copy, and changes made to the copy will not impact the original.

The three examples below demonstrate this behavior. In each example, arr2 is created as a view of arr1 while arr3 is created as a copy of arr1. In the first example, a change to arr1 changes arr1 and arr2 but not arr3. In the second example, a change to arr2, a view of arr1, impacts both arr1 and arr2 but not arr3. In the third example, a change to arr3 impacts only arr3 and not arr1 or arr2 since it is a copy of arr1 as opposed to a view.

In summary, if you want to make a copy of an array as opposed to referencing the original data, you should use the .copy() method.

arr1 = np.random.randint(1, 100, 25)
arr1 = arr1.reshape(5, 5)
print(arr1)

arr2 = arr1.view() #arr2 references the same data as arr1
arr3 = arr1.copy() #arr3 holds its own copy of the data
arr1[:, :] = 0 #Set every value in arr1 to 0
print(arr1)
print(arr2)
print(arr3)
[[28  2 20 49 66]
 [56 11 49 29 33]
 [ 8 37 91 70 53]
 [18 97 60 25 96]
 [30 21 74 10 32]]
[[0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]]
[[0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]]
[[28  2 20 49 66]
 [56 11 49 29 33]
 [ 8 37 91 70 53]
 [18 97 60 25 96]
 [30 21 74 10 32]]
arr1 = np.random.randint(1, 100, 25)
arr1 = arr1.reshape(5, 5)
print(arr1)

arr2 = arr1.view() #arr2 references the same data as arr1
arr3 = arr1.copy() #arr3 holds its own copy of the data
arr2[:, :] = 0 #Set every value in arr2 to 0
print(arr1)
print(arr2)
print(arr3)
[[65 23 12 29 68]
 [70 70 22 55 98]
 [83 29 95 91 21]
 [28  1 89 60 18]
 [62 87 74 69 16]]
[[0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]]
[[0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]]
[[65 23 12 29 68]
 [70 70 22 55 98]
 [83 29 95 91 21]
 [28  1 89 60 18]
 [62 87 74 69 16]]
arr1 = np.random.randint(1, 100, 25)
arr1 = arr1.reshape(5, 5)
print(arr1)

arr2 = arr1.view() #arr2 references the same data as arr1
arr3 = arr1.copy() #arr3 holds its own copy of the data
arr3[:, :] = 0 #Set every value in arr3 to 0
print(arr1)
print(arr2)
print(arr3)
[[82 52 25 32 94]
 [33 77 85 11  6]
 [59 78 53 15 36]
 [72  4 24 75 87]
 [56 18 35 70 32]]
[[82 52 25 32 94]
 [33 77 85 11  6]
 [59 78 53 15 36]
 [72  4 24 75 87]
 [56 18 35 70 32]]
[[82 52 25 32 94]
 [33 77 85 11  6]
 [59 78 53 15 36]
 [72  4 24 75 87]
 [56 18 35 70 32]]
[[0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]]
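
The same distinction applies to slices. As a minimal sketch, the code below shows that a slice selected with index ranges is a view of the original data, and it uses np.shares_memory() to check whether two arrays reference the same memory. The array values and variable names are illustrative.

```python
import numpy as np

arr = np.arange(25).reshape(5, 5)

corner_view = arr[0:2, 0:2]         # slicing with ranges returns a view
corner_copy = arr[0:2, 0:2].copy()  # an explicit copy owns its data

view_shares = np.shares_memory(arr, corner_view)
copy_shares = np.shares_memory(arr, corner_copy)
print(view_shares)
print(copy_shares)

corner_view[:, :] = -1  # this also changes arr
print(arr[0:2, 0:2])
print(corner_copy)
```

If you plan to modify a subset without touching the original array, calling .copy() on the slice first is the safer choice.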

Array Arithmetic and Operations

It is generally easy to perform mathematical operations on arrays as demonstrated below. In all cases, the same operation is applied to all elements in the array.

arr14 = np.random.randint(1, 1200, 25)
arr14b = arr14.reshape(5, 5)
print(arr14b)
print(arr14b+21)
print(arr14b-52)
print(arr14b*2)
print(arr14b/3)
print(arr14b**2)
[[ 381   96  992 1069  423]
 [ 213  476 1155  233  624]
 [1035  947  147  490  302]
 [ 537  346  578  173   92]
 [ 606 1157   95  537  236]]
[[ 402  117 1013 1090  444]
 [ 234  497 1176  254  645]
 [1056  968  168  511  323]
 [ 558  367  599  194  113]
 [ 627 1178  116  558  257]]
[[ 329   44  940 1017  371]
 [ 161  424 1103  181  572]
 [ 983  895   95  438  250]
 [ 485  294  526  121   40]
 [ 554 1105   43  485  184]]
[[ 762  192 1984 2138  846]
 [ 426  952 2310  466 1248]
 [2070 1894  294  980  604]
 [1074  692 1156  346  184]
 [1212 2314  190 1074  472]]
[[127.          32.         330.66666667 356.33333333 141.        ]
 [ 71.         158.66666667 385.          77.66666667 208.        ]
 [345.         315.66666667  49.         163.33333333 100.66666667]
 [179.         115.33333333 192.66666667  57.66666667  30.66666667]
 [202.         385.66666667  31.66666667 179.          78.66666667]]
[[ 145161    9216  984064 1142761  178929]
 [  45369  226576 1334025   54289  389376]
 [1071225  896809   21609  240100   91204]
 [ 288369  119716  334084   29929    8464]
 [ 367236 1338649    9025  288369   55696]]

It is also possible to perform mathematical operations between two arrays as long as they have the same shape. In such cases, elements are matched based on having the same position within each array.

arr14 = np.random.randint(1, 1200, 25)
arr14b = arr14.reshape(5, 5)
print(arr14b)
print(arr14b+arr14b)
print(arr14b-arr14b)
[[ 365  332  812  427  402]
 [  76    1  108  295  461]
 [ 824  255  917  328 1157]
 [1114  900  216  487  169]
 [ 353  405  290  191 1100]]
[[ 730  664 1624  854  804]
 [ 152    2  216  590  922]
 [1648  510 1834  656 2314]
 [2228 1800  432  974  338]
 [ 706  810  580  382 2200]]
[[0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]]
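
The example above combines an array with itself. As a small sketch with two different arrays, the code below shows position-by-position addition and multiplication, and contrasts element-wise multiplication with matrix multiplication, which uses the @ operator. The array values and variable names are illustrative.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[10, 20], [30, 40]])

summed = a + b       # element-wise sum
multiplied = a * b   # element-wise product, not matrix multiplication
matrix_prod = a @ b  # matrix multiplication

print(summed)
print(multiplied)
print(matrix_prod)
```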

To summarize the results from above it is possible to:

  1. perform mathematical operations between an array with any shape and a scalar (i.e., single value)
  2. perform mathematical operations between arrays that have the same shape

There are some other cases in which it is possible to perform mathematical operations using a technique known as broadcasting. The following rules summarize when broadcasting can be used and how.

  1. If two arrays have a different number of dimensions, the shape of the array with fewer dimensions is padded with ones on its leading (left) side (for example, when an array of shape (3,) is multiplied by an array of shape (3, 3), the first array is treated as having shape (1, 3)).
  2. If the shape of the arrays does not match in a dimension, the array with shape equal to 1 in that dimension is stretched to match the other shape.
  3. If in any dimension the sizes disagree and neither is equal to 1, an error is raised.

In the example below, an array of shape (6, 6) is multiplied by an array of shape (6,). The second array is first treated as having shape (1, 6) and is then stretched along the first dimension to match the shape (6, 6).

arr1 = np.random.randint(1, 100, 36)
arr1b = arr1.reshape(6, 6)

arr2 = np.ones((6))
arr2[:] = 2

print(arr1b)
print(arr2)
print(arr1b*arr2)
[[69  4 32 71 72  2]
 [45 66 96 54 88  7]
 [78  7 55 53 97 46]
 [22 91 29 34 85 37]
 [ 8 24 87 88 24  6]
 [33  6 64 38 29 44]]
[2. 2. 2. 2. 2. 2.]
[[138.   8.  64. 142. 144.   4.]
 [ 90. 132. 192. 108. 176.  14.]
 [156.  14. 110. 106. 194.  92.]
 [ 44. 182.  58.  68. 170.  74.]
 [ 16.  48. 174. 176.  48.  12.]
 [ 66.  12. 128.  76.  58.  88.]]
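
The broadcasting rules can be sketched with a second case in which both arrays are stretched, along with a case that fails. The shapes (3, 1) and (4,) and the variable names are arbitrary choices for illustration.

```python
import numpy as np

column = np.arange(3).reshape(3, 1)  # shape (3, 1)
row = np.arange(4)                   # shape (4,), treated as (1, 4)

# Both arrays are stretched to the common shape (3, 4)
total = column + row
print(total.shape)
print(total)

# Shapes (3,) and (4,) disagree and neither length is 1, so an error is raised
try:
    np.arange(3) + np.arange(4)
    broadcast_ok = True
except ValueError as err:
    broadcast_ok = False
    print("Broadcast failed:", err)
```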

NumPy provides mathematical functions and methods for performing common tasks. The last block of code below provides some examples.

arr14 = np.random.randint(1, 1200, 25)
arr14b = arr14.reshape(5, 5)

print(np.max(arr14b))
print(np.min(arr14b))
print(np.sqrt(arr14b))
1031
15
[[27.85677655 21.70253441 15.16575089 31.8747549  26.        ]
 [17.57839583 13.19090596  9.43398113 29.25747768 12.04159458]
 [ 3.87298335 25.29822128 10.39230485 25.43619468 17.43559577]
 [20.46948949 32.10918872 20.54263858 25.90366769 31.51190251]
 [ 6.08276253 22.7815715  31.33687923 18.41195264 30.18277655]]
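
Many of these functions also accept an axis parameter so that the calculation is performed along one dimension instead of over the whole array. The sketch below uses np.sum() and np.mean() on a small illustrative array. The values and variable names are not from the example above.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

overall = np.sum(arr)             # sum of all elements
col_sums = np.sum(arr, axis=0)    # collapse the first dimension: one sum per column
row_sums = np.sum(arr, axis=1)    # collapse the second dimension: one sum per row
col_means = np.mean(arr, axis=0)  # one mean per column

print(overall)
print(col_sums)
print(row_sums)
print(col_means)
```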

Concluding Remarks

As mentioned above, NumPy is central to using Python for analyzing data, so an understanding of NumPy is important for data scientists, including geospatial data scientists. Additional libraries and modules build on NumPy to expand Python's data science functionality. In the next section, we will explore one of these libraries: Pandas.