NumPy, short for Numerical Python, is a foundational package for numerical computations in Python. It provides support for large multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays.
To install NumPy, you can use pip:
pip install numpy
Once installed, you can import it in your Python script or Jupyter notebook as follows:
import numpy as np
At the core of NumPy is the ndarray object, which represents an n-dimensional array whose elements all share a single data type (its dtype). Let's explore some basic operations and properties of NumPy arrays.
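Because every element of an ndarray shares one dtype, NumPy converts mixed inputs to a common type when the array is created. The short sketch below illustrates this. The variable names are only for illustration, and the dtypes shown assume a standard NumPy build.

```python
import numpy as np

# Mixing ints and floats: NumPy upcasts everything to one common dtype
mixed = np.array([1, 2.5, 3])
print(mixed, mixed.dtype)  # [1.  2.5 3. ] float64

# The dtype can also be requested explicitly
as_float32 = np.array([1, 2, 3], dtype=np.float32)
print(as_float32.dtype)  # float32

# astype() returns a converted copy; float -> int truncates toward zero
truncated = np.array([1.7, -1.7]).astype(np.int64)
print(truncated)  # [ 1 -1]
```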
# Creating a 1D array
arr_1d = np.array([1, 2, 3, 4, 5])
arr_1d
array([1, 2, 3, 4, 5])
# Creating a 2D array
arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr_2d
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
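NumPy arrays have several attributes that describe their shape, size, number of dimensions, and data type. The default integer dtype depends on the platform and NumPy version: the int32 shown in the outputs below is what Windows builds of NumPy before 2.0 use, while Linux, macOS, and NumPy 2.x on Windows default to int64.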
# Attributes of the 1D array
print('Shape:', arr_1d.shape)
print('Size:', arr_1d.size)
print('Number of dimensions:', arr_1d.ndim)
print('Data type:', arr_1d.dtype)
Shape: (5,)
Size: 5
Number of dimensions: 1
Data type: int32
# Attributes of the 2D array
print('Shape:', arr_2d.shape)
print('Size:', arr_2d.size)
print('Number of dimensions:', arr_2d.ndim)
print('Data type:', arr_2d.dtype)
Shape: (3, 3)
Size: 9
Number of dimensions: 2
Data type: int32
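Like Python lists, NumPy arrays can be indexed and sliced, giving efficient access to and modification of the array's contents. One important difference: slicing a list creates a copy, whereas basic slicing of a NumPy array returns a view that shares memory with the original array.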
# Indexing a 1D array
print('First element:', arr_1d[0])
print('Second element:', arr_1d[1])
print('Last element:', arr_1d[-1])
First element: 1
Second element: 2
Last element: 5
# Slicing a 1D array
print('First three elements:', arr_1d[:3])
print('Elements from index 2 to 4:', arr_1d[2:5])
print('Every second element:', arr_1d[::2])
First three elements: [1 2 3]
Elements from index 2 to 4: [3 4 5]
Every second element: [1 3 5]
# Indexing a 2D array
print('Element at (0,0):', arr_2d[0, 0])
print('Element at (1,2):', arr_2d[1, 2])
print('Second row:', arr_2d[1])
Element at (0,0): 1
Element at (1,2): 6
Second row: [4 5 6]
# Slicing a 2D array
print('First two rows and first two columns:\n', arr_2d[:2, :2])
print('All rows, every other column:\n', arr_2d[:, ::2])
First two rows and first two columns:
 [[1 2]
 [4 5]]
All rows, every other column:
 [[1 3]
 [4 6]
 [7 9]]
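The view-versus-copy behaviour of slicing is easy to overlook. This minimal sketch shows that writing through a slice changes the original array, while .copy() produces an independent array; np.shares_memory is used to check whether two arrays overlap in memory.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# Basic slicing returns a view: it shares memory with the original array
view = arr[1:4]
view[0] = 99
print(arr)  # [ 1 99  3  4  5]

# Call .copy() when you need an independent array
independent = arr[1:4].copy()
independent[0] = -1
print(arr)  # unchanged: [ 1 99  3  4  5]
print(np.shares_memory(arr, view), np.shares_memory(arr, independent))  # True False
```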
NumPy arrays support arithmetic between an array and a scalar as well as between two arrays of compatible shapes. These operations are performed element-wise: each element of the result is computed from the corresponding element(s) of the operands.
# Operations between an array and a scalar
print('Original array:\n', arr_1d)
print('Array + 5:\n', arr_1d + 5)
print('Array squared:\n', arr_1d**2)
Original array:
 [1 2 3 4 5]
Array + 5:
 [ 6  7  8  9 10]
Array squared:
 [ 1  4  9 16 25]
# Element-wise operations between two arrays
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
print('arr1:', arr1)
print('arr2:', arr2)
print('arr1 + arr2:', arr1 + arr2)
print('arr1 * arr2:', arr1 * arr2)
arr1: [1 2 3]
arr2: [4 5 6]
arr1 + arr2: [5 7 9]
arr1 * arr2: [ 4 10 18]
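To make "element-wise" concrete, the sketch below compares arr1 * arr2 with the equivalent pure-Python loop, shows the two division operators, and shows the error raised when two shapes are incompatible. It is illustrative only; the exact error message varies between NumPy versions.

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# Element-wise multiplication is equivalent to this pure-Python loop
loop_result = [a * b for a, b in zip(arr1.tolist(), arr2.tolist())]
print(loop_result == (arr1 * arr2).tolist())  # True

# True division always yields floats; // is floor division
print(arr2 / arr1)   # [4.  2.5 2. ]
print(arr2 // arr1)  # [4 2 2]

# Arrays whose shapes cannot be broadcast together raise a ValueError
try:
    arr1 + np.array([1, 2])
except ValueError as err:
    print('ValueError:', err)
```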
NumPy provides a comprehensive set of mathematical functions that can be applied element-wise to arrays. These include trigonometric, logarithmic, exponential, and statistical functions, among others.
# Some mathematical functions
print('Sin values:', np.sin(arr_1d))
print('Natural logarithm:', np.log(arr_1d))
print('Exponential:', np.exp(arr_1d))
Sin values: [ 0.84147098  0.90929743  0.14112001 -0.7568025  -0.95892427]
Natural logarithm: [0.         0.69314718 1.09861229 1.38629436 1.60943791]
Exponential: [  2.71828183   7.3890561   20.08553692  54.59815003 148.4131591 ]
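Element-wise math functions follow the usual mathematical domains, so inputs outside them produce special floating-point values rather than exceptions. A small sketch, assuming float inputs:

```python
import numpy as np

values = np.array([1.0, 0.0, -1.0])

# log(0) is -inf and the log of a negative number is nan; NumPy normally emits
# a RuntimeWarning for each. np.errstate silences those warnings in this block only.
with np.errstate(divide='ignore', invalid='ignore'):
    logs = np.log(values)

print(logs)                           # 0.0, -inf and nan
print(np.isinf(logs), np.isnan(logs))
```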
NumPy provides functions to compute aggregate values such as the sum, mean, median, and standard deviation. These can be applied to the entire array or, for multi-dimensional arrays, along a specified axis.
# Aggregation functions on 1D array
print('Sum:', np.sum(arr_1d))
print('Mean:', np.mean(arr_1d))
print('Standard Deviation:', np.std(arr_1d))
Sum: 15
Mean: 3.0
Standard Deviation: 1.4142135623730951
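Note that np.std computes the population standard deviation by default (ddof=0), which is why the result above is the square root of 2. A short sketch contrasting it with the sample standard deviation and computing the median:

```python
import numpy as np

arr_1d = np.array([1, 2, 3, 4, 5])

# np.std defaults to ddof=0: the population standard deviation (divide by n)
population_std = np.std(arr_1d)          # sqrt(10 / 5) = sqrt(2.0)
# ddof=1 divides by n - 1, giving the sample standard deviation
sample_std = np.std(arr_1d, ddof=1)      # sqrt(10 / 4) = sqrt(2.5)

print('Population std:', population_std)
print('Sample std:', sample_std)
print('Median:', np.median(arr_1d))      # 3.0
```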
# Aggregation functions on 2D array
print('Total Sum:', np.sum(arr_2d))
print('Column sums (axis=0):', np.sum(arr_2d, axis=0))
print('Row sums (axis=1):', np.sum(arr_2d, axis=1))
Total Sum: 45
Column sums (axis=0): [12 15 18]
Row sums (axis=1): [ 6 15 24]
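The axis argument names the axis that is collapsed: axis=0 reduces over the rows and yields one value per column, while axis=1 reduces over the columns and yields one value per row. A sketch showing this with np.mean, plus keepdims=True so the result can be broadcast back against the original array:

```python
import numpy as np

arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# axis=0 collapses the rows (one result per column);
# axis=1 collapses the columns (one result per row)
col_means = arr_2d.mean(axis=0)                  # shape (3,)
row_means = arr_2d.mean(axis=1, keepdims=True)   # shape (3, 1)

# keepdims=True keeps the reduced axis with length 1, so the result
# broadcasts back against the original array, e.g. to center each row
centered_rows = arr_2d - row_means
print(col_means)      # [4. 5. 6.]
print(centered_rows)  # every row becomes [-1.  0.  1.]
```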
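Broadcasting is a powerful feature in NumPy that allows operations between arrays of different shapes. NumPy compares the two shapes starting from the trailing (rightmost) dimension; two dimensions are compatible when they are equal or when one of them is 1. Along each dimension of size 1, the smaller array is conceptually "stretched" to match the larger one, without actually copying any data.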
# Broadcasting in action
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print('Original array:\n', arr)
# Adding a scalar to a 2D array
print('\nArray after adding 5:\n', arr + 5)
# Adding a 1D array to a 2D array
vec = np.array([1, 0, -1])
print('\nArray after adding [1, 0, -1]:\n', arr + vec)
Original array:
 [[1 2 3]
 [4 5 6]
 [7 8 9]]

Array after adding 5:
 [[ 6  7  8]
 [ 9 10 11]
 [12 13 14]]

Array after adding [1, 0, -1]:
 [[2 2 2]
 [5 5 5]
 [8 8 8]]
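The shape-compatibility rule behind broadcasting can be checked directly with np.broadcast_shapes (available in NumPy 1.20 and later). This sketch shows compatible pairs, a column-plus-row grid, and an incompatible pair:

```python
import numpy as np

# Shapes are compared from the trailing (rightmost) dimension backwards;
# two dimensions are compatible if they are equal or one of them is 1.
print(np.broadcast_shapes((3, 3), (3,)))     # (3, 3)
print(np.broadcast_shapes((3, 1), (1, 4)))   # (3, 4)

# A column vector plus a row vector produces a full 2-D grid
col = np.array([[0], [10], [20]])   # shape (3, 1)
row = np.array([1, 2, 3, 4])        # shape (4,)
grid = col + row                    # shape (3, 4)
print(grid)

# Incompatible trailing dimensions (3 vs 2) raise a ValueError
try:
    np.ones((3, 3)) + np.ones(2)
except ValueError as err:
    print('ValueError:', err)
```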
Beyond basic indexing and slicing, NumPy offers advanced indexing techniques, including integer array indexing and boolean indexing. Unlike basic slicing, advanced indexing always returns a copy of the selected data.
# Integer array indexing
print('Original array:\n', arr_1d)
indices = np.array([1, 3, 4])
print('Elements at indices 1, 3, and 4:', arr_1d[indices])
Original array:
 [1 2 3 4 5]
Elements at indices 1, 3, and 4: [2 4 5]
# Boolean indexing
print('Original array:\n', arr_1d)
mask = arr_1d > 3
print('Mask of elements greater than 3:', mask)
print('Elements greater than 3:', arr_1d[mask])
Original array:
 [1 2 3 4 5]
Mask of elements greater than 3: [False False False  True  True]
Elements greater than 3: [4 5]
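A few more boolean and integer indexing patterns, sketched on the same 1-D array: combining conditions, the copy semantics of advanced indexing, masked assignment, and np.where for element-wise selection.

```python
import numpy as np

arr_1d = np.array([1, 2, 3, 4, 5])

# Combine conditions with & (and), | (or), ~ (not); each comparison needs parentheses
between = arr_1d[(arr_1d > 1) & (arr_1d < 5)]
print(between)  # [2 3 4]

# Advanced indexing returns a copy, not a view
picked = arr_1d[[0, 2]]
picked[0] = 100
print(arr_1d)   # [1 2 3 4 5] (unchanged)

# On the left of an assignment, a mask modifies the original array in place
arr_1d[arr_1d > 3] = 0
print(arr_1d)   # [1 2 3 0 0]

# np.where picks element-wise from two alternatives based on a condition
print(np.where(arr_1d > 0, arr_1d, -1))  # [ 1  2  3 -1 -1]
```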
NumPy provides functions to change the shape of an array without changing its data. This is particularly useful when preparing data for libraries or operations that expect input in a particular shape.
# Reshaping an array
print('Original 2D array:\n', arr_2d)
reshaped = arr_2d.reshape(1, 9)
print('\nReshaped to 1x9 array:\n', reshaped)
Original 2D array:
 [[1 2 3]
 [4 5 6]
 [7 8 9]]

Reshaped to 1x9 array:
 [[1 2 3 4 5 6 7 8 9]]
# Transposing an array
print('Original 2D array:\n', arr_2d)
transposed = arr_2d.T
print('\nTransposed array:\n', transposed)
Original 2D array:
 [[1 2 3]
 [4 5 6]
 [7 8 9]]

Transposed array:
 [[1 4 7]
 [2 5 8]
 [3 6 9]]
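Two details about reshaping are worth seeing in code: a dimension of -1 is inferred from the array's size, and the requested shape must hold exactly as many elements as the original. This sketch also checks that a transpose shares memory with its source array.

```python
import numpy as np

a = np.arange(12)            # [0, 1, ..., 11]

# -1 asks NumPy to infer that dimension from the total size (12 / 3 = 4)
b = a.reshape(3, -1)
print(b.shape)               # (3, 4)

# The new shape must contain exactly the same number of elements
try:
    a.reshape(5, 2)
except ValueError as err:
    print('ValueError:', err)

# .T is a view: no data is copied, only the strides change
t = b.T
print(t.shape, np.shares_memory(t, b))   # (4, 3) True
```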
The shapes (n, 1) and (n,) might seem similar, but they represent different structures:
(n,): a one-dimensional array with n elements.
(n, 1): a two-dimensional array with n rows and 1 column.
Let's explore these shapes in more detail and see how to convert between them.
# Creating an array with shape (n,)
one_d_array = np.array([1, 2, 3, 4, 5])
print('1D array:', one_d_array)
print('Shape:', one_d_array.shape)
# Reshaping to (n, 1); the -1 tells NumPy to infer the number of rows
two_d_array = one_d_array.reshape(-1, 1)
print('\n2D array:\n', two_d_array)
print('Shape:', two_d_array.shape)
1D array: [1 2 3 4 5]
Shape: (5,)

2D array:
 [[1]
 [2]
 [3]
 [4]
 [5]]
Shape: (5, 1)
# Converting back to shape (n,)
converted_one_d_array = two_d_array.reshape(-1)
print('Converted 1D array:', converted_one_d_array)
print('Shape:', converted_one_d_array.shape)
Converted 1D array: [1 2 3 4 5]
Shape: (5,)
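The difference between (n,) and (n, 1) matters most under broadcasting, where combining the two silently produces an n × n result. A short sketch of that pitfall and of the usual ways to add or remove a length-1 axis:

```python
import numpy as np

one_d = np.array([1, 2, 3])          # shape (3,)
column = one_d.reshape(-1, 1)        # shape (3, 1)

# Easy-to-miss pitfall: (3,) and (3, 1) broadcast to (3, 3), not (3,)
print((one_d + column).shape)        # (3, 3)

# np.newaxis (an alias for None) is another way to add a length-1 axis
print(one_d[:, np.newaxis].shape)    # (3, 1)
print(one_d[np.newaxis, :].shape)    # (1, 3)

# squeeze() removes length-1 axes, turning (3, 1) back into (3,)
print(column.squeeze().shape)        # (3,)
```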
Another way to convert a multi-dimensional array into a one-dimensional array is the flatten method, which returns a flattened copy of the original array.
The ravel method also flattens a multi-dimensional array into one dimension. It works like flatten, with one key difference:
flatten always returns a copy of the data.
ravel returns a flattened view of the original array whenever possible, and a copy only when it must.
Because of this, modifying the array returned by ravel may modify the original array, whereas modifying the array returned by flatten never does.
# Using the flatten method
flattened_array = two_d_array.flatten()
print('Flattened array:', flattened_array)
print('Shape:', flattened_array.shape)
# Using the ravel method
raveled_array = two_d_array.ravel()
print('Raveled array:', raveled_array)
print('Shape:', raveled_array.shape)
Flattened array: [1 2 3 4 5]
Shape: (5,)
Raveled array: [1 2 3 4 5]
Shape: (5,)
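In the example above, both calls print the same values, so the view-versus-copy difference is not visible. The following sketch, on a fresh 2x3 array, makes it observable by writing to each result. It also shows that ravel falls back to a copy for a transposed (non-C-contiguous) array.

```python
import numpy as np

m = np.arange(6).reshape(2, 3)       # [[0 1 2], [3 4 5]]

# ravel() on a C-contiguous array returns a view: writing to it changes m
r = m.ravel()
r[0] = 100
print(m[0, 0])                       # 100

# flatten() always copies: writing to it leaves m untouched
f = m.flatten()
f[1] = -1
print(m[0, 1])                       # 1

# ravel() must copy when the data is not contiguous in C order, e.g. a transpose
print(np.shares_memory(m.T.ravel(), m))  # False
```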
Matrix operations are fundamental in linear algebra and have extensive applications in data science, especially in areas like machine learning.
The inner product, also known as the dot product, of two vectors a and b of length n is a single number obtained by multiplying corresponding entries and then summing those products:
$$\mathbf{a} \cdot \mathbf{b} = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n = \sum_{i=1}^{n} a_i b_i$$
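As a worked instance of the formula, take a = (1, 2, 3) and b = (4, 5, 6), the same values assigned to vector_a and vector_b in the code below:

```latex
\mathbf{a} \cdot \mathbf{b} = (1)(4) + (2)(5) + (3)(6) = 4 + 10 + 18 = 32
```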
Matrix multiplication, on the other hand, combines two matrices to produce a new matrix. Multiplying an m × k matrix by a k × n matrix (the number of columns of the first must equal the number of rows of the second) gives an m × n matrix whose entry in the i-th row and j-th column is the dot product of the i-th row of the first matrix and the j-th column of the second matrix.
In NumPy, the @ operator (equivalent to np.matmul()) is a convenient way to perform matrix multiplication. It is more readable and concise than calling the np.dot() function, which gives the same result for 1-D and 2-D arrays.
# Demonstrating the inner product (dot product)
vector_a = np.array([1, 2, 3])
vector_b = np.array([4, 5, 6])
dot_product = vector_a @ vector_b
dot_product
32
# Demonstrating matrix multiplication using @
matrix_A = np.array([[1, 2], [3, 4]])
matrix_B = np.array([[2, 0], [1, 3]])
result_matrix = matrix_A @ matrix_B
result_matrix
array([[ 4,  6],
       [10, 12]])
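To make the row-by-column definition concrete, here is a minimal sketch of matrix multiplication written directly from that definition. The helper naive_matmul is hypothetical and exists only for illustration; it is far slower than @ and is not how NumPy computes products internally.

```python
import numpy as np

def naive_matmul(A, B):
    """Hypothetical helper: multiply A (n x k) by B (k x m) from the definition."""
    n, k = A.shape
    k2, m = B.shape
    if k != k2:
        raise ValueError('inner dimensions must match')
    C = np.zeros((n, m), dtype=np.result_type(A, B))
    for i in range(n):
        for j in range(m):
            # Entry (i, j) is the dot product of row i of A and column j of B
            C[i, j] = sum(A[i, p] * B[p, j] for p in range(k))
    return C

matrix_A = np.array([[1, 2], [3, 4]])
matrix_B = np.array([[2, 0], [1, 3]])
print(naive_matmul(matrix_A, matrix_B))  # same values as matrix_A @ matrix_B
print(np.array_equal(naive_matmul(matrix_A, matrix_B), matrix_A @ matrix_B))  # True
```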
# Matrix and Vector Products
# Creating two matrices for demonstration
A = np.array([[1, 2], [3, 4]])
B = np.array([[2, 0], [1, 3]])
# Matrix multiplication using matmul()
matrix_product = np.matmul(A, B)
print('Matrix product using matmul():\n', matrix_product)
# Dot product of two vectors
v1 = np.array([1, 2, 3])
v2 = np.array([4, 5, 6])
dot_product = np.dot(v1, v2)
print('\nDot product of v1 and v2:', dot_product)
Matrix product using matmul():
 [[ 4  6]
 [10 12]]

Dot product of v1 and v2: 32
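For 1-D and 2-D arrays, np.dot, np.matmul, and @ agree, as the outputs above show. They differ for arrays with more than two dimensions. This sketch uses arrays of ones purely to compare the resulting shapes.

```python
import numpy as np

a = np.ones((2, 3, 4))
b = np.ones((2, 4, 5))

# matmul / @ treats the leading axis as a batch of matrices: (2, 3, 4) @ (2, 4, 5) -> (2, 3, 5)
print((a @ b).shape)        # (2, 3, 5)

# np.dot instead sums over the last axis of a and the second-to-last axis of b,
# pairing every matrix in a with every matrix in b
print(np.dot(a, b).shape)   # (2, 3, 2, 5)

# For plain 1-D inputs the two agree
v = np.array([1, 2, 3])
print(np.dot(v, v) == v @ v)  # True
```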
NumPy provides several utility functions for creating specific types of arrays with ease:
np.ones: np.ones((2, 3)) creates a 2x3 matrix filled with ones.
np.zeros: np.zeros((3, 3)) creates a 3x3 matrix filled with zeros.
np.diag: np.diag([1, 2, 3]) creates a 3x3 diagonal matrix with [1, 2, 3] on the diagonal. Given a 2-D array instead, np.diag extracts its diagonal.
np.identity: np.identity(3) creates a 3x3 identity matrix.
np.ones, np.zeros, and np.identity produce float64 arrays by default, which is why their outputs show 1. and 0. Let's demonstrate each of these functions with examples.
# Demonstrating np.ones
ones_array = np.ones((2, 3))
print('Array filled with ones:\n', ones_array)
# Demonstrating np.zeros
zeros_array = np.zeros((3, 3))
print('\nArray filled with zeros:\n', zeros_array)
Array filled with ones:
 [[1. 1. 1.]
 [1. 1. 1.]]

Array filled with zeros:
 [[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
# Demonstrating np.diag
diagonal_matrix = np.diag([1, 2, 3])
print('\nDiagonal matrix:\n', diagonal_matrix)
extracted_diagonal = np.diag(diagonal_matrix)
print('\nExtracted diagonal from matrix:', extracted_diagonal)
# Demonstrating np.identity
identity_matrix = np.identity(3)
print('\nIdentity matrix:\n', identity_matrix)
Diagonal matrix:
 [[1 0 0]
 [0 2 0]
 [0 0 3]]

Extracted diagonal from matrix: [1 2 3]

Identity matrix:
 [[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
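Each of these creation functions has a few useful variations. The sketch below shows the dtype argument, np.full for an arbitrary fill value, np.eye for an offset diagonal, and np.zeros_like, which copies the shape and dtype of an existing array.

```python
import numpy as np

# The creation functions default to float64; pass dtype to change that
int_ones = np.ones((2, 3), dtype=int)
print(int_ones.dtype.kind)        # 'i' (a signed integer type)

# np.full fills with an arbitrary constant
sevens = np.full((2, 2), 7)
print(sevens)

# np.eye generalizes np.identity: it allows non-square shapes and an
# offset diagonal via k (k=1 is the diagonal just above the main one)
upper = np.eye(3, k=1)
print(upper)

# *_like variants copy the shape and dtype of an existing array
like = np.zeros_like(sevens)
print(like.shape, like.dtype == sevens.dtype)  # (2, 2) True
```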
np.concatenate
The np.concatenate function joins two or more arrays along an existing axis. It's a versatile function that allows arrays to be combined in various ways.
Parameters:
a1, a2, ...: Arrays to be concatenated. They must have the same shape, except in the dimension corresponding to the specified axis.
axis: The axis along which the arrays will be joined. Default is 0.
out: If provided, the destination in which to place the result. Its shape must match what concatenate would have returned if no out argument were specified.
Usage:
np.concatenate((a1, a2, ...), axis=0, out=None)
Let's look at some examples to understand how np.concatenate works.
# Creating two 1-D arrays for demonstration
array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])
# Concatenating along axis 0 (default)
concatenated_1d = np.concatenate((array1, array2))
print('Concatenated 1-D array:', concatenated_1d)
Concatenated 1-D array: [1 2 3 4 5 6]
# Creating two 2-D arrays for demonstration
matrix1 = np.array([[1, 2], [3, 4]])
matrix2 = np.array([[5, 6], [7, 8]])
# Concatenating along axis 0 (rows)
concatenated_rows = np.concatenate((matrix1, matrix2), axis=0)
print('\nConcatenated along rows:\n', concatenated_rows)
# Concatenating along axis 1 (columns)
concatenated_columns = np.concatenate((matrix1, matrix2), axis=1)
print('\nConcatenated along columns:\n', concatenated_columns)
Concatenated along rows:
 [[1 2]
 [3 4]
 [5 6]
 [7 8]]

Concatenated along columns:
 [[1 2 5 6]
 [3 4 7 8]]
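NumPy also offers convenience wrappers for joining arrays. A brief sketch comparing np.vstack and np.hstack with np.concatenate, showing how np.stack differs by adding a new axis, and showing the error raised for mismatched shapes:

```python
import numpy as np

matrix1 = np.array([[1, 2], [3, 4]])
matrix2 = np.array([[5, 6], [7, 8]])

# For 2-D inputs, vstack / hstack match concatenate along axis 0 / axis 1
print(np.array_equal(np.vstack((matrix1, matrix2)), np.concatenate((matrix1, matrix2), axis=0)))  # True
print(np.array_equal(np.hstack((matrix1, matrix2)), np.concatenate((matrix1, matrix2), axis=1)))  # True

# np.stack joins along a NEW axis, so two (2, 2) arrays become (2, 2, 2)
stacked = np.stack((matrix1, matrix2))
print(stacked.shape)  # (2, 2, 2)

# concatenate requires matching sizes on every axis except the joining one
try:
    np.concatenate((matrix1, np.array([[9, 9, 9]])), axis=0)
except ValueError as err:
    print('ValueError:', err)
```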
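np.arange and np.linspace
The np.arange and np.linspace functions create sequences of evenly spaced values. With np.arange, you specify the spacing between values (the step size), and the stop value is excluded. With np.linspace, you specify the number of points, and by default both start and stop are included. For non-integer steps, np.linspace is usually the safer choice, because floating-point rounding can make the number of values np.arange returns hard to predict.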
# Generating numbers from 0 to 4
sequence1 = np.arange(5)
print('Numbers from 0 to 4:', sequence1)
# Generating numbers from 2 to 8 with a step of 2
sequence2 = np.arange(2, 9, 2)
print('\nNumbers from 2 to 8 with a step of 2:', sequence2)
# Generating numbers from 0 to 1 with a float step (stop=1.1 because the stop value is excluded)
sequence3 = np.arange(0, 1.1, 0.1)
print('\nNumbers from 0 to 1 with step = 0.1:', sequence3)
# Demonstrating np.linspace: 11 points from 0 to 1, both endpoints included
sequence4 = np.linspace(0, 1, 11)
print('\nNumbers from 0 to 1 with step = 0.1:', sequence4)
Numbers from 0 to 4: [0 1 2 3 4]

Numbers from 2 to 8 with a step of 2: [2 4 6 8]

Numbers from 0 to 1 with step = 0.1: [0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1. ]

Numbers from 0 to 1 with step = 0.1: [0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1. ]
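The two float examples above happen to agree, but with a float step np.arange can return a different number of points than expected, because its length is computed from a floating-point division. A sketch of one such case, and of np.linspace with endpoint=False and retstep=True as an alternative:

```python
import numpy as np

# arange's length is ceil((stop - start) / step); here (1.3 - 1) / 0.1 rounds to
# slightly more than 3, so a 4th value (about 1.3) appears despite the exclusive stop
risky = np.arange(1, 1.3, 0.1)
print(len(risky), risky)

# linspace fixes the count directly; endpoint=False excludes stop,
# and retstep=True also returns the spacing it used
safe, step = np.linspace(1, 1.3, 3, endpoint=False, retstep=True)
print(safe, step)  # three points, 1.0, 1.1, 1.2, with a step of about 0.1
```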
np.random Module
The np.random module in NumPy provides a suite of functions for generating random numbers from various distributions. It's a crucial tool for simulations, statistical sampling, and many other tasks in data science and scientific computing. The functions listed below belong to NumPy's legacy interface, which draws from a single global random state; for new code, NumPy recommends creating a Generator with np.random.default_rng(), which offers equivalent methods. Here are some of the important elements of the np.random module:
Random Number Generation:
rand(): Generates random numbers uniformly distributed over [0, 1) in a given shape.
randn(): Generates random numbers from a standard normal distribution (mean 0 and variance 1).
randint(): Generates random integers from low (inclusive) to high (exclusive).
Random Sampling:
choice(): Generates a random sample from a given 1-D array.
shuffle(): Modifies a sequence in-place by shuffling its contents.
permutation(): Returns a shuffled copy of a sequence, or a permuted range when given an integer.
Sampling from Distributions:
binomial(): Draws samples from a binomial distribution.
normal(): Draws samples from a normal (Gaussian) distribution.
poisson(): Draws samples from a Poisson distribution.
Random Seed:
seed(): Sets the random seed, which makes the generated random numbers reproducible.
Let's explore some of these functions with examples.
# Set a seed so the results below are reproducible (optional)
np.random.seed(0)
# Random Number Generation
# Generating random numbers between 0 and 1
random_numbers = np.random.rand(5)
print('Random numbers between 0 and 1:', random_numbers)
# Generating random numbers from a standard normal distribution
normal_numbers = np.random.randn(5)
print('\nRandom numbers from a standard normal distribution:', normal_numbers)
# Generating random integers from 1 to 9 (high=10 is exclusive)
random_integers = np.random.randint(1, 10, size=5)
print('\nRandom integers from 1 to 9:', random_integers)
Random numbers between 0 and 1: [0.5488135  0.71518937 0.60276338 0.54488318 0.4236548 ]

Random numbers from a standard normal distribution: [-0.84272405  1.96992445  1.26611853 -0.50587654  2.54520078]

Random integers from 1 to 9: [6 9 5 4 1]
# Random Sampling
# Generating a random sample from a given 1-D array
sample_array = np.array([10, 20, 30, 40, 50])
random_choice = np.random.choice(sample_array, size=3)
print('Random sample from given array:', random_choice)
# Shuffling a sequence in-place
sequence_to_shuffle = np.array([1, 2, 3, 4, 5])
np.random.shuffle(sequence_to_shuffle)
print('\nShuffled sequence:', sequence_to_shuffle)
# Getting a permuted range
permuted_range = np.random.permutation(5)
print('\nPermuted range:', permuted_range)
Random sample from given array: [40 10 30]

Shuffled sequence: [5 3 2 1 4]

Permuted range: [0 1 2 4 3]
# Samples from Distributions
# Drawing samples from a binomial distribution
binomial_samples = np.random.binomial(n=10, p=0.5, size=5)
print('Samples from a binomial distribution:', binomial_samples)
# Drawing samples from a normal distribution with mean 0 and standard deviation 1
normal_samples = np.random.normal(loc=0, scale=1, size=5)
print('\nSamples from a normal distribution:', normal_samples)
# Drawing samples from a Poisson distribution with lambda=3
poisson_samples = np.random.poisson(lam=3, size=5)
print('\nSamples from a Poisson distribution:', poisson_samples)
Samples from a binomial distribution: [6 3 7 5 5]

Samples from a normal distribution: [ 1.08081191  0.8644362  -0.74216502  2.26975462 -1.45436567]

Samples from a Poisson distribution: [0 6 1 3 3]
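The examples above use the legacy functions, which share one global random state. Below is a sketch of the same kinds of calls through the Generator object returned by np.random.default_rng. The numbers it produces differ from the seeded legacy outputs above, because a Generator uses a different underlying bit generator (PCG64 by default).

```python
import numpy as np

# A Generator carries its own state instead of relying on the global np.random.seed
rng = np.random.default_rng(seed=0)

uniform = rng.random(5)                   # floats in [0, 1), like rand()
normal = rng.standard_normal(5)           # mean 0, variance 1, like randn()
integers = rng.integers(1, 10, size=5)    # 1..9: high is exclusive, like randint()
sample = rng.choice([10, 20, 30, 40, 50], size=3, replace=False)
counts = rng.poisson(lam=3, size=5)

print(uniform, normal, integers, sample, counts, sep='\n')
```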
np.linalg Module
The np.linalg module in NumPy provides a collection of linear algebra functions. Here are some of its important elements, together with the closely related product functions:
Matrix and Vector Products (usually called from the top-level namespace, e.g. np.dot, rather than from np.linalg):
dot(): Computes the dot product of two arrays.
matmul(): Performs matrix multiplication.
inner(): Computes the inner product of two arrays.
outer(): Computes the outer product of two arrays.
Matrix Eigenvalues:
eig(): Computes the eigenvalues and right eigenvectors of a square array.
eigh(): Computes the eigenvalues and eigenvectors of a Hermitian or real symmetric matrix.
eigvals(): Computes the eigenvalues of a square array.
Norms and Other Numbers:
norm(): Computes the norm of a matrix or vector.
det(): Computes the determinant of an array.
matrix_rank(): Computes the numerical rank of a matrix.
Solving Equations and Inverting Matrices:
solve(): Solves a linear matrix equation.
inv(): Computes the multiplicative inverse of a matrix.
Let's explore some of these functions with examples, reusing the matrix A and the vector v1 defined earlier.
# Matrix Eigenvalues
# Eigenvalues and eigenvectors of matrix A (column i of eigenvectors pairs with eigenvalues[i])
eigenvalues, eigenvectors = np.linalg.eig(A)
print('Eigenvalues of matrix A:', eigenvalues)
print('\nEigenvectors of matrix A:\n', eigenvectors)
Eigenvalues of matrix A: [-0.37228132  5.37228132]

Eigenvectors of matrix A:
 [[-0.82456484 -0.41597356]
 [ 0.56576746 -0.90937671]]
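Because each column of the eigenvector matrix pairs with the eigenvalue at the same index, the result can be verified by checking that A v = λ v for every pair. This sketch does that and also checks two standard identities: the eigenvalues sum to the trace and multiply to the determinant.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
eigenvalues, eigenvectors = np.linalg.eig(A)

# Column i of `eigenvectors` pairs with eigenvalues[i]: check A @ v == lambda * v
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    print(np.allclose(A @ v, eigenvalues[i] * v))   # True

# Sanity checks: eigenvalues sum to the trace and multiply to the determinant
print(np.isclose(eigenvalues.sum(), np.trace(A)))            # True
print(np.isclose(eigenvalues.prod(), np.linalg.det(A)))      # True
```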
# Norms and Other Numbers
# Norm of vector v1
vector_norm = np.linalg.norm(v1)
print('Norm of vector v1:', vector_norm)
# Determinant of matrix A (computed numerically, so expect a small rounding error)
matrix_determinant = np.linalg.det(A)
print('\nDeterminant of matrix A:', matrix_determinant)
# Rank of matrix A
matrix_rank = np.linalg.matrix_rank(A)
print('\nRank of matrix A:', matrix_rank)
Norm of vector v1: 3.7416573867739413

Determinant of matrix A: -2.0000000000000004

Rank of matrix A: 2
# Solving Equations and Inverting Matrices
# Solving a linear matrix equation Ax = b
b = np.array([5, 11])
x = np.linalg.solve(A, b)
print('Solution x for Ax = b:', x)
# Inverse of matrix A
inverse_A = np.linalg.inv(A)
print('\nInverse of matrix A:\n', inverse_A)
Solution x for Ax = b: [1. 2.]

Inverse of matrix A:
 [[-2.   1. ]
 [ 1.5 -0.5]]
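Floating-point results such as the determinant -2.0000000000000004 above are best checked with a tolerance. A sketch that substitutes the solution back into Ax = b, compares it with the inverse-based result, and compares the determinant using np.isclose:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
b = np.array([5, 11])

x = np.linalg.solve(A, b)
# Check the solution by substituting it back into A x = b
print(np.allclose(A @ x, b))                      # True

# solve() works from a factorization of A and is generally preferred over
# forming inv(A) explicitly and multiplying, though both agree here
print(np.allclose(np.linalg.inv(A) @ b, x))      # True

# det() carries rounding error, so compare it with a tolerance rather than ==
print(np.isclose(np.linalg.det(A), -2))          # True
```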