NumPy Random Sampling
NumPy provides tools for random sampling: selecting random elements from arrays, shuffling and permuting data, and drawing samples from probability distributions. With them you can simulate data, perform bootstrap resampling, and run probabilistic experiments.
Key Features of NumPy Random Sampling
- Sampling with and without replacement: Choose random elements from arrays efficiently.
- Weighted sampling: Sample according to specified probabilities or weights.
- Shuffling: Randomly reorder elements of arrays in-place.
- Permutations: Create a randomly permuted copy of an array using rng.permutation().
Random Sampling with rng.choice()
The choice() method of the Generator class randomly selects elements from an array (most often a 1D array or list) or from the integer range 0 to n - 1. It is one of the most commonly used tools for random sampling in NumPy.
With rng.choice(), you can:
- Select a single random item from a list or array.
- Generate random samples of a specified size.
- Control whether sampling is done with or without replacement.
- Assign probabilities or weights to influence selection.
import numpy as np
rng = np.random.default_rng()
# Select one random element from a list
choice_single = rng.choice([1, 2, 3, 4, 5])
print("Random choice:", choice_single)
How It Works:
- choice() picks a random element (or elements) from the input array.
- The input can be a Python list, NumPy array, or an integer n (which acts like range(n)).
- If you provide size, it returns multiple samples.
- By default, sampling is done with replacement. Use replace=False to prevent duplicates.
Output
Random choice: 4
💡 Tip: Think of rng.choice() as a flexible way to select one or more items at random — with powerful options for customizing the sampling behavior.
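The options listed above (an integer input and the size argument) can be sketched as follows. This is a minimal illustration, not the only way to call choice(); the seed value 42 is an arbitrary assumption used only to make reruns repeatable.

```python
import numpy as np

# Seed is an arbitrary choice for repeatability (an assumption, not required)
rng = np.random.default_rng(seed=42)

# An integer n acts like np.arange(n): draw 4 values from 0 through 9
from_range = rng.choice(10, size=4)

# size may also be a tuple, which yields an array of that shape
grid = rng.choice([1, 2, 3, 4, 5], size=(2, 3))

print("From range:", from_range)
print("Grid shape:", grid.shape)
```

Because replace defaults to True, both results may contain repeated values.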
Sampling With and Without Replacement
The choice() method of a Generator instance randomly samples elements from a given array. The replace parameter controls whether sampling is done with or without replacement.
import numpy as np
rng = np.random.default_rng()
data = [10, 20, 30, 40, 50]
# Sampling without replacement
sample_without_replacement = rng.choice(data, size=3, replace=False)
# Sampling with replacement
sample_with_replacement = rng.choice(data, size=3, replace=True)
print("Without replacement:", sample_without_replacement)
print("With replacement:", sample_with_replacement)
How It Works:
- choice() randomly selects elements from the input array.
- size controls how many elements to sample.
- replace=False means elements cannot be selected more than once (no duplicates).
- replace=True allows elements to be selected multiple times (with duplicates).
Output
Without replacement: [30 10 50]
With replacement: [20 20 50]
💡 Tip: Use sampling without replacement for random selection without duplication (e.g., drawing lottery numbers), and sampling with replacement when repetition is allowed (e.g., bootstrap resampling).
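Two consequences of the replace parameter are sketched here under simple assumptions: requesting more unique items than the array holds fails when replace=False, and bootstrap resampling relies on replace=True. The number of resamples (1000) and the seed are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed for repeatability
data = np.array([10, 20, 30, 40, 50])

# Without replacement, the sample cannot be larger than the population
try:
    rng.choice(data, size=6, replace=False)
    oversample_failed = False
except ValueError as exc:
    oversample_failed = True
    print("ValueError:", exc)

# Bootstrap: resample with replacement, each resample the same size as data
n_resamples = 1000  # illustrative value
resamples = rng.choice(data, size=(n_resamples, data.size), replace=True)
bootstrap_means = resamples.mean(axis=1)

print("Mean of bootstrap means:", bootstrap_means.mean())
```

Each row of resamples is one bootstrap sample, so the spread of bootstrap_means gives a rough picture of how much the sample mean varies.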
Weighted Sampling with rng.choice()
NumPy's rng.choice() method supports weighted random sampling through the p parameter. This allows you to assign custom probabilities to each element in the input array, so that some elements are more likely to be chosen than others.
The p parameter must be a sequence of probabilities that:
- Matches the length of the input array.
- Contains non-negative values.
- Sums to 1 (within floating-point tolerance). NumPy does not normalize p for you; it raises a ValueError if the probabilities do not sum to 1.
import numpy as np
rng = np.random.default_rng()
items = ['orange', 'blue', 'green']
probabilities = [0.1, 0.3, 0.6] # green is most likely
# Weighted sampling with replacement
weighted_sample = rng.choice(items, size=5, replace=True, p=probabilities)
print("Weighted sample:", weighted_sample)
How It Works:
- p assigns selection probability to each element in the input.
- Sampling is done with replacement by default.
- If the probabilities do not sum to 1 (within floating-point tolerance), NumPy raises a ValueError, so normalize raw weights yourself before passing them as p.
Output
Weighted sample: ['blue' 'green' 'blue' 'green' 'green']
💡 Tip: Use weighted sampling when simulating biased processes, favoring certain outcomes, or when working with imbalanced datasets.
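When you start from raw weights rather than probabilities, one common approach is to divide by their sum before passing them as p. The sketch below assumes hypothetical weights of 1, 3, and 6, and also shows that passing the unnormalized weights directly is rejected.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # arbitrary seed for repeatability
items = np.array(['orange', 'blue', 'green'])
raw_weights = np.array([1.0, 3.0, 6.0])  # hypothetical unnormalized weights

# Convert weights to probabilities that sum to 1
probabilities = raw_weights / raw_weights.sum()

# Draw many samples and compare observed frequencies with the probabilities
draws = rng.choice(items, size=10_000, p=probabilities)
values, counts = np.unique(draws, return_counts=True)
frequencies = dict(zip(values.tolist(), (counts / draws.size).tolist()))
print("Observed frequencies:", frequencies)

# Passing weights that do not sum to 1 raises a ValueError
try:
    rng.choice(items, size=5, p=raw_weights)
    raw_weights_rejected = False
except ValueError:
    raw_weights_rejected = True
print("Raw weights rejected:", raw_weights_rejected)
```

With this many draws, the observed frequencies should land close to 0.1, 0.3, and 0.6, though the exact values depend on the random state.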
Shuffling Arrays with rng.shuffle()
NumPy’s rng.shuffle() randomly reorders the elements of an array in-place. This is useful when you want to randomly rearrange the order of data, such as shuffling rows before splitting a dataset.
Shuffling is done along the first axis of the array by default. The original array is modified — no new array is returned.
import numpy as np
rng = np.random.default_rng()
arr = np.array([1, 2, 3, 4, 5])
# Shuffle the array in-place
rng.shuffle(arr)
print("Shuffled array:", arr)
How It Works:
- rng.shuffle() modifies the input array in-place — it does not return a new one.
- For multi-dimensional arrays, only the first axis is shuffled by default (e.g., the rows of a 2D array); the axis parameter selects a different axis.
- If you need a shuffled copy instead, use rng.permutation().
Output
Shuffled array: [3 1 5 2 4]
💡 Tip: Use rng.shuffle() when you want to reorder data for cross-validation, batch splitting, or randomized experiments.
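The behavior on a 2D array can be sketched as follows: rows are reordered as whole units, the call returns None, and the axis argument of Generator.shuffle() can target columns instead. The 4x3 array and the seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=7)  # arbitrary seed for repeatability
matrix = np.arange(12).reshape(4, 3)
original = matrix.copy()

# shuffle() works in-place and returns None
result = rng.shuffle(matrix)
rows_shuffled = matrix.copy()
print("Return value:", result)
print("Rows shuffled:\n", rows_shuffled)

# axis=1 reorders the columns instead; every row gets the same column order
rng.shuffle(matrix, axis=1)
print("Columns shuffled:\n", matrix)
```

Each row keeps its own values in both cases; only the order of rows (or of columns) changes.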
Generating Random Permutations with rng.permutation()
NumPy’s rng.permutation() returns a new array with randomly permuted elements. Unlike rng.shuffle(), which modifies the original array in-place, permutation() creates a shuffled copy and leaves the original array unchanged.
You can pass either a sequence (like a list or NumPy array) or an integer n. If given an integer, it returns a permutation of np.arange(n).
import numpy as np
rng = np.random.default_rng()
data = [10, 20, 30, 40, 50]
# Create a permuted copy of the array
permuted = rng.permutation(data)
print("Original data:", data)
print("Permuted copy:", permuted)
# Generate a random permutation of integers 0 through 9
random_indices = rng.permutation(10)
print("Random permutation of 0-9:", random_indices)
How It Works:
- rng.permutation(seq) returns a new array with the elements of seq randomly reordered.
- rng.permutation(n) returns a random permutation of integers 0 to n - 1.
- The original input is not modified.
Output
Original data: [10, 20, 30, 40, 50]
Permuted copy: [40 10 30 50 20]
Random permutation of 0-9: [7 3 6 1 8 0 4 2 5 9]
💡 Tip: Use rng.permutation() when you need to preserve the original data while working with a randomized version. This is useful for indexing, sampling, or cross-validation folds.
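One way to use a permutation for indexing is to shuffle two related arrays in step and then split them. The sketch below assumes hypothetical features and labels arrays and an 80/20 split; the names and sizes are illustrative, not part of NumPy.

```python
import numpy as np

rng = np.random.default_rng(seed=3)  # arbitrary seed for repeatability

# Hypothetical data: 10 feature rows with one label per row
features = np.arange(20).reshape(10, 2)
labels = np.arange(10)

# One permutation of row indices keeps features and labels aligned
indices = rng.permutation(len(features))
split = int(0.8 * len(indices))  # assumed 80/20 split
train_idx, test_idx = indices[:split], indices[split:]

X_train, y_train = features[train_idx], labels[train_idx]
X_test, y_test = features[test_idx], labels[test_idx]

print("Train indices:", train_idx)
print("Test indices:", test_idx)
```

Because both arrays are indexed with the same permuted indices, each feature row stays paired with its label, and the original arrays are left untouched.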
Frequently Asked Questions
How do I randomly select elements from an array in NumPy?
Use rng.choice() to select one or more random elements from an array. It’s part of NumPy’s new random Generator API.
What is the difference between sampling with and without replacement?
Sampling with replacement allows the same element to be chosen multiple times. Sampling without replacement ensures each element is chosen at most once.
How do I perform weighted random sampling in NumPy?
Provide the p parameter to rng.choice(), which accepts a list of probabilities corresponding to the likelihood of each element being selected.
What is the difference between rng.shuffle() and rng.permutation()?
rng.shuffle() modifies the original array in-place. In contrast, rng.permutation() returns a shuffled copy, preserving the original.
When should I use random sampling in data science?
Random sampling is useful in tasks like simulation, bootstrapping, cross-validation, and generating synthetic datasets for model testing and evaluation.
What's Next?
Up next, we’ll explore Generating Random Numbers from Distributions in NumPy. This includes sampling from common statistical distributions such as normal, binomial, Poisson, and more — essential tools for simulations, probabilistic modeling, and statistical analysis in Python.