NumPy Structured Arrays

NumPy's structured arrays let you store heterogeneous data types in a single NumPy array. This is useful for working with tabular or columnar data where each column (field) can have a different type. Record arrays (np.recarray) are a closely related variant that adds attribute-style access to fields.

Key Features of Structured Arrays:

Heterogeneous fields: Each field (column) can have its own data type, such as int, float, or string.
Field names: Access data using named fields, making arrays behave like lightweight database tables or DataFrame rows.
Memory efficiency: Store structured data in a compact, contiguous block of memory.
Interoperability: Easily read from and write to binary files or interface with C-style structures.

Structured arrays are ideal when you need to organize mixed-type data efficiently and access elements by name or index. They are commonly used in data analysis, file I/O, and scientific simulations.
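The memory-efficiency and interoperability points above can be shown with a small sketch. Every record shares one fixed byte layout, so the array can be dumped to raw bytes and read back with the same dtype. The example uses ndarray.tobytes() and np.frombuffer(). Writing to a file with ndarray.tofile() and reading it back with np.fromfile() follows the same idea. The people data is illustrative.

```python
import numpy as np

# Same record layout as the people example used later in this section
person_dtype = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])
people = np.array([('Jack', 25, 55.0), ('Jane', 30, 85.5)], dtype=person_dtype)

# Serialize the contiguous block of records to raw bytes
raw = people.tobytes()
print(len(raw))  # 96 bytes: 2 records x 48 bytes each

# Reinterpret the same bytes with the same dtype to recover the records.
# frombuffer returns a read-only view; call .copy() if you need to modify it.
restored = np.frombuffer(raw, dtype=person_dtype)
print(restored['name'], restored['age'])
```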

Creating Basic Structured Arrays

A structured array in NumPy is created by specifying a custom dtype with named fields. Each field can have its own data type and shape. This allows you to store complex, heterogeneous records in a single array.

python

import numpy as np

# Define a structured data type
person_dtype = np.dtype([
    ('name', 'U10'),   # Unicode string of max length 10
    ('age', 'i4'),     # 32-bit integer (4 bytes)
    ('weight', 'f4')   # 32-bit float   (4 bytes)
])

# Create a structured array with data
people = np.array([
    ('Jack', 25, 55.0),
    ('Jane', 30, 85.5)
], dtype=person_dtype)

print(people)

How It Works:

np.dtype([...]) defines a structured data type with named fields and types.
Each element in the array is a record (like a row in a table) with named fields.
You can store strings, integers, floats, or even nested arrays inside each record.

Output

[('Jack', 25, 55. ) ('Jane', 30, 85.5)]

💡 Tip: Structured arrays are useful for representing row-like data where each column has a different type—similar to a lightweight DataFrame or a database row.
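You can inspect a record's layout through the dtype. The sketch below rebuilds the people array and prints the field names, each field's byte offset, and the size of one record. A 'U10' field takes 40 bytes because NumPy stores 4 bytes per Unicode character.

```python
import numpy as np

person_dtype = np.dtype([
    ('name', 'U10'),   # 10 characters x 4 bytes = 40 bytes
    ('age', 'i4'),     # 4 bytes
    ('weight', 'f4'),  # 4 bytes
])
people = np.array([('Jack', 25, 55.0), ('Jane', 30, 85.5)], dtype=person_dtype)

# Field names in declaration order
print(person_dtype.names)  # ('name', 'age', 'weight')

# .fields maps each name to (field dtype, byte offset within the record)
for name, (field_dtype, offset) in person_dtype.fields.items():
    print(name, field_dtype, offset)

# Bytes per record: 40 + 4 + 4, with no padding by default
print(person_dtype.itemsize)  # 48

# A single element is one record; index it by field name
first = people[0]
print(first['name'], first['age'])  # Jack 25
```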

Data Type (dtype) Formats in Structured Arrays

In NumPy structured arrays, each field has its own dtype, which defines the data type and memory layout for that field. The dtype can be specified using:

Standard NumPy data types (e.g., 'int32', 'float64')
Fixed-length strings (e.g., 'U10' for Unicode strings up to length 10)
Nested structured dtypes for complex records
Byte order and alignment specifiers
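Here is a minimal sketch of the byte-order and alignment specifiers, plus a field with a fixed shape. It assumes that int32 has 4-byte alignment on your platform. That is typical, but the aligned offsets shown are platform-dependent.

```python
import numpy as np

# Explicit big-endian ('>') and little-endian ('<') prefixes per field
be_dtype = np.dtype([('id', '>i4'), ('value', '<f8')])
print(be_dtype['id'].str, be_dtype['value'].str)  # >i4 <f8

# Default (packed) layout: fields are placed back to back
packed = np.dtype([('flag', 'u1'), ('count', 'i4')])
print(packed.itemsize, packed.fields['count'][1])  # 5 1

# align=True pads fields the way a C compiler typically lays out a struct
aligned = np.dtype([('flag', 'u1'), ('count', 'i4')], align=True)
print(aligned.itemsize, aligned.fields['count'][1])  # 8 4 on typical platforms

# A field can also carry a shape: here, three float32 coordinates per record
point = np.dtype([('label', 'U5'), ('xyz', 'f4', (3,))])
print(point['xyz'].shape)  # (3,)
```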

Common Data Type Formats

'i4' or 'int32': 32-bit signed integer
'i8' or 'int64': 64-bit signed integer
'u1' or 'uint8': 8-bit unsigned integer
'f4' or 'float32': 32-bit floating point number
'f8' or 'float64': 64-bit floating point number (double precision)
'b1': boolean (True/False)
'U10': Unicode string of max length 10 characters
'S10': Byte string of max length 10 bytes
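The short codes and the named types in the list above describe identical dtypes. String fields are fixed width, so their sizes are easy to check. A value longer than the declared width is silently truncated; the name "Christopher" below is just an illustration.

```python
import numpy as np

# Short codes and named types are interchangeable
print(np.dtype('i4') == np.dtype('int32'))    # True
print(np.dtype('f8') == np.dtype('float64'))  # True
print(np.dtype('b1') == np.dtype(bool))       # True

# 'U' uses 4 bytes per character, 'S' uses 1 byte per character
print(np.dtype('U10').itemsize, np.dtype('S10').itemsize)  # 40 10

# Strings longer than the field width are cut off without an error
names = np.array(['Christopher'], dtype='U10')
print(names[0])  # Christophe
```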

Creating Nested Structured Arrays

In a nested structured array, a field can itself have a structured dtype with its own subfields. This lets you represent complex, hierarchical data in a single NumPy array.

python

import numpy as np

# Define a nested structured dtype for chocolates
chocolate_dtype = np.dtype([
    ('name', 'U15'),  # Name of the chocolate
    ('price', 'f4'),  # Price in dollars
    ('ingredients', [('ingredient', 'U10'), ('quantity_g', 'f4')])  # Nested field for ingredient and quantity in grams
])

# Create an array with nested structured dtype
chocolates = np.array([
    ('Dark Delight', 3.50, ('Cocoa', 70.0)),
    ('Milk Magic', 2.75, ('Milk', 50.0)),
    ('Nutty Crunch', 4.00, ('Almonds', 30.0))
], dtype=chocolate_dtype)

print(chocolates)

How It Works:

The ingredients field is nested with two subfields: ingredient (name) and quantity_g (quantity in grams).
Each record represents a chocolate with its name, price, and ingredient details.
You can access nested fields like chocolates['ingredients']['ingredient'] to get all ingredient names.

Output

[('Dark Delight', 3.5, ('Cocoa', 70.))
 ('Milk Magic', 2.75, ('Milk', 50.))
 ('Nutty Crunch', 4. , ('Almonds', 30.))]

💡 Tip: Nested structured arrays allow you to store complex product data compactly and access individual ingredient details easily.
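The sketch below checks the nested field access described above. The first part rebuilds the chocolates array. The second part is a hypothetical variant, not part of the original example: it gives the ingredients field a shape of (2,), so each chocolate holds two ingredient records instead of one.

```python
import numpy as np

ingredient_dtype = np.dtype([('ingredient', 'U10'), ('quantity_g', 'f4')])

# One nested ingredient per record, as in the example above
chocolate_dtype = np.dtype([
    ('name', 'U15'),
    ('price', 'f4'),
    ('ingredients', ingredient_dtype),
])
chocolates = np.array([
    ('Dark Delight', 3.50, ('Cocoa', 70.0)),
    ('Milk Magic', 2.75, ('Milk', 50.0)),
    ('Nutty Crunch', 4.00, ('Almonds', 30.0)),
], dtype=chocolate_dtype)

# Chain field names to reach a subfield across all records
print(chocolates['ingredients']['ingredient'])  # ['Cocoa' 'Milk' 'Almonds']

# Hypothetical variant: a (2,)-shaped subarray of ingredient records
multi_dtype = np.dtype([
    ('name', 'U15'),
    ('ingredients', ingredient_dtype, (2,)),
])
bars = np.array([
    ('Dark Delight', [('Cocoa', 70.0), ('Sugar', 25.0)]),
    ('Milk Magic', [('Milk', 50.0), ('Cocoa', 30.0)]),
], dtype=multi_dtype)

# quantity_g has shape (2, 2): one row per bar, one column per ingredient slot
print(bars['ingredients']['quantity_g'].sum(axis=1))  # [95. 80.]
```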

Adding or Removing Fields in Structured Arrays

While NumPy structured arrays have fixed field names and types, you can add or remove fields using utility functions from np.lib.recfunctions. These functions return a new structured array with the updated schema.

To use them, you must first import the recfunctions module:

python

from numpy.lib import recfunctions as rfn

📥 Adding a New Field

Use rfn.append_fields() to add a new field to a structured array.

python

import numpy as np
from numpy.lib import recfunctions as rfn

# Original structured array
dtype = [('name', 'U10'), ('age', 'i4')]
data = np.array([('Ryan', 25), ('Owen', 30)], dtype=dtype)

# New field data to add
weights = [55.5, 80.0]

# Append 'weight' field
extended = rfn.append_fields(data, 'weight', weights, dtypes='f4', usemask=False)

print(extended)

Output

[('Ryan', 25, 55.5) ('Owen', 30, 80. )]

💡 Tip: The usemask=False argument ensures the result is a standard structured array, not a masked array.
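This sketch shows what the tip means in practice. It also adds more than one field in a single call by passing parallel lists of names, data, and dtypes. The height_cm field and its values are hypothetical.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

data = np.array([('Ryan', 25), ('Owen', 30)], dtype=[('name', 'U10'), ('age', 'i4')])

# Without usemask=False, the result is a numpy.ma.MaskedArray
masked = rfn.append_fields(data, 'weight', [55.5, 80.0], dtypes='f4')
print(type(masked))

# Several fields at once: one list each for names, data, and dtypes
extended = rfn.append_fields(
    data,
    ['weight', 'height_cm'],  # hypothetical extra field: height_cm
    [[55.5, 80.0], [170, 182]],
    dtypes=['f4', 'i4'],
    usemask=False,
)
print(extended.dtype.names)  # ('name', 'age', 'weight', 'height_cm')
print(extended)
```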

🗑️ Removing a Field

To remove one or more fields, use rfn.drop_fields(). It returns a new array without the specified field(s).

python

# Remove the 'age' field
cleaned = rfn.drop_fields(extended, 'age')

print(cleaned)

Output

[('Ryan', 55.5) ('Owen', 80. )]

💡 Tip: You can pass a list of field names to remove multiple fields at once.
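For example, this sketch drops two fields at once. drop_fields builds a new array, so the input array keeps all of its fields.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

people = np.array(
    [('Ryan', 25, 55.5), ('Owen', 30, 80.0)],
    dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f4')],
)

# Pass a list of names to drop several fields in one call
names_only = rfn.drop_fields(people, ['age', 'weight'])
print(names_only.dtype.names)  # ('name',)

# The original array still has every field
print(people.dtype.names)  # ('name', 'age', 'weight')
```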

Sorting Structured Arrays

NumPy allows you to sort structured arrays by one or more fields using np.sort() or np.argsort(). Sorting is based on the values in the specified field(s).

🔤 Sort by a Single Field

You can pass the field name to the order parameter to sort by a specific column.

python

import numpy as np

# Sample structured array
dtype = [('name', 'U10'), ('age', 'i4'), ('weight', 'f4')]
data = np.array([
    ('Ryan', 25, 55.5),
    ('Owen', 30, 80.0),
    ('Aaron', 22, 68.0)
], dtype=dtype)

# Sort by age
sorted_by_age = np.sort(data, order='age')

print(sorted_by_age)

Output

[('Aaron', 22, 68. ) ('Ryan', 25, 55.5) ('Owen', 30, 80. )]

🔢 Sort by Multiple Fields

To sort by multiple fields (e.g. age first, then weight), pass a tuple of field names.

python

# Sort by age, then weight
sorted_multi = np.sort(data, order=('age', 'weight'))

print(sorted_multi)

💡 Tip: Sorting by multiple fields works like SQL's ORDER BY with several columns: ties in the first field are broken by the second field. Fields you do not list in order are still used, in dtype order, to break any remaining ties.
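In the sample data above, every age is different, so sorting by age alone and by age then weight give the same result. The sketch below uses hypothetical records with a tie at age 25, so you can see which field breaks the tie. Reversing the sorted result gives descending order.

```python
import numpy as np

dtype = [('name', 'U10'), ('age', 'i4'), ('weight', 'f4')]
data = np.array([
    ('Ryan', 25, 50.0),
    ('Owen', 25, 55.5),   # same age as Ryan (hypothetical tie)
    ('Aaron', 22, 68.0),
], dtype=dtype)

# order='age' only: the tie falls back to the next dtype field, name
by_age = np.sort(data, order='age')
print(by_age['name'])  # ['Aaron' 'Owen' 'Ryan']

# Age first, then weight: Ryan (50.0) now comes before Owen (55.5)
by_age_weight = np.sort(data, order=['age', 'weight'])
print(by_age_weight['name'])  # ['Aaron' 'Ryan' 'Owen']

# Descending order: reverse an ascending sort
oldest_first = by_age_weight[::-1]
print(oldest_first['name'])  # ['Owen' 'Ryan' 'Aaron']
```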

📌 Getting Sorted Indices

Use np.argsort() to get the sorted index order, which is useful for indirect sorting.

python

# Get sorted indices by age
sorted_indices = np.argsort(data, order='age')

# Use indices to reorder data
reordered = data[sorted_indices]

print(reordered)

Output

[('Aaron', 22, 68. ) ('Ryan', 25, 55.5) ('Owen', 30, 80. )]

💡 Tip: argsort() is helpful when you want to sort one array and apply the same ordering to another related array.
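The sketch below applies that tip. The scores array is hypothetical; it is a plain array kept row-aligned with data. One set of indices from argsort reorders both arrays consistently.

```python
import numpy as np

dtype = [('name', 'U10'), ('age', 'i4'), ('weight', 'f4')]
data = np.array([
    ('Ryan', 25, 55.5),
    ('Owen', 30, 80.0),
    ('Aaron', 22, 68.0),
], dtype=dtype)

# Hypothetical related array: one score per row of data
scores = np.array([88, 92, 75])

# Indices that sort data by age: [2, 0, 1]
order = np.argsort(data, order='age')

# Apply the same ordering to both arrays
print(data['name'][order])  # ['Aaron' 'Ryan' 'Owen']
print(scores[order])        # [75 88 92]
```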

Operations on Structured Arrays

First, let's initialize a basic structured array called people that we will perform operations on:

python

import numpy as np

# Define a structured data type
person_dtype = np.dtype([
    ('name', 'U10'),   # Unicode string of max length 10
    ('age', 'i4'),     # 32-bit integer
    ('weight', 'f4')   # 32-bit float
])

# Create a structured array with data
people = np.array([
    ('Jack', 25, 55.0),
    ('Jane', 30, 85.5)
], dtype=person_dtype)

print(people)

Output

[('Jack', 25, 55. ) ('Jane', 30, 85.5)]

📌 Accessing Individual Fields

Access a specific field across all records using the field name as a key.

python

# Access the 'age' field
ages = people['age']
print(ages)

Output

[25 30]
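You can also select several fields at once by passing a list of names. In NumPy 1.16 and later, this returns a view onto the selected fields. The sketch below also combines a row index with a field name to get a single value.

```python
import numpy as np

person_dtype = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])
people = np.array([('Jack', 25, 55.0), ('Jane', 30, 85.5)], dtype=person_dtype)

# A list of field names selects several columns
subset = people[['name', 'age']]
print(subset.dtype.names)  # ('name', 'age')
print(subset.tolist())     # [('Jack', 25), ('Jane', 30)]

# Row then field, or field then row, both reach one value
print(people[1]['name'])  # Jane
print(people['age'][1])   # 30
```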

🔍 Filtering Based on Conditions

Use boolean indexing to filter rows based on values in specific fields.

python

# Filter records where age > 26
older_than_26 = people[people['age'] > 26]
print(older_than_26)

Output

[('Jane', 30, 85.5)]
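You can combine conditions on several fields with & (and) or | (or). Wrap each comparison in parentheses, because these operators bind more tightly than comparisons. The third record (Jill) is hypothetical and was added so that the combined filter has something to match.

```python
import numpy as np

person_dtype = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])
people = np.array([
    ('Jack', 25, 55.0),
    ('Jane', 30, 85.5),
    ('Jill', 35, 60.0),   # hypothetical extra record
], dtype=person_dtype)

# Older than 26 AND lighter than 70
mask = (people['age'] > 26) & (people['weight'] < 70)
print(people[mask]['name'])  # ['Jill']

# np.where returns the matching row positions instead of the rows
print(np.where(people['age'] > 26)[0])  # [1 2]
```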

➗ Vectorized Computation on Fields

Perform NumPy operations directly on fields, just like with regular arrays.

python

# Increase everyone's weight by 5%
people['weight'] *= 1.05
print(people)

Output

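[('Jack', 25, 57.749996) ('Jane', 30, 89.774994)]

Because the weight field is stored as a 32-bit float ('f4'), the results show float32 rounding instead of the exact 57.75 and 89.775. Use 'f8' for the field if you need more precision.
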
📐 Field-Wise Aggregation

You can compute aggregates like mean, sum, or max on individual fields.

python

# Average age
mean_age = people['age'].mean()
print(mean_age)

# Average age
mean_age = people['age'].mean()
print(mean_age)

Output

27.5
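Other reductions work the same way on any numeric field. argmax returns a row index, which you can use to pull out the whole matching record. A minimal sketch:

```python
import numpy as np

person_dtype = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])
people = np.array([('Jack', 25, 55.0), ('Jane', 30, 85.5)], dtype=person_dtype)

# Sum and maximum of individual fields
print(people['weight'].sum())  # 140.5
print(people['age'].max())     # 30

# Find the record with the largest weight
heaviest = people[people['weight'].argmax()]
print(heaviest['name'])  # Jane
```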

🧱 Iterating Over Records

Use a standard Python for loop to iterate over each record (row).

python

# Loop through each record
for person in people:
    print(person['name'], person['age'])

# Loop through each record
for person in people:
    print(person['name'], person['age'])

Output

Jack 25
Jane 30
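Each item produced by the loop is a numpy.void record. When you only need a few fields, zipping the field arrays is a simpler alternative. tolist() converts every record to a plain Python tuple. A short sketch:

```python
import numpy as np

person_dtype = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])
people = np.array([('Jack', 25, 55.0), ('Jane', 30, 85.5)], dtype=person_dtype)

# The loop yields numpy.void records
first = next(iter(people))
print(type(first))  # <class 'numpy.void'>

# Iterate over selected fields in parallel
for name, age in zip(people['name'], people['age']):
    print(name, age)

# Convert all records to Python tuples
print(people.tolist())  # [('Jack', 25, 55.0), ('Jane', 30, 85.5)]
```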

💡 Tip: Structured arrays let you look up columns by name, much like dictionary keys, while keeping NumPy's fast, vectorized operations on each field. This makes them efficient tools for data manipulation.

Frequently Asked Questions

What are NumPy structured arrays?

NumPy structured arrays are arrays that allow you to store data with different data types for each field, enabling you to work with heterogeneous data.

How do I create a NumPy structured array?

You can create a structured array using np.array() and specify the dtype that defines the field names and data types. For example: np.array([(1, 'John'), (2, 'Jane')], dtype=[('id', 'i4'), ('name', 'U10')])

How can I access data in a NumPy structured array?

You can access fields in a structured array using their field names. For example: array['name'] will return the 'name' field in the array.

Can I modify elements in a NumPy structured array?

Yes, you can modify elements in a structured array by accessing the field and assigning new values, just like in regular NumPy arrays.
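For example, the sketch below updates a single value through its field and replaces a whole record by assigning a tuple. The new values are illustrative.

```python
import numpy as np

person_dtype = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])
people = np.array([('Jack', 25, 55.0), ('Jane', 30, 85.5)], dtype=person_dtype)

# Field name, then row index: people['age'] is a view, so this edits people
people['age'][0] = 26

# Replace an entire record with a tuple matching the dtype
people[1] = ('Janet', 31, 84.0)

print(people.tolist())  # [('Jack', 26, 55.0), ('Janet', 31, 84.0)]
```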

What are record arrays in NumPy?

Record arrays (np.recarray) are an ndarray subclass for structured data that lets you access fields as attributes, for example arr.name instead of arr['name']. This makes field access read more like working with object attributes.
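One way to get a record array is to view an existing structured array as np.recarray. The view shares memory with the original array, so changes made through attributes show up in the structured array. A minimal sketch:

```python
import numpy as np

person_dtype = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])
people = np.array([('Jack', 25, 55.0), ('Jane', 30, 85.5)], dtype=person_dtype)

# Same memory, attribute-style field access
rec = people.view(np.recarray)
print(rec.age)      # [25 30]
print(rec[0].name)  # Jack

# Writing through the view updates the original structured array
rec.age[0] = 40
print(people['age'])  # [40 30]
```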

How do I select rows from a NumPy structured array?

You can use boolean indexing or array slicing to select rows from a structured array. For example, array[array['id'] == 1] will select rows where the 'id' field is 1.

What's Next?

Up next, we dive into Masked Arrays in NumPy — a powerful tool for handling arrays with missing or invalid data. You'll learn how to create and manipulate masked arrays, apply conditions to mask data, and handle incomplete datasets with ease in real-world applications.