NumPy argsort() and lexsort()

NumPy supports indirect and multi-key sorting through the argsort() and lexsort() functions. Both return index arrays rather than sorted values, which is what you need when you want to reorder several related arrays together or sort by more than one key (column).

Why Use argsort() and lexsort()?

  • argsort(): Returns the indices that would sort an array. Useful for ranking, ordering, or sorting related arrays using index positions.
  • lexsort(): Performs an indirect stable sort using multiple keys. Ideal for sorting by primary and secondary criteria, like sorting a table by last name and then first name.

These functions come up often in data processing, especially when working with structured arrays or DataFrames, or when performing custom multi-key sorts.


Understanding np.argsort()

The np.argsort() function returns the indices that would sort an array. Instead of returning the sorted values themselves, it gives you the order of the elements. This is especially useful for sorting related arrays, ranking elements, or reordering based on another array's sort order.

Example: Using argsort() for Sorting Indices

python
import numpy as np

arr = np.array([10, 40, 20, 90])

# Get the indices that would sort the array
indices = np.argsort(arr)

# Use indices to get sorted array
sorted_arr = arr[indices]

print("Original:", arr)
print("Indices:", indices)
print("Sorted using indices:", sorted_arr)

How It Works

  • np.argsort(arr) returns the indices that would sort arr.
  • Indexing with these indices, arr[indices], returns a sorted copy; arr itself is left unchanged.
  • Great for indirect sorting, ranking, and sorting multiple arrays based on the same criteria.

Output

Original: [10 40 20 90]
Indices: [0 2 1 3]
Sorted using indices: [10 20 40 90]
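
The same index array can reorder any other array that is aligned with arr element by element. The sketch below assumes two hypothetical parallel arrays, names and scores, that are not part of the original example; it sorts both by score and then reverses the indices for descending order.

```python
import numpy as np

# Hypothetical parallel arrays: scores[i] belongs to names[i]
names = np.array(["Ana", "Ben", "Cy", "Dee"])
scores = np.array([10, 40, 20, 90])

# Indices that sort scores in ascending order
order = np.argsort(scores)

# Apply the same indices to both arrays so they stay aligned
names_by_score = names[order]
scores_sorted = scores[order]

# Reverse the index array for descending order
desc = order[::-1]

print(names_by_score.tolist())   # ['Ana', 'Cy', 'Ben', 'Dee']
print(names[desc].tolist())      # ['Dee', 'Ben', 'Cy', 'Ana']
```

Reversing the indices also reverses the relative order of tied elements, which may matter if the scores contain duplicates.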

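💡 Tip: Use argsort() when you need the order of the elements, not just the sorted values. The result lists which element belongs in each sorted position; to get the rank of each element, apply argsort() to that result a second time.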
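
The index array from argsort() answers "which element goes in sorted position k?", while a rank answers the inverse question, "which sorted position does element i land in?". A minimal sketch of converting one into the other, assuming zero-based ranks and no special treatment of ties (the array values here are made up for illustration):

```python
import numpy as np

arr = np.array([30, 10, 40, 20])

# order[k] is the index of the k-th smallest element
order = np.argsort(arr)

# Invert the permutation: ranks[i] is the sorted position of arr[i]
ranks = np.empty_like(order)
ranks[order] = np.arange(arr.size)

print(order.tolist())   # [1, 3, 0, 2]
print(ranks.tolist())   # [2, 0, 3, 1]

# Applying argsort twice gives the same ranks
assert np.array_equal(ranks, np.argsort(order))
```

The explicit inversion avoids a second sort; the double argsort() is shorter to write and gives the same result.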


np.argsort() with Parameters

The argsort() function supports several parameters that let you customize how sorting is done. These include the axis to sort along, the sorting algorithm to use, and how to sort structured arrays.

argsort

python
numpy.argsort(a, axis=-1, kind='quicksort', order=None)

Key Parameters

  • a: Input array.
  • axis: Axis along which to sort. Default is -1 (last axis). Use None to sort the flattened array.
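  • kind: Sorting algorithm. Options: 'quicksort' (default), 'mergesort', 'heapsort', 'stable'. Only 'stable' and 'mergesort' preserve the order of equal elements; both select a stable algorithm suited to the data type.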
  • order: Used for sorting structured arrays by field name(s).

Example: Sorting with Axis and Kind

python
import numpy as np

arr = np.array([[3, 1, 2],
                [9, 7, 8]])

# Sort along each row (axis=1) using 'mergesort'
row_indices = np.argsort(arr, axis=1, kind='mergesort')

# Sort the flattened array (axis=None)
flat_indices = np.argsort(arr, axis=None)

print("Original array:\n", arr)
print("Argsort by row:\n", row_indices)
print("Argsort flattened:", flat_indices)

How It Works

  • The 2D array arr has two rows and three columns.
  • np.argsort(arr, axis=1, kind='mergesort') returns the indices that would sort each row individually.
  • For each row, the indices indicate the positions of the elements in ascending order.
  • np.argsort(arr, axis=None) flattens the array and returns indices that would sort the entire array as if it were 1D.
  • Using these indices helps you reorder or rank elements without modifying the original array.

Output

Original array:
 [[3 1 2]
 [9 7 8]]
Argsort by row:
 [[1 2 0]
 [1 2 0]]
Argsort flattened: [1 2 0 4 5 3]
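
For a 2D array, plain indexing with arr[row_indices] does not apply the indices row by row. One way to apply per-axis indices is np.take_along_axis(), and flattened indices can be mapped back to (row, column) positions with np.unravel_index(). A short sketch using the same array as above:

```python
import numpy as np

arr = np.array([[3, 1, 2],
                [9, 7, 8]])

# Per-row indices, applied row by row
row_indices = np.argsort(arr, axis=1)
sorted_rows = np.take_along_axis(arr, row_indices, axis=1)

# Flattened indices refer to positions in arr.ravel()
flat_indices = np.argsort(arr, axis=None)
flat_sorted = arr.ravel()[flat_indices]

# Convert flat positions back to (row, column) coordinates
rows, cols = np.unravel_index(flat_indices, arr.shape)

print(sorted_rows.tolist())   # [[1, 2, 3], [7, 8, 9]]
print(flat_sorted.tolist())   # [1, 2, 3, 7, 8, 9]
print(rows.tolist())          # [0, 0, 0, 1, 1, 1]
print(cols.tolist())          # [1, 2, 0, 1, 2, 0]
```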

Example: Sorting Structured Arrays with order

python
import numpy as np

# Create a structured array
data = np.array([(3, 'Toaster'), (1, 'Oven'), (2, 'Blender')],
                dtype=[('id', 'i4'), ('name', 'U10')])

# Sort by 'id'
sorted_by_id = data[np.argsort(data, order='id')]

# Sort by 'name'
sorted_by_name = data[np.argsort(data, order='name')]

print("Sorted by id:", sorted_by_id)
print("Sorted by name:", sorted_by_name)

How It Works

  • The structured array data contains fields 'id' (integer) and 'name' (string).
  • np.argsort(data, order='id') returns the indices that would sort the array by the 'id' field.
  • Indexing the array with these sorted indices (data[np.argsort(...)]) creates a new array sorted by that field.
  • Similarly, sorting by 'name' orders the array alphabetically by the string field.
  • The order parameter works with any field in the structured array, including string fields, not just numeric ones.

Output

Sorted by id: [(1, 'Oven') (2, 'Blender') (3, 'Toaster')]
Sorted by name: [(2, 'Blender') (1, 'Oven') (3, 'Toaster')]
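
The order parameter also accepts a list of field names, in which case later fields break ties in earlier ones. The sketch below uses a hypothetical structured array with category, name, and price fields (not taken from the example above) and sorts by category first, then by name.

```python
import numpy as np

# Hypothetical records with repeated categories
data = np.array([("Tools", "Hammer", 12.5),
                 ("Kitchen", "Toaster", 25.0),
                 ("Tools", "Drill", 80.0),
                 ("Kitchen", "Blender", 40.0)],
                dtype=[("category", "U10"), ("name", "U12"), ("price", "f8")])

# Sort by category first; names break ties within a category
idx = np.argsort(data, order=["category", "name"])
sorted_data = data[idx]

print(idx.tolist())                   # [3, 1, 2, 0]
print(sorted_data["name"].tolist())   # ['Blender', 'Toaster', 'Drill', 'Hammer']
```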

💡 Tip: Use kind='stable' when you need to preserve the order of equal elements, especially with structured data or secondary sorts.
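
To illustrate what stability buys you, the sketch below sorts an array with ties and then builds a two-key sort out of two stable passes: first by the secondary key, then by the primary key. The primary and secondary arrays are made-up values for illustration only.

```python
import numpy as np

primary = np.array([2, 1, 2, 1, 2])
secondary = np.array([5, 9, 3, 7, 4])

# With a stable sort, equal values keep their original relative order
stable_idx = np.argsort(primary, kind="stable")
print(stable_idx.tolist())   # [1, 3, 0, 2, 4]

# Two-pass multi-key sort: secondary key first, then a stable sort
# on the primary key, which preserves the secondary ordering within ties
first = np.argsort(secondary, kind="stable")
second = np.argsort(primary[first], kind="stable")
final = first[second]
print(final.tolist())        # [3, 1, 2, 4, 0]

# The same ordering comes out of lexsort with the primary key last
assert np.array_equal(final, np.lexsort((secondary, primary)))
```

The second pass only works because it is stable; with the default kind, the order within each primary group would not be guaranteed.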


Understanding np.lexsort()

The np.lexsort() function is used to perform indirect sorting using multiple keys. It's especially useful when you need to sort structured or tabular data by more than one column — similar to how you might sort products by category, then by name.

You pass a tuple of arrays to lexsort(), and it returns the indices that would sort the data based on those keys. The last array in the tuple is the primary key, the second-to-last is the secondary key, and so on, so the call has the form np.lexsort((secondary, primary)).

Visualization

Before Sorting

Index | Product Name  | Category
------|---------------|---------
  0   | Blender       | Kitchen
  1   | Toaster       | Kitchen
  2   | Hammer        | Tools
  3   | Screwdriver   | Tools

After Sorting (by Category, then Product Name)

Index | Product Name  | Category
------|---------------|---------
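  0   | Blender       | Kitchen
  1   | Toaster       | Kitchen
  2   | Hammer        | Tools
  3   | Screwdriver   | Tools

Both keys sort in ascending order, so Kitchen comes before Tools. The rows in this example are already in sorted order, which is why the indices stay [0, 1, 2, 3].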

Example: Sorting by Category, Then Product Name

python
import numpy as np

product_names = np.array(['Blender', 'Toaster', 'Hammer', 'Screwdriver'])
categories = np.array(['Kitchen', 'Kitchen', 'Tools', 'Tools'])

# Sort by category, then product name
indices = np.lexsort((product_names, categories))

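# Use indices to reorder; tolist() converts NumPy strings to plain Python str
# so the printed tuples look the same across NumPy versions
sorted_products = list(zip(product_names[indices].tolist(),
                           categories[indices].tolist()))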

print("Sorted products:")
for product in sorted_products:
    print(product)

How It Works

  • np.lexsort() takes a tuple of 1D arrays of equal length.
  • The last array is the primary key; each earlier array breaks ties in the keys after it.
  • The sort is ascending and stable, and it returns an array of indices that you can use to reorder your data.

Output

Sorted products:
('Blender', 'Kitchen')
('Toaster', 'Kitchen')
('Hammer', 'Tools')
('Screwdriver', 'Tools')

💡 Tip: lexsort() is ideal when you want to sort records or table-like data using more than one field.
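
The arrays in the example above happen to be in sorted order already, so the effect of the keys is hard to see. The sketch below shuffles the same four products (the input order is an assumption made for illustration) and shows how swapping the key order changes the result.

```python
import numpy as np

# Same products as above, in a deliberately unsorted order
product_names = np.array(["Toaster", "Screwdriver", "Blender", "Hammer"])
categories = np.array(["Kitchen", "Tools", "Kitchen", "Tools"])

# Primary key last: sort by category, then by product name
by_category = np.lexsort((product_names, categories))
print(by_category.tolist())                  # [2, 0, 3, 1]
print(product_names[by_category].tolist())   # ['Blender', 'Toaster', 'Hammer', 'Screwdriver']

# Swapping the keys makes product name the primary key instead
by_name = np.lexsort((categories, product_names))
print(product_names[by_name].tolist())       # ['Blender', 'Hammer', 'Screwdriver', 'Toaster']
```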


Frequently Asked Questions

What does NumPy's argsort() do?

np.argsort() returns the indices that would sort an array, which you can use to indirectly sort the array or reorder other arrays consistently.


How is lexsort() different from argsort()?

lexsort() allows sorting by multiple keys (like multiple columns). It returns indices that sort data first by the last key, then by preceding keys, enabling multi-level sorting.


Can I use argsort() to sort multi-dimensional arrays?

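Yes. argsort() sorts along one axis at a time (the last axis by default) and returns an index array with the same shape as the input; np.take_along_axis() applies those indices. To order rows by several columns at once, lexsort() is better suited.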


How do I sort an array by multiple columns using NumPy?

Use np.lexsort() with a tuple of keys, with the primary sort key last, to perform multi-column indirect sorting.
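
As a sketch of what this looks like for a 2D numeric array, the example below sorts the rows of a hypothetical two-column table by column 0 and then by column 1. It also shows one common way to sort a numeric key in descending order, by negating it.

```python
import numpy as np

# Hypothetical table: each row is a record, each column a key
table = np.array([[2, 30],
                  [1, 20],
                  [2, 10],
                  [1, 40]])

# Primary key (column 0) goes last in the tuple
idx = np.lexsort((table[:, 1], table[:, 0]))
sorted_table = table[idx]
print(sorted_table.tolist())   # [[1, 20], [1, 40], [2, 10], [2, 30]]

# Column 0 ascending, column 1 descending (negation works for numeric keys)
idx_desc = np.lexsort((-table[:, 1], table[:, 0]))
print(table[idx_desc].tolist())   # [[1, 40], [1, 20], [2, 30], [2, 10]]
```

Negation only applies to numeric keys; for string keys a different approach is needed.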


Are argsort() and lexsort() stable sorting algorithms?

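Not both by default. lexsort() is always stable. argsort() defaults to kind='quicksort', which does not guarantee the relative order of equal elements; pass kind='stable' (or 'mergesort') when you need that guarantee.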



What's Next?

Coming up, we'll explore searchsorted in NumPy — a handy function to quickly find insertion points in sorted arrays. You'll learn how to efficiently locate positions to maintain sorted order and apply it in real-world scenarios.