Foundations and Evolution of Data Structures
1. Introduction
Data structures are fundamental constructs in computer science, enabling efficient data storage, access, manipulation, and organization. As computing needs evolve, so do the characteristics of data structures—from simple lists and trees to energy-efficient and disk-optimized models. This article provides a high-level yet rigorous overview of classical and next-generation data structures, categorized by operational patterns, access methods, and memory models.
2. Access Models: Sequential vs Parallel
Sequential access refers to reading or writing data in a specific linear order, as seen in arrays or linked lists. In contrast, parallel access enables concurrent read/write operations on data structures, essential in distributed systems or GPU-accelerated environments. Data structures such as concurrent hash maps, parallel tries, and read-optimized trees (e.g., B+ Trees) support scalable parallel access patterns.
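As a minimal illustration of the contrast, the Python sketch below pairs a single-pass sequential scan of a list with lock-protected concurrent updates to a shared dictionary; the worker count and key names are invented, and a purpose-built concurrent hash map would avoid the single global lock used here.

```python
import threading

# Sequential access: elements are visited in one linear pass.
data = [3, 1, 4, 1, 5, 9]
total = sum(data)  # single reader, fixed order

# Parallel access: several threads update a shared map.
# A plain dict is not a concurrent hash map, so a lock guards every write.
counts = {}
lock = threading.Lock()

def record(key):
    with lock:                      # coarse-grained synchronization
        counts[key] = counts.get(key, 0) + 1

threads = [threading.Thread(target=record, args=("page-%d" % (i % 3),))
           for i in range(12)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(total, counts)
```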
3. Operations: CRUD, Traversal, Search, and Filtering
Operations on data structures are commonly categorized as follows (a short code sketch appears after this list):
- Create: Construct new elements or structures (e.g., initialize an array or insert into a list).
- Read (Access): Retrieve elements efficiently (e.g., hash lookup, tree search).
- Update: Modify existing data (e.g., rebalancing a tree, updating a hashmap entry).
- Delete: Remove data, often requiring structural adjustments.
- Traversal: Iterate over all elements (e.g., DFS on graphs, in-order tree traversal).
- Filtering: Efficiently test for membership or properties, as in Bloom filters.
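The sketch below walks through these categories on built-in Python structures; the phone-book entries and the tiny tuple-encoded tree are purely illustrative.

```python
# Create: initialize a structure and insert elements.
phone_book = {}                      # hash map
phone_book["alice"] = "555-0100"     # insert / create

# Read: retrieve by key (expected O(1) for a hash map).
number = phone_book.get("alice")

# Update: overwrite an existing entry.
phone_book["alice"] = "555-0199"

# Delete: remove an entry; the structure adjusts internally.
del phone_book["alice"]

# Traversal: visit every element, here an in-order walk of a
# tuple-encoded binary tree (value, left, right).
tree = ("b", ("a", None, None), ("c", None, None))

def in_order(node):
    if node is None:
        return []
    value, left, right = node
    return in_order(left) + [value] + in_order(right)

# Filtering: membership test; a set gives exact answers, while a
# Bloom filter (Section 5) trades exactness for space.
vowels = set("aeiou")
print(in_order(tree), "a" in vowels)
```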
4. Classical Data Structures
- Arrays: Contiguous memory structures offering O(1) access time. Limitation: fixed size.
- Lists: Dynamic linear structures, singly or doubly linked, with O(n) access but O(1) insertion and deletion once the target node is known.
- Trees: Hierarchical structures used for sorting, searching, and organizing data (e.g., binary search trees, AVL trees); a minimal search-tree sketch follows this list.
- Graphs: Collections of nodes and edges modeling complex relationships (e.g., social networks, road maps).
- Hash Maps: Key-value stores using hash functions, providing expected O(1) lookup and update time.
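As a minimal illustration of the tree entry above, the sketch below implements an unbalanced binary search tree with insert and search; the sample keys are arbitrary, and self-balancing variants such as AVL or red-black trees add rotations to keep the height logarithmic.

```python
class Node:
    """A node of an unbalanced binary search tree."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Insert key and return the (possibly new) root; duplicates are ignored."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root

def search(root, key):
    """Return True if key is present; O(h), where h is the tree height."""
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False

root = None
for k in [8, 3, 10, 1, 6]:
    root = insert(root, k)
print(search(root, 6), search(root, 7))   # True False
```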
5. Specialized and Compressed Structures
- Tries: Prefix trees used for efficient string storage and retrieval (e.g., dictionaries, autocomplete systems); see the trie sketch after this list.
- Bloom Filters: Probabilistic data structures that allow fast membership checks with a configurable false positive rate; a small Bloom filter sketch also follows the list.
- B Trees and B+ Trees: Balanced tree structures optimized for disk-based storage systems. In a B+ Tree, internal nodes hold only keys while all values live in linked leaf nodes, which makes range queries efficient.
- Lyndon Words: Strings that are lexicographically strictly smaller than every one of their nontrivial rotations. Used in text-compression algorithms (e.g., variants of the Burrows-Wheeler transform) and in suffix-based indexing.
- Compressed Structures: Include succinct trees, wavelet trees, and run-length encoded bitmaps, which store data in nearly optimal space while allowing efficient queries.
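As referenced above, here is a minimal trie sketch for prefix lookup, using a dictionary-of-children representation; the class names and sample words are illustrative only.

```python
class TrieNode:
    def __init__(self):
        self.children = {}      # character -> TrieNode
        self.terminal = False   # marks the end of a stored word

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.terminal = True

    def starts_with(self, prefix):
        """Return all stored words beginning with prefix (autocomplete)."""
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        results = []
        def collect(n, path):
            if n.terminal:
                results.append(prefix + path)
            for ch, child in n.children.items():
                collect(child, path + ch)
        collect(node, "")
        return results

t = Trie()
for w in ["car", "card", "care", "dog"]:
    t.insert(w)
print(t.starts_with("car"))   # ['car', 'card', 'care'] (order may vary)
```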
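For Bloom filters, the following sketch simulates k independent hash functions by salting a single SHA-256 digest; the bit-array size and hash count are arbitrary choices here, whereas real deployments derive them from the target false positive rate.

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)   # one byte per bit, for simplicity

    def _positions(self, item):
        # Simulate k hash functions by salting one cryptographic hash.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        """False means definitely absent; True means possibly present."""
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
bf.add("alice@example.com")
print(bf.might_contain("alice@example.com"))  # True
print(bf.might_contain("bob@example.com"))    # almost certainly False
```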
6. Time and Space Complexity Analysis
Each data structure is associated with complexity trade-offs:
| Structure | Access | Insert | Delete | Space |
|---|---|---|---|---|
| Array | O(1) | O(n) | O(n) | O(n) |
| Linked List | O(n) | O(1)* | O(1)* | O(n) |
| Hash Map | O(1) average | O(1) average | O(1) average | O(n) |
| B Tree | O(log n) | O(log n) | O(log n) | O(n) |
| Trie | O(k) | O(k) | O(k) | O(n·k) |

*Linked-list insertion and deletion are O(1) only when a reference to the neighboring node is already held; locating that node costs O(n). For the trie, k denotes the key length.
7. Memory Models: In-Memory vs Disk-Based
In-memory data structures reside entirely in RAM and provide low-latency access, making them ideal for latency-sensitive workloads such as caching, machine learning inference, and user-facing APIs.
Disk-based data structures are optimized for storage efficiency and block-based I/O, making them crucial in databases, file systems, and persistent key-value stores. B Trees, LSM Trees, and SSTables are common examples.
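A hedged miniature of the disk-based idea: the sketch below writes sorted key-value pairs to an SSTable-like file and answers lookups by binary-searching an in-memory offset index before reading a single line from disk. The file layout and function names are invented for illustration; real engines add blocks, compression, write-ahead logs, and compaction.

```python
import bisect, os, tempfile

def write_table(path, items):
    """Write (key, value) pairs sorted by key; return an in-memory offset index."""
    index = []                       # list of (key, byte_offset)
    with open(path, "wb") as f:
        for key, value in sorted(items):
            index.append((key, f.tell()))
            f.write(f"{key}\t{value}\n".encode("utf-8"))
    return index

def lookup(path, index, key):
    """Binary-search the index, then read one record from disk."""
    keys = [k for k, _ in index]
    i = bisect.bisect_left(keys, key)
    if i == len(keys) or keys[i] != key:
        return None
    with open(path, "rb") as f:
        f.seek(index[i][1])
        _, value = f.readline().decode("utf-8").rstrip("\n").split("\t", 1)
        return value

path = os.path.join(tempfile.gettempdir(), "demo.sstable")
idx = write_table(path, [("cherry", "3"), ("apple", "1"), ("banana", "2")])
print(lookup(path, idx, "banana"))   # 2
print(lookup(path, idx, "durian"))   # None
```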
8. Next-Generation Data Structures and Energy Considerations
Modern applications in big data, mobile computing, and AI demand not only fast and scalable data structures but also energy-efficient ones. Several innovations address this:
- Succinct Data Structures: These use close to the information-theoretic minimum space (e.g., bitwise wavelet trees), reducing cache pressure and energy use.
- Immutable Data Structures: Favored in functional programming and concurrent systems for low-overhead updates with structural sharing, reducing unnecessary memory writes.
- Energy-aware caching strategies: Coupled with data structures like energy-efficient tries, these aim to reduce power consumption on embedded and mobile devices.
- AI-optimized Indexes: Learned indexes (Kraska et al., 2018) model the distribution of keys, roughly its cumulative distribution function, to predict a record's position, reducing lookup overhead and memory footprint relative to traditional tree indexes.
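A minimal sketch of the learned-index idea, assuming a sorted array of unique integer keys: fit a least-squares line from key to array position, record the worst-case prediction error over the training keys, and correct each prediction with a bounded local search. Production learned indexes use hierarchies of models (e.g., the recursive model index of Kraska et al.) and tighter per-model error bounds.

```python
import bisect

def fit_linear(keys):
    """Least-squares line mapping key -> position in the sorted array."""
    n = len(keys)
    xs, ys = keys, range(n)
    mean_x = sum(xs) / n
    mean_y = (n - 1) / 2
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov / var_x if var_x else 0.0
    intercept = mean_y - slope * mean_x
    # Worst-case prediction error bounds the local search window.
    err = max(abs(slope * x + intercept - y) for x, y in zip(xs, ys))
    return slope, intercept, int(err) + 1

def lookup(keys, model, key):
    slope, intercept, err = model
    guess = int(slope * key + intercept)
    lo = max(0, guess - err)
    hi = max(lo, min(len(keys), guess + err + 1))
    i = bisect.bisect_left(keys, key, lo, hi)   # search only the small window
    return i if i < len(keys) and keys[i] == key else None

keys = sorted({x * 7 + (x % 5) for x in range(1000)})
model = fit_linear(keys)
print(lookup(keys, model, keys[123]) == 123)   # True
print(lookup(keys, model, -1))                 # None
```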
9. Conclusion
Understanding data structures from both a theoretical and applied perspective is essential in modern computing. The choice of structure—whether classical or next-generation—depends on the operational needs, memory constraints, and energy consumption profiles of the application. As computing paradigms evolve, so will the sophistication and specialization of data structures used across disciplines.
10. References
- Data structure - Wikipedia
- Trie - Wikipedia
- B-tree - Wikipedia
- B+ tree - Wikipedia
- Lyndon word - Wikipedia
- Bloom filter - Wikipedia
- The Case for Learned Index Structures - Kraska et al., 2018
- CRUD Analysis (Internal Link)
- Efficient String Processing with Trie Structures - Richard Johnson