Data Structures Flashcards
Arrays, Linked List, Hash Table, Stack, Queue, Heap, Binary Tree
array
a collection of data elements, linear order is given by their physical placement in memory
array: index (get)
O(1)
array: insert/remove at the end
amortized O(1) (for dynamically resizing array)
array: insert/remove at any position
O(n)
array: insert/remove at the beginning
O(n)
linked list
a collection of data elements not stored in adjacent physical memory. a group of nodes representing a sequence. each node is composed of data element itself and a pointer to the next node in the sequence.
linked list: insert/remove at any position
O(1)
linked list: index (get)
O(n)
binary tree
a tree with each node having at most two children
use cases of binary tree
- a way to access nodes based on value or label associated with each node. e.g. BST, binary heap (searching and sorting)
- as a representation of data with a relevant bifurcating structure where the particular arrangement of nodes under and/or to the left or right of other nodes is part of the information (that is, changing it would change the meaning).
balanced tree
|height(left) - height(right)| <= 1 for any node
height = O(log n)
binary tree: insert a leaf
O(1)
binary tree: remove a zero/single child node
O(1)
traversal of binary trees
pre-order, in-order, post-order
breath-first order
depth-first order
implementation of binary trees
nodes and pointers
binary search tree
sorted binary tree, node.val is greater or equal than all the nodes in the left subtree, node.val is less or equal than all the nodes in the right subtree
binary search tree: height
balanced bst: O(log n); any bst: O(n)
binary search tree: search
average O(log n), worst case O(n)
binary search tree: insert
average O(log n), worst case O(n)
binary search tree: remove
average O(log n), worst case O(n)
self-balancing binary search tree
AVL and black-red tree
AVL
Re-balance the tree by performing appropriate rotations on the subtree
black-red tree
1) Every node has a color either red or black.
2) Root of tree is always black.
3) There are no two adjacent red nodes (A red node cannot have a red parent or red child).
4) Every path from a node (including root) to any of its descendant NULL node has the same number of black nodes.
In Red-Black tree, we use two tools to do balancing.
1) Recoloring
2) Rotation
use case of binary search tree
lookup table, priority queue, dynamic sets
hash table
map between key and value, a hash function hash(key) = index, pointing to a buckets/slots contains the corresponding value(s)
collision of hash table
multiple key to the same index
hash table: get
O(1)
hash table: insert
amortized O(1)
hash table: remove
amortized O(1)
use case of hash table
associative arrays, database indexing, caches, and sets
hash table: choice of hash function
uniform distribution
hash table: collision resolution
- separate chaining: a linked list in each bucket
- open addressing: all entry records are stored in the bucket array itself. When a new entry has to be inserted, proceed in some probe sequence, until an unoccupied slot is found. search is the same procedure
hash table: dynamic resizing
copy all the entries to the new array when the size get bigger than a threshold, incremental resizing
queue
FIFO, addition of entities to the rear terminal position, known as enqueue, and removal of entities from the front terminal position
implementations of queue
circular buffer (big array with head and tail floating around) linked list with tail pointer deque
queue: insert and delete (enqueue and dequeue)
O(1)
stack
last-in-first-out
stack: push
from top, O(1)
stack: pop
from top, O(1)
stack: peek
O(1)
implementations of stack
array
linked list
use cases of stack
expression evaluation and syntax parsing
backtracking, depth-first-search
memory
what is stack overflow?
Parameters and local variables are allocated on the stack (with reference types, the object lives on the heap and a variable in the stack references that object on the heap). The stack typically lives at the upper end of your address space and as it is used up it heads towards the bottom of the address space (i.e. towards zero).
Your process also has a heap, which lives at the bottom end of your process. As you allocate memory, this heap can grow towards the upper end of your address space. As you can see, there is a potential for the heap to “collide” with the stack.
priority queue
Each element has a “priority” associated with it. In a priority queue, an element with high priority is served before an element with low priority. If two elements have the same priority, they are served according to their order in the queue.
implementations of priority queue
heap: O(n) to build, O(log n) to insert and remove
self-balanced BST: O(n log n) to build initially, O(log n) to insert and remove
unordered array: O(n^2)
ordered array: O(n^2)
equivalence between priority queue and sorting
The semantics of priority queues naturally suggest a sorting method: insert all the elements to be sorted into a priority queue, and sequentially remove them; they will come out in sorted order. heap -------------------- heap sort self-balanced bst --- tree sort unordered array ---- selection sort ordered array -------- insertion sort
priority queue: build
heap implementation O(n), self-balance bst implementation O(n log n)
priority queue: push
O(n log n)
priority queue: pop
O(n log n)
use cases of priority queue
bandwidth management
dijkstra’s algorithm
hoffman coding
best-first search
heap
A specialized tree-based data structure which is essentially an almost complete tree that satisfies the heap property:
max-heap: each parent is larger or equal than its children
min-heap: each parent is smaller or equal than its children
Visualized as a tree, but is usually linear in storage (array, linked list).
heap: build
O(n)
heap: insert top
O(n log n)
heap: remove top
O(n log n)
heapify
create a heap from an unsorted list of elements
heap sort
sort a unsorted list of elements in place using max heap
min heap can not do in place
use cases of heap
priority queue k-way merge selection algorithm graph algorithms (dijkstra) order statistics (get median/k-largest)