Data structures Flashcards
List
ordered collection of nodes/items/elements
key property: ordering
operations:
push: adds element to end of list
pop: removes last element of the list
shift: removes first element of the list
unshift: adds an element to the front of the list
insert: adds an element at a given position
remove: removes the element at a given position
iterate: returns the items in order
also: sorting, searching, copying, joining, splitting
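The operations on this card can be sketched with Python's built-in list (the card's names push/pop/shift/unshift are mapped onto Python equivalents; the values are illustrative):

```python
# Sketch: the card's list operations mapped onto Python's built-in list.
items = [10, 20, 30]

items.append(40)        # push: add to end of list
last = items.pop()      # pop: remove last element (last == 40)
first = items.pop(0)    # shift: remove first element (first == 10)
items.insert(0, 5)      # unshift: add element to the front
items.insert(2, 25)     # insert: add element at a given position
del items[1]            # remove: remove element at a given position

for x in items:         # iterate: returns the items in order
    print(x)            # prints 5, 25, 30
```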
Stack
LIFO (last-in first-out) list
objects added (pushed) and removed (popped) from the same end of the list
use array or linked-list
linked list: add and remove in constant time
array: use int sp to track index at top of stack
resizing array: double the capacity of the stack when it is full / halve the capacity when it is only a quarter full
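A minimal sketch of the resizing-array stack described above (class and field names are illustrative; `sp` tracks the top of the stack as on the card):

```python
# Sketch of a stack on a resizing array: double when full, halve at quarter full.
class ArrayStack:
    def __init__(self):
        self.capacity = 1
        self.array = [None] * self.capacity
        self.sp = 0                        # index of the next free slot (top of stack)

    def push(self, item):
        if self.sp == self.capacity:       # full: double the capacity
            self._resize(2 * self.capacity)
        self.array[self.sp] = item
        self.sp += 1

    def pop(self):
        self.sp -= 1
        item = self.array[self.sp]
        self.array[self.sp] = None
        if 0 < self.sp <= self.capacity // 4:   # only a quarter full: halve
            self._resize(self.capacity // 2)
        return item

    def _resize(self, new_capacity):
        new_array = [None] * new_capacity  # copy all existing elements across
        new_array[:self.sp] = self.array[:self.sp]
        self.array = new_array
        self.capacity = new_capacity
```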
Amortisation / Performance of push (stack)
geometric series:
T(N) = Nc + (4c + 8c + ··· + ((N − 1)/2)c + (N − 1)c)
first Nc is the cost of writing N elements
rest is for copying to new arrays
time for copying is 2Nc - 6c (geometric series)
T(N) = 3Nc - 6c for this N
T(N) ≤ 3Nc - 6c for all N, so
T(N) = O(N) for N pushes
a push takes O(N)/N = Θ(1) time, on average
push time is AMORTISED Θ(1) (NOT same as Θ(1))
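The bound above can be checked with a small simulation, assuming unit cost c per element written (the function name is illustrative):

```python
# Sketch: count element writes (cost c each) for N pushes into a doubling array.
def total_writes(n):
    writes = 0
    capacity = 1
    size = 0
    for _ in range(n):
        if size == capacity:   # full: double, copying `size` elements across
            writes += size
            capacity *= 2
        writes += 1            # write the newly pushed element
        size += 1
    return writes

# the total stays below 3N, so each push is amortised Theta(1)
for n in (10, 100, 1000):
    assert total_writes(n) <= 3 * n
```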
Amortisation
amortized analysis considers a sequence of operations
cost of expensive operations is ‘amortized’ across the sequence…
can never “be in debt”
pick a value to ‘pay’ per operation - show that this value covers all costs (never in debt)
Resizable array
compared with normal bounded-size array:
average insert time Θ(1) for both
actual time for inserting objects is higher (3c v c)
some very expensive inserts as N increases
keep benefits over linked list: random access, easy implementation
Queue
a list
next object removed
- earliest one added (FIFO)
- one with highest priority (Priority Queue)
does not support indexed access
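A FIFO queue sketch using Python's `collections.deque`, which gives constant-time adds and removes at both ends:

```python
from collections import deque

# Sketch: FIFO queue; the next object removed is the earliest one added.
q = deque()
q.append("a")               # enqueue at the back
q.append("b")
q.append("c")
assert q.popleft() == "a"   # dequeue from the front
assert q.popleft() == "b"
```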
Priority Queue
objects have keys
highest/lowest key is always at front
does not support indexed access
key property: maintain order within each branch
highest or lowest key at the root
add: O(N)
choose a short branch to keep the longest branch as short as possible; start at the bottom of the branch and work up towards the root
search: O(N)
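In Python a priority queue is usually built on `heapq`, a binary min-heap, so the lowest key is always at the front (the keys/labels below are illustrative):

```python
import heapq

# Sketch: priority queue via heapq (min-heap: lowest key always at the front).
pq = []
heapq.heappush(pq, (3, "low"))
heapq.heappush(pq, (1, "urgent"))
heapq.heappush(pq, (2, "medium"))

# objects come out in key order, not insertion order
assert heapq.heappop(pq) == (1, "urgent")
assert heapq.heappop(pq) == (2, "medium")
```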
Heap
a tree in an array
when adding: choose shortest branch
restore the shape then restore the order
always remove object at the root
restore shape then restore order
Binary heap
O(log2 N) for add and remove one object
height is Θ(log2 N)
time limited by the height of the heap
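A sketch of the binary heap described on the last two cards, stored in an array (0-based, so the children of index i are 2i+1 and 2i+2; class and method names are illustrative). Add restores shape first (append at the next free position) then order (sift up); remove takes the root, restores shape with the last element, then sifts down:

```python
# Sketch of a binary min-heap in an array (children of i at 2i+1 and 2i+2).
class MinHeap:
    def __init__(self):
        self.a = []

    def add(self, key):
        self.a.append(key)            # restore shape: next free position
        i = len(self.a) - 1           # restore order: sift up towards the root
        while i > 0 and self.a[(i - 1) // 2] > self.a[i]:
            parent = (i - 1) // 2
            self.a[i], self.a[parent] = self.a[parent], self.a[i]
            i = parent

    def remove(self):
        root = self.a[0]              # always remove the object at the root
        last = self.a.pop()
        if self.a:
            self.a[0] = last          # restore shape with the last element
            i, n = 0, len(self.a)     # restore order: sift down
            while True:
                smallest = i
                for child in (2 * i + 1, 2 * i + 2):
                    if child < n and self.a[child] < self.a[smallest]:
                        smallest = child
                if smallest == i:
                    break
                self.a[i], self.a[smallest] = self.a[smallest], self.a[i]
                i = smallest
        return root
```

Both loops move along one branch, so the work is bounded by the height, Θ(log2 N).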
Set
unordered collection of objects each having a unique key
put (Add)
retrieve (Get)
by unique key
key is the id for the data; keys must be comparable, so < and = must be defined
Binary Search Tree
T.root is a pointer to the root node
at every node n in the tree..
n.left points to the root node of n’s left subtree
n.right points to the root node of n’s right subtree
can access key found at n, e.g. n.data.key
add: new key is always added as a leaf node - start at the root, move down, comparing against each key
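The add operation above can be sketched as follows (node fields match the card; duplicate handling is an assumption of this sketch):

```python
# Sketch of a BST: the new key is always added as a leaf.
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None     # root of the left subtree
        self.right = None    # root of the right subtree

def add(root, key):
    if root is None:                 # reached a leaf position: add here
        return Node(key)
    if key < root.key:               # compare against each key, move down
        root.left = add(root.left, key)
    elif key > root.key:
        root.right = add(root.right, key)
    return root                      # duplicate keys are ignored in this sketch

def in_order(root):
    # left subtree, node, right subtree: yields the keys in sorted order
    if root is None:
        return []
    return in_order(root.left) + [root.key] + in_order(root.right)
```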
Red-black tree
extension of binary search trees
RBTs are self-balancing BSTs
(BST search is O(N) in the worst case)
red-black tree, 5 properties
- all nodes (including NILs) are either red or black
- the root node is black
- every leaf (all NIL) is black
- both children of a red node are black
- all paths from a node to a descendant leaf contain the same number of black nodes
have a limited height: O(log2 N)
put is much more complicated
new node is always coloured red; recolouring moves the problem closer to the root
restoration procedure works up from the leaf towards the root
at each node it performs a constant, Θ(1), amount of work
get: O(log2 N)
put: O(log2 N) time
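The full put fixup is too long to show here, but the five properties can be checked directly. A sketch, assuming NIL leaves are represented as None and counted as black (names are illustrative): it verifies property 4 (both children of a red node are black) and property 5 (equal black counts on every root-to-leaf path) in one pass:

```python
# Sketch: check red-black properties 4 and 5; NIL leaves are None (black).
RED, BLACK = "red", "black"

class RBNode:
    def __init__(self, key, color, left=None, right=None):
        self.key, self.color, self.left, self.right = key, color, left, right

def black_height(node):
    # returns the black-height of the subtree, or -1 if a property is violated
    if node is None:
        return 1                       # NIL leaf counts as one black node
    if node.color == RED:
        for child in (node.left, node.right):
            if child is not None and child.color == RED:
                return -1              # property 4 violated: red node, red child
    lh, rh = black_height(node.left), black_height(node.right)
    if lh == -1 or rh == -1 or lh != rh:
        return -1                      # property 5 violated: unequal black counts
    return lh + (1 if node.color == BLACK else 0)

# a small valid tree: black 2 with red children 1 and 3
tree = RBNode(2, BLACK, RBNode(1, RED), RBNode(3, RED))
assert black_height(tree) == 2
```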
Hash table
index data by key (indirectly)
like an array with m positions
hash function h to turn key into index
deterministic: key k always has same result
an object with key k is stored at T[h(k)]
time taken by h depends only on k
general set of possible keys is large if not unlimited
limitations, hash table does not support:
in order iteration
next key/object
minimum key
maximum key
because objects are stored in random order (by design)
Perfect hashing
every key is mapped to a different index, i.e. no collisions
Uniform distribution (hashing)
if keys are not uniformly distributed, hashing can be very poor
Collisions
hashing: all objects whose key hashes to h(k) are placed into a linked list
table contains a pointer to the list
Chaining
resolve collisions by keeping a chain (linked list) of objects at each location
put: Θ(1) time
unaffected by uniformity of hashing
get: O(N) worst case (all elements in one chain)
expected time is Θ(N/m)
N/m is called the load factor
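A sketch of chaining, using Python lists as the chains (class and method names are illustrative; a fixed m is assumed, with no resizing):

```python
# Sketch of a hash table with chaining: m buckets, one chain per bucket.
class ChainedHashTable:
    def __init__(self, m=8):
        self.m = m
        self.buckets = [[] for _ in range(m)]

    def _index(self, key):
        return hash(key) % self.m               # deterministic: same key, same bucket

    def put(self, key, value):
        chain = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)         # replace existing key
                return
        chain.append((key, value))              # Theta(1) when the key is new

    def get(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:                        # O(chain length), expected Theta(N/m)
                return v
        return None
```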
SUHA
expected time for an unsuccessful search for key k in a hash table with m slots, containing N objects
expected length of chain: N/m
probability of needing to search chain at T[i] is 1/m
expected number of comparisons (including loop termination) is 1 + N/m
expected time complexity: Θ(N/m)
reasoning similar for successful get Θ(N/m)
if m is proportional to N, expected time for GET is amortized Θ(1)
design of table needs to ensure N/m is Θ(1)
as N increases, table needs to be resized (geometrically) - number of buckets is increased to keep N/m below some constant
all existing data has to be rehashed
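The geometric resizing and rehashing described above can be sketched as follows (the growth policy N/m ≤ 1 and the helper name are assumptions of this sketch):

```python
# Sketch: geometric resizing of a chained table; all existing data is rehashed
# into the new buckets, keeping the load factor N/m below a constant (here 1).
def rehash(buckets, new_m):
    new_buckets = [[] for _ in range(new_m)]
    for chain in buckets:                     # every stored object moves
        for key, value in chain:
            new_buckets[hash(key) % new_m].append((key, value))
    return new_buckets

m, n = 4, 0
buckets = [[] for _ in range(m)]
for key in range(20):
    if n + 1 > m:                             # keep N/m <= 1: double m and rehash
        m *= 2
        buckets = rehash(buckets, m)
    buckets[hash(key) % m].append((key, key * key))
    n += 1
```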
Probing
probe until find an empty position
simplest is linear probing: consecutive slots are probed beginning at h(k), wrapping round to h(k) − 1
Uniform Hashing: implies every permutation is possible
a hash function h produces uniform hashing for a table with m buckets if, for each possible key k, the probability that ⟨h(k, 0), . . . , h(k, m − 1)⟩ is some permutation p of ⟨0,…,m−1⟩ is the same for all such p
linear probing DOES NOT produce uniform hashing
assuming uniform hashing, the expected number of keys compared when inserting an object depends on the load factor N/m
each probe is to a random bucket, which is occupied with probability N/m
if m is proportional to N, expected time for Put and Get is amortized constant
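A sketch of open addressing with linear probing (class and method names are illustrative; deletion, which needs tombstones, is omitted):

```python
# Sketch of open addressing with linear probing (no deletion handling).
class ProbingHashTable:
    def __init__(self, m=8):
        self.m = m
        self.slots = [None] * m               # each slot holds one (key, value)

    def put(self, key, value):
        for i in range(self.m):
            j = (hash(key) + i) % self.m      # probe h(k), h(k)+1, ... wrapping round
            if self.slots[j] is None or self.slots[j][0] == key:
                self.slots[j] = (key, value)
                return
        raise RuntimeError("table full")      # a real table would resize first

    def get(self, key):
        for i in range(self.m):
            j = (hash(key) + i) % self.m
            if self.slots[j] is None:
                return None                   # empty slot: key cannot be further on
            if self.slots[j][0] == key:
                return self.slots[j][1]
        return None
```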
Simple Uniform Hashing Assumption
With reference to chaining
Given a hash table T with m buckets, using hash function h, the simple uniform hashing assumption (SUHA) states that each different key k is equally likely to hash into any of the buckets. So the probability that h(k) = i, for all 1 ≤ i ≤ m, is 1/m
most probable / expected outcome: same number in each bucket