Data structures Flashcards
List
ordered collection of nodes/items/elements
key property: ordering
operations:
push: adds element to end of list
pop: removes last element of the list
shift: removes first element of the list
unshift: adds an element to the front of the list
insert: adds an element at a given position
remove: removes the element at a given position
iterate: returns the items in order
also: sorting, searching, copying, joining, splitting
Stack
LIFO (last-in first-out) list
objects added (pushed) and removed (popped) from the same end of the list
use array or linked-list
linked list: add and remove in constant time
array: use int sp to track index at top of stack
resizing array: increase the capacity of the stack if it is full (doubling) / decrease the capacity if it is too big (quarter)
Amortisation / Performance of push (stack)
geometric series:
T (N ) = Nc + (4c + 8c + · · · + (N − 1)c/2 + (N − 1)c )
first Nc is the cost of writing N elements
rest is for copying to new arrays
time for copying is 2Nc - 6c (geometric series)
T(N) = 3Nc - 6c for this N
T(N) ≤ 3Nc - 6c for all N, so
T(N) = O(N) for N pushes
a push takes O(N)/N = Θ(1) time, on average
push time is AMORTISED Θ(1) (NOT same as Θ(1))
Amortisation
amortized analysis considers a sequence of operations
cost of expensive operations is ‘amortized’ across the sequence…
can never “be in debt”
pick a value to ‘pay’ per operation - show that this value covers all costs (never in debt)
Resizable array
compared with normal bounded-size array:
average insert time Θ(1) for both
actual time for inserting objects is higher (3c v c)
some very expensive inserts as N increases
keep benefits over linked list: random access, easy implementation
Queue
a list
next object removed
- earliest on added (FIFO)
- one with highest priority (Priority Queue)
does not support indexed access
Priority Queue
objects have keys
highest/lowest key is always at front
does not support indexed access
key property: maintain order within each branch
highest or lowest key at the root
add: O(N)
choose a short branch to keep longest branch as short as possible; start at the bottom and find the root
search: O(N)
Heap
a tree in an array
when adding: choose shortest branch
restore the shape then restore the order
always remove object at the root
restore shape then restore order
Binary heap
O(log2 N) for add and remove one object
height is Θ(log2 N)
time limited by the height of the heap
Set
unordered collection of objects each having a unique key
put (Add)
retrieve (Get)
by unique key
key is id for the data; need to compare keys, so must define < and =
Binary Search Tree
T.root is a pointer to the root node
at every node n in the tree..
n.left points to the root node of n’s left subtree
n.right points to the root node of n’s right subtree
can access key found at n, e.g. n.data.key
add: new key is always added as a leaf node - start at the root, move down, comparing against each key
Red-black tree
extension of binary search trees
RBT’s are self-balancing BST’s
(BST search O(N) in worst case
red-black tree, 5 properties
- all nodes (including NILs) are either red or black
- the root node is black
- every leaf (all NIL) is black
- both children of a red node are black
- all paths from a node to a descendant leaf contain the same number of black nodes
have a limited height: O(log2 N)
put is much more complicated
new node always colored red, recolor and problem will move closer to the root
restoration procedure works up from leaf towards root
at each node performs constant theta action
get: O(log2 N)
put: O(log2 N) time
Hash table
index data by key (indirectly)
like an array with m positions
hash function h to turn key into index
deterministic: key k always has same result
an object with key k is stored at T[h(k)]
time taken by h depends only on k
general set of possible keys is large if not unlimited
limitations, hash table does not support:
in order iteration
next key/object
minimum key
maximum key
because objects are stored in random order (by design)
Perfect hashing
every key is mapped to a different value ie no collisions
Uniform distribution (hashing)
if keys are not uniformly distributed, hashing can be very poor