Data Structures Flashcards
Data Type
Defines potential set of values a variable can hold and the operations that can be performed on it
Aspects of Data types
- Range of values
- Size of memory required
- How binary data is interpreted
- Operations that can be performed on it
Abstract Data Type (ADT)
Conceptual model of how data is to be stored and the operations that can be carried out on it
Data Structure
- (Partial) implementation of user-defined types (ADTs)
- Each item is known as a member; members can be of different data types
Purpose of data structures
To combine multiple data items under a single identifier
Key ADTs
- Stacks
- Queues
- Graphs
- Trees
- Hash tables
Static data structures
- Max size fixed at compilation
- Data stored in contiguous memory locations
Advantages of static data structures
- Fast to access
- (Sometimes) require less memory
Disadvantages of static data structures
- Inflexible
- (Sometimes) wastes space
- Operations like insertion and deletion are very slow
Dynamic data structures
- Can change size at run-time
- Data not (usually) stored contiguously
- Each item points to the next item in the structure
Advantages of dynamic data structures
- More flexible
- No wasted space or limits
- (Sometimes) require less memory
Disadvantages of dynamic data structures
- (Sometimes) require more memory (have to store pointers)
- Slower to access
Queue
Data structure comprised of sequence of items and front and rear pointers
Order of item insertion in Queue
Items added at rear and retrieved from front in First-in-First-Out (FIFO)
Key operations of Queue
- Enqueue
- Dequeue
- isEmpty
- isFull
Enqueue
If not isFull() then add new item to rear
Dequeue
If not isEmpty() then remove front item
isFull
Test whether queue is full
isEmpty
Test whether queue is empty
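Example: linear queue operations (Python)
A minimal, illustrative sketch of the four operations on an array-based linear queue; the variable names and the capacity of 5 are assumptions, not part of the cards.
maxSize = 5
items = [None] * maxSize
front = 0        # index of the next item to dequeue
rear = -1        # index of the last item enqueued

def isFull():
    return rear == maxSize - 1    # linear queue: full once rear reaches the end

def isEmpty():
    return front > rear

def enqueue(value):
    global rear
    if not isFull():
        rear += 1
        items[rear] = value

def dequeue():
    global front
    if not isEmpty():
        value = items[front]
        front += 1
        return value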
Uses of queues
- Queues of print jobs
- Managing input buffers
- Handling multiple file downloads
- Priority-based resource allocation
- …
Queue implementations
- Linear queues
- Circular queues
- Priority queues
Linear queues
- Stores items in ‘line’
- Front and rear pointers move down memory
Advantages of linear queues
- Simple to program
- Limited capacity useful when representing finite resources
Disadvantages of linear queues
Wasted capacity (as items are dequeued, the whole queue drifts down the array, leaving unusable space at the front)
Solving waste problem in linear queue
- Shuffling every item one index forward when first item deleted
- However, it’s inefficient (O(n))
Circular queues
- ‘Wraps’ front and rear pointers using MOD function
- Static, array-based
Advantages of circular queues
- Avoids wasting space
- No need for shuffling
Disadvantage of circular queue
More complex to program
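Example: circular queue (Python)
An illustrative sketch of the MOD-based wrapping; the count variable used to detect full/empty is an assumption (other schemes exist).
maxSize = 5
items = [None] * maxSize
front = 0
rear = -1
count = 0        # number of items currently stored

def enqueue(value):
    global rear, count
    if count < maxSize:                # not full
        rear = (rear + 1) % maxSize    # wrap the rear pointer
        items[rear] = value
        count += 1

def dequeue():
    global front, count
    if count > 0:                      # not empty
        value = items[front]
        front = (front + 1) % maxSize  # wrap the front pointer
        count -= 1
        return value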
Priority queue
Items enqueued (or dequeued) based on priority
Priority queue implementations
- Can be implemented statically using a 2D array
- More efficient to implement dynamically, with each node storing value and priority
Disadvantages of static priority queue
- Inherent drawbacks of static (e.g. space limitation)
- Enqueue or dequeue is O(n), since a linear search is required to find the highest-priority item (or to keep items in priority order)
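Example: dynamic priority queue (Python)
An illustrative sketch storing (value, priority) pairs in a list; dequeue performs the O(n) linear search for the highest priority described above. Names are assumptions.
queue = []

def enqueue(value, priority):
    queue.append((value, priority))

def dequeue():
    if queue:
        highest = max(range(len(queue)), key=lambda i: queue[i][1])
        return queue.pop(highest)[0]    # return the highest-priority value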
Stack
- Comprised of sequence of items and stack pointer that points to top of stack
- Last-in-First-Out (LIFO) - items added to top and removed from top
Stack structure
int maxSize;             // maximum number of items the stack can hold
int items[maxSize];      // array holding the stack's items
int stackPointer = -1;   // index of the top item; -1 means the stack is empty
Stack operations
- Push
- Pop
- Peek
- isFull
- isEmpty
Push
If not isFull, push value to top of stack
Pop
If not isEmpty, returns value at top of stack and removes it
Peek
Returns item at top of stack (doesn’t remove it)
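Example: stack operations (Python)
An illustrative sketch of push, pop and peek on the array-plus-stack-pointer structure above; the capacity of 10 is an assumption.
maxSize = 10
items = [None] * maxSize
stackPointer = -1                     # -1 means the stack is empty

def push(value):
    global stackPointer
    if stackPointer < maxSize - 1:    # not isFull
        stackPointer += 1
        items[stackPointer] = value

def pop():
    global stackPointer
    if stackPointer > -1:             # not isEmpty
        value = items[stackPointer]
        stackPointer -= 1
        return value

def peek():
    if stackPointer > -1:
        return items[stackPointer]    # top item is not removed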
Uses of stack
- Reversing sequence of items
- Storing stack frames
- Storing processor state when handling interrupt
- Evaluating mathematical expressions written in Reverse Polish (postfix) notation
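Example: evaluating Reverse Polish notation with a stack (Python)
An illustrative sketch of the last use listed above; the space-separated token format is an assumption.
def evaluate_rpn(tokens):
    stack = []
    for token in tokens:
        if token in "+-*/":
            b = stack.pop()               # right operand is popped first
            a = stack.pop()
            if token == "+": stack.append(a + b)
            elif token == "-": stack.append(a - b)
            elif token == "*": stack.append(a * b)
            else: stack.append(a / b)
        else:
            stack.append(float(token))    # operands are pushed
    return stack.pop()

# evaluate_rpn("3 4 + 2 *".split()) returns 14.0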
Stack overflow
Attempt to push item when stack is full
Stack underflow
Attempt to pop item when stack is empty
Call stack
Area of memory used to store data used by running processes and subroutines
Stack frame
- Return addresses (keeps reference to parent subroutine)
- Parameter values (keeps parameter values of different subroutines separate)
- Local variables (isolates these from rest of system)
Call stack pushing and popping
- Stack frame is pushed with each new subroutine call
- Stack frame popped when process completes
Linked list
Dynamic data structure comprised of nodes, each of which points to the next node in the list
Aims of linked lists
- Convenience of storing multiple values per element
- Flexibility to grow
- Quick at inserting and deleting items (no shuffling)
Disadvantages of linked lists
- Slower to access items (traversal)
- Takes up more memory (each node stores its value plus a pointer, e.g. 64-bit)
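Example: linked list node and traversal (Python)
An illustrative sketch of a singly linked list; the Node class name and values are assumptions.
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None          # pointer to the next node; None marks the end

head = Node("A")
head.next = Node("B")
head.next.next = Node("C")

node = head
while node is not None:           # traversal is O(n), the slower access noted above
    print(node.value)
    node = node.next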
Graphs
Set of nodes connected by edges
Directed vs undirected graph
- In directed, edges can be unidirectional
- In undirected, all edges are bidirectional
Weighted graph
Edges have a cost of traversing (e.g. time)
Purpose of graphs
Modelling complex relationships between interrelated entities
Uses of graphs
- Human (social) networks
- Transport networks
- Computer networks
- Planning telecommunication networks
- Project management
Typical graph operations
- Adding / removing node
- Adding / removing edge
- Test edge existence
- Get edges / adjacent nodes
- Get weight of edge
Implementations of graphs
- Adjacency matrix (static)
- Adjacency list (dynamic)
Adjacency matrix
Edges between nodes represented by ‘1’ at intersection in matrix
E.g. if edges[0][1] == 1, then there is an edge between node 0 and node 1
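Example: adjacency matrix (Python)
An illustrative 3-node undirected graph stored as a 2D list; the node numbering is an assumption.
edges = [
    [0, 1, 0],   # edges from node 0
    [1, 0, 1],   # edges from node 1
    [0, 1, 0],   # edges from node 2
]
edge_exists = edges[0][1] == 1    # O(1) edge test: True, nodes 0 and 1 are connected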
Representing direction in adjacency matrix
- From and to index must be predefined and consistent throughout
- Unidirection = edges[0][1] == 1, but edges[1][0] == 0
Representing weights in adjacency matrix
Store weight at intersection or NULL for no edge
Size of adjacency matrix
N^2 (where N = no. nodes)
Advantages of adjacency matrix
- Edges can be quickly identified in O(1) time
- Optimised for dense graphs (many edges that need updating) due to O(1) look-up
- Simple implementation using 2D array
Disadvantages of adjacency matrix
- More memory required (N^2 cells are stored, even where no edges exist)
- Less efficient insertion and deletion (limited space or shuffling -> O(n))
- If sparse graph, matrix is sparsely populated -> Wasted space
Adjacency list
Array of nodes (representing every node) pointing to list of adjacent nodes
Size of adjacency list
N + E (where N = no. nodes & E = no. edges)
Implementing adjacency list
- Python - dictionary
- C - linked list and pointers
Representing direction in adjacency list
edges[A] contains B, but edges[B] does not contain A (a one-way edge from A to B)
Representing weight in adjacency list
Each node in linked list stores weight
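Example: adjacency list (Python)
An illustrative weighted, directed adjacency list using a dictionary of dictionaries; the node names and weights are assumptions.
edges = {
    "A": {"B": 5},            # edge A -> B with weight 5
    "B": {"A": 5, "C": 2},    # edges B -> A and B -> C
    "C": {},                  # C has no outgoing edges
}
has_edge = "B" in edges["A"]      # True
weight = edges["B"]["C"]          # 2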
Advantages of adjacency list
- Requires less memory (data only stored where edges exist)
- Easier and quicker to insert/delete nodes (no shuffling)
- Optimised for sparse graphs (fewer edges and frequent adding and deletion of nodes)
Disadvantages of adjacency list
- Slower to test presence of edge (traversal = O(n))
- Harder to implement
- Less optimised for dense graph (more edges → more nodes in lists → slower traversal and more memory)
Graph traversal methods
- Depth-first
- Breadth-first
Depth-first
- Traverse one whole route before exploring breadth
- Recursively visits adjacent nodes from a given starting point until there are no adjacent nodes left, at which point the function returns
- Uses a stack (automatically implemented via call stack) to maintain LIFO order
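Example: depth-first traversal (Python)
An illustrative recursive sketch over an adjacency-list graph; the print call stands in for 'visiting' a node.
def dfs(graph, node, visited=None):
    if visited is None:
        visited = set()
    visited.add(node)
    print(node)                               # visit the node
    for neighbour in graph[node]:
        if neighbour not in visited:
            dfs(graph, neighbour, visited)    # recursion uses the call stack (LIFO)
    return visited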
Uses of depth-first traversal
- Solution to maze (when modelled as graph)
- Find valid path between nodes (not shortest path)
- Determining order of dependent processes (e.g. module compilation)
Breadth-first
- Visits all adjacent nodes before exploring depth
- Uses queue to maintain FIFO order of visiting
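Example: breadth-first traversal (Python)
An illustrative sketch using a queue (FIFO) over an adjacency-list graph; the print call stands in for 'visiting' a node.
from collections import deque

def bfs(graph, start):
    visited = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()                # dequeue from the front
        print(node)                           # visit the node
        for neighbour in graph[node]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)       # enqueue at the rear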
Uses of breadth-first traversal
- Identifying friends in social networks
- Finding immediately connected devices and networks
- Crawling web pages to build index of linked pages
Tree
A connected, undirected graph with no cycles
Purposes of trees
- Manipulating and storing hierarchical data (e.g. folder structures)
- Making information easy to search
- Manipulating sorted lists of data
Uses of trees
- Processing syntax
- Huffman trees
- Implementing probability and decision trees
Root
Node that has no incoming edges
Child
Node that has an incoming edge
Parent
Node that connects to its children nodes via outgoing edges
Subtree
Set of nodes and edges comprised of a parent and all its descendants
Leaf node
Node that has no children
Rooted tree
Has one node dedicated as the root
Binary tree
Rooted tree in which each node has a maximum of two children
Binary search tree
Holds items in such a way that the tree can be searched quickly, in O(log n) time (for a balanced tree)
Tree operations
- Determining whether node is root
- Getting list of children nodes
- Adding a child node
- …
Static implementations of binary tree
- Associative arrays
- Dictionary
- Adjacency matrix
Associative array implementation of binary tree
- Uses 3 1D arrays
- Nodes[] - stores data values in each node
- Left[] - stores index of node to left of node represented by index number
- Right[] - stores index of node to right of node represented by index number
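Example: associative array representation (Python)
An illustrative three-node tree (root F with children B and G); using -1 to mean 'no child' is an assumption.
nodes = ["F", "B", "G"]    # node payloads
left  = [1, -1, -1]        # index of each node's left child (-1 = none)
right = [2, -1, -1]        # index of each node's right child (-1 = none)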
Adjacency matrix
Same as graph
Dictionary implementation of binary tree
- Key = Value stored in node (e.g. name)
- Each key maps to a tuple holding the keys of its left and right child nodes
Dynamic implementation of Binary tree
- Doubly linked list of nodes
- Each node stores a payload value, a left-child pointer and a right-child pointer
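Example: dynamic binary search tree (Python)
An illustrative node class with left and right pointers, plus a recursive insert that keeps the BST ordering; names and values are assumptions.
class TreeNode:
    def __init__(self, value):
        self.value = value
        self.left = None      # pointer to left child
        self.right = None     # pointer to right child

def insert(node, value):
    if node is None:
        return TreeNode(value)
    if value < node.value:
        node.left = insert(node.left, value)      # smaller values go left
    else:
        node.right = insert(node.right, value)    # larger (or equal) values go right
    return node

root = None
for v in [5, 3, 8, 1]:
    root = insert(root, v)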
Advantages of dynamic binary tree
- Quicker insertion and deletion
- Maintains O(log n) search time (even with traversal requirement)
Tree traversal algorithms
- Pre-order
- In-order
- Post-order
Pre-order traversal
- Visit current node
- Traverse left subtree (recurse call function on left)
- Traverse right subtree (recurse call function on right)
In-order traversal
- Traverse left subtree
- Visit current node
- Traverse right subtree
Post-order traversal
- Traverse left subtree
- Traverse right subtree
- Visit current node
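Example: the three traversals (Python)
An illustrative sketch assuming nodes with value, left and right attributes (as in the dynamic implementation above); print stands in for 'visit'.
def pre_order(node):
    if node is not None:
        print(node.value)         # visit, then left, then right
        pre_order(node.left)
        pre_order(node.right)

def in_order(node):
    if node is not None:
        in_order(node.left)       # left, then visit, then right
        print(node.value)
        in_order(node.right)

def post_order(node):
    if node is not None:
        post_order(node.left)     # left, then right, then visit
        post_order(node.right)
        print(node.value)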
Hand-tracing tree traversal algorithms
- Draw outline around tree, starting from left of root and ending at right of root
- Pre-order - place dot to left of each node
- In-order - place dot beneath each node
- Post-order - place dot to right of each node
- When outline passes node, write/output its value
Uses of pre-order traversal
- Copying a tree
- Evaluating mathematical expression tree using prefix notation
Uses of in-order traversal
Outputting contents of binary tree in ascending order
Uses of post-order traversal
- Converting mathematical expressions from infix to postfix
- Producing postfix notation from expression tree
- Emptying a tree
- Completing dependent processes in order (e.g. recursive functions with multiple recursive calls)
Key-value pair
Association of a key with a value
Static implementations of key-value structure
- Multidimensional array
- 2 (or more) associative arrays to store key-value pairs
Limitations of static key-value implementations
- Searching for key requires linear search (O(n) time complexity)
- If improved via binary search, requires costly insertion/deletion to maintain order (worse for associative arrays)
- Fixed size (wasted/limited space)
Hash table
ADT whose purpose is to provide an efficient means of accessing items from a large, unordered data set
Inserting data into hash table
- Value processed by hashing algorithm which returns hash value
- Data then placed in index specified by hash value
Hashing algorithm
Always produces the same output for a given input, such that the output is practically unique to that input
Accessing data in hash table
Get data’s index by running hashing algorithm over input data again
AQA definition of hash table
Data structure that stores key-value pairs based on an index calculated from a hashing algorithm
AQA component’s of hash table
- An array or multidimensional array to hold data
- Data comprised of keys paired with values
- Hashing algorithm to determine index of data
Advantages of hash tables
- O(1) time complexity
- Determines item’s existence without linear search
Collision
When a hashing algorithm generates same hash value for two or more keys
Collision handling methods
- Rehashing (bad)
- Linear probing (worse)
- Chaining (good)
Rehashing
Apply additional algorithm to original key or conflicting hash value to generate new hash value
Limitations of rehashing
- Increases sparsity of array → wasted space
- Knock-on effect - further collisions may occur, meaning more rehashing is needed
- Additional process → Slower insertion and searching
Linear probing
Looks for next available index
Limitations of linear probing
- No known association between key and index
- Increases potential for further collisions (knock-on)
- Promotes clustering (indices not evenly distributed)
- Items whose hash value points to an index already filled by a shunted item must themselves be moved elsewhere
- Worst-case, linear search across whole table required
Chaining
- Store linked list at each location in hash table
- If collision, simply link item to end of list
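Example: hash table with chaining (Python)
An illustrative sketch where each bucket holds a list of (key, value) pairs; the table size and the use of Python's built-in hash() are assumptions.
table_size = 8
table = [[] for _ in range(table_size)]

def insert(key, value):
    index = hash(key) % table_size        # hashing algorithm determines the index
    table[index].append((key, value))     # a collision simply extends the chain

def lookup(key):
    index = hash(key) % table_size
    for k, v in table[index]:             # short linear search within one bucket
        if k == key:
            return v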
Limitations of chaining
Requires some linear searching (however not too much depending on spread of items across buckets)
Benefits of chaining
- Can keep adding items without clustering or increasing chance of collisions
- Retains most efficiency benefits of hash tables
Writing good hashing algorithms
- Can handle text input
- Same hash value for a given key
- Evenly distributed hash values
- Use most of input data
- Use computationally fast operations (e.g. integer calculations)
- Handle collisions efficiently
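Example: a simple string hashing algorithm (Python)
An illustrative sketch only, not a recommended algorithm: it uses every character, fast integer operations and MOD to map into the table's index range.
def my_hash(key, table_size):
    total = 0
    for char in key:
        total = total * 31 + ord(char)    # use most of the input data
    return total % table_size             # map into the table's index range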
Uses of hash tables
- Dictionary data structures (store key-value pairs)
- Databases (for quick storage and retrieval)
- Cache memory (determine memory address quickly)
- Operating systems (store location of apps and file for quicker access)
Dictionaries
Data structures that store collections of key-value pairs where the value is accessed via its associated key
Dictionary operations
- Retrieving value from given key
- Inserting values with a new key
- Updating values of a given key
- Removing a key-value pair
- Testing if a key exists in a dictionary
Uses of dictionaries
Wherever data can be stored in a key-value relationship
Implementations of dictionaries
- Associative arrays
- Multi-dimensional array
- Hash table
- Binary search tree
Associative array implementation of dictionary
Pair of 1D arrays storing key and value
Multi-dimensional array implementation of dictionary
Key and paired value stored in same master index
Limitations of array implementations of dictionary
- Fixed size (wasted or limited space)
- Linear search required for access (O(n) complexity)
Hash table implementation of dictionary
Most directly echoes mapping concept
Limitations of hash table dictionary implementation
Applying the hashing algorithm is relatively 'heavy' (computationally expensive) when there are only a few key-value pairs
Binary search tree implementation of dictionary
Each node contains both key and paired value (sorted based on key’s value)
Benefits of binary search tree implementation of dictionary
- O(log n) time complexity for searching
- Easy insertion and deletion
- Can grow or shrink as needed
- No need for heavy hashing algorithm
Vector
- Quantity with magnitude and direction
- Quantity consisting of multiple values
Representations of vectors
- List (1D array)
- Arrows on a coordinate plane
- As functions (e.g. f : S → R, where f(x) = v_0 if x = 0, v_1 if x = 1, …)
- As dictionaries (useful if viewed as function due to explicit mapping)
Vector addition
a + b = [a_x + b_x, a_y + b_y]
Scalar-vector multiplication
ka = [ka_x, ka_y]
Dot (scalar) product
a ⋅ b = a_x × b_x + a_y × b_y
Angle between vectors
a ⋅ b = |a||b|cosθ
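Example: vector operations (Python)
An illustrative sketch for 2D vectors stored as lists; the function names are assumptions.
import math

def add(a, b):
    return [a[0] + b[0], a[1] + b[1]]

def scalar_multiply(k, a):
    return [k * a[0], k * a[1]]

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

def angle_between(a, b):
    # rearranged from a . b = |a||b|cos(theta)
    return math.acos(dot(a, b) / (math.hypot(*a) * math.hypot(*b)))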
Convex combination
- A weighted sum of two vectors whose result lies within their convex hull (on the line segment between them)
- D = αv + βu, where α, β ≥ 0 and α + β = 1