HTat Flashcards
-1: Constant time growth
The normal amount of time an instruction takes under the RAM model
-Log n: logarithmic
Occurs in algorithms that transform a bigger problem into a smaller problem whose input size is a fraction of the original problem, common in searching and trees
-n: Linear
Algorithms that are forced to pass through all n elements of the input a constant number of times yield linear running times
n^k log n: polylogarithmic
Algorithms that break a problem into smaller parts, solve the smaller problems and combine them into a solution (k >= 1)
n^2: Quadratic
Subset of polynomial solutions. Relatively efficient for small or medium-scale problems; typical of algorithms that have to analyse all pairs of elements of the input
n^3 (and above): Cubic
Not very efficient but still polynomial. An example of an algorithm in this class is matrix multiplication
2^n: exponential
This is as bad as testing all possible answers to a problem; when algorithms fall into this category, designers look for approximation algorithms
Worst case of an algorithm
The worst-case efficiency of an algorithm is its efficiency for the worst-case input of size n, which is an input
(or inputs) of size n for which the algorithm runs the longest among all possible inputs of that size
Advantages of finding the worst case of an algorithm
-Bounds the running time by an upper bound
-Ensures that no matter the input, the running time can't be worse than CWorst(n)
Best case of an algorithm
The best-case efficiency of an algorithm is its efficiency for the best-case input of size n, which is an input
(or inputs) of size n for which the algorithm runs the fastest among all the inputs of that size
Advantages of finding best case
If the best case is not good enough the approach may need to be revisited
How can we measure the time efficiency of an algorithm
The standard approach is to identify the basic operation and count how many times it is executed.
As a rule of thumb, it is the most time-consuming operation in the innermost loop of the algorithm
The established framework is:
Count the number of times the algorithm’s basic operation is executed for inputs of size n
Why can’t we use a standard unit of time to measure the efficiency of an algorithm
Because the measured time is influenced by external factors such as:
Dependence on the speed of a particular computer
* Dependence on the quality of the program implementing the algorithm
* Dependence on the quality of the compiler used to generate the executable
* Difficulty of clocking the time precisely
What does RAM stand for in the RAM model of computation
Random access machine
Do we have to take constants into account when talking about growth functions
No, for example in a function that looks like 2x^3 + 10x^2 + 3 the only term that matters for growth is x^3 according to asymptotic notation
What is Asymptotic notation?
It provides the basic vocabulary for discussing the design and analysis of algorithms
Why is Asymptotic notation used
It’s ubiquitous because it identifies the key points for reasoning about algorithms:
-It’s coarse enough to suppress all the details that should be ignored
-It’s precise enough to make useful comparisons between different algorithms
Big O notation
A function f(n) is O(g(n)) if there exist positive constants c and n0 such that f(n) <= c*g(n) for all n >= n0; informally, g(n) is an asymptotic upper bound on the growth of f(n)
What is the formula to find the average case of an algorithm (here, sequential search)
(1 x p/n + 2 x p/n + .. + n x p/n) + n x (1-p)
Where p is the probability of a successful search, so p/n is the probability of the first match occurring in the ith position, for every i
Did you look over week 4 again?
Do it
look over lecture 3
<
look over all of week 3
<
Data Structures
Data structures are the building blocks of programming
-define how data is organised and allocated
Definition of ADT?
Abstract data type: a type defined by the operations it supports rather than by its implementation
Do data structures have operations
Yes they have operations to manipulate data
What must be true for a structure to be linear ?
-unique first element
-unique last element
-each component has a unique predecessor (bar the first)
-each component has a unique successor (bar the last)
What are the two methods for accessing data in linear data structures
Sequential access - elements must all be accessed sequentially
Direct access - any element can be accessed directly, no prior access required
Give advantages and disadvantages of using arrays
Advantages
-Normally pre-defined in most programming languages
-Elements in arrays can be directly accessed
Disadvantages
-Fixed size
Give the definition of a Bernoulli trial
An experiment or random process that has exactly two possible outcomes
What is a Bernoulli process?
A series of independent Bernoulli trials
What is a list?
A linear structure that only provides sequential access; it has:
-two special elements: head and tail
-a homogeneous structure
-easy insertion and deletion of nodes
What are advantages of using lists over arrays
-They don’t have a fixed size
-can be used as the basis of several other structures, e.g. queues and stacks
-Easier to insert and delete nodes
How does a linked list work?
It consists of a collection of records called nodes, each containing a member (link) pointing to the next node in the list (implicit ordering)
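The node-and-link idea above can be sketched as follows; the class and method names are illustrative, not from the notes:

```python
# A minimal singly linked list sketch: each node stores a value and a
# link (next) to the following node; the last node's link is None.
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

class LinkedList:
    def __init__(self):
        self.head = None

    def push_front(self, value):
        # Insertion at the head is O(1): no shifting, just relink.
        self.head = Node(value, self.head)

    def to_list(self):
        # Sequential access: follow the links starting from the head.
        out, node = [], self.head
        while node is not None:
            out.append(node.value)
            node = node.next
        return out
```

For example, pushing 3, 2, 1 to the front yields the list 1, 2, 3 when traversed from the head.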
What is a better alternative to lists and why?
Lists using dynamic allocation (e.g. linked lists)
-Because each node has a link member, we can store the data anywhere in memory without worrying about contiguity
Advantages of linked lists
-Fair use of memory
-Size of list doesn’t need to be declared in advance
-common operations (e.g. insertion, deletion) are cheaper
Disadvantages of linked lists?
-Algorithms are more complex, therefore harder to understand and debug
-Allocation and deallocation of memory space can cause some overhead
What do we have to consider when examining a node in a linked list
-The node is the first node
-The node is an interior node(any node not first or last)
-The node is the last node
What is a sentinel node?
-A placeholder node containing no actual data, designed to simplify operations on the linked list
-For example, if the list is being searched and a trailing sentinel node is reached, the algorithm knows it can stop searching
-If a sentinel node is added at the beginning and end, then every insertion can be treated as a general insertion, as there will never be a need to insert before the first or after the last node
What are Circular lists
A variation of a singly linked list wherein the last node points to the first node instead of to nothing
When are circular lists useful/used?
-They are useful in problems where we aren’t interested in what node is the first or the last
-Some problems are better described using circular lists e.g. Josephus election
-In this structure the head can point to any node and isn’t needed to keep nodes from being lost
-Algorithms using circular lists are simpler
What is the Josephus algorithm
An algorithm that makes use of circular lists; it navigates a circular list eliminating every kth person until only one remains
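The Josephus election can be sketched with a circular linked list; the minimal `Node` record here is an assumption, not code from the notes:

```python
# Josephus election sketch: build a circular list of n people, then
# repeatedly skip k-1 nodes and unlink the k-th until one remains.
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

def josephus(n, k):
    # Build a circular list holding 1..n.
    head = Node(1)
    tail = head
    for v in range(2, n + 1):
        tail.next = Node(v)
        tail = tail.next
    tail.next = head  # close the circle

    # Eliminate every k-th person until only one node points to itself.
    prev = tail
    while prev.next is not prev:
        for _ in range(k - 1):
            prev = prev.next
        prev.next = prev.next.next  # unlink the k-th node
    return prev.value
```

With 7 people and every 3rd eliminated, the survivor is person 4.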
How can an array be used to implement linked lists?
The standard way to do this involves:
-An array of nodes to simulate the memory
-An array of boolean to keep track of free space in the memory
What is a doubly linked list?
It’s similar to a linked list; however, in addition to the head, a pointer called the tail is also maintained.
Each node is linked to both the node before it and the node after it
One advantage and disadvantage of doubly linked lists
Operations are made easier to understand
More overhead due to increased number of pointers
What is a stack?
A restricted form of list
What are some properties of a stack
-Follows a Last in First out structure (LIFO)
-Can only refer to the top of the stack
-Only has two core operations: one to add something to the top of the stack (push) and one to remove from the top of the stack (pop)
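The LIFO behaviour can be sketched with a list-backed stack; the class name and helper methods are illustrative assumptions:

```python
# A stack sketch backed by a Python list: push and pop both work on
# the end of the list, so each operation is O(1).
class Stack:
    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)      # add to the top

    def pop(self):
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()      # remove from the top (LIFO)

    def is_empty(self):
        return not self._items
```

Pushing 1 then 2 and popping twice returns 2 first, then 1: last in, first out.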
Applications of stacks
-Express evaluation
-Runtime memory management
-Search evaluation
What is a queue
A restricted form of a list
What are the properties of a queue?
-Implements a first in first out methodology
-Has a reference at the front and end of the queue
-Most important operations are enqueue and dequeue
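The FIFO behaviour and the two key operations can be sketched as below; using `collections.deque` (rather than a plain list) is a design choice so both ends are O(1):

```python
from collections import deque

# A queue sketch: enqueue at the rear, dequeue from the front (FIFO).
class Queue:
    def __init__(self):
        self._items = deque()

    def enqueue(self, item):
        self._items.append(item)      # add at the rear

    def dequeue(self):
        return self._items.popleft()  # remove from the front

    def is_empty(self):
        return not self._items
```

Enqueuing 1, 2, 3 and then dequeuing returns them in the same order they arrived.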
What are some application of a queue?
-Job Scheduling
-printer spool
-multitasking operating systems (timesharing)
-Search problems(Knowing what part of memory hasn’t been explored yet)
What are some special purpose queues and stacks
-Double ended queue
-There are modifications that remove duplicates
What are the elementary sorting algorithms?
-Insertion sort
-Bubble sort
-Selection sort
When is a sorting method said to be stable?
It preserves the relative order of items with duplicate keys in the list. E.g. when sorting by name and then by age, with age as the primary sorting parameter, sorting by name first ensures items with the same age value stay in alphabetical order
Why is stability important?
-More efficient as keys aren’t swapped if they don’t need to be
-Easier to be parallelised
-Allows multi key sort to be easily implemented via multiple runs of the sorting algorithm
How does selection sort work?
Finds the index of the maximum element and puts it in its place in the list (highest position)
Properties of selection sort
-Simple algorithm
-Inefficient for large values of n
-Easily implemented on linked lists
What is the worst case of selection sort?
The main loop is executed n-1 times, where n is the size of the array:
the number of steps is given by n^2/2 - n/2, which is O(n^2)
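The selection-sort idea in the cards above (find the maximum, swap it into the highest position) can be sketched as:

```python
# Selection sort sketch: each pass finds the index of the maximum
# element in the unsorted prefix and swaps it into its final (highest)
# position. The outer loop runs n-1 times, giving O(n^2) comparisons.
def selection_sort(items):
    n = len(items)
    for last in range(n - 1, 0, -1):
        # Find the index of the maximum in items[0..last].
        max_i = 0
        for i in range(1, last + 1):
            if items[i] > items[max_i]:
                max_i = i
        # Put the maximum in its place at position `last`.
        items[max_i], items[last] = items[last], items[max_i]
    return items
```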
How does Insertion sort work?
-We take an element and insert it into the position where it belongs; some elements may have to be shifted to open up space
When is Insertion sort used
-Commonly used when we want to generate another list with the items sorted and keep the original list
-Useful as it can be done in place without requiring an additional list
What is the worst case of insertion sort?
The main loop is executed n-1 times therefore the inner loop is executed n-1 times. This gives an efficiency of O(n^2)
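The shift-to-open-space behaviour described above can be sketched as:

```python
# Insertion sort sketch: take each element in turn and shift larger
# elements one slot right until its position opens up. Worst case
# (reversed input) does 1 + 2 + ... + (n-1) = n(n-1)/2 shifts: O(n^2).
def insertion_sort(items):
    for i in range(1, len(items)):
        key = items[i]
        j = i - 1
        while j >= 0 and items[j] > key:
            items[j + 1] = items[j]  # shift right to open space
            j -= 1
        items[j + 1] = key           # insert where it belongs
    return items
```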
Properties of bubble sort?
-One of the simplest and slowest sorting algorithms
-In theory has the same efficiency as selection sort
-After each outer loop iteration one element is placed in its final location
Advantages of quicksort
Average case is O(nlogn)
Only uses a small stack as an auxiliary structure
Disadvantages of quicksort
Recursive and costly
Worst case O(n^2)
fragile
What is the basic idea of quicksort
1. Pick one item from the list and call it the pivot
2. Partition the items in the list using the pivot: reorganise the list so that all the elements that are smaller than the pivot are in the left partition and the right partition contains only elements that are greater than the pivot (equal elements can go in either partition)
3. Use recursion to sort the partitions
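The three steps above can be sketched as follows; picking the last element as the pivot is one simple choice among several:

```python
# Quicksort sketch: pick a pivot, partition around it, then recurse
# on the two partitions.
def quicksort(items, first=0, last=None):
    if last is None:
        last = len(items) - 1
    if first < last:
        p = partition(items, first, last)
        quicksort(items, first, p - 1)
        quicksort(items, p + 1, last)
    return items

def partition(items, first, last):
    pivot = items[last]
    i = first - 1
    for j in range(first, last):
        if items[j] <= pivot:          # smaller-or-equal goes left
            i += 1
            items[i], items[j] = items[j], items[i]
    items[i + 1], items[last] = items[last], items[i + 1]
    return i + 1                       # pivot is now in its final place
```

After each partition call, the pivot sits in its final position, everything to its left is <= it, and everything to its right is > it, matching the rearrangement conditions.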
In quicksort, what conditions must be true for the sublists to be considered rearranged
-The element chosen as the pivot must be in its final place in the list
-All the elements in list[first],…,list[i-1] are less than or equal to list[i]
- All the elements in list[i+1],…,list[last] are greater than list[i]
What is the worst case of quicksort
While the performance of quicksort depends on how balanced the recursion tree is the worst case of quicksort yields O(n^2)
Worst case of quicksort using a recurrence
T(n) = T(n-1) + O(n), which solves to O(n^2)
Best case of quicksort
-Depth of recursion is O(logn)
How can we avoid the worst case for quicksort
We can use the median of 3: choose the pivot as the median of the first, middle and last elements of the list, which makes a badly unbalanced partition much less likely
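Median-of-three pivot selection can be sketched as below; the function name is an illustrative assumption:

```python
# Median-of-three sketch: the pivot is the median of the first,
# middle and last elements, avoiding the already-sorted worst case.
def median_of_three(items, first, last):
    mid = (first + last) // 2
    a, b, c = items[first], items[mid], items[last]
    return sorted([a, b, c])[1]  # the middle of the three values
```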
Was Mergesort proposed by von Neumann
There is some evidence suggesting it was, but it is inconclusive, so there is no hard evidence
What is the difference between quicksort and mergesort?
Quicksort works on the concept of selection which divides a larger list into two smaller lists and mergesort works on the concept of merging which is joining two smaller lists together into one larger list
Advantages of mergesort
-Worst case is O(nlogn)
-Stable and Robust
-Easily works with linked lists
Drawbacks of mergesort
-May require an additional array up to the size of the original list
What efficiency does mergesort guarantee
Mergesort guarantees O(nlogn)
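The merging concept from the cards above can be sketched as; the slicing here is what creates the auxiliary lists mentioned in the drawbacks:

```python
# Mergesort sketch: split the list in half, sort each half
# recursively, then merge the two sorted halves.
def mergesort(items):
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    left = mergesort(items[:mid])
    right = mergesort(items[mid:])
    # Merge: repeatedly take the smaller front element. Using <= keeps
    # equal keys in their original order, which makes mergesort stable.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```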
What is a radix sort algorithm
Radixsort is used when the comparison keys we are using are more complicated.
It’s useful because in several applications considering the whole key is not necessary; for example, when comparing names alphabetically the whole name is often not relevant
What are the main ideas behind radixsort
-We could represent keys in binary and work with the individual bits
-We can consider the number in decimal base and work with the individual digits
-We can consider strings as sequences of characters and work with the individual characters
What is an MSD radix sort?
Stands for most significant digit radix sort, it will compare digits from left to right
-MSD sorts tend to be preferred because they are normally able to sort the list by examining just some of the digits
-more complex than LSD sorts
What is an LSD radix sort?
Least significant digit radix sort, will compare digits from right to left
-less complex than MSD sorts
How can we represent the keys in radix sort?
For smaller and simpler keys extraction of digits is enough.
For large and non-uniform keys it may be a better idea to use the binary representation
of the key.
What is the performance of radixsort?
For sorting n records with k digits, the running time of Radixsort is O(k*n); for constant k this is O(n)
-Clearly this performance depends on the sorting algorithm used to sort the elements based on digit k.
-For large values of n radixsort is comparable to O(nlogn) algorithms
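An LSD radix sort for non-negative decimal integers can be sketched as below; the per-digit pass uses stable bucketing, and the digit extraction by division is one of the representation choices mentioned above:

```python
# LSD radix sort sketch: sort by the least significant digit first,
# then the next, using one stable bucket pass per digit position.
# With k digit positions this is O(k*n).
def radix_sort(nums, base=10):
    if not nums:
        return nums
    digits = len(str(max(nums)))  # k = number of digit positions
    for d in range(digits):
        buckets = [[] for _ in range(base)]
        for x in nums:
            # Extract the d-th digit (0 = least significant).
            buckets[(x // base**d) % base].append(x)  # stable append
        nums = [x for bucket in buckets for x in bucket]
    return nums
```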
What is counting sort?
Counting sort is a sorting algorithm that works when we know the range of values in the array
What are the basic concepts behind counting sort?
For each element x count how many other elements are less than x
* This information tells the position of x in the sorted list
* If there are 20 elements less than x, then x is in position 21 in the list
What is the performance of counting sort
Counting sort runs in O(n + k), where k is the range of the keys; when k = O(n) this is O(n)
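The count-then-place idea above can be sketched as follows, assuming integer keys in a known range 0..k:

```python
# Counting sort sketch: knowing how many elements are <= x tells us
# x's final position in the sorted output.
def counting_sort(items, k):
    count = [0] * (k + 1)
    for x in items:
        count[x] += 1
    # Prefix sums: count[x] becomes the number of elements <= x,
    # i.e. one past the last slot that value x can occupy.
    for x in range(1, k + 1):
        count[x] += count[x - 1]
    out = [0] * len(items)
    for x in reversed(items):      # reversed keeps the sort stable
        count[x] -= 1
        out[count[x]] = x
    return out
```

E.g. if 20 elements are less than x, the prefix sums place x at position 21 (index 20), exactly as the card describes.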
Why might we want to avoid direct access structures?
We have to set the size in advance and as we have to overestimate the size we end up with a sparse structure
Why might we want to use a hash table?
When the number of keys to be stored is much smaller than the total universe of keys
What are some advantages of hash tables
-they have the advantage of direct access
-we can access the element directly even when the keys are non-trivial
What makes a good hash function?
It is one that satisfies the assumption of uniform hashing
This means that each key is equally likely to hash to any of the m slots in the table.
What are two common hash functions?
The division method: h(k) = k mod m, where m is the size of the hash table; good values of m are crucial
-For instance, if we want to store 4000 numbers and we don’t mind doing 4 probes, choose m to be a prime close to 1000
The multiplication method: h(k) = floor(m x (kA mod 1)), where A is a constant between 0 and 1
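Both hash functions can be sketched directly; the default A = (sqrt(5)-1)/2 is a commonly suggested constant, one choice among many:

```python
import math

# Division method: h(k) = k mod m, where m is the table size.
def h_division(k, m):
    return k % m

# Multiplication method: h(k) = floor(m * (k*A mod 1)),
# with A a constant between 0 and 1.
def h_multiplication(k, m, A=(math.sqrt(5) - 1) / 2):
    return int(m * ((k * A) % 1))
```

Both return a slot index in 0..m-1, so either can be plugged into a table of size m.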
What are two types of collision resolutions that we can use for hash tables?
Separate chaining
-each slot of the hash table is a pointer to a dynamic structure (e.g. a linked list)
Open Addressing
-When a collision occurs the data is stored in an alternative location in the hash table determined by which method is used.
-This is best used when we know the maximum inputs the hash table has to sustain
What are the different methods we can use to redirect data collisions with open addressing?
-Linear probing
-Quadratic probing
-random probing
-rehashing
Linear probing
Do a linear search of the table until you find an empty spot
-suffers from primary clustering
Quadratic probing
A variant of linear probing where the offset added to the hash result is the square of the probe number: h(key) + i^2
-suffers from secondary clustering, which is milder than primary clustering
Random probing
A variant of linear probing where the term added to the hash result is a random number: h(key) + random()
Rehashing
A series of hashing functions is defined, and if a collision occurs they are used in order until it is resolved
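The probe sequences above can be sketched as functions of the home slot and the probe number i (all taken mod m, the table size); the seeded random offset is a simplifying assumption, since a real implementation would derive a pseudo-random sequence from the key:

```python
import random

# Linear probing: slots home, home+1, home+2, ...
def linear_probe(home, i, m):
    return (home + i) % m

# Quadratic probing: slots home, home+1, home+4, home+9, ...
def quadratic_probe(home, i, m):
    return (home + i * i) % m

# Random probing: a reproducible pseudo-random offset per probe.
def random_probe(home, i, m, seed=0):
    rng = random.Random(seed + i)
    return (home + rng.randrange(m)) % m
```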
How does searching in a hash table work?
Searching is an important operation on hash tables:
-Hash the target
-Go to the slot given by the hash value of the target; if the target exists it must be in that slot
-Search the list in that slot using a linear search (assuming linked lists)
What is the worst case for searching a hash table
O(n)
What is the best case for searching a hash table
O(1)
What is the average case for searching a hash table
O(1)
Vertices in a graph are objects that have properties attached to them, while edges do not normally have properties attached. What is it called if they do?
A weighted graph
How can we represent a path of nodes in a graph?
A path from node A to node B in a graph is a list of nodes where successive nodes are connected in the graph, e.g. a path from C to F is normally represented as (c,a),(a,d),(d,f)
When is a graph considered connected?
When all nodes are connected to each other via a path
How does a non-connected graph work?
A non-connected graph is made up of smaller connected graphs called components
What is a simple path
A path in which no node is repeated
What is a cycle
A simple path with the same first and last node
What is a spanning tree?
A connected subgraph that contains all the nodes of the graph but has no cycles is called a spanning tree
How many edges can be in a graph
The number of edges in a graph can range from 0 (zero) to V(V-1)/2, where V is the number of vertices
What is a sparse graph
We call a graph sparse when E is much less than V^2
* In general we say that in a sparse graph E = O(V)
What is a dense graph
We call a graph dense when E is close to V^2
* Or we say that in a dense graph E = Θ(V^2)
What is a complete graph
We call a complete graph a graph in which E is exactly V(V-1)/2
* A complete graph with k nodes is also called a k-clique.
When is adjacency list representation best used?
Best used when representing sparse graphs
What is BFS
Breadth first search, a type of graph search
What are some properties of breadth first search
Similar idea to level order traversals for trees
The algorithm is very simple and is important for other graph algorithms
* for instance Dijkstra’s and Prim’s algorithms use ideas similar to BFS
The breadth-first search in graphs will make use of colors to identify nodes that have been visited
* White nodes are nodes not yet visited
* Grey nodes have been discovered but not visited
* Black nodes have been visited
At the end of the breadth first search, a tree is formed. This tree is called BFS tree.
How does breadth first search work?
After the starting node is discovered, all of its neighbours are added to a FIFO queue
The search moves to the next node in the queue and adds all of its newly discovered neighbours to the end of the queue
The search works through every node in this manner, so nodes closer to the start are always visited before nodes further away
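The queue-and-colour scheme from the cards above can be sketched as follows; representing the graph as an adjacency dict is an assumption:

```python
from collections import deque

# BFS sketch: white = not yet visited, grey = discovered but not
# visited, black = visited, matching the colour scheme in the notes.
def bfs(graph, start):
    colour = {node: "white" for node in graph}
    colour[start] = "grey"
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nb in graph[node]:
            if colour[nb] == "white":
                colour[nb] = "grey"   # discovered, queued
                queue.append(nb)
        colour[node] = "black"        # fully visited
    return order
```

Recording which node discovered each neighbour would yield the BFS tree mentioned above.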
How does a depth first search work?
Traverse left subtrees first and then traverse right subtrees
How can we use a depth first search on graphs
The algorithm is also very similar to the BFS algorithm
* By replacing a queue with a stack we get depth-first search
What is a connected, acyclic undirected graph considered?
A tree
What do BFS and DFS tell us about distances in spanning trees
BFS: Shortest path – distance from the starting vertex (unweighted graph)
DFS: Traverse through all possible options
Both produce spanning trees
What is a weighted graph
A graph with a cost attached to the edges
What is the minimum spanning tree of a graph
A collection of edges connecting all nodes such that the sum of the weights of the edges is minimum
How does Prim’s algorithm work?
Start with an arbitrary vertex v as the root of the tree.
Create a set of visited vertices and add v to it.
Create a set of unvisited vertices and add all the other vertices to it.
While the set of unvisited vertices is not empty:
a. Find the edge with the smallest weight that connects a visited vertex to an unvisited vertex.
b. Add the unvisited vertex to the set of visited vertices.
c. Add the selected edge to the minimum spanning tree.
Return the minimum spanning tree.
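The steps above can be sketched in Python; the adjacency format (`graph[u]` as a list of `(weight, v)` pairs) is an assumption, and a heap stands in for "find the edge with the smallest weight":

```python
import heapq

# Prim's algorithm sketch: grow the tree from a root, always taking
# the cheapest edge that crosses from visited to unvisited vertices.
def prim(graph, root):
    visited = {root}
    mst = []
    edges = [(w, root, v) for w, v in graph[root]]
    heapq.heapify(edges)
    while edges and len(visited) < len(graph):
        w, u, v = heapq.heappop(edges)   # smallest crossing edge
        if v in visited:
            continue                     # both ends already in the tree
        visited.add(v)
        mst.append((u, v, w))
        for w2, v2 in graph[v]:
            if v2 not in visited:
                heapq.heappush(edges, (w2, v, v2))
    return mst
```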
what does mst stand for?
Minimum spanning tree
What is a priority queue
A queue where the next item to be removed is the item with the highest priority (in the case of Dijkstra’s that would be the lowest distance)
How does Dijkstras algorithm work?
A distance array is created to keep track of how far each vertex is from the start, and a visited array is created to keep track of which vertices have been visited
-The starting vertex is selected
-For every neighbour of the selected vertex, calculate the distance from the starting vertex and add it to a priority queue
-Select the vertex with the lowest priority in the queue and visit its neighbours, calculating their distances from the starting vertex and adding them to the queue
-If a vertex is already in the queue, only replace its entry if the newly calculated value is smaller
-Repeat until all vertices have been visited
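The steps above can be sketched with a heap as the priority queue; the adjacency format is an assumption, and skipping stale queue entries on pop is a common simplification of the "replace if smaller" step:

```python
import heapq

# Dijkstra sketch: repeatedly visit the unvisited vertex with the
# smallest known distance and relax its outgoing edges.
def dijkstra(graph, start):
    # graph[u] = [(weight, v), ...]; returns shortest distances from start.
    dist = {start: 0}
    pq = [(0, start)]
    visited = set()
    while pq:
        d, u = heapq.heappop(pq)      # lowest-distance vertex next
        if u in visited:
            continue                  # stale entry, skip it
        visited.add(u)
        for w, v in graph[u]:
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist
```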
What is a directed graph?
More general form of graph
-Edges of the graph now have directions which means traversal is more difficult
-Directed graphs can represent undirected graphs but not vice versa
What is a weakly connected directed graph
There is a path (direction is irrelevant) between every pair of vertices
What is a strongly connected directed graph?
There must be a path (directions respected) between every pair of nodes
Does depth first search account for directed graphs?
Not in a single pass; the algorithm must be restarted at a different node if there are still undiscovered nodes when it finishes