Final Flashcards
hyperparameter and examples
a parameter whose value is used to control the learning process
ex) batch size, number of epochs
parameter grid
specifies the search space: the set of hyperparameter combinations to try during tuning
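A minimal sketch of a parameter grid with scikit-learn's GridSearchCV (the SVC model and the specific values are illustrative choices, not from the cards):

    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    # Each key is a hyperparameter; the grid is every combination of the values.
    param_grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}
    search = GridSearchCV(SVC(), param_grid, cv=5)  # searches all 3 x 2 = 6 combinations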
probabilistic graphical models
graphical representations of probability distributions in which variables can depend on other variables (nodes are variables, edges are dependencies)
what are the benefits of graphical models?
learning dependencies, visualizing a probability model, graphical manipulations over latent variables, obtaining insights (like conditional independence)
conditional independence
2 events A and B are conditionally independent given a 3rd event C if, once C is known to have occurred, the occurrence of A and the occurrence of B are independent events: P(A ∩ B | C) = P(A | C) · P(B | C)
How many types of probabilistic graphical models are there and what are they?
2 types: Bayesian Networks, Markov Networks
What is the difference between Bayesian Networks and Markov Networks?
Bayesian Networks have directed graphs and Markov Networks have undirected graphs
Bayesian network
directed edges between nodes that describe conditional dependencies
ex) sprinkler, rain, grass wet
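A minimal Python sketch of the sprinkler/rain/grass-wet network, with made-up conditional probability table (CPT) values, showing how the chain rule gives the joint probability:

    # Made-up CPT values for the classic example.
    P_R = {True: 0.2, False: 0.8}                # P(Rain)
    P_S_given_R = {True: 0.01, False: 0.40}      # P(Sprinkler on | Rain)
    P_G_given_SR = {(True, True): 0.99, (True, False): 0.90,
                    (False, True): 0.80, (False, False): 0.00}  # P(Grass wet | Sprinkler, Rain)

    def joint(g, s, r):
        # Chain rule: P(G, S, R) = P(R) * P(S | R) * P(G | S, R)
        p_s = P_S_given_R[r] if s else 1 - P_S_given_R[r]
        p_g = P_G_given_SR[(s, r)] if g else 1 - P_G_given_SR[(s, r)]
        return P_R[r] * p_s * p_g

    print(joint(True, False, True))  # P(grass wet, sprinkler off, rain) = 0.2 * 0.99 * 0.80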
joint probability
Probability of 2 or more events happening at the same time. This uses the product/chain rule
ex) Probability that a card drawn is red and 4
marginal probability
probability of an event irrespective of the outcome of another variable (an unconditional probability). This is the probability of a single event, and it uses the sum rule: sum the joint probability over the outcomes of the other variable.
ex) Probability that a card drawn is red
conditional probability
probability of one event given that one or more other events have occurred
ex) given that we drew a red card, what is the probability that the red card has a 4
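A small Python sketch that checks all three card probabilities by enumerating a standard 52-card deck:

    from itertools import product

    # Standard deck: 13 ranks x 4 suits; hearts and diamonds are red.
    deck = list(product(range(1, 14), ['hearts', 'diamonds', 'clubs', 'spades']))
    red = [c for c in deck if c[1] in ('hearts', 'diamonds')]
    red_four = [c for c in red if c[0] == 4]

    print(len(red_four) / len(deck))  # joint:       P(red and 4) = 2/52
    print(len(red) / len(deck))       # marginal:    P(red)       = 26/52
    print(len(red_four) / len(red))   # conditional: P(4 | red)   = 2/26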
Bayesian Networks
directed acyclic graph (a graph with no cycles) that models dependencies between the variables of the data set. Vertices are variables and edges represent conditional dependencies (each node stores a conditional probability distribution given its parents). This lets us capture variable dependencies within the data that we can’t capture with linear and logistic regression. Bayesian networks use Bayesian inference.
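Continuing the sprinkler sketch above, a minimal illustration of Bayesian inference by enumeration: compute P(Rain | grass wet) by summing the joint over the hidden Sprinkler variable and normalizing (reuses the joint function defined earlier):

    from itertools import product

    # P(Rain=T | Grass wet) = P(Grass wet, Rain=T) / P(Grass wet)
    num = sum(joint(True, s, True) for s in (True, False))
    den = sum(joint(True, s, r) for (s, r) in product((True, False), repeat=2))
    print(num / den)  # posterior probability that it rained, given the grass is wet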
Inference
Process of using a trained machine learning algorithm to make a prediction.
Posterior Probability
Probability that A (the hypothesis) occurs given that event B (the evidence) has already occurred
Likelihood
Probability of B (the evidence) being true given that A is true
Prior
Probability of A (the hypothesis) being true
Evidence
Probability of B (the evidence) being true
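These four quantities combine in Bayes’ theorem: posterior = likelihood × prior / evidence, i.e., P(A|B) = P(B|A) P(A) / P(B). A worked sketch with made-up numbers for a disease-test scenario:

    prior = 0.01        # P(A): base rate of the disease
    likelihood = 0.95   # P(B|A): positive test given disease
    false_pos = 0.05    # P(B|not A): positive test given no disease

    # Evidence via the sum rule: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
    evidence = likelihood * prior + false_pos * (1 - prior)
    posterior = likelihood * prior / evidence  # P(A|B)
    print(posterior)  # ~0.16: a positive test still leaves the disease fairly unlikely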
Probability Density Function
Describes the relative likelihood that a continuous random variable takes on a given value; probabilities of outcomes come from integrating the density over an interval
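For example, the Gaussian (normal) density, the PDF that Gaussian naive Bayes evaluates per feature; a minimal implementation:

    import math

    def normal_pdf(x, mu, sigma):
        # f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))
        return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

    print(normal_pdf(0.0, 0.0, 1.0))  # ~0.3989, the standard normal density at its mean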
What are two ways to build a classifier?
1) Calculate posterior probabilities for a sample and assign it to the class with the highest posterior probability
2) create a discriminant function
What would you use for a continuous random variable?
Gaussian naive Bayes
What would you use for a categorical random variable?
categorical naive Bayes
What would you use for a multinomial distribution?
multinomial naive Bayes
What would you use for a binary random variable?
Bernoulli naive Bayes
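All four variants exist in scikit-learn's sklearn.naive_bayes module; a minimal sketch matching each to its feature type (the random data is illustrative only):

    import numpy as np
    from sklearn.naive_bayes import GaussianNB, CategoricalNB, MultinomialNB, BernoulliNB

    rng = np.random.default_rng(0)
    y = rng.integers(0, 2, 100)  # binary class labels

    pairs = [
        (GaussianNB(),    rng.normal(size=(100, 3))),      # continuous features
        (CategoricalNB(), rng.integers(0, 4, (100, 3))),   # categorical codes
        (MultinomialNB(), rng.integers(0, 10, (100, 3))),  # counts (multinomial)
        (BernoulliNB(),   rng.integers(0, 2, (100, 3))),   # binary features
    ]
    for model, X in pairs:
        print(type(model).__name__, model.fit(X, y).predict(X[:1]))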
discriminant function
a function used to assign a sample to a class directly; because the evidence P(B) is the same for every class, we don’t need to calculate it and can compare prior × likelihood across classes instead
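A minimal sketch of a two-class discriminant g_k(x) = P(class k) · P(x | class k); dropping the evidence term does not change the argmax (the Gaussian likelihoods and priors below are made up):

    import math

    def normal_pdf(x, mu, sigma):
        return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

    priors = [0.5, 0.5]                                # P(class k), assumed
    likelihoods = [lambda x: normal_pdf(x, 0.0, 1.0),  # P(x | class 0)
                   lambda x: normal_pdf(x, 3.0, 1.0)]  # P(x | class 1)

    def g(x, k):
        # Discriminant: prior * likelihood; no evidence term needed.
        return priors[k] * likelihoods[k](x)

    x = 2.0
    print(max(range(2), key=lambda k: g(x, k)))  # predicted class for x (here: 1)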