Week 7 (Actual) Flashcards
What is a Decision Tree?
A tree structure consisting of:
Root/internal nodes: tests on independent variables.
Leaves: values of the dependent variable (the predictions).
Branches: outcomes of each decision/test.
Can be classification or regression.
How are decision trees constructed?
Given a dataset, recursively look for the rule (feature and threshold) that best splits the samples into groups that are internally similar and mutually dissimilar.
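A minimal sketch of one such split search (hypothetical toy data, single numeric feature, impurity scored by Gini):

```python
# Greedy search for the best threshold on one feature, scored by
# the weighted Gini impurity of the two resulting groups.
def gini(labels):
    n = len(labels)
    return 1 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(xs, ys):
    best = (None, float("inf"))
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (t, score)
    return best

xs = [1, 2, 3, 10, 11, 12]
ys = [0, 0, 0, 1, 1, 1]
print(best_split(xs, ys))  # (3, 0.0): splitting at 3 separates the classes perfectly
```

A full tree is built by applying this search recursively to each resulting group.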
What is the Gini Index?
Given a training dataset with J classes:
IG(p) = 1 - Σ_{i=1}^{J} p_i^2
where p_i is the fraction of items labelled with class i in the dataset.
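A direct sketch of the formula above on toy label lists:

```python
# Gini impurity: 1 - sum of squared class fractions.
def gini(labels):
    n = len(labels)
    return 1 - sum((labels.count(c) / n) ** 2 for c in set(labels))

print(gini([0, 0, 1, 1]))  # 0.5  (maximally mixed for two classes)
print(gini([0, 0, 0, 0]))  # 0.0  (pure node)
```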
What is Information Gain?
The reduction in uncertainty about the dependent variable Y after splitting the samples on an independent variable X:
IG(Y, X) = H(Y) - H(Y | X)
where H denotes entropy.
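A sketch of the formula on hypothetical data where X perfectly predicts Y, so the gain equals the full entropy of Y:

```python
import math

# Shannon entropy H(Y) of a label list.
def entropy(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

# IG(Y, X) = H(Y) - H(Y | X), with H(Y | X) the weighted
# entropy of Y within each group of equal X values.
def information_gain(ys, xs):
    n = len(ys)
    h_cond = sum(
        (len(sub) / n) * entropy(sub)
        for v in set(xs)
        for sub in [[y for x, y in zip(xs, ys) if x == v]]
    )
    return entropy(ys) - h_cond

ys = [0, 0, 1, 1]
xs = ["a", "a", "b", "b"]  # X perfectly predicts Y
print(information_gain(ys, xs))  # 1.0
```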
What are the drawbacks of Decision Trees?
Unstable: a small change in the training data can produce a very different tree.
Relatively inaccurate: support vector machines and neural networks often perform better.
What are Probabilistic Graphical Models?
Nodes represent random variables; edges (links/arcs) represent direct dependencies, while missing edges encode conditional independence.
Can be undirected or directed (Bayesian).
What are Bayesian Networks?
A kind of probabilistic graphical model that uses the direction of edges to represent cause-effect relationships and Bayes' theorem for probabilistic inference.
A compact representation of a joint probability distribution in terms of conditional distributions.
What are the advantages of Bayesian Networks?
Graphical Representation: of joint probability distributions of random variables - interpretable.
More powerful: can capture complex relationships.
Combine data and prior knowledge: better approximation.
Generative approach: generate new data similar to existing data.
What are the disadvantages of Bayesian Networks?
Requires prior knowledge of many probabilities.
Sometimes computationally intractable.
What are the main problems faced in Bayesian Networks?
Inference.
Training the models.
Determining the structure of the network.
How do you represent the joint probability distributions of random variables?
A set of nodes: represent random variables.
A set of directed edges: represents “directed dependency”.
A conditional distribution for each node given its parents: P(Xi | Parents(Xi)).
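The factorization P(X1, ..., Xn) = Π P(Xi | Parents(Xi)) can be sketched on a hypothetical three-node network Rain → Sprinkler, (Rain, Sprinkler) → GrassWet, with made-up conditional probability tables:

```python
P_rain = {True: 0.2, False: 0.8}
# P(sprinkler | rain)
P_sprinkler = {True: {True: 0.01, False: 0.99},
               False: {True: 0.4, False: 0.6}}
# P(wet = True | sprinkler, rain)
P_wet = {(True, True): 0.99, (True, False): 0.8,
         (False, True): 0.9, (False, False): 0.0}

# Joint probability as the product of each node's conditional
# distribution given its parents.
def joint(rain, sprinkler, wet):
    p = P_rain[rain] * P_sprinkler[rain][sprinkler]
    p_wet_true = P_wet[(sprinkler, rain)]
    return p * (p_wet_true if wet else 1 - p_wet_true)

# The joint probabilities over all assignments sum to 1.
total = sum(joint(r, s, w) for r in (True, False)
            for s in (True, False) for w in (True, False))
print(round(total, 10))  # 1.0
```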
What groups do random variables (nodes) fall into?
Observed: The nodes we have knowledge about.
Unobserved: Nodes we have to infer probability for.
What is the Markov condition?
Each random variable X is conditionally independent of its non-descendants, given its parents.
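A numerical sketch on a hypothetical chain A → B → C with made-up conditional probability tables: the Markov condition implies C is independent of A given B, so P(c | a, b) = P(c | b) for every assignment.

```python
P_a = {0: 0.3, 1: 0.7}
P_b = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}    # P(b | a)
P_c = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.25, 1: 0.75}}  # P(c | b)

def joint(a, b, c):
    return P_a[a] * P_b[a][b] * P_c[b][c]

# Verify P(c | a, b) == P(c | b) for all assignments.
for a in (0, 1):
    for b in (0, 1):
        p_ab = sum(joint(a, b, c) for c in (0, 1))
        for c in (0, 1):
            assert abs(joint(a, b, c) / p_ab - P_c[b][c]) < 1e-12
print("Markov condition holds")
```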