Fundamentals of AI Flashcards
What is Machine Learning?
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.
What is Top Down?
Model each function explicitly and wire all the ‘agents’ together (deduction)
What is Bottom Up?
Give the system a lot of data so that it can discover the concepts itself (induction)
Three Pillars of Machine Learning
Models and Algorithms, Powerful and cheaper computation, Massive data warehouse
What is Data Mining?
Exploration and analysis of large quantities of data to discover valid, novel, useful and understandable patterns in the data.
What is Supervised Learning?
Infers a function from labelled training data, each example consisting of an input and its expected output (Classification and Regression)
What is Unsupervised Learning?
Infers a function to describe hidden structure from unlabelled data (Clustering and Association)
What can be done in Data pre-processing?
Fill in missing data, find outliers, feature selection.
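Two of these pre-processing steps can be sketched in a few lines. This is only an illustration: the function names, the choice of mean imputation, and the "more than k standard deviations from the mean" outlier rule are assumptions, not something the card prescribes.

```python
from statistics import mean, pstdev

def fill_missing(values):
    """Replace None entries with the mean of the observed values (assumed strategy)."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def find_outliers(values, k=2.0):
    """Return values lying more than k standard deviations from the mean (assumed rule)."""
    centre = mean(values)
    spread = pstdev(values)
    if spread == 0:
        return []  # all values identical, so nothing stands out
    return [v for v in values if abs(v - centre) > k * spread]

filled = fill_missing([1.0, None, 3.0])
outliers = find_outliers([10, 12, 11, 13, 12, 95])
```

Other imputation and outlier rules (median, interquartile range) are equally reasonable; which one fits depends on the data.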
Unsupervised Learning - Clustering
Given: Unlabelled data set and a similarity/distance metric
Goal: Find a ‘natural’ partitioning, i.e. groups of similar data points
K-means clustering
Choose the number of clusters k and initialise the k cluster centroids randomly. Assign each data point to its nearest centroid (based on distance), then update each centroid to the mean of the data points assigned to it. Repeat the assignment and update steps until the assignments stop changing, then output the final cluster assignments and centroids.
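The k-means procedure on this card can be sketched in plain Python. The details here are assumptions made for the sketch: squared Euclidean distance, initial centroids drawn from the data points themselves, a fixed random seed, an iteration cap, and leaving an empty cluster's centroid where it was.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iters=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initialisation from the data
    assignments = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_assignments == assignments:
            break  # nothing changed, so the clustering has converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignments, centroids

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
assignments, centroids = kmeans(points, k=2)
```

On this toy data the first three points end up in one cluster and the last three in the other; which cluster gets index 0 depends on the random initialisation.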
Applications of k-means clustering
Anomaly detection, Social Media Analysis
Unsupervised Learning - Association
Discover correlations between two or more variables.
Given: a set of records containing items
Goal: Produce dependency rules that predict the occurrence of variable X together with variable Y
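One common way to score such a dependency rule is by its support and confidence. Those two measures are not named on the card, so treat them, the function names, and the shopping-basket records below as assumptions made for illustration.

```python
def support(records, items):
    """Fraction of records that contain every item in `items`."""
    return sum(1 for r in records if items <= r) / len(records)

def confidence(records, x, y):
    """Of the records containing x, the fraction that also contain y."""
    return support(records, x | y) / support(records, x)

# Hypothetical records, each a set of items.
records = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support(records, {"bread", "milk"})
rule_confidence = confidence(records, {"bread"}, {"milk"})
```

Here bread and milk appear together in 2 of 4 records (support 0.5), and 2 of the 3 records with bread also contain milk (confidence 2/3).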
Categorical data
Learn to predict which set an instance belongs to, based on pre-labelled (classified) instances (Classification)
Continuous data
Find a linear relationship between the variable X and the variable Y (Regression)
Supervised Learning: Regression
Based on the given data, find the function that fits the samples by minimising the mean squared error.
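For a single input variable and a straight-line model, the function that minimises the mean squared error has a closed form. The sketch below assumes that simple case; the function names and sample data are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept

def mean_squared_error(xs, ys, slope, intercept):
    """Average squared gap between the predictions and the samples."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]  # lies exactly on y = 2x + 1
slope, intercept = fit_line(xs, ys)
error = mean_squared_error(xs, ys, slope, intercept)
```

Because the samples lie exactly on a line, the fit recovers slope 2 and intercept 1 with zero error; noisy data would leave a positive error.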
Overfitting
The model describes the errors (noise) in the dataset instead of the underlying relationship between the variables
Regression Pros and Cons
Pros: short training time, easy to implement, easy to interpret
Cons: sensitive to noise and outliers (overfitting), cannot handle complicated relationships (linear only)
Supervised Learning: Decision Tree
Internal nodes: decision rules on features
Branches: course of decision or action
Leaf nodes: a predicted class label (output)
Iteratively partition the decision space of chosen features.
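The three parts of a decision tree can be made concrete with a small hand-built tree. This sketch only shows how a finished tree is used for prediction, not how it is learned from data; the features ("outlook", "windy") and class labels are hypothetical.

```python
# Internal nodes are dicts holding a decision rule on one feature;
# branches map each feature value to the next node;
# leaf nodes are plain strings (the predicted class label).
tree = {
    "feature": "outlook",
    "branches": {
        "sunny": {
            "feature": "windy",
            "branches": {True: "stay in", False: "play"},
        },
        "rainy": "stay in",
    },
}

def predict(node, instance):
    """Follow branches from the root until a leaf label is reached."""
    while isinstance(node, dict):
        value = instance[node["feature"]]
        node = node["branches"][value]
    return node

label = predict(tree, {"outlook": "sunny", "windy": False})
```

Each step down the tree narrows the decision space by one feature, which is why the path to a leaf is easy to read back as a rule.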
Decision Tree Pros and Cons
Pros: reasonable training time, can handle large number of features, easy to implement, easy to interpret
Cons: only simple decision boundaries, problems with missing data, cannot handle complicated relationships, over-complex tree (overfitting)
Neural Networks Pros and Cons
Pros: can learn more complicated class boundaries, can be more accurate, can handle large number of features
Cons: hard to implement (trial and error for choosing parameters and network structure), slow training time, can overfit the data, hard to interpret
Supervised Learning: Neural Networks
Set of neurons connected by directed, weighted edges
A positive weight encourages the neuron to fire, while a negative weight discourages it. Each neuron has a fixed threshold t that the weighted sum of its inputs must reach for it to fire.
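A single threshold neuron can be sketched as a weighted sum compared against t. The convention that the neuron fires when the sum is greater than or equal to t, and the example weights, are assumptions for this sketch.

```python
def neuron_fires(inputs, weights, threshold):
    """Fire when the weighted sum of the inputs reaches the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return total >= threshold

# Hypothetical neuron: the first input excites it, the second inhibits it.
weights = (1.0, -1.0)
t = 0.5

excited = neuron_fires((1, 0), weights, t)    # sum is 1.0, reaches t
inhibited = neuron_fires((1, 1), weights, t)  # sum is 0.0, below t
```

With the second input switched on, its negative weight cancels the first input and the neuron stays silent.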
Linearly Separable
Where the output data points can be separated using a linear boundary. Only a linearly separable function can be represented by a perceptron.
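A small search illustrates the difference between a linearly separable function (AND) and one that is not (XOR). The grid of candidate weights and thresholds is an arbitrary assumption, and a failed search over a grid is an illustration rather than a proof, although for XOR no single threshold unit exists at all.

```python
from itertools import product

def fires(x1, x2, w1, w2, t):
    """Two-input threshold unit: fire when the weighted sum reaches t."""
    return x1 * w1 + x2 * w2 >= t

def find_perceptron(truth_table, grid):
    """Return the first (w1, w2, t) on the grid that reproduces the table, else None."""
    for w1, w2, t in product(grid, repeat=3):
        if all(fires(x1, x2, w1, w2, t) == out
               for (x1, x2), out in truth_table.items()):
            return w1, w2, t
    return None

AND = {(0, 0): False, (0, 1): False, (1, 0): False, (1, 1): True}
XOR = {(0, 0): False, (0, 1): True, (1, 0): True, (1, 1): False}

grid = [v / 2 for v in range(-4, 5)]  # -2.0 to 2.0 in steps of 0.5
and_params = find_perceptron(AND, grid)
xor_params = find_perceptron(XOR, grid)
```

The search finds parameters for AND but returns None for XOR, whose true and false outputs cannot be split by one straight line.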
Bayes Rule
The fundamental notion is conditional probability: P(A|B) = P(B|A) P(A) / P(B)
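A short worked example shows how the rule turns a prior into a posterior. The scenario (a test for a condition) and all three probabilities are hypothetical numbers chosen for illustration.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(A|B) = P(B|A) * P(A) / P(B), with P(B) expanded over A and not-A."""
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: 1% of people have the condition (A),
# the test is positive (B) for 90% of those who have it,
# and for 5% of those who do not.
p_condition_given_positive = posterior(
    prior=0.01, likelihood=0.90, false_positive_rate=0.05
)
```

Under these assumed numbers the posterior is roughly 0.15: even after a positive test the condition is still unlikely, because it was rare to begin with.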
KRR
Knowledge Representation and Reasoning
Knowledge Based Systems
A system built around a knowledge base, i.e. a collection of knowledge taken from a human and stored in such a way that the system can reason with it