AI & ML Lect6 Flashcards
What is general AI?
Complex machines that possess the same characteristics as human intelligence
What is Narrow AI?
Technologies that can perform specific tasks as well as, or better than, humans can
What is a Turing test?
A test of a machine’s ability to exhibit intelligent behaviour indistinguishable from that of a human.
CAPTCHA is an example of?
Reverse Turing Test
AI in the 1960s used?
If-then-else logic built on predefined rules and knowledge
Modern AI uses ? to replace predefined rules and knowledge
Machine learning algos
Use of an ML algo
Parse data, learn and make a determination or prediction
What is ML?
Getting computers to act without being explicitly programmed
What is Deep Learning?
A form of ML based on artificial neural networks, which have discrete layers, connections, and directions of data propagation
Steps in Data Science
Data collection, Data preparation, EDA (exploratory data analysis), Machine learning, Visualization
Why the need for Machine Learning?
No human experts exist, human expertise is a black box (experts cannot explain it), rapidly changing phenomena, need for customization/personalization
Give an example of Learning
Data: loan application data
Task: predict whether loan should be approved or not
Performance measure: accuracy
No learning: approve all future transactions
With learning: analyze user’s assets and credit score etc
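The loan example above can be sketched in a few lines of Python. This is only an illustration: the toy records, the use of a single credit score, and the threshold search are assumptions made for the sketch, not part of the lecture.

```python
# Hypothetical toy data: (credit_score, repaid) pairs. All values are made up.
TRAIN = [(520, False), (580, False), (610, False), (640, True), (700, True), (750, True)]
TEST = [(560, False), (600, False), (690, True), (720, True)]


def accuracy(predict, data):
    """Performance measure: fraction of examples predicted correctly."""
    correct = sum(1 for score, label in data if predict(score) == label)
    return correct / len(data)


def approve_all(score):
    """No learning: approve every application."""
    return True


def learn_threshold(data):
    """With learning: pick the score cutoff with the best training accuracy."""
    candidates = sorted(score for score, _ in data)
    return max(candidates, key=lambda t: accuracy(lambda s: s >= t, data))


threshold = learn_threshold(TRAIN)
baseline_acc = accuracy(approve_all, TEST)
learned_acc = accuracy(lambda s: s >= threshold, TEST)
print(threshold, baseline_acc, learned_acc)  # 640 0.5 1.0
```

The learned rule only does well here because the test applicants resemble the training applicants.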
How to achieve good accuracy on test data?
Training examples must be similar to (representative of) the test data
Types of Machine Learning
Supervised Learning, Unsupervised Learning, Reinforcement Learning
Uses of Supervised Learning
Classification and Regression
Uses of Unsupervised Learning
Clustering, Dimensionality Reduction, Anomaly detection
Characteristics of Supervised learning
Train using labeled data, direct feedback, predict outcome
Characteristics of Unsupervised learning
No labels, no feedback, finds hidden structure, learns on its own
Characteristics of reinforcement learning
Decision process, reward system, learn series of actions, has a mapping structure that guides machine from input to output
Goal of unsupervised learning
Find patterns and trends in the data; there is no specific target value to predict
What is classification in supervised learning?
Assigning each input to one of a set of discrete output categories (classes)
Example of Classification
Classifying handwritten digits on a cheque
Use of Regression
Identifying/predicting a specific value, usually a real number, by fitting a mathematical function; used in stock market prediction and sales volume forecasting
Use of Clustering
Identifying groupings occurring within the data
Use of Anomaly detection
identifying anomalies within the data
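One simple way to flag anomalies is a z-score rule: mark any value that lies far from the mean in units of standard deviation. The sketch below assumes this rule, a cutoff of 2.0, and made-up transaction amounts; none of these come from the lecture.

```python
from statistics import mean, pstdev


def find_anomalies(values, z_threshold=2.0):
    """Return values whose z-score exceeds the (assumed) threshold."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > z_threshold]


# Hypothetical transaction amounts with one obvious outlier.
amounts = [20, 22, 19, 21, 23, 20, 22, 500]
anomalies = find_anomalies(amounts)
print(anomalies)  # [500]
```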
Use of Dimensionality reduction
Transformation of data from a high-dimensional space to a low-dimensional one by discarding redundant data, while still retaining meaningful properties of the original data
A common technique is PCA
How Supervised Learning works:
- Provide Algo with labeled input and output data to learn
- Feed the machine new unlabeled info to see if it tags new data appropriately.
- If not, continue refining the program
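The three steps above can be sketched with a very small classifier. A nearest-centroid model is used here purely as a stand-in for "the algo", and the data and labels are hypothetical.

```python
from collections import defaultdict


def train(examples):
    """Step 1: learn one centroid (mean value) per label from labeled data."""
    grouped = defaultdict(list)
    for x, label in examples:
        grouped[label].append(x)
    return {label: sum(xs) / len(xs) for label, xs in grouped.items()}


def predict(model, x):
    """Tag a new input with the label of the nearest centroid."""
    return min(model, key=lambda label: abs(x - model[label]))


# Hypothetical labeled training data.
labeled = [(1.0, "small"), (1.5, "small"), (2.0, "small"),
           (8.0, "large"), (9.0, "large"), (10.0, "large")]
model = train(labeled)

# Step 2: feed new inputs and compare the tags against what we expect.
new_inputs = [(1.2, "small"), (9.5, "large"), (4.0, "small")]
hits = [predict(model, x) == expected for x, expected in new_inputs]
accuracy = sum(hits) / len(hits)

# Step 3: if accuracy is too low, go back and refine (more data, better model).
print(model, accuracy)
```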
Use of classification
Sorting items into categories
How unsupervised learning works:
- Input raw data
- Interpretation
- Algo
- Processing
- Output
How reinforcement learning works:
- input raw data
- environment (state of data, agent takes control, selection of algo, best action, reward if correct)
- output
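The reward loop can be illustrated with a deliberately simplified sketch. It assumes a stateless environment with fixed rewards per action (closer to a multi-armed bandit than a full decision process), and the action names and reward values are hypothetical.

```python
def run_agent(rewards, steps=10):
    """Try every action once, then keep choosing the best estimated action.

    rewards: dict mapping action -> reward the (assumed) environment returns.
    """
    estimates = {action: 0.0 for action in rewards}
    counts = {action: 0 for action in rewards}
    history = []
    for _ in range(steps):
        untried = [a for a in rewards if counts[a] == 0]
        action = untried[0] if untried else max(estimates, key=estimates.get)
        reward = rewards[action]  # the environment responds with a reward
        counts[action] += 1
        # Incremental average of the rewards seen for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
        history.append(action)
    return estimates, history


estimates, history = run_agent({"hold": 0.0, "buy": 1.0, "sell": -1.0})
print(history)
```

After one trial of each action the agent settles on the action with the highest reward.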
2 approaches to Dimensionality reduction
Feature selection and Feature extraction
What is feature selection in Dimensionality reduction?
Identify redundant features and discard them
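A minimal sketch of feature selection, assuming "redundant" means a column that is constant or an exact duplicate of an earlier column. Real pipelines use softer criteria (for example correlation), so treat this definition as an assumption.

```python
def select_features(rows):
    """Return the indices of columns that are neither constant nor duplicated."""
    columns = list(zip(*rows))  # transpose rows into columns
    kept, seen = [], set()
    for index, column in enumerate(columns):
        if len(set(column)) == 1:
            continue  # constant column carries no information
        if column in seen:
            continue  # exact duplicate of a column already kept
        seen.add(column)
        kept.append(index)
    return kept


# Hypothetical data: column 1 is constant, column 2 duplicates column 0.
rows = [(1, 5, 1, 0.2), (2, 5, 2, 0.9), (3, 5, 3, 0.4)]
kept = select_features(rows)
print(kept)  # [0, 3]
```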
What is feature extraction?
Find a new set of low-dimensional points that represents the original data well
What is PCA
Principal Component Analysis is a method to construct low-dimensional representation of the data by focusing on the principal components
Condition on the projected data in PCA
It should retain as much of the original variance as possible
Use of PCA
Data visualization with a scatter plot
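For 2-D data the first principal component can be found in closed form from the covariance matrix. The sketch below assumes that shortcut and uses made-up points; real PCA on more dimensions would use a linear algebra library instead.

```python
import math


def variance(values):
    """Population variance of a list of numbers."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def first_principal_component(points):
    """Return the unit direction of maximum variance and the 1-D projections."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    sxx = sum(x * x for x, _ in centered) / n
    syy = sum(y * y for _, y in centered) / n
    sxy = sum(x * y for x, y in centered) / n
    # Angle of the leading eigenvector of the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    direction = (math.cos(theta), math.sin(theta))
    projected = [x * direction[0] + y * direction[1] for x, y in centered]
    return direction, projected


# Hypothetical points lying roughly along a diagonal line.
points = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.8)]
direction, projected = first_principal_component(points)
x_var = variance([x for x, _ in points])
y_var = variance([y for _, y in points])
projected_var = variance(projected)
print(direction, projected_var)
```

The projection onto the principal component keeps more variance than either original axis alone, which is the sense in which PCA retains as much variance as possible.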
Best classifier for text classification
SVM
Which method performs best in classifying high dimensional data?
SVM
What is Decision Tree mainly used for?
Classification, because of its competitive accuracy and high efficiency
Leaf nodes in Decision trees indicate?
Class / decision
What type of Algo for decision Tree?
Heuristic Algo
How to make an easy-to-understand and better-performing decision tree?
Prefer a smaller tree that is still accurate
Finding the best tree is?
NP-hard
What is good attribute selection in Decision Tree?
Attribute that splits examples into subsets that are ideally all (+) or all (-)
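One common way to score how cleanly an attribute splits the examples is information gain (the drop in entropy after the split). The cards do not name a specific measure, so information gain is an assumption here, as are the attribute names and rows.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy in bits; 0 when all labels are the same."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target before the split minus the weighted entropy after."""
    base = entropy([row[target] for row in rows])
    subsets = {}
    for row in rows:
        subsets.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(s) / len(rows) * entropy(s) for s in subsets.values())
    return base - remainder


# Hypothetical loan data: has_job splits perfectly, owns_home tells us nothing.
rows = [
    {"has_job": "yes", "owns_home": "yes", "approved": "+"},
    {"has_job": "yes", "owns_home": "no", "approved": "+"},
    {"has_job": "no", "owns_home": "yes", "approved": "-"},
    {"has_job": "no", "owns_home": "no", "approved": "-"},
]
gains = {a: information_gain(rows, a, "approved") for a in ("has_job", "owns_home")}
best_attribute = max(gains, key=gains.get)
print(gains, best_attribute)
```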
What is ensemble learning
A set of models (e.g. decision trees) working together to make a single prediction, giving greater predictive accuracy
What are “Random forests”?
A set of decision trees, with each tree built from a different random subset of the features (and typically a random sample of the data)
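How several trees combine into a single prediction can be sketched with majority voting. The three one-rule "trees" below, their thresholds, and the applicant fields are hypothetical; a real random forest would learn its trees from data.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most common prediction among the ensemble members."""
    return Counter(predictions).most_common(1)[0][0]


# Three hypothetical one-rule trees, each looking at a different feature.
trees = [
    lambda applicant: applicant["income"] > 50,
    lambda applicant: applicant["credit_score"] > 600,
    lambda applicant: applicant["debt"] < 20,
]

applicant = {"income": 40, "credit_score": 700, "debt": 10}
votes = [tree(applicant) for tree in trees]
decision = majority_vote(votes)
print(votes, decision)  # [False, True, True] True
```

Using an odd number of binary voters avoids ties.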
Key Idea of kNN (k-Nearest Neighbours)
Properties of an input x are likely to be similar to those points in the neighbourhood of x
Basic idea of kNN
Find the k nearest neighbors of x and infer the target attribute of x from their corresponding attribute values
How does kNN's classification cost grow with the amount of data?
Linearly
How does kNN define nearest neighbors?
Euclidean distance
Training algo in kNN
Add each training example (x,y) to dataset D
Classification algo in kNN
Find the k nearest neighbors of x and take a majority vote over their classes
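Both kNN algos fit in a few lines. This is a minimal sketch using Euclidean distance and majority voting; the points, labels, and k=3 are made up for illustration.

```python
import math
from collections import Counter


def train(dataset, x, y):
    """Training: simply store the example (x, y) in dataset D."""
    dataset.append((x, y))


def classify(dataset, query, k=3):
    """Classification: majority class among the k nearest stored examples."""
    # Sorting scans every stored example, so cost grows with the dataset size.
    neighbours = sorted(dataset, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


D = []
for point, label in [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
                     ((8, 8), "B"), ((8, 9), "B"), ((9, 8), "B")]:
    train(D, point, label)

print(classify(D, (2, 2)), classify(D, (7, 7)))  # A B
```

All the work happens at query time, which is why kNN trains instantly but classifies slowly.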
kNN is slow at?
Classification time, even though accuracy can be quite strong
Usage of kNN
handwritten character classifications, recommender systems, medical data mining, pattern recognition
How Linear Regression works?
Given a set of points, the regression algo models the relationship between a single feature (explanatory variable x) and a continuous-valued response (target variable y)
Goal of Linear Regression
Find a best fit line such that the cost function is minimized
Most common cost function is
MSE (mean squared error), which is the average squared difference between an observation’s actual and predicted values
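For a single feature, the line that minimizes MSE has a closed-form solution (ordinary least squares). The sketch below assumes that solution and uses made-up points.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares best-fit line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / spread
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mse(xs, ys, slope, intercept):
    """Average squared difference between actual and predicted y values."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data lying exactly on y = 2x + 1.
xs, ys = [1, 2, 3, 4], [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
best_cost = mse(xs, ys, slope, intercept)
other_cost = mse(xs, ys, 1.0, 1.0)  # a worse line, for comparison
print(slope, intercept, best_cost, other_cost)  # 2.0 1.0 0.0 7.5
```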
1 method of clustering is?
k-means algo
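A minimal 1-D sketch of k-means: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat. The data, starting centroids, and fixed iteration count are assumptions for illustration.

```python
def k_means(points, centroids, iterations=10):
    """Return (centroids, clusters) after a fixed number of update rounds."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


points = [1, 2, 3, 10, 11, 12]
centroids, clusters = k_means(points, centroids=[1, 2])
print(centroids, clusters)  # [2.0, 11.0] [[1, 2, 3], [10, 11, 12]]
```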
What is a training set?
A set of examples used for learning a model
What’s a validation set?
Set of examples that can’t be used for learning the model but can help tune model parameters. Helps control overfitting.
What’s a test set?
Used to assess the performance of the final model and provide an estimate of the test error.
What set to never use in tuning parameters or revise model?
Test set
How to do cross validation
Train the model on p% of the data
Test the model on the other (100-p)% of the data, which is unseen by the model
Repeat with different splits (e.g. k folds) and average the results
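The p% / (100-p)% split can be sketched as below. The k-fold helper is an extension beyond what the cards state, and the function names and seed are hypothetical.

```python
import random


def holdout_split(examples, p=80, seed=0):
    """Shuffle, then use p% for training and the remaining (100-p)% for testing."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * p // 100
    return shuffled[:cut], shuffled[cut:]


def k_fold_splits(examples, k):
    """Yield k (train, test) pairs so every example is tested exactly once."""
    folds = [examples[i::k] for i in range(k)]
    for i in range(k):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        yield train, folds[i]


data = list(range(10))
train_set, test_set = holdout_split(data, p=80)
folds = list(k_fold_splits(data, k=5))
print(len(train_set), len(test_set), len(folds))  # 8 2 5
```

In both helpers the test examples never appear in the matching training set.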
ML in Fintech
Process automation, fraud/security, algo trading, robo-advisory
Explain benefits of process automation
Replace manual work, automate repetitive tasks, and increase productivity
Examples include chatbots and JPM’s COiN (Contract Intelligence)
Role of ML in security / fraud
Fraud detection, financial monitoring, underwriting and credit scoring
Procedure of fraud detection
- Unsupervised learning (clustering)
- If any anomalies detected, trained AI model will separate legitimate and illegitimate transactions.
- Initial training of the AI model uses supervised learning
- Trained AI model is then given data, manual review by an expert, feedback (reinforcement learning) to become the final model
Algo trading in ML
- Monitor trade results in real time and detect patterns
- Sentiment / news analysis
- Act to sell, hold and buy stocks
- Analyze thousands of data sources
- Squeeze slim advantage over market average → significant profits due to enormous volume
- Make thousands or millions of trades (high-frequency trading)
Robo Advisory roles
Portfolio management and recommendation of financial products
What are the steps in SVM?
- Define a hyperplane between the classes using support vectors
- Optimise the model to find the support vectors that maximize the margin, i.e. the distance between the hyperplane and the nearest points of each class (best line)
What happens if there are 2 overlapping classes in SVM?
Use the kernel trick: map the data to a higher-dimensional space where the classes become linearly separable
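The idea behind the mapping can be shown on 1-D data that no single threshold can split. The sketch below maps x to (x, x^2) explicitly; a real kernel SVM gets the same effect implicitly through a kernel function, without ever computing the mapped points. The data and the choice of mapping are assumptions for illustration.

```python
def feature_map(x):
    """Explicitly lift a 1-D point into 2-D as (x, x^2)."""
    return (x, x * x)


def separable_by_threshold(values, labels):
    """True if one threshold on the values puts each class on its own side."""
    pos = [v for v, label in zip(values, labels) if label == 1]
    neg = [v for v, label in zip(values, labels) if label == -1]
    return max(neg) < min(pos) or max(pos) < min(neg)


# Hypothetical data: the negative class sits in the middle of the positive class.
xs = [-3, -2, -1, 0, 1, 2, 3]
labels = [1, 1, -1, -1, -1, 1, 1]

before = separable_by_threshold(xs, labels)
# In the lifted space, a horizontal line on the x^2 coordinate is a hyperplane.
after = separable_by_threshold([feature_map(x)[1] for x in xs], labels)
print(before, after)  # False True
```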