MACHINE LEARNING Flashcards
what is Machine Learning?
a type of AI for building methods that allow machines to learn
data is leveraged to improve computer performance on a set of tasks; the system makes predictions based on that data
not all AI is machine learning (e.g. the MYCIN expert system, which is rule-based)
what is deep learning?
NEURONS AND SYNAPSES
subset of machine learning for training a model on more complex patterns in the data with more layers of learning
input layers
hidden layers !
output layers
one example is computer vision, used for image classification, object detection
needs A LOT of input data and GPUs
what are nodes?
they are what neural networks are made of, tiny units connected together and organized in layers
when the network sees a lot of data, it identifies patterns and changes the connections between the nodes as needed
nodes “talk” to each other by passing data (or not) to the next layer, creating new connections and removing old ones as they learn
neural networks can be huge: the largest have billions of parameters (the weights on the connections between nodes)
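a minimal sketch of how data flows from layer to layer, assuming a tiny network with 2 inputs, 2 hidden nodes and 1 output node. the weights, biases and the sigmoid activation are made up for illustration and are not from these notes:

```python
import math

def sigmoid(x):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights, biases):
    """Compute the output of one layer.

    Each node sums its weighted inputs, adds a bias,
    and passes the result through an activation function.
    """
    outputs = []
    for node_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(node_weights, inputs)) + bias
        outputs.append(sigmoid(total))
    return outputs

# Hypothetical weights: 2 inputs -> 2 hidden nodes -> 1 output node.
hidden_weights = [[0.5, -0.4], [0.3, 0.8]]
hidden_biases = [0.0, -0.1]
output_weights = [[1.2, -0.7]]
output_biases = [0.05]

def predict(inputs):
    hidden = layer_forward(inputs, hidden_weights, hidden_biases)   # hidden layer
    return layer_forward(hidden, output_weights, output_biases)[0]  # output layer

print(predict([1.0, 2.0]))
```

training is what adjusts the weights; here they are fixed, so the sketch only shows the "passing data to the next layer" part.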
so AI > ML > Deep Learning > GenAI?
yes.
in GenAI we have multi-purpose foundation models (FMs) backed by neural networks, which can be fine-tuned to fit specific use cases
genAI text models (LLMs) leverage the transformer architecture, which is able to process a sentence as a whole instead of word by word
faster and more efficient text processing (less training time)
trained on vast amounts of text data from the internet, books, etc. - learns patterns and relationships between words and phrases
a good example is GPT (Generative Pre-trained Transformer), the model behind ChatGPT
and for images we have:
diffusion models: start from random NOISE and remove it step by step until an image appears
multi-modal models: take in several types of input (text, images, audio) and can output several types of data too
ML terms that might be on the exam:
GPT (Generative Pre-trained Transformer): generates human-like text or computer code based on input prompts
BERT (Bidirectional Encoder Representations from Transformers): reads text in both directions (left-to-right and right-to-left) to understand context
RNN (Recurrent Neural Network): for sequential data such as time-series or text, speech recognition
ResNet (Residual Network): for image recognition tasks, object detection and facial recognition
SVM (Support Vector Machine): ML algorithm for classification and regression
WaveNet: generates raw audio waveforms for speech synthesis
GAN (Generative Adversarial Network): generates synthetic data like images, videos or sounds that resemble the training data - for data augmentation
XGBoost (Extreme Gradient Boosting): an implementation of gradient boosting
RESNET FOR IMAGES
WAVENET FOR AUDIO
GAN FOR DATA AUGMENTATION
GPT AND BERT FOR LANGUAGE
what is training data?
it’s the data a model learns from; preparing it is the most critical stage in the development of a good model. it can be:
labeled data: for SUPERVISED LEARNING, includes input features and output labels
unlabeled data: for UNSUPERVISED LEARNING, includes only input features
structured data: organized in a fixed format, such as tables (rows and columns) or time series, so it is easy to read and query
unstructured data: text-heavy or multimedia content (images, social media posts, articles, customer reviews)
what is supervised learning?
ML algorithm that learns a mapping function from labeled data (inputs paired with known outputs) so it can predict the output for new, unseen input data
regression: used to predict a numeric value based on input data - house prices, stock prices, weather forecasting
classification: used to predict categorical labels of input data
can be BINARY CLASSIFICATION like spam vs. not-spam emails
or MULTICLASS CLASSIFICATION like movie genre labels on Rotten Tomatoes
common algorithms:
logistic regression
linear regression
decision tree
neural network
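a minimal sketch of regression as supervised learning, assuming made-up labeled data (house size as the input feature, price as the output label). it fits a straight line with ordinary least squares, which is one simple way to do linear regression, and then predicts a price for a size it has never seen:

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept with ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up labeled data: size in square meters -> price in thousands.
sizes = [50, 80, 100, 120]
prices = [150, 240, 300, 360]

# "Training": learn the mapping function from the labeled examples.
slope, intercept = fit_line(sizes, prices)

# "Prediction": apply the learned mapping to a new, unseen input.
predicted = slope * 90 + intercept
print(predicted)
```

the same pattern (learn from labeled pairs, then predict on new input) applies to classification; only the output changes from a number to a category.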
what is feature engineering?
the process (not an algorithm) of using domain knowledge to select and TRANSFORM RAW DATA INTO MEANINGFUL FEATURES
feature engineering on structured data: predicting house prices based on features like location, number of rooms, size, etc
feature engineering on unstructured data: sentiment analysis of customer reviews
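a minimal sketch of feature engineering on structured data, assuming a hypothetical raw house listing. the field names, the reference year and the chosen features are illustrative assumptions:

```python
def engineer_features(listing, current_year=2024):
    """Turn one raw house listing into numeric features a model can use."""
    return {
        # A raw build year becomes an age, which relates more directly to price.
        "age_years": current_year - listing["year_built"],
        # Two raw columns are combined into one more meaningful ratio.
        "area_per_room": listing["area_m2"] / listing["rooms"],
        # A text field becomes a 0/1 flag.
        "is_city_center": 1 if listing["location"] == "city center" else 0,
    }

raw_listing = {"year_built": 1994, "area_m2": 120.0, "rooms": 4, "location": "city center"}
print(engineer_features(raw_listing))
```

which features are worth deriving is where the domain knowledge comes in; the code only carries out the transformation.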
what is unsupervised learning?
ML algorithm that uses only unlabeled data; the system discovers patterns, structures and relationships within it on its own
humans still interpret and label the resulting groups at the end
common techniques include: clustering, dimensionality reduction and association rule learning
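a minimal sketch of clustering, assuming made-up unlabeled data (yearly spend of six customers) and hand-picked starting centroids. it uses k-means on 1-D data, one common clustering method, chosen here for illustration:

```python
def kmeans_1d(points, centroids, iterations=10):
    """Group 1-D points around the nearest centroid (k-means clustering)."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Made-up, unlabeled data: no one has said which customers belong together.
spend = [10, 12, 11, 95, 100, 105]
centroids, clusters = kmeans_1d(spend, centroids=[0, 50])
print(centroids, clusters)
```

the algorithm finds two groups but does not name them; calling them "low spenders" and "high spenders" is the human labeling step.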
what is self-supervised learning?
specific form of unsupervised learning where the model creates its own labels from the unlabeled data
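a minimal sketch of how labels can come from the data itself, assuming the task is next-word prediction on raw text. the function name and the context size are hypothetical:

```python
def make_next_word_pairs(text, context_size=2):
    """Build (context, label) pairs from raw text; the label is simply the next word."""
    words = text.split()
    pairs = []
    for i in range(context_size, len(words)):
        context = tuple(words[i - context_size:i])
        pairs.append((context, words[i]))
    return pairs

# No human labeled anything: every label is taken from the text itself.
pairs = make_next_word_pairs("the cat sat on the mat")
for context, label in pairs:
    print(context, "->", label)
```

the resulting pairs can then be fed to an ordinary supervised learner, which is why no manual labeling is needed.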
what is semi-supervised learning?
an ML algorithm that uses a small amount of labeled data along with a large amount of unlabeled data
document classification
sentiment analysis
fraud identification
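a minimal sketch of semi-supervised learning via pseudo-labeling, assuming made-up transaction amounts where only two were labeled by a human. the nearest-centroid classifier is a deliberately simple stand-in for a real model:

```python
def centroids_by_label(examples):
    """Train: average the feature value of each label."""
    sums, counts = {}, {}
    for value, label in examples:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(centroids, value):
    """Pick the label whose centroid is closest to the value."""
    return min(centroids, key=lambda label: abs(value - centroids[label]))

# Made-up data: a small labeled set and a larger unlabeled set.
labeled = [(20.0, "normal"), (900.0, "fraud")]
unlabeled = [25.0, 30.0, 15.0, 850.0, 950.0]

# Step 1: train on the small labeled set.
model = centroids_by_label(labeled)
# Step 2: let the model label the unlabeled data (pseudo-labels).
pseudo_labeled = [(value, predict(model, value)) for value in unlabeled]
# Step 3: retrain on labeled + pseudo-labeled data together.
model = centroids_by_label(labeled + pseudo_labeled)
print(model)
```

real systems usually keep only the pseudo-labels the model is confident about; that filtering is left out here to keep the sketch short.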
what is reinforcement learning?
ML algorithm where an agent learns how to make decisions by interacting with a given environment, with the goal of maximizing cumulative rewards
dynamic learning process, effective for environments where responses need to be optimized based on direct user interaction and satisfaction
unlike supervised learning where the model learns from labeled data, reinforcement learning involves learning from feedback based on the consequences of actions
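a minimal sketch of the agent / environment / reward loop, assuming a made-up corridor of 5 cells where the agent starts at cell 0 and is rewarded only for reaching cell 4. it uses tabular Q-learning, one common RL method picked for illustration; the learning rate, discount and episode count are arbitrary assumptions:

```python
import random

N_STATES = 5            # corridor cells 0..4; cell 4 is the goal
ACTIONS = ["left", "right"]

def step(state, action):
    """Environment: move the agent and return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == "left" else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Learn action values (Q) from trial and error with tabular Q-learning."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice(ACTIONS)  # explore by acting randomly
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # Nudge the estimate toward reward + discounted future value.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q

q = train()
# The learned policy: in each non-goal cell, take the action with the highest value.
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

no one tells the agent that "right" is correct; it works that out from the consequences of its own actions, which is the contrast with labeled data.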
what is reinforcement learning from human feedback?
technique that uses human feedback to help an ML model learn more efficiently
by incorporating human feedback in the reward function, the model becomes more aligned with human goals, wants and needs
RLHF is used throughout GenAI applications, like LLMs, since it significantly improves the model’s performance
what is overfitting?
the fit of a model that has low bias and high variance
performs well on training data but poorly on new, real-world data
prevent it by using techniques like cross-validation, regularization, and pruning to simplify the model and improve its generalization - MAKE IT SIMPLER
resource-effective solution: hyperparameter tuning
resource-intensive solution: increase the amount of training data
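a minimal sketch of the data-splitting part of k-fold cross-validation, one of the techniques listed above for catching overfitting. the function name is hypothetical, and training and scoring a model on each split is left out:

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    fold_size, remainder = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # Spread any leftover samples over the first folds.
        stop = start + fold_size + (1 if fold < remainder else 0)
        validation = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, validation
        start = stop

# Every sample is used for validation exactly once across the folds.
for train, validation in k_fold_splits(n_samples=6, k=3):
    print("train:", train, "validate:", validation)
```

a model that scores well on the training indices but much worse on the validation indices, fold after fold, is likely overfitting.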