MACHINE LEARNING Flashcards
terms
Accuracy
How often the model predicts correctly, e.g. out of 100 predictions, 80 are correct:
80/100 = 0.8, or 80%
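A minimal sketch of the same calculation in Python; the toy label lists are made up for illustration:

```python
# Toy accuracy calculation: fraction of predictions that match the true labels
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]

correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)
print(accuracy)  # 0.8 -> 80%
```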
Activation function
A function applied in neural networks to introduce non-linear transformations, enabling the network to model complex relationships. Examples: Sigmoid, ReLU
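A minimal sketch of both example functions in NumPy (the input values are arbitrary):

```python
import numpy as np

def sigmoid(x):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # Keeps positive values, zeroes out negatives
    return np.maximum(0.0, x)

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z))  # approx. [0.119 0.5 0.953]
print(relu(z))     # [0. 0. 3.]
```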
Autoencoder
A type of neural network used for unsupervised learning, mainly for dimensionality reduction and feature learning by encoding inputs to a lower-dimensional space and reconstructing them.
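A minimal sketch in PyTorch, assuming made-up layer sizes (784 -> 32 -> 784) and random input, just to show the encode/decode structure:

```python
import torch
import torch.nn as nn

# Encoder compresses a 784-dim input to a 32-dim code; decoder reconstructs it
encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 32))
decoder = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 784))

x = torch.rand(16, 784)                 # a fake batch of flattened 28x28 images
reconstruction = decoder(encoder(x))
loss = nn.MSELoss()(reconstruction, x)  # training would minimise this reconstruction error
```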
Back-propagation
An algorithm for training neural networks, where errors are propagated backward through the network to adjust weights based on gradients.
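A minimal sketch of one backward pass and weight update for a single neuron, with toy numbers chosen for illustration:

```python
# One gradient step for a single neuron y = w * x, squared-error loss
x, target = 2.0, 10.0
w = 1.0
lr = 0.1

y = w * x                      # forward pass: prediction
error = y - target             # how far off the prediction is
loss = error ** 2

dloss_dy = 2 * error           # backward pass: chain rule
dy_dw = x
dloss_dw = dloss_dy * dy_dw    # gradient of the loss w.r.t. the weight

w = w - lr * dloss_dw          # weight update
print(w)  # 4.2 -- moved toward the ideal weight of 5.0
```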
Bagging
“Bootstrap aggregating” - a technique to reduce variance by training multiple models on random subsets of the data and averaging predictions
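A minimal sketch using scikit-learn's BaggingClassifier; the Iris dataset and 50 estimators are arbitrary choices for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier

X, y = load_iris(return_X_y=True)

# 50 models, each trained on a different bootstrap sample; predictions are combined by voting
model = BaggingClassifier(n_estimators=50, random_state=0)
model.fit(X, y)
print(model.predict(X[:3]))  # predicted classes for the first three samples
```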
Basis function
Functions used in models to transform input data to make it easier to fit linear models.
Bias
The error due to overly simplistic models that do not capture patterns well.
Bias-variance tradeoff
The balance between a model's complexity (variance) and simplicity (bias) to achieve optimal performance
Boosting
An ensemble method that builds models sequentially, each new model correcting the errors of the previous ones, which leads to higher accuracy
Bootstrap Sample
A sample obtained by randomly sampling with replacement from a dataset, often used in bagging and estimation
example (see the sketch after this list):
- draw a ball from a basket and note its color
- put it back and repeat the process
- the point is that you might pick the same ball twice or more
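A minimal sketch of drawing a bootstrap sample with Python's standard library; the "balls" are a made-up list:

```python
import random

data = ["red", "blue", "green", "yellow", "purple"]

# Sampling with replacement: the same item can appear more than once
bootstrap_sample = [random.choice(data) for _ in range(len(data))]
print(bootstrap_sample)  # e.g. ['blue', 'blue', 'green', 'red', 'green']
```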
Classification
A type of supervised learning where the goal is to predict discrete labels
Clustering
An unsupervised learning technique for grouping similar data points together
- the machine finds patterns by itself
- e.g. Netflix groups movies by genre
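A minimal sketch using scikit-learn's KMeans on made-up 2-D points:

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy 2-D points forming two obvious groups
X = np.array([[1, 1], [1.5, 2], [1, 0.5],
              [8, 8], [8.5, 9], [9, 8]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)  # e.g. [0 0 0 1 1 1] -- points grouped by similarity
```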
Complete Linkage
A hierarchical clustering method where the distance between clusters is the maximum distance between any pair of points in the clusters
- biggest distance becomes “the distance between the clusters”
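A minimal sketch using SciPy's hierarchical clustering with complete linkage; the points are made up:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[0, 0], [0, 1], [5, 5], [5, 6]])

# 'complete' linkage merges clusters based on the maximum pairwise distance between them
Z = linkage(X, method="complete")
print(fcluster(Z, t=2, criterion="maxclust"))  # e.g. [1 1 2 2]
```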
Confusion Matrix
A table used to evaluate a classification model’s performance by comparing predicted and actual values.
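A minimal sketch using scikit-learn; the label lists are made up:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]

# Rows are actual classes, columns are predicted classes
print(confusion_matrix(y_true, y_pred))
# [[2 0]
#  [1 3]]
```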
Cross-validation
A technique for assessing model performance (see the sketch after this list):
- divide the data into subsets (folds)
- train on some folds
- test on the others to reduce overfitting
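A minimal sketch using scikit-learn's cross_val_score; the Iris dataset and logistic regression model are arbitrary illustration choices:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# 5-fold cross-validation: train on 4 folds, test on the held-out fold, repeat 5 times
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores.mean())  # average accuracy across the 5 folds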
Curse of dimensionality
A problem where data becomes sparse in high-dimensional spaces
- this sparsity makes it harder for models to find patterns
Deep Learning
A subset of machine learning that uses neural networks with many layers to model complex data patterns
- uses many layers
- the more layers, the more complex patterns the model can learn
Decision Boundary
The boundary that separates different classes in a classifier
Classifier
An algorithm trained on labeled data
- employs mathematical and statistical methods to generate predictions
Decision Tree
A model that splits data based on feature values to predict the target label, forming a tree structure.
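A minimal sketch using scikit-learn's DecisionTreeClassifier on the Iris dataset (an arbitrary illustration choice):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# The tree repeatedly splits on feature thresholds to separate the classes
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(tree.predict(X[:3]))  # predicted class labels for the first three flowers
```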
Ensemble Methods
Techniques that combine multiple models to improve predictions, like bagging, boosting, and stacking
Entropy
A measure of uncertainty or disorder in data, used in decision trees to determine the best splits
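A minimal sketch of the entropy formula H = -sum(p * log2(p)) in plain Python; the probability lists are made up:

```python
import math

def entropy(probabilities):
    # H = -sum(p * log2(p)); higher means more uncertainty/disorder
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))  # 1.0   -> maximally uncertain two-class split
print(entropy([0.9, 0.1]))  # ~0.47 -> mostly pure, low uncertainty
```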
Fairness
Ensuring that models do not discriminate or create biases against certain groups
F1 Score
The harmonic mean of precision and recall, providing a balance between the two
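A minimal sketch of the calculation with made-up counts:

```python
# Toy counts: 8 true positives, 2 false positives, 4 false negatives
tp, fp, fn = 8, 2, 4

precision = tp / (tp + fp)  # 0.8
recall = tp / (tp + fn)     # ~0.667
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))  # 0.727
```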