Basic Definitions Flashcards
Unsupervised learning
A type of machine learning in which the algorithm is trained on unlabeled data and must discover structure in the inputs (such as clusters or patterns) on its own.
Clustering
The task of grouping similar data points together so that points within a cluster are more similar to one another than to points in other clusters; a common form of unsupervised learning.
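For example, a minimal clustering sketch using k-means from scikit-learn (the generated blobs and the choice of 3 clusters are illustrative assumptions, not part of the definition):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Three blobs of points around different centers
X = np.vstack([
    rng.normal(loc=(0, 0), scale=0.5, size=(50, 2)),
    rng.normal(loc=(5, 5), scale=0.5, size=(50, 2)),
    rng.normal(loc=(0, 5), scale=0.5, size=(50, 2)),
])

# Group the points into 3 clusters based on distance to the cluster centers
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:10])       # cluster assignment of the first 10 points
print(kmeans.cluster_centers_)   # learned cluster centers
```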
Supervised learning
A type of machine learning in which the algorithm is trained on a labeled dataset, learning a mapping from inputs to known outputs so it can predict the labels of new, unseen data.
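A minimal supervised-learning sketch with scikit-learn: fit a classifier on labeled examples, then score it on held-out data (the iris dataset and logistic regression are illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)                  # features X, labels y
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(clf.score(X_test, y_test))                   # accuracy on unseen labeled data
```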
Reinforcement Learning
A type of machine learning where an agent interacts with an environment by performing actions and receives rewards or penalties. The agent’s goal is to learn a policy that maximizes the cumulative reward over time
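As an illustration, the sketch below uses tabular Q-learning, one common RL algorithm, on a made-up five-state chain environment; the environment, reward values, and hyperparameters are assumptions for demonstration only:

```python
import numpy as np

n_states, n_actions = 5, 2      # actions: 0 = move left, 1 = move right
goal = n_states - 1             # reaching the right end ends the episode with reward 1

def step(state, action):
    """Toy environment: the agent moves along a chain and is rewarded at the goal."""
    next_state = min(state + 1, goal) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == goal else 0.0
    return next_state, reward, next_state == goal

rng = np.random.default_rng(0)
Q = np.zeros((n_states, n_actions))        # estimated value of each (state, action) pair
alpha, gamma, epsilon = 0.1, 0.9, 0.1      # learning rate, discount factor, exploration rate

for _ in range(500):                       # episodes of interaction with the environment
    state, done = 0, False
    for _ in range(100):                   # cap episode length
        # epsilon-greedy policy: explore at random, otherwise act greedily on Q
        if rng.random() < epsilon or Q[state, 0] == Q[state, 1]:
            action = int(rng.integers(n_actions))
        else:
            action = int(Q[state].argmax())
        next_state, reward, done = step(state, action)
        # Q-learning update: move Q(s, a) toward reward + discounted best future value
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
        state = next_state
        if done:
            break

print(Q.argmax(axis=1)[:goal])   # learned greedy policy: move right (1) in every non-terminal state
```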
Overfitting
A situation where a machine learning model performs well on the training data but poorly on unseen test data because it has learned noise and irrelevant patterns in the training data.
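A small sketch of overfitting, assuming scikit-learn and the breast-cancer dataset as illustrative choices: an unconstrained decision tree memorizes the training set but does worse on held-out data:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print(tree.score(X_train, y_train))   # typically 1.0: noise in the training set is memorized
print(tree.score(X_test, y_test))     # noticeably lower on unseen data
```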
Underfitting
A situation where a machine learning model is too simple to capture the underlying pattern of the data, resulting in poor performance on both the training and test datasets.
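A small sketch of underfitting under similar illustrative assumptions: a depth-1 decision tree ("stump") is too simple for the 10-class digits dataset, so both training and test accuracy stay low:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stump = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X_train, y_train)
print(stump.score(X_train, y_train))   # low even on the training data
print(stump.score(X_test, y_test))     # similarly low on unseen data
```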
Cross-Validation
A technique for evaluating a machine learning model by dividing the dataset into multiple subsets (folds), training the model on some folds, and validating it on the remaining fold. The process is repeated with a different held-out fold each time, so the performance estimate does not depend on a single train/test split.
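A minimal cross-validation sketch with scikit-learn (the model, dataset, and the choice of 5 folds are illustrative assumptions):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Split the data into 5 folds; train on 4 and validate on the held-out fold,
# rotating the held-out fold so every sample is used for validation once.
scores = cross_val_score(model, X, y, cv=5)
print(scores, scores.mean())
```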
Hyperparameters
Parameters whose values are set before the learning process begins and control the behavior of the learning algorithm. Unlike model parameters, hyperparameters are not learned from the data.
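A short sketch of the distinction, using a decision tree as an illustrative model: the constructor arguments are hyperparameters set before training, while the tree's split thresholds are parameters learned from the data:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Hyperparameters: chosen before learning starts and passed to the constructor.
model = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5)

# Model parameters (the tree's splits and thresholds) are learned during fit().
model.fit(X, y)
print(model.get_depth(), model.tree_.node_count)
```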
Neural Network
A machine learning model inspired by the human brain’s structure, consisting of layers of interconnected nodes (neurons) that process input data to produce output
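A minimal neural-network sketch: a small multilayer perceptron trained with scikit-learn (the layer sizes and the digits dataset are illustrative assumptions):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Two hidden layers of interconnected neurons transform the inputs into class outputs.
mlp = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=1000, random_state=0)
mlp.fit(X_train, y_train)
print(mlp.score(X_test, y_test))
```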
Gradient Descent
An optimization algorithm used to minimize the loss function in machine learning models by iteratively adjusting the model parameters in the direction that reduces the error.
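A minimal gradient-descent sketch in NumPy: fitting a line y ≈ w*x + b by repeatedly stepping the parameters against the gradient of a mean-squared-error loss (the synthetic data, learning rate, and iteration count are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 3.0 * x + 2.0 + rng.normal(scale=0.1, size=100)   # noisy line

w, b = 0.0, 0.0
learning_rate = 0.1
for _ in range(500):
    y_pred = w * x + b
    error = y_pred - y
    grad_w = 2 * np.mean(error * x)   # d(loss)/dw for the MSE loss
    grad_b = 2 * np.mean(error)       # d(loss)/db
    w -= learning_rate * grad_w       # step in the direction that reduces the error
    b -= learning_rate * grad_b

print(w, b)   # should end up close to 3.0 and 2.0
```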
Feature Engineering
The process of selecting, modifying, or creating new features from raw data to improve the performance of a machine learning model
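A small feature-engineering sketch with pandas; the column names and derived features are made-up illustrations, not a prescribed recipe:

```python
import numpy as np
import pandas as pd

raw = pd.DataFrame({
    "signup_date": pd.to_datetime(["2023-01-05", "2023-03-17", "2023-07-30"]),
    "total_spent": [120.0, 0.0, 87.5],
    "num_orders": [4, 0, 3],
})

features = pd.DataFrame({
    "signup_month": raw["signup_date"].dt.month,                                   # extracted calendar feature
    "avg_order_value": raw["total_spent"] / raw["num_orders"].replace(0, np.nan),  # derived ratio
    "is_active": (raw["num_orders"] > 0).astype(int),                              # binary flag
})
print(features)
```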
Confusion Matrix
A table used to evaluate the performance of a classification model, showing the actual vs. predicted classifications, including true positives, false positives, true negatives, and false negatives
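A minimal confusion-matrix sketch for a binary classifier (the label vectors are made-up illustrative data); scikit-learn places actual classes in rows and predicted classes in columns:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows are actual classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))
```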
k-Nearest Neighbors (k-NN)
k-Nearest Neighbors is an instance-based learning method in which the class of a new instance is determined by a majority vote of its ‘k’ closest neighbors in the training dataset. The distance between instances is typically measured with metrics such as Euclidean or Manhattan distance, depending on the feature types.
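A compact k-NN sketch in NumPy that classifies a query point by a majority vote of its k nearest training points under Euclidean distance (the toy data and k = 3 are illustrative assumptions):

```python
import numpy as np
from collections import Counter

X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [5.0, 5.0], [5.2, 4.8], [4.9, 5.1]])
y_train = np.array([0, 0, 0, 1, 1, 1])

def knn_predict(query, k=3):
    distances = np.linalg.norm(X_train - query, axis=1)    # Euclidean distance to each stored instance
    nearest = np.argsort(distances)[:k]                    # indices of the k closest neighbors
    return Counter(y_train[nearest]).most_common(1)[0][0]  # majority vote among their labels

print(knn_predict(np.array([1.1, 0.9])))   # expected: class 0
print(knn_predict(np.array([5.1, 5.0])))   # expected: class 1
```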
k-NN: Lazy learning
k-NN is a lazy learner: it does not build an explicit model during training, but simply stores the training instances and defers computation to prediction time, when it measures the distance from the query point to the stored instances.
k-NN: Decision surface
The decision boundary of k-NN is often irregular and heavily influenced by the choice of ‘k’: a small ‘k’ produces a jagged boundary that follows individual training points (low bias, high variance), while a larger ‘k’ smooths the boundary at the cost of more bias.