Basic Definitions Flashcards

1
Q

What is unsupervised learning?

A

An algorithm is trained on unlabeled data and must find structure in the data on its own.

2
Q

Clustering

A

Grouping similar data points together, so that points in the same group (cluster) are more alike than points in different groups.

3
Q

Supervised learning

A

An algorithm is trained on a labeled dataset, where each example pairs input features with a known target.

4
Q

Reinforcement Learning

A

A type of machine learning where an agent interacts with an environment by performing actions and receives rewards or penalties. The agent’s goal is to learn a policy that maximizes the cumulative reward over time.

5
Q

Overfitting

A

A situation where a machine learning model performs well on the training data but poorly on unseen test data because it has learned noise and irrelevant patterns in the training data.

6
Q

Underfitting

A

A situation where a machine learning model is too simple to capture the underlying pattern of the data, resulting in poor performance on both the training and test datasets.

7
Q

Cross-Validation

A

A technique for evaluating a machine learning model by dividing the dataset into multiple subsets, training the model on some subsets, and validating it on the remaining subset. This process is repeated several times, with a different subset held out each time, to check that the model’s performance is robust.
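
The splitting step can be sketched in plain Python. This is a minimal illustration of k-fold cross-validation, assuming contiguous folds and no shuffling; the function names `k_fold_indices` and `cross_validation_splits` are hypothetical, not from any library.

```python
def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k nearly equal, contiguous folds."""
    # The first (n_samples % k) folds get one extra element.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validation_splits(n_samples, k):
    """Yield (train_indices, validation_indices), one pair per fold."""
    folds = k_fold_indices(n_samples, k)
    for i, validation in enumerate(folds):
        # Every fold except the i-th is used for training.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


splits = list(cross_validation_splits(n_samples=6, k=3))
for train, validation in splits:
    print("train:", train, "validate:", validation)
```

Each example lands in the validation subset exactly once. In practice the data is usually shuffled before splitting.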

8
Q

Hyperparameters

A

Parameters whose values are set before the learning process begins and control the behavior of the learning algorithm. Unlike model parameters, hyperparameters are not learned from the data.

9
Q

Neural Network

A

A machine learning model inspired by the human brain’s structure, consisting of layers of interconnected nodes (neurons) that process input data to produce output

10
Q

Gradient Descent

A

An optimization algorithm used to minimize the loss function in machine learning models by iteratively adjusting the model parameters in the direction of the negative gradient, the direction in which the loss decreases fastest.
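
As a rough sketch of the update rule, the example below minimizes a one-parameter loss, (w - 3)^2, whose minimum is at w = 3. The loss, the learning rate of 0.1, and the step count are illustrative assumptions.

```python
def gradient_descent(gradient, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from `start`."""
    w = start
    for _ in range(steps):
        w -= learning_rate * gradient(w)  # move downhill
    return w


def loss(w):
    # Toy loss with its minimum at w = 3.
    return (w - 3.0) ** 2


def loss_gradient(w):
    # Derivative of (w - 3)^2 with respect to w.
    return 2.0 * (w - 3.0)


w_min = gradient_descent(loss_gradient, start=0.0)
print(w_min, loss(w_min))
```

A learning rate that is too large can overshoot the minimum and diverge; one that is too small converges slowly.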

11
Q

Feature Engineering

A

The process of selecting, modifying, or creating new features from raw data to improve the performance of a machine learning model

12
Q

Confusion Matrix

A

A table used to evaluate the performance of a classification model, showing the actual vs. predicted classifications, including true positives, false positives, true negatives, and false negatives
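
For a binary classifier, the four cells can be counted directly. The sketch below assumes labels are 0 and 1 with 1 as the positive class; the function name `confusion_counts` and the sample labels are made up for illustration.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            # Predicted positive: correct if the actual label is positive too.
            counts["TP" if a == positive else "FP"] += 1
        else:
            # Predicted negative: a miss if the actual label was positive.
            counts["FN" if a == positive else "TN"] += 1
    return counts


actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]
counts = confusion_counts(actual, predicted)
accuracy = (counts["TP"] + counts["TN"]) / len(actual)
print(counts, accuracy)
```

Metrics such as accuracy, precision, and recall are all computed from these four counts.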

13
Q

Nearest Neighbors (k-NN)

A

Nearest Neighbors is an instance-based learning method where the classification of a new instance is determined by the majority vote of its ‘k’ closest neighbors from the training dataset. The distance between instances is typically measured using metrics such as Euclidean distance, Manhattan distance, or others, depending on the feature types.
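
The majority vote can be sketched with the standard library alone. This example assumes Euclidean distance and numeric features; `knn_predict` and the toy data are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training point's distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_points, train_labels, query=(2, 2)))
print(knn_predict(train_points, train_labels, query=(6, 5)))
```

Note that no training step happens here: all the work is done at query time, by scanning the stored points.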

14
Q

k-NN: Lazy Learning

A

k-NN is a lazy learner, meaning it doesn’t learn a model explicitly but rather computes results based on the distance from the query point to the stored instances.

15
Q

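k-NN: Decision Surface

A

The decision boundary of k-NN is often very irregular and heavily influenced by the choice of ‘k’: a small ‘k’ gives a jagged boundary that follows individual points, while a larger ‘k’ gives a smoother one.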

16
Q

k-NN: Challenges

A

k-NN is sensitive to the choice of distance metrics and can be computationally expensive, especially in high-dimensional spaces.

17
Q

Regression Models

A

Predict continuous numeric values

18
Q

Classification models

A

Predict discrete categories (class labels)

19
Q

Two types of supervised learning models

A

Classification and regression

20
Q

Training Set

A

data used to train the model (features and targets)

21
Q

X

A

Input variable (feature)

22
Q

Lowercase y

A

Output variable, or target variable

23
Q

m

A

number of training examples

24
Q

(X,y)

A

single training example

25
Q

(X^(i), y^(i))

A

The i-th training example. The superscript (i) is an index into the training set, not an exponent.

26
Q

linear function

A

A function whose graph is a straight line

27
Q

univariate linear regression

A

Linear regression with a single input variable (one feature)

28
Q

Cost function

A

The cost function (also known as the loss function or error function) is a mathematical function that quantifies the “error” between a model’s predictions and the actual, expected outcomes.
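
One common choice of cost function is the mean squared error. The sketch below computes it for a one-variable line; the parameter names `w` (slope) and `b` (intercept) and the sample data are assumptions for illustration, and some courses divide by 2m instead of m.

```python
def predict(x, w, b):
    """One-variable linear model: a line with slope w and intercept b."""
    return w * x + b


def mean_squared_error(xs, ys, w, b):
    """Average squared difference between predictions and targets."""
    m = len(xs)  # number of training examples
    return sum((predict(x, w, b) - y) ** 2 for x, y in zip(xs, ys)) / m


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # generated by y = 2x

perfect_fit = mean_squared_error(xs, ys, w=2.0, b=0.0)
worse_fit = mean_squared_error(xs, ys, w=1.0, b=0.0)
print(perfect_fit, worse_fit)
```

The line that matches the data has zero cost, and the cost grows as the line moves away from the data. Training searches for the parameters with the lowest cost.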

29
Q

fitting or training

A

Capturing patterns from data by adjusting the model’s parameters

30
Q

Univariate Feature Selection

A

This involves analyzing each feature individually and assessing its importance. Methods like the Pearson correlation coefficient or the chi-squared test can help identify features with strong relationships to the target variable.
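
As an illustration, each feature can be scored by the absolute value of its Pearson correlation with the target. The feature names and values below are made up, and the helper `pearson` is a hand-written sketch rather than a library call.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    # Assumes neither sequence is constant (otherwise this divides by zero).
    return covariance / (spread_x * spread_y)


features = {
    "size": [1.0, 2.0, 3.0, 4.0, 5.0],   # moves in step with the target
    "noise": [2.0, 1.0, 2.0, 1.0, 2.0],  # unrelated to the target
}
target = [2.0, 4.0, 6.0, 8.0, 10.0]

scores = {name: pearson(values, target) for name, values in features.items()}
ranked = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)
print(scores, ranked)
```

Pearson correlation only measures linear relationships, so a feature with a strong non-linear link to the target can still score near zero.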

31
Q

Embedded Methods

A

Some machine learning algorithms, like Lasso regression, have built-in feature selection mechanisms. The algorithm assigns weights to features during training, and features whose weights shrink to zero (or close to it) can be dropped.

32
Q

Wrapper Methods

A

These methods use the machine learning algorithm itself to evaluate the importance of features. Techniques like recursive feature elimination (RFE) iteratively remove features and measure how the model’s performance changes.
