Machine Learning Flashcards

1
Q

What is machine learning?

A
  • Machine learning is a type of artificial intelligence that allows computer programs to become more accurate at making predictions without being explicitly programmed to do so.
  • Machine learning algorithms use historical data as input to predict new output values.
  • It helps us predict future outcomes or classify information to make decisions.
2
Q

What is unsupervised learning? Give an example

A
  • Algorithms that train on unlabeled data.
  • The algorithm scans through data sets looking for connections and trends.

It adds structure to the data in the form of clustering or grouping.

Example: market segmentation, where users are clustered into groups based on their previous purchases, viewing patterns, etc. These groups can feed into recommender systems.
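
The market segmentation example can be sketched with k-means clustering in scikit-learn. The customer figures below are made up for illustration, and k-means is only one of several clustering algorithms that could be used here.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customers: [orders per month, average spend per order]
customers = np.array([
    [1, 20], [2, 25], [1, 22],        # occasional, low-spend shoppers
    [10, 200], [12, 220], [11, 210],  # frequent, high-spend shoppers
])

# No labels are supplied; the algorithm looks for the two groups on its own
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
segments = kmeans.fit_predict(customers)

for customer, segment in zip(customers, segments):
    print(customer, "-> segment", segment)
```

The segment numbers themselves are arbitrary; what matters is which customers end up sharing a segment.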

3
Q

What is artificial intelligence? What are some examples of artificial intelligence?

A

Artificial intelligence is a branch of computer science concerned with building programs that can perform tasks that usually require human intelligence, and that can iteratively improve themselves based on the information they collect.

Two major subfields of artificial intelligence are:

Machine Learning - algorithms allow computers to learn a task with minimal instructions and improve with experience

eg - recommendation engines

Deep Learning - a subset of machine learning that processes richer datasets with less preprocessing. It uses artificial neural networks, algorithms inspired by the human brain, which learn from large amounts of data. A deep learning algorithm performs a task repeatedly, each time tweaking it a little to improve the outcome.

eg - image recognition
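
The "perform the task, tweak, repeat" idea can be sketched with gradient descent on a single weight. This is a deliberately tiny stand-in, assuming made-up data that follows y = 2 * x; a real neural network adjusts many weights at once in a similar loop.

```python
# Toy data that follows y = 2 * x (an assumption for illustration)
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

weight = 0.0          # initial guess
learning_rate = 0.01  # size of each tweak

for step in range(200):
    # Gradient of the mean squared error with respect to the weight
    gradient = sum(2 * (weight * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    # Nudge the weight in the direction that reduces the error
    weight -= learning_rate * gradient

print(round(weight, 4))
```

Each pass makes a small correction, so the weight moves steadily toward the value that best explains the data.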

4
Q

ML model fit

A

Underfitting - the model has not learned enough from the training data, resulting in poor generalisation and unreliable predictions. The model is too simple to capture the underlying pattern.

Overfitting - the model fits the training data too closely, including its noise, resulting in poor generalisation. It will underperform when it sees new data. This typically happens when the model is too complex for the amount of training data available.

Balanced - good generalisation, so the model can infer conclusions with new data.
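
One way to see the three cases is to fit polynomials of increasing degree to the same noisy data and compare the error on training data with the error on unseen data. The sketch below uses NumPy only; the sine-shaped data, the noise level and the degrees 1, 3 and 12 are assumptions chosen for illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def make_data(n):
    """Noisy samples from one period of a sine wave."""
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, n)
    return x, y

x_train, y_train = make_data(20)   # small training set
x_test, y_test = make_data(200)    # unseen data

def mse(model, x, y):
    """Mean squared error of the model's predictions."""
    return float(np.mean((model(x) - y) ** 2))

results = {}
for degree in (1, 3, 12):
    model = Polynomial.fit(x_train, y_train, degree)
    results[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    print(f"degree {degree}: train MSE {results[degree][0]:.3f}, "
          f"test MSE {results[degree][1]:.3f}")
```

A straight line (degree 1) is too simple for a curve, so it tends to do poorly on both sets. A high degree drives the training error down, and a large gap between its training and test error is the usual sign of overfitting.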

5
Q

Testing and training data

A

Raw data is split into two sets, training and testing. The training set is used to develop the model; the testing set is held back and used to evaluate the model's performance on data it has not seen. Common train:test ratios are 90:10, 80:20 and 70:30.
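
An 80:20 split can be sketched with scikit-learn's train_test_split. The arrays here are placeholder data; random_state is set only so the shuffle is repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder dataset: 100 samples with 2 features each, plus a label per sample
X = np.arange(200).reshape(100, 2)
y = np.arange(100) % 2

# test_size=0.2 holds back 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(len(X_train), len(X_test))  # 80 20
```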

6
Q

What is Scikit-Learn?

A

A powerful library for machine learning in Python. It contains tools for machine learning and statistical modelling, including:
- classification
- regression
- clustering
- dimensionality reduction

7
Q

What are the two categories of supervised learning?

A

In supervised learning, algorithms learn from labeled data. After learning the patterns that link inputs to labels, the algorithm predicts the label for new, unlabeled data. The two categories are classification and regression.

Classification

Classification is a process of building a model which can divide the dataset into classes based on different parameters. The program is trained on a training dataset and based on that training, it categorizes data into different classes.

The task of the classification algorithm is to find the mapping function from the input (x) to the discrete output (y).

Eg - spam detection

Regression

Regression is a process of modelling the relationship between a dependent variable and one or more independent variables. It is used to predict continuous values, such as market trends or house prices.

The task of the regression algorithm is to find the mapping function from the input variable (x) to the continuous output variable (y).

eg - weather forecasting
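
The difference in output type can be sketched with two scikit-learn models. Both datasets are made up for illustration: emails described by a single hypothetical feature (number of suspicious words), and houses whose price follows price = 3 * area + 2 exactly.

```python
from sklearn.linear_model import LinearRegression, LogisticRegression

# Classification: suspicious-word count -> spam (1) or not spam (0)
word_counts = [[0], [1], [1], [2], [6], [7], [8], [9]]
is_spam = [0, 0, 0, 0, 1, 1, 1, 1]
classifier = LogisticRegression().fit(word_counts, is_spam)
predicted_labels = classifier.predict([[0], [8]])  # discrete classes
print(predicted_labels)

# Regression: floor area -> price (toy data on an exact straight line)
areas = [[10], [20], [30], [40], [50]]
prices = [32, 62, 92, 122, 152]
regressor = LinearRegression().fit(areas, prices)
predicted_price = regressor.predict([[60]])[0]     # continuous value
print(round(predicted_price, 2))
```

The classifier can only ever answer with one of the classes it was trained on, while the regressor can return any number.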

8
Q

What is a correlation coefficient?

A

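A value between -1 and +1 indicating the strength and direction of the linear relationship between two variables. -1 = strong negative relationship, 0 = no linear relationship, +1 = strong positive relationship

df.corr() - pandas method that computes the pairwise correlation matrix
sns.heatmap(df.corr()) - seaborn function that plots that matrix as a heatmap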
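
A minimal sketch of computing correlation coefficients with pandas follows. The column names and values are invented, and chosen so that one pair is perfectly positively correlated and another perfectly negatively correlated.

```python
import pandas as pd

# Hypothetical student data
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5],
    "exam_score":    [52, 54, 56, 58, 60],  # rises in step with hours_studied
    "hours_gaming":  [10, 8, 6, 4, 2],      # falls in step with hours_studied
})

# Pairwise Pearson correlation between every pair of columns
corr = df.corr()
print(corr)

# Optional plot, assuming seaborn is installed and imported as sns:
# sns.heatmap(corr, annot=True)
```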

9
Q

Define logistic regression, decision tree and random forest

A

Logistic Regression: a supervised learning method used to predict the probability of a binary (yes/no) event occurring.

Decision Trees: a supervised learning method in which the data is repeatedly split according to the value of a chosen feature, forming a tree of decisions that ends in a prediction.

Random Forest: an extension of the decision tree that trains many trees, each on a random sample of the data, and combines their results (a majority vote for classification, an average for regression).
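
The three models share the same fit/score interface in scikit-learn, so they can be compared side by side. The sketch below uses a synthetic dataset from make_classification; the sample sizes and hyperparameters are arbitrary choices, and the accuracies it prints say nothing about how the models rank on real data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification problem (not real data)
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                 # learn from labeled training data
    scores[name] = model.score(X_test, y_test)  # accuracy on unseen data
    print(f"{name}: {scores[name]:.2f}")
```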
