Intro To Machine Learning Flashcards

1
Q

What is machine learning?

A

A field of study that gives the computer the ability to learn from data without being explicitly programmed

2
Q

What is Scikit-learn?

A

A general-purpose machine learning library for Python, and probably the most widely used one

3
Q

What is XGBoost?

A

A library that implements gradient boosting (an advanced machine learning algorithm). It is commonly used in machine learning, including in industry

4
Q

What is LightGBM?

A

Another library that implements gradient boosting, similar to XGBoost

5
Q

What is NLTK (Natural Language Toolkit)?

A

Among other things, this is a toolkit that helps in understanding text and enhancing some machine learning models

6
Q

What is TensorFlow?

A

A deep learning framework that allows complete customization of deep learning algorithms but has a steep learning curve. It can also do traditional machine learning, but that requires a lot of knowledge of how the individual machine learning algorithms work

7
Q

What is PyTorch?

A

A deep learning framework that feels familiar to most Python developers. Its interface is very similar to NumPy's, so it can act as a replacement for NumPy and Python developers can migrate to it relatively easily. It can also do traditional machine learning, but that requires a lot of knowledge of how the individual machine learning algorithms work, though less than TensorFlow

8
Q

What is Keras?

A

A deep learning framework. You can think of it as a high-level wrapper around other deep learning frameworks, similar to how seaborn is a wrapper around matplotlib

9
Q

What are the two broad types of machine learning?

A

Supervised learning and unsupervised learning

10
Q

What is supervised learning?

A

The most common form of machine learning is supervised learning. In scikit-learn, a supervised learning algorithm learns the relationship between your features matrix and your target vector in order to make predictions
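
A minimal sketch of that fit-then-predict pattern, assuming scikit-learn's LinearRegression as the algorithm. The card names no specific model, and the data below is made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Features matrix X: one row per example, one column per feature (made-up data).
X = np.array([[1.0], [2.0], [3.0], [4.0]])
# Target vector y: the known answer for each row (here y = 2 * x + 1).
y = np.array([3.0, 5.0, 7.0, 9.0])

model = LinearRegression()
model.fit(X, y)  # learn the relationship between X and y

# Predict the target for an example the model has not seen.
prediction = model.predict(np.array([[5.0]]))
print(prediction)
```

The same fit/predict calls apply to most scikit-learn supervised estimators; only the model class changes.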

11
Q

What are a features matrix and a target vector?

A

A feature is just one property of the data, represented as a column; the features matrix is the collection of those columns. The target is the column of the dataset you want to make predictions for; the target vector holds its values

12
Q

What is Regression?

A

Predicting a continuous value is a regression problem. This means that your target vector contains continuous quantities, like home prices

13
Q

What is Classification?

A

Predicting a categorical value is a classification problem. This means that your target vector contains categories, like different flower species
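
A small sketch of a classification problem, assuming scikit-learn's DecisionTreeClassifier as the model. The measurements and species labels are hypothetical examples, not data from the deck.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical flower measurements: [petal_length_cm, petal_width_cm]
X = [[1.4, 0.2], [1.5, 0.2], [4.7, 1.4], [4.5, 1.5]]
# Categorical target: one species label per row.
y = ["setosa", "setosa", "versicolor", "versicolor"]

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)

# The model predicts a category, not a number.
predicted_species = clf.predict([[1.3, 0.3], [4.6, 1.3]])
print(predicted_species)
```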

14
Q

What is unsupervised learning?

A

In machine learning, you aren’t always trying to predict a value. Sometimes your goal is to find some structure in your dataset. Unsupervised learning is when you train an algorithm without giving it the answers for the examples in your dataset. There is no target vector
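
As one illustration, clustering is a common unsupervised task. This sketch assumes scikit-learn's KMeans and made-up 2-D points; note that only X is passed in, with no target vector.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up points forming two visibly separate groups.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)  # no y: the algorithm finds the groups itself
print(labels)
```

The cluster numbers themselves are arbitrary; what matters is which points end up sharing a label.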

15
Q

What is Target (y)?

A

The target is the column we are trying to predict. In this case the “charges” column is the target.

16
Q

What are Features (X)?

A

The features are the columns we will use to make the prediction. In this case the other columns (“age”, “sex”, “bmi”, “children”, “smoker”, “region”) are the features
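
A sketch of splitting a table into X and y with pandas. The column names come from the cards; the rows are invented for illustration, since the deck does not include the dataset itself.

```python
import pandas as pd

# Invented rows; only the column names are taken from the cards.
df = pd.DataFrame({
    "age": [19, 33, 45],
    "sex": ["female", "male", "male"],
    "bmi": [27.9, 22.7, 30.1],
    "children": [0, 1, 2],
    "smoker": ["yes", "no", "no"],
    "region": ["southwest", "northwest", "southeast"],
    "charges": [16884.92, 21984.47, 7000.00],
})

y = df["charges"]               # target vector: the column to predict
X = df.drop(columns="charges")  # features matrix: every other column
print(list(X.columns))
```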

17
Q

What is model validation?

A

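The goal of supervised learning is to build a model that performs well on new data. Model validation estimates that performance by evaluating the model on data it was not trained on. The model validation procedure used in this section is called a train-test split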

18
Q

What is the default training and testing split?

A

In scikit-learn's train_test_split, the default split sends 75% of the data to your training set and 25% to your test set
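
A quick check of that default, assuming scikit-learn's train_test_split with no test_size argument. The arrays are placeholder data, and random_state is set only to make the shuffle repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 100 rows, 2 feature columns.
X = np.arange(200).reshape(100, 2)
y = np.arange(100)

# No test_size given, so the default split is used.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
print(len(X_train), len(X_test))  # 75 25
```

Passing test_size (for example test_size=0.2) overrides the default.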

19
Q

What is data leakage?

A

It is important to use information from the testing data only when we are actually ready to test our model. If we use the testing data, or any set that contains testing data, while preparing or training the model, it causes data leakage and invalidates the final evaluation of our model
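
One common way to avoid leakage during preprocessing, sketched with scikit-learn's StandardScaler as an assumed example: split first, fit the scaler on the training rows only, then reuse it on the test rows. The data is randomly generated for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Randomly generated placeholder data.
rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(100, 1))
y = rng.integers(0, 2, size=100)

# Split BEFORE computing any statistics.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # mean/std come from training rows only
X_test_scaled = scaler.transform(X_test)        # reuse them; never fit on test rows
```

Calling fit or fit_transform on the full dataset before splitting would let test-set statistics influence training, which is the leakage the card describes.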

20
Q

What are numeric features?

A

Numeric features should be integers or floats. If a numeric column is stored as an object (for example, because of a dollar sign, as in $3.00), it must be converted. Examples: price, mpg, number of rooms
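
A sketch of that conversion with pandas, assuming the values are strings with a leading dollar sign and optional thousands separators. The sample prices are made up.

```python
import pandas as pd

# Made-up prices stored as text because of the "$" and "," characters.
prices = pd.Series(["$3.00", "$12.50", "$1,200.00"])

cleaned = (
    prices.str.replace("$", "", regex=False)  # drop the currency symbol
          .str.replace(",", "", regex=False)  # drop thousands separators
          .astype(float)                      # now a true numeric column
)
print(cleaned.tolist())
```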

21
Q

What is an Ordinal feature?

A

They can be strings or integers that represent ordered classes. Examples: low, medium, high; or one star, two stars, three stars.
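
A sketch of encoding an ordinal feature so the order is preserved, using a plain dictionary mapping in pandas. The integer codes 0, 1, 2 are an assumption; any increasing values would keep the order.

```python
import pandas as pd

levels = pd.Series(["low", "high", "medium", "low"])

# Hypothetical mapping: the integers only need to respect the order.
order = {"low": 0, "medium": 1, "high": 2}
encoded = levels.map(order)
print(encoded.tolist())
```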

22
Q

What are nominal features?

A

They are categories that represent different classes with no inherent order. Examples: male, female; or red, green, blue, yellow

23
Q

What is Scaling data (Scale)?

A

Generally means changing the range of the values. The shape of the distribution doesn’t change. Think about how a scale model of a building has the same proportions as the original, just smaller. That’s why we say something is drawn to scale
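
As one example of scaling, this sketch uses min-max scaling in NumPy to squeeze values into the range 0 to 1. The card does not name a specific method, so the choice of min-max is an assumption.

```python
import numpy as np

values = np.array([10.0, 20.0, 25.0, 40.0])

# Min-max scaling: shift so the minimum is 0, then divide by the range.
scaled = (values - values.min()) / (values.max() - values.min())
print(scaled)
```

The relative spacing between points is preserved, which is what "the shape doesn't change" means: only the range is different.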

24
Q

What is standardization (standardize)?

A

Standardizing is one of several kinds of scaling. It means scaling the values so that the distribution has a mean of 0 and a standard deviation of 1. It does not change the shape of the distribution, so the result resembles a standard normal distribution only if the original data was roughly normal

25
Q

How is standardization calculated?

A

standardized_feature = (feature - mean_of_feature) / std_dev_of_feature
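
The formula worked through in NumPy on a small made-up feature. The variable names mirror the formula; using the population standard deviation (ddof=0) is an assumption that matches what scikit-learn's StandardScaler does.

```python
import numpy as np

feature = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mean_of_feature = feature.mean()    # 5.0
std_dev_of_feature = feature.std()  # 2.0 (population std, ddof=0)

standardized_feature = (feature - mean_of_feature) / std_dev_of_feature
print(standardized_feature)
```

After the transformation the feature has mean 0 and standard deviation 1.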

26
Q

What is One-hot-Encoding?

A

One-hot encoding converts a categorical column into one binary (0/1) column per category. It does not capture the meaning of the words. The computer does not know what blue looks like, but it can still find relationships between the color and other variables
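
A sketch of one-hot encoding a color column, assuming pandas' get_dummies as the tool (scikit-learn's OneHotEncoder is another option). The colors are made-up sample values.

```python
import pandas as pd

colors = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# One new 0/1 column per category; exactly one of them is 1 in each row.
encoded = pd.get_dummies(colors, columns=["color"], dtype=int)
print(encoded)
```

Each row has a single 1 marking its category, so no ordering between colors is implied.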

27
Q

What is Column Transformer?

A

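A column transformer applies different transformations to different columns in parallel and combines the results. Here it scales the numerical data and one-hot encodes the categorical data; with remainder="passthrough", every column that is not in one of those groups is passed through to the final dataset unchanged (by default, scikit-learn drops those columns)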
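
A minimal sketch with scikit-learn's ColumnTransformer. The DataFrame, the step names "num" and "cat", and the customer_id column are hypothetical. remainder="passthrough" is set explicitly because scikit-learn's default is to drop unlisted columns, and sparse_threshold=0 simply forces a dense array for easy inspection.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical data: one numeric, one categorical, one untouched column.
df = pd.DataFrame({
    "age": [20, 30, 40, 50],
    "smoker": ["yes", "no", "no", "yes"],
    "customer_id": [101, 102, 103, 104],
})

preprocessor = ColumnTransformer(
    transformers=[
        ("num", StandardScaler(), ["age"]),                           # scale numeric data
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["smoker"]),  # encode categories
    ],
    remainder="passthrough",  # keep customer_id unchanged instead of dropping it
    sparse_threshold=0,       # always return a dense array
)

processed = preprocessor.fit_transform(df)
print(processed.shape)  # 1 scaled + 2 one-hot + 1 passthrough column
```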