Intro To Machine Learning Flashcards by Eric Saechao

What is machine learning?

The field of study that gives computers the ability to learn from data without being explicitly programmed.

What is Scikit-learn?

A general-purpose machine learning library for Python. It is probably the most widely used machine learning library.

What is XGBoost?

A library that implements gradient boosting, an advanced machine learning algorithm. It is commonly used in machine learning and in industry.

What is LightGBM?

Another library that implements gradient boosting. It serves the same purpose as XGBoost.

What is NLTK (Natural Language Toolkit)?

A toolkit for working with human-language text. Among other things, it helps with processing and understanding text, which can enhance some machine learning models.

What is TensorFlow?

A deep learning framework that allows complete customization of deep learning algorithms but has a steep learning curve. It can also do traditional machine learning, but that requires detailed knowledge of how the individual machine learning algorithms work.

What is PyTorch?

A deep learning framework that feels familiar to most Python developers. Because its interface is very similar to NumPy's, it can act as a replacement for NumPy, and Python developers can migrate to it relatively easily. It can also do traditional machine learning, but that requires detailed knowledge of how the individual machine learning algorithms work, though less than TensorFlow does.

What is Keras?

A high-level deep learning framework. You can think of it as a wrapper around lower-level deep learning frameworks, similar to how seaborn is a wrapper around matplotlib.

What are the two broad types of machine learning?

Supervised learning and unsupervised learning.

What is supervised learning?

Supervised learning is the most common form of machine learning. In scikit-learn, a supervised learning algorithm learns the relationship between your features matrix and your target vector so that it can make predictions.

What are a features matrix and a target vector?

A feature is one property of the data, represented as a column. The features matrix (X) holds all of the feature columns. The target is the column of the dataset you want to make predictions for, and the target vector (y) holds its values.

What is Regression?

Predicting a continuous value is a regression problem. Your target vector contains continuous quantities, such as home prices.
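
As a rough illustration of predicting a continuous value, the sketch below fits a straight line by least squares in plain Python. The data, the `fit_line` helper, and the units are assumptions made up for this example; a library such as scikit-learn would normally do the fitting.

```python
# Made-up data: home size and price in arbitrary units (illustrative only).
sizes = [1.0, 2.0, 3.0, 4.0]
prices = [3.0, 5.0, 7.0, 9.0]


def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line through (x, y)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(sizes, prices)

# The prediction for a new home is a continuous number, not a category.
predicted_price = slope * 5.0 + intercept
```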

What is Classification?

Predicting a categorical value is a classification problem. Your target vector contains categorical quantities, such as different flower species.
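
One simple way to see classification in action is a nearest-neighbor rule: predict the label of the most similar known example. The measurements, species labels, and `predict_species` helper below are made up for illustration and are not taken from the deck.

```python
# Made-up flower measurements: (petal length, petal width) in cm.
features = [(1.4, 0.2), (1.5, 0.3), (4.7, 1.4), (4.5, 1.5)]
species = ["setosa", "setosa", "versicolor", "versicolor"]


def predict_species(sample):
    """Return the label of the training row closest to `sample`."""
    def squared_distance(row):
        return sum((a - b) ** 2 for a, b in zip(row, sample))

    nearest = min(range(len(features)), key=lambda i: squared_distance(features[i]))
    return species[nearest]


# The prediction is one of a fixed set of categories.
prediction = predict_species((1.3, 0.25))
```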

What is unsupervised learning?

In machine learning, you aren’t always trying to predict a value. Sometimes your goal is to find some structure in your dataset. Unsupervised learning is when you train an algorithm without giving it the answers for the examples in your dataset: there is no target vector.
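
Clustering is one common kind of structure finding. The sketch below groups made-up one-dimensional values into two clusters with a bare-bones k-means loop; the data, the `two_means` helper, and the choice of two clusters are assumptions for illustration only.

```python
points = [1.0, 1.2, 0.8, 8.0, 8.3, 7.9]  # made-up values; note there is no target vector


def two_means(values, iterations=10):
    """Group 1-D values into two clusters; return (labels, centers)."""
    centers = [min(values), max(values)]  # simple starting guess
    labels = []
    for _ in range(iterations):
        # Assign each value to its nearest center.
        labels = [
            0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            for v in values
        ]
        # Move each center to the mean of the values assigned to it.
        for k in (0, 1):
            members = [v for v, label in zip(values, labels) if label == k]
            if members:
                centers[k] = sum(members) / len(members)
    return labels, centers


labels, centers = two_means(points)
```

The algorithm is never told which group a value belongs to; the grouping comes from the data alone.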

What is the target (y)?

The target is the column we are trying to predict. In this case the “charges” column is the target.

What are the features (X)?

The features are the columns we will use to make the prediction. In this case, the other columns (“age”, “sex”, “bmi”, “children”, “smoker”, “region”) are the features.
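
To make the split between features and target concrete, here is a minimal sketch that separates a few rows into X and y. Only the column names come from the cards; the row values are hypothetical.

```python
# Hypothetical rows; only the column names come from the cards.
rows = [
    {"age": 25, "sex": "female", "bmi": 22.0, "children": 0,
     "smoker": "no", "region": "northeast", "charges": 1000.0},
    {"age": 50, "sex": "male", "bmi": 30.0, "children": 2,
     "smoker": "yes", "region": "southwest", "charges": 2500.0},
]

target_name = "charges"
feature_names = [name for name in rows[0] if name != target_name]

# X: one list of feature values per row. y: one target value per row.
X = [[row[name] for name in feature_names] for row in rows]
y = [row[target_name] for row in rows]
```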

What is model validation?

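The goal of supervised learning is to build a model that performs well on new data. Model validation estimates that performance by evaluating the model on data it was not trained on. The model validation procedure used in this section is called a train-test split.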

What is the default train-test split in scikit-learn?

By default, 75% of the data goes to your training set and 25% goes to your test set.
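
The idea of a 75/25 split can be sketched in plain Python as follows. This is not scikit-learn's implementation; the `split_rows` helper, its parameters, and the seed are assumptions chosen for the example.

```python
import random


def split_rows(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of `rows` and hold out `test_fraction` of them for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(8))  # stand-in for 8 rows of data
train_rows, test_rows = split_rows(rows)
```

With 8 rows, 6 end up in the training set and 2 in the test set, and no row appears in both.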

What is data leakage?

It is important to use information from the testing data only when we are actually ready to test our model. If we use information from the testing data, or from a set that includes the testing data (such as the full dataset before splitting), it will cause data leakage and invalidate the final evaluation of our model.
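
A common place where leakage creeps in is preprocessing. The sketch below, using made-up numbers and a hypothetical `standardize` helper, learns the mean and standard deviation from the training values only and then reuses them on the test values.

```python
from statistics import mean, pstdev

train_values = [10.0, 20.0, 30.0, 40.0]  # made-up training feature
test_values = [25.0, 100.0]              # made-up testing feature

# Learn the scaling parameters from the training data only.
train_mean = mean(train_values)
train_std = pstdev(train_values)


def standardize(values):
    """Standardize using statistics learned from the training set."""
    return [(v - train_mean) / train_std for v in values]


train_scaled = standardize(train_values)
test_scaled = standardize(test_values)  # reuses training statistics

# Leaky alternative (avoid): computing the mean over train_values + test_values
# would let the test set influence the preprocessing.
```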

What are numeric features?

Numeric features should be stored as integers or floats. If a numeric column is stored as an object (string) because of characters such as a dollar sign, as in $3.00, it must be converted to a numeric type. Examples: price, mpg, number of rooms.
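
A minimal sketch of that cleanup, assuming the only stray characters are a dollar sign and thousands separators (the sample values and the `to_number` helper are made up):

```python
raw_prices = ["$3.00", "$12.50", "$1,200.00"]  # made-up string values


def to_number(text):
    """Strip a dollar sign and thousands separators, then parse a float."""
    return float(text.replace("$", "").replace(",", ""))


prices = [to_number(p) for p in raw_prices]
```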

What is an Ordinal feature?

Strings or integers that represent ordered classes. Examples: low, medium, high; or one star, two stars, three stars.
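
Because the classes have an order, one simple encoding maps each class to an integer that preserves that order. The mapping and sample values below are assumptions for illustration.

```python
# The integer codes keep the order low < medium < high.
order = {"low": 0, "medium": 1, "high": 2}

ratings = ["medium", "low", "high", "low"]  # made-up values
encoded = [order[r] for r in ratings]
```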

What are nominal features?

Categories that represent different classes with no inherent order. Examples: male, female; or red, green, blue, yellow.

What is scaling data (scale)?

Scaling generally means changing the range of the values. The shape of the distribution doesn’t change. Think about how a scale model of a building has the same proportions as the original, just smaller. That’s why we say it is drawn to scale.
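
Min-max scaling is one example: it squeezes the values into the range 0 to 1 while keeping their relative spacing. The `min_max_scale` helper and the sample values are made up, and the sketch assumes the values are not all identical.

```python
def min_max_scale(values):
    """Rescale values to the range [0, 1]; assumes max(values) > min(values)."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]


heights = [10.0, 20.0, 40.0]  # made-up values
scaled = min_max_scale(heights)

# The gap between the last two values is still twice the gap between the
# first two, just as in the original data.
gap_ratio = (scaled[2] - scaled[1]) / (scaled[1] - scaled[0])
```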

What is standardization (standardize)?

Standardizing is one of several kinds of scaling. It means scaling the values so that the distribution has a mean of 0 and a standard deviation of 1. It does not change the shape of the distribution, so the result resembles a standard normal distribution only when the original data are roughly normal.

How is standardization calculated?

standardized_feature = (feature - mean_of_feature) / std_dev_of_feature
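
The formula can be checked on a small made-up feature. This sketch uses the population standard deviation (`statistics.pstdev`), which is an assumption here; using the sample standard deviation would give slightly different numbers.

```python
from statistics import mean, pstdev

feature = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up values

mean_of_feature = mean(feature)
std_dev_of_feature = pstdev(feature)  # population standard deviation

standardized_feature = [
    (value - mean_of_feature) / std_dev_of_feature for value in feature
]
```

After standardizing, the values have a mean of 0 and a standard deviation of 1.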

What is one-hot encoding?

One-hot encoding converts a categorical feature into one binary (0 or 1) column per category. It does not capture the meaning of the words: the computer does not know what blue looks like, but it can still find relationships between the color and other variables.
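
A plain-Python sketch of the idea, with made-up color values: each category gets its own column, and each row has a 1 in exactly one of them.

```python
colors = ["red", "green", "blue", "green"]  # made-up values

# One output column per distinct category, in alphabetical order.
categories = sorted(set(colors))

# Each row has a 1 in the column matching its color and 0 elsewhere.
one_hot = [[1 if color == category else 0 for category in categories]
           for color in colors]
```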

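What is a column transformer?

A column transformer applies different transformations to different columns in parallel. In this setup, it scales the numeric data, one-hot encodes the categorical data, and passes everything else through to the final dataset unchanged. In scikit-learn's ColumnTransformer, that last behavior requires remainder="passthrough"; by default, unlisted columns are dropped.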
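
The behavior described here can be mimicked in plain Python to show the idea. This is not scikit-learn's API: the column names, the data, and the `standardize` and `one_hot` helpers are all hypothetical.

```python
from statistics import mean, pstdev

# Made-up dataset stored column by column.
columns = {
    "age": [20.0, 30.0, 40.0],
    "smoker": ["yes", "no", "yes"],
    "id": [101, 102, 103],
}


def standardize(values):
    """Return one scaled output column."""
    m, s = mean(values), pstdev(values)
    return {"scaled": [(v - m) / s for v in values]}


def one_hot(values):
    """Return one 0/1 output column per category."""
    return {cat: [1 if v == cat else 0 for v in values]
            for cat in sorted(set(values))}


# Which transformer handles which column; unlisted columns pass through.
transformers = {"age": standardize, "smoker": one_hot}

transformed = {}
for name, values in columns.items():
    if name in transformers:
        for suffix, new_values in transformers[name](values).items():
            transformed[f"{name}_{suffix}"] = new_values
    else:
        transformed[name] = values  # passed through unchanged
```

Each column is handled independently of the others, which is what lets the real tool run its transformers in parallel.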

Intro To Machine Learning Flashcards

(27 cards)