Intro To Machine Learning Flashcards
What is machine learning?
A field of study that gives the computer the ability to learn from data without being explicitly programmed
What is Scikit-learn?
A general-purpose machine learning library. It is probably the most widely used machine learning library
What is XGBoost?
A library that primarily provides a gradient boosting framework (an advanced machine learning algorithm). It is commonly used in machine learning and in industry
What is LightGBM?
Another library that primarily provides a gradient boosting framework, similar to XGBoost
What is NLTK (Natural Language Toolkit)?
Among other things, this is a toolkit that helps in understanding text and enhancing some machine learning models
What is TensorFlow?
A deep learning framework that allows complete customization of deep learning algorithms, but has a steep learning curve. This framework can also do traditional machine learning, but that requires a lot of knowledge of how individual machine learning algorithms work
What is PyTorch?
A deep learning framework that feels familiar to most Python developers. It can act as a replacement for NumPy, and because its interface is very similar to NumPy's, Python developers can migrate to it relatively easily. This framework can also do traditional machine learning, but that requires a lot of knowledge of how individual machine learning algorithms work, though less than TensorFlow requires
What is Keras?
A deep learning framework. You can think of it as a high-level wrapper around other deep learning frameworks, similar to how seaborn is a wrapper around matplotlib
What are the two broad types of machine learning?
Supervised learning and unsupervised learning
What is supervised learning?
Supervised learning is the most common form of machine learning. In scikit-learn, a supervised learning algorithm learns the relationship between your features matrix and your target vector in order to make predictions
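As a minimal sketch of the idea, the example below fits a scikit-learn model on a features matrix and a target vector, then predicts for a new row. The toy numbers and the choice of `LinearRegression` are assumptions made for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: the target is exactly 2 * feature + 1.
X = np.array([[1.0], [2.0], [3.0], [4.0]])  # features matrix, shape (n_samples, n_features)
y = np.array([3.0, 5.0, 7.0, 9.0])          # target vector, shape (n_samples,)

model = LinearRegression()
model.fit(X, y)  # learn the relationship between X and y

# Predict the target for a new, unseen feature value.
prediction = model.predict(np.array([[5.0]]))
```

Most scikit-learn estimators follow this same `fit` then `predict` pattern.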
What are a features matrix and a target vector?
A feature is one property of the data, represented as a column. The features matrix is the set of feature columns. The target vector is the column of the dataset you want to make predictions for
What is Regression?
Predicting a continuous value is a regression problem. This means that your target vector contains continuous quantities, like home prices
What is Classification?
Predicting a categorical value is a classification problem. This means that your target vector contains categorical quantities, like different flower species
What is unsupervised learning?
In machine learning, you aren't always trying to predict a value. Sometimes your goal is to find some structure in your dataset. Unsupervised learning is when you train an algorithm without giving it the answers for the examples in your dataset, so there is no target vector
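One way to picture this is clustering. The sketch below uses scikit-learn's `KMeans` on made-up points; the data and the choice of two clusters are assumptions for illustration. Note that only a features matrix is passed in, with no target vector.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical points forming two well-separated groups.
X = np.array([
    [0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
    [10.0, 10.0], [10.1, 9.9], [9.9, 10.2],
])

# No y is supplied: the algorithm looks for structure in X alone.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)
```

The cluster labels are arbitrary integers; what matters is which points share a label.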
What is Target (y)?
The target is the column we are trying to predict. In this case the “charges” column is the target.
What are Features (X)?
The features are the columns we will use to make the prediction. In this case the other columns (“age”, “sex”, “bmi”, “children”, “smoker”, “region”) are the features
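A short pandas sketch of splitting a table into features and target is shown below. The column names come from the cards above, but the row values are invented placeholders, not real data.

```python
import pandas as pd

# Hypothetical rows; only the column names come from the flashcards.
df = pd.DataFrame({
    "age": [19, 33, 45],
    "sex": ["female", "male", "male"],
    "bmi": [27.9, 22.7, 30.1],
    "children": [0, 1, 2],
    "smoker": ["yes", "no", "no"],
    "region": ["southwest", "northwest", "southeast"],
    "charges": [16884.92, 21984.47, 8240.59],
})

y = df["charges"]                # target vector: the column to predict
X = df.drop(columns="charges")   # features matrix: every other column
```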
What is model validation?
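The goal of supervised learning is to build a model that performs well on new data. Model validation estimates that performance by evaluating the model on data it was not trained on. The model validation procedure used in this section is called a train-test split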
What is the default training and testing split?
In scikit-learn's train_test_split, the default split sends 75% of the data to your training set and 25% to your test set
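The default can be checked with a quick sketch using scikit-learn's `train_test_split`. The 100-row array is made up, and `random_state=42` is an arbitrary choice added only for reproducibility.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical dataset with 100 rows and 2 feature columns.
X = np.arange(200).reshape(100, 2)
y = np.arange(100)

# No test_size or train_size is given, so the default split is used.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
```

Passing `test_size=0.2`, for example, would change the split to 80% and 20%.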
What is data leakage?
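Data leakage happens when information from the testing data influences how the model is built. It is important to use testing data only when we are actually ready to test our model. If we use information from the testing data, or from a dataset that still contains the testing data, during training or preprocessing, it causes data leakage and invalidates the final evaluation of our model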
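A common way to avoid leakage during preprocessing is to split first and fit any transformer on the training set only. The sketch below assumes a `StandardScaler` and made-up data; the same pattern applies to other transformers.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 20 rows, 2 numeric features.
X = np.arange(40, dtype=float).reshape(20, 2)
y = np.arange(20)

# Split BEFORE computing any statistics.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scaler = StandardScaler()
scaler.fit(X_train)  # mean and std come from the training data only

X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)  # reuses the training statistics
```

Calling `scaler.fit(X)` on the full dataset before splitting would let test-set statistics leak into training.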
What are numeric features?
Numeric features should be stored as integers or floats. If a numeric column is stored as an object (for example because of a dollar sign, as in $3.00), it must be converted. Examples: price, mpg, number of rooms
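As an illustration of that conversion, the sketch below strips the dollar sign from a made-up `price` column with pandas and casts it to float. The column name and values are assumptions.

```python
import pandas as pd

# Hypothetical column stored as text because of the dollar sign.
df = pd.DataFrame({"price": ["$3.00", "$12.50", "$7.25"]})

# Remove the literal "$" (regex=False treats it as plain text), then cast.
df["price"] = df["price"].str.replace("$", "", regex=False).astype(float)
```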
What are ordinal features?
They can be strings or integers that represent ordered classes. Examples: low, medium, high, or one star, two stars, three stars.
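One simple way to encode an ordinal feature is to map each class to an integer that preserves the order. The sketch below uses a plain dictionary with pandas; the specific integer values are an assumption, and any increasing sequence would keep the ordering.

```python
import pandas as pd

# Hypothetical ordinal column.
levels = pd.Series(["low", "high", "medium", "low"])

# Explicit mapping so that low < medium < high is preserved.
order = {"low": 0, "medium": 1, "high": 2}
encoded = levels.map(order)
```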
What are nominal features?
They are categories that represent different classes with no inherent order. Examples: male, female, or red, green, blue, yellow
What is scaling data (Scale)?
Scaling generally means changing the range of the values. The shape of the distribution doesn't change. Think about how a scale model of a building has the same proportions as the original, just smaller. That is why we say a model is drawn to scale
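As a small worked example, min-max scaling is one common way to change the range, here to 0 through 1. The card does not name a specific method, so this choice and the numbers are assumptions.

```python
import numpy as np

# Hypothetical feature values.
values = np.array([10.0, 20.0, 30.0, 50.0])

# Min-max scaling: shift so the minimum is 0, then divide by the range.
scaled = (values - values.min()) / (values.max() - values.min())
```

The scaled values keep the same relative spacing as the originals; only the range changes.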
What is standardization (Standardize)?
Standardizing is one of several kinds of scaling. It rescales the values so that the distribution has a mean of 0 and a standard deviation of 1. Like other kinds of scaling, it does not change the shape of the distribution, so the result looks like a standard normal distribution only if the original feature was roughly normal
How is standardization calculated?
Standardized_feature= (feature - mean_of_feature) / std_dev_of_feature
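The formula can be checked by hand with NumPy. The feature values below are made up; `np.std` uses the population standard deviation by default, which is also what scikit-learn's `StandardScaler` uses.

```python
import numpy as np

# Hypothetical feature values.
feature = np.array([2.0, 4.0, 6.0, 8.0])

# standardized_feature = (feature - mean_of_feature) / std_dev_of_feature
standardized = (feature - feature.mean()) / feature.std()
```

After this step the values have a mean of 0 and a standard deviation of 1.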
What is one-hot encoding?
One-hot encoding converts each category of a nominal feature into its own binary (0 or 1) column. It does not capture the meaning of the words. The computer does not know what blue looks like, but it can still find relationships between the color and other variables
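A minimal sketch with pandas is shown below. The `color` column is a made-up example, and `pd.get_dummies` is used here for brevity; scikit-learn's `OneHotEncoder` is the usual choice inside a preprocessing pipeline.

```python
import pandas as pd

# Hypothetical nominal column.
colors = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# Each category becomes its own 0/1 column.
encoded = pd.get_dummies(colors, columns=["color"], dtype=int)
```

Each row has exactly one 1, in the column for its category.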
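What is ColumnTransformer?
ColumnTransformer applies different transformers to different columns in parallel and combines the results into one final dataset. For example, it can scale the numeric columns and one-hot encode the categorical columns. Columns not assigned to any transformer are passed through unchanged when remainder='passthrough' is set; scikit-learn's default is to drop them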
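The behavior described above might look like the sketch below. The DataFrame, the column assignments, and the transformer names `"num"` and `"cat"` are assumptions for illustration. `sparse_threshold=0` is set only so the result is always a dense array that is easy to inspect.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical data: one numeric, one categorical, one untouched column.
df = pd.DataFrame({
    "age": [20, 30, 40, 50],
    "smoker": ["yes", "no", "no", "yes"],
    "children": [0, 1, 2, 3],
})

preprocessor = ColumnTransformer(
    transformers=[
        ("num", StandardScaler(), ["age"]),                            # scale numeric data
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["smoker"]),  # one-hot encode categories
    ],
    remainder="passthrough",  # keep "children" unchanged (the default would drop it)
    sparse_threshold=0,       # always return a dense array
)

transformed = preprocessor.fit_transform(df)
```

The output columns follow the order of the transformers: scaled `age`, the two `smoker` indicator columns, then the passed-through `children` column.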