Machine Learning Flashcards
Decision Tree
A type of machine learning model. Decision trees are easy to understand, and they are the basic building block for some of the best models in data science. A tree with more “splits” can capture more factors; such trees are called “deeper” trees.
Fitting / Training
Capturing patterns from data. After the model has been fit, you can apply it to new data to predict or identify patterns.
Training Data
The data used to fit the model.
Leaf
The point at the bottom of a decision tree where we make a prediction.
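To make splits, depth, and leaves concrete, here is a minimal sketch in plain Python. It assumes a hand-built house-price tree: each nested dict is a split and each bare number is a leaf holding the prediction. The names `tree`, `predict`, and `depth`, and every threshold and price, are hypothetical and chosen only for illustration.

```python
# Hypothetical hand-built tree: dicts are splits, bare numbers are leaves.
tree = {
    "feature": "bedrooms",
    "threshold": 2,
    "left": 150_000,  # leaf: houses with 2 or fewer bedrooms
    "right": {        # a second split makes this branch deeper
        "feature": "lot_size",
        "threshold": 8_000,
        "left": 230_000,   # leaf
        "right": 310_000,  # leaf
    },
}


def predict(node, row):
    """Follow splits until reaching a leaf, then return the leaf's value."""
    while isinstance(node, dict):
        go_left = row[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node


def depth(node):
    """Count the splits on the longest path from this node to a leaf."""
    if not isinstance(node, dict):
        return 0
    return 1 + max(depth(node["left"]), depth(node["right"]))


house = {"bedrooms": 3, "lot_size": 9_500}
print(predict(tree, house))  # 310000
print(depth(tree))           # 2
```

A real model learns its splits from training data rather than having them written by hand; this sketch only shows how a prediction is read off at a leaf.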
Pandas
The primary Python library data scientists use for exploring and manipulating data. Most people abbreviate pandas in their code as pd.
DataFrame
A pandas object that holds the type of data you might think of as a table. It is similar to a sheet in Excel or a table in a SQL database.
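As a small illustration, the sketch below builds a DataFrame by hand and inspects it. The variable name `homes`, the column names, and the values are all made up; in practice the table would more likely be loaded from a file.

```python
import pandas as pd

# Hypothetical housing data; column names and values are invented.
homes = pd.DataFrame(
    {
        "bedrooms": [2, 3, 4],
        "lot_size": [5000, 8500, 12000],
        "price": [150000, 230000, 310000],
    }
)

print(homes.shape)             # (number of rows, number of columns)
print(homes.columns.tolist())  # the column names
print(homes.describe())        # summary statistics for each numeric column
```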
Prediction Target
The column we want to predict.
Features
The columns that are fed into our model (and later used to make predictions). Sometimes you will use all columns except the target as features. Other times you’ll be better off with fewer features.
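The sketch below shows one common way to separate a prediction target from features in pandas. The DataFrame and its column names are hypothetical, and the names `y`, `X`, and `X_all` are a convention assumed here, not a requirement.

```python
import pandas as pd

# Hypothetical housing data; column names and values are invented.
homes = pd.DataFrame(
    {
        "bedrooms": [2, 3, 4],
        "lot_size": [5000, 8500, 12000],
        "year_built": [1975, 1990, 2010],
        "price": [150000, 230000, 310000],
    }
)

# Prediction target: the single column we want to predict.
y = homes["price"]

# Features, option 1: a hand-picked subset of columns.
feature_names = ["bedrooms", "lot_size"]
X = homes[feature_names]

# Features, option 2: every column except the target.
X_all = homes.drop(columns=["price"])

print(X.columns.tolist())
print(X_all.columns.tolist())
```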
Scikit-learn
One of the most popular Python libraries for modeling the types of data typically stored in DataFrames.
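To tie the cards together, here is a minimal sketch that fits a decision tree with scikit-learn's `DecisionTreeRegressor` and then applies it to new data. The housing data, column names, and the `new_home` row are hypothetical; `random_state=0` is set only so repeated runs give the same tree.

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Hypothetical training data; column names and values are invented.
homes = pd.DataFrame(
    {
        "bedrooms": [2, 3, 4, 5],
        "lot_size": [5000, 8500, 12000, 15000],
        "price": [150000, 230000, 310000, 400000],
    }
)

X = homes[["bedrooms", "lot_size"]]  # features
y = homes["price"]                   # prediction target

# Fitting: the model captures patterns from the training data.
model = DecisionTreeRegressor(random_state=0)
model.fit(X, y)

# Applying the fitted model to a new, unseen row with the same columns.
new_home = pd.DataFrame({"bedrooms": [3], "lot_size": [9000]})
print(model.predict(new_home))

# Inspecting the fitted tree: how deep it is and how many leaves it has.
print(model.get_depth(), model.get_n_leaves())
```

With no depth limit, the tree in this sketch grows until each training row sits in its own leaf, so it reproduces the training prices exactly. That says nothing about how well it would predict homes it has not seen.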