8.4 Predictive Analytics Data Flow Flashcards
What is the first step in the data flow for predictive analytics?
Load the data.
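A minimal sketch of loading, assuming a small CSV with hypothetical columns `x1`, `x2` and `label`. The CSV is inlined as a string so the example runs without a file. Python's standard `csv` module turns each record into a (features, label) pair:

```python
import csv
import io

# Hypothetical inline CSV standing in for a real file; with a real file you
# would pass open("data.csv", newline="") to csv.DictReader instead.
RAW = """x1,x2,label
1.0,2.0,0
1.5,1.8,0
5.0,8.0,1
6.0,9.0,1
"""

def load_rows(text):
    """Parse CSV text into a list of ((x1, x2), label) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for rec in reader:
        features = (float(rec["x1"]), float(rec["x2"]))
        rows.append((features, int(rec["label"])))
    return rows

data = load_rows(RAW)
print(len(data), data[0])  # 4 ((1.0, 2.0), 0)
```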
After loading the data, what should you do next?
Split the data into training and testing sets.
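A hand-rolled sketch of the split, assuming an 80/20 ratio and a fixed seed for reproducibility. `split_train_test` is a hypothetical helper, not a library function. Shuffling first keeps any ordering in the source data from ending up entirely in one set:

```python
import random

def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows, then split off the last test_fraction as the test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[:-n_test], shuffled[-n_test:]

rows = list(range(10))  # placeholder records
train, test = split_train_test(rows)
print(len(train), len(test))  # 8 2
```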
Why is some of the training data set aside and not used to fit the model?
It is held out as a validation set, so hyperparameter tuning is judged on data the model was not trained on, and the test set stays untouched for the final evaluation.
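One way to set aside validation data, sketched under the assumption of a 60/20/20 train/validation/test split. `three_way_split` is a hypothetical helper. The three slices never overlap, so the test slice is never touched during training or tuning:

```python
import random

def three_way_split(rows, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle once, then cut the rows into disjoint train, validation and test slices."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_test = int(n * test_fraction)
    n_val = int(n * val_fraction)
    test = shuffled[:n_test]                   # held out for the final evaluation
    val = shuffled[n_test:n_test + n_val]      # used only to compare hyperparameters
    train = shuffled[n_test + n_val:]          # used to fit the model
    return train, val, test

train, val, test = three_way_split(list(range(100)))
print(len(train), len(val), len(test))  # 60 20 20
```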
When should you train the model?
After splitting the data and setting aside validation data.
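To illustrate what training means, the sketch below assumes a simple linear regression fitted by ordinary least squares. The model choice and the `fit_line` helper are illustrative assumptions. Only training data is passed in, and the fitted intercept and slope are the learned parameters:

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical training data lying exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_line(xs, ys)
print(intercept, slope)  # 1.0 2.0
```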
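What are the three main types of predictive models to choose from?
Regression (predicts a numeric value), classification (predicts a category), and clustering (groups similar records without predefined labels).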
What do you evaluate your model on first?
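The training data, as a first check that the model has learned the patterns at all. A good training score alone does not show that the model will generalize.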
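A minimal k-nearest-neighbors sketch (the helpers and toy data are hypothetical) shows why the training score is only a first check. With `k=1` every training point is its own nearest neighbor, so training accuracy is a perfect 1.0 no matter how well the model generalizes:

```python
import math
from collections import Counter

def knn_predict(train, point, k):
    """Return the majority label among the k training rows nearest to `point`."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def accuracy(train, eval_rows, k):
    """Fraction of eval_rows whose label the KNN model predicts correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in eval_rows)
    return hits / len(eval_rows)

# Hypothetical toy training set of ((x1, x2), label) pairs.
train = [((1.0, 1.0), 0), ((1.2, 0.8), 0), ((4.0, 4.0), 1),
         ((4.2, 3.9), 1), ((2.6, 2.5), 0)]

print(accuracy(train, train, k=1))  # 1.0: each point is its own nearest neighbor
print(accuracy(train, train, k=3))  # 0.8: (2.6, 2.5) is outvoted by two class-1 points
```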
What is hyperparameter tuning?
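Adjusting settings that are chosen before training rather than learned from the data, such as the number of neighbors in KNN or the number of neurons and layers in an ANN, to optimize performance on the validation data.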
What are examples of hyperparameters in different models?
- KNN (k-nearest neighbors): number of neighbors (k)
- ANN (artificial neural network): number of neurons and layers
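A sketch of tuning the KNN number of neighbors, assuming hypothetical training and validation sets and a candidate grid of 1, 3 and 5. Each candidate is scored on the validation data only, and the best one is kept:

```python
import math
from collections import Counter

def knn_predict(train, point, k):
    """Majority label among the k training rows nearest to `point`."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, eval_rows, k):
    return sum(knn_predict(train, x, k) == y for x, y in eval_rows) / len(eval_rows)

# Hypothetical data; (1.5, 1.5) is a deliberately noisy class-1 point.
train = [((1.0, 1.0), 0), ((1.2, 1.1), 0), ((0.9, 1.3), 0), ((1.5, 1.5), 1),
         ((5.0, 5.0), 1), ((5.2, 4.9), 1), ((4.8, 5.1), 1)]
val = [((1.4, 1.4), 0), ((5.1, 5.0), 1)]

candidates = [1, 3, 5]
scores = {k: accuracy(train, val, k) for k in candidates}  # validation data only
best_k = max(candidates, key=scores.get)  # first candidate with the top score
print(scores, best_k)  # {1: 0.5, 3: 1.0, 5: 1.0} 3
```

Here `k=1` is misled by the single noisy training point next to the first validation example, while `k=3` and `k=5` tie; `max` keeps the first of the tied candidates.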
After tuning, what data is used to evaluate the model?
The testing data.
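A sketch of the final check, assuming the tuned value `k=3` and hypothetical test points the model has never seen. The last test point sits close to the other class, so the test accuracy comes out below 1.0:

```python
import math
from collections import Counter

def knn_predict(train, point, k):
    """Majority label among the k training rows nearest to `point`."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical training data.
train = [((1.0, 1.0), 0), ((1.2, 1.1), 0), ((0.9, 1.3), 0), ((1.5, 1.5), 1),
         ((5.0, 5.0), 1), ((5.2, 4.9), 1), ((4.8, 5.1), 1)]
best_k = 3  # assumed result of tuning on the validation set

# Hypothetical held-out test points, never used in training or tuning.
test = [((0.8, 1.1), 0), ((4.9, 4.8), 1), ((1.6, 1.4), 0), ((4.0, 4.2), 0)]

predictions = [knn_predict(train, x, best_k) for x, _ in test]
test_accuracy = sum(p == y for p, (_, y) in zip(predictions, test)) / len(test)
print(predictions, test_accuracy)  # [0, 1, 0, 1] 0.75
```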
Why evaluate using the test data more than once?
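To confirm that performance is consistent and reliable (for example, across different random splits) rather than the result of one favorable split. Test results should still not be used to choose hyperparameters.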
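One reading of this step, assumed here, is repeating the split, train and evaluate cycle with several random seeds and reporting the spread of the scores. The synthetic two-blob data, the `k=3` setting and the helper names are all hypothetical:

```python
import math
import random
import statistics
from collections import Counter

def knn_predict(train, point, k=3):
    """Majority label among the k training rows nearest to `point`."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def make_data(n_per_class=30, seed=123):
    """Hypothetical synthetic data: two Gaussian blobs labelled 0 and 1."""
    rng = random.Random(seed)
    rows = []
    for label, (cx, cy) in ((0, (0.0, 0.0)), (1, (2.0, 2.0))):
        for _ in range(n_per_class):
            rows.append(((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)), label))
    return rows

def split_and_score(rows, seed, test_fraction=0.25, k=3):
    """Shuffle with `seed`, hold out a test slice, and return KNN test accuracy."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    test, train = shuffled[:n_test], shuffled[n_test:]
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)

data = make_data()
scores = [split_and_score(data, seed) for seed in range(5)]
print(f"mean={statistics.mean(scores):.3f} stdev={statistics.stdev(scores):.3f}")
```

A small standard deviation suggests the score does not hinge on one lucky split.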
What is the next step after evaluating the model?
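Consider whether all errors are equal: weigh each type of error by its impact and cost. For example, in fraud detection a missed fraud case may cost far more than a false alarm.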
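A sketch of weighing errors by cost, assuming hypothetical costs of 1 per false positive and 10 per false negative. The two hypothetical models below have identical accuracy but very different total cost:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def error_cost(y_true, y_pred, cost_fp, cost_fn):
    """Total cost when false positives and false negatives are priced differently."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return fp * cost_fp + fn * cost_fn

y_true  = [1, 1, 0, 0, 0, 1, 0, 0]
model_a = [1, 0, 0, 0, 1, 1, 0, 0]  # 1 false negative, 1 false positive
model_b = [1, 1, 0, 1, 1, 1, 0, 0]  # 0 false negatives, 2 false positives

for name, pred in (("A", model_a), ("B", model_b)):
    print(name, accuracy(y_true, pred), error_cost(y_true, pred, cost_fp=1, cost_fn=10))
# A 0.75 11
# B 0.75 2
```

Under these assumed costs model B is preferable even though plain accuracy cannot tell the two apart.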