Exam 3 Flashcards
What makes time series data different from what we have studied so far in this course?
Each observation is tied to a time or date, so the order of the observations matters
What are the major components of a time-series signal?
Level: Average value of the series
Trend: Increasing, decreasing, or static
Seasonality: Repetition in the data
Noise: Random variation not explained by the other components
What are the two models we can use to understand the major components?
multiplicative
additive
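Written out with the four components listed earlier, the two models take the standard forms below. The symbol y_t (the observed value at time step t) is an assumed notation, not one from the course notes.

```latex
\begin{aligned}
\text{Additive:} \quad & y_t = \text{Level} + \text{Trend}_t + \text{Seasonality}_t + \text{Noise}_t \\
\text{Multiplicative:} \quad & y_t = \text{Level} \times \text{Trend}_t \times \text{Seasonality}_t \times \text{Noise}_t
\end{aligned}
```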
How do we choose between multiplicative and additive models?
When to use the multiplicative model: when the size of the repetition changes over time
When to use the additive model: when the size of the repetition stays roughly constant over time
If a signal does not have seasonality, what should we expect to see in the graphs for the two models?
Additive = seasonality would be close to 0 (adding 0 leaves the series unchanged)
Multiplicative = seasonality would be close to 1 (multiplying by 1 leaves the series unchanged)
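A minimal sketch of why those values appear, assuming a naive classical decomposition: estimate the trend with a centered moving average, remove it, then average what is left at each position in the period. The function name `seasonal_component`, the synthetic series, and the period of 4 are illustrative assumptions, not the course's code.

```python
import random

def seasonal_component(series, period, model):
    """Naive decomposition: return one seasonal value per position in the period."""
    half = period // 2
    buckets = [[] for _ in range(period)]
    for t in range(half, len(series) - half):
        window = series[t - half : t + half + 1]
        if period % 2 == 0:
            # Even period: half-weight the two end points so the window stays centered.
            total = sum(window) - 0.5 * (window[0] + window[-1])
        else:
            total = sum(window)
        trend = total / period
        # Additive removes the trend by subtraction, multiplicative by division.
        detrended = series[t] - trend if model == "additive" else series[t] / trend
        buckets[t % period].append(detrended)
    return [sum(b) / len(b) for b in buckets]

# Synthetic series with level, trend, and a little noise, but NO seasonality.
rng = random.Random(0)
series = [50 + 2 * t + rng.gauss(0, 0.2) for t in range(40)]

additive = seasonal_component(series, period=4, model="additive")
multiplicative = seasonal_component(series, period=4, model="multiplicative")
print("additive seasonal values:", [round(s, 3) for s in additive])
print("multiplicative seasonal values:", [round(s, 3) for s in multiplicative])
```

Because the series has no repetition, the additive values hover around 0 and the multiplicative values around 1.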
Which components are present in all signals, and which are not guaranteed?
All series have level and noise; trend and seasonality are not guaranteed
Why is k-means considered unsupervised learning?
We do not know which group each observation belongs to before clustering (the data has no labels)
The steps for k-means clustering
Step 1: Randomly select K observations as initial cluster centroids (centers)
Step 2: Use a distance (similarity) metric to assign each observation to one of K clusters
Step 3: Recalculate each cluster centroid (center)
Step 4: If any data points changed clusters in Step 2 AND we have not reached our max iterations, go back to Step 2
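The four steps can be sketched in plain Python as follows. This is an illustrative sketch, not the course's implementation: the function name `kmeans`, the `max_iterations` and `seed` parameters, and the sample points are assumptions, and Euclidean distance is used as the metric.

```python
import math
import random

def kmeans(points, k, max_iterations=100, seed=0):
    rng = random.Random(seed)
    # Step 1: pick K observations at random as the initial centroids.
    centroids = rng.sample(points, k)
    assignments = [None] * len(points)
    for _ in range(max_iterations):
        # Step 2: assign each observation to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Step 4: stop once no observation changed cluster.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Step 3: recompute each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignments, centroids

points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
assignments, centroids = kmeans(points, k=2)
print(assignments, centroids)
```

The result can depend on the random starting centroids, which is why the seed is exposed as a parameter here.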
What is the difference between the K in KNN and the K in k-means?
K in k-means: number of clusters
K in KNN: number of neighbors to compare for class assignment
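For contrast with the clustering K, here is a small sketch of how K is used in KNN: it counts the labeled neighbors that vote on a class. The function name `knn_predict` and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k):
    # Sort the labeled observations by distance to the query and keep the K nearest.
    nearest = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    # A majority vote among those K neighbors decides the class.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train_points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_points, train_labels, (1.5, 1.5), k=3))
```

Unlike k-means, this needs labels for the training observations, so KNN is supervised.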
What is convergence?
Convergence: no data points change clusters from one iteration to the next
How do we understand the similarity of a datapoint to each cluster center?
Most similar = smallest distance
Euclidean distance: because we are calculating a distance, the features must be numeric for k-means clustering
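A short sketch of the distance calculation, with the formula written out by hand to show why the features have to be numeric (they are subtracted and squared). The helper names `euclidean_distance` and `nearest_center` are hypothetical.

```python
import math

def euclidean_distance(a, b):
    # Square the difference in each feature, sum, then take the square root.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_center(point, centers):
    # The most similar center is the one at the smallest distance.
    distances = [euclidean_distance(point, c) for c in centers]
    return distances.index(min(distances))

print(euclidean_distance((0, 0), (3, 4)))
print(nearest_center((1, 1), [(0, 0), (10, 10)]))
```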
What do we use for predictors when using linear regression on time-series data?
Trend and seasonality
How do we use seasonality as a predictor?
What do we need to know to create a sub-interval for one repetition?
The period: the size (number of time steps) of one repetition
How are the training and validation sets different for time-series datasets?
assumes the relationship between time steps is linear
How does the time step affect the linear regression models for forecasting? (give example)
If we use quarters to represent seasonality, we end up with 4 linear models
If we choose to use months, we end up with 12 linear models
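One way to read "one linear model per season" is sketched below: group the time steps by their position in the period and fit a separate trend line to each group. This reading is an assumption (the notes do not say whether the course fits separate lines or a single regression with seasonal indicator variables), and the names `fit_line`, `fit_seasonal_models`, and `forecast` and the synthetic data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def fit_seasonal_models(series, period):
    # One model per position in the period: 4 for quarters, 12 for months.
    models = []
    for season in range(period):
        ts = list(range(season, len(series), period))
        models.append(fit_line(ts, [series[t] for t in ts]))
    return models

def forecast(models, t):
    # Pick the model for the season that time step t falls in.
    intercept, slope = models[t % len(models)]
    return intercept + slope * t

# Synthetic quarterly data: level 100, trend of 2 per step, fixed quarterly offsets.
offsets = [5, -3, 0, 8]
series = [100 + 2 * t + offsets[t % 4] for t in range(16)]
models = fit_seasonal_models(series, period=4)
print(len(models), forecast(models, 16))
```

Each season needs at least two observations for its line to be fit, so a monthly version (period=12) needs at least 24 time steps.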