Exam 3 Flashcards
What makes time series data different from what we have studied so far in this course?
Each observation is tied to a time or date, so the order of the observations matters
What are the major components of a time-series signal?
Level: Average value of the series
Trend: Increasing, decreasing, or static
Seasonality: Repetition in the data
Noise: Random variation not explained by the other components
What are the two models we can use to understand the major components?
multiplicative
additive
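Written out with the four components from the previous card, the two models are commonly stated as below. The symbols y_t, L_t, T_t, S_t, and N_t (observed value, level, trend, seasonality, noise at time step t) are notation introduced here for illustration, not taken from the course materials.

```latex
\begin{align*}
\text{Additive:} \quad & y_t = L_t + T_t + S_t + N_t \\
\text{Multiplicative:} \quad & y_t = L_t \times T_t \times S_t \times N_t
\end{align*}
```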
How do we choose between multiplicative and additive models?
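When to use multiplicative model: when the size of the seasonal repetition changes over time (for example, it grows as the level of the series grows)
When to use additive model: when the size of the seasonal repetition stays roughly constant over time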
If a signal does not have seasonality, what should we expect to see in the graphs for the two models?
Additive = seasonality would be close to 0
Multiplicative = seasonality would be close to 1
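A rough sketch of why the two graphs look this way, assuming a simplified classical decomposition: estimate level and trend with a centered moving average, then either subtract it (additive) or divide by it (multiplicative). The function names and the quarterly series are hypothetical, and this is not any particular library's implementation.

```python
def centered_moving_average(series, period):
    """Estimate level + trend with a centered moving average (even period).

    The window spans period + 1 points; the two end points get half weight
    so that each position in the period is weighted equally.
    """
    half = period // 2
    smoothed = {}
    for t in range(half, len(series) - half):
        window = series[t - half : t + half + 1]
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        smoothed[t] = total / period
    return smoothed


def seasonal_component(series, period, model):
    """Average the detrended values at each position within the period."""
    smoothed = centered_moving_average(series, period)
    buckets = [[] for _ in range(period)]
    for t, estimate in smoothed.items():
        if model == "additive":
            detrended = series[t] - estimate
        else:  # multiplicative
            detrended = series[t] / estimate
        buckets[t % period].append(detrended)
    return [sum(b) / len(b) for b in buckets]


# Hypothetical quarterly series: a level of 100 plus a steady trend, no seasonality.
no_season = [100.0 + 2.0 * t for t in range(16)]

additive = seasonal_component(no_season, period=4, model="additive")
multiplicative = seasonal_component(no_season, period=4, model="multiplicative")

print([round(s, 6) for s in additive])        # values near 0
print([round(s, 6) for s in multiplicative])  # values near 1
```

With real, noisy data the values would hover around 0 and 1 rather than land on them exactly.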
Which components are present in all signals, and which are not guaranteed?
All series have level and noise; trend and seasonality are not guaranteed
Why is kmeans considered unsupervised learning?
We do not know which group the data belongs to before clustering
The steps for Kmeans clustering
Step 1: Randomly select K observations as the initial cluster centroids (centers)
Step 2: Use a distance (similarity) metric to assign each observation to one of the K clusters
Step 3: Recalculate each cluster centroid (center)
Step 4: If any data points changed clusters in Step 2 AND we have not reached our max iterations, go back to Step 2
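The four steps can be sketched in plain Python as follows. This is a minimal illustration, not a production implementation: the function name `kmeans`, its parameters, and the sample data are all hypothetical, and Euclidean distance is assumed as the similarity metric.

```python
import math
import random


def kmeans(points, k, max_iterations=100, seed=0):
    """Minimal k-means on a list of numeric tuples (illustrative only)."""
    rng = random.Random(seed)
    # Step 1: randomly select K observations as the initial centroids.
    centroids = rng.sample(points, k)
    assignments = [None] * len(points)

    for _ in range(max_iterations):
        # Step 2: assign each observation to the closest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Step 4: stop when no point changed clusters (convergence).
        if new_assignments == assignments:
            break
        assignments = new_assignments

        # Step 3: recalculate each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))

    return centroids, assignments


# Hypothetical 2-D data with two well-separated groups.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(data, k=2)
print(centroids)
print(labels)
```

Because the starting centroids are random, a different `seed` can give different cluster numbers, and on less cleanly separated data it can give a different final grouping.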
Know the difference between the K in KNN and Kmeans
K in k-means: number of clusters
K in KNN: number of neighbors to compare for class assignment
What is convergence?
Convergence: no data points change clusters from one iteration to the next
How do we understand the similarity of a datapoint to each cluster center?
Most similar == smallest distance
Euclidean distance: because we are calculating a distance, the features must be numeric for K-means clustering
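For reference, the Euclidean distance between an observation x and a cluster center c is usually written as below. The symbols x, c, and p (the number of numeric features) are notation introduced here for illustration.

```latex
d(x, c) = \sqrt{\sum_{j=1}^{p} (x_j - c_j)^2}
```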
What do we use for predictors when using linear regression on time-series data?
Trend and seasonality as predictors
How do we use seasonality as a predictor?
What do we need to know to create a sub-interval for one repetition?
The period: the size of one repetition (sub-interval)
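A small sketch of turning the period into a seasonality predictor, assuming time steps are numbered from 0 and the series starts at the beginning of a repetition. The function name and the quarterly example are hypothetical.

```python
def season_labels(n_steps, period):
    """Label each time step with its position inside one repetition."""
    return [t % period for t in range(n_steps)]


# Hypothetical example: 3 years of quarterly data, so the period is 4.
labels = season_labels(n_steps=12, period=4)
print(labels)  # → [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]
```

Note that the labels depend on both the time step (quarters here) and the period (4 quarters per repetition).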
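How are the training and validation sets different for time-series datasets?
They must stay in chronological order: train on the earlier time steps and validate on the later ones, instead of splitting at random. The linear regression model also assumes the relationship between time steps is linear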
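Since time-series observations are ordered, the validation set is normally taken from the end of the series rather than sampled at random. A minimal sketch, with a hypothetical function name and made-up monthly data:

```python
def chronological_split(series, validation_size):
    """Hold out the last `validation_size` observations for validation."""
    if not 0 < validation_size < len(series):
        raise ValueError("validation_size must be between 1 and len(series) - 1")
    split_point = len(series) - validation_size
    return series[:split_point], series[split_point:]


# Hypothetical monthly demand for 3 years; validate on the final 12 months.
demand = list(range(1, 37))
train, valid = chronological_split(demand, validation_size=12)
print(len(train), len(valid))  # → 24 12
print(train[-1], valid[0])     # → 24 25
```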
How does the time step affect the linear regression models for forecasting? (give example)
If we use quarters to represent seasonality, we end up with 4 linear models
If we choose to use months, we end up with 12 linear models
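One way to picture "4 linear models" is to fit a separate trend line to each quarter. The sketch below does this in plain Python with hypothetical function names and made-up sales figures. The course may instead use a single regression with one dummy variable per season, which gives parallel lines that differ only in their intercepts.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_seasonal_lines(series, period):
    """Fit one trend line per position in the period (e.g. one per quarter)."""
    models = {}
    for season in range(period):
        steps = [t for t in range(len(series)) if t % period == season]
        models[season] = fit_line(steps, [series[t] for t in steps])
    return models


def forecast(models, period, t):
    """Predict time step t with the line that belongs to its season."""
    slope, intercept = models[t % period]
    return slope * t + intercept


# Hypothetical quarterly sales: upward trend plus a fixed bump per quarter.
quarter_effect = [0.0, 5.0, -3.0, 10.0]
sales = [50.0 + 2.0 * t + quarter_effect[t % 4] for t in range(12)]

models = fit_seasonal_lines(sales, period=4)
print(len(models))  # → 4
print(round(forecast(models, period=4, t=12), 6))  # first quarter of year 4
```

Switching to monthly data with `period=12` would produce 12 lines in the same way.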
How does linear regression differ for time-series data from traditional linear regression?
Hint: How many lines are used to estimate the target?
It depends on the time step of the data and the period (months, days, etc.): one line is fit per season instead of a single line
What does the seasonality graph look like if seasonality does not exist?
A flat, straight line
What is the significant difference between the multiplicative and additive decompositions?
Level and trend will be the same in both; the difference shows up in the seasonality and noise graphs (compare the y-axis values)
Which signal components are always guaranteed, and which are not?
Guaranteed = noise and level
Not guaranteed = trend and seasonality
How do time-series models differ from regular linear regression?
Time (seasonality) is accounted for, and we fit multiple lines instead of one to capture the repetition. Training and validation sets must also be in chronological order
KMeans is used to classify datapoints. (True or False)
False
In linear regression, to determine the repetition sub-interval, we only need to know the time-step of the dataset.
False
KMeans always converges to an ideal grouping.
False
If linear regression is applied to a time-series dataset without seasonality, it will produce the same results as regular linear regression (i.e., MLR).
True
Noise graphs from both the additive and multiplicative models can be used to choose which decomposition model to use.
True
In K-means clustering, we group data based on an expected target.
False
Ideally, in clustering, the distance between centroids is minimized.
False
Your goal is to understand sales and demographic data from eight different store locations and identify the differences between high-performing and low-performing stores. Which model would be best?
KMeans
You run a landscaping company and have tracked the last 3 years of demand for your services. What models can you use to help predict the demand for year 4?
Linear Regression
Select everything we need to know in order to use multiple linear regression with a time series signal
Trend
Seasonality
Time-step