Trends, Forecasts, Clusters Flashcards
True/False: You can show seasonality in a line chart with a trend line.
Hint: seasonality means that there are repeated patterns (e.g. there is always a spike in December)
False, you wouldn’t use a trend line b/c a trend line would not show repeated patterns.
See image 1
Bonus: You could use forecasting
When the repeated pattern correlates with itself, it’s known as ______.
Autocorrelation
See image 2
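Bonus sketch: one common way to measure autocorrelation is to correlate a series with a copy of itself shifted by a fixed lag. The function name and the sample values below are hypothetical and are not taken from the image.

```python
def autocorrelation(series, lag):
    """Correlation of a series with itself shifted by `lag` periods."""
    n = len(series)
    mean = sum(series) / n
    total_variation = sum((x - mean) ** 2 for x in series)
    shifted_covariation = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return shifted_covariation / total_variation


# Hypothetical data: a spike every 4th period, repeated 3 times.
views = [1, 2, 3, 10] * 3
print(autocorrelation(views, 4))  # high: the lag matches the repeating pattern
print(autocorrelation(views, 2))  # low: the lag does not line up with the spikes
```

A high value at a lag equal to the length of the season is a sign of seasonality.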
MASE means ____ & its equation is ___.
Mean absolute scaled error
(MAE of the model forecast) / (MAE of the naive forecast)
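Bonus sketch: the ratio on this card, written out in Python. The function names and all numbers are hypothetical. This follows the card's simple ratio and may not match how Tableau computes MASE internally.

```python
def mean_absolute_error(actual, predicted):
    """Mean of the absolute differences between paired values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mase(actual, model_forecast, naive_forecast):
    """MAE of the model forecast divided by MAE of the naive forecast."""
    return (mean_absolute_error(actual, model_forecast)
            / mean_absolute_error(actual, naive_forecast))


# Hypothetical values, for illustration only.
actual = [10, 12, 14, 13]
model_forecast = [11, 12, 13, 13]  # absolute errors 1, 0, 1, 0 -> MAE 0.5
naive_forecast = [9, 10, 12, 14]   # previous observation each time (9 is assumed)
                                   # absolute errors 1, 2, 2, 1 -> MAE 1.5
print(mase(actual, model_forecast, naive_forecast))  # 0.5 / 1.5, roughly 0.33
```

A result below 1 means the model's average error is smaller than the naive forecast's.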
MAE is ____.
Mean absolute error
A MASE close to zero means ___.
Your model is really accurate in predicting the future.
MASE = 0.5 means ___.
Your forecast is likely to have half as much error as a naive forecast.
What is better, a MASE of 1 or 0.5?
What does each mean?
0.5
0.5 means that your forecast error is likely half that of a naive forecast
1 means that your forecast is as accurate as a naive forecast
A MASE close to 1 means ___.
Your model is similar to the naive forecast.
i.e. your forecast is no better than a naive forecast
A MASE more than 1 means ____.
Your model is worse than the naive forecast.
A MASE of 0.65 means ____.
Is this acceptable?
Your model's error is 65% of the naive model's error.
Yes.
See image 2.
What would happen to the MASE if the peak of interest didn’t happen in April 2020?
The MASE would be lower, meaning your model would be more accurate.
Because Tableau uses ____ in Forecasting, Forecasting is more influenced by ___ values than ___ values.
Exponential smoothing
Recent
Past
Forecasting in Tableau uses a technique known as exponential smoothing. Its formula shows that predictions will be influenced more by recent values than the past.
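Bonus sketch: simple exponential smoothing, the most basic member of the exponential smoothing family, shows why recent values count for more. This is an illustration only; Tableau's forecasting models can also include trend and seasonal components, and the function names and `alpha` value here are assumptions.

```python
def simple_exponential_smoothing(values, alpha):
    """Return the smoothed level after processing every value in order."""
    level = values[0]  # assumption: the first value seeds the level
    for x in values[1:]:
        level = alpha * x + (1 - alpha) * level
    return level


def observation_weights(n, alpha):
    """Weight each of the n observations ends up with, oldest first."""
    # k is the age of an observation: 0 = newest.
    weights = [alpha * (1 - alpha) ** k for k in range(n - 1)]
    weights.append((1 - alpha) ** (n - 1))  # weight left on the seed value
    return weights[::-1]


values = [10, 20, 30, 40]  # hypothetical measure values, oldest first
print(simple_exponential_smoothing(values, alpha=0.5))  # 31.25
print(observation_weights(len(values), alpha=0.5))      # [0.125, 0.125, 0.25, 0.5]
```

The newest observation carries half the weight here, and each older one carries less, which is the "recent over past" behaviour the card describes.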
How is MAE calculated?
Take the absolute difference between each Actual and Forecast value, then take the mean of those differences.
See image 3
What is the Naive Forecasting Method?
The Forecast is the last observed value.
See image 4
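Bonus sketch: the naive method in a few lines of Python. Function names and values are hypothetical.

```python
def naive_one_step_forecasts(observed):
    """Forecast for each period after the first = the value observed just before it."""
    return list(observed[:-1])


def naive_future_forecast(observed, horizon):
    """Repeat the last observed value for every future period."""
    return [observed[-1]] * horizon


views = [120, 135, 150, 160]            # hypothetical observations, oldest first
print(naive_one_step_forecasts(views))  # [120, 135, 150] -> forecasts for periods 2 to 4
print(naive_future_forecast(views, 3))  # [160, 160, 160] -> next 3 periods
```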
True/False: The Naive Forecasting method is oversimplified, but very cost-effective.
True
Naive Forecasting can be better than other complex forecasting models because ____.
Although it’s oversimplified, it’s very cost-effective.
What field/s do you need for a forecast?
Date field and a measure
Bonus: Alternative to date field: a dimension with integer values (This is uncommon)
Instead of having a date field and a measure, what can you replace the date field with for a Forecast?
A dimension with integer values (This is uncommon).
What is the difference between trend and seasonality?
Trend is a tendency in the data to increase or decrease over time.
Seasonality is a repeating, predictable variation in value.
True/False: A MASE of 1 means that you have a perfect model.
False
True/False: Predicting the amount of stock in your warehouse can be done with forecasting.
True
When creating a forecast, Tableau automatically creates a gap between the actual and predicted values. How can you close the gap?
Change the Forecast indicator to an attribute.
Bonus: See “Forecast” in colors mark > drop-down arrow > Change “Dimension” to “Attribute”
Note that this cannot be found on the Analysis Tab.
How can you find how accurate a forecast is?
How can you find the accuracy of a forecast value?
1) Look at “Quality” or MASE in “Describe Forecast” (NOT p-value)
Steps: Find in “Analysis” tab or
right click data point in the forecast > “Forecast” > “Describe Forecast” > see Quality in the Summary tab or MASE in the Models tab
2) Look at Precision/Precision %
Steps: Change column measure to Precision/% or
Add Precision/% to tooltip
Tooltip Steps:
Drag y to Marks Tooltip > click arrow > Forecast results > Precision/%
See image 5
How can you change the Forecast length?
Change it in the “Forecast Options” menu.
Image 6
Bonus: Right click forecast > “Forecast” > “Forecast Options”
How can you change the Confidence Interval of your Forecast?
Change it under the “Forecast Model” section of the “Forecast Options” menu.
Image 6
Right click forecast > “Forecast” > “Forecast Options” > change the prediction interval under the “Forecast Model” section
How can you add Forecast Values to the tooltip? What are 2 Forecast Results you can show?
(After creating the forecast visual) Data Pane > drag the measure (usually y axis) to the Tooltip Mark (Can duplicate from columns/rows - usually columns).
Forecast Results: Precision & quality
Image 7
What does a forecast quality of 89 mean for the data point of Dec 2020, views (the measure) being 25k?
The prediction quality for Dec 2020 is 89%.
What does a forecast quality of 100 mean for the data point of Dec 2020, views (the measure) being 25k?
The prediction quality for Dec 2020 is 100%, the best possible.
What does a forecast precision of +-16k views mean for the data point of Dec 2020, views (the measure) being 25k?
The true views could be 16k views higher or lower than the predicted 25k views.
True/False: Forecasting in Tableau uses a technique known as exponential smoothing.
True
Forecasting models (like Regression and exponential smoothing) are examples of supervised learning technique/s, meaning ____.
They apply a known relationship (b/t 2 variables) to new, unseen data.
Bonus: Neither Trend lines (a basic statistical analysis feature) nor Clustering (an unsupervised learning technique) are examples of supervised learning techniques.
True/False: Trend lines make complex predictions based on a trained model.
False: Trend lines primarily function as a visual aid to identify patterns and trends in data over time.
____ is/are example/s of unsupervised learning technique/s, meaning ____.
Clustering
It looks for similar data points & detects patterns.
True/False: Clustering looks for patterns in the data and tries to group similar observations into clusters, knowing what each cluster represents.
False: it should be: “WITHOUT knowing what each cluster represents.”
Which of these are examples of clustering?
1 - Reducing the # of colors of a pic
2 - Recognizing that a cat is hidden in a picture of dogs
3 - Social network analysis
4 - Market basket analysis (checking which items are bought together)
All “Yes” except “Recognizing that a cat is hidden in a picture of dogs”
No, this would require supervised learning. You would first have to train the model on which data points belong to a cat or a dog (this is called classification).
What does the ‘k’ in K-means clustering represent?
The number of clusters you want to split your data into.
How do you split your data into 5 groups, using unsupervised learning technique?
Use Clustering.
Set your ‘k’ in k-means clustering to 5.
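Bonus sketch: a bare-bones k-means loop (Lloyd's algorithm), to show what choosing 'k' does. The random starting centers, function names, and sample points are assumptions; Tableau's own implementation details may differ.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=100, seed=0):
    """Return (centers, labels) after alternating assign/update steps."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: k random points as starting centers
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(k), key=lambda c: squared_distance(point, centers[c]))
            for point in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centers.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centers.append(centers[c])  # empty cluster: leave its center in place
        if new_centers == centers:
            break  # nothing moved, so the clustering has converged
        centers = new_centers
    return centers, labels


# Hypothetical points forming two obvious groups.
points = [(1, 1), (1, 2), (2, 1), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, k=2)
print(labels)  # the first three points share one label, the last three the other
```

Calling `kmeans(points, k=5)` on a larger data set would request five groups, as on this card.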
If there are blended dimensions in the view, can you use Clustering?
No.
If there are dimensions present in an aggregated view, can you use Clustering?
Yes. When there are no dimensions present in an aggregated view, there can be no clustering in Tableau.
image 8
Note: This does not mean that the dimension needs to be aggregated, but that aggregation in general needs to be in view
Give 3 examples of ‘if _ happens, you will not be able to save clusters to the Data pane’.
When the measures in the view are ____ and the measures you are using as clustering variables are____.
When the Clusters you want to save are on ____.
When ___ are in view.
When the measures in the view are disaggregated and the measures you are using as clustering variables are not the same as the measures in the view.
When the Clusters you want to save are on the Filters shelf.
When Measure Names or Measure Values are in view.
What is added to the ‘Not Clustered’ category?
Hint: 2
Null values for a measure.
Categorical variables (that is, dimensions) that return * for ATTR (meaning that all values are not identical)
What is needed for Clustering?
1 dimension, 2 measures, Scatterplot diagram.
What will happen to the visual if another measure is added?
(see image 8)
When you add more than two variables, you will see from the colors that the clusters start to overlap in the two-dimensional view. This is normal.
True/False: You can assess the quality of the clustering result by comparing actual and predicted values.
False, ‘You cannot assess.’
These 2 metrics are used to assess the cluster quality:
B/t-group sum of squares
W/in-group sum of squares
_____ measures the separation between the clusters as the sum of squared distances between each cluster’s center, and the average value of the data set.
Between-group sum of squares.
Image 9
A larger Between-group sum of squares value means ____.
A better separation between clusters.
Image 9
True/False: A lower Between-group sum of squares value is better than a higher value.
False.
Image 9
What quantifies the cohesion of clusters as the sum of squared distances between the center of each cluster & the individual data point in the cluster?
Within-group sum of squares.
Image 9
True/False: A lower Within-group sum of squares value is better than a higher value.
True.
Image 9
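Bonus sketch: both metrics computed by hand. Weighting each cluster center by its number of points is an assumption that follows the usual definition, under which the two values add up to the total sum of squares. Function names and points are hypothetical.

```python
def centroid(points):
    """Average value of a group of points, one coordinate at a time."""
    return tuple(sum(dim) / len(points) for dim in zip(*points))


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def within_group_ss(clusters):
    """Cohesion: squared distance from every point to its own cluster center."""
    return sum(squared_distance(p, centroid(c)) for c in clusters for p in c)


def between_group_ss(clusters):
    """Separation: squared distance from each cluster center to the overall
    center, weighted by the cluster's size (assumption, see above)."""
    overall = centroid([p for c in clusters for p in c])
    return sum(len(c) * squared_distance(centroid(c), overall) for c in clusters)


# Hypothetical clustering result: two clusters of two points each.
clusters = [[(1, 1), (1, 3)], [(5, 1), (5, 3)]]
print(within_group_ss(clusters))   # 4.0  (lower = tighter clusters)
print(between_group_ss(clusters))  # 16.0 (higher = better separated clusters)
```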
See Image 10.
The p-value for each variable is < 0.05. This suggests that_____
The expected values of the corresponding variable differ among clusters.
In this case, all variables seem to be different enough in all clusters
True/False: You can drag Clusters from the Marks card to the Data pane.
True.
True/False: A trend line can be curved.
True.
Name 2 types of trend lines.
Linear, Logarithmic
Exponential, Polynomial
Linear regression model is used in Tableau for ____ trend lines.
Linear.
What kind of trend line uses this formula: y = a * x + b?
Linear.
Bonus: a & b are called the model coefficients
Every trend line uses a different model. There are multiple models, but they all have the same goal: to ____.
To minimize the residual.
What is a residual?
The distance between observation and trend line.
Image 11
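Bonus sketch: fitting y = a * x + b by ordinary least squares, which picks the a and b that minimize the sum of squared residuals. Function names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope a and intercept b for y = a * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    b = mean_y - a * mean_x
    return a, b


def residuals(xs, ys, a, b):
    """Vertical distance between each observation and the trend line."""
    return [y - (a * x + b) for x, y in zip(xs, ys)]


xs = [1, 2, 3, 4]  # hypothetical observations
ys = [2, 4, 5, 8]
a, b = fit_line(xs, ys)
print(a, b)                      # roughly a = 1.9, b = 0.0
print(residuals(xs, ys, a, b))   # roughly 0.1, 0.2, -0.7, 0.4
```

Residuals can be positive (observation above the line) or negative (below it), and for a least-squares line with an intercept they sum to zero.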
R^2 is what?
The coefficient of correlation squared.
The R^2 value is similar to P-value in that it is used to analyse the trend line.
The R^2 value quantifies the strength of the entire model at explaining changes in the variable output (y) given variable input (x)
R^2 value (e.g. 0.64) means that ____ (e.g. 64%, i.e. over half) of the variation in ‘y’ can be explained by variation in ‘x’
Aka. correlation coefficient squared
Bonus: a coefficient is the numerical/constant quantity placed bf & multiplying the variable in an algebraic expression
What is the range and range meaning of R^2?
a statistical measure that indicates how much of the variation of a dependent variable is explained by an independent variable in a regression model.
0 (worst, your model is no better than randomness) to 1 (best, perfect fit).
image 17 shows formula
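Bonus sketch: a common way to compute R^2 is 1 minus (unexplained variation / total variation). The formula in the image may be written differently; the function name and numbers here are hypothetical.

```python
def r_squared(observed, predicted):
    """Share of the variation in the observed values that the model explains."""
    mean_y = sum(observed) / len(observed)
    unexplained = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    total = sum((y - mean_y) ** 2 for y in observed)
    return 1 - unexplained / total


observed = [2, 4, 5, 8]             # hypothetical y values
predicted = [1.9, 3.8, 5.7, 7.6]    # hypothetical trend line values at the same x
print(r_squared(observed, predicted))  # roughly 0.96
print(r_squared(observed, observed))   # 1.0, a perfect fit
```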
What does R^2 = 0.33 mean?
33% of the variation in the dependent variable (e.g. y = species richness) is explained by the independent variable (e.g. x = distance).
This is a poor fit
Coefficient of determination means ____.
R^2.
What is RSE?
What does it mean for a Trendline?
What is its Unit in a trendline?
Residual Standard Error (Just called Standard Error in Tableau)
Avg diff bt the observed values & trend
Has the same unit as the variable on the y axis
i.e. if Species is on the y axis and RSE = 1.6 then the avg diff bt the observed value and the trendline is 1.6 species
What does RSE = 3.69 mean?
The model typically differs from the observed value by 3-4 species.
A lower RSE is ____.
Better/more accurate.
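Bonus sketch: RSE for a fitted trend line, assuming the usual definition (square root of the summed squared residuals divided by the residual degrees of freedom). For a linear model with two coefficients that is n - 2. Names and numbers are hypothetical.

```python
import math


def residual_standard_error(observed, predicted, n_coefficients=2):
    """Typical size of a residual, in the same unit as the y axis."""
    squared_residuals = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    degrees_of_freedom = len(observed) - n_coefficients
    return math.sqrt(squared_residuals / degrees_of_freedom)


observed = [2, 4, 5, 8]             # hypothetical species counts
predicted = [1.9, 3.8, 5.7, 7.6]    # hypothetical linear trend line values
print(residual_standard_error(observed, predicted))  # roughly 0.59 species
```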
How do you calculate the CI?
see image 16
You take the sample mean, and subtract and add (to get the lower and upper bound respectively) the standard error multiplied by the critical value for the chosen confidence level (e.g. about 1.96 for 95%).
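Bonus sketch: a confidence interval for a sample mean. The default multiplier of 1.96 is the normal-distribution value for a 95% interval and is an assumption; small samples would normally use a t-distribution value instead. The function name and sample are hypothetical.

```python
import math
import statistics


def mean_confidence_interval(sample, critical_value=1.96):
    """Return (lower, upper) bounds around the sample mean."""
    mean = statistics.mean(sample)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    margin = critical_value * standard_error
    return mean - margin, mean + margin


sample = [4, 6, 8, 10, 12]  # hypothetical measurements, mean = 8
lower, upper = mean_confidence_interval(sample)
print(lower, upper)  # roughly 5.23 and 10.77
```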
What does the confidence interval look like for a linear model and Logarithmic model?
See Image 13.
The P-value of a linear/log model tells you the chance of seeing a relationship at least this strong if there were actually ____.
No relationship/correlation between the 2 variables.
What does a p-value = 0.001 mean for a scatter plot that shows a trend line using a logarithmic model?
If there were no correlation between the 2 variables, there would be a 1/1,000 chance of seeing a trend at least this strong.
The model is statistically significant if the p-value < ____.
0.05.
Which of these do you want to be high and which do you want to be low? RSE, R^2, Residual, p-value, confidence interval.
All but R^2 should be as small as possible.
What is RSE called in Tableau?
Standard Error.
Where can you find the RSE of a Trend line?
Right click trend line > Describe Trend Model > see “Standard Error.”
Where can you find the p-value of a Trend line?
Hover over the trend line or Right click trend line > Describe Trend Model.
Where can you find the R^2 of a Trend line?
Hover over the trend line or Right click trend line > Describe Trend Model.
How can you add CIs around the trend line?
Right click trend line > “Edit All Trend Lines” > Select “confidence bands.”
CI = Confidence Interval.
Image 14
How can you change the model type of a trend line?
Right click trend line > “Edit All Trend Lines” > Select options under “Model Type.”
Image 14
What does “standard error of the mean” mean?
How much the sample mean is expected to deviate from the population mean.
Image 15
Regression and exponential smoothing are ____ models in Tableau
Forecasting
True/False: Trend Lines use supervised learning
False, you do not need to define target variables or train the model on labeled data to use trend lines
Trend lines aren’t making predictions
Give 5 examples of ‘if _ happens, you will not be able to create a cluster’.
When there is a _____ in the view.
When you are using a____ data source.
When ___ are the variables (inputs) for clustering.
When there are no ___ that can be used as ___ for clustering in the view.
When there are no ____ present in an aggregated view.
When there is a blended dimension in the view.
When you are using a cube (multidimensional) data source.
When Measure Names or Measure Values are the variables (inputs) for clustering.
Also:
* Table calculations
* Blended calculations
* Ad-hoc calculations
* Generated latitude/longitude values
* Groups
* Sets
* Bins
* Parameters
* Dates
When there are no fields that can be used as variables (inputs) for clustering in the view.
When there are no dimensions present in an aggregated view.
What is Precision and Precision % in a Forecast?
Precision: shows the prediction interval distance from the forecast value for the configured confidence level. Units are those of the forecast measure (usually the y axis/non-time variable).
Precision %: shows precision as a percentage of the forecast value.
Both provide more detail about the forecast value, not the forecast as a whole.
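Bonus sketch: the arithmetic behind the two results, using the Dec 2020 card's numbers (forecast of 25k views, precision of +-16k). The function names are hypothetical.

```python
def prediction_interval(forecast_value, precision):
    """Lower and upper bound implied by a +- precision value."""
    return forecast_value - precision, forecast_value + precision


def precision_percent(forecast_value, precision):
    """Precision expressed as a percentage of the forecast value."""
    return 100 * precision / forecast_value


print(prediction_interval(25_000, 16_000))  # (9000, 41000)
print(precision_percent(25_000, 16_000))    # 64.0
```

A smaller Precision (or Precision %) means a narrower prediction interval around that forecast value.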
What are 2 statistical measures used for analyzing a trend line?
R^2
P-value