Trends, Forecasts, Clusters Flashcards

1
Q

True/False: You can show seasonality in a line chart with a trend line.

A

Hint: seasonality means that there are repeated patterns (e.g. there is always a spike in December).

False. You wouldn’t use a trend line because a trend line would not show repeated patterns.

See image 1

Bonus: You could use forecasting

2
Q

When a series correlates with a lagged (earlier) copy of itself, as a repeated pattern does, it’s known as ______.

A

Autocorrelation
See image 2
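
As a rough illustration of the idea, the sketch below computes the autocorrelation of a series at a given lag. The data is made up and repeats every 4 steps, so the lag-4 value should come out much higher than the lag-2 value. The function name and the sample data are assumptions for illustration, not anything taken from Tableau.

```python
def autocorrelation(series, lag):
    """Correlation of a series with a copy of itself shifted by `lag` steps."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((v - mean) ** 2 for v in series)
    covariance = sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
    )
    return covariance / variance


# Made-up series with a spike every 4th step (a "seasonal" pattern).
views = [1, 2, 3, 10] * 3

print(autocorrelation(views, 4))  # high: the pattern lines up with itself
print(autocorrelation(views, 2))  # low: the shifted copy is out of phase
```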

3
Q

MASE means ____ & its equation is ___.

A

Mean absolute scaled error

MASE = (MAE of the model forecast) / (MAE of the naive forecast)
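
The ratio can be sketched in a few lines of Python. This is a simplified version that scores the model and the naive forecast on the same points. The numbers are made up, and Tableau's exact computation may differ.

```python
def mae(actual, predicted):
    """Mean absolute error: the average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def naive_forecast(series):
    """Naive method: the forecast for each point is the previous observed value."""
    return series[:-1]


def mase(actual, predicted):
    """MAE of the model divided by MAE of the naive forecast."""
    # The naive method has no forecast for the first point, so score from index 1.
    model_mae = mae(actual[1:], predicted[1:])
    naive_mae = mae(actual[1:], naive_forecast(actual))
    return model_mae / naive_mae


views = [10, 12, 14, 16, 18]   # made-up actual values
model = [10, 11, 13, 15, 17]   # made-up model forecasts

print(mase(views, model))  # 0.5: the model has half the error of the naive forecast
```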

4
Q

MAE is ____.

A

Mean absolute error

5
Q

A MASE close to zero means ___.

A

Your model is really accurate in predicting the future.

6
Q

MASE = 0.5 means ___.

A

Your forecast is likely to have half as much error as a naive forecast.

7
Q

What is better, a MASE of 1 or 0.5?

What does each mean?

A

0.5

0.5 means that your forecast error is likely about half that of a naive forecast

1 means that your forecast is as accurate as a naive forecast

8
Q

A MASE close to 1 means ___.

A

Your model is similar to the naive forecast.

i.e. your forecast is no better than a naive forecast

9
Q

A MASE more than 1 means ____.

A

Your model is worse than the naive forecast.

10
Q

A MASE of 0.65 means ____.

Is this acceptable?

A

Your model’s error is 65% of the naive model’s error.

Yes.

11
Q

See image 2.

What would happen to the MASE if the peak of interest didn’t happen in April 2020?

A

The MASE would be lower, meaning your model would be more accurate.

12
Q

Because Tableau uses ____ in Forecasting, Forecasting is more influenced by ___ values than ___ values.

A

Exponential smoothing
Recent
Past

Forecasting in Tableau uses a technique known as exponential smoothing. Its formula shows that predictions will be influenced more by recent values than the past.
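
To see why recent values dominate, here is a minimal sketch of simple exponential smoothing, the most basic member of that family. The smoothing factor `alpha`, the function names, and the data are assumptions for illustration. Tableau's own models can also include trend and seasonal components, which this sketch leaves out.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the one-step-ahead forecast after smoothing the whole series."""
    level = series[0]
    for value in series[1:]:
        # The new level blends the latest value with the previous level.
        level = alpha * value + (1 - alpha) * level
    return level


def weights(alpha, n):
    """Weight given to the value k steps back, most recent first."""
    return [alpha * (1 - alpha) ** k for k in range(n)]


print(simple_exponential_smoothing([10, 20, 30], alpha=0.5))  # 22.5
print(weights(0.5, 3))  # [0.5, 0.25, 0.125]: older values count for less
```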

13
Q

How is MAE calculated?

A

Take the absolute difference between each Actual and Forecast value, then take the mean of those differences.

See image 3

14
Q

What is the Naive Forecasting Method?

A

The Forecast is the last observed value.

See image 4

15
Q

True/False: The Naive Forecasting method is oversimplified, but very cost-effective.

A

True

16
Q

Naive Forecasting can be better than other complex forecasting models because ____.

A

Although it’s oversimplified, it’s very cost-effective.

17
Q

What field/s do you need for a forecast?

A

Date field and a measure

Bonus: Alternative to date field: a dimension with integer values (This is uncommon)

18
Q

Instead of having a date field and a measure, what can you replace the date field with for a Forecast?

A

A dimension with integer values (This is uncommon).

19
Q

What is the difference between trend and seasonality?

A

Trend is a tendency in the data to increase or decrease over time.

Seasonality is a repeating, predictable variation in value.

20
Q

True/False: A MASE of 1 means that you have a perfect model.

A

False

21
Q

True/False: Predicting the amount of stock in your warehouse can be done with forecasting.

A

True

22
Q

When creating a forecast, Tableau automatically creates a gap between the actual and predicted values. How can you close the gap?

A

Change the Forecast indicator to an attribute.

Bonus: See “Forecast” in colors mark > drop-down arrow > Change “Dimension” to “Attribute”

Note that this cannot be found on the Analysis Tab.

23
Q

How can you find out how accurate a forecast is?

How can you find the accuracy of a single forecast value?

A

1) Look at “Quality” or MASE in “Describe Forecast” (NOT p-value)

Steps: Find it in the “Analysis” tab or
right click a data point in the forecast > “Forecast” > “Describe Forecast” > see Quality in the Summary tab or MASE in the Models tab

2) Look at Precision/Precision %

Steps: Change column measure to Precision/% or
Add Precision/% to tooltip

Tooltip Steps:
Drag the measure (y) to Marks Tooltip > click arrow > Forecast results > Precision/%

See image 5

24
Q

How can you change the Forecast length?

A

Change it in the “Forecast Options” menu.

Image 6

Bonus: Right click forecast > “Forecast” > “Forecast Options”

25
Q

How can you change the Confidence Interval of your Forecast?

A

Change it under the “Forecast Model” section of the “Forecast Options” menu.

Image 6

Right click forecast > “Forecast” > “Forecast Options” > change the prediction interval under the “Forecast Model” section

26
Q

How can you add Forecast Values to the tooltip? What are 2 Forecast Results you can show?

A

(After creating the forecast visual) Data Pane > drag the measure (usually y axis) to the Tooltip Mark (Can duplicate from columns/rows - usually columns).

Forecast Results: Precision & quality

Image 7

27
Q

What does a forecast quality of 89 mean for the data point of Dec 2020, views (the measure) being 25k?

A

The prediction quality for Dec 2020 is 89%.

28
Q

What does a forecast quality of 100 mean for the data point of Dec 2020, views (the measure) being 25k?

A

The prediction quality for Dec 2020 is 100%, the best possible.

29
Q

What does a forecast precision of ±16k views mean for the data point of Dec 2020, views (the measure) being 25k?

A

The true views could be 16k views higher or lower than the predicted 25k views.

30
Q

True/False: Forecasting in Tableau uses a technique known as exponential smoothing.

A

True

31
Q

Forecasting models (like Regression and exponential smoothing) are examples of supervised learning technique/s, meaning ____.

A

They apply a known relationship (between 2 variables) to new, unseen data.

Bonus: Neither trend lines (a basic statistical analysis feature) nor clustering (an unsupervised learning technique) are examples of supervised learning techniques.

32
Q

True/False: Trend lines make complex predictions based on a trained model.

A

False: Trend lines primarily function as a visual aid to identify patterns and trends in data over time.

33
Q

____ is/are example/s of unsupervised learning technique/s, meaning ____.

A

Clustering

It looks for similar data points & detects patterns.

34
Q

True/False: Clustering looks for patterns in the data and tries to group similar observations into clusters, knowing what each cluster represents.

A

False: it should be: “WITHOUT knowing what each cluster represents.”

35
Q

Which of these are examples of clustering?

1 - Reducing the # of colors of a pic

2 - Recognizing that a cat is hidden in a picture of dogs

3 - Social network analysis

4 - Market basket analysis (checking which items are bought together)

A

All “Yes” except “Recognizing that a cat is hidden in a picture of dogs”

No, this would require supervised learning. You would first have to teach the model which data points belong to a cat or a dog (this is called classification).

36
Q

What does the ‘k’ in K-means clustering represent?

A

The number of clusters you want to split your data into.

37
Q

How do you split your data into 5 groups, using an unsupervised learning technique?

A

Use Clustering.

Set your ‘k’ in k-means clustering to 5.
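
Outside Tableau, the same idea can be sketched with a tiny hand-written k-means. Everything here is an assumption for illustration: the data is made up, the centers are seeded with the first k points, and Tableau's own implementation may initialize and scale the data differently.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, k, iterations=10):
    """Tiny k-means for 2-D points; returns (centers, labels)."""
    centers = list(points[:k])  # assumption: seed with the first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centers, labels


# Made-up data: five well-separated pairs of points.
points = [(0, 0), (10, 0), (20, 0), (30, 0), (40, 0),
          (1, 0), (11, 0), (21, 0), (31, 0), (41, 0)]

centers, labels = kmeans(points, k=5)
print(labels)  # [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
```

Note that the algorithm only returns group numbers. It does not know what each group represents.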

38
Q

If there are blended dimensions in the view, can you use Clustering?

A

No. A blended dimension in the view prevents clustering.

39
Q

If there are dimensions present in an aggregated view, can you use Clustering?

A

Yes. When there are no dimensions present in an aggregated view, there can be no clustering in Tableau.

image 8

Note: This does not mean that the dimension needs to be aggregated, but that aggregation in general needs to be in view

40
Q

Give 3 examples of ‘if _ happens, you will not be able to save clusters to the Data pane’.

When the measures in the view are ____ and the measures you are using as clustering variables are____.

When the Clusters you want to save are on ____.

When ___ are in view.

A

When the measures in the view are disaggregated and the measures you are using as clustering variables are not the same as the measures in the view.

When the Clusters you want to save are on the Filters shelf.

When Measure Names or Measure Values are in view.

41
Q

What is added to the ‘Not Clustered’ category?

Hint: 2

A

Null values for a measure.

Categorical variables (that is, dimensions) that return * for ATTR (meaning that the values are not all identical)

42
Q

What is needed for Clustering?

A

1 dimension, 2 measures, Scatterplot diagram.

43
Q

What will happen to the visual if another measure is added?

(see image 8)

A

When you add more than two variables, you will see from the colors that the clusters start to overlap in the view. This is normal.

44
Q

True/False: You can assess the quality of the clustering result by comparing actual and predicted values.

A

False. You cannot, because clustering has no actual values to compare predictions against.

These 2 metrics are used to assess the cluster quality:

Between-group sum of squares

Within-group sum of squares

45
Q

_____ measures the separation between the clusters as the sum of squared distances between each cluster’s center, and the average value of the data set.

A

Between-group sum of squares.

Image 9

46
Q

A larger Between-group sum of squares value means ____.

A

A better separation between clusters.

Image 9

47
Q

True/False: A lower Between-group sum of squares value is better than a higher value.

A

False.

Image 9

48
Q

What quantifies the cohesion of clusters as the sum of squared distances between the center of each cluster & the individual data point in the cluster?

A

Within-group sum of squares.

Image 9
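
Both metrics can be sketched for one-dimensional data as below. The data is made up. The between-group term weights each cluster by its size, as in the usual analysis-of-variance decomposition. That weighting is an assumption here, since the cards do not mention it.

```python
def mean(values):
    return sum(values) / len(values)


def sum_of_squares(clusters):
    """Return (between_group_ss, within_group_ss) for 1-D clusters."""
    all_values = [v for cluster in clusters for v in cluster]
    grand_mean = mean(all_values)
    between = 0.0
    within = 0.0
    for cluster in clusters:
        center = mean(cluster)
        # Separation: cluster center vs. the average of the whole data set.
        between += len(cluster) * (center - grand_mean) ** 2
        # Cohesion: each point vs. its own cluster center.
        within += sum((v - center) ** 2 for v in cluster)
    return between, within


# Made-up data: two tight, well-separated clusters.
between, within = sum_of_squares([[1, 2, 3], [11, 12, 13]])
print(between, within)  # 150.0 4.0: large separation, small spread
```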

49
Q

True/False: A lower Within-group sum of squares value is better than a higher value.

A

True.

Image 9

50
Q

See Image 10.

The p-value for each variable is < 0.05. This suggests that ____.

A

The expected values of the corresponding variable differ among clusters.

In this case, all variables seem to be different enough in all clusters

51
Q

True/False: You can drag the Cluster in the marks card to the Data Panel?

A

True. Dragging the Clusters field from the Marks card to the Data pane saves it as a group.

52
Q

True/False: A trend line can be curved.

A

True. Logarithmic, exponential, and polynomial trend lines are curved.

53
Q

Name 2 types of trend lines.

A

Linear, Logarithmic

Exponential, Polynomial

54
Q

Linear regression model is used in Tableau for ____ trend lines.

55
Q

What kind of trend line uses this formula: y = a * x + b?

A

Linear.

Bonus: a & b are called the model coefficients

56
Q

Every trend line uses a different model. There are multiple models, but they all have the same goal: to ____.

A

To minimize the residual.

57
Q

What is a residual?

A

The distance between observation and trend line.

Image 11

58
Q

R^2 is what?

A

The coefficient of correlation squared (a.k.a. the correlation coefficient squared).

Like the p-value, the R^2 value is used to analyse the trend line.

The R^2 value quantifies how well the whole model explains changes in the output variable (y) given the input variable (x).

An R^2 value of 0.64, for example, means that 64% (over half) of the variation in ‘y’ can be explained by variation in ‘x’.

Bonus: a coefficient is the numerical/constant quantity placed before and multiplying the variable in an algebraic expression.

59
Q

What is R^2, and what is its range?

A

A statistical measure that indicates how much of the variation of a dependent variable is explained by an independent variable in a regression model.

It ranges from 0 (worst, your model is no better than randomness) to 1 (best, perfect fit).

image 17 shows formula

60
Q

What does R^2 = 0.33 mean?

A

33% of the variation in the dependent variable (e.g. y = species richness) is explained by the independent variable (e.g. x = distance).

This is a poor fit
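
For a linear trend line, R^2 can be sketched as one minus the ratio of unexplained variation to total variation. The helper names and the data below are assumptions for illustration, and the fit is an ordinary least-squares line y = a * x + b.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a * x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    a = s_xy / s_xx
    b = mean_y - a * mean_x
    return a, b


def r_squared(xs, ys):
    """Share of the variation in y explained by the fitted line."""
    a, b = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    # Residual sum of squares: variation the line does not explain.
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    # Total sum of squares: all variation in y around its mean.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


distance = [1, 2, 3, 4]   # made-up x values
richness = [2, 4, 5, 8]   # made-up y values
print(round(r_squared(distance, richness), 2))
```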

61
Q

Coefficient of determination means ____.

62
Q

What is RSE?

What does it mean for a Trendline?

What is its unit in a trend line?

A

Residual Standard Error (Just called Standard Error in Tableau)

The average difference between the observed values and the trend line.

It has the same unit as the variable on the y axis.

E.g. if Species is on the y axis and RSE = 1.6, then the average difference between the observed values and the trend line is 1.6 species.

63
Q

What does RSE = 3.69 mean?

A

The model typically differs by about 3 to 4 species from the observed value.
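
A minimal sketch of the calculation, assuming the common definition for a straight-line fit: the square root of the summed squared residuals divided by the degrees of freedom (number of points minus the 2 fitted coefficients). The observed and fitted values are made up, and whether Tableau uses exactly this divisor is an assumption.

```python
import math


def residual_standard_error(observed, fitted, n_params=2):
    """Typical size of a residual, in the units of the y axis."""
    residuals = [o - f for o, f in zip(observed, fitted)]
    degrees_of_freedom = len(observed) - n_params
    return math.sqrt(sum(r ** 2 for r in residuals) / degrees_of_freedom)


observed = [10, 14, 9, 15, 12, 13]   # made-up species counts
fitted = [11, 12, 11, 13, 13, 12]    # made-up trend line values

print(round(residual_standard_error(observed, fitted), 2))  # about 1.94 species
```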

64
Q

A lower RSE is ____.

A

Better/more accurate.

65
Q

How do you calculate the CI?

A

See image 16.

Take the sample mean, then subtract and add (to get the lower and upper bound respectively) the standard error multiplied by the critical value for your confidence level (e.g. about 1.96 for 95%).
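
The calculation can be sketched as follows, assuming a normal approximation with a critical value of 1.96 for a 95% interval. For small samples a t-distribution value would be more appropriate. The function name and sample are made up for illustration.

```python
import math
import statistics


def confidence_interval(sample, z=1.96):
    """Return (lower, upper) bounds around the sample mean."""
    mean = statistics.mean(sample)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    margin = z * standard_error
    return mean - margin, mean + margin


lower, upper = confidence_interval([4, 6, 8, 10, 12])
print(round(lower, 2), round(upper, 2))
```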

66
Q

What does the confidence interval look like for a linear model and Logarithmic model?

A

See Image 13.

67
Q

The P-value of a linear/log model tells you the chance that there is ____.

A

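No relationship/correlation between the 2 variables.

More precisely: it is the probability of seeing a relationship at least this strong if there were truly none.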

68
Q

What does a p-value = 0.001 mean for a scatter plot that shows a trend line using a logarithmic model?

A

Roughly a 1 in 1,000 chance of seeing a trend this strong if there were no correlation between the 2 variables.

69
Q

The model is statistically significant if the p-value < ____.

A

0.05 (the conventional threshold).

70
Q

Which of these do you want to be high and which do you want to be low? RSE, R^2, Residual, p-value, confidence interval.

A

All but R^2 should be as small as possible. R^2 should be as high as possible.

71
Q

What is RSE called in Tableau?

A

Standard Error.

72
Q

Where can you find the RSE of a Trend line?

A

Right click trend line > Describe Trend Model > see “Standard Error.”

73
Q

Where can you find the p-value of a Trend line?

A

Hover over the trend line or Right click trend line > Describe Trend Model.

74
Q

Where can you find the R^2 of a Trend line?

A

Hover over the trend line or Right click trend line > Describe Trend Model.

75
Q

How can you add CIs around the trend line?

A

Right click trend line > “Edit All Trend Lines” > Select “confidence bands.”

CI = Confidence Interval.

Image 14

76
Q

How can you change the model type of a trend line?

A

Right click trend line > “Edit All Trend Lines” > Select options under “Model Type.”

Image 14

77
Q

What does “standard error of the mean” mean?

A

How far the sample mean is likely to deviate from the population mean.

Image 15

79
Q

Regression and exponential smoothing are ____ models in Tableau

A

Forecasting

80
Q

True/False: Trend Lines use supervised learning

A

False, you do not need to define target variables or train the model on labeled data to use trend lines

Trend lines aren’t making predictions

81
Q

Give 5 examples of ‘if _ happens, you will not be able to create a cluster’.

When there is a _____ in the view.

When you are using a____ data source.

When ___ are the variables (inputs) for clustering.

When there are no ___ that can be used as ___ for clustering in the view.

When there are no ____ present in an aggregated view.

A

When there is a blended dimension in the view.

When you are using a cube (multidimensional) data source.

When Measure Names or Measure Values are the variables (inputs) for clustering.
Also:
* Table calculations
* Blended calculations
* Ad-hoc calculations
* Generated latitude/longitude values
* Groups
* Sets
* Bins
* Parameters
* Dates

When there are no fields that can be used as variables (inputs) for clustering in the view.

When there are no dimensions present in an aggregated view.

82
Q

What are Precision and Precision % in a Forecast?

A

Precision: shows the prediction interval distance from the forecast value for the configured confidence level. Units are those of the forecast measure (usually the y axis/non-time variable).

Precision %: shows precision as a percentage of the forecast value.

Both provide more detail about the forecast value, not the forecast as a whole.

83
Q

What are 2 statistical measures used for analyzing a trend line?

A

R^2

P-value