Domain 2: Explore and Analyze Data (41%) Flashcards

Question

Name the functions that reveal just parts of a string in a calculated string field.

Answer 1

LEFT(string, number) - Returns the left-most number of characters in the string. RIGHT(string, number) - Returns the right-most number of characters in the string. MID(string, start, [length]) - Returns a string starting at the specified start position. The first character in the string is position 1. If the optional numeric argument length is added, the returned string includes only that number of characters.
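
The 1-based positions used by these functions are easy to get wrong, so here is a minimal Python sketch of the behavior described above. The helper names left, right, and mid are hypothetical stand-ins written for illustration; they are not Tableau's implementation.

```python
from typing import Optional


def left(string: str, number: int) -> str:
    """Emulate LEFT: the first `number` characters."""
    return string[:number] if number > 0 else ""


def right(string: str, number: int) -> str:
    """Emulate RIGHT: the last `number` characters."""
    return string[-number:] if number > 0 else ""


def mid(string: str, start: int, length: Optional[int] = None) -> str:
    """Emulate MID: `start` is 1-based, `length` is optional."""
    begin = max(start - 1, 0)  # convert the 1-based start to a 0-based index
    if length is None:
        return string[begin:]
    return string[begin:begin + max(length, 0)]
```

For example, mid("Calculation", 2, 5) keeps five characters starting at the second one.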

Answer 2

LEN(string) Example: LEN("Matador") = 7

Answer 3

LOWER(string) UPPER(string) PROPER(string)

Answer 4

LTRIM(string) RTRIM(string)

Answer 5

REPLACE(string, substring, replacement) Searches <string> for <substring> and replaces it with <replacement>. If <substring> is not found, the string is not changed. Example: REPLACE("Version 3.8", "3.8", "4x") = "Version 4x"

Answer 6

SPLIT(string, delimiter, token number) Returns a substring from a string, using a delimiter character to divide the string into a sequence of tokens. A positive token number counts tokens from the left; a negative token number counts from the right. Examples: * SPLIT("a-b-c-d", "-", 2) = "b" * SPLIT("a|b|c|d", "|", -2) = "c"
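
The token numbering can be sketched in Python as follows. The function name split_token is hypothetical, and returning an empty string for a token number of 0 or one that is out of range is an assumption made for this sketch, not documented Tableau behavior.

```python
def split_token(string: str, delimiter: str, token_number: int) -> str:
    """Emulate SPLIT: positive numbers count from the left, negative from the right."""
    tokens = string.split(delimiter)
    if token_number > 0:
        index = token_number - 1            # 1-based from the left
    elif token_number < 0:
        index = len(tokens) + token_number  # -1 is the last token
    else:
        return ""                           # assumption: 0 yields an empty string
    if 0 <= index < len(tokens):
        return tokens[index]
    return ""                               # assumption: out of range yields an empty string
```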

Answer 7

Numeric literals: written as numbers. String literals: written with quotation marks. Date literals: written with the # symbol. Boolean literals: written as either true or false. Null literals: written as null.

Answer 8

Parentheses can be used as needed to force an order of precedence.

Answer 9

To add a comment to a calculation, type two forward slash (//) characters.

Answer 10

You use ELSEIF to specify additional conditions after the initial IF test. You use ELSE to cover any condition not already matched. It's the catch-all logical statement.

Answer 11

Evaluates the expression and compares it to the specified options (<value1>, <value2>, etc.). When a value that matches the expression is encountered, CASE returns the corresponding return. If no match is found, the (optional) default is returned. If there is no default and no values match, then Null is returned. Example: CASE [Season] WHEN 'Summer' THEN 'Sandals' WHEN 'Winter' THEN 'Boots' ELSE 'Sneakers' END "Look at the Season field. If the value is Summer, then return Sandals. If the value is Winter, then return Boots. If none of the options in the calculation match what is in the Season field, return Sneakers."

Answer 12

Tests a series of expressions and returns the <then> value for the first true <test>. Example: IF [Season] = "Summer" THEN 'Sandals' ELSEIF [Season] = "Winter" THEN 'Boots' ELSE 'Sneakers' END "If Season = Summer, then return Sandals. If not, look at the next expression. If Season = Winter, then return Boots. If neither of the expressions is true, return Sneakers."
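
To contrast the two forms, here is a small Python sketch of the Season example: a value lookup with a default (CASE-like) next to a chain of boolean tests (IF/ELSEIF-like). The function names are hypothetical and Python is used only to illustrate the logic.

```python
def case_footwear(season: str) -> str:
    """CASE-like: compare one expression against a list of values, with a default."""
    options = {"Summer": "Sandals", "Winter": "Boots"}
    return options.get(season, "Sneakers")  # ELSE 'Sneakers'


def if_footwear(season: str) -> str:
    """IF/ELSEIF-like: evaluate tests in order and return on the first true one."""
    if season == "Summer":
        return "Sandals"
    elif season == "Winter":
        return "Boots"
    else:
        return "Sneakers"
```

Both return the same results here. The IF form could also test arbitrary boolean conditions, which a plain value lookup cannot.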

Answer 13

In Tableau, CASE statements are easier to read and perform better than IF statements when the data is simple and doesn't require complex TRUE/FALSE evaluations. CASE statements are also better for evaluating a list of values. However, IF statements are more flexible and allow boolean logic in the test. IF statements are also easier to read when nested.

Answer 14

IFNULL(expr1, expr2) Returns <expr1> if it's non-null, otherwise returns <expr2>. Example: IFNULL([Assigned Room], "TBD") "If the Assigned Room field isn't null, return its value. If the Assigned Room field is null, return TBD instead." Compare with ISNULL and ZN: * IFNULL always returns a value. ISNULL returns a boolean. * ZN replaces a null with zero.
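
The difference between the three null helpers can be sketched in Python, using None to stand in for a Tableau null. The function names are hypothetical and only mirror the behavior described above.

```python
def ifnull(expr1, expr2):
    """IFNULL-like: expr1 unless it is null, otherwise expr2."""
    return expr1 if expr1 is not None else expr2


def isnull(expr) -> bool:
    """ISNULL-like: a boolean saying whether the value is null."""
    return expr is None


def zn(expr):
    """ZN-like: zero in place of null, otherwise the value itself."""
    return 0 if expr is None else expr
```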

Answer 15

IIF(<test>, <then>, <else>, [<unknown>]) Checks whether a condition is met (<test>), and returns <then> if the test is true, <else> if the test is false, and an optional value for <unknown> if the test is null. If the optional unknown isn't specified, IIF returns null when the test is null. Example: IIF([Season] = 'Summer', 'Sandals', 'Other footwear') "If Season = Summer, then return Sandals. If not, return Other footwear" IIF doesn't have an equivalent to ELSEIF (like IF) or repeated WHEN clauses (like CASE). Instead, multiple tests can be evaluated sequentially by nesting IIF statements as the <else> element. The first (outermost) true is returned.
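
A minimal Python sketch of the three-way behavior, with None standing in for a null test result. The function iif and its parameter names are hypothetical. Unlike a real conditional, Python evaluates all arguments before the call, so this only illustrates which value is returned.

```python
def iif(test, then, else_, unknown=None):
    """IIF-like: `then` for true, `else_` for false, `unknown` for a null test."""
    if test is None:
        return unknown
    return then if test else else_


# Nesting in the else position mimics ELSEIF: the outermost true test wins.
color = iif("A" == "A", "Red", iif("B" == "B", "Orange", "Green"))
```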

Answer 16

<expr1> IN <expr2> Returns TRUE if any value in <expr1> matches any value in <expr2>. Examples: * SUM([Cost]) IN (1000, 15, 200) "Is the value of the Cost field 1000, 15, or 200?" * [Field] IN [Set] "Is the value of the field present in the set?"

Answer 17

ABS(number)

Answer 18

ROUND(number, [decimals]) Rounds a number to a specified number of digits. CEILING(number) Rounds a number to the nearest integer of equal or greater value. FLOOR(number) Rounds a number to the nearest integer of equal or lesser value.
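
The three rounding directions can be compared with Python's standard library. The wrapper names are hypothetical. Note that Python's round resolves exact halfway cases with its own tie rule, which may differ from Tableau's, so the examples avoid ties.

```python
import math


def round_to(number: float, decimals: int = 0) -> float:
    """ROUND-like: nearest value at the given number of decimals."""
    return round(number, decimals)


def ceiling(number: float) -> int:
    """CEILING-like: nearest integer of equal or greater value."""
    return math.ceil(number)


def floor(number: float) -> int:
    """FLOOR-like: nearest integer of equal or lesser value."""
    return math.floor(number)
```

For negative numbers the directions still hold: ceiling(-2.1) moves up to -2 and floor(-2.1) moves down to -3.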

Answer 19

ATTR(expression) Returns the value of the expression if it has a single value for all rows. Otherwise returns an asterisk (*). Null values are ignored.
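
As a rough illustration, the ATTR rule can be sketched in Python over a list of row values, with None standing in for null. The function name attr is hypothetical, and returning None when every value is null is an assumption of this sketch.

```python
def attr(values):
    """ATTR-like: the single shared value, or "*" when rows disagree."""
    distinct = {v for v in values if v is not None}  # null values are ignored
    if len(distinct) == 1:
        return next(iter(distinct))
    if not distinct:
        return None  # assumption: all-null input yields null
    return "*"
```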

Answer 20

MEDIAN is not available for the following data sources: Access, Amazon Redshift, Cloudera Hadoop, HP Vertica, IBM DB2, IBM PDA (Netezza), Microsoft SQL Server, MySQL, SAP HANA, Teradata. For other data source types, you can extract your data into an extract file to use this function.

Answer 21

FIXED level of detail expressions compute a value using the specified dimensions, without reference to the dimensions in the view. INCLUDE level of detail expressions compute values using the specified dimensions in addition to whatever dimensions are in the view. INCLUDE level of detail expressions are most useful when including a dimension that isn’t in the view. EXCLUDE level of detail expressions explicitly remove dimensions from the expression—that is, they subtract dimensions from the view level of detail. EXCLUDE level of detail expressions are most useful for eliminating a dimension in the view.

Answer 22

LOD expressions allow you to compute values at the data source level and the visualization level. Most importantly, LOD expressions let you control the granularity you want to compute. They can be performed at a: * more granular level (INCLUDE), * less granular level (EXCLUDE), or * entirely independent level (FIXED). A level of detail expression has the following structure: {[FIXED | INCLUDE | EXCLUDE] <dimension declaration> : <aggregate expression>}
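
To make the granularity idea concrete, here is a Python sketch over a few made-up rows. The data, field names, and function names are all hypothetical. It mimics {FIXED [Customer] : SUM([Sales])} and AVG({INCLUDE [Customer] : SUM([Sales])}) in a view that only shows Region.

```python
from collections import defaultdict

# Hypothetical rows: (region, customer, sales)
ROWS = [
    ("East", "Ann", 100.0),
    ("East", "Ann", 50.0),
    ("West", "Ann", 25.0),
    ("West", "Bob", 200.0),
]


def fixed_sum_by_customer(rows):
    """FIXED-like: total per customer, regardless of the view's dimensions."""
    totals = defaultdict(float)
    for _region, customer, sales in rows:
        totals[customer] += sales
    return dict(totals)


def avg_include_customer_by_region(rows):
    """INCLUDE-like: sum per (region, customer), then average up to region."""
    per_pair = defaultdict(float)
    for region, customer, sales in rows:
        per_pair[(region, customer)] += sales
    by_region = defaultdict(list)
    for (region, _customer), total in per_pair.items():
        by_region[region].append(total)
    return {region: sum(t) / len(t) for region, t in by_region.items()}
```

Ann's FIXED total spans both regions, while the INCLUDE version computes her sales separately inside each region before averaging.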

Answer 23

No matter what other aggregations are happening on the viz, you want to reveal the sales per customer.

Answer 24

In the Data pane, control-click and drag the measure you want to aggregate onto the desired dimension.

Answer 25

INCLUDE and EXCLUDE. FIXED is the only LOD that ignores what is in the view.

Answer 26

The Order of Operations is the order in which Tableau executes filters and other operations. Each filter stage is listed with the operations that run right after it:
1) Extract Filters
2) Data Source Filters
3) Context Filters -- Sets, conditional filters, Top N, FIXED
4) Dimension Filters -- INCLUDE, EXCLUDE, Data Blending
5) Measure Filters -- Forecasts, Table Calcs, clusters, totals
6) Table Calculation Filters -- Trend lines, reference lines

Answer 27

For each mark in the view, a Moving Calculation table calculation (sometimes referred to as a rolling calculation) determines the value for a mark in the view by performing an aggregation (sum, average, minimum, or maximum) across a specified number of values before and/or after the current value. A moving calculation is typically used to smooth short-term fluctuations in your data so that you can see long-term trends. For example, with securities data there are so many fluctuations every day that it is hard to see the big picture through all the ups and downs. You can use a moving calculation to define a range of values to summarize using an aggregation of your choice.
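
A moving average over the previous two values plus the current one might be sketched like this in Python. The function and parameter names are hypothetical. Near the start of the partition, the window simply contains fewer values.

```python
def moving_average(values, previous=2, following=0):
    """Average each value with up to `previous` values before and `following` after."""
    result = []
    for i in range(len(values)):
        start = max(0, i - previous)
        window = values[start:i + following + 1]
        result.append(sum(window) / len(window))
    return result
```

With values 10, 20, 30, 40 and the default window, the last mark averages 20, 30, and 40.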

Answer 28

For each mark in the view, a Percentile table calculation computes a percentile rank for each value in a partition. You can use a Percentile table calculation to rank the total sales for each month in a year as a percentage, rather than a whole number (for example, 1 through 10).

Answer 29

A "Difference From" table calculation. With a Difference From, Percent Difference From, or Percent From calculation, there are always two values to consider: the current value, and the value from which the difference should be calculated. In most cases, you want to calculate the difference between the current value and the previous value. But in some cases you may want something different.

Answer 30

For each mark in the view, a Percent of Total table calculation computes a value as a percentage of all values in the current partition. You can use a Percent of Total table calculation to calculate the percentage of total sales each month makes within a quarter.
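
The arithmetic is a division of each value by the partition total, sketched here in Python with made-up monthly sales for one quarter. The function name is hypothetical.

```python
def percent_of_total(values):
    """Each value as a fraction of the sum of its partition."""
    total = sum(values)
    return [v / total for v in values]


# Hypothetical monthly sales within one quarter (one partition)
quarter_shares = percent_of_total([30, 30, 60])
```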

Answer 31

RANK: Returns the standard competition rank for the current row in the partition. Identical values are assigned an identical rank. Use the optional 'asc' | 'desc' argument to specify ascending or descending order. The default is descending. RANK_DENSE: Returns the dense rank for the current row in the partition. Identical values are assigned an identical rank, but no gaps are inserted into the number sequence. RANK_MODIFIED: Returns the modified competition rank for the current row in the partition. Identical values are assigned an identical rank. RANK_PERCENTILE: Returns the percentile rank for the current row in the partition. RANK_UNIQUE: Returns the unique rank for the current row in the partition. Identical values are assigned different ranks.
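
The differences between the rank variants only show up when values tie. The Python sketch below ranks the values 6, 9, 9, 14 in descending order under four of the schemes. The function names are hypothetical, and breaking ties by row order in the unique rank is an assumption of this sketch.

```python
def competition_rank(values):
    """RANK-like: ties share a rank, and a gap follows (1, 2, 2, 4)."""
    return [1 + sum(1 for other in values if other > v) for v in values]


def dense_rank(values):
    """RANK_DENSE-like: ties share a rank, with no gaps (1, 2, 2, 3)."""
    return [1 + len({other for other in values if other > v}) for v in values]


def modified_rank(values):
    """RANK_MODIFIED-like: ties share the lower position (1, 3, 3, 4)."""
    return [sum(1 for other in values if other >= v) for v in values]


def unique_rank(values):
    """RANK_UNIQUE-like: ties get different ranks (assumed: by row order)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for position, row in enumerate(order, start=1):
        ranks[row] = position
    return ranks
```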

Answer 32

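FIRST: Returns the number of rows from the current row to the first row in the partition. The first row gets 0; later rows get sequential negative numbers (-1, -2, and so on). LAST: Returns the number of rows from the current row to the last row in the partition. The last row gets 0; the first row gets the biggest number. INDEX: Returns the index of the current row in the partition (starting at 1), without any sorting by value.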
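
For a partition of four rows, the three offsets can be listed side by side with a short Python sketch. The function name is hypothetical; each tuple holds the FIRST, LAST, and INDEX values for one row.

```python
def row_offsets(row_count):
    """Return (first, last, index) for each row of a partition, top to bottom."""
    return [
        (-position, row_count - 1 - position, position + 1)
        for position in range(row_count)
    ]
```

The first row is (0, 3, 1) and the last row is (-3, 0, 4).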

Answer 33

General, Wildcard, Condition, Top

Answer 34

Range of Values, At Least, At Most, Special. Select the Special option to filter on Null values. Include only Null values, Non-null values, or All Values.

Answer 35

To create a table calculation filter, create a calculated field, and then place that field on the Filters shelf. Filters based on table calculations do not filter out underlying data in the data set, because table calculation filters are applied last in the order of operations.

Answer 36

Yes, to apply a filter to all worksheets using a related primary data source: * On the Filters shelf, right-click the field * Select Apply to Worksheets > All Using Related Data Sources OR * Select Apply to Worksheets > All Using this Data Source

Answer 37

You may create a context filter to: * Force a filter to be carried out first * Create a dependent numerical or top N filter. You can set a context filter to include only the data of interest, and then set a numerical or a top N filter.
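
The effect on a Top N filter can be sketched in Python with made-up rows. All names and data are hypothetical. Without a context filter, Top N is computed over all data before the category filter applies. With the category filter in context, Top N is computed only on the rows that filter keeps.

```python
# Hypothetical rows: (category, product, sales)
ROWS = [
    ("Furniture", "Chair", 500),
    ("Furniture", "Table", 300),
    ("Furniture", "Lamp", 100),
    ("Tech", "Phone", 900),
    ("Tech", "Laptop", 800),
]


def top_n(rows, n):
    """Keep the n rows with the highest sales."""
    return sorted(rows, key=lambda row: row[2], reverse=True)[:n]


def in_category(rows, category):
    """Keep only rows from one category."""
    return [row for row in rows if row[0] == category]


# Category as an ordinary dimension filter: Top 2 is decided first.
without_context = in_category(top_n(ROWS, 2), "Furniture")

# Category added to context: it runs first, then Top 2.
with_context = top_n(in_category(ROWS, "Furniture"), 2)
```

Here without_context is empty because the overall top two products are both Tech, while with_context holds the top two Furniture products.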

Answer 38

To create a context filter, select Add to Context from the context menu of an existing categorical filter.

Answer 39

Charcoal black

Answer 40

A parameter is a workbook variable such as a number, date, or string that can replace a constant value in a calculation, filter, or reference line. A parameter won't do anything until it's tied to an element in the viz. Parameters can be referenced in calculations, filters, and reference lines.

Answer 41

Parameter actions let your audience change a parameter value through direct interaction with a viz, such as clicking or selecting a mark. Parameter actions open up new possibilities for creating summary values and statistics without using calculations. You can configure parameter actions to let users select multiple marks that are automatically aggregated into a single parameter value.

Answer 42

Sets are custom fields that define a subset of data based on some conditions. You can choose to show them in two ways on a viz: 1) In/Out members 2) Show only members in the set

Answer 43

To give your audience the ability to quickly modify members of a set, you can also display a Set Control. A set control is a worksheet card that is very similar to a parameter control or filter card.

Answer 44

Yes, you can combine two sets to compare the members. When you combine sets you create a new set containing either the combination of all members, just the members that exist in both, or members that exist in one set but not the other. Example: To determine the percentage of customers who purchased both last year and this year, you can combine two sets containing the customers from each year and return only the customers that exist in both sets.
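
The three ways of combining sets map onto ordinary set operations, sketched here in Python with made-up customer names.

```python
# Hypothetical customer sets
last_year = {"Ann", "Bob", "Cy"}
this_year = {"Bob", "Cy", "Dee"}

all_members = last_year | this_year     # all members of both sets
shared_members = last_year & this_year  # members that exist in both
only_last_year = last_year - this_year  # in one set but not the other

# Share of all customers who purchased in both years
repeat_share = len(shared_members) / len(all_members)
```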

Answer 45

There are two types of sets: dynamic sets and fixed sets. The members of a dynamic set change when the underlying data changes. Dynamic sets can only be based on a single dimension. Fixed sets can be based on a single dimension or multiple dimensions.

Answer 46

It's a way of grouping values to better understand their distribution. A histogram is often used in this capacity.

Answer 47

1) In the Data pane, right-click (control-click on Mac) a measure and select Create > Bins. 2) Either enter a value in the Size of bins field or have Tableau calculate a value for you. 3) After you click OK to dismiss the Create Bins dialog box, a new binned field appears in the Dimensions area of the Data pane.

Answer 48

1) Click a [Field] (bin) dimension in the Data pane and choose Convert to continuous. 2) Drag the [Field] (bin) dimension from the Data pane and drop it on the Columns shelf. 3) Drag the original [Field] field from the Measures area of the Data pane and drop it on the Rows shelf. 4) Click SUM(Field) on Rows and change the aggregation from Sum to Count.
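
What the binned field and the count do can be sketched in Python. The function names and sample values are hypothetical. Each value is assigned to the bin whose label is the lower bound of its range, and the histogram counts values per bin.

```python
import math
from collections import Counter


def bin_lower_bound(value, size):
    """Label a value with the lower limit of its bin."""
    return math.floor(value / size) * size


def histogram(values, size):
    """Count how many values fall into each bin, ordered by bin."""
    counts = Counter(bin_lower_bound(v, size) for v in values)
    return dict(sorted(counts.items()))
```

With a bin size of 10, the values 12 and 18 both land in the bin labeled 10.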

Answer 49

1) In the Data pane, drag a field and drop it directly on top of another field. 2) Drag additional fields into the hierarchy as needed. You can also re-order fields in the hierarchy by dragging them to a new position.

Answer 50

Groups are useful for both correcting data errors (e.g., combining CA, Calif., and California into one data point) as well as answering "what if" type questions (e.g., "What if we combined the East and West regions?").

Answer 51

There are multiple ways to create a group. * You can create a group from a field in the Data pane, or * By selecting data in the view and then clicking the group icon. When you create groups in Tableau, you have the option to group all remaining, or non-grouped members into an "Other" group.

Answer 52

* Proportional symbol maps * Choropleth maps (filled maps) * Point distribution maps * Density maps (heatmaps) * Flow maps (path maps) * Spider maps (origin-destination maps)

Answer 53

Choropleth maps are great for showing ratio data. Example: If you want to see obesity rates for every county across the United States, you might consider creating a choropleth map to see if you can spot any spatial trends.

Answer 54

Proportional symbol maps are great for showing quantitative data for individual locations. Example: You can plot earthquakes around the world and size them by magnitude.

Answer 55

Point distribution maps can be used when you want to show approximate locations and are looking for visual clusters of data. Example: If you want to see where all the hailstorms were in the U.S. last year, you can create a point distribution map to see if you can spot any clusters.

Answer 56

Density maps, also called heatmaps, can be used when you want to show a trend for visual clusters of data. Example: If you want to find out which areas of Manhattan have the most taxi pickups, you can create a density map to see which areas are most popular.

Answer 57

You can use flow maps to connect paths across a map and to see where something went over time. Example: You can track the paths of major storms across the world over a period of time.

Answer 58

You can use a spider map to show how an origin location and one or more destination locations interact. Example: You can connect paths between metro stations to plot them on a map, or you can track bike share rides from an origin to one or more destinations.

Answer 59

1) Click the Analytics pane. 2) In the Analytics pane, under Summarize, drag Totals into the Add Totals dialog, and drop it over either the Row Grand Totals or Column Grand Totals option. Row grand totals appear automatically on the right side of the visualization. Column grand totals appear automatically at the bottom of the visualization.

Answer 60

When you first turn on grand totals, the totals are computed using disaggregated data in the underlying data source. The discrepancy is likely because Tableau is averaging the data in the underlying data source, which may contain more values than are shown in the view. To avoid this, from the Analysis menu choose Totals > Total All Using > Average. Now the average is performed on the values you see, and not on the disaggregated data in the data source.
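
The discrepancy is the familiar difference between an average of all rows and an average of averages. A small Python sketch with made-up data (all names hypothetical):

```python
# Hypothetical underlying rows per region
SALES = {"East": [10, 20, 30], "West": [100]}

# Averages shown in the view, one per region
view_averages = {region: sum(v) / len(v) for region, v in SALES.items()}

# Default grand total: average over every underlying row
all_rows = [value for values in SALES.values() for value in values]
default_total = sum(all_rows) / len(all_rows)

# Total All Using > Average: average of the values shown in the view
displayed_total = sum(view_averages.values()) / len(view_averages)
```

The view shows 20 and 100, so a reader expects a total of 60, but the default total over the four underlying rows is 40.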

Answer 61

Reference line: a constant or computed value on the axis. Reference band: shades an area behind the marks in the view between two constant or computed values on the axis. Reference distribution: adds a gradient of shading to indicate the distribution of values along the axis. Distribution can be defined by percentages, percentiles, quantiles, or standard deviation. Note: Reference distributions can also be used to create bullet charts.

Answer 62

To add a reference line: Drag "Reference Line" from the Analytics pane into the view. Tableau shows the possible destinations. The range of choices varies depending on the type of item and the current view, and the scope can be Table, Pane, or Cell. If the view contains a line chart with multiple or dual axes, Tableau shows you an expanded drop target area with specific continuous fields. If you drop onto one of those fields, the line is added on the corresponding axis, with the specified scope.

Answer 63

A bullet graph is a variation of a bar graph developed to replace dashboard gauges and meters. The bullet graph is generally used to compare a primary measure to one or more other measures in the context of qualitative ranges of performance such as poor, satisfactory, and good. You can create a bullet graph by adding a distribution to indicate the qualitative ranges of performance, and a line to indicate the target. Show Me is the easiest way to make these, but you can do it manually.

Answer 64

Percentages: shades the interval between the specified percentages. Percentiles: shades intervals at the specified percentiles. Quantiles: breaks the view into the specified number of tiles using shading and lines. Standard Deviation: places lines and shading to indicate the specified number of standard deviations above and below the mean.

Answer 65

1) Right-click (Control-click on a Mac) on a quantitative axis and select Add Reference Line. 2) In the Add Reference Line, Band, or Box dialog box, select Box Plot. 3) Under Plot Options, specify placement for the whiskers. You can choose either: * "Data within 1.5 times the IQR" - places whiskers at a location that is 1.5 times the interquartile range, that is, 1.5 times further out than the width of the adjoining box. This is also known as a schematic box plot. * "Maximum extent of the data" - places whiskers at the farthest data point (mark) in the distribution. This is also known as a skeletal box plot. 4) Specify whether to Hide underlying marks (except outliers), that is, whether to hide all marks except those beyond the whiskers. 5) Configure the appearance of the plot by selecting a Style, Fill, Border, and Whiskers.
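
The two whisker options can be compared with a Python sketch. The function names and data are hypothetical, and the quartile method used here (the standard library's "inclusive" method) is an assumption that may not match how Tableau computes quartiles. In this sketch the 1.5 x IQR whiskers end at the most extreme data points that still fall within the fences.

```python
import statistics


def whiskers_iqr(data, k=1.5):
    """Schematic style: whiskers within k * IQR of the box; the rest are outliers."""
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return min(inside), max(inside), outliers


def whiskers_max_extent(data):
    """Skeletal style: whiskers at the farthest data points."""
    return min(data), max(data)
```

For the values 1 through 8 plus 100, the first option stops the upper whisker at 8 and treats 100 as an outlier, while the second extends it to 100.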

Answer 66

Linear: Used when something is increasing or decreasing at a steady rate. Logarithmic: Best for data that increases or decreases quickly and then levels out. Exponential: Most useful when data values rise or fall at increasingly higher rates. Power: Best used with data sets that compare measurements that increase at a specific rate. Polynomial: Used to represent data that fluctuates, such as gains and losses over a large data set.

Answer 67

To add a trend line to a visualization: 1) Select the Analytics pane. 2) From the Analytics pane, drag Trend Line into the view, and then drop it on the Linear, Logarithmic, Exponential, Polynomial, or Power model types.

Answer 68

To create a forecast, your view must use at least one date dimension and one measure. To turn forecasting on, either: * Right-click (control-click on Mac) on the visualization and choose Forecast > Show Forecast, or * Choose Analysis > Forecast > Show Forecast.

Answer 69

* Forecast length: Determines how far into the future the forecast extends. * Source data: Use the Source Data section to specify time aggregation granularity, periods to ignore, and replacing nulls with zeroes. * Forecast model: Specifies how the forecast model is to be produced. * Prediction interval: You can set the prediction interval to 90, 95, or 99 percent, or enter a custom value.

Answer 70

Predictive modeling functions in Tableau by default use linear regression to build predictive models and generate predictions about your data.

Answer 71

MODEL_PERCENTILE tells you, as a percentile, where the observed mark falls within a range of probable values for each mark. If the percentile is very close to 0.5, the value observed is very close to the median value predicted. If the percentile is close to 0 or 1, the value observed is at the lower or upper boundaries of the model range and is relatively unexpected. You can use MODEL_QUANTILE to generate a confidence interval, missing values such as future dates, or to generate categories that don't exist in your underlying data set. These can be used to identify outliers, estimate values for sparse or missing data, and predict values for future time periods.

Answer 72

MODEL_QUANTILE(0.5,SUM([Sales]),ATTR(DATETRUNC('month',[OrderDate]))) * MODEL_QUANTILE - This is the predictive modeling function. It specifies a line to be drawn. * (0.5, - Defines the quantile as the median. * SUM([Sales]), - This is the target expression. * ATTR(DATETRUNC('month',[OrderDate]))) - This is the predictor expression. Note: The ATTR was needed to make sure that all arguments are aggregations.

Answer 73

Linear regression (default): Use when you have only one predictor, and that predictor has a linear relationship with your target metric. Regularized linear regression: Use when you have multiple predictors, especially when those predictors have a linear relationship to the target metric and those predictors are likely affected by similar underlying relationships or trends. Gaussian process regression: Use when you have time or space predictors, or when you're using predictors that might not have a linear relationship with the target metric.

(99 cards)