Domain 1: Connect To and Transform Data (24%) Flashcards

1
Q

In what two places in the Tableau ecosystem can Tableau data sources reside?

A

1) Embedded in a workbook (.twbx)
2) As a standalone data source (.tds). Published Data Sources allow for centralization and scaling of curated data sources

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What are the seven items to consider before you build a data source in Tableau?

A

1) Location and access
2) Shape and cleanliness
3) The data model and combining data
4) Metadata and customization
5) Scalability, security, and discoverability
6) Performance and freshness

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What are the four main purposes of a metadata layer in a Tableau data source?

A

1) Calculated fields to be added
2) Make table and field names understandable
3) Set the default settings for the fields
4) Adjustments that should be made to support Tableau Catalog

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Can an embedded data source be included in Tableau Catalog?

A

No, only published data sources are included.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

How do Tableau Relationships work?

A

They establish the POTENTIAL for joins between two tables based on related fields. They do not merge data together to create a new, fixed table. During analysis, Tableau will query the relevant tables automatically using the contextually-appropriate joins to generate a custom table of data for that analysis.

Maintains the appropriate level of detail, does not lose data, keeps appropriate aggregations, and handles nulls.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

How do unions work?

A

Merge two or more tables of data to form a new, fixed table of data. Used to append (add new rows of) data across the same basic column structure.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

How do joins work?

A

They merge two tables of data based on a join clause and join type to form a new, fixed table of data. Often used to add new columns of data across the same basic row structure.

May cause data loss with some join types if fields are not present in all tables. May cause data duplication if tables are at different levels of detail.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

How do blends work?

A

Work across two or more separate Tableau data sources. Data remains separate. Tableau queries the data sources independently and visualizes the results together in the view, based on the linking fields established for that sheet. Mimics the behavior of a left join and may filter data from secondary data sources.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Can you use custom SQL in Tableau?

A

Yes, it’s possible to combine data with Custom SQL in Tableau. Most of the time, it’s best to combine data directly in the canvas or with data blending. This allows Tableau the greatest flexibility for optimization.

However, in certain scenarios, there may be reasons to use Custom SQL. Keep in mind doing so can have performance implications because Tableau is forced to execute the Custom SQL statement as written.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

When are Relationships a better choice, vs. a Join or Blend?

A

Relationships are useful when the data is at different levels of detail, or granularity. Relationships won’t duplicate data stored at different levels of detail.

While joins can filter data, relationships always preserve all measures.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

When is a Join a better choice, vs. Relationships or Blends?

A

When you want to deliberately filter or duplicate data (as in, when two tables are at different levels of granularity).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

When is a Blend a better choice, vs. Joins or Relationships?

A

If you need to relate two different Tableau data sources (.tds) or cubes in the same viz. Blends are the only way to do this.

Blends query each data source independently, aggregate the results to the appropriate level, and then present the results together visually in the view. Because of this, blends can handle different levels of detail and also work with published data sources.

Blends are resource-intensive, so are not recommended for large datasets.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

How do you make a Blend?

A

To create a blend in a workbook, you need to connect to at least two data sources.

Then bring a field from one data source to the sheet—it becomes the primary data source (blue check mark appears). Switch to the other data source and use a field on the same sheet—it becomes a secondary data source (orange check mark appears).

A linking icon will appear in the data pane, indicating which field(s) are being used to blend the data sources.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What type of join does a blend simulate?

A

Data blending simulates a traditional left join.

The main difference between the two is when the aggregation is performed. A join combines the data and then aggregates. A blend aggregates and then combines the data.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Can you union two tables with different columns?

A

For best results, the tables that you combine using a union must have the same structure. That is, each table must have the same number of fields, and related fields must have matching field names and data types.

The result of combining data using a union is a virtual table that has the same columns but extends vertically by adding rows of data.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

How do you create a union in Tableau Desktop?

A

On the data source page, double-click New Union to set up the union. This option appears when you bring in at least two data connections that match.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

What exactly are extracts?

A

Tableau Data Extracts are snapshots of data optimized for aggregation and loaded into system memory to be quickly recalled for visualization.

Extracts tend to be much faster than live connections, especially in more complex visualizations with large data sets, filters, calculations, etc.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

What are some factors to be considered when using a live connection?

A

With live connections, your data queries are only as fast as the database itself.

Workbook speeds are affected by a variety of factors, including your network speed, traffic on that network, and any custom SQL.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

Name the main live connections available on Tableau Cloud

A

Tableau Online currently supports live connections to the following cloud-hosted data sources:

Amazon Redshift
Google
Hive and Impala on Amazon Elastic MapReduce
HP Vertica
Microsoft AZzure and SQL Server
MySQL
PostgreSQL
SAP HANA
Spark SQL
Snowflake

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

Why bother making published data sources?

A

With published data sources, users don’t need to establish connections to databases themselves. Publishing a data source to Tableau Online also captures any metadata you’ve built in Tableau Desktop:
* Calculated fields
* Groups
* Sets
* Hierarchies
* Etc

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

When would you use logical tables, vs. physical tables?

A

If you want to limit the amount of data in your extract and use additional extract properties like filters, aggregation, or Top N, you should select Logical Tables.

This option also works well when your data includes pass-through functions (RAWSQL).

Tableau uses Logical Tables as the default structure for storing extract data.

If you choose this option and your extract includes joins, the joins will be applied when the extract is created.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

When would you use physical tables, vs. logical tables?

A

you should select Physical Tables if your extract consists of tables combined with equality joins and meets the Conditions for using the Physical Tables option.

This option performs joins at query time and can potentially improve performance and reduce the size of the extract file.

When the extract is stored as physical tables, you can’t append data to it.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

Can you still use .tde for extracts, vs. .hyper?

A

Beginning with version 10.5, when you create a new extract it uses the .hyper format. Extracts in the .hyper format take advantage of the improved data engine, which supports faster analytical and query performance for larger data sets.

Similarly, when an extract-related task is performed on a .tde extract using version 10.5 and later, the extract is upgraded to a .hyper extract.

After a .tde extract is upgraded to a .hyper extract, it can’t be reverted back to .tde extract.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

How can Data Interpreter help to clean Excel files?

A

Data Interpreter can give you a head start when cleaning your data. It can detect things like titles, notes, footers, empty cells, and so on and bypass them to identify the actual fields and values in your data set.

It can also detect additional tables and sub-tables so that you can work with a subset of your data independently of the other data.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
Q

What are .ttde or .hhyper files?

A

You might notice .ttde or .hhyper files when navigating your computer’s directory. When you create a Tableau data source that connects to your data, Tableau creates a .ttde or .hhyper file. This file, also known as a shadow extract, is used to help improve the speed your data source loads in Tableau Desktop.

Although a shadow extract contains underlying data and other information similar to the standard Tableau extract, a shadow extract is saved in a different format and can’t be used to recover your data.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
26
Q

Can custom SQL slow down or speed up performance within Tableau?

A

Using custom SQL can slow the performance of a workbook.

In order to perform the operations necessary for building views in Tableau Desktop, Tableau must be able to control WHERE, GROUP BY and other SQL clauses. Because a custom SQL query may contain those elements, and Tableau cannot inject them into the existing SQL, Tableau must wrap the custom SQL statement within a select statement.

When a custom SQL connection is performing slowly, or produces an error, it is often the result of the custom SQL wrapping that Tableau Desktop performs.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
27
Q

Why would a user want to use custom SQL?

A

You can use custom SQL to union your data across tables, recast fields to perform cross-database joins, restructure or reduce the size of your data for analysis, etc.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
28
Q

How do you connect to custom SQL?

A

After connecting to your data, double-click the “New Custom SQL” option on the Data Source page. Then, type or paste the query into the text box. The query must be a single SELECT* statement.

29
Q

Name compatible custom SQL statements with Tableau

A

SELECT = Retrieve data
FROM = Data source
WHERE = Filter
UNION ALL = Union data
CAST…AS = Change data type
GROUP BY = Group
LEFT JOIN…ON = Form a left join on a given key field

30
Q

Name the steps to replace a data source within a workbook.

A

Note: You must have at least one field in the view to make the Replace Data Source option available.

1) Open a workbook that connects to the original data source.

2) Select Data > New Data Source and then connect to the new data source.

3) On the Data Source page, drag a table to the canvas to set up the data source (if this isn’t automatically done for you).

4) Go to the sheet tab and select Data > Replace Data Source.

5) In the Replace Data Source dialog box, select the Current data source and the Replacement data source.

31
Q

How do Data Roles in Prep help you clean data?

A

Data roles help you set up expectations about a data field, so Prep can do the heavy lifting of analyzing the data quality.

Tableau Prep recognizes a set of real-world entities including the eight geographic roles Tableau Desktop knows about, as well as email addresses and URLs.

32
Q

Besides the standard Clean steps, how can you clean data using Prep?

A

You can also clean your data using a pivot step or a script step to apply R or Python scripts to your flow.

Note: Script steps aren’t supported in Tableau Cloud yet.

33
Q

What Clean steps are offered within Prep?

A

Filtering
Adding
Renaming
Splitting
Grouping
Removing fields

34
Q

In Prep, can you do Clean operations in the Output step?

A

No

35
Q

In Prep, can you do Clean operations in the Input step? If so, which ones?

A

Hide Field

Filter

Rename Field

Change Data Type (this option is currently supported for Microsoft Excel, text and PDF files, Box, Dropbox, Google Drive, and OneDrive data sources. All other data sources can be changed in a clean step)

36
Q

Prep can make cleaning recommendations in all steps except Input, Output, and Join steps. How do you accept the recommendations?

A

Do one of the following:

  • Click the light bulb icon in the top right corner of the profile card.
  • From the toolbar, click the Recommendations drop-down arrow to view all recommendations for your data set and select a recommendation from the list.

This option only appears when recommended changes are identified by Tableau Prep.

37
Q

In Prep, how do you add values?

A

To add a new value you need to create a Group:

1) In the Profile pane or Results pane, select the field you want to edit.

2) Click More options and select Group Values (Group and Replace in previous versions) > Manual Selection from the context menu.

3) In the left pane of the Group Values editor, click the “plus” to add a new value.

4) Type a new value in the field and press Enter to add it.

38
Q

Can you organize the Data Pane in Desktop? If so, how?

A

You can group by Folder and by Table.
* Grouping by folder lets you manually decide how to organize fields.
* Grouping by table puts dimensions and measures together under their table. This structure is useful when fields with similar names appear in multiple tables.

39
Q

In Desktop, how do you group fields in the Data Pane by Folder?

A

After you set the Group by option to Folder, you must create your folders.

1) In the Data pane, select the fields you want to group together and right-click (or right-click an empty area in the Data pane).

2) Select Folders > Create Folder.

3) When prompted, name the new folder.

39
Q

How do you filter fields in the Data Pane of Desktop?

A

You can filter the Data pane to only show certain types of fields, or by text.

Select the Filter by icon next to the search box to show the clickable list of options: dimensions, measures, calculations, or fields with comments.
Clear the filter using the X in the search box.

40
Q

How do you change the name of a field in the Data Pane?

A

Click field name in the Data pane and hold the mouse button down until the field name is shown in an edit box. Simply type the new name and hit Enter.

41
Q

Can you combine fields from within Desktop?

A

Yes. You might want to combine fields to create a cross-product of members from different dimensions.

To combine the fields, select multiple dimensions in the Data pane and then right-click (control-click on a Mac) the fields and select Create > Combined Field.

42
Q

Within Desktop, can you always pivot a data source?

A

When working with Microsoft Excel, text file, Google Sheets, and .pdf data sources, you can pivot your data from crosstab format into columnar format.

But if you are working with other data sources, you must Pivot using custom SQL.

43
Q

How do you pivot values when working on compatible data sources?

A

After you have set up the data source, in the grid, select two or more columns.

Click the drop-down arrow next to the column name, and then select Pivot.

New columns called “Pivot field names” and “Pivot field values” are created and added to the data source. The new columns replace the original columns that you selected to create the pivot.

44
Q

When using custom SQL to pivot tables, which SQL statement is used?

A

When you use the UNION ALL operator in a custom SQL query, you can take values from distinct columns and put them into a new column.

45
Q

If you need to split a field, other than using the Split function in the data source window, how can you do the split?

A

Split manually using the SPLIT function in a calculated field(s)

Split manually using LEFT and RIGHT functions

Use regular expressions for fields containing mixed separators

46
Q

Does web edit support Split functions from within the data source window?

A

No, this is only in the Desktop application.

47
Q

Where do you go to get a history of extracts taken for a data source?

A

You can see when the extract was last updated and other details by selecting a data source on the Data menu and then selecting Extract > History.

48
Q

What is Granularity, as it relates to data sources?

A

Granularity refers to how detailed the data is. What does a row or record in the data set represent?

Knowing the granularity of the data is crucial to working with level of detail (LOD) expressions.

49
Q

What is Domain, as it relates to data sources?

A

Domain means the values present in the data. What values are allowed in a given field (or, FIT into a field) are determined by the domain of the field.

Example:
A well-structured data set would have a column for “Sales” and a column for “Profit”, not a single column for “Money”, because profit is a separate concept from sales.
* The domain of the Sales field would be values ≥ 0, since sales cannot be negative.
* The domain of the Profit field, however, would be all values, since profit can be negative.

50
Q

In Desktop:
What color are Discrete Measures?
What color are Continuous Measures?

A

Discrete - Blue
Continuous - Green

51
Q

What do histograms measure?

A

Histograms are used to visualize the distribution of numerical data using binning.

A histogram is similar to a bar chart, but rather than being discrete categories per bar, the rectangles making up the histogram span a bin of a continuous axis

52
Q

Does Tableau prefer wide or tall data sources?

A

Tableau prefers tall data sources because they are far easier to not only comprehend, but to aggregate.

53
Q

Why is having a data source with two tables joined preferable to a wide data source (in most cases)?

A

Normalization, like when joining two tables with a join, helps minimize duplication of data. It also provides flexibility of analysis.

54
Q

Name the join types supported by Tableau Prep

A

Left
Inner
Right
LeftOnly
RightOnly
NotInner
Full
==============
Left: For each row, includes all values from the left table and corresponding matches from the right table. When a value in the left table doesn’t have a corresponding match in the right table, you see a null value in the join results.

lnner: For each row, includes values that have matches in both tables.

Right: For each row, includes all values from the right table and corresponding matches from the left table. When a value in the right table doesn’t have a corresponding match in the left table, you see a null value in the join results.

leftOnly: For each row, includes only values from the left table that don’t match any values from the right table. Field values from the right table show as null values in the join results.

rightOnly: For each row, includes only values from the right table that don’t match any values from the left table. Field values from the left table show as null values in the join results.

notInner: For each row, includes all of the values from the right and the left table that don’t match.

Full: For each row, includes all values from both tables. When a value from either table doesn’t have a match with the other table, you see a null value in the join results.

55
Q

Using Prep, why would you want to adjust the granularity of the data?

A

To reduce the amount of data produced from the flow, or to align data with other data you might want to join or union together.

56
Q

What Default Properties can be changed in Desktop?

A

The Default Properties menu includes default settings for:
* Aggregation
* Comments
* Number formatting
* Color
* Shape
* Totals (based on the type of field)

57
Q

What is a stored procedure?

A

A stored procedure is a subroutine available to applications that access a relational database system.

When you connect to a SAP Sybase ASE, Microsoft SQL Server, or Teradata database with Tableau, you can use a stored procedure to define the connection.

58
Q

Can you automatically sort your fields?

A

Yes, this is done in the Data Source window.

59
Q

Name the three types of calculated fields

A
  • Basic expressions
  • Level of Detail (LOD) expressions
  • Table calculations
60
Q

Should you use a table calculation if the data needed is not on the view?

A

No, table calculations exclusively use data present on the visualization.

61
Q

If the granularity of your data question does not match either the granularity of the visualization or the granularity of the data source, then what type of calculation must you use?

A

Level of Detail (LOD) calculation

62
Q

In what situations will table calculations always be used?

A

If your analysis requires any of the following scenarios, use a table calculation.
* Ranking
* Recursion (e.g. cumulative totals)
* Moving calculations (e.g. rolling averages)
* Inter-row calculations (e.g. period vs. period calculations)

63
Q

Does Tableau only show aggregations?

A

If you decide you want to see all the marks in the view at the most detailed level of granularity, you can disaggregate the view. Disaggregating your data means that Tableau displays a separate mark for every data value in every row of your data source.

Disaggregating your data can be useful for analyzing measures that you may want to use both independently and dependently in the view. A scatterplot is a good example of when you would desire this.

64
Q

What is the only Mark on the Marks card that does not affect the level of detail?

A

The Tooltip mark

65
Q

What do continuous fields create when added to a viz?
What do distinct fields create?

A

Continuous fields produce axes

Discrete fields create headers

66
Q

What do Aliases do?

A

You can create aliases (alternate names) for members in a dimension so that their labels appear differently in the view.

Aliases can be created for the members of discrete dimensions only. They can’t be created for continuous dimensions, dates, or measures. Tableau doesn’t permit re-aliasing measures because it involves modifying data values themselves.

67
Q

When is the alias option not available?

A

When using a published data source, you can’t create or edit aliases.
You can only change aliases on fields that you create in your workbook.

If you didn’t publish the data source, you can duplicate the field (right click > Duplicate) and edit aliases for the duplicate field without restriction.

68
Q
A