Domain 1: Connect To and Transform Data (24%) Flashcards
In what two places in the Tableau ecosystem can Tableau data sources reside?
1) Embedded in a workbook (.twbx)
2) As a standalone data source (.tds). Published Data Sources allow for centralization and scaling of curated data sources
What are the seven items to consider before you build a data source in Tableau?
1) Location and access
2) Shape and cleanliness
3) The data model and combining data
4) Metadata and customization
5) Scalability, security, and discoverability
6) Performance and freshness
What are the four main purposes of a metadata layer in a Tableau data source?
1) Calculated fields to be added
2) Make table and field names understandable
3) Set the default settings for the fields
4) Adjustments that should be made to support Tableau Catalog
Can an embedded data source be included in Tableau Catalog?
No, only published data sources are included.
How do Tableau Relationships work?
They establish the POTENTIAL for joins between two tables based on related fields. They do not merge data together to create a new, fixed table. During analysis, Tableau will query the relevant tables automatically using the contextually-appropriate joins to generate a custom table of data for that analysis.
Maintains the appropriate level of detail, does not lose data, keeps appropriate aggregations, and handles nulls.
How do unions work?
Merge two or more tables of data to form a new, fixed table of data. Used to append (add new rows of) data across the same basic column structure.
How do joins work?
They merge two tables of data based on a join clause and join type to form a new, fixed table of data. Often used to add new columns of data across the same basic row structure.
May cause data loss with some join types if fields are not present in all tables. May cause data duplication if tables are at different levels of detail.
How do blends work?
Work across two or more separate Tableau data sources. Data remains separate. Tableau queries the data sources independently and visualizes the results together in the view, based on the linking fields established for that sheet. Mimics the behavior of a left join and may filter data from secondary data sources.
Can you use custom SQL in Tableau?
Yes, it’s possible to combine data with Custom SQL in Tableau. Most of the time, it’s best to combine data directly in the canvas or with data blending. This allows Tableau the greatest flexibility for optimization.
However, in certain scenarios, there may be reasons to use Custom SQL. Keep in mind doing so can have performance implications because Tableau is forced to execute the Custom SQL statement as written.
When are Relationships a better choice, vs. a Join or Blend?
Relationships are useful when the data is at different levels of detail, or granularity. Relationships won’t duplicate data stored at different levels of detail.
While joins can filter data, relationships always preserve all measures.
When is a Join a better choice, vs. Relationships or Blends?
When you want to deliberately filter or duplicate data (as in, when two tables are at different levels of granularity).
When is a Blend a better choice, vs. Joins or Relationships?
If you need to relate two different Tableau data sources (.tds) or cubes in the same viz. Blends are the only way to do this.
Blends query each data source independently, aggregate the results to the appropriate level, and then present the results together visually in the view. Because of this, blends can handle different levels of detail and also work with published data sources.
Blends are resource-intensive, so are not recommended for large datasets.
How do you make a Blend?
To create a blend in a workbook, you need to connect to at least two data sources.
Then bring a field from one data source to the sheet—it becomes the primary data source (blue check mark appears). Switch to the other data source and use a field on the same sheet—it becomes a secondary data source (orange check mark appears).
A linking icon will appear in the data pane, indicating which field(s) are being used to blend the data sources.
What type of join does a blend simulate?
Data blending simulates a traditional left join.
The main difference between the two is when the aggregation is performed. A join combines the data and then aggregates. A blend aggregates and then combines the data.
Can you union two tables with different columns?
For best results, the tables that you combine using a union must have the same structure. That is, each table must have the same number of fields, and related fields must have matching field names and data types.
The result of combining data using a union is a virtual table that has the same columns but extends vertically by adding rows of data.
How do you create a union in Tableau Desktop?
On the data source page, double-click New Union to set up the union. This option appears when you bring in at least two data connections that match.
What exactly are extracts?
Tableau Data Extracts are snapshots of data optimized for aggregation and loaded into system memory to be quickly recalled for visualization.
Extracts tend to be much faster than live connections, especially in more complex visualizations with large data sets, filters, calculations, etc.
What are some factors to be considered when using a live connection?
With live connections, your data queries are only as fast as the database itself.
Workbook speeds are affected by a variety of factors, including your network speed, traffic on that network, and any custom SQL.
Name the main live connections available on Tableau Cloud
Tableau Online currently supports live connections to the following cloud-hosted data sources:
Amazon Redshift
Google
Hive and Impala on Amazon Elastic MapReduce
HP Vertica
Microsoft AZzure and SQL Server
MySQL
PostgreSQL
SAP HANA
Spark SQL
Snowflake
Why bother making published data sources?
With published data sources, users don’t need to establish connections to databases themselves. Publishing a data source to Tableau Online also captures any metadata you’ve built in Tableau Desktop:
* Calculated fields
* Groups
* Sets
* Hierarchies
* Etc
When would you use logical tables, vs. physical tables?
If you want to limit the amount of data in your extract and use additional extract properties like filters, aggregation, or Top N, you should select Logical Tables.
This option also works well when your data includes pass-through functions (RAWSQL).
Tableau uses Logical Tables as the default structure for storing extract data.
If you choose this option and your extract includes joins, the joins will be applied when the extract is created.
When would you use physical tables, vs. logical tables?
you should select Physical Tables if your extract consists of tables combined with equality joins and meets the Conditions for using the Physical Tables option.
This option performs joins at query time and can potentially improve performance and reduce the size of the extract file.
When the extract is stored as physical tables, you can’t append data to it.
Can you still use .tde for extracts, vs. .hyper?
Beginning with version 10.5, when you create a new extract it uses the .hyper format. Extracts in the .hyper format take advantage of the improved data engine, which supports faster analytical and query performance for larger data sets.
Similarly, when an extract-related task is performed on a .tde extract using version 10.5 and later, the extract is upgraded to a .hyper extract.
After a .tde extract is upgraded to a .hyper extract, it can’t be reverted back to .tde extract.
How can Data Interpreter help to clean Excel files?
Data Interpreter can give you a head start when cleaning your data. It can detect things like titles, notes, footers, empty cells, and so on and bypass them to identify the actual fields and values in your data set.
It can also detect additional tables and sub-tables so that you can work with a subset of your data independently of the other data.
What are .ttde or .hhyper files?
You might notice .ttde or .hhyper files when navigating your computer’s directory. When you create a Tableau data source that connects to your data, Tableau creates a .ttde or .hhyper file. This file, also known as a shadow extract, is used to help improve the speed your data source loads in Tableau Desktop.
Although a shadow extract contains underlying data and other information similar to the standard Tableau extract, a shadow extract is saved in a different format and can’t be used to recover your data.
Can custom SQL slow down or speed up performance within Tableau?
Using custom SQL can slow the performance of a workbook.
In order to perform the operations necessary for building views in Tableau Desktop, Tableau must be able to control WHERE, GROUP BY and other SQL clauses. Because a custom SQL query may contain those elements, and Tableau cannot inject them into the existing SQL, Tableau must wrap the custom SQL statement within a select statement.
When a custom SQL connection is performing slowly, or produces an error, it is often the result of the custom SQL wrapping that Tableau Desktop performs.
Why would a user want to use custom SQL?
You can use custom SQL to union your data across tables, recast fields to perform cross-database joins, restructure or reduce the size of your data for analysis, etc.