Planning Your Data & Creating a Dataset Flashcards
Identify Your Data Requirements
Think about the data you need, where it’s located, and whether it needs to be combined with other data.
Do the fields all come from the same object in Salesforce? Or multiple?
Do the fields all come from Salesforce or also other sources?
Map the Data Journey
- What route will the data take and when will it start?
- Timing is important if you have data arriving from various sources that needs to be combined
- If data isn’t available at the right time, it can’t be combined
- Tableau CRM lets you schedule your extracts and preparations so you extract the data when it’s freshest and have it when you need it
- Mapping the journey breaks the process down into a series of steps
EXAMPLE
- Use CSV Uploader to extract data from a CSV file
- Use dataflow to extract data from Salesforce objects
- Join the Salesforce objects into a single dataset
- Use a recipe to join the CSV file with the dataset
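The example journey above can be sketched as a small dependency map, where each step lists the steps it has to wait for. This is only an illustration in plain Python: the step names are hypothetical labels for the four bullets, and nothing here calls Tableau CRM.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical step names; each step maps to the steps that must finish first.
journey = {
    "upload_csv": set(),
    "extract_salesforce_objects": set(),
    "join_salesforce_objects": {"extract_salesforce_objects"},
    "recipe_join_csv_with_dataset": {"upload_csv", "join_salesforce_objects"},
}


def plan_order(steps):
    """Return one valid run order in which every step follows its prerequisites."""
    return list(TopologicalSorter(steps).static_order())


order = plan_order(journey)
print(order)
```

The recipe step always lands last because it needs both the uploaded CSV data and the joined Salesforce dataset, which is why timing across sources matters.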
Extraction
- Extraction is the process of bringing data into Tableau CRM
- There are various ways to extract data: CSV Uploader, connectors, the Tableau CRM API, and dataflows (for Salesforce data only)
- Use the Salesforce Connector to sync data from your local Salesforce org
- Use the Salesforce External Connector to sync data from a remote Salesforce org
- Create a remote connection to sync external data with Tableau CRM
- For each connector, there is information on how to create the connection, configure its properties, and track important considerations
Preparation
Involves getting the data into a form that’s meaningful to the people exploring it.
DATAFLOW
- For Salesforce data only
- Tableau CRM can sync supported local Salesforce data incrementally (only data that’s changed gets synced)
DATA PREP
- Dataset recipe tool
- Takes data from existing datasets, prepares it, and outputs the results to a new dataset
DATA SYNC
- Use this to decouple the extraction of data from recipes + dataflows
- Sync this data to Tableau CRM on a separate schedule
- This makes recipes and dataflows faster
Dataflows
- Dataflows are used to extract data from Salesforce objects
- The dataflow is a set of instructions in JavaScript Object Notation (JSON) that runs to extract data + create datasets
- The instructions specify which objects and fields you want to extract data from + the names of datasets you want to create; they also join data together
- Tableau CRM has tools to write the instructions for you, so you don’t really need to know JSON
- A dataflow can create lots of datasets from different objects at the same time
- You can schedule it to run regularly to keep datasets up to date
- If the dataset is already in use, make a backup before you add new instructions
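To make the JSON instructions more concrete, here is a rough sketch of what a small dataflow definition might look like, built as a Python dictionary and printed as JSON. The node names, dataset alias, and field choices are hypothetical, and the action names and parameter keys (sfdcDigest, augment, sfdcRegister) reflect my understanding of the dataflow format, so check them against the Salesforce documentation before relying on them.

```python
import json

# Hypothetical node names, fields, and dataset alias, for illustration only.
dataflow = {
    "Extract_Opportunity": {
        "action": "sfdcDigest",  # extract fields from one Salesforce object
        "parameters": {
            "object": "Opportunity",
            "fields": [{"name": "Id"}, {"name": "Name"},
                       {"name": "Amount"}, {"name": "AccountId"}],
        },
    },
    "Extract_Account": {
        "action": "sfdcDigest",
        "parameters": {
            "object": "Account",
            "fields": [{"name": "Id"}, {"name": "Name"}, {"name": "Industry"}],
        },
    },
    "Join_Opportunity_Account": {
        "action": "augment",  # join the two extracts together
        "parameters": {
            "left": "Extract_Opportunity",
            "left_key": ["AccountId"],
            "right": "Extract_Account",
            "right_key": ["Id"],
            "right_select": ["Name", "Industry"],
            "relationship": "Account",
        },
    },
    "Register_Opportunities": {
        "action": "sfdcRegister",  # name and create the dataset
        "parameters": {
            "alias": "OpportunitiesWithAccounts",
            "name": "Opportunities With Accounts",
            "source": "Join_Opportunity_Account",
        },
    },
}


def missing_sources(flow):
    """List (node, reference) pairs that point at a node not defined in the flow."""
    missing = []
    for name, node in flow.items():
        for key in ("left", "right", "source"):
            ref = node["parameters"].get(key)
            if ref is not None and ref not in flow:
                missing.append((name, ref))
    return missing


print(json.dumps(dataflow, indent=2))
print("Missing node references:", missing_sources(dataflow))
```

Saving the printed JSON before adding new nodes is one simple way to keep the backup that the last bullet recommends.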
Dataset Builder
- Dataset builder generates the JSON instructions needed to build a dataset and adds the instructions to your dataflow
- In Analytics Studio, click Create and click Dataset
- Click Salesforce Data and enter the dataset name
- Select the appropriate dataflow and click Next
- Select a root object which will appear on the dataset builder canvas
- Hover over the object and click the plus sign to add the fields you need
- Click the relationships tab to join with additional objects
- Select an app for your dataset
- Click Create Dataset
- See how the run is doing in Data Manager
Root Object
- When you create a dataset, you are creating rows and Tableau CRM first asks you for the root object
- The root object is the lowest object in the hierarchy of objects you’re extracting
- The root is the grain of a dataset, the unit of data in each row
Ex) If you select Account as the root object, you can include related objects such as User and Parent Account, but not Opportunity because it’s lower than Account
- In Opportunity, each row is an opportunity, so the opportunity record is the grain
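The idea of grain can be illustrated with a few hypothetical records in plain Python. With Opportunity as the root, every output row is one opportunity, and fields from the objects above it (Account, then the account's owner User) are looked up and attached to that row. The record values and column names are made up for the sketch.

```python
# Hypothetical sample records keyed by Id.
users = {"U1": {"Name": "Dana"}, "U2": {"Name": "Lee"}}
accounts = {
    "A1": {"Name": "Acme", "OwnerId": "U1"},
    "A2": {"Name": "Globex", "OwnerId": "U2"},
}
opportunities = [
    {"Id": "O1", "Amount": 100, "AccountId": "A1"},
    {"Id": "O2", "Amount": 250, "AccountId": "A1"},
    {"Id": "O3", "Amount": 75, "AccountId": "A2"},
]


def rows_with_opportunity_root():
    """One row per opportunity; parent fields are looked up through the hierarchy."""
    rows = []
    for opp in opportunities:
        account = accounts[opp["AccountId"]]
        owner = users[account["OwnerId"]]
        rows.append({
            "Id": opp["Id"],
            "Amount": opp["Amount"],
            "Account.Name": account["Name"],
            "Account.Owner.Name": owner["Name"],
        })
    return rows


rows = rows_with_opportunity_root()
for row in rows:
    print(row)
```

Going the other way does not work the same way: an account row has no single opportunity to look up, since Acme has two here, which is why Opportunity cannot be added when Account is the root.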
Data Prep
- Data Prep is a user interface tool for creating dataset recipes that take data from existing datasets, prepare it, and output the results to a new dataset
- Use a recipe to combine data from multiple datasets, bucket the data, add formula fields, and cleanse the data through transformation
- Remove and filter rows you don’t need
- When you create a recipe, you specify the transformations, or steps, that you want to perform on a source
- The source can be one or more datasets or connected objects
- When you run the recipe, it applies these transformations and outputs the results to a new target dataset
- To keep your target dataset up to date, you can schedule a recipe to run on a regular basis
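As a loose analogy for what a recipe does, the sketch below applies a filter, a cleansing step, a bucket, and a formula field to some hypothetical source rows and writes the result to a new list, leaving the source untouched. Field names, the bucket threshold, and the formula are assumptions chosen for illustration; real recipes are built in the Data Prep interface, not in Python.

```python
# Hypothetical source dataset rows.
source_rows = [
    {"Name": "Deal A", "Amount": 2000, "Probability": 50, "Industry": " technology "},
    {"Name": "Deal B", "Amount": None, "Probability": 20, "Industry": "retail"},
    {"Name": "Deal C", "Amount": 400, "Probability": 90, "Industry": "RETAIL"},
]


def run_recipe(rows):
    """Apply each transformation step in order and return a new target dataset."""
    target = []
    for row in rows:
        if row["Amount"] is None:  # filter: drop rows you don't need
            continue
        out = dict(row)  # copy so the source dataset is not modified
        out["Industry"] = row["Industry"].strip().title()  # cleanse
        out["SizeBucket"] = "Large" if row["Amount"] >= 1000 else "Small"  # bucket
        out["ExpectedAmount"] = row["Amount"] * row["Probability"] / 100  # formula field
        target.append(out)
    return target


target_rows = run_recipe(source_rows)
for row in target_rows:
    print(row)
```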
Handle Date Columns
When Tableau CRM loads dates into a dataset, it breaks up each date into multiple columns such as day, week, month, quarter, and year, based on the calendar year.
If your fiscal year differs from the calendar year, you can enable Tableau CRM to generate fiscal date columns as well.
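A minimal sketch of how one date might be broken into component columns, with optional fiscal columns. The column names are hypothetical, the week uses the ISO week number as a stand-in, and the fiscal year is named after the calendar year in which it starts; all three are assumptions for this sketch and may not match what Tableau CRM actually generates.

```python
from datetime import date


def date_columns(d, fiscal_start_month=1):
    """Split a date into calendar parts, plus fiscal parts if the fiscal year differs."""
    cols = {
        "Day": d.day,
        "Week": d.isocalendar()[1],  # assumption: ISO week number
        "Month": d.month,
        "Quarter": (d.month - 1) // 3 + 1,
        "Year": d.year,
    }
    if fiscal_start_month != 1:
        # Month position counted from the start of the fiscal year (1..12).
        fiscal_month = (d.month - fiscal_start_month) % 12 + 1
        cols["Month_Fiscal"] = fiscal_month
        cols["Quarter_Fiscal"] = (fiscal_month - 1) // 3 + 1
        # Assumption: the fiscal year is labeled by the year in which it starts.
        cols["Year_Fiscal"] = d.year if d.month >= fiscal_start_month else d.year - 1
    return cols


print(date_columns(date(2021, 3, 15)))
print(date_columns(date(2021, 1, 10), fiscal_start_month=2))
```

With a fiscal year starting in February, a date in January falls in calendar quarter 1 but in the last fiscal quarter of the previous fiscal year.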
Data Sync with Dataflows
Data Sync makes your dataflows run faster.
Without it, a dataflow performs a separate extract each time it needs data from a Salesforce object; the more data there is, the longer it takes to run. If you have multiple dataflows, each one performs its own duplicate extract from the same object.
With data sync, these extracts are performed as a separate process, which you can schedule to happen before your dataflows run.
This synced data is then available to all your dataflows, which run faster because they no longer have to extract any data, just load and transform it.
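The saving can be shown with a simple count. In this sketch, two hypothetical dataflows each list the Salesforce objects they need: without data sync every dataflow extracts every object it uses, while with data sync each distinct object is extracted once and shared. The dataflow and object lists are made up, and the count ignores data volume.

```python
# Hypothetical dataflows and the Salesforce objects each one reads.
dataflows = {
    "sales_dataflow": ["Opportunity", "Account", "User"],
    "service_dataflow": ["Case", "Account", "User"],
}


def extracts_without_sync(flows):
    """Each dataflow performs its own extract for every object it needs."""
    return sum(len(objects) for objects in flows.values())


def extracts_with_sync(flows):
    """Each distinct object is extracted once, then shared by all dataflows."""
    return len({obj for objects in flows.values() for obj in objects})


print("Without data sync:", extracts_without_sync(dataflows))
print("With data sync:", extracts_with_sync(dataflows))
```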
Avoid Data Drift with Periodic Full Sync
When you enable data sync, Tableau CRM syncs local Salesforce data incrementally
Records are inserted, updated, or deleted to match changes since the previous sync
Data can become out of sync over time, especially for formula fields or permanently deleted records
Use periodic full sync to keep your synced objects up to date
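The toy model below shows how drift can appear and how a full sync removes it. It assumes, purely for illustration, that incremental sync works from a log of changes and that a permanent delete never reaches that log; this is a simplified model, not a description of how Tableau CRM tracks changes internally.

```python
def incremental_sync(synced, change_log):
    """Apply only the logged changes to a copy of the synced data."""
    result = {rid: dict(rec) for rid, rec in synced.items()}
    for change in change_log:
        if change["op"] == "delete":
            result.pop(change["id"], None)
        else:  # insert or update
            result[change["id"]] = dict(change["record"])
    return result


def full_sync(source):
    """Replace the synced data with a fresh copy of everything in the source."""
    return {rid: dict(rec) for rid, rec in source.items()}


# Hypothetical state: the synced copy matched the source at the last sync.
synced = {1: {"Name": "Acme"}, 2: {"Name": "Globex"}, 3: {"Name": "Initech"}}

# Since then, record 2 was renamed (logged) and record 3 was permanently
# deleted (assumed here not to be logged).
source = {1: {"Name": "Acme"}, 2: {"Name": "Globex Corp"}}
change_log = [{"op": "update", "id": 2, "record": {"Name": "Globex Corp"}}]

after_incremental = incremental_sync(synced, change_log)
after_full = full_sync(source)

print("Drift after incremental sync:", after_incremental != source)
print("Drift after full sync:", after_full != source)
```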