Einstein Discovery Data Prep and Create Stories Flashcards by Connor Waskom

You can remedy data issues in two ways:

Fix the issue in the CRM Analytics dataset using data prep tools.

Correct the issue in the story using story settings. Story fixes don’t affect data in the dataset.

Data Prep Terminology

Variables - Categories of data. The columns in a dataset.
Observations - Sets of values. The rows in a dataset.
Data Type - Numerical, categorical (text), or date

Max # of observations in a dataset

20M

What is a Story

A story contains answers, explanations, predictions, and suggested actions that are arranged into an organized presentation with logical flow and related sections. The story is filled with insights about your data as they relate to the outcome you’re interested in. Einstein Discovery walks you through what has happened and why, what has changed, what is likely to happen, and what you can do about it.

Two Types of Stories

Insights Only - only descriptive

Insights and Predictions - all insight types.

2 ways to create a story

Dataset or Template

Ways to create a Story from a Dataset

Create -> Create from dataset.
While viewing a lens.
From dataset dropdown.

Stories and Security Predicates

All users who access the story can see the results of the story. They don’t need the same row-level access as the story creator.

What data in a dataset is a story based on?

A snapshot of the data. The initial snapshot is taken when the story is created. If data has changed in the source dataset, users with sufficient privileges can refresh the story based on the most recent data. Otherwise, subsequent changes to the story do not affect the snapshot, and subsequent changes to the dataset are ignored.

Occurrences

Performs an extensive query analysis of dataset values by calculating the number of times a value occurs in a column, including interactions with other columns. For example, the color red occurs 30% of the time in an Automobile dataset, and of those rows the most frequent body type is coupe.
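
A minimal Python sketch of the idea behind this card, not Einstein Discovery's actual implementation. The rows, column positions, and function names are hypothetical; the point is only to show a value's share within one column and its most frequent partner value in another column.

```python
from collections import Counter

# Hypothetical rows from an automobile dataset: (color, body_type).
rows = [
    ("red", "coupe"), ("red", "coupe"), ("red", "sedan"),
    ("blue", "sedan"), ("blue", "suv"), ("white", "sedan"),
    ("white", "suv"), ("black", "coupe"), ("black", "sedan"),
    ("silver", "suv"),
]

def value_share(rows, column_index):
    """Return each value's share of all rows for one column."""
    counts = Counter(row[column_index] for row in rows)
    total = len(rows)
    return {value: count / total for value, count in counts.items()}

def most_frequent_within(rows, filter_index, filter_value, target_index):
    """Among rows where one column equals filter_value,
    return the most common value of another column."""
    subset = Counter(
        row[target_index] for row in rows if row[filter_index] == filter_value
    )
    return subset.most_common(1)[0][0]

color_share = value_share(rows, 0)
top_body_for_red = most_frequent_within(rows, 0, "red", 1)
print(color_share["red"], top_body_for_red)
```

In this made-up sample, red accounts for 3 of 10 rows, and coupe is the most frequent body type among the red rows.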

What does template overview provide?

Description, List of Supported Objects, Sample Insights

Issue: Story concurrency limits exceeded

No more than two stories can be created concurrently

Dataflow run limits exceeded

During app creation, a story template runs a dataflow twice: once to create the dataset used to train the predictive model, and once to use the predictive model to generate prediction scores and write them back to CRM.

App creation fails if you exceed the maximum number of dataflow runs in your org in a 24-hour period.

Data Sync-related limits exceeded

Story templates can add objects to Data Sync. If the org has already reached the maximum number of Data Sync objects, app creation fails.

Data Sync-related errors

If app creation triggers Data Sync-related errors, address them in Data Manager before trying to create the app from the template again.

Elements of a story interface

Story Headline - Name of story, goal, most recent version
Story Toolbar
Variables Panel - List of explanatory variables and their correlation to the outcome
Story Version Summary - Summary of insights, version comparison
Insight Summary Panels - List of variables, ordered by correlation, that positively or negatively impact the story's outcome

What does a story headline contain?

The basis of the story.
Story Name
Version Update
Story Goal
Story Version

What does the story version summary contain?

Goal
Row Count (# obs in analysis)
# Change in Row Count from previous version
Outcome Avg
% Change in outcome avg from previous version
What changed between versions

How to Edit Story

Open the story.
Click Edit Story.

You can change columns and update the story to reflect the latest dataset changes.

Use the Correlation column to see how much each field contributed to the outcome. Remove columns that have little to no impact.

What column contains fields that you can improve, such as fields with outliers or duplicates?

Data Alert Column

What can you edit in the general settings tab?

Analysis Type (Insights Only or Insights and Predictions)

Algorithm (GLM, GBM, XGBoost, Random Forest)
- Select Model Tournament to have ED run all algorithms and show the results of the algorithm that performed best.

Validation Type:

Training/Validation Ratio
Validation Dataset (you can specify a CRM Analytics dataset). You only see datasets that match the schema of your story’s dataset.

None (default) - uses only k-fold validation.
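
To make the validation options concrete, here is a rough Python sketch of a ratio split versus k-fold splits. This is an illustration only: the function names are hypothetical, and the card does not say how many folds Einstein Discovery uses or whether it shuffles rows, so both are assumptions here (no shuffling, caller-chosen k).

```python
def ratio_split(n_rows, training_ratio):
    """Split row indices into training and validation sets by a ratio such as 0.8."""
    cut = round(n_rows * training_ratio)
    indices = list(range(n_rows))
    return indices[:cut], indices[cut:]

def k_fold_splits(n_rows, k):
    """Yield (training, validation) index lists.
    Across the k folds, every row is used for validation exactly once."""
    indices = list(range(n_rows))
    for fold in range(k):
        validation = indices[fold::k]  # every k-th row, starting at `fold`
        held_out = set(validation)
        training = [i for i in indices if i not in held_out]
        yield training, validation

train, val = ratio_split(10, 0.8)
print(len(train), len(val))

for training, validation in k_fold_splits(10, 5):
    print(validation)
```

With a ratio, part of the data is held out once; with k-fold, each row takes a turn in the validation set, so no separate validation data is needed.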

Configure Number Variables

Change settings for individual number fields in your story.
In Story Settings, click a number field. You can:

Analyze for Bias (select to exclude a variable from the model. A shield icon appears next to the title of the insight to remind you that it’s a sensitive variable)

Transform - Replace missing values, projected predictions

Bucket Values By (count, width, manual)

Number of Buckets - Specify the number of buckets to show in charts

Include Only - Adds min and max values to the starting value and ending value fields

Preview - The graph shows how many values fall into each number range.
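
The difference between bucketing by width and by count can be sketched in Python as follows. This is an assumption-laden illustration, not Einstein Discovery's bucketing algorithm: the function names and the sample values are hypothetical.

```python
def width_buckets(values, n_buckets):
    """Equal-width buckets: every bucket spans the same numeric range."""
    low, high = min(values), max(values)
    step = (high - low) / n_buckets
    return [(low + i * step, low + (i + 1) * step) for i in range(n_buckets)]

def count_buckets(values, n_buckets):
    """Equal-count buckets: every bucket holds roughly the same number of values."""
    ordered = sorted(values)
    size, extra = divmod(len(ordered), n_buckets)
    buckets, start = [], 0
    for i in range(n_buckets):
        end = start + size + (1 if i < extra else 0)  # spread the remainder
        buckets.append(ordered[start:end])
        start = end
    return buckets

amounts = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # one large value skews the range
print(width_buckets(amounts, 3))
print(count_buckets(amounts, 3))
```

With skewed data like this sample, width-based buckets put nearly every value in the first range, while count-based buckets keep the groups similar in size.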

What are projected predictions?

Providing trending data for numeric variables that factor into your predictions to make them more accurate

Configure Projected Predictions

Provide a dataset that contains trend data.
Tell the story about the dataset:
Unique identifier column
Variable column (maps to the selected variable in the story)
Time interval column
Number of time intervals to project ahead
Seasonality (auto, none, or a number)

What is Fuzzy Matching?

Adds uniformity to spelling variations in variables.
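
As a rough illustration of the concept (not how Einstein Discovery implements it), the Python standard library's `difflib.get_close_matches` can map spelling variations onto a canonical value. The canonical list, the `normalize` helper, and the 0.8 cutoff are all hypothetical choices for this sketch.

```python
from difflib import get_close_matches

# Hypothetical canonical spellings for a categorical variable.
CANONICAL = ["California", "New York", "Texas"]

def normalize(value, canonical=CANONICAL, cutoff=0.8):
    """Return the closest canonical spelling, or the value unchanged
    when nothing is similar enough."""
    matches = get_close_matches(value, canonical, n=1, cutoff=cutoff)
    return matches[0] if matches else value

raw_values = ["Califronia", "New Yrok", "Florida"]
print([normalize(v) for v in raw_values])
```

Values with no sufficiently similar canonical spelling are left as they are, so genuinely distinct categories are not merged.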

How to track story versions?

ED keeps previous versions of stories so you can track your progress.
View - Version History on the story toolbar
Cancel a version - Cancel the story before submitting it
Compare versions - 'What Changed' button

Disparate Impact alert

significant discrepancy in the way different classes are being treated

Proxy Variable alert

one or more variables highly correlated to a sensitive variable

Outliers Alert

uncommonly large or small numbers

Strongest Predictors Alert

A variable that is so highly correlated with the outcome that it should be examined for possible data leakage.

Multicollinearity Alert

Two or more variables are highly correlated with each other and would have a duplicate impact on the outcome variable.
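
One way to picture this alert is a pairwise Pearson correlation check, sketched below in Python. The card does not say which measure or threshold Einstein Discovery uses, so the 0.9 threshold, the column names, and the helper functions are assumptions for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric columns.
    Assumes neither column is constant (nonzero variance)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def highly_correlated_pairs(columns, threshold=0.9):
    """Return pairs of column names whose absolute correlation meets the threshold."""
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(columns[a], columns[b])) >= threshold:
                pairs.append((a, b))
    return pairs

# Hypothetical columns: the two price columns carry the same information.
columns = {
    "price_usd": [10, 20, 30, 40, 50],
    "price_eur": [9, 18, 27, 36, 45],
    "quantity": [3, 1, 4, 1, 5],
}
print(highly_correlated_pairs(columns))
```

When a pair like this is flagged, keeping only one of the two variables avoids counting the same effect twice.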

High cardinality Alert

variable contains more than 100 unique values
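
The check described on this card can be expressed in a few lines of Python. The limit of 100 comes from the card; the function name and sample values are hypothetical.

```python
HIGH_CARDINALITY_LIMIT = 100  # from the card: more than 100 unique values

def has_high_cardinality(values, limit=HIGH_CARDINALITY_LIMIT):
    """Return True when a column contains more unique values than the limit."""
    return len(set(values)) > limit

account_ids = [f"id-{i}" for i in range(101)]  # 101 distinct values
regions = ["West", "East", "West", "North"]    # 3 distinct values
print(has_high_cardinality(account_ids), has_high_cardinality(regions))
```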

Missing values alert

variable is missing a high percentage of values

Identical values alert

all values of variable are identical

recommended buckets alert

Indicates that, for a numerical value, Einstein Discovery devised an alternative set of buckets (grouping of data points based on ranges).

dominant values alert

Indicates that most values in a variable are in the same category, which can limit its contribution to the analysis.

No Correlation Alert

Indicates that this variable explains no variation in the outcome and has no statistical significance.

Imbalanced Distribution Alert

Indicates a disproportionate ratio of observations in each class in training data.
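
A small Python sketch of what a disproportionate class ratio looks like. The 20% minimum share is a hypothetical cutoff chosen for illustration; the card does not state the threshold Einstein Discovery applies.

```python
from collections import Counter

def class_shares(outcomes):
    """Return each outcome class's share of the observations."""
    counts = Counter(outcomes)
    total = len(outcomes)
    return {label: count / total for label, count in counts.items()}

def is_imbalanced(outcomes, min_share=0.2):
    """Flag the data when the rarest class falls below min_share (assumed cutoff)."""
    return min(class_shares(outcomes).values()) < min_share

outcomes = ["won"] * 95 + ["lost"] * 5  # hypothetical training outcomes
print(class_shares(outcomes), is_imbalanced(outcomes))
```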

Potential data Leakage Alert

Indicates that a value in an explanatory variable always results in the same outcome value, which may indicate data leakage. Data leakage occurs when your training data contains the information that you’re trying to predict. Leakage results in models that score optimistically high in training but perform less accurately on live data. To produce more realistic models that perform better at predicting outcomes, investigate and remove leaky predictor variables from your model.
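
The condition on this card (a value that always results in the same outcome) can be sketched in Python as follows. The rows, field names, and the `min_rows` guard are hypothetical, and this is not Einstein Discovery's detection logic.

```python
from collections import defaultdict

def leaky_values(rows, variable, outcome, min_rows=2):
    """Return values of `variable` whose rows all share a single outcome value.
    Values seen in fewer than min_rows rows are ignored."""
    outcomes_by_value = defaultdict(list)
    for row in rows:
        outcomes_by_value[row[variable]].append(row[outcome])
    return {
        value: seen[0]
        for value, seen in outcomes_by_value.items()
        if len(seen) >= min_rows and len(set(seen)) == 1
    }

# Hypothetical opportunity rows: "stage" already encodes the outcome.
rows = [
    {"stage": "Closed Won", "region": "West", "won": True},
    {"stage": "Closed Won", "region": "East", "won": True},
    {"stage": "Negotiation", "region": "West", "won": False},
    {"stage": "Negotiation", "region": "East", "won": True},
    {"stage": "Closed Lost", "region": "West", "won": False},
    {"stage": "Closed Lost", "region": "East", "won": False},
]
print(leaky_values(rows, "stage", "won"))
print(leaky_values(rows, "region", "won"))
```

Here the stage values "Closed Won" and "Closed Lost" determine the outcome completely, which is the kind of variable to investigate and likely remove, while region does not.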

Area Under the curve quality alert

Indicates that this binary classification model's AUC metric is so high or low that it warrants further examination.

R2 Quality Alert

Indicates that this model's R2 metric is so high or low that it warrants further examination.

Cross-Validation Failure Alert

Indicates that cross-validation failed for this model.

Rename or move a story

Edit Story toolbar -> Properties
For App, you can change the app.
For Story Name, you can change the name.
Save.

How to delete a story

Edit Story menu -> Delete. Once you delete a story, it can't be recovered.

Einstein Discovery Data Prep and Create Stories Flashcards

(44 cards)