Einstein Discovery Data Prep and Create Stories Flashcards

1
Q

You can remedy data issues in 2 ways:

A

Fix the issue in the CRM Analytics dataset using data prep tools.

Correct the issue in the story using story settings. Story fixes don’t affect data in the dataset.

2
Q

Data Prep Terminology

A

Variables - Categories of data. Columns in the dataset.
Observations - Rows in the dataset.
Data Type - Numerical, Categorical (text), Date

3
Q

Max # of observations in a dataset

A

20M

4
Q

What is a Story

A

A story contains answers, explanations, predictions, and suggested actions that are arranged into an organized presentation with logical flow and related sections. The story is filled with insights about your data as they relate to the outcome you’re interested in. Einstein Discovery walks you through what has happened and why, what has changed, what is likely to happen, and what you can do about it.

5
Q

Two Types of Stories

A

Insights Only - only descriptive

Insights and Predictions - all insight types.

6
Q

2 ways to create a story

A

Dataset or Template

7
Q

Ways to create a Story from a Dataset

A
  1. Create -> Create from dataset.
  2. While viewing a lens.
  3. From dataset dropdown.
8
Q

Stories and Security Predicates

A

All users who access the story can see the results of the story. They don’t need the same row-level access as the story creator.

9
Q

What data in a dataset is a story based on?

A

A snapshot of the data. The initial snapshot is taken when the story is created. If data in the source dataset has changed, users with sufficient privileges can refresh the story based on the most recent data. Otherwise, subsequent changes to the story do not affect the snapshot, and subsequent changes to the dataset are ignored.

10
Q

Occurrences

A

Performs an extensive query analysis of dataset values by calculating the number of times a value occurs in a column, including interactions with other columns. For example, the color red occurs 30% of the time in an Automobile dataset, and of those rows the most frequent body type is coupe.
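The counting described on this card can be sketched in plain Python. This is only an illustration of the idea, not how Einstein Discovery runs the analysis; the sample rows and the helper names `value_share` and `most_common_given` are made up for the example.

```python
from collections import Counter

# Hypothetical sample rows as (color, body_type) pairs; not real product data.
rows = [
    ("red", "coupe"), ("red", "coupe"), ("red", "sedan"),
    ("blue", "sedan"), ("blue", "suv"), ("white", "sedan"),
    ("white", "suv"), ("black", "coupe"), ("black", "sedan"),
    ("silver", "suv"),
]

def value_share(rows, column, value):
    """Fraction of rows whose field at index `column` equals `value`."""
    matches = sum(1 for row in rows if row[column] == value)
    return matches / len(rows)

def most_common_given(rows, column, value, other_column):
    """Most frequent value of `other_column` among rows where `column` == `value`."""
    counts = Counter(row[other_column] for row in rows if row[column] == value)
    return counts.most_common(1)[0][0]

red_share = value_share(rows, 0, "red")
top_body_for_red = most_common_given(rows, 0, "red", 1)
print(f"red: {red_share:.0%} of rows, most frequent body type: {top_body_for_red}")
```

With these sample rows, red makes up 3 of 10 rows and coupe is the most frequent body type among the red rows, mirroring the card's example.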

11
Q

What does template overview provide?

A

Description, List of Supported Objects, Sample Insights

12
Q

Issue: Story concurrency limits exceeded

A

No more than two stories can be created concurrently

13
Q

Dataflow run limits exceeded

A

During app creation, a story template runs a dataflow twice: once to create the dataset used to train the predictive model, and once to use the predictive model to generate prediction scores and write them back to CRM.

If you exceed the maximum number of dataflow runs in your org in a 24-hour period, app creation fails.

14
Q

Data Sync-related limits exceeded

A

Story templates can add objects to Data Sync. If the org has already created the maximum number of Data Sync objects, app creation fails.

15
Q

Data Sync-related errors

A

If app creation triggers data sync-related errors, address them in data manager before trying to create the template again.

16
Q

Elements of a story interface

A
  1. Story Headline - Name of story, goal, most recent version
  2. Story Toolbar
  3. Variables Panel - List of explanatory variables and their correlation to the outcome
  4. Story Version Summary - Summary of insights, version comparison
  5. Insight Summary Panels - List of variables, ordered by correlation, that positively or negatively impact a story
17
Q

What does a story headline contain?

A
The basis of the story.
Story Name
Version Update
Story Goal
Story Version
18
Q

What does the story version summary contain?

A

Goal
Row Count (# obs in analysis)
# Change in Row Count from previous version
Outcome Avg
% Change in outcome avg from previous version
What changed between versions

19
Q

How to Edit Story

A

Open the story
Click Edit Story

Can change columns and update the story to the latest dataset changes

Use the Correlation column to see how much each field contributed to the outcome. Remove columns that have little to no impact.

20
Q

What column contains fields that you can improve, such as fields with outliers or duplicates?

A

Data Alert Column

21
Q

What can you edit in the general settings tab?

A

Analysis Type (insights or insights & preds)

Algorithm (GLM, GBM, XGBoost, Random Forest)
- select Model Tournament to have ED run all algorithms and show the results of algorithm that performed best

Validation Type -

  • Training/Validation Ratio
  • Validation Dataset (can specify a CRM Analytics dataset). You will only see datasets that match the schema of your story’s dataset.

None (default) - uses only k-fold validation.
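For reference, k-fold validation splits the rows into k folds and validates on each fold in turn while training on the rest. The sketch below shows only that splitting step. The function name `k_fold_indices` is hypothetical, the number of folds Einstein Discovery uses is not stated on this card, and real implementations usually shuffle rows first.

```python
def k_fold_indices(n_rows, k):
    """Yield (train_indices, validation_indices) for each of k folds.

    Rows are split in order, without shuffling, into k nearly equal folds.
    """
    # The first (n_rows % k) folds get one extra row each.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_rows))
        yield train, validation
        start += size

# Example: 10 rows, 3 folds (k=3 is an arbitrary choice for illustration).
folds = list(k_fold_indices(10, 3))
for train, validation in folds:
    print(f"train on {len(train)} rows, validate on rows {validation}")
```

Every row lands in exactly one validation fold, so each observation is used for validation once and for training k - 1 times.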

22
Q

Configure Number Variables

A

change settings for individual numbers in your story.
On Story settings, click number field. Can:

analyze for bias (select to exclude a variable from the model. A SHIELD icon will appear next to the title of the insight to remind you it’s a sensitive variable)

Transform - Replace missing values, projected predictions

Bucket Values by (count, width, manual)

Number of buckets - specify number of buckets to show in charts

Include only – adds min and max values to starting values and ending value fields

Preview - Graph shows the number of values that fall in each number range.
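The difference between bucketing by width and by count is easiest to see on a skewed column. The sketch below is an illustration under assumptions: the function names are hypothetical, the sample values are made up, and the exact edge rules Einstein Discovery applies are not given on this card. Manual bucketing simply means supplying the edge list yourself.

```python
def bucket_by_width(values, n_buckets):
    """Equal-width bucket edges: each bucket spans the same numeric range.

    Assumes the values are not all identical.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_buckets
    edges = [lo + i * width for i in range(n_buckets + 1)]
    edges[-1] = hi  # avoid floating-point drift on the last edge
    return edges

def bucket_by_count(values, n_buckets):
    """Equal-count bucket edges: each bucket holds roughly the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    # Edge i is the value found i/n_buckets of the way through the sorted list.
    edges = [ordered[min(n - 1, (i * n) // n_buckets)] for i in range(n_buckets)]
    edges.append(ordered[-1])
    return edges

def assign_bucket(value, edges):
    """Index of the bucket [edges[i], edges[i+1]) holding value; the last bucket is closed."""
    for i in range(len(edges) - 2):
        if value < edges[i + 1]:
            return i
    return len(edges) - 2

values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # one large value skews the range
width_edges = bucket_by_width(values, 3)
count_edges = bucket_by_count(values, 3)
manual_edges = [0, 5, 10, 100]             # manual: edges chosen by hand

print("width:", [assign_bucket(v, width_edges) for v in values])
print("count:", [assign_bucket(v, count_edges) for v in values])
```

With equal widths, nine of the ten values fall into the first bucket and the middle bucket is empty; with equal counts, the values spread across all three buckets.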

23
Q

What are projected predictions?

A

Providing trending data for numeric variables that factor into your predictions to make them more accurate

24
Q

Configure Projected Predictions

A

Provide dataset that contains trend data.
Tell story data about the dataset:
Unique identifier column
Variable column (maps to the selected variable in the story)
Time interval column
Number of time intervals to project ahead
seasonality (auto or none or number)

25
Q

What is Fuzzy Matching?

A

Adds uniformity to spelling variations in variables.
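As a rough illustration of the idea, spelling variations can be mapped onto a canonical spelling with the standard library's `difflib`. This is a sketch, not the matching method Einstein Discovery uses; the function name `normalize_spelling`, the canonical list, and the 0.8 cutoff are all assumptions.

```python
import difflib

def normalize_spelling(values, canonical, cutoff=0.8):
    """Map each value to its closest canonical spelling, if one is similar enough.

    Comparison is case-insensitive; values with no close match are left unchanged.
    """
    lowered = {c.lower(): c for c in canonical}
    result = []
    for value in values:
        match = difflib.get_close_matches(
            value.lower(), list(lowered), n=1, cutoff=cutoff
        )
        result.append(lowered[match[0]] if match else value)
    return result

raw = ["California", "Californa", "california", "Texas", "Texs", "Ohio"]
cleaned = normalize_spelling(raw, canonical=["California", "Texas"])
print(cleaned)
```

Here "Californa" and "california" collapse into "California", "Texs" into "Texas", and "Ohio" is left alone because nothing in the canonical list is close to it.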

26
Q

How to track story versions?

A

ED keeps previous versions of stories so you can track your progress.

View - Version History on story toolbar
cancel a version - Cancel story before submitting it
compare versions - ‘what changed’ button

27
Q

Disparate Impact alert

A

significant discrepancy in the way different classes are being treated

28
Q

Proxy Variable alert

A

one or more variables highly correlated to a sensitive variable

29
Q

Outliers Alert

A

uncommonly large or small numbers
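One common way to flag uncommonly large or small numbers is the interquartile range (IQR) rule, sketched below. The card does not say which rule Einstein Discovery applies, so the IQR rule, the function name `iqr_outliers`, and the multiplier `k=1.5` are assumptions for illustration.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical amounts with one value far outside the rest.
amounts = [12, 15, 14, 13, 16, 15, 14, 13, 500]
print(iqr_outliers(amounts))
```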

30
Q

strongest predictors alert

A

variable that is so highly correlated with the outcome that it should be examined for possible data leakage.

31
Q

Multicollinearity alert

A

2 or more variables highly correlated. Would have duplicate impact on outcome variable
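A simple way to spot this situation is to compute the Pearson correlation between every pair of numeric columns and list the pairs above a threshold. The sketch below assumes a threshold of 0.9 and uses hypothetical column names; it is not Einstein Discovery's own detection logic.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlated_pairs(columns, threshold=0.9):
    """Return column-name pairs whose absolute correlation is at or above threshold."""
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(columns[a], columns[b])) >= threshold:
                pairs.append((a, b))
    return pairs

# Hypothetical columns: the same height recorded in two units, plus an unrelated one.
columns = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "age": [40, 22, 35, 58, 29],
}
print(correlated_pairs(columns))
```

The two height columns carry the same information, so keeping both would count that information twice; dropping one of them is the usual fix.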

32
Q

High cardinality Alert

A

variable contains more than 100 unique values

33
Q

Missing values alert

A

variable is missing a high percentage of values

34
Q

Identical values alert

A

all values of variable are identical
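The column-level checks on cards 32 to 34 (high cardinality, missing values, identical values) can be sketched as one small function. The 100-unique-value limit comes from card 32; the 50% missing-value threshold is an assumption, since the card only says "a high percentage", and the function name `column_alerts` is hypothetical.

```python
def column_alerts(values, max_unique=100, max_missing_share=0.5):
    """Return the names of the alerts a single column would raise.

    `None` marks a missing value. `max_missing_share` is an assumed threshold.
    """
    alerts = []
    present = [v for v in values if v is not None]
    unique = set(present)
    if len(unique) > max_unique:
        alerts.append("high cardinality")
    if values and (len(values) - len(present)) / len(values) > max_missing_share:
        alerts.append("missing values")
    if present and len(unique) == 1:
        alerts.append("identical values")
    return alerts

print(column_alerts(list(range(101))))               # 101 distinct values
print(column_alerts([None, None, None, "a", "b"]))   # 60% missing
print(column_alerts(["US"] * 5))                     # every value the same
```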

35
Q

recommended buckets alert

A

Indicates that, for a numerical value, Einstein Discovery devised an alternative set of buckets (grouping of data points based on ranges).

36
Q

dominant values alert

A

Indicates that most values in a variable are in the same category, which can limit its contribution to the analysis.

37
Q

No Correlation Alert

A

Indicates that this variable explains no variation in the outcome and has no statistical significance.

38
Q

Imbalanced Distribution Alert

A

Indicates a disproportionate ratio of observations in each class in training data.

39
Q

Potential data Leakage Alert

A

Indicates that a value in an explanatory variable always results in the same outcome value, which may indicate data leakage. Data leakage occurs when your training data contains the information that you’re trying to predict. Leakage results in models that score optimistically high in training but perform less accurately on live data. To produce more realistic models that perform better at predicting outcomes, investigate and remove leaky predictor variables from your model.
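The condition in the first sentence, a value that always results in the same outcome, can be checked directly. The sketch below uses made-up opportunity rows and a hypothetical helper `leaky_values`; the `min_rows` guard is an assumption added so that a value seen only once is not flagged.

```python
from collections import defaultdict

def leaky_values(rows, variable, outcome, min_rows=2):
    """Return values of `variable` that always co-occur with a single outcome value."""
    outcomes_by_value = defaultdict(list)
    for row in rows:
        outcomes_by_value[row[variable]].append(row[outcome])
    return sorted(
        value
        for value, outcomes in outcomes_by_value.items()
        if len(outcomes) >= min_rows and len(set(outcomes)) == 1
    )

# Hypothetical training rows: "stage" encodes the outcome we are trying to predict.
rows = [
    {"stage": "Closed Won", "region": "West", "won": True},
    {"stage": "Closed Won", "region": "East", "won": True},
    {"stage": "Closed Lost", "region": "West", "won": False},
    {"stage": "Closed Lost", "region": "East", "won": False},
    {"stage": "Negotiation", "region": "West", "won": True},
    {"stage": "Negotiation", "region": "East", "won": False},
]
print(leaky_values(rows, "stage", "won"))
print(leaky_values(rows, "region", "won"))
```

In this sample, the "Closed Won" and "Closed Lost" stages fully determine the outcome, so `stage` is a candidate to investigate and remove, while `region` is not flagged.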

40
Q

Area Under the curve quality alert

A

Indicates that this binary classification model’s AUC metric is so high or low that it warrants further examination.

41
Q

R2 Quality Alert

A

Indicates that this model’s R2 metric is so high or low that it warrants further examination.

42
Q

Cross-Validation Failure Alert

A

Indicates that cross-validation failed for this model.

43
Q

Rename or move a story

A

Edit Story Toolbar -> Properties
For App, can change app
For story name, changes name
save

44
Q

How to delete a story

A

edit story menu -> delete. Once you delete it can’t be recovered.