Descriptive Analytics Flashcards

1
Q

Data

A

Set of pieces of individual information (can be likes, tweets, etc.)

2
Q

3 Questions to Ask With New Data

A

1) How many variables are in the dataset?
2) How many observations in the dataset?
3) What level are the observations (customer, transaction, store, etc.)?
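
A minimal sketch of answering the three questions in code, assuming the data is held as a list of dictionaries. The column names and values are made up for illustration, and the `is_unique` helper is hypothetical.

```python
# Hypothetical toy dataset: one dictionary per row.
transactions = [
    {"transaction_id": 101, "customer_id": 1, "amount": 20.0},
    {"transaction_id": 102, "customer_id": 1, "amount": 35.5},
    {"transaction_id": 103, "customer_id": 2, "amount": 12.0},
]

n_variables = len(transactions[0])   # 1) number of variables (columns)
n_observations = len(transactions)   # 2) number of observations (rows)


def is_unique(rows, column):
    """Return True if no value of `column` repeats across rows."""
    values = [row[column] for row in rows]
    return len(set(values)) == len(values)


# 3) The level of observation is the identifier that is unique per row.
transaction_level = is_unique(transactions, "transaction_id")
customer_level = is_unique(transactions, "customer_id")

print(n_variables, n_observations, transaction_level, customer_level)
```

Here `customer_id` repeats while `transaction_id` does not, so each row is a transaction rather than a customer.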

3
Q

Quantitative

A

Numerical
Scale is meaningful: magnitudes and differences between values can be compared

4
Q

Qualitative

A

Set of categories

Coding gender {m, f, o} as {0, 1, 2} doesn’t mean o is double the value of f; the numbers are only labels

5
Q

Discrete

A

Takes only certain, separate values (e.g., counts such as 0, 1, 2, ...)

6
Q

Continuous

A

Can take on any numerical value within a range

7
Q

Variation

A

Observations of one variable take on different values; e.g., not all values of “churn” will be the same in the data set

8
Q

Mean

A

Average

Sensitive to outliers

9
Q

Median

A

Middle value once the observations are sorted

Ignores a lot of information
Can move abruptly
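
A small sketch of how the mean and the median react to an outlier, using Python's standard `statistics` module. The spend figures are invented for illustration.

```python
from statistics import mean, median

# Hypothetical monthly spend for five customers.
spend = [20, 22, 25, 27, 30]
spend_with_outlier = spend + [500]  # add one extreme customer

mean_before, mean_after = mean(spend), mean(spend_with_outlier)
median_before, median_after = median(spend), median(spend_with_outlier)

print(mean_before, mean_after)      # the mean jumps toward the outlier
print(median_before, median_after)  # the median barely moves
```

The median only uses the ordering of the values, which is why it ignores how extreme the 500 is.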

10
Q

Mode

A

Most common value

Not good for continuous variables
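
A quick sketch of why the mode suits discrete data better than continuous data, using `statistics.multimode` (Python 3.8+). Both lists are made-up examples.

```python
from statistics import multimode

purchases = [1, 2, 2, 3, 2, 1]       # discrete counts
weights = [70.2, 68.9, 71.4, 69.7]   # continuous measurements

purchase_modes = multimode(purchases)  # one clear most-common value
weight_modes = multimode(weights)      # every value occurs once, so all tie

print(purchase_modes)
print(weight_modes)
```

With continuous measurements almost every value is unique, so the "most common value" tells you little unless the data is binned first.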

11
Q

Frequency Distribution

A

Table showing the number and fraction of observations for which a variable takes on each possible value

Variable    # Obs    % of Overall Obs
A
B
C
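
A minimal sketch of building such a table with `collections.Counter`. The `segment` values are hypothetical.

```python
from collections import Counter

# Hypothetical categorical variable (e.g., a customer segment).
segment = ["A", "B", "A", "C", "A", "B", "A", "C"]

counts = Counter(segment)
n = len(segment)

# Each row: value, number of observations, share of all observations.
freq_table = [(value, count, count / n) for value, count in sorted(counts.items())]

for value, count, share in freq_table:
    print(f"{value:<8}{count:>6}{share:>10.1%}")
```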

12
Q

Range of Values

A

Important if you:

Care about the likelihood of a very good outcome (e.g., a blockbuster)

Want to avoid a bad outcome (e.g., healthcare)

Care about inequality/dispersion (e.g., inventory management)

13
Q

Interquartile Range

A

Difference between the 75th and 25th percentiles (Q3 - Q1)
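
A short sketch comparing the interquartile range with the full range, using `statistics.quantiles`. The data is invented, and the `"inclusive"` method is just one of several percentile conventions; other tools may give slightly different quartiles.

```python
from statistics import quantiles

# Hypothetical values with one large observation at the top.
values = [2, 4, 4, 5, 7, 8, 9, 11, 12, 40]

# n=4 splits the data into quartiles and returns the three cut points.
q1, q2, q3 = quantiles(values, n=4, method="inclusive")

iqr = q3 - q1                            # spread of the middle 50%
value_range = max(values) - min(values)  # spread of everything

print(q1, q3, iqr, value_range)
```

The single value of 40 stretches the full range but leaves the IQR small, since the IQR only looks at the middle half of the data.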

14
Q

Standard Deviation

A

sqrt(variance)

gives the typical variation around the mean, in the units of the variable

lower means data close to the mean

higher means data spread out
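
A minimal sketch with two made-up data sets that share the same mean but differ in spread. It assumes the population formulas (`pvariance`, `pstdev`); the sample versions (`variance`, `stdev`) divide by n - 1 instead of n.

```python
from math import isclose, sqrt
from statistics import mean, pstdev, pvariance

tight = [48, 49, 50, 51, 52]    # values close to the mean
spread = [10, 30, 50, 70, 90]   # values far from the mean

sd_tight = pstdev(tight)
sd_spread = pstdev(spread)

# The standard deviation is the square root of the variance.
same_as_sqrt = isclose(sd_tight, sqrt(pvariance(tight)))

print(mean(tight), mean(spread))  # both means are 50
print(sd_tight, sd_spread)        # the second is much larger
```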

15
Q

Coefficient of Variation

A

SD/Mean

Compares degree of variation between data sets
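
A small sketch of why SD/Mean helps when data sets are on different scales. The helper name `coefficient_of_variation` and both data sets are assumptions for illustration, and the population standard deviation is used.

```python
from math import isclose
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (unitless)."""
    return pstdev(values) / mean(values)


store_sales_usd = [900, 1000, 1100]  # large numbers, in dollars
basket_items = [9, 10, 11]           # small numbers, in items

cv_sales = coefficient_of_variation(store_sales_usd)
cv_items = coefficient_of_variation(basket_items)

print(pstdev(store_sales_usd), pstdev(basket_items))  # SDs differ by 100x
print(cv_sales, cv_items)                             # relative variation matches
```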

16
Q

Summary Statistics

A

For numerical variables: # obs, mean, median, std dev, min, max
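
A minimal sketch of collecting these statistics for one numerical variable. The `summarize` helper is hypothetical, and the sample standard deviation (`statistics.stdev`) is an assumption; the card does not say which version is meant.

```python
from statistics import mean, median, stdev


def summarize(values):
    """Return the usual summary statistics for a numerical variable."""
    return {
        "n_obs": len(values),
        "mean": mean(values),
        "median": median(values),
        "std_dev": stdev(values),  # sample standard deviation (divides by n - 1)
        "min": min(values),
        "max": max(values),
    }


summary = summarize([1, 2, 3, 4, 5])
for name, value in summary.items():
    print(f"{name:<8}{value}")
```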

17
Q

Data will have

A

Randomness
Measurement Error
True Unexplained Factors (things we didn’t realize impacted our variable of interest)

19
Q

Tools to Uncover Systematic Relationships

A

Scatter Plot
Binned Scatter Plot
Coefficient of Correlation
Conditional Means
Cross Tab

20
Q

Binned Scatter Plot

A

Reduces the number of plotted points by grouping x values into bins and plotting the conditional mean of y within each bin
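
A rough sketch of the binning step, assuming equal-width bins on x (quantile-based bins are another common choice). The `binned_means` helper and the age/spend data are made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def binned_means(xs, ys, bin_width):
    """Group points into equal-width x bins and average y within each bin."""
    bins = defaultdict(list)
    for x, y in zip(xs, ys):
        bin_start = (x // bin_width) * bin_width  # left edge of the bin
        bins[bin_start].append(y)
    return {start: mean(vals) for start, vals in sorted(bins.items())}


ages = [21, 24, 27, 33, 35, 38, 42, 47]
spend = [10, 14, 12, 20, 22, 24, 30, 34]

# Eight raw points collapse to three plotted points, one per age bin.
points = binned_means(ages, spend, bin_width=10)
print(points)
```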

21
Q

Coefficient of Correlation

A

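Measures the direction and strength of the linear relationship between two quantitative variables

Sign = direction
Abs Val = strength
Always between -1 and 1, unitless, not a slope
Symmetric: correlation of X with Y = correlation of Y with X

A symmetric V shape gives a correlation of about 0 because the negative and positive halves cancel out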
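
A small sketch of these properties using the Pearson correlation written out by hand. The `correlation` helper and the data are illustrative assumptions.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


x = [-2, -1, 0, 1, 2]
linear = [2 * v + 1 for v in x]  # perfect positive linear relationship
v_shape = [abs(v) for v in x]    # strong relationship, but not linear

r_linear = correlation(x, linear)
r_swapped = correlation(linear, x)                       # symmetric
r_rescaled = correlation(x, [100 * v for v in linear])   # units do not matter
r_v = correlation(x, v_shape)                            # halves cancel out

print(r_linear, r_swapped, r_rescaled, r_v)
```

A correlation near 0 for the V shape does not mean the variables are unrelated, only that the relationship is not linear.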

22
Q

Conditional Means

A

Mean of y at different values of x, to show the relationship
y must be numerical
Best if x is discrete or has categories
Can be y vs a dummy variable x
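
A minimal sketch of conditional means with a dummy variable as x. The `conditional_means` helper, the `is_member` flag, and the spend values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def conditional_means(xs, ys):
    """Mean of y for each distinct value of x."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    return {x: mean(vals) for x, vals in sorted(groups.items())}


is_member = [0, 1, 0, 1, 1, 0]     # dummy variable x
spend = [20, 35, 24, 45, 40, 22]   # numerical y

by_membership = conditional_means(is_member, spend)
print(by_membership)
```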

23
Q

Cross Tab

A

Table showing how frequently certain combinations of features occur (cells can hold counts, shares, or conditional means)

E.g., airports on rows and buckets of delays on columns
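
A small sketch of the airport-by-delay example using counts in each cell. The flight records and bucket labels are invented for illustration.

```python
from collections import Counter

# Hypothetical flights: (airport, delay bucket).
flights = [
    ("ATL", "0-15 min"), ("ATL", "16-60 min"), ("ATL", "0-15 min"),
    ("ORD", "16-60 min"), ("ORD", "60+ min"), ("ORD", "16-60 min"),
]

counts = Counter(flights)
airports = sorted({airport for airport, _ in flights})  # rows
buckets = ["0-15 min", "16-60 min", "60+ min"]          # columns

# One row per airport, one count per delay bucket (0 if never observed).
cross_tab = {a: [counts[(a, b)] for b in buckets] for a in airports}

print(f"{'':<6}" + "".join(f"{b:>12}" for b in buckets))
for airport, row in cross_tab.items():
    print(f"{airport:<6}" + "".join(f"{c:>12}" for c in row))
```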

24
Q

Confounding Effects

A

Mixture of effects: the observed relationship between two variables also reflects the influence of other factors

25
Q

Descriptive Analysis Steps

A

1) Get to know data
2) Explore distribution
3) Explore correlations between key variables
4) Recognize that some variation is driven by factors we don’t know about or can’t measure