Intro Flashcards

1
Q

Definition / equation of churn rate

A

Cancellations / total subscribers (current + new subscribers)
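
A minimal sketch of the formula above, assuming the counts come from a single billing period. The function name and the example figures are hypothetical and only illustrate the arithmetic.

```python
def churn_rate(cancellations: int, current_subscribers: int, new_subscribers: int) -> float:
    """Cancellations divided by total subscribers (current + new)."""
    total_subscribers = current_subscribers + new_subscribers
    if total_subscribers == 0:
        raise ValueError("total subscribers must be greater than zero")
    return cancellations / total_subscribers


# Hypothetical period: 50 cancellations, 900 current and 100 new subscribers.
rate = churn_rate(50, 900, 100)
print(f"Churn rate: {rate:.1%}")
```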

2
Q

What are the two types of organized observations?

A

Methodology and shape

3
Q

What is the most common shape for data?

A

Table or spreadsheet

4
Q

Variables

A

The things we measure (columns of a table)

5
Q

Observations / entity / instance

A

Rows - Individual instances of the things we are measuring

6
Q

Numerical variables

A

Both the measurement and unit of measurement (without unit a numerical variable is just a number)

7
Q

What are the two ways of getting a number?

A

Counting (discrete) or measuring (continuous)

8
Q

Whole numbers are what type of variable?

A

Discrete variable

9
Q

Partial values are what type of variable?

A

Continuous variable

10
Q

Categorical variables

A

Characteristics with words or relative values

11
Q

Nominal variable

A

A categorical variable whose values are names or labels (a named value)

12
Q

Dichotomous variable

A

A categorical variable that is binary (yes / no, true / false, on / off, 1 / 0)

13
Q

Ordinal variable

A

A categorical variable whose values have a meaningful order, often a subjective rating (for example, a ranking from 1 to 5)

14
Q

3 common messy data problems

A
  1. Typos
  2. Missing data
  3. Inconsistent coding ("three" instead of 3, or "N/A" instead of 0)
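
A small sketch of how inconsistent coding might be normalized before analysis. The lookup tables and the `clean_count` helper are hypothetical. Mapping "N/A" to 0 follows this card's example and is an assumption; in other datasets "N/A" may mean the value is missing.

```python
# Hypothetical lookup tables; extend them to match the dataset at hand.
WORD_TO_NUMBER = {"zero": 0, "one": 1, "two": 2, "three": 3}
ZERO_MARKERS = {"n/a", "na", "none"}  # assumption: these codes stand for 0 here


def clean_count(raw) -> int:
    """Convert one inconsistently coded count into an integer."""
    text = str(raw).strip().lower()
    if text in ZERO_MARKERS:
        return 0
    if text in WORD_TO_NUMBER:
        return WORD_TO_NUMBER[text]
    return int(text)  # raises ValueError on typos, which is worth surfacing


raw_values = ["3", "three", "N/A", " 2 "]
cleaned = [clean_count(value) for value in raw_values]
print(cleaned)
```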
15
Q

Missing completely at random

Vs

Missing at random

Vs

Structurally missing

A

Missing completely at random: the data was simply not entered, or not entered properly

Missing at random: we can predict whether a value is missing based on the value of another variable

Structurally missing: we don’t expect there to be a value to begin with

16
Q

Accuracy

A

A measure of how well records reflect reality

17
Q

Validity

A

The data actually measures what we think it is measuring

18
Q

Various ways that a dataset can be low quality

A

Typos
Mistakes
Missing data
Poor measurement
Duplicate observations

19
Q

What are the two types of categorical variables?

A

Ordinal (ordered)

Nominal (unordered)

20
Q

A distribution is

A

A function that shows all possible values of a variable and how frequently each value occurs
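
For a small discrete variable, a distribution can be sketched as a frequency table. The ratings below are made-up example data, not values from the course.

```python
from collections import Counter

# Hypothetical survey ratings on a 1 to 5 scale.
ratings = [1, 2, 2, 3, 3, 3, 4, 4, 5]

# Count how often each value occurs.
frequencies = Counter(ratings)

# Print each value with its count and a simple text bar.
for value in sorted(frequencies):
    count = frequencies[value]
    print(f"{value}: {count} {'#' * count}")
```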

21
Q

Interquartile range

A

The range between the first and third quartiles of the dataset

The first quartile marks the point 25% of the way into the range of data (about 25% of values fall at or below it)

The third quartile marks the point 75% of the way into the range of data (about 75% of values fall at or below it)

Here, the range of data means all values arranged from smallest to largest
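
A short sketch of the calculation using Python's standard `statistics` module. The data is made up, and the `method="inclusive"` setting is one of several quartile conventions; other tools may return slightly different quartiles for the same data.

```python
import statistics

# Hypothetical data; quantiles() sorts it internally.
data = [7, 1, 9, 3, 5, 2, 8, 4, 6]

# n=4 splits the data into quarters and returns the three cut points.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

iqr = q3 - q1
print(f"Q1={q1}, median={q2}, Q3={q3}, IQR={iqr}")
```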

22
Q

Bimodal distribution

A

A distribution with two peaks (modes)

23
Q

The act of aggregating data

A

Summarizing a numeric variable across each value of a categorical variable
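
As an illustration, the sketch below averages a numeric variable (sales) within each value of a categorical variable (region). The column names and figures are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (region, sales).
rows = [
    ("north", 120.0),
    ("south", 80.0),
    ("north", 100.0),
    ("south", 60.0),
    ("east", 50.0),
]

# Group the numeric values by category.
sales_by_region = defaultdict(list)
for region, sales in rows:
    sales_by_region[region].append(sales)

# Summarize each group; mean is one choice, sum or count work the same way.
average_sales = {region: mean(values) for region, values in sales_by_region.items()}
print(average_sales)
```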

24
Q

Correlation coefficient

A

Direction: the sign, - or +

Strength: the absolute value, from 0 (no linear relationship) to 1 (perfect linear relationship)
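
The sketch below computes a Pearson correlation coefficient by hand and then reads off direction and strength. Pearson's r is assumed here as the coefficient the card refers to; the function name and the data are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / sqrt(spread_x * spread_y)


# Hypothetical data: hours studied and quiz scores.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 6, 8, 10]

r = pearson_r(hours, scores)
direction = "+" if r > 0 else "-" if r < 0 else "none"
strength = abs(r)
print(f"r={r:.2f}, direction={direction}, strength={strength:.2f}")
```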

25
Chart for representing change over time
Line graph
26
Chart for comparing a part to the whole
Pie chart
27
Chart for showing the spread of data points in one variable
Histogram
28
A chart for comparison of two variables to understand a trend
Scatterplot, with or without a trendline
29
Univariate charts
Help us visualize the values of only one variable. Often that means measuring “how much”. A common type for counts is the bar chart.
30
Univariate chart type examples
Bar chart, histogram, density curve, box plot, univariate map
31
Bi / Multivariate charts definition
Charts that show the relationship between two or more variables
32
Multivariate chart examples
Scatterplot, line chart, bivariate map
33
Is information redundancy bad?
Not necessarily; redundancy can add clarity or emphasis
34
Linear scale
The numbers on the axis count up by a consistent interval
35
Logarithmic scale
Each step along the axis multiplies the value by a constant factor (for example 1, 10, 100); common for showing exponential growth
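A quick sketch of the difference between the two scales, assuming a base-10 logarithm. The values are made up; the point is that tenfold steps land at evenly spaced positions on the log scale but bunch up near zero on the linear one.

```python
import math

# Hypothetical values that grow tenfold at each step.
values = [1, 10, 100, 1000, 10000]

# Linear scale: position is proportional to the value itself.
linear_positions = [v / max(values) for v in values]

# Logarithmic scale (base 10): position is proportional to log10(value),
# so each tenfold increase moves the point by the same distance.
log_positions = [math.log10(v) for v in values]

for v, lin, log in zip(values, linear_positions, log_positions):
    print(f"{v:>6}  linear={lin:.4f}  log10={log:.1f}")
```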
36
Three common color scales
Sequential, diverging, categorical
37
Descriptive analysis
We describe, summarize, and visualize data so that patterns can emerge. Most of the time this is the first step in the analysis process.
38
Descriptives / summary statistics
Central tendency: mean, median, mode. Spread: range, quartiles, variance, standard deviation, distribution.
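Most of these statistics are available in Python's standard `statistics` module. The sketch below uses made-up data and the sample (not population) versions of variance and standard deviation, which is an assumption about which version is wanted.

```python
import statistics

# Hypothetical measurements.
data = [2, 4, 4, 5, 7, 9]

summary = {
    # Central tendency
    "mean": statistics.mean(data),
    "median": statistics.median(data),
    "mode": statistics.mode(data),
    # Spread
    "range": max(data) - min(data),
    "variance": statistics.variance(data),  # sample variance
    "std_dev": statistics.stdev(data),      # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```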
39
Exploratory analysis
Typically the next step after descriptive analysis. We look for relationships between variables in our dataset.
40
Clustering analysis
Can use Principal Component Analysis (PCA), which compresses the variables into principal components that can be plotted against each other. The plotted points are then checked with k-means clustering to confirm the groupings.
41
Inferential analysis
We test a hypothesis on a sample of a population and then extend our conclusions to the whole population. Typically uses an A/B test. As a rule of thumb, the sample should be at least 10% of the population and must be randomly selected.
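One common way to evaluate an A/B test is a two-proportion z-test, sketched below with only the standard library. This is an illustrative choice, not necessarily the test the course has in mind; the function name and the conversion counts are hypothetical.

```python
from math import erf, sqrt


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Return (z, two-sided p-value) for the difference between two proportions."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled proportion under the hypothesis that A and B convert equally.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Standard normal CDF evaluated at |z|, via the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value


# Hypothetical random samples: 200 of 1000 convert on A, 250 of 1000 on B.
z, p_value = two_proportion_z_test(200, 1000, 250, 1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference seen in the sample is unlikely to be chance alone, which is what justifies extending the conclusion to the population.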
42
Causal analysis
Carefully designed experiments, usually with the following: only one variable changed at a time; all other variables carefully controlled; repeated multiple times with the same results
43
Good experimental design
Replication, randomization, control
44
Causal inference with observational data
Requires: advanced techniques to identify a causal effect; meeting very strict conditions; appropriate statistical tests
45
Predictive analysis
Uses supervised machine learning techniques. It will only be as good as the training data used to train the algorithm.