Marketing Analytics Flashcards

1
Q

What is data?

A

A distinct piece of information

2
Q

Two main data types

A

Quantitative data takes on numeric values that allow us to perform mathematical operations (like the number of dogs). It can be divided into continuous and discrete.

Categorical data is used to label a group or set of items (like dog breeds: Collies, Labs, Poodles, etc.). It can be divided into ordinal and nominal.

3
Q

Categorical Ordinal vs Categorical Nominal

A

Categorical Ordinal vs. Categorical Nominal
We can divide categorical data further into two types: Ordinal and Nominal.

Categorical Ordinal data take on a ranked ordering (like rating an interaction with the dogs on a scale from Very Poor to Very Good).

Categorical Nominal data do not have an order or ranking (like the breeds of the dog).

4
Q

Continuous Vs Discrete Data

A

Continuous data can be split into smaller and smaller units, and still a smaller unit exists. An example of this is the age of the dog - we can measure the units of the age in years, months, days, hours, seconds, but there are still smaller units that could be associated with the age.

Discrete data only takes on countable values. The number of dogs we interact with is an example of a discrete data type.

5
Q

Four Aspects for quantitative Data

A

There are four main aspects to analyzing Quantitative data.

Measures of Center
Measures of Spread
The Shape of the data.
Outliers

6
Q

Measures of Center

A

Measures of Center
There are three measures of center:

Mean
Median
Mode

7
Q

Calculating the Mean

A

Sum of all values divided by the count of values
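
As a minimal sketch of that calculation, the snippet below computes the mean of a small list. The numbers are made up for illustration and are not from the course material.

```python
# Hypothetical data: any list of numbers works here.
values = [2, 4, 4, 5, 10]

# Mean = sum of all values divided by the count of values.
mean = sum(values) / len(values)
print(mean)  # 5.0
```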

8
Q

Median

A

It is the middle value of a dataset once the values have been ordered from smallest to largest. With an even number of values, the median is the average of the two middle values.
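
A small sketch of finding the median by hand. The function name `median` and the sample lists are illustrative choices, and the even-length case uses the usual convention of averaging the two middle values.

```python
def median(values):
    """Return the middle value of the data after sorting it."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: there is exactly one middle value.
        return ordered[mid]
    # Even count: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([5, 1, 3]))     # 3
print(median([4, 1, 3, 2]))  # 2.5
```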

9
Q

Mode

A

The Mode
The mode is the most frequently observed value in our dataset.

There might be multiple modes for a particular dataset, or no mode at all.

No Mode
If all observations in our dataset are observed with the same frequency, there is no mode. If we have the dataset:

1, 1, 2, 2, 3, 3, 4, 4

There is no mode, because all observations occur the same number of times.

Many Modes
If two (or more) numbers share the maximum frequency, then there is more than one mode. If we have the dataset:

1, 2, 3, 3, 3, 4, 5, 6, 6, 6, 7, 8, 9

There are two modes, 3 and 6, because each of these values appears 3 times (the maximum frequency), while all other values appear only once.
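
The two cases above can be checked with a short sketch. The helper name `modes` is hypothetical, and it follows this card's convention that a dataset has no mode when every value occurs equally often (some libraries instead return every value in that case).

```python
from collections import Counter


def modes(values):
    """Return a sorted list of modes, or [] when there is no mode."""
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    # Card's convention: several distinct values, all equally frequent -> no mode.
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == top)


print(modes([1, 1, 2, 2, 3, 3, 4, 4]))                 # []
print(modes([1, 2, 3, 3, 3, 4, 5, 6, 6, 6, 7, 8, 9]))  # [3, 6]
```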

10
Q

Notation

A

Notation is a common language used to communicate mathematical ideas

11
Q

Random Variable

A

Random Variables
A random variable is a placeholder for the possible values of some process (mostly… the term ‘some process’ is a bit ambiguous). As was stated before, notation is useful in that it helps us take complex ideas and simplify (often to a single letter or single symbol). We see random variables represented by capital letters (X, Y, or Z are common ways to represent a random variable).

We might have the random variable X, which is a holder for the possible values of the amount of time someone spends on our site. Or the random variable Y, which is a holder for the possible values of whether or not an individual purchases a product.

X is ‘a holder’ of the values that could possibly occur for the amount of time spent on our website: any number from 0 up to infinity, really.

12
Q

x1

A

First observed value of the random variable X

13
Q

Measures of Spread

A

Measures of Spread are used to provide us an idea of how spread out our data are from one another. Common measures of spread include:

Range
Interquartile Range (IQR)
Standard Deviation
Variance

14
Q

Histograms

A

Histograms

Histograms are very useful for understanding the different aspects of quantitative data, such as measures of spread.

15
Q

Calculating the 5 Number Summary

A

Calculating the 5 Number Summary

The five number summary consists of 5 values:

Minimum: The smallest number in the dataset.

Q1: The value such that 25% of the data fall below.

Q2: The value such that 50% of the data fall below.

Q3: The value such that 75% of the data fall below.

Maximum: The largest value in the dataset.

Essentially, each quartile is the median of part of the data: Q2 is the median of the whole dataset, Q1 is the median of the lower half, and Q3 is the median of the upper half.

Range
The range is then calculated as the difference between the maximum and the minimum.

IQR
The interquartile range is calculated as the difference between Q3 and Q1.
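
A rough sketch of the five number summary, range, and IQR, assuming the "median of each half" method for the quartiles. The function name `five_number_summary` and the data are illustrative, and other quartile conventions (for example, interpolation-based percentiles) can give slightly different Q1 and Q3 values.

```python
import statistics


def five_number_summary(values):
    """Five number summary using medians of the lower and upper halves.

    Assumes at least two values so that both halves are non-empty.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # With an odd count, the middle value is left out of both halves.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return {
        "min": ordered[0],
        "q1": statistics.median(lower),
        "q2": statistics.median(ordered),
        "q3": statistics.median(upper),
        "max": ordered[-1],
    }


summary = five_number_summary([1, 3, 5, 7, 9, 11, 13, 15])
data_range = summary["max"] - summary["min"]  # maximum minus minimum
iqr = summary["q3"] - summary["q1"]           # Q3 minus Q1
print(summary, data_range, iqr)
```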

16
Q

Box Plot

A

Useful for quickly comparing the spread of two datasets across a key metric

17
Q

How to measure spread with a single value?

A

Use the standard deviation or the variance

18
Q

Standard deviation vs Variance

A

Both tell us, on average, how far each point is from the mean of the data.

The standard deviation is the square root of the variance.

In practice, you usually use the standard deviation rather than the variance, because the standard deviation shares the same units as our original data, while the variance has squared units.
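
A minimal sketch of the relationship between the two measures. The data are made up, and the snippet assumes the population formulas (dividing by n); the sample versions divide by n - 1 instead.

```python
import math

# Hypothetical data, for example minutes spent on a site.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(values) / len(values)

# Variance: average squared distance from the mean (units are squared).
variance = sum((x - mean) ** 2 for x in values) / len(values)

# Standard deviation: square root of the variance (same units as the data).
std_dev = math.sqrt(variance)

print(mean, variance, std_dev)  # 5.0 4.0 2.0
```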

19
Q

What is the use of Standard deviation

A

If the data are about money (such as investment returns), a higher standard deviation is associated with higher risk.

Standard deviation is also used to judge whether an observation is part of the expected variation or unusual enough to be statistically significant.

20
Q

Which Greek symbol is used to denote standard deviation?

A

Sigma (σ)

21
Q

Normal distribution

A

A histogram with a symmetrical, bell-like shape where the mean = the median = the mode

For a normal distribution, it might be sufficient to look only at the mean and standard deviation to reach a conclusion.

22
Q

Right skewed shape

A

Mean greater than Median which is greater than the mode

For skewed distributions, the five number summary might provide better insight than the mean and standard deviation.
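
To illustrate the ordering for a right-skewed shape, here is a small sketch with made-up purchase amounts in which one large value pulls the mean upward. The data are hypothetical and chosen only to show the pattern.

```python
import statistics

# Hypothetical purchase amounts with one large value in the right tail.
purchases = [10, 10, 12, 15, 20, 100]

mean = statistics.mean(purchases)
median = statistics.median(purchases)
mode = statistics.mode(purchases)

# In this right-skewed example: mean > median > mode.
print(mean > median > mode)  # True
```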

23
Q

Left skewed shape

A

Mean is less than the median, which is less than the mode

For skewed distributions, the five number summary might provide better insight than the mean and standard deviation.

24
Q

Descriptive Statistics

A

Descriptive statistics is about describing our collected data.

25
Q

Inferential Statistics

A

Inferential statistics is about using our collected data to draw conclusions about a larger population.

Population - our entire group of interest
Parameter - numeric summary about a population
Sample - a subset of the population
Statistic - numeric summary about a sample

26
Q

Parameter vs Statistic

A

A parameter is a number describing a whole population (e.g., population mean), while a statistic is a number describing a sample (e.g., sample mean).

The goal of quantitative research is to understand characteristics of populations by finding parameters.
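
A toy sketch of the distinction, using simulated numbers rather than real data. The population of "minutes on site" values, the sample size, and the fixed seed are all assumptions made for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: minutes on site for 10,000 visitors.
population = [random.randint(0, 60) for _ in range(10_000)]

# A sample is a subset of the population.
sample = random.sample(population, 100)

parameter = statistics.mean(population)  # describes the whole population
statistic = statistics.mean(sample)      # describes only the sample

print(parameter, statistic)
```

In practice the parameter is usually unknown, and the statistic from the sample is used to estimate it.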