Marketing Analytics Flashcards by Charl Swart

What is data?

A distinct piece of information

Two main data types

Quantitative data takes on numeric values that allow us to perform mathematical operations (like the number of dogs). It can be divided into continuous and discrete data.

Categorical data is used to label a group or set of items (like dog breeds: Collies, Labs, Poodles, etc.). It can be divided into ordinal and nominal data.

Categorical Ordinal vs Categorical Nominal

We can divide categorical data further into two types: Ordinal and Nominal.

Categorical Ordinal data take on a ranked ordering (like rating an interaction with the dogs on a scale from Very Poor to Very Good).

Categorical Nominal data do not have an order or ranking (like the breeds of the dogs).

Continuous Vs Discrete Data

Continuous data can be split into smaller and smaller units, and still a smaller unit exists. An example of this is the age of the dog - we can measure the units of the age in years, months, days, hours, seconds, but there are still smaller units that could be associated with the age.

Discrete data only takes on countable values. The number of dogs we interact with is an example of a discrete data type.

Four Aspects of Quantitative Data

There are four main aspects to analyzing quantitative data:

Measures of Center
Measures of Spread
The Shape of the Data
Outliers

Measures of Center

There are three measures of center:

Mean
Median
Mode

Calculating the Mean

Sum of all values divided by the count of values
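
As a minimal sketch of the definition above, the mean can be computed in a few lines of Python. The function name `mean` and the `dogs_per_day` values are made up for illustration.

```python
def mean(values):
    """Return the sum of all values divided by the count of values."""
    if not values:
        raise ValueError("the mean of an empty dataset is undefined")
    return sum(values) / len(values)


# Hypothetical data: number of dogs we interacted with on five days.
dogs_per_day = [1, 3, 2, 4, 5]
print(mean(dogs_per_day))  # 15 / 5 = 3.0
```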

Median

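The median is the middle value of a dataset once the dataset has been ordered from smallest to largest. If the dataset has an even number of values, the median is the average of the two middle values.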
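
A small Python sketch of this idea follows. It sorts the data first, then picks the middle value; for an even count it averages the two middle values, which is the usual convention. The function name `median` and the sample inputs are illustrative only.

```python
def median(values):
    """Return the middle value of the data after sorting it."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty dataset is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle value exists.
        return ordered[mid]
    # Even count: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([5, 1, 3]))     # ordered: 1, 3, 5 -> 3
print(median([4, 1, 3, 2]))  # ordered: 1, 2, 3, 4 -> 2.5
```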

Mode

The mode is the most frequently observed value in our dataset.

There might be multiple modes for a particular dataset, or no mode at all.

No Mode
If all observations in our dataset are observed with the same frequency, there is no mode. If we have the dataset:

1, 1, 2, 2, 3, 3, 4, 4

There is no mode, because all observations occur the same number of times.

Many Modes
If two (or more) numbers share the maximum frequency, then there is more than one mode. If we have the dataset:

1, 2, 3, 3, 3, 4, 5, 6, 6, 6, 7, 8, 9

There are two modes, 3 and 6, because these values share the maximum frequency of 3 occurrences, while all other values appear only once.
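
The three cases on this card (one mode, many modes, no mode) can be sketched in Python as below. The function name `modes` is hypothetical, and returning an empty list for "no mode" is a choice made here to match the card's convention; Python's built-in `statistics.multimode` would instead return every value when all frequencies are tied.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s), or [] when every value is tied."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    winners = sorted(v for v, c in counts.items() if c == top)
    # If all distinct values share the same frequency, report "no mode".
    if len(counts) > 1 and len(winners) == len(counts):
        return []
    return winners


print(modes([1, 1, 2, 2, 3, 3, 4, 4]))                  # [] (no mode)
print(modes([1, 2, 3, 3, 3, 4, 5, 6, 6, 6, 7, 8, 9]))   # [3, 6]
```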

Notation

Notation is a common language used to communicate mathematical ideas

Random Variable

A random variable is a placeholder for the possible values of some process (the term 'some process' is admittedly a bit ambiguous). Notation is useful because it lets us take complex ideas and simplify them, often to a single letter or symbol. Random variables are represented by capital letters (X, Y, and Z are common choices).

We might have the random variable X, which is a placeholder for the possible values of the amount of time someone spends on our site. Or the random variable Y, which is a placeholder for the possible values of whether or not an individual purchases a product.

X is a placeholder for the values that could possibly occur for the amount of time spent on our website: any number from 0 upward.

First observed value of the random variable X

Measures of Spread

Measures of Spread are used to provide us an idea of how spread out our data are from one another. Common measures of spread include:

Range
Interquartile Range (IQR)
Standard Deviation
Variance

Histograms

Histograms are very useful for understanding the different aspects of quantitative data, such as measures of spread.

Calculating the 5 Number Summary

The five number summary consists of 5 values:

Minimum: The smallest number in the dataset.

Q1: The value such that 25% of the data fall below.

Q2: The value such that 50% of the data fall below.

Q3: The value such that 75% of the data fall below.

Maximum: The largest value in the dataset.

Essentially, each quartile is a median: Q2 is the median of the whole dataset, Q1 is the median of the lower half, and Q3 is the median of the upper half.

Range
The range is then calculated as the difference between the maximum and the minimum.

IQR
The interquartile range is calculated as the difference between Q3 and Q1.
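
The five number summary, range, and IQR can be sketched in Python as follows. This sketch assumes the "median of each half" method for the quartiles and leaves the middle value out of both halves when the count is odd; other tools (for example spreadsheet or NumPy percentile functions) use interpolation rules that can give slightly different Q1 and Q3. The function name and the example data are illustrative.

```python
from statistics import median


def five_number_summary(values):
    """Return min, Q1, Q2, Q3, and max using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = ordered[:half]
    # For an odd count, skip the middle value so it is in neither half.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return {
        "min": ordered[0],
        "q1": median(lower),
        "q2": median(ordered),
        "q3": median(upper),
        "max": ordered[-1],
    }


summary = five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])
data_range = summary["max"] - summary["min"]  # 9 - 1 = 8
iqr = summary["q3"] - summary["q1"]           # 7.5 - 2.5 = 5.0
print(summary, data_range, iqr)
```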

Box Plot

Useful for quickly comparing the spread of two datasets on a key metric.

How to measure spread with a single value?

Use the standard deviation or the variance.

Standard deviation vs Variance

Both tell us how far, on average, the data points are from the mean of the data.

The standard deviation is the square root of the variance.

In practice, you usually use the standard deviation rather than the variance. The reason for this is because the standard deviation shares the same units with our original data, while the variance has squared units.
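
The relationship between the two can be sketched in Python as below. This sketch assumes the population form, which divides by the count n; the sample form divides by n - 1 instead, and the card does not say which one is meant. The function names and data are illustrative.

```python
import math


def variance(values):
    """Average squared distance of each point from the mean (population form)."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / len(values)


def standard_deviation(values):
    """Square root of the variance, in the same units as the data."""
    return math.sqrt(variance(values))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean is 5
print(variance(data))            # 32 / 8 = 4.0 (squared units)
print(standard_deviation(data))  # sqrt(4.0) = 2.0 (original units)
```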

What is the use of standard deviation?

If data is associated with money, a higher SD is associated with a higher risk

Standard deviation is also used to judge whether a result is statistically significant or just part of the expected variation.

Which Greek symbol is used to denote standard deviation?

Sigma (σ)

Normal distribution

A histogram with a symmetrical shape where the mean = the median = the mode

For a normal distribution, it might be sufficient to look only at the mean and standard deviation to draw a conclusion.

Right skewed shape

The mean is greater than the median, which is greater than the mode.

For skewed distributions, the five number summary might provide better insight than the mean and standard deviation.

Left skewed shape

Mean is less than the median, which is less than the mode

For skewed distributions, the five number summary might provide better insight than the mean and standard deviation.
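
A quick Python check of the skew rules on the right-skewed and left-skewed cards, using made-up order values: one very large value pulls the mean above the median (right skew), and mirroring the data pulls the mean below the median (left skew). Both datasets are hypothetical, and this only illustrates the typical pattern rather than proving it for every dataset.

```python
from statistics import mean, median

# Hypothetical order values with one large purchase (long right tail).
right_skewed = [20, 25, 25, 30, 30, 30, 35, 200]
# Mirror image of the same data (long left tail).
left_skewed = [200 - x for x in right_skewed]

print(mean(right_skewed), median(right_skewed))  # 49.375 and 30.0
print(mean(left_skewed), median(left_skewed))    # 150.625 and 170.0
```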

Descriptive Statistics

Descriptive statistics is about describing our collected data.

Inferential Statistics

Inferential statistics is about using our collected data to draw conclusions about a larger population.

Population: our entire group of interest.
Parameter: a numeric summary about a population.
Sample: a subset of the population.
Statistic: a numeric summary about a sample.

Parameter vs Statistic

A parameter is a number describing a whole population (e.g., population mean), while a statistic is a number describing a sample (e.g., sample mean). The goal of quantitative research is to understand characteristics of populations by finding parameters.
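
The distinction can be sketched with a simulated population in Python. Everything here is hypothetical: the population is generated with a fixed random seed purely for illustration, and in real work the population (and therefore the parameter) is usually not fully observable, which is why we rely on a sample statistic.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: minutes spent on our site by 10,000 visitors.
population = [rng.gauss(5.0, 2.0) for _ in range(10_000)]
parameter = mean(population)  # population mean (a parameter)

# A sample is a subset of the population.
sample = rng.sample(population, 100)
statistic = mean(sample)      # sample mean (a statistic)

print(f"parameter: {parameter:.2f}, statistic: {statistic:.2f}")
```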

Marketing Analytics Flashcards

(26 cards)