Week 3 - Research and Measurement Flashcards
why do research and analysis?
in order to make the right decision
does all data and analysis have value?
NO - only if they help us make a decision
raw data has very little value
in hypothesis testing, when do you make a prediction?
prior to testing (a priori)
what is the purpose of marketing research?
inform business decision making (vs scientific research, for instance)
what do you call raw data once it has been analysed?
interpreted data, ie, information
how should a decision maker be involved?
understand enough to know what’s reliable
tell the research team which questions to answer
potentially make predictions
project manage perhaps
be able to think like a researcher
how should a researcher be involved?
convert questions/predictions into testable hypotheses
conduct the applicable research
present results in a way to answer the original question
communicate information clearly - reduce the complex
how should administrators be involved?
understand sufficiently to
1) find common ground
2) engage throughout the process
what is inferential statistics?
statistical analysis to infer or estimate something about a population from a sample
based on probability
what are the properties of data?
assignment
assignment and order
assignment, order, and distance
assignment, order, distance, and origin
what is the minimal requirement for raw data to be analysed?
must be able to place into categories (at least assignment)
can have:
assignment, order, distance, and origin
what is assignment for data?
groupings
eg, color, gender, state
what is order for data?
data points that can be ordered eg, birth order, class rank, placement in race
what is distance for data?
ability to understand how far apart data points are from each other
eg, one person has 100%, another has 80%, distance is 20 percentage points
what is origin for data?
an unambiguous starting point or point of comparison
eg, zero is the lowest grade, 2018 is the current year
allows measurement of distance between data points AND vs origin
what are the four classifications of data?
non-metric
- nominal
- ordinal
metric
- interval
- ratio
what is nominal data classification?
nonmetric = nonparametric tests
assignment only
central tendency is only mode (most frequently occurring)
eg, most of these m&ms are blue
what is ordinal data classification?
nonmetric
assignment and order
central tendency is only mode or median
eg, shortest to tallest height
what is interval data classification?
metric = parametric data analysis available
assignment, order, and distance
(considered continuous because distance between points is measurable)
central tendency: mean, median, and mode (all three)
eg, what is the average daily temperature in °C (no true zero, so ratios are meaningless)
what is ratio data classification?
metric
assignment, order, distance, and origin
continuous
all central tendencies (mean, median, and mode)
eg, consumption over years, weight, income (star ratings are strictly ordinal, though often treated as metric)
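The four classification cards above can be sketched as a small lookup. This is only an illustration: the names SCALE_PROPERTIES, allowed_central_tendencies, and is_metric are made up for this example, not part of any library.

```python
# Properties each measurement scale carries, as listed in the cards above.
# All names here are hypothetical helpers for illustration.
SCALE_PROPERTIES = {
    "nominal": {"assignment"},
    "ordinal": {"assignment", "order"},
    "interval": {"assignment", "order", "distance"},
    "ratio": {"assignment", "order", "distance", "origin"},
}


def allowed_central_tendencies(scale):
    """Return the central tendency measures that make sense for a scale."""
    props = SCALE_PROPERTIES[scale]
    measures = ["mode"]            # mode needs assignment only
    if "order" in props:
        measures.append("median")  # median needs order
    if "distance" in props:
        measures.append("mean")    # mean needs distance
    return measures


def is_metric(scale):
    """Metric scales are the ones with the distance property."""
    return "distance" in SCALE_PROPERTIES[scale]


for scale in SCALE_PROPERTIES:
    kind = "metric" if is_metric(scale) else "non-metric"
    print(scale, kind, allowed_central_tendencies(scale))
```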
what are descriptive statistics?
a quantitative approach to identifying characteristics about a respondent pool
not a testing method
who answered our questions? what is the make up of our data overall?
what tools does descriptive statistics use?
central tendencies (mean, median, mode)
percentages
measures of dispersion
frequency distributions
when and how can you use mode?
any data with assignment (nominal)
what’s the most common?
how do you use median?
any data with an order property
what’s in the middle if you count from each side?
if there are two middle values, average them to get the median
how do you use the mean?
only if you have distance property
average the group
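A quick check of the three central tendency cards, using Python's standard statistics module. The data values are invented for illustration.

```python
import statistics

# Made-up example data, one list per kind of measure.
colors = ["blue", "red", "blue", "green", "blue"]  # assignment only: mode
ordered = [1, 2, 3, 4]                             # even count: two middle values
scores = [80, 90, 100]                             # has distance: mean is meaningful

most_common_color = statistics.mode(colors)  # "blue"
middle_value = statistics.median(ordered)    # (2 + 3) / 2 = 2.5
average_score = statistics.mean(scores)      # (80 + 90 + 100) / 3 = 90

print(most_common_color, middle_value, average_score)
```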
what is a percentage?
a frequency, expressed as a fraction of 100
what is a range?
the distance between the smallest and largest numbers in the data
how do you measure standard deviation?
roughly the average difference between data points and the mean (strictly, the square root of the average squared difference)
how similar are numbers on average?
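The range and standard deviation cards can be worked through on a tiny made-up data set. Strictly, the standard deviation is the square root of the average squared difference from the mean, which is close to, but not the same as, the plain average difference. The sketch below uses the population formula (divide by n); a sample estimate would divide by n - 1.

```python
import math
import statistics

scores = [60, 80, 100]  # made-up data for illustration

data_range = max(scores) - min(scores)  # 100 - 60 = 40
mean = statistics.mean(scores)          # 80

# Plain average distance from the mean: (20 + 0 + 20) / 3, about 13.33
mean_abs_deviation = sum(abs(x - mean) for x in scores) / len(scores)

# Standard deviation: sqrt((400 + 0 + 400) / 3), about 16.33
std_dev = math.sqrt(sum((x - mean) ** 2 for x in scores) / len(scores))

# The standard library gives the same population standard deviation.
assert math.isclose(std_dev, statistics.pstdev(scores))

print(data_range, round(mean_abs_deviation, 2), round(std_dev, 2))
```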
how do you measure frequency distribution?
visualise the distribution of the data - say with a bar chart
mode is just picking the tallest bar
can be applied to nominal data
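A frequency distribution for nominal data can be sketched with collections.Counter, with the bars drawn as text. The candy colors are invented data; the mode is simply the tallest bar, and each percentage is a frequency expressed out of 100.

```python
from collections import Counter

candies = ["blue", "red", "blue", "green", "blue", "red"]  # made-up data

counts = Counter(candies)
total = sum(counts.values())

# Percentage = frequency expressed as a fraction of 100.
percentages = {color: 100 * n / total for color, n in counts.items()}

# Text bar chart, tallest bar first.
for color, n in counts.most_common():
    bar = "#" * n
    print(f"{color:<6} {bar:<4} {percentages[color]:.1f}%")

mode_color = counts.most_common(1)[0][0]  # the tallest bar
```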
what is the difference between census and sample studies?
census = entire population
sample = part of the population
inferential statistics help when you can’t perform a whole census
when is a census study better than a sample study?
any time you can do a census study
but often it isn’t reasonably possible
what is a population parameter?
population parameter = true fact based on 100% observation (census)
statistic = estimate
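The parameter vs statistic distinction can be simulated. This is a sketch under assumptions: the population is an invented list of 10,000 heights drawn from a normal distribution, and the seed is fixed only so the run is repeatable.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 heights in cm (invented for illustration).
population = [rng.gauss(170, 10) for _ in range(10_000)]

# Census: measure everyone, so this is the population parameter.
parameter = statistics.mean(population)

# Sample: measure 100 people drawn at random, so this is only a statistic.
sample = rng.sample(population, 100)
statistic = statistics.mean(sample)

print(f"parameter (census):  {parameter:.2f}")
print(f"statistic (sample):  {statistic:.2f}")
print(f"estimation error:    {statistic - parameter:+.2f}")
```

The statistic should land near the parameter but will almost never match it exactly, which is the gap inferential statistics tries to quantify.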
what are the pros and cons of sampling?
pro
- lower cost
- easier and faster data handling
cons
- higher error rate
- errors can drive bad decision making
why are sample-based estimates useful?
probability distributions let us predict how far a sample estimate is likely to be from the true population value
how much does sample drawing matter?
it’s THE most critical part - an error here can lead to skewing or bias
how can you draw a sample?
probability - researcher has no role in drawing (eg, random sample)
nonprobability - researcher does have a role (eg, convenience sampling of people nearby)
what is probability sampling?
researcher plays no role in building the sample
generally random or near random
every data point has a known, though not necessarily equal, chance of being selected
what is nonprobability sampling?
researcher does play a role in selection
convenience sample is very common - stopping people at the subway for instance
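A small simulation of why the way a sample is drawn matters. Everything here is invented for illustration: a population of 1,000 commute times in which the 200 people living near the subway have much shorter commutes, and a convenience sample that only stops people near the subway.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population of commute times in minutes.
near_subway = [rng.uniform(5, 20) for _ in range(200)]
everyone_else = [rng.uniform(30, 90) for _ in range(800)]
population = near_subway + everyone_else

true_mean = statistics.mean(population)

# Probability sample: the random number generator picks, not the researcher.
random_sample = rng.sample(population, 100)
random_mean = statistics.mean(random_sample)

# Nonprobability (convenience) sample: the researcher stops whoever is nearby.
convenience_sample = near_subway[:100]
convenience_mean = statistics.mean(convenience_sample)

print(f"true mean:        {true_mean:.1f}")
print(f"random sample:    {random_mean:.1f}")
print(f"convenience:      {convenience_mean:.1f}")
```

The convenience sample is the same size as the random one, yet its estimate is far off because of who could end up in it.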
why does error occur in statistical inference?
because a sample is not a census
thus while it is in theory representative,
in reality it can differ
what are the two types of errors found in statistical inferences?
sampling error - nonrepresentative sample
nonsampling error - systematic and/or random error not associated with the manner of drawing the sample
when should sampling error be suspected?
probability sampling (random) - lowest risk of sampling error (only chance variation remains), but VERY rarely 100% followed (think - completion bias)
non-probability sampling (selected) - high risk of error, must assume at least a certain level of error (hence statistical significance)
when should nonsampling error be suspected?
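any time data is collected, even in a full census - it comes from measurement, response, and processing problems, not from how the sample was drawn
separately, even if the sample is random, if it isn’t complete (ie, a census) we can never be 100% sure of conclusions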
what is the null hypothesis?
the default assumption that there is no difference between compared populations
eg, people who take this medicine are no better off than people who don’t
the null hypothesis is assumed true until the evidence is strong enough to reject it
what is a Type 1 error?
telling a man he’s pregnant when he isn’t
rejecting the null hypothesis, when it’s actually True
what is a Type 2 error?
telling a pregnant woman she isn’t pregnant
failing to reject the null hypothesis when it’s actually false
type 2 is often treated as the safer error, but that depends on the cost of each mistake
can you decrease the likelihood of type 1 or 2 errors?
yes, by selecting significance levels
but for a given sample size, decreasing type 1 increases risk of type 2
choose your adventure
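The type 1 vs type 2 trade-off can be made concrete for one simple case. This is a sketch under assumptions: a one-sided z-test with known standard deviation, where effect_size is the true difference measured in standard deviations. The function name type2_error_rate and the numbers passed to it are hypothetical.

```python
from statistics import NormalDist


def type2_error_rate(alpha, effect_size, n):
    """Chance of missing a real effect in a one-sided z-test.

    alpha: chosen type 1 error rate (significance level)
    effect_size: true difference in standard deviations (assumed known here)
    n: sample size
    """
    std_normal = NormalDist()
    # We reject the null only when the z statistic exceeds this cutoff.
    critical_z = std_normal.inv_cdf(1 - alpha)
    # Under the real effect, the z statistic is centered on this value.
    shift = effect_size * n ** 0.5
    # Type 2 error: the z statistic falls below the cutoff anyway.
    return std_normal.cdf(critical_z - shift)


for alpha in (0.10, 0.05, 0.01):
    beta = type2_error_rate(alpha, effect_size=0.3, n=50)
    print(f"type 1 rate {alpha:.2f} -> type 2 rate {beta:.2f}")
```

With the sample size held fixed, a stricter significance level pushes the type 2 rate up. Collecting a larger sample is the way to lower both at once.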
what are the two categories of data collection?
primary data
secondary data
what is secondary data?
collected for a purpose other than this research project
eg, UN data
what is primary data?
collected specifically for our hypotheses
what are the pros/cons of secondary data?
pros
- available, already there
- price, might be cheap or even free
cons
- relevancy, might not fit needs
- accuracy, why was it collected, what standards were in place?
what is big data?
normally secondary data
passively collected
both structured and unstructured
can test hypotheses, but can’t verify cause/effect
how is primary data collected?
questioning - survey, interview (might not be answered honestly)
observing - watching, documenting (more honest data, but harder to understand the why)
can be done on a person or on a company (eg, keyword analysis of company legal policies)
how can you establish causality?
only through experimentation
must be very careful to not communicate correlation as causality
what three factors are required to prove causality?
evidence of statistical association
temporal ordering
control for competing hypotheses
how do you prove causality - evidence of statistical association?
necessary, but insufficient for causality
how do you prove causality - temporal ordering?
must prove that A came before B
eg, fire trucks arrived after fire started, not before
how do you prove causality - control for competing hypotheses?
look for unmeasured or unobserved hypotheses
alternative hypotheses
randomise away errors through probability sampling and experiment design
churches and liquor stores increase in parallel, but even with temporal ordering, neither causes the other
reality: population growth caused both
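The churches and liquor stores card can be simulated to show how a third factor creates a correlation. All numbers are invented: each town gets a random population, and both counts are generated from population alone, never from each other. Dividing by population is one simple way to control for the competing explanation.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient, written out by hand."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical towns: population drives both counts, with independent noise.
populations = [rng.uniform(10_000, 1_000_000) for _ in range(200)]
churches = [p / 2000 * rng.uniform(0.8, 1.2) for p in populations]
liquor_stores = [p / 3000 * rng.uniform(0.8, 1.2) for p in populations]

# Raw counts move together because both follow population.
raw_r = pearson(churches, liquor_stores)

# Control for population by comparing per-capita rates instead.
churches_pc = [c / p for c, p in zip(churches, populations)]
liquor_pc = [s / p for s, p in zip(liquor_stores, populations)]
controlled_r = pearson(churches_pc, liquor_pc)

print(f"raw correlation:         {raw_r:.2f}")
print(f"per-capita correlation:  {controlled_r:.2f}")
```

The raw correlation is strong even though neither variable influences the other, and it largely disappears once population is accounted for.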