Intro to Statistics Flashcards
Stats 101
Four Common Misleading Graph Methods
- Graphing an inappropriate statistic
- Omitting the zero on the relevant scale
- Manipulating the scale
- Two dimensions to represent one dimension
Stats 101
Voluntary Response Sample
Occurs when a large group of individuals is invited to respond and the respondents themselves decide whether to participate; those who don’t respond are not counted. People tend to respond because they have a strong interest in the topic surveyed.
Stats 101
Random Selection
occurs when every member of the population to which we would like to generalize our results has an equal chance of being chosen to participate in the study
Stats 101
Data
collections of observations, such as measurements, genders, or survey responses.
Stats 101
Datum
a single data value
Stats 101
Statistics
the science of planning studies and experiments; obtaining data; and then organizing, summarizing, presenting, analyzing and interpreting those data and then drawing conclusions based on them.
Stats 101
Population
the complete collection of all measurements or data that are being considered.
Stats 101
Census
the collection of data from every member of the population.
Stats 101
sample
a subcollection of members selected from a population
Stats 101
process of conducting a statistical study
prepare
analyze
conclude
Stats 101
Context
What do the data mean?
What is the goal of the study?
Stats 101
Source of the data
are the data from a source with a special interest so that there is pressure to obtain results that are favorable to the source?
Stats 101
sampling method
Were the data collected in a way that is unbiased, or in a way that is biased (such as a procedure in which respondents volunteer to participate)?
Stats 101
Prepare
- context
- source of data
- sampling method
Stats 101
Analyze
- graph the data
- explore the data
- apply statistical methods
Stats 101
Explore the data
Are there any outliers (numbers very far away from almost all of the other data)?
What important statistics summarize the data (such as the mean and standard deviation )?
Did many selected subjects refuse to respond?
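The questions about outliers and summary statistics can be sketched in a few lines of Python. This is only an illustration: the measurements are made up, and the "more than 2 standard deviations from the mean" rule for flagging outliers is one common rule of thumb assumed here, not something the cards define.

```python
import statistics

# Hypothetical measurements; the last value sits far from the rest.
values = [12, 14, 15, 13, 16, 14, 15, 58]

mean = statistics.mean(values)    # arithmetic mean
stdev = statistics.stdev(values)  # sample standard deviation

# Assumed rule of thumb: flag values more than 2 standard deviations from the mean.
outliers = [v for v in values if abs(v - mean) > 2 * stdev]

print(mean)      # 19.625
print(outliers)  # [58]
```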
Stats 101
Apply statistical methods
use technology to obtain results
Stats 101
Conclude
- Do the results have statistical significance?
- Do the results have practical significance?
Stats 101
statistical significance
achieved when a study produces a result that is very unlikely to occur by chance.
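One way to make "unlikely to occur by chance" concrete is to compute how probable a result would be if only chance were at work. The sketch below assumes a hypothetical scenario (9 or more heads in 10 flips of a fair coin) and uses the exact binomial formula; the function name `prob_at_least` is illustrative.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """Probability of k or more successes in n independent trials,
    each with success probability p (binomial model)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical result: 9 or more heads in 10 flips of a fair coin.
p_chance = prob_at_least(9, 10)
print(round(p_chance, 4))  # 0.0107
```

A chance probability this small suggests the result is unlikely to be due to chance alone; how small is "small enough" is a convention chosen by the analyst.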
Stats 101
practical significance
the results have some meaningful and useful implications for the real world
Stats 101
misleading conclusions
can come from reported results
small samples
and other errors
Stats 101
reported results
subjects report their results rather than the surveyor taking measurements. (I lost 5 lbs - could be a lie.)
Stats 101
Small Samples
Conclusions should not be based on samples that are far too small.
Example: Basing a school suspension rate on a sample of only three students.
Stats 101
loaded questions
survey questions intentionally worded to elicit a desired response.
Stats 101
order of questions
survey questions can be unintentionally loaded by the order of the items being considered.
ex: “Would you say traffic contributes more to air pollution than industry?” versus
“Would you say industry contributes more to air pollution than traffic?”
Stats 101
nonresponse
occurs when someone refuses to respond to a survey question or is unavailable
Stats 101
Missing data
can dramatically affect results
- can be caused by random factors, such as people dropping out
- can be caused by special factors, such as low-income people refusing to admit their annual income
Stats 101
precise numbers
241,472,385 adults in U.S.
People assume that because the number is so precise, it must be accurate. That number is an estimate and would be better represented as 240 million adults.
Stats 101
percentages
some studies have unclear or misleading percentages
ex: references to percentages that exceed 100%
Stats 101
percentage of
to find a percentage of an amount, drop the % symbol, divide the percentage value by 100, and multiply by the amount.
6% of 1,200 respondents
6⁄100 × 1,200 = 72
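The same arithmetic as a small Python sketch (the function name `percentage_of` is illustrative). The multiplication is done before the division so that whole-number inputs like these give an exact result.

```python
def percentage_of(percent, amount):
    """Return `percent` percent of `amount` (e.g. 6% of 1,200)."""
    return percent * amount / 100

print(percentage_of(6, 1200))  # 72.0
```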
Stats 101
Fraction → Percentage
divide the denominator into the numerator to get an equivalent decimal number, then multiply by 100 to get percent.
3/4 = .75 → .75 × 100 = 75%
Stats 101
Decimal → Percent
Multiply by 100.
.25 × 100 = 25%
Stats 101
Percentage → Decimal
Drop the % symbol and divide by 100.
85% = 85/100 = .85
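The three conversions above (fraction to percentage, decimal to percent, percentage to decimal) can be checked with a short Python sketch. The function names are illustrative only.

```python
def fraction_to_percent(numerator, denominator):
    """Divide numerator by denominator, then multiply by 100."""
    return numerator / denominator * 100

def decimal_to_percent(decimal):
    """Multiply a decimal by 100."""
    return decimal * 100

def percent_to_decimal(percent):
    """Drop the % symbol and divide by 100."""
    return percent / 100

print(fraction_to_percent(3, 4))  # 75.0
print(decimal_to_percent(0.25))   # 25.0
print(percent_to_decimal(85))     # 0.85
```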
Stats 101
parameter
a numerical measurement that describes some characteristic of a population
Stats 101
statistic
a numerical measurement describing some characteristic of a sample.
Stats 101
2 statistics definitions
- two or more numerical measurements describing characteristics of samples.
- the science of planning studies and experiments; obtaining data; organizing; summarizing, presenting, analyzing and interpreting those data; and then drawing conclusions based on them.
Stats 101
quantitative data
data which consists of numbers representing counts or measurements
Stats 101
categorical data
aka qualitative or attribute data
names or labels that are not numbers representing counts or measurements
Stats 101
discrete data
result when the data values are quantitative and the number of values is finite or countable.
use the word “fewer”
Stats 101
continuous (numerical) data
result from infinitely many possible quantitative values, where the collection of values is not countable
use the word “less”
Stats 101
nominal level of measurement
data that consists of names, labels and categories only. data cannot be ranked.
ex: eye colors
Stats 101
ordinal level of measurement
data that can be arranged in some order, but differences (obtained by subtraction) between data values either cannot be determined or are meaningless.
ex: college rankings in U.S. News & World Report
Stats 101
interval level of measurement
data that can be arranged in order, and the differences between data values can be found and are meaningful. Data at this level do not have a natural zero starting point at which none of the quantity is present. ex: calendar years (year 0 does not mean “no time”) or temperatures in degrees Fahrenheit
Stats 101
ratio level of measurement
data that can be arranged in order, differences can be found and are meaningful, and there is a natural zero starting point (where zero indicates that none of the quantity is present). ex: height, length, distance, volume
Stats 101
observational study
we observe and measure specific characteristics, but we don’t attempt to modify the subjects being studied.
Stats 101
experiment
some treatment is applied and its effects on subjects are observed.
subjects are called “experimental units.”
Stats 101
lurking variable
affects the variables included in the study, but is not included in the study.
Stats 101
simple random sample
a sample of n subjects selected in such a way that every possible sample of the same size n has the same chance of being chosen.
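A minimal sketch of drawing a simple random sample in Python, assuming a hypothetical population of 100 numbered subjects. The standard library's `random.sample` draws without replacement, so every possible sample of size n is equally likely; the seed is fixed only to make the run repeatable.

```python
import random

# Hypothetical population: subjects numbered 1 through 100.
population = list(range(1, 101))

rng = random.Random(42)              # seeded only for repeatability
sample = rng.sample(population, 10)  # n = 10, drawn without replacement

print(sample)
```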
Stats 101
random sampling
each member of the population has an equal chance of being selected. ex: computers are used to generate random phone numbers
Stats 101
systematic sampling
select some starting point, and then select every kth element in the population
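A sketch of systematic sampling in Python. The card only says to "select some starting point"; choosing that starting point at random among the first k elements is an assumption made here, and the function name `systematic_sample` is illustrative.

```python
import random

def systematic_sample(population, k, rng):
    """Pick a starting index in the first k elements (at random here,
    which is an assumption), then take every kth element after it."""
    start = rng.randrange(k)
    return population[start::k]

# Hypothetical population: subjects numbered 1 through 100.
population = list(range(1, 101))
sample = systematic_sample(population, 10, random.Random(1))

print(sample)
```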
Stats 101
Convenience sampling
use results that are easiest to get
Stats 101
Stratified Sampling
subdivide the population into at least two different subgroups (or strata) so that subjects within the same subgroup share the same characteristics (such as gender or age bracket), then draw a sample from each subgroup (or stratum).
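A sketch of stratified sampling in Python, assuming hypothetical subjects tagged with an age bracket and an equal-sized random draw from each stratum. The card does not say how many subjects to draw per stratum, so the fixed `per_stratum` count is an assumption.

```python
import random
from collections import defaultdict

def stratified_sample(subjects, stratum_of, per_stratum, rng):
    """Group subjects into strata, then draw a random sample from each."""
    strata = defaultdict(list)
    for subject in subjects:
        strata[stratum_of(subject)].append(subject)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, per_stratum))
    return sample

# Hypothetical subjects: (id, age bracket) pairs.
subjects = [(f"id{i}", "18-34" if i < 60 else "35+") for i in range(100)]
sample = stratified_sample(subjects, lambda s: s[1], 5, random.Random(0))

print(sample)
```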
Stats 101
cluster sampling
divide the population into sections or clusters then randomly select some of those clusters, and then choose all members from those selected clusters.
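A sketch of cluster sampling in Python, assuming a hypothetical population already divided into five precincts. Note the contrast with stratified sampling: here whole clusters are chosen at random, and every member of a chosen cluster is kept.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick n_clusters cluster names, then keep every member of each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members

# Hypothetical population: five precincts with four voters each.
clusters = {f"precinct_{c}": [f"{c}{i}" for i in range(1, 5)] for c in "ABCDE"}
chosen, members = cluster_sample(clusters, 2, random.Random(7))

print(chosen)
print(members)
```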
Stats 101
multistage sampling
pollsters select a sample in different stages and each stage might use different methods of sampling.
Stats 101
cross-sectional study
data are observed, measured, and collected at one point in time rather than over a period of time.
Stats 101
retrospective study
aka case control study
data are collected from a past time period by going back in time (through records, interviews, etc.)
Stats 101
prospective study
aka longitudinal or cohort study
data are collected in the future from groups that share common factors.
Stats 101
randomization
used when subjects are assigned to different groups through a process of random selection
Stats 101
Replication
repetition of the experiment on more than one subject.
Stats 101
blinding
when a subject doesn’t know whether he or she is receiving the treatment or the placebo.
Stats 101
placebo effect
when an untreated subject reports an improvement in symptoms
Stats 101
Double-blind
neither the subjects nor the experimenter knows which group each subject is in
Stats 101
confounding
occurs when the investigators are not able to distinguish among the effects of different factors
Stats 101
completely randomized experimental design
assign subjects to different treatment groups by a process of random selection
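A sketch of a completely randomized assignment in Python, assuming ten hypothetical subjects and two groups. Shuffling the subjects and dealing them out in turn is one simple way to randomize; the function name is illustrative.

```python
import random

def completely_randomized(subjects, group_names, rng):
    """Shuffle the subjects, then deal them out to the groups in turn."""
    shuffled = subjects[:]
    rng.shuffle(shuffled)
    return {name: shuffled[i::len(group_names)]
            for i, name in enumerate(group_names)}

# Hypothetical subjects and treatment groups.
subjects = [f"subject_{i}" for i in range(1, 11)]
groups = completely_randomized(subjects, ["treatment", "placebo"], random.Random(5))

print(groups)
```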
Stats 101
block
a group of subjects that are similar
Stats 101
randomized block design
blocks differ in ways that might affect the outcome of the experiment
1. form blocks or groups with similar characteristics
2. randomly assign treatments to subjects within each block
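The two steps above can be sketched in Python. The blocks (two age groups) and the subject labels are hypothetical; the point is that randomization happens separately inside each block.

```python
import random

def randomized_block_assignment(blocks, treatments, rng):
    """Within each block, shuffle the subjects and assign treatments in turn."""
    assignment = {}
    for members in blocks.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        for i, subject in enumerate(shuffled):
            assignment[subject] = treatments[i % len(treatments)]
    return assignment

# Step 1 (hypothetical): blocks formed from subjects with similar characteristics.
blocks = {
    "under_40": ["s1", "s2", "s3", "s4"],
    "40_and_over": ["s5", "s6", "s7", "s8"],
}
# Step 2: random assignment of treatments within each block.
treatments = ["treatment", "placebo"]
assignment = randomized_block_assignment(blocks, treatments, random.Random(3))

print(assignment)
```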
Stats 101
matched pairs design
compare two treatment groups (such as treatment and placebo) by using subjects that are matched in pairs that are somehow related or have similar characteristics.
Examples:
- Before/After
- Twins
Stats 101
Rigorously Controlled Design
Carefully assign subjects to different treatment groups, so that those given each treatment are similar in ways that are important to the experiment.
Stats 101
sampling error
occurs when the sample has been selected with a random method, but there is a discrepancy between a sample result and the true population result; such an error results from chance sample fluctuations
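A small simulation can illustrate sampling error. The sketch assumes a hypothetical population of the values 1 through 1,000, so the true population mean is known, and compares it with the mean of one random sample. The size of the discrepancy varies from sample to sample.

```python
import random
import statistics

# Hypothetical population with a known mean.
population = list(range(1, 1001))
population_mean = statistics.mean(population)  # 500.5

rng = random.Random(2024)  # seeded only for repeatability
sample = rng.sample(population, 25)
sample_mean = statistics.mean(sample)

# Sampling error: the chance discrepancy between sample and population results.
sampling_error = sample_mean - population_mean
print(sample_mean, sampling_error)
```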
Stats 101
nonsampling error
the result of human error, including such factors as wrong data entries, computer errors, questions with biased wording, false data provided by respondents, forming biased conclusions, or applying statistical methods that are not appropriate for the circumstances.
Stats 101
nonrandom sampling error
the result of using a sampling method that is not random, such as using a convenience sample or a voluntary response sample.
Stats 101
which of the following is not a level of measurement:
ordinal, nominal, ratio, quantitative
Quantitative
Stats 101
favorite films:
choose the correct level of measurement:
ratio, interval, ordinal, nominal
Nominal
Stats 101
When a Limo is randomly selected, it is found to have an engine with 116 hp
It is from a continuous data set because the number of possible values is infinite and not countable.