Statistics Flashcards
statistics
science of collecting, organizing, summarizing, and analyzing information to draw conclusions or answer questions
data
information that is collected
population
the entire group of objects being studied
individual
a single member of the population
sample
a subgroup of a population on which data is actually collected
a statistic
a numerical summary of a sample
parameter
a numerical summary of a population
inferential statistics
methods that use results from a sample to draw conclusions about a population, such as using a statistic to estimate a parameter
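Here is a minimal sketch of a statistic estimating a parameter. The population of exam scores is simulated, so every value is hypothetical. The parameter is the mean of the whole population. The statistic is the mean of a random sample of 50 drawn from it.

```python
import random
import statistics

# Hypothetical population: exam scores of 1,000 students (simulated, not real data).
random.seed(1)
population = [random.randint(50, 100) for _ in range(1000)]

# Parameter: a numerical summary of the entire population.
population_mean = statistics.mean(population)

# Statistic: the same summary computed from a sample of 50 individuals.
sample = random.sample(population, 50)
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

Inferential statistics uses the sample mean, which we can compute, to estimate the population mean. In a real study the population mean is usually unknown.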
The process of statistics
1) identify the research objective
2) collect the necessary data
3) describe the data
4) perform inference
variable
characteristics of the individuals in a population; their values can differ from one individual to another
Qualitative Variable
non-numerical traits that place an individual into a category or group
Quantitative Variable
numerical measures of individuals for which arithmetic (such as adding or averaging) makes sense
Observational Study
the researcher measures the variables being studied but does not control or influence any of them
Designed Experiments
the researcher assigns individuals to groups, intentionally controls the value of a variable for each individual, and then records the response
Confounding
occurs when the effects of two or more variables on the response cannot be separated from one another
lurking variable
a variable that affects the response in a study but was not considered by the researcher
Observational: Cross-sectional
data are collected at a single point in time or over a short window of time
Observational: case-control
retrospective: individuals who have a certain characteristic (cases) are compared with individuals who do not (controls), using surveys about the past or existing records
Observational: cohort
prospective: first choose the group of individuals (the cohort), then observe them over a period of time, collecting data along the way
census
a list of all individuals in a population along with data about certain characteristics of each individual
random sampling
the process of using chance to create a sample
frame
a list of all individuals in the population from which a sample is drawn (no data about the individuals is needed)
simple random sampling
random sampling in which every possible sample of a chosen size has an equal chance of occurring
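Below is a minimal Python sketch of a simple random sample. The frame of 30 student labels and the helper name `simple_random_sample` are hypothetical. It relies on the standard library's `random.Random.sample`, which draws without replacement so that every subset of size n is equally likely.

```python
import random

# Hypothetical frame: a list of every individual in the population.
frame = [f"student_{i}" for i in range(1, 31)]

def simple_random_sample(frame, n, seed=None):
    """Select n individuals so that every possible sample of size n is equally likely."""
    rng = random.Random(seed)
    # sample() draws without replacement, so no individual is chosen twice.
    return rng.sample(frame, n)

print(simple_random_sample(frame, 5, seed=42))
```

Passing a seed makes the result reproducible. Leave `seed=None` to get a different sample on each run.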
stratified sampling
separate the population into non-overlapping groups called strata, take a simple random sample from each stratum, then combine the individuals from every stratum to form the sample
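The following sketch illustrates stratified sampling. The strata by class year are hypothetical. The sketch also assumes an equal number of individuals is drawn from each stratum. Proportional allocation, where larger strata contribute more individuals, is another common choice that this code does not implement.

```python
import random

# Hypothetical population split into non-overlapping strata by class year.
strata = {
    "freshman": [f"F{i}" for i in range(1, 41)],
    "sophomore": [f"S{i}" for i in range(1, 31)],
    "junior": [f"J{i}" for i in range(1, 21)],
    "senior": [f"R{i}" for i in range(1, 11)],
}

def stratified_sample(strata, n_per_stratum, seed=None):
    """Take a simple random sample from each stratum, then combine them."""
    rng = random.Random(seed)
    sample = []
    for members in strata.values():
        # Simple random sample within this stratum only.
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

print(stratified_sample(strata, 3, seed=7))
```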
cluster sampling
break population into groups called clusters, use simple random sampling to select clusters and study all individuals in the selected clusters
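This sketch shows cluster sampling. The population of 10 homerooms with 25 students each is hypothetical. Randomness is applied only to the choice of clusters. Every individual in a chosen cluster is then included.

```python
import random

# Hypothetical clusters: 10 homerooms of 25 students each.
clusters = {room: [f"room{room}_student{j}" for j in range(1, 26)] for room in range(1, 11)}

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters, then include every individual in them."""
    rng = random.Random(seed)
    chosen = rng.sample(list(clusters), n_clusters)  # simple random sample of clusters
    sample = [person for c in chosen for person in clusters[c]]
    return chosen, sample

chosen, sample = cluster_sample(clusters, 3, seed=3)
print(chosen, len(sample))
```

Compare this with stratified sampling. There, individuals are sampled from every group. Here, only some groups are chosen, but each chosen group is studied completely.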
systematic sampling
put the population in order, randomly choose a starting individual among the first k, then select every kth individual after it
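Below is a minimal sketch of systematic sampling. It assumes k is computed as N // n, where N is the population size and n is the desired sample size, and that N is at least n. The ordered customer list and the helper name are hypothetical.

```python
import random

def systematic_sample(ordered_population, n, seed=None):
    """Choose a random start among the first k individuals, then every kth after it."""
    N = len(ordered_population)
    k = N // n  # step size; assumes N >= n
    rng = random.Random(seed)
    start = rng.randrange(k)  # 0-based index of the first selected individual
    # Slicing with step k picks every kth individual; [:n] trims any extra picks.
    return ordered_population[start::k][:n]

customers = list(range(1, 101))  # hypothetical ordered list: customers 1 to 100
print(systematic_sample(customers, 10, seed=0))
```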
convenience sampling
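individuals are chosen because they are easy to obtain rather than at random; self-selected (voluntary response) samples are one example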
bias
occurs when the results of a sample are not representative of the entire population
sampling bias
occurs when the technique used to obtain the sample tends to favor one part of the population over another; ex: part of the population is underrepresented
non-response bias
occurs when individuals who choose not to respond have different opinions than those who do respond
response bias
occurs when answers given do not accurately represent the individuals
causes of response bias
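misrepresented answers: respondents lie or don’t remember
interviewer error: how the interviewer presents themselves or asks the question
wording of question: positive or negative connotation in the wording
ordering of question: earlier questions can lead respondents to answer later ones untruthfully to avoid appearing hypocritical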
frequency distribution table
lists all possible values of a variable and how many times each value occurred
frequency
the number of observations (occurrences) of a particular value
relative frequency distribution
a table listing all values of a variable and their relative frequencies (relative frequency = frequency ÷ total number of observations)
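This sketch builds frequency and relative frequency columns with the standard library's `collections.Counter`. The survey responses are made up for illustration.

```python
from collections import Counter

# Hypothetical qualitative data: responses to "favorite season".
responses = ["summer", "fall", "summer", "winter", "spring",
             "summer", "fall", "spring", "summer", "fall"]

freq = Counter(responses)  # frequency: number of times each value occurs
total = sum(freq.values())

print(f"{'value':<8}{'freq':>6}{'rel. freq':>11}")
for value, count in freq.items():
    # relative frequency = frequency / total number of observations
    print(f"{value:<8}{count:>6}{count / total:>11.2f}")
```

The relative frequencies always add up to 1 (100%), apart from rounding in the printed values.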
bar graphs
put each category of the data on one axis and its frequency (or relative frequency) on the other; a space is left between the bars
Pareto chart
bar graph where the bars are put in descending order
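A Pareto chart is a bar graph drawn in a particular order. This sketch computes only that order, using `Counter.most_common()`, which sorts categories from highest to lowest frequency. The return-reason data are hypothetical. The bars are drawn as text instead of with a plotting library.

```python
from collections import Counter

# Hypothetical data: reasons customers gave for returning a product.
reasons = ["wrong size", "defective", "wrong size", "changed mind",
           "wrong size", "defective", "late delivery", "wrong size"]

def pareto_order(data):
    """Return (category, frequency) pairs sorted from highest to lowest frequency."""
    return Counter(data).most_common()

for category, count in pareto_order(reasons):
    print(f"{category:<14}{'#' * count}")  # text bars, tallest first
```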
side by side bar graph
bar graph that compares two or more data sets; each data set has its own color, and the bars for the same category are placed next to each other
pie chart
circle divided into sectors, one per category; each sector’s share of the circle equals the relative frequency of that category
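Each sector's share of the circle matches its relative frequency, so its central angle is 360° × (frequency ÷ total). The sketch below computes that angle. The transportation frequencies are hypothetical.

```python
def sector_angles(freq):
    """Central angle of each sector = 360 degrees x relative frequency."""
    total = sum(freq.values())
    return {category: 360 * count / total for category, count in freq.items()}

# Hypothetical frequencies for a qualitative variable (total = 36).
freq = {"car": 18, "bus": 9, "bike": 6, "walk": 3}
angles = sector_angles(freq)
print(angles)  # {'car': 180.0, 'bus': 90.0, 'bike': 60.0, 'walk': 30.0}
```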
histograms
bar graphs for quantitative data in which the rectangles have the same width (one per class) and adjacent rectangles touch
lower class limit
smallest value in class
upper class limit
largest value in class
tables for continuous variables
group the data into classes, which are non-overlapping intervals of values, and count the observations in each class
class width
difference between consecutive lower class limits. Goal: classes should be wide enough that each has a meaningful frequency, but narrow enough that the data do not all fall into one or two classes
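This sketch builds a frequency table for quantitative data using equal-width classes. It assumes integer data, so a class with lower limit 10 and width 10 runs from 10 to 19, and its upper class limit is printed as the next lower limit minus 1. The commute times and the helper name are hypothetical.

```python
def frequency_table(data, lower_limit, class_width, num_classes):
    """Count observations in equal-width classes [low, low + width)."""
    table = []
    for i in range(num_classes):
        low = lower_limit + i * class_width   # lower class limit
        high = low + class_width              # lower limit of the next class
        count = sum(1 for x in data if low <= x < high)
        table.append((low, high, count))
    return table

# Hypothetical data: commute times in whole minutes.
times = [12, 25, 31, 8, 19, 22, 40, 15, 27, 33, 18, 11]
for low, high, count in frequency_table(times, 0, 10, 5):
    # For integer data the upper class limit is one less than the next lower limit.
    print(f"{low:>3} - {high - 1:<3} {count}")
```

With a width of 10, the counts are 1, 5, 3, 2, 1. A much larger width would put almost everything into one or two classes. A much smaller width would leave many classes with frequencies of 0 or 1.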
stem and leaf plot
for any number, the leaf is the rightmost digit and the stem is the number left over after you erase the leaf
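The sketch below splits each value into a stem and a leaf with `divmod(x, 10)`. It assumes non-negative whole numbers, and the scores are hypothetical. One design note: stems that have no leaves are simply skipped here, whereas a hand-drawn plot normally lists every stem in the range.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Leaf = rightmost digit; stem = the number left after removing that digit."""
    plot = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)  # assumes non-negative integers
        plot[stem].append(leaf)
    return dict(plot)

scores = [72, 85, 91, 67, 78, 88, 74, 95, 83, 69]
for stem, leaves in stem_and_leaf(scores).items():
    print(f"{stem} | {' '.join(map(str, leaves))}")
```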
dot plots
place one dot above the corresponding value on a number line for each observation
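Here is a text-only dot plot drawn sideways. Each row is one value, and each "o" is one observation, so the dots stack horizontally instead of vertically. The sibling counts are hypothetical.

```python
from collections import Counter

def text_dot_plot(data):
    """Return one row per value with an 'o' for each observation of that value."""
    counts = Counter(data)
    lines = []
    for value in sorted(counts):
        lines.append(f"{value:>3} | " + "o" * counts[value])
    return "\n".join(lines)

# Hypothetical data: number of siblings reported by 12 students.
siblings = [0, 1, 1, 2, 2, 2, 3, 1, 0, 2, 4, 1]
print(text_dot_plot(siblings))
```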
shape: uniform
every category has the same frequency
shape: bell-shaped
the highest frequency (the “peak”) is in the middle of the data, and frequencies get smaller on both sides as you move further from the peak