Exam 1 Flashcards
parameter
characteristic of a population
statistic
characteristic of a sample
descriptive statistics
collection, organization, summarization, presentation of data
inferential statistics
generalizing from samples to populations: estimation, hypothesis testing, determining relationships, making predictions
inferential stats based on _________
probability
qualitative
variables can be placed into distinct categories according to some characteristic
quantitative
variables are numerical and can be ordered/ranked
2 types of quantitative data
discrete
continuous
nominal data
mutually exclusive, exhaustive categories that cannot be ordered/ranked
ordinal data
categories that can be ranked, but with no precise differences between ranks
interval data
data is ranked and precise differences exist, but there is no meaningful 0, so ratios are meaningless
ratio data
data is ranked and precise differences exist, and ratios are meaningful because there is a meaningful 0
systematic sampling
every kth subject
stratified sampling
divide population into layers and sample from each
cluster sampling
sample from existing groups
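A minimal sketch of systematic and stratified sampling, assuming the population is a numbered list. The function names, the seed, and the "freshmen"/"seniors" strata are hypothetical and only for illustration.

```python
import random

def systematic_sample(population, k, rng):
    """Pick a random start among the first k subjects, then take every kth subject."""
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(strata, per_stratum, rng):
    """Draw the same number of subjects at random from each layer (stratum)."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, per_stratum))
    return sample

rng = random.Random(0)                # fixed seed so the sketch is repeatable
population = list(range(1, 101))      # made-up subject IDs 1..100

sys_sample = systematic_sample(population, 10, rng)

# Hypothetical strata: IDs 1-50 are freshmen, 51-100 are seniors
strata = {"freshmen": list(range(1, 51)), "seniors": list(range(51, 101))}
strat_sample = stratified_sample(strata, 5, rng)

print(sys_sample)
print(strat_sample)
```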
sample population
portion of the target population that is accessible for sampling
the sample is drawn from the sample population
problem with simple random sampling
can give a nonrepresentative sample
2 stages of cluster sampling
- randomly select clusters
- from clusters, randomly select subjects
3 nonsampling errors
nonresponse
response error
selection bias
frequency distribution
organization of raw data in table form, using classes and frequencies
categorical distribution is for … data
nominal
grouped distribution is for…. data
data with a large range requiring classes several units in width
ungrouped distribution is for …. data
numerical data with a small range
steps for constructing a frequency distribution
- make categories
- count/tally
- find frequency
- find relative frequency
relative frequency =
frequency / total
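A short sketch of a categorical frequency distribution: tally each category, then divide each frequency by the total. The blood-type data and the function name are made up for illustration.

```python
from collections import Counter

def frequency_table(values):
    """Return {category: (frequency, relative frequency)} for categorical data."""
    counts = Counter(values)          # count/tally each category
    total = sum(counts.values())      # total number of observations
    return {cat: (f, f / total) for cat, f in counts.items()}

# Hypothetical data, not from any real study
blood_types = ["A", "B", "O", "O", "A", "AB", "O", "A", "O", "B"]
table = frequency_table(blood_types)

for cat, (f, rel) in table.items():
    print(cat, f, rel)
```

The relative frequencies should always add up to 1.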
all categorical data can be represented by a ….
bar graph
class boundaries
used to separate classes
class rules
- limits should have same decimal place value as data
- boundaries should have one additional place and end in 5
- width must be the same for every class
- width should be an odd number
- there should be 5-20 classes
- classes should be exhaustive, mutually exclusive, continuous
class width =
range / # of classes, rounded up
find class midpoint by…
taking avg of 2 class boundaries
steps to construct a grouped distribution
- determine classes and class width
- sort data into classes
- find frequencies
- find cumulative frequencies
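The steps above can be sketched as follows. This assumes whole-number data and always rounds the width up to the next whole number so that the last class covers the maximum. The scores and the function name are hypothetical.

```python
def grouped_distribution(data, num_classes):
    """Return (class width, rows) where each row is
    (lower limit, upper limit, frequency, cumulative frequency)."""
    low, high = min(data), max(data)
    # width = range / # of classes, rounded up to the next whole number
    width = (high - low) // num_classes + 1
    rows = []
    cumulative = 0
    for i in range(num_classes):
        lower = low + i * width
        upper = lower + width - 1                           # whole-number class limits
        freq = sum(1 for x in data if lower <= x <= upper)  # sort data into the class
        cumulative += freq
        rows.append((lower, upper, freq, cumulative))
    return width, rows

# Made-up exam scores
scores = [52, 55, 61, 64, 67, 70, 71, 73, 78, 80, 84, 86]
width, rows = grouped_distribution(scores, 5)

for row in rows:
    print(row)
```

The last cumulative frequency should equal the number of data values.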
uses class boundaries (x-axis) and frequencies (y-axis) to give a “bar graph” with bars that cannot be rearranged
histogram
uses lines to connect points plotted at the midpoint of each class
frequency polygon
a polygon is anchored…
to the x-axis before the first class and after the last class
uses lines that connect points plotted at the cumulative frequency of each class
ogive
an ogive has…. on the x-axis
upper class boundaries
graph using proportions
relative frequency graph
used for categorical variables; bars arranged highest to lowest
Pareto chart
represents data that occur over a period of time
time series graph
bell shape
uniform shape
right skewed shape
left skewed shape
bimodal shape
weighted mean formula
table for grouped data’s mean
mean formula
A: class
B: frequency, f
C: midpoint, Xm
D: f(Xm)
mean = ΣD/n
mean for grouped data is ……
approximate
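A minimal sketch of the grouped-data mean table: each class contributes f(Xm), where Xm is the class midpoint. The class boundaries and frequencies below are hypothetical. The result is approximate because every value in a class is treated as if it sat at the midpoint.

```python
def grouped_mean(classes):
    """classes: list of (lower boundary, upper boundary, frequency f)."""
    n = sum(f for _, _, f in classes)                        # n = sum of frequencies
    total = sum(f * (lo + hi) / 2 for lo, hi, f in classes)  # sum of f(Xm)
    return total / n

# Made-up classes: (lower boundary, upper boundary, frequency)
classes = [
    (51.5, 58.5, 2),
    (58.5, 65.5, 2),
    (65.5, 72.5, 3),
    (72.5, 79.5, 2),
    (79.5, 86.5, 3),
]
print(grouped_mean(classes))
```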
find median for an odd n
(n + 1)/2
find median for an even n
find mean of the values at positions n/2 and (n/2) + 1
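A short sketch of the median rule, using 1-based positions in the sorted data: position (n + 1)/2 for odd n, and the mean of the two middle values, at positions n/2 and n/2 + 1, for even n.

```python
def median(values):
    data = sorted(values)       # data must be in order first
    n = len(data)
    if n % 2 == 1:
        # odd n: the value at 1-based position (n + 1) / 2
        return data[(n + 1) // 2 - 1]
    # even n: mean of the values at 1-based positions n/2 and n/2 + 1
    return (data[n // 2 - 1] + data[n // 2]) / 2

print(median([3, 1, 2]))      # 2
print(median([4, 1, 3, 2]))   # 2.5
```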
measures of central tendency for right skew
mode < median < mean
measures of central tendency for left skew
mean < median < mode
variance
average of the squared deviations from the mean
s^2
sample variance
standard deviation
square root of variance
variance and std dev used to determine …… of a variable
consistency
steps to find std dev
- find mean
- find deviation of each value: x - mean
- square each deviation
- find sum of the squares
- divide by N (population) or n - 1 (sample) to get the variance
- take the square root
standard deviation =
s = sqrt( Σ(x - mean)^2 / (n - 1) )
unbiased estimate is …….
and is used to…
n - 1
compensate for the underestimation of population variance given by n alone
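A minimal sketch of the variance and standard deviation steps, with a switch between the population divisor N and the sample divisor n - 1. The data values and function names are made up for illustration.

```python
import math

def variance(values, sample=True):
    n = len(values)
    mean = sum(values) / n                              # find the mean
    squares = sum((x - mean) ** 2 for x in values)      # sum of squared deviations
    return squares / (n - 1 if sample else n)           # divide by n - 1 or N

def std_dev(values, sample=True):
    return math.sqrt(variance(values, sample))          # square root of the variance

data = [2, 4, 4, 4, 5, 5, 7, 9]                         # hypothetical values
print(std_dev(data, sample=False))                      # population: 2.0
print(std_dev(data))                                    # sample, slightly larger
```

Dividing by n - 1 gives a slightly larger result than dividing by n, which is the correction for underestimation.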
coefficient of variation =
100 (s / x̄), expressed as a percent
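A quick sketch of the coefficient of variation using Python's standard `statistics` module (sample standard deviation divided by the mean, times 100). The height and weight values are made up. It shows how two variables with different units can be compared.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return 100 * statistics.stdev(values) / statistics.mean(values)

heights_cm = [150, 160, 170, 180, 190]   # hypothetical data
weights_kg = [50, 60, 70, 80, 90]        # hypothetical data

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)
print(cv_heights, cv_weights)
```

Both samples have the same standard deviation here, but the weights vary more relative to their mean.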
range rule of thumb
s ≈ R/4
range rule of thumb works when…
data is unimodal and approximately symmetric
Chebyshev’s formula
1 - 1/k^2
chebyshev’s theorem states…
proportion of values from a data set that fall within k standard deviations of the mean will be at least 1 - 1/k^2 where k > 1
% for k = 2
75%
% for k = 3
88.9%
% for k = 4
93.8%
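The percentages for k = 2, 3, 4 can be checked with a one-line function for 1 - 1/k^2. The function name is hypothetical.

```python
def chebyshev_min_proportion(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

for k in (2, 3, 4):
    print(k, chebyshev_min_proportion(k))
```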
empirical rule applies to ….. distributions
normal
empirical rule
1 s:
2 s:
3 s:
1 s: about 68% inside, 16% in each tail
2 s: about 95% inside, 2.5% in each tail
3 s: about 99.7% inside, 0.15% in each tail
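As a check on the empirical rule, the proportion of a normal distribution within k standard deviations of the mean can be computed with the error function, erf(k / sqrt(2)). This is a sketch using only the standard library. The function name is an assumption.

```python
import math

def normal_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    inside = normal_within(k)
    tail = (1 - inside) / 2      # what is left over, split between the two tails
    print(k, round(100 * inside, 2), round(100 * tail, 2))
```

The exact values are slightly different from the rounded 68%, 95%, 99.7% on the card.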
standard/z score definition
unitless measure expressing how many s above or below the mean an observation is
z score used when…
raw data can’t be directly compared
z =
(x - mean) / s
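A small sketch of using z scores to compare raw scores from two different tests. The exam numbers are made up.

```python
def z_score(x, mean, s):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / s

# Hypothetical results: math test (mean 75, s 5), history test (mean 80, s 10)
z_math = z_score(85, 75, 5)       # 2.0
z_history = z_score(90, 80, 10)   # 1.0
print(z_math, z_history)
```

The history score is higher as a raw number, but the math score is relatively better because its z score is larger.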
percentiles used in…
education, healthcare
percentile indicates…
position of an individual in a group
P is… such that….
P is an integer between 1 and 99 such that the Pth percentile is a value where P% of the data is less than or equal to the value
percentile =
[ ((# of values below x) + 0.5) / (total # of values) ] × 100
the cth value corresponds to the Pth percentile formula
c = nP/100
if c is a decimal…
round up
if c is a whole….
take avg of cth and (c+1)th values
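A sketch of both percentile calculations on the cards above: the percentile rank of a value x, and the data value at the Pth percentile using c = nP/100. The scores and function names are hypothetical, and P is assumed to be between 1 and 99.

```python
import math

def percentile_rank(data, x):
    """Percentile of x: ((# of values below x) + 0.5) / (total # of values) * 100."""
    below = sum(1 for v in data if v < x)
    return (below + 0.5) / len(data) * 100

def value_at_percentile(data, p):
    """Value at the pth percentile, using c = n * p / 100 (1-based position)."""
    ordered = sorted(data)
    c = len(ordered) * p / 100
    if c != int(c):
        # c is a decimal: round up and take that value
        return ordered[math.ceil(c) - 1]
    c = int(c)
    # c is whole: average the cth and (c + 1)th values
    return (ordered[c - 1] + ordered[c]) / 2

scores = [2, 3, 5, 6, 8, 10, 12, 15, 18, 20]   # made-up data
print(percentile_rank(scores, 12))
print(value_at_percentile(scores, 25))
print(value_at_percentile(scores, 60))
```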
Q1 = __th percentile
25th
Q3 = __th percentile
75th
steps to find quartiles
- arrange data in order
- find median, Q2
- Q1 = median of first half
- Q3 = median of second half
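The steps above can be sketched as follows. One assumption to note: when n is odd, the median itself is left out of both halves. Some textbooks and software use other conventions, so results can differ slightly.

```python
def median(values):
    data = sorted(values)
    n = len(data)
    mid = n // 2
    return data[mid] if n % 2 else (data[mid - 1] + data[mid]) / 2

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    data = sorted(values)              # arrange data in order
    n = len(data)
    lower = data[: n // 2]             # first half (median excluded when n is odd)
    upper = data[(n + 1) // 2 :]       # second half (median excluded when n is odd)
    return median(lower), median(data), median(upper)

data = [5, 6, 12, 13, 15, 18, 22, 50]  # hypothetical values
print(quartiles(data))
```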
5-number summary
min
Q1
median
Q3
max
hypothesis testing
decision-making process for evaluating claims about a population, based on information from samples
how to find outlier interval
Q1 - 1.5(IQR)
and
Q3 + 1.5(IQR)
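A minimal sketch of the outlier check: compute the two fences from Q1 and Q3, then flag anything outside them. The data and the quartile values are made up for illustration.

```python
def outlier_fences(q1, q3):
    """Return (lower fence, upper fence) = (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(data, q1, q3):
    low, high = outlier_fences(q1, q3)
    return [x for x in data if x < low or x > high]

data = [5, 6, 12, 13, 15, 18, 22, 50]   # hypothetical values
q1, q3 = 9, 20                          # quartiles assumed for this data set
print(outlier_fences(q1, q3))
print(find_outliers(data, q1, q3))
```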