Physics #12 Flashcards
measures of central tendency
those that describe the middle of a sample
mean
adding up all of the values in the data set and dividing the result by the number of values
outlier
an extremely large or extremely small value compared to the other data values
can pull the mean toward itself (up for an extremely large value, down for an extremely small one)
median position calculation
(n+1)/2
when is using a mean not helpful
when there is an outlier
when is using a median not helpful
when the data set has a very large range or multiple modes (the median is still a good choice for data sets with outliers)
if the median and mean are far from each other, this implies the presence of a ______
outlier
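A minimal sketch of why a wide gap between mean and median hints at an outlier. The sample values are made up for illustration.

```python
from statistics import mean, median

values = [2, 3, 3, 4, 5]          # hypothetical sample, no outlier
with_outlier = values + [100]     # same sample plus one extreme value

# Without the outlier the mean and median sit close together.
print(mean(values), median(values))              # 3.4 3

# One extreme value drags the mean far more than the median.
print(mean(with_outlier), median(with_outlier))  # 19.5 3.5
```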
if the mean and median are close, this implies a ______ distribution
symmetrical
describe the values around the median
50% above and 50% below it
mode
the number that appears most often in a set of data
can there be multiple modes or no modes?
yes
the mean is best for data with a ____ distribution
normal
what is unique about the normal distribution and central tendencies?
in the normal distribution, all of the measures of central tendency are the same
what are the mean and standard deviation of the standard distribution?
mean is 0 and sd is 1
a ____ distribution is one that contains a tail on one side of the data set
skewed
positive skew vs. negative skew
positive skew has a tail to the right and a mean greater than the median; negative skew has a tail to the left and a mean less than the median
where is the tail for a right-skewed distribution?
to the right
bimodal distribution
a distribution containing two peaks with a valley in between.
can be analyzed as two separate distributions if there is very little data in the valley.
can data that do not follow a normal distribution be analyzed with measures of central tendency and measures of distribution?
yes
range of data set
distance between largest and smallest values
when you do not have a complete data set how do you approximate the standard deviation of a normal distribution?
1/4 of the range
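The range rule on this card can be sketched as a one-line helper. The function name and the example values are hypothetical, and the result is only a rough estimate for roughly normal data.

```python
def approx_sd_from_range(smallest, largest):
    """Rough standard deviation estimate: one quarter of the range."""
    return (largest - smallest) / 4

# Example: only the extremes of the data set are known (60 and 100).
print(approx_sd_from_range(60, 100))  # 10.0
```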
interquartile range
related to the median, first, and third quartiles
what is the median in quartile speak
Q2
quartiles
divide data in ascending order into groups that comprise 1/4 of the data set
how to calculate a quartile
first quartile: multiply n by 1/4
if the result is a whole number, the quartile is the mean of the value at that position and the value at the next position
if the result is a decimal, round up to the next whole number and take the value at that position as the quartile
same for 3rd quartile except multiply by 3/4
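The position rule above can be sketched as follows. The function name `quartile` and the sample data are assumptions for illustration; statistics libraries often use slightly different quartile conventions, so their answers may not match this rule exactly.

```python
import math

def quartile(sorted_values, fraction):
    """Quartile by the flashcard rule; positions are counted from 1."""
    n = len(sorted_values)
    position = n * fraction
    if position == int(position):
        k = int(position)
        # Whole number: average the k-th value and the next one.
        return (sorted_values[k - 1] + sorted_values[k]) / 2
    # Decimal: round up and take the value at that position.
    k = math.ceil(position)
    return sorted_values[k - 1]

data = sorted([1, 3, 5, 7, 9, 11, 13, 15])  # n = 8
q1 = quartile(data, 0.25)  # position 2 is whole: mean of 3 and 5
q3 = quartile(data, 0.75)  # position 6 is whole: mean of 11 and 13
print(q1, q3)  # 4.0 12.0
```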
how to calculate interquartile range
Q3-Q1
how to calculate outlier with interquartile range
an outlier is any value more than 1.5 times the interquartile range above the 3rd quartile or below the 1st quartile
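A short sketch of the interquartile-range test for outliers. The helper name is hypothetical, and Q1 and Q3 are passed in directly so the example does not depend on any particular quartile convention.

```python
def iqr_outliers(values, q1, q3):
    """Return the values lying more than 1.5 * IQR outside the quartiles."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

# With Q1 = 4 and Q3 = 12, the IQR is 8 and the fences are -8 and 24.
print(iqr_outliers([1, 3, 5, 7, 9, 11, 13, 40], q1=4, q3=12))  # [40]
```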
how is standard deviation calculated
find the difference between each data point and the mean, square each difference, and sum the squares. Divide the sum by the number of points in the data set minus 1, then take the square root of the result.
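The steps on this card, written out as a small function. The name `sample_sd` and the data are illustrative only; the result is compared against `statistics.stdev`, which also divides by n minus 1.

```python
import math
from statistics import stdev

def sample_sd(values):
    """Sample standard deviation following the steps on the card."""
    m = sum(values) / len(values)
    # Difference from the mean, squared, then summed.
    sum_of_squares = sum((x - m) ** 2 for x in values)
    # Divide by (n - 1), then take the square root.
    return math.sqrt(sum_of_squares / (len(values) - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # mean 5, squared differences sum to 32
print(sample_sd(data))            # square root of 32/7, roughly 2.14
print(stdev(data))                # library result for comparison
```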
how to calculate an outlier using standard deviation
any value more than 3 standard deviations from the mean
the average signed deviation from the mean (data point minus mean) will always be ____
0
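A quick check of this card, using made-up data. Signed differences from the mean cancel out, which is why the standard deviation squares them first; the mean absolute deviation is shown only for contrast.

```python
data = [2, 4, 4, 4, 5, 5, 7, 9]
m = sum(data) / len(data)                  # 5.0

# Signed deviations: positives and negatives cancel.
deviations = [x - m for x in data]
print(sum(deviations) / len(deviations))   # 0.0

# Absolute deviations do not cancel.
mean_abs_dev = sum(abs(d) for d in deviations) / len(deviations)
print(mean_abs_dev)                        # 1.5
```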
independent events
have no effect on one another
can occur in any order without impacting each other
probabilities not expected to change
dependent events
the probability of the second event is dependent on the first event
mutually exclusive outcomes
cannot occur at the same time
one cannot flip both heads and tails in one throw
what is the probability of two mutually exclusive outcomes occurring?
0%
exhaustive
a group of outcomes is exhaustive if there are no other possible outcomes
Ex: a coin flip must be heads or tails. These are the only two possibilities.
if events are independent, the probability of both of them happening is found by ____
multiplying their probabilities together
what is the probability of having at least one boy in 10 live births if boy or girl is 50/50
99.9%
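The 99.9% figure can be reproduced with the complement rule, assuming each birth is independent and a boy has probability 0.5: "at least one boy" is everything except "all ten are girls".

```python
p_girl = 0.5
births = 10

# Independent births: multiply the probabilities to get "all girls".
p_all_girls = p_girl ** births             # 1/1024

# Complement: at least one boy is everything else.
p_at_least_one_boy = 1 - p_all_girls
print(f"{p_at_least_one_boy:.4f}")         # 0.9990
```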
null hypothesis
hypothesis of equivalence
two groups are statistically equal
directional vs. nondirectional hypothesis
nondirectional: the populations are not equal
directional: mean of population A is greater than mean of population B
what type of distribution do t-tests rely on?
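the t-distribution, which is closely related to the standard distribution (z-tests rely on the standard distribution itself)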
test statistic
calculated from the data and compared to a table to determine the likelihood of obtaining a statistic at least that extreme by random chance alone (that is, if the null hypothesis is true). This likelihood is our p-value
p-value
found from a table using the test statistic; the likelihood of obtaining a result at least that extreme by random chance alone, assuming the null hypothesis is true
the p-value is compared to a ______
significance level (usually 0.05)
if p-value is greater than alpha, then we _____
fail to reject the null hypothesis; there is no statistically significant difference between the two populations.
type I error
when we incorrectly reject a true null hypothesis. The likelihood that we report a difference between two populations when one does not actually exist
type II error
when we incorrectly fail to reject the null hypothesis. The likelihood we report no difference between two populations when one actually exists
power
the probability of correctly rejecting a false null hypothesis
confidence
the probability of correctly failing to reject a true null hypothesis.
how to produce confidence intervals
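start from a confidence level (95% is standard), find the corresponding z- or t-score, and multiply it by the standard deviation (strictly, by the standard error of the mean, which is the standard deviation divided by the square root of n). Add and subtract the result from the mean to get a range of values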
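A minimal sketch of a z-based confidence interval for a mean. It assumes the spread used is the standard error of the mean (sample standard deviation divided by the square root of n) and that a z-score is acceptable; for small samples a t-score would be the usual choice. The function name and data are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def confidence_interval(values, level=0.95):
    """Return (low, high) for the mean using a z-score."""
    # z-score leaving (1 - level) / 2 in each tail; about 1.96 for 95%.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    m = mean(values)
    standard_error = stdev(values) / math.sqrt(len(values))
    margin = z * standard_error
    return m - margin, m + margin

data = [2, 4, 4, 4, 5, 5, 7, 9]
low, high = confidence_interval(data)
print(round(low, 2), round(high, 2))
```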
what does a pie chart represent
relative amounts of entities
popular in demographics
cons to pie chart
when there are many categories, the graph becomes hard to read
bar charts
used for categorical data, which sorts data points based on predetermined categories
histograms
numerical data
good for mode, showing distribution of data
what does a box plot show
range, median, quartiles, and outliers for a set of data
also called a box and whisker plot
how is a box and whisker plot laid out?
the box spans Q1 to Q3, with the median as a line inside the box. Whiskers extend to the max and min values of the data set. Alternatively, outliers are drawn as individual dots and each whisker ends at the most extreme point within 1.5 IQR of the box
maps organize data _____
geographically
do the curves on linear graphs have to be straight lines?
no, the curve could be linear, parabolic, exponential, or logarithmic
linear graphs
show the relationship between two variables
axes are consistent, in that each unit occupies the same amount of space
semilog graphs
one axis is logarithmic (equal distances represent equal ratios) and the other is linear
log-log graphs
both axes are logarithmic (use ratios)
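A small sketch of what "uses a ratio" means for an axis, assuming a base-10 logarithmic scale: values that differ by the same factor end up the same distance apart.

```python
import math

values = [1, 10, 100, 1000]     # each value is 10 times the previous one

# Position of each value along a base-10 logarithmic axis.
positions = [math.log10(v) for v in values]

# Spacing between neighbouring tick marks on the log axis.
log_steps = [b - a for a, b in zip(positions, positions[1:])]
print(log_steps)    # equal steps of about 1 each

# On a linear axis the same values are spaced by their differences.
linear_steps = [b - a for a, b in zip(values, values[1:])]
print(linear_steps)  # [9, 90, 900]
```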
what is a limitation of map representations of data?
may only be able to display two variables at most
positive vs. negative correlation
positive: as one goes up, the other goes up
negative: as one goes up, the other goes down
correlation
connection between two variables
correlation coefficient
indicates the strength of a relationship
what does a correlation coefficient of 0 mean?
no apparent relationship
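A minimal sketch of one common correlation coefficient, Pearson's r, which ranges from -1 to +1. The cards do not name a specific coefficient, so this choice and the sample data are assumptions. The last example shows that r = 0 means no apparent linear relationship, even when the points follow a curve.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    covariance_sum = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mx) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - my) ** 2 for y in ys))
    return covariance_sum / (spread_x * spread_y)

xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2, 4, 6, 8, 10]))   # close to +1: positive correlation
print(pearson_r(xs, [10, 8, 6, 4, 2]))   # close to -1: negative correlation
print(pearson_r(xs, [4, 1, 0, 1, 4]))    # 0.0: no linear relationship
```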
what is the only one of Hill’s criteria uniformly required for causation?
temporality
all variables that are causally related must be correlationally related. True or false?
true
if all numbers are the same, what is the mode?
no mode
the standard distribution is the normal distribution shifted so _____
the mean is 0 and standard deviation is 1
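The shift described on this card can be sketched as a z-score transformation: subtract the mean, then divide by the standard deviation. The function name and data are illustrative, and the population standard deviation is used here so the rescaled values come out with a standard deviation of exactly 1.

```python
from statistics import mean, pstdev

def standardize(values):
    """Convert each value to a z-score: (value - mean) / standard deviation."""
    m = mean(values)
    s = pstdev(values)
    return [(x - m) / s for x in values]

z_scores = standardize([2, 4, 4, 4, 5, 5, 7, 9])   # mean 5, sd 2
print(z_scores)
print(mean(z_scores), pstdev(z_scores))            # mean 0 and sd 1
```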
if a result is statistically significant, is it necessarily clinically significant?
not necessarily; a statistically significant difference may be too small to benefit anyone.
a graph is linear if the axes use the ______ scale throughout
same (each equal step along the axis adds the same amount)
if a study has low power what does this mean
a real difference is harder to detect, so statistically significant results are more difficult to obtain (higher chance of a type II error)