STATISTICS Flashcards
Mean
sum of all values / number of data points
most useful when the data set has no outliers and is not skewed to one extreme
Affected by extreme values or outliers
Median
Middle data point when the values are sorted in order
more helpful when there are outliers or data is skewed
Not as affected by extreme values or outliers
if there are two middle data points, the average of those two values is the median.
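A small sketch of how an outlier pulls the mean but barely moves the median. The scores are made-up values, and the functions come from Python's standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical quiz scores (made up for illustration).
scores = [70, 72, 75, 78, 80]
# Same scores plus one extreme low value (an outlier).
scores_with_outlier = scores + [5]

# Without the outlier, the mean and median agree.
print(mean(scores), median(scores))

# With the outlier, the mean drops to about 63.3, while the median
# (average of the two middle values, 72 and 75) only moves to 73.5.
print(mean(scores_with_outlier), median(scores_with_outlier))
```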
Mode
Most occurring value in a data set
it is possible for there to be more than one mode or no mode
use mode when the data are non-numeric or when asked to choose the most popular item (for example, if you ask 100 people “What is your favorite color?”, the mode would be more useful than the mean or median)
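A sketch of finding the mode for non-numeric data, assuming Python 3.8 or later for `statistics.multimode`. The survey answers are made up.

```python
from statistics import multimode

# Hypothetical survey answers (non-numeric data).
colors = ["blue", "red", "blue", "green", "red", "blue"]
color_modes = multimode(colors)
print(color_modes)   # ['blue']

# Two values tie for most frequent, so there are two modes.
tie_modes = multimode([1, 1, 2, 2, 3])
print(tie_modes)     # [1, 2]

# Every value appears exactly once. multimode returns all of them,
# which is the situation these cards describe as "no mode".
flat_modes = multimode([4, 5, 6])
print(flat_modes)    # [4, 5, 6]
```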
Range
Largest value - smallest value = range
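A quick check of the range formula, using made-up values:

```python
# Hypothetical data set (made up for illustration).
data = [12, 7, 3, 15, 9]

# Range = largest value - smallest value = 15 - 3
data_range = max(data) - min(data)
print(data_range)  # 12
```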
Normal Distribution
When the data is distributed symmetrically around the center (a bell-shaped curve) and has little to no skew.
When the mean, median, and mode are equal or nearly equal
Positively Skewed
Mean being pulled by outliers in the positive direction
Negatively Skewed
Mean being pulled by outliers in the negative direction
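One rough way to see the direction of skew is to compare the mean with the median, since the mean is the one that gets pulled. The sketch below is only a rule of thumb, not a formal skewness statistic; `skew_direction` is a hypothetical helper name and the data are made up.

```python
from statistics import mean, median

def skew_direction(values):
    """Rule of thumb: the mean is pulled toward the long tail."""
    m, med = mean(values), median(values)
    if m > med:
        return "positive"
    if m < med:
        return "negative"
    return "none"

# One high outlier: mean 5.6 is above the median 2.
print(skew_direction([1, 2, 2, 3, 20]))      # positive
# One low outlier: mean 15.4 is below the median 19.
print(skew_direction([1, 18, 19, 19, 20]))   # negative
# Symmetric data: mean and median are both 3.
print(skew_direction([1, 2, 3, 4, 5]))       # none
```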
Outlier
a point (value) that lies an abnormal distance from other points (values)
could also mean a point that doesn’t follow a trend in the data
can distort mean
Accuracy
how close a measured value is to the actual (true) value
Precision
how close the measured values are to each other
Random Errors
Fluctuations of data in either direction
can be overcome by taking more data
data can be accurate on average but is not precise
Systematic Errors
All of the data is off in THE SAME WAY
data can be precise but is not accurate
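A sketch contrasting the two kinds of error, assuming a known true value of 100.0 g. All readings are made up, and the helper names `offset_from_true` and `spread` are hypothetical.

```python
from statistics import mean

TRUE_VALUE = 100.0  # hypothetical known mass, in grams

# Random error: readings scatter in both directions around the true value.
random_error = [97.0, 103.0, 99.0, 101.0, 100.0]
# Systematic error: readings agree with each other but are all about 5 g high.
systematic_error = [104.9, 105.0, 105.1, 105.0, 105.0]

def offset_from_true(values):
    """Accuracy check: how far the mean is from the true value."""
    return abs(mean(values) - TRUE_VALUE)

def spread(values):
    """Precision check: largest reading minus smallest reading."""
    return max(values) - min(values)

# Small offset, large spread: accurate on average, not precise.
print(offset_from_true(random_error), spread(random_error))
# Large offset, small spread: precise, not accurate.
print(offset_from_true(systematic_error), spread(systematic_error))
```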
Rounding
a rounded value should not show more precision than the instrument that was used to record the measurement
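A sketch of the rounding rule, assuming a balance that reads to the nearest 0.1 g. The readings are made up.

```python
# Hypothetical masses from a balance that reads to 0.1 g.
readings = [12.3, 12.4, 12.4]

# The raw average has more digits than the balance can justify.
average = sum(readings) / len(readings)

# Round to one decimal place to match the instrument.
reported = round(average, 1)

print(average)
print(reported)  # 12.4
```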
Percent Change
the amount of change relative to the original value
((Final – Initial) / Initial) * 100
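The formula above can be sketched as a small function. `percent_change` is a hypothetical name, and the initial value is assumed to be non-zero.

```python
def percent_change(initial, final):
    """((final - initial) / initial) * 100; initial must not be zero."""
    return (final - initial) / initial * 100

# An increase from 50 to 60 is a +20% change.
print(percent_change(50, 60))
# A decrease from 80 to 60 is a -25% change.
print(percent_change(80, 60))
```

A negative result means the value went down relative to where it started.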
Absolute Average Deviation (AAD), also called Mean Absolute Deviation (MAD)
the average distance between each data value and the mean
To Find AAD:
1. find the mean (average)
2. find the difference between each data value and the mean
3. take the absolute value of each difference
4. find the mean (average) of these absolute values
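The four steps above can be sketched as follows. `aad` is a hypothetical function name and the data are made up.

```python
from statistics import mean

def aad(values):
    center = mean(values)                          # step 1: find the mean
    distances = [abs(v - center) for v in values]  # steps 2 and 3: absolute differences
    return mean(distances)                         # step 4: average them

# Mean is 6; distances from the mean are 4, 2, 0, 2, 4; their average is 2.4.
data = [2, 4, 6, 8, 10]
print(aad(data))  # 2.4
```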
Percent AAD
Quantifies the AAD relative to the mean.
(AAD / Mean) * 100
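A sketch of the percent AAD formula, assuming the mean is non-zero. `percent_aad` is a hypothetical name and the data are made up.

```python
from statistics import mean

def percent_aad(values):
    """(AAD / mean) * 100; the mean must not be zero."""
    center = mean(values)
    aad = mean([abs(v - center) for v in values])
    return aad / center * 100

# AAD is 2.4 and the mean is 6, so the percent AAD is about 40%.
data = [2, 4, 6, 8, 10]
print(round(percent_aad(data), 1))  # 40.0
```

Because it is relative to the mean, percent AAD lets you compare the spread of data sets measured on different scales.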
T-Test
Assesses if the observed difference between means is due to chance or due to the independent variable
p-value (in T-Test)
the probability (from 0 to 1, often stated as a percentage) of seeing a difference in the means at least as large as the one observed if chance alone, and not the independent variable, were responsible
The lower the p-value, the more likely that the observed difference is a result of the independent variable and not due to chance
Scientists generally agree that a p-value of ____ or lower is low enough to accept that the observed difference is not due to ____. This also rejects the ______.
0.05, chance, null hypothesis
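The p-value idea can be sketched with only the standard library. Note that the example below is not a t-test: it swaps in an exact permutation test, which asks the same kind of question (how often would chance alone give a difference in means at least this large?) by trying every way of splitting the pooled data into two groups. The function name and the data are hypothetical.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference in means."""
    pooled = group_a + group_b
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    total = sum(pooled)
    at_least_as_large = 0
    n_splits = 0
    # Try every way of picking which pooled values land in "group a".
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Small tolerance guards against floating-point round-off.
        if diff >= observed - 1e-12:
            at_least_as_large += 1
        n_splits += 1
    return at_least_as_large / n_splits

# Hypothetical measurements (made up for illustration).
control = [10, 12, 11, 13, 12]
treated = [15, 17, 16, 18, 16]

p = permutation_p_value(control, treated)
print(p)          # 2 of the 252 possible splits are this extreme
print(p <= 0.05)  # True: below the usual 0.05 cutoff
```

In practice a t-test would usually be run with a statistics package or a calculator; this sketch only illustrates what the p-value measures.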
Null Hypothesis
a hypothesis that says there is no real difference or relationship between the two variables being studied
THE HYPOTHESIS THAT THE RESEARCHER IS TRYING TO REJECT
When to use bar graph
Data is Discrete - this is data that can be counted and has a finite number of values. These values must be able to fall within certain classifications and are unable to be broken down into smaller parts
Examples:
The number of employees in your department
The number of new customers you signed on last quarter
When to use line graph
Data is continuous - data has values that are not fixed and have an infinite number of possible values. These measurements can also be broken down into smaller individual parts.
Examples:
The height or weight of a person
The daily temperature in your city
Elements of a Proper Graph
- Title
- proper style (bar, line, pie)
- labeled axes (including units)
- key: indicating independent variables (for example, naming each of the bars by color)
- line of best fit if appropriate
- error bars if it is a bar graph with means
- Caption
Caption of a Graph
Describes the graph or table in enough detail for the graph or table to be understood in isolation from the text.
must include:
1. whether the graph shows raw data or means
2. what the error bars represent (AAD)
3. sample size
4. number of replications
Example (see photo):
Rats were tested on their mass based on whether or not they received growth hormones. Data shown as mean +/- AAD. Sample size = 6. Repetitions = 1.
Factors that influence P-Value
- Distance between the means
- Spread of the data
- Sample size
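These three factors all show up in the t statistic, which is the difference in means divided by its standard error. The sketch below assumes Welch's form of the t statistic, since the cards do not say which t-test is used. `welch_t` is a hypothetical helper name and the data are made up. A larger absolute t generally corresponds to a smaller p-value.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Welch's t statistic: difference in means over its standard error."""
    standard_error = sqrt(variance(group_a) / len(group_a)
                          + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / standard_error

# Hypothetical baseline groups with means 12 and 15.
base_a, base_b = [10, 12, 14], [13, 15, 17]
base = welch_t(base_a, base_b)

# Greater distance between the means (12 vs 22).
far_apart = welch_t(base_a, [20, 22, 24])
# Same means as the baseline, but less spread in each group.
tight = welch_t([11, 12, 13], [14, 15, 16])
# Same values repeated three times, so a larger sample size.
larger_n = welch_t(base_a * 3, base_b * 3)

# Each change makes |t| larger than the baseline.
print(abs(base), abs(far_apart), abs(tight), abs(larger_n))
```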