Statistics I Flashcards
2 Types of Statistics
Descriptive statistics
Inferential statistics
What is descriptive statistics?
Describes how many observations were recorded and how frequently each score or category of observations occurred in the data
What is inferential statistics?
Uses sample data to draw conclusions about a population; used to test scientific hypotheses or theories, including possible cause-and-effect relationships
4 Types of Data/Variable
Nominal Data
Ordinal Data
Interval Data
Ratio Data
What is nominal data?
Data that have names or arbitrary numeric assignments
For example: a participant's state of residence, gender, yes or no responses (yes/no responses are binary variables, since they have only two possible values)
What is ordinal data?
Data that can be arranged in ascending or descending order
For example: highest level of education, Likert-style survey questions (disagree/neutral/agree).
What is interval Data?
Data measured on a scale with equal intervals between values but no true zero. Not a frequently used scale.
For example: evaluation of change in participants’ total cholesterol over time, with a score assigned based on the difference. The difference between 200 and 250 mg/dL is the same as that between 150 and 200 mg/dL.
What is ratio data?
Data on a numeric scale with a meaningful (true) zero, so that ratios and proportional differences between measured values are meaningful. This type of data is widely used in nutrition research.
For example: blood pressure, weight, height, total cholesterol. Continuing with the total cholesterol example, a 50 mg/dL increase from 200 to 250 mg/dL would be a 25% increase. If the total cholesterol started at 150 mg/dL, the increase to 200 mg/dL would be a 33% increase.
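The percentage arithmetic in the cholesterol example can be checked with a minimal Python sketch. The function name `percent_change` is a made-up helper for illustration, not part of any library.

```python
def percent_change(start, end):
    """Percent change from start to end (hypothetical helper; assumes a ratio scale)."""
    return (end - start) / start * 100

# Total cholesterol values in mg/dL, taken from the example above
print(round(percent_change(200, 250)))  # 25
print(round(percent_change(150, 200)))  # 33
```

The same 50 mg/dL difference gives a different percentage depending on the starting value, which is only meaningful because the scale has a true zero.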
What are the measures of central tendency?
Mean, median, mode
Mean
The mean is one of the most commonly used statistics in nutrition research.
The mean (the average) is determined by summing the values of all observations in a sample and dividing by the total number of observations.
(5 + 5 + 8) / 3 = 6
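A short Python sketch of the same calculation, using the three values from this card. The variable names are illustrative; `statistics.mean` is from the Python standard library.

```python
from statistics import mean

values = [5, 5, 8]

# Sum all observations, then divide by the number of observations
manual_mean = sum(values) / len(values)
print(manual_mean)  # 6.0

# The standard library gives the same answer
assert mean(values) == manual_mean
```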
Median
The median is the middle value when all data are placed in ascending or descending order. This means that the same number of values are greater than the median as are less than the median.
The median, unlike the mean, is not affected by extremely large or small values.
When there is an even number of observations, we average the two middle values to get the median.
2, 4, 6, 9, 10: median = 6
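The median cards can be illustrated with a small Python sketch. The first list is the one from this card; the even-length list and the list with an extreme value are assumed examples added for illustration.

```python
from statistics import mean, median

odd_count = [2, 4, 6, 9, 10]        # values from the card
even_count = [2, 4, 6, 9]           # assumed example: even number of observations
with_outlier = [2, 4, 6, 9, 1000]   # assumed example: one extremely large value

print(median(odd_count))     # 6
print(median(even_count))    # 5.0, the average of the two middle values 4 and 6
print(median(with_outlier))  # 6, unchanged by the extreme value
print(mean(with_outlier))    # 204.2, pulled far upward by the extreme value
```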
Mode
The mode is the value that occurs most often in a set of observations.
Similar to the median, the mode is not affected by extremely large or small values.
Sometimes there are two (or more) modes. When there are two modes, the data are said to be bimodal.
2, 3, 6, 6, 8, 9, 10: mode = 6
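A minimal Python sketch of finding the mode. The first list comes from this card; the bimodal list is an assumed example. `statistics.multimode` (Python 3.8 or later) returns every value tied for most frequent.

```python
from statistics import multimode

one_mode = [2, 3, 6, 6, 8, 9, 10]   # values from the card
two_modes = [2, 3, 3, 6, 6, 8]      # assumed example of bimodal data

print(multimode(one_mode))   # [6]
print(multimode(two_modes))  # [3, 6]
```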
Percentile
A percentile provides information about how data are spread over an interval from the smallest value to the largest value. It indicates what percentage of a sample was measured below or above a given value.
Admission test scores for colleges and universities are frequently reported in terms of percentiles (e.g., you will all score at or above the 95th percentile on the RD exam). BMI and growth charts for children are also reported in terms of percentiles.
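One simple way to see what a percentile means is to count the share of a sample that falls below a given value. The sketch below uses made-up test scores and a hypothetical helper named `percent_below`; published percentile tables may use slightly different conventions (for example, "at or below").

```python
def percent_below(values, x):
    """Percentage of observations strictly below x (hypothetical helper)."""
    count = sum(1 for v in values if v < x)
    return 100 * count / len(values)

# Made-up exam scores for illustration only
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

print(percent_below(scores, 90))  # 70.0, so a score of 90 sits at the 70th percentile here
```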
Quartiles
Values that split the data set into 4 equally sized parts
First (lower) quartile = 0 to 25th percentile
Second (lower-middle) quartile = 26th to 50th percentile
Third (upper-middle) quartile = 51st to 75th percentile
Fourth (upper) quartile = 76th to 100th percentile
Splitting a data set into quartiles for meaningful results requires a larger data set. Quartiles are often used to compare the first quartile with the other three, or the fourth quartile with the other three.
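A sketch of finding the quartile cut points in Python, using made-up data. `statistics.quantiles` with `n=4` returns the three values (25th, 50th, and 75th percentiles) that separate the four quartile groups. It uses the library's default "exclusive" method; other methods and other software may give slightly different cut points.

```python
from statistics import quantiles

# Made-up data: the whole numbers 1 through 11
data = list(range(1, 12))

# Three cut points separate the four equally sized parts
q1, q2, q3 = quantiles(data, n=4)
print(q1, q2, q3)  # 3.0 6.0 9.0
```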
Measures of Variability (6)
Range, interquartile range, variance, standard deviation, standard error, coefficient of variation
Range
The range of a data set is the difference between the largest and smallest data values (i.e. its span). Thus, by subtracting the lowest value in a set of observations from the highest value, you derive the range.
Range is the simplest measure of variability.
Range is very sensitive to the smallest and largest data values.
Age range is a very common statistic in human research. You might find something like this in the literature:
Subjects' ages ranged from 50 to 78 years. This gives us an idea of the sample characteristics: middle-aged and older adults. However, just stating that the age range was 28 years isn't very helpful.
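The range calculation is a single subtraction. The ages below are made up so that the smallest and largest values match the 50 to 78 example; the values in between are assumptions.

```python
# Made-up subject ages in years; only the minimum (50) and maximum (78) come from the example
ages = [50, 55, 62, 70, 78]

age_range = max(ages) - min(ages)
print(min(ages), max(ages))  # 50 78
print(age_range)             # 28
```

Reporting the minimum and maximum (50 to 78) tells the reader more than the single number 28.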
Interquartile Range
The interquartile range (IQR) of a data set is the difference between the third quartile (75th percentile) and the first quartile (25th percentile).
The IQR is the range for the middle 50% of the data
The IQR overcomes the sensitivity to extreme data values that affects the range of a complete data set.
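A sketch comparing the range and the IQR on made-up data, with and without one extreme value. The helper name `iqr` is hypothetical, and the cut points come from `statistics.quantiles` with its default method, so other software may report slightly different quartiles.

```python
from statistics import quantiles

def iqr(values):
    """75th percentile minus 25th percentile (hypothetical helper)."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1

typical = list(range(1, 12))             # made-up data: 1 through 11
with_outlier = list(range(1, 11)) + [100]  # same data with the largest value replaced by 100

print(max(typical) - min(typical), iqr(typical))                 # 10 6.0
print(max(with_outlier) - min(with_outlier), iqr(with_outlier))  # 99 6.0
```

The range jumps from 10 to 99 because of one extreme value, while the IQR stays the same.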
Variance
Variance is the measure of variability that utilizes all of the data, determining the dispersion around the expected value.
Variance is based on the difference between the value of each observation (xi) and the mean (x̄ for a sample, μ for a population).
How to calculate Variance?
Subtract the mean from each value and square each difference.
Sum these squared differences.
Divide by the total number of observations minus 1 (n - 1).
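The three steps above can be sketched in Python. The values 5, 5, 8 are reused from the mean card as an assumed example, and the variable names are illustrative. `statistics.variance` computes the sample variance (dividing by n - 1) and is used here only as a check.

```python
from statistics import variance

values = [5, 5, 8]
sample_mean = sum(values) / len(values)  # 6.0

# Step 1: subtract the mean from each value and square the difference
squared_diffs = [(x - sample_mean) ** 2 for x in values]  # [1.0, 1.0, 4.0]

# Step 2: sum the squared differences
total = sum(squared_diffs)  # 6.0

# Step 3: divide by the number of observations minus 1
sample_variance = total / (len(values) - 1)
print(sample_variance)  # 3.0

# The standard library agrees
assert variance(values) == sample_variance
```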
Standard Deviation
The SD of a data set is the positive square root of the variance.
SD is measured in the same units as the data, making it easier to interpret than the variance.
If the data set is a sample, the SD is denoted as s.
If the data set is a population, the SD is denoted as σ (sigma).
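A short Python sketch of the sample and population SD, again using the assumed example values 5, 5, 8. In the standard library, `statistics.stdev` treats the data as a sample (divides by n - 1) and `statistics.pstdev` treats it as a whole population (divides by n).

```python
import math
from statistics import pstdev, stdev, variance

values = [5, 5, 8]

s = stdev(values)       # sample SD, denoted s
sigma = pstdev(values)  # population SD, denoted sigma

print(round(s, 3))      # 1.732
print(round(sigma, 3))  # 1.414

# The SD is the positive square root of the variance
assert math.isclose(s, math.sqrt(variance(values)))
```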
Standard Error (SE)
SE describes the estimated standard deviation of a sampling distribution. It is the value most often presented in research articles and is often referred to as the Standard Error of the Mean (SEM).
SEM is calculated as the square root of (the variance divided by the sample size), which is the same as the SD divided by the square root of the sample size.
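A sketch of the SEM calculation in Python, using the assumed example values 5, 5, 8 and the sample variance and SD from the standard library. The two variable names are illustrative; both expressions should give the same result up to rounding.

```python
import math
from statistics import stdev, variance

values = [5, 5, 8]
n = len(values)

# Square root of (sample variance divided by sample size)
sem_from_variance = math.sqrt(variance(values) / n)

# Equivalent form: sample SD divided by the square root of the sample size
sem_from_sd = stdev(values) / math.sqrt(n)

print(sem_from_variance)  # 1.0
assert math.isclose(sem_from_variance, sem_from_sd)
```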