Research Flashcards
Name two measures of central tendency.
Mean and median.
Write the equation for the mean.
$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$, where $N$ is the number of values.
Write the equation for the median.
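With the values sorted so that $x_{(1)} \le \dots \le x_{(N)}$: $\tilde{x} = x_{((N+1)/2)}$ if $N$ is odd, and $\tilde{x} = \frac{1}{2}\left(x_{(N/2)} + x_{(N/2+1)}\right)$ if $N$ is even.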
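The two measures of central tendency can be sketched in a few lines of Python. This is a minimal illustration, not part of the original cards; the function names and the sample data are made up for the example.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by their count."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted data.

    With an even number of values, this returns the average of the
    two middle values.
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Made-up sample with one large outlier.
sample = [2, 4, 4, 5, 7, 9, 40]
print("mean:", mean(sample))
print("median:", median(sample))
```

In this sample the single large value pulls the mean well above the median, which is one reason both measures are worth reporting.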
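Explain the components of the box plot.
The box spans the first quartile to the third quartile (the interquartile range), with a line inside the box at the median. Whiskers extend from the box toward the smallest and largest values; a common convention stops them at the most extreme points within 1.5 times the interquartile range of the box and plots anything beyond that individually as outliers.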
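The numbers behind a box plot can be computed directly. The sketch below assumes the common 1.5 x IQR whisker convention and uses the default quartile method of Python's `statistics.quantiles`; other plotting tools may compute quartiles slightly differently. The function name and data are hypothetical.

```python
import statistics


def box_plot_summary(values):
    """Return the quantities a box plot draws, using 1.5 * IQR fences."""
    ordered = sorted(values)
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
    q1, q2, q3 = statistics.quantiles(ordered, n=4)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme data points inside the fences.
    inside = [v for v in ordered if low_fence <= v <= high_fence]
    outliers = [v for v in ordered if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


# Made-up data with one value far from the rest.
data = [1, 2, 3, 4, 5, 6, 30]
summary = box_plot_summary(data)
for name, value in summary.items():
    print(name, value)
```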
Write the equation for variance.
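$s^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^2$ (the sample estimate divides by $N - 1$ instead of $N$).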
What is variance?
It is the average squared distance from the mean.
Write the equation for standard deviation.
$s = \sqrt{s^2} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^2}$
What is standard deviation?
It tells us how spread out the numbers are: whether they are tightly clustered around the mean or lie far from it.
Why is standard deviation a more desirable measure of spread than variance?
Standard deviation is expressed in the same units as the data rather than in squared units, which makes it easier to interpret.
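The relationship between variance and standard deviation can be sketched as follows. This assumes the "average squared distance" definition from the card (dividing by N, not N - 1); the function names and data are made up for illustration.

```python
import math


def variance(values):
    """Average squared distance from the mean (divides by N)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


def std_dev(values):
    """Square root of the variance, in the same units as the data."""
    return math.sqrt(variance(values))


# Made-up measurements, say in centimetres.
lengths_cm = [2, 4, 4, 4, 5, 5, 7, 9]
print("variance (cm^2):", variance(lengths_cm))
print("standard deviation (cm):", std_dev(lengths_cm))
```

The variance comes out in squared centimetres, while the standard deviation is back in centimetres and can be compared directly with the data.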
In what category does covariance fall?
Measures of Association
What is covariance used for?
Describes how one sample varies with respect to another.
Write the equation for covariance.
$\mathrm{cov}(x, y) = \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})$
What type of plot would be useful to visually estimate the amount of linear correlation of a dataset?
scatterplot
What is correlation used for?
Describes the strength and direction of the linear relationship between two samples, on a unitless scale from -1 to 1.
Write the equation for correlation.
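$r = \frac{\mathrm{cov}(x, y)}{s_x \, s_y}$, where $s_x$ and $s_y$ are the standard deviations of the two datasets.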
Why is correlation typically better than covariance?
The units of covariance (the product of the units of the two datasets) make its value difficult to interpret. Correlation is unitless and bounded between -1 and 1, so it is generally easier to interpret than covariance.
How do you interpret the results of correlation?
Values close to 1 indicate a strong positive correlation, while values close to -1 indicate a strong negative correlation. Values close to 0 show little to no linear correlation.
Qualitatively describe the correlation equation.
The covariance of the two datasets divided by the product of their standard deviations.
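A short sketch of how correlation is built from covariance and the two standard deviations. It assumes the divide-by-N convention throughout (the N cancels in the correlation either way); the function names and data are hypothetical.

```python
import math


def covariance(xs, ys):
    """Average product of the paired deviations from each mean."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)


def correlation(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    std_x = math.sqrt(covariance(xs, xs))  # cov(x, x) is the variance of x
    std_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (std_x * std_y)


# Made-up paired data where y is exactly twice x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
print("covariance:", covariance(xs, ys))
print("correlation:", correlation(xs, ys))
```

Rescaling `ys` (for example, changing its units) changes the covariance but leaves the correlation unchanged, which is why correlation is easier to compare across datasets.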
What can a histogram be used for?
To understand how the data is distributed, for example whether it is normal or skewed.
Describe a histogram. How would you know if the data set followed a normal distribution?
A histogram is a frequency diagram that graphs the number of times a value (or range of values) has occurred. If the data is normally distributed, the frequency peaks at the mean and decreases symmetrically with distance from the mean in either direction. It follows the classic “bell curve” pattern.
Name the steps in determining the probability of a value occurring in normally distributed data.
- Convert the data set into the standard normal distribution (mean of 0 and standard deviation of 1).
- This is done by transforming the value in question into standardized anomaly form. This gives us a z value, which represents the number of standard deviations the value lies away from the mean.
- Now we use a look-up table. The look-up table gives the area under the standard normal curve that lies above or below the z value. That area is the probability.
Write the equation to convert a value into anomaly form, i.e., to get the z-score.
$z = \frac{x - \bar{x}}{s}$, where $\bar{x}$ is the mean and $s$ is the standard deviation.
In what situations is the z score appropriate?
Normally distributed data.
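The z-score steps can be sketched in Python. In place of a printed look-up table, this example uses the error function from the standard library, which gives the same area under the standard normal curve. The function names and the example numbers are made up for illustration.

```python
import math


def z_score(value, mean, std_dev):
    """Number of standard deviations the value lies from the mean."""
    return (value - mean) / std_dev


def standard_normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probability_below(value, mean, std_dev):
    """Probability of observing something less than value."""
    return standard_normal_cdf(z_score(value, mean, std_dev))


def probability_above(value, mean, std_dev):
    """Probability of observing something greater than value."""
    return 1.0 - probability_below(value, mean, std_dev)


# Hypothetical example: scores with mean 100 and standard deviation 15.
z = z_score(130, 100, 15)
print("z:", z)
print("P(score > 130):", probability_above(130, 100, 15))
```

This only makes sense when the underlying data is approximately normal; for other distributions the area under the standard normal curve does not correspond to the probability.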
Describe the steps in finding a probability for a gamma distribution.
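- Get the shape and scale parameters of the distribution.
- Divide the value in question by the scale parameter to convert it into a standard gamma variable (scale parameter of 1).
- Now use a look-up table of the standard gamma distribution (the incomplete gamma function) for that shape parameter to get the cumulative probability.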
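The gamma steps above can be sketched the same way. Instead of a look-up table, this example evaluates the regularized lower incomplete gamma function with a simple series expansion, which is one of several ways to compute it. The function names, tolerance, and example parameters are assumptions made for illustration.

```python
import math


def standard_gamma_cdf(shape, t, tol=1e-12, max_terms=10_000):
    """Cumulative probability of a standard gamma variable (scale = 1).

    Uses the series
        P(a, t) = exp(-t) * t**a / Gamma(a) * sum_n t**n / (a (a+1) ... (a+n)),
    which converges for all t > 0 but slowly when t is large.
    """
    if t <= 0:
        return 0.0
    term = 1.0 / shape  # n = 0 term of the series
    total = term
    n = 0
    while abs(term) > tol * abs(total) and n < max_terms:
        n += 1
        term *= t / (shape + n)
        total += term
    # Work in logs so the prefactor does not overflow for large shapes.
    log_prefactor = shape * math.log(t) - t - math.lgamma(shape)
    return math.exp(log_prefactor) * total


def gamma_probability_below(value, shape, scale):
    """Rescale to a standard gamma variable, then evaluate the CDF."""
    return standard_gamma_cdf(shape, value / scale)


# Hypothetical parameters: shape 2, scale 3, value in question 3.
print(gamma_probability_below(3.0, shape=2.0, scale=3.0))
```

With a shape parameter of 1 the gamma distribution reduces to the exponential distribution, which gives a quick sanity check: the result should equal 1 - exp(-value / scale).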