Research Flashcards by Robert Rickey

Name two measures of central tendency.

Mean and median.

Write the equation for the mean.
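
A standard form, assuming a sample of n values x_1, ..., x_n (the notation is an assumption, not taken from the deck):

```latex
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
```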

Write the equation for the median.
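
One common way to write it, assuming the n values are first sorted so that x_(1) ≤ x_(2) ≤ ... ≤ x_(n) (the order-statistic notation is an assumption):

```latex
\tilde{x} =
\begin{cases}
x_{\left(\frac{n+1}{2}\right)} & \text{if } n \text{ is odd} \\[4pt]
\dfrac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2} & \text{if } n \text{ is even}
\end{cases}
```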

Explain the components of the box plot.
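
A box plot typically draws the median, the first and third quartiles (the edges of the box), whiskers, and any outliers. The sketch below computes those values. It assumes the common 1.5 × IQR rule for whiskers and outliers, which is a convention and may differ from the one used in the course, and `box_plot_summary` is a hypothetical helper name.

```python
import statistics


def box_plot_summary(data):
    """Return the values a box plot usually displays (hypothetical helper)."""
    # Quartiles: Q1, median (Q2), and Q3 form the box and its centre line.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range, the height of the box

    # Assumed convention: points beyond 1.5 * IQR from the box are outliers.
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]

    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        # Whiskers reach the most extreme points that are not outliers.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(x for x in data if x < low_fence or x > high_fence),
    }
```

For example, `box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])` flags 100 as the only outlier.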

Write the equation for variance.
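
A standard sample form, assuming n values x_i with mean \bar{x} (the population form divides by n instead of n - 1; which one the course intends is an assumption):

```latex
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2
```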

What is variance?

It is the average squared distance from the mean.

Write the equation for standard deviation.
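
A standard sample form, assuming the same notation as for the variance (n values x_i with mean \bar{x}); it is the square root of the variance:

```latex
s = \sqrt{ \frac{1}{n-1} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2 }
```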

What is standard deviation?

It generally tells us how spread out the numbers are: are they tightly clustered around the mean, or are they far from it?

Why is standard deviation a more desirable measure of spread than variance?

Standard deviation is often a more desirable measure of spread than variance because it is expressed in the original, non-squared units of the data, which are easier to interpret. It tells us how spread out the numbers are: tightly clustered around the mean or far from it.

In what category does covariance fall?

Measures of Association

What is covariance used for?

Describes how one sample varies with respect to another.

Write the equation for covariance.
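
A standard sample form, assuming n paired values (x_i, y_i) with means \bar{x} and \bar{y} (the notation is an assumption):

```latex
\operatorname{cov}(x, y) = \frac{1}{n-1} \sum_{i=1}^{n} \left( x_i - \bar{x} \right) \left( y_i - \bar{y} \right)
```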

What type of plot would be useful to visually estimate the amount of linear correlation of a dataset?

Scatterplot.

What is correlation used for?

Describes how strongly, and in which direction, one sample varies linearly with another, on a unitless scale from -1 to 1.

Write the equation for correlation.
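
A standard form (the Pearson correlation coefficient), assuming s_x and s_y denote the sample standard deviations of the two datasets:

```latex
r = \frac{\operatorname{cov}(x, y)}{s_x \, s_y}
```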

Why is correlation typically better than covariance?

The units of covariance make its value difficult to interpret. Correlation is unitless, so it is generally easier to interpret than covariance.

How do you interpret the results of correlation?

Values close to 1 are highly positively correlated, while values close to -1 are highly negatively correlated. Values close to 0 show little to no linear correlation.

Qualitatively describe the correlation equation.

The covariance of the two datasets divided by the product of their standard deviations.
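
That description can be sketched in a few lines of Python. This is a minimal illustration assuming sample (n - 1) statistics; the function names are hypothetical.

```python
import math


def covariance(x, y):
    """Sample covariance of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def correlation(x, y):
    """Covariance divided by the product of the two standard deviations."""
    std_x = math.sqrt(covariance(x, x))  # covariance of x with itself is its variance
    std_y = math.sqrt(covariance(y, y))
    return covariance(x, y) / (std_x * std_y)
```

A perfectly linear increasing pair such as `[1, 2, 3, 4]` and `[2, 4, 6, 8]` gives a correlation of about 1, and reversing one of them gives about -1.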

What can a histogram be used for?

To understand how the data is distributed, e.g. normal, skewed…

Describe a histogram. How would you know if the data set followed a normal distribution?

A histogram is a frequency diagram which graphs the number of times a value (or range of values) has occurred. If the data is normally distributed then there is an increased frequency of events at the mean with decreasing frequencies with distance from the mean in either direction. It follows the classic “bell curve” pattern.

Name the steps in determining the probability of something occurring.

Convert the data set into the standard normal distribution (mean of 0 and standard deviation of 1).
This is done by transforming the value in question into anomaly form. This gives a z value, which is the number of standard deviations the value lies from the mean.
Now use a look-up table. The table gives the area under the normal curve that lies above or below the value in question. That area is the probability.
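
The steps above can be sketched in Python. As an assumption for illustration, the standard library's `statistics.NormalDist` stands in for the printed look-up table; the function names are hypothetical.

```python
from statistics import NormalDist


def z_score(value, mean, std_dev):
    """Anomaly form: number of standard deviations the value lies from the mean."""
    return (value - mean) / std_dev


def probability_below(value, mean, std_dev):
    """Area under the standard normal curve to the left of the value's z-score."""
    z = z_score(value, mean, std_dev)
    # NormalDist() with no arguments is the standard normal (mean 0, std dev 1).
    return NormalDist().cdf(z)


def probability_above(value, mean, std_dev):
    """Area to the right of the value: the complement of the area below."""
    return 1.0 - probability_below(value, mean, std_dev)
```

For data with mean 10 and standard deviation 2, `probability_below(12, 10, 2)` is the area to the left of z = 1, roughly 0.84.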

Write the equation to convert a value into anomaly form, i.e. to get the z-score.
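
A standard form, assuming x is the value in question and μ and σ are the mean and standard deviation of the data (the symbols are an assumption):

```latex
z = \frac{x - \mu}{\sigma}
```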

In what situations is the z score appropriate?

Normally distributed data.

Describe the steps in finding a probability for a gamma distribution.

Get the shape and scale parameters.
Convert the distribution into a gamma variable with a scale parameter of 1.
Now use a look-up table.

What two new variables are needed to calculate a probability if the data is a gamma distribution?

Shape and scale parameters.

Write the equation for the shape parameter.
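
Following the deck's own qualitative description (sample mean squared divided by sample standard deviation squared), and assuming α for the shape, \bar{x} for the sample mean, and s for the sample standard deviation:

```latex
\alpha = \frac{\bar{x}^{2}}{s^{2}}
```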

Write the equation for the scale parameter.
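
Following the deck's own qualitative description (sample standard deviation squared divided by the mean), and assuming β for the scale, \bar{x} for the sample mean, and s for the sample standard deviation:

```latex
\beta = \frac{s^{2}}{\bar{x}}
```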

Write the equation for the gamma variable
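
Following the deck's own qualitative description (the value in question divided by the scale parameter), and assuming ξ for the gamma variable, x for the value, and β for the scale:

```latex
\xi = \frac{x}{\beta}
```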

Qualitatively describe how to calc. the scale parameter.

The sample standard deviation squared divided by the sample mean.

Qualitatively describe how to calc. the shape parameter.

The sample mean squared divided by the sample standard deviation squared.

Qualitatively describe how to calc. the gamma variable.

The gamma variable is the value in question divided by the scale parameter.
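
The three descriptions above can be sketched together in Python. This is a minimal illustration with hypothetical function names. It stops at the standardized gamma variable, since the probability itself would still come from a look-up table or a statistics library.

```python
import statistics


def gamma_parameters(data):
    """Estimate shape and scale from the sample mean and sample variance."""
    mean = statistics.mean(data)
    variance = statistics.variance(data)  # sample standard deviation squared
    shape = mean ** 2 / variance  # mean squared over standard deviation squared
    scale = variance / mean       # standard deviation squared over the mean
    return shape, scale


def gamma_variable(value, scale):
    """Rescale the value so it follows a gamma distribution with scale 1."""
    return value / scale
```

A quick consistency check is that shape times scale returns the sample mean.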

What is the appropriate test statistic if we are testing a mean over a range?

The t test.

List the general steps used to do hypothesis testing.

1. Identify the test statistic. 2. Define the null hypothesis. 3. Define the alternative hypothesis. 4. Determine the null distribution. This is the sampling distribution of the test statistic if the null hypothesis is true. 5. Compare the test statistic to the null distribution to determine whether we should reject the null hypothesis.

What is the null hypothesis?

The null hypothesis is what we are testing. It is what could be rejected.

What is the equation to calculate the t test statistic?
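
A standard one-sample form, assuming \bar{x} is the sample mean, \mu_0 the value under the null hypothesis, s the sample standard deviation, and n the sample size:

```latex
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}
```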

Using the tornado data from the last lecture, let's determine if the average number of tornadoes in New York is more than 3 per year (α = 0.05). Follow the steps for hypothesis testing.

Choose a test statistic: here we are looking at an average over a range, so we should use a t test. Our rejection level is 0.05, so we are testing at the 95% confidence level.
Choose a null hypothesis: here, our null hypothesis should be (Ho: μ ≤ 3).
Choose an alternative hypothesis: (Ha: μ > 3).
Determine the null distribution: here, we need to set μo = 3, since we're testing that value.
Compute the test statistic and compare its position on the null distribution. This is done by using the t-statistic and the degrees of freedom to get a p-value.
Finally, compare the p-value to the alpha value to decide whether or not to reject the null hypothesis. In this case, if the p-value is less than 0.05, we reject the null hypothesis; if it is greater than 0.05, we cannot reject it.
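
The computation can be sketched in Python. Two assumptions apply. The function names are hypothetical, and the sketch compares the t-statistic against a critical value read from a t table rather than computing a p-value, which is an equivalent decision rule for an upper-tail test. The lecture's tornado data is not reproduced here, so any sample passed in is the reader's own.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return the t-statistic and degrees of freedom for a one-sample t test."""
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1)
    t = (mean - mu0) / (s / math.sqrt(n))
    return t, n - 1


def reject_null(t, critical_value):
    """Upper-tail decision: reject the null when t exceeds the table value."""
    return t > critical_value
```

For a sample of 10 yearly counts, `one_sample_t(counts, 3)` returns the statistic with 9 degrees of freedom, and the one-sided critical value for α = 0.05 from a t table is about 1.833.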

Write the equation for degrees of freedom.

df = n - 1

When do you use the t-statistic?

This is used when the number of samples is less than 30.

When do you use the z-statistic?

This is used when the number of samples is greater than 30.

Write the equation to get the slope when creating a linear regression.
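
A standard least-squares form, assuming n paired values (x_i, y_i) with means \bar{x} and \bar{y}, and using B1 for the slope as in the deck's regression equation:

```latex
B_1 = \frac{\sum_{i=1}^{n} \left( x_i - \bar{x} \right) \left( y_i - \bar{y} \right)}{\sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2}
```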

Write the equation to get the intercept for a linear regression.
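
A standard least-squares form, assuming \bar{x} and \bar{y} are the sample means and B1 is the fitted slope:

```latex
B_0 = \bar{y} - B_1 \bar{x}
```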

What form will the linear regression take?

y = B1x+B0

List the values that are helpful in identifying the usefulness of a linear regression.

Examine the Residuals, SSR (Sum of Squares of the Regression), SSE (Sum of Squares of the Residuals), F stat, R^2

What is a residual?

Difference between the actual and predicted values.

What is SSR and do we want high or low values?

Sum of Squares of the Regression - High Values

What is SSE and do we want high or low values?

Sum of Squares of the Residuals - Low Values

How are SSR and SSE related?

SST = SSR + SSE

How do we get the F value?

F = MSR/MSE: the mean regression sum of squares divided by the mean residual sum of squares.

What is R^2?

R^2 tells us how much of the variability in the predictand the predictor explains. A value of 1 would mean that the predictor explains all of the variability in the predictand. A value of 0 would mean that the predictor explains none of the variability in the predictand.
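
The regression quantities from the last several cards can be sketched together in Python. This is a minimal illustration for a single predictor, with a hypothetical function name. It assumes the usual degrees of freedom: 1 for the regression and n - 2 for the residuals.

```python
def linear_regression_summary(x, y):
    """Fit y = B1*x + B0 by least squares and report the fit diagnostics."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Slope and intercept.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x

    predicted = [b0 + b1 * a for a in x]
    ssr = sum((p - mean_y) ** 2 for p in predicted)        # regression
    sse = sum((b - p) ** 2 for b, p in zip(y, predicted))  # residuals
    sst = sum((b - mean_y) ** 2 for b in y)                # total = SSR + SSE

    msr = ssr / 1        # one predictor
    mse = sse / (n - 2)  # assumed residual degrees of freedom
    f_stat = msr / mse if mse > 0 else float("inf")

    return {"b1": b1, "b0": b0, "ssr": ssr, "sse": sse, "sst": sst,
            "r_squared": ssr / sst, "f": f_stat}
```

With `x = [1, 2, 3, 4, 5]` and `y = [2, 4, 5, 4, 5]`, the fit is roughly y = 0.6x + 2.2 with R^2 of about 0.6.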

What does the equation for a multivariate linear regression look like?

y = B0 + B1*X1 + B2*X2 + B3*X3... | Where B0 is the intercept and B1, B2, B3... are the slopes for each variable X1, X2, X3...