Me, Myself and I Flashcards

Question

What are degrees of freedom, and why is it impossible for them to be 0?

Answer 1

Degrees of freedom refers to the maximum number of logically independent values, that is, values that are free to vary, in a data sample. If there were 0 independent values, there wouldn't be anything to test!

Answer 2

The *p-value* is the probability of obtaining test results at least as extreme as the results actually observed, under the assumption that the null hypothesis is correct.

Answer 3

You'd add purple colour in R by passing this argument to a plotting function: col = "purple"

Answer 4

You'd perform a chi squared test as follows: chisq.test(table(x$sex))
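
As a rough sketch of what that call does, the example below builds a made-up data frame (the name x and its sex column are stand-ins for the data on the card, and the counts are invented) and runs the test on the table of counts. With a one-way table, chisq.test() compares the counts against equal expected proportions.

```r
# Hypothetical data standing in for the x used on the card
x <- data.frame(sex = c(rep("Male", 30), rep("Female", 20)))

counts <- table(x$sex)        # number of rows in each category
result <- chisq.test(counts)  # goodness-of-fit test against equal proportions

result$statistic  # chi-squared statistic
result$parameter  # degrees of freedom: number of categories minus 1
result$p.value    # p-value for the test
```

With two categories there is 1 degree of freedom, which ties back to the first card: once one count is known, the other is fixed by the total.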

Answer 5

If p is very small, we are more inclined to believe that our data diverges from the null hypothesis. We are also more inclined to believe that an alternative hypothesis explains our data.

Answer 6

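hist(x$var[x$sex == "Male"]) To add women to the same graph, code: hist(x$var[x$sex == "Female"], add = TRUE). Here var is a stand-in for whichever numeric column is being plotted, because hist() needs numeric values rather than the TRUE/FALSE result of x$sex == "Male".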

Answer 7

This function creates colors corresponding to the given intensities (between 0 and max) of the red, green and blue primaries. The colour specification refers to the standard sRGB colorspace (IEC standard 61966). An alpha transparency value can also be specified (as an opacity, so 0 means fully transparent and max means opaque). If alpha is not specified, an opaque colour is generated. The names argument may be used to provide names for the colors. The values returned by these functions can be used with a col= specification in graphics functions or in par().
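
The description above matches R's rgb() function. A small sketch, assuming a 0 to 255 scale set through maxColorValue (the variable names are made up for illustration):

```r
# Opaque purple: red = 128, green = 0, blue = 128 on a 0-255 scale
purple_opaque <- rgb(128, 0, 128, maxColorValue = 255)

# Same colour at roughly half opacity (alpha = 128 out of 255)
purple_half <- rgb(128, 0, 128, alpha = 128, maxColorValue = 255)

purple_opaque  # hex string of the form "#RRGGBB"
purple_half    # hex string of the form "#RRGGBBAA"
```

Either value can then be passed as col = in a plotting call. The semi-transparent version is handy when two histograms overlap.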

Answer 8

In statistical hypothesis testing, a type I error is the rejection of a true null hypothesis, while a type II error is the non-rejection of a false null hypothesis.

Answer 9

In a scientific study, post hoc analysis consists of statistical analyses that were specified after the data were seen.

Answer 10

The t-test determines whether the mean of one group is statistically different from the mean of another group.

Answer 11

The t-test function in R is: t.test()

Answer 12

The version of the t-test that R uses by default is the Welch t-test (t.test() has var.equal = FALSE unless you change it).

Answer 13

The t-value measures the size of the difference relative to the variation in your sample data. Put another way, T is simply the calculated difference represented in units of standard error. The greater the magnitude of T, the greater the evidence against the null hypothesis.
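
To make "difference in units of standard error" concrete, here is a minimal sketch with two invented groups. It computes the t-value by hand and compares it with what t.test() reports. The data and variable names are assumptions for illustration only.

```r
# Two made-up samples
group_a <- c(5.1, 4.9, 5.6, 5.8, 6.0, 5.4)
group_b <- c(4.2, 4.8, 4.5, 5.0, 4.4, 4.7)

# Standard error of the difference in means (Welch form: variances not pooled)
se <- sqrt(var(group_a) / length(group_a) + var(group_b) / length(group_b))

# t-value: difference in means expressed in units of standard error
t_by_hand <- (mean(group_a) - mean(group_b)) / se

welch   <- t.test(group_a, group_b)                    # default: Welch t-test
student <- t.test(group_a, group_b, var.equal = TRUE)  # Student's t-test

t_by_hand
welch$statistic  # should match t_by_hand
welch$method     # names the test that was run
```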

Answer 14

You would plot two continuous variables on a line graph.

Answer 15

The function for a scatter plot is: plot()

Answer 16

The response variable is the focus of a question in a study or experiment. An explanatory variable is one that explains changes in that variable. It can be anything that might affect the response variable.

Answer 17

Residuals are the differences between the observed values and those predicted by the regression line.
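
A short sketch of that definition, using invented data: the residuals returned by R are the observed values minus the values predicted by the fitted line.

```r
# Made-up data for illustration
explanatory <- c(1, 2, 3, 4, 5)
response    <- c(2.1, 3.9, 6.2, 7.8, 10.1)

fit <- lm(response ~ explanatory)

predicted       <- unname(fitted(fit))     # values on the regression line
resid_by_hand   <- response - predicted    # observed minus predicted
resid_from_r    <- unname(residuals(fit))  # what R reports

resid_by_hand
resid_from_r
```

For a least-squares line with an intercept, the residuals sum to zero (up to rounding).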

Answer 18

To determine the line of best fit in R: lm(response ~ explanatory)

Answer 19

To draw the line of best fit on an existing plot: abline(lm(response ~ explanatory)). The fit can be summarised by nesting the lm() call inside the summary() function.
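
Put together, the plotting and fitting steps might look like the sketch below. The data are invented and the names response and explanatory are placeholders for your own columns.

```r
# Made-up data for illustration
explanatory <- c(1, 2, 3, 4, 5)
response    <- c(2.1, 3.9, 6.2, 7.8, 10.1)

fit <- lm(response ~ explanatory)  # fit the line of best fit

plot(explanatory, response)  # scatter plot of the raw data
abline(fit)                  # draw the fitted line on top of the plot

coef(fit)     # intercept and gradient of the line
summary(fit)  # coefficients, p-values, R-squared and more
```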

Answer 20

The formula for a straight line on a graph is: y = mx + c

Answer 21

In statistics, linear regression is a linear approach to modeling the relationship between a scalar response (or dependent variable) and one or more explanatory variables (or independent variables).

Answer 22

The y-intercept or vertical intercept is a point where the graph of a function or relation intersects the y-axis of the coordinate system. As such, these points satisfy x = 0.

Answer 23

For a straight-line graph, pick two points on the graph. The gradient of the line = (change in y-coordinate)/(change in x-coordinate). We can, of course, use this to find the equation of the line.
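
A tiny worked example, using two invented points (1, 3) and (4, 9) and the y = mx + c form from the earlier card:

```r
# Two made-up points on a straight line
x1 <- 1; y1 <- 3
x2 <- 4; y2 <- 9

m <- (y2 - y1) / (x2 - x1)  # gradient: change in y over change in x
c_intercept <- y1 - m * x1  # rearrange y = mx + c to get c

m            # gradient
c_intercept  # y-intercept
```

Here the line works out as y = 2x + 1.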

Answer 24

Assumptions. Student's t-test assumes that the two populations being compared are normally distributed with equal variance. Welch's t-test is designed for populations with unequal variances, but it keeps the assumption that both are normally distributed.

Answer 25

Variance measures how far a set of data is spread out. A variance of zero indicates that all of the data values are identical. A high variance indicates that the data points are very spread out from the mean, and from one another. Variance is the average of the squared distances from each point to the mean.

Answer 26

R-squared is a statistical measure of how close the data are to the fitted regression line. It is also known as the coefficient of determination, or the coefficient of multiple determination for multiple regression. ... 100% indicates that the model explains all the variability of the response data around its mean.

Answer 27

Adjusted R-squared, a modified version of R-squared, adds precision and reliability by considering the impact of additional independent variables that tend to skew the results of R-squared measurements.

Answer 28

_GLM Overview_ Data = Model + Error. The model usually takes the form Y = B + MX, where M is the gradient of the regression line, X is the explanatory variable, Y is the response variable, and B is essentially the y-intercept, i.e. the starting point of the fitted line.

Answer 29

Error doesn't mean the data are incorrect; it is the deviation of the data from the regression model. The data aren't wrong, the model is.

Answer 30

Means there's no association between the two variables.

Answer 31

A biphasic response is observed in a regression model when the response variable displays two different correlations with the explanatory variable.

Answer 32

R-squared (R²) is a statistical measure that represents the proportion of the variance for a dependent variable that's explained by an independent variable or variables in a regression model.
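
As a sketch of "proportion of variance explained", the example below computes R-squared by hand as 1 minus (residual sum of squares / total sum of squares) and compares it with the value from summary(). The adjusted version uses the usual penalty for the number of explanatory variables. The data are invented.

```r
# Made-up data for illustration
x <- c(1, 2, 3, 4, 5, 6)
y <- c(1.2, 1.9, 3.2, 3.8, 5.1, 6.3)

fit <- lm(y ~ x)

ss_res <- sum(residuals(fit)^2)  # variation left unexplained by the model
ss_tot <- sum((y - mean(y))^2)   # total variation around the mean
r2 <- 1 - ss_res / ss_tot        # proportion of variation explained

n <- length(y)  # number of observations
p <- 1          # number of explanatory variables
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

r2
adj_r2
summary(fit)$r.squared      # should match r2
summary(fit)$adj.r.squared  # should match adj_r2
```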

Answer 33

Ordinal Scale is defined as a variable measurement scale used to simply depict the order of variables and not the difference between each of the variables. These scales are generally used to depict non-mathematical ideas such as frequency, satisfaction, happiness, a degree of pain, etc. E.g. How was work today? 1 - Good 2 - Meh 3 - Shit

Answer 34

The variance is the average of the squared differences from the Mean.
* Work out the Mean (the simple average of the numbers)
* Then for each number: subtract the Mean and square the result (the squared difference).
* Then work out the average of those squared differences.
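
The steps above can be sketched in R with a made-up set of numbers. One thing to watch: R's built-in var() divides by n - 1 (the sample variance), so it will not match the plain average described on this card.

```r
# Made-up numbers for illustration
values <- c(2, 4, 4, 4, 5, 5, 7, 9)

the_mean     <- mean(values)            # step 1: work out the mean
squared_diff <- (values - the_mean)^2   # step 2: subtract the mean and square
pop_variance <- mean(squared_diff)      # step 3: average the squared differences

pop_variance  # divides by n
var(values)   # R's var() divides by n - 1 instead
```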

Answer 35

A Confidence Interval is a range of values we are fairly sure our true value lies in. The "95%" says that 95% of experiments like the one we just did will include the true mean, but 5% won't. So there is a 1-in-20 chance (5%) that our Confidence Interval does NOT include the true mean.
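
A minimal sketch of a 95% confidence interval for a mean, assuming a t-based interval and an invented sample. It is computed by hand and then read off from t.test().

```r
# Made-up sample for illustration
sample_values <- c(12.1, 11.8, 12.6, 13.0, 12.4, 11.9, 12.7, 12.2)

n <- length(sample_values)

# Margin of error: critical t-value times the standard error of the mean
margin <- qt(0.975, df = n - 1) * sd(sample_values) / sqrt(n)

ci_by_hand <- mean(sample_values) + c(-1, 1) * margin
ci_from_r  <- t.test(sample_values)$conf.int  # 95% is the default level

ci_by_hand
ci_from_r
```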

Answer 36

In statistics, point estimation involves the use of sample data to calculate a single value which is to serve as a "best guess" or "best estimate" of an unknown population parameter.

Answer 37

A meta-analysis is a statistical analysis that combines the results of multiple scientific studies.

Answer 38

Fisher's exact test is a statistical significance test used in the analysis of contingency tables. Although in practice it is employed when sample sizes are small, it is valid for all sample sizes.
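
As an illustration, fisher.test() in R takes a contingency table. The 2 x 2 counts below are invented, and the row and column labels are placeholders.

```r
# Made-up 2 x 2 contingency table of counts
counts <- matrix(c(3, 1,
                   1, 3),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(group   = c("A", "B"),
                                 outcome = c("yes", "no")))

result <- fisher.test(counts)  # two-sided test by default

result$p.value   # exact p-value
result$estimate  # estimated odds ratio
```

With counts this small, a chi-squared test would rely on an approximation that is not trustworthy, which is why the exact test tends to be used instead.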

Answer 39

The multivariate model is a popular statistical tool that uses multiple variables to forecast possible outcomes.

Answer 40

The components of the ideal experiment:
* Define your hypothesis
* Design an experiment to collect the data
* Design your statistical testing plan in advance
* Conduct your experiment
* Analyse the data
* Publish the results (positive or negative)
* [conduct any post hoc analyses]
* Formulate new hypotheses.

Answer 41

A longitudinal study is conducted with the same sample over the years, so the researcher has to revisit the participants at proper intervals. A cross-sectional study is conducted with different samples at a single point in time. Cross-sectional studies cannot pin down cause-and-effect relationships.

(66 cards)