Me, Myself and I Flashcards

1
Q

What can we find in the natural world?

A

Massive amounts of diversity within and between plants, animals, bacteria, fungi and so on. To understand this diversity we need to be able to measure it quantitatively.

2
Q

Give examples of variations in humans.

A

Examples of variations within humans include:

  • cognitive ability
  • personality
  • physical appearance (body shape, skin color, etc.)
  • immunology
  • height
3
Q

What can we do once we understand a set of data?

A

Once we understand a set of data, we can use it to make helpful and meaningful predictions. For example, we can analyse trends in global warming to predict how it will progress and affect the climate.

4
Q

Code for a histogram in R.

A

You can simply make a histogram by using the hist() function.

5
Q

Code for a histogram of a specific column of a dataset.

A

To make a histogram of only a specific column of a dataset, x, use the hist() function with the dataset name (x) followed by the $ sign and the column name, e.g. hist(x$age).

6
Q

How would you present data from a histogram on a finer scale?

A

One would pass the argument breaks = n to the hist() function, where n is the suggested number of bins. Like so:

hist(x, breaks = 50)
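
As a rough illustration of how breaks changes the bin count, the sketch below compares the default binning with breaks = 50. The data frame x and its age column are simulated for the example, and plot = FALSE is used only so the bins can be inspected without drawing.

```r
# Hypothetical data: 200 simulated ages (not the original course dataset)
set.seed(1)
x <- data.frame(age = round(rnorm(200, mean = 40, sd = 12)))

# Default binning versus a finer scale; plot = FALSE returns the bins without drawing
coarse <- hist(x$age, plot = FALSE)
fine <- hist(x$age, breaks = 50, plot = FALSE)

# Number of bins in each histogram
length(coarse$counts)
length(fine$counts)

# Drop plot = FALSE to draw the finer histogram:
# hist(x$age, breaks = 50)
```

When breaks is a single number, R treats it as a suggestion and picks "pretty" break points, so the number of bins drawn may not be exactly 50.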

7
Q

What is a sampling error?

A

Sampling error is the random variation introduced into a dataset as a function of only sampling a subset of the total population [or possible experiments].

8
Q

How would you obtain a numerical description of a datacolumn within a dataset? In this case the dataset will be named, “x” and the datacolumn in question will be named, “age”.

A

To obtain a numerical description of a datacolumn within a dataset one would use the function:

summary(x$age)

It will present the minimum value, the maximum value, the median value, the mean and the lower and upper quartiles.

9
Q

How would you find the mean, median, max, min values of the “age” data column within dataset “x”?

A

You’d get the mean, median, max, min values of the “age” data column within dataset “x” with the following functions:

  • mean(x$age)
  • median(x$age)
  • max(x$age)
  • min(x$age)
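
A minimal sketch of these functions next to summary(), assuming a tiny made-up dataset x with five ages:

```r
# Hypothetical dataset "x" with an "age" data column
x <- data.frame(age = c(21, 25, 30, 34, 40))

# Numerical description: min, lower quartile, median, mean, upper quartile, max
age_summary <- summary(x$age)
print(age_summary)

# The individual functions return the same values one at a time
mean(x$age)    # 30
median(x$age)  # 30
max(x$age)     # 40
min(x$age)     # 21
```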
10
Q

What is a data population?

A

A data population is all members of a defined group.

11
Q

What is a sample?

A

A sample is a subset of the members of a population, selected to represent the whole group.

12
Q

What is a sampling bias?

A

In statistics, sampling bias is a bias in which a sample is collected in such a way that some members of the intended population have a lower or higher sampling probability than others.

13
Q

How would you obtain a numerical description of 10 random values in the datacolumn of “age” from dataset “x”?

A

To obtain a numerical description of 10 random values in the datacolumn of “age” from dataset “x” you would use the function:

summary(sample(x$age, 10))
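
The sketch below draws two random samples of 10 ages from the same column to show sampling error in action: the two summaries will generally differ from each other and from the full column. The dataset x is simulated here and is only an assumption for the example.

```r
# Hypothetical population of 500 ages (simulated for illustration)
set.seed(10)
x <- data.frame(age = round(runif(500, min = 18, max = 80)))

# Two independent random samples of 10 values each (drawn without replacement)
first_sample <- sample(x$age, 10)
second_sample <- sample(x$age, 10)

# Each sample gives a slightly different numerical description
summary(first_sample)
summary(second_sample)

# Mean of the whole column, for comparison
mean(x$age)
```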

14
Q

Why are humans different heights?

A

Humans are different heights for the following reasons:

  • genetics (nature): the main factor that influences a person’s height is their genetic makeup. Scientists believe that genetic makeup, or DNA, is responsible for about 80% of a person’s height
  • environment (nurture): many other factors can influence height during development, including nutrition, hormones, activity levels, exercise and medical conditions
15
Q

What is a factor in R?

A

Conceptually, factors are variables in R which take on a limited number of different values; such variables are often referred to as categorical variables.

16
Q

Give an example of what would class as a factor in R.

A

An example of what would class as a factor in R is sex, with the levels Male and Female.
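
A short sketch of such a factor, using five made-up observations:

```r
# Hypothetical observations of a categorical variable
sex <- factor(c("Male", "Female", "Female", "Male", "Female"))

# The limited set of values the factor can take (sorted alphabetically by default)
levels(sex)   # "Female" "Male"

# Number of observations in each category
table(sex)
```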

17
Q

How would you create a barchart to compare the two categories of the “sex” data column in the dataset “x”?

A

To create a barchart in R you would count the observations in each category with table() and pass the result to barplot():

barplot(table(x$sex))

18
Q

How would you form a table in R?

A

You can form a table with the function:

table()

19
Q

How would you form a barchart for “sex” based on proportions from dataset “x”?

A

You would first calculate the proportions for sex by using the function:

prop.table(table(x$sex))

You can then make the proportions show on the barchart scale by nesting the above function into the barplot() function like so:

barplot(prop.table(table(x$sex)))
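
A small sketch of counts versus proportions, assuming a made-up dataset x with five rows:

```r
# Hypothetical dataset "x" with a "sex" data column
x <- data.frame(sex = c("Female", "Male", "Female", "Female", "Male"))

# Counts per category, then the same counts as proportions that sum to 1
counts <- table(x$sex)
props <- prop.table(counts)
props   # Female 0.6, Male 0.4

# Barchart with proportions on the vertical scale
barplot(props, ylab = "Proportion")
```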

20
Q

What is the Null Hypothesis?

A

A Null Hypothesis is the default expectation that there is no relationship between two measured phenomena, that categorical outcomes are all equally likely, or that there is no association between groups.

21
Q

What is H1 (an alternative hypothesis)?

A

H1 is the expectation that there is a relationship between two measured phenomena, or an association among groups, or that categorical outcomes are not all equally likely.

22
Q

When can the chi squared test (χ²) be used?

A

The chi squared test (χ²) can be used where the observations are assigned into mutually exclusive classes.

23
Q

Break down the chi squared test (χ²).

A

In the chi squared test (χ²) the number of observations in each mutually exclusive class (e.g. “male” or “female”) is compared to the number expected under the Null Hypothesis.

χ² = Σ(d²/e)

d (difference) = o (observed) - e (expected)

Σ = Sum of

24
Q

How would one calculate the number of degrees of freedom?

A

The no. of degrees of freedom = (no. of rows - 1) x (no. of columns - 1)

If there is only one column, the degrees of freedom = no. of rows - 1 instead.

25
Q

What are the degrees of freedom and why is it impossible for them to be 0?

A

Degrees of Freedom refers to the maximum number of logically independent values, which are values that have the freedom to vary, in the data sample. So if there were 0 independent values there wouldn’t be anything to test!

26
Q

What is the p-value?

A

The p-value is the probability of obtaining test results at least as extreme as the results actually observed, under the assumption that the null hypothesis is correct.

27
Q

How would you add purple colour to a plot in R?

A

You’d add purple colour by passing the col argument to a plotting function such as hist() or barplot():

col = c("purple")

28
Q

How would you perform a chi squared test for sex in dataset x in R?

A

You’d perform a chi squared test as follows:

chisq.test(table(x$sex))
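
To connect the formula χ² = Σ(d²/e) with the R function, the sketch below works the statistic out by hand and then compares it with chisq.test(). The counts (60 female, 40 male) are invented for the example, and the Null Hypothesis assumed here is that both classes are equally likely.

```r
# Hypothetical dataset "x": 60 females and 40 males
x <- data.frame(sex = rep(c("Female", "Male"), times = c(60, 40)))
observed <- table(x$sex)

# Expected counts under the Null Hypothesis (all classes equally likely)
expected <- rep(sum(observed) / length(observed), length(observed))

# Chi squared by hand: sum of (observed - expected)^2 / expected
d <- as.numeric(observed) - expected
chi_sq <- sum(d^2 / expected)   # (10^2 / 50) + (10^2 / 50) = 4

# One column of classes, so degrees of freedom = no. of rows - 1
df <- length(observed) - 1
p_by_hand <- pchisq(chi_sq, df = df, lower.tail = FALSE)

# The same test done by R
result <- chisq.test(observed)
result$statistic
result$p.value
```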

29
Q

What should you be more inclined to do if the p value in a chi squared test is very small?

A

If p is very small, we are more inclined to believe that our data diverges from the null hypothesis. We are also more inclined to believe that an alternative hypothesis explains our data.

30
Q

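How would you look at the heights of males, using the sex data column of dataset x in R?

A

Assuming the heights are stored in a data column named “height”, you select the rows where sex is “Male”:

hist(x$height[x$sex == "Male"])

To add women into the same graph you code:

hist(x$height[x$sex == "Female"], add = TRUE)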

31
Q

What is the rgb function?

A

This function creates colors corresponding to the given intensities (between 0 and max) of the red, green and blue primaries. The colour specification refers to the standard sRGB colorspace (IEC standard 61966).

An alpha transparency value can also be specified (as an opacity, so 0 means fully transparent and max means opaque). If alpha is not specified, an opaque colour is generated.

The names argument may be used to provide names for the colors.

The values returned by these functions can be used with a col= specification in graphics functions or in par().
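
As a sketch of how rgb() colours might be used, the example below builds an opaque purple and two semi-transparent colours, then overlays two histograms so that both remain visible. The dataset x, its height column and the axis limits are all assumptions made for illustration.

```r
# Opaque purple from 0-255 intensities
purple <- rgb(128, 0, 128, maxColorValue = 255)   # "#800080"

# Semi-transparent colours: intensities between 0 and 1, alpha = 0.5
see_through_red <- rgb(1, 0, 0, alpha = 0.5)
see_through_blue <- rgb(0, 0, 1, alpha = 0.5)

# Hypothetical dataset "x" with simulated heights in cm
set.seed(3)
x <- data.frame(
  sex = rep(c("Male", "Female"), each = 100),
  height = c(rnorm(100, mean = 178, sd = 7), rnorm(100, mean = 165, sd = 6))
)

# Overlay the two histograms; transparency lets the overlap show through
hist(x$height[x$sex == "Male"], col = see_through_blue,
     xlim = c(140, 205), main = "Height by sex", xlab = "Height (cm)")
hist(x$height[x$sex == "Female"], col = see_through_red, add = TRUE)
```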

32
Q

What is a type 1 and type 2 error?

A

In statistical hypothesis testing, a type I error is the rejection of a true null hypothesis, while a type II error is the non-rejection of a false null hypothesis.

33
Q

What is a post hoc test?

A

In a scientific study, post hoc analysis consists of statistical analyses that were specified after the data were seen.

34
Q

What does the t-test determine?

A

The t-test determines whether the mean of one group is statistically different from the mean of another group.

35
Q

What is the t-test function in R?

A

The t-test function in R is as follows

t.test()

36
Q

What is the version of the t-test that R uses?

A

By default, the t.test() function in R performs the Welch t-test, which does not assume equal variances.
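
A minimal sketch of the two versions, using simulated heights for two made-up groups. Setting var.equal = TRUE asks t.test() for Student’s t-test instead of the default Welch version.

```r
# Hypothetical heights (cm) for two groups, simulated for illustration
set.seed(42)
males <- rnorm(30, mean = 178, sd = 7)
females <- rnorm(30, mean = 165, sd = 6)

# Default: Welch t-test (variances not assumed equal)
welch <- t.test(males, females)

# Student's t-test: variances assumed equal
student <- t.test(males, females, var.equal = TRUE)

# Which test was run, plus the t value and p-value of the Welch test
welch$method
student$method
welch$statistic
welch$p.value
```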

37
Q

What is the t value in the t test?

A

The t-value measures the size of the difference relative to the variation in your sample data. Put another way, T is simply the calculated difference represented in units of standard error. The greater the magnitude of T, the greater the evidence against the null hypothesis.

38
Q

How would you plot two continuous variables?

A

You would plot two continuous variables on a scatter plot.

39
Q

What is the function for a scatter plot?

A

The function for a scatter plot is:

plot()

40
Q

What are explanatory and response variables?

A

The response variable is the focus of a question in a study or experiment. An explanatory variable is one that explains changes in that variable. It can be anything that might affect the response variable.

41
Q

What are residuals in a scatter plot with a line of best fit?

A

Residuals are the differences between the observed values and those predicted by the regression line.

42
Q

How can one determine the line of best fit using R?

A

To determine the line of best fit in R, fit a linear model with the lm() function:

lm(response ~ explanatory)

43
Q

What does the abline function do?

A

The abline() function adds a straight line to an existing plot. Passing it a fitted lm() model draws the line of best fit:

abline(lm(response ~ explanatory))

The fitted model can be summarised by nesting the lm() call into the summary() function.
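
The sketch below puts plot(), lm(), abline() and summary() together on simulated data. The variable names and the true line used to generate the data (intercept 3, gradient 2) are assumptions chosen for the example.

```r
# Hypothetical data: response depends linearly on explanatory, plus random error
set.seed(7)
explanatory <- 1:50
response <- 3 + 2 * explanatory + rnorm(50, sd = 4)

# Fit the line of best fit
fit <- lm(response ~ explanatory)

# Estimated y-intercept and gradient
coef(fit)

# Scatter plot with the line of best fit drawn on top
plot(explanatory, response)
abline(fit)

# Residuals: observed values minus the values predicted by the line
head(residuals(fit))

# Model summary, including R-squared and adjusted R-squared
fit_summary <- summary(fit)
fit_summary$r.squared
fit_summary$adj.r.squared
```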

44
Q

What is the formula for a straight line on a graph?

A

The formula for a straight line on a graph is:

y = mx + c

45
Q

What is a linear regression model?

A

In statistics, linear regression is a linear approach to modeling the relationship between a scalar response (or dependent variable) and one or more explanatory variables (or independent variables).

46
Q

What is the y intercept?

A

The y-intercept or vertical intercept is a point where the graph of a function or relation intersects the y-axis of the coordinate system. As such, these points satisfy x = 0.

47
Q

How is the gradient of a line on a graph calculated?

A

For a straight-line graph, pick two points on the graph. The gradient of the line = (change in y-coordinate)/(change in x-coordinate) . We can, of course, use this to find the equation of the line.

48
Q

What do Welch’s and Student’s t-tests assume?

A

Student’s t-test assumes that the two populations being compared are normally distributed with equal variance. Welch’s t-test is designed for populations with unequal variances, but the assumption of normality is maintained.

49
Q

What is the variance?

A

Variance measures how far a set of data is spread out. A variance of zero indicates that all of the data values are identical. A high variance indicates that the data points are very spread out from the mean, and from one another. Variance is the average of the squared distances from each point to the mean.

50
Q

What is the R squared value?

A

R-squared is a statistical measure of how close the data are to the fitted regression line. It is also known as the coefficient of determination, or the coefficient of multiple determination for multiple regression. … 100% indicates that the model explains all the variability of the response data around its mean.

51
Q

What is the adjusted R square Value?

A

Adjusted R-squared, a modified version of R-squared, adds precision and reliability by considering the impact of additional independent variables that tend to skew the results of R-squared measurements.

52
Q

What is the General Linear Model Overview?

A

GLM Overview

Data = Model + Error

Model usually takes the format of:

Y = B + MX

M being the gradient of the regression line

X being the explanatory variable

Y being the response variable

B being the y-intercept, i.e. the starting point of the regression line drawn with abline().

53
Q

What is error?

A

Error doesn’t mean the data is incorrect, it is a deviation from the regression model. The data isn’t wrong, the model is.

54
Q

What does it mean if adjusted R squared value is negative?

A

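A negative adjusted R squared value means the model explains essentially none of the variation in the response, i.e. there is no detectable association between the two variables.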

55
Q

What is a biphasic response in a regression model?

A

A biphasic response is observed in a regression model when the response variable displays two different correlations with the explanatory variable.

56
Q

What does the R squared value tell you?

A

R-squared (R²) is a statistical measure that represents the proportion of the variance for a dependent variable that’s explained by an independent variable or variables in a regression model.

57
Q

What is an ordinal scale?

A

Ordinal Scale is defined as a variable measurement scale used to simply depict the order of variables and not the difference between each of the variables. These scales are generally used to depict non-mathematical ideas such as frequency, satisfaction, happiness, a degree of pain, etc.

E.g. How was work today?

1 - Good

2 - Meh

3 - Shit

58
Q

What is variance? How is it calculated?

A

The variance is the average of the squared differences from the Mean.

  • Work out the Mean (the simple average of the numbers)
  • Then for each number: subtract the Mean and square the result (the squared difference).
  • Then work out the average of those squared differences.
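
The three steps can be sketched in R as follows, using eight made-up numbers. Note that R’s built-in var() divides by n - 1 (the sample variance) rather than by n, so it gives a slightly larger value than the average described here.

```r
# Hypothetical values
values <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Step 1: work out the mean
m <- mean(values)                       # 5

# Step 2: subtract the mean from each number and square the result
squared_differences <- (values - m)^2   # 9 1 1 1 0 0 4 16

# Step 3: average the squared differences
variance <- mean(squared_differences)   # 32 / 8 = 4

# Built-in sample variance divides by n - 1 instead: 32 / 7
sample_variance <- var(values)
```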
59
Q

What is a 95% confidence interval?

A

A Confidence Interval is a range of values we are fairly sure our true value lies in.

The “95%” says that 95% of the intervals calculated from experiments like the one we just did will include the true mean, but 5% won’t.

So there is a 1-in-20 chance (5%) that our Confidence Interval does NOT include the true mean.
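
A rough simulation of this idea: repeat a made-up experiment 1000 times, calculate a 95% confidence interval each time with t.test(), and count how often the interval contains the true mean. The true mean, standard deviation and sample size are assumptions chosen for the example.

```r
# Hypothetical "true" population mean, known only because the data are simulated
set.seed(5)
true_mean <- 170

# Repeat the experiment 1000 times; record whether each interval covers the true mean
covers <- replicate(1000, {
  heights <- rnorm(40, mean = true_mean, sd = 8)
  ci <- t.test(heights)$conf.int   # 95% confidence interval by default
  ci[1] <= true_mean && true_mean <= ci[2]
})

# Proportion of intervals that include the true mean (expected to be close to 0.95)
coverage <- mean(covers)
coverage
```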

60
Q

What is a point estimate?

A

In statistics, point estimation involves the use of sample data to calculate a single value which is to serve as a “best guess” or “best estimate” of an unknown population parameter.

61
Q

What is meta-analysis in research?

A

A meta-analysis is a statistical analysis that combines the results of multiple scientific studies.

62
Q

What is the Fisher’s exact test and how is it calculated?

A

Fisher’s exact test is a statistical significance test used in the analysis of contingency tables. Although in practice it is employed when sample sizes are small, it is valid for all sample sizes.
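
In R the test is available as fisher.test(). The sketch below applies it to a small 2 x 2 contingency table; the counts and the row and column labels are invented for the example.

```r
# Hypothetical 2 x 2 contingency table with small counts
counts <- matrix(c(3, 1, 1, 3), nrow = 2,
                 dimnames = list(group = c("A", "B"),
                                 outcome = c("Yes", "No")))
counts

# Fisher's exact test on the table (two-sided by default)
result <- fisher.test(counts)
result$p.value
```

With counts this small, a chi squared approximation would be unreliable, which is the situation where the exact test is usually chosen.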

63
Q

What is a multivariate model?

A

The multivariate model is a popular statistical tool that uses multiple variables to forecast possible outcomes.

64
Q

What are the components of the ideal experiment?

A

The components of the ideal experiment:

  • Define your hypothesis
  • Design an experiment to collect the data
  • Design your statistical testing plan in advance
  • Conduct your experiment
  • Analyse the data
  • Publish the results (positive or negative)
  • [conduct any post hoc analyses]
  • Formulate new hypotheses.
65
Q

What is cross sectional and longitudinal research?

A

A cross-sectional study collects data from different samples at a single point in time. A longitudinal study is conducted with the same sample over the years, which requires the researcher to revisit the participants at regular intervals. Cross-sectional studies cannot pin down cause-and-effect relationships.
