Me, Myself and I Flashcards
What can we find in the natural world?
Massive amounts of diversity within and between plants, animals, bacteria, fungi and so on. To understand this diversity we need to be able to measure it quantitatively.
Give examples of variations in humans.
Examples of variations within humans include:
- cognitive ability
- personality
- physical appearance (body shape, skin color, etc.)
- immunology
- height
What can we do once we understand a set of data?
Once we understand a set of data we can use it to make helpful and meaningful predictions. For example, we can use trends in global warming data to predict how warming will progress and affect the climate.
Code for a histogram in R.
You can make a histogram with the hist() function.
Code for a specific column of a dataset.
To select only a specific column of a dataset, x, for a histogram, use the hist() function with the dataset name (x) followed by the $ sign and the column name, for example hist(x$age).
How would you present data from a histogram on a finer scale?
Use the hist() function with the breaks argument, where breaks = n suggests the number of bins. Like so:
hist(x, breaks = 50)
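A minimal sketch of the histogram cards above. The data frame x and its age column are simulated here purely for illustration; they are an assumption, not the course dataset.

```r
# Hypothetical dataset: 500 simulated ages (an assumption for illustration)
set.seed(1)
x <- data.frame(age = round(rnorm(500, mean = 40, sd = 12)))

# Histogram of a single column, selected with $
hist(x$age)

# The same column on a finer scale. breaks is a suggestion, so R may
# choose a slightly different number of bins than 50.
h <- hist(x$age, breaks = 50)

# hist() also returns the bin edges and the count in each bin
length(h$counts)   # number of bins actually used
sum(h$counts)      # every row falls in exactly one bin, so this is 500
```

Note that hist() needs a numeric vector, which is why the column is selected with x$age rather than passing the whole data frame.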
What is a sampling error?
Sampling error is the random variation introduced into a dataset as a result of sampling only a subset of the total population [or possible experiments].
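Sampling error can be seen by drawing several samples from the same population. This is a sketch with a simulated population of heights; the numbers are made up for illustration.

```r
set.seed(42)

# Hypothetical population of 10,000 heights in cm (simulated)
population <- rnorm(10000, mean = 170, sd = 10)

# Take 5 independent samples of 20 people and record each sample mean
sample_means <- replicate(5, mean(sample(population, 20)))

mean(population)   # the population mean
sample_means       # each sample mean differs a little: sampling error

# Larger samples tend to land closer to the population mean
big_means <- replicate(5, mean(sample(population, 2000)))
big_means
```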
How would you obtain a numerical description of a data column within a dataset? In this case the dataset is named “x” and the data column in question is named “age”.
To obtain a numerical description of a data column within a dataset, use the function:
summary(x$age)
It will present the minimum value, the maximum value, the median value, the mean and the lower and upper quartiles.
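A short sketch of what summary() returns, using a small made-up age column (the values are an assumption for illustration).

```r
# Hypothetical dataset with seven ages
x <- data.frame(age = c(21, 25, 30, 34, 40, 52, 67))

s <- summary(x$age)
s          # minimum, lower quartile, median, mean, upper quartile, maximum
names(s)   # the labels R uses for the six values

# Individual values can be pulled out by label
s["Median"]   # 34, the middle of the seven sorted values
s["Min."]     # 21
s["Max."]     # 67
```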
How would you find the mean, median, max, min values of the “age” data column within dataset “x”?
You’d get the mean, median, max, min values of the “age” data column within dataset “x” with the following functions:
- mean(x$age)
- median(x$age)
- max(x$age)
- min(x$age)
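One practical point when using these functions: if the column contains missing values (NA), each of them returns NA unless na.rm = TRUE is set. A sketch with made-up ages:

```r
# Hypothetical age column with one missing value
x <- data.frame(age = c(23, 35, NA, 41, 29))

mean(x$age)                   # NA, because one value is missing
mean(x$age, na.rm = TRUE)     # 32
median(x$age, na.rm = TRUE)   # 32
max(x$age, na.rm = TRUE)      # 41
min(x$age, na.rm = TRUE)      # 23
```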
What is a data population?
A data population is all members of a defined group.
What is a sample?
A sample is the subset of a population that is actually measured or observed.
What is a sampling bias?
In statistics, sampling bias is a bias in which a sample is collected in such a way that some members of the intended population have a lower or higher sampling probability than others.
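Sampling bias can be imitated in R by giving some members a higher chance of being picked. This is a sketch with a simulated population; weighting by age is an arbitrary choice made only for illustration.

```r
set.seed(7)

# Hypothetical population: 5,000 people aged 18 to 90 (simulated)
population <- sample(18:90, 5000, replace = TRUE)

# Unbiased sample: every member has the same chance of being picked
fair <- sample(population, 200)

# Biased sample: the chance of being picked is proportional to age,
# so older members are over-represented
biased <- sample(population, 200, prob = population)

mean(population)
mean(fair)     # usually close to the population mean
mean(biased)   # tends to overshoot the population mean
```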
How would you obtain a numerical description of 10 random values in the datacolumn of “age” from dataset “x”?
To obtain a numerical description of 10 random values in the datacolumn of “age” from dataset “x” you would use the function:
summary(sample(x$age, 10))
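Because sample() is random, the summary changes every time it is run. A sketch of how set.seed() makes the draw repeatable, using a made-up age column:

```r
# Hypothetical dataset with twelve distinct ages
x <- data.frame(age = c(19, 22, 25, 31, 34, 38, 41, 47, 52, 58, 63, 70))

set.seed(123)
first <- sample(x$age, 10)

set.seed(123)
second <- sample(x$age, 10)

identical(first, second)   # TRUE: same seed, same sample
anyDuplicated(first)       # 0: by default no row is drawn twice

summary(first)
```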
Why are humans different heights?
Humans are different heights for the following reasons:
- genetics (nature): the main factor that influences a person’s height is their genetic makeup (DNA), which scientists believe is responsible for about 80% of a person’s height
- environment (nurture): other factors such as nutrition, hormones, activity levels and medical conditions can affect growth during development
What is a factor in R?
Conceptually, factors are variables in R which take on a limited number of different values; such variables are often referred to as categorical variables.
Give an example of what would count as a factor in R.
Sex would count as a factor in R, with the values Male and Female.
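A minimal sketch of turning a text column into a factor. The data frame x and its sex column are made up for illustration.

```r
# Hypothetical dataset with a sex column stored as text
x <- data.frame(sex = c("Male", "Female", "Female", "Male", "Female"))

# Convert the column to a factor (a categorical variable)
x$sex <- factor(x$sex)

is.factor(x$sex)   # TRUE
levels(x$sex)      # "Female" "Male" (sorted alphabetically by default)
nlevels(x$sex)     # 2
```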
How would you create a barchart to compare the two categories of the “sex” data column in the dataset “x”?
To create a barchart in R, use the barplot() function on a table of the column:
barplot(table(x$sex))
How would you form a table in R?
You can form a table with the function:
table()
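A sketch of table() and barplot() working together, assuming a small made-up sex column:

```r
# Hypothetical dataset
x <- data.frame(sex = c("Male", "Female", "Female", "Male", "Female"))

# table() counts how many times each category appears
counts <- table(x$sex)
counts   # Female: 3, Male: 2

# barplot() draws one bar per category, with the counts on the y axis
barplot(counts, ylab = "Count")
```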
How would you form a barchart for “sex” based on proportions from dataset “x”?
First convert the counts for sex into proportions with the function:
prop.table(table(x$sex))
Then show the proportions on the barchart scale by nesting the above function in the barplot() function like so:
barplot(prop.table(table(x$sex)))
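A sketch of the proportion barchart, again with a made-up sex column. The ylim and ylab arguments are optional additions for readability.

```r
# Hypothetical dataset
x <- data.frame(sex = c("Male", "Female", "Female", "Male", "Female"))

# Convert counts to proportions of the total
props <- prop.table(table(x$sex))
props        # Female: 0.6, Male: 0.4
sum(props)   # proportions always add up to 1

# Bar heights are now proportions rather than counts
barplot(props, ylim = c(0, 1), ylab = "Proportion")
```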
What is the Null Hypothesis?
The Null Hypothesis is the default expectation that there is no relationship between two measured phenomena, that categorical outcomes are all equally likely, or that there is no association between groups.
What is H1 (an alternative hypothesis)?
H1 is the expectation that there is a relationship between two measured phenomena, or an association among groups, or that categorical outcomes are not all equally likely.
When can the chi-squared test (χ²) be used?
The chi-squared test (χ²) can be used where the observations are assigned to mutually exclusive classes.
Break down the chi-squared test (χ²).
In the chi-squared test (χ²) the number of observations in each mutually exclusive class, e.g. “male or female”, is compared with the number expected under the Null Hypothesis.
χ² = Σ(d²/e)
d (difference) = o (observed) - e (expected)
Σ = sum over all classes
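The formula can be worked through by hand in R. This sketch uses made-up counts of 60 males and 40 females and a Null Hypothesis that both classes are equally likely; chisq.test() is R's built-in version of the same calculation.

```r
# Hypothetical observed counts in two mutually exclusive classes
observed <- c(male = 60, female = 40)

# Under the Null Hypothesis both classes are equally likely: 50 and 50
expected <- rep(sum(observed) / length(observed), length(observed))

# d = o - e, then sum d^2 / e over the classes
d <- observed - expected
chi_sq <- sum(d^2 / expected)
chi_sq   # (10^2 / 50) + ((-10)^2 / 50) = 4

# The built-in test gives the same statistic
test <- chisq.test(observed)
test$statistic
```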
How would one calculate the number of degrees of freedom?
The number of degrees of freedom = (no. of rows - 1) × (no. of columns - 1)
If there is only one column, the degrees of freedom = no. of rows - 1 instead.
What are the degrees of freedom and why is it impossible for them to be 0?
Degrees of freedom refers to the maximum number of logically independent values, which are values that have the freedom to vary, in the data sample. So if there were 0 independent values there wouldn’t be anything to test!
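Both rules can be checked against chisq.test(), which reports the degrees of freedom it used. The tables below are made up for illustration.

```r
# Hypothetical table with 3 rows (groups) and 2 columns (sex)
tab <- matrix(c(20, 30,
                25, 25,
                15, 35),
              nrow = 3, byrow = TRUE,
              dimnames = list(group = c("A", "B", "C"),
                              sex = c("Male", "Female")))

# (rows - 1) x (columns - 1)
df_table <- (nrow(tab) - 1) * (ncol(tab) - 1)
df_table                     # 2
chisq.test(tab)$parameter    # df = 2, matching the formula

# One column of counts: rows - 1
observed <- c(60, 40)
df_single <- length(observed) - 1
df_single                        # 1
chisq.test(observed)$parameter   # df = 1
```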
What is the p-value?
The p-value is the probability of obtaining test results at least as extreme as the results actually observed, under the assumption that the null hypothesis is correct.
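For a chi-squared test the p-value is the upper tail of the chi-squared distribution beyond the observed statistic. A sketch assuming a statistic of 4 with 1 degree of freedom, which is what made-up counts of 60 and 40 give under an equal-split Null Hypothesis:

```r
# Assumed values for illustration
chi_sq <- 4
df <- 1

# Probability of a statistic at least this large if the Null Hypothesis is true
p <- pchisq(chi_sq, df = df, lower.tail = FALSE)
p   # about 0.0455

# chisq.test() reports the same p-value for the counts 60 and 40
chisq.test(c(60, 40))$p.value
```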