me, myself and I Flashcards

1
Q

discuss quantifying biological data

A

-Biological research relies on accurate and precise measurements of biological parameters, e.g. length, mass, concentration, time and genetic sequence.
-Researchers often manipulate variables and control experimental conditions to understand cause-and-effect relationships; rigorous quantification is required to ensure reliable, reproducible results.
-Mathematical models and statistical analyses play a vital role in understanding genetic data and deciphering complex genetic mechanisms.

2
Q

what are SI units?

A

The International System of Units (SI), the standard system of measurement used in science:
length = meter (m)
mass = kilogram (kg)
time = second (s)

3
Q

what is quantitative biology?

A

Quantitative biology is an umbrella term encompassing the use of mathematical, statistical or computational techniques to study life and living organisms. The central theme and goal of quantitative biology is the creation of predictive models based on fundamental principles governing living systems.

4
Q

what is the mean?

A

The mean is equal to the sum of all the values in the data set divided by the number of values in the data set.

5
Q

what is the median?

A

The middle value in a set of numbers arranged in increasing order. If there is an even number of values, then median is the average of the middle two values.

6
Q

what is the range?

A

The range of a data set is the difference between its highest and lowest values.
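The mean, median and range defined in these cards can be computed with Python's standard `statistics` module. This is a minimal sketch; the helper name `describe` and the leaf-length data are hypothetical, chosen so the even-length median case (average of the middle two values) is visible.

```python
from statistics import mean, median

def describe(values):
    """Return the mean, median and range of a list of numbers."""
    return {
        "mean": mean(values),                # sum of values / number of values
        "median": median(values),            # middle value; average of the middle two if n is even
        "range": max(values) - min(values),  # highest value minus lowest value
    }

# Hypothetical example: six leaf lengths in cm
leaf_lengths = [4.0, 7.0, 5.0, 9.0, 5.0, 6.0]
print(describe(leaf_lengths))
```

Sorted, the data are 4, 5, 5, 6, 7, 9, so the median is the average of 5 and 6.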

7
Q

what is the difference between samples and populations?

A

A population is the entire group that you want to draw conclusions about. A sample is the specific group that you will collect data from.

8
Q

what is a sampling error?

A

Sampling error is the possibility of a mistaken inference when generalizing from a sample to a population, caused by chance differences between the sample and the population.

9
Q

why may a sampling error arise?

A

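Sampling error occurs because a sample is only an approximation of the population it is drawn from, so by chance it never represents the population perfectly. Even randomized samples have some degree of sampling error, and the error tends to be larger when the sample is small. A sample that is systematically unrepresentative suffers from sampling bias, a separate problem that a larger sample does not fix.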
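A short simulation can make sampling error concrete: repeated random samples from the same population give different means, and the typical error shrinks as the sample size grows. This is a sketch under stated assumptions: the population is hypothetical (10,000 values from a normal distribution with mean 50 and sd 10), and `mean_abs_sampling_error` is an illustrative helper, not a standard function.

```python
import random
from statistics import mean

def mean_abs_sampling_error(population, n, repeats=200, seed=0):
    """Average |sample mean - population mean| over repeated random samples of size n."""
    rng = random.Random(seed)
    true_mean = mean(population)
    errors = [abs(mean(rng.sample(population, n)) - true_mean) for _ in range(repeats)]
    return mean(errors)

# Hypothetical population of 10,000 measurements (normal, mean 50, sd 10)
rng = random.Random(42)
population = [rng.gauss(50, 10) for _ in range(10_000)]

for n in (10, 100, 1000):
    print(f"n = {n:4d}: mean absolute sampling error = {mean_abs_sampling_error(population, n):.3f}")
```

Every sample here is random and unbiased, yet none reproduces the population mean exactly; only the size of the discrepancy changes with n.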

10
Q

continuous vs categorical data

A

Continuous data can take on any value within a defined range and is often measured on a continuous scale, such as weight, height, or temperature. Categorical data, on the other hand, consists of discrete values that fall into distinct categories or groups, such as gender, ethnicity, or product types.

11
Q

null vs alternative hypothesis

A

The null hypothesis (H0) is the default claim, usually that there is no effect or no difference, and it is the claim we look for evidence against. The alternative hypothesis (H1) is the claim we are trying to support; it is accepted only if there is sufficient evidence to reject the null hypothesis.

12
Q

what is the chi-squared test?

A

The chi-squared (χ²) test is a statistical test used to compare observed results with expected results for categorical data. Its purpose is to determine whether a difference between observed and expected data is due to chance, or whether it is due to a relationship between the variables being studied. It can test whether observed counts fit an expected distribution (goodness of fit) or whether two categorical variables are associated (test of independence).
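The test statistic is χ² = Σ (O − E)² / E, summed over all categories, where O is an observed count and E the expected count. A minimal sketch of a goodness-of-fit test, assuming hypothetical counts from a genetic cross with an expected 3:1 ratio; the critical value 3.841 is the standard table value for p = 0.05 with 1 degree of freedom.

```python
def chi_squared(observed, expected):
    """Pearson's chi-squared statistic: sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical cross: 400 offspring, expected 3:1 ratio of dominant:recessive phenotypes
observed = [290, 110]
total = sum(observed)
expected = [total * 3 / 4, total * 1 / 4]   # [300.0, 100.0]

stat = chi_squared(observed, expected)
df = len(observed) - 1                       # degrees of freedom = number of categories - 1
CRITICAL_05_DF1 = 3.841                      # table value for p = 0.05, df = 1
print(f"chi2 = {stat:.3f}, df = {df}")
print("reject H0" if stat > CRITICAL_05_DF1 else "fail to reject H0")
```

Here χ² is about 1.33, below 3.841, so the deviation from the 3:1 ratio could plausibly be due to chance.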

13
Q

what is statistical significance?

A

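In research, a result is statistically significant when it would be unlikely to occur by chance alone if the null hypothesis were true. In practice, this means the p-value falls below a significance level (α) chosen before the analysis, commonly 0.05. Statistical significance does not give the probability that the null hypothesis is true.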

14
Q

how does statistical significance relate to the p-value?

A

The lower the p-value, the stronger the evidence against the null hypothesis and the greater the statistical significance of the observed difference.
A p-value below 0.05 is generally considered statistically significant.

P-values can be used instead of, or in addition to, preselected confidence levels for hypothesis testing.

15
Q

how can the p-value be used as evidence to support/reject the null hypothesis?

A

A p-value below the significance level (typically 0.05) is considered statistically significant, and the null hypothesis is rejected. A p-value above 0.05 means the deviation from the null hypothesis is not statistically significant, and the null hypothesis is not rejected. Failing to reject the null hypothesis does not prove that it is true.
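As a worked example of the decision rule, an exact one-sided binomial test computes the p-value directly as P(X ≥ k) assuming H0 is true. This is a sketch only: the woodlice choice-chamber data are hypothetical, and `upper_tail_binomial_p` is an illustrative name, not a library function.

```python
from math import comb

def upper_tail_binomial_p(k, n, p=0.5):
    """One-sided p-value: P(X >= k) for X ~ Binomial(n, p), computed assuming H0 is true."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

ALPHA = 0.05  # conventional significance level, chosen before looking at the data

# Hypothetical data: 9 of 10 woodlice move to the dark side of a choice chamber.
# H0: no preference, so each animal picks either side with probability 0.5.
p_value = upper_tail_binomial_p(9, 10)
print(f"p = {p_value:.4f}")
print("reject H0" if p_value < ALPHA else "fail to reject H0")
```

The p-value is (10 + 1) / 1024, about 0.011: if the animals truly had no preference, a result this extreme would occur about 1% of the time, so H0 is rejected at α = 0.05.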

16
Q

what are type I and type II errors?

A

A type I error (false positive) occurs if an investigator rejects a null hypothesis that is actually true in the population.
A type II error (false negative) occurs if the investigator fails to reject a null hypothesis that is actually false in the population.
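Both error types can be estimated by simulation: run many experiments where the truth is known and count wrong decisions. A minimal sketch, assuming a one-sample two-sided z-test with a known population sd of 1, samples of 20 and α = 0.05; the helper names and the true mean of 0.5 used for the type II case are hypothetical choices for illustration.

```python
import random
from math import erf, sqrt
from statistics import mean

def z_test_p_value(sample, mu0=0.0, sigma=1.0):
    """Two-sided p-value of a one-sample z-test (population sd assumed known)."""
    z = (mean(sample) - mu0) / (sigma / sqrt(len(sample)))
    phi = 0.5 * (1 + erf(abs(z) / sqrt(2)))  # standard normal CDF evaluated at |z|
    return 2 * (1 - phi)

def rejection_rate(true_mean, n=20, alpha=0.05, trials=4000, seed=0):
    """Fraction of simulated experiments in which H0 (mean = 0) is rejected."""
    rng = random.Random(seed)
    rejected = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        if z_test_p_value(sample) < alpha:
            rejected += 1
    return rejected / trials

# H0 true (true mean is 0): every rejection is a type I error, so the rate should be near alpha
type_i = rejection_rate(true_mean=0.0)
# H0 false (true mean is 0.5): every failure to reject is a type II error
type_ii = 1 - rejection_rate(true_mean=0.5)
print(f"type I error rate  = {type_i:.3f}")
print(f"type II error rate = {type_ii:.3f}")
```

The type I rate is controlled by the chosen α, whereas the type II rate depends on the true effect size and the sample size.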

17
Q

what is the effect size? how does it relate to the practical significance of findings?

A

Effect size is a measure of the magnitude of the difference, or the strength of the relationship, between the variables being measured. The absolute effect size is the difference between the average (mean) outcomes of two groups, e.g. two intervention groups.

Because effect size describes how large an effect is, rather than whether it is statistically significant, it indicates the practical (meaningful) significance of a finding.
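Besides the absolute difference in means, a common standardized effect size is Cohen's d: the difference in means divided by the pooled standard deviation. This is a swapped-in, widely used measure rather than one named in the card; Cohen's rough benchmarks are 0.2 (small), 0.5 (medium) and 0.8 (large). The two small groups below are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance is the sample variance (n - 1 denominator)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)

treated = [2, 4, 6]
control = [1, 3, 5]
print(mean(treated) - mean(control))  # absolute effect size (difference in means)
print(cohens_d(treated, control))     # standardized effect size
```

Both groups have a standard deviation of 2, so a mean difference of 1 corresponds to d = 0.5.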

18
Q

what is the equation of a straight line?

A

y = mx + c, where m is the gradient (slope) and c is the y-intercept (the value of y when x = 0)
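Given data rather than a known line, m and c are often estimated by ordinary least squares: m = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and c = ȳ − m·x̄. This sketch assumes least squares as the fitting method; `fit_line` is a hypothetical helper, and the example points lie exactly on y = 2x + 1.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares estimates of gradient m and intercept c in y = mx + c."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    c = y_bar - m * x_bar  # the fitted line passes through the point (x_bar, y_bar)
    return m, c

m, c = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(f"y = {m:g}x + {c:g}")
```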

19
Q

what is r-squared?

A

The coefficient of determination (R²) is a number between 0 and 1 that measures how well a statistical model predicts an outcome. You can interpret the R² as the proportion of variation in the dependent variable that is predicted by the statistical model.
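For a straight-line fit, R² = 1 − SS_res / SS_tot, where SS_res is the sum of squared residuals (variation the model leaves unexplained) and SS_tot is the total sum of squares around the mean. A minimal sketch, assuming an ordinary least-squares line with an intercept and hypothetical data points:

```python
from statistics import mean

def r_squared(xs, ys):
    """R^2 of a straight-line least-squares fit: 1 - SS_res / SS_tot."""
    x_bar, y_bar = mean(xs), mean(ys)
    m = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
    c = y_bar - m * x_bar
    ss_res = sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    ss_tot = sum((y - y_bar) ** 2 for y in ys)                     # total variation
    return 1 - ss_res / ss_tot

print(round(r_squared([1, 2, 3, 4], [2, 4, 5, 8]), 4))
```

The fitted line is y = 1.9x, with SS_res = 0.70 and SS_tot = 18.75, so about 96% of the variation in y is predicted by x.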

20
Q

what is the 95% confidence interval?

A

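A 95% confidence interval (CI) of the mean is a range, with a lower and an upper limit, calculated from a sample. Because the true population mean is unknown, the interval gives a range of plausible values for it: if the sampling were repeated many times, about 95% of the intervals constructed this way would contain the true population mean.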
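A common way to compute the interval is mean ± critical value × standard error, where the standard error is s / √n. This sketch assumes a normal approximation (critical value 1.96) by default; for small samples a t critical value is more appropriate, e.g. 2.571 for n − 1 = 5 degrees of freedom. The data and the helper name `mean_ci` are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def mean_ci(values, critical=1.96):
    """Confidence interval for the mean: mean ± critical * standard error."""
    m = mean(values)
    se = stdev(values) / sqrt(len(values))  # standard error of the mean (sample sd / sqrt(n))
    return m - critical * se, m + critical * se

data = [4, 6, 5, 7, 3, 5]
print(mean_ci(data))                  # normal approximation, z = 1.96
print(mean_ci(data, critical=2.571))  # t critical value for 5 degrees of freedom
```

The t-based interval is wider, reflecting the extra uncertainty of estimating the standard deviation from only six values.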

21
Q

what is the best estimate of population average?

A

the best estimate (point estimate) of the population average is the SAMPLE AVERAGE

22
Q

population vs sample distribution

A

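A population distribution describes the values of a variable across the entire population; a sample distribution describes the values in a sample drawn from it. The moments of a sample distribution (such as the sample mean) are referred to as statistics of the sample. The moments of a population distribution are referred to as parameters of the population. If samples are drawn from the population with replacement, then any number of samples of a given size, N, can be drawn.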

23
Q

population vs sample average

A

In statistics, there are two different averages: the sample mean and the population mean. The sample mean is computed from only a selected number of observations drawn from the population. The population mean, on the other hand, is computed from all the observations in the population.

24
Q

what is a multivariate linear model?

A

A multivariate linear model (multivariate linear regression) predicts multiple correlated dependent variables, rather than a single scalar variable, from one or more explanatory variables. It is distinct from multiple linear regression, which uses more than one explanatory variable to predict a single dependent variable; with only one explanatory variable, the model is called simple linear regression.

25
Q

when would you use a multivariate linear model?

A

You would use multivariate multiple linear regression when you want to use one or more independent variables to predict several dependent variables at the same time, or when you want to quantify the numerical relationship between them.
26
Q

cause and effect relationship

A

Cause and effect is the relationship between two events or situations where the cause is directly responsible for creating the effect. For instance, if someone spills gasoline on their lawn, the grass will die. The cause is spilling the gas, and the effect is the lawn dying.
27
Q

longitudinal data collection

A

Longitudinal data, sometimes referred to as panel data, track the same sample at different points in time.
28
Q

cross-sectional data collection

A

In statistics and econometrics, cross-sectional data is a type of data collected by observing many subjects at a single point or period of time.
29
Q

what is multiple testing?

A

Multiple testing means running many statistical tests on the same data. In some cases it refers to comparing multiple groups with each other or against a shared control group; in other cases it refers to comparing only two groups, but on many of their characteristics.
30
Q

how might multiple testing impact results?

A

Multiple testing refers to any situation that involves testing more than one hypothesis simultaneously. If decisions about the individual hypotheses are based on unadjusted p-values, there is typically a high probability that some of the true null hypotheses will be rejected (false positives).
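The inflation is easy to quantify: if m independent tests are each run at α and every null hypothesis is true, the probability of at least one false positive (the family-wise error rate) is 1 − (1 − α)^m. A sketch assuming independent tests, plus the Bonferroni correction (compare each p-value with α / m), which is one of several correction methods; the p-values are hypothetical.

```python
def family_wise_error_rate(m, alpha=0.05):
    """P(at least one false positive) across m independent tests when every H0 is true."""
    return 1 - (1 - alpha) ** m

def bonferroni_reject(p_values, alpha=0.05):
    """Bonferroni correction: reject H0_i only if p_i < alpha / m."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]

print(round(family_wise_error_rate(20), 3))         # 20 tests, each at alpha = 0.05
print(bonferroni_reject([0.001, 0.02, 0.04, 0.3]))  # threshold = 0.05 / 4 = 0.0125
```

With 20 tests the chance of at least one false positive is about 64%, even though each test alone has only a 5% chance.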
31
Q

what is 'cherry picking' in analytics?

A

Cherry-picking in data analytics refers to the selective and biased extraction of data or information for analysis. This practice involves choosing specific data points or data sets that support a desired conclusion while ignoring other relevant data that may contradict or challenge that conclusion.
32
Q

what is the independent variable?

A

The independent variable is the variable that the researcher changes or controls. It stands alone and is not changed by the other variables you are trying to measure.
33
Q

what is the dependent variable?

A

A dependent variable is the variable that changes as a result of manipulating the independent variable. It is the outcome you are interested in measuring, and it “depends” on your independent variable. In statistics, dependent variables are also called response variables, because they respond to a change in another variable.
34
Q

what is regression analysis?

A

Regression analysis is a statistical method for examining the relationship between two or more variables of interest. There are many types of regression analysis, but at their core they all examine the influence of one or more independent variables on a dependent variable.
35
Q

what is a sampling error?

A

Error that arises because only a subset of the population is sampled. A sample should reflect the entire population, but it never matches it exactly, and the error is larger when the sample size is not large enough.
36
Q

sampling bias

A

The sampling method systematically underrepresents (or overrepresents) certain groups, so the sample is not representative of the population.
37
Q

cherry picking

A

Presenting only the results that support the desired conclusion and ignoring other findings. Results themselves are neither positive nor negative; they are simply results.
38
Q

measurement error

A
39
Q

type I error

A

Rejecting the null hypothesis when it is actually true (false positive).
40
Q

type II error

A

Failing to reject the null hypothesis when it is actually false (false negative).
41
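Q

p-value

A

The probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true. A p-value of 0.05 does not mean there is a 5% chance the result is false; it means that, if the null hypothesis were true, data this extreme would occur about 5% of the time.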
42