Statistics 1 Flashcards

1
Q

What is the definition of a population?

A

Every member of a group that has the selected characteristics and shares a common property, e.g. within a specific region

2
Q

What is the definition of a sample?

A

A representative subset of a given population, whose members are independent of one another and chosen at random

3
Q

What is the difference between the response (dependent) variable and the explanatory (independent) variable?

A

The response (dependent) variable is the one of interest in an experiment; it changes in response to another factor, the explanatory (independent) variable.

4
Q

What are the two sub-sets of qualitative data?

A

Nominal Data - categorical information that lacks inherent order or ranking

Ordinal Data - information with an order or ranking, but where the differences between values are not quantifiable, e.g. survey responses or educational levels

5
Q

What are the two sub-sets of quantitative data?

A

Discontinuous (discrete) - obtained by counting, so values are integers

Continuous - (Most used) obtained by measurement e.g. height, BMI

6
Q

What type of data is:
Number of carbon atoms in a molecule

A

Discontinuous quantitative

7
Q

What type of data is:
Mass of a chemical compound weighed on a balance

A

Continuous quantitative

8
Q

What type of data is:
Absorbance measured using a spectrophotometer

A

Continuous quantitative

9
Q

What type of data is:
Gender of students in a class

A

Nominal Qualitative

10
Q

What type of data is:
Educational levels of students in a class

A

Ordinal Qualitative

11
Q

Define Accuracy

A

Closeness of measurements to the true value

12
Q

Define precision

A

Closeness of repeated measurements to each other
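
As a rough illustration of the difference between accuracy and precision, the sketch below scores a set of repeated readings against a known reference value. The function names and the balance readings are invented for the example; using the distance of the mean from the true value for accuracy and the sample standard deviation for precision is one common convention, not the only one.

```python
import statistics

def accuracy_error(measurements, true_value):
    """Distance of the mean measurement from the true value (smaller = more accurate)."""
    return abs(statistics.mean(measurements) - true_value)

def precision_spread(measurements):
    """Sample standard deviation of the repeats (smaller = more precise)."""
    return statistics.stdev(measurements)

# Hypothetical balance readings (grams) for a 10.00 g reference mass
readings = [10.21, 10.19, 10.20, 10.22, 10.18]

print(accuracy_error(readings, 10.00))  # about 0.20 g: consistently off target
print(precision_spread(readings))       # about 0.016 g: tightly clustered
```

These readings are precise but not accurate: they agree closely with each other while all sitting about 0.2 g above the true value.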

13
Q

Define Data Set

A

Collection of information based on an experiment or research question, collected in terms of observations and variables, ready to be processed, analyzed, distributed or shared.

14
Q

Define descriptive statistics

A

Statistics that summarize a set of data values in terms of center and spread

15
Q

What does the average show?

A

The general tendency of the data

16
Q

For which distribution of data does the mean give the true center?

A

Normally distributed data (for skewed data the median is usually a better measure of center)

17
Q

Define variance

A

Average squared deviation from the mean

18
Q

Define Standard Deviation

A

A measure of the variability or spread of the data around the sample mean; it is the square root of the variance

19
Q

Define Standard Error

A

An estimate of how far the sample mean is likely to be from the population mean (SD divided by the square root of n); it is used to calculate confidence intervals
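
The three spread measures on cards 17 to 19 can be worked through on a small sample. This is a minimal sketch using Python's standard library; the function name and the height values are invented for the example. It uses the sample variance, which divides by n - 1 rather than n, as is usual when the data are a sample rather than the whole population.

```python
import math
import statistics

def describe_spread(sample):
    """Return (variance, standard deviation, standard error) of a sample."""
    n = len(sample)
    variance = statistics.variance(sample)  # sample variance, divides by n - 1
    sd = math.sqrt(variance)                # standard deviation
    se = sd / math.sqrt(n)                  # standard error of the mean
    return variance, sd, se

# Hypothetical heights in cm
heights_cm = [160, 165, 170, 175, 180]
variance, sd, se = describe_spread(heights_cm)
print(variance)  # 62.5
print(sd)        # about 7.91
print(se)        # about 3.54
```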

20
Q

What is the Confidence Interval?

A

A range of values, calculated from the sample, that is expected to contain the true population value a stated percentage of the time (e.g. 95%) if the test were repeated with different samples
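
A confidence interval for a mean can be sketched as the mean plus or minus a multiple of the standard error. The version below assumes the normal approximation (about 1.96 standard errors for 95%); with small samples a t-based multiplier would give a wider interval. The function name and data are invented for the example.

```python
import math
import statistics

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the mean of a sample."""
    n = len(sample)
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    # z multiplier for the chosen confidence level (about 1.96 for 95%)
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * se, mean + z * se

# Hypothetical heights in cm
heights_cm = [160, 165, 170, 175, 180]
low, high = mean_confidence_interval(heights_cm)
print(low, high)  # about 163.07 and 176.93
```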

21
Q

Give the basic principles of the coefficient of variation (CoV)

A
  • The larger the number, the larger the spread relative to the mean
  • Normally expressed as a percentage of the mean
  • Useful for comparisons of 2 data sets in different units
22
Q

Give the formula for the coefficient of variation (CoV)

A

CoV = (SD/mean)*100
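
The formula can be applied to two data sets in different units to compare their relative spread. This is a small sketch; the function name and both data sets are invented for the example.

```python
import statistics

def coefficient_of_variation(sample):
    """CoV as a percentage: (SD / mean) * 100."""
    return statistics.stdev(sample) / statistics.mean(sample) * 100

# Hypothetical data in different units
heights_cm = [160, 165, 170, 175, 180]
masses_kg = [50, 60, 70, 80, 90]

print(coefficient_of_variation(heights_cm))  # about 4.65 (%)
print(coefficient_of_variation(masses_kg))   # about 22.59 (%)
```

Although the units differ, the percentages are directly comparable: here the masses vary far more, relative to their mean, than the heights do.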

23
Q

Define H0

A

The null hypothesis - there is no correlation/difference/association

24
Q

Define H1

A

The alternative hypothesis - there is a correlation/difference/association
H1 and H0 are mutually exclusive

25
Q

What is the P value?

A

The probability (chance) of obtaining results at least as extreme as those observed, assuming the null hypothesis is true. 0.05 (5%) is the conventional statistical cut-off for rejecting H0.

26
Q

What is the true cutoff for the P value?

A

0.05/number of predictor variables tested (the Bonferroni correction for multiple comparisons)
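
Dividing the cut-off by the number of tests is the Bonferroni correction. The sketch below applies it to a list of P values; the function names and the P values are invented for the example.

```python
def bonferroni_cutoff(alpha, n_tests):
    """Adjusted cut-off: alpha divided by the number of tests performed."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests

def significant_after_correction(p_values, alpha=0.05):
    """Flag which P values fall below the adjusted cut-off."""
    cutoff = bonferroni_cutoff(alpha, len(p_values))
    return [p < cutoff for p in p_values]

# Hypothetical P values for five predictor variables
p_values = [0.001, 0.02, 0.04, 0.3, 0.012]
print(bonferroni_cutoff(0.05, len(p_values)))  # 0.01
print(significant_after_correction(p_values))  # only the first remains significant
```

Three of these P values are below 0.05 but above the adjusted cut-off of 0.01, so they would no longer count as significant.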

27
Q

Why is it best to use two-tailed tests rather than one-sided tests?

A

A hypothesis can be either one-sided or two-sided. A two-tailed test checks for statistical significance in both directions; if you only test in one direction, you may miss an effect in the other direction.
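
The point can be seen by converting a test statistic into a P value both ways. This sketch assumes a z statistic from a standard normal distribution and a one-sided hypothesis that the effect is positive; the function names are invented for the example.

```python
from statistics import NormalDist

def one_tailed_p(z):
    """P value for the one-sided hypothesis that the effect is positive."""
    return 1 - NormalDist().cdf(z)

def two_tailed_p(z):
    """P value for an effect in either direction."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

# A strong effect in the unexpected (negative) direction
z = -2.5
print(one_tailed_p(z))  # about 0.994: the one-sided test sees nothing
print(two_tailed_p(z))  # about 0.012: the two-tailed test detects it
```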

28
Q

What is the odds ratio?

A

A value indicating the strength of the relationship between 2 variables in the data. It compares the relative odds of the occurrence of the outcome of interest (cancer vs no cancer), given exposure to the variable of interest (age)

29
Q

What does the Odds Ratio mean in relation to 1?

A
  • OR = 1 variable does not affect the odds of the outcome
  • OR > 1 variable associated with higher odds of the outcome (increases the risk of the response variable)
  • OR < 1 variable associated with lower odds of the outcome (decreases the risk of the response variable)
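
An odds ratio can be computed from a 2x2 table of exposure against outcome. This is a minimal sketch; the function name, argument names, and counts are invented for the example.

```python
def odds_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    """Odds of the outcome in the exposed group divided by odds in the unexposed group."""
    odds_exposed = exposed_cases / exposed_controls
    odds_unexposed = unexposed_cases / unexposed_controls
    return odds_exposed / odds_unexposed

# Hypothetical counts: 30 of 100 exposed people have the outcome, 10 of 100 unexposed
print(odds_ratio(30, 70, 10, 90))  # about 3.86: exposure is associated with higher odds
print(odds_ratio(10, 90, 10, 90))  # 1.0: exposure makes no difference
```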
30
Q

What is the Z score ?

A

The log of the odds ratio divided by the standard error of the log odds ratio: z = ln(OR) / SE(ln OR)
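
For an odds ratio, the z score is usually computed on the log scale, where the standard error of ln(OR) for a 2x2 table is the square root of the sum of the reciprocals of the four cell counts. The sketch below follows that approach; the function name and the counts are invented for the example.

```python
import math

def odds_ratio_z(a, b, c, d):
    """z score for a 2x2 table: a, b = exposed cases/controls; c, d = unexposed cases/controls."""
    log_or = math.log((a * d) / (b * c))            # ln(OR)
    se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)    # standard error of ln(OR)
    return log_or / se_log_or

# Hypothetical counts, giving an odds ratio of about 3.86
print(odds_ratio_z(30, 70, 10, 90))  # about 3.39
```

A z score beyond roughly 1.96 in either direction corresponds to a two-tailed P value below 0.05.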

31
Q

What are statistical tests used for?

A

To assess how likely the observed data would be if the null hypothesis were true, and so decide whether to reject H0

32
Q

When would you use a z-test?

A

When the sample size is large (n≥30) and/or the population variance is known

33
Q

When would you use a t-test?

A

When the sample size is small (n<30) and/or the population variance is unknown

34
Q

When would you use Chi-squared?

A

Goodness of fit - examine whether the observed results are in line with the expected values (categorical data)
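
The Chi-squared statistic sums, over the categories, the squared difference between observed and expected counts divided by the expected count. This is a small sketch of that calculation; the function name and the counts are invented for the example.

```python
def chi_squared_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical cross: 400 offspring, expected in a 3:1 ratio
observed = [290, 110]
expected = [300, 100]
print(chi_squared_statistic(observed, expected))  # about 1.33
```

With 2 categories there is 1 degree of freedom, for which the critical value at P = 0.05 is 3.84, so a statistic of about 1.33 gives no reason to reject the expected ratio.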

35
Q

When would you use Fisher's Exact test?

A

Test of association - gauge whether there is a significant difference between the proportions of the categories in two grouping variables; used in place of Chi-squared when sample sizes are small

36
Q

When would you use an F-test?

A

Compare variances of 2 samples or the ratio of variances between multiple groups

37
Q

When would you use ANOVA?

A

Uses F-tests to statistically test the equality of means across 3 or more groups of a quantitative variable
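
The F statistic in a one-way ANOVA is the ratio of the variance between group means to the variance within groups. The sketch below computes it by hand; the function name and the three groups are invented for the example.

```python
import statistics

def one_way_anova_f(groups):
    """F statistic: mean square between groups / mean square within groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Hypothetical measurements for three groups
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
print(one_way_anova_f(groups))  # about 13.0
```

A large F means the group means differ by more than the scatter within each group would explain.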

38
Q

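When would you use a Wilcoxon Rank test?

A

Compare two groups (rank-sum for independent samples, signed-rank for paired samples) when the data is not normally distributed; the Kruskal-Wallis test is the equivalent for 3 or more groups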

39
Q

What does the result of a t-statistic mean?

A

The higher the absolute value of t, the lower the chance that the two sample means are from the same population

The higher the absolute value of t, the more likely it is that the two sample means are different.
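
A t-statistic for two samples is the difference between their means divided by the standard error of that difference. The sketch below uses the unequal-variance (Welch) form, which is one common choice; the function name and both samples are invented for the example.

```python
import math
import statistics

def welch_t_statistic(sample_a, sample_b):
    """Difference in means divided by the standard error of the difference."""
    mean_diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    se_diff = math.sqrt(
        statistics.variance(sample_a) / len(sample_a)
        + statistics.variance(sample_b) / len(sample_b)
    )
    return mean_diff / se_diff

# Hypothetical measurements from two groups
group_a = [5.1, 4.9, 5.0, 5.2, 4.8]
group_b = [5.6, 5.4, 5.5, 5.7, 5.3]
print(welch_t_statistic(group_a, group_b))  # about -5.0
```

The sign only shows which sample has the larger mean; it is the size of t (here about 5) that indicates how unlikely it is that both samples share the same population mean.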

40
Q

What is a Type I error?

A

False positive

Occurs when you reject H0 although it is actually true, e.g. due to data bias

41
Q

What is a Type II error?

A

False negative

Occurs when you fail to reject the null hypothesis although it is actually false, e.g. due to a lack of power