Statistical methods for chemical analysis Flashcards

1
Q

What are statistical methods for chemical analysis?

A
  • Data
  • Distributions
  • Associations
  • Graphical methods
  • Hypothesis testing
  • Averages
  • Power
2
Q

What are the different data types, and what does each one mean?

A

**Nominal/categorical**
o Data that you can put into a named category
- E.g., alive or dead

**Ordinal**
o Data that you can order (and categorise)
- E.g., mild/moderate/severe

**Interval/ratio**
o Data that has a measurement (and you can order and categorise)
- Interval: differences between measurements are equal, but there is no true zero, e.g., clock time, temperature in °C
- Ratio: has a true zero, so ratios between values are meaningful, e.g., height, weight, percentage, concentration

3
Q

What are the 5 rules for significant figures in measurements?

A
  1. All non-zero digits **are significant**
    e.g., 563 has 3 sig. fig.
  2. All zeros between non-zero digits **are significant**
    e.g., 24006 has 5 sig. fig.
    2.404 has 4 sig. fig.
  3. Leading zeros are **not significant**
    e.g., 0.0063 has 2 sig. fig.
  4. Trailing zeros in a number with no decimal point are **not significant**
    e.g., 420 has 2 sig. fig.
  5. Trailing zeros are significant when the number contains a decimal point
    e.g., 420.0 has 4 sig. fig.
4
Q

Data is based on measurements that are uncertain.
Not all digits have meaning (are significant), and only those digits derived from a measurement should be written down. For instance, trailing zeros, if written, must have meaning.

When adding/subtracting and multiplying/dividing, what must you ensure about significant figures?

A

**Adding and subtracting:** the answer must reflect the reliability of the least precise number
2.2 + 2.66 = 4.9 (calculator 4.86; rounded to the fewest decimal places among the inputs, here 1 d.p.)

* Multiplication and division: report the answer with the least number of significant figures among the inputs
* 14 is not the same as 14.0: same value, but different meanings about its trustworthiness

o 2.5 x 3.42 = 8.6 (calculator 8.55; 2 s.f.)
o 3.10 x 4.520 = 14.0 (calculator 14.012; 3 s.f.)
o 5.042 x 20 = 100 (calculator 100.84); note 1 s.f. in answer
o 5.042 x 20.0 = 101 (calculator 100.84); note 3 s.f. in answer
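
The worked examples above can be checked with a short Python sketch. The helper names `round_sig` and `round_dp` are made up for illustration, and the tie-breaking rule (round half to even) is an assumption, since the card does not say which convention to use. `Decimal` is used so that values such as 8.55 are stored exactly.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def round_sig(value: Decimal, sig: int) -> Decimal:
    """Round a Decimal to `sig` significant figures (hypothetical helper)."""
    if value == 0:
        return value
    # adjusted() is the power of ten of the leading digit, e.g. 1 for 14.012
    exponent = value.adjusted() - (sig - 1)
    return value.quantize(Decimal(1).scaleb(exponent), rounding=ROUND_HALF_EVEN)


def round_dp(value: Decimal, places: int) -> Decimal:
    """Round a Decimal to a fixed number of decimal places (hypothetical helper)."""
    return value.quantize(Decimal(1).scaleb(-places), rounding=ROUND_HALF_EVEN)


# Addition: keep the fewest decimal places among the inputs (1 d.p. here)
print(round_dp(Decimal("2.2") + Decimal("2.66"), 1))

# Multiplication: keep the fewest significant figures among the inputs
print(round_sig(Decimal("2.5") * Decimal("3.42"), 2))
print(round_sig(Decimal("3.10") * Decimal("4.520"), 3))
print(round_sig(Decimal("5.042") * Decimal("20"), 1))
print(round_sig(Decimal("5.042") * Decimal("20.0"), 3))
```

Note that `Decimal` shows the 1 s.f. result in exponent form, which makes the single significant figure explicit.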

5
Q

How would you round in each of the following instances, where the digit being dropped is:
* Less than 5
* Greater than 5
* 5
* Exactly 5 (followed only by zeros)

A
6
Q

What is the relationship between accuracy and precision?

A

There isn’t one: a set of measurements can be precise without being accurate, and accurate without being precise

7
Q

What is accuracy?

A

A measure of the average difference between the experimental value and the true value

Differences are due to systematic errors

The true value must be known

8
Q

Every measurement has an associated uncertainty.
What is precision?

A

How close measurements are to each other

The differences are due to random errors

The distribution of the repeated measurements is **Gaussian (normal)**

9
Q

How is the normal distribution of data described?

A

By its mean (the centre) and its standard deviation (the spread)

10
Q

When is a histogram used?

A

To show how data are distributed. For normally distributed data a large sample size is better, as the histogram then approaches a bell-shaped curve

11
Q

Sometimes a histogram can have skewed data

A
12
Q

What kind of graphical method is used to compare groups and distributions?

A

Box and whisker plot

13
Q

What are the different averages and what is each one?

A
14
Q

What is the equation for calculating the mean?

A
15
Q

What is the mean?
How is it calculated?
How is it represented in a box and whisker plot?

A
16
Q

What is the mode and what type of thing do you have to look out for?

A
17
Q

If a statistical test is carried out and it gives p<0.05, what does this mean?

A

P<0.05 means that, if the two data sets really came from the same distribution, a difference at least this large would be expected less than 5% of the time. This suggests that they are different sets

18
Q

Draw a box and whisker plot. What does each aspect of it represent, and how would the minimum and maximum otherwise be written?

A
19
Q

For standard deviations:
* What kind of data is it used for?
* What is it a measure of?
* What does it describe?

A
20
Q

What is the equation for the standard deviation and variance?

A
21
Q

What is the **95% reference range**?

A

**Standard deviation**
o Gives an indication of spread
o 95% of observations lie within the mean +/- 2 sd (actually 1.96 sd)
o This is the 95% reference (normal) range: expect 95% of the samples in the data set to be within this range
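
As a rough illustration of the range described above, here is a minimal Python sketch that computes mean +/- 1.96 sd from a list of observations. The function name `reference_range` and the example values are invented for illustration, and the calculation assumes the data are approximately normally distributed.

```python
from statistics import mean, stdev


def reference_range(values, k=1.96):
    """Return (lower, upper) as mean -/+ k sample standard deviations."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 in the denominator)
    return m - k * s, m + k * s


# Made-up measurements, for illustration only
observations = [4.8, 5.0, 5.2, 5.1, 4.9]
lower, upper = reference_range(observations)
print(f"95% reference range: {lower:.2f} to {upper:.2f}")
```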

22
Q

What is standard error?

A

The standard deviation of the means of repeated representative samples is known as the standard error

23
Q

Graphical representation of a standard error

A
24
Q

What is the 95% confidence interval?

A

Standard error is a standard deviation
o Of means, rather than of data observations
o 95% of means lie within the mean (of means) +/- 2 se (**95% confidence interval**)
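
The idea above can be sketched in Python. This sketch estimates the standard error from a single sample using the common formula se = sd / sqrt(n), which this card does not state, and builds the interval as mean +/- 1.96 se. The function names and example values are assumptions for illustration; for small samples a t-based multiplier would normally replace 1.96.

```python
from math import sqrt
from statistics import mean, stdev


def standard_error(values):
    """Estimate the standard error of the mean as sd / sqrt(n)."""
    return stdev(values) / sqrt(len(values))


def confidence_interval_95(values, k=1.96):
    """Return (lower, upper) as mean -/+ k standard errors."""
    m = mean(values)
    se = standard_error(values)
    return m - k * se, m + k * se


# Made-up measurements, for illustration only
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(standard_error(sample))
print(confidence_interval_95(sample))
```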

25
Q

What is the equation for calculating standard error?

A
26
Q

Error bars, deciding when to use them in your data

A
27
Q

What is a normal distribution data set?

A

Continuous data follow a mathematical distribution, usually a normal distribution

28
Q

What do parametric tests rely on?

A

Parametric tests rely on the data being normally distributed, so plot your data to check

29
Q

What can you use if your data is not normally distributed?

A

If your data is not normally distributed you may be able to transform it mathematically, or use a non-parametric test. E.g., log the data values, plot, and test for normality

30
Q

What does the central limit theorem suggest?

A

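The central limit theorem suggests that you can usually use parametric tests if you have a large sample size (>30), because the means of large samples tend towards a normal distribution even when the underlying data do not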

31
Q

When should a non-parametric test be used?

A

Non-parametric tests do not assume a particular distribution/normal distribution. Use these if your data is better represented by a median than a mean

32
Q

Parametric tests normally assume that the variances in the sets of data are homogeneous (homoscedastic). What can be done to support this?

A

o Use an F test to check
o **If in doubt, use a non-parametric test**

33
Q

What are F tests?

A
  • F test looks to see if the ratio of the variances falls outside an expected level
  • Depends on the degrees of freedom (n-1) in each group and the variances (s²)
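
A minimal sketch of the quantities named above, in Python: it computes the ratio of the two sample variances and the degrees of freedom for each group. The function name `f_ratio` is hypothetical, and putting the larger variance on top is a common convention assumed here rather than something the card states. The resulting F would still need comparing against a critical value from F tables or statistical software.

```python
from statistics import variance


def f_ratio(group_a, group_b):
    """Return (F, df_numerator, df_denominator) with the larger variance on top."""
    var_a = variance(group_a)  # sample variance s^2 (n - 1 in the denominator)
    var_b = variance(group_b)
    if var_a >= var_b:
        return var_a / var_b, len(group_a) - 1, len(group_b) - 1
    return var_b / var_a, len(group_b) - 1, len(group_a) - 1


# Made-up data, for illustration only; both groups must have non-zero variance
method_1 = [1.0, 2.0, 3.0, 4.0, 5.0]
method_2 = [2.0, 4.0, 6.0, 8.0, 10.0]
print(f_ratio(method_1, method_2))
```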
35
Q

When doing hypothesis tests, what is the first thing to consider?

A

Need to consider whether the data is independent (unpaired) or dependent (paired)

o Patients given treatment vs patients given placebo
- 2 sets of independent data

o Patients measured at baseline and then after treatment
- 1 set of data (the differences), which is often normally distributed even if the original data was not

36
Q

What is a null hypothesis?

A
  • The null hypothesis H0 assumes that there will be no observed difference because of an experiment
  • The statistical test aims to look for evidence against the null hypothesis: a result so different from the distribution expected under the null hypothesis that we believe it has not occurred just by chance
  • For example, if a result falls into the extremes of the distribution we might be prepared to reject the null hypothesis
  • If the result does not fall into the extremes of the distribution we cannot reject the null hypothesis, but that does not mean that we accept the null hypothesis
37
Q

What is the alternative hypothesis?

A
  • The alternative hypothesis H1 assumes that there will be an observed difference as a result of an experiment
  • If what we see is not representative of the null hypothesis distribution, then we reject the null and accept the alternative hypothesis
  • P<0.05: less than a 5% chance of seeing a result this extreme if the null hypothesis were true
  • The result falls outside the 95% confidence interval
38
Q

What is the general equation for a test statistic, and what are some examples of statistical tests that can be done?

A
  • All statistical tests involve calculating a test statistic
  • The test statistic is compared with a particular distribution
  • E.g., F test, t-test, chi-squared test etc.
39
Q

Deciding what statistical test to use…

A
40
Q

For a t-test (or Student's t-test), what does the distribution describe, and what are t-tests used to compare?

A
  • The t distribution describes sample data drawn from a normal distribution
  • As the amount of data increases, it approaches the normal distribution
  • T-tests are used to compare two sets of normally distributed data
41
Q

What are the 3 different forms of t-tests and the equations for each?

A

3 different forms of t-test
o **Independent samples** t-test compares means of 2 different groups
o **Paired samples** t-test compares means from the same group at different times
o **One sample** t-test compares the mean of a group against a known mean
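
The card's equations are not reproduced here, so the following is only a sketch of one common form: the independent samples t statistic with a pooled variance, which assumes both groups have similar variances. The function name `pooled_t` and the data are made up for illustration. The statistic would then be compared against t tables at the returned degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Return (t, degrees_of_freedom) for an equal-variance two-sample t-test."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance: each group's variance weighted by its degrees of freedom
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_err, df


# Made-up data, for illustration only
placebo = [1.0, 2.0, 3.0, 4.0, 5.0]
treated = [2.0, 3.0, 4.0, 5.0, 6.0]
print(pooled_t(placebo, treated))
```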

42
Q

How do you calculate the degrees of freedom for multiple data sets?

A

Calculating degrees of freedom of samples: (number in sample A + number in sample B) - number of different data sets

Degrees of freedom = n-1 **(for one data set)**

43
Q

One-tailed or two-tailed tests
What percentage do they lie in, in the normal distribution curve?

A
44
Q

If there are more than 2 groups to test, what is used?

A

ANOVA

45
Q

What is ANOVA?
What is it used to compare?

A
  • ANOVA (analysis of variance)
  • Used to compare multiple groups in a single test- an extension of the t-test
46
Q

What are the different types of ANOVA test you can have and what is each one used for?

A

* One-way ANOVA: compares the means of 3 or more independent groups defined by a single independent variable

* MANOVA: tests the effect of one or more independent variables on two or more dependent variables
o E.g., repeated measures over time in treated and placebo groups

* Null: all sample means are identical

* **Alternate:** at least one sample mean is significantly different

47
Q

When the term ‘power’ is used in stats, what is this describing and what is a good level of power?

A

Power: how many samples do I need to test?
* Do I have enough power?
o Is my sample size large enough to detect a significant difference where a difference truly exists (although the truth is not known to you)?

* Questions to ask
o What power do I need? Do I want to be 80% sure (80% power) that I will detect a difference in my test, if one really exists, or 90% sure?
o Power = 1 - beta, where beta is the probability of a type II error (a false negative)
o The higher the power, the more samples I will need

48
Q

What level of significance do I want to set?

If we decide that something that occurs less than 5% of the time in an experiment is unlikely to be due to chance, then we set the p value at what, and alpha becomes what?

If we feel we need to be more certain that this is not a chance event, then we should set the p and alpha values to what?

A

What level of significance do I want to set?

If we decide that something that occurs less than 5% of the time in an experiment is unlikely to be due to chance, then we set **p<0.05**
- **Alpha = 0.05**

If we feel we need to be more certain that this is not a chance event, then we should set **p<0.01**
- **Alpha = 0.01**

**The lower the p value set, the more samples we will need to detect a difference where one truly exists**

49
Q

When is power the greatest?

A

When the variability is reduced

50
Q

What different things may power be?

A
  • Power is the probability of rejecting the null hypothesis when in fact the null hypothesis is false
  • Power is the probability of making a correct decision (to reject the null hypothesis) when the null hypothesis is false
  • Power is the probability that a test of significance will pick up on an effect that is present
  • Power is the probability that a test of significance will detect a deviation from the null hypothesis, should such a deviation exist
  • Power is the probability of avoiding a type II error (a false negative)
51
Q

What is the equation for calculating power?

A
52
Q

With power there are type I and type II errors, what are each?

A
53
Q

With associations how is it decided what statistical test to use?

A
54
Q

For categorical data the chi-squared test can be used. What is the equation for this test?

A
55
Q

Associations- observed data

A
56
Q

Associations- expected data

A
57
Q

Associations- calculations

A
58
Q

The associations flow chart when looking at relationships

A
59
Q

How is correlation measured and what are the different types?

A
60
Q

Associations- plotting data

A
61
Q

Associations method comparison…

A
62
Q

What is linear regression and what is used to fit the line?

A
  • The line is fitted using the **least squares method**
  • Minimises the sum of squares of the residuals (the vertical difference of a point from a fitted line)
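
The least squares fit described above has a closed-form solution for a straight line, sketched below in Python. The function name `fit_line` and the data are invented for illustration.

```python
def fit_line(x, y):
    """Return (slope, intercept) of the least squares line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals(x, y, slope, intercept):
    """Vertical differences between each point and the fitted line."""
    return [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]


# Made-up data lying exactly on y = 2x + 1, for illustration only
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept, residuals(xs, ys, slope, intercept))
```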
63
Q

Predictions from associations

A
64
Q

With associations there are r and R squared. What is each of these?

A
  • r is the correlation coefficient
    o Indicates the strength of the relationship between two variables
    o Ranges from -1 to +1, where 0 is no correlation
  • **R squared is the coefficient of determination**
    o Indicates how well the x variable can be used to predict the variable on the y axis
    o Ranges from 0 (poor predictor) to 1 (excellent predictor)
    o R squared = 0.8 implies that the x (independent) variable explains 80% of the variation seen in the y (outcome) variable
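
To make the two quantities concrete, here is a small Python sketch that computes Pearson's r from paired data. For a straight-line fit with one x variable, R squared equals r squared, which is the relationship the sketch relies on. The function name `pearson_r` and the data are made up for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    return s_xy / sqrt(s_xx * s_yy)


# Made-up data, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
r = pearson_r(xs, ys)
r_squared = r ** 2  # valid for simple (one-predictor) linear regression
print(round(r, 3), round(r_squared, 3))
```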
65
Q

In HPLC the principles of linear regression are used to predict the concentration of an analyte based on a standard curve

In terms of validation of the method it is also important to determine the **limit of detection (LOD)** and the **limit of quantification (LOQ)**, and this can be done easily in Excel

What are the LOD and LOQ?

A

o LOD is the lowest amount of analyte that can be detected
o LOQ is the lowest amount of analyte that can be quantified with reasonable accuracy and precision
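
The card does not say how the two limits are calculated. One widely used convention (described in the ICH Q2 validation guideline) estimates them from the calibration curve as LOD = 3.3 x sigma / slope and LOQ = 10 x sigma / slope, where sigma is the standard deviation of the response (for example of the blank or of the regression residuals). The Python sketch below assumes that convention; the function name `lod_loq` and the numbers are invented for illustration.

```python
def lod_loq(sigma, slope):
    """Return (LOD, LOQ) assuming the 3.3 * sigma / slope and 10 * sigma / slope convention."""
    if slope == 0:
        raise ValueError("calibration slope must be non-zero")
    lod = 3.3 * sigma / slope
    loq = 10 * sigma / slope
    return lod, loq


# Made-up values: sigma in detector response units, slope in response per unit concentration
lod, loq = lod_loq(sigma=0.5, slope=10.0)
print(f"LOD = {lod:.3f}, LOQ = {loq:.3f} (concentration units)")
```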

66
Q

Graphing the data-HPLC analysis of caffeine

A
67
Q

Part 2

A

Part 3

68
Q

Part 4

A