Chemometrics Test 1 Flashcards
What is a barplot
Displays the distribution of a categorical variable as bars, drawn horizontally or vertically
How to simply make a barplot
barplot(data)
Can include main, xlab and ylab for titles
Data should be counts grouped by a categorical variable (select the column with $ and count it, e.g. table(data$categorical))
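A minimal sketch of the call above, assuming a hypothetical data frame `samples` with a categorical column `type` (both names are made up for illustration). The column is counted with table() first, because barplot() needs numeric bar heights:

```r
# Hypothetical data: water source recorded for 8 samples
samples <- data.frame(
  type = c("tap", "river", "tap", "well", "river", "tap", "well", "tap")
)

# table() counts how often each category occurs
counts <- table(samples$type)

# One bar per category; main, xlab and ylab set the titles
barplot(counts, main = "Samples per source", xlab = "Source", ylab = "Count")
```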
Stacked vs grouped barplot and how to program
Needs a matrix rather than a vector - set the argument beside = FALSE for stacked bars (the default) or beside = TRUE for grouped bars
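A short sketch of both variants, assuming a hypothetical pass/fail count matrix `m` (the numbers are invented). The only change between the two calls is the beside argument:

```r
# Hypothetical counts: rows = result, columns = analyst
m <- matrix(c(8, 2, 6, 4, 9, 1), nrow = 2,
            dimnames = list(c("pass", "fail"), c("A", "B", "C")))

# beside = FALSE (default): one stacked bar per column
stacked <- barplot(m, beside = FALSE, legend.text = TRUE, main = "Stacked")

# beside = TRUE: one bar per cell, grouped by column
grouped <- barplot(m, beside = TRUE, legend.text = TRUE, main = "Grouped")
```

barplot() returns the bar midpoints, which is handy for adding labels or error bars afterwards.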
What are error bars on bar graphs
Typically the bar height (the mean) plus or minus the standard error
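One way to draw mean plus or minus standard error in base R is sketched below. The replicate values are hypothetical, and arrows() is just one common trick for drawing the caps, not the only option:

```r
# Hypothetical replicate measurements for three groups
values <- list(A = c(4.8, 5.1, 5.3, 4.9),
               B = c(6.2, 5.9, 6.4, 6.1),
               C = c(5.5, 5.2, 5.8, 5.6))

means <- sapply(values, mean)
# Standard error of the mean = sd / sqrt(n)
se <- sapply(values, function(v) sd(v) / sqrt(length(v)))

# barplot() returns the x position of each bar
mids <- barplot(means, ylim = c(0, max(means + se) * 1.1), ylab = "Mean")

# angle = 90 with code = 3 draws a flat cap at both ends of each segment
arrows(mids, means - se, mids, means + se, angle = 90, code = 3, length = 0.05)
```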
What is ggplot and how to use it
A plotting package (ggplot2) that builds graphs in layers
ggplot(data, aes(x = , y = )) + geom_point()
So you typically call ggplot() with your data and the x and y aesthetics, then add other layers with + such as geom_point(), geom_smooth() etc
How to make error bars in ggplot
Add a geom_errorbar(aes(ymin = , ymax = )) layer
What is jitter
Shows all points and adds small random offsets so that overlapping points are easier to see
What is a spinogram
A stacked bar plot with each bar scaled to a height of 1, so the segments show proportions (percentages)
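What is a box plot
Box and whisker plot - displays the median (line in the box), the upper and lower quartiles or hinges (edges of the box) and whiskers that extend to the most extreme points within 1.5 x IQR of the box; points beyond the whiskers are drawn as outliers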
What is a parallel boxplot
multiple boxplots displayed side by side - can use to see separation of groups
What is a notched boxplot
Box plot with a notch - the notch is a narrowing of the box around the median - its WIDTH is proportional to the interquartile range and inversely proportional to the square root of the sample size
The notch is an approximate confidence interval around the median (not the mean) - if the notches of two boxes DON'T overlap, that is strong evidence that their medians differ
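The notch limits can be checked by hand. This sketch assumes the formula documented for R's boxplot.stats(), median +/- 1.58 x IQR / sqrt(n), and uses simulated data:

```r
set.seed(1)
x <- rnorm(50, mean = 10, sd = 2)  # hypothetical sample

stats <- boxplot.stats(x)

# Distance between the upper and lower hinge (edges of the box)
hinge_spread <- diff(stats$stats[c(2, 4)])

# Notch limits computed by hand
manual <- median(x) + c(-1.58, 1.58) * hinge_spread / sqrt(length(x))

# Same limits as R reports in stats$conf
print(rbind(manual = manual, from_R = stats$conf))

boxplot(x, notch = TRUE, main = "Notched boxplot")
```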
What is a violin plot and how to plot it
library(vioplot), then vioplot(data)
KERNEL density plots superimposed in a mirror image over a box plot (the box plot sits on the inside, drawn with black and white lines)
What's a histogram and how to plot
Displays the distribution of a continuous variable by dividing its range into bins and counting the observations in each bin - hist(data)
What can you put in addition to a histogram
Can overlay a probability density curve or a fitted normal curve
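A sketch of both overlays on simulated data (the variable names are illustrative). Note freq = FALSE, which puts the histogram on the density scale so the curves line up with the bars:

```r
set.seed(42)
obs <- rnorm(200, mean = 50, sd = 5)  # hypothetical measurements
m <- mean(obs)
s <- sd(obs)

# freq = FALSE: bar areas sum to 1 instead of showing raw counts
h <- hist(obs, freq = FALSE, main = "Histogram with overlays", xlab = "Value")

# Kernel density estimate (solid line)
lines(density(obs), lwd = 2)

# Normal curve using the sample mean and sd (dashed line)
curve(dnorm(x, mean = m, sd = s), add = TRUE, lty = 2)
```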
What is a kernel density plot
A smoothed estimate of the probability density of a continuous variable
Dot plot
Dots, with a categorical variable on one axis and a continuous variable on the other
Scatter plots
Points, with continuous variables on both axes
What are grouping and faceting
Faceting displays groups of observations in separate side by side plots; grouping displays two or more groups of observations in a single plot
Descriptive vs inferential stats
Descriptive summarizes the data itself (eg what is the mean, mode, stdev etc)
Inferential draws conclusions about the population from the sample (eg these two populations are significantly different with regard to this variable)
5 types of descriptive stats
Frequency (how often), central tendency (mean, median, mode), dispersion (stdev, variance, range), position (relative position eg quartiles), shape (skewness and kurtosis)
What are DOF and how are they determined
DOF (degrees of freedom) = the number of independent pieces of data used in the evaluation: n - #, where # is the number of parameters estimated from the data
Talk about skewness and kurtosis
Skewness is a measure of the degree of asymmetry - 0 if symmetrical
Kurtosis is a measure of peakedness - 3 is normal, higher means strongly peaked, lower (negative when reported as excess kurtosis, ie kurtosis minus 3) means flat
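Both can be computed from the moments. The sketch below uses the plain moment definitions, which is an assumption: packages use slightly different conventions, and some report excess kurtosis (this value minus 3).

```r
# Moment-based skewness: third central moment / variance^1.5
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Moment-based kurtosis: fourth central moment / variance^2 (normal = 3)
kurtosis <- function(x) {
  m <- mean(x)
  mean((x - m)^4) / mean((x - m)^2)^2
}

symmetric <- c(1, 2, 3, 4, 5)
right_tail <- c(1, 1, 2, 2, 3, 10)

print(skewness(symmetric))   # 0 for symmetric data
print(skewness(right_tail))  # positive: tail to the right
print(kurtosis(symmetric))   # below 3: flatter than normal
```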
TWO important notes about inferential stats
Each replicate in a condition is assumed to be independent
The larger the sample size, the more likely the statistic is to indicate that a difference exists (even a small one)
Steps for significance testing (7)
1 State the null hypothesis
2 State the alternative hypothesis
3 Check if the distribution is normal
4 Select the appropriate test
5 Choose the level of significance and the number of tails
6 Calculate the test statistic
7 Obtain the critical value for the test and compare it with the test statistic
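The seven steps can be walked through in R. This is a sketch on invented data for a two-sided, equal-variance t test; the choice of test and alpha = 0.05 are assumptions for the example:

```r
# Hypothetical measurements from two methods
a <- c(10.1, 10.3, 9.9, 10.2, 10.0, 10.4)
b <- c(10.6, 10.8, 10.5, 10.9, 10.7, 10.6)

# 1-2: H0: the means are equal; H1: the means differ
# 3: check normality of each group (p > 0.05: no evidence against normality)
normal_p <- c(a = shapiro.test(a)$p.value, b = shapiro.test(b)$p.value)

# 4-5: two-sample t test, alpha = 0.05, two tails
alpha <- 0.05
res <- t.test(a, b, var.equal = TRUE, alternative = "two.sided")

# 6: test statistic
t_stat <- unname(res$statistic)

# 7: critical value from the t distribution, compared with |t|
t_crit <- qt(1 - alpha / 2, df = unname(res$parameter))
reject_null <- abs(t_stat) > t_crit

print(c(t_stat = t_stat, t_crit = t_crit))
print(reject_null)
```

Comparing |t| with the critical value gives the same decision as comparing the p value with alpha.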
Null vs alternative hypothesis
Null means there is no difference, alternative means there is a difference
What do tails mean (1 vs 2)
Whether we are interested in a change in just one direction (one-tailed) or in either direction (two-tailed)
What is level of significance
The probability of rejecting the null hypothesis when it is actually true (ie the risk we accept that what we are claiming is not true)
What is a crit value
The threshold the test statistic is compared against to determine if the result is significant or not
What is COHEN'S D
An effect size - measures the size of the difference between two means in units of standard deviation (trivial, small, more than a standard deviation etc)
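A minimal sketch of Cohen's d for two independent groups, assuming the pooled standard deviation version (other variants exist). The helper name `cohens_d` and the data are made up for illustration:

```r
# Cohen's d = difference in means / pooled standard deviation
cohens_d <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  pooled_sd <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  (mean(x) - mean(y)) / pooled_sd
}

x <- c(2, 4, 6)
y <- c(1, 3, 5)

# Means differ by 1 and the pooled sd is 2, so d is half a standard deviation
print(cohens_d(x, y))
```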
How does variance play into t tests of two independent data sets
Equal variance t test assumes the stdev of each group arises from the same population (the variances are pooled)
UNEQUAL variance t test (Welch) - used when the stdevs of the groups are significantly different
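In R the two versions differ only in the var.equal argument of t.test(). A sketch on invented data where the second group is clearly more spread out:

```r
# Hypothetical groups with visibly different spread
g1 <- c(5.1, 4.9, 5.0, 5.2, 4.8)
g2 <- c(5.9, 4.2, 6.8, 3.9, 6.1, 5.5)

# Equal variance (Student) t test: pooled variance, df = n1 + n2 - 2
student <- t.test(g1, g2, var.equal = TRUE)

# Unequal variance (Welch) t test: R's default, with reduced df
welch <- t.test(g1, g2, var.equal = FALSE)

print(c(student_df = unname(student$parameter),
        welch_df = unname(welch$parameter)))
```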
What is Student's paired t test
Compares related samples (eg time points of the same test subject)
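A sketch with hypothetical before/after values for the same five subjects. A paired t test is the same as a one-sample t test on the differences, which the second call shows:

```r
# Hypothetical measurements on the same 5 subjects at two time points
before <- c(12.1, 11.8, 12.5, 12.0, 11.9)
after  <- c(12.6, 12.1, 12.9, 12.2, 12.4)

# Paired test: works on the per-subject differences
paired <- t.test(after, before, paired = TRUE)

# Equivalent one-sample test of the differences against 0
one_sample <- t.test(after - before, mu = 0)

print(c(paired = paired$p.value, one_sample = one_sample$p.value))
```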
Ways to test for normal dist
Shapiro-Wilk test - p > 0.05 means no evidence against a normal dist (treat as normal)
Anderson-Darling test - p > 0.05 means no evidence against a normal dist
Test graphically (histogram, boxplot, QQ plot (quantile-quantile - theoretical vs actual))
Skew and kurtosis
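A sketch of the Shapiro-Wilk test and a QQ plot on simulated data. Anderson-Darling is not in base R; it is available in add-on packages such as nortest (ad.test), which is not loaded here:

```r
set.seed(7)
normal_like <- rnorm(100)  # simulated normal data
skewed <- rexp(100)        # simulated right-skewed data

# Shapiro-Wilk: H0 is that the data come from a normal distribution
p_normal <- shapiro.test(normal_like)$p.value
p_skewed <- shapiro.test(skewed)$p.value
print(c(normal_like = p_normal, skewed = p_skewed))

# QQ plot: points should follow the line if the data are normal
qqnorm(skewed)
qqline(skewed)
```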
What's a bimodal distribution
A distribution with two modes (two peaks)
stdev vs %RSD vs Variance
stdev is the standard deviation, %RSD is (stdev/mean) x 100 and variance is stdev squared
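The three quantities side by side, on a small made-up vector:

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)  # hypothetical replicate results

s   <- sd(x)              # standard deviation (n - 1 in the denominator)
rsd <- 100 * s / mean(x)  # %RSD: stdev relative to the mean
v   <- var(x)             # variance = stdev squared

print(c(stdev = s, percent_rsd = rsd, variance = v))
```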
What's the range of skew and what does it mean
Unacceptable is greater than 1 or less than -1; 0 is symmetrical; positive skew means tailing to the right (negative skew means tailing to the left)
Range of kurtosis
3 is normal - values well below or well above 3 are unacceptable
What does the describe function in psych do
Gives the main descriptive stats in one table (n, mean, sd, median, min, max, range, skew, kurtosis, se etc)
How does the test statistic relate to the crit value
Test statistic > crit value means the null is rejected (the test statistic is calculated from the data, the crit value is found in a table)
How can t tests be used (variations)
One sample mean tested against a specific value
2 independent means (key point: there are 2 means and 2 stdevs, and the stdevs may differ, so an equal variance check is needed first)
One-tailed vs two-tailed depending on which direction of variation you care about
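How to check if the variance between two groups is equal or not
Levene's test (null hypothesis is that the group variances are equal - so p > 0.05 means equal variance can be assumed), F test in R (var.test); if the variances are not equal, use Welch's t test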
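A sketch of the F test in base R on invented data. Bartlett's test is also in base R; Levene's test needs an add-on package (for example leveneTest in car), so it only appears as a comment here:

```r
# Hypothetical groups
g1 <- c(5.1, 4.9, 5.0, 5.2, 4.8)
g2 <- c(5.9, 4.2, 6.8, 3.9, 6.1, 5.5)

# F test: H0 is that the ratio of the two variances is 1
f <- var.test(g1, g2)

# Bartlett's test: same H0, also works for more than two groups
b <- bartlett.test(list(g1, g2))

# Levene's test: e.g. car::leveneTest(), not run here (add-on package)

print(c(F = unname(f$statistic), p_F = f$p.value, p_bartlett = b$p.value))
```

The F statistic is simply the ratio of the two sample variances.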
What does ANOVA stand for
analysis of variance
Why ANOVA vs t test
Handles multiple groups and multiple sources of variation, not just comparing one mean to one other
eg variation with analyst and variation with method/instrument (looking at data of 4 analysts making lead measurements in water)
So here we can look at the within group variation (the spread of each analyst's replicates around that analyst's own mean) AND the between group variation (how the means of the groups differ from each other) and more importantly we can isolate and estimate each of these (the between group mean square still contains a contribution from the within group variation)
ANOVA hypotheses
H0 - population means of the groups are all equal
H1 - population means of the groups are not all equal (at least one differs)
What are assumptions in ANOVA
Independence of observations
Normality of residuals (the difference between the observed value and the estimated true value)
Homoscedasticity (variance of the data in each group is the same - homogeneous variance)
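What is variance in ANOVA
Within group variance is the within group mean square: Mw = Sw^2
Between group variance is Sb^2 = (Mb - Mw)/n, where Mb is the between group mean square and n is the number of observations per group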
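A sketch of how the within and between group variances (Sw^2 and Sb^2 = (Mb - Mw)/n) can be pulled out of an ANOVA table in R. The lead data for four analysts are invented, and the design is balanced with n = 3 replicates each:

```r
# Hypothetical lead results: 4 analysts x 3 replicates
lead <- data.frame(
  analyst = factor(rep(c("A", "B", "C", "D"), each = 3)),
  conc = c(10.1, 10.3, 10.2,
           10.6, 10.8, 10.7,
            9.9, 10.0, 10.1,
           10.4, 10.5, 10.3)
)

fit <- aov(conc ~ analyst, data = lead)
tab <- summary(fit)[[1]]  # ANOVA table: row 1 = analyst, row 2 = residuals

m_b <- tab[["Mean Sq"]][1]  # between group mean square (Mb)
m_w <- tab[["Mean Sq"]][2]  # within group mean square (Mw = Sw^2)

n <- 3                      # replicates per analyst
s_b2 <- (m_b - m_w) / n     # between group variance (Sb^2)

print(c(Mb = m_b, Mw = m_w, Sb2 = s_b2))
```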
ONE WAY vs two way ANOVA
One way - single classification variable - multiple groups
Two way - two classification variables
What is balanced design for ANOVA
Each group has an equal number of observations (the same number in each treatment condition)
What are post hoc tests for?
ID which groups are different - ANOVA just tells you if there is a difference, not which groups differ or by how much
What is TUKEY'S HSD?
A post hoc test - honestly significant difference
What's a confounding factor
A variable that can explain group differences on the dependent variable - NOT A VARIABLE WE ARE INTERESTED IN - A NUISANCE
Multivariate ANOVA?
Looks at the effect on multiple dependent variables at once (eg effect of treatment on concentration of compound Y AND Z)
How to do ANOVA in RStudio
aov(dependent variable ~ independent variable, data = ) - R is case sensitive so it is aov, not AOV
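A sketch of a one way ANOVA followed by Tukey's HSD, on invented results for three hypothetical methods X, Y and Z (X and Y are built to agree, Z is shifted upwards):

```r
# Hypothetical results for three methods, 4 replicates each
d <- data.frame(
  method = factor(rep(c("X", "Y", "Z"), each = 4)),
  result = c(5.0, 5.1, 4.9, 5.2,
             5.1, 5.0, 5.2, 4.9,
             6.0, 6.1, 5.9, 6.2)
)

# One way ANOVA: dependent ~ independent
fit <- aov(result ~ method, data = d)
p_overall <- summary(fit)[[1]][["Pr(>F)"]][1]

# Post hoc: which pairs of methods differ?
tukey <- TukeyHSD(fit)

print(p_overall)
print(tukey$method)  # columns: diff, lwr, upr, p adj
```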
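What are some post hoc tests
TUKEY HSD is the post hoc test; BARTLETT'S and LEVENE'S are tests for homogeneity of variance (used to check the ANOVA assumption)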
Variations on ANOVA formula
y ~ A + B + C - a prediction of y from A, B and C
(typically controlling for the other 2)
y ~ A + B + A:B - the A:B term denotes the interaction between the variables
y ~ A*B*C means each variable individually plus the interactions between all 3 (A:B, A:C, B:C and A:B:C)
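R can expand these formulas itself, which is a quick way to check what a formula means. The helper `labels_of` is a made-up name; A, B, C and y are placeholders and need no data here:

```r
# List the terms R will fit for a given formula
labels_of <- function(f) attr(terms(f), "term.labels")

print(labels_of(y ~ A + B + C))    # main effects only
print(labels_of(y ~ A + B + A:B))  # main effects plus one interaction
print(labels_of(y ~ A * B))        # shorthand for A + B + A:B
print(labels_of(y ~ A * B * C))    # all main effects and all interactions
```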
What order do you program into ANOVA in R
covariates, then main effects then interactions
Example ANCOVA design (ANOVA with 1 covariate)
Evaluates whether the means of the dependent variable are equal across levels of an independent variable while controlling for the other variable (the covariate)
WHAT ARE ASSUMPTIONS NEEDED FOR ANCOVA
1) Linearity between the covariate and the outcome variable at each level of the independent variable (eg basically your covariate should affect each level of the independent variable the same)
2) Homogeneity of regression slopes - they are parallel (covariate vs outcome variable - so basically no interaction between covariate and independent variable)
3) Outcome variable approximately normal
4) Homoscedasticity - homogeneity of residual variances for all groups
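A sketch of an ANCOVA and a check of the parallel slopes assumption, on simulated data. The variable names (`hours` as covariate, `group` as factor, `score` as outcome) and the effect sizes are all invented:

```r
set.seed(3)
d <- data.frame(
  group = factor(rep(c("ctrl", "trt"), each = 10)),
  hours = runif(20, min = 1, max = 5)  # continuous covariate
)
# Simulated outcome: same slope in both groups, plus a treatment shift
d$score <- 50 + 5 * d$hours + ifelse(d$group == "trt", 4, 0) + rnorm(20)

# ANCOVA: covariate entered first, then the factor of interest
ancova <- aov(score ~ hours + group, data = d)
ancova_tab <- summary(ancova)[[1]]

# Homogeneity of slopes: the hours:group interaction should not be significant
slopes <- aov(score ~ hours * group, data = d)
slopes_tab <- summary(slopes)[[1]]

print(ancova_tab)
print(slopes_tab)
```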
What is adjusted p value and when do we use it
A p value corrected for multiple comparisons - used because the chance of a false positive (the error rate) grows with each additional comparison
How to calc adjusted p value
Bonferroni correction - divide the significance level (alpha) by the number of comparisons and compare each p value to that
(or multiply each p value by n and compare to alpha)
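Both forms of the Bonferroni correction give the same decisions. A sketch with three made-up p values, using base R's p.adjust():

```r
p <- c(0.010, 0.020, 0.040)  # hypothetical raw p values from 3 comparisons
alpha <- 0.05

# Form 1: compare raw p values against alpha / number of comparisons
significant_1 <- p < alpha / length(p)

# Form 2: multiply each p value by n (capped at 1) and compare against alpha
p_adj <- p.adjust(p, method = "bonferroni")
significant_2 <- p_adj < alpha

print(p_adj)
print(rbind(significant_1, significant_2))
```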
ANCOVA vs 2 way ANOVA
In ANCOVA the covariate is a CONTINUOUS VARIABLE - like hours studied per day; in 2 way ANOVA the second variable is a whole other set of categories!!
What is two way ANOVA with replication vs without
Without replication means there is only one value in each group (eg you took the mean for each) - with replication means each group has a population of data