R Details Flashcards
How can you join vectors together?
Use c() with the names of the vectors you want to combine, e.g. girls and boys:
> children = c(girls, boys)
How do you check the length of a vector?
> length(vectorname)
When combining vectors with c(), what is key to remember?
Separate the vectors with commas, not + signs: + adds the vectors element by element instead of joining them
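A minimal sketch of the joining cards above, using made-up values for the hypothetical vectors girls and boys. It also shows what goes wrong with +: it adds matching elements instead of joining the vectors.

```r
# Hypothetical example data (made-up ages)
girls = c(4, 7, 9)
boys = c(5, 8, 10)

# c() joins the vectors end to end: separate them with commas
children = c(girls, boys)
print(children)          # 4 7 9 5 8 10
print(length(children))  # 6

# + does NOT join: it adds matching elements together
summed = girls + boys
print(summed)            # 9 15 19
```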
How do you extract particular elements from a vector or data set?
> nameofvector[1]
The number in square brackets is the position (index) of the element you want; R starts counting at 1
A range of elements is written as
> nameofvector[1:7]
How do you see a vector without certain elements?
> nameofvector[-1]
A minus sign removes that position, so this shows every element except the first
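A short sketch of indexing, assuming a hypothetical vector called scores with made-up values:

```r
# Hypothetical example vector
scores = c(12, 15, 9, 20, 7, 11, 18, 14)

first = scores[1]                     # element in position 1 (R starts counting at 1)
first_seven = scores[1:7]             # positions 1 to 7
without_first = scores[-1]            # every element except position 1
without_first_two = scores[-c(1, 2)]  # drop several positions at once

print(first)
print(first_seven)
print(without_first)
print(without_first_two)
```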
Maximum value of the vector
> max(vectorname)
How do you find which elements of a vector match a given number?
> which(vectorname == 7)
Returns the positions (indices) of the elements equal to 7
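A sketch of max() and which() on a small made-up vector; which.max() (a base R function not mentioned on the card) gives the position of the largest value:

```r
scores = c(7, 3, 7, 10, 2)  # hypothetical example vector

biggest = max(scores)           # largest value: 10
positions = which(scores == 7)  # positions of elements equal to 7: 1 and 3
where_max = which.max(scores)   # position of the largest value: 4

print(biggest)
print(positions)
print(where_max)
```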
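How do you change the name of a vector?
> newname = nameofvector
This copies the vector under a new name; use rm(nameofvector) to remove the old name if you no longer need it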
How to calculate the sum of all elements
> sum(vector)
Mean of elements
> mean(vector)
Median of elements
> median(vector)
Variance of elements
> var(vector)
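A worked check of these functions on a small made-up vector. By hand: the total is 40, the mean is 40/8 = 5, the median is the average of the two middle values (4 + 5)/2 = 4.5, and var() divides the sum of squared deviations (32) by n - 1 = 7:

```r
x = c(2, 4, 4, 4, 5, 5, 7, 9)  # hypothetical example vector

print(sum(x))     # 40
print(mean(x))    # 5
print(median(x))  # 4.5
print(var(x))     # 32 / 7, about 4.571
```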
Standard deviation
> sd(vector)
R has a built-in sd() function. You can also define your own from the variance:
> std = function(x) sqrt(var(x))
> std(vector)
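A sketch comparing the home-made std() function with the built-in sd(), using made-up data; both compute the sample standard deviation (dividing by n - 1), so they should agree:

```r
x = c(2, 4, 4, 4, 5, 5, 7, 9)  # hypothetical example vector

std = function(x) sqrt(var(x))  # home-made standard deviation
print(std(x))
print(sd(x))  # built-in; same answer, sqrt(32 / 7), about 2.138
```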
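Normality test example
Shapiro-Wilk test
> shapiro.test(vector)
When should you use the Shapiro-Wilk test?
To test the null hypothesis that the data are drawn from a normally distributed population
The p-value is the probability of getting data at least this far from normal if the population really is normal; it is not the probability that the data are normal
A p-value below 0.05 (5%) lets us reject the null hypothesis and conclude that the data are probably not normal. A p-value above 0.05 does not prove normality; it only means there is no strong evidence against it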
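A minimal sketch of shapiro.test() on simulated data (the data sets are made up for illustration): one sample drawn from a normal distribution and one from a strongly skewed exponential distribution. The seed only makes the example repeatable.

```r
set.seed(1)  # makes the simulated data repeatable

normal_data = rnorm(100, mean = 50, sd = 10)  # hypothetical normal sample
skewed_data = rexp(100, rate = 1)             # hypothetical right-skewed sample

result_normal = shapiro.test(normal_data)
result_skewed = shapiro.test(skewed_data)

# For truly normal data, p > 0.05 about 95% of the time
print(result_normal$p.value)
# For strongly skewed data, p is tiny: reject the null hypothesis of normality
print(result_skewed$p.value)
```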
What do you do if your data are not normal?
Calculate a non-parametric measure of spread, e.g. the interquartile range (IQR)
> IQR(vector)
Or
Median absolute deviation (MAD): finds the median of the absolute differences from the median and then multiplies it by a constant (1.4826), which makes it comparable with the standard deviation when the data are normal
> mad(vector)
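A sketch showing why IQR and MAD suit non-normal data: with one extreme value in a made-up vector, sd() is dragged upwards while IQR() and mad() hardly change. mad(x, constant = 1) returns the raw median absolute deviation before scaling.

```r
x = c(1, 2, 3, 4, 100)  # hypothetical data with one extreme value

print(IQR(x))                # 2: 3rd quartile (4) minus 1st quartile (2)
print(mad(x))                # 1.4826: median absolute deviation (1) times 1.4826
print(mad(x, constant = 1))  # 1: the unscaled median absolute deviation
print(sd(x))                 # about 43.6, inflated by the value 100
```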
What is the code for summary and what does it show you?
> summary(vector)
Reports:
Minimum
1st quartile
Median
Mean
3rd quartile
Maximum
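A quick check of summary() on a made-up vector; R prints the values in the order minimum, 1st quartile, median, mean, 3rd quartile, maximum, and here the mean (22) is pulled well above the median (3) by the extreme value:

```r
x = c(1, 2, 3, 4, 100)  # hypothetical data with one extreme value

s = summary(x)
print(s)  # Min. 1, 1st Qu. 2, Median 3, Mean 22, 3rd Qu. 4, Max. 100
```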
How can you check graphically that data are approximately normal?
Use a normal probability (Q-Q) plot: if the data are normal, the points lie close to a straight line
Curving away from the line at the ends shows that the distribution has short or long tails
The line is drawn through the points formed by the 1st and 3rd quartiles
> qqnorm(vector, main = "normal (0,1)")
> qqline(vector)
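A sketch of a normal Q-Q plot on simulated normal data (made up for illustration). qqnorm() also returns the plotted points invisibly, which is handy for checking: x holds the theoretical normal quantiles and y the original data.

```r
set.seed(2)
x = rnorm(50)  # hypothetical sample from a normal (0,1) distribution

q = qqnorm(x, main = "normal (0,1)")  # draws the plot and returns the points
qqline(x)  # reference line through the 1st and 3rd quartiles

# Points lying close to the line suggest approximate normality
```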
What does a data transformation do?
Attempts to make the data approximately normal so that parametric statistics can be applied
If the data can't be transformed to normality, non-parametric statistics have to be used
Common data transformation
Take the logarithm of the data: log(x + 1) (adding 1 avoids taking log(0) when the data contain zeros)
> qqnorm(log(vector + 1))
> qqline(log(vector + 1))
Check whether it worked with a normality test
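A sketch of the log(x + 1) transformation on simulated right-skewed (log-normal) data; the data and seed are assumptions chosen for illustration. The Shapiro-Wilk p-value should rise sharply after the transformation:

```r
set.seed(3)
x = rlnorm(100, meanlog = 2, sdlog = 0.8)  # hypothetical right-skewed data

p_raw = shapiro.test(x)$p.value      # tiny: clearly not normal

log_x = log(x + 1)                   # the log(x + 1) transformation
p_log = shapiro.test(log_x)$p.value  # should be much larger

qqnorm(log_x)
qqline(log_x)

print(c(raw = p_raw, transformed = p_log))
```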
Bar charts in R
> barplot(vector)
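How do you generate a more informative bar plot?
> table(vector)
> barplot(table(vector))
table() counts how many times each value occurs, so each bar shows the frequency of one value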
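A sketch with a made-up vector of the number of children in seven families, showing what table() produces before it is plotted:

```r
children = c(0, 1, 1, 2, 2, 2, 3)  # hypothetical: children per family

counts = table(children)  # how many families have 0, 1, 2, 3 children
print(counts)             # 0:1, 1:2, 2:3, 3:1
barplot(counts)
```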
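How do you change the scale on a bar plot to show relative frequencies?
> barplot(table(vector)/length(vector))
Dividing the counts by the number of observations turns them into proportions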
How do you add labels to a bar plot?
> labels = c("one", "two", "three")
> barplot(table(vector)/length(vector), names.arg = labels, xlab = "Number of children", ylab = "relative frequency")
names.arg labels the bars (one label per bar); xlab and ylab label the x and y axes
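A sketch combining the relative-frequency scale with custom labels, using made-up data; there are four distinct values here, so four labels are needed. prop.table(table(children)) is a base R shortcut that gives the same proportions.

```r
children = c(0, 1, 1, 2, 2, 2, 3)  # hypothetical: children per family

rel_freq = table(children) / length(children)  # proportions; they sum to 1
labels = c("none", "one", "two", "three")      # one label per bar

barplot(rel_freq, names.arg = labels,
        xlab = "Number of children", ylab = "relative frequency")
```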
Histogram code
> hist(vector)
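A sketch of hist() on simulated data (made up for illustration). Like qqnorm(), hist() returns an object invisibly; its counts element holds the number of observations in each bin.

```r
set.seed(4)
x = rnorm(200, mean = 100, sd = 15)  # hypothetical measurements

h = hist(x, main = "Histogram of x", xlab = "x")
print(h$counts)  # number of observations in each bin
```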
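How do you load a larger data set from a file?
> dataset = read.table("name of file", header = TRUE)
header = TRUE means the first line of the file holds the column names
> attach(dataset)
attach() lets you refer to the columns by name
> dataset
Typing the name of the data set prints it
> summary(dataset)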
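A self-contained sketch of loading data: it first writes a tiny made-up file to a temporary location (so no real file is assumed), then reads it back. with() is shown as an alternative to attach() that does not leave the columns attached afterwards.

```r
# Write a tiny hypothetical file so the example does not depend on a real one
path = tempfile(fileext = ".txt")
writeLines(c("name height children",
             "Ann 165 2",
             "Bob 180 0",
             "Cat 172 3"), path)

dataset = read.table(path, header = TRUE)  # first line holds column names
print(dataset)
print(summary(dataset))

attach(dataset)      # columns can now be used by name
print(mean(height))  # 517 / 3, about 172.3
detach(dataset)      # undo attach() when finished

# Alternative that leaves nothing attached
print(with(dataset, mean(children)))  # 5 / 3, about 1.67
```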
Binomial test or chi-squared test
Nominal or frequency data
2 categories
Chi-squared test
Nominal or frequency data
More than 2 categories
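A sketch of the corresponding base R functions, with made-up counts. When chisq.test() is given a single vector, R tests the counts against equal expected proportions by default; for c(30, 45, 25) the statistic is 6.5 on 2 degrees of freedom.

```r
# Two categories: hypothetical 60 "yes" answers out of 100, testing p = 0.5
res_binom = binom.test(60, 100, p = 0.5)
print(res_binom$p.value)

# Two categories with a chi-squared goodness-of-fit test
res_chi2 = chisq.test(c(60, 40))
print(res_chi2$statistic)  # 4

# More than two categories: hypothetical counts vs equal expected proportions
res_chi3 = chisq.test(c(30, 45, 25))
print(res_chi3$statistic)  # 6.5, on 2 degrees of freedom
```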
Pearson product-moment correlation
Interval or ratio data and measures with a reasonably normal distribution
2 conditions
Testing hypotheses about:
Correlation - relationship between two dependent variables
Simple linear regression
Interval or ratio data and measures with a reasonably normal distribution
2 conditions
Testing hypotheses about:
Regression - effect of an independent variable upon a dependent variable
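A sketch of Pearson correlation and simple linear regression on simulated data; hours and score are hypothetical variables, with score built to rise by about 2 points per hour plus random noise:

```r
set.seed(5)
hours = 1:20                                # hypothetical independent variable
score = 40 + 2 * hours + rnorm(20, sd = 3)  # hypothetical dependent variable

# Pearson product-moment correlation
res_cor = cor.test(hours, score, method = "pearson")
print(res_cor$estimate)

# Simple linear regression: effect of hours on score
model = lm(score ~ hours)
print(coef(model))  # intercept near 40, slope near 2
```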
Independent-samples t-test
Interval or ratio data and measures with a reasonably normal distribution
2 conditions
Testing hypotheses about
Means
Independent measures design
Paired-samples t-test
Interval or ratio data and measures with a reasonably normal distribution
2 conditions
Testing hypotheses about - means
Matched measures or repeated measures designs
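A sketch of both t-tests on simulated data (all values hypothetical). t.test() uses the Welch version by default, which does not assume equal variances; var.equal = TRUE gives the classic Student's t-test.

```r
set.seed(6)

# Independent measures: two separate hypothetical groups
group_a = rnorm(15, mean = 100, sd = 10)
group_b = rnorm(15, mean = 110, sd = 10)
welch = t.test(group_a, group_b)                      # Welch t-test (default)
student = t.test(group_a, group_b, var.equal = TRUE)  # Student's t-test

# Repeated measures: the same hypothetical people measured twice
before = rnorm(12, mean = 50, sd = 5)
after = before + rnorm(12, mean = 3, sd = 2)
paired_res = t.test(after, before, paired = TRUE)

print(c(welch = welch$p.value, student = student$p.value, paired = paired_res$p.value))
```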
Analysis of variance - ANOVA
Parametric
Interval or ratio data and measures with a reasonably normal distribution
More than 2 conditions
Testing hypotheses about - means
Difference between means
Null hypothesis: there is no difference between the means of the conditions
Multiple linear regression
Interval or ratio data and measures with a reasonably normal distribution
2 or more independent (predictor) variables
Testing hypotheses about - regression (effect of 2 or more independent variables upon a dependent variable)
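A sketch of one-way ANOVA with aov() and multiple linear regression with lm() on simulated data; the group labels and the predictors x1 and x2 are hypothetical. This covers only independent-groups ANOVA; repeated-measures ANOVA needs a different model.

```r
set.seed(7)

# One-way ANOVA: three hypothetical independent groups of 10
scores = c(rnorm(10, 50, 5), rnorm(10, 55, 5), rnorm(10, 60, 5))
group = factor(rep(c("A", "B", "C"), each = 10))
fit = aov(scores ~ group)
print(summary(fit))  # F test of H0: all group means are equal

# Multiple linear regression: two hypothetical predictors
x1 = rnorm(30)
x2 = rnorm(30)
y = 1 + 2 * x1 - x2 + rnorm(30, sd = 0.5)
model = lm(y ~ x1 + x2)
print(coef(model))  # estimates near 1, 2 and -1
```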
Spearman rank
Ordinal data or non-normal distribution of measure
2 conditions
Testing hypotheses - correlation - relationship between two dependent variables
Mann-Whitney U test
Ordinal data or non-normal distribution of measure
2 conditions
Testing hypotheses - medians
Independent measures
Wilcoxon signed-rank test
Ordinal data or non-normal distribution of measure
2 conditions
Testing hypotheses about - medians
Repeated measures
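A sketch of these three tests with small made-up data chosen without ties (ties make R fall back to approximate p-values with a warning). In R, the Mann-Whitney test and the Wilcoxon signed-rank test are both run with wilcox.test(); paired = TRUE selects the signed-rank version.

```r
# Spearman rank correlation: hypothetical rankings by two judges
judge1 = 1:10
judge2 = c(2, 1, 4, 3, 6, 5, 8, 7, 10, 9)
spearman = cor.test(judge1, judge2, method = "spearman")
print(spearman$estimate)  # rho = 1 - 6 * 10 / (10 * 99), about 0.939

# Mann-Whitney U test: two hypothetical independent groups
group_a = c(12, 15, 11, 18, 14)
group_b = c(21, 19, 25, 17, 23)
mann_whitney = wilcox.test(group_a, group_b)
print(mann_whitney$statistic)  # W = 1

# Wilcoxon signed-rank test: the same hypothetical people measured twice
before = c(10, 12, 9, 14, 11, 13)
after = c(12, 15, 10, 19, 15, 19)
signed_rank = wilcox.test(after, before, paired = TRUE)
print(signed_rank$statistic)  # V = 21: every difference is positive
```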
Kruskal-Wallis test
Ordinal data or non-normal distribution of measure
More than 2 conditions
Non-parametric analysis of variance
Independent measures
Friedman test
Ordinal data or non-normal distribution of measure
More than 2 conditions
Non-parametric analysis of variance
Repeated measures
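A sketch of kruskal.test() and friedman.test() with made-up data. For friedman.test() the data go in a matrix with one row per subject (block) and one column per condition. Working by hand, the Kruskal-Wallis statistic here is 7.2 and the Friedman statistic 3.5, both on 2 degrees of freedom.

```r
# Kruskal-Wallis: hypothetical ratings from three independent groups
ratings = c(2, 4, 3, 5, 7, 6, 9, 8, 10)
group = factor(rep(c("A", "B", "C"), each = 3))
kw = kruskal.test(ratings ~ group)
print(kw$statistic)  # 7.2 on 2 degrees of freedom

# Friedman: one row per hypothetical subject, one column per condition
scores = matrix(c(1, 2, 3,
                  2, 3, 1,
                  1, 3, 2,
                  1, 2, 3),
                nrow = 4, byrow = TRUE,
                dimnames = list(paste0("subject", 1:4),
                                c("cond1", "cond2", "cond3")))
fr = friedman.test(scores)
print(fr$statistic)  # 3.5 on 2 degrees of freedom
```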
Continuous variable
Can take any value within a given range
There is an infinite number of possible values, limited only by our ability to measure them, e.g. distance
Discrete variable
Can take only certain distinct values within a given range
The scale is still meaningful, but values in between are impossible, e.g. you can't have half a child