R-Studio Flashcards

1
Q

what can R be used for and what is it very powerful in doing?

A

R is very powerful for analysing biological data efficiently; it can be used as a calculator, for statistical modelling, to make graphs, for programming and much more

2
Q

what type of language is R?

A

R is an object-oriented programming language, i.e. one creates objects and gives them names, and those names can then be used in statistical tests or to make graphs/tables

we work with R through a user-friendly interface: R-Studio

3
Q

what do we do before we start processing data in R?

A

we must prepare our data in Excel before we import it into R

4
Q

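what are the four quadrants in R-Studio (default layout)?

A

script editor (top left) - where you write and save your code
console (bottom left) - where the code is run and the output appears; you can also type code here directly
environment (top right) - variables, objects & formulas
graphical window (bottom right) - where the graphs show up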

5
Q

how do you assign a value to a letter or word in order to transform it into an object?

A

[name of object] <- [value to assign to object]
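
A minimal sketch of the assignment pattern above. The object names and values are hypothetical, chosen only for illustration:

```r
# Hypothetical object names and values, for illustration only
wing_length <- 12.5        # store a number under the name wing_length
species <- "ladybird"      # store text under the name species
wing_length * 2            # the name can now be used in place of the value
```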

6
Q

what are the operators in R-Studio?

A

+ = addition
- = subtraction
* = multiplication
/ = division
^ = raised to the power
sqrt() = square root (a function rather than an operator)
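
A short sketch of R used as a calculator. The numbers are arbitrary, and subtraction is included for completeness:

```r
7 + 3      # addition
7 - 3      # subtraction
7 * 3      # multiplication
7 / 2      # division
2 ^ 3      # 2 raised to the power 3
sqrt(16)   # square root
```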

7
Q

how can we make R-Studio remember something?

A

through creating objects using the “<-” method

8
Q

why do we need to keep an eye on capitalisation?

A

because R is case-sensitive
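
A small sketch of what case-sensitivity means in practice. The object name is hypothetical:

```r
beetle_count <- 42          # hypothetical object, for illustration only
exists("beetle_count")      # TRUE: this exact name was created
exists("Beetle_count")      # FALSE: the capital B makes it a different name
```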

9
Q

how do we import our data from Excel into R-Studio?

A

through creating a data frame from the text file (.txt) saved from Excel, using the following code:

beetle <- read.table("location of text file with desired data", header = T, stringsAsFactors = T)

10
Q

state what the following line of code does and dissect and explain each component of it:

beetle <- read.table("C:/teaching/stats/lecture3/beetle_behaviour.txt", header = T, stringsAsFactors = T)

A

this line of code reads all of your data from a text file (saved from an Excel sheet) and stores it in R-Studio as a data frame

beetle <- = assigns the result to an object called beetle, i.e. the name of the data frame

read.table(…) = command to read in the data frame, given the location of the file

header = T (short for header = TRUE): first row contains variable names

stringsAsFactors = T converts text in a data table to a categorical variable ('factor')

11
Q

what do you always add at the end of your file name?

A

the file extension .txt, e.g. file_name.txt

12
Q

what is the purpose of the lines “header = T” & “stringsAsFactors = T”?

A

they tell read.table how to interpret the file, where T means ‘TRUE’: “header = T” states that the top row contains the names of the variables, and “stringsAsFactors = T” converts any text columns into categorical variables (factors)

13
Q

what must you do once you have typed out your “read in data frame” line?

A

you must attach the data frame to the workspace memory, i.e. make its variables accessible by name: press enter after typing the read-in line, then on the next line write attach(name of data frame), which in this case would be: attach(beetle)

14
Q

what do hashtags mean in R-Studio?

A

in R-Studio hashtags (#) are used to make comments: everything after a # on a line is ignored and has no effect on the actual code

15
Q

make the data accessible in R:

A

attach(name of data frame)

16
Q

once you have made your data accessible to R through entering attach(name of data-frame) what must you enter and what will that give you?

A

once the data-frame is accessible, you type names(beetle) which will give you an output containing all of the variable names

17
Q

once you have inputted your names(beetle) line to give you the name of all your variables, what must you now do and why?

A

immediately after names(beetle) you type head(beetle), which shows the first rows of your data frame (by default the first six, together with the column headings) so you can check that the data were read in correctly

18
Q

first three steps following the uploading of your excel file and why each step is important:

A

[beetle is the name of data-frame]

attach(beetle): this allows for your data to become accessible to the code
names(beetle): this gives the names of the variables in our data-frame “beetle”
head(beetle): this only shows the first rows of our data frame
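
The steps above can be sketched end to end. This is only an illustration: the temporary file, its column names and its values are all made up so that the example runs without a real Excel export, and detach() is added at the end to undo attach().

```r
# Hypothetical stand-in for a table saved from Excel as tab-delimited text
example_file <- tempfile(fileext = ".txt")
writeLines(c("species\tbody_mass\tactivity",
             "A\t1.2\t10",
             "A\t1.5\t12",
             "B\t2.1\t7",
             "B\t1.9\tNA"),
           example_file)

beetle <- read.table(example_file, header = T, stringsAsFactors = T)
attach(beetle)   # columns can now be referred to by name
names(beetle)    # variable names
head(beetle)     # first rows of the data frame
str(beetle)      # structure: species should be listed as a factor
detach(beetle)   # undo attach() once finished
```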

19
Q

how do you calculate a probability from a normal distribution in R?

A

using the command “pnorm”, which gives the probability of observing a value less than or equal to the one you enter
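
A brief sketch of pnorm in use. The mean of 50 and standard deviation of 10 are assumptions made up for this example:

```r
# Hypothetical normal distribution with mean 50 and standard deviation 10
p_below_60 <- pnorm(60, mean = 50, sd = 10)   # P(value <= 60)
p_above_60 <- 1 - p_below_60                  # P(value > 60)
p_below_60
p_above_60
```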

20
Q

what does R not like when it comes to excel sheets of data?

A

1) it does not like empty cells, therefore you must delete them or type NA

2) it does not like headings with spaces, therefore if your header has two words - separate them with an underscore

21
Q

get excel sheet ready for R:

A

(1) make sure that all measures/observations of one variable are in one column
(2) remove all spaces within your table and replace those by ‘_’
(3) try and give short and distinct variable names (as we may need to type them later)
(4) replace all empty cells, i.e. missing values, by ‘NA’

22
Q

how can you test for normal distributions in R using the shapiro-wilk test?

A

shapiro.test(variable name)

you will be given a W and a P value, where W is your test statistic and P is your probability value

if your p value is larger than 0.05 there is no evidence that your data deviate from a normal distribution, so you can treat them as normally distributed
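
A sketch of the test on simulated data. The variable name and the simulated values are assumptions for illustration only:

```r
set.seed(1)                                  # make the simulated data reproducible
body_mass <- rnorm(30, mean = 20, sd = 3)    # hypothetical, simulated measurements
result <- shapiro.test(body_mass)
result$statistic   # W, the test statistic
result$p.value     # P, the probability value to compare against 0.05
```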

23
Q

how can you get your critical value that is needed for comparison following the calculation of your chi-squared value in R-Studio?

A

qchisq(certainty [0.95], Degrees of Freedom)

the output is your critical value
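
For example, assuming 95% certainty and 1 degree of freedom (values picked for illustration), the critical value comes out at about 3.84:

```r
critical_value <- qchisq(0.95, df = 1)   # 95% certainty, 1 degree of freedom
critical_value
```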

24
Q

how do you calculate chi-squared in R-Studio?

A

[#1. we add the data columnwise into a matrix, called ‘embryo’]

embryo <- matrix(c(38,14,11,51), nrow=2)

check the matrix by typing its name: embryo

[#2. our chi-squared test]

chisq.test(embryo, correct = F)

answer - including: chi-squared value, degrees of freedom (DF), p-value etc

nrow = specifies the number of rows our matrix should have
c = combine (joins the values into one vector)
correct = F switches off the continuity correction
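
A sketch that runs the test on the matrix from this card and compares the result with the critical value. Storing the output in an object called result is a choice made here for illustration:

```r
embryo <- matrix(c(38, 14, 11, 51), nrow = 2)   # filled column by column
embryo                                          # check the matrix
result <- chisq.test(embryo, correct = F)
result$statistic   # chi-squared value
result$parameter   # degrees of freedom
result$p.value     # p-value
# TRUE if the chi-squared value exceeds the critical value at 95% certainty
result$statistic > qchisq(0.95, df = result$parameter)
```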

25
Q

to understand and check the structure of your data-frame:

A

str(name of data-frame)

26
Q

how can you create a subset in R?

A

say we are only interested in the first and second column of our data frame, we write:

twocols <- genome[, 1:2]

name of new object <- data-frame name[, first column:last column]

if we are only interested in the first three rows:

threerows <- genome[1:3, ]
threerows
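
A runnable sketch of both subsets. The contents of the genome data frame are invented here purely for illustration:

```r
# Hypothetical 'genome' data frame, made up for illustration
genome <- data.frame(species = c("a", "b", "c", "d"),
                     size_mb = c(120, 95, 310, 42),
                     genes   = c(13000, 9800, 21000, 5100))
twocols <- genome[, 1:2]     # all rows, columns 1 to 2
threerows <- genome[1:3, ]   # rows 1 to 3, all columns
twocols
threerows
```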

27
Q

mean in R:

A

mean(variable name)

28
Q

median in R:

A

median(variable name)

29
Q

calculating variance in R:

A

var(variable name)

[to get the variance of variable 1 separately for each group of variable 2: tapply(variable name 1, variable name 2, var)]
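
A sketch of the variance of one variable, overall and per group, using var() as the function handed to tapply(). The data are hypothetical:

```r
# Hypothetical measurements for two groups
activity <- c(10, 12, 14, 7, 9, 11)
species  <- factor(c("A", "A", "A", "B", "B", "B"))
overall_var <- var(activity)                  # variance of all values together
group_var <- tapply(activity, species, var)   # variance separately for each species
overall_var
group_var
```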

30
Q

standard deviation in R:

A

sd(variable name)

31
Q

how can you use the shapiro-wilk test to see if your data is parametric (normally distributed) or non-parametric (non-normally-distributed)?

A

input: shapiro.test(variable)

this will give you a p-value; if your p-value is MORE than 0.05 you do not reject the null hypothesis that the data is normally distributed, so you can treat your data as parametric

32
Q

calculating standard error in R:

A

SE<-function(x)sqrt(var(x)/length(x))

where “x” are your to-be plugged in values

then we use our new object “SE” as a command where: SE(variable name)
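
A sketch of the SE function in use, with hypothetical values. The last line is a cross-check: the standard error is the standard deviation divided by the square root of the sample size.

```r
SE <- function(x) sqrt(var(x) / length(x))
activity <- c(10, 12, 14, 7, 9, 11)      # hypothetical values
SE(activity)
sd(activity) / sqrt(length(activity))    # same result, written another way
```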

33
Q

what should you do if the following error message shows up: “Error in plot.new() : figure margins too large”?

A

move your mouse over the borders of the ‘Plots’ pane, i.e. the graphical window, until your cursor turns into a symbol consisting of four arrows; if you now press the mouse button, you can make the window larger; you can then re-send the above command and the graph should show

34
Q

how do we change the names of graphical axis in R-Studio?

A

inside your plot(y variable~x variable) command, before closing the brackets, you add a comma and write:

las = 1, ylab = "name of new y axis", xlab = "name of new x axis")

35
Q

[las = 1] means:

A

argument for axis labelling; it means that the values along the axes are always written horizontally

36
Q

how do we change the names of our graph axis?

A

plot(y variable~x variable, las = 1, ylab = "new y name", xlab = "new x name")

37
Q

if we have a factor with two levels on our x-axis (e.g. two box-and-whisker plots), how can we rename them?

A

to rename these parts of the graph you must (after your xlab = "new name" command for a new axis label) write names = c("first new name", "second new name")
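
A sketch of renaming the two boxes. The data and the new names are hypothetical, and boxplot() is called directly here, which is what plot() produces when the x variable is a factor:

```r
# Hypothetical data: activity measured in two treatment groups
activity  <- c(10, 12, 14, 11, 7, 9, 11, 8)
treatment <- factor(rep(c("ctrl", "trt"), each = 4))
boxplot(activity ~ treatment, las = 1,
        ylab = "Activity", xlab = "Treatment",
        names = c("Control", "Heated"))   # new names, in the order of the factor levels
```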

38
Q

if the data table is very small, how can you enter the data into R manually?

A

temperature <- c(data-point-1, data-point-2, etc)
activity <- c(data-point-1, data-point-2, etc)
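
With made-up numbers in place of the data points (purely illustrative), this looks like:

```r
# Hypothetical values, for illustration only
temperature <- c(15, 18, 21, 24, 27)
activity    <- c(2.1, 2.9, 3.8, 4.4, 5.2)
length(temperature) == length(activity)   # both vectors should have the same length
```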

39
Q

how can you change the range of the y-axis in R?

A

ylim=c(0,6),

in the above instance, this would make your data on the y axis now range from 0 → 6

40
Q

how can you change the size of your axis labels and values in R, where the default value is 1:

A

changing size of axis labels - cex.lab = 1.5, (1.5 times the default size)

changing size of axis values - cex.axis = 1.5, (1.5 times the default size)

41
Q

how can you change the appearance of your data-point symbols?

A

fill in the circle = …, pch = 19)

colour the circle = …, col = "red") [if you want red]
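
The plotting arguments from the last few cards can be combined in a single call. A sketch with hypothetical data and axis names:

```r
temperature <- c(15, 18, 21, 24, 27)      # hypothetical values
activity    <- c(2.1, 2.9, 3.8, 4.4, 5.2)
plot(activity ~ temperature,
     las = 1,                              # axis values written horizontally
     ylab = "Activity", xlab = "Temperature",
     ylim = c(0, 6),                       # y axis runs from 0 to 6
     cex.lab = 1.5, cex.axis = 1.5,        # larger axis labels and values
     pch = 19, col = "red")                # filled red circles
```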

42
Q

how must you separate every single command when writing out a line of code in R?

A

you must separate each one (i.e. each argument inside the brackets) using a comma

43
Q

if you are done and happy with your graph, how can you export it to another programme?

A

(1) go to the tab “plots”

(2) export

(3) copy to clipboard

[you can also save your plot]

44
Q

how can we clean up after ourselves by removing all the objects we have created?

A

we can do this through pressing the little broomstick in the top right corner of the environment pane

45
Q

full and complete command needed to have a linear regression in R:

A

(1) data<-read.csv("excel_sheet1.csv", header = T, stringsAsFactors = T)

(2) attach(data)

(3) names(data)

(4) m1<-lm(y variable~x variable)
# "m1" is simply your model name

(5) summary.lm(m1)

(6) summary.aov(m1)

(7) plot(m1)

46
Q

complete linear regression R command:

A

(1) read.csv
(2) attach(data-frame name)
(3) names(data-frame name)
(4) m1<-lm(growth~tannin) #m1 is the name of your regression
(5) summary.lm(m1)
(6) summary.aov(m1)
(7) plot(m1)

(8) plot(y-variable~x-variable, pch=19, las=1)
(9) abline(lm(y-variable~x-variable))
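
A sketch of the sequence above on a small data set. The numbers are made up for illustration, with growth and tannin following the names on the card. The data frame is built directly and passed through the data argument instead of read.csv() and attach(), so that the example runs on its own.

```r
# Made-up data for illustration only
regression_data <- data.frame(tannin = 0:8,
                              growth = c(12, 10, 8, 11, 6, 7, 2, 3, 3))
m1 <- lm(growth ~ tannin, data = regression_data)
summary.lm(m1)     # coefficients, R-squared, p-values
summary.aov(m1)    # ANOVA table for the regression

par(mfrow = c(2, 2))   # show the diagnostic plots on one page
plot(m1)
par(mfrow = c(1, 1))   # back to one plot per page

plot(growth ~ tannin, data = regression_data, pch = 19, las = 1)
abline(m1)             # add the fitted regression line
```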

47
Q

what single command gives you the two mini graphs in linear regression that allow you to confirm parametric distributions?

A

plot(m1) #must follow the previous commands that create m1

[gives one graph that must look like the “sky at night” and another graph which must have the data-symbols on or very close to the line]

48
Q

how can you command R to give the pearsons correlation?

A

after attaching your data-frame and checking the variable names with names(name of data frame):

cor.test(variable_1, variable_2, method = "pearson")

note: doesn’t matter what way around your variables are - answer will be the same either way

49
Q

what do we receive from a pearsons cor.test command and how do you interpret it?

A

you will get a p-value and, underneath “cor” at the bottom of the output, our correlation coefficient

(1) if the p value is smaller than 0.05 then we can assume that the two variables are correlated

(2) if the cor value is positive it means there is a positive correlation, if the cor value is negative it means there is a negative correlation
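
A sketch of a Pearson test and where to find the two values in the output. The data are hypothetical, and the last call swaps the variables to show that the order does not change the coefficient:

```r
temperature <- c(15, 18, 21, 24, 27, 30)           # hypothetical values
activity    <- c(2.1, 2.9, 3.8, 4.4, 5.2, 5.0)
result <- cor.test(temperature, activity, method = "pearson")
result$estimate   # "cor": the correlation coefficient
result$p.value    # compare against 0.05
swapped <- cor.test(activity, temperature, method = "pearson")
swapped$estimate  # same coefficient with the variables the other way around
```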

50
Q

how can you use R to calculate your spearman’s rank values?

A

we once again use:

cor.test(variable one, variable two, method = "spearman")
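
A sketch with hypothetical data in which the relationship always increases but is not a straight line, the kind of case where ranks are useful:

```r
dose     <- c(1, 2, 3, 4, 5, 6, 7, 8)   # hypothetical values
response <- dose^3                      # always increasing, but not linearly
result <- cor.test(dose, response, method = "spearman")
result$estimate   # "rho": Spearman's rank correlation coefficient
result$p.value
```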

51
Q

χ²-tests enable us to judge whether:

A

the observed frequencies differ from the frequencies expected if the two variables were not associated

52
Q

what is the first thing you must do before conducting a chi-squared test in R?

A

you must first create a matrix for your data via command:

> matrix_name <- matrix(c(20,40,30,30,42,18), nrow = 2)
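
A sketch showing how the values on this card are laid out. The name matrix_name is just a placeholder, written with an underscore because object names cannot contain a hyphen:

```r
matrix_name <- matrix(c(20, 40, 30, 30, 42, 18), nrow = 2)
matrix_name        # values fill column 1 first, then column 2, then column 3
dim(matrix_name)   # number of rows, then number of columns
```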