RStudio functions: Flashcards

1
Q

A + B, then the Enter key

A

Returns the sum of A and B

2
Q

*

A

Multiply

3
Q

/

A

Divide

4
Q

^

A

To the power of

5
Q

Brackets

A

To separate the different operations used in one expression and control the order in which they are evaluated

6
Q

A <- 3

A

Store the value 3 in variable A

7
Q

What is a scalar variable?

A

A variable storing a single value

8
Q

Variable name (e.g. A), then Enter

A

Displays the value stored in the variable

9
Q

C <- "apples"

A

Stores the word apples in variable C

10
Q

What are vectors?

A

Variables that can hold more than one value

11
Q

D <- c(3, 7, 1)

A

Stores the vector 3, 7, 1 in variable D

The c in c() stands for combine

12
Q

D[2]

A

Allows access to the second value stored in the vector variable D
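
The assignment, vector and indexing cards above can be tried together in the console. This is a minimal sketch that reuses the card names A, C and D; the name second is introduced here only for illustration.

```r
# Scalar variables: each stores a single value
A <- 3
C <- "apples"

# A vector holds several values; c() combines them
D <- c(3, 7, 1)

# Typing a name on its own prints the stored value
D

# Square brackets pick out one element; R counts positions from 1
second <- D[2]
second
```

Note that R is case-sensitive, so D and d would be two different variables.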

13
Q

What are data frames?

A

A table-like store for a data set, with variables in columns and observations in rows

14
Q

Typing the name of a data frame, then Enter

A

Displays the full set of data in data frame

15
Q

cars$mpg

A

Returns only the mpg column of the cars data frame

16
Q

DataFrameName[RowNumber, ColumnNumber]

A

Returns the value stored at the given row and column of the data frame (row first, then column)

17
Q

cars[ , 1]

A

When the row number is left blank, all the rows are returned.

In this example, every row of the first column of cars is returned
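
The data frame cards above can be sketched as follows. The course's cars data frame is not available here, so a small made-up stand-in with mpg and hp columns is built first; its values are illustrative only.

```r
# Hypothetical stand-in for the course's cars data frame
cars <- data.frame(
  mpg = c(21.0, 22.8, 18.7),
  hp  = c(110, 93, 175)
)

cars          # prints the whole data frame
cars$mpg      # only the mpg column
cars[2, 1]    # row 2, column 1
cars[ , 1]    # all rows of column 1
```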

18
Q

What is a function?

A

Anything that performs a particular operation on our data

19
Q

What is the format that all functions follow in RStudio?

A

function_name(argument)

20
Q

What is an argument?

A

The input for the function

21
Q

What would the function “mean(e)” do?

A

Give the mean/average of values in vector e

22
Q

What code would you use to calculate the mean mpg of cars?

A

mean(cars$mpg)

23
Q

How do you find the column names in a data frame?

A

By printing the whole data frame

24
Q

names(name of data frame)

A

Only shows the column names of a data frame

25
head(data frame name)
Shows only the column names and the first few rows (6 by default) of the data frame
26
Why might the function “head(name of data frame)” cause the full data frame to be shown?
The function shows the first 6 rows by default, so if the data frame has 6 or fewer rows then all of it will be shown
27
How are arguments separated in all functions?
Using commas
28
How do you define how many rows you want from a data frame?
head(name of data frame, n = )
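
A brief sketch of names() and head(), using mtcars (a data set that ships with R) as a stand-in for the course's data frame. The name top3 is introduced here for illustration.

```r
# mtcars is built into R and is used here only as an example data frame
names(mtcars)                 # column names only

head(mtcars)                  # column names plus the first 6 rows (the default)

# Arguments are separated by commas; n sets how many rows are returned
top3 <- head(mtcars, n = 3)
top3
```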
29
hist()
Creates a histogram
30
Draw histogram of horsepower (hp) from cars:
hist(cars$hp)
31
What is the standard layout of data?
- Each column represents a different variable - Each row represents a different subject or replicate
32
How is the standard layout of data different from the layout used by people when making spreadsheets?
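In spreadsheets, people often put each condition into a separate column, rather than using one column for the measurement and another for the condition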
33
What is an advantage of the standard layout of data?
It is much easier to record additional variables in a data frame
34
What is the advantage of using histograms over box plots?
Histograms show more information about the shape of the distribution
35
What is the advantage of using boxplots over histograms?
It is easier to compare data presented as a box plot than as a histogram
36
boxplot()
Function that plots a boxplot
37
How do you code for a boxplot of only one variable (column) of the data?
boxplot(DataFrameName$ColumnName)
38
What values are shown on a boxplot?
- Median - Lower (1st) quartile - Upper (3rd) quartile - Lowest value in data - Highest value in data
39
How much of the data falls within each interval of a boxplot (generally speaking)?
A quarter of the data
40
How much of the data falls within the interquartile range?
Half (50%) of the data
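
The five values behind a boxplot can be inspected numerically. This is a minimal sketch using the built-in mtcars data as an example; boxplot() itself computes its box from hinges, which can differ slightly from quantile() in small samples.

```r
# Example data: fuel economy from the built-in mtcars data frame
x <- mtcars$mpg

# Minimum, lower quartile, median, upper quartile, maximum
q <- quantile(x)
q

# The interquartile range is the distance between the two quartiles
iqr <- IQR(x)
iqr

boxplot(x)   # draws the corresponding boxplot
```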
41
When is the boxplot() function able to produce multiple plots?
If given a variable to group the data by
42
Name the code needed to make multiple plots using the boxplot() function:
boxplot(NameOfDataFrame$a ~ NameOfDataFrame$b) - Where a is the variable that is plotted - Where b is the variable for grouping the data
43
In the function “boxplot(NameOfDataFrame$a ~NameOfDataFrame$b)”, what lies to the left and right of the “ ~ ” symbol?
- The response variables lie to the left - The explanatory variables lie to the right
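
A grouped boxplot can be sketched with the built-in mtcars data, where mpg plays the role of a (the plotted variable) and cyl the role of b (the grouping variable). The name groups is introduced here for illustration.

```r
# Response on the left of ~, grouping variable on the right
boxplot(mtcars$mpg ~ mtcars$cyl)

# Equivalent call using the data argument, so column names can be used directly
groups <- boxplot(mpg ~ cyl, data = mtcars)

# boxplot() invisibly returns its statistics; names holds the group labels
groups$names
```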
44
summary()
This function neatly and easily provides several summary statistics
45
List the function that would be used to make summary statistics for weight from the mice data frame:
summary(mice$weight)
46
List the function that would be used to make summary statistics from the whole mice data frame:
summary(mice)
47
When sex is a variable in the data frame, what problem can be caused when trying to make summary statistics? What are possible solutions?
- Sex isn’t a numerical variable - Any numbers used to code it have no intrinsic meaning, so numerical summaries of them wouldn’t be very useful - A better summary is the number of males and females - To get this, we need to tell R that sex is a categorical variable
48
What are categorical variables called in R?
Factors
49
What function is used to convert numerical values to factors?
factor()
50
List the function that would be used to convert the sex column from the data frame “mice” to a factor:
factor(mice$sex)
51
When a variable has been converted into a factor, the result is not stored unless you tell R to store it. How do you store the data in this format?
- By assigning the converted data to a column with the same name as the original variable, e.g. mice$sex <- factor(mice$sex) - This overwrites the current column contents
52
class()
Function that reports what type (class) a variable is stored as
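
The factor cards above can be sketched as follows. The mice data frame here is a made-up stand-in with sex coded as the numbers 1 and 2; the real course data may be coded differently.

```r
# Hypothetical mice data frame with sex coded numerically
mice <- data.frame(
  weight = c(21.5, 19.8, 23.1, 20.4),
  sex    = c(1, 2, 1, 2)
)

class(mice$sex)      # stored as a number at first
summary(mice$sex)    # numerical summary, not very meaningful for sex

# Convert to a factor and overwrite the original column
mice$sex <- factor(mice$sex)

class(mice$sex)      # now a factor
summary(mice$sex)    # counts of each category instead
```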
53
dbinom( x, size= , prob= )
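Function that gives the probability of exactly x successes in a binomial distribution, where x (number of successes), size (number of trials) and prob (probability of success) are the arguments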
54
List the function for finding the probability of getting 3 heads after tossing a coin 4 times:
dbinom(3, 4, 0.5)
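
As a check on the coin example, dbinom() can be compared with the binomial formula worked out by hand. The names p_direct and p_by_hand are introduced here for illustration.

```r
# Probability of exactly 3 heads in 4 tosses of a fair coin
p_direct <- dbinom(3, size = 4, prob = 0.5)

# Same value from the binomial formula: choose(n, x) * p^x * (1 - p)^(n - x)
p_by_hand <- choose(4, 3) * 0.5^3 * (1 - 0.5)^1

p_direct
p_by_hand
```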
55
pbinom()
Function calculates the probability of observing up to a certain number of successes or events
56
What are the arguments for pbinom?
pbinom(x, size = , prob = ) - x = observed number of successes - size = number of trials - prob = probability of success
57
What is the probability of getting up to 3 heads after tossing 6 different coins?
pbinom( 3, 6, 0.5)
58
15 people are admitted to hospital with a heart attack. 4 in 100 people who have a heart attack die. What is the probability that more than 4 people die?
1 - pbinom(4, 15, 0.04)
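
The two pbinom() cards above can be checked against sums of dbinom(). This is a short sketch; the variable names are introduced here for illustration.

```r
# Up to 3 heads in 6 tosses: cumulative probability of 0, 1, 2 or 3 heads
p_up_to_3 <- pbinom(3, size = 6, prob = 0.5)
sum(dbinom(0:3, size = 6, prob = 0.5))   # same value, added up term by term

# More than 4 deaths out of 15: one minus the probability of 4 or fewer
p_more_than_4 <- 1 - pbinom(4, size = 15, prob = 0.04)

# lower.tail = FALSE asks pbinom() for the upper tail directly
p_upper_tail <- pbinom(4, size = 15, prob = 0.04, lower.tail = FALSE)
```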
59
pnorm()
Calculates cumulative probability of a normal distribution
60
What is a similarity and difference between the pnorm() and pbinom() functions?
- They both calculate cumulative probability - pnorm() is used for normal distributions and pbinom() is used for binomial distributions
61
What are the arguments for pnorm()?
pnorm(x, mean, sd) - Where x is the observation made - Where mean is the mean of the population - Where sd is the standard deviation
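
A small pnorm() sketch, assuming a hypothetical population with mean 170 and standard deviation 10 (these numbers are made up for illustration).

```r
# Probability of observing a value of 180 or less
p_below <- pnorm(180, mean = 170, sd = 10)

# Probability of observing a value above 180
p_above <- 1 - p_below

# 180 is one standard deviation above the mean, so this matches
# the standard normal distribution evaluated at 1
p_standard <- pnorm(1)
```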
62
The probability that two people share the same birthday in a cohort of 150 is being calculated. Explain why the function below is incorrect. dbinom(2, 150, 0.00273973)
- As two people have the same birthday, size has to be the number of trials and we are comparing 1 birthday to 149 other birthdays - This means the number of trials is 149 - As x is the number of successes, you get a success when 1 other person has the same birthday, so x = 1 - The corrected function is dbinom(1, 149, 1/365)
63
The probability of sharing a birthday with at least two people in a cohort of 150 is being calculated. Why is the following function incorrect? pbinom(2, 149, 0.00273973)
- pbinom() gives the cumulative probability of up to x successes, so this is the probability of sharing a birthday with two or fewer people - pbinom(1, 149, 1/365) is the probability of sharing a birthday with none or one other person - Taking that value away from 1 gives the probability of sharing a birthday with more than 1 person: 1 - pbinom(1, 149, 1/365)
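
The two birthday cards above can be sketched in code, assuming (as the cards do) 365 equally likely birthdays and 149 other people to compare against. The variable names are introduced here for illustration.

```r
p_share <- 1 / 365   # chance that one other person shares your birthday

# Exactly one of the 149 other people shares your birthday
p_exactly_one <- dbinom(1, size = 149, prob = p_share)

# None or one other person shares your birthday
p_none_or_one <- pbinom(1, size = 149, prob = p_share)

# At least two other people share your birthday
p_at_least_two <- 1 - p_none_or_one
```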
64
qbinom()
Function that calculates the critical value of a binomial distribution at a given cumulative probability (set from alpha)
65
What are the arguments in the function qbinom()?
qbinom(alpha, size = , prob = ) - alpha is the cumulative probability at which the critical value is wanted - size is the number of trials - prob is the probability of success
66
Out of 100 tosses of a coin, 59 were heads. With a value of alpha being 0.05, calculate the critical values given a two-tailed test:
The lower critical value: qbinom(0.025, 100, 0.5) The upper critical value: qbinom(0.975, 100, 0.5) Remember that in a two-tailed test the significance level of 0.05 is shared between both tails, giving 0.025 in each
67
4 in every 100 people suffering with heart attacks die. 15 people are admitted to a hospital and 3 of them die. A doctor is concerned this is abnormally high, so finds 4% of 15 (which turns out to be 0.6) but is not sure what this means. Null hypothesis - hospital does not suffer more fatalities from heart attacks than expected. Alternative hypothesis - hospital suffers more fatalities from heart attacks than expected. 1. Calculate the critical value for heart attacks at hospital. Now as a two tailed test: 2. Calculate the lower critical value 3. Calculate the upper critical value
1. qbinom(0.95, 15, 0.04) 2. qbinom(0.025, 15, 0.04) 3. qbinom(0.975, 15, 0.04)
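
The critical values from the coin and hospital cards can be computed directly. This is a sketch with alpha = 0.05; the variable names are introduced here for illustration.

```r
# Coin example, two-tailed: alpha is split into 0.025 per tail
coin_lower <- qbinom(0.025, size = 100, prob = 0.5)
coin_upper <- qbinom(0.975, size = 100, prob = 0.5)

# Hospital example, one-tailed: all of alpha sits in the upper tail
hospital_one_tailed <- qbinom(0.95, size = 15, prob = 0.04)

# Hospital example, two-tailed
hospital_lower <- qbinom(0.025, size = 15, prob = 0.04)
hospital_upper <- qbinom(0.975, size = 15, prob = 0.04)

c(coin_lower, coin_upper, hospital_one_tailed, hospital_lower, hospital_upper)
```

An observed count is then compared with these values: a count beyond the critical value is evidence against the null hypothesis.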
68
Why would we use a one-tailed test over a two-tailed test when trying to figure out if the fatality rate from heart attack in a hospital is higher than expected?
We are not interested in whether the hospital has a lower than expected fatality rate
69
binom.test()
Function that performs an exact binomial test
70
What are the arguments in the binom.test() function?
binom.test(x, n = , p = ) - x is the observed number of a particular outcome - n is the number of trials - p is the hypothesised probability of success
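
A sketch of binom.test() applied to the coin and hospital examples from the earlier cards. Note that in R the trial count and probability arguments of binom.test() are named n and p, unlike size and prob in dbinom(), pbinom() and qbinom(). The variable names are introduced here for illustration.

```r
# Coin example: 59 heads in 100 tosses, two-sided test against p = 0.5
coin_test <- binom.test(59, n = 100, p = 0.5)
coin_test$p.value

# Hospital example: 3 deaths out of 15, one-sided test for a higher rate
hospital_test <- binom.test(3, n = 15, p = 0.04, alternative = "greater")
hospital_test$p.value
```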
71
What is used to compare the mean of one sample to a particular value?
One sample t test
72
What is the function for a t-test?
t.test()
73
What are the arguments for a one sample t-test?
t.test(DataFrameName$ColumnName, mu = ) - mu is the population mean to compare against - DataFrameName$ColumnName is the column holding the sample values
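
A minimal one-sample t-test sketch. The weights and the comparison value of 20 are made up for illustration; in practice the first argument would be a column such as mice$weight.

```r
# Made-up sample of weights, for illustration only
weights <- c(20.1, 22.4, 19.8, 21.7, 23.0, 20.9)

# Compare the sample mean with a hypothesised population mean of 20
result <- t.test(weights, mu = 20)

result$estimate    # the sample mean
result$parameter   # degrees of freedom (sample size minus 1)
result$p.value
```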
74
What test compares the means of two samples?
The two sample t-test
75
The weight of 10 mice for each cohort is stored in data1. The data is arranged with the weights stored in the weight column and a treatment column containing either control or drug. 1. Code a boxplot of weight grouped by treatment: 2. Perform a two sample t-test to compare the weights grouped by treatment option:
1. boxplot(weight ~ treatment, data = data1) 2. t.test(weight ~ treatment, data = data1) As we are doing the t-test on the same thing as we made the boxplot of, we can use the same arguments in both functions
76
What are the arguments for the function t.test() for a two sample t-test?
t.test(OutputVariable ~ VariableThisIsGroupedBy, data = ) - Where data is the name of the data frame
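
The boxplot and two-sample t-test from the mice card can be sketched as follows. The data1 data frame built here is a made-up stand-in with 5 mice per group rather than 10, purely to keep the example short.

```r
# Hypothetical data1: weights in one column, treatment group in another
data1 <- data.frame(
  weight    = c(24.1, 25.3, 23.8, 26.0, 24.7,
                22.0, 21.4, 23.1, 22.6, 21.9),
  treatment = factor(rep(c("control", "drug"), each = 5))
)

# Same formula and data argument in both calls
boxplot(weight ~ treatment, data = data1)
result <- t.test(weight ~ treatment, data = data1)

result$method     # which version of the t-test was run
result$estimate   # the mean of each group
result$p.value
```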
77
plot()
Function plots a scatter graph of data
78
What happens if the grouping variable used in the plot() function is a factor? How do you resolve this?
- If the grouping variable is a factor, R will produce a boxplot - Use the function “as.numeric(NameOfGroupingVariable)” to plot a scatter graph
79
Plot a scatter graph of output grouped by day from data2:
plot(output ~ as.numeric(day), data = data2)
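
A sketch of the scatter graph card, with a made-up data2 in which day is stored as a factor. Note that as.numeric() on a factor returns the position of each level (1, 2, 3, ...), which matches the day numbers here only because the levels are in that order.

```r
# Hypothetical data2: two measurements on each of three days
data2 <- data.frame(
  day    = factor(rep(c("1", "2", "3"), each = 2)),
  output = c(5.1, 4.8, 6.3, 6.0, 7.2, 7.5)
)

# With day as a factor, plot() would draw boxplots:
# plot(output ~ day, data = data2)

# Converting day to numbers gives a scatter graph instead
day_numbers <- as.numeric(data2$day)
plot(output ~ as.numeric(day), data = data2, col = "blue")
```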
80
How do you colour in points on a graph showing data?
By using the argument “col=“ to colour the data points according to the variable you want
81
How do we change the t-test so it becomes a paired t-test?
By adding the argument paired=TRUE
82
When is a paired t-test useful?
- Two sample t-tests compare the means of two groups - Sometimes the variation between replicates masks the effect of the independent variable (the one we are testing for a significant effect) - A paired t-test analyses the data in pairs to stop this happening
83
What is required in R for the paired t-test to work?
The data must be arranged in the same order in each group (sorted/grouped by the same variable)
84
When is the paired t-test used?
- When observations in one group can be paired with observations in the other group - There needs to be a reason why an observation in one group is more closely related to one particular observation than the other observations in the second group
85
When can observations in one group be paired with observations in the other group?
- The observations were performed on the same subject - The observations were performed at the same time
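
A paired t-test sketch with made-up measurements taken on the same five subjects before and after a treatment. The two vectors must be in the same subject order, so that position 1 in before pairs with position 1 in after.

```r
# Made-up paired measurements, in the same subject order
before <- c(12.1, 14.3, 11.8, 13.5, 15.0)
after  <- c(13.0, 15.1, 12.9, 14.2, 16.4)

# Paired test: analyses the difference within each pair
paired <- t.test(before, after, paired = TRUE)

# Equivalent one-sample test on the differences
on_differences <- t.test(before - after, mu = 0)

paired$p.value
on_differences$p.value
```

The two-vector form is used here because it makes the pairing explicit.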
86
qqnorm()
Function that plots the data as sample quantiles against theoretical quantiles (a Q-Q plot)
87
qqline()
Adds a reference line to a Q-Q plot; points lying close to the line suggest the data are normally distributed
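
A short Q-Q plot sketch using simulated data. The sample is drawn from a normal distribution with made-up parameters, so the points should fall close to the line.

```r
set.seed(1)                          # makes the simulated sample repeatable
x <- rnorm(50, mean = 10, sd = 2)    # simulated, illustrative data

qq <- qqnorm(x)   # sample quantiles against theoretical quantiles
qqline(x)         # adds the reference line to the existing plot
```

qqline() must be called after qqnorm(), because it draws on the plot that qqnorm() has just created.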
88
By default, R uses Welch’s t-test, which does not assume equal variance of the two populations from which the samples have been taken. How do you tell R to change this when the variances are equal? What effect does this have?
By adding the argument “var.equal = TRUE” - It increases the power of the test, but there is little advantage in most situations
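
The difference between the two versions of the test can be seen in their degrees of freedom. This sketch uses made-up weights for two groups of five.

```r
# Made-up weights for two groups, for illustration only
control <- c(24.1, 25.3, 23.8, 26.0, 24.7)
drug    <- c(22.0, 21.4, 23.1, 22.6, 21.9)

welch  <- t.test(control, drug)                     # default: Welch's t-test
pooled <- t.test(control, drug, var.equal = TRUE)   # assumes equal variances

welch$parameter    # Welch's adjusted degrees of freedom
pooled$parameter   # n1 + n2 - 2 degrees of freedom
```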