RStudio functions: Flashcards

1
Q

A + B + Enter button

A

Answer for A + B

2
Q

*

A

Multiply

3
Q

/

A

Divide

4
Q

^

A

To the power of

5
Q

Brackets

A

To group the separate operations used in one expression, so that R evaluates what is inside the brackets first

6
Q

A <- 3

A

Store the value 3 in variable A

7
Q

What is a scalar variable?

A

A variable storing a single value

8
Q

Variable (eg. A) + Enter

A

Displays value stored in variable

9
Q

C <- "apples"

A

Stores the word apples in variable C

10
Q

What are vectors?

A

Variables that can hold more than one value

11
Q

D <- c(3, 7, 1)

A

Stores the vector 3, 7, 1 in variable D

c stands for combine (the function name is lowercase)

12
Q

D[2]

A

Allows access to the second value stored in the vector variable D
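
The cards on variables and vectors can be tried together in the console. This is only a sketch: the names A, C, D mirror the cards, and `second` is a made-up name for illustration.

```r
# Store single values (scalars) in variables
A <- 3
C <- "apples"

# Combine several values into one vector
D <- c(3, 7, 1)

# Typing a name on its own prints what it holds
D

# Square brackets pick out an element by position (counting from 1)
second <- D[2]
second
```

R counts positions from 1, so D[2] is the middle value of this three-element vector.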

13
Q

What are data frames?

A

A table-like store for large amounts of data, arranged in rows and named columns

14
Q

Putting name of data frame

A

Displays the full set of data in data frame

15
Q

cars$mpg

A

Selects only the mpg column of the cars data frame (names are case-sensitive in R)

16
Q

DataFrameName[RowNumber, ColumnNumber]

A

Selects the value at the given row and column of the data frame; the row number always comes first and the column number second

17
Q

cars[ , 1]

A

When the row number is left blank, all the rows are returned.

In this example, every row of the first column of cars will be returned
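
The indexing cards can be tried on a built-in data set. This sketch assumes the deck's cars data frame resembles R's built-in mtcars (which has mpg and hp columns); the copy named cars is made here only for illustration and hides the unrelated built-in data set of the same name.

```r
# Assumption: mtcars stands in for the deck's "cars" data frame
cars <- mtcars

# One column by name, using $
mpg_column <- cars$mpg

# One cell: row 2, column 1
one_value <- cars[2, 1]

# Row index left blank: every row of column 1
first_column <- cars[ , 1]

# Column index left blank: every column of row 2
second_row <- cars[2, ]
```

In mtcars the first column is mpg, so cars$mpg and cars[ , 1] return the same values.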

18
Q

What is a function?

A

Anything that performs a particular operation on our data

19
Q

What is the format that all functions follow in RStudio?

A

function_name(argument)

20
Q

What is an argument?

A

The input for the function

21
Q

What would the function “mean(e)” do?

A

Give the mean/average of values in vector e

22
Q

What code would you use to calculate the mean mpg of cars?

A

mean(cars$mpg)

23
Q

How do you find the column names in a data frame?

A

By printing the whole data frame

24
Q

names(name of data frame)

A

Only shows the column names of a data frame

25
Q

head(name of data frame)

A

Only shows column names and top few rows of data frame

26
Q

Why might the function “head(name of data frame)” cause the full data frame to be shown?

A

The function shows the first 6 rows by default, so if the data frame has 6 or fewer rows then all of it will be shown

27
Q

How are arguments separated in all functions?

A

Using commas

28
Q

How do you define how many rows you want from a data frame?

A

head(name of data frame, n = )
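
A short sketch of the inspection functions from the last few cards. It assumes mtcars (built into R) can stand in for the deck's cars data frame; the variable names on the left are made up for illustration.

```r
# Assumption: mtcars stands in for the deck's "cars" data frame
cars <- mtcars

# Column names only
column_names <- names(cars)

# First 6 rows by default
top_default <- head(cars)

# Ask for a specific number of rows with n
top_three <- head(cars, n = 3)

# Arguments are separated by commas; na.rm is a second, optional argument
mean_mpg <- mean(cars$mpg, na.rm = TRUE)
```

Function names are case-sensitive, so head() works while Head() gives an error.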

29
Q

hist()

A

Creates a histogram

30
Q

Draw histogram of horsepower (hp) from cars:

A

hist(cars$hp)

31
Q

What is the standard layout of data?

A
  • Each column represents a different variable
  • Each row represents a different subject or replicate

32
Q

How is the standard layout of data different from the layout used by people when making spreadsheets?

A

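In spreadsheets, people often put each condition into a separate column; in the standard layout, the condition is recorded in a single column of its own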

33
Q

What is an advantage of the standard layout of data?

A

It is much easier to record additional variables in a data frame

34
Q

What is the advantage of using histograms over box plots?

A

Histograms show more information about the shape of the distribution

35
Q

What is the advantage of using boxplots over histograms?

A

It is easier to compare data presented as a box plot than as a histogram

36
Q

boxplot()

A

Function that plots a boxplot

37
Q

How do you code for a boxplot of only one variable (column) of the data?

A

boxplot(DataFrameName$ColumnName)

38
Q

What values are shown on a boxplot?

A
  • Median
  • Lower (1st) quartile
  • Upper (3rd) quartile
  • Lowest value in data (excluding any outliers, which R draws as separate points)
  • Highest value in data (excluding any outliers)
39
Q

How much of the data falls within each interval of a boxplot (generally speaking)?

A

A quarter of the data

40
Q

How much of the data falls within the interquartile range?

A

Half (50%) of the data

41
Q

When is the boxplot() function able to produce multiple plots?

A

If given a variable to group the data by

42
Q

Name the code needed to make multiple plots using the boxplot() function:

A

boxplot(NameOfDataFrame$a ~ NameOfDataFrame$b)

  • Where a is the variable that is plotted
  • Where b is the variable for grouping the data
43
Q

In the function “boxplot(NameOfDataFrame$a ~ NameOfDataFrame$b)”, what lies to the left and right of the “~” symbol?

A
  • The response variables lie to the left
  • The explanatory variables lie to the right
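
A sketch of the grouped boxplot described above. It assumes the built-in mtcars data frame as a stand-in, with mpg as the plotted (response) variable and cyl as the grouping (explanatory) variable; `box_stats` is a made-up name.

```r
# Response on the left of ~, grouping variable on the right
boxplot(mtcars$mpg ~ mtcars$cyl)

# The same plot written with the data argument
boxplot(mpg ~ cyl, data = mtcars)

# plot = FALSE returns the numbers behind the boxes instead of drawing them
box_stats <- boxplot(mpg ~ cyl, data = mtcars, plot = FALSE)
box_stats$names   # group labels
box_stats$n       # number of observations in each group
```

One box is drawn per distinct value of the grouping variable.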

44
Q

summary()

A

This function neatly and easily provides several summary statistics

45
Q

List the function that would be used to make summary statistics for weight from the mice data frame:

A

summary(mice$weight)

46
Q

List the function that would be used to make summary statistics from the whole mice data frame:

A

summary(mice)

47
Q

When sex is a variable in the data frame, what problem can be caused when trying to make summary statistics? What are possible solutions?

A
  • Sex isn’t a numerical variable
  • Any numbers used would have no intrinsic meaning so wouldn’t be very useful
  • Better summaries involve knowing the number of males or females
  • To do this, we would need to make sex a categorical variable in R
48
Q

What are categorical variables called in R?

A

Factors

49
Q

What function is used to convert numerical values to factors?

A

factor()

50
Q

List the function that would be used to convert the sex column from the data frame “mice” to a factor:

A

factor(mice$sex)

51
Q

When a variable has been converted into a factor, the outcome will not be stored unless R is told to store it. How do you store the data in this format?

A
  • By assigning the converted data to a column under the name of the converted variable
  • Here, it will overwrite the current column contents
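
A sketch of converting a column to a factor and storing the result. The mice data frame here is made up for illustration, with sex coded as the numbers 1 and 2.

```r
# Hypothetical data: sex coded as numbers, as the cards describe
mice <- data.frame(
  weight = c(21.5, 23.1, 19.8, 22.4, 20.9),
  sex    = c(1, 2, 2, 1, 2)
)

class(mice$sex)          # numeric before conversion

# factor() alone only prints the result; assign it back to keep it
mice$sex <- factor(mice$sex)

class(mice$sex)          # now a factor
sex_counts <- summary(mice$sex)   # counts per category instead of a mean
sex_counts
```

After the conversion, summary() reports how many mice fall into each category rather than a meaningless average of the codes.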
52
Q

class()

A

Function that tells you what kind of variable R has stored (for example numeric, character or factor)

53
Q

dbinom(x, size = , prob = )

A

Function that gives the probability of exactly x successes in a binomial distribution

Where x, size and prob are the arguments

54
Q

List the function for finding the probability of getting 3 heads after tossing a coin 4 times:

A

dbinom(3, 4, 0.5)
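
The answer above can be checked against the binomial formula. A minimal sketch; `p_three` and `by_hand` are made-up names.

```r
# Probability of exactly 3 heads in 4 tosses of a fair coin
p_three <- dbinom(3, size = 4, prob = 0.5)
p_three

# The same value by hand: choose(4, 3) ways, each with probability 0.5^3 * 0.5^1
by_hand <- choose(4, 3) * 0.5^3 * 0.5^1
by_hand
```

Both lines give 0.25: there are 4 ways to place the single tail, and each sequence has probability 1/16.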

55
Q

pbinom()

A

Function that calculates the cumulative probability of observing up to and including a certain number of successes or events

56
Q

What are the arguments for pbinom?

A

pbinom(x, size = , prob = )

  • x = observed number of successes
  • size = number of trials
  • prob = probability of success on each trial
57
Q

What is the probability of getting up to 3 heads after tossing 6 different coins?

A

pbinom( 3, 6, 0.5)

58
Q

15 people are admitted to hospital with a heart attack. 4 in 100 people who have a heart attack die. What is the probability that more than 4 of the 15 die?

A

1- pbinom(4, 15, 0.04)
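
A sketch tying the last two answers together. The variable names are made up, and `lower.tail = FALSE` is an extra argument not used in the cards, shown only as an equivalent way to get the upper tail.

```r
# Up to 3 heads from 6 coins: P(X <= 3)
p_up_to_3 <- pbinom(3, size = 6, prob = 0.5)

# pbinom is the running total of dbinom
p_summed <- sum(dbinom(0:3, size = 6, prob = 0.5))

# More than 4 deaths out of 15 when each has a 0.04 chance: 1 - P(X <= 4)
p_more_than_4 <- 1 - pbinom(4, size = 15, prob = 0.04)

# Equivalent: ask for the upper tail directly
p_upper <- pbinom(4, size = 15, prob = 0.04, lower.tail = FALSE)
```

Because pbinom() includes the value given, "more than 4" is 1 minus "up to and including 4".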

59
Q

pnorm()

A

Calculates cumulative probability of a normal distribution

60
Q

What is a similarity and difference between the pnorm() and pbinom() functions?

A
  • They both calculate cumulative probability
  • pnorm() is used for normal distributions and pbinom() is used for binomial distributions

61
Q

What are the arguments for pnorm()?

A

pnorm(x, mean, sd)

  • Where x is the observation made
  • Where mean is the mean of the population
  • Where sd is the standard deviation
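
A sketch of pnorm() with the three arguments listed above. The population (mean 100, standard deviation 15) is made up for illustration.

```r
# Probability of an observation at or below 115 (one sd above the mean)
p_below <- pnorm(115, mean = 100, sd = 15)

# Half of a normal distribution lies below its mean
p_half <- pnorm(100, mean = 100, sd = 15)

# With mean and sd left out, R uses the standard normal (mean 0, sd 1)
p_standard <- pnorm(1)
```

p_below and p_standard are equal, since 115 is exactly one standard deviation above the mean.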
62
Q

The probability that two people share the same birthday in a cohort of 150 is being calculated. Explain why the function below is incorrect.
dbinom(2, 150, 0.00273973)

A
  • size has to be the number of trials; here one person's birthday is being compared with the 149 other birthdays
  • This means the number of trials is 149
  • As x is the number of successes, and a success is 1 other person having the same birthday, x = 1
  • The corrected function is dbinom(1, 149, 0.00273973)
63
Q

The probability of sharing a birthday with at least two people in a cohort of 150 is being calculated. Why is the following function incorrect?

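pbinom(2, 149, 0.00273973)

A
  • Mutual exclusivity must be taken into account: sharing with none or one person, and sharing with two or more, are mutually exclusive outcomes whose probabilities add up to 1
  • pbinom(1, 149, 1/365) is the probability of sharing a birthday with none or one other person
  • Taking that value away from 1 finds the probability of sharing a birthday with more than 1 person: 1 - pbinom(1, 149, 1/365)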
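
The two birthday cards can be sketched as follows, assuming 365 equally likely birthdays (the 0.00273973 in the cards is 1/365). The variable names are made up.

```r
p_same <- 1 / 365   # chance that one other person shares your birthday

# Exactly one of the 149 others shares your birthday
p_exactly_one <- dbinom(1, size = 149, prob = p_same)

# None or one of the 149 others shares your birthday
p_none_or_one <- pbinom(1, size = 149, prob = p_same)

# At least two others share your birthday
p_two_or_more <- 1 - p_none_or_one
```

The complement works because "none or one" and "two or more" cover every possible outcome between them.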
64
Q

qbinom()

A

Function that calculates the critical value of a binomial distribution at a specific alpha

65
Q

What are the arguments in the function qbinom()?

A

qbinom(alpha, size = , prob = )

  • alpha is the cumulative probability at which the critical value is found (for example 0.95 for a one-tailed test at the 5% level)
  • size is the number of trials
  • prob is the probability of success
66
Q

Out of 100 tosses of a coin, 59 were heads. With a value of alpha being 0.05, calculate the critical values given a two-tailed test:

A

The lower critical value:
qbinom(0.025, 100, 0.5)

The upper critical value:
qbinom(0.975, 100, 0.5)

Remember that in a two-tailed test the significance level of 0.05 is shared between both tails, so each tail uses 0.025
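
A sketch of the two-tailed calculation for the coin example. The names `alpha`, `lower_crit`, `upper_crit`, `observed` and `outside` are made up for illustration.

```r
alpha <- 0.05

# Two-tailed test: split alpha between the tails
lower_crit <- qbinom(alpha / 2, size = 100, prob = 0.5)
upper_crit <- qbinom(1 - alpha / 2, size = 100, prob = 0.5)

observed <- 59

# Is the observed count more extreme than either critical value?
outside <- observed < lower_crit || observed > upper_crit
outside
```

The critical values come out as 40 and 60, so 59 heads lies inside them and is not unusual enough to reject a fair coin at this level.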

67
Q

4 in every 100 people who suffer a heart attack die. 15 people are admitted to a hospital and 3 of them die. A doctor is scared this is abnormally high, so finds 4% of 15 (which turns out to be 0.6) but is not sure what this means.

Null hypothesis - the hospital does not suffer more fatalities from heart attacks than expected.

Alternative hypothesis - the hospital suffers more fatalities from heart attacks than expected.

  1. Calculate the critical value for heart attacks at the hospital.

Now as a two-tailed test:

  2. Calculate the lower critical value
  3. Calculate the upper critical value
A
  1. qbinom(0.95, 15, 0.04)
  2. qbinom(0.025, 15, 0.04)
  3. qbinom(0.975, 15, 0.04)
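
Running the three answers shows what the doctor's numbers mean. A sketch only; the variable names are made up.

```r
# One-tailed test at alpha = 0.05
crit_one_tailed <- qbinom(0.95, size = 15, prob = 0.04)

# Two-tailed test: 0.025 in each tail
lower_crit <- qbinom(0.025, size = 15, prob = 0.04)
upper_crit <- qbinom(0.975, size = 15, prob = 0.04)

deaths <- 3

# TRUE means the observed deaths exceed the one-tailed critical value
deaths > crit_one_tailed
```

The one-tailed critical value is 2, so 3 deaths out of 15 is more than would be expected by chance at the 5% level.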
68
Q

Why would we use a one-tailed test over a two-tailed test when trying to figure out if the fatality rate from heart attack in a hospital is higher than expected?

A

We are not interested in whether the hospital has a lower than expected fatality rate

69
Q

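binom.test()

A

Function that performs an exact binomial test, comparing an observed number of successes with the number expected under a given probability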

70
Q

What are the arguments in the binom.test() function?

A

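binom.test(x, n = , p = )

  • x is the observed number of a particular outcome
  • n is the number of trials
  • p is the probability of success

Note that, unlike dbinom(), pbinom() and qbinom(), this function names the arguments n and p rather than size and prob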
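
A sketch of binom.test() on the two examples from the earlier cards. The argument names x, n and p are those of base R's binom.test() as far as I know; `alternative = "greater"` is an extra argument, not mentioned in the cards, that asks for a one-tailed test.

```r
# 59 heads in 100 tosses of a coin assumed fair under the null hypothesis
coin_test <- binom.test(59, n = 100, p = 0.5)
coin_test$p.value

# 3 deaths among 15 patients when 4% is expected; one-tailed
ward_test <- binom.test(3, n = 15, p = 0.04, alternative = "greater")
ward_test$p.value

# The one-tailed p-value is P(X >= 3) = 1 - P(X <= 2)
by_hand <- 1 - pbinom(2, size = 15, prob = 0.04)
```

The coin result is not significant at the 5% level, while the hospital result is, which agrees with the critical values found with qbinom().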
71
Q

What is used to compare the mean of one sample to a particular value?

A

The one sample t-test

72
Q

What is the function for a t-test?

A

t.test()

73
Q

What are the arguments for a one sample t-test?

A

t.test(DataFrameName$ColumnName, mu = )
- mu is the population mean to compare against
- DataFrameName$ColumnName is the column of the data frame that holds the sample
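
A sketch of a one sample t-test, assuming the mpg column of the built-in mtcars data frame as the sample and a made-up population mean of 20.

```r
# Compare the sample mean of mpg with a hypothetical population mean of 20
one_sample <- t.test(mtcars$mpg, mu = 20)

one_sample$estimate    # the sample mean
one_sample$parameter   # degrees of freedom (number of observations - 1)
one_sample$p.value
```

Printing one_sample on its own shows the full report, including the confidence interval.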

74
Q

What test compares the means of two samples?

A

The two sample t-test

75
Q

The weight of 10 mice for each cohort is stored in data1. The data is arranged with the weights stored in the weight column and a treatment column containing either control or drug.

  1. Code a boxplot of weight grouped by treatment:
  2. Perform a two sample t-test to compare the weights grouped by treatment option:
A
  1. boxplot(weight ~ treatment, data = data1)
  2. t.test(weight ~ treatment, data = data1)

As we are doing the t-test on the same thing as we made the boxplot of, we can use the same arguments in both functions
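
A runnable sketch of this card. The real data1 is not available, so made-up weights are generated here; only the column names weight and treatment come from the card.

```r
set.seed(1)  # made-up data, fixed so the example is repeatable
data1 <- data.frame(
  weight    = c(rnorm(10, mean = 25, sd = 2), rnorm(10, mean = 27, sd = 2)),
  treatment = rep(c("control", "drug"), each = 10)
)

# Same formula and data argument in both calls
boxplot(weight ~ treatment, data = data1)
two_sample <- t.test(weight ~ treatment, data = data1)

two_sample$method     # which version of the t-test R ran
two_sample$estimate   # the mean of each group
```

Looking at the boxplot first gives a feel for the difference that the t-test then puts a p-value on.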

76
Q

What are the arguments for the function t.test() for a two sample t-test?

A

t.test(OutputVariable ~ VariableThisIsGroupedBy, data = )

- Where data is the name of the data frame

77
Q

plot()

A

Function plots a scatter graph of data

78
Q

What happens if the grouping variable used in the plot() function is a factor? How do you resolve this?

A
  • If the grouping variable is a factor, R will produce a boxplot
  • Use the function “as.numeric(NameOfGroupingVariable)” to plot a scatter graph
79
Q

Plot a scatter graph of output grouped by day from data2:

A

plot(output ~ as.numeric(day), data = data2)

80
Q

How do you colour in points on a graph showing data?

A

By using the argument “col = ” to colour the data points according to the variable you want

81
Q

How do we change the t-test so it becomes a paired t-test?

A

By adding the argument paired=TRUE

82
Q

When is a paired t-test useful?

A
  • Two sample t-tests compare the means of two groups
  • Sometimes the variation between replicates masks the effect of the independent variable (the one we are testing for a significant effect)
  • The paired t-test analyses the data in pairs to stop this happening
83
Q

What is required in R for the paired t-test to work?

A

The data must be arranged in the same order in each group (sorted/grouped by the same variable)
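
A sketch of why the order matters, using two made-up vectors of measurements on the same five subjects. It passes the two groups as separate vectors rather than as a formula, which I believe is the form accepted across R versions.

```r
# Position i in each vector belongs to subject i, so the order must match
before <- c(12.1, 14.3, 11.8, 13.5, 12.9)
after  <- c(12.9, 15.0, 12.1, 14.4, 13.6)

paired <- t.test(after, before, paired = TRUE)

# A paired t-test is a one sample t-test on the within-pair differences
on_differences <- t.test(after - before, mu = 0)

# Ignoring the pairing lets the subject-to-subject variation swamp the effect
unpaired <- t.test(after, before)

paired$p.value
unpaired$p.value
```

If the values in one vector were shuffled, the differences would be taken between the wrong subjects and the paired result would change.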

84
Q

When is the paired t-test used?

A
  • When observations in one group can be paired with observations in the other group
  • There needs to be a reason why an observation in one group is more closely related to one particular observation than the other observations in the second group
85
Q

When can observations in one group be paired with observations in the other group?

A
  • The observations were performed on the same subject
  • The observations were performed at the same time

86
Q

qqnorm()

A

Function that plots the data as a graph of sample quantiles against theoretical quantiles (Q-Q plot)

87
Q

qqline()

A

Adds a reference line to a Q-Q plot; points lying close to the line suggest the data is normally distributed
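
A sketch of the two Q-Q functions used together, on made-up data drawn from a normal distribution.

```r
set.seed(3)       # fixed so the example is repeatable
x <- rnorm(50)    # 50 made-up values from a normal distribution

qqnorm(x)         # draw the Q-Q plot
qqline(x)         # add the reference line to the plot just drawn

# plot.it = FALSE returns the plotted coordinates without drawing
qq <- qqnorm(x, plot.it = FALSE)
length(qq$x)      # one theoretical quantile per observation
```

qqline() must be called after qqnorm(), because it adds to the existing plot rather than creating a new one.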

88
Q

By default, R uses Welch’s t-test, which does not assume equal variance of the two populations from which the samples have been taken. How do you tell R to change this when the variances are equal? What effect does this have?

A

By adding the argument “var.equal = TRUE”

  • It increases the power of the test, but there is little advantage in most situations
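
A sketch comparing the two versions of the test on made-up data; the group names control and drug are only for illustration.

```r
set.seed(2)  # made-up data, fixed so the example is repeatable
control <- rnorm(10, mean = 25, sd = 2)
drug    <- rnorm(10, mean = 27, sd = 2)

welch  <- t.test(control, drug)                    # default: Welch's t-test
pooled <- t.test(control, drug, var.equal = TRUE)  # assumes equal variances

welch$method
pooled$method

welch$parameter    # degrees of freedom, usually not a whole number
pooled$parameter   # n1 + n2 - 2 = 18
```

The equal-variance version has at least as many degrees of freedom as Welch's, which is where the small gain in power comes from.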