R-Code for Exam, Modules 1-8 Flashcards

Question

colnames(DATASET_NAME)

Answer 1

provides the names of all of the columns in a data set

Answer 2

provides the number of columns and the number of rows in the data set

Answer 3

provides the internal structure of the data set

Answer 4

provides the range in values of a certain variable from the dataset
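
The code sides of these first cards are not shown. A minimal sketch, assuming they correspond to the base R functions colnames(), dim(), str(), and range(); the data frame and its column names are made up for illustration:

```r
# Hypothetical data set for illustration only
DATASET_NAME <- data.frame(height = c(150, 162, 171, 180),
                           weight = c(52, 60, 68, 77))

colnames(DATASET_NAME)       # names of all columns
dim(DATASET_NAME)            # number of rows, then number of columns
str(DATASET_NAME)            # internal structure of the data set
range(DATASET_NAME$height)   # smallest and largest value of one variable
```

Note that dim() reports rows first and columns second.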

Answer 5

provides the X% quantile of a certain variable from the dataset, which is to say that X% of the other observations are below it and (100-X)% of the observations are above it ("X%" is expressed in decimal form, not as a percentage)

Answer 6

provides the X%, Y%, and Z% quantiles for a certain variable from the dataset, with X% < Y% < Z%
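
A minimal sketch of the two quantile cards, assuming they refer to base R's quantile(); the values in VARIABLE are made up for illustration:

```r
# Hypothetical variable for illustration only
VARIABLE <- c(2, 4, 4, 5, 7, 9, 10, 12, 15, 20)

quantile(VARIABLE, 0.25)                  # one quantile; 0.25 means 25%
quantile(VARIABLE, c(0.25, 0.50, 0.75))   # several quantiles in one call
```

The second call passes the probabilities as a vector built with c(), in increasing order.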

Answer 7

provides all of the observed values (or names) for a variable from the data set

Answer 8

for all of the unique entries of a given variable, this command tabulates the number of times each one appears and displays the counts in the "Console" area
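
A minimal sketch, assuming these two cards refer to the $ operator and to table(); the data frame and the column name "farm" are made up for illustration:

```r
# Hypothetical data set for illustration only
DATASET_NAME <- data.frame(farm  = c("A", "B", "A", "C", "A", "B"),
                           yield = c(3.1, 2.8, 3.4, 2.9, 3.0, 2.7))

DATASET_NAME$farm          # every observed value of the variable
table(DATASET_NAME$farm)   # how many times each unique entry appears
```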

Answer 9

would produce a list of all integer numbers between the lower integer "X" and upper integer "Y"

Answer 10

code we can write to produce a new data set with fewer columns, which includes only the columns earmarked by a list of integers between "X" and "Y" in the object called "indexes"

Answer 11

code we can write to produce a new data set with fewer rows, which includes only the rows earmarked by a list of integers between "X" and "Y" in the object called "indexes"

Answer 12

code we can write to produce a new data set with fewer columns and rows, which includes only the rows and columns earmarked by a list of integers between "X" and "Y" in the object called "indexes"
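
The three subsetting cards can be sketched as follows, assuming "indexes" is built with the colon operator; the data frame and the names fewer_cols, fewer_rows, and fewer_both are made up for illustration:

```r
# Hypothetical data set for illustration only
DATASET_NAME <- data.frame(a = 1:6, b = 11:16, c = 21:26, d = 31:36)

indexes <- 2:3   # all integers from X = 2 to Y = 3

fewer_cols <- DATASET_NAME[, indexes]         # keep only columns 2 and 3
fewer_rows <- DATASET_NAME[indexes, ]         # keep only rows 2 and 3
fewer_both <- DATASET_NAME[indexes, indexes]  # rows 2 to 3 of columns 2 to 3
```

Inside the square brackets, the position before the comma selects rows and the position after it selects columns.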

Answer 13

provides a title for the histogram when typed inside the parentheses of the "hist(DATASET_NAME)" command, after a comma placed after the text "DATASET_NAME"

Answer 14

provides an x-axis label for the histogram when typed inside the parentheses of the "hist(DATASET_NAME)" command, after a comma placed after the text "DATASET_NAME"

Answer 15

provides a y-axis label for the histogram when typed inside the parentheses of the "hist(DATASET_NAME)" command, after a comma placed after the text "DATASET_NAME"
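
A minimal sketch of the three histogram-label cards, assuming they refer to the main, xlab, and ylab arguments of hist(); the values and label text are made up for illustration:

```r
# Hypothetical variable for illustration only
VARIABLE <- c(12, 15, 15, 18, 20, 21, 21, 22, 25, 30)

hist(VARIABLE,
     main = "Distribution of VARIABLE",   # title
     xlab = "Value",                      # x-axis label
     ylab = "Number of observations")     # y-axis label
```

Each argument follows a comma inside the parentheses, and the order of the named arguments does not matter.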

Answer 16

would provide the standard deviation or mean for two variables in the data set, with "VARIABLE_Y" being the y-variable and "VARIABLE_X" being the x-variable

Answer 17

we have a data set which has rows with values "NA" under certain variables, and we want to exclude these from "NEW_DATASET"
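
One way to do this is na.omit(), which drops every row containing at least one NA. Whether the card intends na.omit() is an assumption, and the data frame is made up for illustration:

```r
# Hypothetical data set with missing values
DATASET_NAME <- data.frame(x = c(1, 2, NA, 4),
                           y = c(10, NA, 30, 40))

NEW_DATASET <- na.omit(DATASET_NAME)   # keep only the complete rows
```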

Answer 18

would create a scatter-plot relating an explanatory variable to a response variable for a new dataset, "NEW_DATASET".

Answer 19

line of code which, if typed inside of the parentheses in the "plot()" command, will change the plotting symbol that marks each coordinate from the default open circle to something else

Answer 20

two arguments which, if typed inside of the parentheses in the "plot()" command, will set the range of the x-axis and of the y-axis for the viewing window

Answer 21

line of code written on the inside of the parentheses in the command "plot()" to place a desired name at a particular set of coordinates
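
A minimal sketch of the scatter-plot cards, assuming they refer to the pch, xlim, and ylim arguments of plot(). The label card is assumed to mean text(); in base R, text() is run as its own command after plot() rather than inside its parentheses. The data and the label are made up for illustration:

```r
# Hypothetical data set for illustration only
NEW_DATASET <- data.frame(VARIABLE_X = c(1, 2, 3, 4, 5),
                          VARIABLE_Y = c(2.1, 3.9, 6.2, 7.8, 10.1))

plot(NEW_DATASET$VARIABLE_X, NEW_DATASET$VARIABLE_Y,
     pch  = 16,           # filled circles instead of open circles
     xlim = c(0, 6),      # range of the x-axis
     ylim = c(0, 12))     # range of the y-axis

text(x = 3, y = 8, labels = "sample 3")   # label placed at (3, 8)
```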

Answer 22

code which fixes the random seed, so that the same set of random numbers is generated each time and can be reused later on

Answer 23

we want to establish a normal distribution curve (called "norm_dist"), which has an explicit number of observations (n), mean value of observations (mu), and an explicit standard deviation (sigma) -- what are the four lines of code necessary to establish "norm_dist"?

Answer 24

code to establish a sequence of numbers (called "brk_points") which is bounded between two values (LOW, HIGH) and increases in steps of 'SIZE'

Answer 25

command which would create a histogram for the normal distribution "norm_dist", whose x-axis is bound between (LOW, HIGH) and which has a width of the bins defined by the number sequence from the previous question
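
The seed, normal-sample, break-point, and histogram cards can be sketched together, assuming they refer to set.seed(), rnorm(), seq(), and the breaks argument of hist(). The seed and parameter values are made up, and LOW and HIGH are taken from the data here because hist() stops with an error when the breaks do not span every value:

```r
set.seed(42)   # same seed gives the same random draws

n     <- 1000  # number of observations
mu    <- 50    # mean
sigma <- 5     # standard deviation
norm_dist <- rnorm(n, mean = mu, sd = sigma)

# Assumption: bounds chosen from the data so the breaks cover all values
LOW  <- floor(min(norm_dist))
HIGH <- ceiling(max(norm_dist))
SIZE <- 1
brk_points <- seq(LOW, HIGH, by = SIZE)

hist(norm_dist, xlim = c(LOW, HIGH), breaks = brk_points)
```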

Answer 26

if the averages (mu) for various normal distributions are all the same value (x) but the standard deviations differ, the curves with smaller standard deviations will be taller and thinner (reduced variation) and those with larger standard deviations will be wider and flatter (greater variation)

Answer 27

if averages (mu) for various normal distributions are different, but the standard deviations are all the same value (y), the curves will have the same overall shape but they will be centered at different parts of the x-axis
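
Both effects can be checked with dnorm(), which gives the height of a normal curve at each x. This is an illustrative sketch with made-up means and standard deviations, not code from the cards:

```r
x <- seq(-10, 10, by = 0.1)

# Same mean, different standard deviations
narrow <- dnorm(x, mean = 0, sd = 1)
wide   <- dnorm(x, mean = 0, sd = 3)

# Same standard deviation, different means
left  <- dnorm(x, mean = -3, sd = 1)
right <- dnorm(x, mean = 3, sd = 1)

plot(x, narrow, type = "l")   # tall, thin curve
lines(x, wide, lty = 2)       # wider, flatter curve
lines(x, left, lty = 3)       # same shape as narrow, centered at -3
lines(x, right, lty = 4)      # same shape as narrow, centered at 3
```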

Answer 28

intercept ; slope ; measure of the spread of frequency ; sample size

Answer 29

situation in which we have a dataset with an explanatory variable with multiple unique traits (producer 1, producer 2, etc.), and we want to establish a dataset which includes only the responses associated with one of those unique representatives (i.e., all of the y-outputs associated with producer 1, or all of the y-outputs associated with producer 2)

Answer 30

two commands which establish a normal Q-Q plot for the relevant data we want to inspect for the normal distribution
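
A minimal sketch of the subsetting and Q-Q plot cards, assuming they refer to which(), qqnorm(), and qqline(); the data frame, its column names, and the objects "query" and "index" are made up for illustration:

```r
# Hypothetical data set for illustration only
DATASET_NAME <- data.frame(
  producer = c("P1", "P2", "P1", "P2", "P1", "P2"),
  output   = c(4.2, 5.1, 3.9, 5.4, 4.5, 4.9)
)

query <- DATASET_NAME$producer == "P1"   # TRUE where the row is producer 1
index <- which(query)                    # row numbers of those TRUE values
RELEVANT_DATASET <- DATASET_NAME$output[index]

qqnorm(RELEVANT_DATASET)   # sample quantiles against normal quantiles
qqline(RELEVANT_DATASET)   # reference line for judging normality
```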

Answer 31

command which would perform a common (base-10) log transformation on the list of values in "RELEVANT_DATASET", then place those values in the object "log_RELEVANT"

Answer 32

expresses the possibility that a log transformation of the original sample values may adhere better to the normal distribution than the original values of the samples
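
A minimal sketch of the log-transformation card using log10(), with made-up values. In R, log() on its own is the natural log, so the common log needs log10():

```r
# Hypothetical values for illustration only
RELEVANT_DATASET <- c(1, 10, 100, 1000)

log_RELEVANT <- log10(RELEVANT_DATASET)   # common (base-10) log

qqnorm(log_RELEVANT)   # re-check normality on the transformed values
qqline(log_RELEVANT)
```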

Answer 33

code to run to perform a t-test on a dataset, with "mu" giving the hypothesized value of the population mean and "alternative" specifying whether the alternative hypothesis is that the true mean is "less" than, "greater" than, or (by default) "two.sided" different from that value
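
A minimal sketch of a one-sample t-test with t.test(); the sample values and the claimed mean of 10 are made up for illustration:

```r
# Hypothetical sample for illustration only
DATASET_NAME <- c(9.8, 10.2, 9.5, 9.9, 10.1, 9.7, 9.6, 10.0)

# Alternative hypothesis: the true mean is less than 10
result <- t.test(DATASET_NAME, mu = 10, alternative = "less")

result$statistic   # t-statistic
result$parameter   # degrees of freedom
result$p.value     # P-value
```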

Answer 34

makes an object called "numerator_1sided" which is the mean of all of the values in the sample, minus the mean in the population (or in a claim made by a vendor)

Answer 35

establishes an object ("n") which holds the number of samples in the study with data

Answer 36

makes an object called "denominator_1sided" which is the standard deviation of the sampled values, divided by the square root of the number of samples present

Answer 37

creates the object "T_statistic_1sided" for the manual calculation of the T-statistic associated with a manually-performed T-test -- "T_statistic_1sided" makes use of two previously-generated values, "numerator_1sided" and "denominator_1sided"

Answer 38

creates the object "df_1sided", which represents the degrees of freedom present in a manually-generated t-test, based on the length of the dataset (HINT; the object "df_1sided" and the object "n" differ in regard to only one thing)

Answer 39

allows one to calculate the P-value for a one-sided t-statistic, based on the pre-established objects "T_statistic_1sided" and "df_1sided"
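
The manual one-sided calculation can be sketched as below, using the object names from the cards. The sample, the value of mu, the name "p_value_1sided", and the choice of a lower-tail test are assumptions made for illustration:

```r
# Hypothetical sample and claimed mean
DATASET_NAME <- c(9.8, 10.2, 9.5, 9.9, 10.1, 9.7, 9.6, 10.0)
mu <- 10

numerator_1sided   <- mean(DATASET_NAME) - mu
n                  <- length(DATASET_NAME)
denominator_1sided <- sd(DATASET_NAME) / sqrt(n)
T_statistic_1sided <- numerator_1sided / denominator_1sided
df_1sided          <- n - 1

# Lower-tail P-value, matching alternative = "less"
p_value_1sided <- pt(T_statistic_1sided, df_1sided)
```

For an upper-tail test (alternative = "greater"), add lower.tail = FALSE inside pt().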

Answer 40

set of six commands we could input using the query~index~new dataset method to split a larger dataset with a category with two representatives (Producer 1 and Producer 2; Producer A and Producer B) into two new datasets with just their values

Answer 41

command to run a two-sided t-test which compares the response values from one category of the explanatory variable (DATA_CAT_1/A) with the response values from the other category (DATA_CAT_2/B)
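
A minimal sketch of the six query~index~new dataset commands followed by the two-sample test. The data frame, its column names, and the object names are made up. Setting var.equal = TRUE is an assumption chosen to match a pooled-variance hand calculation; R's default is the Welch test:

```r
# Hypothetical data set for illustration only
DATASET_NAME <- data.frame(
  producer = c("A", "B", "A", "B", "A", "B", "A", "B"),
  output   = c(4.2, 5.1, 3.9, 5.4, 4.5, 4.9, 4.1, 5.2)
)

query_A    <- DATASET_NAME$producer == "A"
index_A    <- which(query_A)
DATA_CAT_A <- DATASET_NAME$output[index_A]

query_B    <- DATASET_NAME$producer == "B"
index_B    <- which(query_B)
DATA_CAT_B <- DATASET_NAME$output[index_B]

# Two-sided test is the default value of "alternative"
result <- t.test(DATA_CAT_A, DATA_CAT_B, var.equal = TRUE)
```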

Answer 42

creates an object ("numerator_2sided") which can be used to calculate the numerator in order to derive the t-statistic by hand for a two-sided test

Answer 43

creates two objects which have the number of samples in the two unique categories used for our two-sided t-test (i.e., Farm 1 and Farm 2; Producer A and Producer B)

Answer 44

creates the object "df_2sided", which represents the degrees of freedom present in a manually-generated t-test, based on the length of the dataset (HINT; the object "df_2sided" is related to the objects "n_1/A" and "n_2/B")

Answer 45

list of commands used to find the standard deviation of the samples ("samp_sig"), which is the square root of the calculated pooled variance for DATA_CAT_1/A and DATA_CAT_2/B ("samp_sig2")

Answer 46

the denominator in a manually calculated t-statistic for a two-sided t-test ("denominator_2sided") equals our derived standard deviation of the samples ("samp_sig") times the square root of the sum of the reciprocals of the number of samples in Category 1/A and in Category 2/B

Answer 47

the t-statistic in a two-sided t-test is equal to the 2-sided numerator divided by the 2-sided denominator, which were previously worked out

Answer 48

code used to calculate the P-value associated with a two-sided t-test
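
The manual two-sample calculation can be sketched as below, using the object names from the cards where they are given. The data and the names "n_A", "n_B", "T_statistic_2sided", and "p_value_2sided" are made up, and a pooled-variance (equal variances) test is assumed:

```r
# Hypothetical samples for illustration only
DATA_CAT_A <- c(4.2, 3.9, 4.5, 4.1)
DATA_CAT_B <- c(5.1, 5.4, 4.9, 5.2)

numerator_2sided <- mean(DATA_CAT_A) - mean(DATA_CAT_B)
n_A <- length(DATA_CAT_A)
n_B <- length(DATA_CAT_B)
df_2sided <- n_A + n_B - 2

# Pooled variance, then its square root
samp_sig2 <- ((n_A - 1) * var(DATA_CAT_A) +
              (n_B - 1) * var(DATA_CAT_B)) / df_2sided
samp_sig  <- sqrt(samp_sig2)

denominator_2sided <- samp_sig * sqrt(1 / n_A + 1 / n_B)
T_statistic_2sided <- numerator_2sided / denominator_2sided

# Two-sided P-value: both tails of the t distribution
p_value_2sided <- 2 * pt(-abs(T_statistic_2sided), df_2sided)
```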

Answer 49

command to run a paired two-sided t-test (as opposed to the default unpaired two-sided t-test) which relates the values of one variable (TREAT_1) to the values of another variable (TREAT_2)
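
A minimal sketch of a paired test with the paired argument of t.test(); the treatment values are made up for illustration:

```r
# Hypothetical paired measurements (same subjects, two treatments)
TREAT_1 <- c(12.1, 11.8, 13.0, 12.5, 11.9)
TREAT_2 <- c(11.5, 11.9, 12.2, 12.0, 11.4)

result <- t.test(TREAT_1, TREAT_2, paired = TRUE)
result$p.value
```

A paired test gives the same result as a one-sample t-test on the pairwise differences TREAT_1 - TREAT_2 with mu = 0.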

(73 cards)