Test 2 - Reproducible Research with R Flashcards

1
Q

What is a “makefile”? Why do we use them?

A

A makefile is a file of build rules read by the make utility; each rule names a target, the files it depends on, and the commands that build it. Makefiles let us split a project into separate parts and reassemble them later, rerunning only the steps whose inputs have changed.

2
Q

What R command is used to tell R where to look for and place files?

A

The “setwd” command sets R’s working directory, which is where R looks for and places files by default.
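A minimal sketch of how the working directory affects relative paths. The scratch directory from tempdir() and the file name example.txt are assumptions chosen only for illustration.

```r
# Remember the current working directory so it can be restored afterwards.
old_wd <- getwd()

# Switch to a scratch directory (tempdir() is used here only for illustration).
setwd(tempdir())

# Relative paths are now resolved against the new working directory.
writeLines("hello", "example.txt")
found <- file.exists(file.path(tempdir(), "example.txt"))

# Restore the original working directory.
setwd(old_wd)
```

Restoring the old directory at the end keeps the sketch from changing where the rest of a session reads and writes files.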

3
Q

What R command is used to tell R to run code in an R source code file?

A

The “source” command is used to tell R to run code in an R source code file.
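A small sketch of “source” in use. The script contents and the object name answer are hypothetical; the script is written to a temporary file so the example is self-contained.

```r
# Write a tiny script to a temporary file (the contents are hypothetical).
script_path <- tempfile(fileext = ".R")
writeLines("answer <- 6 * 7", script_path)

# source() runs the code in the file. Passing an environment via `local`
# keeps the objects the script creates out of the global workspace.
env <- new.env()
source(script_path, local = env)

print(env$answer)
```

Without the `local` argument, source() evaluates the file in the global environment, which is the more common usage in analysis scripts.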

4
Q

The factor command converts non-factor variables into _____________ variables.

A

factor
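As an illustration, a short sketch of converting a character vector into a factor. The survey responses are made-up data.

```r
# A character vector of hypothetical survey responses.
responses <- c("low", "high", "medium", "low")

# Convert to a factor; levels are given explicitly to control their order.
responses_f <- factor(responses, levels = c("low", "medium", "high"))

print(levels(responses_f))  # the level labels, in the order given above
print(table(responses_f))   # number of observations per level
```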

5
Q

If you are using GitHub or another service that uses secure URLs to host your analysis source code files you need to use the _____________ command in the devtools package.

A

source_url

6
Q

Generally, the safest and most effective way to merge two data sets together is with the __________ command.

A

merge

7
Q

A mediating variable is a variable that, while not intervening between the independent and dependent variables, influences the nature and strength of their relationship. True or False?

A

False. A mediating variable intervenes between the independent and dependent variables: the independent variable influences the mediator, which in turn influences the dependent variable. The statement in the question describes a moderating variable.

8
Q

The __________________ command converts non-numeric variables into numeric variables.

A

as.numeric
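A brief sketch of as.numeric, including one common pitfall with factors. The values are invented for illustration.

```r
# Character strings that look like numbers convert directly.
x <- as.numeric(c("1.5", "2", "10"))

# Caution: calling as.numeric() on a factor returns the internal level codes,
# so convert to character first to recover the printed values.
f <- factor(c("10", "20", "10"))
codes  <- as.numeric(f)                 # 1 2 1
values <- as.numeric(as.character(f))   # 10 20 10
```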

9
Q

What is an ordinal scale of measurement?

A

An ordinal scale of measurement is one that communicates greater than/less than relationships.

10
Q

What does the melt command do? What package is it contained in?

A

The melt command, a part of reshape2, is used to reshape data from wide format to long format.

11
Q

Commands in the foreign package have similar syntax to which command?

A

Commands in the foreign package have similar syntax to the read.table command.

12
Q

A moderating variable is influenced by the independent variable, which in turn influences the dependent variable. True or False?

A

False. A moderating variable does not intervene between the independent and dependent variables; instead, it influences the nature and strength of the relationship between them.

13
Q

Which command would you use to search in each element of a vector?

A

To search in each element of a vector, you would use the “grep” command.
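A short sketch of searching a character vector with grep and its relative grepl. The country names are example data only.

```r
countries <- c("Canada", "Chile", "France", "China")

# Positions of the elements that start with "C".
starts_with_c <- grep("^C", countries)                # 1 2 4

# The matching elements themselves, rather than their positions.
c_names <- grep("^C", countries, value = TRUE)

# grepl() returns one TRUE/FALSE per element instead.
has_an <- grepl("an", countries)                      # TRUE FALSE TRUE FALSE
```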

14
Q

You can use the ___________________ command to read data into R that is located at a non-secure URL.

A

read.table

15
Q

The __________________ command is used to read data files stored in a format created by the Stata statistical package.

A

read.dta

16
Q

What is the primary function of research methodology?

A

The primary function of research methodology is to guide and control the acquisition of data, and to aid in extracting meaning from the data once it’s been gathered.

17
Q

What is face validity?

A

Face validity is the extent to which an instrument looks like it’s measuring a particular characteristic.

18
Q

The purpose of measurement is to systematically limit the data in a way that makes it quantifiable. True/False

A

True. Measurement is designed to systematically limit data in a way that makes it quantifiable.

19
Q

Measurement is applied by researchers only to insubstantial phenomena. True/False

A

False. Measurement is applied to all phenomena.

20
Q

Systematic measurement assists researchers in obtaining objectivity in their research. True/False

A

True. Systematic measurement ensures that all data is gathered in the same way.

21
Q

The _______________ function replaces all matches of a string.

A

gsub
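To show the difference between replacing all matches and only the first, a small sketch comparing gsub with sub. The input strings are invented.

```r
x <- c("a-b-c", "d-e")

# sub() replaces only the first match in each element.
first_only <- sub("-", "_", x)    # "a_b-c" "d_e"

# gsub() replaces every match in each element.
all_matches <- gsub("-", "_", x)  # "a_b_c" "d_e"
```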

22
Q

With the melt command, what argument is used to specify id variables?

A

In the melt command, the id.vars argument is used to specify id variables.

23
Q

With the melt command, what happens to the remaining columns not specified as id variables?

A

In the melt command, columns not specified as id variables are melted into two new variables, “variable” and “value”.

24
Q

Percentile ranks are often used to report performance on scholastic aptitude and achievement tests. True/False

A

True. Percentile ranks are often used to report performance on scholastic aptitude and achievement tests.

25
Q

What package is required to use the reshape command?

A

No additional packages are necessary to use the reshape command; it is part of base R (the stats package, which is loaded by default).
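A hedged sketch of the base reshape command going from wide to long format. The data frame and the column names (id, y2001, y2002, year, value) are made up for illustration.

```r
# Hypothetical wide data: one row per id, one column per year.
scores_wide <- data.frame(
  id    = c(1, 2),
  y2001 = c(10, 20),
  y2002 = c(11, 21)
)

# Stack the two year columns into a single "value" column,
# recording which year each value came from in "year".
scores_long <- reshape(
  scores_wide,
  direction = "long",
  varying   = c("y2001", "y2002"),
  v.names   = "value",
  timevar   = "year",
  times     = c(2001, 2002),
  idvar     = "id"
)

print(scores_long)
```

The result has one row per id and year combination, so two ids and two years give four rows.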

26
Q

What are three techniques used to strengthen the internal validity of a study?

A

Three techniques used to strengthen the internal validity of a study are:

  • Conduct a double-blind experiment
  • Build in opportunities for triangulation
  • Conduct the study in a controlled laboratory setting
27
Q

What are three characteristics of a well-written research problem?

A

Three characteristics of a well-written research problem are:

  • The problem statement identifies the important factors to be investigated in the study.
  • The problem statement clearly delimits the objects of study.
  • The problem statement explicitly identifies assumptions.
28
Q

You can use the __________ command to see the number of rows and columns in a data frame object.

A

dim

29
Q

What is a histogram?

A

A histogram is a diagram, similar to a bar plot, that defines a sequence of breaks and then counts the number of observations in the bins formed by the breaks.

30
Q

What’s the R command to create a histogram?

A

hist
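To make the breaks and bins idea concrete, a small sketch that asks hist for the counts without drawing the plot. The data and the break points are arbitrary choices for illustration.

```r
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5)

# plot = FALSE returns the histogram object instead of drawing it.
# With the default settings the bins are (0,2], (2,4] and (4,6],
# except that the lowest break is also included in the first bin.
h <- hist(x, breaks = c(0, 2, 4, 6), plot = FALSE)

print(h$breaks)  # the break points
print(h$counts)  # number of observations in each bin
```

Calling hist(x) without plot = FALSE draws the familiar bar-style chart from the same counts.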

31
Q

Define inter-rater reliability.

A

Inter-rater reliability is the measure of concurrence between raters, generally used as a way of determining if a rating scale is usable.

32
Q

Define content validity.

A

Content validity is an estimate of how well a measurement or metric represents all facets of the construct it’s meant to measure.

33
Q

What is an interval level of measurement?

A

The interval level of measurement allows for degrees of difference between items, but not for ratios between them (e.g. 100 degrees Fahrenheit is not twice as hot as 50 degrees Fahrenheit).

34
Q

What is a nominal scale level of measurement?

A

A nominal scale is a level of measurement based entirely on names: values are labels for categories with no inherent order. This shows up in things like grammar, where words are classed as nouns, pronouns, verbs, prepositions, etc.

35
Q

What is an ordinal scale level of measurement?

A

An ordinal scale level of measurement is a level of measurement that allows a rank “order” of measurement (1st, 2nd, 3rd, or strongly agree, agree, neutral, etc.). Data can be sorted, but the degree of difference between ranks cannot be compared.

36
Q

What is a ratio scale level of measurement?

A

A ratio scale level of measurement is a level of measurement that has a meaningful zero value, meaning that zero indicates a complete absence of the quantity rather than an arbitrary point. This allows you to say that one thing is twice as X as another.

37
Q

You can use the _______________ command to read data into R that is located at a secure URL - like GitHub.

A

getURL (from the RCurl package)

38
Q

Define reliability.

A

Reliability is the consistency with which a measurement instrument yields a certain result when the entity being measured hasn’t changed.

39
Q

What R command is used to load data from a plain-text file stored locally?

A

The read.table command is used to tell R to load data from a local, plain-text file.
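A minimal sketch of read.table on a local plain-text file. The file is created in a temporary location first so the example is self-contained; its columns (id, score) are hypothetical.

```r
# Create a small whitespace-delimited text file to read back in.
path <- tempfile(fileext = ".txt")
writeLines(c("id score", "1 90", "2 85"), path)

# header = TRUE tells read.table that the first line holds the variable names.
scores <- read.table(path, header = TRUE)

print(scores)
```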

40
Q

What R command is used to load data from a local file that’s been saved in a format used by other statistical programs? What library is this contained in?

A

To load data saved by another statistical program, use a command from the “foreign” package, such as “read.dta” for Stata files. The package’s commands work much like the read.table command.

41
Q

What R command is used to import plain-text data from a non-secure URL?

A

The R command best used for pulling data from a plaintext source on an unsecured URL is the read.table command, using the URL as the source.

42
Q

How would you go about pulling data from a secured website?

A

The best way to pull data from a secured website (HTTPS) is the source_data command from the repmis package. Alternatives are the source_DropboxData command (also in repmis) and the RCurl package.

43
Q

What command might you use in R to decompress a gz archive file?

A

The “gzfile” command opens a connection to a gz-compressed file, which reading commands such as read.table can then decompress and read from.
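A short sketch of gzfile used for both writing and reading. The compressed file and its contents are created here purely for illustration.

```r
# Write two lines of CSV text through a gz-compressing connection.
gz_path <- tempfile(fileext = ".gz")
con <- gzfile(gz_path, "w")
writeLines(c("id,score", "1,90"), con)
close(con)

# Reading functions accept the connection and decompress on the fly.
restored <- read.csv(gzfile(gz_path))

print(restored)
```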

44
Q

What package contains useful commands for parsing and handling web data and scraping data from websites?

A

The XML package contains useful commands for dealing with scraping web data.

45
Q

What is the first mandatory step that must be taken before merging two or more data frames?

A

Before merging two or more data frames, we need to make sure they’re in the same format.

46
Q

What R command is used to show the variable names in a data frame object?

A

The “names” command is used in R to show the variable names in a data frame object.

47
Q

What R command is used to show the variable names in a data frame object, as well as the first few rows? The last few?

A

The “head” command and the “tail” command show the first and last few rows of a data frame in R, respectively.

48
Q

What R command is used to show the number of observations and variables of a data frame object?

A

The “dim” command shows the number of observations and variables of a data frame object, also known as the dimensions.
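The inspection commands from the last few cards can be tried together. This sketch uses mtcars, a data set that ships with R, as a stand-in for your own data frame.

```r
# Variable names of the data frame.
var_names <- names(mtcars)

# First and last few rows (the second argument sets how many).
first_rows <- head(mtcars, 3)
last_rows  <- tail(mtcars, 3)

# Number of observations (rows) and variables (columns).
size <- dim(mtcars)

print(var_names)
print(first_rows)
print(size)
```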

49
Q

What R command gives you the “metadata” of a data frame?

A

The “summary” command gives the metadata of a data frame, including things like field length, class, minimum and maximum values, and so on.

50
Q

What R command lets you view a data frame in a separate window? How about interactively edit?

A

The “View” command in R lets you view a data frame in a separate window. To interactively edit a data frame in a new window, use the “fix” command.

51
Q

What R command is used to convert objects into data frames?

A

The “data.frame” command is used to convert objects into data frames.

52
Q

What is “long” formatted data?

A

In long formatted data, it is assumed that columns are variables, and rows are specific observations.

53
Q

What is “wide” formatted data?

A

Wide formatted data is data in which a single variable is spread across multiple columns, one for each instance of that variable (for example, one column per year), within one particular observation (row). This is poor for normalization.

54
Q

What R commands are used to reshape data?

A

The transpose (“t”) command and “reshape” are both very useful for reshaping data, and are loaded by default. However, reshape2 contains many other commands that are more helpful.

55
Q

What R command can be used to reshape our data from wide to long format? How is it used?

A

The “melt” R command can be used to reshape data from wide to long format. It is used by naming the variables that we don’t want melted in the “id.vars” argument; all other columns are melted.
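A hedged sketch of melt with id.vars. It assumes the reshape2 package is installed and skips the call otherwise; the data frame and its column names are invented.

```r
# Hypothetical wide data: one row per id, one column per year.
gdp_wide <- data.frame(id = c(1, 2), y2001 = c(10, 20), y2002 = c(11, 21))

# Only run the example if reshape2 is available on this machine.
if (requireNamespace("reshape2", quietly = TRUE)) {
  # "id" is preserved; y2001 and y2002 are melted into variable/value pairs.
  gdp_long <- reshape2::melt(gdp_wide, id.vars = "id")
  print(gdp_long)
}
```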

56
Q

What R command can be used to rename the variables in a data frame, and what package is it in?

A

The “rename” R command from the plyr package can be used to rename variables in a data frame.

57
Q

What R command can be used to sort the data in a data frame?

A

The “order” command can be used to sort data in a data frame.
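A small sketch of sorting with order. The data frame is made up; order returns row positions, which are then used to index the data frame.

```r
results <- data.frame(name = c("b", "a", "c"), score = c(2, 3, 1))

# Ascending by score.
by_score <- results[order(results$score), ]

# Descending by score (negate a numeric column, or use decreasing = TRUE).
by_score_desc <- results[order(-results$score), ]

print(by_score)
```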

58
Q

What R command can be used to obtain a sub-set of a data frame? How is it used?

A

The “subset” command can be used to get a subset of a data frame. Its two main arguments are the data frame itself and a logical condition, written with the variable names, that selects which rows to keep.
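For illustration, a sketch of subset with a row condition and, optionally, a column selection. The data frame is hypothetical.

```r
results <- data.frame(name = c("b", "a", "c"), score = c(2, 3, 1))

# Keep only the rows where the condition is TRUE.
high <- subset(results, score >= 2)

# The optional select argument also limits which columns are returned.
high_names <- subset(results, score >= 2, select = name)

print(high)
```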

59
Q

What command in R can be used to convert a variable from one type to another?

A

The “as.” family of commands can be used to convert a variable from one type to another, for example “as.numeric”, “as.character”, or “as.factor”.

60
Q

What is the most simple way of merging two data sets together, and what must be true about them for it to work?

A

The simplest way to merge two data sets together in R is to use the “cbind” command, but this only works if the data sets have an identical number of rows and each row describes the same observation, in the same order, in both. This is rarely the case.

61
Q

What R command would you use to combine two data sets that have identical columns and variable names?

A

To combine two data sets with identical columns and variable names, you’d use the “rbind” command.
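A brief sketch contrasting rbind (stacking rows) with cbind (adding columns side by side). All three data frames are invented for illustration.

```r
a <- data.frame(id = 1:2, x = c(10, 20))
b <- data.frame(id = 3:4, x = c(30, 40))

# rbind: same columns and names, so b's rows are appended below a's.
stacked <- rbind(a, b)

# cbind: same number of rows, assumed to be in the same order.
extra <- data.frame(y = c("u", "v"))
side_by_side <- cbind(a, extra)

print(stacked)
print(side_by_side)
```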

62
Q

What R command would you use to most reliably merge two datasets? How would it work?

A

The “merge” command is the R command most used to merge data sets. Its “by” argument names the key variable (or variables) on which the rows of the two data sets are matched.
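A minimal sketch of merge with the by argument. The two data frames and their columns (id, gdp, pop) are hypothetical; note that the rows are in a different order and do not fully overlap.

```r
left  <- data.frame(id = c(1, 2, 3), gdp = c(100, 200, 300))
right <- data.frame(id = c(3, 1, 4), pop = c(30, 10, 40))

# By default only ids present in both data sets are kept.
inner <- merge(left, right, by = "id")

# all = TRUE keeps every id and fills the gaps with NA.
outer <- merge(left, right, by = "id", all = TRUE)

print(inner)
print(outer)
```

Because rows are matched on the key rather than on position, merge is safer than cbind when the two data sets are ordered differently.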

63
Q

What R command is used to look for duplicate data?

A

The “duplicated” command is used to look for duplicate data in a data set.
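A short sketch of finding and dropping repeated rows with duplicated. The data frame is example data.

```r
records <- data.frame(id = c(1, 2, 2, 3), x = c("a", "b", "b", "c"))

# TRUE marks a row that repeats an earlier row.
is_dup <- duplicated(records)   # FALSE FALSE TRUE FALSE

# Keep only the first occurrence of each row.
deduped <- records[!is_dup, ]

print(deduped)
```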

64
Q

When considering general criteria for high-quality research projects, replicability refers to the fact that ____.

A

Replicability, with regard to research, refers to the fact that any other capable researcher would be able to follow the same steps and procedures to produce comparable results.

65
Q

Professor Harris is constructing a demographic questionnaire for use in a research project. One question asks students to report their age in years. This is an example of what scale?

A

Age in years is an example of a ratio scale: the units are equal and there is an absolute zero.

66
Q

When is the best time to begin a literature review for a specific research project?

A

The best time to begin a literature review for a project is before, or during, the formulation of the research project. This is so that you can better consider what you’re creating, in light of what others have done.

67
Q

What is the point of a research review?

A

The purpose of a research review is to critique and synthesize the work of others that is related to your own research problem. It should emphasize how the studies that are being reviewed relate to the research problem.

68
Q

What is the general structure of a good literature review?

A

A good literature review will begin with broad or otherwise general information, then narrow the focus to any studies that are more specific to the research problem.

69
Q

Which scale of measurement is tied to an absolute zero?

A

The ratio scale of measurement is tied to an absolute zero. This allows for actual ratio comparisons.

70
Q

What is special about a ratio scale of measurement?

A

A ratio scale of measurement is tied to an absolute zero, which allows for ratio comparisons. An absolute zero means the complete absence of the quantity, so there are no negative values along the scale.

71
Q

Professor Harris is constructing a demographic questionnaire for use in a research project. One question asks students to report their age in years. This is an example of what scale?

A

Age in years would be an example of a ratio scale, as the units are equal and there is an absolute zero.