R Flashcards
What is the difference between a categorical variable and a continuous variable?
A categorical variable can belong to a limited number of categories and a continuous variable can correspond to an infinite number of values.
For example: sex is a categorical variable because it is limited to ‘Male’ or ‘Female’.
What is R?
R is an open-source language and environment for statistical computing and analysis.
Can you write and explain some of the most common syntax in R?
# (hash): as in many other languages, # introduces a comment. R ignores everything from the # to the end of the line, so comments can make code more readable by reminding future readers what blocks of code are intended to do.
"" (quotes): quotes operate as one might expect; they denote a string (character) value in R.
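A minimal sketch of both pieces of syntax together. The variable names are illustrative only:

```r
# A full-line comment: R ignores everything after the hash.
greeting <- "Hello, R"   # a trailing comment after code

# Single and double quotes both create character strings.
same_greeting <- 'Hello, R'

print(identical(greeting, same_greeting))
print(class(greeting))
```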
How do you list the preloaded datasets in R?
To view a list of preloaded datasets in R, simply type data() into the console and hit enter.
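As a small sketch, the same listing can also be captured as an object rather than only displayed. The variable name available is an assumption for illustration; the datasets package ships with base R.

```r
# data() returns an object whose $results matrix has one row per dataset.
available <- data(package = "datasets")$results

# The "Item" and "Title" columns hold each dataset's name and description.
head(available[, c("Item", "Title")])
```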
What are some advantages of R?
Its open-source nature. This qualifies as both an advantage and disadvantage for various reasons, but being open source means it’s widely accessible, free to use, and extensible.
Its package ecosystem. The built-in functionality available via R packages means you don’t have to spend a ton of time reinventing the wheel as a data scientist.
Its graphical and statistical aptitude. By many people’s accounts, R’s graphing capabilities are unmatched.
What are the disadvantages of R?
Memory and performance. In comparison to Python, R is often said to be the lesser language in terms of memory and performance. This is disputable, and many think it’s no longer relevant as 64-bit systems dominate the marketplace.
Open source. Being open source has its disadvantages as well as its advantages. For one, there’s no commercial vendor behind R, so there’s no single source for support, and quality control for contributed packages is limited. This means that packages developed for R are sometimes not of the highest quality.
Security. R was not built with security in mind, so it must rely on external tools to fill these gaps.
What are the similarities and differences between R and Python?
There are many comparisons to draw between Python and R. They are both free. They both have strong modeling capabilities. Python is generally considered more secure and easier to learn, but R is typically thought to have better visualization tools and libraries. In many jobs, you’ll be expected to use both R and Python, so it’s good to know about both, even if you aren’t fluent in both languages.
When is it appropriate to use the “next” statement in R?
A data scientist will use next to skip an iteration in a loop. As an example:
x
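A minimal sketch of next inside a for loop. The vector x, the helper vector kept, and the choice to skip the value 3 are assumptions made for illustration:

```r
x <- 1:5
kept <- c()

for (val in x) {
  if (val == 3) {
    next  # skip the rest of this iteration and move on to the next value
  }
  kept <- c(kept, val)
  print(val)
}
```

Only the iteration where val equals 3 is skipped; the loop itself keeps running.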
How do you assign a variable in R?
Variable assignment in R is a bit different from other languages. Rather than using an = sign, we typically use the assignment arrow <-, which is a less-than sign (<) followed by a minus sign (-). An equals sign, =, still works, but there are arguments about its readability in addition to instances where it can actually muck up your code.
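A short sketch of both assignment forms, plus one case where they behave differently. All variable names here are illustrative:

```r
x <- 5        # the usual assignment arrow
y = 10        # an equals sign also works at the top level
total <- x + y

# Inside a function call, = names an argument while <- assigns a variable.
mean(x = c(1, 2, 3))    # passes the vector as the argument x; the variable x is unchanged
mean(v <- c(1, 2, 3))   # creates the variable v, then passes it to mean()

print(total)
print(v)
```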
What are the different data types/objects in R?
Unlike statically typed languages such as C, R doesn’t ask users to declare a data type when assigning a variable. Instead, every value in R is an R data object. When you assign a variable in R, you assign it a data object and that object’s data type determines the data type of the variable. The most commonly used data objects include:
Vectors
Matrices
Lists
Arrays
Factors
Data frames
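The sketch below creates one object of each kind listed above. The names and values are made up for illustration:

```r
v  <- c(1.5, 2.5, 3.5)                             # vector: one type, one dimension
m  <- matrix(1:6, nrow = 2)                        # matrix: one type, two dimensions
l  <- list(name = "a", values = 1:3)               # list: elements of mixed types
a  <- array(1:24, dim = c(2, 3, 4))                # array: one type, any number of dimensions
f  <- factor(c("low", "high", "low"))              # factor: categorical values
df <- data.frame(id = 1:3, score = c(90, 85, 77))  # data frame: columns of equal length

# No type was declared anywhere; each variable takes the type of its object.
str(list(v = v, m = m, l = l, a = a, f = f, df = df))
```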
How do you import data in R?
Let’s use CSV as an example, as it’s a very common data format. Simply make sure the file is saved in a CSV format, then use the read.csv() function to import the data.
yourRDateHere
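A self-contained sketch of a CSV import. Because no real file is available here, it first writes a small temporary CSV; the file path, column names, and the variable my_data are all assumptions for illustration:

```r
# Create a small CSV file to read back (stands in for your own file).
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, score = c(90, 85, 77)),
          csv_path, row.names = FALSE)

# Import the file; header = TRUE treats the first row as column names.
my_data <- read.csv(csv_path, header = TRUE)

str(my_data)
```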
How do you install a package in R?
There are many ways to install a package in R. Some even include using the GUI. We’re coders, so we’re not going to give those attention.
Type the following into your console and hit enter:
install.packages("package_name")
Followed by:
library(package_name)
It’s that simple. The first command installs the package and the second loads the package into the session.
What is the use of with() in R?
We use the with() function to write simpler code by applying an expression to a data set. Its syntax looks like this:
with(randomDataSet, expression.test(sample))
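As a concrete sketch, here is the same calculation with and without with(), using the built-in mtcars data set. The variable names are illustrative:

```r
# Without with(): the data frame name is repeated for every column.
ratio_long <- mean(mtcars$mpg / mtcars$wt)

# With with(): columns are referred to directly inside the expression.
ratio_short <- with(mtcars, mean(mpg / wt))

print(ratio_long)
print(ratio_short)
```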
What is the use of by() in R?
Like with(), by() can help write DRY (don’t repeat yourself) code.
You can use by() to apply a function to a data frame split by factors. Its usage is something like this:
by(data, factor, function, …)
The data frame plugged into this function is split into data frames (by row) subsetted by the values of factor(s), and a function is then applied to each subset.
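A minimal sketch of by() on the built-in mtcars data set, splitting the rows by number of cylinders and averaging miles per gallon within each group. The anonymous function and variable names are assumptions for illustration:

```r
# Split mtcars by the values of cyl, then apply the function to each subset.
mpg_by_cyl <- by(mtcars, mtcars$cyl, function(group) mean(group$mpg))

print(mpg_by_cyl)
```

Each subset passed to the function is itself a data frame, so its columns can be accessed with $ as usual.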
When is it appropriate to use mode()?
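mode() gets or sets the mode (basic type) of an object, such as "numeric", "character", or "list". It is closely related to storage.mode(), which reports the more specific storage type; for an integer vector, mode() returns "numeric" while storage.mode() returns "integer". A sample usage: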
x
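A short sketch of getting and setting the mode. The vector x and its values are assumptions for illustration:

```r
x <- 1:10

original_mode    <- mode(x)           # "numeric"
original_storage <- storage.mode(x)   # "integer"
print(original_mode)
print(original_storage)

# Setting the mode converts the object to the requested type.
mode(x) <- "character"
print(x)
```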
What is a factor variable, and why would you use one?
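A factor variable is a form of categorical variable whose values come from a fixed set of levels; it can be created from either numeric or character string values. The most salient reason to use a factor is that R’s modeling functions treat it as categorical rather than as a number or free text, so it is handled correctly in statistical models. Another reason is that factors can be more memory efficient, because each value is stored as an integer code that points to a level.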
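A minimal sketch of a factor and the integer codes behind it. The values and level order are made up for illustration:

```r
sizes <- factor(c("low", "high", "medium", "low"),
                levels = c("low", "medium", "high"))

levels(sizes)       # the fixed set of categories
as.integer(sizes)   # the integer code stored for each value
table(sizes)        # counts per category, in level order
```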
When is it appropriate to use the which() function?
The which() function takes a logical vector and returns the indices (positions) of the elements that are TRUE.
To get a sense of how this works, take the built-in letters vector and search for the index of a specific letter using which().
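For instance, a sketch using the built-in letters vector. The letters searched for are arbitrary choices:

```r
# letters == "r" is a logical vector; which() reports where it is TRUE.
r_position <- which(letters == "r")
print(r_position)   # 18

# which() returns every matching position, not just the first one.
vowel_positions <- which(letters %in% c("a", "e", "i", "o", "u"))
print(vowel_positions)
```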
How do you concatenate strings in R?
Concatenating strings in R is less than intuitive. You don’t use a . operator, nor a + operator, and forget about the & operator. In fact, you don’t use an operator at all. Concatenating strings in R requires the use of the paste() function. Here’s an example:
hello
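A minimal sketch of paste() and its close relative paste0(). The variable names and strings are assumptions for illustration:

```r
hello <- "Hello,"
world <- "World."

# paste() joins its arguments with a single space by default.
spaced <- paste(hello, world)

# sep changes the separator; paste0() is shorthand for sep = "".
dashed <- paste(hello, world, sep = "-")
joined <- paste0(hello, world)

print(spaced)
print(dashed)
print(joined)
```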
How do you read a CSV file in R?
Simply use the read.csv() function.
yourRDateHere
Can you create an R decision tree?
A decision tree is a familiar graph for data scientists. It represents choices and results through the graphical form of a tree. To keep things simple, let’s just go over the basics.
Install the party package to get started with making the tree.
install.packages("party")
This gives you access to a fancy new function: ctree(), and, at its most basic, this is all we need to create a tree. First, let’s grab some data from our package; make sure the package is loaded.
library(party)
Now we have access to some new data sets. The strucchange package, which comes along with party, includes data on youth homicides in Boston called BostonHomicide. Let’s use that one. You can print the data to the screen if you like.
print(BostonHomicide)
Now we’ll create the tree. The usage of ctree() goes something like this:
ctree(formula,dataset)
We’ve got our data set. I’ll assign it to a variable for simplicity.
inputData
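A hedged sketch of the overall workflow, assuming the party package is installed. It swaps in the built-in iris data set and the formula Species ~ . rather than BostonHomicide, so the result does not depend on how that data set is loaded; both choices are assumptions for illustration:

```r
# Only run the example when the party package is available.
if (requireNamespace("party", quietly = TRUE)) {
  library(party)

  inputData <- iris

  # Predict Species from all other columns in the data set.
  tree <- ctree(Species ~ ., data = inputData)

  print(tree)
  # plot(tree) would draw the fitted tree on a graphics device.
} else {
  message("The party package is not installed; run install.packages(\"party\") first.")
}
```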
Why is R useful for data science?
R turns graphically intensive jobs that would otherwise take hours into a few minutes and keystrokes. In practice, you probably won’t encounter R outside data science or an adjacent field. It’s great for linear modeling, nonlinear modeling, time-series analysis, plotting, clustering, and so much more.
Simply put, R is designed for data manipulation and visualization, so it’s natural that it would be used for data science.
Describe how R can be used for predictive analysis
As a data manipulation and visualization tool, R can most definitely be used for predictive analytics. Using the same sort of decision tree we developed earlier, one could predict how many shootings might occur in 2019 in Boston. R as a whole provides numerous tools and packages for predictive modeling, so it’s the right tool for a data scientist.
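As a small illustration of the fit-then-predict pattern, the sketch below uses a simple linear regression on the built-in cars data set rather than a decision tree or the Boston data. The model choice, the new speeds, and the variable names are assumptions for illustration:

```r
# Fit a model of stopping distance as a function of speed.
fit <- lm(dist ~ speed, data = cars)

# Predict stopping distance for speeds chosen for this example.
new_speeds <- data.frame(speed = c(10, 20))
predicted <- predict(fit, newdata = new_speeds)

print(predicted)
```

Many modeling packages in R follow the same shape: fit a model object, then pass new data to predict().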
What are the two types of categorical variables?
Nominal categorical variable: a variable whose categories have no implied order.
Ordinal categorical variable: a variable whose categories have a natural ordering, such as low, medium, and high.
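In R, both kinds can be sketched with factor(), using ordered = TRUE for the ordinal case. The example values are made up for illustration:

```r
# Nominal: categories with no order.
blood_type <- factor(c("A", "O", "B", "AB"))

# Ordinal: categories with an explicit order, from low to high.
rating <- factor(c("low", "high", "medium"),
                 levels = c("low", "medium", "high"),
                 ordered = TRUE)

is.ordered(blood_type)   # FALSE
is.ordered(rating)       # TRUE

# Comparisons are only meaningful for ordered factors.
rating[1] < rating[2]    # TRUE, because low comes before high
highest <- max(rating)
print(highest)
```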
What is a data frame?
Data frames are two-dimensional objects that can hold numeric, character, or logical values. Within a column all elements have the same data type, but different columns can be of different data types. A data frame has the variables of a data set as columns and the observations as rows.
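A minimal sketch of a data frame with one column of each type mentioned above. The column names and values are invented for illustration:

```r
people <- data.frame(
  name   = c("Ana", "Ben", "Cy"),   # character column
  age    = c(34, 28, 45),           # numeric column
  member = c(TRUE, FALSE, TRUE),    # logical column
  stringsAsFactors = FALSE          # keep name as character in every R version
)

dim(people)                          # 3 observations (rows), 3 variables (columns)
column_types <- sapply(people, class)
print(column_types)
```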