R Flashcards
What is the difference between a categorical variable and a continuous variable?
A categorical variable takes one of a limited number of categories, while a continuous variable can take any of an infinite number of values within a range.
For example: sex is a categorical variable because it is limited to ‘Male’ or ‘Female’, whereas height measured in centimeters is continuous.
What is R?
R is an open-source language and environment for statistical computing and analysis.
Can you write and explain some of the most common syntax in R?
# (comment): as in many other languages, # introduces a comment. R is interpreted rather than compiled, and the interpreter ignores everything after the # on that line, so comments can make code more readable by reminding future readers what blocks of code are intended to do.
"" (quotes): quotes operate as one might expect; they denote a string value, which R calls the character type.
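A minimal sketch of both pieces of syntax. The variable names are made up for illustration, and the single-quote form is an addition not mentioned above:

```r
# Everything after a hash on a line is a comment and is not evaluated.
greeting <- "hello"    # double quotes create a character (string) value
farewell <- 'goodbye'  # single quotes create the same kind of value

print(class(greeting))         # the class of a quoted value is "character"
print(is.character(farewell))
```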
How do you list the preloaded datasets in R?
To view a list of preloaded datasets in R, simply type data() into the console and hit enter.
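As a rough sketch, the listing can also be captured as an object instead of only being displayed. Restricting it to the datasets package and picking mtcars are choices made here for illustration:

```r
# data() on its own displays the list of available datasets.
# Capturing the result exposes the same listing as a character matrix.
available <- data(package = "datasets")$results
print(head(available[, c("Item", "Title")]))

# A listed dataset such as mtcars can then be used directly by name.
print(head(mtcars))
```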
What are some advantages of R?
Its open-source nature. This qualifies as both an advantage and disadvantage for various reasons, but being open source means it’s widely accessible, free to use, and extensible.
Its package ecosystem. The built-in functionality available via R packages means you don’t have to spend a ton of time reinventing the wheel as a data scientist.
Its graphical and statistical aptitude. By many people’s accounts, R’s graphing capabilities are unmatched.
What are the disadvantages of R?
Memory and performance. In comparison to Python, R is often said to be the lesser language in terms of memory and performance. This is disputable, and many think it’s no longer relevant as 64-bit systems dominate the marketplace.
Open source. Being open source has its disadvantages as well as its advantages. For one, there’s no commercial vendor behind R, so there’s no single source for support or quality control. The R Core Team maintains the base language, but contributed packages come from independent authors, which means their quality varies.
Security. R was not built with security in mind, so it must rely on external resources to fill these gaps.
What are the similarities and differences between R and Python?
There are many comparisons to draw between Python and R. They are both free. They both have strong modeling capabilities. Python is generally considered more secure and easier to learn, but R is typically thought to have better visualization tools and libraries. In many jobs, you’ll be expected to use both R and Python, so it’s good to know about both, even if you aren’t fluent in both languages.
When is it appropriate to use the “next” statement in R?
A data scientist will use next to skip the rest of the current iteration in a loop and move straight on to the following one.
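A small sketch of next inside a for loop. The values and the condition are hypothetical, chosen only to show the skip:

```r
x <- 1:5            # hypothetical values chosen for illustration
kept <- integer(0)  # collects the values that were not skipped

for (val in x) {
  if (val == 3) {
    next            # skip the rest of this iteration when val is 3
  }
  kept <- c(kept, val)
}

print(kept)  # 1 2 4 5
```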
How do you assign a variable in R?
Variable assignment in R is a bit different from other languages. Rather than using an = sign, we typically use the assignment operator <-, which is a less-than sign, <, followed by a hyphen, -. An equals sign, =, still works at the top level, but there are arguments about its readability, and in some contexts, such as inside a function call where = names an argument, it does not assign at all.
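The sketch below contrasts the two operators. All names are invented for illustration, and the function-call case is one example of where = and <- behave differently, not an exhaustive list:

```r
a <- 5    # the usual assignment operator
b = 10    # = also assigns when used at the top level

# Inside a function call, = names an argument rather than assigning:
# this passes the vector as the argument called x and creates no variable.
spread <- sd(x = c(2, 4, 6))

# <- inside a call does assign: nums exists after this line runs.
total <- sum(nums <- c(1, 2, 3))

print(c(a, b, spread, total))
print(nums)
```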
What are the different data types/objects in R?
Unlike statically typed languages such as C, R doesn’t ask users to declare a data type when assigning a variable. Instead, every value in R is a data object. When you assign a variable in R, you bind it to a data object, and that object’s data type determines the data type of the variable. The most commonly used data objects include:
Vectors
Matrices
Lists
Arrays
Factors
Data frames
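One way to create each of these objects, as a sketch with made-up values. Note that no type is declared anywhere; each variable takes its type from the object assigned to it:

```r
v  <- c(1.5, 2.5, 3.5)                 # vector: elements of a single type
m  <- matrix(1:6, nrow = 2)            # matrix: 2 rows by 3 columns
l  <- list(name = "a", values = v)     # list: elements of mixed types
a  <- array(1:24, dim = c(2, 3, 4))    # array: here with three dimensions
f  <- factor(c("low", "high", "low"))  # factor: categorical values with levels
df <- data.frame(id = 1:3, score = v)  # data frame: columns of equal length

# Inspect the first class of each object.
objects <- list(v = v, m = m, l = l, a = a, f = f, df = df)
print(sapply(objects, function(obj) class(obj)[1]))
```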
How do you import data in R?
Let’s use CSV as an example, as it’s a very common data format. Make sure the file is saved in CSV format, then use the read.csv() function to import the data and assign the result to a variable of your choice.
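A self-contained sketch of a CSV import. The file contents and the names csv_path and scores are invented for illustration; in practice the path would point at your own file:

```r
# Write a small CSV to a temporary file so the example runs on its own.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,score", "Ada,90", "Grace,95"), csv_path)

# read.csv() returns a data frame; header = TRUE treats the
# first row of the file as column names.
scores <- read.csv(csv_path, header = TRUE)
str(scores)
```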
How do you install a package in R?
There are many ways to install a package in R. Some even include using the GUI. We’re coders, so we’re not going to give those attention.
Type the following into your console and hit enter:
install.packages("package_name")
Followed by:
library(package_name)
It’s that simple. The first command installs the package and the second loads the package into the session.
What is the use of with() in R?
We use the with() function to write simpler code by applying an expression to a data set. Its syntax looks like this:
with(randomDataSet, expression.test(sample))
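To make that concrete, here is a sketch using the built-in mtcars data set. The choice of data set and of the mpg and wt columns is an assumption for illustration:

```r
# Without with(): the data frame name is repeated for every column.
ratio_long <- mean(mtcars$mpg / mtcars$wt)

# With with(): columns are referred to by name inside the expression.
ratio_short <- with(mtcars, mean(mpg / wt))

print(c(ratio_long, ratio_short))  # both lines compute the same value
```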
What is the use of by() in R?
Like with(), by() can help write DRY (don’t repeat yourself) code.
You can use by() to apply a function to a data frame split by factors. Its usage is something like this:
by(data, factor, function, …)
The data frame plugged into this function is split into data frames (by row) subsetted by the values of factor(s), and a function is then applied to each subset.
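The split-and-apply behavior can be sketched with the built-in mtcars data set. The grouping column and the anonymous function are illustrative choices, not part of the description above:

```r
# Split mtcars by number of cylinders, then compute the mean mpg
# of each subset. The function receives one subset data frame at a time.
mpg_by_cyl <- by(mtcars, mtcars$cyl, function(subset_df) mean(subset_df$mpg))

print(mpg_by_cyl)  # one result per distinct value of cyl
```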
When is it appropriate to use mode()?
mode() gets or sets the mode (a kind of type) of an object. It is closely related to storage.mode(), but the two are not always the same: for an integer vector, mode() returns "numeric" while storage.mode() returns "integer".
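A short sketch of getting and setting the mode, shown next to storage.mode() for comparison. The variable names are made up for illustration:

```r
x <- 1:3

before_mode <- mode(x)             # "numeric"
before_storage <- storage.mode(x)  # "integer"
print(c(before_mode, before_storage))

# Assigning to mode() converts the values to the requested mode.
mode(x) <- "character"
print(x)  # "1" "2" "3"
```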