Foundation R4DSC Flashcards

1
Q

What makes R programming beautiful?

A

Despite its frustrating quirks, R is, at its heart, an elegant and beautiful language, well tailored for data analysis and statistics.

2
Q

What are some of R’s best features?

A

It’s free, open source, and available on every major platform. As a result, if you do your analysis in R, anyone can easily replicate it.

A massive set of packages for statistical modelling, machine learning, visualisation, and importing and manipulating data. Whatever model or graphic you’re trying to do, chances are that someone has already tried to do it. At a minimum, you can learn from their efforts.

Cutting edge tools. Researchers in statistics and machine learning will often publish an R package to accompany their articles. This means immediate access to the very latest statistical techniques and implementations.

Deep-seated language support for data analysis. This includes features like missing values, data frames, and subsetting.

A fantastic community. It is easy to get help from experts on the R-help mailing list, Stack Overflow, or subject-specific mailing lists like R-SIG-mixed-models or ggplot2. You can also connect with other R learners via Twitter, LinkedIn, and through many local user groups.

Powerful tools for communicating your results. R packages make it easy to produce HTML or PDF reports, or create interactive websites.

A strong foundation in functional programming. The ideas of functional programming are well suited to solving many of the challenges of data analysis. R provides a powerful and flexible toolkit which allows you to write concise yet descriptive code.

An IDE tailored to the needs of interactive data analysis and statistical programming.

Powerful metaprogramming facilities. R is not just a programming language, it is also an environment for interactive data analysis. Its metaprogramming capabilities allow you to write magically succinct and concise functions and provide an excellent environment for designing domain-specific languages.

Designed to connect to high-performance programming languages like C, Fortran, and C++.

3
Q

What are the challenges of R?

A

Of course, R is not perfect. R’s biggest challenge is that most R users are not programmers. This means that:

Much of the R code you’ll see in the wild is written in haste to solve a pressing problem. As a result, code is not very elegant, fast, or easy to understand. Most users do not revise their code to address these shortcomings.

Compared to other programming languages, the R community tends to be more focussed on results instead of processes. Knowledge of software engineering best practices is patchy: for instance, not enough R programmers use source code control or automated testing.

Metaprogramming is a double-edged sword. Too many R functions use tricks to reduce the amount of typing at the cost of making code that is hard to understand and that can fail in unexpected ways.

Inconsistency is rife across contributed packages, even within base R. You are confronted with over 20 years of evolution every time you use R. Learning R can be tough because there are many special cases to remember.

R is not a particularly fast programming language, and poorly written R code can be terribly slow. R is also a profligate user of memory.

4
Q

What are two meta-techniques for learning R?

A

There are two meta-techniques that are tremendously helpful for improving your skills as an R programmer: reading source code and adopting a scientific mindset.

Reading source code is important because it will help you write better code. A great place to start developing this skill is to look at the source code of the functions and packages you use most often. You’ll find things that are worth emulating in your own code and you’ll develop a sense of taste for what makes good R code. You will also see things that you don’t like, either because its virtues are not obvious or it offends your sensibilities. Such code is nonetheless valuable, because it helps make concrete your opinions on good and bad code.

A scientific mindset is extremely helpful when learning R. If you don’t understand how something works, develop a hypothesis, design some experiments, run them, and record the results. This exercise is extremely useful since if you can’t figure something out and need to get help, you can easily show others what you tried. Also, when you learn the right answer, you’ll be mentally prepared to update your world view. When I clearly describe a problem to someone else (the art of creating a reproducible example), I often figure out the solution myself.

5
Q

What are Hadley’s recommendations on programming resources?

A

R is still a relatively young language, and the resources to help you understand it are still maturing. In my personal journey to understand R, I’ve found it particularly helpful to use resources from other programming languages. R has aspects of both functional and object-oriented (OO) programming languages. Learning how these concepts are expressed in R will help you leverage your existing knowledge of other programming languages, and will help you identify areas where you can improve.

To understand why R’s object systems work the way they do, I found Structure and Interpretation of Computer Programs (SICP) by Harold Abelson and Gerald Jay Sussman particularly helpful. It’s a concise but deep book. After reading it, I felt for the first time that I could actually design my own object-oriented system. The book was my first introduction to the generic function style of OO common in R. It helped me understand its strengths and weaknesses. SICP also talks a lot about functional programming, and how to create simple functions which become powerful when combined.

To understand the trade-offs that R has made compared to other programming languages, I found Concepts, Techniques and Models of Computer Programming by Peter Van Roy and Seif Haridi extremely helpful. It helped me understand that R’s copy-on-modify semantics make it substantially easier to reason about code, and that while its current implementation is not particularly efficient, it is a solvable problem.

If you want to learn to be a better programmer, there’s no place better to turn than The Pragmatic Programmer by Andrew Hunt and David Thomas. This book is language agnostic, and provides great advice for how to be a better programmer.

6
Q

R’s base data structures can be organised by their dimensionality (1d, 2d, or nd) and by whether they are homogeneous or heterogeneous. What are they?

A

      Homogeneous      Heterogeneous
1d    Atomic vector    List
2d    Matrix           Data frame
nd    Array

Almost all other objects are built upon these foundations.

7
Q

R has no 0-dimensional, or scalar, types. Explain.

A

Individual numbers or strings, which you might think would be scalars, are actually vectors of length one.
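As a minimal sketch in base R (the variable name x is only for illustration), a single number or string behaves like any other vector: it has a length of one and can be indexed.

```r
# A "scalar" is just a vector with one element
x <- 42
length(x)            # 1
is.vector(x)         # TRUE
x[1]                 # 42: indexing works as on any vector
identical(x, c(42))  # TRUE: c() of one value is the same object

# The same holds for a single string
length("a")          # 1
typeof("a")          # "character"
```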

8
Q

What is str() used for in base R?

A

Given an object, the best way to understand what data structures it’s composed of is to use str().
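For instance, a small sketch with made-up data (the object obj and its fields are hypothetical) shows how str() summarises each component of a list:

```r
# A small list mixing several structures (illustrative data only)
obj <- list(id = 1:3, label = c("a", "b", "c"), flag = TRUE)

# str() prints one compact line per component: its type, size, and first values
str(obj)

# The printed description can also be captured as text
out <- capture.output(str(obj))
out[1]   # "List of 3"
```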

9
Q

What is a vector?

A

The basic data structure in R is the vector. Vectors come in two flavours: atomic vectors and lists. They have three common properties:

Type, typeof(), what it is.
Length, length(), how many elements it contains.
Attributes, attributes(), additional arbitrary metadata.

They differ in the types of their elements: all elements of an atomic vector must be the same type, whereas the elements of a list can have different types.
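The three properties can be inspected directly. The following is a small sketch with made-up values, comparing an atomic vector and a list:

```r
v <- c(a = 1.5, b = 2.5)      # atomic vector (double) with names
l <- list(1L, "two", TRUE)    # list whose elements have different types

typeof(v)          # "double"
length(v)          # 2
attributes(v)      # a list holding the names "a" and "b"

typeof(l)          # "list"
length(l)          # 3
attributes(l)      # NULL: no metadata attached
sapply(l, typeof)  # "integer" "character" "logical"
```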

10
Q

How do you test whether an object is a vector?

A

NB: is.vector() does not test if an object is a vector. Instead it returns TRUE only if the object is a vector with no attributes apart from names. Use is.atomic(x) || is.list(x) to test if an object is actually a vector.
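A short sketch of the difference, using a hypothetical "units" attribute added purely for illustration:

```r
x <- 1:3
attr(x, "units") <- "cm"     # hypothetical attribute, for illustration only

is.vector(x)                 # FALSE: x has an attribute other than names
is.atomic(x) || is.list(x)   # TRUE: x is still an atomic vector

y <- c(a = 1, b = 2)
is.vector(y)                 # TRUE: names are the one attribute allowed
```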

11
Q

What are the rare types of atomic vector?

A

There are two rare types that I will not discuss further: complex and raw.

12
Q

Atomic vectors are always flat, even if you nest c()’s. What does this mean?

A
c(1, c(2, c(3, 4)))
## [1] 1 2 3 4
# the same as
c(1, 2, 3, 4)
## [1] 1 2 3 4
13
Q

How do you specify a missing value?

A

Missing values are specified with NA, which is a logical vector of length 1. NA will always be coerced to the correct type if used inside c(), or you can create NAs of a specific type with NA_real_ (a double vector), NA_integer_ and NA_character_.
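The types involved can be checked with typeof(), as in this brief sketch:

```r
typeof(NA)             # "logical"
typeof(NA_real_)       # "double"
typeof(NA_integer_)    # "integer"
typeof(NA_character_)  # "character"

# Inside c(), NA is coerced to the type of the other elements
typeof(c(NA, 1.5))     # "double"
typeof(c(NA, "a"))     # "character"
```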

14
Q

What are the different ways to find the type of an atomic vector?

A

Given a vector, you can determine its type with typeof(), or check if it’s a specific type with an “is” function: is.character(), is.double(), is.integer(), is.logical(), or, more generally, is.atomic().
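A minimal sketch on an integer vector (the values are arbitrary):

```r
x <- c(1L, 2L, 3L)

typeof(x)         # "integer"
is.integer(x)     # TRUE
is.double(x)      # FALSE
is.character(x)   # FALSE
is.logical(x)     # FALSE
is.atomic(x)      # TRUE: holds for every atomic type
```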

15
Q

Does is.numeric() check for integer vectors, double vectors, or both?

A

NB: is.numeric() is a general test for the “numberliness” of a vector and returns TRUE for both integer and double vectors. It is not a specific test for double vectors, which are often called numeric.
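The distinction is easy to see in a short sketch:

```r
is.numeric(1L)    # TRUE: integer
is.numeric(1.5)   # TRUE: double
is.numeric("1")   # FALSE: character

# To test specifically for doubles, use is.double()
is.double(1L)     # FALSE
is.double(1.5)    # TRUE
```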

16
Q

What is coercion?

A

All elements of an atomic vector must be the same type, so when you attempt to combine different types they will be coerced to the most flexible type. Types from least to most flexible are: logical, integer, double, and character.

For example, combining a character and an integer yields a character:

str(c("a", 1))
##  chr [1:2] "a" "1"
17
Q

What happens when a logical vector is coerced to an integer?

A

When a logical vector is coerced to an integer or double, TRUE becomes 1 and FALSE becomes 0. This is very useful in conjunction with sum() and mean().
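A small sketch of why this is handy, using an arbitrary logical vector: sum() counts the TRUEs and mean() gives their proportion.

```r
x <- c(FALSE, FALSE, TRUE)

as.numeric(x)   # 0 0 1
sum(x)          # 1: the number of TRUEs
mean(x)         # one third: the proportion of TRUEs
```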


18
Q

What are lists?

A

Lists are different from atomic vectors because their elements can be of any type, including lists. You construct lists by using list() instead of c():
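As a sketch with arbitrary values, a list can hold an integer vector, a string, a logical vector, and a double vector side by side:

```r
x <- list(1:3, "a", c(TRUE, FALSE, TRUE), c(2.3, 5.9))

# Each element keeps its own type
str(x)
sapply(x, typeof)   # "integer" "character" "logical" "double"
length(x)           # 4
```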


19
Q

Why are lists called recursive vectors?

A

Lists are sometimes called recursive vectors, because a list can contain other lists. This makes them fundamentally different from atomic vectors.
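A minimal sketch of nesting: each list below contains exactly one element, which is itself a list.

```r
x <- list(list(list(list())))

str(x)            # shows the nesting level by level
is.recursive(x)   # TRUE
is.list(x[[1]])   # TRUE: the element is itself a list

# Atomic vectors cannot nest
is.recursive(1:3) # FALSE
```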


20
Q

What happens when you combine atomic vectors and lists with c()?

A

c() will combine several lists into one. If given a combination of atomic vectors and lists, c() will coerce the vectors to lists before combining them. Compare the results of list() and c():
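One way to see the comparison, sketched with arbitrary values: list() keeps its arguments as separate elements, while c() flattens them into a single list.

```r
x <- list(list(1, 2), c(3, 4))   # two elements: a list and a double vector
y <- c(list(1, 2), c(3, 4))      # c(3, 4) is coerced to a list, then joined

length(x)   # 2
length(y)   # 4
str(x)
str(y)
```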

21
Q

What is the type of a list, and how do you convert to and from a list?

A

The typeof() a list is list. You can test for a list with is.list() and coerce to a list with as.list(). You can turn a list into an atomic vector with unlist(). If the elements of a list have different types, unlist() uses the same coercion rules as c().
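A short sketch of these functions on a small, made-up list:

```r
l <- list(1L, 2.5, "a")

typeof(l)       # "list"
is.list(l)      # TRUE

# unlist() coerces to the most flexible type present, here character
u <- unlist(l)
typeof(u)       # "character"
u               # "1" "2.5" "a"

# as.list() turns each element of a vector into a list element
length(as.list(1:3))   # 3
```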

22
Q

Why are lists so important in R?

A

Lists are used to build up many of the more complicated data structures in R. For example, both data frames (described in data frames) and linear models objects (as produced by lm()) are lists:
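This can be checked with is.list(). The sketch below uses the built-in mtcars dataset and an arbitrary model formula chosen only for illustration:

```r
# A data frame is a list of equal-length columns
is.list(mtcars)    # TRUE
typeof(mtcars)     # "list"

# A fitted linear model is a list of components (coefficients, residuals, ...)
mod <- lm(mpg ~ wt, data = mtcars)
is.list(mod)       # TRUE
class(mod)         # "lm"
```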

23
Q

What are attributes?

A

All objects can have arbitrary additional attributes, used to store metadata about the object. Attributes can be thought of as a named list (with unique names). Attributes can be accessed individually with attr() or all at once (as a list) with attributes().
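A minimal sketch of setting and reading an attribute; the attribute name my_attribute is arbitrary:

```r
y <- 1:10
attr(y, "my_attribute") <- "This is a vector"

attr(y, "my_attribute")   # "This is a vector"
attributes(y)             # a named list with one entry, my_attribute
```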


24
Q

By default, most attributes are lost when modifying a vector. The only attributes not lost are the three most important. Which ones are they?

A

The only attributes not lost are the three most important:

Names, a character vector giving each element a name, described in names.

Dimensions, used to turn vectors into matrices and arrays, described in matrices and arrays.

Class, used to implement the S3 object system, described in S3.

Each of these attributes has a specific accessor function to get and set values. When working with these attributes, use names(x), dim(x), and class(x), not attr(x, "names"), attr(x, "dim"), and attr(x, "class").
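A brief sketch of both points, using a hypothetical attribute called note: custom attributes disappear after subsetting or summarising, and the accessor functions read and set the special ones.

```r
y <- structure(1:3, names = c("a", "b", "c"), note = "hypothetical metadata")

attributes(y[1])    # only names survives subsetting
attributes(sum(y))  # NULL: everything is dropped

# Accessor functions for the special attributes
names(y)            # "a" "b" "c"
m <- 1:6
dim(m) <- c(2, 3)   # setting dim turns the vector into a 2 x 3 matrix
is.matrix(m)        # TRUE
class(m)[1]         # "matrix"
```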

25
Q

How do you name a vector?

A

You can name a vector in three ways:

When creating it: x
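As a sketch, here are three common ways to name a vector in base R: at creation, by assigning to names() on an existing vector, and by making a named copy with setNames(). That these are exactly the three ways the card has in mind is an assumption.

```r
# 1. When creating the vector
x1 <- c(a = 1, b = 2, c = 3)

# 2. By modifying an existing vector in place
x2 <- 1:3
names(x2) <- c("a", "b", "c")

# 3. By creating a modified copy with setNames()
x3 <- setNames(1:3, c("a", "b", "c"))

names(x1)   # "a" "b" "c"
```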

26
Q

Do the names of a vector need to be unique?

A

Names don’t have to be unique. However, character subsetting, described in subsetting, is the most important reason to use names and it is most useful when the names are unique.

27
Q

Do you need to name every element of a vector?

A

Not all elements of a vector need to have a name. If some names are missing when you create the vector, the names will be set to an empty string for those elements. If you modify the vector in place by setting some, but not all variable names, names() will return NA (more specifically, NA_character_) for them. If all names are missing, names() will return NULL.
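The three cases can be reproduced in a short sketch (values are arbitrary):

```r
# Some names given at creation: the rest become empty strings
y <- c(a = 1, 2, 3)
names(y)   # "a" "" ""

# Some names set afterwards: the rest become NA
v <- c(1, 2, 3)
names(v) <- c("a")
names(v)   # "a" NA NA

# No names at all
z <- c(1, 2, 3)
names(z)   # NULL
```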

28
Q

How do you remove the names from a vector, or create a new vector without names?

A

You can create a new vector without names using unname(x), or remove names in place with names(x) <- NULL.

29
Q

What are factors?

A

One important use of attributes is to define factors. A factor is a vector that can contain only predefined values, and is used to store categorical data. Factors are built on top of integer vectors using two attributes: the class, “factor”, which makes them behave differently from regular integer vectors, and the levels, which defines the set of allowed values.
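A minimal sketch of how a factor is put together (values are arbitrary):

```r
x <- factor(c("a", "b", "b", "a"))

class(x)        # "factor"
levels(x)       # "a" "b"
typeof(x)       # "integer": the underlying storage
as.integer(x)   # 1 2 2 1: positions in levels(x)

# A value outside the allowed levels becomes NA
bad <- factor("c", levels = c("a", "b"))
is.na(bad)      # TRUE
```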


30
Q

Can you combine factors with c()?

A
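# NB: before R 4.1.0, c() dropped the factor class and returned the integer codes
c(factor("a"), factor("b"))
## [1] 1 1
# Since R 4.1.0, c() combines the levels and returns a factor instead:
## [1] a b
## Levels: a b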
31
Q

Sometimes when a data frame is read directly from a file, a column you’d thought would produce a numeric vector instead produces a factor. How do you remedy this?

A

Sometimes when a data frame is read directly from a file, a column you’d thought would produce a numeric vector instead produces a factor. This is caused by a non-numeric value in the column, often a missing value encoded in a special way like . or -. To remedy the situation, coerce the vector from a factor to a character vector, and then from a character to a double vector. (Be sure to check for missing values after this process.) Of course, a much better plan is to discover what caused the problem in the first place and fix that; using the na.strings argument to read.csv() is often a good place to start.
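Both remedies can be sketched as follows. The column values and the use of "." as the missing-value marker are made up for illustration:

```r
# Hypothetical column where "." encodes a missing value
f <- factor(c("1.5", "2", ".", "4"))

# Converting directly gives the integer level codes, not the numbers
as.numeric(f)

# Go through character first; "." cannot be parsed and becomes NA
vals <- suppressWarnings(as.numeric(as.character(f)))
vals          # 1.5 2.0 NA 4.0
is.na(vals)   # check for missing values afterwards

# Better: tell read.csv() how missing values are encoded
df <- read.csv(text = "value\n1.5\n2\n.\n4", na.strings = ".")
typeof(df$value)   # "double"
```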

32
Q

Most data loading functions in R automatically convert character vectors to factors. How do you overcome this?

A

Unfortunately, most data loading functions in R automatically convert character vectors to factors. (This was the default before R 4.0.0; since then, stringsAsFactors defaults to FALSE in read.csv() and data.frame().) This is suboptimal, because there’s no way for those functions to know the set of all possible levels or their optimal order. Instead, use the argument stringsAsFactors = FALSE to suppress this behaviour, and then manually convert character vectors to factors using your knowledge of the data. A global option, options(stringsAsFactors = FALSE), is available to control this behaviour, but I don’t recommend using it. Changing a global option may have unexpected consequences when combined with other code (either from packages, or code that you’re source()ing), and global options make code harder to understand because they increase the number of lines you need to read to understand how a single line of code will behave.
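A minimal sketch of the recommended approach, with a made-up data frame and a level order chosen only for illustration:

```r
# Keep strings as character vectors when building or loading the data
df <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)
typeof(df$y)   # "character"

# Convert by hand, using what you know about the allowed values and their order
df$y <- factor(df$y, levels = c("c", "b", "a"))
levels(df$y)   # "c" "b" "a"
```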

33
Q

Are factors character vectors or not?

A

While factors look (and often behave) like character vectors, they are actually integers. Be careful when treating them like strings. Some string methods (like gsub() and grepl()) will coerce factors to strings, while others (like nchar()) will throw an error, and still others (like c() before R 4.1.0) will use the underlying integer values. For this reason, it’s usually best to explicitly convert factors to character vectors if you need string-like behaviour. In early versions of R, there was a memory advantage to using factors instead of character vectors, but this is no longer the case.
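The differing behaviour can be sketched with a small, made-up factor:

```r
f <- factor(c("apple", "banana"))

typeof(f)             # "integer"
gsub("a", "A", f)     # coerces to character: "Apple" "bAnAnA"
grepl("an", f)        # FALSE TRUE

# nchar() refuses a factor; try() captures the error instead of stopping
res <- try(nchar(f), silent = TRUE)
inherits(res, "try-error")   # TRUE

# Converting explicitly gives predictable string behaviour
nchar(as.character(f))       # 5 6
```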