Foundation R4DSC Flashcards
What makes R programming beautiful?
Despite its frustrating quirks, R is, at its heart, an elegant and beautiful language, well tailored for data analysis and statistics.
Some of the best features are:
It’s free, open source, and available on every major platform. As a result, if you do your analysis in R, anyone can easily replicate it.
A massive set of packages for statistical modelling, machine learning, visualisation, and importing and manipulating data. Whatever model or graphic you’re trying to do, chances are that someone has already tried to do it. At a minimum, you can learn from their efforts.
Cutting edge tools. Researchers in statistics and machine learning will often publish an R package to accompany their articles. This means immediate access to the very latest statistical techniques and implementations.
Deep-seated language support for data analysis. This includes features like missing values, data frames, and subsetting.
A fantastic community. It is easy to get help from experts on the R-help mailing list, Stack Overflow, or subject-specific mailing lists like R-SIG-mixed-models or ggplot2. You can also connect with other R learners via Twitter, LinkedIn, and through many local user groups.
Powerful tools for communicating your results. R packages make it easy to produce html or pdf reports, or create interactive websites.
A strong foundation in functional programming. The ideas of functional programming are well suited to solving many of the challenges of data analysis. R provides a powerful and flexible toolkit which allows you to write concise yet descriptive code.
An IDE tailored to the needs of interactive data analysis and statistical programming.
Powerful metaprogramming facilities. R is not just a programming language, it is also an environment for interactive data analysis. Its metaprogramming capabilities allow you to write magically succinct and concise functions and provide an excellent environment for designing domain-specific languages.
Designed to connect to high-performance programming languages like C, Fortran, and C++.
Challenges in R?
Of course, R is not perfect. R’s biggest challenge is that most R users are not programmers. This means that:
Much of the R code you’ll see in the wild is written in haste to solve a pressing problem. As a result, code is not very elegant, fast, or easy to understand. Most users do not revise their code to address these shortcomings.
Compared to other programming languages, the R community tends to be more focussed on results instead of processes. Knowledge of software engineering best practices is patchy: for instance, not enough R programmers use source code control or automated testing.
Metaprogramming is a double-edged sword. Too many R functions use tricks to reduce the amount of typing at the cost of making code that is hard to understand and that can fail in unexpected ways.
Inconsistency is rife across contributed packages, even within base R. You are confronted with over 20 years of evolution every time you use R. Learning R can be tough because there are many special cases to remember.
R is not a particularly fast programming language, and poorly written R code can be terribly slow. R is also a profligate user of memory.
What are two meta-techniques for learning R?
There are two meta-techniques that are tremendously helpful for improving your skills as an R programmer: reading source code and adopting a scientific mindset.
Reading source code is important because it will help you write better code. A great place to start developing this skill is to look at the source code of the functions and packages you use most often. You’ll find things that are worth emulating in your own code and you’ll develop a sense of taste for what makes good R code. You will also see things that you don’t like, either because its virtues are not obvious or it offends your sensibilities. Such code is nonetheless valuable, because it helps make concrete your opinions on good and bad code.
A scientific mindset is extremely helpful when learning R. If you don’t understand how something works, develop a hypothesis, design some experiments, run them, and record the results. This exercise is extremely useful since if you can’t figure something out and need to get help, you can easily show others what you tried. Also, when you learn the right answer, you’ll be mentally prepared to update your world view. When I clearly describe a problem to someone else (the art of creating a reproducible example), I often figure out the solution myself.
Hadley's recommendations on programming?
R is still a relatively young language, and the resources to help you understand it are still maturing. In my personal journey to understand R, I’ve found it particularly helpful to use resources from other programming languages. R has aspects of both functional and object-oriented (OO) programming languages. Learning how these concepts are expressed in R will help you leverage your existing knowledge of other programming languages, and will help you identify areas where you can improve.
To understand why R’s object systems work the way they do, I found The Structure and Interpretation of Computer Programs (SICP) by Harold Abelson and Gerald Jay Sussman, particularly helpful. It’s a concise but deep book. After reading it, I felt for the first time that I could actually design my own object-oriented system. The book was my first introduction to the generic function style of OO common in R. It helped me understand its strengths and weaknesses. SICP also talks a lot about functional programming, and how to create simple functions which become powerful when combined.
To understand the trade-offs that R has made compared to other programming languages, I found Concepts, Techniques, and Models of Computer Programming by Peter Van Roy and Seif Haridi extremely helpful. It helped me understand that R’s copy-on-modify semantics make it substantially easier to reason about code, and that while its current implementation is not particularly efficient, it is a solvable problem.
If you want to learn to be a better programmer, there’s no place better to turn than The Pragmatic Programmer by Andrew Hunt and David Thomas. This book is language agnostic, and provides great advice for how to be a better programmer.
R’s base data structures can be organised by their dimensionality (1d, 2d, or nd) and by whether they are homogeneous or heterogeneous. What are they?
    Homogeneous    Heterogeneous
1d  Atomic vector  List
2d  Matrix         Data frame
nd  Array
Almost all other objects are built upon these foundations.
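As a rough illustration of the table above, the sketch below builds one object of each kind. The variable names and values are made up for this card and are not from the source.

```r
# One example of each base data structure (names and values are illustrative).
atomic_vec <- c(1.5, 2.5, 3.5)                            # 1d, homogeneous
a_list     <- list(1L, "a", TRUE)                         # 1d, heterogeneous
a_matrix   <- matrix(1:6, nrow = 2)                       # 2d, homogeneous
a_df       <- data.frame(x = 1:3, y = c("a", "b", "c"))   # 2d, heterogeneous
an_array   <- array(1:24, dim = c(2, 3, 4))               # nd, homogeneous

dim(a_matrix)   # 2 3
dim(an_array)   # 2 3 4
```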
Note that R has no 0-dimensional, or scalar, types. Explain?
Individual numbers or strings, which you might think would be scalars, are actually vectors of length one.
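A quick way to check this yourself, as a minimal sketch (the variable x is just an example):

```r
# A single number is an atomic vector of length one, not a scalar.
x <- 42
length(x)             # 1
is.atomic(x)          # TRUE
identical(x, c(42))   # TRUE: wrapping it in c() changes nothing
```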
What is the use of str() in base R?
Given an object, the best way to understand what data structures it’s composed of is to use str().
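For instance, here is a small sketch on a made-up nested list (obj and its fields are hypothetical). str() prints a compact, indented summary giving the type and length of each component.

```r
# A hypothetical nested object to inspect.
obj <- list(
  id     = 1:3,
  label  = c("a", "b", "c"),
  nested = list(flag = TRUE)
)

# Prints one line per component, with nested components indented.
str(obj)
```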
Vector?
The basic data structure in R is the vector. Vectors come in two flavours: atomic vectors and lists. They have three common properties:
Type, typeof(), what it is.
Length, length(), how many elements it contains.
Attributes, attributes(), additional arbitrary metadata.
They differ in the types of their elements: all elements of an atomic vector must be the same type, whereas the elements of a list can have different types.
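The three properties, and the difference between the two flavours, can be sketched like this (v and l are example objects, not from the source):

```r
# An atomic vector: every element has the same type.
v <- c(a = 1L, b = 2L, c = 3L)
typeof(v)       # "integer"
length(v)       # 3
attributes(v)   # a list with one element, names: "a" "b" "c"

# A list: elements may have different types.
l <- list(1L, "two", 3.0)
typeof(l)                         # "list"
vapply(l, typeof, character(1))   # "integer" "character" "double"
```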
How to test if an object is a vector or not?
NB: is.vector() does not test if an object is a vector. Instead it returns TRUE only if the object is a vector with no attributes apart from names. Use is.atomic(x) || is.list(x) to test if an object is actually a vector.
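A small sketch of the difference. The "units" attribute and the is_vec() helper are hypothetical, introduced only for illustration.

```r
x <- 1:3
is.vector(x)                 # TRUE: no attributes at all

attr(x, "units") <- "cm"     # add an attribute other than names
is.vector(x)                 # FALSE, although x is still an atomic vector
is.atomic(x) || is.list(x)   # TRUE

# Hypothetical helper wrapping the recommended test.
is_vec <- function(x) is.atomic(x) || is.list(x)
is_vec(x)                    # TRUE
is_vec(list(1, "a"))         # TRUE
```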
Rare types of atomic vector?
There are two rare types that I will not discuss further: complex and raw.
Atomic vectors are always flat, even if you nest c()’s: what does this mean?
c(1, c(2, c(3, 4)))
## [1] 1 2 3 4
# the same as
c(1, 2, 3, 4)
## [1] 1 2 3 4
How to specify missing values?
Missing values are specified with NA, which is a logical vector of length 1. NA will always be coerced to the correct type if used inside c(), or you can create NAs of a specific type with NA_real_ (a double vector), NA_integer_ and NA_character_.
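The coercion described above can be checked with typeof(), as in this minimal sketch:

```r
# A bare NA is logical.
typeof(NA)              # "logical"

# Inside c(), NA is coerced to the type of the other elements.
typeof(c(NA, 1.5))      # "double"
typeof(c(NA, 1L))       # "integer"
typeof(c(NA, "a"))      # "character"

# Typed NAs, for when the type must be fixed up front.
typeof(NA_real_)        # "double"
typeof(NA_integer_)     # "integer"
typeof(NA_character_)   # "character"
```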
What are the different ways to find the type of an atomic vector?
Given a vector, you can determine its type with typeof(), or check if it’s a specific type with an “is” function: is.character(), is.double(), is.integer(), is.logical(), or, more generally, is.atomic().
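A short sketch of both approaches on four example vectors (the variable names are illustrative):

```r
dbl <- c(1, 2.5)
int <- c(1L, 2L)
chr <- c("a", "b")
lgl <- c(TRUE, FALSE)

# Ask for the type directly.
typeof(dbl)         # "double"
typeof(int)         # "integer"
typeof(chr)         # "character"
typeof(lgl)         # "logical"

# Or test for a specific type.
is.double(dbl)      # TRUE
is.double(int)      # FALSE: int is an integer vector
is.character(chr)   # TRUE
is.atomic(lgl)      # TRUE: any atomic vector passes
```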
NB: is.numeric(): does it check for integer, double, or both?
NB: is.numeric() is a general test for the “numberliness” of a vector and returns TRUE for both integer and double vectors. It is not a specific test for double vectors, which are often called numeric.
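To see the distinction in practice, a minimal sketch:

```r
is.numeric(1L)    # TRUE: integer
is.numeric(1.5)   # TRUE: double
is.numeric("1")   # FALSE: character

# Use is.double() when you specifically need a double vector.
is.double(1L)     # FALSE
is.double(1.5)    # TRUE
```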