Program Flashcards

Question

Arguments of a function fall into two sets. What are they?

Answer 1

The arguments to a function typically fall into two broad sets: one set supplies the data to compute on, and the other supplies arguments that control the details of the computation. For example, in mean(), the data is x, and the details are how much data to trim from the ends (trim) and how to handle missing values (na.rm). Generally, data arguments should come first. Detail arguments should go at the end, and usually should have default values.
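A minimal sketch of the convention, using a hypothetical `trimmed_mean()` wrapper around `mean()`. The data argument `x` comes first. The detail arguments `trim` and `na.rm` come last and have defaults. The default of 0.2 for `trim` is an arbitrary choice for illustration.

```r
# Hypothetical wrapper: data argument first, detail arguments last with defaults
trimmed_mean <- function(x, trim = 0.2, na.rm = TRUE) {
  mean(x, trim = trim, na.rm = na.rm)
}

scores <- c(1, 2, 3, 4, 100, NA)
trimmed_mean(scores)            # 3: drops the lowest and highest value, then averages 2, 3, 4
trimmed_mean(scores, trim = 0)  # 22: override a single detail by name
```

Because the details have defaults, a typical call only needs the data.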

Answer 2

When you call a function, place a space on either side of = in named arguments, and always put a space after a comma, not before (just like in regular English).
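A small illustration using hypothetical `feet` and `inches` values. Both calls compute the same result, but only the first follows the spacing rule.

```r
feet <- 5
inches <- 6

# Good: spaces around = and after commas
height <- sum(feet * 12, inches, na.rm = TRUE)

# Bad: same call, harder to read
height<-sum(feet*12,inches,na.rm=TRUE)
```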

Answer 3

The names of the arguments are also important. R doesn’t care, but the readers of your code (including future-you!) will. Generally you should prefer longer, more descriptive names, but there are a handful of very common, very short names. It’s worth memorising these: x, y, z: vectors. w: a vector of weights. df: a data frame. i, j: numeric indices (typically rows and columns). n: length, or number of rows. p: number of columns.

Answer 4

It’s good practice to check important preconditions, and throw an error with stop() if they are not true.
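A sketch of a precondition check. It keeps the `wt_mean` name that the card mentions, but the body below is an illustration rather than the card's original code. The function refuses to compute a weighted mean when `x` and `w` differ in length.

```r
# Weighted mean that validates its inputs before computing
wt_mean <- function(x, w) {
  if (length(x) != length(w)) {
    stop("`x` and `w` must be the same length", call. = FALSE)
  }
  sum(w * x) / sum(w)
}

wt_mean(c(1, 2, 3), c(1, 1, 2))  # (1 + 2 + 6) / 4 = 2.25
# wt_mean(1:3, 1:2) stops with: `x` and `w` must be the same length
```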

Answer 5

Missing data in R appears as NA. NA is not a string or a numeric value, but an indicator of missingness. Vectors can contain missing values alongside ordinary values.
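A short sketch of how NA behaves. The vector `x1` and its values are chosen for illustration.

```r
x1 <- c(1, 2, NA, 4)
is.na(x1)               # FALSE FALSE  TRUE FALSE
NA == NA                # NA: comparing two unknowns gives an unknown
sum(x1)                 # NA, because one input is missing
sum(x1, na.rm = TRUE)   # 7
```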

Answer 6

na.omit() and na.exclude() return the object with any observations that contain missing values removed. The difference between omitting and excluding NAs shows up in some prediction and residual functions.
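A sketch of that difference on a small made-up data frame. Both model fits drop the incomplete row. With `na.exclude`, however, `residuals()` pads its result with NA so that it lines up with the original rows.

```r
# Made-up data with one missing response
df <- data.frame(x = 1:5, y = c(2.1, NA, 6.2, 7.9, 10.1))

nrow(na.omit(df))   # 4: the row with NA is removed

fit_omit <- lm(y ~ x, data = df, na.action = na.omit)
fit_excl <- lm(y ~ x, data = df, na.action = na.exclude)

length(residuals(fit_omit))  # 4: the incomplete row is simply dropped
length(residuals(fit_excl))  # 5: NA is put back in position 2
```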

Answer 7

https://stackoverflow.com/questions/3057341/how-to-use-rs-ellipsis-feature-when-writing-your-own-function
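This sketch assumes the card is about `...` (dot-dot-dot), which is the topic of the linked Stack Overflow question. `...` collects any extra arguments, so a function can forward them or inspect them. `commas()` and `describe_args()` are hypothetical helpers.

```r
# Forward every extra argument to paste()
commas <- function(...) paste(..., collapse = ", ")
commas(c("a", "b", "c"))      # "a, b, c"

# Capture the extra arguments as a list to inspect them
describe_args <- function(...) {
  args <- list(...)
  names(args)
}
describe_args(alpha = 1, beta = "two")   # "alpha" "beta"
```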

Answer 8

Arguments in R are lazily evaluated: they’re not computed until they’re needed. That means if they’re never used, they’re never evaluated. This is an important property of R as a programming language, but is generally not important when you’re writing your own functions for data analysis.
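A minimal demonstration of lazy evaluation with a hypothetical function `first_only()`. The second argument would raise an error if it were evaluated. Because the body never uses it, the call succeeds.

```r
# `y` is never used in the body, so it is never evaluated
first_only <- function(x, y) {
  x
}

first_only(10, stop("this is never evaluated"))   # 10, no error
```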

Answer 9

1. Atomic vectors, of which there are six types: logical, integer, double, character, complex, and raw. Integer and double vectors are collectively known as numeric vectors. 2. Lists, which are sometimes called recursive vectors because lists can contain other lists. The chief difference between atomic vectors and lists is that atomic vectors are homogeneous, while lists can be heterogeneous.
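A quick sketch that checks the six atomic types with `typeof()` and contrasts them with a heterogeneous list. The example values are arbitrary.

```r
typeof(TRUE)       # "logical"
typeof(1L)         # "integer"
typeof(1.5)        # "double"
typeof("a")        # "character"
typeof(1i)         # "complex"
typeof(as.raw(1))  # "raw"

# Atomic vectors are homogeneous: mixing types coerces everything to one type
typeof(c(1, "a"))  # "character"

# Lists can be heterogeneous and can contain other lists
l <- list(1, "a", TRUE, list(2, 3))
sapply(l, typeof)  # "double" "character" "logical" "list"
```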

Answer 10

NULL is often used to represent the absence of a vector (as opposed to NA, which is used to represent the absence of a value in a vector). NULL typically behaves like a vector of length 0.
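A short sketch contrasting NULL with NA.

```r
length(NULL)     # 0
is.null(NULL)    # TRUE
c(1, NULL, 2)    # 1 2: NULL disappears when combined
length(NA)       # 1: NA is a (missing) value inside a vector
```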

Answer 11

1. Its type, which you can determine with typeof(). 2. Its length, which you can determine with length().
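Both properties can be queried directly. The list `x` below is a hypothetical example.

```r
x <- list("a", "b", 1:10)
typeof(letters)   # "character"
typeof(1:10)      # "integer"
length(letters)   # 26
length(x)         # 3: one per top-level element
```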

Answer 12

1. Factors are built on top of integer vectors. 2. Dates and date-times are built on top of numeric vectors. 3. Data frames and tibbles are built on top of lists.
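A sketch that confirms the underlying type of each augmented vector with `typeof()`. The example values are arbitrary, and a base `data.frame` stands in for a tibble.

```r
f <- factor(c("low", "high", "low"))
d <- as.Date("2020-01-01")
df <- data.frame(a = 1:2)

typeof(f)    # "integer"
typeof(d)    # "double"
typeof(df)   # "list"
```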

Answer 13

The four most important types of atomic vector are logical, integer, double, and character. Raw and complex are rarely used during a data analysis.

Answer 14

Logical vectors are the simplest type of atomic vector because they can take only three possible values: FALSE, TRUE, and NA. For example:
```r
1:10 %% 3 == 0
#> [1] FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE FALSE
```

Answer 15

Integer and double vectors are known collectively as numeric vectors. In R, numbers are doubles by default. To make an integer, place an L after the number:
```r
typeof(1)
#> [1] "double"
typeof(1L)
#> [1] "integer"
1.5L
#> [1] 1.5
```

Answer 16

Doubles are approximations. Doubles represent floating point numbers that cannot always be precisely represented with a fixed amount of memory, so you should consider all doubles to be approximations. For example, what is the square of the square root of two?
```r
x <- sqrt(2) ^ 2
x
#> [1] 2
x - 2
#> [1] 4.44e-16
```
This behaviour is common when working with floating point numbers: most calculations include some approximation error. Instead of comparing floating point numbers using ==, you should use dplyr::near(), which allows for some numerical tolerance.
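A base R sketch of tolerance-based comparison. `near_enough()` is a hypothetical helper that follows the same idea as `dplyr::near()`: it checks whether the absolute difference falls below a small tolerance. `all.equal()` is base R's built-in alternative.

```r
x <- sqrt(2) ^ 2
x == 2                     # FALSE, because of floating point error

# Hypothetical helper in the spirit of dplyr::near()
near_enough <- function(x, y, tol = .Machine$double.eps^0.5) {
  abs(x - y) < tol
}
near_enough(x, 2)          # TRUE
isTRUE(all.equal(x, 2))    # TRUE
```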

Answer 17

Integers have one special value, NA, while doubles have four: NA, NaN, Inf and -Inf. The three special values NaN, Inf and -Inf can all arise during division:
```r
c(-1, 0, 1) / 0
#> [1] -Inf NaN Inf
```
Avoid using == to check for these special values. Instead use the helper functions is.finite(), is.infinite(), and is.nan().
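A sketch that applies each helper to all four special values plus an ordinary number. It also shows that `is.na()` is TRUE for NaN as well as for NA.

```r
x <- c(0, Inf, -Inf, NA, NaN)
is.finite(x)     # TRUE  FALSE FALSE FALSE FALSE
is.infinite(x)   # FALSE TRUE  TRUE  FALSE FALSE
is.nan(x)        # FALSE FALSE FALSE FALSE TRUE
is.na(x)         # FALSE FALSE FALSE TRUE  TRUE
NaN == NaN       # NA, which is why == is the wrong tool here
```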

Answer 18

Character vectors are the most complex type of atomic vector, because each element of a character vector is a string, and a string can contain an arbitrary amount of data.

Answer 19

Here I wanted to mention one important feature of the underlying string implementation: R uses a global string pool. This means that each unique string is only stored in memory once, and every use of the string points to that representation. This reduces the amount of memory needed by duplicated strings.

Answer 20

Note that each type of atomic vector has its own missing value:
```r
NA            # logical
#> [1] NA
NA_integer_   # integer
#> [1] NA
NA_real_      # double
#> [1] NA
NA_character_ # character
#> [1] NA
```
Normally you don’t need to know about these different types because you can always use NA and it will be converted to the correct type using the implicit coercion rules.
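A quick check of the types, plus the implicit conversion of a plain NA when it is combined with other values.

```r
typeof(NA)             # "logical"
typeof(NA_integer_)    # "integer"
typeof(NA_real_)       # "double"
typeof(NA_character_)  # "character"

# A plain NA takes on the type of whatever it is combined with
typeof(c(1L, NA))      # "integer"
typeof(c("a", NA))     # "character"
```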

Answer 21

There are two ways to convert, or coerce, one type of vector to another:
1. Explicit coercion happens when you call a function like as.logical(), as.integer(), as.double(), or as.character(). Whenever you find yourself using explicit coercion, you should always check whether you can make the fix upstream, so that the vector never had the wrong type in the first place. For example, you may need to tweak your readr col_types specification.
2. Implicit coercion happens when you use a vector in a specific context that expects a certain type of vector. For example, when you use a logical vector with a numeric summary function, or when you use a double vector where an integer vector is expected.
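A sketch of both kinds of coercion. The implicit case relies on TRUE counting as 1 and FALSE as 0 in numeric summaries. The vector `flags` is illustrative.

```r
# Explicit coercion
as.integer("42")               # 42
as.logical(c("TRUE", "no"))    # TRUE NA: "no" is not a recognised logical string

# Implicit coercion: logical values used in numeric summaries
flags <- c(TRUE, FALSE, TRUE, TRUE)
sum(flags)                     # 3: how many are TRUE
mean(flags)                    # 0.75: proportion that are TRUE
```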

Answer 22

You may see some code (typically older) that relies on implicit coercion in the opposite direction, from integer to logical:
```r
if (length(x)) {
  # do something
}
```
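In this conversion, 0 becomes FALSE and any other number becomes TRUE. The sketch below puts the implicit form next to a more explicit comparison, which is easier to read. `x` is an arbitrary example vector.

```r
x <- c(5, 6)
if (length(x)) print("non-empty")      # relies on integer-to-logical coercion
if (length(x) > 0) print("non-empty")  # explicit comparison, same result

as.logical(0L)   # FALSE
as.logical(3L)   # TRUE
```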

Answer 23

When you combine different types in a vector with c(), the most complex type wins:
```r
typeof(c(TRUE, 1L))
#> [1] "integer"
typeof(c(1L, 1.5))
#> [1] "double"
typeof(c(1.5, "a"))
#> [1] "character"
```

Answer 24

One option is to use typeof(). Another is to use a test function, which returns TRUE or FALSE. Base R provides many functions like is.vector() and is.atomic(), but they often return surprising results. Instead, it’s safer to use the is_* functions provided by purrr, which cover these types:

| Function       | lgl | int | dbl | chr | list |
|----------------|-----|-----|-----|-----|------|
| is_logical()   |  x  |     |     |     |      |
| is_integer()   |     |  x  |     |     |      |
| is_double()    |     |     |  x  |     |      |
| is_numeric()   |     |  x  |  x  |     |      |
| is_character() |     |     |     |  x  |      |
| is_atomic()    |  x  |  x  |  x  |  x  |      |
| is_list()      |     |     |     |     |  x   |
| is_vector()    |  x  |  x  |  x  |  x  |  x   |

Each predicate also comes with a “scalar” version, like is_scalar_atomic(), which checks that the length is 1. This is useful, for example, if you want to check that an argument to your function is a single logical value.
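A base R sketch of one surprising result, plus a hypothetical `is_single_logical()` helper in the spirit of purrr's scalar predicates. The example does not assume purrr is installed. `is.vector()` returns FALSE as soon as a vector carries an attribute other than names.

```r
x <- 1:3
attr(x, "units") <- "cm"
is.vector(x)     # FALSE: x has an attribute other than names
is.numeric(x)    # TRUE

# Hypothetical base R check similar in spirit to is_scalar_logical()
is_single_logical <- function(x) is.logical(x) && length(x) == 1
is_single_logical(TRUE)            # TRUE
is_single_logical(c(TRUE, FALSE))  # FALSE
```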

Answer 25

R doesn’t actually have scalars: instead, a single number is a vector of length 1. Because there are no scalars, most built-in functions are vectorised, meaning that they will operate on a vector of numbers. That’s why, for example, this code works:
```r
sample(10) + 100   # output varies, because sample() shuffles randomly
#> [1] 109 108 104 102 103 110 106 107 105 101
```

Answer 26

Here, R will expand the shorter vector to the same length as the longer one, so-called recycling. This is silent except when the length of the longer vector is not an integer multiple of the length of the shorter:
```r
1:10 + 1:2
#> [1] 2 4 4 6 6 8 8 10 10 12
1:10 + 1:3
#> Warning in 1:10 + 1:3: longer object length is not a multiple of shorter
#> object length
#> [1] 2 4 6 5 7 9 8 10 12 11
```

Answer 27

The vectorised functions in the tidyverse will throw errors when you recycle anything other than a scalar. If you do want to recycle, you’ll need to do it yourself with rep():
```r
tibble(x = 1:4, y = 1:2)
#> Error: Tibble columns must have consistent lengths, only values of length one are recycled:
#> * Length 2: Column `y`
#> * Length 4: Column `x`

tibble(x = 1:4, y = rep(1:2, 2))
#> # A tibble: 4 x 2
#>       x     y
#>   <int> <int>
#> 1     1     1
#> 2     2     2
#> 3     3     1
#> 4     4     2

tibble(x = 1:4, y = rep(1:2, each = 2))
#> # A tibble: 4 x 2
#>       x     y
#>   <int> <int>
#> 1     1     1
#> 2     2     1
#> 3     3     2
#> 4     4     2
```

Answer 28

All types of vectors can be named. You can name them during creation with c():
```r
c(x = 1, y = 2, z = 4)
#> x y z
#> 1 2 4
```
Or after the fact with purrr::set_names():
```r
set_names(1:3, c("a", "b", "c"))
#> a b c
#> 1 2 3
```
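If purrr isn't loaded, base R's `stats::setNames()` does the same job. The sketch below names a vector after creation and then extracts an element by name.

```r
v <- setNames(1:3, c("a", "b", "c"))
names(v)   # "a" "b" "c"
v[["b"]]   # 2
```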

Answer 29

So far we’ve used dplyr::filter() to filter the rows in a tibble. filter() only works with tibbles, so we’ll need a new tool for vectors: [. [ is the subsetting function, and is called like x[a].

Answer 30

1. A numeric vector containing only integers. The integers must either be all positive, all negative, or zero. Subsetting with positive integers keeps the elements at those positions, and subsetting with negative integers drops them. It’s an error to mix positive and negative values:
```r
x[c(1, -1)]
#> Error in x[c(1, -1)]: only 0's may be mixed with negative subscripts
```
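A self-contained sketch of the three integer cases. The character vector `x` is a hypothetical example.

```r
x <- c("one", "two", "three", "four", "five")
x[c(3, 2, 5)]       # positive: keep these positions -> "three" "two" "five"
x[c(1, 1, 5, 5)]    # repeated positions repeat elements -> "one" "one" "five" "five"
x[c(-1, -3, -5)]    # negative: drop these positions -> "two" "four"
x[0]                # zero: an empty vector, character(0)
```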

Answer 31

Subsetting with a logical vector keeps all values corresponding to a TRUE value. This is most often useful in conjunction with the comparison functions. For a numeric vector x that contains missing values:
```r
# All even (or missing!) values of x
x[x %% 2 == 0]
#> [1] 10 NA 8 NA
```
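A self-contained sketch. The vector `x` below is a hypothetical vector consistent with the output shown on the card. An NA in a logical index returns NA, so combining the condition with `!is.na()` keeps only genuine matches.

```r
x <- c(10, 3, NA, 5, 8, 1, NA)
x[!is.na(x)]                  # 10 3 5 8 1: all non-missing values
x[x %% 2 == 0]                # 10 NA 8 NA: NA in the index gives NA
x[!is.na(x) & x %% 2 == 0]    # 10 8: even values only
```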

Answer 32

1. Integer vector
2. Logical vector
3. Character vector
4. Nothing. The simplest type of subsetting is nothing, x[], which returns the complete x. This is not useful for subsetting vectors, but it is useful when subsetting matrices.
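A sketch of why an empty index is handy for matrices. Leaving one index blank selects all rows or all columns. The matrix `m` is a small example.

```r
m <- matrix(1:6, nrow = 2)   # columns are (1, 2), (3, 4), (5, 6)
m[1, ]     # first row, all columns: 1 3 5
m[, -1]    # all rows, every column except the first
m[]        # the whole matrix, unchanged
```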

Answer 33

If you have a named vector, you can subset it with a character vector of names. The result follows the order of the names you ask for:
```r
x[c("xyz", "def")]
#> xyz def
#>   5   2
```
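A self-contained version, assuming a hypothetical named vector `x` that includes `def = 2` and `xyz = 5`. `[[` with a name returns the bare value without its name.

```r
x <- c(abc = 1, def = 2, xyz = 5)
x[c("xyz", "def")]   # xyz 5, def 2
x[["abc"]]           # 1, without the name
```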

Answer 34

Lists are recursive vectors.

Answer 35

Lists can contain other lists, which makes them suitable for representing hierarchical or tree-like structures. You create a list with list().
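A sketch of a small tree built from nested lists. The `tree` structure and its field names are hypothetical. Chaining `$` and `[[` walks down the hierarchy.

```r
x <- list(1, 2, 3)
x[[1]]                     # 1

# A list whose elements are themselves lists: a small tree
tree <- list(
  name = "root",
  children = list(
    list(name = "leaf1"),
    list(name = "leaf2")
  )
)
tree$children[[2]]$name    # "leaf2"
```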

Answer 36

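For an atomic vector x, x[1] returns the first element. For a list, x[1] returns a sub-list containing the first element, while x[[1]] extracts the element itself. Parentheses, as in x(1), call a function rather than subset.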

Answer 37

What is str()? str() compactly displays the structure of an R object, which makes it useful as a diagnostic function and as an alternative to summary(). It prints one line for each basic structure, and it is especially good for displaying the contents of lists. The goal is to produce useful output for any R object.
```r
x <- list(1, 2, 3)
str(x)
#> List of 3
#>  $ : num 1
#>  $ : num 2
#>  $ : num 3

x_named <- list(a = 1, b = 2, c = 3)
str(x_named)
#> List of 3
#>  $ a: num 1
#>  $ b: num 2
#>  $ c: num 3
```
Unlike atomic vectors, list() can contain a mix of objects:
```r
y <- list("a", 1L, 1.5, TRUE)
str(y)
#> List of 4
#>  $ : chr "a"
#>  $ : int 1
#>  $ : num 1.5
#>  $ : logi TRUE
```
Lists can even contain other lists.
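A sketch of how `str()` displays a list of lists. The contents of `z` are a hypothetical example. Each nested level is indented and marked with `..$`.

```r
z <- list(list(1, 2), list(3, 4))
str(z)
#> List of 2
#>  $ :List of 2
#>   ..$ : num 1
#>   ..$ : num 2
#>  $ :List of 2
#>   ..$ : num 3
#>   ..$ : num 4
```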

Answer 38

There are three ways to subset a list, illustrated here with a list named a. Its elements include a (the integers 1 to 3) and b (the string "a string"), and its fourth element is itself a list of -1 and -5.
1. [ extracts a sub-list; the result is always a list. Like with vectors, you can subset with a logical, integer, or character vector.
2. [[ extracts a single component from a list. It removes a level of hierarchy from the list.
3. $ is a shorthand for extracting named elements of a list. It works similarly to [[ except that you don’t need to use quotes.
```r
str(a[[1]])
#>  int [1:3] 1 2 3
str(a[[4]])
#> List of 2
#>  $ : num -1
#>  $ : num -5

a$a
#> [1] 1 2 3
a[["a"]]
#> [1] 1 2 3
```
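A self-contained sketch of all three operators. It assumes a hypothetical list `a`: the third element `c = pi` is an arbitrary filler value.

```r
a <- list(a = 1:3, b = "a string", c = pi, d = list(-1, -5))

str(a[1:2])     # [ keeps a list: a sub-list holding a and b
#> List of 2
#>  $ a: int [1:3] 1 2 3
#>  $ b: chr "a string"
a[[1]]          # 1 2 3: [[ removes one level of hierarchy
a$a             # same as a[["a"]]
a[[4]][[1]]     # -1: chain [[ to drill deeper
```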

Answer 39

The distinction between [ and [[ is really important for lists, because [[ drills down into the list while [ returns a new, smaller list.

Answer 40

Subsetting a tibble works the same way as a list; a data frame can be thought of as a list of columns. The key difference between a list and a tibble is that all the elements (columns) of a tibble must have the same length (number of rows). Lists can have vectors with different lengths as elements.
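A base R sketch of both points. It uses a `data.frame` rather than a tibble, so tibble does not need to be installed. A data frame is a list of columns, but unlike a plain list it rejects columns of incompatible lengths.

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
typeof(df)      # "list"
length(df)      # 2: one element per column
df[["x"]]       # 1 2 3: extracted just like a list element

# A list happily holds vectors of different lengths...
l <- list(x = 1:3, y = 1:2)
# ...but a data frame refuses 3 rows and 2 rows
res <- tryCatch(data.frame(x = 1:3, y = 1:2), error = function(e) "error")
res             # "error"
```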

Answer 41

Atomic vectors and lists are the building blocks for other important vector types like factors and dates. I call these augmented vectors, because they are vectors with additional attributes, including class. Because augmented vectors have a class, they behave differently to the atomic vector on which they are built. Four important augmented vectors are factors, dates, date-times, and tibbles.

Answer 42

Factors are designed to represent categorical data that can take a fixed set of possible values. Factors are built on top of integers, and have a levels attribute.
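A sketch showing the integer codes and the levels attribute. The factor values are arbitrary. Note that a level may exist even if no element uses it.

```r
x <- factor(c("ab", "cd", "ab"), levels = c("ab", "cd", "ef"))
typeof(x)        # "integer"
as.integer(x)    # 1 2 1: codes pointing into the levels
attributes(x)    # $levels "ab" "cd" "ef", $class "factor"
```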

Answer 43

Dates in R are numeric vectors that represent the number of days since 1 January 1970.
```r
x <- as.Date("1971-01-01")
unclass(x)
#> [1] 365

typeof(x)
#> [1] "double"
attributes(x)
#> $class
#> [1] "Date"
```

Answer 44

Tibbles are augmented lists: they have class “tbl_df” + “tbl” + “data.frame”, and names (column) and row.names attributes. The difference between a tibble and a list is that all the elements of a tibble must be vectors with the same length. All functions that work with tibbles enforce this constraint. Traditional data.frames have a very similar structure; the main difference is the class. The class of a tibble includes “data.frame”, which means tibbles inherit the regular data frame behaviour by default.
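A base R sketch of the traditional data.frame structure. It has the same names and row.names attributes as a tibble, with class "data.frame" alone. Tibble is not assumed to be installed.

```r
df <- data.frame(x = 1:2, y = c("a", "b"))
attributes(df)   # $names, $class ("data.frame"), $row.names (1 2)
typeof(df)       # "list"
```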

Answer 45

1. It’s easier to see the intent of your code, because your eyes are drawn to what’s different, not what stays the same. 2. It’s easier to respond to changes in requirements. As your needs change, you only need to make changes in one place, rather than remembering to change every place that you copied-and-pasted the code. 3. You’re likely to have fewer bugs because each line of code is used in more places.

Answer 46

Iteration, which helps you when you need to do the same thing to multiple inputs: repeating the same operation on different columns, or on different datasets.

Answer 47

Imperative programming and functional programming. On the imperative side you have tools like for loops and while loops, which are a great place to start because they make iteration very explicit, so it’s obvious what’s happening. However, for loops are quite verbose, and require quite a bit of bookkeeping code that is duplicated for every for loop. Functional programming (FP) offers tools to extract out this duplicated code, so each common for loop pattern gets its own function. Once you master the vocabulary of FP, you can solve many common iteration problems with less code, more ease, and fewer errors.
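A sketch of the same column-wise computation in both styles, on a small made-up data frame. The functional version uses base R's `vapply()`. In the tidyverse, `purrr::map_dbl(df, mean)` plays the same role.

```r
df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))

# Imperative: a for loop with explicit bookkeeping
output <- vector("double", ncol(df))
for (i in seq_along(df)) {
  output[[i]] <- mean(df[[i]])
}
output                          # 2 20

# Functional: the loop pattern is wrapped in a function
vapply(df, mean, numeric(1))    # named result: a = 2, b = 20
```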

Answer 48

Learn about loops first. They give you a detailed view of what is supposed to happen at the elementary level, and they help you understand the data that you’re manipulating. After you have a clear understanding of loops, get rid of them and put your effort into learning vectorised alternatives. It pays off in terms of efficiency.
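A sketch of the same computation written as a loop and as a vectorised expression. The vector `x` is an arbitrary example.

```r
x <- c(1, 2, 3, 4)

# Loop version: element by element
squares <- vector("double", length(x))
for (i in seq_along(x)) {
  squares[[i]] <- x[[i]]^2
}

# Vectorised version: one expression, no bookkeeping
squares_vec <- x^2

identical(squares, squares_vec)   # TRUE
```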

Answer 49

A general way of creating an empty vector of a given length is the vector() function. It has two arguments: the type of the vector (“logical”, “integer”, “double”, “character”, etc.) and the length of the vector, for example output <- vector("double", length(x)). Before you start the loop, you must always allocate sufficient space for the output.
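A quick look at what `vector()` returns for a few types. Each result is filled with that type's "empty" value, ready to be overwritten inside a loop. This avoids growing the output with `c()` on every iteration.

```r
vector("double", 3)      # 0 0 0
vector("character", 2)   # "" ""
vector("logical", 2)     # FALSE FALSE
vector("list", 2)        # a list of two NULLs
```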

Answer 50

```r
output <- vector("double", ncol(df))  # 1. output
for (i in seq_along(df)) {            # 2. sequence
  output[[i]] <- median(df[[i]])      # 3. body
}
```

Answer 51

1. The output: output <- vector("double", length(x)). Before you start the loop, you must always allocate sufficient space for the output.
2. The sequence: i in seq_along(df). This determines what to loop over: each run of the for loop will assign i to a different value from seq_along(df). You might not have seen seq_along() before. It’s a safe version of the familiar 1:length(l): for a zero-length vector it returns an empty sequence instead of 1 0.
3. The body: output[[i]] <- median(df[[i]]). This is the code that does the work, and it runs once for each value of i.
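A sketch of why `seq_along()` is the safer choice. On an empty vector, `1:length(y)` counts down from 1 to 0, so a loop over it would run twice instead of not at all.

```r
y <- vector("double", 0)
seq_along(y)     # integer(0): the loop body never runs
1:length(y)      # 1 0: a loop would run twice on an empty vector
```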

Answer 52

There are four variations on the basic theme of the for loop: 1. Modifying an existing object, instead of creating a new object. 2. Looping over names or values, instead of indices. 3. Handling outputs of unknown length. 4. Handling sequences of unknown length.
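A sketch of two of the variations, using hypothetical data. Variation 2 loops over names. Variation 3 handles output of unknown length by storing each piece in a list and combining the pieces once at the end, which is one common approach.

```r
x <- c(a = 1, b = 2, c = 3)

# Variation 2: loop over names instead of indices
for (nm in names(x)) {
  cat(nm, "=", x[[nm]], "\n")
}

# Variation 3: output of unknown length -> collect pieces, combine once
pieces <- vector("list", length(x))
for (i in seq_along(x)) {
  n <- x[[i]]
  pieces[[i]] <- rep(n, n)   # each piece has a different length
}
combined <- unlist(pieces)   # 1 2 2 3 3 3
```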

Answer 53

A function automatically returns the value of the last expression evaluated in its body, so you don't need to call return() or assign the result to a variable to return it.
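A sketch with hypothetical functions. `add_one()` relies on the implicit return. `safe_log()` uses `return()` only to exit early.

```r
add_one <- function(x) {
  x + 1          # last expression: its value is returned
}
add_one(41)      # 42

safe_log <- function(x) {
  if (x <= 0) return(NA_real_)   # early exit
  log(x)                         # otherwise the last expression is returned
}
safe_log(-1)     # NA
```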

Answer 54

Functions and iteration. Iteration helps you when you need to do the same thing to multiple inputs: repeating the same operation on different columns, or on different datasets.

Answer 55

Most built-in R functions work with vectors of values. That makes transforming tidy data feel particularly natural. dplyr, ggplot2, and all the other packages in the tidyverse are designed to work with tidy data.

Answer 56

```r
# Compute cases per year
table1 %>%
  count(year, wt = cases)
#> # A tibble: 2 x 2
#>    year      n
#> 1  1999 250740
#> 2  2000 296920
```
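`count(year, wt = cases)` sums `cases` within each `year`. Here is the same computation in base R, on a made-up data frame standing in for `table1`. The values are illustrative, not the real dataset.

```r
toy <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(1999, 2000, 1999, 2000),
  cases   = c(10, 20, 30, 40)
)
res <- tapply(toy$cases, toy$year, sum)
res   # 1999: 40, 2000: 60
```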

Answer 57

A common problem is a dataset where some of the column names are not names of variables, but values of a variable:
```r
table4a
#> # A tibble: 3 x 3
#>   country     `1999` `2000`
#> 1 Afghanistan    745   2666
#> 2 Brazil       37737  80488
```
```r
table4a %>%
  gather(`1999`, `2000`, key = "year", value = "cases")
#> # A tibble: 6 x 3
#>   country     year   cases
#> 1 Afghanistan 1999     745
#> 2 Brazil      1999   37737
```
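In current tidyr, gather() is superseded by pivot_longer(). The equivalent call would be pivot_longer(c(`1999`, `2000`), names_to = "year", values_to = "cases"). The base R sketch below performs the same reshaping by hand on the two rows shown above, without assuming tidyr is installed.

```r
wide <- data.frame(
  country = c("Afghanistan", "Brazil"),
  `1999`  = c(745, 37737),
  `2000`  = c(2666, 80488),
  check.names = FALSE
)

# Stack the year columns: one row per country-year pair
long <- data.frame(
  country = rep(wide$country, times = 2),
  year    = rep(c("1999", "2000"), each = nrow(wide)),
  cases   = c(wide[["1999"]], wide[["2000"]])
)
long
```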

Answer 58

```r
left_join(tidy4a, tidy4b)
```
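`left_join()` keeps every row of the first table and matches on the columns the two tables share. A base R sketch of the same idea uses `merge(..., all.x = TRUE)`. The small data frames here are made up and only mimic the shape of tidy4a and tidy4b.

```r
# Made-up values for illustration only
tidy4a <- data.frame(country = c("X", "X", "Y"), year = c(1999, 2000, 1999),
                     cases = c(10, 20, 30))
tidy4b <- data.frame(country = c("X", "X"), year = c(1999, 2000),
                     population = c(100, 200))

# all.x = TRUE keeps every row of tidy4a, like a left join
joined <- merge(tidy4a, tidy4b, by = c("country", "year"), all.x = TRUE)
joined   # country Y has no population, so it gets NA
```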

Answer 59

The names() function gets or sets the names of an object.
Usage: names(x) and names(x) <- value
x: an R object.
value: the names to be assigned to x, with the same length as x, or NULL ...
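A sketch of getting, setting, and removing names. The vectors are arbitrary examples.

```r
x <- c(1, 2, 3)
names(x)                       # NULL: no names yet
names(x) <- c("a", "b", "c")   # set names
x[["b"]]                       # 2

unnamed <- x
names(unnamed) <- NULL         # assigning NULL removes the names
```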

Answer 60

Command + Shift + C


(94 cards)