Program Flashcards
What are the core tidyverse packages?
The ggplot2, tibble, tidyr, readr, purrr, and dplyr packages.
Why master one thing at a time?
We strongly believe that it’s best to master one tool at a time. You will get better faster if you dive deep, rather than spreading yourself thinly over many topics. This doesn’t mean you should only know one thing, just that you’ll generally learn faster if you stick to one thing at a time. You should strive to learn new things throughout your career, but make sure your understanding is solid before you move on to the next interesting thing.
Where does the pipe come from?
The pipe, %>%, comes from the magrittr package by Stefan Milton Bache.
If you load the tidyverse, do you need to load magrittr?
No. Packages in the tidyverse load %>% for you automatically, so you don’t usually load magrittr explicitly.
The pipe won’t work for which two classes of functions?
1. Functions that use the current environment. For example, assign() will create a new variable with the given name in the current environment. 2. Functions that use lazy evaluation. In R, function arguments are only computed when the function uses them, not prior to calling the function. The pipe computes each element in turn, so you can’t rely on this behaviour.
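The "current environment" problem can be sketched in base R without any pipe package. The helper pipe_into() below is hypothetical: it only stands in for a pipe that calls the next function from inside its own frame, and it is not magrittr's implementation (pipe behaviour may also differ between magrittr versions).

```r
# Hypothetical stand-in for a pipe: it calls f(value, ...) from inside
# its own function frame, so that frame is what assign() sees as "current".
pipe_into <- function(value, f, ...) {
  f(value, ...)
}

x <- 10
pipe_into("x", assign, 100)   # assign() creates x inside pipe_into's frame
x_after_plain <- x            # the outer x is unchanged

env <- environment()          # capture the environment we actually mean
pipe_into("x", assign, 100, envir = env)
x_after_explicit <- x         # the outer x has now been overwritten
```

Being explicit about the target environment is what makes the second call behave as intended.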
Pipes are most useful for rewriting a fairly short linear sequence of operations. When should you not use the pipe?
1. Your pipes are longer than (say) ten steps. In that case, create intermediate objects with meaningful names. That will make debugging easier, because you can more easily check the intermediate results, and it makes it easier to understand your code, because the variable names can help communicate intent. 2. You have multiple inputs or outputs. If there isn’t one primary object being transformed, but two or more objects being combined together, don’t use the pipe. 3. You are starting to think about a directed graph with a complex dependency structure. Pipes are fundamentally linear and expressing complex relationships with them will typically yield confusing code.
For assignment, magrittr provides the %<>% operator, which allows you to do what?
Replace code like mtcars <- mtcars %>% transform(cyl = cyl * 2) with mtcars %<>% transform(cyl = cyl * 2). I’m not a fan of this operator because I think assignment is such a special operation that it should always be clear when it’s occurring. In my opinion, a little bit of duplication (i.e. repeating the name of the object twice) is fine in return for making assignment more explicit.
Why write functions?
Functions allow you to automate common tasks in a more powerful and general way than copy-and-pasting. Writing a function has three advantages: 1. You can give a function an evocative name that makes your code easier to understand. 2. As requirements change, you only need to update code in one place, instead of many. 3. You eliminate the chance of making incidental mistakes when you copy and paste (e.g. updating a variable name in one place, but not in another).
When writing functions, do we need the tidyverse or base R?
In R4DSC the focus is on writing functions in base R, so there is no need to load any package.
When should you consider writing a function?
You should consider writing a function whenever you’ve copied and pasted a block of code more than twice (i.e. you now have three copies of the same code).
To create a data frame with tibble() without first loading the package via library(), you need to specify the namespace explicitly: tibble::tibble().
df
There are three key steps to creating a function. What are they?
There are three key steps to creating a new function: 1. You need to pick a name for the function. Here I’ve used rescale01 because this function rescales a vector to lie between 0 and 1. 2. You list the inputs, or arguments, to the function inside function. Here we have just one argument. If we had more the call would look like function(x, y, z). 3. You place the code you have developed in the body of the function, a { block that immediately follows function(…).
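As a minimal sketch of the three steps, assuming the body rescales with range(), a function named rescale01 could look like this:

```r
# 1. name: rescale01   2. argument: x   3. body: the code inside { }
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)        # smallest and largest non-missing value
  (x - rng[1]) / (rng[2] - rng[1])     # shift to 0, then scale to a width of 1
}

rescaled <- rescale01(c(0, 5, 10))
rescaled_with_na <- rescale01(c(1, 2, NA, 3))
```

Missing values stay missing in the result, while the remaining values still span 0 to 1.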
It’s easier to start with working code and turn it into a function; what is harder?
It’s harder to create a function and then try to make it work.
Generally, what should function names be, and what should argument names be?
Generally, function names should be verbs, and arguments should be nouns. There are some exceptions: nouns are ok if the function computes a very well known noun (i.e. mean() is better than compute_mean()), or accessing some property of an object (i.e. coef() is better than get_coefficients()). A good sign that a noun might be a better choice is if you’re using a very broad verb like “get”, “compute”, “calculate”, or “determine”. Use your best judgement and don’t be afraid to rename a function if you figure out a better name later.
What is the recommendation for naming functions in R?
If your function name is composed of multiple words, I recommend using “snake_case”, where each lowercase word is separated by an underscore. camelCase is a popular alternative. It doesn’t really matter which one you pick, the important thing is to be consistent: # Never do this! col_mins
How should you name a family of functions that do similar things?
If you have a family of functions that do similar things, make sure they have consistent names and arguments. Use a common prefix to indicate that they are connected. That’s better than a common suffix because autocomplete allows you to type the prefix and see all the members of the family.
# Good
input_select()
input_checkbox()
input_text()
# Not so good
select_input()
checkbox_input()
text_input()
A good example of this design is the stringr package: if you don’t remember exactly which function you need, you can type str_ and jog your memory.
How should you use comments in your code?
Use comments, lines starting with #, to explain the “why” of your code. You generally should avoid comments that explain the “what” or the “how”. If you can’t understand what the code does from reading it, you should think about how to rewrite it to be clearer. Another important use of comments is to break up your file into easily readable chunks. Use long lines of - and = to make it easy to spot the breaks.
# Load data --------------------------------------
# Plot data --------------------------------------
?if does not bring up the help page. How do you get help on if?
To get help on if you need to surround it in backticks: ?`if`
What are || (or) and && (and) used for?
You can use || (or) and && (and) to combine multiple logical expressions. You should never use | or & in an if statement: these are vectorised operations that apply to multiple values (that’s why you use them in filter()).
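A short base R sketch of the difference; the variable names are only illustrative:

```r
x <- 5

# && and || combine single TRUE/FALSE values and stop as soon as the
# answer is known, so the right-hand side may never be evaluated.
is_positive_number <- is.numeric(x) && x > 0
short_circuit_and  <- FALSE && stop("never evaluated")
short_circuit_or   <- TRUE || stop("never evaluated")

# & and | work element by element and return a vector.
elementwise <- c(TRUE, FALSE, TRUE) & c(TRUE, TRUE, FALSE)

if (is_positive_number) {
  message("x is a positive number")
}
```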
Why should you be careful when testing equality with ==?
Be careful when testing for equality. == is vectorised, which means that it’s easy to get more than one output. Either check the length is already 1, collapse with all() or any(), or use the non-vectorised identical(). identical() is very strict: it always returns either a single TRUE or a single FALSE, and doesn’t coerce types. This means that you need to be careful when comparing integers and doubles: identical(0L, 0) returns FALSE.
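To make the contrast concrete, here is a small base R sketch with made-up vectors:

```r
x <- c(1, 2, 3)
y <- c(1, 2, 4)

elementwise <- x == y          # one result per element, not a single value
every_equal <- all(x == y)     # collapsed to a single FALSE
same_object <- identical(x, y) # always a single TRUE or FALSE

# identical() does not coerce types; == does.
int_vs_dbl_identical <- identical(0L, 0)
int_vs_int_identical <- identical(0L, 0L)
int_vs_dbl_equal     <- 0L == 0
```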
near() and ==
There’s another common problem you might encounter when using ==: floating point numbers. These results might surprise you:
sqrt(2) ^ 2 == 2
#> [1] FALSE
1 / 49 * 49 == 1
#> [1] FALSE
Remember that every number you see is an approximation. Instead of relying on ==, use near() from dplyr:
near(sqrt(2) ^ 2, 2)
#> [1] TRUE
near(1 / 49 * 49, 1)
#> [1] TRUE
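If dplyr is not loaded, a tolerance-based comparison can be sketched in base R. The helper near_base() is hypothetical, and its default tolerance of sqrt(.Machine$double.eps) is an assumption chosen for illustration; base R's all.equal() is a swapped-in alternative, not the same function as near().

```r
# Hypothetical base R helper: TRUE when two numbers differ by less than tol.
near_base <- function(x, y, tol = sqrt(.Machine$double.eps)) {
  abs(x - y) < tol
}

exact_sqrt   <- sqrt(2) ^ 2 == 2                 # exact comparison
near_sqrt    <- near_base(sqrt(2) ^ 2, 2)        # tolerant comparison
near_product <- near_base(1 / 49 * 49, 1)

# all.equal() also compares with a tolerance; wrap it in isTRUE() because
# it returns a description of the difference instead of FALSE.
all_equal_sqrt <- isTRUE(all.equal(sqrt(2) ^ 2, 2))
```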
Multiple conditions
Chain them with if, else if, and else. But if you end up with a very long series of chained if statements, you should consider rewriting. One useful technique is the switch() function. It allows you to evaluate selected code based on position or name.
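A minimal sketch of switch() replacing a chain of if statements. The function name calc and its op argument are illustrative choices, not a fixed API:

```r
# Pick a branch by name; the final unnamed argument is the fallback.
calc <- function(x, y, op) {
  switch(op,
    plus   = x + y,
    minus  = x - y,
    times  = x * y,
    divide = x / y,
    stop("Unknown op: ", op, call. = FALSE)
  )
}

sum_result     <- calc(2, 3, "plus")
product_result <- calc(2, 3, "times")

# switch() can also select by position when given a number.
second_choice <- switch(2, "first", "second", "third")
```

An unrecognised op falls through to the stop() call, so a typo fails loudly instead of silently returning NULL.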
If you have a logical vector, how can you collapse it to a single value?
If you do have a logical vector, you can use any() or all() to collapse it to a single value.
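For instance, with a made-up numeric vector, the collapsed value can be used safely as an if condition:

```r
values <- c(1, 5, -2)

has_negative <- any(values < 0)   # TRUE if at least one element is negative
all_negative <- all(values < 0)   # TRUE only if every element is negative

if (has_negative) {
  message("at least one value is negative")
}
```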
Code style
Both if and function should (almost) always be followed by squiggly brackets ({}), and the contents should be indented by two spaces. An opening curly brace should never go on its own line and should always be followed by a new line. A closing curly brace should always go on its own line, unless it’s followed by else. Always indent the code inside curly braces.
# Good
if (y < 0 && debug) {
  message("Y is negative")
}
if (y == 0) {
  log(x)
} else {
  y ^ x
}
The arguments of a function fall into which two sets?
The arguments to a function typically fall into two broad sets: one set supplies the data to compute on, and the other supplies arguments that control the details of the computation. For example, in mean(), the data is x, and the details are how much data to trim from the ends (trim) and how to handle missing values (na.rm). Generally, data arguments should come first. Detail arguments should go on the end, and usually should have default values.
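The split between data and detail arguments can be seen by calling mean() on a small made-up vector:

```r
x <- c(1, 2, 3, 4, 100, NA)

default_mean <- mean(x)                             # NA: missing value propagates
complete_mean <- mean(x, na.rm = TRUE)              # drop the NA first
trimmed_mean <- mean(x, trim = 0.25, na.rm = TRUE)  # also drop the extremes
```

The data argument x comes first and is unnamed; the detail arguments are passed by name and could be left at their defaults.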
What is the best way to use spaces in function calls and around =?
When you call a function, place a space on either side of = and always put a space after a comma, not before (just like in regular English). # Good average
How should you name arguments?
The names of the arguments are also important. R doesn’t care, but the readers of your code (including future-you!) will. Generally you should prefer longer, more descriptive names, but there are a handful of very common, very short names. It’s worth memorising these: x, y, z: vectors. w: a vector of weights. df: a data frame. i, j: numeric indices (typically rows and columns). n: length, or number of rows. p: number of columns.
Checking values in function
It’s good practice to check important preconditions, and throw an error (with stop()), if they are not true: wt_mean
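A sketch of such a precondition check, assuming a weighted-mean helper called wt_mean that takes a vector x and a vector of weights w (the body and the error message are illustrative):

```r
# Weighted mean that refuses inputs of different lengths.
wt_mean <- function(x, w) {
  if (length(x) != length(w)) {
    stop("`x` and `w` must be the same length", call. = FALSE)
  }
  sum(w * x) / sum(w)
}

weighted <- wt_mean(c(1, 2, 3), c(1, 1, 2))

# A length mismatch raises an error instead of silently recycling w.
mismatch <- try(wt_mean(1:3, 1:2), silent = TRUE)
```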
What is NA in R?
Missing data in R appears as NA. NA is not a string or a numeric value, but an indicator of missingness. We can create vectors with missing values. x1
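A small sketch of a vector with a missing value; the values in x1 are made up for illustration:

```r
x1 <- c(1, 4, 3, NA, 7)

missing_flags <- is.na(x1)             # TRUE where the value is missing
n_missing     <- sum(is.na(x1))        # count of missing values
naive_mean    <- mean(x1)              # NA: the missing value propagates
clean_mean    <- mean(x1, na.rm = TRUE)

# Test for NA with is.na(); comparing with == just returns NA.
na_comparison <- NA == NA
```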
What are na.omit and na.exclude?
na.omit and na.exclude both return the object with observations removed if they contain any missing values. The difference between omitting and excluding NAs shows up in some prediction and residual functions.
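One place the difference is visible is residuals() on a fitted lm(). The data frame below is made up for illustration:

```r
d <- data.frame(
  x = c(1, 2, 3, 4, 5),
  y = c(2.1, NA, 6.2, 7.9, 10.1)
)

complete_rows <- nrow(na.omit(d))   # rows left after dropping the NA row

fit_omit <- lm(y ~ x, data = d, na.action = na.omit)
fit_excl <- lm(y ~ x, data = d, na.action = na.exclude)

res_omit <- residuals(fit_omit)     # only the rows used in the fit
res_excl <- residuals(fit_excl)     # padded with NA to match the rows of d
```

Both models are fitted on the same four complete rows; na.exclude only changes how the results are lined up with the original data.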
How to use R’s ellipsis feature when writing your own function?
https://stackoverflow.com/questions/3057341/how-to-use-rs-ellipsis-feature-when-writing-your-own-function
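As a rough sketch of the ellipsis in use, the hypothetical helpers below either forward ... to another function or capture it for inspection:

```r
# Forward any extra arguments straight on to mean().
mean_of <- function(x, ...) {
  mean(x, ...)
}

# Capture the extra arguments to inspect or combine them.
commas <- function(...) {
  paste(c(...), collapse = ", ")
}

count_args <- function(...) {
  length(list(...))
}

forwarded <- mean_of(c(1, NA, 3), na.rm = TRUE)
joined    <- commas("a", "b", "c")
n_args    <- count_args(1, "a", TRUE)
```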
What does it mean that R is a lazily evaluated programming language?
Arguments in R are lazily evaluated: they’re not computed until they’re needed. That means if they’re never used, they’re never called. This is an important property of R as a programming language, but is generally not important when you’re writing your own functions for data analysis
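Laziness can be observed with a hypothetical function that ignores its second argument:

```r
# y is never used in the body, so the expression passed as y never runs.
first_only <- function(x, y) {
  x
}

lazy_result <- first_only(1, stop("never evaluated"))

# Once the argument is used, the error does surface.
use_both <- function(x, y) {
  x + y
}
eager_result <- try(use_both(1, stop("evaluated now")), silent = TRUE)
```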
How many types of vectors are there?
There are two: 1. Atomic vectors, of which there are six types: logical, integer, double, character, complex, and raw. Integer and double vectors are collectively known as numeric vectors. 2. Lists, which are sometimes called recursive vectors because lists can contain other lists. The chief difference between atomic vectors and lists is that atomic vectors are homogeneous, while lists can be heterogeneous.
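The homogeneous versus heterogeneous distinction shows up as soon as mixed values are combined. A small sketch:

```r
# c() builds an atomic vector, so mixed inputs are coerced to one type.
mixed_atomic <- c(1, "a", TRUE)
atomic_type  <- typeof(mixed_atomic)

# list() keeps each element's own type, and lists can hold other lists.
mixed_list    <- list(1, "a", TRUE, list(2L, "b"))
element_types <- vapply(mixed_list, typeof, character(1))
```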
What is relationship between NULL and NA ?
NULL is often used to represent the absence of a vector (as opposed to NA, which is used to represent the absence of a value in a vector). NULL typically behaves like a vector of length 0.
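A few base R calls make the contrast visible:

```r
null_length <- length(NULL)   # NULL behaves like a vector of length 0
na_length   <- length(NA)     # NA is a value that occupies one position

# NULL disappears when combined; NA keeps its slot.
with_null <- c(1, NULL, 3)
with_na   <- c(1, NA, 3)

null_check <- is.null(NULL)
na_check   <- is.na(NA)
```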
Every vector has two key properties, what are they?
1. Its type, which you can determine with typeof(). 2. Its length, which you can determine with length().
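Both properties can be checked on built-in and made-up objects, for example:

```r
letters_type   <- typeof(letters)   # letters is a built-in character vector
letters_length <- length(letters)

int_type <- typeof(1:10)            # the colon operator creates integers
dbl_type <- typeof(c(1, 2.5))       # plain numbers are doubles

nested        <- list("a", "b", 1:10)
nested_length <- length(nested)     # counts top-level elements only
```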
What are the three important types of augmented vector?
1. Factors are built on top of integer vectors. 2. Dates and date-times are built on top of numeric vectors. 3. Data frames and tibbles are built on top of lists.
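Calling typeof() reveals the underlying vector, while class() shows the augmentation. A short sketch with made-up values:

```r
f  <- factor(c("low", "high", "low"))
d  <- as.Date("2020-01-01")
dt <- as.POSIXct("2020-01-01 12:00:00", tz = "UTC")
df <- data.frame(a = 1:2, b = c("x", "y"))

underlying <- c(
  factor     = typeof(f),
  date       = typeof(d),
  date_time  = typeof(dt),
  data_frame = typeof(df)
)

factor_class <- class(f)     # the class attribute is what augments the vector
factor_codes <- unclass(f)   # integer codes plus a levels attribute
```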
What are the four most important types of atomic vector?
The four most important types of atomic vector are logical, integer, double, and character. Raw and complex are rarely used during a data analysis.