Program Flashcards
What are the core tidyverse packages?
The ggplot2, tibble, tidyr, readr, purrr, and dplyr packages.
Why master one thing at a time?
We strongly believe that it’s best to master one tool at a time. You will get better faster if you dive deep, rather than spreading yourself thinly over many topics. This doesn’t mean you should only know one thing, just that you’ll generally learn faster if you stick to one thing at a time. You should strive to learn new things throughout your career, but make sure your understanding is solid before you move on to the next interesting thing.
Where does the pipe come from?
The pipe, %>%, comes from the magrittr package by Stefan Milton Bache.
If you load the tidyverse, do you need to load magrittr?
No. Packages in the tidyverse load %>% for you automatically, so you don’t usually load magrittr explicitly.
The pipe won’t work for which two classes of functions?
1. Functions that use the current environment. For example, assign() will create a new variable with the given name in the current environment. 2. Functions that use lazy evaluation. In R, function arguments are only computed when the function uses them, not prior to calling the function. The pipe computes each element in turn, so you can’t rely on this behaviour.
Pipes are most useful for rewriting a fairly short linear sequence of operations. When should you not use the pipe?
1. Your pipes are longer than (say) ten steps. In that case, create intermediate objects with meaningful names. That will make debugging easier, because you can more easily check the intermediate results, and it makes it easier to understand your code, because the variable names can help communicate intent. 2. You have multiple inputs or outputs. If there isn’t one primary object being transformed, but two or more objects being combined together, don’t use the pipe. 3. You are starting to think about a directed graph with a complex dependency structure. Pipes are fundamentally linear and expressing complex relationships with them will typically yield confusing code.
For assignment, magrittr provides the %<>% operator, which allows you to do what?
Replace code like mtcars <- mtcars %>% transform(cyl = cyl * 2) with mtcars %<>% transform(cyl = cyl * 2). I’m not a fan of this operator because I think assignment is such a special operation that it should always be clear when it’s occurring. In my opinion, a little bit of duplication (i.e. repeating the name of the object twice) is fine in return for making assignment more explicit.
Why write functions?
Functions allow you to automate common tasks in a more powerful and general way than copy-and-pasting. Writing a function has three advantages: 1. You can give a function an evocative name that makes your code easier to understand. 2. As requirements change, you only need to update code in one place, instead of many. 3. You eliminate the chance of making incidental mistakes when you copy and paste (i.e. updating a variable name in one place, but not in another).
When writing functions, do we need the tidyverse or base R?
In R4DSC the focus is on writing functions in base R, so there is no need to load any library.
When should you consider writing a function?
You should consider writing a function whenever you’ve copied and pasted a block of code more than twice (i.e. you now have three copies of the same code). For example, take a look at this code. What does it do?
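A minimal sketch of the kind of copy-and-pasted code this card has in mind, assuming a hypothetical data frame df with numeric columns a, b, and c (the names and data are illustrative, not taken from the card):

```r
# Hypothetical data: three numeric columns on arbitrary scales.
set.seed(1)
df <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10))

# The same rescale-to-[0, 1] logic pasted three times;
# only the column name changes between copies.
df$a <- (df$a - min(df$a, na.rm = TRUE)) /
  (max(df$a, na.rm = TRUE) - min(df$a, na.rm = TRUE))
df$b <- (df$b - min(df$b, na.rm = TRUE)) /
  (max(df$b, na.rm = TRUE) - min(df$b, na.rm = TRUE))
df$c <- (df$c - min(df$c, na.rm = TRUE)) /
  (max(df$c, na.rm = TRUE) - min(df$c, na.rm = TRUE))
```

With three copies in place, forgetting to change one column name while pasting is an easy mistake to make, which is the cue to write a function instead.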
How do you create a data frame with tibble() if you have not attached the package with library()?
Specify the package explicitly with the namespace prefix: df <- tibble::tibble(...).
There are three key steps to creating a function. What are they?
There are three key steps to creating a new function: 1. You need to pick a name for the function. Here I’ve used rescale01 because this function rescales a vector to lie between 0 and 1. 2. You list the inputs, or arguments, to the function inside function. Here we have just one argument. If we had more the call would look like function(x, y, z). 3. You place the code you have developed in the body of the function, a { block that immediately follows function(...).
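The three steps can be sketched with the rescale01 function the card mentions. The body below is one plausible implementation, offered as an assumption rather than a quote from the card:

```r
# Step 1: pick a name (rescale01).
# Step 2: list the arguments inside function(); here just x.
rescale01 <- function(x) {
  # Step 3: the already-working code goes in the body.
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

rescale01(c(0, 5, 10))
#> [1] 0.0 0.5 1.0
```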
It’s easier to start with working code and turn it into a function; what is harder?
It’s harder to create a function and then try to make it work.
Generally, function names should be what? Arguments should be what?
Generally, function names should be verbs, and arguments should be nouns. There are some exceptions: nouns are ok if the function computes a very well known noun (i.e. mean() is better than compute_mean()), or accessing some property of an object (i.e. coef() is better than get_coefficients()). A good sign that a noun might be a better choice is if you’re using a very broad verb like “get”, “compute”, “calculate”, or “determine”. Use your best judgement and don’t be afraid to rename a function if you figure out a better name later.
What is the recommendation for naming functions in R?
If your function name is composed of multiple words, I recommend using “snake_case”, where each lowercase word is separated by an underscore. camelCase is a popular alternative. It doesn’t really matter which one you pick; the important thing is to be consistent. Never mix the two styles (for example, a snake_case col_mins next to a camelCase function name).
How should you name a family of functions that do similar things?
If you have a family of functions that do similar things, make sure they have consistent names and arguments. Use a common prefix to indicate that they are connected. That’s better than a common suffix because autocomplete allows you to type the prefix and see all the members of the family.
    # Good
    input_select()
    input_checkbox()
    input_text()

    # Not so good
    select_input()
    checkbox_input()
    text_input()
A good example of this design is the stringr package: if you don’t remember exactly which function you need, you can type str_ and jog your memory.
How do we use comments in functions?
Use comments, lines starting with #, to explain the “why” of your code. You generally should avoid comments that explain the “what” or the “how”. If you can’t understand what the code does from reading it, you should think about how to rewrite it to be clearer. Another important use of comments is to break up your file into easily readable chunks. Use long lines of - and = to make it easy to spot the breaks.
    # Load data --------------------------------------
    # Plot data --------------------------------------
?if does not open the help page. How do you get help on if?
To get help on if, you need to surround it in backticks: ?`if`
What are || (or) and && (and) used for?
You can use || (or) and && (and) to combine multiple logical expressions. You should never use | or & in an if statement: these are vectorised operations that apply to multiple values (that’s why you use them in filter()).
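A short sketch of the difference, using made-up values x and y and a hypothetical sign_label variable:

```r
x <- c(1, 5, 10)

# & is vectorised: it returns one result per element.
x > 2 & x < 8
#> [1] FALSE  TRUE FALSE

# && combines single values and short-circuits, which suits if ().
y <- 3
if (is.numeric(y) && y > 0) {
  sign_label <- "positive"
} else {
  sign_label <- "not positive"
}
sign_label
#> [1] "positive"
```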
Be careful when testing equality with ==. Why?
Be careful when testing for equality. == is vectorised, which means that it’s easy to get more than one output. Either check the length is already 1, collapse with all() or any(), or use the non-vectorised identical(). identical() is very strict: it always returns either a single TRUE or a single FALSE, and doesn’t coerce types. This means that you need to be careful when comparing integers and doubles: identical(0L, 0) is FALSE.
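A brief illustration with two made-up vectors, a and b:

```r
a <- c(1, 2, 3)
b <- c(1, 2, 4)

# == compares element by element, so the result has length 3.
a == b
#> [1]  TRUE  TRUE FALSE

# identical() always returns a single TRUE or FALSE.
identical(a, b)
#> [1] FALSE

# identical() does not coerce types: integer 0 is not double 0,
# even though == (which does coerce) says they are equal.
identical(0L, 0)
#> [1] FALSE
0L == 0
#> [1] TRUE
```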
near() and ==
There’s another common problem you might encounter when using ==: floating point numbers. These results might surprise you:
    sqrt(2) ^ 2 == 2
    #> [1] FALSE
    1 / 49 * 49 == 1
    #> [1] FALSE
Remember that every number you see is an approximation. Instead of relying on ==, use dplyr’s near():
    near(sqrt(2) ^ 2, 2)
    #> [1] TRUE
    near(1 / 49 * 49, 1)
    #> [1] TRUE
How do you handle multiple conditions?
Chain them with if ... else if ... else, or use switch(). If you end up with a very long series of chained if statements, you should consider rewriting. One useful technique is the switch() function. It allows you to evaluate selected code based on position or name.
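A minimal sketch of switch() selecting code by name. The helper apply_op and its argument names are hypothetical, chosen only for illustration:

```r
# Hypothetical helper: pick an arithmetic operation by name.
apply_op <- function(x, y, op) {
  switch(op,
    plus = x + y,
    minus = x - y,
    times = x * y,
    divide = x / y,
    # The unnamed last argument is the fallback when no name matches.
    stop("Unknown op: ", op)
  )
}

apply_op(6, 3, "divide")
#> [1] 2
apply_op(6, 3, "plus")
#> [1] 9
```

Compared with a long if ... else if chain, each branch sits on one line and the fallback is explicit.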
If you have a logical vector, how can you collapse it to a single value?
If you do have a logical vector, you can use any() or all() to collapse it to a single value.
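For instance, with a made-up numeric vector values, collapsing the comparison gives a single value that is safe to use in an if statement:

```r
values <- c(3, -1, 7)
is_negative <- values < 0   # logical vector of length 3

any(is_negative)   # TRUE if at least one element is TRUE
#> [1] TRUE
all(is_negative)   # TRUE only if every element is TRUE
#> [1] FALSE

# The collapsed result is a single value, so it can drive an if statement.
if (any(is_negative)) {
  status <- "contains negatives"
} else {
  status <- "all non-negative"
}
status
#> [1] "contains negatives"
```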
Code style
Both if and function should (almost) always be followed by squiggly brackets ({}), and the contents should be indented by two spaces. An opening curly brace should never go on its own line and should always be followed by a new line. A closing curly brace should always go on its own line, unless it’s followed by else. Always indent the code inside curly braces.
    # Good
    if (y < 0 && debug) {
      message("Y is negative")
    }

    if (y == 0) {
      log(x)
    } else {
      y ^ x
    }

