Communication Flashcards

1
Q

What is the difference between R Markdown and an R Notebook?

A

R notebook files show the output of code chunks inside the editor, while hiding the console, when they are edited in RStudio. This contrasts with R markdown files, which show their output inside the console, and do not show output inside the editor. This makes R notebook documents appealing for interactive exploration.

R markdown files can be knit to a variety of formats including HTML, PDF, and DOCX. R notebooks can only be knit to HTML, with the extension .nb.html. The output of an R notebook keeps a copy of the original .Rmd source. If a .nb.html file is opened in RStudio, the source of the .Rmd file can be extracted and edited. In contrast, there is no way to recover the original source of an R markdown file from its output, except through the parts that are displayed in the output itself.

R markdown files and R notebooks differ in the value of output in their YAML headers. For an R notebook it is

output: html_notebook

while for an R markdown file it is

output: html_document

In RStudio, an R notebook has a Preview option, while an R markdown file has no Preview option, only Knit.

More here: https://stackoverflow.com/questions/43820483/difference-between-r-markdown-and-r-notebook/43898504#43898504

2
Q

What happens if you replace the YAML header of an R notebook with the header of an R markdown file?

A

Copying the YAML header from an R notebook to an R markdown file changes it to an R notebook, and vice versa. More specifically, an .Rmd file can be switched between an R markdown file and an R notebook by changing the value of the output key in the header.

The RStudio IDE and the rmarkdown package both use the YAML header of an .Rmd file to determine the document-type of the file.

3
Q

What is the keyboard shortcut for Knit?

A

Cmd/Ctrl + Shift + K.

4
Q

What is the output line in the YAML header for PDF, HTML, and Word?

A

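PDF: output: pdf_document
HTML: output: html_document
Word: output: word_document

When a header lists several formats, each one is nested under output with a value, e.g. pdf_document: default.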

5
Q

How do you format text in R Markdown?

A

*italic* or _italic_
**bold** or __bold__
`code`
superscript^2^ and subscript~2~

6
Q

How do you create headings in R Markdown?

A

# 1st Level Header

## 2nd Level Header

### 3rd Level Header

7
Q

How do you make lists in R Markdown?

A

*   Bulleted list item 1
*   Item 2
    * Item 2a
    * Item 2b

1.  Numbered list item 1
1.  Item 2. The numbers are incremented automatically in the output.

8
Q

How do you insert images and links in R Markdown?

9
Q

What are three ways to insert a code chunk?

A
  1. Cmd/Ctrl + Alt + I
  2. The “Insert” button icon in the editor toolbar.
  3. By manually typing the chunk delimiters ```{r} and ```.

10
Q

How do you run the code in a chunk in R Markdown?

A

Cmd/Ctrl + Shift + Enter, which runs all the code in the chunk

11
Q

A chunk can have a name. How do you name a chunk?

A

Chunks can be given an optional name: ```{r by-name}. This has three advantages:

  1. You can more easily navigate to specific chunks using the drop-down code navigator in the bottom-left of the script editor.
  2. Graphics produced by the chunks will have useful names that make them easier to use elsewhere.
  3. You can set up networks of cached chunks to avoid re-performing expensive computations on every run.
12
Q

Chunks have options. What are they useful for?

A

Chunk output can be customised with options: arguments supplied in the chunk header. Knitr provides almost 60 options that you can use to customize your code chunks.

13
Q

What does the chunk option eval = FALSE mean?

A

eval = FALSE prevents code from being evaluated. (And obviously if the code is not run, no results will be generated). This is useful for displaying example code, or for disabling a large block of code without commenting each line.

14
Q

What does the include = FALSE chunk option do?

A

include = FALSE runs the code, but doesn’t show the code or results in the final document. Use this for setup code that you don’t want cluttering your report.

15
Q

What does the echo = FALSE chunk option do?

A

echo = FALSE prevents code, but not the results, from appearing in the finished file. Use this when writing reports aimed at people who don’t want to see the underlying R code.
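
The difference between echo = FALSE and eval = FALSE is easiest to see by knitting a tiny chunk both ways. The following is a minimal sketch, assuming the knitr package is installed; knit_chunk() is a hypothetical helper written for this card, not part of knitr.

```r
library(knitr)

# Hypothetical helper: knit a one-chunk document supplied as text and
# return the resulting Markdown as a character string.
knit_chunk <- function(options) {
  src <- c(paste0("```{r, ", options, "}"), "1 + 1", "```")
  knit(text = src, quiet = TRUE)
}

out_echo <- knit_chunk("echo = FALSE")  # code hidden, result kept
out_eval <- knit_chunk("eval = FALSE")  # code shown, never run

cat(out_echo, "\n")
cat(out_eval, "\n")
```

The first output should contain the result of 1 + 1 without the code, and the second should contain the code without any result.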

16
Q

What does message = FALSE mean? (echo, eval, include, message, and warning all default to TRUE.)

A

message = FALSE or warning = FALSE prevents messages or warnings from appearing in the finished file.
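
To check what an option defaults to, you can ask knitr directly. This is a small sketch, assuming knitr is installed and that nothing in the current session has already changed the options with knitr::opts_chunk$set().

```r
library(knitr)

# Look up the current values of the options covered on these cards.
defaults <- opts_chunk$get(c("eval", "echo", "include", "message", "warning"))
str(defaults)

# In a fresh session each of these is TRUE until a chunk header or
# opts_chunk$set() overrides it.
all_true <- all(unlist(defaults))
print(all_true)
```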

17
Q

What does results = 'hide' do?

A

results = 'hide' hides printed output; fig.show = 'hide' hides plots.

18
Q

What does error = TRUE do?

A

error = TRUE causes the render to continue even if code returns an error. This is rarely something you’ll want to include in the final version of your report, but can be very useful if you need to debug exactly what is going on inside your .Rmd. It’s also useful if you’re teaching R and want to deliberately include an error. The default, error = FALSE, causes knitting to fail if there is a single error in the document.

19
Q

How do you use tables in R Markdown?

A

By default, R Markdown prints data frames and matrices as you’d see them in the console.

If you prefer that data be displayed with additional formatting you can use the knitr::kable function. The code below generates Table 27.1.

knitr::kable(
  mtcars[1:5, ],
  caption = "A knitr kable."
)

20
Q

What is caching in R Markdown?

A

Normally, each knit of a document starts from a completely clean slate. This is great for reproducibility, because it ensures that you’ve captured every important computation in code. However, it can be painful if you have some computations that take a long time. The solution is cache = TRUE. When set, this will save the output of the chunk to a specially named file on disk. On subsequent runs, knitr will check to see if the code has changed, and if it hasn’t, it will reuse the cached results.

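e.g.

```{r processed_data, cache = TRUE, dependson = "raw_data"}
processed_data <- rawdata %>%
  filter(!is.na(import_var)) %>%
  mutate(new_variable = complicated_transformation(x, y, z))
```

Here dependson = "raw_data" names the chunk that creates rawdata, so knitr also refreshes this cache when that chunk changes.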
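
The idea behind cache = TRUE (save the result on disk, reuse it while the code is unchanged) can be sketched in a few lines of base R. This is only an illustration of the idea, not knitr’s actual implementation; cached_eval() and the cache file naming are hypothetical.

```r
# Hypothetical helper: evaluate `code` (a string) once, store the result in
# a file keyed by a hash of the code, and reuse it while the code is unchanged.
cached_eval <- function(code, cache_dir = tempdir()) {
  src <- tempfile(fileext = ".R")
  writeLines(code, src)
  key <- unname(tools::md5sum(src))  # changes whenever the code text changes
  unlink(src)

  cache_file <- file.path(cache_dir, paste0("cache-", key, ".rds"))
  if (file.exists(cache_file)) {
    return(list(value = readRDS(cache_file), from_cache = TRUE))
  }

  value <- eval(parse(text = code))
  saveRDS(value, cache_file)
  list(value = value, from_cache = FALSE)
}

cache_dir <- tempfile("cache-demo-")
dir.create(cache_dir)

first   <- cached_eval("sum(1:10)", cache_dir)  # computed and saved
second  <- cached_eval("sum(1:10)", cache_dir)  # same code: read from disk
changed <- cached_eval("sum(1:11)", cache_dir)  # code changed: recomputed
```

Like knitr’s cache, this sketch only notices changes to the code itself, which is why a chunk that depends on another chunk needs dependson.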

21
Q

What are global options in R Markdown?

A

As you work more with knitr, you will discover that some of the default chunk options don’t fit your needs and you want to change them. You can do this by calling knitr::opts_chunk$set() in a code chunk. For example, when writing books and tutorials I set:

knitr::opts_chunk$set(
  comment = "#>",
  collapse = TRUE
)
This uses my preferred comment formatting, and ensures that the code and output are kept closely entwined. On the other hand, if you were preparing a report, you might set:

knitr::opts_chunk$set(
  echo = FALSE
)
That will hide the code by default, showing only the chunks you deliberately choose to show (with echo = TRUE). You might consider setting message = FALSE and warning = FALSE, but that would make it harder to debug problems because you wouldn’t see any messages in the final document.

22
Q

How do you use inline code?

A

Inline code is embedded directly in the text by wrapping it in backticks that start with r:

We have data about `r nrow(diamonds)` diamonds. Only `r nrow(diamonds) - nrow(smaller)` are larger than 2.5 carats. The distribution of the remainder is shown below:

When the report is knit, the results of these computations are inserted into the text:

We have data about 53940 diamonds. Only 126 are larger than 2.5 carats. The distribution of the remainder is shown below:
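
The two inline expressions can be reproduced in the console. This is a sketch assuming ggplot2 is installed (it provides the diamonds data set); the definition of smaller is not shown on this card, so the one below, diamonds of at most 2.5 carats, is an assumption based on the sentence itself.

```r
library(ggplot2)  # provides the diamonds data set

# Assumed setup: `smaller` keeps the diamonds of at most 2.5 carats.
smaller <- diamonds[diamonds$carat <= 2.5, ]

n_total <- nrow(diamonds)                  # value of the first inline expression
n_large <- nrow(diamonds) - nrow(smaller)  # value of the second inline expression

print(c(total = n_total, larger_than_2.5 = n_large))
```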

23
Q

What does format() do?

A

When inserting numbers into text, format() is your friend. It allows you to set the number of digits so you don’t print to a ridiculous degree of accuracy, and a big.mark to make numbers easier to read. I’ll often combine these into a helper function:

comma <- function(x) format(x, digits = 2, big.mark = ",")
comma(3452345)
#> [1] "3,452,345"
comma(.12358124331)
#> [1] "0.12"
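
A helper like this is typically used when building a sentence from computed values. A small sketch follows; the count is hard-coded here purely for illustration.

```r
# Same helper as on this card: two significant digits, commas between thousands.
comma <- function(x) format(x, digits = 2, big.mark = ",")

n_diamonds <- 53940  # illustrative value, hard-coded for the example
sentence <- paste("We have data about", comma(n_diamonds), "diamonds.")
print(sentence)
```

In an .Rmd file the same call would normally sit inside an inline code expression rather than paste().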

24
Q

How do you troubleshoot R Markdown?

A

The first thing you should always try is to recreate the problem in an interactive session. Restart R, then “Run all chunks” (either from the Code menu, under Run region, or with the keyboard shortcut Ctrl + Alt + R). If you’re lucky, that will recreate the problem, and you can figure out what’s going on interactively.

Also, check that the working directory is what you expect by including getwd() in a chunk.

Next, brainstorm all the things that might cause the bug. You’ll need to systematically check that they’re the same in your R session and your R markdown session. The easiest way to do that is to set error = TRUE on the chunk causing the problem, then use print() and str() to check that settings are as you expect.

25
Q

What does YAML mean?

A

YAML stands for “yet another markup language”. It is designed for representing hierarchical data in a way that’s easy for humans to read and write. R Markdown uses it to control many details of the output.

The YAML header can also hold parameters; I don’t understand this part yet.

26
Q

How do you add bibliographies and citations in R Markdown?

A

Pandoc can automatically generate citations and a bibliography in a number of styles. To use this feature, specify a bibliography file using the bibliography field in your file’s header. The field should contain a path from the directory that contains your .Rmd file to the bibliography file:

bibliography: rmarkdown.bib

Separate multiple citations with a ;: Blah blah [@smith04; @doe99].

You can add arbitrary comments inside the square brackets:
Blah blah [see @doe99, pp. 33-35; also @smith04, ch. 1].

Remove the square brackets to create an in-text citation: @smith04 says blah, or @smith04 [p. 33] says blah.

Add a - before the citation to suppress the author’s name:
Smith says blah [-@smith04].

You can change the style of your citations and bibliography by referencing a CSL (citation style language) file in the csl field:

bibliography: rmarkdown.bib
csl: apa.csl

27
Q

Where can you learn more about R Markdown?

A

R Markdown is still relatively young, and is still growing rapidly. The best place to stay on top of innovations is the official R Markdown website: http://rmarkdown.rstudio.com.

28
Q

Where does Hadley Wickham recommend learning Git with R?

A

“Happy Git with R”: a user friendly introduction to Git and GitHub for R users, by Jenny Bryan. The book is freely available online: http://happygitwithr.com

The “Git and GitHub” chapter of R Packages, by Hadley. You can also read it for free online: http://r-pkgs.had.co.nz/git.html.

29
Q

What are two ways to set the output of a document in R Markdown?

A

Permanently, by modifying the YAML header:

title: "Viridis Demo"
output: html_document

Transiently, by calling rmarkdown::render() by hand:

rmarkdown::render("diamond-sizes.Rmd", output_format = "word_document")

RStudio’s knit button renders a file to the first format listed in its output field. You can render to additional formats by clicking the dropdown menu beside the knit button.

30
Q

When generating a document to share with decision makers, you can turn off the default display of code by setting global options in the setup chunk. How do you do it?

A

knitr::opts_chunk$set(echo = FALSE)

31
Q

What is html_document focused on?

A

An html_document is focused on communicating with decision makers, while a notebook is focused on collaborating with other data scientists. These different purposes lead to using the HTML output in different ways. Both HTML outputs will contain the fully rendered output, but the notebook also contains the full source code. That means you can use the .nb.html generated by the notebook in two ways:

You can view it in a web browser, and see the rendered output. Unlike html_document, this rendering always includes an embedded copy of the source code that generated it.

You can edit it in RStudio. When you open an .nb.html file, RStudio will automatically recreate the .Rmd file that generated it. In the future, you will also be able to include supporting files (e.g. .csv data files), which will be automatically extracted when needed.

32
Q

What is a useful tip if you are using GitHub with RStudio?

A

There’s one tip that’s useful if you’re already using Git and GitHub: use both html_notebook and github_document outputs:

output:
  html_notebook: default
  github_document: default

html_notebook gives you a local preview, and a file that you can share via email. github_document creates a minimal md file that you can check into git. You can easily see how the results of your analysis (not just the code) change over time, and GitHub will render it for you nicely online.

33
Q

How do you use R Markdown for presentations?

A

You can also use R Markdown to produce presentations. You get less visual control than with a tool like Keynote or PowerPoint, but automatically inserting the results of your R code into a presentation can save a huge amount of time. Presentations work by dividing your content into slides, with a new slide beginning at each first (#) or second (##) level header. You can also insert a horizontal rule (***) to create a new slide without a header.

R Markdown comes with three presentation formats built-in:

ioslides_presentation - HTML presentation with ioslides

slidy_presentation - HTML presentation with W3C Slidy

beamer_presentation - PDF presentation with LaTeX Beamer.

34
Q

How do you build dashboards in R Markdown?

A

Dashboards are a useful way to communicate large amounts of information visually and quickly. Flexdashboard makes it particularly easy to create dashboards using R Markdown and a convention for how the headers affect the layout:

Each level 1 header (#) begins a new page in the dashboard.
Each level 2 header (##) begins a new column.
Each level 3 header (###) begins a new row.

Learn more here: http://rmarkdown.rstudio.com/flexdashboard/.

35
Q

Which R Markdown formats support interactivity?

A

Any HTML format (document, notebook, presentation, or dashboard) can contain interactive components.

36
Q

What is Shiny?

A

Shiny is a package that allows you to create interactivity using R code, not JavaScript.

To call Shiny code from an R Markdown document, add runtime: shiny to the header:

title: "Shiny Web App"
output: html_document
runtime: shiny

Then you can use the “input” functions to add interactive components to the document:

library(shiny)

textInput("name", "What is your name?")
numericInput("age", "How old are you?", NA, min = 0, max = 150)

More: http://shiny.rstudio.com/.

37
Q

What is bookdown?

A

The bookdown package, https://github.com/rstudio/bookdown, makes it easy to write books, like this one. To learn more, read Authoring Books with R Markdown, by Yihui Xie, which is, of course, written in bookdown. Visit http://www.bookdown.org to see other bookdown books written by the wider R community.

38
Q

What is rticles?

A

The rticles package, https://github.com/rstudio/rticles, compiles a selection of formats tailored for specific scientific journals.

39
Q

Why is R Markdown important?

A

R Markdown is also important because it so tightly integrates prose and code. This makes it a great analysis notebook because it lets you develop code and record your thoughts. An analysis notebook shares many of the same goals as a classic lab notebook in the physical sciences.

40
Q

What is good advice for using analysis notebooks effectively?

A

Ensure each notebook has a descriptive title, an evocative filename, and a first paragraph that briefly describes the aims of the analysis.

Use the YAML header date field to record the date you started working on the notebook:

date: 2016-08-23
Use ISO8601 YYYY-MM-DD format so that there’s no ambiguity. Use it even if you don’t normally write dates that way!

If you spend a lot of time on an analysis idea and it turns out to be a dead end, don’t delete it! Write up a brief note about why it failed and leave it in the notebook. That will help you avoid going down the same dead end when you come back to the analysis in the future.

Generally, you’re better off doing data entry outside of R. But if you do need to record a small snippet of data, clearly lay it out using tibble::tribble().

If you discover an error in a data file, never modify it directly, but instead write code to correct the value. Explain why you made the fix.

Before you finish for the day, make sure you can knit the notebook (if you’re using caching, make sure to clear the caches). That will let you fix any problems while the code is still fresh in your mind.

If you want your code to be reproducible in the long-run (i.e. so you can come back to run it next month or next year), you’ll need to track the versions of the packages that your code uses. A rigorous approach is to use packrat, http://rstudio.github.io/packrat/, which stores packages in your project directory, or checkpoint, https://github.com/RevolutionAnalytics/checkpoint, which will reinstall packages available on a specified date. A quick and dirty hack is to include a chunk that runs sessionInfo() — that won’t let you easily recreate your packages as they are today, but at least you’ll know what they were.

You are going to create many, many, many analysis notebooks over the course of your career. How are you going to organise them so you can find them again in the future? I recommend storing them in individual projects, and coming up with a good naming scheme.
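
Several of the tips above (ISO 8601 dates, tribble() for small data, fixing errors in code, recording package versions) can be sketched together. This assumes the tibble package is installed; the readings data and the reason for the correction are made up for illustration.

```r
library(tibble)

# ISO 8601 date string, suitable for the YAML `date` field.
iso_today <- format(Sys.Date(), "%Y-%m-%d")

# A small hand-entered data set laid out with tribble() (made-up values).
readings <- tribble(
  ~site, ~temp_c,
  "A",      18.5,
  "B",     185.0,
  "C",      19.1
)

# Correct the suspect value in code rather than editing the data file.
# Assumed reason: site B's reading was entered without its decimal point.
readings$temp_c[readings$site == "B"] <- 18.5

# Record the R and package versions used for this analysis.
print(sessionInfo())
```

Keeping the correction in code leaves the raw file untouched and documents why the value changed.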

41
Q

How do you add labels to your graphic?

A

You add labels with the labs() function.

e.g.

ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth(se = FALSE) +
  labs(title = "Fuel efficiency generally decreases with engine size")

42
Q

What is the purpose of a plot title?

A

The purpose of a plot title is to summarise the main finding. Avoid titles that just describe what the plot is, e.g. “A scatterplot of engine displacement vs. fuel economy”.

43
Q

What are two other useful labels in graphics?

A

There are two other useful labels that you can use in ggplot2 2.2.0 and above (which should be available by the time you’re reading this book):

subtitle : adds additional detail in a smaller font beneath the title.

caption : adds text at the bottom right of the plot, often used to describe the source of the data.
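
Both labels are set through labs(), alongside the title. A minimal sketch, assuming ggplot2 2.2.0 or later; the subtitle and caption wording is illustrative.

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  geom_smooth(se = FALSE) +
  labs(
    title = "Fuel efficiency generally decreases with engine size",
    # Extra detail in a smaller font beneath the title.
    subtitle = "Two seaters (sports cars) are an exception because of their light weight",
    # Source note at the bottom right of the plot.
    caption = "Data from fueleconomy.gov"
  )

p
```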

44
Q

How can you replace the default axis and legend titles using labs()?

A

It’s usually a good idea to replace short variable names with more detailed descriptions, and to include the units.

ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  geom_smooth(se = FALSE) +
  labs(
    x = "Engine displacement (L)",
    y = "Highway fuel economy (mpg)",
    colour = "Car type"
  )
45
Q

Annotations: how do you label individual observations in your plot?

A

It’s often useful to label individual observations or groups of observations. The first tool you have at your disposal is geom_text(). geom_text() is similar to geom_point(), but it has an additional aesthetic: label. This makes it possible to add textual labels to your plots.

There are two possible sources of labels. First, you might have a tibble that provides labels:

best_in_class <- mpg %>%
  group_by(class) %>%
  filter(row_number(desc(hwy)) == 1)

ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  geom_text(aes(label = model), data = best_in_class)

46
Q

Using geom_text() to annotate observations can be hard to read. How can we do better?

A

This is hard to read because the labels overlap with each other, and with the points. We can make things a little better by switching to geom_label() which draws a rectangle behind the text. We also use the nudge_y parameter to move the labels slightly above the corresponding points:

ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  geom_label(aes(label = model), data = best_in_class, nudge_y = 2, alpha = 0.5)

To avoid overlapping annotations, use the ggrepel package by Kamil Slowikowski. This useful package will automatically adjust labels so that they don’t overlap:

ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  geom_point(size = 3, shape = 1, data = best_in_class) +
  ggrepel::geom_label_repel(aes(label = model), data = best_in_class)

47
Q

Where did I stop in the graphics chapter?

A

28.4.1 Axis ticks and legend keys

48
Q

How do you adjust the scales of your plot?

A

The third way you can make your plot better for communication is to adjust the scales. Scales control the mapping from data values to things that you can perceive. Normally, ggplot2 automatically adds scales for you. For example, when you type:

ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class))

ggplot2 automatically adds default scales behind the scenes:

ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  scale_x_continuous() +
  scale_y_continuous() +
  scale_colour_discrete()
Note the naming scheme for scales: scale_ followed by the name of the aesthetic, then _, then the name of the scale. The default scales are named according to the type of variable they align with: continuous, discrete, datetime, or date. There are lots of non-default scales which you’ll learn about below.

The default scales have been carefully chosen to do a good job for a wide range of inputs. Nevertheless, you might want to override the defaults for two reasons:

You might want to tweak some of the parameters of the default scale. This allows you to do things like change the breaks on the axes, or the key labels on the legend.

You might want to replace the scale altogether, and use a completely different algorithm. Often you can do better than the default because you know more about the data.
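
Both reasons for overriding a default scale can be sketched briefly. This assumes ggplot2 is installed; the break positions, legend labels, and palette choice are illustrative, not recommendations from the text above.

```r
library(ggplot2)

# 1. Tweak a parameter of the default scale: custom breaks on the y axis.
p_breaks <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  scale_y_continuous(breaks = seq(15, 40, by = 5))

# 1b. Tweak the key labels on the legend (drv has the levels "4", "f", "r").
p_labels <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = drv)) +
  scale_colour_discrete(labels = c("Four-wheel", "Front-wheel", "Rear-wheel"))

# 2. Replace the scale altogether with a different one.
p_brewer <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = drv)) +
  scale_colour_brewer(palette = "Set1")
```

Each call follows the naming scheme described above: scale_, the aesthetic, then the name of the scale.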

49
Q

Where can you learn more about graphics in R?

A

The absolute best place to learn more is the ggplot2 book: ggplot2: Elegant graphics for data analysis. It goes into much more depth about the underlying theory, and has many more examples of how to combine the individual pieces to solve practical problems. Unfortunately, the book is not available online for free, although you can find the source code at https://github.com/hadley/ggplot2-book.

Another great resource is the ggplot2 extensions guide http://www.ggplot2-exts.org/. This site lists many of the packages that extend ggplot2 with new geoms and scales. It’s a great place to start if you’re trying to do something that seems hard with ggplot2.