Communication Flashcards

Question

What does YAML mean?

Answer 1

YAML stands for “yet another markup language”. It is designed for representing hierarchical data in a way that’s easy for humans to read and write. R Markdown uses it to control many details of the output. The YAML header can also hold parameters for the document.
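To make the idea of hierarchical data concrete, here is a minimal sketch that parses a header-like YAML string in R. It assumes the yaml package is installed; the title, the contents of the params field, and the name my_class are illustrative assumptions, not taken from a particular document.

```r
library(yaml)

# A hypothetical R Markdown header, written as a plain string for illustration.
header_text <- '
title: "Diamond sizes"
output: html_document
params:
  my_class: "suv"
'

# yaml.load() turns the text into a nested R list that mirrors the indentation.
header <- yaml.load(header_text)

header$output           # a top-level field
header$params$my_class  # a field nested under params
```

Indentation is what expresses nesting here: my_class belongs to params only because it is indented beneath it.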

Answer 2

Pandoc can automatically generate citations and a bibliography in a number of styles. To use this feature, specify a bibliography file using the bibliography field in your file’s header. The field should contain a path from the directory that contains your .Rmd file to the file that contains the bibliography:

    bibliography: rmarkdown.bib

Separate multiple citations with a `;`:

    Blah blah [@smith04; @doe99].

You can add arbitrary comments inside the square brackets:

    Blah blah [see @doe99, pp. 33-35; also @smith04, ch. 1].

Remove the square brackets to create an in-text citation:

    @smith04 says blah, or @smith04 [p. 33] says blah.

Add a `-` before the citation to suppress the author's name:

    Smith says blah [-@smith04].

You can change the style of your citations and bibliography by referencing a CSL (citation style language) file in the csl field:

    bibliography: rmarkdown.bib
    csl: apa.csl

Answer 3

R Markdown is still relatively young, and is still growing rapidly. The best place to stay on top of innovations is the official R Markdown website: http://rmarkdown.rstudio.com.

Answer 4

“Happy Git with R”: a user-friendly introduction to Git and GitHub for R users, by Jenny Bryan. The book is freely available online: http://happygitwithr.com

The “Git and GitHub” chapter of R Packages, by Hadley Wickham. You can also read it for free online: http://r-pkgs.had.co.nz/git.html.

Answer 5

Permanently, by modifying the YAML header:

    title: "Viridis Demo"
    output: html_document

Transiently, by calling rmarkdown::render() by hand:

    rmarkdown::render("diamond-sizes.Rmd", output_format = "word_document")

RStudio’s knit button renders a file to the first format listed in its output field. You can render to additional formats by clicking the dropdown menu beside the knit button.

Answer 6

knitr::opts_chunk$set(echo = FALSE)
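As a small sketch of how this global option behaves, the snippet below sets the default and reads it back. It assumes the knitr package is installed; placing the call in a setup chunk near the top of the document is a common convention rather than a requirement.

```r
library(knitr)

# Typically placed in a setup chunk near the top of the .Rmd file.
opts_chunk$set(echo = FALSE)

# Reading the option back confirms the default that later chunks will inherit.
opts_chunk$get("echo")
```

An individual chunk can still override the default by setting echo = TRUE in its own chunk header.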

Answer 7

An html_document is focused on communicating with decision makers, while a notebook is focused on collaborating with other data scientists. These different purposes lead to using the HTML output in different ways. Both HTML outputs will contain the fully rendered output, but the notebook also contains the full source code. That means you can use the .nb.html generated by the notebook in two ways:

- You can view it in a web browser, and see the rendered output. Unlike html_document, this rendering always includes an embedded copy of the source code that generated it.
- You can edit it in RStudio. When you open an .nb.html file, RStudio will automatically recreate the .Rmd file that generated it. In the future, you will also be able to include supporting files (e.g. .csv data files), which will be automatically extracted when needed.

Answer 8

There’s one tip that’s useful if you’re already using Git and GitHub: use both html_notebook and github_document outputs:

    output:
      html_notebook: default
      github_document: default

html_notebook gives you a local preview, and a file that you can share via email. github_document creates a minimal md file that you can check into git. You can easily see how the results of your analysis (not just the code) change over time, and GitHub will render it for you nicely online.

Answer 9

You can also use R Markdown to produce presentations. You get less visual control than with a tool like Keynote or PowerPoint, but automatically inserting the results of your R code into a presentation can save a huge amount of time. Presentations work by dividing your content into slides, with a new slide beginning at each first (#) or second (##) level header. You can also insert a horizontal rule (***) to create a new slide without a header. R Markdown comes with three presentation formats built-in:

- ioslides_presentation: HTML presentation with ioslides
- slidy_presentation: HTML presentation with W3C Slidy
- beamer_presentation: PDF presentation with LaTeX Beamer

Answer 10

Dashboards are a useful way to communicate large amounts of information visually and quickly. Flexdashboard makes it particularly easy to create dashboards using R Markdown and a convention for how the headers affect the layout:

- Each level 1 header (#) begins a new page in the dashboard.
- Each level 2 header (##) begins a new column.
- Each level 3 header (###) begins a new row.

Learn more here: http://rmarkdown.rstudio.com/flexdashboard/.

Answer 11

Any HTML format (document, notebook, presentation, or dashboard) can contain interactive components.

Answer 12

Shiny is a package that allows you to create interactivity using R code, not JavaScript. To call Shiny code from an R Markdown document, add runtime: shiny to the header:

    title: "Shiny Web App"
    output: html_document
    runtime: shiny

Then you can use the “input” functions to add interactive components to the document:

    library(shiny)

    textInput("name", "What is your name?")
    numericInput("age", "How old are you?", NA, min = 0, max = 150)

Learn more at http://shiny.rstudio.com/.

Answer 13

The bookdown package, https://github.com/rstudio/bookdown, makes it easy to write books, like this one. To learn more, read Authoring Books with R Markdown, by Yihui Xie, which is, of course, written in bookdown. Visit http://www.bookdown.org to see other bookdown books written by the wider R community.

Answer 14

The rticles package, https://github.com/rstudio/rticles, compiles a selection of formats tailored for specific scientific journals.

Answer 15

R Markdown is also important because it so tightly integrates prose and code. This makes it a great analysis notebook because it lets you develop code and record your thoughts. An analysis notebook shares many of the same goals as a classic lab notebook in the physical sciences. It:

Answer 16

- Ensure each notebook has a descriptive title, an evocative filename, and a first paragraph that briefly describes the aims of the analysis.
- Use the YAML header date field to record the date you started working on the notebook: date: 2016-08-23. Use ISO8601 YYYY-MM-DD format so that there’s no ambiguity. Use it even if you don’t normally write dates that way!
- If you spend a lot of time on an analysis idea and it turns out to be a dead end, don’t delete it! Write up a brief note about why it failed and leave it in the notebook. That will help you avoid going down the same dead end when you come back to the analysis in the future.
- Generally, you’re better off doing data entry outside of R. But if you do need to record a small snippet of data, clearly lay it out using tibble::tribble().
- If you discover an error in a data file, never modify it directly, but instead write code to correct the value. Explain why you made the fix.
- Before you finish for the day, make sure you can knit the notebook (if you’re using caching, make sure to clear the caches). That will let you fix any problems while the code is still fresh in your mind.
- If you want your code to be reproducible in the long run (i.e. so you can come back to run it next month or next year), you’ll need to track the versions of the packages that your code uses. A rigorous approach is to use packrat, http://rstudio.github.io/packrat/, which stores packages in your project directory, or checkpoint, https://github.com/RevolutionAnalytics/checkpoint, which will reinstall packages available on a specified date. A quick and dirty hack is to include a chunk that runs sessionInfo(). That won’t let you easily recreate your packages as they are today, but at least you’ll know what they were.
- You are going to create many, many, many analysis notebooks over the course of your career. How are you going to organise them so you can find them again in the future? I recommend storing them in individual projects, and coming up with a good naming scheme.
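Two of the tips above, hand-entering a small snippet of data and recording package versions, can be sketched as follows. This assumes the tibble package is installed; the table name readings and its columns and values are made up purely for illustration.

```r
library(tibble)

# Hypothetical hand-entered measurements; column names and values are invented.
# tribble() lays the data out row by row, so it reads like a small table.
readings <- tribble(
  ~sample, ~temperature_c, ~ph,
  "a",     21.5,           7.1,
  "b",     19.8,           6.9
)

readings

# Record the R version and package versions in use when the notebook was run.
sessionInfo()
```

Keeping the sessionInfo() call in the last chunk of a notebook means the version record is refreshed every time the notebook is knitted.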

Answer 17

You add labels with the labs() function, e.g.:

    ggplot(mpg, aes(displ, hwy)) +
      geom_point(aes(color = class)) +
      geom_smooth(se = FALSE) +
      labs(title = "Fuel efficiency generally decreases with engine size")

Answer 18

The purpose of a plot title is to summarise the main finding. Avoid titles that just describe what the plot is, e.g. “A scatterplot of engine displacement vs. fuel economy”.

Answer 19

There are two other useful labels that you can use in ggplot2 2.2.0 and above (which should be available by the time you’re reading this book):

- subtitle: adds additional detail in a smaller font beneath the title.
- caption: adds text at the bottom right of the plot, often used to describe the source of the data.
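A minimal sketch of both labels in use, assuming ggplot2 2.2.0 or later is installed. The subtitle and caption strings are illustrative wording chosen for this example.

```r
library(ggplot2)

# Build the plot; subtitle and caption sit alongside title inside labs().
p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  geom_smooth(se = FALSE) +
  labs(
    title = "Fuel efficiency generally decreases with engine size",
    subtitle = "Two seaters (sports cars) are an exception because of their light weight",
    caption = "Data from fueleconomy.gov"
  )

print(p)
```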

Answer 20

It’s usually a good idea to replace short variable names with more detailed descriptions, and to include the units.

```
ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  geom_smooth(se = FALSE) +
  labs(
    x = "Engine displacement (L)",
    y = "Highway fuel economy (mpg)",
    colour = "Car type"
  )
```

Answer 21

It’s often useful to label individual observations or groups of observations. The first tool you have at your disposal is geom_text(). geom_text() is similar to geom_point(), but it has an additional aesthetic: label. This makes it possible to add textual labels to your plots. There are two possible sources of labels. First, you might have a tibble that provides labels:

    # Pick the most fuel-efficient car in each class.
    best_in_class <- mpg %>%
      group_by(class) %>%
      filter(row_number(desc(hwy)) == 1)

    ggplot(mpg, aes(displ, hwy)) +
      geom_point(aes(colour = class)) +
      geom_text(aes(label = model), data = best_in_class)

Answer 22

This is hard to read because the labels overlap with each other, and with the points. We can make things a little better by switching to geom_label(), which draws a rectangle behind the text. We also use the nudge_y parameter to move the labels slightly above the corresponding points:

    ggplot(mpg, aes(displ, hwy)) +
      geom_point(aes(colour = class)) +
      geom_label(aes(label = model), data = best_in_class, nudge_y = 2, alpha = 0.5)

To avoid overlapping annotations, use the ggrepel package by Kamil Slowikowski. This useful package will automatically adjust labels so that they don’t overlap:

    ggplot(mpg, aes(displ, hwy)) +
      geom_point(aes(colour = class)) +
      geom_point(size = 3, shape = 1, data = best_in_class) +
      ggrepel::geom_label_repel(aes(label = model), data = best_in_class)

Answer 23

28.4.1 Axis ticks and legend keys

Answer 24

The third way you can make your plot better for communication is to adjust the scales. Scales control the mapping from data values to things that you can perceive. Normally, ggplot2 automatically adds scales for you. For example, when you type:

    ggplot(mpg, aes(displ, hwy)) +
      geom_point(aes(colour = class))

ggplot2 automatically adds default scales behind the scenes:

```
ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  scale_x_continuous() +
  scale_y_continuous() +
  scale_colour_discrete()
```

Note the naming scheme for scales: scale_ followed by the name of the aesthetic, then _, then the name of the scale. The default scales are named according to the type of variable they align with: continuous, discrete, datetime, or date. There are lots of non-default scales which you’ll learn about below.

The default scales have been carefully chosen to do a good job for a wide range of inputs. Nevertheless, you might want to override the defaults for two reasons:

- You might want to tweak some of the parameters of the default scale. This allows you to do things like change the breaks on the axes, or the key labels on the legend.
- You might want to replace the scale altogether, and use a completely different algorithm. Often you can do better than the default because you know more about the data.
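The two reasons for overriding a default scale can be sketched in one plot. This is an illustration under assumptions: the break positions and the "Set1" palette are arbitrary choices for this example, and ggplot2 (with its RColorBrewer dependency) is assumed to be installed.

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  # Reason 1: keep the default continuous scale but tweak a parameter (the breaks).
  scale_y_continuous(breaks = seq(15, 40, by = 5)) +
  # Reason 2: replace the default discrete colour scale with a different one.
  scale_colour_brewer(palette = "Set1")

print(p)
```

Both calls follow the naming scheme described above: scale_, then the aesthetic (y or colour), then the name of the scale.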

Answer 25

The absolute best place to learn more is the ggplot2 book: ggplot2: Elegant graphics for data analysis. It goes into much more depth about the underlying theory, and has many more examples of how to combine the individual pieces to solve practical problems. Unfortunately, the book is not available online for free, although you can find the source code at https://github.com/hadley/ggplot2-book. Another great resource is the ggplot2 extensions guide http://www.ggplot2-exts.org/. This site lists many of the packages that extend ggplot2 with new geoms and scales. It’s a great place to start if you’re trying to do something that seems hard with ggplot2.

(49 cards)