00R Basic and Intro Flashcards
How do you explicitly specify that you are using a particular function from a package?
Use package::function(), e.g. ggplot2::ggplot().
How do you install a package?
install.packages("tidyverse")
How do you load a package?
library(tidyverse)
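Which one needs double quotes around the package name: loading or installing a package?
Installing: install.packages("tidyverse") takes the name as a string, while library(tidyverse) accepts the bare name.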
ggplot
ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy))
To set an aesthetic manually, set the aesthetic by name as an argument of your geom function; i.e. it goes outside of aes(). You’ll need to pick a level that makes sense for that aesthetic:
ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy), color = "blue")
What is the first argument of ggplot()?
data
Which geom creates a scatterplot?
geom_point()
What is an aesthetic?
An aesthetic is a visual property of the objects in your plot. Aesthetics include things like the size, the shape, or the color of your points.
Once you map an aesthetic, ggplot2 takes care of the rest. It selects a reasonable scale to use with the aesthetic, and it constructs a legend that explains the mapping between levels and values.
What does glimpse(mpg) do?
It displays the type of each column, along with the first few values.
What happens when you see + and the code does not execute?
Sometimes you'll run the code and nothing happens. Check the left-hand side of your console: if it's a +, it means that R doesn't think you've typed a complete expression and it's waiting for you to finish it. In this case, it's usually easy to start from scratch again by pressing ESCAPE to abort processing the current command.
“The simple graph has brought more information to the data analyst’s mind than any other device.”
— John Tukey
What is ggplot2?
R has several systems for making graphs, but ggplot2 is one of the most elegant and most versatile. ggplot2 implements the grammar of graphics, a coherent system for describing and building graphs. With ggplot2, you can do more, faster, by learning one system and applying it in many places.
What is the relationship between ggplot2 and the tidyverse?
ggplot2 is one of the core members of the tidyverse.
How do you view a whole dataset in an RStudio pane?
View(flights)
What are tibbles?
Tibbles are data frames, but slightly tweaked to work better in the tidyverse.
What is data frame?
A data frame is a rectangular collection of variables (in the columns) and observations (in the rows).
What is the difference between ggplot2::mpg and mpg?
ggplot2::mpg refers explicitly to the mpg data frame from the ggplot2 package. Plain mpg only works once ggplot2 has been loaded (for example with library(tidyverse), which loads ggplot2); both refer to the same data.
What is the graphing template?
ggplot(data = <DATA>) +
  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))
In aesthetics, can you write colour instead of color?
Yes. If you prefer British English, like Hadley, you can use colour instead of color.
What is scaling, and how does it relate to colour?
To map an aesthetic to a variable, associate the name of the aesthetic with the name of the variable inside aes(). ggplot2 will automatically assign a unique level of the aesthetic (here a unique color) to each unique value of the variable, a process known as scaling.
What does aes() do?
For each aesthetic, you use aes() to associate the name of the aesthetic with a variable to display. The aes() function gathers together each of the aesthetic mappings used by a layer and passes them to the layer’s mapping argument. The syntax highlights a useful insight about x and y: the x and y locations of a point are themselves aesthetics, visual properties that you can map to variables to display information about the data.
Can you set aesthetic properties manually?
Yes: ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")
To set an aesthetic manually, set the aesthetic by name as an argument of your geom function; i.e. it goes outside of aes(). You'll need to pick a level that makes sense for that aesthetic:
1. The name of a color as a character string.
2. The size of a point in mm.
3. The shape of a point as a number, as shown in Figure 3.1.
Which aesthetics can you use in aes()?
Colour, size, shape, alpha (transparency), etc.
What happens if you map the same variable to multiple aesthetics?
For example, suppose hwy is mapped to both location on the y-axis and color, and displ is mapped to both location on the x-axis and size. The code works and produces a plot, even if it is a bad one. Mapping a single variable to multiple aesthetics is redundant, so in most cases avoid it.
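What are different ways to use ggplot()?
ggplot(mtcars, aes(wt, mpg)) +
  geom_point(shape = 21, colour = "black", fill = "white", size = 5, stroke = 5)
ggplot(mpg, aes(x = displ, y = hwy, shape = drv)) +
  geom_point()
(Mapping a continuous variable such as cty to shape gives an error, because shape can only be mapped to discrete variables.)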
What happens if you map an aesthetic to something other than a variable name, like aes(colour = displ < 5)? Note, you’ll also need to specify x and y.
Aesthetics can also be mapped to expressions like displ < 5. The ggplot() function behaves as if a temporary variable was added to the data with values equal to the result of the expression. In this case, the result of displ < 5 is a logical variable which takes values of TRUE or FALSE.
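A minimal sketch, assuming ggplot2 is installed: colour is mapped to the expression displ < 5 rather than to a column name, so the points fall into two colour groups (TRUE and FALSE) and ggplot2 builds a two-entry legend for them.

```r
library(ggplot2)

# colour is mapped to an expression, not a column name;
# ggplot2 evaluates displ < 5 for every row of mpg
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, colour = displ < 5))

print(p)
```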
One common problem when creating ggplot2 graphics is putting the + in the wrong place. Where is it supposed to go?
It has to come at the end of the line, not the start. In other words, make sure you haven't accidentally written code like this:
ggplot(data = mpg)
+ geom_point(mapping = aes(x = displ, y = hwy))
Facets
One way to add additional variables is with aesthetics. Another way, particularly useful for categorical variables, is to split your plot into facets, subplots that each display one subset of the data.
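A short faceting sketch, assuming ggplot2 is installed: facet_wrap() takes a formula naming one categorical variable (class from mpg here) and draws one subplot per level, while facet_grid() splits on the combination of two variables, rows ~ columns.

```r
library(ggplot2)

# One scatterplot panel per vehicle class, laid out in 2 rows
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ class, nrow = 2)

# One panel per combination of drv (rows) and cyl (columns)
p2 <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(drv ~ cyl)

print(p)
print(p2)
```

facet_wrap() suits a single variable with many levels because it wraps the panels onto several rows; facet_grid() keeps a strict row/column layout.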
What is geom?
A geom is the geometrical object that a plot uses to represent data. People often describe plots by the type of geom that the plot uses. For example, bar charts use bar geoms, line charts use line geoms, boxplots use boxplot geoms, and so on. Scatterplots break the trend; they use the point geom.
Does every geom use the same aesthetics and mappings?
Every geom function in ggplot2 takes a mapping argument. However, not every aesthetic works with every geom: for example, you can set the shape of a point, but not the shape of a line.
How many geoms does ggplot2 provide?
ggplot2 provides over 30 geoms, and extension packages provide even more (see https://www.ggplot2-exts.org for a sampling). The best way to get a comprehensive overview is the ggplot2 cheatsheet, which you can find at http://rstudio.com/cheatsheets. To learn more about any single geom, use help: ?geom_smooth.
How do you display multiple geoms?
To display multiple geoms in the same plot, add multiple geom functions to ggplot():
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
geom_smooth(mapping = aes(x = displ, y = hwy))
A better version, with the shared mappings written once in ggplot() so every layer uses them:
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point() +
geom_smooth()
If you place mappings in a geom function, ggplot2 will treat them as local mappings for the layer. It will use these mappings to extend or overwrite the global mappings for that layer only. This makes it possible to display different aesthetics in different layers.
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point(mapping = aes(color = class)) +
geom_smooth()
Different kinds of geoms
line chart: geom_line()
boxplot: geom_boxplot()
histogram: geom_histogram()
area chart: geom_area()
What does show.legend = FALSE do? What happens if you remove it? Why do you think I used it earlier in the chapter?
show.legend = FALSE is a layer argument (not a theme option) that hides that layer's legend. If you remove it, the legend is shown again.
Many graphs, like scatterplots, plot the raw values of your dataset. Other graphs, like bar charts, calculate new values to plot:
bar charts, histograms, and frequency polygons bin your data and then plot bin counts, the number of points that fall in each bin.
smoothers fit a model to your data and then plot predictions from the model.
boxplots compute a robust summary of the distribution and then display a specially formatted box.
What is the algorithm used to calculate new values for a graph called?
A stat, short for statistical transformation.
What is the relationship between geoms and stats?
You can generally use geoms and stats interchangeably. For example, you can recreate ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut)) using stat_count() instead of geom_bar():
ggplot(data = diamonds) +
stat_count(mapping = aes(x = cut))
What does geom_col() do? How is it different from geom_bar()?
The geom_col() function has different default stat than geom_bar(). The default stat of geom_col() is stat_identity(), which leaves the data as is. The geom_col() function expects that the data contains x values and y values which represent the bar height.
The default stat of geom_bar() is stat_count(). The geom_bar() function only expects an x variable. The stat, stat_count(), preprocesses the input data by counting the number of observations for each value of x. The y aesthetic uses the values of these counts.
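A small sketch contrasting the two, assuming ggplot2 is installed. geom_bar() counts the rows itself, while geom_col() needs the bar heights already in the data. The data frame cut_counts is a hypothetical helper built with base R's table() just for this illustration.

```r
library(ggplot2)

# geom_bar(): only x is needed; the count stat computes the heights
p_bar <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut))

# geom_col(): the heights must already be a column (identity stat)
cut_counts <- as.data.frame(table(cut = diamonds$cut))  # columns: cut, Freq
p_col <- ggplot(data = cut_counts) +
  geom_col(mapping = aes(x = cut, y = Freq))

print(p_bar)
print(p_col)
```

Both plots show the same bars; the difference is only where the counting happens.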
You can colour a bar chart using either the colour aesthetic or, more usefully, fill. How?
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, colour = cut))
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = cut))
If you map the fill aesthetic to another variable, like clarity, the bars are automatically stacked. Each colored rectangle represents a combination of cut and clarity.
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = clarity))
The stacking is performed automatically by the position adjustment specified by the position argument. If you don’t want a stacked bar chart, you can use one of three other options: “identity”, “dodge” or “fill”.
position = "identity"
position = “identity” will place each object exactly where it falls in the context of the graph. This is not very useful for bars, because it overlaps them. To see that overlapping we either need to make the bars slightly transparent by setting alpha to a small value, or completely transparent by setting fill = NA.
ggplot(data = diamonds, mapping = aes(x = cut, fill = clarity)) +
geom_bar(alpha = 1/5, position = "identity")
ggplot(data = diamonds, mapping = aes(x = cut, colour = clarity)) +
geom_bar(fill = NA, position = "identity")
position = “fill” works like stacking, but makes each set of stacked bars the same height. This makes it easier to compare proportions across groups.
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = clarity), position = "fill")
position = “dodge” places overlapping objects directly beside one another. This makes it easier to compare individual values.
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = clarity), position = "dodge")
Most geoms and stats come in pairs that are almost always used in concert. Read through the documentation and make a list of all the pairs. What do they have in common?
geom_bar() / stat_count()
geom_bin2d() / stat_bin_2d()
geom_boxplot() / stat_boxplot()
geom_contour() / stat_contour()
geom_count() / stat_sum()
geom_density() / stat_density()
geom_density_2d() / stat_density_2d()
geom_hex() / stat_bin_hex()
geom_freqpoly() / stat_bin()
geom_histogram() / stat_bin()
When there is overplotting because there are multiple observations for each combination of cty and hwy values, how can you improve the plot?
I would improve the plot by using a jitter position adjustment (position = "jitter", or geom_jitter()) to decrease overplotting.
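A minimal jitter sketch, assuming ggplot2 is installed: position = "jitter" adds a little random noise to each point so that cars sharing the same (cty, hwy) pair no longer sit exactly on top of each other. The width and height values in the second plot are illustrative choices, not recommendations.

```r
library(ggplot2)

# Jitter through the position argument of geom_point()
p <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_point(position = "jitter")

# geom_jitter() is a shortcut for geom_point(position = "jitter");
# width and height control how far points may be moved
p2 <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_jitter(width = 0.3, height = 0.3)

print(p)
print(p2)
```

Jitter makes the plot less accurate at small scales but more revealing at large scales, because you can now see how many points share each location.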
What does coord_flip() do?
coord_flip() switches the x and y axes. This is useful (for example), if you want horizontal boxplots. It’s also useful for long labels: it’s hard to get them to fit without overlapping on the x-axis.
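A sketch of the horizontal-boxplot case, assuming ggplot2 is installed: the same boxplot of hwy by class is drawn first with vertical boxes and then with coord_flip() added, which moves the class labels to the y-axis where they have room to fit.

```r
library(ggplot2)

# Vertical boxplots of highway mileage for each vehicle class
p <- ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
  geom_boxplot()

# The same plot with x and y switched: horizontal boxplots
p_flipped <- p + coord_flip()

print(p)
print(p_flipped)
```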
The layered grammar of graphics
ggplot(data = <DATA>) +
  <GEOM_FUNCTION>(
     mapping = aes(<MAPPINGS>),
     stat = <STAT>,
     position = <POSITION>
  ) +
  <COORDINATE_FUNCTION> +
  <FACET_FUNCTION>
In practice, you rarely need to supply all seven parameters to make a graph because ggplot2 will provide useful defaults for everything except the data, the mappings, and the geom function.
The seven parameters in the template compose the grammar of graphics, a formal system for building plots. The grammar of graphics is based on the insight that you can uniquely describe any plot as a combination of a dataset, a geom, a set of mappings, a stat, a position adjustment, a coordinate system, and a faceting scheme.
What naming style is recommended?
I recommend snake_case, where you separate lowercase words with _, e.g. i_use_snake_case.
What does Cmd + Up arrow do?
It lists the previous commands that start with the letters you have typed; you can move through them and select the one you want to re-execute.
How do you select a function easily using Tab?
Type a short part of its name, e.g. seq, and press Tab. A popup shows you possible completions.
When you see +, what does R mean?
The + tells you that R is waiting for more input; it doesn’t think you’re done yet. Usually that means you’ve forgotten either a “ or a ). Either add the missing pair, or press ESCAPE to abort the expression and try again.
Where can you see all objects in RStudio?
In the Environment pane.
Error messages of the form "object '…' not found" mean exactly what they say: R cannot find an object with that name. How do you solve it?
The most common scenarios in which I encounter this error message are
I forgot to create the object, or an error prevented the object from being created.
I made a typo in the object's name, either when using it or when I created it, or I forgot what I had originally named it. If you find yourself often writing the wrong name for an object, it is a good indication that the original name was not a good one.
I forgot to load the package that contains the object using library().
What does the filter() function do?
It selects observations (rows) that meet certain conditions, e.g. filter(diamonds, carat > 3).
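A minimal sketch, assuming dplyr and ggplot2 (for the diamonds data) are installed: filter() keeps only the rows for which every condition is TRUE, and several conditions separated by commas are combined with "and".

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

# Keep only the rows (observations) where carat is greater than 3
big <- filter(diamonds, carat > 3)

# Two conditions: carat > 3 AND cut is "Ideal"
big_ideal <- filter(diamonds, carat > 3, cut == "Ideal")

glimpse(big)
```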
How do you show the keyboard shortcuts, and with which keys?
Alt + Shift + K.
This gives a menu with keyboard shortcuts. This can be found in the menu under Tools -> Keyboard Shortcuts Help.
How do you clear the workspace in R?
rm(list = ls())
or click the broom icon in the Environment pane to remove all objects.
What does it mean to layer plots?
To layer plots means to combine them in one graph. For example, layering geom_point() and geom_line() combines points and lines in the same plot.
Does the + in ggplot code come at the end of a line or at the start of a new line?
At the end of a line.
What is the shortcut for the pipe operator?
Shift + Cmd + M (Ctrl + Shift + M on Windows)
R Programming from Coursera starts here
Coursera R Programming
Who wrote R?
R was created by Ross Ihaka and Robert Gentleman. It is a dialect of S, which was developed by John Chambers and others at Bell Labs starting in 1976; S was revised and rewritten in C in 1988.
In which country was R created?
1991: created in New Zealand
1993: first announcement to the public
1997: the R Core Group formed
What are the drawbacks of R?
- Based on a 40-year-old language
- Functionality is based on consumer demand and user contributions
- Objects must generally be stored in physical memory (though there have been advancements to deal with this)
- Not ideal for all possible situations (but this is true of all software packages)
• The R system is divided into 2 conceptual parts. What are they?
- The base R system that you download from CRAN
- The "base" R system contains, among other things, the base package, which is required to run R and contains the most fundamental functions.
- The other packages contained in the "base" system include utils, stats, datasets, graphics, grDevices, grid, methods, tools, parallel, compiler, splines, tcltk, stats4.
- Everything else
- People often make packages available on their personal websites and Git; there is no reliable way to keep track of how many packages are available in this fashion.
How do you ask questions?
• Asking questions
§ Provide reproducible output
§ What output do you expect?
§ What do you see instead?
§ Which version are you using (e.g. of R and its packages)?
§ Which OS are you using?
§ Any additional information
• Subject headers
§ Smarter: R 3.0.2 lm() function on MAC OS 10.1 …seg fault on large data frame
• Do
§ Describe the goal, not the step
§ Be explicit
§ Follow up with the solution if you found one
What are attributes in R?
• Attributes: R objects can have attributes
□ names, dimensions, class, length, other user-defined attributes
□ attributes() is the function used to access the attributes of an object
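A small base R sketch of attributes() on a few objects: a plain vector starts with no attributes, setting dim adds one, and names are stored as an attribute too. length is reported by its own function rather than listed by attributes().

```r
# A plain integer vector has no attributes yet
x <- 1:6
attributes(x)   # NULL

# Setting a dim attribute turns the vector into a 2 x 3 matrix
dim(x) <- c(2, 3)
attributes(x)   # $dim: 2 3

# Names are another attribute
y <- c(a = 1, b = 2)
attributes(y)   # $names: "a" "b"

# length() works on every object but is not shown by attributes()
length(x)       # 6
```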
• What is explicit coercion?
as.numeric(x) returns x converted to numeric, as.character(x) returns x converted to character, etc.
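A short base R sketch of explicit coercion with the as.* functions. Note that they return a converted copy and leave x unchanged, and that a conversion that makes no sense gives NA with a warning.

```r
x <- 0:6
class(x)          # "integer"

as.numeric(x)     # 0 1 2 3 4 5 6 (class "numeric")
as.logical(x)     # FALSE TRUE TRUE TRUE TRUE TRUE TRUE
as.character(x)   # "0" "1" "2" "3" "4" "5" "6"

# Nonsensical coercion produces NA, with a warning
y <- c("a", "b", "c")
as.numeric(y)     # NA NA NA (warning: NAs introduced by coercion)
```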
How do you create a matrix?
• Matrix
§ dim attribute (nrow, ncol)
§ M
How do you create a matrix with cbind() and rbind()?
§ Matrices can be created by column-binding and row-binding with cbind() and rbind(): cbind() combines columns, rbind() combines rows.
§ x
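A base R sketch of the three ways to make a matrix: matrix() with nrow/ncol (filled column by column), setting the dim attribute of a vector directly, and binding vectors with cbind() or rbind().

```r
# matrix() fills column by column; nrow/ncol set the dim attribute
m <- matrix(1:6, nrow = 2, ncol = 3)
dim(m)        # 2 3
m[1, 2]       # 3, because the second column holds 3 and 4

# A vector becomes a matrix when its dim attribute is set
v <- 1:10
dim(v) <- c(2, 5)

# cbind() binds vectors as columns, rbind() binds them as rows
x <- 1:3
y <- 10:12
cbind(x, y)   # 3 x 2 matrix, columns named x and y
rbind(x, y)   # 2 x 3 matrix, rows named x and y
```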
How are missing values represented?
• Missing Values
§ NA, or NaN for undefined mathematical operations
§ is.na() is used to test objects if they are NA
§ is.nan() is used to test for NaN
§ NA values have a class too, so there are integer NA, character NA, etc.
A NaN value is also NA but the converse is not true
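A base R sketch of the rules above: 0 / 0 produces NaN, is.na() is TRUE for both NA and NaN, is.nan() is TRUE only for NaN, and typed missing values such as NA_integer_ carry their own class.

```r
x <- c(1, 2, NA, 10, 3)
is.na(x)    # FALSE FALSE  TRUE FALSE FALSE
is.nan(x)   # FALSE FALSE FALSE FALSE FALSE

# 0 / 0 is an undefined mathematical operation, so the result is NaN
0 / 0       # NaN

y <- c(1, 2, NaN, NA, 4)
is.na(y)    # FALSE FALSE  TRUE  TRUE FALSE (NaN also counts as NA)
is.nan(y)   # FALSE FALSE  TRUE FALSE FALSE (NA is not NaN)

# NA values have a class too
class(NA)             # "logical"
class(NA_integer_)    # "integer"
class(NA_character_)  # "character"
```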
What are data frames?
• Data Frames
§ Store tabular data
§ They are a special type of list where every element has the same length
§ Data frames can store objects of different classes
§ They have a special attribute called row.names
§ Data frames are usually created by calling read.table() or read.csv()
§ They can be converted to a matrix by calling data.matrix()
§ They can be created using data.frame()
□ E.g a
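A base R sketch of the data frame properties listed above, using a small example data frame (its columns foo and bar are only for illustration): columns of different classes, equal column lengths, row.names, the underlying list, and conversion with data.matrix().

```r
# Two columns of different classes, both of length 4
df <- data.frame(foo = 1:4, bar = c(TRUE, TRUE, FALSE, FALSE))
nrow(df)          # 4
ncol(df)          # 2
row.names(df)     # "1" "2" "3" "4"

# A data frame is a special kind of list
is.list(df)       # TRUE

# data.matrix() turns every column into numbers and returns a matrix
# (logical columns become 0/1 integers)
m <- data.matrix(df)
is.matrix(m)      # TRUE
```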
How do names work for objects in R?
§ R objects can also have names, which is very useful for writing readable code and self-describing objects
§ E.g x
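A base R sketch of names on a vector, a list and a matrix; the city names and labels are arbitrary examples.

```r
x <- 1:3
names(x)                    # NULL: no names yet
names(x) <- c("New York", "Seattle", "Los Angeles")
x["Seattle"]                # access an element by name

# Lists can have names too
l <- list(a = 1, b = 2)
names(l)                    # "a" "b"

# Matrices have dimnames: row names and column names
m <- matrix(1:4, nrow = 2, ncol = 2)
dimnames(m) <- list(c("a", "b"), c("c", "d"))
m["b", "c"]                 # 2
```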
How do you read tabular data in R?
read.table() or read.csv() for reading tabular data from text files
§ readLines() for reading lines of a text file
§ source() for reading in R code files (inverse of dump())
§ dget() for reading in R code files (inverse of dput())
§ load() for reading in saved workspaces
How do you read text with read.table() or read.csv()?
• read.table() and read.csv() are the same, except that read.csv() uses a comma as the default separator (and assumes a header line by default)
§ file, the name of a file or connection
§ header, logical indicating if the file has a header line
§ sep, a string indicating how the columns are separated
§ colClasses, a character vector indicating the class of each column in the dataset
§ nrows, the number of rows in the dataset
§ comment.char, a character string indicating the comment character
§ skip, the number of lines to skip from the beginning
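A minimal sketch that writes a tiny CSV to a temporary file (tempfile(), so nothing is assumed about your directories) and reads it back, spelling out several of the arguments listed above. The file contents are made up for illustration.

```r
path <- tempfile(fileext = ".csv")
writeLines(c("# scores exported for illustration",
             "id,name,score",
             "1,ann,90.5",
             "2,bob,85"), path)

dat <- read.table(path,
                  header = TRUE,        # first non-comment line holds column names
                  sep = ",",            # columns are comma-separated
                  comment.char = "#",   # lines starting with # are ignored
                  colClasses = c("integer", "character", "numeric"))

# read.csv() already uses sep = "," and header = TRUE, but its
# comment.char is "", so skip = 1 jumps over the comment line instead
dat_csv <- read.csv(path, skip = 1)

# nrows limits how many data rows are read
first_row <- read.table(path, header = TRUE, sep = ",", comment.char = "#",
                        nrows = 1)
str(dat)
```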
How do you write data in R?
• Writing Data
§ write.table()
§ writeLines()
§ dump()
§ dput()
§ save()
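A base R round-trip sketch pairing each writing function with its reading counterpart; all files go to tempfile() paths, and the data frame df is a made-up example.

```r
df <- data.frame(a = 1:3, b = c("x", "y", "z"))

# write.csv() (a wrapper around write.table()) pairs with read.csv()
csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path, row.names = FALSE)
back_csv <- read.csv(csv_path)

# writeLines() writes a character vector, one element per line
txt_path <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line"), txt_path)
back_lines <- readLines(txt_path)

# dput() writes R code that recreates one object; dget() reads it back
dput_path <- tempfile(fileext = ".R")
dput(df, file = dput_path)
back_dput <- dget(dput_path)

# dump() writes R code for objects given by name; source() re-runs it
dump_path <- tempfile(fileext = ".R")
dump("df", file = dump_path)
rm(df)
source(dump_path)   # df exists again

# save() stores objects in a binary file; load() restores them by name
rda_path <- tempfile(fileext = ".RData")
save(df, file = rda_path)
rm(df)
load(rda_path)      # recreates df in the current environment
```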
How do you make reading large datasets with read.table() easier?
• With much larger datasets, doing the following things will make your life easier and will prevent R from choking.
§ Read the help page for read.table, which contains hints for large datasets
§ Make a rough calculation of the memory required to store your data and check whether your RAM can hold it
§ Set comment.char = "" if there are no commented lines in the file
§ Use the colClasses argument. Specifying this option instead of using the default can make read.table() run MUCH faster, often twice as fast. In order to use this option, you have to know the class of each column in your data frame. If all of the columns are "numeric", for example, then you can just set colClasses = "numeric". A quick and dirty way to figure out the classes of each column is the following:
§
initial
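The quick-and-dirty trick, sketched under assumptions: a small temporary file stands in for your real large file (big_file is a hypothetical name), the first 100 rows are read to learn the column classes with sapply(initial, class), and the full file is then re-read with colClasses.

```r
# Stand-in for a large data file (hypothetical; use your own path)
big_file <- tempfile(fileext = ".txt")
write.table(data.frame(x = rnorm(1000), y = sample(letters, 1000, TRUE)),
            big_file, row.names = FALSE)

# 1. Read only the first 100 rows
initial <- read.table(big_file, header = TRUE, nrows = 100)

# 2. Record the class of each column
classes <- sapply(initial, class)

# 3. Re-read the whole file, telling read.table the classes up front
tab_all <- read.table(big_file, header = TRUE, colClasses = classes)
```

The guess is only as good as the first 100 rows: if a column looks numeric there but holds text further down, the full read will fail, so check the classes before relying on them.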
How do you calculate the memory requirement when using R with large data?
• Calculating Memory Requirements
I have a data frame with 1,500,000 rows and 120 columns, all of which are numeric data.
Roughly, how much memory is required to store this data frame? 1,500,000 × 120 × 8 bytes/numeric = 1,440,000,000 bytes = 1,440,000,000 / 2^20 bytes/MB ≈ 1,373.29 MB ≈ 1.34 GB
• Rule of Thumb
§ You need about twice as much memory as the data needs
For example, in this case 1.34 GB × 2 ≈ 2.68 GB
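The same back-of-the-envelope calculation in base R, plus object.size() as a way to check the actual size of an object once it exists.

```r
rows  <- 1500000
cols  <- 120
bytes <- rows * cols * 8       # 8 bytes per numeric (double) value

mb <- bytes / 2^20             # 2^20 bytes per MB
gb <- mb / 2^10                # 2^10 MB per GB
round(mb, 2)                   # 1373.29
round(gb, 2)                   # 1.34

# Rule of thumb: budget about twice that much RAM
round(2 * gb, 2)               # 2.68

# object.size() reports the real size: one million doubles take
# about 8 MB (plus a small header)
format(object.size(numeric(1e6)), units = "MB")
```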
How do you subset a list?
• Subsetting a list
• x$bar gives the value of the element named bar; here it returns 0.6
□ Or x[["bar"]]
How do you extract multiple elements from a list?
• Extract multiple elements from a list
• x[c(1, 3)] returns a list with the foo and baz elements (for a list x with elements foo, bar and baz)
You can't use the $ sign or double brackets to extract multiple elements from a list; only single brackets work.
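A base R sketch of the list subsetting operators. The list x, with elements foo, bar and baz, is an assumed example, since the original code for these cards was lost.

```r
x <- list(foo = 1:4, bar = 0.6, baz = "hello")

x[[2]]          # 0.6: [[ returns the element itself
x[["bar"]]      # 0.6: the same, by name
x$bar           # 0.6: $ is shorthand for [[ with a literal name
x["bar"]        # a list of length 1 that contains bar

# Multiple elements: single brackets, and the result is a list
x[c(1, 3)]      # list with foo and baz

# [[ can use a name stored in a variable; $ cannot
name <- "foo"
x[[name]]       # 1 2 3 4
```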
How do you remove NA values from a vector?
Create a logical vector with a <- is.na(x), then keep the elements where it is FALSE with x[!a]:
[1] 1 2 4 5
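A self-contained version, assuming an input vector x of c(1, 2, NA, 4, NA, 5) (chosen to match the output above). complete.cases() extends the same idea to several vectors at once.

```r
x <- c(1, 2, NA, 4, NA, 5)
a <- is.na(x)     # TRUE where x is missing
x[!a]             # 1 2 4 5: keep the elements that are not NA

# With several vectors (or data frame columns), complete.cases()
# marks the positions where none of them is missing
y <- c("a", "b", NA, "d", NA, "f")
good <- complete.cases(x, y)
x[good]           # 1 2 4 5
y[good]           # "a" "b" "d" "f"
```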
The capabilities of R reflect the needs of the community. What can you say about this?
The capabilities of the R system generally reflect the interests of the R user community. As the community has ballooned in size over the past 10 years, the capabilities have similarly increased.
Who can change the primary source code of R?
The R Core Group
Who wrote about the design of the original R graphics system?
Paul Murrell, in the book R Graphics
Springer has the Use R! series for R books in specific areas. I think we will write an R book, or have our PhDs turned into SpringerBriefs.
x = 5
x
[1] 5
What does [1] mean?
It tells us that x is a vector and 5 is its first element.
Everything in R is a(n)...?
Object.
What are the five basic atomic classes of objects in R?
character, numeric (real numbers), integer, complex, logical (TRUE/FALSE)
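A quick base R check of the five atomic classes with class(); the L suffix is what makes a literal an integer.

```r
class("a")      # "character"
class(3.14)     # "numeric"
class(5L)       # "integer": the L suffix makes an integer
class(1 + 2i)   # "complex"
class(TRUE)     # "logical"

# Plain numbers are numeric (double) unless you ask for an integer
class(5)        # "numeric"
```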
What is the shorthand for TRUE and FALSE?
T and F.
Do vectors and lists print out the same way?
No. A list can contain elements of different classes, so it prints each element separately, on its own lines.
What does the printed output of a list show us: [ or [[ for indexing?
using [[
> a
[[1]]
[1] 1
Elements of a list print with double brackets and elements of other vectors with single brackets. True or false?
True
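A base R sketch comparing how a vector and a list print: the vector fits on one line after [1], while each list element gets its own [[n]] header. The second part shows that [[ returns the element itself while [ returns a smaller list.

```r
v <- c(1, 2, 3)
print(v)          # one line, prefixed by [1]

l <- list(1, "a", TRUE)
print(l)
# [[1]]
# [1] 1
#
# [[2]]
# [1] "a"
#
# [[3]]
# [1] TRUE

# [[ pulls out the element itself; [ returns a smaller list
l[[1]]            # 1
class(l[1])       # "list"
```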