V6 Flashcards
ggplot2
pros of ggplot2 compared to other plotting packages in R
- one of the most elegant and most versatile plotting packages
- layered, customisable plots
- implements the grammar of graphics, a coherent system for describing and building graphs
- you work faster by learning one system and applying it in many places
data format needed for ggplot2
- data must be in a data.frame
- long format: one row per observation, so typically more rows than columns
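A minimal sketch of the format requirement, assuming the card refers to long format. The table `wide` and its values are made up for illustration.

```r
library(ggplot2)

# Hypothetical wide table: one column per meal time (values are made up)
wide <- data.frame(day = c("Mon", "Tue"), Lunch = c(10, 12), Dinner = c(20, 25))

# Reshape to long format with base R: one row per (day, time) observation
long <- data.frame(
  day        = rep(wide$day, times = 2),
  time       = rep(c("Lunch", "Dinner"), each = nrow(wide)),
  total_bill = c(wide$Lunch, wide$Dinner)
)

# Each aesthetic now maps to exactly one column
p <- ggplot(long, aes(x = time, y = total_bill, fill = day)) +
  geom_bar(stat = "identity", position = "dodge")
```

The wide table has 2 rows and 3 columns, the long one 4 rows and 3 columns, which is where "more rows than columns" comes from.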
important terms:
- data
- aesthetics
- geometries
- facets
- statistics
- coordinates
- themes
- data: dataset being plotted
- aesthetics: scales onto which we map our data
- geometries: visual elements used for our data
- facets: plotting small multiples
- statistics: representation of our data to aid understanding
- coordinates: space on which the data will be plotted
- themes: all non-data ink.
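One way to see the seven terms together is a single plot in which each line supplies one of them. This is only a sketch using the `mpg` data set that ships with ggplot2; the chosen variables and limits are arbitrary.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy)) +                # data + aesthetics
  geom_point(aes(colour = class)) +                        # geometries
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE) + # statistics
  facet_wrap(~ drv) +                                      # facets
  coord_cartesian(ylim = c(10, 45)) +                      # coordinates
  theme_minimal()                                          # themes
```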
plot basics
- all ggplot2 plots begin with a call to ggplot(), supplying default data and aesthetic mappings, specified by aes()
- you then add layers, scales, coords and facets with +
- to save a plot to disk, use ggsave()
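The three steps above can be sketched as follows. The file name and the axis label are assumptions for illustration; a PDF in the temporary directory is used so that nothing is written into the working directory.

```r
library(ggplot2)

# 1. ggplot() supplies the default data and aesthetic mappings
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  # 2. layers and scales are added with +
  geom_point() +
  scale_x_continuous(name = "Weight (1000 lbs)")

# 3. ggsave() writes the plot to disk; the device is picked from the extension
out <- file.path(tempdir(), "scatter.pdf")
ggsave(out, plot = p, width = 6, height = 4)
```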
layer: geoms
- a layer combines data, aesthetic mapping, a geom (geometric object), a stat (statistical transformation), and a position adjustment
- typically you will create layers using a geom_ function, overriding the default position and stat if needed
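To make the five parts of a layer visible, the sketch below builds the same scatter layer twice: once with the geom_ shortcut and once with the lower-level layer() function, spelling out stat and position. The third plot overrides the default position.

```r
library(ggplot2)

# Shortcut: geom_point() fills in stat = "identity" and position = "identity"
p_short <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# Same layer written out in full
p_long <- ggplot(mtcars, aes(wt, mpg)) +
  layer(geom = "point", stat = "identity", position = "identity")

# Overriding the default position adjustment of the geom
p_jit <- ggplot(mtcars, aes(wt, mpg)) + geom_point(position = "jitter")
```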
layer: stats
stat_ functions
draw attention to the statistical transformation rather than the visual appearance
e.g. stat_identity() -> leave data as is
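A small sketch of a layer specified through its stat rather than its geom: stat_summary() computes the mean of `mpg` per cylinder group and draws it as a point. The `fun` argument assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

# The stat does the work (mean per group); the geom only says how to draw it
p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  stat_summary(fun = mean, geom = "point", size = 3)

# The computed values can be inspected without drawing the plot
summarised <- layer_data(p)
```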
layer: position adjustment:
resolves overlapping geoms; override the default with the position argument of the geom_ or stat_ function
e.g. position_jitter() -> jitter points to avoid overplotting
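For example, `cty` and `hwy` in the `mpg` data set are whole numbers, so many points sit exactly on top of each other. A hedged sketch of jittering them; the jitter amounts are arbitrary choices.

```r
library(ggplot2)

# width/height set the maximum random shift; seed makes the jitter reproducible
p <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_point(position = position_jitter(width = 0.3, height = 0.3, seed = 1))
```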
layer: annotations
special type of layer: annotations don’t inherit global settings from the plot. They are used to add fixed reference data to a plot
e.g. annotate() -> create an annotation layer
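A minimal sketch of two annotation layers; the label text and coordinates are made up for illustration.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  # fixed text at a given position, independent of the plot data
  annotate("text", x = 4.5, y = 30, label = "heavy cars") +
  # fixed shaded region marking a reference range
  annotate("rect", xmin = 4, xmax = 5.5, ymin = 10, ymax = 20, alpha = 0.2)
```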
layer: scales
control the details of how data values are translated to visual properties
e.g. scale_colour_continuous() -> apply a continuous colour scale
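A short sketch: `hp` is numeric, so mapping it to colour calls for a continuous colour scale. Only the legend title is changed here; the title text is an arbitrary choice.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg, colour = hp)) +
  geom_point() +
  scale_colour_continuous(name = "Horsepower")
```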
bar graphs - what can bar heights represent?
- the heights of bars can represent two different things
- the count of cases for each group -> stat_count()
- the value of a column in the data set -> stat_identity() leaves the y values unchanged
-> default for geom_bar() is stat_count (stat_bin in ggplot2 before 2.0.0)
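The two meanings of bar height can be compared side by side. In this sketch the first plot lets geom_bar() count the rows per group (in current ggplot2 releases its default is stat = "count"), while the second plots counts that were computed beforehand. The helper table `counts` is introduced only for illustration.

```r
library(ggplot2)

# Height = number of cases per group, counted by the stat
p_count <- ggplot(mpg, aes(x = drv)) + geom_bar()

# Height = value of a column; counts are precomputed, stat leaves y unchanged
counts <- as.data.frame(table(drv = mpg$drv))
p_value <- ggplot(counts, aes(x = drv, y = Freq)) + geom_bar(stat = "identity")
```

Both plots show the same bars; only the place where the counting happens differs.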
make a basic bar graph with ggplot2 (bar height = a value, not the number of cases in each group)
add or remove colour and legend + black outline
add x, y and main labels
library(ggplot2)
# simple plot
ggplot(data = dat, aes(x = time, y = total_bill)) + geom_bar(stat = "identity")
# add colour and legend + black outline
ggplot(data = dat, aes(x = time, y = total_bill, fill = time)) + geom_bar(colour = "black", stat = "identity")
# remove legend (if redundant)
+ guides(fill = "none")
add x, y and main labels
+ xlab("")
+ ylab("")
+ ggtitle("")
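Put together, the snippets on this card could look like the sketch below. The contents of `dat` and all label texts are made-up illustrative values, not taken from the notes.

```r
library(ggplot2)

# Hypothetical data: one precomputed value per group (values are made up)
dat <- data.frame(
  time       = factor(c("Lunch", "Dinner"), levels = c("Lunch", "Dinner")),
  total_bill = c(15, 17)
)

p <- ggplot(data = dat, aes(x = time, y = total_bill, fill = time)) +
  geom_bar(colour = "black", stat = "identity") +  # black outline, height = value
  guides(fill = "none") +                          # legend is redundant here
  xlab("Time of day") +
  ylab("Total bill") +
  ggtitle("Average bill for 2 people")
```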
how to code the count of cases in a bar graph
+ geom_bar(stat = "count") -> map only x in aes(), no y
how to code a bar graph with multiple variables
time: x-axis
sex: colour fill
total bill: y-axis
how to change colour
ggplot(data = tips, aes(x = time, y = total_bill, fill = sex)) + geom_bar(stat = "summary", position = position_dodge())
-> draws 2 bars right next to each other (one per sex) for each time; with no summary function given, stat = "summary" plots the mean (mean_se())
# change colour
+ scale_fill_manual(values = c("", ""))
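A runnable sketch of the dodged bar graph. `tips_demo` is a small made-up stand-in for the `tips` data, the two hex colours are arbitrary choices, and `fun = mean` (ggplot2 3.3.0 or later) makes the summary function explicit.

```r
library(ggplot2)

# Hypothetical stand-in for the tips data (values are made up)
tips_demo <- data.frame(
  time       = factor(rep(c("Lunch", "Dinner"), each = 4), levels = c("Lunch", "Dinner")),
  sex        = rep(c("Female", "Male"), times = 4),
  total_bill = c(12, 14, 16, 18, 20, 24, 22, 26)
)

p <- ggplot(data = tips_demo, aes(x = time, y = total_bill, fill = sex)) +
  # mean bill per time and sex, bars placed side by side
  geom_bar(stat = "summary", fun = mean, position = position_dodge()) +
  # one fill colour per level of sex
  scale_fill_manual(values = c("Female" = "#E69F00", "Male" = "#56B4E9"))
```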
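how to code a line graph
ggplot(data = tips, aes(x = time, y = total_bill, group = sex, colour = sex, shape = sex)) +
geom_line(stat = "summary", linewidth = 1.5) +
geom_point(stat = "summary", size = 3)
# group = sex joins the points of each sex into one line; use size instead of linewidth in ggplot2 before 3.4.0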
how to add different symbols to a plot
+ scale_shape_manual(values = c("º", "Ω")) -> needs shape mapped in aes(), e.g. shape = sex
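A sketch of the line graph with manually chosen symbols. `tips_demo` is again a made-up stand-in for `tips`. Numeric shape codes (16 = filled circle, 17 = filled triangle) are used here instead of characters, and `linewidth` assumes ggplot2 3.4.0 or later.

```r
library(ggplot2)

# Hypothetical stand-in for the tips data (values are made up)
tips_demo <- data.frame(
  time       = factor(rep(c("Lunch", "Dinner"), each = 4), levels = c("Lunch", "Dinner")),
  sex        = rep(c("Female", "Male"), times = 4),
  total_bill = c(12, 14, 16, 18, 20, 24, 22, 26)
)

p <- ggplot(tips_demo, aes(x = time, y = total_bill,
                           group = sex, colour = sex, shape = sex)) +
  geom_line(stat = "summary", fun = mean, linewidth = 1.5) +  # one line per sex
  geom_point(stat = "summary", fun = mean, size = 3) +        # mean per time and sex
  scale_shape_manual(values = c(16, 17))                      # one symbol per sex
```

Single characters such as "º" or "Ω" are also accepted as shape values, provided the graphics device can render them.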