Programming terms Flashcards
vector
c() combines values into a vector, e.g. c(1, 2, 3)
myData[2]
second element
myData[-3]
all elements apart from the 3rd
myData[c(1,4)]
only 1st and 4th elements
myData[2:4]
2nd, 3rd and 4th elements
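A quick sketch of the indexing cards above. The vector `myData` and its values are made up for illustration:

```r
# Made-up example vector (values chosen only for illustration)
myData <- c(10, 20, 30, 40, 50)

myData[2]        # second element: 20
myData[-3]       # everything except the 3rd: 10 20 40 50
myData[c(1, 4)]  # 1st and 4th only: 10 40
myData[2:4]      # 2nd to 4th: 20 30 40
```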
rep(3,4)
3 3 3 3 (repeats 3 four times)
4:7
4 5 6 7
seq(1,3)
1 2 3
seq(start,end,by = 2)
sequence from start to end in steps of 2
seq(start, end, length.out = 7)
7 evenly spaced elements from start to end
sum(1:10)
sum of all integers from 1 to 10
sum(seq(2,100,by =2))
sum of all even integers from 2 to 100
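The sequence and sum cards above can be checked in a console. The start/end values here are arbitrary examples:

```r
rep(3, 4)                  # 3 3 3 3
4:7                        # 4 5 6 7
seq(1, 3)                  # 1 2 3
seq(1, 9, by = 2)          # 1 3 5 7 9
seq(0, 1, length.out = 5)  # 0.00 0.25 0.50 0.75 1.00
sum(1:10)                  # 55
sum(seq(2, 100, by = 2))   # 2550
```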
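x <- 1:4; x * x
1 4 9 16 (element-wise multiplication)
x %*% x
matrix multiplication (for two vectors this gives the inner product, a 1x1 matrix: here 30)
x <- 1:4; x + c(0, 10)
1 12 3 14 (when lengths differ, the shorter vector is recycled)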
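A small sketch contrasting element-wise `*`, matrix `%*%` and recycling on the same vector:

```r
x <- 1:4

x * x          # element-wise: 1 4 9 16
x %*% x        # inner product, returned as a 1 x 1 matrix holding 30
x + c(0, 10)   # c(0, 10) is recycled to c(0, 10, 0, 10): 1 12 3 14
```

If the longer length is not a multiple of the shorter one, R still recycles but issues a warning.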
sort
sorts a vector
rank
provides the rank of each element
order
gives the indices that would put the elements into sorted order
unique
returns just the unique values in the vector
table
provides counts of the occurrences of each value
length
total number of elements in the vector
sample
randomly sample from the elements of a vector
paste
concatenates vectors together after converting them to text (character strings)
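The vector helpers above behave like this on a small made-up vector that contains a tie (the `set.seed()` value is arbitrary):

```r
v <- c(30, 10, 20, 10)   # made-up data with a tie

sort(v)          # 10 10 20 30
rank(v)          # 4.0 1.5 3.0 1.5 (ties share the average rank by default)
order(v)         # 2 4 3 1 (v[order(v)] is the same as sort(v))
unique(v)        # 30 10 20
table(v)         # 10 appears twice, 20 and 30 once each
length(v)        # 4
set.seed(1)
sample(v)        # a random permutation of v
paste("id", 1:3) # "id 1" "id 2" "id 3"
```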
essential stats functions
mean, median, sd, var, min, max, range, quantile, cumsum
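The essential stats functions applied to a small made-up vector:

```r
z <- c(2, 4, 4, 4, 5, 5, 7, 9)   # made-up data

mean(z)            # 5
median(z)          # 4.5
var(z)             # sample variance, 32/7
sd(z)              # sqrt(var(z))
min(z); max(z)     # 2 and 9
range(z)           # 2 9
quantile(z, 0.5)   # 4.5 (the median)
cumsum(z)          # 2 6 10 14 19 24 31 40
```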
for (i in vec){
}
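executes the code within {} once for each element of the vector
modifying the vector you're looping over doesn't change which elements the loop visits
the vector is evaluated once, before the loop starts, so changes made inside the loop don't affect the iterations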
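A minimal sketch of that behaviour: growing `vec` inside the loop changes `vec` itself, but the loop still runs over the original three elements. The names `vec` and `visited` are illustrative:

```r
vec <- 1:3
visited <- integer(0)

for (i in vec) {
  vec <- c(vec, i)            # modify vec inside the loop...
  visited <- c(visited, i)    # ...and record which values were iterated
}

visited  # 1 2 3: the loop only saw the original three elements
vec      # 1 2 3 1 2 3: vec itself did change
```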
y[-(10:20)]
vector without elements 10 through 20 inclusive
data.frame(Height = c(…),
Weight = c(…))
create a data frame manually
useful self-explanatory functions for data frames
colMeans, rowMeans, colSums, rowSums, cov, cor, scale
hw = Height/Weight data frame; hw$Height
same as hw[,1]
hw$Weight
same as hw[,2]
Interrogating data frames
names(hw)
dim(hw)
nrow(hw)
ncol(hw)
head(hw)
summary(hw)
str(hw)
hw[,1,drop = FALSE]
keeps the result as an n x 1 data frame instead of dropping it to a vector
hw$BMI <- hw$Weight/(hw$Height/100)^2
creates a new variable (column) in a data frame
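A sketch of the data frame cards above. The heights (cm) and weights (kg) are made-up numbers:

```r
# Made-up heights (cm) and weights (kg), for illustration only
hw <- data.frame(Height = c(170, 182, 165),
                 Weight = c(65, 80, 58))

dim(hw)                        # 3 2
identical(hw$Height, hw[, 1])  # TRUE: two ways to get the same column
is.vector(hw[, 1])             # TRUE: a single column drops to a vector
hw[, 1, drop = FALSE]          # stays a 3 x 1 data frame

# New variable: BMI = weight (kg) / height (m)^2
hw$BMI <- hw$Weight / (hw$Height / 100)^2
ncol(hw)                       # 3
```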
wq.red[order(wq.red$ph),]
returns the rows sorted by ph; this only accesses the data, so the original data frame is unchanged. To change it, overwrite it: wq.red <- wq.red[order…
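The same idea on a tiny made-up data frame `wq` (a hypothetical stand-in, since the wq.red data isn't included here):

```r
# Hypothetical stand-in for wq.red with a few made-up rows
wq <- data.frame(ph = c(3.5, 3.1, 3.3), quality = c(5, 6, 7))

wq[order(wq$ph), ]        # rows sorted by ph...
wq$ph                     # ...but wq is unchanged: 3.5 3.1 3.3

wq <- wq[order(wq$ph), ]  # overwrite to keep the sorted version
wq$ph                     # 3.1 3.3 3.5
```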
list
each element can be a completely different size and data type
lists: [] = returns a sub-list (the element still wrapped in a list)
[[]] = extracts the item itself
$ = extracts an item by name
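A small sketch of the three list accessors. The list `l` and its contents are made up:

```r
l <- list(nums = 1:3, name = "abc", flag = TRUE)

l[2]          # a list of length 1 that contains "abc"
l[[2]]        # "abc" itself (a character vector)
l$name        # "abc", the same item accessed by name
l[["nums"]]   # [[ ]] also works with a name: 1 2 3
```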
my_function <- function(arg){
  # ... compute z from arg ...
  return(z)
}
functions
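A concrete version of the function template. `bmi` is a hypothetical example function, not from any package:

```r
# Hypothetical example function: BMI from weight (kg) and height (cm)
bmi <- function(weight, height_cm) {
  z <- weight / (height_cm / 100)^2
  return(z)
}

bmi(80, 200)                 # 20
bmi(c(65, 80), c(170, 200))  # works element-wise on vectors too
```

R also returns the value of the last evaluated expression, so the explicit `return()` is optional.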
install once
install.packages(" ")
load many times
library(" ")
Factor variables
have a value from a limited set of possible levels
nlevels(chickwts$feed)
number of levels
text columns loaded by read.csv() come in as character strings, not factors (the default since R 4.0.0)
to convert a column to a factor use
mydat$var <- as.factor(mydat$var)
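A sketch using a small made-up CSV read from a string rather than a file, so it runs without any data on disk:

```r
# Small made-up CSV read from a string instead of a file
mydat <- read.csv(text = "var,value\nlow,1\nhigh,2\nlow,3")

class(mydat$var)                   # "character" (R 4.0.0 and later)
mydat$var <- as.factor(mydat$var)
levels(mydat$var)                  # "high" "low" (alphabetical by default)
nlevels(mydat$var)                 # 2
```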
forcats
package to solve common problems with factors
changing the order of factor levels
chickwts$feed <- fct_inorder(chickwts$feed) (levels in order of first appearance)
can also use fct_infreq() (by frequency) or fct_reorder(….,….) (by another variable)
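A sketch of how the forcats helpers change level order, assuming the forcats package is installed. The feed values are made up:

```r
library(forcats)

feed <- factor(c("soy", "linseed", "soy", "casein"))  # made-up values
levels(feed)               # "casein" "linseed" "soy" (alphabetical)
levels(fct_inorder(feed))  # "soy" "linseed" "casein" (first appearance)
levels(fct_infreq(feed))   # "soy" comes first (most frequent)
```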
calculating a mean with missing data
na.rm = TRUE, e.g. mean(x, na.rm = TRUE)
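What `na.rm` changes, on a made-up vector with one missing value:

```r
w <- c(4, NA, 8)

mean(w)                # NA: one missing value makes the result missing
mean(w, na.rm = TRUE)  # 6: NA dropped before averaging
sum(w, na.rm = TRUE)   # 12: many summary functions take na.rm
```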
plot(x,y,…)
scatter plot of y against x
common extra arguments for base R plot()
col = colour
pch = plotting symbol
xlab,ylab = axis labels
xlim,ylim = plotting range of x or y
main = plot title
type = "p" points, "l" line, "b" both
points(x, y, …)
adds points to an existing plot
lines(x, y, …)
adds a line to an existing plot
lowess()
fits a smoothed line; the f argument controls smoothness (larger f = smoother)
density()
fits a kernel density estimate, a smoothed continuous version of a histogram
other base R plotting functions
hist() = histogram
boxplot() = box plot
barplot() = bar charts for categorical data
abline() = adds straight lines to an existing plot
pairs()
get a grid of all pairwise scatter plots
tidy data
= third normal form:
- each variable is in a column
- each observation is in a row
- each type of observational unit forms a table
too wide
one variable is spread across multiple columns
too long
one observation is spread across multiple rows
pivot_longer()
- gathers multiple columns into key-value pairs
- makes wide data longer
arguments:
- data frame
- columns to transform (cols)
- name of column where previous column names should go (names_to)
- name of column where values from the columns should go (values_to)
pivot_wider()
- spreads key-value pairs across multiple columns
- makes long data wider
arguments:
- data frame
- column to take the new column names from (names_from)
- column to take the values from (values_from)
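A round trip between the two shapes, assuming the tidyr package is installed. The shop/quarter data are made up:

```r
library(tidyr)

# Made-up "too wide" data: one variable (sales) spread over q1 and q2
wide <- data.frame(shop = c("A", "B"), q1 = c(10, 20), q2 = c(11, 21))

long <- pivot_longer(wide, cols = c(q1, q2),
                     names_to = "quarter", values_to = "sales")
long   # 4 rows: shop, quarter ("q1"/"q2"), sales

back <- pivot_wider(long, names_from = quarter, values_from = sales)
back   # 2 rows again, with columns shop, q1, q2
```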
other useful tidyr functions
separate() = splits one column of strings into multiple new columns
unite() = combines many columns into one (as a string)
extract() = uses regular expressions to pull out specific info from a string column
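A sketch of separate() and unite(), assuming tidyr is installed. The year-month strings are made up:

```r
library(tidyr)

df <- data.frame(ym = c("2024-03", "2023-11"))   # made-up year-month strings

parts <- separate(df, ym, into = c("year", "month"), sep = "-")
parts$year    # "2024" "2023" (still character unless convert = TRUE)

joined <- unite(parts, "ym2", year, month, sep = "/")
joined$ym2    # "2024/03" "2023/11"
```

In tidyr 1.3.0 and later, separate() is superseded by functions such as separate_wider_delim(), but it still works.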
main dplyr functions
filter = focus on a subset of rows
arrange = reorder the rows
select = focus on a subset of variables
mutate = create new derived variables
summarise = create summary statistics (collapsing many rows) by groupings
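The five main verbs chained together, assuming dplyr is installed. The data are made up, and `group_by()` supplies the groupings that `summarise()` collapses over:

```r
library(dplyr)

# Made-up data
df <- data.frame(group = c("a", "a", "b", "b"), x = c(1, 2, 3, 4))

res <- df %>%
  filter(x > 1) %>%              # keep rows with x > 1
  select(group, x) %>%           # keep only these columns
  mutate(x2 = x * 2) %>%         # derived variable
  group_by(group) %>%
  summarise(total = sum(x2)) %>% # one row per group
  arrange(desc(total))           # largest total first

res   # group "b" (total 14) then group "a" (total 4)
```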
joining data frames
rbind() = paste rows together (above/below)
cbind()= paste cols together (left/right)
left_join(x,y)
add new variables from y to x, keeping all rows of x
right_join(x,y)
add new variables from x to y, keeping all rows of y
inner_join(x,y)
keep only rows that have a match in both x and y
full_join(x,y)
keep all rows in both x and y
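How the four joins differ on two made-up tables that share an `id` column, assuming dplyr is installed. `rbind()` is included for comparison:

```r
library(dplyr)

# Made-up tables sharing an id column
x <- data.frame(id = c(1, 2, 3), a = c("x1", "x2", "x3"))
y <- data.frame(id = c(2, 3, 4), b = c("y2", "y3", "y4"))

left_join(x, y, by = "id")   # ids 1, 2, 3; b is NA for id 1
right_join(x, y, by = "id")  # ids 2, 3, 4; a is NA for id 4
inner_join(x, y, by = "id")  # ids 2, 3 only
full_join(x, y, by = "id")   # ids 1, 2, 3, 4

rbind(x, data.frame(id = 5, a = "x5"))  # stack rows (same columns)
```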
ggplot(diamonds, aes(x = carat, y = price))
mapping specifies which variables map to the x axis, y axis, color legend etc.
mapping specified by aes()
use geoms to specify how data is plotted
by adding them to the plot with +
+ geom_point()
geoms
inherit data and mapping from the original ggplot() call but can override them (or add to them with aes())
geom functions
geom_point
geom_smooth
geom_hex
can store a plot in a variable and then add geoms to it
p + geom_smooth(method = "lm")
stats
stat_bin_hex(bins = 60)
stat_ecdf()
faceting splits the data into multiple plots according to a categorical variable
facet_wrap() = single variable split
facet_grid() = two variable split
r markdown styles
*italic text*
**bold text**
~~strikeout text~~
r markdown sections
# section heading
## subsection heading
### sub sub section heading
r markdown lists
4 spaces are needed to indent a sub-list item
don't need to increment the numbering manually
including r code in r markdown
`r 1+1` gives the value 2; `1+1` gives the text 1+1
including r in r markdown chunks
```{r}
```
more including r in r markdown, outputs
echo =FALSE just shows the output not the code
eval = FALSE just shows the code but doesn't run it
set in the chunk header, e.g. ```{r, echo = FALSE}
r markdown and latex
use double dollar signs ($$ … $$) for display-style equations
ui hierarchy
pages > layouts > panels > inputs/outputs
pages shiny
-just 1 per ui
-define overall page structure
layouts and panels shiny
-define how to place the arguments given to them on the page
-layouts can have complex structure
-panels often define the look of the added item
inputs/outputs shiny
-create the visible content of the page
-enable user to interact with your app
-provide placeholders you can programmatically update
ui pages
fluidPage() = every item passed to it is just placed straight on the page, wrapped where necessary
navbarPage() = first argument is the title; the remaining arguments are calls to tabPanel(), one for each tabbed panel
titlePanel
creates a full-width title
sidebarLayout()
creates a sidebar with styling, usually for inputs in sidebar and outputs in main area
sidebarPanel() = sets the left-hand menu
mainPanel() = sets the main body for outputs
fluidRow() defines a new row containing column() calls: the first argument is a width from 1 to 12 (widths in one row add up to at most 12), the remaining arguments are the outputs
ui inputs
inputId = must be unique
accessed in the server as input$inputId
ui inputs text
textInput() = single line text input
passwordInput() = hides the input text on screen
textAreaInput() = allows multiline inputs
ui inputs numeric
numericInput() = type the number directly
sliderInput() = drag a slider to choose a number; can choose a range too
ui inputs categorical
selectInput() = drop down list, single selection default (multiple = TRUE for more)
radioButtons() = single selection radio buttons
checkboxGroupInput() = multiple selection checkboxes
ui outputs
all outputs take the same first argument:
outputId = must be unique
ui output text
textOutput() and renderText()
verbatimTextOutput() and renderPrint()
ui outputs plots
plotOutput() and renderPlot()
renderPlot(res = 96) is recommended so the plot looks as close as possible to how it looks in RStudio
variables outside plots
wrap any calculation that depends on inputs in reactive(), then access the result by calling it like a function
lubridate
date and time package; part of the tidyverse, but library(tidyverse) only attaches it automatically from tidyverse 2.0.0 onwards
today()
current date
now()
current date-time
constructing dates/date-times
ymd()
mdy()
dmy()
from either a string or a number
ymd_hms()
mdy_hm()
make_date()
make_datetime()
timezones
now(tz = "America/New_York")
changing timezone
force_tz(x, "America/New_York") = keeps the clock time and changes only the time zone (a different instant)
with_tz(x, "America/New_York") = converts to the new time zone (same instant, different clock time)
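The difference between the two, assuming lubridate is installed. The date-time is arbitrary, and January is chosen so New York is on EST (UTC-5):

```r
library(lubridate)

x <- ymd_hms("2024-01-15 12:00:00", tz = "UTC")

with_tz(x, "America/New_York")   # 2024-01-15 07:00:00 EST: same instant, new clock time
force_tz(x, "America/New_York")  # 2024-01-15 12:00:00 EST: same clock time, new instant
```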
extracting from dates/date-times
year()
month()
month(datetime, label = TRUE) gives the name of the month
mday()
yday()
wday()
hour()
minute()
second()
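Constructing and extracting, assuming lubridate is installed. The example date is arbitrary:

```r
library(lubridate)

d <- ymd("2024-03-15")
mdy("03/15/2024") == d         # TRUE: same date, different input order
dmy("15-03-2024") == d         # TRUE
ymd(20240315) == d             # TRUE: numbers work as well as strings
make_date(2024, 3, 15) == d    # TRUE: built from components

year(d); month(d); mday(d)     # 2024, 3, 15
yday(d)                        # 75 (2024 is a leap year)

dt <- ymd_hms("2024-03-15 13:45:10")
hour(dt); minute(dt); second(dt)  # 13, 45, 10
```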
rounding up/down dates
floor_date(datetime, unit = " ")
ceiling_date(datetime, unit = " ")
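Rounding an arbitrary example date-time down and up, assuming lubridate is installed:

```r
library(lubridate)

dt <- ymd_hms("2024-03-15 13:45:10")   # UTC by default

floor_date(dt, unit = "hour")     # 2024-03-15 13:00:00 UTC
floor_date(dt, unit = "month")    # 2024-03-01 UTC
ceiling_date(dt, unit = "day")    # 2024-03-16 UTC
```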
stringr string lengths
str_length()
combining strings
str_c("Data", "Science", "and", sep = " ")
str_c(c("Data", "Science", "and"), collapse = " ")
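Both calls give the same single string, but for different reasons: `sep` joins separate arguments, while `collapse` joins the elements of one vector. Assumes stringr is installed:

```r
library(stringr)

str_length(c("Data", "R"))                         # 4 1
str_c("Data", "Science", "and", sep = " ")         # "Data Science and"
str_c(c("Data", "Science", "and"), collapse = " ") # "Data Science and"
str_c(c("a", "b"), "1")                            # "a1" "b1" (vectorised, no collapse)
```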
subsetting strings
str_sub(data, start, end)
start and end can be negative (counting back from the end of the string)
can also assign: str_sub(data, 1, 2) <- "Zo"
trimming strings
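str_trim() = removes whitespace from the start and end
str_squish() = also collapses repeated internal whitespace to a single space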
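Subsetting, replacing and trimming on made-up strings, assuming stringr is installed:

```r
library(stringr)

x <- "Hello"
str_sub(x, 1, 3)     # "Hel"
str_sub(x, -3, -1)   # "llo": negative positions count from the end
str_sub(x, 1, 2) <- "Zo"
x                    # "Zollo"

str_trim("  a  b  ")    # "a  b": only leading/trailing spaces removed
str_squish("  a  b  ")  # "a b": internal runs collapsed too
```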
exact matching using regex
str_view(x, "an")
for any single character: str_view(x, ".a.")
to match a literal . the regex is \. which must be written as the string "\\."
regex anchoring
anchoring the start: str_view(x, "^a")
anchoring the end: str_view(x, "a$")
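str_view() displays matches. str_detect() returns TRUE/FALSE for each string, which makes the same patterns easy to check. This assumes stringr is installed, and the strings are made up:

```r
library(stringr)

x <- c("banana", "apple", "a.b")   # made-up strings

str_detect(x, "an")    # TRUE FALSE FALSE: exact match of "an"
str_detect(x, ".a.")   # TRUE FALSE FALSE: "." matches any character
str_detect(x, "\\.")   # FALSE FALSE TRUE: literal "." (regex \., string "\\.")
str_detect(x, "^a")    # FALSE TRUE TRUE: starts with "a"
str_detect(x, "a$")    # TRUE FALSE FALSE: ends with "a"
```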