exam 2 Flashcards
combines rows from two datasets where the specified key variables match
- rows with no matching values are excluded
- returns results if the keys are matched in both tables
inner join
includes all rows from the left dataset and the matching rows from the right dataset. If there is no match, the columns from the right dataset are filled with NA. The rows of the first table are always returned, regardless of whether there is a match in the second table
left join
opposite of left join. Includes all rows from the right dataset and the matching rows from the left dataset. If there is no match, the columns from the left dataset are filled with NA
right join
includes all rows from both datasets. Where there is no match, the columns from the dataset with the missing row are filled with NA
full join
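A minimal sketch of the four joins above, assuming dplyr is installed. The tables `students` and `scores` and their columns are made up for illustration.

```r
library(dplyr)

# Hypothetical example tables that share the key column `id`
students <- data.frame(id = c(1, 2, 3), name = c("Ana", "Ben", "Cy"))
scores   <- data.frame(id = c(2, 3, 4), score = c(90, 85, 70))

inner <- inner_join(students, scores, by = "id")  # only ids found in both tables
left  <- left_join(students, scores, by = "id")   # every student; score is NA when unmatched
right <- right_join(students, scores, by = "id")  # every score; name is NA when unmatched
full  <- full_join(students, scores, by = "id")   # every id from either table
```

Comparing `nrow()` across the four results is a quick way to see which rows each join keeps.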
refers to a specific way of organizing data tables in a tabular format to facilitate data analysis. In tidy data:
- each variable forms a column
- each observation forms a row
tidy data
data often comes in various formats, and its structure might not be ideal for the task at hand. Pivoting helps you reorganize your data into a format that makes it easier to analyze, visualize, or model
pivots
used to convert data from a wide format into a long format. It's particularly useful when you have variables spread across different columns and you want to stack them into a single column, often with corresponding values
pivot_longer
inputs data frame
wide_dataframe
used to convert data from a long format to a wide format. useful when you want to take distinct values from a column and spread them across new columns
pivot_wider
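A small sketch of both pivots, assuming tidyr is installed. The data and the column names `country`, `y2020`, `y2021`, `year`, and `value` are invented for the example.

```r
library(tidyr)

# Hypothetical wide table: one column per year
wide_dataframe <- data.frame(
  country = c("A", "B"),
  y2020 = c(10, 20),
  y2021 = c(11, 21)
)

# Wide to long: stack the year columns into a name column and a value column
long_dataframe <- pivot_longer(
  wide_dataframe,
  cols = -country,       # pivot every column except country
  names_to = "year",     # old column names go here
  values_to = "value"    # cell values go here
)

# Long to wide: spread the distinct values of `year` back into columns
wide_again <- pivot_wider(long_dataframe, names_from = year, values_from = value)
```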
how data variables map to plot aesthetics such as position, color, and shape
aes
visual elements to represent the data (lines, points, bars, etc)
geometric objects
splitting data into subplots based on a variable
facets
controlling the overall appearance of the plot
theme
the smaller the bandwidth the ____ the peaks (more or less)
more
position = "dodge" does what
places the bars of each group side by side instead of stacking them
adds a smoothed trend line to a graph
geom_smooth
is color or fill used in scatterplots
color
is color or fill used in bar graphs
fill
facets by one variable, wrapping the panels into rows and columns
facet_wrap
facets by two variables, one for the rows and one for the columns
facet_grid
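A rough sketch that puts the ggplot2 cards above into two plots, assuming ggplot2 is installed. It uses the `mpg` dataset that ships with ggplot2; the choice of variables is only for illustration.

```r
library(ggplot2)

# Scatterplot: color (not fill) is mapped inside aes()
scatter <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = class)) +          # geometric object: points
  geom_smooth(method = "lm", se = FALSE) +  # smoothed trend line (here a linear fit)
  facet_wrap(~ drv) +                       # facet by one variable
  theme_minimal()                           # overall appearance

# Bar graph: fill (not color) shades the bars; dodge puts them side by side
bars <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar(position = "dodge") +
  facet_grid(year ~ cyl)                    # facet by two variables: rows ~ columns
```

Printing `scatter` or `bars` draws the plot.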
detects patterns: checks if a string contains a specific pattern. Output: TRUE/FALSE
str_detect
finds the length (# of characters) of a string
str_length
trims white spaces from the beginning and the end of the string
str_trim
removes extra white space within a string, as well as at the beginning and end
str_squish
extracts substrings from a string
str_sub
replaces the first match of a pattern with another string (str_replace_all replaces every match)
str_replace
concatenates strings together (brings them together)
str_c
splits a string into pieces at a specified delimiter; returns a list of character vectors
str_split
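A quick sketch of the stringr functions above, assuming stringr is installed. The example strings are invented.

```r
library(stringr)

x <- "  tidy   data  rocks "

detected <- str_detect(x, "data")                 # does x contain "data"?
len      <- str_length("tidy")                    # number of characters
trimmed  <- str_trim(x)                           # drops leading/trailing white space only
squished <- str_squish(x)                         # also collapses repeated inner spaces
piece    <- str_sub("flashcard", 1, 5)            # characters 1 through 5
first    <- str_replace("a-b-c", "-", "+")        # replaces the first match only
every    <- str_replace_all("a-b-c", "-", "+")    # replaces every match
joined   <- str_c("tidy", "data", sep = " ")      # concatenates with a separator
parts    <- str_split("a,b,c", ",")               # list holding one character vector
```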
data where each group takes up one row and the remaining variables for that group are stored together in a list-column
nested
is a function that works like $. It extracts a specific column of your data frame as a vector
pull()
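A short sketch of pull() and nested data, assuming dplyr and tidyr are installed. The data frame `df` and its columns are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical data frame with a grouping column
df <- data.frame(group = c("a", "a", "b"), value = c(1, 2, 3))

# pull() returns the column as a plain vector, like df$value
values <- pull(df, value)

# nest() gives one row per group; the other columns move into a list-column
nested <- nest(df, data = value)
```

Here `nested$data[[1]]` holds the rows that belong to group "a".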