P1: Intro to R and 16S Data Flashcards
what is R
a programming language primarily used for statistics, data analysis, and graphical representations (data visualization)
why learn R
- widely used in biology
- ideal for working with large datasets
- handles various data structures
- R code is great for reproducibility
- specialized packages/repositories
- active community and support, free and open-source
- highly valued skill
why learn R - how is it ideal for working with large data sets
Excel does not work well with many thousands of data points (rows)
why learn R: how is it ideal for working with large data sets - examples of large data sets
- genomics
- ecological
- microbial
why learn R - how is it great for reproducibility
- script-based analysis
- it's a written protocol: you can run the script on any computer with the exact same package versions and should get the exact same result
- easy to keep track of
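A minimal sketch of what a script-based analysis can look like in R. The data are simulated and the object names (`abundance`, `result`) are made up for illustration; they are not from the course material. Fixing the random seed is one way to make the random numbers repeat from run to run.

```r
# Fix the random seed so the simulated numbers are the same on every run
set.seed(42)

# Hypothetical data: 100 simulated abundance values
abundance <- rnorm(100, mean = 50, sd = 10)

# Every analysis step is written down, so anyone can rerun it
result <- mean(abundance)
print(result)

# Record the R version and loaded packages alongside the results
print(sessionInfo())
```

Saving the `sessionInfo()` output with the results helps someone else match the R and package versions that were used.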
why learn R: specialized packages/repositories - define package
- software with different functions
- recipe for how to treat data
why learn R: specialized packages/repositories - define repository
online sites from which packages can be downloaded (e.g., CRAN, Bioconductor)
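A short sketch of how a package and a repository fit together in practice. The package name `vegan` is only an example and is not named in these notes; the install line is left commented out because it needs an internet connection. The `stats` package is used for the runnable part since it ships with R.

```r
# Downloading a package from a repository (CRAN by default) needs internet,
# so this line is commented out here:
# install.packages("vegan")   # "vegan" is only an example package name

# Load an installed package so its functions become available.
# "stats" comes with R, so nothing has to be installed first.
library(stats)

# Call a function from a package using package::function notation
middle <- stats::median(c(3, 1, 2))
print(middle)
```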
why learn R: active community and support, free and open-source - what do we mean by open-source
- anyone can write a function and upload it onto a repository as a package
- this can only be done if it is written in R
why learn R - how is it a highly valued skill
- can be a significant advantage in academic, government, and industrial roles
- it is a highly sought-after skill for data analysis
R data structure
- vector
- list
- data frame
- matrix
- array
R data structure - vector
- simplest structure for entering data
- ordered sequence of values, all of the same type
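A minimal example of a vector, assuming some made-up count values; the name `counts` is illustrative only.

```r
# c() combines values into a vector
counts <- c(12, 5, 30, 8)

# Elements are accessed by position, starting at 1
first_count <- counts[1]

# Arithmetic is applied to every element at once
doubled <- counts * 2

print(length(counts))
print(doubled)
```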
R data structure - list
collection of different elements (e.g., vectors) that can differ in type and length
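A small sketch of a list holding vectors of different types and lengths. The names (`sample_info`, `ids`, `depths`, `paired`) and values are hypothetical.

```r
# A list can hold elements of different types and lengths
sample_info <- list(
  ids    = c("S1", "S2", "S3"),  # character vector
  depths = c(1200, 950),         # numeric vector with a different length
  paired = TRUE                  # single logical value
)

# Named elements are accessed with $
print(sample_info$ids)
print(length(sample_info))
```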
R data structure - data frame
- table with variables in columns and observations in rows; different columns can hold different types of data
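A minimal data frame example, assuming an invented sample table; the column names and values are illustrative and not taken from any real dataset.

```r
# Each column is a variable; each row is one sample
samples <- data.frame(
  sample_id = c("S1", "S2", "S3"),
  site      = c("gut", "soil", "gut"),
  reads     = c(1200, 950, 1430)
)

# Number of rows and columns
print(dim(samples))

# Keep only the rows where site is "gut"
gut_samples <- samples[samples$site == "gut", ]
print(gut_samples)
```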
R data structure - matrix
- similar to a data frame
- but all columns need to hold the same type of data (homogeneous)
- typically numerical
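A small numeric matrix as a sketch. The row and column labels (`S1`, `OTU1`, and so on) and the counts are hypothetical, chosen to resemble a count table.

```r
# All values in a matrix share one type (numeric here)
otu_counts <- matrix(
  c(10, 0, 3,
     4, 7, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("S1", "S2"), c("OTU1", "OTU2", "OTU3"))
)

print(otu_counts)
print(dim(otu_counts))

# Total count per sample (per row)
print(rowSums(otu_counts))
```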
R data structure - array
group of matrices stacked together; can have more than two dimensions
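A minimal array sketch: two 2 x 3 matrices stacked into a third dimension. The name `stack` and the values 1 to 12 are placeholders.

```r
# dim = c(rows, columns, layers): two 2 x 3 matrices
stack <- array(1:12, dim = c(2, 3, 2))

print(dim(stack))

# Leaving rows and columns blank pulls out a whole layer, which is a matrix
second_layer <- stack[, , 2]
print(second_layer)
```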