R for bio Flashcards
What is the purpose of the Shiny package?
Shiny is an R package that enables the creation of interactive web applications directly from R.
- web applications for data visualization, analysis, and communication
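As a rough illustration of the card above, a minimal Shiny app might look like the sketch below. It assumes the shiny package is installed; the IDs `n` and `hist`, the titles, and the simulated data are made up for this example.

```r
library(shiny)

# UI: one slider and one plot (the IDs "n" and "hist" are illustrative)
ui <- fluidPage(
  titlePanel("Histogram of simulated values"),
  sliderInput(inputId = "n", label = "Number of points",
              min = 10, max = 500, value = 100),
  plotOutput(outputId = "hist")
)

# Server: redraw the histogram whenever the slider value changes
server <- function(input, output) {
  output$hist <- renderPlot({
    hist(rnorm(input$n), main = "Simulated data", xlab = "Value")
  })
}

# Build the app object; runApp(app) would start it in a browser
app <- shinyApp(ui = ui, server = server)
```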
Why should you augment the data?
It will increase:
- dataset size (added columns such as Q-value, P-value, residuals)
- robustness to variations
- feature learning
and reduce sensitivity.
When does it make sense to augment the data?
It makes sense to use the augment() function or a similar approach when you want to enrich your dataset with additional information related to the fitted model.
What does it mean to augment the data?
It typically refers to a function or operation that adds new columns to a dataset, containing additional information related to the model
- e.g. P-values, Q-values or residuals
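A minimal base R sketch of this idea, using the built-in mtcars data and a linear model chosen only for illustration. The column names `.fitted` and `.resid` follow the naming used by broom::augment(), which adds such columns in one call; here they are added by hand so the example needs no extra packages.

```r
# Fit a simple linear model (the model itself is just an example)
fit <- lm(mpg ~ wt, data = mtcars)

# Augment: keep the original columns and add model-derived ones
augmented <- mtcars
augmented$.fitted <- unname(fitted(fit))     # predicted mpg per car
augmented$.resid  <- unname(residuals(fit))  # observed minus predicted mpg

head(augmented[, c("mpg", "wt", ".fitted", ".resid")])
```

Each row still describes one car, but now also carries what the model says about it.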
What are the 3 tidy data rules?
- each variable must have its own column
- each observation must have its own row
- each value must have its own cell
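To make the three rules concrete, the sketch below reshapes a small made-up table by hand. The gene names, conditions and values are invented; in the tidyverse, tidyr::pivot_longer() is the usual tool for this step.

```r
# Not tidy: the variable "condition" is spread across two column headers
wide <- data.frame(
  gene    = c("geneA", "geneB"),
  control = c(5.1, 3.2),
  treated = c(7.4, 2.9)
)

# Tidy: one column per variable, one row per observation, one value per cell
long <- data.frame(
  gene       = rep(wide$gene, times = 2),
  condition  = rep(c("control", "treated"), each = nrow(wide)),
  expression = c(wide$control, wide$treated)
)

long
```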
What does it mean to wrangle the data?
Data wrangling is the process of converting raw data into a usable form. It is done before any data analysis.
- Ensure reliable and complete data
Why should you wrangle your data?
Because it cleans, transforms and organizes the data into a more suitable and structured format for data analysis.
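A small base R sketch of typical wrangling steps, assuming a made-up raw table with an awkward column name, a text-encoded missing value and a duplicated row. The specific steps are examples, not a fixed recipe.

```r
# Raw data as it might arrive (invented for this example)
raw <- data.frame(
  Sample.ID = c("s1", "s2", "s3", "s3"),
  conc      = c("1.5", "n/a", "2.0", "2.0"),
  stringsAsFactors = FALSE
)

clean <- raw
names(clean) <- c("sample_id", "concentration")  # consistent column names

# Convert text to numbers; "n/a" cannot be parsed and becomes NA
clean$concentration <- suppressWarnings(as.numeric(clean$concentration))

clean <- clean[!is.na(clean$concentration), ]  # drop incomplete rows
clean <- unique(clean)                         # drop duplicated rows

clean
```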
What is the general pipeline for bio data science?
import -> tidy -> (transform -> visualize -> model, repeated as needed) -> communicate
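The pipeline can be sketched end to end in a few lines of base R. Everything here is illustrative: the inline data stands in for a real file, and the log2 transform and t-test are just one possible choice for the transform and model steps.

```r
# Import: read raw data (inline text stands in for a file on disk)
raw <- read.csv(text = "sample,group,value
s1,ctrl,4
s2,ctrl,8
s3,ctrl,16
s4,treat,32
s5,treat,64
s6,treat,128")

# Tidy: already one row per sample; make the grouping variable a factor
raw$group <- factor(raw$group)

# Transform: add a log2-scaled column
raw$log2_value <- log2(raw$value)

# Visualize: compare the two groups
boxplot(log2_value ~ group, data = raw)

# Model: Welch two-sample t-test between the groups
fit <- t.test(log2_value ~ group, data = raw)

# Communicate: report the result in one sentence
cat(sprintf("Mean log2 difference (treat - ctrl): %.1f, p = %.3f\n",
            diff(fit$estimate), fit$p.value))
```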
Why is reproducibility of data analysis important?
It makes it possible to verify and validate the analysis, and makes the analysis reusable.
- reduces errors
- enhances adaptability and learning
What are the components of reproducibility?
- Raw data
- Cleaning data
- code and script
- parameterization and tuning
- explanation
- documented workflow
- results
- accessibility and sharing
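Several of these components (code, parameterization, documented workflow) can be shown in a short base R sketch. The parameter names and the `run_analysis()` helper are hypothetical and exist only for this example.

```r
# Parameters live in one place, so a rerun with other settings is explicit
params <- list(seed = 42, n_samples = 20, mean = 10, sd = 2)

# Hypothetical analysis step: simulate data and summarise it
run_analysis <- function(params) {
  set.seed(params$seed)  # fixed seed: the random draws are the same every run
  values <- rnorm(params$n_samples, mean = params$mean, sd = params$sd)
  list(mean = mean(values), sd = sd(values))
}

result_1 <- run_analysis(params)
result_2 <- run_analysis(params)
identical(result_1, result_2)  # same code, parameters and seed give the same result

# Record the R version and loaded packages alongside the results
session <- sessionInfo()
```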
Which data forms best fit a boxplot?
- more than 2 numeric variables
- several groups in the data
Which data forms best fit a heatmap?
Numeric variables, not ordered
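A short sketch with base R's heatmap(), assuming a simulated genes-by-samples matrix; the row and column names are invented.

```r
set.seed(1)  # make the simulated matrix repeatable

# 8 "genes" measured in 5 "samples": several numeric variables, no inherent order
expr <- matrix(rnorm(40), nrow = 8,
               dimnames = list(paste0("gene", 1:8), paste0("sample", 1:5)))

# Rows and columns are reordered by clustering; values are scaled per row
hm <- heatmap(expr, scale = "row")
```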
Which data forms best fit a density plot?
Numeric variables, not ordered; works with many data points
Which data forms best fit a histogram?
Numeric variables, not ordered; works with a few data points
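The two cards above can be compared side by side in base R. The sample sizes (30 and 5000) and the number of breaks are arbitrary choices for this sketch.

```r
set.seed(1)
few  <- rnorm(30)    # a small sample
many <- rnorm(5000)  # a large sample

# Histogram: counts per bin (plot = FALSE returns the numbers without drawing)
h <- hist(few, breaks = 8, plot = FALSE)
sum(h$counts)  # every point falls in exactly one bin

# Density plot: a smoothed estimate of the distribution
d <- density(many)
plot(d, main = "Density of many points")
```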
What is the purpose of a boxplot?
Gives a summary of one or several numeric variables. The line that divides the box into 2 parts represents the median of the data. Potential outliers are shown as individual points beyond the whiskers.
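The numbers behind a boxplot can be inspected with base R's boxplot.stats(). The values below are made up, with 20 included as a deliberate outlier; the mtcars call at the end shows one numeric variable across several groups.

```r
values <- c(2, 3, 3, 4, 4, 5, 5, 6, 20)  # 20 is a deliberate outlier

stats <- boxplot.stats(values)
stats$stats     # lower whisker, lower hinge, median, upper hinge, upper whisker
stats$stats[3]  # the median: the line that divides the box
stats$out       # points beyond the whiskers, drawn individually

# One numeric variable (mpg) summarised for several groups (cyl)
boxplot(mpg ~ cyl, data = mtcars)
```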