T2: R Project Management + R Basic Concepts Flashcards
Debugging
1) Read the error message
2) Pay attention to the line the error points to
3) Try to see if you can fix it yourself
4) Search Google / Stack Overflow
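A minimal sketch of step 1, assuming a deliberately broken call: tryCatch() captures the error so the message can be read (and pasted into a search) instead of stopping the script.

```r
# Deliberately trigger an error and capture its message instead of stopping.
result <- tryCatch(
  log("a"),                                  # log() needs a number, not text
  error = function(e) conditionMessage(e)    # keep only the message text
)
print(result)  # the text R would normally show after "Error in log("a") :"
```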
SOME PRINCIPLES OF COLLABORATION
Data management:
* Keep raw data
* Document all steps of data processing
* Use a comprehensive and global naming convention
Coding (Software):
* Comment code
* Structure code
* Write modular code
* Avoid redundancy
* Use a comprehensive and global naming convention (again!)
* Use a common repository
* Use one common to-do list
Loading Data
Function: read.csv()
* Used to read comma-separated files.
* Returns data in a data frame format.
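A small sketch of read.csv(), assuming a made-up three-row file written to a temporary path (the column names id, group, and score are hypothetical):

```r
# Write a tiny example file to a temporary location (hypothetical data).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,group,score", "1,a,4.5", "2,b,3.8", "3,a,5.1"), csv_path)

# Read it back; by default the first line is used for the column names.
scores <- read.csv(csv_path)
class(scores)  # "data.frame"
```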
Viewing Data
Functions: head(), View()
* head(): View the first few rows of the dataset.
* View(): Opens a detailed viewer for the data.
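For illustration, using R's built-in iris dataset (an assumption; any data frame works the same way):

```r
first_rows  <- head(iris)         # first 6 rows by default
first_three <- head(iris, n = 3)  # ask for a specific number of rows
print(first_three)

# View() opens a spreadsheet-style window, so it only makes sense
# in an interactive session (e.g. RStudio).
if (interactive()) View(iris)
```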
Dimensions & Structure
Functions: nrow(), ncol()
* nrow(): Returns the number of rows.
* ncol(): Returns the number of columns.
Useful to understand the size of your dataset.
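A quick sketch on the built-in iris dataset (chosen here only as an example):

```r
n_rows <- nrow(iris)  # 150 observations
n_cols <- ncol(iris)  # 5 variables
dim(iris)             # rows and columns in one call
str(iris)             # column types plus a preview of the values
```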
Summary Statistics
Function: summary()
* Get basic statistics for numeric columns.
* Counts for categorical (factor) columns.
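A short sketch with the built-in iris dataset (an assumption for illustration). Note that counts appear only for factor columns; a plain character column just reports its length and class.

```r
num_summary <- summary(iris$Sepal.Length)  # min, quartiles, median, mean, max
cat_summary <- summary(iris$Species)       # factor column: count per level
print(num_summary)
print(cat_summary)

summary(iris)  # all columns of the data frame at once
```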
Data Visualization
Function: hist()
* Plots a histogram of a numeric variable.
* Visualize distributions before any advanced analysis to understand your data, e.g., the spread and central tendency.
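A minimal sketch, again assuming the built-in iris dataset; the title and axis label are arbitrary choices:

```r
# Draw a histogram of one numeric column; hist() also returns the bin counts.
h <- hist(iris$Sepal.Length,
          main = "Sepal length",
          xlab = "Length (cm)")

# Numbers to compare against what the plot shows.
mean(iris$Sepal.Length)  # central tendency
sd(iris$Sepal.Length)    # spread
```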
Subsetting Data
Function: subset()
* Filter rows based on conditions.
* Focus on specific segments of data.
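A small sketch on the built-in iris dataset; the condition and the selected columns are arbitrary examples:

```r
# Keep only rows that satisfy the condition.
setosa_long <- subset(iris, Species == "setosa" & Sepal.Length > 5)

# Optionally keep only some columns with the select argument.
setosa_small <- subset(iris, Species == "setosa",
                       select = c(Sepal.Length, Species))
head(setosa_small)
```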
Grouped Operations
Functions: table(), tapply()
* table(): Count occurrences in categorical data.
* tapply(): Apply a function to subsets of a vector defined by a grouping variable.
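A short sketch using the built-in mtcars dataset (an assumption for illustration), grouping cars by their number of cylinders:

```r
# How many cars fall into each cylinder category?
cyl_counts <- table(mtcars$cyl)
print(cyl_counts)

# Mean miles per gallon within each cylinder group:
# tapply(values, grouping variable, function)
mean_mpg <- tapply(mtcars$mpg, mtcars$cyl, mean)
print(mean_mpg)
```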
Hypothesis Testing
Function: t.test()
* Compare means of two groups.
* Check if differences are statistically significant.
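A minimal sketch, assuming the built-in mtcars dataset and comparing fuel consumption between automatic (am = 0) and manual (am = 1) cars. By default t.test() runs Welch's version, which does not assume equal variances.

```r
# Formula interface: outcome ~ grouping variable with exactly two groups.
res <- t.test(mpg ~ am, data = mtcars)
print(res)

res$estimate  # the two group means
res$p.value   # a small p-value suggests the means differ
```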
WRITING A FUNCTION
[function_name] <- function(arg1, arg2, …) {
  Function body
}
Return or not
With return(), the result can be printed directly or stored in a variable:
* f.square(7)
* xx <- f.square(7)
Example:
f.square <- function(arg1) {
  return(arg1^2)
}
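A runnable sketch of the card above. f.square.implicit is a hypothetical name added here to show that, without return(), R returns the value of the last expression in the body.

```r
# With an explicit return()
f.square <- function(arg1) {
  return(arg1^2)
}

f.square(7)        # called at the console, the result is printed
xx <- f.square(7)  # assigned, the result is stored and nothing is printed

# Without return(): the last evaluated expression is returned (hypothetical variant)
f.square.implicit <- function(arg1) {
  arg1^2
}
yy <- f.square.implicit(7)
```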