Descriptive Statistics Flashcards
What is tidy data and what does it include?
A way of organizing data that is key to analyzing it successfully.
Rows (across)= observations
Columns (down)= variables
What is the code for getting the first 6 and last 6 rows of data in R?
head(name of table)
tail(name of table)
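A minimal sketch of the two calls, assuming the built-in iris data set stands in for "name of table" (any data frame would work the same way). R is case-sensitive, so the function names are lowercase.

```r
# iris ships with R and is used here only as an example table
first_rows <- head(iris)   # first 6 rows by default
last_rows  <- tail(iris)   # last 6 rows by default
print(first_rows)
print(last_rows)

# Both functions accept an optional n argument for a different row count
print(head(iris, n = 3))
```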
What is the code for finding the dimensions of a table in R?
dim(name of table)
This gives the number of rows (observations) followed by the number of columns (variables) in the data set
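As a short illustration, again assuming the built-in iris data set as the example table:

```r
dims <- dim(iris)   # integer vector: rows first, then columns
print(dims)         # 150 5
dims[1]             # number of rows (observations)
dims[2]             # number of columns (variables)

# nrow() and ncol() return the same two numbers one at a time
nrow(iris)
ncol(iris)
```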
How do you get useful information on a stored data table in R?
help(name of table)
What are discrete and continuous quantitative variables?
Discrete - takes separate, countable values (for example, counts)
Continuous - can take any value within a range, so there are infinitely many possible values
What is the code for getting a stored table’s structure?
str(name of table)
This will report:
The object type ('data.frame'), the number of observations, and the number of variables. It also lists each variable's name, its type (quantitative or categorical), and about the first 10 entries of each variable.
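A quick sketch of the call, assuming the built-in iris data set as the example table. The first line of the report gives the object type and the counts; each following line describes one variable.

```r
# Print the structure of the example table
str(iris)

# capture.output() stores the printed report as text so it can be inspected
report <- capture.output(str(iris))
report[1]   # the header line with the observation and variable counts
```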
What does R call categorical variables?
Factors
Each individual category is a “level”
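A small sketch of factors and levels. The blood_type values are made up for illustration; iris$Species is a factor in the built-in iris data set.

```r
# Hypothetical example data: blood types recorded for six people
blood_type <- factor(c("A", "B", "O", "A", "AB", "O"))

levels(blood_type)    # the distinct categories: "A" "AB" "B" "O"
nlevels(blood_type)   # 4

# A categorical column in a stored table is also a factor
is.factor(iris$Species)   # TRUE
levels(iris$Species)
```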
Mode
The most frequently occurring value among all observations in a sample
Frequency distribution
An ordered display of each value in a data set together with the number of times that value occurs
Easy way to find the mode
How do you make a list of values (a vector) in R?
object <- c(value1, value2, ...)
This uses the concatenate function, c()
How do you make a frequency distribution in R?
table(object)
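The following sketch builds a vector with c(), tabulates it, and reads the mode off the frequency distribution. The siblings data are made up for illustration. Note that R's own mode() function reports how an object is stored, not the statistical mode, so the mode is taken from the table instead.

```r
# Hypothetical sample: number of siblings reported by ten people
siblings <- c(1, 0, 2, 1, 3, 1, 0, 2, 1, 4)

freq <- table(siblings)   # each value, in sorted order, with its count
print(freq)

# The mode is the value with the largest count
# (which.max returns the first one if there is a tie)
mode_value <- as.numeric(names(freq)[which.max(freq)])
mode_value   # 1
```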
What are the strengths and weaknesses of the mode as a measure of location?
Strength-easy to compute
Weakness-not useful if there’s a large number of possible values that occur infrequently
What are two ways to calculate the mean in R?
- sum(object)/length(object)
- mean(object)
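Both approaches can be checked side by side. The scores vector is a made-up example.

```r
# Hypothetical data
scores <- c(4, 8, 15, 16, 23, 42)

manual_mean  <- sum(scores) / length(scores)   # total divided by count
builtin_mean <- mean(scores)                   # same result from the built-in function

manual_mean    # 18
builtin_mean   # 18
```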
What are the strengths and weaknesses of the arithmetic mean as a measure of location?
Strength-natural and most widely used
Weakness-overly sensitive to extreme values
What are the strengths and weaknesses of using the median as a measure of location?
Strength-insensitive to extreme values
Weakness-determined mainly by the middle values, so it is less sensitive to the actual values of the remaining data
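A short sketch of how one extreme value moves the mean far more than the median. The incomes are made-up numbers (in thousands).

```r
# Hypothetical incomes, in thousands
incomes      <- c(30, 32, 35, 38, 40)
with_outlier <- c(incomes, 500)   # add one extreme value

mean(incomes)          # 35
median(incomes)        # 35

mean(with_outlier)     # 112.5, pulled up sharply by the extreme value
median(with_outlier)   # 36.5, barely changed
```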
What is a symmetric distribution, and what are its properties?
-two halves of the distribution appear like mirror images
-mean and median are approximately the same
What are the properties of a positively skewed distribution?
-Skewed to the right- long tail on the right
-Mean is usually larger than the median
What are the properties of a negatively skewed distribution?
-Skewed to the left - long tail on the left
-Mean is usually smaller than the median
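The mean-versus-median pattern for skewed data can be seen in a small sketch. Both samples are made up: right_skewed has a long tail on the right, and left_skewed is its mirror image.

```r
# Hypothetical sample with a long right tail
right_skewed <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 20)
mean(right_skewed)     # 4.7
median(right_skewed)   # 3, so the mean is larger than the median

# Mirror image of the same sample: long left tail
left_skewed <- 21 - right_skewed
mean(left_skewed)      # 16.3
median(left_skewed)    # 18, so the mean is smaller than the median

# hist(right_skewed) would draw the shape
```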
What is the geometric mean?
The antilogarithm of the arithmetic mean computed in the log scale
-good for highly skewed lab data, such as concentrations, that are usually analyzed on the log scale
What is the antilogarithm of the natural log (ln) function?
The exponential function
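Putting the two cards together, a geometric mean can be sketched as: take natural logs, average them, then apply the exponential function. The concentrations are made-up values, and geo_mean is just a variable name chosen here.

```r
# Hypothetical concentrations (all values must be positive)
concentration <- c(1, 10, 100)

log_values <- log(concentration)   # natural log of each value
mean_log   <- mean(log_values)     # arithmetic mean on the log scale
geo_mean   <- exp(mean_log)        # antilog brings it back to the original scale

geo_mean              # approximately 10
mean(concentration)   # 37, the arithmetic mean, for comparison
```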
How do you make a matrix/array in R?
The cbind() function
File.name <- cbind(v1, v2, v3, ...)
How do you turn an array into a data frame structure in R?
Wrap the cbind function in as.data.frame
Variable.name <- as.data.frame(cbind(v1, v2, v3, ...))
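A minimal sketch of both steps, using two made-up vectors (height and weight) in place of v1, v2, and so on. Keep in mind that cbind() builds a matrix, which holds a single type, so mixing numbers and text would turn every entry into text.

```r
# Hypothetical measurements
height <- c(150, 160, 170)
weight <- c(50, 60, 70)

# cbind() binds the vectors together as columns of a matrix
m <- cbind(height, weight)
is.matrix(m)   # TRUE
dim(m)         # 3 2

# Wrapping it in as.data.frame() gives a data frame with the same columns
df <- as.data.frame(cbind(height, weight))
str(df)
```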