Descriptive Statistics Flashcards

1
Q

What is tidy data and what does it include?

A

A consistent way of organizing data that is the key to analyzing it successfully.

Rows (across) = observations
Columns (down) = variables

2
Q

What is the code for getting the first 6 and last 6 rows of data in R?

A

head(name of table)
tail(name of table)
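
As a minimal sketch, the calls can be tried on mtcars, a data set that ships with R. Using mtcars here is an assumption for illustration; any data frame works the same way. Note that R is case-sensitive, so the function names are lowercase.

```r
# mtcars stands in for "name of table" (an illustrative choice)
first_rows <- head(mtcars)  # first 6 rows by default
last_rows  <- tail(mtcars)  # last 6 rows by default

print(first_rows)
print(last_rows)
```

Both functions accept a second argument, for example head(mtcars, 10), to change how many rows are returned.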

3
Q

What is the code for finding the dimensions of a table in R?

A

dim(name of table)

This will give you the number of rows (observations) and the number of columns (variables) in the data set, in that order.
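
A short sketch, again assuming the built-in mtcars data set as a stand-in for your own table:

```r
# dim() returns a vector of length 2: rows first, then columns
table_dims <- dim(mtcars)
n_obs  <- table_dims[1]  # number of observations (rows)
n_vars <- table_dims[2]  # number of variables (columns)

print(table_dims)  # 32 11
```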

4
Q

How do you get useful information on a stored data table in R?

A

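help(name of table)

This opens the documentation page for the table, which exists for data sets that come with R or with a package.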

5
Q

What are discrete and continuous quantitative variables?

A

Discrete-can take on only a countable set of distinct values (for example, counts)
Continuous-can take on any value within an interval, so infinitely many values are possible

6
Q

What is the code for getting a stored table’s structure?

A

str(name of table)
This will report:
‘data.frame’ (the object's class), the number of observations, and the number of variables. It also lists each variable's name, its type (quantitative or categorical), and about the first 10 entries of each variable.
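
For illustration, here is str() applied to iris, a data set that ships with R (chosen here as an assumption because it mixes quantitative columns with one categorical column):

```r
# Prints one header line, then one line per variable
str(iris)
```

In the output, quantitative columns are labelled num and the categorical column is labelled Factor.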

7
Q

What does R call categorical variables?

A

Factors
Each individual category is a “level”
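
A small sketch of a factor and its levels. The blood_type values are made up for illustration.

```r
# Hypothetical categorical data
blood_type <- factor(c("A", "B", "O", "A", "AB", "O"))

category_names <- levels(blood_type)   # the distinct categories
category_count <- nlevels(blood_type)  # how many categories there are

print(category_names)
print(category_count)
```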

8
Q

Mode

A

The most frequently occurring value among all observations in a sample

9
Q

Frequency distribution

A

An ordered display of each value in a data set together with the number of times that value occurs

It is an easy way to find the mode

10
Q

How do you make a list of values (a vector) in R?

A

object <- c(value1, value2, ...)
c() is the combine (concatenate) function
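
A minimal sketch with made-up numbers; the name ages is hypothetical.

```r
# Combine individual values into one vector
ages <- c(23, 31, 27, 31, 45)

print(ages)
print(length(ages))  # number of values stored
```

Strictly speaking, c() builds a vector; R also has a separate list() function for collections that mix different types.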

11
Q

How do you make a frequency distribution in R?

A

table(object)
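
As a sketch with made-up data, table() counts how often each value occurs, and the mode can then be read off as the value with the largest count. The variable names are hypothetical.

```r
ages <- c(23, 31, 27, 31, 45)  # hypothetical sample

# Frequency distribution: each distinct value with its count
age_counts <- table(ages)
print(age_counts)

# The mode is the value (or values) with the highest count
modal_value <- names(age_counts)[age_counts == max(age_counts)]
print(modal_value)
```

R's built-in mode() function reports how an object is stored, not the statistical mode, which is why the mode is taken from the table here.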

12
Q

What are the strengths and weaknesses of the mode as a measure of location?

A

Strength-easy to compute
Weakness-not useful if there is a large number of possible values that each occur infrequently

13
Q

What are two ways to calculate the mean in R?

A

1. sum(object)/length(object)
2. mean(object)

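
Both approaches give the same result, as this sketch with made-up numbers suggests:

```r
scores <- c(4, 8, 15, 16, 23, 42)  # hypothetical data

mean_by_hand <- sum(scores) / length(scores)  # total divided by count
mean_builtin <- mean(scores)                  # built-in function

print(mean_by_hand)  # 18
print(mean_builtin)  # 18
```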
14
Q

What are the strengths and weaknesses of the arithmetic mean as a measure of location?

A

Strength-natural and most widely used
Weakness-overly sensitive to extreme values

15
Q

What are the strengths and weaknesses of using the median as a measure of location?

A

Strength – insensitive to extreme values
Weakness-determined mainly by the middle of the sample, so it is less sensitive to the actual values of the remaining data
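
To see how differently the mean and the median react to an extreme value, here is a sketch with made-up numbers (the incomes vector is hypothetical):

```r
incomes <- c(30, 32, 35, 38, 40)  # hypothetical data
with_outlier <- c(incomes, 400)   # same data plus one extreme value

print(mean(incomes))         # 35
print(median(incomes))       # 35

print(mean(with_outlier))    # pulled far upward by the extreme value
print(median(with_outlier))  # 36.5, barely moved
```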

16
Q

What is a symmetric distribution, and what are its properties?

A

-two halves of the distribution appear like mirror images
-mean and median are approximately the same

17
Q

What are the properties of a positively skewed distribution?

A

-Skewed to the right- long tail on the right
-Mean is usually larger than the median

18
Q

What are the properties of a negatively skewed distribution?

A

-Skewed to the left - long tail on the left
-Mean is usually smaller than the median

19
Q

What is the geometric mean?

A

The antilogarithm of the arithmetic mean computed in the log scale
-useful for highly skewed lab data, such as concentrations, that are more naturally analyzed on a log scale
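
The definition translates directly into R: take logs, average them, then undo the log with exp(). The concentrations vector below is made up for illustration.

```r
concentrations <- c(1, 10, 100)  # hypothetical lab values

# Arithmetic mean on the log scale, then the antilogarithm
geometric_mean <- exp(mean(log(concentrations)))

print(geometric_mean)        # 10
print(mean(concentrations))  # 37, the arithmetic mean, for comparison
```

log() in R is the natural log by default, so exp() is the matching antilogarithm.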

20
Q

What is the antilogarithm of the natural log (ln) function?

A

The exponential function

21
Q

How do you make a matrix/array in R?

A

The cbind() function
File.name <- cbind(v1, v2, v3, ...)

22
Q

How do you turn an array into a data frame structure in R?

A

Wrap the cbind() function in as.data.frame()
Variable.name <- as.data.frame(cbind(v1, v2, v3, ...))
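
A sketch of both steps with made-up vectors. The names height, weight, measurements, and measurements_df are hypothetical.

```r
height <- c(150, 160, 170)  # hypothetical values
weight <- c(50, 60, 70)

# cbind() binds the vectors side by side as columns of a matrix
measurements <- cbind(height, weight)

# as.data.frame() converts that matrix into a data frame
measurements_df <- as.data.frame(measurements)

print(measurements)
str(measurements_df)
```

The column names of the data frame come from the names of the vectors passed to cbind().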