Descriptive Statistics Flashcards

1
Q
A

Analysis and summary of the measurements of a set of objects on a single variable

e.g.
water levels in a group of wells
ages of water in a group of aquifers
nitrate concentrations in a group of lakes

analysis of 1D vector data

2
Q

tabular and graphical summaries

A

frequency distributions (most basic type)
-count of how often values occur

relative frequency
-each frequency divided by the total number of observations (often expressed as a %)

cumulative frequency
-tallying number of values that occur up to and including each value
-how many observations are above or below a certain value

cumulative relative frequency
-cumulative frequency divided by the total number of observations
-often expressed as a %

most commonly displayed as a histogram
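
The four summaries above can be sketched in a few lines. This is a minimal illustration, not a standard routine: the function name `frequency_table` and the nitrate values are made up for the example.

```python
from collections import Counter

def frequency_table(values):
    """Return rows of (value, freq, rel_freq, cum_freq, cum_rel_freq)."""
    counts = Counter(values)
    n = len(values)
    rows = []
    cumulative = 0
    for value in sorted(counts):
        freq = counts[value]
        cumulative += freq  # running tally up to and including this value
        rows.append((value, freq, freq / n, cumulative, cumulative / n))
    return rows

# Hypothetical nitrate concentrations (mg/L), rounded to whole numbers
nitrate = [2, 3, 3, 4, 4, 4, 5, 5, 6, 8]
for row in frequency_table(nitrate):
    print(row)
```

The last row always has a cumulative frequency equal to the number of observations and a cumulative relative frequency of 1.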

3
Q

data has to be grouped: how do we know how many intervals to use?

A

usually between 5 and 15 intervals
-goal is to reduce data without masking important features

for samples of fewer than 200 observations, use Sturges' rule: k = 1 + 3.3 log10(n), where n is the number of observations
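
A quick sketch of Sturges' rule in its usual form, k = 1 + 3.3 log10(n). The function name `sturges_intervals` is hypothetical, and rounding to the nearest whole number is an assumption here; some texts round up instead.

```python
import math

def sturges_intervals(n):
    """Suggested number of intervals for n observations: k = 1 + 3.3 * log10(n)."""
    if n < 1:
        raise ValueError("need at least one observation")
    # Rounding to the nearest integer is a choice made for this sketch
    return round(1 + 3.3 * math.log10(n))

for n in (10, 50, 100, 200):
    print(n, sturges_intervals(n))
```

For n between 10 and 200 the result stays within the 5 to 15 interval guideline, or just below it for very small samples.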

4
Q

theoretical frequency distributions

A

frequency distributions based on observations are empirical (finite number of observations)

when you decrease the interval width and add groups, you get more information about the shape

theoretical (imagine what the distribution would look like if it were continuous)

use empirical distribution to estimate the properties of the theoretical distribution
(use a sample of a population to estimate the whole population)

5
Q

shapes of distributions

A

uniform (straight line across)

U-shaped (reveals polarization: more observations at either end and fewer in the middle)

J-shaped
-e.g. counting the number of defects in quality-controlled products
-most products have close to 0 defects; products with more defects become less and less common

bell-shaped
-normal distribution (very common)
-heights of males, grades, etc

skewed
-varies like a bell shape but is skewed over to one side

bi-modal
- two bumps
-e.g. heights of a population that includes both males and females

6
Q

skewed distribution

A

named according to the side the tail is on
-tail on the right: right (positively) skewed
-tail on the left: left (negatively) skewed

7
Q

central tendency

A

single summary value that suggests a typical or representative observation

tends to describe the value that occurs the most often

assess using mode (value in distribution that occurs the most frequently)
-if data is grouped, it is the interval with the most observations

median: middle value of a set of ordered data (50% of observations lower, 50% higher)
-if grouped, it is the midpoint of the interval in which the cumulative relative frequency (crf) reaches 50%
-not sensitive to extremes

mean
-most important measure of central tendency
-takes into account each value
-very sensitive to outliers in the data

weighted mean
-provides a central tendency measure of data when the observations vary in their degree of importance
-each observation is multiplied by importance weight
-sensitive to extremes
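
A short sketch of the four measures using Python's `statistics` module. The water levels and weights are invented for illustration, and `weighted_mean` is a hypothetical helper, not a library function.

```python
import statistics

def weighted_mean(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical water levels (m) with one extreme value
levels = [12, 15, 15, 18, 40]

print(statistics.mode(levels))    # 15, the most frequent value
print(statistics.median(levels))  # 15, the middle of the ordered data
print(statistics.mean(levels))    # 20, pulled upward by the extreme 40

# Hypothetical importance weights, e.g. number of readings behind each level
print(weighted_mean(levels, [2, 1, 1, 1, 5]))
```

The mode and median stay at 15 while the mean moves to 20, which shows the sensitivity to extremes noted above.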

8
Q

How do operations on observations affect the mean?

A

adding, subtracting, multiplying or dividing every observation by a constant applies the same operation to the mean

x = set of observations
c = constant
X = mean of x

mean of (x + c) = X + c
mean of (x - c) = X - c
mean of (xc) = Xc
mean of (x/c) = X/c

9
Q

transforming data

A

if data are skewed, we may want to transform them to reduce skewness before calculating central tendency
-log transform (take the natural log of each value)
-many statistical methods assume a normal distribution
-a log transform can't be applied to zero or negative values
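
A minimal sketch of a natural-log transform. The helper `log_transform` and the sample values are hypothetical; the check rejects zero as well as negative values, since the log is undefined for both.

```python
import math
import statistics

def log_transform(values):
    """Take the natural log of each value; all values must be positive."""
    if any(v <= 0 for v in values):
        raise ValueError("natural log is undefined for zero or negative values")
    return [math.log(v) for v in values]

# Hypothetical right-skewed data
skewed = [1, 2, 2, 3, 4, 5, 8, 20, 100]
logged = log_transform(skewed)

# Before the transform the mean sits well above the median;
# after it the two are much closer together
print(statistics.mean(skewed), statistics.median(skewed))
print(statistics.mean(logged), statistics.median(logged))
```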

10
Q

range

A

difference between highest and lowest observed value
-only takes into account extremes

11
Q

Mean absolute deviation (MAD)

A

-greater variation corresponds to greater deviations from the mean
-average of the absolute deviations of the observations from the mean
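
As a sketch, MAD is the average of the absolute differences between each observation and the mean. The function name and data below are illustrative only.

```python
def mean_absolute_deviation(values):
    """Average absolute deviation of the observations from their mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(v - mean) for v in values) / n

# Mean is 5; absolute deviations are 3, 1, 1, 3; their average is 2
print(mean_absolute_deviation([2, 4, 6, 8]))  # 2.0
```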

12
Q

variance

A

similar to MAD, but the deviations are squared before summing, giving an average squared deviation
-we get a better representation of the true population with more data

13
Q

standard deviation

A

square root of the variance
-aka the root mean square (RMS) deviation from the mean
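
A sketch of the standard deviation as the square root of the average squared deviation. It assumes the population form (dividing by n); the function name and data are illustrative.

```python
import math
import statistics

def population_std_dev(values):
    """Square root of the average squared deviation from the mean."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return math.sqrt(variance)

data = [2, 4, 6, 8]  # population variance is 5
print(population_std_dev(data))   # square root of 5, about 2.236
print(statistics.pstdev(data))    # same quantity from the standard library
```

Unlike the variance, the result is in the same units as the original observations.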

14
Q

variation in a normal distribution

A

-about 68% of data are within 1 standard deviation of the mean
-about 95% of data are within 2 standard deviations
-about 99.7% (nearly all) of data are within 3 standard deviations
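
These proportions can be recovered from the standard normal distribution. The sketch below uses `statistics.NormalDist` (Python 3.8 or later); the helper name `proportion_within` is hypothetical.

```python
from statistics import NormalDist

def proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    standard_normal = NormalDist(mu=0, sigma=1)
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(proportion_within(k), 4))
```

The values come out near 0.68, 0.95 and 0.997 for 1, 2 and 3 standard deviations.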

15
Q

standardized z scores

A

distance an observation is from the mean in standard deviations

z = (value - mean) / standard deviation

-can find the proportion of data that falls above or below a certain z score
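
A sketch of computing a z score and the proportion of data below it. It assumes the data are normally distributed; the function names and the example numbers (mean 100, standard deviation 15) are made up.

```python
from statistics import NormalDist

def z_score(value, mean, std_dev):
    """Distance of value from the mean, in standard deviations."""
    return (value - mean) / std_dev

def proportion_below(z):
    """Proportion of a normal distribution that falls below a given z score."""
    return NormalDist().cdf(z)

z = z_score(130, mean=100, std_dev=15)
print(z)                        # 2.0
print(proportion_below(z))      # about 0.977
print(1 - proportion_below(z))  # proportion above, about 0.023
```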

16
Q

how do operations on observations affect standard deviation