statistics Flashcards

1
Q

kinds of variables

A
  1. categorical (qualitative)
  2. numerical (quantitative) => discrete (counting), continuous (measuring)
2
Q

population def and kinds

A

= set from which the data is collected
finite (anything in real life), infinite (mathematical concepts)

3
Q

sampling

A

= selecting the group (sample) within the population from which data is collected

4
Q

sampling methods

A
  1. statistical (selection based on chance)
  2. non-statistical (selection based on convenience)
5
Q

statistical sampling methods

A
  1. simple random sampling => every possible sample of a specific size has an equal chance of being chosen
  2. stratified => the population is divided into strata based on a characteristic of interest, then a random sample is drawn within each stratum
  3. systematic => the first element is selected randomly, then every nth element (n = sampling interval) is selected
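
The three methods above can be sketched in a few lines of Python. This is only an illustration under assumptions: the population (units 1 to 20), the even/odd strata, and all function names are hypothetical, and the seed is fixed so the run is repeatable.

```python
import random

def simple_random_sample(population, size, rng):
    # Every possible sample of the given size is equally likely.
    return rng.sample(population, size)

def stratified_sample(population, stratum_of, per_stratum, rng):
    # Group units by stratum, then sample randomly inside each stratum.
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    return {name: rng.sample(units, per_stratum) for name, units in strata.items()}

def systematic_sample(population, interval, rng):
    # Random start within the first interval, then every interval-th unit.
    start = rng.randrange(interval)
    return population[start::interval]

rng = random.Random(0)              # fixed seed (assumption, for repeatability)
population = list(range(1, 21))     # hypothetical population: units 1..20
srs = simple_random_sample(population, 5, rng)
strat = stratified_sample(population, lambda u: "even" if u % 2 == 0 else "odd", 2, rng)
syst = systematic_sample(population, 4, rng)
```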
6
Q

non-statistical sampling methods

A
  1. quota sampling => non-random selection of a predetermined number of units
  2. availability sampling => based on convenience
7
Q

statistical inference

A

= using characteristics of a sample to draw conclusions about the population

8
Q

descriptive statistics

A

displaying and summarizing data

9
Q

inferential statistics

A

choosing a representative sample, drawing conclusions about the population from the sample, making predictions, …

10
Q

absolute vs relative frequency

A

absolute = number of times a value occurs in the data
relative = absolute frequency / total number of data
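
A small Python sketch of both frequencies, assuming a made-up data list (the values below are hypothetical):

```python
from collections import Counter

data = [2, 3, 3, 5, 5, 5, 7, 7]   # hypothetical data
n = len(data)

# Absolute frequency: how many times each value occurs.
absolute = Counter(data)

# Relative frequency: absolute frequency divided by the total number of data.
relative = {value: count / n for value, count in absolute.items()}
```

Here the value 5 occurs 3 times out of 8, so its relative frequency is 3/8.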

11
Q

cumulative absolute vs relative frequency

A

cum. abs. => number of data less than or equal to a value (final = total number of data)
cum. relative => proportion of data less than or equal to a value (final = 100%)

found by adding consecutive frequencies
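
As a rough illustration, cumulative frequencies can be built as running totals over the values in ascending order. The data list is hypothetical.

```python
from collections import Counter
from itertools import accumulate

data = [2, 3, 3, 5, 5, 5, 7, 7]   # hypothetical data
counts = Counter(data)
values = sorted(counts)           # distinct values in ascending order

# Running total of absolute frequencies; the last entry equals len(data).
cum_abs = list(accumulate(counts[v] for v in values))

# Cumulative relative frequency; the last entry equals 1.0 (100%).
cum_rel = [c / len(data) for c in cum_abs]
```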

12
Q

width, class width

A

width => difference between subsequent grades
class width = beginning of next class - beginning of said class

13
Q

class boundaries

A
  • upper boundary of the previous class = lower boundary of the current class
  • class width is evident
14
Q

histogram

A
  • = graphical representation of a frequency distribution table
  • used for numerical values
  • no spaces between columns
  • variable axis has direction
15
Q

bar graph

A

represents qualitative data
variable axis has no direction

16
Q

distribution curve

or ogive

A

= smoothed histogram, indicating the general shape of the distribution (strictly, "ogive" usually names the cumulative frequency curve)

17
Q

distribution in a distribution curve

A
  • symmetrical
  • skewed to the left
  • skewed to the right
18
Q

modality of a distribution curve

A
  1. unimodal => one peak
  2. bimodal => two peaks
  3. trimodal => three peaks

height of peaks doesn’t matter

19
Q

uniform distribution

A
  • flat histogram (the tops of the columns form a roughly horizontal line)
  • each class has roughly the same frequency
20
Q

bell-shaped distribution

A

unimodal, symmetric distribution
Gaussian (normal) curve

21
Q

approximation of central tendencies on a distribution curve

A
  • mode => x-coordinate of the highest peak
  • median => x-coordinate of the vertical line that halves the area of the distribution curve
  • mean => in a symmetric distribution it coincides with the median (and the mode, if unimodal) at the center; in a skewed distribution it lies on the tail side of the median
22
Q

measures of central tendency

A
  • mode = value with the highest frequency
  • median = value in the middle position of the data in ascending order (mean of the two middle values when the number of data is even)
  • arithmetic mean (average) = sum of all data/number of data
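
The three measures can be checked with Python's standard `statistics` module. The data list is hypothetical.

```python
import statistics

data = [4, 1, 3, 3, 8, 5]           # hypothetical data

mode = statistics.mode(data)        # value with the highest frequency
median = statistics.median(data)    # middle of the sorted data 1, 3, 3, 4, 5, 8
mean = statistics.mean(data)        # sum of all data / number of data
```

With six values, the median is the mean of the two middle values, 3 and 4.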
23
Q

modal class

A

class with the most elements

24
Q

average of data in k classes with mid-interval values m1, m2, …, mk

A

(m1f1 + m2f2 + … + mkfk)/n, where fi is the frequency of class i and n = f1 + f2 + … + fk
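
A minimal sketch of this formula, assuming three hypothetical classes with mid-interval values 5, 15, 25 and frequencies 2, 5, 3:

```python
def grouped_mean(midpoints, frequencies):
    # n is the total number of data, i.e. the sum of the class frequencies.
    n = sum(frequencies)
    # Each class contributes (mid-interval value) * (frequency).
    return sum(m * f for m, f in zip(midpoints, frequencies)) / n

midpoints = [5, 15, 25]      # hypothetical mid-interval values m1..m3
frequencies = [2, 5, 3]      # hypothetical frequencies f1..f3
estimate = grouped_mean(midpoints, frequencies)   # (10 + 75 + 75) / 10
```

The result is only an estimate of the true mean, because every value in a class is replaced by the class midpoint.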

25
Q

measures of spread (variability)

A

how well the typical (central) value represents the data
range, variance, standard deviation, interquartile range, percentiles

26
Q

range

A

largest value - smallest value

27
Q

variance

A

V = (sum of the squared differences of each value from the mean)/n
= (sum of squared values)/n - (mean)^2

28
Q

sample variance

A

s^2 = (sum of the squared differences of each value from the average)/(n-1)
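
The sketch below compares the variance (dividing by n) with the sample variance (dividing by n-1) on a hypothetical data list, and cross-checks both against the standard `statistics` module.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]     # hypothetical data
n = len(data)
mean = sum(data) / n

# Sum of squared differences from the mean.
squared_diffs = sum((x - mean) ** 2 for x in data)

variance = squared_diffs / n                            # divide by n
shortcut = sum(x * x for x in data) / n - mean ** 2     # mean of squares - squared mean
sample_variance = squared_diffs / (n - 1)               # divide by n - 1

# Library equivalents for comparison.
lib_variance = statistics.pvariance(data)
lib_sample_variance = statistics.variance(data)
```

The standard deviation is the square root of whichever variance is used.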

29
Q

percentiles

A

separate large ordered data sets into hundredths
C1, C2, …, C99

30
Q

quartiles

+interquartile range

A

separate ordered data into 4 quarters
Q1, Q2 (median), Q3
IQR = Q3 - Q1
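
Quartile conventions differ between textbooks, calculators, and software. The sketch below assumes one common convention: Q1 and Q3 are the medians of the lower and upper halves, leaving out the middle value when the number of data is odd. The function name and data are hypothetical.

```python
import statistics

def quartiles(data):
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]              # lower half of the data
    upper = ordered[half + n % 2:]      # upper half; skips the middle value when n is odd
    q1 = statistics.median(lower)
    q2 = statistics.median(ordered)     # Q2 is the median of all data
    q3 = statistics.median(upper)
    return q1, q2, q3

q1, q2, q3 = quartiles([1, 2, 3, 4, 5, 6, 7, 8, 9])   # hypothetical data
iqr = q3 - q1
```

Other conventions (for example, interpolating between positions) can give slightly different Q1 and Q3 for the same data.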

31
Q

five-number data summary

A
  • x min, Q1, Q2, Q3, x max
  • excludes outliers
  • box and whiskers plot graphically depicts the summary
32
Q

exclusion of outliers

A
  • all data outside of [lower fence, upper fence] are outliers
  • lower fence = Q1 - 1.5IQR
  • upper fence = Q3 + 1.5IQR
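
A short sketch of the fence rule. The quartiles are passed in as given values, and the data and function names are hypothetical.

```python
def fences(q1, q3):
    iqr = q3 - q1
    # Lower fence = Q1 - 1.5 IQR, upper fence = Q3 + 1.5 IQR.
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(data, q1, q3):
    lower, upper = fences(q1, q3)
    # Anything outside [lower fence, upper fence] counts as an outlier.
    return [x for x in data if x < lower or x > upper]

data = [1, 2, 3, 4, 5, 6, 7, 8, 30]          # hypothetical data
lower, upper = fences(2.5, 7.5)              # assumed quartiles Q1 = 2.5, Q3 = 7.5
outliers = find_outliers(data, 2.5, 7.5)
```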
33
Q

bivariate analysis

A
  • analysis of 2 variables, generally their correlation, over the same population
  • the explanatory (independent) variable seemingly explains the change in the response (dependent) variable
  • graphically represented with a scatter plot
34
Q

linear correlation

A
  • how close to a line points in a scatter plot are
  • represented with Pearson’s (product-moment) correlation coefficient
35
Q

weak, moderate, strong, perfect, no correlation

A
  • weak => points only vaguely suggest a line
  • moderate => points roughly resemble a line
  • strong => close to a line
  • perfect correlation => forms a line
  • no correlation => no line
36
Q

Pearson’s (product-moment) correlation coefficient

range, undefined, sample vs population

A
  • -1 ≤ r ≤ 1
  • does not measure gradient
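  • horizontal (or vertical) line => r is undefined, since one variable does not vary
  • sample r = population r (the formula gives the same value whether n or n-1 is used, because the factor cancels)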
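
The card does not state a formula for r. The sketch below uses the standard product-moment formula, r = Sxy / sqrt(Sxx · Syy), and returns None when r is undefined. The function name and data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None   # undefined: one variable does not vary (horizontal or vertical line)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4]
r_gentle = pearson_r(xs, [2 * x for x in xs])     # points on y = 2x
r_steep = pearson_r(xs, [10 * x for x in xs])     # points on y = 10x
r_flat = pearson_r(xs, [5, 5, 5, 5])              # horizontal line
r_mixed = pearson_r(xs, [1, 3, 2, 4])
```

Both `r_gentle` and `r_steep` come out as 1 even though the gradients differ, which illustrates that r does not measure gradient.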
37
Q

value of r means

A
  • r > 0 => increasing line
  • r < 0 => decreasing line
  • r = 1 => perfect positive correlation
  • r = -1 => perfect negative correlation
  • r = 0 => no linear correlation
38
Q

regression line (model)

A

= line of best fit, trend line = a line that represents points in a plane as well as possible => the line that minimizes the sum of the squares of the vertical distances from the points to the line
the line with the smallest d1^2 + d2^2 + … + dn^2 is the best

39
Q

regression line of y on x

form of line, what it does, GDC, used for

A
  • y = ax + b
  • minimizes the sum of the squared vertical distances to the line
  • GDC: X on Xlist, Y on Ylist (y = ax+b)
  • used to estimate values of y given x
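
A GDC does this fit internally. As a hedged sketch, the standard least-squares formulas a = Sxy / Sxx and b = (mean of y) - a · (mean of x) give the same line. The function name and data are hypothetical.

```python
def regression_y_on_x(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx               # gradient of y = ax + b
    b = mean_y - a * mean_x     # intercept
    return a, b

xs = [1, 2, 3, 4]               # hypothetical data
ys = [2, 4, 5, 8]
a, b = regression_y_on_x(xs, ys)
y_estimate = a * 2.5 + b        # estimate of y given x = 2.5
```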
40
Q

regression line of x on y

form of line, what it does, GDC, used for

A
  • x = ay + b
  • minimizes the sum of the squared horizontal distances to the line
  • GDC: Y on Xlist, X on Ylist (y=ax+b)
  • estimation of values of x given y
41
Q

where do regression lines y on x and x on y meet

A

at (average of x, average of y)
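
This can be checked numerically. The sketch below fits both lines with the same least-squares helper, swapping the lists for x on y (as on the GDC), and confirms that each line passes through the mean point. The helper name and data are hypothetical.

```python
def fit_line(us, vs):
    # Least-squares line v = a*u + b (regression of v on u).
    n = len(us)
    mean_u = sum(us) / n
    mean_v = sum(vs) / n
    suv = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
    suu = sum((u - mean_u) ** 2 for u in us)
    a = suv / suu
    return a, mean_v - a * mean_u

xs = [1, 2, 3, 4, 5]            # hypothetical data
ys = [2, 4, 5, 4, 5]
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

a1, b1 = fit_line(xs, ys)       # y on x: y = a1*x + b1
a2, b2 = fit_line(ys, xs)       # x on y: x = a2*y + b2

y_at_mean_x = a1 * mean_x + b1  # should equal mean_y
x_at_mean_y = a2 * mean_y + b2  # should equal mean_x
```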

42
Q

interpolation

A

estimation of y given x inside the range of the given data; usually more accurate than extrapolation

43
Q

extrapolation

A
  • estimation of y given x outside the given data
  • we don’t know the relationship of variables outside the given data set => less accurate
  • aim to extrapolate close to the given data
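
As a small illustration, an estimate from a regression line y = ax + b can be labelled by whether x lies inside the range of the data. The function name, the coefficients, and the data are hypothetical.

```python
def estimate_y(x, a, b, xs):
    # Inside [min(xs), max(xs)] the estimate is an interpolation,
    # outside that range it is an extrapolation (less reliable).
    kind = "interpolation" if min(xs) <= x <= max(xs) else "extrapolation"
    return a * x + b, kind

xs = [1, 2, 3, 4]                              # hypothetical x data
inside = estimate_y(2.5, 1.9, 0.0, xs)         # assumed line y = 1.9x
outside = estimate_y(10, 1.9, 0.0, xs)
```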