statistics Flashcards by Arina Zhdanova

kinds of variables

categorical (qualitative)
numerical (quantitative) => discrete (counting), continuous (measuring)

population def and kinds

= set from which the data is collected
finite (everything in real life), infinite (mathematical concepts)

sampling

= selecting the group from which data is collected

sampling methods

statistical (selection based on chance)
non-statistical (selection based on convenience)

statistical sampling methods

simple random sampling => every possible sample of a specific size has an equal chance of being chosen
stratified => the population is divided into strata based on the values of interest, and random sampling is then carried out within each stratum
systematic => the first element is selected randomly, and then every nth element (n = sampling interval) is selected
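
A minimal Python sketch of the three methods above. The population of 20 numbered units, the two strata and all function names are made up for illustration.

```python
import random

def simple_random_sample(population, size, rng):
    # every possible sample of the given size is equally likely
    return rng.sample(population, size)

def stratified_sample(strata, size_per_stratum, rng):
    # random sampling is done separately inside each stratum
    return {name: rng.sample(units, size_per_stratum)
            for name, units in strata.items()}

def systematic_sample(population, interval, rng):
    # random start among the first `interval` units, then every interval-th unit
    start = rng.randrange(interval)
    return population[start::interval]

rng = random.Random(0)                      # fixed seed, only for repeatability
population = list(range(1, 21))             # hypothetical units 1..20
strata = {"low": population[:10], "high": population[10:]}  # hypothetical strata

srs = simple_random_sample(population, 5, rng)
strat = stratified_sample(strata, 2, rng)
syst = systematic_sample(population, 4, rng)
print(srs, strat, syst)
```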

non-statistical sampling methods

quota sampling => non-random selection of a predetermined number of units
availability sampling => based on convenience

statistical inference

= using characteristics of a sample to draw conclusions about the population

descriptive statistics

displaying and summarizing data

inferential statistics

choosing a representative sample, drawing conclusions about the population from the sample, making predictions, …

absolute vs relative frequency

absolute = number of times a value occurs in the data
relative = absolute frequency / total number of data
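
A short Python sketch of both frequencies, using a made-up list of observations (the data and variable names are assumptions for illustration).

```python
from collections import Counter

data = ["A", "B", "A", "C", "A", "B"]   # made-up observations

absolute = Counter(data)                # value -> number of occurrences
total = len(data)                       # total number of data
relative = {value: count / total for value, count in absolute.items()}

print(absolute)
print(relative)
```

The relative frequencies always add up to 1 (100%).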

cumulative absolute vs relative frequency

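cum. absolute => sum of the absolute frequencies of all values less than or equal to the given value (final = total number of data)
cum. relative => sum of the relative frequencies of all values less than or equal to the given value (final = 100%)

obtained by adding consecutive frequencies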
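
As a rough illustration, the running totals can be built with `itertools.accumulate`. The values and frequencies below are invented.

```python
from itertools import accumulate

values = [1, 2, 3, 4]       # made-up ordered values
absolute = [2, 5, 2, 1]     # made-up absolute frequencies
total = sum(absolute)

cum_absolute = list(accumulate(absolute))         # running total of frequencies
cum_relative = [c / total for c in cum_absolute]  # running share of the data

for v, ca, cr in zip(values, cum_absolute, cum_relative):
    print(v, ca, f"{cr:.0%}")
```

The last cumulative absolute frequency equals the total number of data, and the last cumulative relative frequency is 100%.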

width, class width

width => difference between consecutive grades
class width = beginning of the next class - beginning of the class in question

class boundaries

upper boundary of the previous class = lower boundary of the current class
class width is evident

histogram

= graphical representation of a frequency distribution table
used for numerical values
no spaces between columns
variable axis has direction

bar graph

represents qualitative data
variable axis has no direction

distribution curve

or ogive

= smooth histogram, indicating the general behaviour of the histogram

distribution in a distribution curve

symmetrical
skewed to the left
skewed to the right

modality of a distribution curve

unimodal => one peak
bimodal => two peaks
trimodal => three peaks

height of peaks doesn’t matter

uniform distribution

flat (straight-line) histogram
each class has roughly the same frequency

bell-shaped distribution

unimodal, symmetric distribution
Gaussian (normal) curve

approximation of central tendencies on a distribution curve

mode => x-coordinate of the highest peak
median => x-coordinate of the vertical line that halves the area of the distribution curve
mean => symmetric distribution: coincides with the median and mode at the center of the distribution; skewed distribution: lies on the tail side of the median

measures of central tendency

mode = value with the highest frequency
median = value in the middle position of data in ascending order
arithmetic mean (average) = sum of all data/number of data
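
A quick check of the three measures with Python's standard `statistics` module, on a made-up data set.

```python
import statistics

data = [2, 3, 3, 5, 7]             # made-up data, already in ascending order

mode = statistics.mode(data)       # value with the highest frequency
median = statistics.median(data)   # value in the middle position
mean = statistics.mean(data)       # sum of all data / number of data

print(mode, median, mean)
```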

modal class

class with the most elements

average of data in k classes with mid-interval values m1, m2, …, mk

(m1f1 + m2f2 + … + mkfk)/n
where fi is the frequency of class i and n = f1 + f2 + … + fk
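
A small Python sketch of this formula, assuming three invented classes with mid-interval values 5, 15, 25 and frequencies 2, 5, 3.

```python
midpoints = [5, 15, 25]     # m1, m2, m3 (made-up mid-interval values)
frequencies = [2, 5, 3]     # f1, f2, f3 (made-up class frequencies)

n = sum(frequencies)        # total number of data
grouped_mean = sum(m * f for m, f in zip(midpoints, frequencies)) / n

print(grouped_mean)
```

Because every value in a class is replaced by the class midpoint, the result is only an estimate of the true mean.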

measures of spread (variability)

how well the typical value represents the list
range, variance, standard deviation, interquartile range, percentiles

range

largest value - smallest value

variance

V = mean of the squared differences of the values from the mean = mean of the squared values - square of the mean

sample variance

s^2 = (sum of the squared differences between each value and the average)/(n-1)
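
A sketch comparing the two variances on made-up data. The population variance divides the sum of squared deviations by n, the sample variance by n-1. The results are checked against the standard library's `pvariance` and `variance`.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]     # made-up data
n = len(data)
mean = sum(data) / n

squared_deviations = sum((x - mean) ** 2 for x in data)

v_definition = squared_deviations / n                  # mean of squared deviations
v_shortcut = sum(x * x for x in data) / n - mean ** 2  # mean of squares - square of mean
s2 = squared_deviations / (n - 1)                      # sample variance

print(v_definition, v_shortcut, s2)
print(statistics.pvariance(data), statistics.variance(data))
```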

percentiles

separate large ordered data sets into hundredths C1, C2, ..., C99

quartiles | +interquartile range

separate data into 4 quarters: Q1, Q2 (median), Q3
IQR = Q3 - Q1
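
A sketch of one common way to find the quartiles: take the median of the lower half and of the upper half, leaving out the middle value when the number of data is odd. This convention is an assumption; other conventions (and some software) give slightly different values. The data and the helper name `quartiles` are made up.

```python
import statistics

def quartiles(sorted_data):
    # median-of-halves convention (an assumption, not the only definition)
    n = len(sorted_data)
    half = n // 2
    lower = sorted_data[:half]                # values below the median position
    upper = sorted_data[half + (n % 2):]      # values above the median position
    return (statistics.median(lower),
            statistics.median(sorted_data),
            statistics.median(upper))

data = sorted([2, 4, 4, 5, 6, 7, 8, 9])       # made-up data
q1, q2, q3 = quartiles(data)
iqr = q3 - q1

print(q1, q2, q3, iqr)
```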

five-number data summary

* x min, Q1, Q2, Q3, x max
* excludes outliers
* a box-and-whisker plot graphically depicts the summary

exclusion of outliers

* all data outside of [lower fence, upper fence] are outliers
* lower fence = Q1 - 1.5 × IQR
* upper fence = Q3 + 1.5 × IQR
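
A minimal sketch of the fence rule. The data are made up, and Q1 = 3.5 and Q3 = 8.5 are taken as given (they depend on the quartile convention used).

```python
def find_outliers(data, q1, q3):
    # anything outside [lower fence, upper fence] counts as an outlier
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return lower_fence, upper_fence, outliers

data = [1, 3, 4, 5, 6, 7, 8, 9, 30]           # made-up data
lower_fence, upper_fence, outliers = find_outliers(data, q1=3.5, q3=8.5)

print(lower_fence, upper_fence, outliers)
```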

bivariate analysis

* analysis of **2 variables**, generally **their correlation**, over the same population
* the **explanatory** (independent) variable seemingly explains the change in the **response** (dependent) variable
* graphically represented with a **scatter plot**

linear correlation

* how close the points in a scatter plot are to a line
* represented with Pearson's (product-moment) correlation coefficient

weak, moderate, strong, perfect, no correlation

* weak => points only loosely follow a line
* moderate => points roughly follow a line
* strong => points lie close to a line
* perfect correlation => points form a line
* no correlation => no line

Pearson's (product-moment) correlation coefficient | range, undefined, sample vs population

* -1 ≤ r ≤ 1
* does not measure gradient
* horizontal line => r is undefined
* the same formula gives r for a sample and for a population (the n and n-1 factors cancel)

value of r means

* r > 0 => increasing line
* r < 0 => decreasing line
* r = 1 => perfect positive correlation
* r = -1 => perfect negative correlation
* r = 0 => no linear correlation
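
A hand-rolled sketch of Pearson's r on made-up points. The helper name `pearson_r` is an assumption, and returning `None` for the undefined case is a design choice of this sketch. It shows that two perfect lines with different gradients both give r = 1.

```python
import math

def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None                      # undefined: horizontal (or vertical) line
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2, 4, 6, 8, 10]))        # perfect increasing line, gradient 2
print(pearson_r(xs, [0.5, 1, 1.5, 2, 2.5]))   # perfect increasing line, gradient 0.5
print(pearson_r(xs, [10, 8, 6, 4, 2]))        # perfect decreasing line
print(pearson_r(xs, [3, 3, 3, 3, 3]))         # horizontal line
```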

regression line (model)

= line of best fit, trend line
= a line that represents points in a plane as well as possible
=> the line for which the sum of squared vertical distances d1^2 + d2^2 + ... + dn^2 is minimal is the best

regression line of y on x | form of line, what it does, GDC, used for

* **y = ax + b**
* minimizes the sum of squared **vertical distances** to the line
* GDC: **X on Xlist**, Y on Ylist (*y = ax+b*)
* used to **estimate values of y** given x

regression line of x on y | form of line, what it does, GDC, used for

* **x = ay + b**
* minimizes the sum of squared **horizontal distances** to the line
* GDC: **Y on Xlist**, X on Ylist (*y=ax+b*)
* used to **estimate values of x** given y

where do regression lines y on x and x on y meet

at (average of x, average of y)
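
A sketch that fits both regression lines by least squares on made-up points and checks that each passes through (average of x, average of y). The helper name `least_squares` is an assumption. Swapping the two lists mirrors the GDC trick of putting Y on Xlist.

```python
def least_squares(us, vs):
    # fit v = a*u + b, minimizing the squared distances in the v direction
    n = len(us)
    mean_u, mean_v = sum(us) / n, sum(vs) / n
    a = (sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
         / sum((u - mean_u) ** 2 for u in us))
    b = mean_v - a * mean_u
    return a, b

xs = [1, 2, 3, 4, 5]            # made-up data
ys = [2, 3, 5, 4, 6]

a, b = least_squares(xs, ys)    # y on x: y = a*x + b
c, d = least_squares(ys, xs)    # x on y: x = c*y + d

mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
print(a * mean_x + b, mean_y)   # y on x evaluated at the mean of x
print(c * mean_y + d, mean_x)   # x on y evaluated at the mean of y
```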

interpolation

estimation of y given x inside the range of the given data; usually more accurate than extrapolation

extrapolation

* estimation of y given x outside the range of the given data
* we don't know the relationship between the variables outside the given data set => less accurate
* aim to extrapolate only close to the given data
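
A small sketch that labels an estimate as interpolation or extrapolation, depending on whether x lies within the observed x values. The line y = 0.9x + 1.3, the data and the helper name `estimate` are all made up.

```python
def estimate(a, b, x, observed_xs):
    # y on x estimate, plus a label saying how far to trust it
    inside = min(observed_xs) <= x <= max(observed_xs)
    kind = "interpolation" if inside else "extrapolation"
    return a * x + b, kind

observed_xs = [1, 2, 3, 4, 5]                 # made-up x values of the data
a, b = 0.9, 1.3                               # hypothetical regression line y = 0.9x + 1.3

print(estimate(a, b, 2.5, observed_xs))       # x inside the data
print(estimate(a, b, 9.0, observed_xs))       # x outside the data
```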

statistics Flashcards

(43 cards)