Statistics Midterm Flashcards
Descriptive statistics
Consists of the collection, organisation, summarisation and presentation of data.
Inferential statistics
Consists of generalising from samples to population, performing estimations, hypothesis testing, determining relationships among variables, and making predictions.
Population
Consists of all subjects that are being studied.
Sample
Is a group of subjects selected from the population.
Parameter
A measure that describes a characteristic of a population.
Statistic
A measure that describes a characteristic of a sample.
Selection bias
A distortion of evidence or data that arises from the way that the data are collected.
Variable
A characteristic or attribute that can assume different values.
Data
The values that variables can assume.
Qualitative variable
The characteristic being studied is non-numeric.
Quantitative variable
Information is reported numerically.
Discrete variable
Can only take on values that can be counted. (ex number of people)
Continuous variable
Can take on any value within an interval, so infinitely many values are possible. (ex height / time)
Nominal level of measurement
Classifies data into mutually exclusive (no overlapping), exhaustive categories in which no order or ranking can be imposed.
Ordinal level of measurement
Classifies data into categories that can be ranked, but precise differences between the ranks do not exist.
Interval level of measurement
Ranks data, and precise differences between units of measurement do exist. However, there is no meaningful zero. (ex temperature in °C or °F)
Ratio level of measurement
Ranks data, and precise differences between units of measurement do exist. A true zero exists.
Frequency table (categorical frequency distribution)
The organisation of qualitative data into table form, using mutually exclusive classes and showing the number of observations in each class.
Relative frequency
Class frequency / total frequency
Pie chart
degrees = class relative frequency * 360
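A minimal Python sketch of the relative frequency and pie chart formulas. The blood-type data and the variable names are made up for illustration.

```python
from collections import Counter

# Hypothetical sample of a qualitative variable (blood types).
blood_types = ["A", "B", "O", "O", "A", "AB", "O", "A", "O", "B"]

counts = Counter(blood_types)      # class frequencies
total = sum(counts.values())       # total frequency

table = {}
for category, freq in counts.items():
    rel_freq = freq / total        # relative frequency = class frequency / total frequency
    degrees = rel_freq * 360       # pie chart angle = relative frequency * 360
    table[category] = (freq, rel_freq, degrees)

for category, (freq, rel_freq, degrees) in table.items():
    print(f"{category}: f={freq}, relative={rel_freq:.2f}, degrees={degrees:.1f}")
```

The relative frequencies should add up to 1 and the degrees to 360, which is a quick way to check a table.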
Building frequency tables
You should have: class limits, class boundaries, class midpoint, frequency, cumulative frequency, relative frequency, and sometimes cumulative relative frequency.
Rules: use between 5 and 20 classes; the classes must be mutually exclusive, continuous (no gaps between classes), exhaustive (cover the full data range), and equal in width.
Procedure: determine the range of the data, decide the number of classes, decide the class width, set the class limits and boundaries, then count the number of occurrences in each class.
Cumulative frequency
The total number of values that are less than a given upper class boundary.
Class midpoint
(lower class limit + upper class limit) / 2
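A rough Python sketch of building a frequency table, including cumulative frequency and class midpoints. The scores, the choice of 5 classes, and the rule used to round the width up are assumptions for illustration; the boundaries assume whole-number data (limits plus or minus 0.5).

```python
# Hypothetical raw data (whole numbers), e.g. exam scores.
data = [52, 55, 58, 61, 63, 64, 67, 70, 71, 73, 75, 78, 80, 84, 89]

num_classes = 5                                    # chosen by hand (between 5 and 20)
data_range = max(data) - min(data)                 # highest value - lowest value
width = data_range // num_classes + 1              # round the width up to a whole number

rows = []
cumulative = 0
for i in range(num_classes):
    lower = min(data) + i * width                  # lower class limit
    upper = lower + width - 1                      # upper class limit
    boundaries = (lower - 0.5, upper + 0.5)        # class boundaries
    midpoint = (lower + upper) / 2                 # class midpoint
    freq = sum(lower <= x <= upper for x in data)  # frequency
    cumulative += freq                             # cumulative frequency
    rel_freq = freq / len(data)                    # relative frequency
    rows.append((lower, upper, boundaries, midpoint, freq, cumulative, rel_freq))

for row in rows:
    print(row)
```

The last cumulative frequency should equal the number of data values; if it does not, a value fell outside the classes.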
Histogram
Displays the data by using vertical bars of various heights to represent the frequencies. Vertical axis: frequencies. Horizontal axis: class boundaries.
Frequency polygon
Displays the data by using lines that connect points plotted for the frequencies at the midpoints of the classes. Vertical axis: frequencies. Horizontal axis: class midpoints.
Ogive / cumulative frequency
Represents the cumulative frequencies for the classes in a frequency distribution. The line never goes down. Vertical axis: cumulative frequency. Horizontal axis: upper class boundaries.
Mean
The sum of the values, divided by the total number of values.
Rounding rule of the mean
The mean should be rounded to one more decimal place than occurs in the raw data.
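A small Python sketch of the mean and its rounding rule. The values are hypothetical and recorded as whole numbers, so the mean is rounded to one decimal place.

```python
# Hypothetical raw data recorded as whole numbers.
values = [12, 15, 11, 14, 13, 17]

mean = sum(values) / len(values)   # sum of the values / total number of values
rounded_mean = round(mean, 1)      # one more decimal place than the raw data

print(rounded_mean)
```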
X-bar x̄
Sample mean
μ
Population mean
Weighted mean
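Used when some values are more important than others: multiply each value by its weight, add the results, and divide by the sum of the weights. (ex calculating a course grade)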
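A short Python sketch of a weighted mean for a course grade. The grades and weights are invented for illustration.

```python
# Hypothetical course components as (grade, weight) pairs.
components = [(80, 0.2), (70, 0.3), (90, 0.5)]

weighted_sum = sum(grade * weight for grade, weight in components)
total_weight = sum(weight for _, weight in components)
weighted_mean = weighted_sum / total_weight       # sum(grade * weight) / sum(weights)

plain_mean = sum(grade for grade, _ in components) / len(components)

print(round(weighted_mean, 1), round(plain_mean, 1))
```

Here the weighted mean is higher than the plain mean because the best grade carries the largest weight.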
Median
The midpoint of an ordered data set (when the data is arranged in order). With an even number of values, it is the mean of the two middle values.
Mode
The value that occurs most often in the dataset.
If no value occurs more often than the others, there is no mode. If two values tie for most often, the data set is called bimodal.
Midrange
(lowest value + highest value) / 2
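A Python sketch of the median, mode and midrange using the standard `statistics` module. The data values are hypothetical.

```python
import statistics

data = [3, 7, 7, 2, 9, 7, 4, 2]               # hypothetical data set

ordered = sorted(data)                         # arrange the data in order first
median = statistics.median(data)               # even count: mean of the two middle values
modes = statistics.multimode(data)             # list of the most frequent value(s)
midrange = (min(data) + max(data)) / 2         # (lowest value + highest value) / 2

# A data set with two modes (bimodal).
two_modes = statistics.multimode([1, 1, 2, 2, 3])

print(ordered, median, modes, midrange, two_modes)
```

Note that `multimode` returns every value when all values occur equally often, so a result as long as the list of distinct values means there is no mode.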
Normally distributed data
When the distribution is bell-shaped and symmetric. Zero skewness. It has approximately the same number of values on both sides, and the peak is in the middle.
The mean, median and mode are approximately equal, and the mean is usually used as the average.
Positive skewness
When the peak is on the left and the tail stretches to the right. (ex salaries)
The mean is pulled towards the tail, so the median is usually used as the average.
Negative skewness
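When the peak is on the right and the tail stretches to the left. (ex exam grades where most students score high)
The mean is pulled towards the tail, so the median is usually used as the average.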
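A small Python illustration of why the median is preferred for skewed data. The salaries are invented; one very high value creates a long right tail.

```python
import statistics

# Hypothetical salaries in thousands; the last value is an extreme high salary.
salaries = [30, 32, 35, 36, 38, 40, 150]

mean_salary = statistics.mean(salaries)       # pulled up by the extreme value
median_salary = statistics.median(salaries)   # middle value, unaffected by the tail

print(round(mean_salary, 1), median_salary)
```

In this sample the mean is larger than every salary except the extreme one, while the median still sits in the middle of the typical salaries.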
Range
Highest value - lowest value
Variance
The average of the squares of the distance each value is from the mean.
- Find the mean.
- Subtract the mean from each value.
- Square each result.
- Add the squared results together.
- (sample) Divide the sum by the total number of values in the data set minus one.
- (population) Divide the sum by the total number of values in the data set.
Standard deviation
The square root of the variance.
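The variance steps and the standard deviation can be sketched in Python as follows. The data values are hypothetical; the standard `statistics` module is used only to cross-check the hand calculation.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]                       # hypothetical data set

mean = sum(data) / len(data)                          # find the mean
squared_devs = [(x - mean) ** 2 for x in data]        # (value - mean), then square
sum_of_squares = sum(squared_devs)                    # add the squared results

sample_variance = sum_of_squares / (len(data) - 1)    # sample: divide by n - 1
population_variance = sum_of_squares / len(data)      # population: divide by n

sample_sd = math.sqrt(sample_variance)                # standard deviation = sqrt(variance)
population_sd = math.sqrt(population_variance)

# Cross-check against the standard library.
print(sample_variance, statistics.variance(data))
print(population_variance, statistics.pvariance(data))
```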
Chebyshev’s theorem
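At least 1-(1/k^2) of the values lie within k standard deviations of the mean, for any k greater than 1 and any shape of distribution.
Step 1: k = (difference between value and mean) / standard deviation
Step 2: minimum proportion = 1-(1/k^2)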
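The two steps can be sketched as a small Python function. The function name and the example numbers are hypothetical; the bound is only meaningful for k greater than 1.

```python
def chebyshev_min_proportion(value, mean, sd):
    """Minimum proportion of data within k standard deviations of the mean."""
    k = abs(value - mean) / sd          # Step 1: distance from the mean in standard deviations
    if k <= 1:
        raise ValueError("Chebyshev's theorem needs k greater than 1")
    return 1 - 1 / k ** 2               # Step 2: 1 - (1 / k^2)

# Hypothetical example: mean 50, standard deviation 5, value 60, so k = 2.
proportion = chebyshev_min_proportion(60, 50, 5)
print(proportion)
```

With k = 2 the result is 0.75, meaning at least 75% of the values fall between 40 and 60 in this example.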