cogs 14b definitions Flashcards
What is statistics?
quantification and interpretation of variability
Discrete Variables
a variable that takes on distinct, countable values (whole numbers or separate categories)
examples: # of siblings, political party
Continuous Variables
have potentially infinite values between any two observed values
examples: height, weight, interest rates
What are the three levels of measurement?
Nominal, Ordinal (ranked), Interval/Ratio
Define Nominal data fa me?
variables that have two or more categories, but which do not have an intrinsic order
examples: sex, blood type, favorite kpop group
Define Ordinal (ranked) data fa me?
a set of categories that are organized in an ordered or ranked sequence ; possesses an inherent order
examples: letter class grades, clothing size
Define Interval/Ratio data fa me?
- used to measure variables with equal intervals between values
~ interval has no true zero point while ratio does
~ quantitative
interval example: IQ score, GPA
ratio example: distance, weight, income
Population
Complete collection of observations or potential observations
for all individuals or units of interest
Sample
A partial set of observations taken from the population
Convenience sample
respondents from a population that can be conveniently
contacted/accessed by the researcher
examples: from a poll, survey, people in crowded locations
Parameter
value reflecting something in the entire population of interest
Statistic
a value that reflects something from a sample (can be an estimate of a population parameter)
Random sampling
all potential observations in the population have an
equal chance of being selected in a sample
Sampling error
samples can be unrepresentative of the whole population to varying degrees, which causes errors of varying sizes depending on how representative the sample is ; because of this, sample statistics will vary by chance
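A quick sketch of sampling error, assuming a made-up population of heights (the numbers, the sample size of 25, and the seed are all hypothetical choices for illustration). Each random sample gives a slightly different sample mean, so each one misses the population mean by a different amount.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable (arbitrary choice)

# Hypothetical population: 10,000 made-up heights in cm
population = [random.gauss(170, 10) for _ in range(10_000)]
mu = statistics.mean(population)  # parameter: mean of the whole population

# Draw 5 random samples of 25; each sample mean is a statistic
sample_means = [
    statistics.mean(random.sample(population, 25)) for _ in range(5)
]

# Sampling error: how far each statistic lands from the parameter
errors = [m - mu for m in sample_means]
```

Printing `sample_means` next to `mu` shows the statistics scattering around the parameter by chance.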
Descriptive Statistics
Provides description of data collected
- Approaches presentation of data in a digestible manner
- How can we organize the sample data?
- Measures of central tendency (mean, median, mode) and variability
Inferential Statistics
Helps figure out how a sample of data will generalize to the population
- Makes inferences and estimates using data
- Hypothesis testing, confidence intervals, regression analysis, ANOVA
- What does the sample data say about the population?
Bar Charts
when is it best used?
- Best used when x-axis var is discrete and nominal
- Used for presentation of summary stats or raw data
- Differences easy to see
Line charts
what is it best used for and useful for indicating?
- X-axis variable is continuous and interval/ratio (quantitative data)
- Useful for indicating trends over time
Scatter Plots
best used for what kinda data and what kind of values?
- Best used when both x and y coordinate values are interval/ratio scale
- best used for bivariate data ; often used in observational studies where no independent variable is manipulated
- X & Y coordinates represent values of 2 diff variables
Frequency distributions
sorting observations into classes and displaying the number of occurrences in those classes
(can be shown using a histogram or table)
Ungrouped frequency distribution
- distribution that displays the frequency of each individual data value instead of groups of data values
- best to use when you have fewer than 20 single-value classes
Grouped Frequency Distribution
only possible with what data? and what does it organize?
- organizing a large set of data into classes with more than 1 value
- only possible with ordinal and interval/ratio data
- even if a group has 0 observations, it is included
- choose appropriate bins based on # of observations
Outliers
extreme scores or observations - they lie at the far edge of the frequency distribution and are extremely unlike rest of the sample
Relative frequency (f) distributions
what does it display? and it’s helpful when…?
- display the frequency of each class as a proportion
- helpful when discussing ratios
- distribution that shows the proportion of the total number of observations associated with each value or class of values
Cumulative frequency distributions
frequency distribution that shows, for each class, the sum of that class's frequency and the frequencies of all classes below it
Relative cumulative distributions
Divide cumulative freq. by # of observations
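To tie the four kinds of frequency distribution together, here is a minimal sketch that builds them for one small data set. The sibling counts are made up, and the tuple layout of `table` is just one way to hold the columns.

```python
from collections import Counter

# Hypothetical data: number of siblings reported by 10 students
siblings = [0, 1, 1, 2, 2, 2, 3, 1, 0, 2]

n = len(siblings)
freq = Counter(siblings)  # ungrouped frequency of each individual value

# Each row: (value, frequency, relative freq, cumulative freq, relative cumulative freq)
table = []
cumulative = 0
for value in sorted(freq):
    f = freq[value]
    cumulative += f  # this class plus all classes below it
    table.append((value, f, f / n, cumulative, cumulative / n))
```

The last row's cumulative frequency equals the total number of observations, and its relative cumulative frequency is 1.0.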
Positively skewed
- extreme values lie to the right of the distribution
- peak sits on the left, then a long tail trails off to the right
Negatively skewed
- extreme values lie to the left of the distribution
- starts low with a long tail on the left, then rises to a peak on the right
Mean and what its best for
- the average value of a data set
- appropriate for interval/ratio data
- affected by outliers
Population mean (𝜇)
a parameter - the mean of the whole population
Sample mean (x̄)
a statistic - the mean of a sample of the population /
estimate of the population mean
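A small sketch of the mean and of how one outlier drags it around. The income values (in thousands) are invented for illustration, and `mean` is just a helper name.

```python
def mean(values):
    """Average value: sum of the observations divided by how many there are."""
    return sum(values) / len(values)

incomes = [30, 35, 40, 45, 50]   # hypothetical incomes, in thousands
with_outlier = incomes + [1000]  # same data plus one extreme value

typical = mean(incomes)          # 40.0
dragged = mean(with_outlier)     # 200.0
```

One extreme score moves the mean from 40 to 200, even though five of the six values are 50 or below.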
Define Median
how do you get it? what is it best used for? is it impacted by outliers?
- middle value when data is organized from smallest to largest value
- best for ordinal, interval, or ratio data
- not impacted by outliers
How do you calculate median?
Steps to calculate median:
1) Order observations in ascending order
2) Find the middle position by adding 1 to the total number of observations & dividing by 2
3) If you have an odd number of data points, the value at the middle position is the median, but if it is even, add the values just below and just above the middle position and divide by 2
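The steps above can be sketched as a short function. This assumes a non-empty list of numbers; for an even number of observations it averages the two values on either side of the middle position.

```python
def median(values):
    """Median via the (n + 1) / 2 middle-position rule; assumes values is non-empty."""
    ordered = sorted(values)        # step 1: ascending order
    n = len(ordered)
    position = (n + 1) / 2          # step 2: 1-based middle position
    if n % 2 == 1:
        # odd n: the position is a whole number, so take that value
        return ordered[int(position) - 1]
    # even n: the position ends in .5, so average its two neighbours
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return (lower + upper) / 2
```

For example, `median([4, 1, 3, 2])` gives 2.5, and `median([1, 2, 3, 4, 1000])` is still 3, which shows the outlier not moving the median.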
Mode and what it works for
- value or category that has the greatest frequency
- only measure of central tendency that can be used for nominal data (works for all 4 data types - ordinal, interval, ratio, nominal)
- not impacted by outliers
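Since the mode only needs counts, it works on category labels too. A minimal sketch, assuming made-up blood-type data; `modes` is a hypothetical helper that returns every value tied for the greatest frequency.

```python
from collections import Counter

def modes(values):
    """Return all values that share the greatest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

blood_types = ["O", "A", "O", "B", "A", "O", "AB"]  # hypothetical nominal data
most_common = modes(blood_types)                     # ["O"]
```

If two categories tie, both come back, so a data set can have more than one mode.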
Variability
the degree to which scores in a distribution are spread out or clustered together
Interquartile Range (IQR)
the range covered by the middle 50% of the data - this measure is much more resistant to extreme/outlier values than the range because it leaves out the lowest and highest values
How do you calculate (IQR)?
1) Arrange data from lowest to highest
2) Find the quartile index ((n+1)/4) and round to the nearest whole # if needed
3) Starting from the max, count down by the index number to get the 3rd quartile
4) Starting from the min, count up by the index number to get the 1st quartile
5) IQR: 3rd quartile - 1st quartile
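Here is a rough sketch of the five steps above. Two assumptions: the list is non-empty, and a quartile index ending in .5 is rounded up (the steps don't say which way to round). Other quartile methods, such as the interpolation used by many software packages, can give slightly different answers.

```python
def iqr(values):
    """IQR using the (n + 1) / 4 quartile-index method; assumes values is non-empty."""
    ordered = sorted(values)           # step 1: lowest to highest
    n = len(ordered)
    index = int((n + 1) / 4 + 0.5)     # step 2: round to nearest whole # (halves up: assumption)
    q3 = ordered[-index]               # step 3: count down from the max
    q1 = ordered[index - 1]            # step 4: count up from the min
    return q3 - q1                     # step 5: Q3 - Q1
```

With `[1, 3, 4, 7, 8, 10, 12]` the index is 2, so Q1 = 3, Q3 = 10, and the IQR is 7. Swapping the 12 for 1200 leaves the IQR at 7, which is the outlier resistance in action.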
Standard Deviation
what does it measure? and how does it measure whats being measured lol?
measures variability as the typical distance of the scores from the mean ; it is the square root of the variance
What is Variance and how do you calculate it?
- variance is the mean of all squared deviation scores
- the equation is SS/N for a population (for a sample, SS/(n-1))
Sum of Squares (SS)
the sum of squared deviations: subtract the mean from each value in the data set, square each difference, then add the squares together
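A minimal sketch chaining sum of squares, variance, and standard deviation. It uses the population form (SS/N) and treats the made-up `scores` as the whole population; the function names are just for illustration.

```python
import math

def sum_of_squares(values):
    """SS: subtract the mean from each value, square it, and add them all up."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)

def population_variance(values):
    """Variance as SS / N (population form)."""
    return sum_of_squares(values) / len(values)

def population_sd(values):
    """Standard deviation: square root of the variance."""
    return math.sqrt(population_variance(values))

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores, mean = 5
```

For these scores, SS = 32, variance = 32/8 = 4, and the standard deviation is 2.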