cogs 14b definitions Flashcards

1
Q

What is statistics?

A

quantification and interpretation of variability

2
Q

Discrete Variables

A

a variable that takes on distinct, countable values (often whole numbers), with nothing in between

examples: # of siblings, political party

3
Q

Continuous Variables

A

variables that can take infinitely many possible values between any two observed values

examples: height, weight, interest rates

4
Q

What are the three levels of measurement?

A

Nominal, Ordinal (ranked), Interval/Ratio

5
Q

Define Nominal data fa me?

A

variables that have two or more categories, but which do not have an intrinsic order

examples: sex, blood type, favorite kpop group

6
Q

Define Ordinal (ranked) data fa me?

A

a set of categories that are organized in an ordered or ranked sequence; possesses an inherent order

examples: letter class grades, clothing size

7
Q

Define Interval/Ratio data fa me?

A
  • used to measure variables with equal intervals between values
    ~ interval has no true zero point while ratio does
    ~ quantitative

interval example: IQ score, GPA
ratio example: distance, weight, income

8
Q

Population

A

Complete collection of observations or potential observations
for all individuals or units of interest

9
Q

Sample

A

A partial set of observations taken from the population

10
Q

Convenience sample

A

respondents from a population that can be conveniently
contacted/accessed by the researcher

examples: from a poll, survey, people in crowded locations

11
Q

Parameter

A

value reflecting something in the entire population of interest

12
Q

Statistic

A

a value that reflects something from a sample (can be an estimate
of a population parameter)

13
Q

Random sampling

A

all potential observations in the population have an
equal chance of being selected in a sample

14
Q

Sampling error

A

samples can be unrepresentative of the whole population to varying degrees, which causes errors of varying sizes depending on how representative the sample is; because of this, sample statistics will vary by chance
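
A quick simulation can make this concrete. The sketch below is only an illustration: the population (10,000 made-up scores with mean 100 and SD 15), the sample size of 25, and the seed are all assumptions, not course data. It draws five random samples and shows that each sample mean misses the population mean by a different amount.

```python
import random
from statistics import mean

random.seed(14)  # arbitrary seed so the run is repeatable

# Hypothetical population: 10,000 scores centered near 100.
population = [random.gauss(100, 15) for _ in range(10_000)]
mu = mean(population)  # parameter: the population mean

# Draw five random samples of 25 and compute a statistic for each.
sample_means = [mean(random.sample(population, 25)) for _ in range(5)]

# Sampling error: how far each sample mean lands from the parameter.
errors = [x_bar - mu for x_bar in sample_means]

for x_bar, err in zip(sample_means, errors):
    print(f"sample mean = {x_bar:.2f}, error = {err:+.2f}")
```

Each run with a different seed gives different errors, which is the "vary by chance" part of the definition.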

15
Q

Descriptive Statistics

A

Provides description of data collected
- Approaches presentation of data in a digestible manner
- How can we organize the sample data?
- Measures of central tendency and variability, mean, median, mode

16
Q

Inferential Statistics

A

Helps figure out how a sample of data will generalize to the population
- Makes inferences and estimates using data
- Hypothesis testing, confidence intervals, regression analysis, ANOVA
- What does the sample data say about the population?

17
Q

Bar Charts
when is it best used?

A
  • Best used when x-axis var is discrete and nominal
  • Used for presentation of summary stats or raw data
  • Differences easy to see
18
Q

Line charts
what is it best used for and useful for indicating?

A
  • X-axis variable is continuous and interval/ratio (quantitative data)
  • Useful for indicating trends over time
19
Q

Scatter Plots
best used for what kinda data and what kind of values?

A
  • Best used when both x and y coordinate values are interval/ratio scale
  • best used for bivariate data; common in observational studies, where no independent variable is manipulated
  • X & Y coordinates represent values of 2 diff variables
20
Q

Frequency distributions

A

sorting observations into classes and displaying the number of occurrences in those classes
(can be shown using a histogram or table)
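
As a rough sketch of the idea, the snippet below tallies a made-up list of sibling counts into a frequency table. The data and the helper name `frequency_distribution` are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical data: number of siblings reported by 10 students.
siblings = [0, 1, 1, 2, 0, 3, 1, 2, 1, 0]

def frequency_distribution(observations):
    """Return {class: number of occurrences}, sorted by class value."""
    counts = Counter(observations)
    return dict(sorted(counts.items()))

freq = frequency_distribution(siblings)
for value, count in freq.items():
    print(f"{value} siblings: {count}")
# 0 siblings: 3
# 1 siblings: 4
# 2 siblings: 2
# 3 siblings: 1
```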

21
Q

Ungrouped frequency
distribution

A
  • distribution that displays the frequency of each individual data value instead of groups of data values
  • best to use when you have fewer
    than 20 single-value classes
22
Q

Grouped Frequency Distribution
only possible with what data? and what does it organize?

A
  • organizing a large set of data into classes with more than 1 value
  • only possible with ordinal and interval/ratio data
  • even if a group has 0 observations, it is included
  • choose appropriate bins based on # of observations
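
One possible way to sketch grouping, assuming whole-number scores and equal-width classes (the scores, the class width of 10, and the helper name `grouped_frequency` are all made up for this example). Note that the empty 80-89 class still appears in the result.

```python
def grouped_frequency(observations, start, width, n_classes):
    """Count integer observations in equal-width classes, keeping empty ones."""
    counts = [0] * n_classes
    for x in observations:
        i = (x - start) // width  # which class this observation falls in
        if not 0 <= i < n_classes:
            raise ValueError(f"{x} falls outside the chosen classes")
        counts[i] += 1
    return {
        f"{start + i * width}-{start + (i + 1) * width - 1}": counts[i]
        for i in range(n_classes)
    }

# Hypothetical exam scores.
scores = [52, 55, 61, 68, 70, 71, 75, 93, 95, 99]
table = grouped_frequency(scores, start=50, width=10, n_classes=5)
print(table)
# {'50-59': 2, '60-69': 2, '70-79': 3, '80-89': 0, '90-99': 3}
```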
23
Q

Outliers

A

extreme scores or observations - they lie at the far edge of the frequency distribution and are extremely unlike the rest of the sample

24
Q

Relative frequency (f) distributions

what does it display? and it’s helpful when…?

A
  • display the frequency of each class as a proportion
  • helpful when discussing ratios
  • distribution that shows the proportion of the total number of observations associated with each value or class of values
25
Q

Cumulative frequency
distributions

A

frequency distribution that shows, for each class, the sum of the frequencies of that class and all classes below it

26
Q

Relative cumulative distributions

A

Divide cumulative freq. by # of observations
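
The relative, cumulative, and relative cumulative versions can all be derived from one frequency table. A minimal sketch, assuming a small made-up table whose classes are already in ascending order:

```python
from itertools import accumulate

# Hypothetical frequency table: class -> count, classes in ascending order.
freq = {0: 3, 1: 4, 2: 2, 3: 1}

n = sum(freq.values())          # total number of observations
counts = list(freq.values())

relative = [c / n for c in counts]                  # proportion in each class
cumulative = list(accumulate(counts))               # class plus all classes below it
relative_cumulative = [c / n for c in cumulative]   # cumulative freq. / # of observations

print(relative)             # [0.3, 0.4, 0.2, 0.1]
print(cumulative)           # [3, 7, 9, 10]
print(relative_cumulative)  # [0.3, 0.7, 0.9, 1.0]
```

The last cumulative value always equals the number of observations, so the last relative cumulative value is always 1.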

27
Q

Positively skewed

A
  • extreme values lie to the right of the distribution
  • the curve peaks on the left, then tails off toward the right
28
Q

Negatively skewed

A
  • extreme values lie to the left of the distribution
  • the curve starts low on the left, then rises to a peak on the right
29
Q

Mean and what it's best for

A
  • the average value of a data set
  • appropriate for interval/ratio data
  • affected by outliers
30
Q

Population mean (𝜇)

A

a parameter - the mean of the whole population

31
Q

Sample mean (x̄)

A

a statistic - the mean of a sample of the population /
estimate of the population mean

32
Q

Define Median

how do you get it? what is it best used for? is it impacted by outliers?

A
  • middle value when data is organized from smallest to largest value
  • best for ordinal, interval, or ratio data
  • not impacted by outliers
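
To see the difference in how the mean and median react to an outlier, here is a small comparison using Python's standard `statistics` module. The scores are made up, and the value 100 is added purely to act as an extreme observation.

```python
from statistics import mean, median

# Hypothetical scores, then the same scores plus one extreme value.
scores = [2, 3, 3, 4, 5]
with_outlier = scores + [100]

print(mean(scores), median(scores))              # 3.4 3
print(mean(with_outlier), median(with_outlier))  # 19.5 3.5
```

The mean jumps from 3.4 to 19.5, while the median barely moves.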
33
Q

How do you calculate median?

A

Steps to calculate median:

1) Order observations in ascending order

2) Find middle position by adding 1 to total number of observations &
dividing by 2

3) If you have an odd number of data points, the value at the middle position is the median; if it is even, the middle position falls between two values, so add the values just below and just above it and divide by 2
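
The steps above can be sketched as a short function. The name `median_by_steps` is made up for this example; in practice `statistics.median` from the standard library does the same job.

```python
def median_by_steps(observations):
    """Median via the card's steps: sort, find (n + 1) / 2, read off the value."""
    ordered = sorted(observations)   # step 1: ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one observation")
    position = (n + 1) / 2           # step 2: middle position (counting from 1)
    if n % 2 == 1:
        # Odd n: the middle position is a whole number.
        return ordered[int(position) - 1]
    # Even n: the position ends in .5, so average its two neighbors.
    below = ordered[int(position) - 1]
    above = ordered[int(position)]
    return (below + above) / 2

print(median_by_steps([5, 1, 3]))     # 3
print(median_by_steps([4, 1, 3, 2]))  # 2.5
```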

34
Q

Mode and what it works for

A
  • value or category that has the greatest frequency
  • only measure of central tendency that can be used for nominal data (works for all 4 data types - ordinal, interval, ratio, nominal)
  • not impacted by outliers
35
Q

Variability

A

the degree to which scores in a distribution are spread out or clustered together

36
Q

Interquartile Range (IQR)

A

the range covered by the middle 50% of the
data - this measure is much more resistant to extreme/outlier values
because it ignores the lowest and highest values

37
Q

How do you calculate (IQR)?

A

1) Arrange data from lowest to highest

2) Find quartile index ((n+1)/4) and round to the nearest whole # if needed

3) Starting at the max, count that many positions down to get the 3rd quartile

4) Starting at the min, count that many positions up to get the 1st quartile

5) IQR: 3rd quartile - 1st quartile
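
A minimal sketch of this counting method follows. The function name and data are made up, and rounding halves up is an assumption, since the card does not say which way to round. Other quartile conventions, such as the interpolating ones used by many software packages, can give slightly different answers.

```python
import math

def iqr_by_counting(observations):
    """IQR via the card's method: count (n + 1) / 4 positions in from each end."""
    ordered = sorted(observations)               # step 1: lowest to highest
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one observation")
    # Step 2: quartile index, rounded to the nearest whole number.
    # Assumption: halves round up (Python's built-in round() rounds halves to even).
    index = max(1, math.floor((n + 1) / 4 + 0.5))
    q3 = ordered[n - index]                      # step 3: count down from the max
    q1 = ordered[index - 1]                      # step 4: count up from the min
    return q1, q3, q3 - q1                       # step 5: IQR = Q3 - Q1

# Hypothetical data with n = 11, so the quartile index is 12 / 4 = 3.
data = [1, 3, 4, 6, 7, 8, 10, 12, 13, 15, 20]
print(iqr_by_counting(data))  # (4, 13, 9)
```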

38
Q

Standard Deviation

what does it measure? and how does it measure whats being measured lol?

A

measures variability as the typical distance between the scores and the mean; it is the square root of the variance

39
Q

What is Variance and how do you calculate it?

A
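  • variance is the mean of all squared deviation scores (the plain deviation scores always average to 0)
  • the equation is SS/N for a population; the sample version divides by n - 1 instead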
40
Q

Sum of Squares (SS)

A

the sum of squared deviations: subtract the mean from each value in the
data set, square each difference, then add the squares together
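
Sum of squares, variance, and standard deviation build on each other, so one small worked example covers all three. This is a sketch using the population formula SS/N; the function names and the scores are made up for illustration.

```python
import math

def sum_of_squares(values):
    """SS: sum of the squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values)

def population_variance(values):
    """Variance as SS / N (population form)."""
    return sum_of_squares(values) / len(values)

def population_sd(values):
    """Standard deviation: square root of the variance."""
    return math.sqrt(population_variance(values))

# Hypothetical scores with mean 5.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(sum_of_squares(scores))       # 32.0
print(population_variance(scores))  # 4.0
print(population_sd(scores))        # 2.0
```

For a sample, dividing SS by n - 1 instead of N gives the usual sample variance.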