Basic Statistics Flashcards

0
Q

What is an independent variable (IV)?

A
  • variable examined to determine its effect on outcome of interest (DV)
  • under control of experimenter - manipulated variable
    e.g., dose of a drug
1
Q

What is a variable?

A

measurable characteristic that changes with person, environment, experiment
e.g., blood pressure, A1c levels, cholesterol LDL/HDL levels

2
Q

What is a dependent variable (DV)?

A
  • outcome of interest measured to assess effects of IV
  • not under experimenter control
    e.g., how a person reacts to the drug
3
Q

What is a subject, or organismic variable?

A

naturally occurring IV that is a characteristic of people and is not controlled by the experimenter
e.g., gender, race, BRCA1

4
Q

What are the four different types of data?

A
  • nominal
  • ordinal
  • interval
  • ratio
5
Q

What is nominal data?

A
  • qualitative (name)
  • mutually exclusive without logical order
    e.g., types of physical activity a diabetic patient engages in: walking, swimming, hiking
6
Q

What is ordinal data?

A
  • qualitative
  • mutually exclusive with logical rank ordering
    e.g., ratings of how a patient feels: very poor, poor, average, good, very good
7
Q

What is interval data?

A
  • quantitative with equal units of measurement, so equal distances between values represent equal differences in what is measured
  • there is an arbitrary (no meaningful) zero point
    e.g., cancer patients rate their level of energy on a 1-10 scale
8
Q

What is ratio data?

A
  • quantitative with equal units of measurement where numbers can be compared as multiples of one another
  • meaningful zero point
    e.g., height, weight, length
9
Q

What are the two different types of numbers?

A
  • discrete/discontinuous data
  • continuous data

10
Q

What is characteristic of discrete data?

A

only whole numbers allowed

e.g., # of manic episodes in a week

11
Q

What is characteristic of continuous data?

A

any values allowed

e.g., weight, height, fasting blood glucose levels

12
Q

On which axis is the independent variable typically plotted?

A

x-axis

13
Q

On which axis is the dependent variable typically plotted?

A

y-axis

14
Q

What are some features of a bar graph?

A
  • nominal, sometimes ordinal data
  • each bar = category
  • height = frequency (proportion or %)
  • bars for different categories do not touch (but with two or more groups, e.g., males and females, the group bars within each category can touch)
  • if ordinal data, must preserve order
  • can be vertical or horizontal
15
Q

What are some features of a histogram?

A
  • interval, ratio data; sometimes ordinal
  • same rules as bar, BUT bars touch
  • usually for discrete data
16
Q

What are some features of a line/frequency graph?

A
  • interval, ratio, sometimes ordinal data
  • usually for continuous data
  • points represent data and lines connect the data points showing the continuous nature of data (i.e., can have any value between)
17
Q

What are the different forms of graphs?

A
  • normal: bell-shaped or symmetric about a line drawn through the center
  • skewed: not symmetric, shifted to one side or the other
18
Q

What are two types of skewed graphs?

A
  • negative skew: fewer scores at the low end, peak shifted to the right
  • positive skew: fewer scores at the high end, peak shifted to the left
19
Q

What is the mean of data?

A
  • a.k.a. average
  • a single value meant to typify a list of values
  • most common measure of central tendency
  • most appropriate when data are normally distributed, because the mean is affected significantly by outliers or extreme values
  • symbolized as μ (population data) or x̄ (sample data)
  • basic arithmetic mean calculated by adding up all data values and dividing by the number of values (e.g., (4 + 5 + 12) / 3 = 21 / 3 = 7 = mean)
20
Q

What is kurtosis?

A

the sharpness of the peak of a frequency-distribution curve

21
Q

Why is mean most appropriate for data that is normally distributed?

A

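the mean is affected significantly by outliers or extreme values, and a normal distribution is symmetric with few extreme values to pull it away from the center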

22
Q

How do you calculate the mean?

A

basic arithmetic mean calculated by adding up all data values and dividing by the number of values
e.g., (4 + 5 + 12) / 3 = 21 / 3 = 7 = mean
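
The calculation can be sketched in a few lines of Python. The scores below are made-up illustration values, not data from these cards.

```python
from statistics import mean

# Hypothetical scores, chosen only for illustration.
scores = [2, 4, 6, 8]

# Add up all values, then divide by how many values there are.
total = sum(scores)
arithmetic_mean = total / len(scores)

# The standard-library helper gives the same result.
library_mean = mean(scores)

print(total, arithmetic_mean, library_mean)
```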

23
Q

What is the median?

A
  • midpoint of a distribution of scores so 1/2 fall above and 1/2 fall below (50th percentile)
  • appropriate measure of central tendency with skewed distributions and those with outliers or extreme values
  • if you have an odd number of values, put them in ascending order and the median is the number in the middle
  • if you have an even number of values, put them in ascending order and take the mean of the two middle values
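
A minimal sketch of the odd/even rule, assuming a plain list of numbers. The function name `median_of` and the example values are hypothetical.

```python
def median_of(values):
    """Return the median using the odd/even rule described above."""
    ordered = sorted(values)           # ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]            # odd count: the middle value
    # even count: mean of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_result = median_of([7, 1, 5])       # sorted: 1 5 7
even_result = median_of([7, 1, 5, 3])   # sorted: 1 3 5 7
print(odd_result, even_result)
```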
24
Q

What is the mode?

A
  • most common score in a distribution
    e.g., scores are 2 3 4 4 4 5 ; mode is 4
  • can have more than one mode
    e.g., scores are 2 4 4 5 6 6 7 ; mode = 4,6
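
Both examples can be checked with Python's standard library. This sketch assumes Python 3.8 or later, where `statistics.multimode` is available.

```python
from statistics import multimode

# multimode returns every value tied for the highest count,
# in the order each was first seen.
single_mode = multimode([2, 3, 4, 4, 4, 5])
two_modes = multimode([2, 4, 4, 5, 6, 6, 7])

print(single_mode, two_modes)
```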
25
Q

What is variance?

A
  • measure of dispersion which is calculated by taking the average of the squared differences between the mean and all the scores contributing to the mean
  • used with the mean
  • tells you, on average, how far the score varies from the mean
  • symbolized as sigma^2 for a population, s^2 for a sample
  • outliers and extreme values increase variance
  • typically shown with the mean and reported as the mean +/- the variance
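
As a rough sketch, variance can be computed by hand and compared with the standard library. The scores are made up for illustration. Note one assumption beyond the card: the sample variance (s^2) is conventionally divided by n - 1 rather than n.

```python
from statistics import pvariance, variance

# Hypothetical scores, chosen only for illustration.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

score_mean = sum(scores) / len(scores)
squared_diffs = [(x - score_mean) ** 2 for x in scores]

# Population variance (sigma^2): average of the squared differences.
population_variance = sum(squared_diffs) / len(scores)

# Sample variance (s^2): divide by n - 1 instead of n.
sample_variance = sum(squared_diffs) / (len(scores) - 1)

# An extreme value increases the variance.
variance_with_outlier = pvariance(scores + [50])

print(population_variance, sample_variance, variance_with_outlier)
```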
26
Q

How do outliers and extreme values affect the variance?

A

increase variance

27
Q

What is standard deviation?

A
  • square root of the variance
  • used with the mean
  • converts variance to a score interpretable in terms of measurement scale
  • outliers and extreme values increase standard deviation
  • typically shown with the mean and reported as the mean +/- the standard deviation
  • sigma for population, s for sample
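
A short sketch of the square-root relationship, using made-up scores. The point is that the standard deviation is back in the original units of measurement, while the variance is in squared units.

```python
import math
from statistics import pstdev, pvariance

# Hypothetical scores, chosen only for illustration.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Standard deviation is the square root of the variance.
population_sd = math.sqrt(pvariance(scores))

# The library helper computes the same quantity directly.
library_sd = pstdev(scores)

# An extreme value increases the standard deviation.
sd_with_outlier = pstdev(scores + [50])

print(population_sd, library_sd, sd_with_outlier)
```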
28
Q

What is the measure of dispersion with the median?

A

interquartile range (IQR)

29
Q

What is the interquartile range (IQR)?

A
  • the middle 50% of scores in a distribution
  • used with the median
  • not affected by outliers or extreme values
  • the range of scores between the 25th and 75th percentiles of a distribution
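
The sketch below illustrates why the IQR resists extreme values, using made-up scores. It assumes Python 3.8 or later for `statistics.quantiles`; quartile conventions differ between tools, so other software may report slightly different cut points.

```python
from statistics import quantiles

def iqr(values):
    """Distance between the 25th and 75th percentiles."""
    q1, _q2, q3 = quantiles(values, n=4)  # default 'exclusive' method
    return q3 - q1

# Hypothetical scores; the second list swaps the top score for an extreme value.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
scores_with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1000]

iqr_plain = iqr(scores)
iqr_outlier = iqr(scores_with_outlier)

# The full range changes drastically, the IQR does not.
range_plain = max(scores) - min(scores)
range_outlier = max(scores_with_outlier) - min(scores_with_outlier)

print(iqr_plain, iqr_outlier, range_plain, range_outlier)
```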
32
Q

What is characteristic of sampling distribution?

A

the sampling distribution of a statistic is the distribution of all values of that statistic for every sample of a particular size from the population
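
For a tiny population, the sampling distribution can be listed exhaustively. This sketch uses a made-up population of four values and assumes samples of size 2 drawn without replacement, with the mean as the statistic.

```python
from itertools import combinations
from statistics import mean

# Hypothetical population, small enough to enumerate every sample.
population = [1, 2, 3, 4]
SAMPLE_SIZE = 2

# Every possible sample of the chosen size (without replacement).
all_samples = list(combinations(population, SAMPLE_SIZE))

# The sampling distribution of the mean: one mean per possible sample.
sample_means = sorted(mean(sample) for sample in all_samples)

print(all_samples)
print(sample_means)
print(mean(sample_means), mean(population))
```

Here the mean of all sample means matches the population mean.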

33
Q

In a normal curve, how do the mean, median, and mode relate?

A

they are all equal

34
Q

What is the Central Limit Theorem (CLT)?

A
  • relationship between a population mean and its sampling distribution
  • it describes the conditions under which the sum of a sufficiently large number of independent random variables, each with finite mean and variance, will be approximately normally distributed
  • justifies the approximation of large-sample statistics to the normal distribution in controlled experiments
  • provides a way to analyze data and test hypotheses
35
Q

Central Limit Theorem:

When random samples of a fixed size are drawn from a population, three things are assumed to occur as the sample size gets larger:

A
  • the distribution of sample means approaches normality
  • the overall mean of the samples approaches the mean of the population
  • the standard deviation of the sample means equals the standard deviation of the population divided by the square root of the sample size
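
These three points can be checked with a small simulation. This is only a sketch under stated assumptions: the population is an exponential distribution with mean 1 and standard deviation 1 (deliberately skewed), and the sample size, number of samples, and seed are arbitrary choices.

```python
import math
import random
from statistics import mean, pstdev

random.seed(0)  # fixed seed so the run is repeatable

# Assumed population: exponential with rate 1 (mean 1, SD 1), strongly skewed.
POPULATION_MEAN = 1.0
POPULATION_SD = 1.0
SAMPLE_SIZE = 50
NUM_SAMPLES = 2000

# Draw many samples of fixed size and record each sample's mean.
sample_means = [
    mean([random.expovariate(1.0) for _ in range(SAMPLE_SIZE)])
    for _ in range(NUM_SAMPLES)
]

mean_of_means = mean(sample_means)   # should be near the population mean
sd_of_means = pstdev(sample_means)   # should be near sigma / sqrt(n)
predicted_sd = POPULATION_SD / math.sqrt(SAMPLE_SIZE)

print(round(mean_of_means, 3), round(sd_of_means, 3), round(predicted_sd, 3))
```

Plotting a histogram of `sample_means` would show a roughly bell-shaped curve even though the underlying population is skewed.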
36
Q

accuracy

A

how close a measured quantity is to its actual/true value

37
Q

reliability

A

the extent to which a measurement procedure yields a consistent outcome when done repeatedly

38
Q

Think of a target:
What is known as your ability to hit the target itself?
What is known as your ability to repeatedly hit the bulls-eye?

A
  • accuracy
  • precision/reliability

39
Q

If a study has both high accuracy and high precision, what does this say about the sample estimate?

A
  • the sample estimate will be close to the true population value
  • repeated studies will show little variability
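
One rough way to put numbers on these two ideas is sketched below, under stated assumptions: the true value and both sets of repeated measurements are made up, accuracy is summarized as the distance of the average measurement from the true value, and precision as the spread (standard deviation) of the repeated measurements.

```python
from statistics import mean, pstdev

TRUE_VALUE = 100.0  # hypothetical true value

# Hypothetical repeated measurements from two instruments.
accurate_and_precise = [99.8, 100.1, 100.0, 99.9, 100.2]
precise_but_biased = [105.1, 104.9, 105.0, 105.2, 104.8]

def bias(measurements):
    """Distance of the average measurement from the true value (accuracy)."""
    return abs(mean(measurements) - TRUE_VALUE)

def spread(measurements):
    """Variability across repeated measurements (precision)."""
    return pstdev(measurements)

bias_a, spread_a = bias(accurate_and_precise), spread(accurate_and_precise)
bias_b, spread_b = bias(precise_but_biased), spread(precise_but_biased)

print(round(bias_a, 3), round(spread_a, 3))
print(round(bias_b, 3), round(spread_b, 3))
```

Both instruments are consistent from one measurement to the next, but only the first one lands near the true value.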

40
Q

validity

A

whether the measurement tool really measures what it is intended to measure

41
Q

The question of whether standardized tests (e.g., SAT, GRE, MCAT) really measure someone’s ability to perform well in college, medical school, etc. is a question of these tests’ what?

A

validity

42
Q

What three things affect the quality of the data collected and thus the quality of the decisions made based on those data?

A
  • accuracy
  • reliability/precision
  • validity
43
Q

population

A

complete set of people/objects having some common characteristic

44
Q

parameter

A
  • value summarizing characteristic of population
  • constants
  • use Greek letters to represent
45
Q

sample

A
  • subset of population
  • shares the same characteristics

46
Q

statistic

A
  • value summarizing characteristic of a sample
  • are variable
  • use Roman letters to represent
47
Q

simple random sample

A

subset of population selected so that each population member has equal and independent chance of being chosen
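
A minimal sketch of drawing a simple random sample with the standard library. The patient IDs, sample size, and seed are hypothetical.

```python
import random

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical population of 100 patient IDs.
population = [f"patient_{i:03d}" for i in range(1, 101)]

# random.sample draws without replacement, giving every member
# the same chance of being chosen.
simple_random_sample = random.sample(population, k=10)

print(simple_random_sample)
```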

48
Q

random assignment

A

assign subjects to treatments in equal and independent manner to avoid bias
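
One common way to do this is to shuffle the subject list and split it. The sketch below assumes two equal-sized groups; the subject labels, group names, and seed are hypothetical.

```python
import random

random.seed(7)  # fixed seed so the run is repeatable

# Hypothetical list of 20 enrolled subjects.
subjects = [f"subject_{i}" for i in range(1, 21)]

# Shuffle a copy so chance alone decides the ordering,
# then split the shuffled list into two groups.
shuffled = subjects[:]
random.shuffle(shuffled)

half = len(shuffled) // 2
treatment_group = shuffled[:half]
control_group = shuffled[half:]

print(treatment_group)
print(control_group)
```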