Week 9: Descriptive and comparative stats Flashcards

1
Q

Define each:
1. Biostatistics
2. Univariate analysis
3. Bivariable analysis
4. Multivariable analysis

A
  1. Biostatistics = analyzing data and interpreting the results for problems related to biology and health
  2. Univariate analysis = describes ONE variable in a data set using simple stats like counts (frequencies), proportions and averages; describes the study pop
  3. Bivariable analysis = uses rate ratios, OR, etc. to examine the associations between 2 variables (exposure –> outcome); compares groups
  4. Multivariable analysis = uses stat tests like multiple regression models to examine relationships between 3 or more variables
2
Q

Identify what type of analysis (univariate, bivariate and/or multivariate) is/are involved in each type of study:
a) case series
b) cross-sectional survey
c) case-control study
d) cohort study
e) experimental study

A

a) case series = univariate
b) cross-sectional survey = univariate, sometimes bivariate
c) case-control study = univariate, bivariate, sometimes multivariate
d) cohort study = univariate, bivariate, sometimes multivariate
e) experimental study = univariate, bivariate, sometimes multivariate

3
Q

What is a variable?

A

a quantity that can have different values

4
Q

Describe how variables are used in the study process?

A

variable + measurement –> data + analysis –> evidence

5
Q
  1. What are the 2 types of quantitative variables?
  2. what are the 2 types of qualitative variables?
A
  1. quantitative = discrete, continuous
  2. qualitative = nominal, ordinal
6
Q
  1. What are nominal variables?
  2. Do nominal variables have intrinsic value (a measure of something's worth)?
A
  1. Have values that represent no inherent rank or order. Can assign numbers to different categories, but these categories do not have any other numeric properties. Ex: type of fruit, country of birth
    - basically a variable that has no order. So if someone asked what kind of fruit is on this table, the answer (apple, banana, orange) is a nominal value. The NUMBER of fruits on the table (like 4) is a count, which is a discrete quantitative variable, not a nominal one.
  2. NO
7
Q
  1. What are ordinal variables?
  2. Do ordinal variables have intrinsic value?
A
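  1. categories that have a rank order, often coded with numbers on a scale. Ex: 1 = poor, 2 = fair, 3 = good, 4 = very good.
  2. Yes - because the numbers rank the categories (higher = better), although the gaps between ranks are not necessarily equal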
8
Q

What are ways of displaying nominal or ordinal data?

A
  • pie charts
  • bar graphs
  • frequency tables - ex: what pets went missing this past year
9
Q
  1. Quantitative variables are _______
  2. What are some examples of data that quantitative variables measure?
  3. assigned numbers have _________ meaning. For example, 5>4>3 or 4 is two times larger than 2
A
  1. numeric
  2. age, blood pressure, temperature
  3. mathematical meaning
10
Q

Answer based on continuous vs. discrete quantitative variables:
1. How many values can the variables take on?
2. What do they measure?
3. How can they be plotted?

A
  1. Continuous = any value
    discrete = finite # or limited values
  2. continuous = blood pressure, temperature
    discrete = age in year, number of drinks consumed
  3. continuous = plotted as a line
    discrete = plotted as dots
11
Q

What is the difference between intervals and ratios?

A

The difference between interval and ratio scales is whether zero is a true zero. Interval scales have no true zero: zero is just an arbitrary point on the scale, so values can fall below it. For example, you can measure temperature below 0 degrees Celsius, such as -10 degrees, and 20 degrees is not "twice as hot" as 10 degrees.
Ratio variables, on the other hand, never fall below zero. A zero means absence of the attribute, so ratios are meaningful (ex: a weight of 4 kg is twice a weight of 2 kg)

12
Q
  1. Calculate the mean of this set of numbers:
    2, 9, 11, 6, 6, 26
  2. Calculate the median of this set of numbers: 9,5,11,6,6,26
  3. Calculate the mode of this set of numbers:
    2,9,11,6,6,26
A
  1. mean = add them up and divide by amount of #’s = 10
  2. median = rank the data by smallest to largest and pick the middle number.
    5,6,6,9,11,26; median = (6+9)/2 = 7.5
  3. mode = # that occurs the most = 6
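
The three answers above can be checked with a short sketch using Python's standard `statistics` module. The variable names are just illustrative; the numbers are the ones from the card.

```python
from statistics import mean, median, mode

# Data sets copied from the card (the median question uses a slightly different list)
mean_data = [2, 9, 11, 6, 6, 26]
median_data = [9, 5, 11, 6, 6, 26]

mean_value = mean(mean_data)        # sum of values divided by how many there are
median_value = median(median_data)  # even count, so the two middle values are averaged
mode_value = mode(mean_data)        # the value that occurs most often

print(mean_value, median_value, mode_value)
```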
13
Q

What is the difference between a histogram and a bar chart?

A

Histograms visualize quantitative data or numerical data (age), whereas bar charts display categorical variables (people who like soccer) . In most instances, the numerical data in a histogram will be continuous (having infinite values).

14
Q
  1. What is a skewed histogram? why would it be skewed or not?
  2. What happens to the mean, median and mode in a negatively skewed, symmetrical and positively skewed histogram
A
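  1. skewed means the distribution is not symmetrical: one tail is longer than the other, usually because of extreme values (outliers) on that side. It can skew left (negative) or right (positive).
  2. negatively skewed: mean<median<mode
    symmetrical: mean = median = mode
    positively skewed: mode<median<mean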
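
A quick way to see these orderings is to compute all three measures on small skewed data sets. The two lists below are made-up illustrations (not from the course), chosen so that one has a long right tail and the other a long left tail.

```python
from statistics import mean, median, mode

# Hypothetical example data, invented for illustration only
positively_skewed = [2, 2, 3, 4, 14]    # one extreme HIGH value pulls the mean up
negatively_skewed = [1, 11, 12, 13, 13]  # one extreme LOW value pulls the mean down

for label, data in [("positive", positively_skewed), ("negative", negatively_skewed)]:
    print(label, "mode:", mode(data), "median:", median(data), "mean:", mean(data))
```

The mean moves the most toward the long tail, the median moves less, and the mode stays at the peak.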
15
Q
  1. What is range?
  2. what are quartiles?
  3. What is interquartile range (IQR)?
  4. Identify Q1-Q3 and the IQR of this set of data:
    62,63,64,64,70,72,76,77,81,81
A
  1. difference between the minimum (lowest) and the maximum (highest) value in a data set
  2. the 3 values (Q1, Q2, Q3) that divide a ranked data set into 4 equal parts. Q2 is the median.
  3. IQR = captures the middle 50% of values for a numeric variable–> difference between Q1 and Q3
  4. Q1 = 64, Q2 = 71, Q3 = 77
    IQR = Q3 - Q1 = 77 - 64 = 13
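
The quartiles above come from the "median of each half" method: split the ranked data at the median, then take the median of the lower half (Q1) and of the upper half (Q3). A minimal sketch of that method follows; `quartiles` is a hypothetical helper written for this card, and it assumes the middle value is left out of both halves when the count is odd. Statistical software often uses slightly different quartile rules, so results can differ a little.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    # For an odd count, skip the middle value so it is in neither half
    upper = ordered[half:] if len(ordered) % 2 == 0 else ordered[half + 1:]
    return median(lower), median(ordered), median(upper)

data = [62, 63, 64, 64, 70, 72, 76, 77, 81, 81]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1
print(q1, q2, q3, iqr)
```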
16
Q
  1. Where can you find the median value in a box plot?
  2. where can you find the IQR on a box plot
  3. Where can you find outliers on a box plot
A
  1. median is the line inside the box
  2. the 75th percentile (Q3) is the line at the top of the box, the 25th percentile (Q1) is the line at the bottom of the box. Do 75th - 25th = IQR (the height of the box)
  3. outliers are the separate dots beyond the whiskers, away from the box
17
Q
  1. What is variance and the symbol?
  2. What is the formula?
  3. the standard deviation is the _______ of the variance
  4. what is the standard error?
A
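  1. the extent of deviation (how far off) from the average value of that variable in the data set. Symbol: σ² (population) or s² (sample)
  2. Calc by adding together the squares of the differences between each observation and the mean and then dividing by the total # of observations (n). Note: many textbooks and software packages divide by n - 1 instead when the data are a sample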
  3. square root
  4. adjusts for the number of observations in the data set by dividing the variance by the total number of observations and then taking the square root of that number
18
Q
  1. What are confidence intervals?
  2. T/F: a smaller sample size will yield a narrower confidence interval
  3. a ___% estimate is reported for statistical estimates, meaning that ___% of the time, the confidence interval is expected to miss capturing the true value of a measure in the source pop
A
  1. provide info about the expected value of a measure in a source population based on the value of that measure in a study population
  2. F. A larger sample size will yield a narrower confidence interval; a smaller sample size yields a wider one
  3. 95%, 5%
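
The effect of sample size on interval width can be sketched numerically. The card does not give a formula, so this example assumes the common normal-approximation form, mean ± 1.96 × SD/√n, for a 95% interval. The function name `ci_95` and the input numbers (mean 120, SD 10) are made up for illustration.

```python
from math import sqrt

Z_95 = 1.96  # assumed multiplier for a 95% interval under a normal approximation

def ci_95(sample_mean, sd, n):
    """Return (lower, upper) bounds of an approximate 95% CI for a mean."""
    margin = Z_95 * sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Same mean and SD, two different sample sizes (hypothetical values)
small_low, small_high = ci_95(120, 10, 25)
large_low, large_high = ci_95(120, 10, 100)

small_width = small_high - small_low
large_width = large_high - large_low
print(small_width, large_width)
```

Quadrupling the sample size halves the width of the interval here, because the margin shrinks with the square root of n.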
19
Q

What are comparative statistics?

A

comparing main factors between exposed and unexposed in cohort studies. Ex: % males exposed to % males unexposed

20
Q

What is inferential statistics?

A

techniques that use stats from a random sample of a population to make evidence based assumptions (inferences) about the values of parameters in the population as a whole

21
Q
  1. What is the purpose of hypothesis testing?
  2. what is the null hypothesis Ho?
  3. What is the alternative hypothesis Ha?
A
  1. to test hypotheses about a population parameter/ compare 2 hypotheses
  2. Ho = there is NO difference between the 2 or more groups/variables being compared –> ex: no association between exposure and outcome
  3. Ha = there is a DIFFERENCE between the 2 or more groups/variables being compared –> ex: an association between exposure and outcome
22
Q

What are the steps in hypothesis testing?

A
  1. take a random sample from the pop of interest
  2. set up the competing hypotheses (based on research questions): Ho, Ha1 and Ha2
  3. use sample statistics (mean, freq) to calculate a test statistic, then use it to decide whether to reject or fail to reject the null
  4. determine: what if the null is TRUE? what will the observed sample statistics be?
23
Q
  1. What is the purpose of the p. value?
  2. if p = 0.1-0.9 what does this mean?
  3. if p = <0.02 what does this mean?
  4. if p = 0.05 what does this mean? What is significant about this p value?
A
  1. to measure how compatible the observed sample is with the null. The p-value is the probability of seeing a result at least as extreme as the observed one if the null were TRUE. The closer a p-value is to 0, the stronger the evidence that the variables are DIFFERENT and that the null should be REJECTED
  2. Not enough evidence to reject the null (fail to reject), because the p-value is far from 0. This does NOT mean the null is 10%-90% likely to be true
  3. sufficiently strong evidence to conclude the null is unlikely to be true –> reject the null and conclude there is a difference in the variables, because p is close to 0. If the null were true, a result this extreme would occur less than 2% of the time
  4. p = 0.05 is the conventional cutoff for statistical significance in health research (results with p < 0.05 are called statistically significant). Means that if the null were true and we did the same experiment a bunch of times, only about 5% of those experiments would wrongly reject the null
24
Q

Compare parametric vs nonparametric tests

A

parametric = assumes the variables being examined have particular distributions in a population

non-parametric = does not make assumptions about distributions of responses in populations –> usually for ranked variables and when the distribution of a ratio or interval variable is non-normal

25
Q

Find the range, 25th, 50th, 75th percentile and IQR for this set of measurements:
2,9,4,5,7,11,8,6,6,26,18,11

A

range = 26-2 = 24
ranked data: 2,4,5,6,6,7,8,9,11,11,18,26
25th = (5+6)/2 = 5.5
50th = (7+8)/2 = 7.5
75th = (11+11)/2 = 11
IQR = 75th-25th = 11-5.5 = 5.5

26
Q

calculate the variance, standard deviation and standard error for this data set: 5, 3, 6, 2
*don’t forget to include symbols

A

step 1: calculate the mean = (5+3+6+2)/4 = 4
step 2: calc variance (σ²)
= (each number - mean) squared, added together, then divided by the amount of numbers (n)
= [(5-4)² + (3-4)² + (6-4)² + (2-4)²] / 4
= (1 + 1 + 4 + 4) / 4
= 2.5

step 3: calculate standard deviation = square root of the variance = square root of 2.5 = 1.58

step 4: calculate standard error:
= standard deviation / square root of the amount of numbers in the data set (n)
= 1.58/ square root of 4 = 0.79
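
The four steps can be checked with a short sketch using Python's standard `statistics` module. `pvariance` and `pstdev` divide by n, which matches the method on this card; `variance` divides by n - 1 and is shown only for comparison, since that version is what many software packages report for sample data.

```python
from math import sqrt
from statistics import mean, pstdev, pvariance, variance

data = [5, 3, 6, 2]
n = len(data)

data_mean = mean(data)             # step 1
variance_n = pvariance(data)       # step 2: squared deviations divided by n
sd_n = pstdev(data)                # step 3: square root of the variance
standard_error = sd_n / sqrt(n)    # step 4: SD divided by square root of n

# For comparison only: the n - 1 version gives a larger variance
variance_n_minus_1 = variance(data)

print(data_mean, variance_n, round(sd_n, 2), round(standard_error, 2))
```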