L5 - Descriptive and Inferential Stats Flashcards
List the types of variables and an example of each
- Nominal/Categorical: Race, Gender
- Ordinal: Pain rating
- Continuous: e.g. plasma drug concentration
Describe how you would present numerical data for the following variables:
- Highest Education Qualification
- Pain Score
- Mean Plasma Concentration of Ciprofloxacin
- n (%)
- n (%), or Median (IQR) where appropriate
- Mean (SD) if normally distributed, or Median (IQR) if skewed
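A minimal sketch of how each summary above could be computed with Python's standard `statistics` module. The data are hypothetical and the helper names (`n_percent`, `median_iqr`, `mean_sd`) are illustrative only; note that other software may use a different quartile method, so IQR values can differ slightly.

```python
from collections import Counter
from statistics import mean, median, quantiles, stdev

def n_percent(values):
    """n (%) for a categorical or ordinal variable: {category: (count, percent)}."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, round(100 * n / total, 1)) for cat, n in counts.items()}

def median_iqr(values):
    """Median (IQR) for ordinal or skewed continuous data; IQR returned as (Q1, Q3)."""
    q1, _, q3 = quantiles(values, n=4)  # default 'exclusive' quartile method
    return median(values), (q1, q3)

def mean_sd(values):
    """Mean (SD) for roughly normally distributed continuous data (sample SD)."""
    return mean(values), stdev(values)

# Hypothetical data, for illustration only
education = ["Degree", "Diploma", "Degree", "Secondary", "Degree"]
pain_score = [2, 3, 3, 5, 7, 8, 4]
cipro_conc = [2.1, 2.4, 1.9, 2.8, 2.2]  # units assumed to be mg/L

print(n_percent(education))    # {'Degree': (3, 60.0), 'Diploma': (1, 20.0), 'Secondary': (1, 20.0)}
print(median_iqr(pain_score))  # (4, (3.0, 7.0))
m, sd = mean_sd(cipro_conc)
print(f"{m:.2f} ({sd:.2f})")   # 2.28 (0.34)
```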
Common graphical displays that describe the data for the different types of variables
- Nominal/Categorical: bar chart, pie chart
- Ordinal: bar chart, pie chart
- Continuous: box plot, histogram
Define Inferential statistics. What is its main assumption?
Stats methods used to draw conclusions from a sample and make inferences about an entire ppn
Assumption: Sample represents a RANDOM SAMPLE from underlying (unobserved) ppn
Describe the two main approaches to inferential stats
- Param Estimation: estimate ppn params from sample stats (Point estimate and interval estimate)
- Hyp testing: test a supposition about the ppn using the limited evidence from a sample (Null and Alternative Hyp)
What is the sampling distribution of sample means?
Under Param Estimation:
- Repeated random samples of the same size (n) taken from the ppn
- Mean computed for each sample
- The distribution of all these sample means is the sampling distribution; its properties underpin the point estimate and CI
Describe the properties of the sampling distribution of the means
- Mean ≈ ppn mean
- SD of sample means ≈ ppn SD/√n; this is the SEM (Standard Error of the Mean), which is used to get the CI
(note: SD describes the spread of individual values within one sample, while SEM describes the spread of the means across all the samples)
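A small simulation can make these properties concrete. This sketch assumes a hypothetical normally distributed ppn (mean 50, SD 10), draws 2,000 random samples of size n = 25 (with replacement), and compares the mean and SD of the sample means against the ppn mean and ppn SD/√n. The function name `sampling_distribution_of_means` is illustrative only.

```python
import random
from statistics import mean, pstdev, stdev

def sampling_distribution_of_means(population, n, n_samples, seed=0):
    """Draw n_samples random samples of size n (with replacement) and return their means."""
    rng = random.Random(seed)
    return [mean(rng.choices(population, k=n)) for _ in range(n_samples)]

# Hypothetical ppn: 10,000 values from a normal distribution (mean 50, SD 10)
rng = random.Random(42)
population = [rng.gauss(50, 10) for _ in range(10_000)]
mu, sigma = mean(population), pstdev(population)

n = 25
means = sampling_distribution_of_means(population, n, n_samples=2_000)

# Property 1: mean of the sample means is close to the ppn mean
print(f"ppn mean {mu:.2f}, mean of sample means {mean(means):.2f}")
# Property 2: SD of the sample means (the SEM) is close to ppn SD / sqrt(n)
print(f"ppn SD/sqrt(n) {sigma / n ** 0.5:.2f}, SD of sample means {stdev(means):.2f}")
```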
What is SEM?
Estimates the precision/reliability of the sample mean as an estimate of the mean of the ppn from which the sample was drawn.
i.e. Tells us where the TRUE ppn mean may lie, via the CI
What is the Central Limit Theorem?
For large sample sizes, the sampling distribution of the mean is approx normally distributed, even if the underlying ppn (and hence each individual sample) is not normally distributed
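One way to see the CLT is to sample from a strongly skewed ppn and check how symmetric the sample means are. This sketch assumes a hypothetical exponential ppn (skewness about 2) and uses skewness (third standardized moment, about 0 for a symmetric distribution) as a rough indicator only; it is not a formal normality test.

```python
import random
from statistics import mean, pstdev

def skewness(xs):
    """Third standardized moment; about 0 for a symmetric distribution."""
    xs = list(xs)
    m, s = mean(xs), pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

rng = random.Random(1)
# Hypothetical, right-skewed ppn: exponential with mean 1
draws = [rng.expovariate(1.0) for _ in range(20_000)]
# Means of 2,000 samples, each of size n = 50, from the same skewed ppn
means = [mean(rng.expovariate(1.0) for _ in range(50)) for _ in range(2_000)]

print(f"skewness of individual values: {skewness(draws):.2f}")   # clearly skewed
print(f"skewness of sample means (n=50): {skewness(means):.2f}")  # much closer to 0
```

The individual values stay skewed, but the distribution of their means is far more symmetric, which is what the CLT predicts for large n.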
Describe point estimate and interval estimate (CI)
- Point estimate: single number to estimate param of interest
- CI: Range of reasonable values intended to contain param of interest. Usually 95% confidence
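A minimal sketch of a point estimate (sample mean) and a 95% CI built from the SEM. It assumes a normal approximation (z ≈ 1.96); for small samples a t-based CI, which is slightly wider, is usually preferred. The BP readings are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def mean_ci(sample, confidence=0.95):
    """Point estimate (sample mean) and a normal-approximation CI for the ppn mean."""
    n = len(sample)
    point = mean(sample)
    sem = stdev(sample) / n ** 0.5                      # SEM = SD / sqrt(n)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return point, (point - z * sem, point + z * sem)

# Hypothetical systolic BP readings (mmHg)
bp = [118, 125, 130, 122, 128, 135, 121, 127, 124, 131]
point, (lo, hi) = mean_ci(bp)
print(f"{point:.1f} (95% CI {lo:.2f} to {hi:.2f})")  # 126.1 (95% CI 122.92 to 129.28)
```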
What is the width of CI influenced by?
- Higher confidence level = Increased width
- Increased n = Decreased width
- Increased SD = Increased Width
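The three factors can be read directly off the width of a normal-approximation CI for a mean, width = 2 × z × SD/√n. This sketch assumes that formula (z from the standard normal); the SD and n values are hypothetical.

```python
from statistics import NormalDist

def ci_width(sd, n, confidence=0.95):
    """Width of a normal-approximation CI for a mean: 2 * z * SD / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sd / n ** 0.5

print(ci_width(sd=10, n=25))                   # about 7.84 (baseline)
print(ci_width(sd=10, n=25, confidence=0.99))  # about 10.30: higher confidence, wider
print(ci_width(sd=10, n=100))                  # about 3.92: 4x the n, half the width
print(ci_width(sd=20, n=25))                   # about 15.68: double the SD, double the width
```

Because n sits under a square root, quadrupling the sample size only halves the width.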
Describe the Null and Alternative Hyp
H0: No difference/ relationship/ effect
H1: opposite of H0
Define p-value
Probability of obtaining a result at least as extreme as the one observed, assuming H0 is true
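To make "at least as extreme, assuming H0 is true" concrete, this sketch computes an exact two-sided binomial p-value by hand. The scenario (15 of 20 patients preferring drug A, H0: no preference, p0 = 0.5) is hypothetical, and `binomial_two_sided_p` is an illustrative helper that counts as "extreme" every outcome no more likely under H0 than the observed one.

```python
from math import comb

def binomial_two_sided_p(k, n, p0=0.5):
    """Exact two-sided p-value: P(outcome at least as extreme as k | H0: p = p0).
    'At least as extreme' = any outcome whose probability under H0 is <= that of k."""
    probs = [comb(n, i) * p0 ** i * (1 - p0) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    return sum(p for p in probs if p <= observed * (1 + 1e-9))  # tiny tolerance for float ties

# Hypothetical: 15 of 20 patients prefer drug A; H0 says no preference
print(round(binomial_two_sided_p(15, 20), 4))  # 0.0414
```

Here p ≈ 0.04 < 0.05, so at a = 0.05 H0 would be rejected; it does NOT mean there is a 4% chance that H0 is true.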
Define the following:
- a (alpha)
- B (beta)
- Type I error
- Type II error
- Stats Power
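- a: Significance level of the stats test, i.e. the probability of committing a Type I error
- B: Probability of failing to reject H0 when there is a true difference (H0 is false)
- Type I error: rejecting H0 when it is true (probability = a)
- Type II error: failing to reject H0 when it is false (probability = B)
- Stats power: Probability of correctly rejecting a false H0 when there is an effect = 1-B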
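A worked sketch linking a, B and power, assuming a two-sided one-sample z-test with known SD (a simplification; real studies often use t-tests). The effect size, SD and n are hypothetical. When the true difference is 0, the "power" equals a, which is exactly the Type I error rate.

```python
from statistics import NormalDist

def z_test_power(delta, sd, n, alpha=0.05):
    """Power (1 - B) of a two-sided one-sample z-test to detect a true mean difference delta."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # e.g. 1.96 for alpha = 0.05
    shift = delta * n ** 0.5 / sd                 # true effect in SEM units
    phi = NormalDist().cdf
    return phi(shift - z_crit) + phi(-shift - z_crit)

power = z_test_power(delta=5, sd=10, n=32)
beta = 1 - power
print(f"power = {power:.2f}, beta (Type II error) = {beta:.2f}")  # power = 0.81, beta (Type II error) = 0.19
```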
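Does statistical significance = clinical significance?
NO: a statistically significant difference may still be too small to matter clinically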
Considerations for deciding on appropriate stats test when comparing data between / among groups
- Number of groups being compared
- Whether groups are independent or paired/related
- Type of variable the data is
- For continuous: whether data is normally distributed
- Whether assumptions for the specific stats test are met
Describe the two statistical tests for normality and state when to use each of them.
State their Hypotheses as well
- Shapiro-Wilk test for n<50
- Kolmogorov-Smirnov test for n≥50
H0: Distribution is normal
H1: Distribution NOT normal
p < 0.05: reject H0, i.e. sample data is NOT normally distributed
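The decision rule on this card can be written as two small helpers: pick the test by sample size (the n < 50 cut-off above), then read the p-value against H0. Both names are hypothetical; the p-value itself would come from a stats package (SciPy, for example, provides `scipy.stats.shapiro`), which this sketch does not compute.

```python
def choose_normality_test(n):
    """Pick a normality test using the n < 50 cut-off on this card."""
    return "Shapiro-Wilk" if n < 50 else "Kolmogorov-Smirnov"

def interpret_normality(p_value, alpha=0.05):
    """H0: distribution is normal. p < alpha -> reject H0."""
    if p_value < alpha:
        return "Reject H0: data NOT normally distributed"
    # Failing to reject does not prove normality; it only means no evidence against it
    return "Fail to reject H0: no evidence against normality"

print(choose_normality_test(30))   # Shapiro-Wilk
print(choose_normality_test(120))  # Kolmogorov-Smirnov
print(interpret_normality(0.01))   # Reject H0: data NOT normally distributed
```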
Considerations for deciding on appropriate stats test when examining and quantifying degree of LINEAR relationship between two NUMERICAL variables
- Whether one or both are continuous or ordinal
- For continuous: check for normality
- Whether assumptions are met
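As an illustration, this sketch assumes the two commonly used choices: Pearson's r when both variables are continuous and roughly normal, and Spearman's rho (Pearson's r computed on ranks, with tied values given their average rank) when one or both are ordinal or non-normal. Data are hypothetical.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation: strength of LINEAR relationship between two numerical variables."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def ranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman_rho(x, y):
    """Spearman correlation: Pearson r applied to the ranks (ordinal or non-normal data)."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical data: dose (mg) vs pain score (0-10, ordinal)
dose = [10, 20, 30, 40, 50]
pain = [8, 7, 7, 4, 2]
print(f"Pearson r = {pearson_r(dose, pain):.3f}")       # Pearson r = -0.945
print(f"Spearman rho = {spearman_rho(dose, pain):.3f}")  # Spearman rho = -0.975
```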
Considerations for deciding on appropriate stats test when ESTIMATING effect of an INDEPENDENT VARIABLE (i.e. predictor variable) (x) on DEPENDENT VARIABLE (i.e. Outcome variable) (y)
- Whether DEPENDENT variable is continuous or ordinal or nominal
- Whether assumptions are met
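For a continuous DEPENDENT variable, one common choice is simple linear regression (an assumption here; a nominal outcome would usually call for something like logistic regression instead). This least-squares sketch, with a hypothetical `simple_linear_regression` helper and made-up data, shows how the slope estimates the effect of x on y. Python 3.10+ also ships `statistics.linear_regression` for the same fit.

```python
from statistics import mean

def simple_linear_regression(x, y):
    """Least-squares fit y = b0 + b1*x; b1 = estimated change in y per 1-unit increase in x."""
    mx, my = mean(x), mean(y)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    b0 = my - b1 * mx
    return b0, b1

# Hypothetical: dose (mg) as predictor x, drop in systolic BP (mmHg) as outcome y
dose = [10, 20, 30, 40, 50]
bp_drop = [3, 7, 8, 12, 15]
b0, b1 = simple_linear_regression(dose, bp_drop)
print(f"intercept {b0:.2f}, slope {b1:.2f} mmHg per mg")  # intercept 0.30, slope 0.29 mmHg per mg
```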