Advanced Statistics Flashcards
Survival analysis is used when:
- considering long-term effects
- understanding prognosis and treatment effectiveness.
All questions regarding survival analysis will have a ______ component.
Time
What type of distribution does survival time data typically have?
positively skewed (right-skewed)
What are the characteristics of a censored observation?
- Includes those who have not reached the terminal event by end of the study
- leads to data that is incomplete
- leads to underestimation of event occurrences.
What are the important characteristics of the Kaplan-Meier curve?
- Analyzes the probability of an event at specific time intervals
- Generates a step function whose survival estimate changes each time a patient reaches a terminal event
- accounts for the censored observations
- Often reported with confidence intervals to better estimate population parameters
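A minimal sketch of how the step function above can be computed, assuming the usual product-limit form (survival is multiplied by 1 - deaths / number at risk at each event time). The function name `kaplan_meier` and the six-subject data set are hypothetical, chosen only for illustration.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  -- follow-up time for each subject
    events -- 1 if the terminal event was observed, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    survival = 1.0
    curve = []
    for t in sorted(deaths):
        # Censored subjects still count as at risk up to their censoring time
        at_risk = sum(1 for u in times if u >= t)
        survival *= 1 - deaths[t] / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical data: 6 subjects, two of them censored (event = 0)
times = [2, 3, 3, 5, 8, 8]
events = [1, 1, 0, 1, 0, 1]
km_curve = kaplan_meier(times, events)
for t, s in km_curve:
    print(t, round(s, 3))
```

The estimate only drops at times where an event occurs; censored subjects shrink the at-risk count without producing a step.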
What is the median survival time?
the point at which the cumulative survival function = 0.5 (50th percentile)
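As a rough illustration, the median can be read off a step curve as the first time the survival estimate falls to 0.5 or below. The helper `median_survival` and the curve values are hypothetical.

```python
def median_survival(curve):
    """curve: list of (time, survival probability) pairs sorted by time."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # the curve never reaches 0.5, so the median is not estimable

# Hypothetical survival curve
curve = [(2, 0.83), (3, 0.67), (5, 0.44), (8, 0.22)]
median = median_survival(curve)
print(median)
```

If too many observations are censored, the curve may never reach 0.5, which is why the sketch returns `None` in that case.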
The mean survival time is estimated as:
the area under the K-M curve
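A sketch of the area calculation, assuming the curve is a step function and the area is taken up to a chosen time horizon (often called a restricted mean). The function name, the horizon, and the curve values are illustrative assumptions.

```python
def restricted_mean(curve, horizon):
    """Area under a step survival curve from time 0 to `horizon`.

    curve -- list of (time, survival probability) pairs sorted by time
    """
    area = 0.0
    prev_t, prev_s = 0.0, 1.0  # survival is 1.0 until the first event
    for t, s in curve:
        if t >= horizon:
            break
        area += prev_s * (t - prev_t)  # rectangle under the previous step
        prev_t, prev_s = t, s
    area += prev_s * (horizon - prev_t)  # final rectangle up to the horizon
    return area

# Hypothetical survival curve
curve = [(2, 0.8), (5, 0.5), (8, 0.2)]
mean_time = restricted_mean(curve, horizon=10)
print(mean_time)
```

When the curve does not reach zero, the area depends on the horizon, so the horizon should be reported alongside the mean.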
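What is a hazard rate? How is it estimated?
how rapidly a subject will experience a terminal event (the instantaneous event rate among those still at risk)
estimated by the slope of a line fitted to the K-M curve (more precisely, the negative slope of the log of the survival curve)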
What is a hazard ratio?
compares how often a particular terminal event happens between 2 groups.
The hazard ratio will tell us whether one group reaches the terminal event:
faster, slower, or at the same rate as the other group.
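A crude sketch of the idea, assuming a constant hazard in each group (an assumption the cards do not make), so that each hazard can be approximated as events divided by total follow-up time. In practice the ratio is usually estimated with a Cox model; the names and data below are hypothetical.

```python
def hazard_rate(times, events):
    """Events per unit of follow-up time (constant-hazard assumption)."""
    return sum(events) / sum(times)

# Hypothetical data: follow-up times and event indicators (1 = event, 0 = censored)
treat_times, treat_events = [5, 8, 12, 15], [1, 0, 1, 0]
ctrl_times, ctrl_events = [2, 4, 6, 8], [1, 1, 1, 1]

hr = hazard_rate(treat_times, treat_events) / hazard_rate(ctrl_times, ctrl_events)
print(hr)  # below 1: slower in the treated group; above 1: faster; 1: same rate
```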
When is a log-rank test at its best?
When the hazards are proportional across the groups (the survival curves do not cross)
What are the limitations of the log-rank test?
- can only test one variable at a time
- cannot control for confounders or other risk factors
- cannot include interaction terms
What would be a null and alternative hypothesis for survival analyses?
null: the groups have identical survival curves
alternative: the groups have different survival curves
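One way to test these hypotheses for two groups is the log-rank statistic. The sketch below assumes the standard form: observed minus expected events in group 1 summed over event times, squared, and divided by the hypergeometric variance, then compared with a chi-square distribution with 1 degree of freedom. The function name and data are hypothetical, and the sketch does not guard against a zero variance.

```python
import math

def log_rank(times1, events1, times2, events2):
    """Two-group log-rank test. Returns (chi-square statistic, p-value)."""
    pooled = zip(times1 + times2, events1 + events2)
    event_times = sorted({t for t, e in pooled if e == 1})
    o_minus_e = 0.0
    var = 0.0
    for t in event_times:
        n1 = sum(u >= t for u in times1)  # at risk in group 1
        n2 = sum(u >= t for u in times2)  # at risk in group 2
        d1 = sum(u == t and e == 1 for u, e in zip(times1, events1))
        d2 = sum(u == t and e == 1 for u, e in zip(times2, events2))
        n, d = n1 + n2, d1 + d2
        o_minus_e += d1 - d * n1 / n  # observed minus expected in group 1
        if n > 1:
            var += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
    stat = o_minus_e ** 2 / var
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square tail, 1 df
    return stat, p_value

# Hypothetical data: group A fails early, group B fails late
stat, p_value = log_rank([2, 3, 4, 5], [1, 1, 1, 1], [6, 7, 8, 9], [1, 1, 1, 1])
print(round(stat, 2), round(p_value, 4))
```

A small p-value is evidence against the null hypothesis that the two groups share the same survival curve.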
The Cox proportional hazards model is better than the log-rank test if you want to do what?
control for confounding variables
What is the goal of factor analysis?
- Seeks to understand whether and to what extent items from a survey or scale reflect specific constructs
- Provides information about reliability, item quality, and construct validity
- Is highly sensitive for identifying problematic items and assessing the number of factors
What are the 2 types of factor analysis?
exploratory factor analysis
confirmatory factor analysis
Factor analysis allows us to ___________ complex variables.
simplify
Factor analysis functions to find _____________ between variables.
similarities
What are the 4 uses and goals of exploratory factor analysis?
- explore the possible underlying factor structure of a set of observed variables
- identify the underlying factor structure
- describe and identify the number of factors
- provide a means of explaining variation amongst variables.
What assumptions must we make to run an exploratory factor analysis?
- continuous data
- normal distribution
- sample size is large enough (greater than 200, with more than 3 observations per variable).
- correlations between variables are greater than 0.2.
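The correlation assumption can be screened before running the analysis. The sketch below computes Pearson correlations by hand and flags item pairs whose absolute correlation is 0.2 or less; treating the threshold as an absolute value is an assumption, and the item names and scores are hypothetical (and far smaller than the sample size the cards call for).

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def weak_pairs(items, threshold=0.2):
    """Return item pairs whose |r| does not exceed the threshold."""
    names = list(items)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(items[a], items[b])) <= threshold:
                flagged.append((a, b))
    return flagged

# Hypothetical survey items scored by five respondents
items = {
    "q1": [1, 2, 3, 4, 5],
    "q2": [2, 3, 4, 5, 6],
    "q3": [3, 5, 1, 1, 5],
}
print(weak_pairs(items))
```

Items that appear in many flagged pairs are candidates for closer review before factoring.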
What are the limitations of exploratory factor analysis?
- subjective analysis
- variables may not always be generalizable to the population
- no causal inferences can be made
A confirmatory factor analysis will test the hypothesis that a relationship exists between:
the observed variables and an identified factor.
What are the limitations of a confirmatory factor analysis?
sample size must be large
very sensitive to outliers and missing data.
What is a cluster analysis?
an exploratory data analysis tool for organizing observed data into meaningful clusters based on combinations of variables.
What is agglomerative hierarchical clustering?
bottom-up: starts with each data point as its own cluster and merges it with others to form larger groups.
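A minimal sketch of the bottom-up idea on one-dimensional data, assuming single linkage (the distance between two clusters is the distance between their closest members). The linkage rule, the stopping rule, and the data are illustrative assumptions.

```python
def agglomerative(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[p] for p in points]  # every point starts as its own cluster
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical one-dimensional data
groups = agglomerative([1.0, 1.2, 5.0, 5.1, 9.0], n_clusters=3)
print(groups)
```

Choosing where to stop merging, and which linkage rule to use, is left to the analyst.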
What is divisive hierarchical clustering?
top-down: starts with the whole data set and partitions the data step by step.
What are the limitations of hierarchical clustering?
arbitrary decisions (subjective)
consideration of data types
misinterpretation possible
What is non-hierarchical clustering?
data points are grouped into non-overlapping subsets (clusters) such that each object is in exactly one cluster.
What is the most widely used clustering?
K-means clustering
What are the limitations of K-means clustering?
subjective (the number of clusters, K, must be chosen in advance)
How does K-means clustering work?
data are classified into K clusters, and each data point is assigned to the cluster with the nearest mean.
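The assign-then-update loop can be sketched as follows for one-dimensional data. Initializing the means with the first K points and running a fixed number of iterations are simplifying assumptions; the function name and data are hypothetical.

```python
def k_means(points, k, iterations=10):
    """Return (means, clusters) after repeated assign and update steps."""
    means = list(points[:k])  # naive initialization: the first k points
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point goes to the cluster with the nearest mean
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - means[i]))
            clusters[nearest].append(p)
        # Update step: recompute each mean (keep the old one if a cluster is empty)
        means = [sum(c) / len(c) if c else means[i]
                 for i, c in enumerate(clusters)]
    return means, clusters

# Hypothetical one-dimensional data
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
means, clusters = k_means(points, k=2)
print(means, clusters)
```

Different starting means can lead to different final clusters, which is one reason the result depends on analyst choices.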
What type of cluster analysis would you use if you have both categorical and continuous data?
2-step clustering or a hybrid approach
What are some benefits of 2-step clustering?
- allows clusters to be created from both categorical and continuous variables.
- number of clusters is automatically determined
- makes the analysis of a large data set very efficient
The cluster quality validation index measures:
how well the general goal of clustering is achieved.
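The cards do not say which index is meant. One common choice is the silhouette coefficient, which compares each point's average distance to its own cluster (cohesion) with its average distance to the nearest other cluster (separation). The sketch below assumes that index, one-dimensional data, and at least two clusters; the function name and data are hypothetical.

```python
def silhouette(clusters):
    """Mean silhouette score over all points (ranges from -1 to 1)."""
    scores = []
    for ci, cluster in enumerate(clusters):
        for p in cluster:
            if len(cluster) == 1:
                scores.append(0.0)  # convention for single-point clusters
                continue
            # a: mean distance to the other members of the same cluster
            a = sum(abs(p - q) for q in cluster) / (len(cluster) - 1)
            # b: mean distance to the nearest other cluster
            b = min(sum(abs(p - q) for q in other) / len(other)
                    for cj, other in enumerate(clusters) if cj != ci)
            scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical clusterings of the same six points
good = silhouette([[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]])
bad = silhouette([[1.0, 10.0, 3.0], [2.0, 11.0, 12.0]])
print(round(good, 3), round(bad, 3))
```

Scores near 1 suggest compact, well-separated clusters; scores near 0 or below suggest overlapping or misassigned points.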