Objective 4 - Predictive Modeling Flashcards by Rachel Kullman

Risk factors that indicate whether a person may have high claims

Inherent risk factors, such as age, sex, and race
Medical condition-related factors, such as diabetes or cancer
Family history (for conditions that are inheritable)
Lifestyle risk factors, such as smoking, lack of exercise, and poor nutrition
External risk factors, such as industry, location, and education

Types of medical management interventions

Care coordination (focuses on the system) - includes case management, discharge planning, and in-hospital care coordination
Condition management (focuses on the patient) - includes disease management and risk factor management
Provider management (focuses on the provider) - includes provider profiling, pay-for-performance, and accountable care organizations

Areas where condition-based models are used in healthcare financial applications

Program management - identifying high-risk individuals, financial modeling and resource allocation, and program evaluation (eg, calculating savings)
Provider or health plan reimbursement - normalizing populations to pay providers or plans for the risks they accept and to evaluate provider effectiveness. Profiling providers to assess quality and efficiency
Actuarial and U/W functions - pricing health plans, underwriting groups, and projecting future claims costs

Types of predictive models that are not based on medical conditions (traditional “non-condition risk-based” models)

Age/sex - rates are established for a group based on the average age/sex factor of the members in the group (works best for large groups w/ age/sex factors close to 1.0)
Prior cost - the prior year’s claims are used to project future costs (is reasonably accurate for large groups, but not for smaller groups)
Combination of age/sex and prior cost - often used for rating smaller groups
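
The combination approach above can be sketched as a credibility-weighted blend of the two rates. This is only an illustration: the function names, the square-root credibility rule, and the 2,000-member full-credibility threshold are assumptions and do not come from the flashcards.

```python
import math

def age_sex_rate(base_pmpm, member_factors):
    """Manual rate: base PMPM times the group's average age/sex factor."""
    avg_factor = sum(member_factors) / len(member_factors)
    return base_pmpm * avg_factor

def credibility(members, full_credibility_members=2000):
    """Square-root rule capped at 1.0 (the threshold is an assumed value)."""
    return min(1.0, math.sqrt(members / full_credibility_members))

def blended_rate(prior_cost_pmpm, manual_pmpm, members):
    """Weight prior cost by credibility Z and the age/sex rate by 1 - Z."""
    z = credibility(members)
    return z * prior_cost_pmpm + (1 - z) * manual_pmpm

# Hypothetical small group: average age/sex factor of 1.1 on a 400 base PMPM
manual = age_sex_rate(400.0, [0.8, 1.0, 1.2, 1.4])
small_group = blended_rate(prior_cost_pmpm=520.0, manual_pmpm=manual, members=500)
```

With 500 members the assumed rule gives the group's own experience half the weight, so the blended rate falls midway between the prior cost and the age/sex rate. A large group would lean almost entirely on prior cost.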

Sources of data for developing risk factors

Claims data - for medical condition-related risk factors such as diabetes or cancer
Self-reported data - for lifestyle-related risk factors such as smoking, stress, lack of exercise, poor nutrition, etc. (see separate list of risk factors identified by a health risk assessment)
External data - for external risk factors such as industry, geography, education, and income level

Risk factors identified by a health risk assessment

Personal disease history
Family disease history
Health screenings and immunizations
Alcohol consumption
Injury prevention behavior
Nutrition
Physical activity
Skin protection
Stress and well-being
Tobacco use
Weight management
Women’s health (eg, pregnancy status)
General health assessment
Functional health status
Mental health status

Types of data sources for predictive modeling

Physician referral/chart (high reliability, low practicality) - medical charts provide the most information, but have serious drawbacks (see separate list)
Enrollment (high reliability, high practicality) - can be used to convert claims data into PMPM amounts
Claims (medium reliability, high practicality) - usually available to health plans and continually refreshed as events occur. Data quality varies greatly (must check for accuracy). Lots of info is provided in claim forms for hospital (UB04) and professional (CMS 1500) claims.
Pharmacy (medium reliability, high practicality) - high quality data that completes quickly. But there is no diagnosis on the claims, and prescriptions that aren’t filled won’t generate claims
Laboratory values (high reliability, low practicality) - can be difficult to obtain, and vendors do not use a standard format
Self-reported (low/medium reliability, low practicality) - will become important since members can report info that isn’t available elsewhere, but there are drawbacks (see separate list)
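
The enrollment item above notes that enrollment data converts claims into PMPM (per member per month) amounts. A minimal sketch of that arithmetic follows, using made-up member IDs and dollar amounts:

```python
def pmpm(total_paid_claims, member_months):
    """Per-member-per-month cost: claim dollars divided by exposure."""
    if member_months <= 0:
        raise ValueError("member_months must be positive")
    return total_paid_claims / member_months

# Hypothetical enrollment file: member id -> months enrolled in the period
enrollment = {"A": 12, "B": 6, "C": 12}
# Hypothetical claims file: (member id, paid amount)
claims = [("A", 1200.0), ("B", 300.0), ("A", 600.0)]

member_months = sum(enrollment.values())
total_paid = sum(amount for _, amount in claims)
cost_pmpm = pmpm(total_paid, member_months)
```

Member C has no claims but still contributes 12 member months to the denominator, which is why the enrollment file is needed alongside the claims file.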

Drawbacks of using data from medical charts

They do not cover OON services or drugs prescribed by OON providers
They do not record the patient’s compliance with physician orders (such as prescription filling)
Transcribing the data and transferring it to a uniform format is time consuming and requires highly-trained staff
There is no uniformity in how physicians code conditions and their severity
Charts are typically unavailable to the health plan or the actuary

Advantages and disadvantages of using diagnosis codes for identifying member conditions

Advantages:
1. Codes are almost always present on medical claims
2. A uniform format exists
3. Usefulness for identifying conditions
Disadvantages:
1. Usually only the primary and secondary codes are populated in the claims data
2. Coding errors may occur
3. Codes may sometimes be selected to drive maximum reimbursement
4. Different physicians may follow different coding practices

Drawbacks of using survey data

Surveys must be commissioned, budgeted, and executed in order to generate the data
Data isn’t updated as medical events occur, so it can become stale unless the survey is updated periodically
Response bias can make it dangerous to draw conclusions from survey responses
Respondents may submit untruthful answers

Questions to answer when building a clinical identification algorithm

A clinical identification algorithm is a set of rules that is applied to a claims data set to identify the conditions present in the population

Where are the diagnoses?
What is the source of the diagnosis (claims, medical charts, etc.)?
If the source is claims, what claims should be considered (inpatient, outpatient, lab, etc.)?
If the claim contains more than one diagnosis, how many diagnoses will be considered for identification?
Over what time span, and how often, will a diagnosis have to appear in claims for that diagnosis to be incorporated?
What procedures may be useful for determining severity of a diagnosis?
What prescription drugs may be used to identify conditions?
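
The questions above translate into parameters of a rule set. The sketch below is one hypothetical way to encode them. The claim layout, the default values, and the use of the ICD-10 prefix E11 (type 2 diabetes) are illustrative assumptions, not a published algorithm.

```python
from collections import defaultdict
from datetime import date

def identify_members(claims, code_prefixes, start, end,
                     claim_types=("inpatient", "outpatient"),
                     max_dx_positions=2, min_service_dates=2):
    """Return members with a qualifying diagnosis on enough distinct dates."""
    prefixes = tuple(code_prefixes)
    dates_by_member = defaultdict(set)
    for claim in claims:
        if claim["claim_type"] not in claim_types:
            continue  # which claims are considered
        if not (start <= claim["service_date"] <= end):
            continue  # time span
        considered = claim["diagnoses"][:max_dx_positions]  # how many diagnoses
        if any(dx.startswith(prefixes) for dx in considered):
            dates_by_member[claim["member_id"]].add(claim["service_date"])
    # how often the diagnosis must appear
    return {m for m, d in dates_by_member.items() if len(d) >= min_service_dates}

def claim(member_id, claim_type, service_date, diagnoses):
    return {"member_id": member_id, "claim_type": claim_type,
            "service_date": service_date, "diagnoses": diagnoses}

sample_claims = [
    claim("M1", "outpatient", date(2023, 1, 10), ["E11.9"]),
    claim("M1", "outpatient", date(2023, 6, 2), ["I10", "E11.65"]),
    claim("M2", "outpatient", date(2023, 3, 15), ["E11.9"]),
    claim("M3", "lab", date(2023, 2, 1), ["E11.9"]),
    claim("M3", "lab", date(2023, 4, 1), ["E11.9"]),
    claim("M4", "outpatient", date(2023, 2, 1), ["I10", "J45.909", "E11.9"]),
    claim("M4", "outpatient", date(2023, 5, 1), ["I10", "J45.909", "E11.9"]),
]
identified = identify_members(sample_claims, ["E11"],
                              start=date(2023, 1, 1), end=date(2023, 12, 31))
```

Under these assumed defaults only M1 is identified: M2 has a single qualifying date, M3 appears only on lab claims, and M4's diabetes code sits in the third diagnosis position. Loosening any one rule identifies more members.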

Challenges when constructing a condition-based model

The large # of procedure and drug codes
Deciding the severity level at which to recognize the condition
The impact of co-morbidities for conditions that are often found together
The degree of certainty with which the diagnosis has been identified
The extent of the data (claims data will cover all members, but self-reported data will not)
The type of benefit design that underlies the data

Definitions of sensitivity and specificity

When building clinical identification algorithms, the proper balance between sensitivity and specificity must be found
1. Sensitivity - the % of members who have a condition that are correctly identified as having it (the “true positive” rate)
2. Specificity - the % of members who do not have a condition that are correctly identified as not having it (the “true negative” rate)
Specificity may be more important for underwriting, while sensitivity may be more important for care management, since clinicians can verify the presence of a condition.
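
Both measures can be computed from a comparison of the algorithm's flags against a verified reference. The sketch below uses a made-up population of 1,000 members to show the arithmetic:

```python
def confusion_counts(has_condition, flagged, population):
    """Count true/false positives and negatives from sets of member ids."""
    tp = len(has_condition & flagged)               # have it, flagged
    fp = len(flagged - has_condition)               # flagged, do not have it
    fn = len(has_condition - flagged)               # have it, missed
    tn = len(population - has_condition - flagged)  # correctly left alone
    return tp, fp, fn, tn

def sensitivity(tp, fn):
    return tp / (tp + fn)

def specificity(tn, fp):
    return tn / (tn + fp)

# Hypothetical data: 100 members truly have the condition; the algorithm
# flags 125 members, 80 of whom truly have it.
population = set(range(1000))
has_condition = set(range(100))
flagged = set(range(20, 145))

tp, fp, fn, tn = confusion_counts(has_condition, flagged, population)
sens = sensitivity(tp, fn)   # 80 / 100
spec = specificity(tn, fp)   # 855 / 900
```

Loosening the identification rules generally raises sensitivity at the expense of specificity, which is the balance the card describes.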

External sources of clinical identification algorithms

HEDIS (from the NCQA) has algorithms for identifying some conditions (eg, asthma, high blood pressure, diabetes)
Disease Management Association of America (now Care Continuum Alliance) developed algorithms for identifying chronic diseases
Grouper models - commercially-available models that identify member conditions and score them for relative risk and cost
Literature - articles will sometimes report the codes that are used for analysis

Reasons for using commercially-available grouper models

Building algorithms from scratch requires a considerable amount of work
Models must be maintained to accommodate new codes, which requires even more work
Commercially-available models are accessible to many users. Providers and plans often require that payments be based on a model that is available for review and validation

Common features of Medicare prospective payment systems

A system of averages - providers cannot expect to make a profit on each case, but efficient providers can make a reasonable return on average
Increased complexity - DRGs are more complicated than a system based on per diem payments
Relative weights - associated with each patient group to reflect the average resources used by efficient providers
Conversion factor (base price) - the dollar amount for a unit of services. Is multiplied by the relative weight to determine payment
Outliers - unusual cases that require above-average resources and receive extra payments
Updates - the conversion factor and relative weights are adjusted annually to reflect new technologies and changing practice patterns
Access and quality - policymakers monitor PPSs and survey patients to ensure that beneficiaries have adequate access to high quality care and that providers are compensated adequately
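
The relative weight, conversion factor, and outlier features above combine into a simple payment formula. The sketch below is illustrative only. The dollar amounts, the fixed-loss threshold, and the 80% outlier share are assumed parameters, and actual CMS formulas include further adjustments not shown here.

```python
def pps_payment(relative_weight, conversion_factor,
                case_cost=None, outlier_threshold=None, outlier_share=0.8):
    """Base payment = relative weight x conversion factor, plus any outlier add-on."""
    base = relative_weight * conversion_factor
    outlier = 0.0
    if case_cost is not None and outlier_threshold is not None:
        # Assumed rule: pay a share of cost above base payment plus a fixed-loss amount
        excess = case_cost - (base + outlier_threshold)
        outlier = outlier_share * max(0.0, excess)
    return base + outlier

typical_case = pps_payment(relative_weight=1.5, conversion_factor=6000.0)
outlier_case = pps_payment(relative_weight=1.5, conversion_factor=6000.0,
                           case_cost=50_000.0, outlier_threshold=30_000.0)
```

The typical case is paid 1.5 x 6,000 = 9,000 regardless of its actual cost, which is the "system of averages" idea. The outlier case earns an add-on only for cost beyond the assumed threshold.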

Challenges with patient classification systems based on coding systems

Need for new DRGs - due to new diseases and new procedures
ICD coding - some codes may not be sufficiently precise as diseases and procedures are refined
Upcoding - providers may be tempted to exaggerate a patient’s secondary diagnoses to get paid more
New coding systems - adopting the new ICD-10 systems will be a major challenge for hospitals and CMS

Factors for choosing the right predictive model

Correlation structure - more complicated models may be needed for data containing correlated variables
Purpose of the analysis
The nature of the available data
Characteristics of the outcome variable (eg, quantitative vs. qualitative, unrestricted vs. truncated, binary choice vs. unrestricted choice)
Distribution of the outcome variable (eg, normal vs. skewed)
Functional relationship (eg, linear vs. non-linear) - when the equation cannot be transformed into a linear form, iterative processes or a maximum likelihood procedure may be used instead of ordinary regression methods
Complex decision model - whether a single equation model is sufficient or a simultaneous equation model is needed (if there is more than one dependent variable)

Steps of the data warehousing process

Identify which patients to include in the dataset
Identify which data elements to merge with the patient list
Identify what the data says about the patient (eg, create flags that describe the patient’s health and risk status)
Attach the derived variables and flags to the patient identifiers to create a picture of the patient history
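
The four steps above can be sketched with plain dictionaries. The field names, the 10,000 high-cost cutoff, and the E11 diabetes prefix are hypothetical choices made for illustration.

```python
def build_patient_records(patient_ids, claims, flag_rules):
    # Step 1: the patient list defines who is in the dataset
    records = {pid: {"total_paid": 0.0, "diagnoses": set()} for pid in patient_ids}
    # Step 2: merge the chosen data elements onto the patient list
    for claim in claims:
        record = records.get(claim["member_id"])
        if record is None:
            continue  # claim belongs to someone outside the dataset
        record["total_paid"] += claim["paid"]
        record["diagnoses"].update(claim["diagnoses"])
    # Steps 3 and 4: derive flags and attach them to each patient identifier
    for record in records.values():
        record["flags"] = {name: rule(record) for name, rule in flag_rules.items()}
    return records

# Hypothetical flag definitions
flag_rules = {
    "high_cost": lambda r: r["total_paid"] > 10_000,
    "diabetes": lambda r: any(dx.startswith("E11") for dx in r["diagnoses"]),
}
claims = [
    {"member_id": "P1", "paid": 12_500.0, "diagnoses": ["E11.9"]},
    {"member_id": "P2", "paid": 300.0, "diagnoses": ["J45.909"]},
    {"member_id": "P9", "paid": 999.0, "diagnoses": ["I10"]},
]
records = build_patient_records(["P1", "P2", "P3"], claims, flag_rules)
```

Each resulting record carries the merged data elements plus the derived flags, keyed by patient identifier. Patients with no claims (P3 here) still get a record.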

Characteristics for assessing the quality of a model

Parsimony - should introduce as few variables as are necessary to produce the desired results
Identifiability - if there are more dependent variables than independent equations, then issues such as bias will result
Goodness of fit - variations in the outcome variable should be explained to a high degree by the explanatory variables (measured by R^2 and other statistics)
Theoretical consistency - results should be consistent with the analyst’s prior knowledge of the relationships between variables
Predictive power - should predict well when applied to data that was not used in building the model

Statistics for determining whether a model is good

R^2 - measures how much of the variation in the dependent variable is explained by the variation in the independent variables. A more valid measure may be Adjusted R^2 = 1 - (1 - R^2) * (N - 1) / (N - k - 1), where N = # of observations and k = # of parameters.
Regression coefficients - examine the signs of the parameter estimates to ensure they make sense, then determine whether the value of the parameter estimate is statistically significant
F-Test - ratio of variance explained by the model divided by unexplained or error variance
Statistics used for logistic models:
a. Hosmer-Lemeshow statistic
b. Somers’ D statistic
c. C-statistic
Multicollinearity - occurs when a linear relationship exists between the independent variables. May be addressed by removing one of the collinear variables.
Heteroscedasticity - occurs when the error terms do not have a constant variance
Autocorrelation - occurs when the error terms in the regression function are correlated with one another
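
The R^2, Adjusted R^2, and F-test items above can be checked with a few lines of arithmetic. In this sketch k is taken to be the number of explanatory variables excluding the intercept, which is an assumption about the card's "# of parameters". The F formula is the standard overall-significance form for a linear regression with an intercept.

```python
def r_squared(actual, predicted):
    """Share of variation in the outcome explained by the model."""
    mean = sum(actual) / len(actual)
    ss_total = sum((y - mean) ** 2 for y in actual)
    ss_residual = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    return 1 - ss_residual / ss_total

def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (N - 1) / (N - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def f_statistic(r2, n, k):
    """Explained variance per variable over unexplained variance per degree of freedom."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

# Made-up values for illustration
r2_example = r_squared([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 3.0, 3.5])
adj = adjusted_r_squared(0.5, n=101, k=4)
f_value = f_statistic(0.5, n=101, k=4)
```

Adjusted R^2 is never higher than R^2 when k is at least 1, and the gap widens as variables are added relative to the number of observations. This ties the statistic back to the parsimony criterion.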

Re-sampling methods for validating a model

These approaches help test the model’s predictive power

Bootstrap - the sampling distribution of an estimator is estimated by sampling with replacement from an original sample
Jackknife - the estimate of a statistic is systematically re-computed, leaving out one observation at a time from the sample set
Cross-validation - subsets of data are held out for use as validating sets
Permutation test - a reference distribution is obtained by calculating all possible values of the test statistic under rearrangements of the labels on the observed data points
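
The bootstrap and jackknife descriptions above can be sketched with the standard library alone. The statistic used here (the mean of a small made-up cost sample) and the number of resamples are arbitrary illustrative choices.

```python
import random
import statistics

def bootstrap_means(sample, n_resamples=1000, seed=0):
    """Resample with replacement and recompute the mean each time."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)]

def jackknife_means(sample):
    """Recompute the mean leaving out one observation at a time."""
    return [statistics.fmean(sample[:i] + sample[i + 1:]) for i in range(len(sample))]

# Hypothetical annual costs for eight members
costs = [120.0, 450.0, 80.0, 2300.0, 610.0, 95.0, 1500.0, 330.0]

boot = bootstrap_means(costs)
boot_std_error = statistics.stdev(boot)   # spread of the resampled means
jack = jackknife_means(costs)
```

The spread of the bootstrap means approximates the sampling variability of the estimator. The jackknife values show how much any single observation, such as the 2,300 claimant, moves the estimate.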

Factors used in developing risk scores in the CMS-HCC risk model

HCC = hierarchical condition category

Demographics - age and gender factors are the starting point. Higher risk scores are assigned to beneficiaries who are eligible for both Medicaid and Medicare.
Disabled indicators - a separate set of age and gender factors are used for beneficiaries under age 65 who are eligible for Medicare due to disability
Separate models are used for beneficiaries who:
a. Reside in a long-term care institution, or
b. Suffer from end-stage renal disease
New enrollees - since no claim history exists, only age and gender factors are used. Separate factors are developed for new enrollees
A prospective risk adjustment methodology is used to risk-adjust future payments based on actual historical medical experience
Calibration - every 2 yrs, CMS re-calibrates by updating the model weights to reflect new prescription drugs and changes in medical technologies, practice patterns, and provider coding practices
Health status risk factors are developed from the beneficiary’s diseases (using ICD-9 codes and grouping into HCCs)
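
The way these pieces add up to a risk score can be sketched as a demographic factor plus condition factors, with hierarchies keeping only the most severe category in a related group. Every factor value, category name, and age band below is hypothetical. None is an actual CMS coefficient, and for brevity the new-enrollee case reuses the same demographic table rather than the separate factors the card mentions.

```python
# All values below are made up for illustration
DEMOGRAPHIC_FACTORS = {("F", "70-74"): 0.40, ("M", "70-74"): 0.45}
HCC_FACTORS = {
    "DIABETES_COMPLICATED": 0.30,
    "DIABETES_UNCOMPLICATED": 0.10,
    "CHF": 0.35,
}
# A more severe category suppresses the less severe ones listed for it
HIERARCHIES = {"DIABETES_COMPLICATED": {"DIABETES_UNCOMPLICATED"}}
MEDICAID_ADD_ON = 0.15

def apply_hierarchies(hccs):
    """Drop categories that are outranked by a more severe one present."""
    dropped = set()
    for hcc in hccs:
        dropped |= HIERARCHIES.get(hcc, set())
    return set(hccs) - dropped

def risk_score(sex, age_band, hccs, dual_eligible=False, new_enrollee=False):
    score = DEMOGRAPHIC_FACTORS[(sex, age_band)]   # starting point
    if dual_eligible:
        score += MEDICAID_ADD_ON                   # Medicaid and Medicare eligible
    if not new_enrollee:                           # no claim history for new enrollees
        score += sum(HCC_FACTORS[h] for h in apply_hierarchies(hccs))
    return score

conditions = {"DIABETES_COMPLICATED", "DIABETES_UNCOMPLICATED", "CHF"}
score = risk_score("F", "70-74", conditions, dual_eligible=True)
```

In this made-up case the uncomplicated diabetes category contributes nothing because the complicated category outranks it, so the score is 0.40 + 0.15 + 0.30 + 0.35.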

Central features of Massachusetts health care reform

Establishment of an exchange (purchasing pool)
A requirement that all employers establish Section 125 accounts (so employees could pay premiums on a pre-tax basis)
Large subsidies for families living below 300% of FPL
For those above 300% of FPL, availability of a more limited plan (so insurance would be affordable even outside the subsidy range)
A mandate that all individuals must purchase health insurance coverage
Funding through use of federal funds previously paid to safety net hospitals or paid for uncompensated care

Steps for developing and using predictive models for care management programs

1. Choose a disease or condition - programs should focus on diseases that:
a. Are reasonably prevalent in typical commercial populations
b. Can lead to costly exacerbations if not appropriately treated, and
c. Have treatments that are relatively low cost that are within the control of the member
2. Rank conditions based on intervenability (the susceptibility of the condition to external management). Prioritize interventions based on an intervenability score rather than the highest risk scores.
3. Identify the population - construct algorithms to identify members who are at risk
4. Plan the intervention - identify the issue to address and the mechanism by which it will be addressed. Use care management nurses to assess and design a care plan for patients identified by the predictive model.
5. Perform economic modeling of the proposed program - must decide the best population penetration level to achieve the most savings. The Risk Management Economic Model can be used for this
6. Develop the predictive model.
7. Test actual outcomes against predictions, and use this info to modify the model and the program

Metrics that should be recognized in the Risk Management Economic Model

1. The # and risk-intensity of members to be targeted - the # must be large enough to produce savings that offset implementation costs, but not so large that marginal costs exceed marginal savings
2. Types of interventions to be used in the program - such as mail or automated outbound dialing
3. The # of nurses and other staff needed for the program, and program costs
4. The methodology for contacting and enrolling members
5. The rules for integrating the program with the rest of the care management system
6. The timing and #s of contacts, enrollments, and interventions
7. The predicted behavior of the target population if there were no intervention, and the predicted effectiveness of the intervention at modifying that behavior
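
The first metric describes a marginal trade-off: keep adding members while the savings from the next member exceed the cost of managing that member. A simplified sketch, assuming a flat per-member program cost, a one-time implementation cost, and made-up predicted savings per member:

```python
def best_penetration(predicted_savings, cost_per_member, fixed_cost):
    """Return (members to target, net savings) that maximizes net savings.

    Members are ranked by predicted savings, highest first. Returning
    (0, 0.0) means no program size recovers the implementation cost.
    """
    ranked = sorted(predicted_savings, reverse=True)
    best_n, best_net = 0, 0.0
    running_net = -fixed_cost
    for n, saving in enumerate(ranked, start=1):
        running_net += saving - cost_per_member  # marginal savings less marginal cost
        if running_net > best_net:
            best_n, best_net = n, running_net
    return best_n, best_net

# Hypothetical predicted savings for six candidate members
savings = [5000.0, 3000.0, 1500.0, 800.0, 400.0, 200.0]
members_targeted, net_savings = best_penetration(
    savings, cost_per_member=1000.0, fixed_cost=2000.0)
```

Here the fourth member would save only 800 against a 1,000 management cost, so the sketch stops at three members. A fuller model would also reflect staffing, contact and enrollment rates, and timing from the remaining metrics.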

Most common types of health risk

1. Pricing risk - made up of severity and frequency of events (known risk)
2. Underwriting risk - risk that overall pool will perform worse than expected (unknown risk)

Definition of a high-risk member

A member who has a significant probability of experiencing higher-than-average costs in the near future (such as next 12 mos). There is not a consistently successful method for identifying these members.

Commercially-available grouper models

1. Johns Hopkins Adjusted Clinical Groups (ACG) System - case-mix adjustment measure for ambulatory and inpatient diagnoses, based on Aggregated Diagnosis Groups, age, and sex
2. Diagnosis Related Groups (DRGs) - used by CMS and some commercial payers to ensure consistent reimbursement of hospitals for patients with the same risk profile; accounts for complications and co-morbidities
3. Chronic Illness and Disability Payment System (CDPS) - diagnosis-based risk adjustment model used by states to adjust payments for Medicaid beneficiaries
4. Clinical Risk Groups (CRGs) - identify groups of individuals requiring similar amounts and types of resources; similar to DRGs, but for all care over an extended time period
5. Diagnostic Cost Groups/Hierarchical Condition Category (DCG/HCC) - developed as a health adjuster for Medicare inpatient and ambulatory care based on age, gender, dual eligible status, disabled status, and diagnosis, with a focus on high-cost diagnoses
6. Sightlines DxCG Risk Solutions - uses demographics and claims (medical & pharmacy) to quantify the illness burden of a population for commercial, Medicare, and Medicaid populations; a relative risk score is developed
7. Episode Treatment Groups (ETGs) - case-mix adjustment and episode-building system used to develop a relative risk score based on complete treatment episodes

Drug grouper models

1. Therapeutic class groupers - group drugs into a hierarchy of therapeutic classes
a. American Hospital Formulary Service (AHFS)
b. Generic Product Identifier (GPI)
2. Drug-based risk adjustment models - infer the member's diagnosis from the therapeutic class of drugs the member uses and generate a relative risk score
a. Medicaid Rx
b. Pharmacy Risk Groups (PRGs)
c. RxGroups (DxCG)

Worksheets required for Medicare Advantage bid

1. Bid-specific base period experience and key assumptions for contract year projection
2. Calculates projected allowed costs for the contract year (credibility-blended if necessary)
3. Projected cost sharing by medical service category for the contract year
4. Development of net medical costs, including expenses and margin and supplemental benefits
5. Calculates benchmark and evaluates whether the plan realizes a savings or needs to charge a basic member premium
6. Summary of results
7. Pricing for optional supplemental benefit packages

CommCare risk sharing arrangements

1. Risk adjustment methodology applied to the medical capitation rate is a form of risk sharing between the plans and the Connector
2. Aggregate risk sharing corridors apply to all health plans. The Connector shares 50% of the risk for claims more than 2% above or below the capitation payment.
3. Specific outlier stop-loss pool that pays for 75% of specific claims above a $150K threshold. Funded by health plans at 1.25% of the capitation rate.
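
The corridor and stop-loss percentages above can be turned into a short calculation. The sketch reads the corridor as symmetric, with the Connector paying the plan when claims run high and the plan paying the Connector when claims run low. It also applies the stop-loss per individual claimant. Both readings, and the example dollar amounts, are assumptions.

```python
def corridor_settlement(claims, capitation, corridor=0.02, share=0.50):
    """Positive: Connector pays the plan. Negative: plan pays the Connector."""
    upper = capitation * (1 + corridor)
    lower = capitation * (1 - corridor)
    if claims > upper:
        return share * (claims - upper)
    if claims < lower:
        return -share * (lower - claims)
    return 0.0  # inside the corridor the plan keeps the full gain or loss

def stop_loss_recovery(member_claims, threshold=150_000.0, share=0.75):
    """Pool pays a share of one claimant's costs above the threshold."""
    return share * max(0.0, member_claims - threshold)

def stop_loss_funding(capitation, rate=0.0125):
    """Plan's contribution to the pool as a share of capitation."""
    return capitation * rate

# Hypothetical plan with 1,000,000 of capitation revenue
high_year = corridor_settlement(claims=1_100_000.0, capitation=1_000_000.0)
low_year = corridor_settlement(claims=900_000.0, capitation=1_000_000.0)
large_claim = stop_loss_recovery(250_000.0)
pool_contribution = stop_loss_funding(1_000_000.0)
```

In the high-claims year the plan bears the first 2% of the overrun alone and half of the remaining 80,000. The low-claims year mirrors this in the other direction.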