T,E+S P2 Flashcards
For the data quality assessment, which scores are quantitative? What percentage should they exceed?
Accessibility
Consistency
Completeness
Relevance
50%
What is the formula for accessibility? (Data fields the dataset can provide without further modification)
A = (N_explicit + 0.8 * N_implicit + 0.2 * N_inferred) / N_variables * 100
N_explicit = variables that already exist
N_implicit = variables newly created
N_inferred = variables modified for analysis
This is assessed for each data field
Accessibility: the data fields the dataset can provide without modification
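A minimal Python sketch of the accessibility formula may help check a hand calculation. The function name and argument names are illustrative only, and passing the total number of variables as a separate argument is an assumption about how N variables is counted.

```python
def accessibility_score(n_explicit: int, n_implicit: int, n_inferred: int,
                        n_variables: int) -> float:
    """Accessibility as a percentage, using the 1 / 0.8 / 0.2 weights from the card."""
    if n_variables <= 0:
        raise ValueError("n_variables must be positive")
    # Explicit variables count fully; implicit and inferred ones are down-weighted.
    weighted = n_explicit + 0.8 * n_implicit + 0.2 * n_inferred
    return weighted / n_variables * 100


# Hypothetical counts: 6 explicit, 3 implicit, 1 inferred out of 10 variables.
score = accessibility_score(6, 3, 1, 10)
print(round(score, 1))
```

With these made-up counts the weighted sum is 6 + 2.4 + 0.2 = 8.6, so the score works out to 86%.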
What is the formula for consistency? (Number of data items provided in the same manner within a data field)
C = (n_consistent + 0.8 * n_recoded + 0.2 * n_newlycoded) / n_variables * 100
Consistency: the number of data items provided in the same manner within a data field
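The consistency formula can be sketched in Python in the same way. The function and argument names are hypothetical, and the grouping of the three weighted terms over n variables is assumed to mirror the accessibility formula.

```python
def consistency_score(n_consistent: int, n_recoded: int, n_newly_coded: int,
                      n_variables: int) -> float:
    """Consistency as a percentage, using the 1 / 0.8 / 0.2 weights from the card."""
    if n_variables <= 0:
        raise ValueError("n_variables must be positive")
    # All three weighted counts share the same denominator.
    weighted = n_consistent + 0.8 * n_recoded + 0.2 * n_newly_coded
    return weighted / n_variables * 100


# Hypothetical counts: 5 consistent, 4 recoded, 1 newly coded out of 10 variables.
score = consistency_score(5, 4, 1, 10)
print(round(score, 1))
```

With these made-up counts the weighted sum is 5 + 3.2 + 0.2 = 8.4, so the score works out to 84%.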
What is the formula for completeness? (Difference between the percentage of complete data and the percentage of missing data)
Completeness = 1 - arithmetic mean of the missing-value percentages
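A short Python sketch of the completeness calculation, under two assumptions that are not stated on the card: missing values are marked as None, and each missing-value percentage is expressed as a fraction between 0 and 1 so that it can be subtracted from 1. The function names are illustrative.

```python
from statistics import mean


def missing_fraction(values: list) -> float:
    """Share of entries in one data field that are missing (None)."""
    return sum(v is None for v in values) / len(values)


def completeness_score(fields: dict) -> float:
    """1 minus the arithmetic mean of the per-field missing fractions."""
    return 1 - mean(missing_fraction(values) for values in fields.values())


# Hypothetical dataset with two data fields.
fields = {
    "location": ["A", None, "C", None],  # half of the values are missing
    "severity": [1, 2, 3, 4],            # nothing is missing
}
print(completeness_score(fields))
```

Here the missing fractions are 0.5 and 0.0, their mean is 0.25, and completeness is 0.75 (75%).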
What is the formula for relevance? (Data fields relevant to the analysis)
Relevance = (1 / N_categories) * sum over categories (number of relevant data fields in the category / number of requested data fields in the category * 100)
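The relevance formula is an average of per-category percentages, which a small Python sketch can make concrete. The function name and the (relevant, requested) pair layout are assumptions made for illustration.

```python
def relevance_score(categories: list) -> float:
    """Mean over categories of (relevant fields / requested fields * 100).

    `categories` holds one (n_relevant, n_requested) pair per category.
    """
    if not categories:
        raise ValueError("at least one category is required")
    percentages = [n_relevant / n_requested * 100
                   for n_relevant, n_requested in categories]
    return sum(percentages) / len(percentages)


# Hypothetical example: 3 of 4 requested fields relevant in one category,
# 1 of 2 in another.
print(relevance_score([(3, 4), (1, 2)]))
```

The two categories score 75% and 50%, so the overall relevance is 62.5%.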
What are the qualitative analysis variables?
Accuracy
Credentials
Interpretability
Timeliness
What is accuracy?
The extent to which data values are free from error
What is credentials?
The extent to which data are sourced from reliable sources; this can be measured through a survey
What is interpretability?
The extent to which data are completed in an unambiguous manner, so that data fields are interpreted correctly and in a similar manner
What is timeliness?
The elapsed time between an incident occurring and its being recorded
How is the level of narrative detail defined?
The information included in the narrative is key for finding contextual and contributory factors
Use a smaller sample of 2-5% to determine whether narratives of more than 50 words provide key information about the parameters in the dataset (score 1 if yes, 0 if no)
What is the formula for narrative detail?
Narrative = (1 / N_params) * sum(N_incidents where j = 1 / N_incidents) * 100
Sum across the parameters
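The narrative formula can be sketched in Python as an average, across parameters, of the share of sampled incidents scored 1. The function name and the row-per-incident, column-per-parameter layout of the 0/1 scores are assumptions for illustration.

```python
def narrative_score(flags: list) -> float:
    """Narrative detail as a percentage.

    flags[i][j] is 1 if the narrative of sampled incident i gives key
    information about parameter j, and 0 otherwise.
    """
    n_incidents = len(flags)
    n_params = len(flags[0])
    # For each parameter, the share of sampled incidents scored 1.
    shares = [sum(row[j] for row in flags) / n_incidents
              for j in range(n_params)]
    # Average the shares across the parameters and express as a percentage.
    return sum(shares) / n_params * 100


# Hypothetical sample: 4 incidents scored against 2 parameters.
sample = [
    [1, 0],
    [1, 1],
    [0, 1],
    [1, 1],
]
print(narrative_score(sample))
```

Each parameter is covered in 3 of the 4 sampled narratives, so the score is 75%.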