Data Analysis Flashcards
Data analysis must be based on a solid understanding of statistical analysis and epidemiological concepts.
Definitions used in data analysis
The data include all positive cases, taking variables into account and decreasing the number of false negatives.
Sensitivity
The data include only those cases specific to the needs of the measurement, excluding those from a different population, thereby decreasing the number of false positives.
Specificity
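Worked example: sensitivity and specificity are commonly expressed as ratios of case counts. The sketch below assumes the usual definitions (true positives over all actual positives, true negatives over all actual negatives). The function names and the counts are hypothetical and chosen only for illustration.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of actual positive cases that the measure captures."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of actual negative cases that the measure correctly excludes."""
    return true_neg / (true_neg + false_pos)


# Hypothetical screening counts, invented for illustration only.
tp, fn, tn, fp = 90, 10, 160, 40

print(sensitivity(tp, fn))  # fewer false negatives pushes this toward 1
print(specificity(tn, fp))  # fewer false positives pushes this toward 1
```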
Data are classified according to subsets, taking variables into consideration.
Stratification
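Worked example: one way to picture stratification is to split records into subsets by a variable and compute the measure within each subset. This is a minimal sketch; the records, the age groups, and the `stratified_rates` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (age group, whether the event occurred).
records = [
    ("under 65", False), ("under 65", True),
    ("under 65", False), ("under 65", False),
    ("65 and over", True), ("65 and over", True),
    ("65 and over", False), ("65 and over", False),
]


def stratified_rates(rows):
    """Return the event rate within each stratum."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [events, total]
    for stratum, event in rows:
        counts[stratum][0] += int(event)
        counts[stratum][1] += 1
    return {stratum: events / total for stratum, (events, total) in counts.items()}


rates = stratified_rates(records)
print(rates)
```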
The tool or indicator collects and measures the necessary data.
Recordability
Results should be reproducible.
Reliability
The tool or indicator should be easy to use and understand.
Usability
Collection measures the target adequately, so that the results have predictive value.
Validity
a method by which to identify patterns and relationships in large amounts of data, such as the identification of risk factors or the effectiveness of interventions.
Knowledge discovery in databases (KDD)
The steps to KDD include
selecting data, preprocessing (e.g., assembling target data set, cleaning data of noise), transforming data, data mining, and interpreting results.
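Worked example: the KDD steps can be sketched as a chain of small functions, one per step. Everything here is an assumption made for illustration: the records, the field names, and the use of a simple average as a stand-in for the data mining step.

```python
from statistics import mean

# Hypothetical raw records; None and -1 stand in for noise.
raw = [
    {"unit": "ICU", "los_days": 4},
    {"unit": "ICU", "los_days": None},
    {"unit": "ICU", "los_days": 6},
    {"unit": "Surgical", "los_days": 2},
    {"unit": "Surgical", "los_days": -1},
    {"unit": "Surgical", "los_days": 3},
    {"unit": "Pharmacy", "los_days": 0},
]


def select(rows, units):
    """Selecting data: keep only the target units."""
    return [r for r in rows if r["unit"] in units]


def preprocess(rows):
    """Preprocessing: drop missing or impossible values."""
    return [r for r in rows if r["los_days"] is not None and r["los_days"] >= 0]


def transform(rows):
    """Transforming data: reshape into unit -> list of stay lengths."""
    grouped = {}
    for r in rows:
        grouped.setdefault(r["unit"], []).append(r["los_days"])
    return grouped


def mine(grouped):
    """Data mining (stand-in): average length of stay per unit."""
    return {unit: mean(values) for unit, values in grouped.items()}


def interpret(averages):
    """Interpreting results: state the finding in plain words."""
    longest = max(averages, key=averages.get)
    return f"{longest} has the longest average stay"


averages = mine(transform(preprocess(select(raw, {"ICU", "Surgical"}))))
summary = interpret(averages)
print(averages)
print(summary)
```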
the analysis (often automatic) of large amounts of data to identify underlying or hidden patterns.
Data mining
may be applied to multiple patients’ electronic health records to generate information about the need for further examination or interventions.
Data mining
The steps to data mining include
detecting anomalies, identifying relationships, clustering, classifying, regressing, and summarizing.
involves electronically searching through large amounts of information to find relevant items.
Data mining
Data mining uses several tools to look for patterns:
Association rule mining
Classification
Clustering
This tool looks for patterns in which a certain data object shows up repeatedly (more than randomly) and is associated with an unrelated data object.
Association rule mining
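Worked example: "more than randomly" is often quantified with support, confidence, and lift, where a lift above 1 suggests two items appear together more often than chance would predict. The sketch below assumes those standard measures; the records and item labels are hypothetical.

```python
# Hypothetical records, each a set of items observed together.
transactions = [
    {"item A", "item B"},
    {"item A", "item B", "item C"},
    {"item B"},
    {"item A", "item B"},
    {"item D"},
]


def support(itemset, rows):
    """Fraction of records that contain every item in the itemset."""
    return sum(itemset <= row for row in rows) / len(rows)


def confidence(antecedent, consequent, rows):
    """How often the consequent appears when the antecedent appears."""
    return support(antecedent | consequent, rows) / support(antecedent, rows)


def lift(antecedent, consequent, rows):
    """Confidence relative to the consequent's baseline frequency."""
    return confidence(antecedent, consequent, rows) / support(consequent, rows)


rule_lift = lift({"item A"}, {"item B"}, transactions)
print(rule_lift)  # above 1 here, so A and B co-occur more than chance suggests
```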
This tool looks for data group membership. An example would be identifying which days in a year belong to the "sunny" group in order to count the number of sunny days.
Classification
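Worked example: group membership can be assigned by comparing a new item with items whose group is already known. The sketch below uses a one-nearest-neighbour rule, which is only one of many classification techniques; the sunshine hours and labels are hypothetical.

```python
# Hypothetical labelled examples: (daily hours of sunshine, group label).
labelled = [
    (9.0, "sunny"),
    (8.0, "sunny"),
    (2.0, "not sunny"),
    (1.0, "not sunny"),
]


def classify(value, examples):
    """Assign the label of the closest labelled example (1-nearest neighbour)."""
    nearest = min(examples, key=lambda example: abs(example[0] - value))
    return nearest[1]


print(classify(7.5, labelled))
print(classify(3.0, labelled))
```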
This tool organizes data objects according to their similar characteristics. This results in a natural pattern or clustering of similar data.
Clustering
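Worked example: unlike classification, clustering starts without predefined groups. The sketch below groups one-dimensional values by splitting wherever the gap between neighbouring values exceeds a threshold. This is a deliberately simple stand-in for fuller clustering methods; the `cluster_by_gap` helper, the threshold, and the values are hypothetical.

```python
def cluster_by_gap(values, max_gap):
    """Group sorted values; start a new cluster when the gap exceeds max_gap."""
    ordered = sorted(values)
    if not ordered:
        return []
    clusters = [[ordered[0]]]
    for value in ordered[1:]:
        if value - clusters[-1][-1] <= max_gap:
            clusters[-1].append(value)  # close enough to the previous value
        else:
            clusters.append([value])    # too far away, so open a new cluster
    return clusters


# Hypothetical ages, invented for illustration only.
ages = [23, 25, 24, 61, 64, 60, 41]
clusters = cluster_by_gap(ages, max_gap=5)
print(clusters)
```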
Data mining can also be called
Knowledge discovery
refers to the collection and summation of data for further use, such as for statistical analysis.
Data aggregation
may be used to collect information about an individual from multiple sources, often for targeted marketing purposes.
Data aggregation
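Worked example: aggregation can be pictured as collecting records about the same individual from several sources and summing them. The sources, identifiers, and the `aggregate` helper below are hypothetical.

```python
from collections import defaultdict

# Hypothetical records from two sources: (identifier, visit count).
clinic_visits = [("patient-1", 2), ("patient-2", 1)]
hospital_visits = [("patient-1", 1), ("patient-3", 4)]


def aggregate(*sources):
    """Collect records from every source and sum the counts per identifier."""
    totals = defaultdict(int)
    for source in sources:
        for key, count in source:
            totals[key] += count
    return dict(totals)


totals = aggregate(clinic_visits, hospital_visits)
print(totals)
```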
show the spread or dispersion of data.
Measures of distribution
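Worked example: common ways to quantify spread include the range, the variance, and the standard deviation. The sketch below uses Python's standard `statistics` module and treats the data as a whole population; the values are hypothetical.

```python
from statistics import pstdev, pvariance

# Hypothetical values, invented for illustration only.
values = [2, 4, 4, 4, 5, 5, 7, 9]

value_range = max(values) - min(values)  # distance between the extremes
variance = pvariance(values)             # mean squared deviation from the mean
std_dev = pstdev(values)                 # square root of the variance

print(value_range, variance, std_dev)
```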