Data Analysis Flashcards
Data analysis must be based on a solid understanding of statistical analysis and epidemiological concepts.
Definitions used in data analysis
The data include all positive cases, taking variables into account and decreasing the number of false negatives.
Sensitivity
The data include only those cases specific to the needs of the measurement, excluding those from a different population, thereby decreasing the number of false positives.
Specificity
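For illustration, a minimal Python sketch (with hypothetical counts, not from the source) showing how sensitivity and specificity are calculated from true/false positives and negatives:

```python
# Hypothetical screening-test counts (assumed values for illustration only).
true_positives = 90   # positive cases correctly identified
false_negatives = 10  # positive cases missed
true_negatives = 85   # negative cases correctly excluded
false_positives = 15  # negative cases wrongly included

# Sensitivity: proportion of actual positives captured (fewer false negatives).
sensitivity = true_positives / (true_positives + false_negatives)

# Specificity: proportion of actual negatives excluded (fewer false positives).
specificity = true_negatives / (true_negatives + false_positives)

print(f"Sensitivity: {sensitivity:.2f}")  # 0.90
print(f"Specificity: {specificity:.2f}")  # 0.85
```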
Data are classified according to subsets, taking variables into consideration.
Stratification
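A minimal sketch, assuming pandas is available, of stratifying hypothetical records into subsets according to a variable of interest:

```python
import pandas as pd

# Hypothetical records (illustration only).
records = pd.DataFrame({
    "age_group": ["<40", "40-65", "<40", ">65", "40-65"],
    "readmitted": [0, 1, 1, 1, 0],
})

# Classify the data into subsets (strata) by the age_group variable
# and summarize the outcome within each stratum.
strata = records.groupby("age_group")["readmitted"].mean()
print(strata)
```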
The tool/indicator collects and measures the necessary data.
Recordability
Results should be reproducible.
Reliability
The tool or indicator should be easy to use and understand.
Usability
Collection measures the target adequately, so that the results have predictive value.
Validity
a method by which to identify patterns and relationships in large amounts of data, such as the identification of risk factors or the effectiveness of interventions.
Knowledge discovery in database (KDD)
The steps to KDD include
selecting data, preprocessing (e.g., assembling target data set, cleaning data of noise), transforming data, data mining, and interpreting results.
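A minimal sketch, assuming pandas is available and using made-up records, of the early KDD steps (selecting, cleaning, and transforming data before mining):

```python
import pandas as pd

# Hypothetical raw records (illustration only).
raw = pd.DataFrame({
    "age": [34, 51, None, 29],
    "smoker": ["yes", "no", "yes", "no"],
    "readmitted": [1, 0, 1, 0],
})

target = raw[["age", "smoker", "readmitted"]]   # select the target data set
clean = target.dropna()                         # preprocess: drop noisy/missing rows
transformed = clean.assign(                     # transform: encode categories numerically
    smoker=clean["smoker"].map({"yes": 1, "no": 0})
)
print(transformed)
# Data mining and interpretation of results would follow on `transformed`.
```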
the analysis (often automatic) of large amounts of data to identify underlying or hidden patterns.
Data Mining
may be applied to multiple patients’ electronic health records to generate information about the need for further examination or interventions.
Data Mining
The steps to data mining include
detecting anomalies, identifying relationships, clustering, classifying, regressing, and summarizing.
involves electronically searching through large amounts of information to find relevant items.
Data Mining
Data mining uses several tools to look for patterns:
Association rule mining
Classification
Clustering
This tool looks for patterns in which a certain data object shows up repeatedly (more often than random chance would predict) in association with an otherwise unrelated data object.
Association Rule mining
This tool looks for data group membership. An example would be classifying each day of the year as sunny or not sunny.
Classification
This tool organizes data objects according to their similar characteristics. This results in a natural pattern or clustering of similar data.
Clustering
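A minimal sketch, assuming scikit-learn is available, of clustering hypothetical two-dimensional data points into groups of similar items:

```python
from sklearn.cluster import KMeans

# Hypothetical data points (illustration only).
points = [[1, 2], [1, 4], [2, 3],     # one natural grouping
          [9, 8], [10, 9], [9, 10]]   # another natural grouping

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.labels_)  # each point's cluster membership, e.g., [0 0 0 1 1 1]
```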
Data mining can also be called
Knowledge discovery
refers to the collection and summation of data for further use, such as for statistical analysis.
Data aggregation
may be used to collect information about an individual from multiple sources, often for targeted marketing purposes.
Data aggregation
show the spread or dispersion of data.
Measures of distribution
is the distance from the highest to the lowest number.
Range
measures the distribution spread around an average value.
Variance
is the square root of the variance and shows the dispersion of data above and below the mean in equally measured distances.
Standard deviation
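A minimal sketch using Python's standard statistics module to compute the range, variance, and standard deviation of a hypothetical data set:

```python
import statistics

data = [4, 7, 7, 8, 10, 12, 15]           # hypothetical values (illustration only)

data_range = max(data) - min(data)         # distance from highest to lowest: 11
variance = statistics.variance(data)       # spread around the mean (sample variance)
std_dev = statistics.stdev(data)           # square root of the variance

print(data_range, round(variance, 2), round(std_dev, 2))
```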
a method of comparing rates or ratios.
Chi-square (χ²)
a means by which to establish if a variance in categorical data (as opposed to numerical data) is of statistical significance.
Chi-square test
generally used to show whether there is a significant difference between groups or conditions being analyzed.
Chi-square testing
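A minimal sketch, assuming SciPy is available, applying a chi-square test to a hypothetical 2 × 2 table of categorical counts:

```python
from scipy.stats import chi2_contingency

# Hypothetical counts: rows are treatment vs. control,
# columns are improved vs. not improved (illustration only).
observed = [[30, 10],
            [20, 20]]

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, p = {p_value:.3f}")
# A small p value (e.g., < 0.05) suggests the difference between groups
# is statistically significant.
```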
examines two sets of similar data to determine whether there is a statistically significant difference between the means of the two groups.
The “t” test
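A minimal sketch, assuming SciPy is available, comparing the means of two similar, hypothetical data sets with an independent-samples t test:

```python
from scipy.stats import ttest_ind

# Hypothetical measurements from two groups (illustration only).
group_a = [5.1, 4.9, 6.0, 5.5, 5.8]
group_b = [6.2, 6.5, 5.9, 6.8, 6.1]

t_stat, p_value = ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
# p < 0.05 would indicate a statistically significant difference in means.
```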
used to evaluate the data sets found in scattergrams; it compares the relationship between the dependent variable and the independent variable to determine if the relationship correlates.
Regression analysis
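A minimal sketch, assuming SciPy is available, fitting a simple linear regression to hypothetical scattergram data to check whether the dependent and independent variables correlate:

```python
from scipy.stats import linregress

x = [1, 2, 3, 4, 5, 6]                   # independent variable (illustration only)
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]      # dependent variable

result = linregress(x, y)
print(f"slope = {result.slope:.2f}, r = {result.rvalue:.2f}")
# An r value close to +1 or -1 indicates a strong linear relationship.
```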
is essential because attempting performance improvement and developing practice guidelines without data can be problematic.
Integrating the results of data analysis
should assist with case management, decision-making about individual care, improvement of critical pathways related to clinical performance, staff performance evaluations, credentialing, and privileging.
Integration of information
the process of changing information from a given source (such as a data entry terminal) into information that can be understood by a destination point (such as a large database).
Data transformation
Data transformation is performed in two steps:
Data mapping
code generation
This process develops a map of how information flows from one place to another and determines which parts of the information need to be transformed.
Data Mapping
This is when the actual transformation occurs and the data is converted into a form compatible with its destination.
Code generation
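A minimal sketch (with hypothetical field names) of the two-step transformation: the mapping describes how source fields flow to destination fields, and the conversion step produces data in the destination's format:

```python
# Hypothetical source record from a data entry terminal (illustration only).
source_record = {"pt_name": "DOE, JANE", "dob": "19750214"}

# Step 1: data mapping -- which source field feeds which destination field,
# and how each value must be transformed.
field_map = {
    "pt_name": ("patient_name", str.title),                            # fix capitalization
    "dob": ("date_of_birth", lambda d: f"{d[:4]}-{d[4:6]}-{d[6:]}"),   # to ISO date
}

# Step 2: code generation/execution -- apply the map so the data become
# compatible with the destination database.
destination_record = {
    dest: transform(source_record[src])
    for src, (dest, transform) in field_map.items()
}
print(destination_record)  # {'patient_name': 'Doe, Jane', 'date_of_birth': '1975-02-14'}
```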
can be verbal (e.g., spoken/written representations), analog (e.g., television, radio, telephone, recorded), or digital (e.g., coded).
Data Representation
uses continuous waveform signals varying in intensity.
Analog representation
uses codes (usually numeric), such as the binary code (base 2) to represent values.
Computerized representation of data
composed of strings of 1s and 0s, with 1s stored in magnetized areas of disks and 0s stored in non-magnetized areas; thus, 1 represents “on,” and 0 represents “off.”
binary code
Each representation (0 or 1) is referred to as a
Bit - binary digit
8 bits =
1 byte
1 byte can represent
256 characters
1,000 bytes =
1 kilobyte
1 million bytes =
1 megabyte
1 billion bytes =
1 gigabyte
1 trillion bytes =
1 terabyte
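A short Python sketch illustrating these bit and byte relationships, using the decimal definitions given above:

```python
bits_per_byte = 8
values_per_byte = 2 ** bits_per_byte     # 256 distinct characters per byte

kilobyte = 1_000                         # bytes
megabyte = 1_000_000
gigabyte = 1_000_000_000
terabyte = 1_000_000_000_000

print(values_per_byte)                   # 256
print(terabyte // gigabyte)              # 1,000 gigabytes per terabyte
```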
the pattern of 0s and 1s used to represent characters.
The coding scheme
the most common binary coding scheme is
American Standard Code for Information Interchange (ASCII)
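A minimal Python sketch showing how ASCII maps characters to numeric codes and their underlying 8-bit patterns:

```python
for char in "Hi!":
    code = ord(char)                         # numeric code for the character
    print(char, code, format(code, "08b"))   # character, decimal code, 8-bit pattern
# H 72 01001000
# i 105 01101001
# ! 33 00100001
```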
Each character represents 4 binary bits; thus, 1 byte can be represented by 2 hexadecimal characters.
Hexadecimal
Uses a base of 16 and 16 symbols (usually the numerals 0–9, representing values 0 to 9, and the letters A through F, representing values 10–15).
Hexadecimal coding
One digit (4 bits) is referred to as a
nibble
8 bits (1 byte) are referred to as an
octet
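A minimal Python sketch showing that one byte (an octet) can be written as two hexadecimal characters, each representing a 4-bit nibble:

```python
byte_value = 0b10101111           # one byte = 8 bits
print(hex(byte_value))            # 0xaf -> two hex characters, 'a' and 'f'

high_nibble = byte_value >> 4     # upper 4 bits -> 0b1010 = 0xA = 10
low_nibble = byte_value & 0x0F    # lower 4 bits -> 0b1111 = 0xF = 15
print(high_nibble, low_nibble)    # 10 15
```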
used with the Universal Character Set, this standardized coding system has a large capacity and can be used to represent text for most languages, including Asian languages.
The Unicode standard coding scheme
provides a specific numeric value for each character and can be used across multiple platforms.
Unicode
represents all alphabets of the world's languages, ideographic sets, symbols, and 100 scripts, and is particularly valuable for making coding accessible internationally.
Unicode
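A minimal Python sketch showing Unicode code points for characters from different scripts and their UTF-8 byte encodings:

```python
for char in ["A", "é", "漢"]:
    print(char, f"U+{ord(char):04X}", char.encode("utf-8"))
# A U+0041 b'A'
# é U+00E9 b'\xc3\xa9'
# 漢 U+6F22 b'\xe6\xbc\xa2'
```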