Course 4: Module 1 Flashcards
Data integrity
The accuracy, completeness, consistency, and trustworthiness of data throughout its lifecycle.
Data replication
The process of storing data in multiple locations
Data transfer
The process of copying data from a storage device to memory, or from one computer to another
Data manipulation
The process of changing data to make it more organized and easier to read
Other threats to data integrity
- Human error
- Viruses
- Malware
- Hacking
- System failures
Types of insufficient data
- Data from only one source
- Data that keeps updating
- Outdated data
- Geographically limited data
Ways to address insufficient data
- Identify trends with the available data
- Wait for more data if time allows
- Talk with stakeholders and adjust your objective
- Look for a new dataset
Population
All possible data values in a certain dataset
Sample size
The number of items in a sample, which is a part of a population chosen to be representative of that population.
Sampling bias
When a sample isn’t representative of the population as a whole because some members are overrepresented or underrepresented.
Random sampling
A way of selecting a sample from a population so that every member of the population has an equal chance of being chosen.
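A minimal Python sketch of random sampling, using the standard library's random.sample. The population of ID numbers, the helper name simple_random_sample, and the seed are hypothetical and only there for illustration.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Draw k distinct members; each member is equally likely to be picked."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(list(population), k)

# Hypothetical population: customer IDs 1 through 100
population = list(range(1, 101))
sample = simple_random_sample(population, 10, seed=42)
print(sample)
```

Sampling without replacement, as here, means no member can appear in the sample twice.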
Margin of error
Since a sample is used to represent a population, the sample’s results are expected to differ from what the result would have been if you had surveyed the entire population. The maximum expected difference is called the margin of error. The smaller the margin of error, the closer the sample’s results are to what you would have found by surveying the entire population.
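One common way to estimate the margin of error for a survey percentage is z * sqrt(p * (1 - p) / n). The sketch below assumes a simple random sample and a normal approximation; the function name and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist

def margin_of_error(p_hat, n, confidence=0.95):
    """Approximate margin of error for a sample proportion p_hat from n responses."""
    # z-score that leaves (1 - confidence) / 2 in each tail of the normal curve
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical survey: 50% answered "yes"
moe_small = margin_of_error(0.5, 100)    # 100 responses
moe_large = margin_of_error(0.5, 1000)   # 1,000 responses
print(round(moe_small, 3), round(moe_large, 3))
```

The larger sample gives the smaller margin of error, which is why sample size matters when planning a survey.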
Confidence level
How confident you are in the survey results. For example, a 95% confidence level means that if you ran the same survey 100 times, you would expect similar results about 95 of those times. The confidence level is set before you start your study because it affects how big your margin of error is at the end of the study.
Confidence interval
The range of values expected to contain the population’s result at the confidence level of the study. This range is the sample result +/- the margin of error.
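As a rough illustration of "sample result +/- the margin of error", the sketch below builds a confidence interval around a sample average. It assumes a normal approximation, which is more reasonable for larger samples; the ratings data and the function name are made up for the example.

```python
import math
from statistics import NormalDist, mean, stdev

def confidence_interval(values, confidence=0.95):
    """Return (low, high) around the sample mean using a normal approximation."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * stdev(values) / math.sqrt(len(values))  # margin of error
    center = mean(values)                                # sample result
    return center - margin, center + margin

# Hypothetical satisfaction ratings on a 1-10 scale
ratings = [7, 8, 6, 9, 7, 8, 7, 6, 8, 9, 7, 8, 6, 7, 8, 9, 7, 8, 7, 6]
low_95, high_95 = confidence_interval(ratings, 0.95)
low_99, high_99 = confidence_interval(ratings, 0.99)
print(low_95, high_95)
```

Asking for a higher confidence level on the same data produces a wider interval.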
Statistical significance
The determination of whether your result could be due to random chance or not. The greater the significance, the less likely the result is due to chance.
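To make "due to random chance" concrete, the sketch below asks how often a fair coin would give a result at least as extreme as the one observed. The coin-flip scenario and the function name are hypothetical; real tests depend on the kind of data being analyzed.

```python
from math import comb

def chance_of_at_least(heads, flips):
    """Probability that a fair coin gives at least `heads` heads in `flips` flips."""
    favorable = sum(comb(flips, k) for k in range(heads, flips + 1))
    return favorable / 2 ** flips

# Hypothetical results: 60 heads versus 50 heads out of 100 flips
p_60 = chance_of_at_least(60, 100)
p_50 = chance_of_at_least(50, 100)
print(p_60, p_50)
```

A small probability, commonly below 0.05, suggests the result is unlikely to be due to chance alone, while a large one means chance is a plausible explanation.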
Statistical power
The probability that a test detects a meaningful result when one actually exists.