WEEK 1: The importance of integrity Flashcards
Good alignment
Means that the data is relevant and can help you solve a business problem or determine a course of action to achieve a given business objective.
Limitations of insufficient data you might come across
Data from only one source
Data that keeps updating
Not enough data to know if a number is too low or too high
Outdated data
Geographically limited data
Ways to handle insufficient data
Identify trends with the available data
Wait for more data if time allows
Talk with stakeholders and adjust your objective
Look for a new data set
Things to remember when determining the size of your sample
Don’t use a sample size less than 30.
The confidence level most commonly used is 95%, but 90% can work in some cases.
Increase the sample size to meet specific needs of your project:
For a higher confidence level, use a larger sample size
To decrease the margin of error, use a larger sample size
For greater statistical significance, use a larger sample size
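The effect of confidence level and margin of error on sample size can be checked with a quick calculation. This is a minimal sketch that uses Cochran's formula for estimating a proportion, n = z² · p(1 − p) / e². The formula, the assumption of a large (effectively infinite) population, and the conservative guess p = 0.5 are not from the notes above; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def sample_size(confidence=0.95, margin_of_error=0.05, p=0.5, minimum=30):
    """Estimate a sample size for a proportion with Cochran's formula.

    Assumes a large population; p=0.5 is the most conservative
    guess for the true proportion (it maximizes p * (1 - p)).
    """
    # z-score for a two-sided interval, about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = (z ** 2) * p * (1 - p) / margin_of_error ** 2
    # Round up, and never go below the 30-sample floor from the notes
    return max(minimum, math.ceil(n))

for conf, moe in [(0.90, 0.05), (0.95, 0.05), (0.99, 0.05), (0.95, 0.03)]:
    print(f"confidence={conf:.0%}, margin={moe:.0%}: n={sample_size(conf, moe)}")
```

This prints n = 271, 385, 664, and 1068: raising the confidence level or shrinking the margin of error both push the required sample size up, matching the rules of thumb above.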
Data integrity
is the accuracy, completeness, consistency, and trustworthiness of data throughout its lifecycle.
Data replication
is the process of storing data in multiple locations.
How data replication can compromise data integrity
Different people might not be using the same copy of the data for their findings, which can cause inconsistencies.
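One way to catch this kind of inconsistency is to compare two copies of the data record by record. This is a minimal Python sketch, assuming each copy fits in memory as a dictionary keyed by a record ID; the function name and the sample sales data are hypothetical.

```python
def find_replica_mismatches(primary, replica):
    """Compare two copies of the same table, keyed by record ID.

    Returns (IDs missing from the replica, IDs missing from the primary,
    IDs present in both copies but holding different values).
    """
    missing_in_replica = sorted(primary.keys() - replica.keys())
    missing_in_primary = sorted(replica.keys() - primary.keys())
    # Records that exist in both copies but disagree
    conflicting = sorted(
        key for key in primary.keys() & replica.keys()
        if primary[key] != replica[key]
    )
    return missing_in_replica, missing_in_primary, conflicting

# Hypothetical example: two teams working from different copies of sales data
primary = {
    101: {"region": "West", "sales": 1200},
    102: {"region": "East", "sales": 950},
    103: {"region": "North", "sales": 700},
}
replica = {
    101: {"region": "West", "sales": 1200},
    102: {"region": "East", "sales": 990},
    104: {"region": "South", "sales": 500},
}

print(find_replica_mismatches(primary, replica))  # ([103], [104], [102])
```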
Data transfer
Is the process of copying data from a storage device to memory, or from one computer to another.
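A common way to confirm that a transfer did not damage or truncate the data is to compare a checksum of the source and the copy. This is a minimal sketch using SHA-256 from Python's standard library; the helper names and the sample file are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large files don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def copy_and_verify(src, dst):
    """Copy src to dst, then confirm both files have the same SHA-256 hash."""
    shutil.copyfile(src, dst)
    return sha256_of(src) == sha256_of(dst)

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "sales.csv"
    src.write_text("id,region,sales\n101,West,1200\n102,East,950\n")
    dst = Path(tmp) / "sales_copy.csv"
    print(copy_and_verify(src, dst))  # True
```

If the hashes differ, the copy should be treated as incomplete or corrupted and transferred again.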
Data manipulation
Is the process of changing the data to make it more organized and easier to read.
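As an illustration of data manipulation, this minimal Python sketch tidies a few records by trimming and normalizing text, converting numbers stored as text, and sorting. The field names and values are hypothetical. The function works on a copy so that a mistake in the manipulation step does not overwrite the original data.

```python
raw = [
    {"date": "2023-03-02", "region": " east ", "sales": "950"},
    {"date": "2023-03-01", "region": "West", "sales": "1200"},
    {"date": "2023-03-01", "region": "east", "sales": "400"},
]

def tidy(records):
    """Return a cleaned, sorted copy; the original list is left unchanged."""
    cleaned = [
        {
            "date": r["date"],
            "region": r["region"].strip().title(),  # normalize spacing and case
            "sales": int(r["sales"]),               # store numbers as numbers
        }
        for r in records
    ]
    # Organize by date, then region, so the table is easier to scan
    return sorted(cleaned, key=lambda r: (r["date"], r["region"]))

for row in tidy(raw):
    print(row)
```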
Who ensures data integrity
In many companies, the data warehouse or data engineering team takes care of ensuring data integrity.
Statistical power
can be calculated and reported for a completed experiment to comment on the confidence one might have in the conclusions drawn from the results of the study. It can also be used as a tool to estimate the number of observations or sample size required in order to detect an effect in an experiment.
A statistical power of at least 0.8 (80%) is the usual minimum. It means that if a real effect exists, the test has at least an 80% chance of producing a statistically significant result.
Statistical power
Is the probability that a test will detect an effect that is actually present, that is, the probability of getting meaningful results from a test.
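Both uses of statistical power described above, computing power for a given sample and estimating the sample size needed to reach 0.8, can be sketched with the normal approximation. This minimal sketch assumes a two-sided, two-sample z-test with the effect size expressed as Cohen's d (difference in means divided by the standard deviation); neither the test nor the effect-size measure comes from the notes, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def power_two_sample(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    shift = effect_size * math.sqrt(n_per_group / 2)
    # Probability the test statistic lands in either rejection region
    return Z.cdf(shift - z_crit) + Z.cdf(-shift - z_crit)

def n_per_group_for_power(effect_size, power=0.8, alpha=0.05):
    """Per-group sample size that reaches the target power (normal approximation)."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    z_power = Z.inv_cdf(power)
    return math.ceil(2 * ((z_crit + z_power) / effect_size) ** 2)

n = n_per_group_for_power(0.5)
print(n, round(power_two_sample(0.5, n), 3))
```

For a medium effect (d = 0.5) at alpha = 0.05, this gives 63 participants per group with power just above 0.8. Calculators based on the t-test give a slightly larger answer (about 64 per group), because the normal approximation ignores the extra uncertainty of estimating the standard deviation.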