WEEK 2 Flashcards
Bias
Has evolved to become a preference in favor of or against a person, group of people, or thing.
Data bias
Is a type of error that systematically skews results in a certain direction.
What do you have to think about when you collect data?
As a data analyst, you have to think about bias and fairness from the moment you start collecting data to the time you present your conclusions.
Sampling bias
Is when a sample isn’t representative of the population as a whole.
Unbiased sampling
Results in a sample that’s representative of the population being measured.
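A minimal Python sketch contrasting sampling bias with unbiased sampling. The population of customer ages and the "under-40 only" selection rule are hypothetical, chosen only to make the skew visible. `random.Random.sample` draws a simple random sample, where every member of the population has the same chance of being picked.

```python
import random
import statistics

# Hypothetical population: 1,000 customer ages spread evenly from 18 to 77.
population = [18 + (i % 60) for i in range(1000)]


def random_sample(data, k, seed=42):
    """Unbiased (simple random) sample: every member has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(data, k)


def convenience_sample(data, k):
    """Biased sample (hypothetical rule): only customers under 40 can ever be selected."""
    younger = [age for age in data if age < 40]
    return younger[:k]


pop_mean = statistics.mean(population)
unbiased_mean = statistics.mean(random_sample(population, 100))
biased_mean = statistics.mean(convenience_sample(population, 100))

print(f"Population mean age:    {pop_mean:.1f}")
print(f"Random sample mean age: {unbiased_mean:.1f}")
print(f"Biased sample mean age: {biased_mean:.1f}")
```

Because the biased rule never selects anyone 40 or older, its mean (27.9) lands far below the population mean (47.1), while the random sample's mean should land close to it. Drawing a larger biased sample would not help: the error is systematic, which is what makes it data bias rather than random noise.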
3 TYPES OF DATA BIAS
Observer bias, interpretation bias, and confirmation bias
Observer bias
It’s the tendency for different people to observe things differently.
Interpretation bias
The tendency to always interpret ambiguous situations in a positive or negative way.
Confirmation bias
Is the tendency to search for or interpret information in a way that confirms preexisting beliefs.
Observer bias is sometimes referred to as:
Experimental bias
Research bias
ROCCC process
Reliable
Original
Current
Comprehensive
Cited
Reliable
With reliable data, you can trust that you’re getting accurate, complete, and unbiased information that’s been vetted and proven fit for use.
Original
There’s a good chance you’ll discover data through a second- or third-party source. To make sure you’re dealing with good data, validate it with the original source.
Comprehensive
The best data sources contain all critical information needed to answer the question or find the solution.
Current
The usefulness of data decreases as time passes. If you wanted to invite all current clients to a business event, you wouldn’t use a 10-year-old client list. The same goes for data.
Cited
If you’ve ever told a friend where you heard that a new movie sequel was in the works, you’ve cited a source. Citing makes the information you’re providing more credible.
There are lots of places known for having good data.
Your best bet is to go with vetted public datasets, academic papers, financial data, and government agency data.
Bad data
Not Reliable
Not Original
Not Current
Not Comprehensive
Not Cited
Not Reliable
Bad data can’t be trusted because it’s inaccurate, incomplete, or biased.
This could be data that has sample selection bias because it doesn’t reflect the overall population.
Or it could be data visualizations and graphs that are just misleading.
Not Original
If you can’t locate the original data source and you’re relying only on second- or third-party information, that’s a signal to be extra careful in understanding your data.
Not Comprehensive
Bad data sources are missing important information needed to answer the question or find the solution. What’s worse, they may contain human error, too.
Not Current
Bad data sources are out of date and irrelevant. Many respected sources refresh their data regularly, giving you confidence that it’s the most current info available.
Not Cited
If your source hasn’t been cited or vetted, it’s a no-go.
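The ROCCC criteria and their "bad data" opposites can be sketched as a simple checklist in Python. This is a minimal illustration, not a standard tool: the `DataSource` fields and the `roccc_problems` helper are hypothetical, the qualitative judgments (vetted, from the original source) are reduced to yes/no flags by assumption, and the 365-day cutoff for "current" is an arbitrary choice you would tune to your question.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DataSource:
    """Hypothetical summary of a data source; each field stands in for one ROCCC judgment."""
    name: str
    vetted: bool                 # Reliable: accurate, complete, unbiased, proven fit for use
    from_original_source: bool   # Original: validated against the original source
    fields_available: set        # Comprehensive: the columns the source actually provides
    last_updated: date           # Current: when the data was last refreshed
    citation: Optional[str]      # Cited: where the data came from (None if unknown)


def roccc_problems(source, required_fields, today, max_age_days=365):
    """Return the ROCCC criteria the source fails; an empty list means no red flags."""
    problems = []
    if not source.vetted:
        problems.append("Not Reliable")
    if not source.from_original_source:
        problems.append("Not Original")
    # Every field needed to answer the question must be present.
    if not set(required_fields) <= source.fields_available:
        problems.append("Not Comprehensive")
    # max_age_days is an assumed cutoff, not a fixed rule.
    if (today - source.last_updated).days > max_age_days:
        problems.append("Not Current")
    if not source.citation:
        problems.append("Not Cited")
    return problems


# A 10-year-old client list with no stated origin.
old_client_list = DataSource(
    name="client list",
    vetted=True,
    from_original_source=True,
    fields_available={"name", "email"},
    last_updated=date(2014, 1, 15),
    citation=None,
)
print(roccc_problems(old_client_list, {"name", "email", "company"}, today=date(2024, 1, 15)))
# → ['Not Comprehensive', 'Not Current', 'Not Cited']
```

A checklist like this only records judgments you have already made; deciding whether a source is truly vetted or unbiased still takes the investigation described in the cards above.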
Data ethics
Refers to well-founded standards of right and wrong that dictate how data is collected, shared, and used.