Course 3: Module 2 Flashcards
Bias
A preference in favor of or against a person, group of people, or thing
Data bias
A type of error that systematically skews results in a certain direction
Sampling bias
When a sample isn’t representative of the population as a whole
Unbiased sampling
When a sample is representative of the population being measured
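The difference between sampling bias and unbiased sampling can be illustrated with a small sketch. Everything here is a made-up assumption for illustration: the population of 1,000 people, the two groups "A" and "B", and the sample size of 100. It compares a convenience sample (the first 100 rows) with a simple random sample.

```python
import random

# Hypothetical population: 500 people in group "A" followed by 500 in group "B".
population = ["A"] * 500 + ["B"] * 500

def share_of_a(sample):
    """Return the fraction of the sample that belongs to group "A"."""
    return sample.count("A") / len(sample)

# Biased (convenience) sample: take whoever comes first in the list.
biased_sample = population[:100]

# Unbiased sample: every member has the same chance of being picked.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
random_sample = rng.sample(population, 100)

print("Population share of A:", share_of_a(population))
print("Biased sample share of A:", share_of_a(biased_sample))
print("Random sample share of A:", share_of_a(random_sample))
```

The convenience sample contains only group "A", so it does not represent the population, while the random sample should land close to the true 50/50 split.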
Observer bias
The tendency for different people to observe things differently
Interpretation bias
The tendency to always interpret ambiguous situations in a positive or negative way
Confirmation bias
The tendency to search for or interpret information in a way that confirms pre-existing beliefs
ROCCC
Reliable
Original
Comprehensive
Current
Cited
Ethics
Well-founded standards of right and wrong that prescribe what humans ought to do, usually in terms of rights, obligations, benefits to society, fairness, or specific virtues
Data ethics
Well-founded standards of right and wrong that dictate how data is collected, shared, and used
GDPR
General Data Protection Regulation of the European Union
Aspects of data ethics
Ownership
Transaction transparency
Consent
Currency
Privacy
Openness
Ownership
Individuals own the raw data they provide and they have primary control over its usage, how it’s processed, and how it’s shared
Transaction transparency
All data-processing activities and algorithms should be completely explainable and understood by the individual who provides their data
Consent
An individual’s right to know explicit details about how and why their data will be used before agreeing to provide it
Currency
Individuals should be aware of financial transactions resulting from the use of their personal data and the scale of these transactions
Privacy
Preserving a data subject’s information and activity any time a data transaction occurs
Openness
Free access, usage, and sharing of data
Data anonymization
The process of protecting people’s private or sensitive data by eliminating that kind of information. Typically, data anonymization involves blanking, hashing, or masking personal information, often by using fixed-length codes to represent data columns, or hiding data with altered values.
Data that is often anonymized
- Telephone numbers
- Names
- License plates and license numbers
- Social security numbers
- IP addresses
- Medical records
- Email addresses
- Photographs
- Account numbers
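The blanking, hashing, and masking techniques named in the data anonymization card can be sketched as follows. This is a minimal illustration, not a production approach: the record, the field names, the salt value, and the helper functions `blank`, `hash_value`, and `mask` are all hypothetical. Note also that hashing short, guessable values such as phone numbers can sometimes be reversed by trying every possibility, so real projects usually need stronger safeguards.

```python
import hashlib

def blank(value: str) -> str:
    """Blanking: remove the value entirely."""
    return ""

def hash_value(value: str, salt: str = "example-salt") -> str:
    """Hashing: replace the value with a fixed-length SHA-256 code (64 hex characters)."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()

def mask(value: str, visible: int = 4, fill: str = "*") -> str:
    """Masking: hide everything except the last `visible` characters."""
    visible = max(visible, 0)
    hidden = max(len(value) - visible, 0)
    return fill * hidden + value[hidden:]

# Hypothetical record containing personal information.
record = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "phone": "555-010-1234",
}

anonymized = {
    "name": blank(record["name"]),         # blanked out
    "email": hash_value(record["email"]),  # fixed-length code
    "phone": mask(record["phone"]),        # only last 4 characters kept
}

print(anonymized)
```

Hashing gives every email address a code of the same length, which matches the "fixed-length codes" idea in the definition, while masking keeps just enough of the value to stay recognizable to its owner.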
Data interoperability
The ability of data systems and services to openly connect and share data
For data to be considered open, it has to:
Be available and accessible to the public as a complete dataset
Be provided under terms that allow it to be reused and redistributed
Allow universal participation so that anyone can use, reuse, and redistribute the data
Resources for open data
- U.S. government data site
- U.S. Census Bureau
- Open Data Network
- Google Cloud Public Datasets
- Dataset Search