Course 3 Flashcards
Name a few examples of how data can be collected.
Interviews, surveys, observations, questionnaires, cookies
Knowing how our data was generated adds ____?
context
What are the main data sources you can use if you don't use a first-party method?
Second party - data collected by a group directly from its audience and then sold
Third party - data collected from outside sources that did not collect it directly
What are the 8 types of data formatting? Give definition of each, and an example.
Discrete, continuous, nominal, ordinal, internal, external, structured, unstructured
What is a data model?
A model used for organizing data elements and describing how they relate to one another.
What is a data element?
Pieces of information, such as people's names, account numbers, and addresses
What is a data type?
a specific kind of data attribute that tells you what kind of value the data is
What are the three main data types you'll use as a data analyst?
Number, text (also called string), and Boolean
What is wide data?
every data subject has a single row, and many columns
What is long data?
Subjects have multiple rows of data
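The difference between wide and long data can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the country names, years, and population values are made up, and `wide_to_long` is a hypothetical helper, not a function from any library.

```python
# Wide format: one row per subject, one column per year (hypothetical values).
wide_data = [
    {"country": "A", "2019": 10, "2020": 12},
    {"country": "B", "2019": 7, "2020": 9},
]


def wide_to_long(rows, id_field, var_name, value_name):
    """Turn every non-id column into its own row for the same subject."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_field:
                continue
            long_rows.append(
                {id_field: row[id_field], var_name: column, value_name: value}
            )
    return long_rows


# Long format: each subject now appears on several rows, one per year.
long_data = wide_to_long(wide_data, "country", "year", "population")
for row in long_data:
    print(row)
```

Each country has one row in `wide_data` and two rows in `long_data`, one for each year column.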
What are some examples of data transformation?
Adding, copying or replicating data
Deleting fields or records
Standardizing the names of variables
Renaming, moving, or combining columns in a database
Joining one set of data with another
Saving a file in a different format, such as spreadsheet to CSV (comma-separated values file)
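A few of the transformations listed above (standardizing variable names, joining two sets of data, and saving in a different format) might look like this in Python. It is only a sketch: the customer and order records are invented, and `standardize` is a hypothetical helper written for this example.

```python
import csv
import io

# Hypothetical records with inconsistent column names.
customers = [
    {"Cust ID": "1", "Full Name": "Ana"},
    {"Cust ID": "2", "Full Name": "Ben"},
]
orders = [
    {"customer_id": "1", "total": "25.00"},
    {"customer_id": "2", "total": "40.50"},
]


def standardize(name):
    """Standardize a variable name: lowercase, underscores instead of spaces."""
    return name.strip().lower().replace(" ", "_")


# Rename columns: "Cust ID" becomes "cust_id", "Full Name" becomes "full_name".
customers = [{standardize(k): v for k, v in row.items()} for row in customers]

# Join one set of data with another on the customer identifier.
names_by_id = {row["cust_id"]: row["full_name"] for row in customers}
joined = [
    {
        "customer_id": order["customer_id"],
        "full_name": names_by_id[order["customer_id"]],
        "total": order["total"],
    }
    for order in orders
]

# Save in a different format: write the joined rows out as CSV text.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["customer_id", "full_name", "total"])
writer.writeheader()
writer.writerows(joined)
csv_text = buffer.getvalue()
print(csv_text)
```

Writing to an in-memory buffer keeps the example self-contained; in practice the same writer could target a file opened with `newline=""`.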
What are the 6 reasons to transform data?
- Data organization: organizing the data will make it easier to use
- Data Compatibility: different applications or systems can then use the same data
- Data migration: data with matching formats can be moved from one system to another
- Data Merging: data with the same organization can be merged together
- Data enhancement: Data can be displayed with more detailed fields
- Data comparison: apples-to-apples comparisons of the data can then be made
When is wide data preferred?
Creating tables and charts with a few variables about each subject
Comparing straightforward line graphs
When is long data preferred?
Storing a lot of variables about each subject.
Performing advanced statistical analysis or graphing
What is bias?
A preference in favor of or against a person, group of people, or thing
What is Data bias?
a type of error that systematically skews results in a certain direction
What is Sampling Bias?
A sample that isn't representative of the population being measured
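To see how a non-representative sample skews a result, here is a small, deterministic sketch. The population, the two age groups, and the hours figures are all invented for illustration; no real survey is implied.

```python
from statistics import mean

# Hypothetical population: (age_group, hours_online_per_day), half in each group.
population = [("young", 6)] * 50 + [("older", 2)] * 50
population_mean = mean(hours for _, hours in population)

# Biased sample: everyone comes from one group (say, a survey run on a campus).
biased_sample = [p for p in population if p[0] == "young"][:20]
biased_mean = mean(hours for _, hours in biased_sample)

# More representative sample: the groups appear in the same mix as the population.
representative_sample = (
    [p for p in population if p[0] == "young"][:10]
    + [p for p in population if p[0] == "older"][:10]
)
representative_mean = mean(hours for _, hours in representative_sample)

print("population:", population_mean)
print("biased sample:", biased_mean)
print("representative sample:", representative_mean)
```

The biased sample overstates the average because it leaves out half of the population, while the sample that mirrors the population mix recovers the population average.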
What were the three types of common bias in data?
Observer bias - the tendency for different people to observe things differently
Interpretation bias - the tendency to always interpret ambiguous situations in a positive or negative way
Confirmation bias - the tendency to search for or interpret information in a way that confirms pre-existing beliefs
Explain the ROCCC model for identifying good data sources
Reliable - accurate, complete, and unbiased information that's been vetted and proven fit for use
Original - be sure to validate the data with the original source
Comprehensive - contain all critical information needed to find the solution, or answer the question
Current - the usefulness of data decreases as time passes
Cited - citing the source makes the information more credible. Who created the dataset, is it part of a credible organization, and when was the data last refreshed?
What is ethics?
Well-founded standards of right and wrong that prescribe what humans ought to do, usually in terms of rights, obligations, benefits to society, fairness, or specific values
What is Data ethics?
Well-founded standards of right and wrong that dictate how data is collected, shared, and used
What are the six aspects of data ethics?
Ownership - who owns the data? Individuals own the raw data they provide, and they have primary control over its usage, how it's processed, and how it's shared
Transaction transparency - all data-processing activities and algorithms should be completely explainable and understood by the individual who provides their data
This allows the individuals providing data to see for themselves whether the data or the outcomes of analysis were fair and unbiased, and to bring up further questions or problems
Consent - an individual’s right to know explicit details about how and why their data will be used before agreeing to provide it
Currency - individuals should be aware of financial transactions resulting from the use of their personal data and the scale of these transactions
Privacy - preserving a data subject's information and activity any time a data transaction occurs
Protection from unauthorized access to our private data
Freedom from inappropriate use of our data
The right to inspect, update, or correct our data
Ability to give consent to use our data
Legal right to access the data
Openness - Free access, usage, and sharing of data
What is data anonymization?
The process of removing personally identifying information
What types of data need to be anonymized?
Phone numbers, names, license plates and license numbers, Social Security numbers, IP addresses, medical records, email addresses, photographs, account numbers
How do we anonymize data, roughly speaking?
Blanking
Hashing
Masking personal information
Hiding data with altered values
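Blanking, hashing, and masking can each be sketched with Python's standard library. The record below is invented, the helper names (`blank`, `hash_value`, `mask`) are hypothetical, and the hard-coded salt is an assumption made only to keep the example short. Real anonymization usually needs more care than this, since a hashed value can still act as a stable identifier.

```python
import hashlib

# Hypothetical record containing personally identifying information.
record = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "phone": "555-123-4567",
    "plan": "premium",
}


def blank(_value):
    """Blanking: remove the value entirely."""
    return ""


def hash_value(value, salt="example-salt"):
    """Hashing: replace the value with a fixed-length code (SHA-256 hex digest)."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def mask(value, visible=4, mask_char="*"):
    """Masking: hide everything except the last few characters."""
    hidden = max(len(value) - visible, 0)
    return mask_char * hidden + value[hidden:]


anonymized = {
    "name": blank(record["name"]),
    "email": hash_value(record["email"]),
    "phone": mask(record["phone"]),
    "plan": record["plan"],  # not personally identifying, kept as-is
}
print(anonymized)
```

The same email always hashes to the same code, so records can still be grouped by customer without exposing the address itself.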
What are the characteristics of open data?
Availability and access (complete and publicly accessible datasets)
reuse and distribution
universal participation
What is metadata?
Data about data: where it comes from, how and when it was created, and what it's about
What are the characteristics of a relational database?
Each table must share at least one field with another table
primary keys
foreign keys
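Primary keys, foreign keys, and a shared field can be shown with Python's built-in `sqlite3` module. This is a sketch under assumptions: the `customers` and `orders` tables and their rows are invented for the example, and foreign key enforcement is switched on explicitly because SQLite leaves it off by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce these by default

# customer_id is the primary key of customers...
conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    )
""")
# ...and a foreign key in orders, which is the field the two tables share.
conn.execute("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        total REAL NOT NULL
    )
""")

conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ana"), (2, "Ben")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 25.0), (11, 1, 5.0), (12, 2, 40.5)],
)

# The shared field lets the two tables be joined.
rows = conn.execute("""
    SELECT c.name, SUM(o.total)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name
    ORDER BY c.customer_id
""").fetchall()
print(rows)

# An order that points at a customer that does not exist is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (13, 99, 1.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("orphan order rejected:", rejected)
```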
What is metadata used for?
used in database management to help data analysts interpret the contents of the data within the database
What are the three types of metadata?
descriptive - what does it mean, who owns it, what does it contain, when was it published?
structural - describes the types, versions, relationships
administrative - indicates the technical source of a digital asset (when the file was created or used, the type of file, etc.)
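As a rough illustration of the three types, the sketch below writes a tiny CSV file to a temporary folder and collects metadata about it. The file name, title, and owner are hypothetical, and the grouping into a `metadata` dictionary is just one way to lay it out.

```python
import csv
import os
import tempfile
from datetime import datetime, timezone

with tempfile.TemporaryDirectory() as tmp:
    # Create a small hypothetical data file to describe.
    path = os.path.join(tmp, "sales.csv")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["region", "units"])
        writer.writerow(["north", 12])

    # Read back the header row to describe the file's structure.
    with open(path, newline="") as f:
        header = next(csv.reader(f))

    stats = os.stat(path)
    metadata = {
        # Descriptive: what the data is and who owns it (made-up values).
        "descriptive": {
            "title": "Sales by region",
            "owner": "hypothetical analytics team",
        },
        # Structural: how the data is organized.
        "structural": {"columns": header, "column_count": len(header)},
        # Administrative: technical details about the file itself.
        "administrative": {
            "file_type": os.path.splitext(path)[1],
            "size_bytes": stats.st_size,
            "modified": datetime.fromtimestamp(
                stats.st_mtime, tz=timezone.utc
            ).isoformat(),
        },
    }

print(metadata)
```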