Course 3 Flashcards

1
Q

Name a few examples of how data can be collected.

A

Interviews, surveys, observations, questionnaires, cookies

2
Q

Knowing how our data was generated adds ____?

A

context

3
Q

What are the main data sources you can use if you don't use a first-party method?

A

second party - data collected by a group directly from its audience and then sold

third party - data collected from outside sources who did not collect it directly

4
Q

What are the 8 types of data formats? Give a definition and an example of each.

A

Discrete, continuous, nominal, ordinal, internal, external, structured, unstructured

5
Q

What is a data model?

A

A model used for organizing data elements and describing how they relate to one another.

6
Q

What is a data element?

A

Pieces of information, such as people's names, account numbers, and addresses

7
Q

What is a data type?

A

a specific kind of data attribute that tells you what kind of value the data is

8
Q

What are the three main data types you'll use as a data analyst?

A

Number, text (or string), and Boolean
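
As a rough illustration of the three types, here is a minimal Python sketch. The record, its field names, and the describe_type helper are all hypothetical and only serve the example.

```python
# Hypothetical record for one customer; the field names are illustrative.
record = {
    "account_balance": 1250.75,  # number
    "customer_name": "Ada",      # text (string)
    "is_active": True,           # Boolean
}


def describe_type(value):
    """Map a Python value to one of the three data types named on the card."""
    # Check bool first: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "text"
    return "other"


types_by_field = {field: describe_type(value) for field, value in record.items()}
```

The Boolean check comes first because Python treats True and False as integers, so testing for numbers first would mislabel them.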

9
Q

What is wide data?

A

every data subject has a single row, and many columns

10
Q

What is long data?

A

Subjects have multiple rows of data
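
To make the wide versus long distinction concrete, here is a minimal sketch that reshapes the same data from wide to long. The countries, years, values, and the wide_to_long helper are made up for illustration.

```python
# Wide format: one row per subject (country), one column per year.
# The countries and values are made-up illustration data.
wide_rows = [
    {"country": "A", "2019": 10, "2020": 12},
    {"country": "B", "2019": 7, "2020": 9},
]


def wide_to_long(rows, id_field, variable_name, value_name):
    """Turn each non-id column of a wide row into its own long row."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_field:
                continue
            long_rows.append(
                {id_field: row[id_field], variable_name: column, value_name: value}
            )
    return long_rows


# Long format: each subject now appears on multiple rows, one per year.
long_rows = wide_to_long(wide_rows, "country", "year", "population")
```

Two wide rows with two year columns each become four long rows, one per country and year.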

11
Q

What are some examples of data transformation?

A

Adding, copying or replicating data

Deleting fields or records

Standardizing the names of variables

Renaming, moving, or combining columns in a database

Joining one set of data with another

Saving a file in a different format, such as a spreadsheet to a CSV (comma-separated values) file
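
A few of these transformations can be sketched in Python with the standard library. The records, column names, and the transform helper below are hypothetical; this is one possible way to do it, not a prescribed method.

```python
import csv
import io

# Made-up records with inconsistent column names.
records = [
    {"First Name": "Ada", "Last Name": "Lovelace", "Temp Notes": "x"},
    {"First Name": "Alan", "Last Name": "Turing", "Temp Notes": "y"},
]


def transform(record):
    # Standardize variable names: lowercase with underscores.
    cleaned = {
        key.strip().lower().replace(" ", "_"): value for key, value in record.items()
    }
    # Delete a field that is not needed.
    cleaned.pop("temp_notes", None)
    # Combine two columns into one.
    cleaned["full_name"] = f"{cleaned.pop('first_name')} {cleaned.pop('last_name')}"
    return cleaned


transformed = [transform(r) for r in records]

# Save in a different format: write the rows out as CSV text.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["full_name"])
writer.writeheader()
writer.writerows(transformed)
csv_text = buffer.getvalue()
```

Writing to an in-memory buffer keeps the sketch self-contained; a real script would more likely open a file on disk.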

12
Q

What are the 6 reasons to transform data?

A
  1. Data organization: organizing the data makes it easier to use
  2. Data compatibility: different applications or systems can then use the same data
  3. Data migration: data with matching formats can be moved from one system to another
  4. Data merging: data with the same organization can be merged together
  5. Data enhancement: data can be displayed with more detailed fields
  6. Data comparison: apples-to-apples comparisons of the data can then be made
13
Q

When is wide data preferred?

A

Creating tables and charts with a few variables about each subject

Comparing straightforward line graphs

14
Q

When is long data preferred?

A

Storing a lot of variables about each subject.

Performing advanced statistical analysis or graphing

15
Q

What is bias?

A

A preference in favor of or against a person, group of people, or thing

16
Q

What is Data bias?

A

a type of error that systematically skews results in a certain direction

17
Q

What is Sampling Bias?

A

A sample that isn't representative of the population being measured

18
Q

What are three other common types of bias in data?

A

Observer bias - the tendency for different people to observe things differently

Interpretation bias - the tendency to always interpret ambiguous situations in a positive or negative way

Confirmation bias - the tendency to search for or interpret information in a way that confirms pre-existing beliefs

19
Q

Explain the ROCCC model for identifying good data sources

A

Reliable - accurate, complete, and unbiased information that's been vetted and proven fit for use

Original - be sure to validate the data with the original source

Comprehensive - contains all critical information needed to find the solution or answer the question

Current - the usefulness of data decreases as time passes

Cited - citing makes the information more credible. Ask who created the data set, whether it comes from a credible organization, and when the data was last refreshed

20
Q

What is ethics?

A

Well-founded standards of right and wrong that prescribe what humans ought to do, usually in terms of rights, obligations, benefits to society, fairness, or specific virtues

21
Q

What is Data ethics?

A

Well-founded standards of right and wrong that dictate how data is collected, shared, and used

22
Q

What are the six aspects of data ethics?

A

Ownership - who owns the data? Individuals own the raw data they provide, and they have primary control over its usage, how it's processed, and how it's shared

Transaction transparency - all data-processing activities and algorithms should be completely explainable and understood by the individual who provides their data
This allows individuals providing data to see for themselves whether the data or outcomes of analysis were fair and unbiased, and to raise further questions or problems

Consent - an individual’s right to know explicit details about how and why their data will be used before agreeing to provide it

Currency - individuals should be aware of financial transactions resulting from the use of their personal data and the scale of these transactions

Privacy - preserving a data subject's information and activity any time a data transaction occurs
Protection from unauthorized access to our private data
Freedom from inappropriate use of our data
The right to inspect, update, or correct our data
Ability to give consent to use our data
Legal right to access the data

Openness - Free access, usage, and sharing of data

23
Q

What is data anonymization?

A

The process of removing personally identifying information

24
Q

What types of data need to be anonymized?

A
Phone numbers
Names
License plates and numbers
Social security numbers
IP addresses
Medical records
Email addresses
Photographs
Account numbers
25
Q

How do we anonymize data, roughly speaking?

A

Blanking
Hashing
Masking personal information
Hiding data with altered values
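
The first three techniques can be sketched roughly as follows. The record, the salt value, and the helper names (blank, hash_value, mask) are assumptions made for illustration, and a salted hash of a short value such as a phone number can still be brute-forced, so treat this as a demonstration rather than a production-grade approach.

```python
import hashlib


def blank(value):
    """Blanking: drop the value entirely."""
    return ""


def hash_value(value, salt):
    """Hashing: replace the value with a fixed-length SHA-256 code."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def mask(value, visible=4, mask_char="*"):
    """Masking: hide all but the last `visible` characters."""
    hidden = max(len(value) - visible, 0)
    return mask_char * hidden + value[hidden:]


# Made-up record; the salt is a placeholder, not a recommended value.
record = {"name": "Ada Lovelace", "email": "ada@example.com", "phone": "555-0100"}
anonymized = {
    "name": blank(record["name"]),
    "email": hash_value(record["email"], salt="demo-salt"),
    "phone": mask(record["phone"]),
}
```

Hashing keeps records linkable, since the same input always yields the same code, while blanking removes the value completely.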

26
Q

What are the characteristics of open data?

A

Availability and access (complete and publicly accessible datasets)

reuse and distribution

universal participation

27
Q

What is metadata?

A

Data about data: where it comes from, how and when it was created, and what it's about

28
Q

What are the characteristics of a relational database?

A

related tables must share at least one overlapping field
primary keys
foreign keys
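
A minimal sketch of these ideas using Python's built-in sqlite3 module. The customers and orders tables and their columns are hypothetical; customer_id is the overlapping field, acting as the primary key in one table and a foreign key in the other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless this is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# customer_id is the primary key of customers.
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
# orders.customer_id is a foreign key pointing back at customers.
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, "
    "customer_id INTEGER NOT NULL REFERENCES customers(customer_id), "
    "total REAL NOT NULL)"
)
conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders VALUES (100, 1, 25.0)")

# The shared field lets the two tables be joined.
rows = conn.execute(
    "SELECT customers.name, orders.total FROM orders "
    "JOIN customers ON orders.customer_id = customers.customer_id"
).fetchall()

# An order pointing at a customer that does not exist is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (101, 99, 5.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```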

29
Q

What is metadata used for?

A

used in database management to help data analysts interpret the contents of the data within the database

30
Q

What are the three types of metadata?

A

descriptive - what does it mean, who owns it, what does it contain, when was it published?

structural - describes the types, versions, relationships

administrative - indicates the technical source of a digital asset (when the file was created or used, the type of file, etc.)