Data 3 Flashcards

1
Q

Continuous data

A

Data that is measured and can have almost any numeric value

2
Q

Discrete data

A

Data that is counted and has a limited number of values

3
Q

Nominal data

A

A type of qualitative data that is categorized without a set order

5
Q

Ordinal data

A

A type of qualitative data with a set order or scale

7
Q

Long data

A

Data where each row contains a single data point for a particular item. For example, in a long-format table of stock prices, each row holds one price (the data point) for Apple (AAPL), Amazon (AMZN), or Google (GOOGL) (the particular item) on a given date.

Preferred when:
Storing a lot of variables about each subject, for example 60 years' worth of interest rates for each bank
Performing advanced statistical analysis or graphing

8
Q

Wide data

A

Data where each row contains multiple data points for the particular items identified in the columns.

Preferred when:
Creating tables and charts with a few variables about each subject
Comparing straightforward line graphs
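
As a rough illustration of the two layouts, the sketch below reshapes a small long table into a wide one in plain Python. The dates and prices are made-up placeholder values, and `long_to_wide` is a hypothetical helper name, not part of any library.

```python
# Long format: one row per (date, ticker) pair, one price per row.
# All values are invented for illustration only.
long_rows = [
    {"date": "2024-01-02", "ticker": "AAPL", "price": 100.0},
    {"date": "2024-01-02", "ticker": "AMZN", "price": 200.0},
    {"date": "2024-01-02", "ticker": "GOOGL", "price": 300.0},
    {"date": "2024-01-03", "ticker": "AAPL", "price": 101.0},
    {"date": "2024-01-03", "ticker": "AMZN", "price": 202.0},
    {"date": "2024-01-03", "ticker": "GOOGL", "price": 303.0},
]


def long_to_wide(rows):
    """Pivot long rows into wide rows: one row per date, one column per ticker."""
    wide = {}
    for row in rows:
        # setdefault creates the row for a date the first time it is seen.
        wide_row = wide.setdefault(row["date"], {"date": row["date"]})
        wide_row[row["ticker"]] = row["price"]
    return list(wide.values())


wide_rows = long_to_wide(long_rows)
for row in wide_rows:
    print(row)
```

The six long rows collapse into two wide rows, one per date, each carrying a separate column for AAPL, AMZN, and GOOGL.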

9
Q

Data transformation

A

Data transformation is the process of changing the data’s format, structure, or values. It usually involves:

Adding, copying, or replicating data

Deleting fields or records

Standardizing the names of variables

Renaming, moving, or combining columns in a database

Joining one set of data with another

Saving a file in a different format, for example saving a spreadsheet as a comma-separated values (.csv) file
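
A few of these steps can be sketched with Python's standard library. The records, field names, and helper functions below are all hypothetical; the sketch standardizes variable names, deletes a field, and writes the result as CSV text instead of a file.

```python
import csv
import io

# Hypothetical raw records with inconsistent column names and an unneeded field.
raw_records = [
    {"Customer ID": "1", "First Name": "Ana", "temp_note": "x"},
    {"Customer ID": "2", "First Name": "Ben", "temp_note": "y"},
]


def standardize_name(name):
    """Standardize a variable name: trim, lowercase, spaces to underscores."""
    return name.strip().lower().replace(" ", "_")


def transform(records, drop_fields=()):
    """Rename every column and delete the fields listed in drop_fields."""
    result = []
    for record in records:
        cleaned = {}
        for key, value in record.items():
            new_key = standardize_name(key)
            if new_key not in drop_fields:
                cleaned[new_key] = value
        result.append(cleaned)
    return result


def to_csv_text(records):
    """Serialize records as comma-separated values (CSV) text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


transformed = transform(raw_records, drop_fields=("temp_note",))
csv_text = to_csv_text(transformed)
print(csv_text)
```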

10
Q

How do you know you have good data?

A

ROCCC:
R: reliable
O: original
C: comprehensive
C: cited
C: current

11
Q

PII

A

Personally identifiable information

12
Q

Interoperability

A

Multiple parties can access and share information, for example a psychiatrist sending a prescription to ...

13
Q

Currency

A

The aspect of data ethics that presumes individuals should be aware of financial transactions resulting from the use of their personal data and the scale of those transactions

14
Q

Ethics

A

Well-founded standards of right and wrong that prescribe what humans ought to do, usually in terms of rights, obligations, benefits to society, fairness, or specific virtues

16
Q

Observer bias

A

The tendency for different people to observe things differently (also called experimenter bias)

17
Q

Transaction transparency

A

The aspect of data ethics that presumes all data-processing activities and algorithms should be explainable and understood by the individual who provides the data

18
Q

Relational database

A

A database that contains a series of tables that can be connected to form relationships. Relational databases let data analysts organize and link data based on what the data has in common.

In a non-relational table, all of the variables you might want to analyze are grouped together, which can make the data hard to sort through. This is one reason relational databases are so common in data analysis: they simplify many analysis processes and make data easier to find and use across an entire database.
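
To make the idea of linked tables concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `customers` and `orders` tables and their contents are assumptions chosen for illustration; the two tables are connected through the `customer_id` column they share.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        item TEXT
    );
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO orders VALUES (10, 1, 'laptop'), (11, 1, 'mouse'), (12, 2, 'desk');
    """
)

# The shared customer_id column is what the two tables have in common.
linked_rows = conn.execute(
    """
    SELECT customers.name, orders.item
    FROM orders
    JOIN customers ON customers.customer_id = orders.customer_id
    ORDER BY orders.order_id
    """
).fetchall()
print(linked_rows)
conn.close()
```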

20
Q

Normalization

A

The process of organizing data in a relational database, for example by creating tables and establishing relationships between those tables. It is applied to eliminate data redundancy, increase data integrity, and reduce complexity in a database.
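
One way to picture this is to split a flat, repetitive list of records into two related tables. The sketch below assumes hypothetical order records and table names, and uses `sqlite3` so that each customer is stored once and every order refers to that customer by id.

```python
import sqlite3

# Flat, redundant records: the customer's email repeats on every order.
flat_orders = [
    ("Ana", "ana@example.com", "laptop"),
    ("Ana", "ana@example.com", "mouse"),
    ("Ben", "ben@example.com", "desk"),
]

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name TEXT,
        email TEXT UNIQUE
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        item TEXT
    );
    """
)

for name, email, item in flat_orders:
    # Store each customer once; repeated emails are skipped.
    conn.execute(
        "INSERT OR IGNORE INTO customers (name, email) VALUES (?, ?)", (name, email)
    )
    customer_id = conn.execute(
        "SELECT customer_id FROM customers WHERE email = ?", (email,)
    ).fetchone()[0]
    # Orders keep only a reference to the customer, not a copy of their details.
    conn.execute(
        "INSERT INTO orders (customer_id, item) VALUES (?, ?)", (customer_id, item)
    )

customer_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(customer_count, order_count)
conn.close()
```

Three flat records become two customer rows and three order rows, so a change to a customer's email only has to be made in one place.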

21
Q

Primary key

A

An identifier that references a column in which each value is unique. In other words, it’s a column of a table that is used to uniquely identify each record within that table. The value assigned to the primary key in a particular row must be unique within the entire table. For example, if customer_id is the primary key for the customer table, no two customers will ever have the same customer_id.
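
The uniqueness rule can be checked with a short `sqlite3` sketch. The `customer` table follows the example in the card; the names stored in it are invented. Inserting a second row with an existing `customer_id` is expected to be rejected by the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Ana')")

try:
    # A second row with the same customer_id violates the primary key.
    conn.execute("INSERT INTO customer VALUES (1, 'Ben')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
print(duplicate_rejected, row_count)
conn.close()
```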

23
Q

CSV

A

Comma-separated values

24
Q

Cleaning data

A

Sort the data (Sort by) so blanks and errors are easier to spot
Find and replace blanks or errors
Remove blank rows
Check formats so each column holds the right value type
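
These cleaning steps could look roughly like this in plain Python. The rows, the "N/A" error marker, and the `clean` helper are assumptions made for the example; a spreadsheet or SQL tool would do the same work with its own commands.

```python
# Hypothetical spreadsheet rows read in as strings; None marks an empty cell.
rows = [
    {"name": "Ben", "age": "41"},
    {"name": None, "age": None},      # blank row
    {"name": "Ana", "age": "N/A"},    # error value in a numeric column
    {"name": "Cy", "age": "29"},
]


def is_blank(row):
    """A row is blank when every cell is empty."""
    return all(value in (None, "") for value in row.values())


def clean(rows):
    # 1. Remove blank rows (copying each kept row so the input is untouched).
    kept = [dict(row) for row in rows if not is_blank(row)]
    for row in kept:
        # 2. Find and replace error markers with an explicit missing value.
        if row["age"] in ("N/A", ""):
            row["age"] = None
        # 3. Fix the format: ages should be integers, not text.
        if row["age"] is not None:
            row["age"] = int(row["age"])
    # 4. Sort by name so the result is easy to scan.
    return sorted(kept, key=lambda row: row["name"])


cleaned_rows = clean(rows)
print(cleaned_rows)
```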

25
Q

Camel case

A

CamelCase capitalization means that you capitalize the start of each word, like a two-humped (Bactrian) camel. So the table name TicketsByOccasion uses CamelCase capitalization.

26
Q

Snake case

A

snake_case: all words are lowercase and separated by underscores, for example tickets_by_occasion
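
The relationship between the two naming styles can be shown with a small converter. `camel_to_snake` is a hypothetical helper, and it makes the simplifying assumption that every capital letter starts a new word.

```python
import re


def camel_to_snake(name):
    """Convert a CamelCase name to snake_case.

    Simplifying assumption: each capital letter starts a new word, so
    acronyms such as "SQLTable" are not handled specially.
    """
    # Insert an underscore before every capital that is not at the start.
    with_underscores = re.sub(r"(?<!^)(?=[A-Z])", "_", name)
    return with_underscores.lower()


print(camel_to_snake("TicketsByOccasion"))  # tickets_by_occasion
```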

27
Q

SQL comments

A

-- is used on all platforms; # can be used on some.
If you have more than two lines of comments, it might be cleaner and easier to use /* to start the comment and */ to close it.
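
Both comment styles can be tried out in SQLite through Python's `sqlite3` module, as in the sketch below. The query itself is a placeholder. SQLite accepts `--` and `/* */`; the `#` style is left out because it is only recognized on some platforms.

```python
import sqlite3

query = """
-- Single-line comment: everything after the two dashes is ignored.
/* Block comment:
   useful when the note spans
   several lines. */
SELECT 1 + 1 AS total  -- a trailing comment also works
"""

conn = sqlite3.connect(":memory:")
total = conn.execute(query).fetchone()[0]
print(total)
conn.close()
```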

28
Q

Schema

A

Schema: A way of describing how something, such as data, is organized

30
Q

Types of metadata

A

Administrative
Metadata that indicates the technical source of a digital asset

Descriptive
Metadata that describes a piece of data and can be used to identify it at a later point in time

Structural
Metadata that indicates how a piece of data is organized and whether it is part of one or more than one data collection