Data 3 Flashcards by Abigail Harrison

Continuous data

Data that is measured and can have almost any numeric value

Discrete data

Data that is counted and has a limited number of values

Nominal data

A type of qualitative data that is categorized without a set order

Ordinal data

A type of qualitative data with a set order or scale

Long data

Data where each row contains a single data point for a particular item. For example, individual stock prices (data points) collected for Apple (AAPL), Amazon (AMZN), and Google (GOOGL) (particular items) on given dates, with one price per row.

Preferred when:
Storing a lot of variables about each subject, for example 60 years' worth of interest rates for each bank
Performing advanced statistical analysis or graphing

Wide data

Data where each row contains multiple data points for the particular items identified in the columns.

Preferred when:
Creating tables and charts with a few variables about each subject
Comparing straightforward line graphs
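
To make the long versus wide distinction concrete, here is a minimal sketch that pivots long rows into wide rows using only the Python standard library. The dates, prices, and the helper name long_to_wide are assumptions invented for illustration, not real market data or part of the original cards.

```python
# Long format: one row per (date, ticker) observation.
# All dates and prices are invented for illustration only.
long_rows = [
    {"date": "2024-01-02", "ticker": "AAPL", "price": 100.0},
    {"date": "2024-01-02", "ticker": "AMZN", "price": 200.0},
    {"date": "2024-01-02", "ticker": "GOOGL", "price": 300.0},
    {"date": "2024-01-03", "ticker": "AAPL", "price": 101.0},
    {"date": "2024-01-03", "ticker": "AMZN", "price": 202.0},
    {"date": "2024-01-03", "ticker": "GOOGL", "price": 303.0},
]


def long_to_wide(rows, index, column, value):
    """Pivot long rows into wide rows: one row per `index` value,
    one column per distinct value found under `column`."""
    wide = {}
    for row in rows:
        # Create the wide row for this index value on first sight.
        wide_row = wide.setdefault(row[index], {index: row[index]})
        wide_row[row[column]] = row[value]
    return list(wide.values())


# Wide format: one row per date, one column per ticker.
wide_rows = long_to_wide(long_rows, "date", "ticker", "price")
for row in wide_rows:
    print(row)
```

The six long rows collapse into two wide rows, one per date, each holding three prices.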

Data transformation

The process of changing the data’s format, structure, or values. Data transformation usually involves:
Adding, copying, or replicating data
Deleting fields or records
Standardizing the names of variables
Renaming, moving, or combining columns in a database
Joining one set of data with another
Saving a file in a different format, for example saving a spreadsheet as a comma-separated values (.csv) file
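
A few of these transformations can be sketched in Python with the standard library: standardizing column names, combining two columns, and writing the result as CSV text. The records and the helper names (standardize, transform, to_csv) are hypothetical and exist only for this example.

```python
import csv
import io

# Hypothetical input records with inconsistent column names.
records = [
    {"First Name": "Ada", "Last Name": "Lovelace"},
    {"First Name": "Alan", "Last Name": "Turing"},
]


def standardize(name):
    """Standardize a column name: lowercase, spaces become underscores."""
    return name.strip().lower().replace(" ", "_")


def transform(rows):
    """Rename the columns and combine two of them into a new column."""
    out = []
    for row in rows:
        clean = {standardize(key): value for key, value in row.items()}
        clean["full_name"] = f"{clean['first_name']} {clean['last_name']}"
        out.append(clean)
    return out


def to_csv(rows):
    """Serialize the rows as comma-separated values text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(transform(records))
print(csv_text)
```

Writing to an in-memory buffer keeps the sketch free of file paths; passing an open file object to csv.DictWriter instead would save the same content as a .csv file.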

How do you know you have good data?

Good data is ROCCC:
R: Reliable
O: Original
C: Comprehensive
C: Current
C: Cited

PII

Personally identifiable information

Interoperability

The ability of multiple parties to access and share information. Like a psychiatrist sending a prescription to

Currency

The aspect of data ethics that presumes individuals should be aware of financial transactions resulting from the use of their personal data and the scale of those transactions

Ethics

Well-founded standards of right and wrong that prescribe what humans ought to do, usually in terms of rights, obligations, benefits to society, fairness, or specific virtues

Observer bias

The tendency for different people to observe things differently (also called experimenter bias)

Transaction transparency

The aspect of data ethics that presumes all data-processing activities and algorithms should be explainable and understood by the individual who provides the data

Relational database

A database that contains a series of tables that can be connected to form relationships. Relational databases let data analysts organize and link data based on what the data has in common.

In a non-relational table, all of the variables you might want to analyze are grouped together, which can make the data hard to sort through. This is one reason why relational databases are so common in data analysis: they simplify many analysis processes and make data easier to find and use across an entire database.
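
As a rough illustration of linking tables by what they have in common, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The customers and orders tables, their columns, and every value in them are hypothetical.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name TEXT
);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),
    item TEXT
);
INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben');
INSERT INTO orders VALUES (10, 1, 'laptop'), (11, 1, 'mouse'), (12, 2, 'desk');
""")

# The shared customer_id column is what relates the two tables.
rows = conn.execute("""
    SELECT c.name, o.item
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY o.order_id
""").fetchall()

for name, item in rows:
    print(name, item)
```

Each customer's name is stored once, and the join brings it back alongside every order that refers to it.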

Normalization

A process of organizing data in a relational database, for example by creating tables and establishing relationships between those tables. It is applied to eliminate data redundancy, increase data integrity, and reduce complexity in a database.
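
One way to picture the redundancy that normalization removes is to split a flat list of records into two related tables. This is a simplified sketch in plain Python rather than a full treatment of normal forms; the records, the field names, and the normalize helper are assumptions made up for the example.

```python
# Flat records: the customer's name and city repeat on every order.
# All values are invented for illustration.
flat_orders = [
    {"order_id": 10, "customer_name": "Ana", "customer_city": "Lima", "item": "laptop"},
    {"order_id": 11, "customer_name": "Ana", "customer_city": "Lima", "item": "mouse"},
    {"order_id": 12, "customer_name": "Ben", "customer_city": "Oslo", "item": "desk"},
]


def normalize(rows):
    """Split flat rows into a customer table and an order table
    linked by a generated customer_id."""
    customer_ids = {}  # (name, city) -> customer_id
    orders = []
    for row in rows:
        key = (row["customer_name"], row["customer_city"])
        # Reuse the existing id, or assign the next one for a new customer.
        customer_id = customer_ids.setdefault(key, len(customer_ids) + 1)
        orders.append(
            {"order_id": row["order_id"], "customer_id": customer_id, "item": row["item"]}
        )
    customers = [
        {"customer_id": cid, "name": name, "city": city}
        for (name, city), cid in customer_ids.items()
    ]
    return customers, orders


customers, orders = normalize(flat_orders)
print(customers)
print(orders)
```

After the split, each customer's details appear once, so a change of city needs to be made in only one row.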

Primary key

An identifier that references a column in which each value is unique. In other words, it’s a column of a table that is used to uniquely identify each record within that table, so the value assigned to the primary key in a particular row must be unique within the entire table. For example, if customer_id is the primary key for the customer table, no two customers will ever have the same customer_id.
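
The uniqueness rule can be observed directly with Python's built-in sqlite3 module. This is a minimal sketch: the customer table and customer_id column follow the card's example, while the name column and the inserted values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Ana')")

# A second row with the same customer_id violates the primary key.
try:
    conn.execute("INSERT INTO customer VALUES (1, 'Ben')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

customer_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
print(duplicate_rejected, customer_count)
```

The database refuses the duplicate key, so the table still holds a single customer.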

CSV

Comma-separated values

Cleaning data

Sort the data
Find and replace blanks or errors
Remove blank rows
Check formats and value types
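
The same cleaning steps can be sketched in plain Python. The rows, the choice of 0.0 as a replacement for blank prices, and the clean helper are assumptions made for illustration; a real dataset may call for a different replacement rule.

```python
# Hypothetical raw rows, as they might come out of a spreadsheet export.
raw_rows = [
    {"name": "desk", "price": "120"},
    {"name": "", "price": ""},       # blank row
    {"name": "lamp", "price": ""},   # blank value
    {"name": "chair", "price": "45"},
]


def clean(rows, default_price=0.0):
    """Remove blank rows, replace blank prices, convert prices to
    numbers, and sort the result by price."""
    cleaned = []
    for row in rows:
        # Remove rows in which every field is empty.
        if not any(value.strip() for value in row.values()):
            continue
        price_text = row["price"].strip()
        # Replace blanks with a default, then fix the value type.
        price = float(price_text) if price_text else default_price
        cleaned.append({"name": row["name"].strip(), "price": price})
    return sorted(cleaned, key=lambda r: r["price"])


clean_rows = clean(raw_rows)
for row in clean_rows:
    print(row)
```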

Camel case

CamelCase capitalization means that you capitalize the start of each word, like the humps of a two-humped (Bactrian) camel. For example, the table name TicketsByOccasion uses CamelCase capitalization.

Snake case

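snake_case means that all letters are lowercase and words are separated by underscores. For example, the table name tickets_by_occasion uses snake_case.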
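
The two naming styles can be converted into each other mechanically. The sketch below uses Python's re module; the helper names camel_to_snake and snake_to_camel are hypothetical, and the conversion assumes simple names without acronyms or digits.

```python
import re


def camel_to_snake(name):
    """Insert an underscore before every capital letter except the
    first character, then lowercase the whole name."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def snake_to_camel(name):
    """Capitalize each underscore-separated word and join them."""
    return "".join(word.capitalize() for word in name.split("_"))


print(camel_to_snake("TicketsByOccasion"))
print(snake_to_camel("tickets_by_occasion"))
```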

SQL comments

-- works on all platforms; # can be used on some. If you have more than two lines of comments, it might be cleaner and easier to use /* to start the comment and */ to close it.
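
Both comment styles can be tried out with Python's built-in sqlite3 module, as in the sketch below. SQLite is used here only because it ships with Python; it accepts -- and /* */ comments, and the # style is left out of the example. The tickets table and its contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, occasion TEXT)")
conn.executemany(
    "INSERT INTO tickets (occasion) VALUES (?)",
    [("concert",), ("game",)],
)

query = """
-- Single-line comment: count the rows in the table
/* Multi-line comment:
   tickets is a hypothetical table
   created only for this example */
SELECT COUNT(*) FROM tickets
"""

# The comments are ignored; only the SELECT statement runs.
row_count = conn.execute(query).fetchone()[0]
print(row_count)
```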

Schema

A way of describing how something, such as data, is organized

Types of metadata

Administrative: Metadata that indicates the technical source of a digital asset
Descriptive: Metadata that describes a piece of data and can be used to identify it at a later point in time
Structural: Metadata that indicates how a piece of data is organized and whether it is part of one or more than one data collection

Data 3 Flashcards

(30 cards)