Data 3 Flashcards
Continuous data
Data that is measured and can have almost any numeric value
Discrete data
Data that is counted and has a limited number of values
Nominal data
A type of qualitative data that is categorized without a set order
Ordinal data
A type of qualitative data with a set order or scale
Long data
Data where each row contains a single data point for a particular item. For example, individual stock prices (data points) collected for Apple (AAPL), Amazon (AMZN), and Google (GOOGL) (particular items) on given dates.
Preferred when:
Storing a lot of variables about each subject. For example, 60 years' worth of interest rates for each bank
Performing advanced statistical analysis or graphing
Wide data
Data where each row contains multiple data points for the particular items identified in the columns.
Preferred when:
Creating tables and charts with a few variables about each subject
Comparing straightforward line graphs
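The difference between the two layouts can be sketched in a few lines of Python. This is only an illustration: the prices and dates are made-up placeholder values, and the helper names long_to_wide and wide_to_long are hypothetical.

```python
from collections import defaultdict

# Long format: one row per (date, ticker) pair, so one data point per row.
# The prices are made-up placeholder values, not real market data.
long_rows = [
    {"date": "2021-01-04", "ticker": "AAPL", "price": 100.0},
    {"date": "2021-01-04", "ticker": "AMZN", "price": 200.0},
    {"date": "2021-01-04", "ticker": "GOOGL", "price": 300.0},
    {"date": "2021-01-05", "ticker": "AAPL", "price": 101.0},
    {"date": "2021-01-05", "ticker": "AMZN", "price": 202.0},
    {"date": "2021-01-05", "ticker": "GOOGL", "price": 303.0},
]


def long_to_wide(rows):
    """Pivot long rows into one row per date with one column per ticker."""
    by_date = defaultdict(dict)
    for row in rows:
        by_date[row["date"]][row["ticker"]] = row["price"]
    return [{"date": date, **prices} for date, prices in sorted(by_date.items())]


def wide_to_long(rows):
    """Unpivot wide rows back into one row per (date, ticker) pair."""
    return [
        {"date": row["date"], "ticker": ticker, "price": price}
        for row in rows
        for ticker, price in row.items()
        if ticker != "date"
    ]


wide_rows = long_to_wide(long_rows)
```

In the wide result each date appears once, with AAPL, AMZN, and GOOGL as separate columns; calling wide_to_long on it returns the original long rows.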
Data transformation
Data transformation is the process of changing the data’s format, structure, or values. Examples include:
Adding, copying, or replicating data
Deleting fields or records
Standardizing the names of variables
Renaming, moving, or combining columns in a database
Joining one set of data with another
Saving a file in a different format. For example, saving a spreadsheet as a comma-separated values (.csv) file.
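A few of these transformations can be sketched with Python's standard library. The records, column names, and helper functions here are hypothetical and chosen only for illustration: the sketch standardizes variable names, combines two columns, and serializes the result as CSV text.

```python
import csv
import io

# Hypothetical records with inconsistently styled column names.
records = [
    {"First Name": "Ana", "Last Name": "Lopez", "Signup Date": "2021-01-04"},
    {"First Name": "Ben", "Last Name": "Kim", "Signup Date": "2021-01-05"},
]


def standardize_name(name):
    """Standardize a variable name: trim, lowercase, spaces to underscores."""
    return name.strip().lower().replace(" ", "_")


def transform(rows):
    """Rename columns, then combine first_name and last_name into full_name."""
    result = []
    for row in rows:
        clean = {standardize_name(key): value for key, value in row.items()}
        clean["full_name"] = f"{clean.pop('first_name')} {clean.pop('last_name')}"
        result.append(clean)
    return result


def to_csv_text(rows):
    """Serialize rows as comma-separated values (.csv) text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


transformed = transform(records)
csv_text = to_csv_text(transformed)
```

Writing csv_text to a file with a .csv extension would complete the "save in a different format" step.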
How do you know you have good data?
R reliable
O original
C comprehensive
C cited
C current
PII
Personally identifiable information
Interoperability
Multiple parties can access and share information. For example, a psychiatrist sending a prescription to
Currency
The aspect of data ethics that presumes individuals should be aware of financial transactions resulting from the use of their personal data and the scale of those transactions
Ethics
Well-founded standards of right and wrong that prescribe what humans ought to do, usually in terms of rights, obligations, benefits to society, fairness, or specific virtues
Observer bias
The tendency for different people to observe things differently (also called experimenter bias)
Transaction transparency
The aspect of data ethics that presumes all data-processing activities and algorithms should be explainable and understood by the individual who provides the data
Relational database
A database that contains a series of tables that can be connected to form relationships. Relational databases allow data analysts to organize and link data based on what the data has in common.
In a non-relational table, all of the variables you might be interested in analyzing are grouped together, which can make them hard to sort through. This is one reason why relational databases are so common in data analysis: they simplify many analysis processes and make data easier to find and use across an entire database.
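As a minimal sketch of linking tables by what they have in common, the following uses Python's built-in sqlite3 module with an in-memory database. The customer and purchase tables and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE purchase (
        purchase_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        item TEXT NOT NULL
    );
    INSERT INTO customer VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO purchase VALUES
        (10, 1, 'notebook'), (11, 1, 'pen'), (12, 2, 'stapler');
""")

# The shared customer_id column is what relates the two tables.
rows = conn.execute("""
    SELECT customer.name, purchase.item
    FROM purchase
    JOIN customer ON customer.customer_id = purchase.customer_id
    ORDER BY purchase.purchase_id
""").fetchall()
conn.close()
```

Each customer's name is stored once in customer, and the join brings it back alongside every matching purchase.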
Normalization
A process of organizing data in a relational database, for example by creating tables and establishing relationships between those tables. It is applied to eliminate data redundancy, increase data integrity, and reduce complexity in a database.
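One way to picture the redundancy that normalization removes is to split a flat table in plain Python. This is a simplified sketch, not a full treatment of normal forms; the rows and the normalize helper are hypothetical.

```python
# A flat, non-normalized table: customer details repeat on every order row.
flat_orders = [
    {"order_id": 1, "customer_id": 7, "customer_name": "Ana", "item": "notebook"},
    {"order_id": 2, "customer_id": 7, "customer_name": "Ana", "item": "pen"},
    {"order_id": 3, "customer_id": 8, "customer_name": "Ben", "item": "stapler"},
]


def normalize(rows):
    """Split flat rows into a customers table and an orders table."""
    customers = {}
    orders = []
    for row in rows:
        # Each customer is stored once, keyed by customer_id.
        customers[row["customer_id"]] = {"customer_name": row["customer_name"]}
        # Orders keep only customer_id as the link back to the customer.
        orders.append(
            {
                "order_id": row["order_id"],
                "customer_id": row["customer_id"],
                "item": row["item"],
            }
        )
    return customers, orders


customers, orders = normalize(flat_orders)
```

After the split, a customer's name lives in exactly one place, so correcting it requires changing one record instead of every order row.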
Primary key
An identifier that references a column in which each value is unique. In other words, it’s a column of a table that is used to uniquely identify each record within that table. The value assigned to the primary key in a particular row must be unique within the entire table. For example, if customer_id is the primary key for the customer table, no two customers will ever have the same customer_id.
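The uniqueness rule can be seen in action with Python's built-in sqlite3 module. This is a small sketch using an in-memory database; the customer names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Ana')")

try:
    # Reusing customer_id 1 violates the primary key constraint.
    conn.execute("INSERT INTO customer VALUES (1, 'Ben')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
conn.close()
```

The second insert raises sqlite3.IntegrityError, so the table still holds a single row for customer_id 1.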
CSV
Comma-separated values
Cleaning data
Sort by
Find and replace blanks or errors
Remove blank rows.
Formats: value types
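These cleaning steps can be sketched in plain Python. The sample rows, the choice of "N/A" as an error marker, the "unknown" placeholder, and sorting by the first column are all assumptions made for illustration.

```python
# Hypothetical raw rows: a header, a blank row, and some missing values.
raw_rows = [
    ["name", "city"],
    ["Ben", "Lima"],
    ["", ""],
    ["Ana", ""],
    ["Cy", "N/A"],
]


def clean(rows, placeholder="unknown", error_markers=("", "N/A")):
    """Remove blank rows, replace blanks and error markers, sort by column 1."""
    header, *body = rows
    # Remove blank rows (every cell empty or whitespace).
    body = [row for row in body if any(cell.strip() for cell in row)]
    # Find and replace blanks or known error markers with a placeholder.
    body = [
        [placeholder if cell.strip() in error_markers else cell.strip() for cell in row]
        for row in body
    ]
    # Sort by the first column.
    body.sort(key=lambda row: row[0])
    return [header] + body


cleaned = clean(raw_rows)
```

Checking value types (for example, that a date column really holds dates) would be a further step that this sketch leaves out.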
Camel case
CamelCase capitalization means that you capitalize the start of each word, like a two-humped (Bactrian) camel. So the table name TicketsByOccasion uses CamelCase capitalization.
Snake case
snake_case means that words are lowercase and separated by underscores, like tickets_by_occasion.
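Converting between the two naming styles is a common cleanup task. The following is a minimal sketch with hypothetical helper names; it assumes simple names and does not handle acronyms such as HTTPServer gracefully.

```python
import re


def camel_to_snake(name):
    """Convert CamelCase to snake_case."""
    # Insert an underscore before every capital letter except the first
    # character, then lowercase the whole name.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def snake_to_camel(name):
    """Convert snake_case to CamelCase by capitalizing each word."""
    return "".join(word.capitalize() for word in name.split("_"))


snake_name = camel_to_snake("TicketsByOccasion")
camel_name = snake_to_camel("tickets_by_occasion")
```

Here snake_name is tickets_by_occasion and camel_name is TicketsByOccasion.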
SQL comments
-- is used on all platforms; # can be used on some
If you have more than two lines of comments, it might be cleaner and easier to use /* to start the comment and */ to close the comment.
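Both comment styles can be tried out with Python's built-in sqlite3 module, which accepts -- and /* */ comments. The tickets table and its rows are hypothetical, and # comments are left out because SQLite is not one of the platforms that accepts them.

```python
import sqlite3

query = """
-- Single-line comment: count the tickets for one occasion.
SELECT COUNT(*)  -- a comment can also follow code on the same line
FROM tickets
/* A block comment
   can span several lines. */
WHERE occasion = 'birthday';
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (ticket_id INTEGER PRIMARY KEY, occasion TEXT)")
conn.executemany(
    "INSERT INTO tickets (occasion) VALUES (?)",
    [("birthday",), ("wedding",), ("birthday",)],
)
# The comments are ignored; only the SELECT statement runs.
birthday_count = conn.execute(query).fetchone()[0]
conn.close()
```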
Schema
A way of describing how something, such as data, is organized
Types of metadata
Administrative
Metadata that indicates the technical source of a digital asset
Descriptive
Metadata that describes a piece of data and can be used to identify it at a later point in time
Structural
Metadata that indicates how a piece of data is organized and whether it is part of one or more than one data collection