Course1-M3 Flashcards

Data Collection and Data Wrangling

1
Q

Non-relational databases can be queried using ____ or ____ query tools. Some non-relational databases come with their own query languages, such as CQL for Cassandra and Cypher for Neo4j.

A

SQL
SQL-like

2
Q

Data Exchange platforms allow the exchange of data between data providers and data consumers. They provide data licensing workflows, de-identification and protection of personal information, legal frameworks, and a quarantined analytics environment. Examples of popular data exchange platforms include ____, ____, ____, and ____.

A

AWS Data Exchange
Crunchbase
Lotame
Snowflake

3
Q

Structured data cannot be stored in NoSQL databases. True/False

A

False. Structured data can also be stored in NoSQL databases.

4
Q

Semi-structured data has some organizational properties but no rigid schema. Examples include data from emails, XML, zipped files, binary executables, and TCP/IP protocols. Semi-structured data can be stored in NoSQL clusters. ____ and ____ are commonly used for storing and exchanging semi-structured data.

A

XML
JSON
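
A minimal sketch of why these two formats suit semi-structured data, using only the Python standard library. The customer record and its field names are hypothetical, made up for illustration. Both documents parse even though they do not share the same set of fields.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record: the same customer expressed in JSON and in XML.
json_text = '{"name": "Ada", "emails": ["ada@example.com"], "age": 36}'
xml_text = "<customer><name>Ada</name><email>ada@example.com</email></customer>"

record = json.loads(json_text)   # JSON object -> Python dict
root = ET.fromstring(xml_text)   # XML document -> element tree

json_name = record["name"]
xml_name = root.findtext("name")

# No rigid schema: a field can be missing without breaking parsing.
xml_age = root.findtext("age")   # None, because this XML record has no <age>
```

Tags and keys give each value a label, which is the "organizational property", while nothing forces every record to carry the same fields.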

5
Q

Unstructured data is data that does not have a structure and cannot be organized into a schema, such as data from web pages, social media feeds, images, videos, documents, media logs, and surveys. ____ and ____ provide a good option to store and manipulate large volumes of unstructured data.

A

NoSQL databases
Data Lakes

Data lakes can accommodate all data types and schemas.

6
Q

____ and ____ provide automated functions that facilitate the process of importing data. Tools such as Talend and Informatica, and programming languages such as Python and R, and their libraries, are widely used for importing data.

A

ETL tools
data pipelines
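
A minimal sketch of an import step written in plain Python, assuming a small in-memory CSV stands in for a real source file. The table name `sales` and its columns are hypothetical. It extracts the rows, converts the types, and loads them into a SQLite table.

```python
import csv
import io
import sqlite3

# Extract: an in-memory CSV stands in for a source file (hypothetical data).
raw = io.StringIO("id,name,amount\n1,Ada,25.0\n2,Grace,40.5\n")

# Transform: convert text fields to the types the target table expects.
rows = [
    (int(r["id"]), r["name"].strip(), float(r["amount"]))
    for r in csv.DictReader(raw)
]

# Load: insert the cleaned rows into a database table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, name TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
```

Dedicated ETL tools automate the same three stages with scheduling, monitoring, and connectors that this sketch leaves out.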

7
Q

Joins combine ____ (columns/rows). Unions combine ____ (columns/rows).

A

Columns: When two tables are joined, columns from the first source table are combined with columns from the second source table in the same row. Each row in the resulting table therefore contains columns from both tables.
Rows: Rows of data from the first source table are combined with rows of data from the second source table into a single table. Each row in the resulting table comes from one source table or the other.
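
The difference can be sketched with SQLite from the Python standard library. The tables `customers`, `orders`, and `leads` and their contents are hypothetical, chosen only to make the two result shapes visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER, name TEXT);
CREATE TABLE orders (customer_id INTEGER, total REAL);
CREATE TABLE leads (id INTEGER, name TEXT);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO orders VALUES (1, 25.0), (2, 40.0);
INSERT INTO leads VALUES (3, 'Linus');
""")

# Join: each result row carries columns from both tables.
joined = conn.execute(
    "SELECT c.id, c.name, o.total FROM customers c "
    "JOIN orders o ON o.customer_id = c.id ORDER BY c.id"
).fetchall()

# Union: rows from both tables are stacked; the column set stays the same.
unioned = conn.execute(
    "SELECT id, name FROM customers "
    "UNION ALL SELECT id, name FROM leads ORDER BY id"
).fetchall()
```

Here `joined` is wider than either input (three columns), while `unioned` is longer (three rows) but keeps the two-column shape.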

8
Q

Normalization focuses on ____

Denormalization is used to ____

A

cleaning the database of unused data and reducing redundancy and inconsistency. Data coming from transactional systems, for example, where a number of insert, update, and delete operations are performed on an ongoing basis, are highly normalized.

combine data from multiple tables into a single table so that it can be queried faster. For example, normalized data coming from transactional systems is typically denormalized before running queries for reporting and analysis.
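
A minimal sketch of the second answer, assuming a hypothetical pair of normalized tables (`customers` and `orders`). The customer's name and city are stored once in the normalized form, then copied onto every order row in a denormalized reporting table so that reports can read one table without a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized: customer details live in exactly one place.
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    total REAL
);
INSERT INTO customers VALUES (1, 'Ada', 'London');
INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0);

-- Denormalized: one wide table built ahead of time for reporting.
CREATE TABLE orders_report AS
SELECT o.id AS order_id, c.name, c.city, o.total
FROM orders o JOIN customers c ON c.id = o.customer_id;
""")

report = conn.execute(
    "SELECT order_id, name, city, total FROM orders_report ORDER BY order_id"
).fetchall()
```

The trade-off is visible in `report`: 'Ada' and 'London' now repeat on every row, which speeds up reads but reintroduces the redundancy that normalization removed.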

9
Q

Popular data wrangling software and tools include: ____, ____, ____, ____, ____, ____, ____, ____

A

Excel
Power Query / Spreadsheets and Add-ins
OpenRefine
Google Dataprep
Watson Studio Refinery
Trifacta Wrangler
Python
R
