DATA Flashcards

1
Q

DATA ARCHITECTURE

A

Standards that govern which data is collected, and how it is stored, arranged, integrated, and put to use in data
systems and in organizations.

2
Q

data scientist ( 🧐 to find insight and deals with…)

A
Performs exploratory analysis to discover insights from the data. Deals with an enormous mass of structured and unstructured data, using skills in math, statistics, programming, machine learning, etc.
3
Q

data engineers 🏛

A
Develops, constructs, tests, and maintains the complete architecture of large-scale processing systems.
4
Q

data analyst

A
Takes data and uses it to help companies make better business decisions:
- Analyzes the data and translates it into the “English language.”
- Upper management uses this data to make business decisions.
5
Q

DATA LAKE:

A

Is a storage repository that holds a vast amount of raw data in its native format until it is needed.
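
A minimal sketch of the idea, assuming an in-memory dict stands in for the lake. The names `lake`, `ingest`, `read_csv_rows`, and `read_json` are hypothetical and are not any product's API. Raw payloads are stored untouched and are only parsed when a consumer needs them.

```python
import csv
import io
import json

# Hypothetical in-memory "lake": raw bytes keyed by path, stored as received.
lake = {}

def ingest(path, raw_bytes):
    """Store the payload untouched; no schema is applied at write time."""
    lake[path] = raw_bytes

def read_csv_rows(path):
    """Interpret a stored payload as CSV only when it is needed."""
    text = lake[path].decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))

def read_json(path):
    """Interpret a stored payload as JSON only when it is needed."""
    return json.loads(lake[path].decode("utf-8"))

ingest("sales/2024.csv", b"item,qty\npen,3\nbook,1\n")
ingest("events/click.json", b'{"user": "u1", "action": "click"}')

rows = read_csv_rows("sales/2024.csv")
event = read_json("events/click.json")
```

Two different formats sit side by side in the same store, and each one gets its structure at read time.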

6
Q

DATA PROCESSING: (us des)

A

Is the conversion of data into a usable and desired form
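
As a small illustration, here is a sketch that turns messy raw records into a usable form. The record layout (`name,age` strings) and the `process` helper are assumptions made up for this example.

```python
# Raw input: inconsistent spacing, casing, and one unusable age value.
raw_records = ["  Alice , 34 ", "BOB,n/a", "carol,29"]

def process(record):
    """Convert one raw 'name,age' string into a clean dict, or None if unusable."""
    name, _, age = record.partition(",")
    name, age = name.strip().title(), age.strip()
    if not age.isdigit():
        return None  # discard records whose age cannot be parsed
    return {"name": name, "age": int(age)}

# Keep only the records that could be converted.
usable = [r for r in map(process, raw_records) if r is not None]
```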

7
Q

APACHE Hadoop:

A

Is an open-source framework that is used to efficiently store and process large datasets
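
Hadoop's classic processing model is MapReduce. The sketch below imitates its three phases (map, shuffle, reduce) in plain Python on a word count. It does not use the Hadoop API, and the function names are hypothetical.

```python
from collections import defaultdict

def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)

lines = ["big data big systems", "data lake"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

In a real cluster the map and reduce calls run on different machines, and the input lines come from files in distributed storage.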

8
Q

APACHE Spark:

A

Is a data processing framework that can quickly perform processing tasks on very large data sets
and can also distribute data processing tasks across multiple computers.
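
The "distribute across multiple computers" part can be pictured with a small sketch in which threads stand in for machines. This is not the Spark API. The helper names are hypothetical, and a real cluster would also ship the data over the network.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_partitions(data, n):
    """Split the data set into n roughly equal partitions."""
    return [data[i::n] for i in range(n)]

def partial_sum_of_squares(partition):
    """Work done independently by one worker on its own partition."""
    return sum(x * x for x in partition)

data = list(range(1, 101))
partitions = split_into_partitions(data, 4)

# Each worker processes one partition; the partial results are then combined.
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(partial_sum_of_squares, partitions))

total = sum(partials)
```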

9
Q

APACHE Hive:

A

Is open-source data warehouse software for reading, writing, and managing large data sets
that are stored directly in either the Apache Hadoop Distributed File System (HDFS) or other data storage systems
such as Apache HBase.

10
Q

SQL:

A

Is a domain-specific language used in programming and designed for managing data held in a relational
database management system, or for stream processing in a relational data stream management system
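
A short example of SQL managing data in a relational database, using Python's built-in `sqlite3` module with an in-memory database. The `employees` table and its rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Define the table structure (the schema) up front.
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)"
)

# Insert rows that conform to that schema.
conn.executemany(
    "INSERT INTO employees (name, dept, salary) VALUES (?, ?, ?)",
    [("Ana", "data", 100.0), ("Ben", "data", 80.0), ("Cy", "ops", 70.0)],
)

# Query: head count and average salary per department.
rows = conn.execute(
    "SELECT dept, COUNT(*), AVG(salary) FROM employees GROUP BY dept ORDER BY dept"
).fetchall()
conn.close()
```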

11
Q

NoSQL

A
NoSQL databases (aka "not only SQL") store data differently than relational tables. They provide flexible schemas and scale easily with large amounts of data and high user loads.
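
To show what "flexible schemas" means, here is a minimal sketch of a document-style store, assuming a plain Python list as the collection. The `insert` and `find` helpers are hypothetical and do not belong to any real NoSQL database.

```python
import json

# Hypothetical document collection: each entry is a JSON-like dict.
collection = []

def insert(document):
    """Store a copy of the document; no fixed set of fields is enforced."""
    collection.append(json.loads(json.dumps(document)))

def find(**criteria):
    """Return documents whose top-level fields match all given criteria."""
    return [d for d in collection if all(d.get(k) == v for k, v in criteria.items())]

# Two documents in the same collection with different fields.
insert({"_id": 1, "name": "Ana", "email": "ana@example.com"})
insert({"_id": 2, "name": "Ben", "tags": ["admin", "beta"], "address": {"city": "Lima"}})

matches = find(name="Ben")
```

A relational table would need a schema change before it could hold the `tags` or `address` fields. Here each document simply carries its own structure.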
12
Q

DATA WAREHOUSE:

A

Data warehousing (DW) is the process of collecting and managing data from varied sources to
provide meaningful business insights.

13
Q

SOCIAL DATA

A

IS THE INFORMATION ABOUT YOU, SUCH AS YOUR
MOVEMENTS, BEHAVIOR, AND INTERESTS, AS WELL
AS INFORMATION ABOUT YOUR RELATIONSHIPS
WITH OTHER PEOPLE, PLACES, PRODUCTS, AND EVEN
IDEOLOGIES.

14
Q

BIG DATA is a phrase that means

A

a massive volume of both structured and
unstructured data that is so large it is difficult to process using
traditional database and software techniques.

15
Q

BIG DATA has the potential to help

A

companies improve operations and make faster, more intelligent decisions.

16
Q

DATA SCIENCE IS A (🔨 💹 🥅🏘 in data)

A

blend of various tools, algorithms, and machine
learning principles with the goal of discovering hidden patterns
in raw data.

17
Q

DATA ANALYTICS IS THE SCIENCE OF

A

examining raw data with the purpose of drawing
conclusions about that information.

18
Q

BIG DATA-> 4V

A

Volume
Velocity
Variety
Veracity

19
Q

BIG DATA PROFESSIONAL (🃏w/ , from, at high )

A

Deals with a huge amount of heterogeneous data, which is gathered from various sources and comes in at a high velocity.

20
Q

TYPES OF DATA

A

STRUCTURED, UNSTRUCTURED, QUALITATIVE, AND QUANTITATIVE

21
Q

QUALITATIVE DATA

A

Qualitative data is descriptive and conceptual. Qualitative data can be categorized based on
traits and characteristics.
✓ Is non-statistical and is typically unstructured or semi-structured in nature.

22
Q

QUANTITATIVE DATA

A

can be counted, measured, and expressed using numbers.
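
A quick sketch of the practical difference, using invented survey responses. The numeric field (quantitative) can be averaged, while the descriptive field (qualitative) can only be grouped into categories and tallied.

```python
from collections import Counter
from statistics import mean

# Made-up responses: "age" is quantitative, "favorite_color" is qualitative.
responses = [
    {"age": 25, "favorite_color": "blue"},
    {"age": 35, "favorite_color": "green"},
    {"age": 30, "favorite_color": "blue"},
]

# Quantitative data supports arithmetic such as an average.
average_age = mean(r["age"] for r in responses)

# Qualitative data is categorized by trait and counted per category.
color_counts = Counter(r["favorite_color"] for r in responses)
```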

23
Q

STRUCTURED DATA

A

is highly organized and formatted so that it is easily searchable in
relational databases.

24
Q

UNSTRUCTURED DATA

A

has no predefined format or organization, making it much more
difficult to collect, process, and analyze.

• Is most often categorized as qualitative data, and it cannot be processed and
analyzed using conventional tools and methods (video, audio, mobile activity, etc.).
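
The contrast with structured data can be sketched as follows. The order records and the free-text note are invented, and the regular expression is only one possible way to pull a value out of text.

```python
import re

# Structured: every record has the same named fields, so filtering is direct.
orders = [
    {"order_id": 1, "customer": "Ana", "total": 20.0},
    {"order_id": 2, "customer": "Ben", "total": 35.5},
]
large_orders = [o["order_id"] for o in orders if o["total"] > 30]

# Unstructured: free text has no fields, so values must first be extracted.
note = "Ben called about order 2, says the total of $35.50 looks wrong."
amounts = [float(m) for m in re.findall(r"\$(\d+(?:\.\d+)?)", note)]
```

Text is the easy case. Video and audio need far heavier extraction steps before any comparable query is possible.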

25
Q

80% OF DATA IS

A

UNSTRUCTURED