Book - Chapter 1: Intro to Big Data Analytics Flashcards

1
Q

What are the Vs of big data?

A

Volume. Velocity. Variety.

2
Q

What is metadata?

A

The minimum you should know about the data

3
Q

What is paradata?

A

How the data has been processed, and what artefacts have been left in the data

4
Q

What is velocity?

A

The speed of new data creation and growth

5
Q

What are the three attributes that stand out as defining big data characteristics?

A

Huge volume of data
Complexity of data types and structures
Speed of new data creation and growth

6
Q

What is meant by a huge volume of data?

A

Rather than thousands of rows, big data can be billions of rows and millions of columns

7
Q

What is meant by complexity of data types and structures?

A

It reflects the variety of new data sources, formats and structures, including digital traces being left on the web and in other digital repositories for subsequent analysis

8
Q

What is meant by speed of new data creation and growth?

A

It describes high-velocity data: rapid data ingestion and near-real-time analysis
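A minimal Python sketch of what near-real-time analysis over high-velocity data can look like, assuming events arrive as numeric timestamps in non-decreasing order: each event is ingested as it arrives, and a rolling count over the last N seconds is kept current. The class name `SlidingWindowCounter` and the 60-second window are illustrative assumptions, not taken from the book.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events seen in the last `window_seconds` seconds (hypothetical helper)."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self._timestamps = deque()  # arrival times, oldest first

    def add(self, timestamp: float) -> None:
        # Ingest one event as it arrives.
        # Assumes timestamps arrive in non-decreasing order.
        self._timestamps.append(timestamp)

    def count(self, now: float) -> int:
        # Evict events that have fallen out of the window, then report the rest.
        while self._timestamps and self._timestamps[0] <= now - self.window_seconds:
            self._timestamps.popleft()
        return len(self._timestamps)


counter = SlidingWindowCounter(window_seconds=60)
for t in [0, 10, 30, 65, 70]:
    counter.add(t)
print(counter.count(now=75))  # events at 0 and 10 are older than 60 s
```

A deque keeps both ingestion and eviction cheap, which matters when events keep arriving faster than a batch job could reprocess them.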

9
Q

What is big data sometimes described as having?

A

The big three Vs

10
Q

What are the big three Vs?

A

Volume, variety and velocity

11
Q

Can big data be efficiently analysed using only traditional databases or methods?

A

No. It requires new tools and technologies to store and manage it and to realise the business benefits

12
Q

What are the two main forms big data can come in?

A

Structured and non-structured

13
Q

What form does most big data take?

A

Usually unstructured or semistructured in nature, which requires different techniques and tools to process and analyse

14
Q

Where does 80 to 90% of future data growth come from?

A

Non-structured data types

15
Q

What other sort of data could the RDBMS hold in addition?

A

Quasi- or semistructured data, such as free-form call log information taken from an email ticket of the problem, or customer chat history

16
Q

What are the four levels of data structure among big data characteristics?

A

Bottom: unstructured
Third: quasi-structured
Second: semistructured
Top: structured

17
Q

What is quasi-structured data?

A

Erratic structure, e.g. web clickstream data

18
Q

What is semistructured data?

A

Structure definition is embedded in the data

19
Q

What is structured data?

A

External definition of structure

20
Q

What does structured data consist of?

A

A defined data type, format, and structure (e.g. transaction data, online analytical processing (OLAP) data cubes, traditional RDBMS, CSV files and even simple spreadsheets such as Excel)
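A minimal sketch of working with structured data, using made-up transaction rows in CSV form. Because the columns and their types are known up front (the structure is defined outside the data), every row can be loaded and aggregated the same way. The column names and the `SCHEMA` mapping are hypothetical.

```python
import csv
import io

# Hypothetical transaction data; column names and types are assumptions.
CSV_TEXT = """order_id,customer,amount
1001,alice,19.99
1002,bob,5.00
1003,alice,12.50
"""

# External definition of structure: every row has the same columns,
# and the type of each column is known in advance.
SCHEMA = {"order_id": int, "customer": str, "amount": float}


def load_rows(text):
    """Parse CSV text and convert each field using the external schema."""
    reader = csv.DictReader(io.StringIO(text))
    return [{col: SCHEMA[col](value) for col, value in row.items()} for row in reader]


rows = load_rows(CSV_TEXT)

# Aggregate spend per customer, relying on the fixed structure.
total_by_customer = {}
for row in rows:
    customer = row["customer"]
    total_by_customer[customer] = total_by_customer.get(customer, 0.0) + row["amount"]
print(total_by_customer)
```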

21
Q

What does semistructured data consist of?

A

Textual data files with a discernible pattern that enables parsing (such as Extensible Markup Language (XML) data files that are self-describing and defined by an XML schema)

Scripts
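For contrast, a sketch of semistructured data: in this hypothetical XML snippet the tag names travel with the values, so Python's standard `xml.etree.ElementTree` module can discover each record's fields while parsing, even when one record carries an extra field. The element names are assumptions, and schema validation is not shown (ElementTree does not validate against an XML Schema).

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document; element and attribute names are assumptions.
XML_TEXT = """<orders>
  <order id="1001"><customer>alice</customer><amount>19.99</amount></order>
  <order id="1002"><customer>bob</customer><amount>5.00</amount><note>gift</note></order>
</orders>"""


def parse_orders(text):
    """Build one dict per <order>, using the tag names embedded in the data as keys."""
    root = ET.fromstring(text)
    records = []
    for order in root.findall("order"):
        record = {child.tag: child.text for child in order}
        record["id"] = order.get("id")  # attributes are self-describing too
        records.append(record)
    return records


records = parse_orders(XML_TEXT)
for record in records:
    print(record)
```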

22
Q

What does quasi-structured data consist of?

A

Textual data with erratic data formats that can be formatted with effort, tools, and time (for instance, web clickstream data that may contain inconsistencies in data values and formats)
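A sketch of the effort and tooling quasi-structured data needs: the hypothetical clickstream lines below record the same kind of event in two different layouts with inconsistent casing, so each known layout gets its own regular expression and unrecognised lines are set aside. The log formats, field names and the choice to lower-case URLs are assumptions for illustration only.

```python
import re
from datetime import datetime

# Hypothetical clickstream lines: same information, inconsistent formats.
RAW_LINES = [
    "2024-03-01 10:15:02 user=alice url=/home",
    "01/03/2024 10:15:09|alice|/products?id=7",
    "2024-03-01T10:16:40 user=BOB url=/HOME ",
    "garbage line that cannot be parsed",
]

# One (regex, date format) pair per known layout.
PATTERNS = [
    # key=value layout, ISO date, space or 'T' between date and time
    (re.compile(r"^(\d{4}-\d{2}-\d{2})[ T](\d{2}:\d{2}:\d{2}) user=(\S+) url=(\S+)\s*$"),
     "%Y-%m-%d %H:%M:%S"),
    # pipe-delimited layout, day/month/year date
    (re.compile(r"^(\d{2}/\d{2}/\d{4}) (\d{2}:\d{2}:\d{2})\|([^|]+)\|(\S+)\s*$"),
     "%d/%m/%Y %H:%M:%S"),
]


def parse_line(line):
    """Return a normalised click dict, or None if no known layout matches."""
    for pattern, date_format in PATTERNS:
        match = pattern.match(line)
        if match:
            date, time, user, url = match.groups()
            return {
                "time": datetime.strptime(f"{date} {time}", date_format),
                "user": user.lower(),  # normalise inconsistent casing
                "url": url.lower(),    # assumes URLs are case-insensitive on this site
            }
    return None  # unrecognised format: set aside for later review


clicks = [c for c in (parse_line(line) for line in RAW_LINES) if c is not None]
for click in clicks:
    print(click)
```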

23
Q

What does unstructured data consist of?

A

Text documents, PDFs, images and video, i.e. data that has no inherent structure

24
Q

How can a clickstream be used?

A

It can be parsed and mined by data scientists to discover usage patterns and uncover relationships among clicks and areas of interest on a website or group of sites
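One simple way to look for usage patterns in clickstream data, assuming the clicks have already been parsed into (user, timestamp, page) records: order each user's clicks by time and count how often one page is followed directly by another. The data and the `transition_counts` helper are hypothetical; real analyses would also split clicks into sessions.

```python
from collections import Counter, defaultdict

# Hypothetical, already-parsed clicks: (user, timestamp_seconds, page).
CLICKS = [
    ("alice", 0, "/home"), ("alice", 5, "/products"), ("alice", 9, "/checkout"),
    ("bob", 1, "/home"), ("bob", 4, "/products"), ("bob", 20, "/support"),
    ("carol", 2, "/home"), ("carol", 3, "/products"), ("carol", 8, "/checkout"),
]


def transition_counts(clicks):
    """Count how often users move from one page directly to another."""
    pages_by_user = defaultdict(list)
    # Order by user, then by time, so each user's path is in sequence.
    for user, _ts, page in sorted(clicks, key=lambda c: (c[0], c[1])):
        pages_by_user[user].append(page)
    transitions = Counter()
    for pages in pages_by_user.values():
        # Pair each page with the next page the same user visited.
        transitions.update(zip(pages, pages[1:]))
    return transitions


counts = transition_counts(CLICKS)
for (src, dst), n in counts.most_common():
    print(f"{src} -> {dst}: {n}")
```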

25
Q

What does the term big data describe?

A

It describes new kinds of data with which most organisations may not be used to working

26
Q

Is database administration training required to create spreadsheets?

A

No

27
Q

What is an EDW?

A

Enterprise data warehouse

28
Q

What are enterprise data warehouses critical for?

A

Reporting and BI tasks. They also solve many other problems that proliferating spreadsheets introduce, such as which of multiple versions of a spreadsheet is correct

29
Q

Despite the benefits of EDWs and BI, what do these systems tend to restrict?

A

The flexibility needed to perform robust or exploratory data analysis

30
Q

In the EDW model, who manages and controls the data?

A

IT groups and database administrators (DBAs). Data analysts depend on IT for access to the data and for changes to the data schemas

31
Q

What new problems do EDWs and BI introduce?

A

Problems of flexibility and agility, which were less pronounced when dealing with spreadsheets

32
Q

What is the solution to the flexibility and agility problems introduced by EDWs and BI, compared with spreadsheets?

A

The analytic sandbox

33
Q

What does the analytic sandbox attempt to resolve?

A

The conflict for analysts and data scientists with the EDW and more formally managed corporate data

34
Q

How are analytic sandboxes purposely designed?

A

To enable robust analytics while being centrally managed and secured

35
Q

What are analytic sandboxes often referred to as?

A

Workspaces, as they are designed to enable teams to explore many data sets in a controlled fashion, and are not typically used for enterprise-level financial reporting and sales databases

36
Q

What do analytic sandboxes enable?

A

High-performance computing using in-database processing
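A small illustration of the idea behind in-database processing, using Python's built-in `sqlite3` module purely as a stand-in (SQLite is an embedded database, not a sandbox platform): the aggregation is written in SQL and executed by the database engine, so only the summary rows come back to the client instead of every raw row. The table and column names are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for a (much larger) sandbox database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (user_id TEXT, page TEXT)")
conn.executemany(
    "INSERT INTO clicks (user_id, page) VALUES (?, ?)",
    [("alice", "/home"), ("alice", "/products"), ("bob", "/home"), ("carol", "/home")],
)

# In-database processing: GROUP BY runs inside the database engine,
# so only the small summary result is returned to Python.
summary = conn.execute(
    "SELECT page, COUNT(*) AS visits FROM clicks "
    "GROUP BY page ORDER BY visits DESC, page"
).fetchall()
print(summary)
conn.close()
```

The alternative, fetching every row and counting in client code, moves all the raw data across the connection; at big data volumes that transfer is usually the bottleneck the sandbox approach avoids.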