LM 11: Introduction to Big Data Techniques Flashcards

1
Q

What are the three Vs that characterise Big Data, and when is a fourth V relevant?

A

Volume, velocity and variety.

The fourth V, veracity, is relevant when Big Data is used for inference or prediction.

2
Q

What are the main sources of alternative data?

A

Individuals, business processes and sensors

3
Q

What is the definition of AI?

A

Computer systems capable of performing tasks that traditionally required human intelligence at levels comparable to those of human beings

4
Q

How is machine learning (ML) defined?

A

Extracts knowledge from large amounts of data by learning from known examples and then generating structure or predictions.

5
Q

What are the 3 main types of ML?

A

Supervised learning, unsupervised learning and deep learning

6
Q

What is Natural Language Processing (NLP)?

A

An application of text analytics that uses insight into the structure of human language to analyse and interpret text- and voice-based data

7
Q

What are traditional sources of big data?

A

Stock exchanges, companies and governments

8
Q

What are non-traditional data types?

A

Electronic devices, social media, sensor networks and company exhaust

9
Q

What is Veracity of data?

A

Relates to the credibility and reliability of different data sources

10
Q

What is structured data?

A

Data organised into tables and commonly stored in a database, where each field represents the same type of information

11
Q

What is unstructured data?

A

Disparate, unorganised data that cannot be organised in tabular form

12
Q

What is semi-structured data?

A

Has elements of both structured and unstructured data
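
Illustration: a minimal sketch contrasting the three data types, assuming made-up tickers, prices and note text. CSV stands in for structured data and JSON for semi-structured data; these are common examples, not the only ones.

```python
import csv
import io
import json

# Structured: every row has the same fields, so it fits a table.
# (Tickers and prices are hypothetical.)
structured = "ticker,price\nABC,10.5\nXYZ,20.0\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: tagged fields (structured) wrapped around free text (unstructured).
semi_structured = '{"ticker": "ABC", "price": 10.5, "notes": "CEO said the outlook is strong"}'
record = json.loads(semi_structured)

# Unstructured: free text with no fields at all; it needs text analytics to interpret.
unstructured = "ABC rallied today after the CEO said the outlook is strong."

print(rows[0]["ticker"], rows[0]["price"])   # fields addressed by column name
print(record["ticker"], record["notes"])     # fields by key, but "notes" is free text
print(len(unstructured.split()), "words of free text")
```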

13
Q

What is an example of unstructured data?

A

Voice recordings

14
Q

What is the Internet of Things (IoT)?

A

The network of smart devices that generate and share data about a wide range of information, e.g. smart buildings

15
Q

What is Overfitting?

A

When an ML model learns the training data too closely, treating noise as if it were a true parameter and so discovering false, unsubstantiated relationships

16
Q

What is Underfitting?

A

When an ML model treats important parameters as if they are noise and discards them, making the model too simplistic
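
Illustration: a minimal sketch of underfitting and overfitting, assuming a made-up dataset where y = 2x plus alternating noise. The three models (mean-only, straight line, nearest-neighbour memoriser) are illustrative choices, not part of the original card.

```python
# Hypothetical training data: true relationship y = 2x, plus alternating noise of +/-1.5.
train = [(x, 2 * x + (1.5 if x % 2 == 0 else -1.5)) for x in range(10)]
# Hypothetical out-of-sample data: noise-free points between the training x values.
test = [(x + 0.4, 2 * (x + 0.4)) for x in range(9)]

def fit_mean(data):
    """Underfit: ignore x entirely and always predict the average y."""
    mean_y = sum(y for _, y in data) / len(data)
    return lambda x: mean_y

def fit_line(data):
    """Reasonable fit: ordinary least-squares straight line."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in data)
             / sum((x - mean_x) ** 2 for x, _ in data))
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def fit_memoriser(data):
    """Overfit: memorise every training point, noise included (1-nearest neighbour)."""
    return lambda x: min(data, key=lambda point: abs(point[0] - x))[1]

def mse(model, data):
    """Mean squared error of a model over (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

models = {
    "mean": fit_mean(train),
    "line": fit_line(train),
    "memoriser": fit_memoriser(train),
}
for name, model in models.items():
    print(f"{name:10s} train MSE {mse(model, train):6.2f}   test MSE {mse(model, test):6.2f}")
```

In this sketch the memoriser has zero error on the training data but a worse out-of-sample error than the straight line, because it has treated the noise as signal. The mean-only model is poor on both sets, because it has discarded the real relationship with x.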

17
Q

What is the key basis of supervised learning?

A

The algorithm learns from labelled training data

18
Q

What type of data is given in unsupervised learning?

A

Unlabelled data: the algorithm is given only the data itself, from which it seeks to describe the data and its structure
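
Illustration: a minimal sketch of the difference, assuming made-up one-dimensional data and hypothetical labels ("low_vol", "high_vol"). The nearest-centroid classifier and the two-cluster k-means are simple stand-ins chosen for brevity.

```python
# Hypothetical labelled training data: (absolute daily price move in %, label).
labelled = [(0.2, "low_vol"), (0.3, "low_vol"), (0.1, "low_vol"),
            (2.5, "high_vol"), (3.0, "high_vol"), (2.8, "high_vol")]

def train_supervised(examples):
    """Supervised: use the labels to learn one centroid (average value) per label."""
    sums, counts = {}, {}
    for value, label in examples:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(centroids, value):
    """Assign a new observation to the label with the closest centroid."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))

def cluster_unsupervised(values, iterations=10):
    """Unsupervised: two-cluster 1-D k-means; no labels are ever seen."""
    low, high = min(values), max(values)  # initial cluster centres
    for _ in range(iterations):
        group_low = [v for v in values if abs(v - low) <= abs(v - high)]
        group_high = [v for v in values if abs(v - low) > abs(v - high)]
        low = sum(group_low) / len(group_low)
        high = sum(group_high) / len(group_high)
    return sorted(group_low), sorted(group_high)

centroids = train_supervised(labelled)
print(predict(centroids, 0.25), predict(centroids, 2.9))

# Same numbers with the labels stripped: the algorithm can only describe structure.
unlabelled = [value for value, _ in labelled]
clusters = cluster_unsupervised(unlabelled)
print(clusters)
```

The supervised model returns a named label for a new observation; the unsupervised one only reports that the data fall into two groups, without knowing what the groups mean.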

19
Q

What is deep learning?

A

Neural networks with many hidden layers that perform multistage, non-linear data processing to identify patterns

20
Q

What are the 5 data processing methods?

A

Capture
Curation
Storage
Search
Transfer

21
Q

What is Data Visualisation?

A

Refers to how the data will be formatted, displayed and summarised in graphical form

22
Q

How does text analytics work?

A

It uses computer programs to analyse and derive meaning, typically from large, unstructured text- or voice-based datasets
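
Illustration: a minimal sketch of one very simple text-analytics technique, word-count sentiment scoring. The word lists and sample sentences are hypothetical; real applications use far larger dictionaries or trained models.

```python
import re
from collections import Counter

# Hypothetical sentiment dictionaries, kept tiny for illustration.
POSITIVE = {"strong", "growth", "beat"}
NEGATIVE = {"weak", "loss", "miss"}

def sentiment_score(text):
    """Count positive words minus negative words in a piece of free text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    positive_hits = sum(counts[word] for word in POSITIVE)
    negative_hits = sum(counts[word] for word in NEGATIVE)
    return positive_hits - negative_hits

print(sentiment_score("Strong growth this quarter, earnings beat estimates"))
print(sentiment_score("Weak sales led to a loss"))
```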

23
Q

What is a main characteristic of Big Data?

A

It involves formats with diverse structures