Big Data Flashcards

1
Q

What is FinTech?

A

Technology-driven innovation in the financial services industry

2
Q

What were early forms of FinTech?

A

Data processing and automation of routine tasks

3
Q

What are two important applications of FinTech in quantitative analysis?

A

Analysis of large datasets, including alternative datasets
Analytical tools such as artificial intelligence (AI)

4
Q

What is meant by Big Data?

A

The vast amount of information being generated by industry, governments, individuals, and electronic devices. It includes data from traditional sources (e.g., stock exchanges, companies) as well as from non-traditional, or alternative, sources (e.g., social media, sensor networks).

5
Q

What are the four characteristics of Big Data?

A

Volume (large amounts of data)
Velocity (high speed and frequency, real-time data)
Variety (many different sources and formats: structured, semi-structured, and unstructured data)
Veracity (credibility, reliability)

6
Q

What are sources of Big Data?

A

Financial markets
Businesses
Governments
Individuals
Sensors
Internet of Things

7
Q

What is the difference between traditional business intelligence and Big Data?

A

Traditional business intelligence relies mainly on traditional data sources, whereas Big Data also incorporates alternative data sources.

8
Q

What are the three main sources of alternative data?

A

Individuals
Business processes
Sensors

9
Q

What are challenges of Big Data?

A

Quality of data
Volume of data
Appropriateness of data

10
Q

What is Artificial Intelligence?

A

Computer systems that are capable of performing tasks that traditionally have required human intelligence.

11
Q

What are neural networks?

A

Computer programs modeled on how the human brain learns and processes information
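To make the brain analogy concrete, here is a minimal sketch of a single artificial neuron (a perceptron), the simplest building block of a neural network. It learns the logical AND function from four labeled examples. The task, learning rate, epoch count, and function names are illustrative assumptions, not part of the card.

```python
# A single artificial neuron (perceptron): weighted sum of inputs, then a step
# activation that "fires" (outputs 1) only if the sum is positive.
def predict(weights, bias, x):
    total = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if total > 0 else 0


def train_perceptron(samples, labels, lr=1, epochs=10):
    """Perceptron learning rule: adjust weights in proportion to each error."""
    weights = [0] * len(samples[0])
    bias = 0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = y - predict(weights, bias, x)
            # Strengthen or weaken each connection, like reinforcing a synapse
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]  # the logical AND function
weights, bias = train_perceptron(samples, labels)
print([predict(weights, bias, x) for x in samples])  # [0, 0, 0, 1]
```

Real neural networks stack many such neurons in layers and use smooth activations, but the idea of learning weights from examples is the same.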

12
Q

What is Machine Learning?

A

Computer-based techniques that seek to extract knowledge from large amounts of data without making any assumptions about the data’s underlying probability distribution.

13
Q

What is an expert system?

A

A type of computer programming that attempted to simulate the knowledge base and analytical abilities of human experts in specific problem-solving contexts
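Expert systems typically encoded an expert's knowledge as explicit if-then rules applied by an inference engine. A minimal sketch, assuming a hypothetical credit-screening task; the rules, thresholds, and the `infer` helper are invented for illustration.

```python
# Hypothetical knowledge base: each rule pairs a condition with a conclusion,
# mimicking how a human credit analyst might reason. Thresholds are illustrative.
RULES = [
    (lambda f: f["debt_to_income"] > 0.5, "reject: debt burden too high"),
    (lambda f: f["missed_payments"] >= 3, "reject: poor payment history"),
    (lambda f: f["credit_score"] >= 700, "approve"),
]


def infer(facts, rules=RULES, default="refer to human analyst"):
    """Simple inference engine: fire the first rule whose condition matches."""
    for condition, conclusion in rules:
        if condition(facts):
            return conclusion
    return default  # no rule applies, so escalate to a person


applicant = {"debt_to_income": 0.3, "missed_payments": 0, "credit_score": 720}
print(infer(applicant))  # approve
```

Unlike machine learning, nothing here is learned from data: every rule had to be written by hand, which is the main limitation of this approach.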

14
Q

What is the goal of Machine Learning?

A

Generate structure or predictions from data without any help from a human. Find the pattern, apply the pattern.
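"Find the pattern, apply the pattern" in miniature: a minimal sketch that learns a straight line from example data and then uses it to predict an unseen input. The data are hypothetical, and a straight line is a deliberate simplification; many ML techniques learn far more flexible patterns without assuming a linear form.

```python
# "Find the pattern": fit a straight line y = a + b*x by ordinary least squares.
def fit_line(xs, ys):
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return intercept, slope


# "Apply the pattern": use the fitted line to predict an unseen input.
def predict(model, x):
    intercept, slope = model
    return intercept + slope * x


xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]  # hypothetical, roughly y = 1 + 2x plus noise
model = fit_line(xs, ys)
print(model)              # approximately (1.09, 1.97)
print(predict(model, 6))  # approximately 12.91
```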

15
Q

What are the three datasets involved in Machine Learning?

A

Training dataset: identifies relationships between inputs and outputs.
Validation dataset: validates those relationships and is used to tune the model.
Test dataset: tests the model’s ability to predict well on new data.
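A minimal sketch of splitting one dataset into the three parts, assuming a 60/20/20 split and a fixed random seed; both are hypothetical choices, since the card does not prescribe proportions.

```python
import random


def train_val_test_split(data, train_frac=0.6, val_frac=0.2, seed=42):
    """Shuffle the rows, then cut them into training, validation, and test sets."""
    rows = list(data)
    random.Random(seed).shuffle(rows)  # fixed seed so the split is reproducible
    n_train = int(len(rows) * train_frac)
    n_val = int(len(rows) * val_frac)
    train = rows[:n_train]
    validation = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]  # held out: never used for fitting or tuning
    return train, validation, test


train, validation, test = train_val_test_split(range(100))
print(len(train), len(validation), len(test))  # 60 20 20
```

Shuffling first matters: if the rows were sorted (for example by date or by outcome), slicing without shuffling would give the three sets systematically different data. For time series, a chronological split is usually preferred instead.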

16
Q

What is overfitting of the data in ML?

A

The ML model learns the input and target dataset too precisely, so it has been “overtrained”: it treats noise in the data as true parameters and therefore predicts poorly on new data.
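An overfit model looks perfect on training data yet does badly on new data. The sketch below contrasts two swapped-in illustrations, not techniques named on the card: a Lagrange interpolating polynomial that passes exactly through every noisy training point, and an ordinary least-squares line. The data and the alternating noise pattern are hypothetical.

```python
# Hypothetical data: the true relationship is y = 2x. Training targets carry
# alternating +/-0.5 noise; test targets are noise-free points between them.
train_x = list(range(7))
train_y = [2 * x + (0.5 if x % 2 == 0 else -0.5) for x in train_x]
test_x = [x + 0.5 for x in range(6)]
test_y = [2 * x for x in test_x]


def interpolating_polynomial(xs, ys):
    """Lagrange polynomial through every training point: it memorizes the noise."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            basis = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    basis *= (x - xj) / (xi - xj)
            total += yi * basis
        return total
    return predict


def least_squares_line(xs, ys):
    """Straight line fitted by ordinary least squares: too rigid to chase the noise."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return lambda x: my + slope * (x - mx)


def mse(model, xs, ys):
    """Mean squared prediction error."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


overfit = interpolating_polynomial(train_x, train_y)
line = least_squares_line(train_x, train_y)
for name, model in [("polynomial", overfit), ("line", line)]:
    print(f"{name}: train MSE {mse(model, train_x, train_y):.3f}, "
          f"test MSE {mse(model, test_x, test_y):.3f}")
```

The polynomial scores a perfect zero on the training set but swings far from the true line between training points, so its test error is much larger than the line's.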

17
Q

What is underfitting of the data in ML?

A

The model is too simplistic: it treats true parameters as noise and fails to recognize relationships in the data.
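A minimal sketch of underfitting, assuming hypothetical data with an obvious linear trend: a "predict the average" model ignores the input entirely, treating the real relationship as noise, so even its training error stays high. Both model functions are illustrative, not prescribed by the card.

```python
# Hypothetical data with an obvious upward trend (exactly y = 2x).
xs = [0, 1, 2, 3, 4, 5]
ys = [0, 2, 4, 6, 8, 10]


def mean_only_model(xs, ys):
    """Too simple: predicts the average for every input, ignoring x entirely."""
    avg = sum(ys) / len(ys)
    return lambda x: avg


def least_squares_line(xs, ys):
    """Straight line fitted by ordinary least squares: flexible enough for the trend."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return lambda x: my + slope * (x - mx)


def mse(model, xs, ys):
    """Mean squared prediction error."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


underfit = mean_only_model(xs, ys)
good_fit = least_squares_line(xs, ys)
print(mse(underfit, xs, ys))  # about 11.67
print(mse(good_fit, xs, ys))  # 0.0
```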

18
Q

What are the three broad classes of ML techniques?

A

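Supervised learning: the model is trained on labeled data (inputs paired with known outputs)
Unsupervised learning: the model receives unlabeled data and must find structure in it on its own
Deep learning: uses neural networks with many layers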
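The difference between the first two classes shows up in what the algorithm is given. A minimal sketch, assuming hypothetical risk scores: the supervised part (1-nearest-neighbor classification) receives labels, while the unsupervised part (k-means with k = 2) receives only raw values and discovers the groups itself. Both algorithms and the "low risk"/"high risk" labels are illustrative choices.

```python
# Supervised: labeled examples (feature, label) are used to predict new labels.
labeled = [(1.0, "low risk"), (1.5, "low risk"), (8.0, "high risk"), (9.0, "high risk")]


def nearest_neighbor(train, x):
    """Return the label of the closest labeled example (1-nearest neighbor)."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]


# Unsupervised: no labels, just values; group them by similarity (k-means, k=2).
def two_means(values, iterations=10):
    c1, c2 = min(values), max(values)  # start the two centroids at the extremes
    for _ in range(iterations):
        g1 = [v for v in values if abs(v - c1) <= abs(v - c2)]
        g2 = [v for v in values if abs(v - c1) > abs(v - c2)]
        c1, c2 = sum(g1) / len(g1), sum(g2) / len(g2)  # move centroids to group means
    return g1, g2


print(nearest_neighbor(labeled, 7.2))        # high risk
print(two_means([1.0, 1.5, 2.0, 8.0, 9.0]))  # ([1.0, 1.5, 2.0], [8.0, 9.0])
```

Note that the clustering output has no names: deciding that one group means "low risk" is left to a human, which is exactly what the labels supply in the supervised case.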

19
Q

What is data science?

A

An interdisciplinary field that harnesses advances in computer science, statistics and other disciplines to extract information from Big Data.

20
Q

What are the five data processing methods used by data scientists?

A

Capture: how data is collected and transformed into a format that can be analyzed.
Curation: ensuring data quality and accuracy through data cleaning.
Storage: how data will be recorded, archived, and accessed.
Search: how to query data.
Transfer: how data moves from its source or storage location to the analytical tool.
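Curation is the step most easily shown in code. A minimal sketch, assuming hypothetical captured price records and three illustrative cleaning rules (drop exact duplicates, drop missing values, drop impossible negative prices); real curation rules depend on the dataset.

```python
# Hypothetical raw price records as they might arrive from the capture step.
raw = [
    {"ticker": "AAA", "price": "101.5"},
    {"ticker": "AAA", "price": "101.5"},  # exact duplicate
    {"ticker": "BBB", "price": ""},       # missing value
    {"ticker": "CCC", "price": "-3"},     # impossible negative price
    {"ticker": "DDD", "price": "48.2"},
]


def curate(records):
    """Drop duplicates, missing values, and invalid prices; convert text to numbers."""
    seen, clean = set(), []
    for rec in records:
        key = (rec["ticker"], rec["price"])
        if key in seen or not rec["price"]:
            continue  # skip repeats and blanks
        seen.add(key)
        price = float(rec["price"])
        if price <= 0:
            continue  # a traded price must be positive
        clean.append({"ticker": rec["ticker"], "price": price})
    return clean


print(curate(raw))
# [{'ticker': 'AAA', 'price': 101.5}, {'ticker': 'DDD', 'price': 48.2}]
```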

21
Q

What is data visualization?

A

How the data will be formatted, displayed, and summarized in graphical form.
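Even without a plotting library, the core idea of turning numbers into a graphical summary can be sketched as a text bar chart. The figures and the `text_bar_chart` helper below are hypothetical; in practice a charting library would be used.

```python
def text_bar_chart(data, width=20):
    """Render label/value pairs as horizontal bars scaled to the largest value."""
    largest = max(data.values())
    lines = []
    for label, value in data.items():
        bar = "#" * round(value / largest * width)  # bar length proportional to value
        lines.append(f"{label:<4} {bar} {value}")
    return "\n".join(lines)


chart = text_bar_chart({"Q1": 120, "Q2": 150, "Q3": 90})  # hypothetical quarterly revenue
print(chart)
```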

22
Q

What is text analytics?

A

The use of computer programs to analyze and derive meaning from typically large, unstructured text- or voice-based datasets, such as company filings or social media.
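One of the simplest text-analytics operations is counting which terms dominate a body of unstructured text. A minimal sketch, assuming hypothetical social-media posts and a tiny illustrative stopword list:

```python
import re
from collections import Counter

# Illustrative stopword list: very common words that carry little meaning.
STOPWORDS = frozenset({"the", "a", "and", "of", "to", "in", "is"})


def top_terms(documents, n=3):
    """Tokenize unstructured text and count the most frequent meaningful words."""
    counts = Counter()
    for doc in documents:
        tokens = re.findall(r"[a-z']+", doc.lower())  # crude word tokenizer
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)


posts = [  # hypothetical social-media posts
    "Earnings beat expectations, strong guidance!",
    "Strong quarter and strong margins.",
    "Guidance raised; earnings growth is strong.",
]
print(top_terms(posts))  # [('strong', 4), ('earnings', 2), ('guidance', 2)]
```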

23
Q

What is Natural Language Processing (NLP)?

A

Field of research at the intersection of computer science, AI and linguistics that focuses on developing computer programs to analyze and interpret human language.
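A classic first step toward interpreting language is lexicon-based sentiment scoring. The sketch below assumes a tiny hypothetical word list and a crude rule for "not"; real NLP systems use much richer lexicons or trained language models, so treat this only as an illustration of the idea.

```python
import re

# Hypothetical miniature sentiment lexicon (illustrative only).
POSITIVE = {"beat", "growth", "strong", "raised", "record"}
NEGATIVE = {"miss", "decline", "weak", "lowered", "loss"}


def sentiment(sentence):
    """Score +1 per positive word, -1 per negative word; 'not' flips the next one."""
    score, negate = 0, False
    for word in re.findall(r"[a-z]+", sentence.lower()):
        if word == "not":
            negate = True
            continue
        value = (word in POSITIVE) - (word in NEGATIVE)  # +1, -1, or 0
        score += -value if negate else value
        if value:
            negate = False  # negation applies only to the next sentiment word
    return score


print(sentiment("Strong growth, but margins did not decline"))  # 3
```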

24
Q

Name five programming languages used in data science.

A

Python
R
Java
C
Excel VBA

25
Q

Name three common database technologies.

A

SQL (Structured Query Language) databases
SQLite
NoSQL databases
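SQL is the query language used by relational databases, and SQLite is a lightweight relational database engine that ships with Python's standard library as the `sqlite3` module. A minimal sketch, assuming a hypothetical `trades` table held in memory (NoSQL stores are not shown):

```python
import sqlite3

# ":memory:" creates a throwaway in-process database for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (ticker TEXT, qty INTEGER, price REAL)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",  # placeholders avoid SQL injection
    [("AAA", 100, 10.0), ("BBB", 50, 20.0), ("AAA", 200, 11.0)],  # hypothetical trades
)
# SQL query: total traded value per ticker
rows = conn.execute(
    "SELECT ticker, SUM(qty * price) FROM trades GROUP BY ticker ORDER BY ticker"
).fetchall()
print(rows)  # [('AAA', 3200.0), ('BBB', 1000.0)]
conn.close()
```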