Lecture 1: Introduction Flashcards

Question 1

Q

What percentage of data produced is currently analyzed?

Answer

A

Only 0.5% of data is analyzed.

Question 2

Q

What unit measures the total volume of data produced globally?

Answer

A

Zettabytes (ZB) (1 ZB ≈ 1 trillion gigabytes).
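
The conversion can be checked with a couple of lines of arithmetic. This is a minimal sketch that assumes decimal (SI) prefixes, under which the "trillion gigabytes" figure is exact; the constant names are illustrative only.

```python
# Assumption: decimal (SI) prefixes, so giga = 10**9 and zetta = 10**21.
BYTES_PER_GB = 10**9
BYTES_PER_ZB = 10**21

gb_per_zb = BYTES_PER_ZB // BYTES_PER_GB
print(f"1 ZB = {gb_per_zb:,} GB")  # 1 ZB = 1,000,000,000,000 GB
```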

Question 3

Q

What are the three Vs of big data?

Answer

A

Volume (size), Variety (heterogeneous sources), Velocity (speed of creation/analysis).

Question 4

Q

What does Velocity in big data refer to?

Answer

A

The speed at which data is created/analyzed (e.g., ‘data-in-motion’ vs. ‘data-at-rest’).

Question 5

Q

What is the fourth V in big data?

Answer

A

Veracity (data reliability/credibility).

Question 6

Q

What did the 2018 Twitter study find about falsehoods?

Answer

A

Falsehoods are 70% more likely to be retweeted than accurate news.

Question 7

Q

What is TAQ data in finance?

Answer

A

Trades and Quotes data: millisecond-level records of all trades/quotes on exchanges (e.g., NYSE, Nasdaq). Daily TAQ can exceed 100 million rows.

Question 8

Q

Define machine learning (ML).

Answer

A

Extracting knowledge from data; improves performance on tasks through experience.

Question 9

Q

What are supervised ML tasks?

Answer

A

Regression (predicting values) and classification (predicting classes).
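
To make the two tasks concrete, here is a minimal sketch in plain Python. The data, the labels ("repaid", "default"), and the helper names are all hypothetical, and the nearest-centroid rule is only one simple way to classify, chosen for brevity.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope

def fit_centroids(xs, labels):
    """Average feature value per class (a nearest-centroid classifier)."""
    groups = {}
    for x, label in zip(xs, labels):
        groups.setdefault(label, []).append(x)
    return {label: sum(v) / len(v) for label, v in groups.items()}

def classify(x, centroids):
    """Return the class whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

# Regression: the target is a number.
intercept, slope = fit_line([1, 2, 3, 4], [2, 4, 6, 8])
predicted_value = intercept + slope * 5

# Classification: the target is a class label (made-up debt ratios).
centroids = fit_centroids([0.1, 0.2, 0.3, 0.7, 0.8, 0.9],
                          ["repaid"] * 3 + ["default"] * 3)
predicted_class = classify(0.75, centroids)
```

Both models learn from labeled input-output pairs; the only difference is whether the output is a value or a class.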

Question 10

Q

Name 3 applications of ML in finance.

Answer

A

Fraud prevention
Algorithmic trading
Loan underwriting
Risk management

Question 11

Q

What is unsupervised ML?

Answer

A

Finds hidden patterns in unlabeled data via clustering or dimensionality reduction (e.g., PCA).

Question 12

Q

In reinforcement learning (RL), what is an agent?

Answer

A

The entity that performs actions in an environment to maximize rewards.

Question 13

Q

Name 3 RL applications in finance.

Answer

A

Algorithmic trading
Derivatives hedging
Portfolio allocation

Question 14

Q

What are common supervised ML algorithms?

Answer

A

Linear regression
Decision trees
Neural networks
SVMs
Random forests

Question 15

Q

How does Volume in big data evolve?

Answer

A

The threshold for ‘big’ data size is revised upward every year.

Question 16

Q

What are the key characteristics of supervised learning?

Answer

A

Uses labeled data (input-output pairs)
Direct feedback (errors are corrected during training)
Predicts outcomes (regression) or classifies data (classification)
Examples: Linear regression, decision trees, neural networks
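
The "direct feedback" idea can be illustrated with a perceptron, which adjusts its weights only when it misclassifies a labeled example. The perceptron is not named in the cards; it is used here as an assumed, minimal example, and the four training points are made up.

```python
def predict(weights, bias, point):
    """Classify a point as +1 or -1 from the sign of a linear score."""
    score = sum(w * x for w, x in zip(weights, point)) + bias
    return 1 if score > 0 else -1

def train_perceptron(samples, max_epochs=20):
    weights, bias = [0.0, 0.0], 0.0
    mistakes_per_epoch = []
    for _ in range(max_epochs):
        mistakes = 0
        for point, label in samples:
            if predict(weights, bias, point) != label:
                # Direct feedback: the known label tells us the prediction
                # was wrong, so move the weights toward the right answer.
                weights = [w + label * x for w, x in zip(weights, point)]
                bias += label
                mistakes += 1
        mistakes_per_epoch.append(mistakes)
        if mistakes == 0:  # every training example is classified correctly
            break
    return weights, bias, mistakes_per_epoch

# Labeled data: (input, output) pairs
samples = [((2, 1), 1), ((3, 3), 1), ((-1, -2), -1), ((-2, -1), -1)]
weights, bias, mistakes_per_epoch = train_perceptron(samples)
```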

Question 17

Q

What defines unsupervised learning?

Answer

A

Works with unlabeled data
No direct feedback (no ‘right answer’ provided)
Discovers hidden patterns or groupings
Tasks: Clustering (e.g., customer segmentation) and dimensionality reduction (e.g., PCA)

Question 18

Q

How does reinforcement learning (RL) work?

Answer

A

Learns via trial-and-error interactions with an environment
Uses rewards/punishments (delayed feedback)
Learns optimal policies to maximize long-term rewards
Components: Agent, actions, environment, state, reward
Finance use cases: Algorithmic trading, portfolio optimization
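
A minimal sketch of trial-and-error learning with delayed feedback, using tabular Q-learning. Q-learning is one common RL algorithm and is an assumption here, not something the cards specify. The five-state corridor, the reward of 1 at the right end, and all parameter values are made up for illustration.

```python
import random

N_STATES = 5          # states 0..4; state 4 is the terminal goal
ACTIONS = (-1, +1)    # index 0 = move left, index 1 = move right

def step(state, action):
    """Environment: move along the corridor, reward only at the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Trial and error: explore at random sometimes (and on ties),
            # otherwise exploit the action that currently looks best.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Delayed feedback: the discounted value of the next state
            # carries the goal reward back to earlier states.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q

q_table = train()
policy = ["left" if q[0] > q[1] else "right" for q in q_table[:-1]]
```

The learned policy is read off the table by picking the higher-valued action in each non-terminal state.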

Question 19

Q

What is dimensionality reduction?

Answer

A

The process of reducing the number of features (variables) in a dataset while preserving as much information and overall model performance as possible.
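
As a minimal sketch, the code below reduces two features to one using PCA, with the closed-form principal axis of a 2x2 covariance matrix. The five data points are made up, and the closed-form angle works only for this two-feature case.

```python
import math

# Two correlated features (hypothetical data, roughly y = 2x)
points = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1)]
n = len(points)
mean_x = sum(x for x, _ in points) / n
mean_y = sum(y for _, y in points) / n
centered = [(x - mean_x, y - mean_y) for x, y in points]

# Entries of the 2x2 covariance matrix
sxx = sum(cx * cx for cx, _ in centered) / n
syy = sum(cy * cy for _, cy in centered) / n
sxy = sum(cx * cy for cx, cy in centered) / n

# Direction of maximum variance (first principal component)
theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
direction = (math.cos(theta), math.sin(theta))

# Project each point onto that direction: 2 features become 1
reduced = [cx * direction[0] + cy * direction[1] for cx, cy in centered]

# Share of the total variance kept by the single new feature
explained = (sum(z * z for z in reduced) / n) / (sxx + syy)
```

Here `explained` measures how much information survives the reduction; values close to 1 mean little was lost.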

Question 20

Q

What is clustering?

Answer

A

A technique for discovering hidden structure in data. The goal is to find a natural grouping so that items in the same cluster are more similar to each other than to items in different clusters.
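
A minimal sketch of one clustering method, k-means, on one-dimensional data. The cards do not name an algorithm, so k-means is an assumption; the values and the choice to start the centroids at the minimum and maximum are also illustrative.

```python
def kmeans_1d(values, centroids, max_iterations=100):
    """Lloyd's algorithm: alternate assignment and centroid updates."""
    labels = []
    for _ in range(max_iterations):
        # Assignment step: each value joins its nearest centroid
        labels = [min(range(len(centroids)),
                      key=lambda k: abs(v - centroids[k]))
                  for v in values]
        # Update step: each centroid moves to the mean of its members
        new_centroids = []
        for k in range(len(centroids)):
            members = [v for v, label in zip(values, labels) if label == k]
            new_centroids.append(sum(members) / len(members)
                                 if members else centroids[k])
        if new_centroids == centroids:   # converged
            break
        centroids = new_centroids
    return labels, centroids

values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]   # unlabeled data
labels, centroids = kmeans_1d(values, centroids=[min(values), max(values)])
```

No labels are supplied; the two groups emerge from the distances between values alone.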

Question 21

Q

What are the basic concepts of RL?

Answer

A

1. Agent: the entity that performs actions.
2. Action: what the agent can do in each state.
3. Environment: the world in which the agent resides.
4. State: the current situation of the agent.
5. Reward: the immediate return sent by the environment to evaluate the agent's last action. A reward can be positive (reward) or negative (punishment).
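
The five terms map onto a short interaction loop. This is a toy sketch with hypothetical class names and a hand-written rule in place of a learned policy; it shows only how the pieces fit together.

```python
class Environment:
    """The world the agent lives in: a walk along the integers."""
    def __init__(self, start=0, target=3):
        self.state = start       # State: the agent's current position
        self.target = target

    def step(self, action):
        """Apply an action (-1 or +1) and return (new state, reward)."""
        self.state += action
        # Reward: +1 on reaching the target, -1 (a punishment) otherwise
        reward = 1 if self.state == self.target else -1
        return self.state, reward

class Agent:
    """The entity that chooses actions, here with a fixed rule."""
    def act(self, state, target):
        return 1 if state < target else -1   # Action: step right or left

env = Environment()
agent = Agent()
history = []
state = env.state
while state != env.target:
    action = agent.act(state, env.target)
    state, reward = env.step(action)
    history.append((action, state, reward))
```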

Question 22

Q

Question 23

Q

What is Artificial Intelligence (AI)?

Answer

A

The theory and development of computer systems able to perform tasks that normally require human intelligence.

Examples include visual perception, speech recognition, decision-making, and language translation.

Question 24

Q

Define Machine Learning (ML).

Answer

A

A computer program is said to learn from experience E with respect to a class of tasks T and a performance measure P if its performance on tasks in T, as measured by P, improves with experience E.
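
The E/T/P framing can be made concrete with a toy example. Assumptions: the task T is predicting y from x where y = 3x, the performance measure P is mean squared error on held-out examples, and experience E is the number of labeled examples processed. The model, learning rate, and data are illustrative only.

```python
def train(examples, learning_rate=0.01):
    """Fit y = w * x with one gradient update per labeled example."""
    w = 0.0
    for x, y in examples:
        w -= learning_rate * 2 * (w * x - y) * x
    return w

def mean_squared_error(w, examples):
    """Performance measure P (lower is better)."""
    return sum((w * x - y) ** 2 for x, y in examples) / len(examples)

# Experience E: a stream of 40 labeled examples of the task y = 3x
stream = [(x, 3 * x) for x in (1, 2, 3, 4)] * 10
# Held-out examples, used only to measure P
held_out = [(5, 15), (6, 18)]

# P after training on the first 0, 4, 20, and 40 examples
errors = [mean_squared_error(train(stream[:n]), held_out)
          for n in (0, 4, 20, 40)]
```

The entries of `errors` shrink as more examples are processed, which is exactly what the definition means by learning.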