Lecture 1: Introduction Flashcards

1
Q

What percentage of data produced is currently analyzed?

A

Only 0.5% of data is analyzed.

2
Q

What unit measures the total volume of data produced globally?

A

Zettabytes (ZB) (1 ZB ≈ 1 trillion gigabytes).
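
The conversion can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal (SI) prefixes, where a gigabyte is 10^9 bytes and a zettabyte is 10^21 bytes; the constant names are illustrative.

```python
# Decimal (SI) prefixes: assumed here, not binary (GiB/ZiB) units.
BYTES_PER_GB = 10**9    # gigabyte
BYTES_PER_ZB = 10**21   # zettabyte

gb_per_zb = BYTES_PER_ZB // BYTES_PER_GB
print(f"1 ZB = {gb_per_zb:,} GB")  # 1 ZB = 1,000,000,000,000 GB
```

That is 10^12 GB, i.e. one trillion gigabytes.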

3
Q

What are the three Vs of big data?

A

Volume (size), Variety (heterogeneous sources), Velocity (speed of creation/analysis).

4
Q

What does Velocity in big data refer to?

A

The speed at which data is created/analyzed (e.g., ‘data-in-motion’ vs. ‘data-at-rest’).

5
Q

What is the fourth V in big data?

A

Veracity (data reliability/credibility).

6
Q

What did the 2018 Twitter study find about falsehoods?

A

Falsehoods are 70% more likely to be retweeted than accurate news.

7
Q

What is TAQ data in finance?

A

Trade and Quote (TAQ) data: millisecond-level records of all trades and quotes on exchanges (e.g., NYSE, Nasdaq). A single day of TAQ data can exceed 100 million rows.

8
Q

Define machine learning (ML).

A

Extracting knowledge from data: a system improves its performance on tasks through experience.

9
Q

What are supervised ML tasks?

A

Regression (predicting values) and classification (predicting classes).

10
Q

Name 3 applications of ML in finance.

A
  • Fraud prevention
  • Algorithmic trading
  • Loan underwriting
  • Risk management
11
Q

What is unsupervised ML?

A

Finds hidden patterns in unlabeled data via clustering or dimensionality reduction (e.g., PCA).

12
Q

In reinforcement learning (RL), what is an agent?

A

The entity that performs actions in an environment to maximize rewards.

13
Q

Name 3 RL applications in finance.

A
  • Algorithmic trading
  • Derivatives hedging
  • Portfolio allocation
14
Q

What are common supervised ML algorithms?

A
  • Linear regression
  • Decision trees
  • Neural networks
  • SVMs
  • Random forests
15
Q

How does Volume in big data evolve?

A

The threshold for ‘big’ data size is revised upward every year.

16
Q

What are the key characteristics of supervised learning?

A
  • Uses labeled data (input-output pairs)
  • Direct feedback (errors are corrected during training)
  • Goal: predict outcomes (regression) or classify data (classification)
  • Examples: Linear regression, decision trees, neural networks
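
The two supervised tasks can be illustrated with a small pure-Python sketch. The data below is made up for illustration, and the helper names `fit_line` and `nearest_neighbor_label` are hypothetical; the 1-nearest-neighbor classifier is a stand-in chosen for brevity and is not one of the algorithms listed on the card.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def nearest_neighbor_label(x, labeled_points):
    """1-nearest-neighbor classification: copy the label of the closest example."""
    closest = min(labeled_points, key=lambda p: abs(p[0] - x))
    return closest[1]


# Regression: labeled pairs (input, numeric output) that follow y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
predicted = slope * 5.0 + intercept  # prediction for an unseen input

# Classification: labeled pairs (input, class label).
labeled = [(1.0, "low"), (2.0, "low"), (8.0, "high"), (9.0, "high")]
label = nearest_neighbor_label(7.0, labeled)

print(slope, intercept, predicted, label)
```

In both cases the labels act as the direct feedback: the fitted line and the neighbor lookup are only possible because every training input comes with a known output.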
17
Q

What defines unsupervised learning?

A
  • Works with unlabeled data
  • No direct feedback (no ‘right answer’ provided)
  • Goal: discover hidden patterns or groupings
  • Tasks: Clustering (e.g., customer segmentation) and dimensionality reduction (e.g., PCA)
18
Q

How does reinforcement learning (RL) work?

A
  • Learns via trial-and-error interactions with an environment
  • Uses rewards/punishments (delayed feedback)
  • Goal: learn optimal policies that maximize long-term rewards
  • Components: Agent, actions, environment, state, reward
  • Finance use cases: Algorithmic trading, portfolio optimization
19
Q

What is dimensionality reduction?

A

The process of reducing the number of features, or variables, in a dataset while preserving as much information and overall model performance as possible.
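
As one possible illustration, the sketch below reduces made-up two-feature data to a single feature using principal component analysis (PCA), written in pure Python with the closed-form eigenvector of a 2x2 covariance matrix. The function name `first_principal_component` and the data are assumptions for this example; it expects at least two distinct points.

```python
import math


def first_principal_component(points):
    """Return the leading direction, the 1-D scores, and the variance share kept."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the sample covariance matrix [[a, b], [b, c]].
    a = sum(x * x for x, _ in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)

    # Largest eigenvalue of a symmetric 2x2 matrix, in closed form.
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)

    # Matching eigenvector; fall back to an axis when the features are uncorrelated.
    if b != 0:
        vx, vy = b, lam - a
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    direction = (vx / norm, vy / norm)

    # Project each centered point onto the direction: two features become one.
    scores = [x * direction[0] + y * direction[1] for x, y in centered]
    explained = lam / (a + c)  # share of total variance kept by the one feature
    return direction, scores, explained


# Made-up data in which the second feature is roughly twice the first.
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]
direction, scores, explained = first_principal_component(points)
print(direction, scores, explained)
```

Because the two features are almost perfectly correlated here, a single derived feature retains nearly all of the variance, which is the sense in which information is preserved.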

20
Q

What is clustering?

A

Clustering allows us to discover hidden structures in data. The goal is to find a natural grouping so that items in the same cluster are more similar to each other than to items in different clusters.
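
A minimal sketch of one common clustering method, k-means, on one-dimensional made-up values (think of them as hypothetical customer spending figures). The function name `k_means_1d`, the starting centroids, and the fixed iteration count are assumptions chosen for illustration.

```python
def k_means_1d(values, centroids, iterations=10):
    """Alternate between assigning values to the nearest centroid and re-averaging."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


values = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centroids, clusters = k_means_1d(values, centroids=[0.0, 5.0])
print(centroids)  # [1.5, 9.5]
print(clusters)   # [[1.0, 1.5, 2.0], [9.0, 9.5, 10.0]]
```

No labels are supplied: the two groups emerge only from the distances between the values.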

21
Q

What are the basic concepts of RL?

A

  • Agent: the entity that performs actions.
  • Action: what the agent can do in each state.
  • Environment: the world in which the agent resides.
  • State: the current situation of the agent.
  • Reward: the immediate return the environment sends to evaluate the agent's last action; it can be positive (reward) or negative (punishment).
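
The five concepts can be seen together in a toy example. The sketch below uses tabular Q-learning, one standard RL algorithm chosen here for illustration, on a hypothetical five-state corridor; the environment, reward values, and hyperparameters are all assumptions and are not taken from the lecture.

```python
import random

N_STATES = 5                  # states 0..4 on a line; state 4 is the goal
ACTIONS = ["left", "right"]   # what the agent can do in each state


def step(state, action):
    """Environment: move the agent and return (next_state, reward, done)."""
    next_state = max(state - 1, 0) if action == "left" else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0   # feedback arrives only at the goal
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Agent: learn action values by trial and error (epsilon-greedy Q-learning)."""
    rng = random.Random(seed)
    q = [{a: 0.0 for a in ACTIONS} for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            best = max(q[state].values())
            greedy = [a for a in ACTIONS if q[state][a] == best]
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)   # explore
            else:
                action = rng.choice(greedy)    # exploit, breaking ties at random
            next_state, reward, done = step(state, action)
            future = 0.0 if done else gamma * max(q[next_state].values())
            q[state][action] += alpha * (reward + future - q[state][action])
            state = next_state
    return q


q_table = train()
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

After training, the greedy policy moves right in every non-goal state, even though the only reward is received at the final step.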

23
Q

What is Artificial Intelligence (AI)?

A

The theory and development of computer systems capable of performing tasks that normally require human intelligence.

Examples include visual perception, speech recognition, decision-making, and language translation.

24
Q

Define Machine Learning (ML).

A

A computer program is said to learn from experience (E) with respect to a class of tasks (T) and a performance measure (P) if its performance on tasks in T, as measured by P, improves with experience E.
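
The roles of T, P, and E can be made concrete with a toy learner. Everything in this sketch is hypothetical: the task (deciding whether a number is at least 7), the threshold learner, and the hand-picked examples, which were chosen so that each new example helps. In general, extra experience is not guaranteed to improve performance at every step.

```python
def true_label(x):
    """Task T (hypothetical): decide whether x is at least 7."""
    return x >= 7


def learn_threshold(examples):
    """Put the threshold midway between the largest negative and smallest positive seen."""
    negatives = [x for x, y in examples if not y]
    positives = [x for x, y in examples if y]
    return (max(negatives) + min(positives)) / 2


def accuracy(threshold, test_inputs):
    """Performance measure P: fraction of test inputs classified correctly."""
    correct = sum((x >= threshold) == true_label(x) for x in test_inputs)
    return correct / len(test_inputs)


# Experience E: labeled examples revealed one at a time.
experience = [(0, False), (9, True), (6, False), (7, True)]
test_inputs = list(range(10))

scores = []
for n in range(2, len(experience) + 1):
    threshold = learn_threshold(experience[:n])
    scores.append(accuracy(threshold, test_inputs))

print(scores)  # [0.8, 0.9, 1.0]
```

Performance P on task T rises from 0.8 to 1.0 as the learner sees more of E, which is what the definition means by learning.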