Big Data And Machine Learning Part 2 Flashcards

1
Q

Continuous data generation

A

Every financial institution generates data continuously, at intervals of less than one second

2
Q

A legacy of software systems

A

A typical bank has undergone a series of mergers and acquisitions (M&As) over the past 20 years, each bringing its own software systems

3
Q

A complex and globally spanning entity structure

A

Every bank offers unique products, which require different IT systems across its regions

4
Q

Typical data landscape

A

1) Data sources (credit cards, mortgages, savings): systems capturing information that can be bundled to add value
2) ETL (extract, transform, load): the process for moving data from the sources to central storage
3) Data warehouse: bundles and stores data from different sources, building a system of record that captures metadata or timestamps
4) Data usage: performing analytics, visualisation, or generating reports for supervisory or management purposes
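To make the four layers concrete, here is a minimal Python sketch under stated assumptions: the three source extracts, their field names and the `positions` table are hypothetical, and an in-memory SQLite database (standard `sqlite3` module) stands in for the data warehouse. Each function corresponds to one letter of ETL, and the final query plays the role of data usage.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical extracts from three source systems, each with its own field names.
CREDIT_CARDS = [{"cust": "C1", "balance_eur": 1200.0}]
MORTGAGES = [{"customer_id": "C1", "outstanding": 250000.0}]
SAVINGS = [{"client": "C2", "amount": 5000.0}]


def extract():
    """Extract: pull raw records from each source system, tagged with their origin."""
    yield from (("credit_card", r) for r in CREDIT_CARDS)
    yield from (("mortgage", r) for r in MORTGAGES)
    yield from (("savings", r) for r in SAVINGS)


def transform(source, record):
    """Transform: map each source's field names onto one common schema."""
    if source == "credit_card":
        customer, amount = record["cust"], record["balance_eur"]
    elif source == "mortgage":
        customer, amount = record["customer_id"], record["outstanding"]
    else:
        customer, amount = record["client"], record["amount"]
    loaded_at = datetime.now(timezone.utc).isoformat()  # timestamp metadata
    return (customer, source, amount, loaded_at)


def load(conn, rows):
    """Load: write the harmonised rows into the central store (the 'warehouse')."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS positions "
        "(customer TEXT, source TEXT, amount REAL, loaded_at TEXT)"
    )
    conn.executemany("INSERT INTO positions VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_etl(conn):
    load(conn, (transform(src, rec) for src, rec in extract()))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_etl(conn)
    # Data usage: a simple per-customer report across all products.
    for customer, total in conn.execute(
        "SELECT customer, SUM(amount) FROM positions GROUP BY customer ORDER BY customer"
    ):
        print(customer, total)
```

Bundling the sources pays off in the last step: customer C1's card and mortgage balances, held in two unrelated systems, appear together in one report.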

5
Q

Introducing a distributed storage layer which is

A

Schemaless (no predefined structure)

Durable (once data is written, it shouldn’t be lost)

Capable of handling component failure (without human intervention)

Automatically rebalanced (to even out disk space usage throughout the cluster)
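The four properties can be illustrated with a toy, single-process block store. This is a hypothetical sketch, not how any particular product works: `ToyCluster`, the replication factor of 3 and the "least-loaded node first" placement rule are assumptions chosen for readability. Blocks are opaque bytes (schemaless), each is kept on several nodes (durable), lost copies are recreated automatically when a node fails, and `rebalance` evens out the load.

```python
class ToyCluster:
    """Hypothetical in-memory block store with replication, self-healing and rebalancing."""

    def __init__(self, nodes, replication=3):
        self.replication = replication
        self.nodes = {n: {} for n in nodes}  # node name -> {block_id: bytes}

    def _least_loaded(self, exclude):
        # Candidate nodes ordered by how many blocks they hold (ties broken by name).
        candidates = [n for n in self.nodes if n not in exclude]
        return sorted(candidates, key=lambda n: (len(self.nodes[n]), n))

    def add_node(self, node):
        self.nodes[node] = {}

    def put(self, block_id, data: bytes):
        # Schemaless: any bytes are accepted; replicas go to the least-loaded nodes.
        for node in self._least_loaded(exclude=set())[: self.replication]:
            self.nodes[node][block_id] = data

    def holders(self, block_id):
        return {n for n, blocks in self.nodes.items() if block_id in blocks}

    def get(self, block_id):
        for blocks in self.nodes.values():
            if block_id in blocks:
                return blocks[block_id]
        raise KeyError(block_id)

    def fail_node(self, node):
        # Drop the node, then restore the replication factor without human intervention.
        lost = self.nodes.pop(node)
        for block_id, data in lost.items():
            have = self.holders(block_id)
            needed = self.replication - len(have)
            for target in self._least_loaded(exclude=have)[:needed]:
                self.nodes[target][block_id] = data

    def rebalance(self):
        # Move replicas from the fullest to the emptiest node until counts differ by <= 1.
        while True:
            order = sorted(self.nodes, key=lambda n: len(self.nodes[n]))
            low, high = order[0], order[-1]
            if len(self.nodes[high]) - len(self.nodes[low]) <= 1:
                return
            movable = [b for b in self.nodes[high] if b not in self.nodes[low]]
            if not movable:
                return
            block_id = movable[0]
            self.nodes[low][block_id] = self.nodes[high].pop(block_id)
```

A usage run might store six blocks on four nodes, call `fail_node("n2")` and check that every block still has three copies, then `add_node("n5")` and `rebalance()` to spread the copies over the new, empty node.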

6
Q

Big data today

A

Hadoop - open-source collection of software tools for processing and computing on multiple nodes

Apache Spark - distributed cluster computing framework

Knowledge graph - describing and storing relations
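Hadoop's classic processing model is MapReduce: the input is split into chunks, a map step runs on each chunk (potentially on different nodes), intermediate results are grouped by key (the shuffle), and a reduce step combines each group. The sketch below imitates those three phases in plain Python on one machine; it illustrates the programming model only, is not Hadoop's API, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk: str):
    # Map: each worker turns its chunk of input into (key, 1) pairs.
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped):
    # Shuffle: group all values by key so each key ends up at a single reducer.
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapped):
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Reduce: combine the grouped values for each key.
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks):
    # On a cluster each chunk would be mapped on a different node; here we just loop.
    return reduce_phase(shuffle(map_phase(c) for c in chunks))


if __name__ == "__main__":
    chunks = ["card payment card", "mortgage payment", "savings card"]
    print(word_count(chunks))  # {'card': 3, 'payment': 2, 'mortgage': 1, 'savings': 1}
```

Because map calls are independent and reducers only see their own keys, both phases can be spread across many nodes, which is what lets the model scale.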

7
Q

Areas of concern in the EBA (European Banking Authority) paper

A

Access to info transformation

Cyber security risk

Market distortions caused by widespread automation

Limited data portability

8
Q

Opportunities and challenges in risk management

A

Challenges:
• Increasing model complexity and lack of explanatory insight
• How to audit or understand the model in a regulatory context
• Data availability and quality

Opportunities:
• More granular and in-depth analytic capabilities to predict default probabilities (credit risk) and prepayment rates (lapse risk)
• Money laundering and fraud detection

9
Q

Algorithms

A

Random forest

Neural network

10
Q

Random forest

A

Collection of N randomly generated decision trees

The goal of a decision tree is to predict the value of a target variable based on several input variables

1) Each tree is trained on a bootstrap sample (rows drawn at random, with replacement, from the training set); training trees on such samples and aggregating their predictions is known as bootstrap aggregating (bagging)
2) Each candidate split (node) considers only its own randomly drawn subset of the features
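Both sources of randomness can be seen in a small from-scratch sketch. It makes several simplifying assumptions that are not in the card: Gini impurity as the split criterion, thresholds at midpoints between observed values, a fixed `max_depth`, majority voting to aggregate, and a hypothetical two-feature toy data set (income, debt ratio). Libraries such as scikit-learn provide production implementations.

```python
import random
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, features):
    """Return (score, feature, threshold) minimising weighted Gini over `features`."""
    best = None
    for f in features:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint threshold between observed values
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(X, y, max_features, depth, rng):
    # Leaves are plain labels; internal nodes are (feature, threshold, left, right).
    majority = Counter(y).most_common(1)[0][0]
    if depth == 0 or len(set(y)) == 1:
        return majority
    # Random feature subset drawn independently at every node.
    features = rng.sample(range(len(X[0])), max_features)
    split = best_split(X, y, features)
    if split is None:
        return majority
    _, f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return (
        f,
        t,
        build_tree([X[i] for i in left], [y[i] for i in left], max_features, depth - 1, rng),
        build_tree([X[i] for i in right], [y[i] for i in right], max_features, depth - 1, rng),
    )


def predict_tree(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def random_forest(X, y, n_trees=25, max_features=1, max_depth=3, seed=0):
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        # Bootstrap sample: len(X) rows drawn with replacement.
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                max_features, max_depth, rng))
    return trees


def predict_forest(trees, row):
    # Aggregate: majority vote across all trees.
    return Counter(predict_tree(t, row) for t in trees).most_common(1)[0][0]
```

With hypothetical rows such as `[20, 0.9]` labelled `"default"` and `[80, 0.1]` labelled `"repay"`, `predict_forest(random_forest(X, y), [28, 0.75])` votes across 25 trees that each saw a different bootstrap sample and feature subset.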

11
Q

Advantages of random forest

A

Handles multiple data sets and performs well with large data sets

Able to model non-linear relationships

Finds interactions between variables

Makes no assumptions about the distribution of the data

12
Q

Neural network

A

Predicts, from a set of input variables, which category the outcome will fall into

Loosely resembles the network of neurons that make up the human brain

Consists of connected nodes: each node takes in a signal and passes a different (transformed) signal on to the nodes after it

The network learns from past experience by modifying its internal parameters (weights), thereby adapting itself
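The "learn by modifying internal parameters" idea can be shown on the smallest possible network: a single sigmoid neuron trained by gradient descent on log-loss. This is a simplified sketch, not a full multi-layer network; the learning rate, epoch count and the OR-gate toy data are assumptions chosen so the example is easy to check.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(X, y, lr=0.5, epochs=2000):
    """Train one sigmoid neuron (weights w, bias b) with per-example gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for row, target in zip(X, y):
            out = sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b)
            err = out - target  # gradient of log-loss w.r.t. the neuron's input sum
            # Adjust each internal parameter a little in the direction that reduces the error.
            w = [wi - lr * err * xi for wi, xi in zip(w, row)]
            b -= lr * err
    return w, b


def predict(w, b, row):
    # Output category: 1 if the neuron's activation is at least 0.5, else 0.
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) >= 0.5 else 0


if __name__ == "__main__":
    X = [[0, 0], [0, 1], [1, 0], [1, 1]]
    y = [0, 1, 1, 1]  # logical OR
    w, b = train_neuron(X, y)
    print([predict(w, b, row) for row in X])
```

A real network stacks many such nodes in layers, so that each layer's outputs become the next layer's inputs; training adjusts all their weights together via backpropagation.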

13
Q

Advantages and disadvantages of neural networks

A

Advantages:

Able to maintain high performance without a tendency to overfit

Can detect all possible interactions between predictor variables

Able to detect complex non-linear relationships and to model surfaces of any shape (theoretically)

Disadvantages:
Black box (hard to interpret)
Computationally intensive
Hyperparameter tuning is considered an art

14
Q

Data quality and governance

A

Data quality - checks, specifications, requirements

Data governance - manage master data consistently, clean data, and adhere to policies and standards

Data lineage - record which data sources were used and track user interactions with any data system
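Two of these ideas, quality checks against a specification and a lineage record, can be sketched in a few lines. Everything here is hypothetical and for illustration only: the `SPEC` rules, the field names and the `LineageLog` class are assumptions, not a standard or a specific tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical specification: each required field with a validity rule.
SPEC = {
    "customer_id": lambda v: isinstance(v, str) and v != "",
    "balance": lambda v: isinstance(v, (int, float)) and v >= 0,
    "currency": lambda v: v in {"EUR", "USD", "GBP"},
}


def quality_check(record: dict) -> list[str]:
    """Return the names of fields that are missing or violate the specification."""
    return [name for name, rule in SPEC.items()
            if name not in record or not rule(record[name])]


@dataclass
class LineageLog:
    """Hypothetical lineage log: which source was used, by whom, and when."""
    events: list = field(default_factory=list)

    def record(self, user: str, action: str, source: str):
        self.events.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "action": action,
            "source": source,
        })

    def sources_used_by(self, user: str) -> set:
        return {e["source"] for e in self.events if e["user"] == user}


if __name__ == "__main__":
    log = LineageLog()
    log.record("analyst_1", "read", "mortgage_system")
    print(quality_check({"customer_id": "C1", "balance": 100.0, "currency": "EUR"}))  # []
    print(quality_check({"customer_id": "", "balance": -5}))  # ['customer_id', 'balance', 'currency']
    print(log.sources_used_by("analyst_1"))  # {'mortgage_system'}
```

Keeping the rules in one table rather than scattered through code is a small example of governance: the specification can be reviewed and changed in a single place.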
