4.11/12 Big data and functional programming Flashcards by Tom FORSTER

What is a function?

A mapping from values in a domain to values in a co-domain. Not all of the co-domain’s members need to be outputs.

What is the domain?

The set from which the function’s input values are chosen

What is the co-domain?

The set from which the function’s output values are chosen. Not all of the co-domain’s members need to be outputs.
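
As an illustration, here is a minimal Python sketch (Python, the function name square and the sample range are assumptions, not taken from the cards). The type hints record that both the domain and the co-domain are the integers, yet no negative integer is ever produced as an output.

```python
# square: domain = integers, co-domain = integers (recorded by the type hints)
def square(x: int) -> int:
    return x * x

# Apply the function to a small, finite sample of the domain.
domain_sample = range(-3, 4)
outputs = {square(x) for x in domain_sample}

print(sorted(outputs))  # [0, 1, 4, 9]
# Every output is a member of the co-domain, but members such as -1 or 2
# are never produced: not all of the co-domain has to be used.
```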

What is Big Data?

A catch-all term for data that won’t fit the usual containers, cannot be stored/processed on a single server, and that must be processed at very high speeds.

What are the three Vs of big data?

Volume (the very large amount of data)
Velocity (the speed at which data is generated)
Variety (the range of data types in the data)

What does Volume mean?
Why is it a problem?
How is it solved?

what it means
- Data is too big to be stored/processed on a single server

why it’s a problem
- relational databases don’t scale well across multiple machines
- and the processing associated with the data must be split across multiple machines

how it’s solved
- Functional programming is a solution, because it makes it easier to write code that can be distributed to run across multiple servers

What does Velocity mean?

The data is generated and/or processed at very high speed, so a response is needed within seconds or milliseconds

What does Variety mean?

The data comes in many forms, such as structured, unstructured, text and multimedia.
The most difficult aspect of Big Data involves its lack of structure.

What is the most difficult aspect of Big Data? Why?

Its lack of structure (under Variety). This poses challenges because:

Analysing the data is made significantly more difficult
Relational databases are not appropriate because they require data to fit into a row-and-column format

What technique is used to discern patterns in data and to extract useful information?

Machine learning

What is the advantage of functional programming for big data?

Its features make it easier to write
- Correct code
- Code that can be distributed to run across more than one server

4 features of functional programming that make it suitable for Big Data

Immutable data structures
Statelessness
Higher-order functions
Programs do not specify order of execution (meaning they work well on parallel processing systems)

What about immutable data structures makes them suitable for Big Data?

Immutable data structures cannot be changed during program execution
Functions working on them always give the same output for the same input
They can be shared safely between processors, which makes parallel processing much easier
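
A minimal Python sketch of immutability, using the built-in tuple type as an assumed stand-in for an immutable data structure (the variable names and values are hypothetical). "Changing" the data produces a new tuple, while the original stays exactly as it was.

```python
# Python tuples are immutable: once created, they cannot be changed.
readings = (12, 15, 11)

# "Adding" a reading builds a new tuple; the original is left untouched.
more_readings = readings + (14,)

# Trying to change an element in place raises a TypeError.
try:
    readings[0] = 99
except TypeError:
    was_changed = False
else:
    was_changed = True

print(readings)       # (12, 15, 11)
print(more_readings)  # (12, 15, 11, 14)
```

Because nothing can alter readings, any number of processes could read it at the same time without coordinating with each other.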

What about statelessness makes it suitable for Big Data?

Statelessness means computations have no side effects,
so code is easier to write correctly, and it is easier to understand and predict how the program will behave
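
A minimal Python sketch contrasting a stateful function with a stateless one (add_to_total and add are hypothetical names chosen for illustration). The stateful version has a side effect on a global variable, so the same call gives different answers; the stateless version depends only on its arguments.

```python
# Stateful: the function reads and changes a global variable (a side effect),
# so the same argument gives a different result each time it is called.
total = 0

def add_to_total(x):
    global total
    total += x
    return total

first = add_to_total(5)   # 5
second = add_to_total(5)  # 10

# Stateless: the result depends only on the arguments, and nothing outside
# the function is changed, so the same arguments always give the same result.
def add(running_total, x):
    return running_total + x

pure_first = add(0, 5)    # 5
pure_second = add(0, 5)   # 5
```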

How well did you know this?

Not at all

Perfectly

What about higher order functions makes them suitable for Big Data?

Higher-order functions take a function as an argument, return a function as a result, or both.
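Higher-order functions such as map can be easily parallelised, because the same function can be applied to different parts of the data on different processors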
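
A minimal Python sketch of a higher-order function; twice and increment are hypothetical names used only for illustration. twice takes a function as its argument and returns a new function as its result, so it does both.

```python
from typing import Callable

def twice(f: Callable[[int], int]) -> Callable[[int], int]:
    """Higher-order: takes a function f and returns a new function."""
    def apply_twice(x: int) -> int:
        return f(f(x))
    return apply_twice

def increment(x: int) -> int:
    return x + 1

add_two = twice(increment)
print(add_two(5))  # 7
```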

What is a fact in a fact-based model?

Each fact within a fact-based model captures a single piece of information.
Each fact is immutable and timestamped
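
A minimal Python sketch of facts as immutable, timestamped records. The Fact class, its field names and the sample data ("user42", the cities, the dates) are all hypothetical. A frozen dataclass stops a fact being edited, so a change of value is stored as a new, later fact.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import datetime, timezone

@dataclass(frozen=True)  # frozen=True makes every Fact immutable
class Fact:
    entity: str          # what the fact is about
    attribute: str       # the single piece of information captured
    value: str
    timestamp: datetime  # when the fact was recorded

facts = [
    Fact("user42", "city", "Leeds", datetime(2023, 1, 5, tzinfo=timezone.utc)),
    # A change is recorded as a new, later fact; the old fact is kept.
    Fact("user42", "city", "York", datetime(2024, 3, 9, tzinfo=timezone.utc)),
]

# The current value is the one from the most recent fact.
latest_city = max(facts, key=lambda fact: fact.timestamp).value

try:
    facts[0].value = "Hull"  # not allowed: facts cannot be changed
except FrozenInstanceError:
    overwritten = False
else:
    overwritten = True
```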

What is a graph schema?

Graph schemas can be used to capture the structure of a dataset. They can easily be extended without impacting existing facts (because the facts are immutable)
Nodes are used to represent the core entities in the data set
Edges are used to represent the relationships between the nodes
Properties are used to capture information about the nodes
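
A minimal Python sketch of a graph held as plain data structures; the node IDs, names and relationship labels are hypothetical. Nodes carry properties, and edges are (from node, relationship, to node) triples. Extending the graph builds a new tuple of edges rather than altering the old ones.

```python
# Nodes: the core entities, each with properties that capture information about it.
nodes = {
    "p1": {"type": "Person", "name": "Asha"},
    "p2": {"type": "Person", "name": "Ben"},
    "c1": {"type": "Company", "name": "Acme"},
}

# Edges: relationships between nodes, stored as (from_node, relationship, to_node).
edges = (
    ("p1", "works_for", "c1"),
    ("p2", "works_for", "c1"),
    ("p1", "friends_with", "p2"),
)

def related(edge_list, node_id, relationship):
    """Names of the nodes reached from node_id along edges with this relationship."""
    return [nodes[dst]["name"] for src, rel, dst in edge_list
            if src == node_id and rel == relationship]

# Extending with a new kind of relationship adds edges
# without altering any existing one.
extended_edges = edges + (("p2", "manages", "p1"),)
```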

What is a first-class object?

First-class objects are objects which may:
- R - be returned in function calls
- A - be assigned as arguments
- V - be assigned to a variable
- E - appear in expressions

Functions are first-class objects in functional programming languages
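
Python functions are also first-class objects, so a minimal Python sketch can show each letter of R-A-V-E (double, apply_to, make_multiplier and triple are hypothetical names used for illustration).

```python
def double(x):
    return x * 2

# V: a function can be assigned to a variable.
f = double

# A: a function can be passed as an argument to another function.
def apply_to(func, value):
    return func(value)

# R: a function can be returned by a function call.
def make_multiplier(n):
    def multiply(x):
        return x * n
    return multiply

triple = make_multiplier(3)

# E: functions can appear in expressions, here inside a list comprehension.
results = [g(5) for g in [f, triple]]

print(apply_to(f, 7))  # 14
print(results)         # [10, 15]
```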

What does function application mean?

Applying a function to its arguments

What does partial function application mean?

Partial function application means only applying a function to some of its arguments. The result is a function.
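
In Python, partial application can be sketched with functools.partial from the standard library (in Haskell, where functions are curried, supplying fewer arguments does this directly). The volume function is a hypothetical example; the sketch contrasts full function application with partial application.

```python
from functools import partial

def volume(length, width, height):
    return length * width * height

# Function application: all three arguments are supplied, so the result is a number.
full = volume(2, 3, 4)  # 24

# Partial application: only length is supplied, so the result is a new
# function that still expects width and height.
volume_length_2 = partial(volume, 2)
later = volume_length_2(3, 4)  # 24
```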

What is functional composition?

Combining two functions to get a new function
g ∘ f means apply f first, then g, so (g ∘ f)(x) = g(f(x))
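
A minimal Python sketch of composition; compose, add_one and square are hypothetical helpers, not library functions. compose(g, f) returns the new function g ∘ f, which feeds the output of f into g.

```python
def compose(g, f):
    """Return g ∘ f: a new function that applies f first, then g."""
    return lambda x: g(f(x))

def add_one(x):
    return x + 1

def square(x):
    return x * x

square_after_add = compose(square, add_one)  # (x + 1) squared
add_after_square = compose(add_one, square)  # x squared, plus 1

print(square_after_add(3))  # 16
print(add_after_square(3))  # 10
```

Swapping the order changes the result, which is why g ∘ f must be read as "f first, then g". For the composition to make sense, the outputs of f must be valid inputs to g.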

Describe in words what map does

Applies a given function to each element of a list, returning a list of results
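
A minimal Python sketch using the built-in map (the list of numbers is hypothetical sample data). The squaring function is applied to every element, and the original list is not changed.

```python
numbers = [1, 2, 3, 4]

# map applies the function to every element; list() collects the results,
# because map returns a lazy iterator in Python 3.
squares = list(map(lambda n: n * n, numbers))

print(squares)  # [1, 4, 9, 16]
```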

Describe in words what filter does

Processes a list to produce a new list containing exactly those elements that match a given condition
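
A minimal Python sketch using the built-in filter, with a hypothetical "is even" condition and sample list.

```python
numbers = [1, 2, 3, 4, 5, 6]

# filter keeps exactly those elements for which the condition returns True.
evens = list(filter(lambda n: n % 2 == 0, numbers))

print(evens)  # [2, 4, 6]
```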

Describe in words what reduce or fold does

Reduces a list of values to a single value by repeatedly applying a combining function to the list values
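
A minimal Python sketch using functools.reduce from the standard library, which folds from the left; the sample list and the two combining functions are hypothetical examples.

```python
from functools import reduce

numbers = [1, 2, 3, 4]

# The combining function is applied repeatedly, left to right:
# ((((0 + 1) + 2) + 3) + 4) = 10, where 0 is the starting value.
total = reduce(lambda acc, n: acc + n, numbers, 0)

# A different combining function gives a different single value.
product = reduce(lambda acc, n: acc * n, numbers, 1)

print(total, product)  # 10 24
```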

How is machine learning used in Big Data?

Machine learning is used to discern patterns in data and to extract useful information

What is meant by "parallel processing" in Big Data?

When more than one processor works on different parts of a large data set at the same time, each without changing any other part
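
A minimal Python sketch of the idea, assuming a toy data set of the numbers 1 to 100 split into four chunks (both choices are hypothetical). It uses ThreadPoolExecutor from the standard concurrent.futures module. In CPython, threads do not speed up CPU-bound work because of the global interpreter lock, so ProcessPoolExecutor would be the usual choice for a real speed-up, but the structure is the same: a pure function handles each chunk independently, then the partial results are combined.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def chunk_total(chunk):
    """Pure function: sums the squares in one chunk without touching other chunks."""
    return sum(n * n for n in chunk)

data = tuple(range(1, 101))  # immutable data set: 1..100
chunks = [data[i:i + 25] for i in range(0, len(data), 25)]  # 4 independent parts

# Each worker processes a different chunk at the same time.
with ThreadPoolExecutor(max_workers=4) as pool:
    chunk_totals = list(pool.map(chunk_total, chunks))

# The per-chunk results are then combined into a single value.
grand_total = reduce(lambda a, b: a + b, chunk_totals, 0)
```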

(26 cards)