4.11/12 Big data and functional programming Flashcards
What is a function?
A rule that maps each value in a domain to a value in a co-domain. Not every member of the co-domain needs to be an output.
What is the domain?
The set from which the function’s input values are chosen
What is the co-domain?
The set from which the function’s output values are chosen. Not every member of the co-domain needs to be an output.
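As a small illustration of domain and co-domain, here is a sketch in Python. The function name `square` and the sample values are hypothetical, chosen only for the example. Both the domain and the co-domain are the integers, yet negative integers are never produced as outputs.

```python
def square(n: int) -> int:
    """Map each integer in the domain to an integer in the co-domain."""
    return n * n

# A few sample values taken from the domain (the integers).
domain_sample = [-2, -1, 0, 1, 2]

# The outputs actually produced for those inputs.
outputs = {square(n) for n in domain_sample}  # {0, 1, 4}

# -1 belongs to the co-domain (it is an integer) but is never an output.
negative_one_is_output = -1 in outputs  # False
```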
What is Big Data?
A catch-all term for data that won’t fit the usual containers, cannot be stored/processed on a single server, and that must be processed at very high speeds.
What are the three Vs of big data?
- (very large) Volume (of data)
- Velocity (at which data is generated)
- Variety (of data types in the data)
What does Volume mean?
Why is it a problem?
How is it solved?
What it means
- The data is too big to be stored or processed on a single server
Why it’s a problem
- Relational databases don’t scale well across multiple machines
- The processing associated with the data must be split across multiple machines
How it’s solved
- Functional programming is one solution, because functional code is easier to distribute across multiple machines
What does Velocity mean?
The data is generated and/or processed at very high speed, so the system may need to respond within seconds or milliseconds
What does Variety mean?
- The data is in many forms such as structured, unstructured, text, multimedia.
- The most difficult aspect of Big Data involves its lack of structure.
What is the most difficult aspect of Big Data? Why?
Its lack of structure (under Variety). This poses challenges because:
- Analysing the data is made significantly more difficult
- Relational databases are not appropriate because they require data to fit into a row-and-column format
What technique is used to discern patterns in data and to extract useful information?
Machine learning
What is the advantage of functional programming for big data?
Its features make it easier to write:
- Correct code
- Code that can be distributed to run across more than one server
4 features of functional programming that make it suitable for Big Data
- Immutable data structures
- Statelessness
- Higher-order functions
- Programs do not specify order of execution (meaning they work well on parallel processing systems)
What about immutable data structures makes them suitable for Big Data?
- Immutable data structures cannot be changed during program execution
- Same input always gives same output
- This makes parallel processing much easier, because no processor can alter data that another processor is using
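A minimal Python sketch of working with immutable data. It assumes a tuple stands in for an immutable data structure, and the names `readings` and `sorted_copy` are hypothetical. Instead of changing the original, the function returns a new value.

```python
# A tuple cannot be changed in place once it has been created.
readings = (3, 1, 2)

def sorted_copy(values: tuple) -> tuple:
    """Return a new sorted tuple, leaving the original untouched."""
    return tuple(sorted(values))

ordered = sorted_copy(readings)  # (1, 2, 3)

# Attempting to modify the tuple raises a TypeError.
try:
    readings[0] = 99
    mutated = True
except TypeError:
    mutated = False
```

After this runs, `readings` is still `(3, 1, 2)`, so any other part of the program reading it sees the same data.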
What about statelessness makes it suitable for Big Data?
- Statelessness means there are no side-effects of computations
- so code is easy to write correctly, and it is easy to understand and predict how the program will behave
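To make the idea of side effects concrete, here is a sketch in Python contrasting a function that depends on shared state with one that does not. The names `add_to_total` and `add` are hypothetical.

```python
total = 0  # shared state that lives outside any function

def add_to_total(x: int) -> int:
    """Stateful: changes the shared variable, which is a side effect."""
    global total
    total += x
    return total

def add(a: int, b: int) -> int:
    """Stateless: the result depends only on the arguments."""
    return a + b

first = add_to_total(5)   # 5
second = add_to_total(5)  # 10, same argument but a different result

same_each_time = add(2, 3) == add(2, 3)  # True
```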
What about higher order functions makes them suitable for Big Data?
- Higher-order functions take a function as an argument, return a function as a result, or both.
- Higher-order functions can be easily parallelised
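A short Python sketch of both kinds of higher-order function. The names `apply_twice` and `make_adder` are hypothetical: the first takes a function as an argument, the second returns a function as its result.

```python
from typing import Callable

def apply_twice(f: Callable[[int], int], x: int) -> int:
    """Takes a function as an argument and applies it two times."""
    return f(f(x))

def make_adder(n: int) -> Callable[[int], int]:
    """Returns a new function that adds n to its argument."""
    def adder(x: int) -> int:
        return x + n
    return adder

add_three = make_adder(3)
result = apply_twice(add_three, 10)  # (10 + 3) + 3 = 16
```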
What is a fact in a fact-based model?
- Each fact within a fact-based model captures a single piece of information.
- Each fact is immutable and timestamped
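One way to picture an immutable, timestamped fact is a frozen dataclass in Python. This is only a sketch: the field names (`entity`, `attribute`, `value`) and the sample data are assumptions made for illustration, not part of any particular fact-based system. It also assumes that a change is recorded as a new fact, with the latest timestamp giving the current value.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import datetime, timezone

@dataclass(frozen=True)
class Fact:
    """A single piece of information, immutable and timestamped."""
    entity: str
    attribute: str
    value: str
    timestamp: datetime

# Hypothetical sample data.
fact = Fact("user42", "city", "Leeds", datetime(2024, 1, 1, tzinfo=timezone.utc))

# A change is stored as a NEW fact; the old fact is never edited.
newer = Fact("user42", "city", "York", datetime(2024, 6, 1, tzinfo=timezone.utc))

facts = [fact, newer]

# The most recent fact gives the current value.
current_city = max(facts, key=lambda f: f.timestamp).value  # "York"
```

Trying to assign to `fact.value` raises `FrozenInstanceError`, which is how the dataclass enforces immutability.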
What is a graph schema?
- Graph schemas can be used to capture the structure of a dataset. They can be easily extended, without impacting existing facts (because the facts are immutable)
- Nodes are used to represent the core entities in the data set
- Edges are used to represent the relationships between the nodes
- Properties are used to capture information about the nodes
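A graph schema can be sketched with plain Python dictionaries and lists. The node names, edge labels and properties below are hypothetical. The example shows the schema being extended with a new node type and a new edge while the original description stays unchanged.

```python
# Nodes: the core entities, each with the properties recorded about it.
nodes = {
    "person": {"properties": ["name", "age"]},
    "city": {"properties": ["name"]},
}

# Edges: relationships between nodes, written as (from, label, to).
edges = [("person", "lives_in", "city")]

# Extending the schema builds new values rather than editing the old ones.
extended_nodes = {**nodes, "company": {"properties": ["name"]}}
extended_edges = edges + [("person", "works_for", "company")]
```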
What is a first-class object?
First-class objects are objects which may:
- R - be returned in function calls
- A - be passed as arguments
- V - be assigned to a variable
- E - appear in expressions
Functions are first-class objects in functional programming languages
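Python also treats functions as first-class objects, so the four properties can be sketched there. All of the names below are hypothetical.

```python
def double(x: int) -> int:
    return x * 2

# V: a function can be assigned to a variable.
alias = double

# A: a function can be passed as an argument.
def call_with_five(f):
    return f(5)

# R: a function can be returned from a function call.
def get_doubler():
    return double

# E: functions can appear in expressions.
result = call_with_five(alias) + get_doubler()(1)  # 10 + 2 = 12
```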
What does function application mean?
Applying a function to its arguments
What does partial function application mean?
Partial function application means applying a function to only some of its arguments. The result is a function that takes the remaining arguments.
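In Python, partial application can be sketched with `functools.partial` from the standard library. The names `add` and `add_ten` are hypothetical.

```python
from functools import partial

def add(a: int, b: int) -> int:
    return a + b

# Supply only the first argument; the result is a new function
# that is still waiting for the second argument.
add_ten = partial(add, 10)

result = add_ten(5)  # add(10, 5) = 15
```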
What is functional composition?
- Combining two functions to get a new function
- g ∘ f means apply f first, then g
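Python has no built-in composition operator, so this sketch defines a hypothetical helper `compose` to show the order in which the two functions are applied.

```python
def compose(g, f):
    """Return a new function that applies f first, then g."""
    return lambda x: g(f(x))

def f(x: int) -> int:
    return x + 1

def g(x: int) -> int:
    return x * 2

g_after_f = compose(g, f)
result = g_after_f(3)  # g(f(3)) = g(4) = 8

# The order matters: f after g gives a different answer.
other = compose(f, g)(3)  # f(g(3)) = f(6) = 7
```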
Describe in words what map does
Applies a given function to each element of a list, returning a list of results
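A one-line illustration using Python's built-in `map`, with hypothetical sample values. `map` returns a lazy iterator in Python, so `list` is used to collect the results.

```python
numbers = [1, 2, 3, 4]

# Apply the squaring function to every element of the list.
squares = list(map(lambda x: x * x, numbers))  # [1, 4, 9, 16]
```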
Describe in words what filter does
Processes a list to produce a new list containing exactly those elements that match a given condition
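An illustration using Python's built-in `filter`, with hypothetical sample values and the condition "is even".

```python
numbers = [1, 2, 3, 4, 5, 6]

# Keep only the elements for which the condition is true.
evens = list(filter(lambda x: x % 2 == 0, numbers))  # [2, 4, 6]
```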
Describe in words what reduce or fold does
Reduces a list of values to a single value by repeatedly applying a combining function to the list values
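An illustration using `functools.reduce` from Python's standard library, with hypothetical sample values. The combining function is addition and the starting value is 0.

```python
from functools import reduce

numbers = [1, 2, 3, 4]

# Combine the values one at a time: ((((0 + 1) + 2) + 3) + 4)
total = reduce(lambda acc, x: acc + x, numbers, 0)  # 10
```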
How is machine learning used in Big Data?
Machine learning is used to discern patterns in data and to extract useful information
What is meant by “parallel processing” in Big Data?
When more than one processor works on different parts of a large data set at the same time, without any processor changing the parts that the others are working on
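The idea can be sketched in Python by splitting a data set into chunks, processing each chunk independently, and then combining the results. This is only an illustration: worker threads stand in for separate processors here (threads in standard Python do not give true CPU parallelism), and the names `chunk_sum`, `data` and `chunks` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def chunk_sum(chunk: list) -> int:
    """Process one part of the data without touching any other part."""
    return sum(chunk)

data = list(range(1, 101))  # the numbers 1 to 100

# Split the data into four independent chunks of 25 values each.
chunks = [data[i:i + 25] for i in range(0, len(data), 25)]

# Each worker handles a different chunk.
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_sums = list(pool.map(chunk_sum, chunks))  # [325, 950, 1575, 2200]

# Combine the partial results into the final answer.
total = sum(partial_sums)  # 5050
```

Because `chunk_sum` has no side effects and each chunk is separate, the order in which the workers finish does not affect the result.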