4.11/12 Big data and functional programming Flashcards
What is a function?
A rule that maps each value in a set called the domain to a value in a set called the co-domain. Not every member of the co-domain needs to be an output.
What is the domain?
The set from which the function’s input values are chosen
What is the co-domain?
The set from which the function’s output values are chosen. Not every member of the co-domain needs to be an output.
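As a small illustration of the domain and co-domain cards, the Python sketch below treats squaring as a function from the integers to the integers. The name `square` and the sample values are assumptions chosen for this example only.

```python
def square(n: int) -> int:
    """Maps each integer in the domain to an integer in the co-domain."""
    return n * n

# A small sample of the domain (the full domain is every integer).
sample_domain = [-3, -2, -1, 0, 1, 2, 3]

# The outputs actually produced from that sample.
outputs = {square(n) for n in sample_domain}

# The co-domain is also the integers, but values such as 2 and -1 are
# never produced: not every member of the co-domain has to be an output.
print(sorted(outputs))  # [0, 1, 4, 9]
```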
What is Big Data?
A catch-all term for data that won’t fit the usual containers, cannot be stored/processed on a single server, and that must be processed at very high speeds.
What are the three Vs of big data?
- (very large) Volume (of data)
- Velocity (at which data is generated)
- Variety (of data types in the data)
What does Volume mean?
Why is it a problem?
How is it solved?
What it means
- The data is too big to be stored/processed on a single server
Why it’s a problem
- Relational databases don’t scale well across multiple machines
- The processing associated with the data must be split across multiple machines
How it’s solved
- Functional programming is one solution, because it makes it easier to write code that can be distributed across multiple machines
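One way to picture splitting the processing across machines is the sketch below. It is only a single-machine simulation: the helper names `chunk` and `total_bytes` and the record sizes are hypothetical, and in a real system each chunk would be held and processed on a different server.

```python
from functools import reduce

def chunk(data, size):
    """Split a list into consecutive pieces of at most `size` items."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def total_bytes(records):
    """Pure function: sums one chunk without touching anything outside it."""
    return sum(records)

# Hypothetical record sizes standing in for a large data set.
record_sizes = [120, 300, 45, 980, 15, 610, 77]

# Each chunk could be sent to its own machine.
chunks = chunk(record_sizes, 3)

# Every chunk is processed independently of the others.
partial_totals = list(map(total_bytes, chunks))

# The partial results are combined into one answer.
grand_total = reduce(lambda a, b: a + b, partial_totals, 0)

print(partial_totals)  # [465, 1605, 77]
print(grand_total)     # 2147
```

Because `total_bytes` reads only its own chunk, no chunk has to wait for or coordinate with any other.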
What does Velocity mean?
The data is generated and/or processed at very high speed, so systems need to respond within milliseconds to seconds.
What does Variety mean?
- The data comes in many forms, such as structured, unstructured, text and multimedia.
- The most difficult aspect of Big Data involves its lack of structure.
What is the most difficult aspect of Big Data? Why?
Its lack of structure (under Variety). This poses challenges because:
- Analysing the data is made significantly more difficult
- Relational databases are not appropriate because they require data to fit into a row-and-column format
What technique is used to discern patterns in data and to extract useful information?
Machine learning
What is the advantage of functional programming for big data?
Its features make it easier to write:
- Correct code
- Code that can be distributed to run across more than one server
What are 4 features of functional programming that make it suitable for Big Data?
- Immutable data structures
- Statelessness
- Higher-order functions
- Programs do not specify order of execution (meaning they work well on parallel processing systems)
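The last point, that the order of execution does not matter, can be sketched in Python as follows. The function `celsius_to_fahrenheit` and the readings are assumptions made up for the example. A thread pool is used only to show calls being scheduled by the system rather than by the programmer; it is not a claim about how a real Big Data platform runs the work.

```python
from concurrent.futures import ThreadPoolExecutor

def celsius_to_fahrenheit(c: float) -> float:
    """Pure function: the result depends only on the argument."""
    return c * 9 / 5 + 32

readings = [0.0, 37.0, 100.0, -40.0]

# Evaluate left to right.
forward = [celsius_to_fahrenheit(c) for c in readings]

# Evaluate right to left, then put the results back in the original positions.
backward = [celsius_to_fahrenheit(c) for c in reversed(readings)][::-1]

# Let a thread pool decide when each call runs.
# Executor.map still returns the results in input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    pooled = list(pool.map(celsius_to_fahrenheit, readings))

print(forward == backward == pooled)  # True
```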
What about immutable data structures makes them suitable for Big Data?
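- Immutable data structures cannot be changed during program execution
- Because the data never changes, a function given the same input always gives the same output
- This makes parallel processing much easier, as processes cannot interfere with each other by changing shared data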
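A minimal Python sketch of immutability, using a tuple as the immutable structure. The variable names and values are illustrative assumptions.

```python
# A tuple cannot be modified once it has been created.
readings = (20.5, 21.0, 19.8)

try:
    readings[0] = 99.0      # attempt an in-place change
    changed = True
except TypeError:
    changed = False         # Python refuses to modify the tuple

# "Updating" means building a new tuple; the original is untouched.
updated = readings + (22.1,)

print(changed)   # False
print(readings)  # (20.5, 21.0, 19.8)
print(updated)   # (20.5, 21.0, 19.8, 22.1)
```

Since `readings` can never change, any number of processes could read it at the same time without locks.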
What about statelessness makes it suitable for Big Data?
- Statelessness means a function’s result depends only on its arguments, and computations have no side effects
- So code is easier to write correctly, and it is easier to understand and predict how the program will behave
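The contrast between stateful and stateless code can be sketched as below. The class `RunningTotal` and the function `add` are hypothetical names invented for this comparison.

```python
class RunningTotal:
    """Stateful: each call changes a hidden total stored in the object."""
    def __init__(self):
        self.total = 0

    def add(self, x):
        self.total += x     # side effect: modifies stored state
        return self.total

def add(total, x):
    """Stateless: everything the function needs is passed in as arguments."""
    return total + x

counter = RunningTotal()
first = counter.add(5)      # 5
second = counter.add(5)     # 10: same argument, different answer

print(first, second)          # 5 10
print(add(0, 5), add(0, 5))   # 5 5
```

The stateless `add` can be called on any machine, any number of times, and will always agree with itself.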
What about higher-order functions makes them suitable for Big Data?
- Higher-order functions take a function as an argument, return a function as a result, or both.
- Higher-order functions such as map apply a function to each item independently, so the work can be easily parallelised.
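Both kinds of higher-order function can be sketched in Python. `make_scaler` is a hypothetical helper that returns a function, while the built-in `map` and `filter` and `functools.reduce` take a function as an argument.

```python
from functools import reduce

def make_scaler(factor):
    """Returns a new function, so make_scaler is itself higher-order."""
    def scale(x):
        return x * factor
    return scale

values = [1, 2, 3, 4, 5, 6]
double = make_scaler(2)

# map applies `double` to every item independently.
doubled = list(map(double, values))

# filter keeps only the items for which the given function returns True.
large = list(filter(lambda v: v > 6, doubled))

# reduce combines the items into a single value using the given function.
total = reduce(lambda acc, v: acc + v, large, 0)

print(doubled)  # [2, 4, 6, 8, 10, 12]
print(large)    # [8, 10, 12]
print(total)    # 30
```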