big data, functions, and more paper 2 stuff Flashcards
a database is stored at an office. staff use a client-server database system. describe an example of a problem that could occur if no systems were in place to manage concurrent access to the database
- two users edit the same data simultaneously
- one user writes their change, then the other user writes their change over it
- so one user's update is lost (the first write is overwritten)
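The lost update above can be sketched as a deterministic simulation. This is a minimal sketch assuming a hypothetical stock record: a plain dict stands in for the database row, and no real database, network or threads are involved.

```python
# Hypothetical record standing in for one row in the shared database.
record = {"stock": 10}

def read(db):
    # each user reads the current value into their own local copy
    return db["stock"]

def write(db, value):
    # writing simply overwrites whatever is currently stored
    db["stock"] = value

# Both users read the same starting value before either of them writes.
user_a_copy = read(record)
user_b_copy = read(record)

# User A sells 3 items and writes back, then user B sells 2 and writes back.
write(record, user_a_copy - 3)
write(record, user_b_copy - 2)  # overwrites A's change

print(record["stock"])  # 8, although the correct value would be 5
```

Techniques such as record locking prevent this by stopping the second user from reading the record until the first user has written their change.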
the performance of a system is unsatisfactory. the time delay between a client sending a query to the server and the client receiving the results is long. explain how the performance of the system might be improved, including the following factors:
- the hardware of the server
- the design of the computer network
- the database and software running on the server
Server hardware
- replace the processor with one that has more cores
- replace the processor with one that has more cache memory
- use a processor with a bigger word size
- use a processor with a faster clock speed
- replace HDDs with SSDs
- use the Harvard architecture
Network
- replace the network cable with a cable that has higher bandwidth
- use a star topology instead of a bus topology
- consider using a more efficient protocol for transferring data across the network
- add additional wireless access points
Database and software
- use a more efficient technique for controlling concurrent access to the database
- ensure the software is compiled rather than executed by an interpreter
- use a non-relational database
- distribute the data across multiple servers
- archive data that is no longer necessary
explain some of the challenges that face legislators in the digital age
- information can be combined/transferred in ways that were not previously possible
- technology evolves quickly so it’s difficult for the law to keep up with changes
- different laws in different countries
- some crimes may be committed by states rather than individuals
- methods such as encryption make it harder to monitor criminal activity
- individuals may have access to large amounts of sensitive information that may be of public interest
what is a programming paradigm
a style of computer programming
what is procedural programming
- supported by languages like Python/Pascal
- programs consist of a series of instructions that tell the computer what to do with the input in order to solve the problem
what is structured programming
a type of procedural programming which uses the programming constructs of sequence, selection, iteration and recursion
- uses modular techniques to split large programs into manageable chunks
what is oop
- supported by languages like Java, Python, Delphi
- type of programming that organises software design around data or objects
what is declarative programming
- used by SQL
- you write statements that describe the problem to be solved (what result is wanted) rather than how to solve it
what is functional programming
- supported by Haskell, Python, Java, C++ and more
- statements are written as a series of functions which accept data as arguments and return an output
what is a function
- a mapping from a set of inputs (the domain) to a set of outputs (the codomain)
what is the domain
the set of input values
what is the codomain
the set of values from which the outputs are chosen (not every value in the codomain has to be an output)
the domain and codomain are always…
subsets of the values of some data type
isEqual :: Int -> Int -> Bool
Int -> Int = the types of the two inputs
isEqual = function name
Bool = the type of the value returned
isEqual x y = x == y
returns True if x equals y, otherwise False
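For comparison, a rough Python equivalent of isEqual is sketched below. The name `is_equal` is our own, and unlike Haskell, Python's type hints are not enforced when the program runs.

```python
def is_equal(x: int, y: int) -> bool:
    # mirrors isEqual :: Int -> Int -> Bool; isEqual x y = x == y
    return x == y

print(is_equal(3, 3))  # True
print(is_equal(3, 4))  # False
```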
what is a first class object
an object which may:
- appear in expressions
- be assigned to a variable
- be passed as an argument
- be returned as the result of a function call
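Functions are first-class objects in Python, so each of the four points above can be shown with ordinary functions. This is a small illustrative sketch; all the function names are hypothetical.

```python
def square(n):
    return n * n

# assigned to a variable
f = square

# appears in an expression
result = f(4) + 1

# passed as an argument
def apply_to_five(func):
    return func(5)

# returned as the result of a function call
def make_adder(n):
    def adder(x):
        return x + n
    return adder

add_ten = make_adder(10)
print(result, apply_to_five(square), add_ten(5))  # 17 25 15
```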
differences between functional and procedural programming
- in functional programming, the value of a variable cannot change, so the program is said to be stateless
- the only thing a function can do is calculate something and return a result, no side effects
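The contrast can be sketched by summing a list in both styles. Python does not enforce immutability, so this only illustrates the two styles rather than a truly stateless language; the function names are hypothetical.

```python
from functools import reduce

# Procedural style: the variable total changes state on every pass of the loop.
def total_procedural(numbers):
    total = 0
    for n in numbers:
        total = total + n
    return total

# Functional style: no variable is reassigned and the input list is never modified;
# the result is calculated and returned with no side effects.
def total_functional(numbers):
    return reduce(lambda acc, n: acc + n, numbers, 0)

data = [2, 3, 4, 5]
print(total_procedural(data), total_functional(data))  # 14 14
```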
what is functional composition
when we combine two functions to get a new function
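In Haskell, composition is written with the `.` operator, so `(g . f) x = g (f x)`. Python has no built-in composition operator, so the sketch below defines a hypothetical `compose` helper that applies f first and then g to f's result.

```python
def compose(g, f):
    # returns a new function: apply f first, then apply g to f's output
    return lambda x: g(f(x))

def double(x):
    return x * 2

def increment(x):
    return x + 1

double_then_increment = compose(increment, double)
print(double_then_increment(5))  # 11, because increment(double(5)) = 10 + 1
```

The order matters: `compose(double, increment)(5)` gives 12, because it increments first and then doubles.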
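partial function application
- applying a function to only some of its arguments, which returns a new function that takes the remaining arguments
eg (add 6) 4: add 6 is a new function that adds 6 to its input, and applying it to 4 gives 10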
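Python does not partially apply functions automatically the way Haskell does, but the standard library's `functools.partial` gives the same effect. In this sketch, `add` is a hypothetical two-argument function.

```python
from functools import partial

def add(x, y):
    return x + y

add_six = partial(add, 6)   # fixes x = 6, leaving a new function of y
print(add_six(4))           # 10, like (add 6) 4 in Haskell
```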
what are higher-order functions
- any function that takes a function as an argument, returns a function as a result, or does both
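A minimal sketch of a higher-order function that does both: the hypothetical `twice` takes a function as its argument and returns a new function.

```python
def twice(func):
    # takes a function as an argument AND returns a function as its result
    def apply_twice(x):
        return func(func(x))
    return apply_twice

def add_three(x):
    return x + 3

plus_six = twice(add_three)
print(plus_six(10))  # 16
```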
what is map
- a higher order function that takes a list and the function to be applied to the elements in the list
- makes a new list by applying the function to each element in the original list
eg map (max 3) [1,2,3,4,5] = [3,3,3,4,5]
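The same example can be reproduced in Python, using `functools.partial` to stand in for Haskell's partially applied `(max 3)`. Python's built-in `map` returns a lazy iterator, so the result is wrapped in `list`.

```python
from functools import partial

max_3 = partial(max, 3)                      # like (max 3): returns max(3, x)
result = list(map(max_3, [1, 2, 3, 4, 5]))
print(result)  # [3, 3, 3, 4, 5]
```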
what is filter
- another higher-order function, which takes a predicate (a function that defines a Boolean condition) and a list
- it returns the elements of the list that satisfy the Boolean condition
eg filter (==5) [2,5,7,2,5] = [5,5]
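A Python version of the filter example, with a lambda standing in for the predicate `(==5)`. As with `map`, the built-in `filter` returns an iterator, so it is wrapped in `list`.

```python
result = list(filter(lambda x: x == 5, [2, 5, 7, 2, 5]))
print(result)  # [5, 5]
```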
what is the fold function
- reduces a list to a single value using recursion, by applying an operator (a function) to the elements in turn
eg to add all the elements in a list: foldl (+) 0 [2,3,4,5] = 14
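In Python, `functools.reduce` plays the role of a left fold: it starts from the initial value and combines it with each element from the left. This sketch reproduces the sum above, with `operator.add` standing in for `(+)`.

```python
from functools import reduce
import operator

total = reduce(operator.add, [2, 3, 4, 5], 0)   # ((((0 + 2) + 3) + 4) + 5)
print(total)  # 14
```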
foldl vs foldr
foldl = fold left, the brackets group from the left
eg foldl (/) 100 [2,5] = (100/2) / 5 = 10
foldr = fold right, the brackets group from the right
eg foldr (/) 100 [2,5] = 2 / (5/100) = 40
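Python has no built-in foldr, so the sketch below writes both folds recursively to show where the brackets go. These are our own illustrative definitions, following the same argument order as the Haskell examples (operator, starting value, list).

```python
def foldl(f, acc, xs):
    # fold left: combine the accumulator with the head, then fold the tail
    if not xs:
        return acc
    return foldl(f, f(acc, xs[0]), xs[1:])

def foldr(f, acc, xs):
    # fold right: combine the head with the result of folding the tail
    if not xs:
        return acc
    return f(xs[0], foldr(f, acc, xs[1:]))

def divide(a, b):
    return a / b

print(foldl(divide, 100, [2, 5]))  # (100 / 2) / 5 = 10
print(foldr(divide, 100, [2, 5]))  # 2 / (5 / 100) = 40
```

With an operator like `+` both folds give the same answer, but with `/` the grouping changes the result.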
head and tail of a list
- the head is the first item in the list
- the tail is the rest of the list (everything after the head)
eg for [1,2,3,4]:
- 1 is the head
- [2,3,4] is the tail
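Haskell provides `head` and `tail` as standard functions. The sketch below defines hypothetical Python equivalents with indexing and slicing, then uses them to sum a list recursively, which is the same pattern a fold uses.

```python
def head(xs):
    return xs[0]        # first item (raises IndexError on an empty list)

def tail(xs):
    return xs[1:]       # everything after the first item

def total(xs):
    # recursive sum: head plus the sum of the tail, with 0 for an empty list
    if not xs:
        return 0
    return head(xs) + total(tail(xs))

nums = [1, 2, 3, 4]
print(head(nums), tail(nums), total(nums))  # 1 [2, 3, 4] 10
```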
why is fold a higher-order function
- it takes a function (the operator, eg (+) or (/)) as one of its arguments
what is big data
data that is collected on such a large scale that it cannot easily be analysed
characteristics of big data
- velocity: the data is generated at high velocity
- volume: there is a high volume of data to process, which cannot all be stored on one server
- variety: big data comes in many forms, eg structured, unstructured, text, video, image
features of functional programming that make it easier to write correct, distributable code
- statelessness: the program's behaviour doesn't depend on the order in which the functions are called
- immutable data structures, which cannot be inadvertently altered by a function
- higher-order functions such as map/fold, which allow functions to be passed as arguments
map and fold/reduce can be efficiently parallelised. what does this mean?
many processors can work simultaneously on parts of a dataset without affecting other parts
what do the circles represent in a graph schema for big data
entities
what do the squares represent in a graph schema for big data
properties (fields) associated with the entities
what are fact-based models
- models in which each fact captures a single piece of information, eg a sensor's location
how can a function be partially applied to an argument (add for example)
- the function is applied to only one of its arguments
- the output is a new function, which takes the remaining argument
- eg (add 4) 6 = 10: add 4 returns a function that adds 4, which is then applied to 6
Describe what Big Data is, using examples
- data collected on a large scale
characteristics:
- Variety of different forms of information, eg email messages, videos
- high volume of data, eg large medical datasets for diagnosis
- The data must be processed at high velocity, eg thousands of items to process per second.
challenges that could arise with big data
- Data cannot be stored on one server
- Not possible to process data quickly enough with one computer.
- Data cannot be represented in a table
- Some forms of data / unstructured data are difficult to analyse.
how challenges with big data can be overcome
- Distributed database systems, with the data spread across multiple servers.
- Use of functional programming.
- parallelising the execution of programs.
- The input is split into parts, a mapper is executed on each part, then all the results are combined by reducers (MapReduce).
- Functional programming makes it easier to write distributable code
- Functional programming makes it easier to write correct code
- Use of many thousands of commodity servers.
- Use of servers with multiple CPUs / cores / drives.
- Machine learning can identify patterns / the value in the data // use of predictive data models.
- Use of languages such as XML or JSON to describe semi-structured data.
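The split, map and reduce steps described above can be sketched as a word count. This is a minimal, single-machine sketch under stated assumptions: the input parts are hypothetical, and everything runs in one process. In a real distributed system, each part would be mapped on a different server and the reducers would combine the partial results.

```python
from collections import Counter
from functools import reduce

def mapper(chunk):
    # map step: turn one part of the input into partial word counts
    return Counter(chunk.split())

def reducer(counts_a, counts_b):
    # reduce step: combine two sets of partial counts into one
    return counts_a + counts_b

# Hypothetical input, already split into parts (one per server in a real system)
parts = ["big data big", "data needs many servers", "big servers"]

partial_counts = map(mapper, parts)   # each mapper call is independent of the others
totals = reduce(reducer, partial_counts, Counter())
print(totals["big"], totals["data"], totals["servers"])  # 3 2 2
```

Because each `mapper` call only reads its own part and changes nothing shared, the calls could safely run in parallel. This is the property of map and fold/reduce described in the parallelisation card.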