Big Data Flashcards
What is Big Data?
Data sets that are too large to store on a single server and too varied and complex to analyse easily.
What are the three qualities of Big Data?
Volume, velocity and variety
What does volume mean in Big Data?
The sheer amount of data being gathered and stored, often more than a single server can hold.
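One common way to cope with volume is hash partitioning (sharding), which spreads records across several storage servers. The minimal Python sketch below shows the idea. The `partition` function, the four-server setup and the `user` records are hypothetical and exist only for illustration.

```python
import zlib
from collections import defaultdict


def partition(records, num_servers):
    """Hypothetical sketch: assign each (key, value) record to one of num_servers by hashing its key."""
    shards = defaultdict(list)
    for key, value in records:
        # crc32 is stable across runs, unlike Python's built-in hash() for strings
        server = zlib.crc32(key.encode("utf-8")) % num_servers
        shards[server].append((key, value))
    return dict(shards)


# Hypothetical data: 1000 user records spread across 4 servers
records = [(f"user{i}", i) for i in range(1000)]
shards = partition(records, num_servers=4)
for server, rows in sorted(shards.items()):
    print(f"server {server}: {len(rows)} records")
```

Because the same key always hashes to the same server, a record can later be found again without searching every machine.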
What does velocity mean in Big Data?
Data is generated and collected at high speed, often as continuous streams in near real time, which makes processing it challenging.
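With high-velocity data, each item is usually processed as it arrives instead of being stored first and analysed later. The minimal Python sketch below keeps a rolling average over only the most recent readings. The `rolling_average` function, the window size and the sample readings are assumptions; the readings stand in for a live sensor or event feed.

```python
from collections import deque
from collections.abc import Iterable, Iterator


def rolling_average(stream: Iterable[float], window: int) -> Iterator[float]:
    """Hypothetical sketch: yield the average of the last `window` readings as each one arrives."""
    recent = deque(maxlen=window)  # the oldest reading drops out automatically when full
    total = 0.0
    for reading in stream:
        if len(recent) == window:
            total -= recent[0]  # this reading is about to be evicted by append()
        recent.append(reading)
        total += reading
        yield total / len(recent)


readings = iter([1, 2, 3, 4, 5])  # stand-in for a live data stream
for avg in rolling_average(readings, window=3):
    print(avg)  # 1.0, 1.5, 2.0, 3.0, 4.0 (one per line)
```

Memory use stays fixed at `window` readings, however long the stream runs.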
What does variety mean in Big Data?
Data comes in a wide variety of formats, e.g. text, video, audio and images, and may be structured or unstructured.
What is structured data?
- Data that can be stored in a relational database in row and column format
- Data that can be easily analysed and queried (e.g. with SQL)
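Structured data fits a fixed set of columns, so a database can query it directly. The minimal sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `sales` table and its rows are made up for illustration.

```python
import sqlite3

# Hypothetical structured data: every record has the same named fields (row and column format)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("North", "Laptop", 900.0), ("South", "Laptop", 850.0), ("North", "Phone", 400.0)],
)

# Because the columns are known in advance, the data can be queried directly
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)  # [('North', 1300.0), ('South', 850.0)]
```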
What is unstructured data?
Data that:
- Is difficult to organise
- Is not suited to storage in a database in row and column format
- Comes in a vast range of formats, so it is difficult to analyse.
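Unstructured data has no columns to query, so structure must be extracted before it can be analysed. The minimal sketch below pulls star ratings out of free-text reviews with a regular expression. The reviews and the pattern are hypothetical. Real unstructured data usually needs far more robust techniques than one hand-written pattern.

```python
import re

# Hypothetical unstructured data: free text with no fixed fields
reviews = [
    "Loved the laptop, battery lasts ages. 5 stars!",
    "phone arrived late... would give it 2/5",
    "Meh. Screen is fine I guess",
]

# A simple (and fragile) pattern: a digit 1-5 followed by "star(s)" or "/5"
rating_pattern = re.compile(r"\b([1-5])\s*(?:stars?|/5)", re.IGNORECASE)

ratings = []
for text in reviews:
    match = rating_pattern.search(text)
    # Reviews that do not mention a rating in this form yield None
    ratings.append(int(match.group(1)) if match else None)

print(ratings)  # [5, 2, None]
```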
When is distributed programming used?
When data is too big to be processed on a single machine, the processing is distributed across several machines.
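The split, process and combine pattern can be sketched on one machine with Python's `concurrent.futures.ProcessPoolExecutor`. Here, separate processes stand in for separate machines. The function names are hypothetical. Real frameworks such as Apache Hadoop or Apache Spark also handle networking, scheduling and machine failures, which this sketch leaves out.

```python
from concurrent.futures import ProcessPoolExecutor


def count_words(lines):
    """Hypothetical sub-task: count the words in one chunk of the data."""
    return sum(len(line.split()) for line in lines)


def split_into_chunks(items, n):
    """Divide items into at most n roughly equal chunks, one per worker."""
    size = max(1, -(-len(items) // n))  # ceiling division, at least 1
    return [items[i:i + size] for i in range(0, len(items), size)]


def distributed_word_count(lines, workers=4):
    chunks = split_into_chunks(lines, workers)
    # Each chunk runs in a separate process, standing in for a separate machine
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # Combine the partial counts into the final answer
        return sum(pool.map(count_words, chunks))


if __name__ == "__main__":
    data = ["big data is big"] * 1000
    print(distributed_word_count(data))  # 4000
```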
What is a computer cluster?
A group of networked computers, used in distributed programming to share the big data processing task.
Big Data → Computer Cluster (Master Computer and networked computers) → Client machines
What does the master computer do?
It uses specialist software to control each networked computer as it performs its sub-task.
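The master's controlling role can be sketched in Python with threads standing in for the networked computers and queues standing in for the network. This sketch also assumes the master splits the job into sub-tasks before handing them out. The `master` and `worker` functions, the node names and the summing task are all hypothetical.

```python
import queue
import threading


def worker(name, tasks, results):
    """Hypothetical networked computer: takes sub-tasks from the master and reports results."""
    while True:
        task = tasks.get()
        if task is None:  # stop signal from the master: no more work
            break
        task_id, numbers = task
        results.put((task_id, name, sum(numbers)))


def master(data, num_workers=3, chunk_size=4):
    """Hypothetical master: split the job, hand out sub-tasks, then combine the results."""
    tasks, results = queue.Queue(), queue.Queue()
    subtasks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    workers = [
        threading.Thread(target=worker, args=(f"node{n}", tasks, results))
        for n in range(num_workers)
    ]
    for w in workers:
        w.start()
    for task_id, chunk in enumerate(subtasks):
        tasks.put((task_id, chunk))
    for _ in workers:
        tasks.put(None)  # one stop signal per worker, queued after all sub-tasks
    for w in workers:
        w.join()
    # The master collects every partial result and knows which node handled each sub-task
    reports = [results.get() for _ in subtasks]
    for task_id, node, partial in sorted(reports):
        print(f"sub-task {task_id} done by {node}: {partial}")
    return sum(partial for _, _, partial in reports)


if __name__ == "__main__":
    total = master(list(range(1, 11)))
    print("total:", total)  # total: 55
```

Which node handles each sub-task can vary from run to run, but the combined total does not.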