Lecture 28 Big Data Flashcards
Define big data
Big data is a very generic term for datasets that are so large or complex that traditional data processing tools (e.g. a desktop computer, a small server or statistical tools normally used on a small scale) are inadequate for mining them.
What are the 3 Vs of big data?
- High volume
- High velocity
- High variety
Which 3 more Vs have recently been added?
- High variability
- High veracity (variation in quality)
- High value (which comes with complexity)
Big data is a combination of what sorts of data?
- unstructured
- semi-structured
- structured
Why is data collected?
It is collected so that it can be mined and used to build predictive models and other advanced analytics applications.
What is structured data?
Data that can be catalogued
What is unstructured data?
Behavioural or ambiguous data
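Illustration (not from the lecture): a minimal Python sketch of the three sorts of data named above. The patient records, field names and the describe() helper are hypothetical and only meant to make the distinction concrete; real datasets are not classified this simply.

```python
import json

# Structured: fixed fields that are easy to catalogue in a table (hypothetical record).
structured_row = {"patient_id": 101, "age": 54, "systolic_bp": 132}

# Semi-structured: self-describing, but with a flexible, nested layout.
semi_structured = json.loads(
    '{"patient_id": 101, "visits": [{"date": "2020-03-01", "notes": "cough"}]}'
)

# Unstructured: free text with no predefined fields.
unstructured = "Patient reports feeling tired after walking and sleeping badly."


def describe(record):
    """Roughly classify a record by its shape (illustrative rule of thumb only)."""
    if isinstance(record, str):
        return "unstructured"
    flat = all(not isinstance(v, (dict, list)) for v in record.values())
    return "structured" if flat else "semi-structured"


for rec in (structured_row, semi_structured, unstructured):
    print(describe(rec))
```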
Is big data associated with any specific volume of data?
No. Big data deployments can involve terabytes (TB), petabytes (PB) and even exabytes (EB) of data, captured over time.
Why is big data important to companies?
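Illustration (not from the lecture): a short Python sketch of how the units above relate, assuming decimal (SI) units where 1 TB = 10^12 bytes. The collection rate of 2 TB per day is a made-up figure used only to show how volume builds up over time.

```python
# Decimal (SI) byte units; binary units (TiB, PiB, EiB) use powers of 1024 instead.
UNITS = {"TB": 10**12, "PB": 10**15, "EB": 10**18}


def to_bytes(amount, unit):
    """Convert an amount in TB, PB or EB to bytes."""
    return amount * UNITS[unit]


def days_to_reach(target_amount, target_unit, tb_per_day):
    """Days needed to accumulate the target volume at a constant daily rate."""
    return to_bytes(target_amount, target_unit) / to_bytes(tb_per_day, "TB")


# At a hypothetical 2 TB captured per day, 1 PB takes 500 days to accumulate.
print(days_to_reach(1, "PB", 2))
```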
- use it to improve operations
- provide better customer service
- create personalized marketing campaigns
- make faster decisions
- make more-informed decisions
- become more customer-centric
Examples of big data?
- Business transaction systems
- Customer databases
- Medical records
- Internet clickstream logs
- Mobile applications
- Social networks
Examples of big data in SCIENCE?
- Scientific research repositories
- Machine-generated data
- Clinical records, e.g. lifestyle data, not just medical records
In what form is the data left?
The data may be left in its raw form on big servers, or preprocessed using data mining tools or data preparation software so that it can be analysed (e.g. Google/Amazon).
How is Big Data used in Life Science?
- allows identification of risk factors in disease
- helps diagnose illnesses and conditions in individual patients
What is Big Data derived from?
Big data is derived from the genomics, transcriptomics and epigenomics (omics) data of many individuals. It is also derived from electronic health records, social media, the web and other sources, which provide healthcare organisations and government agencies with up-to-the-minute information on infectious disease threats or outbreaks.
How is big data being used in the COVID-19 pandemic?
AI and big data are playing a key role in modelling and in making predictions about the effect of the measures enforced, as well as about the science of the virus itself.