LM 11: Introduction to Big Data Techniques Flashcards
What three Vs characterise Big Data, and when is a fourth V relevant?
Volume, velocity and variety.
A fourth V, veracity, becomes relevant when Big Data is used for inference or prediction.
What are the main sources of alternative data?
Individuals, business processes and sensors
What is the definition of AI?
Computer systems capable of performing tasks that traditionally required human intelligence at levels comparable to those of human beings
How is Machine Learning (ML) defined?
A subset of AI that extracts knowledge from large amounts of data by learning from known examples and then generating structure or predictions.
What are the 3 main types of ML?
Supervised learning, unsupervised learning and deep learning
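To make the supervised/unsupervised distinction concrete, here is a minimal sketch in plain Python, assuming toy one-feature data invented for illustration. The supervised part learns from labelled examples (a nearest-centroid classifier); the unsupervised part receives no labels and finds two clusters on its own (1-D k-means with k = 2). The function names `fit_centroids`, `predict` and `two_means` are hypothetical. Deep learning, the third type, relies on multi-layer neural networks and is not shown here.

```python
# Hypothetical toy data: one numeric feature per observation.


def fit_centroids(examples):
    """Supervised learning: learn from labelled (feature, label) examples by
    computing the mean feature value for each label (nearest-centroid classifier)."""
    sums, counts = {}, {}
    for x, label in examples:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, x):
    """Predict the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def two_means(values, iterations=10):
    """Unsupervised learning: no labels are given; split the values into two
    clusters with 1-D k-means (k = 2), starting from the smallest and largest values."""
    c1, c2 = min(values), max(values)
    g1, g2 = list(values), []
    for _ in range(iterations):
        g1 = [v for v in values if abs(v - c1) <= abs(v - c2)]
        g2 = [v for v in values if abs(v - c1) > abs(v - c2)]
        if not g1 or not g2:
            break  # degenerate case: every value fell into one cluster
        c1, c2 = sum(g1) / len(g1), sum(g2) / len(g2)
    return sorted(g1), sorted(g2)


labelled = [(1.0, "low"), (1.2, "low"), (5.0, "high"), (5.4, "high")]
centroids = fit_centroids(labelled)
print(predict(centroids, 4.0))  # high
print(two_means([1.0, 1.2, 0.8, 5.0, 5.4, 4.6]))  # ([0.8, 1.0, 1.2], [4.6, 5.0, 5.4])
```

The supervised model needs the "low"/"high" labels to learn anything; the clustering step recovers a similar grouping but cannot name the groups, which is the essential difference between the two approaches.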
What is Natural Language Processing (NLP)?
Application of text analytics that uses insight into the structure of human language to analyse and interpret text- and voice-based data
What are traditional sources of big data?
Stock exchanges, companies and governments
What are non-traditional sources of data?
Electronic devices, social media, sensor networks and company exhaust
What is the veracity of data?
Relates to the credibility and reliability of different data sources
What is structured data?
Organised into tables and commonly stored in a database, where each field represents the same type of information
What is unstructured data?
Disparate, unorganised data that cannot be organised in tabular form
What is semi-structured data?
Has elements of both structured and unstructured data
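The three data types can be illustrated with Python's standard library. This is a minimal sketch using hypothetical sample strings: a CSV table stands in for structured data, a JSON record (keyed fields plus a free-text field, as in a social media post) for semi-structured data, and a free-text sentence for unstructured data. Voice recordings are also unstructured but cannot be shown in a short text example.

```python
import csv
import io
import json

# Structured: rows and columns; each field (column) holds the same type of information.
structured_csv = "ticker,close\nAAA,10.5\nBBB,20.0\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))
closes = [float(r["close"]) for r in rows]  # every "close" value is a price

# Semi-structured: keyed fields (user, likes) mixed with free text (text).
semi_structured = '{"user": "analyst1", "likes": 12, "text": "Strong quarter, guidance raised."}'
post = json.loads(semi_structured)

# Unstructured: free text with no predefined fields; any structure must be
# extracted first, here by a crude tokenisation into lowercase words.
unstructured = "Management said margins improved, but guidance was cautious."
words = unstructured.lower().replace(",", "").replace(".", "").split()

print(closes)
print(post["likes"], post["text"])
print(words)
```

Only the structured data can be used directly for calculation; the semi-structured record needs its keys parsed, and the unstructured text needs processing (for example with NLP) before it yields usable fields.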
What is an example of unstructured data?
Voice recordings
What is the Internet of Things (IoT)?
A network of physical devices (e.g. smart buildings, home appliances) that generate and exchange data on a wide range of activities
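What is overfitting?
When an ML model learns the training data too precisely, treating noise as true relationships; it discovers false, unsubstantiated relationships and therefore performs poorly on new data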
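A small numerical sketch of overfitting, assuming hypothetical data where the true relation is y = 2x and the training labels carry alternating noise of ±0.5. A simple least-squares line is compared with a degree-5 Lagrange polynomial that passes through every training point. The polynomial reproduces the noise exactly, so its training error is essentially zero, yet its error on unseen points (scored against the true relation) is far larger than the line's.

```python
# Hypothetical data: true relation y = 2x; training labels carry +/-0.5 noise.
x_train = [0, 1, 2, 3, 4, 5]
y_train = [2 * x + (0.5 if i % 2 == 0 else -0.5) for i, x in enumerate(x_train)]
# Unseen points, scored against the true (noise-free) relation.
x_test = [0.5, 1.5, 3.5, 4.5]
y_test = [2 * x for x in x_test]


def fit_line(xs, ys):
    """Simple model: ordinary least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x


def fit_interpolant(xs, ys):
    """Overfit model: Lagrange polynomial through every training point,
    so it memorises the noise as if it were a true relationship."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict


def mse(model, xs, ys):
    """Mean squared error of a model on (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


line = fit_line(x_train, y_train)
wiggle = fit_interpolant(x_train, y_train)

for name, model in [("line", line), ("interpolant", wiggle)]:
    print(f"{name}: train MSE={mse(model, x_train, y_train):.3f}, "
          f"test MSE={mse(model, x_test, y_test):.3f}")
```

The lower training error of the interpolant is misleading: judged on data it has not seen, the simpler model generalises better, which is why out-of-sample testing is used to detect overfitting.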