class 4 Flashcards
what is big data?
extremely large and complex data collections that traditional data management software, hardware and analysis processes are incapable of dealing with
what are the 3 characteristics of big data?
volume (amount of data)
velocity (speed at which we collect data)
variety (in what format we receive the data)
what are the 2 formats of big data?
structured
unstructured
what is an example of structured big data?
corporate databases containing customer, product and inventory data in tables
what is an example of unstructured big data?
word-processing documents, social media, email, photos, surveillance video
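The difference between the two formats can be sketched in a few lines of Python. This is only an illustration: the customer rows, the email text, and the helper names `customers_in_city` and `extract_order_numbers` are all made up for the example. Structured data can be queried by column name, while unstructured text has to be searched for patterns.

```python
import re

# Structured: every record follows the same fixed schema (hypothetical columns).
customers = [
    {"customer_id": 1, "name": "Ana", "city": "Toronto"},
    {"customer_id": 2, "name": "Ben", "city": "Ottawa"},
]

def customers_in_city(rows, city):
    """Filter rows by a known column; possible because the schema is fixed."""
    return [row["name"] for row in rows if row["city"] == city]

# Unstructured: free text with no schema (made-up email body).
email_body = "Hi, this is Ben. My order 4471 arrived damaged, please call me."

def extract_order_numbers(text):
    """Pull out anything that looks like 'order <digits>' from free text."""
    return re.findall(r"order\s+(\d+)", text, flags=re.IGNORECASE)

toronto_names = customers_in_city(customers, "Toronto")
order_numbers = extract_order_numbers(email_body)
```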
what are the 8 sources of big data?
documents (word, email, powerpoint)
data from business apps
social media (Twitter, Facebook, LinkedIn, Pinterest)
sensor data (process control devices)
media (images, audio, video, live data feeds, podcasts)
machine log data (business process logs)
public data (local, state, federal government websites)
archives (historical records)
what are the five Vs of big data?
volume (how much data is generated)
velocity (how fast data is generated)
variety (the different forms of data)
value
veracity
explain the V of big data “value”?
having access to good quality data
explain the V of big data “veracity”?
how often discrepancies are found in the data
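One possible way to put a number on veracity is to compare the same records in two systems and count how often they disagree. The sketch below assumes two hypothetical sources (`crm` and `billing`) keyed by customer id; the function name `discrepancy_rate` is an illustration, not a standard measure.

```python
def discrepancy_rate(source_a, source_b):
    """Fraction of shared keys whose values differ between two sources."""
    shared = source_a.keys() & source_b.keys()
    if not shared:
        return 0.0
    mismatches = sum(1 for key in shared if source_a[key] != source_b[key])
    return mismatches / len(shared)

# Hypothetical city-of-residence values held in two different systems.
crm = {"c1": "Toronto", "c2": "Ottawa", "c3": "Calgary", "c4": "Halifax"}
billing = {"c1": "Toronto", "c2": "Montreal", "c3": "Calgary", "c4": "Halifax"}

rate = discrepancy_rate(crm, billing)  # one mismatch out of four shared keys
```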
what is the importance of big data?
data can be fetched from any source and analyzed to solve problems, which can lead to cost reduction, time reduction, new product development and smart decision making
the combination of big data with high-powered analytics can have a great impact on business strategy
what are the 4 business strategies that big data can have a great impact on?
finding the root cause of failures in real time operations
generating coupons at the point of sale based on customers' buying habits
recalculating an entire risk portfolio
detecting fraudulent behaviour
what are 3 examples of organizations that use big data?
retail organizations (monitor social networks)
hospitals (analyze medical data and patient records to understand patients' medical histories)
advertising and marketing agencies (track comments on social media to understand consumers' responsiveness to ads)
what are the 5 challenges of big data?
how to choose what subset of the data to store
where and how to store the data
how to find the nuggets of data that are relevant
how to derive value from the relevant data
how to identify which data needs to be protected
what 3 technologies are used to process big data?
data warehouses
data marts
data lakes
what are data warehouses?
a large database that collects business information from many sources in the enterprise in support of management decision making
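The idea can be sketched at toy scale with Python's built-in `sqlite3` module. This is only an illustration: the two source lists, the `sales` table, and its columns are assumptions, and a real data warehouse would be far larger and loaded by dedicated tools. Records from two separate systems are collected into one table, then summarized for a management question.

```python
import sqlite3

# Hypothetical extracts from two operational systems: (date, product, amount).
store_sales = [("2024-01-05", "widget", 120.0), ("2024-01-06", "gadget", 80.0)]
web_sales = [("2024-01-05", "widget", 60.0)]

# One central in-memory database stands in for the warehouse.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales (sale_date TEXT, product TEXT, amount REAL, source TEXT)"
)
warehouse.executemany("INSERT INTO sales VALUES (?, ?, ?, 'store')", store_sales)
warehouse.executemany("INSERT INTO sales VALUES (?, ?, ?, 'web')", web_sales)

# A decision-support query across both sources: total revenue per product.
revenue_by_product = dict(
    warehouse.execute(
        "SELECT product, SUM(amount) FROM sales GROUP BY product ORDER BY product"
    )
)
```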