Chapter 5 Flashcards
four v’s of big data
volume, velocity, variety, and veracity
data volume
amount of data created and stored by an organization
data velocity
pace at which data is created and stored
data variety
different forms data can take
data veracity
quality or trustworthiness of data
analytics mindset is ability to
ask right questions; extract, transform, and load relevant data; apply appropriate data analytic technique; interpret and share results with stakeholders
asking right questions is the 1st step of analytics mindset: establishing objectives that are smart
specific, measurable, achievable, relevant, timely
etl process
extracting, transforming, and loading data
structured data
data that is highly organized and fits into fixed fields
unstructured data
data that has no uniform structure
semi structured data
organized in some ways but not fully organized to be inserted into a relational database
data warehouses store
structured data
data lake
collection of structured, semi structured, and unstructured data in a single location
dark data
info the organization has collected and stored that would be useful for analysis but is not analyzed and is ignored
data swamps
data repositories that arent accurately documented so the stored data cant be properly identified and analyzed
data swamps
data repositories that arent accurately documented so the stored data cant be properly identified and analyzed
flat file
text file that contains data from multiple tables or sources and merges it into a single row
delimiter
character that marks end of 1 field and beginning of tect
text qualifier
2 characters that indicate the beginning and end of a field and tell program to ignore any delimiters contained btw the characters
4 steps for transforming data
understand the data and desired outcome
standardize, structure, and clean data
validate data uality and verify data meets data requirements
document the transformation process
descriptive analytics
info that results from examination of data to understand the pasts
“what happened?”
diagnostic analytics
build on descriptive to answer “why did this happen?”
attempt to determine causal relationships
predictive analytics
answers “what might happen in the future?”
prescriptive analytics
info that provide a recommendation of what should happen
“what should be done?’