GA - The Value of Data Flashcards
What is the role of a data analyst when dealing with stored data?
> Data on its own isn’t interesting or important. A data analyst must turn it into something meaningful.
> Data analysts do this by uncovering insights concealed within data.
> They connect multiple datasets together to show how data interacts, and they generally make the world more understandable.
Why are companies storing more data?
Companies use data to answer strategic questions, make informed decisions, and drive growth.
What are the four V’s of big data?
- Volume - Scale of data
- Variety - Different forms of data
- Veracity - Trustworthiness of the data
- Velocity - Frequency of incoming data that needs to be processed
What are the six parts of the Analytics Workflow?
This workflow does not run strictly from top to bottom; you will revisit steps along the way as needed.
- Identify - Understand the question that you’re trying to answer
- Obtain - Find or collect the right data to help answer your question
- Understand - Make sure you can correctly interpret the results and trust the data
- Prepare - Make sure the data doesn’t contain incorrect or missing values
- Analyze - Uncover the answers to your questions
- Present - Determine the best way to share your results with others
What is the most tangible goal when identifying the problem? (Identifying the Problem)
The most tangible goal is to transform a “business question” into a “data question.”
What are the most common places for data storage? (Obtaining the Data)
> Flat Files (e.g., comma-delimited text files, commonly called CSV)
> Spreadsheets
> Databases
What tools know how to interact with data stored in flat files, spreadsheets, and databases? (Obtaining the Data)
> Excel (or another spreadsheet software)
> Structured Query Language (SQL)
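As a rough illustration of moving data from a flat file into a database and querying it with SQL, the sketch below uses only Python's standard library (csv and sqlite3). The CSV text, the table name orders, and the column names are hypothetical and chosen only for this example.

```python
import csv
import io
import sqlite3

# Hypothetical comma-delimited text standing in for a CSV file on disk.
CSV_TEXT = """order_id,region,amount
1,East,120.50
2,West,80.00
3,East,42.25
"""

def load_rows(csv_text):
    """Parse CSV text into (order_id, region, amount) tuples with real types."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (int(row["order_id"]), row["region"], float(row["amount"]))
        for row in reader
    ]

def load_into_database(rows):
    """Copy the parsed rows into an in-memory SQLite table named orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    return conn

rows = load_rows(CSV_TEXT)
conn = load_into_database(rows)

# A SQL query that keeps only the orders from one region.
east_orders = conn.execute(
    "SELECT order_id, amount FROM orders WHERE region = ? ORDER BY order_id",
    ("East",),
).fetchall()
print(east_orders)
```

A spreadsheet tool could filter the same rows by hand; SQL becomes more convenient once the data outgrows a single sheet.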
What are some steps we should take when understanding what data we have to work with? (Understand the Data)
> Define each column of data
> Think about potential usefulness
> Think about potential shortcomings
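One possible way to start understanding a dataset is to profile each column: how many values are filled in, how many are empty, and how many distinct values appear. The sketch below assumes rows that were read as text, with an empty string marking a blank cell; the sample rows and the function name profile_columns are hypothetical.

```python
# Hypothetical rows as they might come out of a CSV reader (all text).
rows = [
    {"order_id": "1", "region": "East", "amount": "120.50"},
    {"order_id": "2", "region": "", "amount": "80.00"},
    {"order_id": "3", "region": "East", "amount": ""},
]

def profile_columns(rows):
    """Count filled, empty, and distinct values for every column."""
    profile = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        filled = [value for value in values if value != ""]
        profile[column] = {
            "filled": len(filled),
            "empty": len(values) - len(filled),
            "distinct": len(set(filled)),
        }
    return profile

profile = profile_columns(rows)
for column, stats in profile.items():
    print(column, stats)
```

A column with many empty cells hints at a shortcoming, while a column with one distinct value per row (such as order_id here) is likely an identifier rather than something to summarize.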
What are some items we are looking for when preparing data? (Prepare the Data)
> Incorrect values
> Missing data
> Duplicate line items
Explore some of the ways that a dataset might contain bad data and some of the solutions to those problems. https://github.com/Quartz/bad-data-guide
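The three checks named on this card (incorrect values, missing data, duplicate line items) can be sketched in a few lines of Python. The sample records are hypothetical, and the rule that a negative amount counts as incorrect is an assumption made only for this example; real rules depend on the dataset.

```python
# Hypothetical records; None marks a missing value.
records = [
    {"order_id": 1, "region": "East", "amount": 120.50},
    {"order_id": 2, "region": "West", "amount": None},
    {"order_id": 3, "region": "East", "amount": -42.25},
    {"order_id": 1, "region": "East", "amount": 120.50},  # repeat of row 0
]

def find_problems(rows):
    """Return the row indexes that are missing data, incorrect, or duplicated."""
    missing, incorrect, duplicates = [], [], []
    seen = set()
    for index, row in enumerate(rows):
        # A row is a duplicate if an identical row appeared earlier.
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicates.append(index)
            continue
        seen.add(key)
        if any(value is None for value in row.values()):
            missing.append(index)
        elif row["amount"] < 0:
            # Assumed rule for this sketch: amounts should never be negative.
            incorrect.append(index)
    return {"missing": missing, "incorrect": incorrect, "duplicates": duplicates}

problems = find_problems(records)
print(problems)
```

Flagging the rows first, rather than deleting them right away, leaves room to decide whether each problem should be corrected, filled in, or dropped.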
What is aggregated data? (Analyze the data)
Aggregated data represents many data points with a single value.
> The most common aggregations are the sum, count, average (mean), minimum, and maximum
> We might summarize data in other ways, such as ranking values or showing the range of values
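The aggregations listed above can be computed with Python built-ins and the standard statistics module. The list of amounts below is hypothetical sample data.

```python
from statistics import mean

# Hypothetical raw data points.
amounts = [120.5, 80.0, 42.25, 80.0]

# Each aggregate collapses the whole list into one value.
summary = {
    "sum": sum(amounts),
    "count": len(amounts),
    "mean": mean(amounts),
    "min": min(amounts),
    "max": max(amounts),
}

# Other summaries: the range of values and a ranking from largest to smallest.
summary["range"] = summary["max"] - summary["min"]
ranked = sorted(amounts, reverse=True)

print(summary)
print(ranked)
```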
What is raw data?
Raw data refers to any data object that hasn’t undergone thorough processing, either manually or through automated computer software.