Unit 9 Flashcards
Two distinctions with data
What does the data show - fact
Why might this be the case - opinion
correlation
similarities or patterns between sets of data; a correlation alone does not show that one thing caused the other
causation
this thing caused that thing
metadata
Data about other data
Can help us answer the “why” questions (sometimes gathered automatically)
Metadata are data about data:
It can be changed without impacting the primary data
Used for finding, organizing, and managing information
Increases effective use of data by providing extra information
Allows data to be structured and organized
visualizations
Look at lots of data at once
See patterns that are “invisible” if you just look at the table
data analysis process
- collect or choose data
- clean and/or filter
- visualize and find patterns
- generate new information
bar chart
Count how many times each value in the column appears and make a bar at that height.
What value(s) are most common in this column?
What value(s) are least common in this column?
What is the unique list of values in this column?
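The counting behind a bar chart can be sketched in a few lines of Python. This is only an illustration: the `column` list is made-up data, and the text "bars" stand in for whatever a charting tool would draw.

```python
from collections import Counter

# Hypothetical column of values from a table (made-up data)
column = ["red", "blue", "red", "green", "blue", "red"]

# Count how many times each value appears: value -> count
counts = Counter(column)

highest = max(counts.values())
lowest = min(counts.values())

# Values that answer the three bar chart questions
most_common = [value for value, n in counts.items() if n == highest]
least_common = [value for value, n in counts.items() if n == lowest]
unique_values = sorted(counts)

# Text version of the bar chart: one '#' per occurrence
for value in unique_values:
    print(f"{value:<6} {'#' * counts[value]}")
```

The three lists line up with the three questions above: most common, least common, and the unique list of values.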
histogram
Similar to a bar chart, but first all numbers in a range or “bucket” are grouped together. For example, with a bucket size of 20, the numbers 41, 48, and 53 would all be placed in the same bucket between 40 and 60.
Histograms can only be created with numeric data but can be useful when a normal bar chart may be difficult to read.
What range of value(s) are most common in this column?
What range of value(s) are least common in this column?
What ranges of values do or do not appear in this column?
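A rough sketch of the bucketing step, using the bucket size of 20 from the example. The `numbers` list and the `bucket_start` helper are hypothetical, and the sketch assumes each bucket includes its lower edge but not its upper edge (so 60 would fall in the 60 to 80 bucket); a real charting tool may treat edges differently.

```python
from collections import Counter

BUCKET_SIZE = 20

# Made-up numeric column; 41, 48, and 53 come from the example above
numbers = [41, 48, 53, 12, 67, 75, 59]

def bucket_start(n, size=BUCKET_SIZE):
    """Return the lower edge of the bucket that n falls into."""
    return (n // size) * size

# Group the numbers by bucket, then count each bucket like a bar chart
buckets = Counter(bucket_start(n) for n in numbers)

for start in sorted(buckets):
    print(f"{start}-{start + BUCKET_SIZE}: {'#' * buckets[start]}")
```

After bucketing, a histogram is just a bar chart over the bucket labels instead of the raw values.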
visualization takeaways
Programs (like the Data Visualizer) can help process data so we can understand it and learn.
Charts and other visualizations can help both find and communicate what we’ve learned from data.
Bar charts and histograms are two common chart types for exploring one column of data in a table.
when does data need to be cleaned?
Data is incomplete
Data is invalid
Multiple tables are combined into one
What leads to “messy” data?
Users enter different types of data for the same value (“two”, 2)
Users use different abbreviations to represent the same information (“February”, “Feb”, “Febr”)
Data may have different spellings (“color”, “colour”) or inconsistent capitalization (“spring”, “Spring”)
cleaning data
Look through the data manually. Find and fix messy data.
Use a program to find and fix messy data.
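As an illustration of the second approach, here is a minimal sketch of a cleaning program for the kinds of messy values listed above. The lookup tables and function names are assumptions made for this example, not part of any particular tool, and real data would need larger tables.

```python
# Hypothetical lookup tables for this example only
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3}
MONTHS = {"feb": "February", "febr": "February", "february": "February"}
SPELLINGS = {"colour": "color"}

def clean_number(raw):
    """Turn "two" or 2 into the integer 2."""
    text = str(raw).strip().lower()
    if text in NUMBER_WORDS:
        return NUMBER_WORDS[text]
    return int(text)

def clean_month(raw):
    """Map known abbreviations to one spelling; leave unknown text as-is."""
    key = raw.strip().rstrip(".").lower()
    return MONTHS.get(key, raw.strip())

def clean_word(raw):
    """Lowercase a word and apply one preferred spelling."""
    word = raw.strip().lower()
    return SPELLINGS.get(word, word)

print(clean_number("two"), clean_month("Febr"), clean_word("Colour"))
```

Each function picks one standard form and converts every variation to it, so equal values actually compare as equal later.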
filtering data
Filtering data allows the user to look at a subset of the data.
In Unit 5, we filtered data programmatically using traversals to gain insight and knowledge from data.
Software programs with built-in tools (like the Data Visualizer) can also be used to filter data.
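Filtering with a traversal might look like the sketch below. It is written in Python rather than the language used in Unit 5, and the `rows` data and the `filter_rows` helper are made up for illustration.

```python
# Made-up table: each row is a dictionary of column name -> value
rows = [
    {"level": 9, "score": 73},
    {"level": 3, "score": 40},
    {"level": 9, "score": 88},
]

def filter_rows(rows, column, wanted):
    """Traverse every row and keep only those where column equals wanted."""
    subset = []
    for row in rows:
        if row[column] == wanted:
            subset.append(row)
    return subset

# Look at only the subset of rows for level 9
level_nine = filter_rows(rows, "level", 9)
print(level_nine)
```

The original list is left unchanged; the filter builds a new, smaller list to look at.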
data stored in text files
old school PC games
.csv Comma Separated Values
date, level, score
01/11/2019, 9, 73
Common file format
Requires spreadsheet programs or specific programs to iterate through
Easy to mess up a file
No standard way to create a file
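One way a program could iterate through CSV text, sketched with Python's standard `csv` module. The sample text is an assumption: a header line plus one comma-separated row shaped like the date, level, score example, read from a string so the sketch runs without a real file.

```python
import csv
import io

# Sample CSV text (assumed for illustration); a real program would open a file
text = "date, level, score\n01/11/2019, 9, 73\n"

# skipinitialspace=True drops the space that follows each comma
reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)

records = []
for row in reader:
    # Every field is read as text, so convert the numeric columns
    records.append({
        "date": row["date"],
        "level": int(row["level"]),
        "score": int(row["score"]),
    })

print(records)
```

A single wrong separator in a row would make `int(...)` fail or shift values into the wrong columns, which is one reason these files are easy to mess up.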