Unit 9 Flashcards

Question 1

Q

Two distinctions with data

Answer

A

What does the data show? (fact)
Why might this be the case? (opinion)

Question 2

Q

correlation

Answer

A

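Similarities or patterns in data: two things tend to change together. Correlation alone does not show that one caused the other.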

Question 3

Q

causation

Answer

A

One thing directly caused another (this thing caused that thing)

Question 4

Q

metadata

Answer

A

Data about other data
Can help us uncover the "why" questions (sometimes gathered automatically)

Question 5

Q

Metadata are data about data:

Answer

A

It can be changed without impacting the primary data
Used for finding, organizing, and managing information
Increases effective use of data by providing extra information
Allows data to be structured and organized

Question 6

Q

visualizations

Answer

A

Look at lots of data at once
See patterns that are “invisible” if you just look at the table

Question 7

Q

data analysis process

Answer

A

collect or choose data
clean and/or filter
visualize and find patterns
generate new information

Question 8

Q

bar chart

Answer

A

Count how many times each value in the column appears and make a bar at that height.
What value(s) are most common in this column?
What value(s) are least common in this column?
What is the unique list of values in this column?
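
The counting step behind a bar chart can be sketched in Python. This is only an illustration: the `column` values are made up, and `Counter` stands in for whatever a tool like the Data Visualizer does internally.

```python
from collections import Counter

# Hypothetical column of values from a data table (made up for illustration)
column = ["dog", "cat", "dog", "fish", "dog", "cat"]

# Count how many times each value appears; each count is one bar's height
counts = Counter(column)

# Most common value(s)
most_common_value, most_common_count = counts.most_common(1)[0]

# Least common value(s): every value that shares the smallest count
least_common_count = min(counts.values())
least_common_values = [v for v, n in counts.items() if n == least_common_count]

# Unique list of values in the column
unique_values = sorted(counts)

# Draw a rough text version of the bar chart
for value in unique_values:
    print(f"{value:5} {'#' * counts[value]}")
```

The three questions on the card map directly onto `most_common_value`, `least_common_values`, and `unique_values`.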

Question 9

Q

histogram

Answer

A

Similar to a bar chart, but first all numbers in a range or “bucket” are grouped together. For example, with a bucket size of 20, the numbers 41, 48, and 53 would all be placed in the same bucket between 40 and 60.

Histograms can only be created with numeric data but can be useful when a normal bar chart may be difficult to read.
What range of value(s) are most common in this column?
What range of value(s) are least common in this column?
What ranges of values do or do not appear in this column?
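
A minimal Python sketch of the bucketing idea, assuming a bucket size of 20 as in the example above. The helper name `bucket_start` is hypothetical, and every value other than 41, 48, and 53 is made up.

```python
from collections import Counter

BUCKET_SIZE = 20  # bucket size used in the card's example


def bucket_start(value, size=BUCKET_SIZE):
    """Return the lower edge of the bucket that contains value."""
    return (value // size) * size


# 41, 48, and 53 come from the card; the other numbers are made up
values = [41, 48, 53, 12, 75, 61]

# Group the numbers into buckets first, then count each bucket like a bar chart
histogram = Counter(bucket_start(v) for v in values)

# Draw a rough text version of the histogram
for start in sorted(histogram):
    print(f"{start}-{start + BUCKET_SIZE}: {'#' * histogram[start]}")
```

Buckets that never appear as keys in `histogram` are the ranges of values that do not appear in the column.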

Question 10

Q

visualization takeaways

Answer

A

Programs (like the Data Visualizer) can help process data so we can understand it and learn.

Charts and other visualizations can help both find and communicate what we’ve learned from data.

Bar charts and histograms are two common chart types for exploring one column of data in a table.

Question 11

Q

when does data need to be cleaned?

Answer

A

Data is incomplete
Data is invalid
Multiple tables are combined into one

Question 12

Q

What leads to “messy” data?

Answer

A

Users enter in different types of data (“two”, 2)
Users use different abbreviations to represent the same information (“February”, “Feb”, “Febr”)
Data may have different spellings (“color”, “colour”) or inconsistent capitalization (“spring”, “Spring”)

Question 13

Q

cleaning data

Answer

A

Look through the data manually. Find and fix messy data.
Use a program to find and fix messy data.
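
As a rough sketch of the second approach, a short Python program can normalize messy entries like the ones listed on the previous card. The lookup tables and function names here are hypothetical; a real dataset would need its own rules.

```python
# Hypothetical lookup tables built from the messy examples on the previous card
MONTHS = {"feb": "February", "febr": "February", "february": "February"}
NUMBER_WORDS = {"two": 2}


def clean_month(raw):
    """Map known abbreviations to one spelling; leave unknown text unchanged."""
    key = raw.strip().rstrip(".").lower()
    return MONTHS.get(key, raw.strip())


def clean_number(raw):
    """Turn "two" or "2" or 2 into the integer 2 (raises ValueError if unknown)."""
    text = str(raw).strip().lower()
    if text in NUMBER_WORDS:
        return NUMBER_WORDS[text]
    return int(text)


def clean_season(raw):
    """Fix inconsistent capitalization: "spring" and "Spring" both become "Spring"."""
    return raw.strip().capitalize()


print(clean_month("Febr"), clean_number("two"), clean_season("spring"))
```

Anything the rules do not recognize is either left alone or reported as an error, so it can still be checked manually.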

Question 14

Q

filtering data

Answer

A

Filtering data allows the user to look at a subset of the data.
In Unit 5, we filtered data programmatically using traversals to gain insight and knowledge from data.
Software programs with built-in tools (like the Data Visualizer) can also be used to filter data.
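
Filtering with a traversal might look like the following Python sketch. The table, the column names, and the `filter_rows` helper are all made up for illustration and are not the Unit 5 code.

```python
# Hypothetical table: each row is a dictionary of column name -> value
rows = [
    {"name": "Ada", "score": 73},
    {"name": "Ben", "score": 41},
    {"name": "Cy", "score": 88},
]


def filter_rows(table, min_score):
    """Traverse the table and keep only rows whose score is at least min_score."""
    kept = []
    for row in table:
        if row["score"] >= min_score:
            kept.append(row)
    return kept


# The filtered list is a subset; the original table is left unchanged
high_scores = filter_rows(rows, 70)
print([row["name"] for row in high_scores])
```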

Question 15

Q

data stored in text files

Answer

A

Used by old-school PC games
.csv: Comma-Separated Values
date, level, score
01/11/2019, 9, 73
Common file format
Require spreadsheet programs or specific programs to iterate through

Easy to mess up a file
No standard way to create a file
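
A program can iterate through CSV text with Python's built-in `csv` module. This is a sketch under assumptions: the text uses the date, level, score layout from the card, the second data row is made up, and the data is read from a string rather than a real file.

```python
import csv
import io

# Sample text in the date, level, score layout; the second data row is made up
text = "date, level, score\n01/11/2019, 9, 73\n01/12/2019, 10, 80\n"

# skipinitialspace=True ignores the space that follows each comma
reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)

records = []
for row in reader:
    # Every field arrives as text, so numbers must be converted explicitly
    records.append(
        {"date": row["date"], "level": int(row["level"]), "score": int(row["score"])}
    )

best = max(records, key=lambda r: r["score"])
print(best)
```

The explicit `int(...)` conversions show why text files are easy to mess up: one stray character in a number field makes the conversion fail.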

Question 16

Q

data storage through spreadsheets

Answer

A

Designed for people to analyze data, not for programs

Question 17

Q

data storage through databases

Answer

A

Preferred method of storing data that will be used in programs

Programmers use SQL (Structured Query Language) to interact with databases.

To be a data scientist, you often need to learn programming languages like Python or R to analyze and visualize data.

You also need to learn SQL to be able to interact with databases.
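
To give a feel for how a program talks to a database with SQL, here is a small sketch using Python's built-in `sqlite3` module. The `scores` table, its columns, and its rows are made up for illustration, and the database lives only in memory.

```python
import sqlite3

# An in-memory database, so nothing is written to disk
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, made up for illustration
conn.execute("CREATE TABLE scores (date TEXT, level INTEGER, score INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [("01/11/2019", 9, 73), ("01/12/2019", 10, 80), ("01/13/2019", 10, 65)],
)

# SQL query: the highest score reached on each level
rows = conn.execute(
    "SELECT level, MAX(score) FROM scores GROUP BY level ORDER BY level"
).fetchall()
conn.close()

print(rows)
```

The program only states what it wants (`SELECT ... GROUP BY ...`); the database does the traversal.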

Question 18

Q

scatter plot

Answer

A

Shows combinations of values from two columns

Useful for:
Seeing patterns and trends between two values
Numeric data with lots of different values

Not useful:
Lots of repeated values

Question 19

Q

crosstab chart

Answer

A

Counts how often pairs of values in two columns appear. Each row in the data table is counted in the cell of the chart that matches its combination of values.

Useful for:
Finding the most / least common combinations of values in two columns
Finding patterns across two columns
Exploring two columns when one or both are strings.

Not useful:
If either column has too many values (the chart would be enormous)
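
The pair counting behind a crosstab chart can be sketched in Python. The two columns (pet type and size) and all of the rows are made up for illustration.

```python
from collections import Counter

# Hypothetical rows: (pet, size) pairs taken from two columns of a table
rows = [
    ("cat", "small"),
    ("dog", "large"),
    ("dog", "small"),
    ("dog", "large"),
    ("cat", "small"),
    ("dog", "large"),
]

# Each row adds 1 to the cell for its combination of values
crosstab = Counter(rows)
most_common_pair, most_common_count = crosstab.most_common(1)[0]

# Print the grid: one row per pet, one column per size
pets = sorted({pet for pet, _ in rows})
sizes = sorted({size for _, size in rows})
print(" " * 6 + "".join(f"{size:>7}" for size in sizes))
for pet in pets:
    print(f"{pet:<6}" + "".join(f"{crosstab[(pet, size)]:>7}" for size in sizes))
```

The grid has one cell per combination, which is why the chart becomes enormous when either column has many distinct values.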

Question 20

Q

when to use what graph

Answer

A

study slide 17 in 9.4

Question 21

Q

big data

Answer

A

“Collect huge amounts of data so we can learn even more from it”
The size of the datasets we analyze affects how much information can be extracted.
As a result, in business, science, and many other contexts, people are working with increasingly big data sets.
When data gets too big, it can no longer be processed on one computer. Cloud computing or parallel systems are sometimes used to help process all that information.
In general, the scalability of your system is important to consider when working with big data. You want your system to keep working even as you use more and more data.

Question 22

Q

citizen science and crowdsourcing

Answer

A

“collecting data from others so you can analyze it”
Crowdsourcing is the practice of obtaining input or information from a large number of people via the Internet.
Citizen science is research where some of the data collection is done by members of the public using their own computing devices, which helps solve scientific problems.
Crowdsourcing offers new models for collaboration, such as connecting businesses or social causes with funding
Both are examples of how human capabilities can be enhanced by collaboration via computing

Question 23

Q

open data

Answer

A

“sharing data with others so they can analyze it”
Open data is publicly available data shared by governments, organizations, and others.
Making data open helps spread useful knowledge and creates opportunities for others to use it to solve problems.

(23 cards)