Unit 5 Comp Sci Flashcards
What is Citizen Science?
Scientific research conducted in whole or part by distributed individuals, many of whom may not be scientists, who contribute relevant data to research using their own computing devices.
What is Cleaning Data?
A process that makes the data uniform without changing its meaning (e.g., replacing all equivalent abbreviations, spellings, and capitalizations with the same word).
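As a rough illustration of cleaning, the sketch below maps equivalent abbreviations, spellings, and capitalizations to one canonical word. The `CANONICAL` table, the `clean_value` helper, and the sample column are all hypothetical, made up for this example.

```python
# Hypothetical lookup table: every known variant maps to one canonical spelling.
CANONICAL = {
    "ca": "California",
    "calif.": "California",
    "california": "California",
    "tx": "Texas",
    "texas": "Texas",
}

def clean_value(raw):
    """Return the canonical form of raw, or the trimmed input if it is unknown."""
    key = raw.strip().lower()  # ignore surrounding spaces and capitalization
    return CANONICAL.get(key, raw.strip())

# Made-up column with inconsistent entries that all mean the same two states.
raw_column = ["CA", " california", "Calif.", "Texas", "tx"]
cleaned = [clean_value(value) for value in raw_column]
print(cleaned)
```

The meaning of each entry is unchanged; only its form is made uniform, so later counts treat "CA" and "Calif." as the same value.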
What is Correlation?
A relationship between two pieces of data, typically referring to the amount that one varies in relation to the other.
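One common way to put a number on a correlation is the Pearson correlation coefficient, which ranges from -1 to 1. The flashcards do not name a specific measure, so treat this as one possible sketch; the `pearson` function and the study-hours data are assumptions made for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much x and y move together, and how much each varies on its own.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up data: hours studied and test score for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 71, 80, 88]
r = pearson(hours, scores)
print(round(r, 3))
```

A value near 1 means the two columns rise together, a value near -1 means one falls as the other rises, and a value near 0 means little linear relationship. A correlation alone does not show that one thing causes the other.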
What is Crowdsourcing?
The practice of obtaining input or information from a large number of people via the Internet.
What is Information?
The collection of facts and patterns extracted from data.
What is Data bias?
Data that does not accurately reflect the full population or phenomenon being studied.
What is Data filtering?
Choosing a smaller subset of a data set to use for analysis, for example by eliminating or keeping only certain rows in a table.
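Filtering can be sketched as keeping only the rows that pass a test. The table, the column names, and the `filter_rows` helper below are hypothetical.

```python
# Hypothetical table: each dict is one row.
students = [
    {"name": "Ava", "grade": 11, "score": 88},
    {"name": "Ben", "grade": 12, "score": 92},
    {"name": "Cal", "grade": 12, "score": 75},
    {"name": "Dee", "grade": 10, "score": 81},
]

def filter_rows(rows, keep):
    """Return only the rows for which keep(row) is true."""
    return [row for row in rows if keep(row)]

# Keep only 12th graders; the original table is left unchanged.
seniors = filter_rows(students, lambda row: row["grade"] == 12)
senior_names = [row["name"] for row in seniors]
print(senior_names)
```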
What is a Bar Chart?
Graph of bars that shows the number of times each value in a column of data appears.
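The counts behind a bar chart can be computed with `collections.Counter` from the Python standard library. The column of favorite colors and the `text_bar_chart` helper are made up for this sketch, which draws the bars as text rather than as a real graph.

```python
from collections import Counter

# Made-up column of survey answers.
favorite_colors = ["red", "blue", "red", "green", "blue", "red"]
counts = Counter(favorite_colors)  # number of times each value appears

def text_bar_chart(counts):
    """Return one line per distinct value, with one '#' per occurrence."""
    lines = []
    for value, count in sorted(counts.items()):
        lines.append(f"{value:<6} {'#' * count}")
    return lines

for line in text_bar_chart(counts):
    print(line)
```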
What is a Histogram?
Similar to a bar chart, but all numbers within a range (bucket) are grouped together.
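The grouping step of a histogram can be sketched by rounding each number down to the start of its bucket and then counting. The scores, the bucket width of 10, and the `bucket_counts` helper are assumptions chosen for illustration.

```python
from collections import Counter

def bucket_counts(values, width):
    """Count how many values fall in each bucket [start, start + width)."""
    return Counter((v // width) * width for v in values)

# Made-up whole-number test scores.
scores = [55, 62, 68, 71, 75, 79, 83, 88, 91, 97]
width = 10
histogram = bucket_counts(scores, width)

# Labels such as 50-59 assume whole-number values.
for start in sorted(histogram):
    print(f"{start}-{start + width - 1}: {'#' * histogram[start]}")
```

Unlike the bar chart, where every distinct value gets its own bar, here 71, 75, and 79 all land in the same 70-79 bucket.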
What is a Crosstab Chart?
Counts the number of times combinations of values appear (similar to a frequency table).
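A crosstab can be sketched by counting pairs of values instead of single values. The survey rows below (grade level and how the student gets to school) are hypothetical.

```python
from collections import Counter

# Hypothetical survey rows: (grade level, how the student gets to school).
responses = [
    ("9th", "bus"), ("9th", "walk"), ("10th", "bus"),
    ("9th", "bus"), ("10th", "car"), ("10th", "bus"),
]
crosstab = Counter(responses)  # count of each (grade, mode) combination

grades = sorted({grade for grade, _ in responses})
modes = sorted({mode for _, mode in responses})

# Print a small table: one row per grade, one column per mode.
print("grade", *modes)
for grade in grades:
    print(grade, *(crosstab[(grade, mode)] for mode in modes))
```

Combinations that never occur, such as 9th graders arriving by car in this sample, show up as 0.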
What is a Scatterplot?
Graph that shows the relationship between two sets of data.
What is Open Data?
Publicly available data shared by governments, organizations, and others so that anyone can analyze it.
What is Big Data?
The collection of huge amounts of data so that we can learn from it, often requiring cloud computing or parallel processing systems.
What is Metadata?
Data about data.
What is the primary purpose of cleaning data in the Data Analysis Process?
To make the data uniform without changing its meaning.
Which visualization would be most appropriate for examining the relationship between students’ study hours and their test scores?
Scatterplot.
What is ‘data bias’ as defined in the study guide?
Data that does not accurately reflect the full population or phenomenon being studied.
Which of the following best describes the relationship between data and information?
Information is the collection of facts and patterns extracted from data.
When working with Big Data, which computing approach is typically necessary?
Cloud computing or parallel processing systems.
What is the key difference between a bar chart and a histogram?
Bar charts show discrete values while histograms group numbers within ranges (buckets).
Which of the following is an example of metadata?
The timestamp of when a photo was taken.
What computing practice involves obtaining input or information from a large number of people via the Internet?
Crowdsourcing.
According to the AP standards covered in Unit 5, what can programs be used for in relation to data?
To process data, allowing users to discover information and create new knowledge.
Which type of data analysis focuses on examining how frequently each value appears in a single column?
One-column analysis.