Unit 5: Data Flashcards

1
Q

What is data?
I. Computer readable information
II. Information collected about the physical world
III. Programs that process images
IV. Graphs and charts
Source: CodeHS Data
A) II, IV
B) I, II
C) I, II, IV
D) I, II, III, IV

A

B
Notes: Right! Data is just information that is collected. Digital data must be in a computer readable form, like digital images.
Answer: B) I, II
Explanation:
Data is information that can be collected, stored, and analyzed.
I. Computer-readable information: True, as digital data must be encoded in a form that computers can process.
II. Information collected about the physical world: True, as data often comes from observations or measurements of real-world phenomena.
III. Programs that process images: False, these are not data but tools to analyze data.
IV. Graphs and charts: False, these are visual representations of data, not the data itself.

2
Q

Which of the following statements is NOT a benefit of using computers to process data?
Source: CodeHS Data
A) People can use computers to find patterns in data and make predictions.
B) Computers help people visualize data so that it is easy to extract useful information.
C) Websites can spy on people and gather large amounts of personal data without the user knowing.
D) Computers are able to easily process, manipulate, and display large amounts of data in a short amount of time.

A

C
Notes: This is not a benefit, since the collected information can be misused.
Answer: C) Websites can spy on people and gather large amounts of personal data without the user knowing.
Explanation:
While this is a factual statement, it is not a benefit of using computers for data processing.
The other options highlight positive uses of data processing, such as identifying patterns, creating visualizations, and handling large datasets efficiently.

3
Q

Which of the following statements is an example of computer readable data?
Source: CodeHS Data
A) A handwritten note
B) Brain waves
C) A physical photograph
D) A digital spreadsheet filled with measurements about the air quality of different major cities

A

D
Answer: D) A digital spreadsheet filled with measurements about the air quality of different major cities.
Explanation:
A handwritten note and a physical photograph are analog forms of data and not computer-readable.
Brain waves require sensors and specialized software for conversion into computer-readable format.
A digital spreadsheet is already formatted for computer processing.

4
Q

Which of the following statements is true about data visualizations?
Source: CodeHS Data
A) Visualizing data has only been possible since computers have become widespread.
B) Visualizations take many forms, from tables to charts to images.
C) There is always one exact visualization that should be used to show a particular aspect of a dataset.
D) The only way to extract information from data is by using a visualization.

A

B
Answer: B) Visualizations take many forms, from tables to charts to images.
Explanation:
Data visualizations are versatile and can represent information in various formats to highlight patterns and relationships.
The other options incorrectly limit the utility or types of visualizations.

5
Q

Suppose you want to make a visualization that shows how many students bought certain quantities of candy from the vending machine during the month of September. For example, this visualization should reveal the frequency of students who bought 3 candy bars versus the frequency of students who bought 10 candy bars. Of the choices below, which chart would best convey this information to the person looking at the graph?
Source: CodeHS Data
A) Pie chart
B) Histogram
C) A map where the colors represent the number of candy bars bought
D) Line chart

A

B
Answer: B) Histogram
Explanation:
A histogram is specifically designed to display the distribution and frequency of numerical data, making it the ideal choice for this scenario.
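
As a rough sketch of what a histogram summarizes, the snippet below counts how many students bought each quantity of candy and prints a text-based bar for each count. The purchase numbers are invented for illustration and are not part of the original question.

```python
from collections import Counter

# Hypothetical data: number of candy bars each student bought in September.
candy_per_student = [3, 10, 3, 5, 3, 10, 1, 5, 3]

# Count how many students bought each quantity (the heights of the bars).
frequency = Counter(candy_per_student)

# Print a simple text histogram, one row per quantity.
for quantity in sorted(frequency):
    print(f"{quantity:>2} bars: {'#' * frequency[quantity]}")
```

Each row plays the role of one histogram bar: the quantity is on one axis and the number of students on the other.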

6
Q

There are several different kinds of charts that we commonly use to visualize data. For which of the following tasks would one of these charts be helpful?
Source: CodeHS Data
A) You want to track the number of times you say “hello” today.
B) Your school wants to track how many people attend the football games over time throughout the school year.
C) You are trying to figure out what happens when different colors are mixed.
D) You need instructions on how to bake a cake.

A

B
Answer: B) Your school wants to track how many people attend the football games over time throughout the school year.
Explanation:
Line charts are commonly used to track changes over time, such as attendance trends across a year.

7
Q

A natural science museum opened a new display that lets the visitors view animations of a coral reef. The animations show how the health of the coral reef varies based on water temperature, pollution levels, and the number of fish living around the reef. The visitors are able to choose a numerical value for each of the conditions. The exhibit’s animations are determined by using a database to look up how healthy the coral reef is at the particular settings the visitor chooses and displaying a corresponding picture.
What is the biggest advantage of using an interactive exhibit like this instead of showing a poster with the same information?
Source: CodeHS Data
A) The visitors will be more entertained by the exhibit, but won’t learn any more than they would have from just looking at a poster.
B) By allowing the visitors to interact with the exhibit, the visitors will be able to understand coral reefs better.
C) The interactive display will be more visually appealing than a static poster.
D) Scientists will be able to learn more about the coral reefs by tracking the visitors’ interactions.

A

B
Answer: B) By allowing the visitors to interact with the exhibit, the visitors will be able to understand coral reefs better.
Explanation:
Interactive exhibits engage visitors by allowing them to manipulate variables, enhancing understanding of relationships between factors like pollution levels and coral health.

8
Q

Polly and Sergei are working on a project to explain how the rise in oil prices is leading to a rise in lunch prices at their school.
Polly wants to have a chart showing the oil prices every day over the past 18 months and a different chart showing the lunch prices every day over the past 18 months. Sergei argues that having two separate charts won’t show the relationship between oil prices and lunch prices. He also thinks that the charts are showing too many data points. Instead, he wants to use a program to make a chart that shows both the oil prices and the lunch prices on the same chart. In addition, rather than plotting the prices for every day, he only wants to chart the average monthly prices for oil and lunch.
Why would Sergei’s approach make it easier for other people to analyze the data than Polly’s approach?
Source: CodeHS Data
A) Sergei’s chart would be much smaller than Polly’s chart, so people wouldn’t be overloaded with visual information.
B) It is always better to put all of the data you want to analyze on the same chart.
C) Polly’s presentation is more likely to be misunderstood because it uses two charts.
D) By transforming and summarizing the available data, Sergei’s chart would be more effective in showing any trends that may have occurred.

A

D
Answer: D) By transforming and summarizing the available data, Sergei’s chart would be more effective in showing any trends that may have occurred.
Explanation:
Combining datasets (oil prices and lunch prices) into one chart and summarizing data (using averages) simplifies analysis and makes trends easier to identify.
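
The summarizing step Sergei proposes can be sketched as follows. This is only an illustration: the dates and prices are made up, and `monthly_average` is a hypothetical helper name, not something from the original question.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical daily oil prices keyed by ISO date (YYYY-MM-DD).
daily_oil_prices = {
    "2024-01-02": 70.0,
    "2024-01-03": 72.0,
    "2024-02-01": 80.0,
    "2024-02-02": 84.0,
}

def monthly_average(daily_prices):
    """Group daily prices by month (YYYY-MM) and average each group."""
    by_month = defaultdict(list)
    for date, price in daily_prices.items():
        by_month[date[:7]].append(price)
    return {month: mean(prices) for month, prices in sorted(by_month.items())}

monthly_oil = monthly_average(daily_oil_prices)
print(monthly_oil)
```

Applying the same function to the lunch prices would give two short monthly series that can be plotted on one chart.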

9
Q

Which of the following statements are true about using visualizations to display a dataset?
Source: CodeHS Data
I. Visualizations are visually appealing, but don’t help the viewer understand relationships that exist in the data
II. Visualizations like graphs, charts, or visualizations with pictures are useful for conveying information, while tables just filled with text are not useful.
III. Patterns that exist in the data can be found more easily by using a visualization
A) I and II
B) II and III
C) III only
D) I, II, and III

A

C
Answer: C) III only
Explanation:
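Patterns in data are easier to identify through visualizations like charts and graphs, so III is true. Statement I is false because visualizations do help viewers understand relationships in the data, and statement II is false because tables can also convey information usefully.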

10
Q

Which of the following are ways that data is collected about you?
I - Websites store data that tracks how you use the website
II - Websites store cookies in your browser so that the next time you visit the website things like your profile login and recent activity are saved
III - Some apps store geolocation information from your phone to track your location
IV - Transaction data is stored by credit card companies when you purchase things with a credit card
Source: CodeHS Data
A) I only
B) II only
C) I, II, III, and IV
D) I, II, and IV

A

C
Answer: C) I, II, III, and IV
Explanation:
Websites track user behavior (I), store cookies (II), and some apps track geolocation (III).
Credit card companies collect transaction data (IV).

11
Q

Suppose a student named Marcus wants to learn about the sleeping habits of students in his class. Marcus wants to collect data from his classmates to learn how many hours of sleep his classmates get. He then wants to process this data with a computer and visualize it in a Histogram.
Which of the following would be the best technique for Marcus to collect this data?
Source: CodeHS Data
A) Marcus should ask each of his classmates to write down on a piece of paper how many hours of sleep they get per night and hand the paper to him.
B) Marcus should have them download an app that tracks their phone geolocation and activity so he can see when their phones are in their rooms and not being used. From this data he can figure out how long each student sleeps.
C) Follow the link to view picture: https://drive.google.com/file/d/1hTjFSfPs1Gzi1bGNOxXrlK59n3GrLZa6/view?usp=drive_link
D) Follow the link to view picture: https://drive.google.com/file/d/1Jp3GaS02cKE_z2b9lHIgYXpj_DjoG8Ho/view?usp=drive_link

A

C
Correct! The simplest and most effective way to collect data is with an online survey. It is better to ask for numeric data rather than text data so that he can visualize the numbers later with a Histogram.
Answer: C) Follow the link to view picture.
Explanation:
According to the note above, the linked option shows an online survey that asks for numeric answers. A survey is a simple and efficient way to collect the data, and numeric responses can be visualized directly in a histogram.

12
Q

News reporting agencies often want to find the public’s opinion on current events. One particular agency is considering two different strategies to collect this data by collecting responses to online surveys. The two strategies are outlined below.
Strategy One
1. Uses a database to store all of the survey responses
2. Stores some data as text and some data as numbers
3. Will track extra information about the survey taker that won’t be publicly visible
Strategy Two
1. Uses a single spreadsheet to store all of the survey responses
2. Stores all data as numbers
3. Will not track any information other than the survey responses
Which of the following statements is the most accurate comparison of these strategies?
Source: CodeHS Data
A) Strategy One will make it easier to sort and filter the data, while Strategy Two will make it easier to graph the data
B) Strategy One will cause problems because of the mixed data types, while Strategy Two will make it very easy to find specific data.
C) Strategy One will allow the agency to conclude more about the public’s opinion because it tracks extra metadata, while Strategy Two will make it hard to find trends and access particular pieces of the data.
D) Strategy One will require less cleaning and manipulation of the data, while Strategy Two will require a significant amount of extra computation to use the data.

A

C
Answer: C) Strategy One will allow the agency to conclude more about the public’s opinion because it tracks extra metadata, while Strategy Two will make it hard to find trends and access particular pieces of the data.
Explanation:
Strategy One’s use of mixed data types and metadata enhances analytical capabilities, while Strategy Two’s limited structure reduces versatility.

13
Q

Which of the following statements describes how mobile devices, the use of computers in more and more everyday interactions, and the ability to connect with other devices almost anywhere are changing society?
I. People are able to use mobile devices for new applications such as finding directions or finding restaurants
II. Data can be collected from thousands of sources and can be combined to provide new services to individuals and companies
III. Buildings, cars, classrooms, and offices can now be engineered with sensors to automate tasks like adjusting the thermostat or even driving
IV. Data that is collected can be used to identify social problems
Source: CodeHS Data
A) II, IV
B) III
C) I, III
D) I, II, III, IV

A

D
Answer: D) I, II, III, IV
Explanation:
All listed options describe societal shifts: new mobile applications (I), combining data from many sources to provide services (II), using sensors to automate tasks (III), and using collected data to identify social problems (IV).

14
Q

Shown here is the Google search trend data for the search term “flu vaccine”:
https://drive.google.com/file/d/1VA121rA6gosK9Z_O0-Uir6KZgsXGFsL2/view?usp=sharing
What can we reasonably conclude from this data visualization?
Source: CodeHS Data
A) Exactly 100 people had a flu vaccine in October 2009
B) The highest interest in flu vaccines occurs in October each year, and will likely continue to occur in October in future years
C) The highest number of flu infections happened in 2009
D) Taking vitamin B-12 can help reduce your risk of catching the flu

A

B
Answer: B) The highest interest in flu vaccines occurs in October each year, and will likely continue to occur in October in future years.
Explanation:
The data visualization shows peaks in interest during October, a recurring seasonal trend likely tied to flu season.

15
Q

Shown here is a line graph showing the stock prices for Twitter, Inc.
https://drive.google.com/file/d/1UbY85DoSnrGfqjZCvRQYt-xZNrV75nkI/view?usp=drive_link
What is misleading about this visualization?
Source: CodeHS Data
A) The y-axis is upside down, so larger values are at the bottom and smaller values are at the top.
B) The y-axis is truncated, making the graph seem like it is increasing a lot more than it actually is.
C) The graph is omitting data.
D) The graph makes it seem like the increase in stock prices is caused by the month that they are sold/bought.

A

B
Answer: B) The y-axis is truncated, making the graph seem like it is increasing a lot more than it actually is.
Explanation:
Truncated axes exaggerate visual trends, misleading viewers about the true scale of changes.
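
A small calculation can make the exaggeration concrete. The sketch below compares how tall the end point looks relative to the start point when the axis starts at zero versus when it is truncated. The prices and axis minimum are hypothetical and are not read from the linked graph, and `apparent_growth` is an illustrative helper name.

```python
def apparent_growth(start, end, axis_min):
    """Ratio of plotted heights above the axis baseline."""
    return (end - axis_min) / (start - axis_min)

# Hypothetical prices: a real increase of 10 percent.
start_price, end_price = 30.0, 33.0

# With the axis starting at zero, the end point looks 1.1 times as tall.
honest = apparent_growth(start_price, end_price, axis_min=0.0)

# With the axis truncated at 29, the end point looks 4 times as tall.
truncated = apparent_growth(start_price, end_price, axis_min=29.0)

print(honest, truncated)
```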

16
Q

A digital photo file contains data representing the level of red, green, and blue for each pixel in the photo. The file also contains metadata that describes the date and geographic location where the photo was taken. For which of the following goals would analyzing the metadata be more appropriate than analyzing the data?
Source: AP Classroom Publicly Available Questions
(A) Determining the likelihood that the photo is a picture of the sky
(B) Determining the likelihood that the photo was taken at a particular public event
(C) Determining the number of people that appear in the photo
(D) Determining the usability of the photo for projection onto a particular color background

A

B
Answer: B) Determining the likelihood that the photo was taken at a particular public event.
Explanation:
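Metadata like date and geographic location is ideal for identifying when and where a photo was taken, which is what matters for matching it to an event. The pixel data describes the image content, which is what the other three goals require.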

17
Q

Biologists often attach tracking collars to wild animals. For each animal, the following geolocation data is collected at frequent intervals.
* The time
* The date
* The location of the animal
Which of the following questions about a particular animal could NOT be answered using only the data collected from the tracking collars?
Source: AP Classroom Publicly Available Questions
(A) Approximately how many miles did the animal travel in one week?
(B) Does the animal travel in groups with other tracked animals?
(C) Do the movement patterns of the animal vary according to the weather?
(D) In what geographic locations does the animal typically travel?

A

C
Answer Choices:
(A) Approximate distance traveled in a week: Can be answered. Geolocation data provides timestamped coordinates, from which the distance traveled can be estimated.
(B) Group travel patterns: Can be inferred. By comparing the locations of multiple animals over time, one can see if their paths overlap.
(C) Movement patterns varying with weather: Cannot be directly answered. Geolocation data alone does not include weather conditions; external weather data would be needed to analyze this.
(D) Typical geographic locations: Can be answered. Frequent locations can be identified using geolocation data.
Why C is correct: Weather data is not part of the information collected by the tracking collars. Without weather conditions recorded, it is impossible to determine correlations between movement and weather patterns.
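
To illustrate why choice (A) is answerable from collar data alone, here is a minimal sketch that adds up the distances between consecutive location fixes using the haversine formula. The coordinates are invented, and the approach assumes straight-line travel between fixes, so it gives only an approximation.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

# Hypothetical collar fixes as (latitude, longitude), in time order.
fixes = [(44.60, -110.50), (44.65, -110.45), (44.70, -110.50)]

# Sum the distance of each leg between consecutive fixes.
total_miles = sum(haversine_miles(*a, *b) for a, b in zip(fixes, fixes[1:]))
print(round(total_miles, 1))
```

Nothing in this calculation involves weather, which is why choice (C) needs an outside data source.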

18
Q

A certain social media Web site allows users to post messages and to comment on other messages that have been posted. When a user posts a message, the message itself is considered data. In addition to the data, the site stores the following metadata.
* The time the message was posted
* The name of the user who posted the message
* The names of any users who comment on the message and the times the comments were made
For which of the following goals would it be more useful to analyze the data instead of the metadata?
Source: AP Classroom Publicly Available Questions
(A) To determine the users who post messages most frequently
(B) To determine the time of day that the site is most active
(C) To determine the topics that many users are posting about
(D) To determine which posts from a particular user have received the greatest number of comments

A

C
Answer Choices:
(A) Users who post frequently: Can be determined using metadata. Metadata tracks who posted and when.
(B) Site activity times: Can be determined using metadata. Post times indicate activity.
(C) Topics users are posting about: Requires content (data). Topics are in the message text and are not captured in metadata.
(D) Posts with the most comments: Can be determined using metadata. The metadata lists who commented on each message and when, so comments can be counted per post.
Why C is correct: The actual content (data) of the posts must be analyzed to determine topics. Metadata only tells you “who,” “when,” and “how many,” not “what.”
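
The distinction can be sketched with a few made-up posts. The field names (`user`, `time`, `text`) and the messages are assumptions for illustration only, not the site's actual storage format.

```python
from collections import Counter

# Hypothetical posts: "user" and "time" are metadata, "text" is the data.
posts = [
    {"user": "ana", "time": "09:15", "text": "Study tips for the exam"},
    {"user": "ben", "time": "09:40", "text": "Is the exam on Friday?"},
    {"user": "ana", "time": "21:05", "text": "Concert tickets for sale"},
]

# Metadata alone answers "who posts most often?"
posts_per_user = Counter(post["user"] for post in posts)

# Only the message text answers "what are people posting about?"
word_counts = Counter(
    word.strip("?.,!").lower()
    for post in posts
    for word in post["text"].split()
)

print(posts_per_user.most_common(1))
print(word_counts["exam"])
```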

19
Q

The table below shows the time a computer system takes to complete a specified task on the customer data of different-sized companies. Based on the information in the table, which of the following tasks is likely to take the longest amount of time when scaled up for a very large company of approximately 100,000 customers?
https://drive.google.com/file/d/13ZCIij3j5-r_nUbcTXgqqvO_yRH6Ov5y/view?usp=drive_link
Source: AP Classroom Publicly Available Questions
(A) Backing up data
(B) Deleting entries from data
(C) Searching through data
(D) Sorting data

A

D
Answer Choices:
(A) Backup data: Grows linearly. Backup time increases steadily with the number of customers.
(B) Deleting entries: Grows linearly. Deletion times also increase steadily.
(C) Searching: Grows linearly or logarithmically. Searching is relatively efficient even for large data sets.
(D) Sorting: Grows faster than linearly. Sorting times increase more quickly than the number of customers does.
Why D is correct: Comparison-based sorting typically has a time complexity of O(n log n) or higher, so the time to sort increases disproportionately as the data set grows. The sorting times in the linked table should therefore escalate more steeply than the times for the other tasks as company size increases.
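
As a rough illustration of how these growth rates diverge, the sketch below compares how much each cost grows when the input goes from 1,000 to 100,000 customers. The customer counts and cost functions are assumptions chosen for illustration; they are not the values from the linked table.

```python
import math

def scale_factor(cost, small_n, large_n):
    """How many times larger the cost becomes when n grows."""
    return cost(large_n) / cost(small_n)

def linear(n):
    return n

def n_log_n(n):
    return n * math.log2(n)

def quadratic(n):
    return n * n

small, large = 1_000, 100_000
for name, cost in [("linear", linear), ("n log n", n_log_n), ("quadratic", quadratic)]:
    print(f"{name:>9}: {scale_factor(cost, small, large):,.1f}x")
```

A linear task becomes 100 times slower, an n log n task about 167 times slower, and a quadratic task 10,000 times slower, so none of these is exponential growth, but anything above linear pulls ahead as the data grows.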

20
Q

Learning from Data
A dataset contains columns for “Height,” “Weight,” and “Favorite Color.” Which column is most likely to provide numerical data?
Reference: Code.org Unit 5, Lesson 1: Learning from Data
A) Height
B) Favorite Color
C) Weight
D) Both A and C

A

Answer: D) Both A and C
Explanation:
* D is correct because both “Height” and “Weight” are numerical (quantitative) variables.
* A and C individually are correct but incomplete, as both columns provide numerical data.
* B is incorrect because “Favorite Color” is categorical.

21
Q

Exploring One Column
Which of the following methods is most appropriate to determine the most common value in a single column of a dataset?
A) Calculating the mean
B) Filtering for values greater than the median
C) Sorting the column in ascending order
D) Identifying the mode

A

Answer: D) Identifying the mode
Explanation:
* D is correct because the mode represents the most frequently occurring value.
* A is incorrect because the mean represents the average.
* B is incorrect because filtering doesn’t help identify the most common value.
* C is incorrect because sorting helps organize data but does not identify frequency.
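
A minimal sketch of finding the mode with Python's standard library, using made-up values for a single column:

```python
from statistics import multimode

# Hypothetical column of categorical values.
favorite_colors = ["blue", "red", "blue", "green", "blue", "red"]

# multimode returns every value tied for most frequent, as a list.
most_common = multimode(favorite_colors)
print(most_common)
```

Unlike the mean, the mode works for non-numeric columns such as this one.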

22
Q

Filtering and Cleaning Data
What is the primary purpose of cleaning a dataset before analysis?
Reference: Code.org Unit 5, Lesson 3: Filtering and Cleaning Data
A) To remove all duplicate rows
B) To ensure the dataset is in a standard format and errors are addressed
C) To apply machine learning algorithms
D) To visualize the dataset

A

Answer: B) To ensure the dataset is in a standard format and errors are addressed
Explanation:
* B is correct because cleaning ensures accuracy and consistency for analysis.
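* A is incorrect because removing duplicates is only one possible cleaning step, and not all duplicates are errors.
* C is incorrect because cleaning precedes algorithm application.
* D is incorrect because visualization is a later step that depends on clean data; it is not the purpose of cleaning.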
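
A small sketch of what putting data into a standard format can look like. The city values and the `clean_city` helper are hypothetical examples, not taken from the lesson.

```python
# Hypothetical raw column with inconsistent spacing, casing, and an empty cell.
raw_cities = [" Boston ", "boston", "CHICAGO", "", "Chicago "]

def clean_city(value):
    """Trim whitespace and standardize capitalization; return None if empty."""
    cleaned = value.strip().title()
    return cleaned or None

# Standardize every entry and drop the ones that turned out to be empty.
cleaned_cities = [city for city in map(clean_city, raw_cities) if city is not None]
print(cleaned_cities)
```

After cleaning, the five inconsistent entries reduce to two distinct city names, so later counts and filters treat them correctly.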

23
Q

Exploring Two Columns
A dataset includes columns for “Year of Birth” and “Income.” Which type of chart is most appropriate to display the relationship between these two variables?
Reference: Code.org Unit 5, Lesson 4: Exploring Two Columns
A) Line graph
B) Scatter plot
C) Pie chart
D) Box plot

A

Answer: B) Scatter plot
Explanation:
* B is correct because scatter plots show relationships between two numerical variables.
* A is incorrect because line graphs are better for trends over time.
* C is incorrect because pie charts are for proportions.
* D is incorrect because box plots summarize a single variable’s distribution.

24
Q

Big, Open, and Crowdsourced Data
What is the primary characteristic of big data?
A) It is always stored in spreadsheets.
B) It involves large, complex datasets that require special tools for processing.
C) It only includes numerical data.
D) It can be analyzed without cleaning or filtering.

A

Answer: B) It involves large, complex datasets that require special tools for processing.
Explanation:
* B is correct because big data is defined by volume, variety, and velocity.
* A is incorrect because spreadsheets cannot handle most big data.
* C is incorrect because big data includes categorical and unstructured data.
* D is incorrect because big data often requires significant preprocessing.

25
Q

Machine Learning
Which of the following best describes supervised machine learning?
Reference: Code.org Unit 5, Lesson 6: Machine Learning
A) Learning patterns without labeled data
B) Using labeled data to train a model and make predictions
C) Using reinforcement to learn from actions
D) A model that runs automatically without human interaction

A

Answer: B) Using labeled data to train a model and make predictions
Explanation:
* B is correct because supervised learning relies on labeled datasets.
* A is incorrect because that describes unsupervised learning.
* C is incorrect because that describes reinforcement learning.
* D is incorrect because automation is not unique to supervised learning.
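
One very simple form of supervised learning is a nearest-neighbor classifier, sketched below. The labeled examples and feature meanings are invented for illustration, and this is just one possible technique rather than the one used in the lesson.

```python
import math

# Hypothetical labeled training data: (hours studied, hours slept) -> outcome.
labeled_examples = [
    ((1.0, 4.0), "fail"),
    ((2.0, 5.0), "fail"),
    ((6.0, 7.0), "pass"),
    ((8.0, 8.0), "pass"),
]

def predict(features):
    """Return the label of the closest training example (1-nearest neighbor)."""
    _, label = min(labeled_examples, key=lambda ex: math.dist(ex[0], features))
    return label

print(predict((7.0, 7.5)))
print(predict((1.5, 4.5)))
```

The labels in the training data are what make this supervised: the model predicts by reusing answers it was given.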

26
Q

Algorithmic Bias
Which of the following is an example of algorithmic bias?
A) A dataset is missing data for certain categories.
B) A search engine provides results that favor one group over another.
C) A graph has incorrect labels.
D) A calculation contains an arithmetic error.

A

Answer: B) A search engine provides results that favor one group over another.
Explanation:
* B is correct because algorithmic bias occurs when algorithms systematically favor certain groups.
* A is incorrect because missing data can lead to bias indirectly but is not itself algorithmic bias.
* C and D are incorrect because these are errors, not biases.

27
Q

Exploring Two Columns
A dataset includes “City” and “Average Temperature.” What type of chart is most effective for comparing the temperatures of different cities?
A) Line chart
B) Pie chart
C) Bar chart
D) Scatter plot

A

Answer: C) Bar chart

28
Q

Learning from Data
Which of the following best describes metadata?
A) Data about data, such as file size or date created
B) A summary of data trends
C) Raw data before cleaning
D) A chart displaying data

A

Answer: A) Data about data, such as file size or date created

29
Q

Filtering and Cleaning Data
What is the best way to handle outliers in a dataset?
A) Always remove them.
B) Ignore them.
C) Investigate their causes and decide how to handle them.
D) Replace them with the average value.

A

Answer: C) Investigate their causes and decide how to handle them.
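
Before outliers can be investigated they have to be found. The sketch below flags candidates using the common 1.5 × IQR rule, which is one convention among several and an assumption here; the sleep data is made up.

```python
from statistics import quantiles

# Hypothetical hours of sleep reported by students; 30 looks suspicious.
hours_of_sleep = [7, 8, 7, 6, 8, 7, 30]

# Quartiles and interquartile range (IQR).
q1, _, q3 = quantiles(hours_of_sleep, n=4)
iqr = q3 - q1

# Flag values far outside the middle half of the data for review.
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [h for h in hours_of_sleep if h < low or h > high]
print(outliers)
```

The code only flags the value. Deciding whether 30 is a typo, a misunderstanding of the question, or a real response still requires investigation.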

30
Q

Machine Learning
What is the purpose of a training dataset in machine learning?
A) To evaluate the performance of the model
B) To provide the model with data to learn patterns
C) To label new data automatically
D) To store results of predictions

A

Answer: B) To provide the model with data to learn patterns

31
Q

Algorithmic Bias
How can algorithmic bias be mitigated?
A) Use a larger dataset regardless of its quality.
B) Ensure diversity in training data and test results regularly.
C) Focus only on quantitative data.
D) Avoid using categorical data.

A

Answer: B) Ensure diversity in training data and test results regularly.

32
Q

Big, Open, and Crowdsourced Data
What is a key challenge of crowdsourced data?
A) It is never accurate.
B) It may lack standardization and require cleaning.
C) It cannot be analyzed in real-time.
D) It always contains duplicate information.

A

Answer: B) It may lack standardization and require cleaning.

33
Q

Filtering and Cleaning Data
A dataset includes empty cells in a “Salary” column. What is the most appropriate method to handle these cells?
A) Replace them with zeros.
B) Remove the entire column.
C) Replace them with the column average or median.
D) Leave them empty.

A

Answer: C) Replace them with the column average or median.
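
A minimal sketch of filling empty cells with the column median, using made-up salaries and `None` to stand in for an empty cell:

```python
from statistics import median

# Hypothetical "Salary" column; None marks an empty cell.
salaries = [52000, None, 61000, 48000, None]

# Compute the median from the known values only.
fill_value = median(s for s in salaries if s is not None)

# Replace each empty cell with the median.
filled = [fill_value if s is None else s for s in salaries]
print(filled)
```

The median is often preferred over the mean for salary data because a few very high values do not shift it as much.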

34
Q

Learning from Data
What is the role of data visualization?
A) To clean and filter data
B) To identify patterns and insights
C) To replace statistical analysis
D) To provide raw data

A

Answer: B) To identify patterns and insights