Big Idea 2: Data Flashcards
How can data be represented using bits? In the answer, explain the definition of bit, byte, abstraction, analog data, digital data, sampling technique, and samples.
Data values can be stored in variables, lists of items, or standalone constants and can be passed as input to (or output from) procedures.
**Computing devices represent data digitally, meaning that the lowest-level components of any value are bits.
Bit is shorthand for binary digit and is either 0 or 1.**
A byte is 8 bits.
Abstraction is the process of reducing complexity by focusing on the main idea. It hides details irrelevant to the question at hand and brings together related, useful details.
Bits are grouped to represent abstractions. These abstractions include, but are not limited to, numbers, characters, and color.
The same sequence of bits may represent different types of data in different contexts.
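As a small illustration of that last point, the sketch below reads one 8-bit pattern three ways. The bit pattern and the variable names are chosen only for this example, and the grayscale reading assumes a simple 8-bit brightness scale.

```python
# One 8-bit pattern, interpreted in three different contexts.
bits = "01000001"

as_number = int(bits, 2)          # read as an unsigned integer
as_character = chr(as_number)     # read as a character code
as_gray_level = as_number / 255   # read as the brightness of an 8-bit grayscale pixel (assumed scale)

print(as_number)                  # 65
print(as_character)               # A
print(round(as_gray_level, 3))    # 0.255
```

The bits never change; only the abstraction applied to them does.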
Analog data have values that change smoothly, rather than in discrete intervals, over time. Some examples of analog data include pitch and volume of music, colors of a painting, or position of a sprinter during a race.
The use of digital data to approximate real-world analog data is an example of abstraction.
Analog data can be closely approximated digitally using a sampling technique, which means measuring values of the analog signal at regular intervals. The measured values are called samples. Each sample is measured to determine the exact bits required to store it.
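A rough sketch of sampling follows. It assumes the analog signal is a 1 Hz sine wave with values between -1 and 1; the function names, the sample rate, and the 4 bits per sample are all hypothetical choices made for illustration.

```python
import math

def sample_signal(signal, duration, sample_rate, bits_per_sample):
    """Measure signal(t) at regular intervals and store each value as an integer level.

    Assumes signal(t) always lies in the range [-1.0, 1.0].
    """
    levels = 2 ** bits_per_sample
    samples = []
    for n in range(int(duration * sample_rate)):
        t = n / sample_rate                      # time of the n-th measurement
        value = signal(t)                        # the "analog" value at that instant
        level = round((value + 1) / 2 * (levels - 1))  # map [-1, 1] onto 0..levels-1
        samples.append(level)
    return samples

def tone(t):
    """A stand-in analog signal: a 1 Hz sine wave."""
    return math.sin(2 * math.pi * t)

samples = sample_signal(tone, duration=1.0, sample_rate=8, bits_per_sample=4)
print(samples)                                   # eight integer levels between 0 and 15
print([format(s, "04b") for s in samples])       # the same samples as 4-bit patterns
```

Raising the sample rate or the bits per sample gives a closer approximation of the original signal at the cost of more bits.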
Explain the consequences of using bits to represent data. In the answer, explain the definition of overflow.
In many programming languages, integers are represented by a fixed number of bits, which limits the range of integer values and mathematical operations on those values. This limitation can result in overflow or other errors.
Other programming languages provide an abstraction through which the size of representable integers is limited only by the size of the computer’s memory; this is the case for the language defined in the exam reference sheet.
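The contrast between the two kinds of languages can be sketched in Python, whose built-in integers are not limited to a fixed number of bits. The helper `to_signed_8bit` is a hypothetical function that imitates one common overflow behavior (two's-complement wraparound in 8 bits); other languages may instead raise an error or behave differently.

```python
def to_signed_8bit(value):
    """Keep only the low 8 bits and reinterpret them as a signed (two's-complement) value."""
    low_bits = value & 0xFF
    return low_bits - 256 if low_bits >= 128 else low_bits

# With only 8 bits, the largest signed value is 127, so 127 + 1 wraps around.
print(to_signed_8bit(127 + 1))   # -128

# Python's own integers grow as needed, limited only by available memory.
print(2 ** 100)                  # 1267650600228229401496703205376
```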
In programming languages, the fixed number of bits used to represent real numbers limits the range of these values and the mathematical operations on them; this limitation can result in round-off and other errors. Some real numbers are represented as approximations in computer storage.
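A familiar round-off example, shown here in Python: 0.1 and 0.2 have no exact binary representation, so their stored approximations do not add up to exactly 0.3.

```python
import math

total = 0.1 + 0.2

print(total)                      # 0.30000000000000004
print(total == 0.3)               # False
print(math.isclose(total, 0.3))   # True: compare with a tolerance instead
```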
How can you calculate the binary (base 2) equivalent of a positive integer (base 10) and vice versa, and how can you compare and order binary numbers? In the answer, explain the definition of number bases.
Number bases, including binary and decimal, are used to represent data.
Binary (base 2) uses only combinations of the digits zero and one.
Decimal (base 10) uses only combinations of the digits 0 through 9.
As with decimal, a digit’s position in the binary sequence determines its numeric value. The numeric value is equal to the bit’s value (0 or 1) multiplied by the place value of its position.
The place value of each position is determined by the base raised to the power of the position. Positions are numbered starting at the rightmost position with 0 and increasing by 1 for each subsequent position to the left.
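The place-value rule can be sketched as follows. The function names are made up for this example; Python's built-ins `int(bits, 2)` and `bin(n)` do the same conversions.

```python
def binary_to_decimal(bits):
    """Sum bit * 2**position, numbering positions from 0 at the rightmost bit."""
    total = 0
    for position, digit in enumerate(reversed(bits)):
        total += int(digit) * 2 ** position
    return total

def decimal_to_binary(n):
    """Repeatedly divide by 2; the remainders are the bits from right to left."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))
        n //= 2
    return "".join(reversed(digits))

print(binary_to_decimal("1011"))   # 8 + 0 + 2 + 1 = 11
print(decimal_to_binary(11))       # 1011

# To compare or order binary numbers, compare their numeric values.
print(binary_to_decimal("1011") > binary_to_decimal("0111"))   # True, since 11 > 7
```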
Compare data compression algorithms to determine which is best in a particular context. In the answer, explain the definitions of data compression, lossless data compression, and lossy data compression.
Data compression can reduce the size (number of bits) of transmitted or stored data. Fewer bits does not necessarily mean less information.
The amount of size reduction from compression depends on both the amount of redundancy in the original data representation and the compression algorithm applied.
Lossless data compression algorithms can usually reduce the number of bits stored or transmitted while guaranteeing complete reconstruction of the original data.
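Run-length encoding is one simple lossless technique, used here only as an illustration and not named in the notes above. The sketch assumes data with long runs of repeated characters; on data without such redundancy it can make the result larger.

```python
def rle_encode(text):
    """Collapse each run of repeated characters into a (character, count) pair."""
    runs = []
    for ch in text:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)   # extend the current run
        else:
            runs.append((ch, 1))               # start a new run
    return runs

def rle_decode(runs):
    """Rebuild the original text exactly from the (character, count) pairs."""
    return "".join(ch * count for ch, count in runs)

original = "WWWWWWBBBWWWW"
encoded = rle_encode(original)

print(encoded)                           # [('W', 6), ('B', 3), ('W', 4)]
print(rle_decode(encoded) == original)   # True: nothing was lost
```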
Lossy data compression algorithms can significantly reduce the number of bits stored or transmitted but only allow reconstruction of an approximation of the original data. Lossy algorithms can usually reduce the number of bits more than lossless compression algorithms.
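A minimal sketch of the lossy idea, assuming 8-bit brightness values and a made-up quantization step of 16: each value is stored as one of 16 codes (4 bits instead of 8), and only an approximation can be rebuilt. Real lossy formats use far more sophisticated methods.

```python
def lossy_compress(values, step=16):
    """Keep only which 'bucket' of width `step` each value falls into."""
    return [v // step for v in values]

def lossy_decompress(codes, step=16):
    """Rebuild each value as the midpoint of its bucket (an approximation)."""
    return [c * step + step // 2 for c in codes]

pixels = [0, 7, 100, 101, 200, 255]      # hypothetical 8-bit brightness values
codes = lossy_compress(pixels)
restored = lossy_decompress(codes)

print(codes)      # [0, 0, 6, 6, 12, 15]
print(restored)   # [8, 8, 104, 104, 200, 248]
```

The restored values are close to the originals but not identical, and the difference cannot be recovered.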
In situations where quality or ability to reconstruct the original is maximally important, lossless compression algorithms are typically chosen. In situations where minimizing data size or transmission time is maximally important, lossy compression algorithms are typically chosen.
Describe what information can be extracted from data. In the answer, explain the definition of information.
Information is the collection of facts and patterns extracted from data.
Data provide opportunities for identifying trends, making connections, and addressing problems.
Digitally processed data may show correlation between variables. A correlation found in data does not necessarily indicate that a causal relationship exists. Additional research is needed to understand the exact nature of the relationship.
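As an illustration, the sketch below computes a Pearson correlation coefficient by hand. The two lists are invented numbers for this example, not real measurements, and `pearson` is a hypothetical helper name.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)

# Invented data: daily ice cream sales and sunburn cases.
ice_cream_sales = [10, 20, 30, 40, 50]
sunburn_cases = [1, 3, 4, 6, 8]

r = pearson(ice_cream_sales, sunburn_cases)
print(round(r, 2))   # close to 1: a strong positive correlation
```

A value near 1 says the two variables rise together. It does not say that ice cream causes sunburn; a third factor, such as sunny weather, could drive both.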
Often, a single source does not contain the data needed to draw a conclusion. It may be necessary to combine data from a variety of sources to formulate a conclusion.
Describe what information can be extracted from metadata. In the answer, explain the definition of metadata.
Metadata are data about data. For example, the piece of data may be an image, while the metadata may include the date of creation or the file size of the image.
Changes and deletions made to metadata do not change the primary data.
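The separation between data and metadata can be sketched with a plain dictionary. The field names and values below are hypothetical and do not follow any real image format.

```python
# A made-up "photo": the primary data are the pixel bytes, the rest is metadata.
photo = {
    "data": bytes([255, 0, 0, 0, 255, 0]),
    "metadata": {"created": "2024-05-01", "size_bytes": 6, "author": "unknown"},
}

data_before = photo["data"]

photo["metadata"]["author"] = "A. Student"   # change a metadata field
del photo["metadata"]["created"]             # delete a metadata field

print(photo["data"] == data_before)          # True: the primary data are untouched
print(sorted(photo["metadata"]))             # ['author', 'size_bytes']
```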
Metadata are used for finding, organizing, and managing information.
Metadata can increase the effective use of data or data sets by providing additional information.
Metadata allow data to be structured and organized.
Identify the challenges associated with processing data.
**The ability to process data depends on the capabilities of the users and their tools.
Data sets pose challenges regardless of size, such as:
- the need to clean data
- incomplete data
- invalid data
- the need to combine data sources**
Depending on how data were collected, they may not be uniform. For example, if users enter data into an open field, the way they choose to abbreviate, spell, or capitalize something may vary from user to user.
Cleaning data is a process that makes the data uniform without changing their meaning (e.g., replacing all equivalent abbreviations, spellings, and capitalizations with the same word).
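A small sketch of cleaning, assuming users typed a state name into an open field. The lookup table `CANONICAL` and the function `clean_state` are invented for this example; a real data set would need its own table of equivalent spellings.

```python
# Hypothetical table mapping equivalent spellings to one canonical word.
CANONICAL = {
    "ny": "New York",
    "n.y.": "New York",
    "new york": "New York",
    "ca": "California",
    "calif.": "California",
    "california": "California",
}

def clean_state(raw):
    """Normalize spacing and capitalization, then replace known equivalents."""
    key = raw.strip().lower()
    return CANONICAL.get(key, raw.strip())   # unknown entries are kept, only trimmed

entries = [" NY", "new york", "N.Y.", "CA", "Calif.", "Texas"]
cleaned = [clean_state(entry) for entry in entries]

print(cleaned)
# ['New York', 'New York', 'New York', 'California', 'California', 'Texas']
```

The entries become uniform, but their meaning is unchanged.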
Problems of bias are often created by the type or source of data being collected. Bias is not eliminated by simple data cleaning.
The size of a data set affects the amount of information that can be extracted from it.
Large data sets are difficult to process using a single computer and may require parallel systems.
Scalability of systems is an important consideration when working with data sets, as the computational capacity of a system affects how data sets can be processed and stored.
Explain how programs can be used to extract information from data.
Programs can be used to process data to acquire information.
Tables, diagrams, text, and other visual tools can be used to communicate insight and knowledge gained from data.
Search tools are useful for efficiently finding information.
Data filtering systems are important tools for finding information and recognizing patterns in data.
Programs such as spreadsheets help efficiently organize and find trends in information.
Some processes that can be used to extract or modify information from data include the following:
- Transforming every element of a data set, such as doubling every element in a list, or adding a parent’s email to every student record.
- Filtering a data set, such as keeping only the positive numbers from a list, or keeping only students who signed up for band from a record of all the students.
- Combining or comparing data in some way, such as adding up a list of numbers, or finding the student who has the highest GPA.
- Visualizing a data set through a chart, graph, or other visual representation.
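The first three processes in the list above can be sketched in a few lines. The numbers and student records are invented for this example, and visualization is left out because it would need a plotting library.

```python
numbers = [3, -1, 4, -5, 9]

doubled = [n * 2 for n in numbers]           # transforming every element
positives = [n for n in numbers if n > 0]    # filtering
total = sum(numbers)                         # combining

print(doubled)     # [6, -2, 8, -10, 18]
print(positives)   # [3, 4, 9]
print(total)       # 10

# Hypothetical student records.
students = [
    {"name": "Ana", "gpa": 3.6, "band": True},
    {"name": "Ben", "gpa": 3.9, "band": False},
    {"name": "Cy", "gpa": 3.2, "band": True},
]

band_members = [s["name"] for s in students if s["band"]]    # filtering records
top_student = max(students, key=lambda s: s["gpa"])["name"]  # comparing records

print(band_members)   # ['Ana', 'Cy']
print(top_student)    # Ben
```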
Explain how programs can be used to gain insight and knowledge from data.
Programs are used in an iterative and interactive way when processing information to allow users to gain insight and knowledge about data.
Programmers can use programs to filter and clean digital data, thereby gaining insight and knowledge.
Combining data sources, clustering data, and classifying data are parts of the process of using programs to gain insight and knowledge from data.
Insight and knowledge can be obtained from translating and transforming digitally represented information.
Patterns can emerge when data are transformed using programs.