Big Idea 2 - Data Flashcards
Abstraction
The process of simplifying complex systems by focusing on the essential details and hiding unnecessary complexities.
Analog Data
Refers to continuous, real-world information that is represented by a range of values. It can take on any value within a given range and is often used to describe physical quantities like temperature or sound.
Binary Numbers
A base-2 number system that uses only two digits, 0 and 1, to represent all values. Each digit in a binary number is called a bit.
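As a sketch, the place-value rule for binary can be written out in Python. `binary_to_decimal` is a hypothetical helper name used only for illustration. Python's built-in `int(s, 2)` does the same conversion.

```python
def binary_to_decimal(bits: str) -> int:
    """Convert a string of 0s and 1s to its decimal value using place values."""
    total = 0
    for position, digit in enumerate(reversed(bits)):
        if digit not in "01":
            raise ValueError(f"not a binary digit: {digit!r}")
        # Each bit contributes digit * 2**position (the rightmost bit is position 0).
        total += int(digit) * 2 ** position
    return total

print(binary_to_decimal("1011"))  # 11  (8 + 0 + 2 + 1)
print(int("1011", 2))             # 11  (built-in conversion agrees)
```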
Byte
A unit of digital information that consists of 8 bits. Because each bit has two possible values, a byte can represent 2^8 = 256 different values: for example, one ASCII character or an unsigned integer from 0 to 255.
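A short sketch of why the number of bits matters: each added bit doubles the number of distinct patterns, so n bits give 2^n values. The helper name `value_count` is hypothetical.

```python
def value_count(num_bits: int) -> int:
    """Number of distinct bit patterns that num_bits bits can represent."""
    return 2 ** num_bits

for n in (2, 4, 8, 16):
    # With n bits, unsigned integers run from 0 to 2**n - 1.
    print(f"{n} bits: {value_count(n)} values, 0 to {value_count(n) - 1}")
```

With 8 bits (one byte) this prints 256 values, 0 to 255.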
Cleaning Data
The process of identifying and correcting or removing errors, inconsistencies, and inaccuracies in a dataset. It involves tasks like removing duplicate entries, handling missing values, and standardizing formats to ensure data quality.
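A minimal cleaning sketch in Python, using a made-up list of names: it handles missing values, standardizes the format (trimmed, title case), and removes duplicates. The sample data and the `clean` function are hypothetical.

```python
raw_names = ["  Alice", "alice", "BOB ", None, "Carol", "carol  ", ""]

def clean(values):
    """Drop missing entries, standardize the format, and remove duplicates."""
    seen = set()
    cleaned = []
    for value in values:
        if not value or not value.strip():   # handle missing or blank values
            continue
        normalized = value.strip().title()   # standardize the format
        if normalized not in seen:           # remove duplicate entries
            seen.add(normalized)
            cleaned.append(normalized)
    return cleaned

print(clean(raw_names))  # ['Alice', 'Bob', 'Carol']
```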
Data Compression
The process of reducing the size of data files while maintaining as much information as possible. This allows for more efficient storage and transmission of data.
Data Filtering
The process of selectively extracting or removing specific pieces of data from a larger dataset based on certain criteria or conditions. It allows you to focus on relevant information while excluding irrelevant or unwanted data.
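A small filtering sketch, assuming made-up temperature records: only the records that meet the condition are kept.

```python
readings = [
    {"city": "Austin", "temp_f": 98},
    {"city": "Boston", "temp_f": 71},
    {"city": "Denver", "temp_f": 85},
    {"city": "Seattle", "temp_f": 64},
]

# Keep only the records that meet the condition (temperature above 80 °F).
hot = [r for r in readings if r["temp_f"] > 80]
print([r["city"] for r in hot])  # ['Austin', 'Denver']
```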
Data Transformation
Refers to the process of converting data from one format or structure to another. It involves modifying, reorganizing, or manipulating data to make it more suitable for analysis or other purposes.
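As an illustration, the sketch below transforms made-up rows (as they might arrive from a CSV file, with every field as text) into records with a numeric type, different units, and a different structure. The `transform` function and the data are hypothetical.

```python
# Rows as they might arrive from a CSV file: every field is a string.
rows = [
    ["Austin", "36.7"],
    ["Boston", "21.7"],
]

def transform(rows):
    """Convert [city, celsius_text] rows into dicts with numeric Fahrenheit values."""
    result = []
    for city, celsius_text in rows:
        celsius = float(celsius_text)            # change the data type
        fahrenheit = celsius * 9 / 5 + 32        # change the units
        result.append({"city": city, "temp_f": round(fahrenheit, 1)})  # change the structure
    return result

print(transform(rows))
```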
Digital Data
Refers to information that is represented using discrete, binary values (0s and 1s). It is commonly used in computers and other digital devices because it can be easily stored, processed, and transmitted.
Hexadecimal
A number system that uses base-16 instead of base-10. It uses digits from 0 to 9 and letters from A to F to represent values from 0 to 15.
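Each hexadecimal digit stands for exactly 4 bits, so one byte is written as two hex digits. A short Python sketch using built-in conversions; the web color `#1E90FF` is only an example.

```python
print(int("FF", 16))      # 255
print(hex(255))           # 0xff
print(format(171, "X"))   # AB  (10 * 16 + 11 = 171)

# A web color such as #1E90FF packs three bytes (red, green, blue) into six hex digits.
color = "1E90FF"
red, green, blue = (int(color[i:i + 2], 16) for i in range(0, 6, 2))
print(red, green, blue)   # 30 144 255
```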
Lossless Compression Algorithms
Methods used to compress data files without losing any information. The compressed file can be fully restored to its original form without any loss of data.
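A quick way to see lossless compression is a round trip with Python's standard `zlib` module, which implements DEFLATE (one lossless algorithm among many). The repetitive sample bytes are made up for illustration.

```python
import zlib

original = (b"A" * 10 + b"B" * 10 + b"C" * 10) * 20   # repetitive sample data
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original), "->", len(compressed), "bytes")
print(restored == original)  # True: nothing was lost
```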
Lossy Compression Algorithms
Methods that reduce file size by permanently removing some data. They usually achieve greater size reduction than lossless algorithms, but the original file cannot be reconstructed exactly, so some quality or detail is lost.
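The idea behind lossy compression can be sketched with quantization: values are snapped to a few levels so they take fewer bits, and the exact originals can no longer be recovered. This is a simplified illustration, not how formats such as JPEG or MP3 actually work; the sample values and function names are hypothetical.

```python
samples = [0.12, 0.47, 0.51, 0.89, 0.33]   # illustrative readings in [0, 1]

def quantize(values, levels):
    """Lossy step: snap each value to one of `levels` evenly spaced levels."""
    step = 1 / (levels - 1)
    return [round(v / step) for v in values]   # small integer codes instead of floats

def reconstruct(codes, levels):
    """Turn codes back into approximate values; the discarded detail is gone."""
    step = 1 / (levels - 1)
    return [c * step for c in codes]

codes = quantize(samples, levels=4)   # each code fits in 2 bits
approx = reconstruct(codes, levels=4)
print(codes)                          # [0, 1, 2, 3, 1]
print([round(a, 2) for a in approx])  # close to, but not equal to, the originals
```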
Metadata
Refers to descriptive information about data, such as its format, location, authorship, and creation date. It provides context and additional details that help organize and manage data effectively.
Overflow Error
Occurs when a computer tries to represent a value outside the range allowed by the fixed number of bits available, for example a number larger than 255 in an unsigned 8-bit value. This can lead to unexpected and incorrect results.
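Python's own integers grow as needed and do not overflow, so this sketch simulates unsigned 8-bit arithmetic by keeping only the low 8 bits. It assumes wrap-around behavior; some systems instead stop with an error. The name `add_8bit` is hypothetical.

```python
BITS = 8
MAX_VALUE = 2 ** BITS - 1   # 255 is the largest unsigned 8-bit value

def add_8bit(a: int, b: int) -> tuple[int, bool]:
    """Add two unsigned 8-bit values, keeping only the low 8 bits like fixed-width hardware."""
    full = a + b
    wrapped = full & MAX_VALUE        # discard any bits beyond the 8th
    overflowed = full > MAX_VALUE
    return wrapped, overflowed

print(add_8bit(200, 50))   # (250, False)
print(add_8bit(200, 100))  # (44, True): 300 does not fit in 8 bits
```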
Rounding Error
Occurs when a number is approximated or rounded to a certain decimal place, resulting in a small discrepancy between the rounded value and the actual value.
ASCII code
Stands for American Standard Code for Information Interchange. It is a character encoding standard that assigns unique numeric values to represent characters such as letters, numbers, and symbols in computer systems.
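In Python, `ord` and `chr` convert between characters and their numeric codes, and encoding text as ASCII gives one byte per character.

```python
print(ord("A"))            # 65
print(ord("a"))            # 97
print(chr(72), chr(105))   # H i

# Encoding a word as ASCII gives one number (one byte) per character.
print(list("Hi!".encode("ascii")))  # [72, 105, 33]
```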
Data
Refers to information that is collected, stored, and processed by computers. It can be in the form of numbers, text, images, or any other type of digital content.
Constant Value
A fixed value that does not change during the execution of a program.
Lists
Lists are ordered collections of items in computer programming. They allow you to store multiple values under one variable name and access them using their position or index.
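A short Python sketch of storing and accessing values by index. Note that Python lists start at index 0, while the AP CSP exam reference sheet numbers list elements starting at 1.

```python
scores = [88, 92, 75]   # one variable name, several values
print(scores[0])        # 88  (Python indexes from 0)
scores.append(100)      # add an item at the end
scores[2] = 80          # replace the item at index 2
print(scores)           # [88, 92, 80, 100]
print(len(scores))      # 4
```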
Machine Code
A low-level programming language that consists of binary instructions directly understood by the computer’s hardware. It represents the most basic level of instructions that a computer can execute.
Number Bases
Also known as numeral systems, number bases are methods of representing numbers using a specific set of digits. Each digit's value is the digit multiplied by the base raised to the power of its position, counting from 0 at the rightmost digit.
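Going the other way, from decimal to another base, is often done by repeated division: the remainders are the digits, least significant first. A minimal sketch; `to_base` is a hypothetical helper limited to bases 2 through 16.

```python
DIGITS = "0123456789ABCDEF"

def to_base(n: int, base: int) -> str:
    """Write a non-negative integer in the given base (2 to 16) by repeated division."""
    if not 2 <= base <= 16:
        raise ValueError("base must be between 2 and 16")
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, remainder = divmod(n, base)
        digits.append(DIGITS[remainder])   # remainders give the digits, least significant first
    return "".join(reversed(digits))

print(to_base(156, 2))   # 10011100
print(to_base(156, 16))  # 9C
print(to_base(156, 10))  # 156
```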
Rounding Errors
Inaccuracies that occur when real numbers are stored with finite precision. They happen because some numbers, such as 0.1, cannot be represented exactly in binary floating point.
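A classic demonstration in Python: neither 0.1 nor 0.2 is stored exactly, so their sum is slightly off. Comparing with a tolerance (here `math.isclose`) is the usual workaround.

```python
import math

total = 0.1 + 0.2
print(total)                     # 0.30000000000000004
print(total == 0.3)              # False

# Compare floating-point results with a tolerance instead of ==.
print(math.isclose(total, 0.3))  # True
```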
Unicode system
An international character encoding standard that assigns unique numeric values to represent characters from various languages and scripts around the world. It allows computers to handle text in multiple languages more effectively than previous standards like ASCII.
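In Python, `ord` returns a character's Unicode code point, and encoding to UTF-8 (one common way of storing Unicode text) shows how many bytes it takes. ASCII characters keep their ASCII values and use one byte.

```python
for ch in ["A", "é", "€", "😀"]:
    code_point = ord(ch)         # the character's Unicode number
    utf8 = ch.encode("utf-8")    # how the character is stored as bytes
    print(ch, f"U+{code_point:04X}", len(utf8), "byte(s)")
```

Here "A" (U+0041) takes one byte, while "😀" (U+1F600) takes four.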
Variables
Containers that hold values or data in a computer program. They can store different types of information such as numbers, text, or boolean values.
Bits
The basic units of information in computing. They can represent either a 0 or a 1, and they are used to store and transmit data in binary form.
Lossless data compression
A method of reducing file size without losing any information. It allows for perfect reconstruction of the original data.
Lossy data compression
A method of reducing file size by removing unnecessary or less important information. Unlike lossless compression, some data may be permanently lost during this process.
LZW compression algorithm
A lossless data compression method that replaces repeated sequences of characters with shorter codes. It is commonly used in file formats like GIF and TIFF.
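A minimal sketch of the textbook form of LZW on text whose characters have codes 0 to 255. Real GIF and TIFF encoders also pack codes into variable-width bit fields and limit the dictionary size, which this sketch omits. Both function names are hypothetical.

```python
def lzw_compress(text: str) -> list[int]:
    """Emit dictionary codes for the longest already-seen sequences."""
    dictionary = {chr(i): i for i in range(256)}   # start with all single characters
    next_code = 256
    current = ""
    output = []
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate                    # keep extending a known sequence
        else:
            output.append(dictionary[current])     # emit the code for the known part
            dictionary[candidate] = next_code      # remember the new, longer sequence
            next_code += 1
            current = ch
    if current:
        output.append(dictionary[current])
    return output

def lzw_decompress(codes: list[int]) -> str:
    """Rebuild the same dictionary while decoding, so no dictionary is sent."""
    if not codes:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    result = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:                    # code refers to the entry being built
            entry = previous + previous[0]
        else:
            raise ValueError(f"bad code: {code}")
        result.append(entry)
        dictionary[next_code] = previous + entry[0]
        next_code += 1
        previous = entry
    return "".join(result)

message = "TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(message)
print(len(message), "characters ->", len(codes), "codes")  # 24 characters -> 16 codes
print(lzw_decompress(codes) == message)                    # True
```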
Redundancy
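Refers to the duplication of critical components or information in a system to ensure reliability and fault tolerance. It involves having backup systems or data that can be used if the primary ones fail. In the context of data compression, redundancy also means repeated or predictable information that compression algorithms can remove or shorten.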
Run-Length Encoding
A simple form of data compression that replaces repeated consecutive characters or symbols with a count and the character itself. It reduces the size of data by representing long sequences of the same value with shorter codes.
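A minimal run-length encoding sketch using `itertools.groupby`, which groups consecutive equal characters. Representing each run as a `(count, character)` pair is one choice among several; the function names are hypothetical.

```python
from itertools import groupby

def rle_encode(text: str) -> list[tuple[int, str]]:
    """Replace each run of identical characters with a (count, character) pair."""
    return [(len(list(group)), ch) for ch, group in groupby(text)]

def rle_decode(pairs: list[tuple[int, str]]) -> str:
    """Expand each (count, character) pair back into a run."""
    return "".join(ch * count for count, ch in pairs)

data = "W" * 10 + "B" * 3 + "W" * 5
encoded = rle_encode(data)
print(encoded)                      # [(10, 'W'), (3, 'B'), (5, 'W')]
print(rle_decode(encoded) == data)  # True
```

RLE only saves space when runs are long; text like "ABC" has no repeats and would produce three pairs, which is larger than the original.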
Big Data
Refers to extremely large and complex sets of data that cannot be easily managed or analyzed using traditional methods.
Correlation
Refers to a statistical measure that indicates the extent to which two variables are related or move together in a consistent way.
Data Biases
Refer to systematic errors or prejudices present in a dataset that can lead to inaccurate or unfair conclusions when analyzing the data.
Data Centers
Large facilities that house computer systems and network infrastructure, used to store, manage, and process vast amounts of data.
Scalability
Refers to the ability of a system or network to handle an increasing amount of work or users without sacrificing performance or efficiency.
Server Farms
Large collections of interconnected servers housed in dedicated facilities designed for hosting websites, applications, or providing other computing services.
Correlations
Refer to the statistical relationship between two variables. A correlation coefficient measures how closely the variables are related, ranging from -1 (perfect negative correlation) through 0 (no linear relationship) to 1 (perfect positive correlation).
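One common correlation coefficient is Pearson's r. A sketch of computing it by hand follows; the study-time data is made up. Python 3.10 and later also provide `statistics.correlation`.

```python
import math

def pearson_correlation(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient r, which always falls between -1 and 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # How the two variables move together, relative to their means.
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable varies on its own.
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a variable never changes")
    return covariation / math.sqrt(spread_x * spread_y)

hours_studied = [1, 2, 3, 4, 5]
test_scores = [60, 65, 70, 80, 85]   # made-up sample data
print(round(pearson_correlation(hours_studied, test_scores), 3))  # close to 1: strong positive
print(pearson_correlation([1, 2, 3], [6, 4, 2]))                  # -1.0: perfect negative
```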
Data mining
Involves extracting useful patterns or knowledge from large datasets using techniques such as statistical analysis, machine learning, and pattern recognition.
Data Visualization
The process of representing data in a visual format, such as charts, graphs, or maps, to make it easier to understand and analyze.
Google Sheets
A web-based spreadsheet program offered by Google as part of its suite of productivity tools. It allows multiple users to collaborate on the same spreadsheet simultaneously, providing real-time updates and cloud storage.
Iterative and Interactive Process
Refers to a method of problem-solving or development where the steps are repeated multiple times, with each repetition building upon the previous one. It involves constant feedback and collaboration between the user and the system.
Microsoft Excel
A spreadsheet program that allows users to organize, analyze, and manipulate data using formulas, functions, and charts.
Outliers
Data points that significantly deviate from the overall pattern or trend of a dataset. They can skew statistical analyses and affect the accuracy of results.
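There is no single definition of "significantly deviates." One common rule of thumb flags values more than 1.5 times the interquartile range (IQR) below the first quartile or above the third. A sketch using the standard `statistics.quantiles` function (Python 3.8+), with made-up visitor counts:

```python
import statistics

def find_outliers(values: list[float]) -> list[float]:
    """Flag values more than 1.5 * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4)   # quartile cut points
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

daily_visitors = [120, 135, 128, 140, 131, 125, 900]   # one unusual spike
print(find_outliers(daily_visitors))  # [900]
```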
Patterns
Refer to recurring regularities in data, such as repeated values, sequences, or relationships, that can be identified to gain insight or make predictions. In software design, the term also describes recurring, reusable solutions that can be applied to similar problems.
Spreadsheet Program
Software that allows users to organize, analyze, and manipulate numerical data in rows and columns. It provides functions for calculations, graphing capabilities, and tools for creating charts or tables.
Text Analysis
Refers to the process of extracting meaningful information from written text by analyzing its content, structure, and context.
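A very simple form of text analysis is counting word frequencies. This sketch lowercases the text, splits it into words with a regular expression (a simplifying assumption that ignores digits and hyphens), and counts them with `collections.Counter`. The sample sentence is made up.

```python
import re
from collections import Counter

text = "Data is everywhere. Data can be cleaned, filtered, and visualized; data tells stories."

words = re.findall(r"[a-z']+", text.lower())   # lowercase words, punctuation dropped
counts = Counter(words)
print(counts.most_common(1))  # [('data', 3)]
```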
Text Mining
Involves extracting useful patterns or knowledge from large amounts of unstructured textual data using techniques such as natural language processing and machine learning.
Trends
Patterns that show changes over time. In computer science, analyzing trends can help identify patterns in data or predict future behavior.