Lesson 2.4: Data and Information Flashcards
Data pyramid
A concept that visualizes the data-information-knowledge-wisdom hierarchy.
Information
A collection of processed data from a variety of sources.
Knowledge
A dynamic combination of experience, values, contextual information, expert insight, and grounded intuition that provides an environment and framework for evaluating and incorporating new experiences and information. It originates and is applied in the minds of knowers. In organizations, it often becomes embedded not only in documents and repositories but also in organizational routines, processes, practices, and norms.
Wisdom
Knowing the right thing to do.
Structured data
Structured data is coded in a way that makes it easy to convert into a form usable for analysis. Examples of structured data include contact information such as first name, last name, email address, and phone number. In addition, quantitative fields like date of birth, date of transaction, and the amount received or amount due are forms of structured data.
Unstructured data
Unstructured data refers to data that is more complex and possibly stored in a format that is not easily decoded. Unstructured data takes more time to parse through to retrieve the essential information. Examples of unstructured data include data stored in text or video format, comments on a web page, text messages, and videos of presentations or conferences.
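The contrast between the two cards above can be sketched in a few lines of Python. This is only an illustration: the record, the comment text, and the `extract_email` helper are hypothetical, and the email pattern is deliberately simplified. Structured data can be read directly by field name, while unstructured text has to be parsed first.

```python
import re

# Structured: each value sits in a named field, ready for analysis.
structured_record = {
    "first_name": "Ada",
    "last_name": "Lopez",
    "email": "ada.lopez@example.com",
    "amount_due": 125.50,
}

# Unstructured: similar facts are buried in free text (made-up example).
unstructured_comment = (
    "Hi, this is Ada Lopez. Please send my invoice to "
    "ada.lopez@example.com, I think I still owe about $125.50."
)


def extract_email(text):
    """Return the first email-like string in text, or None.

    The pattern is a simplified assumption, not a full email validator.
    """
    match = re.search(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", text)
    return match.group(0) if match else None


# Structured access is a direct lookup; unstructured access needs parsing.
email_from_structured = structured_record["email"]
email_from_unstructured = extract_email(unstructured_comment)
```

Even in this small case, the unstructured path needs a parsing rule that can fail or mismatch, which is why unstructured data takes more time to work with.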
Big data
Big data is a collection of data so large that previous generations of analytical tools cannot process it. Big data is rapidly changing the way businesses make decisions and understand customer behavior.
Information systems
Information systems are collections of data and information used to support decision-making in organizations. While information systems do not have to rely on intricate technologies, technology is typically assumed to be one of the components.
What is PII?
Personally Identifiable Information
Three core variations of cloud-based systems
infrastructure as a service (IaaS)
platform as a service (PaaS)
software as a service (SaaS)
infrastructure as a service (IaaS)
The IaaS model provides access to computing resources in a virtualized environment. These resources are composed of virtualized hardware, such as network connections, virtual server space, and load balancers.
platform as a service (PaaS)
In the PaaS model, customers have access to a platform that supports the development and management of web applications. PaaS enables quicker development life cycles and reduced infrastructure requirements since the majority of processing happens in the cloud rather than on local storage and processor resources.
software as a service (SaaS)
With the SaaS model, the software is licensed to customers with subscriptions and central hosting. Some examples include Gmail, Google Docs, and Microsoft Office 365.
infrastructure as a service (IaaS) vs. platform as a service (PaaS)
PaaS is commonly confused with IaaS. The difference lies in who manages the switching, routing, and operating systems. If the client is responsible for licensing the operating system and managing the back-end networking, it is considered IaaS. If the cloud service provider is responsible for licensing the operating systems and back-end storage and networking, it is considered PaaS.
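The rule of thumb in the card above can be written as a small decision function. This is a study aid under the card's own simplification, not an official classification scheme; the function name and the "client"/"provider" labels are hypothetical.

```python
def classify_cloud_model(os_managed_by, networking_managed_by):
    """Apply the flashcard's rule of thumb for telling IaaS from PaaS.

    Each argument is either "client" or "provider" (labels assumed here).
    """
    if os_managed_by == "client" and networking_managed_by == "client":
        # The client licenses the OS and manages back-end networking.
        return "IaaS"
    if os_managed_by == "provider" and networking_managed_by == "provider":
        # The cloud service provider handles the OS and the back end.
        return "PaaS"
    # Mixed responsibility is not covered by the card's rule.
    return "unclear"


model_a = classify_cloud_model("client", "client")
model_b = classify_cloud_model("provider", "provider")
```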
What is data hygiene?
The term data hygiene refers to the processes of ensuring the cleanliness of data (i.e., that the data is relatively error-free). Dirty data can be caused by things such as duplicate records, incomplete or outdated data, and mistakes introduced as data is entered, stored, and managed. Data quality is crucial to operational and transactional processes within an organization and to the reliability of analytical reporting.
Types of bad data
Duplicate data: Two or more identical records
Conflicting data: The same records with differing attributes
Incomplete data: Missing attributes
Invalid data: Attributes not conforming to standardization
Unsynchronized data: Data not appropriately shared between two systems
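Three of the types listed above (duplicate, conflicting, and incomplete data) can be detected with a short check. The sketch below is a minimal illustration, assuming a hypothetical schema where `id` identifies a record and `name` and `email` are required; the sample records are made up.

```python
from collections import defaultdict

REQUIRED_FIELDS = ("id", "name", "email")  # hypothetical schema

records = [
    {"id": 1, "name": "Ada Lopez", "email": "ada@example.com"},
    {"id": 1, "name": "Ada Lopez", "email": "ada@example.com"},  # duplicate
    {"id": 2, "name": "Ben Cho", "email": "ben@example.com"},
    {"id": 2, "name": "Ben Cho", "email": "b.cho@example.com"},  # conflicting
    {"id": 3, "name": "Cy Diaz", "email": None},                 # incomplete
]


def find_bad_data(rows):
    """Return the ids of duplicate, conflicting, and incomplete records."""
    by_id = defaultdict(list)
    for row in rows:
        by_id[row["id"]].append(row)

    report = {"duplicate": [], "conflicting": [], "incomplete": []}
    for record_id, group in by_id.items():
        # Collect the distinct versions of the record that share this id.
        versions = []
        for row in group:
            if row not in versions:
                versions.append(row)
        if len(group) > len(versions):
            report["duplicate"].append(record_id)    # identical copies exist
        if len(versions) > 1:
            report["conflicting"].append(record_id)  # same id, different attributes

    for row in rows:
        # A required field that is missing, None, or empty is incomplete.
        if any(row.get(field) in (None, "") for field in REQUIRED_FIELDS):
            report["incomplete"].append(row["id"])
    return report


report = find_bad_data(records)
```

Invalid and unsynchronized data are left out here because detecting them depends on the organization's own standards and on access to both systems.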
Quality data
Quality data is typically defined as data that is precise, valid, reliable, timely, and complete. Different organizations will prioritize these attributes according to their needs. An organization might also include additional attributes.
Precision is an important attribute of data. Precision describes how exact the data is in the context of its intended use. Data collected in healthcare, for example, often must be more precise than data collected in other industries.
Quality data is valid when it meets the requirements of the data collection process. If a data set that captures the age of patients contains values that are negative or greater than 120, using the invalid data might misinform the decision process.
Data should be reliable regardless of where it resides or how it was collected. Whether the patient’s age is in a paper folder in the medical office or in the electronic patient system, the value should be the same.
Data should be collected at the right time to be relevant to decision-making. When deciding how many boots could be sold in a city, data from purchases during fall and winter would be relevant to the decision. Because people do not usually buy boots in the summer, data on how many boots were sold during the summer is not relevant to the decision.
Quality data is complete and thorough, providing a full view of the relevant picture. Incomplete data can lead to serious situations. For example, if a medical office is trying to improve waiting time but leaves out data on various operational details, the results may skew understanding of the reasons for long waiting times, leading to decisions that do not address the real reasons.
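The validity and timeliness examples above translate into simple filters. The sketch below uses made-up sample values; the 0 to 120 age range comes from the card, while the set of fall and winter months is an assumption (Northern Hemisphere, September through February).

```python
def is_valid_age(age):
    """Valid per the card's example: not negative and not greater than 120."""
    return 0 <= age <= 120


# Validity: drop ages that fail the collection requirements.
ages = [34, -2, 67, 150, 120]  # sample values, not real patient data
valid_ages = [age for age in ages if is_valid_age(age)]

# Timeliness: keep only sales from the seasons relevant to the decision.
FALL_WINTER_MONTHS = {9, 10, 11, 12, 1, 2}  # assumption: Northern Hemisphere

# (month, units_sold) pairs, made up for illustration.
boot_sales = [(1, 410), (4, 95), (7, 12), (10, 380), (12, 520)]
relevant_sales = [
    (month, units) for month, units in boot_sales if month in FALL_WINTER_MONTHS
]
```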
Quality Data Attributes
Precise: Exact enough for the context of its intended use.
Valid: Meets the requirements of the data collection process.
Reliable: Consistent regardless of where it resides or how it was collected.
Timely: Collected at the right time to be relevant to decision-making.
Complete: Thorough, providing a full view of the relevant picture.