Introduction to Data Flashcards
Two main formats of data?
Structured and Unstructured
Data is derived from the word _______, which means _________.
Datum, something given (a fact)
Structured data is ______, ______, _______, _______.
Organized, easy to manage
Tabular Format
Predefined structure
Text and Numbers
Unstructured data is ______, ______, _______, _______.
Unorganized, difficult to manage
No specific format
No predefined structure
Text, images, audio, video
Examples of unstructured data
Reports and email messages
Surveillance videos
Quantitative data is also called
Numerical Data
Quantitative data can be
counted, measured, and represented with numbers
Qualitative data is also called
categorical data
Qualitative data can be
grouped into categories
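A small illustration of the difference, using made-up values: quantitative data can be summarized with arithmetic (here, a mean), while qualitative data is summarized by counting how many items fall into each category. The variable names and values are hypothetical.

```python
from collections import Counter
from statistics import mean

# Hypothetical quantitative (numerical) data: values can be measured and averaged
heights_cm = [162, 175, 168, 181]

# Hypothetical qualitative (categorical) data: values can only be grouped and counted
shirt_colors = ["red", "blue", "red", "green"]

average_height = mean(heights_cm)     # arithmetic makes sense for numbers
color_counts = Counter(shirt_colors)  # counting per category makes sense for labels

print(average_height)                 # 171.5
print(color_counts.most_common(1))    # [('red', 2)]
```

Averaging shirt colors would be meaningless, which is the practical sign that a column is categorical rather than numerical.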
Data does not tell anything without context.
True
Data context refers to
information that provides meaning to data
Characteristics of data:
Time frame
Location and source
Characteristics of the data are also called
Metadata
What is the goal of data in organizations?
Support Business Objectives
Data can help organizations by ________.
Improving decision making
How can data help business organizations?
Profitability
Social Good
Research
Customer Satisfaction and Employee Happiness
Measure ROI (Return on Investment)
Optimize processes and find new opportunities
How does data help in healthcare?
Monitor Personal Data
What is a supply chain?
The sequence of processes involved in the production and distribution of a product.
What is the goal of data in healthcare?
Detect and prevent health problems
Turn patient care into precision medicine
Advancing healthcare research worldwide
What is the goal of data in supply chain?
Make sense of the massive amount of generated data
What metrics are used to optimize the supply chain?
Average Inventory
Inventory turnover ratio
What is inventory turnover ratio?
Calculates how often a company has sold and replaced inventory during a given period.
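The card does not give a formula. A minimal sketch, assuming the common definition of inventory turnover as cost of goods sold divided by average inventory, with average inventory taken as the mean of the beginning and ending inventory for the period (some analysts use sales instead of cost of goods sold). The function names and figures are hypothetical.

```python
def average_inventory(beginning: float, ending: float) -> float:
    """Average inventory for a period: mean of beginning and ending inventory value."""
    return (beginning + ending) / 2


def inventory_turnover(cogs: float, beginning: float, ending: float) -> float:
    """Inventory turnover ratio = cost of goods sold / average inventory (assumed definition)."""
    avg = average_inventory(beginning, ending)
    if avg == 0:
        raise ValueError("average inventory must be non-zero")
    return cogs / avg


# Hypothetical yearly figures
ratio = inventory_turnover(cogs=500_000, beginning=90_000, ending=110_000)
print(ratio)  # 5.0
```

A ratio of 5.0 would mean the company sold through and replaced its average stock about five times during the year.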
What common analytics technique is used in the supply chain to predict whether the right products will be in stock on time?
Demand Forecasting
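Demand forecasting covers many methods. As one deliberately simple example, assumed here for illustration rather than taken from the course, a moving-average forecast predicts next period's demand as the mean of the last few periods and compares it with current stock. The sales history, stock level, and function name are hypothetical.

```python
def moving_average_forecast(history: list[float], window: int = 3) -> float:
    """Forecast next period's demand as the mean of the last `window` periods."""
    if window <= 0 or len(history) < window:
        raise ValueError("need at least `window` observations")
    recent = history[-window:]
    return sum(recent) / window


monthly_units = [120, 135, 128, 142, 150, 147]  # hypothetical monthly sales
on_hand = 130                                   # hypothetical current stock

forecast = moving_average_forecast(monthly_units, window=3)
print(f"forecast={forecast:.1f}, stockout risk={on_hand < forecast}")
# forecast=146.3, stockout risk=True
```

Real forecasting tools usually also account for trend and seasonality; the moving average is only a baseline.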
How does data help in education?
User feedback
What does the DIKW pyramid stand for?
Data
Information
Knowledge
Wisdom
This pyramid highlights the journey data takes in order to become valuable wisdom.
DIKW pyramid
What is the foundation of the DIKW pyramid?
Data
What is data without context called?
Raw Data
_________ is (organized) data with context.
Information
______ is a higher level of understanding than data and is created by adding context to data.
Information
_________ is information with meaning.
Knowledge
Transforming data to _______ is the hardest part of the entire pyramid.
Wisdom
Add more meaning to the information at hand and understand the relationships between each piece of information.
Transforming knowledge into wisdom
Allows us to make decisions and apply our knowledge to the world around us.
Wisdom
What is decision making?
Decision making is the process of making the right choices at the right time.
Data-driven decision making is a five-step process. Name the five steps.
01 - Ask Question
02 - Gather Data
03 - Prepare Data
04 - Conduct Analysis
05 - Make Decision
The journey of a data-driven process starts with ________________.
Identifying the question you are looking to answer.
A good question will
Outline exactly what you are looking to answer
Prevent scope creep
Ensure success throughout the rest of the process
Collecting data is finding out _____________.
where you should source your data
Preparing data can mean many things:
Cleaning bad data into good data
Arranging data into the expected structure for analysis
In some cases, the ___________ phase can be the most cumbersome, taking up to 80% of the overall time for the entire decision-making process.
data preparation
This step is critical because it is what transforms our data into something we can make decisions with.
Analyzing data
This step ultimately involves interpreting the results and making a decision.
Making decisions
This whole process is also iterative in nature.
Making decisions
In order to deal with an overwhelming amount of data, what should you do?
Summarize the data into smaller pieces of information to make informed decisions.
What translates raw data into summaries that are easier to understand?
Aggregations
Common aggregations:
Simple average (mean)
Totals aka sums
Minimums and Maximums
Modes
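The common aggregations listed above can be computed with Python's built-ins and the standard `statistics` module. The order values below are made up for illustration.

```python
from statistics import mean, mode

# Hypothetical order amounts for one day
order_values = [20, 35, 20, 50, 15, 20, 50]

summary = {
    "mean": mean(order_values),   # simple average
    "total": sum(order_values),   # sum
    "min": min(order_values),     # minimum
    "max": max(order_values),     # maximum
    "mode": mode(order_values),   # most frequent value
}
print(summary)
```

Seven raw numbers collapse into five summary figures (average 30, total 210, range 15 to 50, most common order 20), which is the point of aggregating.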
Aggregations allow you to focus on
A specific attribute of a dataset
Aggregations appear in many ways throughout organizations:
Metrics
Benchmarks
KPIs (Key Performance Indicators)
The field of ________ is responsible for overseeing and coordinating all the subdomains into one unifying structure
Data Management
seeks to ensure that data is consistent, trustworthy, and isn’t misused.
Data Governance
Ensures that data is accurate, valid, complete, and consistent.
Data Quality
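A minimal sketch of automated data quality checks for three of the four dimensions: completeness (no missing values), validity (values within an allowed range), and consistency (the same format everywhere). Accuracy usually needs a trusted reference to compare against, so it is not checked here. The records, the 0 to 120 age range, and the upper-case country-code rule are hypothetical.

```python
# Hypothetical customer records with deliberate quality problems
records = [
    {"id": 1, "age": 34, "country": "US"},
    {"id": 2, "age": None, "country": "us"},
    {"id": 3, "age": -5, "country": "CA"},
]


def quality_issues(rows: list[dict]) -> list[tuple[int, str]]:
    """Return (id, problem) pairs for completeness, validity, and consistency checks."""
    issues = []
    for row in rows:
        if row["age"] is None:
            issues.append((row["id"], "incomplete: missing age"))
        elif not 0 <= row["age"] <= 120:  # assumed valid range
            issues.append((row["id"], "invalid: age outside 0-120"))
        if row["country"] != row["country"].upper():  # assumed format rule
            issues.append((row["id"], "inconsistent: country code not upper-case"))
    return issues


for issue in quality_issues(records):
    print(issue)
```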
go hand in hand to oversee data access, use and protection.
Data Privacy and Security
Ensures that data is collected, stored, and used ethically
Data ethics
Principles of data ethics:
- Permission for data collection
- Transparency about the plan
- Privacy of data
- Good Intentions
- Consider the outcome
Asking for user consent before collecting data. Users are in control of their data.
Permission
Being transparent about how you plan to collect, store, and use data
Transparency
Lack of transparency may lead to reputational and legal damage.
True
Refers to secluding (information about) yourself
Privacy
Requires individuals to be in control of how their data is collected and used.
Data Privacy
What does PII mean?
Personally Identifiable Information
Individual Responsibilities for Privacy Protection:
Strong Passwords
Up-to-date operating systems
Cautious Internet Browsing
How to prevent data breaches:
Limit sharing sensitive data
Data pseudonymization and anonymization
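One way to pseudonymize data, sketched here as an assumption rather than a prescribed method, is to replace a direct identifier such as an email address with a keyed hash (HMAC-SHA256). The same person always maps to the same token, so records can still be linked for analysis, but the original value cannot be recovered without the secret key. This is pseudonymization, not full anonymization: whoever holds the key can still re-identify people. The key, field names, and records are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be stored separately from the data
SECRET_KEY = b"replace-with-a-secret-stored-separately"


def pseudonymize(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Replace a direct identifier with a shortened keyed hash (HMAC-SHA256)."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


# Hypothetical records containing PII (email addresses)
records = [
    {"email": "ana@example.com", "purchase": 42.0},
    {"email": "ben@example.com", "purchase": 17.5},
]

# Share only the pseudonymous ID and the non-identifying fields
shared = [
    {"customer_id": pseudonymize(r["email"]), "purchase": r["purchase"]}
    for r in records
]
print(shared)
```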
Data is collected for the right reasons
Question yourself about the reasons you collect data
Intentions
Are there consequences of my actions?
Protecting vulnerable populations
Outcomes
is a branch of ethics that deals with the moral problems related to data. It is a code of behavior that specifies what is right and wrong in the handling of data.
Data ethics
a framework to regulate data from its collection to its use, analysis, and disposal.
Data life cycle
What are the steps in the data life cycle?
Planning and Collecting
Storing and Managing
Cleaning and Processing
Analyzing and visualizing
Sharing
Archiving/destroying
Why is data life cycle important?
Ensure data is regulated responsibly
Identify potential areas for improvement
Improve efficiency and effectiveness of operations
Which stage of the data life cycle focuses on the sharing of roles and responsibilities?
Plan and collect stage
Which stage of the data life cycle is where you prepare a business question that answers the needs of your stakeholders?
Plan and collect stage
Which stage of the data life cycle seeks to achieve optimal results in terms of time and cost?
Plan and collect stage
Which stage of the data life cycle also focuses on collecting or creating data?
Plan and collect stage
Which stage of the data life cycle manages data stored in databases or data warehouses?
Store and Manage stage
Which stage of the data life cycle ensures that data is easily accessible to the right people and can be managed over time?
Store and Manage stage
Which stage of the data life cycle includes the removal of PII?
Store and Manage stage
How do you clean and process data before proper data analysis?
Formatting data
Dealing with missing values or errors
Transforming data into a more usable form
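A minimal sketch of the three cleaning steps on a tiny, made-up export: formatting (trimming whitespace, fixing name casing, parsing text into numbers and dates), dealing with missing or erroneous values (here filled with the mean of the known values, which is only one of several reasonable choices), and transforming data into a more usable form (deriving a signup month). All names and values are hypothetical.

```python
from datetime import date
from typing import Optional

# Hypothetical raw export with inconsistent formatting and missing values
raw_rows = [
    {"name": "  alice ", "signup": "2023-01-05", "spend": "120.50"},
    {"name": "BOB", "signup": "2023-02-11", "spend": ""},
    {"name": "carol", "signup": "2023-03-02", "spend": "n/a"},
    {"name": "Dave", "signup": "2023-04-20", "spend": "79.50"},
]


def parse_spend(value: str) -> Optional[float]:
    """Formatting: convert text to a number, or None if missing or unparseable."""
    try:
        return float(value)
    except ValueError:
        return None


def clean(rows: list[dict]) -> list[dict]:
    cleaned = [
        {
            "name": row["name"].strip().title(),         # fix whitespace and casing
            "signup": date.fromisoformat(row["signup"]),  # text -> date object
            "spend": parse_spend(row["spend"]),
        }
        for row in rows
    ]
    # Missing values: fill with the mean of the known values (one option among many)
    known = [r["spend"] for r in cleaned if r["spend"] is not None]
    fill_value = sum(known) / len(known) if known else 0.0
    for r in cleaned:
        if r["spend"] is None:
            r["spend"] = fill_value
        # Transformation: derive a field that is easier to group by during analysis
        r["signup_month"] = r["signup"].strftime("%Y-%m")
    return cleaned


for row in clean(raw_rows):
    print(row)
```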
Which stage of the data life cycle focuses on analyzing raw data for new insights?
Analyze and Visualize stage
Data is easier to interpret when visualized
True
Which stage of the data life cycle focuses on communicating your results with stakeholders?
Share stage
Examples of sharing insights
Dashboards, reports, papers
Which stage of the data life cycle focuses on when to keep or delete data?
Archive or destroy stage
Data archiving can be done through the following:
Data Backups
Documentation
Digitizing
Data destruction is done only in rare cases. Why is it done?
To protect private information
To free up resources
Common mistakes about data:
Not having a clear goal or question
Insufficient or wrong data
Lack of appropriate analysis
No clear communication of results
The data sample doesn’t represent all the data.
Data Bias
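A toy illustration of data bias, using made-up numbers: when a sample covers only part of the population, its summary statistics can be badly wrong. Here only urban customers are sampled, so the estimated average spend is 100 while the true population average is 70. A sample that includes both regions in the same proportion as the population recovers the correct value.

```python
from statistics import mean

# Hypothetical population: monthly spend for 5 urban and 5 rural customers
population = [("urban", 100)] * 5 + [("rural", 40)] * 5

true_average = mean(spend for _, spend in population)

# Biased sample: only urban customers were surveyed
biased_sample = [spend for region, spend in population if region == "urban"]

# Representative sample: both regions in the same proportion as the population
balanced_sample = [spend for _, spend in population[3:7]]

print(true_average, mean(biased_sample), mean(balanced_sample))
```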