Data Resource Management Flashcards
the practices for achieving consistent access to and delivery of data across the spectrum of data subject areas and data structure types in the enterprise
data management
processed, organized, and structured data
information
often used to query tables
Structured Query Language (SQL)
concerned with data policies, data procedures, access control, backup and recovery, and data classification standards to govern business-critical data
data governance
looks for patterns and anomalies in data and then tries to discover meaningful patterns
Data discovery, or data mining
a broad range of tools and practices that aims to support better strategic business decision-making and even claims to predict the future
business intelligence (BI)
store large amounts of unstructured data in their raw form and allow for flexible analysis
data lakes
well-thought-out collections of computer files that are storehouses of data for use by managers in making decisions
database
are where a database holds data
tables
rows
records
columns
fields
questioned
queried
an application software that is used to create a collection of related files that consist of records of data separated by fields that can be queried to produce populations of information
Database Management Systems (DBMS)
A field in a database table that uniquely identifies a record in the table
Primary key
A field in a database table that provides a link between two tables in a relational database
Foreign key
The organization or layout of a database that defines the tables, fields and constraints, keys, and integrity of the database
Schema
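The primary key, foreign key, and schema cards above can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The departments and employees tables and every column name are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical two-table schema; all names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE departments (
        dept_id INTEGER PRIMARY KEY,   -- primary key: uniquely identifies each record
        name    TEXT NOT NULL
    );
    CREATE TABLE employees (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER NOT NULL REFERENCES departments(dept_id)  -- foreign key: links the two tables
    );
""")
conn.execute("INSERT INTO departments VALUES (1, 'Finance')")
conn.execute("INSERT INTO employees VALUES (10, 'Ada', 1)")


def try_insert(sql):
    """Return True if the insert succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# A second department with dept_id 1 violates primary key uniqueness.
duplicate_key_ok = try_insert("INSERT INTO departments VALUES (1, 'Sales')")
# An employee pointing at a department that does not exist violates the foreign key.
orphan_row_ok = try_insert("INSERT INTO employees VALUES (11, 'Bo', 99)")

# The foreign key lets a query join the two tables.
rows = conn.execute(
    "SELECT e.name, d.name FROM employees e "
    "JOIN departments d ON e.dept_id = d.dept_id"
).fetchall()
```

Both rejected inserts show the constraints that the schema defines doing their job: the database itself refuses records that would break uniqueness or leave a link pointing nowhere.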
data that is collected from all over the internet and other data sources
big data
4 Vs of Big Data
- volume
- variety
- veracity
- velocity
amount of data
Volume
form of the data
Variety
quality of data
Veracity
speed at which the data is created
Velocity
the examination of huge sets of data to find patterns, connections, outliers, and hidden relationships
data mining (data discovery)
resides in fixed formats
structured data
unorganized data that cannot be easily read or processed by a computer because it is not stored in rows and columns like traditional data tables
Unstructured data
Data that can be converted into structured data with a lot of work
Semi-structured data
set of software that allows businesses to gather a large amount of data and use it to make business decisions based on what they find
Data mining tools
are used to consolidate disparate data in a central location
Data warehouses
are one trillion terabytes of data
Yottabytes
one thousand gigabytes
terabyte
are smaller and the systems are designed to support the needs of that specific department
Data mart data sets
two main big data tools
ETL & Hadoop
What does ETL stand for?
- Extract
- Transform
- Load
describes tools that are used to standardize data across systems and allows the data to be queried
data integration process (type of software)
ETL
a tool that lets you ask your data questions that in turn lead to answers and assist in making decisions
Querying
Where are data often extracted from?
Big Data Tools
CRM or ERP systems
What is the next step after finding out where the data is coming from?
Big Data Tools
extract
What is the next step after extracting data?
Big Data Tools
Transform
may involve removing decimals and dollar signs from financial transactions so it will fit into the structured data table
Big Data Tools
transform
What is the next step after transforming data?
Big Data Tools
Load
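The extract, transform, and load steps above can be sketched end to end with the Python standard library. This is only an illustration under stated assumptions: the CSV text stands in for a hypothetical CRM export, the sales table and its columns are invented, and storing amounts as whole cents is one possible way to "remove decimals and dollar signs" so the values fit a structured table.

```python
import csv
import io
import sqlite3
from decimal import Decimal

# Hypothetical export from a CRM system (assumption for illustration).
RAW_CSV = 'customer,amount\nAcme,"$1,200.50"\nGlobex,$75\n'


def extract(text):
    """Extract: read the raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: strip dollar signs and commas, store amounts as whole cents."""
    cleaned = []
    for row in rows:
        amount = Decimal(row["amount"].replace("$", "").replace(",", ""))
        cleaned.append((row["customer"].strip(), int(amount * 100)))
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into a structured table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount_cents INTEGER)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
total_cents = conn.execute("SELECT SUM(amount_cents) FROM sales").fetchone()[0]
```

Once loaded, the data can be queried like any other table, which is the point of standardizing it first.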
The second main big data tool set
Big Data Tools
Hadoop
an infrastructure for storing and processing large sets of data across multiple servers
Big Data Tools (type of software)
Hadoop
designed to handle unstructured and semi-structured data, which traditional databases may struggle with
Big Data Tools
Hadoop
uses a distributed file system that allows files to be stored on multiple servers
Big Data Tools
Hadoop
An alternative to Hadoop that has recently been gaining wider adoption
Apache Spark
the first step in data output
Sourcing data
the most widely used standard computer language for relational databases, as it allows a programmer to manipulate and query data
SQL
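As a small illustration of manipulating and querying data with SQL, the sketch below runs a few statements through Python's built-in sqlite3 module. The orders table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table used only for this example.
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (region, total) VALUES (?, ?)",
    [("East", 120.0), ("West", 80.0), ("East", 45.5)],
)

# Manipulate: change existing records.
conn.execute("UPDATE orders SET total = total + 10 WHERE region = 'West'")

# Query: count the orders and sum the totals for each region.
summary = conn.execute(
    "SELECT region, COUNT(*), SUM(total) FROM orders "
    "GROUP BY region ORDER BY region"
).fetchall()
```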
One commonly used tool for data output is software called…
Tableau
produces interactive data visualizations focused on business intelligence
Tableau
helps with simplifying raw data into information using different formats such as graphs, charts, and numerical analysis
Tableau
Which type of data is typically associated with social media posts?
unstructured data
What does the term variety refer to in the context of generating and collecting big data?
Forms of data
Which restriction applies to data located in the primary key field of a database?
Each key must be unique
Which tool can a data analyst use to collect, process, and analyze unstructured data for storage in a company’s data warehouse?
type of data process (software)
Extract, Transform, Load (ETL) software
Which process can a data analyst use to identify useful patterns and hidden relationships in a large set of social media data?
Data mining
Which software tool is appropriate for a business analyst to use when creating visualizations to present social media data as business intelligence to an executive team?
Tableau
a business intelligence software used for creating interactive and visually appealing dashboards and visualizations that can be used to present data insights.
Tableau
Which approach should the company use to reduce the time associated with data management and better support the needs of individual departments, given that only specific departments are using 20% of the company’s data warehouse capacity?
Using a data mart
a smaller and more targeted version of a data warehouse, designed to meet the specific needs of a department or business unit
Data Mart
can be defined as acquiring data, ensuring the data are valid, and then storing and processing the data into usable information for a business
Data management processes
used to describe the process of transforming data into an accurate, clean, and error-free form
scrubbing the data
Three Steps for Collecting Data
- determine purpose and reason for obtaining data
- develop business-related questions
- determine tools to acquire data
having a good plan for organization and ensuring the integrity of your data
master data management (MDM)
a methodology or process used to define, organize, and manage all the data of an organization that provides a reference for decision-making
Master data management (MDM)
is managing the availability, integrity, and security of the data to ensure that the data remain high quality and valid for data analytics
Data Governance
Clean data starts when the database is created by including database field (column) controls
validity checks
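Field-level validity checks can be declared when a table is created. The sketch below uses SQLite CHECK constraints through Python's sqlite3 module; the customers table, the very loose email pattern, and the 0 to 120 age range are assumptions made for illustration, not rules from the course material.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: each column carries its own validity rule.
conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        email       TEXT NOT NULL CHECK (email LIKE '%_@_%'),
        age         INTEGER CHECK (age BETWEEN 0 AND 120)
    )
""")


def is_accepted(email, age):
    """Return True if the database accepts the record, False if a check rejects it."""
    try:
        conn.execute(
            "INSERT INTO customers (email, age) VALUES (?, ?)", (email, age)
        )
        return True
    except sqlite3.IntegrityError:
        return False


valid_ok = is_accepted("ada@example.com", 36)
bad_email_ok = is_accepted("not-an-email", 36)   # no "@" in the value
bad_age_ok = is_accepted("bo@example.com", 300)  # outside the allowed range
```

Putting the rule in the column definition means bad values are rejected at entry, before they need scrubbing later.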
requires the whole organization to buy into being stakeholders of the data, not just the database administrators or the programmers or the executives
Data governance
measures the gain or loss generated by intelligent data management relative to the amount of money invested
return on investment (ROI)
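One common way to express ROI as a number is (amount returned - amount invested) / amount invested. The card above does not give a formula, so treat this as an assumed formulation; the dollar figures are invented for the example.

```python
def roi(amount_returned, amount_invested):
    """Return ROI as a fraction: positive for a gain, negative for a loss."""
    return (amount_returned - amount_invested) / amount_invested


# Hypothetical figures: 50,000 invested in a data management project.
gain_case = roi(65_000, 50_000)  # project returned more than it cost
loss_case = roi(40_000, 50_000)  # project returned less than it cost
```

Here the gain case works out to 0.3 (a 30% return) and the loss case to -0.2 (a 20% loss).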
What is the purpose of data governance in an organization?
Manage and improve the quality of data
manage and improve the quality of data across an organization, ensuring it is accurate, complete, and consistent
data governance
involves establishing policies, procedures, and controls to ensure data integrity and reliability
data governance
Which function is included within the scope of data governance?
Maintaining updated data
The term that encompasses patterns, correlations, and hidden data relationships
data relationships
methodology of reviewing raw data using qualitative and quantitative methods
data analytics
It looks for patterns and hidden information to exploit for enhanced productivity and business success
data analytics
Benefits of Data Management
- find data relationships
- predictive analytics
- business intelligence
- data analysis
helps organizations make better decisions
business intelligence
a database technology that has been optimized for querying and reporting, instead of processing transactions
type of processing
Online Analytical Processing (OLAP)
are designed to speed up the retrieval of data
OLAP databases
is applying statistics and logic techniques to define, illustrate, and evaluate data
Data analysis
attempts to make sense of an organization’s collected data, turn those data into useful information, and validate the organization’s future decisions
data analysis
enables you to sift through large sets of data and identify the most common and most important topics in an easy, fast, and scalable way
Topic analytics
is the process of extracting information from written sources such as websites, e-books, and emails and inserting the data into a database to evaluate and interpret relevance or to understand customers’ feedback on products and services
Text analytics (text mining)
attempts to make connections between data so organizations can try to predict future trends that may give them a competitive advantage
Business analytics
builds on predictive analysis to make decisions about future industries and marketplaces
forms of business analytics
Prescriptive analytics
attempts to reveal future patterns in a marketplace, essentially trying to predict the future by looking for data correlations between one thing and any other things that pertain to it
forms of business analytics
Predictive analytics
defines past data you already have that can be grouped into significant pieces, like a department’s sales results, and starts to reveal trends
forms of business analytics
Descriptive analytics
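As a rough sketch of descriptive analytics, the snippet below groups past sales by department and compares two quarters to reveal a simple trend. The departments, quarters, and amounts are made up for illustration.

```python
from collections import defaultdict

# Hypothetical past sales records: (department, quarter, amount).
sales = [
    ("Hardware", "Q1", 100),
    ("Hardware", "Q2", 130),
    ("Software", "Q1", 200),
    ("Software", "Q2", 180),
]

# Group the data into significant pieces: totals per department per quarter.
totals = defaultdict(dict)
for dept, quarter, amount in sales:
    totals[dept][quarter] = totals[dept].get(quarter, 0) + amount

# A simple trend: change in sales from Q1 to Q2 for each department.
trend = {dept: by_quarter["Q2"] - by_quarter["Q1"] for dept, by_quarter in totals.items()}
```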
looks at an organization’s internal data, analyzes external conditions like supply abundance, and endorses the best action
forms of business analytics
decision analytics
A company wants to improve its marketing strategies by analyzing customer data.
What is the purpose of data mining in this context?
To identify patterns and correlations in the data
A data analyst wants to analyze social media posts to discover patterns in customer behavior and sentiments.
What type of analytics is suitable for this task?
Text analytics
As the volume of data continues to grow exponentially, businesses face the challenge of managing diverse data types (structured, semi-structured, and unstructured) and processing them in real time
Challenges of Data Analytics and Business Intelligence
Handling the volume, variety, and velocity of data
Modern data analytics and business intelligence solutions increasingly rely on AI and machine learning algorithms to extract insights, make predictions, and automate decision-making
Challenges of Data Analytics and Business Intelligence
Incorporating AI and machine learning
As data volume and complexity grow, businesses must ensure that their data analytics and business intelligence solutions are scalable, capable of handling increased workloads and adapting to evolving needs
Challenges of Data Analytics and Business Intelligence
Scalability
With increasing data protection regulations, such as GDPR and CCPA, businesses must ensure they handle data securely and comply with relevant legislation
Challenges of Data Analytics and Business Intelligence
Data privacy and compliance
Effective data analytics and business intelligence initiatives require collaboration between different stakeholders, including data scientists, analysts, IT professionals, and business users
Challenges of Data Analytics and Business Intelligence
Collaboration and communication
Empowering business users with self-service analytics tools and easy access to data can improve decision-making across the organization
Challenges of Data Analytics and Business Intelligence
Democratizing data access
addresses the intangible values of data loss or a decrease in operating efficiencies
qualitative ROI
where businesses implement processes to protect the actual data from getting stolen or tampered with in the database computers
Data-level security
encrypting the data so that only those with authorized access know how to decrypt it
encryption
protecting the hardware that the database resides on and other communications equipment from malicious software that tries to enter the system
System-level security
starts with log-on IDs and passwords but can go further in verification to restrict the user from visiting unauthorized websites or downloading from untrusted sources
User-level security
Which level of security protects the hardware that a database resides on?
System-level security
What is the meaning of return on investment (ROI) based on qualitative investments in an organization?
Earnings from investments in intangible assets that are difficult to quantify but result in positive outcomes
basic processing technique used to determine counts of information from a database
OLAP
will reduce redundancy in the database
normalization
results in putting data into a consistent structure
scrubbing the data
are not located on a physical server within a corporation
type of database
cloud databases
is the field in a database that links two tables together
foreign key
the process of retrieving data to load into the database
extraction
Which process should a data analyst use to remove missing, misplaced, or duplicate data from a dataset?
Normalization
is the process of removing redundancies in data
normalization
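Removing redundancy can be pictured as splitting one repetitive table into two linked tables. The sketch below does this in plain Python under simple assumptions: the flat order rows and field names are hypothetical, and a customer is identified by name and city together.

```python
# Hypothetical flat data: customer details are repeated on every order.
flat_orders = [
    {"order_id": 1, "customer": "Acme", "customer_city": "Denver", "total": 120},
    {"order_id": 2, "customer": "Acme", "customer_city": "Denver", "total": 80},
    {"order_id": 3, "customer": "Globex", "customer_city": "Austin", "total": 45},
]


def normalize(rows):
    """Split flat rows into a customer table and an order table linked by customer_id."""
    customers = {}  # (name, city) -> customer_id
    orders = []
    for row in rows:
        key = (row["customer"], row["customer_city"])
        # Assign the next id only the first time a customer is seen.
        customer_id = customers.setdefault(key, len(customers) + 1)
        orders.append(
            {"order_id": row["order_id"], "customer_id": customer_id, "total": row["total"]}
        )
    customer_table = [
        {"customer_id": cid, "name": name, "city": city}
        for (name, city), cid in customers.items()
    ]
    return customer_table, orders


customer_table, orders = normalize(flat_orders)
```

Each customer's details are now stored once, and the orders refer to them through customer_id, which plays the role of a foreign key.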
provide a visual representation of data, making it easy to see patterns, trends, and relationships
spreadsheets
used for tracking inventory, project management, budgeting, and various other tasks that require data management
spreadsheets
the average value of a dataset
Mean
can be used to calculate the mean of a range of values
AVERAGE function
middle value of a dataset
Median
can be used to calculate the median of a range of values
MEDIAN function
is the value that occurs most frequently in a dataset
Mode
used to calculate the mode of a range of values
MODE function
a measure of the amount of variation or dispersion in a dataset
Standard Deviation
can be used to calculate the standard deviation of a range of values
STDEV function
the lowest and highest values in a dataset
Minimum and Maximum
can be used to calculate the minimum and maximum values of a range of values
MIN and MAX functions
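The same summary statistics can be computed outside a spreadsheet. The sketch below uses Python's standard statistics module on a made-up list of values; statistics.stdev computes the sample standard deviation, which is what Excel's STDEV function also calculates.

```python
import statistics

# Hypothetical dataset used only for illustration.
values = [4, 8, 8, 10, 15]

mean_value = statistics.mean(values)      # like AVERAGE
median_value = statistics.median(values)  # like MEDIAN
mode_value = statistics.mode(values)      # like MODE
stdev_value = statistics.stdev(values)    # like STDEV (sample standard deviation)
lowest, highest = min(values), max(values)  # like MIN and MAX
```

For this list the mean is 9, the median and mode are both 8, the sample standard deviation is 4, and the minimum and maximum are 4 and 15.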
What describes an argument in an Excel “IF” statement?
A value used to determine the outcome
Why is it important to conduct data hygiene practices?
Because data become decayed and outdated
An analyst uses software to analyze data in a company’s data warehouse and produce information presented in understandable charts and graphs on a dashboard. This information is used to inform decisions in the organization.
Which software is used to conduct this data mining for presentation of the information?
Business intelligence
is used to simplify raw data into different formats that can be understood using graphs, charts, and numerical analyses
Business intelligence software
Which term describes the field that provides a link between two tables in a relational database table?
Foreign key
is a field in a table that links to the primary key in a different table in a database
foreign key
Which level of security is required to protect the hardware that supports a database?
system-level
A data analyst is using the ETL process to enter data into a company’s relational database. The data contain many redundancies.
Which process transforms the data into an accurate, clean, and error-free form?
normalization
the process of removing redundancies in data; it can be part of the “transform” step of ETL
normalization