Unit 3 Flashcards
Primary Key
field that uniquely identifies a record. ex: student id or employee id
Foreign Key
Field that is in a relational table, that relates to a primary key of another table
Schema
Blueprint on how the database is constructed.
Big Data
Data come from everywhere—including smartphone metadata, internet usage records, social media activity, computer usage records, and countless other data sources—to be sifted for patterns and trends.
Four Vs of Big Data
Volume (amount), Variety (Various forms), Veracity (quality and trustworthiness), Velocity (speed).
Data mining/Data discovery
examination of huge sets of data to find patterns, connections, outliers and hidden relationships. It is a BI tool used for decision making.
Structed data
resides in fixed formats, well labeled, easily queried and searched.
unstructured data
is unorganized data(social media/big data) cannot be easily read or or processed by a computer. not stored in rows and columns
semi structured data
in between structured and unstructured data, can be read but takes work. example is email.
Big data tools
ETL and Hadoop
Extract
Once you have determined where your data is coming from and where you want it to reside, you can start extracting. usually comes from CRM or ERP
Transform
Once you have extracted data, they need to be transformed to fit into the database table. This may involve removing decimals and dollar signs from financial transactions so it will fit into the structured data table.
Load
Once data are transformed, they are ready to finally be transferred into the data warehouse and data mart. The more often this is done, the more up-to-date analytic reports can be.
Hadoop
an infrastructure for storing and processing large sets of data across multiple servers. Instead of centralized files in one place like a data warehouse or data mart, Hadoop uses a distributed file system that allows files to be stored on multiple servers
Which restriction applies to the data in the primary field of a database?
The primary key has to be unique
The data are full of missing, misplaced, or duplicate data, which the data analyst needs to remove.
Which process can this data analyst use to remove such data?
Normalization
A data analyst wants to use software to look for useful patterns and hidden relationships in this large set of social media data. Which process can be used to look for these patterns and relationships?
Data Mining
A business analyst wants to use the social media data to create and present business intelligence. She will create visualizations that will be used by the executive team. Which tool is appropriate for creating and presenting this business intelligence?
Tableau
scrubbing the data
the process of transforming data into an accurate, clean, and error-free form
Master Data Management (MDM)
is a methodology or process used to define, organize, and manage all the data of an organization that provides a reference for decision-making. Master data management tools can be used to support master data management by removing duplicates, standardizing data, and incorporating rules to prevent incorrect data from entering the system, thus creating an accurate source of master data
Data Governance
is managing the availability, integrity, and security of the data to ensure that the data remain high quality and valid for data analytics. Policies and procedures are established that define the data governance program, such as who has access, who has update capabilities, when and how backups are made and stored, and who administers the policies to ensure that they are followed
Data Management Process
can be defined as acquiring data, making sure the data is valid, and then storing and processing the data into useable information for a business
data management return on investment
(ROI)
value of having the data available to process to make key decisions. ROI is negatively impacted by improper data management
Benefits of data management
predictive analytics, BI, and data analysis
data analytics
patterns, correlations, and hidden data relationships.
Online analytical processing (OLAP)
end users can submit queries against the database to gain insight into data relations such as trend analysis and also to create data models that guide future decisions. A common use of OLAP is the creation of what-if scenarios for budgeting and forecasting.
Business intelligence (BI)
is a set of software and services that turn data into information that helps leaders in an organization make wise decisions.
Topic analytics
enables you to sift through large sets of data and identify the most common and most important topics in an easy, fast, and scalable way. For example, if a customer said, “the barista was friendly,” that would be categorized under the topic “Employee Friendliness.”
Text analytics
sometimes called text mining, is the process of extracting information from written sources such as websites, e-books, and emails and inserting the data into a database to evaluate and interpret relevance or to understand customers’ feedback on products and services.
Business analytics
attempts to make connections between data so organizations can try to predict future trends that may give them a competitive advantage. Business analytics can also uncover computer system inadequacies within an organization
A data analyst wants to search through unstructured data from social media posts to look for useful customer behavior patterns and sentiments. Which type of analytics is appropriate for this task?
Text analytics
descriptive analytics
is the baseline that other types of analytics are built on.
predictive analytics
attempts to reveal future patterns in a marketplace, essentially trying to predict the future by looking for data correlations between one thing and any other things that pertain to it.
=SUM(number1,[number2],…)
Adds all the values selected in the argument.
=AVERAGE(number1,[number2],…)
Calculates the average of all the values selected in the argument.
=MAX (number1,[number2],…)
Finds the maximum value of all the values selected in the argument.
=MIN (number1,[number2],…)
Finds the minimum value of all the values selected in the argument.
=COUNT(value1,[value2],…)
Counts the number of values selected in the argument that contain numbers
arguments
number1,[number2],…
The argument number1 is required and is the first number, cell reference, or range for which you want to calculate what the functions return. The argument [number2] and the rest are optional (use of square brackets indicates these additional arguments are optional), including any additional numbers, cell references, or ranges that you want to calculate what the functions return.
value1,[value2],…
The argument value1 is required and is an item, cell reference, or range. The argument [value2] and the rest are optional (use of square brackets indicates these additional arguments are optional), including any additional numbers, cell references, or ranges.
Boolean logic
is based on only two values, 0 and 1. In spreadsheets, the values 0 and 1 are represented by the values FALSE and TRUE, respectively. Boolean logic and Boolean functions deal with expressions that result in either the value TRUE or the value FALSE. No other outcome is allowed in Boolean formulas or functions.
IF function =IF(logical_test,[value_if_true],[value_if_false])
logical_test~ A test or logical comparison of value that is either TRUE or FALSE.
[value_if_true] ~Used to specify the result of the IF function if the Boolean expression result is TRUE.
[value_if_false] ~Used to specify the result of the IF function if the Boolean expression is FALSE.