Terms Flashcards
ACID Test
A test applied to data for atomicity, consistency, isolation and durability.
Aggregation
A process of searching, gathering and presenting data.
Algorithm
A mathematical formula or statistical process used to perform analysis of data.
Anomaly Detection
The process of identifying rare or unexpected items or events in a dataset that do not conform to other items in the dataset and do not match a projected pattern or expected behavior. Anomalies are also called outliers, exceptions, surprises or contaminants and they often provide critical and actionable information.
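As an illustration only, one simple way to flag outliers is a z-score check: values that lie more than a chosen number of standard deviations from the mean are reported. The function name `find_anomalies`, the threshold of 2.0 and the sample readings are assumptions made for this sketch, not part of the definition, and real systems often use more robust methods.

```python
from statistics import mean, stdev

def find_anomalies(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds `threshold`.

    Hypothetical helper: the threshold is an assumed cut-off.
    """
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Made-up sensor readings with one obvious outlier.
readings = [10, 11, 9, 10, 12, 10, 11, 95]
print(find_anomalies(readings))
```

Because a single extreme value also inflates the mean and standard deviation, a z-score check can miss outliers in small samples; median-based rules are a common alternative.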
Big Data
Big data refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage and analyze.
Business Intelligence
The general term used for the identification, extraction and analysis of data.
Classification Analysis
A systematic process for obtaining important and relevant information about data (metadata) and assigning data to a particular group or class.
Clustering Analysis
The process of identifying objects that are similar to each other and clustering them in order to understand the differences as well as the similarities within the data.
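A minimal sketch of one clustering technique, k-means on one-dimensional values, is shown below. The function name `kmeans_1d`, the starting centers and the sample points are assumptions chosen for illustration; other clustering methods work differently.

```python
def kmeans_1d(points, centers, iterations=10):
    """Group 1-D points around the nearest center, then move each
    center to the mean of its group, repeating a fixed number of times."""
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            # Index of the center closest to this point.
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            groups[nearest].append(p)
        # An empty group keeps its previous center.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

# Made-up data with two visible groups; the initial centers are guesses.
centers, groups = kmeans_1d([1, 2, 3, 10, 11, 12], centers=[0, 5])
print(centers, groups)
```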
Columnar Database or Column-oriented Database
A database that stores data by column rather than by row. In a row-based database, a row might contain a name, address and phone number. In a column-oriented database, all names are in one column, addresses in another and so on. A key advantage of a columnar database is faster disk access for queries that read only a few columns, because the values of each column are stored together.
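The difference between the two layouts can be sketched with plain Python structures. This is only an in-memory analogy, not how any particular database stores data on disk; the helper `to_columns` and the sample records are hypothetical.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"name": "Ada", "address": "1 Main St", "phone": "555-0100"},
    {"name": "Bo", "address": "2 Oak Ave", "phone": "555-0101"},
]

def to_columns(rows):
    """Pivot a list of row dicts into one list per column."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns

# Column-oriented layout: all names sit together, all addresses together.
columns = to_columns(rows)

# Reading every name touches a single list instead of every record.
names = columns["name"]
print(names)
```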
Comparative Analysis
Data analysis that compares two or more data sets or processes to detect patterns within very large data sets.
Correlation Analysis
A means to determine a statistical relationship between variables, often for the purpose of identifying predictive factors among the variables. It includes techniques for quantifying the strength of the linear relationship between two variables.
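One common measure of linear relationship strength is the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below computes it by hand; the function name `pearson_r` and the sample numbers are assumptions for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / (spread_x * spread_y)

# Made-up example: the second variable is exactly double the first.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```

A value near 1 or -1 indicates a strong linear relationship; a value near 0 indicates a weak one, although the variables may still be related in a non-linear way.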
Dashboard
A graphical representation of analyses performed by algorithms.
Data
A quantitative or qualitative value. Common types of data include sales figures, marketing research results, readings from monitoring equipment, user actions on a website, market growth projections, demographic information and customer lists.
Data Aggregation
The process of collecting data from multiple sources for the purpose of reporting or analysis.
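As a small sketch, aggregation across sources might look like summing amounts per region from two feeds. The source names, the regions and the helper `aggregate_totals` are invented for this example.

```python
from collections import defaultdict

# Two hypothetical sources reporting (region, amount) pairs.
store_sales = [("north", 120.0), ("south", 80.0)]
web_sales = [("north", 30.0), ("west", 50.0)]

def aggregate_totals(*sources):
    """Combine any number of sources into one total per region."""
    totals = defaultdict(float)
    for source in sources:
        for region, amount in source:
            totals[region] += amount
    return dict(totals)

print(aggregate_totals(store_sales, web_sales))
```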
Data Analyst
A person responsible for the tasks of modelling, preparing and cleaning data for the purpose of deriving actionable information from it.
Data Analytics
Behavioral Analytics: Using data about people’s behavior to understand intent and predict future actions.
Descriptive Analytics: Condensing big numbers into smaller pieces of information. This is similar to summarizing the data story. Rather than listing every single number and detail, there is a general thrust and narrative.
Diagnostic Analytics: Reviewing past performance to determine what happened and why. Businesses use this type of analytics to complete root cause analysis.
Predictive Analytics: Using statistical functions on one or more data sets to predict trends or future events. In big data predictive analytics, data scientists may use advanced techniques like data mining, machine learning and advanced statistical processes to study recent and historical data to make predictions about the future. It can be used to forecast weather or to predict what people are likely to buy, visit or do, or how they may behave, in the near future.
Prescriptive Analytics: Building on predictive analytics by recommending actions and supporting data-driven decisions based on the likely impact of each action.
Data Architecture and Design
How enterprise data is structured. The actual structure or design varies depending on the eventual end result required. Data architecture has three stages or processes: (1) conceptual representation of business entities, (2) the logical representation of the relationships among those entities and (3) the physical construction of the system to support the functionality.
Data Cleansing
The process of reviewing and revising data to delete duplicate entries, correct misspellings and other errors, add missing data and provide consistency.
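The steps in this definition can be sketched on a small list of records: normalizing formatting for consistency, filling a missing field and dropping duplicates. The field names, the `"unknown"` placeholder and the function `cleanse` are assumptions for illustration; real cleansing rules depend on the data.

```python
def cleanse(records, default_country="unknown"):
    """Normalize names and emails, fill missing countries, drop duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        # Consistency: collapse extra whitespace and standardize case.
        name = " ".join((rec.get("name") or "").split()).title()
        email = (rec.get("email") or "").strip().lower()
        # Missing data: fall back to an assumed placeholder value.
        country = rec.get("country") or default_country
        key = (name, email)
        if key in seen:
            continue  # duplicate entry after normalization
        seen.add(key)
        cleaned.append({"name": name, "email": email, "country": country})
    return cleaned

# Made-up records: the first two are the same person typed differently.
records = [
    {"name": "  ada   lovelace ", "email": "ADA@Example.com", "country": "UK"},
    {"name": "Ada Lovelace", "email": "ada@example.com ", "country": "UK"},
    {"name": "alan turing", "email": "alan@example.com", "country": None},
]
print(cleanse(records))
```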
Data Integration
The process of combining data from different sources and presenting it in a single view.
Data Modelling
The process of defining the structure of data. A data model helps functional and technical people communicate about the data needed for business processes, and helps application development team members communicate a plan for how data is stored and accessed.
ETL (Extract, Transform and Load)
The process of extracting raw data, transforming it by cleaning and enriching it to fit operational needs, and loading it into the appropriate repository for the system’s use. Although ETL originated with data warehouses, ETL processes are also used when ingesting data from external sources in big data systems.
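A toy version of the three stages is sketched below using only the Python standard library: CSV text stands in for the raw source and an in-memory SQLite database stands in for the repository. The table name `sales`, the column names and the sample data are assumptions for this example.

```python
import csv
import io
import sqlite3

# Hypothetical raw input: stray whitespace and one missing amount.
RAW_CSV = "id,amount\n1, 10.5 \n2,\n3,7\n"

def extract(text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim whitespace, skip rows with no amount, cast types."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue
        cleaned.append((int(row["id"]), float(amount)))
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (id INTEGER PRIMARY KEY, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(total)
```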
Exploratory Analysis
An approach to data analysis focused on identifying general patterns in data, including outliers and features of the data that are not anticipated by the experimenter’s current knowledge or preconceptions. Exploratory data analysis (EDA) aims to uncover underlying structure, test assumptions, detect mistakes and understand relationships between variables.
Latency
Any delay in a response or delivery of data from one point to another.
Metadata
Data about data; it describes the data itself. For example, metadata can record where data points were collected.
Optimization Analysis
The process of finding optimal problem parameters subject to constraints. Optimization algorithms heuristically test a large number of parameter configurations in order to find an optimal result, determined by a characteristic function (also called a fitness function).
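A minimal sketch of this idea is random search: try many candidate parameter values inside the allowed range (the constraint) and keep the one with the best fitness score. The fitness function, the bounds, the trial count and the names used here are assumptions for illustration; practical optimizers are usually more sophisticated.

```python
import random

def fitness(x):
    """Hypothetical fitness function; its highest value is at x = 3."""
    return -(x - 3.0) ** 2

def random_search(fitness_fn, low, high, trials=2000, seed=0):
    """Sample `trials` candidates in [low, high] and keep the fittest."""
    rng = random.Random(seed)  # fixed seed keeps the run repeatable
    best_x = low
    best_score = fitness_fn(low)
    for _ in range(trials):
        x = rng.uniform(low, high)  # constraint: stay within the bounds
        score = fitness_fn(x)
        if score > best_score:
            best_x, best_score = x, score
    return best_x, best_score

best_x, best_score = random_search(fitness, low=0.0, high=10.0)
print(best_x, best_score)
```

Random search only approximates the optimum; more trials tighten the result at the cost of more fitness evaluations.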
Veracity
Ensuring that data used in analytics is correct and precise.
Visualization
A visual abstraction of data designed for the purpose of deriving meaning or communicating information more effectively. The visuals created are often complex, but they must remain understandable in order to convey the message of the data.
Pain Points
Pain points are specific problems faced by current or prospective customers in the marketplace. Pain points include any problems the customer may experience along their journey.