Exam 1 Terms Flashcards
Datasets that are too large and complex for businesses’ existing systems and traditional capabilities to capture, store, manage, and analyze.
Big Data
A data approach that attempts to assign each unit in a population into a few categories, potentially to help with predictions.
Classification
A data approach that attempts to divide individuals (like customers) into groups (or clusters) in a useful way.
Clustering
A data approach that attempts to discover associations between individuals based on transactions involving them.
Co-occurrence grouping
The process of evaluating data with the purpose of drawing conclusions to address business questions.
Data Analytics
Centralized repository of descriptions for all of the data attributes of the dataset.
Data dictionary
A data approach that attempts to reduce the amount of information that needs to be considered to focus on the most critical items.
Data reduction
A data approach that attempts to predict a relationship between two data items.
Link Prediction
A variable that predicts or explains another variable.
Predictor variable.
A data approach that attempts to characterize the “typical” behavior of an individual, group, or population by generating summary statistics about the data.
Profiling
A data approach that attempts to estimate or predict, for each unit, the numerical value of some variable using some type of statistical model.
Regression
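A minimal sketch of the regression idea, using only the standard library and made-up numbers (the data and function name are illustrative, not part of the card set):

```python
# Simple linear regression via ordinary least squares, sketched by hand.
def fit_line(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    intercept = my - slope * mx
    return slope, intercept

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]               # illustrative data: y = 2x exactly
slope, intercept = fit_line(x, y)
predicted = slope * 6 + intercept  # estimate the value for a new unit, x = 6
```

The fitted model (slope and intercept) is then used to estimate the numerical value of the response variable for new units.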
A variable that responds to, or is dependent on, another.
Response variable
A data approach that attempts to identify similar individuals based on data known about them.
Similarity matching
Data that are organized and reside in a fixed field within a record or a file.
Structured data
Data that do not adhere to a predefined data model in a tabular format.
Unstructured data
A system that records, processes, reports, and communicates the results of business transactions to provide financial and nonfinancial information for decision-making purposes.
Accounting information system
A special case of a primary key that exists in linking tables; it is made up of the primary keys of the two tables being linked.
Composite primary key
An information system for managing all interactions between the company and its current and potential customers.
Customer Relationship Management system (CRM)
A method for obtaining data if you do not have access to obtain the data directly yourself.
Data request form
Attributes that exist in relational databases that are neither primary nor foreign keys; they provide business information.
Descriptive attributes
A category of business management software that integrates applications from throughout the business into one system.
Enterprise Resource Planning system (ERP)
The extract, transform, and load process that is integral to mastering the data.
ETL
A means of storing data in one place, such as in an Excel spreadsheet, as opposed to storing the data in multiple tables, such as in a relational database.
Flat file
An attribute that exists in a relational database in order to carry out the relationship between two tables. It does not serve as the “unique identifier” for each record in its table.
Foreign Key
An information system for managing all interactions between the company and its current and potential employees.
Human Resource Management system (HRM)
The second step in the IMPACT cycle; it involves identifying and obtaining the data needed for solving the data analysis problem, as well as cleaning and preparing the data for analysis.
Mastering the data
An attribute that is required to exist in each table of a relational database and serves as the “unique identifier” for each record in a table.
Primary key
A means of storing data in order to ensure that the data are complete and not redundant, and to help enforce business rules, supporting communication and integration of business processes.
Relational Database
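Several of the cards above (primary key, foreign key, relational database) can be sketched with the standard library’s sqlite3 module; the table and column names below are made up for illustration:

```python
import sqlite3

# A hypothetical two-table schema showing a primary key, a foreign key,
# and a relationship enforced between the tables.
con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.execute("""CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,   -- unique identifier for each record
    name        TEXT)""")
con.execute("""CREATE TABLE sale (
    sale_id     INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(customer_id),  -- foreign key
    amount      REAL)""")
con.execute("INSERT INTO customer VALUES (1, 'Acme')")
con.execute("INSERT INTO sale VALUES (100, 1, 250.0)")
# Join the two tables through the primary-key/foreign-key relationship.
row = con.execute("""SELECT c.name, s.amount
                     FROM sale s JOIN customer c
                       ON s.customer_id = c.customer_id""").fetchone()
```

A flat file, by contrast, would hold the customer name redundantly on every sale row.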
An information system that helps manage all the company’s interactions with suppliers.
Supply Chain Management system (SCM)
The opposite of the null hypothesis, or a potential result that the analyst may expect.
Alternative hypothesis
The principle that in any large, randomly produced set of natural numbers, there is an expected distribution of the first, or leading, digit with 1 being the most common.
Benford’s Law
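The expected leading-digit distribution on this card follows directly from the formula P(d) = log10(1 + 1/d); a short stdlib sketch (the function names are illustrative):

```python
import math

# Expected frequency of each leading digit d (1-9) under Benford's Law.
def benford_expected(d: int) -> float:
    return math.log10(1 + 1 / d)

def leading_digit(n: float) -> int:
    # Strip any leading zeros and decimal point, then take the first digit.
    return int(str(abs(n)).lstrip("0.")[0])

expected = {d: benford_expected(d) for d in range(1, 10)}
# Digit 1 is the most common (~30.1%); digit 9 the least (~4.6%).
```

Auditors compare the observed leading-digit frequencies of a dataset (e.g., invoice amounts) against `expected` to flag possible fabrication.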
A data approach similar to regression, but used to test for cause and effect relationships between multiple variables.
Causal modeling
Technique used to mark the split between one class and another.
Decision boundaries
An information system that supports decision-making activity within a business by combining data and expertise to solve problems and perform calculations.
Decision support system
Tool used to divide data into smaller groups.
Decision tree
Procedures that summarize existing data to determine what has happened in the past.
Descriptive analytics
Procedures that explore the current data to determine why something has happened the way it has, typically comparing the data to a benchmark.
Diagnostic analytics
An interactive report showing the most important metrics to help users understand how a company or an organization is performing.
Digital Dashboard
A numerical value (1 or 0) used to represent categorical data in statistical analysis (e.g., 1 = attribute present, 0 = attribute absent).
Dummy variable
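A minimal sketch of dummy coding; the records and the coding scheme (1 = domestic, 0 = otherwise) are made up for illustration:

```python
# Encode a categorical attribute as a 0/1 dummy variable.
customers = [
    {"id": 1, "region": "domestic"},
    {"id": 2, "region": "foreign"},
    {"id": 3, "region": "domestic"},
]
dummies = [1 if c["region"] == "domestic" else 0 for c in customers]
```

The resulting 0/1 column can then enter a regression model alongside numeric predictors.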
Used in addition to statistical significance in statistical testing. Demonstrates the magnitude of the difference between groups.
Effect size
A measure of variability: the range of the middle 50% of the data, from the first quartile to the third quartile.
Interquartile range (IQR)
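The IQR can be computed with the standard library’s `statistics.quantiles`; the data below are made up for illustration:

```python
from statistics import quantiles

# The IQR is the spread of the middle 50% of the observations:
# third quartile minus first quartile.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
```

Because it ignores the top and bottom quarters of the data, the IQR is robust to outliers, unlike the range.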
A modeling error when the derived model too closely fits a limited set of data points.
Overfitting
Procedures used to generate a model that can be used to determine what is likely to happen in the future.
Predictive analytics
Procedures that work to identify the best possible options given constraints or changing conditions.
Prescriptive analytics
Describe the location, spread, shape, and dependence of a set of observations.
Summary Statistics
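The location and spread measures on this card map directly onto the standard library’s `statistics` module; the dataset is made up for illustration:

```python
import statistics as st

# Summary statistics describing location and spread of a small dataset.
data = [2, 4, 4, 4, 5, 5, 7, 9]
location = {"mean": st.mean(data),
            "median": st.median(data),
            "mode": st.mode(data)}
spread = {"range": max(data) - min(data),
          "stdev": st.pstdev(data)}   # population standard deviation
```

Profiling (above) is essentially the practice of generating statistics like these for a group or population.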
Approach used to learn more about the basic relationships between independent and dependent variables that are hypothesized to exist.
Supervised approach
A discriminating classifier defined by a separating hyperplane chosen to maximize the margin between the classes.
Support Vector Machines
A set of data used to assess the degree and strength of a predicted relationship established by the analysis of training data.
Test data
A predictive analytics technique used to predict future values based on past values of the same variable.
Time series analysis
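One of the simplest time-series techniques is a moving-average forecast: predict the next value as the mean of the last few observations. A sketch with made-up monthly data (the function name and window size are illustrative):

```python
# Naive time-series forecast: the next value is the mean of the
# last k observations (a simple moving average).
def moving_average_forecast(series, k=3):
    window = series[-k:]
    return sum(window) / len(window)

monthly_sales = [100, 110, 120, 130, 140, 150]
next_month = moving_average_forecast(monthly_sales)  # mean of last 3 values
```

Real time-series models (trend, seasonality, autoregression) build on the same idea of predicting future values from past values of the same variable.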
Existing data that have been manually evaluated and assigned a class, which assists in classifying the test data.
Training data
A modeling error when the derived model poorly fits a limited set of data points.
Underfitting
Approach used for data exploration looking for potential patterns of interest.
Unsupervised approach
A global standard for exchanging financial reporting information that uses XML.
XBRL
One way to categorize quantitative data, as opposed to discrete data; can take any value within a range. Examples: height or weight.
Continuous data
Made when the aim of your project is to declare or present your findings to an audience. Charts made after the data analysis has been completed.
Declarative visualizations
One way to categorize quantitative data, as opposed to continuous data; takes whole-number values, like points in a basketball game.
Discrete data
Made when the lines between steps “P” (perform test plan), “A” (address and refine results), and “C” (communicate results) are not as clearly divided as they are in a declarative visualization.
Exploratory visualization
The third most sophisticated type of data on the scale of nominal, ordinal, interval, and ratio. Quantitative data with meaningful differences between values but no meaningful zero.
Interval data
The least sophisticated type of data on the scale of nominal, ordinal, interval, and ratio. You can only count, group, or take a proportion. Examples: hair color, gender, and ethnic groups.
Nominal data
A type of distribution in which the median, mean, and mode are all equal, so half of all the observations fall below the mean and the other half fall above the mean.
Normal distribution
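The equal-mean-and-median property on this card can be checked with the standard library’s `statistics.NormalDist`; the mean and standard deviation below are made-up illustrative parameters:

```python
from statistics import NormalDist

# For a normal distribution the mean equals the median:
# exactly half of the probability mass lies below the mean.
dist = NormalDist(mu=50, sigma=10)
below_mean = dist.cdf(50)                    # cumulative probability at the mean
within_one_sd = dist.cdf(60) - dist.cdf(40)  # mass within one standard deviation
```

The symmetric bell shape is what makes `below_mean` exactly 0.5 and puts roughly 68% of observations within one standard deviation of the mean.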
The second most sophisticated type of data on the scale of nominal, ordinal, interval and ratio. Can be counted and categorized like nominal data. Gold, silver, and bronze medals. Includes ranking.
Ordinal data
The primary statistic used with qualitative data. Calculated by counting the number of items in a group, then dividing that number by the total.
Proportion
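The count-then-divide calculation on this card in one line, with made-up survey responses:

```python
# Proportion: count the items in a category, divide by the total count.
responses = ["yes", "no", "yes", "yes", "no"]
p_yes = responses.count("yes") / len(responses)
```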
Categorical data. All you can do with these data is count and group, and in some cases, rank.
Qualitative data
More complex than qualitative data. Can be further defined in 2 ways: interval and ratio. Can have mean, median, STD dev.
Quantitative data
The most sophisticated type of data on the scale of nominal, ordinal, interval, and ratio. Can be counted and grouped, the differences between data points are meaningful like interval data, and there is a meaningful zero, so ratios between values are meaningful.
Ratio data
A special case of the normal distribution used for standardizing data.
Standard normal distribution
The method used for comparing two datasets that follow the normal distribution. By using a formula, every normal distribution can be transformed into the standard normal distribution.
Standardization
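The transformation on this card is the z-score formula, z = (x − mean) / standard deviation; a stdlib sketch with made-up data (the function name is illustrative):

```python
from statistics import mean, pstdev

# Standardize observations into z-scores, so any normal distribution
# is transformed into the standard normal (mean 0, standard deviation 1).
def standardize(data):
    m, s = mean(data), pstdev(data)
    return [(x - m) / s for x in data]

z = standardize([2, 4, 6, 8])
```

After standardizing, two datasets that follow the normal distribution can be compared on the same scale.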