Exam 1 Terms Flashcards

1
Q

Datasets that are too large and complex for a business's existing systems to capture, store, manage, and analyze using traditional capabilities.

A

Big Data

2
Q

A data approach that attempts to assign each unit in a population into a few categories, potentially to help with predictions.

A

Classification

3
Q

A data approach that attempts to divide individuals (like customers) into groups (or clusters) in a useful way.

A

Clustering

4
Q

A data approach that attempts to discover associations between individuals based on transactions involving them.

A

Co-occurrence grouping

5
Q

The process of evaluating data with the purpose of drawing conclusions to address business questions.

A

Data Analytics

6
Q

Centralized repository of descriptions for all of the data attributes of the dataset.

A

Data dictionary

7
Q

A data approach that attempts to reduce the amount of information that needs to be considered to focus on the most critical items.

A

Data reduction

8
Q

A data approach that attempts to predict a relationship between two data items.

A

Link Prediction

9
Q

A variable that predicts or explains another variable.

A

Predictor variable

10
Q

A data approach that attempts to characterize the “typical” behavior of an individual, group, or population by generating summary statistics about the data.

A

Profiling

11
Q

A data approach that attempts to estimate or predict, for each unit, the numerical value of some variable using some type of statistical model.

A

Regression

12
Q

A variable that responds to, or is dependent on, another.

A

Response variable

13
Q

A data approach that attempts to identify similar individuals based on data known about them.

A

Similarity matching

14
Q

Data that are organized and reside in a fixed field within a record or a file.

A

Structured data

15
Q

Data that do not adhere to a predefined data model in a tabular format.

A

Unstructured data

16
Q

A system that records, processes, reports, and communicates the results of business transactions to provide financial and nonfinancial information for decision-making purposes.

A

Accounting information system

17
Q

A special case of a primary key that exists in linking tables. It is made up of the primary keys of the two tables that the linking table connects.

A

Composite primary key

18
Q

An information system for managing all interactions between the company and its current and potential customers.

A

Customer Relationship Management system (CRM)

19
Q

Centralized repository of descriptions for all of the data attributes of the dataset.

A

Data dictionary

20
Q

A method for obtaining data if you do not have access to obtain the data directly yourself.

A

Data request form

21
Q

Attributes that exist in relational databases that are neither primary nor foreign keys. They provide business information.

A

Descriptive attributes

22
Q

A category of business management software that integrates applications from throughout the business into one system.

A

Enterprise Resource Planning system (ERP)

23
Q

The extract, transform, and load process that is integral to mastering the data.

A

ETL

24
Q

A means of storing data in one place, such as in an Excel spreadsheet, as opposed to storing the data in multiple tables, such as in a relational database.

A

Flat file

25
An attribute that exists in a relational database in order to carry out the relationship between two tables. It does not serve as the "unique identifier" for the table in which it appears.
Foreign Key
26
An information system for managing all interactions between the company and its current and potential employees.
Human Resource Management system (HRM)
27
The second step in the IMPACT cycle; it involves identifying and obtaining the data needed for solving the data analysis problem, as well as cleaning and preparing the data for analysis.
Mastering the data
28
An attribute that is required to exist in each table of a relational database and serves as the "unique identifier" for each record in a table.
Primary key
29
A means of storing data in order to ensure that the data are complete and not redundant, and to help enforce business rules. It also aids communication and the integration of business processes.
Relational Database
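The key terms on the cards above can be illustrated with a minimal sketch using Python's built-in `sqlite3` module. The `customer` and `sales_order` tables and their columns are hypothetical names chosen for illustration, not part of the original deck.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,   -- primary key: unique identifier per record
    name        TEXT NOT NULL          -- descriptive attribute
);
CREATE TABLE sales_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),  -- foreign key
    order_date  TEXT
);
""")
conn.execute("INSERT INTO customer VALUES (1, 'Acme Co.')")
conn.execute("INSERT INTO sales_order VALUES (100, 1, '2024-01-15')")

def insert_rejected(sql):
    """Return True if the database refuses the statement as a constraint violation."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

# A repeated primary key and an order for a nonexistent customer are both refused.
duplicate_key = insert_rejected("INSERT INTO customer VALUES (1, 'Other Co.')")
orphan_order = insert_rejected("INSERT INTO sales_order VALUES (101, 99, '2024-01-16')")
```

Both flags come back `True`, which is one way a relational database helps keep data complete and enforces business rules.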
30
An information system that helps manage all the company's interactions with suppliers.
Supply Chain Management system (SCM)
31
The opposite of the null hypothesis, or a potential result that the analyst may expect.
Alternative hypothesis
32
The principle that in any large, randomly produced set of natural numbers, there is an expected distribution of the first, or leading, digit with 1 being the most common.
Benford's Law
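The expected distribution is commonly written as P(d) = log10(1 + 1/d) for leading digit d. The sketch below computes those expected shares; the helper names are illustrative assumptions, not from the deck.

```python
import math

def benford_expected(digit: int) -> float:
    """Expected share of values whose leading digit is `digit` (1 through 9)."""
    return math.log10(1 + 1 / digit)

def leading_digit(value: float) -> int:
    """First significant digit of a nonzero number."""
    # Scientific notation puts the first significant digit at position 0.
    return int(f"{abs(value):.15e}"[0])

# Expected share for each possible leading digit.
expected = {d: benford_expected(d) for d in range(1, 10)}
```

Under this formula, a leading 1 is expected in roughly 30 percent of values and a leading 9 in fewer than 5 percent, so observed digit counts can be compared against `expected` to flag unusual datasets.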
33
A data approach similar to regression, but used to test for cause and effect relationships between multiple variables.
Causal modelling
34
A data approach that attempts to assign each unit in a population into a few categories potentially to help with predictions.
Classification
35
A data approach that attempts to discover associations between individuals based on transactions involving them.
Co-occurrence grouping
36
A data approach that attempts to reduce the amount of information that needs to be considered to focus on the most critical items.
Data reduction
37
Technique used to mark the split between one class and another.
Decision boundaries
38
An information system that supports decision-making activity within a business by combining data and expertise to solve problems and perform calculations.
Decision support system
39
Tool used to divide data into smaller groups.
Decision tree
40
Procedures that summarize existing data to determine what has happened in the past.
Descriptive analytics
41
Procedures that explore the current data to determine why something has happened the way it has, typically comparing the data to a benchmark.
Diagnostic analytics
42
An interactive report showing the most important metrics to help users understand how a company or an organization is performing.
Digital Dashboard
43
A numerical value (1 or 0) used to represent categorical data in statistical analysis. 1 indicates that a characteristic is present; 0 indicates that it is absent.
Dummy variable
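As a minimal sketch, a dummy variable can be built by flagging one category of a categorical attribute. The records and the `add_dummy` helper below are hypothetical and only illustrate the 1/0 encoding.

```python
# Hypothetical records; "region" is the categorical attribute.
customers = [
    {"id": 1, "region": "East"},
    {"id": 2, "region": "West"},
    {"id": 3, "region": "East"},
]

def add_dummy(rows, column, category):
    """Return copies of rows with a 1/0 flag marking whether row[column] == category."""
    name = f"{column}_{category}"
    return [{**row, name: 1 if row[column] == category else 0} for row in rows]

# Adds a "region_East" field: 1 for East customers, 0 otherwise.
encoded = add_dummy(customers, "region", "East")
```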
44
Used in addition to statistical significance in statistical testing. Demonstrates the magnitude of the difference between groups.
Effect size
45
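A measure of variability. The data are first divided into four parts (quartiles); the interquartile range is the span of the middle two quartiles, that is, the difference between the third and first quartiles.
Interquartile range (IQR)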
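A short sketch of the calculation, using the standard library's `statistics.quantiles`. The sample values are made up, and quartile conventions differ between tools, so other software may report slightly different cut points.

```python
import statistics

# Made-up sample of 11 observations.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# n=4 returns the three cut points that split the data into four parts.
q1, q2, q3 = statistics.quantiles(values, n=4)

# The interquartile range is the distance between the first and third quartiles.
iqr = q3 - q1
```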
46
A modeling error when the derived model too closely fits a limited set of data points.
Overfitting
47
Procedures used to generate a model that can be used to determine what is likely to happen in the future.
Predictive analytics
48
Procedures that work to identify the best possible options given constraints or changing conditions.
Prescriptive analytics
49
A data approach that attempts to identify similar individuals based on data known about them.
Similarity matching
50
Describe the location, spread, shape, and dependence of a set of observations.
Summary Statistics
51
Approach used to learn more about the basic relationships between independent and dependent variables that are hypothesized to exist.
Supervised approach
52
A discriminating classifier that is defined by a separating hyperplane that works first to find the widest margin.
Support Vector Machines
53
A set of data used to assess the degree and strength of a predicted relationship established by the analysis of training data.
Test data
54
A predictive analytics technique used to predict future values based on past values of the same variable.
Time series analysis
55
Existing data that have been manually evaluated and assigned a class, which assists in classifying the test data.
Training data
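One common way to obtain training and test data is to hold out part of a labeled dataset. The sketch below assumes a simple random split; the function name, the 25 percent test fraction, and the labeled records are illustrative assumptions.

```python
import random

def train_test_split(records, test_fraction=0.25, seed=0):
    """Shuffle a copy of the records and split it into (training, test) lists."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical records that have already been assigned a class.
labeled = [(i, "high" if i >= 5 else "low") for i in range(8)]
train, test = train_test_split(labeled)
```

The model is fit on `train` only, and `test` is held back to assess how well the predicted relationship carries over to data the model has not seen.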
56
A modeling error when the derived model poorly fits a limited set of data points.
Underfitting
57
Approach used for data exploration looking for potential patterns of interest.
Unsupervised approach
58
A global standard for exchanging financial reporting information that uses XML.
XBRL
59
One way to categorize quantitative data, as opposed to discrete data. Examples: height and weight.
Continuous data
60
Made when the aim of your project is to declare or present your findings to an audience. Charts made after the data analysis has been completed.
Declarative visualizations
61
One way to categorize quantitative data, as opposed to continuous data. Whole numbers, like points in a basketball game.
Discrete data
62
Made when the lines between steps "P" (perform test plan), "A" (address and refine results), and "C" (communicate results) are not as clearly divided as they are in a declarative visualization.
Exploratory visualization
63
The third most sophisticated type of data on the scale of nominal, ordinal, interval, and ratio. Interval data are quantitative and have no meaningful zero.
Interval data
64
The least sophisticated type of data on the scale of nominal, ordinal, interval, and ratio. You can only count, group, or take a proportion. Examples: hair color, gender, and ethnic groups.
Nominal data
65
A type of distribution in which the median, mean, and mode are all equal, so half of all the observations fall below the mean and the other half fall above the mean.
Normal distribution
66
The second most sophisticated type of data on the scale of nominal, ordinal, interval, and ratio. Can be counted and categorized like nominal data, and the categories can also be ranked. Example: gold, silver, and bronze medals.
Ordinal data
67
The primary statistic used with qualitative data. Calculated by counting the number of items in a particular category, then dividing that number by the total number of observations.
Proportion
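A minimal sketch of the calculation: count the items in each category and divide by the total number of observations. The hair-color sample is made up for illustration.

```python
from collections import Counter

# Made-up categorical observations.
hair_colors = ["brown", "black", "brown", "blond", "brown"]

counts = Counter(hair_colors)   # number of items in each category
total = len(hair_colors)        # total number of observations

# Proportion of each category = category count / total.
proportions = {color: n / total for color, n in counts.items()}
```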
68
Categorical data. All you can do with these data is count and group and, in some cases, rank.
Qualitative data
69
More complex than qualitative data. Can be further divided into two types: interval and ratio. Supports statistics such as the mean, median, and standard deviation.
Quantitative data
70
The most sophisticated type of data on the scale of nominal, ordinal, interval, and ratio. Can be counted and grouped, and the differences between data points are meaningful, as with interval data. Unlike interval data, ratio data have a meaningful zero.
Ratio data
71
A special case of the normal distribution used for standardizing data.
Standard normal distribution
72
The method used for comparing two datasets that follow the normal distribution. By using a formula, every normal distribution can be transformed into the standard normal distribution.
Standardization
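The usual formula is the z-score, z = (x - mean) / standard deviation. The sketch below applies it with the standard library; using the population standard deviation (`pstdev`) is an assumption, and the scores are made up.

```python
import statistics

def standardize(values):
    """Convert each value to a z-score: (value - mean) / standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # assumption: population standard deviation
    return [(v - mean) / sd for v in values]

# Made-up exam scores.
scores = [70, 80, 90]
z_scores = standardize(scores)
```

After standardizing, the values have a mean of 0 and a standard deviation of 1, which is what allows two differently scaled datasets to be compared.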
73