Exam 1 Terms Flashcards

1
Q

Datasets that are too large and complex for a business's existing systems to capture, store, manage, and analyze using their traditional capabilities.

A

Big Data

2
Q

A data approach that attempts to assign each unit in a population into a few categories potentially to help with predictions.

A

Classification

3
Q

A data approach that attempts to divide individuals (like customers) into groups (or clusters) in a useful way.

A

Clustering

4
Q

A data approach that attempts to discover associations between individuals based on transactions involving them.

A

Co-occurrence grouping

5
Q

The process of evaluating data with the purpose of drawing conclusions to address business questions.

A

Data Analytics

6
Q

Centralized repository of descriptions for all of the data attributes of the dataset.

A

Data dictionary

7
Q

A data approach that attempts to reduce the amount of information that needs to be considered to focus on the most critical items.

A

Data reduction

8
Q

A data approach that attempts to predict a relationship between two data items.

A

Link Prediction

9
Q

A variable that predicts or explains another variable.

A

Predictor variable

10
Q

A data approach that attempts to characterize the “typical” behavior of an individual, group, or population by generating summary statistics about the data.

A

Profiling

11
Q

A data approach that attempts to estimate or predict, for each unit, the numerical value of some variable using some type of statistical model.

A

Regression

12
Q

A variable that responds to, or is dependent on, another.

A

Response variable

13
Q

A data approach that attempts to identify similar individuals based on data known about them.

A

Similarity matching

14
Q

Data that are organized and reside in a fixed field within a record or a file.

A

Structured data

15
Q

Data that do not adhere to a predefined data model in a tabular format.

A

Unstructured data

16
Q

A system that records, processes, reports, and communicates the results of business transactions to provide financial and nonfinancial information for decision-making purposes.

A

Accounting information system

17
Q

A special case of a primary key that exists in linking tables. It is made up of the primary keys of the two tables that it links.

A

Composite primary key

18
Q

An information system for managing all interactions between the company and its current and potential customers.

A

Customer Relationship Management system (CRM)

19
Q

Centralized repository of descriptions for all of the data attributes of the dataset.

A

Data dictionary

20
Q

A method for obtaining data when you do not have direct access to the data yourself.

A

Data request form

21
Q

Attributes that exist in relational databases that are neither primary nor foreign keys. Provide business information.

A

Descriptive attributes

22
Q

A category of business management software that integrates applications from throughout the business into one system.

A

Enterprise Resource Planning system (ERP)

23
Q

The extract, transform, and load process that is integral to mastering the data.

A

ETL

24
Q

A means of storing data in one place, such as in an Excel spreadsheet, as opposed to storing the data in multiple tables, such as in a relational database.

A

Flat file

25
Q

An attribute that exists in a relational database in order to carry out the relationship between two tables. It does not serve as the “unique identifier” for each record in its own table.

A

Foreign Key

26
Q

An information system for managing all interactions between the company and its current and potential employees.

A

Human Resource Management system (HRM)

27
Q

The second step in the IMPACT cycle; it involves identifying and obtaining the data needed for solving the data analysis problem, as well as cleaning and preparing the data for analysis.

A

Mastering the data

28
Q

An attribute that is required to exist in each table of a relational database and serves as the “unique identifier” for each record in a table.

A

Primary key

29
Q

A means of storing data that ensures the data are complete and not redundant, and that helps enforce business rules. It also aids communication and integration of business processes.

A

Relational Database
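
The key terms in this deck (primary key, foreign key, composite primary key, descriptive attributes) can be seen together in a small sketch. It uses Python's built-in sqlite3 module with an in-memory database. The table and column names are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,   -- primary key: unique identifier for each record
    name        TEXT NOT NULL          -- descriptive attribute
);
CREATE TABLE product (
    product_id  INTEGER PRIMARY KEY,
    description TEXT NOT NULL
);
-- Linking table: its composite primary key combines the two tables' primary keys,
-- and each half is also a foreign key back to its own table.
CREATE TABLE customer_product (
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    product_id  INTEGER NOT NULL REFERENCES product(product_id),
    quantity    INTEGER NOT NULL,
    PRIMARY KEY (customer_id, product_id)
);
""")

conn.execute("INSERT INTO customer VALUES (1, 'Ada')")
conn.execute("INSERT INTO customer VALUES (2, 'Grace')")
conn.execute("INSERT INTO product VALUES (10, 'Ledger paper')")
conn.execute("INSERT INTO customer_product VALUES (1, 10, 3)")


def violates(sql: str) -> bool:
    """Return True if the statement is rejected by a key constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False
```

Here a second row for customer 1 and product 10 would be rejected by the composite primary key, and a row pointing at a customer that does not exist would be rejected by the foreign key. This is one way a relational database helps keep data complete and not redundant.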

30
Q

An information system that helps manage all the company’s interactions with suppliers.

A

Supply Chain Management system (SCM)

31
Q

The opposite of the null hypothesis, or a potential result that the analyst may expect.

A

Alternative hypothesis

32
Q

The principle that in any large, randomly produced set of natural numbers, there is an expected distribution of the first, or leading, digit with 1 being the most common.

A

Benford’s Law
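
The expected distribution under Benford's Law is commonly written as P(d) = log10(1 + 1/d) for leading digits d = 1 through 9. The sketch below computes those expected shares and extracts a leading digit. The function names are illustrative, not from the source.

```python
import math


def benford_expected(digit: int) -> float:
    """Expected share of values whose leading digit is `digit` (1 through 9)."""
    return math.log10(1 + 1 / digit)


def leading_digit(value: float) -> int:
    """First nonzero digit of a nonzero number, ignoring sign and decimal point."""
    # Scientific notation always starts with the leading digit, e.g. '4.52...e+03'.
    return int(f"{abs(value):.15e}"[0])


# Expected share for each possible leading digit; 1 is the most common.
expected = {d: benford_expected(d) for d in range(1, 10)}
```

An analyst could compare these expected shares with the observed leading digits of, say, invoice amounts, and look more closely at digits that appear far more or less often than expected.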

33
Q

A data approach similar to regression, but used to test for cause and effect relationships between multiple variables.

A

Causal modelling

34
Q

A data approach that attempts to assign each unit in a population into a few categories potentially to help with predictions.

A

Classification

35
Q

A data approach that attempts to discover associations between individuals based on transactions involving them.

A

Co-occurrence grouping

36
Q

A data approach that attempts to reduce the amount of information that needs to be considered to focus on the most critical items.

A

Data reduction

37
Q

Technique used to mark the split between one class and another.

A

Decision boundaries

38
Q

An information system that supports decision-making activity within a business by combining data and expertise to solve problems and perform calculations.

A

Decision support system

39
Q

Tool used to divide data into smaller groups.

A

Decision tree

40
Q

Procedures that summarize existing data to determine what has happened in the past.

A

Descriptive analytics

41
Q

Procedures that explore the current data to determine why something has happened the way it has, typically comparing the data to a benchmark.

A

Diagnostic analytics

42
Q

An interactive report showing the most important metrics to help users understand how a company or an organization is performing.

A

Digital Dashboard

43
Q

A numerical value (1 or 0) used to represent categorical data in statistical analysis. 1 = the characteristic is present, 0 = it is absent.

A

Dummy variable
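
As a minimal sketch, a categorical field can be turned into a dummy variable with a simple comparison. The payment-method data below is made up for illustration.

```python
# Hypothetical categorical data: how each transaction was paid.
payment_methods = ["cash", "credit", "cash", "check"]

# Dummy variable: 1 if the transaction was paid by credit, 0 otherwise.
is_credit = [1 if method == "credit" else 0 for method in payment_methods]
```

The resulting list of 1s and 0s can then be used as a numeric input to a statistical model such as a regression.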

44
Q

Used in addition to statistical significance in statistical testing. Demonstrates the magnitude of the difference between groups.

A

Effect size

45
Q

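A measure of variability. When the ordered data are divided into four equal parts (quartiles), it is the range spanned by the middle two parts: the third quartile minus the first quartile.

A

Interquartile range (IQR)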
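
The interquartile range is the third quartile minus the first quartile. The sketch below uses `statistics.quantiles` from Python's standard library on made-up data. Note that software packages use slightly different quartile conventions, so values can differ a little between tools.

```python
from statistics import quantiles

# Hypothetical observations, e.g. days taken to pay eight invoices.
days_to_pay = [1, 2, 3, 4, 5, 6, 7, 8]

# n=4 splits the data into four parts and returns the three cut points.
q1, q2, q3 = quantiles(days_to_pay, n=4)

# The IQR covers the middle half of the observations.
iqr = q3 - q1
```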

46
Q

A modeling error when the derived model too closely fits a limited set of data points.

A

Overfitting

47
Q

Procedures used to generate a model that can be used to determine what is likely to happen in the future.

A

Predictive analytics

48
Q

Procedures that work to identify the best possible options given constraints or changing conditions.

A

Prescriptive analytics

49
Q

A data approach that attempts to identify similar individuals based on data known about them.

A

Similarity matching

50
Q

Describe the location, spread, shape, and dependence of a set of observations.

A

Summary Statistics

51
Q

Approach used to learn more about the basic relationships between independent and dependent variables that are hypothesized to exist.

A

Supervised approach

52
Q

A discriminating classifier that is defined by a separating hyperplane that works first to find the widest margin.

A

Support Vector Machines

53
Q

A set of data used to assess the degree and strength of a predicted relationship established by the analysis of training data.

A

Test data

54
Q

A predictive analytics technique used to predict future values based on past values of the same variable.

A

Time series analysis

55
Q

Existing data that have been manually evaluated and assigned a class, which assists in classifying the test data.

A

Training data

56
Q

A modeling error when the derived model poorly fits a limited set of data points.

A

Underfitting

57
Q

Approach used for data exploration looking for potential patterns of interest.

A

Unsupervised approach

58
Q

A global standard for exchanging financial reporting information that uses XML.

A

XBRL

59
Q

One way to categorize quantitative data, as opposed to discrete data. Ex: height or weight.

A

Continuous data

60
Q

Made when the aim of your project is to declare or present your findings to an audience. Charts made after the data analysis has been completed.

A

Declarative visualizations

61
Q

One way to categorize quantitative data, as opposed to continuous data. Whole numbers, like points in a basketball game.

A

Discrete data

62
Q

Made when the lines between steps “P” (perform test plan), “A” (address and refine results), and “C” (communicate results) are not as clearly divided as they are in a declarative visualization.

A

Exploratory visualization

63
Q

The third most sophisticated type of data on the scale of nominal, ordinal, interval, and ratio. A type of quantitative data. No meaningful zero.

A

Interval data

64
Q

The least sophisticated type of data on the scale of nominal, ordinal, interval, and ratio. You can only count, group, or take a proportion. Ex: hair color, gender, and ethnic groups.

A

Nominal data

65
Q

A type of distribution in which the median, mean, and mode are all equal, so half of all the observations fall below the mean and the other half fall above the mean.

A

Normal distribution

66
Q

The second most sophisticated type of data on the scale of nominal, ordinal, interval, and ratio. Can be counted and categorized like nominal data, and the categories can also be ranked. Ex: gold, silver, and bronze medals.

A

Ordinal data

67
Q

The primary statistic used with qualitative data. Calculated by counting the number of items in a particular category, then dividing that number by the total number of observations.

A

Proportion
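
The count-then-divide calculation can be sketched in a few lines. The hair-color observations below are hypothetical.

```python
from collections import Counter

# Hypothetical categorical observations.
hair_colors = ["brown", "black", "brown", "blond", "brown", "black", "red", "brown"]

counts = Counter(hair_colors)   # number of items in each category
total = len(hair_colors)        # total number of observations

# Proportion of each category = category count / total count.
proportions = {color: n / total for color, n in counts.items()}
```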

68
Q

Categorical data. All you can do with these data is count and group and, in some cases, rank.

A

Qualitative data

69
Q

More complex than qualitative data. Can be further divided into two types: interval and ratio. Supports statistics such as the mean, median, and standard deviation.

A

Quantitative data

70
Q

The most sophisticated type of data on the scale of nominal, ordinal, interval, and ratio. Can be counted and grouped, and the differences between data points are meaningful, as with interval data. Unlike interval data, it has a meaningful zero.

A

Ratio data

71
Q

A special case of the normal distribution used for standardizing data.

A

Standard normal distribution

72
Q

The method used for comparing two datasets that follow the normal distribution. By using a formula, every normal distribution can be transformed into the standard normal distribution.

A

Standardization
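
The formula usually meant here is the z-score, z = (x - mean) / standard deviation. The sketch below assumes that formula and uses the population standard deviation. The two score lists are made up to show how datasets on different scales become comparable.

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert each value to a z-score: (value - mean) / standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption here)
    return [(x - mu) / sigma for x in values]


# Hypothetical scores from two exams graded on different scales.
exam_a = [62, 70, 78, 86, 94]
exam_b = [410, 450, 490, 530, 570]

z_a = standardize(exam_a)
z_b = standardize(exam_b)
```

After standardizing, each list has a mean of 0 and a standard deviation of 1, so a score on one exam can be compared directly with a score on the other.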
