Chapter 3 Flashcards

1
Q

analytics ready

A

data that has been identified as relevant to the task at hand, meets the quality and quantity requirements, and displays a consistent structure in terms of key fields and variables

2
Q

arithmetic mean

A

the average: the sum of all values divided by the number of values

3
Q

box-and-whiskers plot

A

displays outliers as individual dots beyond the whiskers; the top and bottom whiskers represent the maximum and minimum values (excluding outliers); the box is bounded by the lower and upper quartiles; the median is a straight line across the box; and an X represents the mean
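
A minimal sketch of the numbers such a plot draws. The card does not say how outliers are identified, so the 1.5 x IQR fences used here are an assumption (a common convention), and the data are made up.

```python
import statistics

def box_plot_summary(values):
    """Return the quantities a box-and-whiskers plot displays."""
    # Quartiles: lower quartile, median, upper quartile.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumption: points beyond 1.5 * IQR from the box count as outliers.
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers: minimum and maximum once outliers are excluded.
        "whiskers": (min(inside), max(inside)),
        "outliers": [v for v in values if v < low_fence or v > high_fence],
        "mean": statistics.mean(values),
    }

summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```

Here the value 30 falls outside the upper fence, so it is reported as an outlier and the top whisker stops at 9.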

4
Q

box plot

A

alias of box-and-whiskers plot

5
Q

bubble chart

A

an extension of the scatter plot that uses color and size to add additional information

6
Q

business report

A

an artifact generated to convey useful information to decision makers, derived from any number of data sources using an ETL (extract, transform, load) procedure.

7
Q

categorical data

A

data whose values are labels that divide a variable into specific groups (classes).

8
Q

centrality

A

an indication of where most of the data is located, using measures such as mean, median, and mode.
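
The three measures can be sketched in a few lines. The sample values are made up, and resolving ties in the mode by first occurrence is an assumption of this sketch.

```python
from collections import Counter

def mean(values):
    # Arithmetic mean: sum of the values divided by how many there are.
    return sum(values) / len(values)

def median(values):
    # Middle value of the sorted data; average the two middle values
    # when the number of values is even.
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def mode(values):
    # Most frequent value (ties go to the value seen first).
    return Counter(values).most_common(1)[0][0]

sample = [2, 3, 3, 5, 7, 10]
print(mean(sample), median(sample), mode(sample))  # 5.0 4.0 3
```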

9
Q

correlation

A

makes no a priori assumption of whether one variable is dependent on the other(s) and is not concerned with the relationship between variables; instead it gives an estimate on the degree of association between the variables.

a statistical relation between variables that may indicate some association or connection between them. Note that correlation does not imply causation.
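
The card does not name a specific coefficient. As an illustration, the sketch below uses the Pearson correlation coefficient, one common estimate of the degree of linear association, on made-up data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of the two variables around their means.
    co = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Variation of each variable on its own.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return co / math.sqrt(var_x * var_y)

hours = [1, 2, 3, 4, 5]
score = [2, 4, 6, 8, 10]
print(pearson_r(hours, score))        # perfectly linear, increasing
print(pearson_r(hours, score[::-1]))  # perfectly linear, decreasing
```

The result ranges from -1 to 1. Neither argument is treated as the dependent variable, so swapping them gives the same value.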

10
Q

dashboards

A

a visual representation of data designed to be easily digestible with sufficient detail to inform decision making, typically allowing the user to “drill down” into the information to explore the situation more deeply.

11
Q

data preprocessing

A

(consolidate, clean, transform, reduce)
a procedure designed to make data usable in a data mining scenario.

(1) Consolidation (collect data, select data, integrate data),
(2) Cleaning (impute values, reduce noise, remove duplicates),
(3) Transformation (normalize data, discretize data, create attributes),
(4) Reduction (reduce dimension, reduce volume, balance data)
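
The four stages can be sketched on a toy data set. Everything here is hypothetical: the field names (`id`, `income`, `notes`), the two sources, and the choice of one technique per stage (mean imputation, min-max normalization, dropping one column).

```python
def preprocess(source_a, source_b):
    # (1) Consolidation: integrate records from both sources.
    records = source_a + source_b

    # (2) Cleaning: remove exact duplicates, then impute missing
    # incomes with the mean of the known incomes.
    unique = []
    for rec in records:
        if rec not in unique:
            unique.append(rec)
    known = [r["income"] for r in unique if r["income"] is not None]
    mean_income = sum(known) / len(known)
    cleaned = [
        dict(r, income=mean_income if r["income"] is None else r["income"])
        for r in unique
    ]

    # (3) Transformation: min-max normalize income to the range [0, 1].
    # (Assumes the incomes are not all identical.)
    lo = min(r["income"] for r in cleaned)
    hi = max(r["income"] for r in cleaned)
    for r in cleaned:
        r["income_norm"] = (r["income"] - lo) / (hi - lo)

    # (4) Reduction: drop a column judged irrelevant to the task.
    return [{k: v for k, v in r.items() if k != "notes"} for r in cleaned]

source_a = [{"id": 1, "income": 30000, "notes": "a"},
            {"id": 2, "income": None, "notes": "b"}]
source_b = [{"id": 1, "income": 30000, "notes": "a"},
            {"id": 3, "income": 50000, "notes": "c"}]
result = preprocess(source_a, source_b)
print(result)
```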

12
Q

data quality

A

The holistic quality of data, including their accuracy, precision, completeness, and relevance

13
Q

data security

A

ensuring that only those with the proper permissions have access to the data

14
Q

data taxonomy

A

a structure used to define types of data at varying layers of abstraction.

Data [Structured, Unstructured or Semi-structured]

Structured [Categorical, Numerical]
Categorical [Nominal, Ordinal]
Numerical [Interval, Ratio]

Unstructured or Semi-structured [Textual, Multimedia, XML/JSON]
Multimedia [Image, Audio, Video]

15
Q

data visualization

A

A graphical, animation, or video presentation of data and the results of data analysis.

16
Q

datum

A

smallest atomic unit of data, i.e. a single record of facts

17
Q

descriptive statistics

A

describes the sample data on hand, typically using centrality measures (mean, median, mode) and dispersion measures (range, variance, standard deviation).

18
Q

dimensional reduction

A

reducing the number of variables (columns) in a data set, also called variable selection; part of stage 4 (reduction) of data preprocessing

19
Q

dispersion

A

a statistical measure of how “spread out” the data is, i.e. the degree of variation of a given variable; measures include range, variance, and standard deviation.
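
A short sketch of the three measures on made-up data. It uses the population form of the variance (dividing by n); the sample form divides by n - 1 instead.

```python
import math

def dispersion(values):
    n = len(values)
    mean = sum(values) / n
    # Variance: average squared distance from the mean (population form).
    variance = sum((v - mean) ** 2 for v in values) / n
    return {
        "range": max(values) - min(values),  # largest minus smallest
        "variance": variance,
        "std_dev": math.sqrt(variance),      # square root of the variance
    }

spread = dispersion([2, 4, 4, 4, 5, 5, 7, 9])
print(spread)  # {'range': 7, 'variance': 4.0, 'std_dev': 2.0}
```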

20
Q

high-performance computing

A

a set of techniques including in-memory analytics, in-database analytics, grid-computing, and appliances.

21
Q

histogram

A

a histogram looks like a bar chart, but it shows the frequency distribution of one or more variables.
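
The frequency distribution behind a histogram can be sketched by counting values per bin. The fixed bin width and the sample ages are assumptions for illustration.

```python
from collections import Counter

def histogram(values, bin_width):
    """Count how many values fall into each bin of the given width."""
    # Each value is assigned to the lower edge of its bin.
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

ages = [21, 25, 29, 33, 34, 38, 41, 47, 52]
bins = histogram(ages, 10)
for lower_edge, count in bins.items():
    # One text bar per bin; bar length is the frequency.
    print(f"{lower_edge}-{lower_edge + 9}: {'#' * count}")
```

Unlike a bar chart of categories, the horizontal axis here is a numeric scale cut into intervals, and the bar height is how often values land in each interval.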

22
Q

inferential statistics

A

draws inferences (predictions) or conclusions about the characteristics of the population from sample data.

23
Q

key performance indicator (KPI)

A

Measure of performance against a strategic objective and goal.

24
Q

knowledge

A

Understanding, awareness, or familiarity acquired through education or experience; anything that has been learned, perceived, discovered, inferred, or understood; the ability to use information. In a knowledge management system, knowledge is information in action

25
Q

kurtosis

A

a measure of how peaked (tall and skinny) or flat a distribution is
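
The card gives no formula. A minimal sketch, assuming the common moment-based definition (fourth central moment over the squared variance, minus 3 so that a normal distribution scores 0), on made-up data:

```python
def excess_kurtosis(values):
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

flat = [1, 2, 3, 4, 5]                # evenly spread values
peaked = [1, 3, 3, 3, 3, 3, 3, 3, 5]  # most values piled at the center
print(excess_kurtosis(flat), excess_kurtosis(peaked))
```

The flat sample scores below zero and the peaked sample above zero.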

26
Q

learning

A

A process of self-improvement in which new knowledge is obtained by using what is already known.

27
Q

linear regression

A

a method that models the dependence of a response variable on one or more independent variables as a linear relationship.

28
Q

logistic regression

A

similar to linear regression, except that it is used to predict (classify) a categorical variable
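
A sketch of the prediction step for a two-class case: a linear score is passed through the logistic (sigmoid) function to obtain a probability, which is then thresholded into a class. The coefficients below are made up for illustration rather than fitted to data, and the 0.5 threshold is an assumption.

```python
import math

def predict_probability(x, intercept, slope):
    # Linear score, as in linear regression, squashed into (0, 1).
    score = intercept + slope * x
    return 1 / (1 + math.exp(-score))

def classify(x, intercept, slope, threshold=0.5):
    # Class 1 if the predicted probability reaches the threshold, else 0.
    return 1 if predict_probability(x, intercept, slope) >= threshold else 0

# Hypothetical coefficients: the decision boundary sits at x = 4.
INTERCEPT, SLOPE = -4.0, 1.0
for x in (2, 4, 6):
    p = predict_probability(x, INTERCEPT, SLOPE)
    print(x, round(p, 3), classify(x, INTERCEPT, SLOPE))
```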

29
Q

mean absolute deviation

A

a simpler measure of dispersion than variance: the average of the absolute differences between each value and the mean, which ignores the direction of each deviation.
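
As a sketch on made-up data, taking deviations around the mean:

```python
def mean_absolute_deviation(values):
    mean = sum(values) / len(values)
    # abs() discards the sign, so deviations above and below the mean
    # count equally and cannot cancel out.
    return sum(abs(v - mean) for v in values) / len(values)

mad = mean_absolute_deviation([2, 4, 4, 4, 5, 5, 7, 9])
print(mad)  # 1.5
```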

30
Q

median

A

middle value when sorted

31
Q

mode

A

most frequent value

32
Q

nominal data

A

a value that can be reduced to a label. e.g. Marital Status (1: Single, 2: Married, 3: Divorced, etc)

33
Q

online analytical processing (OLAP)

A

An information system that enables the user, while at a PC, to query the system, conduct an analysis, and so on. The result is generated in seconds

34
Q

ordinal data

A

similar to nominal data except there is a natural hierarchy or order e.g. Credit Score (1: low, 2: medium, 3: high)

35
Q

ordinary least squares (OLS)

A

a regression method that fits the line minimizing the sum of the squared errors.

a type of linear least squares method for choosing the unknown parameters in a linear regression model
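
For a single independent variable, the parameters that minimize the sum of squared errors have a closed form. A minimal sketch on made-up data:

```python
def fit_ols(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sum_squared_errors(xs, ys, intercept, slope):
    # The quantity that OLS minimizes.
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
intercept, slope = fit_ols(xs, ys)
print(intercept, slope)  # 1.0 2.0
```

These points lie exactly on a line, so the fitted line has zero squared error. With noisy data the error would be positive but still smaller than for any other line.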

36
Q

pie chart

A

a visual representation of data that should be used only to illustrate relative proportions of a specific measure

37
Q

quartile

A

splits ordered data into 4 quarters such that 25% of the values fall in each quartile.

38
Q

range

A

the difference between the largest and smallest values

39
Q

ratio data

A

Continuous data where both differences and ratios are interpretable. The distinguishing feature of a ratio scale is the possession of a nonarbitrary zero value

40
Q

regression

A

A data mining method for real-world prediction problems where the predicted values (i.e., the output variable or dependent variable) are numeric (e.g., predicting the temperature for tomorrow as 68°F)

41
Q

report

A

Any communication artifact prepared with the specific intention of conveying information in a presentable form

42
Q

scatter plot

A

used to explore the relationship between two or three variables. Scatter plots are an effective way to explore the existence of trends, concentrations, and outliers.

43
Q

skewness

A

a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. The skewness value can be positive, zero, negative, or undefined
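
A minimal sketch, assuming the moment-based coefficient (third central moment divided by the standard deviation cubed), on made-up samples:

```python
def skewness(values):
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    # Undefined when every value is identical (m2 == 0 raises
    # ZeroDivisionError).
    return m3 / m2 ** 1.5

print(skewness([1, 2, 2, 3, 10]))       # long right tail: positive
print(skewness([1, 2, 3, 4, 5]))        # symmetric: zero
print(skewness([-10, -3, -2, -2, -1]))  # long left tail: negative
```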

44
Q

standard deviation

A

a measure of the amount of variation or dispersion of a set of values; the square root of the variance

45
Q

statistics

A

the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data

46
Q

storytelling

A

in relation to BI, the effective use of reporting tools (graphs, plots, dashboards, etc.) so that the data representation suggests a story as an explanation; this improves the interpretation of the data

47
Q

structured data

A

data targeted for computer interpretation, flat files that have a well-organized structure (rows as records, columns as variables).

48
Q

time-series forecasting

A

the use of a time series (a sequence of data points of the variable of interest, measured and represented at successive points in time spaced at uniform time intervals) to predict future values of that variable.

49
Q

unstructured data

A

data targeted for human interpretation (web searches, audio, video, images, text data)

50
Q

variable selection

A

the process of determining which variables in a data store are relevant and important to the data mining objective.

51
Q

variance

A

a measure of dispersion, meaning it is a measure of how far a set of numbers is spread out from their average value.

52
Q

visual analytics

A

The combination of visualization and predictive analytics.