CAP Data Flashcards

1
Q

Line intercept sampling

A

Sampling method where elements in a region are selected if a chosen line segment (transect) intersects the element

2
Q

Theoretical sampling

A

Sample method where individuals are added to the sample based on results of data already collected

3
Q

Non-standard values data transformation

A

Identify categories represented by multiple categorical values and replace them with a single standard value

4
Q

Principal Component Analysis (PCA)

A

Dimensionality reduction method that uses an orthogonal transformation to convert a data set into a new coordinate system where the first coordinate contains the most variance, the second coordinate the second-most variance, etc.
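
A minimal sketch of the idea, restricted to two variables so the 2×2 covariance matrix can be eigendecomposed in closed form (a real analysis would use a linear-algebra library such as NumPy; the data here is hypothetical):

```python
import math

def pca_first_axis(xs, ys):
    """Return the first principal axis (unit vector) of 2-D data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Largest eigenvalue of a 2x2 symmetric matrix, in closed form
    t, d = sxx + syy, sxx * syy - sxy ** 2
    lam = t / 2 + math.sqrt(t * t / 4 - d)
    # Corresponding eigenvector (axis-aligned case handled separately)
    if sxy:
        vx, vy = lam - syy, sxy
    elif sxx >= syy:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    return vx / norm, vy / norm

# Points along the line y = 2x: the first axis points along (1, 2)/sqrt(5)
axis = pca_first_axis([0, 1, 2, 3], [0, 2, 4, 6])
```
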

5
Q

Data volume

A

The quantity of data stored in a warehouse

6
Q

Probability proportionate to size sampling

A

Sample method where probability of an individual being chosen for the sample is proportional to the size of its subpopulation
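A sketch of the idea using Python's `random.choices`, with hypothetical subpopulation sizes as the weights:

```python
import random

def pps_sample(sizes, k, seed=0):
    """Draw k individuals (with replacement), weighting each
    subpopulation by its size."""
    rng = random.Random(seed)
    names, weights = list(sizes), list(sizes.values())
    return rng.choices(names, weights=weights, k=k)

# Hypothetical subpopulation sizes: "urban" holds 80% of the population
sizes = {"urban": 800, "suburban": 150, "rural": 50}
draws = pps_sample(sizes, k=1000)  # roughly 800 of 1000 draws are "urban"
```
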

7
Q

Smoothing data transformation

A

Apply a simple moving average or a LOESS regression to the data
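The moving-average variant can be sketched in a few lines (LOESS needs a statistics library; the noisy series below is hypothetical):

```python
def moving_average(values, window):
    """Smooth a series by averaging each run of `window` consecutive values."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

noisy = [1, 9, 2, 8, 3, 7]
smoothed = moving_average(noisy, window=3)  # oscillation is damped
```
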

8
Q

Panel sampling

A

Sample method where individuals randomly chosen for an experiment are asked for information in waves of data collection

9
Q

Stratified sampling

A

Sample method where population is divided into subpopulations and individuals are randomly chosen for the sample from these subpopulations
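A proportional-allocation sketch, assuming hypothetical strata and drawing the same fraction from each:

```python
import random

def stratified_sample(strata, fraction, seed=0):
    """Draw the same fraction of individuals, at random, from every stratum."""
    rng = random.Random(seed)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population of 120 split into two subpopulations
strata = {"under_40": list(range(100)), "over_40": list(range(100, 120))}
chosen = stratified_sample(strata, fraction=0.1)  # 10 + 2 individuals
```
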

10
Q

Binning data transformation

A

Divide the values of a continuous variable into discrete intervals
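A sketch using the standard-library `bisect` module, with hypothetical age bands as the bin edges:

```python
from bisect import bisect_right

def bin_values(values, edges):
    """Map each continuous value to the index of the half-open
    interval [edges[i], edges[i+1]) it falls in."""
    return [bisect_right(edges, v) - 1 for v in values]

ages = [3, 17, 35, 64, 90]
age_bins = bin_values(ages, edges=[0, 18, 40, 65, 120])  # [0, 0, 1, 2, 3]
```
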

11
Q

Data strategy

A

A plan designed to improve the enterprise’s acquisition, storage, management, sharing, and use of data

12
Q

Statistical uncertainty

A

Natural randomness in a process that affects each experimental trial

13
Q

Voluntary sampling

A

Sample method where individuals choose to join the sample

14
Q

Skewness data transformation

A

Transform the distribution using a function such as a logarithm, a square root, or an inverse
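The logarithm variant, sketched on hypothetical income data (square-root and inverse transforms work the same way; all require positive values):

```python
import math

def log_transform(values):
    """Compress a right-skewed variable with a natural logarithm."""
    return [math.log(v) for v in values]

incomes = [1, 10, 100, 1000]     # strongly right-skewed
logged = log_transform(incomes)  # evenly spaced on the log scale
```
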

15
Q

Normalization data transformation

A

Scaling the data to remove differences in magnitude between continuous variables; examples include min-max, z-scores, and decimal scaling
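The min-max and z-score variants can be sketched as follows (hypothetical height data; z-scores here use the population standard deviation):

```python
def min_max(values):
    """Rescale values linearly onto the [0, 1] interval."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_scores(values):
    """Center values at 0, scaled by the population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / sd for v in values]

heights_cm = [150, 160, 170, 180, 190]
scaled = min_max(heights_cm)         # [0.0, 0.25, 0.5, 0.75, 1.0]
standardized = z_scores(heights_cm)  # mean 0, unit spread
```
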

16
Q

Structured data

A

Information organized into a formatted repository so that its elements are easily searchable by basic algorithms

17
Q

Interval scale

A

Items in the scale are differentiated by degree of difference with no absolute zero as part of the scale

18
Q

Data value

A

The worth of data stored in and extracted from a warehouse

19
Q

Master data

A

Data objects agreed on and shared across the enterprise

20
Q

Precision of measurements

A

The closeness of agreement between independent measurements; it is limited primarily by random error

21
Q

Ratio scale

A

Items in the scale are differentiated by degree of difference and there is an absolute zero in the scale

22
Q

Data variety

A

The different forms of data contained in a warehouse

23
Q

Box-Cox Transformation

A

A method for transforming non-normal dependent variables into a normal shape
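A dependency-free sketch of the transform itself; in practice the power parameter lambda is chosen by maximum likelihood (e.g. `scipy.stats.boxcox`), but here it is passed in directly:

```python
import math

def box_cox(values, lam):
    """Box-Cox power transform for positive data:
    (x**lam - 1) / lam for lam != 0, and log(x) at lam = 0 (its limit)."""
    if lam == 0:
        return [math.log(x) for x in values]
    return [(x ** lam - 1) / lam for x in values]

doublings = [1, 2, 4, 8, 16]        # right-skewed: each value doubles
evened = box_cox(doublings, lam=0)  # log transform: evenly spaced output
```
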

24
Q

Noisy data transformation

A

Use a smoothing function or funnel samples into bins

25
Q

Outlier data transformation

A

Apply a logarithm to the data or, if the data is erroneous, discard it from the data set entirely

26
Q

Metadata

A

Data that provides information about other data

27
Q

Epistemic uncertainty

A

Randomness in a process due to things knowable in principle but not known while conducting an experiment

28
Q

Non-relational database

A

Database where data is not organized so that each row of information contains a unique key identifying the row

29
Q

Data veracity

A

The truthfulness and provenance of data contained in a warehouse

30
Q

Cluster sampling

A

Sample method where the population is divided into mutually homogeneous, internally heterogeneous subpopulations; a whole subpopulation is the sample (one-stage) or a simple random sample of a subpopulation is the sample (two-stage)

31
Q

Fitting data transformation

A

Find a function that describes common features of the training and testing data, then apply the function to that data; one example is the Fourier transform

32
Q

Data velocity

A

The speed at which new data is added to the warehouse; also, the speed at which a user accesses existing data in the warehouse

33
Q

Data steward

A

Job role involving the use of an organization's data governance processes to ensure the fitness of data elements

34
Q

Nominal scale

A

Items in the scale are differentiated by name alone

35
Q

Design of Experiments (DOE)

A

An approach to problem solving involving the collection of data that supports valid, defensible, and supportable conclusions

36
Q

Ordinal scale

A

Items in the scale are differentiated by rank, with no degree of difference between items specified

37
Q

Quota sampling

A

A non-probabilistic version of stratified sampling in which judgment or convenience determines what kinds of individuals are chosen for the sample

38
Q

Missing data transformation

A

Either replace missing values with placeholder values, remove rows or columns with missing values, or use a statistical method to infer the missing values

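The inference option is often mean imputation; a minimal sketch on hypothetical sensor readings, with `None` marking the missing entries:

```python
def impute_mean(values):
    """Fill missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

readings = [4.0, None, 6.0, None, 8.0]
filled = impute_mean(readings)  # [4.0, 6.0, 6.0, 6.0, 8.0]
```
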
39
Q

Simple random sampling

A

Sample method where every individual in the population has the same odds of being chosen for the sample

40
Q

Relational database

A

Database based on organizing data into one or more tables of columns and rows, where each row contains a unique key identifying the row

41
Q

Systematic sampling

A

Sample method where individuals are chosen for the sample from an ordered sampling frame; to be used only if the population is homogeneous

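A sketch with a hypothetical frame of 100 ordered individuals, taking every k-th after a random start:

```python
import random

def systematic_sample(frame, k, seed=0):
    """After a random start, take every (n // k)-th individual
    from an ordered sampling frame."""
    step = len(frame) // k
    start = random.Random(seed).randrange(step)
    return [frame[start + i * step] for i in range(k)]

frame = list(range(100))                 # ordered sampling frame
chosen = systematic_sample(frame, k=10)  # 10 evenly spaced individuals
```
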
42
Q

Accidental sampling; convenience sampling

A

Non-probabilistic sample method where chosen individuals are readily available and convenient

43
Q

Data governance

A

Controls that ensure data entry meets precise standards

44
Q

Accuracy

A

The agreement between independent measurements and the true value of what is measured; it is limited primarily by systematic error

45
Q

Unstructured data

A

Information that does not fit into a pre-defined repository and is not organized in an easily searchable manner

46
Q

Snowball sampling

A

Sampling method where individuals already in the sample are asked to identify new individuals to add to the sample

47
Q

Data collection strategy

A

Iterative process of determining data needs, reviewing existing data, setting priorities for data, agreeing on roles and responsibilities for collecting the data, producing the data, and determining whether the data meets all needs

48
Q

Minimax sampling

A

Sample method where the sampling ratio does not follow the population statistics, in order to ease binary classification tasks

49
Q

Response surface methodology (RSM)

A

A way to explore the relationships between explanatory variables and one or more response variables through a design of experiments

50
Q

Data privacy

A

The relationship between the collection and dissemination of data, technology, the public expectation of non-observation of collected data, and the legal and political issues surrounding them