CAP Data Flashcards

1
Q

Line intercept sampling

A

Sampling method where elements in a region are selected if a chosen line segment (transect) intersects the element

2
Q

Theoretical sampling

A

Sample method where individuals are added to the sample based on results of data already collected

3
Q

Non-standard values data transformation

A

Identify categories that are recorded under several different values (e.g. "NY", "N.Y.", and "New York") and replace them with a single standard value
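A minimal Python sketch of this transformation, assuming a hand-built lookup table from known variants to one standard value. The table contents and the function name standardize are hypothetical; unknown values are left unchanged so they can be reviewed separately.

```python
# Hypothetical mapping from observed variants (normalized to lowercase) to a standard value
STANDARD_STATE = {
    "ny": "New York",
    "n.y.": "New York",
    "new york": "New York",
    "ca": "California",
    "calif.": "California",
    "california": "California",
}

def standardize(values, mapping):
    """Replace each known variant with its standard value; keep unknown values as-is."""
    out = []
    for v in values:
        key = v.strip().lower()  # ignore case and stray whitespace when matching
        out.append(mapping.get(key, v))
    return out

print(standardize(["NY", "New York ", "Calif.", "Texas"], STANDARD_STATE))
# ['New York', 'New York', 'California', 'Texas']
```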

4
Q

Principal Component Analysis (PCA)

A

Dimensionality reduction method that uses an orthogonal transformation to map a data set into a new coordinate system in which the first coordinate (principal component) captures the most variance, the second coordinate captures the second-most variance, and so on
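A minimal pure-Python sketch of PCA for two-dimensional data, assuming the sample covariance (n - 1 denominator). For a 2x2 covariance matrix the eigenvalues and the rotation angle have a closed form, so no linear-algebra library is needed; the function name pca_2d is hypothetical, and real workloads would use a library eigen-decomposition or SVD.

```python
import math

def pca_2d(points):
    """Principal components of 2-D data from the closed-form eigen-decomposition
    of the 2x2 sample covariance matrix."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    # Sample covariance matrix [[a, b], [b, c]]
    a = sum(dx * dx for dx, _ in centered) / (n - 1)
    c = sum(dy * dy for _, dy in centered) / (n - 1)
    b = sum(dx * dy for dx, dy in centered) / (n - 1)
    # Eigenvalues = variance along each component, largest first
    mid = (a + c) / 2
    rad = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    variances = (mid + rad, mid - rad)
    # Angle of the first component; the second component is orthogonal to it
    theta = 0.5 * math.atan2(2 * b, a - c)
    pc1 = (math.cos(theta), math.sin(theta))
    pc2 = (-math.sin(theta), math.cos(theta))
    # Orthogonal transformation: coordinates of each centered point on the new axes
    scores = [(dx * pc1[0] + dy * pc1[1], dx * pc2[0] + dy * pc2[1]) for dx, dy in centered]
    return variances, (pc1, pc2), scores

variances, (pc1, pc2), scores = pca_2d([(0, 0), (1, 2), (2, 4), (3, 6)])
```

Because these example points lie exactly on the line y = 2x, all of the variance lands on the first component and the second component's variance is (numerically) zero.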

5
Q

Data volume

A

The quantity of data stored in a warehouse

6
Q

Probability proportionate to size sampling

A

Sample method where the probability of an individual being chosen for the sample is proportional to the size of its subpopulation
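A minimal Python sketch of probability proportionate to size selection, assuming draws with replacement and a known size measure per unit (e.g. enrollment per school). The unit names, sizes, and function names are hypothetical; selection without replacement requires more involved schemes that are not shown here.

```python
import random

def pps_probabilities(sizes):
    """Per-draw selection probability of each unit, proportional to its size measure."""
    total = sum(sizes)
    return [s / total for s in sizes]

def pps_sample(units, sizes, n, rng=random):
    """Draw n units with replacement, weighting each draw by the unit's size."""
    return rng.choices(units, weights=sizes, k=n)

schools = ["A", "B", "C"]
enrollment = [100, 300, 600]             # hypothetical size measures
print(pps_probabilities(enrollment))     # [0.1, 0.3, 0.6]
print(pps_sample(schools, enrollment, 5, random.Random(42)))
```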

7
Q

Smoothing data transformation

A

Apply a simple moving average or a LOESS regression to the data
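A minimal Python sketch of the simple moving average option, assuming a trailing window that emits one value per full window. LOESS is not shown; the function name simple_moving_average is hypothetical.

```python
def simple_moving_average(values, window):
    """Trailing simple moving average; returns one value per full window."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    out = [running / window]
    for i in range(window, len(values)):
        # Slide the window: add the newest value, drop the oldest
        running += values[i] - values[i - window]
        out.append(running / window)
    return out

print(simple_moving_average([3, 5, 4, 8, 6, 9], 3))
# [4.0, 5.666666666666667, 6.0, 7.666666666666667]
```

Keeping a running sum makes each step constant-time instead of re-summing the whole window.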

8
Q

Panel sampling

A

Sample method where individuals randomly chosen for a study are asked for information repeatedly, in successive waves of data collection over time

9
Q

Stratified sampling

A

Sample method where population is divided into subpopulations and individuals are randomly chosen for the sample from these subpopulations
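A minimal Python sketch of stratified sampling, assuming equal allocation (the same number drawn from every stratum); proportional allocation is a common alternative. The records, the stratum key, and the function name are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(records, stratum_of, n_per_stratum, rng=random):
    """Group records by stratum, then draw a simple random sample from each stratum."""
    strata = defaultdict(list)
    for r in records:
        strata[stratum_of(r)].append(r)
    sample = []
    for key in sorted(strata):
        members = strata[key]
        k = min(n_per_stratum, len(members))  # small strata contribute everything they have
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: (region, id) pairs with unequal stratum sizes
people = [("north", i) for i in range(10)] + [("south", i) for i in range(4)]
picked = stratified_sample(people, lambda p: p[0], 2, random.Random(1))
```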

10
Q

Binning data transformation

A

Divide the values of a continuous variable into discrete intervals
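A minimal Python sketch of equal-width binning, assuming bins span [min, max] and the maximum value goes into the last bin. Equal-frequency (quantile) binning is a common alternative; the function name and the example ages are hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Assign each value the index of one of n_bins equal-width intervals over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    labels = []
    for v in values:
        if width == 0:
            idx = 0  # all values identical: a single bin
        else:
            idx = min(int((v - lo) / width), n_bins - 1)  # clamp the maximum into the last bin
        labels.append(idx)
    return labels

ages = [18, 22, 35, 41, 57, 64, 70]
print(equal_width_bins(ages, 3))  # [0, 0, 0, 1, 2, 2, 2]
```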

11
Q

Data strategy

A

A plan designed to improve the enterprise’s acquisition, storage, management, sharing, and use of data

12
Q

Statistical uncertainty

A

Natural randomness in a process that affects each experimental trial

13
Q

Voluntary sampling

A

Sample method where individuals choose to join the sample

14
Q

Skewness data transformation

A

Transform the distribution using a function such as a logarithm, a square root, or an inverse
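A minimal Python sketch showing how a square root or a logarithm reduces right skew, assuming the moment-based skewness coefficient (mean cubed deviation over the cubed population standard deviation). The data values and function name are hypothetical; the values double at each step, so their base-2 logarithms are evenly spaced and perfectly symmetric.

```python
import math

def skewness(xs):
    """Moment-based skewness: third central moment / (second central moment) ** 1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

raw = [1, 2, 4, 8, 16, 32, 64, 128]        # hypothetical right-skewed values
sqrt_t = [math.sqrt(x) for x in raw]       # milder transformation
log_t = [math.log2(x) for x in raw]        # 0.0, 1.0, ..., 7.0
print(skewness(raw), skewness(sqrt_t), skewness(log_t))
```

The stronger the transformation (square root, then logarithm), the more it compresses large values; here skewness falls from clearly positive to zero.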

15
Q

Normalization data transformation

A

Scaling the data to remove differences in magnitude between continuous variables; examples include min-max, z-scores, and decimal scaling
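A minimal Python sketch of the three scalings named above, assuming the population standard deviation for z-scores (some tools use the sample version). Decimal scaling divides by 10^j, where j is the smallest integer that brings every absolute value below 1. Function names are hypothetical.

```python
import math

def min_max(xs):
    """Rescale values linearly to [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def z_scores(xs):
    """Center on the mean and divide by the population standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return [(x - mean) / sd for x in xs]

def decimal_scaling(xs):
    """Divide by 10**j, the smallest power of ten making every |value| < 1."""
    max_abs = max(abs(x) for x in xs)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [x / 10 ** j for x in xs]

print(min_max([10, 20, 30]))               # [0.0, 0.5, 1.0]
print(z_scores([2, 4, 4, 4, 5, 5, 7, 9]))  # mean 5, sd 2
print(decimal_scaling([-986, 217, 50]))    # divides by 1000
```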

16
Q

Structured data

A

Information organized into a formatted repository so that its elements are easily searchable by basic algorithms

17
Q

Interval scale

A

Items in the scale are differentiated by degree of difference with no absolute zero as part of the scale

18
Q

Data value

A

The worth of data stored in and extracted from a warehouse

19
Q

Master data

A

Data objects agreed on and shared across the enterprise

20
Q

Precision of measurements

A

The closeness of agreement between independent measurements; lack of precision comes primarily from random error

21
Q

Ratio scale

A

Items in the scale are differentiated by degree of difference and there is an absolute zero in the scale

22
Q

Data variety

A

The different forms of data contained in a warehouse

23
Q

Box-Cox Transformation

A

A family of power transformations for making non-normal (strictly positive) dependent variables approximately normal
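A minimal Python sketch of the Box-Cox transform for a single value and a given lambda: y = (x^lambda - 1) / lambda when lambda is not 0, and y = ln(x) when lambda is 0. Choosing lambda is not shown; it is usually estimated by maximum likelihood (for example, scipy.stats.boxcox does this when no lambda is supplied). The function name box_cox is hypothetical.

```python
import math

def box_cox(x, lam):
    """Box-Cox power transform of a strictly positive value x for parameter lam."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return math.log(x)          # limiting case as lam -> 0
    return (x ** lam - 1) / lam

skewed = [1.0, 2.0, 4.0, 8.0, 16.0]
print([box_cox(v, 0) for v in skewed])    # log transform
print([box_cox(v, 0.5) for v in skewed])  # square-root-like transform
```

The lambda = 0 case is the limit of (x^lambda - 1) / lambda, so the family varies smoothly from square-root-like to logarithmic transforms.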

24
Q

Noisy data transformation

A

Use a smoothing function or funnel samples into bins
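A minimal Python sketch of funnelling samples into bins and smoothing by bin means, assuming equal-frequency bins over the sorted values. The prices and the function name are hypothetical; smoothing by bin medians or bin boundaries are common variants.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort values, split them into bins of bin_size, and replace each value
    with the mean of its bin (the result is in sorted order)."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_vals = ordered[start:start + bin_size]
        mean = sum(bin_vals) / len(bin_vals)
        smoothed.extend([mean] * len(bin_vals))
    return smoothed

prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(smooth_by_bin_means(prices, 3))
# [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```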

25
Q

Outlier data transformation

A

Apply a logarithm to the data or, if a data point is erroneous, discard it from the data set entirely
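A minimal Python sketch of both options, assuming a caller-supplied rule that decides whether a value is erroneous. The validity rule (amounts must be positive), the data, and the function name are hypothetical; remaining values are log-transformed to compress valid extreme values.

```python
import math

def treat_outliers(values, is_erroneous):
    """Discard values flagged as erroneous, then log10-transform the rest."""
    kept = [v for v in values if not is_erroneous(v)]
    return [math.log10(v) for v in kept]

# Hypothetical rule: a transaction amount must be positive
amounts = [12.0, 15.0, 9.0, -40.0, 10000.0]
print(treat_outliers(amounts, lambda v: v <= 0))
```

After the log transform, the valid but extreme 10000.0 becomes 4.0, far closer to the other values, while the impossible negative amount is removed instead of transformed.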

26
Q

Metadata

A

Data that provides information about other data

27
Q

Epistemic uncertainty

A

Uncertainty in a process due to things that are knowable in principle but not known while conducting an experiment

28
Q

Non-relational database

A

Database that does not organize data into tables in which each row contains a unique key identifying the row

29
Q

Data veracity

A

The truthfulness and provenance of data contained in a warehouse

30
Q

Cluster sampling

A

Sample method where the population is divided into subpopulations (clusters) that are homogeneous with respect to one another but internally heterogeneous; in one-stage sampling, every individual in one or more randomly chosen clusters forms the sample, while in two-stage sampling a simple random sample is drawn from within each chosen cluster
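A minimal Python sketch of one-stage and two-stage cluster sampling, assuming clusters are given as a dictionary from cluster name to member list. The households, blocks, and function names are hypothetical.

```python
import random

def one_stage_cluster_sample(clusters, n_clusters, rng=random):
    """Randomly choose whole clusters; every member of a chosen cluster is sampled."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for c in chosen for member in clusters[c]]

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, rng=random):
    """Randomly choose clusters, then take a simple random sample within each one."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for c in chosen:
        members = clusters[c]
        sample.extend(rng.sample(members, min(n_per_cluster, len(members))))
    return sample

# Hypothetical clusters: households grouped by city block
blocks = {
    "block1": ["h1", "h2", "h3"],
    "block2": ["h4", "h5", "h6"],
    "block3": ["h7", "h8", "h9"],
    "block4": ["h10", "h11", "h12"],
}
print(one_stage_cluster_sample(blocks, 2, random.Random(0)))
print(two_stage_cluster_sample(blocks, 2, 1, random.Random(0)))
```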

31
Q

Fitting data transformation

A

Find a function that describes common features of the training and testing data, then apply the function to that data; one example is the Fourier transform

32
Q

Data velocity

A

The speed at which new data is added to the warehouse; the speed at which a user accesses existing data in the warehouse

33
Q

Data steward

A

Job role involving the use of an organization’s data governance processes to ensure the fitness of data elements

34
Q

Nominal scale

A

Items in the scale are differentiated by name alone

35
Q

Design of Experiments (DOE)

A

An approach to problem solving involving the collection of data that supports valid, defensible, and supportable conclusions

36
Q

Ordinal scale

A

Items in the scale are differentiated by rank with no degree of difference between items specified

37
Q

Quota sampling

A

A non-probabilistic version of stratified sampling where judgment or convenience identifies what kinds of individuals are chosen for the sample

38
Q

Missing data transformation

A

Replace missing values with placeholder values, remove rows or columns that contain missing values, or use a statistical method to infer (impute) the missing values
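A minimal Python sketch of the three approaches, assuming rows are dictionaries with None marking a missing value and using mean imputation as the statistical method (one of many). The column names, data, and function names are hypothetical; each function returns new rows and leaves the input unchanged.

```python
import statistics

rows = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 61000},
    {"age": 29, "income": None},
    {"age": 41, "income": 58000},
]

def drop_incomplete(rows):
    """Remove every row that has at least one missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]

def fill_placeholder(rows, column, placeholder):
    """Replace missing values in one column with a fixed placeholder."""
    return [{**r, column: placeholder if r[column] is None else r[column]} for r in rows]

def impute_mean(rows, column):
    """Replace missing values in one column with the mean of the observed values."""
    mean = statistics.mean(r[column] for r in rows if r[column] is not None)
    return [{**r, column: mean if r[column] is None else r[column]} for r in rows]

print(len(drop_incomplete(rows)))               # 2
print(fill_placeholder(rows, "age", -1)[1])     # {'age': -1, 'income': 61000}
print(impute_mean(rows, "age")[1]["age"])       # mean of 34, 29, 41
```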

39
Q

Simple random sampling

A

Sample method where every individual in the population has the same probability of being chosen for the sample
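A minimal Python sketch of simple random sampling without replacement using the standard library's random.sample, under which each of N individuals is chosen with probability n/N. The function name is hypothetical; passing a seeded random.Random makes the draw reproducible.

```python
import random

def simple_random_sample(population, n, rng=random):
    """Draw n distinct individuals; every subset of size n is equally likely."""
    return rng.sample(population, n)

print(simple_random_sample(list(range(100)), 10, random.Random(7)))
```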

40
Q

Relational database

A

Database that organizes data into one or more tables of columns and rows, where each row contains a unique key identifying the row

41
Q

Systematic sampling

A

Sample method where individuals are chosen for the sample at a fixed interval (every k-th individual after a random start) from an ordered sampling frame; to be used only if the population is homogeneous
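A minimal Python sketch of systematic sampling, assuming the interval k is the frame size divided by the desired sample size (rounded down) and the random start is drawn from the first k positions. The function name is hypothetical. If the frame's ordering has a period that lines up with k, the sample can be badly biased, which is why the method suits homogeneous populations.

```python
import random

def systematic_sample(frame, n, rng=random):
    """Take every k-th individual from an ordered frame after a random start."""
    k = len(frame) // n          # sampling interval
    start = rng.randrange(k)     # random start in [0, k)
    return frame[start::k][:n]   # trim in case len(frame) is not a multiple of n

print(systematic_sample(list(range(100)), 10, random.Random(3)))
```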

42
Q

Accidental sampling; convenience sampling

A

Non-probabilistic sample method where chosen individuals are readily available and convenient

43
Q

Data governance

A

Controls that ensure the data entry meets precise standards

44
Q

Accuracy

A

The agreement between independent measurements and the true value of what is measured; lack of accuracy comes primarily from systematic error
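A minimal Python sketch contrasting accuracy with precision, assuming bias (mean measurement minus the true value) as the accuracy indicator and the sample standard deviation as the precision indicator. The reference mass, readings, and function name are hypothetical.

```python
import statistics

def accuracy_and_precision(measurements, true_value):
    """Bias reflects accuracy (systematic error); standard deviation reflects precision (random error)."""
    bias = statistics.mean(measurements) - true_value
    spread = statistics.stdev(measurements)
    return bias, spread

# Hypothetical repeated weighings of a 100.0 g reference mass
scale_a = [100.1, 99.9, 100.0, 100.2, 99.8]   # accurate, moderate spread
scale_b = [102.0, 102.1, 101.9, 102.0, 102.0] # precise but biased, so inaccurate
print(accuracy_and_precision(scale_a, 100.0))
print(accuracy_and_precision(scale_b, 100.0))
```

Scale B has the smaller spread but a bias of about 2 g, so it is more precise yet less accurate than scale A.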

45
Q

Unstructured data

A

Information that does not fit into a pre-defined repository and is not organized in an easily searchable manner

46
Q

Snowball sampling

A

Sampling method where individuals already in the sample are asked to identify new individuals to add to the sample
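A minimal Python sketch of snowball sampling, assuming referrals are known in advance as a dictionary (in practice they arrive as participants name others) and recruitment stops at a maximum sample size. Recruitment proceeds wave by wave (breadth-first). The network, names, and function name are hypothetical.

```python
from collections import deque

def snowball_sample(seeds, referrals, max_size):
    """Start from seed individuals and repeatedly add the people they refer,
    until max_size individuals are sampled or referrals run out."""
    sample, seen = [], set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < max_size:
        person = queue.popleft()
        sample.append(person)
        for referred in referrals.get(person, []):
            if referred not in seen:   # never recruit the same person twice
                seen.add(referred)
                queue.append(referred)
    return sample

# Hypothetical referral network: who each participant names
network = {
    "ana": ["ben", "cara"],
    "ben": ["dev"],
    "cara": ["ben", "eli"],
    "dev": [],
    "eli": ["fay"],
}
print(snowball_sample(["ana"], network, 5))  # ['ana', 'ben', 'cara', 'dev', 'eli']
```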

47
Q

Data collection strategy

A

Iterative process of determining data needs, reviewing existing data, setting priorities for data, agreeing on roles and responsibilities for collecting the data, producing the data, and determining if data meets all needs

48
Q

Minimax sampling

A

Sample method where the sampling ratio does not follow the population statistics in order to ease binary classification tasks

49
Q

Response surface methodology (RSM)

A

A way to explore the relationships between explanatory variables and one or more response variables through a design of experiments

50
Q

Data privacy

A

The relationship between the collection and dissemination of data, technology, the public expectation of privacy for collected data, and the legal and political issues surrounding them