CAP Data Flashcards

Question

Outlier data transformation

Answer 1

Apply a logarithm to the data or, if a value is erroneous, discard it from the data set entirely
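
A minimal Python sketch of both options. It assumes strictly positive numeric data, because the logarithm is undefined at zero and below. It also assumes a caller-supplied, hypothetical `is_erroneous` predicate that decides which values count as errors:

```python
import math


def log_transform(values):
    """Apply the natural log to compress the scale, pulling large outliers toward the bulk."""
    if any(v <= 0 for v in values):
        raise ValueError("log transform requires strictly positive values")
    return [math.log(v) for v in values]


def drop_erroneous(values, is_erroneous):
    """Discard values that the (hypothetical) is_erroneous predicate flags as errors."""
    return [v for v in values if not is_erroneous(v)]


skewed = [2.0, 3.0, 5.0, 1000.0]
print(log_transform(skewed))  # 1000.0 becomes about 6.9
print(drop_erroneous([5, -1, 7], lambda v: v < 0))  # [5, 7]
```

The base of the logarithm only rescales the result. Base 10 or base 2 would compress outliers in the same way.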

Answer 2

Data that provides information about other data

Answer 3

Uncertainty in a process due to things that are knowable in principle but not known while the experiment is being conducted

Answer 4

Database in which data is not organized into tables where each row contains a unique key identifying the row

Answer 5

The truthfulness and provenance of data contained in a warehouse

Answer 6

Sampling method where the population is divided into subpopulations (clusters) that are homogeneous with respect to one another but internally heterogeneous; either every individual in the randomly selected subpopulations forms the sample (one-stage), or a simple random sample is drawn within each selected subpopulation (two-stage)
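
A sketch of both variants follows. It assumes the clusters are given as a dictionary that maps a cluster name to its list of members. The school and pupil names are hypothetical:

```python
import random


def cluster_sample(clusters, n_clusters, per_cluster=None, seed=None):
    """Cluster sampling over a dict mapping cluster name -> list of members.

    One-stage (per_cluster is None): keep every member of each chosen cluster.
    Two-stage: draw a simple random sample of per_cluster members within each chosen cluster.
    """
    rng = random.Random(seed)
    chosen = rng.sample(list(clusters), n_clusters)  # randomly pick whole clusters
    sample = []
    for name in chosen:
        members = clusters[name]
        if per_cluster is None:
            sample.extend(members)
        else:
            sample.extend(rng.sample(members, per_cluster))
    return sample


schools = {"north": ["n1", "n2", "n3"], "south": ["s1", "s2", "s3"], "east": ["e1", "e2", "e3"]}
print(cluster_sample(schools, 2, seed=1))                 # one-stage: 6 pupils
print(cluster_sample(schools, 2, per_cluster=1, seed=1))  # two-stage: 2 pupils
```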

Answer 7

Find a function that describes features common to the training and testing data, then apply that function to both; one example is the Fourier transform
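
The Fourier example can be made concrete with a short sketch. It uses a naive discrete Fourier transform, which is O(n^2) and meant for illustration only. The first few magnitudes serve as a fixed-length feature vector. Keeping the first `n_features` magnitudes is an assumption. The point taken from the card is that the same function is applied to both the training and the testing signals:

```python
import cmath


def dft_magnitudes(signal):
    """Naive discrete Fourier transform; returns |X_k| for k = 0 .. n-1."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n)
    ]


def fourier_features(signal, n_features):
    """Keep the first n_features magnitudes as the feature vector (an illustrative choice)."""
    return dft_magnitudes(signal)[:n_features]


# Apply the same transformation to training and testing data.
train_signals = [[1, 0, -1, 0], [1, 1, 1, 1]]
test_signals = [[0, 1, 0, -1]]
train_features = [fourier_features(s, 3) for s in train_signals]
test_features = [fourier_features(s, 3) for s in test_signals]
print(train_features)
print(test_features)
```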

Answer 8

The speed at which new data is added to the warehouse; the speed at which a user accesses existing data in the warehouse

Answer 9

Job role involving the use of an organization's data governance processes to ensure fitness of data elements

Answer 10

Items in the scale are differentiated by name alone

Answer 11

An approach to problem solving involving the collection of data that supports valid, defensible, and supportable conclusions

Answer 12

Items in the scale are differentiated by rank with no degree of difference between items specified
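
On this kind of scale only the order matters. An order-based summary such as the median is therefore appropriate, while the arithmetic mean is not. The sketch below uses a hypothetical three-level satisfaction scale. It calls `median_low` so that the result is always an actual level:

```python
from statistics import median_low

# Hypothetical ordinal scale: the order is defined, the distance between levels is not.
LEVELS = ["low", "medium", "high"]
RANK = {level: i for i, level in enumerate(LEVELS)}


def median_level(responses):
    """Median of ordinal responses; uses only the ordering, never the size of the gaps."""
    return LEVELS[median_low(RANK[r] for r in responses)]


print(RANK["high"] > RANK["low"])                         # True: ranking is meaningful
print(median_level(["high", "low", "medium", "medium"]))  # medium
```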

Answer 13

A non-probabilistic version of stratified sampling where judgment or convenience identifies what kinds of individuals are chosen for the sample
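
The sketch below shows quota sampling in which convenience decides who is chosen. The per-group quotas would normally mirror population proportions. Here, both the quotas and the arrivals are hypothetical. Individuals are accepted in arrival order, with no randomization:

```python
def quota_sample(arrivals, group_of, quotas):
    """Accept people in the order they arrive until every group's quota is met.

    There is no random selection: who gets in depends on who shows up first,
    which is what makes quota sampling non-probabilistic.
    """
    filled = {group: 0 for group in quotas}
    sample = []
    for person in arrivals:
        group = group_of(person)
        if group in quotas and filled[group] < quotas[group]:
            sample.append(person)
            filled[group] += 1
        if filled == quotas:  # all quotas met
            break
    return sample


# Hypothetical passers-by at a survey stand: (name, age band)
arrivals = [("p1", "18-34"), ("p2", "18-34"), ("p3", "18-34"), ("p4", "35+"), ("p5", "35+")]
print(quota_sample(arrivals, lambda p: p[1], {"18-34": 2, "35+": 1}))
```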

Answer 14

Either replacing missing values with placeholder values, removing rows or columns with missing values, or using a statistical method to infer the missing value
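
The three strategies are sketched below. Rows are stored as dictionaries, and missing values are represented as `None`; both are assumptions. Mean imputation stands in for the broader family of statistical methods, such as regression or nearest-neighbour imputation:

```python
from statistics import mean


def fill_placeholder(rows, col, placeholder):
    """Replace missing values (None) in one column with a fixed placeholder."""
    return [{**r, col: placeholder if r[col] is None else r[col]} for r in rows]


def drop_missing(rows, col):
    """Remove rows whose value in col is missing."""
    return [r for r in rows if r[col] is not None]


def impute_mean(rows, col):
    """Infer missing values statistically, here with the mean of observed values."""
    observed = [r[col] for r in rows if r[col] is not None]
    fill = mean(observed)
    return [{**r, col: fill if r[col] is None else r[col]} for r in rows]


rows = [{"age": 30}, {"age": None}, {"age": 40}]
print(fill_placeholder(rows, "age", -1))  # [{'age': 30}, {'age': -1}, {'age': 40}]
print(drop_missing(rows, "age"))          # [{'age': 30}, {'age': 40}]
print(impute_mean(rows, "age"))           # [{'age': 30}, {'age': 35}, {'age': 40}]
```

Each function returns new rows and leaves the input list unchanged.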

Answer 15

Sampling method where every individual in the population has the same probability of being chosen for the sample
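
Python's `random.sample` draws without replacement, so every subset of size k is equally likely. This gives each of the N individuals the same inclusion probability, k/N. The sketch below also estimates those inclusion rates empirically. The trial count is an arbitrary choice:

```python
import random


def simple_random_sample(population, k, seed=None):
    """Draw k individuals without replacement; every k-subset is equally likely."""
    return random.Random(seed).sample(population, k)


def inclusion_rates(population, k, trials=20_000, seed=0):
    """Estimate each individual's chance of being chosen by repeated sampling."""
    rng = random.Random(seed)
    counts = {p: 0 for p in population}
    for _ in range(trials):
        for p in rng.sample(population, k):
            counts[p] += 1
    return {p: c / trials for p, c in counts.items()}


print(simple_random_sample(["a", "b", "c", "d", "e"], 2, seed=42))
print(inclusion_rates(["a", "b", "c", "d", "e"], 2))  # each close to 2/5 = 0.4
```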

Answer 16

Database that organizes data into one or more tables of columns and rows, where each row contains a unique key identifying the row
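
Here is a minimal sketch using Python's built-in `sqlite3` module; the `customer` table and its rows are hypothetical. Declaring `customer_id` as the primary key means the database itself rejects a second row with the same key:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO customer (customer_id, name) VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace")],
)

# A row whose key already exists violates the uniqueness of the primary key.
try:
    conn.execute("INSERT INTO customer (customer_id, name) VALUES (?, ?)", (1, "Duplicate"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

rows = conn.execute("SELECT customer_id, name FROM customer ORDER BY customer_id").fetchall()
print(duplicate_rejected)  # True
print(rows)                # [(1, 'Ada'), (2, 'Grace')]
conn.close()
```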

Answer 17

Sampling method where individuals are chosen for the sample at a fixed interval from an ordered sampling frame; to be used only if the population is homogeneous
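
The sketch below shows systematic sampling with a random start. The interval `k` is typically the frame size divided by the desired sample size. The homogeneity caveat matters because any periodic pattern in the ordering that lines up with `k` would bias the sample:

```python
import random


def systematic_sample(frame, k, seed=None):
    """Choose a random start in [0, k), then take every k-th member of the ordered frame."""
    if k < 1:
        raise ValueError("sampling interval k must be at least 1")
    start = random.Random(seed).randrange(k)
    return frame[start::k]


frame = list(range(1, 101))  # hypothetical ordered list of 100 member IDs
print(systematic_sample(frame, 10, seed=7))  # 10 IDs, each 10 apart
```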

Answer 18

Non-probabilistic sampling method where chosen individuals are readily available and convenient

Answer 19

Controls that ensure the data entry meets precise standards

Answer 20

The agreement between measurements and the true value of what is measured; inaccuracy comes primarily from systematic error
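
A small numeric illustration uses hypothetical repeated readings of a reference mass known to be 10.0 kg. The mean offset from the true value estimates the systematic error, which limits accuracy. The standard deviation reflects random scatter, which limits precision. Readings can be tightly clustered and still inaccurate:

```python
from statistics import mean, stdev

TRUE_MASS_KG = 10.0
readings = [10.4, 10.6, 10.5, 10.5]  # hypothetical scale readings


def bias(measurements, true_value):
    """Systematic error estimate: average offset from the true value."""
    return mean(measurements) - true_value


def spread(measurements):
    """Random error estimate: sample standard deviation of the measurements."""
    return stdev(measurements)


print(round(bias(readings, TRUE_MASS_KG), 3))  # 0.5
print(round(spread(readings), 3))              # 0.082
```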

Answer 21

Information that does not fit into a pre-defined repository and is not organized in an easily searchable manner

Answer 22

Sampling method where individuals already in the sample are asked to identify new individuals to add to the sample
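
Snowball sampling can be modelled as a traversal of a referral network. This sketch assumes each person's referrals are known up front as a dictionary of hypothetical data. People are added wave by wave, breadth-first, until a size cap is reached:

```python
from collections import deque


def snowball_sample(referrals, seeds, max_size):
    """Grow a sample by following referrals from current members, breadth-first."""
    sample = []
    queue = deque(seeds)
    seen = set(seeds)
    while queue and len(sample) < max_size:
        person = queue.popleft()
        sample.append(person)
        for referred in referrals.get(person, []):
            if referred not in seen:  # each person joins at most once
                seen.add(referred)
                queue.append(referred)
    return sample


referrals = {"ana": ["ben", "cai"], "ben": ["dee"], "cai": ["dee", "eli"], "dee": [], "eli": ["ana"]}
print(snowball_sample(referrals, ["ana"], 10))  # ['ana', 'ben', 'cai', 'dee', 'eli']
```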

Answer 23

Iterative process of determining data needs, reviewing existing data, setting priorities for data, agreeing on roles and responsibilities for collecting the data, producing the data, and determining if data meets all needs

Answer 24

Sampling method where the sampling ratio deliberately departs from the population proportions, for example to balance the classes in a binary classification task
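
Random oversampling of the minority class is one common way to make the sample ratio differ from the population ratio. Undersampling the majority class is the mirror image. The sketch below assumes exactly two classes and a caller-supplied `label_of` function. The transaction records are hypothetical:

```python
import random


def random_oversample(records, label_of, seed=None):
    """Duplicate randomly chosen minority-class records until both classes are equal in size."""
    rng = random.Random(seed)
    by_label = {}
    for r in records:
        by_label.setdefault(label_of(r), []).append(r)
    if len(by_label) != 2:
        raise ValueError("expected exactly two classes")
    small, large = sorted(by_label.values(), key=len)
    extra = [rng.choice(small) for _ in range(len(large) - len(small))]
    return large + small + extra


records = [(f"txn{i}", "legit") for i in range(8)] + [("txn8", "fraud"), ("txn9", "fraud")]
balanced = random_oversample(records, lambda r: r[1], seed=0)
print(sum(r[1] == "fraud" for r in balanced), sum(r[1] == "legit" for r in balanced))  # 8 8
```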

Answer 25

A way to explore the relationships between explanatory variables and one or more response variables through designed experiments

Answer 26

The relationship between the collection and dissemination of data, technology, the public expectation of privacy, and the legal and political issues surrounding them