Theory Flashcards

1
Q

What is the impact of faulty data?

A
2
Q

Why doesn't a larger amount of data compensate for faulty data?

A
3
Q

What are the risks of dropping faulty data?

A
4
Q

What is data-driven management?

A
5
Q

What are the steps of data-driven decision-making?

A
6
Q

Why is the complexity of data analysis increasing?

A
7
Q

What are the stages toward a data-driven culture?

A
8
Q

What does data-driven decision-making depend on?

A
9
Q

What are the points in the data pipeline that could cause problems to data analysis?

A
10
Q

What is the snowball effect?

A
11
Q

What is the definition of data quality?

A
12
Q

What are the main causes for poor data quality?

A
13
Q

When can we say we have data quality?

A
14
Q

What is the main problem when transferring data from the operational system to the analysis system?

A

There could be a loss of context.

15
Q

When do companies realize they have problems with data quality?

A
16
Q

Why do we need adequate architecture to analyze the data?

A
17
Q

Why is data preparation important?

A
18
Q

What are the foundations of a data-driven company?

A
19
Q

What is data governance and why do we need it?

A
20
Q

Why do we consider data governance a technical and organizational discipline?

A
21
Q

What is the corporate governance of a company?

A
22
Q

How is corporate governance reflected in IT governance? And where does data governance stand in this context?

A
23
Q

What are the goals of data governance?

A
24
Q

What are the main components of data governance?

A
25
Q

What is master data management?

A
26
Q

Why do we give priority to the quality of static data?

A
27
Q

What is metadata, and why is it important?

A
28
Q

What is the first step of data quality implementation?

A

Creating a reliable data catalog.

29
Q

What is a data catalog, and why is it important?

A
31
Q

What are the roles and responsibilities in data quality?

A
33
Q

What are the goals of data governance?

A
34
Q

Discuss some data governance actions.

A
35
Q

What are the phases of data quality management?

A
36
Q

Is data quality measurable? How?

A
37
Q

When do we have a correct representation of the data?

A
38
Q

What are some causes of poor data quality? Describe each of them.

A
39
Q

What are some of the main dimensions of data quality? Which of them depend on usage, and which do not?

A
40
Q

According to Wang and Strong, how can we determine the dimensions?

A
41
Q

What are the most used objective dimensions?

A
42
Q

What is a dimension in data quality? What types of dimensions are there?

A
43
Q

Define the accuracy dimension. What are the types of this dimension?

A
44
Q

What are the approaches for determining syntactic accuracy?

A
45
Q

What are the ways we can measure syntactic accuracy?

A
46
Q

What problem does semantic accuracy raise? What issues need to be addressed to solve it?

A
47
Q

What is the consistency dimension? What are the semantic rules?

A

The consistency dimension captures the violation of semantic
rules defined over (a set of) data items, where items can be tuples
of relational tables or records in a file.

Semantic rules can be:
- integrity constraints
- data edits
- business rules
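A minimal Python sketch of how consistency could be checked: each semantic rule is written as a predicate over a record, and consistency is taken as the share of records that violate no rule. The record layout, the three rules (`age_domain`, `married_min_age`, `salary_cap`), and their thresholds are hypothetical illustrations, one per kind of rule listed above, not rules taken from the source.

```python
from typing import Callable

# Assumption: a record is a dict mapping attribute names to values.
Record = dict
Rule = Callable[[Record], bool]

# Hypothetical semantic rules, one for each kind listed above.
RULES: dict[str, Rule] = {
    # Integrity constraint (domain constraint): age must be in a plausible range.
    "age_domain": lambda r: 0 <= r["age"] <= 120,
    # Data edit: a married person must be at least 14 years old.
    "married_min_age": lambda r: r["marital_status"] != "married" or r["age"] >= 14,
    # Business rule: salaries must not exceed a company-wide cap.
    "salary_cap": lambda r: r["salary"] <= 200_000,
}


def violations(records: list[Record], rules: dict[str, Rule]) -> list[tuple[int, str]]:
    """Return (record index, rule name) for every rule a record violates."""
    return [
        (i, name)
        for i, record in enumerate(records)
        for name, rule in rules.items()
        if not rule(record)
    ]


def consistency(records: list[Record], rules: dict[str, Rule]) -> float:
    """Share of records that violate none of the rules."""
    if not records:
        return 1.0
    inconsistent = {i for i, _ in violations(records, rules)}
    return 1 - len(inconsistent) / len(records)


people = [
    {"age": 30, "marital_status": "married", "salary": 50_000},
    {"age": 10, "marital_status": "married", "salary": 0},
    {"age": 150, "marital_status": "single", "salary": 300_000},
]
print(violations(people, RULES))
# [(1, 'married_min_age'), (2, 'age_domain'), (2, 'salary_cap')]
print(consistency(people, RULES))  # about 0.333: only record 0 is consistent
```

Reporting which rule each record breaks, not just a score, makes it easier to tell a domain error from a broken business rule.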

48
Q

What are the types of integrity constraint?

A

It is possible to distinguish two main categories of integrity constraints:

  • Intrarelation constraints: they can concern single attributes (also called domain constraints) or multiple attributes of a relation.
  • Interrelation constraints: they involve attributes of more than one relation.
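A minimal Python sketch of the two categories, assuming relations are lists of dicts. The Employee and Department relations, their attributes, and the three constraints are hypothetical examples: a domain constraint on one attribute, an intrarelation constraint over two attributes of the same tuple, and an interrelation constraint that reads attributes from both relations.

```python
# Two hypothetical relations, each a list of tuples represented as dicts.
employee = [
    {"id": 1, "age": 34, "birth_year": 1990, "hire_year": 2015, "dept": "D1", "salary": 40_000},
    {"id": 2, "age": -5, "birth_year": 1985, "hire_year": 2010, "dept": "D2", "salary": 30_000},
    {"id": 3, "age": 40, "birth_year": 1984, "hire_year": 1980, "dept": "D1", "salary": 90_000},
]
department = [
    {"code": "D1", "max_salary": 80_000},
    {"code": "D2", "max_salary": 50_000},
]


# Intrarelation, single attribute (domain constraint): age is non-negative.
def domain_ok(e):
    return e["age"] >= 0


# Intrarelation, multiple attributes of the same relation:
# an employee cannot be hired before being born.
def multi_attribute_ok(e):
    return e["hire_year"] >= e["birth_year"]


# Interrelation: the salary in Employee must not exceed the max_salary
# stored in Department for the employee's department.
max_salary = {d["code"]: d["max_salary"] for d in department}


def interrelation_ok(e):
    return e["salary"] <= max_salary[e["dept"]]


def failing_ids(constraint):
    """Ids of Employee tuples that violate the given constraint."""
    return [e["id"] for e in employee if not constraint(e)]


print(failing_ids(domain_ok))           # [2]
print(failing_ids(multi_attribute_ok))  # [3]
print(failing_ids(interrelation_ok))    # [3]
```

The first two checks can be evaluated one Employee tuple at a time; the interrelation check cannot be decided without also reading the Department relation.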
49
Q

What types of dependencies can be found in integrity constraints?

A

Among integrity constraints, the following main types of dependencies can be considered:

Key dependency. This is the simplest type of dependency. Given a relational instance r, defined over a set of attributes, we say that for a subset K of the attributes, a key dependency holds in r if no two rows of r have the same K-values.

Example: SocialSecurityNumber can serve as a key in any relational instance of a relational schema Person.
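Checking a key dependency amounts to looking for two rows with the same K-values. A minimal Python sketch, assuming a relation is a list of dicts; `key_holds` and the sample Person rows are hypothetical.

```python
def key_holds(r, key):
    """True if no two rows of r have the same values on the attributes in key."""
    seen = set()
    for row in r:
        k_values = tuple(row[a] for a in key)
        if k_values in seen:
            return False  # a second row with the same K-values
        seen.add(k_values)
    return True


# Hypothetical instance of Person with a duplicated SocialSecurityNumber.
person = [
    {"SocialSecurityNumber": "111", "Name": "Ann"},
    {"SocialSecurityNumber": "222", "Name": "Bob"},
    {"SocialSecurityNumber": "111", "Name": "Anne"},
]
print(key_holds(person, ["SocialSecurityNumber"]))          # False
print(key_holds(person, ["SocialSecurityNumber", "Name"]))  # True
```

The duplicate "111" is exactly the kind of error a key dependency exposes: either the number or one of the two rows is wrong.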

Inclusion dependency. This is a very common type of constraint and is also known as a referential constraint. An inclusion dependency over a relational instance r states that some columns of r are contained in other columns of r or in the instances of another relational instance.

Example: A foreign key constraint is an example of inclusion dependency, stating that the referring columns in one relation must be contained in the primary key columns of the referenced relation.
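An inclusion dependency r[X] ⊆ s[Y] can be checked by collecting the Y-values of the referenced relation and looking for rows of r whose X-values are missing. A minimal Python sketch, assuming relations are lists of dicts; `inclusion_violations` and the Student/Enrollment data are hypothetical. Here r and s are different relations, as in a foreign key; passing the same list twice covers the case where the columns belong to r itself.

```python
def inclusion_violations(r, x, s, y):
    """Rows of r whose values on attributes x do not appear among the values of s on y.

    The inclusion dependency r[x] ⊆ s[y] holds exactly when this list is empty.
    """
    referenced = {tuple(row[a] for a in y) for row in s}
    return [row for row in r if tuple(row[a] for a in x) not in referenced]


# Hypothetical relations: Enrollment.student_id is meant to be a foreign key to Student.id.
student = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
enrollment = [
    {"student_id": 1, "course": "Databases"},
    {"student_id": 3, "course": "AI"},  # dangling reference: there is no student 3
]
print(inclusion_violations(enrollment, ["student_id"], student, ["id"]))
# [{'student_id': 3, 'course': 'AI'}]
```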

Functional dependency. Given a relational instance r, let X and Y be two nonempty sets of attributes in r. r satisfies the functional
dependency X → Y if the following holds for every pair of tuples t1 and t2 in r:

if t1.X = t2.X, then t1.Y = t2.Y
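The definition translates almost literally into code. A minimal Python sketch, assuming a relation is a list of dicts keyed by attribute name; `fd_violations` and the Address data are hypothetical. Instead of comparing every pair of tuples, it groups rows by their X-values and compares each row with the first row of its group, which still finds a violation whenever one exists.

```python
def fd_violations(r, X, Y):
    """Check the functional dependency X -> Y on relation r (a list of dicts).

    Each row is compared with the first row that has the same X-values;
    a pair (j, i) is reported when row i disagrees with that first row j on Y.
    The result is empty exactly when r satisfies X -> Y.
    """
    first_by_x = {}  # X-values -> (index of first row with them, its Y-values)
    violations = []
    for i, row in enumerate(r):
        x_val = tuple(row[a] for a in X)
        y_val = tuple(row[a] for a in Y)
        if x_val not in first_by_x:
            first_by_x[x_val] = (i, y_val)
        else:
            j, first_y = first_by_x[x_val]
            if y_val != first_y:
                violations.append((j, i))
    return violations


# Hypothetical relation where ZipCode -> City is expected to hold.
address = [
    {"ZipCode": "20133", "City": "Milano"},
    {"ZipCode": "00185", "City": "Roma"},
    {"ZipCode": "20133", "City": "Milan"},  # same zip, different city spelling
]
print(fd_violations(address, ["ZipCode"], ["City"]))  # [(0, 2)]
```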

50
Q

What is the difference between relaxed and functional dependencies?

A
51
Q

What are data edits?

A
52
Q

What is the timeliness dimension?

A
53
Q

What are the currency and volatility data quality dimensions?

A
54
Q

What is the accessibility dimension? What is the importance of considering it?

A

Accessibility measures the ability of users to access the data given
their own culture, physical status/functions, and the technologies
available to them.

55
Q

What are the redundancy, readability, and usefulness dimensions?

A

Redundancy, minimality, compactness, and conciseness refer to
the capability of representing the aspects of the reality of interest
with the minimal use of informative resources.
Readability, comprehensibility, clarity, and simplicity refer to the ease
with which users can understand and use information.
Usefulness relates to the advantage the user gains from the use
of information.

56
Q

How can we classify the data quality dimensions according to Wang and Strong?

A

Intrinsic data quality, capturing the quality that data has on its own. As an example, accuracy is a quality dimension
that is intrinsic to data.
Contextual data quality considers the context where data are used. As an example, the completeness dimension is
strictly related to the context of the task.
Representational data quality captures aspects related to the quality of data representation, e.g., interpretability.
Accessibility data quality is related to the accessibility of data and to a further nonfunctional property of data access,
namely, the level of security.

57
Q

What does the accuracy dimension mean for schema quality?

A
  • Correctness with respect to the model concerns the correct use
    of the constructs of the model in representing requirements.
  • Correctness with respect to requirements concerns the correct
    representation of the requirements in terms of the model constructs.
58
Q

What is the completeness dimension of schema quality?

A

Completeness measures the extent to which a conceptual
schema includes all the conceptual elements necessary to meet
some specified requirements.
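If the requirements can be listed as named conceptual elements (an assumption; in practice, mapping requirements to schema elements usually takes human judgment), completeness can be approximated as the share of required elements that appear in the schema. `schema_completeness` and the element names are hypothetical.

```python
def schema_completeness(required, schema):
    """Share of required conceptual elements present in the schema, plus the missing ones."""
    required, schema = set(required), set(schema)
    missing = sorted(required - schema)
    score = 1 - len(missing) / len(required) if required else 1.0
    return score, missing


required = {"Employee", "Department", "Project", "WorksOn"}  # hypothetical requirements
schema = {"Employee", "Department", "Project", "Address"}    # hypothetical ER schema
print(schema_completeness(required, schema))  # (0.75, ['WorksOn'])
```

The extra Address element neither raises nor lowers the score: elements the requirements do not need are a minimality concern rather than a completeness one.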

59
Q

When is a schema considered minimal?

A

A schema is minimal if every part of the requirements is represented
only once in the schema. In other words, it is not possible to
eliminate any element from the schema without compromising
the information content.
Redundancy avoidance and normalisation should be applied in
order to guarantee minimality.

60
Q

What is the readability dimension related to schema quality?

A

Intuitively, a schema is readable whenever it represents the meaning of
the reality represented by the schema in a clear way for its intended use.
This simple, qualitative definition is not easy to translate into a more formal
one, since the evaluation expressed by the word clear conveys some
elements of subjectivity.
With regard to the diagrammatic representation (e.g., an ER diagram),
readability can be expressed by a number of aesthetic criteria that
human beings adopt in drawing diagrams: e.g., crossings between lines
should be avoided as far as possible, graphic symbols should be
embedded in a grid, lines should be made of horizontal or vertical
segments, the total area of the diagram should be minimized.
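One of these aesthetic criteria, avoiding line crossings, is easy to quantify. The sketch below assumes each diagram line is a straight segment given by two endpoint coordinates (a simplification, since real ER layouts may use polylines) and counts pairs of segments that cross at an interior point using the standard orientation test. The function names and the sample layout are hypothetical.

```python
from itertools import combinations

Point = tuple[float, float]
Segment = tuple[Point, Point]


def orientation(p: Point, q: Point, r: Point) -> int:
    """Sign of the cross product (q - p) x (r - p): 1 left turn, -1 right turn, 0 collinear."""
    cross = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (cross > 0) - (cross < 0)


def crosses(s: Segment, t: Segment) -> bool:
    """True if s and t cross at a point interior to both segments.

    Segments that merely touch (shared endpoints, T-junctions) or are collinear
    produce a zero orientation and are not counted as crossings.
    """
    (a, b), (c, d) = s, t
    return (orientation(a, b, c) * orientation(a, b, d) < 0
            and orientation(c, d, a) * orientation(c, d, b) < 0)


def count_crossings(segments: list[Segment]) -> int:
    """Number of pairs of diagram lines that cross each other."""
    return sum(crosses(s, t) for s, t in combinations(segments, 2))


# Hypothetical layout: two diagonals of a square cross once; the side only shares endpoints.
segments = [((0, 0), (2, 2)), ((0, 2), (2, 0)), ((0, 0), (0, 2))]
print(count_crossings(segments))  # 1
```

A layout tool could compare this count across alternative placements of the same diagram and keep the one with fewer crossings.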

61
Q

Do data quality dimensions work for semi-structured data?

A
62
Q

What is data provenance? What is its relation to data trustworthiness?

A
63
Q

What are the quality assessment techniques?

A
64
Q

How is the questionnaire technique applied? What are its advantages?

A
65
Q

What is the general approach for building questionnaires?

A
66
Q

Why might data experts perceive the data as low quality while users perceive it as high quality? What about the other way around?

A
67
Q

Why is the average not a good way to measure the timeliness dimension of data quality?

A

The average is not ideal for measuring timeliness in data quality dimensions for several reasons:

1.	Masking Outliers: The average can be distorted by extreme values (outliers). For instance, if most data is timely but a few records are extremely late, the average may not accurately reflect the general timeliness.
2.	Lack of Distribution Insight: The average doesn’t show how data is distributed. Two datasets could have the same average timeliness but vastly different distributions, where one might have most data on time and a few late, while the other has consistent delays.
3.	Sensitivity to Skew: If the timeliness data is skewed (i.e., not symmetrically distributed), the average will not provide an accurate central tendency. In skewed distributions, the median or percentiles are often better indicators.
4.	No Indication of Variability: The average doesn’t capture variability or consistency in timeliness. For data quality, it’s important to know not just the average timeliness but how often data arrives on time or is delayed.

In practice, medians, percentiles, or histograms are often more effective in assessing timeliness, as they give a clearer picture of both central tendency and the spread of data delays.
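The points above can be seen on a small example. A minimal Python sketch using the standard `statistics` module; the two delay series and the 2-hour deadline are hypothetical. Both feeds have the same mean delay, yet the median, the estimated 90th percentile, and the share of on-time records tell them apart.

```python
import statistics


def timeliness_summary(delays_h, deadline_h):
    """Summarize delivery delays (in hours) against an on-time deadline.

    deadline_h is a hypothetical threshold chosen for the example.
    """
    return {
        "mean": statistics.mean(delays_h),
        "median": statistics.median(delays_h),
        # Estimated 90th percentile (default 'exclusive' method; needs at least 2 values).
        "p90": statistics.quantiles(delays_h, n=10)[-1],
        # Share of records that arrived within the deadline.
        "on_time_share": sum(d <= deadline_h for d in delays_h) / len(delays_h),
    }


# Hypothetical delays for two data feeds with the same mean (11 hours).
mostly_on_time = [1] * 18 + [2, 200]  # almost all timely, one extreme outlier
always_late = [11] * 20               # consistently delayed

a = timeliness_summary(mostly_on_time, deadline_h=2)
b = timeliness_summary(always_late, deadline_h=2)
print(a["mean"], b["mean"])                    # 11 11
print(a["median"], b["median"])                # 1.0 11.0
print(a["on_time_share"], b["on_time_share"])  # 0.95 0.0
```

The single 200-hour outlier inflates the mean of the first feed to match the second, while the median and the on-time share show that the first feed is timely for 95% of its records and the second for none.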