Introduction to Data Science Challenges Flashcards

1
Q

Finding data

A
• There may be hundreds or thousands of tables.

• There may be many different entities, most of them of little relevance to the question at hand.

2
Q

Transforming data

A
• Reorganizing data, filtering, etc.

• Extracting relevant features.
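
As a minimal sketch of the two steps above, assuming a hypothetical list of order records (the field names and values are invented for illustration), filtering and feature extraction might look like this in Python:

```python
from datetime import date

# Hypothetical raw records; the field names are illustrative assumptions.
orders = [
    {"customer": "a", "amount": 120.0, "day": date(2024, 1, 6), "status": "paid"},
    {"customer": "b", "amount": 35.5, "day": date(2024, 1, 8), "status": "cancelled"},
    {"customer": "a", "amount": 80.0, "day": date(2024, 1, 9), "status": "paid"},
]

def transform(records):
    """Filter out unpaid orders and derive a simple feature per order."""
    paid = [r for r in records if r["status"] == "paid"]  # filtering
    return [
        {
            "customer": r["customer"],
            "amount": r["amount"],
            # Extracted feature: Saturday (5) or Sunday (6).
            "is_weekend": r["day"].weekday() >= 5,
        }
        for r in paid
    ]

features = transform(orders)
```

The raw date is not used directly; the derived `is_weekend` flag is the kind of feature a model can use more easily.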

3
Q

Dealing with data

A

• Dealing with big data

• Dealing with streaming data

4
Q

Data quality

A

• Data may be incomplete, invalid, inconsistent, imprecise, and/or outdated.
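
A rough sketch of how some of these problems can be checked per record. The record layout (`age`, `updated`) and the thresholds are assumptions made up for this example, not part of the course material:

```python
from datetime import date

def quality_issues(record, today, max_age_days=365):
    """Return a list of data-quality problems found in one record."""
    issues = []

    age = record.get("age")
    if age is None:
        issues.append("incomplete: age missing")
    elif not 0 <= age <= 130:
        issues.append("invalid: age out of range")

    updated = record.get("updated")
    if updated is None:
        issues.append("incomplete: updated missing")
    elif (today - updated).days > max_age_days:
        issues.append("outdated: not updated recently")

    return issues

# Usage: a record with an impossible age and a stale timestamp.
print(quality_issues({"age": 200, "updated": date(2020, 1, 1)}, date(2024, 1, 1)))
```

Inconsistency and imprecision usually need checks across records or against a reference source, so they are not covered by this per-record sketch.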

5
Q

Fitting the data

A

Overfitting / underfitting
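
A small illustration, assuming invented toy data that roughly follows y = 2x with noise. Three hypothetical models are compared: a constant (too simple), a least-squares line, and a memorizer that returns the value of the nearest training point (too flexible):

```python
def mse(model, xs, ys):
    """Mean squared error of a model (a function x -> prediction)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit_constant(xs, ys):
    """Underfits: ignores x and always predicts the mean of y."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """Ordinary least-squares straight line."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)

def fit_memorizer(xs, ys):
    """Overfits: predicts the y of the nearest training point (1-nearest neighbour)."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

# Invented data: roughly y = 2x plus noise.
train_x, train_y = [0, 1, 2, 3, 4, 5], [1, 1, 5, 5, 9, 9]
test_x, test_y = [0.4, 1.4, 2.4, 3.4, 4.4], [0, 4, 4, 8, 8]

results = {}
for name, fit in [("constant", fit_constant), ("line", fit_line), ("memorizer", fit_memorizer)]:
    model = fit(train_x, train_y)
    results[name] = (mse(model, train_x, train_y), mse(model, test_x, test_y))
    print(f"{name:10s} train MSE = {results[name][0]:.2f}  test MSE = {results[name][1]:.2f}")
```

On this toy data the memorizer reaches zero training error yet does worse than the line on the held-out points (overfitting), while the constant model is poor on both sets (underfitting).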

6
Q

Dealing with concept drift

A

Do Nothing (Static Model). The most common approach is to not handle drift at all and assume that the data does not change. …

Periodically Re-Fit. …

Periodically Update. …

Weight Data. …

Learn The Change. …

Detect and Choose Model. …

Data Preparation.
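
To make the contrast between the first two strategies concrete, here is a minimal sketch on an invented stream whose level jumps from 10 to 20. The "model" is just a mean, and the re-fit variant re-estimates it on a sliding window at every step, which is the extreme case of periodic re-fitting; the function names and window size are assumptions for illustration:

```python
from collections import deque

def static_error(stream, n_train):
    """Fit once on the first n_train values, then never update."""
    model = sum(stream[:n_train]) / n_train
    return sum(abs(model - y) for y in stream[n_train:])

def refit_error(stream, n_train, window=5):
    """Re-fit on the most recent `window` values before each prediction."""
    recent = deque(stream[:n_train], maxlen=window)  # keeps only the last `window` values
    total = 0.0
    for y in stream[n_train:]:
        prediction = sum(recent) / len(recent)
        total += abs(prediction - y)
        recent.append(y)  # the oldest value drops out automatically
    return total

# Invented stream: the concept drifts after the first 10 observations.
stream = [10.0] * 10 + [20.0] * 10

print("static model, total absolute error:", static_error(stream, 10))
print("re-fitted model, total absolute error:", refit_error(stream, 10))
```

The static model stays wrong for as long as the new regime lasts, whereas the re-fitted model is only wrong until its window has filled with post-drift data.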

7
Q

Making results actionable

A

• Analysis results need to be relevant, specific, novel and clear.

8
Q

Ensuring fairness

A

Data science without prejudice: how to avoid unfair conclusions even if they are true?

9
Q

Ensuring accuracy

A

Data science without guesswork: how to answer questions with a guaranteed level of accuracy?

10
Q

Ensuring confidentiality

A

Data science that ensures confidentiality: how to answer questions without revealing secrets?

11
Q

Ensuring transparency

A

Data science that provides transparency: how to clarify answers such that they become indisputable?

12
Q

Ill-posed problems

A

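• A problem is well-posed if
− a solution exists and
− the solution is unique.
• Otherwise it is ill-posed. (Hadamard's classical definition adds a third condition: the solution depends continuously on the data.)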

13
Q

Why problems in data science are often ill-posed

A

− There may be many possible models explaining the observed phenomena.

− The (training) data set is just a sample.

− There may be noise (exceptional or incorrectly recorded instances) in the data set.

− The result needs to generalize to have predictive or explanatory value.
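
A tiny sketch of the first point, using invented observations: two different models reproduce every observed point exactly, so the data alone do not determine a unique solution, and the models disagree as soon as they are asked about an unseen input.

```python
# Invented observations: three points that lie on the line y = x.
observations = [(0, 0), (1, 1), (2, 2)]

def model_a(x):
    """The straight line through the observations."""
    return x

def model_b(x):
    """A cubic whose extra term vanishes at x = 0, 1 and 2."""
    return x + x * (x - 1) * (x - 2)

# Both models explain the sample perfectly ...
fits_a = all(model_a(x) == y for x, y in observations)
fits_b = all(model_b(x) == y for x, y in observations)

# ... but they generalize differently to an unseen input.
unseen = 3
print(fits_a, fits_b, model_a(unseen), model_b(unseen))
```

Choosing between such models requires something beyond the sample itself, for example a preference for simpler models or additional held-out data.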
