Unit 1 - What is Data Science? Flashcards

1
Q

Data Science

A

Data Science is an interdisciplinary field that is concerned with available data, specifically:
- collecting it,
- preparing it,
- processing it, and
- obtaining insight from it.

2
Q

Data mining

A

Data mining is the process of analysing a data set to recognise hidden patterns in it, a task known as pattern recognition.

This is done through analysis and model fitting, i.e. trying to find a model that represents the data or the process that generates the data.

3
Q

CRISP

A

CRISP is the cross-industry standard process model for data mining (often written CRISP-DM). It describes the data mining process as a sequence of phases.

4
Q

6 Phases of the CRISP model

A
  • Business understanding: understanding the business scenario that the data mining process will be performed for.
  • Data understanding: understanding the data involved in the task and defining what is and isn't needed.
  • Data preparation: preparing the data to make the data mining task less cumbersome and easier to achieve.
  • Modelling: fitting a model that performs the required task.
  • Evaluation: evaluating the model using metrics that are suitable for the task in hand (evaluating classification is different from evaluating regression or clustering).
  • Deployment: deploying the model to be utilised by the business.
5
Q

System testing

A

System testing checks whether a system as a whole is working as intended. It normally investigates the integration of the different components and whether any issues arise from that integration.

6
Q

4 Types of System Testing Faults

A
  • Accidental
  • Logical
  • Flow
  • Implementational
7
Q

Unit testing

A

Unit testing is testing an individual module or component of a system.

As with system testing, it is normally conducted to find logical or implementational errors in the intended functionality of the unit.

This type of testing is more prevalent and occurs several times in the life of the component, whenever there is a change to its functionality or code.
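
As an illustration of testing a single unit in isolation, here is a minimal sketch using Python's standard `unittest` module. The function `rescale_value` and its tests are hypothetical examples invented for this card, not part of the course material.

```python
import unittest


def rescale_value(value, minimum, maximum):
    """Hypothetical unit under test: map value from [minimum, maximum] to [0, 1]."""
    if maximum <= minimum:
        raise ValueError("maximum must be greater than minimum")
    return (value - minimum) / (maximum - minimum)


class RescaleValueTest(unittest.TestCase):
    def test_midpoint_maps_to_half(self):
        # Checks the intended functionality (a logical error would show up here).
        self.assertAlmostEqual(rescale_value(5, 0, 10), 0.5)

    def test_empty_range_is_rejected(self):
        # Checks an edge case (an implementational error would show up here).
        with self.assertRaises(ValueError):
            rescale_value(1, 2, 2)


# Run only this unit's tests; they would be re-run after every change to the unit.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(RescaleValueTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The tests exercise only `rescale_value`, so a failure points directly at that unit rather than at its integration with other components.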

8
Q

Model testing

A

Model testing depends on the task in hand, i.e. whether it is classification, clustering or regression.

Model testing measures the predictive performance of the model and its level of accuracy in performing the required task.
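
To show how the metric depends on the task, the sketch below computes accuracy for a classification task and mean squared error for a regression task. The choice of these two metrics, the function names, and the labels and numbers are assumptions made for illustration only.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true class labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Classification: made-up labels, three of four predictions are correct.
acc = accuracy(["spam", "ham", "spam", "ham"], ["spam", "ham", "ham", "ham"])

# Regression: made-up targets, squared errors are 1, 0 and 4.
mse = mean_squared_error([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])
```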

9
Q

Discriminative models

A

Models can be categorised by their intrinsic capabilities.

Discriminative models are those that are capable only of discriminating between the different classes.

10
Q

Generative models

A

These are another type of model, capable of generating synthetic data that resembles the data likely to arise in the task being dealt with.

These are more powerful than discriminative models, but they are more difficult to build and often need more computational power.

11
Q

Bayes Theorem

A

Bayes' Theorem is the backbone of Bayesian statistics and Bayesian models, which are usually contrasted with frequentist statistics. It defines the probability (P) of an event (H) conditioned on another event (E) as follows:

P(H | E) = P(E | H) · P(H) / P(E)
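
A small worked example may help fix the formula. The sketch below is illustrative only: the probabilities are invented, and P(E) is obtained from the law of total probability, which the card itself does not state.

```python
def posterior(p_e_given_h, p_h, p_e):
    """Bayes' Theorem: P(H | E) = P(E | H) * P(H) / P(E)."""
    if p_e <= 0:
        raise ValueError("P(E) must be positive")
    return p_e_given_h * p_h / p_e


# Hypothetical numbers, chosen only for illustration.
p_h = 0.01               # P(H): prior probability of the event H
p_e_given_h = 0.90       # P(E | H): probability of E when H holds
p_e_given_not_h = 0.05   # P(E | not H): probability of E when H does not hold

# Law of total probability: P(E) = P(E | H) P(H) + P(E | not H) P(not H)
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

p_h_given_e = posterior(p_e_given_h, p_h, p_e)
print(round(p_h_given_e, 4))
```

With these numbers the posterior is roughly 0.15: observing E raises the probability of H well above its prior of 0.01, but H remains unlikely because the prior is so small.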

12
Q

Precision of an attribute

A

Precision of a feature (taken from a measurement) is the closeness of repeated measurements of the same quantity to one another.

13
Q

Bias of an attribute

A

Bias is a systematic variation of measurements from the actual quantity being measured.
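
The difference between bias and precision can be made concrete with a few repeated readings of a known quantity. This is a rough sketch: the readings and the true value are invented, and using the sample standard deviation as the measure of spread is an assumption, not something the card specifies.

```python
import statistics

true_value = 100.0                               # the actual quantity (hypothetical)
readings = [102.1, 101.9, 102.0, 102.2, 101.8]   # repeated measurements (hypothetical)

# Bias: systematic offset of the measurements from the true value.
bias = statistics.mean(readings) - true_value

# Precision: how close the measurements are to one another (smaller spread = more precise).
spread = statistics.stdev(readings)

print(f"bias = {bias:.2f}, spread = {spread:.2f}")
```

Here the readings agree closely with each other (high precision) but sit about 2 units above the true value (noticeable bias).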

14
Q

Stratification

A

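Sampling in a way that maintains the distribution of the underlying data, for example by keeping the proportions of each class the same in the sample as in the full data set.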
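
One possible way to do this is to sample the same fraction from each group. The sketch below assumes the distribution to preserve is that of a class label; the helper `stratified_sample` and the data are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(records, key, fraction, seed=0):
    """Draw the same fraction from every group so that group proportions are kept."""
    rng = random.Random(seed)   # fixed seed so the sketch is repeatable
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)
    sample = []
    for members in groups.values():
        k = round(len(members) * fraction)
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical data set: 80% of records have label "a", 20% have label "b".
data = [("a", i) for i in range(80)] + [("b", i) for i in range(20)]
sample = stratified_sample(data, key=lambda record: record[0], fraction=0.1)

count_a = sum(1 for label, _ in sample if label == "a")
count_b = sum(1 for label, _ in sample if label == "b")
```

The 10% sample contains 8 records labelled "a" and 2 labelled "b", the same 80/20 split as the full data set.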

15
Q

Standardisation

A

Standardisation involves calculating the mean x̄ and standard deviation sx of a feature x.

These statistics are calculated from the values of the feature in the data, treated as samples.

The following transformation is then applied to the feature: x′ = (x - x̄)/sx.

The new feature x′ calculated from x has a mean of 0 and a standard deviation of 1, i.e. it is standardised.
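
The transformation can be sketched in a few lines of Python. This assumes sx is the sample standard deviation (dividing by n - 1), which the card does not state explicitly; the function name and the data are made up.

```python
import statistics


def standardise(values):
    """Apply x' = (x - mean) / standard deviation to every value of a feature."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # assumption: sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("a constant feature cannot be standardised")
    return [(v - mean) / sd for v in values]


heights = [150.0, 160.0, 170.0, 180.0, 190.0]   # hypothetical feature values
z = standardise(heights)

# The standardised feature has mean 0 and standard deviation 1 (up to rounding error).
print(round(statistics.mean(z), 6), round(statistics.stdev(z), 6))
```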

16
Q

Rescaling

A

Rescaling is similar to standardisation, but it uses the minimum and the range of the feature instead of its mean and standard deviation.

The calculation is x′ = (x - xmin)/(xmax - xmin), where xmax and xmin are the maximum and minimum values that x might have, respectively.

This guarantees that the range or scale of the new feature is [0, 1].
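
A minimal sketch of the calculation, assuming xmin and xmax are taken from the observed data (the function name and values are hypothetical):

```python
def rescale(values):
    """Apply x' = (x - xmin) / (xmax - xmin) to every value of a feature."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("a constant feature cannot be rescaled")
    return [(v - lo) / (hi - lo) for v in values]


scaled = rescale([10.0, 15.0, 20.0, 30.0])
print(scaled)  # [0.0, 0.25, 0.5, 1.0]
```

The smallest value maps to 0 and the largest to 1, so every rescaled value lies in [0, 1].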

17
Q

Discretisation

A

A process that converts continuous attribute values into a discrete form: the range of values is divided into a finite set of intervals, and each interval is associated with a specific data value.
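
One common way to choose the intervals is equal-width binning, sketched below. The card does not say how the intervals are chosen, so equal widths, the function name, and the data are all assumptions.

```python
def discretise(values, n_bins):
    """Map each continuous value to the index of its equal-width interval."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        raise ValueError("a constant feature cannot be discretised")
    labels = []
    for v in values:
        index = int((v - lo) / width)
        labels.append(min(index, n_bins - 1))   # the maximum falls in the last interval
    return labels


ages = [3, 17, 25, 42, 60, 80]   # hypothetical continuous attribute
bins = discretise(ages, n_bins=4)
print(bins)  # [0, 0, 1, 2, 2, 3]
```

Each interval is represented here by its index (0 to 3); a label or the interval midpoint would serve equally well as the associated data value.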

18
Q

Binarisation

A

The process of converting attribute values to binary values, i.e. values that can take only one of two states, such as 0 and 1.
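
A simple way to binarise a numeric attribute is to compare it with a threshold. This is only one option; the threshold, the function name, and the values below are hypothetical.

```python
def binarise(values, threshold):
    """Map each value to 1 if it exceeds the threshold, otherwise to 0."""
    return [1 if v > threshold else 0 for v in values]


flags = binarise([0.2, 0.5, 0.7, 0.9], threshold=0.5)
print(flags)  # [0, 0, 1, 1]
```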