Unit 1 - What is Data Science? Flashcards
Data Science
Data Science is an interdisciplinary field concerned with
- collecting,
- preparing, and
- processing
available data in order to obtain insight from it.
Data mining
Data mining is the process of gaining insight into a data set by recognising hidden patterns within it, known as pattern recognition.
This is done through analysis and model fitting: trying to find a model that represents the data, or the process that generates the data.
CRISP
The Cross-Industry Standard Process (CRISP, often written CRISP-DM) model of the data mining process.
6 Phases of the CRISP model
- Business Understanding: understanding the business scenario that the data mining process will be performed for
- Data Understanding: understanding the data involved in the task and defining what is and isn’t needed
- Data Preparation: preparing the data in order to make the data mining task less cumbersome and easier to achieve
- Modelling: fitting a model that performs the required task
- Evaluation: evaluating the model using metrics suitable to the task in hand (evaluating classification is different to evaluating regression or clustering)
- Deployment: deploying the model to be utilised by the business.
System testing
This is testing whether a system is working as intended. It normally investigates the integration of the different components and whether any issues arise from their interaction.
4 Types of System Testing Faults
- Accidental
- Logical
- Flow
- Implementational
Unit testing
Unit testing is testing an individual module or component of a system.
As with system testing, it normally looks for logical or implementational errors regarding the intended functionality of the unit.
This type of testing is more prevalent and occurs several times in the life of a component: whenever there is a change to its functionality or code.
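For instance, a minimal sketch of a unit test in Python using the standard unittest module; the add function and its expected behaviour are hypothetical:

    import unittest

    def add(a, b):
        # The unit under test: a hypothetical helper function.
        return a + b

    class TestAdd(unittest.TestCase):
        def test_adds_two_numbers(self):
            # Check the intended functionality of the unit.
            self.assertEqual(add(2, 3), 5)

        def test_handles_negatives(self):
            # A second case guarding against logical errors.
            self.assertEqual(add(-1, 1), 0)

    if __name__ == "__main__":
        unittest.main()

Such a test would be rerun whenever the unit's functionality or code changes.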
Model testing
Model testing depends on the task in hand: whether it is classification, clustering or regression.
In model testing the prediction performance of the model and its level of accuracy in performing the required task are tested.
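As an illustration, a minimal sketch of testing a classification model's accuracy in plain Python; the labels are made up for the example:

    # Compare a model's predictions against the true labels.
    y_true = [0, 1, 1, 0, 1]   # ground-truth class labels (hypothetical)
    y_pred = [0, 1, 0, 0, 1]   # the model's predictions (hypothetical)

    # Accuracy: the fraction of predictions that match the truth.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)
    print(f"accuracy = {accuracy:.2f}")  # 0.80

For regression or clustering, different metrics (e.g. mean squared error) would be used instead.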
Discriminative models
Models can be categorised by addressing their intrinsic capabilities.
Discriminative models are those capable only of discriminating between the different classes; they do not model how the data itself is generated.
Generative models
These are another type of model, capable of generating synthetic data that resembles the data arising in the tasks being dealt with.
These are more powerful than discriminative models, but they are more difficult to build and often need more computational power.
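As a rough illustration, a minimal Python sketch of a very simple generative model: fitting a Gaussian to observed data for one class and sampling synthetic data from it (the numbers are invented):

    import random
    import statistics

    # Observed data for one class (hypothetical measurements).
    observed = [4.9, 5.1, 5.0, 5.3, 4.8, 5.2]

    # "Fit" the generative model: a Gaussian with the sample mean
    # and standard deviation of the observed data.
    mu = statistics.mean(observed)
    sigma = statistics.stdev(observed)

    # Generate synthetic data that should resemble the observed data.
    synthetic = [random.gauss(mu, sigma) for _ in range(5)]
    print(synthetic)

A discriminative model, by contrast, would only learn a rule for telling the classes apart.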
Bayes Theorem
Bayes Theorem is the backbone of Bayesian Statistics and Bayesian models. As opposed to Frequentist Statistics, Bayes Theorem defines the probability (P) of an event (H) conditioned on another event (E) as follows:
P(H | E) = P(E | H) · P(H) / P(E)
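For example, a small numeric check of the formula in Python; the probabilities are invented for illustration:

    # Hypothetical numbers: P(H) prior, P(E | H) likelihood, P(E) evidence.
    p_h = 0.01          # prior probability of the hypothesis H
    p_e_given_h = 0.9   # probability of the evidence E if H holds
    p_e = 0.05          # overall probability of the evidence E

    # Bayes Theorem: posterior = likelihood * prior / evidence.
    p_h_given_e = p_e_given_h * p_h / p_e
    print(f"P(H | E) = {p_h_given_e:.2f}")  # 0.18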
Precision of an attribute
Precision of a feature (taken from a measurement) is the closeness of repeated measurements to one another.
Bias of an attribute
Bias is a systematic variation of measurements from the actual quantity being measured.
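As a quick illustration of precision versus bias, a minimal Python sketch with invented measurements of a quantity whose true value is 10.0:

    import statistics

    true_value = 10.0
    # Hypothetical repeated measurements: tightly clustered (precise)
    # but systematically offset from the true value (biased).
    measurements = [10.52, 10.49, 10.51, 10.50, 10.48]

    bias = statistics.mean(measurements) - true_value   # systematic offset
    spread = statistics.stdev(measurements)             # (im)precision

    print(f"bias = {bias:.2f}")      # ~0.50: systematic variation
    print(f"spread = {spread:.3f}")  # small: measurements close together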
Stratification
Sampling in a way that maintains the distribution (for example, the class proportions) of the underlying data.
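For instance, a minimal sketch of a stratified train/test split, assuming scikit-learn is available; the data is invented:

    from sklearn.model_selection import train_test_split

    X = [[0], [1], [2], [3], [4], [5], [6], [7]]   # hypothetical samples
    y = [0, 0, 0, 0, 0, 0, 1, 1]                   # imbalanced class labels

    # stratify=y makes both splits keep the 3:1 class ratio of y.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.5, stratify=y, random_state=0
    )
    print(sorted(y_test))  # [0, 0, 0, 1]: same class proportions as y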
Standardisation
Standardisation involves calculating the mean x̄ and standard deviation sx of a feature x.
These statistical measures are calculated by treating the values that reside inside the feature as samples.
The following transformation is then applied to the data: x′ = (x − x̄)/sx.
The new feature x′ calculated from x has a mean of 0 and a standard deviation of 1, i.e., it is standardised.
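A minimal sketch of this transformation in Python, with made-up feature values:

    import statistics

    # Hypothetical values of a feature x.
    x = [2.0, 4.0, 6.0, 8.0]

    x_bar = statistics.mean(x)   # sample mean x̄
    s_x = statistics.stdev(x)    # sample standard deviation sx

    # Apply x' = (x - x̄) / sx to every value of the feature.
    x_prime = [(v - x_bar) / s_x for v in x]

    print(round(statistics.mean(x_prime), 10))   # 0.0
    print(round(statistics.stdev(x_prime), 10))  # 1.0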