Data Science Terms Flashcards

1
Q

What is data science

A

The combination of business analytics and programming skills used to extract meaningful insights from raw data

2
Q

Deep learning

A

Deep learning is a subset of machine learning that uses multi-layered (deep) artificial neural networks to train a computer to perform human-like tasks, such as speech recognition, image identification and making predictions

3
Q

Artificial intelligence

A

A set of approaches that enable computers to emulate, and thus automate, cognitive behaviour, often based on learning from data

4
Q

Machine learning

A

Machine learning is a method of data analysis that automates analytical model building. It is a branch of artificial intelligence based on the idea that systems can learn from data, identify patterns and make decisions with minimal human intervention.

5
Q

Benefits of data science

A

-enable organizations to make better decisions
-enhance operational efficiency, business routines and workflows
-help companies recognize their target audience
-assist in automating aspects of HR

6
Q

Training set

A

The portion of the data used to train a machine learning model, i.e. the examples from which it learns its desired task

7
Q

Testing set

A

The portion of the data held out from training and used to measure the performance of the trained machine learning model
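As a rough illustration of how the training and testing sets relate, the sketch below uses a hypothetical hand-written helper, `split_train_test` (not a library function), that shuffles the records with a fixed seed and holds out a fraction of them for testing. The 20% test fraction and the seed are arbitrary assumptions.

```python
import random

def split_train_test(records, test_fraction=0.2, seed=42):
    """Shuffle a copy of the records and split it into (training set, testing set)."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)                 # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)    # fixed seed makes the split reproducible
    n_test = max(1, int(len(shuffled) * test_fraction))
    test_set = shuffled[:n_test]             # held out; used only to measure performance
    train_set = shuffled[n_test:]            # used to fit (train) the model
    return train_set, test_set

data = list(range(10))
train, test = split_train_test(data)
print(len(train), len(test))  # 8 2
```

The two sets never share a record, so the test score reflects data the model has not seen during training.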

8
Q

Outlier

A

A data record that is exceptional and lies outside the distribution of the normal input data
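One common way to flag values "outside the distribution" is the interquartile-range (IQR) rule; the card does not name a method, so this choice, the helper name `iqr_outliers` and the conventional factor `k = 1.5` are assumptions. The sketch uses `statistics.quantiles` from the Python standard library (3.8+).

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR below the first quartile or above the third."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles (Python 3.8+)
    iqr = q3 - q1                                  # spread of the middle 50% of the data
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

readings = [10, 12, 11, 13, 12, 11, 95]
print(iqr_outliers(readings))  # [95]
```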

9
Q

Data cleansing

A

The process of removing redundant data, handling missing data entries, and removing, or at least alleviating, other data quality issues
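A minimal sketch of two of the cleansing steps named above, assuming records are plain dictionaries with a hypothetical `age` field: exact duplicates are dropped, and missing ages are filled with the mean of the known ages. Mean imputation is only one of several ways to handle missing entries.

```python
from statistics import mean

def cleanse(records):
    """Drop exact duplicate records, then fill missing 'age' values with the mean known age."""
    seen, unique = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))   # hashable fingerprint of the whole record
        if key not in seen:                # keep only the first copy of each record
            seen.add(key)
            unique.append(dict(rec))
    known = [r["age"] for r in unique if r["age"] is not None]
    fill = mean(known)                     # simple mean imputation for missing entries
    for r in unique:
        if r["age"] is None:
            r["age"] = fill
    return unique

raw = [
    {"id": 1, "age": 30},
    {"id": 1, "age": 30},    # redundant duplicate
    {"id": 2, "age": None},  # missing entry
    {"id": 3, "age": 40},
]
cleaned = cleanse(raw)
print(cleaned)
```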

10
Q

Feature

A

An observable, measurable property of the data, e.g. height or length. Other terms such as property, characteristic and attribute are also used instead of feature

11
Q

Dimensionality reduction

A

The process of reducing a dataset to fewer dimensions (features) while ensuring that it still conveys similar information.
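Principal component analysis (PCA) is one widely used dimensionality-reduction technique; the card does not name a specific method, so PCA is a swapped-in example. This sketch, with the hypothetical helper `pca_2d_to_1d`, handles only the two-feature case, where the leading eigenvector of the 2x2 covariance matrix has a closed form, and projects each point onto that direction of greatest variance.

```python
import math

def pca_2d_to_1d(points):
    """Project 2-D points onto their first principal component (direction of greatest variance)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centred = [(x - mx, y - my) for x, y in points]
    # Entries of the 2x2 covariance matrix [[a, b], [b, c]]
    a = sum(x * x for x, _ in centred) / n
    b = sum(x * y for x, y in centred) / n
    c = sum(y * y for _, y in centred) / n
    # Largest eigenvalue of a symmetric 2x2 matrix, in closed form
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    # A matching eigenvector; fall back to an axis when the two features are uncorrelated
    if b != 0:
        vx, vy = b, lam - a
    elif a >= c:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm
    # Each point becomes a single coordinate along that direction
    return [x * vx + y * vy for x, y in centred]

points = [(0, 0), (1, 2), (2, 4), (3, 6)]   # two features, perfectly correlated
print(pca_2d_to_1d(points))
```

Because these points lie on a line, the single remaining coordinate preserves the distances between them, i.e. no information is lost by dropping the second dimension.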

12
Q

Feature selection

A

The process of selecting the relevant features from the provided dataset
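A very simple feature-selection heuristic is a variance filter: a feature that never changes cannot help a model tell records apart. The helper `select_features`, the column names and the zero threshold below are illustrative assumptions; real projects often use more informed criteria.

```python
from statistics import pvariance

def select_features(rows, feature_names, min_variance=0.0):
    """Keep only the features whose variance across rows exceeds min_variance."""
    selected = []
    for i, name in enumerate(feature_names):
        column = [row[i] for row in rows]       # all values of one feature
        if pvariance(column) > min_variance:    # constant columns have variance 0
            selected.append(name)
    return selected

rows = [(1.0, 5.0, 0.2), (2.0, 5.0, 0.1), (3.0, 5.0, 0.3)]
names = ["height", "constant_flag", "length"]
print(select_features(rows, names))  # ['height', 'length']
```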

13
Q

Supervised learning

A

The subset of machine learning that is based on labelled data. It can be further divided into regression and classification

14
Q

Unsupervised learning

A

The subset of machine learning that is based on unlabelled data. Typical unsupervised tasks are clustering and dimensionality reduction.

15
Q

Probability

A

Quantification of how likely it is that a certain event occurs, or the degree of belief in a given proposition
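For the "how likely" reading, a short sketch on a fair six-sided die (an assumed example): the classical probability counts favourable outcomes over all equally likely outcomes, and the relative frequency from a seeded simulation lands close to it. The degree-of-belief reading is not illustrated here.

```python
import random
from fractions import Fraction

def classical_probability(event, sample_space):
    """P(event) = favourable outcomes / all outcomes, assuming equally likely outcomes."""
    outcomes = list(sample_space)
    favourable = [o for o in outcomes if event(o)]
    return Fraction(len(favourable), len(outcomes))

def is_even(face):
    return face % 2 == 0

die = range(1, 7)
print(classical_probability(is_even, die))  # 1/2

# Frequency view: the share of even rolls over many simulated rolls approaches 1/2
rng = random.Random(0)
rolls = [rng.randint(1, 6) for _ in range(10_000)]
freq = sum(is_even(r) for r in rolls) / len(rolls)
print(freq)
```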

16
Q

Standard deviation

A

A measure of how spread out the data values are around their mean
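The population standard deviation is the square root of the average squared deviation from the mean, σ = sqrt((1/N) Σ (xᵢ − μ)²). The sketch computes it by hand with a hypothetical helper `population_std` and checks it against the standard library's `statistics.pstdev`; `statistics.stdev` gives the sample version, which divides by N − 1 instead of N.

```python
import math
import statistics

def population_std(values):
    """Square root of the mean squared deviation from the mean (divides by N)."""
    mu = sum(values) / len(values)
    return math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # mean 5, squared deviations sum to 32
print(population_std(data))       # 2.0
print(statistics.pstdev(data))    # same result from the standard library
print(statistics.stdev(data))     # sample standard deviation (divides by N - 1)
```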

17
Q

Type I error

A

A false positive: the actual outcome was negative, but the model predicted positive

18
Q

Type II error

A

A false negative: the actual outcome was positive, but the model predicted negative
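Both error types can be counted directly by comparing actual and predicted binary labels. In this sketch, the hypothetical helper `count_errors` assumes that 1 means positive and 0 means negative.

```python
def count_errors(actual, predicted):
    """Count Type I (false positive) and Type II (false negative) errors for 0/1 labels."""
    type_i = sum(1 for a, p in zip(actual, predicted) if not a and p)   # actually 0, predicted 1
    type_ii = sum(1 for a, p in zip(actual, predicted) if a and not p)  # actually 1, predicted 0
    return type_i, type_ii

actual    = [1, 0, 0, 1, 1, 0]
predicted = [1, 1, 1, 0, 1, 0]
print(count_errors(actual, predicted))  # (2, 1)
```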

19
Q

Decision model

A

A model that assesses the relationships between the elements of the provided data to recommend a possible decision for a given situation

20
Q

Regression

A

A forecasting technique that estimates the functional relationship between input and output variables
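The simplest case is a straight-line relationship fitted by ordinary least squares; linear regression is an assumed example here, since the card covers regression in general. The hypothetical helper `fit_line` estimates the slope and intercept, which can then be used to forecast the output for a new input.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # co-variation of x and y
    sxx = sum((x - mx) ** 2 for x in xs)                    # variation of x
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]                 # exactly y = 2x + 1
slope, intercept = fit_line(xs, ys)
print(slope, intercept)           # 2.0 1.0
print(slope * 5 + intercept)      # forecast for x = 5: 11.0
```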

21
Q

Cluster analysis

A

A type of unsupervised learning used to partition a set of data records into clusters. Records in a cluster are more similar to each other than to records in other clusters
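k-means is one common clustering algorithm (the card does not prescribe one). This one-dimensional sketch, `kmeans_1d`, is a hypothetical helper that alternates between assigning each value to its nearest centroid and moving each centroid to the mean of its cluster; the starting centroids are an assumption supplied by the caller.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Partition 1-D values into len(centroids) clusters with Lloyd's k-means algorithm."""
    centroids = list(centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            # Assignment step: each value joins the cluster of its nearest centroid
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster (keep it if empty)
        new = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:   # assignments are stable, so the algorithm has converged
            break
        centroids = new
    return clusters, centroids

values = [1, 2, 3, 10, 11, 12]
clusters, centres = kmeans_1d(values, [0.0, 5.0])
print(clusters, centres)  # [[1, 2, 3], [10, 11, 12]] [2.0, 11.0]
```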

22
Q

Classification

A

A machine learning approach that categorises entities into predefined classes
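A minimal classifier sketch using the 1-nearest-neighbour rule, one simple approach among many: a new entity receives the predefined class of its closest labelled training example. The helper name `nearest_neighbour_classify`, the points and the class labels are illustrative assumptions; `math.dist` requires Python 3.8+.

```python
import math

def nearest_neighbour_classify(train, query):
    """Assign the query point the class label of its closest training example (1-NN)."""
    _, best_label = min(train, key=lambda item: math.dist(item[0], query))
    return best_label

train = [
    ((1.0, 1.0), "small"), ((1.5, 2.0), "small"),
    ((8.0, 9.0), "large"), ((9.0, 8.5), "large"),
]
print(nearest_neighbour_classify(train, (2.0, 1.5)))  # small
print(nearest_neighbour_classify(train, (7.5, 8.0)))  # large
```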

23
Q

Data science related activities

A

-understand the problem
-collect enough data
-process raw data
-explore the data
-analyze the data
-communicate the results