Data Science Terms Flashcards
What is data science?
The combination of business, analytical and programming skills used to extract meaningful insights from raw data
Deep learning
The application of artificial neural networks with many layers. Deep learning is a subset of machine learning that trains a computer to perform human-like tasks, such as speech recognition, image identification and making predictions
Artificial intelligence
A set of approaches that enable computers to emulate, and thus automate, cognitive behaviour, often based on learning from data
Machine learning
Machine learning is a method of data analysis that automates analytical model building. It is a branch of artificial intelligence based on the idea that systems can learn from data, identify patterns and make decisions with minimal human intervention.
Benefits of data science
-enable organizations to make better decisions
-enhance operational efficiency, business routines and workflows
-identify a company's target audience and inform the company about it
-assist with automating aspects of HR
Training set
The data used to train a machine learning model; the model learns its desired task from these examples
Testing set
Data held out from training and used to measure the performance of the trained machine learning model on examples it has not seen
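To illustrate the split between training and testing sets, here is a minimal Python sketch using only the standard library. The helper `train_test_split`, its 25% default test fraction and the fixed seed are illustrative assumptions; it is not scikit-learn's function of the same name, although the idea is the same: shuffle the data, then hold out a portion that the model never sees during training.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Hypothetical helper: shuffle a copy of rows and return (training set, testing set)."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)                     # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)     # fixed seed makes the split reproducible
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(20))
train, test = train_test_split(data)
print(len(train), len(test))  # 15 5
```

Every example lands in exactly one of the two sets, so the test score reflects data the model was not trained on.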
Outlier
A data record that is exceptional and lies outside the distribution of the normal input data
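One common way to flag outliers, assumed here rather than prescribed by the definition, is Tukey's rule: values more than 1.5 interquartile ranges below the first quartile or above the third quartile are treated as outliers. The function name `find_outliers` and the sample values are hypothetical.

```python
import statistics

def find_outliers(values, k=1.5):
    """Hypothetical helper: return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1                                  # interquartile range
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

print(find_outliers([10, 12, 11, 13, 12, 11, 14, 12, 95]))  # [95]
```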
Data cleansing
The process of removing redundant data, handling missing entries, and removing, or at least alleviating, other data quality issues
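A minimal sketch of two of these cleansing steps: removing exact duplicate records and filling missing numeric entries with the mean of the observed values. Mean imputation is only one of several ways to handle missing data (dropping the affected rows is another), and the `clean` helper and its record format (a list of dicts, with `None` marking a missing value) are assumptions for illustration.

```python
from statistics import mean

def clean(records, numeric_field):
    """Hypothetical helper: drop exact duplicates, then fill missing numeric_field values with the mean."""
    seen = set()
    unique = []
    for rec in records:
        key = tuple(sorted(rec.items()))  # hashable fingerprint of the whole record
        if key not in seen:
            seen.add(key)
            unique.append(dict(rec))      # copy so the input records are not modified
    # mean of the values that are present (raises StatisticsError if none are)
    fill = mean(r[numeric_field] for r in unique if r.get(numeric_field) is not None)
    for r in unique:
        if r.get(numeric_field) is None:
            r[numeric_field] = fill
    return unique

raw = [
    {"id": 1, "age": 30},
    {"id": 1, "age": 30},    # redundant duplicate
    {"id": 2, "age": None},  # missing entry
    {"id": 3, "age": 40},
]
print(clean(raw, "age"))
```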
Feature
A measurable property of the data, e.g. height or length. Other terms such as property, characteristic and attribute are also used instead of feature
Dimensionality reduction
The process of reducing a dataset to fewer dimensions while ensuring that it still conveys similar information
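As an illustration, principal component analysis (PCA), one widely used dimensionality-reduction technique, projects the data onto the direction of greatest variance. The sketch below, a hypothetical `pca_1d` helper in plain Python, reduces 2-D points to 1-D by finding that direction with power iteration on the 2x2 covariance matrix. It assumes at least two points and a starting vector that is not orthogonal to the principal direction; in practice a linear-algebra library would normally be used instead.

```python
import math

def pca_1d(points, iterations=100):
    """Hypothetical helper: project 2-D points onto their first principal component."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centred = [(x - mx, y - my) for x, y in points]
    # entries of the 2x2 sample covariance matrix
    sxx = sum(x * x for x, _ in centred) / (n - 1)
    syy = sum(y * y for _, y in centred) / (n - 1)
    sxy = sum(x * y for x, y in centred) / (n - 1)
    # power iteration converges to the eigenvector with the largest eigenvalue
    vx, vy = 1.0, 0.0  # assumed starting vector
    for _ in range(iterations):
        wx, wy = sxx * vx + sxy * vy, sxy * vx + syy * vy
        norm = math.hypot(wx, wy)
        vx, vy = wx / norm, wy / norm
    # one number per point instead of two
    return [x * vx + y * vy for x, y in centred], (vx, vy)

projected, direction = pca_1d([(0, 0), (1, 2), (2, 4), (3, 6)])
print(direction)
```

Because these sample points lie exactly on the line y = 2x, the single remaining dimension keeps all of their variance; for real data some information is lost.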
Feature selection
The process of selecting the relevant features of the provided dataset and discarding the rest
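A simple filter-style sketch of feature selection: drop any feature whose variance across the dataset does not exceed a threshold, since a constant feature cannot help distinguish one example from another. This is only one possible criterion (relevance to a target variable is another common one), and `select_by_variance` and the sample rows are hypothetical.

```python
from statistics import pvariance

def select_by_variance(rows, feature_names, threshold=0.0):
    """Hypothetical helper: keep only features whose population variance exceeds threshold."""
    keep = [name for name in feature_names
            if pvariance([row[name] for row in rows]) > threshold]
    reduced = [{name: row[name] for name in keep} for row in rows]
    return keep, reduced

rows = [
    {"height": 170, "length": 2.0, "country_code": 1},
    {"height": 180, "length": 2.5, "country_code": 1},
    {"height": 165, "length": 1.8, "country_code": 1},  # country_code never varies
]
keep, reduced = select_by_variance(rows, ["height", "length", "country_code"])
print(keep)  # ['height', 'length']
```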
Supervised learning
The subset of machine learning that is based on labelled data, i.e. input examples paired with their known correct outputs. It can be further divided into regression and classification
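To make "labelled data" concrete, the sketch below is a k-nearest-neighbours classifier, one simple supervised method chosen here for illustration. Each training example is a (features, label) pair, and a new point receives the majority label among its k closest training examples. The helper name `knn_predict`, the default k = 3 and the sample data are assumptions.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Hypothetical helper: classify query by majority vote of its k nearest labelled examples."""
    # train is a list of (feature_vector, label) pairs, i.e. labelled data
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [
    ((1.0, 1.0), "small"), ((1.2, 0.8), "small"), ((0.9, 1.1), "small"),
    ((5.0, 5.0), "large"), ((5.2, 4.8), "large"), ((4.9, 5.1), "large"),
]
print(knn_predict(train, (1.1, 1.0)))  # small
```

Predicting a category like this is classification; predicting a number (for example, averaging the neighbours' values) would be regression.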
Unsupervised learning
The subset of machine learning that is based on unlabelled data. Typical unsupervised tasks are clustering and dimensionality reduction.
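For the clustering task, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. No labels are given: the algorithm alternately assigns each point to its nearest centroid and moves each centroid to the mean of its assigned points. The naive initialisation (the first k points), the fixed iteration count and the name `k_means` are simplifying assumptions.

```python
import math

def k_means(points, k, iterations=20):
    """Hypothetical helper: group unlabelled 2-D points into k clusters."""
    centroids = list(points[:k])  # naive initialisation: the first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each point to the cluster of its nearest centroid
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # move each centroid to the mean of its cluster; keep it if the cluster is empty
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

points = [(1, 1), (8, 8), (1.5, 2), (8.5, 9), (0.5, 1.5), (9, 8.5)]
centroids, clusters = k_means(points, k=2)
print(centroids)
```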
Probability
The quantification of how likely it is that a certain event occurs, or of the degree of belief in a given proposition, usually expressed as a number between 0 and 1
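Under the classical interpretation, assumed here, where all outcomes are equally likely, probability is the number of favourable outcomes divided by the total number of outcomes. The hypothetical `probability` helper below enumerates the 36 outcomes of rolling two dice and returns an exact fraction. It illustrates only the "how likely an event occurs" reading, not the degree-of-belief one.

```python
from fractions import Fraction
from itertools import product

def probability(event, outcomes):
    """Hypothetical helper: favourable outcomes divided by all equally likely outcomes."""
    outcomes = list(outcomes)
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))

two_dice = list(product(range(1, 7), repeat=2))  # 36 equally likely (die1, die2) pairs
print(probability(lambda o: sum(o) == 7, two_dice))  # 1/6
```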