Lecture 1 Flashcards by Marvel Itemuagbor

Data

A set of discrete, objective facts about events

Dataset

a collection of data with a defined structure

Data point

a single instance in the dataset

Attribute

A single property of the dataset
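
A minimal Python sketch of how the terms dataset, data point, and attribute fit together. The loan table, its column names, and its values are made up for illustration only.

```python
# Hypothetical loan dataset; column names and values are assumptions.
dataset = [
    {"credit_score": 500, "interest_rate": 7.31},
    {"credit_score": 600, "interest_rate": 6.70},
    {"credit_score": 700, "interest_rate": 5.95},
]

# A data point is one instance (row) in the dataset.
data_point = dataset[0]

# An attribute is one property (column) recorded for every data point.
attribute_names = list(data_point.keys())
credit_scores = [row["credit_score"] for row in dataset]

print(data_point)
print(attribute_names)
print(credit_scores)
```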

Data science

a collection of techniques used to extract value from data

process of building a representative model that fits the observational data

Model

representation of a relationship between variables in a dataset

modeling

process in which a representative abstraction is built from the observed dataset

A data science model serves two purposes

- it predicts the output (for example, an interest rate) for a new and unseen set of input variables
- it can be used to understand the relationship between the output variable and all the input variables
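
Both purposes can be illustrated with a one-variable least-squares line. This is only a sketch: the credit-score input, the training values, and the `fit_line` helper are assumptions chosen for illustration, not part of the lecture.

```python
# Hypothetical training data: credit score (input) and interest rate in % (output).
scores = [500, 600, 700, 800]
rates = [8.0, 7.0, 6.0, 5.0]

def fit_line(xs, ys):
    """Ordinary least squares for one input: y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(scores, rates)

# Purpose 1: predict the output for a new, unseen input.
predicted_rate = slope * 650 + intercept
print(f"predicted rate at score 650: {predicted_rate:.2f}")

# Purpose 2: understand the relationship (rate change per credit-score point).
print(f"slope: {slope:.4f}")
```

Here the negative slope is the "understanding" part: in this made-up data, a higher credit score goes with a lower interest rate.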

techniques used in the steps of a data science process

- descriptive statistics
- exploratory visualization
- dimensional slicing
- hypothesis testing
- data engineering
- business intelligence

Supervised model

supervised data science tries to infer a function or relationship based on labeled training data and uses this function to map new unlabeled data
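
One simple way to picture this is a 1-nearest-neighbor classifier: the labeled training data is the "function", and a new unlabeled point gets the label of its closest training example. The exam data and the `predict` helper below are hypothetical.

```python
# Hypothetical labeled training data: (hours_studied, hours_slept) -> exam outcome.
training = [
    ((1.0, 4.0), "fail"),
    ((2.0, 5.0), "fail"),
    ((6.0, 7.0), "pass"),
    ((8.0, 8.0), "pass"),
]

def predict(point):
    """Return the label of the closest training example (1-nearest neighbor)."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    _, label = min(training, key=lambda pair: sq_dist(pair[0], point))
    return label

# Map new, unlabeled points to a label.
print(predict((7.0, 7.5)))
print(predict((1.5, 4.5)))
```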

Unsupervised data science

uncovers hidden patterns in unlabeled data

Classification and regression techniques

predicting a target variable based on input variables

Clustering

the process of identifying the natural groupings in a dataset
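
A small sketch of one clustering technique, k-means, on one-dimensional data. The lecture does not name a specific algorithm, so k-means, the `k_means_1d` helper, the starting centers, and the ages are all assumptions for illustration.

```python
def k_means_1d(values, centers, iterations=10):
    """Tiny k-means for 1-D data: assign each value to its nearest center, then re-average."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Keep the old center if a group ends up empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

# Hypothetical customer ages with two visible groupings.
ages = [21, 23, 25, 61, 63, 65]
centers, groups = k_means_1d(ages, centers=[20.0, 30.0])
print(centers)  # [23.0, 63.0]
print(groups)   # [[21, 23, 25], [61, 63, 65]]
```

No labels are supplied; the two groups emerge from the values alone.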

recommendation engines

systems that recommend items to users based on individual user preferences

anomaly or outlier detection

identifies the data points that are significantly different from other data points in a dataset
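
A minimal sketch of one common approach, the z-score rule, which flags values that lie far from the mean in units of standard deviation. The threshold of 2.0, the `find_outliers` helper, and the transaction amounts are assumptions, not something the lecture prescribes.

```python
from statistics import mean, stdev

def find_outliers(values, threshold=2.0):
    """Flag values whose distance from the mean exceeds `threshold` standard deviations."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation; needs at least two values
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical daily transaction amounts with one unusual value.
amounts = [52, 48, 50, 51, 49, 50, 47, 53, 250]
print(find_outliers(amounts))  # [250]
```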

time-series forecasting

the process of predicting the future value of a variable based on historical values that may exhibit a trend and seasonality
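
A rough sketch of a forecast that uses both seasonality and trend: each future value is the value one season earlier plus the average season-over-season change. The `forecast` helper and the quarterly sales figures are hypothetical, and real forecasting methods are usually more elaborate.

```python
def forecast(history, season_length, steps):
    """Seasonal naive forecast with drift; needs more than one season of history."""
    # Trend: average change between a value and the value one season before it.
    diffs = [history[i] - history[i - season_length]
             for i in range(season_length, len(history))]
    trend = sum(diffs) / len(diffs)
    series = list(history)
    for _ in range(steps):
        # Seasonality: start from the same position in the previous season.
        series.append(series[-season_length] + trend)
    return series[len(history):]

# Hypothetical quarterly sales: a repeating 4-quarter pattern that grows by 10 per year.
sales = [100, 120, 90, 110, 110, 130, 100, 120]
print(forecast(sales, season_length=4, steps=4))  # [120.0, 140.0, 110.0, 130.0]
```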

text mining

a data science application where the input data is text, which can be in the form of documents, messages, emails, or web pages
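
A common first step in text mining is turning raw text into word counts that other techniques can consume. The sketch below assumes a very simple tokenizer; the `term_counts` helper and the sample messages are made up for illustration.

```python
import re
from collections import Counter

def term_counts(text):
    """Lowercase the text, split it into word tokens, and count each token."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

# Hypothetical messages standing in for documents, emails, or web pages.
messages = [
    "Your order has shipped",
    "Your refund has been processed",
]
counts = [term_counts(m) for m in messages]
print(counts[0])
print(counts[1]["refund"])
```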

feature selection

A process in which attributes in a dataset are reduced to a few attributes that really matter
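
One simple way to decide which attributes "really matter" is to rank them by the strength of their correlation with the target and keep the top few. This is only one of several possible criteria; the `pearson` and `select_features` helpers and the data are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient; assumes neither column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def select_features(columns, target, keep):
    """Keep the `keep` attributes with the largest absolute correlation to the target."""
    ranked = sorted(columns,
                    key=lambda name: abs(pearson(columns[name], target)),
                    reverse=True)
    return ranked[:keep]

# Hypothetical attributes and target.
columns = {
    "credit_score": [500, 600, 700, 800],
    "shoe_size": [9, 7, 10, 8],
    "income": [30, 45, 50, 70],
}
interest_rate = [8.0, 7.0, 6.0, 5.0]
print(select_features(columns, interest_rate, keep=2))  # ['credit_score', 'income']
```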

association analysis

identifying pairs of items that are purchased together, so that specific items can be bundled or placed next to each other
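
A minimal sketch of the idea: count how often each pair of items appears in the same basket and report the pair with the highest support (the share of baskets containing both items). The baskets below are made up for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

pair_counts = Counter()
for basket in baskets:
    # Sort so that each pair is always counted under the same key.
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Support = share of baskets that contain both items of the pair.
support = {pair: count / len(baskets) for pair, count in pair_counts.items()}
best_pair = max(support, key=support.get)
print(best_pair, support[best_pair])  # ('bread', 'butter') 0.75
```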

deep learning

neural networks with many layers, increasingly used for classification and regression problems

Big data

High-volume, high-velocity, and/or high-variety information that requires new forms of processing to enable enhanced decision making, insight discovery, and process optimization

Big data characteristics (5 Vs)

Volume, velocity, variety, veracity, and value

volume

increase in data size coming from infinite sources

velocity

- increase in the speed of input and output data
- ability to quickly incorporate new data
- ability to quickly add new data sources

Variety

increase in the range and diversity of data structures:
- structured data
- semi-structured data
- unstructured data

Veracity

valid and truthful data that provides the right direction for future decisions and actions
- data freshness
- quality dimensions (challenges)
- trust, quality, and validity of data

Value

- data that has high veracity provides higher value
- usefulness of data for an enterprise

Data science tends to fall into three broad categories

investigating, predicting, and optimizing

Data science tasks

- regression
- clustering
- association analysis
- anomaly detection
- recommendation engines
- deep learning
- time-series forecasting
- text mining
- feature selection
- classification

Lecture 1 Flashcards

(29 cards)