Intro to Data and Data Science Flashcards
What is Analysis?
1.2
‘how’ and ‘why’ something happened
performed on past data
What are Analytics?
1.2
Analytics apply logical reasoning to info obtained from analysis
Explores the future and looks for patterns
2 types:
Qualitative and
Quantitative
What are Qualitative Analytics?
1.2
The use of:
intuition
experience and
analysis
to plan the next business move
What are Quantitative analytics?
1.2
The application of formulas and algorithms to numbers gathered from analysis
What is Business Intelligence?
1.4
Process of analysing and reporting historical business data
Preliminary step to predictive analytics
What is Machine Learning?
1.4
Ability of machines to predict outcomes without being programmed to do so
The machines use data to:
- Make predictions
- analyse patterns
- give recommendations
What are advanced analytics?
1.4
all types of analytic processes
Symbolic reasoning is a type of AI that makes an exception and does not use ML and deep learning.
It is based on high-level human-readable representations of problems and logic.
True or False:
Symbolic reasoning is commonly used in practice
1.4
False:
Very rarely used in practice.
5 Primary Columns on the 365 infographic
1.5
- Traditional data
- Big data
- Business intelligence
- Applying traditional data science techniques
- Using ML techniques
What is “Data”?
2.0
information stored in a digital format
used for:
a) analysis
b) decision making
2 Types:
a) Traditional
b) Big Data
What is traditional data?
2.0
Data in the form of tables containing numeric or text values;
Data that is structured and stored in databases
What is big data?
2.0
Extremely large data;
It can be in various formats:
- structured
- semi-structured
- unstructured
often characterized by the ‘Vs’ (volume, variety, velocity, etc.)
What is Data Science?
2.0
an interdisciplinary field that combines:
statistical,
mathematical,
programming,
problem-solving, and
data-management tools.
What are Traditional Methods?
2.0
derived from stats and adapted for business
What is Raw Data?
4.1
AKA Primary Data
- cannot be analysed immediately
- accumulated but unorganized. Gathering raw data is called data collection
What is Class labelling?
4.1
Labelling each data point with the correct data type
What is data cleansing?
4.1
AKA Data Scrubbing
- Deals with inconsistent data
- e.g. data containing typos or missing info
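A minimal sketch of data cleansing, assuming a hypothetical list of customer records. The field names, the typo lookup, and the policy of dropping incomplete rows are all illustrative assumptions, not part of the course material.

```python
# Hypothetical raw records: inconsistent spelling and missing values.
raw_records = [
    {"name": "Alice", "country": "Germany", "age": 34},
    {"name": "Bob", "country": "Germnay", "age": 41},     # typo
    {"name": "Carol", "country": " germany ", "age": 29}, # casing/whitespace
    {"name": "Dave", "country": "Germany", "age": None},  # missing info
]

# Assumed lookup of known misspellings (illustrative only).
KNOWN_TYPOS = {"germnay": "germany"}


def cleanse(records):
    """Normalize country names and drop rows with a missing age."""
    cleaned = []
    for rec in records:
        if rec["age"] is None:
            continue  # one simple policy: discard incomplete rows
        country = rec["country"].strip().lower()
        country = KNOWN_TYPOS.get(country, country)
        cleaned.append({**rec, "country": country.title()})
    return cleaned


clean_records = cleanse(raw_records)
```

Dropping incomplete rows is only one option; filling in a default value is another common choice.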
What is data balancing?
4.1
Ensuring the sample gives equal priority to each class
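One possible way to balance a sample is random undersampling, sketched below. The function name, the seed, and the cat/dog data are hypothetical; other strategies (such as oversampling the smaller class) exist and are not shown.

```python
import random
from collections import defaultdict


def balance_by_undersampling(samples, seed=0):
    """Keep the same number of samples for every class label.

    `samples` is a list of (features, label) pairs. Each class is randomly
    cut down to the size of the smallest class.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for features, label in samples:
        by_class[label].append((features, label))

    target = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, target))
    return balanced


# Hypothetical data: 8 "cat" samples but only 2 "dog" samples.
samples = [(i, "cat") for i in range(8)] + [(i, "dog") for i in range(2)]
balanced = balance_by_undersampling(samples)
```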
What is Data Shuffling?
4.1
Shuffles data to ensure data is free from unwanted patterns from collection
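A short sketch of shuffling with Python's standard `random` module. The rows and the seed value are made up for illustration; the fixed seed is only there to make the shuffle reproducible.

```python
import random

# Hypothetical dataset collected in order: weekday rows first, then weekend.
rows = [("weekday", i) for i in range(5)] + [("weekend", i) for i in range(5)]

rng = random.Random(42)  # fixed seed so the shuffle is reproducible
shuffled = rows[:]       # copy, so the original order is preserved
rng.shuffle(shuffled)    # in-place shuffle of the copy
```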
What is a numerical variable?
4.2
Manipulatable numbers that provide useful information
What is a categorical variable?
4.2
Numbers with no numerical value.
Dates are also considered categorical
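To illustrate the difference, here is a small sketch with invented values: ages are numerical, so averaging them is meaningful, while postal codes are digits used as labels, so counting occurrences is the sensible operation.

```python
from collections import Counter
from statistics import mean

ages = [25, 31, 46]                   # numerical: arithmetic is meaningful
postal_codes = [10115, 80331, 10115]  # categorical: digits used as labels

average_age = mean(ages)              # a useful number
code_counts = Counter(postal_codes)   # count categories instead of averaging
```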
What is text data mining?
4.3
The process of deriving valuable, unstructured data from text.
What is data masking?
4.3
Conceals the original data with random, false data;
allows you to conduct analysis while keeping confidential information in a secure place.
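A minimal sketch of one masking approach, assuming a hypothetical `mask_column` helper and made-up purchase records. Each distinct value is swapped for a random token, and the mapping is returned separately so it can be stored securely.

```python
import secrets


def mask_column(rows, column):
    """Replace each distinct value in `column` with a random token.

    The same original value always maps to the same token, so grouping and
    counting still work on the masked data.
    """
    mapping = {}
    masked = []
    for row in rows:
        original = row[column]
        if original not in mapping:
            mapping[original] = "user_" + secrets.token_hex(4)
        masked.append({**row, column: mapping[original]})
    return masked, mapping


# Hypothetical purchase records containing a confidential name.
purchases = [
    {"name": "Alice Smith", "amount": 20},
    {"name": "Bob Jones", "amount": 35},
    {"name": "Alice Smith", "amount": 15},
]
masked_purchases, secret_mapping = mask_column(purchases, "name")
```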
What is a metric?
4.5
a value derived from obtained measures
aims at gauging business performance/progress (has business meaning)
What is a measure?
4.5
simple stats of past performance (no business meaning)
What is a KPI?
4.5
Key Performance Indicator
metrics + business objective
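The measure, metric, and KPI cards can be tied together in a small sketch. All numbers and the 4% target are hypothetical.

```python
# Measures: simple counts of past performance (hypothetical numbers).
visitors = 4000
orders = 200

# Metric: derived from measures and carries business meaning.
conversion_rate = orders / visitors  # share of visitors who bought

# KPI: the metric tied to a business objective (assumed target of 4%).
TARGET_CONVERSION_RATE = 0.04
kpi_met = conversion_rate >= TARGET_CONVERSION_RATE
```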
What is clustering?
4.7
grouping the data in neighbourhoods to analyse meaningful patterns
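As an illustration, here is a tiny k-means sketch for one-dimensional data. K-means is only one of many clustering techniques; the function name, the evenly spread starting centers, and the age data are assumptions made for this example, and it expects `k` to be at least 2.

```python
def kmeans_1d(values, k=2, iterations=10):
    """Tiny k-means for 1-D data; centers start evenly spread over the range."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    groups = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            groups[nearest].append(v)
        # Update step: move each center to the mean of its group.
        centers = [
            sum(g) / len(g) if g else centers[j]
            for j, g in enumerate(groups)
        ]
    return centers, groups


# Hypothetical customer ages forming two obvious neighbourhoods.
ages = [21, 22, 23, 24, 61, 62, 63, 64]
centers, groups = kmeans_1d(ages, k=2)
```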
What is a time series?
4.7
used in economics and finance
shows the development of certain values over time (e.g. stock prices, sales volume)
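A common first step with a time series is smoothing it with a moving average. The sketch below uses invented monthly sales figures and a hypothetical `moving_average` helper.

```python
def moving_average(series, window):
    """Average of each run of `window` consecutive values."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


# Hypothetical monthly sales volume, ordered in time.
monthly_sales = [100, 120, 110, 130, 150, 140]
trend = moving_average(monthly_sales, window=3)
```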
What is a model in machine learning?
4.9
an algorithm to recognize certain patterns
What is an objective function?
4.9
The specification of a machine learning problem;
a function to be maximized or minimized depending on the task
What is an optimization algorithm?
4.9
Algorithm that compares previous solutions until reaching the optimal solution
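To make the objective function and optimization algorithm cards concrete, here is a sketch using gradient descent, one widely used optimization algorithm. The quadratic objective, the learning rate, and the step count are arbitrary choices for illustration.

```python
def objective(w):
    """Objective function to minimize: squared distance of w from 3."""
    return (w - 3.0) ** 2


def gradient(w):
    """Derivative of the objective with respect to w."""
    return 2.0 * (w - 3.0)


def gradient_descent(start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, improving on the previous w."""
    w = start
    for _ in range(steps):
        w = w - learning_rate * gradient(w)
    return w


best_w = gradient_descent(start=0.0)
```

Each step builds on the previous solution, and `best_w` ends up very close to 3, where the objective is smallest.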
What are the three main types of machine learning?
Supervised
Unsupervised
Reinforcement
What is supervised learning?
4.10
Provides feedback to the algorithm:
whether it did well or needs to improve
Uses labelled data
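A minimal supervised-learning sketch, assuming a 1-nearest-neighbour classifier and made-up study-hours data. The labels in the held-out set supply the feedback, here summarized as accuracy.

```python
def predict(train, x):
    """1-nearest-neighbour: return the label of the closest training point."""
    nearest_value, nearest_label = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest_label


# Hypothetical labelled data: (hours studied, outcome).
train = [(1, "fail"), (2, "fail"), (3, "fail"),
         (7, "pass"), (8, "pass"), (9, "pass")]
held_out = [(2.5, "fail"), (6.5, "pass"), (8.5, "pass")]

# Feedback: compare predictions with the known labels.
correct = sum(predict(train, x) == label for x, label in held_out)
accuracy = correct / len(held_out)
```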
What is unsupervised learning?
4.10
In this case, the algorithm trains itself
algorithm uses unlabelled data
What is reinforcement learning?
4.10
A reward system is introduced.
maximize a reward (not minimize an error)
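The reward idea can be sketched with an epsilon-greedy agent on a two-armed bandit. This is a deliberately simplified setting: the fixed rewards, the `run_bandit` name, and the epsilon and seed values are all assumptions for illustration.

```python
import random


def run_bandit(rewards, steps=100, epsilon=0.1, seed=0):
    """Epsilon-greedy agent choosing among arms with fixed rewards."""
    rng = random.Random(seed)
    counts = [0] * len(rewards)
    values = [0.0] * len(rewards)  # running estimate of each arm's reward
    total = 0.0
    for step in range(steps):
        if step < len(rewards):
            arm = step                         # try every arm once
        elif rng.random() < epsilon:
            arm = rng.randrange(len(rewards))  # explore a random arm
        else:
            # Exploit: pick the arm with the highest estimated reward.
            arm = max(range(len(rewards)), key=lambda a: values[a])
        reward = rewards[arm]
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # running mean
        total += reward
    return values, total


values, total_reward = run_bandit(rewards=[0.2, 1.0])
best_arm = max(range(len(values)), key=lambda a: values[a])
```

The agent is never told which arm is correct; it only sees rewards and gradually favours the arm that pays more.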
What is deep learning?
4.10
modern state-of-the-art approach to machine learning
– leverages the power of neural networks
can be both supervised and unsupervised
Python and R have their limitations. They are not able to address problems specific to some domains. One example is ‘relational database management systems’. In these instances, ______ works best
5
SQL
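For illustration, SQL can be run from Python through the standard `sqlite3` module. The `sales` table and its rows below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 100.0), ("south", 250.0), ("north", 50.0)],
)

# SQL expresses the grouping and aggregation directly in the query.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
conn.close()
```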
Data architect
6
designs the way data will be retrieved, processed, and consumed
Data engineer
6
processes the data for analysis
database administrator
6
handles the control of data; works with traditional data
BI analyst
6
performs analyses and reporting of historical data
BI consultant
6
– ‘external BI analyst’
BI developer
6
performs analyses specifically designed for the company
Data scientist
6
employs traditional statistical methods or unconventional machine learning techniques for making predictions
Data analyst
6
prepares advanced analyses
Machine learning engineer
6
applies state-of-the-art ML techniques
200,000 lines of data constitute big data – TRUE or FALSE?
FALSE
- It is not just volume that defines a data set as ‘big’
- variety, variability, velocity, veracity and other characteristics play an important role as well
Qualitative analyses such as SWOT are not used for quantitative analysis. Hence, they are not part of business intelligence – TRUE or FALSE?
False