IDSA PRELIMS Flashcards
What are new techniques to solve problems?
Data Science & Analytics
What are the different roles in analytics?
Collector/ Data Steward
Business Analyst
Modeler/Data Scientist
Data Engineer
What are the APEC Analytics Competencies?
- Domain Knowledge & Application
- Data Management & Governance
- Operational Analytics
- Data Visualization & Presentation
- Research Methods
- Data Engineering Principles
- Statistical Techniques
- Data Analytics Methods & Algorithms
- Computing
- 21st Century Skills
Who has the best domain knowledge?
Steward
Analyst
Manager
Who has the best data governance?
Steward
Manager
Who has the best operational analytics?
ALL
Who has the best data visualization?
Analyst
Manager
Who has the best research methods?
Scientist
Who has the best data engineering?
Engineer
Who has the best statistical techniques?
Scientist
Who has the best methods and algorithms?
Scientist
Who has the best computing?
Scientist
Who has the best 21st century skills?
ALL
What are the components of the Data Science Skillset?
Substantive Expertise
Math and Statistics Knowledge
Hacking Skills
Traditional Research
Machine Learning
Danger Zone
Data science requires the intersection of what abilities?
Hacking skills
Math and Statistics Knowledge
Substantive Expertise
Necessary for working with massive amounts of electronic data
Hacking skills
Crucial for generating motivating questions and hypotheses and interpreting results
Substantive expertise
Allows a data scientist to choose appropriate methods and tools in order to extract insight from data
Math & Statistics knowledge
Stems from combining hacking skills with math and statistics knowledge, but does not require scientific motivation
Machine learning
Lies at the intersection of knowledge of math and statistics with substantive expertise in a scientific field
Traditional Research
Hacking skills combined with substantive scientific expertise, but without rigorous methods, which can beget incorrect analyses
Danger Zone
Data Science or Data Analytics: Uses big data
Both
Data Science or Data Analytics: Healthcare, gaming, travel, industries with immediate data needs
Analytics
Data Science or Data Analytics: Macro
Science
Data Science or Data Analytics: To ask the right questions
Science
Data Science or Data Analytics: Machine learning, AI, Search engine, engineering, corporate analytics
Science
Data Science or Data Analytics: To find actionable data
Analytics
Data Science or Data Analytics: Micro
Analytics
What is the mother of innovation?
Necessity
What is the goal of report writing?
Automation
What are the goals of a centralized system?
ERP - Enterprise Resource Planning
MIS - Management Info System
Goals: Apps for everyone
Business Intelligence
Where is data science and analytics seen?
Education
Environment
Healthcare
The process of knowledge discovery, machine learning, and predictive analytics.
Data Mining
Data mining is NOT about?
- Descriptive statistics
- Exploratory visualization
- Dimensional slicing
- Hypothesis testing
- Queries
Data Mining involves extracting ____, building _____, and is a combination of ____, _____, and ____.
- Extracting Meaningful Patterns.
- Building Representative Models.
- Combination of Statistics, Machine Learning, and Computing Algorithms
Types of Learning Models in Data Mining?
Supervised/ Directed
Unsupervised/ Undirected
What model of data mining: generalizes the relationship between the input and output variables.
Supervised
What model of data mining: finds patterns in data based on the relationship between the data points themselves.
Unsupervised
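The two cards above can be contrasted with a minimal Python sketch. It is only an illustration under stated assumptions: the numbers are made up, the supervised part uses a simple least-squares line, and the unsupervised part uses a hypothetical largest-gap split rather than any particular data mining operator.

```python
# Supervised: labelled pairs (x, y) are given, and the model generalizes
# the relationship between input x and output y (here, a least-squares line).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)
slope = (
    sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    / sum((x - mean_x) ** 2 for x in xs)
)
intercept = mean_y - slope * mean_x


def predict(x):
    """Predict the output for a new input using the fitted line."""
    return intercept + slope * x


# Unsupervised: there is no output variable. Structure is found only from
# how the data points relate to each other (here, the largest gap between
# neighbouring sorted values separates two groups).
points = [1.0, 1.2, 0.9, 8.0, 8.3, 7.9]


def split_by_largest_gap(values):
    """Split values into two groups at the widest gap between neighbours."""
    ordered = sorted(values)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    cut = gaps.index(max(gaps)) + 1
    return ordered[:cut], ordered[cut:]


low_group, high_group = split_by_largest_gap(points)
```

The supervised half needs the `ys` labels to learn anything, while the unsupervised half never sees a label.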
DATA MINING: Groups of Learning Models?
- Classification Models (S)
- Regression Models (S)
- Clustering Models (S/US)
- Anomaly Detection (US)
- Time Series Forecasting (US)
- Association (US)
- Text and Sentiment Analysis (US)
DATA MINING: Steps?
Business Understanding
Data Understanding
Data Preparation
Modeling
Testing and Evaluation
Deployment
Data cleaning is the process of preparing data for analysis by removing or modifying?
incorrect, incomplete, irrelevant, duplicated, or improperly formatted data.
Parts of the RapidMiner interface
Repository
Canvas
Operators/Analysis tabs
Parameter tabs
Description tabs
How do you import data in RapidMiner?
File -> Import Data
or click the Repository tab
Types of data when importing?
polynominal
binominal
real
integer
date_time
date
time
What type of data: many different string values (for example: red, green, blue, yellow)
polynominal
What type of data: (for example: 23.12.2014 17:59).
date_time
What type of data: a fractional number (for example: 11.23 or -0.0001).
real
What type of data: (for example 23.12.2014).
date
What type of data: (for example 17:59).
Time
What type of data: a whole number (for example: 23, -5, or 11,024,768).
Integer
What type of data: exactly two values (for example: true/false, yes/no)
binominal
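The type cards above can be drilled with a small Python sketch that guesses a type name from sample values. This is an illustrative assumption, not how RapidMiner detects types: the function name `guess_type` and the order of checks are hypothetical, and the date formats are taken only from the examples on the cards. RapidMiner spells the two nominal types polynominal and binominal, so the sketch uses those names.

```python
from datetime import datetime


def guess_type(values):
    """Guess a RapidMiner-style value type for a list of strings (illustrative only)."""

    def all_parse(fmt):
        # True when every value matches the given date/time format.
        try:
            for v in values:
                datetime.strptime(v, fmt)
            return True
        except ValueError:
            return False

    def all_convert(cast):
        # True when every value can be converted with int() or float().
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False

    if all_parse("%d.%m.%Y %H:%M"):
        return "date_time"
    if all_parse("%d.%m.%Y"):
        return "date"
    if all_parse("%H:%M"):
        return "time"
    if all_convert(int):
        return "integer"
    if all_convert(float):
        return "real"
    if len(set(values)) == 2:
        return "binominal"  # exactly two distinct values
    return "polynominal"  # many different string values
```

For example, `guess_type(["yes", "no", "yes"])` falls through every date and number check and lands on the two-value case.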
After importing data, the data will appear in the ______ tab.
Results
To find the basic statistics of each attribute, click _____.
Statistics
In filtering cases, you may add more criteria by clicking ____.
Add Entry.
In missing value imputation, instead of filtering, you may?
Remove all cases with missing values by using the condition class instead of Add Filters.
To impute missing data, search for ____ in the Operators tab, then drag and drop it onto the line connecting the Filter Examples operator and the res knob.
Replace Missing Values
In dealing with miscoded data, to remove “white spaces” in the encoding, use the ____ operator.
TRIM
In dealing with miscoded data, connect the _____ and the ______.
Out node of the Retrieve Customer operator and second res of the result knob
To remove “duplicates” in the encoding, use the _____ operator.
Remove Duplicates
To recode miscoded values, use the _____ operator.
REPLACE
You may impute missing values using _______ operator in other attributes.
REPLACE MISSING VALUES
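The four cleaning operators on the cards above (Trim, Remove Duplicates, Replace, Replace Missing Values) can be mimicked in plain Python to see what each step does to the data. This is a rough analogy, not RapidMiner's implementation: the customer rows, the `clean` function, the recode table, and the choice of the average as the fill value are all assumptions made for illustration.

```python
from statistics import mean

# Hypothetical customer rows; None marks a missing value.
rows = [
    {"name": " Ana ", "gender": "F", "age": 31},
    {"name": "Ben", "gender": "male", "age": None},
    {"name": "Ben", "gender": "male", "age": None},
    {"name": "Cara", "gender": "F", "age": 25},
]


def clean(records):
    # Trim: strip white space around text values.
    trimmed = [{**r, "name": r["name"].strip()} for r in records]

    # Remove Duplicates: keep the first copy of each identical row.
    seen, unique = set(), []
    for r in trimmed:
        key = (r["name"], r["gender"], r["age"])
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # Replace: recode miscoded values to the agreed code.
    recode = {"male": "M", "female": "F"}
    recoded = [{**r, "gender": recode.get(r["gender"], r["gender"])} for r in unique]

    # Replace Missing Values: fill gaps with the average of the known ages.
    average_age = mean(r["age"] for r in recoded if r["age"] is not None)
    return [
        {**r, "age": average_age if r["age"] is None else r["age"]}
        for r in recoded
    ]


cleaned = clean(rows)
```

The order matters in this sketch: trimming first lets " Ana " and "Ana" count as the same value when duplicates are removed.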
Use the ______ operator to select the attributes that you need for analysis.
Select Attributes
When is the Set Role operator used?
To tag the attribute that will be used as the label (target variable), or to assign any other role an attribute will play in the analysis.
When is the Join operator needed?
When two data sets need to be merged to make an analysis
Connect the first data set or its result in the (right/left) node of the Join operator and the other data set at the (right/left) node.
Left; right
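What a join does to two data sets can be sketched in a few lines of Python. This is a minimal illustration of an inner join on a shared key, assuming the key is unique in the right data set. The sample rows, the `id` key, and the `inner_join` helper are hypothetical and are not RapidMiner names.

```python
# Hypothetical left and right data sets sharing an "id" key attribute.
left = [
    {"id": 1, "name": "Ana"},
    {"id": 2, "name": "Ben"},
    {"id": 3, "name": "Cara"},
]
right = [
    {"id": 1, "spend": 120.0},
    {"id": 3, "spend": 80.5},
]


def inner_join(left_rows, right_rows, key):
    """Merge rows that share the same key value (assumes unique keys on the right)."""
    index = {r[key]: r for r in right_rows}
    return [
        {**l, **index[l[key]]}
        for l in left_rows
        if l[key] in index
    ]


joined = inner_join(left, right, "id")
```

Ben (id 2) has no match in the right data set, so an inner join drops that row.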
What are the steps of data preparation in RapidMiner?
- Importing Data
- Data Preparation
- Data Filtering
- Missing Value Imputation
- Dealing with Miscoded Entries
- Selecting and Setting Roles of Attributes
- Combining Data Sets
- Data Cleaning
What is data visualization?
The graphical representation of data; techniques used to communicate insights from data through visual representation.
What are the objectives of data visualization?
- to distill large datasets into visual graphics to allow for easy understanding of complex relationships within the data
- to analyze massive amounts of information and make data-driven decisions.
What are the common visualization techniques?
- Bar Graph
- Line Graph
- Pie Graph
- Histogram
- Scatterplot
- Boxplot
- Heatmap
What Common Visualization Technique: to compare counts, percentage, or other measures (average) for different discrete categories of data
Bar Graph
T or F: Bar Graphs in RapidMiner are aggregated data
T
In creating bar graphs in RapidMiner, set the ________ and use the _____ function.
Group by Stage;
Average aggregate
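The "group by, then average" idea behind an aggregated bar graph can be sketched in Python. The Stage attribute comes from the card above; the stage names, the numeric values, and the `average_by_group` helper are made up for illustration.

```python
from collections import defaultdict

# Hypothetical (stage, value) pairs; each stage becomes one bar.
records = [("Early", 10.0), ("Early", 20.0), ("Late", 30.0)]


def average_by_group(pairs):
    """Group values by category and return the average per category."""
    groups = defaultdict(list)
    for category, value in pairs:
        groups[category].append(value)
    return {category: sum(vals) / len(vals) for category, vals in groups.items()}


bar_heights = average_by_group(records)
```

Each key in `bar_heights` is a bar on the category axis, and its value is the bar's height.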
What Common Visualization Technique: to observe trend
Line Graph
What Common Visualization Technique: shows the relative contribution of different categories to an overall total
Pie Graph
What Common Visualization Technique: shows the frequency distribution of a continuous attribute
Histogram
A bar graph presents a ____ attribute, while a histogram represents a ____ attribute.
categorical
numerical
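The categorical versus numerical distinction shows up in how the counts are produced. The sketch below is a hedged illustration with made-up data: a bar graph counts each distinct category, while a histogram first cuts a numeric range into equal-width bins. The `histogram` helper and its parameters are hypothetical.

```python
from collections import Counter

# Bar graph input: a categorical attribute, one bar per distinct value.
colors = ["red", "blue", "red", "green", "red"]
bar_counts = Counter(colors)

# Histogram input: a numerical attribute, one bar per bin of the range.
ages = [21, 25, 29, 34, 38, 41]


def histogram(values, start, width, bins):
    """Count how many values fall into each equal-width bin."""
    counts = [0] * bins
    for v in values:
        i = int((v - start) // width)
        if 0 <= i < bins:
            counts[i] += 1
    return counts


# Three bins covering 20-29, 30-39, and 40-49.
age_counts = histogram(ages, start=20, width=10, bins=3)
```

Because the bins sit side by side on one continuous axis, histogram bars touch, while bar graph categories are separate.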
T or F: Histograms have spaces in between the bars
F
T or F: In creating a histogram, CHECK the reverse axis to keep the order of the values.
F; do not check
T or F: There can be a histogram for two or more variables
T
What Common Visualization Technique: plots two numerical attributes
Scatterplot
What Common Visualization Technique: graphical representation of the quartiles
Boxplots
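The quartiles that a boxplot draws can be computed with Python's standard library. This is a minimal sketch on made-up data. Quartile conventions differ between tools, so the "inclusive" method used here is an assumption and may not match what RapidMiner plots.

```python
from statistics import quantiles

# Hypothetical numeric attribute.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Cut points that divide the data into four equal parts.
q1, median, q3 = quantiles(data, n=4, method="inclusive")

# The box spans Q1 to Q3; its length is the interquartile range (IQR).
iqr = q3 - q1

# Five-number summary commonly shown by a boxplot.
five_number_summary = (min(data), q1, median, q3, max(data))
```

The box covers the middle half of the data, and the line inside the box marks the median.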
What Common Visualization Technique: graphical representation of data where the individual values contained in a matrix (map) are represented as colors.
Heat maps