Week 2 Flashcards

1
Q

Data Science

A

the exploration and quantitative analysis of all available structured and unstructured data to develop understanding, extract knowledge and formulate actionable results

2
Q

Business Intelligence

A

strategies and technologies used by enterprises for the data analysis of business information.

3
Q

CRISP-DM

A

(Cross-Industry Standard Process for Data Mining) provides useful input on ways to frame analytics problems and is a popular approach for data mining. Its six steps are: business understanding, data understanding, data preparation, modeling, evaluation and deployment.

4
Q

Framing a Decision

A

outline what decision is being considered, why it is important, what data is needed and who will provide input. Business Understanding, Data Understanding and Data Preparation in CRISP-DM.

5
Q

Analyzing a Decision

A

what kind of analytical approach is needed, what does it show, what does it mean. Modeling in CRISP-DM.

6
Q

Implementing a Decision

A

how do I make use of the decision, what can I expect, what else should be considered, how do I “sell” the result. Evaluation and Deployment in CRISP-DM.

7
Q

Data Modeling Blocks

A

1. Data, 2. Build Model, 3. Infer Hidden Variables, 4. Predict & Explore

8
Q

Interpretation Error and Inconsistencies

A

Interpretation error: taking a value in your data for granted. Inconsistencies: differences between data sources, or between a source and the company's standardized values.
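A minimal sketch of resolving inconsistencies, assuming a hypothetical lookup table (STANDARD_COUNTRY) that maps each source's codes onto the company's standardized value. Unknown codes raise an error instead of being taken for granted.

```python
# Hypothetical lookup table: source-specific codes -> company's standardized value.
STANDARD_COUNTRY = {
    "US": "United States",
    "USA": "United States",
    "United States": "United States",
    "UK": "United Kingdom",
    "GB": "United Kingdom",
    "United Kingdom": "United Kingdom",
}


def standardize(value: str, lookup: dict) -> str:
    """Map a raw source value onto the company standard.

    Raises instead of guessing, so unmapped codes get reviewed
    rather than silently passed through.
    """
    key = value.strip()
    if key not in lookup:
        raise ValueError(f"unmapped value: {value!r}")
    return lookup[key]


source_a = ["US", "UK", "USA"]          # hypothetical source A codes
source_b = ["United States", "GB"]      # hypothetical source B codes
print([standardize(v, STANDARD_COUNTRY) for v in source_a + source_b])
```

Both sources end up on the same vocabulary, so they can be compared or combined afterwards.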

9
Q

Cleansing Data

A

Interpretation Errors and Inconsistencies, Data Entry Errors, Redundant Whitespace, Fixing Capital Letter Mismatches, Outliers, Dealing with Missing Values, Different Units of Measurement, Different Levels of Aggregation, Deviations from a Code Book, Impossible Values and Sanity Checks.
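A small sketch of several cleansing steps from this card applied to hypothetical records: stripping redundant whitespace, fixing capital letter mismatches, converting heights to one unit, treating blanks as missing values, and a sanity check on impossible ages. The 0 to 120 age range is an assumption chosen for illustration.

```python
from typing import Optional

# Hypothetical raw records: stray whitespace, mixed case, mixed units,
# a missing age and an impossible age.
raw = [
    {"name": " Alice ", "city": "brussels", "height": "1.68", "unit": "m", "age": "34"},
    {"name": "Bob", "city": "BRUSSELS ", "height": "175", "unit": "cm", "age": ""},
    {"name": "Carol", "city": "Ghent", "height": "162", "unit": "cm", "age": "299"},
]


def to_cm(value: float, unit: str) -> float:
    """Convert a height to a single unit of measurement (centimetres)."""
    factors = {"cm": 1.0, "m": 100.0}
    return value * factors[unit]


def clean(record: dict) -> dict:
    # Missing value: an empty string becomes None.
    age: Optional[int] = int(record["age"]) if record["age"].strip() else None
    # Sanity check (assumed bounds): impossible ages are also set to None.
    if age is not None and not 0 <= age <= 120:
        age = None
    return {
        "name": record["name"].strip(),          # redundant whitespace
        "city": record["city"].strip().title(),  # capital letter mismatch
        "height_cm": to_cm(float(record["height"]), record["unit"]),
        "age": age,
    }


cleaned = [clean(r) for r in raw]
for row in cleaned:
    print(row)
```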

10
Q

Integrating Data

A

Combining data from different data sources. Joining Tables, Appending Tables, Using Views to Simulate Data Joins and Appends, Enriching Aggregated Measures.
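A sketch of using views to simulate a join and an append without copying data, using Python's built-in sqlite3 module and an in-memory database. All table and view names (sales_jan, sales_feb, clients, sales_all, sales_by_region) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical source tables: two monthly sales tables and a client table.
cur.execute("CREATE TABLE sales_jan (client_id INTEGER, amount REAL)")
cur.execute("CREATE TABLE sales_feb (client_id INTEGER, amount REAL)")
cur.execute("CREATE TABLE clients (client_id INTEGER PRIMARY KEY, region TEXT)")
cur.executemany("INSERT INTO sales_jan VALUES (?, ?)", [(1, 100.0), (2, 50.0)])
cur.executemany("INSERT INTO sales_feb VALUES (?, ?)", [(1, 70.0), (3, 20.0)])
cur.executemany("INSERT INTO clients VALUES (?, ?)",
                [(1, "North"), (2, "South"), (3, "North")])

# View 1 appends the monthly tables virtually (UNION ALL keeps every row).
cur.execute("""
    CREATE VIEW sales_all AS
    SELECT 'jan' AS month, client_id, amount FROM sales_jan
    UNION ALL
    SELECT 'feb' AS month, client_id, amount FROM sales_feb
""")

# View 2 joins the appended sales to client attributes and aggregates per region.
cur.execute("""
    CREATE VIEW sales_by_region AS
    SELECT c.region, SUM(s.amount) AS total
    FROM sales_all AS s
    JOIN clients AS c ON s.client_id = c.client_id
    GROUP BY c.region
""")

rows = cur.execute("SELECT region, total FROM sales_by_region ORDER BY region").fetchall()
print(rows)
```

The views are recomputed from the underlying tables each time they are queried, which saves storage but costs processing time on every read.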

11
Q

Transforming Data

A

reshaping data into the form a model requires, e.g. reducing the number of variables or turning variables into dummy variables.

12
Q

Data Retrieval

A

retrieving data stored within the company and data from outside the organization, and performing data quality checks on it.

13
Q

Data Preparation

A

fix problems in the data; create derived variables.
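A minimal sketch of creating derived variables from hypothetical order records: a unit price computed from revenue and quantity (left as None when quantity is 0, so no division error occurs) and a weekend flag computed from the order date. The field names are assumptions for illustration.

```python
from datetime import date

# Hypothetical order records.
orders = [
    {"order_id": 1, "quantity": 4, "revenue": 20.0, "order_date": date(2023, 3, 14)},
    {"order_id": 2, "quantity": 0, "revenue": 0.0, "order_date": date(2023, 3, 18)},
]


def add_derived(order: dict) -> dict:
    out = dict(order)  # keep the original record unchanged
    # Derived variable 1: unit price; undefined (None) when quantity is 0.
    out["unit_price"] = order["revenue"] / order["quantity"] if order["quantity"] else None
    # Derived variable 2: weekend flag (weekday(): Monday=0 ... Sunday=6).
    out["is_weekend"] = order["order_date"].weekday() >= 5
    return out


prepared = [add_derived(o) for o in orders]
for row in prepared:
    print(row)
```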

14
Q

Exploratory Data Analysis

A

the use of graphical techniques to gain an understanding of your data and the interactions between variables.
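A plotting library would normally be used for this; to stay dependency-free, the sketch below draws a text histogram of a hypothetical list of integer ages, one of the simplest graphical checks of a variable's distribution. Empty bins are skipped.

```python
from collections import Counter


def text_histogram(values, bin_width):
    """Return one line per non-empty bin, with one '#' per integer observation."""
    bins = Counter((v // bin_width) * bin_width for v in values)
    lines = []
    for start in sorted(bins):
        label = f"{start:>4}-{start + bin_width - 1:<4}"
        lines.append(f"{label} | {'#' * bins[start]}")
    return lines


ages = [23, 25, 31, 34, 35, 38, 41, 44, 52, 67]  # hypothetical data
for line in text_histogram(ages, 10):
    print(line)
```

The peak in the 30 to 39 bin and the long right tail are the kind of pattern this card's exploratory step is meant to surface.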

15
Q

Joining

A

enriching an observation from one table with information from another.
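A sketch of a left join in plain Python, assuming hypothetical customer and region tables stored as lists of dicts and a join key that is unique in the right-hand table. Each customer is enriched with the region columns; customers with no match get None.

```python
def left_join(left, right, key):
    """Enrich each row of `left` with the other columns of its match in `right`."""
    index = {row[key]: row for row in right}  # assumes `key` is unique in `right`
    extra_cols = sorted({c for row in right for c in row if c != key})
    joined = []
    for row in left:
        match = index.get(row[key], {})
        joined.append({**row, **{c: match.get(c) for c in extra_cols}})
    return joined


customers = [
    {"customer_id": 1, "name": "Acme"},
    {"customer_id": 2, "name": "Globex"},
    {"customer_id": 3, "name": "Initech"},
]
regions = [
    {"customer_id": 1, "region": "North"},
    {"customer_id": 2, "region": "South"},
]

result = left_join(customers, regions, "customer_id")
for row in result:
    print(row)
```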

16
Q

Appending/Stacking

A

Adding the observations of one table to those of another table.
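A minimal sketch of appending (stacking) tables, assuming hypothetical monthly tables stored as lists of dicts. It checks that every row has the same columns before stacking, since appending tables with different layouts usually signals a data problem.

```python
def append(*tables):
    """Stack rows of tables with identical columns; raise if the columns differ."""
    columns = None
    stacked = []
    for table in tables:
        for row in table:
            if columns is None:
                columns = set(row)
            elif set(row) != columns:
                raise ValueError(f"column mismatch: {sorted(row)} vs {sorted(columns)}")
            stacked.append(dict(row))
    return stacked


jan = [{"month": "jan", "sales": 100}, {"month": "jan", "sales": 80}]
feb = [{"month": "feb", "sales": 120}]

all_sales = append(jan, feb)
print(len(all_sales))  # 3
```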

17
Q

Dummy Variables

A

Can only take two values: true (1) or false (0). Used to indicate the absence or presence of a categorical effect that may explain the observation.
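A sketch of turning one categorical variable into 0/1 dummy variables, one column per category. The optional drop_first flag (a hypothetical parameter here) leaves out one category as the reference level, which is common when a model also has an intercept.

```python
def to_dummies(values, prefix, drop_first=False):
    """Turn one categorical variable into 0/1 dummy columns."""
    categories = sorted(set(values))
    if drop_first:
        categories = categories[1:]  # first category becomes the reference level
    return [{f"{prefix}_{c}": int(v == c) for c in categories} for v in values]


colors = ["red", "green", "red", "blue"]  # hypothetical categorical variable
dummies = to_dummies(colors, "color")
print(dummies[0])  # {'color_blue': 0, 'color_green': 0, 'color_red': 1}
```

With drop_first=True, a "blue" row becomes all zeros, and its effect is absorbed by the reference level.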

18
Q

Unsupervised Learning

A

The algorithm has no past cases in which the inputs and the output of interest have been identified (no labeled data). It “attempts” to learn something interesting about the data on its own.
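As one example of unsupervised learning (a technique chosen here for illustration, not named on the card), the sketch below runs k-means clustering on hypothetical one-dimensional readings. No labels are supplied; the algorithm finds the groups itself.

```python
def kmeans_1d(points, k, iterations=20):
    """Group unlabeled 1-D points into k clusters (Lloyd's k-means)."""
    pts = sorted(points)
    # Deterministic start: spread initial centroids across the sorted data.
    if k > 1:
        centroids = [pts[i * (len(pts) - 1) // (k - 1)] for i in range(k)]
    else:
        centroids = [pts[0]]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in pts:
            nearest = min(range(k), key=lambda j: abs(p - centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [sum(c) / len(c) if c else centroids[j]
                         for j, c in enumerate(clusters)]
        if new_centroids == centroids:  # stop once centroids no longer move
            break
        centroids = new_centroids
    return centroids, clusters


readings = [1.0, 10.0, 1.2, 9.5, 0.8, 10.5]  # hypothetical unlabeled data
centroids, clusters = kmeans_1d(readings, k=2)
print(centroids, clusters)
```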

19
Q

Data Partitioning

A

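splitting the data into three sets: Training 60% (fit the model), Validation 30% (tune and compare models), Test 10% (final check on unseen data).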
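A minimal sketch of a random 60/30/10 partition using the standard library. The fixed seed (42) is an arbitrary assumption that makes the split reproducible; whatever remains after the training and validation sets becomes the test set.

```python
import random


def partition(records, train=0.6, valid=0.3, seed=42):
    """Randomly split records into training, validation and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = round(len(shuffled) * train)
    n_valid = round(len(shuffled) * valid)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])  # remainder (about 10%) is the test set


train_set, valid_set, test_set = partition(range(100))
print(len(train_set), len(valid_set), len(test_set))  # 60 30 10
```

Shuffling before slicing matters: if the records are sorted (e.g. by date), slicing without shuffling would give each set a different slice of the distribution.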

20
Q

Technical Data Scientist

A

designs the solution from scratch.

21
Q

Business Data Scientist

A

monitors the solution rather than designing it from scratch. Not as technically knowledgeable as a Technical Data Scientist.

22
Q

Databases

A

structured, with a defined schema. Items are organized as a set of tables with columns and rows. Transactional.
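A sketch of the two database properties on this card, using Python's built-in sqlite3 module: a defined schema (column types plus NOT NULL and CHECK constraints) and transactions (both updates of a transfer commit, or neither does). The accounts table and transfer function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Defined schema: the CHECK constraint forbids negative balances.
conn.execute("""
    CREATE TABLE accounts (
        account_id INTEGER PRIMARY KEY,
        owner      TEXT NOT NULL,
        balance    REAL NOT NULL CHECK (balance >= 0)
    )
""")
conn.executemany("INSERT INTO accounts VALUES (?, ?, ?)",
                 [(1, "Ann", 100.0), (2, "Ben", 20.0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money atomically: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE account_id = ?",
                         (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE account_id = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:  # CHECK failed, so the credit above is undone too
        return False


print(transfer(conn, 2, 1, 50.0))  # False: Ben's balance cannot go negative
print(conn.execute("SELECT balance FROM accounts ORDER BY account_id").fetchall())
```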

23
Q

Data marts

A

stores data from the data warehouse. A subject-oriented, partitioned segment of an enterprise data warehouse.

24
Q

Data Warehouses

A

sits on top of databases and is used for business intelligence. Consumes data from databases and creates a layer optimized for data analytics. The schema is applied on import (schema-on-write).

25
Q

Data Lakes (Big Data)

A

centralized repository of structured and unstructured data. Stores raw data without imposing a structure (schema) up front. No ETL or transformation jobs are needed before the data is stored.