Lecture 2 Flashcards

1
Q

Data?

A

A collection of facts

2
Q

How is data obtained?

A

As the result of experiences, observations, or experiments

3
Q

What does data consist of?

A

Numbers
Words
Images

4
Q

Data source reliability?

A

Confidence and belief in this data source

5
Q

Data content accuracy?

A

The right data for the job

6
Q

Data accessibility?

A

Can we easily get to the data when we need to?

7
Q

Data security and privacy?

A

Only people with the proper authority can access the data

8
Q

Data richness?

A

All the required data elements are included in the data set

9
Q

Data consistency?

A

Accurately collected and combined/merged

10
Q

Data currency?

A

Up to date

11
Q

Data granularity?

A

The variables are defined at the lowest level of detail needed for the intended use of the data

12
Q

Data validity?

A

Match/mismatch between the actual and expected data values of a given variable

13
Q

Data relevancy?

A

The variables in the data set are all relevant to the study being conducted

14
Q

Structured Data?

A

Targeted for computers to process

Numeric versus Categorical

15
Q

Unstructured/Textual Data?

A

Targeted for humans to process/digest

16
Q

Semi-Structured Data?

A

XML
HTML
Log files

17
Q

Categorical Structured Data?

A

Nominal

Ordinal

18
Q

Numerical Structured Data?

A

Interval

Ratio

19
Q

Unstructured Data contents?

A

Textual
Multimedia
XML/JSON (semi-structured)

20
Q

What does data preprocessing include?

A

Data consolidation
Data cleaning
Data transformation
Data reduction

21
Q

Variables?

A

Dimension Reduction

Variable Selection

22
Q

Cases/Samples?

A

Sampling

Balancing

23
Q

Data consolidation subtasks?

A

Access and collect the data
Select and filter the data
Integrate and unify the data

24
Q

Data consolidation popular methods?

A

SQL queries
Software agents
Web services
Domain expertise

25
Q

Data cleaning subtasks?

A

Handle missing values in the data
Identify and reduce noise in the data
Find and eliminate erroneous data

26
Q

Data cleaning, handling missing data popular methods?

A

Fill in the missing values with the most appropriate values
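
One common choice of "most appropriate value" is the median of the observed values. A minimal sketch, assuming missing entries are stored as `None`; the function name and the sample data are hypothetical:

```python
from statistics import median

def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    fill = median(observed)
    return [fill if v is None else v for v in values]

ages = [23, None, 31, 27, None, 45]   # illustrative data only
filled = impute_median(ages)
print(filled)
```

The mean or the mode could be substituted, depending on the variable type and its distribution.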

27
Q

Data cleaning, identifying and reducing noise in the data popular methods?

A

Identify the outliers in data with simple statistical techniques or with cluster analysis
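
One simple statistical technique is the interquartile-range (IQR) rule. A minimal sketch, assuming the conventional 1.5 multiplier; the function name and the sample readings are hypothetical:

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)   # the three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

readings = [10, 12, 11, 13, 12, 11, 95]   # illustrative data only
outliers = iqr_outliers(readings)
print(outliers)
```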

28
Q

Data cleaning, finding and eliminating erroneous data popular methods?

A

Identify the erroneous values in data, such as odd values, inconsistent class labels, odd distributions

29
Q

Data transformation subtasks?

A

Normalize the data
Discretize or aggregate the data
Construct new attributes

30
Q

Data transformation, normalizing data popular methods?

A

Reduce the range of values in each numerically valued variable to a standard range by using a variety of normalization or scaling techniques
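
Min-max scaling is one such technique. A minimal sketch, assuming a target range of 0 to 1; the function name and the sample values are hypothetical:

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so they span [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant variable carries no spread; map everything to new_min.
        return [new_min for _ in values]
    span = hi - lo
    return [new_min + (v - lo) * (new_max - new_min) / span for v in values]

incomes = [20_000, 35_000, 50_000, 80_000]   # illustrative data only
scaled = min_max_scale(incomes)
print(scaled)
```

Z-score standardization (subtract the mean, divide by the standard deviation) is a common alternative when a fixed range is not required.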

31
Q

Data transformation, discretize or aggregate data popular methods?

A

Convert the numeric variables into discrete representations using range- or frequency-based binning techniques
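
The two binning styles can be contrasted with a short sketch. The function names and the sample values are hypothetical, and ties are not treated specially in the frequency-based version:

```python
def equal_width_bins(values, k):
    """Range-based binning: split [min, max] into k equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0 for _ in values]
    # The maximum value is clamped into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]

def equal_frequency_bins(values, k):
    """Frequency-based binning: each bin gets roughly the same number of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins

values = [1, 4, 7, 10, 13, 40]   # illustrative data only
by_width = equal_width_bins(values, 3)
by_freq = equal_frequency_bins(values, 3)
print(by_width, by_freq)
```

With a skewed variable like this one, range-based binning crowds most values into the first bin, while frequency-based binning spreads them evenly.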

32
Q

Data transformation, construct new attributes popular methods?

A

Derive new and more informative variables from the existing ones using a wide range of mathematical functions
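
As an illustration, a ratio and a log transform are two simple mathematical functions for deriving attributes. The field names (`income`, `debt`) and the records are hypothetical:

```python
import math

def add_derived(record):
    """Return a copy of the record with two derived attributes added."""
    out = dict(record)
    out["debt_to_income"] = record["debt"] / record["income"]
    out["log_income"] = math.log10(record["income"])
    return out

customers = [                         # illustrative data only
    {"income": 50_000, "debt": 10_000},
    {"income": 100_000, "debt": 5_000},
]
enriched = [add_derived(c) for c in customers]
print(enriched[0]["debt_to_income"])
```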

33
Q

Data reduction subtasks?

A

Reduce number of attributes
Reduce number of records
Balance skewed data

34
Q

Data reduction, reducing the number of attributes popular methods?

A

Principal component analysis
Independent component analysis
Chi-square testing
Correlation analysis
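
Correlation analysis can be sketched as dropping any variable that is nearly collinear with one already kept. The 0.95 threshold, the function names, and the sample columns are assumptions for illustration:

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def drop_correlated(columns, threshold=0.95):
    """Keep a column only if it is not highly correlated with a kept one."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

columns = {                           # illustrative data only
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "age": [34, 22, 45, 29, 51],
}
kept = drop_correlated(columns)
print(kept)
```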

35
Q

Data reduction, reducing the number of records popular methods?

A

Random sampling
Stratified sampling
Expert-knowledge-driven purposeful sampling
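
Stratified sampling draws the same fraction from each class so the sample keeps the original class mix. A minimal sketch; the function name, the `churn` field, and the records are hypothetical:

```python
import random
from collections import defaultdict

def stratified_sample(records, key, fraction, seed=0):
    """Sample the same fraction of records from every stratum."""
    rng = random.Random(seed)         # fixed seed for repeatability
    groups = defaultdict(list)
    for r in records:
        groups[key(r)].append(r)
    sample = []
    for members in groups.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Illustrative data only: 90 "no" records and 10 "yes" records.
records = [{"id": i, "churn": "yes" if i < 10 else "no"} for i in range(100)]
sample = stratified_sample(records, key=lambda r: r["churn"], fraction=0.2)
print(len(sample))
```

Plain random sampling would call `rng.sample(records, k)` on the whole list, which can under-represent a rare class by chance.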

36
Q

Data reduction, balancing skewed data popular methods?

A

Oversample the less represented or undersample the more represented classes
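
Random oversampling can be sketched as duplicating minority-class records until every class matches the largest one. The function name and the sample records are hypothetical:

```python
import random
from collections import Counter

def oversample_minority(records, label_of, seed=0):
    """Duplicate random records of smaller classes until all classes are equal."""
    rng = random.Random(seed)         # fixed seed for repeatability
    counts = Counter(label_of(r) for r in records)
    target = max(counts.values())
    balanced = list(records)
    for label, count in counts.items():
        members = [r for r in records if label_of(r) == label]
        balanced.extend(rng.choices(members, k=target - count))
    return balanced

# Illustrative data only: 8 "legit" records and 2 "fraud" records.
records = [("legit", i) for i in range(8)] + [("fraud", i) for i in range(2)]
balanced = oversample_minority(records, label_of=lambda r: r[0])
balanced_counts = Counter(r[0] for r in balanced)
print(balanced_counts)
```

Undersampling works the other way round: records of the larger class are randomly dropped until the classes match.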

37
Q

Statistics?

A

A collection of mathematical techniques to characterize and interpret data

38
Q

Descriptive statistics?

A

Describing the data as it is

39
Q

Inferential statistics?

A

Drawing inferences about the population based on sample data

40
Q

Mean Absolute Deviation?

A

Average absolute deviation from the mean
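
The definition translates directly into code. A minimal sketch with a hypothetical function name and sample values:

```python
def mean_absolute_deviation(values):
    """Average of the absolute distances between each value and the mean."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)

scores = [2, 4, 6, 8]                 # illustrative data only
mad = mean_absolute_deviation(scores)
print(mad)
```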

41
Q

Regression?

A

A part of inferential statistics
The most widely known and used analytics technique in statistics
Used to characterize the relationship between explanatory and response variables

42
Q

What can regression be used for?

A

Hypothesis testing

Forecasting

43
Q

Correlation vs Regression?

A

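Correlation is a single statistic that measures the strength and direction of the association between two variables, whereas regression is an entire equation, a line fitted through the data points, that describes how the response variable changes with the explanatory variable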

44
Q

How to develop linear regression models?

A

Scatter plots

Ordinary least squares method
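
For a single explanatory variable, ordinary least squares has a closed-form solution. A minimal sketch; the function name and the sample points are hypothetical:

```python
def fit_line(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx                 # minimizes the sum of squared residuals
    intercept = my - slope * mx       # the fitted line passes through the means
    return slope, intercept

x = [1, 2, 3, 4, 5]                   # illustrative data only
y = [3, 5, 7, 9, 11]
slope, intercept = fit_line(x, y)
print(slope, intercept)
```

A scatter plot of `x` against `y` is usually inspected first to check that a straight line is a reasonable model.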

45
Q

Regression Modelling Assumptions?

A

Linearity
Independence
Normality
Constant variance
No multicollinearity

46
Q

What is a report?

A

Any communication artifact prepared to convey specific information

47
Q

Functions that a report can fulfill?

A

To ensure proper departmental functioning
To provide information
To provide the results of an analysis
To persuade others to act
To create an organizational memory

48
Q

What is a business report?

A

A written document that contains information regarding business matters

49
Q

Purpose of business report?

A

To improve managerial decisions

50
Q

Source of business report?

A

Data from inside and outside the organization

51
Q

Format of business report?

A

Text + tables + graphs/charts

52
Q

Distribution of business report?

A

In-print
Email
Portal

53
Q

Steps of business report distribution?

A

Data acquisition -> Information generation -> Decision making -> Process management

54
Q

Types of Business Reports?

A

Metric Management Reports
Dashboard-Type Reports
Balanced Scorecard-Type Reports

55
Q

Data Visualization?

A

The use of visual representations to explore, make sense of, and communicate data

56
Q

Information visualization?

A

Aggregation, summarization, and contextualization of data

57
Q

Types of dimension reduction?

A

Variable Selection
Principal Components
Multi-dimensional scaling