Midterm Flashcards

1
Q

FILTER + REPRESENT

A

Reorganize your data and take only what you need

The pro of mining before filtering is that you know exactly what you want to filter. The con is that you don’t know if there is enough data to answer your questions

Filter and Represent have an iterative nature. How you represent data can influence what you acquire

This stage could lead you back to acquire

2
Q

ACQUIRE

A

Locate and download the data from a source

3
Q

Primary Data

A

information collected for specific purpose at hand

4
Q

Secondary Data

A

information that already exists somewhere, having been collected for another purpose

5
Q

PARSE

A

Look through the data columns, identify the type of each one, and check that its values are correct

Modify columns by splitting if needed

Each piece of data needs to be converted to a useful format
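A minimal sketch of the parse step in Python, assuming a hypothetical raw record in which every value arrives as a string. The column names and the `parse_row` helper are illustrative only.

```python
# Hypothetical raw record: every value arrives as a string.
raw_row = {
    "name": " Granola Bar ",
    "serving": "40 g",
    "calories": "190",
    "fat_g": "7.5",
    "vegan": "true",
}

def parse_row(row):
    """Convert each string field to a more useful type."""
    # Split the combined "serving" column into an amount and a unit.
    amount, unit = row["serving"].split()
    return {
        "name": row["name"].strip(),              # string
        "serving_amount": int(amount),            # integer
        "serving_unit": unit,                     # string
        "calories": int(row["calories"]),         # integer
        "fat_g": float(row["fat_g"]),             # float
        "vegan": row["vegan"].lower() == "true",  # boolean
    }

parsed = parse_row(raw_row)
```

Splitting "40 g" into an amount and a unit is the kind of column split the card mentions.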

6
Q

String

A

a set of characters that forms a word or sentence

7
Q

Float

A

a number with a decimal point

8
Q

Character

A

a single letter or other symbol

9
Q

Integer

A

a number with no fractional part

10
Q

Alphanumeric

A

consists of both letters and numbers

11
Q

Boolean

A

True or False

12
Q

MINE

A

Determine basic descriptors and statistics for your data, categorize it, and figure out the range and spread, as well as patterns

Categorize your data into groups such as nutrient facts

Should also start asking questions

Figure out if temporal data needs to be reorganized

A range check is important to see whether there are null/NA values or negative numbers
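A minimal sketch of the mine step, assuming a hypothetical calorie column in which `None` marks a missing value. It collects basic descriptors and runs a simple range check, using only the standard library.

```python
import statistics

# Hypothetical calorie column; None marks a missing value.
calories = [190, 250, None, 120, -5, 310, 190]

present = [c for c in calories if c is not None]

# Basic descriptors: count, range, center, and spread.
summary = {
    "count": len(present),
    "missing": len(calories) - len(present),
    "min": min(present),
    "max": max(present),
    "mean": statistics.mean(present),
    "median": statistics.median(present),
    "stdev": statistics.stdev(present),
}

# Range check: a calorie count cannot be negative.
negatives = [c for c in present if c < 0]
```

Here the range check flags the single negative entry, and `summary["missing"]` reports the one null.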

13
Q

FILTER + REPRESENT

A

Reorganize your data and take only what you need

The pro of mining before filtering is that you know exactly what you want to filter. The con is that you don’t know if there is enough data to answer your question

Filter & Represent have an iterative nature. How you represent data can influence what you acquire

This stage could lead you back to acquire

14
Q

CHRTS

A

categorical, hierarchical, relational, temporal, spatial

15
Q

Categorical

A

compare categories of quantitative data

16
Q

Hierarchical

A

visualize relationships and hierarchies

17
Q

Relational

A

charts relations to explore correlations

18
Q

Temporal

A

data that happens over time

19
Q

Spatial

A

data pertaining to a location

20
Q

CRITIQUE + REFINE

A

Get feedback on your charts and refine them based on that feedback

This stage could lead you back to acquire, mine, or filter & represent

21
Q

Data Product

A

translate the records of a data source into an easily understandable format

ex:
Raw vs Processed
Granular vs Summarized
Textual vs Quantitative
Static vs Dynamic
Small vs Massive

22
Q

Structured Data

A

easily searchable

23
Q

Unstructured Data

A

not easily searchable

ex:
audio, video, reviews

24
Q

Quantitative

A

numerical data that is either discrete or continuous

25
Q

Qualitative Data Types

A

nominal, ordinal

26
Q

Nominal

A

label for a field

ex:
M/F, color, names

27
Q

Ordinal

A

order matters

28
Q

Anatomy of a graphic

A

Chart title, data label, legend, horizontal axis title, left vertical axis title, category labels

29
Q

Bar Charts vs Histograms

A

bar charts compare categories, while histograms show the distribution of data within a range
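A minimal sketch of the difference in the underlying data, assuming hypothetical `fruits` and `ages` lists. A bar chart counts items per category, while a histogram counts numeric values per ordered bin (the bin width of 20 is an arbitrary choice for illustration).

```python
from collections import Counter

# Bar chart input: one count per category (the categories have no inherent order).
fruits = ["apple", "pear", "apple", "kiwi", "apple", "pear"]
category_counts = Counter(fruits)

# Histogram input: one count per numeric bin (the bins are ordered and contiguous).
ages = [3, 17, 22, 25, 31, 38, 44, 45, 59]
bin_width = 20
# Each age maps to the lower edge of its bin: 0-19 -> 0, 20-39 -> 20, 40-59 -> 40.
bin_counts = Counter((age // bin_width) * bin_width for age in ages)
```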

30
Q

Bar Chart

A

categories don’t have an order

order the bars by length for easier comparison

horizontal bar charts for long category labels

Categorical

31
Q

Clustered Bar Chart

A

comparison between subcategories

Categorical

32
Q

Pictogram

A

uses point marks, in the form of symbols or pictures, to represent an associated quantitative count

Categorical

33
Q

Proportional Symbol Chart

A

works best when you have a diverse range of quantitative value sizes

Categorical

34
Q

Word Cloud

A

shows the frequency of individual word items

Categorical

35
Q

Matrix Chart and Heat Map

A

displays quantitative values across the intersection of two categorical and/or discrete quantitative dimensions

Categorical

36
Q

Histogram and Density Plot

A

displays the frequency and distribution of quantitative measurements across grouped values for data items

Categorical

37
Q

Box and Whisker Plot

A

displays the distribution and shape of quantitative values for different categories

Categorical

38
Q

Pie Chart and Donut Chart

A

shows how proportions of quantities for different constituent categories make up a whole

Categorical

39
Q

Treemap

A

an enclosure diagram providing a hierarchical display that shows how quantitative values for different constituent categorical parts make up a whole

Hierarchical

40
Q

Venn Diagram

A

shows collections of and relationships between multiple sets

Hierarchical

41
Q

Scatter Plot

A

displays the relationship between two quantitative variables for different category items

Relational

42
Q

Bubble Plot

A

displays the relationship between three quantitative variables for different category items

Relational

43
Q

Network Diagram

A

displays relationships through the connections between data items

Relational

44
Q

Line Chart

A

shows how quantitative values have changed over time for different categorical items

Temporal

45
Q

Bump Chart/Ribbon Chart/Rank Chart

A

shows how quantitative values have changed over time for categorical items, where the quantitative values are ranking measurements

Temporal

46
Q

Slope Graph

A

shows how quantitative values have changed between two points in time for different category items

Temporal

47
Q

Area Chart

A

shows how quantitative values have changed over time for a single categorical item

Temporal

48
Q

Stacked Area Chart

A

shows how quantitative values have changed over time for multiple categorical items

Temporal

49
Q

Gantt Chart

A

shows time-based intervals for different categorical items

Temporal

50
Q

Instance Chart

A

displays time-based events for different categorical items

Temporal

51
Q

Choropleth

A

displays quantitative values for distinct, definable spatial regions

Spatial

52
Q

Isarithmic Map/Contour Map

A

displays distinct spatial surfaces on a map that share the same quantitative classification

Spatial

53
Q

Proportional Symbol Map

A

displays quantitative values for locations on a map; ideal for highlighting the magnitude of data at specific locations through varying symbol sizes

Spatial

54
Q

Dot Map

A

displays the distribution of phenomena on a map

Spatial

55
Q

Flow Map

A

shows the characteristics of movement or connections between phenomena across spatial regions

Spatial

56
Q

Area Cartogram

A

displays the quantitative values associated with distinct, definable spatial regions on a map by proportionately distorting (inflating or deflating) the relative size of and, to some degree, shape of the respective regional areas

Spatial

57
Q

Dorling Cartogram

A

displays the quantitative values associated with distinct, definable spatial regions on a map using marks that are proportionally sized to represent the quantitative values

Spatial

58
Q

Grid Map

A

displays the quantitative values associated with distinct, definable spatial regions on a map. Each geographic region is represented by a fixed-size uniform shape, sometimes termed a tile. Attributes of color are applied to each regional tile to represent a quantitative measurement

Spatial

59
Q

Projections

A

Flattening the globe onto a map always distorts something

ex: the Mercator projection preserves local angles but introduces severe area distortions near the poles

Spatial

60
Q

Logarithmic Transformation

A

Useful when data spans multiple orders of magnitude or is right-skewed
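A minimal sketch of a base-10 log transformation, assuming a hypothetical list of positive values (the logarithm is undefined for zero and negative numbers).

```python
import math

# Hypothetical right-skewed values spanning several orders of magnitude.
values = [1, 10, 100, 1_000, 100_000]

# Each step of 1 on the log scale is a tenfold change in the original value.
log_values = [math.log10(v) for v in values]
```

After the transformation the values sit on an evenly spaced scale (roughly 0 to 5), so the largest value no longer dwarfs the rest.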

61
Q

Square Root Transformation

A

Appropriate for moderately skewed data or data with moderate outliers (right-skewed)

62
Q

Reciprocal Transformation

A

Effective when large values disproportionately influence the dataset or when the data is right-skewed

63
Q

Squaring/Cubing

A

Effective for left-skewed data
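A minimal sketch of squaring on left-skewed data, assuming a hypothetical `scores` list. The `skewness` helper is illustrative and computes the population skewness (mean cubed deviation divided by the cubed standard deviation).

```python
import statistics

def skewness(data):
    """Population skewness: mean cubed deviation over the cubed standard deviation."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)
    return sum((x - mean) ** 3 for x in data) / (len(data) * sd ** 3)

# Hypothetical left-skewed scores: most values are high, with one low outlier.
scores = [2, 7, 8, 9, 9, 10]
squared = [x ** 2 for x in scores]

skew_before = skewness(scores)   # negative, so the tail is on the left
skew_after = skewness(squared)   # still negative here, but closer to zero
```

Squaring stretches the high values more than the low ones, which pulls the skewness of this sample toward zero.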

64
Q

Currency (Verifying Data)

A

Is the information up to date? When was it collected/published/updated?

65
Q

Relevancy (Verifying Data)

A

Is the information suitable for your intended use? Does it address your research question? Is there other (better) information available?

66
Q

Authority (Verifying Data)

A

Is the creator of the information reputable, with the necessary credentials? Can you trust the information?

67
Q

Accuracy (Verifying Data)

A

Do you spot any errors? What is the source of the information? Can other data or research support this information?

68
Q

Purpose (Verifying Data)

A

What was the intended purpose when the information was collected? Are other potential uses identified?

69
Q

Data Type Checking (Data Cleaning)

A

Checking that all values in a field have the same, expected data type
ex: all inputs for ages should be integers
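A minimal sketch of a data type check, assuming a hypothetical `ages` column that should contain only integers.

```python
# Hypothetical age column: one entry is a string and one is a float.
ages = [34, "27", 45, 61.0]

# Collect every entry whose type is not the expected one.
wrong_type = [a for a in ages if not isinstance(a, int)]
```

The flagged entries can then be converted or investigated before any analysis.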

70
Q

Range Check (Data Cleaning)

A

Checking to make sure that the information is within a reasonable range
ex: an age shouldn’t be negative, zero or over a hundred

Missing or incorrect values should be replaced with an estimate (ex: the median age of the dataset) or marked as “Missing” or “Unknown”
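A minimal sketch of the range check for ages, assuming a hypothetical `ages` column and the bounds from the example above (1 to 100). The `in_range` helper is illustrative. Invalid entries are replaced with the median of the valid ones.

```python
import statistics

# Hypothetical age column with out-of-range and missing entries.
ages = [34, -2, 27, None, 150, 45, 61]

def in_range(age, low=1, high=100):
    """True when the age is present and within the plausible range."""
    return age is not None and low <= age <= high

valid = [a for a in ages if in_range(a)]
median_age = statistics.median(valid)

# Replace anything missing or out of range with the median of the valid ages.
cleaned = [a if in_range(a) else median_age for a in ages]
```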

71
Q

Format Check (Data Cleaning)

A

Making sure the format is uniform
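A minimal sketch of a format check, assuming a hypothetical `dates` column and assuming the target format is YYYY-MM-DD. The pattern checks only the shape of the text, not whether the date exists.

```python
import re

# Assumed target format: four-digit year, two-digit month, two-digit day.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")

# Hypothetical date column with mixed formats.
dates = ["2024-03-01", "03/02/2024", "2024-3-5"]

# fullmatch requires the whole string to fit the pattern.
bad_format = [d for d in dates if not DATE_PATTERN.fullmatch(d)]
```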

72
Q

Handling Missing Data (Data Cleaning)

A

< 5% of data missing:
delete those entries

make a note of how this impacts the data analysis and sample size

> 5% of the data missing:
Categorical Data should have a placeholder like “Unknown”

Numerical Data: replace with the mean of the data

Temporal/Interval Data: Use interpolation or a placeholder like “Unknown”

check for patterns of missing data
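A minimal sketch of the 5% rule for a numerical column only, assuming `None` marks a missing value. The `handle_missing` helper and its `threshold` parameter are illustrative names.

```python
import statistics

def handle_missing(values, threshold=0.05):
    """Drop missing entries when few are missing, otherwise fill with the mean."""
    present = [v for v in values if v is not None]
    missing_share = 1 - len(present) / len(values)
    if missing_share < threshold:
        # Under the threshold: delete the incomplete entries.
        return present, missing_share
    # At or over the threshold: replace each gap with the mean of the known values.
    fill = statistics.mean(present)
    return [fill if v is None else v for v in values], missing_share

# 1 of 4 values is missing (25%), so the gap is filled with the mean.
filled, share = handle_missing([10, None, 20, 30])

# 1 of 25 values is missing (4%), so that entry is dropped.
dropped, small_share = handle_missing(list(range(24)) + [None])
```

Returning the missing share alongside the data makes it easier to note how the cleaning affected the sample size.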

73
Q

Duplication (Data Cleaning)

A

Making sure that there are no duplicates in your data and removing any entries that are duplicated
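A minimal sketch of removing exact duplicates, assuming hypothetical records stored as tuples so that they can be used as dictionary keys.

```python
# Hypothetical records; the third row repeats the first.
rows = [("Ana", 34), ("Ben", 27), ("Ana", 34), ("Cy", 45)]

# dict.fromkeys keeps the first occurrence of each row and preserves the order.
unique_rows = list(dict.fromkeys(rows))

removed = len(rows) - len(unique_rows)
```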

74
Q

Spelling Check (Data Cleaning)

A

Detect and correct any spelling errors

75
Q

Data Standardization

A

Ensure consistency in text entries, formats, and measurement units
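A minimal sketch of standardizing text and measurement units, assuming hypothetical weight entries recorded in grams or kilograms with inconsistent spacing and case. The `to_grams` helper is illustrative.

```python
# Hypothetical weights recorded with inconsistent text and units.
raw_weights = ["2 kg", "500 G", " 1.5 Kg", "250 g"]

GRAMS_PER_UNIT = {"g": 1, "kg": 1000}

def to_grams(entry):
    """Normalize spacing and case, then convert the amount to grams."""
    amount, unit = entry.strip().lower().split()
    return float(amount) * GRAMS_PER_UNIT[unit]

weights_g = [to_grams(w) for w in raw_weights]
```

Once every entry uses the same unit, the values can be compared and charted directly.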

76
Q

Design Principles

A

Trustworthy: data should be accurate, consistent, complete, and reliable, with no misleading data representation

Accessible: data should be relevant and understandable

Elegant: eliminate the arbitrary and be thorough

77
Q

Interval

A

quantitative data that’s measured on a scale with equal intervals between values

78
Q

Ratio

A

quantitative data that has a true zero point

79
Q

Textual

A

stores any kind of text data