Midterm Flashcards
ACQUIRE
Locate and download the data from a source
Primary Data
information collected for the specific purpose at hand
Secondary Data
information that already exists somewhere, having been collected for another purpose
PARSE
Look through the data columns and identify each column's type and whether its values are correct
Modify columns by splitting if needed
Each piece of data needs to be converted to a useful format
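A minimal sketch of the parse step, assuming raw records whose values all arrive as strings. The column names, sample values, and the `parse_row` helper are hypothetical and only illustrate splitting a column and converting each piece to a useful type:

```python
# Hypothetical raw records: every value arrives as a string.
raw_rows = [
    {"name": "Granola Bar", "serving": "40 g", "calories": "190", "vegan": "True"},
    {"name": "Yogurt", "serving": "150 g", "calories": "95.5", "vegan": "False"},
]


def parse_row(row):
    """Convert one raw record into typed fields."""
    # Split the combined "serving" column into an amount and a unit.
    amount, unit = row["serving"].split(" ")
    return {
        "name": row["name"],                 # string
        "serving_amount": int(amount),       # integer
        "serving_unit": unit,                # string
        "calories": float(row["calories"]),  # float
        "vegan": row["vegan"] == "True",     # boolean
    }


parsed = [parse_row(r) for r in raw_rows]
print(parsed[0])
```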
String
a set of characters that forms a word or sentence
Float
a number with a decimal point
Character
a single letter or other symbol
Integer
a number with no fractional part
Alphanumeric
consists of both letters and numbers
Boolean
True or False
MINE
Determine basic descriptors and statistics for your data, categorize it, and figure out the range and spread, as well as patterns
Categorize your data into groups such as nutrient facts
You should also start asking questions of the data
Figure out if temporal data needs to be reorganized
A range check is important for spotting null / NA values or negative numbers
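A minimal sketch of the mine step using only the Python standard library. The records and field names are made up for illustration; `None` stands in for a null / NA value:

```python
import statistics
from collections import Counter

# Hypothetical parsed records; None marks a missing value.
records = [
    {"category": "snack", "calories": 190.0},
    {"category": "dairy", "calories": 95.5},
    {"category": "snack", "calories": None},
    {"category": "drink", "calories": -20.0},  # negative: likely an entry error
    {"category": "dairy", "calories": 120.0},
]

# Basic descriptors are computed on the values that are present.
values = [r["calories"] for r in records if r["calories"] is not None]
summary = {
    "count": len(values),
    "min": min(values),
    "max": max(values),
    "mean": statistics.mean(values),
    "stdev": statistics.stdev(values),  # sample standard deviation (spread)
}

# Range check: count missing values and collect negative numbers.
missing = sum(1 for r in records if r["calories"] is None)
negative = [v for v in values if v < 0]

# Categorize: how many records fall into each group.
by_category = Counter(r["category"] for r in records)

print(summary, missing, negative, by_category)
```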
FILTER + REPRESENT
Reorganize your data and take only what you need
The pro of mining before filtering is that you know exactly what you want to filter. The con is that you don’t know whether there is enough data to answer your question
Filter & Represent have an iterative nature. How you represent data can influence what you acquire
This stage could lead you back to acquire
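A minimal sketch of filtering, assuming a range check has already flagged missing and negative values. The records, the `is_valid` helper, and the choice of which fields to keep are hypothetical:

```python
# Hypothetical records with more fields than the chart needs.
records = [
    {"name": "Granola Bar", "category": "snack", "calories": 190.0, "brand": "A"},
    {"name": "Yogurt", "category": "dairy", "calories": 95.5, "brand": "B"},
    {"name": "Chips", "category": "snack", "calories": None, "brand": "C"},
    {"name": "Soda", "category": "drink", "calories": -20.0, "brand": "D"},
]


def is_valid(record):
    """Keep only records with a present, non-negative calorie value."""
    return record["calories"] is not None and record["calories"] >= 0


# Take only the rows and the fields that the representation needs.
filtered = [
    {"name": r["name"], "calories": r["calories"]}
    for r in records
    if is_valid(r)
]
print(filtered)
```

If too few records survive the filter to answer the question, that is the signal to go back to acquire.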
CHRTS
categorical, hierarchical, relational, temporal, spatial
Categorical
compare categories of quantitative data
Hierarchical
visualize relationships and hierarchies
Relational
chart relationships to explore correlations
Temporal
data that happens over time
Spatial
data pertaining to a location
CRITIQUE + REFINE
Get feedback on your charts and refine them based on that feedback
This stage could lead you back to acquire, mine, or filter & represent
Data Product
translate the records of a data source into an easily understandable format
ex:
Raw vs Processed
Granular vs Summarized
Textual vs Quantitative
Static vs Dynamic
Small vs Massive
Structured Data
easily searchable
Unstructured Data
not easily searchable
ex:
audio, video, reviews
Quantitative
numerical data that is either discrete or continuous
Qualitative Data Types
nominal, ordinal
Nominal
label for a field
ex:
M/F, color, names
Ordinal
order matters
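A small sketch of why order matters for ordinal data. The size labels and the `size_order` list are assumed examples, not from the cards; a plain sort treats the labels as nominal text and orders them alphabetically:

```python
# Hypothetical ordinal values and their meaningful order.
sizes = ["medium", "small", "large", "small"]
size_order = ["small", "medium", "large"]

# Alphabetical sort ignores the ordinal meaning of the labels.
alphabetical = sorted(sizes)

# Sorting by position in size_order respects the ordinal scale.
ordered = sorted(sizes, key=size_order.index)

print(alphabetical)
print(ordered)
```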
Anatomy of a graphic
Chart title, data label, legend, horizontal axis title, left vertical axis title, category labels
Bar Charts vs Histograms
bar charts compare categories, while histograms show the distribution of data within a range
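A minimal sketch of the difference in the data behind each chart, using made-up values: a bar chart needs one count per category, while a histogram groups numeric values into bins of equal width (the bin width of 10 is an arbitrary choice for this example):

```python
from collections import Counter

# Bar chart input: one count per category (categories have no order).
fruits = ["apple", "banana", "apple", "cherry", "banana", "apple"]
bar_data = Counter(fruits)

# Histogram input: counts per numeric bin along a continuous range.
ages = [3, 7, 12, 15, 18, 21, 24, 29, 33, 38]
bin_width = 10
# Each key is the lower edge of a bin: 0 covers 0-9, 10 covers 10-19, and so on.
hist_data = Counter((age // bin_width) * bin_width for age in ages)

# Text rendering: bars sorted by length, bins kept in numeric order.
for category, count in bar_data.most_common():
    print(f"{category:<8}{'#' * count}")
for start in sorted(hist_data):
    print(f"{start:>2}-{start + bin_width - 1:<3}{'#' * hist_data[start]}")
```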
Bar Chart
categories don’t have an order
order the bars by length for easy comparison
horizontal bar charts for long category labels
Categorical
Clustered Bar Chart
comparison between subcategories
Categorical