Lecture 2 Flashcards
Data?
A collection of facts
How is data obtained?
As the result of experiences, observations, or experiments
What does data consist of?
Numbers
Words
Images
Data source reliability?
How much confidence and belief we have in the data source
Data content accuracy?
The right data for the job
Data accessibility?
Can we easily get to the data when we need to?
Data security and privacy?
Accessible only to people with the authority and the need to access it
Data richness?
All the required data elements are included in the data set
Data consistency?
Accurately collected and combined/merged
Data currency?
Up to date
Data granularity?
The variables are defined at the lowest level of detail needed for the intended use of the data
Data validity?
Match/mismatch between the actual and expected data values of a given variable
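Example (sketch): one way to picture the match/mismatch idea is to compare each actual value against a rule describing the expected values. The variable names, ranges, and the helper find_invalid below are illustrative assumptions, not from the lecture.

```python
# Hypothetical expected-value rules; the variable names and ranges are
# illustrative assumptions, not part of the lecture.
EXPECTED = {
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,
    "gender": lambda v: v in {"F", "M"},
}


def find_invalid(records):
    """Return (row index, variable, value) for every value that fails its rule."""
    problems = []
    for i, row in enumerate(records):
        for var, is_valid in EXPECTED.items():
            value = row.get(var)
            if not is_valid(value):
                problems.append((i, var, value))
    return problems


records = [
    {"age": 34, "gender": "F"},
    {"age": 250, "gender": "M"},  # age outside the expected range
    {"age": 41, "gender": "X"},   # unexpected category
]
invalid = find_invalid(records)
```

Here invalid lists the two mismatches (the out-of-range age and the unexpected gender code), which could then be corrected or excluded.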
Data relevancy?
The variables in the data set are all relevant to the study being conducted
Structured Data?
Targeted for computers to process
Numeric versus Categorical
Unstructured/Textual Data?
Targeted for humans to process/digest
Semi-Structured Data?
XML
HTML
Log files
Categorical Structured Data?
Nominal
Ordinal
Numerical Structured Data?
Interval
Ratio
Unstructured or Semi-Structured Data contents?
Textual
Multimedia
XML/JSON
What does data preprocessing include?
Data consolidation
Data cleaning
Data transformation
Data reduction
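Example (sketch): the four steps above could be chained roughly as follows on a toy data set. The two sources, the mean imputation used for cleaning, and the min-max normalization used for transformation are assumptions chosen for illustration; the lecture does not prescribe specific techniques.

```python
# Two hypothetical sources keyed by customer id (illustrative data only).
sales = {1: {"spend": 120.0}, 2: {"spend": None}, 3: {"spend": 80.0}}
profiles = {1: {"region": "north"}, 2: {"region": "south"}, 3: {"region": "north"}}

# 1. Consolidation: merge the sources into one record per id.
merged = [{"id": k, **sales[k], **profiles[k]} for k in sorted(sales)]

# 2. Cleaning: fill a missing spend with the mean of the observed values.
observed = [r["spend"] for r in merged if r["spend"] is not None]
mean_spend = sum(observed) / len(observed)
for r in merged:
    if r["spend"] is None:
        r["spend"] = mean_spend

# 3. Transformation: min-max normalize spend to the range [0, 1].
lo = min(r["spend"] for r in merged)
hi = max(r["spend"] for r in merged)
for r in merged:
    r["spend_norm"] = (r["spend"] - lo) / (hi - lo)

# 4. Reduction: keep only the variables needed downstream.
reduced = [{"id": r["id"], "spend_norm": r["spend_norm"]} for r in merged]
```

The order matters in this sketch: the missing value has to be filled before normalization, otherwise min and max would fail on None.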
Data reduction for variables?
Dimensional Reduction
Variable Selection
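Example (sketch): one simple form of variable selection is to drop variables that never change, since they carry no information. The helper select_variables and the sample columns are hypothetical; dimensional reduction (combining variables into fewer new ones) is a separate technique and is not shown here.

```python
from statistics import pvariance


def select_variables(columns, min_variance=0.0):
    """Keep only the columns whose population variance exceeds min_variance."""
    return {
        name: values
        for name, values in columns.items()
        if pvariance(values) > min_variance
    }


columns = {
    "height": [150.0, 160.0, 170.0],
    "constant_flag": [1.0, 1.0, 1.0],  # same value in every case
}
selected = select_variables(columns)
```

After this call, selected keeps height and drops constant_flag.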
Data reduction for cases/samples?
Sampling
Balancing
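Example (sketch): sampling and balancing can be combined by randomly sampling every class down to the size of the smallest class (random undersampling). This is only one possible balancing approach, and the helper undersample, the label names, and the seed are illustrative assumptions.

```python
import random
from collections import defaultdict


def undersample(rows, label_key, seed=0):
    """Balance classes by sampling each class down to the smallest class size."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)
    n = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, n))
    return balanced


# Imbalanced toy data: 8 "no" cases and 2 "yes" cases.
rows = [{"x": i, "label": "no"} for i in range(8)]
rows += [{"x": i, "label": "yes"} for i in range(2)]
balanced = undersample(rows, "label")
```

The result has the same number of cases per class, at the cost of discarding some majority-class cases.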