Week 1 Continued Flashcards
Cleaning data means removing __________, dealing with _______ data, and resolving _________ data
duplicates, missing, incomplete
Name the 5 ways to handle missing data
- Delete the row
- Replace it with the Mean/Median/Mode
- Create a new category for missing values (e.g. ‘Unknown’)
- Predict the missing values
- Use an algorithm to produce an estimated result (e.g. kNN or random forest)
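A minimal sketch of the first three options above, using only the Python standard library. The rows, field names, and values are made up for illustration. Predicting values with a model such as kNN or random forest needs a fitted model and is not shown here.

```python
from statistics import mean

# Hypothetical sample rows; None marks a missing value.
rows = [
    {"name": "Ana", "age": 30, "city": "Lima"},
    {"name": "Ben", "age": None, "city": "Oslo"},
    {"name": "Cy", "age": 40, "city": None},
    {"name": "Di", "age": 50, "city": "Oslo"},
]

# Option 1: delete the row (keep only rows with no missing values).
complete_rows = [r for r in rows if None not in r.values()]

# Option 2: replace missing numbers with the mean of the known values.
# statistics.median or statistics.mode could be swapped in the same way.
known_ages = [r["age"] for r in rows if r["age"] is not None]
mean_age = mean(known_ages)
ages_filled = [r["age"] if r["age"] is not None else mean_age for r in rows]

# Option 3: create a new category for missing values.
cities_filled = [r["city"] if r["city"] is not None else "Unknown" for r in rows]
```

Deleting rows is simplest but throws away the other fields in that row; filling in a value keeps the row at the cost of adding an estimate.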
Name the 3 categories of data
Structured, Unstructured, Semi-structured
Explain structured data
Every element shares the same fields. Ex: DBs, objects
Explain unstructured data
No common structure. Ex: news articles, websites, videos, audio, photographs.
They are all data, but there’s no agreed-upon format or rules
Explain semi-structured
Some structure, but not one shared by every element.
Ex: XML, JSON
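A small sketch of the difference between structured and semi-structured data. The records and the helper `same_fields` are hypothetical, written only for this illustration.

```python
import json

# Structured: every record has the same fields, like rows in a database table.
structured = [
    {"id": 1, "name": "Ana", "age": 30},
    {"id": 2, "name": "Ben", "age": 41},
]

# Semi-structured: JSON has keys and nesting, but records need not match.
semi_structured = json.loads("""
[
  {"id": 1, "name": "Ana", "tags": ["new"]},
  {"id": 2, "contact": {"email": "ben@example.com"}}
]
""")


def same_fields(records):
    """Return True if every record has exactly the same set of keys."""
    return len({frozenset(r.keys()) for r in records}) == 1


print(same_fields(structured))
print(same_fields(semi_structured))
```

Unstructured data (an article, a photo) has no keys at all, so a check like this would not even apply.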
Categorical and numerical are ____ ____s
data types
Nominal and ordinal are ___________ data types
Categorical
Discrete and continuous are _________ data types
Numerical
Example of nominal.
Male or female
Example of ordinal.
Strongly agree, agree, neutral, disagree, strongly disagree
T/F - Ordinal answers must have a gradual order
True
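A short sketch of why order matters for ordinal data but not for nominal data. The survey answers and category labels below are invented for illustration.

```python
from collections import Counter

# Ordinal: the position in this list defines the order of the categories.
LIKERT = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]
rank = {label: i for i, label in enumerate(LIKERT)}

answers = ["agree", "strongly disagree", "neutral", "strongly agree"]

# Sorting by rank is meaningful because the categories have an order.
sorted_answers = sorted(answers, key=rank.get)

# Nominal: categories have no order, so we can only compare and count them.
genders = ["male", "female", "male"]
counts = Counter(genders)
```

Sorting nominal values alphabetically would run, but the resulting order would carry no meaning.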
Explain discrete
Values are distinct and separate; they are counted, not measured
Ex. # of students in class, # of tickets sold
Explain continuous
Measured, not counted; can take any value within a range
Ex. Height, salary
Explain Sample vs Population
Sample contains a subset of the population
Population always contains all members of a given group
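A minimal sketch of drawing a sample from a population. The population of 100 numbered members and the sample size of 10 are arbitrary choices for illustration.

```python
import random

# Hypothetical population: all 100 members of a group, numbered 1 to 100.
population = list(range(1, 101))

# Seeded generator so the draw is repeatable.
rng = random.Random(0)

# Sample: 10 members drawn from the population without replacement.
sample = rng.sample(population, 10)

# Every member of the sample is also a member of the population.
is_subset = set(sample) <= set(population)
```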