Lecture 1 Flashcards
What is big data?
extremely large datasets that have grown beyond the ability of traditional data processing tools to manage and analyze
4 Data Structures
Vector, Matrix, List, Dataframe
Vector is
one-dimensional data structure that holds elements of the same data type, used in statistical analysis and data modeling
Matrix?
two-dimensional data structure with rows and columns of data, used for mathematical applications
List
data structure that can hold elements of different data types and can be dynamically resized, used in programming for tasks like building lists of items
Dataframe
two-dimensional data structure that stores data in tabular format, like a spreadsheet; its columns can include numeric, character, and other vector types. Used for data analysis and manipulation in languages like Python and R
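Quick sketch of the four structures in Python, assuming NumPy and pandas are installed. The vector/matrix/list/dataframe names come from R (`c()`, `matrix()`, `list()`, `data.frame()`); the Python objects below are the closest common equivalents, and all variable names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Vector: one dimension, every element has the same data type.
vector = np.array([1.5, 2.0, 3.5])

# Matrix: two dimensions (rows x columns), again a single data type.
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

# List: elements of different types, and it can grow or shrink.
mixed = [42, "hello", 3.14]
mixed.append(True)

# Data frame: a table whose columns may each have a different type.
# Column names and values are hypothetical.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy"],
    "age": [23, 35, 41],
    "height_m": [1.62, 1.80, 1.75],
})

print(vector.ndim)    # number of dimensions of the vector
print(matrix.shape)   # (rows, columns) of the matrix
print(len(mixed))     # list length after appending
print(df.shape)       # (rows, columns) of the data frame
print(df.dtypes)      # one data type per column
```

The last line is the useful one for the Dataframe card: it lists a separate type for each column, which is what distinguishes a data frame from a matrix.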
Types of Data
Structured, unstructured, semi-structured
Structured data is
- data that is organized and easily analyzed using traditional tools & technologies
- Ex: financial data, sales and customer data
Unstructured data
- data that has no specific format
- variety of formats: text, audio, video
- more difficult to analyze than structured data
- ex: social media posts, customer reviews, emails
Semi-structured data
- a type of data that has some structure
- does not fit neatly into a structured data model
- example: XML or JSON formats
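A small sketch of why JSON counts as semi-structured, using only Python's standard `json` module. The two customer records are hypothetical: each has named fields (some structure), but the records do not share the same fields, so they do not drop neatly into a fixed table.

```python
import json

# Two hypothetical records: both are valid JSON, but their fields differ.
raw = '''
[
  {"id": 1, "name": "Ana", "email": "ana@example.com"},
  {"id": 2, "name": "Ben", "tags": ["vip", "newsletter"], "address": {"city": "Lyon"}}
]
'''

records = json.loads(raw)

# Collect every field name that appears in any record.
all_fields = sorted({key for record in records for key in record})

# Force the records into fixed columns, as a structured table would require.
# Fields a record does not have become None.
rows = [[record.get(field) for field in all_fields] for record in records]

print(all_fields)
for row in rows:
    print(row)
```

Each row ends up with gaps (`None`) and with nested values (a list, a dictionary) sitting inside single cells, which is the "some structure, but not a structured data model" idea from the card.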
Levels of Measurement
2 Qualitative:
- Nominal
- Ordinal
2 Quantitative:
- Interval
- Ratio
Nominal
categorical data with no order. Example: gender (male, female)
Ordinal
data with ordered categories but no consistent intervals. Example: satisfaction level (satisfied to dissatisfied)
Interval
data with consistent intervals but an arbitrary zero point. Example: temperature in degrees Celsius
Ratio
data with consistent intervals and true zero point. Example: weight, height, income
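One way to remember the four levels is by which summaries make sense at each one. A minimal sketch with Python's standard `statistics` module; all sample values, and the three-step satisfaction scale, are hypothetical.

```python
import statistics

# Hypothetical samples, one per level of measurement.
nominal = ["female", "male", "female", "female"]            # no order
ordinal_scale = ["dissatisfied", "neutral", "satisfied"]    # low to high
ordinal = ["satisfied", "neutral", "satisfied", "dissatisfied", "satisfied"]
interval_celsius = [10.0, 20.0, 30.0]                       # arbitrary zero
ratio_kg = [50.0, 75.0, 100.0]                              # true zero

# Nominal: only counting is meaningful, so summarize with the mode.
nominal_mode = statistics.mode(nominal)

# Ordinal: order is meaningful, so rank the values and take the middle one.
ranks = sorted(ordinal_scale.index(value) for value in ordinal)
ordinal_median = ordinal_scale[ranks[len(ranks) // 2]]

# Interval: differences and means are meaningful, ratios are not.
interval_mean = statistics.mean(interval_celsius)
celsius_ratio = interval_celsius[1] / interval_celsius[0]
# Same two temperatures in kelvin: the ratio changes with the zero point.
kelvin_ratio = (interval_celsius[1] + 273.15) / (interval_celsius[0] + 273.15)

# Ratio: the zero is real, so "twice as heavy" survives a unit change.
kg_ratio = ratio_kg[2] / ratio_kg[0]
gram_ratio = (ratio_kg[2] * 1000) / (ratio_kg[0] * 1000)

print(nominal_mode, ordinal_median, interval_mean)
print(celsius_ratio, kelvin_ratio)
print(kg_ratio, gram_ratio)
```

The Celsius ratio comes out as 2.0 but the kelvin ratio does not, so "20 degrees is twice as hot as 10 degrees" is not a valid statement. The weight ratio is 2.0 in both kilograms and grams.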