Lecture 1 Flashcards

1
Q

What is big data?

A

Extremely large datasets that have grown beyond the ability of traditional data processing tools to manage and analyze

2
Q

4 Data Structures

A

Vector, Matrix, List, Dataframe

3
Q

Vector is

A

A one-dimensional data structure that holds elements of the same data type; used in statistical analysis and data modeling

4
Q

Matrix?

A

A two-dimensional data structure with rows and columns of data; used for mathematical applications

5
Q

List

A

A data structure that can hold elements of different data types and can be dynamically resized; used in programming for tasks such as building collections of items
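To make the contrast between vector, matrix, and list concrete, here is a minimal Python sketch using only the standard library. The variable names and the helper `vector_rejects` are hypothetical, and using `array` as a stand-in for a vector is an assumption; the course itself may use R, where these structures are built in.

```python
from array import array

# Vector: one dimension, every element the same type (here, C doubles).
vector = array("d", [1.5, 2.0, 3.5])

# Matrix: two dimensions (rows and columns), sketched as a list of equal-length rows.
matrix = [
    [1, 2, 3],
    [4, 5, 6],
]
n_rows, n_cols = len(matrix), len(matrix[0])

# List: elements of mixed types, and it can grow or shrink after creation.
items = [42, "text", 3.14]
items.append(True)  # dynamic resize: the list now has four elements


def vector_rejects(value):
    """Return True if the typed vector refuses a value of the wrong type."""
    try:
        vector.append(value)
    except TypeError:
        return True
    return False
```

Calling `vector_rejects("text")` returns `True` because the typed array only accepts numbers, while the plain list accepts any element type.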

6
Q

Dataframe

A

A two-dimensional data structure that stores data in tabular format, like a spreadsheet. It can include numeric, character, and vector data. Used for data analysis and manipulation in languages like Python and R
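The tabular idea can be sketched in plain Python as a dictionary of equal-length columns. This is only a stand-in under stated assumptions: the column names, the sample values, and the helpers `row` and `shape` are hypothetical, and real work would normally use a dedicated dataframe type from a library rather than this structure.

```python
# A table stored column by column: values within a column share a type,
# while different columns can have different types.
table = {
    "name": ["Ana", "Ben", "Cleo"],            # character column
    "age": [34, 28, 45],                       # numeric column
    "scores": [[90, 85], [70], [88, 92, 79]],  # each cell holds a vector
}


def row(table, i):
    """Return row i as a dict mapping column name to cell value."""
    return {col: values[i] for col, values in table.items()}


def shape(table):
    """Return (number of rows, number of columns)."""
    n_rows = len(next(iter(table.values())))
    return n_rows, len(table)
```

Here `shape(table)` gives `(3, 3)` and `row(table, 1)` gives Ben's record, which mirrors reading one line of a spreadsheet.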

7
Q

Types of Data

A

Structured, unstructured, semi-structured

8
Q

Structured data is

A
  • data that is organized and easily analyzed using traditional tools and technologies
  • Ex: financial data, sales and customer data
9
Q

Unstructured data

A
  • data that has no specific format
  • variety of formats: text, audio, video
  • more difficult to analyze than structured data
  • ex: social media posts, customer reviews, emails
10
Q

Semi-structured data

A
  • a type of data that has some structure
  • does not fit neatly into a structured data model
  • example: XML or JSON formats
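A short Python sketch of what "some structure" means for JSON: every record has named fields, but the records do not all share the same fields, so they do not drop straight into a fixed table. The sample records and field names are made up for illustration.

```python
import json

# Two records with overlapping but not identical fields (hypothetical data).
raw = """
[
  {"id": 1, "name": "Ana", "email": "ana@example.com"},
  {"id": 2, "name": "Ben", "tags": ["new", "trial"]}
]
"""

records = json.loads(raw)

# The union of field names across records: no single record has all of them.
all_fields = sorted({key for rec in records for key in rec})

# Reading a field that may be missing requires a default (None here).
emails = [rec.get("email") for rec in records]
```

After running this, `all_fields` is `['email', 'id', 'name', 'tags']` and `emails` is `['ana@example.com', None]`, which shows the gap a fixed-column table would have to paper over.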
11
Q

Levels of Measurement

A

2 Qualitative:
- Nominal
- Ordinal

2 Quantitative:
- Interval
- Ratio

12
Q

Nominal

A

Categorical data with no order. Example: gender (male, female)

13
Q

Ordinal

A

Data with ordered categories but no consistent intervals between them. Example: satisfaction ratings (satisfied to dissatisfied)
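One way to picture ordinal data is an ordered set of codes where comparisons and the median make sense but arithmetic on the codes does not. The sketch below assumes a three-level scale; the `NEUTRAL` level, the numeric codes, and the sample responses are hypothetical additions for illustration.

```python
from enum import IntEnum


class Satisfaction(IntEnum):
    # The codes only encode order; the gaps between them carry no meaning.
    DISSATISFIED = 1
    NEUTRAL = 2
    SATISFIED = 3


responses = [
    Satisfaction.SATISFIED,
    Satisfaction.DISSATISFIED,
    Satisfaction.NEUTRAL,
    Satisfaction.SATISFIED,
    Satisfaction.NEUTRAL,
]

# Order-based operations are valid for ordinal data.
ranked = sorted(responses)
median = ranked[len(ranked) // 2]  # middle value of an odd-length list
is_higher = Satisfaction.SATISFIED > Satisfaction.NEUTRAL
```

With these five responses the median is `Satisfaction.NEUTRAL`. A mean of the codes could be computed, but it would depend on the arbitrary choice of 1, 2, 3 as labels.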

14
Q

Interval

A

Data with consistent intervals but an arbitrary zero point. Example: temperature in Celsius

15
Q

Ratio

A

Data with consistent intervals and a true zero point. Example: weight, height, income
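A worked example of why the zero point matters, as a small Python sketch. The temperatures and weights are made-up values, and the helper `celsius_to_kelvin` is introduced only for illustration.

```python
def celsius_to_kelvin(c):
    """Convert Celsius (arbitrary zero) to Kelvin (true zero)."""
    return c + 273.15


c_low, c_high = 10.0, 20.0

# Interval scale: differences are meaningful.
difference = c_high - c_low  # a 10-degree gap

# Dividing Celsius values gives 2.0, but 20 degrees is not "twice as hot",
# because 0 degrees Celsius does not mean "no temperature".
naive_ratio = c_high / c_low

# On a scale with a true zero the ratio is meaningful, and much smaller.
true_ratio = celsius_to_kelvin(c_high) / celsius_to_kelvin(c_low)

# Ratio scale: weight has a true zero, so 80 kg really is twice 40 kg.
weight_ratio = 80.0 / 40.0
```

The Kelvin ratio comes out at roughly 1.035, not 2, while the weight ratio of 2.0 can be read literally.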

16
Q

Characteristics of Big Data (6 V’s)

A
  1. Volume: amount of data generated and stored in the system
  2. Variety: types of data managed by the information system
  3. Velocity: frequency at which data is generated, captured, and shared
  4. Veracity: level of quality, accuracy, and uncertainty of data and sources
  5. Value: the value and potential derived from data
  6. Variability: how often the data changes