L2: Data sources and data collection Flashcards

1
Q

What are the ten common characteristics of big data?

A

2.3.1 Big
2.3.2 Always-on
2.3.3 Nonreactive
2.3.4 Incomplete
2.3.5 Inaccessible
2.3.6 Nonrepresentative
2.3.7 Drifting
2.3.8 Algorithmically confounded
2.3.9 Dirty
2.3.10 Sensitive

2
Q

What are the generally helpful characteristics of big data?

A

generally helpful: big, always-on, and nonreactive

3
Q

What are the generally problematic characteristics of big data?

A

generally problematic: incomplete, inaccessible, nonrepresentative, drifting, algorithmically confounded, dirty, and sensitive

4
Q

Traps in big data analytics

A
  1. Transparency and Replicability -> large datasets also have large biases. “Junk in = junk out”
  2. Use Big Data to Understand the Unknown -> existing models were literally good enough
  3. Study the Algorithm -> analytics often come with black-box automated processes. Know what is in the box!
  4. Size does matter, but it’s not all that matters -> small data can still provide information that big data does not capture. Also, big data and analytics go very well together
  5. Real-time data processing leaves little room to ensure that the conditions for 1 are met, and it increases the risks from 3
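
Traps 1 and 4 can be made concrete with a toy simulation. The sketch below is not from the course material: the population, the selection rule, and the helper names `make_population` and `biased_sample` are hypothetical, chosen only to show how a very large but nonrepresentative sample can estimate a mean worse than a small random one.

```python
import random
import statistics

def make_population(n, rng):
    # Hypothetical population: values drawn from a normal distribution
    # with mean 50 and standard deviation 10.
    return [rng.gauss(50, 10) for _ in range(n)]

def biased_sample(population, rng):
    # Hypothetical selection rule (an assumption for illustration):
    # the higher a value, the more likely it is captured by the
    # "big data" source, which makes the sample nonrepresentative.
    return [x for x in population
            if rng.random() < min(1.0, max(0.0, (x - 20) / 60))]

rng = random.Random(0)  # fixed seed so the run is repeatable
population = make_population(200_000, rng)
true_mean = statistics.fmean(population)

big_biased = biased_sample(population, rng)       # large, but biased
small_random = rng.sample(population, 1_000)      # small, but random

big_error = abs(statistics.fmean(big_biased) - true_mean)
small_error = abs(statistics.fmean(small_random) - true_mean)

print(f"big biased sample:   n={len(big_biased)}, error={big_error:.2f}")
print(f"small random sample: n={len(small_random)}, error={small_error:.2f}")
```

Under these assumptions the biased sample is far larger, yet its error does not shrink with size, because the selection bias is built into how the data were collected.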
5
Q

Secondary data: key questions

A
  1. Where did the data come from?
  2. What are its strengths and weaknesses?
  3. How were variables defined?
  4. What instruments (e.g. questionnaire) were used to collect the data and how good were they?
  5. Who collected the data and how good was that person or organization (i.e., was he/she conscientious, properly trained)?
6
Q

Scale and complexity

A

Scale: Nominal, ordinal, interval, ratio

Complexity: Simple, complex
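
The four scales differ in which summaries are meaningful. The sketch below uses made-up example variables (all names and values are hypothetical) and shows one summary that fits each scale.

```python
import statistics

# Hypothetical example data, one variable per scale of measurement.
eye_color = ["brown", "blue", "brown", "green"]              # nominal
satisfaction = ["low", "high", "medium", "medium", "high"]   # ordinal
temperature_c = [10.0, 20.0, 30.0]                           # interval
income = [1000.0, 2000.0, 4000.0]                            # ratio

# Nominal: categories have no order, so only counts and the mode make sense.
mode_color = statistics.mode(eye_color)

# Ordinal: categories are ordered, so a median is meaningful once the
# categories are mapped to ranks (the distances between ranks are not).
rank = {"low": 1, "medium": 2, "high": 3}
median_rank = statistics.median(rank[s] for s in satisfaction)

# Interval: differences are meaningful, so a mean makes sense,
# but 20 degrees Celsius is not "twice as warm" as 10 (no true zero).
mean_temp = statistics.fmean(temperature_c)

# Ratio: a true zero exists, so ratios are meaningful as well.
income_ratio = income[2] / income[0]

print(mode_color, median_rank, mean_temp, income_ratio)
```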

7
Q

ADVANTAGES OF A RDBMS

A

· Establishes a centralized, logical view of data
· Minimizes data duplication (i.e. “redundancy”)
· Promotes data accuracy and integrity
· Capacity of database
· Superior multi-user or concurrent access
· Security
· Retrieves information quickly
· Interoperability

8
Q

DATABASE TERMINOLOGY

A

Table, Entity, Relation (similar to an Excel worksheet)
Row, Record, Instance
Column, Field, Attribute
Primary Key: unique and mandatory
Foreign Key: a cross-reference between tables; it references the primary key of another table
Relationship: created through foreign keys
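
These terms can be seen together in a minimal sketch using Python's built-in `sqlite3` module. The `customer` and `purchase` tables and their columns are hypothetical, invented only for illustration. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default

# Table (entity) with columns (attributes); customer_id is the primary key.
conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    )
""")

# The foreign key in purchase references the primary key of customer,
# which creates the relationship between the two tables.
conn.execute("""
    CREATE TABLE purchase (
        purchase_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        amount      REAL NOT NULL
    )
""")

# Each inserted tuple is one row (record).
conn.execute("INSERT INTO customer VALUES (1, 'Ada')")
conn.executemany("INSERT INTO purchase VALUES (?, ?, ?)",
                 [(10, 1, 9.5), (11, 1, 20.0)])

# Follow the relationship with a join on the key columns.
rows = conn.execute("""
    SELECT c.name, p.amount
    FROM customer AS c
    JOIN purchase AS p ON p.customer_id = c.customer_id
    ORDER BY p.purchase_id
""").fetchall()

# A purchase pointing at a non-existent customer violates the foreign key.
try:
    conn.execute("INSERT INTO purchase VALUES (12, 99, 5.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(rows, orphan_rejected)
```

Storing the customer's name once and referring to it by key is also what keeps duplication low: the name is not repeated on every purchase row.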
