L2: Data sources and data collection Flashcards
What are the ten common characteristics of big data?
2.3.1 Big
2.3.2 Always-on
2.3.3 Nonreactive
2.3.4 Incomplete
2.3.5 Inaccessible
2.3.6 Nonrepresentative
2.3.7 Drifting
2.3.8 Algorithmically confounded
2.3.9 Dirty
2.3.10 Sensitive
What are the generally helpful characteristics of big data?
generally helpful: big, always-on, and nonreactive
What are the generally problematic characteristics of big data?
generally problematic: incomplete, inaccessible, nonrepresentative, drifting, algorithmically confounded, dirty, and sensitive
Traps in big data analytics
- Transparency and Replicability -> large datasets also have large biases. “Junk in = junk out”
- Use Big Data to Understand the Unknown -> existing models were literally good enough
- Study the Algorithm -> analytics often come with black box automated processes. Know what is in the box!
- Size does matter but it’s not all that matters -> small data can still provide you with information that is not captured by big data. Also big data and analytics go very well together
- Real-time data processing leaves little room to ensure that the conditions for point 1 (transparency and replicability) are met, and it increases the risks from point 3 (black-box algorithms)
Secondary data: key questions
- Where did the data come from?
- What are its strengths and weaknesses?
- How were variables defined?
- What instruments (e.g. questionnaire) were used to collect the data and how good were they?
- Who collected the data and how good was that person or organization (i.e., was he/she conscientious, properly trained)?
Scale and Complexity
Scale: Nominal, ordinal, interval, ratio
Complexity: Simple, complex
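The four scales matter in practice because they limit which summary statistics are meaningful. The sketch below follows the conventional textbook rule of thumb (mode for nominal, plus median for ordinal, plus mean for interval, plus ratios for ratio data). That rule is not stated in these notes, and the function name `summarize` and the sample values are hypothetical.

```python
from statistics import mean, median, mode

# Scales ordered from least to most informative.
SCALES = ["nominal", "ordinal", "interval", "ratio"]

def summarize(values, scale):
    """Return only the summaries that are conventionally meaningful for `scale`.

    Hypothetical helper for illustration; ordinal data is assumed to be
    encoded as numeric ranks, and ratio data is assumed to be positive.
    """
    level = SCALES.index(scale)
    summary = {"mode": mode(values)}          # counting works for any scale
    if level >= 1:
        summary["median"] = median(values)    # needs an ordering
    if level >= 2:
        summary["mean"] = mean(values)        # needs equal intervals
    if level >= 3:
        # "Twice as much" only makes sense with a true zero point.
        summary["max_to_min_ratio"] = max(values) / min(values)
    return summary

print(summarize(["red", "blue", "red", "green"], "nominal"))
print(summarize([1, 2, 2, 3, 5], "ordinal"))           # e.g. satisfaction ranks
print(summarize([18.0, 20.0, 20.0, 22.0], "interval")) # e.g. degrees Celsius
print(summarize([2.0, 4.0, 4.0, 8.0], "ratio"))        # e.g. weight in kg
```

Each step up the list adds one permitted summary, which is one way to remember the order nominal, ordinal, interval, ratio.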
ADVANTAGES OF A RDBMS
· Establish a centralized, logical view of data
· Minimizes data duplication (i.e. “redundancy”)
· Promote data accuracy and integrity
· Capacity of database
· Superior multi-user or concurrent access
· Security
· Retrieve information quickly
· Inter-operability
DATABASE TERMINOLOGY
Table, Entity, Relation, (similar to an Excel Worksheet)
Row, Record, Instance
Column, Field, Attribute
Primary Key – unique and mandatory
Foreign Key – a cross-reference between tables, because it references the primary key of another table
Relationship – created through foreign keys
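The terms above can be seen together in a small example. This is a minimal sketch using Python's built-in `sqlite3` module. The `customer` and `purchase` tables, their columns, and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database; the schema below is a made-up example.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# Table (entity) with a primary key column.
conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    )
""")

# Second table whose foreign key references the primary key of customer.
conn.execute("""
    CREATE TABLE purchase (
        purchase_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        amount      REAL NOT NULL
    )
""")

# Rows (records) made up of column (field) values.
conn.execute("INSERT INTO customer VALUES (1, 'Ada')")
conn.execute("INSERT INTO purchase VALUES (10, 1, 19.99)")

# The relationship is followed by joining on the foreign key.
rows = conn.execute("""
    SELECT c.name, p.amount
    FROM purchase AS p
    JOIN customer AS c ON c.customer_id = p.customer_id
""").fetchall()
print(rows)

# A purchase pointing at a customer that does not exist is rejected.
try:
    conn.execute("INSERT INTO purchase VALUES (11, 99, 5.00)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)
```

The rejected insert illustrates how a foreign key supports data accuracy and integrity: a row in `purchase` cannot reference a customer that is not in `customer`.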