Lectures 20-21 - Big Data and Analytics Flashcards
What is Big Data?
- Whatever your size, data is “Big” when there is too much of it for you to handle.
- Usually thought of as so big that you can’t store all of it.
Data stream processing
Data stream processing is technology that enables
o Collection
o Integration
o Analysis
o Visualization
o System integration
without disrupting the activity of existing sources, storage, and enterprise systems.
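As a rough illustration of the analysis step, the sketch below computes a running mean over a stream one reading at a time, so the full data set never has to be stored. The function name `running_mean` and the sample readings are hypothetical, chosen only for this example; real stream-processing platforms are far more involved.

```python
from typing import Iterable, Iterator, Tuple


def running_mean(stream: Iterable[float]) -> Iterator[Tuple[int, float]]:
    """Yield (count, mean) after each reading without storing the stream."""
    count = 0
    mean = 0.0
    for value in stream:
        count += 1
        # Incremental update: only the count and the current mean are kept.
        mean += (value - mean) / count
        yield count, mean


# Hypothetical sensor readings standing in for a live data source.
readings = [10.0, 20.0, 30.0, 40.0]
results = list(running_mean(readings))
final_count, final_mean = results[-1]
print(final_count, final_mean)
```

Because the function only reads from the iterable, the source that produces the readings is left untouched.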
5 Common Data Analysis Mistakes
- Large “other” category
- Bar charts that do not start at 0 on the y-axis
- Confusing correlation vs. causation
- Improper use of averages
- Pie charts whose slices do not sum to a whole
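To make the "improper use of averages" mistake concrete, here is a small sketch with made-up salary figures (an assumption for illustration, not data from the lectures). A single extreme value pulls the mean far away from what a typical person earns, while the median stays representative.

```python
import statistics

# Hypothetical salaries: four typical values and one extreme outlier.
salaries = [40_000, 42_000, 45_000, 48_000, 1_000_000]

mean_salary = statistics.mean(salaries)      # pulled upward by the outlier
median_salary = statistics.median(salaries)  # middle value, robust to the outlier

print("mean:", mean_salary)
print("median:", median_salary)
```

Here the mean is 235,000 even though four of the five people earn under 50,000, so reporting only "the average" would mislead.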
7 Habits of Highly Effective Big Data Users
- Begin With No End In Mind
- Be Proactive, Pragmatic, Progressive & Persuasive
- Be Technology Toolset Agnostic
- Take Big Data Into The Toilet
- Be Time Sensitive
- Scalable for lots more data
- Above All, Be Holistic
Web 2.0
Web 2.0 – coined in 1999 to describe the move away from static webpages
o Enables users to interact and collaborate (Social Media)
Web 3.0
Web 3.0 – may not be here but is coming
o Internet may behave like a computer assistant
4 Basic Search Tips
- Keep it simple
- Use words most likely to appear on the page
- Describe your search with as few terms as possible
- Choose descriptive words
Excluding Terms
• Use a - (minus sign) immediately before each term you wish to exclude.
Terms with Similar Meaning
• Use a tilde (~) immediately before a term to also search for words with a similar meaning.
Wildcard
• Use the * (wildcard) as a placeholder for any unknown terms.
Exact Phrases
• Use “” (double quotation marks) around a set of words or a phrase to search for those words in that exact order, with no changes (usually not necessary).
The OR Operator
• Type ‘OR’ in CAPS or use the ‘|’ symbol between words to search for either one of them.
Searching within a Specific Website
• Adding ‘site:website’ (such as ualberta.ca, .gov, etc.) limits your search results to a specific website or domain.
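The operators above can be combined in a single query. The sketch below assembles a query string from them; the helper `build_query` and its parameter names are hypothetical (it is not a Google API), and it only produces the text you would type into the search box.

```python
from typing import Optional, Sequence


def build_query(
    terms: Sequence[str] = (),
    exact_phrase: Optional[str] = None,
    exclude: Sequence[str] = (),
    any_of: Sequence[str] = (),
    site: Optional[str] = None,
) -> str:
    """Assemble a search string from the operators on these cards."""
    parts = list(terms)
    if exact_phrase:
        # Double quotes keep the words in this exact order.
        parts.append(f'"{exact_phrase}"')
    if any_of:
        # OR must be in capitals.
        parts.append(" OR ".join(any_of))
    # The minus sign goes immediately before each excluded term.
    parts.extend(f"-{term}" for term in exclude)
    if site:
        parts.append(f"site:{site}")
    return " ".join(parts)


query = build_query(
    terms=["jaguar"],
    exclude=["car"],
    any_of=["habitat", "diet"],
    site="ualberta.ca",
)
print(query)
```

The example searches ualberta.ca for pages about jaguars that mention habitat or diet, while leaving out pages that mention cars.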
Advertising on Google
2 Methods
- Google AdWords
- Google AdSense
Google Correlate
- Google Correlate is a tool on Google Trends which enables you to find queries with a similar pattern to a target data series
- The target can either be a real-world trend that you provide (e.g., a data set of event counts over time) or a query that you enter
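One way to picture "queries with a similar pattern" is to score each candidate series against the target with the Pearson correlation coefficient and sort by that score. The sketch below does this on tiny made-up series; the query names and numbers are hypothetical, it assumes every series has the same length and is not constant, and it is only an illustration of the idea rather than Google's actual implementation.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def rank_by_similarity(
    target: Sequence[float], candidates: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Return (name, correlation) pairs, most similar pattern first."""
    scores = [(name, pearson(target, series)) for name, series in candidates.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical weekly counts: the target trend and three candidate queries.
target = [1.0, 2.0, 3.0, 4.0, 5.0]
candidates = {
    "query_a": [2.0, 4.0, 6.0, 8.0, 10.0],  # rises in step with the target
    "query_b": [5.0, 4.0, 3.0, 2.0, 1.0],   # moves in the opposite direction
    "query_c": [3.0, 1.0, 4.0, 1.0, 5.0],   # only loosely related
}
ranking = rank_by_similarity(target, candidates)
for name, score in ranking:
    print(name, round(score, 3))
```

A high score only says the two series move together; as the mistakes card notes, that correlation does not by itself show causation.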