Data Analysis Concepts Flashcards
What are the Data Analysis phases (in this course)?
AP-PASA
Ask: understand problem, goals, stakeholders - plan project
Prepare: get the data
Process: clean, organize, transform
Analyze: explore, visualize, stats
Share: communicate, report, story
Act: solutions
What is Data Analysis?
Turning data into insights for informed action - reduces the risk of wasted effort
What is the SMART question methodology?
Specific: simple, focused
Measurable: quantifiable
Action-oriented: encourages change
Relevant
Time-bound
What’s the difference between a Data Analyst, Data Engineer, & Data Scientist?
Analyst: answers questions with existing data - SQL, spreadsheets, databases, BI, dashboards
Engineer: turns raw data into usable pipelines
Scientist: creates new ways of modeling and using data
What’s the difference between data-driven vs data-inspired decision-making?
Data-driven: using facts to guide strategy… requires quality & quantity… over-reliance can result in historical bias, ignoring qualitative insight
Data-inspired: adds in other sources of info - feelings/experience, difficult-to-measure qualities, related concepts
Quantitative vs Qualitative data: explain differences and give examples
Quantitative: specific & measurable. Often gives WHAT of a problem.
* Structured interviews, surveys/polls
Qualitative: subjective or explanatory - can’t be quantified. Often gives WHY of a problem.
* Focus groups, social media text/review analysis, in-person interviews
Powerful when combined
Report vs Dashboard: explain differences, strengths/weaknesses
Report: Static, distributed periodically
+/- High level, historical
+. Quick to build, easy IF maintained
+. Static data - no cleaning needed
-. Continual maintenance
-. Less interactive
Dashboard: Real-time data, multiple datasets in one place
+. Dynamic, automated, interactive
+. User exploration
-. Labor-intensive design
-. Can be confusing/overwhelming (requires training)
-. More initial effort, and may need fixes
-. Potentially unclean data
3 Types of Common Dashboard focus
Strategic: long term goals - highest level metrics over time frame
Operational: short-term performance and goals (most common - real-time status)
Analytical: datasets and mathematics
Small Data vs Big Data: define and explain the differences in use
**Small Data:** specific, short time-period, day-to-day decisions
- usually spreadsheets
- small/mid-size businesses
- simple to collect, store, manage, sort, visualize
- usually manageable size for analysis
**Big Data:** larger, less-specific, longer time period, big decisions
- usually database, queried
- larger businesses
- takes effort to collect, store, manage, sort, visualize
- usually needs to be broken down for analysis
- often more data than needed - challenge is to sift for gems
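One way to picture "broken down for analysis" is reading a dataset a few rows at a time and keeping only running totals. The sketch below is an assumption-laden illustration, not a method from the course: the `region`/`sales` rows are invented, the in-memory string stands in for a large file or query result, and `read_in_chunks` is a hypothetical helper.

```python
import csv
import io
from itertools import islice

# Stand-in for a large file or query result; the rows are invented for illustration.
raw = io.StringIO("region,sales\nnorth,10\nsouth,20\nnorth,5\nsouth,15\nnorth,30\n")

def read_in_chunks(reader, chunk_size):
    """Yield lists of at most chunk_size rows so the full dataset never sits in memory."""
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk

# Aggregate chunk by chunk, keeping only the running totals per region.
totals = {}
for chunk in read_in_chunks(csv.DictReader(raw), chunk_size=2):
    for row in chunk:
        totals[row["region"]] = totals.get(row["region"], 0) + int(row["sales"])

print(totals)
```

With small data the same totals could be computed in a single spreadsheet formula; chunking only becomes worthwhile when the data is too large to load at once.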
What is structured thinking?
Process: recognize the problem, organize available info, reveal gaps/opportunities, identify options for action.
Scope of Work & Statement of Work: Define and explain difference
Scope of Work: agreed-upon outline of the work to be done, including deliverables, timeline, milestones, and reports
Statement of Work: identifies products/services vendor or contractor will provide an organization (objectives, guidelines, deliverables, schedule, cost)
What are the W questions to explore possible bias in data?
**Who:** person/organization who collected/funded the data
**What:** things in the world that could have impacted the data
**Where:** origin of data
**When:** time data was created/collected
**Why:** motivation behind creation/collection
**How:** methods used to create/collect
Important to include context/possible bias when presenting/reporting data
What are some tips when dealing with Executive team stakeholders?
- Strategic
- Headlines first
- Limited time
- Details in appendix
1st vs 2nd vs 3rd party data sources: what’s the difference?
First Party: collected by individual/group themselves for own use
Second Party: collected by a group from its own audience, then sold
Third Party: collected by outside sources who didn’t collect it directly themselves - requires more checking
Discrete vs Continuous?
Ordinal vs Nominal?
Internal vs External?
Discrete: counted values, whole numbers only (e.g., number of tickets sold)
**Continuous:** measured values that can take any numeric value (e.g., temperature)
Ordinal: qualitative data with set order/sequence
Nominal: qualitative data with no order/sequence
**Internal:** lives in org’s systems (more reliable, easier to collect)
**External:** lives outside org’s systems
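A minimal Python sketch of the discrete/continuous and ordinal/nominal distinctions above. All sample values and the `SIZE_ORDER` ranking are made up for illustration.

```python
# Hypothetical sample values, invented for illustration only.
tickets_sold = [3, 0, 12]                    # discrete: counted, whole numbers
temperatures_c = [21.4, 19.85, 23.0]         # continuous: measured, any numeric value
shirt_sizes = ["large", "small", "medium"]   # ordinal: categories with a set order
eye_colors = ["brown", "blue", "green"]      # nominal: categories with no order

# Ordinal data needs its order stated explicitly; alphabetical sorting gets it wrong.
SIZE_ORDER = {"small": 0, "medium": 1, "large": 2}
sizes_ranked = sorted(shirt_sizes, key=SIZE_ORDER.get)

# Nominal data can be counted or grouped, but ranking it is not meaningful.
color_counts = {color: eye_colors.count(color) for color in set(eye_colors)}

# Discrete values here are all integers; continuous values may fall between them.
all_discrete = all(isinstance(n, int) for n in tickets_sold)

print(sizes_ranked)
print(color_counts)
print(all_discrete)
```

The practical takeaway: sorting or averaging only makes sense for some of these types, so it helps to identify the type before choosing an analysis.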