Week 2: Business Analysis Tools Flashcards
- Learn different business analytics techniques.
- Understand the importance of spreadsheet tools for business analytics.
- Understand the importance of programming tools in business analytics.
Big Data Analytics (examples)
Data mining, clustering, regression, classification, association analysis, decision trees, neural networks, statistical analysis, optimization
Text analytics (examples)
Information retrieval, text summarization, sentiment analysis, topic modeling, thematic analysis
Web/mobile analytics (examples)
Web information retrieval, search systems, web crawling, website ranking, search log analysis, smartphone platforms, mobile advertising and marketing, gamification
Social media analytics (examples)
Content-based analytics, structure-based analytics (or social network analytics), community detection, social influence analysis, link prediction
Multimedia analytics (examples)
Audio analytics, speech analytics, video analytics, automatic video indexing and retrieval
3 main approaches to business analytics tools
- Spreadsheet tools
- Programming tools
- Proprietary business analytics solutions
Spreadsheet tool (definition)
Interactive software application for structuring, transforming, analyzing, and storing data in rows and columns
Tabular data (definition)
Data that has rows and columns
Comma-separated value [CSV] file
Tabular spreadsheet data that uses commas to separate values and new lines to separate records
Tab-separated value [TSV] file
Tabular spreadsheet data that uses tabs to separate values and new lines to separate records
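Example (sketch): how delimiter-separated text maps to rows and columns
A minimal sketch, assuming Python and its standard csv module, of reading the same small table once with commas and once with tabs as the value separator. The sample data and the helper name parse_delimited are made up for illustration and do not come from the course material.

```python
import csv
import io

# Hypothetical sample data: the same table with two different separators.
csv_text = "name,units\nwidget,3\ngadget,5\n"
tsv_text = "name\tunits\nwidget\t3\ngadget\t5\n"


def parse_delimited(text, delimiter):
    """Return a list of records; each record is a list of field values."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


# Each new line becomes one record; the delimiter splits it into values.
csv_rows = parse_delimited(csv_text, ",")
tsv_rows = parse_delimited(tsv_text, "\t")

print(csv_rows)
print(tsv_rows)
```

Both calls should yield the same records, since only the separator between values differs. Note that every value is read as text; converting "3" to a number is a separate step.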
Examples of data cleaning in spreadsheets
- Search and replace
- Sorting and filtering
- Built-in functions
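Example (sketch): the same cleaning steps in a programming tool
The three spreadsheet operations above have rough analogues in code. This is a minimal sketch in plain Python; the records, the region labels, and the threshold of 100 are hypothetical and chosen only to illustrate each step.

```python
# Hypothetical records standing in for spreadsheet rows.
rows = [
    {"region": "N. America", "sales": 120},
    {"region": "Europe", "sales": 95},
    {"region": "N. America", "sales": 60},
    {"region": "Asia", "sales": 150},
]

# Search and replace: standardize an inconsistent label.
cleaned = [
    {**r, "region": r["region"].replace("N. America", "North America")}
    for r in rows
]

# Sorting: order rows by sales, largest first.
by_sales = sorted(cleaned, key=lambda r: r["sales"], reverse=True)

# Filtering: keep only rows with sales of at least 100.
large = [r for r in by_sales if r["sales"] >= 100]

# Built-in functions: comparable to SUM and AVERAGE in a spreadsheet.
total = sum(r["sales"] for r in cleaned)
average = total / len(cleaned)

print(large)
print(total, average)
```

Each step builds a new list and leaves the original rows unchanged, which makes it easy to compare the data before and after cleaning.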
Programming (definition)
process of solving a problem using computer algorithms
Programming language (definition)
formal set of instructions that can be used to produce various kinds of output
Open-source programming tools (definition)
programming tools that are made freely available, often developed by and for the community
Programming code (definition)
collection of statements written in a particular programming language
Pros of Excel
- Highly accessible functions easily implemented at the point of contact with data
- Functions built into spreadsheet environment (easier to implement)
- Experience with Excel is more common
Cons of Excel
- Functionality limited for advanced statistics
- Limited number of rows and columns on a worksheet (1,048,576 rows by 16,384 columns in current versions)
- Limited number of characters in one cell (32,767)
Pros of R [programming language]
- Good for data-oriented projects
- Handles very large datasets (big data)
- Large number of ready-made packages
- Data visualization tools built-in
- Developed by data scientists
- Large community support
- Supported by RStudio (an integrated development environment), which lacks good competitors and has no Python equivalent
Cons of R [programming language]
- Steep learning curve
- Less efficient for general computations
- Some inefficiently written packages
Pros of Python
- Growing community of compsci software engineers and programmers
- More opportunities to take advantage of AI
- Flexible
- Data analysis can be integrated with websites, mobile apps, or production databases
- Can do other programming tasks besides data analysis
Cons of Python
- Less efficient for statistical computations
- Less visually appealing data visualization tools
- Fewer packages
Types of variables
Logical, integer, numeric, and character
Logical variable (definition)
contains only two possible values: TRUE or FALSE
(indicator variable or dummy variable)
Integer variable (definition)
contains numbers without decimal points
Numeric variables (definition)
contains numbers with decimal points
Character variables (definition)
contains words that do not have order or numerical meaning
(string variable or text variable)
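Example (sketch): the four variable types in code
The type names above follow R's terminology. As a rough illustration, assuming Python instead, the closest built-in analogues are bool, int, float, and str. The variable names and values below are hypothetical.

```python
# Hypothetical values, one per variable type.
is_customer = True   # logical: only True or False (TRUE/FALSE in R)
order_count = 42     # integer: a number without a decimal point
unit_price = 19.99   # numeric: a number with a decimal point
city = "Toronto"     # character: text with no order or numerical meaning

# Look up the Python type name of each value.
type_names = {
    "is_customer": type(is_customer).__name__,
    "order_count": type(order_count).__name__,
    "unit_price": type(unit_price).__name__,
    "city": type(city).__name__,
}
print(type_names)

# A logical value used as an indicator (dummy) variable converts to 1 or 0.
indicator = int(is_customer)
print(indicator)
```

The mapping is approximate: R's numeric type also holds whole numbers stored as decimals, whereas Python keeps int and float distinct.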