Week 2 - Introduction to Data Technologies & Analytical Tools Flashcards
Structured data
easily searchable by basic algorithms
Unstructured data
no pre-defined data model or not organized in a pre-defined manner
Forms of unstructured data
Verbal data
* spoken (e.g., acoustic characteristics)
* written (e.g., text, symbols)
Non-verbal data
* human (e.g., facial or gestural cues)
* non-human (e.g., geographic)
Feature of data
- Data is never entirely structured or unstructured
- Should be rather understood as a continuum
- Location of data unit on continuum determined by ease with which structure can be added to each data unit at the time of data collection.
Unstructured data analysis methods
Highly unstructured data
- text mining
- social network analysis
- sentiment analysis
- machine learning
Structured data analysis methods
Highly structured data
* Factor Analysis
* Cluster Analysis
* Linear & Logistic Regression
* Customer Lifetime Value Analysis
* RFM Analysis
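RFM stands for recency, frequency and monetary value. The card only names the method, so the following is a minimal sketch of one way to compute the three raw values per customer. The purchase log, the snapshot date and the function name `rfm` are all hypothetical and chosen for illustration.

```python
from datetime import date

# Hypothetical purchase log: (customer_id, purchase_date, amount)
purchases = [
    ("c1", date(2024, 1, 5), 40.0),
    ("c1", date(2024, 3, 1), 60.0),
    ("c2", date(2023, 11, 20), 15.0),
]
snapshot = date(2024, 3, 31)  # assumed analysis date


def rfm(rows, as_of):
    """Return {customer_id: (recency_days, frequency, monetary)}."""
    summary = {}
    for customer_id, day, amount in rows:
        # Start from this purchase if the customer has not been seen yet
        last, count, total = summary.get(customer_id, (day, 0, 0.0))
        summary[customer_id] = (max(last, day), count + 1, total + amount)
    return {
        cid: ((as_of - last).days, count, total)
        for cid, (last, count, total) in summary.items()
    }


scores = rfm(purchases, snapshot)
print(scores["c1"])  # (30, 2, 100.0)
```

In practice the raw values are usually binned into scores (for example 1 to 5) before customers are segmented; that step is left out here.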
Types of Analytical Tools
Dashboard Tools
Spreadsheet Tools
Programming Tools
True or false: Dashboard tools are suitable for visualizing both structured & unstructured data
True
What is a dashboard tool?
“A dashboard is a visual display of the most important information needed to achieve one or more objectives; consolidated and arranged on a single screen so the information can be monitored at a glance”
Goal of Dashboard tool
Monitoring the health of a firm
Spreadsheet tools
Interactive software application for structuring, transforming, analysing, and storing data in rows and columns
Dashboard Tools: Benefits
- Help managers and staff make decisions
- Reduce time spent on information retrieval & reports
- Use a single tool to aggregate data from multiple sources
- Reduced need for technical resources
- Potential to supply data or information in real-time
- Ability to implement statistical criteria
- Ability to trigger alerts to individuals and share information
- Immediacy of data analysis with ability to drill-down
- Possibility of adding notes, events and corrective action
Popular Dashboard Tools for Marketing Purposes
Tableau
Google Analytics
BlueKai
Power BI
True or false: Spreadsheet tools are particularly suitable for unstructured data
False: they are suitable for structured data
True or false: Programming tools are suitable for both structured & unstructured data
True
Programming vs Statistical programming
Programming is the process of solving a given problem using executable computer algorithms, i.e., well-defined procedures for solving problems.
Statistical programming is the process of solving data-related problems using executable computer algorithms.
In short: programming solves problems in general, whereas statistical programming solves data-related problems specifically.
Programming language
a formal set of instructions that can be used to produce various kinds of data output
Programming Tools
a software package that allows the execution of a programming language
Programming Code
statements written in a particular programming language
Statistical Programme or Software
specialized computer programmes which allow for the collection, organisation, statistical analysis, and interpretation of data.
Example of programming tool for accessing data
Relational database management systems:
MySQL
SQL Server
Oracle Database
Example of Programming Language for accessing data
SQL
Examples of programming languages for analysing marketing data
Java
Python
R
Examples of programming tools for analysing marketing data
SPSS
RStudio
KNIME
Relational Database Management Systems (RDBMS)
A relational database refers to a database that stores data in a structured format, using rows and columns (essentially a set of tables)
All RDBMS use the same programming language. What is it?
SQL (Structured Query Language)
A relational database management system makes it easy to locate and access specific values within the database by using what?
unique key
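A minimal sketch of a key-based lookup, using Python's built-in `sqlite3` module as a stand-in for a full RDBMS. The `customers` table, its columns and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# customer_id is the unique (primary) key of the table
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ana", "Leeds"), (2, "Ben", "York")],
)

# The key pinpoints exactly one row
row = conn.execute(
    "SELECT name, city FROM customers WHERE customer_id = ?", (2,)
).fetchone()
print(row)  # ('Ben', 'York')

# The database refuses a second row with the same key
try:
    conn.execute("INSERT INTO customers VALUES (2, 'Cara', 'Bath')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Because the key is unique, a lookup by key can never return more than one customer.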
SQL is useful for what type of data?
structured data
Feature of SQL
Lightweight, declarative language, relatively easy to learn
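"Declarative" means a query states what result is wanted and leaves the how to the database. The sketch below contrasts a hand-written Python loop with an equivalent SQL query run through the built-in `sqlite3` module. The `orders` data is hypothetical.

```python
import sqlite3

# Hypothetical order records: (customer, amount)
orders = [("c1", 40.0), ("c2", 15.0), ("c1", 60.0)]

# Imperative style: spell out each step of the aggregation
totals = {}
for customer, amount in orders:
    totals[customer] = totals.get(customer, 0.0) + amount

# Declarative style: describe the result and let the database compute it
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)
sql_totals = dict(
    conn.execute("SELECT customer, SUM(amount) FROM orders GROUP BY customer")
)

print(sql_totals == totals)  # True
```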
Advantages of SQL
High Speed
No Coding Required
Disadvantages of SQL
Difficulty in Interfacing
Many Features Implemented in Proprietary (Vendor-Specific) Ways
Most Popular Relational Databases Using SQL
Oracle Database
Microsoft SQL Server
PostgreSQL
MySQL
Oracle Database
- Most common
- Expensive, full-service option
- Runs across 9 different operating systems
- Supports 25+ programming languages
- Used by large corporations
Microsoft SQL Server
- Historically only available on Windows computers (Linux support was added with SQL Server 2017)
- Highly sophisticated queries
- Enterprise-level database
MySQL
- Less complex
- Open-source
- Used for smaller operations and tasks
- Excellent for CMS sites and blogs
PostgreSQL
- Open source
- Uses other programming languages (e.g., Python) in addition to SQL
- Default database for macOS Server
Benefits of RDBMS & SQL for Marketers
● RDBMS stores large amounts of structured data on customers (demographics, consumer behavior, etc.), products, and employees
● SQL helps to access, link, and retrieve valuable customer information to generate strategic insight
● Information in RDBMS can be easily transformed and enriched through SQL in real time
● These insights help firms make data-driven decisions from the analysis of existing data (e.g., to support marketing campaigns, segment & target customers, monitor employee performance, etc.)
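The "access, link, and retrieve" idea can be sketched as a join between two tables, again with Python's built-in `sqlite3` module. The `customers` and `orders` tables, the segment names and the amounts are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name TEXT,
        segment TEXT
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        amount REAL
    );
    INSERT INTO customers VALUES
        (1, 'Ana', 'students'), (2, 'Ben', 'families'), (3, 'Cara', 'students');
    INSERT INTO orders VALUES
        (10, 1, 20.0), (11, 2, 75.0), (12, 3, 30.0), (13, 2, 25.0);
    """
)

# Link orders to customers through the shared key, then summarise by segment
revenue_by_segment = conn.execute(
    """
    SELECT c.segment, SUM(o.amount) AS revenue
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.segment
    ORDER BY revenue DESC
    """
).fetchall()

print(revenue_by_segment)  # [('families', 100.0), ('students', 50.0)]
```

A result like this could feed a targeting decision, for example which segment to prioritise in a campaign.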
Relational vs non-relational database
NoSQL database: A non-relational database does not use the table model. Instead, data can be stored in a single document file (resembles a folder). Suitable for semi-structured and unstructured data.
SQL database: A relational database organises data fields into defined columns (resembles a phonebook). Suitable for structured data.
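A rough illustration of the contrast, assuming Python's built-in `sqlite3` and `json` modules as stand-ins for a relational database and a document store. The contact and review records are hypothetical.

```python
import json
import sqlite3

# Relational: every row must fit the columns defined up front
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT, phone TEXT)")
conn.execute("INSERT INTO contacts VALUES ('Ana', '555-0101')")
cursor = conn.execute("SELECT * FROM contacts")
columns = [col[0] for col in cursor.description]  # fixed column names

# Document-style: each record carries its own structure
documents = [
    {"name": "Ana", "phone": "555-0101"},
    {"name": "Ben", "reviews": [{"stars": 4, "text": "Fast delivery"}]},
]
stored = json.dumps(documents)  # serialised as one JSON string
loaded = json.loads(stored)

print(columns)                   # ['name', 'phone']
print(sorted(loaded[1].keys()))  # ['name', 'reviews']
```

The second document has no phone field and a nested list of reviews, which the fixed two-column table could not hold without changing its schema.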
NoSQL Databases: Benefits for Marketers
- Flexibility
- Relatively inexpensive
- Affordability
- Accessibility
- Scope
- Effort
Types of NoSQL Databases
Key-Value Stores
Document Database
Wide Column Database
Graph Database
Key-Value Stores (Advantages and Disadvantages)
Advantages
* Scalability, reliability, simplicity, speed
Disadvantages
* Not adequate for complex applications
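A toy sketch of the key-value idea using a plain Python dictionary. The class `KeyValueStore`, its method names and the session data are hypothetical and not taken from any real product.

```python
class KeyValueStore:
    """Toy in-memory key-value store: values are only reachable by key."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)

    def scan(self, predicate):
        """Search by value: has to inspect every entry."""
        return [k for k, v in self._data.items() if predicate(v)]


store = KeyValueStore()
store.put("session:42", {"customer_id": 7, "cart": ["sku-1"]})
store.put("session:43", {"customer_id": 9, "cart": []})

print(store.get("session:42")["customer_id"])  # 7

# Anything other than a lookup by key requires a full scan
non_empty = store.scan(lambda session: len(session["cart"]) > 0)
print(non_empty)  # ['session:42']
```

Lookups by key are simple and fast. Questions about the values need the full scan, which is one way to see why the model is a poor fit for complex applications.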
Document Databases (Advantages and Disadvantages)
Advantages
* Documents can hold data with different structures; fast write performance and fast queries
Disadvantages
* Most suitable for data that is document-oriented, but still somewhat structured
Wide Column Stores
Advantages:
* Very efficient in data compression, scalability, fast load & queries
Disadvantages:
* Moderate flexibility, low complexity (Dr Ilias Danatzis)
Graph Databases
Advantages:
* Completely flexible structure
* Saves the relationships (edges) that connect data; particularly suitable for relationship-related queries (e.g., social media data)
Disadvantages:
* Variable performance & scalability
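A minimal sketch of a relationship-related query, with the graph held as a Python dictionary of adjacency sets rather than in a real graph database. The `follows` data and the function `friends_of_friends` are hypothetical.

```python
# Hypothetical "who follows whom" edges in a small social network
follows = {
    "ana": {"ben", "cara"},
    "ben": {"dev"},
    "cara": {"dev", "ana"},
    "dev": set(),
}


def friends_of_friends(graph, person):
    """Accounts reachable in exactly two hops that are not already followed."""
    direct = graph.get(person, set())
    second_hop = set()
    for friend in direct:
        second_hop |= graph.get(friend, set())
    return second_hop - direct - {person}


print(friends_of_friends(follows, "ana"))  # {'dev'}
```

The query walks the stored edges directly, which is the kind of traversal graph databases are designed to make cheap.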
Specific Programming Languages & Tools: Python, Advantages and Disadvantages
Advantages
* A growing community that includes computer scientists, software engineers, and programmers
* More opportunities to take advantage of artificial intelligence (e.g., machine learning)
* Flexibility; e.g., data analysis can be integrated with websites, mobile apps, or a production database
* Ready for programming tasks besides analysing data
Disadvantages
* Less efficient for statistical computations (it was originally built for non-statistical purposes)
* Has less appealing data visualization built in
* Fewer packages
Specific Programming Languages & Tools: R & RStudio Advantages and Disadvantages
Advantages
- Made for data-oriented projects in general
- Handles Big Data (very large datasets)
- Large number of ready-made packages
- Built-in ways to professionally visualize data
- Developed by data scientists, important for marketing analytics
- Large community that provides support through mailing lists, documentation and blogs
- Supported by a well-established programming tool (a.k.a. integrated development environment) called RStudio, which has no close competitors for R; Python has no comparable leading tool
Disadvantages
- Hard to learn (steep learning curve)
- Less efficient for general computations, sometimes due to inefficiently written packages
Specific Programming Languages & Tools: SPSS Advantages and Disadvantages
Advantages
* Very user-friendly
* Easy to learn
* No coding necessary to conduct complex statistical analysis
* In-depth statistical capabilities
* Very good for data manipulation & preparation
* It can easily recode variables and create new variables from existing information
* Can get output in an easy to read form
* Good visualisation of analysis output
* No debugging necessary
* Intuitive command names (mostly)
Disadvantages
* Less used in industry
* Proprietary programming tool (requires annual licences)
* Less efficient for large datasets
* Limited analysis capabilities for unstructured data
Specific Programming Languages & Tools: KNIME (Advantages and Disadvantages)
Advantages
* Open source
* Easy to use graphical interface
* Clear view and documentation of data processing across all steps
* Large range of statistical tools
* Suitable to analyse unstructured text data
* Integration of machine learning
* Can be extended through R and Python
Disadvantages
* Handling of large amounts of data and processing performance could be better
* Data visualisation not as well developed as other programs