Session 3 Flashcards
Data
- Raw facts
- Just number(s) and/or text
- Context is not always provided
Information
- Data with context
- Processed data
- Value added to data (summarized, organized, analyzed)
Data transformation
- Organizing (re-grouping, clustering)
- Analysing (summarizing, manipulating)
- Contextualizing (giving meaning, sense, setting)
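The three transformation steps above can be sketched in a few lines of Python. This is only an illustration: the sales figures, the store names, the "March" period and the function names are all made up for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw data: (store, sale amount) pairs with no further context.
raw_sales = [("Paris", 120.0), ("Lyon", 80.0), ("Paris", 200.0), ("Lyon", 100.0)]

def organize(records):
    """Organizing: re-group the raw records by store."""
    grouped = defaultdict(list)
    for store, amount in records:
        grouped[store].append(amount)
    return dict(grouped)

def analyse(grouped):
    """Analysing: summarize each group as a total and an average."""
    return {
        store: {"total": sum(amounts), "average": mean(amounts)}
        for store, amounts in grouped.items()
    }

def contextualize(summary, period):
    """Contextualizing: attach meaning (who, how much, when) to each figure."""
    return [
        f"{store} sold {s['total']:.0f} EUR in {period} "
        f"(average sale {s['average']:.0f} EUR)"
        for store, s in sorted(summary.items())
    ]

information = contextualize(analyse(organize(raw_sales)), "March")
for line in information:
    print(line)
```

The raw pairs are data; the printed sentences are information, because they are grouped, summarized and given a setting.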
Properties of information
- Intangibility & non-rivalrousness
- Multiple re-uses
- Use/exploitation is capability-based
- Information asymmetry
Why do digital ecosystem providers exploit information asymmetries ?
Because that imbalance means that the side with more, better or unique information enjoys a competitive advantage over others.
Structured data
Follows strict norms and conventions, which makes it easy to store and query
Unstructured data
Does not follow strict norms and conventions, which makes it harder to store and query
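A rough Python sketch of why structure matters for querying. The order records and the email text are invented for the example, and the regular expression is only one possible way to pull a number out of free text.

```python
import re

# Structured: every record follows the same schema, so a query is a simple filter.
orders = [
    {"order_id": 1, "customer": "Ana", "amount": 40},
    {"order_id": 2, "customer": "Ben", "amount": 75},
]
large_orders = [o["order_id"] for o in orders if o["amount"] > 50]

# Unstructured: free text has no schema, so the same fact must be extracted first.
email = "Hi, this is Ben. I was charged 75 dollars for my last order, please check."
match = re.search(r"(\d+) dollars", email)
amount_from_text = int(match.group(1)) if match else None

print(large_orders, amount_from_text)
```

The structured query works for any record in the list, while the text extraction only works when the message happens to be phrased the way the pattern expects.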
What are the 2 types of Quantitative data ?
- Discrete data
- Continuous data
Discrete data
- The number of students in a class
- The number of workers in a company
- The number of home runs in a baseball game
Continuous data
- The height of children
- The square footage of a two-bedroom house
- The speed of cars
Qualitative data
Descriptive or non-numerical data used to describe and understand characteristics, attributes, or experiences
What are the 2 types of Qualitative data ?
- Nominal data
- Ordinal data
Nominal data
Categorical data that has no inherent order or rank. It can take any value from a set of categories, but the categories themselves cannot be ranked against each other (ex: gender, hair color, ethnicity)
Ordinal data
Categorical data that has a natural order or ranking. Unlike nominal data, ordinal data categories have a specific meaning and order (ex: first, second and third; letter grades; economic status)
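The practical difference between nominal and ordinal data can be illustrated with a short Python sketch. The sample values and the `GRADE_ORDER` list are assumptions made for the example: nominal values can only be counted, while ordinal values can also be sorted and ranked.

```python
from collections import Counter

# Nominal: categories can be counted, but sorting them has no real meaning.
hair_colors = ["brown", "black", "blond", "brown"]
counts = Counter(hair_colors)
most_common_color, frequency = counts.most_common(1)[0]

# Ordinal: an explicit order (assumed here, worst to best) makes ranking possible.
GRADE_ORDER = ["F", "D", "C", "B", "A"]
grades = ["B", "A", "C", "B", "D"]
ranked = sorted(grades, key=GRADE_ORDER.index)
median_grade = ranked[len(ranked) // 2]

print(most_common_color, frequency, ranked, median_grade)
```

A median grade is meaningful because grades have an order; a "median hair color" is not.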
Types of data ownership:
- Organizational data
- Public or open data
- Personal data
Organizational data
Data that is privately owned by companies (Transaction data, Social media data, Clickstream data, System logs)
Public or open data
Data that is publicly available and accessible by everyone
Personal data
Data belonging or private to an individual; data that can be used to identify a person (date of birth, fingerprints, DNA, income, address, health information)
Relational Databases
Data model that stores data as an organized collection of interrelated tables
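A minimal sketch of interrelated tables, using Python's built-in `sqlite3` module with an in-memory database. The `customers` and `orders` tables and their contents are hypothetical; the point is that `orders.customer_id` links each order to a row in `customers`, so the two tables can be joined in a query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway database for the example
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO orders VALUES (1, 1, 40.0), (2, 2, 75.0), (3, 1, 10.0);
""")

# Join the two tables through the shared key and total the orders per customer.
rows = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
conn.close()

print(rows)
```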
Data warehouse
Digital storage system that connects large amounts of structured organizational data from many different sources: it stores current and historical data in one place (single source of truth for an organization)
Data lake
Digital storage system that connects large amounts of unstructured data from many different sources
Metadata
“Data about data”
Business metadata
Adds context to your data
Technical metadata
Describes how to access data - including how it is structured
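As a rough illustration of the two kinds of metadata, the Python sketch below describes one tiny dataset twice. The dataset, field names, owner and descriptions are all invented for the example; real metadata catalogs use their own formats.

```python
# Hypothetical dataset: two order records with terse column names.
dataset = [{"cust_id": 1, "amt": 40.0}, {"cust_id": 2, "amt": 75.0}]

# Business metadata: what the data means and who is responsible for it.
business_metadata = {
    "description": "Online orders placed by customers",
    "owner": "Sales department",
    "fields": {"cust_id": "Customer identifier", "amt": "Order amount in EUR"},
}

# Technical metadata: how the data is structured and how to read it.
technical_metadata = {
    "format": "list of records",
    "row_count": len(dataset),
    "schema": {name: type(value).__name__ for name, value in dataset[0].items()},
}

print(technical_metadata["schema"])
```

Neither dictionary contains any order itself: both only describe the data, which is what makes them "data about data".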
Big Data
Data sets that are so large or complex that they are difficult to process using traditional database management tools or data processing applications
Main challenges of Big Data
- Capture
- Storage
- Search
- Analysis
- Transfer
- Visualization
- Data privacy
Big data puts a focus on what ?
- Unstructured data (social media, audio and video streams, clickstreams)
- Data storage (“the cloud”)
- Data access (via APIs)
Which technologies make Big Data possible ?
Machine learning and AI
6 characteristics of Big Data
- Volume
- Variety
- Variability
- Velocity
- Veracity
- Value
Who is generating Big Data ?
- Media
- Cloud
- Web
- IoT
- Databases
Big Data (Volume)
A lot of data (around 10 sextillion)
Big Data (Variety)
Different forms of data, generated and stored in different ways
Big Data (Variability)
Variability refers to the changing nature of the form of data and the way it is used over time.
Big Data (Velocity)
The speed at which data is generated and processed is very fast (social media, websites, AI bots)
Big Data (Veracity)
Veracity refers to the truthfulness, accuracy, and reliability of data being collected, stored, and analyzed -> Big Data is messy, noisy, and error-prone
Big Data (Value)
The worth or usefulness of data for a particular purpose (supporting research, driving innovation and revenue, reducing costs)
Valuable uses of Big Data
- Tracking consumer behaviour and shopping habits to deliver hyper-personalized retail product recommendations
- Monitoring payment patterns to detect fraud in real time
Data Storage
Methods and technologies used to store and retain digital information (traditional or newer methods)
Traditional methods
- Hard Disk Drive (HDD)
- Solid-State Drive (SSD)
Newer methods
Cloud computing
Cloud Storage
A method of storing data on remote servers accessed over the internet, rather than on a local computer system or physical storage device.
What is the CSP ?
Cloud Service Provider
Benefits of cloud storage
- Scalability
- Accessibility
- Cost-effectiveness
Where is the data in the cloud stored ?
Data Centers
What are the 3 types of Cloud Computing ?
- IaaS (Infrastructure as a Service)
- PaaS (Platform as a Service)
- SaaS (Software as a Service)
What are the 3 main cloud computing companies ?
- Amazon Web Services (AWS)
- Microsoft Azure
- Google Cloud