Module 16 Data Analytics Flashcards
4 key features of big data
- Volume
- Velocity
- Variety
- Veracity
Big data - volume
- The volume of data is beyond the processing power of a simple IT infrastructure
Big data - velocity
- Big data can be processed at speed, allowing businesses to change their strategy quickly
Big data - variety
- Big data can be any of a multitude of types of data and can be very far reaching
Big data - veracity
- The information extrapolated from big data must be trustworthy
Uses and outcomes of big data
- Making informed business decisions
- Improving products and/or customer service and improving operating efficiency
- Assisting in identifying weaknesses
Limitations to the implementation of big data analysis
- Cost of implementing
- Compliance and security of data
- Employing the correct people
- Data quality
Data retention policy
- How long data is to be stored
- How it is to be stored and the security associated with it
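The "how long" part of a retention policy can be sketched as a simple age check. This is a minimal illustration only: the record types, the retention periods and the `is_expired` helper are all assumptions made up for the example, not values from any real policy.

```python
from datetime import date, timedelta

# Hypothetical retention periods in days, keyed by record type (assumed values).
RETENTION_DAYS = {"payslip": 6 * 365, "marketing_contact": 2 * 365}


def is_expired(record_type: str, created: date, today: date) -> bool:
    """Return True when a record has been held longer than its retention period."""
    limit = timedelta(days=RETENTION_DAYS[record_type])
    return today - created > limit


today = date(2024, 1, 1)
print(is_expired("payslip", date(2015, 1, 1), today))            # held about 9 years
print(is_expired("marketing_contact", date(2023, 6, 1), today))  # held about 7 months
```

The sketch does not cover how the data is stored or secured, which the policy must also state.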
Conceptual modelling
- Shows the mapping between information
- Example: an employee’s record is linked to their national insurance number and all their payslips
Logical modelling
- Describes the actual tables and columns to be used in the system
Physical modelling
- Describes the storage of the data
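The three modelling levels can be illustrated with the employee and payslip example. In this sketch the link between an employee and their payslips is the conceptual model, the table and column definitions are the logical model, and the choice of where the database lives (here, in memory) stands in for the physical model. All table names, column names and sample values are assumptions for illustration.

```python
import sqlite3

# Physical level (assumed): an in-memory database; a file path would store it on disk.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Logical level: the actual tables and columns (hypothetical names).
conn.executescript("""
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    ni_number   TEXT NOT NULL UNIQUE
);
CREATE TABLE payslip (
    payslip_id  INTEGER PRIMARY KEY,
    employee_id INTEGER NOT NULL REFERENCES employee(employee_id),
    pay_date    TEXT NOT NULL,
    net_pay     REAL NOT NULL
);
""")

# Conceptual level: one employee is linked to many payslips.
conn.execute("INSERT INTO employee VALUES (1, 'A. Example', 'QQ123456C')")
conn.executemany(
    "INSERT INTO payslip (employee_id, pay_date, net_pay) VALUES (?, ?, ?)",
    [(1, "2024-01-31", 2100.0), (1, "2024-02-29", 2150.0)],
)

rows = conn.execute(
    "SELECT e.ni_number, p.pay_date, p.net_pay "
    "FROM employee e JOIN payslip p ON p.employee_id = e.employee_id "
    "ORDER BY p.pay_date"
).fetchall()
print(rows)
```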
Examples of database management systems
- Oracle
- IBM DB2
- Microsoft Access
- Microsoft SQL Server
Storage devices
- Hard drives
- Cloud Storage
- Compact Disc (CD)
- Flash memory (USB)
- Digital Versatile Discs (DVD)
- Blu-ray discs
Advantages of edge computing
- Quicker sharing of the data between machines, reducing the latency when using cloud computing
- Increased privacy
- Bandwidth savings
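The bandwidth saving can be sketched by comparing what an edge device would send if it forwarded every reading with what it sends after summarising locally. The sensor readings and the summary fields below are invented for illustration, and JSON payload length is used only as a rough stand-in for bandwidth.

```python
import json

# Hypothetical sensor readings collected on an edge device.
readings = [20.0 + 0.1 * (i % 10) for i in range(1000)]

# Without edge processing: every reading is sent to the cloud.
raw_payload = json.dumps(readings)

# With edge processing: the device summarises locally and sends only the summary.
summary = {
    "count": len(readings),
    "min": min(readings),
    "max": max(readings),
    "mean": sum(readings) / len(readings),
}
summary_payload = json.dumps(summary)

print(len(raw_payload), "characters sent without edge processing")
print(len(summary_payload), "characters sent with edge processing")
```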
Disadvantages of edge computing
- Considerable time to develop and implement
- Once implemented, it needs ongoing maintenance
- Significant capital expenditure
- Devices may not be compatible with each other
Data Mining
- Way of identifying patterns or trends from a large data set
- Uses mathematical algorithms to predict likely outcomes based on historical information
- Done using a mix of statistics, Artificial Intelligence and Machine Learning.
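Predicting a likely outcome from historical information can be sketched with a least-squares straight line, one of the simplest statistical techniques. The advertising and sales figures below are made up, and `fit_line` is a hypothetical helper; real data mining tools use far richer models.

```python
# Hypothetical historical data: monthly advertising spend (thousands) and units sold.
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
units = [12.0, 14.0, 16.0, 18.0, 20.0]


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(spend, units)

# Use the pattern found in the historical data to predict a new outcome.
predicted = slope * 6.0 + intercept
print(f"Predicted units sold at a spend of 6.0: {predicted:.1f}")
```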
The 6 Steps of KDD
- Business Understanding - what the objectives and goals are
- Data Understanding - data is collected and explored to check it contains the right information
- Data Preparation - data is cleansed and formatted
- Data Mining/Modelling - data is now analysed by the system and any patterns identified
- Evaluation - results are analysed to see if they are suitable to meet the business objectives
- Deployment - results are presented so they are easily understood by all stakeholders and decisions can be made
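The steps above can be sketched as a small pipeline. This is a much simplified illustration: the business objective, the order records and the `prepare`, `mine` and `evaluate` helpers are all assumptions, and data understanding is reduced to knowing that some rows are incomplete or inconsistently formatted.

```python
# Business understanding (assumed objective): find the region with the
# highest average order value.
raw_orders = [
    {"region": "North", "value": "120.50"},
    {"region": "south", "value": "80.00"},
    {"region": "North", "value": ""},  # incomplete row
    {"region": "South", "value": "95.00"},
    {"region": "North", "value": "99.50"},
]


def prepare(records):
    """Data preparation: drop incomplete rows, standardise text, convert types."""
    cleaned = []
    for r in records:
        if not r["value"]:
            continue
        cleaned.append(
            {"region": r["region"].strip().title(), "value": float(r["value"])}
        )
    return cleaned


def mine(records):
    """Data mining/modelling: average order value per region."""
    totals = {}
    for r in records:
        total, count = totals.get(r["region"], (0.0, 0))
        totals[r["region"]] = (total + r["value"], count + 1)
    return {region: total / count for region, (total, count) in totals.items()}


def evaluate(averages):
    """Evaluation: pick the result that answers the business objective."""
    return max(averages, key=averages.get)


cleaned = prepare(raw_orders)
averages = mine(cleaned)
best_region = evaluate(averages)

# Deployment: present the result in a form stakeholders can act on.
print(f"Highest average order value: {best_region} ({averages[best_region]:.2f})")
```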
Graphical or pictorial representation of data can be used to help
- See trends and outliers
- Understand the key features of the data set
- Make data understandable even for non-experts in a specific subject area
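A plain-text bar chart is enough to show how a picture makes an outlier stand out. The monthly sales figures and the `bar_chart` helper are invented for illustration; in practice a charting or reporting tool would be used.

```python
# Hypothetical monthly sales figures; April is deliberately unusual.
monthly_sales = {"Jan": 12, "Feb": 14, "Mar": 13, "Apr": 41, "May": 15}


def bar_chart(data):
    """Return one text bar per label, with length proportional to the value."""
    lines = []
    for label, value in data.items():
        lines.append(f"{label} | {'#' * value} {value}")
    return "\n".join(lines)


chart = bar_chart(monthly_sales)
print(chart)
```

The April bar is visibly longer than the others, which is much easier to spot than the same figure in a column of numbers.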
Considerations when sharing data
- Is the data confidential, and how can it be kept secure
- Is the data complete, accurate and unbiased to allow a fair decision to be made
- How is the data most appropriately shared, electronically or in hard copy
- Who needs to see the data and are they aware of any data protection and retention policies
Supercomputers
- Powerful
- Expensive
- Fast to process
- Used for massive data manipulation
Mainframes
- Powerful
- Expensive
- Allows many concurrent users
- Operates at high speed
- Typical users are manufacturers, insurance companies and airlines
Servers
- Accommodates simultaneous multiple users
- Used for running networks and internet applications
- Large memory and storage capacities
- Fast and efficient for multiple users
- Susceptible to failure
Microcomputers
- Includes personal computers and workstations
- Common
- Can be understood and easily used by most people
- Can be networked together within an organisation
- Can often break
Portable Computers
- Allow “off-site” working
- Similar capabilities to a microcomputer
Handhelds
- Portable
- Supports basic functions but lacks the processing power of more complex machines
- User friendly
- Most people have access to a handheld device
- Often cannot perform difficult tasks