Flashcards: the following part is about data, I think
Big data
Data sets with volumes so huge that they are beyond the ability of typical DBMS (database management systems) to capture, store and analyze.
- Massive sets of unstructured or semi-structured data from web traffic, social media, sensors and so on.
- Can reveal more patterns and relationships, but requires new tools and technologies to manage and analyze it.
The four V's of data
- Volume: scale of data
- Variety: different forms of data
- Velocity: analysis of streaming data
- Veracity: uncertainty of data
Hadoop
Open-source software framework that runs on a collection of inexpensive computers. It breaks big data down, distributes it across thousands of inexpensive computers for processing, and then combines the results into smaller data sets that are easier to analyze.
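The split, distribute and combine idea can be sketched in plain Python as a MapReduce-style word count. This is not Hadoop's API: the function names are hypothetical, and threads stand in for the separate machines a real Hadoop cluster would use.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, n_chunks):
    """Break the data down into roughly equal chunks, one per (simulated) node."""
    return [records[i::n_chunks] for i in range(n_chunks)]


def map_count(chunk):
    """Map step: each node counts the words in its own chunk only."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partial_counts):
    """Reduce step: combine the partial results into one small summary."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


def word_count(records, n_nodes=4):
    chunks = split_into_chunks(records, n_nodes)
    # Threads only simulate parallel nodes; Hadoop runs them on separate machines.
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(map_count, chunks))
    return reduce_counts(partials)


if __name__ == "__main__":
    posts = ["big data is big", "data needs new tools", "big tools"]
    print(word_count(posts))
```

Each chunk is processed independently, which is what lets the work scale out over many cheap computers; only the small partial counts have to be combined at the end.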
Technology impact on business firms
- Every company can use internet technology, making it easy for rivals to compete and for new competitors to enter the market.
- Because information is available to everyone, the Internet raises the bargaining power of customers, who can quickly find the lowest-cost provider on the web.
- The Internet has nearly destroyed some industries and has threatened more.
- The Internet has also created new markets and provided new opportunities for building brands with very large and loyal customer bases.
Tools facilitating big data analysis
- Data warehouse
A database that stores current and historical data of potential interest to decision makers throughout the organisation.
Three types of information stores: historical company data, current company data and relevant external data.
Data marts
Subset of a data warehouse in which a summarized or highly focused portion of the organisation's data is placed in a separate database for a specific population of users.
- Hadoop (very large volumes of data)
Hadoop enables distributed parallel processing of big data across inexpensive computers.
- In-memory computing
Relies on the computer's main memory (RAM) rather than disk for data storage, which gives faster and more predictable processing times.
- Analytical platforms
Full-featured technology solution that joins different tools and analytical systems together. Designed for high-speed analysis of large data sets.
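The relation between a data warehouse and a data mart can be sketched with Python's built-in sqlite3 module. The table, columns and rows are hypothetical: the warehouse holds historical and current sales for the whole organisation, and the data mart is a summarized subset for one group of users, placed in a separately attached (in-memory) database.

```python
import sqlite3


def build_warehouse(conn):
    """Hypothetical warehouse table with historical and current sales data."""
    conn.execute(
        """CREATE TABLE sales (
               sale_date TEXT, region TEXT, product TEXT, amount REAL
           )"""
    )
    rows = [
        ("2022-03-01", "North", "Bike", 300.0),
        ("2023-06-15", "South", "Bike", 450.0),
        ("2024-01-10", "North", "Helmet", 40.0),
        ("2024-02-20", "North", "Bike", 500.0),
    ]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)


def build_north_mart(conn):
    """Data mart: a summarized, focused subset in a separate database for one user group."""
    conn.execute("ATTACH DATABASE ':memory:' AS mart")
    conn.execute(
        """CREATE TABLE mart.north_sales AS
           SELECT product, SUM(amount) AS total_amount
           FROM sales
           WHERE region = 'North'
           GROUP BY product"""
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_warehouse(conn)
    build_north_mart(conn)
    print(conn.execute("SELECT * FROM mart.north_sales ORDER BY product").fetchall())
    # [('Bike', 800.0), ('Helmet', 40.0)]
```

The North sales team only queries the small mart table, while the full warehouse stays available to decision makers throughout the organisation.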
OLAP (online analytical processing)
Tool that enables users to view the same data in different ways using multiple dimensions (e.g. product, pricing, cost, region or time period).
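The idea of viewing the same data along different dimensions can be sketched in plain Python. The sales facts and the `roll_up` helper are hypothetical; real OLAP tools do this over a multidimensional cube, but the principle is the same: one set of facts, summed over whichever dimensions the user picks.

```python
from collections import defaultdict

# Hypothetical sales facts: (product, region, quarter, units)
SALES = [
    ("Nuts", "East", "Q1", 50),
    ("Nuts", "West", "Q1", 60),
    ("Bolts", "East", "Q1", 20),
    ("Bolts", "East", "Q2", 30),
    ("Nuts", "West", "Q2", 40),
]
DIMENSIONS = {"product": 0, "region": 1, "quarter": 2}


def roll_up(facts, *dims):
    """Sum units over the chosen dimensions: one 'view' of the same data."""
    idx = [DIMENSIONS[d] for d in dims]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in idx)
        totals[key] += row[3]
    return dict(totals)


if __name__ == "__main__":
    print(roll_up(SALES, "product"))            # view by product
    print(roll_up(SALES, "region"))             # same data, view by region
    print(roll_up(SALES, "product", "region"))  # product x region view
```

The underlying facts never change; only the dimensions used to group them do.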
Data mining
Discovery-driven: finds hidden patterns and relationships, for example in customer buying behavior, and uses them to predict the future behavior of customers. Types of information obtainable from data mining include associations, sequences, classifications, clusters and forecasts.
- Associations
Occurrences linked to a single event
- Sequences
Events that are linked over time
- Classifications
Recognizes patterns that describe the group to which an item belongs by examining existing items that have been classified and by inferring a set of rules.
- Clustering
Works in a manner similar to classification, but is used when no groups have been defined yet.
- Forecasting
Uses predictions in a different way than the other types: it uses a series of existing values to forecast what other values will be.
- Text mining
Able to extract key elements from unstructured big data sets, discover patterns and trends, and summarize the information, turning unstructured text into structured data.
- Sentiment analysis
Mines text comments in e-mails and similar sources to detect favourable and unfavourable opinions.
- Web mining
Discovery and analysis of useful patterns and information from the web (unstructured data).
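Associations (occurrences linked to a single event) are often illustrated with market-basket analysis: which items end up in the same purchase. A minimal sketch, assuming hypothetical baskets and arbitrary thresholds, counts item pairs and keeps rules "X -> Y" whose support (share of baskets containing both) and confidence (share of X-baskets that also contain Y) are high enough. This is a simplified pair-only counter, not the algorithm of any particular data mining product.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets: each set is one purchase event
BASKETS = [
    {"chips", "cola", "salsa"},
    {"chips", "cola"},
    {"chips", "bread"},
    {"cola", "bread"},
    {"chips", "cola", "bread"},
]


def association_rules(baskets, min_support=0.4, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) for pair rules above both thresholds."""
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in b)
    pair_counts = Counter(
        pair for b in baskets for pair in combinations(sorted(b), 2)
    )
    rules = []
    for (a, b), together in pair_counts.items():
        support = together / n
        if support < min_support:
            continue  # pair is too rare to be interesting
        for lhs, rhs in ((a, b), (b, a)):
            confidence = together / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


if __name__ == "__main__":
    for lhs, rhs, support, confidence in association_rules(BASKETS):
        print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

With these baskets, "chips -> cola" comes out with support 0.60 and confidence 0.75, while "chips -> bread" is dropped because only half of the chips baskets also contain bread.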