Data Mining Flashcards
what is data mining?
the automatic analysis of large data sets held in a data warehouse. pattern recognition techniques are used to identify patterns and to predict trends. data is combined from multiple sources
what are the main features of data mining?
it involves analysing large data sets to identify patterns and predict future trends
what is big data?
it's a term associated with data sets that are so complex that traditional databases and other processing applications are unable to capture, manage and process them within an acceptable time frame
what are the big data challenges?
volume- amount of data to be processed
variety- the number of types of data to be analysed
velocity- the speed at which data must be processed
what does digital technology in data mining allow?
it allows us to collect data for further analysis using methods such as online forms, mobile phone data transmission, email data and stock market data
what are commonly used data sources?
social media
machine data- data generated by devices such as RFID chip readers and GPS receivers
transactional data- data generated by companies such as ebay, amazon
internal data sources
customer details, product details, sales data
external data sources
data collected from business partners, data suppliers, internet
what are the key requirements of big data storage?
it can handle very large amounts of data and can keep scaling to keep up with the growth of data sets
what is Network Attached Storage (NAS)?
this is file-level shared storage which can easily be scaled out to meet the increased capacity or computing requirements of big data analysis
what are some of the methods of processing big data?
cluster analysis- where groups of data records are identified
classification- where the data mining process is used to apply an appropriate known structure to new data.
anomaly detection- where unusual records are identified.
regression- where relationships between data variables are investigated to help understand how a change in an independent variable can impact upon a dependent variable
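
worked example (sketch only)
a minimal Python sketch of two of the methods above, anomaly detection and regression, using only the standard library. the function names (find_anomalies, fit_line), the z-score threshold of 2.0 and all the figures are made up for illustration; real data mining tools use more robust techniques on far larger data sets.

```python
from statistics import mean, pstdev


def find_anomalies(values, threshold=2.0):
    """Anomaly detection: return values whose z-score magnitude exceeds threshold.

    The z-score rule and the default threshold are illustrative assumptions.
    """
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, so nothing is unusual
    return [v for v in values if abs(v - mu) / sigma > threshold]


def fit_line(xs, ys):
    """Regression: least-squares slope and intercept for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical daily sales figures with one unusual record.
daily_sales = [100, 102, 98, 101, 99, 103, 250]
print(find_anomalies(daily_sales))

# Hypothetical independent variable (ad spend) and dependent variable (revenue).
ad_spend = [1, 2, 3, 4, 5]
revenue = [12, 14, 16, 18, 20]
slope, intercept = fit_line(ad_spend, revenue)
print(slope, intercept)
```

in the regression part, the slope estimates how much the dependent variable (revenue) changes for each unit change in the independent variable (ad spend).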
what can big data do for organisations?
help them gain insight into potential revenue increases and help them determine how to improve operations
what are the key objectives of using big data in the financial sector?
ensuring they are complying with regulations- using traditional data processing platforms to support this objective increases expense
improving risk analysis- can help identify fraudulent activity
how does retail use big data?
predicting trends and forecasting demand
price optimisation
identifying potential customers