U1T3.2 - Applications of DT Flashcards

Data Mining & Cloud Computing

1
Q

What is data mining?

A

Process of analysing large data sets (big data) with view to discovering patterns + trends that go beyond simple analysis. Combines AI, stats + database systems in analysis of groups of (un)structured data sets which are difficult to analyse using traditional methods. Extracts info from data set + transforms into appropriate format for use (summary of input data for analysis). Stops at process of pattern extraction.

2
Q

What is big data?

A

Data sets so complex that traditional databases + other processing applications can’t capture, curate, manage + process them in acceptable time frame.

3
Q

What does curate mean?

A

Process of organising data from range of data sources.

4
Q

What are the 3 big data challenges?

A

Volume (amount of data to be processed), variety (num of types of data to be analysed) + velocity (speed of data processing)

5
Q

How does DT allow us to collect data for analysis?

A

Online forms, mobile phone data transmissions, email data, stock market data, market research, PDAs, smartphones, tablets + netbooks etc.

6
Q

How can data sources be categorised and what are the differences?

A

Internal + external. Internal = customer details, product details, sales data. External = business partners, data suppliers, internet, govt + market research companies.

7
Q

What are the most commonly used data sources?

A

Social media, machine data (generated from devices like RFID chip readers, GPS results) + transactional data (data from companies like eBay, Amazon, Tesco)

8
Q

What are the key requirements of big data storage?

A

Handle large amounts of data + keep scaling up to handle growth of data sets. High speed input/output operations to support delivery of data analytics as they’re carried out. Big data practitioners run hyperscale computing environments.

9
Q

What are hyperscale computing environments?

A

Consist of many servers with DAS (Direct Attached Storage). Each unit has PCIe flash storage devices to support data storage + high speed access to data sets.

10
Q

What is DAS?

A

Direct Attached Storage.

11
Q

How can smaller organisations support the storage of big data?

A

Use of NAS devices. These scale outward, which can make them difficult to manage as they span out in hierarchical manner (many devices, many folders within folders).

12
Q

What is NAS?

A

Network Attached Storage. File access shared storage, easily scaled out to meet increased capacity/computing requirements for big data analysis.

13
Q

What are object-based storage systems?

A

Alt to NAS devices + their issues. Each stored file given unique identifier + indexed to support high speed access to particular data file/set.
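
The idea of a unique identifier plus an index can be sketched in a few lines of Python. This is only an illustrative toy, not how any real object storage product works: the `ObjectStore` class, its method names and the metadata fields (`kind`, `year`) are all hypothetical.

```python
import uuid


class ObjectStore:
    """Toy in-memory object store: flat namespace, no folders within folders."""

    def __init__(self):
        # Index mapping identifier -> (data, metadata)
        self._objects = {}

    def put(self, data: bytes, **metadata) -> str:
        """Store data and return the unique identifier assigned to it."""
        object_id = str(uuid.uuid4())
        self._objects[object_id] = (data, metadata)
        return object_id

    def get(self, object_id: str) -> bytes:
        """Fetch data directly by identifier (single lookup, no path to walk)."""
        return self._objects[object_id][0]

    def find(self, **criteria) -> list:
        """Return identifiers whose metadata matches every criterion."""
        return [
            oid for oid, (_, meta) in self._objects.items()
            if all(meta.get(k) == v for k, v in criteria.items())
        ]


store = ObjectStore()
sales_id = store.put(b"2023 sales records", kind="sales", year=2023)
store.put(b"2023 sensor log", kind="machine", year=2023)

print(store.get(sales_id))
print(store.find(kind="sales") == [sales_id])
```

Contrast with NAS: there is no folder hierarchy to navigate, only a flat set of identifiers.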

14
Q

What do big data processing techniques do?

A

Analyse data sets at terabyte/petabyte scale. Some methods include cluster analysis, classification, anomaly detection, association rule mining + sequential pattern mining, regression + summarisation.

15
Q

What is cluster analysis?

A

Groups (clusters) of similar data records identified, without groups being defined in advance.
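
One common way to do this is k-means clustering. The sketch below is a minimal one-dimensional version, assuming made-up customer spend figures and hand-picked starting centres. The function name `k_means_1d` is hypothetical, and real tools work on many variables at once.

```python
def k_means_1d(values, centres, iterations=10):
    """Assign each value to its nearest centre, then move each centre
    to the mean of its group. Repeat a fixed number of times."""
    centres = list(centres)
    clusters = [[] for _ in centres]
    for _ in range(iterations):
        clusters = [[] for _ in centres]
        for v in values:
            nearest = min(range(len(centres)), key=lambda i: abs(v - centres[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its old centre
        centres = [
            sum(c) / len(c) if c else centres[i]
            for i, c in enumerate(clusters)
        ]
    return centres, clusters


# Hypothetical weekly spend per customer
spend = [12, 15, 14, 90, 95, 88]
centres, clusters = k_means_1d(spend, centres=[0, 100])
print(clusters)  # [[12, 15, 14], [90, 95, 88]]
```

The two groups (low spenders, high spenders) were not labelled beforehand. They emerge from the data.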

16
Q

What is classification?

A

Data mining process used to apply known structure (existing categories) to new data, e.g. way email application classifies some emails as spam.
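
The spam example can be sketched as a very simple word-count classifier. This is an illustration only, assuming a tiny made-up training set. The functions `train` and `classify` are hypothetical names, and real spam filters use far more sophisticated models.

```python
from collections import Counter


def train(labelled_messages):
    """Count how often each word appears under each label."""
    counts = {}
    for text, label in labelled_messages:
        counts.setdefault(label, Counter()).update(text.lower().split())
    return counts


def classify(counts, text):
    """Give the new message the label whose training words it overlaps most."""
    words = text.lower().split()
    scores = {label: sum(c[w] for w in words) for label, c in counts.items()}
    return max(scores, key=scores.get)


# Made-up labelled examples ("ham" = not spam)
training = [
    ("win a free prize now", "spam"),
    ("free cash offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project report attached", "ham"),
]
model = train(training)

print(classify(model, "claim your free prize"))   # spam
print(classify(model, "monday project meeting"))  # ham
```

Key point: categories (spam/ham) already exist from past data, and new records are slotted into them.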

17
Q

What is anomaly detection?

A

Unusual records identified. Some anomalies merit investigation as points of interest to organisation or may be representative of errors.
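
A minimal sketch of one simple approach: flag values that lie more than a chosen number of standard deviations from the mean (a z-score test). The transaction amounts, the threshold of 2.0 and the name `find_anomalies` are assumptions for illustration.

```python
from statistics import mean, pstdev


def find_anomalies(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing unusual
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical card transactions in pounds
transactions = [50, 52, 49, 51, 50, 48, 500]
print(find_anomalies(transactions))  # [500]
```

The flagged 500 could be fraud (point of interest) or a typing error. The method only identifies it, investigation decides which.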

18
Q

What is association rule mining + sequential pattern mining?

A

Dependencies between data items identified, e.g. use of data sets by supermarket to determine which products are frequently bought together.
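
The supermarket example can be sketched by counting pairs of items across baskets. The code below covers association rules for pairs only (not sequential patterns), using made-up baskets. `pair_rules` is a hypothetical name. Support = share of baskets containing both items. Confidence = share of baskets with the first item that also contain the second.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.5):
    """Return (if_item, then_item, support, confidence) for frequent pairs."""
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in set(b))
    pair_counts = Counter(
        pair for b in baskets for pair in combinations(sorted(set(b)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support >= min_support:
            rules.append((a, b, support, count / item_counts[a]))
            rules.append((b, a, support, count / item_counts[b]))
    return rules


# Hypothetical shopping baskets
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "jam"],
    ["milk", "eggs"],
]
rules = pair_rules(baskets)
for rule in rules:
    print(rule)
```

Here every basket containing butter also contains bread (confidence 1.0), so the shop might place the two products together.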

19
Q

What is regression?

A

Relationships between data variables investigated to help see how change in independent variable impacts on dependent data variable.
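
A minimal sketch of simple linear regression (least squares line of best fit), assuming made-up figures where advertising spend is the independent variable and sales the dependent variable. `fit_line` is a hypothetical name.

```python
def fit_line(xs, ys):
    """Least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: ad spend vs sales (both in thousands of pounds)
ad_spend = [1, 2, 3, 4]
sales = [3, 5, 7, 9]
slope, intercept = fit_line(ad_spend, sales)
print(slope, intercept)  # 2.0 1.0
```

The slope says how much the dependent variable changes for each unit change in the independent variable (here, each extra 1 of ad spend goes with 2 more in sales).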

20
Q

What is summarisation?

A

Data summarised in visual format.
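
As a rough illustration, the sketch below turns a list of records into a text bar chart. The region names are made up and `bar_chart` is a hypothetical helper. In practice, charting or dashboard software would be used.

```python
from collections import Counter


def bar_chart(values):
    """Summarise a list of categories as one text bar per category."""
    counts = Counter(values)
    width = max(len(str(k)) for k in counts)
    lines = [
        f"{str(k).ljust(width)} | {'#' * counts[k]} {counts[k]}"
        for k in sorted(counts)
    ]
    return "\n".join(lines)


# Hypothetical sales records, one region per sale
regions = ["North", "South", "North", "East", "North", "South"]
print(bar_chart(regions))
```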

21
Q

What are some of the key objectives of collecting and using big data by the financial services sector?

A

Ensure they comply with regulations (using fuzzy matching to check customer names + aliases against customer blacklist, lower cost), improve risk analysis (algorithms run on transaction data to identify fraudulent activity/perform risk analysis, support trading decisions), understand customer behaviour/transaction patterns + improve services (identify what leads to dissatisfaction)
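
Fuzzy matching means accepting near matches rather than exact ones. A minimal sketch using Python's standard `difflib` module is shown below. The names, the 0.85 threshold and the function `check_against_blacklist` are assumptions for illustration, not a description of any bank's actual system.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity between two names, from 0.0 (nothing alike) to 1.0 (identical)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def check_against_blacklist(name, blacklist, threshold=0.85):
    """Return blacklist entries that closely resemble the customer name."""
    return [entry for entry in blacklist if similarity(name, entry) >= threshold]


# Hypothetical blacklist and customer names
blacklist = ["John Smith", "Maria Garcia"]
print(check_against_blacklist("Jon Smith", blacklist))    # ['John Smith']
print(check_against_blacklist("Alice Brown", blacklist))  # []
```

An exact comparison would miss "Jon Smith". The fuzzy check catches misspellings + aliases that differ by a few characters.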

22
Q

How does the health sector use big data?

A

Predict epidemics, cure disease, improve life quality + avoid preventable deaths. Smartphones measure steps, diet + sleep patterns which in future could be shared with GP to help diagnosis. Supports clinical trials by selecting best subjects. Phone location data can track population movement + help predict spread of diseases such as Ebola virus.

23
Q

How does the retail sector use big data?

A

Predict trends + forecast demand, price optimisation (spending habits + demand) + identify potential customers (data collected through transactional records + loyalty programs allows demand to be forecast on basis of geographical areas)

24
Q

What is cloud computing?

A

Use of internet by large computing companies to provide services normally provided by LAN. Use server farms to host services they provide for other organisations who can access these services from any computer w/ internet connection. Users don’t know where data stored. Virtual servers form foundation of cloud servers. Capitalises on principle of virtual clusters.

25
Q

What are server farms?

A

Central computer centre consisting of large num of linked file servers. Each location could have many servers, comp storage devices + other components used to support services provided by Cloud Service Provider.

26
Q

What is virtualisation?

A

Allows virtual servers to run on physical server platform. Separates physical infrastructures to create dedicated resources. Possible to run multiple OSs + applications on same server at same time by creating virtual servers. Manipulates hardware used to provide cloud computing as service to client users. Creates virtual version of physical device/resource which users can use as if it were real single resource.

27
Q

What is a cloud instance?

A

Location of physical memory on cloud server which has been allocated to particular client. Acts as virtual server for client + has own allocation of processing power, storage + other components. Each server has multiple clients. Can be used by client for processing of cloud based task/application. End user doesn’t need to consider how many servers/resources applied to application. Location is immaterial + dynamic nature means resources reassigned as needed without downtime. If end user wants access to app/service, create cloud instance for time they use app.

28
Q

What does each server in cloud computing having multiple clients do?

A

Possible to allocate additional capacity to clients when usage spikes + resource demands increase. Each of elements in instance is dynamic + can be changed

29
Q

What is a hosted solution?

A

Like cloud instance but hardware + software made available to client is reserved for servicing of their needs + no one else's. They pay for all resources whether used or not. Where usage exceeds capacity, additional investment in resources by client is made.

30
Q

What are virtual clusters?

A

Formed when virtual machine, established to meet demands of cloud instance, configures available resources on network to meet client demands.

31
Q

How can users access cloud computing?

A

Secure client login to access services. Data transfer between cloud service providers + clients is via encrypted connection.

32
Q

What are some of the services available to clients with cloud computing?

A

Data storage, email, virtualised software, backup + remotely hosted applications.

33
Q

How does cloud computing provide data storage? What is cloud storage?

A

Saving data on off-site storage system provided by third party. Saved to remote database + connected to via internet.

34
Q

How does cloud computing provide email?

A

Web based services like Yahoo + Outlook allow users to access email from cloud using any browser/hardware platform with internet connection. Emails sent + received via client's account are stored on service provider's server rather than on user's own comp.

35
Q

How does cloud computing provide virtualised software?

A

Many applications available via web. MS Office moved partly to web. Users can access hosted applications without considering storage space taken up on own personal devices. Software accessible without installing + updates automatic. Users can share + access data files from anywhere they have an internet connection.

36
Q

How does cloud computing provide backup?

A

Allows clients to store data on internet using storage service provider instead of locally on hard drive. Backups scheduled automatically. To access backup, must use service provider's specific application or web based interface provided.

37
Q

How does cloud computing provide remotely hosted applications?

A

Clients can access business applications from anywhere with internet + comp. Relevant software + data stored on remote server. Supports multiple concurrent access to apps + data by staff.

38
Q

What are the benefits of cloud computing?

A

Providers can react more responsively to user requirement changes, collaboration from diff locations, updates readily available to all, increased security + reliability as only maintain 1 system, all users access 1 system so no cross-compatibility issues, reduced costs as don't need to buy hardware + software for meeting business needs, increased flexibility in work arrangements so changes in demand for computing resources can be provided for at a cost + increased reliability as data backup, recovery + business continuity easier + less expensive.

39
Q

What are the drawbacks of cloud computing?

A

Trust someone else to look after data (updates, backup, restoration, security), cyber attacks problematic, insider threats where employees with access to cloud data go undetected + could destroy entire cloud environment hosting data, users concerned over loss of data confidentiality as data can be accessed by govt + legal liability issues as organisation becomes liable after data breach (GDPR).

40
Q

What is server virtualisation?

A

Partitioning of physical server into smaller virtual servers.

41
Q

What is storage virtualisation?

A

Amalgamation of multiple network storage devices into single storage unit.

42
Q

What is operating system virtualisation?

A

Multiple operating systems used on single server.

43
Q

What is network virtualisation?

A

All servers + services in network treated as single pool of resources which can be accessed without regard for physical components.

44
Q

What is application virtualisation?

A

Each application adapts a set of configurations on demand.

45
Q

What are the 4 V’s of big data?

A

Volume (scale of data), velocity (analysis of streaming data), variety (diff forms of data) + veracity (uncertainty of data)