Paper 2 Unit 6 Flashcards

1
Q

What is data?

A

Data refers to raw facts, observations or measurements that have little meaning on their own

2
Q

What is information?

A

Processed and organised data that has meaning and context. It is derived from data through interpretation, analysis and contextualisation.

3
Q

What is knowledge?

A

Knowledge goes beyond information and represents the understanding, insights and expertise gained from information and experience.

4
Q

What is human readable data?

A

Unstructured data like a block of text that can only be interpreted by humans

5
Q

What is machine readable data?

A

Structured data like a set of instructions that can be processed by computer programs

6
Q

What is big data?

A

Large, complex, and layered groups of data that can be analysed to spot patterns and trends

7
Q

Define a data type.

A

The way data is stored (string, integer, float etc)
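
A minimal Python sketch of how the same value behaves differently depending on its data type. The sample values are made up for illustration.

```python
# The "same" value stored as four different data types.
values = ["42", 42, 42.0, True]
type_names = [type(value).__name__ for value in values]

# The data type decides what an operation means:
text_result = "42" + "1"   # strings are joined together
number_result = 42 + 1     # integers are added
```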

8
Q

What is data wrangling?

A

The process of transforming a raw data form into a desired format suitable for purpose

9
Q

Name the stages of data wrangling

A

Discover, structure, clean, enrich, validate, share

10
Q

Why do organisations need data?

A

To analyse market trends and identify patterns, analyse system performance, monitor users, target marketing, inform decision making, and assess threats and opportunities

11
Q

How is data generated?

A

Human input, AI, sensors, Internet of Things, Transactional data

12
Q

Name different data formats

A

ASCII, CSV, fixed width text file, XML, JSON

13
Q

What are the benefits and drawbacks of ASCII?

A

Benefits: Standard format supported by all computer systems, covers the standard English alphabet
Drawbacks: Limited number of characters, replaced by Unicode which contains other alphabets and symbols so can be more widely used
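
A minimal Python sketch of the character limit, using the built-in `str.encode` method. The sample strings are made up for illustration.

```python
# ASCII defines 128 characters, so plain English text encodes without trouble.
plain = "Hello"
ascii_bytes = plain.encode("ascii")  # one byte per character

# Characters outside that set cannot be stored as ASCII.
accented = "café"
try:
    accented.encode("ascii")
    fits_in_ascii = True
except UnicodeEncodeError:
    fits_in_ascii = False

# A Unicode encoding such as UTF-8 can store the same text.
utf8_bytes = accented.encode("utf-8")
```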

14
Q

What are the benefits and drawbacks of CSV?

A

Benefits: Common format understood by most applications
Drawbacks: The format is delimited, but delimiters other than the comma can be used (tab is common, making TSV widely used), so the delimiter must be known before the file can be read correctly
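
A short sketch of the delimiter issue, using Python's standard `csv` module. The sample records and the `read_rows` helper are hypothetical.

```python
import csv
import io

# Hypothetical sample data: the same record, comma-delimited and tab-delimited.
comma_text = "name,city\nAda,London\n"
tab_text = "name\tcity\nAda\tLondon\n"

def read_rows(text, delimiter):
    """Parse delimited text into a list of dictionaries keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))

csv_rows = read_rows(comma_text, ",")
tsv_rows = read_rows(tab_text, "\t")

# Reading the tab-delimited text with the wrong delimiter does not split the fields.
wrong_rows = read_rows(tab_text, ",")
wrong_keys = list(wrong_rows[0].keys())
```

With the correct delimiter both files give the same records; with the wrong one each line is treated as a single field.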

15
Q

What are the benefits and drawbacks of Fixed Width data formats?

A

Benefits: For very large data files it’s easy to calculate the location of data to retrieve it since the length is fixed
Drawbacks: Fixed sizes for fields, padding character and alignment need to be known before data can be retrieved accurately, needs to be carefully planned before setting up and saving data
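
A minimal Python sketch of why a fixed length makes data easy to locate. The field layout, padding character and helper names are assumptions made for illustration.

```python
# Assumed layout: name = 10 characters, age = 3, city = 8,
# left-aligned and padded with spaces, so every record is 21 characters long.
FIELDS = [("name", 0, 10), ("age", 10, 13), ("city", 13, 21)]
RECORD_LENGTH = 21

def make_record(name, age, city):
    """Pad each field to its fixed width and join the fields into one record."""
    return name.ljust(10) + str(age).ljust(3) + city.ljust(8)

data = make_record("Ada", 36, "London") + make_record("Grace", 45, "New York")

def read_record(data, index):
    """Jump straight to a record: its position is calculated, not searched for."""
    start = index * RECORD_LENGTH
    record = data[start:start + RECORD_LENGTH]
    return {name: record[begin:end].strip() for name, begin, end in FIELDS}
```

Reading the record correctly depends on knowing the widths and the padding character in advance, which is the drawback noted above.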

16
Q

What are the benefits and drawbacks of XML?

A

Benefits: Platform independent so can be used on any system, supports Unicode so can cope with data in many alphabets and symbols, can be displayed in a GUI using HTML
Drawbacks: Requires a series of complex tags to store the data making files large
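
A small sketch of reading tagged data with Python's standard `xml.etree.ElementTree` module. The document and its tag names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document; every item of data is wrapped in an opening and closing tag.
xml_text = (
    "<customers>"
    "<customer id='1'><name>Zoë</name><city>Köln</city></customer>"
    "</customers>"
)

root = ET.fromstring(xml_text)
customer = root.find("customer")
name = customer.find("name").text   # text inside the <name> tag
customer_id = customer.get("id")    # attribute on the <customer> tag

# Rough measure of the tag overhead: characters of actual data versus the whole file.
data_length = len(name) + len(customer.find("city").text) + len(customer_id)
```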

17
Q

What are the benefits and drawbacks of JSON?

A

Benefits: Compact format based on JavaScript that can be used on most systems, good and reliable for website-to-website data transfer, many programming languages support JSON
Drawbacks: No error handling for JSON calls, security issues with data transfers if it is being hosted on a vulnerable website so it is open to hacking
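
A minimal sketch using Python's standard `json` module. The record and the `parse_or_none` helper are hypothetical; the helper shows one way the receiving code might handle malformed JSON itself.

```python
import json

# Hypothetical record converted to JSON text and back again.
record = {"name": "Ada", "orders": [101, 102], "active": True}
text = json.dumps(record)
restored = json.loads(text)

def parse_or_none(raw):
    """Return the parsed data, or None if the text is not valid JSON."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return None
```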

18
Q

What are the differences between file-based directory structure and dictionary-based data structures?

A

File-based: defines the structure and types of data stored, each application defines and manages own data, data should be consistently structured to maintain access to different processes, different file formats can make data incompatible, data can be duplicated, analysing data can become complex since formats need to be rationalised.

Dictionary-based: typically hierarchical, easier to locate data, each value is accessed through a set of keys, keys are ordered making the structure more logical and searching more efficient.
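
A short Python sketch of a hierarchical, key-based structure. The `store` contents and the `lookup` helper are made up for illustration.

```python
# Hypothetical hierarchical store: each level is reached through a key.
store = {
    "customers": {
        "C001": {"name": "Ada", "city": "London"},
        "C002": {"name": "Grace", "city": "New York"},
    }
}

def lookup(data, *keys):
    """Follow a path of keys down the hierarchy; return None if any key is missing."""
    current = data
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current

city = lookup(store, "customers", "C002", "city")
```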

19
Q

What are the stages of data wrangling?

A

Discovery- becoming familiar with the data and understanding it so patterns can be identified
Structure- data restructured into single format after coming from different sources with additional items of no value being removed
Clean- errors identified and fixed, outliers, null values and duplicates removed, format standardised, typos fixed, measurements standardised and data validated
Enrich- existing data from internal or third-party sources added to fill gaps and enhance the set
Validate- data checked for quality, consistency, accuracy, security and authenticity
Output- data ready to be published and used
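
The middle stages above can be sketched in Python as a small pipeline. The raw records, the `regions` lookup and the 0 to 120 age range are all assumptions made for illustration.

```python
# Hypothetical raw records from two sources with inconsistent formats.
raw = [
    {"Name": " ada ", "Age": "36", "notes": "n/a"},
    {"name": "Grace", "age": 45},
    {"name": "Grace", "age": 45},    # duplicate
    {"name": "Alan", "age": None},   # null value
]
regions = {"Ada": "UK", "Grace": "US"}  # assumed third-party lookup table

def structure(rows):
    """Single format: lower-case keys, items of no value (notes) removed."""
    wanted = ("name", "age")
    return [{k.lower(): v for k, v in row.items() if k.lower() in wanted} for row in rows]

def clean(rows):
    """Remove nulls and duplicates, standardise name format and age type."""
    seen, result = set(), []
    for row in rows:
        if row.get("age") is None:
            continue
        item = (str(row["name"]).strip().title(), int(row["age"]))
        if item not in seen:
            seen.add(item)
            result.append({"name": item[0], "age": item[1]})
    return result

def enrich(rows):
    """Fill gaps using data from another source."""
    return [{**row, "region": regions.get(row["name"], "unknown")} for row in rows]

def validate(rows):
    """Check every record has a name and a realistic integer age."""
    return all(bool(r["name"]) and isinstance(r["age"], int) and 0 <= r["age"] <= 120 for r in rows)

wrangled = enrich(clean(structure(raw)))
is_valid = validate(wrangled)
```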

20
Q

What are the core functions of a data system?

A

Input, Search, Save, Integrate, Organise, Output, Feedback loop

21
Q

Why is maintaining data important?

A

Making sure data entry is accurate is not enough; regular checks need to be carried out to keep important data up to date, e.g. by contacting customers regularly to check their data is correct

22
Q

How can data be visualised?

A

Graphs and charts- help clarify data, but the scale and choice of graph can be confusing or misleading
Data tables- provide rapid and easy access so stakeholders can compare information, but they need to be labelled accurately
Reports- show data in an accessible format to assess the performance of a business, but a consistent format is needed so comparisons can be made
Infographics- represent data graphically with minimal text for an easy-to-see overview, but they can sometimes lack detail

23
Q

What is the purpose of business intelligence software?

A

To retrieve and analyse data to inform decision making. The applications can provide information that a business can use to make long-term strategic decisions

24
Q

What is the purpose of financial planning and analysis?

A

To support the financial aspects of the business including financial planning, setting budgets and forecasting future performance including profits

25
Q

What is the purpose of customer relationship management?

A

To manage relationships with existing and potential customers and to make it easier for customer-facing employees to build relationships, streamline processes, improve customer service and increase profits. Software can gather data from various sources, track customer interactions, identify any trends and use the information to inform decisions

26
Q

What are the aspects of data models?

A

Conceptual (high-level needs, main concepts and relationships)
Logical (description of model in terms of structures to define implementation)
Physical (how the model should be implemented using a specific database management system)
Hierarchical (relationships between tables, records and fields in a tree-like format)
Relational (relationships between tables, records and fields in terms of primary and foreign keys)
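
A minimal sketch of the relational model using Python's standard `sqlite3` module. The `customer` and `orders` tables and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked

# Each table has a primary key; orders.customer_id is a foreign key
# that links every order back to one customer (a one-to-many relationship).
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        item TEXT NOT NULL)"""
)
conn.execute("INSERT INTO customer VALUES (1, 'Ada')")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(10, 1, "Laptop"), (11, 1, "Mouse")])

# The keys are used to join the two tables.
rows = conn.execute(
    "SELECT c.name, o.item FROM customer c "
    "JOIN orders o ON o.customer_id = c.customer_id ORDER BY o.order_id"
).fetchall()

# An order for a customer that does not exist breaks the relationship and is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (12, 99, 'Keyboard')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```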

27
Q

What are entity relationship diagrams (ERDs)?

A

Formal diagrams that represent the relationships between entities in a database to convey the design objectives

28
Q

What relationships can be represented in ERDs?

A

One-to-one
One-to-many
Many-to-many

29
Q

What diagrams does UML use to explain interactions and links?

A

Class- facts about an entity and the ways to access the facts
Use case- describe what is going on in the system
Communication- show how objects combine and carry out a task
Activity- show various processes in a system
Sequence- how objects interact with each other

30
Q

What are the 6 V’s of big data?

A

Volume- the size of the data being analysed; more data generally means more accurate results
Variety- a wide variety of data can result in unexpected results and be used to create different uses for the data
Variability- the measure of how quickly and by how much the data are changing
Velocity- a measure of how time-limited the value of the data is and how quickly the data must be processed to have any value
Veracity- the quality of the data, the origin and any conflicting information to determine if the data is authentic
Value- how effective and useful the data is

31
Q

What is validation?

A

Checking that the data is reasonable within a realistic range and complete

32
Q

What validation techniques can be used?

A

Presence check, length check, type check, format check, range check, combination check
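
Each check can be sketched as a small Python function. The function names, the example pattern and the rule used for the combination check (a delivery date needs a delivery address) are assumptions made for illustration.

```python
import re

def presence_check(value):
    """The field has been filled in."""
    return value is not None and str(value).strip() != ""

def length_check(value, minimum, maximum):
    """The number of characters is within the allowed limits."""
    return minimum <= len(value) <= maximum

def type_check(value, expected_type):
    """The value is of the expected data type."""
    return isinstance(value, expected_type)

def format_check(value, pattern):
    """The whole value matches a set pattern."""
    return re.fullmatch(pattern, value) is not None

def range_check(value, low, high):
    """The value lies between a lower and an upper limit."""
    return low <= value <= high

def combination_check(record):
    """Assumed rule: a record with a delivery date must also have a delivery address."""
    return not record.get("delivery_date") or presence_check(record.get("delivery_address"))
```

For example, `range_check(150, 0, 120)` fails for an age field, even though 150 passes a type check for an integer.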

33
Q

What is verification?

A

Checking that the data has been copied correctly. It does not check that the data itself is correct, since the source could be wrong

34
Q

How can data be verified?

A

Enter the data twice so the two copies can be compared to ensure they match. Data redundancy can also be used: the data is stored multiple times and the redundant copies are analysed to ensure they are consistent and that none has been changed or recorded in a different format
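
A minimal Python sketch of both approaches. The function names and sample records are hypothetical.

```python
def double_entry_matches(first_entry, second_entry):
    """Double entry: the value is typed twice and the two copies must be identical."""
    return first_entry == second_entry

def mismatched_fields(copy_a, copy_b):
    """Redundancy: compare two stored copies and list the fields that differ."""
    all_keys = copy_a.keys() | copy_b.keys()
    return sorted(key for key in all_keys if copy_a.get(key) != copy_b.get(key))

# Hypothetical redundant copies of one record; the date was recorded in a different format.
primary = {"name": "Ada", "joined": "2024-01-05"}
backup = {"name": "Ada", "joined": "05/01/2024"}
differences = mismatched_fields(primary, backup)
```

Note that a match only shows the copies agree, not that the original value was right.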

35
Q

How can data be collected by an organisation and what are the costs associated with them?

A

As a by-product of everyday activities
From external sources
As a result of market research

Costs:
Entering data is time consuming and could involve hiring extra staff
3rd party data may be less reliable and may need to be cleansed
Phrasing questions to solicit a response in market research could give unreliable results
Data may be passed between departments in the organisation, which could lead to it being too old to be useful or being changed in some way

36
Q

What is the difference between data lakes and data warehouses?

A

Data lakes are vast stores of raw data that are not in any specific format or for a defined purpose whereas data warehouses store structured data that have been processed for a specific purpose

37
Q

How can raw data be processed?

A

Data mining- analysing raw data to find patterns and trends which can be used to inform decisions, identify risks, cut costs, increase income and improve customer relations
Data reporting- providing the organisation with an overview of the current situation based on an analysis of past data and the current situation to identify future actions but the facts are presented without any context

38
Q

What are the different types of metadata?

A

Descriptive- enables identification and selection of a resource
Administrative- enables the management of a resource including any restrictions
Structural- used when processing data and includes any relationships or structural features

39
Q

How can user access entitlements impact the organisation and stakeholders?

A

Authorised users are given privileges that limit what they can access and edit in line with their role. The organisation will have a set of rules about how data can be accessed and used, based upon the access rights given

40
Q

What is the role of an Application Programming Interface (API)?

A

It is a way for multiple applications to communicate with each other as it enables data to be transmitted from one app to another. The user sends a request for data and the server responds, data is exchanged in an agreed format. A web API processes data between a web server and browser

41
Q

Why is API authentication and monitoring crucial to the security of links between different software?

A

Authentication tokens are checked to make sure the user is who they claim to be and has access rights to the API. API keys and codes are sent with each request to authenticate the application and the user. They are also used to track interactions and how the interface is used, which helps to prevent abuse or malicious use of the API
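
A simplified Python sketch of checking an API key and logging each request. The key store, the application name, the endpoint and the status codes returned are assumptions for illustration, not a real API.

```python
import hmac

# Hypothetical key store: API key -> application it was issued to.
API_KEYS = {"key-abc123": "stock-app"}
usage_log = []  # monitoring: one entry per request

def authenticate(request_key):
    """Return the application name for a valid key, or None if the key is unknown."""
    for known_key, app in API_KEYS.items():
        # compare_digest compares in constant time to avoid leaking key contents
        if hmac.compare_digest(request_key, known_key):
            return app
    return None

def handle_request(request_key, endpoint):
    """Authenticate the caller, record the interaction, then respond."""
    app = authenticate(request_key)
    usage_log.append((app or "unknown", endpoint, app is not None))
    if app is None:
        return 401, "invalid API key"
    return 200, f"data for {endpoint}"

ok_response = handle_request("key-abc123", "/stock")
bad_response = handle_request("wrong-key", "/stock")
```

The log records failed attempts as well as successful ones, so repeated failures from one source can be spotted.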

42
Q

What is the difference between data at rest, data in motion and data in use?

A

Data at rest is data that is stored and not being moved
Data in motion is data that is moving from one location to another, either through the internet into storage or internally from one device to another. Data is most vulnerable to security breaches when it is in motion
Data in use is data that is currently being processed or used