All Flashcards

1
Q

Define Statistics

A

The art, language and science of data.

2
Q

What is synonymous with Domain Knowledge?

A

Business/context understanding.

3
Q

Define Data

A

The raw, unorganised facts used in analysis.

4
Q

Define Information

A

Data which has been processed to make it useful.

5
Q

Define Knowledge

A

Understanding of the information.

6
Q

List three common data formats

A

CSV
XML
RTF

7
Q

Define Open Data

A

Data that anyone may freely use, reuse and redistribute, typically with no copyright or referencing requirement. Comparable to open-source software like R.

8
Q

Define Public Data

A

Data within the public domain. Free to use, but still has ownership and restrictions.

9
Q

Define Proprietary Data

A

Opposite of public data. Private IP of a company.

10
Q

Define Operational Data

A

Used in the day-to-day activities of a business, e.g. customer records.

11
Q

Define Administrative Data

A

Data used to make informed decisions, often the subject of analysis.

12
Q

Define Structured and Unstructured Data

A

Structured data has a well-defined model. It’s easy to tabularise.

Unstructured data has no defined model.

13
Q

Types of Quantitative Data

A

Discrete data are numeric variables which can only take specific, countable values.

Continuous data can take any value within an interval.

14
Q

Types of Qualitative Data

A

Nominal is label data with no order.

Ordinal is label data which can be ordered.

Binomial is a binary data label, e.g. TRUE/FALSE.

15
Q

What are the stages of the Data Lifecycle?

A
Created
Initial storage
Archived 
Obsolete
Deleted
16
Q

How do Databases and Structured Data relate?

A

A database is a repository of structured data.

17
Q

What is a Relational Database?

A

A large grouping of schemas, tables, queries, reports, views and other elements.

18
Q

Explain Tables in the relational model

A

In the relational model, every relation must have a header (columns) and body (rows).

19
Q

Define Keys

A

Designated columns within a table with which the data can be ordered and linked.

20
Q

What are some examples of Semi-Structured data?

A

XML and CSV are technically semi-structured, as some processing is required to get them into table form.
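
The processing step mentioned above can be sketched with Python's standard library. This is a minimal illustration, not a general converter: the XML snippet, its element names and the helper names `xml_to_rows` and `csv_to_rows` are all made up for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical XML export; the element and attribute names are invented.
XML_TEXT = """
<customers>
  <customer id="1"><name>Ada</name><city>Leeds</city></customer>
  <customer id="2"><name>Bob</name></customer>
</customers>
"""


def xml_to_rows(xml_text):
    """Flatten each <customer> element into a dict with a fixed set of columns."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for customer in root.findall("customer"):
        rows.append({
            "id": int(customer.get("id")),
            "name": customer.findtext("name"),
            # Semi-structured data may omit fields; findtext then returns None.
            "city": customer.findtext("city"),
        })
    return rows


def csv_to_rows(csv_text):
    """Parse CSV text into dicts keyed by the header row (values stay strings)."""
    return list(csv.DictReader(io.StringIO(csv_text)))


rows = xml_to_rows(XML_TEXT)
csv_rows = csv_to_rows("id,name\n1,Ada\n2,Bob\n")
```

Note that the second customer has no city, so the table form needs an explicit empty value for it.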

21
Q

Define Big Data

A

Sets of data which are beyond the capabilities of traditional data processing software. They must be analysed computationally.

22
Q

What are the four Vs of Big Data?

A

Volume
Variety
Velocity
Veracity

23
Q

What are Requirements?

A

The constraints placed on an analysis project, usually determining the data to analyse. Aims to establish the purpose of the project.

24
Q

What is Explicit Knowledge?

A

Knowledge that can easily and swiftly be articulated to other people and is usually stored somewhere.

25
Q

What is Tacit Knowledge?

A

Knowledge that cannot be readily articulated to other people, may be assumed and may not be stored.

26
Q

What is Elicitation?

A

A proactive activity, where the analyst initiates conversations with stakeholders to gain an understanding of the problem.

27
Q

What are some techniques of Requirement Elicitation?

A

Interviewing
Observing
Recounting
Apprenticing

28
Q

What is Recounting?

A

The method of having multiple stakeholders articulate their requirements. Aims to identify misunderstandings, assumptions and reach consensus.

29
Q

What is the difference between Requirements Elicitation and Gathering?

A

Requirements gathering is a reactive activity - data exists and must be collected and analysed.

Elicitation is a proactive activity. The analyst initiates conversations with stakeholders to gain an understanding of their problem.

30
Q

What are some Elicitation challenges?

A

Problems of scope - customers give ill-defined or unnecessary requirements.

Problems of volatility - requirements change over time.

Problems of understanding - customers are unsure of what is needed and of the capabilities of their computing environment.

31
Q

What are some Elicitation solutions?

A
Visualisation
Consistent language
Guidelines
Consistent use of templates
Documenting dependencies
32
Q

What are the Elicitation guidelines?

A

Assess business + technical feasibility.

Identify requirement specifiers and their bias.

Define technical environment.

Identify domain constraints.

Select 1+ Elicitation techniques.

Encourage participation from many stakeholders.

Identify ambiguous requirements for prototyping.

Use usage scenarios to help customers better identify their key requirements.

33
Q

What is the difference between Validation and Verification?

A

Validation judges the accuracy of something, eg 50% of company records are compliant.

Verification is concerned with meeting standards in absolute terms, eg the company records are not compliant.

34
Q

Define the types of Data Models

A

Conceptual - high-level mappings of database elements and the relationships between them. Identifies info to collect, attributes and class relationships.

Logical - converts business requirements into a model. Revolves around customer need, rather than technical needs. eg a flow diagram.

Physical - a full server model diagram, showing the detail of the database. Shows constraints, eg keys and check constraints.

35
Q

Define Check Constraints

A

Check whether an attribute meets a certain requirement.
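
As a rough illustration, a check constraint can be declared in SQLite through Python's built-in `sqlite3` module. The `employee` table, the age rule and the `try_insert` helper are assumptions made up for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " id INTEGER PRIMARY KEY,"
    " age INTEGER CHECK (age >= 16)"  # the check constraint on the attribute
    ")"
)


def try_insert(age):
    """Return True if the row is accepted, False if the check constraint rejects it."""
    try:
        conn.execute("INSERT INTO employee (age) VALUES (?)", (age,))
        return True
    except sqlite3.IntegrityError:
        return False


adult_accepted = try_insert(30)   # satisfies age >= 16
child_accepted = try_insert(12)   # violates the constraint
row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```

The database itself refuses the second row, so the rule holds no matter which program writes to the table.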

36
Q

Define Quality

A

The standard of something when compared to other things of a similar kind.

For data, quality doesn’t need to be perfect - just high enough for the specific analysis.

37
Q

What are the 8 principles of the Data Protection Act?

A

Used fairly and lawfully.
Used for limited, specifically stated purposes.
Used in a way that’s adequate, relevant and not excessive.
Accurate.
Kept for no longer than absolutely necessary.
Handled according to people’s data protection rights.
Kept safe and secure.
Not transferred outside the EEA without adequate protection.

38
Q

Under the Data Protection Act, for what do stronger legal protections exist?

A

Race, ethnic background, political opinions, religious beliefs, trade union membership, genetics, biometrics, health, sexual orientation.

39
Q

What are the 8 rights under GDPR?

A
Right to be informed.
Right of access.
Right of rectification.
Right of erasure.
Right to restrict processing.
Right to data portability.
Right to object.
Rights in relation to automated decision making and profiling.
40
Q

Which acronym gives the fundamentals of Data Security?

A

CIA

Confidentiality
Integrity
Availability

41
Q

What are the reasons for Dirty Data?

A
Data is missing.
Data is incorrect.
Incorrectly formatted.
Entered into wrong fields.
Stale (out of date).
Missing links, eg relationships.
Duplicated.
42
Q

What are the sources of Data Error?

A

Errors arise when data falls short on any of these quality dimensions:
Completeness - the data captures the entire problem.
Uniqueness - there are no duplicates.
Timeliness - the data is available when expected and needed.
Accuracy - the data reflects reality.
Consistency - the same data object always yields the same data.
Conformity - the data follows the required format.

43
Q

How can Data Error be avoided?

A

Process - Put greater controls around data creation.
Entry - use independent checking or drop-down lists to ensure correct data entry.
Identification - searching for errors in data.
Validate - automatically or manually check accuracy in data.

44
Q

What are the steps in the Data Analysis Process?

A
Problem hypothesis
Identify what to measure
Collect data
Cleanse data
Model data
Visualise data
Analyse data
Interpret results
Document/communicate results
45
Q

Define a Hypothesis

A

A possible explanation for something, which serves as a starting point for further investigation.

46
Q

What’s the difference between H0 and H1?

A

The null hypothesis (H0) is the default assumption that nothing has changed.
The alternative hypothesis (H1) is the prediction you make; it is accepted if H0 is rejected.

47
Q

Define Data Accessibility

A

Data in a format that is easy to handle/manage. Similar to data quality.

48
Q

Define Data Extraction

A

Adding further structure to data. Yields usable data from unstructured data.

49
Q

What are the types of Data Cleansing?

A

Filtering - data is included based on a Boolean condition.
Interpolation - using other data points to fill in the gaps.
Masking - hides certain data from view by unauthorised people, but still allows analysis to occur.
Blending - Combining data from different sources into a single dataset. May be warehoused.
Transformation - changing data from one format/structure to another.
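
Three of the techniques above (filtering, interpolation and masking) can be sketched in plain Python. The function names and the choice of linear interpolation are assumptions for illustration; real projects would usually lean on a data library.

```python
def filter_rows(rows, predicate):
    """Filtering: keep only rows for which the Boolean condition holds."""
    return [row for row in rows if predicate(row)]


def interpolate(values):
    """Interpolation: fill interior None gaps linearly from the nearest known points."""
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled  # leading or trailing gaps are left as None


def mask(value, visible=4):
    """Masking: hide all but the last `visible` characters of a string."""
    hidden = max(len(value) - visible, 0)
    return "*" * hidden + value[hidden:]


adults = filter_rows([{"age": 10}, {"age": 20}], lambda row: row["age"] >= 18)
series = interpolate([1.0, None, None, 4.0])
card = mask("1234567890")
```

Masked values can still be counted or grouped by their visible suffix, which is why analysis remains possible.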

50
Q

What is an ETL process?

A

Extract, Transform, Load

Define the source.
Define the target.
Define the mapping.
Create the session.
Create the workflow.
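
A toy version of these steps, using only the standard library. The CSV source, the `sales` target table and the pence-to-pounds mapping are hypothetical; the last lines stand in for the session and workflow that an ETL tool would manage.

```python
import csv
import io
import sqlite3

# Source: a made-up CSV export (in practice a file, database or API).
SOURCE_CSV = "name,amount_pence\nada,1250\nbob,399\n"


def extract(csv_text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows):
    """Transform: apply the source-to-target mapping (tidy names, pence to pounds)."""
    return [(row["name"].strip().title(), int(row["amount_pence"]) / 100) for row in rows]


def load(conn, records):
    """Load: write the mapped records into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (name TEXT, amount_pounds REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    conn.commit()


# Target: an in-memory SQLite database. Running the three steps in order is the workflow.
target = sqlite3.connect(":memory:")
load(target, transform(extract(SOURCE_CSV)))
loaded = target.execute("SELECT name, amount_pounds FROM sales ORDER BY name").fetchall()
```
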
51
Q

Define Data Models

A

Mathematical abstractions of reality. Seek to capture relationships between variables.

Data = Model + Error
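
The decomposition Data = Model + Error can be made concrete with a straight-line fit. This is a sketch under stated assumptions: the observations are invented, and ordinary least squares is only one possible choice of model.

```python
def fit_line(xs, ys):
    """Ordinary least squares for the model y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]  # made-up observations (the data)

intercept, slope = fit_line(xs, ys)
model = [intercept + slope * x for x in xs]   # what the model predicts
error = [y - m for y, m in zip(ys, model)]    # what the model leaves unexplained
```

Adding `model` and `error` element by element gives back the original observations.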

52
Q

Explain Inferential Statistics

A

A branch of statistics which quantifies relationships in data (unlike descriptive statistics, which only summarise it).

Correlation quantifies the strength of a linear trend.
Hypothesis testing assesses the significance of patterns in data.
Regression analysis models trends.
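
To illustrate the point about correlation, here is a hand-rolled Pearson coefficient. The helper name `pearson_r` and both data sets are invented for the example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# A perfectly linear relationship.
r_linear = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
# A perfect but non-linear (quadratic) relationship.
r_curved = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])
```

The second data set is fully determined by x, yet its coefficient is zero, because correlation only measures the linear part of a trend.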

53
Q

What are some types of Data Visualisation?

A

Infographics
Time series
Part to whole
Geospatial

54
Q

Define Data Analysis

A

Deriving insight and meaning from data. Includes assessing trends and correlations.

55
Q

Define a Variable (data structure)

A

A reference to a particular location in a computer’s memory (an address).

56
Q

Define an Array (data structure)

A

A sequence of slots of memory, where each slot contains an element (value or object). Deleting and inserting can be slow - later elements must be shifted, which changes their addresses.

57
Q

Define a List (data structure)

A

Similar to arrays, but permit elements of more than one data type. Values can be inserted/deleted without changing the address of other elements.
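
The insertion behaviour described here matches a linked list, which can be sketched as follows. Note that Python's built-in `list` is actually a dynamic array, so the `Node` class and helpers below are hypothetical names written for this illustration.

```python
class Node:
    """One element of a singly linked list."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


def insert_after(node, value):
    """Insert a new node after `node` by rewiring one link; no other node moves."""
    node.next = Node(value, node.next)
    return node.next


def to_list(head):
    """Walk the links and collect the values in order."""
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out


head = Node("a", Node(3.5))   # elements of different types are allowed
tail = head.next
insert_after(head, "b")
values = to_list(head)
```

After the insert, `tail` is still the same object in memory; only the link from the first node changed.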

58
Q

Define a Class (data structure)

A

A data structure containing data fields. It offers a blueprint defining the variables common to an object.
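
A minimal sketch of a class as a blueprint, using a Python dataclass. The `Customer` class and its fields are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    """Blueprint: every Customer object carries these data fields."""
    customer_id: int
    name: str
    active: bool = True  # default value shared by the blueprint


# Two objects built from the same blueprint, each with its own values.
ada = Customer(1, "Ada")
bob = Customer(2, "Bob", active=False)
```
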

59
Q

Define a Tree (data structure)

A

Shows a hierarchical data structure. The top node is called the root. Faster than arrays when inserting and deleting, but slower than linked lists.
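
One common kind of tree is the binary search tree, sketched below as an assumption about what the card has in mind. The `TreeNode` class and helper names are hypothetical, and the tree is not self-balancing.

```python
class TreeNode:
    """A node with a key and up to two children."""

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key into an unbalanced binary search tree and return the root."""
    if root is None:
        return TreeNode(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root  # duplicate keys are ignored


def in_order(root):
    """Return the keys in sorted order by visiting left, node, right."""
    if root is None:
        return []
    return in_order(root.left) + [root.key] + in_order(root.right)


root = None
for key in [50, 30, 70, 20, 40]:
    root = insert(root, key)
sorted_keys = in_order(root)
```

Each insert only follows one path down from the root, so existing nodes are never shifted as they would be in an array.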

60
Q

Define a Record (data structure)

A

A value that contains other values. Aka a tuple or struct. (A row in a table).
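
In Python, a record can be sketched with `typing.NamedTuple`. The `OrderRecord` type and its fields are invented for illustration.

```python
from typing import NamedTuple


class OrderRecord(NamedTuple):
    """One row of a hypothetical orders table."""
    order_id: int
    customer: str
    total: float


row = OrderRecord(order_id=101, customer="Ada", total=12.5)

# Fields can be read by name, by position, or unpacked like any tuple.
order_id, customer, total = row
```
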

61
Q

Define a Schema

A

A database design including conceptual, logical and physical considerations.

62
Q

What are the types of Schema?

A

Conceptual schema - a representation of an organisation, showing the entities, attributes and relationships.

Logical schema - the natural successor, articulates data structures, eg tables, objects and shows relationships.

Physical schema - successor to the logical schema, includes precise detail on the database structure.

63
Q

What is a Relational Database?

A

It breaks data into multiple tables. Tables linked through primary and foreign keys.
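
A small sketch of two tables linked by a primary and a foreign key, using the built-in `sqlite3` module. The `customer` and `purchase` tables and their contents are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE purchase (
        purchase_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        item TEXT NOT NULL
    );
    INSERT INTO customer VALUES (1, 'Ada'), (2, 'Bob');
    INSERT INTO purchase VALUES (10, 1, 'Keyboard'), (11, 1, 'Mouse'), (12, 2, 'Monitor');
""")

# Follow the foreign key back to the customer table to recombine the data.
joined = conn.execute(
    "SELECT c.name, p.item FROM purchase AS p"
    " JOIN customer AS c ON c.customer_id = p.customer_id"
    " ORDER BY p.purchase_id"
).fetchall()


def orphan_rejected():
    """Return True if a purchase for a non-existent customer is refused."""
    try:
        conn.execute("INSERT INTO purchase VALUES (13, 99, 'Cable')")
        return False
    except sqlite3.IntegrityError:
        return True
```

The customer's name is stored once and reached through the key, rather than being repeated on every purchase row.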

65
Q

What is a Flat File Database?

A

Before relational, all data was stored in a single table (eg a spreadsheet).

66
Q

What is a Hierarchical Database?

A

Organised into a tree structure. Parent records can have many child records. Each child record can have one parent record. Still widely used for certain functions.

67
Q

What is a Network Database?

A

Aims to boost the flexibility of hierarchical databases by allowing many-many relationships between records.

Still less flexible than relational.

68
Q

What is an Object Orientated Database?

A

Info on each entity is stored within a single object. Eg each customer has an object to store their own file info.

69
Q

What is a Multi-dimensional Database?

A

Data visualised as a collection of cubes. Includes data cubes and hypercubes (more than three dimensions).

70
Q

What is a NoSQL Database?

A

Not only SQL database. Came from the need to have large scale, clustered databases. Useful for unstructured data.

71
Q

What are the types of NoSQL Database?

A

Document store - stores semi-structured data, allowing developers to update code without referring to a central schema. eg JSON and XML.

Wide-column store - organises data into columns rather than rows. Each column has lots of info on the same entity. Can be faster to query large volumes.

Graph store - data stored in nodes, rather than traditional records. Node connections known as edges.

72
Q

Define Normalisation

A

The process of organising tables (and their columns) in order to improve data integrity.

73
Q

What are the two types of Anomalies?

A

Insertion anomalies - describes when data cannot be added into the table.

Deletion anomalies - describes attributes being lost when other attributes are deleted.

74
Q

What are the three Normal Forms?

A

First normal form - stored in a relational table with no multi-valued columns.

Second normal form - all non-key columns depend on the whole of the table’s primary key.

Third normal form - no non-key column has a transitive dependency on the primary key.
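
The first normal form rule can be illustrated without a database. In this sketch the rows, the `phones` column and the `to_first_normal_form` helper are all hypothetical.

```python
# Un-normalised rows: `phones` packs several values into one column.
raw = [
    {"customer_id": 1, "name": "Ada", "phones": "0111;0222"},
    {"customer_id": 2, "name": "Bob", "phones": "0333"},
]


def to_first_normal_form(rows):
    """Split the multi-valued column into its own table, one value per row."""
    customers = [{"customer_id": r["customer_id"], "name": r["name"]} for r in rows]
    phones = [
        {"customer_id": r["customer_id"], "phone": phone}
        for r in rows
        for phone in r["phones"].split(";")
    ]
    return customers, phones


customers, phones = to_first_normal_form(raw)
```

The `customer_id` in the new phone table acts as a foreign key back to the customer it belongs to.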

75
Q

Define Data Warehousing

A

Data stored ready to be dispatched/used.

76
Q

Explain four Database Maintenance techniques

A

Log file maintenance - log files contain a history of every transaction against the database.
Log files are a form of redundancy (they’re data additional to the actual data).

Data compaction - frees up unused space for new data, but doesn’t necessarily reduce the size of the database file. May require downtime.

Defragmentation - identifies data that is related, and relocates it to the same physical location to improve performance.

Integrity checks - looks for problems with data that may cause corruption or other problems. Eg a virus scan.

77
Q

What is a Canonical data model?

A

Provides a high-level view of entities and their relationships across an organisation.

78
Q

Define Data Architecture

A

The set of rules, policies, standards or models set by the organisation that govern the use of its data.

It’s primarily a business process, rather than a technical one.

79
Q

Define Data Policies, Standards and Rules

A

Data policies - a broad framework for how decisions should be made regarding data.

Data standards - provide detailed rules on how to implement data policies.

Data rules - provide specific instructions on how to implement data standards.

80
Q

Define Data Migration

A

The transfer of data from one storage/computing environment to another.

81
Q

Define Data Integration

A

Combining data from different sources to provide a unified view.

82
Q

What are the four features of Database Architecture?

A

Database design, data warehousing, migration and integration.

83
Q

Define Domain Context

A

Understanding of the business environment the data is in.

84
Q

Define Decision Analytics

A

Using visual data techniques to support choices or decisions made by people.

85
Q

Define Descriptive Analytics

A

Focusses entirely on the understanding of historical data. Can inform decision-making.

86
Q

Define Predictive Analytics

A

Using historical data to understand or predict the future and inform decisions.

87
Q

Define Prescriptive Analytics

A

The integration of predictive analytics into business systems. Seeks to identify what will happen, when and why.

88
Q

What is a Functional Requirement?

A

It describes a feature which the solution should have.

89
Q

What are the steps of the ETL process?

A
Define the source
Define the target
Create the mapping
Create the session
Create the workflow
90
Q

What is data validation?

A

The process of ensuring a program operates on clean, correct and useful data.
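
A minimal sketch of validating a record before a program operates on it. The field names, the age range and the `validate_record` helper are assumptions chosen for illustration.

```python
from datetime import date


def validate_record(record):
    """Return a list of problems found; an empty list means the record passed."""
    problems = []

    name = record.get("name")
    if not isinstance(name, str) or not name.strip():
        problems.append("name is missing")

    age = record.get("age")
    if not isinstance(age, int) or not 0 <= age <= 120:
        problems.append("age must be an integer between 0 and 120")

    try:
        date.fromisoformat(record.get("joined"))
    except (TypeError, ValueError):
        problems.append("joined must be an ISO date (YYYY-MM-DD)")

    return problems


clean = validate_record({"name": "Ada", "age": 36, "joined": "2024-01-31"})
dirty = validate_record({"name": " ", "age": 150, "joined": "31/01/2024"})
```

Returning every problem at once, rather than stopping at the first, makes it easier to report back to whoever entered the data.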