All Flashcards
Define Statistics
The art, language and science of data.
What is synonymous with Domain Knowledge?
Business/context understanding.
Define Data
The raw, unorganised facts used in analysis.
Define Information
Data which has been processed to make it useful.
Define Knowledge
Understanding of the information.
List three common data formats
CSV
XML
RTF
Define Open Data
Data which anyone is free to access, use and share, with no copyright restrictions or only a referencing (attribution) requirement. The idea parallels open-source software such as R.
Define Public Data
Data which is publicly available. It is free to use, but it still has an owner and may carry usage restrictions.
Define Proprietary Data
Opposite of public data. Private IP of a company.
Define Operational Data
Used in the day-to-day activities of a business, e.g. customer records.
Define Administrative Data
Data used to make informed decisions, often the subject of analysis.
Define Structured and Unstructured Data
Structured data has a well-defined model, so it is easy to tabularise.
Unstructured data has no defined model.
Types of Quantitative Data
Discrete are numeric variables which can only take specific, countable values, e.g. number of customers.
Continuous is data which can take any value within an interval, e.g. height.
Types of Qualitative Data
Nominal is label data with no order.
Ordinal is label data which can be ordered.
Binary (dichotomous) is label data with only two possible values, e.g. TRUE/FALSE.
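The data types above can be sketched in Python. This is a minimal illustration: the variable names (customers_per_day, heights_m, eye_colours, is_member) and the Satisfaction enum are hypothetical, and modelling ordinal labels with IntEnum is just one reasonable choice.

```python
from enum import IntEnum

# Quantitative, discrete: countable whole-number values (e.g. customers per day).
customers_per_day = [12, 15, 9]

# Quantitative, continuous: any value within an interval (e.g. height in metres).
heights_m = [1.62, 1.75, 1.803]

# Qualitative, nominal: labels with no order; only equality checks are meaningful.
eye_colours = ["brown", "blue", "green"]

# Qualitative, ordinal: labels with a meaningful order, modelled as an IntEnum.
class Satisfaction(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3

ratings = [Satisfaction.HIGH, Satisfaction.LOW, Satisfaction.MEDIUM]

# Qualitative, binary: exactly two possible values.
is_member = [True, False, True]

def count_discrete_values_between(lo: int, hi: int) -> int:
    """Count the whole-number values in [lo, hi]; a continuous interval has infinitely many."""
    return hi - lo + 1

# Sorting is meaningful for ordinal data, but not for nominal labels like eye_colours.
sorted_ratings = sorted(ratings)
```

Here count_discrete_values_between(2, 5) gives 4, which is the sense in which discrete values "can be counted between".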
What are the stages of the Data Lifecycle?
Created
Initial storage
Archived
Obsolete
Deleted
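A minimal sketch of the lifecycle as an ordered sequence of states, assuming data only ever moves forward through the stages in the order listed (real systems may, for example, restore archived data). The Stage enum and next_stage helper are hypothetical names for illustration.

```python
from enum import Enum
from typing import Optional

class Stage(Enum):
    CREATED = 1
    INITIAL_STORAGE = 2
    ARCHIVED = 3
    OBSOLETE = 4
    DELETED = 5

# Enum members iterate in definition order, which is the lifecycle order here.
ORDER = list(Stage)

def next_stage(stage: Stage) -> Optional[Stage]:
    """Return the stage after `stage`, or None once the data has been deleted."""
    i = ORDER.index(stage)
    return ORDER[i + 1] if i + 1 < len(ORDER) else None
```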
How do Databases and Structured Data relate?
A database is a repository of structured data.
What is a Relational Database?
A database which stores data in tables (relations) linked by keys. It groups together schemas, tables, queries, reports, views and other elements.
Explain Tables in the relational model
In the relational model, every relation (table) must have a header (columns) and a body (rows).
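To illustrate header and body, here is a sketch using Python's built-in sqlite3 module with a hypothetical customer table: the column names in cursor.description form the header, and the fetched row tuples form the body.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Ada", "Leeds"), (2, "Ben", "York")],
)

cur = conn.execute("SELECT * FROM customer ORDER BY id")
# Header: the column names of the relation (first item of each description entry).
header = [col[0] for col in cur.description]
# Body: the rows of the relation, one tuple per row.
body = cur.fetchall()
```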
Define Keys
Designated columns within a table with which rows can be uniquely identified, ordered and linked to other tables, e.g. primary and foreign keys.
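A sketch of keys linking two tables, using sqlite3 with hypothetical customer and orders tables. It assumes the common primary key / foreign key pattern; note that SQLite only enforces foreign keys once PRAGMA foreign_keys is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER REFERENCES customer(customer_id),"
    " total REAL)"
)
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Ada"), (2, "Ben")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 25.0), (11, 1, 5.5), (12, 2, 12.0)],
)

# The shared customer_id key links each order to its customer.
rows = conn.execute(
    "SELECT c.name, o.total FROM orders o "
    "JOIN customer c ON o.customer_id = c.customer_id ORDER BY o.order_id"
).fetchall()

# An order pointing at a non-existent customer violates the foreign key.
try:
    conn.execute("INSERT INTO orders VALUES (13, 99, 1.0)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
```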
What are some examples of Semi-Structured data?
XML and CSV files are technically semi-structured, as some processing is required to get them into table form.
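As an illustration of that processing, this sketch flattens a small, made-up XML document into table rows with Python's xml.etree.ElementTree. The element and attribute names are hypothetical; missing fields become None so that every row has the same columns.

```python
import xml.etree.ElementTree as ET

xml_text = """
<customers>
  <customer id="1"><name>Ada</name><city>Leeds</city></customer>
  <customer id="2"><name>Ben</name></customer>
</customers>
""".strip()

root = ET.fromstring(xml_text)
rows = []
for cust in root.findall("customer"):
    rows.append({
        "id": int(cust.get("id")),
        "name": cust.findtext("name"),
        # findtext returns None when the child element is absent.
        "city": cust.findtext("city"),
    })
```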
Define Big Data
Sets of data which are beyond the capabilities of traditional data processing software, so they must be analysed with specialised computational methods.
What are the four Vs of Big Data?
Volume
Variety
Velocity
Veracity
What are Requirements?
The constraints placed on an analysis project, usually determining the data to analyse. They aim to establish the purpose of the project.
What is Explicit Knowledge?
Knowledge that can easily and swiftly be articulated to other people and is usually stored somewhere.
What is Tacit Knowledge?
Knowledge that cannot be readily articulated to other people, may be assumed and may not be stored.
What is Elicitation?
A proactive activity, where the analyst initiates conversations with stakeholders to gain an understanding of the problem.
What are some techniques of Requirement Elicitation?
Interviewing
Observing
Recounting
Apprenticing
What is Recounting?
The method of having multiple stakeholders articulate their requirements. Aims to identify misunderstandings, assumptions and reach consensus.
What is the difference between Requirements Elicitation and Gathering?
Requirements gathering is a reactive activity - data exists and must be collected and analysed.
Elicitation is a proactive activity. The analyst initiates conversations with stakeholders to gain an understanding of their problem.
What are some Elicitation challenges?
Problems of scope - customers give ill-defined or unnecessary requirements.
Problems of volatility - requirements change over time.
Problems of understanding - customers are unsure of what is needed and of the capabilities of their computing environment.
What are some Elicitation solutions?
Visualisation
Consistent language
Guidelines
Consistent use of templates
Documenting dependencies
What are the Elicitation guidelines?
Assess business and technical feasibility.
Identify requirement specifiers and their bias.
Define the technical environment.
Identify domain constraints.
Select one or more elicitation techniques.
Encourage participation from many stakeholders.
Identify ambiguous requirements for prototyping.
Use scenarios to help customers better identify their key requirements.
What is the difference between Validation and Verification?
Validation judges the accuracy of something, e.g. 50% of company records are compliant.
Verification is concerned with meeting standards in absolute terms, e.g. the company records are not compliant.
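The distinction can be sketched with the card's own example. The records list and the validate/verify helpers are hypothetical: validation reports a degree (a proportion), while verification gives an absolute pass or fail.

```python
# Hypothetical records, each with a compliance flag set by some earlier check.
records = [
    {"id": 1, "compliant": True},
    {"id": 2, "compliant": False},
    {"id": 3, "compliant": True},
    {"id": 4, "compliant": False},
]

def validate(recs) -> float:
    """Validation-style result: the proportion of records that are compliant."""
    return sum(r["compliant"] for r in recs) / len(recs)

def verify(recs) -> bool:
    """Verification-style result: pass only if every record meets the standard."""
    return all(r["compliant"] for r in recs)
```

With these records, validate gives 0.5 (50% compliant) and verify gives False (not compliant).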
Define the types of Data Models
Conceptual - high-level mappings of database elements and the relationships between them. Identifies info to collect, attributes and class relationships.
Logical - converts business requirements into a model. Revolves around customer needs rather than technical needs, e.g. a flow diagram.
Physical - a full server model diagram, showing the detail of the database. Shows constraints, e.g. keys and check constraints.
Define Check Constraints
Rules which check whether an attribute meets a certain requirement, e.g. an age must not be negative. Rows which fail the check are rejected.
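A sketch of a check constraint in SQLite via Python's sqlite3 module, assuming a hypothetical employee table in which age must lie between 16 and 100. An insert that fails the condition raises sqlite3.IntegrityError and the row is not stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " id INTEGER PRIMARY KEY,"
    " age INTEGER CHECK (age BETWEEN 16 AND 100))"
)
conn.execute("INSERT INTO employee VALUES (1, 34)")  # meets the requirement

try:
    conn.execute("INSERT INTO employee VALUES (2, -5)")  # fails the CHECK
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```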
Define Quality
The standard of something when compared to other things of a similar kind.
For data, quality doesn’t need to be perfect - just high enough for the specific analysis.
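One way to express "high enough for the specific analysis" is a threshold on a quality measure. This sketch assumes completeness (the share of non-missing values) as the measure and uses made-up income data and thresholds; real quality assessments usually consider several dimensions.

```python
def completeness(values) -> float:
    """Share of values that are present (not None)."""
    return sum(v is not None for v in values) / len(values)

def fit_for_purpose(values, threshold: float) -> bool:
    """True if completeness reaches the threshold a given analysis needs."""
    return completeness(values) >= threshold

# Hypothetical income column with two missing values out of ten.
incomes = [32000, None, 41000, 38500, None, 29000, 45000, 51000, 36000, 40000]
```

The same data (80% complete) can pass for one analysis with a 0.75 threshold and fail for another that demands 0.9.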