13: Data analysis Flashcards
What is quantitative data?
Data concerning quantitative variables, such as measurements or counts, which is expressed numerically
What is qualitative data?
Data concerning qualitative attributes, which CANNOT be expressed numerically
What is discrete data?
Data that can only take specific, exact values (for example, counts)
What are continuous variables?
Can take any value within a range; we tend to analyse the values based on which range they fall within
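As a rough illustration of analysing continuous values by the range they fall within, the sketch below groups some values into bands. The heights, the band width, and the function name `band_counts` are invented for this example and are not from the textbook.

```python
from collections import Counter

def band_counts(values, width):
    """Count how many values fall into each band [lower, lower + width)."""
    counts = Counter()
    for v in values:
        lower = int(v // width) * width  # lower edge of the band containing v
        counts[(lower, lower + width)] += 1
    return dict(sorted(counts.items()))

# Hypothetical heights in cm (invented data)
heights = [152.4, 158.9, 161.0, 167.3, 169.9, 171.2, 178.5]
print(band_counts(heights, 10))
```

Each height is continuous, but the analysis only looks at how many fall in each 10 cm band.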
Information should be ACCURATE and complete. What does this acronym stand for?
Accurate
Complete
Cost-beneficial
User-targeted
Relevant
Authoritative
Timely
Easy-to-use
What is meant by descriptive statistics?
The use of statistics that summarise the data in a data set
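A minimal sketch of descriptive statistics, using Python's standard `statistics` module on an invented list of salaries. The figures are made up, and the choice of which summaries to show is an assumption rather than the textbook's list.

```python
import statistics

# Hypothetical salaries (invented data)
salaries = [21000, 24000, 24000, 27000, 39000]

summary = {
    "count": len(salaries),
    "mean": statistics.mean(salaries),
    "median": statistics.median(salaries),
    "mode": statistics.mode(salaries),
    "range": max(salaries) - min(salaries),
}
print(summary)
```

Each figure describes only the data set itself; nothing is inferred about a wider population.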
What is inferential statistics?
Statistical methods that deduce the characteristics of a bigger population from a small but representative sample
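One common inferential technique, sketched here under stated assumptions, is to estimate a population mean from a sample and attach a margin of error. The sample values are invented, the sample is assumed to be random and representative, and the multiplier 1.96 assumes an approximate 95% interval under a normal approximation (for a sample this small a t-distribution would usually be preferred).

```python
import math
import statistics

def estimate_population_mean(sample, z=1.96):
    """Return (estimate, standard_error, (low, high)) for the population mean."""
    estimate = statistics.mean(sample)
    # Standard error of the mean: sample standard deviation / sqrt(sample size)
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    margin = z * standard_error
    return estimate, standard_error, (estimate - margin, estimate + margin)

# Hypothetical sample drawn from a much larger population (invented data)
sample = [48, 52, 50, 47, 53, 51, 49, 50]
estimate, se, (low, high) = estimate_population_mean(sample)
print(estimate, round(se, 3), (round(low, 2), round(high, 2)))
```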
What is exploratory data analysis?
Identifying relationships in a set of data
What is confirmatory data analysis?
Confirming a pre-determined hypothesis
What is a representative sample?
A sample that reflects the characteristics of the population from which it is drawn
Explain:
Simple random sampling
Systematic sampling
Stratified sampling
Simple random:
- a random number generator (RNG) selects items from the whole population
- every item in the population is assigned a number
Systematic:
- all items assigned a number
- RNG for first value
- Every nth value afterward
Stratified:
- population divided into strata based on a characteristic
- the relative size of each stratum determines how many items are sampled from it
- items are selected by RNG from within each stratum
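The three sampling methods can be sketched in Python as follows. This is an illustration only: the function names, the `staff` data, and the proportional rounding rule in the stratified version are assumptions, not the textbook's procedure.

```python
import random

def simple_random_sample(population, k, rng):
    """Pick k items at random from the whole population, without replacement."""
    return rng.sample(population, k)

def systematic_sample(population, step, rng):
    """Pick a random start among the first `step` items, then every step-th item."""
    start = rng.randrange(step)
    return population[start::step]

def stratified_sample(population, key, k, rng):
    """Split into strata by key(item), then sample each stratum in proportion to its size."""
    strata = {}
    for item in population:
        strata.setdefault(key(item), []).append(item)
    sample = []
    for members in strata.values():
        # Proportional allocation; rounding means the total can differ slightly from k
        n = round(k * len(members) / len(population))
        sample.extend(rng.sample(members, n))
    return sample

rng = random.Random(0)  # fixed seed so the run is repeatable
# Hypothetical population: 60 staff in London and 40 in Leeds (invented data)
staff = [("emp%d" % i, "London" if i < 60 else "Leeds") for i in range(100)]

print(simple_random_sample(staff, 10, rng))
print(systematic_sample(staff, 10, rng))
print(stratified_sample(staff, lambda s: s[1], 10, rng))
```

With this data the stratified sample always contains 6 London and 4 Leeds staff, mirroring the 60/40 split of the population.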
What are surveys?
widely used by organisations to obtain useful information for decision making and research
What is meant by ‘survey fatigue’?
When surveys are too long and don’t get to the point, so that respondents don’t answer the questions properly or quit midway
What are some top tips for writing good survey questions? (7)
- simple short clear questions
- questions that require specific answers
- avoid broad questions
- use scales rather than yes/no
- avoid leading questions
- avoid ‘double-barrelled’ questions
- qualitative data can be as important as quantitative – use focus groups
What are the Excel commands listed in the textbook?
=SUM
=AVERAGE
=COUNTIF(B2:B7, ">25000")
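For comparison, a rough Python equivalent of the three formulas. The list stands in for cells B2:B7; the values are invented, and applying =SUM and =AVERAGE to the same range is an assumption, since the cards give those two without arguments.

```python
# Stand-in for cells B2:B7 (invented data)
salaries = [18000, 26000, 31000, 24000, 40000, 25000]

total = sum(salaries)                             # like =SUM(B2:B7)
average = total / len(salaries)                   # like =AVERAGE(B2:B7)
over_25k = sum(1 for s in salaries if s > 25000)  # like =COUNTIF(B2:B7, ">25000")

print(total, round(average, 2), over_25k)
```

Note that the value of exactly 25000 is not counted, because the condition is strictly greater than 25000.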
What is a risk of spreadsheet use?
Errors in spreadsheets: these could be errors in formulae, human error, or errors of logic
ICAEW Principles of good spreadsheet practice: (16) + (4 risks)
- Determine what role spreadsheets play
- Adopt a standard
- Ensure everyone has appropriate knowledge and competence
- Collaborative work
- Satisfy yourself that spreadsheets are the right tool for job
- Identify your audience
- Include an ‘About’ or ‘Welcome’ sheet
- Design for longevity
- Focus on required outputs
- Separate and clearly identify inputs
- Be consistent in structure
- Be consistent in use of formulae
- Keep formulae short and practical
- Never embed in a formula any number that might change
- Perform any calculation once
- Avoid using advanced features when same result can be obtained with simpler features
What is comparability?
The extent to which differences between statistics can be attributed to differences between the true values of those statistics
Explain data bias
Data is biased when it is not representative of the population that is being analysed
Bias can be inherent or introduced by those analysing it
What is selection bias?
When data is not selected randomly, leading to a sample that is not at all representative of the population
What is self-selection bias?
When individuals select themselves to be part of a sample
Example: online questionnaire
What is observer bias?
Occurs when observing and recording results and relates to interpretation
The researcher allows their assumptions, conscious or unconscious, to influence observation
What is omitted variable bias?
where key variables are not included within the data to be analysed
What is cognitive bias?
where the perception of whether something is good or not is influenced by being shown a previous or expected value for that variable
What is confirmation bias?
occurs when people see data that confirms their beliefs and they ignore data that disagrees with their beliefs
What is survivorship bias?
sample only contains items that survived some previous event
Example: exam results
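One possible reading of the exam example, sketched with invented numbers: if only students who stayed on a course until the final exam are counted, the average looks better than it would for the whole cohort. The data and the mock-exam framing are assumptions for illustration.

```python
from statistics import mean

# Hypothetical cohort: (mock exam mark, stayed on the course until the final exam?)
cohort = [(35, False), (42, False), (55, True), (61, True), (68, True), (74, True)]

all_marks = [mark for mark, _ in cohort]
survivor_marks = [mark for mark, stayed in cohort if stayed]

print(round(mean(all_marks), 1))  # whole cohort
print(mean(survivor_marks))       # survivors only, which overstates the cohort average
```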
What is statistical significance?
the results generated by testing or experimentation are unlikely to occur by chance or randomly, but occur due to a specific cause
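To make "unlikely to occur by chance" concrete, the sketch below computes how likely it is to see a given number of heads or more from a fair coin. The coin example, the one-sided test, and the 0.05 threshold are assumptions chosen for illustration; the textbook card does not specify them.

```python
from math import comb

def p_value_at_least(heads, tosses):
    """Probability of seeing `heads` or more heads in `tosses` fair coin tosses."""
    favourable = sum(comb(tosses, k) for k in range(heads, tosses + 1))
    return favourable / 2 ** tosses

p = p_value_at_least(9, 10)
significant = p < 0.05  # 0.05 is a conventional threshold, assumed here
print(round(p, 4), significant)
```

Nine or more heads in ten tosses would be rare for a fair coin, so the result would be treated as significant at this threshold; five or more heads would not.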
What are Type I and Type II errors?
Type I:
- ‘false positive’
- where the null hypothesis H0 is true, but is rejected
Type II:
- ‘false negative’
- where the null hypothesis H0 is false, but is accepted because the sample result is not significantly different from H0
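The two error types can be laid out as a small decision table. The function name `classify_decision` is hypothetical; the logic follows the standard definitions.

```python
def classify_decision(null_is_true, null_rejected):
    """Label the outcome of a hypothesis test given the (normally unknown) truth."""
    if null_is_true and null_rejected:
        return "Type I error (false positive)"
    if not null_is_true and not null_rejected:
        return "Type II error (false negative)"
    return "correct decision"

for truth in (True, False):
    for rejected in (True, False):
        print(truth, rejected, classify_decision(truth, rejected))
```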
What are the four principles of effective data visualisation?
- appropriate type of chart has to be picked
- an appropriate scale should be chosen
- charts should have a clear title
- appropriate use of colour and shading
What are the three data visualisation examples given in the textbook?
- Bar charts
- Pie charts
- Line charts
What is meant by big data?
Datasets whose size is beyond the ability of typical database software to capture, store, manage, and analyse
What are the four characteristics of Big Data? (The Four Vs)
- Volume
- Velocity
- Variety
- Veracity (trustworthiness or accuracy)
Types of structured data: (4)
- created data
- provoked data
- transacted data
- compiled data
Types of unstructured data (2)
- captured data
- user-generated data
ICAEW sets out the driving forces behind the increasing importance of big data (3)
- new sources of data, namely unstructured human-sourced and machine-generated data
- exponential growth in computing power and storage, which means entire datasets can be captured and processed regardless of size
- new infrastructure for knowledge creation such as crowdsourcing and open-source software
What is data analytics?
The process of collecting, organising, and analysing large sets of data to discover patterns and other information which an organisation can use for its future business decisions
What are the four types of data analytics, and what question does each seek to answer?
Descriptive analytics
- What has happened?
Diagnostic analytics
- Why has something happened?
Predictive analytics
- What is likely to happen in the future?
Prescriptive analytics
- What is the best course of action?
A report by the management consultants at McKinsey highlights the following six ways big data and data analytics can be used by a business to create value:
- Enhance transparency
- Performance improvement
- Market segmentation and customisation
- Decision making
- Innovation
- Risk management
What are the risks of big data and data analytics? (6)
- Storage
- Workforce skills
- Data dependency
- Information overload
- Data privacy
- Data security
What legislation governs data protection in the UK?
Data Protection Act 2018
What is IP and what is it governed by?
Intellectual property
Intellectual property law
What are the five methods of IP protection?
- Copyright
- Design right
- Trademark
- Registered design
- Patent
What are the protections offered for copyright law? (Lengths of time for each medium)
written, dramatic, musical, artistic work:
- 70 years from the author’s death
sound and music recordings:
- 70 years from when first published
films:
- 70 years after the death of the director
broadcasts:
- 50 years from when first broadcast
layout of published editions of written, dramatic, or musical works:
- 25 years from when first published
What are the protections offered for design rights? (Lengths of time for each medium)
15 years after it was created, or
10 years after it was first sold, whichever is earlier
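The "whichever is earlier" rule can be sketched as a small calculation. This is a simplification that works in whole years only and ignores any detailed rules about exactly when the periods start; the function name is hypothetical.

```python
def design_right_expiry_year(created_year, first_sold_year=None):
    """Return the earlier of created + 15 years and first sold + 10 years."""
    expiry = created_year + 15
    if first_sold_year is not None:
        expiry = min(expiry, first_sold_year + 10)
    return expiry

print(design_right_expiry_year(2020, 2022))  # the sale-based limit is earlier
print(design_right_expiry_year(2020, 2028))  # the creation-based limit is earlier
```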
What are the protections offered for trademark? (Lengths of time for each medium)
10 years
What are the protections offered for registered design? (Lengths of time for each medium)
25 years
What are the protections offered for patent? (Lengths of time for each medium)
20 years