Chapter 13 - Data Analysis Flashcards
Define Data
Data: distinct bits of information, in whatever form such as numbers, text, bytes stored in electronic memory or as facts in someone’s mind
Define Information
Information: the output of whatever system is used to process data or organise it in a useful way. Data by itself is useless; it only becomes useful when we turn it into information.
What is the relationship between data and information?
Data by itself is useless; it only becomes useful when we turn it into information.
Define Quantitative data
Quantitative data is data in the form of numbers, such as the number of units of a product sold each day, and lends itself to statistical analysis. We may say that we are measuring quantitative variables.
Define Qualitative data
Qualitative data is data about variables that cannot be expressed numerically, such as nationality, favourite colour or how someone is feeling. We may say we are measuring qualitative attributes.
Define Discrete data
Discrete data can only take exact values such as the number of products sold in a day. This kind of data is usually counted.
Define Continuous variables
Continuous variables can take any number within a range. For example, a range of 170cm to 171cm would include observations such as 170.4cm, 170.9cm and so on.
What is the primary use of data in business?
Data is used to inform decision-making, improve efficiency, identify trends, and support strategic planning.
What are some common sources of data and information in a business context?
Sources include internal data (e.g., sales records, employee data), external data (e.g., market research, industry reports), and public data (e.g., government statistics).
What is the role of planning in data usage?
Ensures sufficient resources are available by making accurate forecasts to support better decision-making.
Why is decision-making an important use of data?
It helps managers evaluate mutually exclusive options and manage varying levels of risk.
How does control benefit from data analysis?
Financial and non-financial information helps determine whether the business is meeting its objectives and identifies areas requiring corrective action.
Examples of Internal data sources 7
Internal data sources
Organisations can capture data/information internally from a number of different sources:
transactions
communication between managers and between managers and their staff
accounting records
human resources and payroll records
machine logs
procurement data
timesheets
Examples of External data sources 4
External data sources
Data/information collected outside of the organisation may be formal or informal:
New legislation
Market research
Research and development functions may look outside the business for ideas about what they should investigate
Companies House to source the financial statements of competitors, customers and suppliers
What are the qualities of good information? ACRONYM ACCURATE
Good information is Accurate, Complete, Cost-beneficial, User-targeted, Relevant, Authoritative, Timely and Easy to use (ACCURATE).
ACCURATE
Qualities of good information
Whatever the information is, it will be deemed to be of good quality if it meets the following criteria:
Accurate
Complete
Cost-beneficial - e.g. how much does it cost to acquire the information versus the value of using it? Does it cost £2k but only save £1k?
User-targeted - e.g. suited to its audience
Relevant
Authoritative
Timely
Easy to use - e.g. is it accessible? Consider the format, where it is stored and how it is delivered
Stages of data analysis 5
- Identify info needed
- Collect the data (e.g. choose a method: survey, phone call)
- Analyse the data
- Present the information (changing data into information)
- Use the information
What data set is data analysis carried out on?
Data analysis may be based upon the whole population of data or upon a sample within it. We may say that we analyse a data set, which could be either the population or a sample.
What are some methods used in data analysis?
Methods include statistical analysis, data visualization, predictive modeling, and data mining techniques.
What are the main methodologies of data analysis? 4
- Descriptive statistics: Summarizing all data in the dataset.
- Inferential statistics: Drawing conclusions about a population based on a sample.
- Exploratory data analysis: Identifying relationships and patterns in the data.
- Confirmatory data analysis: Testing a hypothesis using statistical methods.
Define Descriptive statistics
Descriptive statistics: the statistical summarisation of all of the data in the data set.
Define Inferential statistics
Inferential statistics: the statistical findings of a relatively small sample of data are taken to be applicable to the characteristics of the larger population.
Define Exploratory data analysis
Exploratory data analysis: the identification of relationships within a sample of data and thus the attributes of those in the relationship. A good example of this is churn which is a set of customers that switch to alternative suppliers.
Define Confirmatory data analysis
Confirmatory data analysis: the use of statistical analysis to confirm a pre-determined hypothesis. A good example would be the production manager whose instinct tells her that 5% of products off a particular line are faulty. She investigates to see if this is correct.
What are the challenges of sampling in data analysis?
Results may not represent the population exactly, as they are estimates. Sampling bias and insufficient sample size can affect accuracy.
What are some ways to improve sampling estimates? 2
- Choosing sampling methods that reduce bias.
- Increasing the sample size to make it more representative of the population.
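The effect of sample size on a sampling estimate can be sketched in Python. The population figures below are simulated for illustration (they are not from the text); the point is that the standard error of the mean shrinks as the sample grows, making larger samples more representative.

```python
import math
import random
import statistics

random.seed(42)
# Hypothetical population: 10,000 daily sales figures (illustrative values).
population = [random.gauss(mu=100, sigma=20) for _ in range(10_000)]

# Larger samples give estimates closer to the true population mean,
# because the standard error of the mean shrinks as sigma / sqrt(n).
for n in (10, 100, 1_000):
    sample = random.sample(population, n)
    se = statistics.stdev(sample) / math.sqrt(n)
    print(f"n={n:5d}  sample mean={statistics.mean(sample):6.1f}  std error={se:5.2f}")
```

Increasing n from 10 to 1,000 cuts the standard error by a factor of 10, which is why larger samples give more reliable estimates.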
What are the three main sampling methods? 3
- Simple random sampling: Every item in the population has an equal chance of selection using a random number generator.
- Systematic sampling: Selecting every nth observation in the population after a random initial selection.
- Stratified sampling: Dividing the population into subgroups (strata) and randomly sampling from each stratum to ensure representation.
Define Simple random sampling
Simple random sampling: a random number generator is used to select a sample from within the population. The disadvantage is that the resulting sample may, through chance, not be representative of the population.
Define Systematic sampling
Systematic sampling: following a random initial selection, every nth observation from within a population is selected. This avoids the chance of an unrepresentative sample being taken.
Define Stratified sampling
Stratified sampling: the population is divided into sub-populations (strata) based on a particular characteristic. A number of observations are then taken randomly from each stratum. This ensures that all strata are represented in the sample.
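The three sampling methods above can be sketched in Python. This is a minimal illustration using a hypothetical population of 100 customer IDs (all names and figures are made up for the example).

```python
import random

random.seed(1)
# Hypothetical population of 100 customer IDs (illustrative data).
population = list(range(1, 101))

# Simple random sampling: every item has an equal chance of selection.
simple = random.sample(population, 10)

# Systematic sampling: after a random initial selection, take every nth
# observation (here n = 100 / 10 = 10).
start = random.randrange(10)
systematic = population[start::10]

# Stratified sampling: split the population into strata (here, odd vs even IDs
# as a stand-in for a real characteristic) and sample randomly from each
# stratum so both are represented.
strata = {
    "odd": [x for x in population if x % 2 == 1],
    "even": [x for x in population if x % 2 == 0],
}
stratified = [x for group in strata.values() for x in random.sample(group, 5)]

print(len(simple), len(systematic), len(stratified))
```

Each method yields a sample of 10, but only stratified sampling guarantees that both subgroups appear in it.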
What is a survey in the context of data collection?
A survey is a method of acquiring information about a population by asking questions to targeted respondents.
What are some good practices for creating survey questions? 4
- Use simple, short, direct, and specific questions.
- Avoid leading questions that hint at the correct answer.
- Avoid double-barreled questions that introduce ambiguity (e.g., ‘Do you like cats and dogs?’).
- Use scales to gauge the level of an answer.
Why should long surveys be avoided?
Long surveys may cause recipients to suffer from survey fatigue and abandon the process before completion.
Why is prioritization important in survey design?
Prioritizing questions ensures that key data is captured before participants may abandon the survey.
What is the purpose of conducting surveys in data analysis?
Surveys gather specific, actionable information directly from stakeholders or target audiences to inform business decisions.
How can surveys ensure representative responses?
Target respondents should be representative of the population as a whole.
How can surveys avoid self-selection bias?
Achieving a high response rate ensures that respondents are not only those with extreme views or opinions, avoiding self-selection bias.
When might a survey not be the right tool?
Surveys may not be suitable when in-depth discussions are needed; focus groups might be a better option in such cases.
What are spreadsheets commonly used for in data analysis?
Spreadsheets are used for data organization, analysis, calculation, visualization, and reporting.
What are the three basic Excel functions you need to know for your exam?
The three basic Excel functions are SUM, AVERAGE, and COUNTIF.
How is the SUM function written in Excel?
SUM is written as =SUM(A1:A10), where you are summing cells A1 through A10.
How is the AVERAGE function written in Excel?
AVERAGE is written as =AVERAGE(A1:A10), where you are averaging cells A1 through A10.
How is the COUNTIF function written in Excel?
COUNTIF is written as =COUNTIF(A1:A10,B1) where A1:A10 is the range you are counting and B1 is the cell containing the criteria. The criteria can also be a word in quotation marks, for example =COUNTIF(A1:A10,"anexample") where "anexample" is the criteria.
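To check what these three Excel functions compute, their logic can be mirrored in Python. The cell values below are made up for illustration.

```python
# Illustrative cell range A1:A10 as a Python list.
a1_to_a10 = [5, 3, 8, 5, 2, 7, 5, 9, 1, 5]

total = sum(a1_to_a10)                     # =SUM(A1:A10)
average = sum(a1_to_a10) / len(a1_to_a10)  # =AVERAGE(A1:A10)
count_of_fives = a1_to_a10.count(5)        # =COUNTIF(A1:A10,5)

print(total, average, count_of_fives)  # 50 5.0 4
```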
What are some risks associated with poor spreadsheet design? 4
Risks include:
1. Inconsistent design between people and departments.
2. Poor design and presentation of results.
3. Lack of documentation, making spreadsheets hard to use.
4. Loss of data through corruption or deletion.
What are some principles of good spreadsheet design? 9
- Ensure it is the right tool for the job.
- Adopt a standard layout and construction.
- Peer review spreadsheets.
- Train users in their use.
- Design for the long term with adaptable construction.
- Keep formulas short, simple, and consistent.
- Avoid embedding variable numbers in functions.
- Use backups and version control with built-in checks and alerts.
- Use the ‘protect cells’ feature to restrict editing.
Why should accountants maintain professional skepticism when reviewing data?
Accountants should remain skeptical to ensure the validity of data and information provided, as it may not always be accurate or reliable.
What are comparability issues in data?
Comparability issues arise when data from multiple sources differ in definition or measurement. For example, different countries use different methods to classify people as unemployed (e.g. not counting someone as unemployed until they have been out of a job for at least three months, or not at all if they left work voluntarily).
What are outliers, and why are they significant in data analysis?
Outliers are observations that deviate significantly from the norm. They can skew averages and may not reflect typical performance. For example, a runner runs 50 miles a week for four weeks and then sustains an ankle injury after running only 3 miles in the fifth. Her mean would be 40.6 miles ((4 × 50 + 3) ÷ 5). However, this is not indicative of her usual performance.
What is data bias, and how does it affect representative samples?
Data bias occurs when the sample is not representative of the population, often due to improper sampling techniques or inherent biases.
What is selection bias?
Selection bias occurs when data is not randomly selected, leading to a sample that is not representative of the population.
What is self-selection bias?
Self-selection bias happens when individuals voluntarily opt into the sample, such as customers participating in an online survey, leading to skewed results.
What is observer bias?
Observer bias arises when researchers’ assumptions influence their observations, potentially distorting results. E.g. in a population of schoolchildren, the researcher decides to select those that look happy.
What is omitted variable bias?
Omitted variable bias occurs when important variables are excluded, leading to incorrect findings or incomplete conclusions. E.g. researchers could ask the public if they like a product but not whether they would actually be interested in buying it.
What is cognitive bias?
Cognitive bias relates to how data is presented and perceived, potentially leading to misleading interpretations, such as overstating the significance of a growth rate. E.g. a company could boast of profit growth of 20%, which sounds impressive to shareholders until they learn that the market grew by 30%!
What is confirmation bias?
Confirmation bias happens when researchers accept data that supports their beliefs while ignoring contradictory data. E.g. a car company decides to launch a radical new model despite market research suggesting it will flop.
What is survivorship bias?
Survivorship bias arises when only successful data points are considered, ignoring failures, which can lead to misleading conclusions. E.g. a firm could let students sit their BTF exam only if they achieve over 45% in their mock exam. The firm later boasts that 95% of its students passed BTF in the last sitting, but can only do so because it prevented some students from taking the exam.
What is hypothesis testing?
Hypothesis testing uses data to confirm whether a predetermined idea, called the ‘null hypothesis,’ is true, or whether an alternative hypothesis is true.
What is the null hypothesis?
The null hypothesis is the assumption that there is no significant difference in the data being tested. It is rejected if the sample shows statistically significant differences.
What is a Type I error in hypothesis testing?
A Type I error, or false positive, occurs when the null hypothesis is true but is rejected because the sample result is significantly different.
What is a Type II error in hypothesis testing?
A Type II error, or false negative, occurs when the null hypothesis is false but is accepted because the sample result is not significantly different from the null hypothesis.
Provide an example of a Type II error.
A sports retail company believes the average age of its customers is 32. A sample of 100 customers shows a mean not significantly different from 32, so the hypothesis is accepted. However, the true average age is 24. This is a Type II error.
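The logic of the age example can be sketched as a simple z-test in Python. All sample figures below are hypothetical; they mirror a case where the sample fails to reject a false null hypothesis.

```python
import math

# Null hypothesis: the mean customer age is 32 (the company's belief).
null_mean = 32

# Illustrative sample statistics (hypothetical figures, not real data).
sample_mean = 30.5
sample_sd = 9.0
n = 100

# z statistic: how many standard errors the sample mean is from the null value.
standard_error = sample_sd / math.sqrt(n)
z = (sample_mean - null_mean) / standard_error

# At the 5% significance level, reject the null if |z| > 1.96.
reject_null = abs(z) > 1.96
print(f"z = {z:.2f}, reject null: {reject_null}")
```

Here |z| ≈ 1.67 < 1.96, so the null hypothesis (mean age 32) is accepted. If the true mean age were actually 24, accepting the null would be a Type II error.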
What are potential problems with data in a business context?
Problems include data inaccuracies, incompleteness, redundancy, lack of timeliness, and security breaches.
What are best practices for the presentation of information?
Effective presentation involves clarity, appropriate visualization tools, concise summaries, and consideration of the audience’s needs.
What are the principles of effective visualizations? 4
- Visualizations should enlighten, not confuse the user.
- Use an appropriate scale to avoid exaggerating or minimizing variations.
- Ensure charts are correctly titled, labeled, and include legends where appropriate.
- Use colors and shading to distinguish components.
What is a bar chart, and when is it useful?
A bar chart uses bars to represent data values and is useful for displaying discrete data and making comparisons across different datasets.
What is the difference between clustered and stacked bar charts?
Clustered bar charts show breakdowns of data with separate bars for each category. Stacked bar charts combine data into one column, breaking it into components.
What is a pie chart, and what is it used for?
A pie chart shows components as proportions of a total. The size of each segment reflects its share of the total, but pie charts are limited to one time period.
What is a line chart, and what is it used for?
A line chart visualizes trends over time, such as quarterly sales and profits, by plotting data points connected by lines.
What is ‘big data,’ and why is it important?
Big data refers to large, complex datasets that traditional data-processing tools cannot handle. It is important for uncovering insights and trends at scale, enabling advanced analytics and decision-making.
What are the four key characteristics of big data? 4 V’s
- Volume: The amount of data available is much higher than in previous years.
- Velocity: Big data is streamed at great speed, allowing for real-time analysis.
- Variety: Big data includes diverse types of information, such as customer transactions and social media activity.
- Veracity: Refers to the trustworthiness and accuracy of the data.
4 V’s characteristics of big data
- Volume: The amount of data available is much higher than in previous years.
- Velocity: Big data is streamed at great speed, allowing for real-time analysis.
- Variety: Big data includes diverse types of information, such as customer transactions and social media activity.
- Veracity: Refers to the trustworthiness and accuracy of the data.
What is structured data?
Structured data is organized with a specific purpose and inherent structure, typically derived from website clicks or specific actions. Examples include:
- Created data: data purposefully created by an organization for research or products.
- Provoked data: data obtained from users expressing their views.
- Transacted data: data from transactions like sales or website traffic.
- Compiled data: data collected by third parties, such as market research or credit ratings.
Define Structured data
Structured data: data which is obtained with a particular purpose in mind, so has an inherent structure derived from the way in which it is collected, typically from website clicks or particular actions.
Define Structured data - Created data
Created data – data which has been created on purpose by an organisation, usually for product or market research
Define Structured data - Provoked data
Provoked data – data obtained from people who have been given the opportunity to express their views
Define Structured data - Transacted data
Transacted data – data collected about actual transactions such as sales, including all the steps of website traffic that led up to each transaction
Define Structured data - Compiled data
Compiled data – data collected by a third party such as a market research, credit rating or polling organisation and accessed by a business
What is unstructured data?
Unstructured data lacks an inherent structure and is often obtained without a specific purpose. Examples include:
- Captured data: created passively from unrelated activities.
- User-generated data: voluntarily created content like social media posts.
Define Unstructured data
Unstructured data is obtained without a particular objective so has no inherent structure within itself.
Define Unstructured data - Captured data
Captured data – data which is created passively from unrelated activity and captured without a specific purpose
Define Unstructured data - User-generated data
User-generated data – data which internet users create and voluntarily place online
What are some sources of big data? 4
- Processed data: Derived from traditional business systems.
- Open data: Publicly available data, such as geo-spatial or government data.
- Human-sourced data: Data from social networks, blogs, and emails.
- Machine-generated data: Data from the Internet of Things (e.g., Fitbit devices).
What is data science?
Data science deals with collecting, preparing, managing, analyzing, interpreting, and visualizing large and complex datasets.
What is data analytics?
The process of using fields within the source data itself, rather than predetermined formats, to collect, organise and analyse large sets of data to discover patterns and other useful information which an organisation can use for its future business decisions.
What are the four types of data analytics, and what do they address? 4
- Descriptive analytics: Addresses ‘What has happened?’ (e.g., How did sales change when the price changed?).
- Diagnostic analytics: Addresses ‘Why has this happened?’ (e.g., Why did sales decrease when the price was lowered?).
- Predictive analytics: Addresses ‘What if this happens in the future?’ (e.g., What will happen if we revert the price?).
- Prescriptive analytics: Addresses ‘What next?’ (e.g., What is the best course of action, such as determining a future pricing strategy?).
Data Analytics - Descriptive Analytics
What: has happened? Eg how did sales change when the price changed
Data Analytics - Diagnostic Analytics
Why: has this happened? Eg why sales went down when the price lowered
Data Analytics - Predictive Analytics
What if: this happens in future? Eg what will happen if we changed the price back
Data Analytics - Prescriptive Analytics
What next: is the best course of action? Eg determining a future pricing strategy
What are the key benefits of big data, data science, and data analytics? 6
- Enhanced transparency.
- Performance improvement.
- Market segmentation and customization.
- Improved decision-making.
- Encourages innovation.
- Enables risk management.
What are the key risks of big data, data science, and data analytics? 6
- Running out of storage space.
- Requiring greater skill from the workforce.
- Becoming too dependent on data.
- Information overload.
- Breaching data privacy legislation.
- Breach of cybersecurity.
How can entities protect commercially sensitive information?
Entities can protect commercially sensitive information through intellectual property (IP) laws.
What is copyright protection?
Copyright provides automatic protection for written, dramatic, musical, and artistic work. Lasts for 70 years from the author’s death. Layout of published editions lasts 25 years from publication.
What is a patent, and how long does it last?
A patent protects inventions and products. It must be applied for and granted, lasting 20 years.
What is a design right?
A design right provides automatic protection over a design. It lasts for: 15 years after creation, or 10 years from when it is sold, whichever comes first.
What is a registered design, and how long does it last?
A registered design protects designs for longer than a design right. It must be applied for and granted, lasting 25 years.
What is a trademark, and how long does it last?
A trademark protects product names, jingles, and logos. It must be applied for and granted, lasting 10 years.
What is data ethics?
Data ethics refers to the ethical issues arising from the collection and analysis of data, especially personal data about individuals.
What are the key ethical issues in data ethics? 6 ACRONYM FOTCOP
- Fairness: Avoiding discrimination in the collection, storage, and analysis of data.
- Ownership of data: Clarifying who owns data and whether it can be sold.
- Transparency: Ensuring the use of data given to entities is clear.
- Consent: Ensuring individuals understand how their data is used and the implications.
- Open data: Advocating that data should be publicly available for societal benefit.
- Privacy: Ensuring information is collected only with consent.
FOTCOP - Ethical issues in data ethics
- Fairness: Avoiding discrimination in the collection, storage, and analysis of data.
- Ownership of data: Clarifying who owns data and whether it can be sold.
- Transparency: Ensuring the use of data given to entities is clear.
- Consent: Ensuring individuals understand how their data is used and the implications.
- Open data: Advocating that data should be publicly available for societal benefit.
- Privacy: Ensuring information is collected only with consent.