Chapter 13 - Data Analysis Flashcards
Define Data
Data: distinct bits of information, in whatever form such as numbers, text, bytes stored in electronic memory or as facts in someone’s mind
Define Information
Information: the output of whatever system is used to process data or organise it in a useful way. Data by itself is useless; it only becomes useful when we turn it into information.
What is the relationship between data and information?
Data by itself is useless; it only becomes useful when we turn it into information.
Define Quantitative data
Quantitative data is data in the form of numbers, such as the number of units of a product sold each day, and lends itself to statistical analysis. We may say that we are measuring quantitative variables.
Define Qualitative data
Qualitative data is data about variables that cannot be expressed numerically, such as nationality, favourite colour or how someone is feeling. We may say we are measuring qualitative attributes.
Define Discrete data
Discrete data can only take exact values such as the number of products sold in a day. This kind of data is usually counted.
Define Continuous variables
Continuous variables can take any number within a range. For example, a range of 170cm to 171cm would include observations such as 170.4cm, 170.9cm and so on.
What is the primary use of data in business?
Data is used to inform decision-making, improve efficiency, identify trends, and support strategic planning.
What are some common sources of data and information in a business context?
Sources include internal data (e.g., sales records, employee data), external data (e.g., market research, industry reports), and public data (e.g., government statistics).
What is the role of planning in data usage?
Ensures sufficient resources are available by making accurate forecasts to support better decision-making.
Why is decision-making an important use of data?
It helps managers evaluate mutually exclusive options and manage varying levels of risk.
How does control benefit from data analysis?
Financial and non-financial information helps determine whether the business is meeting its objectives and identifies areas requiring corrective action.
Examples of Internal data sources 7
Internal data sources
Organisations can capture data/information internally from a number of different sources:
transactions
communication between managers and between managers and their staff
accounting records
human resources and payroll records
machine logs
procurement data
timesheets
Examples of External data sources 4
External data sources
Data/information collected outside of the organisation may be formal or informal:
New legislation
Market research
Research and development functions may look outside the business for ideas about what they should investigate
Companies House to source the financial statements of competitors, customers and suppliers
What are the qualities of good information? ACRONYM ACCURATE
Good information is Accurate, Complete, Cost-beneficial, User-targeted, Relevant, Authoritative, Timely and Easy to use (ACCURATE).
ACCURATE
Qualities of good information
Whatever the information is, it will be deemed to be of good quality if it meets the following criteria:
Accurate
Complete
Cost-beneficial - e.g. how much does it cost to acquire the information versus the value of using it? Does it cost £2k but only save £1k?
User-targeted - e.g. suited to its audience
Relevant
Authoritative
Timely
Easy to use - e.g. is it accessible? Consider the format, where it is stored and how it is delivered
Stages of data analysis 5
- Identify info needed
- Collect the data (e.g. choose a method: survey, phone call)
- Analyse the data
- Present the information (changing data into information)
- Use the information
What data set is data analysis carried out on?
Data analysis may be based upon the whole population of data or upon a sample within it. We may say that we analyse a data set, which could be either the population or a sample.
What are some methods used in data analysis?
Methods include statistical analysis, data visualization, predictive modeling, and data mining techniques.
What are the main methodologies of data analysis? 4
- Descriptive statistics: Summarizing all data in the dataset.
- Inferential statistics: Drawing conclusions about a population based on a sample.
- Exploratory data analysis: Identifying relationships and patterns in the data.
- Confirmatory data analysis: Testing a hypothesis using statistical methods.
Define Descriptive statistics
Descriptive statistics: the statistical summarisation of all of the data in the data set.
Define Inferential statistics
Inferential statistics: the statistical findings of a relatively small sample of data are taken to be applicable to the characteristics of the larger population.
Define Exploratory data analysis
Exploratory data analysis: the identification of relationships within a sample of data and thus the attributes of those in the relationship. A good example of this is churn which is a set of customers that switch to alternative suppliers.
Define Confirmatory data analysis
Confirmatory data analysis: the use of statistical analysis to confirm a pre-determined hypothesis. A good example would be the production manager whose instinct tells her that 5% of products off a particular line are faulty. She investigates to see if this is correct.
What are the challenges of sampling in data analysis?
Results may not represent the population exactly, as they are estimates. Sampling bias and insufficient sample size can affect accuracy.
What are some ways to improve sampling estimates? 2
- Choosing sampling methods that reduce bias.
- Increasing the sample size to make it more representative of the population.
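The effect of sample size on a sampling estimate can be sketched in Python. The population figures below are simulated for illustration (they are not from the text); the point is that the standard error of the mean shrinks as the sample grows, making larger samples more representative.

```python
import math
import random
import statistics

random.seed(42)
# Hypothetical population: 10,000 daily sales figures (illustrative values).
population = [random.gauss(mu=100, sigma=20) for _ in range(10_000)]

# Larger samples give estimates closer to the true population mean,
# because the standard error of the mean shrinks as sigma / sqrt(n).
for n in (10, 100, 1_000):
    sample = random.sample(population, n)
    se = statistics.stdev(sample) / math.sqrt(n)
    print(f"n={n:5d}  sample mean={statistics.mean(sample):6.1f}  std error={se:5.2f}")
```

Increasing n from 10 to 1,000 cuts the standard error by a factor of 10, which is why larger samples give more reliable estimates.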
What are the three main sampling methods? 3
- Simple random sampling: Every item in the population has an equal chance of selection using a random number generator.
- Systematic sampling: Selecting every nth observation in the population after a random initial selection.
- Stratified sampling: Dividing the population into subgroups (strata) and randomly sampling from each stratum to ensure representation.
Define Simple random sampling
Simple random sampling: a random number generator is used to select a sample from within the population. The disadvantage is that the resulting sample may, through chance, not be representative of the population.
Define Systematic sampling
Systematic sampling: following a random initial selection, every nth observation from within a population is selected. This avoids the chance of an unrepresentative sample being taken.
Define Stratified sampling
Stratified sampling: the population is divided into sub-populations (strata) based on a particular characteristic. A number of observations are then taken randomly from each stratum. This ensures that all strata are represented in the sample.
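The three sampling methods above can be sketched in Python. This is a minimal illustration using a hypothetical population of 100 customer IDs (all names and figures are made up for the example).

```python
import random

random.seed(1)
# Hypothetical population of 100 customer IDs (illustrative data).
population = list(range(1, 101))

# Simple random sampling: every item has an equal chance of selection.
simple = random.sample(population, 10)

# Systematic sampling: after a random initial selection, take every nth
# observation (here n = 100 / 10 = 10).
start = random.randrange(10)
systematic = population[start::10]

# Stratified sampling: split the population into strata (here, odd vs even IDs
# as a stand-in for a real characteristic) and sample randomly from each
# stratum so both are represented.
strata = {
    "odd": [x for x in population if x % 2 == 1],
    "even": [x for x in population if x % 2 == 0],
}
stratified = [x for group in strata.values() for x in random.sample(group, 5)]

print(len(simple), len(systematic), len(stratified))
```

Each method yields a sample of 10, but only stratified sampling guarantees that both subgroups appear in it.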
What is a survey in the context of data collection?
A survey is a method of acquiring information about a population by asking questions to targeted respondents.
What are some good practices for creating survey questions? 4
- Use simple, short, direct, and specific questions.
- Avoid leading questions that hint at the correct answer.
- Avoid double-barreled questions that introduce ambiguity (e.g., ‘Do you like cats and dogs?’).
- Use scales to gauge the level of an answer.
Why should long surveys be avoided?
Long surveys may cause recipients to suffer from survey fatigue and abandon the process before completion.
Why is prioritization important in survey design?
Prioritizing questions ensures that key data is captured before participants may abandon the survey.
What is the purpose of conducting surveys in data analysis?
Surveys gather specific, actionable information directly from stakeholders or target audiences to inform business decisions.
How can surveys ensure representative responses?
Target respondents should be representative of the population as a whole.
How can surveys avoid self-selection bias?
Achieving a high response rate ensures that respondents are not only those with extreme views or opinions, avoiding self-selection bias.
When might a survey not be the right tool?
Surveys may not be suitable when in-depth discussions are needed; focus groups might be a better option in such cases.
What are spreadsheets commonly used for in data analysis?
Spreadsheets are used for data organization, analysis, calculation, visualization, and reporting.
What are the three basic Excel functions you need to know for your exam?
The three basic Excel functions are SUM, AVERAGE, and COUNTIF.
How is the SUM function written in Excel?
SUM is written as =SUM(A1:A10), where you are summing cells A1 through A10.
How is the AVERAGE function written in Excel?
AVERAGE is written as =AVERAGE(A1:A10), where you are averaging cells A1 through A10.
How is the COUNTIF function written in Excel?
COUNTIF is written as =COUNTIF(A1:A10,B1) where A1:A10 is the range you are counting and B1 is the cell containing the criteria. The criteria can also be a word in quotation marks, for example =COUNTIF(A1:A10,"anexample") where "anexample" is the criteria.
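To check what these three Excel functions compute, their logic can be mirrored in Python. The cell values below are made up for illustration.

```python
# Illustrative cell range A1:A10 as a Python list.
a1_to_a10 = [5, 3, 8, 5, 2, 7, 5, 9, 1, 5]

total = sum(a1_to_a10)                     # =SUM(A1:A10)
average = sum(a1_to_a10) / len(a1_to_a10)  # =AVERAGE(A1:A10)
count_of_fives = a1_to_a10.count(5)        # =COUNTIF(A1:A10,5)

print(total, average, count_of_fives)  # 50 5.0 4
```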
What are some risks associated with poor spreadsheet design? 4
Risks include:
1. Inconsistent design between people and departments.
2. Poor design and presentation of results.
3. Lack of documentation, making spreadsheets hard to use.
4. Loss of data through corruption or deletion.
What are some principles of good spreadsheet design? 9
- Ensure it is the right tool for the job.
- Adopt a standard layout and construction.
- Peer review spreadsheets.
- Train users in their use.
- Design for the long term with adaptable construction.
- Keep formulas short, simple, and consistent.
- Avoid embedding variable numbers in functions.
- Use backups and version control with built-in checks and alerts.
- Use the ‘protect cells’ feature to restrict editing.
Why should accountants maintain professional skepticism when reviewing data?
Accountants should remain skeptical to ensure the validity of data and information provided, as it may not always be accurate or reliable.
What are comparability issues in data?
Comparability issues arise when data from multiple sources differ in definition or measurement. For example, different countries use different methods to classify people as unemployed (e.g. not counting someone as unemployed until they have been out of a job for at least three months, or not at all if they left work voluntarily).
What are outliers, and why are they significant in data analysis?
Outliers are observations that deviate significantly from the norm. They can skew averages and may not reflect typical performance. For example, a runner runs 50 miles a week for four weeks and then sustains an ankle injury after running only 3 miles in the fifth. Her mean would be 40.6 miles ((4 × 50 + 3) ÷ 5). However, this is not indicative of her usual performance.
What is data bias, and how does it affect representative samples?
Data bias occurs when the sample is not representative of the population, often due to improper sampling techniques or inherent biases.
What is selection bias?
Selection bias occurs when data is not randomly selected, leading to a sample that is not representative of the population.
What is self-selection bias?
Self-selection bias happens when individuals voluntarily opt into the sample, such as customers participating in an online survey, leading to skewed results.
What is observer bias?
Observer bias arises when researchers’ assumptions influence their observations, potentially distorting results. E.g. in a population of schoolchildren, the researcher decides to select those that look happy.
What is omitted variable bias?
Omitted variable bias occurs when important variables are excluded, leading to incorrect findings or incomplete conclusions. E.g. researchers could ask the public if they like a product but not whether they would actually be interested in buying it.
What is cognitive bias?
Cognitive bias relates to how data is presented and perceived, potentially leading to misleading interpretations, such as overstating the significance of a growth rate. E.g. a company could boast of profit growth of 20%, which sounds impressive to shareholders until they learn that the market grew by 30%!
What is confirmation bias?
Confirmation bias happens when researchers accept data that supports their beliefs while ignoring contradictory data. E.g. a car company decides to launch a radical new model despite market research suggesting it will flop.
What is survivorship bias?
Survivorship bias arises when only successful data points are considered, ignoring failures, which can lead to misleading conclusions. E.g. a firm could let students sit their BTF exam only if they achieve over 45% in their mock exam. The firm later boasts that 95% of its students passed BTF in the last sitting, but can only do so because it prevented some students from taking the exam.
What is hypothesis testing?
Hypothesis testing uses data to confirm whether a predetermined idea, called the ‘null hypothesis,’ is true, or whether an alternative hypothesis is true.
What is the null hypothesis?
The null hypothesis is the assumption that there is no significant difference in the data being tested. It is rejected if the sample shows statistically significant differences.
What is a Type I error in hypothesis testing?
A Type I error, or false positive, occurs when the null hypothesis is true but is rejected because the sample result is significantly different.
What is a Type II error in hypothesis testing?
A Type II error, or false negative, occurs when the null hypothesis is false but is accepted because the sample result is not significantly different from the null hypothesis.
Provide an example of a Type II error.
A sports retail company believes the average age of its customers is 32. A sample of 100 customers shows a mean not significantly different from 32, so the hypothesis is accepted. However, the true average age is 24. This is a Type II error.
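The logic of the age example can be sketched as a simple z-test in Python. All sample figures below are hypothetical; they mirror a case where the sample fails to reject a false null hypothesis.

```python
import math

# Null hypothesis: the mean customer age is 32 (the company's belief).
null_mean = 32

# Illustrative sample statistics (hypothetical figures, not real data).
sample_mean = 30.5
sample_sd = 9.0
n = 100

# z statistic: how many standard errors the sample mean is from the null value.
standard_error = sample_sd / math.sqrt(n)
z = (sample_mean - null_mean) / standard_error

# At the 5% significance level, reject the null if |z| > 1.96.
reject_null = abs(z) > 1.96
print(f"z = {z:.2f}, reject null: {reject_null}")
```

Here |z| ≈ 1.67 < 1.96, so the null hypothesis (mean age 32) is accepted. If the true mean age were actually 24, accepting the null would be a Type II error.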
What are potential problems with data in a business context?
Problems include data inaccuracies, incompleteness, redundancy, lack of timeliness, and security breaches.
What are best practices for the presentation of information?
Effective presentation involves clarity, appropriate visualization tools, concise summaries, and consideration of the audience’s needs.
What are the principles of effective visualizations? 4
- Visualizations should enlighten, not confuse the user.
- Use an appropriate scale to avoid exaggerating or minimizing variations.
- Ensure charts are correctly titled, labeled, and include legends where appropriate.
- Use colors and shading to distinguish components.
What is a bar chart, and when is it useful?
A bar chart uses bars to represent data values and is useful for displaying discrete data and making comparisons across different datasets.
What is the difference between clustered and stacked bar charts?
Clustered bar charts show breakdowns of data with separate bars for each category. Stacked bar charts combine data into one column, breaking it into components.
What is a pie chart, and what is it used for?
A pie chart shows components as proportions of a total. The size of each segment reflects its share of the total, but pie charts are limited to one time period.
What is a line chart, and what is it used for?
A line chart visualizes trends over time, such as quarterly sales and profits, by plotting data points connected by lines.
What is ‘big data,’ and why is it important?
Big data refers to large, complex datasets that traditional data-processing tools cannot handle. It is important for uncovering insights and trends at scale, enabling advanced analytics and decision-making.
What are the four key characteristics of big data? 4 V’s
- Volume: The amount of data available is much higher than in previous years.
- Velocity: Big data is streamed at great speed, allowing for real-time analysis.
- Variety: Big data includes diverse types of information, such as customer transactions and social media activity.
- Veracity: Refers to the trustworthiness and accuracy of the data.
4 V’s characteristics of big data
- Volume: The amount of data available is much higher than in previous years.
- Velocity: Big data is streamed at great speed, allowing for real-time analysis.
- Variety: Big data includes diverse types of information, such as customer transactions and social media activity.
- Veracity: Refers to the trustworthiness and accuracy of the data.
What is structured data?
Structured data is organized with a specific purpose and inherent structure, typically derived from website clicks or specific actions. Examples include:
- Created data: data purposefully created by an organization for research or products.
- Provoked data: data obtained from users expressing their views.
- Transacted data: data from transactions like sales or website traffic.
- Compiled data: data collected by third parties, such as market research or credit ratings.
Define Structured data
Structured data: data which is obtained with a particular purpose in mind, so has an inherent structure derived from the way in which it is collected, typically from website clicks or particular actions.
Define Structured data - Created data
Created data – data which has been created on purpose by an organisation, usually for product or market research
Define Structured data - Provoked data
Provoked data – data obtained from people who have been given the opportunity to express their views
Define Structured data - Transacted data
Transacted data – data collected about actual transactions such as sales, including all the steps of website traffic that led up to each transaction
Define Structured data - Compiled data
Compiled data – data collected by a third party such as a market research, credit rating or polling organisation and accessed by a business
What is unstructured data?
Unstructured data lacks an inherent structure and is often obtained without a specific purpose. Examples include:
- Captured data: created passively from unrelated activities.
- User-generated data: voluntarily created content like social media posts.
Define Unstructured data
Unstructured data is obtained without a particular objective so has no inherent structure within itself.
Define Unstructured data - Captured data
Captured data – data which is created passively from unrelated activity and captured without a specific purpose
Define Unstructured data - User-generated data
User-generated data – data which internet users create and voluntarily place online
What are some sources of big data? 4
- Processed data: Derived from traditional business systems.
- Open data: Publicly available data, such as geo-spatial or government data.
- Human-sourced data: Data from social networks, blogs, and emails.
- Machine-generated data: Data from the Internet of Things (e.g., Fitbit devices).
What is data science?
Data science deals with collecting, preparing, managing, analyzing, interpreting, and visualizing large and complex datasets.
What is data analytics?
The process of using fields within the source data itself, rather than predetermined formats, to collect, organise and analyse large sets of data to discover patterns and other useful information which an organisation can use for its future business decisions.
What are the four types of data analytics, and what do they address? 4
- Descriptive analytics: Addresses ‘What has happened?’ (e.g., How did sales change when the price changed?).
- Diagnostic analytics: Addresses ‘Why has this happened?’ (e.g., Why did sales decrease when the price was lowered?).
- Predictive analytics: Addresses ‘What if this happens in the future?’ (e.g., What will happen if we revert the price?).
- Prescriptive analytics: Addresses ‘What next?’ (e.g., What is the best course of action, such as determining a future pricing strategy?).
Data Analytics - Descriptive Analytics
What: has happened? Eg how did sales change when the price changed
Data Analytics - Diagnostic Analytics
Why: has this happened? Eg why sales went down when the price lowered
Data Analytics - Predictive Analytics
What if: this happens in future? Eg what will happen if we changed the price back
Data Analytics - Prescriptive Analytics
What next: is the best course of action? Eg determining a future pricing strategy
What are the key benefits of big data, data science, and data analytics? 6
- Enhanced transparency.
- Performance improvement.
- Market segmentation and customization.
- Improved decision-making.
- Encourages innovation.
- Enables risk management.
What are the key risks of big data, data science, and data analytics? 6
- Running out of storage space.
- Requiring greater skill from the workforce.
- Becoming too dependent on data.
- Information overload.
- Breaching data privacy legislation.
- Breach of cybersecurity.
How can entities protect commercially sensitive information?
Entities can protect commercially sensitive information through intellectual property (IP) laws.
What is copyright protection?
Copyright provides automatic protection for written, dramatic, musical, and artistic work. Lasts for 70 years from the author’s death. Layout of published editions lasts 25 years from publication.
What is a patent, and how long does it last?
A patent protects inventions and products. It must be applied for and granted, lasting 20 years.
What is a design right?
A design right provides automatic protection over a design. It lasts for: 15 years after creation, or 10 years from when it is sold, whichever comes first.
What is a registered design, and how long does it last?
A registered design protects designs for longer than a design right. It must be applied for and granted, lasting 25 years.
What is a trademark, and how long does it last?
A trademark protects product names, jingles, and logos. It must be applied for and granted, lasting 10 years.
What is data ethics?
Data ethics refers to the ethical issues arising from the collection and analysis of data, especially personal data about individuals.
What are the key ethical issues in data ethics? 6 ACRONYM FOTCOP
- Fairness: Avoiding discrimination in the collection, storage, and analysis of data.
- Ownership of data: Clarifying who owns data and whether it can be sold.
- Transparency: Ensuring the use of data given to entities is clear.
- Consent: Ensuring individuals understand how their data is used and the implications.
- Open data: Advocating that data should be publicly available for societal benefit.
- Privacy: Ensuring information is collected only with consent.
FOTCOP - Ethical issues in data ethics
- Fairness: Avoiding discrimination in the collection, storage, and analysis of data.
- Ownership of data: Clarifying who owns data and whether it can be sold.
- Transparency: Ensuring the use of data given to entities is clear.
- Consent: Ensuring individuals understand how their data is used and the implications.
- Open data: Advocating that data should be publicly available for societal benefit.
- Privacy: Ensuring information is collected only with consent.