Chapter 13 - Data Analysis Flashcards

1
Q

Define Data

A

Data: distinct bits of information in any form, such as numbers, text or bytes stored in electronic memory, or facts held in someone’s mind.

2
Q

Define Information

A

Information: the output of whatever system is used to process data or organise it in a useful way. Data by itself is useless; it is only when we turn it into information that it becomes useful.

3
Q

What is the relationship between data and information?

A

Data by itself is useless; it is only when we turn it into information that it becomes useful.

4
Q

Define Quantitative data

A

Quantitative data is data in the form of numbers, such as the number of units of a product sold each day, and lends itself to statistical analysis. We may say that we are measuring quantitative variables.

5
Q

Define Qualitative data

A

Qualitative data is data about variables that cannot be expressed numerically, such as nationality, favourite colour or how someone is feeling. We may say we are measuring qualitative attributes.

6
Q

Define Discrete data

A

Discrete data can only take exact values, such as the number of products sold in a day. This kind of data is usually counted.

7
Q

Define Continuous variables

A

Continuous variables can take any value within a range. For example, a range of 170cm to 171cm would include observations such as 170.4cm, 170.9cm and so on.

8
Q

What is the primary use of data in business?

A

Data is used to inform decision-making, improve efficiency, identify trends, and support strategic planning.

9
Q

What are some common sources of data and information in a business context?

A

Sources include internal data (e.g., sales records, employee data), external data (e.g., market research, industry reports), and public data (e.g., government statistics).

10
Q

What is the role of planning in data usage?

A

Planning uses data to make accurate forecasts, which helps ensure that sufficient resources are available and supports better decision-making.

11
Q

Why is decision-making an important use of data?

A

It helps managers evaluate mutually exclusive options and manage varying levels of risk.

12
Q

How does control benefit from data analysis?

A

Financial and non-financial information helps determine whether the business is meeting its objectives and identifies areas requiring corrective action.

13
Q

Examples of Internal data sources 7

A

Internal data sources
Organisations can capture data/information internally from a number of different sources:
• transactions
• communication between managers, and between managers and their staff
• accounting records
• human resources and payroll records
• machine logs
• procurement data
• timesheets

14
Q

Examples of External data sources 4

A

External data sources
Data/information collected outside of the organisation may be formal or informal:
• New legislation
• Market research
• Research and development functions looking outside the business for ideas on what to investigate
• Companies House, to source the financial statements of competitors, customers and suppliers

15
Q

What are the qualities of good information? ACRONYM: ACCURATE

A

Good information is accurate, complete, cost-beneficial, user-targeted, relevant, authoritative, timely and easy to use (ACCURATE).

16
Q

ACCURATE - qualities of good information

A

Qualities of good information
Whatever the information is, it will be deemed to be of good quality if it meets the following criteria:
• Accurate
• Complete
• Cost-beneficial: the benefit of using the information should exceed the cost of acquiring it (e.g. information that costs 2k to obtain but only saves 1k is not cost-beneficial)
• User-targeted: tailored to its intended audience
• Relevant
• Authoritative
• Timely
• Easy to use: accessible in terms of its format, where it is stored and how it is delivered

17
Q

Stages of data analysis 5

A
  1. Identify the information needed
  2. Collect the data (e.g. choose a method, such as a survey or phone calls)
  3. Analyse the data
  4. Present the information (changing data into information)
  5. Use the information
18
Q

What data set is data analysis carried out on?

A

Data analysis may be based upon the whole population of data or upon a sample within it. We may say that we analyse a data set, which could be either the population or a sample.

19
Q

What are some methods used in data analysis?

A

Methods include statistical analysis, data visualization, predictive modeling, and data mining techniques.

20
Q

What are the main methodologies of data analysis? 4

A
  1. Descriptive statistics: Summarizing all data in the dataset.
  2. Inferential statistics: Drawing conclusions about a population based on a sample.
  3. Exploratory data analysis: Identifying relationships and patterns in the data.
  4. Confirmatory data analysis: Testing a hypothesis using statistical methods.
21
Q

Define Descriptive statistics

A

Descriptive statistics: the statistical summarisation of all of the data in the data set.
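A minimal Python sketch of descriptive statistics, summarising every observation in a small data set with the standard `statistics` module. The `describe` helper and the daily sales figures are hypothetical, chosen only for illustration.

```python
import statistics


def describe(data):
    """Summarise all of the observations in a data set."""
    return {
        "count": len(data),
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "minimum": min(data),
        "maximum": max(data),
        "range": max(data) - min(data),
    }


# Hypothetical units sold on each day of one week
daily_sales = [12, 15, 11, 18, 14, 20, 15]
print(describe(daily_sales))
```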

22
Q

Define Inferential statistics

A

Inferential statistics: the statistical findings of a relatively small sample of data are taken to be applicable to the characteristics of the larger population.

23
Q

Define Exploratory data analysis

A

Exploratory data analysis: the identification of relationships within a sample of data, and thus the attributes of those in the relationship. A good example is analysing churn, the set of customers who switch to alternative suppliers, to find the attributes those customers share.

24
Q

Define Confirmatory data analysis

A

Confirmatory data analysis: the use of statistical analysis to confirm a pre-determined hypothesis. A good example would be the production manager whose instinct tells her that 5% of products off a particular line are faulty. She investigates to see if this is correct.
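The production manager's 5% belief can be treated as a null hypothesis and checked against an inspection sample. Below is a minimal Python sketch of a two-tailed one-sample z-test for a proportion at the 5% significance level (critical value 1.96). It assumes the sample is large enough for the normal approximation to hold; the function name and the inspection figures are hypothetical.

```python
import math


def proportion_z_test(faulty, sample_size, p0, z_critical=1.96):
    """Two-tailed one-sample z-test for a proportion (normal approximation)."""
    p_hat = faulty / sample_size                            # observed fault rate
    standard_error = math.sqrt(p0 * (1 - p0) / sample_size)  # assuming p0 is true
    z = (p_hat - p0) / standard_error
    reject_null = abs(z) > z_critical
    return p_hat, z, reject_null


# Hypothetical inspection: 30 faulty items found in a sample of 400
p_hat, z, reject = proportion_z_test(faulty=30, sample_size=400, p0=0.05)
print(f"sample proportion={p_hat:.3f}, z={z:.2f}, reject 5% hypothesis: {reject}")
```

With 30 faulty items in 400 (7.5%), z is about 2.29, so the 5% hypothesis would be rejected; with 22 faulty items (5.5%), z is about 0.46 and it would not.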

25
Q

What are the challenges of sampling in data analysis?

A

Results may not represent the population exactly, as they are estimates. Sampling bias and insufficient sample size can affect accuracy.

26
Q

What are some ways to improve sampling estimates? 2

A
  1. Choosing sampling methods that reduce bias.
  2. Increasing the sample size to make it more representative of the population.
27
Q

What are the three main sampling methods? 3

A
  1. Simple random sampling: Every item in the population has an equal chance of being selected, using a random number generator.
  2. Systematic sampling: Selecting every nth observation in the population after a random initial selection.
  3. Stratified sampling: Dividing the population into subgroups (strata) and randomly sampling from each stratum to ensure representation.
28
Q

Define Simple random sampling

A

Simple random sampling: a random number generator is used to select a sample from within the population. The disadvantage is that the resulting sample may, through chance, not be representative of the population.
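Simple random sampling can be sketched in Python with the standard library's `random.Random.sample`, which draws items without replacement so each item has an equal chance of selection. The `simple_random_sample` helper and the customer IDs are hypothetical; the seed only makes the draw reproducible.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Select n items at random, each item having an equal chance of selection."""
    rng = random.Random(seed)  # seeded generator, so the same sample can be reproduced
    return rng.sample(population, n)  # sampling without replacement


# Hypothetical population of 100 customer IDs, C001 to C100
customers = [f"C{i:03d}" for i in range(1, 101)]
sample = simple_random_sample(customers, 10, seed=42)
print(sample)
```

Because the draw is purely random, nothing stops it from over-representing one group by chance.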

29
Q

Define Systematic sampling

A

Systematic sampling: following a random initial selection, every nth observation from within a population is selected. This reduces the risk of an unrepresentative sample being taken by chance.
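A minimal Python sketch of systematic sampling, assuming the interval k is the population size divided by the sample size (rounded down) and the random start falls within the first interval. The helper name and the invoice population are hypothetical.

```python
import random


def systematic_sample(population, n, seed=None):
    """Random start, then every k-th item, where k is the sampling interval."""
    if not 0 < n <= len(population):
        raise ValueError("n must be between 1 and the population size")
    k = len(population) // n  # sampling interval (the "every nth" observation)
    start = random.Random(seed).randrange(k)  # random initial selection in [0, k)
    return population[start::k][:n]  # every k-th item from the start, capped at n


# Hypothetical population: invoice numbers 1 to 100, sample of 10 (so k = 10)
invoices = list(range(1, 101))
sample = systematic_sample(invoices, 10, seed=7)
print(sample)
```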

30
Q

Define Stratified sampling

A

Stratified sampling: the population is divided into sub-populations (strata) based on a particular characteristic. A number of observations are then taken randomly from each stratum. This ensures that all strata are represented in the sample.
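A minimal Python sketch of stratified sampling. It assumes the same fixed number of observations is taken from each stratum; proportional allocation, where larger strata contribute more, is a common alternative. The helper, the `stratum_of` function and the customer data are hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(population, stratum_of, per_stratum, seed=None):
    """Split the population into strata, then sample randomly within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in population:
        strata[stratum_of(item)].append(item)  # group items by their characteristic
    sample = []
    for members in strata.values():
        # take up to per_stratum items at random from this stratum
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample


# Hypothetical customers as (ID, region) pairs, 10 in each of three regions
regions = ["North", "South", "East"]
customers = [(f"C{i:02d}", regions[i % 3]) for i in range(30)]
sample = stratified_sample(customers, stratum_of=lambda c: c[1], per_stratum=3, seed=1)
print(Counter(region for _, region in sample))  # 3 customers from each region
```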

31
Q

What is a survey in the context of data collection?

A

A survey is a method of acquiring information about a population by asking questions to targeted respondents.

32
Q

What are some good practices for creating survey questions? 4

A
  1. Use simple, short, direct, and specific questions.
  2. Avoid leading questions that hint at the correct answer.
  3. Avoid double-barreled questions that introduce ambiguity (e.g., ‘Do you like cats and dogs?’).
  4. Use scales to gauge the level of an answer.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
33
Q

Why should long surveys be avoided?

A

Long surveys may cause recipients to suffer from survey fatigue and abandon the process before completion.

34
Q

Why is prioritization important in survey design?

A

Prioritizing questions ensures that key data is captured early, before any participants abandon the survey.

35
Q

What is the purpose of conducting surveys in data analysis?

A

Surveys gather specific, actionable information directly from stakeholders or target audiences to inform business decisions.

36
Q

How can surveys ensure representative responses?

A

Target respondents should be representative of the population as a whole.

37
Q

How can surveys avoid self-selection bias?

A

Achieving a high response rate helps ensure that respondents are not only those with extreme views or opinions, reducing self-selection bias.

38
Q

When might a survey not be the right tool?

A

Surveys may not be suitable when in-depth discussions are needed; focus groups might be a better option in such cases.

39
Q

What are spreadsheets commonly used for in data analysis?

A

Spreadsheets are used for data organization, analysis, calculation, visualization, and reporting.

40
Q

What are the three basic Excel functions you need to know for your exam?

A

The three basic Excel functions are SUM, AVERAGE, and COUNTIF.

41
Q

How is the SUM function written in Excel?

A

SUM is written as =SUM(A1:A10), where you are summing cells A1 through A10.

42
Q

How is the AVERAGE function written in Excel?

A

AVERAGE is written as =AVERAGE(A1:A10), where you are averaging cells A1 through A10.

43
Q

How is the COUNTIF function written in Excel?

A

COUNTIF is written as =COUNTIF(A1:A10,B1), where A1:A10 is the range you are counting and B1 is the cell containing the criterion. The criterion can also be a word in quotation marks, for example =COUNTIF(A1:A10,"anexample"), where "anexample" is the criterion.
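A rough Python analogue of the three functions may help show what each one computes. These helpers are hypothetical and simplified: SUM and AVERAGE assume the range holds only numbers, and the COUNTIF sketch handles exact matches only (comparing text case-insensitively, as Excel does), without Excel's operator criteria such as ">5" or wildcards.

```python
def excel_sum(cells):
    """Rough analogue of =SUM(A1:A10): add up every value in the range."""
    return sum(cells)


def excel_average(cells):
    """Rough analogue of =AVERAGE(A1:A10): the total divided by the count."""
    return sum(cells) / len(cells)


def excel_countif(cells, criterion):
    """Rough analogue of =COUNTIF(range, criterion), exact matches only."""
    def matches(cell):
        if isinstance(cell, str) and isinstance(criterion, str):
            return cell.lower() == criterion.lower()  # text match ignores case
        return cell == criterion
    return sum(1 for cell in cells if matches(cell))


a1_to_a10 = [4, 8, 15, 16, 23, 42, 8, 8, 1, 5]  # hypothetical values in A1:A10
print(excel_sum(a1_to_a10))         # 130
print(excel_average(a1_to_a10))     # 13.0
print(excel_countif(a1_to_a10, 8))  # 3
print(excel_countif(["anexample", "AnExample", "other"], "anexample"))  # 2
```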

44
Q

What are some risks associated with poor spreadsheet design? 4

A

Risks include:
1. Inconsistent design between people and departments.
2. Poor design and presentation of results.
3. Lack of documentation, making spreadsheets hard to use.
4. Loss of data through corruption or deletion.

45
Q

What are some principles of good spreadsheet design? 9

A
  1. Ensure it is the right tool for the job.
  2. Adopt a standard layout and construction.
  3. Peer review spreadsheets.
  4. Train users in their use.
  5. Design for the long term with adaptable construction.
  6. Keep formulas short, simple, and consistent.
  7. Avoid embedding variable numbers in functions.
  8. Use backups and version control with built-in checks and alerts.
  9. Use the ‘protect cells’ feature to restrict editing.
46
Q

Why should accountants maintain professional skepticism when reviewing data?

A

Accountants should remain skeptical to ensure the validity of data and information provided, as it may not always be accurate or reliable.

47
Q

What are comparability issues in data?

A

Comparability issues arise when data from multiple sources differ in definition or measurement. For example, different countries may classify people as unemployed in different ways (e.g. not classifying someone as unemployed until they have been out of a job for at least three months, or not at all if they left work voluntarily).

48
Q

What are outliers, and why are they significant in data analysis?

A

Outliers are observations that deviate significantly from the norm. They can skew averages and may not reflect typical performance. For example, a runner runs 50 miles a week for four weeks and then sustains an ankle injury after running only 3 miles in the fifth week. Her mean (average) weekly distance would be 40.6 miles ((4 × 50 + 3) ÷ 5). However, this is not indicative of her usual performance.
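The runner's figures can be checked with Python's standard `statistics` module. The sketch also prints the median (the middle value once the data are sorted), which is much less affected by a single outlier than the mean.

```python
import statistics

# Weekly distances in miles: four normal weeks, then the injury week
weekly_miles = [50, 50, 50, 50, 3]

mean = statistics.mean(weekly_miles)      # (4 * 50 + 3) / 5
median = statistics.median(weekly_miles)  # middle value once sorted

print(f"mean = {mean} miles, median = {median} miles")  # mean = 40.6 miles, median = 50 miles
```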

49
Q

What is data bias, and how does it affect representative samples?

A

Data bias occurs when the sample is not representative of the population, often due to improper sampling techniques or inherent biases.

50
Q

What is selection bias?

A

Selection bias occurs when data is not randomly selected, leading to a sample that is not representative of the population.

51
Q

What is self-selection bias?

A

Self-selection bias happens when individuals voluntarily opt into the sample, such as customers participating in an online survey, leading to skewed results.

52
Q

What is observer bias?

A

Observer bias arises when researchers’ assumptions influence their observations, potentially distorting results. For example, when sampling a population of schoolchildren, the researcher chooses those who look happy.

53
Q

What is omitted variable bias?

A

Omitted variable bias occurs when important variables are excluded, leading to incorrect findings or incomplete conclusions. For example, a survey could ask the public whether they like a product but not whether they would actually be interested in buying it.

54
Q

What is cognitive bias?

A

Cognitive bias relates to how data is presented and perceived, potentially leading to misleading interpretations, such as overstating the significance of a growth rate. For example, a company could boast of profit growth of 20%, which sounds impressive to shareholders until they learn that the market grew by 30%!

55
Q

What is confirmation bias?

A

Confirmation bias happens when researchers accept data that supports their beliefs while ignoring contradictory data. For example, a car company decides to launch a radical new model despite market research suggesting it will flop in the market.

56
Q

What is survivorship bias?

A

Survivorship bias arises when only successful data points are considered, ignoring failures, which can lead to misleading conclusions. For example, a firm could let students sit their BTF exam only if they achieve over 45% in their mock exam. The firm later boasts that 95% of its students passed BTF in the last sitting, but it can only do so because it prevented some students from taking the exam.

57
Q

What is hypothesis testing?

A

Hypothesis testing uses sample data to test a predetermined idea, called the ‘null hypothesis’, and decide whether it should be accepted or rejected in favour of an alternative hypothesis.

58
Q

What is the null hypothesis?

A

The null hypothesis is the assumption that there is no significant difference in the data being tested. It is rejected if the sample shows statistically significant differences.

59
Q

What is a Type I error in hypothesis testing?

A

A Type I error, or false positive, occurs when the null hypothesis is true but is rejected because the sample result is significantly different.
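A Type I error can be illustrated with a small simulation: if every sample really comes from a population where the null hypothesis is true, the proportion of samples that are nevertheless rejected should be close to the significance level. This Python sketch assumes a two-tailed z-test at the 5% level with a known population standard deviation; the population mean of 100, standard deviation of 15 and sample size of 30 are hypothetical.

```python
import math
import random


def type_one_error_rate(trials=2000, n=30, mu=100.0, sigma=15.0, seed=1):
    """Estimate how often a true null hypothesis is wrongly rejected.

    Every sample is drawn from a population whose mean really is mu, so each
    rejection is a Type I error. A two-tailed z-test at the 5% level is used.
    """
    rng = random.Random(seed)
    standard_error = sigma / math.sqrt(n)
    rejections = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        z = (sample_mean - mu) / standard_error
        if abs(z) > 1.96:  # sample looks "significantly different"
            rejections += 1
    return rejections / trials


print(type_one_error_rate())  # expected to land near 0.05, the significance level
```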

60
Q

What is a Type II error in hypothesis testing?

A

A Type II error, or false negative, occurs when the null hypothesis is false but is accepted because the sample result is not significantly different from the null hypothesis.

61
Q

Provide an example of a Type II error.

A

A sports retail company believes the average age of its customers is 32. A sample of 100 customers shows a mean not significantly different from 32, so the hypothesis is accepted. However, the true average age is 24. This is a Type II error.

62
Q

What are potential problems with data in a business context?

A

Problems include data inaccuracies, incompleteness, redundancy, lack of timeliness, and security breaches.

63
Q

What are best practices for the presentation of information?

A

Effective presentation involves clarity, appropriate visualization tools, concise summaries, and consideration of the audience’s needs.

64
Q

What are the principles of effective visualizations? 4

A
  1. Visualizations should enlighten, not confuse the user.
  2. Use an appropriate scale to avoid exaggerating or minimizing variations.
  3. Ensure charts are correctly titled, labeled, and include legends where appropriate.
  4. Use colors and shading to distinguish components.
65
Q

What is a bar chart, and when is it useful?

A

A bar chart uses bars to represent data values and is useful for displaying discrete data and making comparisons across different datasets.

66
Q

What is the difference between clustered and stacked bar charts?

A

Clustered bar charts show breakdowns of data with separate bars for each category. Stacked bar charts combine data into one column, breaking it into components.

67
Q

What is a pie chart, and what is it used for?

A

A pie chart shows components as proportions of a total. The size of each segment reflects its share of the total, but pie charts are limited to one time period.
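Each segment's size is that component's share of the total, so its angle is the share multiplied by 360 degrees. A minimal Python sketch, using a hypothetical `pie_segments` helper and hypothetical sales by region for a single period:

```python
def pie_segments(components):
    """Return each component's share of the total and its angle in degrees."""
    total = sum(components.values())
    return {
        name: (value / total, value / total * 360)
        for name, value in components.items()
    }


# Hypothetical sales by region for one period
sales = {"North": 50, "South": 30, "East": 20}
for region, (share, angle) in pie_segments(sales).items():
    print(f"{region}: {share:.0%} of total, {angle:.0f} degrees")
```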

68
Q

What is a line chart, and what is it used for?

A

A line chart visualizes trends over time, such as quarterly sales and profits, by plotting data points connected by lines.

69
Q

What is ‘big data,’ and why is it important?

A

Big data refers to large, complex datasets that traditional data-processing tools cannot handle. It is important for uncovering insights and trends at scale, enabling advanced analytics and decision-making.

70
Q

What are the four key characteristics of big data? 4 V’s

A
  1. Volume: The amount of data available is much higher than in previous years.
  2. Velocity: Big data is streamed at great speed, allowing for real-time analysis.
  3. Variety: Big data includes diverse types of information, such as customer transactions and social media activity.
  4. Veracity: Refers to the trustworthiness and accuracy of the data.
71
Q

4 V’s characteristics of big data

A
  1. Volume: The amount of data available is much higher than in previous years.
  2. Velocity: Big data is streamed at great speed, allowing for real-time analysis.
  3. Variety: Big data includes diverse types of information, such as customer transactions and social media activity.
  4. Veracity: Refers to the trustworthiness and accuracy of the data.
72
Q

What is structured data?

A

Structured data is obtained with a specific purpose in mind, so it has an inherent structure derived from the way it is collected, typically from website clicks or specific actions. Types include:
  1. Created data: Data purposefully created by an organization for research or products.
  2. Provoked data: Data obtained from users expressing their views.
  3. Transacted data: Data from transactions like sales or website traffic.
  4. Compiled data: Data collected by third parties, such as market research or credit ratings.

73
Q

Define Structured data

A

Structured data: data which is obtained with a particular purpose in mind, so it has an inherent structure derived from the way in which it is collected, typically from website clicks or particular actions.

74
Q

Define Structured data - Created data

A

Created data – data which has been created on purpose by an organisation, usually for product or market research

75
Q

Define Structured data - Provoked data

A

Provoked data – data obtained from people who have been given the opportunity to express their views

76
Q

Define Structured data - Transacted data

A

Transacted data – data collected about actual transactions such as sales, including all the steps of website traffic that led up to each transaction

77
Q

Define Structured data - Compiled data

A

Compiled data – data collected by a third party such as a market research, credit rating or polling organisation and accessed by a business

78
Q

What is unstructured data?

A

Unstructured data lacks an inherent structure and is often obtained without a specific purpose. Types include:
  1. Captured data: Created passively from unrelated activities.
  2. User-generated data: Voluntarily created content like social media posts.

79
Q

Define Unstructured data

A

Unstructured data is obtained without a particular objective so has no inherent structure within itself.

80
Q

Define Unstructured data - Captured data

A

Captured data – data which is created passively from unrelated activity and captured without a specific purpose

81
Q

Define Unstructured data - User-generated data

A

User-generated data – data which internet users create and voluntarily place online

82
Q

What are some sources of big data? 4

A
  1. Processed data: Derived from traditional business systems.
  2. Open data: Publicly available data, such as geo-spatial or government data.
  3. Human-sourced data: Data from social networks, blogs, and emails.
  4. Machine-generated data: Data from the Internet of Things (e.g., Fitbit devices).
83
Q

What is data science?

A

Data science deals with collecting, preparing, managing, analyzing, interpreting, and visualizing large and complex datasets.

84
Q

What is data analytics?

A

The process of using fields within the source data itself, rather than predetermined formats, to collect, organise and analyse large sets of data to discover patterns and other useful information which an organisation can use for its future business decisions.

85
Q

What are the four types of data analytics, and what do they address? 4

A
  1. Descriptive analytics: Addresses ‘What has happened?’ (e.g., How did sales change when the price changed?).
  2. Diagnostic analytics: Addresses ‘Why has this happened?’ (e.g., Why did sales decrease when the price was lowered?).
  3. Predictive analytics: Addresses ‘What if this happens in the future?’ (e.g., What will happen if we revert the price?).
  4. Prescriptive analytics: Addresses ‘What next?’ (e.g., What is the best course of action, such as determining a future pricing strategy?).
86
Q

Data Analytics - Descriptive Analytics

A

What: has happened? For example, how did sales change when the price changed?

87
Q

Data Analytics - Diagnostic Analytics

A

Why: has this happened? For example, why did sales go down when the price was lowered?

88
Q

Data Analytics - Predictive Analytics

A

What if: this happens in future? For example, what will happen if we change the price back?

89
Q

Data Analytics - Prescriptive Analytics

A

What next: is the best course of action? For example, determining a future pricing strategy.

90
Q

What are the key benefits of big data, data science, and data analytics? 6

A
  1. Enhanced transparency.
  2. Performance improvement.
  3. Market segmentation and customization.
  4. Improved decision-making.
  5. Encourages innovation.
  6. Enables risk management.
91
Q

What are the key risks of big data, data science, and data analytics? 6

A
  1. Running out of storage space.
  2. Requiring greater skill from the workforce.
  3. Becoming too dependent on data.
  4. Information overload.
  5. Breaching data privacy legislation.
  6. Breach of cybersecurity.
92
Q

How can entities protect commercially sensitive information?

A

Entities can protect commercially sensitive information through intellectual property (IP) laws.

93
Q

What is copyright protection?

A

Copyright provides automatic protection for written, dramatic, musical and artistic work. It lasts for 70 years from the author’s death. The layout of published editions is protected for 25 years from publication.

94
Q

What is a patent, and how long does it last?

A

A patent protects inventions and products. It must be applied for and granted, lasting 20 years.

95
Q

What is a design right?

A

A design right provides automatic protection over a design. It lasts for: 15 years after creation, or 10 years from when it is sold, whichever comes first.

96
Q

What is a registered design, and how long does it last?

A

A registered design protects designs for longer than a design right. It must be applied for and granted, lasting 25 years.

97
Q

What is a trademark, and how long does it last?

A

A trademark protects product names, jingles, and logos. It must be applied for and granted, lasting 10 years.

98
Q

What is data ethics?

A

Data ethics refers to the ethical issues arising from the collection and analysis of data, especially personal data about individuals.

99
Q

What are the key ethical issues in data ethics? 6 ACRONYM: FOTCOP

A
  1. Fairness: Avoiding discrimination in the collection, storage, and analysis of data.
  2. Ownership of Data: Clarifying who owns data and whether it can be sold.
  3. Transparency: Ensuring the use of data given to entities is clear.
  4. Consent: Ensuring individuals understand how their data is used and the implications.
  5. Open Data: Advocating that data should be publicly available for societal benefit.
  6. Privacy: Ensuring information is collected only with consent.
100
Q

FOTCOP - Ethical issues in data ethics

A
  1. Fairness: Avoiding discrimination in the collection, storage, and analysis of data.
  2. Ownership of Data: Clarifying who owns data and whether it can be sold.
  3. Transparency: Ensuring the use of data given to entities is clear.
  4. Consent: Ensuring individuals understand how their data is used and the implications.
  5. Open Data: Advocating that data should be publicly available for societal benefit.
  6. Privacy: Ensuring information is collected only with consent.