Presenting Categorical Data Flashcards

1
Q

What is E.D.A.?

A

EDA, or Exploratory Data Analysis, is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. It’s an essential first step in data analysis, enabling you to understand the data’s structure, spot anomalies, identify patterns, and form initial insights before applying more complex statistical models.

2
Q

What are the key aspects of EDA?

A
  1. Data Visualization: Using charts, graphs, and plots to visualize data. This helps in understanding data distribution, detecting outliers, and identifying relationships between variables.
  2. Summary Statistics: Calculating measures like mean, median, mode, variance, and standard deviation to get a sense of the central tendency and dispersion of the data.
  3. Identifying Data Quality Issues: Detecting missing values, incorrect data, and inconsistencies that may affect the analysis.
  4. Hypothesis Generation: EDA helps generate hypotheses about the data that can be tested using more formal statistical techniques later.
3
Q

What are outliers?

A

Outliers are data points that significantly differ from the other observations in a dataset. They are values that are either much higher or much lower than the rest of the data. Outliers can occur due to variability in the data or might indicate measurement errors, data entry mistakes, or unusual conditions.

4
Q

What is the impact of outliers on analysis?

A

Outliers can skew statistical analyses, such as the mean or standard deviation, leading to misleading results.
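
As a rough illustration of this effect, the sketch below compares summary statistics for a small made-up sample with and without one extreme value. The data are hypothetical and chosen only to make the shift visible.

```python
import statistics

# Hypothetical data: five typical values, then the same values plus one extreme value.
typical = [10, 12, 11, 13, 12]
with_outlier = typical + [100]

for label, values in [("without outlier", typical), ("with outlier", with_outlier)]:
    mean = statistics.mean(values)
    median = statistics.median(values)
    spread = statistics.stdev(values)  # sample standard deviation
    print(f"{label}: mean={mean:.2f}, median={median}, stdev={spread:.2f}")
```

The mean and standard deviation move a lot when the extreme value is added, while the median stays at 12.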

5
Q

How can outliers be identified?

A

Outliers can be identified through various methods, including visual inspections (like box plots) and statistical tests (such as Z-scores).
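
A minimal sketch of two such checks follows. The cutoffs used here (a Z-score above 3, and the 1.5 × IQR fences that box plots commonly use) are widespread conventions assumed for illustration, not values stated on this card, and the data are made up.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Return values whose absolute Z-score exceeds the (assumed) threshold."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return [v for v in values if abs(v - mean) / sd > threshold]

def iqr_outliers(values, k=1.5):
    """Return values outside the box-plot fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical sample: fifteen values between 10 and 13, plus one extreme value.
values = [10, 12, 11, 13, 12, 11, 10, 13, 12, 11, 12, 10, 13, 11, 12, 100]
print(zscore_outliers(values))
print(iqr_outliers(values))
```

Note that in very small samples a single extreme value inflates the standard deviation so much that its own Z-score may never reach 3, which is one reason the two checks can disagree.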

6
Q

How can outliers be dealt with?

A

Depending on the context, outliers may be excluded, transformed, or analyzed separately to avoid distortion of the results.

7
Q

How can nominal data be analyzed and presented?

A
  • Frequency Tables: To show the number of occurrences for each category.
  • Relative Frequency Tables: To compare the proportion of each category.
  • Visual Tools: Such as pie charts, pareto charts and bar charts, to make data easier to understand.
8
Q

What is a Frequency Table?

A

A frequency table is a simple way to organize data to show the frequency, or count, of different categories or classes. It is particularly useful for summarizing categorical data, where each row represents a category and the corresponding count is listed in the adjacent column.
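
A frequency table of this kind can be sketched in a few lines of Python. The survey responses below are hypothetical, and `collections.Counter` is just one convenient way to do the counting.

```python
from collections import Counter

# Hypothetical survey responses: how each person travels to work.
responses = ["Bus", "Car", "Bus", "Walk", "Car", "Bus"]

# Count how often each category occurs.
frequency_table = Counter(responses)

# Print one row per category, most frequent first.
print(f"{'Category':<10}{'Count':>5}")
for category, count in frequency_table.most_common():
    print(f"{category:<10}{count:>5}")
```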

9
Q

What are relative frequencies and percentages, and what are they used for?

A

Relative frequencies (proportions) and percentages express each category’s count as a share of the total. Converting counts to them makes comparisons easier, especially across different datasets.

10
Q

How is relative frequency calculated?

A
  • Relative frequency = frequency of category / n, where n is the total number of observations.
  • Percentage is simply the relative frequency multiplied by 100.
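
The two formulas above can be sketched as follows. The function name `relative_frequencies` and the sample observations are hypothetical, used only to show the arithmetic.

```python
from collections import Counter

def relative_frequencies(observations):
    """Map each category to (frequency of category) / n."""
    counts = Counter(observations)
    n = len(observations)  # total number of observations
    return {category: count / n for category, count in counts.items()}

# Hypothetical observations with three categories.
observations = ["A", "B", "A", "C", "A", "B", "A", "C"]

rel = relative_frequencies(observations)
# A percentage is the relative frequency multiplied by 100.
percentages = {category: value * 100 for category, value in rel.items()}

print(rel)
print(percentages)
```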
11
Q

Why are frequency tables important and where are they used?

A

Frequency tables are important because they:

  • Simplify Data: They provide a clear and concise way to organize and summarize categorical data.
  • Enable Comparison Across Groups: Relative frequencies and percentages allow for easier comparison across groups of different sizes.
  • Serve as a Basis for Visualizations: They form the foundation for creating visual representations of data, like bar charts and pie charts, which further enhance understanding.
12
Q

What are the uses of pie charts and bar charts?

A
  • Pie Charts: Used to show the proportion of different categories within a whole, making it easy to see how each part compares to the total.
  • Bar Charts: Useful for comparing multiple categories side by side, without implying any order.
13
Q

What are Production costs?

A

Production costs refer to the expenses incurred by a business or organization in the process of manufacturing a product or providing a service. These costs are essential for determining the overall cost of producing goods and are critical in setting the selling price and determining profitability.

14
Q

What are the different types of production costs?

A
  • Direct costs
  • Indirect costs
  • Fixed costs
  • Variable costs
  • Semi-variable costs
15
Q

What is a pie chart?

A

A pie chart is a circular graphic divided into slices to represent numerical proportions. Each slice corresponds to a category, and the size of the slice is proportional to the quantity it represents.

16
Q

When should a pie chart be used?

A
  • Visually represent the relative proportions or percentages of different categories within a whole dataset.
  • Compare parts of a whole quickly and easily.
  • Highlight the dominance of a particular category when there are a limited number of categories.
17
Q

What are the key components of a pie chart?

A
  • Title: Indicates what the chart represents.
  • Slices: Represent each category, sized according to its proportion of the total.
  • Labels: Provide the category name and percentage for each slice.
  • Legend: Explains the colors or patterns used for different slices.
  • Colors/Patterns: Differentiate between categories visually.
18
Q

How do you create a pie chart?

A
  1. Categorize the data
  2. Calculate the total of all values
  3. Divide each category’s value by the total
  4. Convert each result into a percentage
  5. Finally, calculate the degrees for each slice

The total value of the pie is always 100%.

19
Q

What is the pie chart formula?

A

Sector angle = (given data / total value of data) × 360°

This formula gives the angle of each pie sector.
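
As a quick check of the formula, the sketch below computes sector angles for a made-up survey of 40 students. The function name `sector_angles` and the category values are assumptions for illustration.

```python
def sector_angles(values):
    """Return each category's angle: (value / total) * 360 degrees."""
    total = sum(values.values())
    return {category: value / total * 360 for category, value in values.items()}

# Hypothetical survey: favourite sport of 40 students.
sports = {"Football": 10, "Cricket": 20, "Tennis": 10}

angles = sector_angles(sports)
for category, angle in angles.items():
    print(f"{category}: {angle:.1f} degrees")
```

Because every value is divided by the same total, the angles always add up to 360°.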

20
Q

How to convert values into percentages?

A

Once you get the total of the values, you divide the individual value by the total and multiply by 100 to get a percentage.

Example: If 10 out of 40 students like playing football, the percentage is (10/40) × 100 = 25%.

21
Q

What are the advantages of using pie charts?

A
  • Easy Interpretation: Quickly shows the proportionate relationships between categories.
  • Visual Appeal: Attractive and engaging presentation of data.
  • Simplicity: Straightforward to construct and understand, especially for small datasets.
  • Immediate Impact: Effective for highlighting dominant categories or significant differences.
22
Q

What are the disadvantages and limitations of pie charts?

A
  • Ineffective with Many Categories: Becomes cluttered and hard to read when there are too many slices.
  • Difficult for Exact Comparisons: Not ideal for precise comparisons, especially when differences are slight.
  • Misleading Perspectives: 3D effects and exploding slices can distort perception.
  • Area Misinterpretation: Human eyes find it challenging to judge areas and angles accurately, leading to potential misinterpretation.
  • Static Representation: Not effective for showing changes over time or complex relationships.
23
Q

What are bar charts used for in presenting categorical data?

A

Bar charts are commonly used for presenting categorical data because they provide a simple and clear visual representation of data. The bars can be either vertical or horizontal, and they are used to show the frequency or percentage of different categories. The length of each bar corresponds to the value of each category. Bar charts are especially useful when comparing multiple groups.

24
Q

How does Excel treat bar charts?

A

In Excel, what most people refer to as “bar charts” are called “column charts,” where the bars are drawn vertically. Excel reserves the name “bar chart” for charts with horizontal bars, which it treats in the same way with the axes swapped. Excel automates the creation of these charts and provides options to customize the colors, labels, and format to suit the user’s needs.

25
Q

Why should the bars in a bar chart not touch each other?

A

The bars in a bar chart should not touch because bar charts represent categorical data, which has distinct categories. If the bars touch, it can misleadingly suggest a relationship or continuity between the categories, which isn’t appropriate for nominal data where the categories are unrelated.

26
Q

What is an advantage of using bar charts over pie charts?

A

One major advantage of using bar charts over pie charts is that bar charts allow for easy comparison of multiple groups simultaneously through side-by-side comparisons.

27
Q

What are some potential problems with bar charts?

A

One common problem with bar charts is the addition of visual enhancements like 3D effects, which may make the chart more visually appealing but harder to interpret accurately. For instance, a 3D effect can distort the lengths of the bars, making it difficult to estimate the actual values they represent.

28
Q

What is a Pareto Chart?

A

A Pareto Chart is a type of bar chart that displays categories in descending order of importance or frequency, showing which factors have the biggest impact. It is used to illustrate the Pareto principle.

29
Q

What is the Pareto principle?

A

The Pareto Principle, or 80/20 rule, suggests that 80% of effects come from 20% of causes. For example, 80% of problems might come from 20% of issues.

It is named after the Italian economist Vilfredo Pareto (1848-1923).

30
Q

How do you create a Pareto Chart in Excel?

A

In Excel, there are two main methods to create a Pareto Chart:

  • Manually sorting: Start by creating a frequency table and then sorting it from highest to lowest frequency.
  • Using Excel’s Pareto option: Excel provides a built-in Pareto chart tool under the Statistics charts option, which simplifies the process.
31
Q

What kind of information does a Pareto Chart provide?

A

A Pareto Chart offers both a visual breakdown of categories (through bars) and the cumulative effect of these categories (through a line graph). For example, it helps highlight how much each category contributes to a total, making it easier to identify the most critical categories or issues that require attention.
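
The numbers behind such a chart can be sketched without any plotting: sort the categories by frequency (the bars) and accumulate their share of the total (the line). The function name `pareto_table` and the complaint counts below are hypothetical.

```python
def pareto_table(counts):
    """Return (category, count, cumulative %) rows in descending order of count."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    rows = []
    running = 0
    for category, count in ordered:
        running += count  # cumulative count so far
        rows.append((category, count, 100 * running / total))
    return rows

# Hypothetical complaint counts, deliberately listed out of order.
complaints = {"Wrong item": 15, "Late delivery": 50, "Other": 5, "Damaged item": 30}

rows = pareto_table(complaints)
for category, count, cumulative in rows:
    print(f"{category:<15}{count:>5}{cumulative:>8.1f}%")
```

In this made-up example the first two categories already account for 80% of all complaints, which is the kind of concentration a Pareto chart is meant to expose.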

32
Q

What are some limitations of a Pareto Chart in Excel?

A
  • The bars in the chart are touching, which might imply continuity in data when it’s categorical.
  • The line connecting data points suggests continuity, but the categories are discrete, which may mislead viewers.
  • Excel does not automatically generate a chart title from the data, and the line graph values might be incorrectly joined, suggesting continuous data.
33
Q

How can you fix the issues with bars touching and lines connecting in Excel’s Pareto Chart?

A
  • Increase the gap between the bars to emphasize that they represent discrete categories.
  • Remove the connecting line in the line graph to avoid implying continuity between the categories.
34
Q

Is it recommended to use Pareto charts for ordinal data?

A

No, Pareto charts are not typically useful for ordinal data. A Pareto chart re-sorts the categories by frequency, which breaks their natural order, whereas ordinal data should be plotted while maintaining the correct order of categories.

35
Q

What problem might arise when sorting ordinal data in Excel?

A

Excel can mistakenly sort ordinal data alphabetically instead of by its logical order.

36
Q

How can you ensure proper sorting of ordinal data in Excel?

A

Two methods can help:
1) Creating a custom list to sort by a specific order.
2) Using coding, where each category is assigned a numerical code to preserve the correct sequence when sorting.
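
The coding idea is not specific to Excel. As a rough sketch in Python, with hypothetical satisfaction levels and codes, sorting by the numeric code restores the logical order that alphabetical sorting loses.

```python
# Hypothetical coding scheme: each ordinal category gets a numeric code.
SATISFACTION_ORDER = {"Low": 1, "Medium": 2, "High": 3}

responses = ["High", "Low", "Medium", "Low", "High"]

# Plain sorting is alphabetical, which puts "High" before "Low".
alphabetical = sorted(responses)

# Sorting by the numeric code preserves the logical sequence.
logical = sorted(responses, key=SATISFACTION_ORDER.get)

print(alphabetical)
print(logical)
```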

37
Q

What are cross-tabulations?

A

Cross-tabulations, also known as contingency tables, are used to examine the relationship between two or more categorical variables. They help in organizing and summarizing data, making it easier to identify patterns and relationships.

38
Q

Why use Cross-Tabulations?

A
  • Identify Relationships Between Variables: Cross-tabulations help in identifying potential associations between two variables, such as how one variable might affect another.
    Example: Analyzing whether gender (male/female) is related to the type of car a person prefers (SUV/sedan).
  • Compare Group Distributions: By displaying the frequency of various outcomes across different categories, they allow you to compare the distribution of one variable across the levels of another variable.
    Example: Comparing customer satisfaction levels (high, medium, low) across different regions (North, South, East).
  • Spot Trends and Patterns: Cross-tabulations make it easier to spot trends, patterns, or anomalies within the dataset by organizing data into an easy-to-read table.
39
Q

How are cross-tabulations constructed?

A
  • The table rows represent one variable, and the columns represent another.
  • Each cell in the table represents the count or frequency of occurrences for the intersection of those categories.
  • Row and column totals (marginal distributions) provide the total frequencies for each category independently.
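
The construction described above can be sketched as follows, using a handful of made-up (region, satisfaction) records. The variable names are hypothetical, and `collections.Counter` is only one way to tally the cells.

```python
from collections import Counter

# Hypothetical records: (region, satisfaction level) for six customers.
records = [
    ("North", "High"), ("North", "Low"), ("South", "High"),
    ("South", "High"), ("East", "Low"), ("North", "High"),
]

cell_counts = Counter(records)                       # one count per (row, column) pair
row_totals = Counter(region for region, _ in records)
col_totals = Counter(level for _, level in records)

rows = sorted(row_totals)
cols = sorted(col_totals)

# Header, one line per row category, then the column totals.
print(f"{'':<8}" + "".join(f"{c:>8}" for c in cols) + f"{'Total':>8}")
for r in rows:
    cells = "".join(f"{cell_counts[(r, c)]:>8}" for c in cols)
    print(f"{r:<8}{cells}{row_totals[r]:>8}")
print(f"{'Total':<8}" + "".join(f"{col_totals[c]:>8}" for c in cols) + f"{len(records):>8}")
```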
40
Q

Why would you want to convert cross-tabulation data into percentages?

A

Cross-tabulation tables can be enhanced by adding row percentages and column percentages, which help in making comparisons between categories.

41
Q

What are row percentages?

A

Row percentages are computed relative to the total of each row.

Row percentage = (cell value / row total) × 100

42
Q

What do row percentages show?

A

Row percentages show how the second variable is distributed within each category of the first variable (e.g., the distribution of men and women across different organizations).

43
Q

What are column percentages?

A

Column percentages are calculated based on the total number of occurrences in each column.

Column percentage = (cell value / column total) × 100
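
To see how column percentages differ from row percentages on the same table, here is a minimal sketch. The table values, the organization names, and both function names are hypothetical.

```python
# Hypothetical cross-tabulation: rows are organizations, columns are gender.
table = {
    "Org A": {"Men": 30, "Women": 20},
    "Org B": {"Men": 10, "Women": 40},
}

def row_percentages(table):
    """(cell value / row total) * 100 for every cell."""
    result = {}
    for row, cells in table.items():
        row_total = sum(cells.values())
        result[row] = {col: 100 * value / row_total for col, value in cells.items()}
    return result

def column_percentages(table):
    """(cell value / column total) * 100 for every cell."""
    col_totals = {}
    for cells in table.values():
        for col, value in cells.items():
            col_totals[col] = col_totals.get(col, 0) + value
    return {
        row: {col: 100 * value / col_totals[col] for col, value in cells.items()}
        for row, cells in table.items()
    }

print(row_percentages(table))
print(column_percentages(table))
```

With these made-up numbers, 60% of Org A’s staff are men (a row percentage), while 75% of all men work at Org A (a column percentage). The same cell answers two different questions.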

44
Q

How else can data in a cross-tabulation be represented?

A

It is possible to represent the data in a cross-tabulation in bar chart form by using a component bar chart (which Excel calls a stacked column chart).

45
Q

What are the limitations of cross tabulations?

A

Cross-tabulations allow us to explore the relationship between two categorical variables, but sometimes relationships can be even more complicated. There may be more levels

46
Q

What are Sunburst Charts and why are they used?

A

Sunburst charts are a visual tool used to represent hierarchical data. They are useful for breaking down data into multiple layers or categories, making it easier to see the relationships and subcategories within the data.

47
Q

How are Sunburst Charts structured?

A

Sunburst charts have a circular layout, with inner rings representing main categories and outer rings representing subcategories. Each layer expands from the center outward, showing subdivisions at each level of the hierarchy.

48
Q

Why are Sunburst Charts easier to understand than other charts?

A

They simplify the visualization of complex hierarchical structures, allowing users to quickly grasp relationships between different categories and subcategories by showing the data in an intuitive, visual format.

49
Q

How should the data be structured for creating a Sunburst Chart?

A

The data should be structured with the main category listed first, followed by its subcategories. A third column should include the frequency or count of each subcategory.
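
A rough sketch of that three-column layout is shown below with made-up categories and counts. It also sums the subcategory counts per main category, on the assumption that each inner-ring segment is sized by the total of its subcategories.

```python
# Hypothetical data in the layout described: main category, subcategory, count.
rows = [
    ("Fruit", "Apple", 30),
    ("Fruit", "Banana", 20),
    ("Vegetable", "Carrot", 25),
    ("Vegetable", "Pea", 25),
]

# Outer ring: one segment per subcategory row.
# Inner ring (assumed): one segment per main category, sized by the sum of its rows.
inner_ring = {}
for main, sub, count in rows:
    inner_ring[main] = inner_ring.get(main, 0) + count

print(inner_ring)
```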

50
Q

What are the steps to create a Sunburst Chart in Excel?

A
  1. Highlight the relevant data (e.g., columns with categories and frequencies).
  2. Go to the Hierarchy Charts menu in Excel.
  3. Select the Sunburst Chart option.
51
Q

What are the limitations of Sunburst Charts?

A
  • Label Visibility: Labels might overlap or be cut off if the chart is small.
  • No Frequency Display: Unlike pie charts, sunburst charts do not automatically display frequencies or percentages on the plot, which limits detailed numeric analysis.