Presenting Categorical Data Flashcards
What is EDA?
EDA, or Exploratory Data Analysis, is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. It’s an essential first step in data analysis, enabling you to understand the data’s structure, spot any anomalies, identify patterns, and form initial insights before applying more complex statistical models.
What are the key aspects of EDA?
- Data Visualization: Using charts, graphs, and plots to visualize data. This helps in understanding data distribution, detecting outliers, and identifying relationships between variables.
- Summary Statistics: Calculating measures like mean, median, mode, variance, and standard deviation to get a sense of the central tendency and dispersion of the data.
- Identifying Data Quality Issues: Detecting missing values, incorrect data, and inconsistencies that may affect the analysis.
- Hypothesis Generation: EDA helps generate hypotheses about the data that can be tested using more formal statistical techniques later.
What are outliers?
Outliers are data points that significantly differ from the other observations in a dataset. They are values that are either much higher or much lower than the rest of the data. Outliers can occur due to variability in the data or might indicate measurement errors, data entry mistakes, or unusual conditions.
What is the impact of outliers on analysis?
Outliers can distort summary statistics such as the mean and standard deviation, which in turn can make the results of an analysis misleading.
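A small sketch of this effect, using made-up values chosen only for illustration: adding one extreme value pulls the mean upward, while the median stays where it was.

```python
from statistics import mean, median

# Hypothetical values, invented for this illustration only.
values = [10, 12, 11, 13, 12]
with_outlier = values + [100]  # same data plus one extreme value

# The mean moves a lot once the outlier is included...
print("mean without outlier:", mean(values))
print("mean with outlier:   ", round(mean(with_outlier), 1))

# ...while the median is unchanged for this particular data set.
print("median without outlier:", median(values))
print("median with outlier:   ", median(with_outlier))
```

This is one reason the median is often reported alongside the mean when outliers are suspected.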
How can outliers be identified?
Outliers can be identified through various methods, including visual inspection (such as box plots) and statistical measures (such as Z-scores).
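The Z-score approach could be sketched as follows. The function names, the sample data, and the cutoff of 2 standard deviations are assumptions made for this example; cutoffs of 2 or 3 are common conventions, not fixed rules.

```python
from statistics import mean, pstdev

def z_scores(data):
    """Return the Z-score of each value: (value - mean) / standard deviation."""
    mu = mean(data)
    sigma = pstdev(data)  # population standard deviation
    if sigma == 0:
        # All values are identical, so nothing deviates from the mean.
        return [0.0 for _ in data]
    return [(x - mu) / sigma for x in data]

def flag_outliers(data, threshold=2.0):
    """Return the values whose absolute Z-score exceeds the threshold.

    The default threshold of 2.0 is an assumption for this sketch.
    """
    return [x for x, z in zip(data, z_scores(data)) if abs(z) > threshold]

# Hypothetical sample, invented for illustration.
sample = [10, 12, 11, 13, 12, 11, 12, 10, 13, 50]
outliers = flag_outliers(sample)
print(outliers)
```

In a small sample, a single extreme value also inflates the standard deviation, so its Z-score can be smaller than expected; that is why a lower cutoff is used here.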
How can outliers be dealt with?
Depending on the context, outliers may be excluded, transformed, or analyzed separately to avoid distortion of the results.
How can nominal data be analyzed and presented?
- Frequency Tables: To show the number of occurrences for each category.
- Relative Frequency Tables: To compare the proportion of each category.
- Visual Tools: Such as pie charts, Pareto charts, and bar charts, to make the data easier to understand.
What is a Frequency Table?
A frequency table is a simple way to organize data to show the frequency, or count, of different categories or classes. It is particularly useful for summarizing categorical data, where each row represents a category and the corresponding count is listed in the adjacent column.
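As a minimal sketch, a frequency table can be built by counting how often each category appears. The survey responses below are hypothetical, and `Counter` from Python's standard library is just one convenient way to do the counting.

```python
from collections import Counter

# Hypothetical survey responses: how each person travels to work.
responses = ["Bus", "Car", "Bus", "Walk", "Car", "Bus", "Bike", "Bus"]

# Count the occurrences of each category.
freq = Counter(responses)

# Print one row per category, most frequent first.
print(f"{'Category':<10}{'Count':>5}")
for category, count in freq.most_common():
    print(f"{category:<10}{count:>5}")
```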
What are relative frequencies and percentages, and what are they used for?
Relative frequencies (proportions) and percentages express each category's count as a share of the total. Converting counts this way makes comparisons easier, especially across datasets of different sizes.
How is relative frequency calculated?
- Relative frequency = frequency of category / n, where n is the total number of observations.
- Percentage is simply the relative frequency multiplied by 100.
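The two calculations above can be sketched in a few lines. The function name `relative_frequency_table` and the sports data are hypothetical, chosen only to illustrate the arithmetic.

```python
from collections import Counter

def relative_frequency_table(observations):
    """Map each category to (count, relative frequency, percentage)."""
    counts = Counter(observations)
    n = sum(counts.values())  # total number of observations
    return {
        category: (count, count / n, 100 * count / n)
        for category, count in counts.items()
    }

# Hypothetical data: favourite sport of 40 students.
sports = ["Football"] * 10 + ["Cricket"] * 18 + ["Tennis"] * 12
table = relative_frequency_table(sports)

for category, (count, rel, pct) in table.items():
    print(f"{category:<10}{count:>4}{rel:>8.2f}{pct:>8.1f}%")
```

The relative frequencies add up to 1 and the percentages to 100, which is a quick check that no category was missed.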
Why are frequency tables important and where are they used?
Frequency tables are important because they:
- Simplify Data: They provide a clear and concise way to organize and summarize categorical data.
- Enable Comparison Across Groups: Relative frequencies and percentages allow for easier comparison across groups of different sizes.
- Serve as a Basis for Visualizations: They form the foundation for creating visual representations of data, like bar charts and pie charts, which further enhance understanding.
What are the uses of pie charts and bar charts?
- Pie Charts: Used to show the proportion of different categories within a whole, making it easy to see how each part compares to the total.
- Bar Charts: Useful for comparing multiple categories side by side, without implying any order.
What are production costs?
Production costs refer to the expenses incurred by a business or organization in the process of manufacturing a product or providing a service. These costs are essential for determining the overall cost of producing goods and are critical in setting the selling price and determining profitability.
What are the different types of production costs?
- Direct costs
- Indirect costs
- Fixed costs
- Variable costs
- Semi-variable costs
What is a pie chart?
A pie chart is a circular graphic divided into slices to represent numerical proportions. Each slice corresponds to a category, and the size of the slice is proportional to the quantity it represents.
When should a pie chart be used?
- Visually represent the relative proportions or percentages of different categories within a whole dataset.
- Compare parts of a whole quickly and easily.
- Highlight the dominance of a particular category when there are a limited number of categories.
What are the key components of a pie chart?
- Title: Indicates what the chart represents.
- Slices: Represent each category, sized according to its proportion of the total.
- Labels: Provide the category name and percentage for each slice.
- Legend: Explains the colors or patterns used for different slices.
- Colors/Patterns: Differentiate between categories visually.
How do you create a pie chart?
- Categorize the data.
- Calculate the total of all values.
- Divide each category's value by the total.
- Convert the results into percentages.
- Finally, calculate the angle in degrees for each slice.
The total value of the pie is always 100%.
What is the pie chart formula?
Angle of a sector = (Given Data / Total Value of Data) × 360°
This formula gives the angle of each pie sector.
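A short sketch of the angle calculation, assuming the data is given as category counts. The function name `pie_angles` and the counts are hypothetical.

```python
def pie_angles(values):
    """Return the sector angle in degrees for each category.

    Applies (value / total) x 360 to every category.
    """
    total = sum(values.values())
    if total == 0:
        raise ValueError("total must be greater than zero")
    # Multiply before dividing to keep the arithmetic exact for whole counts.
    return {category: value * 360 / total for category, value in values.items()}

# Hypothetical counts: favourite sport of 40 students.
counts = {"Football": 10, "Cricket": 18, "Tennis": 12}
angles = pie_angles(counts)

for category, angle in angles.items():
    print(f"{category}: {angle:.1f} degrees")
```

The angles of all sectors should add up to 360 degrees, just as the percentages add up to 100.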
How do you convert values into percentages?
Once you have the total of the values, divide each individual value by the total and multiply by 100 to get its percentage.
Example: If 10 out of 40 students like playing football, the percentage is (10/40) × 100 = 25%.