Data Insights Flashcards
DI notes
> the goal of the Data Insights section is to test your ability to MAKE SENSE OF COMPLICATED DATA efficiently --> identify which data you need to answer the question
READ THE WORDING CAREFULLY
> even though you will have an on-screen calculator, ESTIMATION is an important skill to master for DI questions that use the words “approximately”, “nearest to” or “closest to” --> before resorting to the calculator, see if you can solve the problem in a smarter way
> in fact, it IS a SOUND STRATEGY to estimate your way to the correct answer whenever possible in Data Insights
> not all DI questions require calculations (even quantitative ones) --> some can be answered simply by interpreting the information provided
> be comfortable interacting with DI questions displayed on the computer screen
e.g., multiple tabs for Multi-Source Reasoning; the sort function for Table Analysis
> Develop a smart timing strategy (e.g., spending about 1 min on Graphics Qs and saving more time for complex questions)
> it is still better to FINISH all questions, guessing where necessary, than to leave answers blank
> keep in mind that many DI questions involve multiple parts --> ALL parts must be answered correctly (no partial credit)
> watch out for EXTRANEOUS INFO in DI questions that isn’t needed to answer the question --> USE THE ANSWER CHOICES as a guide
Graphics Interpretation
What: select the correct answer from a DROP-DOWN LIST based on information presented in a GRAPH or CHART
> typically TWO questions / statements
> “From each drop-down menu, select the option that creates the most accurate statement based on the information provided”
Types of charts:
> bar charts
> scatterplots **
> column charts / Pareto charts
> cluster charts
> stacked column charts ***
> bubble diagrams
> Venn diagrams
> Other DIAGRAMS with symbols
> Flow charts
> Frequency tables and histograms
> line graphs **
> Concept maps
> Pie charts
> hybrid charts with a double axis
etc.
Most often, graphics interpretation questions are QUANTITATIVE
> they ask about probability, slope of a line, direct and inverse variation, averages, ratios, standard deviation, mean, median, mode, range, percent of, and percent change
Strategy for Solving:
(1) Summarize the high-level message of the graph + textual information (“this graph shows the number of mangoes sold during each ten-year period from 1980 to 2020.”) as a “simple story”
> no need to get caught up in the details yet
(2) Go to the sentences BEFORE the DETAILS (you might not need all the info)
(3) Pay attention to DETAILS of the graphic and its components (title, units of measurement, axes, axis titles, axis GRIDLINES, legend, colours, SYMBOLS, patterns)
> READ the labels
> Symbols might be used to represent numbers
> be careful of y-axis scales that DON’T start at 0: they can make relative sizes misleading, so avoid drawing conclusions about values from visual comparisons and use the actual values instead
> some visual estimates require reading values precisely from the chart --> add imaginary gridlines if you must! And compare against other values in the chart (e.g., the max value for Range calcs)
Notes:
> when you see estimation markers like “approximately”, “nearest to”, “closest to” --> solve the question using savvy and disciplined estimation
> another trigger for estimation is when the ANSWER OPTIONS are SPREAD RELATIVELY FAR apart
> when a chart has NO numerical scale on its y axis --> you can only make RELATIVE comparisons across categories (e.g., one value is greater than or less than another value) --> you cannot say anything about absolute differences or ratios of values
ESTIMATION STRATEGIES:
> Division --> round to easy DECIMALS (NOT always the nearest integer), then set the fraction = x so you can cross-multiply
e.g., 6.25/0.62 = x --> ≈ 6.2/0.62 = x --> 6.2 = 0.62x --> x ≈ 10
> you can also write down the necessary info from charts in a TABLE (helpful for calculating percent change)
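A quick Python check of the division estimate above and of the percent-change formula (there is no code on test day; this only confirms the arithmetic). The chart values in `sales` are hypothetical, and `percent_change` is an illustrative helper, not part of any exam tool.

```python
def percent_change(old, new):
    """Percent change from old to new: (new - old) / old * 100."""
    return (new - old) / old * 100

# Division estimate from the notes: rounding 6.25 to 6.2 makes 0.62 divide cleanly.
exact = 6.25 / 0.62
estimate = 6.2 / 0.62
print(round(exact, 2))     # 10.08
print(round(estimate, 2))  # 10.0

# Hypothetical values read off a chart and written into a small table.
sales = {2010: 40, 2020: 50}
change = percent_change(sales[2010], sales[2020])
print(change)  # 25.0
```

The estimate lands within about 1% of the exact quotient, which is usually enough when the answer choices are spread far apart.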
Graphics Interpretation: Column and Pareto charts
Category on x axis, and frequency or relative frequency on y axis
Questions focus on comparing RELATIVE heights of vertical bars
See word doc
Watch out for:
> y axes that don’t start at 0 --> cannot rely on visual comparison
> y axes with no numerical values at all --> cannot determine actual values or ratios of values; can only tell what is bigger/biggest vs smaller/smallest
e.g., the unlabeled gridlines could mean 1, 2, 3 OR 101, 102, 103 (the ratio between two bars would be very different)
> HOWEVER, if we are told that the axis starts at 0 and the increments are CONSTANT (same difference between gridlines) --> then we can determine RATIOS of values on the chart
Try: divide one line’s value by a lower line’s value --> check whether this ratio stays constant as you vary the assumed values
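The gridline reasoning above can be sketched in Python, assuming two hypothetical bars, one reaching the 3rd gridline and one reaching the 1st. `bar_value` and `ratio` are illustrative helpers: the axis begins at `start` and rises by `step` per gridline.

```python
def bar_value(start, step, gridline):
    """Value at a gridline when the axis begins at `start` and rises by `step` per gridline."""
    return start + step * gridline

def ratio(start, step):
    """Ratio of the bar at gridline 3 to the bar at gridline 1."""
    return bar_value(start, step, 3) / bar_value(start, step, 1)

# Axis starts at 0 with constant steps: the ratio is the same whatever the step is.
print([ratio(0, step) for step in (1, 5, 100)])  # [3.0, 3.0, 3.0]

# Axis starts elsewhere (gridlines 101, 102, 103): the ratio changes completely.
print(round(ratio(100, 1), 3))  # 1.02
```

Because the ratio only stays fixed when `start` is 0, an unlabeled axis supports ratio claims only if the question guarantees a zero baseline and constant increments.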
Graphics Interpretation: Stacked column charts **
Allows us to compare relative frequencies within a single column (% split) and nicely illustrates the SUM of the series of numbers
Be careful when calculating the PARTS
e.g., if A + B = total
Then A = Total - B
Also, don’t get overwhelmed by too much information in the CHART AND text --> keep track of what you need to solve the question
Other tips:
> for questions about what proportion of a bar a series represents, you can shorten the list of potential answers with a VISUAL APPROACH --> does the series represent more or less than 50% of the bar?
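A small Python sketch of reading PARTS from a stacked column, using a hypothetical column where series A runs from 0 to 50 and series B sits on top from 50 to 80. The point it illustrates: a segment's value is its height, not the y value at its top.

```python
# Hypothetical stacked column read off the y axis.
a_top = 50   # series A runs from 0 up to 50
b_top = 80   # series B sits on top, from 50 up to 80 (= top of the whole column)

total = b_top
b = b_top - a_top   # B's value is its height, NOT the y value at its top
a = total - b       # A = Total - B

print(a, b)             # 50 30
print(b / total > 0.5)  # False -> B is less than half the bar
```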
Graphics Interpretation: Histograms and frequency tables
Represent FREQUENCY (count) of certain INDIVIDUAL VALUES or RANGES OF VALUES
Cumulative Frequency => helpful for “at least” or “at most” questions (sum of multiple categories)
e.g., How many attorneys ate at least 4 pieces of fruit per day? = number of attorneys who ate 4 pieces + number who ate 5 pieces + number who ate 6 pieces
e.g., How many attorneys ate at most 3 pieces of fruit per day? = number of attorneys who ate 0 pieces + 1 piece + 2 pieces + 3 pieces
Or: total number of attorneys - number of attorneys who ate at least 4 pieces per day
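The "at least" / "at most" sums and the complement shortcut can be checked in Python. The frequency table `freq` below is hypothetical (pieces of fruit per day mapped to number of attorneys), with 6 assumed to be the largest category.

```python
# Hypothetical frequency table: pieces of fruit eaten per day -> number of attorneys.
freq = {0: 2, 1: 5, 2: 8, 3: 6, 4: 4, 5: 3, 6: 2}

at_least_4 = sum(count for pieces, count in freq.items() if pieces >= 4)
at_most_3 = sum(count for pieces, count in freq.items() if pieces <= 3)
total = sum(freq.values())

print(at_least_4)          # 9
print(at_most_3)           # 21
print(total - at_least_4)  # 21 -> the complement gives the same "at most 3" count
```

The complement route is faster whenever the "at least" side has fewer categories to add up.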
Frequency of certain RANGES of values e.g., 3 ppl aged 40-49
Histograms are similar to column charts except that the x axis has NUMBER RANGES
Graphics Interpretation: Hybrid and double-axis charts
Usually a bar and line chart with two y axes
> PAY ATTENTION to which axis to use (especially pernicious when the units are the SAME on both sides, like $)
Graphics Interpretation: Scatter plots
Allows us to analyze any RELATIONSHIPS between TWO VARIABLES (represented by the x and y axis)
> positive relationship
> negative relationship
> no relationship (almost horizontal line)
Trend lines make it very easy to identify the relationship between two variables
(you can add a trend line to see relationships more clearly)
Scatterplots can aid us in making predictions
> e.g., temperature (x axis) vs number of customers (y axis). When the temperature is 70°F, find the region of the scatterplot at 70°F WHERE MOST OF THE DOTS ARE, then read off the corresponding y values (can also determine the max and min prediction based on the actual data points near it)
HOWEVER: when working with scatterplots, do NOT make predictions OUTSIDE the range of data that was measured, UNLESS language in the question states that we can extend the relationship (“extrapolation”)
> extrapolation can be dangerous unless the QUESTION makes an explicit assumption that the trend will continue (who knows whether the opposite trend or unexpected trend could happen!)
Graphics Interpretation: Correlation
Line charts and scatter plots can depict relationships:
> positive relationship (as x increases, y increases; as x decreases, y decreases)
> negative relationship (as x increases, y decreases; as x decreases, y increases)
> no relationship (almost horizontal trend line)
Remember: Correlation =/ causation
Graphics Interpretation questions will also often present one or more LINE GRAPHS and ask what type of correlation exists between the data sets (groups) represented by the lines
Detecting correlation among two or more lines?
> see if the lines MOVE TOGETHER from left to right (don’t need to move perfectly by the same amount every time, just IN THE SAME DIRECTION)
> Always TRACK the INTERVALS along the x axis for EACH LINE (their boundaries are where that line CHANGES DIRECTION) --> observe whether those intervals line up across lines and what happens to each line within them
KEEP TRACK OF TWO CORRELATIONS:
> 1) X variable and Y variable (generally, applies to both groups of data)
> 2) Group A vs Group B (moving together or moving in opposite directions over same interval)
--> test by: as x increases, A (increases/decreases/stable) and B (increases/decreases/stable)
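The interval test above can be sketched in Python. `directions` is an illustrative helper, and `group_a` / `group_b` are hypothetical readings of two lines at the same x values; the sketch labels each interval as up, down, or flat and counts how often the two lines move the same way.

```python
def directions(values):
    """Label each interval between consecutive x values as up, down, or flat."""
    labels = []
    for before, after in zip(values, values[1:]):
        if after > before:
            labels.append("up")
        elif after < before:
            labels.append("down")
        else:
            labels.append("flat")
    return labels

# Hypothetical readings for two groups at the same x values (e.g., years).
group_a = [10, 14, 13, 18, 20]
group_b = [5, 7, 6, 9, 9]

a_dirs = directions(group_a)
b_dirs = directions(group_b)
same = sum(da == db for da, db in zip(a_dirs, b_dirs))

print(a_dirs)  # ['up', 'down', 'up', 'up']
print(b_dirs)  # ['up', 'down', 'up', 'flat']
print(f"{same} of {len(a_dirs)} intervals move the same way")  # 3 of 4 intervals move the same way
```

One mismatched interval does not rule out a positive correlation; the GENERAL TREND (3 of 4 intervals matching here) is what counts.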
What happens if there are a few cases where data does NOT follow a trend?
> there COULD still be a correlation between the two variables --> look at the GENERAL TREND
> sometimes, though, there is simply NO GENERAL trend (i.e., no correlation)
Bivariate data
Each data point represents TWO VALUES (x, y)
> scatter plots
> line charts
Graphics Interpretation: Scatter plots with a double axis and PAIRS of symbols
Keep careful track of which symbols tie to which axis AND GRIDLINES
Each pair of points is aligned vertically and SHARES the same x value
> Careful to ALIGN CORRECTLY
Graphics Interpretation: Bubble chart
Center of the circle = data point
Represents 3 variables --> x, y and the size of the bubble
Graphics Interpretation: Pie charts
Be careful of complex pie chart questions involving:
> Pie chart
> Plus column or bar chart
> Plus tables
You need to expertly decide which data to use (you may not need to use them all!)
Supplemental data could be a drill-down into one slice of the pie chart or something completely unrelated
Tips:
> you might not need to calculate the actual Total count to find the actual count of a slice --> can creatively use other slices and their % (percentages are proportional to actual counts)
e.g., A = 12% * T
Looking for B = 36%*T
B/A = 36%/12% = 3
So if A = 240, B = 3*240 = 720
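The pie-chart shortcut can be verified in Python with exact fractions, using the 12% / 36% / 240 numbers from the example above. The cross-check through the total T is included only to confirm that the shortcut gives the same answer.

```python
from fractions import Fraction

# Percent shares from the pie chart example and the one known count.
share_a = Fraction(12, 100)
share_b = Fraction(36, 100)
count_a = 240

# B/A = 36% / 12% = 3, so the total T never has to be computed.
count_b = count_a * share_b / share_a
print(count_b)  # 720

# Cross-check: T = 240 / 0.12 = 2000, and 36% of 2000 = 720.
total = count_a / share_a
print(total * share_b)  # 720
```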
Graphics Interpretation: Venn Diagrams
Venn Diagram Qs are similar to what we learned in PS
> often accompanied by probability questions --> be careful which region you take the values from
Be careful:
> AND vs OR
e.g., ketchup AND mayonnaise =/ ketchup only + mayonnaise only + overlap (that sum is ketchup OR mayonnaise; AND = overlap only)
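Python's set operators map directly onto the AND vs OR distinction. The survey names below are hypothetical; `&` gives the overlap (AND) and `|` gives the union (OR).

```python
# Hypothetical survey: who likes ketchup and who likes mayonnaise.
ketchup = {"Ana", "Ben", "Cai", "Dee"}
mayo = {"Cai", "Dee", "Eli"}

both = ketchup & mayo     # AND: overlap only
either = ketchup | mayo   # OR: ketchup only + mayo only + overlap
ketchup_only = ketchup - mayo
mayo_only = mayo - ketchup

print(len(both))    # 2
print(len(either))  # 5
print(len(ketchup_only) + len(mayo_only) + len(both) == len(either))  # True
```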
Tip:
> use the sub-part view (4 regions for 2 sets, 8 for 3 sets, including the “neither” region) to understand which sub-parts must be included or excluded
> for counting symbols:
» Go top to bottom (3 sets)
» Go left to right (2 sets)
Fractions (for estimating probability)
7/8 = 87.5%
Graphics Interpretation: Flowcharts
Describes a process using shapes, arrows, and text
e.g., decision process
Tip:
> read the accompanying description of how the flow chart works and LOOK AT THE CHART each time the text mentions a part of the chart
> Before reading the questions, develop a GENERAL UNDERSTANDING of how the chart works, but don’t seek to understand every detail of the chart
Table Analysis
What: Data is provided in a table with COLUMNS that CAN BE SORTED (least to greatest)
> generally quantitative info stored in tables accompanied by an explanatory text (but can also be text in cells e.g., “High / Low”, “Yes / No”, “Name 1”)
> followed by 3 True or False statements
> “For each of the following statements, select True if the statement can be verified to be true based on the information provided. Otherwise select False” --> does the data SUPPORT / validate the statement?
> True = data supports the statement
> False = data does not support the statement (incorrect or not enough data)
--> mentally ADD a QUESTION MARK to each statement to evaluate it
Specially worded Qs: still two options per statement; you just need to read them carefully
> “Less than the median” vs “Greater than or equal to the median”
Content:
> change
> percent difference
> average, median, range, standard deviation
> ratios (of values or counts) and proportions (part to whole involving values or counts)
> probability (often = number meeting the criterion within a subgroup / subgroup count)
> Correlation
> comparative ranking (highest, lowest, higher, lower)
> DS variations (does the table have enough info to support a conclusion?)
> nonstandard table analysis Qs (incl. verbal data)
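The subgroup probability pattern can be sketched in Python. The `rows` table of (region, size) pairs is hypothetical; the sketch first restricts to the subgroup, then counts the rows meeting the criterion inside it.

```python
# Hypothetical table rows: (region, size) for a set of stores.
rows = [
    ("North", "Large"), ("North", "Small"), ("North", "Large"),
    ("South", "Small"), ("South", "Small"), ("North", "Small"),
]

subgroup = [(region, size) for region, size in rows if region == "North"]  # condition: North only
meets = [(region, size) for region, size in subgroup if size == "Large"]   # criterion within subgroup

probability = len(meets) / len(subgroup)
print(f"{len(meets)}/{len(subgroup)}")  # 2/4
print(probability)                      # 0.5
```

Dividing by the subgroup count (4), not the full table (6), is the step most often missed.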
How to solve:
(1) Read the explanatory text to understand the table and come up with a single sentence that captures your understanding of the info in the table
(2) Focus on a high-level understanding and look for the most obvious patterns, relationships and trends
> don’t get into the details yet
(3) Go to questions and determine EXACTLY what data you need (without getting distracted by all the other info in the table)
Tips:
> Skillful sorting: not all Table Analysis Qs will require using info from multiple sorted views
> Estimation is key for SAVING TIME on Qs that don’t need a high level of accuracy (e.g., if N is the same in average calculations, just compare the numerators (sums); eyeball which data set is always greater to find the highest average; use fractions to compare the sizes of ratios)
> Double check that you are using the right column (labels)
> sometimes the table will provide TOTALS at the BOTTOM of the columns --> can be useful when calculating averages
> be careful NOT to count the total or mean rows as part of datasets
> COUNT CAREFULLY
> be careful when applying a “mental filter” --> sort using the sort function, THEN mentally or manually sort the subgroup again (don’t expect a subgroup’s values to be sorted)
e.g., first sort to group all chocolate chip cookies, then manually arrange their prices from least to greatest
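A Python sketch of using a TOTALS row without counting it as data. The `column` values are hypothetical, with the last entry standing in for the table's printed total row.

```python
# Hypothetical table column whose last row is a printed TOTAL.
column = [12, 18, 25, 9, 16, 80]

data = column[:-1]       # exclude the total row from the data set
total_row = column[-1]

print(sum(data) == total_row)   # True -> the total row checks out
print(total_row / len(data))    # 16.0 -> average straight from the total

# Mistake: treating the total row as one more data point inflates the average.
wrong_average = sum(column) / len(column)
print(wrong_average > total_row / len(data))  # True
```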
Table Analysis: Values that fit a specified criterion in a table
Often to answer a Table Analysis question, we must determine something about a SUBSET of data that fits a specified criterion
> e.g., sort by Country first, then find the median value of a specific country
(kind of like creating a “mental filter” using Sort --> helps GROUP relevant values together, then you have to manually sort again to find the median)
Sorting can help GROUP relevant values together and focus our attention on the right subgroups
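The "mental filter" can be sketched in Python with hypothetical (country, value) rows. Sorting by country groups the rows, but because Python's sort is stable the values inside each group stay in their original order, much like the on-screen sort; the subgroup must be sorted again before reading off the median.

```python
from statistics import median

# Hypothetical table rows: (country, value).
rows = [
    ("Canada", 42), ("Brazil", 17), ("Canada", 15),
    ("Brazil", 30), ("Canada", 28), ("Canada", 50), ("Brazil", 22),
]

# Step 1: "sort by Country" groups the rows; values inside a group are NOT in order.
by_country = sorted(rows, key=lambda row: row[0])
canada_values = [value for country, value in by_country if country == "Canada"]
print(canada_values)  # [42, 15, 28, 50]

# Step 2: sort the subgroup again, then take the median (mean of the two middle values).
canada_sorted = sorted(canada_values)
print(canada_sorted)          # [15, 28, 42, 50]
print(median(canada_sorted))  # 35.0
```

`statistics.median` sorts internally anyway; the explicit second sort is there to mirror the manual step you need on screen.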