Section 4: Data Visualization, R Programming, Python, Statistical Significance, and ANOVA Flashcards
advantages of data visualization
Our eyes are drawn to colors and patterns. We can quickly identify red from blue, and squares from circles. Our culture is visual, including everything from art and advertisements to TV and movies. Data visualization is another form of visual art that grabs our interest and keeps our eyes on the message. When we see a chart, we quickly see trends and outliers. If we can see something, we internalize it quickly. It’s storytelling with a purpose. If you’ve ever stared at a massive spreadsheet of data and couldn’t see a trend, you know how much more effective a visualization can be.
Some other advantages of data visualization include:
Sharing information easily.
Exploring opportunities interactively.
Visualizing patterns and relationships.
disadvantages of data visualization
While there are many advantages, some of the disadvantages may seem less obvious. For example, when viewing a visualization with many different data points, it’s easy to make an inaccurate assumption. Or the visualization may simply be designed poorly, so that it’s biased or confusing.
Some other disadvantages include:
Biased or inaccurate information.
Correlation doesn’t always mean causation.
Core messages can get lost in translation.
The importance of data visualization is simple: it helps people see, interact with, and better understand data. Whether simple or complex, the right visualization can get everyone on the same page, regardless of their level of expertise.
It’s hard to think of a professional industry that doesn’t benefit from making data more understandable. Every STEM field benefits from understanding data—and so do fields in government, finance, marketing, history, consumer goods, service industries, education, sports, and so on.
While we’ll always wax poetic about data visualization (you’re on the Tableau website, after all), there are practical, real-life applications that are undeniable. And since visualization is so widespread, it’s also one of the most useful professional skills to develop. The better you can convey your points visually, whether in a dashboard or a slide deck, the better you can leverage that information. The concept of the citizen data scientist is on the rise, and skill sets are changing to accommodate a data-driven world. It is increasingly valuable for professionals to be able to use data to make decisions and to use visuals to tell stories about how data informs the who, what, when, where, and how.
While traditional education typically draws a distinct line between creative storytelling and technical analysis, the modern professional world also values those who can cross between the two: data visualization sits right in the middle of analysis and visual storytelling.
As the “age of Big Data” kicks into high gear, visualization is an increasingly key tool to make sense of the trillions of rows of data generated every day. Data visualization helps to tell stories by curating data into a form easier to understand, highlighting the trends and outliers. A good visualization tells a story, removing the noise from data and highlighting useful information.
However, it’s not as simple as dressing up a graph to make it look better or slapping on the “info” part of an infographic. Effective data visualization is a delicate balancing act between form and function. The plainest graph could be too boring to catch any notice, or it may make a powerful point; the most stunning visualization could utterly fail at conveying the right message, or it could speak volumes. The data and the visuals need to work together, and there’s an art to combining great analysis with great storytelling.
General types of data visualization
Chart: Information presented in a tabular, graphical form with data displayed along two axes. Can be in the form of a graph, diagram, or map.
Table: A set of figures displayed in rows and columns.
Graph: A diagram of points, lines, segments, curves, or areas that represents certain variables in comparison to each other, usually along two axes at a right angle.
Geospatial: A visualization that shows data in map form using different shapes and colors to show the relationship between pieces of data and specific locations.
Infographic: A combination of visuals and words that represent data. Usually uses charts or diagrams.
Dashboard: A collection of visualizations and data displayed in one place to help with analyzing and presenting data.
Specific types of data visualization
Area Map: A form of geospatial visualization, area maps are used to show specific values set over a map of a country, state, county, or any other geographic location. Two common types of area maps are choropleths and isopleths.
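Bar Chart: Bar charts compare numerical values across categories. The length of each bar represents the value for its category.
Box-and-whisker Plots: These summarize the distribution of a measure. The box spans the middle 50% of the values (from the first to the third quartile) with a line at the median, and the whiskers extend toward the minimum and maximum, often capped at 1.5 times the interquartile range, with points beyond that plotted individually as outliers.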
Bullet Graph: A bar marked against a background to show progress or performance against a goal, denoted by a line on the graph.
Gantt Chart: Typically used in project management, Gantt charts are a bar chart depiction of timelines and tasks.
Heat Map: A type of geospatial visualization in map form which displays specific data values as different colors (this doesn’t need to be temperatures, but that is a common use).
Highlight Table: A form of table that uses color to categorize similar data, allowing the viewer to read it more easily and intuitively.
Histogram: A type of bar chart that splits a continuous measure into different bins to help analyze its distribution.
Pie Chart: A circular chart divided into wedge-shaped slices (sectors) that shows data as percentages of a whole.
Treemap: A type of chart that shows different, related values in the form of rectangles nested together.
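To make the box-and-whisker and histogram definitions above concrete, here is a minimal base-R sketch on a small made-up vector (the values are illustrative only). quantile() returns the quartiles a box plot is built from, the 1.5 × IQR rule flags outliers, and cut() assigns each value to a bin the way a histogram does before counting.

```r
# Made-up values for illustration only (already sorted)
x <- c(2, 4, 4, 5, 7, 8, 9, 12, 15, 30)

# Quartiles: the box spans Q1..Q3, with a line at the median (Q2)
q <- quantile(x, probs = c(0.25, 0.5, 0.75))
iqr <- q[[3]] - q[[1]]

# Common whisker rule: values more than 1.5 * IQR beyond the box are outliers
outliers <- x[x < q[[1]] - 1.5 * iqr | x > q[[3]] + 1.5 * iqr]

# Histogram binning: split the continuous range into equal-width bins, then count
bins <- cut(x, breaks = c(0, 10, 20, 30), include.lowest = TRUE)
counts <- table(bins)

print(q)         # 25%: 4.25, 50%: 7.5, 75%: 11.25
print(outliers)  # 30
print(counts)    # [0,10]: 7, (10,20]: 2, (20,30]: 1
```

boxplot(x) and hist(x, breaks = c(0, 10, 20, 30)) would draw these summaries. Note that boxplot() computes its box from Tukey's hinges via fivenum(), which can differ slightly from quantile()'s default method.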
Types of Information Visualization
Information visualization tools can help users compare different values, show the bigger picture, track trends in the data, and understand different relationships between variables. The following visualization formats are most commonly used for these purposes:
Column chart
Bar graph
Network graph
Stacked bar graph
Histogram
Line chart
Pie chart
Scatter plot or 3D scatter plot
Box plot
Bubble chart
Dual-axis chart
Stream graph
Sankey diagram
Chord diagram
Choropleth map
Hex map
Voronoi polygon diagram
Ridgeline plot
Interactive decision tree
Heatmap
Tree map
Circle packing
Violin plot
Real-time tracker
Almost everyone within modern organizations is demanding access to data, making the representation of that data in an easy-to-understand format even more important. Business users need a way to interpret data and interact with it in an intuitive way. Information visualization tools help these decision-makers navigate the data with less difficulty and therefore deliver value to the entire organization.
Information visualization is a key skill today as more companies look to digitally transform and make data a key asset across the organization. With ever-growing volumes of data, being able to present data in a meaningful way for others to understand has become crucial for a business to remain competitive. Information visualization turns data into actionable insights.
What Makes an Information Visualization Successful?
Information visualization is an art and therefore relies on the following aspects of design:
The subject matter: information or data being represented
The story: the concept being portrayed in the visualization
The goal: meeting the purpose with the right visualization
The visual: using key elements of structure and design
select() — Selecting Columns in your Data Set
Selecting only the columns continent, year, and pop.
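library(dplyr)      # select(), filter(), arrange(), mutate(), summarize(), %>%
library(gapminder)  # provides the gapminder data set
rows <- 5           # preview size used by head(rows) throughout (assumed value; any number works)
gapminder %>%
  select(continent, year, pop) %>%
  head(rows)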
Selecting all columns but the year column.
gapminder %>%
  select(-year) %>%
  head(rows)
Selecting all columns whose names start with co, using starts_with(). See the documentation for other useful helpers, including ends_with() and contains().
gapminder %>%
  select(starts_with("co")) %>%
  head(rows)
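A small sketch of the other selection helpers mentioned above, ends_with() and contains(), on a made-up data frame with gapminder-style column names (placeholder values, not real gapminder data). These helpers match case-insensitively by default and can be combined with plain column names.

```r
library(dplyr)

# Made-up data frame with gapminder-style column names (placeholder values)
df <- data.frame(
  country   = c("A", "B"),
  continent = c("Asia", "Europe"),
  year      = c(1972, 1972),
  lifeExp   = c(50, 70),
  gdpPercap = c(1000, 9000)
)

# Columns whose names end with "Exp"
df_ends <- df %>% select(ends_with("Exp"))

# Columns whose names contain "co" anywhere
df_contains <- df %>% select(contains("co"))

# Helpers combined with a plain column name
df_combo <- df %>% select(starts_with("co"), year)

names(df_ends)      # "lifeExp"
names(df_contains)  # "country" "continent"
names(df_combo)     # "country" "continent" "year"
```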
rename() — Renaming Columns
Rename the column year to Year and lifeExp to Life Expectancy.
gapminder %>%
  select(country, year, lifeExp) %>%
  rename(
    Year = year,
    "Life Expectancy" = lifeExp
  ) %>%
  head(rows)
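One consequence of renaming a column to a name that contains a space: later dplyr code must wrap that name in backticks to refer to it as a column. A minimal sketch on made-up rows (placeholder values):

```r
library(dplyr)

# Made-up rows with gapminder-style columns (placeholder values)
df <- data.frame(
  country = c("A", "B"),
  year    = c(1972, 2007),
  lifeExp = c(50.4, 71.2)
)

renamed <- df %>%
  rename(Year = year, "Life Expectancy" = lifeExp)

# A column name with a space must be backtick-quoted when used in an expression
long_lived <- renamed %>%
  filter(`Life Expectancy` > 60)

names(renamed)  # "country" "Year" "Life Expectancy"
long_lived      # only country B remains
```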
arrange() — Sorting your Data Set
Sort by year.
gapminder %>%
  select(continent, year, lifeExp) %>%
  arrange(year) %>%
  head(rows)
Sort by lifeExp (ascending) and then by year (descending).
gapminder %>%
  select(continent, year, lifeExp) %>%
  arrange(lifeExp, desc(year)) %>%
  head(rows)
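The second sort key only matters when the first one ties. This sketch uses made-up rows where two entries share the same lifeExp, so desc(year) decides their order; it also shows desc() applied to the first key for a fully descending sort.

```r
library(dplyr)

# Made-up rows; the first two share lifeExp = 40 (placeholder values)
df <- data.frame(
  continent = c("Asia", "Africa", "Europe", "Asia"),
  year      = c(1952, 1977, 2007, 2007),
  lifeExp   = c(40, 40, 80, 60)
)

# Ascending lifeExp; rows tied on lifeExp are ordered newest year first
sorted <- df %>%
  arrange(lifeExp, desc(year))

# Highest lifeExp first
top_first <- df %>%
  arrange(desc(lifeExp))

sorted$year         # 1977 1952 2007 2007
top_first$lifeExp   # 80 60 40 40
```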
filter() — Filtering Rows in your Data Set
Filter rows with the year 1972.
gapminder %>%
  select(country, year, lifeExp) %>%
  filter(year == 1972) %>%
  head(rows)
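Filter rows with the year 1972 and with a life expectancy below average. Note that mean(lifeExp) is computed over all rows passed to filter() (every year), not only over the 1972 rows.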
gapminder %>%
  select(country, year, lifeExp) %>%
  filter(
    year == 1972,
    lifeExp < mean(lifeExp)
  ) %>%
  head(rows)
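Conditions separated by commas inside one filter() call are combined with & and evaluated against the same input, so mean(lifeExp) above is the average over every year. If "below average" should mean below the 1972 average, one option is to filter in two steps. A minimal sketch on made-up rows (placeholder values):

```r
library(dplyr)

# Made-up rows (placeholder values): two countries, two years
df <- data.frame(
  country = c("A", "B", "A", "B"),
  year    = c(1972, 1972, 2007, 2007),
  lifeExp = c(50, 60, 70, 80)
)

# One filter() call: mean(lifeExp) is taken over ALL rows (mean = 65)
below_overall <- df %>%
  filter(year == 1972, lifeExp < mean(lifeExp))

# Two filter() calls: the second mean is taken over the 1972 rows only (mean = 55)
below_1972 <- df %>%
  filter(year == 1972) %>%
  filter(lifeExp < mean(lifeExp))

below_overall$country  # "A" "B"
below_1972$country     # "A"
```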
Filter rows with the year 1972 and with a life expectancy below average, and with the country being either Bolivia or Angola.
gapminder %>%
  select(country, year, lifeExp) %>%
  filter(
    year == 1972,
    lifeExp < mean(lifeExp),
    country == "Bolivia" | country == "Angola"
  ) %>%
  head(rows)
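Chaining == comparisons with | gets long as the list of countries grows. The %in% operator expresses the same membership test more compactly. A minimal sketch on made-up rows (placeholder values) showing the two forms return identical results:

```r
library(dplyr)

# Made-up rows (placeholder values)
df <- data.frame(
  country = c("Angola", "Bolivia", "Chile", "Angola"),
  year    = c(1972, 1972, 1972, 2007)
)

# Chained comparisons with |
with_or <- df %>%
  filter(year == 1972, country == "Bolivia" | country == "Angola")

# Equivalent membership test with %in%
with_in <- df %>%
  filter(year == 1972, country %in% c("Bolivia", "Angola"))

identical(with_or, with_in)  # TRUE
```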
mutate() — Generate New Columns in your Data Set
Create a column that combines continent and country information, and another column that shows lifeExp rounded to the nearest whole number.
gapminder %>%
  arrange(year, pop) %>%
  mutate(
    con_country = paste(continent, "-", country),
    rn_lifeExp = round(lifeExp)
  ) %>%
  select(continent, country, con_country, lifeExp, rn_lifeExp) %>%
  head(rows)
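Two details of the mutate() call above are worth seeing on small data. paste() separates its arguments with a space by default, so paste(continent, "-", country) yields values like "Asia - X", while paste0() joins them with no separator. Also, a column created earlier in the same mutate() call can be used in a later expression. A minimal sketch on made-up rows (placeholder values; above_60 is a hypothetical column added for illustration):

```r
library(dplyr)

# Made-up rows (placeholder values)
df <- data.frame(
  continent = c("Asia", "Europe"),
  country   = c("X", "Y"),
  lifeExp   = c(43.8, 75.4)
)

out <- df %>%
  mutate(
    con_country  = paste(continent, "-", country),   # default sep = " "
    con_country2 = paste0(continent, "-", country),  # no separator
    rn_lifeExp   = round(lifeExp),
    above_60     = rn_lifeExp > 60                   # uses the column created just above
  )

out$con_country   # "Asia - X" "Europe - Y"
out$con_country2  # "Asia-X" "Europe-Y"
out$rn_lifeExp    # 44 75
```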
summarize() — Create Summary Calculations in your Data Set
For the whole data set, calculate the mean and standard deviation of population and life expectancy.
gapminder %>%
  summarize(
    pop_mean = mean(pop),
    pop_sd = sd(pop),
    le_mean = mean(lifeExp),
    le_sd = sd(lifeExp)
  )
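summarize() collapses the whole data set into one row. Preceding it with group_by() produces one summary row per group instead, for example per continent. A minimal sketch on made-up rows (placeholder values); n() counts the rows in each group:

```r
library(dplyr)

# Made-up rows (placeholder values)
df <- data.frame(
  continent = c("Asia", "Asia", "Europe", "Europe"),
  pop       = c(100, 300, 200, 400),
  lifeExp   = c(50, 60, 70, 80)
)

by_continent <- df %>%
  group_by(continent) %>%
  summarize(
    pop_mean = mean(pop),
    le_mean  = mean(lifeExp),
    n_rows   = n()
  )

by_continent
# Asia:   pop_mean 200, le_mean 55, n_rows 2
# Europe: pop_mean 300, le_mean 75, n_rows 2
```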