Data Exploration and Visualisation Flashcards
What is the primary goal of data exploration and visualization?
To understand your data.
What are the two main categories in data exploration?
- Data visualization
- Summary statistics
Why is data visualization important in AI?
It communicates complex information effectively and helps identify summaries, structures, relationships, differences, and abnormalities in data.
What did F.J. Anscombe emphasize in 1973 about data analysis?
“Make both calculations and graphs. Both sorts of output should be studied; each will contribute to understanding.”
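Anscombe's point can be checked with the quartet he published alongside this advice, which ships with base R as the `anscombe` data frame. This is a minimal sketch; the helper names `x_cols`, `y_cols`, and `stats` are illustrative choices, not part of any API.

```r
# Anscombe's quartet ships with base R (datasets package) as `anscombe`,
# with columns x1..x4 and y1..y4.
x_cols <- paste0("x", 1:4)
y_cols <- paste0("y", 1:4)

# Calculations: summary statistics for each of the four x/y pairs.
stats <- data.frame(
  set    = 1:4,
  mean_x = sapply(x_cols, function(v) mean(anscombe[[v]]), USE.NAMES = FALSE),
  mean_y = sapply(y_cols, function(v) mean(anscombe[[v]]), USE.NAMES = FALSE),
  cor_xy = mapply(function(x, y) cor(anscombe[[x]], anscombe[[y]]),
                  x_cols, y_cols, USE.NAMES = FALSE)
)
print(round(stats, 2))

# Graphs: one scatter plot per pair, drawn in a 2 x 2 grid.
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[x_cols[i]]], anscombe[[y_cols[i]]],
       xlab = x_cols[i], ylab = y_cols[i], main = paste("Set", i))
}
par(op)
```

The four sets have nearly identical means and correlations, yet the scatter plots look quite different, which is why both kinds of output are worth studying.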
Name one experiment related to design elements of graphs.
In the 1980s, William Cleveland and Robert McGill measured how accurately humans perceive quantitative information from different graphical cues.
What types of chart are suitable for showing relationships?
Scatter plots and line charts.
When should pie charts be used?
To show the composition of categorical data as a proportion of the whole.
Why is learning the grammar of graphics important?
It helps create and think about new, improved graphics, providing a theoretical foundation instead of relying on special cases.
What is the ggplot2 package in R used for?
Creating plots using the Grammar of Graphics framework.
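As a minimal sketch of how the framework builds a plot from data, aesthetic mappings, and layers, the following uses the `diamonds` data that ships with ggplot2. The object names `small` and `p`, the 2,000-row sample, and the choice of variables are illustrative assumptions.

```r
library(ggplot2)

# Sample rows so the scatter plot stays readable; the seed is arbitrary.
set.seed(1)
small <- diamonds[sample(nrow(diamonds), 2000), ]

# Data + aesthetic mapping + geometric layer + scale + labels.
p <- ggplot(small, aes(x = carat, y = price, colour = cut)) +
  geom_point(alpha = 0.5) +   # one point per sampled diamond
  scale_y_log10() +           # log scale for the skewed price axis
  labs(x = "Carat", y = "Price (US dollars)", colour = "Cut")

print(p)
```

Each `+` adds one component, so swapping `geom_point()` for another geom changes the chart type without touching the data or the mappings.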
How can you install the ggplot2 package in R?
install.packages("tidyverse")
or
install.packages("ggplot2")
What is feature transformation?
Mapping a set of values for a feature to a new set of values to simplify data representation for analysis.
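Two common transformations can be sketched in base R: a log transform, which compresses a right-skewed numeric feature, and binning, which maps a numeric feature to ordered categories. The income values and bin boundaries below are made up for illustration.

```r
# Hypothetical right-skewed feature (values invented for illustration).
income <- c(18000, 25000, 32000, 41000, 58000, 120000, 950000)

# Log transform: maps each value to its base-10 logarithm.
log_income <- log10(income)

# Binning: maps each value to one of three ordered categories.
# Intervals are right-closed: (0, 30000], (30000, 100000], (100000, Inf].
income_band <- cut(income,
                   breaks = c(0, 30000, 100000, Inf),
                   labels = c("low", "middle", "high"),
                   ordered_result = TRUE)

print(data.frame(income, log_income = round(log_income, 2), income_band))
```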
What are some common data quality issues?
- Missing values
- Duplicate data
- Inconsistent data
- Noise
- Outliers
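Some of these issues can be detected with base R. The sketch below uses a small made-up data frame (`df`) and flags outliers with the 1.5 × IQR rule, which is one common convention rather than the only option.

```r
# Toy data with a missing value, a duplicated row, and an extreme value.
df <- data.frame(
  id    = c(1, 2, 2, 3, 4, 5, 6),
  price = c(10.5, 11.0, 11.0, NA, 9.8, 10.2, 250.0)
)

# Missing values: count NAs per column.
na_counts <- colSums(is.na(df))

# Duplicate data: rows that exactly repeat an earlier row.
dup_rows <- which(duplicated(df))

# Outliers: values more than 1.5 * IQR outside the quartiles.
q <- unname(quantile(df$price, c(0.25, 0.75), na.rm = TRUE))
fence <- 1.5 * (q[2] - q[1])
outlier_rows <- which(df$price < q[1] - fence | df$price > q[2] + fence)

print(na_counts)
print(dup_rows)
print(outlier_rows)
```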
Name one example of data wrangling.
Removing data with missing values.
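A minimal base R sketch of this step, using a made-up data frame (`raw`) whose values are for illustration only:

```r
# Toy data frame with missing values (invented for illustration).
raw <- data.frame(
  carat = c(0.23, NA, 0.31, 0.29),
  price = c(326, 327, NA, 334)
)

# Keep only the rows where every column is observed.
clean <- raw[complete.cases(raw), ]

# na.omit() keeps the same rows and records which ones were dropped.
clean2 <- na.omit(raw)

cat("Rows before:", nrow(raw), " after:", nrow(clean), "\n")
```

Dropping rows is the simplest option, but it discards the observed values in those rows as well, so it suits cases where only a small share of rows is affected.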
What is the diamonds dataset?
A dataset of diamond prices and attributes, such as carat, cut, color, clarity, and dimensions.