Plotting data Flashcards
What are the main types of data to think about when plotting?
- Categorical (e.g., types of rice, leaf positions): Use bar plots, box plots, count plots, or violin plots.
- Numerical (e.g., stomatal count, NumReads): Consider histograms, line graphs, or scatter plots.
- Time Series (e.g., gene expression over time): Use line graphs.
When plotting relationships, what are the 3 main relationships to plot, and which graphs do you use for each?
- Comparison across categories: Use bar plots or box plots.
- Correlation between variables: Use scatter plots or pair plots.
- Distributions: Use histograms, box plots, or violin plots.
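The three relationship types can be sketched side by side as below. This assumes seaborn and matplotlib; the columns leaf_position, leaf_area and stomatal_count are hypothetical, with stomatal_count deliberately generated to correlate with leaf_area.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical data: leaf position (category) plus two numerical measurements
rng = np.random.default_rng(1)
n = 90
leaf_area = rng.normal(20, 4, size=n)
df = pd.DataFrame({
    "leaf_position": np.repeat(["lower", "middle", "upper"], n // 3),
    "leaf_area": leaf_area,
    "stomatal_count": 2 * leaf_area + rng.normal(0, 3, size=n),  # built to correlate with leaf_area
})

fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))
# Comparison across categories: one box per leaf position
sns.boxplot(data=df, x="leaf_position", y="stomatal_count", ax=axes[0])
# Correlation between two numerical variables
sns.scatterplot(data=df, x="leaf_area", y="stomatal_count", ax=axes[1])
# Distribution of one numerical variable, split by category
sns.violinplot(data=df, x="leaf_position", y="leaf_area", ax=axes[2])
fig.tight_layout()

# Pearson correlation coefficient; strongly positive for this toy data
r = df["leaf_area"].corr(df["stomatal_count"])
print(r)
```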
When comparing two numerical variables, what are the best ways to plot them?
- Scatter plot (to show relationships or clusters).
- Heatmap (if data can be binned).
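A sketch of both options on the same hypothetical x and y values. The heatmap is made by binning both variables with np.histogram2d into an 8 x 8 grid (the bin count is an arbitrary choice) and counting the points that fall in each cell. seaborn's histplot(x=..., y=...) draws a similar binned heatmap in a single call.

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical paired numerical measurements
rng = np.random.default_rng(2)
x = rng.normal(0, 1, size=500)
y = x + rng.normal(0, 0.5, size=500)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Scatter plot: every observation is drawn as its own point
sns.scatterplot(x=x, y=y, s=10, ax=axes[0])

# Heatmap: bin both variables into an 8 x 8 grid and count points per cell
counts, x_edges, y_edges = np.histogram2d(x, y, bins=8)  # counts[i, j]: x bin i, y bin j
x_labels = [f"{c:.1f}" for c in (x_edges[:-1] + x_edges[1:]) / 2]  # bin centres
y_labels = [f"{c:.1f}" for c in (y_edges[:-1] + y_edges[1:]) / 2]
sns.heatmap(counts.T, xticklabels=x_labels, yticklabels=y_labels,
            cmap="viridis", ax=axes[1])  # transpose so rows correspond to y bins
axes[1].invert_yaxis()  # low y values at the bottom, matching the scatter plot
axes[1].set_xlabel("x")
axes[1].set_ylabel("y")
fig.tight_layout()
```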
How would you use seaborn to plot two numerical variables in a way that lets you easily change the kind of plot?
sns.relplot(data=df, x='independent (explanatory) variable column', y='dependent (response) variable column', kind='line')
kind can also be 'scatter' (the default).
What is a response variable?
Response variable: the dependent variable. What you’re measuring in the experiment. It’s the outcome or result you’re interested in, the effect in cause and effect.
What is an explanatory variable?
This is similar to the independent variable. It’s the factor you suspect might explain or cause changes in the response variable.
Which axis do you plot the explanatory variable on?
X-axis
Which axis do you plot the response variable on?
Y-axis
What sentence helps you remember explanatory/response variables and where to plot them?
Independent Explorers Cause X-citement
How can you choose which column of data to colour your data by?
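hue='column name', for example:
import seaborn as sns
import matplotlib.pyplot as plt
data = sns.load_dataset("iris")  # example dataset with sepal_length, sepal_width and species columns
sns.scatterplot(data=data, x="sepal_length", y="sepal_width", hue="species")
plt.show()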
What is linear regression?
Linear regression is a statistical method used to model the relationship between two variables by fitting a straight line to the data. It’s used to predict the value of one variable (the dependent or response variable) based on the value of another (the independent or explanatory variable).
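Fitting the line: the goal of linear regression is to find the best-fitting line. Usually this is done by ordinary least squares, which picks the line that minimizes the sum of squared vertical distances (residuals) between the observed data points and the line.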
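The least-squares fit has a closed form: slope = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2), and intercept = y_mean - slope * x_mean. A minimal NumPy sketch of that formula, using hypothetical measurements and cross-checked against np.polyfit:

```python
import numpy as np

def least_squares_line(x, y):
    """Return (slope, intercept) of the ordinary least-squares line y = slope * x + intercept."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # slope = sum of cross-deviations / sum of squared x deviations
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # the fitted line always passes through the point of means (x_mean, y_mean)
    intercept = y_mean - slope * x_mean
    return slope, intercept

x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 8.8, 11.0]  # hypothetical measurements, roughly y = 2x + 1
slope, intercept = least_squares_line(x, y)
residuals = np.asarray(y) - (slope * np.asarray(x) + intercept)
print(slope, intercept)          # slope ≈ 1.97, intercept ≈ 1.09
print(np.sum(residuals ** 2))    # the quantity the fit minimizes
print(np.polyfit(x, y, deg=1))   # NumPy's degree-1 fit: [slope, intercept]
```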
What is lmplot?
A useful seaborn plot for showing the relationship between two variables: it adds the results of a linear regression to a scatter plot. Specifically, it draws the linear regression line for y ~ x along with a 95% confidence interval (the default).
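A minimal lmplot sketch, assuming seaborn is installed. The leaf_area and stomatal_count columns are hypothetical, with the response generated from a known straight line plus noise. ci=95 is written out only to make the default explicit. Call plt.show() to display the figure.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: leaf area (explanatory) vs stomatal count (response)
rng = np.random.default_rng(3)
leaf_area = rng.uniform(10, 30, size=40)
df = pd.DataFrame({
    "leaf_area": leaf_area,
    "stomatal_count": 2 * leaf_area + 5 + rng.normal(0, 3, size=40),
})

# Scatter plot + fitted regression line for stomatal_count ~ leaf_area + shaded 95% CI
g = sns.lmplot(data=df, x="leaf_area", y="stomatal_count", ci=95)
g.set_axis_labels("Leaf area", "Stomatal count")  # lmplot returns a FacetGrid
```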
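Figure-level vs axes-level plots
Axes-Level Plots:
* Plot onto a single matplotlib Axes (pass ax= to choose which one).
* More control over details (e.g., scatterplot(), lineplot(), boxplot()).
* Customizable via Matplotlib.
Figure-Level Plots:
* Handle the whole figure (axes, titles, etc.) and return a seaborn grid object such as FacetGrid or PairGrid.
* Less control, but easier to use (e.g., lmplot(), relplot(), catplot(), pairplot()). relplot(), catplot() and lmplot() can split the data into subplots with col= and row=.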
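A sketch contrasting the two styles on the same hypothetical data (the x, y and treatment columns are made up). Axes-level functions draw into an Axes you create and return it. The figure-level relplot() builds its own figure and returns a FacetGrid, here with one panel per treatment via col=. Call plt.show() to display.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical data with a categorical column to split on
rng = np.random.default_rng(4)
df = pd.DataFrame({
    "x": rng.normal(size=60),
    "y": rng.normal(size=60),
    "treatment": np.repeat(["control", "drought"], 30),
})

# Axes-level: you create the matplotlib figure/axes and tell seaborn where to draw (ax=)
fig, axes = plt.subplots(1, 2, figsize=(8, 3.5))
ax0 = sns.scatterplot(data=df, x="x", y="y", hue="treatment", ax=axes[0])  # returns that Axes
sns.boxplot(data=df, x="treatment", y="y", ax=axes[1])
axes[0].set_title("scatterplot (axes-level)")  # customise with normal matplotlib calls

# Figure-level: seaborn builds the whole figure; col= makes one panel per treatment
g = sns.relplot(data=df, x="x", y="y", col="treatment", kind="scatter")
g.set_titles("{col_name}")  # FacetGrid method for panel titles
print(type(g).__name__)     # FacetGrid
```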