quiz 3 data visualization Flashcards
2 types of data visualization
declarative and exploratory; both serve to support better decisions
3 types of data
quantitative, categorical, and ordinal
quantitative data
discrete and continuous
categorical
nominal (order doesn't matter); takes only a fixed subset of values
M or F
Hair is blonde, brunette, black, etc.
the order of the values carries no meaning
ordinal
takes a fixed subset of values, but order does matter
such as low, med, high
income: low, med, high
education: high school, college
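The nominal versus ordinal distinction can be sketched with pandas categoricals. This is only an illustration: the sample values and the variable names `hair` and `income` are made up for the example.

```python
import pandas as pd

# Nominal: the categories have no inherent order.
hair = pd.Categorical(["blonde", "black", "brunette", "black"])

# Ordinal: the categories are declared in a meaningful order.
income = pd.Categorical(
    ["low", "high", "med", "low"],
    categories=["low", "med", "high"],
    ordered=True,
)

print(hair.ordered)                # False
print(income.ordered)              # True
print(income.min(), income.max())  # low high
```

Only the ordered categorical supports comparisons such as min and max, which is the practical consequence of "order does matter".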
Find the Median of: 9, 3, 44, 17, 15
3, 9, 15, 17, 44; the median is 15. Sort ascending and pick the middle value.
Find the median of: 8, 3, 44, 17, 12, 6
3, 6, 8, 12, 17, 44. With an even count, average the two middle values: (8 + 12) / 2 = 10
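The two median cards can be checked with Python's standard library. A minimal sketch; the names `odd` and `even` are just labels for the two lists.

```python
from statistics import median

odd = [9, 3, 44, 17, 15]
even = [8, 3, 44, 17, 12, 6]

# Odd count: the middle value of the sorted list.
print(sorted(odd), median(odd))    # [3, 9, 15, 17, 44] 15

# Even count: the mean of the two middle values.
print(sorted(even), median(even))  # [3, 6, 8, 12, 17, 44] 10.0
```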
explain why a plot could be helpful
As part of the exploratory analysis, for example to identify outliers, or as part of the end goal.
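One way a plot can surface outliers is a box plot, which draws points beyond the whiskers separately. The sketch below is an assumption about how that might look, not something the cards prescribe. The sample values are illustrative, and the `Agg` backend line is only there so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; drop this in a notebook
import matplotlib.pyplot as plt

# Illustrative values only; 44 sits far from the rest.
values = [8, 3, 44, 17, 12, 6]

fig, ax = plt.subplots()
box = ax.boxplot(values)  # default whiskers reach 1.5 * IQR beyond the quartiles

# Points beyond the whiskers are drawn as "fliers".
outliers = list(box["fliers"][0].get_ydata())
print(outliers)
```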
create a plot of the values 0-9
import numpy as np
import matplotlib.pyplot as plt
data = np.arange(10)  # 0 through 9; np.arange(9) would stop at 8
plt.plot(data)
create a plot with -1,3,5,7
x=plt.plot([-1,3,5,7])
create an empty figure object
fig = plt.figure()  # an empty figure has no axes, so nothing can be drawn until a subplot is added
ax1 = fig.add_subplot(2, 2, 1)  # first cell of a 2x2 grid
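A fuller sketch of the same idea, filling three cells of a 2x2 grid. The names `ax2` and `ax3` and the plotted values are illustrative, and the `Agg` backend line is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; drop this in a notebook
import matplotlib.pyplot as plt

fig = plt.figure()              # empty figure, no axes yet
ax1 = fig.add_subplot(2, 2, 1)  # 2 rows, 2 columns, first cell
ax2 = fig.add_subplot(2, 2, 2)  # second cell
ax3 = fig.add_subplot(2, 2, 3)  # third cell; the fourth stays empty

ax1.plot([1.5, 3.5, -2, 1.6])   # draw on one specific subplot
print(len(fig.axes))            # 3
```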
create a figure with a plot of a random array of 10
data = np.random.randn(10)  # 10 standard-normal values
plt.plot(data)  # pyplot creates the figure implicitly
create a 1x1 figure object with a plot of 1.5, 3.5, -2, 1.6
fig = plt.figure()  # create a Figure object
ax = fig.add_subplot(1, 1, 1)  # create an AxesSubplot object
ax.plot([1.5, 3.5, -2, 1.6])
plot a Series
import pandas as pd
ser = pd.Series(np.random.randn(10).cumsum(), index=np.arange(0, 100, 10))
ser.plot()
The Series object's index is passed to matplotlib for plotting on the x-axis, though you can disable this by passing use_index=False.
plot a dataframe
df = pd.DataFrame(np.random.randn(10, 4).cumsum(0), columns=['A', 'B', 'C', 'D'], index=np.arange(0, 100, 10))
df.plot()
This draws four lines on one plot, one per column, labeled A, B, C, and D in the legend.
create 3 subplots with bar and horizontal bar plots of a Series
fig, axes = plt.subplots(3, 1)  # three subplots stacked vertically
data = pd.Series(np.random.rand(16), index=list('abcdefghijklmnop'))
data.plot.bar(ax=axes[0], color='k', alpha=0.7)   # vertical bars
data.plot.barh(ax=axes[1], color='r', alpha=0.7)  # horizontal bars
data.plot.barh(ax=axes[2], color='g', alpha=0.7)  # horizontal bars again, in green
create a DataFrame with 6 rows and 4 columns and plot it as a vertical bar chart
df = pd.DataFrame(np.random.rand(6, 4), index=['one', 'two', 'three', 'four', 'five', 'six'], columns=pd.Index(['A', 'B', 'C', 'D'], name='Genus'))
df.plot.bar()
create the same DataFrame but plot it as horizontal stacked bars
df = pd.DataFrame(np.random.rand(6, 4), index=['one', 'two', 'three', 'four', 'five', 'six'], columns=pd.Index(['A', 'B', 'C', 'D'], name='Genus'))
df.plot.barh(stacked=True, alpha=0.9)
what are the black lines on a bar chart
the 95% confidence interval
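xlim, xticks, and xticklabels
The pyplot interface, designed for interactive use, consists of methods like xlim,
xticks, and xticklabels. These control the plot range, tick locations, and tick labels,
respectively. They can be used in two ways: called with no arguments, they return the
current value (for example, plt.xlim() returns the current x-axis range); called with
arguments, they set the value.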
set the x-axis range to 0 to 10
plt.xlim([0, 10]) sets the x-axis range to 0 to 10.
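Reading and setting the range with the same function can be sketched as follows. The plotted data and the names `left` and `right` are illustrative, and the `Agg` backend line is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; drop this in a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot(range(20))

# No arguments: returns the current x-axis range chosen by matplotlib.
auto_left, auto_right = plt.xlim()

# With an argument: sets the range on the current axes.
plt.xlim([0, 10])
left, right = plt.xlim()
print(float(left), float(right))  # 0.0 10.0
```

pyplot functions act on the current axes; the object-oriented equivalents are `ax.get_xlim()` and `ax.set_xlim()`.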
1 figure with the cumulative sum of 1000 random values
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.plot(np.random.randn(1000).cumsum())
change x axis ticks and labels
To change the x-axis ticks, it's easiest to use set_xticks and set_xticklabels. The
former instructs matplotlib where to place the ticks along the data range; by default
these locations are also used as the labels. The latter sets other values as the labels.
ticks = ax.set_xticks([0, 250, 500, 750, 1000])
labels = ax.set_xticklabels(['one', 'two', 'three', 'four', 'five'], rotation=30, fontsize='small')
set title:
ax.set_title('My first matplotlib plot')
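The tick, label, and title cards can be combined into one self-contained sketch. The data is random, the name `tick_labels` is illustrative, and the `Agg` backend line is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; drop this in a notebook
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
ax.plot(np.random.randn(1000).cumsum())

# Fix where the ticks go, then replace the default numeric labels.
ax.set_xticks([0, 250, 500, 750, 1000])
tick_labels = ax.set_xticklabels(
    ["one", "two", "three", "four", "five"],
    rotation=30,
    fontsize="small",
)

ax.set_title("My first matplotlib plot")
print([label.get_text() for label in tick_labels])
```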
how do you create a legend
First pass a label to each plot call, then call legend on the axes (f4 here is an axes object):
f4.plot(np.random.randn(50).cumsum(), 'k--', label='black')
f4.plot(np.random.randn(40).cumsum(), 'b--', label='blue')
f4.legend(loc='best')  # or simply f4.legend()
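A self-contained version of the legend card, as a sketch. The axes is named `ax` here rather than `f4`, the name `legend_labels` is illustrative, and the `Agg` backend line is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; drop this in a notebook
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()

# Step 1: give each line a label when plotting it.
ax.plot(np.random.randn(50).cumsum(), "k--", label="black")
ax.plot(np.random.randn(40).cumsum(), "b--", label="blue")

# Step 2: build the legend from those labels; 'best' lets matplotlib pick the spot.
legend = ax.legend(loc="best")
legend_labels = [text.get_text() for text in legend.get_texts()]
print(legend_labels)  # ['black', 'blue']
```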
how to save a plot figure
plt.savefig('figpath.svg')
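A short sketch of saving a figure, writing into a temporary directory so nothing is left in the working folder. The temporary paths and the PNG variant with `dpi` and `bbox_inches` are illustrative additions, and the `Agg` backend line is an assumption so the code runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; drop this in a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1.5, 3.5, -2, 1.6])

out_dir = tempfile.mkdtemp()
svg_path = os.path.join(out_dir, "figpath.svg")
png_path = os.path.join(out_dir, "figpath.png")

# The file format is inferred from the extension.
fig.savefig(svg_path)
# Optional extras: resolution in dots per inch, and trimming surrounding whitespace.
fig.savefig(png_path, dpi=400, bbox_inches="tight")

print(os.path.exists(svg_path), os.path.exists(png_path))  # True True
```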