Categorical Variables Flashcards
What is the .value_counts()
the function returns an object containing counts of unique values. The resulting object will be in descending order so that the first element is the most frequently-occurring element.
dataframe[‘column’]= dateframe[‘column’].value.counts()
What is the .str.strip()?
Removes the white space characters
What is qcut()?
“Quantile-based discretization function.” This basically means that .qcut() tries to divide up the underlying data into equal-sized bins. The function defines the bins using percentiles based on the distribution of the data
How do you collapse data categories?
- create the range: range=
- Create names
- dataframe[‘column’]= pd.cut(dataframe[‘other cloumn’}, bins= range, labels= names)
- dataframe[[‘column’, ‘other colum]]
What is the pd.cut()?
cut() function is used to separate the array elements into different bins . The cut function is mainly used to perform statistical analysis on scalar data
How do you map fewer categories?
- Create a mapping dictionary
- dataframe[‘column’]= dataframe[‘column’].replace(mapping)
- dataframe[‘column’].unique()
what is unique()?
the function is used to find the unique elements of an array. Returns the sorted unique elements of an array.