504 - 550 Flashcards
pandas.DataFrame.abs()
method returns a DataFrame with the absolute value of each value, i.e. it removes the minus sign from negative numbers.
data = [[-50, 40, 30], [-1, 2, -2]]
df = pd.DataFrame(data)
print(df.abs())
    0   1   2
0  50  40  30
1   1   2   2
pandas.DataFrame.le(other, axis='columns', level=None)
method compares each value in a DataFrame to check whether it is less than or equal to a specified value (or the corresponding value in another DataFrame) and returns a DataFrame of boolean True/False for each comparison.
df = pd.DataFrame([[10, 12, 2], [3, 4, 7]])
print(df.le(7))
       0      1     2
0  False  False  True
1   True   True  True
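The other argument can also be a Series or another DataFrame of the same shape, in which case the comparison is element-wise; a minimal sketch reusing df from above (the second DataFrame is made up for illustration):
other = pd.DataFrame([[9, 12, 5], [2, 10, 6]])
print(df.le(other))
       0     1      2
0  False  True   True
1  False  True  False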
pandas.DataFrame.items() or iteritems()
method returns an iterator over the columns of the DataFrame, yielding (column label, Series) pairs, so we can iterate over each column. iteritems() is a deprecated alias of items().
data = { "firstname": ["Sally", "Mary", "John"], "age": [50, 40, 30] } df = pd.DataFrame(data) for x, y in df.items(): print(x) print(y) firstname 0 Sally 1 Mary 2 John Name: firstname, dtype: object age 0 50 1 40 2 30 Name: age, dtype: int64
pandas.DataFrame.fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None)
Fill NA/NaN values using the specified method.
df = pd.DataFrame([[np.nan, 2, np.nan, 0],
                   [3, 4, np.nan, 1],
                   [np.nan, np.nan, np.nan, np.nan],
                   [np.nan, 3, np.nan, 4]],
                  columns=list("ABCD"))
df
     A    B   C    D
0  NaN  2.0 NaN  0.0
1  3.0  4.0 NaN  1.0
2  NaN  NaN NaN  NaN
3  NaN  3.0 NaN  4.0
df.fillna(0)
     A    B    C    D
0  0.0  2.0  0.0  0.0
1  3.0  4.0  0.0  1.0
2  0.0  0.0  0.0  0.0
3  0.0  3.0  0.0  4.0
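fillna also accepts a per-column dict of fill values, and the method parameter can propagate the last valid value forward; a sketch reusing the df above (note that method= is deprecated in recent pandas in favour of df.ffill()):
df.fillna(value={"A": 0, "B": 1, "C": 2, "D": 3})
     A    B    C    D
0  0.0  2.0  2.0  0.0
1  3.0  4.0  2.0  1.0
2  0.0  1.0  2.0  3.0
3  0.0  3.0  2.0  4.0
df.fillna(method="ffill")   # forward fill: propagate the last valid value down each column
     A    B   C    D
0  NaN  2.0 NaN  0.0
1  3.0  4.0 NaN  1.0
2  3.0  4.0 NaN  1.0
3  3.0  3.0 NaN  4.0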
pandas.Series.unique
Return unique values of Series object.
pd.Series([2, 1, 3, 3], name='A').unique()
array([2, 1, 3])

pd.Series(pd.Categorical(list('baabc'))).unique()
['b', 'a', 'c']
Categories (3, object): ['a', 'b', 'c']

pd.Series(pd.Categorical(list('baabc'), categories=list('abc'), ordered=True)).unique()
['b', 'a', 'c']
Categories (3, object): ['a' < 'b' < 'c']
pandas.DataFrame.ndim
property returns the number of dimensions of the DataFrame.
df = pd.read_csv('data.csv') print(df.ndim) 👉 2
s = pd.Series({'a': 1, 'b': 2, 'c': 3}) s.ndim 👉 1
df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]}) df.ndim 👉 2
pandas.Series.dt.strftime(*args, **kwargs)
used to convert to Index using specified date_format.
rng = pd.date_range(pd.Timestamp("2018-03-10 09:00"), periods=3, freq='s')
rng.strftime('%B %d, %Y, %r')
Index(['March 10, 2018, 09:00:00 AM', 'March 10, 2018, 09:00:01 AM',
       'March 10, 2018, 09:00:02 AM'],
      dtype='object')
result = sr.dt.strftime('%B %d, %Y, %r')
result = sr.dt.strftime('%d %m %Y, %r')
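The sr in the two lines above is assumed to be a Series of datetimes; a minimal sketch with made-up dates (output shown for an English locale):
sr = pd.Series(pd.to_datetime(['2018-03-10 09:00:00', '2018-03-11 10:30:00']))
result = sr.dt.strftime('%B %d, %Y, %r')
print(result)
0    March 10, 2018, 09:00:00 AM
1    March 11, 2018, 10:30:00 AM
dtype: object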
pandas.Series.str.contains(pat, case=True, flags=0, na=None, regex=True)
used to test if pattern or regex is contained within a string of a Series or Index.
s1 = pd.Series(['Mouse', 'dog', 'house and parrot', '23', np.NaN])
s1.str.contains('og', regex=False)
0    False
1     True
2    False
3    False
4      NaN

s1.str.contains('oG', case=True, regex=True)
0    False
1    False
2    False
3    False
4      NaN

s2 = pd.Series(['40', '40.0', '41', '41.0', '35'])
s2.str.contains('.0', regex=True)
0     True
1     True
2    False
3     True
4    False
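Note that with regex=True the '.' in '.0' is a regex wildcard matching any character, which is why '40' above counts as a match; with regex=False the pattern is matched literally (a sketch reusing s2):
s2.str.contains('.0', regex=False)
0    False
1     True
2    False
3     True
4    False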
pandas.DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)
method makes one or more columns the row index of the DataFrame.
df = pd.DataFrame({'month': [1, 4, 7, 10],
                   'year': [2012, 2014, 2013, 2014],
                   'sale': [55, 40, 84, 31]})
df.set_index(['year', 'month'])
            sale
year month
2012 1        55
2014 4        40
2013 7        84
2014 10       31

data = {
  "name": ["Sally", "Mary", "John", "Monica"],
  "age": [50, 40, 30, 40],
  "qualified": [True, False, False, False]
}
df = pd.DataFrame(data)
newdf = df.set_index('name')
print(newdf)
        age  qualified
name
Sally    50       True
Mary     40      False
John     30      False
Monica   40      False
pandas.DataFrame.index
property returns the index information of the DataFrame. The index information contains the labels of the rows. If the rows do not have named indexes, the index property returns a RangeIndex object with the start, stop, and step values.
df = pd.read_csv('data.csv')
print(df.index)
RangeIndex(start=0, stop=169, step=1)
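If the rows do have named labels, the property returns those labels as an Index instead; a minimal sketch with made-up data:
df = pd.DataFrame({"age": [50, 40]}, index=["Sally", "Mary"])
print(df.index)
Index(['Sally', 'Mary'], dtype='object')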
pandas.DataFrame.reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill='')
method allows you to reset the index back to the default 0, 1, 2, etc. index. By default this method keeps the "old" indexes in a column named "index"; to avoid this, use the drop parameter (shown below).
data = { "name": ["Sally", "Mary", "John"], "age": [50, 40, 30], "qualified": [True, False, False] } idx = ["X", "Y", "Z"] df = pd.DataFrame(data, index=idx) newdf = df.reset_index() print(newdf) index name age qualified 0 X Sally 50 True 1 Y Mary 40 False 2 Z John 30 False
df = pd.DataFrame([('bird', 389.0), ('bird', 24.0), ('mammal', 80.5), ('mammal', np.nan)], index=['falcon', 'parrot', 'lion', 'monkey'], columns=('class', 'max_speed')) df.reset_index() index class max_speed 0 falcon bird 389.0 1 parrot bird 24.0 2 lion mammal 80.5 3 monkey mammal NaN
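With drop=True the old index is discarded rather than kept as an 'index' column; reusing the df above:
df.reset_index(drop=True)
    class  max_speed
0    bird      389.0
1    bird       24.0
2  mammal       80.5
3  mammal        NaN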
pandas.DataFrame.sort_index(axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True, ignore_index=False, key=None)
method sorts the DataFrame by the index.
df = pd.DataFrame([1, 2, 3, 4, 5], index=[100, 29, 234, 1, 150], columns=['A'])
df.sort_index()
     A
1    4
29   2
100  1
150  5
234  3

df = pd.DataFrame({"a": [1, 2, 3, 4]}, index=['A', 'b', 'C', 'd'])
df.sort_index(key=lambda x: x.str.lower())
   a
A  1
b  2
C  3
d  4

data = {
  "age": [50, 40, 30, 40, 20, 10, 30],
  "qualified": [True, False, False, False, False, True, True]
}
idx = ["Mary", "Sally", "Emil", "Tobias", "Linus", "John", "Peter"]
df = pd.DataFrame(data, index=idx)
newdf = df.sort_index()
print(newdf)
        age  qualified
Emil     30      False
John     10       True
Linus    20      False
Mary     50       True
Peter    30       True
Sally    40      False
Tobias   40      False
pandas.DataFrame.size
property returns the number of elements in the DataFrame. The number of elements is the number of rows * the number of columns.
In our example the DataFrame has 169 rows and 4 columns: 169 * 4 = 676
df = pd.read_csv('data.csv') print(df.size) 👉 676
s = pd.Series({'a': 1, 'b': 2, 'c': 3}) s.size 👉 3
df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]}) df.size 👉 4
pandas.Series.is_unique
Returns True if the values in the given Series object are unique (i.e. there are no duplicates); otherwise it returns False.
sr = pd.Series(['New York', 'Chicago', 'Toronto', 'Lisbon', 'Chicago'])
# Creating the row axis labels
sr.index = ['City 1', 'City 2', 'City 3', 'City 4', 'City 5']
sr.is_unique 👉 False
pandas.Series.is_monotonic
Returns True if the values in the given Series object are monotonically increasing; otherwise it returns False. (Newer pandas versions use is_monotonic_increasing instead.)
sr = pd.Series(['New York', 'Chicago', 'Toronto', 'Lisbon'])
sr.index = ['City 1', 'City 2', 'City 3', 'City 4']
sr.is_monotonic 👉 False

sr = pd.Series(['1/1/2018', '2/1/2018', '3/1/2018', '4/1/2018'])
sr.index = ['Day 1', 'Day 2', 'Day 3', 'Day 4']
sr.is_monotonic 👉 True  # these are strings, and they happen to be in increasing lexicographic order
pandas.DataFrame.product(axis=None, skipna=True, level=None, numeric_only=None, min_count=0, **kwargs)
method multiplies all values in each column and returns the product for each column. By specifying the column axis (axis='columns'), the product() method works row-wise instead and returns the product of each row (see the sketch below). The product() method does the same as the prod() method.
data = [[10, 18, 11], [13, 15, 8], [9, 20, 3]]
df = pd.DataFrame(data)
print(df.product())
0    1170
1    5400
2     264
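The row-wise variant mentioned above, reusing the same df (axis='columns' is equivalent to axis=1):
print(df.product(axis='columns'))
0    1980
1    1560
2     540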
seaborn.boxplot(*, x=None, y=None, hue=None, data=None, order=None, hue_order=None, orient=None, color=None, palette=None, saturation=0.75, width=0.8, dodge=True, fliersize=5, linewidth=None, whis=1.5, ax=None, **kwargs)
Draw a box plot to show distributions with respect to categories. A box plot (or box-and-whisker plot) shows the distribution of quantitative data in a way that facilitates comparisons between variables or across levels of a categorical variable.
sns.set_theme(style="whitegrid") tips = sns.load_dataset("tips") ax = sns.boxplot(x=tips["total_bill"])
ax = sns.boxplot(x="day", y="total_bill", hue="smoker", data=tips, palette="Set3")
sns.set_style("whitegrid") sns.boxplot(x = 'day', y = 'total_bill', data = tips)
pandas.read_csv(filepath_or_buffer, sep=NoDefault.no_default, delimiter=None, header='infer', names=NoDefault.no_default, index_col=None, usecols=None, squeeze=None, prefix=NoDefault.no_default, mangle_dupe_cols=True, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skipinitialspace=False, skiprows=None, skipfooter=0, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, skip_blank_lines=True, parse_dates=None, infer_datetime_format=False, keep_date_col=False, date_parser=None, dayfirst=False, cache_dates=True, iterator=False, chunksize=None, compression='infer', thousands=None, decimal='.', lineterminator=None, quotechar='"', quoting=0,
doublequote=True, escapechar=None, comment=None, encoding=None, encoding_errors='strict', dialect=None, error_bad_lines=None, warn_bad_lines=None, on_bad_lines=None, delim_whitespace=False, low_memory=True, memory_map=False, float_precision=None, storage_options=None)
Read a comma-separated values (csv) file into DataFrame. Also supports optionally iterating or breaking of the file into chunks.
filepath_or_buffer - Any valid string path is acceptable. The string could be a URL.
sep - Delimiter to use.
header - Row number(s) to use as the column names, and the start of the data.
names - List of column names to use.
index_col - Column(s) to use as the row labels of the DataFrame, either given as string name or column index. If a sequence of int / str is given, a MultiIndex is used.
usecols - Return a subset of the columns. If list-like, all elements must either be positional (i.e. integer indices into the document columns) or strings that correspond to column names.
mangle_dupe_cols - Duplicate columns will be specified as 'X', 'X.1', … 'X.N', rather than 'X'…'X'.
dtype - Data type for data or columns.
converters - Dict of functions for converting values in certain columns. Keys can either be integers or column labels.
skipinitialspace - Skip spaces after delimiter.
skiprows - Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of the file.
skipfooter - Number of lines at the bottom of the file to skip (unsupported with engine='c').
nrows - Number of rows of file to read. Useful for reading pieces of large files.
na_values - Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values.
na_filter - Detect missing value markers (empty strings and the value of na_values). In data without any NAs, passing na_filter=False can improve the performance of reading a large file.
skip_blank_lines - If True, skip over blank lines rather than interpreting as NaN values.
parse_dates - Columns to parse as dates; bool or list of int or names or list of lists or dict, default False.
infer_datetime_format - If True and parse_dates is enabled, pandas will attempt to infer the format of the datetime strings in the columns, and if it can be inferred, switch to a faster method of parsing them.
keep_date_col - If True and parse_dates specifies combining multiple columns, then keep the original columns.
date_parser - Function to use for converting a sequence of string columns to an array of datetime instances.
dayfirst - DD/MM format dates, international and European format.
lineterminator - Character to break file into lines. Only valid with C parser.
escapechar - One-character string used to escape other characters.
comment - Indicates remainder of line should not be parsed. If found at the beginning of a line, the line will be ignored altogether. This parameter must be a single character.
encoding - Encoding to use for UTF when reading/writing (ex. 'utf-8').
dialect - If provided, this parameter will override values (default or not) for the following parameters: delimiter, doublequote, escapechar, skipinitialspace, quotechar, and quoting.
on_bad_lines - {'error', 'warn', 'skip'}. Specifies what to do upon encountering a bad line (a line with too many fields).
delim_whitespace - Specifies whether or not whitespace (e.g. ' ' or '\t') will be used as the sep.
pd.read_csv('data.csv')
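A sketch combining several of the parameters above; the file name and column names are made up for illustration:
df = pd.read_csv('data.csv',
                 sep=',',
                 header=0,
                 index_col='Date',
                 usecols=['Date', 'Duration', 'Calories'],
                 parse_dates=['Date'],
                 nrows=100,
                 na_values=['n/a', '--'])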
pandas.Series.to_dict(into=<class 'dict'>)
Convert Series to {label -> value} dict or dict-like object.
s = pd.Series([1, 2, 3, 4]) s.to_dict() 👉 {0: 1, 1: 2, 2: 3, 3: 4}
pandas.DataFrame.to_dict(orient='dict', into=<class 'dict'>)
Convert the DataFrame to a dictionary. The type of the key-value pairs can be customized with the parameters (see below).
orient - {'dict', 'list', 'series', 'split', 'records', 'index'}
df = pd.DataFrame({'col1': [1, 2], 'col2': [0.5, 0.75]}, index=['row1', 'row2'])
df.to_dict()
{'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}
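The orient parameter changes the shape of the result; two common variants, reusing the df above:
df.to_dict('list')
{'col1': [1, 2], 'col2': [0.5, 0.75]}
df.to_dict('records')
[{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]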