Untitled Deck Flashcards
What is the syntax to read a CSV file in pandas?
pd.read_csv('filename.csv')
How do you read an Excel file with pandas?
pd.read_excel('filename.xlsx', sheet_name='Sheet1')
What’s the syntax to save a DataFrame to a CSV file?
df.to_csv('filename.csv', index=False)
How do you read a JSON file in pandas?
pd.read_json('filename.json')
What parameter sets the column delimiter when reading a CSV?
sep=',' or delimiter=','
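A minimal sketch of the read/write cards above, assuming a small semicolon-delimited table held in memory (the `raw` text and its column names are made up for illustration, and `io.StringIO` stands in for a file on disk):

```python
import io

import pandas as pd

# Hypothetical semicolon-delimited data; StringIO replaces a real file path.
raw = "name;score\nana;90\nben;85\n"

# sep tells read_csv which character separates the columns.
df = pd.read_csv(io.StringIO(raw), sep=";")

# With no path argument, to_csv returns the CSV text instead of writing a file.
# index=False leaves the row labels out of the output.
csv_text = df.to_csv(index=False)
print(csv_text)
```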
How do you create a DataFrame from a dictionary?
pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
What’s the syntax to create a Series?
pd.Series([1, 2, 3, 4])
How do you create a DatetimeIndex?
pd.date_range(start='2023-01-01', periods=10, freq='D')
How do you create a DataFrame with specific index values?
pd.DataFrame(data, index=['a', 'b', 'c'])
What’s the syntax to create a MultiIndex DataFrame?
pd.DataFrame(data, index=pd.MultiIndex.from_tuples([('a', 1), ('a', 2), ('b', 1)]))
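As a rough illustration of the MultiIndex card, the sketch below builds a small two-level frame and selects from it. The level names (`letter`, `number`) and the `value` column are assumptions added for readability.

```python
import pandas as pd

# Level names are hypothetical; from_tuples works without them too.
index = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 1)], names=["letter", "number"]
)
df = pd.DataFrame({"value": [10, 20, 30]}, index=index)

# Selecting on the outer level returns every row under that label.
sub = df.loc["a"]

# A full tuple addresses exactly one row.
single = df.loc[("b", 1), "value"]
```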
How do you select a column from a DataFrame?
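df['column_name'] or df.column_name (attribute access works only if the name is a valid Python identifier and does not clash with a DataFrame attribute or method)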
What’s the difference between loc and iloc?
loc uses labels for indexing, iloc uses integer positions
How do you select the rows at positions 5 through 10 (inclusive) with iloc?
df.iloc[5:11] (the stop position is excluded)
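The slicing difference between loc and iloc is easy to get wrong, so here is a small sketch on a made-up four-row frame (the index labels and the `x` column are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({"x": [10, 20, 30, 40]}, index=["a", "b", "c", "d"])

# Label slice: both endpoints are included.
by_label = df.loc["b":"d"]

# Position slice: the stop position is excluded, as in plain Python slicing.
by_position = df.iloc[1:3]

print(by_label["x"].tolist())     # [20, 30, 40]
print(by_position["x"].tolist())  # [20, 30]
```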
How do you select rows where column ‘A’ > 5?
df[df['A'] > 5] or df.loc[df['A'] > 5]
How do you select the first 5 rows of a DataFrame?
df.head(5) or df.iloc[:5]
How do you drop rows with missing values?
df.dropna()
How do you fill missing values with a specific value?
df.fillna(value)
How do you drop duplicate rows?
df.drop_duplicates()
How do you replace all instances of ‘old_value’ with ‘new_value’?
df.replace('old_value', 'new_value')
How do you check for missing values in a DataFrame?
df.isna() or df.isnull()
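The missing-value cards can be tried together on a tiny frame. This is only a sketch: the columns `a` and `b` and the fill values are invented for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", "y", None]})

# isna() gives a boolean frame; summing it counts missing cells per column.
missing_per_column = df.isna().sum()

# dropna() keeps only the rows that have no missing values at all.
dropped = df.dropna()

# fillna() also accepts a dict to use a different fill value per column.
filled = df.fillna({"a": 0.0, "b": "unknown"})
```

None of these calls modify `df`; each returns a new object.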
What’s the syntax for applying a function to each element in a DataFrame?
df.map(func) in pandas 2.1+; df.applymap(func) in older versions (deprecated since 2.1)
How do you apply a function to each column in a DataFrame?
df.apply(func)
How do you apply a function to each element in a Series?
series.map(func)
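To see how the three "apply a function" cards differ, the sketch below runs each on the same small frame. The frame and the lambdas are illustrative, and the `hasattr` check is just one possible way to stay compatible with pandas versions before 2.1, where `DataFrame.map` does not exist.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# apply: func receives each column as a Series (axis=0 is the default).
col_totals = df.apply(lambda s: s.sum())

# axis=1 passes each row to func instead.
row_totals = df.apply(lambda s: s.sum(), axis=1)

# Series.map: func receives one element at a time.
doubled = df["a"].map(lambda v: v * 2)

# Element-wise on a whole DataFrame: map in pandas 2.1+, applymap before that.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
squared = elementwise(lambda v: v ** 2)
```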
How do you rename columns in a DataFrame?
df.rename(columns={'old_name': 'new_name'})
How do you convert a column’s data type?
df['column'] = df['column'].astype('int64')
What’s the basic syntax for a GroupBy operation?
df.groupby('column').agg({'target_column': 'mean'})
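A small worked example of the GroupBy card, assuming a made-up table of teams and points. The second call uses named aggregation, an alternative spelling that lets you choose the output column names.

```python
import pandas as pd

df = pd.DataFrame({"team": ["x", "x", "y"], "points": [10, 20, 30]})

# Dict form: map each target column to an aggregation.
means = df.groupby("team").agg({"points": "mean"})

# Named aggregation: output_name=(source_column, aggregation).
summary = df.groupby("team").agg(
    avg_points=("points", "mean"),
    games=("points", "count"),
)

print(means.loc["x", "points"])  # 15.0
```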
How do you calculate column means in a DataFrame?
df.mean() or df.mean(axis=0)
How do you calculate row sums in a DataFrame?
df.sum(axis=1)
How do you get descriptive statistics for a DataFrame?
df.describe()
How do you create a pivot table in pandas?
pd.pivot_table(df, values='D', index=['A', 'B'], columns=['C'])
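A quick sketch of what a pivot table produces, reusing the card's column letters on invented data (only one index column is used here to keep the output small). The aggregation is written out explicitly, although the mean is also the default.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["r1", "r1", "r2", "r2"],
    "C": ["u", "v", "u", "u"],
    "D": [1, 2, 3, 5],
})

# Rows come from A, columns from the distinct values of C, cells from D.
# Duplicate (A, C) pairs are combined with aggfunc.
table = pd.pivot_table(df, values="D", index="A", columns="C", aggfunc="mean")

print(table.loc["r2", "u"])  # 4.0, the mean of 3 and 5
```

Combinations that never occur in the data, such as (`r2`, `v`) here, come out as NaN.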
How do you concatenate two DataFrames vertically?
pd.concat([df1, df2], axis=0)
How do you merge two DataFrames on a common column?
pd.merge(df1, df2, on='common_column')
What’s the syntax for a left join in pandas?
pd.merge(df1, df2, on='key', how='left')
How do you join DataFrames using their indices?
pd.merge(df1, df2, left_index=True, right_index=True)
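The merge cards above can be compared side by side. This sketch assumes two small made-up frames that share a `key` column; the value columns are hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame({"key": [1, 2, 3], "left_val": ["a", "b", "c"]})
df2 = pd.DataFrame({"key": [2, 3, 4], "right_val": ["x", "y", "z"]})

# Default how="inner": only keys present in both frames survive.
inner = pd.merge(df1, df2, on="key")

# how="left": every row of df1 is kept; unmatched rows get NaN on the right.
left = pd.merge(df1, df2, on="key", how="left")

# Index-based join: match on the row labels instead of a column.
by_index = pd.merge(
    df1.set_index("key"), df2.set_index("key"),
    left_index=True, right_index=True,
)

print(inner["key"].tolist())  # [2, 3]
print(left["key"].tolist())   # [1, 2, 3]
```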
How do you concatenate DataFrames horizontally?
pd.concat([df1, df2], axis=1)
How do you resample a time series to monthly frequency?
df.resample('ME').mean() in pandas 2.2+; df.resample('M').mean() in older versions (the 'M' alias is deprecated since 2.2)
How do you convert a string column to datetime?
df['date'] = pd.to_datetime(df['date_str'])
How do you set a datetime column as index?
df.set_index('date_column', inplace=True)
How do you get the year from a datetime column?
df['date'].dt.year
How do you calculate the difference between two dates?
(df['end_date'] - df['start_date']).dt.days
How do you perform a rolling window calculation?
df.rolling(window=3).mean()
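A minimal sketch of the rolling-window card on a five-day series (the values are invented). It also shows `min_periods`, an optional parameter that controls how many observations a window needs before it produces a result.

```python
import pandas as pd

s = pd.Series(
    [1.0, 2.0, 3.0, 4.0, 5.0],
    index=pd.date_range("2023-01-01", periods=5, freq="D"),
)

# Each value is the mean of the current row and the two before it.
# The first two results are NaN because the window is not yet full.
rolled = s.rolling(window=3).mean()

# min_periods=1 emits a value as soon as one observation is available.
rolled_partial = s.rolling(window=3, min_periods=1).mean()

print(rolled.tolist())          # [nan, nan, 2.0, 3.0, 4.0]
print(rolled_partial.tolist())  # [1.0, 1.5, 2.0, 3.0, 4.0]
```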
What’s the syntax for creating a crosstab?
pd.crosstab(df['A'], df['B'])
How do you reshape data from wide to long format?
pd.melt(df, id_vars=['A'], value_vars=['B', 'C'])
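To make the wide-to-long reshape concrete, here is a sketch that reuses the card's column letters on a made-up two-row frame:

```python
import pandas as pd

wide = pd.DataFrame({"A": ["r1", "r2"], "B": [1, 2], "C": [3, 4]})

# Each (row, value column) pair becomes its own row.
# By default the new columns are named "variable" and "value".
long = pd.melt(wide, id_vars=["A"], value_vars=["B", "C"])

print(long["variable"].tolist())  # ['B', 'B', 'C', 'C']
print(long["value"].tolist())     # [1, 2, 3, 4]
```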
How do you create dummies (one-hot encoding) from a categorical column?
pd.get_dummies(df['category_column'])
How do you calculate correlation between columns?
df.corr() (pass numeric_only=True if the DataFrame has non-numeric columns)
How do you display all columns of a DataFrame?
pd.set_option('display.max_columns', None)
What method shows basic information about a DataFrame?
df.info()
How do you check the memory usage of a DataFrame?
df.memory_usage(deep=True)
How do you reset a DataFrame’s index?
df.reset_index()
How do you save a DataFrame to an HDF5 store?
df.to_hdf('store.h5', key='df')