Importing Flat Files & Other Data Flashcards
Importing Data in Python (Part 1)
how to access the system shell in IPython (on DataCamp)
prefix the shell command with !
display directory contents
! ls
open a text file as read-only
open('file.txt', 'r')
print the contents of an open file
print(file.read())
check if a file is closed
file.closed
close a file
file.close()
alternative to opening and closing a file
context manager: with open('file.txt') as file:
read one line of a file
file.readline()
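The file cards above can be tried together in one short sketch. The temporary file and its two lines of text are made up for illustration; only open(), read(), readline(), closed and close() come from the cards.

```python
import os
import tempfile

# Create a small throwaway text file so the example is self-contained
# (the path and contents are hypothetical).
path = os.path.join(tempfile.mkdtemp(), "file.txt")
with open(path, "w") as f:
    f.write("first line\nsecond line\n")

# Manual open/close: the file stays open until close() is called.
file = open(path, "r")
text = file.read()                 # whole file as one string
was_closed_before = file.closed    # False while the file is open
file.close()
is_closed_after = file.closed      # True after close()

# Context manager: the file is closed automatically when the block ends.
with open(path, "r") as file:
    first = file.readline()        # one line, including the trailing newline
    second = file.readline()       # the next call returns the next line

print(text)
print(first, second)
```

The context manager form is usually preferred because the file is closed even if an error occurs inside the block.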
flat files
plain-text files holding tabular data (rows of records, columns of fields) without the structured relationships a database would have
packages to import flat files
NumPy or pandas
how to import a flat file with NumPy
np.loadtxt(file, delimiter=, skiprows=, usecols=, dtype=)
tab delimiter
'\t'
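A minimal sketch of np.loadtxt() with the arguments listed above. The tab-delimited data is invented, and io.StringIO stands in for a file on disk.

```python
import io
import numpy as np

# Hypothetical tab-delimited data with a one-line header;
# io.StringIO is used here in place of a real file.
data = io.StringIO("a\tb\tc\n1\t2\t3\n4\t5\t6\n")

arr = np.loadtxt(
    data,
    delimiter="\t",   # tab-separated values
    skiprows=1,       # skip the header row
    usecols=[0, 2],   # keep only the first and third columns
    dtype=int,        # parse every value as an integer
)
print(arr)
```

np.loadtxt() works best when every column shares one datatype; mixed columns are the case for np.genfromtxt().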
how to import mixed datatypes with NumPy
np.genfromtxt(file, delimiter=, names=, dtype=None)
names argument
names=True tells genfromtxt() that the first row is a header containing the column names
what does genfromtxt() produce
a structured array: a 1D array where each element is one row of the imported flat file
access row of a structured array
array[index]
access column of a structured array
array['Column name']
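A small sketch of np.genfromtxt() on mixed datatypes, showing row and column access on the resulting structured array. The CSV text and column names are made up; encoding="utf-8" is an added assumption so that text columns come back as str rather than bytes.

```python
import io
import numpy as np

# Hypothetical CSV with a header row and mixed column types.
data = io.StringIO("name,age,score\nana,31,88.5\nben,25,92.0\n")

# names=True reads column names from the header row;
# dtype=None lets NumPy infer a type for each column.
arr = np.genfromtxt(data, delimiter=",", names=True, dtype=None, encoding="utf-8")

print(arr.shape)      # 1D: one element per row of the file
print(arr[0])         # access a row by index
print(arr["age"])     # access a column by name
```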
similar to genfromtxt() with default argument dtype=None
np.recfromcsv()
np.recfromcsv() defaults
delimiter=',', names=True, dtype=None
importing flat file with pandas as DataFrame
pd.read_csv('file')
converting a DataFrame to numpy array
df.values (or df.to_numpy())
missing values in a DataFrame
NA or NaN (use the na_values argument to specify strings that should be read as missing)
pandas equivalent of delimiter
sep=
comment argument
ignores everything on a line after the given character (e.g. comment='#')
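The pandas cards above (sep, comment, na_values, df.values) can be combined in one hedged sketch. The semicolon-delimited data and the 'Nothing' placeholder for missing values are invented for illustration.

```python
import io
import pandas as pd

# Hypothetical semicolon-delimited data with a comment line, an inline
# comment, and the string 'Nothing' standing in for a missing value.
raw = io.StringIO(
    "# exported by hand\n"
    "id;value\n"
    "1;10\n"
    "2;Nothing\n"
    "3;30#checked\n"
)

df = pd.read_csv(
    raw,
    sep=";",              # pandas equivalent of delimiter
    comment="#",          # ignore the rest of a line after '#'
    na_values="Nothing",  # read this string as NaN
)

print(df)
print(df.values)          # the same data as a NumPy array
```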
explore working directory in Python
import os
os.listdir(os.getcwd())
importing pickle files
import pickle
pickle.load(file) (after opening the file in binary mode, 'rb', with a context manager)
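A round-trip sketch for pickle files: the dictionary and the temporary path are made up, and pickle.dump() is included only so that there is a file to load.

```python
import os
import pickle
import tempfile

# Hypothetical object and path, used only to produce a pickle file.
path = os.path.join(tempfile.mkdtemp(), "data.pkl")
original = {"June": 69.4, "Aug": 85, "Airline": 8}

with open(path, "wb") as file:      # 'wb': write, binary
    pickle.dump(original, file)

with open(path, "rb") as file:      # 'rb': read, binary
    loaded = pickle.load(file)

print(loaded)
print(type(loaded))
```

Pickle files are bytes, not text, which is why the mode is 'rb' rather than 'r'.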
import Excel with pandas
pd.ExcelFile(file)
Excel sheet names
spreadsheet.sheet_names
import a given sheet
spreadsheet.parse('sheet name')
how to import SAS files
from sas7bdat import SAS7BDAT
file.to_data_frame() (called on the opened SAS7BDAT object)
context manager for SAS files
with SAS7BDAT('file') as file:
import stata (.dta) files with pandas
pd.read_stata('file')
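A round-trip sketch for Stata files. The DataFrame and path are hypothetical, and DataFrame.to_stata() is used only to create a .dta file to read back.

```python
import os
import tempfile
import pandas as pd

# Hypothetical data written out as a .dta file so there is something to import.
path = os.path.join(tempfile.mkdtemp(), "example.dta")
pd.DataFrame({"x": [1, 2, 3], "y": [0.5, 1.5, 2.5]}).to_stata(path, write_index=False)

# Import the Stata file as a DataFrame.
df = pd.read_stata(path)
print(df)
```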
importing HDF5 files
h5py.File(file, 'r')
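A sketch of reading an HDF5 file with h5py, assuming the h5py package is installed. The file, the group name 'measurements' and the dataset name 'strain' are invented; the file is written first so the example is self-contained.

```python
import os
import tempfile
import h5py
import numpy as np

# Hypothetical file with one group containing one dataset.
path = os.path.join(tempfile.mkdtemp(), "example.hdf5")
with h5py.File(path, "w") as f:
    f.create_dataset("measurements/strain", data=np.arange(5.0))

# Open read-only; the file object behaves like a nested dictionary.
with h5py.File(path, "r") as f:
    top_level = list(f.keys())             # names of the top-level groups
    strain = f["measurements/strain"][:]   # slicing copies the data into a NumPy array

print(top_level)
print(strain)
```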
importing MATLAB files
import scipy.io
scipy.io.loadmat('filename')
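A round-trip sketch for MATLAB files. The variable name 'x' and the path are made up, and scipy.io.savemat() is used only to create a .mat file to load.

```python
import os
import tempfile
import numpy as np
import scipy.io

# Hypothetical .mat file holding a single variable named 'x'.
path = os.path.join(tempfile.mkdtemp(), "example.mat")
scipy.io.savemat(path, {"x": np.array([1.0, 2.0, 3.0])})

# loadmat() returns a dictionary: keys are MATLAB variable names,
# values are NumPy arrays. Keys starting with '__' are file metadata.
mat = scipy.io.loadmat(path)
variables = [key for key in mat if not key.startswith("__")]
x = mat["x"]

print(variables)
print(x.shape)   # 1D input comes back as a 2D row, as in MATLAB
```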