Importing Data Flashcards
What does the command !ls do?
In IPython, !ls displays the contents of your current directory. The ! prefix passes the rest of the line to the system shell.
What are some of the arguments that np.loadtxt() takes?
Several arguments of np.loadtxt() are useful. delimiter changes the delimiter that loadtxt() expects; for example, use ',' for comma-delimited files and '\t' for tab-delimited files. skiprows specifies how many rows (not indices) to skip. usecols takes a list of the indices of the columns you want to keep.
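A minimal sketch of those three arguments working together. To keep it self-contained, the "file" is an in-memory io.StringIO object holding hypothetical tab-delimited data with one header row; np.loadtxt() accepts file-like objects as well as filenames.

```python
import io
import numpy as np

# Hypothetical tab-delimited data: one header row, then three numeric rows.
text = io.StringIO(
    "id\tx\ty\n"
    "1\t0.5\t10\n"
    "2\t1.5\t20\n"
    "3\t2.5\t30\n"
)

# delimiter='\t' splits on tabs, skiprows=1 skips the header row,
# usecols=[0, 2] keeps only the first and third columns.
data = np.loadtxt(text, delimiter="\t", skiprows=1, usecols=[0, 2])

print(data.shape)  # (3, 2)
```

skiprows=1 matters here: without it, loadtxt() would try to convert the header text to floats and raise an error.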
What does np.genfromtxt() do?
np.genfromtxt() imports flat files whose columns hold different datatypes (for example, numbers and strings), which np.loadtxt() does not handle well. If you pass dtype=None to it, it infers the type of each column.
Import 'titanic.csv' using the function np.genfromtxt() as follows:
import numpy as np
data = np.genfromtxt('titanic.csv', delimiter=',', names=True, dtype=None)
Here, the first argument is the filename, the second specifies the delimiter, names=True tells genfromtxt() that the first row is a header, and dtype=None lets it infer each column's type. Because the columns have different types, data is an object called a structured array. NumPy arrays must contain elements that all have the same type, so the structured array solves this by being a 1D array in which each element is one row of the imported flat file. You can confirm this by checking the array's shape in the shell with np.shape(data).
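A minimal sketch of the structured array, assuming a tiny hypothetical stand-in for titanic.csv held in an io.StringIO object. The encoding='utf-8' argument is an addition not in the original call; it makes string columns come back as str rather than bytes.

```python
import io
import numpy as np

# Hypothetical stand-in for titanic.csv: a header row, then mixed-type columns.
csv = io.StringIO(
    "PassengerId,Survived,Sex,Age\n"
    "1,0,male,22\n"
    "2,1,female,38\n"
    "3,1,female,26\n"
)

# names=True reads field names from the header; dtype=None infers a type per column.
data = np.genfromtxt(csv, delimiter=",", names=True, dtype=None, encoding="utf-8")

print(np.shape(data))    # (3,): one element per row, not a 2D table
print(data.dtype.names)  # the field (column) names taken from the header
print(data["Sex"])       # a whole column, selected by field name
print(data[0])           # a single row
```

Columns are selected by field name (data["Age"]) and rows by integer index (data[0]), which is how the 1D layout still gives access to tabular data.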
Good to remember
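There is also another function, np.recfromcsv(), that behaves similarly to np.genfromtxt(), except that its defaults are delimiter=',', names=True and dtype=None, and it returns a record array. np.recfromcsv() was removed in NumPy 2.0; on current versions, call np.genfromtxt() with those arguments instead.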
How do you convert a DataFrame df into a NumPy array?
df.values converts the DataFrame df into a NumPy array. Current pandas documentation recommends the equivalent method df.to_numpy().
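A short sketch comparing the two, using a small hypothetical DataFrame. Because the columns mix integers and floats, both calls return a single float64 array.

```python
import pandas as pd

# Hypothetical DataFrame with one int column and one float column.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

arr_values = df.values     # attribute form
arr_numpy = df.to_numpy()  # method form recommended by current pandas docs

print(type(arr_values))    # <class 'numpy.ndarray'>
print(arr_numpy.shape)     # (3, 2)
print(arr_numpy.dtype)     # float64: ints are upcast to share one dtype
```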
What is a pickled file?
Some datatypes, such as lists and dictionaries, cannot easily be saved to flat files. If you want your files to be human readable, you may want to save them as text files in a clever manner. JSON files, covered in a later chapter, are a good fit for Python dictionaries.
However, if you only need to import them back into Python, you can serialize them. This simply means converting the object into a sequence of bytes, or a bytestream. A pickled file is a file containing such a bytestream, written with Python's built-in pickle module.
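A minimal round trip with the standard-library pickle module. The dictionary and the file name data.pkl are hypothetical, and the file lives in a temporary directory so the sketch cleans up after itself. Note the binary modes: 'wb' to write and 'rb' to read.

```python
import os
import pickle
import tempfile

# Hypothetical object that would be awkward to store in a flat file.
obj = {"name": "Ada", "scores": [90, 85], "active": True}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.pkl")  # hypothetical file name

    # Serialize: write the object to disk as a bytestream.
    with open(path, "wb") as f:
        pickle.dump(obj, f)

    # Deserialize: import the pickled file back into a Python object.
    with open(path, "rb") as f:
        loaded = pickle.load(f)

print(loaded == obj)  # True
print(type(loaded))   # <class 'dict'>
```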
How do you open an Excel file in pandas?
How do you check the worksheet names?
import pandas as pd
xl = pd.ExcelFile(filename)
xl.sheet_names
How do you load a sheet into a DataFrame by name?
How do you load a sheet into a DataFrame by index?
df1 = xl.parse('2004')
where xl is the pd.ExcelFile object that holds all the worksheets
df2 = xl.parse(0)
Parse the second sheet by index. In doing so, parse only the first column with the usecols parameter (called parse_cols in older versions of pandas), skip the first row, and rename the column 'Country'. Pass the column indices to usecols as a list.
df2 = xl.parse(1, usecols=[0], skiprows=[0], names=['Country'])
How do you correctly import the function SAS7BDAT() from the package sas7bdat?
from sas7bdat import SAS7BDAT
How do you read Stata files as DataFrames?
df = pd.read_stata(filename)
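Since a real .dta file is not included here, this sketch first writes a small hypothetical DataFrame to a temporary Stata file with DataFrame.to_stata() and then reads it back with pd.read_stata(). write_index=False keeps pandas from adding an extra index column to the file.

```python
import os
import tempfile
import pandas as pd

# Hypothetical data to round-trip through the Stata format.
df = pd.DataFrame({
    "country": ["Chad", "Peru", "Laos"],
    "year": [2000, 2001, 2002],
    "gdp": [1.5, 2.25, 3.0],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.dta")  # hypothetical file name
    df.to_stata(path, write_index=False)     # write a .dta file
    back = pd.read_stata(path)               # read it back as a DataFrame

print(back)
```

Integer columns may come back with a narrower dtype (Stata has no 64-bit integer type), so compare values rather than dtypes.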
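How do you make a histogram from a column in a DataFrame?
pd.DataFrame.hist(df[['column_name']])
where 'column_name' is the column to plot; df[['column_name']].hist() is equivalent. Call plt.show() from matplotlib.pyplot to display the figure.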
What is the correct way of using the h5py function, File(), to import the file in h5py_file into an object, h5py_data, for reading only?
import h5py
h5py_data = h5py.File(h5py_file, 'r')
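How do you load a MATLAB file?
import scipy.io
mat = scipy.io.loadmat('albeck_gene_expression.mat')
loadmat() returns a dictionary whose keys are the MATLAB variable names and whose values are the corresponding arrays.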
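Because albeck_gene_expression.mat is not available here, this sketch writes a hypothetical .mat file with scipy.io.savemat() and loads it back. It shows the dictionary structure: besides the variables, loadmat() adds metadata keys that start with "__".

```python
import os
import tempfile
import numpy as np
import scipy.io

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.mat")  # hypothetical file name
    # Save two hypothetical variables: a 1-D array x and a 2x2 matrix M.
    scipy.io.savemat(path, {"x": np.array([1.0, 2.0, 3.0]), "M": np.eye(2)})
    mat = scipy.io.loadmat(path)

print(type(mat))                                         # <class 'dict'>
print(sorted(k for k in mat if not k.startswith("__")))  # ['M', 'x']
print(mat["x"].shape)  # (1, 3): 1-D arrays are saved as row vectors by default
```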
How do you create a connection to a relational database?
# Import necessary module
from sqlalchemy import create_engine
# Create engine: engine
engine = create_engine('sqlite:///Chinook.sqlite')
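Chinook.sqlite may not be on hand, so this sketch uses an in-memory SQLite database ('sqlite:///:memory:') with a small hypothetical Artist table. It shows that the engine is only the starting point: you open a connection from it to run queries. SQL strings are wrapped in sqlalchemy.text(), which recent SQLAlchemy versions require for raw SQL.

```python
from sqlalchemy import create_engine, text

# In-memory SQLite database instead of Chinook.sqlite (assumption for this sketch).
engine = create_engine("sqlite:///:memory:")

# engine.begin() opens a connection and commits when the block exits.
with engine.begin() as con:
    # Hypothetical table and rows, created only for this example.
    con.execute(text("CREATE TABLE Artist (ArtistId INTEGER, Name TEXT)"))
    con.execute(text("INSERT INTO Artist VALUES (1, 'AC/DC'), (2, 'Accept')"))

    # List the tables, then query one of them.
    tables = [row[0] for row in con.execute(
        text("SELECT name FROM sqlite_master WHERE type = 'table'"))]
    rows = con.execute(text("SELECT Name FROM Artist ORDER BY ArtistId")).fetchall()

print(tables)               # ['Artist']
print([r[0] for r in rows]) # ['AC/DC', 'Accept']
```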