Importing Data Flashcards

Question 1

Q

What does the command ! ls do?

Answer

A

In IPython, the ! prefix sends a command to the system shell, so ! ls displays the contents of your current directory.

Question 2

Q

What are some of the arguments that np.loadtxt() takes?

Answer

A

Several arguments of np.loadtxt() are useful: delimiter changes the delimiter that loadtxt() expects, for example ',' for comma-delimited and '\t' for tab-delimited files; skiprows specifies how many rows (not indices) you wish to skip; and usecols takes a list of the indices of the columns you wish to keep.
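As a minimal sketch of these three arguments, the snippet below parses a small made-up comma-delimited text (an io.StringIO object standing in for a real file), skipping the header row and keeping only the first and third columns:

```python
import io

import numpy as np

# Hypothetical three-column file with a one-line header
csv_text = io.StringIO("a,b,c\n1,2,3\n4,5,6\n")

# Skip the header row and keep only columns 0 and 2
data = np.loadtxt(csv_text, delimiter=",", skiprows=1, usecols=[0, 2])
print(data)
# [[1. 3.]
#  [4. 6.]]
```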

Question 3

Q

What does np.genfromtxt() do?

Answer

A

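np.genfromtxt() loads data from a text file and, unlike np.loadtxt() (which expects every value to have the same type, float by default), it can handle columns of different types, such as numbers and strings. If you pass dtype=None to it, it will figure out what type each column should be.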

Import 'titanic.csv' using the function np.genfromtxt() as follows:

import numpy as np
data = np.genfromtxt('titanic.csv', delimiter=',', names=True, dtype=None)

Here, the first argument is the filename, the second specifies the delimiter, names=True tells genfromtxt() that the first row is a header, and dtype=None lets it infer each column's type. Because the columns have different types, data is an object called a structured array. NumPy arrays must contain elements of a single type, so a structured array solves this by being a 1D array in which each element is one row of the imported flat file. You can check this by running np.shape(data) in the shell.
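To see a structured array without titanic.csv at hand, the sketch below uses a tiny made-up CSV with a header and mixed column types (the column names are hypothetical). The encoding='utf-8' argument is an addition not used above; it makes the string column come back as str rather than bytes:

```python
import io

import numpy as np

# Hypothetical mixed-type CSV: one string, one integer and one float column
csv_text = io.StringIO("name,age,fare\nAlice,29,71.28\nBob,35,8.05\n")

data = np.genfromtxt(csv_text, delimiter=",", names=True, dtype=None, encoding="utf-8")

print(np.shape(data))    # (2,): a 1D array with one element per row
print(data.dtype.names)  # ('name', 'age', 'fare')
print(data["age"])       # [29 35]: columns are accessed by name
```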

Question 4

Q

Good to remember

Answer

A

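There is also np.recfromcsv(), which behaves like np.genfromtxt() except that its defaults are dtype=None, delimiter=',' and names=True. Note that np.recfromcsv() was removed from the main NumPy namespace in NumPy 2.0; calling np.genfromtxt() with those arguments does much the same job.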

Question 5

Q

How do you convert a DataFrame df into a NumPy array?

Answer

A

df.values converts the DataFrame df into a NumPy array. In recent pandas versions, df.to_numpy() is the recommended way to do the same.
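A quick check with a small made-up DataFrame, showing that df.values and df.to_numpy() (the newer pandas method for the same conversion) produce the same array:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})

arr = df.values       # NumPy array, one row per DataFrame row
arr2 = df.to_numpy()  # same result via the newer method

print(type(arr).__name__, arr.shape)  # ndarray (3, 2)
```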

Question 6

Q

What is a pickled file?

Answer

A

There are a number of data types that cannot easily be saved to flat files, such as lists and dictionaries. If you want your files to be human-readable, you may want to save them in a structured text format. JSON files, which you will see in a later chapter, are well suited to Python dictionaries.

However, if you merely want to be able to import them into Python, you can serialize them, which simply means converting the object into a sequence of bytes, or bytestream. A pickled file is a file holding an object serialized this way with Python's pickle module.
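Python's standard-library pickle module is the usual way to do this serialization. Below is a minimal round trip; the dictionary and the temporary file name are made up for illustration. Pickle files are opened in binary mode ('wb' to write, 'rb' to read) because they hold bytes:

```python
import os
import pickle
import tempfile

# A made-up object that would be awkward to store in a flat file
data = {"Aug": 85, "Sep": 72, "Oct": [1, 2, 3]}

path = os.path.join(tempfile.mkdtemp(), "data.pkl")  # hypothetical file name

# Serialize the object to a bytestream on disk
with open(path, "wb") as f:
    pickle.dump(data, f)

# Import it back into Python
with open(path, "rb") as f:
    loaded = pickle.load(f)

print(loaded == data)  # True
```

Only load pickled files from sources you trust, because unpickling can execute arbitrary code.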

Question 7

Q

How do you open an Excel file in pandas?

How do you check the worksheet names?

Answer

A

import pandas as pd
xl = pd.ExcelFile(filename)

xl.sheet_names

Question 8

Q

How do you load a sheet into a DataFrame by name?

How do you load a sheet into a DataFrame by index?

Answer

A

df1 = xl.parse('2004')

where xl is the pd.ExcelFile object (not a DataFrame) that contains all the worksheets, and '2004' is the name of the sheet.

df2 = xl.parse(0)
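Putting Questions 7 and 8 together, the sketch below first writes a small made-up workbook (this step assumes an Excel engine such as openpyxl is installed; the sheet names and columns are hypothetical), then opens it with pd.ExcelFile and loads one sheet by name and one by index:

```python
import os
import tempfile

import pandas as pd

# Write a hypothetical two-sheet workbook (assumes openpyxl is installed)
path = os.path.join(tempfile.mkdtemp(), "example.xlsx")
with pd.ExcelWriter(path) as writer:
    pd.DataFrame({"Country": ["A", "B"], "value": [1, 2]}).to_excel(
        writer, sheet_name="2002", index=False
    )
    pd.DataFrame({"Country": ["A", "B"], "value": [3, 4]}).to_excel(
        writer, sheet_name="2004", index=False
    )

xl = pd.ExcelFile(path)
names = xl.sheet_names  # ['2002', '2004']
df1 = xl.parse("2004")  # load a sheet by name
df2 = xl.parse(0)       # load a sheet by index: the first sheet, '2002'
xl.close()
print(names)
```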

Question 9

Q

Parse the second sheet by index. In doing so, parse only the first column with the usecols parameter (called parse_cols in older pandas versions), skip the first row and rename the column 'Country'. The argument passed to usecols also needs to be of type list.

Answer

A

df2 = xl.parse(1, usecols=[0], skiprows=[0], names=['Country'])

Question 10

Q

How do you correctly import the class SAS7BDAT from the package sas7bdat?

Answer

A

from sas7bdat import SAS7BDAT

Question 11

Q

How do you read Stata files as DataFrames?

Answer

A

pd.read_stata(filename)
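Because pandas can also write Stata files, a self-contained round trip is easy to sketch; the DataFrame and the temporary file name below are made up:

```python
import os
import tempfile

import pandas as pd

# Hypothetical data written to a temporary .dta file
df = pd.DataFrame({"country": ["A", "B"], "value": [1.5, 2.5]})
path = os.path.join(tempfile.mkdtemp(), "example.dta")
df.to_stata(path, write_index=False)

# Read the Stata file back as a DataFrame
df_stata = pd.read_stata(path)
print(df_stata)
```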

Question 12

Q

How do you make a histogram from a column in a DataFrame?

Answer

A

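pd.DataFrame.hist(df[['column_name']])

where 'column_name' is the column you want to plot; call plt.show() (after import matplotlib.pyplot as plt) to display the figure.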
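A minimal sketch with a made-up column named 'P' (a hypothetical name). It selects matplotlib's non-interactive Agg backend so it runs without a display; in an interactive session you would display the figure with plt.show():

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no window is needed

import pandas as pd

df = pd.DataFrame({"P": [1.0, 2.0, 2.5, 3.0, 3.5, 4.0]})

# Double brackets select a one-column DataFrame rather than a Series;
# hist() returns a NumPy array of the matplotlib Axes it drew on
axes = pd.DataFrame.hist(df[["P"]])
```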

Question 13

Q

What is the correct way of using the h5py function, File(), to import the file in h5py_file into an object, h5py_data, for reading only?

Answer

A

h5py_data = h5py.File(h5py_file, 'r')
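A self-contained sketch, assuming h5py is installed: it first writes a small made-up HDF5 file containing one dataset named 'strain' (a hypothetical name), then opens it read-only with mode 'r' and lists its top-level keys:

```python
import os
import tempfile

import h5py
import numpy as np

# Create a hypothetical HDF5 file to read back
h5py_file = os.path.join(tempfile.mkdtemp(), "example.h5")
with h5py.File(h5py_file, "w") as f:
    f.create_dataset("strain", data=np.arange(5))

# Open it for reading only
h5py_data = h5py.File(h5py_file, "r")
keys = list(h5py_data.keys())     # top-level groups and datasets
strain = h5py_data["strain"][:]   # [:] reads the dataset into a NumPy array
h5py_data.close()
print(keys, strain)
```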

Question 14

Q

How do you load a MATLAB file?

Answer

A

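import scipy.io
mat = scipy.io.loadmat('albeck_gene_expression.mat')

loadmat() returns a dictionary whose keys are the names of the MATLAB variables and whose values are the objects assigned to them.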
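Since albeck_gene_expression.mat is not bundled here, the sketch below writes a tiny made-up .mat file with scipy.io.savemat and loads it back; the variable name 'x' is hypothetical. Besides the saved variables, loadmat() also adds metadata keys such as '__header__':

```python
import os
import tempfile

import numpy as np
import scipy.io

# Save a hypothetical MATLAB variable named 'x'
path = os.path.join(tempfile.mkdtemp(), "example.mat")
scipy.io.savemat(path, {"x": np.array([[1.0, 2.0], [3.0, 4.0]])})

# Load it back: the result is a plain Python dict
mat = scipy.io.loadmat(path)
print(type(mat).__name__)  # dict
print(mat["x"].shape)      # (2, 2)
```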

Question 15

Q

How do you create a connection to a relational database?

Answer

A

# Import necessary module
from sqlalchemy import create_engine

# Create engine: engine
engine = create_engine('sqlite:///Chinook.sqlite')

Question 16

Q

How do you view all tables in a relational database?

Answer

A

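table_names = engine.table_names()

(engine.table_names() is deprecated in SQLAlchemy 1.4 and removed in 2.0; there, use sqlalchemy.inspect(engine).get_table_names() instead.)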
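A runnable sketch of listing tables with sqlalchemy.inspect(), the approach newer SQLAlchemy versions use (assuming version 1.4 or later). A temporary SQLite file stands in for Chinook.sqlite, and the two table names are made up:

```python
import os
import tempfile

from sqlalchemy import create_engine, inspect, text

# Temporary SQLite database standing in for a real file such as Chinook.sqlite
path = os.path.join(tempfile.mkdtemp(), "example.sqlite")
engine = create_engine("sqlite:///" + path)

# Create two hypothetical tables so there is something to list
with engine.begin() as con:
    con.execute(text("CREATE TABLE Album (AlbumId INTEGER)"))
    con.execute(text("CREATE TABLE Artist (ArtistId INTEGER)"))

table_names = inspect(engine).get_table_names()
print(table_names)  # ['Album', 'Artist']
```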

Question 17

Q

What is the workflow of SQL querying?

Answer

A

Workflow of SQL querying

● Import packages and functions

● Create the database engine

● Connect to the engine

● Query the database

● Save query results to a DataFrame

● Close the connection
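The six steps above can be sketched with SQLAlchemy (assuming version 1.4 or later, where raw SQL strings are wrapped in text()). To keep it self-contained, a temporary SQLite file and a small Orders table stand in for a real database; the table and its columns are made up:

```python
import os
import tempfile

import pandas as pd
from sqlalchemy import create_engine, text

# 1. Packages and functions are imported above

# 2. Create the database engine (temporary SQLite file, for illustration)
path = os.path.join(tempfile.mkdtemp(), "example.sqlite")
engine = create_engine("sqlite:///" + path)

# 3. Connect to the engine
con = engine.connect()

# Set up a hypothetical Orders table so there is something to query
con.execute(text("CREATE TABLE Orders (OrderId INTEGER, Amount REAL)"))
con.execute(text("INSERT INTO Orders VALUES (1, 9.5), (2, 20.0)"))

# 4. Query the database
rs = con.execute(text("SELECT * FROM Orders"))

# 5. Save the query results to a DataFrame, using the result's column names
df = pd.DataFrame([tuple(row) for row in rs.fetchall()], columns=list(rs.keys()))

# 6. Close the connection
con.close()
print(df)
```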

Question 18

Q

How do you query a SQL database using pandas?

Answer

A

df = pd.read_sql_query("SELECT * FROM Orders", engine)
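pd.read_sql_query() also accepts a plain sqlite3 connection from the standard library, which makes a dependency-light sketch possible; the Orders table and its rows below are made up:

```python
import sqlite3

import pandas as pd

# In-memory SQLite database with a hypothetical Orders table
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Orders (OrderId INTEGER, Amount REAL)")
con.executemany("INSERT INTO Orders VALUES (?, ?)", [(1, 9.5), (2, 20.0)])

# One call runs the query and returns the result as a DataFrame
df = pd.read_sql_query("SELECT * FROM Orders", con)
con.close()
print(df)
```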
