Opening Files Flashcards
What are the two main types of files?
Plain files: Text legible by both humans and computers. They contain only unformatted text characters, with no binary encoding or special formatting (e.g. .txt, .csv files).
Binary files: Not legible by humans; the data is stored in a binary encoding that needs specific software to read it (e.g. .xlsx, .docx, .jpeg).
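One way to see the difference is to look at a file's raw bytes. This minimal sketch writes a small plain text file to a temporary folder (the file name `example.csv` is hypothetical) and reads it back both as text and as raw bytes. For a plain file, the bytes map straight onto readable characters.

```python
import tempfile
from pathlib import Path

# Write a tiny plain text (CSV) file into a temporary folder
folder = Path(tempfile.mkdtemp())
path = folder / "example.csv"  # hypothetical file name
path.write_bytes("name,count\napple,3\n".encode("utf-8"))

# Reading as text gives human-readable characters
text = path.read_text(encoding="utf-8")
print(text)

# Reading as raw bytes shows the same characters, one byte each here
raw = path.read_bytes()
print(raw)  # b'name,count\napple,3\n'
```

Reading a binary file such as .xlsx the same way gives bytes that are not meaningful text. An .xlsx file is actually a compressed ZIP archive.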
What are the two main types of plain file?
.txt: often used for tab separated values (TSV)
.csv: comma separated values (CSV)
How do you read in a .txt file using pandas?
df = pd.read_csv('../Datasets/count.txt', sep="\t")
The data is tab separated, not comma separated, so we pass sep="\t" to tell pandas to split each line on tabs.
How do you read in a .csv file using pandas?
df = pd.read_csv('../Datasets/count.csv')
Our .csv file is called 'count.csv'; we are reading it into pandas and re…
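The .csv and .txt reads differ only in the separator. This sketch uses small in-memory strings via io.StringIO as made-up stand-ins for count.csv and count.txt (their real contents aren't shown here) to show that both give the same DataFrame once the right separator is used.

```python
import io
import pandas as pd

# Made-up stand-ins for the contents of count.csv and count.txt
csv_text = "word,count\napple,3\nbanana,5\n"
tsv_text = "word\tcount\napple\t3\nbanana\t5\n"

# Comma is the default separator, so no sep argument is needed for CSV
df_csv = pd.read_csv(io.StringIO(csv_text))

# Tab separated data needs sep="\t"
df_tsv = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(df_csv.equals(df_tsv))  # True
print(df_csv.shape)           # (2, 2)
```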
How do you read in an excel file using pandas?
df = pd.read_excel('file_path.xlsx')
df = pd.read_excel('file_path.xls')
Both .xls and .xlsx are Excel file formats; .xls is the older one.
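Because the right reader depends on the file extension, one option is a small helper that picks it for you. `load_table` below is a hypothetical helper, not a pandas function, and it assumes .txt files are tab separated as on these cards. Reading .xlsx/.xls files also needs an extra engine package installed (openpyxl for .xlsx, xlrd for .xls).

```python
from pathlib import Path
import pandas as pd

def load_table(path):
    """Hypothetical helper: choose a pandas reader based on the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)            # comma separated (the default)
    if suffix == ".txt":
        return pd.read_csv(path, sep="\t")  # assumes tab separated text
    if suffix in (".xlsx", ".xls"):
        return pd.read_excel(path)          # needs openpyxl / xlrd installed
    raise ValueError(f"Unsupported file type: {suffix}")
```

Usage would look like `df = load_table('../Datasets/count.txt')`. Raising an error for unknown extensions is a design choice so that a wrong file type fails loudly instead of being silently misread.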
Can .csv files be tab separated, and how can you tell from your output?
Yes. You can tell by looking at the data to see whether it has been read in properly. For example, if a tab separated .txt file is read as if it were comma separated, the columns are not split correctly.
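A quick check for a misread file is the number of columns: when tab separated data is read with the default comma separator, everything usually lands in a single column. This sketch, using made-up data, checks df.shape and re-reads with sep="\t" when only one column comes back. The one-column rule is a heuristic and assumes your real data has more than one column.

```python
import io
import pandas as pd

raw = "word\tcount\napple\t3\nbanana\t5\n"  # made-up tab separated data

# Wrong: the default comma separator leaves each row as one long string
df = pd.read_csv(io.StringIO(raw))
print(df.shape)    # (2, 1)
print(df.columns)  # a single column named 'word\tcount'

# Heuristic fix: only one column usually means the wrong separator
if df.shape[1] == 1:
    df = pd.read_csv(io.StringIO(raw), sep="\t")

print(df.shape)    # (2, 2)
```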
How can you check that your data has loaded in correctly and that you have the correct data?
- df.shape: Returns the number of rows and columns in the DataFrame (tuple: (rows, columns)).
- df.head(n): Displays the first n rows of the DataFrame (default is 5).
- df.tail(n): Displays the last n rows of the DataFrame (default is 5).
- df.info(): Prints a concise summary of the DataFrame, including the number of non-null values, column data types, and memory usage.
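All of these need brackets () except df.shape, which is an attribute rather than a method. Without the brackets the methods WON'T run: you just get the method object back instead of the result.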
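The checks above can be tried on a small made-up DataFrame. Note that df.shape is used without brackets, while the methods need them.

```python
import pandas as pd

# Small made-up DataFrame to try the checks on
df = pd.DataFrame({"word": ["apple", "banana", "cherry"], "count": [3, 5, 2]})

print(df.shape)    # attribute, no brackets: (3, 2)
print(df.head(2))  # first 2 rows
print(df.tail(1))  # last row
df.info()          # prints the summary itself, so no print() is needed

# Forgetting the brackets does not run the method; you get the method object
print(df.head)
```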
How do you find the maximum value in a column, and the corresponding value in another column?
max_C1 = df['C1'].max()
corresponding_C2 = df[df['C1'] == max_C1]['C2'].iloc[0]
print(corresponding_C2)
Here C1 is the column we want the maximum for, and C2 is the column holding the corresponding value.
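Here is the same idea on a small made-up DataFrame, alongside an alternative using idxmax(). idxmax() returns the index label of the first maximum, so .loc can pick the matching C2 value directly. Both approaches return the first match if the maximum appears more than once.

```python
import pandas as pd

# Made-up data: the maximum of C1 (9) appears twice
df = pd.DataFrame({"C1": [4, 9, 1, 9], "C2": ["a", "b", "c", "d"]})

# Approach from the card: find the max, then filter the rows that match it
max_C1 = df["C1"].max()
corresponding_C2 = df[df["C1"] == max_C1]["C2"].iloc[0]
print(corresponding_C2)  # b

# Alternative: idxmax() gives the index label of the first maximum
corresponding_C2_alt = df.loc[df["C1"].idxmax(), "C2"]
print(corresponding_C2_alt)  # b
```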
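What's a really useful shortcut when coding in Python?
The TAB key –>|: in Jupyter notebooks it autocompletes variable names, function names and file paths.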
How do you check what files are present within a folder?
! ls File_name/
Lists the names of all the files within this folder.
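`! ls` runs a shell command from a Jupyter notebook, and ls is a macOS/Linux command (the Windows equivalent is dir). A pure Python alternative that works on any system is os.listdir or pathlib. In this sketch a temporary folder with made-up file names stands in for File_name/.

```python
import os
import tempfile
from pathlib import Path

# Temporary folder with made-up files, standing in for File_name/
folder = Path(tempfile.mkdtemp())
for name in ["count.csv", "count.txt"]:
    (folder / name).write_text("placeholder")

# os.listdir returns names in no guaranteed order, so sort them
print(sorted(os.listdir(folder)))  # ['count.csv', 'count.txt']

# pathlib equivalent
print(sorted(p.name for p in folder.iterdir()))
```

In a notebook this would look like `os.listdir('File_name/')`.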
What can’t be in a variable name?
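A variable name may only contain letters, digits and underscores: no spaces or other symbols. It may not start with a digit, and it may not be a reserved keyword such as class or for. Leading, trailing and double underscores are allowed (e.g. _total, max__value_), although names with double underscores at both ends, like __init__, are conventionally kept for Python's special names.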
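Python can check a name for you: str.isidentifier() tests the naming rules, and keyword.iskeyword() catches reserved words. `is_valid_name` is a hypothetical helper combining the two.

```python
import keyword

def is_valid_name(name):
    """Hypothetical helper: True if name can be used as a Python variable name."""
    return name.isidentifier() and not keyword.iskeyword(name)

for name in ["count_2", "_total", "max__value_", "2nd_value", "my value", "class"]:
    print(name, is_valid_name(name))
```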
How do you navigate file paths?
If the file you want to open is in the same folder as the notebook or script you are working in, you only need the file's name. If it is somewhere else, you have to give the path to it.
../ = go up one folder (to the folder containing the current one)
../../ = go up two folders (you don't need to type the names of the folders you go up through)
After this, add the names of the folders your file is in, then the file name itself, e.g. '../Datasets/count.csv'.
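To see where a relative path ends up, posixpath.normpath collapses each ../ against the folder before it. The folder layout below (project/notebooks) is hypothetical; Datasets/count.csv is the example file from earlier cards. posixpath is used so the output uses forward slashes on every operating system.

```python
import posixpath

# Hypothetical layout: the notebook lives in project/notebooks/
notebook_folder = "project/notebooks"

# ../ goes up one folder, to project/, then down into Datasets/
one_up = posixpath.normpath(posixpath.join(notebook_folder, "../Datasets/count.csv"))
print(one_up)  # project/Datasets/count.csv

# ../../ goes up two folders, past project/ as well
two_up = posixpath.normpath(posixpath.join(notebook_folder, "../../Datasets/count.csv"))
print(two_up)  # Datasets/count.csv
```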