Opening Files Flashcards

1
Q

What are the two main types of files?

A

Plain text files: Readable by both humans and computers. They are made entirely of text characters, with no binary encoding or special formatting (e.g. .txt and .csv files).

Binary files: Not directly readable by humans. The data is stored in a binary encoding that needs a suitable program to interpret (e.g. .xlsx, .docx, .jpeg).
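
As a rough illustration of the difference, the sketch below writes a small text file and reads it back twice, once in text mode and once in binary mode. The file name and its contents are made up for the example.

```python
import os
import tempfile

# Write a tiny comma-separated file to a temporary folder (hypothetical data).
folder = tempfile.mkdtemp()
path = os.path.join(folder, "example.csv")
with open(path, "w", encoding="utf-8", newline="") as f:
    f.write("name,count\napple,3\n")

# Text mode ("r") decodes the bytes on disk into a str.
with open(path, "r", encoding="utf-8", newline="") as f:
    as_text = f.read()

# Binary mode ("rb") returns the raw bytes without decoding them.
with open(path, "rb") as f:
    as_bytes = f.read()

print(type(as_text).__name__, repr(as_text))
print(type(as_bytes).__name__, as_bytes)
```

A plain text file still looks sensible as raw bytes, whereas a file such as .xlsx or .jpeg would show up as unreadable byte sequences.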

2
Q

What are the two main types of plain file?

A

.txt: here, tab-separated values
.csv: comma-separated values

3
Q

How do you read in a .txt file using pandas?

A

df = pd.read_csv('../Datasets/count.txt', sep="\t")

We need to tell pandas that the data is tab separated, not comma separated.

4
Q

How do you read in a .csv file using pandas?

A

df = pd.read_csv('../Datasets/count.csv')

Our .csv file is called 'count.csv'; we are reading it into pandas and re…

5
Q

How do you read in an excel file using pandas?

A

df = pd.read_excel('file_path.xlsx')

df = pd.read_excel('file_path.xls')

Both .xls and .xlsx are Excel file formats; .xls is just the older one.

6
Q

Can .csv files be tab separated?

What if this is your output?

A

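Yes. You can tell by looking at the data to see whether it has been read in properly.

In the photo example, the .txt file is misread as a comma-separated file, so the values are not split into separate columns.

After rereading it with sep="\t", the columns are split as expected.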
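
The misread can be reproduced with a small sketch. The data below is invented and held in memory with io.StringIO rather than read from a real file, so the column names and values are assumptions for illustration only.

```python
import io

import pandas as pd

# Hypothetical tab-separated data standing in for a real file.
raw = "gene\tcount\nA\t10\nB\t25\n"

# The default separator is a comma, so each line lands in a single column.
misread = pd.read_csv(io.StringIO(raw))

# Telling pandas that the separator is a tab splits the columns properly.
fixed = pd.read_csv(io.StringIO(raw), sep="\t")

print(misread.shape)  # (2, 1)
print(fixed.shape)    # (2, 2)
```

A single column whose name contains several field names run together is usually the sign that the wrong separator was used.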

7
Q

What are the ways of checking your data has loaded in correctly/ that you have the correct data?

A
  • df.shape: Returns the number of rows and columns in the DataFrame as a tuple (rows, columns).
  • df.head(n): Displays the first n rows of the DataFrame (default is 5).
  • df.tail(n): Displays the last n rows of the DataFrame (default is 5).
  • df.info(): Provides a concise summary of the DataFrame, including the number of non-null values, column data types, and memory usage.
  • df.describe(): Provides the count, mean, std and other summary statistics of the numeric columns.

All of these need brackets () except shape, which is an attribute rather than a method. Without the brackets, head, tail, info and describe are not actually called, so you do not get the result you expect.
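
A minimal sketch of these checks on a small made-up DataFrame (the column names C1 and C2 and the values are placeholders):

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"C1": [4, 9, 2], "C2": ["a", "b", "c"]})

print(df.shape)        # attribute, so no brackets
print(df.head(2))      # method: first 2 rows
print(df.tail(1))      # method: last row
df.info()              # prints its summary directly
print(df.describe())   # summary statistics for the numeric column C1

# Without brackets you get the method object itself, not the rows.
print(callable(df.head))
```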

8
Q

How to find a maximum value in a column, and the corresponding value in another column?

A

max_C1 = df['C1'].max()
corresponding_C2 = df[df['C1'] == max_C1]['C2'].iloc[0]
print(corresponding_C2)

Here C1 is the column we want the maximum of, and C2 is the corresponding column.
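
One possible alternative, sketched here on made-up data, is to use idxmax() to get the index label of the maximum and look up the other column with .loc. This is offered as an option rather than the intended answer to the card; both approaches return the value from the first row where the maximum occurs.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"C1": [4, 9, 2], "C2": ["a", "b", "c"]})

# Boolean-mask approach, as in the card.
max_C1 = df["C1"].max()
via_mask = df[df["C1"] == max_C1]["C2"].iloc[0]

# idxmax() returns the index label of the first maximum; .loc looks up C2 there.
via_idxmax = df.loc[df["C1"].idxmax(), "C2"]

print(max_C1, via_mask, via_idxmax)
```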

9
Q

What's a really useful shortcut in Python?

A

The TAB key (→|): press it to autocomplete names and file paths, for example in a Jupyter notebook.

10
Q

How do you check what files are present within a folder?

A

!ls folder_name/

This lists the names of all the files within that folder. The ! prefix runs a shell command from inside a notebook.
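
If a shell command is not available, a similar listing can be obtained from Python itself. The sketch below builds a throwaway folder so that it runs anywhere; the file names are invented for the example.

```python
import os
import tempfile

# Build a throwaway folder with two empty files (hypothetical names).
folder = tempfile.mkdtemp()
for name in ("count.csv", "count.txt"):
    open(os.path.join(folder, name), "w").close()

# os.listdir returns the entry names in the folder, in arbitrary order,
# so they are sorted here for a stable result.
names = sorted(os.listdir(folder))
print(names)
```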

11
Q

What can’t be in a variable name?

A

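In Python, a variable name may not start with a digit, may not contain spaces, and may only contain letters, digits and underscores. It also cannot be a reserved keyword such as for or class. Underscores are allowed anywhere in the name, although names that start with an underscore or use double underscores carry special conventions and are best avoided for ordinary variables.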
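
Python can report for itself whether a string would be accepted as a variable name. The helper is_valid_name below is a hypothetical name introduced for this sketch; it combines the built-in str.isidentifier() with the standard keyword module.

```python
import keyword


def is_valid_name(name):
    """Return True if name can be used as a Python variable name."""
    return name.isidentifier() and not keyword.iskeyword(name)


for candidate in ["count", "count_2", "2count", "my count", "class", "_count"]:
    print(candidate, is_valid_name(candidate))
```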

12
Q

How do you navigate file paths?

A

If the file you want to open is in the same folder as the file you are working in, only its name is needed. If it is somewhere else, you have to say where to find it.

../ = go up one folder
../../ = go up two folders (you don't need to put the names of the folders you go up through)

After this, you put the names of the folders that lead down to your file, followed by the file name.
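
To see how a relative path resolves, the sketch below joins it onto a starting folder and normalises the result. The starting folder /home/user/project/notebooks is an assumption made up for the example, and posixpath is used so the output looks the same on any operating system.

```python
import posixpath

# Hypothetical location of the notebook that is doing the reading.
notebook_folder = "/home/user/project/notebooks"

# "../" climbs one folder up, then the path goes down into Datasets.
relative = "../Datasets/count.csv"
resolved = posixpath.normpath(posixpath.join(notebook_folder, relative))
print(resolved)  # /home/user/project/Datasets/count.csv

# "../../" climbs two folders up.
two_up = posixpath.normpath(posixpath.join(notebook_folder, "../../count.csv"))
print(two_up)  # /home/user/count.csv
```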
