Combining datasets Flashcards

1
Q

What are the three main ways to combine data?

A

concat()
merge()
join()

2
Q

What is concat() and what does each of its arguments mean?

A

pd.concat()
appends two (or more) dataframes, either one below the other (axis=0) or
next to each other (axis=1).
The function takes the form pd.concat([dataframes], axis=..., join=..., keys=...).

  • [dataframes] is the list of dataframes you want to concatenate.
  • axis specifies the axis to concatenate along: 0 (the default) stacks rows, 1 places columns side by side.
  • join is the type of join (inner or outer). The default for pd.concat() is outer.
  • keys allows you to add labels to the resulting dataframe so you can determine where the data came from.
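
A minimal sketch of how the join option changes the result, assuming two small made-up dataframes (df1 and df2 are hypothetical) that share only one column:

```python
import pandas as pd

# Hypothetical example data: the two dataframes share only column "b"
df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pd.DataFrame({"b": [5, 6], "c": [7, 8]})

# join="outer" (the default) keeps every column; missing values become NaN
outer = pd.concat([df1, df2], axis=0, join="outer")

# join="inner" keeps only the columns present in all dataframes
inner = pd.concat([df1, df2], axis=0, join="inner")

print(outer.columns.tolist())  # ['a', 'b', 'c']
print(inner.columns.tolist())  # ['b']
```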
3
Q

What kind of indexing does Python use?

A

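Python uses zero-based indexing: the first element of a sequence is at position 0.
Note: in Python's pandas library, an index is the set of labels that identify each row in a DataFrame. Unless you set one yourself, pandas assigns the integer labels 0, 1, 2, and so on.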
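
A short sketch of both ideas, using made-up data (the names letters, default_df and labelled_df are hypothetical):

```python
import pandas as pd

# Zero-based indexing: position 0 is the first element
letters = ["a", "b", "c"]
first = letters[0]

# Without an explicit index, pandas labels the rows 0, 1, 2, ...
default_df = pd.DataFrame({"age": [30, 25]})

# The index can also hold custom labels instead of integers
labelled_df = pd.DataFrame({"age": [30, 25]}, index=["alice", "bob"])

print(first)                       # a
print(default_df.index.tolist())   # [0, 1]
print(labelled_df.index.tolist())  # ['alice', 'bob']
```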

4
Q

Types of joins

A
5
Q

What does df.merge() do?

A

Combines two dataframes by matching values in shared columns (or in the indexes). By default it performs an inner join, keeping only the rows whose key values appear in both dataframes.
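
A minimal sketch, assuming two made-up dataframes that share a hypothetical key column called "id":

```python
import pandas as pd

# Hypothetical example data: only ids 2 and 3 appear in both dataframes
left = pd.DataFrame({"id": [1, 2, 3], "name": ["Ann", "Ben", "Cat"]})
right = pd.DataFrame({"id": [2, 3, 4], "score": [80, 90, 70]})

# Default is how="inner": only the overlapping ids survive
inner = left.merge(right, on="id")

# how="outer" keeps every id from both sides, filling gaps with NaN
outer = left.merge(right, on="id", how="outer")

print(inner["id"].tolist())  # [2, 3]
print(len(outer))            # 4
```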

6
Q

What does df.join() do?

A

Joins on indexes by default and performs a left join, keeping every row of the calling dataframe: df1.join(df2). Pass how='outer' to keep all rows from both dataframes.
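
A small sketch of the default behaviour of join(), assuming two made-up dataframes with partly overlapping index labels:

```python
import pandas as pd

# Hypothetical example data: labels "a" and "c" appear in both indexes
df1 = pd.DataFrame({"x": [1, 2, 3]}, index=["a", "b", "c"])
df2 = pd.DataFrame({"y": [10, 30, 40]}, index=["a", "c", "d"])

# Default is how="left": all rows of df1 are kept, matched on the index
left = df1.join(df2)

# how="outer" keeps the index labels from both dataframes
outer = df1.join(df2, how="outer")

print(left.index.tolist())  # ['a', 'b', 'c']
print(len(outer))           # 4
```

Row "b" has no match in df2, so its y value is NaN in both results.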

7
Q

What's the difference between df.loc and df.iloc?

A

Both select rows and columns with the same pattern, e.g. df.iloc[start row:end row, start column:end column].

  • .loc: Uses label-based indexing, meaning you specify rows and columns using their labels (names). Slices include the end label.
  • .iloc: Uses integer-based indexing, meaning you specify rows and columns by their numerical positions. Slices exclude the end position.
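
A minimal sketch comparing the two on the same made-up dataframe (the names and values are hypothetical):

```python
import pandas as pd

# Hypothetical example data with string row labels
df = pd.DataFrame(
    {"age": [30, 25, 41], "city": ["Leeds", "York", "Bath"]},
    index=["ann", "ben", "cat"],
)

# .loc uses labels; the slice "ann":"ben" includes "ben"
by_label = df.loc["ann":"ben", "age"]

# .iloc uses positions; the slice 0:2 covers rows 0 and 1 only
by_position = df.iloc[0:2, 0]

print(by_label.tolist())     # [30, 25]
print(by_position.tolist())  # [30, 25]
```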
8
Q

What does it mean to filter a dataframe with a conditional mask?

A

You apply a condition to a column, which produces a Boolean mask (True or False for each row), and keep only the rows where the mask is True. For example, to remove anyone under 18 in the column Age:

df_old = df[df['Age'] >= 18]
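
A short sketch of the mask itself, assuming a made-up dataframe with an Age column. It uses >= 18 so that people aged exactly 18 are kept:

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"Name": ["Ann", "Ben", "Cat"], "Age": [17, 18, 35]})

# The condition gives one True/False value per row
mask = df["Age"] >= 18

# Indexing with the mask keeps only the rows marked True
adults = df[mask]

# Conditions are combined with & (and) or | (or), each in parentheses
young_adults = df[(df["Age"] >= 18) & (df["Age"] < 30)]

print(mask.tolist())            # [False, True, True]
print(adults["Name"].tolist())  # ['Ben', 'Cat']
```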

9
Q

What does the keys argument of pd.concat() do?

A

It is often useful to add a label to our data, so that we know which dataset each row originated from. The keys argument adds these labels as an outer level of the index:

df = pd.concat([infected, control], keys=['infected', 'control'], axis=0)
df
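
A runnable sketch of the same call, assuming infected and control are small made-up dataframes with a hypothetical count column:

```python
import pandas as pd

# Hypothetical stand-ins for the infected and control datasets
infected = pd.DataFrame({"count": [5, 7]})
control = pd.DataFrame({"count": [1, 2]})

df = pd.concat([infected, control], keys=["infected", "control"], axis=0)

# The keys become the outer level of a two-level row index
print(df.index.tolist())
# [('infected', 0), ('infected', 1), ('control', 0), ('control', 1)]

# A key can then be used to pull one dataset back out
only_control = df.loc["control"]
```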

10
Q

How do you find rows that contain specific values in a dataset?

A

.isin()

Another useful way to filter dataframes is to extract rows that contain values within a specified list. To do this, we use the isin() method. For example, we could select the rows from the count dataframe that contain the soils 'Clay' or 'Loam' using count[count['Soil'].isin(['Clay', 'Loam'])].
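
A minimal sketch, assuming count is a small made-up dataframe with a Soil column (the Worms column is hypothetical):

```python
import pandas as pd

# Hypothetical example data
count = pd.DataFrame(
    {"Soil": ["Clay", "Sand", "Loam", "Peat"], "Worms": [4, 1, 6, 2]}
)

# Keep rows whose Soil value is in the list
selected = count[count["Soil"].isin(["Clay", "Loam"])]

# ~ inverts the mask, keeping rows whose Soil value is NOT in the list
excluded = count[~count["Soil"].isin(["Clay", "Loam"])]

print(selected["Soil"].tolist())  # ['Clay', 'Loam']
print(excluded["Soil"].tolist())  # ['Sand', 'Peat']
```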

11
Q

How do you concatenate vertically?

A

axis=0, which is the default for pd.concat()
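
A short sketch of vertical stacking with made-up data. The original row labels are kept unless ignore_index=True is passed:

```python
import pandas as pd

# Hypothetical example data
top = pd.DataFrame({"value": [1, 2]})
bottom = pd.DataFrame({"value": [3, 4]})

# axis=0 is the default, so rows are stacked one below the other
stacked = pd.concat([top, bottom])

# ignore_index=True renumbers the rows from 0
renumbered = pd.concat([top, bottom], ignore_index=True)

print(stacked.index.tolist())     # [0, 1, 0, 1]
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```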

12
Q

How do you concatenate data horizontally?

A

axis = 1
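
A minimal sketch of horizontal concatenation, assuming two made-up dataframes. Rows are aligned on their index labels, and labels missing from one side are filled with NaN:

```python
import pandas as pd

# Hypothetical example data: only "ben" appears in both indexes
heights = pd.DataFrame({"height": [150, 160]}, index=["ann", "ben"])
weights = pd.DataFrame({"weight": [55, 70]}, index=["ben", "cat"])

# axis=1 places the columns next to each other
side_by_side = pd.concat([heights, weights], axis=1)

print(side_by_side.columns.tolist())  # ['height', 'weight']
print(side_by_side.shape)             # (3, 2)
```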
