W2: Merging datasets Flashcards

1
Q

Why would we need to merge data sets?

A

Because when we collect data, it often comes from multiple sources. We need to merge those sources into a single dataset before we can analyse them together.

2
Q

Which function do we use to merge data?

A

merge() function

3
Q

How many types of merges are there in R?

A

Four: natural join, full outer join, left outer join and right outer join.

4
Q

Are joins and merges the same thing?

A

Yes. The two terms are used interchangeably.

5
Q

Can a merge/join ever involve just one dataset?

A

No. A merge always involves exactly two datasets. Even if you have more than two datasets to merge, you still merge them two at a time.
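A minimal sketch of merging three datasets two at a time. The data frames (surveys, acti, sleep_log) and their values are hypothetical toy data. Reduce() is base R and simply applies the same two-dataset merge() repeatedly, so it gives the same result as the two explicit steps.

```r
# Hypothetical toy data: three datasets that all share an ID column
surveys   <- data.frame(ID = c(1, 2, 3), stress = c(10, 12, 8))
acti      <- data.frame(ID = c(1, 2, 3), steps = c(5000, 7000, 9000))
sleep_log <- data.frame(ID = c(1, 2, 3), hours = c(7, 6, 8))

# merge() only takes two datasets (x and y), so merge the first two...
step1 <- merge(x = surveys, y = acti, by = "ID", all = TRUE)
# ...then merge that result with the third dataset
all3 <- merge(x = step1, y = sleep_log, by = "ID", all = TRUE)

# Equivalent shortcut: Reduce() applies merge() pairwise, left to right
all3_reduce <- Reduce(function(x, y) merge(x, y, by = "ID", all = TRUE),
                      list(surveys, acti, sleep_log))
print(all3)
```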

6
Q

What is a natural join?

A

The resulting data has only the rows present in both x and y.

Argument: all = FALSE.
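A small sketch of a natural join, assuming two hypothetical toy datasets where only IDs 2 and 3 appear in both (the names surveys and acti are borrowed from the example card later in this set; the values are made up):

```r
# Hypothetical toy data: IDs 2 and 3 appear in both datasets
surveys <- data.frame(ID = c(1, 2, 3), stress = c(10, 12, 8))
acti    <- data.frame(ID = c(2, 3, 4), steps = c(5000, 7000, 9000))

# Natural join: keep only the IDs present in BOTH x and y
natural <- merge(x = surveys, y = acti, by = "ID", all = FALSE)
print(natural)
# Only IDs 2 and 3 remain; ID 1 (only in surveys) and ID 4 (only in acti) are dropped
```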

7
Q

What is a full outer join?

A

The resulting data has all rows from x and all rows from y.

Argument: all = TRUE.
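The same hypothetical toy data as a full outer join. Every ID from either dataset is kept, and R fills the cells that have no match with NA:

```r
# Hypothetical toy data: IDs 2 and 3 overlap, ID 1 is only in surveys, ID 4 only in acti
surveys <- data.frame(ID = c(1, 2, 3), stress = c(10, 12, 8))
acti    <- data.frame(ID = c(2, 3, 4), steps = c(5000, 7000, 9000))

# Full outer join: keep every row from x and every row from y
full <- merge(x = surveys, y = acti, by = "ID", all = TRUE)
print(full)
# IDs 1 to 4 all appear; ID 1 has NA for steps, ID 4 has NA for stress
```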

8
Q

What is a left outer join?

A

The resulting data has all rows from x.

Argument: all.x = TRUE.
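A sketch of a left outer join on hypothetical toy data. Every row of x (surveys) is kept; rows of y (acti) are only brought in where they match:

```r
# Hypothetical toy data: ID 1 is only in surveys, ID 4 is only in acti
surveys <- data.frame(ID = c(1, 2, 3), stress = c(10, 12, 8))
acti    <- data.frame(ID = c(2, 3, 4), steps = c(5000, 7000, 9000))

# Left outer join: keep all rows of x
left <- merge(x = surveys, y = acti, by = "ID", all.x = TRUE)
print(left)
# IDs 1, 2 and 3 appear; ID 1 has NA for steps; ID 4 (only in y) is dropped
```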

9
Q

What is a right outer join?

A

The resulting data has all rows from y.

Argument: all.y = TRUE.
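The mirror image on the same hypothetical toy data: a right outer join keeps every row of y (acti) instead:

```r
# Hypothetical toy data: ID 1 is only in surveys, ID 4 is only in acti
surveys <- data.frame(ID = c(1, 2, 3), stress = c(10, 12, 8))
acti    <- data.frame(ID = c(2, 3, 4), steps = c(5000, 7000, 9000))

# Right outer join: keep all rows of y
right <- merge(x = surveys, y = acti, by = "ID", all.y = TRUE)
print(right)
# IDs 2, 3 and 4 appear; ID 4 has NA for stress; ID 1 (only in x) is dropped
```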

10
Q

Natural join example

Different arguments tell R which type of join/merge to perform.

For example, in the merge() call you tell R which variable to match rows on with by = (the name of the variable, such as an ID, that both datasets share), followed by the argument all = XX to set the type of join.

A
merge(
  x = surveys,
  y = acti,
  by = "ID",
  all = FALSE)

When we write by = "ID", it means R should match rows using the variable called ID.

The word ID is not special; it's just that we happened to call the variable containing our IDs "ID". If we had called the variable containing IDs Email or Name, then we would write by = "Email" or by = "Name". In other words, the text in quotes must match an actual variable name in the data.
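A sketch of the same idea with a differently named key, assuming hypothetical toy datasets whose identifier column happens to be called Email. The quoted name in by = must exist as a column in both datasets:

```r
# Hypothetical toy data: the identifier column is called Email, not ID
surveys <- data.frame(Email = c("a@x.com", "b@x.com"), stress = c(10, 12))
acti    <- data.frame(Email = c("b@x.com", "c@x.com"), steps = c(7000, 9000))

# The text in quotes matches the actual column name, Email
by_email <- merge(x = surveys, y = acti, by = "Email", all = FALSE)
print(by_email)
# Only b@x.com is in both datasets, so one row remains

# Writing by = "ID" here would stop with an error, because neither dataset has an ID column
```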

11
Q

What do natural joins include?

A

Natural joins have only the rows / observations that are present in both datasets.

12
Q

What do full outer joins include?

A

All cases (rows) that are present in either dataset.

13
Q

What happens when you do a natural join?

A

You will end up with a smaller, more limited dataset unless your two datasets have EXACTLY the same participants.

If one dataset has more or fewer participants than the other, a natural join will only return rows that were present in both x and y, so the result will usually have fewer observations.
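A quick way to see the shrinkage is to compare row counts before and after. This sketch uses hypothetical toy data in which each dataset has four participants but only two overlap:

```r
# Hypothetical toy data: 4 participants each, but only IDs 3 and 4 are in both
surveys <- data.frame(ID = 1:4, stress = c(10, 12, 8, 9))
acti    <- data.frame(ID = 3:6, steps = c(5000, 7000, 9000, 6000))

natural <- merge(x = surveys, y = acti, by = "ID", all = FALSE)

# Compare row counts: 4 and 4 going in, only 2 coming out
c(surveys = nrow(surveys), acti = nrow(acti), natural = nrow(natural))
```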

14
Q

What is the opposite of a natural join?

A

Full outer join.

This join keeps ALL of the rows from both x and y.

15
Q

You merge two datasets, X and Y, with a full outer join. Y has more participants than X. Why do some participants in the merged dataset now have missing data on the variables that came from X?

A

Because those participants were never in the X dataset in the first place; it was the Y dataset that had more participants.

Any extra participants in the Y dataset who weren't in the X dataset would still be represented in your NEW (merged/joined) dataset after a full outer join.

It's just that any variables that came from the X dataset would now be missing (NA) for them, because those participants weren't even IN the X dataset.
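A minimal sketch of this situation, using hypothetical toy datasets X and Y where Y contains two participants (IDs 4 and 5) who were never in X:

```r
# Hypothetical toy data: Y has IDs 4 and 5, which X never had
X <- data.frame(ID = 1:3, stress = c(10, 12, 8))
Y <- data.frame(ID = 1:5, steps = c(5000, 7000, 9000, 6000, 8000))

# Full outer join keeps everyone from both datasets
full <- merge(x = X, y = Y, by = "ID", all = TRUE)
print(full)

# The variable from X (stress) is NA exactly for the participants who were only in Y
full$ID[is.na(full$stress)]
```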
