Week 2 - Basic R, Data Structure/Manipulation Flashcards

1
Q

What are logical operators used in?

A

Data management

2
Q

What do logical operators return in R?

A

TRUE or FALSE (a logical, or Boolean, value)
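
A minimal sketch of this, using made-up ages (the vector `age` and the cut-off 21 are illustrative assumptions, not from the course data):

```r
# Hypothetical ages, made up for illustration
age <- c(18, 22, 25, 31)

# A comparison returns one TRUE/FALSE value per element
over_21 <- age > 21
over_21         # FALSE  TRUE  TRUE  TRUE
class(over_21)  # "logical"
```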

3
Q

To find outliers (values greater than or less than a specific score), check that values fall within a certain range, or recode continuous variables into categorical variables (low/medium/high), what would you use?

A

A logical operator
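
As a rough sketch of the recoding case, logical operators can pick out which values belong to each category. The scores and the cut points (3 and 6) below are arbitrary assumptions chosen only for illustration:

```r
# Made-up scores; the cut points 3 and 6 are arbitrary assumptions
score <- c(2, 5, 9, 4, 7)

# Start with missing labels, then fill each category using a logical test
level <- rep(NA_character_, length(score))
level[score <= 3] <- "low"
level[score > 3 & score <= 6] <- "medium"
level[score > 6] <- "high"

level  # "low" "medium" "high" "medium" "high"
```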

4
Q

What might a logical operator look like?

A
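
A few of the comparison and logical operators available in base R, shown as a small sketch on a made-up value `x`:

```r
x <- 5  # a made-up value for illustration

x == 5             # equal to: TRUE
x != 5             # not equal to: FALSE
x < 10             # less than: TRUE
x >= 6             # greater than or equal to: FALSE
x > 1 & x < 4      # AND, both must hold: FALSE
x > 1 | x < 4      # OR, at least one must hold: TRUE
!(x == 5)          # NOT, flips the result: FALSE
x %in% c(1, 5, 9)  # is x one of these values: TRUE
```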
5
Q

How do operators deal with data - where do they pull from?

A

Operators take data on the LEFT-hand side and a few options or arguments on the right-hand side.

6
Q

In an analysis, we often subset data. Why would we do this?

A

Because subsetting lets us select the portion of the data that meets the criteria we want to use.

7
Q

If you wanted to exclude outliers or select participants who meet certain criteria in your data, what task would you use?

A

Subset the data
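
A small sketch of subsetting to drop an outlier, assuming the data.table package is installed. The table `d`, its columns, and the cut-off of 10 are all made up for illustration:

```r
library(data.table)

# Made-up data; column names and values are assumptions
d <- data.table(UserID = 1:5, Score = c(3, 4, 50, 2, 5))

# Keep only rows whose Score is below a hypothetical outlier cut-off of 10
d_clean <- d[Score < 10]

d_clean$UserID  # 1 2 4 5
```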

8
Q

True or false: logical operators are commonly used in subsetting to pick specific rows of a dataset or specific values from a single variable?

A

TRUE

9
Q

If we wanted to run analyses on data excluding people outside the range of 20 to 25 years of age, which logical operator would we use in subsetting?

A

%gele%, because it captures values greater than or equal to the lower bound and less than or equal to the upper bound.

Here we are working with just a single variable, as Age is the only word to the left of the logical operator. With more than one condition, we would chain them with & (and) or | (or, the vertical line), e.g. [Age == 18 & Female == 1]
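
%gele% is not part of base R; it is assumed here to come from an add-on package used in the course. As a hedged sketch, the same inclusive range check can be written with ordinary comparisons, or with data.table's between(), which includes both bounds by default. The table `d` below is made up:

```r
library(data.table)

# Made-up data for illustration
d <- data.table(UserID = 1:6, Age = c(18, 20, 22, 25, 26, 30))

# Inclusive range written with two comparisons joined by &
by_comparison <- d[Age >= 20 & Age <= 25, .(UserID, Age)]

# The same selection with data.table's between()
by_between <- d[between(Age, 20, 25), .(UserID, Age)]

by_comparison$UserID  # 2 3 4
by_between$UserID     # 2 3 4
```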

10
Q

What operators do we use to chain conditions together?

A

& (and) or | (or, the single vertical line)

e.g.

d[Age == 18 & Female == 1, .(UserID, Age)]
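
To see how & and | differ, here is a sketch on a made-up table (the values are illustrative assumptions; only the column names follow the card):

```r
library(data.table)

# Made-up data for illustration
d <- data.table(
  UserID = 1:5,
  Age    = c(18, 18, 19, 21, 18),
  Female = c(1, 0, 1, 0, 1)
)

# & keeps rows where BOTH conditions are TRUE
both <- d[Age == 18 & Female == 1, .(UserID, Age)]

# | keeps rows where AT LEAST ONE condition is TRUE
either <- d[Age == 18 | Female == 1, .(UserID, Age)]

both$UserID    # 1 5
either$UserID  # 1 2 3 5
```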

11
Q

What if you had several values that you wanted to test within a data set, what operator could do this?

A

The %in% operator.

Select anyone whose age is 18, 19, or 20:

d[Age %in% c(18, 19, 20), .(UserID, Age)]

This says: in the variable Age on the left, find anyone whose age is 18, 19, or 20.
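
As a quick sketch on a made-up vector, %in% gives the same result as chaining several == tests with |, just more compactly:

```r
age <- c(17, 18, 19, 20, 21)  # made-up ages

with_in <- age %in% c(18, 19, 20)
with_or <- age == 18 | age == 19 | age == 20

with_in                      # FALSE  TRUE  TRUE  TRUE FALSE
identical(with_in, with_or)  # TRUE
```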

12
Q

If you had two variables to consider - not just Age, but also Gender as an example, how would you chain these together?

A

Using brackets! (parentheses)

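Say you wanted to see 19-year-old female participants or 18-year-old male participants. The vertical line (|) below joins the two parenthesised conditions with OR. Using & between them would instead require both to be true at once, which here would match no rows, since Age cannot be 19 and 18 at the same time.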

d[(Age == 19 & Female == 1) | (Age == 18 & Female == 0),
.(UserID, Age)]

13
Q

What is this operator asking the data to pull out?

d[Age < 20, .(UserID, Age)]

A

Show anyone under the age of 20.

14
Q

What is this operator asking the data to pull out?

## anyone age 20 or under
d[Age <= 20, .(UserID, Age)]

A

Show anyone age 20 or under
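
The only difference between < and <= is whether the boundary value itself is included, as this small sketch on made-up ages shows:

```r
age <- c(19, 20, 21)  # made-up ages

under_20      <- age < 20   # 20 itself is excluded
at_or_under_20 <- age <= 20  # 20 itself is included

under_20        # TRUE FALSE FALSE
at_or_under_20  # TRUE  TRUE FALSE
```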

15
Q

What does the operator ! say?

A

It says NOT: it flips TRUE to FALSE and FALSE to TRUE, so it selects who is NOT something.
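
A tiny sketch with a made-up logical vector:

```r
is_female <- c(TRUE, FALSE, TRUE)  # made-up values

not_female <- !is_female
not_female  # FALSE  TRUE FALSE
```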

16
Q

What does this operator say?

d[Age %!in% c(18, 19, 20), .(UserID, Age)]

A

Return UserID and Age for anyone whose Age is NOT 18, 19, or 20.
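
%!in% is not a base R operator; it is assumed here to come from an add-on package used in the course. A sketch of the same selection using only ! and %in%, on a made-up table:

```r
library(data.table)

# Made-up data for illustration
d <- data.table(UserID = 1:5, Age = c(17, 18, 19, 20, 21))

# Negate the %in% test to keep everyone whose Age is NOT 18, 19, or 20
not_in <- d[!(Age %in% c(18, 19, 20)), .(UserID, Age)]

not_in$UserID  # 1 5
```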

17
Q

What does the function is.na() do?

A

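It returns TRUE if a value is NA (missing) and FALSE otherwise. So if we don’t want missing values, we might negate is.na with ! in the row selection, like this (variable stands for whichever column is being checked):

d2 <- d[!is.na(variable)]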
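
A minimal sketch of is.na() on a made-up vector, including the negated form used to drop missing values:

```r
stress <- c(2, NA, 5, NA, 4)  # made-up values with two missing

missing_flag <- is.na(stress)
missing_flag  # FALSE  TRUE FALSE  TRUE FALSE

# !is.na() keeps only the non-missing values
observed <- stress[!is.na(stress)]
observed  # 2 5 4
```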

18
Q

How many main parts do data tables have?

A

Four: DT, i, j, and by, written as DT[i, j, by]

19
Q

What does DT represent in a data table?

A

The name of the data table

20
Q

What does i represent in a data table?

A

Which row(s) we want to select

21
Q

If we leave the i portion blank, what does this mean?

A

Select ALL rows

22
Q

What does the j represent?

A

Columns/variables.

That could be selecting only certain columns/variables to display or creating / modifying a column or variable.
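
Both uses of j can be sketched on a made-up table. The column AgeMonths is a hypothetical new variable introduced only for this example:

```r
library(data.table)

# Made-up data for illustration
d <- data.table(UserID = c(1, 2, 3), Age = c(18, 25, 31))

# j used to select columns: i is left blank, so all rows are kept
ages <- d[, .(UserID, Age)]

# j used to create a column: := adds AgeMonths to d in place
d[, AgeMonths := Age * 12]

names(ages)  # "UserID" "Age"
d$AgeMonths  # 216 300 372
```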

23
Q

What do DT, i, and j mean?

A

The name of the data table, which rows we want to select, and which columns/variables we want to select, create, or modify

24
Q

What does by mean (the last part of DT[i, j, by])?

A

by represents a grouping variable, or some way of organising our operations

25
Q

If you wanted to perform the SAME operation for each ID in the data set, which part of a data.table call would be important: i, j, or by?

A

by, as it is the grouping variable and organises our operations
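
A sketch of by in action, computing the same summary for each ID. The table and the column name Stress are made up for illustration:

```r
library(data.table)

# Made-up data: two rows for user 1, three rows for user 2
d <- data.table(
  UserID = c(1, 1, 2, 2, 2),
  Stress = c(2, 4, 1, 3, 8)
)

# The j expression is evaluated once per UserID; .N is the row count per group
per_user <- d[, .(MeanStress = mean(Stress), N = .N), by = UserID]

per_user$MeanStress  # 3 4
per_user$N           # 2 3
```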

26
Q

If there are NO commas inside the brackets of a data.table call, what is this telling R?

A

Give me the rows/cases that match my criteria, and also give me all the columns/variables.
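
For instance, in this sketch on a made-up table, the call has no comma, so the condition is treated as i and every column comes back:

```r
library(data.table)

# Made-up data for illustration
d <- data.table(UserID = 1:3, Age = c(18, 19, 25), Female = c(1, 0, 1))

# No comma: select matching rows, return all columns
young <- d[Age < 20]

young$UserID  # 1 2
names(young)  # "UserID" "Age" "Female"
```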

27
Q

What is the below asking for?

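## the source table d and the column name STRESS are assumed;
## the original names did not survive in this card
d2 <- d[!is.na(STRESS)][STRESS > 3][, Count := .N,
by = UserID][Count >= 30]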

A

Assign the result to d2. The order is:

first, remove missing stress observations,

second, take only surveys/rows where stress scores > 3,

third, count how many of the remaining observations each ID has.

Finally:

we keep only people with at least 30 non-missing stress values > 3
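
The steps above can be sketched as a chain of bracket operations on made-up data. The table name `d` and the column name STRESS are assumptions, as are all the values:

```r
library(data.table)

# Made-up data: user 1 has 35 rows (3 of them missing), user 2 has 10 rows
d <- data.table(
  UserID = rep(c(1, 2), times = c(35, 10)),
  STRESS = c(rep(5, 32), rep(NA, 3), rep(5, 10))
)

d2 <- d[!is.na(STRESS)         # 1. drop rows with missing stress
      ][STRESS > 3             # 2. keep rows with stress above 3
      ][, Count := .N, by = UserID  # 3. count remaining rows per ID
      ][Count >= 30]           # 4. keep IDs with at least 30 such rows

unique(d2$UserID)  # 1
nrow(d2)           # 32
```

Each pair of brackets works on the result of the one before it, so the order of the steps matters.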

28
Q

We have decided that negative affect scores above 4 are outliers and that participant 56 was an outlier overall. We can exclude ID 56 and select only observations with negative affect at or below 4 as below.

What would this code look like?

A

d[UserID != 56 & NegAff <= 4,
.(UserID, NegAff)]

29
Q

How can you ask R to show you the type of data a particular variable is?

A

Code:

class(dataset$variable)

EXAMPLE:
Your dataset is metacars. You want to know what type of data cartype is.

The code would be:
class(metacars$cartype)
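
A runnable sketch of this, using a small made-up stand-in for metacars (the columns doors and mpg and all the values are assumptions for illustration):

```r
# A made-up stand-in for the metacars dataset
metacars <- data.frame(
  cartype = c("sedan", "suv"),
  doors   = c(4L, 5L),
  mpg     = c(30.1, 22.4),
  stringsAsFactors = FALSE  # keep text as character on any R version
)

class(metacars$cartype)  # "character"
class(metacars$doors)    # "integer"
class(metacars$mpg)      # "numeric"
```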

30
Q

What is the logical data type?

A

Used for logical data, which are either TRUE or FALSE. If data are logical, it is a very efficient format and useful for many cases. Logical variables can be compared. Arithmetic can be used for logical variables, in which case TRUE is treated as 1 and FALSE as 0.
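
Because TRUE counts as 1 and FALSE as 0, sum() counts the TRUE values and mean() gives their proportion. A sketch with made-up values:

```r
passed <- c(TRUE, FALSE, TRUE, TRUE)  # made-up values

n_passed    <- sum(passed)   # number of TRUE values
prop_passed <- mean(passed)  # proportion of TRUE values

n_passed     # 3
prop_passed  # 0.75
```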

31
Q

What is the Integer data type?

A

Used for integer type data, that is whole numbers like 0, 1, 2. For variables that are only whole numbers, integer format is more efficient than real numbers or numeric data (e.g., 1.4).

32
Q

What is the Numbers/real/numeric data type?

A

Used for real numbers, such as 1.1, 4.8. It can also be used for integer data (i.e., whole numbers only) but is a less efficient format. In R these are represented by the class numeric, abbreviated num.

33
Q

What is the Text/character/string data type?

A

Used for text type data, such as names, qualitative data, etc. Also, any numbers can be stored as strings. In R these are represented by the class character, abbreviated chr. Character data do not work with arithmetic operators, but can be sorted (e.g., alphabetically).
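
A short sketch of both points, with made-up values: sorting works on character data, while arithmetic fails until the text is converted to a number.

```r
name <- c("Mia", "Al", "Zoe")  # made-up names
sorted_name <- sort(name)
sorted_name  # "Al" "Mia" "Zoe"

age_text <- "18"  # a number stored as a string

# Arithmetic on a string raises an error; tryCatch records that it failed
attempt <- tryCatch(age_text + 1, error = function(e) "error")
attempt  # "error"

# Converting to numeric first makes arithmetic possible
as.numeric(age_text) + 1  # 19
```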

34
Q

Why might we treat variables that are only whole numbers with INTEGER format instead of real numbers/numeric data?

A

Because integer format is more efficient than real numbers, as real numbers also have to store decimals.
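
A sketch of the difference, with made-up values: the L suffix makes a whole number an integer, and an integer vector takes less memory than a numeric vector of the same length.

```r
whole_as_integer <- c(0L, 1L, 2L)  # L suffix stores whole numbers as integer
whole_as_numeric <- c(0, 1, 2)     # without L, R stores them as numeric

class(whole_as_integer)  # "integer"
class(whole_as_numeric)  # "numeric"

# Compare memory use of two vectors of 1000 elements each
int_bytes <- as.numeric(object.size(integer(1000)))
num_bytes <- as.numeric(object.size(numeric(1000)))
int_bytes < num_bytes  # TRUE
```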