WEEK 3 Flashcards

1
Q

INDEXING

A

In R, we can use one vector (typically a logical vector) to index, or subset, another vector.

2
Q

INDEXING EXAMPLE PROGRAM

A

murders$rate <- murders$total / murders$population * 100000

index <- murders$rate <= 0.71
murders$state[index]
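
The same pattern can be tried without the dslabs data. The sketch below uses a small made-up data frame (toy, with hypothetical states A to D and invented counts) purely to show how a logical vector indexes another vector; the numbers are not real statistics.

```r
# Hypothetical toy data standing in for murders; all values are invented.
toy <- data.frame(
  state = c("A", "B", "C", "D"),
  population = c(1000000, 2000000, 500000, 4000000),
  total = c(5, 60, 10, 20),
  stringsAsFactors = FALSE
)

# Rate per 100,000 people.
toy$rate <- toy$total / toy$population * 100000

# Logical vector: TRUE where the rate is at most 0.71.
index <- toy$rate <= 0.71

# Keep only the states whose entry in index is TRUE.
low_states <- toy$state[index]
print(low_states)
```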

3
Q

THE SUM FUNCTION

A

The function sum returns the sum of the entries of a vector. Logical vectors get coerced to numeric, with TRUE coded as 1 and FALSE as 0.
Thus we can count the states with a murder rate of at most 0.71 using:
index <- murders$rate <= 0.71
sum(index)
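
A minimal sketch of the coercion, using an invented vector of rates (not real data). The use of mean to get a proportion is an extra illustration, not part of the card.

```r
# Invented rates, for illustration only.
rates <- c(0.5, 3, 2, 0.5)

# Logical vector: TRUE where the rate is at most 0.71.
below <- rates <= 0.71

# sum() treats TRUE as 1 and FALSE as 0, so it counts the TRUE entries.
n_below <- sum(below)

# mean() of a logical vector gives the proportion of TRUE entries.
share_below <- mean(below)

print(n_below)
print(share_below)
```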

4
Q

LOGICAL OPERATOR PROGRAMMING EXAMPLE

A

west <- murders$region == "West"
safe <- murders$rate < 1
index <- west & safe
murders$state[index]
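
To see what & does element by element, here is a small sketch with invented vectors (state, region, and rate are hypothetical stand-ins for the murders columns). The | line is added only for contrast.

```r
# Invented values standing in for columns of murders.
state  <- c("A", "B", "C", "D")
region <- c("West", "South", "West", "Northeast")
rate   <- c(0.5, 3, 2, 0.5)

west <- region == "West"  # TRUE FALSE TRUE FALSE
safe <- rate < 1          # TRUE FALSE FALSE TRUE

# & is TRUE only where both conditions are TRUE.
index <- west & safe
west_and_safe <- state[index]

# | is TRUE where at least one condition is TRUE.
west_or_safe <- state[west | safe]

print(west_and_safe)
print(west_or_safe)
```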

5
Q

WHICH FUNCTION

A

The function which tells us which entries of a logical vector are TRUE, converting a vector of logicals into indexes.

Example:
index <- which(murders$state == "California")
murders$rate[index]
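
A small sketch with invented vectors (state and rate are hypothetical) showing that which returns positions, and that indexing by those positions gives the same result as indexing by the logical vector directly.

```r
# Invented values, for illustration only.
state <- c("A", "B", "C", "D")
rate  <- c(0.5, 3, 2, 0.5)

is_c <- state == "C"  # FALSE FALSE TRUE FALSE

# which() returns the positions of the TRUE entries.
idx <- which(is_c)

by_position <- rate[idx]
by_logical  <- rate[is_c]

print(idx)
print(by_position)
```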

6
Q

MATCH

A

This function tells us which indexes of a second vector match each of the entries of a first vector.
Example:
index <- match(c("California", "New York", "Florida"), murders$state)
index
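
A minimal sketch of match on invented vectors (the names are hypothetical). It also shows what happens when an entry of the first vector is not found: match returns NA for it.

```r
# Invented values, for illustration only.
state  <- c("A", "B", "C", "D")
wanted <- c("C", "A", "Z")

# For each entry of wanted, the position where it appears in state.
idx <- match(wanted, state)

# Indexing with the result recovers the matched entries (NA where no match).
matched <- state[idx]

print(idx)
print(matched)
```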

7
Q

%in%

A

If rather than an index we want a logical that tells us whether or not each element of a first vector is in a second, we can use the function %in%.
c("Boston", "Dakota", "Washington") %in% murders$state
#> [1] FALSE FALSE TRUE
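
A short sketch on invented vectors (names are hypothetical) relating %in% to match: an element is "in" the second vector exactly when match finds a position for it.

```r
# Invented values, for illustration only.
state  <- c("A", "B", "C", "D")
wanted <- c("C", "A", "Z")

# Logical answer: is each entry of wanted present in state?
present <- wanted %in% state

# Equivalent formulation: match() found a position, so the result is not NA.
via_match <- !is.na(match(wanted, state))

print(present)
```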

8
Q

PLOT

A

The plot function can be used to make scatterplots.

Example:
x <- murders$population / 10^6
y <- murders$total
plot(x, y)

Also, using with to avoid repeating murders$:

with(murders, plot(population / 10^6, total))

9
Q

HISTOGRAM

A

Histograms are a powerful graphical summary of a list of numbers that gives you a general overview of the values you have.

hist(murders$rate)

10
Q

BOXPLOT

A

Boxplots provide a more terse summary than histograms, but they are easier to stack with other boxplots.

murders$rate <- with(murders, total / population * 100000)
boxplot(rate~region, data = murders)

11
Q

DPLYR

A

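A package that provides functions for manipulating data tables. R is case sensitive, so it is loaded with a lowercase library call:
library(dplyr)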

12
Q

MUTATE FUNCTION

A

This function changes the data table by adding new columns or modifying existing ones. It does not add rows.

13
Q

FILTER FUNCTION

A

This function keeps only the rows of a data table that satisfy a condition.

14
Q

How to select a specific column in a data table?

A

By using the select function, for example select(murders, state, region, rate).

15
Q

EXAMPLE FOR MUTATE - ADD A NEW COLUMN CALLED RATE IN MURDERS DATA TABLE

A

murders <- mutate(murders, rate = total / population * 100000)

16
Q

Filter the states with a murder rate less than or equal to 0.71

A

filter(murders, rate <= 0.71)

17
Q

Select only state, region and rate, assign the result to an object called new_table, and show the states with a murder rate of at most 0.71?

A

new_table <- select(murders, state, region, rate)
filter(new_table, rate<= 0.71)

18
Q

Select only state, region and rate and show the states with a murder rate of at most 0.71 in a single line of code?

A

murders %>% select(state, region, rate) %>% filter(rate <= 0.71)

19
Q

MUTATE FUNCTION

A

The mutate function is used to add a column to a dataset. It takes the data frame as its first argument and the name and values of the new column as the second argument, using the convention name = values.

20
Q

ADD MURDER RATE USING MUTATE FUNCTION

A

library(dplyr)
library(dslabs)
data("murders")
murders <- mutate(murders, rate = total / population * 100000)
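
For a version that runs without dslabs, the sketch below applies mutate to a small invented data frame (toy is hypothetical, and its numbers are not real statistics). It assumes dplyr is installed.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy data standing in for murders; all values are invented.
toy <- data.frame(
  state = c("A", "B", "C", "D"),
  population = c(1000000, 2000000, 500000, 4000000),
  total = c(5, 60, 10, 20)
)

# Inside mutate, column names can be used directly without toy$.
toy <- mutate(toy, rate = total / population * 100000)

print(names(toy))
print(toy$rate)
```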

21
Q

Filter function to filter data

A

The filter function takes the data table as the first argument and the conditional statement as the second. Only the rows for which the condition is TRUE are kept.

22
Q

Selecting columns with select

A

new_table <- select(murders, state, region, rate)
filter(new_table, rate <= 0.71)

This selects only the state, region, and rate columns of the murders dataset, then keeps the rows with a rate of at most 0.71.

23
Q

The pipe function

A

murders %>% select(state, region, rate) %>% filter(rate <= 0.71)
In general, the pipe sends the result of the left side of the pipe to be the first argument of the function on the right side of the pipe.
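
A tiny sketch of that rule with plain numbers, assuming dplyr is installed (it makes %>% available): the piped form and the nested form compute the same thing.

```r
suppressPackageStartupMessages(library(dplyr))

# Nested calls read inside-out.
nested <- log2(sqrt(16))

# The pipe passes each result on as the first argument of the next function.
piped <- 16 %>% sqrt() %>% log2()

print(nested)
print(piped)
```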

24
Q

summarize() function

A

The main purpose is to create a new summary table.
Example:
s <- heights %>%
  filter(sex == "Female") %>%
  summarize(average = mean(height), standard_deviation = sd(height))
This takes our original data table as input, filters it to keep only females, and then produces a new summarized table with just the average and the standard deviation of heights.

25
Q

pull() function

A

us_murder_rate <- murders %>%
summarize(rate = sum(total) / sum(population) * 100000) %>%
pull(rate)

The resulting value is a numeric vector, not a data frame.
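
The difference can be checked on a small invented data frame (toy is hypothetical, with made-up numbers), assuming dplyr is installed: summarize alone returns a table, while adding pull returns the bare number.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy data; all values are invented.
toy <- data.frame(
  population = c(1000000, 2000000, 500000, 4000000),
  total = c(5, 60, 10, 20)
)

# summarize() returns a one-row data frame.
as_table <- toy %>%
  summarize(rate = sum(total) / sum(population) * 100000)

# pull() extracts the rate column as a plain numeric vector.
as_number <- as_table %>% pull(rate)

print(class(as_table))
print(class(as_number))
```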

26
Q

group_by()

A

heights %>%
group_by(sex) %>%
summarize(average = mean(height), standard_deviation = sd(height))

The summarize function applies the summarization to each group separately.
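
A runnable sketch of the same pipeline on a made-up table (toy_heights is hypothetical and its values are invented), assuming dplyr is installed. The result has one row per group.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy data standing in for heights; values are invented.
toy_heights <- data.frame(
  sex = c("Female", "Female", "Male", "Male"),
  height = c(60, 64, 68, 72)
)

# One summary row is produced for each value of sex.
by_sex <- toy_heights %>%
  group_by(sex) %>%
  summarize(average = mean(height), standard_deviation = sd(height))

print(by_sex)
```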

27
Q

arrange()

A

murders %>%
  arrange(rate)

Note that the default behavior is to order in ascending order. In dplyr, the function desc transforms a vector so that it is in descending order.

example:
murders %>%
arrange(desc(rate))
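
Both orderings can be compared on a small invented table (toy is hypothetical, with made-up rates), assuming dplyr is installed.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy data; rates are invented.
toy <- data.frame(
  state = c("A", "B", "C", "D"),
  rate = c(0.5, 3, 2, 1)
)

# Ascending order is the default.
ascending <- toy %>% arrange(rate)

# desc() reverses the order for that column.
descending <- toy %>% arrange(desc(rate))

print(ascending$state)
print(descending$state)
```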

28
Q

Nested Sorting/ Arrange

A

murders %>%
  arrange(region, rate)
Here we order by region, then within region we order by murder rate.

29
Q

What are tibbles?

A

A tibble (class tbl) is a special kind of data frame used throughout the tidyverse. The functions group_by and summarize always return this type of data frame. The group_by function returns a special kind of tbl, the grouped_df.

30
Q

Do tibbles display better?

A

The print method for tibbles is more readable than that of a data frame. We can see this by printing as_tibble(murders).

31
Q

Are subsets of tibbles also tibbles?

A

If you subset the columns of a data frame, you may get back an object that is not a data frame, such as a vector or scalar.
With tibbles this does not happen:
class(as_tibble(murders)[,4])
If you want to access the vector that defines a column, and not get back a data frame, you need to use the accessor $:
class(as_tibble(murders)$population)
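
The contrast can be checked directly on a small invented data frame (df is hypothetical), assuming dplyr is installed, which makes as_tibble available.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy data; values are invented.
df <- data.frame(
  state = c("A", "B"),
  population = c(1000000, 2000000)
)

# Selecting one column of a data frame with [ , ] drops to a vector.
from_df <- df[, 2]

# The same subset of a tibble stays a tibble.
from_tbl <- as_tibble(df)[, 2]

# The $ accessor returns the underlying vector in both cases.
from_dollar <- as_tibble(df)$population

print(class(from_df))
print(class(from_tbl))
print(class(from_dollar))
```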

32
Q

Create a tibble using tibble?

A

To create a data frame in the tibble format, you can use the tibble function.
grades <- tibble(names = c("John", "Juan", "Jean", "Yao"),
                 exam_1 = c(95, 80, 90, 85),
                 exam_2 = c(90, 85, 85, 90))

33
Q

How to convert a regular data frame into a tibble?

A

To convert a regular data frame to a tibble, you can use the as_tibble function.
ex: as_tibble(grades) %>% class()

34
Q

The Dot Operator?

A

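The dot . stands for the data passed along by the pipe, so .$rate extracts the rate column as a vector:
rates <- filter(murders, region == "South") %>%
  mutate(rate = total / population * 10^5) %>%
  .$rate
median(rates)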
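
A runnable sketch of the dot on a small invented data frame (toy is hypothetical, with made-up numbers), assuming dplyr is installed. It also shows that pull gives the same vector.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy data; all values are invented.
toy <- data.frame(
  population = c(1000000, 2000000, 500000, 4000000),
  total = c(5, 60, 10, 20)
)

# The dot is a placeholder for the table arriving from the left of the pipe.
rates <- toy %>%
  mutate(rate = total / population * 10^5) %>%
  .$rate

# pull(rate) is an alternative that returns the same vector.
rates_pulled <- toy %>%
  mutate(rate = total / population * 10^5) %>%
  pull(rate)

print(median(rates))
```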

35
Q

The do function?

A

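heights %>%
  group_by(sex) %>%
  do(my_summary(.))

do applies a function to each group and combines the results. Here my_summary must be a function, defined beforehand, that takes a data frame and returns a data frame; the dot represents the data for each group.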
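
The card does not define my_summary. The sketch below supplies one hypothetical definition (returning the minimum, median, and maximum height) and runs it on an invented table, toy_heights, assuming dplyr is installed. Both the helper and the data are assumptions for illustration.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy data standing in for heights; values are invented.
toy_heights <- data.frame(
  sex = c("Female", "Female", "Male", "Male"),
  height = c(60, 64, 68, 72)
)

# Hypothetical helper: takes a data frame, returns a one-row data frame.
my_summary <- function(dat) {
  x <- unname(quantile(dat$height, c(0, 0.5, 1)))
  tibble(min = x[1], median = x[2], max = x[3])
}

# do() calls my_summary once per group; the dot is that group's data.
result <- toy_heights %>%
  group_by(sex) %>%
  do(my_summary(.))

print(result)
```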