WEEK 3 Flashcards by vibush Varshan

INDEXING

In R we can subset one vector based on the properties of another, for example by indexing it with a logical vector.

INDEXING EXAMPLE PROGRAM

murders$rate <- murders$total / murders$population * 100000

index <- murders$rate <= 0.71
murders$state[index]
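The same idea can be checked by hand on a very small table. This is only a sketch: the data frame toy and all of its values are hypothetical, made up for illustration and not taken from the murders dataset.

```r
# Hypothetical toy data; the numbers are illustrative only.
toy <- data.frame(
  state = c("A", "B", "C"),
  total = c(5, 50, 2),
  population = c(1000000, 2000000, 500000),
  stringsAsFactors = FALSE
)

# Rate per 100,000 people.
toy$rate <- toy$total / toy$population * 100000

# Logical vector: TRUE where the rate is at most 0.71.
index <- toy$rate <= 0.71

# Indexing with a logical vector keeps the entries where index is TRUE.
low_states <- toy$state[index]
print(low_states)
```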

THE SUM FUNCTION

The function sum returns the sum of the entries of a vector, and logical vectors get coerced to numeric with TRUE coded as 1 and FALSE as 0.
Thus we can count the states with a murder rate of 0.71 or less using:
sum(murders$rate <= 0.71)
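A minimal sketch of the coercion, assuming a hypothetical vector of rates (the values are invented for illustration). The same trick works with mean, which then gives the proportion of TRUE entries.

```r
# Hypothetical rates, made up for illustration.
rate <- c(0.5, 2.5, 0.4, 0.71, 3.0)

# Logical vector: TRUE where the rate is at most 0.71.
is_low <- rate <= 0.71

# TRUE counts as 1 and FALSE as 0, so sum() counts the TRUE entries.
n_low <- sum(is_low)

# mean() of a logical vector is the proportion of TRUE entries.
share_low <- mean(is_low)

print(n_low)
print(share_low)
```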

LOGICAL OPERATOR PROGRAMMING EXAMPLE

west <- murders$region == "West"
safe <- murders$rate < 1
index <- west & safe
murders$state[index]
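As a small sketch of how the logical operators combine element by element, the block below uses two hypothetical vectors (region and rate are invented values, not the murders data). Besides & (and), it also shows | (or) and ! (not).

```r
# Hypothetical values, made up for illustration.
region <- c("West", "South", "West", "Northeast")
rate <- c(0.8, 0.5, 2.1, 3.0)

west <- region == "West"  # TRUE FALSE TRUE FALSE
safe <- rate < 1          # TRUE TRUE FALSE FALSE

both <- west & safe       # TRUE only where both conditions hold
either <- west | safe     # TRUE where at least one condition holds
not_west <- !west         # flips TRUE and FALSE

print(both)
print(either)
print(not_west)
```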

WHICH FUNCTION

This function helps us find specific entries by converting a logical vector into the indexes of its TRUE entries.

Example:
ind <- which(murders$state == "California")
murders$rate[ind]
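A minimal sketch on hypothetical vectors (the states are real names but the rates are invented for illustration), showing that which returns positions and that indexing by position or by the logical vector gives the same entry.

```r
# Hypothetical values, made up for illustration.
state <- c("Alabama", "California", "Texas")
rate <- c(2.8, 3.4, 3.2)

is_ca <- state == "California"  # FALSE TRUE FALSE
ind <- which(is_ca)             # position of the TRUE entry

print(ind)
print(rate[ind])

# Indexing by position and by logical vector select the same entry.
same <- identical(rate[ind], rate[is_ca])
print(same)
```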

MATCH

This function tells us which indexes of a second vector match each of the entries of a first vector.
Example:
ind <- match(c("California", "New York", "Florida"), murders$state)
ind
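A small sketch with a hypothetical vector of states, showing that the result follows the order of the first vector and that an entry with no match gives NA.

```r
# Hypothetical lookup vector, made up for illustration.
state <- c("Alabama", "California", "Florida", "New York")

# One index per entry of the first vector, in the order of the first vector.
# "Ohio" is not in state, so its position is NA.
ind <- match(c("New York", "Florida", "Ohio"), state)
print(ind)

# The indexes can be used to look the entries up again.
print(state[ind])
```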

%in%

If rather than an index we want a logical that tells us whether or not each element of a first vector is in a second, we can use the function %in%.
c("Boston", "Dakota", "Washington") %in% murders$state
#> [1] FALSE FALSE TRUE
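To connect this card with match, here is a sketch on a hypothetical vector of states: an element is "in" the second vector exactly when match finds a position for it, so the two expressions below should agree.

```r
# Hypothetical lookup vector, made up for illustration.
state <- c("California", "Oregon", "Washington")
query <- c("Boston", "Dakota", "Washington")

# Logical answer: is each element of query found in state?
present <- query %in% state

# Same answer via match: a position was found unless the result is NA.
via_match <- !is.na(match(query, state))

print(present)
print(via_match)
```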

PLOT

The plot function can be used to make scatterplots.

Example:
x <- murders$population / 10^6
y <- murders$total
plot(x, y)

Alternatively, the with function lets us use the column names directly:

with(murders, plot(population / 10^6, total))

HISTOGRAM

Histograms are a powerful graphical summary of a list of numbers that gives you a general overview of the values you have.

They are made with the hist function, for example:
hist(murders$rate)
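To see what a histogram summarizes without drawing anything, hist can be called with plot = FALSE, which returns the bin counts instead of a picture. This is a sketch on a hypothetical vector x with breaks chosen by hand for illustration.

```r
# Hypothetical numbers, made up for illustration.
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5)

# Bins are (0,1], (1,2], (2,3], (3,4], (4,5]; plot = FALSE skips the drawing
# and returns the computed histogram object.
h <- hist(x, breaks = 0:5, plot = FALSE)

# Number of values falling in each bin.
counts <- h$counts
print(counts)
```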

BOXPLOT

Boxplots provide a more terse summary than histograms, but they are easier to stack with other boxplots.

murders$rate <- with(murders, total / population * 100000)
boxplot(rate~region, data = murders)

DPLYR

The dplyr package provides functions for manipulating data tables. Load it with:
library(dplyr)

MUTATE FUNCTION

This function is used to change the data table by adding new columns or modifying existing ones. It does not add rows.

FILTER FUNCTION

This is used to filter the data, keeping only the rows that satisfy a condition.

How to select a specific column in a data table?

By using the select function.
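A minimal sketch, assuming dplyr is installed. The data frame toy is hypothetical and its values are invented for illustration; the point is only that select keeps the named columns, in the order given, and drops the rest.

```r
library(dplyr)

# Hypothetical toy table, made up for illustration.
toy <- data.frame(
  state = c("A", "B"),
  region = c("West", "South"),
  total = c(5, 50),
  rate = c(0.5, 2.5),
  stringsAsFactors = FALSE
)

# Keep only three of the four columns; total is dropped.
new_toy <- select(toy, state, region, rate)
print(names(new_toy))
```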

EXAMPLE FOR MUTATE - ADD A NEW COLUMN CALLED RATE IN MURDERS DATA TABLE

murders <- mutate(murders, rate = total / population * 100000)

Filter the states with a murder rate of 0.71 or less

filter(murders, rate <= 0.71)

Select only state, region and rate, assign the result to an object called new_table, and show the states with a murder rate of 0.71 or less.

new_table <- select(murders, state, region, rate)
filter(new_table, rate<= 0.71)

Select only state, region and rate and show the states with a murder rate of 0.71 or less, in a single line of code.

murders %>% select(state, region, rate) %>% filter(rate <= 0.71)

MUTATE FUNCTION

The mutate function is used to add a column to a dataset. It takes the data frame as its first argument and the name and values of the new column as its second argument, written as name = values.
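A minimal sketch of the name = values convention, assuming dplyr is installed. The data frame toy and the column name high are hypothetical, chosen only for illustration. Inside mutate, column names can be used without the $ accessor, and a column created earlier in the same call can be used by a later one.

```r
library(dplyr)

# Hypothetical toy table, made up for illustration.
toy <- data.frame(
  state = c("A", "B"),
  total = c(5, 50),
  population = c(1000000, 2000000),
  stringsAsFactors = FALSE
)

# Add rate, then add a second column that is computed from rate.
toy2 <- mutate(toy,
               rate = total / population * 100000,
               high = rate > 1)
print(toy2)
```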

ADD MURDER RATE USING MUTATE FUNCTION

library(dplyr)
library(dslabs)
data("murders")
murders <- mutate(murders, rate = total / population * 100000)

Filter function to filter data

The filter function takes the data table as its first argument and the conditional statement as its second. It keeps only the rows for which the condition is TRUE.
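A minimal sketch, assuming dplyr is installed and using a hypothetical data frame toy with invented values. It also shows that several conditions separated by commas must all hold for a row to be kept.

```r
library(dplyr)

# Hypothetical toy table, made up for illustration.
toy <- data.frame(
  state = c("A", "B", "C"),
  region = c("West", "South", "South"),
  rate = c(0.5, 2.5, 0.4),
  stringsAsFactors = FALSE
)

# Keep the rows where the condition is TRUE.
low <- filter(toy, rate <= 0.71)

# Conditions separated by commas are combined with "and".
west_low <- filter(toy, region == "West", rate <= 0.71)

print(low)
print(west_low)
```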

Selecting columns with select

new_table <- select(murders, state, region, rate)
filter(new_table, rate <= 0.71)

select keeps only the state, region and rate columns of the murders dataset; filter then shows the rows with a rate of 0.71 or less.

The pipe function

murders %>% select(state, region, rate) %>% filter(rate <= 0.71)
In general, the pipe sends the result of the left side of the pipe to be the first argument of the function on the right side of the pipe.
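One way to check this rule is to write the same computation twice, once with the pipe and once as nested function calls. The sketch below assumes dplyr is installed and uses a hypothetical data frame toy with invented values.

```r
library(dplyr)

# Hypothetical toy table, made up for illustration.
toy <- data.frame(
  state = c("A", "B", "C"),
  region = c("West", "South", "South"),
  total = c(5, 50, 2),
  rate = c(0.5, 2.5, 0.4),
  stringsAsFactors = FALSE
)

# With the pipe: each result becomes the first argument of the next call.
piped <- toy %>% select(state, region, rate) %>% filter(rate <= 0.71)

# The same steps written as nested calls, read from the inside out.
nested <- filter(select(toy, state, region, rate), rate <= 0.71)

print(piped)
same <- isTRUE(all.equal(piped, nested))
print(same)
```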

summarize() function

The main purpose is to create a new summary table.
Example:
s <- heights %>%
  filter(sex == "Female") %>%
  summarize(average = mean(height), standard_deviation = sd(height))
This takes our original data table as input, filters it to keep only females, and then produces a new summarized table with just the average and the standard deviation of heights.
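The same pipeline can be tried on a table small enough to verify by hand. This is a sketch assuming dplyr is installed; toy_heights is a hypothetical stand-in for the heights table, with invented values.

```r
library(dplyr)

# Hypothetical stand-in for the heights table, made up for illustration.
toy_heights <- data.frame(
  sex = c("Female", "Female", "Male", "Male"),
  height = c(60, 64, 68, 72),
  stringsAsFactors = FALSE
)

# One row out, with one column per summary.
s <- toy_heights %>%
  filter(sex == "Female") %>%
  summarize(average = mean(height), standard_deviation = sd(height))

print(s)
```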

pull() function

us_murder_rate <- murders %>%
  summarize(rate = sum(total) / sum(population) * 100000) %>%
  pull(rate)
The resulting value is a numeric, not a data frame.
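The difference can be seen by stopping the pipeline before and after pull. This sketch assumes dplyr is installed and uses a hypothetical data frame toy with invented values.

```r
library(dplyr)

# Hypothetical toy table, made up for illustration.
toy <- data.frame(
  total = c(5, 50),
  population = c(1000000, 2000000)
)

# summarize returns a table, even when it holds a single number.
as_table <- toy %>% summarize(rate = sum(total) / sum(population) * 100000)

# pull extracts the column as a plain vector.
as_number <- as_table %>% pull(rate)

print(is.data.frame(as_table))
print(is.numeric(as_number))
print(as_number)
```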

group_by()

heights %>%
  group_by(sex) %>%
  summarize(average = mean(height), standard_deviation = sd(height))
After group_by, the summarize function applies the summarization to each group separately.
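A sketch of the grouped summary on a hand-checkable table, assuming dplyr is installed. toy_heights is a hypothetical stand-in for the heights table, with invented values; the result has one row per group.

```r
library(dplyr)

# Hypothetical stand-in for the heights table, made up for illustration.
toy_heights <- data.frame(
  sex = c("Female", "Female", "Male", "Male"),
  height = c(60, 64, 68, 72),
  stringsAsFactors = FALSE
)

# One summary row per value of sex.
by_sex <- toy_heights %>%
  group_by(sex) %>%
  summarize(average = mean(height), standard_deviation = sd(height))

print(by_sex)
```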

arrange()

murders %>% arrange(rate)
Note that the default behavior is to order in ascending order. In dplyr, the function desc transforms a vector so that it is in descending order. Example:
murders %>% arrange(desc(rate))
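A minimal sketch of both orderings, assuming dplyr is installed and using a hypothetical data frame toy with invented values.

```r
library(dplyr)

# Hypothetical toy table, made up for illustration.
toy <- data.frame(
  state = c("A", "B", "C"),
  rate = c(2.5, 0.4, 1.1),
  stringsAsFactors = FALSE
)

# Ascending order is the default.
ascending <- toy %>% arrange(rate)

# desc() reverses the order for that column.
descending <- toy %>% arrange(desc(rate))

print(ascending$state)
print(descending$state)
```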

Nested Sorting / Arrange

Here we order by region, then within region we order by murder rate:
murders %>% arrange(region, rate)
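A sketch of nested sorting on a hand-checkable table, assuming dplyr is installed. The data frame toy and its values are hypothetical. Later columns in arrange only break ties left by the earlier ones.

```r
library(dplyr)

# Hypothetical toy table, made up for illustration.
toy <- data.frame(
  region = c("West", "South", "West", "South"),
  state = c("A", "B", "C", "D"),
  rate = c(2, 3, 1, 0.5),
  stringsAsFactors = FALSE
)

# Sort by region first, then by rate within each region.
sorted <- toy %>% arrange(region, rate)
print(sorted)
```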

What are tibbles?

A tibble (tbl) is a special kind of data frame used throughout the tidyverse. Grouped summaries made with group_by and summarize come back as this type of data frame. The group_by function returns a special kind of tbl, the grouped_df.

Tibbles display better?

The print method for tibbles is more readable than that of a data frame. We can see this by printing as_tibble(murders).

Subsets of tibbles are tibbles?

If you subset the columns of a data frame, you may get back an object that is not a data frame, such as a vector or scalar. With tibbles this does not happen:
class(as_tibble(murders)[,4])
If you want to access the vector that defines a column, and not get back a data frame, you need to use the accessor $:
class(as_tibble(murders)$population)
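The contrast can be checked side by side on a small table. This sketch assumes the tibble package is installed; the data frame df is hypothetical and its values are invented for illustration.

```r
library(tibble)

# Hypothetical data frame, made up for illustration.
df <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)
tb <- as_tibble(df)

# Subsetting one column of a data frame drops down to a plain vector.
df_col <- df[, 1]

# The same subset of a tibble is still a tibble.
tb_col <- tb[, 1]

# The $ accessor returns the underlying vector in both cases.
tb_vec <- tb$x

print(class(df_col))
print(class(tb_col))
print(class(tb_vec))
```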

Create a tibble using tibble?

To create a data frame in the tibble format, you can use the tibble function:
grades <- tibble(names = c("John", "Juan", "Jean", "Yao"),
                 exam_1 = c(95, 80, 90, 85),
                 exam_2 = c(90, 85, 85, 90))

How to convert a regular data frame into a tibble?

To convert a regular data frame to a tibble, you can use the as_tibble function. Example:
as_tibble(grades) %>% class()

The Dot Operator?

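The dot stands for the data passed along by the pipe, so .$rate extracts the rate column as a vector:
rates <- filter(murders, region == "South") %>%
  mutate(rate = total / population * 10^5) %>%
  .$rate
median(rates)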

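The do function?

The do function applies a function to each group and combines the results. Here my_summary is a user-defined function, not shown on this card, that takes a data frame and returns a data frame:
heights %>% group_by(sex) %>% do(my_summary(.))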
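Since the card does not define my_summary, the sketch below supplies one hypothetical definition purely for illustration, together with a hypothetical stand-in for the heights table. It assumes dplyr is installed. Recent dplyr versions mark do as superseded, but it still runs.

```r
library(dplyr)

# Hypothetical stand-in for the heights table, made up for illustration.
toy_heights <- data.frame(
  sex = c("Female", "Female", "Male", "Male"),
  height = c(60, 64, 68, 72),
  stringsAsFactors = FALSE
)

# Hypothetical helper: takes a data frame and returns a one-row data frame.
my_summary <- function(dat) {
  data.frame(average = mean(dat$height),
             standard_deviation = sd(dat$height))
}

# The dot passes each group's rows to my_summary as a data frame.
res <- toy_heights %>% group_by(sex) %>% do(my_summary(.))
print(res)
```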

WEEK 3 Flashcards

(35 cards)