WEEK 3: Working with R Flashcards

Question 1

Q

A data frame

Answer

A

is a collection of columns.

Question 2

Q

Data frame rules

Answer

A

First, columns should be named. Using empty column names can create problems with your results later on.
The data stored in your data frame can be many different types, like numeric, factor, or character.
Each column should contain the same number of data items, even if some of those data items are missing.

Question 3

Q

Tibbles

Answer

A

They make working with data easier.
They won’t change the data types of your inputs, for example strings to factors.
Tibbles also never change the names of your variables.
They never create row names.
Tibbles make printing in R easier.

Data frames and tibbles are the building blocks for analysis in R, so having set standards for how they’re built and handled is important.
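
A minimal sketch of the naming behaviour, assuming the tibble package is installed. The column names and values are made up for illustration.

```r
library(tibble)

# Base data.frame() repairs non-syntactic names by default (check.names = TRUE),
# so the space in `order id` is replaced with a dot.
df <- data.frame(`order id` = 1:3, city = c("Lyon", "Oslo", "Lima"))
names(df)

# tibble() keeps the names exactly as typed and leaves strings as character.
tb <- tibble(`order id` = 1:3, city = c("Lyon", "Oslo", "Lima"))
names(tb)
class(tb$city)
```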

Question 4

Q

Tidy data standard

Answer

A

Tidy data refers to the principles that make data structures meaningful and easy to understand.
Variables are organized into columns.
Observations are organized into rows.
Each value must have its own cell.
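
One way to picture the three rules, assuming the tidyr package is installed. The table below is hypothetical: it has one column per year, so the variable "year" is hidden in the column names. pivot_longer() is not mentioned on the card; it is used here only as one possible way to reshape the table.

```r
library(tidyr)

# Not tidy: the years are spread across column names.
wide <- data.frame(
  store = c("A", "B"),
  y2019 = c(10, 20),
  y2020 = c(15, 25)
)

# Tidy: one column per variable (store, year, sales),
# one row per observation, one value per cell.
long <- pivot_longer(
  wide,
  cols = c(y2019, y2020),
  names_to = "year",
  values_to = "sales"
)
long
```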

Question 5

Q

The head() function

Answer

A

Gives us just the first six rows of a data frame by default.
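
A short sketch using mtcars, a dataset that ships with base R. The n argument changes how many rows come back.

```r
# head() returns the first six rows when n is not given.
first_rows <- head(mtcars)
nrow(first_rows)

# Ask for a different number of rows with n.
top_three <- head(mtcars, n = 3)
nrow(top_three)
```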

Question 6

Q

str() and colnames()

Answer

A

str() shows the structure of the data frame; colnames() returns its column names.

Question 7

Q

The mutate function

Answer

A

Makes changes to our data frame, such as adding a calculated column.
mutate(name of the data frame to change, new column = its calculation if needed)
Ex: mutate(diamonds, grammes = carat * 0.2)
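
A minimal sketch of mutate(), assuming the dplyr package is installed. The parcels data frame and its weight_kg column are made up for illustration.

```r
library(dplyr)

# Hypothetical data: parcel weights in kilograms.
parcels <- data.frame(
  parcel_id = c(1, 2, 3),
  weight_kg = c(0.5, 1.2, 2.5)
)

# mutate() returns a new data frame with the extra column;
# the original is unchanged unless the result is assigned back to it.
parcels_with_grams <- mutate(parcels, weight_g = weight_kg * 1000)
parcels_with_grams
```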

Question 8

Q

There are three common sources for data

Answer

A

A package with data that can be accessed by loading that package
An external file like a spreadsheet or CSV that can be imported into R
Data that has been generated from scratch using R code

Question 9

Q

Create a data frame from scratch

Answer

A

Create individual vectors of data and then combine them into a data frame using the data.frame() function.
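
A small sketch in base R. The vector names and values are invented; note that every vector has the same length, with NA standing in for a missing item.

```r
# Individual vectors, one per future column.
employee <- c("Ana", "Ben", "Chloe")
age <- c(34, 41, NA)  # NA marks a missing value

# Combine the vectors into a data frame; the vector names become column names.
people <- data.frame(employee, age)
people
```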

Question 10

Q

To preview your data frame

Answer

A

colnames()
glimpse()
str()
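
The three functions side by side on mtcars, assuming the dplyr package is installed (glimpse() is available once dplyr is loaded; colnames() and str() are base R).

```r
library(dplyr)

# Column names only.
column_names <- colnames(mtcars)
column_names

# Structure: dimensions, column types, and the first few values.
str(mtcars)

# Similar summary to str(), printed one column per line.
glimpse(mtcars)
```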

Question 11

Q

readr

Answer

A

read_csv(): comma-separated values (.csv) files

read_tsv(): tab-separated values files

read_delim(): general delimited files

read_fwf(): fixed-width files

read_table(): tabular files where columns are separated by white-space

read_log(): web log files

Question 12

Q

data() function

Answer

A

Loads datasets in R.
If you run the data() function without an argument, R will display a list of the available datasets.
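
A brief sketch of both uses in base R. Storing the result of data() in a variable, as done here, is just one convenient way to inspect the list from a script.

```r
# Load a specific built-in dataset into the workspace.
data("mtcars")
head(mtcars, n = 3)

# With no argument, data() describes the datasets in the attached packages.
available <- data()
head(available$results[, "Item"])
```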

Question 13

Q

readr_example
readxl_example

Answer

A

How to use read_csv() and read_excel() with the example files that ship with each package
readr_example() or readxl_example() with no argument –> displays the list of example files
Ex: read_csv("file name")
read_csv(readr_example("file name"))
read_excel("file name")
read_excel(readxl_example("file name"))

excel_sheets("file name") –> lists the names of the individual sheets inside an Excel file.
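
A sketch of the full workflow, assuming the readr and readxl packages are installed. "mtcars.csv" and "datasets.xlsx" are example files bundled with those packages.

```r
library(readr)
library(readxl)

# With no argument, readr_example() lists the bundled example files.
readr_example()

# With a file name, it returns the full path, which read_csv() can open.
csv_path <- readr_example("mtcars.csv")
cars <- read_csv(csv_path)

# Same idea for Excel files: find the path, list the sheets, read one of them.
xlsx_path <- readxl_example("datasets.xlsx")
sheet_names <- excel_sheets(xlsx_path)
first_sheet <- read_excel(xlsx_path, sheet = sheet_names[1])
```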

Question 14

Q

Data cleaning packages

Answer

A

The here package makes referencing files easier.
The skimr package makes summarizing data really easy and lets you skim through it more quickly.
The janitor package has functions for cleaning data.

Question 15

Q

Cleaning functions

Answer

A

The select() function is useful for pulling just a subset of variables from a large dataset.
The rename() function makes it easy to change column names.
The rename_with() function can change column names to be more consistent.
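
A minimal sketch of these column helpers, assuming the dplyr package is installed. The bookings data frame is made up for illustration.

```r
library(dplyr)

# Hypothetical data with inconsistently capitalized column names.
bookings <- data.frame(
  Hotel = c("City", "Resort"),
  LeadTime = c(12, 45),
  adults = c(2, 1)
)

# Keep only a subset of the columns.
picked <- select(bookings, Hotel, adults)

# rename() takes new_name = old_name.
renamed <- rename(bookings, hotel_type = Hotel)

# rename_with() applies a function to every column name.
lowered <- rename_with(bookings, tolower)
```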

Question 16

Q

separate function
unite function

Answer

A

separate(table_name, column_name, into = c("column_1", "column_2"), sep = "xxxx")
unite(table_name, "name of the new united column", first_column, second_column, sep = "xx")
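
A round-trip sketch, assuming the tidyr package is installed. The employees data frame is invented for illustration.

```r
library(tidyr)

# Hypothetical data: full names stored in a single column.
employees <- data.frame(
  id = 1:2,
  name = c("John Mendes", "Rob Stewart")
)

# Split one column into two at the space.
split_names <- separate(
  employees, name,
  into = c("first_name", "last_name"),
  sep = " "
)

# Merge the two columns back into one, joined by a space.
rejoined <- unite(split_names, "name", first_name, last_name, sep = " ")
```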

Question 17

Q

The bias function

Answer

A

Finds the average amount that the actual outcome is greater than the predicted outcome.
If the model is unbiased, the outcome should be pretty close to zero. A high result means that your data might be biased.
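
The card does not say which package provides bias(), so the sketch below only mirrors the description in base R. mean_bias() is a hypothetical helper name and the numbers are made up.

```r
# Average of (actual - predicted): positive when actual outcomes
# tend to exceed predictions, near zero when the model is unbiased.
mean_bias <- function(actual, predicted) {
  mean(actual - predicted)
}

actual_sales <- c(10, 12, 14)
predicted_sales <- c(9, 12, 12)

result <- mean_bias(actual_sales, predicted_sales)
result
```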