WEEK 3: Working with R Flashcards

1
Q

A data frame

A

is a collection of columns.

2
Q

Data frame rules

A

First, columns should be named. Using empty column names can create problems with your results later on.
The data stored in your data frame can be many different types, like numeric, factor, or character.
Each column should contain the same number of data items, even if some of those data items are missing.

3
Q

Tibbles

A

They make working with data easier.
They never change the data types of the inputs, so strings are not converted to factors or anything else.
They never change the names of your variables.
They never create row names.
They make printing in R easier.

Data frames and tibbles are the building blocks for analysis in R so having set standards for how they’re built and dealt with is pretty important.
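
The contrast with a base data frame can be sketched as follows. This is a minimal illustration that assumes the tibble package is installed; the column names and values are invented for the example.

```r
library(tibble)

# Hypothetical data, invented for this example.
student <- c("Ana", "Ben")
score <- c(90, 85)

# A base data frame rewrites a name that is not syntactically valid.
df_version <- data.frame(`student name` = student, score = score)
print(names(df_version))

# A tibble keeps the variable name exactly as written.
tbl_version <- tibble(`student name` = student, score = score)
print(names(tbl_version))

# Converting a data frame to a tibble drops its row names.
cars_tbl <- as_tibble(mtcars)
print(has_rownames(cars_tbl))
```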

4
Q

Tidy data standard

A

Tidy data refers to the principles that make data structures meaningful and easy to understand.
Variables are organized into columns.
Observations are organized into rows.
Each value must have its own cell.

5
Q

The head function

A

Returns just the first six rows of a data frame by default.
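
A quick way to see this, using the built-in mtcars dataset (chosen here only as sample data):

```r
# mtcars ships with R; it is used here purely as sample data.
first_rows <- head(mtcars)          # default: the first six rows
first_three <- head(mtcars, n = 3)  # n changes how many rows are returned

print(first_rows)
```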

6
Q

str() and colnames()

A

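str() shows the structure of the data frame: its dimensions, column names, column types, and the first few values.
colnames() returns only the column names.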
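
A short sketch of both functions on the built-in ToothGrowth dataset (picked only for illustration):

```r
# ToothGrowth ships with R; it is used here purely as sample data.
str(ToothGrowth)  # prints dimensions, column types, and the first values

column_names <- colnames(ToothGrowth)  # character vector of column names
print(column_names)
```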

7
Q

The mutate function

A

Makes changes to our data frame by adding new columns or modifying existing ones.
mutate(name of the data frame to change, new column name = its calculation)
Ex: mutate(diamonds, carat_2 = carat * 100)
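
As a rough sketch of a unit conversion with mutate(), assuming dplyr is installed. The packages data frame and its kilos column are hypothetical, invented for this example.

```r
library(dplyr)

# Hypothetical sample data: package weights in kilograms.
packages <- data.frame(id = c("a", "b", "c"), kilos = c(0.5, 1.2, 3))

# mutate() returns a new data frame; the original stays unchanged
# unless the result is assigned back to it.
packages_with_grams <- mutate(packages, grams = kilos * 1000)

print(packages_with_grams)
```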

8
Q

There are three common sources for data

A
  • A package with data that can be accessed by loading that package
  • An external file like a spreadsheet or CSV that can be imported into R
  • Data that has been generated from scratch using R code
9
Q

Create a data frame from scratch

A

Create individual vectors of data and then combine them into a data frame using the data.frame() function.
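
A minimal sketch of that process in base R. The vector names and values are made up for illustration.

```r
# Hypothetical values, invented for this example.
person_name <- c("Ana", "Ben", "Chloe")
person_age <- c(28, 35, 41)

# Each vector becomes a column; all vectors must have the same length.
people <- data.frame(name = person_name, age = person_age)

print(people)
```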

10
Q

To preview your data frame

A

colnames()
glimpse()
str()

11
Q

readr

A

read_csv(): comma-separated values (.csv) files

read_tsv(): tab-separated values files

read_delim(): general delimited files

read_fwf(): fixed-width files

read_table(): tabular files where columns are separated by white-space

read_log(): web log files
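
To try two of these readers without an existing file, the sketch below writes a tiny CSV to a temporary location first. It assumes a recent readr (2.0 or later, for the show_col_types argument); the file contents are invented.

```r
library(readr)

# Write a small hypothetical CSV to a temporary file so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("item,price", "pen,1.5", "notebook,3.25"), csv_path)

# read_csv() returns a tibble and guesses each column's type.
items <- read_csv(csv_path, show_col_types = FALSE)
print(items)

# The same file read with the general-purpose reader, naming the delimiter explicitly.
items_delim <- read_delim(csv_path, delim = ",", show_col_types = FALSE)
```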

12
Q

data() function

A

Loads datasets in R
If you run the data function without an argument, R will display a list of the available datasets.
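
For instance, with datasets that ship with a standard R installation:

```r
# Load a built-in dataset into the workspace by name.
data("mtcars")
print(head(mtcars))

# A dataset bundled with a specific package can be loaded by naming that package.
# (The datasets package is part of a standard R installation.)
data("ToothGrowth", package = "datasets")
```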

13
Q

readr_example
readxl_example

A

readr_example() and readxl_example() return the path to a sample file bundled with the readr or readxl package, so read_csv() and read_excel() can be tried without your own data.
readr_example() or readxl_example() with no argument -> displays the list of example files
Ex: read_csv("file name")
read_csv(readr_example("file name"))
read_excel("file name")
read_excel(readxl_example("file name"))

excel_sheets("file name") -> lists the names of the individual sheets inside an Excel file.
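
Put together, the calls might look like the sketch below. It assumes readr (2.0 or later) and readxl are installed and that the bundled example files mtcars.csv and datasets.xlsx are present, as they are in current releases of those packages.

```r
library(readr)
library(readxl)

# With no argument, these helpers list the example files bundled with each package.
print(readr_example())
print(readxl_example())

# With a file name, they return the full path to that bundled file.
csv_path <- readr_example("mtcars.csv")
cars <- read_csv(csv_path, show_col_types = FALSE)

xlsx_path <- readxl_example("datasets.xlsx")
sheet_names <- excel_sheets(xlsx_path)  # names of the sheets in the workbook
first_sheet <- read_excel(xlsx_path, sheet = sheet_names[1])
```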

14
Q

Data cleaning packages

A

The here package makes referencing files easier.
The skimr package makes summarizing data really easy and lets you skim through it more quickly.
The janitor package has functions for cleaning data.
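
As a small illustration of two of these packages, assuming janitor and skimr are installed. The survey data frame is hypothetical.

```r
library(janitor)
library(skimr)

# Hypothetical data with inconsistent column names.
survey <- data.frame(
  `First Name` = c("Ana", "Ben"),
  `Total Score` = c(88, 92),
  check.names = FALSE  # keep the messy names exactly as written
)

# clean_names() from janitor rewrites column names into one consistent style.
survey_clean <- clean_names(survey)
print(colnames(survey_clean))

# skim_without_charts() from skimr summarizes every column.
print(skim_without_charts(survey_clean))
```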

15
Q

Cleaning functions

A

The select() function is useful for pulling just a subset of variables from a large dataset.
The rename() function makes it easy to change column names.
The rename_with() function can change column names to be more consistent.
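
The three functions can be chained as in this sketch, which assumes dplyr is installed. The bookings data frame and its columns are invented for the example.

```r
library(dplyr)

# Hypothetical booking data, invented for this example.
bookings <- data.frame(
  hotel = c("City", "Resort"),
  lead_time = c(12, 45),
  adults = c(2, 1)
)

# select() keeps only the named columns.
trimmed <- select(bookings, hotel, lead_time)

# rename() takes new_name = old_name.
renamed <- rename(trimmed, days_before_arrival = lead_time)

# rename_with() applies a function to every column name.
upper <- rename_with(renamed, toupper)
print(upper)
```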

16
Q

separate function
unite function

A

separate(table_name, column_name, into = c('column_1', 'column_2'), sep = 'xxxx')
unite(table_name, 'name of the new united column', first_column, second_column, sep = 'xx')
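
A worked round trip, assuming tidyr is installed. The employees data frame is hypothetical.

```r
library(tidyr)

# Hypothetical employee data, invented for this example.
employees <- data.frame(
  name = c("John Mendes", "Rob Stewart"),
  job_title = c("Professional", "Programmer")
)

# separate() splits one column into several at the given separator.
split_names <- separate(employees, name,
                        into = c("first_name", "last_name"), sep = " ")
print(split_names)

# unite() does the reverse; the first argument after the data is the new column's name.
joined <- unite(split_names, "name", first_name, last_name, sep = " ")
print(joined)
```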

17
Q

the bias function

A

Finds the average amount that the actual outcome is greater than the predicted outcome.
If the model is unbiased, the outcome should be pretty close to zero. A high result means that your data might be biased.
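
The calculation described above can be sketched by hand in base R, without relying on a packaged bias function. The temperature values are invented for illustration.

```r
# Hypothetical values, invented for this example.
actual_temp <- c(68.3, 70, 72.4, 71, 67, 70)
predicted_temp <- c(67.9, 69, 71.5, 70, 67, 69)

# Average amount by which the actual outcome exceeds the predicted outcome.
average_difference <- mean(actual_temp - predicted_temp)
print(average_difference)
```

A value near zero would suggest the predictions are not systematically too high or too low.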