Data analysis with R Programming Flashcards

1
Q

What have you learned so far?

A

  • Use structured thinking to define a problem and ask the right questions.
  • Work with spreadsheets, databases, and tools like SQL to organize and transform data.
  • Clean your data to make sure it has integrity before you analyze it.
  • Create impactful data visualizations to illustrate key points.
  • Craft a compelling story to communicate insights to stakeholders.
2
Q

Computer programming

A

Giving instructions to a computer to perform an action or set of actions.

3
Q

What will you learn?

A
  • Introduction to programming languages.
  • Explore main features and functions.
  • Basic programming concepts in R.
  • How to work with data in R.
  • Clean, transform, visualize, and report data in R.
4
Q

R programming language

A

Used for statistical analysis, visualization, and other data analysis.

5
Q

Programming Languages

A
  • The words and symbols we use to write instructions for computers to follow.
6
Q

Coding

A
  • Writing instructions to the computer in the syntax of a specific programming language.
7
Q

Programming languages

A

- R
- Python
- JavaScript
- SAS
- Scala
- Julia

8
Q

Benefits of using programming languages

A
  • Clarify the steps of your analysis.
  • Save time.
  • Reproduce and share your work.
9
Q

R

A

A programming language frequently used for statistical analysis, visualization, and other data analysis.

10
Q

Open Source

A

Code that is freely available and may be modified and shared by the people who use it.

11
Q

R Benefits

A
  • Accessible
  • Data-centric
  • Open source
  • Community
12
Q

Uses of R

A
  • Reproducing your analysis
  • Processing lots of data
  • Creating data visualizations
13
Q

Integrated Development Environment (IDE)

A

A software application that brings together all the tools you may want to use in a single place.

14
Q

R code known as pipe

A

Helps make a sequence of code easier to work with and read.

15
Q

The Basic concepts of R

A
  • Functions
  • Comments
  • Variables
  • Data types
  • Vectors
  • Pipes
16
Q

Functions (R)

A

A body of reusable code to perform specific tasks in R.

17
Q

Argument (R)

A

Information that a function in R needs in order to run.
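As a minimal sketch of how a function and its arguments fit together, the base R snippet below defines a hypothetical function. The name `add_tax`, its `rate` argument, and the sample values are illustrative assumptions, not part of the course material.

```r
# Hypothetical helper: add_tax() and its default rate are illustrative only.
add_tax <- function(price, rate = 0.1) {
  # price and rate are the arguments the function needs in order to run
  price * (1 + rate)
}

# Call the function, passing values for its arguments
total_default <- add_tax(100)              # falls back to the default rate
total_custom  <- add_tax(100, rate = 0.2)  # overrides the default rate
print(total_default)
print(total_custom)
```

Giving `rate` a default value means callers only have to supply it when they want something other than the default.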

18
Q

Variable (R)

A

A representation of a value in R that can be stored for use later during programming.

19
Q

Vector (R)

A

A group of data elements of the same type stored in a sequence in R.

20
Q

Pipe (R)

A

A tool in R for expressing a sequence of multiple operations, represented with %>%.

21
Q

Pipe (R) example

A

library(dplyr)  # provides %>%, filter(), and arrange()
ToothGrowth %>%
  filter(dose == 0.5) %>%
  arrange(len)

22
Q

Data Structure

A

Data structure is a format for organizing and storing data.

23
Q

Types of atomic vectors

A

- Logical
- Double
- Integer
- Character
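The four atomic types can be checked with base R's `typeof()`. The snippet below is a small sketch; the variable names and sample values are made up for illustration.

```r
# One example vector per atomic type (values are illustrative)
logical_vec   <- c(TRUE, FALSE, TRUE)
double_vec    <- c(2.5, 48.5, 101.5)
integer_vec   <- c(1L, 5L, 15L)   # the L suffix stores whole numbers as integers
character_vec <- c("Sara", "Lisa", "Anna")

# typeof() reports the atomic type of each vector
typeof(logical_vec)    # "logical"
typeof(double_vec)     # "double"
typeof(integer_vec)    # "integer"
typeof(character_vec)  # "character"
typeof(3)              # "double": a bare number is a double unless written as 3L
```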

24
Q

Logical Vector

A

TRUE/FALSE values

25
Q

Logical vector example

A

TRUE

26
Q

Integer vector

A

Positive and negative whole values

27
Q

Integer vector example

A

3L (the L suffix stores the value as an integer; a bare 3 is stored as a double)

28
Q

Double vector

A

Decimal values

29
Q

Double vector example

A

101.175

30
Q

Character vector

A

String/ character values

31
Q

Character vector example

A

"Coding"

32
Q

Data Frames

A

are the most common way of storing and analyzing data in R.

33
Q

Matrix

A

is a two-dimensional collection of data elements. This means it has both rows and columns.
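A quick base R sketch of a matrix, using `matrix()` with illustrative values:

```r
# Build a 2-row matrix from the values 3 through 8 (filled column by column)
m <- matrix(c(3:8), nrow = 2)
print(m)

# dim() returns the number of rows and columns
dim(m)
```

Because R fills matrices column by column by default, the first column holds 3 and 4, the second 5 and 6, and the third 7 and 8.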

34
Q

Operator

A

A symbol that names the type of operation or calculation to be performed in a formula.

35
Q

Assignment operators

A

Used to assign values to variables and vectors.

36
Q

Assignment operator Example

A

sales_1 <- c(67.00, 75.50, 90.00, 54.75)

37
Q

Arithmetic Operators

A

Used to complete math calculations.

38
Q

Arithmetic operators

A

+ (addition)
- (subtraction)
* (multiplication)
/ (division)
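To see the arithmetic operators at work on vectors, here is a small base R sketch. The `sales_2` vector and the derived variable names are hypothetical and exist only for illustration.

```r
# Assign two vectors of sales figures (sales_2 is a made-up second quarter)
sales_1 <- c(67.00, 75.50, 90.00, 54.75)
sales_2 <- c(40.25, 60.00, 82.50, 49.00)

total   <- sales_1 + sales_2        # element-wise addition
change  <- sales_2 - sales_1        # element-wise subtraction
doubled <- sales_1 * 2              # multiply every element by 2
average <- (sales_1 + sales_2) / 2  # element-wise mean of the two quarters

print(total)
print(average)
```

Arithmetic on vectors is applied element by element, so no loop is needed.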

39
Q

Function

A

A body of reusable code for performing specific tasks in R.

40
Q

Argument

A

Information needed by a function in R in order to run.

41
Q

Comment

A

Helpful text that describes or explains R code, preceded by #.

42
Q

Variable

A

A representation of a value in R that can be stored for later use.

43
Q

Data Types

A

An attribute that describes a piece of data based on its values, its programming language, or the operations it can perform.

44
Q

Vector

A

A group of data elements of the same type stored in a one-dimensional sequence in R.

45
Q

Pipe

A

A tool in R for expressing a sequence of multiple operations, represented with %>%.

46
Q

Packages (R)

A

Units of reproducible R code

47
Q

Packages include:

A
  • Reusable R functions
  • Documentation about the functions
  • Sample datasets
  • Tests for checking your code.
48
Q

CRAN (Comprehensive R Archive Network)

A

An online archive with R packages, source code, manuals, and documentation.

49
Q

R Packages

A

Packages offer a helpful combination of code, reusable R functions, descriptive documentation, tests for checking operability, and sample data sets.

50
Q

Tidyverse (R)

A

A system of packages in R with a common design philosophy for data manipulation, exploration, and visualization.

51
Q

How do conflicts in RStudio happen?

A

Conflicts happen when packages have functions with the same names as other functions.
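One well-known case is `filter()`, which exists in both the base `stats` package and in `dplyr`. The sketch below assumes `dplyr` is installed and shows how prefixing a call with the package name removes the ambiguity; the variable names are illustrative.

```r
library(dplyr)  # on load, dplyr reports that it masks stats::filter and stats::lag

# Prefixing with the package name makes clear which function is meant
low_dose <- dplyr::filter(ToothGrowth, dose == 0.5)  # row filtering from dplyr
smoothed <- stats::filter(1:10, rep(1 / 3, 3))       # moving-average filter from stats

nrow(low_dose)
class(smoothed)
```

Without the prefix, a plain `filter()` call resolves to whichever package was loaded most recently, which here is `dplyr`.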

52
Q

8 core tidyverse packages

A

- ggplot2
- tibble
- tidyr
- readr
- purrr
- dplyr
- stringr
- forcats

53
Q

Conflict notifications

A

are just one type of message that can show up in the console.

54
Q

Vignette

A

is documentation that acts as a guide to an R package.

55
Q

Four Packages that are an essential part of the workflow for data analysts:

A
  • ggplot2
  • dplyr
  • tidyr
  • readr
56
Q

ggplot2 (R)

A

Creates a variety of data visualizations by applying different visual properties to the data variables in R.

57
Q

tidyr (R)

A

A package used for data cleaning to make tidy data.

58
Q

readr (R)

A

Used for importing data

59
Q

dplyr (R)

A

Offers a consistent set of functions that help you complete some common data manipulation tasks.

60
Q

Factors (R)

A

Store categorical data in R where the data values are limited and usually based on a finite group like country or year.
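A minimal base R sketch of a factor, using hypothetical survey responses (the values and level order are assumptions for illustration):

```r
# Hypothetical responses drawn from a limited set of possible values
sizes <- c("small", "large", "medium", "small", "large")

# factor() stores the values together with their allowed levels
size_factor <- factor(sizes, levels = c("small", "medium", "large"))

levels(size_factor)  # the finite group of allowed values
table(size_factor)   # counts per category
```

Setting `levels` explicitly controls the order in which categories appear in tables and plots.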

61
Q

What have you learned so far?

A
  • Learned the fundamentals of R, from variables to vectors and more.
  • Explored the different operations in R and saw how they can help you complete calculations.
  • Checked out pipes and how they can make your programming more efficient.
  • Unpacked packages to find out how they are a big part of what you can do in R.
62
Q

Nested

A

In programming, describes code that performs a particular function and is contained within code that performs a broader function.

63
Q

Nested function

A

A function that is completely contained within another function.

64
Q

Keyboard shortcuts for inserting pipe operators

A
  • PC/Chromebook: Ctrl+Shift+M
  • Mac: Cmd+Shift+M
65
Q

Things to consider when using pipes:

A

- Add the pipe operator at the end of each line of the piped operation except the last one.
- Check your code after you have programmed your pipe.
- Revisit piped operations to check for parts of your code to fix.
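To illustrate why pipes are easier to read than nested calls, the sketch below runs the same operation both ways on the built-in `ToothGrowth` dataset. It assumes `dplyr` is installed; the result variable names are illustrative.

```r
library(dplyr)

# Nested version: has to be read from the inside out
nested_result <- arrange(filter(ToothGrowth, dose == 0.5), len)

# Piped version: reads top to bottom; every line except the last ends with %>%
piped_result <- ToothGrowth %>%
  filter(dose == 0.5) %>%
  arrange(len)

# Both approaches produce the same data frame
identical(nested_result, piped_result)
```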

66
Q

Data Frame

A

A collection of columns

67
Q

Data Frames rules

A
  • Columns should be named
  • Data stored can be many different types, like numeric, factor, or character.
  • Each column should contain the same number of data items.
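These rules can be seen in a small base R sketch. The `products` table and its values are hypothetical.

```r
# Each named column holds one type; all columns have the same number of items
products <- data.frame(
  name     = c("Widget", "Gadget", "Gizmo"),
  price    = c(9.99, 24.50, 14.25),
  in_stock = c(TRUE, FALSE, TRUE)
)
str(products)

# Columns whose lengths cannot be matched up raise an error
bad <- try(data.frame(a = 1:3, b = 1:2), silent = TRUE)
inherits(bad, "try-error")
```

Wrapping the failing call in `try()` lets the script continue so the error can be inspected.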
68
Q

In the tidyverse

A
  • Tibbles are like streamlined data frames
69
Q

Tibbles

A

- Never change the data types of the inputs.
- Never change the names of your variables.
- Never create row names.
- Make printing easier.

70
Q

Tidy data (R)

A

A way of standardizing the organization of data within R.

71
Q

Tidy data standards

A
  • Variables are organized into columns.
  • Observations are organized into rows.
  • Each value must have its own cell.
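As a sketch of what tidying can look like, the example below reshapes a hypothetical "wide" table, where the year variable is spread across column names, into tidy form with `tidyr::pivot_longer()`. It assumes `tidyr` is installed; the table, column names, and values are made up.

```r
library(tidyr)

# Hypothetical wide table: the years are column names rather than values
wide <- data.frame(
  country = c("A", "B"),
  `2019`  = c(10, 20),
  `2020`  = c(15, 25),
  check.names = FALSE  # keep the numeric column names as written
)

# Move the year columns into one "year" variable and one "cases" variable
tidy <- pivot_longer(wide, cols = c(`2019`, `2020`),
                     names_to = "year", values_to = "cases")
tidy
```

After reshaping, each variable (country, year, cases) is a column and each country-year observation is a row.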
72
Q

.CSV (comma-separated values)

A

A .csv file is a plain text file that contains a list of data. These files mostly use commas to separate (or delimit) data, but sometimes they use other characters, like semicolons.
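A minimal, self-contained sketch of reading a .csv file in base R. The file contents are hypothetical and are written to a temporary file so the example can run anywhere.

```r
# Write a small hypothetical table to a temporary .csv file
csv_path <- tempfile(fileext = ".csv")
writeLines(c("product,price", "Widget,9.99", "Gadget,24.50"), csv_path)

# read.csv() is base R; readr::read_csv() is the tidyverse counterpart
products <- read.csv(csv_path)
print(products)
```

For semicolon-delimited files, base R also offers `read.csv2()`, or the `sep` argument of `read.csv()` can be set explicitly.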

73
Q

.TSV (tab-separated values)

A

A .tsv file stores a data table in which the columns of data are separated by tabs. For example, a database table or spreadsheet data.

74
Q

.FWF (fixed-width files)

A

A .fwf file has a specific format that allows textual data to be saved in an organized fashion.

75
Q

.LOG

A

A .log file is a computer-generated file that records events from operating systems and other software programs.

76
Q

Arithmetic Operators

A

Let you perform math operations like addition, subtraction, multiplication, and division.

77
Q

Relational Operators

A

Relational operators, also known as comparators, allow you to compare values. They identify how one R object relates to another, for example <, >, and <=.

78
Q

Logical operators

A

allow you to combine logical values. Logical operators return a logical data type or Boolean (TRUE or FALSE).
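The base R sketch below shows relational operators producing logical values and logical operators combining them. The variable names and the value of `x` are illustrative.

```r
x <- 10

# Relational operators compare values and return TRUE or FALSE
is_big   <- x > 5    # TRUE
is_three <- x == 3   # FALSE

# Logical operators combine or invert logical values
both    <- x > 5 & x < 20  # AND: TRUE only if both sides are TRUE
either  <- x < 5 | x > 8   # OR: TRUE if at least one side is TRUE
negated <- !(x > 5)        # NOT: flips TRUE to FALSE

print(c(is_big, is_three, both, either, negated))
```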

79
Q

Assignment Operators

A

Let you assign values to variables, for example <-.

80
Q

Organizational functions

A

Help you sort, filter, and summarize your data.

81
Q

Cleaning functions

A

Help you preview and rename data so it's easier to work with.

82
Q

Transformational functions

A

help you separate and combine data, as well as create new variables.

83
Q

Anscombe’s quartet

A

Four datasets that have nearly identical summary statistics but look very different when plotted.
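The quartet ships with base R as the `anscombe` data frame (columns `x1` to `x4` and `y1` to `y4`), so the near-identical statistics can be checked directly. The variable names `y_means` and `xy_cors` are illustrative.

```r
# Mean of each y column in the four datasets
y_means <- sapply(anscombe[, c("y1", "y2", "y3", "y4")], mean)

# Correlation between x and y within each dataset
xy_cors <- c(
  cor(anscombe$x1, anscombe$y1),
  cor(anscombe$x2, anscombe$y2),
  cor(anscombe$x3, anscombe$y3),
  cor(anscombe$x4, anscombe$y4)
)

round(y_means, 2)
round(xy_cors, 2)
```

The rounded means and correlations come out essentially the same across the four datasets, which is why plotting the data matters.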

84
Q

Popular visualization packages in R

A

- ggplot2
- plotly
- lattice
- rgl
- dygraphs
- leaflet
- highcharter
- patchwork
- gganimate
- ggridges

85
Q

The basics of ggplot2

A

The ggplot2 package lets you make high-quality, customizable plots of your data. ggplot2 is based on the grammar of graphics, which is a system for describing and building visualizations.

86
Q

Benefits of ggplot2

A

- Create different types of plots
- Customize the look and feel of plots
- Create high-quality visuals
- Combine data manipulation and visualization

87
Q

Our focus on core concepts in ggplot2

A

- Aesthetics
- Geoms
- Facets
- Labels and annotations

88
Q

Aesthetic (R)

A

A visual property of an object in your plot.

89
Q

Geom (R)

A

The geometric object used to represent your data.

90
Q

Facets (R)

A

Let you display smaller groups, or subsets, of your data.

91
Q

Labels and annotations (R)

A

Let you customize your plot

92
Q

Mapping (R)

A

Matching up a specific variable in your dataset with a specific aesthetic.

93
Q

Steps to Create your plot in R programming

A

1) Start with the ggplot function and choose a dataset to work with.
2) Add a geom function to display your data.
3) Map the variables you want to plot in the arguments of the aes() function.
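The three steps can be sketched with the built-in `ToothGrowth` dataset. This assumes `ggplot2` is installed; the choice of dataset and variables is illustrative.

```r
library(ggplot2)

p <- ggplot(data = ToothGrowth) +               # step 1: ggplot() plus a dataset
  geom_point(mapping = aes(x = dose, y = len))  # step 2: a geom; step 3: aes() mapping

# Printing the object draws the plot
p
```

The `+` at the end of the first line adds the geom layer to the plot, much as `%>%` chains steps in a pipe.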

94
Q

Aesthetics for points

A

- x
- y
- color
- shape
- size
- alpha

95
Q

Geom functions

A

-geom_point
-geom_bar
-geom_line

96
Q

Smoothing

A

Enables the detection of a data trend even when you can't easily notice a trend from the plotted data points.

97
Q

Loess smoothing

A

The loess smoothing process is best for smoothing plots with fewer than 1,000 points.

98
Q

Gam smoothing

A

Gam smoothing, or generalized additive model smoothing, is useful for smoothing plots with a large number of points, i.e. more than 1,000 points.
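A hedged sketch of requesting a specific smoother through `geom_smooth()`. It assumes `ggplot2` is installed and uses the built-in `mtcars` dataset, which is small enough for loess; the variable choice is illustrative.

```r
library(ggplot2)

# mtcars has only 32 rows, so loess is the fitting choice here
p_loess <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x)

# For a large dataset, a GAM smoother could be requested instead:
# geom_smooth(method = "gam", formula = y ~ s(x))

p_loess
```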

99
Q

Facet functions

A

- facet_wrap()
- facet_grid()
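The two facet functions can be compared in a short sketch. It assumes `ggplot2` is installed and uses the built-in `mtcars` dataset; the faceting variables are chosen only for illustration.

```r
library(ggplot2)

base_plot <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point()

# facet_wrap(): one panel per value of a single variable
by_cyl <- base_plot + facet_wrap(~cyl)

# facet_grid(): panels arranged by two variables (rows ~ columns)
by_gear_cyl <- base_plot + facet_grid(gear ~ cyl)

by_cyl
by_gear_cyl
```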

100
Q

To add a title to a chart

A

Use the labels function: labs(title = "Average product rating").

101
Q

Blue and yellow bars

A

To highlight underperforming products, use an aesthetics function: col = ifelse(x < 2, 'blue', 'yellow').

102
Q

Bar chart

A

To create the bars on the chart, use a geom function: geom_bar().

103
Q

Trend line

A

To create a trend line, use a geom function: geom_smooth().

104
Q

Scatter plot chart

A

To create the scatter plot, use a geom function: geom_point().

105
Q

Compare data

A

To compare data trends across average ratings, use a facets function: facet_wrap(~`Average Rating`). The backticks are needed because the column name contains a space.

106
Q

Axis labels

A

To label the axes, use an aesthetics function: aes(x = `Average price (USD)`, y = Product). The backticks are needed because the column name contains spaces and parentheses.

107
Q

Annotate

A

To add notes to a document or diagram to explain or comment upon it.
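In ggplot2, titles and axis labels are typically set with `labs()` and notes are added with `annotate()`. The sketch below assumes `ggplot2` is installed; the dataset, label text, and annotation position are illustrative.

```r
library(ggplot2)

p <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point() +
  labs(title = "Fuel economy by weight",
       x = "Weight (1000 lbs)",
       y = "Miles per gallon") +
  # annotate() places a text note at the given x/y position on the plot
  annotate("text", x = 5, y = 30, label = "Heavier cars use more fuel")

p
```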

108
Q

R Markdown

A

A file format for making dynamic documents with R.

109
Q

Course Overview for R markdown

A
  • An overview of R Markdown
  • How to install R Markdown in RStudio
  • How to create an R Markdown document
  • The structure and components of the document
  • How to insert and edit pieces of code called chunks in your document
  • The process of exporting your documentation
110
Q

Markdown

A

A syntax for formatting plain text files.

111
Q

Markdown formatting

A

To italicize text, wrap it in single underscores or single asterisks: _text_ or *text*.

112
Q

Markdown report output

A

Add a single underscore or asterisk.

113
Q

R Notebook

A

Lets users run your code and show the graphs and charts that visualize the code.

114
Q

R Markdown file formats

A
  • HTML, PDF, and Word documents
  • Slide presentation
  • Dashboard
115
Q

HTML

A

The set of markup symbols or codes used to create a webpage.

116
Q

Other notebook options

A

-Jupyter
-Kaggle
- Google Colab

117
Q

Jupyter notebooks

A

are documents that contain computer code and rich text elements – such as comments, links, or descriptions of your analysis and results

118
Q

YAML

A

A language for data that translates it so it's readable.

119
Q

Code Chunk

A

Code added in an .Rmd file.

120
Q

Delimiter

A

A character that indicates the beginning or end of a data item.

121
Q

Code chunk delimiters

A
Three backticks followed by {r} open a code chunk, and three backticks close it.
122
Q

Code chunk keyboard shortcuts

A

PC/Chromebook: ctrl+alt+I

123
Q

What have we explored so far?

A
  • What R Markdown is
  • How to use R Markdown in RStudio to create .Rmd files
  • The structure of these files and how to format them to make reports
  • What code chunks are and how to include them in your documentation
  • How to take all of your analyses and transform them from an .Rmd file into a report
124
Q

Case study

A

A common way for employers to assess job skills and gain insight into how you approach common data-related challenges.

125
Q

Portfolio

A

Collection of case studies that can be shared with potential employers.

126
Q

Best Practices for Case studies and Portfolios

A

1) Make sure your case study answers the questions being asked.

2) Make sure that you are communicating the steps you have taken and the assumptions you have made.

3) The best portfolios are personal, unique and simple.

4) Make sure your portfolio is relevant and presentable.