Data Science using Python and R - 2 Flashcards
What is required to run Python code?
A Python interpreter (the book calls it a compiler) plus an editor to write code in. The book uses Spyder, which is included in the Anaconda software package.
How do you download Anaconda?
Go to the Spyder installation page and select the Anaconda link under Windows or MacOS X options.
What are the three main boxes displayed when you first open Spyder?
- Left-hand box: where you write Python code
- Top-right box: lists data sets and items created by Python code
- Bottom-right box: displays output and error messages.
What are the five kinds of actions focused on in Python coding?
- Using comments
- Importing packages
- Executing commands
- Saving output
- Getting data into Python.
What character is used to start a comment in Python?
#
True or False: Comments in Python are executed by the compiler.
False
What is the purpose of comments in Python code?
To help others understand the code better.
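A minimal sketch of the comment cards above. The variable name is made up for illustration.

```python
# This whole line is a comment, so Python skips it.
record_count = 2 + 3  # A comment can also follow code on the same line.

# print("never shown")  <- commented-out code is skipped as well.
print(record_count)
```

Only the assignment and the final print run; the text after each # is ignored.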
How do you execute a single line of code in Spyder?
Place the cursor on the line and press the ‘Run selection or current line’ button or use the keyboard shortcut.
How do you execute multiple lines of code in Spyder?
Highlight the relevant lines and press the ‘Run selection or current line’ button or use the keyboard shortcut.
What is the purpose of importing packages in Python?
To perform complex data science tasks without writing the code from scratch.
Which two packages are commonly imported in Python for data science?
- pandas
- numpy
What command is used to import the pandas package as pd?
import pandas as pd
What command is used to import the numpy package as np?
import numpy as np
Fill in the blank: To import specific commands from a package, use the format from _____ import _____.
from [package_name] import [command_name]
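The import cards above can be sketched as follows. The standard-library math module stands in for "a package" in the from ... import ... form, and the numbers are invented for illustration.

```python
import pandas as pd    # the whole package, available under the short name pd
import numpy as np     # the whole package, available under the short name np
from math import sqrt  # a single command; math is only a stand-in package here

values = np.array([4.0, 9.0, 16.0])  # made-up numbers
series = pd.Series(values)

print(series.mean())    # called through the pd alias's Series object
print(sqrt(values[0]))  # called directly, with no package prefix
```

A command imported with from ... import ... is called by its bare name, while commands from a package imported with as need the alias in front (pd. or np.).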
What is the structure of the command to get a data set into Python?
your_name_for_the_data_set = pd.read_csv('the_path_to_the_file')
What does the command pd.read_csv() do?
It imports a CSV file into a pandas DataFrame.
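A small sketch of pd.read_csv(), assuming a three-column file. To keep the example self-contained, an in-memory text buffer stands in for a file on disk, and the rows and the job column are invented.

```python
import io
import pandas as pd

# Stand-in for a CSV file on disk; these rows are invented for illustration.
csv_text = io.StringIO(
    "age,job,marital\n"
    "56,housemaid,married\n"
    "37,services,single\n"
    "40,admin.,married\n"
)

# With a real file, pass its path instead: pd.read_csv('the_path_to_the_file')
bank_train = pd.read_csv(csv_text)

print(type(bank_train))  # a pandas DataFrame
print(bank_train.shape)  # (number of records, number of variables)
```

The first line of the file is read as the column headings by default.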
What is the syntax for saving output in Python?
your_name_for_the_output = the_command_that_generated_the_output
What is the purpose of saving output in Python?
To use the output in later lines of code.
What Python command is used to create a contingency table?
pd.crosstab()
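A hedged example of pd.crosstab() on a tiny invented data set. The column names marital and response and all the values are assumptions; crosstab_01 is the name the exercises use for the saved table.

```python
import pandas as pd

# Invented records; the real bank_train data set is much larger.
bank_train = pd.DataFrame({
    "marital": ["married", "single", "married", "single", "married"],
    "response": ["no", "no", "yes", "yes", "no"],
})

# Rows are the categories of the first variable, columns those of the second.
crosstab_01 = pd.crosstab(bank_train["marital"], bank_train["response"])
print(crosstab_01)
```

Each cell counts the records that fall into that combination of categories.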
How do you access a specific record in a pandas DataFrame?
Use the .loc attribute followed by the record index.
How do you view the first record in a DataFrame named bank_train?
bank_train.loc[0]
How do you access multiple records in a DataFrame?
Use the .loc attribute with a list of record indices, for example bank_train.loc[[0, 2, 3]]
If you want to see the first 10 rows of a DataFrame, what is the syntax?
bank_train[0:10]
How do you access a single variable in a DataFrame?
Use bank_train['variable_name']
How do you access multiple variables in a DataFrame?
Use bank_train[['var1', 'var2']]
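The record and variable access cards above can be tried on a small stand-in DataFrame. The values and the job column are made up; only the access patterns come from the cards.

```python
import pandas as pd

# Invented five-record stand-in for bank_train.
bank_train = pd.DataFrame({
    "age": [56, 37, 40, 25, 61],
    "job": ["housemaid", "services", "admin.", "student", "retired"],
    "marital": ["married", "single", "married", "single", "divorced"],
})

first_record = bank_train.loc[0]                # one record, by index label
some_records = bank_train.loc[[0, 2, 3]]        # several records, by a list of labels
first_rows = bank_train[0:10]                   # up to ten rows; only five exist here
one_variable = bank_train["age"]                # a single variable
two_variables = bank_train[["age", "marital"]]  # several variables (note the double brackets)

print(first_record)
print(two_variables)
```

Python counts from 0, so the first record has index 0 and the slice 0:10 stops before position 10.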
What must you do to set up graphics in Spyder for better display?
Change the graphics settings to ‘Automatic’ in Preferences.
What is the first step to change graphics settings in Spyder?
Click on Tools in the menu bar, then select Preferences.
In Spyder, where do you find the Graphics tab to change settings?
In the Preferences window, on the top of the right‐hand side
What should you select under Graphics backend to enable graphical output?
Choose Automatic from the Backend drop‐down menu
What must you do after changing the graphics options in Spyder?
Close Spyder and reopen it for the new settings to take effect
True or False: Changing the graphics backend will open graphical output in the same window.
False
What is the main purpose of the Configure subplots button in the graphics output window?
To change the margins of the plot
What is the first action required to download R?
Go to the R installation page and choose a mirror
How do you open a new R script in RStudio?
Click on File > New File > R Script
What is located in the top-left box of the RStudio interface?
Where you will type your R code
What does the bottom-right box in RStudio primarily display?
Many tabs, including the ‘Plots’ tab for graphical output
What symbol is used to start a comment in R code?
#
How do you execute a single line of R code in RStudio?
Click the Run button or use the keyboard shortcut
What are the two steps to make an R package available for use?
- Downloading the package
- Opening the package
Fill in the blank: To download a package in R, you use the command _______.
install.packages()
What command do you use to open an R package after it’s been downloaded?
library()
What is the easiest method to get a data set into R?
Using the ‘Import Dataset’ button in the RStudio Environment tab
What should be selected in the Import Dataset window to indicate the presence of column headers?
The ‘Yes’ button for ‘Heading’
What is the general form for naming (saving) a data set or other output in R?
object_name <- object_to_be_saved
What command is used to read a CSV file into R?
read.csv()
How do you create a contingency table in R?
Use the table() function
What notation is used to access a specific record in a data set in R?
Bracket notation: data_set_name[ rows of interest , columns of interest ]
What does the command bank_train[1, ] return?
The first record in the bank_train data set
How do you access multiple records, for example, the first, third, and fourth records in R?
bank_train[c(1,3,4), ]
How can you access specific variables in a data set in R?
Use bracket notation for columns: data_set_name[, c(column_indices)]
What is the result of the command bank_train[, c(1, 3)]?
It returns the first and third variables from the bank_train data set
What are the first and third variables in the data set?
age and marital
How do you access specific variables in a data frame in R?
Use the syntax bank_train[, c(1, 3)]
How can you access the age variable from the bank_train data set?
bank_train$age
What property do data frames have that allows identifying variables of interest?
You can use a dollar sign ($)
What programming languages are covered in this book?
Python and R
What is the purpose of comments in code?
To provide explanations or notes without affecting output
What is the use of the ‘as’ keyword when importing Python packages?
To rename the imported package
How do you save output generated by Python code?
Use assignment to a variable
How do you save output generated by R code?
Assign output to a variable
Why is it important to specify if a data set has column headings?
To ensure proper data interpretation and manipulation
What are two ways to get a data set into R?
- Using read.csv()
- Using read.table()
What is contained in the bottom-right window of a programming environment?
Output or results of executed code
What is the output of executing a comment in R?
No output is generated
What packages should be imported for Python in the exercises?
- pandas
- numpy
What package should be imported for R in the exercises?
ggplot2
What is the name given to the imported bank_marketing_training data set?
bank_train
What is a contingency table?
A table of counts showing how often each combination of categories of two categorical variables occurs
What is the name of the saved output for the contingency table in Python?
crosstab_01
What is the name of the saved output for the contingency table in R?
t1
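How do you save the first nine records of the bank_train data set?
Select them (bank_train[0:9] in Python, bank_train[1:9, ] in R) and assign the result to a new data frame
How do you save the age and marital variables of the bank_train data set?
Select them (bank_train[['age', 'marital']] in Python, bank_train[, c(1, 3)] in R) and assign the result to a new data frame
How do you save the first three records of the age and marital variables?
Select them (bank_train.loc[0:2, ['age', 'marital']] in Python, bank_train[1:3, c(1, 3)] in R) and assign the result to a new data frame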
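A sketch of the three "save a subset" exercises in Python. The data are invented, and the names first_nine, age_marital and age_marital_first3 are hypothetical, since the cards do not say what the new data frames are called.

```python
import pandas as pd

# Invented twelve-record stand-in for bank_train.
bank_train = pd.DataFrame({
    "age": list(range(20, 32)),
    "job": ["admin.", "services", "technician"] * 4,
    "marital": ["married", "single", "divorced"] * 4,
})

# First nine records: a position slice stops before position 9.
first_nine = bank_train[0:9]

# The age and marital variables for every record.
age_marital = bank_train[["age", "marital"]]

# First three records of those two variables.
# Unlike a plain slice, .loc includes the end label, so 0:2 gives rows 0, 1 and 2.
age_marital_first3 = bank_train.loc[0:2, ["age", "marital"]]

print(first_nine.shape, age_marital.shape, age_marital_first3.shape)
```

Each result is an ordinary DataFrame that later lines of code can use by name.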
What should be done when importing the adult_ch3_training data set?
Use the ‘Heading: Yes’ setting
What command should be imported for Python related to decision trees?
DecisionTreeClassifier from sklearn.tree
What package should be imported for R related to decision trees?
rpart
What is the name given to the contingency table of workclass and sex?
table01
What is the name given to the contingency table of sex and marital status?
table02
What records should be displayed to analyze sex and workclass?
The first record
What records should be displayed to analyze sex and marital status?
Records 6–10
What is the name of the new data set for married individuals?
adultMarried
What is the name of the new data set for individuals older than 40?
adultOver40
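One possible way to build adultMarried and adultOver40 in Python, using a Boolean condition inside the brackets. The data frame name adult, the column names age and marital.status, the category label "Married" and all the values are assumptions; check them against the actual adult_ch3_training file.

```python
import pandas as pd

# Invented stand-in for the adult_ch3_training data set.
adult = pd.DataFrame({
    "age": [39, 50, 38, 53, 28],
    "sex": ["Male", "Male", "Female", "Male", "Female"],
    "marital.status": ["Never-married", "Married", "Divorced", "Married", "Married"],
})

# Keep only the records where the condition is True.
adultMarried = adult[adult["marital.status"] == "Married"]
adultOver40 = adult[adult["age"] > 40]

print(adultMarried)
print(adultOver40)
```

The condition produces one True or False per record, and the brackets keep the records marked True.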