Basics Flashcards
What are 4 different types of data in Python?
str = string
int = integer
float = number with decimal values
bool = boolean, True or False
Can a list contain different types of data?
Yes. A single list can mix types, e.g. ['a', 1, 2.5, True]
With what starting number are lists indexed?
Zero, counting forward from the start of the list
-1, counting backward from the end of the list
List slicing syntax, e.g. 1:3. Is the 3 included?
start : end
No, the end is not included
What happens if you don’t specify the start or end of a list when list slicing?
e.g. mylist[:4] or mylist[2:]
If you don’t specify the start, the slice starts at index zero
If you don’t specify the end, the slice runs from the start index through the rest of the list
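A minimal sketch of the slicing rules above. The name mylist comes from the card; the values are made up for illustration.

```python
# Illustrative list; only the name mylist appears in the notes.
mylist = [10, 20, 30, 40, 50, 60]

first_four = mylist[:4]  # start omitted: begins at index 0, stops before index 4
from_two = mylist[2:]    # end omitted: runs from index 2 to the end of the list
middle = mylist[1:3]     # indexes 1 and 2 only; index 3 is excluded

print(first_four)  # [10, 20, 30, 40]
print(from_two)    # [30, 40, 50, 60]
print(middle)      # [20, 30]
```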
x = [2, 3, 4]
y = x
y[0] = 1
How do you prevent the first element changing to 1 in the original list x?
Instead of y=x, use y=list(x) or y=x[:]. Then changes to y will not affect x.
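The difference between the two assignments can be checked directly. This is a small sketch; the name z is introduced here only so both cases can sit side by side.

```python
x = [2, 3, 4]
y = x          # y is just a second name for the same list object
y[0] = 1
print(x)       # [1, 3, 4]  (x changed too)

x = [2, 3, 4]
z = list(x)    # z is a new list with the same elements (x[:] works the same way)
z[0] = 1
print(x)       # [2, 3, 4]  (x is untouched)
print(z)       # [1, 3, 4]
```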
Can a numpy array contain elements with different types?
No. If you try, some of the elements’ types are changed so that the array ends up homogeneous (type coercion)
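A short sketch of type coercion, assuming numpy is installed. The variable names and values are illustrative.

```python
import numpy as np

# Mixing an int, a float and a bool: everything is converted to float
mixed = np.array([1, 2.5, True])
print(mixed.dtype)  # float64

# Mixing a number and a string: everything is converted to a string type
with_text = np.array([1, 'a'])
print(with_text.dtype.kind)  # U (unicode string)
```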
Create a 2D NumPy Array
How are rows and columns indexed in a 2D NumPy Array?
How do you subset a 2D NumPy Array?
import numpy as np
np_2d = np.array([[1, 2, 3, 4],
                  [5, 6, 7, 8]])
Both rows and columns are indexed starting at 0 from the top left corner.
np_2d[0,2] or np_2d[0][2]: the first number is the row and the second the column
np_heights = np.array([191, 184, 185])
np_positions = np.array(['GK', 'M', 'A'])
gk_heights = np_heights[np_positions == 'GK']
What will the gk_heights array contain?
It will contain array([191]).
np_positions == 'GK' creates a boolean array. Indexing np_heights with it returns only the heights at the positions where the boolean array is True.
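The two steps of boolean indexing can be split apart to see the intermediate array. This sketch assumes numpy is installed; the name mask is introduced here for illustration.

```python
import numpy as np

np_heights = np.array([191, 184, 185])
np_positions = np.array(['GK', 'M', 'A'])

mask = np_positions == 'GK'    # element-wise comparison gives a boolean array
gk_heights = np_heights[mask]  # keeps only the entries where mask is True

print(mask.tolist())        # [True, False, False]
print(gk_heights.tolist())  # [191]
```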
How do you make an x, y line plot?
Use matplotlib
import matplotlib.pyplot as plt
year = [1950, 1970, 1990, 2010]
pop = [2.5, 3.7, 5.3, 7.0]
plt.plot(year, pop)
plt.show()
Note that plt.show() is needed to display the chart
How do you create a scatter plot?
import matplotlib.pyplot as plt
year = [1950, 1970, 1990, 2010]
pop = [2.5, 3.7, 5.3, 7.0]
plt.scatter(year, pop)
plt.show()
Note that plt.show() is needed to display the chart
How do you make a histogram?
import matplotlib.pyplot as plt
values = [1, 2, 3, 4]
plt.hist(values, bins=2)
plt.show()
Note that bins is the number of bars you want the data to be divided into. matplotlib automatically calculates appropriate bin boundaries for your data.
How do you “clean up” a plot to start afresh?
plt.clf()
How do you add names to a plot’s axes when using matplotlib.pyplot?
How do you add a title?
plt.xlabel('label1')
plt.ylabel('label2')
plt.title('title')
How do you specify the numbers or ticks to display on an axis?
How do you change the name of the ticks on an axis?
plt.yticks([0, 2, 4, 6, 8, 10])
plt.yticks([0, 2, 4, 6, 8, 10], ['0', '2B', '4B', '6B', '8B', '10B'])
Note that each name has to correspond to the tick at the same position in the first list
How do you change an axis to logarithmic scale?
plt.scatter(x, y)
plt.xscale('log')
(use plt.yscale('log') for the y axis)
How do you change the size of the dots on a scatter plot to reflect a third variable?
plt.scatter(x, y, s=z)
How do you add gridlines?
plt.grid(True)
How do you create a dictionary?
Two facts about keys
dictionaryname = {key1: value1, key2: value2, … }
When you type in dictionaryname[key2], you get value2
- Keys in a dictionary must be unique. If you state the same key twice, the dictionary will just retain the last value stated
- Keys have to be “immutable” objects, i.e. objects that cannot be changed after they’re created. Strings, booleans, integers, and floats are immutable.
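Both facts about keys can be checked with a quick sketch. The dictionary names and values here are made up for illustration.

```python
# Same key stated twice: only the last value is kept
population = {'albania': 2.77, 'albania': 2.81}
print(population)  # {'albania': 2.81}

# Immutable objects (str, int, float, bool, tuple) are valid keys
lookup = {'text': 1, 7: 2, 3.5: 3, True: 4, (1, 2): 5}

# A list is mutable, so using it as a key raises TypeError
try:
    bad = {['a', 'list']: 1}
    list_key_allowed = True
except TypeError:
    list_key_allowed = False

print(list_key_allowed)  # False
```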
How do you access the “keys” in a dictionary?
dictionaryname.keys()
Note that it takes no arguments
This will list all the keys in the dictionary
How do you add a key:value pair to a dictionary?
How do you change a key:value pair in a dictionary?
How do you delete a key:value pair?
dictionaryname[keytobeadded] = valuetobeadded
dictionaryname[key] = newvalueforkey
del dictionaryname[key]
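A small sketch of adding, changing and deleting entries, ending with a look at the keys. The dictionary contents are illustrative.

```python
world = {'afghanistan': 30.55, 'albania': 2.77}

world['algeria'] = 39.21   # new key: the pair is added
world['albania'] = 2.81    # existing key: the value is replaced
del world['afghanistan']   # the key and its value are removed

print(world)               # {'albania': 2.81, 'algeria': 39.21}
print(list(world.keys()))  # ['albania', 'algeria']
```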
How do you get a value that is in a nested dictionary?
countries = {'spain': {'capital': 'madrid', 'population': 46.77},
             'france': {'capital': 'paris', 'population': 66.04}}
Extract the population of France:
countries['france']['population']
How do you create a Dataframe from a Dictionary?
How do you change the default row index numbers?
Make a dictionary where the keys will be the column labels (variables) of the DataFrame, and the values are in list form:
data = {'country': ['Brazil', 'Russia'], 'capital': ['Brasilia', 'Moscow']}
import pandas as pd
dataframename = pd.DataFrame(data)
For index change (one label per row):
dataframename.index = ['row1', 'row2']
How do you create a DataFrame from a CSV file?
import pandas as pd
dataframename = pd.read_csv("filepath.csv", index_col=0)
Note that index_col is optional; use it if the first column in the file contains the row labels
How do you output values of an entire column of a DataFrame?
What type of values do you get?
How do you keep the data in a DataFrame?
How do you output values of two columns?
dataframename['columnname']
pandas Series - it’s like a 1D labelled array
Use double brackets: dataframename[['columnname']]
dataframename[['column1', 'column2']]
How do you output certain rows from a DataFrame?
dataframename[1:4]
*remember to use the integer position, NOT the row label
*remember rows are indexed starting at zero, and the end of the slice is not included
How do you output a row from a DataFrame using loc (label-based)?
How do you select multiple rows?
dataframename.loc[['rowname']]
dataframename.loc[['row1', 'row2']]
*double brackets to keep type as DataFrame
How do you output specific rows and specific columns from a DataFrame?
How do you select all rows but only a couple of columns?
dataframename.loc[['row1', 'row2'], ['column1', 'column2']]
dataframename.loc[:, ['column1', 'column2']]
How do you output a row with iloc?
How do you select specific rows and specific columns with iloc?
dataframename.iloc[[indexposition]]
dataframename.iloc[[rowindex1, rowindex2], [columnindex1, columnindex2]]
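A sketch that puts the column, loc and iloc selections side by side, assuming pandas is installed. The DataFrame reuses the Brazil/Russia data from the earlier card; the row labels BR and RU and the variable names are chosen for illustration.

```python
import pandas as pd

data = {'country': ['Brazil', 'Russia'], 'capital': ['Brasilia', 'Moscow']}
countries = pd.DataFrame(data)
countries.index = ['BR', 'RU']

series = countries['capital']    # single brackets: a pandas Series
frame = countries[['capital']]   # double brackets: still a DataFrame

by_label = countries.loc[['RU'], ['capital']]  # label-based selection
by_position = countries.iloc[[1], [1]]         # position-based: row 1, column 1

print(type(series).__name__)   # Series
print(type(frame).__name__)    # DataFrame
print(by_label.iloc[0, 0])     # Moscow
```

Here loc and iloc pick out the same cell, once by name and once by integer position.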
String operations
my_string = 'thisStringisAwesome'
What will the following yield?
my_string * 2
my_string + 'Innit'
'm' in my_string
my_string * 2 = 'thisStringisAwesomethisStringisAwesome'
my_string + 'Innit' = 'thisStringisAwesomeInnit'
'm' in my_string = True
How do you subset lists of lists?
my_list = [[4,5,6,7], [3,4,5,6]]
Select 7
To select 7: my_list[0][3]
List Operations
my_list = ['my', 'list', 'is', 'nice']
my_list2 = [[4,5,6,7], [3,4,5,6]]
What do the following yield?
my_list + my_list
my_list * 2
my_list2 > 4
my_list + my_list = ['my', 'list', 'is', 'nice', 'my', 'list', 'is', 'nice']
my_list * 2 = ['my', 'list', 'is', 'nice', 'my', 'list', 'is', 'nice']
my_list2 > 4 raises a TypeError in Python 3, because a list cannot be compared with an integer (Python 2 returned True)
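A quick way to check the list operations above under Python 3. The helper names doubled, added and comparison_ok are introduced here for illustration.

```python
my_list = ['my', 'list', 'is', 'nice']
my_list2 = [[4, 5, 6, 7], [3, 4, 5, 6]]

added = my_list + my_list   # concatenation
doubled = my_list * 2       # repetition gives the same result here
print(added == doubled)     # True

# Comparing a list with an int is not supported in Python 3
try:
    my_list2 > 4
    comparison_ok = True
except TypeError:
    comparison_ok = False

print(comparison_ok)        # False
```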
List Methods
How do you…
Get the index of an item in a list?
Count an item?
Reverse the list?
Sort the list?
my_list.index(item)
my_list.count(item)
my_list.reverse()
my_list.sort()
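List Methods
How do you…
Append an item at a time?
Append the items of another list?
3 ways to remove an item?
Insert an item?
my_list.append(item)
my_list.extend(other_list)
my_list.remove(item)
my_list.pop(index)
del my_list[index]
my_list.insert(index*, item)
*index of the item before which to insert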
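The list methods can be traced step by step. This is a sketch with made-up items; the comments show the list after each call.

```python
my_list = ['my', 'list', 'is', 'nice']

position = my_list.index('is')  # 2
my_list.append('!')             # ['my', 'list', 'is', 'nice', '!']
my_list.extend(['a', 'b'])      # ['my', 'list', 'is', 'nice', '!', 'a', 'b']
my_list.remove('!')             # removes the first '!' by value
last = my_list.pop(-1)          # removes and returns the last item, 'b'
del my_list[0:1]                # removes the slice holding 'my'
my_list.insert(0, 'our')        # inserts before index 0

print(my_list)                  # ['our', 'list', 'is', 'nice', 'a']
```

Note that these methods change the list in place; only index and pop return something useful.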
String Methods
How do you…
Change string to uppercase?
Change string to lowercase?
Count string elements?
my_string.upper()
my_string.lower()
my_string.count('w')
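A short sketch of the string methods, reusing my_string from the string operations card. The result names are illustrative.

```python
my_string = 'thisStringisAwesome'

upper = my_string.upper()        # new string, all uppercase
lower = my_string.lower()        # new string, all lowercase
count_s = my_string.count('s')   # counts lowercase 's' only (case-sensitive)

print(upper)    # THISSTRINGISAWESOME
print(lower)    # thisstringisawesome
print(count_s)  # 3
print(my_string)  # thisStringisAwesome (the original is unchanged)
```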
How do you create a numpy array?
Use my_list = [1, 2, 3, 4]
import numpy
my_array = numpy.array(my_list)
Numpy array operations
my_array = numpy.array([1, 2, 3, 4])
What is the output of…
my_array > 3
my_array * 2
my_array + numpy.array([5, 6, 7, 8])
my_array > 3 : array([False, False, False, True])
my_array * 2 : array([2, 4, 6, 8])
my_array + numpy.array([5, 6, 7, 8]) : array([6, 8, 10, 12])
Numpy Array Functions
How do you…
Get the dimensions of an array?
Get the mean of the array?
Get the median of the array?
Get the correlation coefficient of the array (with another array)?
Get the standard deviation of the array?
my_array.shape
numpy.mean(my_array)
numpy.median(my_array)
numpy.corrcoef(my_array, other_array)
numpy.std(my_array)
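A sketch of the summary functions on two small made-up arrays, assuming numpy is installed. The names heights and weights are illustrative.

```python
import numpy as np

heights = np.array([1.0, 2.0, 3.0, 4.0])
weights = np.array([2.0, 4.0, 6.0, 8.0])   # exactly twice the heights

shape = heights.shape                   # (4,)
mean = np.mean(heights)                 # 2.5
median = np.median(heights)             # 2.5
std = np.std(heights)                   # about 1.118
corr = np.corrcoef(heights, weights)    # 2x2 correlation matrix
r = corr[0, 1]                          # about 1.0: perfectly correlated
```

numpy.corrcoef returns a matrix, so the coefficient between the two arrays is the off-diagonal entry.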
Numpy Array Functions
How do you…
Append items to an array?
Insert items in an array?
Delete subarray in an array?
numpy.append(my_array, other_array)
numpy.insert(my_array, index*, valuestoinsert)
*index before which value is inserted
numpy.delete(my_array, indexofarraytodelete)
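A sketch of the three functions, assuming numpy is installed. Each one returns a new array and leaves my_array unchanged; the values are illustrative.

```python
import numpy as np

my_array = np.array([1, 2, 3, 4])
other_array = np.array([5, 6])

appended = np.append(my_array, other_array)  # values added at the end
inserted = np.insert(my_array, 1, 99)        # 99 placed before index 1
deleted = np.delete(my_array, [1])           # element at index 1 removed

print(appended.tolist())  # [1, 2, 3, 4, 5, 6]
print(inserted.tolist())  # [1, 99, 2, 3, 4]
print(deleted.tolist())   # [1, 3, 4]
print(my_array.tolist())  # [1, 2, 3, 4]
```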
What happens when you do
'carl' < 'chris'
'carl' comes before 'chris' alphabetically, so you get True
Less than or equal to: <=
Greater than or equal to: >=
Equal to: ==
Not equal to: !=
*Note =< is not correct syntax
What are the numpy array equivalents of the boolean operators and, or, and not?
numpy.logical_and()
numpy.logical_or()
numpy.logical_not()
e.g. numpy.logical_and(arrayname > 21, arrayname < 22)
*note that these also work on pandas Series and DataFrames
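A sketch of combining two conditions element-wise and using the result to filter, assuming numpy is installed. The array name bmi and its values are made up.

```python
import numpy as np

bmi = np.array([20.5, 21.5, 22.5, 21.9])

between = np.logical_and(bmi > 21, bmi < 22)  # True where both conditions hold
outside = np.logical_not(between)             # element-wise negation
selected = bmi[between]                       # boolean indexing with the result

print(between.tolist())   # [False, True, False, True]
print(selected.tolist())  # [21.5, 21.9]
```

The plain keywords and, or and not do not work element-wise on arrays, which is why these functions exist.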
if statement
if condition:
    expression
*The indentation of the expression lets Python know that the expression is part of the if statement. To exit the if statement, simply write code without indentation.
if else statement
if condition:
    expression
else:
    expression
elif statement
What happens with the elif statement if the condition of the if statement is true?
if condition:
    expression
elif condition:
    expression
If the condition for if is true, the output is the if’s expression. elif is never reached
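A runnable sketch of the if / elif / else chain. The function name describe and the divisibility checks are made up for illustration.

```python
def describe(z):
    """Return a label for z; only the first matching branch runs."""
    if z % 2 == 0:
        return 'divisible by 2'
    elif z % 3 == 0:
        return 'divisible by 3'
    else:
        return 'neither'

print(describe(6))  # divisible by 2 (6 is also divisible by 3, but elif is never reached)
print(describe(9))  # divisible by 3
print(describe(7))  # neither
```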
while loop statement
while condition:
    expression
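A small runnable sketch of a while loop. The variable names and numbers are made up; the loop repeats for as long as the condition stays true.

```python
error = 50.0
steps = 0

while error > 1:
    error = error / 4   # the body must eventually make the condition false
    steps += 1

print(steps)  # 3
print(error)  # 0.78125
```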
for loop statement
for var in seq:
    expression
*var is a variable you create
*seq is a sequence of values, e.g. a list
enumerate function
Returns an enumerate object that produces a sequence of tuples, and each of the tuples is an index-value pair
fam = [1, 2, 3, 4]
for index, height in enumerate(fam):
    print('index ' + str(index) + ': ' + str(height))
index 0: 1
index 1: 2
index 2: 3
index 3: 4
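To see the index-value tuples that enumerate produces, the enumerate object can be turned into a list. This sketch collects the lines instead of printing them; the names pairs and lines are illustrative.

```python
fam = [1, 2, 3, 4]

pairs = list(enumerate(fam))   # the tuples the loop unpacks
print(pairs)                   # [(0, 1), (1, 2), (2, 3), (3, 4)]

lines = []
for index, height in enumerate(fam):
    lines.append('index ' + str(index) + ': ' + str(height))

print(lines[0])                # index 0: 1
```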
Loop over string
What is the output of the following?
for c in "family":
    print(c.capitalize())
F
A
M
I
L
Y
Using the following dictionary, print out each key and value using the for loop.
world = {'afghanistan': 30.55, 'albania': 2.77, 'algeria': 39.21}
for key, value in world.items():
    print(key + str(value))
*Note that you need .items() to get both the key and the value in the for loop; looping over the dictionary itself yields only the keys
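A sketch contrasting a loop over the dictionary itself with a loop over .items(). The results are collected in lists so they can be inspected; the list names and the separator are illustrative.

```python
world = {'afghanistan': 30.55, 'albania': 2.77, 'algeria': 39.21}

# Looping over the dictionary directly gives the keys only
keys_only = [key for key in world]

# .items() gives (key, value) pairs that can be unpacked in the loop
pairs = [key + ' -- ' + str(value) for key, value in world.items()]

print(keys_only)  # ['afghanistan', 'albania', 'algeria']
print(pairs[0])   # afghanistan -- 30.55
```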
How do you print out every element of a 2D Numpy Array (using for loop)?
np_height = np.array([1, 2, 3, 4])
np_weight = np.array([1, 2, 3, 4])
meas = np.array([np_height, np_weight])
for val in np.nditer(meas):
    print(val)
*This will print out all the heights, then all the weights
For loop with Pandas Dataframe
How do you print out every single element of a DataFrame?
Use dataset “countries”
for rowlabel, rowvalues in countries.iterrows():
    print(rowlabel)
    print(rowvalues)
This will print the row label followed by all the column names with the row’s values, e.g.
country Brazil
capital Brasilia
area 8.516
population 200.4
For loops with Pandas DataFrame
How do you print out row labels with only the capital cities for the dataset countries?
for rowlabel, rowvalues in countries.iterrows():
    print(rowlabel + ": " + rowvalues["capital"])
This will print:
BR: Brasilia
RU: Moscow
What function gives you the length of a string?
len()
Using the apply function, how do you create a new column with the countries’ name length in the DataFrame countries?
countries[“name_length”] = countries[“country”].apply(len)
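A sketch of apply(len) on a tiny DataFrame, assuming pandas is installed. The two rows are made up so that the name lengths differ.

```python
import pandas as pd

countries = pd.DataFrame(
    {'country': ['Brazil', 'India'], 'capital': ['Brasilia', 'New Delhi']},
    index=['BR', 'IN'],
)

# apply calls len once per value in the column and returns a new Series
countries['name_length'] = countries['country'].apply(len)

print(countries['name_length'].tolist())  # [6, 5]
```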
Which numpy function generates random numbers? How do you manually choose a seed?
numpy.random.rand()
numpy.random.seed(seednumber)
numpy.random.rand() generates pseudorandom numbers between 0 and 1, starting from a seed
*seed can be any non-negative integer. If you don’t specify one, numpy will automatically pick a seed. You can set the seed so that results are reproducible between simulations (the same seed will generate the same random numbers)
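A sketch showing that resetting the same seed reproduces the same number, assuming numpy is installed. The seed value 123 is arbitrary.

```python
import numpy as np

np.random.seed(123)
first = np.random.rand()    # a float in the range [0, 1)

np.random.seed(123)         # same seed again
second = np.random.rand()   # same number as before

print(first == second)      # True
```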
Simulate a coin toss
coin = numpy.random.randint(lowest, high)
*lowest is the lowest number and high is one above the highest number
coin = numpy.random.randint(0, 2)
if coin == 0: