Data Science Flashcards

1
Q

dictionaries

A

A built-in Python data type, like list, defined with { }.
A dictionary maps keys to values: each key is unique and maps to exactly one value. Dictionaries contain key-value pairs and are indexed by keys.

countries = {"afghanistan": 30.55, "albania": 2.77, "algeria": 40.33}

dict_name[key]
result: value
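
A minimal sketch of the lookup pattern above, reusing the countries dictionary from this card. The extra key "andorra", its value, and the missing key "angola" are made up for illustration.

```python
# Values keyed by country name, as on the card.
countries = {"afghanistan": 30.55, "albania": 2.77, "algeria": 40.33}

# Look up a value by its key.
albania_value = countries["albania"]

# Add a new key-value pair (hypothetical value, for illustration only).
countries["andorra"] = 0.08

# Membership tests check keys, not values.
has_algeria = "algeria" in countries

# .get() returns a default instead of raising KeyError for a missing key.
missing = countries.get("angola", None)

print(albania_value, has_algeria, missing)
```

Indexing with a key that is not present raises KeyError, which is why .get() is often used when a key may be absent.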

2
Q

pandas

A

A Python library for tabular data; for example, it imports data from a CSV file into Python as a table (a DataFrame).

3
Q

loc in pandas

A

uses labels for slicing/filtering

print(cars.loc[['AUS', 'EG']])

4
Q

iloc in pandas

A

uses integer positions (0-based) for slicing/filtering

print(cars.iloc[2])
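
To contrast label-based loc with position-based iloc, here is a small sketch. The cars DataFrame below is a hypothetical stand-in with made-up values; the cards do not show the real data.

```python
import pandas as pd

# Hypothetical stand-in for the cars DataFrame used on these cards.
cars = pd.DataFrame(
    {"country": ["United States", "Australia", "Egypt"],
     "cars_per_cap": [809, 731, 45]},
    index=["US", "AUS", "EG"],
)

# loc selects rows by label.
by_label = cars.loc[["AUS", "EG"]]

# iloc selects rows by integer position (0-based), here the third row.
by_position = cars.iloc[2]

print(by_label)
print(by_position)
```

With these row labels, cars.iloc[2] and cars.loc["EG"] pick the same row; they differ only in how the row is addressed.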

5
Q

if statement

A

if condition:
    expression

6
Q

Boolean ops for NumPy arrays

A

use np.logical_and, np.logical_or, np.logical_not
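
A short sketch of element-wise Boolean operations on an array. The sample values and thresholds are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical sample data.
values = np.array([1, 5, 8, 12, 20])

# Element-wise AND: True where 4 < value < 15.
in_range = np.logical_and(values > 4, values < 15)

# Element-wise OR: True where value < 4 or value > 15.
outside = np.logical_or(values < 4, values > 15)

# A Boolean array can be used to subset the original array.
selected = values[in_range]

print(in_range, outside, selected)
```

The plain Python keywords and / or do not work element-wise on arrays, which is why the NumPy functions are used here.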

7
Q

while loop

A

The while loop is like a repeated if statement. The code is executed over and over again, as long as the condition is True.
while condition:
    expression
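
As an illustration of the pattern above, this sketch halves a number until the condition becomes False. The starting value is arbitrary.

```python
# Repeatedly halve a number until it drops below 1.
value = 50.0
steps = 0
while value >= 1:
    value = value / 2
    steps += 1

print(steps, value)
```

If the body never changes anything the condition depends on, the loop never ends, so each pass should move the condition toward False.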

8
Q

What is the difference between list and tuples?

A

Lists are mutable, i.e., they can be edited. Syntax: list_1 = [10, 'Chelsea', 20]

Tuples are immutable (tuples are like lists that can't be edited). Syntax: tup_1 = (10, 'Chelsea', 20)
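
A quick sketch of the difference, using the two objects from this card. The replacement value 99 is arbitrary.

```python
list_1 = [10, "Chelsea", 20]
tup_1 = (10, "Chelsea", 20)

# Lists support item assignment.
list_1[0] = 99

# Tuples do not: item assignment raises TypeError.
try:
    tup_1[0] = 99
    tuple_changed = True
except TypeError:
    tuple_changed = False

print(list_1, tup_1, tuple_changed)
```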

9
Q

Explain Inheritance in Python with an example

A

Inheritance allows one class to gain all the members (attributes and methods) of another class. It provides code reusability and makes an application easier to create and maintain. The class we inherit from is called the super-class (parent), and the class that inherits from it is called the derived (child) class.
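
The card asks for an example, so here is one possible sketch. The class names Animal and Dog and their methods are hypothetical, chosen only to show a child class reusing and overriding parent behavior.

```python
class Animal:
    """Super-class (parent)."""

    def __init__(self, name):
        self.name = name

    def describe(self):
        return f"{self.name} is an animal"

    def speak(self):
        return "..."


class Dog(Animal):
    """Derived (child) class: inherits __init__ and describe from Animal."""

    def speak(self):
        # Override the parent's method.
        return "Woof"


rex = Dog("Rex")
print(rex.describe())  # inherited from Animal
print(rex.speak())     # overridden in Dog
```

Dog defines only speak, yet instances still have name and describe because those come from Animal.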

10
Q

help() function

A

The help() function displays an object's documentation string and shows help for modules, keywords, attributes, etc.

11
Q

dir() function

A

The dir() function lists the defined symbols: with no argument, the names in the current scope; with an object as argument, that object's attributes.

12
Q

What does this mean: *args, **kwargs?

A

We use *args when we aren't sure how many arguments are going to be passed to a function, or if we want to pass a stored list or tuple of arguments to a function. **kwargs is used when we don't know how many keyword arguments will be passed to a function, or it can be used to pass the values of a dictionary as keyword arguments.
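
Both uses described above can be sketched with one small function. The name describe_call and the argument values are hypothetical.

```python
def describe_call(*args, **kwargs):
    """Return the positional and keyword arguments exactly as received."""
    return args, kwargs


# Any number of positional and keyword arguments is accepted.
result_direct = describe_call(1, 2, 3, sep="-", end="!")

# A stored list and dictionary can be unpacked into a call with * and **.
stored_args = [1, 2, 3]
stored_kwargs = {"sep": "-", "end": "!"}
result_unpacked = describe_call(*stored_args, **stored_kwargs)

print(result_direct)
print(result_unpacked == result_direct)
```

Inside the function, args arrives as a tuple and kwargs as a dictionary.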

13
Q

Write a one-liner that will count the number of capital letters in a file. Your code should work even if the file is too big to fit in memory.

A
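with open(SOME_LARGE_FILE) as fh:
    count = 0
    text = fh.read()  # reads the whole file into memory at once
    for character in text:
        if character.isupper():
            count += 1
We will now try to transform this into a single line. Iterating over fh line by line (with fh opened as above) avoids loading the whole file:
count = sum(1 for line in fh for character in line if character.isupper())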
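
A runnable sketch of the line-by-line approach. A small temporary file with made-up contents stands in for SOME_LARGE_FILE.

```python
import os
import tempfile

# Create a small temporary file to stand in for SOME_LARGE_FILE.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Hello World\nPython Is FUN\n")
    path = tmp.name

# Iterating over the file object yields one line at a time,
# so the whole file is never held in memory at once.
with open(path) as fh:
    count = sum(1 for line in fh for character in line if character.isupper())

os.remove(path)
print(count)
```

Note that a file consisting of one extremely long line would still be read as a single line.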
14
Q

How can you randomize the items of a list in place in Python?

A

from random import shuffle
x = ['Keep', 'The', 'Blue', 'Flag', 'Flying', 'High']
shuffle(x)
print(x)
# Example output (the order varies from run to run):
# ['Flying', 'Keep', 'Blue', 'High', 'The', 'Flag']

15
Q

Write a sorting algorithm for a numerical dataset in Python.

A

numbers = ["1", "4", "0", "6", "9"]
numbers = [int(i) for i in numbers]
numbers.sort()
print(numbers)

NOTE: if the list already held integers, you could just call numbers.sort(),
but here we first had to convert the strings to integers.

16
Q

How can you generate random numbers in Python?

A

import random
random.random(): returns a random float in the range [0.0, 1.0).
random.randrange(a, b): returns an integer chosen at random from the range [a, b), that is, a is included and b is excluded. It doesn't build a range object.
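
A brief sketch of these calls. The seed value and the ranges are arbitrary assumptions; random.randint is added only for contrast, since it includes both endpoints.

```python
import random

random.seed(42)  # Seed chosen arbitrarily so the run is repeatable.

# random.random() returns a float in [0.0, 1.0).
f = random.random()

# random.randrange(a, b) returns an integer in [a, b): a included, b excluded.
n = random.randrange(1, 7)

# random.randint(a, b) includes both endpoints.
m = random.randint(1, 6)

print(f, n, m)
```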

17
Q

What is pickling and unpickling?

A

The pickle module accepts a Python object (most, though not all, objects can be pickled), converts it into a byte-stream representation, and writes it to a file with the dump function; this process is called pickling. Retrieving the original Python object from the stored representation is called unpickling.
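
A minimal round-trip sketch. The data dictionary is a made-up object and the file is a temporary one created just for the example.

```python
import os
import pickle
import tempfile

data = {"name": "example", "scores": [1, 2, 3]}  # Hypothetical object.

# Reserve a temporary file path to hold the pickled bytes.
fd, path = tempfile.mkstemp(suffix=".pkl")
os.close(fd)

# Pickling: serialize the object to the file (binary mode).
with open(path, "wb") as fh:
    pickle.dump(data, fh)

# Unpickling: rebuild an equal object from the stored bytes.
with open(path, "rb") as fh:
    restored = pickle.load(fh)

os.remove(path)
print(restored == data, restored is data)
```

The restored object is equal to the original but is a new object. Only unpickle data from a trusted source, because loading a pickle can execute arbitrary code.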

18
Q

How do you calculate percentiles with Python/ NumPy?

A

import numpy as np
a = np.array([1,2,3,4,5])
p = np.percentile(a, 50)  # Returns the 50th percentile, i.e. the median
print(p)

19
Q

Suppose list1 is [2, 33, 222, 14, 25]. What is list1[-1]?

a) Error
b) None
c) 25
d) 2

A

c) 25

20
Q

how to parse strings into lists

A

l1 = s1.split(" ")

print(l1)

21
Q

define functions and reuse them

A
def times(x,y):
    z = x*y
    return z

print(times(5,10))

22
Q

iterating through dictionaries

A

for k in my_dict1.keys():
    print(k)

for k, v in my_dict1.items():
    print(k, v)

23
Q

word = word + 1

A

word += 1

24
Q

Saving the tweets to the csv file

A
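import csv

# tweets is assumed to be a list of dicts with a 'text' key, defined earlier
with open('recent_tweets_crest 100.csv', 'w', newline='', encoding='utf-8') as csvfile:
    wf = csv.writer(csvfile, delimiter=',', quotechar='"')
    # write the rows while the file is still open
    for tweet in tweets:
        wf.writerow(['crest', tweet['text']])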
25
Q

numpy — how to use/code it

A

import numpy as np
# height and weight are lists defined beforehand

In [7]: np_height = np.array(height)
In [8]: np_height
Out[8]: array([ 1.73, 1.68, 1.71, 1.89, 1.79])
In [9]: np_weight = np.array(weight)
In [10]: np_weight
Out[10]: array([ 65.4, 59.2, 63.6, 88.4, 68.7])
In [11]: bmi = np_weight / np_height ** 2
In [12]: bmi
Out[12]: array([ 21.852, 20.975, 21.75 , 24.747, 21.441])

26
Q

np1 = np.array([1, 'is', True])

A

NumPy arrays contain only one type, so every element is converted to a string:
array(['1', 'is', 'True'],
dtype='

27
Q

You cannot add the values of two lists with +; it only combines (concatenates) them. NumPy arrays add element by element.

A
python_list = [1, 2, 3]
numpy_array = np.array([1, 2, 3])

In [22]: python_list + python_list
Out[22]: [1, 2, 3, 1, 2, 3]
In [23]: numpy_array + numpy_array
Out[23]: array([2, 4, 6])

28
Q

subsetting NumPy: np2 is an array of 1, 2, 3, 4, 5. Find the values > 3

A

np2 > 3
array([False, False, False, True, True], dtype=bool)

np2[np2>3]
array([4, 5])

29
Q

subsetting 2d numpy

A

np2d[row][column] or, equivalently, np2d[row, column]

30
Q

for numpy, [:] means?

A

everything in row/column

31
Q

for numpy, [1:3] means?

A

includes only rows/columns 1 and 2 (the end index 3 is excluded)

32
Q

numpy 2d structure

A

np2d = np.array([list1, list2])   # each inner list becomes one row
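
The 2D cards above can be tied together in one sketch. The contents of list1 and list2 are hypothetical.

```python
import numpy as np

# Two hypothetical lists of equal length; each becomes one row.
list1 = [1, 2, 3, 4]
list2 = [5, 6, 7, 8]
np2d = np.array([list1, list2])

shape = np2d.shape        # (rows, columns)
first_row = np2d[0]       # whole first row
single = np2d[1][2]       # row 1, column 2; same as np2d[1, 2]
cols_1_2 = np2d[:, 1:3]   # all rows, columns 1 and 2 (3 is excluded)

print(shape, first_row, single)
print(cols_1_2)
```

The comma form np2d[:, 1:3] slices rows and columns in one step, which chained brackets cannot do for columns.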

33
Q

arrays and lists count indexes as

A

0,1,2,3,4 etc

34
Q

how to change values in a list

A

my_list = [1, 2, 3, 4, 5]

my_list[2] = '33'

my_list[:2] = [0, 0]

35
Q

how to add to a list

A

my_list.append(x) adds one item in place; my_list + [1, 2, 3] returns a new, longer list

36
Q

how to delete a list or values in list?

A

assign an empty list to a slice
my_list[:] = []    # removes every item
my_list[:2] = []   # removes the first two items

37
Q

opening and writing a file

A
f = open('file path', 'w')   # e.g. open('c:/test/a.txt', 'w')
f.write("hey what's up")
f.close()
always close files
38
Q

reading file

A

f = open('file path', 'r')   # e.g. open('c:/test/a.txt', 'r')

f.read(n)   # n = how many characters to read (bytes in binary mode); f.read() reads everything
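
A sketch of writing and then reading a file with a with block, which closes the file automatically. The temporary path is an assumption used so the example cleans up after itself.

```python
import os
import tempfile

# Hypothetical temporary path for the example file.
path = os.path.join(tempfile.mkdtemp(), "a.txt")

# 'w' mode creates or overwrites the file; the with block closes it.
with open(path, "w") as f:
    f.write("hey what's up")

with open(path, "r") as f:
    first_three = f.read(3)   # read the first 3 characters
    rest = f.read()           # read everything that remains

os.remove(path)
os.rmdir(os.path.dirname(path))
print(first_three, "|", rest)
```

Each read continues from where the previous one stopped, so the second call returns only the remainder.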

39
Q

import pandas as pd

pd.DataFrame

A

cars = pd.DataFrame(my_dict)

40
Q

how to put row labels on pandas data frame

A

cars = pd.DataFrame(my_dict)

# Definition of row_labels
row_labels = ['US', 'AUS', 'JAP', 'IN', 'RU', 'MOR', 'EG']
# Specify row labels of cars
cars.index = row_labels
41
Q

how to import csv into pandas

A
cars = pd.read_csv("cars.csv")
cars = pd.read_csv('cars.csv', index_col = 0)
42
Q

slicing pandas

A
# Print out country column as Pandas Series
print(cars['country'])
# Print out country column as Pandas DataFrame
print(cars[['country']])
# Print out DataFrame with country and drives_right columns
print(cars[['country', 'drives_right']])
43
Q

how to filter pandas

A
# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
# Create car_maniac: observations that have a cars_per_cap over 500
cpc = cars["cars_per_cap"]
many_cars = cpc > 500
car_maniac = cars[many_cars]
# One-liner:
car_maniac = cars[cars["cars_per_cap"] > 500]
# ANOTHER EXAMPLE
# Create medium: observations with cars_per_cap between 100 and 500
import numpy as np
cpc = cars['cars_per_cap']
between = np.logical_and(cpc > 100, cpc < 500)
medium = cars[between]
44
Q

for loop with 2 variables, use….

A
enumerate 
# areas list
areas = [11.25, 18.0, 20.0, 10.75, 9.50]
# Change for loop to use enumerate()
for x,y in enumerate(areas) :
    print("room " + str(x) + ": " + str(y))
45
Q

how to iterate through numpy arrays

A

np.nditer()

for val in np.nditer(my_array):
    print(val)

46
Q

how to iterate through pandas dataframes

A

for lab, row in cars.iterrows():
    print(lab)

for lab, row in cars.iterrows():
    print(lab + ": " + str(row['cars_per_cap']))

47
Q

how to add a column in dataframe

A
# Code for loop that adds COUNTRY column
for lab, row in cars.iterrows():
    cars.loc[lab, "COUNTRY"] = row["country"].upper()
48
Q

apply function on dataframes

A

cars["COUNTRY"] = cars["country"].apply(str.upper)

49
Q

to get median, avg, low, high of a dataframe

A

df.describe()