Python for Data Science Flashcards

Question

What are methods?

Answer 1

You can think of methods as _functions_ that "belong to" Python objects. A string object has methods such as capitalize() and replace(), and objects of type float and list also have their own methods, depending on the type.

Answer 2

method associated.

Answer 3

Depending on the type of the object, methods behave differently. For example, index() exists for both strings and lists.
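
A minimal sketch of this idea, using made-up values (the names `name` and `scores` are illustrative and not taken from the cards). The same method name, index(), is available on both a string and a list, but each type applies it to its own kind of content.

```python
# Hypothetical example values; not taken from the flashcards.
name = "liz"
scores = [4, 9, 5, 7]

# Methods are called with dot notation on the object.
capitalized = name.capitalize()      # str method: returns a new string
replaced = name.replace("z", "sa")   # str method: returns a new string

# index() exists on both types but searches different kinds of content.
pos_in_string = name.index("z")      # position of a character in the string
pos_in_list = scores.index(5)        # position of an element in the list

print(capitalized, replaced, pos_in_string, pos_in_list)
```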

Answer 4

the objects they are called on.

Answer 5

append() is a method, and therefore also a function. In Python, practically everything is an object. Every Python object can have functions associated with it. These functions are also called methods.

Answer 6

x.capitalize(). Use dot notation to call a method on an object, x in this case. Make sure to include the parentheses at the end, even if you don't pass any additional arguments.

Answer 7

[4, 9, 5, 7, 6]. If you call append() on a list, you're actually adding the element to the list you called append() on; there's no need for an explicit assignment (with the = sign) in this case.
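
The in-place behavior can be sketched as follows. The starting list is an assumption, chosen only so that the result matches the answer above; the original question's list is not shown on this card.

```python
# Hypothetical starting list, chosen so the result matches the answer above.
x = [4, 9, 5, 7]

# append() modifies the list in place and returns None,
# so no assignment with = is needed.
result = x.append(6)

print(x)
print(result)
```

Because append() returns None, writing `x = x.append(6)` would replace the list with None rather than keep the updated list.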

Answer 8

You can think of a package as a directory of Python scripts. Each such script is a so-called module. These modules specify functions, methods and new Python types aimed at solving particular problems.

Answer 9

To use Python packages, you'll first have to install them on your system, and then put code in your script to tell Python that you want to use these packages.

Answer 10

For data science, there's:

* numpy, to efficiently work with arrays
* matplotlib, for data visualization
* scikit-learn, for machine learning

Answer 11

pip. pip is a very commonly used tool to install and maintain Python packages.

Answer 12

The "import" statement is arguably the easiest way to import packages and modules into Python.

Answer 13

foo.array([1, 2, 3]). If Numpy is imported as np, you need np.array().

Answer 14

* The from numpy import array version makes it less clear in the code that you're using Numpy's array() function.
* Using import numpy requires you to write numpy.array(), making it clear that you're using a Numpy function.

Importing a particular function makes your code shorter, because you don't need to include the numpy. prefix. However, it becomes less clear that array() is a function from the numpy package.
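
The import styles discussed here can be compared side by side. This is a small sketch that assumes Numpy is installed; the variable names are illustrative.

```python
import numpy                  # full name: calls look like numpy.array()
import numpy as np            # alias: calls look like np.array()
from numpy import array       # direct import: calls look like array()

a = numpy.array([1, 2, 3])
b = np.array([1, 2, 3])
c = array([1, 2, 3])

# All three names refer to the same function object.
same_function = numpy.array is np.array is array
print(same_function)
```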

Answer 15

A Numpy array is pretty similar to a regular Python list, but has one additional feature: you can perform calculations over entire arrays. It's really easy, and super-fast as well.

Answer 16

No. A Numpy array can only contain values of a single type. It's either an array of floats, or an array of booleans, and so on.

Answer 17

To create a Numpy array, you use the array() function. You typically pass a regular Python list as an input.

Answer 18

* The Numpy package provides the array, a data type that can be used to do element-wise calculations.
* Because Numpy arrays can only hold elements of a single type, calculations on Numpy arrays can be carried out much faster than on regular Python lists.

Creating a Numpy array is not necessarily easier, but it is a great solution if you want to carry out element-wise calculations, something that regular Python lists aren't capable of.
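
A minimal sketch of the difference, assuming Numpy is installed. The measurements and variable names are made up for illustration: dividing one list by another raises a TypeError, while the same calculation on arrays is applied element by element.

```python
import numpy as np

# Hypothetical measurements; the names and values are illustrative only.
heights_m = [1.73, 1.68, 1.71]
weights_kg = [65.4, 59.2, 63.6]

# Regular lists do not support element-wise arithmetic.
try:
    weights_kg / heights_m
    list_division_works = True
except TypeError:
    list_division_works = False

# Converting the lists to arrays makes the calculation element-wise.
np_heights = np.array(heights_m)
np_weights = np.array(weights_kg)
bmi = np_weights / np_heights ** 2

print(list_division_works)
print(bmi)
```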

Answer 19

array([4, 4, 4]). In Numpy, calculations are performed element-wise. The first element of x and the first element of y are added, giving 4. The same goes for the second and third elements of x and y.
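
As an illustration, here is one pair of arrays that produces this result. The values of x and y are assumptions, since the question side of the card is not shown here.

```python
import numpy as np

# Hypothetical x and y, chosen so that the sum matches the answer above.
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])

# + on arrays adds matching elements.
total = x + y

# For comparison, + on regular lists concatenates them instead.
concatenated = [1, 2, 3] + [3, 2, 1]

print(total)
print(concatenated)
```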

Answer 20

All array elements are converted to strings. Numpy arrays can only hold elements with the same basic type. The string is the most 'general' and free form to store data, so all other data types are converted to strings.
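
This conversion can be checked with a small sketch, assuming Numpy is installed. The mixed-type input list is hypothetical.

```python
import numpy as np

# Hypothetical mixed-type input: a float, a string, and a boolean.
mixed = np.array([1.0, "is", True])

# Every element is now a string, and the dtype is a Unicode string type.
all_strings = all(isinstance(item, str) for item in mixed)

print(mixed)
print(mixed.dtype)
print(all_strings)
```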

Answer 21

N dimensional array

Answer 22

`shape` is a so-called attribute of the `np2d` array, that can give you more information about what the data structure looks like.

Answer 23

improved list of lists:

Answer 24

You can create a 2D Numpy array from a regular list of lists. Multi-dimensional Numpy arrays are natural extensions of the 1D Numpy array: They can only hold a single type and can be created from a regular Python list structure. The number N in these N-dimensional Numpy arrays is not limited.

Answer 25

x[1,2]. Apart from element-wise calculations, 2D Numpy arrays also offer more advanced ways of subsetting compared to regular Python lists of lists. To select the second row, use the index 1 before the comma. To select the third column, use the index 2 after the comma.
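
A short sketch of 2D creation, the shape attribute, and subsetting, assuming Numpy is installed. The array contents are made up for illustration.

```python
import numpy as np

# Hypothetical 2D array built from a list of lists (2 rows, 3 columns).
x = np.array([[1, 2, 3],
              [4, 5, 6]])

print(x.shape)           # shape is an attribute, so no parentheses

element = x[1, 2]        # second row, third column
second_row = x[1, :]     # index 1 before the comma, all columns
third_column = x[:, 2]   # all rows, index 2 after the comma

print(element, second_row, third_column)
```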

Answer 26

array([[0, 1, 2], [0, 0, 0]])

Answer 27

http://cs231n.github.io/python-numpy-tutorial/#numpy-arrays

Answer 28

summarizing statistics

Answer 29

Numpy offers many functions to calculate basic statistics, such as np.mean(), np.median() and np.std(). Both the mean and median are interesting statistics to check out before you start your analysis. Visual inspection of your data is practically infeasible if you're dealing with millions of data points.

Answer 30

Numpy is a great alternative to the regular Python list if you want to do Data Science in Python. Numpy arrays can only hold elements of the same basic type. Next to an efficient data structure, Numpy also offers tools to calculate summary statistics and to simulate statistical distributions. No matter the dimension of the Numpy array, element-wise calculations will always be possible.

Answer 31

np.mean(x[:,1]). The :,1 inside square brackets tells Python to get all the rows and the second column. You can then use np.mean() to get the average of the resulting Numpy array.
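
A minimal sketch of this column-wise summary, assuming Numpy is installed. The data and the meaning of the columns are hypothetical.

```python
import numpy as np

# Hypothetical data: each row is one observation, columns are (height, weight).
x = np.array([[1.70, 60.0],
              [1.80, 80.0],
              [1.60, 70.0]])

second_column = x[:, 1]                    # all rows, second column
mean_weight = np.mean(second_column)
median_weight = np.median(second_column)
std_weight = np.std(second_column)

print(mean_weight, median_weight, std_weight)
```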

Answer 32

It's always a good idea to check both the median and the mean, to get a first hunch for the overall distribution of the entire dataset.

Answer 33

extract insights and share with other people

Answer 34

matplotlib.

Answer 35

A scatter plot is useful to see all the individual data points. Unlike in the line plot, these data points will not be connected by a line.

Answer 36

Visualization is a very powerful tool for exploring your data and reporting results. Data visualization is useful in different stages of the data analysis pipeline. The type of visualization that is most appropriate depends on the problem at hand.

Answer 37

import matplotlib.pyplot as plt The general syntax is import package.subpackage as local\_name.

Answer 38

The first argument corresponds to the horizontal, x-axis. The second argument is mapped onto the vertical, y-axis.

Answer 39

Change plot() in plt.plot() to scatter(). To create a scatter plot, you'll need plt.scatter().
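
The switch from a line plot to a scatter plot can be sketched like this, assuming matplotlib is installed. The data is made up, and the "Agg" backend is selected only so the sketch runs without a display; in an interactive session you would normally omit that line.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical data for illustration.
year = [1950, 1970, 1990, 2010]
pop = [2.5, 3.7, 5.3, 7.0]

# Line plot: first argument goes on the x-axis, second on the y-axis.
plt.plot(year, pop)
line_count = len(plt.gca().lines)

# Scatter plot: same arguments, only the function name changes.
plt.clf()
plt.scatter(year, pop)
scatter_count = len(plt.gca().collections)

plt.close("all")
```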

Answer 40

When you have a time scale along the horizontal axis, the line plot is your friend. But in many other cases, when you're trying to assess if there's a correlation between two variables, for example, the scatter plot is the better choice.

Answer 41

The histogram is a type of visualization that's particularly useful to explore your data set. It can help you to get an idea about the distribution

Answer 42

The histogram is a great tool for getting a first impression of the distribution of your data. A histogram is useful to display any distribution, and typically consists of non-overlapping bins. The matplotlib package contains functionality to build histograms very easily.

Answer 43

The range of your values is 20. Dividing these values into 5 equally sized bins will result in bins with width 4.
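
The arithmetic is width = range / number of bins = 20 / 5 = 4. As a rough check, np.histogram() computes equally sized bin edges without drawing anything; the values below are hypothetical and were picked only to span 0 to 20.

```python
import numpy as np

# Hypothetical values spanning a range of 20 (from 0 to 20).
values = np.array([0, 3, 6, 7, 11, 12, 15, 18, 20])

# Ask for 5 equally sized bins over the range of the data.
counts, edges = np.histogram(values, bins=5)
widths = np.diff(edges)

print(edges)
print(widths)
```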

Answer 44

Add a second argument to plt.hist(): plt.hist(x, bins = 4). If you do not specify the number of bins the data has to be divided into, matplotlib chooses a suitable number of bins for you. Setting the number of bins is as simple as specifying the bins argument appropriately.
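
A minimal sketch of setting the bins argument, assuming matplotlib is installed. The data is made up, and the "Agg" backend is used only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical data for illustration.
x = [0, 3, 6, 7, 11, 12, 15, 18, 20]

# plt.hist() returns the counts per bin, the bin edges, and the drawn bars.
counts, edges, patches = plt.hist(x, bins=4)

print(len(counts), len(edges))
plt.close("all")
```

With 4 bins there are 5 edges, because each bin needs a left and a right boundary.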

Answer 45

The number of bins is pretty important. Too few bins oversimplify reality and hide the details. Too many bins overcomplicate reality and obscure the bigger picture.

Answer 46

xlabel("x-axis title") and ylabel("y-axis title"). To set the axis titles, use the functions xlabel() and ylabel().

Answer 47

fill\_between()

Answer 48

Python doesn't throw an error, but you won't see your customizations. The show() function displays the plot you've built up until then. If the customizations come afterwards, they have no effect on the shown output. Therefore, you should place all customization commands between the plot() call and the show() call.
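
The recommended ordering can be sketched as follows, assuming matplotlib is installed. The data and label texts are illustrative, and the "Agg" backend is chosen only so the sketch runs without a display (with that backend show() does not open a window and may emit a warning).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical data for illustration.
year = [1950, 1970, 1990, 2010]
pop = [2.5, 3.7, 5.3, 7.0]

plt.plot(year, pop)

# Customizations go between the plot() call and the show() call.
plt.xlabel("Year")
plt.ylabel("Population")
plt.title("Example line plot")

# Read the labels back to confirm they were applied before showing.
x_label = plt.gca().get_xlabel()
y_label = plt.gca().get_ylabel()

plt.show()
plt.close("all")
```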

Answer 49

high. If your control structures get more advanced, Python can take many different paths through your code. As soon as Python encounters a condition that is True (x \> 6 in this case), the corresponding code is executed and the control structure is abandoned. The elif and else parts are not considered anymore!
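
A sketch of such a control structure. Only the condition x > 6 and the result "high" come from the card; the value of x, the elif condition, and the other labels are assumptions for illustration.

```python
x = 8  # hypothetical value that satisfies x > 6

if x > 6:
    label = "high"      # first True condition: this branch runs
elif x > 4:
    label = "medium"    # also True for x = 8, but never checked
else:
    label = "low"

print(label)
```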

Answer 50

You typically don't build a pandas DataFrame manually. Instead, you import data from an external file that contains all this data.

Answer 51

To access a column, you typically use square brackets with the column label.

Answer 52

You'll want to use `loc`, e.g. `bric.loc["BR"]`.

Answer 53

In Pandas, different columns can contain different types. Both Pandas and Numpy offer many different ways of subsetting. 2D Numpy arrays can only contain values of the same basic type, a downside compared to Pandas if you're working on typical Data Science problems.

Answer 54

The rows correspond to observations. The columns correspond to variables.

Answer 55

read\_csv() is the function you need. You can specify a ton of other arguments to customize the way the data is imported.
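
A minimal sketch of read\_csv(), assuming pandas is installed. To keep the example self-contained, the CSV content is a made-up string wrapped in io.StringIO rather than a real file; index_col is one of the optional arguments and tells pandas which column holds the row labels.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for an external file.
csv_text = "code,country,capital\nBR,Brazil,Brasilia\nRU,Russia,Moscow\n"

# index_col=0 uses the first column as row labels.
bric = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(bric)
```

With a real file you would pass its path as the first argument instead of the StringIO object.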

Answer 56

loc. Square brackets are used to get specific columns from a Pandas DataFrame. iloc is used if you want to select a row based on its position in the DataFrame, and not based on its row label.

Answer 57

The single bracket version gives a Pandas Series, the double bracket version gives a Pandas DataFrame.
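
The selection styles from the last few cards can be compared in one sketch, assuming pandas is installed. The DataFrame below is a hypothetical stand-in for bric, built manually only to keep the example self-contained; its columns and values are illustrative.

```python
import pandas as pd

# Hypothetical stand-in for the bric DataFrame.
bric = pd.DataFrame(
    {"country": ["Brazil", "Russia", "India", "China"],
     "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing"]},
    index=["BR", "RU", "IN", "CH"],
)

row_by_label = bric.loc["BR"]        # row selected by its label
row_by_position = bric.iloc[0]       # row selected by its position

as_series = bric["country"]          # single brackets give a Series
as_dataframe = bric[["country"]]     # double brackets give a DataFrame

print(type(as_series).__name__, type(as_dataframe).__name__)
```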


(89 cards)