Data Science Interview Qs Flashcards
- Name a function that converts a multidimensional array into a one-dimensional array. Does changing the output array affect the original array?
NumPy's flatten() converts a multidimensional array into a 1D array. Modifying the array returned by flatten() does not affect the original array, because flatten() always returns a copy. By contrast, ravel() returns a view where possible, so changes made through it can reach the original.
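A minimal sketch of the copy behaviour, assuming NumPy is imported as np. The array values and variable names are made up for illustration, and ravel() is shown only for contrast.

```python
import numpy as np

original = np.array([[1, 2], [3, 4]])

# flatten() always allocates a new array, so this write stays local.
flat_copy = original.flatten()
flat_copy[0] = 99

# ravel() returns a view here because `original` is contiguous,
# so this write goes through to `original`.
flat_view = original.ravel()
flat_view[1] = -1

print(original)
print(flat_copy)
```

After running this, original[0, 0] is still 1, while original[0, 1] has become -1.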
- If two variables are defined as ‘a = 3’ and ‘b = 4’, will the id() function return the same value for a and b?
The id() function in Python returns the identity of an object; in CPython this is the object's memory address. The identity is unique and constant for an object during its lifetime. Since 3 and 4 are different objects, id(a) and id(b) return different values.
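A short sketch of the comparison. The third variable c is an addition for illustration; the fact that two names bound to the same small integer share one object is a CPython implementation detail, not a language guarantee.

```python
a = 3
b = 4

# Two different objects that are alive at the same time have different ids.
different_objects = id(a) != id(b)
print(different_objects)

# `is` compares identities, so it agrees with comparing id() values.
c = 3
same_identity = a is c
print(same_identity == (id(a) == id(c)))
```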
- What is the Beautiful Soup library used for?
Beautiful Soup is a Python library for getting data out of HTML, XML, and other markup languages.
- In Python, if we create two variables ‘mean = 7’ and ‘Mean = 7’, will they be treated as the same variable?
No. Python is a case-sensitive language, so it distinguishes uppercase from lowercase letters in identifiers. ‘mean’ and ‘Mean’ are two separate variables, even though both happen to hold the value 7.
- What is the use of ‘inplace’ in pandas functions?
‘inplace’ is a parameter accepted by many pandas methods, such as drop(), sort_values() and fillna(). It controls where the result goes. With ‘inplace=True’, the method modifies the original DataFrame and returns None. The default, ‘inplace=False’, leaves the original DataFrame untouched and returns a new, modified DataFrame.
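The difference can be sketched with sort_values(). The DataFrame contents and variable names below are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"name": ["b", "a"], "score": [2, 1]})

# Default (inplace=False): a new DataFrame comes back, df keeps its order.
sorted_copy = df.sort_values("score")
scores_before = df["score"].tolist()

# inplace=True: df itself is reordered and the call returns None.
result = df.sort_values("score", inplace=True)
scores_after = df["score"].tolist()

print(scores_before, scores_after, result)
```

A common mistake is writing df = df.sort_values(..., inplace=True), which replaces df with None.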
- How can you change the index of a DataFrame in Python?
Use DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False). The keys parameter is a label, an array-like, or a list of labels/arrays: it can be a single column key, a single array of the same length as the calling DataFrame, or a list containing an arbitrary combination of column keys and arrays. Here, “array” encompasses Series, Index, np.ndarray, and instances of Iterator.
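A brief sketch of the most common uses of set_index(), assuming a small made-up DataFrame. The column names and labels are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"id": [101, 102, 103], "city": ["Pune", "Oslo", "Lima"]})

# A column key: "id" becomes the index and, with drop=True, leaves the columns.
by_id = df.set_index("id")

# drop=False keeps "id" as a regular column as well.
kept = df.set_index("id", drop=False)

# An array of the same length as the DataFrame can also serve as the key.
relabelled = df.set_index(pd.Index(["x", "y", "z"]))

print(by_id)
print(relabelled)
```

Because inplace defaults to False, df itself keeps its original 0, 1, 2 index in all three calls.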
- How would you check whether a number is prime using Python?
    # taking input from user
    number = int(input("Enter any number: "))
    # prime number is always greater than 1
    if number > 1:
        for i in range(2, number):
            if (number % i) == 0:
                print(number, "is not a prime number")
                break
        else:
            # the loop finished without finding a divisor
            print(number, "is a prime number")
    else:
        # if the entered number is less than or equal to 1
        # then it is not a prime number
        print(number, "is not a prime number")
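The same check can be wrapped in a reusable function that only tries divisors up to the square root of the number, since any larger factor would pair with a smaller one already tested. This is a sketch rather than the card's original code; is_prime is a hypothetical name, and math.isqrt needs Python 3.8 or later.

```python
import math


def is_prime(number: int) -> bool:
    """Return True if `number` is prime, using trial division."""
    if number < 2:
        return False
    # A composite number always has a divisor no larger than its square root.
    for divisor in range(2, math.isqrt(number) + 1):
        if number % divisor == 0:
            return False
    return True


primes_below_20 = [n for n in range(20) if is_prime(n)]
print(primes_below_20)
```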
- What is the difference between univariate and bivariate analysis? Which Python functions can be used for each?
Univariate analysis summarizes one variable at a time, while bivariate analysis examines the relationship between two variables. Below are a few functions that can be used; items 1 to 4 are univariate and items 5 and 6 are bivariate:
1. Find the proportion of each category, for example each type of blood disorder: df.Thal.value_counts(normalize=True). Without normalize=True, value_counts() returns raw counts.
2. Plot the distribution of a variable: sns.distplot(df.Variable.dropna()). distplot() has been deprecated since seaborn 0.11; sns.histplot() is its replacement.
3. Find the minimum, maximum, average, and standard deviation: describe() returns the count, mean, standard deviation, minimum, quartiles and maximum of the numerical variables of the data frame.
4. Find the mean of a variable: df.Variable.dropna().mean()
5. Draw a boxplot to observe outliers across groups: sns.boxplot(x=' ', y=' ', hue=' ', data=df)
6. Compute the correlation matrix: df.corr(). It can be drawn as a plot with sns.heatmap(df.corr()).
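A small sketch of the pandas-only calls from the list, run on a made-up DataFrame. The column names and values are invented for illustration, and the plotting calls are left out so the example needs only pandas.

```python
import pandas as pd

df = pd.DataFrame({
    "thal": ["normal", "fixed", "normal", "reversible"],
    "age": [63, 45, 51, 58],
    "chol": [233, 210, 250, 240],
})

# Univariate: share of each category, and summary statistics of one column.
proportions = df["thal"].value_counts(normalize=True)
summary = df["age"].describe()

# Bivariate: pairwise correlation between two numeric columns.
correlation = df[["age", "chol"]].corr()

print(proportions)
print(summary)
print(correlation)
```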
- What is the difference between a ‘for’ loop and a ‘while’ loop?
A ‘for’ loop iterates over a sequence or other iterable, so the number of iterations is usually known in advance. A ‘while’ loop is used when the number of iterations is not known beforehand: its body keeps running as long as its condition is true and stops once the condition becomes false.
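The contrast can be sketched as follows. The numbers are arbitrary and chosen only for illustration.

```python
# for: the iterable fixes the number of iterations (four here).
total = 0
for value in [1, 2, 3, 4]:
    total += value

# while: repeat until a condition stops holding; the count is not known upfront.
balance = 100.0
years = 0
while balance < 200.0:
    balance *= 1.10   # grow by 10% per year
    years += 1

print(total)
print(years)
```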
- Differentiate between call by value and call by reference.
In call by value, the function receives a copy of the argument's value in its own local parameter, so changes made inside the function do not modify the original. In call by reference, the function receives a reference to the caller's variable, so changes made inside the function do modify the original. Python itself passes object references by value: a function can mutate a mutable argument in place, but rebinding the parameter name does not affect the caller.
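How this plays out in Python can be sketched with a list argument. The function names rebind and mutate are hypothetical and exist only for this illustration.

```python
def rebind(values):
    # Rebinding the local name points it at a new list; the caller's list is untouched.
    values = [0]
    return values


def mutate(values):
    # Mutating the object in place is visible to the caller.
    values.append(99)


data = [1, 2, 3]

rebind(data)
after_rebind = list(data)   # snapshot: still [1, 2, 3]

mutate(data)
print(after_rebind, data)
```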
- How will you import multiple Excel sheets into a single DataFrame?
Read all the sheets with ‘pd.read_excel()’ by passing sheet_name=None, which returns a dictionary of DataFrames keyed by sheet name, and then combine them with ‘pd.concat()’. Syntax: df = pd.concat(pd.read_excel('file_name.xlsx', sheet_name=None), ignore_index=True), where 'file_name.xlsx' is a placeholder for the path to the workbook.
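The concatenation step can be sketched without a workbook on disk. The dictionary below is built by hand to stand in for what pd.read_excel(..., sheet_name=None) returns; the sheet names and data are made up.

```python
import pandas as pd

# Stand-in for the dict of DataFrames returned when sheet_name=None.
sheets = {
    "jan": pd.DataFrame({"item": ["a", "b"], "qty": [1, 2]}),
    "feb": pd.DataFrame({"item": ["c"], "qty": [3]}),
}

# ignore_index=True discards the per-sheet row labels and renumbers from 0.
combined = pd.concat(sheets, ignore_index=True)

# To remember which sheet each row came from, keep the dict keys as a column.
with_source = pd.concat(sheets, names=["sheet", "row"]).reset_index(level="sheet")

print(combined)
print(with_source)
```

This assumes all sheets share the same columns; sheets with different columns would produce NaN in the gaps.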
- What is the difference between the append() and extend() list methods?
The append() method adds its argument as a single item to the end of the list. The syntax of the append() method is: list.append(item). The extend() method, on the other hand, adds each element of an iterable to the end of the list. The syntax of the extend() method is: list.extend(iterable)
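A quick sketch of the difference, using throwaway lists named for illustration only.

```python
appended = [1, 2]
appended.append([3, 4])    # the whole list becomes one nested element

extended = [1, 2]
extended.extend([3, 4])    # each element is added individually

letters = []
letters.extend("ab")       # any iterable works, including a string

print(appended)
print(extended)
print(letters)
```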
- What are the data types available in Python?
Python has the following standard data types: Numeric (int, float, complex); Boolean (bool); Sequence (list, tuple, str); Mapping (dict); and Set (set).
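- Can you write a function using Python to impute outliers?
The function below handles outliers by dropping every value that lies outside the interquartile-range fences; it filters the outliers out rather than replacing them.
    import numpy as np
    def remove_outliers(x, outlier_constant):
        a = np.array(x)
        upper_quartile = np.percentile(a, 75)
        lower_quartile = np.percentile(a, 25)
        # distance of each fence from its quartile
        iqr = (upper_quartile - lower_quartile) * outlier_constant
        quartile_set = (lower_quartile - iqr, upper_quartile + iqr)
        result_list = []
        for y in a.tolist():
            # keep only the values inside the fences
            if quartile_set[0] <= y <= quartile_set[1]:
                result_list.append(y)
        return result_list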
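If the goal is to impute outliers rather than drop them, one possible approach is to cap values at the same interquartile-range fences. This is a swapped-in technique (capping with np.clip), not the card's original method; cap_outliers and the 1.5 default are assumptions made for this sketch.

```python
import numpy as np


def cap_outliers(x, outlier_constant=1.5):
    """Replace values beyond the IQR fences with the nearest fence value."""
    a = np.asarray(x, dtype=float)
    lower_quartile, upper_quartile = np.percentile(a, [25, 75])
    margin = (upper_quartile - lower_quartile) * outlier_constant
    # np.clip leaves in-range values alone and pulls the rest to the bounds.
    return np.clip(a, lower_quartile - margin, upper_quartile + margin)


capped = cap_outliers([1, 2, 3, 4, 100])
print(capped)
```

Unlike filtering, capping keeps the array the same length, which matters when the values are a column of a DataFrame.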
- Can any type of string be converted into an int in Python?
No. Python's built-in int() function accepts a string and returns an integer only when the string represents a whole number, such as '42' or ' -7 ' (a sign and surrounding whitespace are allowed). A string such as '3.7' or 'abc' raises a ValueError; a string holding a decimal number has to go through float() first. Keep this special case in mind: a floating-point number passed to int() is truncated toward zero, so int(3.7) returns 3 and int(-3.7) returns -3.
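The conversion rules can be sketched as follows. The variable names are made up for illustration.

```python
from_digits = int("42")          # plain integer string
from_padded = int(" -7 ")        # sign and surrounding whitespace are accepted
truncated_pos = int(3.9)         # floats are truncated toward zero
truncated_neg = int(-3.9)

# A decimal string is rejected by int() directly...
try:
    int("3.9")
    decimal_string_ok = True
except ValueError:
    decimal_string_ok = False

# ...but works when converted through float() first.
via_float = int(float("3.9"))

print(from_digits, from_padded, truncated_pos, truncated_neg)
print(decimal_string_ok, via_float)
```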