G4. Manipulating data Flashcards

Question 1

Q

Which operations can be applied to tabular data structures, and what is the result type?

Answer

A

Projection (retrieving a subset of columns/attributes).
Selection (retrieving a subset of records).
Filter (retrieving a subset of records given a condition).
Result type: DataFrame
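
The three operations can be sketched on a small DataFrame and their result types checked. The data below is made up for illustration (the column names echo the later cards, and 'Ciudad' is an assumed extra column), so treat it as a sketch rather than the course dataset.

```python
import pandas as pd

# Hypothetical toy data; names and values are assumptions for illustration.
df = pd.DataFrame({
    "Nombre": ["Ana", "Luis", "Marta", "Pedro"],
    "Edad": [23, 35, 41, 29],
    "Ciudad": ["Madrid", "Lima", "Quito", "Bogota"],
})

projection = df[["Nombre", "Edad"]]   # subset of columns
selection = df.loc[1:2]               # subset of rows by index label (end included)
filtered = df[df["Edad"] > 30]        # subset of rows meeting a condition

for result in (projection, selection, filtered):
    print(type(result).__name__, result.shape)
```

Note that selecting a single column with single brackets, df["Edad"], returns a Series rather than a DataFrame.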

Question 2

Q

Retrieving a subset of columns/attributes

Answer

A

Projection, roughly "read" (shows everything I am going to work with)

# Perform a projection to select only the 'Nombre' and 'Edad' columns
proyeccion = df[["Nombre", "Edad"]]

print(proyeccion)

Question 3

Q

Retrieving a subset of records

Answer

A

Selection (shows a range of rows I am interested in)

edu.loc[90:94, ["TIME", "GEO"]]
Selection = df[df["Nombre"]
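
A minimal sketch of selecting a range of records, assuming a small stand-in for the edu DataFrame with made-up values and index labels 90 to 94. It contrasts label-based slicing with loc, where the end label is included, against position-based slicing with iloc, where the end position is excluded.

```python
import pandas as pd

# Hypothetical stand-in for the edu DataFrame; values are made up.
edu = pd.DataFrame(
    {"TIME": [2000, 2001, 2002, 2003, 2004],
     "GEO": ["ES", "ES", "FR", "FR", "IT"],
     "Value": [4.9, 5.1, 5.8, 5.9, 4.7]},
    index=[90, 91, 92, 93, 94],
)

by_label = edu.loc[91:93, ["TIME", "GEO"]]     # labels 91, 92, 93 (end included)
by_position = edu.iloc[1:3][["TIME", "GEO"]]   # positions 1 and 2 (end excluded)

print(len(by_label), len(by_position))
```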

Question 4

Q

Another way to select a subset of data is by applying Boolean indexing.

Answer

A

Filtering (shows the rows that satisfy a logical condition)

edu[edu["Value"] > 6.5].tail()
filtered_data = df[df["column"] > value][["column1", "column2"]]

Question 5

Q

Boolean indexes

Answer

A

Uses the result of a Boolean operation over the data, returning a mask with True or False for each row. The rows marked True in the mask will be selected.
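
To make the mask visible, the sketch below builds it as a separate variable before using it. The data is hypothetical, and the 6.5 threshold is borrowed from the filtering card.

```python
import pandas as pd

# Hypothetical values for illustration only.
edu = pd.DataFrame({"GEO": ["ES", "FR", "IT", "PT"],
                    "Value": [4.9, 6.8, 7.1, 5.2]})

mask = edu["Value"] > 6.5      # Boolean Series: one True/False per row
print(mask.tolist())
print(edu[mask])               # only the rows where the mask is True

# Masks combine with & (and), | (or), ~ (not); the parentheses are required.
both = edu[(edu["Value"] > 5.0) & (edu["GEO"] != "IT")]
print(both)
```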

Question 6

Q

NaN (not a number) is the special value used to represent missing values.

Question 7

Q

Give examples of how null values can be filtered. How does this work in R?

Answer

A

edu[edu["Value"].isnull()].head()
# R: keep only the rows without missing values in a specific column (for example, 'columna1')
new_data <- original_data[!is.na(original_data$columna1), ]
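
The R expression keeps the rows without missing values. A possible pandas counterpart, sketched on made-up data, uses notnull() or dropna(), while isnull() selects the opposite set.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value.
edu = pd.DataFrame({"GEO": ["ES", "FR", "IT"],
                    "Value": [4.9, np.nan, 7.1]})

missing = edu[edu["Value"].isnull()]     # rows where Value is NaN
present = edu[edu["Value"].notnull()]    # rows where Value is not NaN
dropped = edu.dropna(subset=["Value"])   # same rows as `present`

print(missing)
print(present)
```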

Question 8

Q

What is the form of the expressions for adding columns to a DataFrame? And rows?

Answer

A

Assign a Series to a selection of a column that does not exist:
edu["ValueNorm"] = edu["Value"] / edu["Value"].max()
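
A runnable version of the same idea on a toy DataFrame (the values are assumptions chosen so the normalized results are easy to check by hand):

```python
import pandas as pd

# Hypothetical values; the maximum is 8.0.
edu = pd.DataFrame({"GEO": ["ES", "FR", "IT"], "Value": [2.0, 4.0, 8.0]})

# Assigning to a column name that does not exist creates the column.
edu["ValueNorm"] = edu["Value"] / edu["Value"].max()
print(edu["ValueNorm"].tolist())   # [0.25, 0.5, 1.0]
```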

Question 9

Q

Which is the form of the expressions for adding rows to a DataFrame?

Answer

A

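The append function receives as its argument the new row, represented as a dictionary whose keys are the column names and whose values are the associated values. DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so in current versions the same row is added with pd.concat:
import pandas as pd
new_row = pd.DataFrame([{"TIME": 2000, "Value": 5.00, "GEO": "a"}])
edu = pd.concat([edu, new_row], ignore_index=True)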

Question 10

Q

How can rows or columns be deleted?

Answer

A

Use the drop function. It removes the indicated rows if axis=0, or the indicated columns if axis=1.
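
A short sketch of both uses of drop, assuming a toy DataFrame with a ValueNorm column like the one added in the earlier card:

```python
import pandas as pd

# Hypothetical data for illustration.
edu = pd.DataFrame({"GEO": ["ES", "FR", "IT"],
                    "Value": [2.0, 4.0, 8.0],
                    "ValueNorm": [0.25, 0.5, 1.0]})

no_column = edu.drop("ValueNorm", axis=1)   # remove a column
no_row = edu.drop(2, axis=0)                # remove the row with index label 2

# drop returns a new DataFrame; the original is unchanged unless reassigned.
print(list(no_column.columns), len(no_row), edu.shape)
```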

Question 11

Q

Do these operators belong to the data definition or the data manipulation language?

Answer

A

Data manipulation

Question 12

Q

How can default values be added to attributes containing missing or null values?

Answer

A

Use fillna(), specifying the value to be used in place of the missing ones.
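
Two common ways of choosing the default, sketched on made-up data: a constant per column, and the column mean. Which default is appropriate depends on the analysis, so both are shown only as possibilities.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value.
edu = pd.DataFrame({"GEO": ["ES", "FR", "IT"],
                    "Value": [4.0, np.nan, 8.0]})

# Constant default, given per column as a dictionary.
filled_zero = edu.fillna(value={"Value": 0})

# Column mean as default; mean() skips NaN values by default.
filled_mean = edu["Value"].fillna(edu["Value"].mean())

print(filled_zero["Value"].tolist())
print(filled_mean.tolist())
```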

Question 13

Q

Give an example of the use of the groupby() method applied on a DataFrame.

Answer

A

group = edu[["GEO", "Value"]].groupby("GEO").mean()
group.head()
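
The same grouping can be tried on a toy DataFrame whose group means are easy to verify by hand. The data is hypothetical, and the agg call is an optional extension not mentioned in the card.

```python
import pandas as pd

# Hypothetical data: two countries, two values each.
edu = pd.DataFrame({"GEO": ["ES", "ES", "FR", "FR"],
                    "Value": [4.0, 6.0, 5.0, 7.0]})

# Mean of Value per country; GEO becomes the index of the result.
group = edu[["GEO", "Value"]].groupby("GEO").mean()
print(group)

# Several aggregations at once with agg.
summary = edu.groupby("GEO")["Value"].agg(["mean", "max"])
print(summary)
```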

Question 14

Q

How are the manipulation operators associated with DataFrames related, and how are they
useful for implementing Data Science processes?

Answer

A

They provide a powerful and flexible set of tools for data scientists to explore, clean, and analyze data efficiently.
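
As one possible illustration, the operators from the previous cards can be chained into a small clean, filter, and aggregate pipeline. The data and the 2000 cutoff are assumptions made for the sketch.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value.
edu = pd.DataFrame({"GEO": ["ES", "ES", "FR", "FR"],
                    "TIME": [2000, 2001, 2000, 2001],
                    "Value": [4.0, np.nan, 5.0, 7.0]})

clean = edu.dropna(subset=["Value"])              # clean: drop rows with missing Value
recent = clean[clean["TIME"] >= 2000]             # filter: keep rows from 2000 onward
result = recent.groupby("GEO")["Value"].mean()    # analyze: mean Value per country

print(result)
```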