G4. Manipulating data Flashcards
Which operations can be applied to tabular data structures, and what is the result type?
Projection
Selection (retrieving a subset of records)
Filter (retrieving a subset of records given a condition).
Result type: DataFrame
- Retrieving a subset of columns/attributes
Projection, roughly a read (shows everything I am going to work with)
Perform a projection to select only the 'Nombre' and 'Edad' columns
proyeccion = df[['Nombre', 'Edad']]
print(proyeccion)
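A minimal runnable sketch of the projection above. The sample rows and the 'Ciudad' column are made up for illustration; only 'Nombre' and 'Edad' come from the card. It also shows why the result type is a DataFrame: a list of column names gives a DataFrame, while a single name gives a Series.

```python
import pandas as pd

# Hypothetical sample data; only 'Nombre' and 'Edad' appear in the flashcard.
df = pd.DataFrame({
    "Nombre": ["Ana", "Luis", "Marta"],
    "Edad": [23, 35, 41],
    "Ciudad": ["Lima", "Quito", "Bogota"],
})

# A list of column names (double brackets) returns a DataFrame.
proyeccion = df[["Nombre", "Edad"]]

# A single column name (single brackets) returns a Series instead.
edades = df["Edad"]

print(type(proyeccion).__name__)
print(type(edades).__name__)
```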
Retrieving a subset of records
Selection (shows a range of rows I am interested in)
edu.loc[90:94, ['TIME', 'GEO']]
Selection=df[df['Nombre']
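A small sketch of range selection, assuming a made-up five-row edu table with the default integer index. It contrasts .loc, which slices by label and includes both endpoints, with .iloc, which slices by position and excludes the stop.

```python
import pandas as pd

# Hypothetical stand-in for the edu DataFrame used in the cards.
edu = pd.DataFrame({
    "TIME": [2000, 2001, 2002, 2003, 2004],
    "GEO": ["a", "a", "b", "b", "c"],
    "Value": [5.0, 5.5, 6.0, 7.0, 4.5],
})

# Label-based slice: rows labelled 1, 2 and 3 (both ends included).
by_label = edu.loc[1:3, ["TIME", "GEO"]]

# Position-based slice: positions 1 and 2 only (stop is excluded).
by_position = edu.iloc[1:3][["TIME", "GEO"]]

print(by_label)
print(by_position)
```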
Another way to select a subset of data is by applying Boolean indexing.
Filtering (shows the rows that satisfy a logical condition)
edu[edu['Value'] > 6.5].tail()
filtered_data = df[df['column'] > value][['column1', 'column2']]
Boolean indexes
Uses the result of a Boolean operation over the data, returning a mask with True or False for each row. The rows marked True in the mask will be selected.
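The mask can be made visible by storing it in a variable before indexing. This is only a sketch on made-up data; the threshold 6.5 is the one used in the filtering card.

```python
import pandas as pd

# Hypothetical stand-in for the edu DataFrame.
edu = pd.DataFrame({
    "TIME": [2000, 2001, 2002, 2003, 2004],
    "Value": [5.0, 5.5, 6.0, 7.0, 4.5],
})

# The comparison yields a Boolean Series with one True/False per row.
mask = edu["Value"] > 6.5

# Indexing with the mask keeps only the rows marked True.
filtered = edu[mask]

# Filter and projection in one step: rows from the mask, columns from the list.
filtered_cols = edu.loc[mask, ["TIME", "Value"]]

print(mask.tolist())
print(filtered)
```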
Value ("not a number") used to represent missing values.
NaN
Give examples, in particular of how null values can be filtered. How does this work in R?
edu[edu['Value'].isnull()].head()
# R: keep only the rows with no missing value in a specific column (for example, 'columna1')
new_data <- original_data[!is.na(original_data$columna1), ]
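For comparison with the R line, a sketch of the pandas equivalents on made-up data: isnull() selects the rows with a missing value, while notnull() or dropna(subset=...) keeps the complete ones.

```python
import pandas as pd

# Hypothetical data with one missing Value.
edu = pd.DataFrame({
    "GEO": ["a", "b", "c"],
    "Value": [5.0, float("nan"), 6.2],
})

# Rows where Value is missing.
missing = edu[edu["Value"].isnull()]

# Rows where Value is present, written two equivalent ways.
present = edu[edu["Value"].notnull()]
dropped = edu.dropna(subset=["Value"])

print(missing)
print(present)
```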
What is the form of the expressions for adding columns to a DataFrame? And rows?
Assign a Series to a selection of a column that does not exist.
edu['ValueNorm'] = edu['Value']/edu['Value'].max()
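A runnable sketch of the column assignment, using made-up values chosen so the normalised result is easy to check by hand.

```python
import pandas as pd

# Hypothetical stand-in for the edu DataFrame.
edu = pd.DataFrame({"GEO": ["a", "b", "c"], "Value": [2.0, 4.0, 8.0]})

# Assigning to a column label that does not exist yet creates the column.
edu["ValueNorm"] = edu["Value"] / edu["Value"].max()

print(edu)
```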
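What is the form of the expression for adding rows to a DataFrame?
The append function receives the new row as its argument, represented as a dictionary whose keys are the column names and whose values are the associated values. DataFrame.append was removed in pandas 2.0; the equivalent there is pd.concat with a one-row DataFrame built from the same dictionary (assuming import pandas as pd):
edu = pd.concat([edu, pd.DataFrame([{'TIME': 2000, 'Value': 5.00, 'GEO': 'a'}])],
                ignore_index=True)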
How can rows or columns be deleted?
To remove a column from the DataFrame, use the drop function. It removes the indicated rows if axis=0, or the indicated columns if axis=1. drop returns a new DataFrame unless inplace=True is passed.
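A short sketch of both uses of drop on made-up data. The ValueNorm column is the one created in the earlier card; the row label 0 is just an example.

```python
import pandas as pd

# Hypothetical stand-in for the edu DataFrame.
edu = pd.DataFrame({"GEO": ["a", "b", "c"], "Value": [2.0, 4.0, 8.0]})
edu["ValueNorm"] = edu["Value"] / edu["Value"].max()

# axis=1 drops a column; the original DataFrame is left unchanged.
without_col = edu.drop("ValueNorm", axis=1)

# axis=0 drops a row by its index label.
without_row = edu.drop(0, axis=0)

print(without_col)
print(without_row)
```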
Do these operators belong to the data definition or the data manipulation language?
Data manipulation
How can default values be added to attributes containing missing or null values?
Use fillna(), specifying the value that should replace the missing ones.
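A minimal fillna sketch on made-up data. The default of 0 is an arbitrary choice for illustration; passing a dictionary limits the fill to the named column.

```python
import pandas as pd

# Hypothetical data with one missing Value.
edu = pd.DataFrame({
    "GEO": ["a", "b", "c"],
    "Value": [5.0, float("nan"), 6.2],
})

# Replace missing entries in the Value column with 0 (arbitrary default).
filled = edu.fillna(value={"Value": 0})

print(filled)
```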
Give an example of the use of the groupby() method applied on a DataFrame.
group = edu[['GEO', 'Value']].groupby('GEO').mean()
group.head()
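The same aggregation as a runnable sketch, with made-up rows so the per-group means can be checked by hand. The result is a DataFrame indexed by GEO.

```python
import pandas as pd

# Hypothetical stand-in for the edu DataFrame.
edu = pd.DataFrame({
    "GEO": ["a", "a", "b"],
    "Value": [4.0, 6.0, 5.0],
})

# Group the rows by GEO and average Value within each group.
group = edu[["GEO", "Value"]].groupby("GEO").mean()

print(group.head())
```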
How are the manipulation operators associated with DataFrames related to, and useful for, implementing Data Science processes?
They provide a powerful and flexible set of tools for data scientists to explore, clean, and analyze data efficiently.