SQL CLEANING Flashcards
How can we delete columns of a table? code
ALTER TABLE HouseProject..House
DROP COLUMN OwnerAddress, TaxDistrict, PropertyAddress;
How can we remove duplicate rows of a table? code
Using Window function row_number and delete as below:
WITH DuplicateTable AS(
SELECT *, ROW_NUMBER() OVER (PARTITION BY SalePrice,SaleDate,LegalReference,ParcelID,PropertyAddress ORDER BY UniqueID) AS duplicates
FROM HouseProject..House)
DELETE
FROM DuplicateTable
WHERE duplicates>1
How can we replace values in a column with certain other values? for example turning ‘Y’ to ‘Yes’ and ‘N’ to ‘No’
Using Case and Update
UPDATE HouseProject..House
SET SoldAsVacant= CASE SoldAsVacant
WHEN ‘Y’ THEN ‘Yes’
WHEN ‘N’ THEN ‘No’
ELSE SoldAsVacant
END
What are 2 ways of splitting a string in a column? How can we create new columns with the split values?
1- Using SUBSTRING and CHARINDEX for splitting and ALTER TABLE for adding a new column
SELECT
SUBSTRING(PropertyAddress,1,CHARINDEX(‘,’,PropertyAddress)-1) AS Address,
SUBSTRING(PropertyAddress,CHARINDEX(‘,’,PropertyAddress)+1,LEN(PropertyAddress)) AS City
FROM HouseProject..House
ALTER TABLE HouseProject..House
ADD Address Nvarchar(255)
UPDATE HouseProject..House
SET Address= SUBSTRING(PropertyAddress,1,CHARINDEX(‘,’,PropertyAddress)-1)
ALTER TABLE HouseProject..House
ADD City Nvarchar(255)
UPDATE HouseProject..House
SET City= SUBSTRING(PropertyAddress,CHARINDEX(‘,’,PropertyAddress)+1,LEN(PropertyAddress))
2-Using PARSENAME and ALTER TABLE
SELECT PARSENAME(REPLACE(OwnerAddress,’,’,’.’),3) AS OwnerSplitAddress,
PARSENAME(REPLACE(OwnerAddress,’,’,’.’),2)AS OwnerSplitCity,
PARSENAME(REPLACE(OwnerAddress,’,’,’.’),1)AS OwnerSplitState
FROM HouseProject..House
–Adding OwnerSplitAddress
ALTER TABLE HouseProject..House
ADD OwnerSplitAddress Nvarchar(255);
UPDATE HouseProject..House
SET OwnerSplitAddress= PARSENAME(REPLACE(OwnerAddress,’,’,’.’),3)
–Adding OwnerSplitCity
ALTER TABLE HouseProject..House
ADD OwnerSplitCity Nvarchar(255);
UPDATE HouseProject..House
SET OwnerSplitCity= PARSENAME(REPLACE(OwnerAddress,’,’,’.’),2)
–Adding OwnerSplitState
ALTER TABLE HouseProject..House
ADD OwnerSplitState Nvarchar(255);
UPDATE HouseProject..House
SET OwnerSplitState= PARSENAME(REPLACE(OwnerAddress,’,’,’.’),1)
Does PARSENAME function starts parsing from the end or the beginning?
it starts parsing from the end of str
What do we do when the delimiters are ‘,’ and we want to use PARSENAME?
Replace the ‘,’ with ‘.’ below example does this:
332 CHERRY ROAD, SACRAMENTO, CA
PARSENAME(REPLACE(OwnerAddress,’,’,’.’),3)
RESULT:
332 CHERRY ROAD
How can we fill the missing values in a column, using another column?
Using self join, coalesce and update as below:
in the dataset, parcelIDs are repeated, we can use the parcelIDs that have the value for PropertyAddress and self join, to fill the null values of other repititions of that certain ParcelID
SELECT h1.PropertyAddress,h1.ParcelID,h2.PropertyAddress,h2.ParcelID,COALESCE(h1.PropertyAddress,h2.PropertyAddress)
FROM HouseProject..House h1
JOIN HouseProject..House h2
ON h1.ParcelID=h2.ParcelID
AND h1.[UniqueID ] != h2.[UniqueID ]
AND h1.PropertyAddress IS NULL
AND h2.PropertyAddress IS NOT NULL
UPDATE h1
SET PropertyAddress=COALESCE(h1.PropertyAddress,h2.PropertyAddress)
FROM HouseProject..House h1
JOIN HouseProject..House h2
ON h1.ParcelID=h2.ParcelID
AND h1.[UniqueID ] != h2.[UniqueID ]
AND h1.PropertyAddress IS NULL
AND h2.PropertyAddress IS NOT NULL
How does coalesce work?
It either fills the missing values of a column with a fixed value.
Or fills the missing values using another column in that row
What is the difference between ALTER and UPDATE in SQL?
ALTER and UPDATE are the two modifying commands of SQL. ALTER is used to modify the structure of the relations (Tables) in the database (e.g. adding columns, changing data type of a column)
UPDATE Command is used to modify the data stored in a relation of the database (e.g. changing some values on a column)
How can we change data type of a column?
ALTER TABLE HouseProject..House
ALTER COLUMN SaleDate date;