Interview Questions Flashcards

1
Q

How do you select unique records from a table?

A

i. Using the DISTINCT clause, we can select unique records from a table

ii. If you specify one column, the DISTINCT clause will evaluate the uniqueness of rows based on the values of that single column.

iii. However, if you specify two or more columns, the DISTINCT clause will use the values of all columns to evaluate the uniqueness of the rows

iv. You can also use subqueries or window functions to get unique records from a table

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What is the difference between a Table or a View?

A

a. A table stores data (in memory) and can be modified, whereas a view does not store data and cannot be modified directly

b. The data in a view is derived from one or more tables and is only accessible through the SELECT statement that defines the view

c. A Table has its own data, which is stored on disk, while a view is simply a SELECT statement that is run every time the view is queried.

d. Views can provide a performance benefit by allowing you to create a simplified version of a table or by allowing you to join multiple tables together into a single virtual table

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What is the difference between an inner Join and Outer Join

A
  1. An Inner Join returns only the rows that have matching values in both tables. It only returns the rows where the join condition is true.
  2. A Left Join (Left Outer Join) returns all rows from the left table and the matched rows from the right table. If there is no match, NULL values are returned for the right table’s columns.
  3. A Right Join (Right Outer Join) returns all rows from the right table and the matched rows from the left table. If there is no match, NULL values are returned for the left table’s columns.
  4. A Full Join (Full Outer Join) returns all rows from both tables and the matched rows from both tables. If there is no match, NULL values are returned for the non-matching side’s columns.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

How does Group By work and what Aggreate Functions?

A

Group By is used to group rows in a result set based on one or more columns

Aggregate functions are functions that allow you to perform a calculation on a set of values and return a single value

When these two are combined you can run aggregate functions on grouped rows and perform calculations on all the rows in each group

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What is the difference between WHERE and HAVING in a query?

A

The WHERE clause is used to filter rows before the group by operation, it filters rows individual records, which results in a smaller set of data.

WHERE is used to filter data based on specific conditions, such as column values, data ranges, and null values.

The HAVING clause is used to filter rows after the group by operation. It filters groups of data based on aggregate function values.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What is a CASE statement and how do you use it?

A

The CASE Statement allows you to create a logic in the Select statement that performs different actions based on specified conditions

It is like an if/else statement in most programming languages

We can use it by saying CASE WHEN. This is where you specify your conditions.

If the conditions are met, we can perform an action by saying THEN and specifying what should happen.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What is the difference between a primary key and foreign key?

A

Both are types of constraints that are used to define the relationships between tables in a relational database.

A Primary Key is a unique identifier for each row in a table.

A Foreign Key is a column in a table that is used to establish a link between data in two tables.

Bonus: Foreign keys ensure referential integrity between tables.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What’s the difference between a JOIN and a UNION?

A

A JOIN combines rows from two or more tables based on a related column between them. The related column is usually the primary key of one table, and a foreign key in the other table.

A UNION combines a result set of two or more queries (Select Statements).

Joining appends horizontally while using a Union appends vertically.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

How do you standardize data in SQL?

A

Standardizing data in SQL involves ensuring that data is consistent and conforms to a set of rules or standards.

This is usually done as part of the data cleaning process by removing inaccuracies, inconsistencies, or duplicate data from the data

An example would be if we had several different date formats in a single column. Standardizing the data would be changing them to all be the same format.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Can you explain how the data collection process works?

A

The Data collection process will follow some combination of these steps:
Meet with Stakeholders and Data Owners

Connect to Source data and Extract the data

Transform the data based on business rules (data cleaning, standardization, etc)

Load data into a Staging database

Transfer to Production database

Bonus: Understanding the data and how it’s collected at the source is important to understanding the business rules you’ll create to transform the data.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What is Data Cleaning?

A

Data Cleaning is the process of identifying and removing errors, inaccuracies, and inconsistencies from the data, to make sure that the data is accurate, complete, and consistent.
Data cleaning can involve a variety of tasks, such as:

  1. Removing duplicate records
  2. Correcting inaccuracies or inconsistencies in the data
  3. Removing outliers or incorrect values
  4. Fixing or removing invalid or missing data
  5. Standardizing data (such as date or currency format)
  6. Identifying and removing any errors or inconsistencies in the data
  7. Merge similar data (merging employee names that are misspelled or inconsistent)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What is the difference between how a query is written (syntax) and how it is executed?

A

The typical Query is written in this order:
SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY, and LIMIT

How it executes on the backend is different, it will usually run in this order:
FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY, and LIMIT

The biggest difference is the SELECT Statement in the execution order is much lower. This is why we need a HAVING clause. If we use GROUP BY, we can’t filter on it yet because it hasn’t been grouped in the execution.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

How do you handle NULL values in your data?

A

How you handle NULL values depends greatly on what you want to do with them and their intended purpose. Here are a few ways to handle NULL Values:

  1. You can simply filter out NULL values using the IS NULL or IS NOT NULL operators in the WHERE Statement
  2. Replace NULL values. Depending on the data it may make sense to replace data with “0”s or (Mean or Median) averages of the field
  3. Use functions to utilize NULL Values like COALESCE, NULLIF, ISNULL, and IFNULL
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What makes a dataset “easy” or “difficult” to work with?

A

A dataset may be considered easy when the hard work is done upfront. It’s cleaned properly, data is standardized, columns are named appropriately, and others factors. These things make it easy to manipulate and drive insights.

A difficult dataset is the opposite. It’s not straightforward, relationships between tables have not been defined properly, it’s not fully cleaned, and it doesn’t have intuitive column names.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What do you know about query optimization?

A

Optimizing a query involves making changes to the query and the database to make it run faster and more efficiently.

Ways to optimize queries:
Use Indexes to provide faster access to the data

  1. Limit the number of subqueries and joins in a single query and use them wisely.
  2. Optimize datatypes to only store what you need to store in the most efficient way.
  3. Use LIMIT and don’t use SELECT * to reduce the volume in the output
How well did you know this?
1
Not at all
2
3
4
5
Perfectly