Data Analyst Interview Prep Flashcards
data management techniques, SQL query optimization, and strategies for handling missing values and outliers. It covers data visualization using tools like Power BI and Tableau, as well as statistical analysis concepts such as correlation vs. causation and hypothesis testing. Additionally, the deck delves into MicroStrategy-specific knowledge, addressing security filters, report migration to cloud environments, and attribute roles.
How would you find duplicate records in a table?
Use a SQL query that groups by the key fields and counts the occurrences. For example:
SELECT column_name, COUNT() FROM table_name GROUP BY column_name HAVING COUNT() > 1;
Write a query to join two tables and filter results based on a specific condition.
Imagine you are tasked with combining the customer lists from two different branches of an alarm system company. How would you write a query to merge these lists and filter the results to only include customers with a premium alarm package?
SELECT a.column1, b.column2 FROM table_a a INNER JOIN table_b b ON a.id = b.id WHERE a.condition = ‘specific_condition’;
To merge the customer lists from two branches and filter for those with a premium alarm package, you could use a SQL query like this:
SELECT b1.customer_name, b2.branch_location
FROM branch1 b1
INNER JOIN branch2 b2 ON b1.customer_id = b2.customer_id
WHERE b1.package_type = ‘Premium’;
Explain the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN, and how you would use these in SQL queries.
INNER JOIN returns only matching rows; LEFT JOIN returns all from left table and matched from right; RIGHT JOIN returns all from right and matched from left.
How do you optimize a slow-running query?
Analyze the execution plan, create indexes, avoid SELECT *, and limit dataset with WHERE conditions.
What steps would you take to clean a dataset with missing values?
Identify missing values, decide on a strategy (imputation or removal), and document the process.
How do you handle outliers in your data?
Use statistical methods to identify outliers and decide to remove, transform, or investigate.
Explain the process of normalizing and standardizing data.
Normalization rescales values to a range, while standardization rescales data to have mean 0 and std dev 1.
How do you use pivot tables for data analysis?
Pivot tables summarize data dynamically, allowing users to rearrange fields for quick comparisons.
How do you create and use calculated fields to derive new insights from existing data?
Create calculated fields in Microstrategy to derive metrics using existing fields.
Describe how you would create interactive dashboards in Power BI/Tableau.
Connect the data source, create visualizations, and use filters and slicers for interaction.
Explain the difference between correlation and causation.
Correlation indicates a relationship; causation implies one variable influences another.
How would you test a hypothesis using data?
Formulate null and alternative hypotheses, select a statistical test, and analyze the data.
What is the Central Limit Theorem, and why is it important?
The Central Limit Theorem states sample means approach a normal distribution as sample size increases, crucial for inferential statistics.
How would you approach analyzing sales data to identify trends and patterns?
Clean and prepare data, use visualizations to spot trends, and apply time series analysis for forecasting.
Imagine a scenario where a company’s profits are declining. What data would you look at, and how would you analyze it to find the root cause?
Analyze sales, customer feedback, and market trends; use regression analysis to find significant factors.