The Data Analyst's Toolbox Flashcards
What are some common tools used for data analysis
- spreadsheets
- query Languages
- visualization tools
What is a spreadsheet?
Remember working in Excel
A digital worksheet that
* stores
* organizes
* sorts
data
What are some ways a spreadhseet helps a data analyst?
It helps them:
* see patterns
* group information
* create data visualizations
What are formulas in spreadsheets?
Mathematical equations/instructions used to perform specific calculations
What are functions in spreadsheets?
Preset/Predefined formulas that automatically perform a calculation/task
they cannot be internally edited / are fixed
eg SUM(argument1, argument2,…)
What is a query language?
It is a **computer prpgramming language ** that allows one to retrieve and manipulate data from a database.
Why is Structured Query Language a widely used language
It is easy to understand and works well with many kinds of databases.
What are some uses of query languages?
They allow analysts to select, retrieve, request for, create, add, delete and update data.
What are data visualizatioon tools used for?
Think of the Tableu Viz of the day
- turning complex numbers into a story
- help stakeholders come to conclusions leading to informed data driven decisions and strategies.
Give two example of data visualization tools.
- Tableu - has drag and drop features to create visuals
- Looker- communicates directly with database and connects data to visual tool.
Overall, what determined which tool to use in the data analysis process between databases and spreadsheets?
criteria based on numerous factors such as
* number of users,
* type of data
* availability
* analytical steps required
When is it ideal to use a database?
When dealing with large amounts of complex data with multiple dimensions.
What are some advantages of databases?
- Unlimited amount of data
- Ability to build relations and link data across tables
- Efficiently hosting multiple users
- Suitable when data is over 100 rows
- Information can automatically update itself across all tables
- Summary reports - since all tables are linked
What are some limitations of databases?
- Only give output in a standard table form (thus limited in visuals)
- Have limited calculation abilities.
What are some advantages of spreadhseets?
- Can perform complex calculations and analyses.
- Have in-built visualization tools
- Suitable for finance (straightforward number crunching) and data analysis
number crunching- like profit, total sales
What are some limitations of spreadsheets?
- Limited in relating and connecting data especially across tables and different files.
- One has to manually enter data
- Becomes challenging when data exceeds 100 rows.
Every programming language follows a predetermined set of guidelines and structure known as,
Syntax
What is the basic structure of an SQL Query?
SELECT (columns)
FROM (dataset.table)
WHERE (conditions);
it is more efficient to select only what you need.
semicolon (;) **statement terminator **
Write an SQL query selecting the columns that contain a customer’s first name, last name and ID from the customer name table in the customer data dataset. Furthermore, write one to retrieve customer data from a customers whose first name is Tony.
SELECT
first_name,
last_name,
customer_ID
FROM
customer_data.customer_name
WHERE
first_name= ‘Tony’;
Write an SQL query selecting the columns that contain a customer’s first name, last name and ID from the customer name table in the customer data dataset. Furthermore, write one to retrieve customer data from a customers whose first name is Tony, last name Magnolia and customer id is greater than 0.
SELECT
first_name,
last_name,
customer_ID
FROM
customer_data.customer_name
WHERE
first_name= ‘Tony’
AND last_name= ‘Magnolia’
AND customer_ID > 0;
customer_ID > 0 is a logical condition
True or False
Using indentation and capitalization in SQL helps one to review and troubleshoot their code more easily.
True
Write a WHERE command that looks for all names that start with ch
SELECT *
FROM customers.customers_name
WHERE first_name LIKE ‘ch%’;
% is a **wildcard **
To save on time when reviewing long, complex queries after a long time one should write what alongside their code?
Write an example comment on SELECT colomns
Comments
SELECT columns – this is customer name
SELECT columns /* */
True or False
Comments can be written outside a query statement and within
True
Write an SQL statement to assign a new alias to a column known as field 1 to first name.
(let it be known as)
field_1 AS first_name – Alias to assign new name
AS clause
True or False
Aliases do not inherently change the name and they only last within the query.
True
What does this command mean.
WHERE first_name <> ‘John’
where first name is not equal / not John.