Course-5 Analyze data to Answer questions Flashcards
Analysis
The process used to make sense of the data collected
Goal of analysis
The goal of analysis is to identify trends and relationships within data so you can accurately answer the question you are asking.
The 4 phases of analysis
-Organize data
- Format and adjust data
-Get input from others
-Transform data
Sorting
when you arrange data in a meaningful order to make it easier to understand, analyze and visualize.
Filtering
is used when you are only interested in seeing data that meets a specific criteria, and hiding the rest.
Sort Sheet
All of the data in a spreadsheet is sorted by the ranking of a specific sorted column- data across rows is kept together.
Sort range
Nothing else on the spreadsheet is rearranged besides the specified cells in a column.
Customized sort order
When you sort data in a spreadsheet using multiple conditions.
Sort function syntax
=SORT(Range,sort_column,true or false) True ascending and false is descending
Incorrectly formatted data can:
1) Lead to mistakes
2) Take time to fix
3) Affect stakeholder’s decision-making
How does data validation work?
It allows you to control what can and can’t be entered in your worksheet.
Data validation
1) Allows dropdown lists with predetermined options
2) Create custom checkboxes
3) Protect structured data and formulas
Conditional formatting
A spreadsheet tool that changes how cells appear when values meet specific conditions.
Concatenate
A function that joins together two or more text strings
Text string
A group of characters within a cell, most often composed of letters
Openness ( Or open data)
Free access, usage, and sharing of data
if(end>start,end-start,1-start+end)
Best Practices for searching online
- Thinking skills
- Data analytics terms
- Basic knowledge of tools
Mental model
Your thought process and the way you approach a problem.
R
A programming language frequently used for statistical analysis, visualization, and other data analysis.
Aggregation
Collecting or gathering many separate pieces into a whole.
Data aggregation
The process of gathering data from multiple sources in order to combine it into a single summarized collection.
Data Aggregation example
-Puzzle pieces= Data
-Organization = Aggregation
-Pile of pieces= Summary
-Putting the pieces together= gaining insights.
Data Aggregation Benefits
- Identify trends
- Make comparisons
- Gain Insights
Data can also be aggregated over a given time period to provide statistics such as:
- Average
- Minimums
- Maximums
- Sums
Subquery
A query within another query
VLOOKUP( Vertical Lookup)
A function that searches for a certain value in a corresponding piece of information.
Value
A function that converts a text string that represents a number to a numerical value.
Remove duplicates
A tool that automatically searches for and eliminates duplicate entries from a spreadsheet.
=VLOOKUP Syntax
=VLOOKUP(103,A2:B26,2,False)
Recognize the limitations of VLOOKUP and fix some of the most common problems.
Troubleshooting Questions
-How should I prioritize these issues?
- In a single sentence, what’s the issue I am facing?
- What Resources can help me solve the problem?
-How can I stop this problem from happening in the future?
Absolute reference
A reference that is locked so that rows and columns won’t change when copied.
MATCH
A function used to locate the position of a specific lookup value.
MATCH
A function used to locate the position of a specific lookup value.
MATCH
A function used to locate the position of a specific lookup value.
Join
A SQL clause that is used to combine rows from two or more tables based on a related column.
Common Joins
- Inner
-Left
-Right
-Outer
Inner Join
A function that returns records with matching values in both tables.
Inner Join
A function that returns records with matching values in both tables.
Left Join
A function that will return all records from the left table and only the matching records from the right table.
Right Join
A function that will return all records from the right table and only the matching records from the left.
OUTER JOIN
A function that combines RIGHT and LEFT JOIN to return all matching records in both tables.
Count in spreadsheets
Can be used to count the total number of numerical values within a specific range in spreadsheets.
Count in SQL
A query that returns the number of rows in a specified range.
Count Distinct
A query that only returns the distinct values in a specified range.
Aliasing
When you temporarily name a table or column in your query to make it easier to read and write.
Subquery
A SQL query that is nested inside a larger query.
Having
Allows you to add a filter to your query instead of the underlying table that can only be used with aggregated functions.
CASE
Returns records with your conditions by allowing you to include if/ then statements in your query.
Data Calculations
-Formulas for basic calculations
- Conditional formulas that use the IF Function
-The SUMPRODUCT Function
- Pivot tables to organize calculations
- Queries and Calculations in SQL
- Temporary tables in SQL
COUNTIF Syntax
COUNTIF(range, “value”)
Summary Table
A table used to summarize statistical information about data.
SUMIF
A function that adds numeric data based on one condition
=SUMIF
(Range,criteria/condtion,[sum_range])
SUMPRODUCT
A function that multiplies arrays and returns the sum of those products.
SUMPRODUCT Syntax
=sumproduct(array1,[array2])
Array
A collection of values in cells
Profit margin
A percentage that indicate how many cents of profit has been generated for each dollar of sale
Pivot tables
Let you view data in multiple ways to find insights and trends.
Analysis steps
- Find out how much revenue was generated each year.
- Build a pivot table to show the revenue per year.
- Find the average revenue per movie.
- Check our findings for some possible trends.
Calculated field
A new field within a pivot table that carries out a certain calculations based on the values of other fields.
Operator
A symbol that names the type of operation or calculation to be performed in a formula.
The Four Basic arithmetic operators in Spreadsheet formulas
+ Addition
- Subtraction
* multiplication
/ division
Modulo
An Operator (%) that returns the remainder when one number is divided by another.
Underscores
Lines used to underline words and connect text characters.
GROUP BY
A command that groups rows that have the same values from a table into summary rows.
Extract Command
Lets us pull one part of a given date to use
Data Validation process
Checking and rechecking the quality of your data so that it is incomplete, accurate, secure and consistent.
Temporary Table
A database table that is created and exists temporarily on a database server.