UNIT 10 Flashcards
Suppose you have the following data and you want to also analyze gender and whether the patients ever smoked before the first encounter. What could you do to add this information such that the resulting dataset is a tidy dataset?
Create a second table with the variables: Patient ID, Age and Smoked
Using the tidy data set definition of a variable, what are the variables in this data set? (check all that apply)
Payer
Service
Age Group
Gender
Year
Cost
If you have the following two tables in two different Excel spreadsheets and you’d like to create a visualization in Tableau that uses data from both, what is the best way to do this?
Create a separate data connection within the same workbook to each spreadsheet and then blend both tables on Patient ID
What combination of the following variables form the observational units for the tidy data version of this dataset? (Check all that apply)
Payer
Service
Age Group
Year
If you have the following two tables in two different tabs of an Excel spreadsheet and you’d like to create a visualization in Tableau that uses data from both, what is the best way to do this?
Connect to the spreadsheet and then join both tables within the same data connection
Why is tidy data preferred?
Tidy data produces tables that are easier to analyze, because it makes variables, values, observations, and observational units explicit and well-organized for analysis
What is measured for each observational unit in the tidy data format for this dataset? (Check all that apply)
The following data set (see http://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/NationalHealthExpendData/Age-and-Gender.html (Links to an external site.)) shows personal healthcare expenditures (in millions of dollars) for several years by Payer, Service, Age Group, and Gender.
Cost
You have hospital data by month and year for Amount Billed and Amount Collected. Select all of the reasons why it may be best to keep Amount Billed and Amount Collected as separate variables (columns) instead of using a column called Status that contains the values “Billed” and “Collected” and a second column called “Amount” that contains either the amount billed or the amount collected.
The unit of observation is best thought of as the Year/month, with Amount Billed and Amount Collected the measures for each Year/month
Keeping them separate will allow you to easily calculate profit/loss as a new variable
Amount Billed and Amount Collected are not part of a part-to-whole relationship
Which of the following is an observation in the tidy data format of this dataset?
The following data set (see http://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/NationalHealthExpendData/Age-and-Gender.htmlLinks to an external site.) shows personal healthcare expenditures (in millions of dollars) for several years by Payer, Service, Age Group, and Gender.
Medicaid, Dental Services, 0-18, Males, 2002, 978
Match the data structure below to whether it is table deep, table wide, or messy. Each records heart rate (HR) for a patient on two different dates.
Table Wide
Table Deep
Messy
Suppose you have the following data in Excel:
and you would like to produce the following graph:
Use Measure Names and Measure values
A dashboard is a
visual
display of the most important
information
needed to achieve one or more objectives that has been consolidated on a
single
computer screen so it can be monitored at a glance.
Visual
Information
SIngle
What is used in a dashboard to efficiently and comprehensively communicate?
Text and graphics
Dashboards can be used for exploratory data analysis
False
If Total Cost is a discrete measure containing decimal numbers of the form 245.15 and it is on the Columns shelf, which visualization will you see by default?
B.
This is produced when Total Cost is a discrete measure. Since Total Cost is a measure, Tableau, by default, aggregates all of its values using Sum. Since Total Cost is discrete, Tableau creates a label showing that sum.