CDW 100V Caboodle Flashcards
Schemas
agreed upon logical division of database objects in a database.
‘a collection of tables”
ETL order
Chronicles to clarity to caboodle (last)
dbo Schema
Used by computers:
Data dictionary, Slicerdicer (gets data from caboodle via dbo schema)
Full access schema
Used by people
Contains everything for report writers including the dbo schema. Should be your default schema for reporting!
Filtered access schema
Used by people- filtered by user access security
Version of full access that has all the same columns, but different rows. Used most often when you have several organizations that are in one instance of caboodle
Ex. community hospital that has smaller arms of hospitals, dont want to share all info to everyone
- would only give data for just their site.
SSIS
mechanism to move data during the ETL process
Why Caboodle
- Has both epic and non epic data
- data source for SLICER DA DICER
- enforced naming conventions (standardization)
- simplified report writing
Surrogate key
made up by ETL process
- used to identify something in a database
does NOT exist in source database
Does this bc there can be some external data that doesnt have any EPIC data so need surrogate keys to label all things that may be coming in
foreign key
lets you link out to other tables
“Look up columns” in EPIC
Chronicles identifier
look for columns that end in “-EpidID” or -EpicCsn
Fact tables
represent the occurrence of a specfic, measurable event
medicationdispense is an ex
fact tables join OUT
Star schema
Centered around events and thus around fact tables
Start query with central fact table and join OUT to peripheral tables
Ex: MedicationOrderFact
would have peripheral details like - when was the med dispensed TimeofdayDim
who was it dispensed by ProviderDim
who was the patient it was dispensed for etc (PatientDim)
Dim Table
Dimension table
contains one row for each entity within the set
Generally contain more data and fewer lookup columns
Fact Table
contains one row for each occurence of some significant, measured event
Encounterfact
MedicationorderFact
Bridge Table
used in caboodle to model many to many relationships
Caboodle
uses dimensional data model
Makes report writing easier, but bc of this tradeoff is ETL is more complicated to maintain
Granularity
Will tell you what youre looking at in each table
ex: an encounter
Chips
in data dictionary- individual attributes
Data dictionary Overview
description of the table
ER Diagram
entity relationship diagram
shows a table and the other tables that it joins to
caboodle tables organized in ER diagrams by its tables
only fact table and datamar tables. have ER diagrams
Reporting when you don’t know the table but know the subject
Go to galaxy and go to reporting with caboodle
Reporting when you have slicerdicer query
look through the slicer dicer data models
then go to troubleshoot (top right on screen)
then find the root table definition in the query and will show the table name
Can also go to record viewer to investigate
reporting when you don’t have much info
you can go to the data dictionary in anal cat and search by materfile and the code
Change tracking
feature in caboodle where we sometimes hold on to old data
- want to avoid them when doing joins
2 types
1. Snapshot change tracking (type 2)
2. None (type 1)
Snapshot change tracking
have extra rows to store old data, snapshots of entities in a time period
- can identify them in the data dictionary by the CHIP
Non snapshot change tables
if source data gets updated, then the data shown in the table gets updated as well (onedrive file)
snapshot tracking- shows old rows with old data
all items in the column will change in each row, not just the most recently changed stuff
snapshot column
retains old data in a snapshot table
Will have multiple rows and variable results in each row post change. Will not have all the same info like non snapshot would (Name change)
when a change in snapshot table took place
does not show when new data was input, just the ETL date
- need to check the DATA LINEAGE to determine the chronicles item or clarity table and column that store this info
Durable key
surrogate key in caboodle
identifies unique entities in a table
- ex. pts with the same name, would have unique ids to show theyre 2 diff people
only found in snapshot tables
snapshot tables
have both snapshot and non snapshot columns
- if its a snapshot column, new data will keep the old rows
- if its a non snapshot column, no new row added, all the data in that column gets updated
Referential Integrity
- Lookup columns (key columns) will always have a value (isnotnull)
- Lookup column values will always find a matching value in the destination table
Not enforced in filteredaccess schema
unspecified values
-1; expected, but null in the source = unspecified
-1 contains nulls ONLY in KEY COLUMNS
*Unspecified in Strings
“NULL” = date, integer columns
not applicable (NA) values
-2
ALSO ONLY IN KEY COLUMNS- IF COLUMN ENDS IN KEY
Inferred row
Created to maintain 2nd part of referential integrity- WHEN THERE IS A MISMATCH
- that a column will always find a match
Does this by creating an INFERRED ROW when needed ( when one table has more up to date info than the other)
Add 3 special rows to every table
Inferred rows are created when
a table with a LOOKUP COLUMN goes through ETL , not when the table itself goes through ETL
- some tables updated less often than nightly bc theyre not used as much
Deleted Values referential integrity
Only thing that will remain is the PRIMARY key
will show as -3 for lookup columns
*deleted (star deleted) for strings
metadata column _isdeleted column = binary 1
date/ time = “NULL”
Writing the query
- First, outline the tables that you will need that has all the columns needed
- can search data dictionary or anal cat to find the tables
- this is your select clause- “Table.column”
check to see and label which is a snapshot table vs non snapshot table - From _Table
- Join Clause
a. map out tables to find where you will join: “key to key”
b. see what tables are snapshot and what are non snapshot
- join durable key to durable key for snapshot tables
- typically the key table is the center of the star schema to link to other dim tables
c. Join all the above tables.columns in the select clause laid out - write in condition to iscurrent=1 to consider snapshot table data that may have duplicate older rows
b. could also do a where clause
“where
table.count = 1
- snapshot table that would have the rows = 1
count only exists in FACT tables, not dim tables
Bridge table
used to model (in joins) many to many relationships
ex pt and dx
Bridge table columns
2 columns
- <name> Combokey- lookup column for the fact
</name> - <name> key - lookup column for the dimension
</name>
Combo key
know youll need a bridge table when you see a combokey column
bridged to bridge tables
Data Mart
collection of data on a topic
- makes it easier to write reports
- designed for a specific reporting need
Should save time to write a report
group of data/topics together
SetDims
Tables with lists in them
almost like what we’re doing with groupers
Ex. Give me a list of all meds that put me to sleep
- tranqs
Ex. Give me a list of all outpatient departments