Part III Flashcards

Question

What are levels of testing?

Answer 1

• Unit Test – white box – code coverage – developer driven, can be semi-automated
• Integration Test – white box – interfaces/APIs – developer driven
• System Test – black box – test against documented requirements (verification)
• Acceptance Test – black box – request for user sign-off (validation) – Ex: “beta testing”, solicit input from a small group of users

Answer 2

Regression Testing • A regression is a new defect revealed by the addition of new functionality • Ex: you add a new feature to a program, and an older feature stops working as intended; that is a regression • Regression testing: when you add new functionality, go back and repeat all prior tests, and retest all previously reported and fixed defects

Answer 3

• Extract sex (= male or = female)
• If sex = female, then
– If age > 65 and a history of benign hysterectomy is extracted, output “HPV and cervical cancer screening not recommended”
– If age < 21, output “HPV and cervical cancer screening not recommended”
– If age >= 21 and <= 29, output “Liquid Pap screening is preferred every 3 years.”
– If age >= 30 and <= 65, output “Liquid Pap screening is preferred and HPV for high risk screening”
• Extract “Pap results” as == (defined as) normal or == abnormal
• Extract HPV results as == normal or == abnormal, and
– If Pap and/or HPV == abnormal, output “Consult MD Anderson guideline for abnormal Pap/HPV results”
– If Pap results == normal and HPV not performed, output “Repeat Pap screening and/or HPV screening every 3 years”
– If Pap and HPV normal, output “repeat every 5 years”
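The rule set above can be sketched as two small functions. This is only an illustration of the card's logic, not clinical guidance: the function and parameter names are made up here, ages 30 and 65 are assumed to belong to the 30 to 65 band, and inputs that no rule covers return `None`.

```python
from typing import Optional

NOT_RECOMMENDED = "HPV and cervical cancer screening not recommended"


def screening_recommendation(sex: str, age: int,
                             benign_hysterectomy: bool = False) -> Optional[str]:
    """Age-based message from the card, or None when no rule applies."""
    if sex.lower() != "female":
        return None
    if age > 65 and benign_hysterectomy:
        return NOT_RECOMMENDED
    if age < 21:
        return NOT_RECOMMENDED
    if 21 <= age <= 29:
        return "Liquid Pap screening is preferred every 3 years."
    if 30 <= age <= 65:  # assumption: boundary ages 30 and 65 are included
        return "Liquid Pap screening is preferred and HPV for high risk screening"
    return None


def follow_up(pap: str, hpv: Optional[str]) -> Optional[str]:
    """Result-based message; hpv=None means HPV testing was not performed."""
    if pap == "abnormal" or hpv == "abnormal":
        return "Consult MD Anderson guideline for abnormal Pap/HPV results"
    if pap == "normal" and hpv is None:
        return "Repeat Pap screening and/or HPV screening every 3 years"
    if pap == "normal" and hpv == "normal":
        return "repeat every 5 years"
    return None
```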

Answer 4

Any collection of related data (address book, spreadsheet, MS Access) • Database Management System (DBMS) – Allows users to interact with DB and maintain structure, integrity – Common features of DBMS • Define data types, structures, constraints • Construct data tables, store data on a storage medium • Manipulate data to create (insert), retrieve (read), update (edit), delete (sometimes abbreviated “CRUD”) • Share data via permissions, user access control; control concurrency • Protect against inappropriate access, hardware/software failure • Maintain & Optimize data structures

Answer 5

* Convenient, easy, ubiquitous * May require redundant data (eg: Louise Chen – have to remember to indicate in each cell that she is deceased) * Can’t represent 1-to-many relationships easily (Louise has 2 meds) * Limited ability to enforce data integrity (multiple spellings of “yes” and “no”) * Incomplete data represented as blank cells

Answer 6

* Defines association within and between relations (relation ~ table) * Each attribute (attribute ~ column) corresponds to a domain in the relation * Each tuple (tuple ~ row) describes an ordered list of elements; the order is important * Data elements (element ~ cell) have a data type that is consistent across that attribute (VARCHAR, INT, DATE, LONG, etc.) * Attributes can also have constraints (NOT NULL, auto-incrementing, cascading delete, Primary Key, Foreign Key) beyond the type constraint * Create and describe structure/constraints using “Data Definition Language” (DDL), which contains metadata (data about the data) * Further describe the data using a Data Dictionary (not just FK/PK, constraints, but also definitions of each field and its intended use)

Answer 7

The relation schema (table schema) is a description of the relation, its attributes, and the data types / rules associated with the relation. • A specific table that uses that schema is an instance of that schema • Adding a new relation is as easy as adding a table; add an attribute by adding a column • In very simple terms these make it easy to know “everything that has one attribute”

Answer 8

• Parallelism between OOP and RDBMS is very useful programmatically
– Object-oriented Class ↔ DB Relation
– Instance ↔ Tuple (a specific row) in a Relation (table), where each row is a member of the class described by those attributes
– Attribute ↔ Attribute (column), where the value of that attribute is the element (content of the cell)
– Method (accessors, mutators) ↔ database manipulation (CRUD) functions
• Many modern programming languages use Object-Relational Mapping (ORM), either built in or available as an extension
– Each class is mapped to a table
– Each attribute is mapped to a column
– Each method (getters/setters) is mapped to a “CRUD” function
– Ex: Creating a new instance of a class automatically creates a row and populates data
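A hand-rolled sketch of the class-to-table mapping, using only the Python standard library. It is not a real ORM; the `Patient` class, its columns, the age value, and the helper names are all hypothetical, and table and column names are taken from trusted class metadata rather than user input.

```python
import sqlite3
from dataclasses import astuple, dataclass, fields


@dataclass
class Patient:       # class  -> table "patient"
    pat_id: int      # attribute -> column
    name: str
    age: int


def create_table(conn, cls):
    """Create one column per dataclass attribute."""
    cols = ", ".join(f.name for f in fields(cls))
    conn.execute(f"CREATE TABLE {cls.__name__.lower()} ({cols})")


def insert(conn, obj):
    """'Create' in CRUD: one instance becomes one row."""
    marks = ", ".join("?" for _ in fields(obj))
    table = type(obj).__name__.lower()
    conn.execute(f"INSERT INTO {table} VALUES ({marks})", astuple(obj))


def get(conn, cls, pat_id):
    """'Read' in CRUD: one row becomes one instance (or None)."""
    table = cls.__name__.lower()
    row = conn.execute(
        f"SELECT * FROM {table} WHERE pat_id = ?", (pat_id,)
    ).fetchone()
    return cls(*row) if row else None


conn = sqlite3.connect(":memory:")
create_table(conn, Patient)
insert(conn, Patient(1001, "Louise Chen", 72))
loaded = get(conn, Patient, 1001)
```

Production ORMs add type mapping, relationships, and change tracking on top of this basic idea.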

Answer 9

CRUD operation → SQL statement: Create → INSERT, Read → SELECT, Update → UPDATE, Delete → DELETE
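A minimal sketch of the four statements run against an in-memory SQLite database. The `patients` table and its columns are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patients (pat_id INTEGER PRIMARY KEY, name TEXT, deceased INTEGER)"
)

# Create -> INSERT
conn.execute("INSERT INTO patients VALUES (1001, 'Louise Chen', 0)")

# Read -> SELECT
name = conn.execute(
    "SELECT name FROM patients WHERE pat_id = 1001"
).fetchone()[0]

# Update -> UPDATE
conn.execute("UPDATE patients SET deceased = 1 WHERE pat_id = 1001")
deceased = conn.execute(
    "SELECT deceased FROM patients WHERE pat_id = 1001"
).fetchone()[0]

# Delete -> DELETE
conn.execute("DELETE FROM patients WHERE pat_id = 1001")
remaining = conn.execute("SELECT COUNT(*) FROM patients").fetchone()[0]
```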

Answer 10

Only include rows that match in both tables

Answer 11

Include all rows in the left table; display blanks (NULLs) for right-table columns where there is no match

Answer 12

Include all rows in the right table; display blanks (NULLs) for left-table columns where there is no match

Answer 13

Include all rows in both tables; blanks (NULLs) on whichever side has no match

Answer 14

Cartesian join - gives cross product of both tables
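The join types on the preceding cards can be compared side by side on two tiny tables. This is a sketch with made-up data; because older SQLite versions lack RIGHT and FULL OUTER JOIN, the right join is emulated here by swapping the tables in a LEFT JOIN, and the full join by taking the union of the left and right results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients (pat_id INTEGER, name TEXT);
CREATE TABLE medications (pat_id INTEGER, med TEXT);
INSERT INTO patients VALUES (1, 'A'), (2, 'B');
INSERT INTO medications VALUES (1, 'topiramate'), (3, 'acetaminophen');
""")


def rows(sql):
    return conn.execute(sql).fetchall()


# Inner join: only rows with a match in both tables.
inner = rows("SELECT p.name, m.med FROM patients p "
             "INNER JOIN medications m ON p.pat_id = m.pat_id")

# Left join: every patient, NULL (None) where no medication matches.
left = rows("SELECT p.name, m.med FROM patients p "
            "LEFT JOIN medications m ON p.pat_id = m.pat_id ORDER BY p.pat_id")

# Right join (emulated): every medication, NULL where no patient matches.
right = rows("SELECT p.name, m.med FROM medications m "
             "LEFT JOIN patients p ON p.pat_id = m.pat_id ORDER BY m.pat_id")

# Full outer join (emulated): union of the left and right results.
full = set(left) | set(right)

# Cross (Cartesian) join: every patient paired with every medication.
cross = rows("SELECT p.name, m.med FROM patients p CROSS JOIN medications m")
```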

Answer 15

select * from medications where pat_id in (select pat_id from patients where pat_age between 4 and 5)
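To see what the subquery returns, the statement can be run against toy data. The table contents below are invented for illustration; note that `between 4 and 5` is inclusive at both ends.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients (pat_id INTEGER PRIMARY KEY, pat_age INTEGER);
CREATE TABLE medications (pat_id INTEGER, med TEXT);
INSERT INTO patients VALUES (1, 4), (2, 5), (3, 40);
INSERT INTO medications VALUES (1, 'amoxicillin'), (3, 'topiramate');
""")

# The inner SELECT yields pat_id 1 and 2; the outer SELECT keeps only
# medication rows whose pat_id is in that set.
result = conn.execute(
    "select * from medications where pat_id in "
    "(select pat_id from patients where pat_age between 4 and 5)"
).fetchall()
```

Patient 2 is in the age range but has no medication rows, so only patient 1's medication comes back.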

Answer 16

• Wildcards: “%” matches any length (including zero characters), “_” must match exactly one character
– where lastName like ‘Smith_’ …would match “Smithe”, “Smiths”, “Smithy” …would NOT match “Smith” or “Smithers”
• LIKE is case sensitive in many DBMSs (the default depends on the product and its collation settings), so you may need to case-correct the string before matching
– UPPER([char]) → converts [char] to all upper-case
– LOWER([char]) → converts [char] to all lower-case
• This expression: where lower(lastName) like ‘desa%’ …would match “Desai”, “DeSai”, “desai”, “DeSalles”, etc…
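The wildcard rules can be checked with a small emulation of a case-sensitive LIKE. This is a sketch, not any database's actual matcher; `sql_like` is a hypothetical helper that translates the pattern into a regular expression and ignores ESCAPE clauses.

```python
import re


def sql_like(value: str, pattern: str) -> bool:
    """Case-sensitive LIKE: '%' = any run of characters, '_' = exactly one."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


# 'Smith_' needs exactly one extra character after "Smith".
matches = [n for n in ["Smithe", "Smiths", "Smith", "Smithers"]
           if sql_like(n, "Smith_")]

# Lower-casing first makes the comparison case-insensitive.
desa = [n for n in ["Desai", "DeSai", "desai", "DeSalles", "Smith"]
        if sql_like(n.lower(), "desa%")]
```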

Answer 17

• Structurally different from RDBMS • Optimized for rapid transactions of hierarchical data • In very simple terms, makes it easy to know “every attribute about one thing” (quickly retrieve all known information about patient 1001) • Computationally easy to traverse the tree. Can only traverse tree from root (top parent) node • Ex: “find all deceased patients who were ordered topiramate” would be “easier” in relational DB than hierarchical DB • Child nodes can only have 1 parent – Difficult to model relationship between child nodes (many-to-many, recursive relationships)

Answer 18

• Described in 1969 by Greenes, Pappalardo, Marble, and Barnett
• “MGH Utility Multi-Programming System”
• Design goals
– Flexible interface (e.g. lab systems, notes, variable output format)
– Variable length text-handling
– Hierarchical design to support complexity of clinical data and update/retrieval methods
– Multi-user access (original paper recognized potential for conflicting updates, need to have ACID transactions)
– Large storage capacity
– Low CPU usage
– A high-level programming language to make interface design less time-consuming, more efficient
• MUMPS renamed “M” in 1993 by M Technology Association, recognized by ANSI in 1995
• MUMPS and its derivatives, such as InterSystems Caché, are among the most widely used transactional DBs for EHRs today
• Design of MUMPS predated and anticipated the “NoSQL” and “schema-less DB” movement
• MUMPS is a hierarchical database

Answer 19

This programming snippet reads user input from teletype at the prompt “Unit No.” and assigns the value to variable X • Line 1.15 uses a ternary operator (IF-THEN-ELSE) to validate the format of the string X, in this case, that it’s the form of 3 digits, a dash, 2 digits, a dash, and two digits. • If the pattern does not match, it displays the phrase “ILLEGAL” and returns to 1.10
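The same validation (3 digits, a dash, 2 digits, a dash, 2 digits) can be sketched with a regular expression. This is a Python analogue of the check described above, not the original MUMPS code; the function name is an assumption.

```python
import re

# 3 digits, dash, 2 digits, dash, 2 digits, and nothing else.
UNIT_NO = re.compile(r"[0-9]{3}-[0-9]{2}-[0-9]{2}")


def validate_unit_no(x: str) -> str:
    """Return "OK" for a well-formed unit number, otherwise "ILLEGAL"."""
    return "OK" if UNIT_NO.fullmatch(x) else "ILLEGAL"
```

In the interactive program, an "ILLEGAL" result would send the user back to the prompt to try again.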

Answer 20

• “[H]ierarchically organized, symbolically accessed” structure – KEY/VALUE database • Local variables are defined in the scope of the program • Global variables referenced by an up arrow symbol (later became a caret “^”) • This code addresses the patient in the Active Patient Record (APR) global that matches a local variable “UN” (hospital unit number, or location of patient) and assigns the name and age: SET ^APR(UN, NAME)=“DOE, JOHN”, ^APR(UN, AGE)=“34” • This code traverses a patient’s record UNCHEMN (unit number, chemistry results, sodium), and assigns it a string value, concatenated from two local variables DATE and TEST: SET ^APR(UN,CHEM,N)=$DATE.”,”,TEST
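A rough Python analogue of a hierarchical, symbolically accessed global is a nested dictionary keyed by subscripts. This only models the tree structure in memory (a real MUMPS global is persistent and shared); the unit number, date, and test values below are hypothetical.

```python
from collections import defaultdict


def tree():
    """A dictionary whose missing keys become new subtrees automatically."""
    return defaultdict(tree)


APR = tree()                      # stands in for the ^APR global
UN = "123-45-67"                  # hypothetical unit number

# SET ^APR(UN, NAME)="DOE, JOHN", ^APR(UN, AGE)="34"
APR[UN]["NAME"] = "DOE, JOHN"
APR[UN]["AGE"] = "34"

# Store a chemistry result as "date,value" under UN -> CHEM -> N
DATE, TEST = "1969-01-01", "140"  # hypothetical local variables
APR[UN]["CHEM"]["N"] = DATE + "," + TEST

# Everything known about one patient is one subtree away.
subscripts = sorted(APR[UN].keys())
```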

Answer 21

Data represented as data objects • Support for more data types (graphics, photo, video, webpages) • Object DBs are usually integrated into programming language, so accessing data doesn’t require complex driver configuration • Increased use recently with development of web applications, most web application frameworks support interaction with OODBMS • Commercial example: InterSystems Caché – the OODBMS behind the Epic EHR

Answer 22

Standard toolset for describing aspects of databases, software, business processes • Class diagram to describe OO classes (name, hierarchies, attributes, methods) • Activity diagram ~ process flowchart, stepwise description of decisions, consequences, inputs, outputs

Answer 23

* Use Case diagram – describes actors, goals, dependencies * Entity-Relationship diagram – describes objects and their relationships * ER diagram can then be used to define RDBMS logical schema, which DB programmers can use to build physical schema of a DB

Answer 24

* Title of the class
* Attributes with type (optional default values)
* Methods with inputs or return types
* Inheritance indicated with a solid arrow pointing to parent class
“E-R” = Entity-Relationship
Note relationship between Customer and Purchase Order:
• A customer is an optional participant (the “O” symbol)
• Only one customer can participate (the “|” symbol)
• A customer can have multiple purchase orders (the three-pronged arrow symbol)
• Details the attributes of Customer and Purchase Order
Based on the E-R Diagram, a developer can:
• describe the logical schema for the database
• create physical schema and DDL/SQL code to create tables
• create object classes that map to database tables
• map object classes to DB tables using an ORM tool

Answer 25

Reliable DB Transactions * Atomicity – transaction is indivisible, it either happens or it doesn’t, no possibility of a partial transaction (ex: a DB transaction that updates 2 cells – it either does both or neither) * Consistency – transaction meets all constraint rules (can’t add a DATE to an INT field, can’t have a non-unique PK) * Isolation – RDBMS must be able to sequence simultaneous transactions (ex: 2 transactions to update the same cell. Both must take place, but not at same time, or else you have a write-write failure) * Durability – system must be tolerant to failure (ex: RDBMS has queued 200 transactions in memory, and power fails. How do you know if all 200 transactions took place?)
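Atomicity and consistency can be seen in a short SQLite sketch: a transaction that updates two cells, where the second update violates a constraint, leaves neither cell changed. The `meds` table is an assumption made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE meds (med_id INTEGER PRIMARY KEY, on_formulary INTEGER NOT NULL)"
)
conn.execute("INSERT INTO meds VALUES (1, 0), (2, 0)")
conn.commit()

try:
    # Using the connection as a context manager commits on success
    # and rolls back if an exception escapes the block.
    with conn:
        conn.execute("UPDATE meds SET on_formulary = 1 WHERE med_id = 1")
        # Violates NOT NULL, so the whole transaction is rolled back.
        conn.execute("UPDATE meds SET on_formulary = NULL WHERE med_id = 2")
except sqlite3.IntegrityError:
    pass

state = conn.execute(
    "SELECT med_id, on_formulary FROM meds ORDER BY med_id"
).fetchall()
```

The first update succeeded on its own, yet it is undone along with the failed one: both or neither.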

Answer 26

• Techniques of structuring tables to reduce redundancy, dependency between tables • Consider the “Flat File” example from earlier – One of the “Medication” cells has 2 entries – This violates a rule known as 1st Normal Form (1NF) • Goals of Normalization – To free the collection of relations from undesirable insertion, update, and deletion dependencies – To reduce the need for restructuring the collection of relations as new types of data are introduced, and thus increase the lifespan of application programs – To make the relational model more informative to users – To make the collection of relations neutral to the query statistics, where these statistics are liable to change as time goes by

Answer 27

A DB is said to be in 1NF if it can meet the following conditions: – Each cell contains a single value (our “Flat File” example breaks this rule) – No duplicate rows (and therefore each row is unique, can be described by a unique key) 1NF is susceptible to certain INSERT, DELETE, and UPDATE anomalies: – INSERT: A patient can’t have a Pharmacy without a prescription, unless we create a row with NULL values – DELETE: If the med “acetaminophen” is deleted, the Pharmacy “Glenbrook” ceases to exist – UPDATE: If the “Medstar” pharmacy chain changes their name, we have to edit multiple cells in this table

Answer 28

A DB is said to be in 2NF if it can meet the following conditions: – Table meets all criteria for 1NF AND – Every non-key attribute is dependent on the Primary Key (In 1NF example, there was no PK) 2NF is susceptible to certain INSERT, DELETE, and UPDATE anomalies: – INSERT: We can’t indicate a patient’s pharmacy unless there is a medication prescribed – DELETE: If you delete the last row that contains med “9906761”, you no longer know if it’s on formulary – UPDATE: to update the formulary status for a Medication_ID may require updating multiple rows

Answer 29

• A DB is said to be in 3NF if it can meet the following conditions: – Table meets all criteria for 1NF AND 2NF AND – No non-key column determines another non-key column (this is a very simplified definition) • Even 3NF is susceptible to certain INSERT, DELETE, and UPDATE anomalies! • Example: What if there was a registration error and patient 1001 and patient 1003 are actually the same patient? How can you avoid changing multiple cells in the first table?

Answer 30

* There are higher forms of normalization beyond 3NF, like “Boyce-Codd Normal Form” (abbreviated “BCNF”) * In practice, a database that is in 3NF can be called “normalized” * Normalized DBs are safe against most INSERT, UPDATE, and DELETE anomalies, however, to generate a report, you have to “denormalize” the data – requires lots of PK & FK “JOIN” logic in your query * For high-performance RDBMS apps, denormalized schema may be preferable to allow single-table lookup functions with an index, to avoid additional JOINs and full-table scans * Also, you need to denormalize data to aggregate the data into meaningful groups (eg: all meds for patient X) * That is often the role of reporting tools, analytics, data marts, etc
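A sketch of what "denormalizing for a report" looks like in practice: three normalized tables joined back together through their primary and foreign keys. The schema and rows are hypothetical, loosely modeled on the patient/medication examples in these cards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient (
    pat_id INTEGER PRIMARY KEY,
    name   TEXT NOT NULL
);
CREATE TABLE medication (
    med_id       INTEGER PRIMARY KEY,
    med_name     TEXT NOT NULL,
    on_formulary INTEGER NOT NULL      -- stored once per medication
);
CREATE TABLE prescription (
    pat_id INTEGER NOT NULL REFERENCES patient(pat_id),
    med_id INTEGER NOT NULL REFERENCES medication(med_id),
    PRIMARY KEY (pat_id, med_id)
);
INSERT INTO patient VALUES (1001, 'Louise Chen');
INSERT INTO medication VALUES (1, 'acetaminophen', 1), (2, 'topiramate', 1);
INSERT INTO prescription VALUES (1001, 1), (1001, 2);
""")

# "All meds for patient X": two JOINs rebuild the flat, report-style view.
report = conn.execute("""
    SELECT p.name, m.med_name
    FROM patient p
    JOIN prescription rx ON rx.pat_id = p.pat_id
    JOIN medication m    ON m.med_id = rx.med_id
    WHERE p.pat_id = 1001
    ORDER BY m.med_name
""").fetchall()
```

Changing a medication's formulary status now touches one row in `medication`, at the cost of the extra JOINs at report time.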

Answer 31

* Extract/Transform/Load (ETL) process gets transactional data into a format that is optimal for reporting / queries * Real life example: Epic EHR runs on InterSystems Caché Object DB for transactional processing. Has a nightly process to push data into an RDBMS (e.g. Oracle SQL) * Datamart is a smaller collection of related tables and data derived from the warehouse for a specific purpose, usually for analysis, report generation, spreadsheet, dashboards, etc. * Example: EHR may have a real-time transactional DB, nightly dump to a SQL data warehouse, and weekly extracts to a datamart to generate an updated enterprise asthma performance dashboard.

Answer 32

Ring, mesh, star, fully connected, line, tree, bus * Patterns of links between elements of a computer network * Choice of topology determines fault tolerance, redundancy, and scalability * Phone telephony is an example of a point-to-point connection, so is connection between your CPU and the hard-drive * Centralized – Star, Tree * Decentralized – Mesh, Fully Connected

Answer 33

“Handshake” • Acknowledgement • Payload • Multiple protocols exist for multiple purposes • Distinguish the Network Protocol from the Data or Encoding Standard – Ex: HL7 v2.x message is a pipe-delimited text file, v3 is an XML file, both can be transmitted via TCP/IP across a network
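To illustrate the difference between the encoding standard and the network protocol, here is a sketch that parses a pipe-delimited HL7 v2-style message after it has arrived, regardless of how it was transported. The sample message is invented, and the parser is deliberately naive: it ignores repeated segments, escape sequences, and the special field numbering of the MSH segment.

```python
# Hypothetical two-segment message; segments are separated by carriage returns.
SAMPLE = "\r".join([
    "MSH|^~\\&|LAB|HOSP|EHR|HOSP|202001011200||ORU^R01|MSG0001|P|2.3",
    "PID|1||1001||DOE^JOHN",
])


def parse_hl7(message: str) -> dict:
    """Map each segment ID to its list of pipe-delimited fields.

    A later segment with the same ID overwrites an earlier one.
    """
    segments = {}
    for line in message.split("\r"):
        if line:
            parts = line.split("|")
            segments[parts[0]] = parts
    return segments


pid = parse_hl7(SAMPLE)["PID"]
patient_id = pid[3]                       # PID-3: patient identifier
family, given = pid[5].split("^")[:2]     # PID-5: name components
```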

Answer 34

• Network protocol used for internet communications • TCP = transmission control protocol • IP = internet protocol • UDP (user datagram protocol) is an alternative to TCP – Key differences • TCP requires acknowledgement, UDP does not • TCP guarantees sequence/order of packets, UDP does not – TCP used where packet loss is unacceptable (will resend until acknowledgement or timeout) – UDP used where packet loss is less important (Voice over IP aka “VoIP” or streaming protocols)

Answer 35

• Host layers
– Data [transmission of message via HTTP, SMTP, FTP]
7. Application (HTTP is an application layer protocol)
6. Presentation / Syntax (e.g. character encoding in ASCII or data encryption)
5. Session (ex: a web conference may have a persistent session to synch audio and video)
– Segments
4. Transport [transmission of segments via TCP, UDP]
• Media layers
– TCP Packet / UDP Datagram
3. Network [transmission of packets via IP, DNS server, through routers]
– Frame
2. Data Link [transmission of frames via Ethernet or PPP]
– Bit
1. Physical [transmission of binary bits via copper wire, coaxial, or fiber optic cable]

Answer 36

7: Application – POP3/IMAP4 for email, HTTP for web content, FTP for file transfer, SSH for secure remote login, HTTPS for secure browsing
6: Presentation – Encryption, decryption, conversion to character sets (like ASCII)
5: Session – LDAP “Lightweight Directory Access Protocol” for authenticating users against X.500 directories
4: Transport – TCP “Transmission Control Protocol” (with acknowledgement), UDP “User Datagram Protocol” (without acknowledgement)
3: Network – IPv4, IPv6, DHCP “Dynamic Host Configuration Protocol” used to assign IP addresses to hosts, for example, when you connect to a wireless hotspot
2: Data Link – ARP “Address Resolution Protocol” used to map IP addresses to the link-layer (MAC) addresses of hosts on the local network
1: Physical – none

Answer 37

• 7: Application Layer – I send a letter to Bob. Bob receives and opens the letter. • 6: Presentation Layer – my letter is encoded in roman script using the English language • 5: Session Layer – I may include one or more letters per session (envelope) as long as I have appropriate postage • 4: Transport Layer – There is a typo in the address. The postal service marks it “recipient unknown” and sends it back. I get details of failure or confirmation of success (ex: registered mail) • 3: Network Layer – physical mail is sent by plane between cities. Pilot has no awareness of the final destination of the letter she is carrying • 2: Data Link Layer – Postal Service worker drives truck within city to deliver message • 1: Physical Layer – my letter is composed of ink or graphite on a piece of paper, folded and tucked into an envelope

Answer 38

Telephony has changed rapidly in past decade • Popularization of mobile, wifi, and VoIP • Video conferencing – Web conferencing / collaboration via H.264 Scalable Video Coding (SVC) • This is the same codec*, also known as MPEG-4 AVC (MPEG-4 Part 10), used for distribution of video content on IP, like YouTube *CODEC = coder / decoder – a compression algorithm used for a digital stream to transmit audio, video, etc. They can be “lossy” or “lossless”. Lower bitrate often means lower fidelity.

Answer 39

Short Range PAN - Personal Area Network – RFID (one way) and NFC (two way) – IEEE 802.15 – Wireless Personal Area Network and derivatives (Bluetooth & Infrared Data Association or IrDA) Historically maintained by IEEE as 802.15.1, but now maintained by Bluetooth SIG • Bluetooth 4.0 standard introduced Bluetooth Low Energy (aka Bluetooth Smart or Bluetooth LE) • Bluetooth LE has recently become very popular in health and fitness – Healthcare-specific profiles for blood pressure, thermometer, glucose monitor, continuous glucometry – Fitness-specific profiles for weight scale, running/cycling speed, heart rate, etc.

Answer 40

Medium Range WLAN – Wireless Local Area Network – 802.11b – Max 11 Mbps, interferes with other 2.4 GHz devices like microwaves, Bluetooth, cordless phones. Popularized WiFi – 802.11g – Max 54 Mbps, same band as 802.11b, same interference concern. – 802.11n – uses both 2.4 GHz and 5 GHz spectrum for max speeds of 54 Mbps and 600 Mbps respectively. Speed enhanced by MIMO (Multiple Input, Multiple Output) – 802.11ac – emerging draft standard for “gigabit wifi” – 1 Gbps

Answer 41

Alphabet Soup & Confusing Marketing Alert! – WiMax – CDMA – 3G – 4G – LTE

Answer 42

* IP Telephony (“Vocera” devices in healthcare) * SMS text messaging * Various “secure” texting solutions (HIPAA compliant) * RFID/NFC tagging of medical devices, patients

Answer 43

RFID = Radio Frequency Identification – 3 “flavors”: passive, active, battery-assisted passive – Passive relies on power from the reader, but reader has to emit 1000x stronger signal – Tags are read-only or read-write – RFID reader sends a signal to interrogate tag – RFID tag/chip responds with ID and other info – Like tags, readers can be active or passive – Uses: animal tags, “Smart cards”, asset tracking

Answer 44

NFC = Near Field Communication – Used to establish communication between two electronic devices – NFC tags passively store data, some can be written to by an NFC device. – Typical uses = phone-enabled payment (credit card information), PIN storage

Answer 45

Firewall: A set of hardware components (router, hosts, and combinations) and networks with appropriate software to restrict network traffic to conform to the security policy of the site. [Zwicky 2000]

Answer 46

Virtual private network: A VPN is a virtual network, built on top of existing physical networks, that can provide a secure communications mechanism for data and other information transmitted between networks. Because a VPN can be used over existing networks, such as the Internet, it can facilitate the secure transfer of sensitive data across public networks. [NIST, 2008]

Answer 47

Encryption: A method of converting an original message of regular text into encoded text. The text is encrypted by means of an algorithm (type of formula). If information is encrypted, there would be a low probability that anyone other than the receiving party who has the key to the code or access to another confidential process would be able to decrypt (translate) the text and convert it into plain, comprehensible text.

Answer 48

* Data Encryption Standard (DES), Triple DES (3DES), Advanced Encryption Standard (AES) * Secure Hash Algorithm (SHA-1, SHA-256), Message-Digest Algorithm (MD5), Hash-based Message Authentication Code (HMAC) * WiFi – Wired Equivalent Privacy (WEP) – Wi-Fi Protected Access (WPA) – Wi-Fi Protected Access 2 (WPA2) * Public key encryption
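Two of the listed primitives, SHA-256 and HMAC, are available in the Python standard library. The sketch below computes a digest and a keyed message authentication tag; the key and messages are placeholders, and the `verify` helper is a name chosen for this example.

```python
import hashlib
import hmac

# A hash maps any input to a fixed-size digest; no key is involved.
digest = hashlib.sha256(b"abc").hexdigest()

# An HMAC mixes in a shared secret, so only key holders can produce the tag.
key = b"shared-secret"          # placeholder key for illustration
tag = hmac.new(key, b"message", hashlib.sha256).hexdigest()


def verify(key: bytes, message: bytes, tag_hex: str) -> bool:
    """Recompute the tag and compare in constant time."""
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag_hex)
```

Hashes and HMACs protect integrity and authenticity; confidentiality needs an encryption algorithm such as AES.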

Answer 49

Covered entities must: 1. Ensure the confidentiality, integrity, and availability of all e-PHI they create, receive, maintain or transmit; 2. Identify and protect against reasonably anticipated threats to the security or integrity of the information; 3. Protect against reasonably anticipated, impermissible uses or disclosures; and 4. Ensure compliance by their workforce.

Answer 50

The Security Rule defines “confidentiality” to mean that e-PHI is not available or disclosed to unauthorized persons. The Security Rule's confidentiality requirements support the Privacy Rule's prohibitions against improper uses and disclosures of PHI. The Security rule also promotes the two additional goals of maintaining the integrity and availability of e-PHI. Under the Security Rule, “integrity” means that e-PHI is not altered or destroyed in an unauthorized manner. “Availability” means that e-PHI is accessible and usable on demand by an authorized person.

Answer 51

Requirements Expanded to Business Associates
• Provisions Include Data Restrictions, Disclosure and Reporting Requirements
– Limited Data Sets
– Restrictions on Disclosures
– Marketing
– Reporting Security Breaches
– Accounting of Disclosures
– Charitable Fundraising
– Sales of Protected Health Information
• Enforcement

Answer 52

* A breach is the unauthorized acquisition, access, use or disclosure of unsecured PHI which compromises the privacy, security or integrity of the PHI. Unsecured PHI is defined as PHI not secured through technology or method specified by the Secretary through guidance. * Must notify individuals within 60 days of discovery * Must resort to public media notification if > 500 records * Must notify the Secretary without unreasonable delay of breaches > 500 records * Must provide Secretary annual report of all breaches

Answer 53

ARRA: “Conduct or review a security risk analysis per 45 CFR 164.308 (a)(1) and implement security updates as necessary and correct identified security deficiencies as part of its risk management process.”

Answer 54

* HIPAA Privacy and Security rules now apply to Business Associates * Civil and criminal penalties also apply * Business Associate Agreements may need to be updated * BAA must demonstrate documented policies and procedures * BA must notify covered entity and Secretary of HHS of breaches

Answer 55

* Audit trails * Encryption * VPN * Software discipline * System assessment * Individual (strong) authentication of users * Firewalls

Answer 56

Authentication is any process of verifying the identity of an entity that is the source of a request or response for information in a computing environment. It is the linchpin for making decisions about appropriate access to health care information, just as it is for controlling legal and financial transactions. Generally, authentication is based on one or more of four criteria: 1. Something that you have (e.g., a lock key, a card, or a token of some sort); 2. Something that you know (e.g., your mother’s maiden name, a password, or a personal ID number); 3. Something related to who you are (e.g., your signature, your fingerprint, your retinal or iris pattern, your voiceprint, or your DNA sequence); or 4. Something indicating where you are located (e.g., a terminal connected to a hardwired line, a phone number used in a callback scheme, or a network address). Authentication = who you are. Authorization = What you can do
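The distinction between authentication and authorization can be sketched in a few lines. This is a toy, single-factor ("something you know") example; the user store, permission names, and iteration count are all assumptions, and a real system would use a vetted identity framework.

```python
import hashlib
import hmac
import os


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a password hash with PBKDF2-HMAC-SHA256 (illustrative settings)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


_salt = os.urandom(16)
USERS = {   # hypothetical user store
    "nurse1": {
        "salt": _salt,
        "pw_hash": hash_password("correct horse", _salt),
        "permissions": {"read_chart"},
    }
}


def authenticate(username: str, password: str) -> bool:
    """Authentication: is this really the user they claim to be?"""
    user = USERS.get(username)
    if user is None:
        return False
    candidate = hash_password(password, user["salt"])
    return hmac.compare_digest(candidate, user["pw_hash"])


def authorize(username: str, permission: str) -> bool:
    """Authorization: may this (already authenticated) user do this?"""
    user = USERS.get(username)
    return user is not None and permission in user["permissions"]
```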

Answer 57

* Layers of protection * Dynamic, moves with changes * Comprehensive * Commensurate with asset classification, value-adjusted, cost-effective * Consistent with institutional mission and operation * Clear, assigned responsibilities * Metrics - Defense in-depth

Answer 58

* There are NO perfectly secure information systems * We have to identify risks specific to an asset based upon possible threats, and then * Implement and modify security controls to reduce risks, so that * Residual risks are at an acceptable level. * Threats may become security incidents, which lead to sanctions and modified security controls Acknowledge: security controls and ease of access often work against each other

Answer 59

• Decentralized management
• Immutable Audit Trail
• Data Provenance
• Robustness/Availability
• Security/Privacy

Answer 60

* Types – Administrative, Physical, Technical * Administrative examples – Acceptable Internet Use policy – Password management policy – Use & protection of SSN in clinical research data – Business associate agreement, contracts * Physical examples – Badge based entry into sensitive areas – Cameras, RFID based protection in Nursery – Dual lock system for access to pathogens, access to animal labs – Essential data center and data closet security

Answer 61

• Network – Firewalls – Intrusion detection and prevention systems (IDPS) – Network access control (NAC) – Virtual private networks (VPN) – Data leakage protection (DLP)
• Systems, applications – Authentication, Authorization, Audit logs (Security Event/Incident Management) – Patching, up-to-date rules in Anti virus/spyware – Host based Firewalls, IDPS, DLP – Encryption, encryption, encryption

Answer 62

TECHNICAL: • Individual authentication of users • Access controls • Audit trails • Physical security and database recovery • Protection of remote access points • Protection of external electronic communications • Software discipline • System assessment
ORGANIZATIONAL: • Security and confidentiality policies • Security and confidentiality committees • Information security officers • Education and training programs • Sanctions • Improved authorization forms • Patient access to audit logs

Answer 63

Physical: intrusion, fire, power, seismic protection
Network: firewalls, WEP (Wired Equivalent Privacy)
Social: phishing, malware, spoofing
Software: design, updates, authentication
Data: backup, restore, redundancy

Answer 64

• Malware – Malicious software.
• Spoofing attack – One person or program successfully masquerades as another by falsifying data, thereby gaining an illegitimate advantage.
• Phishing – A deception perpetrated via email where recipients are enticed into following an attacker's instructions. Following the instructions may take the reader to malicious sites crafted to impersonate valid ones and steal credentials.
• Denial of service – Attempts to prevent legitimate users from accessing information or services.
• Ransomware – A type of malware that infects computer systems, restricting users’ access to the infected systems. Users are told that unless a ransom is paid, access will not be restored.

Answer 65

* Methods – Self-assessment of asset owners – Assessed by internal group * Security, Risk management, Internal audit – Assessed by external group * Vulnerability scanners, ethical “white-hat” hackers, external auditors * Measurement – Qualitative – High, medium, low – Quantitative – a derived numeric score * Management – Risk acceptance – Risk mitigation – Risk transference * Examples * DMCA violation by students and staff * VVIP access * Unencrypted PHI on a desktop * Malicious user prints identity for identity theft * Breach notification process * Office for Civil Rights (OCR) Audit preparation • Risk management portfolio • Awareness education • Senior management support

Answer 66

Data integrity can refer to many things, such as – Data provenance – where does your data come from? – Data completeness and correctness – must ensure for re-use • Mapping – to and from controlled terminologies, as discussed in 3D1-3 • Manipulation – retrieval from databases, as discussed in 3A1-2 – Querying – SQL – Reporting

Answer 67

* Narrative – recording by clinician * Numerical measurements – blood pressure, temperature, lab values * Coded data – selection from a controlled terminology system * Textual data – other results reported as text * Recorded signals – EKG, EEG * Pictures – radiographs, photographs, and other images * Metadata – information about the data that gives context and detail, e.g., electronic header information in notes

Answer 68

* Historically performed by a Clinical Coding Specialist (CCS) – Major purpose has historically been for reimbursement (Scott, 2008) * A core issue in biomedical informatics has been how to generate and use coded data for other purposes * Trade-offs – Standardization of language vs. freedom of expression – Time to narrate vs. code * Other difficulties – Creating and maintaining coding systems – Structuring coding systems to capture meaning

Answer 69

* General categories of data entry – Free-form entry by historical methods * Writing • Dictation • Typing – Structured (menu-driven) data entry by mouse, typing, or (in past) pen – Speech recognition for either of above – “Scribes” – people who enter data for physicians (Baugh, 2012) Structured or menu-driven data entry • Many attempts from old (Greenes, 1970; Cimino, 1987; Bell, 1994) to new (Oceania; OpenSDE – Los, 2005) • Can be done via mouse or pen, with typing • Benefits – Data codified for easier retrieval and analysis – Reduces ambiguity if language used consistently • Drawbacks – In general, more time-consuming – Requires exhaustive vocabulary – Requires dedication to use by clinicians

Answer 70

• Most common use is for narration – e.g., computer dictation of clinical notes • Continuous speech recognition now commercial reality • Many established systems on the market that operate on front end (used by clinician) or back end (process dictations) (Brown, 2008) – An advantage to front-end systems is instant availability of dictated content – Problem with back-end systems is editing task transferred from professional transcriptionist to clinician-author

Answer 71

* Modern speech recognition systems are improved but still have challenges – Systems have output lag behind user input, which can be distracting – Require area with minimum of background noise and where patient privacy can be protected * Most common types of errors systems make include (Zafar, 2004) – Enunciation errors from mispronunciation – Dictionary errors from missing terms – Suffix errors from misrecognition of appropriate tenses of a word – Added or deleted words – Homonym errors from substitution of phonetically identical words * Recent systematic review of research studies (Johnson, 2014) found – Productivity – report turnaround time faster – Quality – human transcription slightly more accurate, varies by setting and system – System design – macros and templates improve turnaround time, accuracy, and completeness

Answer 72

More limited form of EHR – Can be separate from EHR or extract of data from it (Dreyer, 2009; Hersh, 2011) • Typically oriented to one or small number of diseases, most often chronic diseases • Usual functions – Patient reports – status of monitored conditions – Exception reports – outliers, overdue for care – Aggregate reports – how is care team delivering recommended care

Answer 73

• Data mining (also called Knowledge Discovery in Databases, or KDD) is process of discovering patterns (or knowledge) in large databases (Bellazzi, 2008) • Related (and now more commonly used) term is analytics (Hersh, 2014) – Defined as “the extensive use of data, statistical and quantitative analysis, explanatory and predictive models, and fact-based management to drive decisions and actions” (Davenport, 2007)

Answer 74

• Prescriptive: optimization • Predictive: predictive modelling, forecasting, simulation • Descriptive: alerts, query/drill-down, ad hoc reporting, standard reporting

Answer 75

• Big Data: 4Vs (volume, velocity, variety, variability) • Machine learning: area of computer science focused on systems/algorithms that learn from data – Supervised: learns from labeled training data – Unsupervised: finds structure in unlabeled data, with no labeled training examples • Text mining: applying data mining to unstructured textual data

Answer 76

Data provenance – origin and trustworthiness (Buneman, 2010)

Answer 77

Business intelligence – use of data to obtain timely, valuable insights into business and clinical data (Adams, 2011)

Answer 78

Precision medicine (IOM, 2011; Collins, 2015; Ashley, 2015) – “prevention and treatment strategies that take individual variability into account” (Collins, 2015)

Answer 79

– Data integration is merging of data from different sources into a single integrated whole – Data interfacing is how data must be transformed to allow real or virtual integration • Dealing with multiple identifiers • Anonymization of data

Answer 80

* 87% of US population uniquely identified by five-digit zip code, gender, and date of birth (Sweeney, 2002) * One analysis identified Governor William Weld of Massachusetts in health insurance database for state employees by purchasing voter registration for Cambridge, MA for $20 and linking zip code, gender, and date of birth to “de-identified” medical database (Sweeney, 1997) * Genomic data can aid re-identification in clinical research studies (Malin, 2005; Lumley, 2010) * Social security numbers can be predicted from public data (Acquisti, 2009) * Dates and results in lab data facilitate re-identification (Cimino, 2012)
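A minimal sketch of the linkage idea behind these findings, assuming records are plain dictionaries. The field names, helper names, and all data values below are invented for illustration and do not come from any cited study.

```python
from collections import Counter

# Quasi-identifiers: fields that are not names but can single a person out.
QUASI_IDENTIFIERS = ("zip", "gender", "dob")


def key(record):
    """Return the quasi-identifier tuple for one record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def unique_fraction(records):
    """Fraction of records whose quasi-identifier combination is unique."""
    if not records:
        return 0.0
    counts = Counter(key(r) for r in records)
    return sum(1 for r in records if counts[key(r)] == 1) / len(records)


def link(medical, voters):
    """Map record_id -> name when a key is unique in BOTH data sets."""
    medical_counts = Counter(key(m) for m in medical)
    names_by_key = {}
    for v in voters:
        names_by_key.setdefault(key(v), []).append(v["name"])
    linked = {}
    for m in medical:
        names = names_by_key.get(key(m), [])
        if medical_counts[key(m)] == 1 and len(names) == 1:
            linked[m["record_id"]] = names[0]
    return linked


# Toy, made-up data: a "de-identified" medical table and a public voter list.
medical = [
    {"record_id": "m1", "zip": "00001", "gender": "F", "dob": "1950-01-01"},
    {"record_id": "m2", "zip": "00001", "gender": "M", "dob": "1960-02-02"},
    {"record_id": "m3", "zip": "00002", "gender": "M", "dob": "1970-03-03"},
    {"record_id": "m4", "zip": "00002", "gender": "M", "dob": "1970-03-03"},
]
voters = [
    {"name": "Voter One", "zip": "00001", "gender": "F", "dob": "1950-01-01"},
    {"name": "Voter Two", "zip": "00002", "gender": "M", "dob": "1970-03-03"},
]

print(unique_fraction(medical))
print(link(medical, voters))
```

Records m3 and m4 share a key, so neither is linked; only a combination that is unique on both sides yields a confident match.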

Answer 81

• Definition of the Discipline – Study of people and computers – Discipline combines computer science, behavioral science, psychology, design, human factors analysis, and more • Applicability to Healthcare – Increasing awareness that HCI limitations responsible for medical errors – Cognitive overload, “alert fatigue” – One of the “grand challenges” facing Clinical Decision Support

Answer 82

1. Use Error Root Cause 2. Define risk parameters (frequency, severity, detectability, complexity...) 3. Evaluate indicators 4. Adverse events

Answer 83

* Wrong Patient – user has 2 charts open, enters orders on the wrong one * Wrong Mode for Action – user tries to enter 100 mg (direct dose) but accidentally enters 100 mg/kg (weight-based dose) * Inaccurate Data Display – lab value is truncated in a report, causing user to come to an incorrect conclusion * Incomplete Data Display – summary view of vital signs only shows last value per shift, user overlooks the max value within that shift * Non-standard measurement, convention, or term – weight-based medications calculated using metric units, but order entry screen shows weight in English units * Reliance on user recall – vaccine administration documentation screen requires a lot number. Lot number is visible on previous screen, but not on this one. Users mistype info or type in nonsense data as a result * Inadequate feedback – user attempts to order med requiring 0.1 cc precision and submits incorrect dose because system silently rounds dose to nearest 0.5 cc, calculation not transparent to user * Corrupted Data Storage – user enters orders in discharge workflow then clicks “next”, but orders are not submitted because user did not click “sign” before proceeding to next step.

Answer 84

Predictive Model for HCI – expresses user response time as a function of number of possible responses (n) – predicts user response to hierarchical menus, response to finding correct option among an unfamiliar list (like a non-QWERTY keyboard) – Law predicts: RT = a + b*log2(n)
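A small sketch of the formula above, which is commonly known as Hick's law (Hick-Hyman law). The constants `a` and `b` are fitted empirically in practice; the values used here are arbitrary placeholders, and the function name is hypothetical.

```python
import math


def hick_response_time(n: int, a: float, b: float) -> float:
    """RT = a + b * log2(n) for n equally likely choices."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return a + b * math.log2(n)


# Placeholder constants (seconds); not measured values.
A, B = 0.2, 0.15

# One flat menu of 64 options vs. two nested menus of 8 options each.
flat = hick_response_time(64, A, B)
nested = 2 * hick_response_time(8, A, B)
print(flat, nested)
```

Under this model the nested menu pays the fixed cost `a` twice while the logarithmic terms add up to the same amount, so the flat menu comes out ahead by exactly `a`.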

Answer 85

HCI Predictive Model – time to task completion is the sum of the time spent key-stroking, pointing, homing, drawing, mental operator (thinking), system response operator (waiting). – Used to anticipate which functions are most amenable to shortcuts and “hotkeys”.
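The sum of operators can be sketched as below. The per-operator times are commonly quoted textbook estimates and are used here only as assumptions; drawing and system response times vary by task, so they are passed in explicitly. The function name and the operator strings are illustrative.

```python
# Assumed average times in seconds for each operator code.
OPERATOR_SECONDS = {
    "K": 0.2,   # keystroke or button press
    "P": 1.1,   # point with a mouse to a target
    "H": 0.4,   # home hands between keyboard and mouse
    "M": 1.35,  # mental preparation (thinking)
}


def task_time(operators: str, drawing: float = 0.0, response: float = 0.0) -> float:
    """Total time = sum of operator times + drawing time + system response time."""
    return sum(OPERATOR_SECONDS[op] for op in operators) + drawing + response


# Same action done two ways: via the mouse (think, home, point, click)
# or via a two-key hotkey (think, two keystrokes).
by_mouse = task_time("MHPK")
by_hotkey = task_time("MKK")
print(by_mouse, by_hotkey)
```

Comparing the two totals is how the model flags functions where a hotkey would save the most time.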

Answer 86

– Time it takes to track to an object with a cursor is a function of distance traveled (D) and width of the target (W). – Moving a cursor a large distance to hit a narrow target has a high Index of Difficulty (ID): ID = log2(2D/W)
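A short sketch of the Index of Difficulty calculation. The linear movement-time form `a + b * ID` is the usual companion to this formula (Fitts's law), but the constants below are placeholders, not measured values, and the function names are hypothetical.

```python
import math


def index_of_difficulty(distance: float, width: float) -> float:
    """ID = log2(2D / W), in bits."""
    if distance <= 0 or width <= 0:
        raise ValueError("distance and width must be positive")
    return math.log2(2 * distance / width)


def movement_time(distance: float, width: float, a: float, b: float) -> float:
    """Assumed linear model: time = a + b * ID."""
    return a + b * index_of_difficulty(distance, width)


# Far, narrow target vs. near, wide target (units cancel, e.g. pixels).
hard = index_of_difficulty(distance=256, width=16)
easy = index_of_difficulty(distance=8, width=8)
print(hard, easy)
```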

Answer 87

* Exhaustive description of the states and transitions involved in using a mouse. * Three states: out of range, tracking, and dragging. To transition from “out of range” to “tracking” you lift or put down the mouse. To transition from “tracking” to “dragging”, you depress or release the button.
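The three states and their transitions can be written as a small state machine. This is a sketch; the event names (`put_down`, `lift`, `press`, `release`) are labels chosen for illustration, and events that do not apply in the current state are simply ignored.

```python
from enum import Enum


class State(Enum):
    OUT_OF_RANGE = "out of range"
    TRACKING = "tracking"
    DRAGGING = "dragging"


# (current state, event) -> next state
TRANSITIONS = {
    (State.OUT_OF_RANGE, "put_down"): State.TRACKING,
    (State.TRACKING, "lift"): State.OUT_OF_RANGE,
    (State.TRACKING, "press"): State.DRAGGING,
    (State.DRAGGING, "release"): State.TRACKING,
}


def step(state: State, event: str) -> State:
    """Apply one event; unknown or inapplicable events leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def run(events, start: State = State.OUT_OF_RANGE) -> State:
    """Apply a sequence of events and return the final state."""
    state = start
    for event in events:
        state = step(state, event)
    return state


print(run(["put_down", "press"]).value)
```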

Answer 88

* Hands are not used equally. Because one hand is dominant, each hand has a distinct role. * The model describes, for example, why left-handed users are ill-served by modern 101-key keyboards * Some keys are unilateral and only available on the left side, like CTRL, ESC, TAB, FN, putting those who use the mouse on the left side at a disadvantage * Similarly, most “acknowledge” buttons on modal dialogues are on the bottom right – more mouse travel and possible error for lefties.

Answer 89

1. Testing – Coaching – Thinking-aloud – Eye-tracking / Click-tracking – Performance 2. Inspection – Cognitive walkthrough – Heuristic evaluation 3. Inquiry – Field observation – Focus groups / Interviews – Surveys – Usage logs

Answer 90

* Coaching method – user asked to perform a task, allowed to ask any questions they want of an expert coach. Coach keeps track of questions – useful in determining training and documentation needs. * Thinking-aloud – user attempts to complete a task and speaks aloud each step they are doing, along with articulating difficulties, confusion, realizations. * Eye-tracking / Click-tracking – electronic or manual measurement of use, can generate “heat map” of an interface * Performance measurement – Ideally, 5-8 users attempt to complete a specified set of tasks. Evaluator measures performance such as: – Task completion time and rate – Recovery rate (user makes a misstep but recovers) – Failure rate (user unable to complete task) – Frequently and never used features – For tasks with multiple ways of accomplishing the same thing, measuring which way users choose to do it.
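A sketch of how the performance measures might be tallied from test sessions. The `Session` fields, the sample numbers, and the choice to express each rate as a share of all sessions are assumptions made for this example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Session:
    completed: bool   # did the user finish the task?
    misstep: bool     # did the user make at least one misstep?
    seconds: float    # time spent on the task


def summarize(sessions):
    """Return completion, recovery, and failure rates plus mean completion time."""
    total = len(sessions)
    done = [s for s in sessions if s.completed]
    recovered = [s for s in done if s.misstep]
    return {
        "completion_rate": len(done) / total,
        "recovery_rate": len(recovered) / total,
        "failure_rate": (total - len(done)) / total,
        "mean_completion_seconds": mean(s.seconds for s in done) if done else None,
    }


# Made-up results for five test users on one task.
sessions = [
    Session(True, False, 40),
    Session(True, True, 65),
    Session(False, True, 120),
    Session(True, False, 35),
    Session(True, True, 60),
]
summary = summarize(sessions)
print(summary)
```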

Answer 91

• Cognitive walkthrough – using low-fidelity paper models or wireframes – Will the users try to achieve the right effect? Ex: system requires a weight before entering medication order. Will the user know to enter the weight? – Will the user notice that the correct action is available? Ex: Button to submit/sign order is in an inconspicuous location. Will user see it? – Will the user associate the correct action with the effect to be achieved? Ex: are the labels intuitive and easy to follow? – If the correct action is performed, will the user see that progress is being made toward solution of the task? Ex: does system give feedback that a step in task was completed? • Heuristic evaluation – design principle or “rule of thumb” used to critique interface. – Example heuristic for web design: Jakob Nielsen’s Heuristic List

Answer 92

Principles of User Interface Design 1. The system status should be visible. 2. There should be a match between the terminology and concepts used by the system and those in common use in the “real world”. 3. The system should give users control and freedom, with a clear way to undo, redo, or exit a task. 4. The system should be consistent and use standards where possible. 5. Built-in error prevention is better than a clever error message. 6. A user’s recognition of icons and pathways is stronger than their recall. By showing options that facilitate user action, one can avoid forcing users to memorize sequences of menus or keystrokes. 7. The system should support novice and expert users, with shortcuts and “accelerators”. 8. Dialogues should be sparingly written; design should be minimalist. 9. The system should help users to recognize, diagnose, and recover from errors. 10. The system should provide succinct and context-sensitive help and documentation.

Answer 93

• Field observation • Focus groups / interviews • Surveys • Review of usage logs

Answer 94

* Method of HCI Evaluation that does not require large number of personnel or budget, described by Jakob Nielsen. * Minimum 5 testers perform a modified version of testing and inspection – modified think-aloud, focus on qualitative aspects of interface – heuristic evaluation – low-fidelity prototypes to test one process at a time, rapid iterations

Answer 95

* Technique to design and gather feedback about an interface without actually having to code it * As in SDLC, identifying errors early is less costly * Can be paper based, designed as “wireframe” designs, or can even mimic full applications using drag-and-drop software (e.g. Balsamiq, Proto.io, atomic.io) * But there are advantages to keeping it “lo fi” * LOW FI vs HI FI

Answer 96

Lee Milligan: 1. Low-Fi / paper prototype 2. Higher fidelity prototype 3. Finished product No design standards

Answer 97

* Step 1 – EHR application analysis – Who are users? – What is their work environment? (lighting, noise, hardware) – What do they do? – What does the interface look like? – What mistakes might they make? – What evaluation has been done to mitigate mistakes and improve usability? * Step 2 – EHR User Interface Expert Review – Two-person heuristic review – Clinical subject matter expert review for potential errors * Step 3 – User Interface Validation Test – Performance measurement (task completion and associated metrics) – Post-testing interview

Answer 98

* Emerging field within health informatics * Study of human processing mechanisms: how and why people make decisions • National Center for Cognitive Informatics & Decision Making in Healthcare – Located at UT Health Science Center at Houston, 9 partners – Site of a Strategic Health IT Advanced Research Project (SHARPC) – Led by Dr. Jiajie Zhang

Answer 99

Five Projects: 1. Project 1: Work-centered Design of Care Process Improvements in HIT A. EHR Usability B. EHR Workflow 2. Project 2 A. Cognitive Foundations for Decision Making: Implications for Decision Support B. Modeling of Setting-Specific Factors to Enhance CDS Adaptation 3. Project 3: Automated Model-based Clinical Summarization of Key Patient Data 4. Project 4: Cognitive Information Design and Visualization: Enhancing Accessibility and Understanding of Patient Data

Answer 100

EHR Usability by NCCD • TURF: Task, User, Representation, Function • Main dimensions of usability – Useful: supports the work domain – Usable: easy to learn, use, adapt – Satisfying: good subjective experience • TURF usability software

Part III Flashcards

(124 cards)