Part III Flashcards
What is a half adder?
A circuit that adds two single bits using an XOR gate and an AND gate; S (sum) = A XOR B; C (carry) = A AND B
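A minimal Python sketch of the half adder, assuming the two input bits are named a and b (names chosen here for illustration). It prints the full truth table.

```python
def half_adder(a: int, b: int) -> tuple[int, int]:
    """Add two single bits and return (sum, carry)."""
    s = a ^ b  # XOR gives the sum bit
    c = a & b  # AND gives the carry bit
    return s, c


# Print the truth table for all four input combinations
for a in (0, 1):
    for b in (0, 1):
        s, c = half_adder(a, b)
        print(f"A={a} B={b} -> S={s} C={c}")
```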
What is imperative and procedural programming?
Ex: FORTRAN, BASIC, Pascal, C, C++
• Programs are a list of tasks, subroutines
• Like a recipe: each step is a sequenced instruction
• Procedural languages allow the programmer to define functions and subroutines (procedures) and reuse them throughout the program
What is object oriented programming?
Ex: Java, Objective-C, VisualBasic.NET
• Programs are a collection of interacting objects
• Objects can have independent functions, characteristics, and states
• A Class is a “blueprint” for an object – it describes the functions and characteristics that all objects in that class share in common
– Ex: “the class of all dogs”
• A specific member of a class is an Instance
– Ex: “a beagle is an instance of a dog”
Describe OOP terminology
- Objects have Attributes (adjectives) and Methods (verbs)
- Instantiation involves creating a new object and setting initial parameters for the Attributes and Methods
- Ex: the class of “Dog” may have attributes for “fur color”, “name”, “breed” and methods for “bark” and “roll over”
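The Dog example can be sketched in Python as follows. The attribute values and the text returned by each method are made up for illustration.

```python
class Dog:
    """Class = blueprint shared by all dogs."""

    def __init__(self, name: str, breed: str, fur_color: str):
        # Attributes (adjectives), set at instantiation
        self.name = name
        self.breed = breed
        self.fur_color = fur_color

    # Methods (verbs)
    def bark(self) -> str:
        return f"{self.name} says Woof!"

    def roll_over(self) -> str:
        return f"{self.name} rolls over"


# Instantiation: create a specific object and set its initial attribute values
snoopy = Dog(name="Snoopy", breed="beagle", fur_color="white")
print(snoopy.bark())
```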
What is encapsulation?
• Classes can demonstrate Encapsulation: they can keep their attributes and methods private and control access by other parts of the program, either allowing or restricting external invocation / modification
What is composition?
Composition
– Objects can be composed from smaller objects
– Ex: if you have a class that defines a “Point” as an X,Y coordinate, you can create a class for “Line” by reusing two “Point” objects
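A short sketch of the Point/Line example in Python. The length method is an added illustration, not part of the original card.

```python
import math
from dataclasses import dataclass


@dataclass
class Point:
    x: float
    y: float


@dataclass
class Line:
    # A Line is composed of two Point objects rather than four raw numbers
    start: Point
    end: Point

    def length(self) -> float:
        return math.hypot(self.end.x - self.start.x, self.end.y - self.start.y)


line = Line(Point(0, 0), Point(3, 4))
print(line.length())
```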
What is inheritance?
Inheritance
– Objects can inherit their structure from parent objects and extend their functionality
– Ex: Two classes for “Manager” and “Staff” may inherit the same basic structure of a parent class “Employee”.
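One possible Python sketch of the Employee example. The specific attributes (employee_id, reports, manager) are assumptions added for illustration.

```python
class Employee:
    """Parent class: the structure shared by all employees."""

    def __init__(self, name: str, employee_id: int):
        self.name = name
        self.employee_id = employee_id

    def describe(self) -> str:
        return f"{self.name} (#{self.employee_id})"


class Manager(Employee):
    """Inherits name/employee_id/describe and extends with a list of reports."""

    def __init__(self, name: str, employee_id: int):
        super().__init__(name, employee_id)
        self.reports = []


class Staff(Employee):
    """Inherits the same structure and extends with a reference to a manager."""

    def __init__(self, name: str, employee_id: int, manager=None):
        super().__init__(name, employee_id)
        self.manager = manager


boss = Manager("Ana", 1)
worker = Staff("Ben", 2, manager=boss)
boss.reports.append(worker)
print(worker.describe(), "reports to", worker.manager.describe())
```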
What is polymorphism?
Polymorphism
– Objects can override their parent attributes/methods
– Details of the subclass implementation will determine which attribute/method is invoked
– Ex: class “Animal” may have a “makeNoise” method. SubClasses of “Animal” (Dog, Cat, Mouse) will all have a “makeNoise” method, but may implement it differently and can override the parent method
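The Animal example might look like this in Python (method renamed make_noise to follow Python naming; the noises are illustrative). The same call runs a different implementation depending on the subclass.

```python
class Animal:
    def make_noise(self) -> str:
        return "(some generic noise)"


class Dog(Animal):
    def make_noise(self) -> str:  # overrides the parent method
        return "Woof"


class Cat(Animal):
    def make_noise(self) -> str:
        return "Meow"


class Mouse(Animal):
    def make_noise(self) -> str:
        return "Squeak"


# The caller does not need to know which subclass it is dealing with
for animal in (Dog(), Cat(), Mouse(), Animal()):
    print(type(animal).__name__, animal.make_noise())
```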
Compare accessors vs. mutators
Accessors (“getters”) are methods used to retrieve variable state
• Mutators (“setters”) are methods used to change variable state
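A small sketch of a getter and a setter guarding a private attribute. The Patient class and the validation rule are hypothetical. Note that in Python the leading underscore is only a convention for "private", whereas languages such as Java enforce it with a keyword.

```python
class Patient:
    def __init__(self, name: str, weight_kg: float):
        self._name = name            # "private" by convention
        self._weight_kg = weight_kg

    def get_weight_kg(self) -> float:
        """Accessor: retrieve state without exposing the variable itself."""
        return self._weight_kg

    def set_weight_kg(self, value: float) -> None:
        """Mutator: change state, with a chance to validate first."""
        if value <= 0:
            raise ValueError("weight must be positive")
        self._weight_kg = value


p = Patient("DOE, JOHN", 70.0)
p.set_weight_kg(72.5)
print(p.get_weight_kg())
```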
What are types of qualitative data?
• Nominal aka “Categorical”
– Think “data with names”
– Mutually exclusive, unordered, discrete categories of data, such as patient smoking status
– Mode (most frequently occurring value) is the only measure of central tendency
• Ordinal
– Think “ordered data”
– Data that have a natural ordering
– Ex: “Small”, “Medium”, “Large”
– Ex: Asthma severity (“Intermittent”, “Mild Persistent”, “Moderate Persistent”, “Severe Persistent”)
– Ex: Likert scales of patient satisfaction, Pain scales
– Median (middle-ranked value) and Mode can be used to measure central tendency
What is quantitative data?
• Interval
– Data where the intervals between values represent the same distance
– Example: Year, Temperature. The difference between 32°F and 33°F is the same as the difference between 76°F and 77°F
– Arithmetic mean, median, and mode can be used as measures of central tendency
• Ratio
– Allows for additional comparison because “zero” means something and is not arbitrarily chosen
– Ex: Area, distance. Temperature in Kelvin is a ratio (because 300K has twice as much heat as 150K), but temperature in Fahrenheit is not (because zero Fahrenheit does not mean zero heat)
– Geometric mean (nth root of the product of n values), arithmetic mean, median and mode are allowed
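To make the allowed measures concrete, here is a sketch using Python's standard statistics module. All sample values are made up, and the rank table used for the ordinal median is an assumption of this example.

```python
import statistics

# Nominal: only the mode is meaningful
smoking_status = ["never", "former", "never", "current", "never"]
nominal_mode = statistics.mode(smoking_status)

# Ordinal: median is meaningful once the natural ordering is spelled out
rank = {"Small": 0, "Medium": 1, "Large": 2}
sizes = ["Small", "Large", "Medium", "Small", "Medium"]
ordered = sorted(sizes, key=rank.get)
ordinal_median = ordered[len(ordered) // 2]  # middle-ranked value (odd count)

# Interval: arithmetic mean is meaningful
temps_f = [32.0, 33.0, 76.0, 77.0]
interval_mean = statistics.mean(temps_f)

# Ratio: geometric mean (nth root of the product of n values) is also allowed
distances_km = [1.0, 10.0, 100.0]
ratio_geo_mean = statistics.geometric_mean(distances_km)

print(nominal_mode, ordinal_median, interval_mean, ratio_geo_mean)
```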
Describe data transformations?
• Interval → Ratio
– Your application stores a value for “year diagnosed with cancer” – that’s an Interval
– In your software, you convert this to “years since diagnosis of cancer” – that’s a Ratio
• Interval → Interval or Ratio → Ratio
– Example: Image processing, scaling an image: take every 2x2 cluster of pixels and calculate the average color value, assign it to a single 1x1 pixel.
• Interval → Ordinal
– Can group ranges of a variable using “Binning” techniques, commonly used to smooth effects of minor observation errors
– Take a continuous variable like “age in years”, group it into categories such as “Neonate”, “Infant”, “Toddler”, “Child”, “Adolescent”, “Adult” (implied order)
Be aware what you gain or lose when you transform data. When you compress image / audio files by downsampling, you can lose data. When you bin continuous data into ordinal buckets, you lose data.
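A sketch of binning age into ordered categories. The cut-off ages below are hypothetical (the card names the categories but not their boundaries). The last lines show the data loss: two different ages become indistinguishable after binning.

```python
import bisect

# Hypothetical upper bounds (in years) for each category except the last
UPPER_BOUNDS = [1 / 12, 1, 3, 12, 18]
LABELS = ["Neonate", "Infant", "Toddler", "Child", "Adolescent", "Adult"]


def bin_age(age_years: float) -> str:
    """Map a continuous age to an ordinal category."""
    return LABELS[bisect.bisect_right(UPPER_BOUNDS, age_years)]


print(bin_age(0.01), bin_age(2), bin_age(15))
# After binning, 30 and 65 are both just "Adult"; the original values are lost
print(bin_age(30) == bin_age(65))
```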
Describe trace tables
Slide 34
Describe a truth table
Slide 38
What is a hash function?
- Algorithm that maps data of arbitrary length to data of fixed length, an example of “binning”
- Returned value is known as “hash value”, “hash code”, “hash sum”, or “checksum”
- Uses in data integrity:
– Checking that a file was not tampered with or downloaded incompletely (“MD5 Hash”)
– Checking that a keystroke error did not occur (National Provider Identifier and Luhn algorithm, where last digit is a checksum)
- Uses in Cryptography
– Websites should not store unencrypted passwords
– Rule of thumb: if you click ‘forgot password’ on a website and they email you the password, they’re probably storing them unencrypted, which is a no-no
- Uses in Indexing (as in databases)
– Takes text strings and, via a hash, places them into “buckets”
– Retrieving info is faster since program only has to look in that bucket
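A sketch of three of these uses with Python's standard hashlib. SHA-256 is used here in place of MD5 as an illustrative choice, the function names are made up, and the Luhn check is the generic algorithm (it does not reproduce any NPI-specific prefix handling).

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Arbitrary-length input -> fixed-length (64 hex character) hash value."""
    return hashlib.sha256(data).hexdigest()


def luhn_valid(number: str) -> bool:
    """Generic Luhn check: the last digit acts as a checksum."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def bucket_for(key: str, n_buckets: int = 8) -> int:
    """Indexing: use the hash to choose which bucket a string belongs in."""
    return int(sha256_hex(key.encode("utf-8")), 16) % n_buckets


print(len(sha256_hex(b"short")), len(sha256_hex(b"x" * 10_000)))
print(luhn_valid("79927398713"), luhn_valid("79927398710"))
print(bucket_for("Smith"))
```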
What is SDLC?
Software development life cycle
Need for standard approach to:
– Determine scope
– Organize programming tasks
– Determine testing requirements
– Manage resources & time commitments
– Deliver software on a reproducible schedule
What are the phases of SDLC?
• Planning
– Gather requirements, determine scope and priority of work
• Software Implementation (what we often refer to as “Software Development or Design”)
– Actual development process
• Testing: Verification vs. Validation
– Verification: “are you building the software right?” (does it meet specification, is the code high quality, defect free, etc.)
– Validation: “are you building the right software?” (does it meet customer’s expectation, satisfy their needs, do what it’s supposed to)
• Documentation
– Critical for future enhancement, debugging, maintainability
• Deployment (what we often refer to as “Implementation”)
– Install, customize, train, evaluate
• Maintenance
– Process to collect new defects or enhancements, support users in ongoing fashion
What is Waterfall (SDLC)?
• Move to next phase only when prior phase is done
• Option to revisit decisions, but usually go through a formal change control process
• Highly structured, relatively inflexible
• Good for projects with stable requirements
• Advantages: defects are found sooner, when they’re less costly to fix
• Disadvantage: a late-breaking requirement can be expensive or prohibitive
• Steps: Requirements → Dev/Design → Deploy/Implement → Verification → Maintenance
What is Spiral (SDLC)?
•“Risk oriented” software development
– One example of risk is poorly understood requirements
• Development broken into smaller efforts
• Each subproject designed to tackle an area of high risk
• Traditional phases
– Determine objectives, alternatives, constraints
– Identify and resolve risks
– Evaluate alternatives
– Develop deliverables for a given iteration and verify they are correct
– Plan the next iteration
– Commit to an approach for subsequent iteration
• Advantages: highest risk is tackled early on, when change is less expensive
What is Agile (SDLC)?
High focus on very small steps, frequent loops
• Extreme Programming (XP) and Scrum are two popular Agile variations
• Feedback provides regular testing and release of software
• Scrum Terminology
– Product Owner: person who represents client requirements. Writes user stories (brief statements that capture the “who”, “what”, “why” of a simple requirement) and adds them to the backlog.
– Development team: 3 to 9 programmers who are cross-functional (can trade off coding, testing, documentation)
– Scrum Master: rule enforcer and remover of impediments
– Sprint: basic unit of development, predefined in duration and scope, chosen from the backlog
– Daily Scrum: what happened yesterday? What will happen today? Any obstacles?
•Advantage: Extremely flexible; assumes that clients will change requirements and is equipped to adapt to unpredicted challenges
What is system integration?
Process of linking subsystems in software architecture
– requires combination of software, hardware, interface skills
• Facilitated by strong understanding of standards
– HL7, Web Services, Networking
• Methods
– Vertical Integration
• Group systems by function into silos
• Integrate within a silo, but not necessarily between
– Star Integration
• Unique connection to each system that requires it
• High overhead for complex systems
– Horizontal Integration (via Enterprise Service Bus)
• One connection per system to ESB. ESB handles downstream connections
• Replacing a single system is much easier
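A back-of-the-envelope sketch of why point-to-point connections get expensive. It assumes the worst case where every system must exchange data with every other system; real environments usually need fewer links.

```python
def point_to_point_links(n_systems: int) -> int:
    """Worst case: one unique connection per pair of systems."""
    return n_systems * (n_systems - 1) // 2


def esb_links(n_systems: int) -> int:
    """One connection per system to the Enterprise Service Bus."""
    return n_systems


for n in (3, 10, 50):
    print(n, point_to_point_links(n), esb_links(n))
```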
What is quality assurance / testing?
- Goal: to mitigate project risk as early as possible, release good products
- Cost of fixing a defect is a function of project phase
Describe THERAC-25
- Radiation therapy machine
- Responsible for 6 accidents, 3 deaths due to massive radiation overdose from 1985 to 1987
- Attributed to failure to detect a “race condition” in software
- Poor design of error messages (“MALFUNCTION” followed by a number from 1 to 64)
- Personnel didn’t believe patient’s complaints
Describe software testing approaches
• Static testing: reviewing the code itself
• Dynamic testing: applying test cases to the code
• Test case development approaches
– White Box: tests inner workings of a program (ex: was an order sent/received correctly between systems)
• Code Coverage is an example of White Box testing: the programmer develops a set of test conditions for all scenarios, all variables, all possible inputs, with awareness of the internal design of the system.
– Black Box: tests functionality from end user standpoint (ex: if user enters an order, does EHR display the order is active and signed).
• Ex: specification testing, which may be insufficient to capture all defects
What are levels of testing?
• Unit Test
– White box
– Code coverage
– Developer driven, can be semi-automated
• Integration Test
– White box
– Interfaces/APIs
– Developer driven
• System Test
– Black box
– Test against documented requirements (verification)
• Acceptance Test
– Black box
– Request for user sign-off (validation)
– Ex: “beta testing”, solicit input from a small group of users
What is Regression Testing?
Regression Testing
• A regression is a new defect revealed by the addition of new functionality
• Ex: you add a new feature to a program, now an older feature stops working as intended: that’s a regression
• Regression testing: when you add new functionality, go back and repeat all prior tests, retest all previously reported and fixed defects
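A toy sketch of a regression suite. The function dose_mg, its optional cap (the "new feature"), and the earlier defect are all hypothetical. The point is only that old tests and tests for fixed defects are rerun alongside the new test.

```python
from typing import Optional


def dose_mg(weight_kg: float, mg_per_kg: float, max_mg: Optional[float] = None) -> float:
    """Weight-based dose. max_mg is the newly added feature (an optional cap)."""
    if weight_kg <= 0:
        # Guard added when an earlier (hypothetical) defect was fixed
        raise ValueError("weight must be positive")
    dose = weight_kg * mg_per_kg
    return dose if max_mg is None else min(dose, max_mg)


def run_regression_suite() -> int:
    """Rerun every prior test plus the new one; return the number of tests run."""
    # 1. Prior test: original behaviour must be unchanged by the new feature
    assert dose_mg(10, 5) == 50
    # 2. Test for a previously reported and fixed defect
    try:
        dose_mg(0, 5)
    except ValueError:
        pass
    else:
        raise AssertionError("the fixed defect has come back")
    # 3. New test for the new functionality
    assert dose_mg(100, 5, max_mg=400) == 400
    return 3


print(run_regression_suite(), "tests passed")
```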
Design pseudo-code for HPV status:
1) Returns the correct recommendation based on patient age
2) Takes into account PAP status
3) Takes into account HPV status
Extract sex (= male or = female)
• If sex = female, then
– If age > 65 and extracted history shows hysterectomy (benign), output “HPV and cervical cancer screening not recommended”
– If age < 21, output “HPV and cervical cancer screening not recommended”
– If age >= 21 and <= 29, output “Liquid Pap screening is preferred every 3 years.”
– If age >= 30 and <= 65, output “Liquid Pap screening is preferred and HPV for high risk screening”
• Extract “Pap results” as == (defined as) normal or == abnormal,
• Extract HPV results, as == normal or == abnormal, and
– If Pap and/or HPV == abnormal, output “Consult MD Anderson guideline for abnormal Pap/HPV results”
– If Pap results == normal, and HPV not performed, output “Repeat Pap screening and/or HPV screening every 3 years”
– If Pap and HPV normal, output “repeat every 5 years”
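The pseudo-code above could be turned into Python roughly as follows. This is a study sketch of the card's logic only, not clinical guidance. Assumptions made here: ages 30 and 65 are treated as inside the 30 to 65 band, cases the pseudo-code does not cover return None, and the function and parameter names are invented.

```python
from typing import Optional


def screening_recommendation(sex: str, age: int,
                             benign_hysterectomy: bool = False) -> Optional[str]:
    """Age-based recommendation; None where the pseudo-code defines no rule."""
    if sex != "female":
        return None  # the pseudo-code only defines rules for female patients
    if age < 21 or (age > 65 and benign_hysterectomy):
        return "HPV and cervical cancer screening not recommended"
    if 21 <= age <= 29:
        return "Liquid Pap screening is preferred every 3 years."
    if 30 <= age <= 65:
        return "Liquid Pap screening is preferred and HPV for high risk screening"
    return None  # age > 65 without benign hysterectomy: not covered


def follow_up(pap: str, hpv: Optional[str]) -> Optional[str]:
    """Result-based follow-up; hpv=None means HPV testing was not performed."""
    if pap == "abnormal" or hpv == "abnormal":
        return "Consult MD Anderson guideline for abnormal Pap/HPV results"
    if pap == "normal" and hpv is None:
        return "Repeat Pap screening and/or HPV screening every 3 years"
    if pap == "normal" and hpv == "normal":
        return "Repeat every 5 years"
    return None


print(screening_recommendation("female", 25))
print(follow_up("normal", "normal"))
```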
What is a database?
Any collection of related data (address book, spreadsheet, MS Access)
• Database Management System (DBMS)
– Allows users to interact with DB and maintain structure, integrity
– Common features of DBMS
• Define data types, structures, constraints
• Construct data tables, store data on a storage medium
• Manipulate data to create (insert), retrieve (read), update (edit), delete (sometimes abbreviated “CRUD”)
• Share data via permissions, user access control; control concurrency
• Protect against inappropriate access, hardware/software failure
• Maintain & Optimize data structures
What is a flat file?
- Convenient, easy, ubiquitous
- May require redundant data (e.g. Louise Chen – have to remember to indicate in each cell that she is deceased)
- Can’t represent 1-to-many relationships easily (Louise has 2 meds)
- Limited ability to enforce data integrity (multiple spellings of “yes” and “no”)
- Incomplete data represented as blank cells
What is a relational database?
- Defines association within and between relations (relation ~ table)
- Each attribute (attribute ~ column) corresponds to a domain in the relation
- Each tuple (tuple ~ row) describes an ordered list of elements; the order is important
- Data elements (element ~ cell) have a data type that is consistent across that attribute (VARCHAR, INT, DATE, LONG, etc.)
- Attributes can also have constraints (NOT NULL, auto-incrementing, cascading delete, Primary Key, Foreign Key) beyond the type constraint
- Create and describe structure/constraints using “Data Definition Language” (DDL) which contains metadata (data about the data)
- Further describe the data using a Data Dictionary (not just FK/PK, constraints, but also definitions of each field and its intended use)
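A sketch of DDL with types and constraints, run through Python's built-in sqlite3 module. The table and column names are illustrative, and SQLite is just a convenient stand-in for an RDBMS (it only enforces foreign keys when the pragma below is switched on).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# DDL: describes structure and constraints (metadata), not the data itself
conn.executescript("""
CREATE TABLE patients (
    pat_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    pat_name VARCHAR(100) NOT NULL,
    pat_age  INT
);
CREATE TABLE medications (
    med_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    pat_id   INTEGER NOT NULL REFERENCES patients(pat_id) ON DELETE CASCADE,
    med_name VARCHAR(100) NOT NULL
);
""")


def violates_constraint(sql: str, params: tuple) -> bool:
    """Return True if the DBMS rejects the statement on integrity grounds."""
    try:
        conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True


# The metadata can be queried back from the DBMS
columns = [row[1] for row in conn.execute("PRAGMA table_info(patients)")]
print(columns)
# Foreign key constraint: no patient 999 exists, so this insert is rejected
print(violates_constraint(
    "INSERT INTO medications (pat_id, med_name) VALUES (?, ?)", (999, "topiramate")))
```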
What is a relational schema?
The relation schema (table schema) is a description of the relation, its attributes, and the data types / rules associated with the relation.
• A specific table that uses that schema is an instance of that schema
• Adding a new relation is as easy as adding a table; add an attribute by adding a column
• In very simple terms these make it easy to know “everything that has one attribute”
What is object-relational mapping (ORM)?
• Parallelism between OOP and RDBMS is very useful programmatically
– Object-oriented Class ↔ DB Relation
– Instance ↔ Tuple (a specific row) in a Relation (table), where each row is a member of the class described by those attributes
– Attribute ↔ Attribute (column), where the Value of that attribute is the element (content of the cell)
– Method (accessors, mutators) ↔ database manipulation (CRUD) functions
• Many modern programming languages use Object-Relational Mapping (ORM) either built-in or available as an extension
– Each class is mapped to a table
– Each attribute is mapped to a column
– Each method (getters/setters) mapped to a “CRUD” function
– Ex: Creating a new instance of a class automatically creates a row and populates data
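To show the mapping idea, here is a hand-rolled toy in Python with sqlite3. It is not the API of any real ORM library; the Patient class, its save/get methods, and the table layout are all invented for illustration.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")


@dataclass
class Patient:                 # class     <-> table "patient"
    name: str                  # attribute <-> column
    age: int
    id: Optional[int] = None   # set once the row exists

    def save(self) -> None:
        """Mutator side: maps to INSERT (new row) or UPDATE (existing row)."""
        if self.id is None:
            cur = conn.execute(
                "INSERT INTO patient (name, age) VALUES (?, ?)", (self.name, self.age))
            self.id = cur.lastrowid
        else:
            conn.execute(
                "UPDATE patient SET name = ?, age = ? WHERE id = ?",
                (self.name, self.age, self.id))

    @classmethod
    def get(cls, pat_id: int) -> Optional["Patient"]:
        """Accessor side: maps to SELECT; one row becomes one instance."""
        row = conn.execute(
            "SELECT name, age, id FROM patient WHERE id = ?", (pat_id,)).fetchone()
        return cls(*row) if row else None


p = Patient("DOE, JOHN", 34)
p.save()                       # creating and saving an instance creates a row
print(Patient.get(p.id))
```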
What is CRUD?
Create, Read, Update, Delete
In SQL these correspond to INSERT, SELECT, UPDATE, DELETE
What is an inner join?
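Only include rows that have a match in both tables
What is a left outer join?
Include all rows in the left table; display blanks (NULLs) for right-table columns where there is no match
What is a right outer join?
Include all rows in the right table; display blanks (NULLs) for left-table columns where there is no match
What is a full outer join?
Include all rows in both tables, with blanks (NULLs) on either side where there is no match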
What is a cross join?
Cartesian join - gives cross product of both tables
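The join types can be compared side by side with Python's sqlite3 on made-up data. Because native RIGHT and FULL OUTER JOIN support depends on the SQLite version, this sketch emulates them (right join = left join with the tables swapped; full join = union of the two), which is an assumption of the example rather than how every DBMS would write it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients (pat_id INTEGER, pat_name TEXT);
CREATE TABLE medications (pat_id INTEGER, med_name TEXT);
INSERT INTO patients VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO medications VALUES (1, 'aspirin'), (1, 'statin'), (4, 'insulin');
""")

COLS = "SELECT p.pat_name, m.med_name FROM "
INNER_SQL = COLS + "patients p INNER JOIN medications m ON p.pat_id = m.pat_id"
LEFT_SQL = COLS + "patients p LEFT OUTER JOIN medications m ON p.pat_id = m.pat_id"
# Emulated right join: keep every medication row, blank patient where no match
RIGHT_SQL = COLS + "medications m LEFT OUTER JOIN patients p ON p.pat_id = m.pat_id"
# Emulated full join: every row from both sides
FULL_SQL = LEFT_SQL + " UNION " + RIGHT_SQL
CROSS_SQL = COLS + "patients p CROSS JOIN medications m"


def rows(sql: str) -> list:
    return conn.execute(sql).fetchall()


inner, left, right = rows(INNER_SQL), rows(LEFT_SQL), rows(RIGHT_SQL)
full, cross = rows(FULL_SQL), rows(CROSS_SQL)
for name, result in [("inner", inner), ("left", left), ("right", right),
                     ("full", full), ("cross", cross)]:
    print(name, len(result))
```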
Show an example of a nested SQL subquery
select * from medications where pat_id in (select pat_id from patients where pat_age between 4 and 5)
Describe parameters for the LIKE clause
• Wildcards: “%” matches any length (including zero characters), “_” must match a single character
• This expression: where lastName like ‘Smith_’
…would match “Smithe”, “Smiths”, “Smithy”
…would NOT match “Smith” or “Smithers”
• LIKE is case sensitive in many DBMSs (this depends on the product and collation), so you may need to case-correct the string before matching
– UPPER([char]) converts [char] to all upper-case
– LOWER([char]) converts [char] to all lower-case
• This expression: where lower(lastName) like ‘desa%’
…would match “Desai”, “DeSai”, “desai”, “DeSalles”, etc…
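The wildcard rules can be mimicked in plain Python by translating a LIKE pattern to a regular expression. This is an illustrative re-implementation (case sensitive, no ESCAPE clause), not how any particular DBMS implements LIKE.

```python
import re


def like(value: str, pattern: str) -> bool:
    """Case-sensitive LIKE: '%' = any run of characters, '_' = exactly one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    return re.fullmatch("".join(parts), value, flags=re.DOTALL) is not None


for name in ("Smithe", "Smiths", "Smith", "Smithers"):
    print(name, like(name, "Smith_"))

# Case-correct before matching, as with lower(lastName) like 'desa%'
for name in ("Desai", "DeSai", "DeSalles"):
    print(name, like(name.lower(), "desa%"))
```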
Hierarchical database
• Structurally different from RDBMS
• Optimized for rapid transactions of hierarchical data
• In very simple terms, makes it easy to know “every attribute about one thing” (quickly retrieve all known information about patient 1001)
• Computationally easy to traverse the tree. Can only traverse tree from root (top parent) node
• Ex: “find all deceased patients who were ordered topiramate” would be “easier” in relational DB than hierarchical DB
• Child nodes can only have 1 parent
– Difficult to model relationship between child nodes (many-to-many, recursive relationships)
What is the MUMPS system?
• Described in 1969 by Greenes, Pappalardo, Marble, and Barnett
• “MGH Utility Multi-Programming System”
• Design goals
– Flexible interface (e.g. lab systems, notes, variable output format)
– Variable length text-handling
– Hierarchical design to support complexity of clinical data and update/retrieval methods
– Multi-user access (original paper recognized potential for conflicting updates, need to have ACID transactions)
– Large storage capacity
– Low CPU usage
– A high-level programming language to make interface design less time-consuming, more efficient
• MUMPS renamed “M” in 1993 by M Technology Association, recognized by ANSI in 1995
• MUMPS and its derivatives, such as Intersystems Caché, are among the most widely used transactional DBs for EHRs today
• Design of MUMPS predated, anticipated the “NoSQL” and “schema-less DB” movement
MUMPS is a hierarchical database
What are Mumps commands?
This programming snippet reads user input from teletype at the prompt “Unit No.” and assigns the value to variable X
• Line 1.15 uses a ternary operator (IF-THEN-ELSE) to validate the format of the string X, in this case, that it’s the form of 3 digits, a dash, 2 digits, a dash, and two digits.
• If the pattern does not match, it displays the phrase “ILLEGAL” and returns to 1.10
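The MUMPS snippet itself is not reproduced on this card, so the following is only a Python analogue of the validation it describes (3 digits, dash, 2 digits, dash, 2 digits). The function name and return convention are invented here.

```python
import re

# 3 digits, a dash, 2 digits, a dash, 2 digits
UNIT_NO_PATTERN = re.compile(r"[0-9]{3}-[0-9]{2}-[0-9]{2}")


def check_unit_number(x: str) -> str:
    """Return the value if well-formed, otherwise the word ILLEGAL."""
    # Python's conditional expression plays the role of the IF-THEN-ELSE
    return x if UNIT_NO_PATTERN.fullmatch(x) else "ILLEGAL"


print(check_unit_number("123-45-67"))
print(check_unit_number("12-345-67"))
```

In an interactive program, an "ILLEGAL" result would be followed by prompting the user again, which is what returning to line 1.10 does.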
What are MUMPS global variables?
• “[H]ierarchically organized, symbolically accessed” structure – KEY/VALUE database
• Local variables are defined in the scope of the program
• Global variables referenced by an up arrow symbol (later became a caret “^”)
• This code retrieves a patient in the Active Patient Record (APR) global that matches a local variable “UN” (hospital unit number, or location of patient) and assigns the name and age:
SET ^APR(UN, NAME)="DOE, JOHN", ^APR(UN, AGE)="34"
• This code traverses a patient’s record UN → CHEM → N (unit number, chemistry results, sodium), and assigns it a string value, concatenated from two local variables DATE and TEST:
SET ^APR(UN,CHEM,N)=$DATE.",",TEST
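As a rough analogy, a hierarchical key/value global can be pictured as a nested dictionary in Python. This is only a mental model: the sample values are made up, and a Python dict lives in memory whereas MUMPS globals are stored persistently.

```python
from collections import defaultdict


def tree():
    """A dictionary whose missing keys automatically become sub-trees."""
    return defaultdict(tree)


APR = tree()                  # stands in for the ^APR global
UN = "123-45-67"              # hypothetical unit number

APR[UN]["NAME"] = "DOE, JOHN"
APR[UN]["AGE"] = "34"

DATE, TEST = "1969-01-01", "140"           # hypothetical local variables
APR[UN]["CHEM"]["N"] = DATE + "," + TEST   # path: UN -> CHEM -> N

# "Every attribute about one thing": everything known about this patient
print(sorted(APR[UN].keys()))
print(APR[UN]["CHEM"]["N"])
```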
What are object databases?
• Data represented as data objects
• Support for more data types (graphics, photo, video, webpages)
• Object DBs are usually integrated into programming language, so accessing data doesn’t require complex driver configuration
• Increased use recently with development of web applications, most web application frameworks support interaction with OODBMS
• Commercial example: Intersystems Caché – the OODBMS behind the Epic EHR
What is UML?
Standard toolset for describing aspects of databases, software, business processes
• Class diagram to describe OO classes (name, hierarchies, attributes, methods)
• Activity diagram ~ process flowchart, stepwise description of decisions, consequences, inputs, outputs
Compare use case and entity-relationship diagrams
- Use Case diagram – describes actors, goals, dependencies
- Entity-Relationship diagram – describes objects and their relationships
- ER diagram can then be used to define RDBMS logical schema, which DB programmers can use to build physical schema of a DB
Compare UML Class and E-R Diagrams
- Title of the class
- Attributes with type (optional default values)
- Methods with inputs or return types
- Inheritance indicated with a solid arrow pointing to parent class
“E-R” = Entity-Relationship
Note relationship between Customer and Purchase Order:
• A customer is an optional participant (the “O” symbol)
• Only one customer can participate (the “|” symbol)
• A customer can have multiple purchase orders (the three-pronged arrow symbol)
• Details the attributes of Customer and Purchase Order
Based on the E-R Diagram, a developer can:
• describe the logical schema for the database
• create physical schema and DDL/SQL code to create tables
• create object classes that map to database tables
• map object classes to DB tables using an ORM tool
What is the ACID test?
Reliable DB Transactions
- Atomicity – transaction is indivisible, it either happens or it doesn’t, no possibility of a partial transaction (ex: a DB transaction that updates 2 cells – it either does both or neither)
- Consistency – transaction meets all constraint rules (can’t add a DATE to an INT field, can’t have a non-unique PK)
- Isolation – RDBMS must be able to sequence simultaneous transactions (ex: 2 transactions to update the same cell. Both must take place, but not at same time, or else you have a write-write failure)
- Durability – system must be tolerant to failure (ex: RDBMS has queued 200 transactions in memory, and power fails. How do you know if all 200 transactions took place?)
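Atomicity and consistency can be seen in a small sqlite3 sketch. The account table, the CHECK constraint, and the transfer function are hypothetical. A transaction that updates two cells either applies both updates or, if a constraint is violated, neither.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE account (
    id      INTEGER PRIMARY KEY,
    balance INTEGER NOT NULL CHECK (balance >= 0)  -- consistency rule
)""")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(src: int, dst: int, amount: int) -> bool:
    """Update two cells in one transaction: both happen or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


def balances() -> dict:
    return dict(conn.execute("SELECT id, balance FROM account"))


print(transfer(1, 2, 500), balances())  # second update violates CHECK: rolled back
print(transfer(1, 2, 30), balances())   # both updates applied together
```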