W4 Readings Flashcards
Validity and reliability of measurement instruments used in research
Kimberlin article
Reliability
The obtained (observed) score from a measuring instrument is composed of both the true score, which is unknown, and error in the measurement process
True score
The score that a person would have received if the measurement were perfectly accurate
Pre-testing or pilot testing
An instrument allows for the identification of sources of measurement error; refinement of the instrument then focuses on minimizing that error
Reliability estimates are used to evaluate
1. The stability of measures administered at different times to the same individuals
2. The equivalence of sets of items from the same test (internal consistency) or of different observers scoring a behaviour or event using the same instrument (interrater reliability)
Reliability coefficients range from…
0 to 1
A higher coefficient means a higher level of reliability
Stability
Is determined by administering the test at two different points in time to the same individuals and determining the correlation, or strength of association, of the two sets of scores (test-retest reliability)
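A minimal sketch of the stability check described above, assuming the strength of association is summarized with a Pearson correlation (the notes do not name a specific coefficient). The scores and the function name pearson_r are hypothetical and only for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares around each mean
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical scores for the same five people tested at two points in time
time1 = [10, 12, 15, 18, 20]
time2 = [11, 12, 14, 19, 21]
test_retest_r = pearson_r(time1, time2)
print(round(test_retest_r, 2))
```

A coefficient close to 1 would suggest that people keep roughly the same rank order across the two administrations.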
Internal consistency
Gives an estimate of the equivalence of sets of items from the same test
E.g., a set of questions aimed at assessing quality of life or disease severity
The coefficient of internal consistency provides an estimate of the reliability of measurement and is based on the assumption that items measuring the same construct should correlate
What is the most widely used method for estimating internal consistency reliability?
Cronbach's alpha
What is Cronbach's alpha?
What is it used for?
What makes Cronbach's alpha high? Where do the major gains come from?
A function of the average intercorrelation of the items and the number of items in the scale
Used for summated scales such as quality of life instruments
All things being equal, the greater the number of items in the summated scale, the higher the alpha
The major gains come from additional items up to approximately 10, after which the increase in reliability for each additional item levels off. This is one reason why the use of a single item to measure a construct is not optimal
Having multiple items to measure a construct aids in the determination of the reliability of measurement and in general improves the reliability or precision of the measurement
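A hedged sketch of the two ideas in the Cronbach's alpha cards above. cronbach_alpha uses the usual variance-based formula on raw item scores, and standardized_alpha uses the standardized form k*r / (1 + (k - 1)*r), which depends only on the number of items k and the average inter-item correlation r. The response data, the average correlation of 0.3, and both function names are assumptions chosen for illustration.

```python
from statistics import variance

def cronbach_alpha(rows):
    """rows: one list of item scores per respondent, all the same length."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Sample variance of each item across respondents
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the summated (total) score
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

def standardized_alpha(k, mean_r):
    """Alpha implied by k items with average inter-item correlation mean_r."""
    return (k * mean_r) / (1 + (k - 1) * mean_r)

# Hypothetical responses: five respondents, three items on a 1-5 scale
responses = [[2, 3, 3], [4, 4, 5], [1, 2, 2], [5, 5, 4], [3, 3, 4]]
alpha = cronbach_alpha(responses)
print(round(alpha, 2))

# Holding the average correlation at an assumed 0.3, add items to the scale
for k in (1, 2, 5, 10, 20):
    print(k, round(standardized_alpha(k, 0.3), 2))
```

With the assumed average correlation of 0.3, the gain from 1 to 10 items is much larger than the gain from 10 to 20, which matches the levelling-off described in the notes.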
Interrater reliability / interobserver agreement
Establishes the equivalence of ratings obtained with an instrument when used by different observers
If a measurement process involves judgements or ratings by observers, a reliable measurement will require consistency between different raters
Requires completely independent ratings of the same event by more than one rater
No discussion or collaboration can occur when reliability is being tested
Reliability is determined by the correlation of the scores from two or more independent raters, or by the coefficient of agreement of the judgements of the raters
For categorical variables, _______ _____ is commonly used to determine the coefficient of agreement.
______ is used when two raters or observers classify events or observations into categories based on rating criteria

For categorical variables, Cohen's kappa is commonly used to determine the coefficient of agreement.
Kappa is used when two raters or observers classify events or observations into categories based on rating criteria
Rather than a simple percent agreement, kappa takes into account the agreement that could be expected by chance alone
Interrater reliability should be established when data are abstracted from medical charts or when diagnoses or assessments are made for research purposes
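A small sketch of how kappa corrects percent agreement for chance, using the standard formula kappa = (observed - expected) / (1 - expected), where expected agreement comes from each rater's category proportions. The ratings and the function name cohen_kappa are hypothetical.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters classifying the same observations."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("both raters must rate the same non-empty set of cases")
    # Simple percent agreement
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal proportions
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (observed - expected) / (1 - expected)

# Hypothetical independent ratings of ten charts (event documented: yes/no)
rater_a = ["yes", "yes", "no", "no", "yes", "no", "yes", "no", "yes", "yes"]
rater_b = ["yes", "no", "no", "no", "yes", "no", "yes", "yes", "yes", "yes"]
kappa = cohen_kappa(rater_a, rater_b)
print(round(kappa, 2))
```

In this made-up example the raters agree on 8 of 10 charts (80%), but because about half of that agreement could arise by chance, kappa is noticeably lower than the raw percent agreement.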
It also depends on…
Developing precise operational definitions of variables being measured as well as having observers well trained to use the instrument
It is optimized when criteria are explicit and raters are trained to apply the criteria
Raters must be trained how to make a decision
That an event has occurred, or how to determine which point on a scale measuring strength or degree of a phenomenon should be applied
The more that individual judgement is involved in the rating, the more crucial it is that independent observers agree when applying the scoring criteria
Training should include multiple cases in which…
Raters respond to simulated situations they will encounter and rate, interrater reliability is calculated, disagreements are clarified, and a criterion level of agreement is met
Interrater reliability should be verified
Throughout the study. Even when established observational instruments are being used or criteria are explicit, research that relies on observations or judgements should check reliability, and the study protocol should include procedures to determine the level of observer agreement
A percentage of observations, such as the number of charts reviewed, is randomly selected for scoring by two independent raters, rather than requiring that two raters judge all observations
Data to establish the consistency…
With which the primary rater applies the criteria over time are important for establishing the reliability of an instrument
Rater drift can occur when 
An individual rater alters the way he or she applies the scoring criteria such as becoming more lenient or stringent over time 
Investigators who build in reliability checks throughout the study, as data are collected, rather than waiting until the end of data collection can identify instances where interrater reliability has begun to deteriorate, perhaps due to…
Rater drift
Validity
The extent to which an instrument measures what it purports to measure
Validity requires
That an instrument be reliable, but an instrument can be reliable without being valid
For example, a scale that is incorrectly calibrated may yield exactly the same, but inaccurate, weight values
A multiple-choice test intended to evaluate the counselling skills of pharmacy students may yield reliable scores, but it may actually evaluate drug knowledge rather than the ability to communicate effectively with patients in making a recommendation
Validity is not a property of the test itself. Instead, validity is…
The extent to which the interpretations of the results of a test are warranted, which depends on the test's intended use, such as measurement of the underlying construct
Much of the research conducted in healthcare involves quantifying attributes that cannot be measured directly. Instead, hypothetical or abstract concepts (________)
Constructs, such as severity of disease, drug efficacy, drug safety, burden of illness, patient satisfaction, health literacy, quality of life, quality of provider-patient communication, and adherence to medical regimens, are measured
Hypothetical constructs cannot be
Measured directly and can only be inferred from observations of specified behaviours or phenomena that are thought to be indicators of the presence of the construct
Measurement of a construct requires that the…
Conceptual definition be translated into an operational definition
An operational definition of a construct…
Links the conceptual or theoretical definition to more concrete indicators that have numbers applied to signify the amount of the construct
Example: efficacy of a new drug product. The ability to improve a patient's health may be measured by the decrease of certain symptoms, the delay in onset of certain diseases, the length of remissions, or the prevention of certain clinical complications. Likewise, the theoretical construct of medication adherence may be operationally defined as a one-month recording of the number of missed doses as measured by a medication event monitoring system, which includes microprocessors that record the occurrence and time of each opening of a prescription vial
An operational definition of patient satisfaction with healthcare might be
Patient self-reported responses to items on the 18-item short-form version of the Patient Satisfaction Questionnaire
Crocker and Algina have pointed to the importance of a theoretical foundation by noting
That constructs cannot be defined only in terms of operational definitions but must also have demonstrated relationships to other constructs or observable phenomena
Construct validity
A judgement based on the accumulation of evidence from numerous studies using a specific measuring instrument
Evaluation of construct validity requires examining the relationship of the measure being evaluated with variables known to be related or theoretically related to the construct measured by the instrument
For example, a measure of quality of life would be expected to result in lower scores for chronically ill patients than for healthy college students. Correlations that fit the expected pattern contribute evidence of construct validity.
All evidence of validity, including content and criterion-related validity, contributes to the evidence of construct validity
Content validity
Addresses how well the items developed to operationalize a construct provide an adequate and representative sample of all the items that might measure the construct of interest
Because there is no statistical test to determine whether a measure adequately covers the content area or adequately represents a construct….
Content validity usually depends on the judgement of experts in the field
Criterion-related validity
Provides evidence about how well scores on the new measure correlate with other measures of the same construct or very similar underlying constructs that theoretically should be related
It is crucial that these criterion measures are valid themselves
With one type of criterion-related validity, ________ ________, the criterion measurement is obtained at some time after the administration of the test, and the ability of the test to accurately predict the criterion is evaluated
What is an example of this?
Predictive validity
For example, the use of surrogate outcomes such as blood pressure and cholesterol levels is based on their predictive validity in projecting the risk of cardiovascular disease, even though some of these associations have recently been questioned
Another type of criterion-related validity is _________. Scores on an instrument are correlated with scores on another criterion measure of the same construct, or a highly related construct, that is measured concurrently in the same subjects. Ideally, the criterion measure would be considered the gold standard measure of the construct. This strategy of determining the validity of a measure might be seen in a situation in which a new instrument has some advantage over the gold standard measure. Such advantages would justify the time and effort involved in the development and validation of a new instrument.
Example: a clinical researcher may want to use a brief screening instrument for a condition such as depression instead of administering a more expensive measure.
The cost of administering the best criterion measures may also be a barrier
Concurrent validity
Responsiveness
The ability of a measure to detect change over time in the construct of interest
For outcome measures intended to evaluate the effects of medical or educational interventions, responsiveness to changes that result from the intervention is required
What is a crucial component of responsiveness?
Reliability. The noise that is due to measurement error can mask changes that may in fact be attributable to the intervention
For example, using a scale manufactured to weigh trucks will not be helpful when evaluating a new weight loss drug in humans because the estimates will be too imprecise to identify small changes. The measurement will be valid yet unreliable or imprecise
Responsiveness to change can legitimately differ from one population to another, which is why the measure must be appropriate to the subjects being studied. Give an example of this
A measure of activities of daily living that includes the ability to dress or wash oneself may be responsive to change among an elderly population of patients undergoing physical therapy or cardiac rehabilitation. However, it would probably not be sensitive to change, due to a ceiling effect, among a younger group of newly diagnosed hypertensive patients who have not experienced significant disability due to disease or to the ageing process
Selecting an existing instrument
Before developing a new test or measure, an investigator should identify existing instruments that measure the construct of interest. Using an existing instrument that has substantial evidence of reliability and validity in a variety of populations is more cost effective than starting from scratch to develop and validate an instrument.
In selecting an instrument the following questions should be addressed:
1. Do instruments already exist that measure a construct the same as or very similar to the one you wish to measure? Before you begin searching for instruments, you must have a clearly defined construct or concept that you wish to measure, along with operational definitions and some evidence that the construct can be measured as defined
2. How well do the constructs in the instruments you have identified match the construct you have conceptually defined for your study? In evaluating whether there is congruence, do not rely on the title of the measure or on the operational definition of the construct that appears in a research article or the description of variables in a secondary database. Real understanding of the measure usually requires an examination of the actual items or questions and the way data were generated or documented.
3. Is the evidence of reliability and validity well established? Has the measure been evaluated using various types of reliability estimates, such as both internal consistency and test-retest, and varied strategies for establishing validity, such as content and concurrent validity, as well as more extensive evidence of construct validity in various populations? Has it been validated in a population similar to the one you will be studying?
4. In previous research, was there variability in scores with no floor or ceiling effects? Did previous studies have a large amount of missing data, either on the measure itself or on items within the measure?
5. If the measure is to be used to evaluate health outcomes, effects of interventions, or changes over time, are there studies that establish the instrument's responsiveness to change in the construct of interest? Obviously, it is important that a change in measurement be due to change in the construct rather than to instability of scores, such as a lack of reliability of the measure itself. In addition, it would be helpful if there were data on how much change in scores would be required to be considered clinically meaningful.
7. How expensive is it to use the instrument? Email questionnaires cost less and require less time to administer than do telephone or face-to-face interviews. However, electronic data may not contain information that is available on patient charts, so a good understanding of the limitations of the data available, as well as the measurement requirements of your study, is important
8. If the instrument is administered by an interviewer, or if the measure requires the use of judges or experts, how much expertise or specific training is required to administer the instrument?
9. Will the instrument be acceptable to subjects? Does the test require invasive procedures? Is the reading level appropriate? Is the respondent burden, including complexity of questions and time needed to complete the instrument, unlikely to affect response rates or the quality of responses?
Reliability and validity evidence from established instruments is applicable only…
If you use the instrument in the same form and follow the same administration procedures as used in the validation study. Modifications of validated instruments may require permission from the developers and also require validating the modified instrument as if it were a new instrument
Researchers may be tempted to conclude that..
They must develop their own instruments.
They may view the measures they want to develop as being so straightforward, such as a few questions measuring patient knowledge or a specific item from a medical chart, that they do not need to conduct a pilot test to determine reliability and validity.
Researchers may then go to considerable effort collecting data, only to find at the end of the study that subjects did not vary much in their responses to the instrument, or that the documentation in the charts was inadequate, so the measure was not able to correlate with any other variable of interest
Subjects may misinterpret questions.
Responses may be highly skewed
Internal consistency may be so low that the item responses cannot reasonably be combined into a single summated score.
In other studies,
Researchers may obtain biased results by incorrectly assuming that diagnostic codes are valid without determining their relationship to other measures that should indicate the presence of the disease. Assuming that medical records adequately capture the information needed to construct a measure, or that chart reviewers will interpret information uniformly, can also threaten the validity of findings.
Careful attention to the development of instruments, regardless of how straightforward the measure may seem, along with pilot testing to determine reliability and validity, is crucial to the conduct of quality research
Item response theory
Provides an alternative framework for understanding measurement and alternative strategies for judging the quality of a measuring instrument.
IRT is being used in building item pools and developing questionnaires that measure key health outcomes related to many chronic diseases, including measures such as fatigue and pain. These items will be available to investigators, and the repository will become a resource for accurate and efficient measurement of patient-reported symptoms and other health outcomes in clinical practice
Measurements using self-report
With surveys, researchers rely on responses to questions to provide measurements of the constructs of interest. While self-reports of behaviours, beliefs, and attitudes are prone to known biases, there are no acceptable alternative means of measurement for many constructs, such as level of pain, depression, patient satisfaction with care, and quality of life
Self-report questions may elicit an estimation of behavioural frequency rather than the recall-and-count response desired by the researcher. The use of estimation rather than recall is a function of how information is retrieved from memory, how frequency response scales are formulated, and other specific aspects of the instrument.
Behaviours that occur with high frequency, such as dietary intake or taking a scheduled medication for a chronic condition, are not likely to remain specific in memory for a very long period of time. If it is desired that specific events be recalled rather than estimated, the time frame must be of very short duration in the immediate past
Therefore, asking patients how many doses of medication they missed in the past month or past year will likely result in an estimate or educated guess, whereas a question about the past 24 hours or past three days may reflect actual recall
Some response choices require subjects to provide their own judgement about frequency using undefined response alternatives, such as an ordinal scale from "seldom" to "frequently"
When asking questions about the frequency of a behaviour, it is usually best to let the subject fill in the blank on an item with a clearly defined reference period
E.g., How many doses of a specific medication have you missed taking completely in the past three days? The open format requires a specific description of the behaviour of interest as well as a specific time frame

Use of self-report or poorly designed measures can result in…
Misclassification bias, which is error in classifying either exposure status or effect (e.g., disease) in patients or subjects
Patient recall of previous drug exposure, for example, has been shown to be subject to error
In case-control studies, recall bias is of concern when there are no objective markers of exposure. Individuals with the disease or outcome of interest are more likely to remember relevant exposures than are healthy controls.
One approach that is recommended to address this recall bias is to have a control group affected by a disease different from that of the cases, to introduce a similar bias toward recall of exposure
Use of secondary data
What does this include?
Data originally gathered for a different purpose are often used to answer a research question
These data may have addressed a different research question or may have been gathered for clinical, billing, or legal purposes
This includes Pharmacy records, electronic or paper medical records, patient registries, and insurance claims data.
What is the first consideration when deciding whether secondary data can be used?
Is to verify that the data set appropriately measures the variables required to answer the research question. If the data elements are not present, consideration can be given to whether appropriate proxy measures of the variables of interest are available.
The use of proxy measures requires
Careful conceptual analysis of how closely the variables of interest and proxy measures are associated. For example, it seems intuitive that a claims database could be used to identify all patients who suffered a stroke during a certain time, as long as they were eligible for benefits. However, strokes may have been silent and required no medical intervention, patients may have died before medical care could be sought, strokes may have been misdiagnosed, or certain medical services may not have been covered by the insurance company and thus may not appear in the billing database
Use of surrogate measures
A surrogate endpoint of a clinical trial is a laboratory measurement or a physical sign used as a substitute for a clinically meaningful endpoint that directly measures how a patient feels, functions, or survives.
Changes induced by a therapy on a surrogate outcome are expected to reflect changes in a clinically meaningful endpoint
The use of surrogate outcomes to operationally define a construct, such as drug efficacy, has become increasingly popular, as application of these measures is typically faster and less costly
Results are obtained after shorter follow-up periods, and the number of patients and the length of time patients have to participate in experiments are reduced.
For a surrogate outcome to be valid, it should be in the direct pathophysiological pathway of a disease, and it should be reasonable to expect that the pharmacologic action of the new drug is mediated through this pathway. If these two conditions are true, the drug's effect on the surrogate outcome can be extrapolated to true measures of morbidity or mortality. However, even well-established surrogate outcomes have recently been questioned
Surrogate outcomes remain nothing more than substitutes and can only approximate the truth
In healthcare and social science research, many of the variables of interest and outcomes that are important are abstract concepts known as _______ _________
Theoretical constructs
Using tests or instruments that are valid and reliable to measure such constructs is a crucial component of research quality