14 - DEFINITION, TYPES OF ASSESSMENTS, AND QUALITIES OF A GOOD MEASURING INSTRUMENT Flashcards
- Determines the desired dimensions of a defined characteristic
- Assigns a quantitative value using measuring instruments: a ruler for length, a weighing scale for weight, a thermometer for temperature
- Characteristics: reliability, validity, and objectivity
Measurement
- Measures the performance of an individual against a known objective or goal
- Gathers information on the extent of students’ learning and organizes the data into interpretable numerical form
- Includes tests, aptitude tests, inventories, and questionnaires
Assessment
- Gives a value judgment to assessment results through a qualitative appraisal of the prevailing situation
- Determines the appropriateness, worthiness, goodness, validity, and legality of something based on predetermined standards
Evaluation
- Does not express any clear assumption about a student
- Does not require much energy and time
- The scope is limited; only some dimensions of personality can be tested under measurement
- It is content-oriented
- It is a means and not an end in itself
- The purpose is to gather evidence
- It may not be an essential part of education
- Answers the question “how much”
- Predictions cannot be made meaningfully on the basis of measurement alone
- It acquaints us with a situation in isolation from the entire environment
- It indicates only those observations which can be displayed numerically
- It can be conducted at any time
Measurement
- A clear assumption about a student can be formed
- Requires more energy and time
- The scope is wide; in it, the entire personality of a student is tested
- It is objective-oriented
- It is an end in itself
- It deduces inferences from the evidence; that is, its work is the appraisal of evidence
- It is an integrated or necessary part of education
- Answers the question “what value”
- It can predict meaningfully
- It acquaints us with the entire situation
- It comprises both quantitative and qualitative observations
- It is a continuous process
Evaluation
Assessments that complement each other in determining the totality of students’ performance. Either of the two can be used to improve and enhance teaching.
Formal & Informal
- Data-based test
- Determines students’ proficiency or mastery of the content (knowledge testing)
- Systematic
- Structured
- Used for comparison against a certain standard
- Mathematically computed and summarized (formal grading system)
- Assesses overall achievement
- Norm-referenced measure
- Quantitative
- Normal classroom environment
- Examples: exams, diagnostic tests, achievement tests, aptitude tests, intelligence tests, and quizzes
Formal Assessment
- Content and performance-driven
- Measures students’ performance and progress (progress measuring)
- Spontaneous
- Flexible
- Tracks the progress of every student by using actual works (individualized)
- Rubric scores are used
- Day-to-day activities such as projects, assignments, experiments, and demonstrations
- Criterion-referenced measure
- Qualitative
- Could be beyond the classroom environment
- Examples: checklists, observations, portfolio, rating scale, records, interviews, and journal writing
Informal Assessment
Assessments can be categorized according to the nature of the content and various functions. Both types are essential components of classroom instruction.
Formative & Summative
- Determines whether learning is taking place
- Provides feedback on how things are going and highlights areas for further study
- Helps students perform well at the end of the program or monitors the students’ progress
- Gathers detailed information but is narrow in the scope of content covered
- Evaluates the effectiveness and improvement of teaching
- Conducted during teaching or instruction, daily or every session
- Primarily prospective
- Examples: observation, oral questioning, assignments, quizzes, discussions, reflection, research proposal, peer or self-assessment
Formative Assessment
- Determines if learning is sufficiently complete
- Determines how well things went and measures students’ overall performance
- Concerned with purposes, progress, and outcomes of the teaching-learning process
- Gathering information is less detailed but broader in the scope of content or skills assessed
- Provides information to the students, parents, and administrators on the level of accomplishment attained
- Occurs at the end of instruction or at the end of each unit
- Primarily retrospective
- Examples: unit test, final examinations, comprehensive projects, research paper, presentations, project, and portfolio
Summative Assessment
Used to interpret student performance. A student’s test score can be compared against the class standing or against standardized criteria.
Norm-referenced & Criterion-referenced
- It is a relative ranking of students
- Students compete against each other
- Determines the students’ placement on a normal distribution curve to rank and sort students
- Determines a student’s level of the construct measured by a test relative to a well-defined reference group of students
- Examinee-centered
- Evaluates the effectiveness of the teaching program and student’s preparedness for the program
- Tends to focus heavily on memorization and routine procedures
- Highlights achievement differences
- Identifies whether a particular student performs better or worse than the rest of the students
- Raw scores are interpreted using statistical methods such as percentile ranks and the normal curve (see the sketch after this card)
- NSAT, College Entrance Examinations, National Achievement Test, IQ Test, and Cognitive Ability Test
Norm-referenced Assessment
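As an illustration of those statistical interpretations, here is a minimal Python sketch (every score below is made up for illustration) that computes a percentile rank and a normal-curve percentile for one student:

```python
# Minimal sketch of norm-referenced score interpretation.
# All scores here are made-up illustration data.
from statistics import NormalDist, mean, stdev

scores = [48, 52, 55, 60, 61, 63, 67, 70, 74, 80]  # hypothetical class raw scores
student = 67

# Percentile rank: percentage of scores below the student's score,
# counting half of any ties.
below = sum(s < student for s in scores)
ties = sum(s == student for s in scores)
percentile_rank = 100 * (below + 0.5 * ties) / len(scores)

# Normal-curve interpretation: a z-score locates the student on a
# normal distribution fitted to the class mean and standard deviation.
z = (student - mean(scores)) / stdev(scores)
curve_percentile = 100 * NormalDist().cdf(z)

print(f"percentile rank = {percentile_rank:.1f}")
print(f"z = {z:.2f}, normal-curve percentile = {curve_percentile:.1f}")
```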
- It interprets scores in terms of absolute standard
- A student competes against him/herself
- Measures how well students have mastered a particular body of knowledge
- Determines a student’s level of performance relative to a well-defined domain of content
- Content-centered
- Assesses higher-level thinking and writing skills
- Emphasizes thinking and application of knowledge
- Setting performance standards
- Used to monitor students’ performance in their day-to-day activities
- Tests must be valid and reliable; test item analysis is used to check this
- Domain-referenced tests, competency tests, basic skills tests, mastery tests, performance assessments, objective-referenced tests, authentic assessments, and standards-based tests
Criterion-referenced Assessment
Qualities of a Good Measuring Instrument
Validity, Reliability, Usability
Most important characteristic of a good test
According to Ebel and Frisbie (1991), it refers to the “consistency or accuracy with which the scores measure a particular cognitive ability of interest”
Validity
Types of Validity
- Content Validity
- Construct Validity
- Criterion-related Validity
determines how well the test represents the topics covered or what has been taught
Content Validity
provides a general description of the student’s performance and thus gives meaning to the scores from the test
Construct Validity
established when the measures from the test are compared to accepted external standards (criteria)
Criterion-related Validity
Two types of criterion-related validity
Predictive, Concurrent
- Test performance predicts future performance
- Two different measures, namely the scholastic aptitude scores (test performance) and the achievement test scores (criterion performance), are taken at different times, often months apart
- Determines who is likely to succeed or fail in a certain course, board examination, or occupation
Predictive Validity
- Relates test scores to present standing on another valued measure, called the criterion
- Two different measures, namely the scholastic aptitude scores (test performance) and the achievement test scores (criterion performance), are taken at the same time
- Estimates present status or current skills in the actual setting (see the sketch after this card)
Concurrent Validity
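A minimal Python sketch of how a criterion-related validity coefficient is obtained (all scores below are invented for illustration); the arithmetic is identical for predictive and concurrent validity, and only the timing of the criterion measure differs:

```python
# Criterion-related validity as a correlation between test performance
# and criterion performance. Hypothetical scores; whether the result is
# "predictive" or "concurrent" depends on when the criterion was measured.
import numpy as np

aptitude = np.array([75, 82, 68, 90, 71, 85, 60, 78])     # test performance
achievement = np.array([70, 80, 65, 92, 74, 83, 58, 75])  # criterion performance

# Pearson correlation of the paired scores is the validity coefficient.
r = np.corrcoef(aptitude, achievement)[0, 1]
print(f"validity coefficient r = {r:.2f}")
```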
- The consistency with which a test measures what it is intended to measure
- Highly reliable scores across different test instruments are considered accurate, reproducible, and generalizable
- A test may be reliable but not valid, but a valid test is always reliable
Reliability
Different Methods in Estimating Reliability
- Test-retest Reliability
- Inter-rater Reliability
- Parallel-forms Reliability
- Split-half Reliability
- It determines the consistency of a test across time
- The test is administered twice, at two different points in time
- The two sets of scores are correlated; the higher the correlation, the higher the reliability (see the sketch after this card)
Test-retest reliability
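A minimal sketch of the correlation step, with hypothetical scores; the same paired-score correlation also underlies the inter-rater and parallel-forms estimates below, with the two score lists coming from two raters or two equivalent forms instead of two occasions:

```python
# Test-retest reliability: administer the same test twice and correlate
# the paired scores. Scores below are hypothetical.
import math

first = [12, 15, 9, 18, 14, 11, 16, 13]    # scores at time 1
second = [13, 14, 10, 17, 15, 10, 16, 12]  # same students at time 2

def pearson_r(x, y):
    """Pearson correlation computed from its definition."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

print(f"test-retest reliability r = {pearson_r(first, second):.2f}")
```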
- It determines the consistency of the raters
- Compares and correlates the scores given by two or more raters/judges
Inter-rater reliability
- Determines the consistency of the test content
- Compares two different tests with the same content, quality, and difficulty level that are administered to the same person
- Paired observations are correlated
Parallel-forms reliability
- It determines the consistency of the test results across items
- The test is divided into odd- and even-numbered items, producing two scores for each student
- The two scores are correlated, which provides a measure of internal consistency (see the sketch after this card)
Split-half reliability
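A minimal sketch with hypothetical 0/1 item responses; stepping the half-test correlation up to full test length with the Spearman-Brown formula is standard practice, though the card above stops at the half-test correlation:

```python
# Split-half reliability: score the odd and even items separately,
# correlate the two half scores, and (conventionally) apply the
# Spearman-Brown step-up. Responses below are hypothetical.
import numpy as np

# Rows = students, columns = items scored 1 (correct) or 0 (wrong).
responses = np.array([
    [1, 1, 0, 1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 0, 1],
    [1, 1, 0, 1, 1, 1, 0, 1],
])

odd_scores = responses[:, 0::2].sum(axis=1)   # items 1, 3, 5, 7
even_scores = responses[:, 1::2].sum(axis=1)  # items 2, 4, 6, 8

r_half = np.corrcoef(odd_scores, even_scores)[0, 1]
r_full = 2 * r_half / (1 + r_half)            # Spearman-Brown step-up
print(f"half-test r = {r_half:.2f}, split-half reliability = {r_full:.2f}")
```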
- Estimate the internal consistency of a test at the item (test question) level
- They measure reliability for tests with binary (dichotomous) items
- KR-20 applies to items of varying difficulty, each scored as correct or incorrect
- If the test items have similar difficulty, the Kuder-Richardson Formula 21 is used
- KR-20 and KR-21 scores range from 0 to 1, where 0 means not reliable and 1 means perfectly reliable; a score above 0.5 is usually considered reliable (see the sketch after this card)
Kuder Richardson Formula 20 (KR-20) and Kuder Richardson Formula 21 (KR-21)
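A minimal sketch of both formulas on hypothetical dichotomous data; k is the number of items, p the proportion answering each item correctly, and the variance is taken over students’ total scores:

```python
# KR-20 and KR-21 on hypothetical right/wrong (1/0) responses.
import numpy as np

# Rows = students, columns = items scored 1 (correct) or 0 (wrong).
X = np.array([
    [1, 1, 0, 1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 0, 1],
    [1, 1, 0, 1, 1, 1, 0, 1],
])
k = X.shape[1]
totals = X.sum(axis=1)
var = totals.var(ddof=1)   # variance of students' total scores
p = X.mean(axis=0)         # proportion correct per item (item difficulty)

# KR-20: items may vary in difficulty, so each item's p*q enters the sum.
kr20 = (k / (k - 1)) * (1 - (p * (1 - p)).sum() / var)

# KR-21: assumes all items have similar difficulty, so only the mean
# total score is needed instead of per-item difficulties.
m = totals.mean()
kr21 = (k / (k - 1)) * (1 - m * (k - m) / (k * var))

print(f"KR-20 = {kr20:.2f}, KR-21 = {kr21:.2f}")  # both in [0, 1]
```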
measures the internal consistency of a set of test items, including items scored on a range of values rather than right/wrong (see the sketch after this card)
Cronbach’s alpha Formula
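A minimal sketch on hypothetical 1–5 ratings; the formula compares the sum of per-item variances with the variance of the total scores:

```python
# Cronbach's alpha on hypothetical Likert-type (1-5) item ratings.
import numpy as np

# Rows = students, columns = items rated on a 1-5 scale.
X = np.array([
    [4, 5, 3, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 2, 3, 3],
    [4, 4, 4, 5],
])
k = X.shape[1]
item_vars = X.var(axis=0, ddof=1).sum()  # sum of per-item variances
total_var = X.sum(axis=1).var(ddof=1)    # variance of total scores

alpha = (k / (k - 1)) * (1 - item_vars / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")
```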
- The extent to which a test is practical to use
- The test should be administered with ease, clarity, and uniformity; instructions must be clear; scoring should be simple and clear; an answer key must be available; correct interpretation and application of test results should be observed
Usability