Measurement and Evaluation in Education

Almost all commissions on education, as well as the National Policy on Education and national curriculum planners, have stressed the need for more continuous and comprehensive evaluation of students in order to make sounder judgments about students’ learning and growth.

The Concepts of Test and Measurement

These concepts are often used interchangeably by practitioners, as if they had the same meaning. This is not so. As a teacher, you should be able to distinguish one from the other and use each at the appropriate time when discussing issues in the classroom.

a. Measurement:

The process of measurement, as the name implies, involves carrying out actual measurements in order to assign a quantitative value to a quality, e.g. what is the length of the chalkboard? Determining this must be done physically. Measurement is therefore the process of assigning numerals to objects, quantities, or events in order to give quantitative meaning to such qualities.

In the classroom, to determine a child’s performance, you need to obtain quantitative measures of the child’s individual scores. If the child scores 80 in Mathematics, that is the only interpretation you should give it: you cannot yet say he has passed or failed. Measurement stops at ascribing the quantity; it does not make a value judgment on the child’s performance.

b. Assessment:

Assessment is a fact-finding activity that describes conditions that exist at a particular time. Assessment often involves measurement to gather data. However, it is the domain of assessment to organize the measurement data into interpretable forms on a number of variables. Assessment in an educational setting may describe the progress students have made toward a given educational goal at a point in time. However, it is not concerned with the explanation of the underlying reasons and does not proffer recommendations for action.

There may, however, be some implied judgment as to the satisfactoriness or otherwise of the situation. In the classroom, assessment refers to all the processes and products used to describe the nature and extent of pupils’ learning, taking cognizance of how far such learning corresponds with the objectives of instruction. Some educationists, in contrasting assessment with evaluation, hold that evaluation is generally used when the subject is not a person or group of persons but the effectiveness or otherwise of a course, programme, or method of teaching, while assessment is generally used for measuring or determining personal attributes (the totality of the student, the environment of learning, and the student’s accomplishments).

A number of instruments are often used to obtain measurement data from various sources. These include tests, aptitude tests, inventories, questionnaires, observation schedules, etc. All these sources give data that are organized to show evidence of change and the direction of that change. A test is thus one of the assessment instruments; it is used to obtain quantitative data.

c. Evaluation:

Evaluation adds the ingredient of value judgment to assessment. It is concerned with the application of its findings and implies some judgment of the effectiveness, social utility, or desirability of a product, process, or progress in terms of carefully defined and agreed-upon objectives or values. Evaluation often includes recommendations for constructive action. Thus, evaluation is a qualitative measure of the prevailing situation. It calls for evidence of the effectiveness, suitability, or goodness of the programme.

The Purposes of Evaluation

According to Oguniyi (1984), educational evaluation is carried out from time to time for the following purposes:

(i) to determine the relative effectiveness of the programme in terms of students’ behavioral output;

(ii) to make reliable decisions about educational planning;

(iii) to ascertain the worth of time, energy, and resources invested in a programme;

(iv) to identify students’ growth or lack of growth in acquiring desirable knowledge, skills, attitudes, and societal values;

(v) to help teachers determine the effectiveness of their teaching techniques and learning materials;

(vi) to help motivate students to want to learn more as they discover their progress or lack of progress in given tasks;

(vii) to encourage students to develop a sense of discipline and systematic study habits;

(viii) to provide educational administrators with adequate information about teachers’ effectiveness and school needs;

(ix) to acquaint parents or guardians with their children’s performances;

(x) to identify problems that might hinder or prevent the achievement of set goals;

(xi) to predict the general trend in the development of the teaching-learning process;

(xii) to ensure economical and efficient management of scarce resources;

(xiii) to provide an objective basis for determining the promotion of students from one class to another as well as the award of certificates;

(xiv) to provide a just basis for determining at what level of education the possessor of a certificate should enter a career.

Types of Evaluation

There are two main levels of evaluation viz: programme level and student level. Each of the two levels can involve either of the two main types of evaluation – formative and summative at various stages. Programme evaluation has to do with the determination of whether a programme has been successfully implemented or not. Student evaluation determines how well a student is performing in a programme of study.

a. Formative Evaluation:

The purpose of formative evaluation is to find out whether, after a learning experience, students are able to do what they were previously unable to do. Its ultimate goal is usually to help students perform well at the end of a programme.

b. Summative Evaluation:

Summative evaluation often attempts to determine the extent to which the broad objectives of a programme have been achieved (e.g. public examinations such as the SSSCE (NECO or WAEC), promotion examinations, Grade Two, and NABTEB examinations). It is concerned with the purposes, progress, and outcomes of the teaching-learning process. Summative evaluation is judgemental in nature and often carries a threat with it, since the student may have no knowledge of the evaluator and failure has far-reaching effects on the student. However, it is more objective than formative evaluation.

Tests in the Classroom

What is a Test?

To understand the concept of “test” you must recall the earlier definitions of “assessment” and “evaluation”. Note that we said people use these terms interchangeably. But in the real sense, they are not the same. Tests are detailed or small-scale tasks carried out to identify the candidate’s level of performance and to find out how far the person has learned what was taught or is able to do what he/she is expected to do after teaching.

Tests are carried out in order to measure the efforts of the candidate and characterize the performance. Whenever you are tested, as you will be later in this course, it is to find out what you know, what you do not know, or even what you partially know.

The test is therefore an instrument for assessment. Assessment is broader than testing, although the term is sometimes used to mean tests, as in “I want to assess your performance in the course”. Some even say they want to assess students’ scripts when they really mean they want to mark the scripts. Assessment and evaluation are closely related, although some fine distinctions have been made between the two terms.

Evaluation may be said to be the broadest of the three. It involves the evaluation of a programme at the beginning of and during a course (formative evaluation), as well as the evaluation of a programme or course at its end (summative evaluation). Testing is part of assessment, but assessment is more than testing.

Tests involve the measurement of candidates’ performance, while evaluation is a systematic way of assessing the success or failure of a programme. Evaluation involves assessment, but not all assessments are evaluations; some are reappraisals of a thing, a person, a life, etc.

Aims and Objectives of Classroom Tests

In this section, we will discuss the aims and objectives of classroom tests. But before we do, what do we mean by classroom tests? These are tests designed by the teacher to determine or monitor the progress of the students or pupils in the classroom. The term may also be extended to all examinations conducted in a classroom situation. Whichever interpretation is given, classroom tests have the following aims and objectives:

  1. Inform teachers about the performance of the learners in their classes.
  2. Show the progress that the learners are making in the class.
  3. Compare the performance of one learner with another so as to classify them: weak learners who need more attention, average learners, or strong/high achievers who can be used to assist the weak learners.
  4. Promote a pupil or student from one class to another.
  5. Reshape teaching items, especially where tests show that certain items are poorly learned either because they are poorly taught or difficult for the learners to learn. Reshaping teaching items may involve resetting learning objectives, teaching objectives, sequencing teaching items, or grading the items being taught for effective learning.
  6. For certification – we test in order to certify that a learner has completed the course and can leave. After such tests or examinations, certificates are issued.
  7. Conduct research – sometimes we conduct class tests for research purposes. We want to find out whether a particular method, technique, or approach is effective. In this case, we test the students before using the technique (pre-test). We then teach one group of a comparable level using the technique (the experimental group) and teach another group of a comparable level using a different technique (the control group). Later, we compare the outcomes (results) of the experimental and control groups to find out the effect of the technique on the performance of the experimental group.
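The pre-test/post-test comparison described in item 7 can be sketched in a few lines of Python. This is a minimal illustration only; the student scores and group sizes are hypothetical, and a real study would also apply a test of statistical significance.

```python
from statistics import mean

# Hypothetical pre-test and post-test scores (out of 100) for two groups
# of comparable level; all numbers are illustrative only.
experimental_pre = [45, 50, 38, 60, 55]   # taught with the new technique
experimental_post = [70, 72, 65, 80, 78]
control_pre = [44, 52, 40, 58, 56]        # taught with the usual technique
control_post = [55, 60, 50, 66, 63]

def mean_gain(pre, post):
    """Average improvement from pre-test to post-test."""
    return mean(p2 - p1 for p1, p2 in zip(pre, post))

exp_gain = mean_gain(experimental_pre, experimental_post)
ctl_gain = mean_gain(control_pre, control_post)
print(f"Experimental group mean gain: {exp_gain:.1f}")
print(f"Control group mean gain:      {ctl_gain:.1f}")
```

If the experimental group shows a clearly larger mean gain than the control group, that is evidence (not proof) that the technique helped.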

Types of Tests

Types of tests can be determined from different perspectives. You can look at types of tests in terms of whether they are discrete or integrative. Discrete point tests are expected to test one item or skill at a time, while integrative tests combine various items, structures, and skills into one single test.

a. Discrete Point Tests:

As defined above, a discrete point test measures or tests one item, structure, skill, or idea at a time. There are many examples of discrete point tests. In a language test, a discrete point test may test the meaning of a particular word, a grammatical item, the production of a sound (e.g. long or short vowels), filling in a gap with a specific item, and so on.

In a mathematics test, it may test knowledge of a particular multiplication table. Let us give a concrete example. From the words lettered A–D, choose the word that has the same vowel sound as the one represented by the underlined letters.

b. Integrative Tests:

As you have learned earlier on, tests can be integrative, that is, testing many items together in an integrative manner. In integrative tests, various items, structures, discourse types, pragmatic forms, construction types, skills, and so on, are tested simultaneously.

Popular examples of integrative tests are essay tests, cloze tests, reading comprehension tests, working on a mathematical problem that requires the application of many skills, or construction tasks that require different skills and competencies.

Other integrative tests are

a. Essay Questions:

Give five main characteristics of traditional grammar. Illustrate each characteristic with specific examples.

Reading Comprehension Test

Read the passage below and answer the following questions: (A passage on strike)

The second perspective for identifying different kinds of tests is by the aim and objectives of the test. For example, if the test is for recording the continuous progress of the candidate, it is referred to as a continuous assessment test. Some of the tests that are for specific purposes are listed below. The purpose for which the test is constructed is also indicated.

i. Placement test: for placing students at a particular level, school, or college.

ii. Achievement tests: for measuring the achievement of a candidate in a particular course either during or at the end of the course.

iii. Diagnostic tests: for determining the problems of a student in a particular area, task, course, or programme. Diagnostic tests also bring out areas of difficulty of a student for the purpose of remediation.

iv. Aptitude tests: are designed to determine the aptitude of a student for a particular task, course, programme, job, etc.

v. Predictive tests: designed to be able to predict the learning outcomes of the candidate. A predictive test is able to predict or forecast that if the candidate is able to pass a particular test, he/she will be able to carry out a particular task, skill, course, action, or programme.

vi. Standardized tests: any of the above-mentioned tests that have been tried out with large groups of individuals, whose scores provide standard norms or reference points for interpreting the score of anybody who subsequently takes the test. Standardized tests are administered in a standard manner under uniform conditions. They are tested and re-tested and have been proven to produce valid and reliable scores.

vii. Continuous assessment tests are designed to measure the progress of students in a continuous manner. Such tests are taken intermittently and students’ progress is measured regularly. The cumulative scores of students in continuous assessment often form part of the overall assessment of the students in the course or subject.

viii. Teacher-made tests are tests produced by teachers for particular classroom use. Such tests may not be used far and wide but are often designed to meet the particular learning needs of the students.

Characteristics of a Good Test

A test is not something that is done in a careless or haphazard manner. There are some qualities that are observed and analyzed in a good test. Some of these are discussed under the various headings in this section. Indeed, whether the test is a diagnostic or achievement test, the characteristic features described here are basically the same.

i. A good test should be valid: by this we mean it should measure what it is supposed to measure or be suitable for its intended purpose. Test validity will be discussed fully later.

ii. A good test should be reliable: reliability simply means measuring what it purports to measure consistently. On a reliable test, you can be confident that someone will get more or less the same score on different occasions or when it is used by different people. 

iii. A good test must be capable of accurately measuring the academic ability of the learner: a good test should give a true picture of the learner. It should point out clearly the areas that have been learned and those that have not. All things being equal, a good test should separate the good from the poor: a good student should not fail a good test while a poor student passes with flying colors.

iv. A good test should combine both discrete point and integrative test procedures for a fuller representation of teaching-learning points. The test should focus on both discrete points of the subject area as well as the integrative aspects. A good test should integrate all various learners’ needs, a range of teaching-learning situations, and objective and subjective items.

v. A good test must represent teaching-learning objectives and goals: the test should be conscious of the objectives of learning and the objectives of testing. For example, if the objective of learning is to master a particular skill and apply the skill, testing should be directed toward the mastery and application of the skill.

vi. Test materials must be properly and systematically selected: the test materials must be selected in such a way that they cover the syllabus, teaching course outlines, or the subject area. The materials should be of mixed difficulty levels (not too easy or too difficult) which represent the specific targeted learners’ needs that were identified at the beginning of the course.

vii. Variety is also a characteristic of a good test. This includes a variety of test types (multiple-choice tests, subjective tests, and so on) and a variety of tasks within each test: writing, reading, speaking, listening, re-writing, transcoding, solving, organizing and presenting extended information, interpreting, blank filling, matching, extracting points, distinguishing, identifying, constructing, producing, designing, etc. In most cases, both the tasks and the materials used in the tests should be true to the real-life situation for which the learner is being trained.

Construction of Tests in the Classroom

Teacher-made tests are indispensable in evaluation, as they are handy for assessing the degree of mastery of the specific units taught by the teacher. The principles behind the construction of the different categories of tests mentioned above are essentially the same.

Preparation of the Test Blueprint

The test blueprint is a table showing the number of items that will be asked under each topic of the content and each process objective. This is why it is often called a table of specifications.

Thus, there are two dimensions to the test blueprint, the content and the process objectives. As mentioned earlier, the content consists of a series of topics from which the competence of the pupils is to be tested. These are usually listed on the left-hand side of the table. The process objectives or mental processes are usually listed on the top row of the table.

The process objectives are derived from the behavioral objectives stated for the course initially. They are the various mental processes involved in achieving each objective. Usually, there are about six of these, as listed under the cognitive domain, viz: Knowledge, Comprehension, Application, Analysis, Synthesis, and Evaluation.

(i) Knowledge or Remembering

This involves the ability of the pupils to recall specific facts, terms, vocabulary, principles, concepts, and generalizations from memory. This may involve the teacher asking pupils to give the date of a particular event, the capital of a state, or reciting multiplication tables.

(ii) Comprehension and Understanding

This tests the ability of the pupils to translate, infer, compare, explain, interpret, or extrapolate what is taught. The pupils should be able to identify similarities and differences among objects or concepts; predict or draw conclusions from given information; and describe or define a given set of data, e.g. What is democracy? Explain the role of the chloroplast in photosynthesis.

(iii) Application

Here you want to test the ability of the students to use principles, rules, and generalizations in solving problems in novel situations, e.g. how would you recover table salt from water?

(iv) Analysis

This is to analyze or break an idea into its parts and show that the student understands its relationships.

(v) Synthesis

The student is expected to synthesize, i.e. put elements together to form a new whole, and produce a unique communication, plan, or set of abstract relations.

(vi) Evaluation

The student is expected to make judgments based on evidence.
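A test blueprint crossing content topics with the six process objectives above can be sketched as a small two-dimensional table. The topics and item counts below are hypothetical, purely for illustration of how the blueprint is laid out and totaled.

```python
# A minimal test blueprint (table of specifications): rows are content
# topics, columns are the six process objectives of the cognitive domain.
# Topics and item counts here are hypothetical, for illustration only.
PROCESSES = ["Knowledge", "Comprehension", "Application",
             "Analysis", "Synthesis", "Evaluation"]

blueprint = {
    "Fractions":   [3, 2, 2, 1, 1, 1],
    "Percentages": [2, 2, 2, 1, 0, 1],
    "Ratio":       [2, 1, 1, 1, 1, 0],
}

# Print the table with row and column totals.
print(f"{'Topic':<12}" + "".join(f"{p[:5]:>7}" for p in PROCESSES) + f"{'Total':>7}")
for topic, counts in blueprint.items():
    print(f"{topic:<12}" + "".join(f"{c:>7}" for c in counts) + f"{sum(counts):>7}")
col_totals = [sum(col) for col in zip(*blueprint.values())]
print(f"{'Total':<12}" + "".join(f"{t:>7}" for t in col_totals) + f"{sum(col_totals):>7}")
```

The row totals show how many items each topic receives, the column totals show how many items test each mental process, and the grand total is the length of the test.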

Scales of Measurement

Any test can be useful only when it is reliable, and it should measure only the attribute or characteristic for which it has been constructed. Tools for measurement have been needed since ancient times, and we need them in our daily life.

Measurement has the following four chief levels:

1. Nominal Scale

This is the lowest level of measurement. Some people also call it the classification level. Under this scale, the measured objects or events are classified into separate groups on the basis of certain attributes, and each group is given a separate name, number, or code for easy identification. The chief feature of such a group is that all elements or individuals within it are similar to each other, but entirely different when compared to those of another group.

This feature of the group is called internal homogeneity. For example, the cricket teams of Sri Lanka and Australia may be given different-colored dresses for easy identification, with their dresses marked with the letters S and A respectively.

2. Ordinal Scale

In the arrangement of scales, the ordinal scale comes second from the bottom. In this scale, objects, individuals, events, characteristics, or responses are arranged hierarchically, in ascending or descending order, on the basis of certain attributes.

After that, they are given ranks. Giving first, second or third position or rank to students on the basis of their scores, giving preference in employment to candidates on the basis of eligibility and experience, awarding trophies to players on the basis of their performance, selecting Miss World or Miss Universe on the basis of beauty, selecting the best industrialist, selecting professors for the college proctorial board and arranging them in hierarchical order in view of their administrative accomplishment, arranging fruits on the basis of their taste and flavor, etc. are some of the illustrations of this scale.

3. Interval Scale

This is the third level of measurement. This scale endeavors to do away with the limitations of the above two scales. Under this scale, the difference between any two classes, individuals, or objects is expressed in scores, and equal differences in scores represent equal differences in the attribute being measured.

The lack of an exact zero point is a shortcoming of this scale, due to which measurement on this scale is relative, not absolute; that is, if a student obtains zero marks on this scale, it should not be concluded that the student is fully ignorant of the given subject.

4. Ratio Scale

This is the highest level of measurement. This scale comprises the features of all the other scales. The presence of an exact or true zero point is its chief feature. This zero point is not arbitrary; rather, it corresponds to a zero amount of the attribute concerned. Physical measurement always has an absolute zero point, whether in meters, kilometers, grams, liters, or millimeters.

Measurement of height, length, weight, or distance starts from the zero point. In the ratio scale, the true zero point is considered the initial point of the scale. So we can find the ratio between the distances of any two places and, on that basis, say with certainty how many times more distant one place is than another.
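The interval/ratio distinction can be made concrete with a small sketch. The temperatures and distances below are illustrative only: distance (a ratio scale, true zero) supports statements like "twice as far", while temperature in Celsius (an interval scale, arbitrary zero) does not support "twice as hot".

```python
# Ratios are meaningful only on a ratio scale (true zero point).
def ratio(a, b):
    return a / b

# Ratio scale: distance in km starts from an absolute zero, so the
# statement "place A is twice as distant as place B" is valid.
print(ratio(40, 20))  # 2.0

# Interval scale: 0 degrees Celsius is an arbitrary point, not "no heat".
# Converting the same temperatures to Kelvin (a ratio scale with a true
# zero) shows that 40 C is not really "twice as hot" as 20 C.
def celsius_to_kelvin(c):
    return c + 273.15

print(ratio(celsius_to_kelvin(40), celsius_to_kelvin(20)))  # about 1.07, not 2
```

The same caution applies to test scores on an interval scale: a score of 80 is not "twice as much knowledge" as a score of 40.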

Qualities of a Good Test

A good test should possess the following qualities:

a. Validity

A test is considered valid when it measures what it is supposed to measure. Types of validity are:

b. Operational Validity

A test will have operational validity if the tasks required by the test are sufficient to evaluate the definite activities or qualities.

c. Predictive Validity

A test has predictive validity if scores on it predict future performance.

d. Content Validity

If the items in the test constitute a representative sample of the total course content to be tested, the test can be said to have content validity.

e. Construct Validity

Construct validity involves explaining the test scores psychologically. A test is interpreted in terms of numerous research findings.


f. Reliability

A test is considered reliable if, when it is taken again by the same students under the same circumstances, the average score is almost constant, provided the time between the test and the retest is of reasonable length. Methods of determining reliability are:

Test-retest method

A test is administered to the same group twice, with a short interval between administrations. The scores are tabulated and the correlation between them is calculated. The higher the correlation, the more reliable the test.

Split-half method

The scores of the odd and even items are taken and the correlation between the two sets of scores is determined.

Parallel form method

• Reliability is determined using two equivalent forms of the same test content.

• These prepared tests are administered to the same group one after the other.

• The test forms should be identical with respect to the number of items, content, difficulty level, etc.

• The correlation between the two sets of scores obtained by the group on the two tests is then determined.

• The higher the correlation, the more reliable the test.


g. Objectivity

A test is said to be objective if it is free from personal bias in the interpretation of its items as well as in the scoring of the responses.

Item Analysis

The success of a test depends on two factors: first, how successfully the test measures the prescribed objectives, and second, whether each item in the test can discriminate between bright and dull students. If a test is not capable of measuring the given objectives meaningfully and of discriminating among students, it cannot be called a successful test.
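The discriminating power of a single item can be sketched with two standard indices: the difficulty index (proportion of students answering correctly) and the discrimination index (upper-group minus lower-group proportion correct). The scores below are hypothetical, and using the top and bottom scorers to form the comparison groups (often the top and bottom 27%) is a common rule of thumb, not something prescribed by the text.

```python
# Item analysis sketch for one item. Each entry: (total test score,
# 1 if the student answered THIS item correctly, else 0), for ten
# hypothetical students.
students = [
    (92, 1), (85, 1), (80, 1), (74, 1), (70, 0),
    (60, 1), (55, 0), (48, 0), (40, 0), (35, 0),
]

# Rank students by total score and take the top and bottom groups
# (with only ten students we simply use the top and bottom three).
ranked = sorted(students, key=lambda s: s[0], reverse=True)
upper, lower = ranked[:3], ranked[-3:]

# Difficulty index: proportion of all students who got the item right.
difficulty = sum(correct for _, correct in students) / len(students)

# Discrimination index: upper-group minus lower-group proportion correct.
discrimination = (sum(c for _, c in upper) / len(upper)
                  - sum(c for _, c in lower) / len(lower))

print(f"Difficulty index:     {difficulty:.2f}")      # 0 = very hard, 1 = very easy
print(f"Discrimination index: {discrimination:.2f}")  # near +1 = discriminates well
```

An item that bright students get right and weak students get wrong (discrimination near +1) is doing its job; an item with discrimination near zero or negative should be revised or discarded.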

In Conclusion

Measurement refers to the process of assigning a numerical index to an object in a meaningful and consistent manner. Evaluation occurs when a learner’s score is compared with the scores of other learners and a judgment is made about the results.


Akanne Academy is an online learning platform that provides educational lecture materials, software tutorials, technological skills training, digital products, etc.
