Monday, January 12, 2009

Examiner (Part IV)

This is for YH, who added the first comment to the previous post. He asked that question which people outside the Dunwich-esque community of educators always innocently ask: "Do you mark for good answers or do you mark for 'good' points?"

Let me explain the difference.

Consider a game of tennis. The first player serves; the ball bounces once and flashes past her opponent. Score: 15-love. The second player questions the point. After some inspection, the umpire decides the point is 'good'; that is, the point will be counted, since it meets the definition of what a point is. This is a 'good' point.

On the other hand, let us suppose that the return of service is a bullet that no reasonable human being stands any chance of intercepting. But the player isn't reasonable. Putting in a superhuman effort, she breaks the record for a 10-metre sprint, retrieves the ball, and somehow sends it approximately back the way it came. On the way back, the ball clips the net, does a wheelie along the net cord, somehow flies in an arc that takes it on a great-circle route over a post, and takes a long time looping back into the opponent's court. Whereupon the opponent, having plenty of time to prepare, puts it away for a point against the first player.

In this latter case, according to USTA rules 2, 24, 25 and 26, the first player made a good return (a good 'answer'), but failed to score a point. It was also great entertainment from the spectators' point of view. The journalists are enthusiastic, thus ensuring that the spectacular return is forever enshrined in tennis lore. But sorry, no point.

I have seen many such scripts in my time as an examiner. The answers in all these cases meet all the criteria for good answers to their respective questions. But the marking scheme is fixed, and if you give a mark for something that is not in the marking scheme, it either goes to adjudication or the examiner is told that he himself has made a mistake.
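
To make that concrete, here is a minimal sketch in Python, entirely my own invention rather than any board's real system: marks are awarded only for points listed in the scheme, so an answer that is 'good' in any other sense simply does not register.

    # A toy marking scheme: point descriptions mapped to marks.
    # The scheme and the answer below are invented for illustration.
    MARK_SCHEME = {
        "mentions covalent bonding": 1,
        "gives a correct example": 1,
        "explains electron sharing": 2,
    }

    def mark(points_made):
        """Award marks only for points that appear in the scheme."""
        return sum(marks for point, marks in MARK_SCHEME.items()
                   if point in points_made)

    # A spectacular answer that makes one scheme point and one
    # unanticipated (but perfectly valid) point:
    answer = {"draws an elegant analogy to tennis", "gives a correct example"}
    print(mark(answer))  # 1 -- the brilliant analogy earns nothing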

This is for a simple reason. Some questions just have too many answers. It would take too long to adjudicate them fairly to a 99.9% level of confidence.

A question such as, "Discuss the claim that some areas of knowledge are discovered and others are invented," requiring an answer of 1200-1600 words (for example), may be answered in a practically infinite number of ways. It is therefore certain that of those many ways, some will be perfectly good answers but fail to score the points one might think they deserve. Even a question like, "What does the phrase 'basic compound' mean?" is likely to produce many good answers, some of which will fail to score any points, depending on the subject being examined.

In a perfect examination system, the examination would have 100% validity (i.e., it only tests what it's supposed to test), 100% reliability (i.e., it always gives the same score for the same level of performance) and 100% utility (i.e., it is easy to administer and grade, and serves its ostensible purpose faithfully).
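
By way of a rough illustration (my own simplification, not any board's official definition), reliability can be approximated as the agreement between two independent markers scoring the same scripts:

    # Toy reliability check: exact-agreement rate between two markers
    # scoring the same eight scripts. All the numbers are invented.
    marker_a = [5, 7, 6, 8, 4, 9, 7, 6]
    marker_b = [5, 7, 6, 8, 5, 9, 7, 7]

    agreement = sum(a == b for a, b in zip(marker_a, marker_b)) / len(marker_a)
    print(f"Exact agreement: {agreement:.0%}")  # 75% -- well short of the ideal 100%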

But there are no perfect exams. We have to make do with reasonably good exams that work at anywhere from an 80% to a 95% confidence level. And when we no longer have any idea what use the material being tested has in real life, we have reached the point at which we must consider changing the examination.


2 Comments:

Blogger Unknown said...

Thanks for the wonderful reply. I was under the impression IB markschemes are often open to interpretation.

I actually had this quote in mind while writing the previous comment:
"The intuitive mind is a sacred gift and the rational mind is a faithful servant. We have created a society that honors the servant and has forgotten the gift."

Oh, if only we knew as well as he did.

Wednesday, January 14, 2009 7:01:00 am  
Blogger Unknown said...

I suppose everyone has different aims, interests, and reasons according to how they are vested, which are often very different from what a third party would expect.

This creates misconceptions, which manifest themselves in mindsets such as treating exam results as the absolute measure of intellectual ability.

I'm not saying they are a bad gauge of ability, but that there might be a degree of random error from the examinee's point of view, based on how well his answer conforms to the markscheme.

Too bad people tend to give themselves too much credit for getting a better-than-expected (luck?) result, and too little when they actually do badly.

And it's back to square one.
Sorry, it's 3 am and there is no coffee in sight.

Wednesday, January 14, 2009 7:59:00 am  
