Findings: Tweaking the Meritocracy

Suppose you had a meritocratic state and wanted to adjust the academic inclinations of the populace. The best way would be to adjust the examination system so that study in specific areas is rewarded by perceived economic or social advantage, normally conveyed by that well-established proxy, aggregate test scores.

So let's look at Atlantis. This is a state in which the young, just before they enter their teenage years, are tested to the brink of destruction (but mostly, not beyond) by a test called the Phased Stress Loading Exercise. In this test, the candidates are subjected to four kinds of load.

The first kind is linguistic; it is a double-weighted test (at least at the raw score level) designed to evaluate a candidate's competence in the language that he or she will have no choice but to use in almost every public area of life (and in 60% of the cases, every private area of life too) for the foreseeable future.

The second is pseudocultural; it is also double-weighted at the raw score level, but it evaluates a candidate's competence in a language that is sometimes touted as a future trade opportunity, and which is otherwise used by less than 40% of the population. The double-weighting here is a political tool which was used to eliminate that language's dialect rivals decades ago, thus severing links between the modern population and its ancestors.

The third kind is mathematical; this is supposedly easy to test, and so it is. However, the test itself has been made to discriminate better simply because it doesn't have double-weighting at the raw level and yet is probably more important than test two. This of course has inspired furious rants from those who feel discriminated against — which is what the test is designed to do.

The last kind is scientific; this is arbitrary, as most tests of scientific ability are at that age. What it does test is scientific knowledge at a simplified level, coupled with capacity for absorbing advanced concepts at an abstracted level. This is somehow supposed to be correlated strongly with future performance in science. However, this is not true except in the general sense that students who are good at memorizing things at the age of 12 are often good at doing that at age 18.

Eventually, the final aggregate score is calculated. This is the sum of the t-scores for each test. The t-scores are calculated like this:

1. Take the candidate's raw score for any test, r, and find the difference between that and the average score of the population, m.
2. Divide this difference, r-m, by the standard deviation (roughly, the spread), z.
3. Multiply this number, (r-m)/z, by 10 and add 50.
4. This gives the total test score for that test, t = 10[(r-m)/z]+50.
5. Add up all the t-scores for the four tests.
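The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual Atlantean procedure: the function names `t_scores` and `aggregate_scores`, the choice of the population standard deviation for z, and the five-candidate cohort are all made up for the example.

```python
from statistics import mean, pstdev


def t_scores(raw_scores):
    """Return t-scores for one test: t = 10 * (r - m) / z + 50."""
    m = mean(raw_scores)    # average raw score of the cohort
    z = pstdev(raw_scores)  # assumption: population standard deviation
    if z == 0:
        raise ValueError("every candidate has the same score; spread is zero")
    return [10 * (r - m) / z + 50 for r in raw_scores]


def aggregate_scores(tests):
    """Sum each candidate's t-scores over all tests.

    `tests` holds one list of raw scores per test, with candidates
    in the same order in every list.
    """
    per_test = [t_scores(scores) for scores in tests]
    return [sum(candidate) for candidate in zip(*per_test)]


# Hypothetical raw scores: four tests, five candidates each.
cohort = [
    [120, 140, 150, 160, 180],  # test 1 (double-weighted raw marks)
    [100, 130, 150, 170, 200],  # test 2 (double-weighted raw marks)
    [40, 55, 70, 80, 95],       # test 3
    [50, 60, 65, 75, 90],       # test 4
]
totals = aggregate_scores(cohort)
print([round(t, 1) for t in totals])
```

Because each test's t-scores average 50 by construction, the aggregates always average 200. Only the spread of candidates around that figure changes.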

This gives the candidate an aggregate score which will be higher if the candidate does well in every test and everyone else does worse (and especially if everyone else does a lot worse with a relatively narrow spread). Note that after this process is completed, tests 1 and 2 will probably no longer be double-weighted, since the process 'normalizes' things in a way: any constant multiplier applied to the raw scores cancels out when the difference is divided by the spread.
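A quick way to check the point about double-weighting is to double a set of raw scores and compare the resulting t-scores. The sketch below assumes the population standard deviation and uses invented marks; the names `single` and `double` are hypothetical.

```python
from statistics import mean, pstdev


def t_scores(raw_scores):
    """t = 10 * (r - m) / z + 50, with z as the population standard deviation."""
    m = mean(raw_scores)
    z = pstdev(raw_scores)
    return [10 * (r - m) / z + 50 for r in raw_scores]


single = [40, 55, 60, 75, 90]       # hypothetical single-weighted raw marks
double = [2 * r for r in single]    # the same performances, double-weighted

t_single = t_scores(single)
t_double = t_scores(double)

# Doubling scales r - m and z by the same factor, so the ratio is unchanged.
unchanged = all(abs(a - b) < 1e-9 for a, b in zip(t_single, t_double))
print(unchanged)
```

Both r - m and z double, so their ratio stays the same and the raw weighting has no effect on the t-score.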

How do you tweak this?

Well, if you set a very tough paper, you will see record aggregate scores among the top candidates and be able to separate the best from the less good by a larger margin. However, it is probably easier to just adjust the marks awarded. After all, people will complain, "The paper was so tough that even I wouldn't be able to answer it!" Which of course says more about these people than it does about the paper.

A simpler paper is easier to grade, and since we haven't got enough competent markers in most populations, that's a bonus in time saved: less adjudication or moderation will be required. But you'll see lower aggregate scores at the top, and less-informed people will say, "Oh look, we've had a bad year!" and plot silly graphs which predict the downfall of society.
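To make the contrast concrete, here is a toy comparison of a tough paper and a simple one. Everything in it is an assumption for illustration: the marks are invented, the tough paper is assumed to leave most candidates clustered low with one standout, and the simple paper is assumed to bunch candidates near the ceiling.

```python
from statistics import mean, pstdev


def t_scores(raw_scores):
    """t = 10 * (r - m) / z + 50, with z as the population standard deviation."""
    m = mean(raw_scores)
    z = pstdev(raw_scores)
    return [10 * (r - m) / z + 50 for r in raw_scores]


# Hypothetical marks out of 100 for the same ten candidates.
tough_paper = [10, 15, 15, 20, 20, 25, 25, 30, 35, 85]      # one standout
simple_paper = [70, 80, 85, 90, 90, 95, 95, 95, 100, 100]   # ceiling effect

top_tough = max(t_scores(tough_paper))
top_simple = max(t_scores(simple_paper))

print(round(top_tough, 1), round(top_simple, 1))
```

With these made-up figures the best candidate on the tough paper ends up with a noticeably higher t-score than the best candidates on the simple paper, where nobody can pull far ahead of the pack.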

Actually, I have no idea how exactly any such tweaking would occur. But tweakability exists for any system, and I have just shown how this might manifest in the mythical Atlantean system. Think of it as an imaginary service for an imaginary public.

Labels: Examinations, Testing, Tweakability

5 Comments:

P0litik said...: haha..i so remember my science paper. totally useless. it was like a memory test.

'The platypus is a special mammal that reproduces by laying eggs'; Friday, May 14, 2010 5:36:00 am
Trebuchet said...: Yeah, my own experience was like hacking a database. So much... stuff. In order to figure out how it was put together, you needed brains. But most people just used the stuff to answer simple queries.; Friday, May 14, 2010 2:49:00 pm
Anonymous said...: hi. i'd like to know if there's a reason for step 1 (and the other steps, if possible) of the methodology. sorry for my ignorance but wouldn't it be more reasonable to add up each individual's scores and then average them? thank you.; Friday, May 14, 2010 8:56:00 pm
Trebuchet said...: jjwalden: the main thing is that this examination is a 'streaming' or 'tracking' examination, designed to filter candidates into different curricular programs.

So what they really want to know is how much better each individual is compared to the cohort. This method rewards outstanding achievement by giving extra points based on how much better the candidate's score is compared to the average score, and how rare the extreme scores are. If the average score is low and the extreme scores are rare, the high extreme candidates will have a huge score.

They will then qualify for certain high-ability tracks by a clear margin.

Hope that helps!; Friday, May 14, 2010 11:58:00 pm
Anonymous said...: Trebuchet: yes, thank you!; Saturday, May 15, 2010 6:58:00 pm

Thursday, May 13, 2010
