The A-levels algorithm fiasco and algorithmic accountability in practice

August 17, 2020

Last week, thousands of students in England and Wales received their A-level exam grades. The catch? The grades were produced by an algorithm rather than actual exams. Almost 40% of students received lower grades than they had anticipated, leaving many facing the prospect of missing out on their university of choice. Earlier today, however, the UK Government made a complete U-turn and decided to ignore the algorithm’s outcomes.

The situation sparked my interest because I have been researching the use of algorithms and computer models in UK government since 2013. Still, it was rather challenging to follow the discourse because I knew little about the UK education system in general and the university admissions procedure in particular. I spent quite some time piecing together relevant information, and I figured it might be useful to other non-UK residents interested in the A-level fiasco to share an overview. With this post I hope to do two things: 1) help others develop an understanding of the situation, and 2) share some thoughts on the situation from the perspective of my research.

A couple of cautionary notes are in order: the situation is still unfolding, which means that there is still considerable vagueness pertaining to the computer model and its impacts. Moreover, given the level of media attention surrounding the A-level crisis, it can be challenging to separate sensational headlines from matters of fact. I have tried my best to validate the information and have provided links to the tweets, blogs, and articles so that you can judge for yourself.

A-levels and university applications

First, some background based on a Twitter thread by Nick Brown. Final-year high school students (age 16-18) in England and Wales sit “A-level” exams. These exams are important for their university application: students apply to university prior to the A-level exams and receive a place conditional on their performance in them. Students typically take A-level exams in three to four subjects, which can range from “Marine science” to “Performance studies” or “Mathematics”. The subjects are much more specialised than I would have thought; a list of all subjects is available here.

The A-level exams are graded A* (best), A, B, C, D, E (worst), and U (fail). To get into university, students need at least C grades, although some universities may accept Ds. For top universities like Oxford or Cambridge the requirement is typically A grades across the board, with one or more A*s. Applications to university are made through the centralised UCAS system.

Students can apply for up to five course-university combinations through the UCAS system, which allows them to “hedge” their applications. Say a student applies to study History at Cambridge and receives an offer conditional on getting three A* grades; she can then also provisionally accept a less stringent offer (say AAB) to study Sociology at Sussex. If she gets the three A* grades, she can take up the Cambridge offer; otherwise, she can go for the offer at Sussex.
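To make this concrete, here is a toy Python sketch of how such a conditional-offer check might look. The grade ordering is the real one described above, but the meets_offer helper is my own hypothetical construction, and it simplifies real offers by ignoring which subject each grade belongs to:

```python
# Toy sketch of a conditional-offer check (hypothetical helper, not part of
# any real UCAS system). Simplification: offers specify grades only, not
# grades tied to specific subjects.

GRADE_ORDER = {"A*": 6, "A": 5, "B": 4, "C": 3, "D": 2, "E": 1, "U": 0}

def meets_offer(achieved, offer):
    """Check achieved grades against an offer, comparing best-to-worst."""
    achieved_sorted = sorted(achieved, key=GRADE_ORDER.get, reverse=True)
    offer_sorted = sorted(offer, key=GRADE_ORDER.get, reverse=True)
    return all(GRADE_ORDER[a] >= GRADE_ORDER[o]
               for a, o in zip(achieved_sorted, offer_sorted))

# The hypothetical student above, ending up with grades A*, A*, A:
grades = ["A*", "A*", "A"]
print(meets_offer(grades, ["A*", "A*", "A*"]))  # Cambridge offer: False
print(meets_offer(grades, ["A", "A", "B"]))     # Sussex offer: True
```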

COVID-19 and the cap

The COVID-19 crisis has had a considerable impact on the United Kingdom and has hit the education sector in two ways that are relevant here. First, one report suggests student numbers may go down by as much as 24%. As you can imagine, this represents a huge drop in tuition fee income for universities, which compete for students in an already competitive market. In response, the UK government decided to impose a cap on student number growth of 5%. Any university attracting more students than the cap allowed would face substantial financial penalties. The move was said to prevent prestigious institutions from drawing students away from other universities.

The computer model

The second impact of COVID-19 was the cancellation of the A-level exams. This presented a problem to the Department for Education and the universities, because the A-level grades are such an important part of the university application procedure. Ofqual, the exams regulator, decided to estimate the A-level grades on the basis of: 1) the historical grade distribution of schools over the three previous years (2017-2019); 2) the rank of each student within her own school for a particular subject, based on a teacher’s evaluation of the grade the student would likely have obtained had the A-levels gone ahead as planned (called the “Centre Assessed Grade”, or CAG for short); and 3) each student’s previous exam results per subject.

Ofqual teamed up with Cambridge Assessment to develop a model. That model looks at the historical grade distribution of a school and then decides a student’s grade on the basis of their ranking. For instance, if you’re halfway down the ranking list, then your grade is roughly whatever the person halfway down the ranking list in previous years obtained. This correction was done to prevent grade inflation, but it can produce unfair results. This tool and accompanying blog shed some light on how the model pans out in different situations. For instance, if no one from your school has obtained an A* in the past three years, it is very unlikely – if not impossible – for you to get an A*, even if your past grades and CAG would indicate you deserve one.
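To make the core mechanism tangible, below is a minimal Python sketch of the rank-matching idea. This is my reconstruction from the description above, not Ofqual’s actual code; the function name, the data layout, and the percentile convention are all assumptions on my part:

```python
# Minimal sketch of the rank-matching idea, NOT Ofqual's actual model
# (the real procedure in the 319-page technical report is far more involved).

GRADES = ["A*", "A", "B", "C", "D", "E", "U"]  # best to worst

def rank_matched_grade(rank, cohort_size, historical_shares):
    """Map a student's within-school rank onto the school's historical
    grade distribution for a subject (fractions per grade, 2017-2019)."""
    percentile = (rank - 0.5) / cohort_size  # 0.0 = top of the ranking
    cumulative = 0.0
    for grade in GRADES:
        cumulative += historical_shares.get(grade, 0.0)
        if percentile <= cumulative:
            return grade
    return "U"

# A school where no one achieved an A* in 2017-2019: even the top-ranked
# student cannot be awarded an A*, whatever their CAG says.
shares = {"A": 0.10, "B": 0.30, "C": 0.40, "D": 0.15, "E": 0.05}
print(rank_matched_grade(rank=1, cohort_size=20, historical_shares=shares))  # "A"
```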

Another particularity of the model is that it puts more weight on the CAGs if there are fewer than 15 students for a particular school in a particular subject. This means that students at smaller schools are more likely to benefit from grade inflation than those at larger schools. This reinforces existing inequalities: it was shown that the “proportion of A* and As awarded to independent schools rose by 4.7 percentage points – more than double the rate for state comprehensive schools”.
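The sketch below shows what such a cohort-size adjustment could look like. The thresholds and the linear interpolation are illustrative guesses on my part, not the weighting scheme from Ofqual’s technical report:

```python
# Illustrative small-cohort adjustment: blend the statistical prediction with
# the CAG depending on cohort size. Thresholds and interpolation are my own
# guesses, not the scheme from Ofqual's technical report.

def model_weight(cohort_size, full_cag_at=5, full_model_at=15):
    """Weight on the statistical model: 0.0 = pure CAG, 1.0 = pure model."""
    if cohort_size <= full_cag_at:
        return 0.0  # tiny cohorts: rely entirely on the teacher's CAG
    if cohort_size >= full_model_at:
        return 1.0  # large cohorts: rely entirely on the statistical model
    # in between: interpolate linearly between the two
    return (cohort_size - full_cag_at) / (full_model_at - full_cag_at)

for n in (3, 8, 12, 20):
    print(n, round(model_weight(n), 2))  # 3 -> 0.0, 8 -> 0.3, 12 -> 0.7, 20 -> 1.0
```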

Table from the Ofqual technical report showing the differences in grade changes across different types of schools.

Appeals

Several people have identified issues with Ofqual’s model and the 319-page technical report. Some of these concerns were raised months before the public outrage over the A-level exam grading. Amongst other things, experts have criticised the model’s low accuracy and its lack of uncertainty analysis. However, even if the modelling had been perfectly executed, it would still have been prone to weaknesses. It follows that Ofqual could have anticipated that their modelling would produce unfair or unjust outcomes for some students. You would therefore have expected them to put a clear appeals procedure in place.

Regrettably, the appeals procedure itself is subject to controversy, since students had to pay a fee to contest their grades and the procedure was seriously complicated. Even as universities were confirming places to students based on their estimated A-levels, it remained unclear how students could appeal their grades. Like the model itself, the flaws in the appeals procedure are likely to disproportionately affect students from lower socioeconomic backgrounds.

The situation was further exacerbated because the flexibility of universities to take in more students was limited by the earlier mentioned cap. Universities thus could not afford to be more flexible in their admissions procedures even if they wanted to.

Quality assurance

The controversy surrounding the A-level exam grade modelling marks a high point in public awareness of algorithms; UK students have taken to the streets chanting “Fuck the algorithm”. Issues with modelling in government are, however, nothing new. After problems came to light with the modelling and analysis conducted for the InterCity West Coast Competition, the UK Treasury conducted a review of all government analytical models in 2013. That review and the resulting UK guidance on producing quality analysis (the Aqua Book) stress the importance of proportionate quality assurance. In cases of “high business risk”, they prescribe external audit or review of the modelling. So far, there has been no evidence of such an external review for the A-level grade modelling. Reportedly, the Royal Statistical Society offered to help, but Ofqual passed over the offer since the RSS would not sign its non-disclosure agreements.

Figure from the Aqua Book detailing the expected level of quality assurance for models used in UK government.

Data protection impact assessment

Since Ofqual processed personal data as part of their A-level exam grade modelling, they are subject to the General Data Protection Regulation (GDPR) and needed to conduct a Data Protection Impact Assessment (DPIA). Ofqual shared a simplified version of this DPIA on their website, but several experts have pointed out issues with it. Most importantly, Ofqual argues in the DPIA that students were not subject to “automated decision making”, against which the GDPR provides safeguards through the right not to be subjected to solely automated decisions. While the DPIA may argue that teachers contributed to the A-level grades, one expert suggests that this is not sufficient. This would mean that students could have asked Ofqual to take a new decision that is not based solely on the A-level grade model.

Algorithmic accountability

The entire situation is a great mess, and really shows how difficult it is to get computer models right. Despite Ofqual’s efforts to be transparent about the modelling by sharing technical documentation, much remains unclear about the development and use of the algorithm. While the 319-page technical documentation may increase transparency for experts, it is of little use to those affected by the A-level grading: the students. This highlights a key accountability asymmetry for quantifications like models, and demonstrates why independent review and policing of high-stakes modelling is so important.

Guidelines alone have not prevented modelling mishaps in the past and will not prevent incidents in the future. Moreover, responsible use of algorithms needs to go beyond transparency about the model itself and has to consider the ways in which the model interacts with its stakeholders. In the case of the A-level grading model, Ofqual should at the very least have anticipated the importance of a clear appeals procedure.

The A-level exam grading fiasco does not sit in isolation: similar issues with marking by algorithm were reported in Scotland and across Europe for the International Baccalaureate exams. Thanks to huge public outcry, pressure from the media, and legal action, the UK government decided not to use the A-level grades produced by the algorithm. Instead, the CAGs will be used as the final exam grades. In addition, the cap on student number growth has been lifted.

Although this may seem like a happy ending, the problem now lies entirely with the universities. Less prestigious institutions are faced with increased competition and potential financial difficulty, while top universities may struggle to comply with COVID-19 guidelines given a larger student intake.

Daan Kolkman is a senior research fellow at the Jheronimus Academy of Data Science and the Technical University of Eindhoven. You can follow his work on Twitter or Google Scholar.
