Researchers trick college scorers with AI-generated exams

Researchers at the University of Reading tricked their own exam markers by secretly submitting AI-generated exam answers that went undetected and earned higher marks than those of real students.

The project created fake student identities to submit unedited responses generated by GPT-4 in online take-home assessments for undergraduate courses.

The university's markers, who were not informed of the project, flagged only one of the 33 entries; the remaining AI-generated answers received higher average grades than those of real students.

The authors said their findings showed that AI generators such as ChatGPT were now passing a kind of “Turing test” (named after computer science pioneer Alan Turing) by going undetected by experienced judges.


Billed as “the largest and most robust blind study of its kind” to investigate whether human educators could detect AI-generated responses, the authors warned it had important implications for how universities assess students.

“Our research shows it is of international importance to understand how AI will affect the integrity of educational assessments,” said Dr Peter Scarfe, one of the authors and an associate professor in Reading’s School of Psychology and Clinical Language Sciences.

“We won’t necessarily go back fully to handwritten exams, but [the] global education sector will need to evolve in the face of AI.”

The study concluded: “According to current trends, AI’s ability to exhibit more abstract reasoning will increase and its detectability will decrease, meaning the problem of academic integrity will worsen.”

Experts who reviewed the study said its findings spelled the end of take-home exams and unsupervised coursework.

Professor Karen Yeung, a fellow in law, ethics and informatics at the University of Birmingham, said: “The publication of this real-world quality assurance test demonstrates very clearly that freely and openly available generative AI tools enable students taking take-home exams to cheat without difficulty to obtain better grades, yet such cheating is virtually undetectable.”

The study suggests that universities could incorporate AI-generated material from students into their assessments. Professor Etienne Roesch, another of the authors, said: “As a sector, we need to agree how we expect students to use and acknowledge the role of AI in their work. The same applies to the broader use of AI in other areas of life to avoid a society-wide crisis of trust.”

Professor Elizabeth McCrum, Reading’s pro-vice-chancellor for education, said the university was “moving away” from online take-home exams and was developing alternatives that would involve applying knowledge in “real-life, often workplace-related” settings.

McCrum said: “Some assessments will support students in using AI, teaching them to use it critically and ethically, developing their AI literacy and equipping them with the skills needed for the modern workplace. Other assessments will be completed without the use of AI.”

But Yeung said allowing the use of AI in exams at schools and universities could create its own problems by “de-skilling” students.

“Just as many of us can no longer navigate unfamiliar places without the help of Google Maps, there is a real danger that the next generation will end up effectively tethered to these machines, unable to think, analyse or write seriously without their assistance,” Yeung said.

In the endnotes of the study, the authors suggest that they themselves may have used AI to prepare and write the research, asking: “Would you consider it ‘cheating’? If you did consider it ‘cheating’ but we denied using GPT-4 (or any other AI), how would you try to prove that we were lying?”

A spokesperson for Reading confirmed that the study was “definitely conducted by humans.”
