Concerns are growing about the impact of AI across the higher education sector, but could it offer real value in university assessment and marking? Dr Deborah Talmi, an experimental psychologist at the University of Cambridge, is leading ai@cam’s AI-deas project to develop an evidence-based framework for the role of AI in student evaluation.
What was the motivation behind the ‘Supporting evaluation through AI’ project?
At the highest level, I’m interested in the value of academia for society and the changing landscape of higher education. The increasing use of AI in student coursework has been widely discussed, particularly regarding academic integrity and data protection. As an examiner, this sparked my curiosity about AI’s potential in student assessments and whether it could ever align with human judgement.
This project emerged from some fundamental questions: how does AI’s evaluation of student work compare to human examiners? What are the differences? What are the overlaps? As researchers, our goal is to start an informed, evidence-based discussion around the role and use of AI in assessment. Rather than relying on speculation, we aim to gather clear data to understand how AI might complement – or potentially challenge – traditional marking methods. We also need to understand, as a sector, the potential benefits and risks of using this technology in assessment, and that means bringing diverse voices into the conversation.
How will you conduct the research?
The project has three strands. Firstly, we will be establishing an advisory board to unify expertise and consider best practices for the responsible integration of AI into assessment processes. We are working with co-investigators (examiners, teachers, and educators) from three universities – the University of Cambridge, the University of Nottingham, and Manchester Metropolitan University. Each institution will be represented on our committee alongside those from the Office for Students and the Department for Education.
In our second strand of activity, we will focus on quantitative analysis: measuring alignment between AI and human marking, and assessing how AI performs across different types of student responses. This includes examining potential biases, consistency, and accuracy.
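The kind of alignment analysis described above could be sketched as follows. This is a minimal illustration, not the project’s actual methodology: the marks are invented, and the two metrics chosen here (mean absolute difference and Pearson correlation) are common, simple ways to quantify agreement, assumed for the example.

```python
from statistics import mean

def alignment(human_marks, ai_marks):
    """Compare AI-generated marks against human marks for the same scripts.

    Returns (mean absolute difference, Pearson correlation) --
    two simple measures of how closely the two sets of marks agree."""
    assert len(human_marks) == len(ai_marks) and human_marks
    n = len(human_marks)
    # Average size of the disagreement, in mark points.
    mad = mean(abs(h - a) for h, a in zip(human_marks, ai_marks))
    # Pearson correlation: do the two markers rank essays similarly?
    mh, ma = mean(human_marks), mean(ai_marks)
    cov = sum((h - mh) * (a - ma) for h, a in zip(human_marks, ai_marks)) / n
    sh = (sum((h - mh) ** 2 for h in human_marks) / n) ** 0.5
    sa = (sum((a - ma) ** 2 for a in ai_marks) / n) ** 0.5
    r = cov / (sh * sa) if sh and sa else float("nan")
    return mad, r

# Illustrative marks only -- not real project data.
human = [62, 68, 55, 74, 70]
ai = [65, 66, 58, 72, 71]
mad, r = alignment(human, ai)
```

In practice, a study like this would also look at agreement per grade band and per essay type, since an overall correlation can hide systematic biases at the extremes.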
To do this, we will be analysing a sample of psychology essays, volunteered to us by students, that have already been graded by a human assessor. The next step is to submit these essays to various AI models (including GPT, Claude, and Gemini) using carefully designed prompts to see how AI will mark them. For me, it will be fascinating to see whether different AI models offer different analyses of the same work, and whether their judgements corroborate one another.
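One way this essay-marking step might be set up is sketched below. The rubric wording, reply format, and helper names are my own assumptions for illustration; the project’s real prompts are not published here, and the actual API calls to each model provider are left as a comment.

```python
def build_marking_prompt(essay_text, max_mark=100):
    """Assemble a marking prompt to send to a model such as GPT, Claude, or Gemini.

    The rubric wording here is illustrative, not the project's real prompt."""
    return (
        "You are an experienced university examiner marking a psychology essay.\n"
        f"Award a mark out of {max_mark} and give brief feedback.\n"
        "Reply in exactly this form:\nMARK: <number>\nFEEDBACK: <text>\n\n"
        f"Essay:\n{essay_text}"
    )

def parse_mark(reply):
    """Extract the numeric mark from a model reply of the form 'MARK: 68 ...'."""
    for line in reply.splitlines():
        if line.upper().startswith("MARK:"):
            return int(line.split(":", 1)[1].strip())
    return None  # model did not follow the requested format

# The same prompt would then be sent to each provider's API (OpenAI, Anthropic,
# Google), and the parsed marks compared across models and against human graders.
prompt = build_marking_prompt("Sample essay text...")
mark = parse_mark("MARK: 68\nFEEDBACK: Clear argument, limited critical depth.")
```

Fixing the reply format in the prompt, as above, matters for the study design: marks from different models can only be compared if each model’s output can be parsed reliably and consistently.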
Our final strand will be qualitative research. We will be conducting surveys and focus groups with students, lecturers, and university professional services to understand perceptions of AI in assessment. Do students trust AI-generated feedback? Do educators view AI as a valuable tool, or do they perceive it as a potential threat? These discussions will help shape the ethical and practical considerations of AI-assisted assessment.
The hope is that, by the end of the project, we will have enough evidence to share our findings and invite others to have their say in steering the future of this technology.
What challenges do you anticipate facing throughout the project?
A major challenge is ensuring AI models are used ethically and transparently. We must navigate academic integrity, intellectual property, and bias. Additionally, AI models will need to be queried effectively, as poorly designed prompts could lead to inaccurate assessments.
Another challenge is overcoming scepticism within the academic community. Many educators instinctively resist AI-assisted marking, fearing it could undermine educational quality. Others are now starting to believe that AI will become so good, so quickly, that it could replace human teachers altogether. A key aspect of our work is to facilitate an open dialogue grounded in evidence rather than assumption.
There is also the logistical challenge of interdisciplinary collaboration. This project involves AI specialists, psychologists, educators, and university staff across all levels, all of whom bring valuable opinions to this discussion. Ensuring smooth communication and integration of expertise will be essential to the project’s success.
What are the potential benefits of adopting AI in assessment?
If AI proves to be a reliable tool for assessment, it could alleviate a lot of the administrative burden associated with marking. Rather than spending weeks reviewing essays, lecturers could use that time for in-depth discussions with students, providing personalised feedback and mentorship. This shift could lead to a more dynamic learning environment for students, where lecturers could focus on fostering deeper critical engagement around a subject area.
AI could also help standardise marking, reducing variability and unconscious bias among teachers and educators. Additionally, AI-generated feedback could serve as a tool for students to get immediate comments on drafts, allowing them to refine their work before a final submission. At the end of the day, it could make higher education more accessible. However, there are strong concerns that AI might overlook nuances in argumentation and creativity that only human examiners may recognise.
Ideally, we want to examine how AI might serve as a supportive tool, ensuring that students receive high-quality feedback while maintaining the integrity of academic assessment.
What greater impact do you hope this project will have on the HE sector?
The success of this project will be measured by our ability to establish an evidence-based understanding of the role of AI in marking. We want to identify the conditions under which AI aligns with human judgement and facilitate an open conversation about AI integration in higher education assessment.
As a result of our work, universities will be able to systematically assess the reliability and effectiveness of AI-supported marking systems, facilitating informed decision-making about integrating AI into assessment practices.
In the first instance, the project will provide methodologies, best practices and baseline data required for evidence-based decisions on whether and how to deploy AI assistants as part of varied educational activities. This, in turn, has the potential to substantially improve student outcomes by providing educators with insights into how AI can enhance formative feedback, thereby promoting deeper learning and more tailored educational experiences.
If AI proves to be effective, it could prompt further discussions about its potential use in secondary and post-secondary education, including A-levels and GCSEs. Even if AI is found to have limitations, understanding its strengths and weaknesses will enable institutions to make more informed decisions regarding its application.
Ultimately, we aim to provide recommendations for best practices in AI-assisted assessment, ensuring that any implementation is ethical, fair, and beneficial to students and educators alike.
What inspired you to work in this field?
As a researcher, I find the process of discovery deeply rewarding. The opportunity to analyse data and uncover insights that have not been seen before is exhilarating. However, this isn’t just about AI – it’s about the future of education.
Ultimately, this is an opportunity to shape the conversation about AI in higher education. By approaching it with curiosity rather than fear, we can determine whether AI should be a tool for improving education or if there are boundaries that need to be protected and maintained.
This is an exciting and critical area of research, and I look forward to seeing where the evidence leads us.
More about the Supporting evaluation through AI Project
This research is being led by the University of Cambridge in collaboration with the University of Nottingham and Manchester Metropolitan University. If you’re interested in learning more or contributing to the project, please reach out to a member of the project team or Dr Deborah Talmi.