AI is getting better and better. It’s becoming more and more popular at the same time as online learning is booming in popularity. What's stopping students from using AI to do their assignments? Who would even be able to tell if they did? As AI continues to take over jobs, and as more and more of our everyday activities are becoming robotics-enabled, the question becomes how to regulate these advancements. How does this form of AI work? When did it get to the point where it can be used in everyday life? Are there regulations around the world for it? Does New Zealand?
If you couldn’t tell, that paragraph was written by an AI. Given a small prompt and a handful of keywords, it was able to write that in a matter of milliseconds. AI has gotten to the point where it can be borderline real without much input from a human. In the words of one professor, “Believe me, it’s better than some of what my first-years send across my desk.”
Writing-capable AIs are already widespread online, even if you don’t notice them. The predictive text in your messenger app is an AI, and AI also writes a lot of blog posts and social media advertisements. Many news organisations have used AI to write various types of articles, including the Huffington Post, the L.A. Times, and The New Yorker. This practice is called Automated Journalism, and its output has become incredibly difficult to identify, even with a trained eye. It can do the same job cheaper, faster, and with less prejudice than a human, or at least that was the argument that an AI came up with to justify why Critic Te Ārohi should fire its human staff and subscribe to an Automated Journalism service.
There are many different text-generating AIs. One of the best uses Generative Pre-trained Transformer 3 (GPT-3 for short), which basically reads a lot of reference material and then tries to mimic what it’s read (technically, it’s “an autoregressive language model that can create realistic human-like text using deep learning technology”). Developed by Musk-affiliated OpenAI and released in 2020, GPT-3 has four different model versions, each performing differently and set at a different price point. Each one can improve on or completely rewrite essays, paragraphs, prompts, emails or literally any other form of text you could think of. It can tweet, it can report, it can advertise. It’s shiny and scary on the surface, but it has plenty of flaws. It can’t really make jokes, it can’t cite articles, and it’s pretty useless with te reo.
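For the curious, "autoregressive" just means the model writes one word at a time, each guess based on what came before. The sketch below is a toy illustration of that idea only: it counts word pairs in a made-up reference sentence, which is nothing like the scale or the deep learning GPT-3 actually uses. The names `train` and `generate` and the reference sentence are our own inventions for the example.

```python
from collections import Counter, defaultdict

def train(text):
    """Count, for every word, which words follow it in the reference text."""
    follows = defaultdict(Counter)
    words = text.lower().split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def generate(follows, start, length=5):
    """Autoregressive loop: each new word is predicted from the previous one."""
    out = [start]
    for _ in range(length - 1):
        options = follows.get(out[-1])
        if not options:  # the model never saw anything follow this word
            break
        # Always pick the most frequent follower (a greedy simplification).
        out.append(options.most_common(1)[0][0])
    return " ".join(out)

# Made-up reference material, purely for illustration.
reference = "the cat sat on the mat and the cat sat on the sofa"
model = train(reference)
print(generate(model, "the", length=5))  # prints "the cat sat on the"
```

Even this tiny version shows the "mimic what it's read" behaviour: it can only ever produce word sequences that resemble its reference material.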
In fact, the one consistent theme in the story of AI-generated text is that the people on the outside are wowed while the people actually making the AI remain highly dubious. Such was the case with ELIZA, an early chatbot AI developed between 1964 and 1966. Joseph Weizenbaum, its creator, said that “I had not realised ... that extremely short exposures to a relatively simple computer program could induce powerful delusional thinking in quite normal people,” but he was also quoted, in 2010, as saying that the only people who called ELIZA a “sensation” were the ones who fundamentally misunderstood its potential.
Today, ELIZA has been joined by several digital comrades. They are built in different ways, but each approach has deep learning at its core. Deep learning is loosely similar to the way humans learn: it discovers links and structures in the data it's fed as reference material and forms a layered network to gain a “deep” understanding of that data. Each layer within the network could correspond with a different aspect of the received data, such as emotion or tone. This all builds up to the point where, in our example, the AI is able to imitate a human’s writing almost flawlessly.
Absolutely nothing is stopping you from using AI to fully complete your assignments, or even just parts of them, so long as you don’t get caught. In fact, many students already are - even if they don’t realise it. Grammarly is a favoured and respected piece of technology that helps all kinds of people complete their tasks, including students and their assessments. Grammarly doesn’t write text for you, but takes what you have written and improves on it slightly in a similar style and tone to help boost the piece's fluidity, engagement and general quality. It also checks grammar, punctuation, and spelling. Grammarly is an AI, but it doesn’t feel like cheating in the way that a complete-creation AI does.
Otago Uni currently has academic integrity policies in place that attempt to manage the ways assessments are completed, and to maintain a standard that is accepted by the University. These policies may stop students from actively plagiarising, but none of them directly mention the usage of AI. Nonetheless, Professor Helen Nicholson, Deputy Vice-Chancellor (Academic), said that AI-generated text would be regarded as cheating, “as we expect all work to be a student’s own.” Grammarly was a different story, as it “can only work with material which has already been written. AI text generators produce material for you, which is why their use is regarded as misconduct in an academic setting.”
Some may argue that using an AI to directly complete a whole assessment is plagiarism, but most of the writing completed by AI is unique, so it does not actually get picked up by plagiarism software. It is not copy/pasting. One professor said that “you won’t get dinged for plagiarism unless you were unlucky enough to have the AI generate something that just so happens to exist elsewhere - but you could get that unlucky just by writing it yourself. That already happens.” The professor also asked us to “please not give any of my students any ideas”. Oops.
But a new AI, designed to recognise writing produced by other AI, is on the way. It’s a sort of AI-writing arms race. A tool called the Giant Language Model Test Room (GLTR) has been developed by Harvard University and the MIT-IBM Watson AI Lab, and is used to identify text that has been generated by an AI. When writing something, a human is able to intuitively know which word comes next in a sentence. An AI, however, strategically places words after one another in line with the data it has taken in from other writing sources. This means that an AI sentence would follow a specific pattern, and thus be more predictable. Although it’s hard to spot, GLTR can attempt to identify these statistically-placed words and highlight them based on how likely a language model would be to predict them in that position. Without GLTR, only about half of the presented texts were identifiable as written by AI. With GLTR, 72% of the texts were identifiable. This technology was developed to help identify “fake news, bogus reviews, and phoney social accounts”, but it could definitely be adapted to help universities deal with the inevitable rise in the usage of AI by students. This rise comes with ethical concerns about the value of a uni degree, which Helen echoed, saying “We all want an Otago degree to have value and mana. Every student who tries to cheat the process of gaining their degree is undermining their own degree as well as everyone else’s.”
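To make the detection idea behind GLTR a bit more concrete: a detector of this kind asks a language model to rank its guesses for each next word, then checks how often the text's actual word sits near the top of that ranking. The sketch below is a heavily simplified stand-in, assuming a toy word-pair counter in place of the large language model the real tool relies on. The function names (`train`, `word_ranks`, `predictable_share`) and the sample sentences are hypothetical, made up for this illustration.

```python
from collections import Counter, defaultdict

def train(text):
    """Toy 'language model': count which word follows which."""
    follows = defaultdict(Counter)
    words = text.lower().split()
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows

def word_ranks(follows, text):
    """Rank of each word among the model's guesses (1 = top guess).

    None means the model never saw that word follow the previous one.
    """
    words = text.lower().split()
    ranks = []
    for prev, actual in zip(words, words[1:]):
        guesses = [w for w, _ in follows.get(prev, Counter()).most_common()]
        ranks.append(guesses.index(actual) + 1 if actual in guesses else None)
    return ranks

def predictable_share(ranks, top_k=1):
    """Fraction of words that were among the model's top_k guesses."""
    if not ranks:
        return 0.0
    hits = sum(1 for r in ranks if r is not None and r <= top_k)
    return hits / len(ranks)

# Made-up reference text standing in for the model's training data.
model = train("the cat sat on the mat and the cat sat on the sofa")

predictable = "the cat sat on the cat"     # every word is the model's top guess
surprising = "the sofa sat and mat on"     # almost nothing the model expects
print(predictable_share(word_ranks(model, predictable)))  # prints 1.0
print(predictable_share(word_ranks(model, surprising)))   # prints 0.0
```

A high share of top-ranked words is a hint, not proof, that text was machine-generated, which fits the article's point that the tool lifts detection to 72% rather than to certainty.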
To test how good the AI is, we actually had it design all of the interview questions for this article, including the ones sent to Helen Nicholson, who said “We also believe AI text generators are currently not sophisticated enough to produce work to the standard and style expected at a university level.” When we told her where they came from, she said “I’m not surprised the questions were generated by AI, but there is quite a bit of difference between questions from a reporter and a university assignment, so analysing their origin was not a priority.” Fair enough.
We also gave this set of questions to Lech Szymanski, a Computer Science lecturer who focuses mainly on machine learning, and to several Otago University students. The interviews went smoothly, but only Lech immediately picked up on the possibility that the questions were synthetically generated: “Yeah,” he said, “not surprising.” Lech said that the danger “is that now, [there’s] a tool in our hands and there's no checks or balances if we’re to misuse it.” He said that his concerns were less about the authenticity of the writing, and more about whether using an AI is detrimental to his students’ learning. “In some cases it could be useful, like if it’s someone whose first language is not English, or somebody who’s really good technically, but a little weaker in writing… in principle, I think it could be useful, but it's hard to see how at this point.”
The pervasive theme throughout our interviews is that nobody knows what the hell is going on. Policy has lagged behind programming, and we’re still in the very early days of AI-generated content. When we asked Lech what the Uni could do to combat AI-generated answers, he said that at this point in time, “I have no idea… we are sort of in the dark, we don't know what can be done [but] we can't ignore that this thing is out there.”
When we interviewed Professor Colin Gavaghan, in the law department, he said that he’d been playing around with this software himself. The question, in his eyes, had less to do with how we should police these activities, and more to do with how we might need to reshape our understanding of education. He fed the same AI one of his exam questions, thinking that while the AI might be able to master grammar and factoids, it wouldn’t be able to make a complex argument. “And I was quite alarmed when I read it,” he said, “[because] it was about as good as some of the ones I'm gonna actually be marking.”
Colin had debated with some of his mates that having students memorise tomes of law jargon was a waste of time, “because machines can already beat us at that, hands down.” What we needed to be doing, argued Colin, “is focusing on the skill of actually using it and applying it. And then you look at [the results it gave me] and you think: is even that now in the firing line?” Colin said that this concern is what he wanted to take to his own students, to ask “if an AI can generate that kind of text now, and it will get better in the future, is that even what we should be assessing students on for the future?”
AI writing has improved by leaps and bounds over the last few years, and Colin agreed that it will only continue to get better. It can already beat you at memorisation, and recently it’s even started to outpace some weaker writers in terms of crafting an argument. It’s impossible to detect half of the time, even to a trained eye, and it flies under the radar of modern plagiarism software. “As educators,” said Colin, “we're gonna have to think about what we make of that, because if this is a skill that machines will be doing in 15 years, why are we bothering about testing our students on it now? Should we be looking for something else? And if so, what? I mean, it's a massive, massive concern.”