Unless you have been living under a rock, you probably know that artificial intelligence (AI) euphoria is sweeping the world, and the enthusiasm is not limited to the business and investing communities. While many people use AI for legitimate purposes, the technology is also being exploited to cheat in school, with many students using AI tools to take shortcuts on their assignments.
To address the worsening problem of “AI cheating,” universities and companies have turned to AI detectors that claim to identify whether someone used AI for their work. Unfortunately, it is becoming clear that these detectors are not as accurate as they may seem: they have a propensity for false positives, and students are paying the price.
AI Detectors Can Incorrectly Flag Original Work As AI-Generated
As an experiment, Businessweek tested GPTZero and Copyleaks, two of the leading AI detectors, on a random sample of 500 college application essays submitted to Texas A&M University in the summer of 2022. Since that cut-off date predates ChatGPT’s release, and the essays were not part of the datasets on which AI tools are trained, it was virtually certain they were not written using AI. The tools nonetheless flagged some of the essays as AI-generated, in some cases with 100% certainty.
It is morally wrong to use AI detectors when they produce false positives that smear students in ways that hurt them and where they can never prove their innocence.
Do not use them.
— Ethan Mollick (@emollick) October 18, 2024
Such wrongful classification of legitimate work as AI-generated can have disastrous consequences for students, college applicants, job seekers, and even graduate students and postdocs. It can also lead to serious charges of plagiarism, which can have far-reaching implications for people in some industries.
Incorrect flagging by these detectors can fundamentally damage the relationship between students and their teachers, as the growing use of AI by students and false accusations by AI detectors both breed constant anxiety and mistrust. There is simply no 100% accurate way to tell whether a text was written by AI, and many teachers still don’t know how to proceed without either falsely accusing students or letting them get away with cheating.
How Do AI Detectors Work?
AI detectors analyze text using natural language processing and machine learning to determine whether the content was written by a human or an AI tool. They rely on measurable characteristics of the text, such as perplexity, embeddings, and burstiness, to reach a verdict. The detectors do not understand language; they simply compare new text against patterns in the historical data they were trained on.
Someone sent me a cold email proposing a novel project. Then I noticed it used the word "delve."
— Paul Graham (@paulg) April 7, 2024
The following are some of the signals these detectors commonly rely on (a code sketch of two of them follows the list).
- Word choice: Certain words, such as “delve,” appear far more often in AI-generated content than in typical human writing, which helps detectors flag AI text.
- Frequency and repetition: AI-generated content usually lacks variability and repeats words and phrases excessively, which helps detectors identify it.
- Burstiness: AI-generated content typically has low burstiness, meaning little variation in sentence structure and length, which helps AI detectors identify it.
- Perplexity: Human-written content tends to have higher perplexity, meaning a language model finds it less predictable, and to use more creative language choices than AI-generated content.
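To make these signals concrete, here is a minimal sketch of how perplexity and burstiness might be computed. It assumes the Hugging Face transformers library with GPT-2 as a stand-in scoring model; commercial detectors such as GPTZero and Copyleaks use proprietary models, more features, and calibrated thresholds, so this illustrates the idea rather than any vendor’s actual method.

```python
# Illustrative sketch of two detector signals: perplexity and burstiness.
# GPT-2 is only a stand-in scoring model; real detectors are proprietary.
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Average per-token perplexity under GPT-2. Lower values mean the
    model finds the text predictable, which detectors read as AI-like."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels=input_ids makes the model return the mean
        # cross-entropy loss; exponentiating it yields perplexity.
        loss = model(ids, labels=ids).loss
    return math.exp(loss.item())

def burstiness(text: str) -> float:
    """Standard deviation of sentence lengths in words. Human prose mixes
    short and long sentences (high burstiness); AI text is often uniform."""
    sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".") if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    mean = sum(lengths) / len(lengths)
    return (sum((n - mean) ** 2 for n in lengths) / len(lengths)) ** 0.5

essay = "The results surprised us. Nobody on the team, not even our most pessimistic reviewer, had predicted them."
print(f"perplexity: {perplexity(essay):.1f}, burstiness: {burstiness(essay):.2f}")
```

A detector would compare scores like these against thresholds learned from labeled training data, which is where the trouble starts: plain, predictable human prose can land on the AI side of the threshold through no fault of the writer.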
Should We Rely Only on AI Detectors?
Businessweek’s analysis also showed that people who speak English as a second language, neurodivergent people, and those who use straightforward vocabulary are at higher risk of having their work incorrectly flagged by AI detectors.
Notably, Stanford research from last year showed that while AI detectors were “near-perfect” at evaluating essays written by US-born eighth-graders, they classified more than half of TOEFL (Test of English as a Foreign Language) essays written by non-native English speakers as AI-generated.
James Zou, a professor of biomedical data science at Stanford University, said, “They (AI detectors) typically score based on a metric known as ‘perplexity,’ which correlates with the sophistication of the writing — something in which non-native speakers are naturally going to trail their U.S.-born counterparts.”
Zou also highlighted the ethical problem of these detectors disproportionately singling out non-native English speakers: “These numbers pose serious questions about the objectivity of AI detectors and raise the potential that foreign-born students and workers might be unfairly accused of or, worse, penalized for cheating.”
Why Do AI Detectors Incorrectly Flag Human Text?
Zou and his coauthors say that AI detectors single out non-native speakers because their writing tends to score lower on the measures that feed into perplexity, such as lexical diversity, lexical richness, and grammatical complexity. Non-native writers often simply know fewer words, so it is natural that their text has lower perplexity.
He adds that it is easy to game AI detectors by asking ChatGPT to use more literary language. This means that while AI detectors flag some original text as AI-generated, they also fail to identify some text that actually was generated with AI tools. According to Zou, “Current detectors are clearly unreliable and easily gamed, which means we should be very cautious about using them as a solution to the AI cheating problem.”
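As a hypothetical illustration of one such measure, the short sketch below computes lexical diversity as a simple type-token ratio (unique words divided by total words). The function and the sample sentences are invented for illustration; real detectors combine many such features with model-based perplexity scores.

```python
# Hypothetical illustration: lexical diversity as a type-token ratio.
# Text drawing on a smaller vocabulary scores lower, and low scores on
# measures like this feed into the low-perplexity judgments described above.
import re

def lexical_diversity(text: str) -> float:
    """Unique words divided by total words (type-token ratio)."""
    words = re.findall(r"[a-z']+", text.lower())
    return len(set(words)) / len(words) if words else 0.0

simple = "The test was hard. The test was long. I did not like the test."
varied = "The exam proved grueling, stretching interminably; I loathed every question."
print(f"simple: {lexical_diversity(simple):.2f}")  # lower ratio
print(f"varied: {lexical_diversity(varied):.2f}")  # higher ratio
```

Run on these samples, the plain passage scores noticeably lower than the ornate one, which mirrors why text from writers with smaller English vocabularies gets pushed toward the “AI-generated” verdict.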
AI Detectors Should Not Be the Final Word
Relying only on AI detectors to identify AI-generated content is not a sound approach: these tools are far from perfect, and they produce far too many false positives. According to Copyleaks co-founder and Chief Executive Officer Alon Yamin, “We’re making it very clear to the academic institutions that nothing is 100% and that it should be used to identify trends in students’ work.”
Yamin, who says his company’s technology is 99% accurate, added, “Kind of like a yellow flag for them to look into and use as an opportunity to speak to the students.” However, when a company claims its product is 99% accurate, many teachers will simply assume it is correct and leave it at that.
AI-detection tools should not be the final word on whether a text was written by a human or with AI writing assistants. Just like the AI tools themselves, AI detectors are currently prone to error.
Does AI actually help students learn? A recent experiment in a high school provides a cautionary tale. @jillbarshay @hechingerreport
Kids Who Use ChatGPT as a Study Assistant Do Worse on Tests
— MindShift (@MindShiftKQED) September 4, 2024
Are AI Tools Actually Good for Students?
There is still no good solution to the AI cheating epidemic in schools across the globe, and the false-positive rates of AI detectors are only making things worse. While most experts believe AI will contribute positively to society, others believe it will be a net negative.
For instance, a study by researchers at the University of Pennsylvania’s Wharton School, conducted on nearly 1,000 high school math students in Turkey, showed that using generative AI tools makes it harder for kids to learn and acquire new skills, even as it improves their performance in the short term.
Wharton professor Hamsa Bastani, who co-authored the paper, said, “We’re really worried that if humans don’t learn, if they start using these tools as a crutch and rely on it, then they won’t actually build those fundamental skills to be able to use these tools effectively in the future.”