AI or Not AI: The Controversial Truth Behind Detection Tools for AI-Generated Text

AI-detectors are proliferating, and they are mostly used to determine whether a document was created using AI tools.

So I did some research and tried to figure out how these tools function, specifically how they determine whether a piece of text was generated by an AI chatbot.

I won’t say which AI detector(s) I employed for this study because… Well, it goes unsaid, doesn’t it?

I’ve tested numerous tools, and what intrigued me was how inconsistent the results were. For instance, one tool rated text X, which was entirely AI-generated, as 100% AI-generated, yet judged text Y, also created by AI, not to be AI-generated at all. I had used the same chatbot for both texts, which differed only in subject (one about tomatoes, one about cars), and still the outcomes varied significantly.

For this particular experiment I settled on using just three AI-detectors.

To begin with, I pasted the opening passage of this article (from “AI-detectors are proliferating” to “it goes unsaid, doesn’t it?”) into these tools to test their reliability.

Rest assured, I wrote the content myself; not a single word was AI-generated.

The outcomes were as follows:

Tool #1 detected that 13% of the content was AI-generated.

Tool #2 detected that 49% of the content was AI-generated.

Tool #3 detected that 27% of the content was AI-generated.

I was completely puzzled.

Then, I resorted to a paraphrasing tool on the same text.

Here is the paraphrased text:

AI detectors are growing, and they are mostly used to determine whether a document was written using AI algorithms.

So I did some study to find out how these tools work, particularly how they assess whether a piece of text was written by an AI chatbot.

I won’t reveal whatever AI detector(s) I used for this investigation because, well, it goes unspoken, doesn’t it?

Once again, I copied and pasted the text into the same AI-detectors.

Here’s the result:

Tool #1 detected 2% of the paraphrased text was AI-generated.

Tool #2 detected 6% of the paraphrased text was AI-generated.

Tool #3 detected 0% of the paraphrased text was AI-generated.

These results left me even more incredulous. The drop in the AI-generated percentages is misleading: a paraphrasing tool produces machine-generated language, so the paraphrased text is less human-written than my original, not more.

What do you mean? Are you suggesting that text I’ve written is deemed AI-generated, whereas the paraphrased text, essentially produced by a machine, is regarded as more “human-like?”

Wow!

Continuing my experiment, I asked an AI chatbot for a brief overview of the “transport system.” The reply is therefore entirely AI-generated.

Here’s the reply:

The transport system is the lifeblood of modern society, seamlessly connecting people and goods across vast distances. From bustling urban centres to remote rural areas, it facilitates economic growth, social interaction, and cultural exchange. Whether by road, rail, air, or sea, transport enables the movement of people to work, school, and leisure activities. It delivers essential goods to our doorsteps, sustains global trade, and supports industries worldwide. Yet, challenges such as congestion, emissions, and infrastructure maintenance loom large. As we embrace innovation and sustainability, the transport system continues to evolve, shaping the way we live, work, and travel.

Tool #1 detected that 100% of the content was AI-generated. — Bravo!!

Tool #2 detected that 0% of the content was AI-generated. — God help us!!

Tool #3 detected that 1% of the content was AI-generated. — May divine assistance be with us!

Look at the results above and note the inconsistency. Tool #2, which rated 49% of my own writing as AI-generated, now states that 0% of this text was created by AI. Tool #3, which originally assessed 27% of my writing as AI-generated, now reports only 1%, even though the text is in fact 100% AI-generated.

Although it’s been some time since I last coded software, my entire career has been in the IT industry. As a software analyst and business engineer, I’ve managed software projects for the past 35 years. Of course, once an IT engineer, always an IT engineer. Therefore, it comes as no surprise to me that no two pieces of software are made equal. Each has its strengths and weaknesses.

So, what does an AI-detector rely on to determine whether a text was written by a chatbot or not?

From my own research and from conversations with some coders (who would not give up their secrets and algorithms even under torture), I’ve learned that these tools base their algorithms on:

Stylometric Analysis

This method examines writing styles to detect distinctive patterns. Texts produced by AI often show stylistic traits not typically found in human writing, like repetitive phrases or a diminished expression of complex emotions.

This strikes me as somewhat odd, to say the least. Humans repeat phrases too, and as for the expression of complex emotions, what exactly does that mean? Are we certain this is a criterion that can realistically be taken into account? Consider Shakespeare and myself: I could never convey emotions with the same depth as the Bard, if you get what I mean, and, of course, you do! Granted, the machine lacks the human being’s creativity, let alone that of the genius Shakespeare! But does a distinctly “human” style of writing truly exist? Is there a clear, identifiable characteristic that sets it apart? That question has nothing to do with talent.
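To make the idea of stylometric analysis more concrete, here is a minimal sketch of the kind of surface statistics such an approach might compute. The function name and the choice of features (sentence length, its variation, and vocabulary richness) are assumptions for illustration only; nothing here is taken from the detectors used in this experiment.

```python
import re
from statistics import mean, pstdev

def stylometric_features(text: str) -> dict:
    """Compute a few simple style statistics (illustrative features only)."""
    # Rough sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Lowercased words; straight and curly apostrophes stay inside a word.
    words = re.findall(r"[a-z'’]+", text.lower())
    # Number of words in each sentence.
    lengths = [len(re.findall(r"[a-z'’]+", s.lower())) for s in sentences]
    return {
        "sentences": len(sentences),
        "words": len(words),
        "avg_sentence_len": mean(lengths) if lengths else 0.0,
        # Low variation means very uniform sentence lengths.
        "sentence_len_stdev": pstdev(lengths) if lengths else 0.0,
        # Share of distinct words: a crude measure of vocabulary richness.
        "type_token_ratio": len(set(words)) / len(words) if words else 0.0,
    }
```

Numbers like these describe a text; they do not say who wrote it. A human who writes in even, tidy sentences would produce much the same figures, which is exactly the doubt raised above.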

Consistency and Coherence

Texts generated by AI may show a lack of uniformity or logical flow across extended sections. This results in inconsistencies or abrupt style changes.

This may be, so to speak, easier to detect. However, I am sceptical here as well. Can’t a human write in a rambling style? Aside from a lack of culture and skill, this can occur for reasons relating to a person’s mental state (biological or otherwise) or because that piece of text pertains to a certain character in a novel, among other things.

Pattern and Repetition

AI systems tend to repeat phrases, structures, or errors more often than humans.

I have some doubts about this as well, for the same reason as above: humans repeat themselves too, for better or worse.
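As a rough sketch of how repetition could be measured, the snippet below computes the share of three-word sequences in a text that occur more than once. The function name and the choice of three-word windows are my assumptions, not a description of any real detector.

```python
import re
from collections import Counter

def repeated_ngram_rate(text: str, n: int = 3) -> float:
    """Return the fraction of n-word sequences that occur more than once."""
    words = re.findall(r"[a-z'’]+", text.lower())
    # Slide a window of n words across the text.
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    # Count every occurrence of a sequence that appears at least twice.
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(ngrams)
```

A song lyric or a poem with a refrain, written entirely by a human, would score high on this measure, so on its own it cannot separate people from machines.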

Machine Learning

Detectors may employ machine learning algorithms trained on vast datasets of content created by both AI and humans. These algorithms learn to spot the minor discrepancies in language usage, sentence construction, clarity, and more, to differentiate between the two.

This approach is more convincing. However, I think it requires significant refinement, considering the outcomes. Despite this, I must admit my complete ignorance regarding the algorithms and structure of the AI detectors I used in this experiment.
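To illustrate what “trained on examples” can mean, here is a toy word-frequency classifier (naive Bayes) that outputs a probability between 0 and 1. Everything in it is an assumption made for illustration: the class name, the labels, and above all the four invented training snippets, which stand in for the vast datasets a real tool would need. It is not the algorithm of any detector I tested.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list:
    return re.findall(r"[a-z'’]+", text.lower())

class TinyNaiveBayes:
    """Toy two-class text classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = {"ai": Counter(), "human": Counter()}
        self.doc_counts = Counter()

    def fit(self, samples):
        # samples: iterable of (text, label) with label "ai" or "human".
        for text, label in samples:
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokenize(text))
        return self

    def prob_ai(self, text: str) -> float:
        vocab = set(self.word_counts["ai"]) | set(self.word_counts["human"])
        total_docs = sum(self.doc_counts.values())
        log_score = {}
        for label in ("ai", "human"):
            counts = self.word_counts[label]
            total = sum(counts.values())
            # Start from the share of training texts with this label.
            score = math.log(self.doc_counts[label] / total_docs)
            for word in tokenize(text):
                # Add-one smoothing so unseen words do not zero the score.
                score += math.log((counts[word] + 1) / (total + len(vocab)))
            log_score[label] = score
        # Convert the two log scores into a probability for "ai".
        top = max(log_score.values())
        exp = {k: math.exp(v - top) for k, v in log_score.items()}
        return exp["ai"] / (exp["ai"] + exp["human"])

# Invented toy training data, for illustration only.
TOY_SAMPLES = [
    ("the system seamlessly facilitates growth and innovation", "ai"),
    ("it plays a vital role in modern society", "ai"),
    ("honestly i was puzzled by the whole thing", "human"),
    ("well i did some digging and got confused", "human"),
]

model = TinyNaiveBayes().fit(TOY_SAMPLES)
print(round(model.prob_ai("seamlessly facilitates innovation"), 2))
```

The point of the sketch is that the answer depends entirely on the training examples. Two tools trained on different collections of text can give very different probabilities for the same passage.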

Digital Watermarking

Certain AI content creation tools insert hidden watermarks or distinct patterns into their outputs. These embedded signals enable detectors to identify content as AI-generated.

I must acknowledge that this argument holds considerable weight. However, while I can grasp its applicability to images, I find myself puzzled about how it would work with copied and pasted text. Is there anything that persists even after the text has been copied and pasted several times? Given my lack of understanding on this matter, I remain doubtful.
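One idea proposed in the research literature is a statistical watermark: the generator quietly favours certain word choices, and a detector that knows the rule counts how often those choices appear. Because the signal lives in the words themselves, it would survive copying and pasting, although heavy paraphrasing would weaken it. The sketch below is a simplified illustration under my own assumptions (the hash rule, the 50% “green” share, and all function names are hypothetical); I cannot confirm that any chatbot or detector in this experiment uses such a scheme.

```python
import hashlib
import math
import re

GREEN_FRACTION = 0.5  # assumed share of word pairs that count as "green"

def is_green(prev_word: str, word: str) -> bool:
    """Pseudo-randomly mark a (previous word, word) pair as green."""
    digest = hashlib.sha256(f"{prev_word}|{word}".encode("utf-8")).digest()
    # Turn the first 8 bytes of the hash into a number in [0, 1).
    return int.from_bytes(digest[:8], "big") / 2**64 < GREEN_FRACTION

def choose_word(prev_word: str, candidates: list) -> str:
    """Generator side: prefer the first candidate that forms a green pair."""
    for word in candidates:
        if is_green(prev_word, word):
            return word
    return candidates[0]  # no green option available, fall back

def watermark_z_score(text: str) -> float:
    """Detector side: how far the green-pair count sits above chance."""
    words = re.findall(r"[a-z'’]+", text.lower())
    pairs = list(zip(words, words[1:]))
    if not pairs:
        return 0.0
    green = sum(is_green(prev, word) for prev, word in pairs)
    n = len(pairs)
    expected = GREEN_FRACTION * n
    std_dev = math.sqrt(n * GREEN_FRACTION * (1 - GREEN_FRACTION))
    return (green - expected) / std_dev
```

Ordinary text should land near a score of zero, while text built with choose_word should score well above it. The detector only works if it knows the generator’s rule, so a tool with no access to that rule could not rely on this signal.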

Error Examination

Content from AI might exhibit unique mistakes or oddities in grammar, syntax, or factual details that are unusual for human-created content. Such irregularities can be scrutinised to highlight potentially AI-generated content.

Well, among the various AI chatbots I’ve used for text generation, I’ve seldom found errors. It’s easier for me to make mistakes in writing than it is for them. What occurs if I spot some mistakes and make corrections? My evidence is limited in this area because, having used at least four different AI chatbots to generate text, I encountered no errors and thus have no feedback to offer.

All right, let’s go back to the “Transport” text that was entirely generated by the chatbot.

I made minor edits, simply swapping in three synonyms (long, helps, permits), and then resubmitted the text to the three AI-detection tools.

The transport system is the lifeblood of modern society, seamlessly connecting people and goods across long distances. From bustling urban centres to remote rural areas, it helps economic growth, social interaction, and cultural exchange. Whether by road, rail, air, or sea, transport permits the movement of people to work, school, and leisure activities. It delivers essential goods to our doorsteps, sustains global trade, and supports industries worldwide. Yet, challenges such as congestion, emissions, and infrastructure maintenance loom large. As we embrace innovation and sustainability, the transport system continues to evolve, shaping the way we live, work, and travel.

Tool #1 detected that 100% of the content was AI-generated.

Tool #2 detected that 0% of the content was AI-generated.

Tool #3 detected that 96% of the content was AI-generated.

The most intriguing result comes from Tool #3, which suggests a 96% probability, as if it detected that those three words (“long,” “helps,” “permits”) were of human “origin” — meaning, introduced by me rather than the chatbot. I say this with a hint of sarcasm. Clearly, the percentages provided by these tools represent probabilities, indicating there is x% chance that this content was generated by AI.

Who can claim that a chatbot didn’t insert those three words?

I then rephrased the entire text myself:

Transport infrastructure serves as an essential backbone enabling the movement of people and goods within and between communities. From trains and planes connecting rural towns to big cities, to ships moving raw materials and finished products around the globe, transport allows economic activity to thrive. Without reliable roads, railways, airports, and seaports, critical sectors like manufacturing, agriculture, healthcare, and education would grind to a halt. At the same time, individuals would lose access to jobs, markets, and leisure destinations. However, coping with extreme weather, overloaded highways and airports, and relentless traffic jams underscores the scale of challenges facing the transportation sector. Maintaining and enhancing this complex, interconnected web of infrastructure is imperative for communities to continue growing in a sustainable manner.

Tool #1 detected that 100% of the content was AI-generated.

Tool #2 detected that 96% of the content was AI-generated.

Tool #3 detected that 83% of the content was AI-generated.

Well, what would you like me to say? Once again, I find myself perplexed.

I then selected an article from a highly reputable newspaper, written by a columnist whom I don’t know personally, but I would confidently wager all my money (hoping not to end up homeless, haha) that they don’t rely on AI chatbots for writing.

Tool #1 detected that 18% of the content was AI-generated.

Tool #2 detected that 3% of the content was AI-generated.

Tool #3 detected that 75% of the content was AI-generated.

I took it a step further! I asked the chatbot to reproduce the soliloquy from Act 3, Scene 1 of William Shakespeare’s Hamlet. Clearly, the output was machine-generated but not originally authored by the machine (you get what I mean, obviously!), so I compared it with the text from the book in my library (The Arden Shakespeare Third Series, edited by Ann Thompson et al.). The two texts matched perfectly. Then, I copied the chatbot’s text and pasted it into the AI-detectors.

This is a part of Hamlet’s soliloquy:

To be or not to be — that is the question:

Whether ’tis nobler in the mind to suffer

The slings and arrows of outrageous fortune,

Or to take arms against a sea of troubles

And, by opposing, end them. To die, to sleep —

No more — and by a sleep to say we end

The heartache and the thousand natural shocks

That flesh is heir to — ’tis a consummation

Devoutly to be wished […]

And this is the result:

Tool #1: 15% AI-generated

Tool #2: 4% AI-generated

Tool #3: 1% AI-generated

Dear Bard, you might’ve given us a heads-up that you were dabbling with those GPT Chatbots. Eh, you cheeky blighter, but as you can see, we’ve sussed you out!
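Putting all seven sets of results side by side makes the disagreement easier to see. The short script below uses only the percentages reported in this article and computes, for each text, the gap between the highest and lowest score across the three tools. The labels and the “spread” measure are my own shorthand.

```python
# Scores (percent "AI-generated") reported above, in the order Tool #1, #2, #3.
scores = {
    "My own text": (13, 49, 27),
    "Machine-paraphrased version": (2, 6, 0),
    "Chatbot 'transport' text": (100, 0, 1),
    "Same text, three synonyms swapped": (100, 0, 96),
    "My full rewrite": (100, 96, 83),
    "Newspaper column": (18, 3, 75),
    "Hamlet soliloquy": (15, 4, 1),
}

def spread(values):
    """Gap between the highest and the lowest score for one text."""
    return max(values) - min(values)

spreads = {name: spread(vals) for name, vals in scores.items()}

for name, vals in scores.items():
    t1, t2, t3 = vals
    print(f"{name:<36}{t1:>4}%{t2:>5}%{t3:>5}%   spread {spreads[name]:>3}")
```

For the chatbot’s “transport” text, in both its original form and the version with three synonyms swapped, the tools disagree by the full 100 points.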

Now, let’s put aside the social, moral, legal, artistic implications, and so on. Let’s focus purely on the technique. Are we truly confident these tools are effective? Because if not, then all the underlying social, moral, legal, artistic implications, and the like, simply fall apart, sparking an entirely pointless uproar!

Check the whole article on Medium by Bob Mazzei

Bob Mazzei

AI Consultant, IT Engineer
