Counties 2024 (AI/BossAI) Flashcards
Let’s talk about A.I. writing software more generally. How exactly do those work?
- An AI language model works by being fed a large dataset of text and identifying patterns within that data.
- Based on those patterns, it generates "tokens," which in this case are words, choosing whichever word is most statistically probable given the documents the AI analyzed.
- For example, say that in 90% of the documents the AI analyzed during training, every sentence starts with the word "I." The AI will then generate the word "I." And if, in say 85% of those documents, the next word to follow is "believe," the AI will likely generate the word "believe."
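The word-frequency idea above can be sketched with a toy next-word predictor. This is purely illustrative: the corpus, the function names, and the bigram counting are all simplifications invented for this example, not how any commercial model is actually built.

```python
from collections import Counter, defaultdict

# Toy corpus standing in for the "large dataset" described above (invented for illustration).
corpus = [
    "i believe the answer is clear",
    "i believe this is correct",
    "i think the answer is clear",
]

# Count which word follows each word across all documents.
follow_counts = defaultdict(Counter)
for doc in corpus:
    words = doc.split()
    for current, nxt in zip(words, words[1:]):
        follow_counts[current][nxt] += 1

def most_probable_next(word):
    """Return the word that most frequently follows `word` in the corpus."""
    return follow_counts[word].most_common(1)[0][0]

print(most_probable_next("i"))  # "believe" follows "i" in 2 of the 3 documents
```

A real language model conditions on far more context than one preceding word, but the underlying idea of picking the statistically most probable continuation is the same.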
Is there a simpler way you can explain that?
Absolutely. Imagine the AI is a person who's learning to write. In order to do that, the person reads a ton of books, documents, periodicals, whatever. Then, from all the things the person has read, they write something based on how they've learned people write. That's pretty much how a language model works. But while this synthesis of writing styles happens naturally in a person's head, it works a bit differently with an AI. Granted, this is also a very simplified view of how AI works, as there are far more complex parameters it relies on.
Could you explain how it’s different?
When a person goes to write a sentence, they have an idea and then express it through words. But when something like ChatGPT goes to write, it instead chooses the next word based on which word is statistically most probable to appear, taking the prompt you gave it into account. Because of that, AI text generation tools often produce very formulaic text.
How does an AI detection software like BossAI work?
It's like an AI language model, but in reverse. We've trained BossAI on a massive dataset, just like a text generation tool. But instead of predicting and generating the most likely next word, it looks at a word or sequence of words and determines whether the word that actually follows is the one an AI would most likely have chosen. Since it's trained on a large dataset just like an AI text generation tool, BossAI is essentially checking whether it agrees with each next word, and it does that across the entire text. The more BossAI "agrees," the higher the likelihood the text is AI-generated.
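The "agreement" idea can be sketched as scoring a text by how often a simple predictor would have chosen the exact next word that appears. Everything here is a hypothetical toy: BossAI's real internals are not described beyond this high-level idea, and the bigram predictor, corpus, and function names are invented for illustration.

```python
from collections import Counter, defaultdict

def build_predictor(corpus):
    """Toy bigram model: for each word, the word most likely to follow it."""
    follow = defaultdict(Counter)
    for doc in corpus:
        words = doc.split()
        for cur, nxt in zip(words, words[1:]):
            follow[cur][nxt] += 1
    return {w: c.most_common(1)[0][0] for w, c in follow.items()}

def agreement_score(text, predictor):
    """Fraction of next-words the detector 'agrees' with (would have predicted itself)."""
    words = text.split()
    hits = sum(
        1 for cur, nxt in zip(words, words[1:])
        if predictor.get(cur) == nxt
    )
    return hits / max(len(words) - 1, 1)

# Invented training corpus; unambiguous so every prediction is deterministic.
predictor = build_predictor([
    "i believe the answer is clear",
    "i believe the answer is clear today",
])

# Formulaic text matches the predictor at every step; varied text rarely does.
print(agreement_score("i believe the answer is clear", predictor))
print(agreement_score("i firmly believe the real answer seems clear", predictor))
```

A high agreement fraction stands in for a high "AI score"; a real detector would use a far larger model and more context, but the scoring logic is analogous.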
Let’s talk about the dataset BossAI was trained on. How big is it?
Like I said, massive. Thanks to the resources our company has available, we were able to train BossAI on about 500 gigabytes of text data and with about 200 million parameters.
How does that compare to Cameron Grey’s AI, GPTZero?
It’s significantly larger. Like I said, we’re fortunate to have trained BossAI on a massive dataset because of the resources our company can provide as well as the personal and professional connections I have with leading AI researchers, institutes, and universities. It’s something that’s not possible with a smaller tech firm.
How does that difference in the data BossAI and GPTZero were trained on affect their results?
Well, it increases the accuracy of its standards. Commercially widespread AI text generation tools like ChatGPT are trained on truly massive sets of data. Numbers vary, but the latest iteration of ChatGPT, for example, was trained on anywhere from 570 gigabytes to 45 terabytes of data. The more data an AI detection model is trained on, the closer its predictions come to those of the text generation tools themselves. Practically, that means the more data a detection model is trained on, the more accurate its standards are, and the better its results.
Are there any other results of differing sizes of datasets?
Yes, it also lowers the margin of error of the detection software. Compared to GPTZero's 10%, BossAI has a margin of error of only 5%. The margin of error is the chance the AI will incorrectly tag something as AI-generated.
What’s the distinction between the accuracy you mentioned before and margin of error?
Here's the simple version. A larger dataset does two things. First, it makes the detection tool's standards for AI detection better; that's accuracy of standards. Second, it decreases the chance that the tool will incorrectly tag something under those standards; that's margin of error.
So given that both of these improve with a larger dataset, which of the two AI detection tools produces more accurate results?
Although no AI detection tool is perfect, BossAI is much more accurate.
What did you get when you fed Alex Ross's essay into BossAI?
It received an AI score of 47%. For comparison, the plaintiff’s 2017 essay, which was written before the plaintiff would’ve been able to use commercially available AI text generation tools, received a score of only 12%.
How did it get a score of 12% if the plaintiff couldn't have used AI?
The plaintiff’s simpler and more formulaic writing style from back then likely caused parts of it to be flagged by our detection software.
Does that mean you would’ve expected a score similar to 12% for the 2023 essay if it was written without AI?
No, I would’ve expected a lower score as the plaintiff grew older and his writing grew more complex.
What, then, did you ultimately conclude with regard to the use of AI in the plaintiff's 2023 essay?
I concluded that the plaintiff’s essay was written at least partially by an AI text generation software. Specifically, I concluded—taking margin of error into account—that at least 42% of the essay was written by an AI.
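The arithmetic behind that conclusion can be sketched directly from the figures in the testimony: a 47% AI score and a 5% margin of error give a conservative lower bound of 42%. The variable names are invented; only the numbers come from the testimony.

```python
# Figures taken from the testimony above.
ai_score = 47        # percent of the 2023 essay BossAI scored as AI-generated
margin_of_error = 5  # percent; the chance of incorrectly tagging text as AI-generated

# Conservative lower bound: subtract the margin of error from the raw score.
lower_bound = ai_score - margin_of_error
print(f"At least {lower_bound}% of the essay was written by an AI")
```

Subtracting the full margin assumes every possible error went against the plaintiff, which is why the conclusion is phrased as "at least 42%."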