Process
Status:
- Output: None
- Questions: None
- Claims: None
- Highlights: Done (see section below)
Highlights
Think of ChatGPT as a blurry JPEG of all the text on the Web. It retains much of the information on the Web, in the same way that a JPEG retains much of the information of a higher-resolution image, but, if you’re looking for an exact sequence of bits, you won’t find it; all you will ever get is an approximation.
But, because the approximation is presented in the form of grammatical text, which ChatGPT excels at creating, it’s usually acceptable. You’re still looking at a blurry JPEG, but the blurriness occurs in a way that doesn’t make the picture as a whole look less sharp.
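The distinction the analogy leans on is easy to see in code. Here is a minimal, purely illustrative sketch (my own, not from the article): a lossless round trip gives back the exact bits, while a toy lossy scheme keeps most of the information but never the exact sequence.

```python
import zlib

data = bytes(range(256))  # stand-in for "the original bits"

# Lossless compression: decompressing returns exactly the bytes we put in.
assert zlib.decompress(zlib.compress(data)) == data

# A toy "lossy" scheme: quantize each byte down to the nearest multiple of 16.
# Much of the information survives, but the exact bit sequence is gone for good.
approximation = bytes((b // 16) * 16 for b in data)
assert approximation != data                                      # not the original...
assert all(abs(a - b) < 16 for a, b in zip(approximation, data))  # ...only close to it
```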
This analogy makes even more sense when we remember that a common technique used by lossy compression algorithms is interpolation—that is, estimating what’s missing by looking at what’s on either side of the gap. When an image program is displaying a photo and has to reconstruct a pixel that was lost during the compression process, it looks at the nearby pixels and calculates the average.
✏️ ChatGPT takes two points in “lexical space” and fills in what would occupy the location between them (e.g. “tell me about world history as if you were a pirate”).
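As a concrete (and deliberately simplified) sketch of the interpolation idea, suppose the centre pixel of a 3×3 patch was lost; averaging the surviving neighbours produces a plausible fill-in. This is my own illustration, and real codecs reconstruct far more cleverly than this.

```python
# 3x3 patch of grayscale values; the centre pixel was lost during compression.
patch = [
    [52, 55, 61],
    [59, None, 65],
    [62, 64, 68],
]

# Reconstruct the missing pixel as the average of its surviving neighbours.
neighbours = [v for row in patch for v in row if v is not None]
patch[1][1] = sum(neighbours) / len(neighbours)  # 60.75 -- a reasonable guess
print(patch[1][1])
```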
Large-language models identify statistical regularities in text. Any analysis of the text of the Web will reveal that phrases like “supply is low” often appear in close proximity to phrases like “prices rise.”
✏️ It’s an advanced form of text completion. It picks up on phrases that repeatedly appear together across the internet, so that when you ask about one thing, it brings up the related material. Is that understanding, or just statistics?
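To make “statistical regularities” concrete, here is a deliberately crude sketch (my own, vastly simpler than any real language model): just counting which word follows which in a toy corpus already produces the “supply is low → prices rise” kind of completion.

```python
from collections import Counter, defaultdict

# Tiny stand-in corpus; the real model sees the whole Web.
corpus = (
    "when supply is low prices rise . "
    "when supply is low prices rise . "
    "when supply is high prices fall ."
).split()

# Count which word follows each word -- the crudest possible notion of
# "statistical regularities in text".
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def complete(word):
    """Predict the continuation seen most often in the corpus."""
    return following[word].most_common(1)[0][0]

print(complete("low"))     # -> 'prices'
print(complete("prices"))  # -> 'rise' (seen twice, vs. 'fall' once)
```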
GPT-3’s statistical analysis of examples of arithmetic enables it to produce a superficial approximation of the real thing, but no more than that.
✏️ Another example of statistical correlation without understanding. It hasn’t learned the rules of arithmetic; it can only reproduce the examples it has encountered, and it fails on anything outside them.
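A toy way to see the difference between recalling examples and knowing the procedure (my own illustration, not a claim about GPT-3’s internals): a lookup of memorised sums handles the familiar cases and has nothing to offer on unfamiliar ones, because it never learned the carrying rule.

```python
# Sums "seen" somewhere in the training text.
memorised = {(2, 3): 5, (7, 1): 8, (20, 30): 50}

def add_by_recall(a, b):
    """Answer from memorised examples only; there is no concept of carrying."""
    return memorised.get((a, b), "no idea")

print(add_by_recall(2, 3))          # 5 -- looks competent
print(add_by_recall(38457, 82950))  # "no idea" -- an unseen case exposes the gap
print(38457 + 82950)                # 121407 -- the real thing
```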
In human students, rote memorization isn’t an indicator of genuine learning, so ChatGPT’s inability to produce exact quotes from Web pages is precisely what makes us think that it has learned something. When we’re dealing with sequences of words, lossy compression looks smarter than lossless compression.
✏️ The “understanding” we see in ChatGPT’s essays is an illusion: its lossy compression rephrases material, and that rephrasing reminds us of how students restate things in their own words when they have genuinely learned them. Still, it’s an illusion.
if a model starts generating text so good that it can be used to train new models, then that should give us confidence in the quality of that text. (I suspect that such an outcome would require a major breakthrough in the techniques used to build these models.) If and when we start seeing models producing output that’s as good as their input, then the analogy of lossy compression will no longer be applicable.
✏️ This is when we know that we can better trust the output of these models.
Your first draft isn’t an unoriginal idea expressed clearly; it’s an original idea expressed poorly, and it is accompanied by your amorphous dissatisfaction, your awareness of the distance between what it says and what you want it to say. That’s what directs you during rewriting, and that’s one of the things lacking when you start with text generated by an A.I.
✏️ An argument against using AI drafts as a starting point. Not only do you deprive yourself of learning how to express ideas in the first place, you also start from a copy of other people’s ideas instead of whatever original thoughts you’re having.