[remark] Bot countermeasures' impact on the quality of life on the web

by Ciprian Dorin Craciun (https://volution.ro/ciprian) on 

About how another instance of "scientific progress justifies the means", or perhaps of "a corporation milking profits from other people's work", triggers an arms race that dismantles the open web piece by piece and turns it into walled gardens.






I think enough has already been written on the subject of fighting against rogue bots (today mostly LLM scrapers) that are ruining the web, not only by strip-mining human creativity and turning it into average slop, but especially by taking down hosting infrastructure through uncoordinated crawling that amounts to a DDoS.

And it's not only the "bad LLM" companies that engage in this; even Google is slamming anything alive that happens to have its HTTP endpoint open!

Thus, many have started fighting back by employing various techniques:

(As a small note, I won't use the word AI to mean LLM. LLMs might be a form of AI, but the reverse is not true.)


As a small sidenote, I used to browse the internet with a custom Firefox "reading" profile that, among other tweaks, disabled JavaScript and applied my own custom CSS.
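
In Firefox terms, the core of that setup fits in a couple of user.js preferences. The snippet below is only a rough sketch of that part of the profile, not my exact configuration:

    // Rough sketch of the "reading" profile's core tweaks (not the exact configuration).
    // Disable JavaScript entirely.
    user_pref("javascript.enabled", false);
    // Allow userContent.css / userChrome.css to be loaded from the profile's
    // chrome/ directory, which is where the custom "reading" CSS would live.
    user_pref("toolkit.legacyUserProfileCustomizations.stylesheets", true);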

Back to the bot countermeasures: sadly, they are not all created equal, especially in terms of usability!

Lately, more and more sites have become broken for me, in the sense that I can't use my Firefox "reading" profile on them, because they have started deploying aggressive bot countermeasures.

Thus, from this perspective, some of these countermeasures make my goal almost impossible to achieve:

Unfortunately, the most widely used countermeasures seem to be the JavaScript-based ones...


While I understand that the situation is dire, and webmasters, bloggers, writers, and others need to fight back against this assault, either to save their infrastructure from crumbling, or to save their intellectual property from being borderline plagiarized, we can't do so by completely destroying the web!

Do I have a solution?
No!

Do I believe the current technical approaches are a good fit?
No, at least not in the long term.

Can I legally take a copyrighted book, read it, perhaps more than once, and then say that, because I don't reproduce it verbatim, word for word, I'm not actually infringing anyone's copyright?
Definitely not!
(Or else the entertainment companies wouldn't be so active against movie piracy...)

Thus, should someone perhaps throw copyright law at the LLM companies?
Definitely yes!


As such, I think throwing technical measures against a mechanized form of piracy is counterproductive in the long term, just as it has proven to be against other forms of piracy!


What am I to do?

For the moment, I think I'll just stick with the following approach: if a site fails to load (with JavaScript disabled and my custom CSS applied), then I just close it and move on. There is more content on the web than I could read in 1000 lifetimes; I certainly haven't missed the long-sought answer to any great mystery of the universe!

Also, I apply the same rule to Twitter, Facebook, and even Mastodon.
Why doesn't Twitter / Facebook work without JavaScript?
Because they don't care.
But why doesn't Mastodon work without JavaScript?
Because they don't care, they really don't care.
So I don't care either!


Am I pondering adding bot countermeasures to this site?
Yes, but not at the moment.

But I think I would go with a combination of robot maze + zip bomb for the greatest impact. :)
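
Purely as an illustration of the idea, and not something actually deployed here, a minimal sketch of such a trap could look like the Python snippet below: everything under a made-up /maze/ prefix links only to freshly invented maze pages, and one of those links leads to a small gzip-compressed response that inflates into something far larger on the crawler's side. The paths, the payload size, and the complete absence of any bot-detection logic are all assumptions of the sketch.

    #!/usr/bin/env python3

    import gzip
    import random
    import string
    from http.server import BaseHTTPRequestHandler, HTTPServer

    # Pre-compress ~100 MiB of zeros; on the wire this is only on the order of
    # ~100 KiB, but a crawler that dutifully decompresses the response inflates
    # all of it back into memory (or onto disk).
    BOMB = gzip.compress(b"\0" * (100 * 1024 * 1024), compresslevel=9)

    def maze_page() -> bytes:
        # A page whose only content is links to freshly invented maze pages, so
        # a crawler that follows every link never runs out of URLs to request.
        names = ["".join(random.choices(string.ascii_lowercase, k=12)) for _ in range(10)]
        links = " ".join('<a href="/maze/{}">more</a>'.format(name) for name in names)
        links += ' <a href="/maze/bomb">even more</a>'
        return "<html><body>{}</body></html>".format(links).encode()

    class Trap(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path.startswith("/maze/bomb"):
                # The zip bomb: a tiny compressed body, huge once decompressed.
                self.send_response(200)
                self.send_header("Content-Type", "text/html")
                self.send_header("Content-Encoding", "gzip")
                self.send_header("Content-Length", str(len(BOMB)))
                self.end_headers()
                self.wfile.write(BOMB)
            else:
                # The robot maze: any other path just yields more maze links.
                body = maze_page()
                self.send_response(200)
                self.send_header("Content-Type", "text/html")
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)

    if __name__ == "__main__":
        HTTPServer(("127.0.0.1", 8080), Trap).serve_forever()

Of course, a real deployment would have to sit next to the actual content, target only requests that already look automated, and be fenced off in robots.txt so that well-behaved crawlers never wander into it.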