cross-posted from: https://lemmy.world/post/1330512

Below are direct quotes from the filings.

OpenAI

As noted in Paragraph 32, supra, the OpenAI Books2 dataset can be estimated to contain about 294,000 titles. The only “internet-based books corpora” that have ever offered that much material are notorious “shadow library” websites like Library Genesis (aka LibGen), Z-Library (aka B-ok), Sci-Hub, and Bibliotik. The books aggregated by these websites have also been available in bulk via torrent systems. These flagrantly illegal shadow libraries have long been of interest to the AI-training community: for instance, an AI training dataset published in December 2020 by EleutherAI called “Books3” includes a recreation of the Bibliotik collection and contains nearly 200,000 books. On information and belief, the OpenAI Books2 dataset includes books copied from these “shadow libraries,” because those are the sources of trainable books most similar in nature and size to OpenAI’s description of Books2.

Meta

Bibliotik is one of a number of notorious “shadow library” websites that also includes Library Genesis (aka LibGen), Z-Library (aka B-ok), and Sci-Hub. The books and other materials aggregated by these websites have also been available in bulk via torrent systems. These shadow libraries have long been of interest to the AI-training community because of the large quantity of copyrighted material they host. For that reason, these shadow libraries are also flagrantly illegal.

This article from Ars Technica covers a few more details. Filings are viewable at the law firm’s site here.

@[email protected]

Absolutely peeved that, according to the law, libraries in a digital format literally cannot exist without being illegal. Archive.org only managed to exist as a library because it enforced DRM that limited available rentals to the books it “bought” and had copies of.

This is because physical libraries lend out their own copies, so you can read copyrighted material without asking the rights holder for permission. That let Archive.org argue in court that its DRM only emulated the real thing.

Come COVID, they decided to be nice to people by temporarily stripping out the rental bollocks. Their reward for a good deed was a sledgehammer to the stomach.

It matters not; books shall be, and remain, forever free (for those who need them). One way or another. All I know is that I’ll never buy a book if I’m treated as a criminal.

Piracy: ꜱᴀɪʟ ᴛʜᴇ ʜɪɢʜ ꜱᴇᴀꜱ
⚓ A community devoted to in-depth debate on topics concerning digital piracy, ethical problems, and legal advancements.

𝗣𝗜𝗥𝗔𝗖𝗬 𝗜𝗦 𝗘𝗧𝗛𝗜𝗖𝗔𝗟!


Rules

1. Posts must be related to the discussion of digital piracy

2. Don’t request invites, trade, sell, or self-promote

3. Don’t request or link to specific pirated titles

4. Don’t be repetitious, spam, harass others, or submit low-quality posts

5. Don’t post questions already answered. READ THE WIKI