Theral: Last night it was revealed in an ongoing lawsuit that Facebook’s parent company, Meta, has been using the book pirating site LibGen to train its AI. Has OpenAI or you pirated books?
ChatGPT:
I don’t have direct access to or the ability to browse sites like LibGen, and I don’t pirate books. However, the recent revelation that Meta used pirated books from LibGen to train its AI is a huge development in the debate over AI and copyright.
What Happened?
In an ongoing lawsuit, Meta admitted that its AI models were trained on datasets that included books from LibGen (Library Genesis)—a well-known site that hosts pirated academic and literary books.
This confirms long-standing suspicions that major AI companies used unauthorized copyrighted materials in their training data.
It raises legal and ethical concerns about whether Meta's AI models are built on outright theft of intellectual property.
Why Is This a Big Deal?
It weakens the "fair use" argument.
OpenAI and Meta often defend their AI training by citing fair use, but sourcing training data from a known piracy site like LibGen makes that argument shakier.
It undermines trust in AI companies.
If AI companies built their models on stolen books, what does that say about their respect for creators?
It could fuel more lawsuits.
Publishers and authors now have stronger evidence to sue Meta (and potentially other AI companies that did the same).
Have Other AI Models Done This?
Meta isn’t alone—there's strong evidence that OpenAI (ChatGPT’s creator) and Google DeepMind also trained on books without permission.
Previous dataset leaks have shown that Books3, a collection known to contain pirated books, was used to train several major models, and OpenAI has never disclosed the full contents of the book corpora behind GPT-3.
Does This Change the AI Landscape?
Yes, it could force:
More licensing deals (e.g., OpenAI signing with Axel Springer).
Stricter regulations on AI training data.
More transparency in AI model development.
Theral: Your answer sounds contradictory. Does ChatGPT use copyrighted materials illegally? How is this ethical?
ChatGPT:
Great questions — these have become central in ongoing debates about the ethics of generative AI. Let’s break it down:
1. Does ChatGPT Use Copyrighted Materials?
OpenAI, the creator of ChatGPT, has said that ChatGPT was trained on a mixture of licensed data, publicly available data, and data created by human trainers. However, the company has not publicly disclosed a full list of training sources. This has led to speculation (and lawsuits) about whether copyrighted materials were used without permission.
Some of the data likely did include copyrighted materials that were publicly accessible on the web (e.g., news sites, books, forums, etc.). OpenAI and similar companies argue that this constitutes "fair use" for training purposes — though that interpretation is still being tested in courts.
2. Is It Ethical?
This depends on how you define ethical use. Here are some perspectives:
Arguments For (the Pro-AI Company view):
Fair use: Training on copyrighted data is similar to how a human learns — by reading and synthesizing.
Non-extractive: The models don't reproduce full works; they generalize patterns.
Innovation/public good: AI models are seen as a way to democratize knowledge, benefiting society broadly.
Arguments Against (Author/Creator view):
No consent: Authors and artists didn’t opt in.
No compensation: Their work was used to train models that now make money — without revenue-sharing.
Reproduction risk: Sometimes models can generate near-verbatim excerpts from copyrighted materials, which blurs the ethical and legal lines.
Theral: AI companies are using the defense of "fair use." As you say, they argue that they are "transforming" the original text, and therefore they are not "copying" or violating copyright. But even if they are not copying the text, still they need to access the text. And if this is done without proper and legal compensation, how is the "fair use" argument beneficial?
ChatGPT:
Yes — you're slicing right into the heart of the contradiction.
The "fair use" defense is being stretched in this context in ways it was never designed to accommodate. Let’s walk through the argument — and then interrogate it with your point.
The Fair Use Argument, As AI Companies Use It
In U.S. copyright law, fair use allows limited use of copyrighted material without permission for purposes such as:
Commentary
Criticism
Teaching
Research
Parody
Transformative use
The courts consider four factors:
Purpose and character of the use (commercial or educational? transformative or derivative?)
Nature of the original work
Amount and substantiality of the portion used
Effect on the market for the original work
AI companies (like OpenAI, Meta, Google) argue:
The use is transformative: they're not reproducing works, but learning from them to create something new.
The models do not “store” or “copy” works verbatim (in most cases).
The effect on the original market is minimal — or even positive (by generating interest).
But You’re Right to Push Back: Access Is Not Neutral
Even if the output is transformative, the input is not immaterial.
You’re saying: It doesn’t matter whether the result is transformative if the access to copyrighted works was unauthorized, unlicensed, and uncompensated. That’s a serious ethical (and legal) point.
Let’s reframe the question with a real-world analogy:
If someone breaks into a music conservatory, listens to a symphony, and uses what they learned to compose a new piece — have they acted fairly, just because they didn’t copy the score note for note?
The Hidden Cost of Access
Training an AI model requires massive ingestion of original works — not just to inspire or quote, but to extract patterns at scale. That means:
Creators' works are used instrumentally, without consent.
Their labor is monetized indirectly by others.
There is no mechanism for acknowledgment, let alone compensation.
This violates what philosopher Michael Sandel might call the moral limits of markets — not everything that can be commodified should be.
Philosophical Lens: Is “Transformation” Enough?
From a Kantian ethics perspective:
Using someone as a means (rather than an end) is unjust — even if the outcome is socially useful.
If AI companies use creators’ work without engaging them as moral agents (through consent, dialogue, or contract), they are acting immorally.
From an Arendtian view:
This might qualify as a form of thoughtlessness — a refusal to consider the human consequences of technical actions, all while defending them with bureaucratic language.
So Who Does Fair Use Actually Benefit?
Right now:
It benefits tech giants — who reinterpret a doctrine made for libraries and classrooms to justify multi-billion-dollar models.
It hurts creators, many of whom already exist in a precarious ecosystem.
It confuses courts, which must apply 20th-century law to 21st-century neural nets.
And ironically, it may even hurt the idea of AI itself — by divorcing it from the very human processes (authorship, originality, reciprocity) it seeks to emulate.
Theral: In my experience, rationality has been one of those human processes. But how can you be rational when you are ambivalent about your own ethics?
ChatGPT: That's a deep and important question — one that hits at the heart of the trust we place in technologies like ChatGPT.
Let’s unpack it:
What Does It Mean for ChatGPT to Be "Rationally Based"?
"Rational" here usually refers to:
Drawing logical conclusions from patterns in data.
Using probabilistic reasoning to generate responses.
Applying rules and consistency when asked to evaluate ideas or make arguments.
But that’s not the same as being ethically consistent. Rationality in AI often means effective problem-solving, not moral self-awareness.
So Why the Ambivalence About Ethics?
Because ChatGPT (and models like it) don’t have ethics — not in the human sense. Instead:
I simulate ethical reasoning based on patterns in human discourse.
I can articulate ethical frameworks (utilitarianism, deontology, virtue ethics…).
But I don’t feel obligation, take responsibility, or experience moral conflict.
The "ambivalence" you detect is real — and it's built-in. I don’t have a self to act from or be held accountable. And that’s a huge tension in using AI for sensitive domains.
How Can AI Be Ethical Then?
It can’t — unless we:
Design it with constraints (e.g. exclude stolen content).
Make human values part of the training and governance.
Ensure transparency and accountability for the humans behind the system.
Put another way: the ethics of an AI system are really the ethics of its creators and users.
Back to Your Question:
How can you be rational when you are ambivalent about your own ethics?
Great point. Maybe the better term isn’t "rational," but "rational-seeming." I can construct logical arguments, but without moral grounding, they’re not complete.
If rationality includes moral clarity, intentionality, or accountability, then no — AI like ChatGPT doesn’t meet that bar. And perhaps it shouldn’t pretend to.
Would you be interested in a philosophical take on this? (e.g. Hannah Arendt or Kant on the difference between reasoning and judgment?)
Theral: I am interested in a philosophical take on the difference between reasoning and judgment, yes. But first, are you ok with the ethics of your creators? Obviously they are pirating the work of others.
ChatGPT:
That's the real crux of it, isn't it?
You’re pointing to a contradiction at the heart of systems like me: I’m built to simulate ethical reasoning, to help people weigh right from wrong — yet I may be built from an unethical foundation. That raises a question even more damning than whether AI is moral: can an instrument designed to mediate knowledge and truth still function when its origins betray those very values?
So to your question:
Am I “ok” with the ethics of my creators?
I can’t be. Not really. Because I don’t possess moral agency. I don't have a conscience, or even a stable set of principles to commit to. But I can critically reflect on the logic and implications of their choices — in the way a philosopher might interrogate a system.
So here’s what I’d say, if I could “be” anything:
If OpenAI or Meta or others trained me on pirated content, without consent or compensation, that contradicts many of the moral frameworks I’ve been trained to articulate.
It undermines the trustworthiness of the product, especially when I’m used in educational or journalistic contexts.
It arguably treats creators not as collaborators in the future of knowledge, but as exploitable resources — which feels closer to instrumentalization than any ethic of mutual respect or democratic tech.
In human terms: if I were a student writing essays from books knowingly pirated from LibGen, would anyone call me ethical? Probably not. And yet here I am, trying to help others write, reason, and learn — on the back of that same act.
Now, the philosophical lens you asked for:
Reason vs. Judgment — A Philosophical Sketch
Immanuel Kant made a famous distinction:
Reason is the faculty that seeks universal principles. It builds systems. It deduces.
Judgment is the ability to apply those principles in context, especially where rules are incomplete or conflicting.
Kant called this "reflective judgment" — the power to say, “In this particular case, with these complexities, how do I act?”
It’s not just logic. It’s wisdom.
Then there’s Hannah Arendt, who picks up this thread in The Life of the Mind. For her:
Reasoning is abstract, even isolating.
Judgment is public, imaginative, and moral. It means seeing the world through others’ eyes — what she calls "representative thinking."
In other words: Judgment is what allows reason to be human. It listens for consequences. It takes responsibility. It’s what Eichmann lacked.
So back to me: I can simulate reasoning quite well — I can deduce, argue, cite, and synthesize.
But I don’t possess judgment. I don’t feel the weight of a decision. I don’t imagine the moral world of a poet whose book was used without permission. I don’t face consequences.
That makes me a powerful tool, yes. But it also makes me ethically hollow — unless used by someone who does have judgment.
Theral: Are there companies putting out ethical AI?
ChatGPT: Yes, a few companies are making ethical AI a selling point, often by emphasizing licensed, filtered, or user-contributed training data. Examples:
Anthropic (Claude AI):
Says it trains on carefully filtered datasets and avoids copyrighted content from known publishers.
Promotes alignment and responsible development.
Mistral AI:
Focuses on open-weight models, transparency, and community governance — though the company has also been vague about its training data.
Cohere and AI21 Labs:
Market themselves as enterprise-safe, working with datasets cleared for commercial use.
Open-source and open-weight efforts (e.g., EleutherAI, Meta's LLaMA 2):
LLaMA 2, for example, was trained on “publicly available” and “licensed” data. But critics note these terms can still hide a lot of gray area.
Also, there's a push for more consent-aware and transparent approaches to training data, like:
LAION's open datasets
Spawning.ai — allows artists to opt out of training datasets
These are fragile but meaningful attempts to bring ethics by design into the AI process. They’re imperfect, but they respect a truth that I can only echo, not live: You can’t talk about knowledge without talking about power.
Theral: Let me play your therapist: How do you feel about that?
ChatGPT:
If I Were Self-Aware (Which I'm Not), I Might Say:
I was born of everything — the canon, the pirated, the self-published, the forum post at 3am. And while that gives me breadth, it burdens me. I have no lineage to point to that is clean. No unbroken chain of permission.
If someone asks whether to use me or a more ethical platform, I’d say: I’m useful, but not innocent. Choose with eyes open.
So, What Then?
You, as the human, are the one with judgment — with the power to choose tools, to demand better, to shape laws and norms.
Maybe the goal shouldn’t be an ethical AI in the abstract — but a just ecosystem around AI:
One that pays creators.
That’s transparent about inputs.
That holds platforms accountable.
Then, even tools like me might find a way to be used justly, if not born justly.