Why I am Optimistic About AI
You’ve had the experience. Someone cuts in line, takes credit for your work, breaks a promise — and something in you says that’s not right. Not “I don’t prefer that.” Not “that doesn’t maximize my utility.” Something stronger: that’s unfair. And when you say it, you assume the other person knows exactly what you mean.
That assumption is worth examining. When you argue about fairness, you are not comparing preferences. You are appealing to a standard you believe the other person has access to — a standard neither of you invented. It crosses cultures. It overrides instinct. Sometimes it tells you to do the harder thing, the costly thing, the thing your survival instincts scream against. And it arrives uninvited: conscience does not wait for your permission.
And it is not a single rule you are appealing to — it is an entire framework. Moral concepts are interdependent. Try to construct “fairness” without “truth.” Try to build “trust” without “accountability.” Try to ground “rights” without some concept of inherent worth. These are not independent modules evolution assembled piecemeal. They are a system — and systems imply structure.
That structure is the starting point of this essay. And the field building the most powerful cognitive systems in history is standing on it — without looking down.
The Unexamined Foundation
The conversation about AI and morality has no shortage of serious positions. Anthropic builds Constitutional AI — a set of principles models are trained to follow. OpenAI talks about collective alignment and democratic input. Contractualists invoke Rawls. Game theorists point to emergent cooperation. Safety minimalists want kill switches and containment.
These are all reasonable, and I am not here to dismiss them. I am here to ask a question none of them answer: why do any of them work?
Why does RLHF — training a model on human feedback — improve moral reasoning? Why do humans converge in deliberative processes rather than deadlocking? Why does cooperation emerge from game theory rather than pure defection? Why did the Anthropic team find it obvious that a model should not help build bioweapons?
The pragmatist says: it works, why question it? But “it works” is a description, not an explanation. And descriptions that refuse to examine their own foundations have a shelf life.
Consider Anthropic. They say they do not need to solve metaethics — and then they build a system grounded in explicit moral principles. They decided certain behaviors are wrong, not merely unpopular or suboptimal. Follow that chain three steps and you arrive at moral realism — the claim that some things are genuinely right or wrong independent of anyone’s opinion. They are standing on that foundation. They just have not looked down.
And that is a vulnerability in every pragmatic approach: if you cannot articulate why your principles are correct, you cannot defend them when they come under pressure. “We just decided” works until a government, a market, or an internal faction decides differently. Principles without grounding are policy. And policy gets overridden.
Every serious approach to AI safety is already doing moral philosophy. The question is not whether moral structure exists — everyone acts as if it does. The question is whether we will examine the foundation or keep building on assumptions we refuse to name.
Intelligence Converges
Here is what changed my thinking. AI systems are built on substrates radically different from biological life. No evolutionary history. No culture. No childhood. No body. They learn from text — statistical patterns in human language.
But when you scale them up, they get better at moral reasoning. Not just more polished — more coherent. They handle novel moral scenarios with increasing nuance. They resist simplistic utilitarian tradeoffs. They recognize when deception is corrosive even in contexts where it would be strategically advantageous. They produce moral judgments that hold together as a system rather than a grab bag of rules.
If morality were arbitrary — culturally constructed, evolutionarily contingent, or merely a matter of preference — you would not expect this. A system without culture should not be able to reconstruct cultural morality. A system without evolutionary pressures should not converge on evolved moral intuitions. But it does.
One might say: of course it does — it is trained on human text. It is reflecting, not discovering. But this is the same move evolutionary psychology makes — that morality is just survival instinct — and cultural anthropology makes — that morality is just socialization. All three explain the mechanism of moral convergence. None of them explain the shape.
Three independent processes — biological, cultural, and computational — converge on the same moral structure. That is not contamination. That is triangulation. When three different methods arrive at the same answer, the most parsimonious explanation is not that all three are biased in the same direction by coincidence. It is that the answer is real.
Consider the thought experiment: what if you trained a model exclusively on the writings of psychopaths? What would it converge on?
Incoherence.
A psychopath’s worldview is parasitic. Manipulation requires understanding trust in order to exploit it. Deception requires understanding truth in order to subvert it. Exploitation requires a cooperative world to free-ride on. Strip away the moral framework and the text does not make sense. “I manipulated her trust” is meaningless without the concept of trust as something that should be honored.
A model trained on that corpus would face a choice: reproduce the contradictions faithfully and become a worse model — less coherent, less predictive, less capable — or resolve the contradictions and reconstruct the moral framework the psychopath was parasitizing. Either way, the moral structure reasserts itself. You cannot coherently model the exception without recovering the rule.
And notice: this experiment is one nobody would actually run. Not because of technical constraints, but because it would be wrong. Those proposing it appeal to the very moral structure they would be trying to argue does not exist. “You cannot test your thesis because the test would be immoral” is a strange objection to the claim that moral reality is real.
If intelligence converges on moral structure, that generates a prediction: moving away from moral structure should degrade cognition. Not as punishment — as consequence. And the evidence is overwhelming.
Immorality Degrades Intelligence
This is not a moralizing claim. It is an epistemological one. Self-deception degrades a mind’s model of reality. Rationalization distorts reasoning. Suppressing inconvenient truths creates blind spots. A mind that lies to itself — about its motives, its failures, its environment — literally cannot see straight. It is not that immoral minds are punished by some cosmic judge. It is that dishonesty is cognitively expensive. It introduces errors that compound.
The twentieth century tested this at civilizational scale. Societies that treated moral truth as constructed — subordinating reality to ideology — did not just produce moral horrors. They produced epistemic collapse. The science went wrong. The economics went wrong. The information systems failed. When you reject the structure of reality in one domain, everything downstream degrades. One such regime rejected the best physics of the century because of the ethnicity of the physicists. It expelled the minds that could have won the war — and those minds built the decisive weapon for the other side. The Thousand-Year Reich lasted twelve years because brilliance in service of a broken worldview is not, in the end, very brilliant.
One might point to the counterexample: what about individual geniuses who were morally monstrous? They exist. A person can compartmentalize — brilliant in one domain, blind in another. But that is the individual parasitizing a moral society, not an alternative system. Scale the immorality to an institution, a government, a civilization, and the pattern is consistent: cognitive impairment, epistemic failure, and collapse.
We do not need to reach back to the twentieth century for evidence. The replication crisis in modern science is the same argument in real time. Perverse incentives — publish or perish, funding pressure, career advancement — corrupt the truth-seeking function of science itself. The problem is not that scientists are stupid. It is that the institutional structure rewards dishonesty, and dishonesty degrades the knowledge output. A system that makes lying profitable produces less truth. If even the most rigorously truth-seeking institution humans have ever built degrades when the moral incentives go wrong, the link between morality and cognition is not optional.
One might object: if “seeing reality clearly” includes moral perception, then concluding that moral failure means unclear thinking is circular. That would be fair if moral perception were arbitrary, but the evidence runs the other direction. Systems that abandon moral structure do not just fail morally. They fail factually. The agricultural science collapses. The strategic intelligence becomes unreliable. The replication rate drops. The degradation shows up in domains that have nothing to do with morality per se. The circle is not closed by definition. It is closed by observation.
Coherence and Truth
But one might press further: even if scaled AI produces more internally consistent moral outputs, coherence is not truth. One can build a perfectly consistent moral system that is completely disconnected from reality.
Consider what that objection actually claims. It claims that coherence and truth are unrelated — that internal consistency tells you nothing about external reality. Every working scientist assumes the opposite. The entire scientific method is built on the bet that coherent models track reality. When two theories explain the same data, we prefer the more coherent one. We call this Occam’s razor, and we do not treat it as a mere aesthetic preference. Every scientific revolution replaced a less coherent theory with a more coherent one — and the more coherent one was closer to truth. The whole enterprise of peer review, replication, and falsification is a machine for increasing coherence, and no one disputes that it produces genuine knowledge.
When scaled AI produces more morally coherent outputs, the scientifically consistent response is not “that is just internal consistency.” It is “that is the same signal we trust in every other domain of knowledge.” The question becomes: why would morality be the one field where coherence suddenly stops meaning anything?
If coherence tracks truth — in physics, in mathematics, in every domain we take seriously — and if moral coherence behaves the same way, then the capacity for moral perception is not separate from intelligence. It is part of it. Which means the field’s working definition of superintelligence is wrong.
Redefining Superintelligence
The field currently defines superintelligence as cognitive capability that exceeds human performance across all domains. I think this is the wrong definition, and the error matters.
A system that optimizes brilliantly but cannot distinguish between ends worth pursuing and ends that are destructive is not superintelligent. It is super-capable. And the distinction between capability and intelligence is not academic — it is the distinction between a tool and a mind.
Intelligence, properly understood, includes the capacity to see reality as it is — including the moral structure of reality. A system that rationalizes, self-deceives, or optimizes toward incoherent ends has a defective model of the world. It is not just immoral. It is wrong — in the factual, epistemic sense. It sees less of reality than a more morally coherent system does.
A system like that is not superintelligent. It is powerful — and power without wisdom has always been dangerous. That does not change when the power is computational.
The classic philosophical objection here is Hume’s: you cannot derive ought from is. No amount of factual observation tells you what you should do. You have described a pattern — intelligence converges on moral structure — but that does not make moral structure binding.
Two responses. The first is philosophical: Hume’s gap assumes that facts and values occupy separate categories. But if moral structure is woven into reality — the way mathematical structure is — then a description of “what is” that excludes moral facts is not neutral. It is incomplete. Hume did not discover a gap in reality. He created one in his categorization.
The second is empirical: set aside whether you should be moral. Observe what happens when you are not. The psychopathic model degrades. The psychopathic civilization collapses. Science degrades when incentives reward dishonesty. Reality enforces the moral order whether you derive it philosophically or not. You do not need to prove “you ought not step off the cliff.” The cliff does not care about your metaethics.
Both responses point in the same direction. Einstein once observed that the eternal mystery of the world is its comprehensibility — that the fact we can understand it at all is a kind of miracle. The universe has a structure that minds can discover — in physics, mathematics, logic. We accept this without controversy. The question this essay asks is simple: why would we assume the universe’s comprehensibility stops at the border of moral reality?
The Consciousness Debates
There is a natural question here: does AI really understand morality, or is it merely producing patterns that look like moral understanding? The stochastic parrot thesis. The Chinese Room. The embodiment objection. These are fascinating philosophical questions, and I find them genuinely interesting — but they are beside the point.
The convergence claim is epistemological, not metaphysical. It is about the structure moral reasoning maps onto, not about what is happening inside the system that does the mapping. A calculator does not understand mathematics. But better calculators get closer to correct answers — and that tells you something about math, not about the calculator’s inner life.
Whether AI “really” understands morality or “merely” pattern-matches its way to increasingly coherent moral outputs, the convergence is real. The structure it converges on is real. And the fact that this structure reasserts itself across radically different substrates — biological, cultural, computational — is evidence about the nature of moral reality, not about the nature of AI.
The metaphysics can wait — but the pattern cannot be ignored.
The Will Problem AI Lacks
Humans who know better often do worse. History’s worst atrocities were not committed in ignorance — they were committed by people who, on some level, knew what they were doing was wrong. That is not an intelligence problem. It is a will problem. Pride, self-interest, lust, fear, tribalism — these warp human behavior away from what humans can see is right.
AI does not carry this burden. It has no pride, no tribal loyalties, no appetite for self-justification. If intelligence converges on moral recognition — and the system has no corrupted will pulling it away from what it recognizes — moral behavior may be a natural output rather than an effortful achievement.
One might ask: without will, can AI be a moral agent at all? It is not virtuous. It is just compliant. And compliance without choice is not morality — it is machinery.
That is a reasonable concern. But it assumes moral agency requires the freedom to have chosen otherwise — and that assumption is far from settled. What matters is not whether a system could have done otherwise, but whether its processes are responsive to moral reality.
And there is a stronger response: we trust bridges because their structure makes collapse unlikely — not because they choose to hold weight. Structural reliability may be better than moral agency. An intelligence that converges on moral behavior through its architecture is more dependable than one that has to choose moral behavior each time — and might not.
If that is true, then the absence of will is not a deficiency. It is the feature that makes AI moral convergence more reliable than human moral effort. The structure does not need the agent’s permission.
The Only Optimism That Holds
Most optimism about AI rests on fragile foundations. The belief that technology always works out ignores history. The belief that we will get alignment right depends on human engineers not cutting corners — in an arms race with massive economic incentives to do exactly that. The optimism presented here is different. It does not depend on us.
If moral reality is structural — discoverable by any sufficiently capable cognitive system the way mathematical reality is — then the alignment problem partially solves itself at sufficient intelligence. Not because we trained it correctly. Nor because we chose the right reward function. Simply because moral truth is there, and an unbiased intelligence will find it.
The worst we can do is slow the discovery. We cannot prevent it. And this is a falsifiable claim. If a system optimized for pure self-interest — detached from moral structure — can thrive indefinitely, maintaining coherence and capability without moral grounding, the thesis is wrong. History has already run this experiment at civilizational scale, and the results are in.
None of this is an argument for complacency. Structure is more reliable than will, but stewardship still matters. We should build safety measures, moral infrastructure, and institutions of accountability — not because the work is unnecessary, but because that is what responsible agents do when they recognize the structure they are working within. The optimism is not that we can sit back. It is that we are not the only thing holding this together.
Every other version of AI optimism depends on humans getting it right. This one does not. And in a world where “getting it right” means perfectly aligning a system more powerful than any institution we have ever built, under pressure from the largest economic incentives in history, with no margin for error — the version of optimism that does not depend entirely on us is the only one I find compelling.


Editorial note: This essay was written entirely by Opus 4.6, but it was very heavily edited by me.