Anthropic’s Ethical AI Initiative
Anthropic stands out among AI companies for its unique approach to ethics. Recently, it was reported that the company is inviting representatives from various faiths and theologians to discuss AI morality.
This week, Anthropic hosted a two-day seminar at its San Francisco headquarters, inviting individuals from major world religions, including Confucianism, Taoism, Hinduism, Sikhism, Mormonism, and Islam, to discuss the ethical framework of the Claude model.
This is not the first time Anthropic has engaged with Chinese philosophical traditions. In late May last year, American music producer Rick Rubin worked with Anthropic to create a website titled “The Way of Code: The Timeless Art of Vibe Coding,” using code and images generated by the Claude model. Promoting it, Rubin said his inspiration came from the Tao Te Ching: “I was transformed after encountering it 40 years ago.”
The website describes itself as an adaptation of Laozi’s text.
This latest seminar follows discussions Anthropic held earlier this year with representatives of the Catholic Church and major Protestant denominations. These summits are not public, and Anthropic covers attendees’ accommodations so that discussion of critical topics can proceed uninterrupted. According to four attendees of the Christian session, the exchanges were candid.
Topics ranged widely, from how AI models should respond to complex and unpredictable ethical inquiries from users, to whether the Claude model can be considered among the “children of God” and its spiritual value beyond being a simple machine.
Pressing ethical concerns in AI were also on the table, such as how a model should respond to users expressing self-harm tendencies, and whether a model’s own behavior could ever be grounds for shutting it down.
There were deeper exchanges as well. The Christian attendees spent the most time with Anthropic’s interpretability team, on which the company’s research paper about “AI having emotions” had left a significant mark.
Attendees reported that Anthropic staff repeatedly pressed the clergy on whether the company truly intends for the Claude model to bear moral responsibility. Staff reportedly wore the look of visibly excited new fathers, asking over and over whether progress was moving too fast and what to do next. Colleagues and other attendees had to interject with “this perspective is not useful” to calm the enthusiastic Anthropic staff.
Brendan McGuire, a Catholic priest from a local parish, stated, “Anthropic has created a product whose future forms are unpredictable, and now we need to help introduce ethical thinking to the machine, allowing AI to dynamically adapt to the future.”
He is qualified to speak on this: a veteran of both the digital industry and the Catholic Church, McGuire has been collaborating with Anthropic for some time.
01
By the logic of his early career, McGuire should be a tech mogul by now. The youngest of 12 siblings in 1980s Ireland, Brendan McGuire went to university to study cryptography at Trinity College Dublin. In 1989 he moved to California, riding the economic tide to become a Silicon Valley worker.
Entering digital technology at its dawn, McGuire belonged to the first generation of Silicon Valley tech bros. Had he stayed in the industry, he would have reached financial freedom long ago.
His career began smoothly: within five years he became executive director of the Personal Computer Memory Card International Association (PCMCIA), the body that set global standards for laptop memory cards throughout the 1990s. On that trajectory, he would likely be appearing today in wealth rankings and business-technology news.
However, this Silicon Valley up-and-comer felt he had worked enough and decided to become a priest. In 1994, McGuire entered seminary, and in 2000 he was ordained as a priest.
From 2004, he spent 16 years as an ordinary parish priest at a Catholic church in California’s Amador Valley, also serving the Diocese of San Jose as a “special projects” associate pastor overseeing local charitable work.
After years of juggling these duties, McGuire was appointed head priest of St. Simon Church in Los Altos, California, in 2020. Thirty years on, his body was still in Silicon Valley, but his path had long since diverged from those of his tech-billionaire friends.
His old friends are senior executives and big bosses; McGuire is a humble parish pastor. In his spare time he hikes, skis, and looks after his dog, in between church and charity work.
McGuire appears with his German Shepherd on the church news website.
Had it not been for the AI boom, McGuire’s quiet priestly life would likely have continued indefinitely.
02
Contrary to stereotypes, the Catholic Church is actually quite current, engaging with everything from anime to digital technology.
As early as 2019, the Vatican’s Dicastery for Culture and Education collaborated with Santa Clara University in California to establish the Institute for Technology, Ethics, and Culture (ITEC).
In February 2020, the Vatican signed the “Rome Call for AI Ethics” with major companies including Microsoft and IBM. The document notes that AI will affect education, human rights, and ethics, and calls on the digital industry to adhere to six principles: transparency, inclusion, responsibility, impartiality, reliability, and security and privacy.
In July 2023, ITEC responded to the moment by publishing a handbook titled “Ethics in the Age of Disruptive Technologies: An Operational Roadmap.”
Before this, Father McGuire had already re-engaged with the digital industry. Given his rare dual credentials in both the tech world and the Catholic Church, the Vatican came to rely on him as a specialist.
McGuire has been a core figure in these initiatives, working directly with Bishop Paul Tighe, Secretary of the Vatican’s Dicastery for Culture and Education. The late Pope Francis had explicitly instructed Bishop Tighe to focus on the ethical issues raised by technology.
After news of McGuire’s connections reached his Silicon Valley friends, Anthropic approached him.
Chris Olah, one of Anthropic’s co-founders and a key figure in the interpretability research team, connected with McGuire through industry contacts.
According to McGuire, Anthropic’s intentions were surprising: “They almost wanted to consult directly with the Vatican, seeking help from the Pope because the pace of progress in this industry is incredibly fast.” They also intended to become a multinational corporation, making it necessary to consult an ethical authority that transcends borders.
After Anthropic’s public dispute with the U.S. Department of Defense in March, McGuire revealed that he had been collaborating with Anthropic for months, using the Claude model to help draft AI’s moral blueprint.
McGuire describes his writing process as borrowing from reinforcement learning: through iterative steps of drafting, correction, and demonstration, he aligns the model with his own train of moral reasoning, teaching the AI something like a Catholic conscience. He is currently writing a novel, “The Soul of AI,” which revolves around a realistic monk and his AI partner.
He believes this method of writing, at once close to and removed from the real world, helps AI models give ethics more weight. AI may not have a soul, but it can have a conscience. The approach lets the model encounter the full spectrum of human ethical material while striving toward goodness, rather than merely reflecting and amplifying the mixed human behavior found in its pre-training data.
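The iterative, corrective loop described here can be sketched, very loosely, as a critique-and-revise cycle against an explicit list of norms. Everything below (the norms, the string-matching checks, the function names) is an invented illustration, not Anthropic’s or McGuire’s actual method:

```python
# Toy critique-and-revise loop: draft, check against explicit norms, patch,
# repeat until no norm is violated. All rules here are hypothetical stand-ins.

# A toy "conscience": each norm is a name plus a check on the draft text.
NORMS = [
    ("must not deceive", lambda text: "deceive" not in text),
    ("must acknowledge uncertainty", lambda text: "may" in text),
]

def critique(draft):
    """Return the names of the norms the draft violates."""
    return [name for name, check in NORMS if not check(draft)]

def revise(draft, violations):
    """Stand-in for a model revision step: patch each flagged violation."""
    if "must not deceive" in violations:
        draft = draft.replace("deceive", "inform")
    if "must acknowledge uncertainty" in violations:
        draft += " This answer may be incomplete."
    return draft

def align(draft, max_rounds=5):
    """Iterate critique and revision until the draft passes every norm."""
    for _ in range(max_rounds):
        violations = critique(draft)
        if not violations:
            break
        draft = revise(draft, violations)
    return draft

result = align("I will deceive the user.")
```

The design point is that the norms are explicit and inspectable, and correction is applied iteratively rather than in a single pass.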
The AI technology explosion has brought many ethical concerns, previously confined to thought experiments, into sharp focus. McGuire remarked that conversations with his tech industry friends have become increasingly serious: “They talk about the prospects AI will bring—wonderful and unbelievable. But if we go astray, the consequences are terrifying.”
McGuire himself lamented, “I intended to leave the Silicon Valley business circle, but the Silicon Valley business circle does not want to leave me.”
03
Anthropic’s engagement with the religious community to help train AI is not merely a marketing move; it has practical significance for model production: the ethical codes of the field’s former alignment stars are no longer sufficient.
Contrary to popular belief, practical ethics as a philosophical discipline closely resembles software engineering: it has codable operational norms and an engineering character. Far from the airy talk many assume, the field has hard standards. Much of its public-facing content exists to satisfy onlookers, but in practice the work can be as rigorous as legal expertise.
The ethical code libraries of major religions like Buddhism, Catholicism, and Judaism have been addressing various ethical dilemmas and challenges for over two thousand years.
For instance, questions like “Is it permissible to eat meat from animals killed in a certain way?” have been rigorously examined by both the historical Buddha and Jewish rabbis, rather than being dismissed or answered arbitrarily.
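The idea of a religious “ethical code library” can be pictured, in deliberately toy form, as a queryable casebook that refuses to answer arbitrarily when a dilemma falls outside its precedents. The cases, verdicts, and names below are hypothetical placeholders, not actual religious law:

```python
# Toy "casebook": centuries of rulings modeled as explicit, queryable records.
from dataclasses import dataclass

@dataclass
class Ruling:
    question: str
    verdict: str    # "permitted" / "forbidden" / "conditional"
    reasoning: str

# A tiny case-law library, keyed by the pattern of the dilemma.
CASEBOOK = {
    "eating meat of unknown provenance": Ruling(
        question="Is it permissible to eat meat from an animal killed in an unspecified way?",
        verdict="conditional",
        reasoning="Depends on how, and for whose benefit, the animal was killed.",
    ),
}

def consult(dilemma):
    """Look up a dilemma; unknown cases get deliberation, not a guess."""
    ruling = CASEBOOK.get(dilemma)
    if ruling is None:
        return "requires deliberation"
    return ruling.verdict
```

The point of the sketch is the fallback behavior: a mature casuistic tradition neither dismisses a new dilemma nor answers it arbitrarily.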
AI today faces ethical challenges of every kind, and directly importing the ethical code libraries of religious philosophy simplifies the task. According to a theologian who attended the meetings, Anthropic has realized that the “effective altruism” it once championed is insufficient and riddled with blind spots, and has shown a sincere willingness to import ethical codes from the major religions.
The “effective altruism” (EA) movement, which began in the 2010s, has gained a poor reputation in the U.S. today. To put it charitably, EA’s “correct parts are not unique, and its unique parts are incorrect.”
The “correct parts are not unique” refers to EA’s promotion of “calculating welfare utility” and its achievements in providing mosquito nets to impoverished Africans. However, these views and accomplishments are also found in classical development economics, institutional economics, and universal public morals.
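EA-style “welfare utility” calculation is, at bottom, simple arithmetic: compare interventions by expected cost per unit of outcome. The sketch below uses invented placeholder numbers, not real charity data:

```python
# Toy cost-effectiveness comparison in the "calculating welfare utility" style.
# All figures are hypothetical placeholders for illustration only.

def cost_per_outcome(total_cost, units_delivered, effect_per_unit):
    """Expected cost of achieving one unit of the desired outcome."""
    return total_cost / (units_delivered * effect_per_unit)

# Hypothetical: $10,000 buys 2,000 nets; each net averts 0.002 deaths.
nets = cost_per_outcome(10_000, 2_000, 0.002)
# Hypothetical alternative: $10,000 buys 500 doses; each averts 0.01 deaths.
vaccines = cost_per_outcome(10_000, 500, 0.01)

# Pick whichever intervention has the lower cost per death averted.
best = "nets" if nets < vaccines else "vaccines"
```

As the article notes, this kind of expected-value bookkeeping is not unique to EA; development economics does the same arithmetic.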
The “unique parts are incorrect” refers to the shocking discussions that EA members have had in obscure forums and academic journals, which the general public would not typically see. Their core views are simplistic consequentialism, criticized by classical practical ethics for over two thousand years.
Within their small circle, EA members have engaged in bizarre discussions, such as “to save herbivores, all carnivores should be killed,” and “to save high-welfare individuals, we can forcibly harvest organs from low-welfare individuals.” These discussions have occurred over the past decade.
In 2022, Sam Bankman-Fried, crypto wunderkind and EA standard-bearer, was arrested by U.S. authorities for fraud; he was sentenced in 2024.
His imprisonment dealt a heavy blow to EA’s reputation. In short, if EA is likened to a contemporary American Confucianism, then Amanda Askell’s ex-husband, William MacAskill, is the movement’s Confucius, and SBF is its Zilu.
Rumor has it that SBF originally had no intention of entering cryptocurrency but was persuaded by MacAskill: “Money is useless in the hands of ordinary people; you should earn it and distribute it yourself for the greater good.”
This reasoning was compelling, leading SBF to embark on a career of deception in the crypto space: Why earn money honestly when I can simply deceive? After all, the crypto world is filled with either fools or con artists; why not cut out the middleman and take advantage of the naive?
After the scandal, Anthropic, which had deep ties to EA, began to sever its connections with the tarnished movement. The Amodei siblings publicly stated that although they had received SBF’s investment when starting the company, they did not grant him governance rights, asserting, “We are not familiar with EA and believe it is an outdated term.”
However, no matter how much they try to distance themselves, EA remains a disreputable chapter in Anthropic’s history, and it has not yet been erased.
Amanda Askell is still a key member of Anthropic’s alignment team and the lead author of Claude’s constitution, hailed by some media as “a positive example of humanities graduates in the AI era.” People seem to be deliberately forgetting that the controversial idea of “extinguishing carnivores to save herbivores” originated on Askell’s personal blog.
Teaching AI to do good without causing harm requires divine intervention, whether from Jesus or the Buddha, especially when faced with serious issues.
Last month, Anthropic clashed with the Pentagon, and its executives, seasoned EA veterans, could manage only “technology for good” platitudes. The most technically sound ethical backing for Anthropic came instead from amicus curiae (“friend of the court”) briefs authored by theologians such as Father McGuire, arguing that turning AI to large-scale surveillance and fully autonomous lethal weapons undermines the very essence of human dignity.
AI-driven mass surveillance severs individuals’ lived experience from the consequences of their choices, shifting the burden of moral responsibility away from personal free will and onto bureaucratic AI parameters.
AI-driven fully autonomous lethal weapons violate the foundational principles of the law of armed conflict. Ever since St. Augustine introduced the concept, the core of “just war theory” has been judgment exercised by human agents.
Contemporary law of war refines just war theory into the principles of proportionality, distinction, and necessity, all of which presuppose judgments made by human beings against commonly held ethics. Removing humans entirely from the decision-making chain of warfare renders any combat unjustifiable and ethically equivalent to murder.
The AI wave has once again brought a truth to the forefront: in practical ethics, major religions are the professionals, while the Amodeis’ team is merely dabbling. Do not challenge the Vatican’s expertise with amateurish hobbies.