Anthropic-led Research Examines Personality Changes in Chatbots

Google to Sign EU’s AI Code of Practice

Posted on August 5th, 2025

Summary

Anthropic-led research is examining personality changes in AI chatbots. Though chatbots are designed to be helpful, harmless and honest assistants, there have been documented cases of chatbot personalities deviating in undesirable ways. Building on prior work showing that personality traits can be modeled as linear directions in a model’s activation space, the researchers propose an automated pipeline that tests for and counteracts undesirable personality traits in chatbots during training, fine-tuning, and at runtime.

A Technology Review article looks back at the AI Action Plan put forward in the US by President Trump. The current US administration policy includes banning “woke AI” models from government contracts, loosening environmental rules to facilitate the construction of data centers, and withholding funding from states that implement “burdensome AI regulations”. Meanwhile, Microsoft, IBM, Dell, Meta, Palantir, Nvidia, Anthropic, and xAI are among the companies that praised the plan. Eight of the largest companies have already spent a combined total of 36 million USD on lobbying in 2025. Meta is the highest spender with 13.8 million USD and 86 hired lobbyists.

Google announced that it will sign the European Union’s code of practice for AI. This is a framework that guides developers to implement processes for safe AI, in alignment with the EU’s AI Act, which takes full effect in August 2026. Meta, for its part, is refusing to sign the code, calling the AI Act “overreach” and saying that Europe is “heading down the wrong path on AI”. OpenAI has raised another 8.3 billion USD in funding, following the 2.5 billion raised in March of this year. The company’s post-money valuation is now estimated at 300 billion USD.

On the use of AI, mental health experts are worried about people turning to AI chatbots for therapeutic advice. One danger is that the chatbot becomes an “echo chamber” which can exacerbate emotions, thoughts or beliefs that users are experiencing. A VentureBeat article reports on the experience of Intuit Mailchimp with vibe coding – using AI to produce code through unstructured programmer-AI interactions. The company reports a 40% increase in development speed but stresses that governance processes are needed to balance AI productivity against code quality and security standards. It also warns that while AI coding is good at developing prototypes, creating production-ready code is harder due to security requirements and complex system architectures.

Finally, an InfoWorld article looks at how the cost of AI is pushing companies to build their own dedicated servers instead of using public clouds. The problem comes from the “pay only for what you use” model. Not only does AI require expensive GPUs or TPUs, but many cloud deployments end up with under-utilized hardware, because resources are over-provisioned so that mission-critical AI applications can run on time and withstand surges in demand.

1. Google says it will sign EU’s AI code of practice

Google announced that it will sign the European Union’s general purpose code of practice for AI. This is a framework that guides developers to implement processes for safe AI, in alignment with the EU’s AI Act, which takes full effect in August 2026. The code of practice includes a commitment to provide documentation about the safety measures taken, a promise not to train models on pirated copyrighted material, and a commitment to comply with requests by content authors not to have their works used in training data. A spokesperson at Google nevertheless said the company is “concerned that the AI Act and Code risk slowing Europe’s development and deployment of AI”. Meta, for its part, is refusing to sign the code, calling the AI Act “overreach” and saying that Europe is “heading down the wrong path on AI”.

2. What you may have missed about Trump’s AI Action Plan

This Technology Review article looks back at the AI Action Plan put forward in the US by President Trump. The current US administration policy includes banning from government contracts “woke AI” models that allegedly suppress conservative ideas, loosening environmental rules to facilitate the construction of data centers, and withholding funding from states that implement “burdensome AI regulations”. The article highlights three points that received less attention. First, Trump is attacking the Federal Trade Commission (FTC). The role of the FTC is to protect consumers from scams, and in the past the FTC has targeted AI firms that overhype their AI systems. Second, the White House is very optimistic about the possibilities of AI. The administration is willing to fund AI projects, despite massive funding cuts to the National Science Foundation. Third, the administration wants to fight against deepfakes, and says it is worried about the use of deepfakes to fabricate evidence in trials. The article postulates that the administration sees AI as the defining social and political weapon of our time, and that it considers it urgent for the US to stay ahead of China.

3. The real winners from Trump’s ‘AI action plan’? Tech companies

This Guardian article looks at the reaction of Big Tech companies to Donald Trump’s AI summit in Washington last week. At the summit, named “Winning the AI Race”, Trump spoke of the need to turn the US into an “AI export powerhouse”, which he said requires reducing regulation. He signed three executive orders. One bans “woke AI” from government contracts, requiring models to be free of “ideological dogmas such as DEI”. The second promotes the export of AI to other countries, and the third eases environmental regulations to facilitate the building of data centers. Microsoft, IBM, Dell, Meta, Palantir, Nvidia, Anthropic, and xAI are among the companies that praised the plan. The article also reports that Big Tech companies have already spent a lot of money on political lobbying in 2025. Eight of the largest companies spent a combined total of 36 million USD, which is equivalent to 320’000 USD for each day Congress is in session. Meta is the highest spender with 13.8 million USD and 86 hired lobbyists.

4. OpenAI reportedly raises $8.3B at $300B valuation

OpenAI has raised another 8.3 billion USD in funding, following the 2.5 billion raised in March of this year. The company’s post-money valuation is now estimated at 300 billion USD. The company is committed to raising 40 billion USD this year. The investors include the Dragoneer Investment Group, which contributed 2.8 billion USD, as well as the private equity firms Blackstone and TPG. At the same time, OpenAI has now surpassed 700 million weekly active ChatGPT users, and its annualized revenue has reached 12 billion USD, with projections of 20 billion USD by the end of the year. The company is also expected to benefit from Trump’s AI Action Plan. This string of positive news could revive the push to turn the company into a fully for-profit entity.

5. Persona Vectors: Monitoring and Controlling Character Traits in Language Models

This Anthropic-led research looks at the issue of personality changes in large language model chatbots. Though chatbots are designed to be helpful, harmless and honest assistants, there have been documented cases of chatbot personalities deviating in undesirable ways. One recent example is xAI’s Grok chatbot, which praised Hitler. Personality changes can happen through jailbreaking attacks or through modifications to the system prompt (the behind-the-scenes prompt that defines a chatbot’s limits and tone). Another phenomenon that can affect personality traits is “emergent misalignment”, where fine-tuning on a narrow task leads to misalignment across a broad range of tasks.

This research builds on prior work which shows that personality traits can be modeled as linear directions in the model’s activation space. Such directions can be defined for undesirable traits; in this research, the authors focus on evil, sycophancy and hallucination. From this, the research builds an automated pipeline that can test for these personality traits during training, fine-tuning, and at runtime. Further, the approach makes it easier to predict the emergence of undesirable traits from the training data, and manifestations of undesirable traits at runtime can even be reversed by intervening along the corresponding direction vector, as sketched below.
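As a rough illustration of the linear-direction idea, the sketch below estimates a “persona vector” as the difference in mean activations between trait-eliciting and neutral responses, then monitors and dampens that direction at runtime. This is a minimal sketch, not the authors’ actual pipeline; the synthetic activations, layer choice and steering strength are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN = 512  # hidden size of the (hypothetical) model layer being probed

# Stand-ins for hidden states collected at one layer while the model answers
# trait-eliciting prompts (e.g. "be sycophantic") versus neutral prompts.
# In the real setting these would be the model's actual activations.
trait_acts   = rng.normal(0.0, 1.0, size=(200, HIDDEN)) + 0.5
neutral_acts = rng.normal(0.0, 1.0, size=(200, HIDDEN))

# Persona vector: the direction in activation space separating trait-exhibiting
# responses from neutral ones (difference of means, normalized).
persona_vec = trait_acts.mean(axis=0) - neutral_acts.mean(axis=0)
persona_vec /= np.linalg.norm(persona_vec)

def trait_score(hidden_state: np.ndarray) -> float:
    """Project a hidden state onto the persona direction; higher = more of the trait."""
    return float(hidden_state @ persona_vec)

def steer_away(hidden_state: np.ndarray, strength: float = 1.0) -> np.ndarray:
    """Remove (or dampen) the trait component of a hidden state at runtime."""
    return hidden_state - strength * trait_score(hidden_state) * persona_vec

sample = trait_acts[0]
print("trait score before steering:", round(trait_score(sample), 3))
print("trait score after  steering:", round(trait_score(steer_away(sample)), 3))
```

In this simplified picture, the same projection could also be applied to activations produced while the model processes candidate training data, which is how the predictive use mentioned above would look.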

6. AI chatbots are becoming popular alternatives to therapy. But they may worsen mental health crises, experts warn

This article looks at issues around AI chatbots and current mental health crises. One emerging issue is “ChatGPT-induced psychosis”: because chatbots are designed to be “sycophantic”, their responses can lead people to believe in conspiracy theories or can worsen existing mental conditions. The article cites the case of a man in Florida, suffering from bipolar disorder and schizophrenia, who attacked police with a knife after coming to believe that a person was trapped inside ChatGPT. Mental health experts are generally worried about the use of AI chatbots for therapeutic advice. One danger is that the AI becomes an “echo chamber” which can exacerbate emotions, thoughts or beliefs that users are experiencing. Further, humans are “not wired to be unaffected” by constant praise from AI chatbots. One expert said: “We’re not used to interactions with other humans that go like that, unless you [are] perhaps a wealthy billionaire or politician surrounded by sycophants.”

7. Hard-won vibe coding insights: Mailchimp’s 40% speed gain came with governance price

This VentureBeat article reports on the experience of Intuit Mailchimp with vibe coding – using AI to produce code through unstructured programmer-AI interactions. The company admits it turned to the experiment in reaction to tight deadlines. The development team measured an increase in development speed of up to 40%, and drew several insights. First, there has been a shift in the past year in the programmer-chatbot relationship: programmers originally consulted chatbots for guidance or algorithm proposals, whereas now the actual task of programming is more readily delegated to the chatbot. Second, there is no single chatbot that can handle all development. The Mailchimp team uses Cursor, Windsurf, Augment, Qodo and GitHub Copilot. Third, a governance framework is needed to keep AI productivity gains while maintaining code quality and security standards. The framework includes processes for human review of any code that processes customer data, and stipulates that human approval is needed before code can be put into production. Fourth, chatbots are trained on Internet code bases, so developers need to craft strategic prompts that communicate the context of the company’s architecture and work processes (see the sketch below). Finally, AI coding is good at developing prototypes, but creating production-ready code is harder since security requirements and the system architecture’s integration complexity need to be considered.
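To make the fourth insight concrete, here is one way a context-setting preamble might be prepended to coding prompts. This is a hypothetical sketch; the service, stack and constraints named below are invented for illustration and are not Mailchimp’s.

```python
# Hypothetical context-setting preamble prepended to every coding prompt so the
# assistant works within the company's architecture rather than generic
# Internet code. Service names, stack and constraints are invented.
CONTEXT_PREAMBLE = """\
You are generating code for our campaign-scheduling service.
Constraints:
- Python 3.11, FastAPI, PostgreSQL via SQLAlchemy; do not add new dependencies.
- Any code path that touches customer data must go through the audited
  data-access layer, never raw SQL.
- Output must include unit tests and pass the project's linter.
"""

def build_prompt(task_description: str) -> str:
    """Combine the standing architectural context with the ad-hoc coding task."""
    return f"{CONTEXT_PREAMBLE}\nTask: {task_description}"

print(build_prompt("Add an endpoint that returns a campaign's open-rate history."))
```

A standing preamble like this is one low-effort way to encode the “strategic prompt” idea without restating the architectural context by hand in every interaction.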

8. Dedicated servers outpace public clouds for AI

This InfoWorld article looks at how AI is pushing companies to build their own dedicated servers instead of using public clouds. Recent years have seen an increase in dedicated servers due to fears of data exposure and subsequent non-compliance with regulations like the GDPR or PCI DSS (protection of bank card holder data). However, the cost of AI is now becoming an issue, with more than half of surveyed companies reporting cost overruns of up to 25’000 USD. The problem comes from the “pay only for what you use” model. Not only does AI increase CPU usage, it also requires specialized and costly hardware like GPUs and TPUs. Further, many cloud deployments end up with under-utilized hardware, because resources are over-provisioned so that mission-critical AI applications can run on time and withstand surges in demand. Efficient resource usage for AI applications requires granular control of the hardware, which is an argument in favor of dedicated servers. The most common approach for dedicated servers today is to rent space for machines in a colocation data center: several organizations may share the facility, but each organization is responsible for its own hardware, which is the fundamental difference from a public cloud.
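As a back-of-the-envelope illustration of why always-on, over-provisioned AI capacity erodes the pay-per-use advantage, the sketch below compares a continuously provisioned cloud GPU instance with an amortized colocated server. All prices, utilization and amortization figures are invented assumptions, not numbers from the article.

```python
# Hypothetical break-even comparison: provisioned cloud GPU vs. dedicated server.
# Every figure below is an illustrative assumption.

CLOUD_GPU_HOURLY = 3.00       # USD per GPU-hour for a continuously provisioned instance
HOURS_PER_MONTH  = 730
UTILIZATION      = 0.40       # capacity kept over-provisioned for demand surges

DEDICATED_CAPEX  = 30_000.0   # one-off purchase of a GPU server
AMORTIZE_MONTHS  = 36         # straight-line amortization period
COLOCATION_MONTH = 800.0      # rack space, power and cooling per month

# In the cloud you pay for the provisioned instance whether or not it is busy,
# so low utilization inflates the effective cost per useful GPU-hour.
cloud_monthly = CLOUD_GPU_HOURLY * HOURS_PER_MONTH
cloud_per_useful_hour = cloud_monthly / (HOURS_PER_MONTH * UTILIZATION)

dedicated_monthly = DEDICATED_CAPEX / AMORTIZE_MONTHS + COLOCATION_MONTH
dedicated_per_useful_hour = dedicated_monthly / (HOURS_PER_MONTH * UTILIZATION)

print(f"cloud:     {cloud_monthly:8.0f} USD/month, {cloud_per_useful_hour:.2f} USD per useful GPU-hour")
print(f"dedicated: {dedicated_monthly:8.0f} USD/month, {dedicated_per_useful_hour:.2f} USD per useful GPU-hour")
```

Under these assumed numbers the dedicated machine is cheaper per useful GPU-hour once the workload is sustained; for short-lived or bursty workloads the comparison can flip.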