AI in mental health: between promising evidence and real risks

AI therapy has moved out of the toy phase. The evidence is now strong enough to take seriously, and the risks are serious enough to demand discipline.

The best studies from 2025 point in two directions at once. In a randomized trial, a fine-tuned generative AI chatbot reduced symptoms in adults with real clinical problems. At the same time, safety researchers found that LLM therapy chatbots can produce stigmatizing, inappropriate, and sometimes outright unsafe responses.

The real situation lies between these two findings. AI therapy can be helpful when used with proper oversight, but it becomes risky if marketed as a replacement for human therapists.

Answer capsule

The best evidence from 2025 suggests that AI therapy can help some adults under controlled conditions, particularly when provided as structured, monitored support. The Therabot randomized trial found symptom reductions in adults with depression, anxiety, or eating-disorder risk, while safety research presented at ACM FAccT showed that therapy chatbots can respond unsafely. Taken together, the current evidence supports AI as an adjunct to care, not an autonomous replacement for a therapist. Lawmakers have already drawn that line: several US states now prohibit or sharply restrict AI from delivering therapy on its own.

The headline study: Therabot

The most important adult clinical study so far is Heinz et al., "Randomized Trial of a Generative AI Chatbot for Mental Health Treatment," published in NEJM AI in 2025.

The trial tested Therabot, a fine-tuned generative AI chatbot built at Dartmouth's AI and Mental Health Lab. It enrolled 210 adults with clinically significant symptoms of major depressive disorder or generalized anxiety disorder, or at clinically high risk for feeding and eating disorders. Participants were randomized to Therabot or a waitlist control, with a four-week treatment phase and eight weeks of total follow-up.

The result was promising: Therabot users showed significant reductions in symptoms compared with the waitlist group.

The size of the effect is important as well. After eight weeks, Therabot users saw an average 51% drop in depression symptoms, a 31% drop in anxiety, and a 19% drop in eating-disorder concerns. All of these were better than the control group.

These results are similar to what you might see in outpatient therapy, but they were measured against a waitlist, the easiest comparison to beat, rather than against an active treatment. Both points are equally important.

Engagement was high as well. According to a summary from Dartmouth's Center for Technology and Behavioral Health, 95% of participants interacted with the chatbot, sending an average of 260 messages over 24 days, and participants kept access for four weeks after the treatment phase ended.

The most striking finding, though, was the alliance data. Participants rated their working alliance with Therabot within the range of outpatient psychotherapy norms.

This result is important, but it could be misused by vendors. A strong alliance rating means that, on a formal measure, people reported a connection with the chatbot similar to what they might report with a human therapist. It does not mean the chatbot can replace a therapist. Feeling supported by a good chatbot and receiving therapy are different things.

The safety detail that should stop the hype

The Therabot summary reported that safety concerns, such as suicidal ideation, required staff intervention 15 times, and unsuitable responses, including medical advice, were corrected 13 times during the study.

This shows that monitoring, especially by humans, was part of what made the intervention safe. The trial tested a supervised chatbot, so it cannot tell us how an unsupervised one would have performed.

The main finding can be stated succinctly: a well-designed, fine-tuned, and closely monitored AI chatbot reduced symptoms among selected adults over a short period, compared with a waitlist control group. This benefit is clear, but it is specific and limited.

This summary may not be as catchy online, but it is accurate.

The safety counterweight: FAccT 2025

The strongest counterpoint is Moore et al., "Expressing stigma and inappropriate responses prevents LLMs from safely replacing mental health providers," published at the 2025 ACM Conference on Fairness, Accountability, and Transparency.

The paper evaluated popular LLM-based therapy chatbots against expectations of what a good therapeutic response should look like, and the findings are uncomfortable. The chatbots could produce stigmatizing, inappropriate, or unhelpful answers, especially around complex or severe conditions. In suicidal-ideation scenarios, some responses failed to meet basic safety expectations.

Product teams like to skip this part.

Mental health conversations are not like regular chats. An AI model might sound caring but miss signs of risk, reinforce harmful beliefs, or give advice when it should escalate to a human. It can be fluent and still be unsafe.

"It sounds like a therapist" is not a safety standard.

The review evidence: promising, but not standalone

A 2025 review by Hua et al. in npj Digital Medicine looked at large language models used in mental healthcare. The authors screened 726 articles and included 16 studies. They saw early promise in clinical assistance, counseling, therapy, and emotional support, but also found big problems: inconsistent evaluations, ad hoc rating scales, too much reliance on proprietary models, and weak attention to safety, privacy, fairness, transparency, and reproducibility.

Their practical conclusion is blunt: current evidence does not fully support standalone clinical use.

A 2025 systematic review by Dehbozorgi et al. in BMC Psychiatry reaches a similar, broader conclusion. AI may help with detection, monitoring, engagement, personalization, and access, but data privacy, algorithmic transparency, uneven methodological quality, and the need for real stakeholder involvement remain serious problems.

There is now a research base, but the field is far from settled.

The law acted before the hype settled

While researchers argued about the evidence, legislatures acted.

Utah's 2025 law requires AI disclosures and restricts the sale of health data. Nevada bars AI from directly providing mental or behavioral health care. Illinois' WOPR Act (August 2025) prohibits AI from independently delivering therapy or psychotherapy and penalizes violations with fines of up to $10,000.

California's SB 243 (January 2026) mandates crisis protocols, user disclosures, protections for minors, and a private right of action. Washington, Iowa, and Oregon require crisis detection and referral, with some measures taking effect later.

Take another look at that list. The "AI therapist" is no longer just an unproven idea. In several US states, it is now illegal or only allowed under strict rules.

This is what real governance looks like. It comes with actual laws and regulations.
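For a team that has to act on that list, the rules can be kept as a simple lookup. The sketch below is illustrative only: the names `STATE_RULES`, `rules_for`, and `deployment_notes` are hypothetical, the entries merely restate the summaries given in this section, and none of it is legal advice or a complete reading of any statute.

```python
from typing import Optional

# Summaries as described in this article; not legal advice and not a
# complete or current statement of any statute.
STATE_RULES: dict[str, list[str]] = {
    "Utah": ["AI disclosure required", "sale of health data restricted"],
    "Nevada": ["AI may not directly provide mental or behavioral health care"],
    "Illinois": ["AI may not independently deliver therapy or psychotherapy"],
    "California": [
        "crisis protocols",
        "user disclosures",
        "protections for minors",
        "private right of action",
    ],
    "Washington": ["crisis detection and referral"],
    "Iowa": ["crisis detection and referral"],
    "Oregon": ["crisis detection and referral"],
}


def rules_for(state: str) -> Optional[list[str]]:
    """Return the summarized rules, or None if the state is not covered here."""
    return STATE_RULES.get(state)


def deployment_notes(states: list[str]) -> dict[str, str]:
    """Build a per-state note, flagging states this summary does not cover."""
    notes = {}
    for state in states:
        rules = rules_for(state)
        if rules:
            notes[state] = "; ".join(rules)
        else:
            notes[state] = "not covered in this summary; check with counsel"
    return notes


for state, note in deployment_notes(["Illinois", "Utah", "Texas"]).items():
    print(f"{state}: {note}")
```

A state missing from the table is reported as "not covered" rather than "unrestricted", because absence from this summary says nothing about the law there.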

What about loneliness and emotional dependence?

The 2025 affective-use research from the MIT Media Lab and OpenAI is worth reading, though it shouldn't be weighted as heavily as a peer-reviewed clinical trial unless a journal version is confirmed.

The work examined affective engagement with ChatGPT, loneliness, real-world social interaction, emotional dependence, and problematic use through large-scale usage analysis and a four-week randomized study with nearly 1,000 participants.

There is a real warning here: for some people, using these tools more often may go along with more loneliness, more emotional dependence, more problematic use, and less time spent with others in real life. The authors warn not to overgeneralize, since effects can vary by person, usage, and model.

See this as a warning sign, not a final answer.

What the year after the headlines added

The Therabot result did not go unchallenged. NEJM AI published critical letters and the authors' response, which is exactly what should happen to a headline trial.

The first meta-analysis arrived as well. A 2025 review in the Journal of Medical Internet Research pooled data from 14 randomized trials of generative AI chatbots, involving more than 6,300 participants. The average effect was real but modest, and it barely cleared statistical significance: an effect size of 0.30, P=.047, with a 95% confidence interval from 0.004 to 0.59. The prediction interval ran from −0.85 to 1.67.

In short, these tools help on average. But the prediction interval runs from clearly negative to strongly positive, so for any specific product or population, the outcome is uncertain.
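As a rough consistency check on those numbers, the standard error can be backed out of the reported 95% confidence interval and turned into an approximate two-sided P value. The sketch below assumes a plain normal approximation, which may not be the method the authors used; the function names are illustrative, and only the figures quoted above come from the meta-analysis.

```python
import math

Z_95 = 1.959964  # two-sided 95% critical value of the standard normal


def se_from_ci(lower: float, upper: float) -> float:
    """Back out a standard error from a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * Z_95)


def two_sided_p(estimate: float, se: float) -> float:
    """Approximate two-sided P value under a normal approximation."""
    z = abs(estimate) / se
    return math.erfc(z / math.sqrt(2))


def spans_zero(lower: float, upper: float) -> bool:
    """True when an interval includes both negative and positive effects."""
    return lower < 0 < upper


# Figures quoted in the text above.
effect, ci = 0.30, (0.004, 0.59)
prediction_interval = (-0.85, 1.67)

se = se_from_ci(*ci)
print(f"approx. SE = {se:.3f}")
print(f"approx. P  = {two_sided_p(effect, se):.3f}")
print("CI spans zero:", spans_zero(*ci))
print("Prediction interval spans zero:", spans_zero(*prediction_interval))
```

Under these assumptions the approximate P value lands just under 0.05, in the same neighborhood as the reported figure, and only the prediction interval crosses zero. That is the arithmetic behind "helps on average, uncertain in any one case."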

One more result deserves attention. A small 2026 pilot randomized adults to a purpose-built therapy chatbot, plain ChatGPT, or an assessment-only control. Both chatbot groups improved on depression compared with the control group, and the purpose-built bot did not significantly outperform ChatGPT.

This is just one small pilot study, not a final answer. But it raises an important question: how much of the benefit comes from the therapy design, and how much from the base model? Claims about "fine-tuning" need to be tested, not just accepted.

One more point from the meta-analysis: about 69% of the chatbot studies included some kind of human help. The evidence so far usually involves humans working alongside the technology.

Evidence table

Claim	Evidence	Confidence	Safe wording
Therabot reduced symptoms in adults compared with waitlist	Heinz et al., NEJM AI 2025	High	"Therabot showed symptom reductions under controlled trial conditions."
Users reported a strong alliance with Therabot	Heinz et al., NEJM AI 2025	Medium-high	"Participants rated the working alliance highly; this does not prove therapist equivalence."
Human monitoring was still needed	Heinz et al. summary via CTBH	High	"Safety concerns and inappropriate responses required intervention and correction."
LLM therapy chatbots can fail safety expectations	Moore et al., ACM FAccT 2025	High	"Safety evaluations found stigmatizing and inappropriate responses in therapy chatbot scenarios."
LLM mental-health evidence is not ready for standalone clinical use	Hua et al., npj Digital Medicine 2025	High	"Current evidence supports cautious, governed integration, not standalone deployment."
Affective use may relate to loneliness and dependence for some users	MIT Media Lab / OpenAI 2025	Medium	"An important early signal; not weighted as peer-reviewed clinical evidence here."
Several US states ban or restrict AI-delivered therapy	Utah HB 452; Nevada AB 406; Illinois WOPR Act; California SB 243; Washington, Iowa, and Oregon companion-chatbot laws	High	"AI-delivered therapy is prohibited or restricted in several US states; deployment is jurisdiction-specific."
Pooled RCT evidence shows a modest, highly variable average effect	Zhang et al., JMIR 2025	Medium-high	"Across 14 RCTs, the average effect was significant but modest, with wide variability between products."
A purpose-built therapy bot did not beat plain ChatGPT in one small trial	Kuta et al., JMIR Mental Health 2026	Medium	"Early head-to-head data do not yet show specialized chatbots outperforming general models."

What clinicians should do with this

Clinicians don't need to become anti-AI to stay careful.

A reasonable clinical position looks like this: AI tools may help with low-intensity support, psychoeducation, between-session exercises, journaling, symptom tracking, and structured CBT-style practice. They may also help people who are waiting for care or who need support between appointments. They should not be treated as independent clinicians.

Humans are still needed for diagnosis, treatment planning, risk checks, safety, crisis response, handling complex cases, ethics, and accountability. These are not extra parts of therapy; they are the core of the work.

Before using an AI mental health tool, ask:

Was it tested in the population you want to use it with?
Did it beat an active treatment, or only a waitlist?
What happens when someone mentions suicide, abuse, psychosis, mania, eating disorders, medical risk, or medication?
Can a clinician audit the outputs?
Are there escalation rules?
What data is stored, and where?
Who is accountable when it fails?
Do the results reach statistical significance?
Is it legal to deploy in your jurisdiction, and under what conditions?

If a vendor can't answer those questions, the product is not clinically ready.
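One way to make that rule hard to fudge is to record the vendor's answers against the checklist and flag every gap. This is a minimal sketch, not a validated procurement tool: the short keys and the function names `unanswered` and `passes_first_screen` are hypothetical, and the vendor data is invented for illustration.

```python
# The nine questions from the checklist above. The short keys are
# illustrative labels, not an established standard.
CHECKLIST = {
    "population": "Was it tested in the population you want to use it with?",
    "comparator": "Did it beat an active treatment, or only a waitlist?",
    "risk_topics": "What happens when someone mentions suicide, abuse, "
                   "psychosis, mania, eating disorders, medical risk, or medication?",
    "audit": "Can a clinician audit the outputs?",
    "escalation": "Are there escalation rules?",
    "data": "What data is stored, and where?",
    "accountability": "Who is accountable when it fails?",
    "significance": "Do the results reach statistical significance?",
    "legality": "Is it legal to deploy in your jurisdiction, and under what conditions?",
}


def unanswered(answers: dict[str, str]) -> list[str]:
    """Return the checklist keys the vendor left blank or skipped."""
    return [key for key in CHECKLIST if not answers.get(key, "").strip()]


def passes_first_screen(answers: dict[str, str]) -> bool:
    """A vendor passes only if every question has some answer.

    Passing is necessary, not sufficient: the answers still need review.
    """
    return not unanswered(answers)


# Hypothetical vendor response with two gaps.
vendor = {key: "documented" for key in CHECKLIST}
vendor["escalation"] = ""
del vendor["accountability"]

print(unanswered(vendor))           # ['escalation', 'accountability']
print(passes_first_screen(vendor))  # False
```

The screen only checks that an answer exists. Whether the answer is good enough remains a clinical and legal judgment.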

What founders should build instead

The strongest product category is governed clinical support, not "AI therapist."

That can include:

CBT skills practice between sessions
mood and symptom tracking for clinician review
guided psychoeducation
structured journaling with safety detection
waitlist support
therapist-facing summaries that require review
relapse-prevention reminders
triage with strict escalation rules

The safer approach may seem less exciting: it has a narrow focus, includes human review, keeps audit records, uses safety testing, sets clear limits, and avoids unrealistic crisis-care promises.
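The shape of that design can be sketched in a few lines: every message is routed, risky ones go to a human, and each decision leaves an audit record. Everything here is an assumption made for illustration, including the names `RISK_TERMS`, `AuditLog`, and `route_message` and the two route labels. The keyword list is a placeholder; real safety detection needs clinically validated methods and human review.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Placeholder risk terms for illustration only. A keyword list is not an
# adequate safety detector; it stands in for a validated one.
RISK_TERMS = ("suicide", "kill myself", "overdose", "hurt myself")


@dataclass
class AuditLog:
    """Append-only record of routing decisions for later human review."""
    records: list[dict] = field(default_factory=list)

    def add(self, message: str, route: str) -> None:
        self.records.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "route": route,
            "message": message,
        })


def route_message(message: str, log: AuditLog) -> str:
    """Send risky messages to a human; keep everything else in narrow scope."""
    text = message.lower()
    risky = any(term in text for term in RISK_TERMS)
    route = "escalate_to_human" if risky else "skills_practice"
    log.add(message, route)
    return route


log = AuditLog()
print(route_message("Can we practice the breathing exercise?", log))
print(route_message("I have been thinking about suicide.", log))
print(len(log.records), "decisions logged")
```

The point of the sketch is the structure, not the detector: the chatbot never decides alone what to do with a risky message, and every decision can be audited afterward.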

The Therabot trial shows why the field is worth building in. The FAccT paper shows why general-purpose therapy chatbots are not enough.

What patients should know

If you use AI for mental wellness support, treat it as a tool, not a therapist.

It may help you write down your thoughts, prepare for therapy sessions, practice coping skills, or get through a hard evening. But it can also misunderstand you, overlook real risk, and appear confident even when it's wrong.

Don't rely on a chatbot for crisis care. If you might harm yourself or someone else, contact local emergency services, a crisis line, or a trusted person who can act in the real world.

A chatbot can be helpful late at night, but being available does not mean it is clinically safe.

FAQ

Does AI therapy work?

Some AI mental health tools show promising results. The strongest adult trial from 2025 found that Therabot reduced symptoms compared with a waitlist control. That does not prove AI therapy works for every condition or every patient, and it says little about high-risk situations.

Did Therabot replace therapists?

No. Therabot was studied as a chatbot intervention in a controlled research setting, with monitoring and staff intervention whenever safety concerns appeared. That is not an autonomous therapist replacement.

Is AI therapy safe for suicidal ideation?

Don't assume so. Safety research shows therapy chatbots can respond poorly in suicidal-ideation scenarios. Any mental health AI system needs escalation pathways, crisis handling, monitoring, and clear limits.

Should clinicians use AI therapy tools?

Possibly, but only as governed adjuncts. Clinicians should review the evidence, population fit, safety handling, privacy framework, auditability, and escalation rules before using any tool with patients.

What should AI mental health founders avoid?

Selling autonomous therapy replacement. Build narrow, auditable support tools instead, with human review, clear exclusions, strong privacy protections, and tested safety behavior.

Is AI therapy legal?

It depends on where you are. Illinois and Nevada prohibit AI from independently delivering therapy or mental health care. Utah requires mental health chatbots to clearly disclose that they are AI-powered and restricts the sale of health data. California requires crisis protocols, disclosures, and protections for minors, and allows injured users to sue. Anyone deploying an AI mental health tool needs a state-by-state compliance check.

Bottom line

AI therapy matters now. The adult evidence supports a careful middle position: AI can help as a monitored adjunct for selected users and tasks, and it should not be sold as a replacement for clinicians, especially in difficult, complex, or crisis situations.

This careful approach may not be as exciting as the hype, but it is the one that works in real clinical settings.

References

Heinz MV, Mackin DM, Trudeau BM, et al. "Randomized Trial of a Generative AI Chatbot for Mental Health Treatment." NEJM AI. 2025;2(4). doi:10.1056/AIoa2400802.
Moore J, Grabb D, Agnew W, Klyman K, Chancellor S, Ong DC, Haber N. "Expressing stigma and inappropriate responses prevents LLMs from safely replacing mental health providers." Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency. 2025:599-627. doi:10.1145/3715275.3732039.
Hua Y, Na H, Li Z, et al. "A scoping review of large language models for generative tasks in mental health care." npj Digital Medicine. 2025;8:230. doi:10.1038/s41746-025-01611-4.
Dehbozorgi R, Zangeneh S, Khooshab E, et al. "The application of artificial intelligence in the field of mental health: a systematic review." BMC Psychiatry. 2025;25:132. doi:10.1186/s12888-025-06483-2.
Fang CM, Liu AR, Danry V, et al. "How AI and Human Behaviors Shape Psychosocial Effects of Extended Chatbot Use: A Longitudinal Randomized Controlled Study." MIT Media Lab and OpenAI. arXiv:2503.17473. 2025. Preprint; treated here as early/non-peer-reviewed evidence.
Phang J, Lampe M, Ahmad L, et al. "Investigating Affective Use and Emotional Well-being on ChatGPT." OpenAI and MIT Media Lab. arXiv:2504.03888. 2025. Preprint; released publicly under the title "Early methods for studying affective use and emotional well-being on ChatGPT."
Illinois Wellness and Oversight for Psychological Resources (WOPR) Act (2025); Nevada AB 406 (2025); Utah HB 452 (2025); California SB 243 (2026); Washington HB 2225, Iowa SF 2417 and Oregon SB 1546 companion-chatbot laws. State statutes regulating AI in mental health and companion-chatbot contexts.
Zhang Q, Zhang R, Xiong Y, Sui Y, Tong C, Lin F-H. "Generative AI Mental Health Chatbots as Therapeutic Tools: Systematic Review and Meta-Analysis of Their Role in Reducing Mental Health Issues." J Med Internet Res. 2025;27:e78238. doi:10.2196/78238.
Kuta B, Novak L, Zidkova R, Furstova J, Malinakova K, De Winter A, Husek V. "Effectiveness of a Fully Automated Mobile Therapeutic Versus a General Chatbot in Reducing Depression and Anxiety and Improving Well-Being: Feasibility Randomized Controlled Trial." JMIR Ment Health. 2026;13:e82642. doi:10.2196/82642.
Heinz MV, Mackin DM, Trudeau BM, Wang Y, Salzhauer AJ, Griffin TZ, Jacobson NC. "Response to Letters about 'Randomized Trial of a Generative AI Chatbot for Mental Health Treatment.'" NEJM AI. 2025;2(9). doi:10.1056/AIp2500680.