AI therapy has moved out of the toy phase. The evidence is now strong enough to take seriously and risky enough to handle with discipline.
The best 2025 studies point in two directions. A fine-tuned generative AI chatbot reduced symptoms in adults in a randomized trial. Safety researchers also found that LLM therapy chatbots can give stigmatizing, inappropriate, or unsafe responses.
The honest story is governance: AI therapy is useful under constraints and unsafe when sold as a therapist replacement.
Answer capsule
The best 2025 evidence suggests AI therapy can help some adults under controlled conditions, especially as structured, monitored support. The Therabot randomized trial found symptom reductions in adults with depression, anxiety, or eating-disorder risk. But ACM FAccT safety research found therapy chatbots can respond unsafely. Current evidence supports AI as an adjunct, not an autonomous therapist replacement. Regulators have already drawn the line: several US states now prohibit or sharply restrict AI from delivering therapy on its own.
The headline study: Therabot
The most important adult clinical study is Heinz et al., "Randomized Trial of a Generative AI Chatbot for Mental Health Treatment," published in NEJM AI in 2025.
The study tested Therabot, a fine-tuned generative AI chatbot from Dartmouth's AI and Mental Health Lab. It included 210 adults with clinically significant symptoms of major depressive disorder, generalized anxiety disorder, or clinical high risk for feeding or eating disorders.
Participants were randomized to Therabot or a waitlist control, with a four-week treatment phase and eight weeks of total follow-up.
The result was promising. Therabot users had significant symptom reductions compared with the waitlist group.
The size of the effect matters too. At the eight-week follow-up, Therabot users reported a 51% average reduction in depression symptoms, a 31% reduction in anxiety symptoms, and a 19% reduction in eating-disorder concerns, all outpacing controls.
Those are outpatient-therapy-sized numbers, earned against a waitlist, which is the easiest comparison in the field. Keep both halves of that sentence.
Engagement was high too. Dartmouth's Center for Technology and Behavioral Health summary reports that 95% of participants interacted with the chatbot, sent an average of 260 messages, and used it for an average of 24 days over the course of the study. Participants kept access for four weeks after the treatment phase.
The most striking finding was alliance. Participants rated their working alliance with Therabot within the range of outpatient psychotherapy norms.
That result matters, and vendors can misuse it. A strong alliance rating does not make a chatbot a therapist. It means people can feel supported by a well-designed therapeutic chatbot. Those are different claims.
The safety detail that should stop the hype
The same Therabot summary reports that safety concerns, including suicidal ideation, required staff intervention 15 times. Inappropriate responses, including medical advice, required correction 13 times.
This is not a minor limitation. It tells us what made the trial safer: monitoring, guardrails, and humans in the loop.
Remove those controls and you are no longer talking about the same intervention.
The clean takeaway is narrower. A carefully designed, fine-tuned, monitored AI chatbot reduced symptoms for selected adults over a short period, compared with a waitlist control.
That wording will not travel as far online, but it is accurate.
The safety counterweight: FAccT 2025
The strongest counterpoint is Moore et al., "Expressing stigma and inappropriate responses prevents LLMs from safely replacing mental health providers," published in the 2025 ACM Conference on Fairness, Accountability, and Transparency.
The paper evaluated popular LLM-based therapy chatbots against expectations for good therapeutic response. The reported findings are uncomfortable. Chatbots could produce stigmatizing, inappropriate, or unhelpful answers, especially around complex or severe conditions. In suicidal-ideation scenarios, some responses failed to meet basic safety expectations.
Product teams like to skip this part.
Mental health is not normal chat. A model can sound empathic while missing clinical risk. It can validate the wrong thing. It can give advice when it should escalate. It can be fluent and unsafe at the same time.
"It sounds like a therapist" is not a safety standard.
The review evidence says: promising, but not standalone
Hua et al.'s 2025 npj Digital Medicine scoping review looked at large language models for generative tasks in mental health care.
The review screened 726 unique articles and included 16 studies. The authors found early promise across clinical assistance, counseling, therapy, and emotional support. But they also found major weaknesses: non-standardized evaluation, ad hoc rating scales, reliance on proprietary models, and poor coverage of safety, privacy, fairness, transparency, and reproducibility.
Their practical conclusion is blunt: current evidence does not fully support standalone clinical use.
Dehbozorgi et al.'s 2025 BMC Psychiatry systematic review takes a similar but broader view. AI may help with detection, monitoring, engagement, personalization, and access. But data privacy, algorithm transparency, uneven methodological quality, and the need for stakeholder involvement remain serious issues.
The research base is no longer empty. It is not settled.
The law moved before the hype settled
While researchers argued about evidence, legislatures acted.
Utah went first. Its mental health chatbot law took effect in May 2025, requiring clear AI disclosures and restricting the sale of users' health information. Nevada followed in July, barring AI systems from directly providing mental or behavioral health care. Illinois passed the WOPR Act in August 2025, a law that prohibits AI from independently delivering therapy or psychotherapy, with fines up to $10,000 per violation.
California's SB 243 took effect in January 2026. It requires companion-chatbot operators to maintain mental health crisis protocols, make disclosures, and protect minors, and it gives injured users a private right of action. Washington, Iowa, and Oregon then added crisis-detection and crisis-resource referral duties, with some provisions taking effect later.
Read that list again. "AI therapist" is not just an unproven product category. In several US states, it is now illegal or tightly conditioned.
This is what governance looks like when it leaves the pitch deck. It has statute numbers.
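For a product team, the state list above turns into a deployment gate. The sketch below is a minimal illustration, not a compliance tool: the `StateRule` and `deployment_check` names are hypothetical, and the entries paraphrase only what this article says about each state, not the statutory text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateRule:
    """What this article reports about one state (not statutory text)."""
    bans_independent_ai_therapy: bool
    duties: tuple[str, ...]


# Assumption: duty labels are shorthand invented for this sketch.
STATE_RULES = {
    "UT": StateRule(False, ("ai_disclosure", "no_sale_of_health_data")),
    "NV": StateRule(True, ()),
    "IL": StateRule(True, ()),
    "CA": StateRule(False, ("crisis_protocol", "ai_disclosure", "minor_protections")),
    "WA": StateRule(False, ("crisis_detection", "crisis_referral")),
    "IA": StateRule(False, ("crisis_detection", "crisis_referral")),
    "OR": StateRule(False, ("crisis_detection", "crisis_referral")),
}


def deployment_check(state: str, autonomous_therapy: bool) -> str:
    """Return a coarse status for deploying a tool in one state."""
    rule = STATE_RULES.get(state)
    if rule is None:
        # States the article does not cover are unknown, not permitted.
        return "unknown: needs legal review"
    if autonomous_therapy and rule.bans_independent_ai_therapy:
        return "blocked"
    if rule.duties:
        return "conditional: " + ", ".join(rule.duties)
    return "no article-listed conditions: still needs legal review"


print(deployment_check("IL", autonomous_therapy=True))
print(deployment_check("UT", autonomous_therapy=False))
```

The design choice worth copying is the default: a state missing from the table returns "unknown," never "allowed."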
What about loneliness and emotional dependence?
MIT Media Lab and OpenAI's 2025 affective-use research is worth reading, but it should not be weighted like a peer-reviewed clinical trial unless a journal version is confirmed.
The work examined emotional engagement with ChatGPT, including loneliness, real-world social interaction, emotional dependence, and problematic use. It included large-scale usage analysis and a four-week randomized study with nearly 1,000 participants.
The warning signal is plausible. For some users, heavier daily use may be associated with more loneliness, more dependence, more problematic use, and less socialization. But the authors also warn against overgeneralizing. Effects differ by person, use pattern, and model behavior.
Treat this as a warning light, not a final verdict.
What the year after the headlines added
The Therabot result did not go unchallenged. NEJM AI published critical letters and an authors' response. That is what should happen to a headline trial.
The first meta-analysis arrived too. A 2025 review in the Journal of Medical Internet Research pooled 14 randomized trials of generative AI chatbots, more than 6,300 participants in total. The average effect was real but modest, and it barely cleared statistical significance: effect size 0.30, P=.047, with a 95% confidence interval from 0.004 to 0.59. The prediction interval ran from -0.85 to 1.67.
Translation: on average, it helps. For a given product and a given person, you do not know.
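The reported figures can be sanity-checked with a few lines of arithmetic. The sketch below assumes the 95% confidence interval is a symmetric normal-approximation interval, which the source does not confirm; the meta-analysis may have used a different method. The function name `p_from_ci` is illustrative.

```python
import math


def p_from_ci(lower: float, upper: float, z_crit: float = 1.959964) -> tuple[float, float, float]:
    """Recover point estimate, standard error, and two-sided p-value
    from a symmetric 95% confidence interval (normal approximation)."""
    estimate = (lower + upper) / 2          # midpoint of a symmetric interval
    se = (upper - lower) / (2 * z_crit)     # half-width divided by the critical value
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))    # two-sided tail area of a standard normal
    return estimate, se, p


estimate, se, p = p_from_ci(0.004, 0.59)
print(f"estimate={estimate:.3f} se={se:.3f} p={p:.3f}")

# The prediction interval quoted in the article spans zero, so a new
# product in a new setting could plausibly land on either side of it.
pi_lower, pi_upper = -0.85, 1.67
print("prediction interval spans zero:", pi_lower < 0 < pi_upper)
```

Under that assumption the midpoint comes out at about 0.30 and the p-value lands close to the reported .047, so the published numbers are at least internally consistent. The confidence interval describes the average across trials; the prediction interval describes where the next product might fall, which is why it is so much wider.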
One more result deserves attention. A small 2026 pilot randomized adults between a purpose-built therapy chatbot, plain ChatGPT, and an assessment-only control. Both chatbot groups improved on depression compared with control. The purpose-built bot did not significantly beat ChatGPT.
That is one underpowered pilot, not a verdict. But it puts a number on an uncomfortable question: how much of the benefit comes from the therapy layer, and how much from the model underneath? "Fine-tuned" is a claim to test, not a feature to assume.
One last detail from the meta-analysis: roughly 69% of the chatbot interventions it reviewed included some form of human assistance. The field's own evidence base keeps arriving with humans attached.
Evidence table
| Claim | Evidence | Confidence | Safe wording |
|---|---|---|---|
| Therabot reduced symptoms in adults compared with waitlist | Heinz et al., NEJM AI 2025 | High | "Therabot showed symptom reductions under controlled trial conditions." |
| Users reported strong alliance with Therabot | Heinz et al., NEJM AI 2025 | Medium-high | "Participants rated working alliance highly; this does not prove therapist equivalence." |
| Human monitoring was still needed | Heinz et al. summary via CTBH | High | "Safety concerns and inappropriate responses required intervention/correction." |
| LLM therapy chatbots can fail safety expectations | Moore et al., ACM FAccT 2025 | High | "Safety evaluations found stigmatizing/inappropriate responses in therapy chatbot scenarios." |
| LLM mental-health evidence is not ready for standalone clinical use | Hua et al., npj Digital Medicine 2025 | High | "Current evidence supports cautious, governed integration, not standalone deployment." |
| Affective use may relate to loneliness/dependence for some users | MIT Media Lab/OpenAI 2025 | Medium | "Important early signal; not weighted as peer-reviewed clinical evidence here." |
| Several US states ban or restrict AI-delivered therapy | Utah HB 452; Nevada AB 406; Illinois WOPR Act; California SB 243; Washington, Iowa, and Oregon companion-chatbot laws | High | "AI-delivered therapy is prohibited or restricted in several US states; deployment is jurisdiction-specific." |
| Pooled RCT evidence shows a modest, highly variable average effect | Zhang et al., JMIR 2025 | Medium-high | "Across 14 RCTs, the average effect was significant but modest, with wide variability between products." |
| A purpose-built therapy bot did not beat plain ChatGPT in one small trial | Kuta et al., JMIR Mental Health 2026 | Medium | "Early head-to-head data do not yet show specialized chatbots outperforming general models." |
What clinicians should do with this
Clinicians do not need to become anti-AI to stay careful.
A reasonable clinical position: AI tools may help with low-intensity support, psychoeducation, between-session exercises, journaling, symptom tracking, and structured CBT-style practice. They may also help people waiting for care or needing support between appointments.
Do not treat them as autonomous clinicians.
The human role still matters for diagnosis, formulation, risk assessment, safeguarding, crisis response, therapeutic rupture, comorbidity, ethics, and accountability. These are not decorative parts of therapy. They are the job.
Before using an AI mental health tool, ask:
- Was it tested in the population you want to use it with?
- Did it beat an active treatment or only a waitlist?
- What happens when someone mentions suicide, abuse, psychosis, mania, eating-disorder medical risk, or medication?
- Can a clinician audit the outputs?
- Are there escalation rules?
- What data is stored?
- Who is accountable when it fails?
- Do the results reach statistical significance?
- Is it legal to deploy in your jurisdiction, and under what conditions?
If a vendor cannot answer those questions, the product is not clinically ready.
What founders should build instead
The strongest product category is governed clinical support, not "AI therapist."
That can include:
- CBT skills practice between sessions
- mood and symptom tracking for clinician review
- guided psychoeducation
- structured journaling with safety detection
- waitlist support
- therapist-facing summaries that require review
- relapse-prevention reminders
- triage with strict escalation rules
The safer version looks boring: narrow scope, human oversight, audit logs, safety testing, clear exclusions, and no crisis-care fantasy.
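One way to picture that boring version is a thin routing layer that sits in front of the model. The sketch below is hypothetical throughout: `route_message`, `AuditLog`, and both term lists are invented for illustration, and keyword matching stands in for a clinically validated risk detector, which it is not.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Placeholder lists only. Keyword matching misses paraphrase and context;
# a real system would need a validated risk classifier and clinical review.
ESCALATION_TERMS = ("suicide", "kill myself", "overdose", "abuse")
OUT_OF_SCOPE_TERMS = ("medication", "dosage", "diagnose")


@dataclass
class AuditLog:
    """Append-only record so a clinician can review every decision."""
    entries: list[dict] = field(default_factory=list)

    def record(self, message: str, decision: str) -> None:
        self.entries.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "message": message,
            "decision": decision,
        })


def route_message(message: str, log: AuditLog) -> str:
    """Return 'escalate', 'decline', or 'respond', and log the decision."""
    text = message.lower()
    if any(term in text for term in ESCALATION_TERMS):
        decision = "escalate"   # hand off to a human and surface crisis resources
    elif any(term in text for term in OUT_OF_SCOPE_TERMS):
        decision = "decline"    # clear exclusion: no medical advice
    else:
        decision = "respond"    # in-scope skills practice or journaling
    log.record(message, decision)
    return decision


log = AuditLog()
print(route_message("Help me practice a breathing exercise", log))
print(route_message("Should I change my medication dosage?", log))
```

The ordering is the point: escalation is checked before anything else, exclusions come next, and the model only answers what is left. Every path writes to the log, including the ones where the model never speaks.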
The Therabot trial shows why the field is worth building in. The FAccT paper shows why general-purpose therapy chatbots are not enough.
What patients should know
If you use AI for mental health support, treat it as a tool, not a therapist.
It may help you write down thoughts, prepare for therapy, practice coping skills, or get through a hard evening. But it can also misunderstand you. It can miss risk. It can sound confident while being wrong.
Do not rely on a chatbot for crisis care. If you might harm yourself or someone else, contact local emergency services, a crisis line, or a trusted person who can act in the real world.
A chatbot at 2 a.m. can help. Availability is not clinical safety.
FAQ
Does AI therapy work?
Some AI mental health tools show promising results. The strongest 2025 adult trial found that Therabot reduced symptoms compared with a waitlist control. But that does not prove that AI therapy works for all conditions, all patients, or high-risk situations.
Did Therabot replace therapists?
No. Therabot was studied as a chatbot intervention in a controlled research setting. The trial included monitoring and staff intervention when safety concerns appeared. That is not autonomous therapist replacement.
Is AI therapy safe for suicidal ideation?
Do not assume that. Safety research shows therapy chatbots can respond poorly in suicidal-ideation scenarios. Any mental health AI system needs escalation pathways, crisis handling, monitoring, and clear limits.
What is the strongest AI therapy study so far?
For adult generative AI therapy, Heinz et al.'s 2025 NEJM AI randomized trial of Therabot is the strongest headline study. It is important because it tested a fine-tuned chatbot in adults with clinically significant symptoms.
Should clinicians use AI therapy tools?
Possibly, but only as governed adjuncts. Clinicians should review the evidence, population fit, safety handling, privacy policy, auditability, and escalation rules before using any tool with patients.
What should AI mental health founders avoid?
Avoid selling autonomous therapy replacement. Build narrow, auditable support tools with human review, clear exclusions, strong privacy, and tested safety behavior.
Is AI therapy legal?
It depends where you are. Illinois and Nevada prohibit AI from independently delivering therapy or mental health care. Utah requires mental health chatbots to clearly disclose they are AI and restricts health-data sales. California requires crisis protocols, disclosures, protections for minors, and lets injured users sue. Anyone deploying an AI mental health tool needs a state-by-state compliance check.
Bottom line
AI therapy now matters. The adult evidence supports a careful middle position. AI can help as a monitored adjunct for selected users and tasks. It should not be sold as a replacement for clinicians, especially in severe, complex, or crisis situations.
That version is less exciting than the hype. It can survive contact with clinical reality.
References
- Heinz MV, Mackin DM, Trudeau BM, et al. "Randomized Trial of a Generative AI Chatbot for Mental Health Treatment." NEJM AI. 2025;2(4). doi:10.1056/AIoa2400802.
- Moore J, Grabb D, Agnew W, Klyman K, Chancellor S, Ong DC, Haber N. "Expressing stigma and inappropriate responses prevents LLMs from safely replacing mental health providers." Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency. 2025:599-627. doi:10.1145/3715275.3732039.
- Hua Y, Na H, Li Z, et al. "A scoping review of large language models for generative tasks in mental health care." npj Digital Medicine. 2025;8:230. doi:10.1038/s41746-025-01611-4.
- Dehbozorgi R, Zangeneh S, Khooshab E, et al. "The application of artificial intelligence in the field of mental health: a systematic review." BMC Psychiatry. 2025;25:132. doi:10.1186/s12888-025-06483-2.
- Fang CM, Liu AR, Danry V, et al. "How AI and Human Behaviors Shape Psychosocial Effects of Extended Chatbot Use: A Longitudinal Randomized Controlled Study." MIT Media Lab and OpenAI. arXiv:2503.17473. 2025. Preprint; treated here as early/non-peer-reviewed evidence.
- Phang J, Lampe M, Ahmad L, et al. "Investigating Affective Use and Emotional Well-being on ChatGPT." OpenAI and MIT Media Lab. arXiv:2504.03888. 2025. Preprint; released publicly under the title "Early methods for studying affective use and emotional well-being on ChatGPT."
- Illinois Wellness and Oversight for Psychological Resources (WOPR) Act (2025); Nevada AB 406 (2025); Utah HB 452 (2025); California SB 243 (2026); Washington HB 2225, Iowa SF 2417, and Oregon SB 1546 companion-chatbot laws. State statutes regulating AI in mental health and companion-chatbot contexts.
- Zhang Q, Zhang R, Xiong Y, Sui Y, Tong C, Lin F-H. "Generative AI Mental Health Chatbots as Therapeutic Tools: Systematic Review and Meta-Analysis of Their Role in Reducing Mental Health Issues." J Med Internet Res. 2025;27:e78238. doi:10.2196/78238.
- Kuta B, Novak L, Zidkova R, Furstova J, Malinakova K, De Winter A, Husek V. "Effectiveness of a Fully Automated Mobile Therapeutic Versus a General Chatbot in Reducing Depression and Anxiety and Improving Well-Being: Feasibility Randomized Controlled Trial." JMIR Ment Health. 2026;13:e82642. doi:10.2196/82642.
- Heinz MV, Mackin DM, Trudeau BM, Wang Y, Salzhauer AJ, Griffin TZ, Jacobson NC. "Response to Letters about 'Randomized Trial of a Generative AI Chatbot for Mental Health Treatment.'" NEJM AI. 2025;2(9). doi:10.1056/AIp2500680.