AI on Trial: Legal Models Hallucinate in 1 out of 6 (or More) Benchmarking Queries

Can we use AI chatbots in law practices? Here’s the current thinking, from Stanford University’s Law School:

In a new preprint study by Stanford RegLab and HAI researchers, we put the claims of two providers, LexisNexis (creator of Lexis+ AI) and Thomson Reuters (creator of Westlaw AI-Assisted Research and Ask Practical Law AI)), to the test. We show that their tools do reduce errors compared to general-purpose AI models like GPT-4. That is a substantial improvement and we document instances where these tools provide sound and detailed legal research. But even these bespoke legal AI tools still hallucinate an alarming amount of the time: the Lexis+ AI and Ask Practical Law AI systems produced incorrect information more than 17% of the time, while Westlaw’s AI-Assisted Research hallucinated more than 34% of the time.

That’s not an ideal hallucination rate.

Read their report here.

Leave a comment