Making complex text understandable: Minimally-lossy text simplification with Gemini

This study demonstrates the potential of LLMs for accessible information dissemination, allowing expert knowledge to reach a broader audience without compromising accuracy.

The digital age offers ever-growing access to vast amounts of knowledge, yet much of it remains locked behind complex language and specialist jargon. While complexity is often necessary in expert discourse, it can become a barrier when users need to understand information critical to their lives, such as navigating health information, understanding legal language, or grasping financial details. Tools that let users produce a simplified version of complex text they encounter online can empower them to engage with material that would otherwise be out of reach.

Today, in “LLM-based Text Simplification and its Effect on User Comprehension and Cognitive Load”, Google Research introduces a system using Gemini models specifically designed for minimally-lossy (high-fidelity) text simplification. The goal of this system is to enhance clarity while meticulously preserving the original meaning, detail, and nuance. This goal is distinct from summarization (which may drop information) or explanation (which often adds information). We are also launching this system as a new feature in the Google app for iOS, Simplify.

Achieving this requires models to paraphrase complex ideas accurately without introducing errors or omitting key details. The rewritten text must help the reader understand challenging material without sacrificing the integrity of the original information.

This work offers two primary contributions. First, we present a novel system featuring an automated evaluation and iterative prompt refinement loop: this enables Gemini models to discover the most effective prompt for high-fidelity text simplification by iterating at a scale and speed impractical to achieve with manual prompt optimization. Second, through a rigorous, large-scale randomized study, we demonstrate that text simplification measurably improves user comprehension and reduces cognitive load.

Gemini-powered automatic evaluation and prompt refinement system

To achieve these goals, we developed an automated approach that leverages Gemini models to evaluate simplification quality and to refine prompts. Crafting prompts for nuanced simplification, where readability must improve without sacrificing meaning or detail, is challenging. An automated system addresses this challenge by enabling the extensive trial-and-error needed to discover the most effective prompt.

Automated evaluation

Manual evaluation is impractical for rapid iteration. Our system employs two novel evaluation components:

  1. Readability assessment: Moving beyond simplistic metrics like Flesch-Kincaid, we used a Gemini prompt to score text readability on a 1-10 scale. This prompt was iteratively refined against human judgment, enabling a more nuanced assessment of comprehension ease. We observed in testing that this LLM-based readability assessment aligns better with human readability assessments than Flesch-Kincaid.
  2. Fidelity assessment: Ensuring meaning preservation is critical. Using Gemini 1.5 Pro, we implemented a process that maps claims from the original text to the simplified version. This method identifies specific error types like information loss, gain, or distortion, each weighted by severity, providing a granular measure of faithfulness to the original meaning (completeness and entailment).
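As a sketch, these two assessments can be framed as scoring functions wrapped around model calls. Everything below is illustrative: the error weights, the penalty scale, and the function names are assumptions for exposition, not the production prompts or scoring rules; `rate_fn` and `find_errors_fn` stand in for the Gemini readability and claim-mapping prompts.

```python
# Illustrative sketch of the two autoeval components (not the production system).

# Hypothetical severity weights for the fidelity error types.
ERROR_WEIGHTS = {"loss": 3.0, "gain": 1.0, "distortion": 5.0}

def readability_score(text: str, rate_fn) -> float:
    """rate_fn(text) plays the role of the refined Gemini readability prompt,
    returning a 1-10 rating (10 = easiest to read)."""
    score = float(rate_fn(text))
    if not 1.0 <= score <= 10.0:
        raise ValueError(f"readability rating out of range: {score}")
    return score

def fidelity_score(original: str, simplified: str, find_errors_fn) -> float:
    """find_errors_fn(original, simplified) plays the role of the claim-mapping
    prompt, returning a list of error-type strings (loss / gain / distortion).
    The score starts at 1.0 (fully faithful) and drops by a severity-weighted
    penalty for each mapped error."""
    errors = find_errors_fn(original, simplified)
    penalty = sum(ERROR_WEIGHTS.get(e, 1.0) for e in errors)
    return max(0.0, 1.0 - penalty / 10.0)
```

Under this sketch, a simplification with no mapped errors scores 1.0 on fidelity, and a distortion costs more than an information gain, reflecting that changed meaning is worse than added detail.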

Iterative prompt refinement: LLMs optimizing LLMs

The quality of the final simplification (generated by Gemini 1.5 Flash) heavily depends on the initial prompt. We automated the prompt optimization process itself via a prompt refinement loop: using the autoeval scores for readability and fidelity, another Gemini 1.5 Pro model analyzed the simplification prompt's performance and proposed refined prompts for the next iteration.

This creates a powerful feedback loop in which an LLM system iteratively improves its own instructions based on performance metrics (readability and fidelity) and granular error reports. This is a key innovation: it moves beyond laborious manual prompt engineering and enables the system to autonomously discover highly effective strategies for nuanced simplification over hundreds of iterations. For this work, the loop ran for 824 iterations until performance plateaued.
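Structurally, the loop resembles a simple hill-climb over prompts. The sketch below is a minimal illustration under assumed interfaces: `simplify`, `evaluate`, and `propose` are hypothetical stand-ins for the Gemini 1.5 Flash simplifier, the combined autoeval, and the Gemini 1.5 Pro critic, and the acceptance rule is a simplification of whatever the real system used.

```python
from statistics import mean

def refine_prompt(initial_prompt, dataset, simplify, evaluate, propose, max_iters=50):
    """Minimal prompt-refinement loop (illustrative).

    simplify(prompt, text)         -> simplified text   (stand-in for Gemini 1.5 Flash)
    evaluate(original, simplified) -> float score       (stand-in for the autoeval)
    propose(prompt, score)         -> candidate prompt  (stand-in for the Gemini 1.5 Pro critic)
    """
    def avg_score(prompt):
        # Score the prompt by simplifying every text in the dataset.
        return mean(evaluate(text, simplify(prompt, text)) for text in dataset)

    best_prompt = initial_prompt
    best_score = avg_score(best_prompt)
    for _ in range(max_iters):
        candidate = propose(best_prompt, best_score)
        score = avg_score(candidate)
        if score > best_score:  # keep the candidate only if the autoeval improves
            best_prompt, best_score = candidate, score
    return best_prompt, best_score
```

In the real system, the critic model also sees the granular error reports from the fidelity assessment, not just a scalar score, which gives it far more signal for proposing the next prompt.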


Summary of Gemini-based approach for minimally-lossy text simplification.

Measuring impact: A large-scale randomized study

To validate the real-world effectiveness of text simplified with this approach, we conducted a randomized controlled study.

Study design

  • Participants: A large cohort of 4,563 consenting participants was recruited after screening for topic expertise.
  • Texts: We used 31 diverse, real-world text excerpts across domains known for complexity: medical research, biology, law, finance, literature, philosophy, aerospace, and computer science.
  • Comparison: Using a randomized complete block design (a study design that compares groups while accounting for variations), participants were randomly assigned to read either the original text, the simplified version, or both. To evaluate any further effects of simplification on short-term retention, participants were tested under two conditions: one in which they could refer back to the text while answering questions, and one in which they could not.
  • Measurements: We assessed comprehension via carefully reviewed multiple-choice questions (MCQs), self-reported answer confidence, and cognitive load via a simplified NASA Task Load Index.

Study design to evaluate the simplification model with real texts.
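To make the blocking concrete, here is one hypothetical way such an assignment could be generated, treating each text as a block and balancing the six condition combinations (three reading conditions, two refer-back conditions) within it. This is a sketch of the design idea, not the study's actual assignment procedure.

```python
import random
from itertools import product

# Six condition combinations: what the participant reads, crossed with
# whether they may refer back to the text while answering questions.
CONDITIONS = list(product(["original", "simplified", "both"],
                          ["refer_back", "no_refer_back"]))

def assign_conditions(participants, texts, seed=0):
    """Sketch of a randomized complete block design: each text is a block,
    and every condition combination appears equally often within it.
    Requires the participant count to be a multiple of len(CONDITIONS)."""
    if len(participants) % len(CONDITIONS) != 0:
        raise ValueError("participant count must be a multiple of the condition count")
    rng = random.Random(seed)  # seeded for a reproducible assignment
    assignments = []
    for text in texts:  # one block per text
        conditions = CONDITIONS * (len(participants) // len(CONDITIONS))
        rng.shuffle(conditions)
        assignments.extend((p, c) for p, c in zip(participants, conditions))
        assignments[-len(participants):] = [
            (p, text, c) for (p, c) in assignments[-len(participants):]
        ]
    return assignments
```

Balancing conditions within each block is what lets the analysis separate the effect of simplification from the difficulty of any particular text.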

Results

Our study, encompassing nearly 50,000 MCQ answers, yielded statistically significant results demonstrating the value of simplification.

Quantitative findings

Participants reading the simplified text achieved a 4% absolute increase in MCQ accuracy overall compared to those reading the original. The impact was most pronounced for the highly complex PubMed texts, which saw a 15% absolute accuracy gain. Significant gains were also observed in the finance (6%), legal (4%), and technical (aeronautics/computer science, 4%) domains. These gains were robust even when participants could not refer back to the text, suggesting benefits for both immediate comprehension and short-term retention.

Beyond accuracy, participants interacting with simplified text reported higher confidence in their answers (an average improvement of 0.24 on a -2 to 2 scale) and found the task easier (an average improvement of 0.33 on a -2 to 2 scale derived from the simplified task load index).

Qualitative insights

Reviewing examples where simplification significantly boosted participant MCQ accuracy (by 38% for one medical research text) reveals how it adds value. Consider the original passage below compared to its simplified version. The simplification enhances clarity by defining jargon (like 'emphysema' and 'fibrosis'), breaking down a dense sentence, and clarifying complex relationships.


*Excerpt from a biomedical article, PMC10177208, Creative Commons license (CC BY 4.0).

Simplification proved particularly helpful for texts where baseline comprehension was low, as further illustrated by additional examples in our paper.

Study limitations

Our study, while large-scale, has limitations. We recruited participants through a survey platform, so the cohort may not fully reflect users actively seeking to understand complex information. While our system aims for high fidelity, LLM errors remain possible, requiring ongoing vigilance. Lastly, MCQs, while scalable, offer an incomplete measure of deep understanding.

Available as a new Simplify feature

Starting today, this capability is available as a new feature in the Google app for iOS, Simplify. To use it, users can select any complex text on a web page they’re visiting in the Google app, then tap the “Simplify” icon that appears to see a new, simpler version of the text, without having to stop reading or leave the page. This makes it easier for people to grasp new or complex topics they encounter when trying to learn something new on the web.

Conclusion

We developed and rigorously validated an automated system using Gemini that iteratively learns to simplify text while maintaining fidelity to the original. By demonstrably bridging the comprehension gap for complex information, this capability significantly enhances understanding and reduces cognitive load for users across critical domains.

Acknowledgements

This work involved the efforts of a multidisciplinary team of software engineers, researchers, clinicians, and cross-functional contributors. Key contributors to this project include: Theo Guidroz, Jimmy Li, Adam Mansour, Paul Jhun, Nina Gonzalez, Xiang Ji, Mike Sanchez, Mathias MJ Bellaiche, Miguel Ángel Garrido, Faruk Ahmed, Divyansh Choudhary, Jay Hartford, Chenwei Xu, Henry Javier Serrano Echeverria, Yifan Wang, Jeff Shaffer, Eric (Yifan) Cao, Yossi Matias, Avinatan Hassidim, Dale R Webster, Yun Liu, Sho Fujiwara, Peggy Bui, Quang Duong.