LightOn sets new standards for complex information retrieval (RAG) with GTE-ModernColBERT.
LIGHTON
07-May-2025 / 17:45 CET/CEST
Dissemination of a French Regulatory News, transmitted by EQS Group.
The issuer is solely responsible for the content of this announcement.
=----------------------------------------------------------------------------------------------------------------------
Press Release
Paris, May 7, 2025
LightOn sets new standards for complex information retrieval (RAG) with GTE-ModernColBERT.
LightOn is proud to announce the release of GTE-ModernColBERT, our new state-of-the-art, open-source, multi-vector
retrieval model. By leveraging the ModernBERT architecture and our innovative PyLate library, we've created a solution
that sets a new milestone in the field and addresses the complex challenges of modern enterprise information retrieval.
This new model outperforms models from across the ecosystem (Alibaba, Snowflake, Cohere, BAAI, Jina AI, and others) on
the industry-standard LongEmbed benchmark.
Breaking New Ground in Retrieval Technology
Traditional single-vector embedding models have become standard in the industry, but as enterprise needs evolve toward
handling longer contexts and specialized domains, their limitations become increasingly apparent.
GTE-ModernColBERT-base represents a significant leap forward with its state-of-the-art multi-vector (late interaction)
architecture, offering:
Outstanding generalization capability for long documents
GTE-ModernColBERT sets a new state of the art (SOTA) for generalization to long contexts. On the LongEmbed benchmark,
it outperforms the best existing models by a 10-point margin on documents of up to 32,000 tokens, equivalent to
texts spanning dozens of pages, even though it was initially trained only on 300-token excerpts from the MS MARCO
dataset. These early results indicate that GTE-ModernColBERT could extend its capabilities further, delivering
excellent performance even beyond this context window.
-- Extended context handling for documents up to 32,000 tokens
-- Superior generalization for domain-specific, confidential, or specialized content
-- Breakthrough performance as the first model to surpass ColBERT-small on the BEIR benchmark
-- Remarkable efficiency through ModernBERT's architectural advancements
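To illustrate what "multi-vector (late interaction)" means in practice, here is a minimal sketch of the MaxSim scoring
rule generally associated with ColBERT-style models. It is not LightOn's implementation: the function names and the
tiny two-dimensional vectors are hypothetical, chosen only for illustration. Each query token vector is matched
against its most similar document token vector, and those per-token maxima are summed into the document score.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so that dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs, doc_vecs):
    """Late-interaction score: sum over query tokens of the best-matching document token.

    Both arguments are non-empty lists of token vectors (one vector per token).
    """
    q = [l2_normalize(v) for v in query_vecs]
    d = [l2_normalize(v) for v in doc_vecs]
    return sum(max(dot(qv, dv) for dv in d) for qv in q)


# Toy data (hypothetical): two query tokens, two candidate documents.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # has a close match for each query token
doc_b = [[1.0, 1.0]]                          # a single "averaged" vector

scores = {"doc_a": maxsim_score(query, doc_a), "doc_b": maxsim_score(query, doc_b)}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking, scores)
```

In this toy case doc_a ranks first because each query token finds its own close match, whereas doc_b has only one
vector to serve both tokens. That is the intuition behind keeping one vector per token instead of one per document.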
LightOn's Technical Innovation
LightOn created GTE-ModernColBERT as a unique solution by identifying and building upon key elements:
1. Modern encoder: LightOn built ModernBERT to enable the creation of powerful, up-to-date retrieval
models. GTE-ModernColBERT is a direct follow-up to that first release, extending it to the very promising multi-vector
approach.
2. PyLate Library: We developed a framework that streamlines experimenting with and training
multi-vector retrieval models. Only 80 lines of code are needed to reproduce the training process.
3. Knowledge Distillation: By training on MS MARCO via knowledge distillation, we've created a lightweight
yet powerful model that doesn't compromise on performance.
4. Compatibility Focus: Most major vector databases, including Qdrant, LanceDB, Weaviate, and Vespa, now support
multi-vector indexing, making enterprise adoption frictionless.
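The announcement does not detail the distillation objective in item 3. As a hedged illustration only, one common
formulation of knowledge distillation for retrieval has the student model match the teacher's score distribution over
a set of candidate passages, using a KL-divergence loss. The sketch below assumes that formulation; the function names,
the temperature parameter, and the example scores are all hypothetical.

```python
import math


def softmax(scores, temperature=1.0):
    """Turn raw relevance scores into a probability distribution over candidates."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_distillation_loss(teacher_scores, student_scores, temperature=1.0):
    """KL(teacher || student) over the same list of candidate passages."""
    p = softmax(teacher_scores, temperature)
    q = softmax(student_scores, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical scores for one query and three candidate passages.
teacher = [4.0, 1.0, 0.5]
student_close = [3.9, 1.1, 0.4]  # ranks passages like the teacher
student_far = [0.5, 1.0, 4.0]    # ranks them in the opposite order

loss_close = kl_distillation_loss(teacher, student_close)
loss_far = kl_distillation_loss(teacher, student_far)
print(loss_close, loss_far)
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two rankings diverge, so
minimizing it pushes the smaller model toward the teacher's relevance judgments.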
Transforming Enterprise RAG Implementations
GTE-ModernColBERT fundamentally transforms how organizations can implement Retrieval-Augmented Generation (RAG) by:
-- Enhancing search quality within proprietary knowledge bases
-- Maintaining high performance even with highly specialized content
-- Supporting enterprise-scale document processing
-- Enabling more accurate retrieval for AI-generated responses
Real-World Impact
For knowledge management teams and AI solution developers, GTE-ModernColBERT offers the ideal foundation for
next-generation information systems. Its ability to process large volumes of text while maintaining contextual
understanding makes it particularly valuable for:
-- Legal document analysis
-- Scientific research repositories
-- Technical documentation search
-- Customer support knowledge bases
-- Internal enterprise knowledge management
Open Source Commitment
Following the release of ModernBERT and ModernBERT-embed, LightOn is releasing GTE-ModernColBERT as an open-source
solution under the Apache 2.0 license. In doing so, LightOn continues its commitment to advancing the field of AI,
enabling organizations of all sizes to benefit from cutting-edge retrieval technology, and supporting research by
open-sourcing PyLate as well.
For organizations seeking to stay ahead in Knowledge Management and RAG, GTE-ModernColBERT is now available. Try it out
and (re)discover the hidden value within your documents!
-- Try it today on Hugging Face
-- Get started: PyLate Documentation
About LightOn
Founded in 2016 in Paris and the first European generative AI company listed on Euronext Growth, LightOn is a
pioneering player in the field of sovereign GenAI. Its Paradigm platform enables organizations to deploy large-scale AI
while ensuring the confidentiality of their data. LightOn's technology ensures essential strategic independence by
offering tailored solutions. This technological mastery is accompanied by the ability to process large volumes of data
for industrial uses, with applications in various sectors such as finance, industry, health, defense, and public
services.
LightOn is listed on Euronext Growth® Paris (ISIN: FR0013230950, ticker: ALTAI-FR). The company qualifies for PEA and
PEA PME investment plans and is recognized as an "Innovative Company" by Bpifrance. To learn more:
https://www.lighton.ai
Contacts
LIGHTON: invest@lighton.ai
SEITOSEI --ACTIFIN, Investor Relations: Benjamin LEHARI, lighton@seitosei-actifin.com
KALAMARI, Media Relations: Camille Bernisson (+33 7 64 44 14 49), Maroua Derdega (+33 7 63 77 73 20), lighton@kalamari.agency
SEITOSEI --ACTIFIN, Financial Media Relations: Jennifer JULLIA (+33 6 47 97 54 87), jennifer.jullia@seitosei-actifin.com
-----------------------------------------------------------------------------------------------------------------------
Regulatory filing PDF file File: ModernColBERT ENG
=----------------------------------------------------
Language: English
Company: LIGHTON
2 rue de la Bourse
75002 Paris
France
E-mail: contact@lighton.ai
Internet: www.lighton.ai
ISIN: FR0013230950
Euronext Ticker:
AMF Category: Inside information / Other releases
EQS News ID: 2132500
End of Announcement EQS News Service
=------------------------------------------------------------------------------------
(END) Dow Jones Newswires
May 07, 2025 11:45 ET (15:45 GMT)
© 2025 Dow Jones News

