
TrainAI's LLM synthetic data generation study benchmarks nine popular large language models on six data generation tasks across eight languages using human expert evaluators
When it comes to large language models (LLMs) and their ability to generate sentences and conversations, Claude Sonnet, GPT and Gemini Pro come out on top, according to TrainAI's latest LLM benchmarking study.
Unlike typical automated LLM benchmarks that assess performance on closed questions, TrainAI's LLM Synthetic Data Generation Study used human expert evaluators to test the ability of popular LLMs to generate sentences and conversations, assessing their general natural language processing (NLP) skills across a variety of languages.
"We conducted this study because reports suggest that the largest companies behind today's state-of-the-art LLMs are running out of data[1] to train their newest models," explains Tomáš Burkert, TrainAI's technical solutions lead on the benchmarking project. "Companies like OpenAI, Anthropic and Google are exploring the use of synthetic data generated by the LLMs themselves (as opposed to humans) to train and fine-tune their AI models. We wanted to explore the potential impact of using LLMs to generate training and fine-tuning data for AI."
Nine LLMs were tested on six data generation tasks of varying complexity, across eight carefully selected languages with varying levels of representation. For each language, three native-speaking language specialists evaluated the LLM-generated outputs against specific criteria (such as grammar and naturalness). Overall, 38,000 sentences were generated, 115,000 annotations were submitted, and 250,000 ratings from 1 (very poor) to 5 (very good) were provided by 27 linguists across the globe.
"Because AI is built for humans, we chose humans, not AI, to evaluate LLM performance. Our study found that no single model outperformed the rest when generating synthetic data across languages and tasks, but some models performed better than others on key criteria like language proficiency, instruction adherence, creativity, speed and cost," said Vasagi Kothandapani, President of Enterprise Services at RWS. "The study underscores the importance of assessing the strengths and limitations of multiple LLMs for specific AI use cases or applications. Only then can genuine value and positive business impact be realized."
Notes to editors:
- Download your copy of TrainAI's LLM Synthetic Data Generation Study.
- TrainAI by RWS provides complete, end-to-end data collection, annotation validation, and generative AI training and fine-tuning services for all types of AI data, in any language, at any scale, based on the principles of responsible AI.
About RWS
RWS Holdings plc is a unique, world-leading provider of technology-enabled language, content and intellectual property services. Through content transformation and multilingual data analysis, our combination of AI-enabled technology and human expertise helps our clients to grow by ensuring they are understood anywhere, in any language.
Our purpose is unlocking global understanding. By combining cultural understanding, client understanding and technical understanding, our services and technology assist our clients to acquire and retain customers, deliver engaging user experiences, maintain compliance and gain actionable insights into their data and content.
Over the past 20 years we've been evolving our own AI solutions as well as helping clients to explore, build and use multilingual AI applications. With 45+ AI-related patents and more than 100 peer-reviewed papers, we have the experience and expertise to support clients on their AI journey.
We work with over 80% of the world's top 100 brands, more than three-quarters of Fortune's 20 'Most Admired Companies' and almost all of the top pharmaceutical companies, investment banks, law firms and patent filers. Our client base spans Europe, Asia Pacific, Africa and North and South America. Our 60+ global locations across five continents service clients in the automotive, chemical, financial, legal, medical, pharmaceutical, technology and telecommunications sectors.
Founded in 1958, RWS is headquartered in the UK and publicly listed on AIM, the London Stock Exchange regulated market (RWS.L).
For further information, please visit: www.rws.com.
______________________
[1] Villalobos, P., Ho, A., Sevilla, J., Besiroglu, T., Heim, L. and Hobbhahn, M. (2024). Position: Will we run out of data? Limits of LLM scaling based on human-generated data. Proceedings of Machine Learning Research 235:49523-49544. Available from proceedings.mlr.press/v235/villalobos24a
View source version on businesswire.com: https://www.businesswire.com/news/home/20250425786534/en/
Contacts:
RWS
Denis Davies
Corporate Communications
ddavies@rws.com
+44 1628 410105