Large Language Models Benchmarks

Frontier AI models don't just delete document content — they rewrite it, and the errors are nearly impossible to catch

Frontier AI models corrupt 25% of document content in multi-step workflows — rewriting rather than deleting, which makes the ...

Morning Overview on MSN

Google’s TurboQuant algorithm slashes the memory bottleneck that limits how many AI models can run at once

Running a large language model is expensive, and a surprising amount of that cost comes down to memory, not computation.

MLCommons Releases New MLPerf Inference v6.0 Benchmark Results

Today, MLCommons® announced new results for its industry-standard MLPerf® Inference v6.0 benchmark suite. This release includes several important advances that ensure the benchmark suite tests ...

Medscape

Language Models as Emergency Room Decision Support

A Science study finds modern large language models often match or exceed physicians in emergency room diagnostic decisions, ...

Geeky Gadgets

AI Benchmarks Are Broken: The Leaderboard Illusion

What if the tools we trust to measure progress are actually holding us back? In the rapidly evolving world of large language models (LLMs), AI benchmarks and leaderboards have become the gold standard ...

ZDNet

With AI models clobbering every benchmark, it's time for human evaluation

Artificial intelligence has traditionally advanced through automatic accuracy tests in tasks meant to approximate human knowledge. Carefully crafted benchmark tests such as The General Language ...

Medical Device and Diagnostic Industry (MD+DI)

How Large Language Models Are Reshaping Health Prediction & Clinical Decision Making

Large Language Models (LLMs) such as GPT-4, Gemini-Pro, Llama 2, and medical-domain-tuned variants like Med-PaLM 2 have ...

AI IQ is here: a new site scores frontier AI models on the human IQ scale. The results are already dividing tech.

AI IQ ranks frontier AI models like ChatGPT, Claude and Gemini on the human IQ scale, sparking debate over how artificial ...

iAfrica

Egyptian Startup Releases Open-Source AI Model That Outperforms Larger Global Rivals on Key Benchmarks

A Cairo-based artificial intelligence startup has released Horus 1.0-4B, a fully open-source large language model built in Egypt that outperforms several ...

Bloomberg L.P.

Introducing BloombergGPT, Bloomberg’s 50-billion parameter large language model, purpose-built from scratch for finance

NEW YORK – Bloomberg today released a research paper detailing the development of BloombergGPT™, a new large-scale generative artificial intelligence (AI) model. This large language model (LLM) has ...
