Key Enabling Technologies

Multi-Trillion Parameter Models

Cerebras combines Wafer-Scale architecture with our innovative weight streaming technology to support massive models simply and easily, without complex hacks.

Data Parallel Is All You Need

Erase the pain of distributed computing. Cerebras Clusters run strictly data-parallel, so you can distribute work across tens of millions of Cerebras cores with a single keystroke.

Linear Performance Scaling

Powered by our weight streaming technology, Cerebras Wafer-Scale Clusters effortlessly deliver near-linear scaling to hundreds of nodes.

50K Context Out of the Box

Build models that can reason over huge documents with native 50K context length support.

Train Faster with Sparsity

With native support for dynamic and unstructured sparsity at any level, you can train models up to 8x faster.

Featured Case Studies

Introducing Sparse Llama: 70% Smaller, 3x Faster, Full Accuracy

Cerebras and Neural Magic have achieved a major milestone in the field of large language models (LLMs).

Core42 Sets New Benchmark for Arabic Large Language Models with the Release of Jais 30B

The latest Jais model iteration shows stronger performance across content generation, summarization, and Arabic-English translation.

How we fine-tuned Llama2-70B to pass the US Medical License Exam in a week

New open-access model by M42 outperforms GPT-3.5 on a standardized medical exam.

BTLM-3B-8K: 7B Performance in a 3 Billion Parameter Model

Cerebras and Opentensor introduce a new standard for compact large language models.


Hugging Face

Explore all our latest open source models

Visit Cerebras Hugging Face

GitHub Model Zoo

See reference implementations of popular LLMs on Cerebras

Visit Cerebras Model Zoo


Your full guide to programming Cerebras hardware with PyTorch 2.0

Explore Developer Documentation

“We note that these training runs frequently take >1 week on dedicated GPU resources (such as Polaris@ALCF). To enable training of the larger models on the full sequence length (10,240 tokens), we leveraged AI-hardware accelerators such as Cerebras CS-2, both in a stand-alone mode and as an inter-connected cluster, and obtained GenSLMs that converge in less than a day.”

Award-winning research

2022 Gordon Bell Prize for COVID Research

A team led by researchers from Argonne National Laboratory and Cerebras was recognized for developing the first genome-scale language model to study the evolutionary dynamics of SARS-CoV-2. Their work has the potential to transform how we identify and classify new and emergent variants of pandemic-causing viruses.

At Cerebras Systems, we love it when the CS-2 is vastly faster than large NVIDIA GPU clusters.

Read our blog

Read the paper on bioRxiv


Kim Branson

SVP Global Head of AI and ML @

Ready to get started?

Contact Sales