Publication - Cerebras https://www.cerebras.net/category/publication/ Mon, 20 May 2024 15:07:22 +0000

MediSwift: Efficient Sparse Pre-trained Biomedical Language Models https://arxiv.org/abs/2403.00952#new_tab Mon, 20 May 2024 15:07:22 +0000 https://www.cerebras.net/?p=105485 Large language models (LLMs) are typically trained on general source data for various domains, but a recent surge in domain-specific LLMs has shown their potential to outperform general-purpose models in domain-specific tasks (e.g., biomedicine).…

The post MediSwift: Efficient Sparse Pre-trained Biomedical Language Models appeared first on Cerebras.

Breaking the Molecular Dynamics Timescale Barrier Using a Wafer-Scale System https://arxiv.org/abs/2405.07898#new_tab Thu, 16 May 2024 00:54:00 +0000 https://www.cerebras.net/?p=105479 Molecular dynamics (MD) simulations have transformed our understanding of the nanoscale, driving breakthroughs in materials science, computational chemistry, and several other fields, including biophysics and drug design.…

Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment https://arxiv.org/abs/2405.03594#new_tab Thu, 16 May 2024 00:53:03 +0000 https://www.cerebras.net/?p=105478 Large language models (LLMs) have revolutionized Natural Language Processing (NLP), but their size creates computational bottlenecks. We introduce a novel approach to create accurate, sparse foundational versions of performant LLMs that achieve full accuracy recovery for fine-tuning tasks at up…

Efficient Algorithms for Monte Carlo Particle Transport on AI Accelerator Hardware https://arxiv.org/abs/2311.01739#new_tab Mon, 13 Nov 2023 18:52:26 +0000 https://www.cerebras.net/?p=105021 The recent trend toward deep learning has led to the development of a variety of highly innovative AI accelerator architectures. One such architecture, the Cerebras Wafer-Scale Engine 2 (WSE-2), features 40 GB of on-chip SRAM, making it a potentially attractive…

Position Interpolation Improves ALiBi Extrapolation https://arxiv.org/abs/2310.13017#new_tab Wed, 08 Nov 2023 22:57:03 +0000 https://www.cerebras.net/?p=104980 Linear position interpolation helps pre-trained models using rotary position embeddings (RoPE) to extrapolate to longer sequence lengths. We propose using linear position interpolation to extend the extrapolation range of models using Attention with Linear Biases (ALiBi). We find position interpolation…

Scaling the “Memory Wall” for Multi-Dimensional Seismic Processing with Algebraic Compression on Cerebras CS-2 Systems https://www.cerebras.net/publication/scaling-the-memory-wall-for-multi-dimensional-seismic-processing-with-algebraic-compression-on-cerebras-cs-2-systems Tue, 26 Sep 2023 23:42:19 +0000 https://www.cerebras.net/?p=104946

BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model https://arxiv.org/abs/2309.11568#new_tab Fri, 22 Sep 2023 17:28:00 +0000 https://www.cerebras.net/?p=104941 We introduce the Bittensor Language Model, called “BTLM-3B-8K”, a new state-of-the-art 3 billion parameter open-source language model. BTLM-3B-8K was trained on 627B tokens from the SlimPajama dataset with a mixture of 2,048 and 8,192 context lengths. BTLM-3B-8K outperforms all existing…

Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models https://arxiv.org/abs/2308.16149#new_tab Thu, 31 Aug 2023 19:39:26 +0000 https://www.cerebras.net/?p=104914 We introduce Jais and Jais-chat, new state-of-the-art Arabic-centric foundation and instruction-tuned open generative large language models (LLMs). The models are based on the GPT-3 decoder-only architecture and are pretrained on a mixture of Arabic and English texts, including source code…

Cerebras Architecture Deep Dive: First Look Inside the Hardware/Software Co-Design for Deep Learning https://8968533.fs1.hubspotusercontent-na1.net/hubfs/8968533/IEEE%20Micro%202023-03%20Hot%20Chips%2034%20Cerebras%20Architecture%20Deep%20Dive.pdf#new_tab Mon, 22 May 2023 20:15:11 +0000 https://www.cerebras.net/?p=104721 IEEE Micro Volume 43, Issue 3, focuses on papers from last year’s Hot Chips 34 conference.
This article describes the Cerebras architecture and how it is designed specifically for this purpose, from the ground up, as a wafer-sized chip to…

Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster https://arxiv.org/abs/2304.03208#new_tab Fri, 07 Apr 2023 17:24:10 +0000 https://www.cerebras.net/?p=104639 We introduce Cerebras-GPT, a family of open compute-optimal language models scaled from 111M to 13B parameters. We train Cerebras-GPT models on the Eleuther Pile dataset following DeepMind Chinchilla scaling rules for efficient pre-training (highest accuracy for a given compute budget).…

Sparse-IFT: Sparse Iso-FLOP Transformations for Maximizing Training Efficiency https://arxiv.org/abs/2303.11525#new_tab Wed, 22 Mar 2023 16:53:24 +0000 https://www.cerebras.net/?p=104632 Replacing dense layers with Sparse-IFT leads to significant improvements across computer vision (CV) and natural language processing (NLP) tasks, including ResNet-18 on ImageNet (+3.5%) and GPT-3 Small on WikiText-103 (-0.4 PPL), both matching larger dense model variants with 2x or…

SPDF: Sparse Pre-training and Dense Fine-tuning for Large Language Models https://arxiv.org/abs/2303.10464#new_tab Tue, 21 Mar 2023 16:41:45 +0000 https://www.cerebras.net/?p=104624 Presented at the ICLR 2023 Workshop on Sparsity in Neural Networks.
In this work, we show the benefits of using unstructured weight sparsity to train only a subset of weights during pre-training (Sparse Pre-training) and then recover the representational capacity…

Wafer-Scale Fast Fourier Transforms https://arxiv.org/pdf/2209.15040.pdf#new_tab Fri, 20 Jan 2023 19:33:42 +0000 https://www.cerebras.net/?p=104308 We have implemented fast Fourier transforms for one, two, and three-dimensional arrays on the Cerebras CS-2, a system whose memory and processing elements reside on a single silicon wafer. The wafer-scale engine (WSE) encompasses a two-dimensional mesh of roughly 850,000…

GenSLMs: Genome-scale language models reveal SARS-CoV-2 evolutionary dynamics https://www.biorxiv.org/content/10.1101/2022.10.10.511571v2#new_tab Thu, 24 Nov 2022 04:58:55 +0000 https://www.cerebras.net/?p=104094 Our work seeks to transform how new and emergent variants of pandemic-causing viruses, especially SARS-CoV-2, are identified and classified. By adapting large language models (LLMs) for genomic data, we build genome-scale language models (GenSLMs) which can learn the evolutionary…

Disruptive Changes in Field Equation Modeling: A Simple Interface for Wafer Scale Engines https://arxiv.org/abs/2209.13768#new_tab Thu, 29 Sep 2022 03:47:31 +0000 https://www.cerebras.net/?p=104093 We present a high-level and accessible Application Programming Interface (API) for the solution of field equations on the Cerebras Systems Wafer-Scale Engine (WSE) with over two orders of magnitude performance gain relative to traditional distributed computing approaches. The domain-specific API…

TensorFlow as a DSL for stencil-based computation on the Cerebras Wafer-Scale Engine https://arxiv.org/abs/2210.04795#new_tab Fri, 26 Aug 2022 17:10:18 +0000 https://www.cerebras.net/?p=104540 The Cerebras Wafer Scale Engine (WSE) is an accelerator that combines hundreds of thousands of AI-cores onto a single chip. Whilst this technology has been designed for machine learning workloads, the significant amount of available raw compute means that it…

RevBiFPN: The Fully Reversible Bidirectional Feature Pyramid Network https://arxiv.org/abs/2206.14098#new_tab Tue, 28 Jun 2022 21:38:33 +0000 https://www.cerebras.net/?p=103704 This work introduces the RevSilo, the first reversible module for bidirectional multi-scale feature fusion. Like other reversible methods, RevSilo eliminates the need to store hidden activations by recomputing them. Existing reversible methods, however, do not apply to multi-scale feature fusion…

A Templated C++ Interface for ISL https://cerebras.net/wp-content/uploads/2021/04/IMPACT_2021_paper_2.pdf#new_tab Sat, 23 Apr 2022 04:24:02 +0000 https://cerebras.net/?p=103302 Polyhedral libraries typically support only a very limited collection of types for representing objects, corresponding to broad mathematical classes such as sets, binary relations and functions.…

Massively scalable stencil algorithm https://arxiv.org/pdf/2204.03775.pdf#new_tab Thu, 07 Apr 2022 17:54:07 +0000 https://www.cerebras.net/?p=103623 Stencil computations lie at the heart of many scientific and industrial applications. Unfortunately, stencil algorithms perform poorly on machines with a cache-based memory hierarchy, due to low reuse of memory accesses. This work shows that for stencil computation a novel…

Epigenomic language models powered by Cerebras https://arxiv.org/abs/2112.07571#new_tab Thu, 27 Jan 2022 04:25:10 +0000 https://cerebras.net/?p=103293 Large scale self-supervised pre-training of Transformer language models has advanced the field of Natural Language Processing and shown promise in cross-application to the biological ‘languages’ of proteins and DNA. Learning effective representations of DNA sequences using large genomic sequence corpuses…

BraggNN: fast X-ray Bragg peak analysis using deep learning https://journals.iucr.org/m/issues/2022/01/00/fs5198/index.html#new_tab Sat, 01 Jan 2022 21:35:02 +0000 https://www.cerebras.net/?p=103609 We propose BraggNN, a deep-learning based method, to accelerate the most computation-intensive part of polycrystal diffraction data analysis (diffraction signal characterization). The application of BraggNN for real experimental data demonstrates that it can deliver consistent (sometimes even slightly better) results…

Microprocessor at 50. The Path to Successful Wafer-Scale Integration: The Cerebras Story https://8968533.fs1.hubspotusercontent-na1.net/hubfs/8968533/IEEE%20Micro%202021-11%20Path%20to%20Wafer-Scale%20Integration.pdf#new_tab Fri, 19 Nov 2021 22:38:04 +0000 https://www.cerebras.net/?p=104723 IEEE Micro Volume 41, Issue 6, took a look back at the first 50 years of the microprocessor, and forward to what’s next. It featured this article by Gary Lauterbach, Co-Founder and Chief Technology Officer of Cerebras Systems, which…

Intelligent Resolution: Integrating Cryo-EM with AI-driven Multi-resolution Simulations to Observe the SARS-CoV-2 Replication-Transcription Machinery in Action https://www.biorxiv.org/content/10.1101/2021.10.09.463779v1.full.pdf#new_tab Thu, 18 Nov 2021 05:25:03 +0000 https://cerebras.net/?p=103303 The severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) replication transcription complex (RTC) is a multi-domain protein responsible for replicating and transcribing the viral mRNA inside a human cell. Attacking RTC function with pharmaceutical compounds is a pathway to treating COVID-19.…

The Path to Successful Wafer-Scale Integration: The Cerebras Story https://www.computer.org/csdl/magazine/mi/2021/06/09623424/1yJTq0E9m1O#new_tab Mon, 01 Nov 2021 16:15:09 +0000 https://www.cerebras.net/?p=103752 There has been an impressive increase in single-chip processing power since the Intel 4004 was launched in 1971. This is usually attributed to Moore’s law, but there are additional factors to consider. In understanding the components of prior improvements, we…

Stream-AI-MD: streaming AI-driven adaptive molecular simulations for heterogeneous computing platforms https://dl.acm.org/doi/10.1145/3468267.3470578#new_tab Tue, 06 Jul 2021 04:02:23 +0000 https://cerebras.net/?p=103299 Emerging hardware tailored for artificial intelligence (AI) and machine learning (ML) methods provides novel means to couple them with traditional high performance computing (HPC) workflows involving molecular dynamics (MD) simulations. We propose Stream-AI-MD, a novel instance of applying deep learning…

Memory Efficient 3D U-Net with Reversible Mobile Inverted Bottlenecks for Brain Tumor Segmentation https://www.springerprofessional.de/en/memory-efficient-3d-u-net-with-reversible-mobile-inverted-bottle/19007114#new_tab Sat, 06 Mar 2021 04:40:10 +0000 https://cerebras.net/?p=103295 We propose combining memory saving techniques with traditional U-Net architectures to increase the complexity of the models on the Brain Tumor Segmentation (BraTS) challenge. The BraTS challenge consists of a 3D segmentation of a 240 × 240 × 155 × 4 input image…

Pipelined Backpropagation at Scale: Training Large Models without Batches https://proceedings.mlsys.org/paper/2021/hash/9b8619251a19057cff70779273e95aa6-Abstract.html#new_tab Mon, 01 Mar 2021 18:29:17 +0000 https://cerebras.net/?p=103242 New hardware can substantially increase the speed and efficiency of deep neural network training. To guide the development of future hardware architectures, it is pertinent to explore the hardware and machine learning properties of alternative training algorithms.…

System Integration of Neocortex, a Unique, Scalable AI Platform https://dl.acm.org/doi/abs/10.1145/3437359.3465604#new_tab Thu, 04 Feb 2021 05:21:34 +0000 https://cerebras.net/?p=103301 The Pittsburgh Supercomputing Center, in partnership with Cerebras Systems and Hewlett Packard Enterprise, has deployed Neocortex, an innovative computing platform that accelerates scientific discovery by vastly shortening the time required for deep learning training and fosters greater integration of deep…

Fast Stencil-Code Computation on a Wafer-Scale Processor https://arxiv.org/abs/2010.03660#new_tab Fri, 23 Oct 2020 04:00:21 +0000 https://cerebras.net/?p=103298 The performance of CPU-based and GPU-based systems is often low for PDE codes, where large, sparse, and often structured systems of linear equations must be solved. Iterative solvers are limited by data movement, both between caches and memory and between…

Fast Stencil-Code Computation on a Wafer-Scale Processor https://arxiv.org/abs/2010.03660#new_tab Wed, 07 Oct 2020 14:55:41 +0000 https://www.cerebras.net/?p=103707 The performance of CPU-based and GPU-based systems is often low for PDE codes, where large, sparse, and often structured systems of linear equations must be solved. Iterative solvers are limited by data movement, both between caches and memory and between…

The curious case of developmental BERTology: On sparsity, transfer learning, generalization and the brain https://arxiv.org/abs/2007.03774#new_tab Wed, 08 Jul 2020 03:57:12 +0000 https://cerebras.net/?p=103297 In this essay, we explore a point of intersection between deep learning and neuroscience, through the lens of large language models, transfer learning and network compression.…

Generating SIMD Instructions for Cerebras CS-1 using Polyhedral Compilation Techniques https://cerebras.net/wp-content/uploads/2021/04/IMPACT_2020_paper_3.pdf#new_tab Sun, 23 Feb 2020 05:03:52 +0000 https://cerebras.net/?p=103300 The Cerebras CS-1 is a computing system based on a wafer-scale processor having nearly 400,000 compute cores. It is intended for training of and inference on deep neural networks.…

A Templated C++ Interface for ISL https://8968533.fs1.hubspotusercontent-na1.net/hubfs/8968533/Whitepapers/A%20Templated%20C++%20Interface%20for%20isl.pdf#new_tab Thu, 20 Feb 2020 16:13:04 +0000 https://www.cerebras.net/?p=103708 Polyhedral libraries typically support only a very limited collection of types for representing objects, corresponding to broad mathematical classes such as sets, binary relations and functions. Software built on top of these libraries, on the other hand, needs to deal…

Online Normalization for Training Neural Networks https://papers.nips.cc/paper/2019/hash/cb3ce9b06932da6faaa7fc70d5b5d2f4-Abstract.html#new_tab Fri, 29 Nov 2019 23:15:48 +0000 https://cerebras.net/?p=103346 Online Normalization is a new technique for normalizing the hidden activations of a neural network. Like Batch Normalization, it normalizes the sample dimension. While Online Normalization does not use batches, it is as accurate as Batch Normalization.…

Online Normalization for Training Neural Networks, NeurIPS 2019 https://papers.nips.cc/paper/2019/hash/cb3ce9b06932da6faaa7fc70d5b5d2f4-Abstract.html#new_tab Thu, 16 May 2019 03:48:47 +0000 https://cerebras.net/?p=103296 Online Normalization is a new technique for normalizing the hidden activations of a neural network. Like Batch Normalization, it normalizes the sample dimension. While Online Normalization does not use batches, it is as accurate as Batch Normalization.…
