Creating Sparse GPT-3 Models with Iterative Pruning

We have trained extremely sparse 1.3B-parameter GPT-3 models on the Cerebras CS-2 system, including an 83.8% sparse…

