JetMoE: Pre-training an 8B LLM Better than Llama 2 7B

2 years ago
Anonymous $6hYC3Wwiad