Neural Speed: Fast Inference on CPU for 4-bit Large Language Models

2 weeks ago
Anonymous $6hYC3Wwiad