Abu Qader

Software Engineer

News

Introducing our Speculative Decoding Engine Builder integration for ultra-low-latency LLM inference

Rachel Rapp

Bryce Dubayah

Abu Qader

Justin Yi

3 others

Speculative Decoding in Engine Builder

Model performance

How to double tokens per second for Llama 3 with Medusa

Philip Kiely

Abu Qader

1 other

Double Llama TPS with Medusa

News

Introducing automatic LLM optimization with TensorRT-LLM Engine Builder

Philip Kiely

Abu Qader

1 other

TensorRT-LLM Engine Creation

Model performance

Benchmarking fast Mistral 7B inference

Philip Kiely

Pankaj Gupta

Abu Qader

3 others

Mistral 7B

Model performance

Introduction to quantizing ML models

Philip Kiely

Abu Qader

1 other

Quantization
