Justin Yi, Software Engineer
Model performance: How we built production-ready speculative decoding with TensorRT-LLM (Pankaj Gupta and 2 others)
Model performance: A quick introduction to speculative decoding (Pankaj Gupta and 2 others)
News: Introducing our Speculative Decoding Engine Builder integration for ultra-low-latency LLM inference (Justin Yi and 3 others)
Model performance: Benchmarking fast Mistral 7B inference (Abu Qader and 3 others)
Model performance: High performance ML inference with NVIDIA TensorRT (Justin Yi and 1 other)
Model performance: 40% faster Stable Diffusion XL inference with NVIDIA TensorRT (Pankaj Gupta and 2 others)
AI engineering: Build with OpenAI’s Whisper model in five minutes (Justin Yi)