
Qwen3 8B Reranker

A performant reranker model

Model details


Example usage

Qwen3 8B Reranker is a prediction model with two output labels, "no" and "yes", which indicate whether a document matches a query.

This model is quantized to FP8 for deployment, which is supported by NVIDIA's newer GPUs such as the H100, H100 MIG 40GB, B200, and L4. Quantization is optional, but it improves efficiency.

The client library can be installed via pip; its source is available at:
https://github.com/basetenlabs/truss/tree/main/baseten-performance-client
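
For example (assuming the PyPI package name matches the repository directory above; check the repository README if the name differs):

pip install baseten-performance-client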

Alternatively, you may also use your own client code.
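
If you do, the deployment can be called over plain HTTP. Below is a minimal sketch using requests; the /predict route and the {"inputs": [...]} request body are assumptions based on TEI-style classification servers and are not confirmed on this page, so consult the repository above for the exact schema.

# Minimal sketch without the performance client.
# Assumptions (not confirmed here): the deployment exposes a TEI-style
# /predict route and accepts a JSON body of the form {"inputs": [...]}.
import os
import requests

model_id = "xxxxxxx"
base_url = f"https://model-{model_id}.api.baseten.co/environments/production/sync"

resp = requests.post(
    f"{base_url}/predict",  # assumed route
    headers={"Authorization": f"Api-Key {os.environ['BASETEN_API_KEY']}"},
    json={"inputs": ["<formatted query/document pair>"]},
)
resp.raise_for_status()
print(resp.json())  # expected: [{"score": ..., "label": ...}, ...]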

Input
import os
from baseten_performance_client import (
    PerformanceClient, ClassificationResponse
)

api_key = os.environ["BASETEN_API_KEY"]
model_id = "xxxxxxx"
base_url = f"https://model-{model_id}.api.baseten.co/environments/production/sync"

client = PerformanceClient(base_url=base_url, api_key=api_key)

prefix = "<|im_start|>system\nJudge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be \"yes\" or \"no\".<|im_end|>\n<|im_start|>user\n"
suffix = "<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n"

def format_instruction(instruction, query, doc):
    # Fall back to the default reranking instruction when none is given.
    if instruction is None:
        instruction = 'Given a web search query, retrieve relevant passages that answer the query'
    output = f"{prefix}<Instruct>: {instruction}\n<Query>: {query}\n<Document>: {doc}{suffix}"
    return output

texts_to_classify = [
    format_instruction(instruction=None, query="What is the capital of China?", doc="The capital of China is Beijing."),
    format_instruction(instruction=None, query="What is the capital of China?", doc="The capital of France is Paris.")
]

response: ClassificationResponse = client.classify(
    input=texts_to_classify,
    model="my_model",
    truncate=True,
    batch_size=16,
    max_concurrent_requests=32,
)
JSON output
[
    {
        "score": 0.9861514,
        "label": "yes"
    },
    {
        "score": 0.01384861,
        "label": "no"
    }
]
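
Each input receives one result object. A short post-processing sketch (assuming the response preserves input order and that "score" is the relevance probability, as the output above suggests) turns the scores into a ranking:

# Rank documents by relevance score, best first.
# Assumes the results are returned in the same order as the inputs and that
# "score" is the relevance ("yes") probability, per the example output above.
results = [
    {"score": 0.9861514, "label": "yes"},
    {"score": 0.01384861, "label": "no"},
]
docs = [
    "The capital of China is Beijing.",
    "The capital of France is Paris.",
]
ranked = sorted(zip(docs, results), key=lambda pair: pair[1]["score"], reverse=True)
for doc, item in ranked:
    print(f'{item["score"]:.4f} ({item["label"]})  {doc}')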
