Integrations

Use OVHcloud AI Endpoints models with our partner integrations. Feel free to send us a message on our Discord if you have any ideas, suggestions or requests.

Information

To create an API key, you can navigate to the OVHcloud Control Panel, in Public Cloud > AI Endpoints > API keys.

LiteLLM

LiteLLM is a Python library that simplifies working with Large Language Models (LLMs) by providing a unified interface across different AI providers.

Install LiteLLM via pip:

pip install litellm

Basic Usage

The recommended way to configure your API key is through an environment variable:

import os

# Set your API key via environment variable
os.environ['OVHCLOUD_API_KEY'] = "your-api-key"

Here's a simple usage example:

from litellm import completion

response = completion(
    model="ovhcloud/Meta-Llama-3_3-70B-Instruct",
    messages=[
        {
            "role": "user",
            "content": "What's the capital of France?"
        }
    ],
    max_tokens=100,
    temperature=0.7
)

print(response.choices[0].message.content)

Advanced Features

Response Streaming

For applications requiring real-time responses, use streaming:

from litellm import completion

response = completion(
    model="ovhcloud/Meta-Llama-3_3-70B-Instruct",
    messages=[
        {
            "role": "user",
            "content": "Write me a short story about a robot learning to cook."
        }
    ],
    max_tokens=500,
    temperature=0.8,
    stream=True  # Enable streaming
)

# Progressive display of the response
for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end='', flush=True)
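
If you also need the complete text once streaming finishes (for logging or storage), you can accumulate the chunks while printing them; a minimal variant of the loop above:

full_response = ""
for chunk in response:
    delta = chunk.choices[0].delta.content
    if delta:
        full_response += delta
        print(delta, end='', flush=True)

# full_response now holds the complete generated text
print("\n\nTotal characters generated:", len(full_response))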

Function Calling (or Tool Calling)

LiteLLM supports function calling with AI Endpoints compatible models:

from litellm import completion
import json

def get_current_weather(location, unit="celsius"):
    """Simulated function to get the weather"""
    if unit == "celsius":
        return {"location": location, "temperature": "22", "unit": "celsius"}
    else:
        return {"location": location, "temperature": "72", "unit": "fahrenheit"}

# Define available tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and country, e.g. Paris, France"
                    },
                    "unit": {
                        "type": "string", 
                        "enum": ["celsius", "fahrenheit"]
                    }
                },
                "required": ["location"]
            }
        }
    }
]

# First call to get the tool usage decision
response = completion(
    model="ovhcloud/Meta-Llama-3_3-70B-Instruct",
    messages=[{"role": "user", "content": "What's the weather like in Paris?"}],
    tools=tools,
    tool_choice="auto"
)

# Process tool calls
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    function_args = json.loads(tool_call.function.arguments)
    
    # Execute the function
    result = get_current_weather(
        location=function_args.get("location"),
        unit=function_args.get("unit", "celsius")
    )
    
    print(f"Tool result: {result}")

Vision and Image Analysis

For models supporting vision capabilities:

from base64 import b64encode
from mimetypes import guess_type
import litellm

def encode_image(file_path):
    """Encode an image to base64 for the API"""
    mime_type, _ = guess_type(file_path)
    if mime_type is None:
        raise ValueError("Could not determine MIME type of the file")
    
    with open(file_path, "rb") as image_file:
        encoded_string = b64encode(image_file.read()).decode("utf-8")
        data_url = f"data:{mime_type};base64,{encoded_string}"
        return data_url

# Image analysis
response = litellm.completion(
    model="ovhcloud/Mistral-Small-3.2-24B-Instruct-2506",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What do you see in this image?"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": encode_image("my_image.jpg"),
                        "format": "image/jpeg"
                    }
                }
            ]
        }
    ],
    stream=False
)

print(response.choices[0].message.content)
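
If the image is already hosted online, OpenAI-compatible APIs (and LiteLLM) generally also accept a plain URL instead of a base64 data URL; a minimal variant (the URL below is purely illustrative):

response = litellm.completion(
    model="ovhcloud/Mistral-Small-3.2-24B-Instruct-2506",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {
                    "type": "image_url",
                    # Hypothetical URL: replace with your own hosted image
                    "image_url": {"url": "https://example.com/my_image.jpg"}
                }
            ]
        }
    ]
)

print(response.choices[0].message.content)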

Structured Output (JSON Schema)

To get responses in a structured format:

from litellm import completion

response = completion(
    model="ovhcloud/Meta-Llama-3_3-70B-Instruct",
    messages=[
        {
            "role": "system",
            "content": "You are a specialist in extracting structured data from unstructured text."
        },
        {
            "role": "user",
            "content": "Room 12 contains books, a desk, and a lamp."
        }
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "title": "extracted_data",
            "name": "data_extraction",
            "schema": {
                "type": "object",
                "properties": {
                    "room": {"type": "string"},
                    "items": {
                        "type": "array",
                        "items": {"type": "string"}
                    }
                },
                "required": ["room", "items"],
                "additionalProperties": False
            },
            "strict": False
        }
    }
)

print(response.choices[0].message.content)
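
The structured answer arrives as a JSON string in the message content, so you can parse it before using it; a minimal sketch:

import json

extracted = json.loads(response.choices[0].message.content)

print(extracted["room"])   # e.g. "12"
print(extracted["items"])  # e.g. ["books", "a desk", "a lamp"]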

Embeddings

To generate embeddings with compatible models:

from litellm import embedding

response = embedding(
    model="ovhcloud/BGE-M3",
    input=["sample text to embed", "another sample text to embed"]
)

print(response.data)
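
A common next step is comparing embeddings, for example with cosine similarity; a minimal sketch assuming each item in response.data exposes its vector under the embedding key:

import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors"""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Depending on the LiteLLM version, items may be dicts or objects; adapt if needed
vectors = [item["embedding"] for item in response.data]
print(cosine_similarity(vectors[0], vectors[1]))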

Using LiteLLM Proxy Server

Proxy Server Configuration

For production deployments, you can use the LiteLLM proxy server:

Install LiteLLM proxy:

pip install 'litellm[proxy]'

Create a config.yaml file:

# config.yaml
model_list:
  - model_name: my-llama
    litellm_params:
      model: ovhcloud/Meta-Llama-3_3-70B-Instruct
      api_key: your-ovh-api-key
      
  - model_name: my-mistral
    litellm_params:
      model: ovhcloud/Mistral-Small-3.2-24B-Instruct-2506
      api_key: your-ovh-api-key

  - model_name: my-embedding
    litellm_params:
      model: ovhcloud/BGE-M3
      api_key: your-ovh-api-key
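
Tip: rather than hard-coding keys, LiteLLM's config supports referencing environment variables, for example api_key: os.environ/OVHCLOUD_API_KEY.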

Start the proxy server:

litellm --config /path/to/config.yaml --port 4000

The proxy server is live with our models!

Using the Proxy

Once the proxy is running, use it like a standard OpenAI API:

import openai

client = openai.OpenAI(
    api_key="sk-1234",  # LiteLLM proxy key
    base_url="http://localhost:4000"  # Proxy URL
)

response = client.chat.completions.create(
    model="my-llama",
    messages=[
        {
            "role": "user",
            "content": "What is OVHcloud?"
        }
    ]
)

print(response.choices[0].message.content)
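
The other model aliases from config.yaml work the same way; for example, the embedding model can be queried through the standard OpenAI embeddings endpoint:

embedding_response = client.embeddings.create(
    model="my-embedding",
    input=["sample text to embed"]
)

print(embedding_response.data[0].embedding)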

Pydantic AI

Pydantic AI is a Python agent framework designed to help you quickly, confidently, and painlessly build production-grade applications and workflows with Generative AI.

Pydantic AI is available on PyPI as pydantic-ai, so installation is as simple as:

pip install pydantic-ai

You can then set the OVHCLOUD_API_KEY environment variable and refer to the provider by name in the model string:

from pydantic_ai import Agent

agent = Agent('ovhcloud:gpt-oss-120b')
result = agent.run_sync('What is the capital of France?')
print(result.output)
#> The capital of France is Paris.

If you need to configure the provider, you can use the OVHcloudProvider class:

from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIChatModel
from pydantic_ai.providers.ovhcloud import OVHcloudProvider

model = OpenAIChatModel(
    'gpt-oss-120b',
    provider=OVHcloudProvider(api_key='your-api-key'),
)
agent = Agent(model)
result = agent.run_sync('What is the capital of France?')
print(result.output)
#> The capital of France is Paris.
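
A key feature of Pydantic AI is typed, validated output. A minimal sketch, assuming a recent pydantic-ai version where Agent accepts output_type (the CityInfo model is purely illustrative):

from pydantic import BaseModel
from pydantic_ai import Agent

class CityInfo(BaseModel):
    """Illustrative output schema"""
    city: str
    country: str

agent = Agent('ovhcloud:gpt-oss-120b', output_type=CityInfo)
result = agent.run_sync('What is the capital of France?')
print(result.output)
#> city='Paris' country='France'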

Kilo Code

Kilo Code accelerates development with AI-driven code generation and task automation. This open source extension plugs directly into VS Code.

First, go to your IDE marketplace and search for "Kilo Code".

Once Kilo Code is installed, open the extension and click Use your own API key. In the API Provider list, search for OVHcloud AI Endpoints and enter your OVHcloud AI Endpoints API key.

You can then select the model of your choice from the drop-down list, which automatically fetches the latest available models.

Apache Airflow Provider

This package provides Apache Airflow integration with OVHcloud AI products, in particular AI Endpoints.

Installation

pip install apache-airflow-provider-ovhcloud-ai

Configuration

Create an Airflow Connection

Navigate to Admin > Connections in the Airflow UI and create a new connection:

  • Connection Id: ovh_ai_endpoints_default (or your custom name)
  • Connection Type: generic
  • Password: Your OVHcloud AI Endpoints API token

Alternatively, use the Airflow CLI:

airflow connections add ovh_ai_endpoints_default \
    --conn-type generic \
    --conn-password your-api-token-here

Or set via environment variable:

export AIRFLOW_CONN_OVH_AI_ENDPOINTS_DEFAULT='{"password":"your-api-token-here"}'

Usage

Chat Completion Example

from airflow import DAG
from apache_airflow_provider_ovhcloud_ai.operators.ai_endpoints import OVHCloudAIEndpointsChatCompletionsOperator
from datetime import datetime

with DAG(
    dag_id='ovh_ai_chat_example',
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    
    chat_task = OVHCloudAIEndpointsChatCompletionsOperator(
        task_id='generate_response',
        model='gpt-oss-120b',
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain what Apache Airflow is in one sentence."}
        ],
        temperature=0.7,
        max_tokens=100,
    )
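
If a downstream task needs the generated text, it can usually be pulled from XCom; a minimal sketch added inside the same with DAG block, assuming the operator pushes its result as the task's return value (check the provider documentation for the exact payload shape):

    from airflow.operators.python import PythonOperator

    def print_answer(**context):
        # Assumption: the chat operator pushes its result to XCom as return_value;
        # the exact structure may differ between provider versions
        result = context['ti'].xcom_pull(task_ids='generate_response')
        print(result)

    show_task = PythonOperator(
        task_id='show_response',
        python_callable=print_answer,
    )

    chat_task >> show_task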

Embedding Example

from airflow import DAG
from apache_airflow_provider_ovhcloud_ai.operators.ai_endpoints import OVHCloudAIEndpointsEmbeddingOperator
from datetime import datetime

with DAG(
    dag_id='ovh_ai_embedding_example',
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    
    embed_task = OVHCloudAIEndpointsEmbeddingOperator(
        task_id='create_embeddings',
        model='BGE-M3',
        input=[
            "Apache Airflow is a workflow orchestration tool",
            "OVHcloud provides cloud computing services"
        ],
    )

Using the Hook Directly

from apache_airflow_provider_ovhcloud_ai.hooks.ai_endpoints import OVHCloudAIEndpointsHook

def my_custom_function(**context):
    hook = OVHCloudAIEndpointsHook(ovh_conn_id='ovh_ai_endpoints_default')
    
    # Chat completion
    response = hook.chat_completion(
        model='gpt-oss-120b',
        messages=[{"role": "user", "content": "Hello!"}],
        temperature=0.8,
    )
    
    print(response['choices'][0]['message']['content'])
    
    # Create embeddings
    embeddings = hook.create_embedding(
        model='BGE-M3',
        input="Text to embed"
    )
    
    print(embeddings['data'][0]['embedding'])
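
To run this function as an Airflow task, you can wrap it in a standard PythonOperator (or the TaskFlow @task decorator); a minimal sketch:

from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime

with DAG(
    dag_id='ovh_ai_hook_example',
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:

    hook_task = PythonOperator(
        task_id='call_ai_endpoints',
        python_callable=my_custom_function,
    )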

Dynamic Templating

Operators support Jinja templating for dynamic values:

from airflow import DAG
from apache_airflow_provider_ovhcloud_ai.operators.ai_endpoints import OVHCloudAIEndpointsChatCompletionsOperator
from datetime import datetime

with DAG(
    dag_id='ovh_ai_templating_example',
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    
    chat_task = OVHCloudAIEndpointsChatCompletionsOperator(
        task_id='templated_chat',
        model='{{ var.value.llm_model }}',  # From Airflow Variables
        messages=[
            {
                "role": "user", 
                "content": "Process data from {{ ds }}"  # Execution date
            }
        ],
    )