Google Vertex AI - Orq.ai Documentation

Google Vertex AI provides enterprise-grade access to Gemini models with enhanced security, compliance, and control. Connecting Vertex AI to Orq.ai lets you call those models through the AI Gateway with service account authentication, project-level billing, and data residency controls.

Set Up an API Key

To use Vertex AI with Orq.ai, create a service account with appropriate permissions:

Create Service Account

Go to Google Cloud Console
Navigate to IAM & Admin > Service Accounts
Click Create Service Account
Enter a name (e.g., “orq-vertex-ai”)
Grant the following roles:
- Service Account Token Creator
- Vertex AI User
Click Create and Continue
Click Done

Create Service Account Key

Find the service account in the list
Click the Actions menu (three dots)
Select Manage Keys
Click Add Key > Create New Key
Select JSON format
Click Create to download the key file

Configure in Orq.ai

Navigate to AI Gateway > BYOK
Find Google Vertex AI in the list
Click the Configure button
Select Setup your own API Key
Enter configuration name (e.g., “Vertex AI Production”)
Paste the service account JSON in the Deployment JSON field (see format below)
Click Save to complete the setup

Deployment JSON Format

The deployment JSON must include the service account credentials, project ID, and region:

{
  "projectId": "my-project-123456",
  "location": "us-central1",
  "serviceAccount": {
    "type": "service_account",
    "project_id": "my-project-123456",
    "private_key_id": "afd17083ecd5184b5ca880e70eb84c2e4c382f14",
    "private_key": "-----BEGIN PRIVATE KEY-----\n...=\n-----END PRIVATE KEY-----\n",
    "client_email": "vertex-ai@my-project-123456.iam.gserviceaccount.com",
    "client_id": "000000000000000000000",
    "auth_uri": "https://accounts.google.com/o/oauth2/auth",
    "token_uri": "https://oauth2.googleapis.com/token",
    "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
    "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/vertex-ai%40my-project-123456.iam.gserviceaccount.com",
    "universe_domain": "googleapis.com"
  }
}

Project ID: Find the Google Cloud Project ID at the top of the Google Cloud Console.

Location: Common regions include us-central1, europe-west1, and asia-northeast1. Choose based on data residency requirements.
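
As a rough sketch of how these pieces fit together, the snippet below wraps the contents of a downloaded service account key in the deployment JSON shape shown above. The function name `build_deployment_json` and the list of required fields are assumptions made for illustration, not an Orq.ai validation rule; the field names `projectId`, `location`, and `serviceAccount` come from the format above.

```python
import json

# Fields this sketch expects in a service account key file. This list is an
# assumption based on the example above, not an official requirement.
REQUIRED_KEY_FIELDS = ("type", "project_id", "private_key", "client_email")


def build_deployment_json(service_account: dict, location: str) -> str:
    """Wrap service account credentials in the deployment JSON shape."""
    missing = [f for f in REQUIRED_KEY_FIELDS if f not in service_account]
    if missing:
        raise ValueError(f"service account key is missing fields: {missing}")
    if service_account["type"] != "service_account":
        raise ValueError("expected a key of type 'service_account'")
    deployment = {
        # Reuse the project ID embedded in the key file.
        "projectId": service_account["project_id"],
        "location": location,
        "serviceAccount": service_account,
    }
    return json.dumps(deployment, indent=2)
```

To use it, load the downloaded key file with `json.load`, pass the resulting dictionary together with the chosen region (for example `us-central1`), and paste the returned string into the Deployment JSON field.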

Available Models

The AI Gateway supports all current Vertex AI Gemini models. Here are the most commonly used:

Recommended Models

Model	Context	Best For
`google/gemini-2.5-pro-preview`	1M	Latest preview, most advanced
`google/gemini-2.5-pro`	1M	Latest stable, most capable
`google/gemini-2.5-flash`	1M	Fast, balanced performance
`google/gemini-2.0-flash-001`	1M	Stable, reliable

For a complete and up-to-date list of all available Vertex AI models, see Supported Models.

Use google/gemini-2.5-pro for the latest stable model, or google/gemini-2.5-flash for the best balance of performance and cost.

Quick Start

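Access Vertex AI Gemini models through the AI Gateway. The examples below call the Responses API first (cURL, TypeScript, Python), then the Chat Completions API (TypeScript, Python).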

curl -X POST https://api.orq.ai/v3/router/responses \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-2.5-pro",
    "input": "Explain quantum computing in simple terms"
  }'

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const response = await client.responses.create({
  model: "google/gemini-2.5-pro",
  input: "Explain quantum computing in simple terms",
});

console.log(response.output_text);

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ.get("ORQ_API_KEY"),
    base_url="https://api.orq.ai/v3/router",
)

response = client.responses.create(
    model="google/gemini-2.5-pro",
    input="Explain quantum computing in simple terms",
)

print(response.output_text)

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const response = await client.chat.completions.create({
  model: "google/gemini-2.5-pro",
  messages: [{ role: "user", content: "Explain quantum computing in simple terms" }],
});

console.log(response.choices[0].message.content);

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ.get("ORQ_API_KEY"),
    base_url="https://api.orq.ai/v3/router",
)

response = client.chat.completions.create(
    model="google/gemini-2.5-pro",
    messages=[{"role": "user", "content": "Explain quantum computing in simple terms"}],
)

print(response.choices[0].message.content)

Using the AI Gateway

Access Vertex AI Gemini models through the AI Gateway with enterprise-grade security, advanced chat completions, streaming, and intelligent model routing. All Vertex AI models are available with consistent formatting and automatic request logging.

Vertex AI models use the provider slug format: google/model-name. For example: google/gemini-2.5-pro
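
A minimal sketch of checking that format before sending a request. The helper name `split_model_slug` is hypothetical and not part of any SDK.

```python
def split_model_slug(slug: str) -> tuple:
    """Split 'provider/model-name' into (provider, model_name)."""
    provider, sep, model_name = slug.partition("/")
    # Reject slugs with no separator or with an empty half.
    if not sep or not provider or not model_name:
        raise ValueError(f"expected 'provider/model-name', got {slug!r}")
    return provider, model_name


print(split_model_slug("google/gemini-2.5-pro"))
```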

Prerequisites

Before making requests to the AI Gateway, configure the environment and install the required SDKs.

Endpoint

POST https://api.orq.ai/v3/router/responses

Required Headers

Include the following headers in all requests:

Authorization: Bearer $ORQ_API_KEY
Content-Type: application/json

Getting an API Key:

Go to API Keys
Click Create API Key and copy it
Store it in your environment as ORQ_API_KEY
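
For cases where no SDK is available, the endpoint and headers above can be assembled with the Python standard library alone. This is a sketch under assumptions: the function name `build_gateway_request` is hypothetical, and the request is built but not sent.

```python
import json
import os
import urllib.request
from typing import Optional

GATEWAY_URL = "https://api.orq.ai/v3/router/responses"


def build_gateway_request(payload: dict, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request with the required headers."""
    # Fall back to the ORQ_API_KEY environment variable when no key is passed.
    key = api_key or os.environ.get("ORQ_API_KEY")
    if not key:
        raise RuntimeError("ORQ_API_KEY is not set")
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Failing early on a missing key keeps an unset environment variable from surfacing later as an authentication error.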

SDK Installation

Install the OpenAI SDK for your language (compatible with Vertex AI models):

npm install openai
# or
yarn add openai

pip install openai

If you already have working OpenAI SDK code, change only two settings: point base_url at the AI Gateway endpoint (https://api.orq.ai/v3/router) and set api_key to your ORQ_API_KEY.
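
One way to keep that change in a single place is a small helper that returns the two client settings. The name `gateway_client_kwargs` is hypothetical; the base URL and the environment variable name are the ones used throughout this page.

```python
import os

ORQ_BASE_URL = "https://api.orq.ai/v3/router"


def gateway_client_kwargs(env=os.environ) -> dict:
    """Return the two OpenAI client settings that change for the AI Gateway."""
    api_key = env.get("ORQ_API_KEY")
    if not api_key:
        raise RuntimeError("ORQ_API_KEY is not set")
    return {"api_key": api_key, "base_url": ORQ_BASE_URL}
```

With the OpenAI SDK installed, the client would then be created as `OpenAI(**gateway_client_kwargs())`, leaving the rest of the existing code unchanged.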

Basic Usage

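Send messages to Vertex AI Gemini models, optionally with a system instruction. The examples below use the Responses API first (cURL, TypeScript, Python), then the Chat Completions API in the same order: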

curl -X POST https://api.orq.ai/v3/router/responses \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-2.5-pro",
    "instructions": "You are a helpful assistant that explains complex concepts simply.",
    "input": "Explain machine learning"
  }'

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const response = await client.responses.create({
  model: "google/gemini-2.5-pro",
  instructions: "You are a helpful assistant that explains complex concepts simply.",
  input: "Explain machine learning",
});

console.log(response.output_text);

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ.get("ORQ_API_KEY"),
    base_url="https://api.orq.ai/v3/router",
)

response = client.responses.create(
    model="google/gemini-2.5-pro",
    instructions="You are a helpful assistant that explains complex concepts simply.",
    input="Explain machine learning",
)

print(response.output_text)

curl -X POST https://api.orq.ai/v3/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-2.5-pro",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant that explains complex concepts simply."},
      {"role": "user", "content": "Explain machine learning"}
    ],
    "temperature": 0.7,
    "max_tokens": 500
  }'

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const response = await client.chat.completions.create({
  model: "google/gemini-2.5-pro",
  messages: [
    {
      role: "system",
      content: "You are a helpful assistant that explains complex concepts simply.",
    },
    { role: "user", content: "Explain machine learning" },
  ],
  temperature: 0.7,
  max_tokens: 500,
});

console.log(response.choices[0].message.content);

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ.get("ORQ_API_KEY"),
    base_url="https://api.orq.ai/v3/router",
)

response = client.chat.completions.create(
    model="google/gemini-2.5-pro",
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant that explains complex concepts simply.",
        },
        {"role": "user", "content": "Explain machine learning"},
    ],
    temperature=0.7,
    max_tokens=500,
)

print(response.choices[0].message.content)

Streaming

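Stream responses for real-time output and improved user experience. The examples below use the Responses API first (cURL, TypeScript, Python), then the Chat Completions API (TypeScript, Python):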

curl -X POST https://api.orq.ai/v3/router/responses \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-2.5-pro",
    "input": "Write a short poem about the ocean",
    "stream": true
  }'

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const stream = await client.responses.create({
  model: "google/gemini-2.5-pro",
  input: "Write a short poem about the ocean",
  stream: true,
});

for await (const event of stream) {
  if (event.type === "response.output_text.delta") {
    process.stdout.write(event.delta);
  }
}

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ.get("ORQ_API_KEY"),
    base_url="https://api.orq.ai/v3/router",
)

stream = client.responses.create(
    model="google/gemini-2.5-pro",
    input="Write a short poem about the ocean",
    stream=True,
)

for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const stream = await client.chat.completions.create({
  model: "google/gemini-2.5-pro",
  messages: [{ role: "user", content: "Write a short poem about the ocean" }],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content || "";
  process.stdout.write(content);
}

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ.get("ORQ_API_KEY"),
    base_url="https://api.orq.ai/v3/router",
)

stream = client.chat.completions.create(
    model="google/gemini-2.5-pro",
    messages=[{"role": "user", "content": "Write a short poem about the ocean"}],
    stream=True,
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
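
When the full text is also needed after streaming (for logging or caching, say), the deltas can be accumulated as they arrive. The sketch below does this for Responses API events. The helper name `collect_output_text` is hypothetical, and the `SimpleNamespace` objects are stand-ins for real stream events so the example runs without a network call.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_output_text(events: Iterable) -> str:
    """Join the text deltas of a Responses API stream into one string."""
    parts = []
    for event in events:
        # Only text delta events carry output text; other event types are skipped.
        if getattr(event, "type", None) == "response.output_text.delta":
            parts.append(event.delta)
    return "".join(parts)


# Stand-in events for illustration; a real stream would come from
# client.responses.create(..., stream=True).
fake_stream = [
    SimpleNamespace(type="response.created"),
    SimpleNamespace(type="response.output_text.delta", delta="Blue "),
    SimpleNamespace(type="response.output_text.delta", delta="waves"),
    SimpleNamespace(type="response.completed"),
]
print(collect_output_text(fake_stream))  # prints "Blue waves"
```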

Function Calling

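Vertex AI Gemini models support function calling for structured interactions. The examples below use the Responses API first (cURL, TypeScript, Python), then the Chat Completions API in the same order. In the Responses API, name, description, and parameters sit at the top level of each tool; in Chat Completions they are nested under function: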

curl -X POST https://api.orq.ai/v3/router/responses \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-2.5-pro",
    "input": "What is the weather in San Francisco?",
    "tools": [{
      "type": "function",
      "name": "get_weather",
      "description": "Get the current weather in a location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": { "type": "string", "description": "The city and state, e.g. San Francisco, CA" },
          "unit": { "type": "string", "enum": ["celsius", "fahrenheit"] }
        },
        "required": ["location"]
      }
    }]
  }'

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const response = await client.responses.create({
  model: "google/gemini-2.5-pro",
  input: "What's the weather in San Francisco?",
  tools: [
    {
      type: "function",
      name: "get_weather",
      description: "Get the current weather in a location",
      parameters: {
        type: "object",
        properties: {
          location: {
            type: "string",
            description: "The city and state, e.g. San Francisco, CA",
          },
          unit: { type: "string", enum: ["celsius", "fahrenheit"] },
        },
        required: ["location"],
      },
    },
  ],
});

const toolCall = response.output.find((item) => item.type === "function_call");
if (toolCall && toolCall.type === "function_call") {
  console.log(`Calling function: ${toolCall.name}`);
  console.log(`Arguments: ${toolCall.arguments}`);
}

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ.get("ORQ_API_KEY"),
    base_url="https://api.orq.ai/v3/router",
)

response = client.responses.create(
    model="google/gemini-2.5-pro",
    input="What's the weather in San Francisco?",
    tools=[
        {
            "type": "function",
            "name": "get_weather",
            "description": "Get the current weather in a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        }
    ],
)

tool_call = next((item for item in response.output if item.type == "function_call"), None)
if tool_call:
    print(f"Calling function: {tool_call.name}")
    print(f"Arguments: {tool_call.arguments}")

curl -X POST https://api.orq.ai/v3/router/chat/completions \
  -H "Authorization: Bearer $ORQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-2.5-pro",
    "messages": [{ "role": "user", "content": "What is the weather in San Francisco?" }],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather in a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": { "type": "string", "description": "The city and state, e.g. San Francisco, CA" },
            "unit": { "type": "string", "enum": ["celsius", "fahrenheit"] }
          },
          "required": ["location"]
        }
      }
    }]
  }'

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ORQ_API_KEY,
  baseURL: "https://api.orq.ai/v3/router",
});

const response = await client.chat.completions.create({
  model: "google/gemini-2.5-pro",
  messages: [{ role: "user", content: "What's the weather in San Francisco?" }],
  tools: [
    {
      type: "function",
      function: {
        name: "get_weather",
        description: "Get the current weather in a location",
        parameters: {
          type: "object",
          properties: {
            location: {
              type: "string",
              description: "The city and state, e.g. San Francisco, CA",
            },
            unit: { type: "string", enum: ["celsius", "fahrenheit"] },
          },
          required: ["location"],
        },
      },
    },
  ],
});

const choice = response.choices[0];
if (choice.finish_reason === "tool_calls" && choice.message.tool_calls) {
  const toolCall = choice.message.tool_calls[0];
  console.log(`Calling function: ${toolCall.function.name}`);
  console.log(`Arguments: ${toolCall.function.arguments}`);
}

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ.get("ORQ_API_KEY"),
    base_url="https://api.orq.ai/v3/router",
)

response = client.chat.completions.create(
    model="google/gemini-2.5-pro",
    messages=[{"role": "user", "content": "What's the weather in San Francisco?"}],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather in a location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA",
                        },
                        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                    },
                    "required": ["location"],
                },
            },
        }
    ],
)

choice = response.choices[0]
if choice.finish_reason == "tool_calls" and choice.message.tool_calls:
    tool_call = choice.message.tool_calls[0]
    print(f"Calling function: {tool_call.function.name}")
    print(f"Arguments: {tool_call.function.arguments}")

Automatic Request Logging

All requests made through the AI Gateway are automatically logged to the dashboard. The dashboard shows:

Request details: Model used, tokens, latency
Cost tracking: Per-request and aggregate costs
Error monitoring: Failed requests with error messages
Performance metrics: Response times and throughput

No additional configuration is needed. Logging happens automatically.
