Instructor
Integrate Orq.ai with Instructor for structured output observability using OpenTelemetry
Getting Started
Instructor enables structured outputs from language models using Pydantic schemas. Tracing Instructor with Orq.ai gives you insight into data extraction patterns, validation success rates, retry behavior, and structured-output latency, so you can optimize your LLM-powered data processing pipelines.
Prerequisites
Before you begin, ensure you have:
- An Orq.ai account and API Key
- Python 3.8+
- Instructor library installed in your project
- OpenAI API key (or other supported LLM provider credentials)
Install Dependencies
# Core Instructor and OpenTelemetry packages
pip install instructor openai opentelemetry-sdk opentelemetry-exporter-otlp
# OpenInference instrumentation for Instructor
pip install openinference-instrumentation-instructor
Configure Orq.ai
Set up your environment variables to connect to Orq.ai's OpenTelemetry collector:
Unix/Linux/macOS:
export OTEL_EXPORTER_OTLP_ENDPOINT="https://api.orq.ai/v2/otel"
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer <ORQ_API_KEY>"
export OTEL_RESOURCE_ATTRIBUTES="service.name=instructor-app,service.version=1.0.0"
export OPENAI_API_KEY="<YOUR_OPENAI_API_KEY>"
Windows (PowerShell):
$env:OTEL_EXPORTER_OTLP_ENDPOINT = "https://api.orq.ai/v2/otel"
$env:OTEL_EXPORTER_OTLP_HEADERS = "Authorization=Bearer <ORQ_API_KEY>"
$env:OTEL_RESOURCE_ATTRIBUTES = "service.name=instructor-app,service.version=1.0.0"
$env:OPENAI_API_KEY = "<YOUR_OPENAI_API_KEY>"
Using .env file:
OTEL_EXPORTER_OTLP_ENDPOINT=https://api.orq.ai/v2/otel
OTEL_EXPORTER_OTLP_HEADERS=Authorization=Bearer <ORQ_API_KEY>
OTEL_RESOURCE_ATTRIBUTES=service.name=instructor-app,service.version=1.0.0
OPENAI_API_KEY=<YOUR_OPENAI_API_KEY>
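OTEL_EXPORTER_OTLP_HEADERS holds comma-separated key=value pairs. Because the value itself ("Bearer <ORQ_API_KEY>") contains no "=", splitting on the first "=" recovers it intact, which is what the integration code below relies on. A minimal, stdlib-only sketch of that parsing (the helper name parse_otlp_headers is ours, not part of any SDK):

```python
# Hypothetical helper: parse an OTEL_EXPORTER_OTLP_HEADERS-style string
# ("key1=value1,key2=value2") into a dict usable as OTLP exporter headers.
def parse_otlp_headers(raw: str) -> dict:
    headers = {}
    for pair in raw.split(","):
        if "=" in pair:
            key, value = pair.split("=", 1)  # split only on the first '='
            headers[key.strip()] = value.strip()
    return headers

print(parse_otlp_headers("Authorization=Bearer my-key"))
# → {'Authorization': 'Bearer my-key'}
```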
Integration
Instructor uses OpenInference instrumentation for automatic OpenTelemetry tracing.
Set up the instrumentation in your application:
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
import os

# Configure tracer provider
tracer_provider = TracerProvider(
    resource=Resource({"service.name": "instructor-app"})
)

# Set up OTLP exporter
otlp_exporter = OTLPSpanExporter(
    endpoint=f"{os.getenv('OTEL_EXPORTER_OTLP_ENDPOINT')}/v1/traces",
    headers={"Authorization": os.getenv('OTEL_EXPORTER_OTLP_HEADERS').split('=', 1)[1]}
)
tracer_provider.add_span_processor(BatchSpanProcessor(otlp_exporter))

# Register the provider globally so manual tracer lookups use it as well
trace.set_tracer_provider(tracer_provider)

# Instrument Instructor
from openinference.instrumentation.instructor import InstructorInstrumentor
InstructorInstrumentor().instrument(tracer_provider=tracer_provider)
Use Instructor with automatic tracing:
import instructor
from pydantic import BaseModel
from openai import OpenAI
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
import os

# Configure OpenTelemetry
tracer_provider = TracerProvider(
    resource=Resource({"service.name": "instructor-app"})
)
otlp_exporter = OTLPSpanExporter(
    endpoint=f"{os.getenv('OTEL_EXPORTER_OTLP_ENDPOINT')}/v1/traces",
    headers={"Authorization": os.getenv('OTEL_EXPORTER_OTLP_HEADERS').split('=', 1)[1]}
)
tracer_provider.add_span_processor(BatchSpanProcessor(otlp_exporter))
trace.set_tracer_provider(tracer_provider)

# Instrument Instructor
from openinference.instrumentation.instructor import InstructorInstrumentor
InstructorInstrumentor().instrument(tracer_provider=tracer_provider)

# Define response schema
class UserInfo(BaseModel):
    name: str
    age: int
    occupation: str

# Create Instructor client
client = instructor.from_openai(OpenAI())

# Extract structured data (automatically traced)
user_info = client.chat.completions.create(
    model="gpt-4",
    response_model=UserInfo,
    messages=[{
        "role": "user",
        "content": "John Smith is a 32-year-old software engineer."
    }]
)

print(f"Extracted: {user_info.name}, {user_info.age}, {user_info.occupation}")
All Instructor structured output extractions will be automatically instrumented and exported to Orq.ai through the OTLP exporter. For more details, see Traces.
Advanced Examples
Complex Nested Schemas
import instructor
from pydantic import BaseModel, Field
from openai import OpenAI
from typing import List

# Setup done as shown in Integration section above

class Address(BaseModel):
    street: str
    city: str
    country: str
    postal_code: str

class OrderItem(BaseModel):
    product_name: str
    quantity: int
    price: float

class Invoice(BaseModel):
    invoice_number: str = Field(description="Unique invoice identifier")
    customer_name: str
    billing_address: Address
    items: List[OrderItem]
    subtotal: float
    tax: float
    total: float

client = instructor.from_openai(OpenAI())

# Extract complex nested data (automatically traced)
invoice_text = """
Invoice #INV-2024-001
Customer: Acme Corporation
Address: 123 Main St, New York, NY, USA, 10001
Items:
- Premium Widget x5 @ $50.00
- Standard Gadget x10 @ $25.00
Subtotal: $500.00
Tax: $40.00
Total: $540.00
"""

invoice = client.chat.completions.create(
    model="gpt-4",
    response_model=Invoice,
    messages=[{"role": "user", "content": f"Extract invoice data: {invoice_text}"}]
)

print(f"Invoice {invoice.invoice_number} for {invoice.customer_name}")
print(f"Total: ${invoice.total}")
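LLM-extracted numeric fields can be internally inconsistent, so it is worth cross-checking the arithmetic before trusting the totals. A stdlib-only sketch (totals_consistent is our hypothetical helper, not part of Instructor):

```python
# Hypothetical post-extraction check: verify that subtotal + tax matches the
# extracted total within a small tolerance for floating-point rounding.
def totals_consistent(subtotal: float, tax: float, total: float,
                      tol: float = 0.01) -> bool:
    return abs((subtotal + tax) - total) <= tol

print(totals_consistent(500.00, 40.00, 540.00))  # → True
```

If the check fails, you might re-run the extraction or flag the invoice for manual review.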
Batch Processing with Validation
import instructor
from pydantic import BaseModel, Field, field_validator
from openai import OpenAI
from typing import List

class ProductReview(BaseModel):
    reviewer_name: str
    rating: int = Field(ge=1, le=5, description="Rating from 1-5 stars")
    sentiment: str = Field(description="positive, negative, or neutral")
    key_points: List[str]

    @field_validator('sentiment')
    @classmethod
    def validate_sentiment(cls, v):
        if v.lower() not in ['positive', 'negative', 'neutral']:
            raise ValueError('Invalid sentiment')
        return v.lower()

client = instructor.from_openai(OpenAI())

reviews_text = [
    "Amazing product! Best purchase ever. 5 stars from John.",
    "Terrible quality. Broke after one day. Very disappointed. - Sarah",
    "It's okay, nothing special but works as expected. 3 stars - Mike"
]

extracted_reviews = []

# Process batch with validation (automatically traced)
for review in reviews_text:
    try:
        result = client.chat.completions.create(
            model="gpt-4",
            response_model=ProductReview,
            messages=[{"role": "user", "content": f"Extract review data: {review}"}]
        )
        extracted_reviews.append(result)
    except Exception as e:
        print(f"Failed to process review: {e}")

print(f"Successfully processed {len(extracted_reviews)} reviews")
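The loop above processes reviews sequentially. Since each extraction call is independent, larger batches could be parallelized with a thread pool; the sketch below uses a stand-in extract_review function where the real code would call client.chat.completions.create as shown above:

```python
from concurrent.futures import ThreadPoolExecutor

def extract_review(text: str) -> dict:
    # Stand-in for the client.chat.completions.create(...) call above;
    # each call is independent, so calls can run concurrently.
    return {"source": text}

reviews_text = ["review one", "review two", "review three"]

# pool.map preserves input order in its results
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(extract_review, reviews_text))

print(len(results))  # → 3
```

Each concurrent call is still traced individually, so the spans in Orq.ai will show the overlap.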
Retry with Custom Logic
import instructor
from pydantic import BaseModel, Field, field_validator
from openai import OpenAI
from datetime import datetime

class EventInfo(BaseModel):
    title: str = Field(min_length=5, max_length=100)
    date: str = Field(description="Date in YYYY-MM-DD format")
    time: str = Field(description="Time in HH:MM format")
    attendees: int = Field(ge=1, le=10000)

    @field_validator('date')
    @classmethod
    def validate_date(cls, v):
        try:
            datetime.strptime(v, '%Y-%m-%d')
            return v
        except ValueError:
            raise ValueError('Date must be in YYYY-MM-DD format')

client = instructor.from_openai(OpenAI())

event_description = """
AI in Healthcare Summit 2024
Date: April 15, 2024
Time: 9:00 AM
Expected Attendees: 500
"""

# Extract with automatic retries on validation errors (automatically traced)
event = client.chat.completions.create(
    model="gpt-4",
    response_model=EventInfo,
    messages=[{"role": "user", "content": f"Extract event info: {event_description}"}],
    max_retries=3
)

print(f"Event: {event.title} on {event.date} at {event.time}")
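Instructor's max_retries re-prompts the model when Pydantic validation fails. If you also want resilience against transient transport errors (timeouts, rate limits), a small wrapper can add exponential backoff around the whole call. This is a generic, stdlib-only sketch; with_retries is our name, not an Instructor API:

```python
import time

def with_retries(fn, max_retries=3, base_delay=0.5):
    """Call fn(), retrying on any exception with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...

# usage (sketch):
# event = with_retries(lambda: client.chat.completions.create(...))
```

Each attempt is traced separately, so retry storms are visible in Orq.ai.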
Instructor is also compatible with our AI Gateway; to learn more, see Instructor.