Instructor

Integrate Orq.ai with Instructor for structured output observability using OpenTelemetry

Getting Started

Instructor enables structured outputs from language models using Pydantic schemas. Tracing Instructor with Orq.ai gives you insight into data extraction patterns, validation success rates, retry behavior, and structured output performance, helping you optimize your LLM-powered data processing pipelines.

Prerequisites

Before you begin, ensure you have:

  • An Orq.ai account and API Key
  • Python 3.8+
  • Instructor library installed in your project
  • OpenAI API key (or other supported LLM provider credentials)

Install Dependencies

# Core Instructor and OpenTelemetry packages
pip install instructor openai opentelemetry-sdk opentelemetry-exporter-otlp

# OpenInference instrumentation for Instructor
pip install openinference-instrumentation-instructor
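
To confirm the installation, you can try importing both packages (a quick sanity check, not a required step):

# Quick sanity check: both packages should import without errors
import instructor
import openinference.instrumentation.instructor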

Configure Orq.ai

Set up your environment variables to connect to Orq.ai's OpenTelemetry collector:

Unix/Linux/macOS:

export OTEL_EXPORTER_OTLP_ENDPOINT="https://api.orq.ai/v2/otel"
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer <ORQ_API_KEY>"
export OTEL_RESOURCE_ATTRIBUTES="service.name=instructor-app,service.version=1.0.0"
export OPENAI_API_KEY="<YOUR_OPENAI_API_KEY>"

Windows (PowerShell):

$env:OTEL_EXPORTER_OTLP_ENDPOINT = "https://api.orq.ai/v2/otel"
$env:OTEL_EXPORTER_OTLP_HEADERS = "Authorization=Bearer <ORQ_API_KEY>"
$env:OTEL_RESOURCE_ATTRIBUTES = "service.name=instructor-app,service.version=1.0.0"
$env:OPENAI_API_KEY = "<YOUR_OPENAI_API_KEY>"

Using .env file:

OTEL_EXPORTER_OTLP_ENDPOINT=https://api.orq.ai/v2/otel
OTEL_EXPORTER_OTLP_HEADERS=Authorization=Bearer <ORQ_API_KEY>
OTEL_RESOURCE_ATTRIBUTES=service.name=instructor-app,service.version=1.0.0
OPENAI_API_KEY=<YOUR_OPENAI_API_KEY>
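
Python does not read a .env file automatically. Below is a minimal sketch that loads it with the python-dotenv package (an extra dependency, not part of the install step above) before any exporter reads the variables:

# pip install python-dotenv
from dotenv import load_dotenv

# Populate os.environ from .env before configuring OpenTelemetry
load_dotenv()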

Integration

Instructor uses OpenInference instrumentation for automatic OpenTelemetry tracing.

Set up the instrumentation in your application:

from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
import os

# Configure tracer provider
tracer_provider = TracerProvider(
    resource=Resource({"service.name": "instructor-app"})
)

# Set up OTLP exporter; the headers variable holds "Authorization=Bearer <key>",
# so split on the first "=" to recover the header value
otlp_exporter = OTLPSpanExporter(
    endpoint=f"{os.getenv('OTEL_EXPORTER_OTLP_ENDPOINT')}/v1/traces",
    headers={"Authorization": os.getenv('OTEL_EXPORTER_OTLP_HEADERS').split('=', 1)[1]}
)

tracer_provider.add_span_processor(BatchSpanProcessor(otlp_exporter))

# Instrument Instructor
from openinference.instrumentation.instructor import InstructorInstrumentor

InstructorInstrumentor().instrument(tracer_provider=tracer_provider)
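
If you also want spans for the underlying OpenAI requests that Instructor makes, the companion OpenInference OpenAI instrumentor can run alongside it. A sketch, assuming the separate openinference-instrumentation-openai package is installed:

# pip install openinference-instrumentation-openai
from openinference.instrumentation.openai import OpenAIInstrumentor

# Capture the raw OpenAI calls beneath each Instructor extraction
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)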

Use Instructor with automatic tracing:

import instructor
from pydantic import BaseModel
from openai import OpenAI

# OpenTelemetry setup and Instructor instrumentation as shown above

# Define response schema
class UserInfo(BaseModel):
    name: str
    age: int
    occupation: str

# Create Instructor client
client = instructor.from_openai(OpenAI())

# Extract structured data (automatically traced)
user_info = client.chat.completions.create(
    model="gpt-4",
    response_model=UserInfo,
    messages=[{
        "role": "user",
        "content": "John Smith is a 32-year-old software engineer."
    }]
)

print(f"Extracted: {user_info.name}, {user_info.age}, {user_info.occupation}")

All Instructor structured output extractions will be automatically instrumented and exported to Orq.ai through the OTLP exporter. For more details, see Traces.
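
Because the BatchSpanProcessor exports spans on a background thread, a short-lived script can exit before buffered spans are sent. A minimal sketch that flushes pending spans before the process exits, using the tracer_provider configured above:

# Flush buffered spans and shut down the exporter cleanly
tracer_provider.force_flush()
tracer_provider.shutdown()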

Advanced Examples

Complex Nested Schemas

import instructor
from pydantic import BaseModel, Field
from openai import OpenAI
from typing import List

# Setup done as shown in Integration section above

class Address(BaseModel):
    street: str
    city: str
    country: str
    postal_code: str

class OrderItem(BaseModel):
    product_name: str
    quantity: int
    price: float

class Invoice(BaseModel):
    invoice_number: str = Field(description="Unique invoice identifier")
    customer_name: str
    billing_address: Address
    items: List[OrderItem]
    subtotal: float
    tax: float
    total: float

client = instructor.from_openai(OpenAI())

# Extract complex nested data (automatically traced)
invoice_text = """
Invoice #INV-2024-001
Customer: Acme Corporation
Address: 123 Main St, New York, NY, USA, 10001

Items:
- Premium Widget x5 @ $50.00
- Standard Gadget x10 @ $25.00

Subtotal: $500.00
Tax: $40.00
Total: $540.00
"""

invoice = client.chat.completions.create(
    model="gpt-4",
    response_model=Invoice,
    messages=[{"role": "user", "content": f"Extract invoice data: {invoice_text}"}]
)

print(f"Invoice {invoice.invoice_number} for {invoice.customer_name}")
print(f"Total: ${invoice.total}")

Batch Processing with Validation

import instructor
from pydantic import BaseModel, Field, field_validator
from openai import OpenAI
from typing import List

class ProductReview(BaseModel):
    reviewer_name: str
    rating: int = Field(ge=1, le=5, description="Rating from 1-5 stars")
    sentiment: str = Field(description="positive, negative, or neutral")
    key_points: List[str]

    @field_validator('sentiment')
    @classmethod
    def validate_sentiment(cls, v):
        if v.lower() not in ['positive', 'negative', 'neutral']:
            raise ValueError('Invalid sentiment')
        return v.lower()

client = instructor.from_openai(OpenAI())

reviews_text = [
    "Amazing product! Best purchase ever. 5 stars from John.",
    "Terrible quality. Broke after one day. Very disappointed. - Sarah",
    "It's okay, nothing special but works as expected. 3 stars - Mike"
]

extracted_reviews = []

# Process batch with validation (automatically traced)
for review in reviews_text:
    try:
        result = client.chat.completions.create(
            model="gpt-4",
            response_model=ProductReview,
            messages=[{"role": "user", "content": f"Extract review data: {review}"}]
        )
        extracted_reviews.append(result)
    except Exception as e:
        print(f"Failed to process review: {e}")

print(f"Successfully processed {len(extracted_reviews)} reviews")

Retry with Custom Logic

import instructor
from pydantic import BaseModel, Field, field_validator
from openai import OpenAI
from datetime import datetime

class EventInfo(BaseModel):
    title: str = Field(min_length=5, max_length=100)
    date: str = Field(description="Date in YYYY-MM-DD format")
    time: str = Field(description="Time in HH:MM format")
    attendees: int = Field(ge=1, le=10000)

    @field_validator('date')
    @classmethod
    def validate_date(cls, v):
        try:
            datetime.strptime(v, '%Y-%m-%d')
            return v
        except ValueError:
            raise ValueError('Date must be in YYYY-MM-DD format')

client = instructor.from_openai(OpenAI())

event_description = """
AI in Healthcare Summit 2024
Date: April 15, 2024
Time: 9:00 AM
Expected Attendees: 500
"""

# Extract with automatic retries on validation errors (automatically traced)
event = client.chat.completions.create(
    model="gpt-4",
    response_model=EventInfo,
    messages=[{"role": "user", "content": f"Extract event info: {event_description}"}],
    max_retries=3
)

print(f"Event: {event.title} on {event.date} at {event.time}")

Instructor is also compatible with our AI Gateway. To learn more, see Instructor.