You can now add your own private and/or finetuned models to the Orq.ai model garden.

As of now, you can only add models hosted on Azure. Soon, we'll also support the self-onboarding of private models through OpenAI and Vertex AI.

Use the interactive walkthrough below to see how the onboarding process works.


Debug more effectively

by Cormick Marskamp

The Debug tab is back. Did your request fail, and do you want to inspect the payload? Navigate to the Debug tab to find out what happened.

We also added the raw provider_response to our Python and Node SDKs. This gives you the flexibility to retrieve model-specific output that our unified API doesn't expose.

See an example response below:

{
  "id": "01J3MZHS6KFEWE7WBSAG0JP25X",
  "created": "2024-07-25T12:58:41.235Z",
  "object": "image",
  "model": "leonard-vision-xl",
  "provider": "leonardoai",
  "is_final": true,
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "url": "https://cdn.leonardo.ai/users/3bf1a64b-57f6-499b-907d-8b1286c1aa95/generations/dee000c8-7b6d-4d8c-82b4-d1992f8adce2/Default_a_tree_0.jpg"
      },
      "finish_reason": "stop"
    }
  ],
  "provider_response": {
    "generations_by_pk": {
      "generated_images": [
        {
          "url": "https://cdn.leonardo.ai/users/3bf1a64b-57f6-499b-907d-8b1286c1aa95/generations/dee000c8-7b6d-4d8c-82b4-d1992f8adce2/Default_a_tree_0.jpg",
          "nsfw": false,
          "id": "f47939ed-0841-48fa-b763-50e2e5b02e9a",
          "likeCount": 0,
          "motionMP4URL": null,
          "generated_image_variation_generics": []
        }
      ],
      "modelId": "5c232a9e-9061-4777-980a-ddc8e65647c6",
      "motion": null,
      "motionModel": null,
      "motionStrength": null,
      "prompt": "a tree",
      "negativePrompt": "",
      "imageHeight": 512,
      "imageToVideo": null,
      "imageWidth": 512,
      "inferenceSteps": 15,
      "seed": 4135326353,
      "public": false,
      "scheduler": "EULER_DISCRETE",
      "sdVersion": "SDXL_0_9",
      "status": "COMPLETE",
      "presetStyle": "CINEMATIC",
      "initStrength": null,
      "guidanceScale": null,
      "id": "dee000c8-7b6d-4d8c-82b4-d1992f8adce2",
      "createdAt": "2024-07-25T12:58:43.034",
      "promptMagic": false,
      "promptMagicVersion": null,
      "promptMagicStrength": null,
      "photoReal": false,
      "photoRealStrength": null,
      "fantasyAvatar": null,
      "generation_elements": []
    }
  }
}
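
For example, with the response above saved locally as response.json (the file name is just for illustration), here is a minimal Python sketch of reading both the unified fields and the Leonardo-specific ones:

import json

# Load the API response shown above.
with open("response.json") as f:
    response = json.load(f)

# Unified fields look the same across providers:
image_url = response["choices"][0]["message"]["url"]

# Provider-specific output lives under "provider_response"; these
# Leonardo fields aren't part of the unified schema.
generation = response["provider_response"]["generations_by_pk"]
print(generation["seed"])       # 4135326353
print(generation["scheduler"])  # EULER_DISCRETE
print(generation["sdVersion"])  # SDXL_0_9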

GPT-4o mini

by Cormick Marskamp

OpenAI has introduced GPT-4o mini, a smaller, more efficient, and cheaper model replacing GPT-3.5 Turbo. This follows similar moves by Anthropic (Claude 3 Haiku) and Google (Gemini 1.5 Flash).

Key features:

  • Significantly smarter than GPT-3.5 Turbo, but not as capable as GPT-4o (see benchmark).
  • 60% cheaper than GPT-3.5 Turbo.
  • It's a vision model, meaning it can interpret images.
  • Same 128k context window as GPT-4o.

Why a "weaker" model when there is a better one? Quite simply: Not every task needs the best model.

Recommendation: for simple tasks such as summarizing texts or improving phrasing, GPT-4o mini is perfectly sufficient. When more knowledge and complex "thinking" are required, fall back to GPT-4o, as in the sketch below.
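
A minimal sketch of that routing with the OpenAI Python SDK (the helper function and its complexity flag are illustrative assumptions, not an Orq.ai API):

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def summarize(text: str, complex_task: bool = False) -> str:
    # Route simple summarization to the cheaper gpt-4o-mini and
    # reserve gpt-4o for work that needs more knowledge or reasoning.
    model = "gpt-4o" if complex_task else "gpt-4o-mini"
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": f"Summarize:\n\n{text}"}],
    )
    return response.choices[0].message.content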


GPT-4o mini in comparison to other smaller models and GPT-4o

Human in the Loop

by Cormick Marskamp

With the improved Human-in-the-loop feature, you have more control over your AI. You can collect feedback from your end users and have your domain experts annotate feedback and corrections on each log for future improvements.

For example: the model generates an output. Your domain expert checks it and sees that it is 95% correct. They flag the output as incomplete and add a correction to make it 100% correct.

All the checked and corrected logs can be saved to a dataset, allowing you to create curated datasets. These curated datasets can be used to finetune your model.

There are two options to log feedback:

  1. via the Orq.ai user interface
  2. via the API

Use the interactive walkthrough below to see how to monitor, flag, and correct human feedback within Orq.ai.


There are three feedback properties: Rating, Defects, and Interactions.

Rating   Defects          Interactions
Good     Grammatical      Saved
Bad      Spelling         Selected
         Hallucination    Deleted
         Repetition       Shared
         Inappropriate    Copied
         Off-Topic        Reported
         Incompleteness
         Ambiguity

See the code snippet below as an example of how to log feedback via the API:

# Python: "client" is an initialized Orq.ai SDK client
client.feedback.report(
    property="defects",
    value=["grammatical", "hallucination"],  # Can include multiple defects
    trace_id="unique-trace-id",
)

// Node: build the payload, then pass it to the SDK's feedback reporting method
const feedbackPayload: FeedbackReport = {
  property: 'defects',
  value: ['grammatical', 'hallucination'], // Can include multiple defects
  trace_id: 'unique-trace-id',
};
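
Assuming the other feedback properties follow the same call shape, logging an end-user rating (values taken from the table above) might look like this:

# Hypothetical: logs a single "rating" value on the same trace,
# mirroring the defects call above.
client.feedback.report(
    property="rating",
    value="good",
    trace_id="unique-trace-id",
)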

📘 For the technical documentation, please see: Node SDK & Python SDK

Claude 3.5 Sonnet

by Cormick Marskamp

Two hours after Claude 3.5 Sonnet was released, it became available on Orq.ai. Toggle it on in the model garden and try it out yourself!

A little about the new model: Claude 3.5 Sonnet can handle both text and images. It's also Anthropic's top-performing model to date, outshining the previous Claude 3 Sonnet and Claude 3 Opus across various AI benchmarks (see image).

One area where Claude 3.5 Sonnet does seem to excel is in speed. Anthropic claims it's about twice as fast as Claude 3 Opus, which could be a game-changer for developers building apps that need quick responses, like customer service chatbots.

The model's vision capabilities have also seen a significant boost. It's better at interpreting charts, graphs, and even reading text from less-than-perfect images. This could open up some interesting possibilities for real-world applications.

However, it's worth noting that Claude 3.5 Sonnet still has the same context window of 200,000 tokens as its predecessor. While this is a decent amount (about 150,000 words), it's not a step up from what we've seen before.

Prompt library

by Cormick Marskamp

The new Prompt library lets you create prompt templates that you can reuse throughout the whole platform.

With the filtering capabilities, you can easily find the right prompt for your use case and effectively manage multiple prompts.

Check out the new feature in your workspace or in the interactive walkthrough below.

Llama 3 on Perplexity

by Cormick Marskamp

Start using the best open-source model, hosted on Perplexity, in Orq.ai.

You could already use the Llama 3 models hosted on Anyscale and Groq. However, being able to use Llama 3 on Perplexity opens up new possibilities and use cases.

Because the model can access the internet, it can:

  • Generate up-to-date responses
  • Retrieve dynamic data, such as the latest news
  • Understand real-world context better

The example below shows that only Llama 3 on Perplexity can return the current temperature in Amsterdam.

Multi API key selector

by Cormick Marskamp

You can now select which API key to use in your Playgrounds, Experiments, and Deployments.

Select which API key to use with the new integration selector pill.

This only works after you have set up more than one API key. Note: this feature does not apply to Azure and AWS.

In the example below, I use the exact same model configuration for my fallback model, but with my second API key in case the first one fails.

Deployment test run

by Cormick Marskamp

Preview your LLM call in the Deployment studio.

This new feature allows you to quickly run a test. You still have the option to open the same configuration in the Playground for further testing.