Using Images and Vision in a Prompt

In the Playground, you can work with images in two powerful ways: generating images using Image Generation models and analyzing images using Vision models. This guide covers both use cases to help you leverage visual AI capabilities in your prompts.

To get started, make sure you are familiar with Prompts; see Creating a Prompt.


Image Generation: Creating Images from Text

Image Generation models can create images based on text descriptions. These models are perfect for creative tasks, content generation, and visual prototyping.

Selecting an Image Generation Model

To use Image Generation, select a model that has an image tag next to its name:

The image tag means that the model is capable of generating images

Configuring Parameters for Image Models

Image Generation models expose different parameters than chat models do. Each image model has its own parameter set, and these settings directly affect the generated images.
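As an illustration, a typical parameter set for `dall-e-3` might look like the following. This is a sketch based on the publicly documented DALL·E 3 options; the exact parameter names and values exposed in the Playground may differ per model.

```python
# A hypothetical parameter set for dall-e-3, shown for illustration only.
# The values follow the publicly documented DALL-E 3 options; other image
# models expose different knobs, so check each model's parameter panel.
dalle3_params = {
    "size": "1024x1024",    # also "1792x1024" or "1024x1792"
    "quality": "standard",  # "hd" renders finer detail at a higher cost
    "style": "vivid",       # "natural" produces less hyper-real images
    "n": 1,                 # DALL-E 3 generates one image per request
}
```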

Example of parameters for the `dall-e-3` Image Generation model.

Using Image Generation in Playground

You can use image models just like any other model in the Playground. The generated images appear as regular messages; click an image to view it fullscreen or open it in a new tab.

Example of image generation using Leonardo AI


Use Cases

  • Creative Content: Generate artwork, illustrations, and visual content for marketing materials
  • Product Design: Create mockups and visual prototypes based on descriptions
  • Content Creation: Generate images for blogs, social media, and presentations
  • Concept Visualization: Turn abstract ideas into visual representations

Best Practices

  • Be Specific: Provide detailed descriptions for better results
  • Style Guidelines: Include artistic style, mood, and visual elements in your prompts
  • Parameter Tuning: Experiment with model-specific parameters to achieve desired output quality
  • Iterative Refinement: Use generated images as starting points for further refinement

Vision: Analyzing and Interpreting Images

Vision models can analyze, interpret, and understand images that you provide. These models are ideal for image analysis, document processing, visual question answering, and content moderation.

Selecting a Vision Model

To use Vision, select a model that has a vision tag next to its name:

The vision tag means that the model is able to interpret images

Using Vision in the Playground

You can use Vision models just like any other model in the Playground.

To include an image as an input for your model, click on the image icon at the top-right of your message. You will then be able to share a link or upload an image to be sent to the model.
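If your image is a local file rather than a URL, you can base64-encode it yourself before sending it to the model. A minimal Python sketch; the data-URL format shown is an assumption that matches common vision APIs, so verify the accepted format for your model:

```python
import base64

def image_to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"

# A content part shaped like the image_url entries used in vision requests.
# The bytes here are just the PNG magic number, standing in for a real file.
content_part = {
    "type": "image_url",
    "image_url": {"url": image_to_data_url(b"\x89PNG\r\n\x1a\n")},
}
```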

An example use case using a Vision model

Use Cases

  • Document Processing: Extract text and information from scanned documents and forms
  • Visual Quality Control: Analyze product images for defects or compliance
  • Content Moderation: Automatically review images for inappropriate content
  • Medical Imaging: Analyze medical scans and diagnostic images (with appropriate models)
  • Insurance Claims: Process damage assessment photos and documentation

Best Practices

  • Image Quality: Ensure images are clear and well-lit for best analysis results
  • Specific Questions: Ask focused questions about what you want to extract or understand
  • Context Provision: Provide context about what the image represents for better interpretation
  • Multiple Angles: For complex analysis, consider providing multiple views of the same subject

Using Vision through Code

You can use vision models through our API and SDK to analyze images programmatically:

curl --request POST \
     --url https://api.orq.ai/v2/deployments/invoke \
     --header 'accept: application/json' \
     --header 'content-type: application/json' \
     --data '
{
  "messages": [
    {
      "role": "user",
      "content": "describe what you see in this image"
    },
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "Either a URL of the image or the base64 encoded image data."
          }
        }
      ]
    }
  ]
}
'
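The same request can be sent from Python using only the standard library. This is a sketch, assuming the deployment accepts a Bearer token in the authorization header; check your API settings for the exact authentication scheme your account requires.

```python
import json
import urllib.request

API_URL = "https://api.orq.ai/v2/deployments/invoke"

def build_vision_payload(question: str, image_url: str) -> dict:
    """Mirror the curl example: one text message plus one image message."""
    return {
        "messages": [
            {"role": "user", "content": question},
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            },
        ]
    }

def invoke(payload: dict, api_key: str) -> dict:
    # The Bearer-token header below is an assumption; substitute whatever
    # authentication scheme your API reference specifies.
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "accept": "application/json",
            "content-type": "application/json",
            "authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `invoke(build_vision_payload("describe what you see in this image", "https://example.com/photo.png"), api_key)` sends the same JSON body as the curl command above.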