Description
Overview
This prompt-based object detection workflow enables precise identification and bounding box visualization of specific objects within images, exemplifying an image-to-insight orchestration pipeline. Designed for users seeking automated detection of visual elements, it addresses the problem of locating and marking multiple instances of a targeted object (rabbits, in this example) within a photographic input. The workflow starts from a manual trigger and uses an HTTP Request node to call an AI vision API capable of prompt-driven object detection.
Key Benefits
- Facilitates prompt-driven object detection, enabling customized identification of visual elements.
- Automates extraction and normalization of bounding box coordinates for accurate image mapping.
- Integrates image metadata retrieval to scale coordinates precisely to original image dimensions.
- Visualizes detection results by drawing bounding boxes directly on the source image.
- Supports flexible input images and prompt variations for diverse no-code integration scenarios.
Product Overview
This automation workflow begins with a manual trigger node that initiates the process. The workflow downloads a test image via an HTTP Request node, retrieving an image of a petting zoo. Subsequently, an Edit Image node extracts the image's width and height metadata, which are essential for coordinate scaling. The core logic involves sending the image alongside a prompt requesting bounding boxes for all rabbits to the Google Gemini 2.0 multimodal vision API through an authenticated HTTP Request node. The API returns bounding box coordinates normalized to a 0-1000 scale. A Code node parses these coordinates and rescales them to the original image dimensions using deterministic mathematical transformations. Finally, an Edit Image node draws semi-transparent magenta bounding boxes onto the original image at the calculated positions. The workflow runs synchronously, producing a single annotated image per execution, with error handling relying on n8n's default retry mechanisms. Authentication is managed by predefined Google PaLM API credentials, ensuring secure access to the Gemini 2.0 service. No image or detection data is persisted within the workflow.
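The rescaling step can be sketched as follows. This is a minimal sketch, not the template's literal Code node: the `[ymin, xmin, ymax, xmax]` box ordering on the 0-1000 scale is an assumption about the model's output format.

```javascript
// Rescale a normalized Gemini box (assumed [ymin, xmin, ymax, xmax],
// values 0-1000) to pixel coordinates of the original image.
function rescaleBox(box, imageWidth, imageHeight) {
  const [ymin, xmin, ymax, xmax] = box;
  return {
    x: Math.round((xmin / 1000) * imageWidth),
    y: Math.round((ymin / 1000) * imageHeight),
    width: Math.round(((xmax - xmin) / 1000) * imageWidth),
    height: Math.round(((ymax - ymin) / 1000) * imageHeight),
  };
}

// Example: a box covering the center of a 1600x1200 image.
const scaled = rescaleBox([250, 250, 750, 750], 1600, 1200);
```

Because the transformation is pure arithmetic over the image dimensions reported by the Edit Image node, the same detection always maps to the same pixel region.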
Features and Outcomes
Core Automation
This image-to-insight automation workflow accepts an input image and a textual prompt specifying target objects for detection. It uses prompt-based object detection to identify bounding boxes around rabbits, then rescales coordinates to match the original image size before visualization.
- Deterministic coordinate rescaling from normalized to pixel values based on image dimensions.
- Single-pass evaluation of detected bounding boxes, filtered to those with exactly four coordinates.
- Consistent drawing of multiple bounding boxes in a single image editing operation.
Integrations and Intake
The orchestration pipeline integrates with external APIs and internal nodes to deliver prompt-driven detection. The workflow uses HTTP Request nodes for image download and Gemini 2.0 API calls, authenticated via predefined Google PaLM API credentials.
- HTTP Request node for image download supporting any accessible JPEG image.
- Google Gemini 2.0 multimodal API for prompt-based bounding box extraction.
- Edit Image node for obtaining image metadata (width and height) required for scaling.
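The request the HTTP Request node sends to Gemini can be sketched as below. This is a hedged sketch: the model name (`gemini-2.0-flash`) and prompt wording are assumptions, though the `generateContent` endpoint shape and `inline_data` image part follow the Gemini REST API.

```javascript
// Assumed endpoint; the template may target a different Gemini 2.0 model.
const GEMINI_URL =
  'https://generativelanguage.googleapis.com/v1beta/models/' +
  'gemini-2.0-flash:generateContent';

// Build the JSON body pairing a base64 JPEG with the detection prompt.
function buildDetectionRequest(base64Jpeg, prompt) {
  return {
    contents: [
      {
        parts: [
          { inline_data: { mime_type: 'image/jpeg', data: base64Jpeg } },
          { text: prompt },
        ],
      },
    ],
    // Ask the model to reply with JSON rather than free text.
    generationConfig: { responseMimeType: 'application/json' },
  };
}

const body = buildDetectionRequest(
  '<base64-encoded image>',
  'Return bounding boxes for all rabbits as [ymin, xmin, ymax, xmax].'
);
```

In n8n, this body would be set on the HTTP Request node with the Google PaLM API credential supplying authentication.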
Outputs and Consumption
The workflow outputs an annotated image with bounding boxes drawn around detected rabbits. The output preserves the original image format and includes graphical overlays indicating detection results.
- Annotated image in original JPEG format with bounding boxes overlaid.
- Synchronous output suitable for immediate downstream processing or display.
- Metadata and bounding box coordinates returned internally for further automation if required.
Workflow — End-to-End Execution
Step 1: Trigger
The workflow begins with a manual trigger node, which requires user initiation to start the process, enabling controlled execution for testing or on-demand detection tasks.
Step 2: Processing
After triggering, an HTTP Request node downloads a test image. The image is passed to an Edit Image node configured to extract image metadata, specifically width and height, which are necessary for coordinate scaling. Basic presence checks ensure the image is received and metadata is valid.
Step 3: Analysis
The image and a textual prompt requesting detection of all rabbits are sent to the Gemini 2.0 multimodal API. The API returns normalized bounding box coordinates and labels in JSON format. These coordinates are parsed and filtered to retain only bounding boxes with four coordinate points. The Code node then rescales the normalized coordinates to pixel values based on the original image dimensions.
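Parsing the API reply can be sketched as below. The fence-stripping step is an assumption about defensive handling (models sometimes wrap JSON in a markdown code fence), not the template's exact code:

```javascript
// Extract a JSON array of detections from the model's text reply,
// tolerating an optional surrounding markdown code fence.
function parseDetections(modelText) {
  const cleaned = modelText
    .replace(/^```(?:json)?\s*/i, '') // leading fence, e.g. ```json
    .replace(/```\s*$/, '')           // trailing fence
    .trim();
  return JSON.parse(cleaned);
}

const raw = '```json\n[{"label":"rabbit","box_2d":[250,250,750,750]}]\n```';
const detections = parseDetections(raw);
```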
Step 4: Delivery
The final image editing node draws multiple bounding boxes onto the original image using the rescaled coordinates. The output is a single annotated image with bounding boxes rendered in a semi-transparent magenta color. The workflow completes by delivering this image synchronously for further use.
Use Cases
Scenario 1
A wildlife researcher needs to identify rabbits within a large set of photographs. This workflow automates detection by interpreting user prompts to locate rabbits and visually marking their positions, enabling rapid image annotation and verification within one processing cycle.
Scenario 2
A content moderator requires automatic identification of specific animals in user-uploaded images. The prompt-based object detection pipeline isolates rabbits by bounding box and outputs annotated images, reducing manual review efforts while maintaining consistent detection parameters.
Scenario 3
A developer building an image search tool integrates this no-code workflow to dynamically detect rabbits within uploaded images. The workflow returns bounding boxes scaled to actual image dimensions, facilitating precise indexing and visual highlighting in search results.
How to use
To deploy this prompt-based object detection workflow, import the template into your n8n instance. Setup requires configuring Google PaLM API credentials for authenticated calls to Gemini 2.0. Adjust the HTTP Request node URL to your desired input image if necessary. Execute the manual trigger node to start the workflow. The annotated output image is accessible immediately after execution, allowing visual confirmation of detected objects. For live use, trigger manually or extend with event-driven triggers based on your environment.
Comparison — Manual Process vs. Automation Workflow
| Attribute | Manual/Alternative | This Workflow |
|---|---|---|
| Steps required | Multiple manual steps including image inspection, coordinate calculation, and annotation | Single automated pipeline from image input to annotated output with no manual intervention |
| Consistency | Subject to human error and variable interpretation of image content | Deterministic, repeatable detection based on fixed prompt and scaling logic |
| Scalability | Limited by manual processing speed and workload capacity | Scales according to n8n and API throughput, enabling batch or repeated runs |
| Maintenance | Requires ongoing human effort and training for annotation accuracy | Maintained centrally, with updates focused on API credentials and node configurations |
Technical Specifications
| Attribute | Detail |
|---|---|
| Environment | n8n automation platform |
| Tools / APIs | HTTP Request (image download, Gemini 2.0 API), Edit Image, Code, Set, Manual Trigger |
| Execution Model | Synchronous request–response for single-image processing |
| Input Formats | JPEG images accessible via HTTP URL |
| Output Formats | JPEG image with bounding box overlays |
| Data Handling | Transient in-memory processing with no persistent storage |
| Known Constraints | Detections are kept only when the returned bounding box contains exactly four coordinates |
| Credentials | Google PaLM API key for Gemini 2.0 authentication |
Implementation Requirements
- Valid Google PaLM API credentials with access to Gemini 2.0 multimodal endpoints.
- n8n instance capable of executing HTTP Request, Edit Image, Code, and Set nodes.
- Internet access to retrieve test images and communicate with external APIs.
Configuration & Validation
- Import the workflow and configure Google PaLM API credentials in n8n.
- Verify the HTTP Request node correctly downloads the test image and the Edit Image node extracts width and height metadata.
- Trigger the workflow manually and confirm bounding boxes are drawn on the output image as expected.
Data Provenance
- Manual Trigger node initiates workflow execution.
- HTTP Request nodes handle image retrieval and interaction with the Gemini 2.0 API.
- Code node rescales normalized bounding box coordinates using metadata from Edit Image node.
FAQ
How is the prompt-based object detection automation workflow triggered?
The workflow is initiated manually using a trigger node that requires user intervention to start the detection process.
Which tools or models does the orchestration pipeline use?
The pipeline integrates an HTTP Request node calling the Google Gemini 2.0 multimodal API for prompt-based object detection, authenticated via predefined Google PaLM API credentials.
What does the response look like for client consumption?
The workflow outputs a JPEG image annotated with bounding boxes drawn around detected rabbits, corresponding to rescaled coordinates.
Is any data persisted by the workflow?
No, all image data and detection results are processed transiently in memory without persistent storage.
How are errors handled in this integration flow?
Error handling relies on n8n’s built-in retry and backoff mechanisms; no custom error handling is defined explicitly.
Conclusion
This prompt-based object detection workflow reliably automates the detection and visualization of specific objects within images by integrating Gemini 2.0’s AI vision capabilities with image processing nodes. It delivers deterministic bounding box coordinates scaled precisely to original image dimensions and overlays these visually. The workflow’s synchronous execution model supports immediate consumption of annotated images with no data persistence. A known constraint is the dependency on external API availability and credentials for Gemini 2.0, which governs detection accuracy and uptime. This solution provides a technical foundation for embedding prompt-driven image analysis into broader automation pipelines.