Description
Overview
This prompt-based object detection workflow enables precise identification and bounding box visualization of specific objects within images, exemplifying an image-to-insight orchestration pipeline. Designed for users seeking automated detection of visual elements, it addresses the problem of locating and marking multiple instances of a targeted object (rabbits, in this example) within a photographic input. The workflow starts from a manual trigger and uses an HTTP Request node to call an AI vision API capable of prompt-driven object detection.
Key Benefits
- Facilitates prompt-driven object detection, enabling customized identification of visual elements.
- Automates extraction and normalization of bounding box coordinates for accurate image mapping.
- Integrates image metadata retrieval to scale coordinates precisely to original image dimensions.
- Visualizes detection results by drawing bounding boxes directly on the source image.
- Supports flexible input images and prompt variations for diverse no-code integration scenarios.
Product Overview
This automation workflow begins with a manual trigger node that initiates the process. The workflow downloads a test image via an HTTP Request node, retrieving an image of a petting zoo. Subsequently, an Edit Image node extracts the image's width and height metadata, which are essential for coordinate scaling. The core logic involves sending the image alongside a prompt requesting bounding boxes for all rabbits to the Google Gemini 2.0 multimodal vision API through an authenticated HTTP Request node. The API returns bounding box coordinates normalized to a 0-1000 scale. A Code node parses these coordinates and rescales them to the original image dimensions using deterministic mathematical transformations. Finally, an Edit Image node draws semi-transparent magenta bounding boxes onto the original image at the calculated positions. The workflow runs synchronously, producing a single annotated image per execution, with error handling relying on n8n's default retry mechanisms. Authentication is managed by predefined Google PaLM API credentials, ensuring secure access to the Gemini 2.0 service. No image or detection data is persisted within the workflow.
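The rescaling step can be sketched as follows. This is a minimal sketch, not the template's literal Code node: the `[ymin, xmin, ymax, xmax]` box ordering on the 0-1000 scale is an assumption about the model's output format.

```javascript
// Rescale a normalized Gemini box (assumed [ymin, xmin, ymax, xmax],
// values 0-1000) to pixel coordinates of the original image.
function rescaleBox(box, imageWidth, imageHeight) {
  const [ymin, xmin, ymax, xmax] = box;
  return {
    x: Math.round((xmin / 1000) * imageWidth),
    y: Math.round((ymin / 1000) * imageHeight),
    width: Math.round(((xmax - xmin) / 1000) * imageWidth),
    height: Math.round(((ymax - ymin) / 1000) * imageHeight),
  };
}

// Example: a box covering the center of a 1600x1200 image.
const scaled = rescaleBox([250, 250, 750, 750], 1600, 1200);
```

Because the transformation is pure arithmetic over the image dimensions reported by the Edit Image node, the same detection always maps to the same pixel region.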
Features and Outcomes
Core Automation
This image-to-insight automation workflow accepts an input image and a textual prompt specifying target objects for detection. It uses prompt-based object detection to identify bounding boxes around rabbits, then rescales coordinates to match the original image size before visualization.
- Deterministic coordinate rescaling from normalized to pixel values based on image dimensions.
- Single-pass evaluation of detected bounding boxes, filtered to those with exactly four coordinates.
- Consistent drawing of multiple bounding boxes in a single image editing operation.
Integrations and Intake
The orchestration pipeline integrates with external APIs and internal nodes to deliver prompt-driven detection. The workflow uses HTTP Request nodes for image download and Gemini 2.0 API calls, authenticated via predefined Google PaLM API credentials.
- HTTP Request node for image download supporting any accessible JPEG image.
- Google Gemini 2.0 multimodal API for prompt-based bounding box extraction.
- Edit Image node for obtaining image metadata (width and height) required for scaling.
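The request the HTTP Request node sends to Gemini can be sketched as below. This is a hedged sketch: the model name (`gemini-2.0-flash`) and prompt wording are assumptions, though the `generateContent` endpoint shape and `inline_data` image part follow the Gemini REST API.

```javascript
// Assumed endpoint; the template may target a different Gemini 2.0 model.
const GEMINI_URL =
  'https://generativelanguage.googleapis.com/v1beta/models/' +
  'gemini-2.0-flash:generateContent';

// Build the JSON body pairing a base64 JPEG with the detection prompt.
function buildDetectionRequest(base64Jpeg, prompt) {
  return {
    contents: [
      {
        parts: [
          { inline_data: { mime_type: 'image/jpeg', data: base64Jpeg } },
          { text: prompt },
        ],
      },
    ],
    // Ask the model to reply with JSON rather than free text.
    generationConfig: { responseMimeType: 'application/json' },
  };
}

const body = buildDetectionRequest(
  '<base64-encoded image>',
  'Return bounding boxes for all rabbits as [ymin, xmin, ymax, xmax].'
);
```

In n8n, this body would be set on the HTTP Request node with the Google PaLM API credential supplying authentication.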
Outputs and Consumption
The workflow outputs an annotated image with bounding boxes drawn around detected rabbits. The output preserves the original image format and includes graphical overlays indicating detection results.
- Annotated image in original JPEG format with bounding boxes overlaid.
- Synchronous output suitable for immediate downstream processing or display.
- Metadata and bounding box coordinates returned internally for further automation if required.
Workflow — End-to-End Execution
Step 1: Trigger
The workflow begins with a manual trigger node, which requires user initiation to start the process, enabling controlled execution for testing or on-demand detection tasks.
Step 2: Processing
After triggering, an HTTP Request node downloads a test image. The image is passed to an Edit Image node configured to extract image metadata, specifically width and height, which are necessary for coordinate scaling. Basic presence checks ensure the image is received and metadata is valid.
Step 3: Analysis
The image and a textual prompt requesting detection of all rabbits are sent to the Gemini 2.0 multimodal API. The API returns normalized bounding box coordinates and labels in JSON format. These coordinates are parsed and filtered to retain only bounding boxes with four coordinate points. The Code node then rescales the normalized coordinates to pixel values based on the original image dimensions.
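Parsing the API reply can be sketched as below. The fence-stripping step is an assumption about defensive handling (models sometimes wrap JSON in a markdown code fence), not the template's exact code:

```javascript
// Extract a JSON array of detections from the model's text reply,
// tolerating an optional surrounding markdown code fence.
function parseDetections(modelText) {
  const cleaned = modelText
    .replace(/^```(?:json)?\s*/i, '') // leading fence, e.g. ```json
    .replace(/```\s*$/, '')           // trailing fence
    .trim();
  return JSON.parse(cleaned);
}

const raw = '```json\n[{"label":"rabbit","box_2d":[250,250,750,750]}]\n```';
const detections = parseDetections(raw);
```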
Step 4: Delivery
The final image editing node draws multiple bounding boxes onto the original image using the rescaled coordinates. The output is a single annotated image with bounding boxes rendered in a semi-transparent magenta color. The workflow completes by delivering this image synchronously for further use.
Use Cases
Scenario 1
A wildlife researcher needs to identify rabbits within a large set of photographs. This workflow automates detection by interpreting user prompts to locate rabbits and visually marking their positions, enabling rapid image annotation and verification within one processing cycle.
Scenario 2
A content moderator requires automatic identification of specific animals in user-uploaded images. The prompt-based object detection pipeline isolates rabbits by bounding box and outputs annotated images, reducing manual review efforts while maintaining consistent detection parameters.
Scenario 3
A developer building an image search tool integrates this no-code workflow to dynamically detect rabbits within uploaded images. The workflow returns bounding boxes scaled to actual image dimensions, facilitating precise indexing and visual highlighting in search results.
How to use
To deploy this prompt-based object detection workflow, import the template into your n8n instance. Setup requires configuring Google PaLM API credentials for authenticated calls to Gemini 2.0. Adjust the HTTP Request node URL to your desired input image if necessary. Execute the manual trigger node to start the workflow. The annotated output image is accessible immediately after execution, allowing visual confirmation of detected objects. For live use, trigger manually or extend with event-driven triggers based on your environment.
Comparison — Manual Process vs. Automation Workflow
| Attribute | Manual/Alternative | This Workflow |
|---|---|---|
| Steps required | Multiple manual steps including image inspection, coordinate calculation, and annotation | Single automated pipeline from image input to annotated output with no manual intervention |
| Consistency | Subject to human error and variable interpretation of image content | Deterministic, repeatable detection based on fixed prompt and scaling logic |
| Scalability | Limited by manual processing speed and workload capacity | Scales according to n8n and API throughput, enabling batch or repeated runs |
| Maintenance | Requires ongoing human effort and training for annotation accuracy | Maintained centrally, with updates focused on API credentials and node configurations |
Technical Specifications
| Attribute | Detail |
|---|---|
| Environment | n8n automation platform |
| Tools / APIs | HTTP Request (image download, Gemini 2.0 API), Edit Image, Code, Set, Manual Trigger |
| Execution Model | Synchronous request–response for single-image processing |
| Input Formats | JPEG images accessible via HTTP URL |
| Output Formats | JPEG image with bounding box overlays |
| Data Handling | Transient in-memory processing with no persistent storage |
| Known Constraints | Detections are kept only when the returned bounding box contains exactly four coordinates |
| Credentials | Google PaLM API key for Gemini 2.0 authentication |
Implementation Requirements
- Valid Google PaLM API credentials with access to Gemini 2.0 multimodal endpoints.
- n8n instance capable of executing HTTP Request, Edit Image, Code, and Set nodes.
- Internet access to retrieve test images and communicate with external APIs.
Configuration & Validation
- Import the workflow and configure Google PaLM API credentials in n8n.
- Verify the HTTP Request node correctly downloads the test image and the Edit Image node extracts width and height metadata.
- Trigger the workflow manually and confirm bounding boxes are drawn on the output image as expected.
Data Provenance
- Manual Trigger node initiates workflow execution.
- HTTP Request nodes handle image retrieval and interaction with the Gemini 2.0 API.
- Code node rescales normalized bounding box coordinates using metadata from Edit Image node.
FAQ
How is the prompt-based object detection automation workflow triggered?
The workflow is initiated manually using a trigger node that requires user intervention to start the detection process.
Which tools or models does the orchestration pipeline use?
The pipeline integrates an HTTP Request node calling the Google Gemini 2.0 multimodal API for prompt-based object detection, authenticated via predefined Google PaLM API credentials.
What does the response look like for client consumption?
The workflow outputs a JPEG image annotated with bounding boxes drawn around detected rabbits, corresponding to rescaled coordinates.
Is any data persisted by the workflow?
No, all image data and detection results are processed transiently in memory without persistent storage.
How are errors handled in this integration flow?
Error handling relies on n8n’s built-in retry and backoff mechanisms; no custom error handling is defined explicitly.
Conclusion
This prompt-based object detection workflow reliably automates the detection and visualization of specific objects within images by integrating Gemini 2.0’s AI vision capabilities with image processing nodes. It delivers deterministic bounding box coordinates scaled precisely to original image dimensions and overlays these visually. The workflow’s synchronous execution model supports immediate consumption of annotated images with no data persistence. A known constraint is the dependency on external API availability and credentials for Gemini 2.0, which governs detection accuracy and uptime. This solution provides a technical foundation for embedding prompt-driven image analysis into broader automation pipelines.