Bank Statement Conversion with Vision Language Models in Automation

Description

Overview

This workflow transcribes scanned or digital PDF bank statements into structured markdown using a vision language model (VLM) inside a no-code integration pipeline. Designed for financial analysts and developers, it handles complex document layouts by converting each PDF page to an image and transcribing it with Google Gemini, producing markdown suited to further data extraction.

Key Benefits

Automates conversion of bank statement PDFs into richly formatted markdown text for easy parsing.
Handles scanned and digital PDFs by transforming pages into images before transcription.
Uses a vision language model to accurately capture tables, headings, and multi-row cells in markdown.
Extracts structured deposit data from markdown using a dedicated information extraction model.

Product Overview

This workflow starts manually via the “When clicking ‘Test workflow’” trigger node. It downloads a bank statement PDF from Google Drive using OAuth2 credentials, which limits access to the source document. Because the workflow feeds the vision language model images rather than PDFs, an external PDF-to-image conversion service (the Stirling PDF API) renders each page as a separate JPEG at 300 DPI. The resulting ZIP archive is extracted to isolate the individual page images, which are then sorted by filename to keep the pages in order.

Images are resized to 75% of their original dimensions to optimize processing speed while preserving sufficient resolution for transcription. Each resized page is transcribed into markdown by the Google Gemini Chat Model via LangChain integration, capturing all visible text, tables, and document structure. Markdown outputs from all pages are aggregated into a single dataset. Subsequently, an information extraction node powered by Gemini AI parses the combined markdown text to identify and extract deposit rows, outputting a structured JSON array with date, description, and amount fields. Error handling relies on platform defaults, with no custom retry logic configured.
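To make the resolution trade-off of the resize step concrete, the sketch below works out the pixel dimensions involved. It assumes a US Letter page (8.5 x 11 in), which the workflow does not specify, and the function names are illustrative only.

```python
import math

def rendered_size(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a page rasterised at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)

def scaled_size(size: tuple[int, int], scale: float) -> tuple[int, int]:
    """Pixel dimensions after uniform scaling, rounded down to whole pixels."""
    width, height = size
    return math.floor(width * scale), math.floor(height * scale)

# Assumption: US Letter page; the workflow itself does not fix a page size.
full = rendered_size(8.5, 11, 300)   # (2550, 3300)
small = scaled_size(full, 0.75)      # (1912, 2475)
effective_dpi = 300 * 0.75           # 225.0
```

Under these assumptions each page keeps an effective 225 DPI while carrying about 56% of the original pixel count (0.75 squared), which is where the speed gain comes from.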

Features and Outcomes

Core Automation

This pipeline takes scanned or digital bank statement PDFs, converts their pages into images, and transcribes them into markdown with a vision language model. The image preparation steps (sorting and resizing) are deterministic and run before transcription.

Single-pass evaluation of each page image with markdown transcription preserving tables and headings.
Maintains page order through filename-based sorting, ensuring consistent document reconstruction.
Structured deposit data extraction from aggregated markdown, formatted as JSON array with key fields.
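Filename-based sorting preserves page order only if the names sort the same way the pages do. A plain string sort places a page numbered 10 before a page numbered 2 when the numbers are not zero-padded. The sketch below shows a natural sort key that avoids this; the filenames are hypothetical, since the conversion service's naming scheme is not stated here.

```python
import re

def natural_key(filename: str) -> list:
    """Split a filename into text and integer chunks so 'page_10' sorts after 'page_2'."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", filename)]

# Hypothetical filenames; the real names depend on the conversion service.
names = ["statement_10.jpg", "statement_2.jpg", "statement_1.jpg"]
plain = sorted(names)                     # page 10 lands before page 2
natural = sorted(names, key=natural_key)  # pages in numeric order
```

If the service already zero-pads page numbers, a plain sort is enough and this key is unnecessary.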

Integrations and Intake

The automation workflow integrates with Google Drive via OAuth2 authentication to download bank statement PDFs. It relies on an external HTTP-based PDF-to-image conversion service accepting multipart form-data uploads. The vision language model uses Google Gemini AI credentials for secure API access.

Google Drive node for secure PDF file retrieval using OAuth2 credentials.
HTTP Request node connecting to Stirling PDF API for PDF-to-JPEG conversion.
Google Gemini Chat Model for markdown transcription and information extraction via API key authentication.
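For readers who want to test the conversion service outside n8n, the following sketch builds a multipart form-data request body using only the Python standard library. The field names (`fileInput`, `imageFormat`, `dpi`, `singleOrMultiple`) and their values are assumptions for illustration and should be checked against the conversion service's own API documentation.

```python
import uuid

def build_multipart(fields: dict[str, str], file_field: str,
                    filename: str, file_bytes: bytes,
                    content_type: str = "application/pdf") -> tuple[bytes, str]:
    """Encode text fields plus one file as multipart/form-data.

    Returns the request body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    lines: list[bytes] = []
    for name, value in fields.items():
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            value.encode(),
        ]
    lines += [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"'.encode(),
        f"Content-Type: {content_type}".encode(),
        b"",
        file_bytes,
        f"--{boundary}--".encode(),
        b"",
    ]
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"

# Assumed field names and values; verify them against the service's API docs.
body, header = build_multipart(
    {"imageFormat": "jpg", "dpi": "300", "singleOrMultiple": "multiple"},
    file_field="fileInput",
    filename="statement.pdf",
    file_bytes=b"%PDF-1.4 placeholder",
)
```

The returned body can be sent with any HTTP client, with the second return value used as the Content-Type request header.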

Outputs and Consumption

Outputs include a markdown transcript of the entire bank statement and a structured JSON array of deposit entries extracted from the markdown tables. The workflow operates synchronously, aggregating page transcriptions before data extraction.

Markdown text output retaining tables, headings, and document structure.
JSON array output with deposit records including date, description, and amount.
Synchronous aggregation of all page transcriptions prior to final extraction step.
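Because the deposit array is produced by a language model, a consumer may want to check its shape before using it. The sketch below is one possible validation step, not part of the workflow itself. The field names come from the description above; the accepted amount formats and the sample record are assumptions made for illustration.

```python
import json
from decimal import Decimal, InvalidOperation

REQUIRED_FIELDS = ("date", "description", "amount")

def parse_deposits(raw_json: str) -> list[dict]:
    """Parse the extractor's JSON array and reject malformed records."""
    records = json.loads(raw_json)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of deposits")
    cleaned = []
    for index, record in enumerate(records):
        if not isinstance(record, dict):
            raise ValueError(f"record {index} is not an object")
        missing = [field for field in REQUIRED_FIELDS if field not in record]
        if missing:
            raise ValueError(f"record {index} is missing {missing}")
        try:
            # Assumption: amounts may arrive as numbers or strings like "1,250.00".
            amount = Decimal(str(record["amount"]).replace(",", ""))
        except InvalidOperation:
            raise ValueError(f"record {index} has a non-numeric amount") from None
        cleaned.append({
            "date": str(record["date"]),
            "description": str(record["description"]),
            "amount": amount,
        })
    return cleaned

# Invented sample data, for illustration only.
sample = '[{"date": "2024-01-05", "description": "Payroll", "amount": "1,250.00"}]'
deposits = parse_deposits(sample)
```

Using `Decimal` rather than `float` keeps currency amounts exact, which matters for reconciliation.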

Workflow — End-to-End Execution

Step 1: Trigger

The workflow begins with a manual trigger node labeled “When clicking ‘Test workflow’,” allowing controlled initiation. This node requires user interaction to start processing.

Step 2: Processing

The “Get Bank Statement” node downloads a bank statement PDF from Google Drive using OAuth2 credentials and a specified file ID. The “Split PDF into Images” node sends the PDF to an external service that converts each page into separate JPEG images at 300 DPI, returning a ZIP archive. The subsequent node extracts this ZIP archive, isolating individual images. A code node transforms these binaries into a list for further processing. Images are then sorted by filename to maintain page sequence and resized to 75% scale for optimized transcription.
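The extraction and listing steps above can be sketched outside n8n as follows. This is a minimal illustration using Python's standard library, not the workflow's actual code node. It assumes page filenames sort lexicographically in page order (for example, zero-padded page numbers).

```python
import io
import zipfile

IMAGE_SUFFIXES = (".jpg", ".jpeg")

def extract_page_images(zip_bytes: bytes) -> list[tuple[str, bytes]]:
    """Return (filename, data) pairs for every JPEG in the archive, sorted by name."""
    pages = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            # Skip directories and anything that is not a JPEG page image.
            if info.is_dir() or not info.filename.lower().endswith(IMAGE_SUFFIXES):
                continue
            pages.append((info.filename, archive.read(info)))
    # Assumption: names sort lexicographically in page order.
    return sorted(pages, key=lambda page: page[0])
```

Working on in-memory bytes mirrors the transient handling described for the workflow, since nothing is written to disk.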

Step 3: Analysis

The resized images are transcribed into markdown by the Google Gemini Chat Model node. The model is prompted to faithfully replicate text, headings, and tables, and to convert complex layouts, including multi-row cells and horizontally adjacent tables, into vertical markdown tables. The transcriptions of all pages are aggregated into a single JSON field. An information extraction node backed by a second Gemini Chat Model node then parses the aggregated markdown and extracts deposit rows with date, description, and amount fields.
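The aggregation step can be pictured with the sketch below, which joins per-page markdown into one field. The `markdown` field name and the page marker comments are hypothetical additions for illustration; the workflow's own aggregate node may use a different field name and no markers.

```python
def aggregate_pages(pages: list[str]) -> dict[str, str]:
    """Join per-page markdown into one field, labelling each non-empty page."""
    sections = [
        f"<!-- page {number} -->\n{text.strip()}"
        for number, text in enumerate(pages, start=1)
        if text.strip()
    ]
    return {"markdown": "\n\n".join(sections)}

# Invented page transcriptions, for illustration only.
combined = aggregate_pages([
    "# Statement\n",
    "   ",
    "| Date | Amount |\n|---|---|\n| 01/05 | 100.00 |\n",
])
```

Numbering pages before dropping empty ones keeps each marker aligned with the original page, which helps when tracing an extracted deposit back to its source page.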

Step 4: Delivery

The workflow produces two primary outputs: a combined markdown transcription of the entire bank statement and a structured JSON array containing extracted deposit entries. These outputs are returned synchronously at the end of the workflow for downstream consumption or integration.

Use Cases

Scenario 1

A financial analyst needs to convert scanned bank statements into a machine-readable format. This workflow transforms scanned PDFs into markdown text, preserving tables and layout, enabling automated extraction of deposit data for reconciliation and reporting.

Scenario 2

An accounting software developer requires a no-code integration to process monthly bank statements. This automation pipeline downloads PDFs from Google Drive, converts pages to images, transcribes them into markdown, and extracts deposit line items as structured JSON, facilitating seamless data ingestion.

Scenario 3

A compliance team must audit deposits across multiple bank statements, some scanned and some digital. This workflow handles both formats by converting every PDF to images first, so deposit records pass through the same transcription and extraction steps regardless of document type.

How to use

After importing this workflow into n8n, configure the Google Drive OAuth2 credentials to enable PDF download access. Replace the file ID in the “Get Bank Statement” node with the ID of your target statement file. Ensure connectivity to the external PDF-to-image API, or deploy a self-hosted equivalent if the statements must not leave your infrastructure. Provide Google Gemini API credentials for the transcription and extraction nodes. Run the workflow manually via the trigger node to convert the bank statement into markdown and extract deposits. Outputs are available immediately after execution for inspection or integration.

Comparison — Manual Process vs. Automation Workflow

Attribute	Manual/Alternative	This Workflow
Steps required	Multiple manual steps including downloading, converting, transcribing, and data entry.	Single automated pipeline from PDF download through data extraction.
Consistency	Varies by manual transcription accuracy and human error.	Same prompts and extraction logic applied to every page and document.
Scalability	Limited by manual labor and document volume.	Scalable to multiple documents with minimal manual intervention.
Maintenance	High due to manual workflows and error correction.	Low, maintained through node configuration and credential updates.

Technical Specifications

Environment	n8n workflow automation platform
Tools / APIs	Google Drive API (OAuth2), Stirling PDF conversion API, Google Gemini API (configured through n8n's "Google Gemini(PaLM)" credential)
Execution Model	Synchronous, manual trigger initiation
Input Formats	PDF files (scanned or digital)
Output Formats	Markdown text, JSON array of deposit entries
Data Handling	Transient processing, no persistent storage within workflow
Known Constraints	Requires external PDF-to-image conversion service; dependent on API availability
Credentials	Google Drive OAuth2, Google Gemini API key

Implementation Requirements

Valid Google Drive OAuth2 credentials with access to target PDF file.
Access to an external PDF-to-image conversion service supporting multipart form-data uploads.
Google Gemini AI API credentials for transcription and information extraction nodes.

Configuration & Validation

Verify Google Drive OAuth2 connection and confirm access to the specified PDF file ID.
Test connectivity and response from the external PDF-to-image conversion API with sample PDFs.
Validate Google Gemini AI credentials by running the transcription node on sample images and checking markdown output integrity.
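One way to check markdown output integrity is to confirm that every row of a transcribed table has the same number of cells as its header. The sketch below is an illustrative check, not something the workflow ships with. It treats any line that starts and ends with a pipe as a table row and does not handle escaped pipes inside cells.

```python
def table_column_mismatches(markdown: str) -> list[int]:
    """Return 1-based line numbers of table rows whose cell count differs
    from the header row of the table they belong to."""
    problems = []
    expected = None  # cell count of the current table's header row
    for number, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.strip()
        is_row = len(stripped) > 1 and stripped.startswith("|") and stripped.endswith("|")
        if not is_row:
            expected = None  # any non-table line ends the current table
            continue
        cells = stripped[1:-1].split("|")
        if expected is None:
            expected = len(cells)
        elif len(cells) != expected:
            problems.append(number)
    return problems

# Invented sample tables, for illustration only.
good = "| Date | Amount |\n|---|---|\n| 01/05 | 100.00 |"
bad = "| Date | Amount |\n|---|---|\n| 01/05 | Payroll | 100.00 |"
good_result = table_column_mismatches(good)
bad_result = table_column_mismatches(bad)
```

A non-empty result points at the rows worth inspecting by eye against the source page.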

Data Provenance

Workflow triggered manually via “When clicking ‘Test workflow’” manual trigger node.
Google Drive node downloads bank statement PDFs using OAuth2 credentials.
Google Gemini Chat Model nodes power both markdown transcription and deposit extraction processes.

FAQ

How is the bank statement to markdown conversion workflow triggered?

The workflow is initiated manually through the “When clicking ‘Test workflow’” node, requiring user interaction to start processing.

Which tools or models does the orchestration pipeline use?

The pipeline integrates Google Drive for PDF retrieval, an external PDF-to-image conversion service, and Google Gemini AI models for both markdown transcription and deposit data extraction.

What does the response look like for client consumption?

The workflow outputs combined markdown text representing the full bank statement and a structured JSON array containing extracted deposit entries with date, description, and amount fields.

Is any data persisted by the workflow?

No persistent storage is implemented; all data processing is transient within the workflow execution context.

How are errors handled in this integration flow?

Error handling relies on n8n’s platform defaults; no custom retry or backoff mechanisms are configured in this workflow.
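For callers who wrap the external API calls in their own code, a custom retry with exponential backoff could look like the sketch below. It is a generic illustration with hypothetical names, not a mechanism present in this workflow. Within n8n, per-node retry settings may be a simpler route.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def call_with_retry(action: Callable[[], T], attempts: int = 3,
                    base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Run action, retrying on any exception with exponential backoff."""
    for attempt in range(attempts):
        try:
            return action()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next try.
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

The `sleep` parameter is injectable so the function can be exercised in tests without real delays.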

Conclusion

This workflow converts bank statement PDFs into structured markdown and extracts deposit data using a vision language model. It supports both scanned and digital PDFs by rendering pages as images and resizing them before transcription. It depends on an external PDF-to-image conversion service and on the Google Gemini API, so an outage of either will stop the workflow. When those services are available, it delivers structured outputs suitable for financial data processing with minimal manual intervention.

Additional information

Use Case	Finance & Accounting
Platform	n8n
Risk Level (EU)	GPAI
Tech Stack	Custom API
Trigger Type	Manual Run
Skill Level	Developer friendly
Data Sensitivity	Contains PII, Finance Data