Description

Overview

This News Extraction automation workflow retrieves and processes recent news posts from a website that has no RSS feed, using a low-code n8n pipeline. Designed for data engineers and content managers, it automates the extraction of URLs, publication dates, and full content from news listings, then produces summaries and technical keywords using AI language models.

Key Benefits

  • Automates weekly extraction of news posts using CSS selectors from HTML content.
  • Filters news articles by publication date, ensuring only recent posts are processed.
  • Generates concise content summaries with AI, optimizing information consumption.
  • Extracts key technical keywords from articles via AI-driven natural language processing.
  • Stores enriched news data reliably in a structured NocoDB SQL database for further use.

Product Overview

This News Extraction orchestration pipeline triggers weekly based on a scheduled cron event. It initiates by sending an HTTP request to retrieve the HTML of the news listing page. Using HTML extraction nodes configured with precise CSS selectors, it pulls arrays of individual news post links (href attributes) and their corresponding publication dates from specified DOM elements. These arrays are split into individual items and merged by position to associate each link with its date.
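The merge-by-position step can be sketched in plain JavaScript. This is a minimal illustration, not the workflow's actual node configuration; the function name `mergeByPosition` and the field names `link` and `date` are assumptions made for the example.

```javascript
// Pair each extracted link with the date found at the same index.
// `link` and `date` are illustrative field names, not taken from the workflow.
function mergeByPosition(links, dates) {
  // Stop at the shorter array so no item is paired with `undefined`.
  const count = Math.min(links.length, dates.length);
  const merged = [];
  for (let i = 0; i < count; i += 1) {
    merged.push({ link: links[i], date: dates[i] });
  }
  return merged;
}

// Hypothetical extraction results from a listing page.
const exampleLinks = ["/news/a", "/news/b"];
const exampleDates = ["2024-05-01", "2024-04-20"];
const mergedPosts = mergeByPosition(exampleLinks, exampleDates);
console.log(mergedPosts);
```

Pairing by index only works when both selectors match the same number of elements in the same order, which is why the listing page needs a consistent structure.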

A JavaScript code node then filters the combined set to retain only posts published within the last seven days. For each filtered link, the workflow fetches the full news article HTML, extracting the title and main content using targeted CSS selectors. The content is sent to an AI text generation node which produces a summary capped at 70 words, alongside another AI call that identifies the top three technical keywords without explanation.
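The seven-day filter might look roughly like the sketch below. It assumes the extracted dates are in a format `Date.parse` understands (ISO 8601 here); the source does not state the site's date format, and the names `filterRecent` and `maxAgeDays` are hypothetical.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Keep only posts whose date falls within the last `maxAgeDays` days.
// Assumes `post.date` is parseable by Date.parse (an assumption; the
// real site may need custom date parsing).
function filterRecent(posts, now = new Date(), maxAgeDays = 7) {
  const cutoff = now.getTime() - maxAgeDays * MS_PER_DAY;
  return posts.filter((post) => {
    const published = Date.parse(post.date);
    // Unparseable dates produce NaN and are dropped.
    return !Number.isNaN(published) && published >= cutoff;
  });
}

// A fixed "now" keeps the example reproducible.
const fixedNow = new Date("2024-05-03T00:00:00Z");
const recentPosts = filterRecent(
  [
    { link: "/news/a", date: "2024-05-01" },
    { link: "/news/b", date: "2024-04-20" },
    { link: "/news/c", date: "unknown" },
  ],
  fixedNow
);
console.log(recentPosts);
```

Passing the current time as a parameter makes the filter easy to test against a known date.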

Summary and keyword outputs are renamed and merged for clarity, then combined with the original metadata (title, date, link). The final enriched JSON objects are pushed into a NocoDB SQL database table configured with appropriate fields, enabling structured storage and downstream querying. The workflow operates synchronously within each step, with default platform error handling applied. Authentication for AI and database access is via API keys securely managed by n8n credentials.
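The rename-and-merge step before the database write can be pictured as follows. The output keys (Title, Date, Link, Summary, Keywords) come from this description; the input shapes, including the `text` field on the AI outputs, are assumptions for illustration only.

```javascript
// Combine article metadata with the two AI outputs into one record.
// The `text` property on the AI outputs is a hypothetical field name.
function buildRecord(meta, summaryOutput, keywordsOutput) {
  return {
    Title: meta.title,
    Date: meta.date,
    Link: meta.link,
    Summary: summaryOutput.text.trim(),
    Keywords: keywordsOutput.text.trim(),
  };
}

const exampleRecord = buildRecord(
  { title: "Example post", date: "2024-05-01", link: "https://example.com/news/a" },
  { text: " A short summary. " },
  { text: "API, OAuth, JSON" }
);
console.log(exampleRecord);
```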

Features and Outcomes

Core Automation

This news extraction pipeline accepts raw HTML from a news listing page as input, applies CSS selector-based extraction to isolate links and publication dates, and uses a date-based filter for recent content. AI-powered summarization and keyword extraction nodes generate concise and relevant metadata for each article.

  • Links and dates are merged by position and filtered by publication date in a single pass, so a given listing page and run date always yield the same set of posts.
  • Seamless combination of structured metadata with AI-generated text enrichments.
  • Synchronous node execution ensures consistent data flow and output integrity.

Integrations and Intake

The workflow integrates with external HTTP endpoints for HTML retrieval and uses OpenAI’s GPT model for natural language processing, authenticated via API keys. The intake consists of HTML pages containing news listings, expected to have consistent CSS structure for reliable extraction.

  • HTTP Request nodes fetch listing and detail pages for content scraping.
  • OpenAI API leverages GPT for content summarization and keyword extraction.
  • NocoDB API authenticates via token to store processed news data securely.
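Because the extracted links are `href` attribute values, they may be relative paths that need resolving before the detail pages can be fetched. Whether the target site uses relative links is an assumption; the sketch below uses the standard `URL` constructor, and `toAbsoluteUrl` and the example URLs are hypothetical.

```javascript
// Resolve an extracted href against the listing page URL.
// Absolute hrefs pass through unchanged.
function toAbsoluteUrl(href, listingUrl) {
  return new URL(href, listingUrl).href;
}

const listingUrl = "https://example.com/news/";
console.log(toAbsoluteUrl("/news/post-1", listingUrl));
console.log(toAbsoluteUrl("post-2", listingUrl));
console.log(toAbsoluteUrl("https://other.example/x", listingUrl));
```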

Outputs and Consumption

Outputs include JSON objects combining news titles, publication dates, URLs, AI-generated summaries, and keyword lists. Each object is written to a structured SQL database as the final step of the run, enabling efficient querying and integration into downstream systems.

  • Structured JSON objects with keys: Title, Date, Link, Summary, Keywords.
  • Data is stored in a NocoDB SQL table optimized for news metadata management.
  • All outputs maintain consistent formatting for automated consumption workflows.

Workflow — End-to-End Execution

Step 1: Trigger

The workflow is initiated by a scheduled trigger node configured to run weekly on a specific day and time. This time-based event ensures periodic retrieval of news updates without manual intervention.

Step 2: Processing

Upon trigger, an HTTP Request node retrieves the HTML of the news listing page. Two HTML extraction nodes then apply CSS selectors to extract arrays of post links and publication dates. These arrays are split into individual JSON items for further processing and merged by their index position. Basic presence checks ensure extracted data is valid before filtering.
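The description does not say what the presence checks cover. One plausible form, sketched here under that caveat, is to confirm that both selectors returned the same number of non-blank values before merging, since a missed element would silently misalign links and dates. The name `checkExtraction` and the reason strings are hypothetical.

```javascript
// Sanity-check the two extracted arrays before merging them by index.
function checkExtraction(links, dates) {
  if (links.length === 0) {
    return { ok: false, reason: "no links extracted" };
  }
  if (links.length !== dates.length) {
    return { ok: false, reason: "link/date count mismatch" };
  }
  const isBlank = (value) => typeof value !== "string" || value.trim() === "";
  if (links.some(isBlank) || dates.some(isBlank)) {
    return { ok: false, reason: "blank value" };
  }
  return { ok: true, reason: "" };
}

console.log(checkExtraction(["/news/a", "/news/b"], ["2024-05-01"]));
```

A count mismatch is usually a sign that one of the CSS selectors no longer matches the page.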

Step 3: Analysis

A JavaScript code node filters posts to include only those published within the last seven days. For each filtered post, an HTTP Request node fetches the full article HTML, from which title and content are extracted. AI-powered nodes then generate a succinct summary (capped at 70 words) and identify the top three technical keywords from the content.
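In the workflow, the 70-word cap and the three-keyword limit are requested from the model itself. Since a model can overshoot such limits, an optional guard could enforce them afterwards. The sketch below is an addition for illustration, not part of the described workflow; `capWords` and `parseKeywords` are hypothetical helpers, and the comma- or newline-separated keyword format is an assumption.

```javascript
// Truncate text to at most `maxWords` words (also collapses whitespace).
function capWords(text, maxWords = 70) {
  const words = text.trim().split(/\s+/).filter(Boolean);
  return words.slice(0, maxWords).join(" ");
}

// Split a keyword reply on commas or newlines, strip list markers such as
// "1." or "-", and keep at most `limit` entries.
function parseKeywords(raw, limit = 3) {
  return raw
    .split(/[,\n]/)
    .map((item) => item.replace(/^\s*(?:[-•*]|\d+[.)])\s*/, "").trim())
    .filter((item) => item !== "")
    .slice(0, limit);
}

console.log(capWords("one two three four", 3));
console.log(parseKeywords("1. Kubernetes\n2. gRPC\n3. WebAssembly\n4. Extra"));
```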

Step 4: Delivery

The enriched news data, combining original metadata with AI-generated summaries and keywords, is merged and sent to a NocoDB SQL database node. Each record is stored in predefined fields, enabling persistent archival and downstream analysis.

Use Cases

Scenario 1

Content managers need timely updates from news sites lacking RSS feeds. This workflow scrapes the latest posts, summarizes content, and extracts keywords, delivering structured data weekly. The result is a consistent feed of relevant news summaries ready for editorial review.

Scenario 2

Data analysts require consolidated, searchable news metadata. By filtering posts by date and enriching them with AI-generated summaries and keywords, this automation pipeline reduces manual curation effort and produces structured insights for trend analysis.

Scenario 3

IT teams managing content ingestion pipelines benefit from automated extraction and storage of news articles in SQL databases. This workflow ensures deterministic extraction steps and reliable data enrichment for integration with BI tools or content management systems.

How to use

Import the workflow into your n8n instance and configure credentials for OpenAI API and NocoDB database access. Adjust the schedule trigger to the preferred weekly interval. Verify and update CSS selectors if the news site’s HTML structure changes. Run the workflow manually once to validate extraction and data flow. Once active, the workflow will run automatically, producing weekly batches of summarized news posts with technical keywords, stored in the configured database.

Comparison — Manual Process vs. Automation Workflow

Attribute | Manual/Alternative | This Workflow
Steps required | Multiple manual steps: browsing, copying links, summarizing, keyword extraction, and data entry. | Automates all steps from scraping to data storage in a single pipeline.
Consistency | Subject to human error and inconsistent summarization quality. | Deterministic extraction and AI-generated summaries ensure uniform output.
Scalability | Limited by manual effort; impractical for large volumes or frequent updates. | Scales automatically with scheduled triggers and batch processing.
Maintenance | High effort to maintain selectors and manual processes. | Requires occasional updates to CSS selectors and credential management only.

Technical Specifications

Environment: n8n automation platform with internet access
Tools / APIs: HTTP Request, HTML Extract, JavaScript, OpenAI GPT API, NocoDB API
Execution Model: Synchronous node execution with scheduled trigger
Input Formats: HTML pages of news listings and individual articles
Output Formats: Structured JSON objects stored in NocoDB SQL database
Data Handling: Transient processing with no persistence outside configured database
Known Constraints: Relies on stable CSS selectors and external API availability
Credentials: OpenAI API key, NocoDB API token

Implementation Requirements

  • Valid OpenAI API credentials with access to GPT model for summarization and keyword extraction.
  • NocoDB API token configured with write permissions to the target SQL database table.
  • Network access to the news website and external APIs (OpenAI, NocoDB) without firewall restrictions.

Configuration & Validation

  1. Set up API credentials for OpenAI and NocoDB within n8n credentials manager.
  2. Verify CSS selectors for link and date extraction match the current news site HTML structure.
  3. Run the workflow manually to confirm correct extraction, AI summarization, keyword generation, and database insertion.
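During the manual validation run, it can help to check that each record headed for the database has all five output fields filled in. The sketch below is one hedged way to do that; the key names come from the output description, while `missingKeys` is a hypothetical helper.

```javascript
const REQUIRED_KEYS = ["Title", "Date", "Link", "Summary", "Keywords"];

// Return the names of required fields that are absent or blank.
function missingKeys(record) {
  return REQUIRED_KEYS.filter((key) => {
    const value = record[key];
    return value === undefined || value === null || String(value).trim() === "";
  });
}

const incompleteRecord = {
  Title: "Example post",
  Date: "2024-05-01",
  Link: "https://example.com/news/a",
  Summary: "",
  Keywords: "API, OAuth, JSON",
};
console.log(missingKeys(incompleteRecord));
```

An empty result means the record is complete; a non-empty one points to the extraction or AI step that needs attention.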

Data Provenance

  • Trigger: Scheduled cron node initiating weekly execution.
  • Extraction Nodes: HTML Extract nodes using CSS selectors for links and dates.
  • AI Processing: OpenAI GPT nodes for generating summaries and technical keywords.

FAQ

How is the News Extraction automation workflow triggered?

The workflow is triggered by a scheduled trigger node configured to run once per week at a specified day and time, initiating the entire extraction and processing pipeline.

Which tools or models does the orchestration pipeline use?

The pipeline integrates HTTP Request nodes for web scraping, HTML extraction nodes with CSS selectors, and OpenAI’s GPT model for AI-powered summarization and keyword extraction, all orchestrated within n8n.

What does the response look like for client consumption?

The output is a structured JSON object containing each news post’s title, publication date, link, AI-generated summary, and a list of three technical keywords, all stored in a NocoDB SQL database.

Is any data persisted by the workflow?

No data is persisted within the workflow itself. The final enriched news data is saved only in the configured NocoDB SQL database for persistent storage and downstream use.

How are errors handled in this integration flow?

The workflow relies on n8n’s default error handling mechanisms. No explicit retry or backoff strategies are configured within the nodes, so failures are logged and require manual intervention.

Conclusion

This News Extraction workflow provides a repeatable process to scrape recent news posts from sites lacking RSS feeds, summarize them, and extract their keywords. By automating content ingestion and enrichment on a weekly schedule, it reduces manual overhead and delivers structured metadata suitable for databases and analytical systems. Because the workflow depends on a stable page structure and on external API availability, its CSS selectors and credentials need periodic validation to stay accurate. Overall, it offers a scalable method for turning raw HTML news content into structured, queryable summaries.

Vendor Information

  • Store Name: clepti
  • Vendor: clepti

About the seller/store

Clepti is an automation specialist focused on dependable AI workflows and agentic systems that ship and stay online. I design end-to-end automations (intake, decision logic, approvals, execution, and audit trails) using robust building blocks: Python, REST/GraphQL APIs, event queues, vector search, and production-grade LLMs. My work centers on measurable outcomes: fewer manual touches, faster cycle times, lower error rates, and clear ROI.

Typical projects include lead qualification and routing, document parsing and enrichment, multi-step data pipelines, customer support deflection with tool-using agents, and reporting that actually reconciles with source systems. I prioritize security (least privilege, logging, PII handling), testability (unit + sandbox runs), and maintainability (versioned prompts, clear configs, readable code). No inflated promises, just stable automation that replaces repetitive work.

If you need an AI agent or workflow that integrates with your stack (CRMs, ticketing, spreadsheets, databases, or custom APIs) and runs every day without babysitting, I can help. Brief me on the problem, constraints, and success metrics; I’ll propose a straightforward plan and build something reliable.

30-Day Money-Back Guarantee

Easy refunds within 30 days of purchase. If you are not happy with the automation/workflow, you get your money back, no questions asked.

News Extraction Automation Workflow with GPT Tools and HTML Formats

This news extraction workflow automates weekly retrieval and AI summarization of recent news posts using GPT tools and HTML formats, providing structured insights and technical keywords for content managers and data engineers.

49.99 $
