Description
Overview
This n8n automation workflow retrieves and analyzes historical Hacker News front-page headlines for the same calendar day across multiple years. Running on a daily schedule, it fetches the front page for today's month and day in every year back to 2007, then aggregates and categorizes the headlines to show how technology topics have shifted over time.
Key Benefits
- Automates daily extraction of Hacker News headlines, ensuring consistent long-term data capture.
- Generates a chronological list of dates for multi-year headline retrieval, supporting trend analysis.
- Utilizes HTML parsing to accurately extract headline text while excluding extraneous elements.
- Leverages a no-code integration with a large language model to categorize and summarize key headlines.
Product Overview
The workflow starts from a schedule trigger set to 21:00 daily. It generates a list of ISO-formatted dates matching the current month and day for each year from the present back to 2007, with special handling so that no date earlier than February 19, 2007 is requested. Each date is processed individually by an HTTP request node that fetches the Hacker News front-page HTML for that date, using batched requests with a 3-second interval to moderate load. The fetched HTML is parsed to obtain headline titles via CSS selectors, explicitly excluding nested span elements so that only the headline text remains. Headlines are paired with their respective dates and aggregated into a single JSON array. This consolidated dataset is passed to an LLM chain node configured with a detailed prompt to identify and thematically categorize the top 10-15 headlines across years, outputting Markdown with year-prefixed, hyperlinked headlines. The workflow uses the Google Gemini Chat Model for the language processing and then sends the formatted output via Telegram. Error handling relies on the platform defaults, with no custom retry or backoff strategies.
Features and Outcomes
Core Automation
The pipeline runs on a schedule trigger and builds the date array at run time, then retrieves each year's front page in sequence. Each date maps to exactly one HTTP request, and the results are aggregated so they can be reviewed as a whole.
- Single-pass evaluation of each historical date to fetch front-page headlines.
- Structured combination of headlines with date metadata for unified processing.
- Automated Markdown-formatted output generation via an LLM chain.
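The pairing and aggregation step can be pictured with a short Python sketch. It illustrates the idea only and is not the workflow's actual node configuration; the field names `date` and `headlines` and the newest-first ordering are assumptions.

```python
import json


def aggregate(pages: dict[str, list[str]]) -> str:
    """Pair each ISO date with its headlines and serialize one JSON array.

    `pages` maps an ISO date string to the headline titles extracted for it.
    The keys "date" and "headlines" are hypothetical names for illustration.
    """
    records = [
        {"date": day, "headlines": titles}
        # ISO 8601 dates sort chronologically as plain strings.
        for day, titles in sorted(pages.items(), reverse=True)
    ]
    return json.dumps(records, ensure_ascii=False)
```

A single array like this gives the LLM chain every year's headlines in one prompt input, which is what lets it compare themes across years.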
Integrations and Intake
This orchestration pipeline integrates with Hacker News via HTTP GET requests and uses CSS selectors to extract relevant HTML content. Authentication is not required for data retrieval, and the expected payload is a standard front-page HTML response filtered by date query parameters.
- HTTP Request node queries Hacker News front page by historical date.
- HTML Extract node parses headlines using CSS selector `.titleline` excluding nested spans.
- Google Gemini Chat Model node for LLM-based text analysis and summarization.
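Outside n8n, the date-filtered requests with a pause between batches could look roughly like the sketch below. The URL pattern `https://news.ycombinator.com/front?day=YYYY-MM-DD` and the batch size of 1 are assumptions, since the description only mentions a date query parameter and a 3-second interval. The `fetch` and `sleep` callables are injected so the pacing logic can be exercised without network access.

```python
import time
from typing import Callable, Iterable

# Assumed endpoint; the description only says the page is filtered by a date parameter.
FRONT_PAGE_URL = "https://news.ycombinator.com/front?day={day}"


def fetch_in_batches(
    days: Iterable[str],
    fetch: Callable[[str], str],
    batch_size: int = 1,          # assumption: batch size is not stated in the description
    interval_s: float = 3.0,      # the 3-second interval from the description
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, str]:
    """Fetch one page per date, pausing `interval_s` seconds between batches."""
    days = list(days)
    pages: dict[str, str] = {}
    for start in range(0, len(days), batch_size):
        if start:  # pause between batches, not before the first one
            sleep(interval_s)
        for day in days[start:start + batch_size]:
            pages[day] = fetch(FRONT_PAGE_URL.format(day=day))
    return pages
```

For a live run, `fetch` could be something like `lambda url: urllib.request.urlopen(url).read().decode("utf-8")`.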
Outputs and Consumption
The workflow produces Markdown-formatted text categorizing top headlines with year-prefixed hyperlinks. The output is sent synchronously to a Telegram channel, formatted for direct consumption by subscribers or downstream applications.
- Markdown text with bullet points grouped by thematic categories.
- Synchronous delivery to Telegram using the configured API credentials and target chatId.
- Output includes year, headline title, and source URL for reference.
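In the workflow the LLM writes this Markdown itself. The sketch below only illustrates the described shape (thematic categories, year-prefixed hyperlinked bullets), for example to build fixtures for a downstream consumer. The exact layout, including the bold category line, is an assumption.

```python
def format_digest(categories: dict[str, list[tuple[int, str, str]]]) -> str:
    """Render {category: [(year, title, url), ...]} as a Markdown digest.

    The layout is a hypothetical approximation of the described output,
    not the text the LLM chain is guaranteed to produce.
    """
    lines: list[str] = []
    for category, items in categories.items():
        lines.append(f"*{category}*")
        for year, title, url in items:
            lines.append(f"- {year}: [{title}]({url})")
        lines.append("")  # blank line between categories
    return "\n".join(lines).rstrip()
```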
Workflow — End-to-End Execution
Step 1: Trigger
The workflow is initiated by a schedule trigger node configured to execute daily at 21:00 (9 PM). This deterministic timing ensures consistent daily updates of historical headline data.
Step 2: Processing
Upon triggering, a code node generates an array of dates from the current year back to 2007, aligning month and day while applying special logic to exclude dates before February 19, 2007. The list is cleaned and split for individual processing. Basic presence checks validate date formatting before HTTP requests proceed.
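The date-list logic can be sketched in Python as follows. This illustrates the described behavior and is not the code node's actual source. Skipping February 29 in non-leap years is an assumption, as the description does not say how that case is handled.

```python
from datetime import date

# Earliest date the workflow requests, per the description.
FIRST_ARCHIVE_DATE = date(2007, 2, 19)


def same_day_across_years(today: date, first: date = FIRST_ARCHIVE_DATE) -> list[str]:
    """Return ISO dates for today's month and day, newest first, back to `first`."""
    dates: list[str] = []
    for year in range(today.year, first.year - 1, -1):
        try:
            candidate = date(year, today.month, today.day)
        except ValueError:
            # February 29 in a non-leap year: skipped (assumption).
            continue
        if candidate >= first:
            dates.append(candidate.isoformat())
    return dates
```

For example, a run on January 10 produces no 2007 entry, because January 10, 2007 falls before the February 19 cutoff.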
Step 3: Analysis
For each date, an HTTP request retrieves the Hacker News front page HTML. An HTML extraction node parses headlines using the `.titleline` CSS selector, excluding nested span elements to isolate headline text. Headlines and dates are merged, aggregated into a single JSON array, and passed to a Basic LLM Chain node, which applies a custom prompt to identify, categorize, and summarize the most significant headlines across years.
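The extraction rule (text inside `.titleline`, minus anything in nested spans) can be approximated with Python's standard-library parser. This is a minimal sketch rather than the HTML Extract node's implementation. It assumes each headline sits in an element whose class list contains `titleline`, with the site label in a nested `<span>`.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; ignored so depth tracking stays balanced.
VOID_TAGS = {"br", "img", "hr", "wbr", "input", "meta", "link"}


class TitleLineParser(HTMLParser):
    """Collect the text of each "titleline" element, ignoring nested <span> text."""

    def __init__(self) -> None:
        super().__init__()
        self.headlines: list[str] = []
        self._depth = 0       # open-element depth inside the current titleline (0 = outside)
        self._skip_depth = 0  # > 0 while inside a nested <span>
        self._parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth == 0:
            classes = (dict(attrs).get("class") or "").split()
            if "titleline" in classes:
                self._depth = 1
                self._parts = []
            return
        self._depth += 1
        if tag == "span" or self._skip_depth:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._depth == 0:
            return
        if self._skip_depth:
            self._skip_depth -= 1
        self._depth -= 1
        if self._depth == 0:  # the titleline element itself just closed
            self.headlines.append("".join(self._parts).strip())

    def handle_data(self, data):
        if self._depth and not self._skip_depth:
            self._parts.append(data)


def extract_headlines(html: str) -> list[str]:
    parser = TitleLineParser()
    parser.feed(html)
    return parser.headlines
```

The sketch expects well-formed markup with matching closing tags. A production parser would need to tolerate unclosed elements.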
Step 4: Delivery
The categorized Markdown output generated by the LLM is delivered synchronously to a Telegram channel via an API credentialed node, formatted to support Markdown parsing without appended attribution, making the information immediately accessible to recipients.
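n8n's Telegram node handles the API call internally. For readers who want to see the equivalent raw request, here is a minimal sketch against the Telegram Bot API `sendMessage` method. The token and chat ID values are placeholders, and the choice of the legacy `Markdown` parse mode is an assumption. The function only builds the request, so nothing is sent until it is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request


def build_send_message_request(token: str, chat_id: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a Telegram Bot API sendMessage request."""
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    payload = {
        "chat_id": chat_id,
        "text": text,
        "parse_mode": "Markdown",  # assumption: legacy Markdown mode
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Telegram limits a single message to 4096 characters, so a long digest may need to be split before sending.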
Use Cases
Scenario 1
Technology historians seek to analyze shifts in industry focus over time. This workflow automates the extraction and categorization of daily historical headlines, providing structured insight into evolving technological themes on specific calendar days.
Scenario 2
Content curators want to deliver timely retrospectives highlighting significant tech news anniversaries. Using this automation pipeline, they receive daily Markdown summaries of top headlines from past years, enabling streamlined content generation without manual aggregation.
Scenario 3
Data analysts require longitudinal datasets of tech news trends for machine learning modeling. This workflow produces consistent, date-aligned headline arrays spanning over a decade, facilitating comparative event-driven analysis across years.
How to use
Deploy the workflow within the n8n environment and configure the Telegram API credentials for message dispatch. The schedule trigger activates the pipeline daily at the preset time, automatically generating the historical date list and fetching corresponding headlines. Results are processed through the LLM chain and delivered as Markdown via Telegram. Users should verify API access and connectivity to Hacker News and Telegram services. Output can be reviewed live within the Telegram channel or logged for archival analysis.
Comparison — Manual Process vs. Automation Workflow
| Attribute | Manual/Alternative | This Workflow |
|---|---|---|
| Steps required | Multi-step manual data collection, parsing, and formatting per year. | Single automated pipeline with scheduled execution and integrated processing. |
| Consistency | Variable, dependent on manual effort and human error. | Deterministic, with fixed schedule and programmatic data handling. |
| Scalability | Limited, manual effort increases with years added. | Scales linearly with number of years; automated batch requests manage load. |
| Maintenance | High, requiring ongoing human oversight and content verification. | Moderate, reliant on stable HTML structure and API availability. |
Technical Specifications
| Specification | Details |
|---|---|
| Environment | n8n automation platform with internet access |
| Tools / APIs | Hacker News front page HTTP endpoint, Telegram API, Google Gemini Chat Model |
| Execution Model | Scheduled event-driven batch processing with synchronous delivery |
| Input Formats | ISO 8601 date strings for historical day selection |
| Output Formats | Markdown-formatted text delivered via Telegram message |
| Data Handling | Transient processing with no data persistence beyond runtime |
| Known Constraints | Dependent on Hacker News HTML structure and API availability |
| Credentials | Telegram API credentials (bot access token), Google Gemini (PaLM) API key for AI model access |
Implementation Requirements
- Valid Telegram API credentials configured for message dispatch, plus a Google Gemini API key for the chat model.
- Stable internet connectivity to reach the Hacker News front page, the Google Gemini model, and the Telegram API.
- n8n environment with permissions to execute scheduled triggers and HTTP requests.
Configuration & Validation
- Confirm schedule trigger activates daily at 21:00 and initiates date list generation.
- Verify HTTP requests retrieve valid HTML pages for the specified historical dates.
- Check that Telegram messages contain the expected Markdown output with correct headline formatting.
Data Provenance
- Schedule Trigger node initiates daily workflow execution based on fixed time.
- GetFrontPage HTTP Request node retrieves Hacker News front pages filtered by date parameter.
- Basic LLM Chain node applies prompt-driven AI analysis to aggregate and classify headlines.
FAQ
How is the Hacker News headlines archival automation workflow triggered?
It is triggered by a schedule node configured to run daily at 21:00, initiating data collection for the current calendar day across multiple years.
Which tools or models does the orchestration pipeline use?
The workflow integrates Hacker News HTTP requests, HTML extraction nodes, and uses the Google Gemini Chat Model via an LLM chain for headline categorization and summarization.
What does the response look like for client consumption?
The output is Markdown-formatted text grouping top headlines into thematic bullet points, each prepended with the year and hyperlinked to the original source URL, delivered via Telegram.
Is any data persisted by the workflow?
No data is persisted beyond runtime; all processing occurs transiently within the workflow execution context.
How are errors handled in this integration flow?
The workflow relies on n8n’s default error handling without custom retries or backoff strategies, assuming stable endpoint availability.
Conclusion
This workflow provides a dependable method for automating the collection and thematic analysis of Hacker News front-page headlines across years for the same calendar day. It produces structured Markdown outputs suitable for retrospective insights into technology trends. The process depends on consistent access to Hacker News’s front page HTML structure and Google Gemini model availability, which represents a constraint on operational continuity. Overall, it offers a reproducible, event-driven analysis pipeline that reduces manual effort and ensures uniform data formatting for historical news aggregation.







