From 8fcc39944991b48cbf4fd83f38bcfb97a4698a4c Mon Sep 17 00:00:00 2001
From: Nubenetes Bot <bot@nubenetes.com>
Date: Sat, 16 May 2026 23:16:46 +0200
Subject: [PATCH] docs: finalize project mandates with semantic deduplication,
 tiered monitoring, and RSS discovery policies

---
 GEMINI.md | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/GEMINI.md b/GEMINI.md
index 371fd331..1606ace0 100644
--- a/GEMINI.md
+++ b/GEMINI.md
@@ -48,11 +48,18 @@ This file contains the accumulated instructions and long-term vision for the aut
     - **Exhaustive Initialization**: The system supports a `FORCE_FULL_CHECK` environment variable to bypass all caches (e.g., 21-day health cache) and force a full re-validation and re-enrichment of the entire 17k+ link archive.
     - **No Trusted Bypassing**: All domains, including high-trust ones (GitHub, Google, AWS), MUST be verified for link validity. Trusted status only grants a lower priority for aggressive scraper rotation, not a bypass for existence checks.
     - **Manual Priority**: AI agents MUST NOT overwrite existing manual descriptions in the V1 archive files. Enrichment is strictly for `inventory.yaml` and the V2 portal.
-23. **Canonical URL Normalization**: To prevent duplication and fragmented metadata, all agents MUST normalize URLs before any inventory operation.
+23. **Canonical URL Normalization & Semantic Deduplication**: To prevent duplication and fragmented metadata, all agents MUST normalize URLs before any inventory operation.
     - **Tracking Stripping**: Systematically remove UTM parameters, social media trackers (X.com, LinkedIn), and URL fragments (`#`).
     - **Protocol Uniformity**: Standardize on `https://` whenever possible.
-    - **Merge Logic**: Metadata from multiple sources for the same canonical URL MUST be merged, prioritizing the highest star rating and most recent date.
-24. **Dynamic AI Model Discovery**: To remain at the cutting edge and ensure system stability, all agents MUST use the dynamic model discovery engine.
+    - **Semantic Merge Logic**: If multiple URLs point to the same technical project (e.g., `user.github.io` vs `github.com/user/repo`), the agent MUST consolidate them into a single canonical reference, prioritizing the primary repository root.
+    - **Metadata Merge**: Metadata from multiple sources for the same canonical URL MUST be merged, prioritizing the highest star rating and most recent date.
+24. **Multi-Source Knowledge Discovery**: The discovery engine MUST be extensible beyond social media.
+    - **Engineering Blogs**: High-depth technical content from engineering blogs (via RSS/Atom) MUST be prioritized for high-impact dimensions.
+    - **Source Diversity**: Monitor X.com, GitHub Trending, and RSS Feeds to maintain a balanced flow of technical news and architectural deep-dives.
+25. **Tiered Health Monitoring Strategy**: To balance resource efficiency with high reliability:
+    - **Daily Critical Audit**: High-priority assets (`[DE FACTO STANDARD]` and `[ENTERPRISE-STABLE]`) MUST be verified daily so that broken links to core ecosystem tools are detected within one day.
+    - **Quarterly Exhaustive Scan**: The complete 17,000+ link archive MUST undergo a full health audit every 3 months. This cycle handles large-scale cleanup, pruning of orphaned metadata, and canonical URL updates.
+26. **Dynamic AI Model Discovery**: To remain at the cutting edge and ensure system stability, all agents MUST use the dynamic model discovery engine.
     - **Live Discovery**: Query the `models.list` API at runtime to identify actually available models for each key.
     - **Scoring & Ranking**: Prioritize models using the established 2026 hierarchy (Generation 3.x > 2.x > 1.x; Pro > Flash).
     - **Resilient Fallback**: Automatically transition between models and API keys upon encountering 404 (Unsupported) or 429 (Rate Limit) errors.