docs: finalize performance fix documentation and maturity audit mandates

This commit is contained in:
Nubenetes Bot
2026-05-16 23:06:10 +02:00
parent a1e95bb210
commit 1608840ee4
2 changed files with 14 additions and 3 deletions

View File

@@ -169,7 +169,9 @@ The bot must rotate between profiles to avoid detection:
- **Impact-Driven Synthesis**: Shifted V2 mission from pure "chronological clarity" to "impact-driven synthesis", prioritizing Stars/Impact over dates while maintaining chronological data.
- **Relevance-First Sorting**: Updated V2 logic to prioritize Stars/Impact over dates within dimension categories.
- **Unified Metadata Engine**: Integrated V2's year extraction and professional description logic into the main V1 curation workflow (`src/agentic_curator.py`).
* **Advanced MVQ Cleaning**: Upgraded the `IntelligentLinkCleaner` to use V2's MVQ logic (GitHub activity checks) and unbuffered real-time logging.
- **Advanced MVQ Cleaning**: Upgraded the `IntelligentLinkCleaner` to use V2's MVQ logic (GitHub activity checks) and unbuffered real-time logging.
- **Smart Batching (Performance Fix)**: AI enrichment MUST exclusively use batch processing (e.g., 10 links per prompt). Individual AI calls within large loops are strictly forbidden to prevent 429 rate limit deadlocks and workflow hangs.
- **Maturity Audit Transparency**: All curation workflows MUST maintain the **Maturity Audit Log** (`v2-docs/audit-log.md`) to document technical promotions, reclassifications, and AI-driven curation decisions.
* **AI Observability & Transparency (May 2026)**:
- **Session Tracking**: Every AI call MUST be tracked via `SESSION_TRACKER` to record model usage and key health.
- **Infrastructure Reporting**: All curation PRs MUST include the `Intelligence Report` to provide transparency on models used (Pro vs Flash) and API key identities (Identity A/B).

View File

@@ -82,15 +82,18 @@ Nubenetes is one of the most comprehensive archives in the ecosystem, featuring
### 2.1. The "Heart" of Nubenetes (Stats as of 2026-05-16)
<!-- HEART_STATS_START -->
| Metric | Value |
| :--- | :--- |
| **Total Technical Resources (Links)** | **17109+** |
| **Specialized MD Pages** | **161** |
| **Total Commits** | **4143+** |
| **Primary AI Engine** | **Google Gemini (Agentic)** |
<!-- HEART_STATS_END -->
### 2.2. Top Categories by Density
<!-- TOP_CATEGORIES_START -->
| Category (Markdown Page) | Total Links |
| :--- | :---: |
| [Kubernetes](docs/kubernetes.md) | 1147 |
@@ -103,12 +106,14 @@ Nubenetes is one of the most comprehensive archives in the ecosystem, featuring
| [Devsecops](docs/devsecops.md) | 407 |
| [Managed Kubernetes In Public Cloud](docs/managed-kubernetes-in-public-cloud.md) | 379 |
| [Monitoring](docs/monitoring.md) | 346 |
<!-- TOP_CATEGORIES_END -->
### 2.3. Historical Growth (Commits and References)
The growth of Nubenetes reflects the acceleration of the Cloud Native ecosystem. Since 2026, the adoption of Agentic AI has resulted in a vertical surge in both commit frequency and link discovery.
#### Annual Growth Summary
<!-- ANNUAL_GROWTH_START -->
| Year | Commits | Est. New Refs | Key Milestone |
| :---: | :---: | :---: | :--- |
| 2018 | 350 | 1,445 | **Munich Era (BMW IT-Zentrum)** |
@@ -120,12 +125,15 @@ The growth of Nubenetes reflects the acceleration of the Cloud Native ecosystem.
| 2024 | 53 | 218 | Curation Strategy Pivot |
| 2025 | 5 | 20 | Stability & Research Phase |
| 2026 | 584 | 2,411 | **Agentic AI Surge** (May 2026 Inception) |
<!-- ANNUAL_GROWTH_END -->
#### 2026: The Agentic Monthly Surge
<!-- MONTHLY_SURGE_START -->
| Month | Commits | Est. New Refs | Status |
| :--- | :---: | :---: | :--- |
| 2026-04 | 25 | 103 | Active Curation |
| 2026-05 | 559 | 2,308 | **Agentic Inception (Gemini Era)** |
| 2026-05 | 557 | 2,300 | **Agentic Inception (Gemini Era)** |
<!-- MONTHLY_SURGE_END -->
### 2.4. Content Distribution and Semantic Clustering
@@ -280,6 +288,7 @@ To maximize economic efficiency, all AI agents follow a **Database-First** appro
To maintain a high-performance "Single Source of Truth", Nubenetes implements automated hygiene protocols:
- **Auto-Redirect Fix (Canonical Updates)**: During health checks, if a permanent redirection (301/302) is detected, the engine automatically updates the Markdown files with the final **Canonical URL**. This reduces latency and prevents future link rot.
- **Database Garbage Collection (GC)**: A bi-monthly pruning process identifies orphaned metadata in `data/inventory.yaml` for links that have been removed from the repository, keeping the database lean and professional.
- **Maturity Audit Log**: Every evaluation cycle tracks promotions and reclassifications in a public **Audit Log** (`v2-docs/audit-log.md`). This provides transparency on why resources are moved between tiers (e.g., from Emerging to De Facto Standard).
- **Exhaustive Initialization (Cold-Start)**: The system supports a `FORCE_FULL_CHECK` mechanism. When activated (via the **Force full re-validation** button in GitHub Actions), the engine bypasses all local caches and re-verifies the entire 17,000+ link archive. This is used to build the initial database from scratch or perform massive architectural refreshes.
### 6.4. Multi-Format Synchronization Logic
@@ -294,7 +303,7 @@ To eliminate configuration overhead and ensure Nubenetes always utilizes the fro
2. **Autonomous Scoring and Ranking**: Models are automatically ranked using a **dynamic regex-based algorithm** that extracts version numbers (e.g., 2.0, 3.1, 4.0). Higher versions are prioritized, ensuring zero-config auto-adoption of future frontier models. Tier bonuses are applied (Ultra > Pro > Flash) to prioritize reasoning depth.
3. **Adaptive Rate Limiting (Exponential Backoff)**: When encountering `429 Too Many Requests` errors, the engine implements an **Exponential Backoff with Jitter** strategy. Instead of immediate rotation, it applies a mandatory wait time that increases with consecutive failures, preventing infinite loops and respecting Google's quota resets.
4. **Concurrency Guard (Semaphore)**: To prevent saturating API quotas during high-volume operations (like V2 inventory enrichment), the system utilizes an **Asyncio Semaphore**. This restricts the number of concurrent AI calls (e.g., max 5), ensuring a steady, reliable flow that stays within RPM (Requests Per Minute) limits.
5. **Smart AI Batching (90% Traffic Reduction)**: Instead of processing one link per call, the system groups up to **10 resources into a single AI prompt**. This strategic packaging reduces total API calls by 90%, drastically lowering the risk of `429` errors while optimizing token density for Identity A.
5. **Smart AI Batching (High-Speed Processing)**: Instead of processing one link per call, the system groups up to **10 resources into a single AI prompt**. This strategic packaging reduces total API calls by 90%, eliminating `429` rate limit deadlocks and ensuring high-velocity throughput even for cold-starts.
6. **Pre-Flight Local Caching**: The engine performs an autonomous look-up in `data/inventory.yaml` before any AI operation. If a resource is already indexed and described, it is skipped in the enrichment phase. This makes the marginal cost of repository maintenance near-zero.
### 6.6. AI Intelligence and Observability (Transparency)