Nubenetes Intelligent Curation: Meta-Instructions & Learning Roadmap
This file contains the accumulated instructions and long-term vision for the autonomous maintenance of Nubenetes.com. AI agents must consult this document in every iteration to ensure learning continuity.
🧠 Core Mandates
- Information Preservation: NEVER delete summaries, comments, or stars (🌟) accompanying links. The bot should only update the URL or reorganize the item's position, never delete the descriptive context.
- Persistent Learning: Use `src/memory/health_learning.json` to store knowledge about domains (anti-bot blocks, successful strategies) and navigation patterns.
- Minimum Viable Quality (MVQ): For GitHub/GitLab repositories, the bot MUST check the last commit date. If the repository has had NO activity (commits) in more than 4 years, it must receive a significantly lower `impact_score` and be deprioritized, even if the content remains technically relevant. This ensures Nubenetes stays fresh and focuses on maintained projects.
- Style Guide (Descriptive Summaries): All injected summaries MUST follow a Descriptive style. Avoid generic "clickbait" or action-oriented phrases (e.g., "Check this out"). Instead, provide a clear, neutral description of what the resource contains, its scope, and why it is technically significant for the Kubernetes ecosystem.
- Semantic Interlinking: The bot should identify related categories for each resource. While the full entry is injected into the primary category, a short reference ("See also: Title in [Category]") should be added to up to two related categories to improve site navigation.
- Visual Health Dashboard: Every curation run MUST generate a local `report.html` (outside the repo) for visual validation of metrics, quality (MVQ), and AI decisions.
- Total Resilience: The workflow must be able to continue even if there are individual errors in link or file validations. Prioritize generating a result (PR) even if it is partial.
- Repository Consolidation: In case of a failure in a deep GitHub/GitLab link, always try to validate the repository root before considering it dead. We prefer stable links to repository roots over volatile deep-links.
- URL Expansion: All shortened links (t.co, bit.ly, buff.ly, etc.) MUST be expanded to their original long version before being evaluated or injected. This ensures inventory homogeneity and improves global deduplication precision.
- Official Language (English Only): All injected content (titles, descriptions, headers), execution logs, and automated communications (PRs) MUST be exclusively in ENGLISH. Nubenetes is a global resource and linguistic consistency is critical.
- Workflow-Config Synchronization: The GitHub Actions curation workflow form (`agentic_cron.yml`) MUST remain perfectly synchronized with the curation sources configuration file (`data/curation_sources.yaml`). Any addition, removal, or renaming of topics/categories in the configuration file requires a corresponding update to the workflow's input fields (checkboxes) to ensure users can toggle those sources manually. This maintains consistency between data-driven sources and the UI trigger.
- V2 Elite Maintenance: The Nubenetes V2 (Agentic Elite) edition is a derived view of the V1 archive. It is managed via the `src/v2_optimizer.py` script and stored in the `v2-docs/` directory. The `agentic_v2_builder.yml` workflow synchronizes V2 automatically whenever V1 (`docs/`) is updated (manually or via bot). Standard curation and cleaning workflows must always target the `docs/` directory as the primary source of truth.
- Detailed Logging for V2: When running the V2 Optimizer, agents MUST use unbuffered logging and detailed output messages. If the optimizer returns '0 links kept', the agent MUST investigate the logs to determine if it was due to AI selection or a parsing/API error.
- Persistent V2 Caching: The V2 Optimizer MUST use a persistent cache file (`data/v2_cache.json`) to store AI evaluations (year, quality, category). This is mandatory to minimize API costs and ensure execution speed across 15k+ links.
- GitHub Metadata Enrichment: For all `github.com` resources, the bot MUST attempt to fetch real-time metadata (stars, last commit) using the GitHub API. This data must be included in the V2 rendering to provide current context.
- Resilient Link Health & Global Cleaning:
    - Health Checks: Every V2 generation and global cleaning cycle MUST perform asynchronous health checks using identity rotation (User-Agents) and multiple attempts (3x).
    - V1 Exhaustiveness: The `IntelligentLinkChecker` operating on V1 MUST preserve all technically valid links regardless of their age. Deletion is strictly reserved for definitively invalid links (404s, dead redirects, etc.).
    - V2 Elite Selection (MVQ): The `V2VisionEngine` MUST continue to apply the Minimum Viable Quality (MVQ) logic. GitHub repositories inactive for >4 years with low impact (stars < 30) are deprioritized or excluded ONLY from the V2 Elite edition to ensure freshness.
    - Foundational Protection: GitHub and 'Foundational' resources are exempt from automatic removal based on health, but may be flagged for review.
    - Consolidation: If a deep link fails but the repository root is alive, the bot MUST consolidate the reference to the root.
- Unified Curation Chronology: All curation workflows (V1 and V2) MUST utilize the same chronological and descriptive engine.
    - Extraction: Every new link MUST attempt to extract a publication year (URL, metadata, or AI inference).
    - Formatting: New links MUST follow the format `- **(YYYY)** [Title](URL) 🌟 - Description`. If year is 'N/A', the prefix is omitted.
    - Elite Descriptions: AI-generated descriptions MUST be professional, neutral, and focus on the technical value for a 2026 Cloud Architect.
- Automated Branch Hygiene: To keep the repository clean and efficient, an automated cleanup MUST run every 15 days (1st and 15th) to delete remote branches already merged into `develop`. The branches `master`, `develop`, and `gh-pages` are strictly protected and MUST NEVER be deleted.
- V1/V2 Asset Integrity & Rendering:
    - Source of Truth: V1 (`docs/`) is the absolute source of truth for assets. V2 portal (`v2-docs/`) MUST NOT duplicate folders; it uses symlinks or relative paths.
    - Rendering Fix (HTML in MD): All `<center>` tags MUST be written as `<center markdown="1">`, with a mandatory blank line before and after the enclosed content. This ensures MkDocs processes the Markdown within the HTML block.
    - Flat Asset Routing: To avoid depth-related path breakage, both V1 (`mkdocs.yml`) and V2 (`v2-mkdocs.yml`) MUST have `use_directory_urls: false`. This ensures relative paths (e.g., `images/img.png`) resolve correctly regardless of the page depth.
- V2 Navigation Design: The V2 top navigation bar MUST maintain a flat structure. All dimensions and categories must be top-level tabs in `v2-mkdocs.yml` to ensure direct discoverability and avoid nested groupings like "Categories".
- V2 Impact-Driven Sorting: The V2 portal MUST prioritize relevance (Impact) over dates within sections to provide high-density technical value. Sorting MUST follow: 1. Stars/Relevance (DESC), 2. Year (DESC). The mission statement and descriptions MUST reflect this impact-driven synthesis.
- Unified Metadata Database (Local Storage & Persistence): All link metadata MUST be managed via the local YAML database in `data/`.
    - `inventory.yaml`: The primary source of truth for years, stars (0-5), and descriptions.
    - `structure_map.yaml`: Tracks link locations and visual formatting (bold/highlight) across V1 and V2.
    - Persistence (MANDATORY): Every AI agent and workflow MUST include these YAML files in their Pull Requests if any change is detected. Discarding the database during a workflow run is a CRITICAL FAILURE. All workflows must load the DB, update it, and INJECT the modified YAML files into the final PR payload.
    - Exhaustive Initialization: The system supports a `FORCE_FULL_CHECK` environment variable to bypass all caches (e.g., 21-day health cache) and force a full re-validation and re-enrichment of the entire 17k+ link archive.
    - No Trusted Bypassing: All domains, including high-trust ones (GitHub, Google, AWS), MUST be verified for link validity. Trusted status only grants a lower priority for aggressive scraper rotation, not a bypass for existence checks.
    - Manual Priority: AI agents MUST NOT overwrite existing manual descriptions in the V1 archive files. Enrichment is strictly for `inventory.yaml` and the V2 portal.
- Canonical URL Normalization: To prevent duplication and fragmented metadata, all agents MUST normalize URLs before any inventory operation.
    - Tracking Stripping: Systematically remove UTM parameters, social media trackers (X.com, LinkedIn), and URL fragments (`#`).
    - Protocol Uniformity: Standardize on `https://` whenever possible.
    - Merge Logic: Metadata from multiple sources for the same canonical URL MUST be merged, prioritizing the highest star rating and most recent date.
- Dynamic AI Model Discovery: To remain at the cutting edge and ensure system stability, all agents MUST use the dynamic model discovery engine.
    - Live Discovery: Query the `models.list` API at runtime to identify actually available models for each key.
    - Scoring & Ranking: Prioritize models using the established 2026 hierarchy (Generation 3.x > 2.x > 1.x; Pro > Flash).
    - Resilient Fallback: Automatically transition between models and API keys upon encountering 404 (Unsupported) or 429 (Rate Limit) errors.
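
The Scoring & Ranking and Resilient Fallback rules above can be sketched as follows. This is a minimal illustration, not the project's discovery engine: the model names are placeholders, the generation is assumed to be the first number in the model name, and `send` is a hypothetical callable that returns an `(http_status, payload)` tuple.

```python
import re

# Assumption: the tier is recognisable from the model name. Pro > Flash.
TIER_ORDER = {"pro": 2, "flash": 1}
RETRYABLE = {404, 429}  # Unsupported model / rate limit: try the next option


def model_score(name: str) -> tuple[float, int]:
    """Return (generation, tier); larger tuples rank first."""
    match = re.search(r"(\d+(?:\.\d+)?)", name)
    generation = float(match.group(1)) if match else 0.0
    lowered = name.lower()
    tier = max((rank for key, rank in TIER_ORDER.items() if key in lowered), default=0)
    return (generation, tier)


def rank_models(available: list[str]) -> list[str]:
    """Order models by generation first, then by tier."""
    return sorted(available, key=model_score, reverse=True)


def call_with_fallback(models: list[str], keys: list[str], send):
    """Try every (model, key) pair in ranked order, skipping 404/429 responses."""
    for model in rank_models(models):
        for key in keys:
            status, payload = send(model, key)
            if status == 200:
                return model, key, payload
            if status not in RETRYABLE:
                raise RuntimeError(f"unexpected status {status} from {model}")
    raise RuntimeError("all models and keys exhausted")
```

Sorting on a `(generation, tier)` tuple keeps the generation dominant, so a newer Flash model still outranks an older Pro model, which is one reading of the hierarchy stated above.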
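
As an illustration of the Canonical URL Normalization mandate, the sketch below strips trackers and fragments, upgrades the scheme, and merges duplicate records. It is a sketch under stated assumptions: the tracker key list is only an example set, the record fields `url`, `stars`, and `year` are hypothetical names, and the blind `http` to `https` upgrade assumes the host serves HTTPS.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: example tracker names only; the real list is not given in this document.
TRACKING_PREFIXES = ("utm_",)
TRACKING_KEYS = {"fbclid", "ref_src", "trk", "trackingid"}


def canonicalize(url: str) -> str:
    """Drop tracking parameters and the fragment, and prefer https."""
    parts = urlsplit(url.strip())
    scheme = "https" if parts.scheme in ("http", "https") else parts.scheme
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
        and key.lower() not in TRACKING_KEYS
    ]
    return urlunsplit((scheme, parts.netloc.lower(), parts.path, urlencode(query), ""))


def merge_records(records: list[dict]) -> dict[str, dict]:
    """Merge records sharing a canonical URL: highest stars, most recent year."""
    merged: dict[str, dict] = {}
    for record in records:
        key = canonicalize(record["url"])
        current = merged.get(key)
        if current is None:
            merged[key] = {**record, "url": key}
        else:
            current["stars"] = max(current.get("stars", 0), record.get("stars", 0))
            current["year"] = max(current.get("year", 0), record.get("year", 0))
    return merged
```

For example, `canonicalize("http://Example.com/post?utm_source=x&id=7#intro")` returns `https://example.com/post?id=7`.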
🛠️ Structural Evolution & Navigation
...
- No Link Limits: There are NO hard limits on the number of links per page or per section (##/###). Nubenetes is built to host thousands of references.
- TOC Consistency: Every `.md` page (including the main index `docs/index.md`) MUST maintain an internal Table of Contents (TOC) at the beginning. This TOC must include all sections (##) and subsections (###) nested correctly using a numbered list format with working anchors.
- Relative References & Anchors:
    - Internal: Use simplified lowercase slugs for anchors (remove special characters, replace spaces with hyphens).
    - External/Cross-page: Ensure references between different `.md` files are correct and up-to-date.
- Main Index Maintenance (`docs/index.md`):
    - `docs/index.md` is the landing page for nubenetes.com and the primary entry point. It MUST be updated whenever a new page is added or a major category is renamed.
    - Top Links Preservation: The "Motivation" section in `docs/index.md` contains highly relevant links. These MUST be preserved even if they are duplicated in other thematic pages. The AI should prioritize keeping this index curated and high-signal.
- Intelligent Internal Reorganization:
    - No File Splitting: Do NOT create new `.md` files unless strictly necessary for a major new theme. Prefer creating new sub-sections (## or ###) within existing files to maintain order.
    - Semantic Polish: When a section becomes excessively flat, the AI should propose and implement a reorganization into logical sub-sections purely to improve readability and classification, without restricting the volume of content.
- Navigation Integrity: Every structural change must be reflected in:
    - `mkdocs.yml` (Navigation menu).
    - `v2-mkdocs.yml` (V2 Navigation menu).
    - `docs/index.md` (Main Table of Contents).
    - The internal TOC of the modified page.
- Orphan Curation: Periodically audit the `docs/` folder to find unlinked files and integrate them into the navigation based on their topic.
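
The TOC Consistency and anchor slug rules above could be sketched as follows. The function names are hypothetical, and the parser is deliberately naive: it treats every line starting with `##` or `###` as a heading, including lines inside fenced code blocks, and it does not check that the resulting slugs match the ones MkDocs generates for a given configuration.

```python
import re


def slugify(heading: str) -> str:
    """Lowercase, drop special characters, replace spaces with hyphens."""
    text = re.sub(r"[^a-z0-9\s-]", "", heading.strip().lower())
    return re.sub(r"\s+", "-", text.strip())


def build_toc(markdown: str) -> str:
    """Build a numbered TOC with H3 entries nested under their H2 section."""
    lines: list[str] = []
    h2_count = h3_count = 0
    for line in markdown.splitlines():
        match = re.match(r"^(#{2,3})\s+(.*)$", line)
        if not match:
            continue
        title = match.group(2).strip()
        if len(match.group(1)) == 2:
            h2_count += 1
            h3_count = 0  # subsection numbering restarts in each section
            lines.append(f"{h2_count}. [{title}](#{slugify(title)})")
        else:
            h3_count += 1
            lines.append(f"    {h3_count}. [{title}](#{slugify(title)})")
    return "\n".join(lines)
```

Running `build_toc` on a page and comparing the result with the TOC already in the file is one way to detect drift after a reorganization.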
📊 Mermaid Diagram Best Practices
To ensure robust rendering across GitHub, VSCode, and MkDocs, follow these standards when creating or modifying Mermaid diagrams:
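- Node Label Quoting: ALWAYS wrap node labels in double quotes (e.g., `A["Label Text"]`) if they contain spaces, special characters (parentheses, brackets, dots), or reserved words. This prevents parse errors in more restrictive environments.
- Explicit Direction: Use `graph TD` (Top-Down) for deep hierarchies and `graph LR` (Left-to-Right) for flat process flows to optimize readability and prevent horizontal clipping.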
graph TD(Top-Down) for deep hierarchies andgraph LR(Left-to-Right) for flat process flows to optimize readability and prevent horizontal clipping. - Label Length: Keep labels concise (under 25 characters). If a longer description is needed, use a tooltip or sub-text.
- Syntax Validation: Before committing, verify the syntax using a Mermaid previewer. Common pitfalls include:
    - Unescaped brackets `[` or `]` inside labels.
    - Missing semicolons or newlines between node definitions.
    - Recursive loops without proper termination.
- Integration with MkDocs: Ensure `pymdownx.superfences` is configured in `mkdocs.yml` to support Mermaid blocks within Markdown.
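
When a diagram is generated by a script, the quoting, direction, and label-length rules above can be enforced at build time. The helper below is a hypothetical sketch, not part of the existing tooling; it always quotes labels, and it replaces embedded double quotes with apostrophes as a simplification.

```python
MAX_LABEL_LENGTH = 25  # labels must stay under this many characters


def node(node_id: str, label: str) -> str:
    """Render one node with a quoted label, e.g. A["Label Text"]."""
    if len(label) >= MAX_LABEL_LENGTH:
        raise ValueError(f"label too long ({len(label)} chars): {label!r}")
    safe_label = label.replace('"', "'")  # simplification: avoid nested double quotes
    return f'{node_id}["{safe_label}"]'


def flowchart(direction: str, nodes: dict[str, str], edges: list[tuple[str, str]]) -> str:
    """Emit a Mermaid graph with an explicit TD or LR direction."""
    if direction not in ("TD", "LR"):
        raise ValueError("direction must be 'TD' or 'LR'")
    lines = [f"graph {direction}"]
    lines += [f"    {node(node_id, label)}" for node_id, label in nodes.items()]
    lines += [f"    {source} --> {target}" for source, target in edges]
    return "\n".join(lines)
```

The output still needs to be checked in a Mermaid previewer before committing, since this helper validates only the three rules it encodes.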
🛡️ Repository Policies & Branch Protection
To maintain the integrity of the archive and ensure the AI agents operate correctly:
- Branch Hierarchy:
    - `master`: Read-only for contributors/bots. Restricted to repository owner only.
    - `develop`: The only valid target for Pull Requests.
- Pull Request Policy:
    - AI agents MUST always target `develop`.
    - Manual contributions (human PRs) targeting `master` must be automatically or manually redirected to `develop`.
- Owner-Only Merges: Only the repository owner has the authority to merge `develop` into `master` after verifying the visual health dashboard and metrics.
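
The branch rules above, together with the protected-branch list from the Automated Branch Hygiene mandate, can be expressed as two small checks. This is only a sketch: the function names are hypothetical, and the actual enforcement is assumed to live in repository settings and workflows rather than in code like this.

```python
PROTECTED_BRANCHES = frozenset({"master", "develop", "gh-pages"})


def branches_to_delete(merged_into_develop: list[str]) -> list[str]:
    """Cleanup candidates: merged branches minus the protected ones."""
    return sorted(b for b in merged_into_develop if b not in PROTECTED_BRANCHES)


def is_merge_allowed(base: str, head: str, actor: str, owner: str) -> bool:
    """Pull Requests target develop; only the owner merges develop into master."""
    if base == "develop":
        return True
    if base == "master":
        return head == "develop" and actor == owner
    return False
```

A PR for which `is_merge_allowed` returns `False` would be redirected to `develop` rather than rejected outright, in line with the Pull Request Policy.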
📝 README Synchronization & Maintenance Protocols
The README.md is the primary entry point for Nubenetes and must accurately reflect the state of both the V1 (Exhaustive) and V2 (Elite) editions. AI agents and contributors MUST follow these protocols:
1. Mandatory Updates on develop Branch
Before any Pull Request is merged from develop to master, the README.md must be audited and updated to reflect the latest changes. This is critical for maintaining the "Source of Truth" status.
2. Metric Recalculation
Whenever a significant curation cycle (automatic or manual) is completed:
- Link Counts: Update the "Heart of Nubenetes" table with the current total link count and specialized page count.
- Top Categories: Recalculate the density of the top 10 categories.
- Historical Growth: Add/update the monthly surge rows in the "2026: The Agentic Monthly Surge" table.
- Reference Estimates: Use the established ratio (~4.13 links/commit) to estimate new reference growth if exact numbers aren't extracted by the bot.
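
As a worked example of the Reference Estimates rule, the sketch below applies the ~4.13 links/commit ratio to a commit count. The function name is hypothetical, and the result is an estimate to be replaced by exact numbers whenever the bot extracts them.

```python
LINKS_PER_COMMIT = 4.13  # ratio stated in the Reference Estimates rule


def estimate_new_links(commit_count: int) -> int:
    """Estimate new references from a number of commits, rounded to a whole link."""
    if commit_count < 0:
        raise ValueError("commit_count must be non-negative")
    return round(commit_count * LINKS_PER_COMMIT)
```

With this ratio, 100 commits correspond to an estimated 413 new references.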
3. Visual & Diagram Sync
- Mermaid Charts: If new top-level categories are created or existing ones grow significantly, update the "Major Ecosystem Pillars" and "Specialized Sub-ecosystems" pie charts.
- Architecture Flow: If the Agentic Stack or the deployment lifecycle changes (e.g., new workflows, different dependencies), the corresponding Mermaid diagrams MUST be updated immediately.
- Robustness: Follow the "Mermaid Diagram Best Practices" (node quoting, explicit direction) as defined in this document.
4. V1 vs V2 Alignment
- Ensure any changes to the `V2VisionEngine` or the elite selection criteria are reflected in the "Dual-Edition Architecture" section.
- Update the "Comparison Matrix" if the technical differences between V1 and V2 evolve.
5. Automation vs Manual Intervention
- Automated Updates: The Agentic Bot should ideally include a step to refresh these metrics in its curation PRs.
- Manual Fallback: If a manual update is performed (emergency fixes, structural changes), the human/AI agent is responsible for manually running the metric extraction scripts and updating the `README.md` accordingly.
- Algorithm-README Sync: Whenever the AI curation logic, model tiering, or the extraction algorithm is modified (e.g., `src/gemini_utils.py` or `src/v2_optimizer.py`), the `README.md` MUST be updated to reflect these technical changes in the "Agentic Stack" and "Architectural Shift" sections.
- Hierarchical README Maintenance: Whenever `README.md` is modified, the Table of Contents (TOC) MUST be updated to reflect all changes in sections (H2) and subsections (H3). All titles in the document MUST include hierarchical numbering (e.g., "1. Section", "1.1. Subsection") perfectly synchronized with the TOC.
- Robust Title Standards: Emojis and ampersands (&) MUST NOT be used in any section (H2) or subsection (H3) titles within `README.md` or the Table of Contents. Ampersands should be replaced with "and". This ensures maximum compatibility with Markdown anchor generation and prevents broken navigation links.
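
The Hierarchical README Maintenance and Robust Title Standards rules can be checked mechanically. The sketch below is illustrative only: the function names are hypothetical, the emoji pattern covers common emoji ranges rather than every Unicode pictograph, and headings inside fenced code blocks are not skipped.

```python
import re

# Approximate emoji coverage: main pictograph planes, misc symbols, and joiners.
EMOJI = re.compile("[\U0001F000-\U0001FAFF\u2600-\u27BF\uFE0F\u200D]")
HEADING = re.compile(r"^(#{2,3})\s+(?:\d+(?:\.\d+)*\.\s+)?(.*)$")


def clean_title(title: str) -> str:
    """Replace ampersands with 'and' and strip emojis."""
    title = re.sub(r"\s*&\s*", " and ", title)
    title = EMOJI.sub("", title)
    return re.sub(r"\s+", " ", title).strip()


def number_headings(markdown: str) -> str:
    """Renumber H2 as 'N.' and H3 as 'N.M.', cleaning each title on the way."""
    output: list[str] = []
    h2_count = h3_count = 0
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if not match:
            output.append(line)
            continue
        title = clean_title(match.group(2))
        if len(match.group(1)) == 2:
            h2_count += 1
            h3_count = 0
            output.append(f"## {h2_count}. {title}")
        else:
            h3_count += 1
            output.append(f"### {h2_count}.{h3_count}. {title}")
    return "\n".join(output)
```

Existing numeric prefixes are discarded and regenerated, so inserting a section in the middle of the README renumbers everything after it; the TOC then has to be rebuilt from the renumbered titles.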
🚀 Block Evasion Strategies
The bot must rotate between profiles to avoid detection:
- Desktop/Google: Standard desktop request.
- Mobile/Twitter: Mobile request with Twitter Referer (high success rate).
- Playwright/LinkedIn: Real navigation with JS enabled.
- Firefox/Reddit: Alternative desktop profile.
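
A minimal sketch of the rotation across the four profiles above is shown below. The header values are placeholders, the Referer URLs and the `javascript` flag are assumptions, and `fetch` is a hypothetical callable that performs the request with a given profile and returns an HTTP status code (or `None` on a network error).

```python
from itertools import cycle, islice

# Placeholder profiles: real User-Agent strings and referers are not given in this document.
PROFILES = [
    {"name": "Desktop/Google", "headers": {"User-Agent": "<desktop UA>", "Referer": "https://www.google.com/"}, "javascript": False},
    {"name": "Mobile/Twitter", "headers": {"User-Agent": "<mobile UA>", "Referer": "https://t.co/"}, "javascript": False},
    {"name": "Playwright/LinkedIn", "headers": {"Referer": "https://www.linkedin.com/"}, "javascript": True},
    {"name": "Firefox/Reddit", "headers": {"User-Agent": "<firefox UA>", "Referer": "https://www.reddit.com/"}, "javascript": False},
]


def fetch_with_rotation(url: str, fetch, attempts: int = 3):
    """Try successive profiles until one returns a status below 400."""
    for profile in islice(cycle(PROFILES), attempts):
        status = fetch(url, profile)
        if status is not None and status < 400:
            return profile["name"], status
    return None, None
```

Using `cycle` means that an `attempts` value larger than the number of profiles wraps around to the first profile instead of failing.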
📈 Learning Diary (Improvement History)
- May 2026: Initial implementation of the autonomous engine with Playwright and GitHub API.
- May 2026: Added Multidimensional Evasion system (5 attempts, profile rotation).
- May 2026: Creation of `AgenticCurator` for navigation audit and repository consolidation.
- May 2026: Generation of PRs with visual analytics (Mermaid) and Health Matrix.
- May 2026: Implementation of Backup-based Curation (JSON/MD) to avoid X.com blocks.
- May 2026: Implementation of multi-source curation and category-based filtering in GitHub Workflow.
- May 2026: Introduction of Nubenetes V2 (Agentic Elite) architecture. Implemented persistent `v2-docs/` storage, the `v2_optimizer.py` engine for 2026 standard filtering, and a dual-deployment pipeline to host both V1 (Exhaustive) and V2 (Elite) versions in parallel.
- May 2026: V1 Restoration & V2 Optimization:
    - V1 Integrity Restored: Recovered all V1 files in `docs/` to ensure original descriptive content and images are preserved.
    - V2 Navigation Fixed: Converted V2 top bar to a flat structure for better UX and link stability.
    - Relative Asset Routing: Updated all V2 image and configuration paths to point relatively to `../docs/` to avoid asset duplication.
    - Rendering & Path Resolution: Implemented `<center markdown="1">` and `use_directory_urls: false` across V1 and V2 to resolve persistent image path breakage and ensure proper Markdown rendering within HTML tags.
    - Optimizer Alignment: Hardened `src/v2_optimizer.py` to enforce these architectural rules (flat navigation, relative paths, and resilient V1 content extraction).
    - Incremental Elite Engine: Implemented a sophisticated V2 sync strategy using `data/v2_cache.json`.
        - Automatic Detection: The `agentic_v2_builder.yml` workflow now triggers automatically whenever `docs/` changes or after a curation run.
        - Cost Efficiency: Only NEW links from V1 are sent to Gemini. Existing links use cached AI evaluations but are locally "upgraded" with real-time GitHub metadata (stars/dates) and dynamic maturity tagging.
        - Maturity Taxonomy: Replaced generic labels with a professional 5-tier system (`[DE FACTO STANDARD]`, `[ENTERPRISE-STABLE]`, `[EMERGING]`, `[LEGACY]`, `[GUIDE]`) explained in the V2 Index.
        - Mandatory Descriptions: Every resource in V2 MUST have a description. If the V1 source is missing one, the Optimizer uses Gemini to generate a professional 1-2 sentence summary and caches it.
        - Manual Control: The workflow supports a `force_reevaluate` flag for full architectural refreshes.
- May 2026: V2 UI Hardening & Unified Curation Engine:
    - Highlighting Fixed: Enabled `pymdownx.mark` in V2 and implemented strategic highlighting (`==text==`) for top-tier/Standard resources.
    - Clean Chronology: Refined V1 and V2 engines to hide `(N/A)` dates, providing a cleaner UI.
    - Impact-Driven Synthesis: Shifted V2 mission from pure "chronological clarity" to "impact-driven synthesis", prioritizing Stars/Impact over dates while maintaining chronological data.
    - Relevance-First Sorting: Updated V2 logic to prioritize Stars/Impact over dates within dimension categories.
    - Unified Metadata Engine: Integrated V2's year extraction and professional description logic into the main V1 curation workflow (`src/agentic_curator.py`).
    - Advanced MVQ Cleaning: Upgraded the `IntelligentLinkCleaner` to use V2's MVQ logic (GitHub activity checks) and unbuffered real-time logging.
    - AI Observability & Transparency (May 2026):
        - Session Tracking: Every AI call MUST be tracked via `SESSION_TRACKER` to record model usage and key health.
        - Infrastructure Reporting: All curation PRs MUST include the `Intelligence Report` to provide transparency on models used (Pro vs Flash) and API key identities (Identity A/B).
        - Dynamic Discovery: Agents MUST utilize the dynamic discovery engine to automatically adopt the newest Gemini models and rotate keys upon reaching quotas.