vim-ale/GO_WEBI.md
AJ ONeal bdf7ad4a56 docs: update GO_WEBI.md with current progress and design decisions
Reflect completed work (all fetchers, rawcache, classify, platlatest,
CompatArches), update repo layout to match actual packages, document
the fallback/compatibility design (classifier is 80/20 default,
per-installer config is the authority), add open questions for CPU
micro-arch detection and installer config format.
2026-03-09 22:07:26 -06:00


Go Webi — Rewrite Plan

This is the planning and tracking document for rewriting the Webi server in Go. This is not a straight port — we're redesigning internals while preserving the public API surface.

Guiding Principles

  1. Incremental migration. Rewrites fail when they try to replace everything at once. We integrate piece by piece, endpoint by endpoint, into the live system.
  2. Library over framework. The Go code should be composable pieces the caller controls — not a framework that calls your code.
  3. stdlib + pgx, nothing else. No third-party SDKs. Dependencies: stdlib, golang.org/x, github.com/jackc/pgx, github.com/therootcompany/golib.
  4. Resilient by default. The HTTP client, caching, and storage layers are built for failure — timeouts, retries, circuit breaking, graceful fallback.
  5. Simpler classification. Standard toolchains (goreleaser, cargo-dist, etc.) produce predictable filenames. Match those patterns directly; push esoteric naming into release-fetcher tagging/filtering rather than classifier heuristics.

Repository Layout

cmd/
  webid/              # main HTTP server
  webicached/         # release cache daemon (fetches + stores releases)
internal/
  buildmeta/          # OS, arch, libc, format constants and enums + CompatArches
  classify/           # build artifact classification (filename/URL → target)
  httpclient/         # resilient net/http client with best-practice defaults
  lexver/             # lexicographic version parsing and sorting
  platlatest/         # per-platform latest version index (triplet → version)
  rawcache/           # double-buffered raw upstream API response storage
  releases/           # release fetching — one package per source type
    github/           #   GitHub (thin wrapper over githubish)
    githubish/        #   generic GitHub-compatible API with Link header pagination
    githubsrc/        #   GitHub source archives (tarball/zipball URLs)
    gitea/            #   Gitea/Forgejo (own types, limit param, Link header)
    giteasrc/         #   Gitea source archives
    gitlab/           #   GitLab (own types, X-Total-Pages pagination)
    gitlabsrc/        #   GitLab source archives
    gittag/           #   bare git clone + tag listing
    node/             #   Node.js (official + unofficial builds)
    nodedist/         #   generic Node.js-style dist/index.json API
  render/             # installer script template rendering
  storage/            # release storage interface + implementations
    storage.go        # interface definition
    fsstore/          # filesystem (JSON cache, like current _cache/)
    pgstore/          # PostgreSQL (via sqlc + pgx)
  uadetect/           # User-Agent → OS/arch/libc detection (regex-based)

Public API Surface (Must Remain Stable)

These are the endpoints that clients depend on. The URLs, query parameters, and response formats must not change.

Bootstrap (curl-pipe entry point)

GET /{package}            # User-Agent dispatch:
GET /{package}@{version}  #   curl/wget/POSIX → bash bootstrap script
                          #   PowerShell      → ps1 bootstrap script
                          #   Browser         → HTML cheat sheet (separate app)
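
The User-Agent dispatch above can be sketched roughly as follows. This is only an illustration: `BootstrapKind`, `classifyAgent`, and `splitPackage` are hypothetical names, and the substring checks stand in for the regex-based rules that `internal/uadetect` is described as using.

```go
package main

import "strings"

// BootstrapKind names which bootstrap response a client should receive.
// The type and its values are hypothetical, chosen for this sketch.
type BootstrapKind string

const (
	KindPOSIX      BootstrapKind = "sh"
	KindPowerShell BootstrapKind = "ps1"
	KindBrowser    BootstrapKind = "html"
)

// classifyAgent maps a User-Agent string to a bootstrap kind.
// PowerShell is checked first because its User-Agent also starts with
// "Mozilla/". Anything unrecognized falls through to the POSIX script.
func classifyAgent(ua string) BootstrapKind {
	lower := strings.ToLower(ua)
	switch {
	case strings.Contains(lower, "powershell"):
		return KindPowerShell
	case strings.HasPrefix(lower, "mozilla/"):
		return KindBrowser
	default:
		return KindPOSIX
	}
}

// splitPackage splits a path segment such as "node@v20" into the package
// name and the optional version ("" when no "@" is present).
func splitPackage(segment string) (name, version string) {
	name, version, _ = strings.Cut(segment, "@")
	return name, version
}
```

Defaulting to the POSIX script is an assumption here; it keeps unknown command-line clients working, at the cost of sending a shell script to an unrecognized browser.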

Installer Scripts

GET /api/installers/{package}.sh                # POSIX installer
GET /api/installers/{package}@{version}.sh
GET /api/installers/{package}.ps1               # PowerShell installer
GET /api/installers/{package}@{version}.ps1

Query: ?formats=tar,zip,xz,git,dmg,pkg
       &libc=msvc  (ps1 only)

Release Metadata

GET /api/releases/{package}.json
GET /api/releases/{package}@{version}.json
GET /api/releases/{package}.tab
GET /api/releases/{package}@{version}.tab

Query: ?os=linux&arch=amd64&libc=musl
       &channel=stable&limit=10&formats=tar,xz
       &pretty=true
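
As a sketch of how these query parameters might be parsed on the Go side: the `ReleaseFilter` fields below are an assumed shape (the document names the type but does not define it), and ignoring a malformed `limit` is a choice made for this example only.

```go
package main

import (
	"net/url"
	"strconv"
	"strings"
)

// ReleaseFilter is a hypothetical shape for the filter passed to storage.
type ReleaseFilter struct {
	OS, Arch, Libc, Channel string
	Formats                 []string
	Limit                   int // 0 means "no limit" in this sketch
	Pretty                  bool
}

// parseReleaseQuery turns a raw query string such as
// "os=linux&arch=amd64&limit=10" into a ReleaseFilter.
// A missing or non-positive limit is treated as unset.
func parseReleaseQuery(rawQuery string) (ReleaseFilter, error) {
	q, err := url.ParseQuery(rawQuery)
	if err != nil {
		return ReleaseFilter{}, err
	}
	f := ReleaseFilter{
		OS:      q.Get("os"),
		Arch:    q.Get("arch"),
		Libc:    q.Get("libc"),
		Channel: q.Get("channel"),
		Pretty:  q.Get("pretty") == "true",
	}
	if s := q.Get("formats"); s != "" {
		f.Formats = strings.Split(s, ",")
	}
	if n, err := strconv.Atoi(q.Get("limit")); err == nil && n > 0 {
		f.Limit = n
	}
	return f, nil
}
```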

Package Assets

GET /packages/{package}/README.md
GET /packages/{package}/{filename}

Debug

GET /api/debug            # returns detected OS/arch from User-Agent
Query: ?os=...&arch=...   # overrides

Response Formats

JSON: { oses, arches, libcs, formats, releases: [{ version, date, os, arch, libc, ext, download, channel, lts, name }] }

TSV (.tab): version \t lts \t channel \t date \t os \t arch \t ext \t - \t download \t name \t comment

Architecture

Two Servers

  • webid — the HTTP API server. Renders templates and serves responses. On each request, looks up releases by package name in storage (filesystem and/or Postgres, configurable). No package registry — if releases exist in storage for that name, it's a valid package. No restart needed when packages are added.
  • webicached — the cache daemon. Built with its package set compiled in. Periodically fetches releases from upstream sources, classifies builds, and writes to both Postgres and the filesystem. Adding a new package means rebuilding and redeploying webicached.

Adding a new installer requires rebuilding webicached, but not webid. The API server discovers packages from storage — when the new webicached writes a package's releases to Postgres or the filesystem, webid sees it on the next read. No restart, no config reload.

This means webid never blocks on upstream API calls. It serves from whatever is in storage — always fast, always available.

Double-Buffer Storage

The storage layer uses a double-buffer strategy so that a full release-history rewrite never disrupts active downloads:

Slot A: [current — being read by webid]
Slot B: [next — being written by webicached]

On completion: atomic swap A ↔ B

For fsstore: two directories per package, swap via atomic rename. For pgstore: two sets of rows per package (keyed by generation), swap via updating an active-generation pointer in a single transaction.
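
One way the fsstore swap could work is sketched below. It assumes a layout that the document does not specify: slot directories `a` and `b` inside a package directory, plus a `current` symlink that readers follow. Renaming a fresh symlink over `current` is atomic on POSIX filesystems; Windows would need a different mechanism. All names are hypothetical.

```go
package main

import (
	"os"
	"path/filepath"
)

// activate atomically points <pkgDir>/current at the given slot ("a" or "b").
// It creates a temporary symlink and renames it over the old one, so readers
// see either the old slot or the new slot, never a missing link.
func activate(pkgDir, slot string) error {
	tmp := filepath.Join(pkgDir, "current.tmp")
	// Clear a leftover temp link from a previously interrupted swap.
	if err := os.Remove(tmp); err != nil && !os.IsNotExist(err) {
		return err
	}
	if err := os.Symlink(slot, tmp); err != nil {
		return err
	}
	return os.Rename(tmp, filepath.Join(pkgDir, "current"))
}

// activeSlot reports which slot readers currently see.
func activeSlot(pkgDir string) (string, error) {
	return os.Readlink(filepath.Join(pkgDir, "current"))
}

// demoSwap exercises the swap in a throwaway directory: activate slot "a",
// then swap to "b", and report the slot that ends up active.
func demoSwap() (string, error) {
	pkgDir, err := os.MkdirTemp("", "fsstore-demo-")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(pkgDir)
	for _, slot := range []string{"a", "b"} {
		if err := os.Mkdir(filepath.Join(pkgDir, slot), 0o755); err != nil {
			return "", err
		}
	}
	if err := activate(pkgDir, "a"); err != nil {
		return "", err
	}
	if err := activate(pkgDir, "b"); err != nil {
		return "", err
	}
	return activeSlot(pkgDir)
}
```

The writer would fill the inactive slot completely before calling `activate`, so a crash mid-write leaves `current` untouched.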

Storage Interface

type Store interface {
    // Read path (used by webid)
    GetPackageMeta(ctx context.Context, name string) (*PackageMeta, error)
    GetReleases(ctx context.Context, name string, filter ReleaseFilter) ([]Release, error)

    // Write path (used by webicached)
    BeginRefresh(ctx context.Context, name string) (RefreshTx, error)
}

type RefreshTx interface {
    PutReleases(ctx context.Context, releases []Release) error
    Commit(ctx context.Context) error   // atomic swap
    Rollback(ctx context.Context) error
}
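
To illustrate the intended call pattern, here is a minimal in-memory stand-in for the write path. It is not the planned fsstore or pgstore: `memStore`, `memTx`, and the one-field `Release` are hypothetical, and `GetReleases` omits the filter argument for brevity.

```go
package main

import (
	"context"
	"errors"
	"sync"
)

// Release is trimmed to a single field for this sketch.
type Release struct{ Version string }

// memStore keeps the active generation of releases per package.
type memStore struct {
	mu     sync.RWMutex
	active map[string][]Release
}

// memTx stages a replacement generation until Commit.
type memTx struct {
	store  *memStore
	name   string
	staged []Release
	done   bool
}

func newMemStore() *memStore { return &memStore{active: map[string][]Release{}} }

func (s *memStore) GetReleases(ctx context.Context, name string) ([]Release, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.active[name], nil
}

func (s *memStore) BeginRefresh(ctx context.Context, name string) (*memTx, error) {
	return &memTx{store: s, name: name}, nil
}

func (tx *memTx) PutReleases(ctx context.Context, releases []Release) error {
	if tx.done {
		return errors.New("refresh already finished")
	}
	tx.staged = append(tx.staged, releases...)
	return nil
}

// Commit swaps the staged generation in under the write lock.
func (tx *memTx) Commit(ctx context.Context) error {
	if tx.done {
		return errors.New("refresh already finished")
	}
	tx.store.mu.Lock()
	tx.store.active[tx.name] = tx.staged
	tx.store.mu.Unlock()
	tx.done = true
	return nil
}

// Rollback discards staged releases; it is a no-op after Commit.
func (tx *memTx) Rollback(ctx context.Context) error {
	if !tx.done {
		tx.staged, tx.done = nil, true
	}
	return nil
}

// example shows the refresh pattern: begin, defer Rollback, put, commit.
func example() ([]Release, error) {
	ctx := context.Background()
	s := newMemStore()
	tx, err := s.BeginRefresh(ctx, "node")
	if err != nil {
		return nil, err
	}
	defer tx.Rollback(ctx)
	if err := tx.PutReleases(ctx, []Release{{Version: "v22.0.0"}, {Version: "v20.11.0"}}); err != nil {
		return nil, err
	}
	// Readers must not see staged data before Commit.
	if before, _ := s.GetReleases(ctx, "node"); len(before) != 0 {
		return nil, errors.New("staged releases visible before commit")
	}
	if err := tx.Commit(ctx); err != nil {
		return nil, err
	}
	return s.GetReleases(ctx, "node")
}
```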

Resilient HTTP Client (internal/httpclient)

A net/http client with best-practice defaults, used as the base for all upstream API calls:

  • Timeouts: connect, TLS handshake, response header, overall request
  • Connection pooling: sensible MaxIdleConns, IdleConnTimeout
  • TLS: MinVersion: tls.VersionTLS12, system cert pool
  • Redirects: limited redirect depth, no cross-scheme downgrades
  • User-Agent: identifies as Webi with contact info
  • Retries: exponential backoff with jitter for transient errors (429, 502, 503, 504), respects Retry-After headers
  • Context: all calls take context.Context for cancellation
  • No global state: created as instances, not http.DefaultClient
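
A rough sketch of how these defaults might look with only the standard library. The concrete timeouts, pool sizes, retry ceiling, and User-Agent string are placeholder assumptions, not the values used by `internal/httpclient`. Only the delay-in-seconds form of `Retry-After` is handled, every transport error is treated as retryable, and circuit breaking is left out.

```go
package main

import (
	"context"
	"crypto/tls"
	"errors"
	"fmt"
	"math/rand"
	"net"
	"net/http"
	"strconv"
	"time"
)

// newClient builds an instance (never http.DefaultClient) with explicit limits.
func newClient() *http.Client {
	return &http.Client{
		Timeout: 60 * time.Second, // overall request
		Transport: &http.Transport{
			DialContext:           (&net.Dialer{Timeout: 5 * time.Second}).DialContext,
			TLSHandshakeTimeout:   5 * time.Second,
			ResponseHeaderTimeout: 15 * time.Second,
			MaxIdleConns:          20,
			IdleConnTimeout:       90 * time.Second,
			TLSClientConfig:       &tls.Config{MinVersion: tls.VersionTLS12},
		},
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			if len(via) >= 5 {
				return errors.New("too many redirects")
			}
			if via[len(via)-1].URL.Scheme == "https" && req.URL.Scheme != "https" {
				return errors.New("refusing redirect from https to " + req.URL.Scheme)
			}
			return nil
		},
	}
}

// retryable reports whether a status code is one of the transient errors.
func retryable(status int) bool {
	return status == 429 || status == 502 || status == 503 || status == 504
}

// backoff honors Retry-After (seconds form) when present; otherwise it
// returns a random delay in [0, 500ms * 2^attempt] ("full jitter").
func backoff(attempt int, retryAfter string) time.Duration {
	if secs, err := strconv.Atoi(retryAfter); err == nil && secs >= 0 {
		return time.Duration(secs) * time.Second
	}
	if attempt > 6 {
		attempt = 6 // cap the exponential growth
	}
	ceiling := (500 * time.Millisecond) << attempt
	return time.Duration(rand.Int63n(int64(ceiling) + 1))
}

// getWithRetry issues a GET, retrying transient failures up to attempts times.
func getWithRetry(ctx context.Context, c *http.Client, url string, attempts int) (*http.Response, error) {
	if attempts < 1 {
		attempts = 1
	}
	var lastErr error
	for i := 0; i < attempts; i++ {
		req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
		if err != nil {
			return nil, err
		}
		req.Header.Set("User-Agent", "Webi/0 (placeholder contact info)")
		resp, err := c.Do(req)
		wait := backoff(i, "")
		if err != nil {
			lastErr = err
		} else if retryable(resp.StatusCode) {
			lastErr = fmt.Errorf("GET %s: status %d", url, resp.StatusCode)
			wait = backoff(i, resp.Header.Get("Retry-After"))
			resp.Body.Close()
		} else {
			return resp, nil
		}
		if i == attempts-1 {
			break
		}
		select {
		case <-ctx.Done():
			return nil, ctx.Err()
		case <-time.After(wait):
		}
	}
	return nil, lastErr
}
```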

Release Fetchers (internal/releases/)

Each upstream source (GitHub, Gitea/Forgejo, GitLab, git tags, Node.js-style dist indexes) is a small package that uses httpclient and returns a common []Release slice. No SDK dependencies.

// internal/releases/github/github.go
func FetchReleases(ctx context.Context, client *httpclient.Client,
    owner, repo string, opts ...Option) ([]Release, error)
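
The Link header pagination mentioned for githubish comes down to following the `rel="next"` URL until none is returned. Below is a simplified sketch of the header parsing. `nextLink` is a hypothetical helper, and it assumes URLs contain no commas or semicolons, which a full RFC 8288 parser would not assume.

```go
package main

import "strings"

// nextLink extracts the rel="next" target from a Link header value such as
//
//	<https://api.example.com/releases?page=2>; rel="next", <...>; rel="last"
//
// It returns "" when there is no next page.
func nextLink(header string) string {
	for _, part := range strings.Split(header, ",") {
		target, params, ok := strings.Cut(part, ";")
		if !ok {
			continue
		}
		target = strings.TrimSpace(target)
		if !strings.HasPrefix(target, "<") || !strings.HasSuffix(target, ">") {
			continue
		}
		for _, p := range strings.Split(params, ";") {
			if strings.TrimSpace(p) == `rel="next"` {
				return target[1 : len(target)-1]
			}
		}
	}
	return ""
}
```

A fetcher would call this on each response and stop when it returns the empty string.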

Build Classification (internal/classify)

The classifier is the 80/20 default — it handles the happy path where standard toolchains (goreleaser, cargo-dist, Zig, Rust) produce predictable filenames. It is not the authority; the per-installer config can override anything it detects.

  • Regex-based detection with priority ordering (x86_64 before x86, arm64 before armv7, amd64v4/v3/v2 before baseline).
  • OS-aware fixups: bare "arm" on Windows → ARM64.
  • Accepts filenames or full download URLs (signal may be in path segments).
  • Undetected fields are empty, not guessed.

Target triplet format: {os}-{arch}-{libc}.
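
The priority ordering and the Windows fixup can be illustrated with a small first-match-wins rule table. The patterns and the normalized names (`amd64`, `x86`, `arm`, and so on) are assumptions for this sketch. They are not the classifier's actual rules, and micro-arch levels are omitted.

```go
package main

import "regexp"

// Target holds the detected fields; undetected fields stay empty.
type Target struct{ OS, Arch, Libc string }

type rule struct {
	re    *regexp.Regexp
	value string
}

// rules builds a case-insensitive rule list from pattern/value pairs.
func rules(pairs ...string) []rule {
	out := make([]rule, 0, len(pairs)/2)
	for i := 0; i+1 < len(pairs); i += 2 {
		out = append(out, rule{regexp.MustCompile("(?i)" + pairs[i]), pairs[i+1]})
	}
	return out
}

// Order matters: "x86_64" must be tried before "x86", "arm64" before "arm".
var (
	osRules = rules(
		`darwin|macos|apple`, "darwin",
		`windows|win32|win64|\.exe$`, "windows",
		`linux`, "linux",
	)
	archRules = rules(
		`x86_64|x86-64|amd64|x64`, "amd64",
		`i386|i686|x86`, "x86",
		`aarch64|arm64`, "arm64",
		`armv7`, "armv7",
		`arm`, "arm",
	)
	libcRules = rules(`musl`, "musl", `msvc`, "msvc", `gnu`, "gnu")
)

// first returns the value of the first matching rule, or "" if none match.
func first(rs []rule, s string) string {
	for _, r := range rs {
		if r.re.MatchString(s) {
			return r.value
		}
	}
	return ""
}

// classify accepts a filename or a full download URL.
func classify(nameOrURL string) Target {
	t := Target{
		OS:   first(osRules, nameOrURL),
		Arch: first(archRules, nameOrURL),
		Libc: first(libcRules, nameOrURL),
	}
	// OS-aware fixup: bare "arm" on Windows means ARM64.
	if t.OS == "windows" && t.Arch == "arm" {
		t.Arch = "arm64"
	}
	return t
}
```

With these rules, `ripgrep-14.1.0-x86_64-unknown-linux-musl.tar.gz` classifies as linux, amd64, musl, while a file with no recognizable signal yields an empty `Target` rather than a guess.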

Fallback & Compatibility

Arch and libc fallbacks are not universal rules. They vary by OS, package, and even package version:

  • OS-level arch compat (buildmeta.CompatArches): universal facts like "darwin arm64 runs x86_64 via Rosetta 2", "windows arm64 emulates x86_64". Includes macOS Universal1 (PPC+x86) and Universal2 (x86_64+ARM64).
  • Libc compat: per-package, per-version. Musl can be static (runs anywhere) or dynamically linked (needs polyfill). Windows GNU can be dependency-free or need mingw. This changes between versions of the same package.
  • Arch micro-levels: amd64v4→v3→v2→v1 fallback is universal, but a package may drop specific micro-arch builds between versions.

Per-installer config declares the package-specific rules. The resolver combines installer config + platlatest + CompatArches to pick the right binary.
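
As a sketch of how the resolver might combine these inputs: `compatArches` below is a hypothetical stand-in for `buildmeta.CompatArches` covering only the cases named above, `latest` plays the role of the platlatest index (triplet → version), and the `libcs` preference list represents what a per-installer config would supply. The arch names, the `none` libc label, and the arch-before-libc search order are all assumptions.

```go
package main

// amd64Levels lists micro-arch levels from most to least specific.
var amd64Levels = []string{"amd64v4", "amd64v3", "amd64v2", "amd64"}

// compatArches returns the arches a host can run, best match first.
func compatArches(os, arch string) []string {
	for i, lvl := range amd64Levels {
		if lvl == arch {
			// Copy so callers cannot mutate the shared table.
			return append([]string(nil), amd64Levels[i:]...)
		}
	}
	// Rosetta 2 on macOS and x86_64 emulation on Windows ARM64.
	if arch == "arm64" && (os == "darwin" || os == "windows") {
		return []string{"arm64", "amd64"}
	}
	return []string{arch}
}

// pick walks the candidate arches (outer) and libcs (inner) in preference
// order and returns the first triplet present in the latest-version index.
func pick(latest map[string]string, os string, arches, libcs []string) (triplet, version string, ok bool) {
	for _, arch := range arches {
		for _, libc := range libcs {
			triplet = os + "-" + arch + "-" + libc
			if version, ok = latest[triplet]; ok {
				return triplet, version, true
			}
		}
	}
	return "", "", false
}
```

For example, on a darwin arm64 host where only `darwin-amd64-none` is published, `pick` falls back to the amd64 build.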

Installer Rendering (internal/render)

Replaces installers.js. Reads template files, substitutes variables, injects the per-package install.sh / install.ps1.

The current template variable set (30+ env vars) is the contract with the client-side scripts. We must produce identical output for package-install.tpl.sh and package-install.tpl.ps1.

Reworking install.sh / install.ps1

Long-term, the per-package install scripts should feel like library users, not framework callbacks:

  • Current (framework): define pkg_install(), pkg_get_current_version(), etc. and the framework calls them.
  • Goal (library): source a helpers file, call functions like webi_download, webi_extract, webi_link explicitly from a linear script.

This is a separate migration from the Go rewrite — it changes the client-side contract. Plan it but don't block the server rewrite on it.

Migration Strategy

Each phase produces something that works in production alongside the existing Node.js server.

Phase 0: Foundation

  • internal/buildmeta — shared vocabulary (OS, arch, libc, format, channel)
  • internal/buildmeta.CompatArches(os, arch) — OS-level arch compat facts
  • internal/buildmeta — amd64 micro-arch levels (v1 to v4), universal binary types
  • internal/lexver — version strings → comparable strings
  • internal/httpclient — resilient HTTP client for upstream API calls
  • internal/uadetect — User-Agent → OS/arch/libc (regex-based)
  • Go module init (go 1.26.1, stdlib only)
  • CI setup
  • CPU micro-arch detection in bootstrap scripts (POSIX + PowerShell)

Phase 1: Release Fetching & Caching

  • internal/releases/githubish — generic GitHub-compatible API fetcher
  • internal/releases/github — GitHub releases (thin wrapper)
  • internal/releases/githubsrc — GitHub source archives
  • internal/releases/gitea — Gitea/Forgejo releases (own types)
  • internal/releases/giteasrc — Gitea source archives
  • internal/releases/gitlab — GitLab releases (own types, X-Total-Pages)
  • internal/releases/gitlabsrc — GitLab source archives
  • internal/releases/gittag — git tag listing (bare clone)
  • internal/releases/nodedist — Node.js-style dist/index.json API
  • internal/releases/node — Node.js (official + unofficial builds)
  • internal/rawcache — double-buffered raw upstream response storage
  • internal/classify — build artifact classifier (80/20, filename→target)
  • internal/platlatest — per-platform latest version index
  • End-to-end: fetch complete histories for a few real packages
  • Per-installer config format (fallback rules, version-ranged overrides)
  • Resolver (platlatest + installer config + CompatArches → pick binary)
  • internal/storage — interface definition
  • internal/storage/fsstore — filesystem implementation
  • cmd/webicached — cache daemon that can replace the Node.js caching

Integration point: webicached writes the same _cache/ JSON format. The Node.js server can read from it. Zero-risk cutover for release fetching.

Phase 2: Release API

  • cmd/webid — HTTP server skeleton with middleware
  • GET /api/releases/{package}.json endpoint
  • GET /api/releases/{package}.tab endpoint
  • GET /api/debug endpoint

Integration point: reverse proxy specific /api/releases/ paths to the Go server. Node.js handles everything else.

Phase 3: Installer Rendering

  • internal/render — template engine
  • GET /api/installers/{package}.sh endpoint
  • GET /api/installers/{package}.ps1 endpoint
  • Bootstrap endpoint (GET /{package})

Integration point: reverse proxy installer paths to Go. Node.js then serves only the website and cheat sheets, if it serves them at all (they may belong to a separate app).

Phase 4: PostgreSQL Storage

  • internal/storage/pgstore — sqlc-generated queries, double-buffer
  • Schema design and migrations
  • webicached writes to Postgres
  • webid reads from Postgres

Phase 5: Client-Side Rework

  • Design new library-style install.sh helpers
  • Migrate existing packages one at a time
  • Update package-install.tpl.sh to support both old and new styles

Key Design Decisions

Version: Go 1.26+

Using http.ServeMux with PathValue for routing (available since Go 1.22). Middleware via github.com/therootcompany/golib/http/middleware/v2.
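
One detail worth noting: ServeMux wildcards must span a whole path segment, so a pattern like `/api/releases/{package}.json` cannot be registered as written. A possible approach, sketched here with hypothetical names, is to capture the final segment and split the package, version, and extension in the handler.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// parseReleaseFile splits a segment such as "node@v20.11.0.json" into
// package, optional version, and extension ("json" or "tab").
func parseReleaseFile(file string) (pkg, version, ext string, ok bool) {
	i := strings.LastIndex(file, ".")
	if i <= 0 {
		return "", "", "", false
	}
	ext = file[i+1:]
	pkg, version, _ = strings.Cut(file[:i], "@")
	ok = pkg != "" && (ext == "json" || ext == "tab")
	return pkg, version, ext, ok
}

// newMux registers the release route using a method-qualified pattern and
// reads the captured segment with PathValue.
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("GET /api/releases/{file}", func(w http.ResponseWriter, r *http.Request) {
		pkg, version, ext, ok := parseReleaseFile(r.PathValue("file"))
		if !ok {
			http.NotFound(w, r)
			return
		}
		// Placeholder response; the real handler would query storage.
		fmt.Fprintf(w, "package=%s version=%s format=%s\n", pkg, version, ext)
	})
	return mux
}
```

Splitting on the last dot keeps dotted versions intact, so `node@v20.11.0.json` yields version `v20.11.0` and extension `json`.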

No ORM

PostgreSQL access via pgx + sqlc. Queries are hand-written SQL, type-safe Go code is generated.

Template Rendering

Use text/template or simple string replacement (matching current behavior). The templates are shell scripts — they need literal $ and {} — so text/template may be the wrong tool. Likely better to stick with the current regex-replacement approach, ported to Go.
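
A regex-replacement port could look roughly like the sketch below. The template convention it assumes (an optionally commented `NAME=` line that gets overwritten) and the variable name in the usage note are illustrative guesses, not the actual format of `package-install.tpl.sh`.

```go
package main

import (
	"regexp"
	"strings"
)

// shQuote wraps a value in single quotes for POSIX shells, escaping any
// embedded single quote as '\''.
func shQuote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
}

// setVar rewrites every line of the form "NAME=..." or "#NAME=..." (with
// optional leading indentation) to "NAME='value'". All other text,
// including literal $ and {} in the script, is left untouched.
func setVar(script, name, value string) string {
	re := regexp.MustCompile(`(?m)^[ \t]*#?` + regexp.QuoteMeta(name) + `=.*$`)
	// ReplaceAllStringFunc avoids "$" expansion in the replacement text.
	return re.ReplaceAllStringFunc(script, func(line string) string {
		indent := line[:len(line)-len(strings.TrimLeft(line, " \t"))]
		return indent + name + "=" + shQuote(value)
	})
}
```

For instance, `setVar(tpl, "WEBI_PKG", "node")` would turn a `#WEBI_PKG=` line into `WEBI_PKG='node'` while leaving a later `"${WEBI_PKG}"` reference alone.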

Error Handling

The current system returns a synthetic "error release" (version: 0.0.0, channel: error) when no match is found, rather than an HTTP error. This behavior must be preserved for backward compatibility.
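
In Go this could be a small fallback applied after filtering. The sketch below uses a trimmed `Release` type, and whether the synthetic entry carries other fields is not stated here, so only the two documented values are set.

```go
package main

// Release is trimmed to the two fields this sketch needs.
type Release struct {
	Version string
	Channel string
}

// orErrorRelease returns the matches unchanged, or a single synthetic
// "error release" when nothing matched, so callers always get an entry.
func orErrorRelease(matches []Release) []Release {
	if len(matches) > 0 {
		return matches
	}
	return []Release{{Version: "0.0.0", Channel: "error"}}
}
```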

Open Questions

  • Should webicached shell out to node releases.js during migration, or do we rewrite every releases.js as Go config/code from the start? (Shelling out preserves hot-add compatibility during the transition — a new releases.js just works without any Go changes.)
  • What's the deployment topology? Single binary serving both roles? Separate processes? Kubernetes pods?
  • Rate limiting for GitHub API calls in webicached — how to coordinate across multiple instances?
  • Per-installer config format: what structure best expresses version-ranged libc overrides, arch fallback overrides, and nonstandard asset naming? Go struct + TOML/YAML? Go code (compiled into webicached)?
  • CPU micro-arch detection: how should POSIX and PowerShell bootstrap scripts detect amd64v1/v2/v3/v4? Check /proc/cpuinfo flags (Linux), sysctl hw.optional (macOS), .NET intrinsics (Windows)?

Current Node.js Architecture (Reference)

For context, the current system's key files:

| File | Role |
| --- | --- |
| _webi/serve-installer.js | Main request handler — dispatches to builds + rendering |
| _webi/builds.js | Thin wrapper around builds-cacher |
| _webi/builds-cacher.js | Release fetching, caching, classification, version matching |
| _webi/transform-releases.js | Legacy release API (filter + cache + serve) |
| _webi/normalize.js | OS/arch/libc/ext regex detection from filenames |
| _webi/installers.js | Template rendering (bash + powershell) |
| _webi/ua-detect.js | User-Agent → OS/arch/libc |
| _webi/projects.js | Package metadata from README frontmatter |
| _webi/frontmarker.js | YAML frontmatter parser |
| _common/github.js | GitHub releases fetcher |
| _common/gitea.js | Gitea releases fetcher |
| _common/git-tag.js | Git tag listing |
| {pkg}/releases.js | Per-package release config (fetcher + filters + transforms) |
| {pkg}/install.sh | Per-package POSIX installer |
| {pkg}/install.ps1 | Per-package PowerShell installer |