Go Webi — Rewrite Plan
This is the planning and tracking document for rewriting the Webi server in Go. This is not a straight port — we're redesigning internals while preserving the public API surface.
Guiding Principles
- Incremental migration. Rewrites fail when they try to replace everything at once. We integrate piece by piece, endpoint by endpoint, into the live system.
- Library over framework. The Go code should be composable pieces the caller controls — not a framework that calls your code.
- stdlib + pgx, nothing else. No third-party SDKs. Dependencies: stdlib, golang.org/x, github.com/jackc/pgx, github.com/therootcompany/golib.
- Resilient by default. The HTTP client, caching, and storage layers are built for failure: timeouts, retries, circuit breaking, graceful fallback.
- Simpler classification. Standard toolchains (goreleaser, cargo-dist, etc.) produce predictable filenames. Match those patterns directly; push esoteric naming into release-fetcher tagging/filtering rather than classifier heuristics.
Repository Layout
cmd/
    webid/ # main HTTP server
    webicached/ # release cache daemon (fetches + stores releases)
internal/
    buildmeta/ # OS, arch, libc, format constants and enums + CompatArches
    classify/ # build artifact classification (filename/URL → target)
    classifypkg/ # per-package pipeline: classify → tag → normalize → filter
    installerconf/ # releases.conf parser (key=value, source inferred from key)
    httpclient/ # resilient net/http client with best-practice defaults
    lexver/ # lexicographic version parsing and sorting
    platlatest/ # per-platform latest version index (triplet → version)
    rawcache/ # double-buffered raw upstream API response storage
    releases/ # release fetching, one package per source type
        github/ # GitHub (thin wrapper over githubish)
        githubish/ # generic GitHub-compatible API with Link header pagination
        githubsrc/ # GitHub source archives (tarball/zipball URLs)
        gitea/ # Gitea/Forgejo (own types, limit param, Link header)
        giteasrc/ # Gitea source archives
        gitlab/ # GitLab (own types, X-Total-Pages pagination)
        gitlabsrc/ # GitLab source archives
        gittag/ # bare git clone + tag listing
        node/ # Node.js (official + unofficial builds)
        nodedist/ # generic Node.js-style dist/index.json API
    render/ # installer script template rendering
    storage/ # release storage interface + implementations
        storage.go # interface definition
        fsstore/ # filesystem (JSON cache, like current _cache/)
        pgstore/ # PostgreSQL (via sqlc + pgx)
    uadetect/ # User-Agent → OS/arch/libc detection (regex-based)
Public API Surface (Must Remain Stable)
These are the endpoints that clients depend on. The URLs, query parameters, and response formats must not change.
Bootstrap (curl-pipe entry point)
GET /{package} # User-Agent dispatch:
GET /{package}@{version} # curl/wget/POSIX → bash bootstrap script
# PowerShell → ps1 bootstrap script
# Browser → HTML cheat sheet (separate app)
Installer Scripts
GET /api/installers/{package}.sh # POSIX installer
GET /api/installers/{package}@{version}.sh
GET /api/installers/{package}.ps1 # PowerShell installer
GET /api/installers/{package}@{version}.ps1
Query: ?formats=tar,zip,xz,git,dmg,pkg
&libc=msvc (ps1 only)
Release Metadata
GET /api/releases/{package}.json
GET /api/releases/{package}@{version}.json
GET /api/releases/{package}.tab
GET /api/releases/{package}@{version}.tab
Query: ?os=linux&arch=amd64&libc=musl
&channel=stable&limit=10&formats=tar,xz
&pretty=true
Package Assets
GET /packages/{package}/README.md
GET /packages/{package}/{filename}
Debug
GET /api/debug # returns detected OS/arch from User-Agent
Query: ?os=...&arch=... # overrides
Response Formats
JSON — { oses, arches, libcs, formats, releases: [{ version, date, os, arch, libc, ext, download, channel, lts, name }] }
TSV (.tab) — version \t lts \t channel \t date \t os \t arch \t ext \t - \t download \t name \t comment
Architecture
Two Servers
- webid: the HTTP API server. Renders templates and serves responses. On each request, looks up releases by package name in storage (filesystem and/or Postgres, configurable). No package registry: if releases exist in storage for that name, it's a valid package. No restart needed when packages are added.
- webicached: the cache daemon. Built with its package set compiled in. Periodically fetches releases from upstream sources, classifies builds, and writes to both Postgres and the filesystem. Adding a new package means rebuilding and redeploying webicached.
Adding a new installer requires rebuilding webicached, but not webid. The
API server discovers packages from storage — when the new webicached writes a
package's releases to Postgres or the filesystem, webid sees it on the next
read. No restart, no config reload.
This means webid never blocks on upstream API calls. It serves from whatever is
in storage — always fast, always available.
Double-Buffer Storage
The storage layer uses a double-buffer strategy so that a full release-history rewrite never disrupts active downloads:
Slot A: [current — being read by webid]
Slot B: [next — being written by webicached]
On completion: atomic swap A ↔ B
For fsstore: two directories per package, swap via atomic rename. For pgstore: two sets of rows per package (keyed by generation), swap via updating an active-generation pointer in a single transaction.
Storage Interface
type Store interface {
    // Read path (used by webid)
    GetPackageMeta(ctx context.Context, name string) (*PackageMeta, error)
    GetReleases(ctx context.Context, name string, filter ReleaseFilter) ([]Release, error)

    // Write path (used by webicached)
    BeginRefresh(ctx context.Context, name string) (RefreshTx, error)
}

type RefreshTx interface {
    PutReleases(ctx context.Context, releases []Release) error
    Commit(ctx context.Context) error // atomic swap
    Rollback(ctx context.Context) error
}
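The read/refresh split above can be illustrated with a small in-memory stand-in. This is only a sketch under assumptions: memStore and refreshTx are hypothetical names, context parameters and ReleaseFilter are left out for brevity, and the real fsstore/pgstore would swap directories or generation pointers rather than map entries.

```go
package main

import (
	"errors"
	"sync"
)

// Release is a trimmed stand-in for the real release type.
type Release struct {
	Version string
}

// memStore (hypothetical) holds the active slot per package. Readers only
// ever see this slot; a refresh writes into its own pending buffer.
type memStore struct {
	mu     sync.RWMutex
	active map[string][]Release
}

func newMemStore() *memStore {
	return &memStore{active: make(map[string][]Release)}
}

// GetReleases returns the active slot and never blocks on a refresh
// that is still collecting data.
func (s *memStore) GetReleases(name string) []Release {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.active[name]
}

// refreshTx (hypothetical) buffers writes until Commit.
type refreshTx struct {
	store   *memStore
	name    string
	pending []Release
	done    bool
}

func (s *memStore) BeginRefresh(name string) *refreshTx {
	return &refreshTx{store: s, name: name}
}

func (tx *refreshTx) PutReleases(releases []Release) error {
	if tx.done {
		return errors.New("refresh already finished")
	}
	tx.pending = append(tx.pending, releases...)
	return nil
}

// Commit replaces the active slot with the pending one in a single step.
func (tx *refreshTx) Commit() error {
	if tx.done {
		return errors.New("refresh already finished")
	}
	tx.done = true
	tx.store.mu.Lock()
	tx.store.active[tx.name] = tx.pending
	tx.store.mu.Unlock()
	return nil
}

// Rollback discards the pending slot; the active slot is untouched.
func (tx *refreshTx) Rollback() error {
	tx.done = true
	tx.pending = nil
	return nil
}
```

The point of the shape is that a failed or abandoned refresh leaves readers on the previous complete history, which is the property the double-buffer design is after.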
Resilient HTTP Client (internal/httpclient)
A net/http client with best-practice defaults, used as the base for all
upstream API calls:
- Timeouts: connect, TLS handshake, response header, overall request
- Connection pooling: sensible MaxIdleConns, IdleConnTimeout
- TLS: MinVersion: tls.VersionTLS12, system cert pool
- Redirects: limited redirect depth, no cross-scheme downgrades
- User-Agent: identifies as Webi with contact info
- Retries: exponential backoff with jitter for transient errors (429, 502, 503, 504), respects Retry-After headers
- Context: all calls take context.Context for cancellation
- No global state: created as instances, not http.DefaultClient
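As a rough illustration of the defaults listed above, the following sketch builds a client from stdlib pieces only. The function names (NewClient, shouldRetry, retryDelay) and every numeric value are assumptions for the example, not the settings used by internal/httpclient. User-Agent handling is omitted, and only the delay-seconds form of Retry-After is parsed here.

```go
package main

import (
	"crypto/tls"
	"errors"
	"net"
	"net/http"
	"strconv"
	"time"
)

// NewClient returns a fresh client instance; nothing is shared with
// http.DefaultClient. All durations and limits are illustrative.
func NewClient() *http.Client {
	transport := &http.Transport{
		DialContext:           (&net.Dialer{Timeout: 5 * time.Second}).DialContext,
		TLSHandshakeTimeout:   5 * time.Second,
		ResponseHeaderTimeout: 10 * time.Second,
		MaxIdleConns:          20,
		IdleConnTimeout:       90 * time.Second,
		// A nil RootCAs means the system cert pool is used.
		TLSClientConfig: &tls.Config{MinVersion: tls.VersionTLS12},
	}
	return &http.Client{
		Transport: transport,
		Timeout:   30 * time.Second, // overall request budget
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			if len(via) >= 5 {
				return errors.New("too many redirects")
			}
			if len(via) > 0 && via[len(via)-1].URL.Scheme == "https" && req.URL.Scheme != "https" {
				return errors.New("refusing https to http redirect")
			}
			return nil
		},
	}
}

// shouldRetry reports whether a status code is one of the transient errors.
func shouldRetry(status int) bool {
	switch status {
	case http.StatusTooManyRequests, http.StatusBadGateway,
		http.StatusServiceUnavailable, http.StatusGatewayTimeout:
		return true
	}
	return false
}

// retryDelay honors a Retry-After value given in seconds; otherwise it
// returns a "full jitter" delay: a random fraction of an exponentially
// growing ceiling. rnd is expected to return a value in [0, 1).
func retryDelay(attempt int, retryAfter string, rnd func() float64) time.Duration {
	if secs, err := strconv.Atoi(retryAfter); err == nil && secs >= 0 {
		return time.Duration(secs) * time.Second
	}
	const base, maxDelay = 500 * time.Millisecond, 30 * time.Second
	ceiling := maxDelay
	if attempt >= 0 && attempt < 6 { // base<<6 is 32s, already past maxDelay
		ceiling = base << attempt
	}
	return time.Duration(rnd() * float64(ceiling))
}
```

A caller would typically pass rand.Float64 from math/rand as rnd and sleep for the returned duration (or until its context is cancelled) before the next attempt.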
Release Fetchers (internal/releases/)
Each upstream source (GitHub, Gitea, git-tag) is a small package that uses
httpclient and returns a common []Release slice. No SDK dependencies.
// internal/releases/github/github.go
func FetchReleases(ctx context.Context, client *httpclient.Client,
owner, repo string, opts ...Option) ([]Release, error)
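For the GitHub-compatible fetchers, pagination follows the Link response header. The helper below is a minimal sketch of pulling out the rel="next" URL; nextPageURL is a hypothetical name, and the parser assumes the common header shape and does not cover every case allowed by the Link header grammar (for example, commas inside URLs).

```go
package main

import "strings"

// nextPageURL extracts the rel="next" target from a Link header value like
//
//	<https://example.com/releases?page=2>; rel="next", <...>; rel="last"
//
// It returns "" when there is no next page.
func nextPageURL(link string) string {
	for _, part := range strings.Split(link, ",") {
		target, params, ok := strings.Cut(part, ";")
		if !ok {
			continue
		}
		target = strings.TrimSpace(target)
		if !strings.HasPrefix(target, "<") || !strings.HasSuffix(target, ">") {
			continue
		}
		for _, p := range strings.Split(params, ";") {
			if strings.TrimSpace(p) == `rel="next"` {
				return target[1 : len(target)-1]
			}
		}
	}
	return ""
}
```

A fetch loop would request the first page, append its releases, and keep following nextPageURL(resp.Header.Get("Link")) until it returns an empty string.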
Build Classification (internal/classify)
The classifier is the 80/20 default — it handles the happy path where standard toolchains (goreleaser, cargo-dist, Zig, Rust) produce predictable filenames. It is not the authority; the per-installer config can override anything it detects.
- Regex-based detection with priority ordering (x86_64 before x86, arm64 before armv7, amd64v4/v3/v2 before baseline).
- OS-aware fixups: bare "arm" on Windows → ARM64.
- Accepts filenames or full download URLs (signal may be in path segments).
- Undetected fields are empty, not guessed.
Target triplet format: {os}-{arch}-{libc}.
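The priority-ordering idea can be sketched as ordered rule lists where the first match wins. Everything below is illustrative: the patterns, the canonical names (amd64, arm64, and so on), and the Classify/Target names are assumptions, and the real classifier covers far more cases, including the OS-aware fixups.

```go
package main

import "regexp"

// rule pairs a pattern with the canonical value it maps to.
type rule struct {
	re    *regexp.Regexp
	value string
}

// Order matters: "darwin" contains "win", so it must be tried first.
var osRules = []rule{
	{regexp.MustCompile(`(?i)linux`), "linux"},
	{regexp.MustCompile(`(?i)darwin|macos|apple`), "darwin"},
	{regexp.MustCompile(`(?i)windows|win`), "windows"},
}

// Micro-arch levels come before baseline amd64, x86_64 before x86,
// and arm64 before armv7.
var archRules = []rule{
	{regexp.MustCompile(`(?i)(amd64|x86_64)[_-]?v4`), "amd64v4"},
	{regexp.MustCompile(`(?i)(amd64|x86_64)[_-]?v3`), "amd64v3"},
	{regexp.MustCompile(`(?i)(amd64|x86_64)[_-]?v2`), "amd64v2"},
	{regexp.MustCompile(`(?i)x86_64|amd64|x64`), "amd64"},
	{regexp.MustCompile(`(?i)aarch64|arm64`), "arm64"},
	{regexp.MustCompile(`(?i)armv7|armhf`), "armv7"},
	{regexp.MustCompile(`(?i)i[3-6]86|x86|386`), "x86"},
}

var libcRules = []rule{
	{regexp.MustCompile(`(?i)musl`), "musl"},
	{regexp.MustCompile(`(?i)gnu`), "gnu"},
	{regexp.MustCompile(`(?i)msvc`), "msvc"},
}

// firstMatch returns the value of the first rule that matches, or ""
// when nothing matches: undetected fields stay empty, not guessed.
func firstMatch(rules []rule, name string) string {
	for _, r := range rules {
		if r.re.MatchString(name) {
			return r.value
		}
	}
	return ""
}

// Target is the detected platform for one filename or URL.
type Target struct {
	OS, Arch, Libc string
}

// Triplet renders the {os}-{arch}-{libc} form.
func (t Target) Triplet() string {
	return t.OS + "-" + t.Arch + "-" + t.Libc
}

// Classify works on a bare filename or a full download URL.
func Classify(name string) Target {
	return Target{
		OS:   firstMatch(osRules, name),
		Arch: firstMatch(archRules, name),
		Libc: firstMatch(libcRules, name),
	}
}
```

With these rules, a name such as bat-v0.26.1-x86_64-unknown-linux-musl.tar.gz yields the triplet linux-amd64-musl, while a file with no recognizable signal yields empty fields for per-installer config to fill in.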
Fallback & Compatibility
Arch and libc fallbacks are not universal rules. They vary by OS, package, and even package version:
- OS-level arch compat (buildmeta.CompatArches): universal facts like "darwin arm64 runs x86_64 via Rosetta 2", "windows arm64 emulates x86_64". Includes macOS Universal1 (PPC+x86) and Universal2 (x86_64+ARM64).
- Libc compat: per-package, per-version. Musl can be static (runs anywhere) or dynamically linked (needs polyfill). Windows GNU can be dependency-free or need mingw. This changes between versions of the same package.
- Arch micro-levels: amd64v4→v3→v2→v1 fallback is universal, but a package may drop specific micro-arch builds between versions.
Per-installer config declares the package-specific rules. The resolver combines installer config + platlatest + CompatArches to pick the right binary.
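The OS-level facts above lend themselves to a small ordered lookup. The sketch below assumes CompatArches returns candidate arches from most to least preferred and uses amd64/arm64 as canonical names; the return type and the exact data are assumptions, and universal binaries and package-specific overrides are left out.

```go
package main

// microLevels lists amd64 micro-arch levels from most to least demanding.
var microLevels = []string{"amd64v4", "amd64v3", "amd64v2", "amd64"}

// CompatArches returns the arches a host can run, best match first.
// Only the facts named in the text are modeled here.
func CompatArches(os, arch string) []string {
	// An amd64vN host can run every lower level down to baseline amd64.
	for i, level := range microLevels {
		if level == arch {
			return append([]string(nil), microLevels[i:]...)
		}
	}
	out := []string{arch}
	// darwin arm64 runs x86_64 via Rosetta 2; windows arm64 emulates x86_64.
	if arch == "arm64" && (os == "darwin" || os == "windows") {
		out = append(out, "amd64")
	}
	return out
}
```

A resolver would walk this list in order and stop at the first arch for which the package actually publishes a build, which also covers the case where a package drops a micro-arch build between versions.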
Installer Rendering (internal/render)
Replaces installers.js. Reads template files, substitutes variables, injects
the per-package install.sh / install.ps1.
The current template variable set (30+ env vars) is the contract with the
client-side scripts. We must produce identical output for package-install.tpl.sh
and package-install.tpl.ps1.
Reworking install.sh / install.ps1
Long-term, the per-package install scripts should feel like library users, not framework callbacks:
- Current (framework): define pkg_install(), pkg_get_current_version(), etc. and the framework calls them.
- Goal (library): source a helpers file, call functions like webi_download, webi_extract, webi_link explicitly from a linear script.
This is a separate migration from the Go rewrite — it changes the client-side contract. Plan it but don't block the server rewrite on it.
Migration Strategy
Each phase produces something that works in production alongside the existing Node.js server.
Phase 0: Foundation
- internal/buildmeta: shared vocabulary (OS, arch, libc, format, channel)
- internal/buildmeta: CompatArches(os, arch), OS-level arch compat facts
- internal/buildmeta: amd64 micro-arch levels (v1–v4), universal binary types
- internal/lexver: version strings → comparable strings
- internal/httpclient: resilient HTTP client for upstream API calls
- internal/uadetect: User-Agent → OS/arch/libc (regex-based)
- Go module init (go 1.26.1, stdlib only)
- CI setup
- CPU micro-arch detection in bootstrap scripts (POSIX + PowerShell)
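To make "version strings → comparable strings" concrete, here is one simple way such a mapping can work: zero-pad every numeric run so that plain string comparison agrees with numeric order. This is an assumption-laden sketch, not the lexver format. The name Lexify and the fixed width are hypothetical, and pre-release ordering (for example, rc builds sorting before the final release) is not handled.

```go
package main

import "strings"

// width is the zero-padded width for each numeric run (an assumption).
const width = 4

// Lexify drops a leading "v" and left-pads every run of digits so that
// comparing the results as strings matches comparing the numbers.
func Lexify(version string) string {
	version = strings.TrimPrefix(version, "v")
	var out strings.Builder
	digits := ""
	flush := func() {
		if digits == "" {
			return
		}
		if len(digits) < width {
			out.WriteString(strings.Repeat("0", width-len(digits)))
		}
		out.WriteString(digits)
		digits = ""
	}
	for _, r := range version {
		if r >= '0' && r <= '9' {
			digits += string(r)
			continue
		}
		flush()
		out.WriteRune(r)
	}
	flush()
	return out.String()
}
```

The raw strings "v1.10.0" and "v1.9.3" sort in the wrong order, while their padded forms "0001.0010.0000" and "0001.0009.0003" sort correctly.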
Phase 1: Release Fetching & Caching
- internal/releases/githubish: generic GitHub-compatible API fetcher
- internal/releases/github: GitHub releases (thin wrapper)
- internal/releases/githubsrc: GitHub source archives
- internal/releases/gitea: Gitea/Forgejo releases (own types)
- internal/releases/giteasrc: Gitea source archives
- internal/releases/gitlab: GitLab releases (own types, X-Total-Pages)
- internal/releases/gitlabsrc: GitLab source archives
- internal/releases/gittag: git tag listing (bare clone)
- internal/releases/nodedist: Node.js-style dist/index.json API
- internal/releases/node: Node.js (official + unofficial builds)
- internal/releases/chromedist: Chrome/Chromedriver JSON endpoint
- internal/releases/flutterdist: Flutter SDK release index
- internal/releases/golang: Go downloads page JSON
- internal/releases/gpgdist: GnuPG FTP directory scraper
- internal/releases/hashicorp: HashiCorp releases API
- internal/releases/iterm2dist: iTerm2 downloads page scraper
- internal/releases/juliadist: Julia versions JSON API
- internal/releases/mariadbdist: MariaDB downloads page
- internal/releases/zigdist: Zig downloads JSON
- Per-package releases packages: bun, fish, git, lsd, ollama, postgres, pwsh, watchexec, xcaddy (variant tagging, version normalization, legacy data)
- internal/rawcache: double-buffered raw upstream response storage
- internal/classify: build artifact classifier (80/20, filename→target)
- internal/platlatest: per-platform latest version index
- End-to-end: fetch complete histories for all 103 packages
- internal/installerconf: key=value config parser (source inferred from key)
- internal/classifypkg: full classification pipeline: classifySource → TagVariants → NormalizeVersions → processGitTagHEAD → ApplyConfig → appendLegacy
- Per-package variant taggers (pwsh, ollama, bun, node)
- Per-package version normalizers (git, lf, go, postgres, watchexec)
- Gittag HEAD handling (tagless→v{datetime}, mixed→exclude from legacy)
- Legacy releases (postgres EnterpriseDB 10.x–12.x via appendLegacy)
- Resolver (platlatest + installer config + CompatArches → pick binary)
- internal/storage: interface definition (Asset, PackageData, Store, RefreshTx)
- internal/storage/legacy.go: LegacyAsset/LegacyCache with variant/format filtering
- internal/storage/fsstore: filesystem implementation (atomic writes, alias symlinks)
- cmd/webicached: cache daemon (round-robin refresh, rate limiting, symlink detection)
- cmd/comparecache: Go vs Node.js cache comparison tool
- Legacy cache generation verified for 101 packages
Integration point: webicached writes the same _cache/ JSON format. The
Node.js server can read from it. Zero-risk cutover for release fetching.
Phase 2: Release API
- cmd/webid: HTTP server skeleton with middleware
- GET /api/releases/{package}.json endpoint
- GET /api/releases/{package}.tab endpoint
- GET /api/debug endpoint
Integration point: reverse proxy specific /api/releases/ paths to the Go
server. Node.js handles everything else.
Phase 3: Installer Rendering
- internal/render: template engine
- GET /api/installers/{package}.sh endpoint
- GET /api/installers/{package}.ps1 endpoint
- Bootstrap endpoint (GET /{package})
Integration point: reverse proxy installer paths to Go. Node.js only serves the website/cheat sheets (if it ever did — that may be a separate app).
Phase 4: PostgreSQL Storage
- internal/storage/pgstore: sqlc-generated queries, double-buffer
- Schema design and migrations
- webicached writes to Postgres
- webid reads from Postgres
Phase 5: Client-Side Rework
- Design new library-style install.sh helpers
- Migrate existing packages one at a time
- Update package-install.tpl.sh to support both old and new styles
Key Design Decisions
Package Configuration (releases.conf)
Each package has a {pkg}/releases.conf — a flat key = value file parsed by
internal/installerconf. This replaces the per-package releases.js from Node.js.
The source type is inferred from the primary key:
github_repo = BurntSushi/ripgrep
git_url = https://github.com/tpope/vim-commentary.git
gitea_repo = root/pathman
base_url = https://git.rootprojects.org
hashicorp_product = terraform
One-off dist sources use an explicit source key:
source = nodedist
url = https://nodejs.org/download/release
Source types: github, gitea, gittag, hashicorp, nodedist, chromedist,
flutterdist, golang, gpgdist, iterm2dist, juliadist, mariadbdist,
zigdist.
Multi-source packages (like node, which merges official + unofficial builds)
are not yet supported by the config format. Current workaround: separate packages
(node-official, node-unofficial). Redesign needed — see Open Questions.
Unknown keys go into conf.Extra (a map[string]string). The alias_of key
marks a package as a symlink to another (e.g. alias_of = dashcore).
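A parser for this format can stay very small. The sketch below shows the "source inferred from key" rule using the key-to-source pairs from the examples above; the Conf shape, the Parse name, and the treatment of # lines as comments are assumptions, and the real internalconf-style struct likely has typed fields plus Extra rather than a single map.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// sourceByKey maps a primary key to the source type it implies.
var sourceByKey = map[string]string{
	"github_repo":       "github",
	"git_url":           "gittag",
	"gitea_repo":        "gitea",
	"hashicorp_product": "hashicorp",
}

// Conf is a simplified, hypothetical result type: every key lands in
// Values, and Source is either explicit or inferred.
type Conf struct {
	Source string
	Values map[string]string
}

// Parse reads flat "key = value" lines. Blank lines and lines starting
// with "#" are skipped (comment syntax is an assumption).
func Parse(text string) (Conf, error) {
	conf := Conf{Values: make(map[string]string)}
	explicit := false
	sc := bufio.NewScanner(strings.NewReader(text))
	for n := 1; sc.Scan(); n++ {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		key, value, ok := strings.Cut(line, "=")
		if !ok {
			return conf, fmt.Errorf("line %d: missing '='", n)
		}
		key, value = strings.TrimSpace(key), strings.TrimSpace(value)
		conf.Values[key] = value
		switch {
		case key == "source":
			// An explicit source key always wins over inference.
			conf.Source, explicit = value, true
		case !explicit && conf.Source == "":
			conf.Source = sourceByKey[key] // "" when the key implies nothing
		}
	}
	return conf, sc.Err()
}
```

Splitting on the first "=" only keeps values such as URLs with query strings intact.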
Asset Model and Variants Field
storage.Asset represents a single downloadable file. Key fields:
type Asset struct {
    Filename string   // "bat-v0.26.1-x86_64-unknown-linux-musl.tar.gz"
    Version  string   // "v0.26.1"
    OS       string   // "linux"
    Arch     string   // "x86_64"
    Libc     string   // "musl"
    Format   string   // ".tar.gz"
    Channel  string   // "stable"
    Extra    string   // extra version info for sorting
    Variants []string // ["rocm"], ["installer"], ["fxdependent"], etc.
    Download string   // full URL
    ...
}
The Variants field captures build variants, meaning assets that target a specific
hardware or runtime configuration beyond OS/arch/libc:
- rocm: AMD GPU compute (ollama)
- jetpack5, jetpack6: NVIDIA Jetson SDK (ollama)
- fxdependent, fxdependentWinDesktop: .NET framework-dependent (pwsh)
- profile: debug profiling build (bun)
- source: source archive, not a binary
- installer: .exe that is a GUI installer, not the actual tool binary
The resolver deprioritizes assets with non-empty Variants — they're only
selected when the user explicitly requests that variant (e.g., ?variant=rocm).
The full API still serves them for broader use cases.
Not a variant — arch micro-levels: Bun's "baseline" is actually amd64 (v1),
and the non-baseline is amd64v3. These use the Arch field directly, and the
resolver's existing fallback chain (amd64v3 → amd64v2 → amd64) handles
selection naturally.
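The selection rule described in this section can be sketched as a loop over the arch fallback chain that skips variant builds unless one is requested. Pick and the trimmed Asset type are hypothetical, and the filenames in the usage note are made up; version matching, libc, and format preferences are left out.

```go
package main

// Asset is trimmed to the fields this sketch needs.
type Asset struct {
	Filename string
	OS, Arch string
	Variants []string
}

func hasVariant(a Asset, variant string) bool {
	for _, v := range a.Variants {
		if v == variant {
			return true
		}
	}
	return false
}

// Pick walks arches in fallback order (for example amd64v3, amd64v2,
// amd64) and returns the first matching asset. With variant == "" any
// asset carrying variants is skipped; otherwise only assets tagged with
// that variant qualify.
func Pick(assets []Asset, os string, arches []string, variant string) (Asset, bool) {
	for _, arch := range arches {
		for _, a := range assets {
			if a.OS != os || a.Arch != arch {
				continue
			}
			if variant == "" && len(a.Variants) > 0 {
				continue
			}
			if variant != "" && !hasVariant(a, variant) {
				continue
			}
			return a, true
		}
	}
	return Asset{}, false
}
```

Given a plain tool-linux-amd64.tgz and a rocm-tagged tool-linux-amd64-rocm.tgz, a default request on an amd64v3 host falls through the chain to the plain amd64 build, and only a request for the rocm variant returns the tagged one.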
Format Classification
All assets are stored — nothing is dropped at classification time.
Most formats are extractable and need no special tagging — the file extension is enough signal for the install scripts:
- .tar.gz, .tar.xz, .tar.zst, .zip, .7z: standard archives
- .pkg: macOS, extractable via pkgutil --expand-full (flat files, no install). See: https://coolaj86.com/articles/how-to-extract-pkg-files-and-payload/
- .deb: extractable via ar x + tar xf data.tar.*
- .dmg: mountable via hdiutil attach
- .msi: extractable via msiexec /a or lessmsi
- .AppImage: self-contained, chmod +x and run
The one ambiguous format is .exe — it could be a bare binary (the actual tool)
or a GUI installer (which can't be automated). Assets where .exe is an installer
(not the tool itself) get Variants: ["installer"] so the resolver skips them
by default.
Legacy Export Filtering
During migration, fsstore writes JSON in the Node.js _cache/ format. The
Node.js server reads this directly. Two filters apply at export time
(storage.ExportLegacy):
- Build variants: Assets with non-empty Variants are stripped (Node.js doesn't know about rocm/jetpack/fxdependent/head/appimage)
- Format: Assets with formats the Node.js server doesn't recognize are stripped (recognized: tar.gz, zip, xz, pkg, msi, exe, dmg, git, etc.)
This keeps the primary Go pipeline complete while the legacy path stays compatible with the Node.js server.
Version: Go 1.26+
Using http.ServeMux with PathValue for routing (available since Go 1.22).
Middleware via github.com/therootcompany/golib/http/middleware/v2.
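One detail worth keeping in mind with the Go 1.22+ ServeMux: a wildcard must span a whole path segment, so a pattern like /api/releases/{package}.json is not accepted. A workable shape is to capture the full last segment and split it in the handler. The sketch below assumes hypothetical names (splitTarget, newMux) and a placeholder response body.

```go
package main

import (
	"fmt"
	"net/http"
	"path"
	"strings"
)

// splitTarget splits "bat@v0.26.1.json" into package, version, and
// extension. The extension is taken from the end because versions
// contain dots themselves.
func splitTarget(target string) (pkg, version, ext string) {
	ext = path.Ext(target) // ".json" or ".tab" for valid requests
	name := strings.TrimSuffix(target, ext)
	pkg, version, _ = strings.Cut(name, "@")
	return pkg, version, ext
}

// newMux registers the release metadata route with a full-segment wildcard.
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("GET /api/releases/{target}", func(w http.ResponseWriter, r *http.Request) {
		pkg, version, ext := splitTarget(r.PathValue("target"))
		if pkg == "" || (ext != ".json" && ext != ".tab") {
			http.NotFound(w, r)
			return
		}
		// Placeholder body; the real handler would query storage.
		fmt.Fprintf(w, "package=%s version=%s format=%s\n", pkg, version, ext)
	})
	return mux
}
```

The same split works for the installer routes, with .sh and .ps1 as the accepted extensions.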
No ORM
PostgreSQL access via pgx + sqlc. Queries are hand-written SQL, type-safe
Go code is generated.
Template Rendering
Use text/template or simple string replacement (matching current behavior).
The templates are shell scripts — they need literal $ and {} — so
text/template may be the wrong tool. Likely better to stick with the current
regex-replacement approach, ported to Go.
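A Go port of regex replacement has one trap: regexp's ReplaceAllString expands $1 and ${name} in the replacement text, which would corrupt values containing shell syntax. ReplaceAllLiteralString avoids that. The sketch below assumes a placeholder convention of NAME= lines (optionally commented out) in the template; that convention and the setVar name are illustrative, not taken from the actual template files.

```go
package main

import (
	"regexp"
	"strings"
)

// setVar replaces a line of the form "NAME=..." or "#NAME=..." with
// NAME='value', single-quoted for POSIX shells. The rest of the template,
// including any literal $ and {}, is left untouched.
func setVar(tpl, name, value string) string {
	re := regexp.MustCompile(`(?m)^[ \t]*#?` + regexp.QuoteMeta(name) + `=.*$`)
	// Close the quote, emit an escaped quote, reopen: ' becomes '\''
	quoted := "'" + strings.ReplaceAll(value, "'", `'\''`) + "'"
	// The Literal variant does not interpret $ in the replacement.
	return re.ReplaceAllLiteralString(tpl, name+"="+quoted)
}
```

Because each variable is substituted by its own anchored line match, the output can be compared byte for byte against the Node.js renderer during migration.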
Error Handling
The current system returns a synthetic "error release" (version: 0.0.0,
channel: error) when no match is found, rather than an HTTP error. This
behavior must be preserved for backward compatibility.
Open Questions
- Multi-source config: node needs both official + unofficial URLs. Current releases.conf only supports one source. Node's classifier handles this via a special case (unofficial_url in Extra). Needs a cleaner design.
- What's the deployment topology? Single binary serving both roles? Separate processes? Kubernetes pods?
- Per-installer config format: what structure best expresses version-ranged libc overrides, arch fallback overrides, and nonstandard asset naming? Go struct + TOML/YAML? Go code (compiled into webicached)?
- CPU micro-arch detection: how should POSIX and PowerShell bootstrap scripts detect amd64v1/v2/v3/v4? Check /proc/cpuinfo flags (Linux), sysctl hw.optional (macOS), .NET intrinsics (Windows)?
- Variant selection API: How do users request variant builds? Query param (?variant=rocm)? User-Agent hint? Per-installer default override?
Resolved Questions
- Shell out to node releases.js? No. All source types are implemented in Go. The Go pipeline fetches and classifies everything directly.
- Asset.Extra vs Variants? Extra stays as version-related sort info. New Variants []string field captures build qualifiers (rocm, installer, fxdependent, head, appimage, win-version-specific). Resolver deprioritizes assets with any variants.
- Rate limiting for GitHub API calls? webicached uses round-robin refresh (one package per tick) with --page-delay (default 2s) via a delayTransport wrapper. No coordination needed: single instance by design.
- Config format for source/owner/repo? Collapsed into single keys: github_repo = owner/repo, git_url = ..., gitea_repo = owner/repo, hashicorp_product = name. Source type inferred from the key.
- Per-package version normalization? Lives in Go code per-package (e.g. internal/releases/postgres/versions.go), not in config. Each package that needs it implements a NormalizeVersions([]Asset) function called from classifypkg.NormalizeVersions.
Current Node.js Architecture (Reference)
For context, the current system's key files:
| File | Role |
|---|---|
| _webi/serve-installer.js | Main request handler: dispatches to builds + rendering |
| _webi/builds.js | Thin wrapper around builds-cacher |
| _webi/builds-cacher.js | Release fetching, caching, classification, version matching |
| _webi/transform-releases.js | Legacy release API (filter + cache + serve) |
| _webi/normalize.js | OS/arch/libc/ext regex detection from filenames |
| _webi/installers.js | Template rendering (bash + powershell) |
| _webi/ua-detect.js | User-Agent → OS/arch/libc |
| _webi/projects.js | Package metadata from README frontmatter |
| _webi/frontmarker.js | YAML frontmatter parser |
| _common/github.js | GitHub releases fetcher |
| _common/gitea.js | Gitea releases fetcher |
| _common/git-tag.js | Git tag listing |
| {pkg}/releases.js | Per-package release config (fetcher + filters + transforms) |
| {pkg}/install.sh | Per-package POSIX installer |
| {pkg}/install.ps1 | Per-package PowerShell installer |