docs: add GO_WEBI.md planning doc for Go rewrite

Captures the full migration plan: architecture, API surface inventory,
storage double-buffer design, incremental migration phases, and key
design decisions. Not a straight port — redesigning internals while
preserving the public API contract.
This commit is contained in:
AJ ONeal
2026-03-08 21:13:30 -06:00
parent 3e2e7f2f65
commit 0d4ca55e6f

333
GO_WEBI.md Normal file
View File

@@ -0,0 +1,333 @@
# Go Webi — Rewrite Plan
This is the planning and tracking document for rewriting the Webi server in Go.
This is **not a straight port** — we're redesigning internals while preserving the
public API surface.
## Guiding Principles
1. **Incremental migration.** Rewrites fail when they try to replace everything
at once. We integrate piece by piece, endpoint by endpoint, into the live
system.
2. **Library over framework.** The Go code should be composable pieces the caller
controls — not a framework that calls your code.
3. **stdlib + pgx, nothing else.** No third-party SDKs. Dependencies: stdlib,
`golang.org/x`, `github.com/jackc/pgx`, `github.com/therootcompany/golib`.
4. **Resilient by default.** The HTTP client, caching, and storage layers are
built for failure — timeouts, retries, circuit breaking, graceful fallback.
5. **Simpler classification.** Standard toolchains (goreleaser, cargo-dist, etc.)
produce predictable filenames. Match those patterns directly; push esoteric
naming into release-fetcher tagging/filtering rather than classifier heuristics.
## Repository Layout
```
cmd/
webid/ # main HTTP server
webicached/ # release cache daemon (fetches + stores releases)
internal/
buildmeta/ # OS, arch, libc, format constants and enums
classify/ # build artifact classification (filename → target)
httpclient/ # resilient net/http client with best-practice defaults
lexver/ # lexicographic version parsing and sorting
releases/ # release fetching (GitHub, Gitea, git-tag, custom)
github/
gitea/
gittag/
render/ # installer script template rendering
storage/ # release storage interface + implementations
storage.go # interface definition
fsstore/ # filesystem (JSON cache, like current _cache/)
pgstore/ # PostgreSQL (via sqlc + pgx)
uadetect/ # User-Agent → OS/arch/libc detection
```
## Public API Surface (Must Remain Stable)
These are the endpoints that clients depend on. The URLs, query parameters, and
response formats must not change.
### Bootstrap (curl-pipe entry point)
```
GET /{package} # User-Agent dispatch:
GET /{package}@{version} # curl/wget/POSIX → bash bootstrap script
# PowerShell → ps1 bootstrap script
# Browser → HTML cheat sheet (separate app)
```
### Installer Scripts
```
GET /api/installers/{package}.sh # POSIX installer
GET /api/installers/{package}@{version}.sh
GET /api/installers/{package}.ps1 # PowerShell installer
GET /api/installers/{package}@{version}.ps1
Query: ?formats=tar,zip,xz,git,dmg,pkg
&libc=msvc (ps1 only)
```
### Release Metadata
```
GET /api/releases/{package}.json
GET /api/releases/{package}@{version}.json
GET /api/releases/{package}.tab
GET /api/releases/{package}@{version}.tab
Query: ?os=linux&arch=amd64&libc=musl
&channel=stable&limit=10&formats=tar,xz
&pretty=true
```
### Package Assets
```
GET /packages/{package}/README.md
GET /packages/{package}/{filename}
```
### Debug
```
GET /api/debug # returns detected OS/arch from User-Agent
Query: ?os=...&arch=... # overrides
```
### Response Formats
**JSON**`{ oses, arches, libcs, formats, releases: [{ version, date, os,
arch, libc, ext, download, channel, lts, name }] }`
**TSV (.tab)**`version \t lts \t channel \t date \t os \t arch \t ext \t - \t
download \t name \t comment`
## Architecture
### Two Servers
- **`webid`** — the HTTP API server. Reads from storage, renders templates,
serves responses. Stateless (apart from in-memory caches of template files).
- **`webicached`** — the cache daemon. Periodically fetches releases from
upstream sources (GitHub API, Gitea, etc.), classifies builds, writes to
storage. Runs independently.
This separation means `webid` never blocks on upstream API calls. It serves from
whatever is in storage — always fast, always available.
### Double-Buffer Storage
The storage layer uses a double-buffer strategy so that a full release-history
rewrite never disrupts active downloads:
```
Slot A: [current — being read by webid]
Slot B: [next — being written by webicached]
On completion: atomic swap A ↔ B
```
For **fsstore**: two directories per package, swap via atomic rename.
For **pgstore**: two sets of rows per package (keyed by generation), swap via
updating an active-generation pointer in a single transaction.
### Storage Interface
```go
type Store interface {
// Read path (used by webid)
GetPackageMeta(ctx context.Context, name string) (*PackageMeta, error)
GetReleases(ctx context.Context, name string, filter ReleaseFilter) ([]Release, error)
// Write path (used by webicached)
BeginRefresh(ctx context.Context, name string) (RefreshTx, error)
}
type RefreshTx interface {
PutReleases(ctx context.Context, releases []Release) error
Commit(ctx context.Context) error // atomic swap
Rollback(ctx context.Context) error
}
```
### Resilient HTTP Client (`internal/httpclient`)
A `net/http` client with best-practice defaults, used as the base for all
upstream API calls:
- **Timeouts**: connect, TLS handshake, response header, overall request
- **Connection pooling**: sensible `MaxIdleConns`, `IdleConnTimeout`
- **TLS**: `MinVersion: tls.VersionTLS12`, system cert pool
- **Redirects**: limited redirect depth, no cross-scheme downgrades
- **User-Agent**: identifies as Webi with contact info
- **Retries**: exponential backoff with jitter for transient errors (429, 502,
503, 504), respects `Retry-After` headers
- **Context**: all calls take `context.Context` for cancellation
- **No global state**: created as instances, not `http.DefaultClient`
### Release Fetchers (`internal/releases/`)
Each upstream source (GitHub, Gitea, git-tag) is a small package that uses
`httpclient` and returns a common `[]Release` slice. No SDK dependencies.
```go
// internal/releases/github/github.go
func FetchReleases(ctx context.Context, client *httpclient.Client,
owner, repo string, opts ...Option) ([]Release, error)
```
### Build Classification (`internal/classify`)
Simplified from the current regex-heavy approach. Strategy:
1. **Known toolchain patterns first.** Goreleaser, cargo-dist, and Go's release
naming are predictable. Match those structures directly.
2. **Fallback regex for legacy.** Keep a simpler set of OS/arch/libc/ext regexes
for packages that don't use standard toolchains.
3. **Release-fetcher does the hard work.** The `releases.js` (or its Go
equivalent config) is responsible for filtering irrelevant assets and
normalizing oddball names _before_ classification sees them.
Target triplet format: `{os}-{arch}-{libc}` (simplified from the current
4-part `{arch}-{vendor}-{os}-{libc}`).
### Installer Rendering (`internal/render`)
Replaces `installers.js`. Reads template files, substitutes variables, injects
the per-package `install.sh` / `install.ps1`.
The current template variable set (30+ env vars) is the contract with the
client-side scripts. We must produce identical output for `package-install.tpl.sh`
and `package-install.tpl.ps1`.
### Reworking install.sh / install.ps1
Long-term, the per-package install scripts should feel like library users, not
framework callbacks:
- **Current (framework):** define `pkg_install()`, `pkg_get_current_version()`,
etc. and the framework calls them.
- **Goal (library):** source a helpers file, call functions like
`webi_download`, `webi_extract`, `webi_link` explicitly from a linear script.
This is a **separate migration** from the Go rewrite — it changes the client-side
contract. Plan it but don't block the server rewrite on it.
## Migration Strategy
Each phase produces something that works in production alongside the existing
Node.js server.
### Phase 0: Foundation
- [ ] `internal/buildmeta` — constants/enums for OS, arch, libc, format, channel
- [ ] `internal/lexver` — version parsing and comparison
- [ ] `internal/httpclient` — resilient HTTP client
- [ ] `internal/uadetect` — User-Agent parsing
- [ ] Go module init, CI setup
### Phase 1: Release Fetching
- [ ] `internal/releases/github` — GitHub releases fetcher
- [ ] `internal/releases/gitea` — Gitea releases fetcher
- [ ] `internal/releases/gittag` — git tag listing
- [ ] `internal/classify` — build artifact classifier
- [ ] `internal/storage` — interface definition
- [ ] `internal/storage/fsstore` — filesystem implementation with double-buffer
- [ ] `cmd/webicached` — cache daemon that can replace the Node.js caching
**Integration point:** `webicached` writes the same `_cache/` JSON format. The
Node.js server can read from it. Zero-risk cutover for release fetching.
### Phase 2: Release API
- [ ] `cmd/webid` — HTTP server skeleton with middleware
- [ ] `GET /api/releases/{package}.json` endpoint
- [ ] `GET /api/releases/{package}.tab` endpoint
- [ ] `GET /api/debug` endpoint
**Integration point:** reverse proxy specific `/api/releases/` paths to the Go
server. Node.js handles everything else.
### Phase 3: Installer Rendering
- [ ] `internal/render` — template engine
- [ ] `GET /api/installers/{package}.sh` endpoint
- [ ] `GET /api/installers/{package}.ps1` endpoint
- [ ] Bootstrap endpoint (`GET /{package}`)
**Integration point:** reverse proxy installer paths to Go. Node.js only serves
the website/cheat sheets (if it ever did — that may be a separate app).
### Phase 4: PostgreSQL Storage
- [ ] `internal/storage/pgstore` — sqlc-generated queries, double-buffer
- [ ] Schema design and migrations
- [ ] `webicached` writes to Postgres
- [ ] `webid` reads from Postgres
### Phase 5: Client-Side Rework
- [ ] Design new library-style install.sh helpers
- [ ] Migrate existing packages one at a time
- [ ] Update `package-install.tpl.sh` to support both old and new styles
## Key Design Decisions
### Version: Go 1.26+
Using `http.ServeMux` with `PathValue` for routing (available since Go 1.22).
Middleware via `github.com/therootcompany/golib/http/middleware/v2`.
### No ORM
PostgreSQL access via `pgx` + `sqlc`. Queries are hand-written SQL, type-safe
Go code is generated.
### Template Rendering
Use `text/template` or simple string replacement (matching current behavior).
The templates are shell scripts — they need literal `$` and `{}` — so
`text/template` may be the wrong tool. Likely better to stick with the current
regex-replacement approach, ported to Go.
### Error Handling
The current system returns a synthetic "error release" (`version: 0.0.0`,
`channel: error`) when no match is found, rather than an HTTP error. This
behavior must be preserved for backward compatibility.
## Open Questions
- [ ] How should the Go server discover which packages exist? Currently the
Node.js server scans the filesystem for directories with `releases.js`. The Go
cache daemon needs a similar discovery mechanism — or a static manifest.
- [ ] Should `webicached` shell out to `node releases.js` during migration, or
do we rewrite every releases.js as Go config/code from the start?
- [ ] What's the deployment topology? Single binary serving both roles? Separate
processes? Kubernetes pods?
- [ ] Rate limiting for GitHub API calls in `webicached` — how to coordinate
across multiple instances?
## Current Node.js Architecture (Reference)
For context, the current system's key files:
| File | Role |
|------|------|
| `_webi/serve-installer.js` | Main request handler — dispatches to builds + rendering |
| `_webi/builds.js` | Thin wrapper around builds-cacher |
| `_webi/builds-cacher.js` | Release fetching, caching, classification, version matching |
| `_webi/transform-releases.js` | Legacy release API (filter + cache + serve) |
| `_webi/normalize.js` | OS/arch/libc/ext regex detection from filenames |
| `_webi/installers.js` | Template rendering (bash + powershell) |
| `_webi/ua-detect.js` | User-Agent → OS/arch/libc |
| `_webi/projects.js` | Package metadata from README frontmatter |
| `_webi/frontmarker.js` | YAML frontmatter parser |
| `_common/github.js` | GitHub releases fetcher |
| `_common/gitea.js` | Gitea releases fetcher |
| `_common/git-tag.js` | Git tag listing |
| `{pkg}/releases.js` | Per-package release config (fetcher + filters + transforms) |
| `{pkg}/install.sh` | Per-package POSIX installer |
| `{pkg}/install.ps1` | Per-package PowerShell installer |