- Remove LegacyBackport from classifypkg and webicached; canonical values
now flow through storage untouched
- Add legacyFieldBackport() in storage/legacy.go, called only at export time
(go: armv6→arm, ffmpeg windows: .gz/.empty→.exe)
- ExportLegacy now takes pkg name and returns LegacyDropStats (variants + formats dropped)
- fsstore.Commit logs dropped assets so filtering is visible
- Add FormatAPK (.apk) and FormatAppImage (.AppImage) to buildmeta and classify
so those files are properly classified and then correctly dropped from legacy export
rather than passing through as empty-format
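
A minimal sketch of the export-time backport described above. The Asset struct, its field names, and the function signature are assumptions for illustration; only the two remappings come from the notes above, and reading ".empty" as an empty format string is also an assumption.

```go
package main

import "fmt"

// Asset is a hypothetical stand-in for the stored asset record.
type Asset struct {
	OS, Arch, Format string
}

// legacyFieldBackport returns a copy of a with legacy values applied.
// The signature is assumed; the real function lives in storage/legacy.go.
func legacyFieldBackport(pkg string, a Asset) Asset {
	switch pkg {
	case "go":
		// Canonical armv6 is exported as the raw "arm" value.
		if a.Arch == "armv6" {
			a.Arch = "arm"
		}
	case "ffmpeg":
		// Windows .gz (or empty-format) assets are exported as .exe.
		if a.OS == "windows" && (a.Format == ".gz" || a.Format == "") {
			a.Format = ".exe"
		}
	}
	return a
}

func main() {
	fmt.Println(legacyFieldBackport("go", Asset{OS: "linux", Arch: "armv6", Format: ".tar.gz"}))
	fmt.Println(legacyFieldBackport("ffmpeg", Asset{OS: "windows", Arch: "x86_64", Format: ".gz"}))
}
```

Because the function works on a copy, the stored canonical values stay untouched and only the exported view changes.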
Implements storage.Store for PostgreSQL using pgx/v5.
Schema uses double-buffered generations per package — write into the
inactive gen, then atomically swap the active pointer on Commit. Readers
always see a complete consistent snapshot.
Write path: BeginRefresh → Put (staged in-memory) → Commit (CopyFrom + swap)
Read path: Load → reads active gen from webi_packages, fetches assets
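
The double-buffering idea can be sketched in memory as below. This is not the pgstore code: genStore and its fields are hypothetical, and a mutex stands in for the database transaction that swaps the active pointer.

```go
package main

import (
	"fmt"
	"sync"
)

// genStore is a hypothetical in-memory analogue of the two-generation schema.
type genStore struct {
	mu     sync.RWMutex
	gens   [2][]string // two generations of asset names
	active int         // index of the generation readers see
}

// Commit writes the staged assets into the inactive generation and then
// flips the active index, so readers never observe a partial write.
func (s *genStore) Commit(staged []string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	inactive := 1 - s.active
	s.gens[inactive] = append([]string(nil), staged...)
	s.active = inactive
}

// Load returns a copy of the currently active generation.
func (s *genStore) Load() []string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return append([]string(nil), s.gens[s.active]...)
}

func main() {
	s := &genStore{}
	s.Commit([]string{"jq-1.7-linux-x86_64"})
	s.Commit([]string{"jq-1.7-linux-x86_64", "jq-1.7-darwin-aarch64"})
	fmt.Println(s.active, s.Load())
}
```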
Both webid and webicached now accept -pg=<dsn> to use pgstore instead
of fsstore. Schema is applied idempotently on startup.
Also:
- storage.Store interface gains ListPackages(ctx) — fsstore reads the
directory; pgstore queries webi_packages
- webid.loadAll() uses ListPackages instead of filepath.ReadDir
- Fixed .gitignore: /webid (root binary) was incorrectly matching cmd/webid/
Move legacy-specific field translations out of the core classifier into
LegacyBackport(), called by webicached before writing the JSON cache.
Core classifier now outputs canonical values:
- Go dist arm → armv6 (correct per GOARM default)
- ffmpeg Windows .gz → .gz (correct file extension)
LegacyBackport remaps for Node.js compat:
- Go dist armv6 → arm (production keeps raw API value)
- ffmpeg Windows .gz → exe (production releases.js override)
sass armv6→armv7 stays in classifier (Dart Sass genuinely targets ARMv7).
Replace direct string comparison with canonical equivalence checks so
naming convention differences (darwin/macos, x86_64/amd64, aarch64/arm64)
don't appear as false diffs. Now only real classification disagreements
surface:
- go: illumos/solaris→sunos mapping, arm ambiguity per OS
- sass: bare "arm" should be armv7, not armv6
- ffmpeg: Windows .gz ext classified as exe in prod
- terraform: alpha channel detected correctly by Go, missed by prod
- postgres: legacy EDB ext "tar" vs "tar.gz"
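
A rough sketch of the equivalence check, using only the three naming pairs listed above. The canon and equivalent helpers are hypothetical names, and the choice of which spelling is canonical is an assumption.

```go
package main

import "fmt"

// canonical maps alternate spellings onto one chosen spelling.
var canonical = map[string]string{
	"macos": "darwin",
	"amd64": "x86_64",
	"arm64": "aarch64",
}

// canon returns the canonical spelling of v, or v itself if unknown.
func canon(v string) string {
	if c, ok := canonical[v]; ok {
		return c
	}
	return v
}

// equivalent reports whether two field values name the same thing.
func equivalent(a, b string) bool {
	return canon(a) == canon(b)
}

func main() {
	fmt.Println(equivalent("darwin", "macos")) // naming difference only
	fmt.Println(equivalent("arm", "armv6"))    // real disagreement
}
```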
- pg/releases.conf: add asset_filter=postgres so pg only returns server
assets (which include the client), matching production releases.js
- classifypkg: add "pg" to postgres version normalizer switch case
- comparecache: compare os/arch/libc/ext/channel fields on shared assets,
distinguishing real disagreements (diff-*) from expected fill diffs
where Go classifies at write time but Node.js leaves fields empty
Add PowerShell() function to render .ps1 installers by injecting
$Env: variables and splicing install.ps1 content. Wire it into
the webid server for .ps1 extension requests.
-sample N now randomly samples N assets from each package's diff list,
giving a representative view of classification differences instead of
showing only the first alphabetical entries. Implies -windowed -diffs
to filter out version-depth noise and focus on real bugs.
Usage: go run ./cmd/comparecache -sample 8 -diffs
Picks 8 random packages beyond any explicitly named ones and logs which
ones were sampled for reproducibility.
Production has two separate flows:
1. /{pkg} (curl-pipe bootstrap) — minimal script that sets WEBI_PKG,
WEBI_HOST, WEBI_CHECKSUM and downloads+runs webi
2. /api/installers/{pkg}.sh — full installer with resolved release
and embedded install.sh
Previously handleBootstrap served the full installer. Now:
- handleBootstrap: curl-pipe bootstrap (reads curl-pipe-bootstrap.tpl.sh)
- handleInstaller: full installer (/api/installers/{pkg}.sh)
Also:
- Export render.InjectVar for use by bootstrap handler
- Add webi.sh checksum calculation (SHA-1 first 8 chars)
- Add /api/installers/ route to mux and test server
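
The checksum step could look roughly like this. shortChecksum is a hypothetical name, and reading "first 8 chars" as the first 8 hex characters of the digest is an assumption.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// shortChecksum returns the first 8 hex characters of the SHA-1 digest
// of the script body.
func shortChecksum(script []byte) string {
	sum := sha1.Sum(script)
	return hex.EncodeToString(sum[:])[:8]
}

func main() {
	fmt.Println(shortChecksum([]byte("abc")))
}
```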
The commaToTab byte replacement was fragile — URLs containing commas
would break. Now uses csv.Writer with Comma='\t' as the backend for
csvutil.Encoder, producing correct TSV output regardless of field content.
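
To illustrate why the writer-based approach is safer than byte replacement, here is a stdlib-only sketch. It leaves csvutil out and drives encoding/csv directly; tsvString is a hypothetical helper and the URL is made up.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// tsvString encodes rows as tab-separated values using csv.Writer with
// Comma set to a tab, so commas inside fields need no special handling.
func tsvString(rows [][]string) (string, error) {
	var sb strings.Builder
	w := csv.NewWriter(&sb)
	w.Comma = '\t'
	// WriteAll writes every row and flushes the writer.
	if err := w.WriteAll(rows); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := tsvString([][]string{
		{"name", "download"},
		{"demo", "https://example.com/a,b.tar.gz"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```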
- Added TestV1ResolveJQ to verify jq resolves to binary, not git
- Changed upstream gap detection in resolve_cache_test to t.Skipf
(shellcheck/windows and xz/linux-arm64 don't have upstream builds)
- Updated ANSWERS.md with git assets investigation results
New API routes:
- GET /v1/releases/{pkg}.tab — list releases as TSV (with header)
- GET /v1/releases/{pkg}.json — list releases as JSON array
- GET /v1/resolve/{pkg}.tab — resolve best asset for platform (TSV)
- GET /v1/resolve/{pkg}.json — resolve best asset for platform (JSON)
Key design decisions:
- TSV as primary format via csvutil (easy for cut/grep/sort/agents)
- Go-native naming: darwin, x86_64, aarch64 (no legacy mapping)
- No quoted fields — spaces for lists within fields
- Always includes header row in TSV output
- Resolve endpoint returns single best match with triplet info
Query params: os, arch, libc, channel, version, lts, format, variant, limit
- JSON response returns bare array (not wrapped in {"releases": [...]})
- OS names mapped to Node.js conventions: darwin → macos
- Arch names mapped: x86_64 → amd64, aarch64 → arm64
- Version strings stripped of "v" prefix
- Extension stripped of "." prefix
- Empty libc defaults to "none"
- Tab format uses actual TSV (not comma-separated)
- Tab LTS field uses "lts" / "-" (not "true" / "false")
- Tab shows header row only with ?pretty=true
- Releases sorted newest-first by version (using lexver)
- Added comprehensive format tests and production comparison test
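
The field mappings in this list can be sketched as one conversion step. The Release struct, toLegacy, and ltsField are hypothetical names; only the mappings themselves come from the list above.

```go
package main

import (
	"fmt"
	"strings"
)

// Release is a hypothetical subset of the fields being exported.
type Release struct {
	Version, OS, Arch, Libc, Ext string
	LTS                          bool
}

var legacyOS = map[string]string{"darwin": "macos"}
var legacyArch = map[string]string{"x86_64": "amd64", "aarch64": "arm64"}

// toLegacy returns a copy of r using the Node.js naming conventions.
func toLegacy(r Release) Release {
	if v, ok := legacyOS[r.OS]; ok {
		r.OS = v
	}
	if v, ok := legacyArch[r.Arch]; ok {
		r.Arch = v
	}
	r.Version = strings.TrimPrefix(r.Version, "v")
	r.Ext = strings.TrimPrefix(r.Ext, ".")
	if r.Libc == "" {
		r.Libc = "none"
	}
	return r
}

// ltsField renders the LTS flag the way the tab format expects.
func ltsField(lts bool) string {
	if lts {
		return "lts"
	}
	return "-"
}

func main() {
	r := toLegacy(Release{Version: "v1.2.3", OS: "darwin", Arch: "aarch64", Ext: ".tar.gz"})
	fmt.Printf("%+v %s\n", r, ltsField(r.LTS))
}
```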
Renders package-install.tpl.sh with WEBI_* variable injection and
install.sh splicing. Bootstrap route at /{package}@{version} detects
UA, resolves best release, and returns rendered installer script.
Serves /api/releases/{pkg}@{version}.json and .tab matching the
Node.js format. Supports query params for os, arch, libc, channel,
formats, lts, limit. Handles selfhosted packages (install.sh only).
Pre-loads all cached packages on startup. Includes /api/debug for
UA detection and /api/health endpoint.
Source type is now inferred from the primary key:
github_repo = owner/repo (was source=github + owner + repo)
git_url = https://... (was source=gittag + url)
gitea_repo = owner/repo (was source=gitea + owner + repo)
hashicorp_product = name (was source=hashicorp + product)
One-off dist sources (nodedist, zigdist, etc.) keep the explicit
source= key since they're already one-liners.
Parser still accepts the old format via the default fallback branch.
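
One way the inference could work, as a sketch over an already-parsed key/value map. inferSource is a hypothetical name and the map stands in for whatever installerconf actually produces.

```go
package main

import "fmt"

// inferSource picks the source type from whichever primary key is set,
// falling back to an explicit source= value (old format, one-off dists).
func inferSource(conf map[string]string) string {
	switch {
	case conf["github_repo"] != "":
		return "github"
	case conf["git_url"] != "":
		return "gittag"
	case conf["gitea_repo"] != "":
		return "gitea"
	case conf["hashicorp_product"] != "":
		return "hashicorp"
	default:
		return conf["source"]
	}
}

func main() {
	fmt.Println(inferSource(map[string]string{"github_repo": "owner/repo"}))
	fmt.Println(inferSource(map[string]string{"source": "nodedist"}))
}
```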
- Default mode: classify all from rawcache on startup, then
fetch+refresh one package per tick (round-robin).
- --eager flag for the old behavior (fetch all on startup).
- Skip aliases and symlinked dirs — legacy cache doesn't create
entries for them (resolved at request time by the server).
- Add --page-delay (default 2s) to rate-limit paginated API requests.
- Add delayTransport wrapper on http.Client.
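
A sketch of what a delaying RoundTripper might look like. The field names and the spacing policy (a minimum gap between consecutive requests) are assumptions; the stub transport and demo helper exist only so the example runs without network access.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
	"time"
)

// delayTransport enforces a minimum gap between requests sent through it.
type delayTransport struct {
	base  http.RoundTripper
	delay time.Duration

	mu   sync.Mutex
	last time.Time
}

func (t *delayTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	t.mu.Lock()
	if !t.last.IsZero() {
		if wait := t.delay - time.Since(t.last); wait > 0 {
			time.Sleep(wait)
		}
	}
	t.last = time.Now()
	t.mu.Unlock()
	return t.base.RoundTrip(req)
}

// stubTransport answers every request locally and counts the calls.
type stubTransport struct{ calls int }

func (s *stubTransport) RoundTrip(*http.Request) (*http.Response, error) {
	s.calls++
	return &http.Response{StatusCode: http.StatusOK, Header: make(http.Header), Body: http.NoBody}, nil
}

// demo sends n requests through a delayed client and reports the call
// count and the elapsed wall time in milliseconds.
func demo(delayMillis, n int) (calls int, elapsedMillis int64) {
	stub := &stubTransport{}
	client := &http.Client{Transport: &delayTransport{
		base:  stub,
		delay: time.Duration(delayMillis) * time.Millisecond,
	}}
	start := time.Now()
	for i := 0; i < n; i++ {
		resp, err := client.Get("http://example.invalid/page")
		if err != nil {
			panic(err)
		}
		resp.Body.Close()
	}
	return stub.calls, time.Since(start).Milliseconds()
}

func main() {
	calls, ms := demo(20, 3)
	fmt.Println(calls, "requests in", ms, "ms")
}
```

Holding the mutex while sleeping serializes requests, which is the point of a rate limiter here, though it would be the wrong choice for a client that needs concurrency.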
Symlinked directories (e.g. rust.vim → vim-rust) are now treated as
aliases instead of being independently fetched and classified. Creates
cache symlinks just like alias_of config entries.
Flutter's API returns separate entries for universal (x64) and arm64
macOS builds under the same version/channel/os. The rawcache tag
was version-channel-os, so arm64 overwrote universal. Now extracts
arch from the archive path and appends it to the tag.
Re-fetched flutter: +218 entries recovered.
Packages with alias_of in releases.conf (e.g. dashd → dashcore,
golang → go) now get symlinked cache files so they resolve to the
same JSON as their target. 13 aliases total.
Added AliasOf as a proper field in installerconf.Conf, LinkAlias
method to fsstore, and alias handling in webicached's Run loop.
Bun releases use tags like bun-v1.2.3. Without tag_prefix, the version
included the bun- prefix, causing mismatches. Also update comparecache
with bun version normalizer for accurate comparison.
Node.js pads Go versions like "1.10" to "1.10.0". Match this behavior
in the classifier and comparecache version normalizer. Also filter out
the malformed -arm6. arch and .src. source tarballs as comparison noise.
Match count: 73/106
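
The version padding mentioned above might be sketched like this. padGoVersion is a hypothetical name, and leaving non-numeric versions (rc, beta) untouched is an assumption, since the notes only describe the "1.10" case.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// padGoVersion pads purely numeric versions to three components,
// e.g. "1.10" becomes "1.10.0". Anything else is returned unchanged.
func padGoVersion(v string) string {
	parts := strings.Split(v, ".")
	for _, p := range parts {
		if _, err := strconv.Atoi(p); err != nil {
			return v
		}
	}
	for len(parts) < 3 {
		parts = append(parts, "0")
	}
	return strings.Join(parts, ".")
}

func main() {
	fmt.Println(padGoVersion("1.10"))
	fmt.Println(padGoVersion("1.21.3"))
}
```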
- Add .app.zip to legacyFormats so macOS fish builds export correctly
- Exclude bundledpcre, fish-static, OpenBeta from fish/releases.conf
- Add fish Linux binaries to comparecache noise (Go improvement)
Match count: 72/106
Git for Windows uses tags like v2.53.0.windows.1. Node.js strips
".windows.1" and replaces ".windows.N" (N>1) with ".N".
Add NormalizeVersions to the git package and wire it into the classify
pipeline. Also add version normalization to comparecache so the
comparison uses canonical versions for both caches.
Remaining git diffs: data freshness (.windows.2 releases Go hasn't
fetched) and RC versions in Go that the live cache doesn't have.
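
The suffix rule for Git for Windows tags can be sketched per tag as follows. normalizeGitVersion is a hypothetical helper, not the NormalizeVersions function itself, and it leaves any "v" prefix alone.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// windowsSuffix matches a trailing ".windows.N" build marker.
var windowsSuffix = regexp.MustCompile(`\.windows\.(\d+)$`)

// normalizeGitVersion drops ".windows.1" and rewrites ".windows.N"
// (N > 1) to ".N". Tags without the suffix are returned unchanged.
func normalizeGitVersion(tag string) string {
	m := windowsSuffix.FindStringSubmatch(tag)
	if m == nil {
		return tag
	}
	base := strings.TrimSuffix(tag, m[0])
	if m[1] == "1" {
		return base
	}
	return base + "." + m[1]
}

func main() {
	fmt.Println(normalizeGitVersion("v2.53.0.windows.1"))
	fmt.Println(normalizeGitVersion("v2.53.0.windows.2"))
}
```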
rocm and jetpack variants are tagged by Go's variant system but kept
by Node.js with special arch names. Filter them out of the comparison as
noise to avoid false positives.
Match count: 70/106
- Hugo: exclude Linux-64bit legacy filename alias
- Hugo-extended: exclude Linux-64bit legacy filename alias
- Gitea: exclude -src- and -docs- tarballs
- Pathman: exclude armv8 legacy alias
- UUID v7: exclude exotic architectures (thumb, armeb, loong, gnux32, risc)
- comparecache: filter bare executables and docs tarballs as noise,
apply noise filter to both live and Go sides
- legacy.go: add .tar.bz2 to legacyFormats
Match count: 69/106 (up from 58)
Move all source-specific classifiers, variant tagging, config filtering,
and readAllRaw out of cmd/webicached into internal/classifypkg. The new
Package() function runs the full classify pipeline: source dispatch →
tag variants → apply config.
webicached now only handles fetching raw data and writing to fsstore.
The classification logic is reusable by comparecache and future tools.
- gittag classifier: use "{repo}-{tag}" filenames (matching Node.js),
strip "v" prefix from version, synthesize date-based version for
tagless repos (HEAD of master/main)
- GitHub source-only: use "git" format (no dot) and "{repo}-{tag}"
filename for clone assets
- Legacy export: add "git" to recognized formats so gittag packages
appear in the legacy cache
- Derives repo name from the git URL in releases.conf
vim-commentary now matches. vim-zig matches on format but has newer
data (expected — Go fetched more recently than Node.js).
Moved isMetaAsset from cmd/webicached to classify.IsMetaAsset so
both webicached and comparecache use the same logic. Removed
duplicated isMetaFile from comparecache. The comparecache
isLiveNoise now delegates to classify.IsMetaAsset and adds
live-specific filters (.deb, .rpm, -src-).
Drop the Owner-Repo prefix from source tarball filenames — the
actual download name comes from Content-Disposition. Added TODO
to resolve the full filename via HEAD at fetch time.
Source-only releases (no uploaded assets) now also emit a .git
asset with the GitHub clone URL, matching how gittag-sourced
packages like vim-commentary and vim-zig work. This allows
install via git clone --branch <tag> as an alternative to
downloading the tarball.
Source-only GitHub releases now use the API-provided tarball_url
and zipball_url directly. Filename follows the legacy pattern
(Owner-Repo-Tag.ext) to approximate the Content-Disposition
filename that Node.js gets by following the redirect.
Source-only releases now use the API-provided tarball_url and
zipball_url directly instead of guessing the archive URL format.
The filename uses the git tag, and the download URL is what
GitHub's API actually returns.
Strips known noise from the live cache before comparison: .deb, .rpm,
.asc, .sig, .gpg, .sbom, .sha256, checksums, install.sh, install.ps1,
.txt, and other non-installable files. Matches went from 16 to 50.
Uses Node.js version range (2nd to 2nd-to-last) as the window.
All Node.js versions in the window are included so missing Go
versions/assets are visible. Go-only versions are hidden since
those are just deeper fetch history, not real gaps.
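
One possible reading of the windowing rule, as a sketch. Both helper names are hypothetical, and treating the window as a set of Node.js versions (rather than a numeric range) is an assumption.

```go
package main

import "fmt"

// window drops the first and last Node.js versions, keeping the
// 2nd through 2nd-to-last. Inputs shorter than three yield nil.
func window(nodeVersions []string) []string {
	if len(nodeVersions) < 3 {
		return nil
	}
	return nodeVersions[1 : len(nodeVersions)-1]
}

// filterToWindow keeps only the Go versions that appear in the window,
// hiding Go-only versions from deeper fetch history.
func filterToWindow(goVersions, win []string) []string {
	keep := make(map[string]bool, len(win))
	for _, v := range win {
		keep[v] = true
	}
	var out []string
	for _, v := range goVersions {
		if keep[v] {
			out = append(out, v)
		}
	}
	return out
}

func main() {
	win := window([]string{"1.4", "1.3", "1.2", "1.1"})
	fmt.Println(win, filterToWindow([]string{"1.5", "1.4", "1.3", "1.1", "1.0"}, win))
}
```

Every version in the window is still compared, so a Node.js version missing from the Go side (1.2 in this example) shows up as a gap.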
- yq: move man_page_only from general isMetaAsset to yq-specific tagger
- node: restore .exe as stored asset with "bare-exe" variant (installable
by Go, excluded from legacy)
- ollama: rename Ollama-darwin.zip variant from "installer" to "app"
(.app bundle is installable by Go, just not by legacy Node.js)
The distinction: general classification/filter (isMetaAsset) handles
truly non-installable files. Installer-specific taggers handle assets
that are installable but need variant tagging. Legacy filter strips
variants and unsupported formats for Node.js compat.
Node.js normalizes .tgz extensions to .tar.gz in the cache name field
while keeping the real .tgz URL in download. Match this behavior so
legacy export filenames are consistent. Affects ollama-darwin.tgz and
any other packages using .tgz.
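
The name rewrite is small enough to sketch directly. legacyName is a hypothetical helper; the download URL is untouched because only the name field is passed in.

```go
package main

import (
	"fmt"
	"strings"
)

// legacyName rewrites a trailing .tgz to .tar.gz for the cache name field.
func legacyName(name string) string {
	if strings.HasSuffix(name, ".tgz") {
		return strings.TrimSuffix(name, ".tgz") + ".tar.gz"
	}
	return name
}

func main() {
	fmt.Println(legacyName("ollama-darwin.tgz"))
}
```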