vim-ale/COMPARISON.md
AJ ONeal 37d6474675 fix(fish): tag source tarball as variant, exclude from legacy export
fish-{version}.tar.xz is an uploaded source tarball with no OS/arch in
the filename. GitHub API doesn't distinguish it from binaries. Tag assets
with no OS and no arch as "source" variant so they're filtered from
legacy export. The linux .tar.xz binaries classify correctly and are
kept — Node.js just doesn't have them yet.
2026-03-10 14:39:25 -06:00

Go vs Node.js Cache Comparison

Systematic comparison of Go pipeline output (_cache/) vs Node.js production cache (LIVE_cache/). Generated by cmd/comparecache.

Latest run: 2026-03-10. -no-fetch rebuild from existing raw data, with variant tagging and legacy export filter applied.

Summary (latest version only)

| Category | Count | Meaning |
|---|---|---|
| match | 21 | Identical asset filenames at latest version |
| go-missing | 4 | Go produces no output (alias, meta-package, no config) |
| live-missing | 16 | Package exists in Go but not in live cache |
| go-extra-versions | 44 | Go has more version history (deeper fetch) |
| live-extra-versions | 11 | Live has newer data (rate-limited Go fetch) |
| go-extra-assets | 19 | Go includes assets that Node.js filters out |
| live-extra-assets | 41 | Node.js includes assets that Go filters out |
| live-has-meta | 41 | Node.js includes meta-assets (checksums, sigs) |

Changes since last comparison: variant tagging (bun, pwsh, ollama, git, node, lsd, fish, xcaddy) and legacy export filter (strips Variants-tagged assets and non-legacy formats like .deb, .rpm, .tar.bz2, .tar.zst, .7z) are now active. Match count dropped from 28→21 because .deb files that Node.js keeps are now correctly categorized as live-extra-assets (Go filters them from legacy output).

Key Observations

1. Classification Timing

The Node.js cache stores assets with empty os/arch/ext fields; normalize.js fills them in at serve time. The Go pipeline classifies at write time, so the Go cache carries richer per-asset data. Because the two caches populate these fields at different stages, the comparison is done at the filename level.
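A filename-level comparison can be sketched as a set difference. This is a minimal illustration, not the code in cmd/comparecache: the function name diffFilenames and the sample filenames are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// diffFilenames is a hypothetical helper. It returns the filenames found only
// in the Go cache and only in the live cache, each sorted alphabetically.
func diffFilenames(goNames, liveNames []string) (goOnly, liveOnly []string) {
	goSet := make(map[string]bool, len(goNames))
	for _, n := range goNames {
		goSet[n] = true
	}
	liveSet := make(map[string]bool, len(liveNames))
	for _, n := range liveNames {
		liveSet[n] = true
	}
	for n := range goSet {
		if !liveSet[n] {
			goOnly = append(goOnly, n)
		}
	}
	for n := range liveSet {
		if !goSet[n] {
			liveOnly = append(liveOnly, n)
		}
	}
	sort.Strings(goOnly)
	sort.Strings(liveOnly)
	return goOnly, liveOnly
}

func main() {
	// Illustrative filenames only.
	goOnly, liveOnly := diffFilenames(
		[]string{"tool-linux-amd64.tar.gz", "tool-linux-riscv64.tar.gz"},
		[]string{"tool-linux-amd64.tar.gz", "tool_amd64.deb"},
	)
	fmt.Println("go-extra-assets:", goOnly)
	fmt.Println("live-extra-assets:", liveOnly)
}
```

When both returned slices are empty for a package's latest version, the package would fall into the match category.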

2. Meta-Asset Filtering

Go's isMetaAsset() filters out checksums, signatures, SBOMs, etc.; Node.js keeps them. This accounts for the 41 packages showing live-has-meta differences. This is the correct behavior: Go filters non-installable files at cache time.
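The real isMetaAsset() is not reproduced here. As a rough sketch of the idea, a filter of this kind could match on filename suffixes and checksum-manifest names; the function name looksLikeMetaAsset and the suffix list are assumptions for illustration, not the project's actual rules.

```go
package main

import (
	"fmt"
	"strings"
)

// metaSuffixes is an illustrative list; the project's real list may differ.
var metaSuffixes = []string{".sha256", ".sha512", ".sig", ".asc", ".sbom.json"}

// looksLikeMetaAsset is a hypothetical stand-in for isMetaAsset. It reports
// whether a filename looks like a checksum, signature, or SBOM rather than
// something installable.
func looksLikeMetaAsset(name string) bool {
	lower := strings.ToLower(name)
	for _, suffix := range metaSuffixes {
		if strings.HasSuffix(lower, suffix) {
			return true
		}
	}
	// Checksum manifests such as "checksums.txt" or "SHA256SUMS".
	return strings.Contains(lower, "checksums") || strings.Contains(lower, "sha256sums")
}

func main() {
	names := []string{
		"tool-linux-amd64.tar.gz",
		"tool-linux-amd64.tar.gz.sha256",
		"checksums.txt",
	}
	for _, name := range names {
		fmt.Printf("%-32s meta=%v\n", name, looksLikeMetaAsset(name))
	}
}
```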

3. Version Depth

Go has deeper version history for most GitHub-sourced packages because it fetches all pages unless -shallow is set, whereas Node.js limits itself to 30 releases per API call. This is a feature: Go provides complete histories when doing a full fetch.
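The difference between a full and a shallow fetch can be sketched as a pagination loop. This is only an illustration under assumptions: the fetchPage callback type and fetchReleases function are hypothetical, and the fake fetcher stands in for real API calls.

```go
package main

import "fmt"

// fetchPage is a hypothetical callback returning the release tags on one
// 1-based page, or an empty slice once past the last page.
type fetchPage func(page int) ([]string, error)

// fetchReleases walks pages until one comes back empty. With shallow set it
// stops after the first page, which mirrors a single-call fetch.
func fetchReleases(fetch fetchPage, shallow bool) ([]string, error) {
	var all []string
	for page := 1; ; page++ {
		tags, err := fetch(page)
		if err != nil {
			return nil, err
		}
		if len(tags) == 0 {
			break
		}
		all = append(all, tags...)
		if shallow {
			break
		}
	}
	return all, nil
}

func main() {
	// Fake two-page release history used instead of a network call.
	pages := [][]string{{"v3.0.0", "v2.0.0"}, {"v1.0.0"}}
	fake := func(page int) ([]string, error) {
		if page > len(pages) {
			return nil, nil
		}
		return pages[page-1], nil
	}
	full, _ := fetchReleases(fake, false)
	shallow, _ := fetchReleases(fake, true)
	fmt.Println("full:", full)
	fmt.Println("shallow:", shallow)
}
```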

4. Build Variants (IMPLEMENTED)

Variant tagging is now active. Per-package taggers live in internal/releases/{pkg}/. Assets with Variants are stored but excluded from the legacy export.

  • bun: -profile → Variants: ["profile"]; non-baseline → Arch: amd64v3
  • ollama: -rocm, -jetpack5, -jetpack6 → Variants
  • pwsh: -fxdependent, -fxdependentWinDesktop → Variants
  • git: .exe and PortableGit → Variants: ["installer"]; -pdb → ["pdb"]
  • node: .msi → Variants: ["installer"]
  • lsd: .deb → ["deb"]; -msvc → ["msvc"]
  • fish: .pkg → Variants: ["installer"]
  • xcaddy: .deb → Variants: ["deb"]
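A per-package tagger along these lines might look like the following sketch for the ollama markers listed above. The Asset struct, the tagOllama name, the choice of variant strings (the marker without its leading dash), and the sample filenames are all assumptions; the actual taggers in internal/releases/{pkg}/ may be shaped differently.

```go
package main

import (
	"fmt"
	"strings"
)

// Asset is a simplified, hypothetical stand-in for the pipeline's asset type.
type Asset struct {
	Filename string
	Variants []string
}

// tagOllama appends a variant for each known marker found in the filename.
// Using the marker text as the variant name is an assumption.
func tagOllama(a *Asset) {
	for _, marker := range []string{"rocm", "jetpack5", "jetpack6"} {
		if strings.Contains(a.Filename, "-"+marker) {
			a.Variants = append(a.Variants, marker)
		}
	}
}

func main() {
	// Illustrative filenames only.
	assets := []Asset{
		{Filename: "ollama-linux-amd64.tgz"},
		{Filename: "ollama-linux-amd64-rocm.tgz"},
		{Filename: "ollama-linux-arm64-jetpack6.tgz"},
	}
	for i := range assets {
		tagOllama(&assets[i])
		fmt.Printf("%-34s variants=%v\n", assets[i].Filename, assets[i].Variants)
	}
}
```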

5. Legacy Export Filter (IMPLEMENTED)

ExportLegacy now strips:

  • Assets with non-empty Variants
  • Formats Node.js doesn't handle: .deb, .rpm, .snap, .appx, .tar.zst, .tar.bz2, .7z

This means the _cache/ JSON files only contain assets the Node.js server can actually serve.
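The two stripping rules can be sketched as a single predicate. This is not the ExportLegacy implementation: the Asset struct and the isLegacy and filterLegacy names are hypothetical, and only the extension list is taken from the rules above.

```go
package main

import (
	"fmt"
	"strings"
)

// Asset is a simplified, hypothetical stand-in for the pipeline's asset type.
type Asset struct {
	Filename string
	Variants []string
}

// nonLegacyExts lists the formats the legacy export strips.
var nonLegacyExts = []string{".deb", ".rpm", ".snap", ".appx", ".tar.zst", ".tar.bz2", ".7z"}

// isLegacy reports whether an asset survives both rules: it has no variants
// and its filename does not end in a non-legacy extension.
func isLegacy(a Asset) bool {
	if len(a.Variants) > 0 {
		return false
	}
	lower := strings.ToLower(a.Filename)
	for _, ext := range nonLegacyExts {
		if strings.HasSuffix(lower, ext) {
			return false
		}
	}
	return true
}

// filterLegacy returns only the assets that pass isLegacy.
func filterLegacy(assets []Asset) []Asset {
	var kept []Asset
	for _, a := range assets {
		if isLegacy(a) {
			kept = append(kept, a)
		}
	}
	return kept
}

func main() {
	// Illustrative filenames only.
	kept := filterLegacy([]Asset{
		{Filename: "tool-linux-amd64.tar.gz"},
		{Filename: "tool_amd64.deb"},
		{Filename: "tool-setup.exe", Variants: []string{"installer"}},
	})
	for _, a := range kept {
		fmt.Println(a.Filename)
	}
}
```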

6. Format Handling

All formats are stored in the internal Go model — nothing is dropped at classification time. The legacy filter applies only at export time.

  • .pkg: pkgutil --expand-full
  • .deb: ar x + tar xf data.tar.*
  • .dmg: hdiutil attach
  • .msi: msiexec /a

Only .exe is ambiguous (binary vs installer). Installer .exe files get Variants: ["installer"].
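As a sketch of how a format could be detected and paired with the unpack commands listed above, the following uses a longest-suffix match so that .tar.gz wins over .gz. The formatOf name, the knownExts list, and the sample filenames are assumptions for illustration and do not reflect the project's detectFormat.

```go
package main

import (
	"fmt"
	"strings"
)

// knownExts is an illustrative list of extensions, including compound ones.
var knownExts = []string{".tar.gz", ".tar.xz", ".gz", ".xz", ".zip", ".pkg", ".deb", ".dmg", ".msi", ".exe"}

// unpackHint maps the installer-style formats to the commands noted above.
var unpackHint = map[string]string{
	".pkg": "pkgutil --expand-full",
	".deb": "ar x + tar xf data.tar.*",
	".dmg": "hdiutil attach",
	".msi": "msiexec /a",
}

// formatOf returns the longest known extension that the filename ends with,
// or an empty string when none matches.
func formatOf(name string) string {
	lower := strings.ToLower(name)
	best := ""
	for _, ext := range knownExts {
		if strings.HasSuffix(lower, ext) && len(ext) > len(best) {
			best = ext
		}
	}
	return best
}

func main() {
	// Illustrative filenames only.
	for _, name := range []string{"tool-1.0-linux.tar.gz", "tool-1.0.pkg", "tool-1.0.msi"} {
		ext := formatOf(name)
		hint, ok := unpackHint[ext]
		if !ok {
			hint = "(plain archive)"
		}
		fmt.Printf("%-24s %-8s %s\n", name, ext, hint)
	}
}
```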

7. Node Multi-Source

The node package merges official + unofficial builds via unofficial_url in releases.conf. It is down to 4 differences at the latest version:

  • Live has .7z (filtered from Go legacy export) and .msi (Go tags as installer)
  • Go has .exe bare binary that live doesn't (naming diff)

Per-Package Checklist

Status: [x] reviewed, [-] known acceptable, [ ] needs work

Exact Matches at Latest Version (21)

  • atomicparsley
  • awless
  • chromedriver (chromedist)
  • comrak
  • dotenv-linter
  • gpg (gpgdist)
  • iterm2 (iterm2dist)
  • julia (juliadist)
  • koji
  • lsd — .deb and msvc variants now correctly filtered
  • mariadb (mariadbdist)
  • pathman
  • sass
  • sd
  • shellcheck
  • shfmt
  • sqlc
  • terraform (hashicorp)
  • xcaddy — .deb variants now correctly filtered
  • xsv
  • zig (zigdist)

Go Missing (4)

  • [-] dashd — alias_of=dashcore (correct)
  • [-] macos — no releases.conf
  • [-] pg-essentials — meta-package
  • [-] zig.vim — gittag source, 0 raw data

Live Missing — Go-Only (16)

  • [-] node-official — Go split, not in live cache
  • [-] node-unofficial — Go split, not in live cache
  • [-] pg — Go alias, live uses postgres
  • [-] ripgrep — Go alias, live uses rg
  • [-] rust.vim — symlink to vim-rust
  • [-] vim-* (11 packages) — gittag packages not in live cache

Meta-Only Diffs (Go correctly filters, Node.js keeps)

  • [-] caddy, cilium, cmake, curlie, dashmsg, deno, dotenv, ffuf, fzf, gh, gitdeploy, goreleaser, gprox, grype, hugo, k9s, keypairs, kind, kubectx, kubens, monorel, mutagen, ots, rclone, rg, runzip, sclient, sqlpkg, sttr, syncthing, terramate, watchexec, xz, yq (41 packages)

Live has .deb/.rpm that Go correctly filters from legacy export

  • [-] bat — 8 .deb files
  • [-] caddy — 9 .deb files
  • [-] delta — 5 .deb files
  • [-] fd — 8 .deb files
  • [-] gh — 8 .deb/.rpm files
  • [-] goreleaser — 18 .deb/.rpm files
  • [-] grype — 8 .deb/.rpm files
  • [-] hexyl — 7 .deb files
  • [-] k9s — 10 .deb/.rpm files
  • [-] pandoc — 2 .deb files
  • [-] pwsh — 4 .deb/.rpm files
  • [-] rclone — 16 .deb/.rpm files
  • [-] sttr — 9 .deb/.rpm files
  • [-] syncthing — 5 .deb/.rpm files
  • [-] terramate — 6 .deb/.rpm files
  • [-] tinygo — 3 .deb files
  • [-] trip — 3 .deb files
  • [-] watchexec — 16 .deb files
  • [-] zoxide — 4 .deb files

Remaining Go-Extra-Assets (need review)

  • bun — baseline builds now serve as legacy amd64 (filename stripped, download URL kept); non-baseline tagged as v3 variant (excluded).
  • fish — source tarball tagged as variant; linux .tar.xz binaries are correct (Node.js just doesn't have them yet)
  • git — 4 extras: MinGit-busybox .zip, pdbs .zip at latest version
  • [-] hugo — 1 extra: Linux-64bit.tar.gz (old naming); keep as-is for now
  • [-] hugo-extended — 14 extras: non-extended assets leaking in; keep as-is for now
  • kubectx — 14 extras: kubens assets from shared GitHub release
  • kubens — 14 extras: kubectx assets from shared release
  • node — 1 extra: .exe bare binary naming difference
  • ollama — 2 extras: Ollama-darwin.zip (case difference?)
  • uuidv7 — 16 extras: exotic arches (thumbeb, armeb, riscv32)
  • yq — 1 extra: naming difference
  • ffmpeg — 21 extras: many platform/format combinations

Source/Naming Diffs

  • [-] aliasman — source tarball naming differences (GitHub archive format)
  • [-] duckdns.sh — source tarball naming differences
  • [-] serviceman — source naming + version differences

Stale Data (rate-limited, need re-fetch with token)

  • [-] go — live has 98 extra versions (Go didn't fetch golang.org)
  • [-] lf — live has 30 extra versions
  • [-] postgres, psql — Go has v17, live has v18
  • [-] ffmpeg — Go has older, live has newer

Cross-Package Issues

  • kubectx/kubens — shared GitHub release, assets for both packages appear in each. Need to split by asset name prefix.
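Splitting a shared release by asset name prefix could be sketched as follows. The assetsForPackage name, the assumed "_" and "-" separators after the package name, and the sample filenames are illustrative assumptions, not the project's fix.

```go
package main

import (
	"fmt"
	"strings"
)

// assetsForPackage is a hypothetical helper that keeps only the assets whose
// filename starts with the package name followed by "_" or "-".
func assetsForPackage(pkg string, names []string) []string {
	var out []string
	for _, n := range names {
		if strings.HasPrefix(n, pkg+"_") || strings.HasPrefix(n, pkg+"-") {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	// Illustrative asset names from one shared release.
	shared := []string{
		"kubectx_linux_x86_64.tar.gz",
		"kubens_linux_x86_64.tar.gz",
		"kubectx_darwin_arm64.tar.gz",
		"kubens_darwin_arm64.tar.gz",
	}
	fmt.Println("kubectx:", assetsForPackage("kubectx", shared))
	fmt.Println("kubens: ", assetsForPackage("kubens", shared))
}
```

Requiring a separator after the package name keeps a package from claiming assets of another package whose name merely begins with the same letters.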

Remaining Action Items

  1. hugo-extended exclude: Deferred — keep matching Node.js behavior for now
  2. kubectx/kubens split: Filter assets by name prefix in shared release
  3. bun baseline in legacy: Resolved — baseline is legacy amd64, non-baseline tagged as v3 variant
  4. Re-fetch with GITHUB_TOKEN: Fix rate-limited/stale packages
  5. Unknown asset notifications: Log new/unrecognized assets to _notices/

Deferred Decisions

  1. Consolidate cmd/classify and cmd/webicached duplication: Both have their own classifyPackage switch, isMetaAsset, detectFormat, and GitHub API types (ghRelease, ghAsset, etc.). cmd/classify is a diagnostic tool (CSV output), cmd/webicached is the production pipeline ([]storage.Asset). Shared pieces could move to internal/ packages. Keep separate dispatchers since they return different types.