vim-ale/COMPARISON.md
AJ ONeal cdec995183 ref(node): remove node-official/node-unofficial split packages
The node package already merges both sources via unofficial_url
in releases.conf. The split packages were a workaround that
produced cache files not present in the live Node.js cache.
2026-03-10 16:58:26 -06:00


# Go vs Node.js Cache Comparison
Systematic comparison of Go pipeline output (`_cache/`) vs Node.js production
cache (`LIVE_cache/`). Generated by `cmd/comparecache`.
Latest run: 2026-03-10, a `-no-fetch` rebuild from existing raw data with
variant tagging and the legacy export filter applied.
## Summary (latest version only)
| Category | Count | Meaning |
|----------|-------|---------|
| match | 21 | Identical asset filenames at latest version |
| go-missing | 4 | Go produces no output (alias, meta-package, no config) |
| live-missing | 16 | Package exists in Go but not in live cache |
| go-extra-versions | 44 | Go has more version history (deeper fetch) |
| live-extra-versions | 11 | Live has newer data (rate-limited Go fetch) |
| go-extra-assets | 19 | Go includes assets that Node.js filters out |
| live-extra-assets | 41 | Node.js includes assets that Go filters out |
| live-has-meta | 41 | Node.js includes meta-assets (checksums, sigs) |

Changes since the last comparison: variant tagging (bun, pwsh, ollama, git, node,
lsd, fish, xcaddy) and the legacy export filter (which strips Variants-tagged assets and
non-legacy formats such as `.deb`, `.rpm`, `.tar.bz2`, `.tar.zst`, `.7z`) are now active.
The match count dropped from 28 to 21 because `.deb` files that Node.js keeps are now
correctly categorized as live-extra-assets (Go filters them from legacy output).
## Key Observations
### 1. Classification Timing
The Node.js cache stores assets with **empty** os/arch/ext fields; `normalize.js`
fills them in at serve time. The Go pipeline classifies at write time, so the Go cache
carries richer per-asset data. Because the two caches don't hold comparable
classification fields, the comparison is done at the **filename level**.
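
A filename-level comparison reduces to a set difference per package and version. The sketch below assumes both caches have already been reduced to plain filename lists for the same version; `diffFilenames` is a hypothetical helper for illustration, not the code in `cmd/comparecache`.

```go
package main

import (
	"fmt"
	"sort"
)

// diffFilenames compares the Go and live filename lists for one package at
// one version and returns the names that appear only on each side.
// Hypothetical helper: it illustrates the idea, not cmd/comparecache itself.
func diffFilenames(goNames, liveNames []string) (goOnly, liveOnly []string) {
	inGo := make(map[string]bool, len(goNames))
	for _, n := range goNames {
		inGo[n] = true
	}
	inLive := make(map[string]bool, len(liveNames))
	for _, n := range liveNames {
		inLive[n] = true
		if !inGo[n] {
			liveOnly = append(liveOnly, n)
		}
	}
	for _, n := range goNames {
		if !inLive[n] {
			goOnly = append(goOnly, n)
		}
	}
	// Sort so reports are stable across runs.
	sort.Strings(goOnly)
	sort.Strings(liveOnly)
	return goOnly, liveOnly
}

func main() {
	goOnly, liveOnly := diffFilenames(
		[]string{"tool-linux-amd64.tar.gz", "tool-darwin-arm64.tar.gz"},
		[]string{"tool-linux-amd64.tar.gz", "tool_amd64.deb"},
	)
	fmt.Println("go-only:", goOnly)
	fmt.Println("live-only:", liveOnly)
}
```

Two empty results correspond to a "match"; a non-empty `goOnly` or `liveOnly` corresponds to the go-extra-assets or live-extra-assets categories in the summary table.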
### 2. Meta-Asset Filtering
Go's `isMetaAsset()` filters out checksums, signatures, SBOMs, etc. Node.js
keeps them. This accounts for 41 packages showing `live-has-meta` differences.
**Correct behavior** — Go filters non-installable files at cache time.
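
A suffix-based check is one plausible shape for this kind of filter. The pattern list below is an assumption chosen for illustration; the real `isMetaAsset()` may match a different or larger set (the checklist notes it also catches `.LICENSE`/`.README` files for ffmpeg).

```go
package main

import (
	"fmt"
	"strings"
)

// metaSuffixes is a hypothetical, illustrative pattern list; the pipeline's
// actual isMetaAsset() may use different rules.
var metaSuffixes = []string{
	".sha256", ".sha256sum", ".sha512", ".md5",
	".sig", ".asc", ".pem", ".minisig",
	".sbom", ".spdx", ".spdx.json", ".cdx.json",
	".intoto.jsonl",
}

// isMetaAsset reports whether a release asset looks like a checksum list,
// signature, or SBOM rather than an installable file.
func isMetaAsset(name string) bool {
	lower := strings.ToLower(name)
	if strings.Contains(lower, "checksums") {
		return true
	}
	for _, suffix := range metaSuffixes {
		if strings.HasSuffix(lower, suffix) {
			return true
		}
	}
	return false
}

func main() {
	for _, n := range []string{
		"fzf-0.50.0-linux_amd64.tar.gz",
		"fzf_0.50.0_checksums.txt",
		"rg.tar.gz.sha256",
	} {
		fmt.Println(n, isMetaAsset(n))
	}
}
```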
### 3. Version Depth
Go has deeper version history for most GitHub-sourced packages because it fetches
all pages unless `-shallow` is set. Node.js fetches only 30 releases per API call
(the GitHub API's default page size). This is a **feature**: Go provides complete
histories when doing a full fetch.
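
The difference comes down to whether the fetcher keeps paging. The GitHub REST API's list-releases endpoint accepts `per_page` and `page` query parameters; the sketch below abstracts one such call behind a hypothetical `fetchPage` function and assumes `-shallow` means "stop after the first page", which this document does not spell out.

```go
package main

import "fmt"

// release is a hypothetical, trimmed-down GitHub release record.
type release struct {
	TagName string
}

// fetchPage stands in for one API call such as
// GET /repos/{owner}/{repo}/releases?per_page=N&page=P.
type fetchPage func(page int) ([]release, error)

// fetchAll requests pages until one comes back empty. With shallow set it
// stops after the first page (an assumption about -shallow's behavior).
func fetchAll(fetch fetchPage, shallow bool) ([]release, error) {
	var all []release
	for page := 1; ; page++ {
		batch, err := fetch(page)
		if err != nil {
			return nil, err
		}
		all = append(all, batch...)
		if shallow || len(batch) == 0 {
			return all, nil
		}
	}
}

func main() {
	// Fake source with 75 releases served 30 per page.
	const total, perPage = 75, 30
	fake := func(page int) ([]release, error) {
		var out []release
		for i := (page - 1) * perPage; i < page*perPage && i < total; i++ {
			out = append(out, release{TagName: fmt.Sprintf("v1.0.%d", i)})
		}
		return out, nil
	}
	full, _ := fetchAll(fake, false)
	firstPage, _ := fetchAll(fake, true)
	fmt.Println(len(full), len(firstPage))
}
```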
### 4. Build Variants (IMPLEMENTED)
Variant tagging is now active. Per-package taggers live in `internal/releases/{pkg}/`.
Assets with Variants are stored but excluded from the legacy export.
- **bun**: `-profile` → Variants: ["profile"]; non-baseline → Arch: amd64v3
- **ollama**: `-rocm`, `-jetpack5`, `-jetpack6` → Variants
- **pwsh**: `-fxdependent`, `-fxdependentWinDesktop` → Variants
- **git**: `.exe` and PortableGit → Variants: ["installer"]; `-pdb` → ["pdb"]
- **node**: `.msi` → Variants: ["installer"]
- **lsd**: `.deb` → Variants: ["deb"]; `-msvc` → ["msvc"]
- **fish**: `.pkg` → Variants: ["installer"]
- **xcaddy**: `.deb` → Variants: ["deb"]
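
As an illustration of what one per-package tagger might look like, here is a sketch of the bun rules. The `asset` struct is a hypothetical stand-in for `storage.Asset` (its field names are assumptions), and the sketch assumes the classifier has already set `Arch` to `"amd64"` for x64 builds before the tagger runs.

```go
package main

import (
	"fmt"
	"strings"
)

// asset is a hypothetical stand-in for storage.Asset.
type asset struct {
	Filename string
	OS       string
	Arch     string
	Variants []string
}

// tagBun applies the bun rules: "-profile" builds get a "profile" variant,
// and amd64 builds without "-baseline" are re-labeled amd64v3.
func tagBun(a asset) asset {
	name := strings.ToLower(a.Filename)
	if strings.Contains(name, "-profile") {
		// Copy before appending so the caller's slice is never modified.
		a.Variants = append(append([]string(nil), a.Variants...), "profile")
	}
	if a.Arch == "amd64" && !strings.Contains(name, "-baseline") {
		a.Arch = "amd64v3"
	}
	return a
}

func main() {
	for _, name := range []string{
		"bun-linux-x64.zip",
		"bun-linux-x64-baseline.zip",
		"bun-linux-x64-profile.zip",
	} {
		a := tagBun(asset{Filename: name, OS: "linux", Arch: "amd64"})
		fmt.Printf("%s arch=%s variants=%v\n", a.Filename, a.Arch, a.Variants)
	}
}
```

Keeping each tagger as a pure function from asset to asset makes the rules easy to test in isolation, whatever the real package layout looks like.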
### 5. Legacy Export Filter (IMPLEMENTED)
`ExportLegacy` now strips:
- Assets with non-empty `Variants`
- Formats Node.js doesn't handle: `.deb`, `.rpm`, `.snap`, `.appx`, `.tar.zst`,
`.tar.bz2`, `.7z`

As a result, the `_cache/` JSON files contain only assets the Node.js server
can actually serve.
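
The two export rules can be expressed as a single predicate over each asset. This is a minimal sketch assuming a simplified `asset` type; `filterLegacy` and `hasExcludedExt` are hypothetical names, not the actual `ExportLegacy` implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// asset is a hypothetical, simplified stand-in for storage.Asset.
type asset struct {
	Filename string
	Variants []string
}

// legacyExcludedExts lists the formats the legacy export strips.
var legacyExcludedExts = []string{".deb", ".rpm", ".snap", ".appx", ".tar.zst", ".tar.bz2", ".7z"}

// hasExcludedExt reports whether the filename ends in a non-legacy format.
func hasExcludedExt(name string) bool {
	lower := strings.ToLower(name)
	for _, ext := range legacyExcludedExts {
		if strings.HasSuffix(lower, ext) {
			return true
		}
	}
	return false
}

// filterLegacy drops any asset that carries a variant tag or uses a format
// the Node.js server can't serve; everything else passes through unchanged.
func filterLegacy(assets []asset) []asset {
	var kept []asset
	for _, a := range assets {
		if len(a.Variants) > 0 || hasExcludedExt(a.Filename) {
			continue
		}
		kept = append(kept, a)
	}
	return kept
}

func main() {
	out := filterLegacy([]asset{
		{Filename: "bat-v0.24.0-x86_64-unknown-linux-gnu.tar.gz"},
		{Filename: "bat_0.24.0_amd64.deb"},
		{Filename: "lsd-v1.0.0-x86_64-pc-windows-msvc.zip", Variants: []string{"msvc"}},
	})
	for _, a := range out {
		fmt.Println(a.Filename)
	}
}
```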
### 6. Format Handling
All formats are stored in the internal Go model; nothing is dropped at
classification time. The legacy filter applies only at export time. Each packaged
format has a known extraction method:
- `.pkg` → `pkgutil --expand-full`
- `.deb` → `ar x` + `tar xf data.tar.*`
- `.dmg` → `hdiutil attach`
- `.msi` → `msiexec /a`

Only `.exe` is ambiguous (bare binary or installer). Installer `.exe` files get
`Variants: ["installer"]`.
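
The format-to-extraction mapping above can be kept as a lookup table, with `.exe` handled as the one case that needs a per-package decision. The map and `extractionFor` helper are hypothetical; the pipeline's `detectFormat` may be structured differently.

```go
package main

import (
	"fmt"
	"strings"
)

// extractCmd maps packaged formats to the extraction commands listed above.
var extractCmd = map[string]string{
	".pkg": "pkgutil --expand-full",
	".deb": "ar x + tar xf data.tar.*",
	".dmg": "hdiutil attach",
	".msi": "msiexec /a",
}

// extractionFor returns the extraction command for a filename and whether
// the format is ambiguous (".exe" may be a bare binary or an installer,
// so a per-package tagger has to decide).
func extractionFor(name string) (cmd string, ambiguous bool) {
	lower := strings.ToLower(name)
	if strings.HasSuffix(lower, ".exe") {
		return "", true
	}
	// The suffixes are disjoint, so map iteration order doesn't matter.
	for ext, c := range extractCmd {
		if strings.HasSuffix(lower, ext) {
			return c, false
		}
	}
	return "", false
}

func main() {
	for _, n := range []string{"fish-3.7.1.pkg", "node-v20.11.0-x64.msi", "Git-2.44.0-64-bit.exe"} {
		cmd, ambiguous := extractionFor(n)
		fmt.Printf("%s: cmd=%q ambiguous=%v\n", n, cmd, ambiguous)
	}
}
```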
### 7. Node Multi-Source
The `node` package merges official + unofficial builds via `unofficial_url`
in releases.conf. At the latest version, only 4 differences remain:
- Live has `.7z` (filtered from the Go legacy export) and `.msi` (which Go tags as an installer)
- Go has a bare `.exe` binary that live lacks (naming difference)
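
Merging two release sources amounts to a union keyed on version and filename. This sketch assumes official builds take precedence when both sources list the same file; that precedence rule and the `build` type are illustrative assumptions, not a description of how `unofficial_url` is processed.

```go
package main

import "fmt"

// build is a hypothetical record for one downloadable file of a Node.js version.
type build struct {
	Version  string
	Filename string
	Source   string // "official" or "unofficial"
}

// mergeSources returns official builds first, then any unofficial builds
// whose (version, filename) pair was not already present.
func mergeSources(official, unofficial []build) []build {
	type key struct{ version, filename string }
	seen := make(map[key]bool)
	var merged []build
	for _, list := range [][]build{official, unofficial} {
		for _, b := range list {
			k := key{b.Version, b.Filename}
			if seen[k] {
				continue // already provided by an earlier source
			}
			seen[k] = true
			merged = append(merged, b)
		}
	}
	return merged
}

func main() {
	merged := mergeSources(
		[]build{{"v20.11.0", "node-v20.11.0-linux-x64.tar.gz", "official"}},
		[]build{
			{"v20.11.0", "node-v20.11.0-linux-x64.tar.gz", "unofficial"},
			{"v20.11.0", "node-v20.11.0-linux-x64-musl.tar.gz", "unofficial"},
		},
	)
	for _, b := range merged {
		fmt.Println(b.Source, b.Filename)
	}
}
```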
## Per-Package Checklist
Status: `[x]` reviewed, `[-]` known acceptable, `[ ]` needs work
### Exact Matches at Latest Version (21)
- [x] atomicparsley
- [x] awless
- [x] chromedriver (chromedist)
- [x] comrak
- [x] dotenv-linter
- [x] gpg (gpgdist)
- [x] iterm2 (iterm2dist)
- [x] julia (juliadist)
- [x] koji
- [x] lsd — .deb and msvc variants now correctly filtered
- [x] mariadb (mariadbdist)
- [x] pathman
- [x] sass
- [x] sd
- [x] shellcheck
- [x] shfmt
- [x] sqlc
- [x] terraform (hashicorp)
- [x] xcaddy — .deb variants now correctly filtered
- [x] xsv
- [x] zig (zigdist)
### Go Missing (4)
- [-] dashd — alias_of=dashcore (correct)
- [-] macos — no releases.conf
- [-] pg-essentials — meta-package
- [-] zig.vim — gittag source, 0 raw data
### Live Missing — Go-Only (14)
- [-] pg — Go alias, live uses postgres
- [-] ripgrep — Go alias, live uses rg
- [-] rust.vim — symlink to vim-rust
- [-] vim-* (11 packages) — gittag packages not in live cache
### Meta-Only Diffs (Go correctly filters, Node.js keeps)
- [-] caddy, cilium, cmake, curlie, dashmsg, deno, dotenv, ffuf, fzf, gh,
gitdeploy, goreleaser, gprox, grype, hugo, k9s, keypairs, kind, kubectx,
kubens, monorel, mutagen, ots, rclone, rg, runzip, sclient, sqlpkg, sttr,
syncthing, terramate, watchexec, xz, yq (41 packages)
### Live has .deb/.rpm that Go correctly filters from legacy export
- [-] bat — 8 .deb files
- [-] caddy — 9 .deb files
- [-] delta — 5 .deb files
- [-] fd — 8 .deb files
- [-] gh — 8 .deb/.rpm files
- [-] goreleaser — 18 .deb/.rpm files
- [-] grype — 8 .deb/.rpm files
- [-] hexyl — 7 .deb files
- [-] k9s — 10 .deb/.rpm files
- [-] pandoc — 2 .deb files
- [-] pwsh — 4 .deb/.rpm files
- [-] rclone — 16 .deb/.rpm files
- [-] sttr — 9 .deb/.rpm files
- [-] syncthing — 5 .deb/.rpm files
- [-] terramate — 6 .deb/.rpm files
- [-] tinygo — 3 .deb files
- [-] trip — 3 .deb files
- [-] watchexec — 16 .deb files
- [-] zoxide — 4 .deb files
### Remaining Go-Extra-Assets (need review)
- [x] bun — baseline builds now serve as legacy amd64 (filename stripped,
download URL kept); non-baseline tagged as v3 variant (excluded).
- [x] fish — source tarball tagged as variant; linux .tar.xz binaries are
correct (Node.js just doesn't have them yet)
- [x] git — busybox and pdbs-for-git tagged as variants
- [-] hugo — 1 extra: `Linux-64bit.tar.gz` (old naming); keep as-is for now
- [-] hugo-extended — 14 extras: non-extended assets leaking in; keep as-is for now
- [x] kubectx — asset_filter splits shared release
- [x] kubens — asset_filter splits shared release
- [x] node — .exe bare binary stored with "bare-exe" variant (Go can serve,
legacy excludes); .msi tagged as installer
- [x] ollama — Ollama-darwin.zip tagged as "app" variant (Go can install,
legacy excludes); .tgz normalized to .tar.gz in filename
- [-] uuidv7 — exotic arches correctly classified; resolver filters by request
- [x] yq — man_page_only excluded via releases.conf
- [x] ffmpeg — asset_filter=ffmpeg excludes ffprobe/ffplay; .LICENSE/.README
now caught by isMetaAsset
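
Several of these fixes rely on `asset_filter` to split a shared release (kubectx/kubens) or drop sibling tools (ffprobe/ffplay). The `releases.conf` syntax and exact matching rule aren't shown in this document; the sketch assumes a case-insensitive substring match on the filename, and `applyAssetFilter` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// applyAssetFilter keeps only filenames that contain filter, ignoring case.
// An empty filter keeps every asset. The substring rule is an assumption.
func applyAssetFilter(names []string, filter string) []string {
	if filter == "" {
		return names
	}
	want := strings.ToLower(filter)
	var kept []string
	for _, n := range names {
		if strings.Contains(strings.ToLower(n), want) {
			kept = append(kept, n)
		}
	}
	return kept
}

func main() {
	shared := []string{
		"kubectx_v0.9.5_linux_x86_64.tar.gz",
		"kubens_v0.9.5_linux_x86_64.tar.gz",
	}
	fmt.Println(applyAssetFilter(shared, "kubectx"))
	fmt.Println(applyAssetFilter(shared, "kubens"))
}
```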
### Source/Naming Diffs
- [-] aliasman — source tarball naming differences (GitHub archive format)
- [-] duckdns.sh — source tarball naming differences
- [-] serviceman — source naming + version differences
### Stale Data (rate-limited, need re-fetch with token)
- [-] go — live has 98 extra versions (Go didn't fetch golang.org)
- [-] lf — live has 30 extra versions
- [-] postgres, psql — Go has v17, live has v18
- [-] ffmpeg — Go has older, live has newer
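
For the GitHub-sourced packages here, re-fetching with a token mainly means sending an `Authorization` header, which raises GitHub's REST rate limit from 60 to 5,000 requests per hour. A minimal sketch using only `net/http`; the helper name and the choice of `per_page=100` are assumptions, not the pipeline's actual fetch code.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
)

// newReleasesRequest builds a GitHub "list releases" request and attaches
// the token as a Bearer credential when one is provided.
func newReleasesRequest(owner, repo string, page int, token string) (*http.Request, error) {
	url := fmt.Sprintf("https://api.github.com/repos/%s/%s/releases?per_page=100&page=%d", owner, repo, page)
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Accept", "application/vnd.github+json")
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	return req, nil
}

func main() {
	req, err := newReleasesRequest("gokcehan", "lf", 1, os.Getenv("GITHUB_TOKEN"))
	if err != nil {
		panic(err)
	}
	fmt.Println(req.URL.String())
}
```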
### Cross-Package Issues
- [x] kubectx/kubens — resolved via asset_filter in releases.conf
## Remaining Action Items
1. ~~**hugo-extended exclude**~~: Deferred — keep matching Node.js behavior for now
2. ~~**kubectx/kubens split**~~: Resolved — asset_filter in releases.conf
3. ~~**bun baseline in legacy**~~: Resolved — baseline is legacy amd64,
non-baseline tagged as v3 variant
4. **Re-fetch with GITHUB_TOKEN**: Fix rate-limited/stale packages
5. **Unknown asset notifications**: Log new/unrecognized assets to `_notices/`
## Deferred Decisions
1. **Consolidate cmd/classify and cmd/webicached duplication**: Both have their
own `classifyPackage` switch, `isMetaAsset`, `detectFormat`, and GitHub API
types (`ghRelease`, `ghAsset`, etc.). `cmd/classify` is a diagnostic tool
(CSV output), `cmd/webicached` is the production pipeline (`[]storage.Asset`).
Shared pieces could move to `internal/` packages. Keep separate dispatchers
since they return different types.