Replace direct string comparison with canonical equivalence checks so
naming convention differences (darwin/macos, x86_64/amd64, aarch64/arm64)
don't appear as false diffs. Now only real classification disagreements
surface:
- go: illumos/solaris→sunos mapping, arm ambiguity per OS
- sass: bare "arm" should be armv7, not armv6
- ffmpeg: Windows .gz ext classified as exe in prod
- terraform: alpha channel detected correctly by Go, missed by prod
- postgres: legacy EDB ext "tar" vs "tar.gz"
- pg/releases.conf: add asset_filter=postgres so pg only returns server
assets (which include the client), matching production releases.js
- classifypkg: add "pg" to postgres version normalizer switch case
- comparecache: compare os/arch/libc/ext/channel fields on shared assets,
distinguishing real disagreements (diff-*) from expected fill diffs
where Go classifies at write time but Node.js leaves fields empty
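A minimal sketch of the comparison described above, assuming a small
alias table and three outcomes per field. The names canonical, canon
and compareField are hypothetical and are not taken from comparecache;
which spelling counts as canonical is an arbitrary choice here.

```go
package main

import "fmt"

// canonical maps naming-convention aliases to one spelling.
// The alias pairs come from the notes above; picking the Go-style
// spelling as canonical is an assumption of this sketch.
var canonical = map[string]string{
	"macos":   "darwin",
	"x86_64":  "amd64",
	"aarch64": "arm64",
}

// canon returns the canonical spelling of a field value.
func canon(v string) string {
	if c, ok := canonical[v]; ok {
		return c
	}
	return v
}

// compareField reports "match", "fill" (only the Go side has a value),
// or "diff" (a real disagreement between the two caches).
func compareField(goVal, liveVal string) string {
	switch {
	case canon(goVal) == canon(liveVal):
		return "match"
	case liveVal == "" && goVal != "":
		return "fill"
	default:
		return "diff"
	}
}

func main() {
	fmt.Println(compareField("darwin", "macos")) // match
	fmt.Println(compareField("amd64", ""))       // fill
	fmt.Println(compareField("armv7", "armv6"))  // diff
}
```

Keeping "fill" separate from "diff" lets the report hide fields that
only one side populates while still surfacing true disagreements.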
-sample N now randomly samples N assets from each package's diff list,
giving a representative view of classification differences instead of
showing only the first alphabetical entries. Implies -windowed -diffs
to filter out version-depth noise and focus on real bugs.
Usage: go run ./cmd/comparecache -sample 8 -diffs
Picks 8 random packages beyond any explicitly named ones and logs which
ones were sampled for reproducibility.
Bun releases use tags like bun-v1.2.3. Without tag_prefix, the version
included the bun- prefix, causing mismatches. Also update comparecache
with a bun version normalizer for accurate comparison.
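For illustration, the effect of a tag prefix can be sketched as below.
versionFromTag is a hypothetical helper standing in for the tag_prefix
setting; how the real pipeline treats the leading "v" is not shown.

```go
package main

import (
	"fmt"
	"strings"
)

// versionFromTag removes a configured prefix from a release tag.
// With an empty prefix the tag is returned unchanged.
func versionFromTag(tag, tagPrefix string) string {
	return strings.TrimPrefix(tag, tagPrefix)
}

func main() {
	fmt.Println(versionFromTag("bun-v1.2.3", "bun-")) // v1.2.3
	fmt.Println(versionFromTag("bun-v1.2.3", ""))     // bun-v1.2.3
}
```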
Node.js pads Go versions like "1.10" to "1.10.0". Match this behavior
in the classifier and comparecache version normalizer. Also filter out
the malformed "-arm6." arch and ".src." source tarballs as comparison
noise.
Match count: 73/106
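The Go version padding described above can be sketched as follows.
padVersion is a hypothetical name, and leaving non-numeric versions
(for example release candidates) untouched is an assumption of this
sketch rather than confirmed behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// isDigits reports whether s is a non-empty run of ASCII digits.
func isDigits(s string) bool {
	if s == "" {
		return false
	}
	for _, r := range s {
		if r < '0' || r > '9' {
			return false
		}
	}
	return true
}

// padVersion appends ".0" until a purely numeric dotted version has
// three components. Anything else is returned unchanged.
func padVersion(v string) string {
	parts := strings.Split(v, ".")
	for _, p := range parts {
		if !isDigits(p) {
			return v
		}
	}
	for len(parts) < 3 {
		parts = append(parts, "0")
	}
	return strings.Join(parts, ".")
}

func main() {
	fmt.Println(padVersion("1.10"))   // 1.10.0
	fmt.Println(padVersion("1.21.5")) // 1.21.5
}
```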
- Add .app.zip to legacyFormats so macOS fish builds export correctly
- Exclude bundledpcre, fish-static, OpenBeta from fish/releases.conf
- Add fish Linux binaries to comparecache noise (Go improvement)
Match count: 72/106
Git for Windows uses tags like v2.53.0.windows.1. Node.js strips
".windows.1" and replaces ".windows.N" (N>1) with ".N".
Add NormalizeVersions to the git package and wire it into the classify
pipeline. Also add version normalization to comparecache so the
comparison uses canonical versions for both caches.
Remaining git diffs: data freshness (.windows.2 releases Go hasn't
fetched) and RC versions in Go that the live cache doesn't have.
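A sketch of the ".windows.N" rule stated above. normalizeGitVersion is
a hypothetical single-version helper, not the NormalizeVersions
function added to the git package.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// windowsSuffix matches a trailing ".windows.N" build marker.
var windowsSuffix = regexp.MustCompile(`\.windows\.(\d+)$`)

// normalizeGitVersion drops ".windows.1" and rewrites ".windows.N"
// (N > 1) to ".N". Versions without the marker are returned as-is.
func normalizeGitVersion(v string) string {
	m := windowsSuffix.FindStringSubmatch(v)
	if m == nil {
		return v
	}
	base := strings.TrimSuffix(v, m[0])
	if m[1] == "1" {
		return base
	}
	return base + "." + m[1]
}

func main() {
	fmt.Println(normalizeGitVersion("v2.53.0.windows.1")) // v2.53.0
	fmt.Println(normalizeGitVersion("v2.53.0.windows.2")) // v2.53.0.2
}
```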
rocm and jetpack variants are tagged by Go's variant system but kept
by Node.js with special arch names. Filter them out as comparison noise
to avoid false positives.
Match count: 70/106
- Hugo: exclude Linux-64bit legacy filename alias
- Hugo-extended: exclude Linux-64bit legacy filename alias
- Gitea: exclude -src- and -docs- tarballs
- Pathman: exclude armv8 legacy alias
- UUID v7: exclude exotic architectures (thumb, armeb, loong, gnux32, risc)
- comparecache: filter bare executables and docs tarballs as noise,
apply noise filter to both live and Go sides
- legacy.go: add .tar.bz2 to legacyFormats
Match count: 69/106 (up from 58)
Moved isMetaAsset from cmd/webicached to classify.IsMetaAsset so
both webicached and comparecache use the same logic. Removed
duplicated isMetaFile from comparecache. The comparecache
isLiveNoise now delegates to classify.IsMetaAsset and adds
live-specific filters (.deb, .rpm, -src-).
Strips known noise from the live cache before comparison: .deb, .rpm,
.asc, .sig, .gpg, .sbom, .sha256, checksums, install.sh, install.ps1,
.txt, and other non-installable files. Matches went from 16 to 50.
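The noise filter might be sketched like this. isNoise and its tables
are hypothetical and cover only the patterns listed above; matching
"checksums" as a substring is an assumption, and the real
classify.IsMetaAsset may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// noiseSuffixes lists file endings that are never installable assets.
var noiseSuffixes = []string{
	".deb", ".rpm", ".asc", ".sig", ".gpg", ".sbom", ".sha256", ".txt",
}

// noiseNames lists whole filenames that are treated as noise.
var noiseNames = map[string]bool{
	"install.sh":  true,
	"install.ps1": true,
}

// isNoise reports whether a release filename should be dropped
// before the two caches are compared.
func isNoise(name string) bool {
	lower := strings.ToLower(name)
	if noiseNames[lower] || strings.Contains(lower, "checksums") {
		return true
	}
	for _, s := range noiseSuffixes {
		if strings.HasSuffix(lower, s) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isNoise("app_1.0_amd64.deb")) // true
	fmt.Println(isNoise("install.sh"))        // true
	fmt.Println(isNoise("bun-linux-x64.zip")) // false
}
```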
Uses the Node.js version range (2nd to 2nd-to-last) as the window.
All Node.js versions in the window are included so missing Go
versions/assets are visible. Go-only versions are hidden since
those are just deeper fetch history, not real gaps.
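One way to picture the window, as a sketch only: window and missingInGo
are hypothetical names, the input is assumed to be already sorted, and
returning short lists unchanged is an assumption for the edge case.

```go
package main

import "fmt"

// window returns the live versions from the 2nd through the
// 2nd-to-last. Lists with fewer than three versions are returned
// unchanged (an assumption for this sketch).
func window(live []string) []string {
	if len(live) < 3 {
		return live
	}
	return live[1 : len(live)-1]
}

// missingInGo lists windowed live versions that the Go cache lacks.
// Go-only versions never appear because only live versions are walked.
func missingInGo(live, goVersions []string) []string {
	have := make(map[string]bool, len(goVersions))
	for _, v := range goVersions {
		have[v] = true
	}
	var missing []string
	for _, v := range window(live) {
		if !have[v] {
			missing = append(missing, v)
		}
	}
	return missing
}

func main() {
	live := []string{"1.0", "1.1", "1.2", "1.3"}
	goVersions := []string{"0.9", "1.1"}
	fmt.Println(missingInGo(live, goVersions)) // [1.2]
}
```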
- comparecache: use lexver.Compare for version sorting instead of
lexicographic sort (v9.9.0 was incorrectly ranked above v25.8.0)
- webicached/expandNodeFile: add riscv64, loong64 arch mappings and
7z format support for unofficial Node.js builds
- COMPARISON.md: rewrite with version-level review findings including
format filtering gaps (.pkg/.msi/.deb/.dmg), build variant design
(Extra field for rocm/jetpack/fxdependent), and node multi-source issue
Node.js cache entries from custom sources (flutter, go, terraform, etc.)
use _filename (a path) instead of name. Add effectiveName() that falls
back to _filename basename, then download URL basename.
Eliminates phantom "empty name" diffs. Matches went from 8 to 12.
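The fallback order can be sketched as below. The entry struct and its
field names are assumptions made for this example; only the order
(name, then _filename basename, then download URL basename) comes from
the description above.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
)

// entry is a hypothetical, trimmed-down cache entry.
type entry struct {
	Name     string // "name" in the cache
	Filename string // "_filename", a path, used by custom sources
	Download string // download URL
}

// effectiveName returns the first usable filename for an entry.
func effectiveName(e entry) string {
	if e.Name != "" {
		return e.Name
	}
	if e.Filename != "" {
		return path.Base(e.Filename)
	}
	if e.Download != "" {
		if u, err := url.Parse(e.Download); err == nil && u.Path != "" {
			return path.Base(u.Path)
		}
	}
	return ""
}

func main() {
	fmt.Println(effectiveName(entry{Name: "tool.zip"}))
	fmt.Println(effectiveName(entry{Filename: "dl/tool-linux.tar.gz"}))
	fmt.Println(effectiveName(entry{Download: "https://example.com/a/tool.zip?x=1"}))
}
```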
- cmd/comparecache: compares Go cache vs Node.js LIVE_cache at filename
level, categorizes differences (meta-filtering, version depth, source
tarballs, unsupported sources, real asset differences)
- COMPARISON.md: per-package checklist with 91 live packages categorized
- webicached: add -no-fetch flag to classify from existing raw data only
- GO_WEBI.md: update Phase 1 checkboxes for completed items