1365 Commits

Author SHA1 Message Date
AJ ONeal
05abb1ffd2 fix(git): normalize .windows.N version suffix
Git for Windows uses tags like v2.53.0.windows.1. Node.js strips
".windows.1" and replaces ".windows.N" (N>1) with ".N".

Add NormalizeVersions to the git package and wire it into the classify
pipeline. Also add version normalization to comparecache so the
comparison uses canonical versions for both caches.

Remaining git diffs: data freshness (.windows.2 releases that Go hasn't
fetched yet) and RC versions that Go has but the live cache doesn't.
2026-03-10 18:26:41 -06:00
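The suffix rule in this commit can be sketched as follows. This is a minimal illustration, not the actual NormalizeVersions code: the function name `normalizeVersion` and the regular expression are assumptions, and only the two behaviors stated in the message are covered (".windows.1" is dropped, ".windows.N" with N>1 becomes ".N").

```go
package main

import (
	"fmt"
	"regexp"
)

// windowsSuffix matches a trailing ".windows.N" on a Git for Windows version.
var windowsSuffix = regexp.MustCompile(`\.windows\.(\d+)$`)

// normalizeVersion is a hypothetical helper; the real NormalizeVersions
// in the git package may differ in name, signature, and edge cases.
func normalizeVersion(v string) string {
	m := windowsSuffix.FindStringSubmatch(v)
	if m == nil {
		return v // no .windows.N suffix, leave untouched
	}
	base := v[:len(v)-len(m[0])]
	if m[1] == "1" {
		return base // ".windows.1" is stripped entirely
	}
	return base + "." + m[1] // ".windows.N" becomes ".N"
}

func main() {
	for _, v := range []string{"2.53.0.windows.1", "2.53.0.windows.2", "2.53.0"} {
		fmt.Println(v, "->", normalizeVersion(v))
	}
}
```

Running the same function over both caches before comparing is what lets the comparison see canonical versions on each side.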
AJ ONeal
ada10ed43a fix(comparecache): filter GPU variant assets as known noise
rocm and jetpack variants are tagged by Go's variant system but kept
by Node.js with special arch names. Filter them from comparison noise
to avoid false positives.

Match count: 70/106
2026-03-10 18:22:37 -06:00
AJ ONeal
e1bfad0bb8 fix(git): filter to MinGit assets only, exclude busybox
Node.js releases.js only keeps MinGit assets and excludes busybox.
Add asset_filter and exclude to releases.conf to match.

Remaining diff: version normalization (.windows.N suffix stripping)
and data freshness (Go missing .windows.2 releases).
2026-03-10 18:21:19 -06:00
AJ ONeal
8f9cf8e487 fix: exclude known noise from cache comparison and configs
- Hugo: exclude Linux-64bit legacy filename alias
- Hugo-extended: exclude Linux-64bit legacy filename alias
- Gitea: exclude -src- and -docs- tarballs
- Pathman: exclude armv8 legacy alias
- UUID v7: exclude exotic architectures (thumb, armeb, loong, gnux32, risc)
- comparecache: filter bare executables and docs tarballs as noise,
  apply noise filter to both live and Go sides
- legacy.go: add .tar.bz2 to legacyFormats

Match count: 69/106 (up from 58)
2026-03-10 18:18:38 -06:00
AJ ONeal
2ebecb644e feat(gitea): add gogit variant tagger
Tag assets with "-gogit-" in the filename as the "gogit" variant.
These use a pure-Go Git backend instead of the default C Git library.
2026-03-10 18:08:19 -06:00
AJ ONeal
86e3d8f969 ref: extract classification pipeline into internal/classifypkg
Move all source-specific classifiers, variant tagging, config filtering,
and readAllRaw out of cmd/webicached into internal/classifypkg. The new
Package() function runs the full classify pipeline: source dispatch →
tag variants → apply config.

webicached now only handles fetching raw data and writing to fsstore.
The classification logic is reusable by comparecache and future tools.
2026-03-10 18:06:02 -06:00
AJ ONeal
c1b81157dc fix(gittag): produce correct filenames, versions, and format for git assets
- gittag classifier: use "{repo}-{tag}" filenames (matching Node.js),
  strip "v" prefix from version, synthesize date-based version for
  tagless repos (HEAD of master/main)
- GitHub source-only: use "git" format (no dot) and "{repo}-{tag}"
  filename for clone assets
- Legacy export: add "git" to recognized formats so gittag packages
  appear in the legacy cache
- Derives repo name from the git URL in releases.conf

vim-commentary now matches. vim-zig matches on format but has newer
data (expected — Go fetched more recently than Node.js).
2026-03-10 18:00:43 -06:00
AJ ONeal
72a8c56b13 fix(mariadb): skip source tarballs with OS="Source" or whitespace CPU
The MariaDB API returns OS="Source" and CPU=" " for source packages.
The previous check only tested for empty strings, missing these.
2026-03-10 17:44:26 -06:00
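A rough sketch of the stricter check, assuming a trimmed-down record type. The `mariadbFile` struct and `isSourceTarball` name are hypothetical, and treating an empty OS or CPU as "source" is carried over from the description of the previous check rather than confirmed by the code.

```go
package main

import (
	"fmt"
	"strings"
)

// mariadbFile is a hypothetical, reduced view of one MariaDB API file entry.
type mariadbFile struct {
	OS  string
	CPU string
}

// isSourceTarball reports whether the entry looks like a source package:
// OS is "Source", or OS/CPU is empty once surrounding whitespace is trimmed.
func isSourceTarball(f mariadbFile) bool {
	osName := strings.TrimSpace(f.OS)
	cpu := strings.TrimSpace(f.CPU)
	return osName == "" || cpu == "" || strings.EqualFold(osName, "Source")
}

func main() {
	files := []mariadbFile{
		{OS: "Source", CPU: " "},
		{OS: "Linux", CPU: "x86_64"},
	}
	for _, f := range files {
		fmt.Printf("%q/%q skip=%v\n", f.OS, f.CPU, isSourceTarball(f))
	}
}
```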
AJ ONeal
2b0b293728 feat(cache): add timing instrumentation to webicached and comparecache
Log classify/write/total per package in webicached, and
discover/compare/total in comparecache. Helps identify slow
packages as the dataset grows.
2026-03-10 17:42:50 -06:00
AJ ONeal
1412c7c374 fix(comparecache): filter _src. source tarballs from live cache noise
2026-03-10 17:37:12 -06:00
AJ ONeal
72fec20fb0 ref: move IsMetaAsset to classify package, share between tools
Moved isMetaAsset from cmd/webicached to classify.IsMetaAsset so
both webicached and comparecache use the same logic. Removed
duplicated isMetaFile from comparecache. The comparecache
isLiveNoise now delegates to classify.IsMetaAsset and adds
live-specific filters (.deb, .rpm, -src-).
2026-03-10 17:28:44 -06:00
AJ ONeal
f101037dfd fix: restore checksums/sha256sum/sha512sum substring filters
These are exact filenames with no extension — .txt doesn't catch them.
2026-03-10 17:25:55 -06:00
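Taken together with the generic .txt filter (the next entry in this log), the meta-asset check might look like the sketch below. The function body is an assumption based only on the two commit messages; lowercasing the name before matching is an added guess, not something the messages state.

```go
package main

import (
	"fmt"
	"strings"
)

// isMetaAsset is a hypothetical reconstruction: it flags files that are
// never installable, such as checksum lists and release notes.
func isMetaAsset(name string) bool {
	lower := strings.ToLower(name)
	// Any .txt file is treated as meta.
	if strings.HasSuffix(lower, ".txt") {
		return true
	}
	// Extensionless checksum files are not caught by the .txt rule,
	// so they keep their own substring filters.
	for _, s := range []string{"checksums", "sha256sum", "sha512sum"} {
		if strings.Contains(lower, s) {
			return true
		}
	}
	return false
}

func main() {
	for _, n := range []string{"checksums", "sha256sum", "notes.txt", "tool-linux-amd64.tar.gz"} {
		fmt.Println(n, isMetaAsset(n))
	}
}
```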
AJ ONeal
9247de98d2 fix: filter all .txt files as non-installable meta assets
.txt files are never installable (checksums, release notes, etc.).
Filter them generically instead of matching specific patterns.
2026-03-10 17:25:06 -06:00
AJ ONeal
65ab0f9c1f feat(comparecache): filter source tarballs (-src-) from live cache noise
2026-03-10 17:23:58 -06:00
AJ ONeal
3f1f909005 fix: use repo-tag as filename for source tarballs (drop owner prefix)
2026-03-10 17:19:24 -06:00
AJ ONeal
19de4c3caa fix: use tag as filename for source tarballs, add TODO for HEAD lookup
Drop the Owner-Repo prefix from source tarball filenames — the
actual download name comes from Content-Disposition. Added TODO
to resolve the full filename via HEAD at fetch time.
2026-03-10 17:19:05 -06:00
AJ ONeal
2bd1537e9c feat: add .git asset for source-only GitHub releases
Source-only releases (no uploaded assets) now also emit a .git
asset with the GitHub clone URL, matching how gittag-sourced
packages like vim-commentary and vim-zig work. This allows
install via git clone --branch <tag> as an alternative to
downloading the tarball.
2026-03-10 17:18:21 -06:00
AJ ONeal
d56f43e3b4 fix: use API URLs for source tarballs, match legacy filename pattern
Source-only GitHub releases now use the API-provided tarball_url
and zipball_url directly. Filename follows the legacy pattern
(Owner-Repo-Tag.ext) to approximate the Content-Disposition
filename that Node.js gets by following the redirect.
2026-03-10 17:17:35 -06:00
AJ ONeal
4a9088fea7 fix: use GitHub API tarball/zipball URLs instead of constructing them
Source-only releases now use the API-provided tarball_url and
zipball_url directly instead of guessing the archive URL format.
The filename uses the git tag, and the download URL is what
GitHub's API actually returns.
2026-03-10 17:13:04 -06:00
AJ ONeal
9f3e9445dc ref(mariadb): split galera into its own installer
Exclude galera assets from mariadb via `exclude = galera`.
Create mariadb-galera with `asset_filter = galera` to serve
those assets separately.
2026-03-10 17:04:17 -06:00
AJ ONeal
cba699a952 fix(node): only tag bare .exe as variant, not .msi/.pkg
.msi and .pkg are standard package formats that the extension
already identifies. Only the bare node.exe (no npm) needs a
variant tag to exclude it.
2026-03-10 17:03:04 -06:00
AJ ONeal
c45e54a69b fix(node): use format-specific variant names instead of "installer"
.msi and .pkg are package formats we can extract from, not GUI
installers. Use "msi" and "pkg" as variant names to reflect that.
2026-03-10 17:02:06 -06:00
AJ ONeal
68ecaf2fbc fix(node): tag .pkg as installer variant alongside .msi
The macOS .pkg is a pkgutil installer, not a plain archive.
Tagged as installer so it's excluded from legacy export but
available for Go's native installer support.
2026-03-10 17:01:47 -06:00
AJ ONeal
cdec995183 ref(node): remove node-official/node-unofficial split packages
The node package already merges both sources via unofficial_url
in releases.conf. The split packages were a workaround that
produced cache files not present in the live Node.js cache.
2026-03-10 16:58:26 -06:00
AJ ONeal
81a6400f4f feat(comparecache): pre-filter .deb/.rpm/meta/sigs from Node.js cache
Strips known noise from the live cache before comparison: .deb, .rpm,
.asc, .sig, .gpg, .sbom, .sha256, checksums, install.sh, install.ps1,
.txt, and other non-installable files. Matches went from 16 to 50.
2026-03-10 16:53:32 -06:00
AJ ONeal
7550020299 feat(comparecache): add -diffs flag to skip matching packages
2026-03-10 16:48:56 -06:00
AJ ONeal
755fa7f594 feat(comparecache): add -windowed flag for version-scoped comparison
Uses Node.js version range (2nd to 2nd-to-last) as the window.
All Node.js versions in the window are included so missing Go
versions/assets are visible. Go-only versions are hidden since
those are just deeper fetch history, not real gaps.
2026-03-10 16:45:40 -06:00
AJ ONeal
dae987376e test(resolve): restore platform expectations, document upstream gaps
shellcheck has no Windows builds, xz has no arm64 builds — these are
real upstream gaps that the test suite now surfaces as failures rather
than silently excluding. 891 pass, 2 known upstream gaps.
2026-03-10 16:07:11 -06:00
AJ ONeal
37ea9a4227 feat(resolve): 895 tests passing across 103 real packages
Resolver fixes:
- Accept "*" as ANYARCH (legacy cache uses "*" for universal builds)
- Accept bare binaries (empty format) as last-resort format match
- POSIX/ANYOS/ANYARCH matching (from previous commit)

Test suite covers:
- All 103 cache packages × 8 platforms (darwin/linux/windows × arches)
- 18 known packages with mandatory platform expectations
- Version constraint pinning (bat@0.25, node@20, etc.)
- Arch fallback (Rosetta 2, Windows ARM64, micro-arch)
- POSIX package resolution (aliasman, pathman, serviceman)
- Libc preference (musl/gnu/none)
- Format preference cascading
- Base-over-variant preference
2026-03-10 15:17:52 -06:00
AJ ONeal
f779e240fd feat(resolve): add POSIX/ANYOS/ANYARCH matching and test coverage
The resolver now handles:
- ANYOS assets match any query OS
- posix_2017/posix_2024 assets match any non-Windows OS
- ANYARCH assets match any query architecture (ranked below specific)

14 tests covering: exact match, version constraints, arch fallback
(Rosetta 2, Windows ARM64, micro-arch), format preference, libc
filtering, base-over-variant preference, POSIX/ANYOS/ANYARCH fallback,
Survey catalog, and no-match.
2026-03-10 15:06:19 -06:00
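The three matching rules listed in this commit can be illustrated with a small sketch. The function names, the numeric scores, and the literal strings "ANYOS", "ANYARCH", and "windows" are assumptions for illustration; the real resolver also handles arch fallback, formats, and libc, which are omitted here.

```go
package main

import "fmt"

// osMatches reports whether an asset's OS tag satisfies the queried OS.
func osMatches(assetOS, queryOS string) bool {
	switch assetOS {
	case "ANYOS":
		return true // matches any query OS
	case "posix_2017", "posix_2024":
		return queryOS != "windows" // any non-Windows OS
	default:
		return assetOS == queryOS
	}
}

// archScore returns 0 for no match; a higher score is a better match,
// so ANYARCH ranks below an exact architecture.
func archScore(assetArch, queryArch string) int {
	switch assetArch {
	case queryArch:
		return 2
	case "ANYARCH":
		return 1
	default:
		return 0
	}
}

func main() {
	fmt.Println(osMatches("posix_2017", "linux"), osMatches("posix_2017", "windows"))
	fmt.Println(archScore("aarch64", "aarch64"), archScore("ANYARCH", "aarch64"))
}
```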
AJ ONeal
7e134ead87 fix(yq): use exclude in releases.conf instead of variant tagger
Man pages aren't a variant — they're just assets we don't install.
The exclude key in releases.conf is the right place for this.
2026-03-10 14:59:25 -06:00
AJ ONeal
6eeed80610 fix: separate general vs installer-specific vs legacy filters
- yq: move man_page_only from general isMetaAsset to yq-specific tagger
- node: restore .exe as stored asset with "bare-exe" variant (installable
  by Go, excluded from legacy)
- ollama: rename Ollama-darwin.zip variant from "installer" to "app"
  (.app bundle is installable by Go, just not by legacy Node.js)

The distinction: general classification/filter (isMetaAsset) handles
truly non-installable files. Installer-specific taggers handle assets
that are installable but need variant tagging. Legacy filter strips
variants and unsupported formats for Node.js compat.
2026-03-10 14:58:37 -06:00
AJ ONeal
d8ecac6d6a fix: normalize .tgz to .tar.gz in display filenames
Node.js normalizes .tgz extensions to .tar.gz in the cache name field
while keeping the real .tgz URL in download. Match this behavior so
legacy export filenames are consistent. Affects ollama-darwin.tgz and
any other packages using .tgz.
2026-03-10 14:49:48 -06:00
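As a small illustration of the rename, assuming a helper called `displayName` (a hypothetical name): only the display filename changes, while the download URL keeps its real .tgz extension.

```go
package main

import (
	"fmt"
	"strings"
)

// displayName rewrites a trailing .tgz to .tar.gz for the cache name field.
// The download URL is stored separately and is not modified.
func displayName(filename string) string {
	if strings.HasSuffix(filename, ".tgz") {
		return strings.TrimSuffix(filename, ".tgz") + ".tar.gz"
	}
	return filename
}

func main() {
	fmt.Println(displayName("ollama-darwin.tgz"))
	fmt.Println(displayName("ollama-linux-amd64.tar.zst"))
}
```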
AJ ONeal
99159d748c fix: ollama installer tag, yq/ffmpeg meta detection, ffmpeg asset_filter
- ollama: Ollama-darwin.zip (macOS .app) tagged as installer variant
- isMetaAsset: add man_page_only, .LICENSE, .README patterns
- ffmpeg: asset_filter=ffmpeg excludes ffprobe/ffplay/LICENSE/README
- uuidv7: exotic arches are correct, marked as known-acceptable
2026-03-10 14:47:01 -06:00
AJ ONeal
878009e5aa fix(node): skip nodedist "exe" format code — no real download exists
Node.js index lists "win-x64-exe" but there's no .exe file on the
download server. The MSI installer (separate "msi" entry) is the actual
Windows installer. The "exe" entry was generating a phantom filename.
2026-03-10 14:44:39 -06:00
AJ ONeal
b408b42464 feat: add asset_filter to releases.conf, fix kubectx/kubens split
asset_filter is a substring that asset filenames must contain. Used when
multiple packages share a GitHub release (kubectx/kubens both come from
ahmetb/kubectx). Added as a first-class Conf field and applied in
webicached's applyConfig.
2026-03-10 14:42:37 -06:00
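A minimal sketch of how a substring filter could split one shared release between two packages. `filterAssets` is a hypothetical helper, not the applyConfig code itself, and the filenames are illustrative.

```go
package main

import (
	"fmt"
	"strings"
)

// filterAssets keeps only filenames containing assetFilter.
// An empty filter keeps everything.
func filterAssets(names []string, assetFilter string) []string {
	if assetFilter == "" {
		return names
	}
	var kept []string
	for _, n := range names {
		if strings.Contains(n, assetFilter) {
			kept = append(kept, n)
		}
	}
	return kept
}

func main() {
	// Illustrative asset names from a release shared by two packages.
	release := []string{
		"kubectx_v0.9.5_linux_x86_64.tar.gz",
		"kubens_v0.9.5_linux_x86_64.tar.gz",
	}
	fmt.Println(filterAssets(release, "kubectx"))
	fmt.Println(filterAssets(release, "kubens"))
}
```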
AJ ONeal
34dcc6c148 fix(git): tag busybox and pdbs-for-git assets as variants
MinGit-busybox is a stripped-down MinGit using busybox instead of MSYS2.
pdbs-for-git-* filenames weren't caught by the existing "-pdb" check.
Both are now tagged as variants and excluded from legacy export.
2026-03-10 14:40:38 -06:00
AJ ONeal
37d6474675 fix(fish): tag source tarball as variant, exclude from legacy export
fish-{version}.tar.xz is an uploaded source tarball with no OS/arch in
the filename. GitHub API doesn't distinguish it from binaries. Tag assets
with no OS and no arch as "source" variant so they're filtered from
legacy export. The linux .tar.xz binaries classify correctly and are
kept — Node.js just doesn't have them yet.
2026-03-10 14:39:25 -06:00
AJ ONeal
5d316334c8 fix(bun): baseline serves as legacy amd64, non-baseline tagged as v3 variant
Baseline builds (-baseline suffix) are plain x86_64 and match what Node.js
serves. Strip -baseline from Filename (keep in Download URL) so legacy
export sees a clean name. Non-baseline builds get Arch: x86_64_v3 and
Variants: ["v3"], excluding them from legacy output.
2026-03-10 14:19:56 -06:00
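The baseline handling can be sketched roughly as below. The `asset` struct and `tagBun` function are hypothetical, and limiting the v3 tag to assets already classified as x86_64 is an assumption; the commit message does not say how other architectures are treated.

```go
package main

import (
	"fmt"
	"strings"
)

// asset is a hypothetical, reduced version of the stored asset record.
type asset struct {
	Filename string
	Download string
	Arch     string
	Variants []string
}

// tagBun strips -baseline from the display filename (the download URL is
// left alone) and tags non-baseline x86_64 builds as the "v3" variant.
func tagBun(a asset) asset {
	if strings.Contains(a.Filename, "-baseline") {
		a.Filename = strings.Replace(a.Filename, "-baseline", "", 1)
		return a
	}
	if a.Arch == "x86_64" {
		a.Arch = "x86_64_v3"
		a.Variants = append(a.Variants, "v3")
	}
	return a
}

func main() {
	base := tagBun(asset{Filename: "bun-linux-x64-baseline.zip", Arch: "x86_64"})
	fast := tagBun(asset{Filename: "bun-linux-x64.zip", Arch: "x86_64"})
	fmt.Println(base.Filename, base.Arch, base.Variants)
	fmt.Println(fast.Filename, fast.Arch, fast.Variants)
}
```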
AJ ONeal
a1714e0598 update comparison after variant tagging and legacy filter
Add .tar.bz2 to classifier format detection (was slipping through
as empty format). Update COMPARISON.md with fresh results: 21 exact
matches, .deb/.rpm/.tar.zst/.tar.bz2 now correctly filtered from
legacy export. Document remaining items for review.
2026-03-10 14:04:00 -06:00
AJ ONeal
8ce911ade8 feat: legacy export filter for variants and unsupported formats
ExportLegacy now skips assets with non-empty Variants (installer,
rocm, fxdependent, etc.) and formats Node.js doesn't handle (.deb,
.rpm, .snap, .appx, .tar.zst, .tar.bz2, .7z). This ensures the
_cache/ JSON files are compatible with the legacy Node.js server.

Also fix test data to use dotted format strings (.tar.gz) matching
what the classifier actually produces.
2026-03-10 13:59:42 -06:00
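The export filter described here could look something like the following sketch. The `asset` type, the `legacyUnsupported` set, and the lowercase `exportLegacy` function are assumptions modeled on the message, not the actual ExportLegacy implementation.

```go
package main

import "fmt"

// asset is a hypothetical, reduced version of the stored asset record.
type asset struct {
	Filename string
	Format   string // dotted, e.g. ".tar.gz"
	Variants []string
}

// legacyUnsupported lists the formats the message says Node.js doesn't handle.
var legacyUnsupported = map[string]bool{
	".deb": true, ".rpm": true, ".snap": true, ".appx": true,
	".tar.zst": true, ".tar.bz2": true, ".7z": true,
}

// exportLegacy drops assets that carry any variant tag or use a format
// outside what the legacy server understands.
func exportLegacy(assets []asset) []asset {
	var out []asset
	for _, a := range assets {
		if len(a.Variants) > 0 || legacyUnsupported[a.Format] {
			continue
		}
		out = append(out, a)
	}
	return out
}

func main() {
	assets := []asset{
		{Filename: "tool-linux-amd64.tar.gz", Format: ".tar.gz"},
		{Filename: "tool-linux-amd64.deb", Format: ".deb"},
		{Filename: "tool-linux-amd64-rocm.tar.gz", Format: ".tar.gz", Variants: []string{"rocm"}},
	}
	for _, a := range exportLegacy(assets) {
		fmt.Println(a.Filename)
	}
}
```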
AJ ONeal
6687cad126 ref: simplify variant taggers to plain functions with switch dispatch
Drop VariantTagger interface and map-based lookup. Each per-installer
package now exports a plain TagVariants function. webicached dispatches
via a switch on package name, consistent with fetchRaw and
classifyPackage.
2026-03-10 13:54:03 -06:00
AJ ONeal
9cb9ffc4c6 ref: extract variant taggers to per-installer packages
Move variant detection logic from inline functions in webicached to
per-installer packages (internal/releases/{bun,fish,git,lsd,node,
ollama,pwsh,xcaddy}). Each exports a Tagger implementing the new
storage.VariantTagger interface. webicached uses an explicit map
of package name → tagger, no magic registration.
2026-03-10 13:35:32 -06:00
AJ ONeal
39c136caa3 feat: whitespace-delimited releases.conf, variant tagging
- Switch installerconf parser from comma to whitespace delimiters
- Add asset_exclude as alias for exclude (fixes hugo)
- Add variants key (documentation cue, detection in Go code)
- Add per-package variant taggers: bun (profile, amd64v3 arch),
  pwsh (fxdependent), ollama (rocm, jetpack5/6), git (installer),
  node (msi installer), lsd (deb, msvc), fish (pkg), xcaddy (deb)
- Update releases.conf files with variant declarations
2026-03-10 13:30:33 -06:00
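To illustrate the parser change, here is a minimal sketch of reading one `key = value` line with whitespace-separated values and the `asset_exclude` alias. `parseLine` is a hypothetical helper; comment handling and the rest of the installerconf parser are not shown.

```go
package main

import (
	"fmt"
	"strings"
)

// parseLine splits "key = v1 v2 ..." into a key and whitespace-delimited
// values. It maps the asset_exclude alias onto exclude.
func parseLine(line string) (key string, values []string, ok bool) {
	k, v, found := strings.Cut(line, "=")
	if !found {
		return "", nil, false
	}
	key = strings.TrimSpace(k)
	if key == "" {
		return "", nil, false
	}
	if key == "asset_exclude" {
		key = "exclude"
	}
	return key, strings.Fields(v), true
}

func main() {
	for _, line := range []string{
		"asset_exclude = -src- -docs-",
		"variants = rocm jetpack5 jetpack6",
	} {
		key, values, ok := parseLine(line)
		fmt.Println(key, values, ok)
	}
}
```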
AJ ONeal
d229eb618d update COMPARISON.md with fresh shallow-fetch results
28 exact matches at latest version (up from 12). Reorganize by
difference category. Update action items to reflect current design
(Variants field, format handling, node multi-source fix).
2026-03-10 13:02:22 -06:00
AJ ONeal
f4e816606f chore: gitignore LIVE_cache symlink
2026-03-10 12:59:54 -06:00
AJ ONeal
f441a3bf8c ref(webicached): extract WebiCache struct, add -shallow flag
Extract shared state (store, client, auth, rawDir, config flags) into
a WebiCache struct. Convert refreshPackage, fetchRaw, and paginated
fetchers (github, gitea, gittag, nodedist) to methods.

Add -shallow flag: fetches only the first page of releases from
paginated sources. Single-index sources (nodedist, chromedist, etc.)
are always complete in one request.
2026-03-10 12:57:50 -06:00
AJ ONeal
d1016eb589 add Variants []string to Asset and Dist, keep Extra for version info
Extra is for version-related sort metadata (build numbers, etc.).
Variants captures build qualifiers like "rocm", "jetpack5",
"fxdependent", "installer" — things the resolver should skip by
default unless explicitly requested.

Also update format classification docs: most formats (.pkg, .deb,
.dmg, .msi) are extractable — only .exe is ambiguous and needs
the "installer" variant tag when it's not the actual binary.
2026-03-10 12:51:11 -06:00
AJ ONeal
27950420dc doc(GO_WEBI): installer formats tagged not dropped, store everything
Installer formats (.pkg, .msi, .deb, etc.) get Extra="installer"
rather than being filtered at classification time. The resolver
skips them by default but the full API can still serve them.
2026-03-10 12:42:42 -06:00
AJ ONeal
84c943b160 feat(node): merge official + unofficial builds into single cache
Add unofficial_url to node/releases.conf and update the nodedist
fetcher/classifier to fetch from both URLs. Raw entries are stored
with "official/" or "unofficial/" tag prefixes so they don't overwrite
each other. The classifier picks the correct base URL from the prefix.

This matches the Node.js releases.js behavior which merges both sources,
adding musl, riscv64, loong64, and 7z builds from unofficial.
2026-03-10 12:35:18 -06:00
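A sketch of how a classifier might pick the base URL from the stored prefix. The function name and the two constants are assumptions for illustration; per the message, the actual URLs come from node/releases.conf (including unofficial_url), not from hard-coded values.

```go
package main

import (
	"fmt"
	"strings"
)

// Assumed base URLs; the real values are read from releases.conf.
const (
	officialBase   = "https://nodejs.org/download/release"
	unofficialBase = "https://unofficial-builds.nodejs.org/download/release"
)

// splitRawKey takes a raw entry key such as "unofficial/v20.11.0" and
// returns the matching base URL plus the bare tag. Keys without a known
// prefix are treated as official.
func splitRawKey(key string) (base, tag string) {
	switch {
	case strings.HasPrefix(key, "unofficial/"):
		return unofficialBase, strings.TrimPrefix(key, "unofficial/")
	case strings.HasPrefix(key, "official/"):
		return officialBase, strings.TrimPrefix(key, "official/")
	default:
		return officialBase, key
	}
}

func main() {
	for _, k := range []string{"official/v20.11.0", "unofficial/v20.11.0"} {
		base, tag := splitRawKey(k)
		fmt.Println(base + "/" + tag + "/")
	}
}
```

Keeping the prefix in the raw key is what prevents the same tag from the two sources overwriting each other.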