72 Commits

Author SHA1 Message Date
AJ ONeal
44721b9aa8 fix(postgres/psql): normalize REL_17_0 tag format to 17.0
Strip REL_ prefix and convert underscores to dots in a per-package
normalizer rather than config, matching the convention for watchexec.
2026-03-11 00:41:41 -06:00
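
The tag normalization described in 44721b9aa8 might look roughly like the sketch below. Only the REL_17_0 to 17.0 mapping comes from the commit message; the function name normalizePostgresTag is hypothetical and this is not the repository's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizePostgresTag is a hypothetical helper: it strips a leading "REL_"
// and converts the remaining underscores to dots.
func normalizePostgresTag(tag string) string {
	v := strings.TrimPrefix(tag, "REL_")
	return strings.ReplaceAll(v, "_", ".")
}

func main() {
	fmt.Println(normalizePostgresTag("REL_17_0"))
}
```

Tags without the REL_ prefix pass through with only the underscore conversion applied.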
AJ ONeal
c173873bac fix(pwsh): tag win-version-specific and AppImage builds as variants
Early PowerShell releases (pre-6.1) used Windows-version-specific
filenames (win10-win2016, win81-win2012r2) that the legacy cache
can't resolve. Tag them as variants so they're filtered from legacy
export but preserved for future Go resolver use.
2026-03-11 00:13:29 -06:00
AJ ONeal
ec30b34241 fix(gittag): use HEAD-{date} format for tagless repos
Avoids HEAD date-versions (2024.06.08) sorting ahead of real semver
tags (v1.2) since they measure different things.
2026-03-11 00:10:30 -06:00
AJ ONeal
795fff1bb4 fix(iterm2dist): fix version extraction for preview releases and deduplicate URLs
The regex captured the beta/preview number but not the keyword itself,
so "3.0.0-preview" collapsed to "3.0.0". Also deduplicate by version
since the downloads page has duplicate links with different URL formats
(e.g. iTerm2-3_5_1beta1.zip and iTerm2-3_5_1_beta1.zip).
2026-03-10 23:59:11 -06:00
AJ ONeal
f53c508303 style: one entry per line in map/slice literals
Put each entry on its own line for readability — no staggering
multiple entries per line.
2026-03-10 23:29:22 -06:00
AJ ONeal
b8c67491fe feat: resolve alias_of in cache pipeline
Packages with alias_of in releases.conf (e.g. dashd → dashcore,
golang → go) now get symlinked cache files so they resolve to the
same JSON as their target. 13 aliases total.

Added AliasOf as a proper field in installerconf.Conf, LinkAlias
method to fsstore, and alias handling in webicached's Run loop.
2026-03-10 23:28:36 -06:00
AJ ONeal
f36e734539 fix: infer release channel from version string
GitHub's prerelease boolean is often not set for rc/beta/alpha/dev/pre
releases. Add channelFromVersion() to detect these from the version
string as a fallback. Applied to github, gitea, gittag, and hashicorp
classifiers. Hashicorp's inline checks replaced with the shared helper.

-pre maps to beta (prerelease), -preview stays preview.
2026-03-10 23:18:11 -06:00
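
A fallback like the channelFromVersion() named in f36e734539 could be sketched as below. The commit confirms only that -pre maps to beta and -preview stays preview; the other channel names, the hyphen-delimited matching, and the empty-string return for "no keyword found" are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// channelFromVersion guesses a release channel from keywords in the version
// string. It returns "" when no keyword is found, leaving the decision to
// the caller (assumed behavior, not confirmed by the source).
func channelFromVersion(version string) string {
	v := strings.ToLower(version)
	switch {
	// "-preview" must be tested before "-pre", which is a prefix of it.
	case strings.Contains(v, "-preview"):
		return "preview"
	case strings.Contains(v, "-rc"):
		return "rc"
	case strings.Contains(v, "-beta"), strings.Contains(v, "-pre"):
		return "beta"
	case strings.Contains(v, "-alpha"):
		return "alpha"
	case strings.Contains(v, "-dev"):
		return "dev"
	}
	return ""
}

func main() {
	for _, v := range []string{"1.2.0-pre", "3.0.0-preview.1", "2.0.0-rc1", "1.2.3"} {
		fmt.Printf("%q -> %q\n", v, channelFromVersion(v))
	}
}
```

The ordering of the cases matters: checking -pre first would misclassify every -preview release as beta.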
AJ ONeal
f963b35e01 ref(watchexec): move cli- prefix stripping from config to code
The cli- prefix is a watchexec-specific monorepo artifact, not a generic
config concern. Move it to internal/releases/watchexec/versions.go
alongside other per-package normalizers (git, lf).
2026-03-10 23:11:14 -06:00
AJ ONeal
07d5f36ed4 fix: postgres/psql cross-contamination, watchexec tag filter, meta assets
- postgres/psql: add asset_filter to separate assets from shared repo
  (bnnanet/postgresql-releases contains postgres-*, postgresql-*, psql-*)
- watchexec: change tag_prefix to version_prefixes so old plain-tagged
  releases (v1.20.6+) aren't filtered out — only strip the cli- prefix
- classify: add .minisig, b3sums, dist-manifest.json to IsMetaAsset
  filter to prevent checksum/signature files from leaking into cache
2026-03-10 18:56:19 -06:00
AJ ONeal
7e22ba01a0 fix: ffmpeg version prefix, .gz legacy format, iterm2 regex
- ffmpeg: add version_prefix = b to strip 'b' from tags (b6.0 → 6.0)
- legacy.go: add .gz to legacyFormats for bare gzipped binaries
- iterm2: broaden regex to handle preview/beta variants, skip empty
  versions

Match count: 75/106
2026-03-10 18:35:51 -06:00
AJ ONeal
2d01a1cf54 fix: jq version prefix, watchexec monorepo tag filter
- jq: add version_prefixes = jq- to strip jq- from version strings
- watchexec: add tag_prefix = cli- to filter monorepo tags correctly
- classifyGitHub: skip tags not matching tag_prefix in monorepos
- comparecache: add watchexec version normalization

Match count: 74/106
2026-03-10 18:33:26 -06:00
AJ ONeal
a4e9f875cd fix(go): pad versions to 3 parts, filter -arm6. oddity
Node.js pads Go versions like "1.10" to "1.10.0". Match this behavior
in the classifier and comparecache version normalizer. Also filter
-arm6. malformed arch and .src. source tarballs from comparison noise.

Match count: 73/106
2026-03-10 18:30:57 -06:00
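
The padding behavior from a4e9f875cd ("1.10" becomes "1.10.0") can be illustrated with a small sketch. The name padVersion is hypothetical, and prerelease suffixes such as rc or beta are deliberately out of scope here.

```go
package main

import (
	"fmt"
	"strings"
)

// padVersion appends ".0" parts until the version has three dot-separated
// components. It assumes a plain numeric version with no prerelease suffix.
func padVersion(v string) string {
	parts := strings.Split(v, ".")
	for len(parts) < 3 {
		parts = append(parts, "0")
	}
	return strings.Join(parts, ".")
}

func main() {
	fmt.Println(padVersion("1.10"))
	fmt.Println(padVersion("1.21.3"))
}
```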
AJ ONeal
56a8a8ea71 fix(fish): add .app.zip to legacy formats, exclude noise assets
- Add .app.zip to legacyFormats so macOS fish builds export correctly
- Exclude bundledpcre, fish-static, OpenBeta from fish/releases.conf
- Add fish Linux binaries to comparecache noise (Go improvement)

Match count: 72/106
2026-03-10 18:29:35 -06:00
AJ ONeal
13798de1b0 fix(lf): normalize rN version tags to 0.N.0
lf uses tags like "r21", "r33". Node.js converts these to "0.21.0".
Add version normalization in both classifier and comparecache.

Match count: 71/106
2026-03-10 18:28:13 -06:00
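
The rN to 0.N.0 mapping from 13798de1b0 could be expressed as in the following sketch. The helper name normalizeLfTag and the choice to leave non-matching tags untouched are assumptions.

```go
package main

import (
	"fmt"
	"regexp"
)

// lfTag matches lf release tags of the form "r21" or "r33".
var lfTag = regexp.MustCompile(`^r(\d+)$`)

// normalizeLfTag converts "rN" to "0.N.0" and returns any other tag as-is.
func normalizeLfTag(tag string) string {
	m := lfTag.FindStringSubmatch(tag)
	if m == nil {
		return tag
	}
	return "0." + m[1] + ".0"
}

func main() {
	fmt.Println(normalizeLfTag("r21"))
	fmt.Println(normalizeLfTag("r33"))
}
```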
AJ ONeal
05abb1ffd2 fix(git): normalize .windows.N version suffix
Git for Windows uses tags like v2.53.0.windows.1. Node.js strips
".windows.1" and replaces ".windows.N" (N>1) with ".N".

Add NormalizeVersions to the git package and wire it into the classify
pipeline. Also add version normalization to comparecache so the
comparison uses canonical versions for both caches.

Remaining git diffs: data freshness (.windows.2 releases Go hasn't
fetched) and RC versions in Go that live doesn't have.
2026-03-10 18:26:41 -06:00
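
The suffix rule from 05abb1ffd2 (drop ".windows.1", turn ".windows.N" into ".N" for N>1) might be sketched as follows. The function name is hypothetical, and handling of the leading "v" is left out because the commit message does not describe it.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeGitForWindows rewrites the ".windows.N" suffix used by Git for
// Windows tags. Versions without the suffix are returned unchanged.
func normalizeGitForWindows(version string) string {
	base, n, ok := strings.Cut(version, ".windows.")
	if !ok {
		return version
	}
	if n == "1" {
		return base
	}
	return base + "." + n
}

func main() {
	fmt.Println(normalizeGitForWindows("v2.53.0.windows.1"))
	fmt.Println(normalizeGitForWindows("v2.53.0.windows.2"))
}
```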
AJ ONeal
8f9cf8e487 fix: exclude known noise from cache comparison and configs
- Hugo: exclude Linux-64bit legacy filename alias
- Hugo-extended: exclude Linux-64bit legacy filename alias
- Gitea: exclude -src- and -docs- tarballs
- Pathman: exclude armv8 legacy alias
- UUID v7: exclude exotic architectures (thumb, armeb, loong, gnux32, risc)
- comparecache: filter bare executables and docs tarballs as noise,
  apply noise filter to both live and Go sides
- legacy.go: add .tar.bz2 to legacyFormats

Match count: 69/106 (up from 58)
2026-03-10 18:18:38 -06:00
AJ ONeal
2ebecb644e feat(gitea): add gogit variant tagger
Tag assets with "-gogit-" in the filename as the "gogit" variant.
These use a pure-Go Git backend instead of the default C Git library.
2026-03-10 18:08:19 -06:00
AJ ONeal
86e3d8f969 ref: extract classification pipeline into internal/classifypkg
Move all source-specific classifiers, variant tagging, config filtering,
and readAllRaw out of cmd/webicached into internal/classifypkg. The new
Package() function runs the full classify pipeline: source dispatch →
tag variants → apply config.

webicached now only handles fetching raw data and writing to fsstore.
The classification logic is reusable by comparecache and future tools.
2026-03-10 18:06:02 -06:00
AJ ONeal
c1b81157dc fix(gittag): produce correct filenames, versions, and format for git assets
- gittag classifier: use "{repo}-{tag}" filenames (matching Node.js),
  strip "v" prefix from version, synthesize date-based version for
  tagless repos (HEAD of master/main)
- GitHub source-only: use "git" format (no dot) and "{repo}-{tag}"
  filename for clone assets
- Legacy export: add "git" to recognized formats so gittag packages
  appear in the legacy cache
- Derives repo name from the git URL in releases.conf

vim-commentary now matches. vim-zig matches on format but has newer
data (expected — Go fetched more recently than Node.js).
2026-03-10 18:00:43 -06:00
AJ ONeal
72fec20fb0 ref: move IsMetaAsset to classify package, share between tools
Moved isMetaAsset from cmd/webicached to classify.IsMetaAsset so
both webicached and comparecache use the same logic. Removed
duplicated isMetaFile from comparecache. The comparecache
isLiveNoise now delegates to classify.IsMetaAsset and adds
live-specific filters (.deb, .rpm, -src-).
2026-03-10 17:28:44 -06:00
AJ ONeal
cba699a952 fix(node): only tag bare .exe as variant, not .msi/.pkg
.msi and .pkg are standard package formats that the extension
already identifies. Only the bare node.exe (no npm) needs a
variant tag to exclude it.
2026-03-10 17:03:04 -06:00
AJ ONeal
c45e54a69b fix(node): use format-specific variant names instead of "installer"
.msi and .pkg are package formats we can extract from, not GUI
installers. Use "msi" and "pkg" as variant names to reflect that.
2026-03-10 17:02:06 -06:00
AJ ONeal
68ecaf2fbc fix(node): tag .pkg as installer variant alongside .msi
The macOS .pkg is a pkgutil installer, not a plain archive.
Tagged as installer so it's excluded from legacy export but
available for Go's native installer support.
2026-03-10 17:01:47 -06:00
AJ ONeal
dae987376e test(resolve): restore platform expectations, document upstream gaps
shellcheck has no Windows builds, xz has no arm64 builds — these are
real upstream gaps that the test suite now surfaces as failures rather
than silently excluding. 891 pass, 2 known upstream gaps.
2026-03-10 16:07:11 -06:00
AJ ONeal
37ea9a4227 feat(resolve): 895 tests passing across 103 real packages
Resolver fixes:
- Accept "*" as ANYARCH (legacy cache uses "*" for universal builds)
- Accept bare binaries (empty format) as last-resort format match
- POSIX/ANYOS/ANYARCH matching (from previous commit)

Test suite covers:
- All 103 cache packages × 8 platforms (darwin/linux/windows × arches)
- 18 known packages with mandatory platform expectations
- Version constraint pinning (bat@0.25, node@20, etc.)
- Arch fallback (Rosetta 2, Windows ARM64, micro-arch)
- POSIX package resolution (aliasman, pathman, serviceman)
- Libc preference (musl/gnu/none)
- Format preference cascading
- Base-over-variant preference
2026-03-10 15:17:52 -06:00
AJ ONeal
f779e240fd feat(resolve): add POSIX/ANYOS/ANYARCH matching and test coverage
The resolver now handles:
- ANYOS assets match any query OS
- posix_2017/posix_2024 assets match any non-Windows OS
- ANYARCH assets match any query architecture (ranked below specific)

14 tests covering: exact match, version constraints, arch fallback
(Rosetta 2, Windows ARM64, micro-arch), format preference, libc
filtering, base-over-variant preference, POSIX/ANYOS/ANYARCH fallback,
Survey catalog, and no-match.
2026-03-10 15:06:19 -06:00
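
The matching rules listed in f779e240fd can be sketched as two small functions. The ANYOS, ANYARCH, posix_2017 and posix_2024 values come from the commit message; the function names, the "windows" query string, and the numeric rank values are assumptions for illustration only.

```go
package main

import "fmt"

// osMatches reports whether an asset's OS value satisfies the queried OS.
func osMatches(assetOS, queryOS string) bool {
	switch assetOS {
	case "ANYOS":
		return true
	case "posix_2017", "posix_2024":
		// POSIX assets match any non-Windows OS.
		return queryOS != "windows"
	}
	return assetOS == queryOS
}

// archRank scores an asset's arch against the query: an exact match ranks
// above ANYARCH, and 0 means no match. The values are illustrative.
func archRank(assetArch, queryArch string) int {
	switch assetArch {
	case queryArch:
		return 2
	case "ANYARCH":
		return 1
	}
	return 0
}

func main() {
	fmt.Println(osMatches("posix_2017", "linux"), osMatches("posix_2017", "windows"))
	fmt.Println(archRank("aarch64", "aarch64"), archRank("ANYARCH", "aarch64"))
}
```

Arch compatibility fallbacks such as Rosetta 2 are not modeled here; the sketch covers only the three rules in the list above.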
AJ ONeal
7e134ead87 fix(yq): use exclude in releases.conf instead of variant tagger
Man pages aren't a variant — they're just assets we don't install.
The exclude key in releases.conf is the right place for this.
2026-03-10 14:59:25 -06:00
AJ ONeal
6eeed80610 fix: separate general vs installer-specific vs legacy filters
- yq: move man_page_only from general isMetaAsset to yq-specific tagger
- node: restore .exe as stored asset with "bare-exe" variant (installable
  by Go, excluded from legacy)
- ollama: rename Ollama-darwin.zip variant from "installer" to "app"
  (.app bundle is installable by Go, just not by legacy Node.js)

The distinction: general classification/filter (isMetaAsset) handles
truly non-installable files. Installer-specific taggers handle assets
that are installable but need variant tagging. Legacy filter strips
variants and unsupported formats for Node.js compat.
2026-03-10 14:58:37 -06:00
AJ ONeal
99159d748c fix: ollama installer tag, yq/ffmpeg meta detection, ffmpeg asset_filter
- ollama: Ollama-darwin.zip (macOS .app) tagged as installer variant
- isMetaAsset: add man_page_only, .LICENSE, .README patterns
- ffmpeg: asset_filter=ffmpeg excludes ffprobe/ffplay/LICENSE/README
- uuidv7: exotic arches are correct, marked as known-acceptable
2026-03-10 14:47:01 -06:00
AJ ONeal
b408b42464 feat: add asset_filter to releases.conf, fix kubectx/kubens split
asset_filter is a substring that asset filenames must contain. Used when
multiple packages share a GitHub release (kubectx/kubens both come from
ahmetb/kubectx). Added as a first-class Conf field and applied in
webicached's applyConfig.
2026-03-10 14:42:37 -06:00
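
Since b408b42464 describes asset_filter as a substring that filenames must contain, the check can be sketched as below. The function name, the case-sensitive comparison, and the example filenames are assumptions rather than the repository's code or real release assets.

```go
package main

import (
	"fmt"
	"strings"
)

// applyAssetFilter keeps only filenames containing the filter substring.
// An empty filter keeps everything.
func applyAssetFilter(filenames []string, filter string) []string {
	if filter == "" {
		return filenames
	}
	var kept []string
	for _, name := range filenames {
		if strings.Contains(name, filter) {
			kept = append(kept, name)
		}
	}
	return kept
}

func main() {
	// Illustrative filenames for two packages sharing one release.
	shared := []string{
		"kubectx_linux_x86_64.tar.gz",
		"kubens_linux_x86_64.tar.gz",
	}
	fmt.Println(applyAssetFilter(shared, "kubens"))
}
```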
AJ ONeal
34dcc6c148 fix(git): tag busybox and pdbs-for-git assets as variants
MinGit-busybox is a stripped-down MinGit using busybox instead of MSYS2.
pdbs-for-git-* filenames weren't caught by the existing "-pdb" check.
Both are now tagged as variants and excluded from legacy export.
2026-03-10 14:40:38 -06:00
AJ ONeal
37d6474675 fix(fish): tag source tarball as variant, exclude from legacy export
fish-{version}.tar.xz is an uploaded source tarball with no OS/arch in
the filename. GitHub API doesn't distinguish it from binaries. Tag assets
with no OS and no arch as "source" variant so they're filtered from
legacy export. The linux .tar.xz binaries classify correctly and are
kept — Node.js just doesn't have them yet.
2026-03-10 14:39:25 -06:00
AJ ONeal
5d316334c8 fix(bun): baseline serves as legacy amd64, non-baseline tagged as v3 variant
Baseline builds (-baseline suffix) are plain x86_64 and match what Node.js
serves. Strip -baseline from Filename (keep in Download URL) so legacy
export sees a clean name. Non-baseline builds get Arch: x86_64_v3 and
Variants: ["v3"], excluding them from legacy output.
2026-03-10 14:19:56 -06:00
AJ ONeal
a1714e0598 update comparison after variant tagging and legacy filter
Add .tar.bz2 to classifier format detection (was slipping through
as empty format). Update COMPARISON.md with fresh results: 21 exact
matches, .deb/.rpm/.tar.zst/.tar.bz2 now correctly filtered from
legacy export. Document remaining items for review.
2026-03-10 14:04:00 -06:00
AJ ONeal
8ce911ade8 feat: legacy export filter for variants and unsupported formats
ExportLegacy now skips assets with non-empty Variants (installer,
rocm, fxdependent, etc.) and formats Node.js doesn't handle (.deb,
.rpm, .snap, .appx, .tar.zst, .tar.bz2, .7z). This ensures the
_cache/ JSON files are compatible with the legacy Node.js server.

Also fix test data to use dotted format strings (.tar.gz) matching
what the classifier actually produces.
2026-03-10 13:59:42 -06:00
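
The filter described in 8ce911ade8 can be sketched as follows. The Asset type here is a simplified stand-in and filterLegacy is a hypothetical name; the format list is the one given in this commit message, which a later commit in this log adjusts (.tar.bz2 is added to legacyFormats in 8f9cf8e487).

```go
package main

import "fmt"

// Asset is a simplified stand-in for the stored asset type.
type Asset struct {
	Filename string
	Format   string
	Variants []string
}

// legacyUnsupported lists formats the commit says Node.js does not handle.
var legacyUnsupported = map[string]bool{
	".deb":     true,
	".rpm":     true,
	".snap":    true,
	".appx":    true,
	".tar.zst": true,
	".tar.bz2": true,
	".7z":      true,
}

// filterLegacy drops assets that carry any variant tag or use a format
// outside the legacy set.
func filterLegacy(assets []Asset) []Asset {
	var kept []Asset
	for _, a := range assets {
		if len(a.Variants) > 0 || legacyUnsupported[a.Format] {
			continue
		}
		kept = append(kept, a)
	}
	return kept
}

func main() {
	assets := []Asset{
		{Filename: "tool-linux-amd64.tar.gz", Format: ".tar.gz"},
		{Filename: "tool-linux-amd64.deb", Format: ".deb"},
		{Filename: "tool-linux-amd64-rocm.tar.gz", Format: ".tar.gz", Variants: []string{"rocm"}},
	}
	for _, a := range filterLegacy(assets) {
		fmt.Println(a.Filename)
	}
}
```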
AJ ONeal
6687cad126 ref: simplify variant taggers to plain functions with switch dispatch
Drop VariantTagger interface and map-based lookup. Each per-installer
package now exports a plain TagVariants function. webicached dispatches
via a switch on package name, consistent with fetchRaw and
classifyPackage.
2026-03-10 13:54:03 -06:00
AJ ONeal
9cb9ffc4c6 ref: extract variant taggers to per-installer packages
Move variant detection logic from inline functions in webicached to
per-installer packages (internal/releases/{bun,fish,git,lsd,node,
ollama,pwsh,xcaddy}). Each exports a Tagger implementing the new
storage.VariantTagger interface. webicached uses an explicit map
of package name → tagger, no magic registration.
2026-03-10 13:35:32 -06:00
AJ ONeal
39c136caa3 feat: whitespace-delimited releases.conf, variant tagging
- Switch installerconf parser from comma to whitespace delimiters
- Add asset_exclude as alias for exclude (fixes hugo)
- Add variants key (documentation cue, detection in Go code)
- Add per-package variant taggers: bun (profile, amd64v3 arch),
  pwsh (fxdependent), ollama (rocm, jetpack5/6), git (installer),
  node (msi installer), lsd (deb, msvc), fish (pkg), xcaddy (deb)
- Update releases.conf files with variant declarations
2026-03-10 13:30:33 -06:00
AJ ONeal
d1016eb589 add Variants []string to Asset and Dist, keep Extra for version info
Extra is for version-related sort metadata (build numbers, etc.).
Variants captures build qualifiers like "rocm", "jetpack5",
"fxdependent", "installer" — things the resolver should skip by
default unless explicitly requested.

Also update format classification docs: most formats (.pkg, .deb,
.dmg, .msi) are extractable — only .exe is ambiguous and needs
the "installer" variant tag when it's not the actual binary.
2026-03-10 12:51:11 -06:00
AJ ONeal
a553b0f407 feat: add storage interface and fsstore implementation
storage.Store is the read/write interface for release asset storage.
storage.Asset uses correct terminology (Filename, Format) internally.
storage.LegacyAsset / LegacyCache preserve the Node.js wire format
("releases", "name", "ext") for backward compatibility.

fsstore writes to _cache/YYYY-MM/{pkg}.json with atomic rename,
matching the existing Node.js layout. The Node.js server can read
files written by Go and vice versa.
2026-03-10 10:53:19 -06:00
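
The atomic-rename write mentioned in a553b0f407 usually follows the pattern sketched below: write to a temp file in the destination directory, then rename it into place. This is a generic illustration using the standard library, not fsstore's implementation; writeAtomic, demo, and the example.json package name are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeAtomic writes data to a temp file next to the target and renames it
// into place, so readers never observe a partially written file.
func writeAtomic(path string, data []byte) error {
	dir := filepath.Dir(path)
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return err
	}
	tmp, err := os.CreateTemp(dir, ".tmp-*")
	if err != nil {
		return err
	}
	// Clean up on failure; after a successful rename this removal fails
	// harmlessly because the temp name no longer exists.
	defer os.Remove(tmp.Name())

	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

// demo writes one package file under a throwaway root and reads it back.
func demo() (string, error) {
	root, err := os.MkdirTemp("", "cache-demo-")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(root)

	// Layout from the commit message: _cache/YYYY-MM/{pkg}.json
	path := filepath.Join(root, "_cache", "2026-03", "example.json")
	if err := writeAtomic(path, []byte(`{"releases":[]}`)); err != nil {
		return "", err
	}
	got, err := os.ReadFile(path)
	return string(got), err
}

func main() {
	got, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(got)
}
```

Creating the temp file in the same directory as the target keeps the rename on one filesystem, which is what makes the swap atomic on POSIX systems.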
AJ ONeal
8b9d101132 ref(installerconf): make VersionPrefixes a list, not a single string
Tag conventions can change across versions of the same project
(e.g. "jq-1.7.1" → bare "1.8.0"). A comma-separated list lets
the config express all historical prefixes. The parser tries each
in order and strips the first match.

Back-compat: singular "version_prefix" still works (parsed as a
single-element list).
2026-03-10 10:46:47 -06:00
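
The "try each prefix in order and strip the first match" behavior from 8b9d101132 can be sketched like this. The function name stripVersionPrefix is hypothetical, and skipping empty prefixes is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// stripVersionPrefix removes the first prefix in the list that matches the
// tag. Tags matching no prefix are returned unchanged.
func stripVersionPrefix(tag string, prefixes []string) string {
	for _, p := range prefixes {
		if p != "" && strings.HasPrefix(tag, p) {
			return strings.TrimPrefix(tag, p)
		}
	}
	return tag
}

func main() {
	prefixes := []string{"jq-"}
	fmt.Println(stripVersionPrefix("jq-1.7.1", prefixes))
	fmt.Println(stripVersionPrefix("1.8.0", prefixes))
}
```

Both the old "jq-1.7.1" style and the newer bare "1.8.0" style yield a clean version with the same config.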
AJ ONeal
8cdc00b2d8 ref(installerconf): use typed struct instead of string map
Conf is now a plain struct with typed fields (Source, Owner, Repo,
TagPrefix, VersionPrefix, Exclude, BaseURL) instead of a generic
map[string]string with accessor methods. Unrecognized keys go into
an Extra map for forward compatibility.

Config stays flat key=value — covers the common patterns (simple
github, version prefix stripping, monorepo tag prefix, filename
exclusions). Complex cases belong in Go code, not config.
2026-03-10 10:42:37 -06:00
AJ ONeal
3626a04a48 feat: add UA analysis tool and fix uadetect gaps from live data
Add cmd/uaparse — analyzes User-Agent strings from webi.sh logs,
deduplicates by (os, arch, libc), extracts platform hints (cloud
provider, container runtime, distro), and flags malformed UAs.

Fix uadetect issues discovered by running against 2,186 live UAs:
- Msys/MINGW/Cygwin now correctly detected as Windows (was Linux)
- FreeBSD detection added
- s390x and riscv64 arch detection added
- WSL libc no longer falsely detected as MSVC ("microsoft" in kernel
  version string was triggering the MSVC check)
2026-03-10 10:24:26 -06:00
AJ ONeal
8aeda55e3b feat: add resolve package and end-to-end test
internal/resolve: picks the best release for a platform query.
Handles arch compatibility fallbacks (Rosetta 2, Windows ARM64
emulation, amd64 micro-arch levels), format preferences, variant
filtering (prefers base over rocm/jetpack GPU variants), and
universal (arch-less) binaries.

cmd/e2etest: fetches releases for goreleaser, ollama, and node,
classifies them, resolves for 9 test queries across linux/darwin/
windows x86_64/arm64, then compares against the live webi.sh API.

Results: 8/9 exact match, 1 warn where the Go resolver is more
correct than the live API (ollama arm64 base vs jetpack variant).

Edge cases fixed during development:
- .tgz is a valid archive format (not npm metadata)
- Empty arch in filename = universal binary (ranked below native)
- GPU variants (rocm, jetpack) ranked below base binaries
2026-03-10 10:09:32 -06:00
AJ ONeal
28dab7dade feat: complete classification of all 116 packages (169,867 rows)
- Add asset_filter/asset_exclude conf keys for shared-repo packages
- Split hugo/hugo-extended: exclude/require "extended" in asset name
- Add macosx, ia32, .snap, .appx classifier patterns
- Fix zig Platform.Size JSON string type (was int64, upstream sends string)
- Filter install scripts, cosign keys, compat.json as meta-assets
- Add riscv64, loong64, armv5, mipsle, mips64le to buildmeta

Full classification produces 169,867 distributable rows across 116 packages.
2026-03-10 00:27:57 -06:00
AJ ONeal
e78a721b51 fix: infer macOS from .app.zip/.dmg, filter npm tarballs and .d.ts
- .app.zip and .dmg formats now infer darwin OS when absent
- Filter .tgz (npm packages) and .d.ts (TypeScript defs) as meta-assets
- Reduces bun false positives by 64, deno by 294
2026-03-10 00:24:15 -06:00
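
The OS inference from e78a721b51 could look roughly like the sketch below. The helper name inferOS and its signature are assumptions; only the rule that .app.zip and .dmg imply darwin when no OS was detected comes from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// inferOS returns the already-detected OS when present; otherwise it infers
// "darwin" from macOS-only formats and returns "" when nothing can be
// inferred.
func inferOS(filename, detectedOS string) string {
	if detectedOS != "" {
		return detectedOS
	}
	if strings.HasSuffix(filename, ".app.zip") || strings.HasSuffix(filename, ".dmg") {
		return "darwin"
	}
	return ""
}

func main() {
	fmt.Println(inferOS("Example.app.zip", ""))
	fmt.Println(inferOS("example-1.0.dmg", ""))
	fmt.Println(inferOS("example-linux.tar.gz", "linux"))
}
```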
AJ ONeal
f7a6db53b3 fix: zig platform data lost in cache, expand classifier coverage
- Fix zig Platform.Size type: string in upstream JSON (json.Number)
- Fix zig Platforms json tag: was "-" (dropped in cache), now serializes
- Add riscv64, loong64, armv5 archs to buildmeta and classifier
- Add mipsle, mips64le arch detection patterns
- Add plan9 OS detection
- Add "mac" (word boundary) → darwin OS detection
- Add armhf → armv7, arm7 → armv7 patterns
- Infer Linux from .deb/.rpm format when OS absent
- Filter source archives and buildable-artifact meta-assets

Batch 2 tested: zig (246), flutter (2082), chromedriver (10300),
terraform (5550), julia (1783), iterm2 (262), mariadb (207), gpg (45)
serviceman/aliasman: 0 (source-only, no binary assets)
2026-03-10 00:22:33 -06:00
AJ ONeal
d398625f5d feat: add cmd/classify and improve classifier coverage
- Add cmd/classify: reads raw cached releases and produces a CSV of all
  distributables with sortable version columns (ver_major/minor/patch/pre)
- Export rawcache.ActivePath() for use by cmd/classify
- Add OS detection: openbsd, netbsd, dragonflybsd, plan9, mac→darwin
- Add arch detection: armv5, armhf→armv7, arm7→armv7, 386→x86,
  32bit/64bit (no hyphen), universal→universal2, riscv64, loong64,
  mipsle, mips64le
- Infer Linux from .deb/.rpm format when OS not in filename
- Add .deb and .rpm as recognized formats
- Normalize all per-source values to buildmeta vocabulary (x86_64, aarch64)
- Filter source archives and buildable-artifact meta-assets
- Add CAT-RULES.md tracking classifier learnings
- Add CATEGORIZED.md and LINKS.md for reference

Batch 1 tested: go, node, hugo, caddy, pathman (35,919 rows)
2026-03-10 00:17:17 -06:00
AJ ONeal
7f0c92e262 add releases.conf for all remaining packages and wire new fetchers
New fetcher packages:
- chromedist: Chrome for Testing API (googlechromelabs.github.io)
- gpgdist: SourceForge RSS for GPG macOS
- mariadbdist: MariaDB downloads REST API

New releases.conf files for:
- GitHub: aliasman, awless, duckdns.sh, hugo-extended, kubens, rg, postgres
- gittag: vim-commentary, vim-zig
- gitea: pathman
- chromedist: chromedriver
- gpgdist: gpg
- mariadbdist: mariadb
- nodedist: node

Alias support (alias_of key):
- golang → go, dashd → dashcore, psql → postgres, zig.vim → vim-zig
- Aliases skip fetching and share cache with their target

Every package with a releases.js now has a releases.conf (except the
dead macos package). fetchraw dispatches to all 13 source types.
2026-03-09 22:48:11 -06:00
AJ ONeal
990221454e add fetchers for non-GitHub release sources
New fetcher packages:
- golang: golang.org/dl/?mode=json&include=all
- zigdist: ziglang.org/download/index.json
- flutterdist: Google Storage per-OS release indexes
- iterm2dist: scrapes iterm2.com/downloads.html
- hashicorp: releases.hashicorp.com/{product}/index.json
- juliadist: julialang-s3.julialang.org/bin/versions.json

Each follows the same iter.Seq2 pattern as the existing nodedist/github
fetchers. Added releases.conf files for all six packages and wired them
into cmd/fetchraw.

Fixed latest-version detection for sources that return unordered data
(hashicorp, zigdist, juliadist) by comparing all versions with lexver
instead of taking the first stable one found.
2026-03-09 22:39:16 -06:00