Commit Graph

33 Commits

Author SHA1 Message Date
AJ ONeal
a553b0f407 feat: add storage interface and fsstore implementation
storage.Store is the read/write interface for release asset storage.
storage.Asset uses correct terminology (Filename, Format) internally.
storage.LegacyAsset / LegacyCache preserve the Node.js wire format
("releases", "name", "ext") for backward compatibility.

fsstore writes to _cache/YYYY-MM/{pkg}.json with atomic rename,
matching the existing Node.js layout. The Node.js server can read
files written by Go and vice versa.
2026-03-10 10:53:19 -06:00
AJ ONeal
8b9d101132 ref(installerconf): make VersionPrefixes a list, not a single string
Tag conventions can change across versions of the same project
(e.g. "jq-1.7.1" → bare "1.8.0"). A comma-separated list lets
the config express all historical prefixes. The parser tries each
in order and strips the first match.

Back-compat: singular "version_prefix" still works (parsed as a
single-element list).
2026-03-10 10:46:47 -06:00
AJ ONeal
8cdc00b2d8 ref(installerconf): use typed struct instead of string map
Conf is now a plain struct with typed fields (Source, Owner, Repo,
TagPrefix, VersionPrefix, Exclude, BaseURL) instead of a generic
map[string]string with accessor methods. Unrecognized keys go into
an Extra map for forward compatibility.

Config stays flat key=value, which covers the common patterns (simple
github, version prefix stripping, monorepo tag prefix, filename
exclusions). Complex cases belong in Go code, not config.
2026-03-10 10:42:37 -06:00
AJ ONeal
3626a04a48 feat: add UA analysis tool and fix uadetect gaps from live data
Add cmd/uaparse — analyzes User-Agent strings from webi.sh logs,
deduplicates by (os, arch, libc), extracts platform hints (cloud
provider, container runtime, distro), and flags malformed UAs.

Fix uadetect issues discovered by running against 2,186 live UAs:
- Msys/MINGW/Cygwin now correctly detected as Windows (was Linux)
- FreeBSD detection added
- s390x and riscv64 arch detection added
- WSL libc no longer falsely detected as MSVC ("microsoft" in kernel
  version string was triggering the MSVC check)
2026-03-10 10:24:26 -06:00
AJ ONeal
8aeda55e3b feat: add resolve package and end-to-end test
internal/resolve: picks the best release for a platform query.
Handles arch compatibility fallbacks (Rosetta 2, Windows ARM64
emulation, amd64 micro-arch levels), format preferences, variant
filtering (prefers base over rocm/jetpack GPU variants), and
universal (arch-less) binaries.

cmd/e2etest: fetches releases for goreleaser, ollama, and node,
classifies them, resolves for 9 test queries across linux/darwin/
windows x86_64/arm64, then compares against the live webi.sh API.

Results: 8/9 exact matches, 1 warning where the Go resolver is more
correct than the live API (ollama arm64 base vs jetpack variant).

Edge cases fixed during development:
- .tgz is a valid archive format (not npm metadata)
- Empty arch in filename = universal binary (ranked below native)
- GPU variants (rocm, jetpack) ranked below base binaries
2026-03-10 10:09:32 -06:00
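The ranking rules from 8aeda55e3b can be expressed as a single sort comparator. Those rules are: arch fit, base builds over GPU variants, format preference, and universal binaries below native. The sketch below is illustrative only. The `Asset` fields, the arch spellings, and the choice to place universal binaries directly below native (above emulated arches) are assumptions, not the resolve package's actual ordering.

```go
package main

import (
	"fmt"
	"slices"
)

type Asset struct {
	Name    string
	Arch    string // "" means universal / arch-less
	Variant string // "" for base builds, e.g. "rocm" or "jetpack" otherwise
	Format  string // e.g. ".tar.gz", ".zip"
}

// archRank returns 0 for native, 1 for universal (assumed just below native),
// and 2+i for the i-th compat fallback. It returns -1 if the arch cannot run.
func archRank(arch, native string, compat []string) int {
	switch {
	case arch == native:
		return 0
	case arch == "":
		return 1
	}
	if i := slices.Index(compat, arch); i >= 0 {
		return 2 + i
	}
	return -1
}

// pick returns the best runnable asset. It prefers better arch fit first,
// then base builds over GPU variants, then the earliest entry in formatPrefs.
func pick(assets []Asset, native string, compat, formatPrefs []string) (Asset, bool) {
	var candidates []Asset
	for _, a := range assets {
		if archRank(a.Arch, native, compat) >= 0 && slices.Contains(formatPrefs, a.Format) {
			candidates = append(candidates, a)
		}
	}
	if len(candidates) == 0 {
		return Asset{}, false
	}
	variantRank := func(a Asset) int {
		if a.Variant == "" {
			return 0
		}
		return 1
	}
	slices.SortStableFunc(candidates, func(x, y Asset) int {
		if d := archRank(x.Arch, native, compat) - archRank(y.Arch, native, compat); d != 0 {
			return d
		}
		if d := variantRank(x) - variantRank(y); d != 0 {
			return d
		}
		return slices.Index(formatPrefs, x.Format) - slices.Index(formatPrefs, y.Format)
	})
	return candidates[0], true
}

func main() {
	// Hypothetical assets modeled on the ollama case: same arch, base vs GPU variant.
	assets := []Asset{
		{Name: "app-linux-arm64-jetpack6.tgz", Arch: "aarch64", Variant: "jetpack", Format: ".tgz"},
		{Name: "app-linux-arm64.tgz", Arch: "aarch64", Format: ".tgz"},
		{Name: "app-linux-amd64.tgz", Arch: "x86_64", Format: ".tgz"},
	}
	best, ok := pick(assets, "aarch64", nil, []string{".tar.gz", ".tgz"})
	fmt.Println(best.Name, ok) // app-linux-arm64.tgz true
}
```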
AJ ONeal
28dab7dade feat: complete classification of all 116 packages (169,867 rows)
- Add asset_filter/asset_exclude conf keys for shared-repo packages
- Split hugo/hugo-extended: exclude/require "extended" in asset name
- Add macosx, ia32, .snap, .appx classifier patterns
- Fix zig Platform.Size JSON string type (was int64, upstream sends string)
- Filter install scripts, cosign keys, compat.json as meta-assets
- Add riscv64, loong64, armv5, mipsle, mips64le to buildmeta

Full classification produces 169,867 distributable rows across 116 packages.
2026-03-10 00:27:57 -06:00
AJ ONeal
e78a721b51 fix: infer macOS from .app.zip/.dmg, filter npm tarballs and .d.ts
- .app.zip and .dmg formats now infer darwin OS when absent
- Filter .tgz (npm packages) and .d.ts (TypeScript defs) as meta-assets
- Reduces bun false positives by 64, deno by 294
2026-03-10 00:24:15 -06:00
AJ ONeal
f7a6db53b3 fix: zig platform data lost in cache, expand classifier coverage
- Fix zig Platform.Size type: string in upstream JSON (json.Number)
- Fix zig Platforms json tag: was "-" (dropped in cache), now serializes
- Add riscv64, loong64, armv5 archs to buildmeta and classifier
- Add mipsle, mips64le arch detection patterns
- Add plan9 OS detection
- Add "mac" (word boundary) → darwin OS detection
- Add armhf → armv7, arm7 → armv7 patterns
- Infer Linux from .deb/.rpm format when OS absent
- Filter source archives and buildable-artifact meta-assets

Batch 2 tested: zig (246), flutter (2082), chromedriver (10300),
terraform (5550), julia (1783), iterm2 (262), mariadb (207), gpg (45)
serviceman/aliasman: 0 (source-only, no binary assets)
2026-03-10 00:22:33 -06:00
AJ ONeal
d398625f5d feat: add cmd/classify and improve classifier coverage
- Add cmd/classify: reads raw cached releases and produces a CSV of all
  distributables with sortable version columns (ver_major/minor/patch/pre)
- Export rawcache.ActivePath() for use by cmd/classify
- Add OS detection: openbsd, netbsd, dragonflybsd, plan9, mac→darwin
- Add arch detection: armv5, armhf→armv7, arm7→armv7, 386→x86,
  32bit/64bit (no hyphen), universal→universal2, riscv64, loong64,
  mipsle, mips64le
- Infer Linux from .deb/.rpm format when OS not in filename
- Add .deb and .rpm as recognized formats
- Normalize all per-source values to buildmeta vocabulary (x86_64, aarch64)
- Filter source archives and buildable-artifact meta-assets
- Add CAT-RULES.md tracking classifier learnings
- Add CATEGORIZED.md and LINKS.md for reference

Batch 1 tested: go, node, hugo, caddy, pathman (35,919 rows)
2026-03-10 00:17:17 -06:00
AJ ONeal
7f0c92e262 add releases.conf for all remaining packages and wire new fetchers
New fetcher packages:
- chromedist: Chrome for Testing API (googlechromelabs.github.io)
- gpgdist: SourceForge RSS for GPG macOS
- mariadbdist: MariaDB downloads REST API

New releases.conf files for:
- GitHub: aliasman, awless, duckdns.sh, hugo-extended, kubens, rg, postgres
- gittag: vim-commentary, vim-zig
- gitea: pathman
- chromedist: chromedriver
- gpgdist: gpg
- mariadbdist: mariadb
- nodedist: node

Alias support (alias_of key):
- golang → go, dashd → dashcore, psql → postgres, zig.vim → vim-zig
- Aliases skip fetching and share cache with their target

Every package with a releases.js now has a releases.conf (except the
dead macos package). fetchraw dispatches to all 13 source types.
2026-03-09 22:48:11 -06:00
AJ ONeal
990221454e add fetchers for non-GitHub release sources
New fetcher packages:
- golang: golang.org/dl/?mode=json&include=all
- zigdist: ziglang.org/download/index.json
- flutterdist: Google Storage per-OS release indexes
- iterm2dist: scrapes iterm2.com/downloads.html
- hashicorp: releases.hashicorp.com/{product}/index.json
- juliadist: julialang-s3.julialang.org/bin/versions.json

Each follows the same iter.Seq2 pattern as the existing nodedist/github
fetchers. Added releases.conf files for all six packages and wired them
into cmd/fetchraw.

Fixed latest-version detection for sources that return unordered data
(hashicorp, zigdist, juliadist) by comparing all versions with lexver
instead of taking the first stable one found.
2026-03-09 22:39:16 -06:00
AJ ONeal
b98cbc975c feat: add releases.conf files and installerconf parser
Simple key=value config per package declaring the fetch source and
its parameters. Greppable, no dependencies needed to parse.

  grep 'source = github' */releases.conf
  grep 'owner = therootcompany' */releases.conf

70 packages configured. installerconf package provides the reader.
fetchraw will be updated to read these instead of a hardcoded list.
2026-03-09 22:27:26 -06:00
AJ ONeal
69a23f3592 feat: add audit log, merge strategy, and all GitHub packages
- rawcache: add Merge() that skips unchanged releases, logs added/
  changed events to an append-only JSONL audit log with SHA-256
- rawcache: drop .json extension from filenames — raw cache stores
  opaque bytes (upstream may be JSON, CSV, XML, or bespoke)
- fetchraw: add all 68 GitHub packages, use Merge instead of Put
- fetchraw: log format shows +added ~changed =skipped
2026-03-09 22:19:11 -06:00
AJ ONeal
5dba2de20b feat(buildmeta): add CompatArches and universal binary arch types
CompatArches returns what a given OS+arch can execute — OS-level
facts like Rosetta 2 (darwin arm64 runs x86_64), Windows ARM
emulation, and x86-64 micro-arch backward compat. Also adds
ArchUniversal1 (PPC+x86) and ArchUniversal2 (x86_64+ARM64).

Per-package/per-version overrides (libc compat, nonstandard naming)
remain the installer config's responsibility.
2026-03-09 21:57:43 -06:00
AJ ONeal
1253fcd671 ref: remove universal fallback chains from buildmeta and platlatest
Arch and libc fallbacks are not universal — they depend on the OS,
the package, and even the version. ARM64 on macOS/Windows can run
x64 (Rosetta/emulation) but not on Linux. Musl can be static or
dynamically linked depending on the package version. Windows GNU
may or may not need mingw.

These rules belong in per-installer config, not in shared types.
platlatest stays as a simple fact store (triplet → version).
Resolution with fallbacks will be the caller's job.
2026-03-09 21:50:10 -06:00
AJ ONeal
34cfe32492 feat: add arch/libc fallback chains and version waterfall resolution
Prefer latest version over best CPU match. An amd64v4 machine gets
v2.0.0 (baseline only) instead of v1.0.0 (which had a v4 build)
because recency beats specificity.

- buildmeta: add amd64v2/v3/v4 micro-levels, ArchFallbacks, LibcFallbacks
- classify: detect micro-arch levels, treat Windows "arm" as ARM64
- platlatest: add Resolve() that walks fallback chains picking newest
2026-03-09 21:44:06 -06:00
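The "recency beats specificity" rule from 34cfe32492 follows from loop order. The outer loop walks versions newest first, and only within a version does the inner loop walk the arch list best first. This is a minimal sketch with hypothetical `Build` and `resolve` names. The caller passes in the compat list, which is also where 1253fcd671 later placed fallback decisions.

```go
package main

import "fmt"

// Build is one downloadable target for a release.
type Build struct {
	Version string
	Arch    string
}

// resolve returns the newest version that has any runnable build, choosing
// the best arch within that version. versions must be sorted newest first,
// and compat must be ordered best first.
func resolve(versions []string, builds []Build, compat []string) (Build, bool) {
	for _, v := range versions {
		for _, arch := range compat {
			for _, b := range builds {
				if b.Version == v && b.Arch == arch {
					return b, true
				}
			}
		}
	}
	return Build{}, false
}

func main() {
	builds := []Build{{"v1.0.0", "x86_64_v4"}, {"v1.0.0", "x86_64"}, {"v2.0.0", "x86_64"}}
	compat := []string{"x86_64_v4", "x86_64_v3", "x86_64_v2", "x86_64"}
	b, _ := resolve([]string{"v2.0.0", "v1.0.0"}, builds, compat)
	fmt.Println(b.Version, b.Arch) // v2.0.0 x86_64
}
```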
AJ ONeal
1e26a3e5ec feat: add classify and platlatest packages
classify extracts OS, arch, libc, and format from release asset
filenames using regex pattern matching with priority ordering
(x86_64 before x86, arm64 before armv7, etc.).

platlatest tracks the newest release version per build target
(OS+arch+libc triplet) to handle the common case where Windows
or macOS releases lag behind Linux by several versions.
2026-03-09 21:33:59 -06:00
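Priority ordering in 1e26a3e5ec matters because some patterns are substrings of others: `x86` also matches inside `x86_64`. The sketch below shows first-match-wins classification over an ordered slice. The patterns and output names are illustrative, not the classify package's table, and the filenames are made-up examples.

```go
package main

import (
	"fmt"
	"regexp"
)

// archPatterns is checked in order and the first match wins, so more
// specific names come before patterns that would also match them.
var archPatterns = []struct {
	arch string
	re   *regexp.Regexp
}{
	{"x86_64", regexp.MustCompile(`(?i)(x86_64|amd64|x64)`)},
	{"x86", regexp.MustCompile(`(?i)(x86|i386|i686|386)`)},
	{"aarch64", regexp.MustCompile(`(?i)(aarch64|arm64)`)},
	{"armv7", regexp.MustCompile(`(?i)(armv7|armhf)`)},
}

// classifyArch returns "" when no arch appears in the name, which the
// resolver may treat as a universal binary.
func classifyArch(filename string) string {
	for _, p := range archPatterns {
		if p.re.MatchString(filename) {
			return p.arch
		}
	}
	return ""
}

func main() {
	for _, name := range []string{"app-x86_64-linux.tar.gz", "app-i686-windows.zip", "app.tar.gz"} {
		fmt.Printf("%s → %q\n", name, classifyArch(name))
	}
}
```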
AJ ONeal
ae39837145 feat(rawcache): add double-buffered raw release cache
Stores one JSON file per release, named by tag. Supports:
- Incremental updates: atomic writes to the active slot
- Full refresh: write to standby slot, atomic symlink swap
- O(1) existence check and latest-tag lookup
2026-03-09 21:28:03 -06:00
AJ ONeal
574e5be929 feat(releases): add source archive fetchers for GitHub, Gitea, GitLab
For packages installed from auto-generated source tarballs rather
than uploaded binary assets (shell scripts, vim plugins, etc.).
Each delegates to its respective forge fetcher — the distinction
is organizational, signaling which fields the consumer should use.
2026-03-09 21:10:18 -06:00
AJ ONeal
e1bd6bb82f ref(gitea): rewrite as standalone fetcher, not a githubish wrapper
Gitea's API is similar to GitHub's but not identical (different URL
prefix, limit vs per_page, token auth header). Give it its own types
and pagination logic rather than coupling through githubish.
2026-03-09 21:06:58 -06:00
AJ ONeal
fd9d5ca080 feat(releases): add GitLab release fetcher
GitLab's API differs from GitHub: different URL pattern
(/api/v4/projects/:id/releases), nested asset structure
(sources + links), page/per_page pagination with X-Total-Pages
header, and PRIVATE-TOKEN auth.
2026-03-09 21:05:51 -06:00
AJ ONeal
6576ca65b6 feat(githubish): add TarballURL and ZipballURL to Release
Some packages (shell scripts, vim plugins) use the auto-generated
source archives rather than uploaded binary assets. These URLs are
already in the API response; they just needed to be deserialized.
2026-03-09 20:57:01 -06:00
AJ ONeal
1116dd3935 feat(releases): add Gitea and git-tag fetchers
gitea: thin wrapper over githubish that appends /api/v1 to the base URL.

gittag: clones/fetches a bare repo, lists version-like tags with
commit metadata, includes HEAD. For packages installed by cloning
(vim plugins, shell scripts) rather than downloading binaries.
2026-03-09 20:55:32 -06:00
AJ ONeal
befb1fb425 feat(releases): add GitHub-compatible release fetcher with pagination
githubish: generic fetcher for any GitHub-compatible API (GitHub,
Gitea, Forgejo). Paginates via Link headers, supports Bearer auth.
Returns raw API data with no transformation.

github: thin wrapper that sets the base URL to api.github.com.
2026-03-08 23:20:39 -06:00
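Link-header pagination in befb1fb425 means following the `rel="next"` URL from each response until no such link remains. Below is a simplified, hypothetical `nextLink` parser. It assumes URLs contain no commas, which holds for typical GitHub API URLs. A complete parser would follow RFC 8288.

```go
package main

import (
	"fmt"
	"strings"
)

// nextLink extracts the rel="next" URL from a Link header such as
//
//	<https://api.github.com/...?page=2>; rel="next", <https://...?page=5>; rel="last"
//
// It returns "" on the last page.
func nextLink(header string) string {
	for _, part := range strings.Split(header, ",") {
		segs := strings.Split(part, ";")
		if len(segs) < 2 {
			continue
		}
		url := strings.Trim(strings.TrimSpace(segs[0]), "<>")
		for _, param := range segs[1:] {
			if strings.TrimSpace(param) == `rel="next"` {
				return url
			}
		}
	}
	return ""
}

func main() {
	h := `<https://api.github.com/repositories/1/releases?page=2>; rel="next", ` +
		`<https://api.github.com/repositories/1/releases?page=5>; rel="last"`
	fmt.Println(nextLink(h)) // https://api.github.com/repositories/1/releases?page=2
}
```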
AJ ONeal
b7e3fe69ad feat(releases): add Node.js distribution fetchers
nodedist: generic fetcher for any Node.js-style dist index.json API.
Returns raw API entries with no transformation or normalization.
Uses iter.Seq2 for a paginated interface consistent across sources.

node: calls nodedist twice — official builds and unofficial builds
(musl, loong64, etc.) — yielding one batch per source.
2026-03-08 23:12:54 -06:00
AJ ONeal
4f3bdd7d58 feat(uadetect): add FromRequest for full agent detection
The user agent identifies itself through multiple signals — the
User-Agent header and query parameters (?os, ?arch). FromRequest
unifies both, with explicit query params taking precedence.
2026-03-08 22:58:59 -06:00
AJ ONeal
43ab591061 ref(internal): rewrite buildmeta, uadetect, httpclient from scratch
buildmeta: remove premature Release/PackageMeta structs and
ChannelNames slice — keep only the shared vocabulary types.

uadetect: replace regex-based matching with token-based matching.
Split UA on whitespace/slash/semicolon, match lowercase tokens.
Strip xnu kernel info for Rosetta. Single Parse() entry point.

httpclient: return plain *http.Client from New(). Make Do() and Get()
free functions. Only retry idempotent methods (GET/HEAD).
2026-03-08 22:58:58 -06:00
AJ ONeal
1374bca46b feat(lexver): add Original field and ExtraSort tiebreaker
Original preserves the upstream tag as the releaser published it (e.g.
"REL_17_0"), while Raw holds Webi's normalized form ("17.0").

ExtraSort is an opaque string for package-specific ordering where Nums
alone can't capture sort order (e.g. flutter "2.3.0-16.0.pre"). Set by
release-fetcher code using zero-padded strings or whatever works for
that package.
2026-03-08 22:58:58 -06:00
AJ ONeal
c1a40cebf3 ref(lexver): support arbitrary version depth, use time.Time for dates
Versions aren't always semver — chromedriver uses 4 parts, gpg uses 4
parts, atomicparsley uses dates. Replaced fixed Major/Minor/Patch/Build
fields with a Nums slice. Date is now time.Time for minute-level
precision.

Also adds ODDITIES.md cataloging non-standard version formats across
Webi packages for future reference.
2026-03-08 22:58:58 -06:00
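With a `Nums` slice (c1a40cebf3), comparison becomes an element-wise walk of any depth, which handles 4-part versions such as chromedriver's. The following is a pared-down sketch. `Version`, `parse`, and `Compare` are hypothetical stand-ins for lexver. The sketch treats missing trailing parts as 0 and sorts a pre-release before its stable release, per c2c54e54ea. Pre-release tags compare as plain strings, so `rc10` sorts before `rc2`. The real package may differ on all of these points.

```go
package main

import (
	"cmp"
	"fmt"
	"slices"
	"strconv"
	"strings"
)

// Version keeps dotted numbers of any depth. Pre is "" for a stable release.
type Version struct {
	Nums []int
	Pre  string
}

// parse handles "1.2.3" and "1.2.3-rc1". It assumes any tag prefix such as "v"
// was already stripped. Non-numeric parts become 0.
func parse(s string) Version {
	core, pre, _ := strings.Cut(s, "-")
	v := Version{Pre: pre}
	for _, p := range strings.Split(core, ".") {
		n, _ := strconv.Atoi(p)
		v.Nums = append(v.Nums, n)
	}
	return v
}

// at returns nums[i], or 0 past the end, so 1.2 compares equal to 1.2.0.
func at(nums []int, i int) int {
	if i < len(nums) {
		return nums[i]
	}
	return 0
}

// Compare orders by Nums, then puts a pre-release before its stable release.
func Compare(a, b Version) int {
	for i := 0; i < max(len(a.Nums), len(b.Nums)); i++ {
		if c := cmp.Compare(at(a.Nums, i), at(b.Nums, i)); c != 0 {
			return c
		}
	}
	switch {
	case a.Pre == b.Pre:
		return 0
	case a.Pre == "":
		return 1
	case b.Pre == "":
		return -1
	}
	return strings.Compare(a.Pre, b.Pre)
}

func main() {
	vs := []string{"121.0.6167.184", "2.0.0", "121.0.6167.85", "2.0.0-rc1"}
	slices.SortFunc(vs, func(a, b string) int { return Compare(parse(a), parse(b)) })
	fmt.Println(vs) // [2.0.0-rc1 2.0.0 121.0.6167.85 121.0.6167.184]
}
```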
AJ ONeal
66f9f5f5fe ref(lexver): replace Build field with Date for tiebreaking
Build is often a hash with no ordering meaning. When release dates are
known, they're a better tiebreaker for versions with the same
major.minor.patch. Date is only used when both sides have one.
2026-03-08 22:58:58 -06:00
AJ ONeal
c2c54e54ea ref(lexver): rewrite as structured type with Compare function
Instead of encoding versions as padded strings (a JS-ism), parse into a
Version struct and compare fields directly. Pre-releases sort before
their corresponding stable release via channel comparison.
2026-03-08 22:58:58 -06:00
AJ ONeal
ed377c93c6 docs: rewrite package comments to focus on what/why, not how
Each package doc now explains what problem it solves and why it exists,
with the public interface as the only "how" detail. Implementation
notes removed from doc comments.
2026-03-08 22:58:58 -06:00
AJ ONeal
cf9dd4d2e2 feat: add Phase 0 foundation packages for Go rewrite
- internal/buildmeta: canonical constants for OS, arch, libc, format, channel
- internal/lexver: version string → lexicographically sortable string
- internal/uadetect: User-Agent → OS/arch/libc detection
- internal/httpclient: resilient net/http client with retry and backoff
- go.mod: initialize module (stdlib only, no dependencies)
2026-03-08 22:58:58 -06:00