vim-ale/CAT-RULES.md
AJ ONeal f7a6db53b3 fix: zig platform data lost in cache, expand classifier coverage
- Fix zig Platform.Size type: string in upstream JSON (json.Number)
- Fix zig Platforms json tag: was "-" (dropped in cache), now serializes
- Add riscv64, loong64, armv5 archs to buildmeta and classifier
- Add mipsle, mips64le arch detection patterns
- Add plan9 OS detection
- Add "mac" (word boundary) → darwin OS detection
- Add armhf → armv7, arm7 → armv7 patterns
- Infer Linux from .deb/.rpm format when OS absent
- Filter source archives and buildable-artifact meta-assets

Batch 2 tested: zig (246), flutter (2082), chromedriver (10300),
terraform (5550), julia (1783), iterm2 (262), mariadb (207), gpg (45)
serviceman/aliasman: 0 (source-only, no binary assets)
2026-03-10 00:22:33 -06:00


Classification Rules & Learnings

Tracking classifier decisions and edge cases discovered during batch processing.

Batch 1 (go, node, hugo, caddy, pathman)

Vocabulary

All values in the CSV use buildmeta canonical names:

  • Arch: x86_64 (not amd64), aarch64 (not arm64), x86 (not 386/i386)
  • Format: .tar.gz (with leading dot), matching buildmeta.Format constants
  • OS: darwin (not mac/macos), dragonfly (not dragonflybsd)
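A minimal sketch of folding raw tokens into this canonical vocabulary, assuming simple lookup tables. `canonicalArch`, `canonicalOS`, `canon`, and `canonFormat` are hypothetical names for illustration, not the actual buildmeta API.

```go
package main

import (
	"fmt"
	"strings"
)

// canonicalArch maps non-canonical arch spellings to buildmeta names.
// Hypothetical table: only the aliases named in this section.
var canonicalArch = map[string]string{
	"amd64": "x86_64",
	"arm64": "aarch64",
	"386":   "x86",
	"i386":  "x86",
}

// canonicalOS maps non-canonical OS spellings to buildmeta names.
var canonicalOS = map[string]string{
	"mac":          "darwin",
	"macos":        "darwin",
	"dragonflybsd": "dragonfly",
}

// canon lowercases v and returns its canonical form, or v unchanged
// (lowercased) when it is already canonical or unknown.
func canon(table map[string]string, v string) string {
	v = strings.ToLower(v)
	if c, ok := table[v]; ok {
		return c
	}
	return v
}

// canonFormat adds the leading dot used by buildmeta.Format values,
// e.g. "tar.gz" -> ".tar.gz". Empty input stays empty.
func canonFormat(ext string) string {
	if ext == "" || strings.HasPrefix(ext, ".") {
		return ext
	}
	return "." + ext
}

func main() {
	fmt.Println(canon(canonicalArch, "amd64"), canon(canonicalOS, "macOS"), canonFormat("tar.gz"))
	// prints: x86_64 darwin .tar.gz
}
```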

Classifier Additions

From this batch, these patterns were added to the generic classifier:

OS:

  • mac (word boundary) → darwin (caddy uses mac_amd64)
  • openbsd, netbsd, dragonfly(?:bsd)?, plan9 → new OS types
  • .deb/.rpm → infer Linux when OS undetectable from filename
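One subtlety with "word boundary": Go's RE2 `\b` treats `_` as a word character, so `\bmac\b` would not match `caddy_mac_amd64`. The sketch below assumes a separator-based boundary instead (any non-alphanumeric character) and falls back to Linux for `.deb`/`.rpm`. The pattern list and `detectOS` are hypothetical, covering only the OSes named here.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Assumed boundaries: string start/end or any non-alphanumeric character.
const (
	bStart = `(?:^|[^a-z0-9])`
	bEnd   = `(?:[^a-z0-9]|$)`
)

// osPatterns is checked in order against the lowercased filename.
var osPatterns = []struct {
	os string
	re *regexp.Regexp
}{
	{"darwin", regexp.MustCompile(bStart + `(?:darwin|macos|mac)` + bEnd)},
	{"linux", regexp.MustCompile(`linux`)},
	{"windows", regexp.MustCompile(`windows`)},
	{"freebsd", regexp.MustCompile(`freebsd`)},
	{"openbsd", regexp.MustCompile(`openbsd`)},
	{"netbsd", regexp.MustCompile(`netbsd`)},
	{"dragonfly", regexp.MustCompile(`dragonfly(?:bsd)?`)},
	{"plan9", regexp.MustCompile(`plan9`)},
}

// detectOS returns a buildmeta OS name, or "" when nothing matches.
func detectOS(filename string) string {
	name := strings.ToLower(filename)
	for _, p := range osPatterns {
		if p.re.MatchString(name) {
			return p.os
		}
	}
	// Package formats imply Linux when the filename names no OS.
	if strings.HasSuffix(name, ".deb") || strings.HasSuffix(name, ".rpm") {
		return "linux"
	}
	return ""
}

func main() {
	for _, f := range []string{
		"caddy_v1.0.0_mac_amd64.tar.gz", // darwin
		"machinist_linux_amd64.tar.gz",  // linux: "mach..." is not "mac"
		"tool_1.0_amd64.deb",            // linux, inferred from .deb
	} {
		fmt.Println(f, "->", detectOS(f))
	}
}
```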

Arch:

  • 386 (word boundary) → x86 (Go naming convention)
  • 32bit/64bit (no hyphen) → x86/x86_64 (Hugo naming)
  • arm7 → armv7 (old caddy naming)
  • armhf → armv7 (Debian convention)
  • armv5 → new arch type
  • universal → universal2 (Hugo fat binaries)
  • riscv64, loong64, mipsle, mips64le → new arch types
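Order matters for these arch rules: `x86_64` contains `x86`, so the more specific tokens must be tried first. A minimal sketch of an ordered, first-match-wins table, assuming the same separator-based boundaries as for OS detection; `archPatterns` and `detectArch` are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Assumed boundaries (RE2's \b treats '_' as a word character).
const (
	bStart = `(?:^|[^a-z0-9])`
	bEnd   = `(?:[^a-z0-9]|$)`
)

// archPatterns is checked in order; specific tokens come before generic ones.
var archPatterns = []struct {
	arch string
	re   *regexp.Regexp
}{
	{"x86_64", regexp.MustCompile(`x86_64|amd64|` + bStart + `(?:x64|64bit)` + bEnd)},
	{"aarch64", regexp.MustCompile(`aarch64|arm64`)},
	{"armv7", regexp.MustCompile(`armv7|armhf|arm7`)},
	{"armv6", regexp.MustCompile(`armv6`)},
	{"armv5", regexp.MustCompile(`armv5`)},
	{"riscv64", regexp.MustCompile(`riscv64`)},
	{"loong64", regexp.MustCompile(`loong64`)},
	{"mips64le", regexp.MustCompile(`mips64le`)},
	{"mipsle", regexp.MustCompile(`mipsle`)},
	{"x86", regexp.MustCompile(bStart + `(?:i?386|x86|32bit)` + bEnd)},
	{"universal2", regexp.MustCompile(`universal`)},
}

// detectArch returns a buildmeta arch name, or "" when none is found.
func detectArch(filename string) string {
	name := strings.ToLower(filename)
	for _, p := range archPatterns {
		if p.re.MatchString(name) {
			return p.arch
		}
	}
	return ""
}

func main() {
	for _, f := range []string{
		"go1.23.6.linux-386.tar.gz",   // x86 (Go naming)
		"hugo_0.1_Linux-64bit.tar.gz", // x86_64 (Hugo naming)
		"caddy_linux_arm7.tar.gz",     // armv7 (old caddy naming)
	} {
		fmt.Println(f, "->", detectArch(f))
	}
}
```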

Per-Source Normalizers

Each upstream API uses different naming. Normalizers convert to buildmeta vocabulary:

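Source    | 64-bit x86 ("amd64") | 64-bit ARM ("arm64") | 32-bit ARM ("arm")
--------- | -------------------- | -------------------- | ------------------
Go API    | amd64 → x86_64       | arm64 → aarch64      | -
Node dist | x64 → x86_64         | arm64 → aarch64      | -
Zig       | x86_64 (same)        | aarch64 (same)       | -
HashiCorp | amd64 → x86_64       | arm64 → aarch64      | arm → armv6
Julia     | x86_64 (same)        | aarch64 (same)       | -
Chrome    | x64 → x86_64         | arm64 → aarch64      | -
MariaDB   | x86_64 (same)        | aarch64 (same)       | -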
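The normalizer table above can be sketched as one lookup map per source, so that source-specific rules like HashiCorp's `arm → armv6` stay scoped to that source instead of leaking into the generic classifier. `sourceArch` and `normalizeArch` are hypothetical names, and a "-" cell is treated here as "no mapping defined".

```go
package main

import "fmt"

// sourceArch holds one normalizer table per upstream source, transcribed
// from the table above. Sources already using buildmeta names need none.
var sourceArch = map[string]map[string]string{
	"go":        {"amd64": "x86_64", "arm64": "aarch64"},
	"node":      {"x64": "x86_64", "arm64": "aarch64"},
	"zig":       {},
	"hashicorp": {"amd64": "x86_64", "arm64": "aarch64", "arm": "armv6"},
	"julia":     {},
	"chrome":    {"x64": "x86_64", "arm64": "aarch64"},
	"mariadb":   {},
}

// normalizeArch maps a source's raw arch token to buildmeta vocabulary,
// returning the token unchanged if the source has no mapping for it.
func normalizeArch(source, raw string) string {
	if table, ok := sourceArch[source]; ok {
		if canonical, ok := table[raw]; ok {
			return canonical
		}
	}
	return raw
}

func main() {
	fmt.Println(normalizeArch("hashicorp", "arm")) // armv6
	fmt.Println(normalizeArch("go", "arm"))        // arm (no Go mapping)
}
```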

Meta-Asset Filtering

Skipped patterns:

  • Checksums: .sha256, .sha512, .md5, checksums.txt
  • Signatures, certificates, and Sigstore bundles: .sig, .asc, .pem, .sigstore
  • SBOMs: .sbom, .spdx
  • Source archives: _src.tar.gz
  • buildable-artifact meta-assets
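A minimal sketch of this filter as suffix matching on the lowercased asset name. `metaSuffixes` and `isMetaAsset` are hypothetical names, and matching `buildable-artifact` anywhere in the name is an assumption about how that meta-asset is named.

```go
package main

import (
	"fmt"
	"strings"
)

// metaSuffixes lists the skipped patterns from this section as suffixes.
var metaSuffixes = []string{
	".sha256", ".sha512", ".md5", "checksums.txt", // checksums
	".sig", ".asc", ".pem", ".sigstore", // signatures, certificates, Sigstore bundles
	".sbom", ".spdx", // SBOMs
	"_src.tar.gz", // source archives
}

// isMetaAsset reports whether a release asset should be skipped rather
// than classified as a distributable.
func isMetaAsset(name string) bool {
	n := strings.ToLower(name)
	// Assumption: "buildable-artifact" may appear anywhere in the name.
	if strings.Contains(n, "buildable-artifact") {
		return true
	}
	for _, s := range metaSuffixes {
		if strings.HasSuffix(n, s) {
			return true
		}
	}
	return false
}

func main() {
	for _, f := range []string{"hugo_0.1_checksums.txt", "tool_1.0_src.tar.gz", "tool_linux_amd64.tar.gz"} {
		fmt.Println(f, isMetaAsset(f))
	}
}
```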

Edge Cases (Accepted)

  • Caddy v2 beta bare binaries (e.g. caddy2_beta12_macos): no arch in the filename, so the arch column is left empty
  • Hugo macOS-all: effectively universal, but only 2 files exist, so not worth special-casing
  • Hugo extended editions: detectable via "extended" in the filename; intended for the extra column (TODO: not yet implemented)
  • Node "odd major = beta" heuristic: odd majors (v15, v17, v19, v21, v23) are "Current" releases that never become LTS
  • Go version prefix: the go prefix is stripped (go1.23.6 → 1.23.6) for clean parsing
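The last two edge cases reduce to small string operations. A sketch assuming tags shaped like `go1.23.6` and `v21.7.1`; `goVersion` and `nodeIsBeta` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// goVersion strips Go's "go" tag prefix: "go1.23.6" -> "1.23.6".
func goVersion(tag string) string {
	return strings.TrimPrefix(tag, "go")
}

// nodeIsBeta applies the "odd major = beta" heuristic to a Node version
// such as "v21.7.1". ok is false if the major version cannot be parsed.
func nodeIsBeta(version string) (beta bool, ok bool) {
	major, _, _ := strings.Cut(strings.TrimPrefix(version, "v"), ".")
	n, err := strconv.Atoi(major)
	if err != nil {
		return false, false
	}
	return n%2 == 1, true
}

func main() {
	fmt.Println(goVersion("go1.23.6")) // 1.23.6
	beta, _ := nodeIsBeta("v21.7.1")
	fmt.Println(beta) // true
}
```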

Batch 2 (zig, flutter, chromedriver, terraform, julia, iterm2, mariadb, gpg, serviceman, aliasman)

Zig Fetcher Fix

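The Zig upstream API returns "size" as a JSON string, not a number, so unmarshalling into int64 failed. Changed Platform.Size from int64 to json.Number to fix this. Also changed the Platforms tag from json:"-" (which makes encoding/json skip the field, so platform data was silently dropped from the cache) to json:"platforms,omitempty" so platform data survives the cache round-trip.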
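Both halves of this fix can be checked with encoding/json alone. In the sketch below, the struct shapes are assumptions modeled on Zig's download index (`tarball`, `shasum`, `size`), `Release` is a hypothetical cache record, and the size value is illustrative. A JSON string holding a valid number decodes into json.Number, and the `platforms` tag keeps the map through Marshal/Unmarshal.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Platform is a sketch of one per-target entry. Size is json.Number because
// upstream sends it as a JSON string, which an int64 field rejects.
type Platform struct {
	Tarball string      `json:"tarball"`
	Shasum  string      `json:"shasum"`
	Size    json.Number `json:"size"`
}

// Release is a hypothetical cached record. With `json:"-"`, json.Marshal
// would skip Platforms entirely and the cache would lose it.
type Release struct {
	Version   string              `json:"version"`
	Platforms map[string]Platform `json:"platforms,omitempty"`
}

// decodePlatform parses one upstream platform object.
func decodePlatform(raw []byte) (Platform, error) {
	var p Platform
	err := json.Unmarshal(raw, &p)
	return p, err
}

// cacheRoundTrip simulates writing a Release to the cache and reading it back.
func cacheRoundTrip(r Release) (Release, error) {
	data, err := json.Marshal(r)
	if err != nil {
		return Release{}, err
	}
	var out Release
	err = json.Unmarshal(data, &out)
	return out, err
}

func main() {
	p, err := decodePlatform([]byte(`{"tarball":"zig.tar.xz","shasum":"abc","size":"123456"}`))
	if err != nil {
		panic(err)
	}
	r, err := cacheRoundTrip(Release{Version: "0.11.0", Platforms: map[string]Platform{"x86_64-linux": p}})
	if err != nil {
		panic(err)
	}
	fmt.Println(len(r.Platforms), r.Platforms["x86_64-linux"].Size) // 1 123456
}
```

Note that json.Number is written back as an unquoted number, so the cached form is cleaner than upstream's while still decoding into the same field.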

Source-Only Packages

serviceman and aliasman have GitHub releases with empty asset lists (assets: []). These are source-only repos that install via go install or a script download, not via binary releases. The classifier correctly produces 0 distributables for them; they don't belong in the binary CSV.

Flutter Arch Detection

Early Flutter releases (pre-2020) had no arch-specific builds: each platform shipped a single SDK archive. With no arch in the filename, the CSV arch column is left empty. This is correct; the installer would default to x86_64 on supported platforms.

TODO for Next Batches

  • Capture the Hugo "extended" variant in the extra column
  • Consider whether bare binaries (no format extension) should get a format marker
  • Track _extended suffix detection more broadly
  • arm32 is vague — may mean armv6 or armv7. Leave as per-installer responsibility unless a distinct pattern emerges (user direction 2026-03-10)