- Add asset_filter/asset_exclude conf keys for shared-repo packages - Split hugo/hugo-extended: exclude/require "extended" in asset name - Add macosx, ia32, .snap, .appx classifier patterns - Fix zig Platform.Size JSON string type (was int64, upstream sends string) - Filter install scripts, cosign keys, compat.json as meta-assets - Add riscv64, loong64, armv5, mipsle, mips64le to buildmeta Full classification produces 169,867 distributable rows across 116 packages.
5.1 KiB
Classification Rules & Learnings
Tracking classifier decisions and edge cases discovered during batch processing.
Batch 1 (go, node, hugo, caddy, pathman)
Vocabulary
All values in the CSV use buildmeta canonical names:
- Arch:
x86_64(not amd64),aarch64(not arm64),x86(not 386/i386) - Format:
.tar.gz(with leading dot), matching buildmeta.Format constants - OS:
darwin(not mac/macos),dragonfly(not dragonflybsd)
Classifier Additions
From this batch, these patterns were added to the generic classifier:
OS:
mac(word boundary) → darwin (caddy usesmac_amd64)openbsd,netbsd,dragonfly(?:bsd)?,plan9→ new OS types.deb/.rpm→ infer Linux when OS undetectable from filename
Arch:
386(word boundary) → x86 (Go naming convention)32bit/64bit(no hyphen) → x86/x86_64 (Hugo naming)arm7→ armv7 (old caddy naming)armhf→ armv7 (Debian convention)armv5→ new arch typeuniversal→ universal2 (Hugo fat binaries)riscv64,loong64,mipsle,mips64le→ new arch types
Per-Source Normalizers
Each upstream API uses different naming. Normalizers convert to buildmeta vocabulary:
| Source | "amd64" | "arm64" | "arm" |
|---|---|---|---|
| Go API | amd64→x86_64 | arm64→aarch64 | - |
| Node dist | x64→x86_64 | arm64→aarch64 | - |
| Zig | x86_64 (same) | aarch64 (same) | - |
| HashiCorp | amd64→x86_64 | arm64→aarch64 | arm→armv6 |
| Julia | x86_64 (same) | aarch64 (same) | - |
| Chrome | x64→x86_64 | arm64→aarch64 | - |
| MariaDB | x86_64 (same) | aarch64 (same) | - |
Meta-Asset Filtering
Skipped patterns: checksums (.sha256, .sha512, .md5, checksums.txt),
signatures (.sig, .asc, .pem), SBOMs (.sbom, .spdx, .sigstore),
source archives (_src.tar.gz), and buildable-artifact.
Edge Cases (Accepted)
- Caddy v2 beta bare binaries (
caddy2_beta12_macos) — no arch in filename, shows empty - Hugo
macOS-all— means universal but only 2 files, not worth special-casing - Hugo extended editions — detected via
extendedin filename, tracked inextracolumn? (TODO: not yet) - Node "odd major = beta" heuristic — v15, v17, v19, v21, v23 are "current" not LTS
- Go version prefix: stripped
gofromgo1.23.6→1.23.6for clean parsing
Batch 2 (zig, flutter, chromedriver, terraform, julia, iterm2, mariadb, gpg, serviceman, aliasman)
Zig Fetcher Fix
The zig upstream API returns "size" as a JSON string, not a number.
Changed Platform.Size from int64 to json.Number to avoid unmarshal failures.
Also changed Platforms tag from json:"-" to json:"platforms,omitempty" so
platform data is preserved in cache.
Source-Only Packages
serviceman and aliasman have GitHub releases with empty assets:[]. These are
source-only repos that install via go install or script download, not binary
releases. The classifier correctly produces 0 distributables for them — they
don't belong in the binary CSV.
Flutter Arch Detection
Early Flutter releases (pre-2020) had no arch-specific builds — single platform SDK. No arch in filename → empty arch in CSV. This is correct; the installer would default to x86_64 on supported platforms.
Batch 3 (25 packages: arc through gitdeploy)
New Classifier Patterns
macosx→ darwin (syncthing usesmacosx)ia32→ x86 (dart-sass usesia32).snapformat → Linux-only.appxformat added for PowerShell
New Meta-Asset Filters
.pub(cosign keys)install.sh,install.ps1(install scripts)compat.json(syncthing metadata)
Batch 4 (62 remaining packages) + Full Run
Hugo/Hugo-Extended Split
hugo-extended shares the same GitHub repo as hugo. Added asset_filter and
asset_exclude conf keys to split them:
hugo/releases.conf:asset_exclude = extended(6,354 assets)hugo-extended/releases.conf:asset_filter = extended(2,193 assets)
User direction: "hugo-extended should be a separate release. I believe the README covered this. I think it should have been the default."
Remaining Empty-Field Patterns (Per-Installer Territory)
These have empty OS or arch from the generic classifier and need per-installer config to resolve:
- Git-for-Windows:
Git-2.x.x-32-bit.tar.bz2— no OS in filename, always Windows - CMake: HP-UX, IRIX targets — exotic/dead platforms
- Dashcore: old naming conventions
- Old PowerShell
.msifiles — no arch in filename - Bare binaries (ollama-darwin, caddy2_beta12_macos) — no arch info
Full Results
169,867 distributable rows across 116 packages. 3 packages produce 0 rows: serviceman, aliasman (source-only), duckdns.sh.
TODO
- Consider whether bare binaries (no format extension) should get a format marker
- Per-installer configs for packages with known-but-undetectable OS/arch
arm32classification: leave to per-installer unless pattern emergesarm32is vague — may mean armv6 or armv7. Leave as per-installer responsibility unless a distinct pattern emerges (user direction 2026-03-10)