- Fix zig `Platform.Size` type: string in upstream JSON (`json.Number`)
- Fix zig `Platforms` json tag: was `"-"` (dropped in cache), now serializes
- Add riscv64, loong64, armv5 archs to buildmeta and classifier
- Add mipsle, mips64le arch detection patterns
- Add plan9 OS detection
- Add "mac" (word boundary) → darwin OS detection
- Add armhf → armv7, arm7 → armv7 patterns
- Infer Linux from .deb/.rpm format when OS absent
- Filter source archives and buildable-artifact meta-assets

Batch 2 tested: zig (246), flutter (2082), chromedriver (10300), terraform (5550), julia (1783), iterm2 (262), mariadb (207), gpg (45)
serviceman/aliasman: 0 (source-only, no binary assets)
Classification Rules & Learnings
Tracking classifier decisions and edge cases discovered during batch processing.
Batch 1 (go, node, hugo, caddy, pathman)
Vocabulary
All values in the CSV use buildmeta canonical names:
- Arch: `x86_64` (not `amd64`), `aarch64` (not `arm64`), `x86` (not `386`/`i386`)
- Format: `.tar.gz` (with leading dot), matching `buildmeta.Format` constants
- OS: `darwin` (not `mac`/`macos`), `dragonfly` (not `dragonflybsd`)
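A minimal sketch of normalizing tokens to this vocabulary. The alias map and the `canonical`/`canonicalFormat` helpers are hypothetical names for illustration, not the actual buildmeta API; only the mappings listed above are encoded.

```go
package main

import (
	"fmt"
	"strings"
)

// canonicalAliases maps non-canonical spellings to buildmeta names.
// Hypothetical: it encodes only the aliases from the vocabulary list.
var canonicalAliases = map[string]string{
	// Arch
	"amd64": "x86_64",
	"arm64": "aarch64",
	"386":   "x86",
	"i386":  "x86",
	// OS
	"mac":          "darwin",
	"macos":        "darwin",
	"dragonflybsd": "dragonfly",
}

// canonical lowercases a token and returns its buildmeta name, or the
// lowercased token itself if it is already canonical or unknown.
func canonical(token string) string {
	t := strings.ToLower(token)
	if c, ok := canonicalAliases[t]; ok {
		return c
	}
	return t
}

// canonicalFormat ensures a format carries the leading dot ("tar.gz" -> ".tar.gz").
func canonicalFormat(f string) string {
	if f == "" || strings.HasPrefix(f, ".") {
		return f
	}
	return "." + f
}

func main() {
	fmt.Println(canonical("AMD64"), canonical("macos"), canonicalFormat("tar.gz"))
	// x86_64 darwin .tar.gz
}
```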
Classifier Additions
From this batch, these patterns were added to the generic classifier:
OS:
- `mac` (word boundary) → `darwin` (caddy uses `mac_amd64`)
- `openbsd`, `netbsd`, `dragonfly(?:bsd)?`, `plan9` → new OS types
- `.deb`/`.rpm` → infer Linux when the OS is undetectable from the filename
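A sketch of how these OS rules could be expressed with Go's `regexp` package. One caveat worth noting: RE2's `\b` treats `_` as a word character, so `\bmac\b` would not match `caddy_2.0.0_mac_amd64.tar.gz`; this sketch therefore uses an explicit "non-alphanumeric or string edge" boundary. The pattern list, its ordering, and `detectOS` are assumptions, not the classifier's actual code.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Explicit boundaries: '_', '-', '.' all count as separators here,
// unlike RE2's \b, which treats '_' as part of a word.
const (
	boundaryStart = `(?:^|[^a-z0-9])`
	boundaryEnd   = `(?:[^a-z0-9]|$)`
)

// osPatterns is a hypothetical ordered list; the first match wins.
var osPatterns = []struct {
	re   *regexp.Regexp
	name string
}{
	{regexp.MustCompile(`darwin|macos|` + boundaryStart + `mac` + boundaryEnd), "darwin"},
	{regexp.MustCompile(`linux`), "linux"},
	{regexp.MustCompile(`windows`), "windows"},
	{regexp.MustCompile(`freebsd`), "freebsd"},
	{regexp.MustCompile(`openbsd`), "openbsd"},
	{regexp.MustCompile(`netbsd`), "netbsd"},
	{regexp.MustCompile(`dragonfly(?:bsd)?`), "dragonfly"},
	{regexp.MustCompile(`plan9`), "plan9"},
}

// detectOS returns a buildmeta OS name, or "" if nothing matches.
func detectOS(filename string) string {
	name := strings.ToLower(filename)
	for _, p := range osPatterns {
		if p.re.MatchString(name) {
			return p.name
		}
	}
	// No OS token in the name: .deb and .rpm packages imply Linux.
	if strings.HasSuffix(name, ".deb") || strings.HasSuffix(name, ".rpm") {
		return "linux"
	}
	return ""
}

func main() {
	for _, f := range []string{"caddy_2.0.0_mac_amd64.tar.gz", "tool_1.0_amd64.deb", "tool_plan9_386.tar.gz"} {
		fmt.Println(f, "->", detectOS(f))
	}
}
```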
Arch:
- `386` (word boundary) → `x86` (Go naming convention)
- `32bit`/`64bit` (no hyphen) → `x86`/`x86_64` (Hugo naming)
- `arm7` → `armv7` (old caddy naming)
- `armhf` → `armv7` (Debian convention)
- `armv5` → new arch type
- `universal` → `universal2` (Hugo fat binaries)
- `riscv64`, `loong64`, `mipsle`, `mips64le` → new arch types
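A sketch of the arch rules as an ordered, first-match-wins pattern list. The list order, the `bounded` helper, and `detectArch` are assumptions for illustration. Order matters because the explicit boundary treats `_` as a separator: a bounded `x86` would also match inside `x86_64`, so the 64-bit pattern must be checked first.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// bounded wraps alternatives in an explicit non-alphanumeric boundary
// ('_' counts as a separator, unlike RE2's \b).
func bounded(p string) *regexp.Regexp {
	return regexp.MustCompile(`(?:^|[^a-z0-9])(?:` + p + `)(?:[^a-z0-9]|$)`)
}

// archPatterns is hypothetical; the first match wins, so x86_64 is
// checked before the bounded x86 pattern that would otherwise claim it.
var archPatterns = []struct {
	re   *regexp.Regexp
	arch string
}{
	{regexp.MustCompile(`x86_64|amd64|64bit`), "x86_64"},
	{regexp.MustCompile(`aarch64|arm64`), "aarch64"},
	{regexp.MustCompile(`armv7|arm7|armhf`), "armv7"},
	{regexp.MustCompile(`armv6`), "armv6"},
	{regexp.MustCompile(`armv5`), "armv5"},
	{regexp.MustCompile(`riscv64`), "riscv64"},
	{regexp.MustCompile(`loong64`), "loong64"},
	{regexp.MustCompile(`mips64le`), "mips64le"},
	{regexp.MustCompile(`mipsle`), "mipsle"},
	{bounded(`x86|386|i386|32bit`), "x86"},
	{regexp.MustCompile(`universal`), "universal2"},
}

// detectArch returns a buildmeta arch, or "" when the name has no arch
// token (e.g. caddy2_beta12_macos).
func detectArch(filename string) string {
	name := strings.ToLower(filename)
	for _, p := range archPatterns {
		if p.re.MatchString(name) {
			return p.arch
		}
	}
	return ""
}

func main() {
	for _, f := range []string{"go1.23.6.linux-386.tar.gz", "hugo_0.55.0_Linux-64bit.tar.gz", "caddy2_beta12_macos"} {
		fmt.Printf("%s -> %q\n", f, detectArch(f))
	}
}
```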
Per-Source Normalizers
Each upstream API uses different naming. Normalizers convert to buildmeta vocabulary:
| Source | 64-bit x86 (`amd64`) | 64-bit ARM (`arm64`) | 32-bit ARM (`arm`) |
|---|---|---|---|
| Go API | amd64→x86_64 | arm64→aarch64 | - |
| Node dist | x64→x86_64 | arm64→aarch64 | - |
| Zig | x86_64 (same) | aarch64 (same) | - |
| HashiCorp | amd64→x86_64 | arm64→aarch64 | arm→armv6 |
| Julia | x86_64 (same) | aarch64 (same) | - |
| Chrome | x64→x86_64 | arm64→aarch64 | - |
| MariaDB | x86_64 (same) | aarch64 (same) | - |
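The table could be encoded as a per-source lookup, sketched below. The source keys, `sourceArchMaps`, and `normalizeArch` are hypothetical; sources that already use buildmeta names (Zig, Julia, MariaDB) need no entries because unknown tokens pass through unchanged.

```go
package main

import "fmt"

// sourceArchMaps encodes the table: upstream token -> buildmeta arch,
// keyed by a hypothetical source identifier.
var sourceArchMaps = map[string]map[string]string{
	"go":        {"amd64": "x86_64", "arm64": "aarch64"},
	"node":      {"x64": "x86_64", "arm64": "aarch64"},
	"hashicorp": {"amd64": "x86_64", "arm64": "aarch64", "arm": "armv6"},
	"chrome":    {"x64": "x86_64", "arm64": "aarch64"},
}

// normalizeArch maps an upstream token to buildmeta vocabulary; tokens
// with no entry (including already-canonical ones) pass through as-is.
func normalizeArch(source, upstream string) string {
	if m, ok := sourceArchMaps[source]; ok {
		if canon, ok := m[upstream]; ok {
			return canon
		}
	}
	return upstream
}

func main() {
	fmt.Println(normalizeArch("node", "x64"))      // x86_64
	fmt.Println(normalizeArch("hashicorp", "arm")) // armv6
	fmt.Println(normalizeArch("zig", "aarch64"))   // aarch64
}
```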
Meta-Asset Filtering
Skipped patterns: checksums (`.sha256`, `.sha512`, `.md5`, `checksums.txt`),
signatures (`.sig`, `.asc`, `.pem`), SBOMs (`.sbom`, `.spdx`, `.sigstore`),
source archives (`_src.tar.gz`), and `buildable-artifact` meta-assets.
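A minimal filter over these patterns. Matching by case-insensitive filename suffix, plus a substring check for `buildable-artifact`, is an assumption of this sketch; `metaSuffixes` and `isMetaAsset` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// metaSuffixes lists the skipped patterns; suffix matching is assumed.
var metaSuffixes = []string{
	".sha256", ".sha512", ".md5", "checksums.txt", // checksums
	".sig", ".asc", ".pem", // signatures
	".sbom", ".spdx", ".sigstore", // SBOMs
	"_src.tar.gz", // source archives
}

// isMetaAsset reports whether an asset should be skipped rather than
// classified as a distributable.
func isMetaAsset(name string) bool {
	n := strings.ToLower(name)
	if strings.Contains(n, "buildable-artifact") {
		return true
	}
	for _, s := range metaSuffixes {
		if strings.HasSuffix(n, s) {
			return true
		}
	}
	return false
}

func main() {
	for _, f := range []string{"caddy_2.7.6_checksums.txt", "tool_1.0_src.tar.gz", "caddy_2.7.6_linux_amd64.tar.gz"} {
		fmt.Println(f, isMetaAsset(f))
	}
}
```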
Edge Cases (Accepted)
- Caddy v2 beta bare binaries (`caddy2_beta12_macos`): no arch in the filename, so arch is left empty
- Hugo `macOS-all`: means universal, but only 2 files use it, so it is not worth special-casing
- Hugo extended editions: detected via `extended` in the filename; tracked in the `extra` column? (TODO: not yet)
- Node "odd major = beta" heuristic: v15, v17, v19, v21, v23 are "current", not LTS
- Go version prefix: `go` is stripped from `go1.23.6` → `1.23.6` for clean parsing
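The last two edge cases reduce to small string operations, sketched below. `cleanGoVersion` and `nodeIsOddMajor` are hypothetical helpers; the sketch assumes Node versions arrive as `vMAJOR.MINOR.PATCH` and requires Go 1.18+ for `strings.Cut`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// cleanGoVersion strips the "go" prefix: "go1.23.6" -> "1.23.6".
func cleanGoVersion(tag string) string {
	return strings.TrimPrefix(tag, "go")
}

// nodeIsOddMajor reports whether a version like "v21.7.1" has an odd
// major, i.e. a "current" line that is never promoted to LTS.
func nodeIsOddMajor(version string) (bool, error) {
	majorStr, _, _ := strings.Cut(strings.TrimPrefix(version, "v"), ".")
	major, err := strconv.Atoi(majorStr)
	if err != nil {
		return false, err
	}
	return major%2 == 1, nil
}

func main() {
	fmt.Println(cleanGoVersion("go1.23.6")) // 1.23.6
	odd, err := nodeIsOddMajor("v21.7.1")
	fmt.Println(odd, err) // true <nil>
}
```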
Batch 2 (zig, flutter, chromedriver, terraform, julia, iterm2, mariadb, gpg, serviceman, aliasman)
Zig Fetcher Fix
The zig upstream API returns `"size"` as a JSON string, not a number.
Changed `Platform.Size` from `int64` to `json.Number` to avoid unmarshal failures.
Also changed the `Platforms` tag from `json:"-"` (which skips the field on both
marshal and unmarshal) to `json:"platforms,omitempty"` so platform data is preserved in the cache.
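A self-contained illustration of both fixes using `encoding/json`. The `Platform` and `Release` types here are simplified, hypothetical stand-ins for the fetcher's types, and the sample values are made up. `json.Number` accepts a quoted numeric string, while an `int64` field without a `,string` option rejects it; re-marshaling writes the number unquoted, and decoding that back into `json.Number` also works.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Platform is a simplified, hypothetical stand-in for the fetcher's type.
type Platform struct {
	Tarball string      `json:"tarball"`
	Size    json.Number `json:"size"` // decodes "47265140" as well as 47265140
}

// Release shows the tag change: with `json:"-"` this map would be
// skipped on every cache write and read.
type Release struct {
	Version   string              `json:"version"`
	Platforms map[string]Platform `json:"platforms,omitempty"`
}

// roundTrip decodes an upstream-style entry, then simulates a cache
// write/read; it returns the size and how many platforms survived.
func roundTrip(upstream []byte) (int64, int, error) {
	var p Platform
	if err := json.Unmarshal(upstream, &p); err != nil {
		return 0, 0, err // an int64 Size would fail here on a quoted number
	}
	size, err := p.Size.Int64()
	if err != nil {
		return 0, 0, err
	}
	cached, err := json.Marshal(Release{Version: "0.13.0", Platforms: map[string]Platform{"x86_64-linux": p}})
	if err != nil {
		return 0, 0, err
	}
	var back Release
	if err := json.Unmarshal(cached, &back); err != nil {
		return 0, 0, err
	}
	return size, len(back.Platforms), nil
}

func main() {
	size, n, err := roundTrip([]byte(`{"tarball":"zig-linux-x86_64-0.13.0.tar.xz","size":"47265140"}`))
	fmt.Println(size, n, err) // 47265140 1 <nil>
}
```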
Source-Only Packages
serviceman and aliasman have GitHub releases with empty asset lists (`"assets": []`).
These are source-only repos that install via `go install` or a script download, not
binary releases. The classifier correctly produces 0 distributables for them; they
don't belong in the binary CSV.
Flutter Arch Detection
Early Flutter releases (pre-2020) had no arch-specific builds, only a single SDK per platform. With no arch in the filename, the CSV arch is empty. This is correct; the installer would default to x86_64 on supported platforms.
TODO for Next Batches
- Hugo "extended" variant should be captured in the `extra` column
- Consider whether bare binaries (no format extension) should get a format marker
- Track `_extended` suffix detection more broadly
- `arm32` is vague: it may mean armv6 or armv7. Leave it as a per-installer responsibility unless a distinct pattern emerges (user direction 2026-03-10)