Commit Graph

37 Commits

Author SHA1 Message Date
AJ ONeal
b236c8ac6b ref: move legacy field backport from classifypkg to ExportLegacy; add .apk/.AppImage formats
- Remove LegacyBackport from classifypkg and webicached; canonical values
  now flow through storage untouched
- Add legacyFieldBackport() in storage/legacy.go, called only at export time
  (go: armv6→arm, ffmpeg windows: .gz/.empty→.exe)
- ExportLegacy now takes pkg name and returns LegacyDropStats (variants + formats dropped)
- fsstore.Commit logs dropped assets so filtering is visible
- Add FormatAPK (.apk) and FormatAppImage (.AppImage) to buildmeta and classify
  so those files are properly classified and then correctly dropped from legacy export
  rather than passing through as empty-format
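
The export-time backport described above might look roughly like the following. This is a minimal sketch, not the code in storage/legacy.go: the `Asset` type and the function signature are assumptions, and ".empty" is read here as "empty format string".

```go
package main

// Asset is a hypothetical, pared-down stand-in for a stored release asset.
type Asset struct {
	OS     string
	Arch   string
	Format string
}

// legacyFieldBackport returns a copy of a with legacy field values applied.
// The two rules mirror the commit message; the signature is assumed.
func legacyFieldBackport(pkg string, a Asset) Asset {
	switch pkg {
	case "go":
		// go: armv6 -> arm
		if a.Arch == "armv6" {
			a.Arch = "arm"
		}
	case "ffmpeg":
		// ffmpeg windows: .gz or empty format -> .exe
		if a.OS == "windows" && (a.Format == ".gz" || a.Format == "") {
			a.Format = ".exe"
		}
	}
	return a
}
```

Because the function takes and returns a value, the canonical asset held in storage is left untouched; only the exported copy carries the legacy values.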
2026-03-11 14:41:30 -06:00
AJ ONeal
102be6e635 feat(pgstore): add PostgreSQL storage backend
Implements storage.Store for PostgreSQL using pgx/v5.

Schema uses double-buffered generations per package — write into the
inactive gen, then atomically swap the active pointer on Commit. Readers
always see a complete consistent snapshot.

Write path: BeginRefresh → Put (staged in-memory) → Commit (CopyFrom + swap)
Read path:  Load → reads active gen from webi_packages, fetches assets
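
The double-buffering idea can be illustrated in memory, without PostgreSQL. This is a sketch under stated assumptions: `genStore` and its fields are hypothetical, it tracks a single package, and a mutex stands in for the atomic pointer swap that the real schema performs in the database.

```go
package main

import "sync"

// genStore is a hypothetical in-memory analogue of the two-generation layout.
type genStore struct {
	mu     sync.RWMutex
	gens   [2][]string // two generations of asset names
	active int         // index of the generation readers see
	staged []string    // writes buffered until Commit
}

// BeginRefresh discards anything staged by a previous, uncommitted refresh.
func (s *genStore) BeginRefresh() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.staged = nil
}

// Put stages one asset; readers cannot see it yet.
func (s *genStore) Put(name string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.staged = append(s.staged, name)
}

// Commit writes the staged assets into the inactive generation, then swaps
// the active index so readers move to the new snapshot in one step.
func (s *genStore) Commit() {
	s.mu.Lock()
	defer s.mu.Unlock()
	inactive := 1 - s.active
	s.gens[inactive] = s.staged
	s.staged = nil
	s.active = inactive
}

// Load returns a copy of the active generation.
func (s *genStore) Load() []string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return append([]string(nil), s.gens[s.active]...)
}
```

A reader calling Load between Put and Commit still sees the previous generation in full, which is the property the commit message describes.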

Both webid and webicached now accept -pg=<dsn> to use pgstore instead
of fsstore. Schema is applied idempotently on startup.

Also:
- storage.Store interface gains ListPackages(ctx) — fsstore reads the
  directory; pgstore queries webi_packages
- webid.loadAll() uses ListPackages instead of filepath.ReadDir
- Fixed .gitignore: /webid (root binary) was incorrectly matching cmd/webid/
2026-03-11 14:29:01 -06:00
AJ ONeal
31dc1f114b ref(classify): separate core classifier from legacy backport
Move legacy-specific field translations out of the core classifier into
LegacyBackport(), called by webicached before writing the JSON cache.

Core classifier now outputs canonical values:
- Go dist arm → armv6 (correct per GOARM default)
- ffmpeg Windows .gz → .gz (correct file extension)

LegacyBackport remaps for Node.js compat:
- Go dist armv6 → arm (production keeps raw API value)
- ffmpeg Windows .gz → exe (production releases.js override)

sass armv6→armv7 stays in classifier (Dart Sass genuinely targets ARMv7).
2026-03-11 13:58:59 -06:00
AJ ONeal
0861ebc8b8 ref(releases.conf): collapse source/owner/repo into single keys
Source type is now inferred from the primary key:
  github_repo = owner/repo   (was source=github + owner + repo)
  git_url = https://...      (was source=gittag + url)
  gitea_repo = owner/repo    (was source=gitea + owner + repo)
  hashicorp_product = name   (was source=hashicorp + product)

One-off dist sources (nodedist, zigdist, etc.) keep the explicit
source= key since they're already one-liners.

Parser still accepts the old format via the default fallback branch.
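
As an illustration of the inference rule, the lookup could be sketched as below. The key names and source names come from the message above; `inferSource` and the map-based config are assumptions made for the example, not the installerconf API.

```go
package main

// primaryKeys lists the collapsed keys and the source type each one implies.
var primaryKeys = []struct{ key, source string }{
	{"github_repo", "github"},
	{"git_url", "gittag"},
	{"gitea_repo", "gitea"},
	{"hashicorp_product", "hashicorp"},
}

// inferSource returns the source type and its target (owner/repo, URL, or
// product name). If no primary key is present it falls back to the explicit
// source= key, as used by one-off dist sources and the old format.
func inferSource(conf map[string]string) (source, target string) {
	for _, pk := range primaryKeys {
		if v, ok := conf[pk.key]; ok {
			return pk.source, v
		}
	}
	return conf["source"], ""
}
```

For example, a config containing only `github_repo = ahmetb/kubectx` resolves to source `github` with target `ahmetb/kubectx`, while `source = nodedist` resolves to `nodedist` with an empty target.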
2026-03-11 01:05:08 -06:00
AJ ONeal
90149ac945 ref(webicached): round-robin refresh, skip aliases, rate limit API
- Default mode: classify all from rawcache on startup, then
  fetch+refresh one package per tick (round-robin).
- --eager flag for the old behavior (fetch all on startup).
- Skip aliases and symlinked dirs — legacy cache doesn't create
  entries for them (resolved at request time by the server).
- Add --page-delay (default 2s) to rate-limit paginated API requests.
- Add delayTransport wrapper on http.Client.
2026-03-11 00:29:40 -06:00
AJ ONeal
413ec722f2 fix(webicached): detect symlinked package dirs as aliases
Symlinked directories (e.g. rust.vim → vim-rust) are now treated as
aliases instead of being independently fetched and classified. Creates
cache symlinks just like alias_of config entries.
2026-03-11 00:24:51 -06:00
AJ ONeal
b8c67491fe feat: resolve alias_of in cache pipeline
Packages with alias_of in releases.conf (e.g. dashd → dashcore,
golang → go) now get symlinked cache files so they resolve to the
same JSON as their target. 13 aliases total.

Added AliasOf as a proper field in installerconf.Conf, LinkAlias
method to fsstore, and alias handling in webicached's Run loop.
2026-03-10 23:28:36 -06:00
AJ ONeal
86e3d8f969 ref: extract classification pipeline into internal/classifypkg
Move all source-specific classifiers, variant tagging, config filtering,
and readAllRaw out of cmd/webicached into internal/classifypkg. The new
Package() function runs the full classify pipeline: source dispatch →
tag variants → apply config.

webicached now only handles fetching raw data and writing to fsstore.
The classification logic is reusable by comparecache and future tools.
2026-03-10 18:06:02 -06:00
AJ ONeal
c1b81157dc fix(gittag): produce correct filenames, versions, and format for git assets
- gittag classifier: use "{repo}-{tag}" filenames (matching Node.js),
  strip "v" prefix from version, synthesize date-based version for
  tagless repos (HEAD of master/main)
- GitHub source-only: use "git" format (no dot) and "{repo}-{tag}"
  filename for clone assets
- Legacy export: add "git" to recognized formats so gittag packages
  appear in the legacy cache
- Derives repo name from the git URL in releases.conf

vim-commentary now matches. vim-zig matches on format but has newer
data (expected — Go fetched more recently than Node.js).
2026-03-10 18:00:43 -06:00
AJ ONeal
72a8c56b13 fix(mariadb): skip source tarballs with OS="Source" or whitespace CPU
The MariaDB API returns OS="Source" and CPU=" " for source packages.
The previous check only tested for empty strings, missing these.
2026-03-10 17:44:26 -06:00
AJ ONeal
2b0b293728 feat(cache): add timing instrumentation to webicached and comparecache
Log classify/write/total per package in webicached, and
discover/compare/total in comparecache. Helps identify slow
packages as the dataset grows.
2026-03-10 17:42:50 -06:00
AJ ONeal
72fec20fb0 ref: move IsMetaAsset to classify package, share between tools
Moved isMetaAsset from cmd/webicached to classify.IsMetaAsset so
both webicached and comparecache use the same logic. Removed
duplicated isMetaFile from comparecache. The comparecache
isLiveNoise now delegates to classify.IsMetaAsset and adds
live-specific filters (.deb, .rpm, -src-).
2026-03-10 17:28:44 -06:00
AJ ONeal
f101037dfd fix: restore checksums/sha256sum/sha512sum substring filters
These are exact filenames with no extension — .txt doesn't catch them.
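
Combined with the generic .txt rule from the preceding commit, the filter might be sketched as follows. `isMetaAsset` here is a simplified illustration, not the real function, and the case-insensitive matching is an assumption.

```go
package main

import "strings"

// isMetaAsset reports whether name looks like a non-installable file:
// any .txt file, or a checksum file that has no extension at all.
func isMetaAsset(name string) bool {
	lower := strings.ToLower(name) // assumption: match case-insensitively
	if strings.HasSuffix(lower, ".txt") {
		return true
	}
	for _, sub := range []string{"checksums", "sha256sum", "sha512sum"} {
		if strings.Contains(lower, sub) {
			return true
		}
	}
	return false
}
```

A file named `checksums` has no `.txt` suffix, so it is only caught by the substring list, which is why both rules are needed.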
2026-03-10 17:25:55 -06:00
AJ ONeal
9247de98d2 fix: filter all .txt files as non-installable meta assets
.txt files are never installable (checksums, release notes, etc.).
Filter them generically instead of matching specific patterns.
2026-03-10 17:25:06 -06:00
AJ ONeal
3f1f909005 fix: use repo-tag as filename for source tarballs (drop owner prefix)
2026-03-10 17:19:24 -06:00
AJ ONeal
19de4c3caa fix: use tag as filename for source tarballs, add TODO for HEAD lookup
Drop the Owner-Repo prefix from source tarball filenames — the
actual download name comes from Content-Disposition. Added TODO
to resolve the full filename via HEAD at fetch time.
2026-03-10 17:19:05 -06:00
AJ ONeal
2bd1537e9c feat: add .git asset for source-only GitHub releases
Source-only releases (no uploaded assets) now also emit a .git
asset with the GitHub clone URL, matching how gittag-sourced
packages like vim-commentary and vim-zig work. This allows
install via git clone --branch <tag> as an alternative to
downloading the tarball.
2026-03-10 17:18:21 -06:00
AJ ONeal
d56f43e3b4 fix: use API URLs for source tarballs, match legacy filename pattern
Source-only GitHub releases now use the API-provided tarball_url
and zipball_url directly. Filename follows the legacy pattern
(Owner-Repo-Tag.ext) to approximate the Content-Disposition
filename that Node.js gets by following the redirect.
2026-03-10 17:17:35 -06:00
AJ ONeal
4a9088fea7 fix: use GitHub API tarball/zipball URLs instead of constructing them
Source-only releases now use the API-provided tarball_url and
zipball_url directly instead of guessing the archive URL format.
The filename uses the git tag, and the download URL is what
GitHub's API actually returns.
2026-03-10 17:13:04 -06:00
AJ ONeal
7e134ead87 fix(yq): use exclude in releases.conf instead of variant tagger
Man pages aren't a variant — they're just assets we don't install.
The exclude key in releases.conf is the right place for this.
2026-03-10 14:59:25 -06:00
AJ ONeal
6eeed80610 fix: separate general vs installer-specific vs legacy filters
- yq: move man_page_only from general isMetaAsset to yq-specific tagger
- node: restore .exe as stored asset with "bare-exe" variant (installable
  by Go, excluded from legacy)
- ollama: rename Ollama-darwin.zip variant from "installer" to "app"
  (.app bundle is installable by Go, just not by legacy Node.js)

The distinction: general classification/filter (isMetaAsset) handles
truly non-installable files. Installer-specific taggers handle assets
that are installable but need variant tagging. Legacy filter strips
variants and unsupported formats for Node.js compat.
2026-03-10 14:58:37 -06:00
AJ ONeal
d8ecac6d6a fix: normalize .tgz to .tar.gz in display filenames
Node.js normalizes .tgz extensions to .tar.gz in the cache name field
while keeping the real .tgz URL in download. Match this behavior so
legacy export filenames are consistent. Affects ollama-darwin.tgz and
any other packages using .tgz.
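
The normalization itself is a one-line suffix rewrite. A minimal sketch, with `displayName` as a hypothetical helper name:

```go
package main

import "strings"

// displayName rewrites a trailing .tgz to .tar.gz for the cache name field.
// The download URL is not passed through this function and keeps .tgz.
func displayName(filename string) string {
	if strings.HasSuffix(filename, ".tgz") {
		return strings.TrimSuffix(filename, ".tgz") + ".tar.gz"
	}
	return filename
}
```

With this, `displayName("ollama-darwin.tgz")` yields `ollama-darwin.tar.gz`, and names with any other extension are returned unchanged.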
2026-03-10 14:49:48 -06:00
AJ ONeal
99159d748c fix: ollama installer tag, yq/ffmpeg meta detection, ffmpeg asset_filter
- ollama: Ollama-darwin.zip (macOS .app) tagged as installer variant
- isMetaAsset: add man_page_only, .LICENSE, .README patterns
- ffmpeg: asset_filter=ffmpeg excludes ffprobe/ffplay/LICENSE/README
- uuidv7: exotic arches are correct, marked as known-acceptable
2026-03-10 14:47:01 -06:00
AJ ONeal
878009e5aa fix(node): skip nodedist "exe" format code — no real download exists
Node.js index lists "win-x64-exe" but there's no .exe file on the
download server. The MSI installer (separate "msi" entry) is the actual
Windows installer. The "exe" entry was generating a phantom filename.
2026-03-10 14:44:39 -06:00
AJ ONeal
b408b42464 feat: add asset_filter to releases.conf, fix kubectx/kubens split
asset_filter is a substring that asset filenames must contain. Used when
multiple packages share a GitHub release (kubectx/kubens both come from
ahmetb/kubectx). Added as a first-class Conf field and applied in
webicached's applyConfig.
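
A sketch of the substring rule, assuming a plain slice of filenames rather than the real asset type. `applyAssetFilter` is a hypothetical name, and the filenames in the usage note are illustrative only.

```go
package main

import "strings"

// applyAssetFilter keeps only the names that contain filter.
// An empty filter keeps everything.
func applyAssetFilter(names []string, filter string) []string {
	if filter == "" {
		return names
	}
	var kept []string
	for _, name := range names {
		if strings.Contains(name, filter) {
			kept = append(kept, name)
		}
	}
	return kept
}
```

Given a shared release containing `kubectx_linux_x86_64.tar.gz` and `kubens_linux_x86_64.tar.gz`, the kubens package would set `asset_filter = kubens` and keep only the second file.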
2026-03-10 14:42:37 -06:00
AJ ONeal
6687cad126 ref: simplify variant taggers to plain functions with switch dispatch
Drop VariantTagger interface and map-based lookup. Each per-installer
package now exports a plain TagVariants function. webicached dispatches
via a switch on package name, consistent with fetchRaw and
classifyPackage.
2026-03-10 13:54:03 -06:00
AJ ONeal
9cb9ffc4c6 ref: extract variant taggers to per-installer packages
Move variant detection logic from inline functions in webicached to
per-installer packages (internal/releases/{bun,fish,git,lsd,node,
ollama,pwsh,xcaddy}). Each exports a Tagger implementing the new
storage.VariantTagger interface. webicached uses an explicit map
of package name → tagger, no magic registration.
2026-03-10 13:35:32 -06:00
AJ ONeal
39c136caa3 feat: whitespace-delimited releases.conf, variant tagging
- Switch installerconf parser from comma to whitespace delimiters
- Add asset_exclude as alias for exclude (fixes hugo)
- Add variants key (documentation cue, detection in Go code)
- Add per-package variant taggers: bun (profile, amd64v3 arch),
  pwsh (fxdependent), ollama (rocm, jetpack5/6), git (installer),
  node (msi installer), lsd (deb, msvc), fish (pkg), xcaddy (deb)
- Update releases.conf files with variant declarations
2026-03-10 13:30:33 -06:00
AJ ONeal
f441a3bf8c ref(webicached): extract WebiCache struct, add -shallow flag
Extract shared state (store, client, auth, rawDir, config flags) into
a WebiCache struct. Convert refreshPackage, fetchRaw, and paginated
fetchers (github, gitea, gittag, nodedist) to methods.

Add -shallow flag: fetches only the first page of releases from
paginated sources. Single-index sources (nodedist, chromedist, etc.)
are always complete in one request.
2026-03-10 12:57:50 -06:00
AJ ONeal
84c943b160 feat(node): merge official + unofficial builds into single cache
Add unofficial_url to node/releases.conf and update the nodedist
fetcher/classifier to fetch from both URLs. Raw entries are stored
with "official/" or "unofficial/" tag prefixes so they don't overwrite
each other. The classifier picks the correct base URL from the prefix.

This matches the Node.js releases.js behavior which merges both sources,
adding musl, riscv64, loong64, and 7z builds from unofficial.
2026-03-10 12:35:18 -06:00
AJ ONeal
14f588f4d9 version-level comparison: fix lexver sorting, add riscv64/7z, update findings
- comparecache: use lexver.Compare for version sorting instead of
  lexicographic sort (v9.9.0 was incorrectly ranked above v25.8.0)
- webicached/expandNodeFile: add riscv64, loong64 arch mappings and
  7z format support for unofficial Node.js builds
- COMPARISON.md: rewrite with version-level review findings including
  format filtering gaps (.pkg/.msi/.deb/.dmg), build variant design
  (Extra field for rocm/jetpack/fxdependent), and node multi-source issue
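
The sorting bug in the first bullet comes from comparing version strings character by character, where "9" sorts after "2". The sketch below shows a numeric comparison that avoids it. It is not the lexver package, whose API is not shown here; `compareVersions` is a hypothetical stand-in that handles only dotted numeric versions.

```go
package main

import (
	"strconv"
	"strings"
)

// compareVersions returns -1, 0, or 1 as a is older than, equal to, or newer
// than b. Parts are compared as integers; a missing or non-numeric part
// counts as 0, so pre-release suffixes are not handled.
func compareVersions(a, b string) int {
	as := strings.Split(strings.TrimPrefix(a, "v"), ".")
	bs := strings.Split(strings.TrimPrefix(b, "v"), ".")
	for i := 0; i < len(as) || i < len(bs); i++ {
		var x, y int
		if i < len(as) {
			x, _ = strconv.Atoi(as[i])
		}
		if i < len(bs) {
			y, _ = strconv.Atoi(bs[i])
		}
		if x < y {
			return -1
		}
		if x > y {
			return 1
		}
	}
	return 0
}
```

As plain strings, `"v9.9.0" > "v25.8.0"` is true, while `compareVersions("v9.9.0", "v25.8.0")` returns -1, ranking v25.8.0 as the newer release.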
2026-03-10 12:27:16 -06:00
AJ ONeal
83748185bd use current (non-legacy) GitHub archive format for source archives
GitHub has two archive formats:
- legacy: codeload.github.com/.../legacy.tar.gz/... → Owner-Repo-Hash/
- current: github.com/.../archive/refs/tags/TAG.tar.gz → repo-version/

The API's tarball_url redirects to the legacy format. Node.js follows
this redirect. The current format is cleaner: predictable filenames
(repo-version.tar.gz), consistent directory names (repo-version/),
and standard github.com URLs.

Verified: aliasman-1.1.2.tar.gz extracts to aliasman-1.1.2/ which
matches the install script glob (mv ./*aliasman*/aliasman ...).
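
Building the current-format URL and its predictable filename could be sketched as below. `currentArchive` is a hypothetical helper; stripping the leading "v" from the tag to form the version is inferred from the aliasman example above (tag v1.1.2, file aliasman-1.1.2.tar.gz) and should be treated as an assumption for other tag styles.

```go
package main

import "strings"

// currentArchive returns the current-format download URL for a tag and the
// filename it is expected to produce (repo-version.tar.gz).
func currentArchive(owner, repo, tag string) (url, filename string) {
	url = "https://github.com/" + owner + "/" + repo +
		"/archive/refs/tags/" + tag + ".tar.gz"
	version := strings.TrimPrefix(tag, "v") // assumption: only a leading "v" is dropped
	filename = repo + "-" + version + ".tar.gz"
	return url, filename
}
```

For `currentArchive("BeyondCodeBootcamp", "aliasman", "v1.1.2")` the filename is `aliasman-1.1.2.tar.gz`, matching the verified extraction directory `aliasman-1.1.2/`.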
2026-03-10 11:46:36 -06:00
AJ ONeal
47f0f7bbb6 fix source archive filenames and download URLs
Use Owner-Repo-Tag naming (e.g. BeyondCodeBootcamp-aliasman-v1.1.2.tar.gz)
and direct codeload.github.com URLs instead of api.github.com tarball_url.

This matches the Node.js behavior for source-only packages (aliasman,
duckdns.sh, serviceman) where the extracted directory name matters for
install script globbing (mv ./*aliasman*/ ...).

Remaining diff: Node.js follows the redirect to get the git short hash
suffix (-0-g{hash}) from Content-Disposition. Go uses the tag name
directly. Both resolve to the same archive content.
2026-03-10 11:44:12 -06:00
AJ ONeal
d22be16a69 fix isMetaAsset and source archive classification
- Add -src.{tar.gz,tar.xz,zip} pattern to isMetaAsset (alongside _src.)
- Set os=posix_2017, arch=* on source archives (no-binary-asset releases)
  instead of leaving them empty. These are shell scripts/vim plugins that
  work on any POSIX system.
- Remove "source" Extra tag from source archives (os/arch tells the story)
2026-03-10 11:31:53 -06:00
AJ ONeal
2e052fa553 wire 9 custom source fetchers into webicached
Add fetch + classify functions for all custom source types:
- chromedist (chromedriver): Chrome for Testing JSON index
- flutterdist (flutter): Google Storage per-OS release indexes
- golang (go): golang.org/dl JSON API
- gpgdist (gpg): SourceForge RSS scraping
- hashicorp (terraform): releases.hashicorp.com product index
- iterm2dist (iterm2): HTML scraping of downloads page
- juliadist (julia): S3 versions.json with platform files
- mariadbdist (mariadb): two-step REST API (majors → releases)
- zigdist (zig): mixed-schema JSON with platform keys

All 9 fetcher packages already existed in internal/releases/ but
were not wired into webicached's fetchRaw/classifyPackage switches.
Now all 103 packages produce classified cache output.
2026-03-10 11:23:41 -06:00
AJ ONeal
b51e9e2998 add comparecache tool and LIVE_cache comparison checklist
- cmd/comparecache: compares Go cache vs Node.js LIVE_cache at filename
  level, categorizes differences (meta-filtering, version depth, source
  tarballs, unsupported sources, real asset differences)
- COMPARISON.md: per-package checklist with 91 live packages categorized
- webicached: add -no-fetch flag to classify from existing raw data only
- GO_WEBI.md: update Phase 1 checkboxes for completed items
2026-03-10 11:17:37 -06:00
AJ ONeal
0e6d90e011 feat: add webicached — release cache daemon
Combines fetch + classify + write into one pipeline:
1. Reads releases.conf to discover packages
2. Fetches raw upstream data to rawcache
3. Classifies assets (OS, arch, libc, format)
4. Applies config transforms (exclude, version prefix strip)
5. Writes to fsstore in Node.js-compatible _cache/ format

Supports github, nodedist, gittag, and gitea sources. Other sources
(golang, zigdist, flutter, etc.) are skipped with a log message —
they'll be added as needed.

Can run as a one-shot (-once) or periodic daemon (-interval 15m).
2026-03-10 10:58:17 -06:00