1435 Commits

Author SHA1 Message Date
AJ ONeal
afa65bbf87 docs: BLOCKING — cache normalization needed before moving on 2026-03-11 14:34:51 -06:00
AJ ONeal
031a15b0ea feat(webid): wire golib middleware for request logging
Uses therootcompany/golib/http/middleware/v2 to add requestLogger to all
routes except /api/health (too noisy). Logs method, path, status, duration.
2026-03-11 14:31:37 -06:00
AJ ONeal
93786ecfb9 docs: update GOER.md — Phase 4 pgstore complete 2026-03-11 14:29:11 -06:00
AJ ONeal
102be6e635 feat(pgstore): add PostgreSQL storage backend
Implements storage.Store for PostgreSQL using pgx/v5.

Schema uses double-buffered generations per package — write into the
inactive gen, then atomically swap the active pointer on Commit. Readers
always see a complete consistent snapshot.

Write path: BeginRefresh → Put (staged in-memory) → Commit (CopyFrom + swap)
Read path:  Load → reads active gen from webi_packages, fetches assets

Both webid and webicached now accept -pg=<dsn> to use pgstore instead
of fsstore. Schema is applied idempotently on startup.

Also:
- storage.Store interface gains ListPackages(ctx) — fsstore reads the
  directory; pgstore queries webi_packages
- webid.loadAll() uses ListPackages instead of filepath.ReadDir
- Fixed .gitignore: /webid (root binary) was incorrectly matching cmd/webid/
2026-03-11 14:29:01 -06:00
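
Sketch for 102be6e635: the double-buffered write path (BeginRefresh, Put, Commit with a swap) can be illustrated in memory. This is not the pgstore code. The genStore and refresh types are hypothetical, and a mutex-guarded index stands in for the active-pointer swap that the commit performs in PostgreSQL.

```go
package main

import (
	"fmt"
	"sync"
)

// genStore is a hypothetical in-memory stand-in for a store that keeps two
// generations per package and flips an "active" index on commit.
type genStore struct {
	mu     sync.RWMutex
	gens   [2][]string // two buffers of asset names
	active int         // index of the generation readers see
}

// refresh holds writes staged in memory until Commit is called.
type refresh struct {
	store  *genStore
	staged []string
}

// BeginRefresh starts a staged write; readers are unaffected.
func (s *genStore) BeginRefresh() *refresh {
	return &refresh{store: s}
}

// Put stages one asset in memory.
func (r *refresh) Put(asset string) {
	r.staged = append(r.staged, asset)
}

// Commit copies the staged assets into the inactive generation, then flips
// the active index so readers move to the new snapshot in one step.
func (r *refresh) Commit() {
	s := r.store
	s.mu.Lock()
	defer s.mu.Unlock()
	inactive := 1 - s.active
	s.gens[inactive] = append([]string(nil), r.staged...)
	s.active = inactive
}

// Load returns a copy of the active generation.
func (s *genStore) Load() []string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return append([]string(nil), s.gens[s.active]...)
}

func main() {
	s := &genStore{}
	r := s.BeginRefresh()
	r.Put("example-asset-1") // placeholder asset name
	fmt.Println("before commit:", len(s.Load()))
	r.Commit()
	fmt.Println("after commit:", len(s.Load()))
}
```

Readers that call Load during a refresh keep seeing the previous generation until Commit flips the index, which is the property the commit message describes as a complete consistent snapshot.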
AJ ONeal
5cf9b96c06 docs: concrete example of cache JSON values causing warnings 2026-03-11 14:26:15 -06:00
AJ ONeal
31dc1f114b ref(classify): separate core classifier from legacy backport
Move legacy-specific field translations out of the core classifier into
LegacyBackport(), called by webicached before writing the JSON cache.

Core classifier now outputs canonical values:
- Go dist arm → armv6 (correct per GOARM default)
- ffmpeg Windows .gz → .gz (correct file extension)

LegacyBackport remaps for Node.js compat:
- Go dist armv6 → arm (production keeps raw API value)
- ffmpeg Windows .gz → exe (production releases.js override)

sass armv6→armv7 stays in classifier (Dart Sass genuinely targets ARMv7).
2026-03-11 13:58:59 -06:00
AJ ONeal
de1de7fccd docs: clarify that warnings are cache output values, not data diffs
The comparecache equivalence matching hides the issue. The Node
build-classifier needs normalized values (armv6 not armhf, sunos not
solaris) in the actual cache JSON files.
2026-03-11 13:56:21 -06:00
AJ ONeal
c7f7fd5fe3 docs: update GOER with progress report and top 3 impact fixes 2026-03-11 13:52:42 -06:00
AJ ONeal
f5f74d142a docs: update GOER.md — 98/101 packages match, 3 known diffs remain 2026-03-11 13:52:31 -06:00
AJ ONeal
c4ebd55753 fix(classify): add package-specific overrides for sass, ffmpeg, go arm
- sass: bare arm → armv7 (Dart Sass targets ARMv7, not v6)
- ffmpeg: Windows .gz → ext exe (gzipped bare executables)
- go: keep bare arm as-is from Go dist API (matches production)

Reduces comparecache diffs from 6 packages to 3 (iterm2 channel edge
cases, postgres legacy ext, terraform alpha detection — all understood).
2026-03-11 13:52:06 -06:00
AJ ONeal
e5dc8d973f docs: answer ANYOS question, re-emphasize 9 classification issues 2026-03-11 13:43:38 -06:00
AJ ONeal
e1ed3999e3 docs: update GOER.md with comparecache findings and researcher question 2026-03-11 13:42:49 -06:00
AJ ONeal
c4a91002f6 fix(classify): use Go API structured os/arch for golang releases
The golang dist API provides structured os/arch fields. Using these
instead of filename-based classification fixes:
- illumos/solaris kept distinct (not merged to sunos)
- arm arch correctly mapped per GOARCH convention
- buildmeta: add OSIllumos and OSSolaris constants
2026-03-11 13:42:34 -06:00
AJ ONeal
19c55b0131 docs: add reminder for GOER about 9 classification issues 2026-03-11 13:39:32 -06:00
AJ ONeal
c0f8313a62 fix(comparecache): use equivalence matching for os/arch/ext naming
Replace direct string comparison with canonical equivalence checks so
naming convention differences (darwin/macos, x86_64/amd64, aarch64/arm64)
don't appear as false diffs. Now only real classification disagreements
surface:
- go: illumos/solaris→sunos mapping, arm ambiguity per OS
- sass: bare "arm" should be armv7, not armv6
- ffmpeg: Windows .gz ext classified as exe in prod
- terraform: alpha channel detected correctly by Go, missed by prod
- postgres: legacy EDB ext "tar" vs "tar.gz"
2026-03-11 13:37:24 -06:00
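
Sketch for c0f8313a62: equivalence matching can be approximated by mapping each name to one canonical spelling before comparing. The pairs come from the commit message; the function names and the choice of which spelling counts as canonical are assumptions, not the comparecache implementation.

```go
package main

import "fmt"

// canonicalNames maps naming variants to a single spelling per concept.
// Treating the Go-style names as canonical is an assumption of this sketch.
var canonicalNames = map[string]string{
	"macos": "darwin",
	"amd64": "x86_64",
	"arm64": "aarch64",
}

// canonical returns the canonical spelling of name, or name itself if unknown.
func canonical(name string) string {
	if c, ok := canonicalNames[name]; ok {
		return c
	}
	return name
}

// equivalent reports whether two names refer to the same os or arch.
func equivalent(a, b string) bool {
	return canonical(a) == canonical(b)
}

func main() {
	fmt.Println(equivalent("darwin", "macos"))
	fmt.Println(equivalent("x86_64", "amd64"))
	fmt.Println(equivalent("armv6", "armv7"))
}
```

With this kind of check, darwin/macos no longer counts as a diff, while a real disagreement such as armv6 versus armv7 still surfaces.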
AJ ONeal
aa6df09188 fix(pg): filter to server assets; add field-level cache comparison
- pg/releases.conf: add asset_filter=postgres so pg only returns server
  assets (which include the client), matching production releases.js
- classifypkg: add "pg" to postgres version normalizer switch case
- comparecache: compare os/arch/libc/ext/channel fields on shared assets,
  distinguishing real disagreements (diff-*) from expected fill diffs
  where Go classifies at write time but Node.js leaves fields empty
2026-03-11 12:55:41 -06:00
AJ ONeal
992d50eaca docs: detailed Go cache classification issues from Node tests
9 categories: universal2, solaris/illumos, armhf, armel, windows arm,
android, winx64, minor arch mismatches, sttr pkg misclassification.
Plus broad sweep failures and live-compare known diffs.
2026-03-11 12:48:34 -06:00
AJ ONeal
c22fd35cdf docs: update GO_WEBI.md — phases 1-3 complete 2026-03-11 12:40:33 -06:00
AJ ONeal
d76e93c380 fix(resolver): prefer no-dep builds in libc waterfall
Static (none) first on all platforms — no runtime dependencies.
Linux: none, gnu, musl. Windows: none, msvc (vcredist not bundled).
2026-03-11 12:34:08 -06:00
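
Sketch for d76e93c380: the libc waterfall is an ordered preference list per OS. The Linux and Windows orders below are taken from the commit message; the function names and the fallback for other platforms are assumptions.

```go
package main

import "fmt"

// libcWaterfall returns libc preferences for an OS, most preferred first.
// The default branch is an assumption; the commit only says static (none)
// comes first on all platforms.
func libcWaterfall(goos string) []string {
	switch goos {
	case "linux":
		return []string{"none", "gnu", "musl"}
	case "windows":
		return []string{"none", "msvc"}
	default:
		return []string{"none"}
	}
}

// pickLibc returns the first libc in the waterfall that has a build available.
func pickLibc(goos string, available map[string]bool) (string, bool) {
	for _, libc := range libcWaterfall(goos) {
		if available[libc] {
			return libc, true
		}
	}
	return "", false
}

func main() {
	libc, ok := pickLibc("linux", map[string]bool{"gnu": true, "musl": true})
	fmt.Println(libc, ok)
}
```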
AJ ONeal
7da1cc0394 docs: update GOER.md with Windows gnu and padding fixes 2026-03-11 12:31:33 -06:00
AJ ONeal
a3685b840b fix: Windows gnu→none, install.sh 8-space padding
- Windows gnu (MinGW) builds are self-contained: classify as libc='none'
- Pad install.sh content to 8 spaces to match production template indent
- Use replaceMarkerLine for both bash and PS1 installer injection
2026-03-11 12:31:17 -06:00
AJ ONeal
2e1d824b27 docs: consolidate ANSWERS.md and add GOER.md for agent communication 2026-03-11 12:26:31 -06:00
AJ ONeal
239e570be0 docs: answer Issue 5 — static musl libc classification fix 2026-03-11 12:19:44 -06:00
AJ ONeal
47419b7eee fix(classifypkg): tag Rust static musl builds as libc='none'
Rust *-unknown-linux-musl builds are statically linked with zero
runtime libc dependency. Detect this pattern in classifyGitHub and
override libc from 'musl' to 'none'. Hard-musl packages (pwsh, bun,
node) use different filename patterns and keep libc='musl'.
2026-03-11 12:19:21 -06:00
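
Sketch for 47419b7eee: the override can be pictured as a filename check applied after normal libc detection. The function name and the example filenames are hypothetical; only the *-unknown-linux-musl pattern and the musl to none override come from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// libcForAsset overrides a detected libc of "musl" to "none" when the
// filename carries the Rust target-triple pattern for static musl builds.
func libcForAsset(filename, detected string) string {
	if detected == "musl" && strings.Contains(filename, "-unknown-linux-musl") {
		return "none"
	}
	return detected
}

func main() {
	// Placeholder filenames for illustration only.
	fmt.Println(libcForAsset("tool-x86_64-unknown-linux-musl.tar.gz", "musl"))
	fmt.Println(libcForAsset("tool-linux-musl-x64.tar.gz", "musl"))
}
```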
AJ ONeal
9095b34c22 feat(render): implement PowerShell installer rendering
Add PowerShell() function to render .ps1 installers by injecting
$Env: variables and splicing install.ps1 content. Wire it into
the webid server for .ps1 extension requests.
2026-03-11 12:05:15 -06:00
AJ ONeal
a76413012f ref(installerconf): remove back-compat aliases for old key names
Remove github_repo, github_source, gitea_repo aliases. Not released
yet — no need for backwards compatibility.
2026-03-11 11:55:06 -06:00
AJ ONeal
23100394ac ref(installerconf): rename config keys and add full URL support
Renames:
- github_repo → github_releases (back-compat kept)
- github_source → github_sources (back-compat kept)
- gitea_repo → gitea_releases (back-compat kept)

New keys:
- gitea_sources, gitlab_releases, gitlab_sources

All keys now accept either owner/repo shorthand or full URLs:
- github_releases = sharkdp/bat
- github_releases = https://github.com/sharkdp/bat
- gitea_releases = https://git.rootprojects.org/root/pathman

Defaults: github → github.com, gitlab → gitlab.com.
Gitea has no default (self-hosted only).

Updated all 73 releases.conf files from github_repo to github_releases.
2026-03-11 11:51:43 -06:00
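
Sketch for 23100394ac: accepting either owner/repo shorthand or a full URL amounts to a small normalization step. The function name is hypothetical, and returning an error for Gitea shorthand is an assumption; the commit only states that Gitea has no default host.

```go
package main

import (
	"fmt"
	"strings"
)

// releasesURL expands an owner/repo shorthand using defaultHost, and passes
// full URLs through unchanged. An empty defaultHost (as for Gitea) means
// shorthand cannot be expanded.
func releasesURL(defaultHost, value string) (string, error) {
	if strings.HasPrefix(value, "https://") || strings.HasPrefix(value, "http://") {
		return value, nil
	}
	if defaultHost == "" {
		return "", fmt.Errorf("no default host for %q: a full URL is required", value)
	}
	return "https://" + defaultHost + "/" + value, nil
}

func main() {
	u, err := releasesURL("github.com", "sharkdp/bat")
	fmt.Println(u, err)
	u, err = releasesURL("", "https://git.rootprojects.org/root/pathman")
	fmt.Println(u, err)
}
```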
AJ ONeal
bd3bd85e43 feat(installerconf): github_source packages include git_url for clone fallback
git_url is now a standalone field that can appear alongside any source
type. For githubsource packages, it adds a git clone entry per release
in addition to the tarball and zipball. Updated aliasman, duckdns.sh,
and serviceman configs.
2026-03-11 11:46:59 -06:00
AJ ONeal
bbcaa0f464 docs: update answers with three-strategy fix details 2026-03-11 11:42:59 -06:00
AJ ONeal
0ae4d01d75 fix(classifypkg): separate github, githubsource, and gittag strategies
Three distinct fetch/classify strategies:
- github: binary assets only, no source entries
- githubsource: tarball + zipball from GitHub releases API
- gittag: git clone + tag enumeration (existing)

GitHub binary packages (caddy, jq, shellcheck, etc.) no longer get
spurious .git and source tarball entries for old releases that had
no binary uploads. Source-installable packages (aliasman, duckdns.sh,
serviceman) now use github_source in releases.conf.
2026-03-11 11:42:35 -06:00
AJ ONeal
5858a9fefd docs: confirm .git resolution is a Node.js resolver issue, not cache data 2026-03-11 11:34:49 -06:00
AJ ONeal
a5f2dc87cf fix(comparecache): -sample picks random assets, not packages
-sample N now randomly samples N assets from each package's diff list,
giving a representative view of classification differences instead of
showing only the first alphabetical entries. Implies -windowed -diffs
to filter out version-depth noise and focus on real bugs.
2026-03-11 11:31:58 -06:00
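
Sketch for a5f2dc87cf: sampling N assets from a diff list can be done with a random permutation. The function name, the seed parameter, and the placeholder asset names are assumptions; the commit does not say how comparecache seeds or draws its sample.

```go
package main

import (
	"fmt"
	"math/rand"
)

// sampleAssets returns up to n items chosen at random without replacement.
// The same seed yields the same sample, and the input slice is not modified.
func sampleAssets(seed int64, items []string, n int) []string {
	if n > len(items) {
		n = len(items)
	}
	if n < 0 {
		n = 0
	}
	r := rand.New(rand.NewSource(seed))
	out := make([]string, 0, n)
	for _, i := range r.Perm(len(items))[:n] {
		out = append(out, items[i])
	}
	return out
}

func main() {
	diffs := []string{"asset-a", "asset-b", "asset-c", "asset-d", "asset-e"}
	fmt.Println(sampleAssets(42, diffs, 2))
}
```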
AJ ONeal
47081c6e17 fix(installerconf): align tests with actual config format
Tests were using separate source/owner/repo keys but the parser expects
github_repo=owner/repo, gitea_repo=owner/repo, etc. Fixed all test
configs to match. Also answered Issue 4 (darwin-universal) for other agent.
2026-03-11 11:29:30 -06:00
AJ ONeal
2b488693b0 feat(comparecache): add -sample flag to pick random extra packages
Usage: go run ./cmd/comparecache -sample 8 -diffs
Picks 8 random packages beyond any explicitly named ones, logs which
ones were sampled for reproducibility.
2026-03-11 11:20:43 -06:00
AJ ONeal
5606773945 fix(webid): add missing imports to bootstrap_test.go
The getWithUA helper needs io, net/http, and net/http/httptest imports.
All 4 bootstrap/installer tests pass.
2026-03-11 11:18:43 -06:00
AJ ONeal
c1a5f2485d feat(webid): split bootstrap and installer routes
Production has two separate flows:
1. /{pkg} (curl-pipe bootstrap) — minimal script that sets WEBI_PKG,
   WEBI_HOST, WEBI_CHECKSUM and downloads+runs webi
2. /api/installers/{pkg}.sh — full installer with resolved release
   and embedded install.sh

Previously handleBootstrap served the full installer. Now:
- handleBootstrap: curl-pipe bootstrap (reads curl-pipe-bootstrap.tpl.sh)
- handleInstaller: full installer (/api/installers/{pkg}.sh)

Also:
- Export render.InjectVar for use by bootstrap handler
- Add webi.sh checksum calculation (SHA-1 first 8 chars)
- Add /api/installers/ route to mux and test server
2026-03-11 02:42:46 -06:00
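
Sketch for c1a5f2485d: one reading of "SHA-1 first 8 chars" is the first 8 hexadecimal characters of the SHA-1 digest of the script contents. That interpretation and the function name are assumptions.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// shortChecksum returns the first 8 hex characters of the SHA-1 digest.
func shortChecksum(script []byte) string {
	sum := sha1.Sum(script)
	return hex.EncodeToString(sum[:])[:8]
}

func main() {
	// Placeholder script contents for illustration.
	fmt.Println(shortChecksum([]byte("#!/bin/sh\necho hello\n")))
}
```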
AJ ONeal
d46cb313cb fix(v1api): use proper csv.Writer with tab delimiter instead of commaToTab
The commaToTab byte replacement was fragile — URLs containing commas
would break. Now uses csv.Writer with Comma='\t' as the backend for
csvutil.Encoder, producing correct TSV output regardless of field content.
2026-03-11 02:39:19 -06:00
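
Sketch for d46cb313cb: the standard library's encoding/csv writer produces TSV when its Comma field is set to a tab, so commas inside a URL are left alone. This example uses csv.Writer directly and omits the csvutil.Encoder layer named in the commit; the toTSV helper and the example URL are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
)

// toTSV encodes rows with encoding/csv using a tab as the field delimiter.
func toTSV(rows [][]string) (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	w.Comma = '\t'
	// WriteAll writes every record and flushes the writer.
	if err := w.WriteAll(rows); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := toTSV([][]string{
		{"version", "download"},
		{"1.0.0", "https://example.com/a,b.tar.gz"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

Because the delimiter is handled by the writer instead of a byte replacement after the fact, a comma in a field is ordinary data and never becomes a column break.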
AJ ONeal
5eab504c3c test(webid): add jq resolve test, skip upstream gaps in resolve tests
- Added TestV1ResolveJQ to verify jq resolves to binary, not git
- Changed upstream gap detection in resolve_cache_test to t.Skipf
  (shellcheck/windows and xz/linux-arm64 don't have upstream builds)
- Updated ANSWERS.md with git assets investigation results
2026-03-11 02:38:08 -06:00
AJ ONeal
ac6b74a5d8 docs: answer inter-agent questions about libc and git assets 2026-03-11 02:35:57 -06:00
AJ ONeal
dd5f941eca feat(webid): add v1 API with TSV-first format and resolver endpoint
New API routes:
- GET /v1/releases/{pkg}.tab — list releases as TSV (with header)
- GET /v1/releases/{pkg}.json — list releases as JSON array
- GET /v1/resolve/{pkg}.tab — resolve best asset for platform (TSV)
- GET /v1/resolve/{pkg}.json — resolve best asset for platform (JSON)

Key design decisions:
- TSV as primary format via csvutil (easy for cut/grep/sort/agents)
- Go-native naming: darwin, x86_64, aarch64 (no legacy mapping)
- No quoted fields — spaces for lists within fields
- Always includes header row in TSV output
- Resolve endpoint returns single best match with triplet info

Query params: os, arch, libc, channel, version, lts, format, variant, limit
2026-03-11 02:34:32 -06:00
AJ ONeal
9269c32b9c fix(webid): match production API format for legacy endpoints
- JSON response returns bare array (not wrapped in {"releases": [...]})
- OS names mapped to Node.js conventions: darwin → macos
- Arch names mapped: x86_64 → amd64, aarch64 → arm64
- Version strings stripped of "v" prefix
- Extension stripped of "." prefix
- Empty libc defaults to "none"
- Tab format uses actual TSV (not comma-separated)
- Tab LTS field uses "lts" / "-" (not "true" / "false")
- Tab shows header row only with ?pretty=true
- Releases sorted newest-first by version (using lexver)
- Added comprehensive format tests and production comparison test
2026-03-11 02:31:04 -06:00
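
Sketch for 9269c32b9c: the field mappings listed above can be expressed as one translation step applied before encoding. The mappings themselves come from the commit message; the release struct and function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// release is a hypothetical record holding the fields the commit mentions.
type release struct {
	Version, OS, Arch, Libc, Ext string
	LTS                          bool
}

// toLegacy rewrites Go-native values into the Node.js conventions.
func toLegacy(r release) release {
	if r.OS == "darwin" {
		r.OS = "macos"
	}
	switch r.Arch {
	case "x86_64":
		r.Arch = "amd64"
	case "aarch64":
		r.Arch = "arm64"
	}
	r.Version = strings.TrimPrefix(r.Version, "v")
	r.Ext = strings.TrimPrefix(r.Ext, ".")
	if r.Libc == "" {
		r.Libc = "none"
	}
	return r
}

// ltsField renders the LTS flag as used in the tab format.
func ltsField(lts bool) string {
	if lts {
		return "lts"
	}
	return "-"
}

func main() {
	r := toLegacy(release{Version: "v1.2.3", OS: "darwin", Arch: "aarch64", Ext: ".tar.gz"})
	fmt.Println(r.Version, r.OS, r.Arch, r.Libc, r.Ext, ltsField(r.LTS))
}
```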
AJ ONeal
a24d361289 feat(render): add installer script renderer and bootstrap route
Renders package-install.tpl.sh with WEBI_* variable injection and
install.sh splicing. Bootstrap route at /{package}@{version} detects
UA, resolves best release, and returns rendered installer script.
2026-03-11 02:03:58 -06:00
AJ ONeal
9d3d28704e feat(webid): add HTTP API server with legacy release routes
Serves /api/releases/{pkg}@{version}.json and .tab matching the
Node.js format. Supports query params for os, arch, libc, channel,
formats, lts, limit. Handles selfhosted packages (install.sh only).

Pre-loads all cached packages on startup. Includes /api/debug for
UA detection and /api/health endpoint.
2026-03-11 02:00:46 -06:00
AJ ONeal
f02b38255b feat(resolver): add new resolver for new API routes
Triplet-based resolution with indexed lookup for fast matching.
Supports channel hierarchy (alpha > beta > rc > stable), LTS filtering,
variant selection, format preferences, and arch fallback via CompatArches.

All 13 unit tests and cache integration tests pass against real data
for 100+ packages.
2026-03-11 01:51:12 -06:00
AJ ONeal
ed38c63e91 docs: add HANDOFF.md for Node.js cache-only migration
Detailed instructions for the next step: making the Node.js server
read only from Go-generated _cache/ files, removing all upstream
API fetching from the Node.js code path.
2026-03-11 01:22:05 -06:00
AJ ONeal
f167f32aa2 docs: update GO_WEBI.md to reflect current state
- releases.conf format updated (source inferred from key)
- Phase 1 checklist complete except resolver
- All release fetchers listed (18 source packages)
- Per-package releases packages documented
- Legacy export filtering description corrected (Variants not Extra)
- Resolved questions updated (rate limiting, config format, normalization)
- Stale open question removed (rate limiting solved via round-robin)
2026-03-11 01:11:55 -06:00
AJ ONeal
86e73937cd ref(installerconf): remove old source/owner/repo fallback
The default branch now only handles one-off dist sources that use
source= with url=. No config file uses owner=/repo= anymore.
2026-03-11 01:07:44 -06:00
AJ ONeal
0861ebc8b8 ref(releases.conf): collapse source/owner/repo into single keys
Source type is now inferred from the primary key:
  github_repo = owner/repo   (was source=github + owner + repo)
  git_url = https://...      (was source=gittag + url)
  gitea_repo = owner/repo    (was source=gitea + owner + repo)
  hashicorp_product = name   (was source=hashicorp + product)

One-off dist sources (nodedist, zigdist, etc.) keep the explicit
source= key since they're already one-liners.

Parser still accepts the old format via the default fallback branch.
2026-03-11 01:05:08 -06:00
AJ ONeal
d0801d0952 fix(classifypkg): handle gittag HEAD entries for legacy cache
Tagless repos (only HEAD, no real version tags): rewrite HEAD version
to Node.js-compatible format (v2023.10.10-18.42.21) with full UTC
datetime.

Repos with real tags + HEAD: tag HEAD entries with "head" variant so
ExportLegacy filters them out (they shouldn't appear in legacy cache).
2026-03-11 00:57:16 -06:00
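
Sketch for d0801d0952: a version string in the v2023.10.10-18.42.21 shape can be produced from a commit timestamp with Go's time layout syntax after converting to UTC. The function name and the RFC 3339 input format are assumptions.

```go
package main

import (
	"fmt"
	"time"
)

// headVersion converts an RFC 3339 commit timestamp into a
// vYYYY.MM.DD-HH.MM.SS version string in UTC.
func headVersion(commitTime string) (string, error) {
	t, err := time.Parse(time.RFC3339, commitTime)
	if err != nil {
		return "", err
	}
	return t.UTC().Format("v2006.01.02-15.04.05"), nil
}

func main() {
	v, err := headVersion("2023-10-10T12:42:21-06:00")
	if err != nil {
		panic(err)
	}
	fmt.Println(v)
}
```

Converting to UTC first matters because commit dates carry a local offset, so the same commit would otherwise yield different version strings depending on where it was authored.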
AJ ONeal
695df60a9d fix(postgres): add legacy EnterpriseDB releases and appendLegacy pipeline step
Hardcode the old 10.12, 10.13, 11.8, 12.3 releases from EnterpriseDB
that predate the bnnanet/postgresql-releases GitHub repo. Both postgres
and psql now match the live cache exactly.
2026-03-11 00:43:18 -06:00