Hex Playground
Corpus playground for running local tools against large sets of Hex.pm packages.
Setup
cd ~/Development/hex-playground
mix deps.get
You can run it as a Mix task:
mix hex_playground.fetch --mode latest --limit 300 --concurrency 8
Or build a standalone escript:
mix escript.build
./hex_playground fetch --mode latest --limit 300
Fetch a corpus
Fetch and extract the latest release of packages from the signed Hex repository registry:
mix hex_playground.fetch --mode latest --limit 300 --concurrency 8
This creates:
- manifest.json: package metadata, paths, mirror used, and file-extension counts
- sources/<package>-<version>/: extracted package sources
- tarballs/<package>-<version>.tar: cached Hex tarballs
Useful modes:
# Latest release of every public Hex package
mix hex_playground.fetch --mode latest --concurrency 16 --prune-non-elixir
# Every public package version. Large: currently ~150k releases.
mix hex_playground.fetch --mode all --concurrency 16
# Top packages by downloads, using the Hex HTTP API for ranking
mix hex_playground.fetch --mode top --limit 1000 --concurrency 16
latest and all use the Hex repository endpoint:
https://repo.hex.pm/versions
Tarballs are downloaded from:
https://repo.hex.pm/tarballs/<name>-<version>.tar
and unpacked with hex_core.
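The tarball URL pattern above can be reproduced by hand, which may help when checking whether a given release is available from a repository or mirror. A minimal sketch; the `tarball_url` helper is a hypothetical name for illustration and is not part of hex_playground:

```shell
# tarball_url BASE NAME VERSION
# Print the tarball URL for one package release.
# Hypothetical helper for illustration; not part of hex_playground.
tarball_url() {
  # "${1%/}" drops a single trailing slash from the base URL, if present.
  printf '%s/tarballs/%s-%s.tar\n' "${1%/}" "$2" "$3"
}

tarball_url https://repo.hex.pm a1 0.25.0
# prints https://repo.hex.pm/tarballs/a1-0.25.0.tar
```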
Mirror balancing
Tarball downloads can be balanced across multiple repository mirrors. Registry
discovery still uses --registry-url so the signed Hex.pm registry remains the
source of truth.
mix hex_playground.fetch \
--mode latest \
--limit 1000 \
--concurrency 16 \
--mirror https://repo.hex.pm \
--mirror https://cdn.jsdelivr.net/hex \
--mirror-strategy round_robin
You can also pass mirrors comma-separated:
mix hex_playground.fetch \
--mirror https://repo.hex.pm,https://cdn.jsdelivr.net/hex \
--mirror-strategy random
Available strategies:
- round_robin: distribute package tarball attempts across mirrors
- random: pick a random starting mirror per package
If a mirror fails for a tarball, the downloader falls back to the remaining
mirrors. Only https://repo.hex.pm is the official Hex.pm mirror; other mirrors
are useful for public tarballs but should be treated as untrusted.
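One way to picture round-robin balancing with fallback is as a rotated list of mirrors: each package starts at a different mirror and, on failure, moves on to the rest in order. The sketch below illustrates that idea only. The `attempt_order` helper and the exact rotation rule are assumptions, not a description of how hex_playground implements its strategies.

```shell
# attempt_order INDEX MIRROR...
# Print the mirrors in the order they would be tried for the package at
# position INDEX (0-based). The first line is the starting mirror; the
# remaining lines are the fallbacks.
# Hypothetical helper; the real tool's rotation rule may differ.
attempt_order() {
  index="$1"; shift
  [ "$#" -gt 0 ] || return 1          # need at least one mirror
  start=$(( $index % $# ))
  # Rotate the mirror list left by $start positions.
  while [ "$start" -gt 0 ]; do
    first="$1"; shift
    set -- "$@" "$first"
    start=$(( start - 1 ))
  done
  printf '%s\n' "$@"
}

# Package 0 starts at the first mirror, package 1 at the second.
attempt_order 0 https://repo.hex.pm https://cdn.jsdelivr.net/hex
attempt_order 1 https://repo.hex.pm https://cdn.jsdelivr.net/hex
```

Under this assumed rule, a random strategy would differ only in how the starting index is chosen.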
Build a serveable Hex.pm-compatible mirror
Mirror the signed Hex registry files and package tarballs into a static-file layout compatible with Hex clients:
mix hex_playground.mirror \
--out mirror \
--concurrency 32 \
--package-concurrency 16 \
--mirror https://repo.hex.pm \
--mirror https://cdn.jsdelivr.net/hex
This creates:
- mirror/names
- mirror/versions
- mirror/public_key
- mirror/packages/<name>
- mirror/tarballs/<name>-<version>.tar
- mirror/.hex_playground/manifest.ndjson
- mirror/.hex_playground/failures.ndjson (when downloads fail)
- mirror/.hex_playground/summary.json
Registry metadata is always fetched from --registry-url, defaulting to the
official https://repo.hex.pm. Tarball downloads are balanced across --mirror
URLs with fallback when a mirror fails. Existing valid tarballs are reused unless
--force is passed.
For a small test run:
mix hex_playground.mirror --out mirror-test --limit 20 --concurrency 4
Serve the mirror with any static HTTP server rooted at mirror/:
cd mirror
python3 -m http.server 8080
The served paths must match Hex's repository paths exactly:
/names
/versions
/public_key
/packages/<name>
/tarballs/<name>-<version>.tar
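Before starting a server, it can be useful to confirm that the output directory has the top-level entries those paths depend on. The following is a rough pre-flight sketch only: `check_mirror_layout` is a hypothetical helper, and it checks presence, not signatures or tarball contents.

```shell
# check_mirror_layout ROOT
# Print one "missing: ..." line per absent top-level entry.
# Returns 0 when everything is present, 1 otherwise.
# Hypothetical helper; it does not validate file contents.
check_mirror_layout() {
  root="$1"
  missing=0
  for file in names versions public_key; do
    if [ ! -f "$root/$file" ]; then
      printf 'missing: %s\n' "$root/$file"
      missing=1
    fi
  done
  for dir in packages tarballs; do
    if [ ! -d "$root/$dir" ]; then
      printf 'missing: %s/\n' "$root/$dir"
      missing=1
    fi
  done
  return "$missing"
}

check_mirror_layout mirror || echo "mirror layout is incomplete"
```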
Verify a completed or partial mirror:
mix hex_playground.mirror.verify --out mirror
The verifier checks required registry files, package metadata files referenced by
the manifest, tarball presence, and tarball unpacking. Tarballs whose metadata
files exceed hex_core's in-memory unpack safety limit are treated as valid,
because a mirror can still serve them and Hex clients can still fetch them. The
verifier writes its results to mirror/.hex_playground/verify-summary.json.
To use the mirror as a drop-in replacement for the default Hex repo in an isolated Mix home:
MIX_HOME=/tmp/hex-mirror-mix \
mix hex.repo set hexpm \
--url http://localhost:8080 \
--public-key mirror/public_key
Then ordinary Hex commands use the mirror:
MIX_HOME=/tmp/hex-mirror-mix mix hex.package fetch a1 0.25.0
If you add the mirror under a new repo name instead of overriding hexpm, Hex
will reject upstream registry metadata unless you set
HEX_NO_VERIFY_REPO_ORIGIN=1, because the signed package records still declare
their origin as hexpm.
Run tools against every package
Use scripts/run_tool.exs with a command after --. Placeholders:
- {name}: Hex package name
- {version}: package version
- {path}: relative source path
- {abs_path}: absolute source path
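To make the placeholders concrete, here is a sketch of how a command template could be expanded for one package. It shows the substitution idea only. `replace_all` and `expand_placeholders` are hypothetical helpers, and scripts/run_tool.exs may perform the substitution differently.

```shell
# replace_all STRING TOKEN VALUE
# Print STRING with every literal occurrence of TOKEN replaced by VALUE.
replace_all() {
  rest="$1"; out=""
  while :; do
    case "$rest" in
      *"$2"*)
        out="$out${rest%%"$2"*}$3"   # text before the first TOKEN, then VALUE
        rest="${rest#*"$2"}"         # continue after that TOKEN
        ;;
      *) break ;;
    esac
  done
  printf '%s\n' "$out$rest"
}

# expand_placeholders TEMPLATE NAME VERSION PATH ABS_PATH
expand_placeholders() {
  cmd="$(replace_all "$1" '{name}' "$2")"
  cmd="$(replace_all "$cmd" '{version}' "$3")"
  cmd="$(replace_all "$cmd" '{path}' "$4")"
  cmd="$(replace_all "$cmd" '{abs_path}' "$5")"
  printf '%s\n' "$cmd"
}

expand_placeholders 'echo {name}@{version} in {path}' \
  a1 0.25.0 sources/a1-0.25.0 /tmp/corpus/sources/a1-0.25.0
# prints: echo a1@0.25.0 in sources/a1-0.25.0
```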
Examples:
./scripts/run_tool.exs --limit 20 -- elixir -e 'IO.puts(System.get_env("HEX_PLAYGROUND_PACKAGE"))'
./scripts/run_tool.exs --limit 300 -- bash -lc 'find lib src -type f 2>/dev/null | wc -l'
./scripts/run_tool.exs --limit 300 -- bash -lc 'mix ex_dna --format json 2>/dev/null || true'
Each run writes:
- runs/<timestamp>/results.ndjson
- runs/<timestamp>/summary.json
- one log file per package
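Because an NDJSON file holds one JSON record per line, line-based tools are enough for a first look at results.ndjson. A small sketch that counts records follows. The `count_records` helper is hypothetical, and the sample records are made up, since the real field names are not documented here.

```shell
# count_records FILE
# Print the number of non-empty lines, i.e. JSON records, in an NDJSON file.
# Hypothetical helper for illustration.
count_records() {
  awk 'NF { n++ } END { print n + 0 }' "$1"
}

# Demo with made-up records; real results.ndjson fields will differ.
sample="$(mktemp)"
printf '%s\n' '{"example": 1}' '{"example": 2}' > "$sample"
count_records "$sample"   # prints 2
rm -f "$sample"
```

Pointing the same function at runs/<timestamp>/results.ndjson gives the number of records a run produced.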
Notes
This directory is intentionally data-heavy. Keep generated corpus data out of git unless explicitly needed.