[$ xmrhost] _

$ man node-gpu

[$ ] node/gpu — no-KYC offshore GPU servers, Iceland (Reykjavik)

// NAME

node/gpu — no-KYC offshore GPU servers with crypto billing (XMR / BTC / Lightning / LTC / ETH / USDT via OxaPay), deployed in Iceland (Reykjavik).

// SYNOPSIS

xmrhost-cli provision --type=<slug> --region=<is|ro>

$ xmrhost-cli list --type=gpu

// 3 plans returned. all xmr-billed.

slug        name                    spec               $/mo     notes
gpu-lite    GPU Lite — RTX 4090     8c 64 GB DDR5      $489     Offshore RTX 4090 for AI inference & rendering.
gpu-pro     GPU Pro — RTX A6000     16c 128 GB DDR5    $1099    Workstation-class GPU for ML training offshore.
gpu-beast   GPU Beast — H100        32c 256 GB DDR5    $2899    Datacenter-grade H100 for serious LLM workloads.
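
// illustrative provisioning flow: slug from the table above, flags as in SYNOPSIS. treat the exact invocation as a sketch, not captured cli output.

$ xmrhost-cli provision --type=gpu-lite --region=is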

// WHEN TO PICK GPU

$ man -k workload-fit

GPU is the right pick for open-weight LLM inference (Llama 3.x, Mistral, Mixtral, Qwen, DeepSeek), Stable-Diffusion image / video pipelines, embedding-model serving, fine-tuning runs that fit in the available VRAM, and red-team / jailbreak-evaluation workloads that some hyperscaler clouds gate on policy. The xmrhost GPU catalog is Iceland-only (hydroelectric / geothermal power, RIPE PI), spec'd around RTX 4090 24 GB at the lite/pro tiers and H100 80 GB SXM at the beast tier.
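
// rough sizing arithmetic for "fits in the available VRAM" (rule of thumb, weights only; KV cache, activations and CUDA overhead come on top. not an operator guarantee):
//   FP16  ~2 bytes/param   ->  8B ~ 16 GB,  70B ~ 140 GB
//   8-bit ~1 byte/param    -> 13B ~ 13 GB,  70B ~ 70 GB
//   4-bit ~0.5 bytes/param -> 13B ~ 6.5 GB, 70B ~ 35 GB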

GPU is not the right pick for: training runs that need multi-node NCCL across 8+ GPUs (multi-node clusters are not currently in the xmrhost catalog; they are on the operator roadmap), pure CPU workloads (use /node/vps or /node/dedicated), or workloads that need the closed-weight cloud APIs (your inference runs on hardware you control; you bring the model). The trade-off vs hyperscaler GPU is documented at /playbook/ai-inference.

What you get vs cloud GPU: jurisdictional independence (Iceland sits outside the US export-control perimeter and the hyperscaler ToS gating that applies to some of these workloads), billing privacy (XMR / no-KYC vs cloud card-on-file), and a clean Linux machine with the inference stack preinstalled (vLLM, Ollama, llama.cpp, PyTorch + transformers, CUDA 12.x, cuDNN, NCCL). The operator does not run a hosted-LLM gateway; you run yours.
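
// first-boot sanity checks, assuming the preinstalled stack is on PATH and a recent vLLM; model names are placeholders, not recommendations:

$ nvidia-smi
$ ollama run llama3 "hello"
$ vllm serve <hf-model> --port 8000

// nvidia-smi confirms the driver sees the card; ollama pulls the model on first run and answers from the local GPU; vllm serve exposes an OpenAI-compatible endpoint on the given port.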

// TIER COMPARISON

$ diff /etc/xmrhost/tiers.d/*

gpu-lite (single RTX 4090 24 GB, 32 GB system RAM) fits 7B-8B models in FP16 with comfortable batch size, 13B in 8-bit, or 70B in 4-bit GGUF via llama.cpp with partial CPU offload. gpu-pro (2× RTX 4090 24 GB, 64 GB) fits 70B in 4-bit (AWQ / GPTQ) across both GPUs via vLLM tensor parallelism, or runs two separate 13B endpoints. gpu-beast (H100 80 GB SXM, 128 GB) fits 70B in FP8 on a single card (or in 4-bit with generous batch headroom), or a smaller fine-tuning run.
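
// hypothetical two-GPU launch for the pro tier, assuming a recent vLLM and an AWQ-quantized 70B checkpoint (the model path is a placeholder):

$ vllm serve <org>/<70b-awq-checkpoint> --tensor-parallel-size 2 --quantization awq

// --tensor-parallel-size 2 shards the weights across both 4090s; the beast tier's single H100 drops the flag.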

All tiers ship with the same software stack — what differs is the silicon. The operator does not artificially limit the higher tiers (no driver-level batch caps, no clock-speed throttles); the customer's vLLM / Ollama / training script runs at the hardware's documented spec.

// FAQ

$ faq -t gpu

Q.What inference engines are preinstalled?

A.vLLM (latest stable), Ollama (with the model-registry mirror configured), llama.cpp (CUDA-compiled), PyTorch with CUDA support, the HuggingFace transformers stack. Customers needing other engines (TensorRT-LLM, mlc-llm, exllama2, sglang) install via the package manager or build from source — the box is a normal Linux machine with NVIDIA driver + container runtime support.
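
// illustrative installs for engines that are not preinstalled; package names as published on PyPI at the time of writing, pin versions to taste:

$ pip install "sglang[all]"
$ pip install exllamav2

// TensorRT-LLM is heavier; NVIDIA ships containers for it, so running their image with `docker run --gpus all ...` is the usual path.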

Q.Can I serve a public LLM endpoint?

A.Yes — the AUP (/legal/aup) does not restrict serving open-weight LLM inference endpoints. The operator does not provide a hosted gateway; you front your own with Caddy / Nginx and configure your own rate-limits + authentication. The /playbook/ai-inference walkthrough covers the common deployment shape.
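
// minimal fronting sketch, assuming an OpenAI-compatible server already listening on localhost:8000 and a DNS record you control; the hostname is an example:

$ caddy reverse-proxy --from llm.example.com --to localhost:8000

// caddy obtains TLS for the --from hostname automatically. auth and rate-limits still need a full Caddyfile (or an Nginx config); the one-liner only terminates TLS and proxies.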

Q.Is the H100 SXM in the beast tier real or marketing?

A.Real, but procurement-sensitive. If the listed accelerator is unavailable at order time the operator surfaces the substitute via /contact before charging. The operator does not downgrade hardware without informing the customer; misrepresenting the spec on the order receipt is the explicit failure mode it commits to avoiding.

Q.Where is the GPU server hosted?

A.Iceland (Reykjavik). Hydroelectric / geothermal power makes Iceland the operator's preference for GPU workloads on both cost and emissions footprint; the Romania racks do not have GPU-density power / cooling provisioned. Jurisdictional posture: /location/is.

Q.Do I need to pay in Monero for GPU hosting?

A.No — same payment rails as the rest of the catalog (XMR recommended; BTC / Lightning / LTC / ETH / USDT accepted via OxaPay). If you are running an LLM endpoint, the consistent threat-model choice is XMR; the rationale at /why-monero applies.

// SEE ALSO

$ ls /usr/share/doc/xmrhost/related

// no-kyc crypto billing (xmr recommended; btc / lightning / ltc / eth / usdt accepted) — /why-monero covers the rationale, /payments the flow.