Open-source ML.
Ready to run.

Browse the Hub. Download with a click. Run any task on your own hardware: detection, segmentation, VLMs, speech, diffusion, and more.
No cloud. No upload. Your data never leaves your GPU.

DETR · SAM 2 · SigLIP 2 · Qwen-VL · Whisper · FLUX · SegFormer · Llama 3 · SDXL
LocalML
Recent sessions
detr-resnet-50
Object detection · 2m
sam2-hiera-large
Mask generation · 1h
Qwen2.5-VL-3B-Instruct
Image-text-to-text · 3h
segformer-b5-cityscapes
Segmentation · yesterday
Model Hub
Browse Hugging Face · download and run locally
All · VLM · Text · Segmentation · SAM · Detection · Diffusion · ASR
detr-resnet-50
facebook/detr-resnet-50
object-detection · transformers
Download · 159 MB
sam-vit-base
facebook/sam-vit-base
mask-generation · transformers
Open
Qwen2.5-VL-3B-Instruct
Qwen/Qwen2.5-VL-3B-Instruct
image-text-to-text · transformers
Download · 7.6 GB
segformer-b0-finetuned-ade
nvidia/segformer-b0…
image-segmentation · transformers
Download · 113 MB

Not just LLMs.

Eleven task workspaces. Every major modality.

Detection

DETR, YOLOS, RT-DETR, D-FINE, Table Transformer. Draws labeled boxes server-side.

Segmentation

SegFormer, Mask2Former, OneFormer, EoMT. Panoptic, instance, semantic — composited overlays.

Mask generation

SAM v1, SAM 2, SAM 2.1, SAM 3. Auto grid-sampling mode, full multi-region output.
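For readers curious what grid sampling means in practice, here is a minimal sketch. It assumes the common approach of placing one point prompt at the centre of each cell of a regular grid; `point_grid` and its parameters are illustrative names, not LocalML's API.

```python
def point_grid(width: int, height: int, points_per_side: int) -> list[tuple[float, float]]:
    """Return evenly spaced (x, y) pixel coordinates covering an image."""
    if points_per_side < 1:
        raise ValueError("points_per_side must be at least 1")
    # Centre each point in its grid cell so no prompt lands on the image border.
    fractions = [(i + 0.5) / points_per_side for i in range(points_per_side)]
    # Row-major order: sweep x within each row, then move down to the next row.
    return [(fx * width, fy * height) for fy in fractions for fx in fractions]


# Hypothetical 640x480 image with a 4x4 grid of prompts.
grid = point_grid(640, 480, 4)
```

Each point would then be passed to the model as a separate prompt, and the resulting masks merged into the multi-region output.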

VLMs

Qwen-VL, LLaVA, Florence-2, Moondream, PaliGemma. Ask anything about an image.

Speech

Whisper, Wav2Vec2, MMS for ASR. SpeechT5, Bark, VITS for TTS. Both directions, long-audio aware.
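One plausible reading of "long-audio aware" is overlapping chunking: the recording is split into fixed-length windows that overlap slightly so words at the boundaries are not lost. The sketch below only computes the window boundaries. The function name and the 30 s / 5 s defaults are assumptions for illustration, not documented LocalML settings.

```python
def chunk_windows(total_s: float, chunk_s: float = 30.0,
                  overlap_s: float = 5.0) -> list[tuple[float, float]]:
    """Split a recording of total_s seconds into overlapping (start, end) windows."""
    if chunk_s <= 0 or not 0 <= overlap_s < chunk_s:
        raise ValueError("need chunk_s > 0 and 0 <= overlap_s < chunk_s")
    windows = []
    start = 0.0
    step = chunk_s - overlap_s  # how far each new window advances
    while start < total_s:
        end = min(start + chunk_s, total_s)
        windows.append((start, end))
        if end >= total_s:  # the last window reached the end of the audio
            break
        start += step
    return windows


# A hypothetical 70-second clip.
windows = chunk_windows(70.0)
```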

Classification

ViT, ResNet, ConvNeXt, BEiT, SigLIP, CLIP. Image, zero-shot, audio — confidence-ranked labels.
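As a rough illustration of confidence-ranked labels, the sketch below converts raw classifier scores into probabilities with a softmax and sorts them. The label names and scores are made up, and `rank_labels` is a hypothetical helper rather than part of LocalML.

```python
import math


def rank_labels(logits: dict[str, float], top_k: int = 3) -> list[tuple[str, float]]:
    """Turn raw scores into probabilities and return the top_k, highest first."""
    # Subtract the maximum score before exponentiating for numerical stability.
    peak = max(logits.values())
    exps = {label: math.exp(score - peak) for label, score in logits.items()}
    total = sum(exps.values())
    probs = {label: value / total for label, value in exps.items()}
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)[:top_k]


# Made-up scores for three labels.
ranked = rank_labels({"cat": 2.0, "dog": 1.0, "car": -1.0}, top_k=2)
```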

Diffusion

Stable Diffusion, SDXL, FLUX, Kandinsky, PixArt. Text-to-image, img2img, inpaint.

Text generation

Llama, Mistral, Qwen, Gemma, Phi, DeepSeek. Chat-template aware, reasoning-model aware.

Embeddings

BGE, E5, Jina, Nomic, Snowflake, GTE. Dense vectors for RAG, search, similarity.
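To show how dense vectors support search and similarity, here is a small sketch that ranks stored vectors by cosine similarity to a query. The two-dimensional vectors are toy values (real embedding models emit hundreds of dimensions), and `search` is an illustrative helper, not a LocalML function.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def search(query: list[float], corpus: dict[str, list[float]],
           top_k: int = 2) -> list[tuple[str, float]]:
    """Return the top_k document ids most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Toy corpus of three hypothetical document embeddings.
corpus = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}
hits = search([1.0, 0.1], corpus)
```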

Depth

DPT, MiDaS, ZoeDepth, Depth Anything v1/v2, Depth Pro. Single image → colorized depth map.

Documents · OCR

TrOCR, Donut, LayoutLMv3, Pix2Struct. Read scanned pages, receipts, forms — ask questions about them.

Everything in the Hub, ready to run.

200+ model families, each one verified against our architecture whitelist. If it shows up in LocalML, it loads — no broken downloads, no missing packages, no guesswork.
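A whitelist check of this kind could look roughly like the sketch below. It assumes the check reads the `model_type` field that Hugging Face model repositories publish in `config.json`. The set contents and the `is_supported` helper are hypothetical, not LocalML's actual list or code.

```python
# Hypothetical allowlist; the real list is not given in this page.
SUPPORTED_MODEL_TYPES = frozenset({"detr", "segformer", "sam", "whisper"})


def is_supported(config: dict, allowed: frozenset = SUPPORTED_MODEL_TYPES) -> bool:
    """Return True if the parsed config.json declares an allowed model_type."""
    model_type = config.get("model_type")
    # Treat missing or non-string values as unsupported rather than raising.
    return isinstance(model_type, str) and model_type.lower() in allowed


# Example with minimal, made-up config dictionaries.
ok = is_supported({"model_type": "detr"})
rejected = is_supported({"model_type": "unknown_arch"})
```

Filtering on metadata before download is what lets a catalogue hide models it cannot load, rather than failing after a multi-gigabyte transfer.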

Detection

DETR · YOLOS · RT-DETR · RT-DETRv2 · D-FINE · Conditional-DETR · Deformable-DETR · Table-Transformer · OWL-ViT · OWLv2 · Grounding-DINO

Segmentation

SegFormer · MaskFormer · Mask2Former · OneFormer · EoMT · UperNet · BEiT · DPT · DETR-panoptic · MobileViT

Mask generation

SAM · SAM 2 · SAM 2.1 · SAM 3 · MedSAM

VLMs

Qwen-VL · Qwen2.5-VL · Qwen3-VL · LLaVA · LLaVA-Next · Florence-2 · Moondream · PaliGemma · Idefics2/3 · SmolVLM · Kosmos-2

Text generation

Llama 3 · Mistral · Qwen 2/3 · Gemma 2/3/4 · Phi 3/4 · DeepSeek · SmolLM · StarCoder 2 · Cohere · Granite · MiniMax

ASR · TTS

Whisper · Distil-Whisper · Wav2Vec2 · MMS · Moonshine · Parakeet · SpeechT5 · Bark · VITS

Diffusion

SD 1.5 · SD 2.1 · SDXL · SD 3 / 3.5 · FLUX.1 · Kandinsky · PixArt · Sana · Kolors

Embeddings

BGE · E5 · Jina v2/v3 · Nomic · Snowflake Arctic · GTE · mxbai · all-MiniLM

Classification

ViT · DeiT · Swin · ConvNeXt · BEiT · ResNet · EfficientNet · MobileNet · CLIP · SigLIP · SigLIP 2

Depth

DPT · GLPN · ZoeDepth · Depth Anything · Depth Anything v2 · Depth Pro · MiDaS

Documents · OCR

TrOCR · Donut · LayoutLM · LayoutLMv2 · LayoutLMv3 · Pix2Struct

Runs everywhere you do.

Native installers for Windows, macOS, and Linux. CUDA · Apple MPS · CPU.
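Supporting CUDA, Apple MPS, and CPU usually comes down to a fallback order at startup. The sketch below assumes a PyTorch backend, which this page does not state, and the function names are illustrative. It prefers CUDA, then MPS, then CPU.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Choose the fastest available backend: CUDA, then Apple MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Probe PyTorch if it is installed; otherwise fall back to CPU."""
    try:
        import torch  # assumption: the runtime is PyTorch-based
    except ImportError:
        return "cpu"
    # Older PyTorch builds may lack the MPS backend module entirely.
    mps = getattr(torch.backends, "mps", None)
    mps_ok = mps is not None and mps.is_available()
    return pick_device(torch.cuda.is_available(), mps_ok)


device = detect_device()
```

Keeping the decision in a pure function (`pick_device`) separate from the hardware probe makes the fallback order easy to test on any machine.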