
Any of you have a self-hosted AI "hub"? (e.g. for LLMs, Stable Diffusion, ...)

I've been looking into self-hosting LLMs or Stable Diffusion models using something like LocalAI and/or Ollama and LibreChat.

Some questions to get a nice discussion going:

  • Any of you have experience with this?
  • What are your motivations?
  • What are you using in terms of hardware?
  • Considerations regarding energy efficiency and associated costs?
  • What about renting a GPU? Privacy implications?
22 comments
  • I've installed Ollama on my gaming rig (RTX 4090 with 128GB of RAM), M3 MacBook Pro, and M2 MacBook Air. I'm running Open WebUI on my server, which can connect to multiple Ollama instances. Open WebUI has its own Ollama-compatible API, which I use for projects. I only boot up the gaming rig if I need to use larger models; otherwise the M3 MacBook Pro can handle most tasks.

    • Is that 128GB of VRAM? Because normal RAM doesn't matter unless you want to run the model on the CPU, which is much slower.

      • That's 128GB of RAM; the GPU has 24GB of VRAM. Ollama has gotten pretty smart with resource allocation. Smaller models fit entirely in VRAM, but I can still run larger models from system RAM.
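
        Ollama's API will actually show you that split. A minimal sketch, assuming the default endpoint on localhost:11434 and the `size` / `size_vram` fields that /api/ps reports per loaded model:

        ```python
        # Ask a local Ollama instance how much of each loaded model sits in
        # VRAM versus system RAM. Endpoint and field names are assumptions
        # based on the documented /api/ps response.
        import requests

        resp = requests.get("http://localhost:11434/api/ps", timeout=5)
        resp.raise_for_status()

        for m in resp.json().get("models", []):
            total = m.get("size", 0)        # total bytes the model occupies
            vram = m.get("size_vram", 0)    # bytes resident on the GPU
            spilled = total - vram          # remainder lives in system RAM
            pct = (vram / total * 100) if total else 0
            print(f"{m['name']}: {vram / 2**30:.1f} GiB in VRAM ({pct:.0f}%), "
                  f"{spilled / 2**30:.1f} GiB in system RAM")
        ```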

  • I have an Asus laptop with a GTX 1660 Ti (6GB VRAM). I use Jan for LLMs, though only 7B models or smaller fit on my hardware. For image generation I use Krita with the AI Image Generation plugin; most things work in it, but it fails with an 'out of VRAM' error if I try to inpaint an area larger than about 1/8 of my canvas.

  • I run Ollama on my laptop in a VM with Open WebUI. It works great, and I have plenty of models to choose from.

    I was recently playing around with TTS, and it's pretty solid as well. I'm thinking about taking the smaller Phi models and throwing them onto my Pine64 Quartz64 for a portable AI assistant while traveling. The only potential problem is processing time.
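
    For the assistant idea, something like this minimal chat loop is what I have in mind. It's only a sketch: it assumes Ollama's /api/chat endpoint on the default port and a small model tag like phi3:mini (swap in whatever you've actually pulled).

    ```python
    # Tiny chat loop against a local Ollama instance running a small Phi model.
    # Host URL and model tag are assumptions; adjust for your own setup.
    import requests

    OLLAMA_CHAT = "http://localhost:11434/api/chat"
    MODEL = "phi3:mini"   # placeholder tag for a small Phi model
    history = []

    while True:
        prompt = input("you> ").strip()
        if not prompt:
            break
        history.append({"role": "user", "content": prompt})
        resp = requests.post(
            OLLAMA_CHAT,
            json={"model": MODEL, "messages": history, "stream": False},
            timeout=300,  # a small SBC can take a while per reply
        )
        resp.raise_for_status()
        reply = resp.json()["message"]["content"]
        history.append({"role": "assistant", "content": reply})
        print("phi>", reply)
    ```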
