
Posts: 6 · Comments: 22 · Joined 4 mo. ago

I’m just here for the moral superiority. 🌱
Mainly interested in FOSS.
Currently in uni and working part-time as a developer and system administrator.

PC Specs
CPU: 7800X3D
GPU: 7900XTX
Memory: 64GB
System: Arch

  • I’m really not fond of profiling by automated means, but it seems like an inevitable consequence of the design of the threadiverse. Everything is public and easily accessible by anyone who would like to profile you.

    I certainly disapprove of moderation based on ideology. Moderation should be based on the quality of the content and whether it fits within the publicly readable rules, definitely not on some hidden analytics or on whether the user fits neatly into the moderator’s in-group.

    I will admit that this might be a good way to find and filter out LLM-based bots that are only there to promote or manipulate the conversation. But it should still be done according to public rules.

  • Is this post written by an LLM?

  • I trust them as much as Google, Meta, or any other big tech company. I won’t use their cloud services, but I do run their local models.

  • I’m no expert, but it’s basically the way to unlock higher/full bandwidth for HDMI 2.1. This allows the use of higher refresh rates, resolutions, and bit depth + HDR. Right now you need to make sacrifices in at least one category with HDMI.
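    To give a rough sense of why the full bandwidth matters, here is a back-of-the-envelope calculation of my own (illustrative only; blanking intervals and link encoding overhead are ignored, so real requirements are a bit higher):

    ```python
    # Rough pixel-rate estimate for 4K 120 Hz 10-bit RGB (illustrative numbers).
    width, height, refresh_hz, bits_per_pixel = 3840, 2160, 120, 30

    raw_gbps = width * height * refresh_hz * bits_per_pixel / 1e9
    print(f"raw pixel data: {raw_gbps:.1f} Gbit/s")  # ~29.9 Gbit/s

    # HDMI 2.0 tops out at 18 Gbit/s, HDMI 2.1 at 48 Gbit/s, so only the
    # latter has headroom for this mode without dropping refresh rate,
    # resolution, or bit depth.
    ```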

  • What is the difference between this implementation and the reverse-engineered patches that were published a few months ago by Michał Kopeć and Tomasz Pakuła?

    Edit: apparently it’s not the same patch, but Tomasz was CC’ed in the patch set so the timing might not be accidental.

  • I think I saw a similar comment on here last month. It was a user saying that Gemma claimed to send his chats to Google, which is clearly a hallucination.

    I’m not a professional or expert on anything security and/or AI related, but this is my take:

    • In general, no data will be sent anywhere if you use the big/trustworthy open-source backends.
    • Unless there are bigger security issues, the model files shouldn’t contain such code.
    • Data could be sent via MCP/tool calling, but you can see each tool call as it happens, so it can’t be hidden.

    If you really don’t trust something, you can always try to use a network sniffer; a rough sketch of that is below.
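    A minimal sketch of that idea, assuming scapy is installed and the script runs with enough privileges to capture packets (adjust the address ranges to match your own LAN):

    ```python
    from scapy.all import sniff

    # BPF filter: anything not going to loopback or the local
    # 192.168.0.0/16 network, i.e. traffic actually leaving the machine.
    FILTER = "ip and not (dst net 127.0.0.0/8 or dst net 192.168.0.0/16)"

    def report(pkt):
        # Print a one-line summary per outbound packet while the local backend runs.
        print(pkt.summary())

    sniff(filter=FILTER, prn=report, store=False)
    ```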

  • I’m European and had to do the same, so it’s based on something else.

  • I don’t know about Ubuntu specifically, but for all software I actually want to work, I wait for the first point release after a major release.

  • LocalLLaMA @sh.itjust.works

    DeepSeek-V4 Pro (1.6T-A49) and Flash (284B-A13)

    huggingface.co/collections/deepseek-ai/deepseek-v4
  • Artificial Analysis just posted their results, and there seems to be a similar increase in output token usage as with the 35B model.

  • LocalLLaMA @sh.itjust.works

    Qwen3.6 27B released

    huggingface.co/Qwen/Qwen3.6-27B
  • Ah, I don’t know anything about Windows. I’m using Linux, and both the latest ROCm (7.2.2) and latest Vulkan (26.0.5) packages work without issues for combined gaming and AI. My reported numbers were with Vulkan at zero context, for reference.

  • I've been using it for the past few days and the output quality seems to be on par with or slightly better than 3.5 27B. The biggest issue is the token usage, which has exploded with this revision: it can easily reason for 20k-25k tokens on a question where the Qwen3.5 models used 10k. Since it runs more than 3 times faster, it still finishes earlier than the 27B, but I won't have any context/VRAM left to ask multiple questions.

    Artificial Analysis has similar findings.

  • I agree with the suggestions of the other commenters; I just wanted to add that I personally run llama.cpp directly with the built-in llama-server. For a single-user server this seems to work great, and it is almost always at the forefront of model support.
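    For illustration, llama-server exposes an OpenAI-compatible HTTP API, so a quick single-user test could look roughly like this (port 8080 is the default; the prompt is just a placeholder):

    ```python
    import requests

    # Assumes llama-server was started locally, e.g. `llama-server -m model.gguf`,
    # and is listening on its default port 8080.
    resp = requests.post(
        "http://127.0.0.1:8080/v1/chat/completions",
        json={
            "messages": [{"role": "user", "content": "Say hello in one sentence."}],
            "max_tokens": 64,
        },
        timeout=120,
    )
    print(resp.json()["choices"][0]["message"]["content"])
    ```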

  • I’m running it with the UD_Q4_K_XL quant on a 24GB VRAM 7900XTX at ~120-130* tokens/s. Since it’s an MoE model, CPU inference with 32 GB of RAM should be doable, but I won’t make any promises on speed.

    *Edit: I had a configuration issue in my llama.cpp setup that reduced the performance. It was limited to 85 tk/s, but that was user error on my part.

  • LocalLLaMA @sh.itjust.works

    Qwen3.6-35B-A3B released

    huggingface.co/Qwen/Qwen3.6-35B-A3B
  • Such a huge increase compared to previous months, with most of it coming from ‘64 bit’ and ‘0 64 bit’, seems suspicious. Don’t give me false hope…

  • Can confirm that I can’t open piefed.zip from Voyager now. Lemmy.zip works fine.

  • I got some weird specialised hardware over USB working via WinBoat. Might be an option for some.

  • vegan @lemmy.world

    Joey Carbstrong 3 hour interview with Gary Yourofsky

  • Unfortunately, the AI community prefers rushed, buggy development over proper, tested releases, so the quants and maybe the PR weren’t fully working.

    As of 3 hours ago, Unsloth was still updating their quants and guide. I don’t have time to test now, but I wouldn’t judge the base model performance in the first few days while the bugs are still being worked out.

    They also recommend some unconventional parameters in the Unsloth guide.

    It could also be that the model is truly shit of course.

    Edit: I just took a look at the llama.cpp repo and there are still issues with the implementation as well.

  • LocalLLaMA @sh.itjust.works

    30B-A3B GLM-4.7-Flash Released

    huggingface.co/zai-org/GLM-4.7-Flash
  • Free Open-Source Artificial Intelligence @lemmy.world

    30B-A3B GLM-4.7-Flash Released

    huggingface.co/zai-org/GLM-4.7-Flash
  • What features are still missing after this gets merged? Right now I still use a glitchy adapter that randomly drops out every few minutes and sometimes crashes my whole Hyprland WM due to an inconsistent state when no displays are recognised/connected.

    I would love to use my GPU and display with the features I paid for.