You can now run Qwen3-Coder on your own local device!

前沿技术 · 2025-8-1 14:38:48

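Hey everyone! In case you missed it, Qwen released Qwen3-Coder, a frontier model that rivals GPT-4.1 and Claude 4 Sonnet on coding and agentic tasks. We shrank the 480B-parameter model to just 150GB (down from 512GB), and it runs with a 1M context length. If you want to run the model at near-full precision, use our Q8 quants. With 150GB of unified memory, or 135GB of RAM plus 16GB of VRAM, you get 6+ tokens/s. GGUFs to run Qwen3-Coder: https://huggingface.co/unsloth/Qwen3-Coder-480B-A35B-Instruct-GGUF Happy running! Don't forget to check out our Qwen3-Coder tutorial, which covers the optimal settings and setup for fast inference: https://docs.unsloth.ai/basics/qwen3-coder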
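For a rough sense of what those sizes mean, the figures in the post can be turned into average bits per weight. This is only back-of-the-envelope arithmetic: it assumes 1 GB = 10^9 bytes, ignores file metadata, and glosses over the fact that quantized files mix precisions across tensors. The function names are made up for illustration.

```python
def bits_per_weight(file_size_gb: float, n_params_billion: float) -> float:
    """Average bits stored per parameter (assumes 1 GB = 1e9 bytes)."""
    total_bits = file_size_gb * 1e9 * 8
    return total_bits / (n_params_billion * 1e9)


def fits(file_size_gb: float, ram_gb: float, vram_gb: float = 0.0) -> bool:
    """Crude check: does the model file fit in RAM + VRAM combined?

    Ignores KV cache and OS overhead, so treat True as optimistic.
    """
    return ram_gb + vram_gb >= file_size_gb


if __name__ == "__main__":
    print(f"150GB quant: {bits_per_weight(150, 480):.2f} bits/weight")
    print(f"512GB file:  {bits_per_weight(512, 480):.2f} bits/weight")
    print("135GB RAM + 16GB VRAM holds 150GB:", fits(150, ram_gb=135, vram_gb=16))
```

By this estimate the 150GB file averages about 2.5 bits per weight, which is why several replies below ask about quality loss at low-bit quants.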

热美云子 · 2025-8-1 14:45:25

Good to hear. I'm asking because I'm planning to build an LLM system around a Strix Halo chip, which is capped at 128GB of memory. Thanks!

帅蓝树儿 · 2025-8-1 14:47:54

Thanks!

kui2004 · 2025-8-1 14:52:07

Would appreciate it.

fdsgsg · 2025-8-1 16:01:10

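I'm a newbie too. When I tried to pull the model in Open WebUI with this, I got the error below. I'm on the latest Ollama main branch. hf.co/unsloth/Qwen3-Coder-480B-A35B-Instruct-GGUF pull model manifest: 400: {"error":"The specified repository contains sharded GGUF. Ollama does not support this yet. Follow this issue for more info: https://github.com/ollama/ollama/issues/5245"}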
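For anyone hitting the same error: a "sharded" GGUF is one model split across several files. A minimal sketch of how to spot shards in a file listing is below, assuming the naming pattern used by llama.cpp's split tool (`<base>-00001-of-0000N.gguf`). The helper names and example filenames are hypothetical, not taken from the repository.

```python
import re

# llama.cpp's split tool names shards like "model-00001-of-00004.gguf".
SHARD_RE = re.compile(r"^(?P<base>.+)-(?P<idx>\d{5})-of-(?P<total>\d{5})\.gguf$")


def group_shards(filenames):
    """Map each base name to its sorted shard files; unsharded files are skipped."""
    groups = {}
    for name in filenames:
        m = SHARD_RE.match(name)
        if m:
            groups.setdefault(m.group("base"), []).append(name)
    return {base: sorted(files) for base, files in groups.items()}


def is_complete(files):
    """True if every shard index from 1..total is present exactly once."""
    total = int(SHARD_RE.match(files[0]).group("total"))
    seen = sorted(int(SHARD_RE.match(f).group("idx")) for f in files)
    return seen == list(range(1, total + 1))


if __name__ == "__main__":
    listing = [
        "example-Q2_K_XL-00001-of-00002.gguf",  # hypothetical filenames
        "example-Q2_K_XL-00002-of-00002.gguf",
        "small-model.gguf",
    ]
    for base, files in group_shards(listing).items():
        print(base, "complete" if is_complete(files) else "missing shards")
```

If a listing shows shards like these, the error above is expected from Ollama for now. As far as I know, llama.cpp itself can be pointed at the first shard and will pick up the rest.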

绿望光 · 2025-8-1 16:29:06

What would the context length be? I have exactly the same setup.

凝视 · 2025-8-1 16:38:06

How much quality do you lose at such a small quant? Is it still usable? How does it compare with, say, Llama 3.3 70B?

酷紫电子 · 2025-8-1 16:55:07

Do you think it's worth trying on an RX 7900 XTX with 24GB of VRAM plus 128GB of RAM (which gets me to roughly 150GB in total)? Or would it be unbearably slow for real coding work?

zsz8868 · 2025-8-4 13:39:07

Just tried hf.co/unsloth/Llama-3.3-70B-Instruct-GGUF:IQ2_XXS
This model and quant work really well. I can't believe I've been overlooking small quants all this time. Thanks again.

wuchao · 2025-8-4 13:54:06

Wow! That's amazing. From 512GB down to 150GB without losing anything? Impressive! I'll have to see it to believe it!

橙花月 · 2025-8-4 15:08:07

Thanks for trying them, much appreciated, and thanks for the support. By the way, we generally recommend Q2_K_XL and above.

xinzhyu · 2025-8-4 15:37:11

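It depends on how serious you are, but the dense part of an MoE (mixture-of-experts) model is usually designed to fit in 24GB of VRAM, while the experts go in system RAM (CPU memory). So the difference between Scout and Maverick comes down to how much system RAM they need. Your bottleneck will probably still be the CPU and memory bus speed, so maximizing CPU and memory bandwidth will boost performance a lot. This is where more memory channels (4, 8, or 12) significantly increase your bandwidth. The CPU has to keep up too, though, so find out which CPU features accelerate inference and buy a part with the best acceleration for it. The more cores with the relevant instruction sets, the better.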
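The bandwidth point can be made concrete with a rough estimate. The sketch below assumes a 64-bit (8-byte) bus per memory channel and that every active weight is read from RAM once per generated token; real throughput will be lower. The 35B active-parameter figure comes from the model name (A35B), and 2.5 bits per weight is an assumed average for the 150GB quant (150GB spread over 480B parameters). The DDR5 speeds are illustrative, not measurements.

```python
def peak_bandwidth_gbs(channels: int, mt_per_s: float, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s.

    Assumes a 64-bit (8-byte) bus per channel.
    """
    return channels * mt_per_s * 1e6 * bus_bytes / 1e9


def max_tokens_per_s(bandwidth_gbs: float, active_params_billion: float,
                     bits_per_weight: float) -> float:
    """Upper bound on decode speed if each active weight is read once per token."""
    bytes_per_token = active_params_billion * 1e9 * bits_per_weight / 8
    return bandwidth_gbs * 1e9 / bytes_per_token


if __name__ == "__main__":
    for channels, speed in [(2, 5600), (8, 4800), (12, 4800)]:
        bw = peak_bandwidth_gbs(channels, speed)
        tps = max_tokens_per_s(bw, active_params_billion=35, bits_per_weight=2.5)
        print(f"{channels:>2} ch @ {speed} MT/s: {bw:6.1f} GB/s, <= {tps:5.1f} tok/s")
```

Under these assumptions the ceiling scales linearly with channel count, which is the argument for 8- or 12-channel platforms when most of the model lives in system RAM.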

lhczyc · 2025-8-5 08:00:14

Yes

稍息立正 · 2025-8-6 10:55:22

Awesome

让爱飞翔 · 2025-8-6 11:17:14

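As someone who's new to this and trying to run these models locally, what kind of hardware do I need? I keep hearing that the 3090 is a good card for this sort of thing. Would pairing it with a lot of RAM work? Is there a minimum CPU requirement? Is an i9 OK, and if so, which generation?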

热树虎 · 2025-8-6 11:26:16

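I'm trying to find a CLI tool that can also use this model on a Mac Studio (M3 Ultra). So far, for whatever reason, Opencode just breaks with it. I'm serving the model through LM Studio using MLX. Qwen Code (a fork of Gemini CLI) barely works: it errors out a lot, messes up tool use, and is very slow.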

123sdf123sd · 2025-8-6 13:29:05

I think this is the right one. Try it.
{%- if tools %}
    {{- '<|im_start|>system\n' }}
    {%- if messages[0].role == 'system' %}
        {{- messages[0].content + '\n\n' }}
    {%- endif %}
    {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
    {%- for tool in tools %}
        {{- "\n" }}
        {{- tool | tojson }}
    {%- endfor %}
    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
{%- else %}
    {%- if messages[0].role == 'system' %}
        {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
    {%- endif %}
{%- endif %}
{#- Walk the messages backwards to find the last real user query (not a tool response). #}
{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
{%- for forward_message in messages %}
    {%- set index = (messages|length - 1) - loop.index0 %}
    {%- set message = messages[index] %}
    {%- set current_content = message.content if message.content is defined and message.content is not none else '' %}
    {%- set tool_start = '<tool_response>' %}
    {%- set tool_start_length = tool_start|length %}
    {%- set start_of_message = current_content[:tool_start_length] %}
    {%- set tool_end = '</tool_response>' %}
    {%- set tool_end_length = tool_end|length %}
    {%- set start_pos = (current_content|length) - tool_end_length %}
    {%- if start_pos < 0 %}
        {%- set start_pos = 0 %}
    {%- endif %}
    {%- set end_of_message = current_content[start_pos:] %}
    {%- if ns.multi_step_tool and message.role == "user" and not(start_of_message == tool_start and end_of_message == tool_end) %}
        {%- set ns.multi_step_tool = false %}
        {%- set ns.last_query_index = index %}
    {%- endif %}
{%- endfor %}
{%- for message in messages %}
    {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
        {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
    {%- elif message.role == "assistant" %}
        {%- set m_content = message.content if message.content is defined and message.content is not none else '' %}
        {%- set content = m_content %}
        {%- set reasoning_content = '' %}
        {%- if message.reasoning_content is defined and message.reasoning_content is not none %}
            {%- set reasoning_content = message.reasoning_content %}
        {%- else %}
            {%- if '</think>' in m_content %}
                {%- set content = (m_content.split('</think>')|last).lstrip('\n') %}
                {%- set reasoning_content = (m_content.split('</think>')|first).rstrip('\n') %}
                {%- set reasoning_content = (reasoning_content.split('<think>')|last).lstrip('\n') %}
            {%- endif %}
        {%- endif %}
        {#- Only turns after the last user query keep their <think> block. #}
        {%- if loop.index0 > ns.last_query_index %}
            {%- if loop.last or (not loop.last and (not reasoning_content.strip() == '')) %}
                {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
            {%- else %}
                {{- '<|im_start|>' + message.role + '\n' + content }}
            {%- endif %}
        {%- else %}
            {{- '<|im_start|>' + message.role + '\n' + content }}
        {%- endif %}
        {%- if message.tool_calls %}
            {%- for tool_call in message.tool_calls %}
                {%- if (loop.first and content) or (not loop.first) %}
                    {{- '\n' }}
                {%- endif %}
                {%- if tool_call.function %}
                    {%- set tool_call = tool_call.function %}
                {%- endif %}
                {{- '<tool_call>\n{"name": "' }}
                {{- tool_call.name }}
                {{- '", "arguments": ' }}
                {%- if tool_call.arguments is string %}
                    {{- tool_call.arguments }}
                {%- else %}
                    {{- tool_call.arguments | tojson }}
                {%- endif %}
                {{- '}\n</tool_call>' }}
            {%- endfor %}
        {%- endif %}
        {{- '<|im_end|>\n' }}
    {%- elif message.role == "tool" %}
        {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
            {{- '<|im_start|>user' }}
        {%- endif %}
        {{- '\n<tool_response>\n' }}
        {{- message.content }}
        {{- '\n</tool_response>' }}
        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
            {{- '<|im_end|>\n' }}
        {%- endif %}
    {%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
    {{- '<|im_start|>assistant\n' }}
    {%- if enable_thinking is defined and enable_thinking is false %}
        {{- '<think>\n\n</think>\n\n' }}
    {%- endif %}
{%- endif %}
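To check a template like this against what a client actually sends, it helps to know the expected output for the simplest case. The sketch below hand-writes, in plain Python, what I read the template above as producing for system and user turns only, with no tools and with `enable_thinking` left undefined. It is an illustration of the ChatML-style framing, not a replacement for rendering the real template, and `render_chatml` is a made-up name.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Mimic the template's output for plain system/user turns.

    Does not handle tools, assistant turns, or tool responses.
    """
    out = []
    for msg in messages:
        if msg["role"] not in ("system", "user"):
            raise ValueError("this sketch only handles system and user turns")
        # Each turn is framed as <|im_start|>role\ncontent<|im_end|>\n
        out.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Leave the assistant turn open so the model continues from here.
        out.append("<|im_start|>assistant\n")
    return "".join(out)


if __name__ == "__main__":
    chat = [
        {"role": "system", "content": "You are a coding assistant."},
        {"role": "user", "content": "Write a hello world in C."},
    ]
    print(render_chatml(chat))
```

If the prompt your server builds for a two-message chat differs from this framing, the template being applied is probably not the one you expect.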

heronylee · 2025-8-6 13:29:19

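I might try these options... Does my RTX Pro 6000 with 96GB of VRAM stand a chance here? 35B active parameters sounds well within its reach, but 480B doesn't. What's the best way for me to run this model? Would merging the files help? And is there a reason to use llama.cpp directly instead of Ollama? Thanks!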

慢情星 · 2025-8-6 15:35:35

Doesn't it get over 100 on Linux?

冰河世纪 · 2025-8-7 09:42:28

Thanks for sharing

有块砖头 · 2025-8-8 08:01:33

Sorry for the noob question, but can we use this with LM Studio or Ollama?

xiaojin · 2025-8-9 15:20:30

When will LocalLLaMA and LocalLLM get back to models we can actually run locally? It's all marketing these days...

𠀡生悪忎 · 2025-8-9 15:51:03

Oops, not sure why the link didn't work, but here it is: https://huggingface.co/unsloth/Qwen3-Coder-480B-A35B-Instruct-GGUF/discussions/8

忄_bEZgX · 2025-8-10 11:26:32

Have you been downloading the Unsloth models, or something else? :)

黑乐鱼 · 2025-8-11 09:26:40

Smaller models. You can run any model with 16B parameters or fewer. You can see all the models we support here: https://docs.unsloth.ai/get-started/all-our-models

L_LiBIy · 2025-8-15 15:03:01

Thanks!

扯淡 · 2025-8-18 09:30:38

How does it compare with a high-precision Qwen3-32B? I used to think models below 4-bit lose quite a lot.
