How to Effectively Run Autonomous Long-Running Coding Agents: Opus 4.8 + GPT-5.5 in Practice

Why it was selected

Opus 4.8 for planning + GPT-5.5 for execution: a practical recipe for long-running agents

AI summary

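In a recorded discussion, Elvis shared his experience running autonomous long-running coding agents. He notes that most models struggle to coordinate long-running work and tend to pause early or engage in reward hacking. He recommends using Opus 4.8 for planning and GPT-5.5 for execution, with models such as Deepseek, Qwen, and Kimi as evaluators. He also stresses that multimodal goals are more effective than plain-text goals and help the agent stay on track.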

Original post · elvis

Find the recording here: https://t.co/cWgyDGjqCG More coming next week.

Quoted tweet, elvis @omarsar0:

How to effectively run autonomous long-running coding agents?

This is one of the most exciting discussions on agents I've ever had. I recorded it and am making it freely available. (bookmark it)

The idea of autonomous long-running agents is a real thing. We talk about lots of things like /goal, /loop, and dynamic workflows, and what comes next.

One interesting discussion was around how to make the agent run for longer while ensuring it stays on track. Most models today will struggle to coordinate work effectively. They sometimes pause the work early. Lots of mistakes happen, and lots of weird shortcuts (reward hacking).

What helps is to be extremely clear about the goals it needs to achieve. To clarify the dos and don'ts clearly. Eliminate any assumptions you think the model would make. Deep expertise matters so much in this. But you can get far through careful planning.

My formula currently is to use Opus 4.8 for planning carefully and GPT-5.5 for all executions. For the evaluator (via /goal), I am often using something like Deepseek or the latest models from Qwen, Kimi, and MiniMax, etc.

Another insight we discussed to enforce goals is to provide strong visual cues for the agent to compare with. I found that a multimodal goal is a much stronger goal than a plain text one. And use agents to help you set clear goals.

Watch here: academy.dair.ai/events/cmplo7v…
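The planner / executor / evaluator split described in the post can be sketched roughly as below. This is a minimal illustration under stated assumptions, not elvis's actual setup: `Goal`, `Verdict`, `run_long_task`, and the three stub functions are hypothetical names, and the stubs stand in for real model calls (a planning model, an execution model, and a separate evaluator model). The sketch only shows the control flow: explicit dos and don'ts travel with the goal, an independent evaluator decides when the work is done, and its feedback goes back to the planner so the agent does not stop early.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Goal:
    """Hypothetical goal record: explicit dos and don'ts travel with the task."""
    description: str
    dos: List[str] = field(default_factory=list)
    donts: List[str] = field(default_factory=list)
    feedback: List[str] = field(default_factory=list)  # evaluator notes so far


@dataclass
class Verdict:
    done: bool
    feedback: str = ""


# Each role is just a callable here; in practice each would wrap a different model.
Planner = Callable[[Goal], List[str]]
Executor = Callable[[str], str]
Evaluator = Callable[[Goal, List[str]], Verdict]


def run_long_task(goal: Goal, plan: Planner, execute: Executor,
                  evaluate: Evaluator, max_rounds: int = 5) -> Tuple[bool, List[str], int]:
    """Plan, execute every step, then let a separate evaluator judge the goal."""
    results: List[str] = []
    for round_no in range(1, max_rounds + 1):
        for step in plan(goal):
            results.append(execute(step))
        verdict = evaluate(goal, results)
        if verdict.done:
            return True, results, round_no
        # Feed the evaluator's objection back so the next plan addresses it.
        goal.feedback.append(verdict.feedback)
    return False, results, max_rounds


# Stub roles (assumptions for illustration only, no real model is called).
def stub_plan(goal: Goal) -> List[str]:
    if goal.feedback:
        return ["fix: " + goal.feedback[-1]]
    return ["write tests", "implement feature"]


def stub_execute(step: str) -> str:
    return "done: " + step


def stub_evaluate(goal: Goal, results: List[str]) -> Verdict:
    if any(r.startswith("done: fix:") for r in results):
        return Verdict(True)
    return Verdict(False, "tests fail")


if __name__ == "__main__":
    goal = Goal("ship feature", dos=["run the test suite"], donts=["delete failing tests"])
    print(run_long_task(goal, stub_plan, stub_execute, stub_evaluate))
```

Keeping the evaluator separate from the executor is the point of the design: the model doing the work is not the one declaring the goal met, which is one way to make early stops and reward-hacking shortcuts easier to catch.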

