Red Team Skill: In-Depth Analysis Report

weixin_45880675 · Published 2026-07-02 18:45:22

Subject of analysis: https://clawhub.ai/retrodigio/skills/red-team

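1. Author Background

retrodigio is an active developer and security researcher:

Project	Focus	Notes
clawster	Agent orchestrator that manages a cluster of Claude Code agents through Telegram	Core project
claude-channel-slack ⭐2	Slack channel plugin that bridges to Claude Code	Best community reception
mjproxy ⭐1	Self-hosted MidJourney API proxy	Practical utility
letta-claude-proxy ⭐1	Proxy for Letta Server	AI infra
openclaw-skills	Custom OpenClaw skill repository	Source of this skill
claude-code-plugins	Claude Code plugin marketplace	Ecosystem contribution
gastown (private)	Multi-agent workspace manager	Possibly a production project
ralph-tui (private)	TUI tool	-

Key finding: retrodigio works at the intersection of agent orchestration, the Claude Code ecosystem, and security. retrodigio and FletcherFrimpong may be the same person or team: the two clawhub accounts, fletcherfrimpong and retrodigio, each published a red-team skill under the same name, and retrodigio's GitHub account is relatively new (registered 2024-09).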

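2. Skill Capability Overview

2.1 Core Positioning

An adversarial multi-agent debate engine: several AI agents holding different stances debate a question, and their arguments are synthesized into a decision recommendation.

This is not a security testing tool in the traditional sense. It is a framework for stress-testing decisions.

2.2 Capability Matrix

Capability dimension	Capability	Implementation
👤 Role simulation	12 preset personas	`PERSONAS` dictionary, each entry with its own system prompt
👤 Role simulation	Custom persona extension	`--custom-personas` loads a JSON file
💬 Multi-round debate	N critique rounds	`Critique Round 1..N`; in each round agents critique one another and revise
💬 Multi-round debate	Position revision	Positions are adjusted after each round based on the critiques
📊 Quantitative scoring	Conviction score (0-100)	Each agent scores every position
📊 Quantitative scoring	Risk matrix	Produced in the synthesis stage
📝 Structured output	Final decision report	Executive summary + consensus + disagreements + risks + recommendations
🔌 Multiple backends	Claude/Codex/Gemini CLI	Three switchable backends
📁 Context injection	Context file support	Passed via `--context-file`
🚀 Fast mode	Single-round quick check	`--rounds 1`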
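
The conviction scoring row can be pictured as a small score matrix. Below is a minimal sketch, assuming each agent returns an integer from 0 to 100 for every position. The function name `aggregate_conviction` and the plain averaging rule are illustrative assumptions and are not taken from red-team.py.

```python
from statistics import mean

def aggregate_conviction(scores: dict[str, dict[str, int]]) -> dict[str, float]:
    """Average the 0-100 scores that every agent gave to each position.

    scores[rater][owner] is the rater's confidence in the owner's position.
    """
    collected: dict[str, list[int]] = {}
    for rater, given in scores.items():
        for owner, value in given.items():
            if not 0 <= value <= 100:
                raise ValueError(f"{rater} gave {owner} an out-of-range score: {value}")
            collected.setdefault(owner, []).append(value)
    return {owner: round(mean(values), 1) for owner, values in collected.items()}

# Hypothetical scores for a three-persona debate.
example = {
    "bull": {"bull": 80, "bear": 40, "operator": 70},
    "bear": {"bull": 30, "bear": 90, "operator": 60},
    "operator": {"bull": 50, "bear": 65, "operator": 85},
}
averages = aggregate_conviction(example)
ranking = sorted(averages.items(), key=lambda item: item[1], reverse=True)
print(ranking)
```

Averaging is only one possible rule; excluding each agent's score for its own position would be an equally reasonable variant.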

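2.3 Persona List in Detail

Persona key	Name	Type	Typical scenarios
`bull`	The Bull	Optimist	Product launches, investment
`bear`	The Bear	Risk-averse	Security audits, risk control
`contrarian`	The Contrarian	Contrarian thinking	Breaking groupthink
`operator`	The Operator	Pragmatist	Execution assessment, rollout analysis
`economist`	The Economist	Macro view	Investment, strategy
`local-realist`	The Local Realist	Local view	Local deployment, regional decisions
`cash-flow`	The Cash Flow Analyst	Financial view	Cost analysis, ROI
`regulator`	The Regulator	Compliance view	Compliance audits
`technologist`	The Technologist	Technical view	Technology selection, architecture assessment
`customer`	The Customer	Customer view	Product, requirements
`ethicist`	The Ethicist	Ethics view	AI ethics, data privacy
`historian`	The Historian	Historical view	Drawing on historical patterns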

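2.4 Output Flow

Step 1: Initial Proposals
  ↓ Each agent independently states its initial position
Step 2: Critique Rounds (x N)
  ↓ Agents critique one another and revise their positions
Step 3: Refinement
  ↓ Each agent updates its position based on the critiques
Step 4: Conviction Scoring
  ↓ Each agent scores every position
Step 5: Synthesis
  ↓ The final decision report is synthesized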
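
The five steps can be sketched as a single function with an injectable backend. This is a simplified illustration under stated assumptions, not the code in red-team.py: the `Backend` callable, the prompt wording, and the stub `fake_backend` are all hypothetical, and the critique and refinement steps are folded into one call per agent per round.

```python
from typing import Callable

# Hypothetical interface: (system_prompt, user_prompt) -> reply text.
Backend = Callable[[str, str], str]

def run_debate(question: str, personas: dict[str, str], rounds: int, ask: Backend) -> dict:
    """Walk the five steps of the flow using whatever backend is passed in."""
    # Step 1: every agent answers independently.
    positions = {key: ask(system, f"Question: {question}\nState your initial position.")
                 for key, system in personas.items()}
    history = [dict(positions)]

    # Steps 2-3: each round, agents read the others' latest positions, critique, and revise.
    for _ in range(rounds):
        revised = {}
        for key, system in personas.items():
            others = "\n".join(f"[{k}] {v}" for k, v in positions.items() if k != key)
            revised[key] = ask(system, f"Question: {question}\nYour position: {positions[key]}\n"
                                       f"Other positions:\n{others}\nCritique them and revise yours.")
        positions = revised
        history.append(dict(positions))

    # Step 4: every agent scores all final positions.
    listing = "\n".join(f"[{k}] {v}" for k, v in positions.items())
    scores = {key: ask(system, f"Score each position 0-100:\n{listing}")
              for key, system in personas.items()}

    # Step 5: one synthesis call produces the report.
    report = ask("You are a neutral synthesizer.",
                 f"Question: {question}\nFinal positions:\n{listing}\nScores: {scores}")
    return {"history": history, "scores": scores, "report": report}

# Stub backend so the sketch runs without any CLI installed.
calls: list[str] = []
def fake_backend(system: str, prompt: str) -> str:
    calls.append(system)
    return f"reply #{len(calls)}"

result = run_debate("Ship feature X?",
                    {"bull": "You are The Bull.", "bear": "You are The Bear."},
                    rounds=2, ask=fake_backend)
print(len(calls), len(result["history"]))
```

With P personas and N rounds this shape makes P * (N + 2) + 1 backend calls, which is worth keeping in mind when choosing `--rounds` against a slow or metered CLI.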

3. Technical Implementation

3.1 Architecture

┌──────────────────────────────────────┐
│  red-team.py (main engine)           │
│                                      │
│  ┌─ 1. Load PERSONAS ─────────────┐  │
│  │  (built-in + custom)           │  │
│  └────────────────────────────────┘  │
│                                      │
│  ┌─ 2. Call CLI backend ──────────┐  │
│  │  claude/codex/gemini --print   │  │
│  │  via subprocess.Popen          │  │
│  └────────────────────────────────┘  │
│                                      │
│  ┌─ 3. Debate loop ───────────────┐  │
│  │  Round 1..N:                   │  │
│  │  - each agent reads the        │  │
│  │    previous round's output     │  │
│  │  - outputs critique and        │  │
│  │    revised position            │  │
│  └────────────────────────────────┘  │
│                                      │
│  ┌─ 4. Synthesizer ───────────────┐  │
│  │  produces a structured         │  │
│  │  Markdown report               │  │
│  └────────────────────────────────┘  │
└──────────────────────────────────────┘

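3.2 Dependencies

Python 3 (standard library only, no third-party dependencies)
claude CLI / codex CLI / gemini CLI (any one of the three)
The CLI's --print mode is invoked through subprocess.Popen to obtain plain-text output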
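
A call of this kind might look like the sketch below. The helper name `run_cli` and the choice to send the prompt on stdin are assumptions; the source does not say whether red-team.py passes the prompt on stdin or as an argument. To keep the example runnable without any backend installed, a child Python process stands in for the real CLI.

```python
import subprocess
import sys

def run_cli(argv: list[str], prompt: str, timeout: float = 120.0) -> str:
    """Send `prompt` on stdin to a CLI process and return its stdout as text."""
    proc = subprocess.Popen(argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, text=True)
    try:
        out, err = proc.communicate(prompt, timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()          # do not leave a hung backend process behind
        proc.communicate()   # reap the child and drain its pipes
        raise
    if proc.returncode != 0:
        raise RuntimeError(f"{argv[0]} exited with {proc.returncode}: {err.strip()}")
    return out.strip()

# Stand-in for a real backend: a child Python process that upper-cases stdin.
stand_in = [sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"]
print(run_cli(stand_in, "is this plan safe?"))
```

Checking the return code and enforcing a timeout matters here because one stalled or failed backend call would otherwise silently corrupt every later round of the debate.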

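3.3 Debate Mechanism

Key design highlight: each agent can see the other agents' output from the previous round and builds its critique and revision on it. This is not a simple "everyone says one line" exchange. Instead:

Agent A outputs its initial position
Agent B reads A's position and critiques it
Agent B also rereads its own position from the previous round and self-corrects
A judge's view (final round) reviews the entire debate history and gives the final ruling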
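
The final ruling depends on the judge seeing the whole debate, so the history has to be flattened into one prompt. The following sketch shows one way to do that; `format_transcript`, `build_judge_prompt`, and the prompt wording are hypothetical names and text, not taken from the skill.

```python
def format_transcript(history: list[dict[str, str]]) -> str:
    """Render every round of every agent as one plain-text transcript."""
    lines = []
    for round_no, positions in enumerate(history):
        label = "Initial proposals" if round_no == 0 else f"Critique round {round_no}"
        lines.append(f"== {label} ==")
        for agent, text in positions.items():
            lines.append(f"[{agent}] {text}")
    return "\n".join(lines)

def build_judge_prompt(question: str, history: list[dict[str, str]]) -> str:
    """Wrap the transcript in a prompt for the final adjudication step."""
    return (f"Question: {question}\n\n{format_transcript(history)}\n\n"
            "Review the full debate above and give a final ruling.")

# Hypothetical two-agent debate: initial proposals plus one critique round.
history = [
    {"bull": "Ship now.", "bear": "Too risky."},
    {"bull": "Ship behind a flag.", "bear": "Acceptable with a rollback plan."},
]
transcript = format_transcript(history)
print(build_judge_prompt("Ship feature X?", history))
```

Labeling each round lets the judge see how positions moved, not just where they ended up.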

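4. Applicable Scenarios

✅ Best suited for

Scenario	Why use it	Recommended personas
Product decisions: "Should we ship this feature?"	Multi-perspective stress test	bull, bear, customer, operator
Security strategy: "What are the hidden risks in this architecture?"	Attacker vs. defender confrontation	bear, regulator, technologist
Technology selection: "MySQL vs PostgreSQL?"	Multi-dimensional evaluation	operator, economist, technologist
Investment analysis: "Should we make this investment?"	Financial + risk perspectives	bull, bear, cash-flow, economist
Compliance assessment: "Can we pass an ISO audit?"	Regulatory + operational perspectives	regulator, operator, bear
Strategic direction: "Should we pivot to SaaS?"	Macro + historical perspectives	economist, historian, contrarian

❌ Not suited for

Simple factual questions ("What is today's date?")
Time-sensitive emergencies ("The server is down, what do we do?")
Decisions that have already been made (after-the-fact validation is not stress testing)
Purely personal, emotional choices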

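5. Learning and Application Plan

Phase 1: Quick Start (15 minutes)

# 1. Confirm the script exists
ls -la ~/.openclaw/workspace/skills/red-team/scripts/red-team.py

# 2. List all personas
python3 ~/.openclaw/workspace/skills/red-team/scripts/red-team.py --list-personas

# 3. Run a quick test (1 round)
python3 ~/.openclaw/workspace/skills/red-team/scripts/red-team.py \
  -q "Does this intranet test machine need its firewall fixed?" \
  -p "bull,bear,operator" \
  -r 1

⚠️ A claude/codex/gemini CLI backend is required. None was installed in the environment used for this analysis, so the script could not be run directly. Install one first:
npm i -g @anthropic-ai/claude-code   # or use codex, gemini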

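Phase 2: Practical Scenario Templates (by Domain)

Scenario A: Server security decision

python3 red-team.py \
  -q "10.36.72.118 currently scores 3/10 on the security baseline. Should an urgent fix for NOPASSWD sudo take priority over the firewall?" \
  -p "bear,operator,regulator" \
  -r 2 \
  -o assessment/red-team-sudo-vs-firewall.md

Scenario B: Product Go/No-Go

python3 red-team.py \
  -q "Should we productize our internal tool and take it to market?" \
  -p "bull,customer,operator,technologist" \
  -r 3 \
  -c product-context.md \
  -o decision/product-go-decision.md

Scenario C: Technical architecture review

python3 red-team.py \
  -q "Is migrating from a monolith to a microservices architecture worth it?" \
  -p "operator,economist,technologist,historian" \
  -r 2 \
  -c current-architecture.md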

Phase 3: Custom Personas (Advanced)

{
  "pentester": {
    "name": "The Pentester",
    "description": "Security-first, finds vulnerabilities in every plan",
    "system": "You are The Pentester — you see every idea through a security lens. You identify attack vectors, data exposure risks, authentication flaws, and compliance gaps. You don't trust default assumptions about security."
  },
  "ops-engineer": {
    "name": "The Ops Engineer",
    "description": "Runtime reality check, maintenance burden",
    "system": "You are The Ops Engineer — you think about what happens AFTER deployment. Monitoring, logging, backups, incident response, on-call burden. You've seen too many systems that look good on paper but fall apart in production."
  }
}
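
A file like this has to be validated and merged with the built-in set before a debate starts. The sketch below illustrates one way to do that. The required fields mirror the three keys used in the JSON above; the function name `merge_personas`, the override behavior on key collisions, and the placeholder built-in entry are assumptions rather than the behavior of `--custom-personas` itself.

```python
import json

REQUIRED_FIELDS = ("name", "description", "system")

def merge_personas(builtin: dict[str, dict], custom_json: str) -> dict[str, dict]:
    """Validate custom personas and layer them over the built-in ones."""
    custom = json.loads(custom_json)
    if not isinstance(custom, dict):
        raise ValueError("custom personas must be a JSON object keyed by persona name")
    for key, persona in custom.items():
        if not isinstance(persona, dict):
            raise ValueError(f"persona {key!r} must be a JSON object")
        missing = [field for field in REQUIRED_FIELDS if not persona.get(field)]
        if missing:
            raise ValueError(f"persona {key!r} is missing: {', '.join(missing)}")
    # Assumption: a custom key that collides with a built-in one replaces it.
    return {**builtin, **custom}

# Placeholder built-in entry; the real system prompt is not reproduced here.
builtin = {"bear": {"name": "The Bear", "description": "Risk-averse",
                    "system": "You are The Bear."}}
custom_json = ('{"pentester": {"name": "The Pentester", '
               '"description": "Security-first", '
               '"system": "You are The Pentester."}}')
personas = merge_personas(builtin, custom_json)
print(sorted(personas))
```

Failing fast on a missing `system` field is the important part: a persona without a system prompt would otherwise join the debate as a generic, stance-free agent.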

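Phase 4: Integration with Existing Skills

Existing skill	Integration
`cyber-security-engineer`	Feed scan results to red-team as context and have 3 agents debate "Which vulnerability should be fixed first?"
`github-research`	Search for competitors and open-source alternatives, then feed the results into the debate as context
Day-to-day decisions	The agent automatically submits questions that require judgment to a red-team debate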
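
The first integration could be wired up roughly as follows: write the scan findings to a Markdown file, then pass that file with `-c`. This is a sketch under assumptions. The findings schema, the file name `scan-context.md`, and both helper names are hypothetical; only the short flags `-q`, `-p`, `-r`, and `-c` come from the templates in Phase 2. The command is built and printed, not executed.

```python
import shlex
import tempfile
from pathlib import Path

def write_context(findings: list[dict], directory: Path) -> Path:
    """Dump scan findings into a Markdown file usable as debate context."""
    lines = ["# Scan findings", ""]
    for item in findings:
        lines.append(f"- [{item['severity']}] {item['title']}")
    path = directory / "scan-context.md"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path

def build_command(question: str, personas: list[str], rounds: int, context: Path) -> list[str]:
    """Assemble the argv using the short flags from the Phase 2 templates."""
    return ["python3", "red-team.py", "-q", question, "-p", ",".join(personas),
            "-r", str(rounds), "-c", str(context)]

# Hypothetical scanner output; the real schema is not given in the source.
findings = [
    {"severity": "high", "title": "NOPASSWD sudo enabled"},
    {"severity": "medium", "title": "Host firewall disabled"},
]
with tempfile.TemporaryDirectory() as tmp:
    ctx = write_context(findings, Path(tmp))
    cmd = build_command("Which finding should be fixed first?",
                        ["bear", "operator", "regulator"], 2, ctx)
    print(shlex.join(cmd))
```

Passing the command as a list, and quoting it with `shlex.join` only for display, avoids shell-injection problems when the question text comes from another tool.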

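6. Summary Evaluation

Dimension	Rating	Notes
Design concept	⭐⭐⭐⭐⭐	Adversarial multi-agent debate is an effective way to improve decision quality
Implementation quality	⭐⭐⭐⭐	Python standard library only, but depends on an external CLI backend
Ease of use	⭐⭐⭐⭐⭐	Simple interface with a complete set of preconfigured personas
Extensibility	⭐⭐⭐⭐	Custom personas are supported via JSON, but the debate flow cannot be modified
Practical value	⭐⭐⭐⭐⭐	Usable for security decisions, technology selection, and compliance assessment
Current availability	⭐⭐	A claude/codex/gemini CLI must be installed before it can run

One-line verdict: an elegantly designed multi-agent framework for stress-testing decisions, bringing "red team thinking" from the security field into general decision-making.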
