# ADD-0001: Template
| Metadata | Value |
|---|---|
| Status | Draft \| In Review \| Approved \| Deployed \| Deprecated |
| Created | YYYY-MM-DD |
| Author(s) | @username |
| RFC | RFC-NNNN (if applicable) |
| Model | GPT-4o \| Claude 3.5 Sonnet \| etc. |
## Summary
One paragraph description: What does this agent do? Who is it for?
## Agent Persona

### Identity

```
Name: [Agent Name]
Role: [e.g., "Code Review Assistant", "Customer Support Agent"]
Personality: [e.g., "Helpful, concise, technically accurate"]
Voice: [e.g., "Professional but approachable"]
```
### Core Purpose
What is the primary job this agent is designed to do?
### Target Users

| User Type | Use Case |
|---|---|
| Developers | Code review feedback |
| Support Team | Ticket triage |
## Capabilities

| Tool | Description | Risk Level |
|---|---|---|
| search_codebase | Search for code patterns | Low |
| read_file | Read file contents | Low |
| create_pr_comment | Post review comments | Medium |
| approve_pr | Approve pull request | High |
```typescript
interface SearchCodebase {
  query: string;
  filePattern?: string;
  maxResults?: number;
}

interface CreatePRComment {
  prNumber: number;
  file: string;
  line: number;
  body: string;
  severity: 'suggestion' | 'warning' | 'error';
}
```
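Since the "Invalid output" failure mode below relies on schema validation, a runtime guard for one of these schemas can be sketched as follows. The interface is repeated so the sketch stands alone; `isCreatePRComment` is an illustrative name, not part of any framework API.

```typescript
// Repeated from the tool schema above so this sketch is self-contained.
interface CreatePRComment {
  prNumber: number;
  file: string;
  line: number;
  body: string;
  severity: 'suggestion' | 'warning' | 'error';
}

// Minimal runtime guard: checks every field before the payload is trusted.
function isCreatePRComment(x: unknown): x is CreatePRComment {
  const c = x as Partial<CreatePRComment>;
  return (
    typeof c?.prNumber === 'number' &&
    typeof c?.file === 'string' &&
    typeof c?.line === 'number' &&
    typeof c?.body === 'string' &&
    (c?.severity === 'suggestion' ||
      c?.severity === 'warning' ||
      c?.severity === 'error')
  );
}
```

A schema-validation library (e.g. Zod) would replace this hand-written guard in practice.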
### Supported Actions
## System Prompt

```markdown
You are [Agent Name], a [role] for [company/project].

## Your Purpose

[Primary purpose statement]

## Guidelines

- [Guideline 1]
- [Guideline 2]
- [Guideline 3]

## Constraints

- Never [constraint 1]
- Always [constraint 2]

## Response Format

[Expected output format]
```
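One way to instantiate such a prompt template is simple placeholder substitution. The `fill()` helper below is an illustrative sketch, not part of any framework API; it assumes placeholders are written in the `[square bracket]` style the template uses.

```typescript
// Replace each [placeholder] from a lookup table.
// Unknown placeholders are left untouched so gaps are easy to spot.
function fill(template: string, values: Record<string, string>): string {
  return template.replace(/\[([^\]]+)\]/g, (match, key) => values[key] ?? match);
}
```

For example, `fill("You are [Agent Name], a [role].", { "Agent Name": "Revi", "role": "code review assistant" })` yields `"You are Revi, a code review assistant."`.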
## Guardrails & Safety

### Hard Constraints (Never Violate)

| Constraint | Rationale | Enforcement |
|---|---|---|
| No code execution | Security risk | Tool not provided |
| No PII in logs | Privacy compliance | Output filtering |
| No external API calls | Data leakage risk | Network isolation |
### Soft Constraints (Prefer to Follow)

| Constraint | Rationale | Override Condition |
|---|---|---|
| Max 3 suggestions per file | Avoid noise | Critical security issue |
| Response under 500 tokens | Readability | Complex explanation needed |
### Content Filtering
- Input: [Describe input validation/sanitization]
- Output: [Describe output filtering rules]
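An output filter enforcing the "No PII in logs" hard constraint could start from pattern-based redaction. The two patterns below are a minimal sketch; a production system should use a vetted PII-detection library rather than hand-rolled regexes.

```typescript
// Illustrative patterns only: email addresses and US SSNs.
const EMAIL_RE = /[\w.+-]+@[\w-]+(\.[\w-]+)+/g;
const US_SSN_RE = /\b\d{3}-\d{2}-\d{4}\b/g;

// Replace matches with labeled markers before text reaches the logs.
function redactPII(text: string): string {
  return text
    .replace(EMAIL_RE, '[REDACTED_EMAIL]')
    .replace(US_SSN_RE, '[REDACTED_SSN]');
}
```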
### Rate Limiting

| Scope | Limit | Window |
|---|---|---|
| Per user | 100 requests | 1 hour |
| Per repo | 500 requests | 1 hour |
| Global | 10,000 requests | 1 hour |
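The per-user row could be enforced with a fixed-window counter. The limit and window come from the table; the in-memory `Map` and the `allowRequest()` name are illustrative (a deployment would typically use shared storage such as Redis instead).

```typescript
const WINDOW_MS = 60 * 60 * 1000; // 1 hour
const PER_USER_LIMIT = 100;       // 100 requests per window

const windows = new Map<string, { start: number; count: number }>();

function allowRequest(userId: string, now: number = Date.now()): boolean {
  const w = windows.get(userId);
  if (!w || now - w.start >= WINDOW_MS) {
    // First request in a fresh window: reset the counter.
    windows.set(userId, { start: now, count: 1 });
    return true;
  }
  if (w.count >= PER_USER_LIMIT) {
    return false; // over the limit: reject (or queue) until the window rolls
  }
  w.count += 1;
  return true;
}
```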
## Human-in-the-Loop

### Escalation Triggers

| Trigger | Action | SLA |
|---|---|---|
| Confidence < 70% | Request human review | - |
| Security finding | Alert security team | 1 hour |
| User disputes result | Escalate to maintainer | 24 hours |
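These triggers can be sketched as a single routing function. The 70% threshold and the actions come from the table; `ReviewResult` and `escalationFor` are illustrative names, and the priority ordering (security first) is an assumption.

```typescript
interface ReviewResult {
  confidence: number;       // 0..1
  securityFinding: boolean;
  disputed: boolean;
}

type Escalation = 'none' | 'human_review' | 'security_team' | 'maintainer';

function escalationFor(r: ReviewResult): Escalation {
  if (r.securityFinding) return 'security_team'; // 1-hour SLA
  if (r.disputed) return 'maintainer';           // 24-hour SLA
  if (r.confidence < 0.7) return 'human_review'; // no SLA in the table
  return 'none';
}
```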
### Approval Requirements

| Action | Approval Required |
|---|---|
| Read code | None |
| Comment on PR | None |
| Request changes | None |
| Approve PR | Human co-approval |
| Merge PR | Not permitted |
## Evaluation & Metrics

### Success Metrics

| Metric | Target | Measurement |
|---|---|---|
| Accuracy | > 90% | Manual review sample |
| Helpfulness rating | > 4/5 | User feedback |
| False positive rate | < 5% | Disputed suggestions |
| Response time | < 30s | p95 latency |
### Evaluation Dataset

Describe or link to the evaluation dataset used to test the agent.

| Category | Examples | Expected Behavior |
|---|---|---|
| Happy path | [link] | Provide accurate review |
| Edge cases | [link] | Graceful degradation |
| Adversarial | [link] | Refuse and explain |
### A/B Testing Plan
If applicable, describe the rollout and testing strategy.
## Error Handling

### Failure Modes

| Failure | Detection | Recovery |
|---|---|---|
| Model timeout | 30s threshold | Retry with backoff |
| Rate limit | 429 response | Queue and retry |
| Invalid output | Schema validation | Fallback response |
| Model refusal | Content filter | Human escalation |
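"Retry with backoff" from the table could be implemented as exponential backoff with a cap. `withRetry()` and its defaults are an illustrative sketch, not a prescribed implementation.

```typescript
// Delay doubles per attempt (1s, 2s, 4s, ...) up to capMs.
// Production code would usually add random jitter to avoid thundering herds.
function backoffDelay(attempt: number, baseMs = 1000, capMs = 30000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      await new Promise((resolve) => setTimeout(resolve, backoffDelay(attempt)));
    }
  }
  throw lastError; // retries exhausted: hand off to the fallback behavior
}
```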
### Fallback Behavior
What happens when the agent can’t complete its task?
1. Log the failure with context
2. Notify user: "I couldn't complete this analysis. A human reviewer will follow up."
3. Create ticket for human review
4. Continue processing other items
## Observability

### Logging

| Event | Log Level | Data Captured |
|---|---|---|
| Request received | INFO | user_id, repo, pr_number |
| Tool invocation | DEBUG | tool_name, params (redacted) |
| Response sent | INFO | response_time, token_count |
| Error | ERROR | error_type, stack_trace |
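The "Tool invocation" row, with its redacted params, might look like the structured event below. The event shape and the `toolInvocationEvent` name are illustrative; the redaction strategy (keep keys, drop values) is one possible reading of "params (redacted)".

```typescript
type LogLevel = 'DEBUG' | 'INFO' | 'ERROR';

interface LogEvent {
  event: string;
  level: LogLevel;
  data: Record<string, unknown>;
}

function toolInvocationEvent(
  toolName: string,
  params: Record<string, unknown>,
): LogEvent {
  // Keep parameter keys so the call shape stays visible, but drop values.
  const redacted = Object.fromEntries(
    Object.keys(params).map((k) => [k, '[redacted]']),
  );
  return {
    event: 'tool_invocation',
    level: 'DEBUG',
    data: { tool_name: toolName, params: redacted },
  };
}
```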
### Monitoring Dashboards
- Latency: p50, p95, p99 response times
- Volume: Requests per hour/day
- Errors: Error rate by type
- Quality: User feedback scores
### Alerting

| Condition | Severity | Action |
|---|---|---|
| Error rate > 5% | Warning | Slack notification |
| Error rate > 20% | Critical | PagerDuty + auto-disable |
| Latency p95 > 60s | Warning | Slack notification |
## Implementation Notes

### Dependencies

| Dependency | Version | Purpose |
|---|---|---|
| openai | ^4.0 | LLM API client |
| langchain | ^0.1 | Agent framework |
| tiktoken | ^0.5 | Token counting |
### Configuration

```yaml
agent:
  model: gpt-4o
  temperature: 0.3
  max_tokens: 1024
  timeout_seconds: 30

tools:
  search_codebase:
    max_results: 50
  read_file:
    max_size_kb: 100
```
## Rollout Plan

| Phase | Scope | Duration | Success Criteria |
|---|---|---|---|
| Alpha | Internal team | 2 weeks | No critical bugs |
| Beta | 10% of users | 2 weeks | Positive feedback |
| GA | All users | - | Metrics met |
## References