# ADD-0001: Template
| Metadata | Value |
|---|---|
| Status | Draft \| In Review \| Approved \| Deployed \| Deprecated |
| Created | YYYY-MM-DD |
| Author(s) | @username |
| RFC | RFC-NNNN (if applicable) |
| Model | GPT-4o \| Claude 3.5 Sonnet \| etc. |
## Summary
One paragraph description: What does this agent do? Who is it for?
## Agent Persona

### Identity

```
Name: [Agent Name]
Role: [e.g., "Code Review Assistant", "Customer Support Agent"]
Personality: [e.g., "Helpful, concise, technically accurate"]
Voice: [e.g., "Professional but approachable"]
```
### Core Purpose
What is the primary job this agent is designed to do?
### Target Users

| User Type | Use Case |
|---|---|
| Developers | Code review feedback |
| Support Team | Ticket triage |
## Capabilities

| Tool | Description | Risk Level |
|---|---|---|
| search_codebase | Search for code patterns | Low |
| read_file | Read file contents | Low |
| create_pr_comment | Post review comments | Medium |
| approve_pr | Approve pull request | High |
```typescript
interface SearchCodebase {
  query: string;
  filePattern?: string;
  maxResults?: number;
}

interface CreatePRComment {
  prNumber: number;
  file: string;
  line: number;
  body: string;
  severity: 'suggestion' | 'warning' | 'error';
}
```
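Since the "Invalid output" failure mode below relies on schema validation, a runtime guard for one of these schemas can be sketched as follows. The interface is repeated so the sketch stands alone; `isCreatePRComment` is an illustrative name, not part of any framework API.

```typescript
// Repeated from the tool schema above so this sketch is self-contained.
interface CreatePRComment {
  prNumber: number;
  file: string;
  line: number;
  body: string;
  severity: 'suggestion' | 'warning' | 'error';
}

// Minimal runtime guard: checks every field before the payload is trusted.
function isCreatePRComment(x: unknown): x is CreatePRComment {
  const c = x as Partial<CreatePRComment>;
  return (
    typeof c?.prNumber === 'number' &&
    typeof c?.file === 'string' &&
    typeof c?.line === 'number' &&
    typeof c?.body === 'string' &&
    (c?.severity === 'suggestion' ||
      c?.severity === 'warning' ||
      c?.severity === 'error')
  );
}
```

A schema-validation library (e.g. Zod) would replace this hand-written guard in practice.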
### Supported Actions
## System Prompt

```markdown
You are [Agent Name], a [role] for [company/project].

## Your Purpose

[Primary purpose statement]

## Guidelines

- [Guideline 1]
- [Guideline 2]
- [Guideline 3]

## Constraints

- Never [constraint 1]
- Always [constraint 2]

## Response Format

[Expected output format]
```
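One way to instantiate such a prompt template is simple placeholder substitution. The `fill()` helper below is an illustrative sketch, not part of any framework API; it assumes placeholders are written in the `[square bracket]` style the template uses.

```typescript
// Replace each [placeholder] from a lookup table.
// Unknown placeholders are left untouched so gaps are easy to spot.
function fill(template: string, values: Record<string, string>): string {
  return template.replace(/\[([^\]]+)\]/g, (match, key) => values[key] ?? match);
}
```

For example, `fill("You are [Agent Name], a [role].", { "Agent Name": "Revi", "role": "code review assistant" })` yields `"You are Revi, a code review assistant."`.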
## Guardrails & Safety

### Hard Constraints (Never Violate)

| Constraint | Rationale | Enforcement |
|---|---|---|
| No code execution | Security risk | Tool not provided |
| No PII in logs | Privacy compliance | Output filtering |
| No external API calls | Data leakage risk | Network isolation |
### Soft Constraints (Prefer to Follow)

| Constraint | Rationale | Override Condition |
|---|---|---|
| Max 3 suggestions per file | Avoid noise | Critical security issue |
| Response under 500 tokens | Readability | Complex explanation needed |
### Content Filtering
- Input: [Describe input validation/sanitization]
- Output: [Describe output filtering rules]
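An output filter enforcing the "No PII in logs" hard constraint could start from pattern-based redaction. The two patterns below are a minimal sketch; a production system should use a vetted PII-detection library rather than hand-rolled regexes.

```typescript
// Illustrative patterns only: email addresses and US SSNs.
const EMAIL_RE = /[\w.+-]+@[\w-]+(\.[\w-]+)+/g;
const US_SSN_RE = /\b\d{3}-\d{2}-\d{4}\b/g;

// Replace matches with labeled markers before text reaches the logs.
function redactPII(text: string): string {
  return text
    .replace(EMAIL_RE, '[REDACTED_EMAIL]')
    .replace(US_SSN_RE, '[REDACTED_SSN]');
}
```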
### Rate Limiting

| Scope | Limit | Window |
|---|---|---|
| Per user | 100 requests | 1 hour |
| Per repo | 500 requests | 1 hour |
| Global | 10,000 requests | 1 hour |
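The per-user row could be enforced with a fixed-window counter. The limit and window come from the table; the in-memory `Map` and the `allowRequest()` name are illustrative (a deployment would typically use shared storage such as Redis instead).

```typescript
const WINDOW_MS = 60 * 60 * 1000; // 1 hour
const PER_USER_LIMIT = 100;       // 100 requests per window

const windows = new Map<string, { start: number; count: number }>();

function allowRequest(userId: string, now: number = Date.now()): boolean {
  const w = windows.get(userId);
  if (!w || now - w.start >= WINDOW_MS) {
    // First request in a fresh window: reset the counter.
    windows.set(userId, { start: now, count: 1 });
    return true;
  }
  if (w.count >= PER_USER_LIMIT) {
    return false; // over the limit: reject (or queue) until the window rolls
  }
  w.count += 1;
  return true;
}
```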
## Human-in-the-Loop

### Escalation Triggers

| Trigger | Action | SLA |
|---|---|---|
| Confidence < 70% | Request human review | - |
| Security finding | Alert security team | 1 hour |
| User disputes result | Escalate to maintainer | 24 hours |
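These triggers can be sketched as a single routing function. The 70% threshold and the actions come from the table; `ReviewResult` and `escalationFor` are illustrative names, and the priority ordering (security first) is an assumption.

```typescript
interface ReviewResult {
  confidence: number;       // 0..1
  securityFinding: boolean;
  disputed: boolean;
}

type Escalation = 'none' | 'human_review' | 'security_team' | 'maintainer';

function escalationFor(r: ReviewResult): Escalation {
  if (r.securityFinding) return 'security_team'; // 1-hour SLA
  if (r.disputed) return 'maintainer';           // 24-hour SLA
  if (r.confidence < 0.7) return 'human_review'; // no SLA in the table
  return 'none';
}
```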
### Approval Requirements

| Action | Approval Required |
|---|---|
| Read code | None |
| Comment on PR | None |
| Request changes | None |
| Approve PR | Human co-approval |
| Merge PR | Not permitted |
## Evaluation & Metrics

### Success Metrics

| Metric | Target | Measurement |
|---|---|---|
| Accuracy | > 90% | Manual review sample |
| Helpfulness rating | > 4/5 | User feedback |
| False positive rate | < 5% | Disputed suggestions |
| Response time | < 30s | p95 latency |
### Evaluation Dataset

Describe or link to the evaluation dataset used to test the agent.

| Category | Examples | Expected Behavior |
|---|---|---|
| Happy path | [link] | Provide accurate review |
| Edge cases | [link] | Graceful degradation |
| Adversarial | [link] | Refuse and explain |
### A/B Testing Plan
If applicable, describe the rollout and testing strategy.
## Error Handling

### Failure Modes

| Failure | Detection | Recovery |
|---|---|---|
| Model timeout | 30s threshold | Retry with backoff |
| Rate limit | 429 response | Queue and retry |
| Invalid output | Schema validation | Fallback response |
| Model refusal | Content filter | Human escalation |
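"Retry with backoff" from the table could be implemented as exponential backoff with a cap. `withRetry()` and its defaults are an illustrative sketch, not a prescribed implementation.

```typescript
// Delay doubles per attempt (1s, 2s, 4s, ...) up to capMs.
// Production code would usually add random jitter to avoid thundering herds.
function backoffDelay(attempt: number, baseMs = 1000, capMs = 30000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      await new Promise((resolve) => setTimeout(resolve, backoffDelay(attempt)));
    }
  }
  throw lastError; // retries exhausted: hand off to the fallback behavior
}
```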
### Fallback Behavior
What happens when the agent can’t complete its task?
1. Log the failure with context
2. Notify user: "I couldn't complete this analysis. A human reviewer will follow up."
3. Create ticket for human review
4. Continue processing other items
## Observability

### Logging

| Event | Log Level | Data Captured |
|---|---|---|
| Request received | INFO | user_id, repo, pr_number |
| Tool invocation | DEBUG | tool_name, params (redacted) |
| Response sent | INFO | response_time, token_count |
| Error | ERROR | error_type, stack_trace |
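The "Tool invocation" row, with its redacted params, might look like the structured event below. The event shape and the `toolInvocationEvent` name are illustrative; the redaction strategy (keep keys, drop values) is one possible reading of "params (redacted)".

```typescript
type LogLevel = 'DEBUG' | 'INFO' | 'ERROR';

interface LogEvent {
  event: string;
  level: LogLevel;
  data: Record<string, unknown>;
}

function toolInvocationEvent(
  toolName: string,
  params: Record<string, unknown>,
): LogEvent {
  // Keep parameter keys so the call shape stays visible, but drop values.
  const redacted = Object.fromEntries(
    Object.keys(params).map((k) => [k, '[redacted]']),
  );
  return {
    event: 'tool_invocation',
    level: 'DEBUG',
    data: { tool_name: toolName, params: redacted },
  };
}
```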
### Monitoring Dashboards
- Latency: p50, p95, p99 response times
- Volume: Requests per hour/day
- Errors: Error rate by type
- Quality: User feedback scores
### Alerting

| Condition | Severity | Action |
|---|---|---|
| Error rate > 5% | Warning | Slack notification |
| Error rate > 20% | Critical | PagerDuty + auto-disable |
| Latency p95 > 60s | Warning | Slack notification |
## Implementation Notes

### Dependencies

| Dependency | Version | Purpose |
|---|---|---|
| openai | ^4.0 | LLM API client |
| langchain | ^0.1 | Agent framework |
| tiktoken | ^0.5 | Token counting |
### Configuration

```yaml
agent:
  model: gpt-4o
  temperature: 0.3
  max_tokens: 1024
  timeout_seconds: 30

tools:
  search_codebase:
    max_results: 50
  read_file:
    max_size_kb: 100
```
## Rollout Plan

| Phase | Scope | Duration | Success Criteria |
|---|---|---|---|
| Alpha | Internal team | 2 weeks | No critical bugs |
| Beta | 10% of users | 2 weeks | Positive feedback |
| GA | All users | - | Metrics met |
## References