Architecture Decision Records (ADRs): How We Reduced Bugs by 15%

“Why did we choose MongoDB over PostgreSQL for this?”

“Who decided to use WebSockets here?”

“What were the alternatives to this approach?”

Six months into maintaining a codebase, these questions become impossible to answer. The original developer left, Slack messages are buried, and the decision context is lost forever.

This was our reality until we adopted Architecture Decision Records (ADRs). In the six months after adoption, our bug rate fell by 15%, onboarding time dropped by 40%, and the way we discuss engineering decisions improved across the team.

What Are ADRs?

Architecture Decision Records document why technical decisions were made, not just what was decided.

A typical ADR includes:

  • Context: What problem are we solving?
  • Decision: What did we choose?
  • Alternatives: What else did we consider?
  • Consequences: What are the tradeoffs?
  • Status: Accepted, rejected, superseded?

Example: Simple ADR

# ADR-001: Use PostgreSQL for User Data Storage
 
**Date**: 2024-03-15
**Status**: Accepted
**Deciders**: @aryem, @alice, @bob
 
## Context
 
We need to choose a database for storing user accounts, course data, and
relationships between entities. Requirements:
- ACID transactions (critical for billing)
- Complex queries with joins
- Strong consistency
- Team familiarity
 
## Decision
 
Use PostgreSQL as our primary database.
 
## Alternatives Considered
 
### MongoDB
**Pros**: Flexible schema, horizontal scaling
**Cons**: Weak transaction support, less mature for complex queries
**Verdict**: Not suitable for our relational data model
 
### MySQL
**Pros**: Wide adoption, team familiarity
**Cons**: Less powerful query features, licensing concerns
**Verdict**: PostgreSQL offers better features for similar complexity
 
### DynamoDB
**Pros**: Serverless, highly scalable
**Cons**: Complex query limitations, vendor lock-in
**Verdict**: Overkill for current scale, limits flexibility
 
## Consequences
 
**Positive**:
- Mature ecosystem and tooling
- Excellent performance for our use case
- Strong ACID guarantees
- JSON support for semi-structured data
- Team already experienced with it
 
**Negative**:
- Vertical scaling limitations (not a concern at current scale)
- Requires more operational overhead than managed NoSQL
- Migration complexity if needs change dramatically
 
**Neutral**:
- Will use AWS RDS for managed hosting
- Plan for read replicas when needed
 
## Related Decisions
 
- ADR-002: Use Redis for session storage
- ADR-003: Use S3 for media storage

Why ADRs Matter

1. Institutional Knowledge Retention

Before ADRs, knowledge lived in people’s heads:

  • Developer leaves → context lost
  • New joiners ask the same questions repeatedly
  • Decisions get reversed unknowingly

After ADRs:

  • History is documented
  • Context is preserved
  • Onboarding is self-service

2. Better Decision Making

The act of writing an ADR forces you to:

  • Articulate the problem clearly
  • Research alternatives thoroughly
  • Consider tradeoffs systematically
  • Get team feedback before committing

Result: Higher quality decisions.

3. Reduced Technical Debt

ADRs help you understand when decisions are outdated:

# ADR-005: Use Redis for Caching
 
**Status**: Superseded by ADR-045
**Superseded Date**: 2024-09-10
 
## Update
 
This decision is no longer valid. We migrated to CloudFront
for edge caching. See ADR-045 for details.

Without this, you’d find Redis references in code and documentation, waste time investigating, and potentially make inconsistent choices.
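
A quick way to see which decisions are no longer current is to scan the Status lines. The sketch below assumes the `**Status**: ...` line format used in the examples here and the docs/architecture/decisions directory; `superseded_adrs` is a hypothetical helper name, not part of any tool.

```shell
# List ADR files whose Status line marks them as superseded.
# Assumes one "**Status**: ..." line per ADR, as in the examples above.
superseded_adrs() {
  local dir="${1:-docs/architecture/decisions}"
  # -H prints the file name even when the glob matches a single file.
  grep -H -E '^\*\*Status\*\*: Superseded' "$dir"/*.md 2>/dev/null \
    | sed -E 's|^.*/([^/:]+):\*\*Status\*\*: |\1: |'
}
```

Each output line pairs a file name with its status text, which makes stale references easier to trace back to the ADR that replaced them.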

4. Faster Code Reviews

ADR references in pull requests provide instant context:

## Pull Request: Implement Real-Time Notifications
 
Uses WebSockets per ADR-023. See that ADR for why we chose
WebSockets over Server-Sent Events.
 
Changes:
- Added Socket.io server
- Implemented notification event handlers
- Client-side connection management

Reviewers don’t need to ask “why WebSockets?” because the ADR already explains it.
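
If you want to pull those references out of a PR description automatically, a minimal sketch could look like this. It assumes references follow the ADR-NNN pattern used throughout this post; `extract_adr_refs` is a hypothetical helper name.

```shell
# Print the unique ADR identifiers (e.g. ADR-023) found in text read from stdin.
extract_adr_refs() {
  grep -o -E 'ADR-[0-9]+' | sort -u
}

# Example with the GitHub CLI (123 is a placeholder PR number):
#   gh pr view 123 --json body --jq .body | extract_adr_refs
```

The output is one identifier per line, so it can feed a link generator or a check that each referenced ADR exists.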

Our Implementation Journey

Phase 1: Template Creation (Week 1)

Created a simple template:

# ADR-XXX: [Title]
 
**Date**: YYYY-MM-DD
**Status**: [Proposed | Accepted | Rejected | Deprecated | Superseded]
**Deciders**: [List of people involved]
 
## Context
 
[What is the issue we're addressing?]
[What factors are we considering?]
[What constraints exist?]
 
## Decision
 
[What we decided to do]
[Be specific and actionable]
 
## Alternatives Considered
 
### Alternative 1
- Pros:
- Cons:
- Why rejected:
 
### Alternative 2
- Pros:
- Cons:
- Why rejected:
 
## Consequences
 
**Positive**:
- [Good consequence 1]
 
**Negative**:
- [Bad consequence 1]
 
**Neutral**:
- [Neither good nor bad]
 
## Related Decisions
 
- ADR-XXX: [Related decision]
 
## Notes
 
[Any additional context, references, or future considerations]

Phase 2: Retrospective Documentation (Week 2)

Documented existing major decisions:

  • Database choices
  • Cloud provider selection
  • Authentication approach
  • API design (REST vs GraphQL)

Total: 12 retrospective ADRs in one week

This immediately helped new team members understand our architecture.

Phase 3: Process Integration (Week 3-4)

Made ADRs part of our workflow:

Rule: Any PR that introduces new dependencies, patterns, or significant architecture changes requires an ADR.

Process:

  1. Open ADR PR first (before implementation)
  2. Team reviews and discusses
  3. Revise based on feedback
  4. Merge ADR
  5. Then implement the change

Phase 4: Tooling (Week 5)

Created helper scripts:

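#!/bin/bash
# scripts/new-adr.sh: create the next numbered ADR from the template
set -eu

ADR_DIR="docs/architecture/decisions"
TITLE="${1:?Usage: new-adr.sh <kebab-case-title>}"

# Highest existing number, e.g. "012" (empty if there are no ADRs yet)
LAST_NUM=$(ls "$ADR_DIR" | grep -E '^[0-9]+-' | sed 's/-.*//' | sort -n | tail -1)
# 10# forces base 10; otherwise Bash reads 012 as octal and rejects 008
NEXT_NUM=$(printf "%03d" $((10#${LAST_NUM:-0} + 1)))

FILENAME="${NEXT_NUM}-${TITLE}.md"

cp "$ADR_DIR/template.md" "$ADR_DIR/$FILENAME"
# "1s" limits the ADR-XXX replacement to the title line, so the
# Related Decisions placeholder further down stays untouched.
# sed -i '' is the BSD/macOS form; on GNU/Linux drop the ''.
sed -i '' "1s/ADR-XXX/ADR-$NEXT_NUM/" "$ADR_DIR/$FILENAME"
sed -i '' "s/YYYY-MM-DD/$(date +%Y-%m-%d)/" "$ADR_DIR/$FILENAME"

echo "Created $FILENAME"
open "$ADR_DIR/$FILENAME"  # macOS; use xdg-open on Linux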

Usage:

./scripts/new-adr.sh "use-graphql-for-api"
# Creates docs/architecture/decisions/013-use-graphql-for-api.md
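
One detail worth keeping in mind when computing the next number in Bash: zero-padded values such as 012 are read as octal in arithmetic, and 008 or 009 are rejected outright. The sketch below isolates that step; `next_adr_number` is a hypothetical function name used only for illustration.

```shell
# Compute the next zero-padded ADR number from the highest existing one.
next_adr_number() {
  local last="${1:-0}"
  # 10# forces base 10; without it, 012 would be read as octal (decimal 10)
  # and 008 would be an invalid octal literal.
  printf '%03d\n' "$((10#$last + 1))"
}

next_adr_number 012   # prints 013
next_adr_number 008   # prints 009
```

The `10#` prefix is a Bash feature, so this assumes the script runs under bash rather than a plain POSIX sh.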

Phase 5: Cultural Adoption (Month 2-3)

Challenge: Getting the team to actually write ADRs

Strategies that worked:

  1. Lead by example: Senior engineers wrote ADRs first
  2. Make it easy: Scripts, templates, clear process
  3. Public praise: Celebrated good ADRs in team meetings
  4. Gentle enforcement: Blocked PRs without ADRs when needed
  5. Show value: Referenced ADRs constantly in discussions

Real-World Examples from Our Codebase

ADR-015: Use React Over Vue for New Features

# ADR-015: Use React for New Feature Development
 
**Date**: 2024-04-20
**Status**: Accepted
**Deciders**: Engineering Team
 
## Context
 
Our application currently uses Vue.js 2 for the frontend. We're
planning major new features and need to decide whether to:
1. Continue with Vue.js
2. Migrate to React
3. Use both (micro-frontends)
 
## Decision
 
All new feature development will use React. Existing Vue code
will be gradually migrated.
 
## Alternatives Considered
 
### Continue with Vue.js
**Pros**:
- No migration cost
- Team already knows it
- Consistency
 
**Cons**:
- Vue 2 approaching end-of-life
- Smaller ecosystem for enterprise components
- Difficulty hiring (React more common)
- Limited TypeScript support in Vue 2
 
**Verdict**: Continuing with Vue 2 creates technical debt
 
### Migrate to Vue 3
**Pros**:
- Easier migration path
- Modern features
 
**Cons**:
- Still smaller ecosystem than React
- Fewer enterprise component libraries
- Team wants to learn React
 
**Verdict**: Solves technical debt but not strategic concerns
 
### Use Both (Micro-frontends)
**Pros**:
- Gradual migration
- No disruption
 
**Cons**:
- Complex architecture
- Bundle size overhead
- Two mental models
 
**Verdict**: Unnecessary complexity for our team size
 
## Consequences
 
**Positive**:
- Access to React ecosystem (Material-UI, Recharts, etc.)
- Better TypeScript integration
- Easier hiring
- Modern development experience
- Active community and long-term support
 
**Negative**:
- 3-month learning curve for team
- Migration effort for existing features
- Temporary inconsistency in codebase
 
**Mitigation**:
- Training budget allocated
- Migration plan: new features first, then incrementally rewrite
- Shared component library to maintain consistency
 
## Timeline
 
- **Month 1-2**: Team training, setup tooling
- **Month 3-6**: New features in React
- **Month 7-12**: Migrate critical paths
- **Month 13-18**: Complete migration
 
## Related Decisions
 
- ADR-016: Use TypeScript for all new code
- ADR-017: Use Material-UI as component library

ADR-023: WebSockets for Real-Time Notifications

# ADR-023: Use WebSockets for Real-Time Notifications
 
**Date**: 2024-06-15
**Status**: Accepted
**Deciders**: @aryem, Backend Team
 
## Context
 
Users need real-time updates for:
- Collaborative editing (multiple users on same course)
- Notification center (course published, comments, etc.)
- Live progress tracking
 
Requirements:
- Bi-directional communication
- Low latency (<100ms)
- Scale to 10,000 concurrent users
- Mobile and web support
 
## Decision
 
Use Socket.io (WebSocket library) with Redis adapter for
horizontal scaling.
 
## Alternatives Considered
 
### Server-Sent Events (SSE)
**Pros**:
- Simpler than WebSockets
- Built into browsers
- Automatic reconnection
 
**Cons**:
- Unidirectional (server → client only)
- Doesn't work well through corporate proxies
- HTTP/1.1 connection limits
 
**Verdict**: Insufficient for collaborative editing
 
### Long Polling
**Pros**:
- Works everywhere (HTTP)
- Simple fallback
 
**Cons**:
- High latency
- Inefficient (constant requests)
- Scalability concerns
 
**Verdict**: Outdated approach
 
### GraphQL Subscriptions
**Pros**:
- Consistent with our GraphQL API
- Type-safe
 
**Cons**:
- Still uses WebSockets under the hood
- Adds complexity without benefits for our use case
- Overkill for simple pub/sub
 
**Verdict**: Unnecessary abstraction
 
### Firebase Cloud Messaging
**Pros**:
- Managed service
- Mobile push notifications included
 
**Cons**:
- Vendor lock-in
- Doesn't support web well
- Data goes through Google servers (privacy concern)
 
**Verdict**: Not suitable for in-app real-time features
 
## Consequences
 
**Positive**:
- True real-time, bi-directional communication
- Socket.io handles fallbacks automatically
- Redis adapter enables horizontal scaling
- Rich ecosystem and community
- Same solution for web and mobile (Socket.io clients)
 
**Negative**:
- Stateful connections (requires sticky sessions)
- More complex than HTTP-only architecture
- Slightly harder to debug
- Additional infrastructure (Redis)
 
**Operational Considerations**:
- Use AWS Application Load Balancer with sticky sessions
- Redis ElastiCache cluster for pub/sub
- Monitor connection count and memory usage
- Plan for graceful shutdown during deploys
 
## Implementation Plan
 
1. **Phase 1**: Basic infrastructure
   - Socket.io server with Redis adapter
   - Authentication middleware
   - Connection lifecycle management
 
2. **Phase 2**: Notification features
   - Real-time notification delivery
   - Presence indicators ("user X is editing")
 
3. **Phase 3**: Collaborative features
   - Operational Transform for concurrent editing
   - Conflict resolution
 
## Performance Targets
 
- Connection time: <500ms
- Message latency: <100ms
- Support 10,000 concurrent connections per instance
- Graceful degradation to polling if WebSocket fails
 
## Related Decisions
 
- ADR-024: Use Redis for Socket.io adapter
- ADR-025: Operational Transform for collaborative editing

Measuring Impact

We tracked metrics before/after ADR adoption:

Before ADRs (6 months baseline)

  • Bug rate: 2.3 bugs per 1000 lines of code
  • Architecture reversal rate: 3 decisions reversed
  • Onboarding time: 4 weeks to first meaningful contribution
  • Code review time: Average 24 hours per PR
  • “Why?” questions in Slack: 50+ per month

After ADRs (6 months)

  • Bug rate: 1.95 bugs per 1000 lines (15% reduction)
  • Architecture reversal rate: 0 decisions reversed
  • Onboarding time: 2.4 weeks (40% faster)
  • Code review time: Average 16 hours per PR (33% faster)
  • “Why?” questions: 12 per month (76% reduction)
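
The percentages follow directly from the before and after figures. As a small worked check of the arithmetic (`pct_reduction` is a hypothetical helper, and results are rounded to whole percents):

```shell
# Percentage reduction from a baseline to a new value, rounded to a whole percent.
pct_reduction() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.0f\n", (before - after) / before * 100 }'
}

pct_reduction 2.3 1.95   # bug rate: prints 15
pct_reduction 4 2.4      # onboarding weeks: prints 40
pct_reduction 24 16      # review hours: prints 33
pct_reduction 50 12      # "why?" questions per month: prints 76
```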

Qualitative improvements:

  • Higher confidence in decisions
  • Better cross-team alignment
  • Clearer technical vision
  • Easier to justify decisions to stakeholders

Best Practices We Learned

1. Write ADRs at the Right Time

  • Too early: Don’t write ADRs for exploratory spikes
  • Just right: Write during implementation planning
  • Too late: Don’t retroactively document obvious choices

2. Keep Them Concise

Target: 1-2 pages maximum

  • Focus on decision and alternatives
  • Skip implementation details (those go in docs)
  • Link to related resources

3. Use Consistent Numbering

docs/
  architecture/
    decisions/
      001-use-postgresql.md
      002-use-redis-for-sessions.md
      003-use-s3-for-media.md
      ...

Sequential numbers make references easy: “See ADR-003”
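
Sequential numbering only helps if it stays sequential. A minimal sketch of a consistency check, assuming three-digit NNN-title.md file names as in the listing above (`check_adr_numbering` is a hypothetical name):

```shell
# Report duplicate or missing ADR numbers among NNN-title.md files.
# Returns non-zero if any problem is found.
check_adr_numbering() {
  local dir="${1:-docs/architecture/decisions}"
  ls "$dir" | grep -E '^[0-9]{3}-.*\.md$' | cut -c1-3 | sort | awk '
    { n = $1 + 0 }   # numeric value of the prefix; awk reads input fields as decimal
    n == prev    { printf "duplicate: %03d\n", n; bad = 1 }
    n > prev + 1 { for (i = prev + 1; i < n; i++) printf "missing: %03d\n", i; bad = 1 }
    { prev = n }
    END { exit bad }
  '
}
```

Run it against the decisions directory in CI or before merging an ADR PR; files such as template.md are ignored because they lack a numeric prefix.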

4. Update Status, Don’t Delete

When a decision is superseded:

# ADR-005: Use Redux for State Management
 
**Status**: Superseded by ADR-045 (Zustand adoption)
**Original Date**: 2024-01-10
**Superseded Date**: 2024-09-15
 
## Why Superseded
 
Redux became too verbose for our use cases. Zustand provides
similar benefits with less boilerplate. Migration plan in ADR-045.
 
[Original content preserved below...]

5. Reference ADRs Everywhere

  • PRs: “Implements ADR-023 (WebSockets)”
  • Code comments: // Using Redis pub/sub per ADR-024
  • Documentation: link to the relevant Architecture Decision Records
  • Design docs: Reference relevant ADRs
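
References are only useful if they resolve. The sketch below looks for ADR-NNN mentions in a source tree and reports any that have no matching file. It assumes NNN-title.md file names; the function name and the example paths are hypothetical.

```shell
# Print ADR ids referenced under a source tree that have no NNN-*.md file.
missing_adr_refs() {
  local src_dir="$1" adr_dir="${2:-docs/architecture/decisions}" ref num
  grep -r -h -o -E 'ADR-[0-9]+' "$src_dir" 2>/dev/null | sort -u |
    while read -r ref; do
      num="${ref#ADR-}"
      # ls fails when the glob matches nothing, which signals a dangling reference.
      ls "$adr_dir/$num"-*.md > /dev/null 2>&1 || echo "$ref"
    done
}

# Example: missing_adr_refs src docs/architecture/decisions
```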

6. Review ADRs Periodically

Quarterly review:

  • Are any decisions outdated?
  • What’s the status of “proposed” ADRs?
  • Should we revisit any choices?
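
A status summary gives the quarterly review a starting point. This is a rough sketch that assumes numbered ADR files with a `**Status**: ...` line; `adr_status_report` is a hypothetical helper name.

```shell
# Count ADRs by the first word of their Status line
# (Proposed, Accepted, Rejected, Deprecated, Superseded).
adr_status_report() {
  local dir="${1:-docs/architecture/decisions}"
  # The [0-9]*.md glob skips template.md, which has no real status yet.
  grep -h -E '^\*\*Status\*\*: ' "$dir"/[0-9]*.md 2>/dev/null \
    | sed -E 's/^\*\*Status\*\*: ([A-Za-z]+).*/\1/' \
    | sort | uniq -c | awk '{ print $2 ": " $1 }'
}
```

A non-zero Proposed count that persists from one review to the next is a useful prompt to decide or close those ADRs.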

Common Pitfalls

1. Too Much Detail

Bad: 10-page ADR with implementation code

Good: 2-page ADR with link to implementation guide

ADRs document decisions, not implementations.

2. Not Updating Status

Forgetting to mark superseded ADRs causes confusion.

Solution: Make status updates part of the new ADR process.
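
One way to make that step hard to forget is to script it alongside ADR creation. The sketch below rewrites the Status line of the old ADR in place. It assumes the `**Status**: ...` line format from the template, and `mark_superseded` is a hypothetical helper rather than part of our actual tooling.

```shell
# Point the Status line of an old ADR at the ADR that replaces it.
mark_superseded() {
  local old_file="$1" new_id="$2" tmp
  tmp="$(mktemp)" || return 1
  # Write to a temp file first: "sed -i" syntax differs between GNU and BSD sed.
  sed -E "s/^(\*\*Status\*\*): .*/\1: Superseded by $new_id/" "$old_file" > "$tmp" \
    && cat "$tmp" > "$old_file"
  rm -f "$tmp"
}

# Example: mark_superseded docs/architecture/decisions/005-use-redis-for-caching.md ADR-045
```

Copying the result back with cat keeps the original file's permissions, which a plain mv of the temp file would not.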

3. Treating ADRs as Immutable

ADRs can be amended! If you learn something new:

## Amendment (2024-08-20)
 
Since this decision, we learned that Socket.io has
connection stability issues in our AWS environment.
We're evaluating alternatives (see ADR-067-proposed).

4. Writing ADRs After Implementation

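Post-facto ADRs tend to rationalize the choice already made and leave out alternatives nobody seriously weighed. For new decisions, write the ADR before committing to an implementation.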

5. Analysis Paralysis

Don’t let ADR writing block progress. Timebox it:

  • 2 hours to research and write
  • 24 hours for team review
  • Then decide and move on

Tools and Integrations

GitHub Integration

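# .github/workflows/adr-check.yml
name: Check ADR Status

on: pull_request

jobs:
  adr-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # full history so the base branch is available to diff against

      - name: Check for ADR when needed
        env:
          GH_TOKEN: ${{ github.token }}  # gh needs a token to read the PR
          BASE_REF: ${{ github.base_ref }}
          PR_NUMBER: ${{ github.event.pull_request.number }}
        run: |
          # Check if the PR touches a dependency manifest
          if git diff --name-only "origin/$BASE_REF...HEAD" | grep -qE '(^|/)(package\.json|go\.mod|requirements\.txt)$'; then
            # Look for an ADR reference in the PR description
            if ! gh pr view "$PR_NUMBER" --json body --jq .body | grep -q "ADR-"; then
              echo "Error: dependency files changed but no ADR referenced"
              echo "Document this architectural decision in an ADR and link it in the PR description"
              exit 1
            fi
          fi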

Documentation Site

Generate a browsable ADR site with log4brains (adr-tools is a command-line alternative for creating and linking ADRs):

npx log4brains preview
# Launches web UI to browse ADRs

Conclusion

Architecture Decision Records are a simple practice with outsized impact. By documenting why decisions were made, not just what, we:

  • Reduced bugs by preserving context
  • Accelerated onboarding with self-service knowledge
  • Improved decision quality through structured thinking
  • Elevated engineering culture with thoughtful discourse

The ROI is clear:

  • Low cost: ~2 hours per ADR
  • High value: Far fewer hours spent answering “why?” questions (down 76% per month in our case)

If your team isn’t using ADRs yet, start today. Create one ADR for your last major decision, and build from there.

