How to Use Claude with NemoClaw (Complete Integration Guide)
NemoClaw's privacy router lets you use both local Nemotron models and cloud APIs like Claude simultaneously. Sensitive queries stay on-premise. Complex queries route to Claude for best-in-class reasoning. Here's how to set it up.
Founder, SnapIT Software
Why Use Claude with NemoClaw?
The Problem
- Local models (Nemotron-4-340B) are capable but not frontier-class
- Claude Sonnet 4.5 excels at complex reasoning, coding, and analysis
- But sending customer PII to a cloud API without the right agreements and controls can put HIPAA, SOC 2, and PCI-DSS compliance at risk
The Solution: Privacy Routing
NemoClaw analyzes each query before it leaves your infrastructure and routes based on two factors:
- Sensitivity: Does the query contain PII, PHI, or financial data?
- Complexity: Does it need frontier reasoning or is it a simple lookup?
The result is a clean split:
- Sensitive queries -- Local Nemotron. Zero cloud egress.
- Complex queries -- Claude API. Best quality reasoning.
- Simple queries -- Local Nemotron. Fastest response, $0 cost.
Architecture
Here is how the privacy router sits between your application and the model backends:
                  +------------------+
                  | Your Application |
                  +--------+---------+
                           |
                           v
               +-----------+-----------+
               |   NemoClaw Privacy    |
               |        Router         |
               +-----------+-----------+
                           |
       +--------------+----+------+------------+
       |              |           |            |
       v              v           v            v
+------+------+  +----+---+  +----+----+  +----+---+
| PII Scanner |  | Query  |  | Cost    |  | Audit  |
| (regex +    |  | Scorer |  | Limiter |  | Logger |
| NER model)  |  |        |  |         |  |        |
+------+------+  +----+---+  +----+----+  +--------+
       |              |           |
       +--------------+-----------+
                      |
         +------------+------------+
         |                         |
         v                         v
+--------+-------+        +--------+-------+
| Local Nemotron |        | Claude API     |
| (on-premise)   |        | (Anthropic)    |
+----------------+        +----------------+
Every query passes through the PII scanner first. If PII is detected, the query never leaves your network. If the query is clean and complex enough to benefit from frontier reasoning, it routes to Claude via the Anthropic API.
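The scan-first flow can be sketched in a few lines of Python. This is a minimal illustration, not NemoClaw's actual scanner: the `scan_for_pii` and `must_stay_local` helpers and the regexes are hypothetical, cover only three of the pattern types, and skip the NER pass entirely.

```python
import re

# Hypothetical, deliberately simple patterns. A production scanner would pair
# stricter regexes with an NER model, as the diagram shows.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def scan_for_pii(query: str) -> list[str]:
    """Return the sorted names of every pattern that matches the query."""
    return sorted(name for name, rx in PII_PATTERNS.items() if rx.search(query))


def must_stay_local(query: str) -> bool:
    """A query with any PII hit is never eligible for the cloud backend."""
    return bool(scan_for_pii(query))
```

Regexes alone miss names and free-form addresses, which is presumably why the router pairs them with an NER model.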
Setup (5 Minutes)
Step 1: Install NemoClaw
NemoClaw requires an NVIDIA GPU with at least 24GB VRAM for the local Nemotron model. Install via pip:
# Install NemoClaw
pip install nemoclaw
# Verify GPU is detected
nemoclaw doctor
The nemoclaw doctor command checks your GPU, CUDA drivers, and available VRAM. You should see output confirming Nemotron-4-340B can load on your hardware.
Step 2: Get Your Claude API Key
- Go to console.anthropic.com
- Navigate to API Keys in the left sidebar
- Click Create Key and give it a descriptive name (e.g., "nemoclaw-production")
- Copy the key -- you will not see it again
- Store it in your environment or secrets manager:
export ANTHROPIC_API_KEY=sk-ant-...
Step 3: Configure the Privacy Router
Create a file called nemoclaw.yaml in your project root. This is the full configuration:
# nemoclaw.yaml -- Privacy Router Configuration
router:
  mode: privacy
  default_backend: local
backends:
  local:
    model: nemotron-4-340b
    device: cuda:0
    max_tokens: 4096
    temperature: 0.7
  cloud:
    provider: anthropic
    model: claude-sonnet-4-5-20250929
    api_key: ${ANTHROPIC_API_KEY}
    max_tokens: 4096
    temperature: 0.7
privacy:
  pii_detection:
    enabled: true
    patterns:
      - ssn          # Social Security numbers
      - credit_card  # Credit card numbers
      - email        # Email addresses
      - phone        # Phone numbers
      - dob          # Dates of birth
      - address      # Physical addresses
    ner_model: en_core_web_trf  # spaCy transformer NER
    action: route_local         # If PII detected, use local model
  routing_rules:
    - condition: pii_detected
      backend: local
      reason: "PII found -- keeping query on-premise"
    - condition: complexity_score > 0.7
      backend: cloud
      reason: "Complex query -- routing to Claude"
    - condition: default
      backend: local
      reason: "Simple query -- using local model"
  allowed_domains:
    - api.anthropic.com
cost:
  monthly_limit_usd: 200
  alert_threshold_usd: 150
  per_query_limit_usd: 0.50
logging:
  level: info
  audit_file: /var/log/nemoclaw/audit.jsonl
The key section is privacy.routing_rules. Queries containing PII always stay local. Clean queries with high complexity scores route to Claude. Everything else defaults to local Nemotron.
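To make the rule order concrete, here is a rough Python sketch of the same three rules. It assumes the rules are evaluated top to bottom and the first match wins, which the text implies but does not state. The `route` function and `Decision` type are illustrative names, not part of NemoClaw.

```python
from dataclasses import dataclass

COMPLEXITY_THRESHOLD = 0.7  # from the `complexity_score > 0.7` rule


@dataclass(frozen=True)
class Decision:
    backend: str
    reason: str


def route(pii_detected: bool, complexity_score: float) -> Decision:
    """Evaluate the three rules in order; the first match wins (assumed)."""
    if pii_detected:
        return Decision("local", "PII found -- keeping query on-premise")
    if complexity_score > COMPLEXITY_THRESHOLD:
        return Decision("cloud", "Complex query -- routing to Claude")
    return Decision("local", "Simple query -- using local model")
```

Because the PII check comes first, a query that is both sensitive and complex still stays local, whatever its complexity score.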
Step 4: Run the Agent
# Start NemoClaw with privacy routing
nemoclaw serve --config nemoclaw.yaml --port 8080
NemoClaw is now running on port 8080. Send queries to http://localhost:8080/v1/chat and the router handles the rest. Your application code does not need to know which backend is handling any given query.
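From the application side, a call might look like the sketch below, which uses only the Python standard library. The endpoint URL comes from the text above, but the request body shape (`{"query": ...}`) and the assumption that the reply is JSON are guesses for illustration. Check the NemoClaw API reference for the real schema.

```python
import json
import urllib.request

ROUTER_URL = "http://localhost:8080/v1/chat"  # endpoint named in the text


def build_chat_request(query: str) -> urllib.request.Request:
    """Build a POST request. The {"query": ...} body is an assumed schema."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        ROUTER_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(query: str, timeout: float = 30.0) -> dict:
    """Send the query to a running router and decode the (assumed) JSON reply."""
    with urllib.request.urlopen(build_chat_request(query), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Nothing in the client names a backend. The routing decision is made entirely on the server side.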
Example Routing Decisions
Here are three real queries and how NemoClaw handles each one:
| Query | PII? | Route | Latency | Cost |
|---|---|---|---|---|
| "Look up account for john.doe@acme.com" | Yes (email) | Local Nemotron | 0.8s | $0.00 |
| "Design a microservices architecture for real-time fraud detection" | No | Claude Sonnet 4.5 | 2.1s | $0.03 |
| "What are your business hours?" | No | Local Nemotron | 0.3s | $0.00 |
The first query contains an email address -- PII detected, routed locally. The second query is complex (architecture design) with no PII -- routed to Claude. The third is a simple FAQ -- handled locally in 300 milliseconds at zero cost.
Cost Analysis (Real-World Example)
Let's look at a realistic scenario: a customer support operation handling 10,000 queries per month.
| Metric | All Claude (No Routing) | NemoClaw (Privacy Routing) |
|---|---|---|
| Total queries | 10,000 | 10,000 |
| Queries to Claude | 10,000 | 3,500 (35%) |
| Queries to local | 0 | 6,500 (65%) |
| Claude API cost | $150.00/mo | $52.50/mo |
| Local compute cost | $0.00 | $17.00/mo (GPU electricity) |
| Total monthly cost | $150.00 | $69.50 |
| Savings | -- | 54% reduction ($80.50/mo saved) |
The savings come from two places. First, simple queries that do not need frontier-class reasoning stay on your local GPU -- no API cost at all. Second, PII-containing queries must stay local anyway for compliance, so you are not paying for cloud processing on data you could never safely send.
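The arithmetic behind the table can be checked with a few lines of Python. The only derived number is the average price per Claude query ($150 / 10,000 = $0.015), which the table implies but does not state. The variable names are illustrative.

```python
TOTAL_QUERIES = 10_000
ALL_CLAUDE_COST = 150.00    # USD per month, from the table
CLOUD_SHARE = 0.35          # 35% of queries routed to Claude
LOCAL_COMPUTE_COST = 17.00  # USD per month, GPU electricity

# Average price per Claude query implied by the "All Claude" column.
cost_per_cloud_query = ALL_CLAUDE_COST / TOTAL_QUERIES

cloud_queries = round(TOTAL_QUERIES * CLOUD_SHARE)
hybrid_api_cost = cloud_queries * cost_per_cloud_query
hybrid_total = hybrid_api_cost + LOCAL_COMPUTE_COST
savings = ALL_CLAUDE_COST - hybrid_total
savings_pct = savings / ALL_CLAUDE_COST * 100

print(f"Claude queries:  {cloud_queries}")
print(f"Claude API cost: ${hybrid_api_cost:.2f}")
print(f"Hybrid total:    ${hybrid_total:.2f}")
print(f"Savings:         ${savings:.2f} ({savings_pct:.0f}%)")
```

Swap in your own query volume, cloud share, and electricity cost to see where the break-even point sits for your workload.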
The cost savings are real, but the compliance story is the bigger win. With NemoClaw, the audit log gives you evidence that queries flagged as containing PII stayed on your infrastructure -- while you still get Claude-quality answers for the queries that need it.
Advanced: Custom PII Detection
The default PII patterns cover most use cases, but every industry has domain-specific sensitive data. You can add custom patterns to the privacy configuration:
# Custom PII patterns for healthcare and finance
privacy:
  pii_detection:
    enabled: true
    patterns:
      - ssn
      - credit_card
      - email
      - phone
      - dob
      - address
    custom_patterns:
      - name: medical_record_number
        regex: "MRN[:\\s]?\\d{6,10}"
        description: "Hospital medical record numbers"
      - name: insurance_policy
        regex: "POL[- ]?\\d{8,12}"
        description: "Insurance policy numbers"
      - name: internal_account_id
        regex: "ACCT-[A-Z]{2}-\\d{6}"
        description: "Internal account identifiers"
      - name: iban
        regex: "[A-Z]{2}\\d{2}[A-Z0-9]{4}\\d{7}([A-Z0-9]?){0,16}"
        description: "International Bank Account Numbers"
    ner_model: en_core_web_trf
    action: route_local
Custom patterns use standard regular expressions. When NemoClaw detects a match, the query is treated the same as any other PII hit -- routed to the local model with an audit log entry explaining what was detected and why the query stayed on-premise.
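Before deploying custom patterns, it helps to test them against sample text. The sketch below runs the four regexes from the configuration through Python's `re` module, on the assumption that NemoClaw's regex dialect behaves the same way for these expressions. The `matched_patterns` helper is illustrative and not part of NemoClaw.

```python
import re

# The regexes from the YAML above, with YAML's doubled backslashes unescaped.
CUSTOM_PATTERNS = {
    "medical_record_number": r"MRN[:\s]?\d{6,10}",
    "insurance_policy": r"POL[- ]?\d{8,12}",
    "internal_account_id": r"ACCT-[A-Z]{2}-\d{6}",
    "iban": r"[A-Z]{2}\d{2}[A-Z0-9]{4}\d{7}([A-Z0-9]?){0,16}",
}
COMPILED = {name: re.compile(rx) for name, rx in CUSTOM_PATTERNS.items()}


def matched_patterns(text: str) -> list[str]:
    """Return the names of the custom patterns found anywhere in the text."""
    return [name for name, rx in COMPILED.items() if rx.search(text)]
```

Run both positive and negative samples through it. A pattern that is too loose sends clean queries to the local model unnecessarily, and one that is too strict lets sensitive identifiers through.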
Monitoring Router Decisions
Every routing decision NemoClaw makes is logged to an audit file. Configure the logging section in your YAML:
logging:
  level: info
  audit_file: /var/log/nemoclaw/audit.jsonl
  include_query_hash: true
  include_pii_types: true
  include_latency: true
  rotation:
    max_size_mb: 100
    max_files: 10
    compress: true
Each log entry is a single JSON line, making it easy to ingest into your existing logging pipeline (ELK, Datadog, Splunk). Here is a sample entry:
{
  "timestamp": "2026-03-20T14:32:01.847Z",
  "query_hash": "a3f8c1d2e5b7",
  "pii_detected": true,
  "pii_types": ["email", "phone"],
  "complexity_score": 0.42,
  "routed_to": "local",
  "reason": "PII found -- keeping query on-premise",
  "latency_ms": 812,
  "tokens_in": 47,
  "tokens_out": 203,
  "cost_usd": 0.00
}
Note that the actual query text is never logged -- only a hash. This means your audit logs themselves are safe to store in cloud logging systems without creating a secondary PII exposure risk.
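As a rough illustration of hash-only logging, the sketch below builds a log record that carries a truncated digest instead of the query text. The choice of SHA-256 and the 12-character truncation are assumptions made to match the sample entry's shape, since the source does not say which hash NemoClaw uses. `query_hash` and `audit_entry` are hypothetical helpers.

```python
import hashlib


def query_hash(query: str, length: int = 12) -> str:
    """Return a truncated SHA-256 hex digest to stand in for the query text."""
    digest = hashlib.sha256(query.encode("utf-8")).hexdigest()
    return digest[:length]


def audit_entry(query: str, routed_to: str, pii_types: list[str]) -> dict:
    """Build a log record that carries the hash but never the raw query."""
    return {
        "query_hash": query_hash(query),
        "pii_detected": bool(pii_types),
        "pii_types": pii_types,
        "routed_to": routed_to,
    }
```

An unsalted hash of a short, guessable query can be matched by hashing candidate strings, so a keyed hash (for example `hmac` with a secret key) is a common hardening step.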
Fallback Behavior
What happens if the Claude API is down? NemoClaw handles this gracefully with fallback configuration:
fallback:
  cloud_unavailable:
    action: route_local
    max_retries: 2
    retry_delay_ms: 500
    notify:
      webhook: https://your-ops-tool.com/alerts
      message: "Claude API unreachable -- all queries routed locally"
  local_unavailable:
    action: queue
    queue_max_size: 1000
    queue_ttl_seconds: 300
    notify:
      webhook: https://your-ops-tool.com/alerts
      message: "Local model unavailable -- queries queued"
If the Claude API returns an error or times out, NemoClaw retries twice with a 500ms delay. If it still fails, all queries automatically fall back to the local Nemotron model. As long as the local model is up, your application never sees a hard failure -- it gets a response from whichever backend is available.
If the local model goes down (GPU failure, OOM), queries are queued for up to 5 minutes while the model restarts. This is rare in practice but important for production deployments.
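The cloud-side fallback is a retry-then-degrade pattern, sketched below in Python. It assumes "retries twice" means one initial call plus two retries. `call_with_fallback` and the two backend callables are stand-ins for illustration, not NemoClaw internals, and the queueing behavior for local outages is left out.

```python
import time
from typing import Callable


def call_with_fallback(
    query: str,
    call_cloud: Callable[[str], str],
    call_local: Callable[[str], str],
    max_retries: int = 2,
    retry_delay_ms: int = 500,
) -> tuple[str, str]:
    """Try the cloud backend, retry on failure, then fall back to local.

    Returns (backend_used, response).
    """
    attempts = 1 + max_retries  # one initial call plus the retries (assumed)
    for attempt in range(attempts):
        try:
            return "cloud", call_cloud(query)
        except Exception:
            # Wait before the next attempt, but not after the last one.
            if attempt < attempts - 1:
                time.sleep(retry_delay_ms / 1000)
    return "local", call_local(query)
```

With the default settings, a cloud outage adds about one second of delay (two 500ms waits) plus the failed call times before the local model answers.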
Performance Comparison
Here is how the three routing modes compare across key metrics:
| Metric | Local Only (Nemotron) | Cloud Only (Claude Sonnet 4.5) | NemoClaw (Hybrid) |
|---|---|---|---|
| Avg latency | 0.5s | 2.0s | 0.9s |
| Complex task accuracy | 72% | 94% | 91% |
| Simple task accuracy | 95% | 97% | 95% |
| PII exposure risk | None | High | None |
| Cost per 10K queries | $17 (electricity) | $150 | $69.50 |
| Compliance-ready | Yes | No (PII in transit) | Yes |
The hybrid approach gets you 91% accuracy on complex tasks (close to Claude's 94%) with none of the PII exposure risk. The 3-point accuracy gap comes from the small number of complex queries that also contain PII -- those get handled locally instead of being routed to Claude.
Common Mistakes
Mistake 1: Allowing All Domains
Leaving allowed_domains empty or set to a wildcard means any outbound request could be made from your network. This defeats the purpose of privacy routing.
Bad configuration:
privacy:
  allowed_domains: ["*"]  # DO NOT do this
Correct configuration:
privacy:
  allowed_domains:
    - api.anthropic.com  # Only Anthropic's API endpoint
Mistake 2: No Cost Limits
Without cost limits, a sudden spike in complex queries could run up a large API bill before anyone notices.
Bad configuration:
cost:
  monthly_limit_usd: 0  # No limit -- dangerous
Correct configuration:
cost:
  monthly_limit_usd: 200
  alert_threshold_usd: 150
  per_query_limit_usd: 0.50
Set the monthly limit based on your expected usage with a reasonable buffer. The alert threshold triggers a notification before you hit the hard cap. The per-query limit catches runaway prompts that could consume your entire budget in a single request.
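How the three limits interact can be sketched as a small Python class. This is an illustrative model under stated assumptions, not NemoClaw's limiter: the `CostLimiter` name and the "allow" / "alert" / "block" outcomes are hypothetical, and the sketch assumes a call is blocked when it would push spend past the monthly cap.

```python
from dataclasses import dataclass


@dataclass
class CostLimiter:
    monthly_limit_usd: float = 200.0
    alert_threshold_usd: float = 150.0
    per_query_limit_usd: float = 0.50
    spent_usd: float = 0.0

    def check(self, estimated_cost_usd: float) -> str:
        """Classify a pending cloud call as 'allow', 'alert', or 'block'."""
        if estimated_cost_usd > self.per_query_limit_usd:
            return "block"  # runaway prompt
        projected = self.spent_usd + estimated_cost_usd
        if projected > self.monthly_limit_usd:
            return "block"  # hard monthly cap
        if projected >= self.alert_threshold_usd:
            return "alert"  # still allowed, but someone should be notified
        return "allow"

    def record(self, actual_cost_usd: float) -> None:
        """Add a completed call's cost to the month's running total."""
        self.spent_usd += actual_cost_usd
```

A blocked call does not have to fail. With `default_backend: local`, the natural behavior is to answer it from the local model instead.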
Mistake 3: Ignoring Audit Logs
NemoClaw generates detailed audit logs for every routing decision. Ignoring them means you miss PII detection drift, unusual routing patterns, and cost anomalies.
What to monitor:
# Count entries where PII was detected (the rate should be stable week to week)
jq -c 'select(.pii_detected == true)' /var/log/nemoclaw/audit.jsonl | wc -l
# Count entries routed to the cloud (divide by total entries for the percentage)
jq -c 'select(.routed_to == "cloud")' /var/log/nemoclaw/audit.jsonl | wc -l
# List entries near the per-query cost limit
jq -c 'select(.cost_usd > 0.40)' /var/log/nemoclaw/audit.jsonl
Review these metrics weekly. If your PII detection rate suddenly drops, your custom patterns may need updating. If cloud routing percentage spikes, you may be receiving more complex queries -- or your complexity threshold needs adjustment.
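For a weekly review, the same three checks can be rolled into one Python summary that reports rates rather than raw counts. The field names follow the sample audit entry shown earlier. The `summarize_audit` function and its 0.40 warning threshold are illustrative choices, not a NemoClaw tool.

```python
import json
from typing import Iterable


def summarize_audit(lines: Iterable[str], cost_warn_usd: float = 0.40) -> dict:
    """Compute PII rate, cloud share, and near-limit count from JSONL lines."""
    total = pii = cloud = near_limit = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        total += 1
        pii += bool(entry.get("pii_detected"))
        cloud += entry.get("routed_to") == "cloud"
        near_limit += entry.get("cost_usd", 0.0) > cost_warn_usd
    return {
        "total": total,
        "pii_rate": pii / total if total else 0.0,
        "cloud_rate": cloud / total if total else 0.0,
        "near_cost_limit": near_limit,
    }
```

Pass it an open file handle for `/var/log/nemoclaw/audit.jsonl` and compare `pii_rate` and `cloud_rate` against the previous week's values.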
When NOT to Use Privacy Routing
Privacy routing adds a layer of complexity. Skip it if any of these apply:
- No PII in your data. If your queries never contain personally identifiable information (e.g., you are building a code assistant for open-source projects), just use the Claude API directly. The privacy router adds latency with no benefit.
- Extreme low-latency requirements. The PII scanning step adds 50-100ms to every query. If you need sub-200ms responses for every request, the overhead may be unacceptable. Use local models only.
- No NVIDIA GPU available. NemoClaw requires an NVIDIA GPU with at least 24GB VRAM for the local Nemotron model. If you do not have GPU infrastructure, you cannot run the local backend. Use the Claude API directly with a data processing agreement (DPA) from Anthropic.
For most production use cases involving customer data, privacy routing is worth the setup. The compliance benefits and cost savings justify the minimal added complexity.
Next Steps
You now have a working NemoClaw + Claude integration with privacy routing. Your sensitive data stays on-premise while complex queries get frontier-quality responses from Claude.
To go deeper:
- OpenClaw vs NemoClaw: Complete Comparison -- Understand the differences between the two routers and which one fits your use case.
- Is NemoClaw Free? Pricing Breakdown -- Full breakdown of NemoClaw's pricing tiers, including the free community edition.
Founder, SnapIT Software
Terrell builds AI-powered SaaS products on AWS. He's shipped 20+ products across the SnapIT Software portfolio, including Sphinx Agent (AI chatbot platform), SnapIT Forms (form builder), and SnapIT Analytics (website analytics). Based in Austin, TX.