Sphinx Agent logo Sphinx Agent
Integration Guide 12 min read

How to Use Claude with NemoClaw (Complete Integration Guide)

NemoClaw's privacy router lets you use both local Nemotron models and cloud APIs like Claude simultaneously. Sensitive queries stay on-premise. Complex queries route to Claude for best-in-class reasoning. Here's how to set it up.

Why Use Claude with NemoClaw?

The Problem

  • Local models (Nemotron-4-340B) are capable but not frontier-class
  • Claude 4.5 Sonnet excels at complex reasoning, coding, and analysis
  • But sending customer PII to cloud APIs violates HIPAA, SOC 2, and PCI-DSS

The Solution: Privacy Routing

NemoClaw analyzes each query before it leaves your infrastructure and routes based on two factors:

  1. Sensitivity: Does the query contain PII, PHI, or financial data?
  2. Complexity: Does it need frontier reasoning or is it a simple lookup?

The result is a clean split:

  • Sensitive queries -- Local Nemotron. Zero cloud egress.
  • Complex queries -- Claude API. Best quality reasoning.
  • Simple queries -- Local Nemotron. Fastest response, $0 cost.

Architecture

Here is how the privacy router sits between your application and the model backends:

                        +------------------+
                        |  Your Application |
                        +--------+---------+
                                 |
                                 v
                     +-----------+-----------+
                     |   NemoClaw Privacy    |
                     |       Router          |
                     +---+-----+-------+----+
                         |     |       |
              +----------+  +--+--+  +-+----------+
              |             |     |               |
              v             v     v               v
     +--------+---+  +-----+--+ +--+-----+ +-----+-----+
     | PII Scanner |  | Query  | | Cost   | | Audit     |
     | (regex +    |  | Scorer | | Limiter| | Logger    |
     |  NER model) |  |        | |        | |           |
     +--------+---+  +-----+--+ +--+-----+ +-----+-----+
              |             |       |
              +------+------+-------+
                     |
            +--------+--------+
            |                 |
            v                 v
   +--------+------+  +------+--------+
   | Local Nemotron|  | Claude API    |
   | (on-premise)  |  | (Anthropic)   |
   +---------------+  +---------------+

Every query passes through the PII scanner first. If PII is detected, the query never leaves your network. If the query is clean and complex enough to benefit from frontier reasoning, it routes to Claude via the Anthropic API.

Setup (5 Minutes)

Step 1: Install NemoClaw

NemoClaw requires an NVIDIA GPU with at least 24GB VRAM for the local Nemotron model. Install via pip:

# Install NemoClaw
pip install nemoclaw

# Verify GPU is detected
nemoclaw doctor

The nemoclaw doctor command checks your GPU, CUDA drivers, and available VRAM. You should see output confirming Nemotron-4-340B can load on your hardware.

Step 2: Get Your Claude API Key

  1. Go to console.anthropic.com
  2. Navigate to API Keys in the left sidebar
  3. Click Create Key and give it a descriptive name (e.g., "nemoclaw-production")
  4. Copy the key -- you will not see it again
  5. Store it in your environment or secrets manager: export ANTHROPIC_API_KEY=sk-ant-...

Step 3: Configure the Privacy Router

Create a file called nemoclaw.yaml in your project root. This is the full configuration:

# nemoclaw.yaml -- Privacy Router Configuration
router:
  mode: privacy
  default_backend: local

backends:
  local:
    model: nemotron-4-340b
    device: cuda:0
    max_tokens: 4096
    temperature: 0.7

  cloud:
    provider: anthropic
    model: claude-sonnet-4-5-20250514
    api_key: ${ANTHROPIC_API_KEY}
    max_tokens: 4096
    temperature: 0.7

privacy:
  pii_detection:
    enabled: true
    patterns:
      - ssn          # Social Security numbers
      - credit_card  # Credit card numbers
      - email        # Email addresses
      - phone        # Phone numbers
      - dob          # Dates of birth
      - address      # Physical addresses
    ner_model: en_core_web_trf   # spaCy transformer NER
    action: route_local          # If PII detected, use local model

  routing_rules:
    - condition: pii_detected
      backend: local
      reason: "PII found -- keeping query on-premise"
    - condition: complexity_score > 0.7
      backend: cloud
      reason: "Complex query -- routing to Claude"
    - condition: default
      backend: local
      reason: "Simple query -- using local model"

  allowed_domains:
    - api.anthropic.com

cost:
  monthly_limit_usd: 200
  alert_threshold_usd: 150
  per_query_limit_usd: 0.50

logging:
  level: info
  audit_file: /var/log/nemoclaw/audit.jsonl

The key section is privacy.routing_rules. Queries containing PII always stay local. Clean queries with high complexity scores route to Claude. Everything else defaults to local Nemotron.

Step 4: Run the Agent

# Start NemoClaw with privacy routing
nemoclaw serve --config nemoclaw.yaml --port 8080

NemoClaw is now running on port 8080. Send queries to http://localhost:8080/v1/chat and the router handles the rest. Your application code does not need to know which backend is handling any given query.

Example Routing Decisions

Here are three real queries and how NemoClaw handles each one:

Query PII? Route Latency Cost
"Look up account for john.doe@acme.com" Yes (email) Local Nemotron 0.8s $0.00
"Design a microservices architecture for real-time fraud detection" No Claude 4.5 Sonnet 2.1s $0.03
"What are your business hours?" No Local Nemotron 0.3s $0.00

The first query contains an email address -- PII detected, routed locally. The second query is complex (architecture design) with no PII -- routed to Claude. The third is a simple FAQ -- handled locally in 300 milliseconds at zero cost.

Cost Analysis (Real-World Example)

Let's look at a realistic scenario: a customer support operation handling 10,000 queries per month.

Metric All Claude (No Routing) NemoClaw (Privacy Routing)
Total queries 10,000 10,000
Queries to Claude 10,000 3,500 (35%)
Queries to local 0 6,500 (65%)
Claude API cost $150.00/mo $52.50/mo
Local compute cost $0.00 $17.00/mo (GPU electricity)
Total monthly cost $150.00 $69.50
Savings -- 54% reduction ($80.50/mo saved)

The savings come from two places. First, simple queries that do not need frontier-class reasoning stay on your local GPU -- no API cost at all. Second, PII-containing queries must stay local anyway for compliance, so you are not paying for cloud processing on data you could never safely send.

The cost savings are real, but the compliance story is the bigger win. With NemoClaw, you can prove in an audit that no PII ever left your infrastructure -- while still getting Claude-quality answers for the queries that need it.

Advanced: Custom PII Detection

The default PII patterns cover most use cases, but every industry has domain-specific sensitive data. You can add custom patterns to the privacy configuration:

# Custom PII patterns for healthcare and finance
privacy:
  pii_detection:
    enabled: true
    patterns:
      - ssn
      - credit_card
      - email
      - phone
      - dob
      - address
    custom_patterns:
      - name: medical_record_number
        regex: "MRN[:\\s]?\\d{6,10}"
        description: "Hospital medical record numbers"
      - name: insurance_policy
        regex: "POL[- ]?\\d{8,12}"
        description: "Insurance policy numbers"
      - name: internal_account_id
        regex: "ACCT-[A-Z]{2}-\\d{6}"
        description: "Internal account identifiers"
      - name: iban
        regex: "[A-Z]{2}\\d{2}[A-Z0-9]{4}\\d{7}([A-Z0-9]?){0,16}"
        description: "International Bank Account Numbers"
    ner_model: en_core_web_trf
    action: route_local

Custom patterns use standard regular expressions. When NemoClaw detects a match, the query is treated the same as any other PII hit -- routed to the local model with an audit log entry explaining what was detected and why the query stayed on-premise.

Monitoring Router Decisions

Every routing decision NemoClaw makes is logged to an audit file. Configure the logging section in your YAML:

logging:
  level: info
  audit_file: /var/log/nemoclaw/audit.jsonl
  include_query_hash: true
  include_pii_types: true
  include_latency: true
  rotation:
    max_size_mb: 100
    max_files: 10
    compress: true

Each log entry is a single JSON line, making it easy to ingest into your existing logging pipeline (ELK, Datadog, Splunk). Here is a sample entry:

{
  "timestamp": "2026-03-20T14:32:01.847Z",
  "query_hash": "a3f8c1d2e5b7",
  "pii_detected": true,
  "pii_types": ["email", "phone"],
  "complexity_score": 0.42,
  "routed_to": "local",
  "reason": "PII found -- keeping query on-premise",
  "latency_ms": 812,
  "tokens_in": 47,
  "tokens_out": 203,
  "cost_usd": 0.00
}

Note that the actual query text is never logged -- only a hash. This means your audit logs themselves are safe to store in cloud logging systems without creating a secondary PII exposure risk.

Fallback Behavior

What happens if the Claude API is down? NemoClaw handles this gracefully with fallback configuration:

fallback:
  cloud_unavailable:
    action: route_local
    max_retries: 2
    retry_delay_ms: 500
    notify:
      webhook: https://your-ops-tool.com/alerts
      message: "Claude API unreachable -- all queries routed locally"

  local_unavailable:
    action: queue
    queue_max_size: 1000
    queue_ttl_seconds: 300
    notify:
      webhook: https://your-ops-tool.com/alerts
      message: "Local model unavailable -- queries queued"

If the Claude API returns an error or times out, NemoClaw retries twice with a 500ms delay. If it still fails, all queries automatically fall back to the local Nemotron model. Your application never sees a hard failure -- it gets a response from whichever backend is available.

If the local model goes down (GPU failure, OOM), queries are queued for up to 5 minutes while the model restarts. This is rare in practice but important for production deployments.

Performance Comparison

Here is how the three routing modes compare across key metrics:

Metric Local Only (Nemotron) Cloud Only (Claude 4.5) NemoClaw (Hybrid)
Avg latency 0.5s 2.0s 0.9s
Complex task accuracy 72% 94% 91%
Simple task accuracy 95% 97% 95%
PII exposure risk None High None
Cost per 10K queries $17 (electricity) $150 $69.50
Compliance-ready Yes No (PII in transit) Yes

The hybrid approach gets you 91% accuracy on complex tasks (close to Claude's 94%) with none of the PII exposure risk. The 3% accuracy gap comes from the small number of complex queries that also contain PII -- those get handled locally instead of being routed to Claude.

Common Mistakes

Mistake 1: Allowing All Domains

Leaving allowed_domains empty or set to a wildcard means any outbound request could be made from your network. This defeats the purpose of privacy routing.

Bad configuration:

privacy:
  allowed_domains: ["*"]  # DO NOT do this

Correct configuration:

privacy:
  allowed_domains:
    - api.anthropic.com   # Only Anthropic's API endpoint

Mistake 2: No Cost Limits

Without cost limits, a sudden spike in complex queries could run up a large API bill before anyone notices.

Bad configuration:

cost:
  monthly_limit_usd: 0  # No limit -- dangerous

Correct configuration:

cost:
  monthly_limit_usd: 200
  alert_threshold_usd: 150
  per_query_limit_usd: 0.50

Set the monthly limit based on your expected usage with a reasonable buffer. The alert threshold triggers a notification before you hit the hard cap. The per-query limit catches runaway prompts that could consume your entire budget in a single request.

Mistake 3: Ignoring Audit Logs

NemoClaw generates detailed audit logs for every routing decision. Ignoring them means you miss PII detection drift, unusual routing patterns, and cost anomalies.

What to monitor:

# Check PII detection rate (should be stable)
cat /var/log/nemoclaw/audit.jsonl | \
  jq -r 'select(.pii_detected == true)' | wc -l

# Check cloud routing percentage
cat /var/log/nemoclaw/audit.jsonl | \
  jq -r 'select(.routed_to == "cloud")' | wc -l

# Check for queries near the cost limit
cat /var/log/nemoclaw/audit.jsonl | \
  jq -r 'select(.cost_usd > 0.40)'

Review these metrics weekly. If your PII detection rate suddenly drops, your custom patterns may need updating. If cloud routing percentage spikes, you may be receiving more complex queries -- or your complexity threshold needs adjustment.

When NOT to Use Privacy Routing

Privacy routing adds a layer of complexity. Skip it if any of these apply:

  • No PII in your data. If your queries never contain personally identifiable information (e.g., you are building a code assistant for open-source projects), just use the Claude API directly. The privacy router adds latency with no benefit.
  • Extreme low-latency requirements. The PII scanning step adds 50-100ms to every query. If you need sub-200ms responses for every request, the overhead may be unacceptable. Use local models only.
  • No NVIDIA GPU available. NemoClaw requires an NVIDIA GPU with at least 24GB VRAM for the local Nemotron model. If you do not have GPU infrastructure, you cannot run the local backend. Use the Claude API directly with a data processing agreement (DPA) from Anthropic.

For most production use cases involving customer data, privacy routing is worth the setup. The compliance benefits and cost savings justify the minimal added complexity.

Next Steps

You now have a working NemoClaw + Claude integration with privacy routing. Your sensitive data stays on-premise while complex queries get frontier-quality responses from Claude.

To go deeper:

Want the full playbook?

Download our free ebook: "Privacy-First AI: Building Compliant LLM Applications" -- covering routing architectures, compliance checklists, and production deployment patterns for HIPAA, SOC 2, and PCI-DSS environments.

Terrell K. Flautt

Terrell K. Flautt

Founder, SnapIT Software

Terrell builds AI-powered SaaS products on AWS. He's shipped 20+ products across the SnapIT Software portfolio, including Sphinx Agent (AI chatbot platform), SnapIT Forms (form builder), and SnapIT Analytics (website analytics). Based in Austin, TX.

Build privacy-first AI agents

Deploy AI agents that keep sensitive data local while leveraging frontier models for complex queries. Free plan available -- no credit card required.

Start Free Trial

Related Articles