🌐 ITENFRESDE

Hermes Agent and GDPR: how to deploy it on-premise for corporate data compliance

· Hermes Agent Experts

One of the most frequent questions we receive at Hermes Agent Experts is: “Does my data leave the company?”. The short answer is no, if you deploy Hermes Agent on-premise. In this article, we look at how to configure Hermes Agent for GDPR compliance, NIS2, and regulated industries (healthcare, finance, public sector, defense), with a hands-on focus on what is truly needed in production.

The Problem: Corporate Data and Cloud Models

The adoption of generative AI in European enterprises is held back by one specific issue: corporate data is an invaluable asset, and sending it to an American API (OpenAI, Anthropic, Google) or a Chinese API (DeepSeek, Qwen) presents significant risks:

  • GDPR Violations (transfer of data to third-party countries without adequate protections under Schrems II)
  • NDA Violations with clients and partners
  • Loss of Competitive Advantage (prompts and corporate datasets can be used for model training, even when vendors claim otherwise)
  • Non-compliance with Sector-specific Regulations (healthcare, finance, public sector, and defense have strict in-house restrictions)

Hermes Agent was built specifically to solve this: it is an open-source agent that runs wherever you decide, speaks to any model (local or API-based), and lets you build an AI infrastructure completely under your control.

What “On-Premise” Means for Hermes Agent

“On-premise” in the context of Hermes Agent refers to three concrete aspects:

  1. The agent runs on your server (physical, VPS, private cloud, or hybrid). No API calls to proprietary orchestration services.
  2. The model runs on your hardware (or your private cloud). Self-hosted via vLLM, Ollama, llama.cpp, or TGI.
  3. Data remains in your storage (PostgreSQL, Qdrant, local file system, or on-prem S3-compatible systems like MinIO).

There are no “phone home” components by default. No hidden telemetry. The codebase is open-source (MIT-like license) and completely auditable.

On-Premise Reference Architecture

[ User PC ] ←HTTPS→ [ Reverse proxy (Caddy/Traefik) ]

              [ Hermes Agent Core (Docker/K8s) ]

       ┌──────────┬──────────┬──────────┬──────────┐
       ↓          ↓          ↓          ↓          ↓
   [ LLM    ]  [ Vector ]  [ SQL DB ]  [ Tools ]  [ MCP    ]
   [ self-h.]  [ Store  ]  [          ]  [        ]  [ servers]
       ↓          ↓          ↓          ↓          ↓
   [ GPU     ]  [ /data  ]  [ /db    ]  [ /tmp  ]  [        ]
   [ server  ]  [        ]  [        ]  [        ]  [        ]

Everything sits inside your corporate perimeter. The reverse proxy handles TLS, authentication (SSO/LDAP/OAuth), and rate limiting. Hermes Agent communicates with self-hosted models via a local API (OpenAI-compatible), with the vector store via a private API, and with corporate tools via API/MCP.

Minimum Requirements for an On-Premise Deployment

Hardware

ComponentMinimumRecommended for Production
CPU8 vCPUs (for small models)32+ vCPUs
RAM32 GB128+ GB
GPUOptional (quantized models on CPU)1-4× NVIDIA L40S / A100 / H100
Storage500 GB SSD2-10 TB NVMe (for vector store and logging)
Network1 Gbps10 Gbps

Software

  • OS: Ubuntu 22.04 LTS or 24.04 LTS, RHEL 9, Rocky Linux 9
  • Containers: Docker + Docker Compose, or Kubernetes (Rancher, OpenShift, K3s)
  • LLM Serving: vLLM (recommended), Ollama (dev), TGI (HuggingFace), or llama.cpp (edge)
  • Vector Store: Qdrant (recommended), Weaviate, pgvector
  • Database: PostgreSQL 15+
  • Observability: Prometheus + Grafana + Loki (or Elastic Stack)
  • Auth: Keycloak, Authentik, or your corporate IdP
  • TLS: Internal Let’s Encrypt or corporate PKI
ModelParametersUse caseMinimum VRAM
Llama 3.3 70B Instruct70B (Q4)General purpose, top-tier quality48 GB
Qwen 2.5 72B Instruct72B (Q4)Multilingual (IT/EN/FR/ES/DE/CN)48 GB
Mistral Large 2 (123B)123B (Q4)Complex reasoning, coding80 GB
DeepSeek-V367B MoECoding + reasoning, highly cost-effective48 GB
Mixtral 8x22B141B (MoE active 39B)Balanced quality and inference speed48 GB
Llama 3.1 8B Instruct8B (Q4)Edge, simple tasks, ultra-low VRAM8 GB
Phi-3 Medium14BCompact, outstanding quality12 GB

For the European market, Qwen 2.5 72B is regularly the best choice: excellent quality in English, Italian, German, French, and Spanish, performing comparable to Llama 3 70B on MMLU benchmarks and remaining much more cost-effective to host.

GDPR Checklist for Deployments

Here is the operational checklist we execute for every client deployment:

1. Data Processing Mapping

  • Agent Scope: comprehensive list of finalities (customer support, data analysis, RAG, etc.)
  • Processed Data Types: classifications (PII, health data, financial data, code, internal documentation)
  • Legal Basis: contractual execution, legitimate interest, or consent
  • Retention: established storage durations for logs, vector stores, and backup structures
  • Registry of Processing Activities (Art. 30 GDPR) updated

2. Technical Measures (Art. 32 GDPR)

  • Encryption at rest for all disks, databases, vector stores, and backups (LUKS, AES-256)
  • Encryption in transit (TLS 1.3, internal mTLS)
  • Data Segregation by corporate department/project boundaries
  • Access Control: RBAC, principle of least privilege, and MFA policies for all administrators
  • Comprehensive Audit Logging: signed and retained for compliance mandates
  • Encrypted Backups, regularly tested and kept off-site
  • Annual Penetration Testing
  • Continuous Vulnerability Scanning (Trivy, Snyk, OpenSCAP)

3. Organizational Measures

  • DPO Appointment (where mandated by industry sector)
  • DPA (Data Processing Agreements) signed with all subprocessors (e.g., hosting providers, LLM companies if API used)
  • Employee Training on secure AI usage patterns
  • Acceptable Use Policy signed by all active users
  • Automated DSR Workflows (handling rights of access, rectification, erasure, portability)
  • Data Breach Notification Procedure configured for triggers within 72 hours (Art. 33 GDPR)

4. Model Governance

  • Authorized Models Registry listing allowed models with engineering justification
  • Change Management protocols governing model updates
  • Model Traceability: linking specific generated outputs back to the engine version/weight hash (essential for auditing)
  • Human-in-the-loop Sample Reviews to ensure quality and compliance
  • Periodic Red Teaming exercises targeting prompt injections and jailbreaks

Hands-On: Minimal docker-compose.yml

Here is a starting docker-compose.yml for a secure, compliant on-premise installation:

version: '3.8'

services:
  # Self-hosted LLM (vLLM with quantized Qwen 2.5 72B)
  llm:
    image: vllm/vllm-openai:latest
    runtime: nvidia
    environment:
      - MODEL=Qwen/Qwen2.5-72B-Instruct-AWQ
    volumes:
      - /opt/models:/models
    ports:
      - "8000:8000"
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]

  # Hermes Agent Core
  hermes:
    image: nousresearch/hermes-agent:latest
    environment:
      - HERMES_LLM_BASE_URL=http://llm:8000/v1
      - HERMES_VECTOR_STORE=qdrant
      - HERMES_QDRANT_URL=http://qdrant:6333
      - HERMES_DB_URL=postgresql://hermes:***@db:5432/hermes
    volumes:
      - /opt/hermes/data:/data
      - /opt/hermes/skills:/skills
    ports:
      - "8080:8080"
    depends_on:
      - llm
      - qdrant
      - db

  # Secure Vector Store
  qdrant:
    image: qdrant/qdrant:latest
    volumes:
      - /opt/qdrant:/qdrant/storage
    ports:
      - "6333:6333"

  # Core Database
  db:
    image: postgres:15
    environment:
      - POSTGRES_DB=hermes
      - POSTGRES_USER=hermes
      - POSTGRES_PASSWORD=***
    volumes:
      - /opt/pgdata:/var/lib/postgresql/data

  # TLS Reverse Proxy
  proxy:
    image: caddy:2
    volumes:
      - /opt/caddy/Caddyfile:/etc/caddy/Caddyfile
      - /opt/caddy/data:/data
    ports:
      - "443:443"

All micro-frontend and engine traffic stays enclosed within the internal container network. Only the secure TLS reverse proxy is exposed externally, enforcing corporate SSO.

Audit Logs: What to Trace

Hermes Agent details can be configured for granular tracing. The following events must be stored:

  • User Identity: who executed the instruction (User ID, IP address, timestamp, user agent string)
  • User input: actual prompt (subject to internal DLP - data loss prevention filters)
  • Tool calls: which tools were invoked along with their specific arguments
  • Agent response: exact generated outputs
  • Model engine: exact model variant hash used
  • Inference Latency: total execution/response times
  • Status Outcome: success, error code, fallback, or human escalation event
  • Token consumption tracking: total input/output tokens and execution cost metrics

These application audit logs should be automatically routed directly into your corporate SIEM (Splunk, Elastic, Datadog) for log correlation.

Exercising the Right to be Forgotten (DSR)

When a customer or employee exercises their right to erasure (Art. 17 GDPR), we must execute workflows to securely scrape all associated records from:

  1. Authentication systems (deactivating their profile, purging active tokens)
  2. Vector index store (removing all document chunks mapped to their user ID)
  3. Chat history databases (purging conversation threads)
  4. Application logs (running GDPR pseudonymization scripts on logs)
  5. Backups (data is overwritten according to standard backup rotation schedules)

We build automated, idempotent scripts that execute these purges cleanly, leaving an auditable validation of completed deletion.

NIS2 and Highly Regulated Environments

For sectors under the NIS2 Directive (energy, transportation, health, digital infrastructure, public sector, etc.), Hermes Agent can be hardened to guarantee:

  • Incident notifications within 24 hours (early alerting) and 72 hours (comprehensive notification) to target regulators
  • Cyber security risk posture tracking: treating the AI controller as a critical corporate asset
  • Business continuity plans: highly available active-active clusters, cold recovery procedures, precise RTO/RPO metrics
  • Supply chain verification: complete SBOM analysis of all core models, deep dependencies, and system container files

When Hybrid Cloud Makes Sense (And When It Doesn’t)

Pure On-Premise deployments are essential for:

  • Healthcare, finance, public sector, and defense industries
  • Entities legally bound by national data sovereignty regulations
  • Workflows that process trade secrets, patents, or intellectual property

Hybrid Cloud deployments (self-hosted models paired with highly specific vendor APIs) make sense for:

  • SMBs looking to minimize upfront server investments
  • Highly complex multi-modal tasks requiring global frontier models
  • Non-sensitive corporate tasks (e.g., market research, generating marketing copy, public documentation)

Pure, Commercial Cloud APIs are only appropriate for:

  • Open-access public data assets
  • Early sandbox experimentation
  • Companies utilizing verified enterprise non-training configurations residing in adequacy-approved countries

Licensing and Implementation Costs

An on-premises deployment represents an upfront investment (servers or dedicated private cloud resources) and maintenance effort. However, the 3-year TCO is regularly highly competitive with per-seat private cloud subscriptions, especially for organizations with steady high workloads.

Every deployment is quoted on a bespoke basis to align perfectly with the target infrastructure, integrations, and compliance requirements.

Studio Synapse offers a free 30-minute technical assessment and consulting session without any obligation. This session is designed to explore your use case, outline an appropriate deployment architecture, and provide a tailor-made quote. Contact us directly at contatti@hermesagentexperts.com.

Frequently Asked Questions

Is Hermes Agent GDPR compliant? Hermes Agent is built to be GDPR-friendly by design: it can be installed completely on-premise with self-hosted models, so personal data never leaves the client's network. Full GDPR compliance is a factor of overall system setup: establishing processing registers, implementing strict access boundaries, defining data life cycles, and setting up DSR procedures. We deliver the entire compliant stack out of the box.
Can I use Hermes Agent with healthcare data? Yes. In fully on-premise deployments with local self-hosted models. For health data ( EHR, diagnosis scripts, doctor reports) strict compliance mandates that data remains completely in-house and cannot be processed via third-party web APIs. Hermes Agent was constructed precisely for such restricted use cases.
Can Hermes Agent run within a VPS, or does it require dedicated local servers? Hermes Agent is infrastructure-agnostic. It runs seamlessly on cloud VPS interfaces, private clouds, physical bare metal hardware, or hybrid configs. The ultimate choice depends on your security profile: defense, healthcare, and finance sectors frequently demand physical, offline private servers; for mid-market SMBs, a trusted EU-based cloud VPS hosting provider (such as Hetzner or OVH) with encrypted backups is excellent.
Which models can I run locally with Hermes Agent? Any open-weights model compatible with standard hosting architectures: Llama 3 (Meta), Mistral, Mixtral, Qwen 2.5 (Alibaba), DeepSeek-V3, Gemma (Google), or Phi-3 (Microsoft). Performance matches closed commercial GPT-4 models on typical enterprise tasks. For edge queries requiring frontier models, we can leverage zero-data-retention APIs from OpenAI, Anthropic, or Google.
How is data retention handled in Hermes Agent? By default, Hermes Agent does not store user data in permanent memory unless you explicitly enable long-term user memory tracking. Chat threads are ephemeral. Log tracing retention is configurable (typically 30 to 90 days), and index datasets within the vector DB follow custom retention policies, strictly respecting the GDPR minimization principle.
Does the agent support Right to be Forgotten (DSR) workflows? Yes, via comprehensive, preconfigured cleanup scripts. These workflows scrub the user from authentication systems, purge their chunked history inside vector databases, delete relational database logs, and cleanly pseudonymize operational logs. We deliver a traceable script process for verified auditing.
What details are tracked in the audit logs? The agent logs every key transaction: auth token context, user prompts, called core tools, read/write actions, model engines, processing latency, and token totals. Logs are cryptographically signed and archived. They connect natively to enterprise SIEM platforms such as Splunk, Elastic, or Datadog.
Is Hermes Agent approved for regulated sectors (banking, public sector, insurance)? Yes, with custom configurations. We match Bank of Italy regulations (Circular 285) and international DORA mandates for financial institutions, IVASS policies for insurance groups, and the Italian Cloud strategy / AgID guidelines for Government installations. Our mix of self-hosted engines, encrypted storage, and end-to-end security satisfies strict compliance profiles.
What is the cost of an enterprise on-premise installation? Every project is quoted on a bespoke basis according to integration complexity, legacy systems, security parameters, and requested SLA tiers. Studio Synapse provides detailed estimates only following an initial technical discovery. We offer a free 30-minute technical assessment and consulting session to map your project requirements.
What compliance standards do your installation processes follow? At Studio Synapse, we function under processes mapped to ISO/IEC 27001 standards and execute according to OWASP and NIST CSF cybersecurity frameworks. We happily sign NDAs, DPAs (Data Processing Agreements), and support external compliance audits. For custom regional or medical requirements, we work alongside specialist cybersecurity legal counsels.

Conclusion

Hermes Agent is, today, one of the most effective technical answers to the data sovereignty challenge in the era of generative AI. On-premise, self-hosted, open-source, and model-agnostic: a combined feature set that enables European companies to implement AI without making concessions on compliance.

At Hermes Agent Experts (a brand of Studio Synapse), we deliver, configure, and manage on-premise Hermes Agent clusters for clients across Europe with an unwavering focus on compliance by design. For an honest evaluation of your specific operational goals, contact us directly at contatti@hermesagentexperts.com — we respond within one business day.


Richiedi una consulenza gratuita →

← Torna al blog