AML_LUX_DATASET v2.0.0

Reference legal dataset for auditing and evaluating AI systems in AML/CFT in LuxembourgA French-language dataset
A structured legal dataset exclusively based on an official Luxembourg regulatory corpus, designed to test, audit, and benchmark artificial intelligence systems in regulated environments.

Why a dedicated AML legal dataset?

General-purpose AI systems are neither designed nor trained to reliably meet the specific legal and regulatory requirements of anti-money laundering and counter-terrorist financing (AML/CFT).

AML_LUX_DATASET v2.0.0 addresses a precise need:

  • to assess the actual legal compliance of AI-generated responses,
  • to measure the risk of out-of-corpus hallucinations,
  • to objectively compare multiple models or RAG configurations,
  • to document AI governance within an AI Act / internal control framework.

This dataset is not a simple question–answer collection:
it is a legally constrained AI audit tool.

Regulatory scope covered

The dataset is entirely based on a documented Luxembourg and European legal corpus, including in particular:

  • Amended Law of 12 November 2004 (AML/CFT)
  • Laws governing the Financial Intelligence Unit (FIU) and criminal sanctions
  • CSSF circulars (12/02, 17/650, 18/702, etc.)
  • European AML Directives (4th, 5th, and 6th AML Directives)
  • International standards and recommendations (FATF)

📌 No response is produced outside this defined corpus.

Technical features of the dataset

Grounded and traceable dataset

Each answer is:

  • generated under strict corpus constraints,
  • supported by explicit citations,
  • linked to normalized legal sources,
  • structured for machine use (JSONL format).

Hallucination mitigation

The dataset includes:

  • insufficient-context cases,
  • deliberately blocked responses,
  • an explicit and documented refusal logic.

➡️ Ideal for testing whether an AI system knows when not to answer.

Main use cases

🔍 Audit and benchmarking of legal AI systems

  • Compare GPT, Claude, Mistral, and internal LLMs
  • Test different RAG architectures
  • Measure the regulatory robustness of AI outputs

🧠 Training and evaluation of local models

  • Controlled fine-tuning
  • Post-training evaluation
  • Detection of out-of-corpus drift

💬 Compliance & AML chatbots

  • Internal compliance assistants
  • AML regulatory assistants
  • Decision-support tools (non-decision-making)

📊 AI governance & AI Act compliance

  • Documentation of AI risks
  • Proof of information perimeter control
  • Support for internal and external audits

Format & integration

  • Format: JSONL
  • Language: French legal language
  • Version: v2.0.0 (frozen dataset)
  • Compatibility:
    • RAG frameworks (Chroma, FAISS, Pinecone…)
    • Internal AI pipelines
    • BULORA.ai audit tools

License & conditions of use

  • Internal professional use only
  • Redistribution prohibited
  • No training of public AI models
  • Contractual license provided with the dataset

➡️ See the Offers & Licenses page

Integration with BULORA.ai

AML_LUX_DATASET v2.0.0 is natively compatible with BULORA.ai modules:

  • Reasoning
  • Source
  • Robustness
  • Temporal
  • Disagreement

It can also be used independently of the platform.

Access & demonstration

Would you like to test this dataset on your own models or use cases?

➡️ Contact us for a demo: contact(@)bulora.ai
➡️ Request an evaluation access: contact(@)bulora.ai