diff --git a/nemo/Evaluator/end2end_tutorial.ipynb b/nemo/Evaluator/end2end_tutorial.ipynb
new file mode 100644
index 000000000..b68167e10
--- /dev/null
+++ b/nemo/Evaluator/end2end_tutorial.ipynb
@@ -0,0 +1,4478 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "b9268128-30cd-4f08-a718-1f8e73a980e7",
+   "metadata": {},
+   "source": [
+    "# NeMo Evaluator Microservice Tutorial"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d52988a9-7676-44ab-9f53-7c4d6dd0d494",
+   "metadata": {},
+   "source": [
+    "## About this tutorial\n",
+    "\n",
+    "In this tutorial, we will cover the following evaluation types using NeMo Evaluator:\n",
+    "\n",
+    "- **Agentic Evaluation**\n",
+    "- **LLM Evaluation on Academic Benchmarks**\n",
+    "- **Custom Evaluations**\n",
+    "  - **Similarity Metrics Evaluation**\n",
+    "  - **LLM-as-Judge Evaluation**\n",
+    "  - **Tool Calling Evaluation**\n",
+    "- **Retriever Pipeline Evaluation**\n",
+    "- **RAG Pipeline Evaluation**"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6a052618-c8b6-424b-8d7b-d00c26346296",
+   "metadata": {},
+   "source": [
+    "## 1. Prerequisites"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e3541975-e515-4909-af80-9c59a807ccc1",
+   "metadata": {},
+   "source": [
+    "### 1.1 Install NeMo Microservices"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e03e7c89-2b52-4eab-b41e-bf1e9b7156d3",
+   "metadata": {},
+   "source": [
+    "While this tutorial focuses on NeMo Evaluator, we recommend installing the NeMo Microservices platform using the [NeMo Microservices Platform Helm Chart](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo-microservices/helm-charts/nemo-microservices-helm-chart) to avoid manually managing dependencies. For a step-by-step installation guide, refer to the [Demo Cluster Setup Guide](https://docs.nvidia.com/nemo/microservices/latest/get-started/setup/index.html). This tutorial was run on a single-node cluster with two A100 GPUs."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9d728ee2-9ef9-49ac-b992-c5cf9ecf8ae3",
+   "metadata": {},
+   "source": [
+    "Check the pods to make sure all required microservices are running before proceeding."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "278624c3-13ba-4305-9141-945d6ee3b6a1",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "NAME                                                          READY   STATUS      RESTARTS      AGE\n",
+      "model-downloader-meta-llama-3-1-8b-instruct-2-0-28trx         0/1     Completed   0             10m\n",
+      "model-downloader-meta-llama-3-2-1b-instruct-2-0-b6scn         0/1     Completed   0             10m\n",
+      "modeldeployment-meta-llama-3-1-8b-instruct-6b64d56fdc-slctn   1/1     Running     0             9m26s\n",
+      "nemo-argo-workflows-server-655f8d755-svgn2                    1/1     Running     0             12m\n",
+      "nemo-argo-workflows-workflow-controller-8f8877cd4-8t2tf       1/1     Running     0             12m\n",
+      "nemo-customizer-5d8554fcf6-rhwfp                              1/1     Running     2 (11m ago)   12m\n",
+      "nemo-customizerdb-0                                           1/1     Running     0             12m\n",
+      "nemo-data-store-795ccbb97b-nwcf2                              1/1     Running     0             12m\n",
+      "nemo-deployment-management-646cc67c-l67lq                     1/1     Running     0             12m\n",
+      "nemo-entity-store-6bfd7c4bd9-p4s4b                            1/1     Running     0             12m\n",
+      "nemo-entity-storedb-0                                         1/1     Running     0             12m\n",
+      "nemo-evaluator-68c67f6798-f4sqh                               1/1     Running     0             12m\n",
+      "nemo-evaluatordb-0                                            1/1     Running     0             12m\n",
+      "nemo-guardrails-5985f8c6d4-bkkq2                              1/1     Running     0             12m\n",
+      "nemo-milvus-standalone-57cc6c46fc-5rctk                       1/1     Running     0             12m\n",
+      "nemo-nemo-operator-controller-manager-59c98d97f9-7gfsw        2/2     Running     0             12m\n",
+      "nemo-nim-operator-765cbf686b-w9n6c                            1/1     Running     0             12m\n",
+      "nemo-nim-proxy-6fbccbfb45-8qvmz                               1/1     Running     0             12m\n",
+      "nemo-opentelemetry-collector-6f46646fd9-5tnrf                 1/1     Running     0             12m\n",
+      "nemo-postgresql-0                                             1/1     Running     0             12m\n"
+     ]
+    }
+   ],
+   "source": [
+    "! kubectl get pods"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "58628300-b5c1-4577-ab0c-c2bc92e60611",
+   "metadata": {},
+   "source": [
+    "Install `huggingface_hub`, which is required to interact with NeMo Data Store:\n",
+    "\n",
+    "```bash\n",
+    "pip install -U \"huggingface_hub[cli]\"\n",
+    "```\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "f49ed4c1-25be-4c52-9fa4-154b88729de8",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/home/ubuntu/.local/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
+      "  from .autonotebook import tqdm as notebook_tqdm\n"
+     ]
+    }
+   ],
+   "source": [
+    "import requests\n",
+    "import json\n",
+    "import os\n",
+    "from pprint import pp\n",
+    "from huggingface_hub import HfApi"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9159cb76-5239-4b31-ba6b-673562d46e9f",
+   "metadata": {},
+   "source": [
+    "Specify the namespace and API endpoints:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "id": "1ee4a79f-eadf-4c1c-aef0-b59a30a61bf9",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "NDS_URL = \"http://data-store.test\" # Data Store\n",
+    "NEMO_URL = \"http://nemo.test\" # Customizer, Entity Store, Evaluator\n",
+    "NIM_URL = \"http://nim.test\" # NIM Proxy\n",
+    "NMS_NAMESPACE = \"nemo-eval-tutorial\"\n",
+    "\n",
+    "target_url = f\"{NEMO_URL}/v1/evaluation/targets\"\n",
+    "config_url = f\"{NEMO_URL}/v1/evaluation/configs\"\n",
+    "job_url = f\"{NEMO_URL}/v1/evaluation/jobs\"\n",
+    "llm_chat_completion_url = f\"{NIM_URL}/v1/chat/completions\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e3e27807-e8c6-44cf-9183-c77c7d0390cf",
+   "metadata": {},
+   "source": [
+    "### 1.2 Deploy NIM for LLMs\n",
+    "\n",
+    "This tutorial uses the `Llama-3.1-8b-instruct` model as the LLM to be evaluated. You can either deploy a `Llama-3.1-8b-instruct` NIM locally or use a remotely hosted NIM. As part of the NeMo Microservices Platform, the NeMo Deployment Management service provides an API to deploy NIMs on a Kubernetes cluster and manage them through the NIM Operator microservice. The cells below show how to deploy the `Llama-3.1-8b-instruct` NIM and run inference through the NIM Proxy service.\n",
+    "\n",
+    "**Note**: If you see a pod named `modeldeployment-meta-llama-3-1-8b-instruct-xxx` in the list of pods above, a `Llama-3.1-8b-instruct` NIM has already been deployed and you can skip this step."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "93deb6ef-a033-4243-8824-8bf74830271e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "deployment_url = f\"{NEMO_URL}/v1/deployment/model-deployments\"\n",
+    "\n",
+    "payload = {\n",
+    "    \"name\": \"llama-3.1-8b-instruct\",\n",
+    "    \"namespace\": \"meta\",\n",
+    "    \"config\": {\n",
+    "        \"model\": \"meta/llama-3.1-8b-instruct\",\n",
+    "        \"nim_deployment\": {\n",
+    "            \"image_name\": \"nvcr.io/nim/meta/llama-3.1-8b-instruct\",\n",
+    "            \"image_tag\": \"1.8.3\",\n",
+    "            \"pvc_size\": \"25Gi\",\n",
+    "            \"gpu\": 1,\n",
+    "            \"additional_envs\": {\n",
+    "                \"NIM_GUIDED_DECODING_BACKEND\": \"outlines\"\n",
+    "            }\n",
+    "        }\n",
+    "    }\n",
+    "}\n",
+    "\n",
+    "headers = {\n",
+    "    \"Content-Type\": \"application/json\"\n",
+    "}\n",
+    "\n",
+    "resp = requests.post(deployment_url, json=payload, headers=headers)\n",
+    "pp(resp.json())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9fa7ceb9-7644-4110-8e76-0e75db6deb50",
+   "metadata": {},
+   "source": [
+    "Check the deployment status and make sure the status is 'ready' before proceeding."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d2e582eb-b90e-4286-b0f5-ff283c4f06ca",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "resp = requests.get(f\"{NEMO_URL}/v1/deployment/model-deployments/meta/llama-3.1-8b-instruct\")\n",
+    "pp(resp.json())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "dd959837-6507-4680-a4ac-4f4dcf53bbc6",
+   "metadata": {},
+   "source": [
+    "Test LLM inference against the NIM endpoint:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "id": "61ead8b5-fb62-4c1b-97e4-a8e607746c33",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'role': 'assistant',\n",
+      " 'content': 'LLM (Large Language Model) token log probabilities are used to '\n",
+      "            \"represent the model's confidence in its predictions, with lower \"\n",
+      "            'probabilities indicating less likely or less confident '\n",
+      "            'predictions.'}\n"
+     ]
+    }
+   ],
+   "source": [
+    "payload = {\n",
+    "    \"model\": \"meta/llama-3.1-8b-instruct\",\n",
+    "    \"messages\": [\n",
+    "        {\n",
+    "            \"role\": \"user\",\n",
+    "            \"content\": \"What is the purpose of LLM token log probabilities? Answer with a single sentence.\"\n",
+    "        }\n",
+    "    ],\n",
+    "    \"stream\": False,\n",
+    "    \"temperature\": 0.0\n",
+    "}\n",
+    "\n",
+    "headers = {\n",
+    "    \"Content-Type\": \"application/json\",\n",
+    "}\n",
+    "\n",
+    "resp = requests.post(llm_chat_completion_url, json=payload, headers=headers)\n",
+    "pp(resp.json()['choices'][0]['message'])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "fcffc6e6-2f09-417a-a35b-520a52401685",
+   "metadata": {},
+   "source": [
+    "### 1.3 Set Up API Key and Access Token\n",
+    "\n",
+    "This tutorial uses a remote LLM hosted on [build.nvidia.com](https://build.nvidia.com/) for evaluations that require an LLM as a judge. To access these models, we need to set up an API key. You can generate an NVIDIA API key at [Manage API Keys](https://build.nvidia.com/settings/api-keys)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "id": "e355d4d7-6d24-4a3d-99e4-476916696cc0",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Enter your NVIDIA API Key ········\n"
+     ]
+    }
+   ],
+   "source": [
+    "from getpass import getpass\n",
+    "\n",
+    "os.environ['NVIDIA_API_KEY'] = getpass(\"Enter your NVIDIA API Key\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "05d2e5d0-5169-4401-9c82-5bec2c0f8eb3",
+   "metadata": {},
+   "source": [
+    "Next, set up a Hugging Face access token that has access to Meta's Llama-3.1-8B-instruct model."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "id": "efe54825-6497-4edb-b1ac-3289a8fa9cbf",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Enter your Hugging Face Token ········\n"
+     ]
+    }
+   ],
+   "source": [
+    "os.environ['HF_Token'] = getpass(\"Enter your Hugging Face Token\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c67c1f20",
+   "metadata": {},
+   "source": [
+    "### 1.4 Running Evaluation Jobs with NeMo Evaluator\n",
+    "\n",
+    "Before running evaluations, it is important to understand the typical NeMo Evaluator workflow:\n",
+    "\n",
+    "1.\t(Optional) Upload your custom dataset to the NeMo Data Store if you’re not using a built-in dataset.\n",
+    "2.\tCreate an evaluation configuration in NeMo Evaluator.\n",
+    "3.\tDefine an evaluation target (the model to evaluate).\n",
+    "4.\tSubmit an evaluation job to NeMo Evaluator. The following steps occur automatically:\n",
+    "\n",
+    "\ta. NeMo Evaluator retrieves any required custom data from the NeMo Data Store.\n",
+    "\n",
+    "\tb. It runs inference using NIM, supporting LLMs, embeddings, and reranking tasks.\n",
+    "\n",
+    "\tc. Results, including generations, logs, and metrics, are written to the NeMo Data Store.\n",
+    "\n",
+    "\td. The results are returned to the user.\n",
+    "\n",
+    "5.\tReview evaluation results."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6aa8733b-8f48-4448-bf0d-73cd19a5c19e",
+   "metadata": {},
+   "source": [
+    "## 2. Agentic Evaluation\n",
+    "\n",
+    "Agentic evaluation uses RAGAS metrics to score agent outputs. RAGAS is a library for evaluating retrieval-augmented generation and agentic workflows using standardized, research-backed metrics.\n",
+    "\n",
+    "Each task contains a set of metrics relevant to agentic evaluation, such as topic adherence, tool call accuracy, agent goal accuracy, or answer accuracy, depending on the metric selected in the job configuration."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "65cfece4-764b-415f-a980-be35dc3eb259",
+   "metadata": {},
+   "source": [
+    "### 2.1 Upload Custom Data to NeMo Data Store\n",
+    "\n",
+    "Before uploading the data, we first need to create a namespace in NeMo Data Store and Entity Store."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "id": "1e8e44cd-d0fc-408e-bf24-120dab22bf58",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from helpers import create_namespaces, setup_dataset_repo\n",
+    "\n",
+    "DATASET_NAME = \"agent_eval\"\n",
+    "create_namespaces(NEMO_URL, NDS_URL, NMS_NAMESPACE)\n",
+    "HF_API = HfApi(endpoint=f\"{NDS_URL}/v1/hf\", token=\"\")\n",
+    "repo_id = setup_dataset_repo(HF_API, NMS_NAMESPACE, DATASET_NAME, NEMO_URL)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4140a67b-8e54-453c-9baa-1929ea81bbb9",
+   "metadata": {},
+   "source": [
+    "Follow this [dataset format](https://docs.nvidia.com/nemo/microservices/latest/evaluate/evaluation-types/agentic.html#options) to prepare the dataset for agentic evaluation. Example datasets are provided at `./eval_dataset/agent_data`. Next, we will upload these example datasets to Data Store."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 14,
+   "id": "9f48cdb0-1416-4f58-9ebe-2ee12fef6e38",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "agent_tool_call_data.jsonl: 100%|██████████| 3.20k/3.20k [00:00<00:00, 709kB/s]\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "CommitInfo(commit_url='', commit_message='Upload agent_tool_call_data.jsonl with huggingface_hub', commit_description='', oid='659c7516a96482859db4f4707a32b4a8c5b78ba9', pr_url=None, repo_url=RepoUrl('', endpoint='https://huggingface.co', repo_type='model', repo_id=''), pr_revision=None, pr_num=None)"
+      ]
+     },
+     "execution_count": 14,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "HF_API.upload_file(path_or_fileobj=os.path.join(\"./eval_dataset/agent_data\", \"agent_goal_data.jsonl\"),\n",
+    "    path_in_repo=\"agent_goal_data.jsonl\",\n",
+    "    repo_id=repo_id,\n",
+    "    repo_type='dataset',\n",
+    ")\n",
+    "\n",
+    "HF_API.upload_file(path_or_fileobj=os.path.join(\"./eval_dataset/agent_data\", \"agent_tool_call_data.jsonl\"),\n",
+    "    path_in_repo=\"agent_tool_call_data.jsonl\",\n",
+    "    repo_id=repo_id,\n",
+    "    repo_type='dataset',\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2aaa0731-70b0-4db5-82b0-9169038d4fa8",
+   "metadata": {},
+   "source": [
+    "### 2.2 Create Evaluation Targets"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ffa00862-1b7e-49a0-b67e-f9e2417ccac7",
+   "metadata": {},
+   "source": [
+    "Currently, agentic evaluation only works with `cached_outputs` targets, which point to files stored in NeMo Data Store that contain pre-generated answers.\n",
+    "\n",
+    "We will create two evaluation targets for agentic evaluations: one for agent goal accuracy and one for tool calling accuracy."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 15,
+   "id": "f50ddadc-6ce6-4d60-9e62-b6d17643db77",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'created_at': '2025-07-08T19:18:37.063092',\n",
+      " 'updated_at': '2025-07-08T19:18:37.063095',\n",
+      " 'name': 'agent-goal-target',\n",
+      " 'namespace': 'nemo-eval-tutorial',\n",
+      " 'type': 'cached_outputs',\n",
+      " 'cached_outputs': {'files_url': 'hf://datasets/nemo-eval-tutorial/agent_eval/agent_goal_data.jsonl'},\n",
+      " 'id': 'eval-target-GDJ771rB6W3xaowbpE4Bsh',\n",
+      " 'custom_fields': {}}\n"
+     ]
+    }
+   ],
+   "source": [
+    "payload = {\n",
+    "    \"type\": \"cached_outputs\",\n",
+    "    \"name\": \"agent-goal-target\",\n",
+    "    \"namespace\": NMS_NAMESPACE,\n",
+    "    \"cached_outputs\": {\n",
+    "        \"files_url\": f\"hf://datasets/{NMS_NAMESPACE}/{DATASET_NAME}/agent_goal_data.jsonl\",\n",
+    "        }\n",
+    "}\n",
+    "\n",
+    "headers = {\n",
+    "    \"Content-Type\": \"application/json\",\n",
+    "}\n",
+    "\n",
+    "\n",
+    "resp = requests.post(target_url, json=payload, headers=headers)\n",
+    "pp(resp.json())"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 16,
+   "id": "656576d0-c06d-4d20-8c9f-f6fcd01423ad",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'created_at': '2025-07-08T19:18:38.848137',\n",
+      " 'updated_at': '2025-07-08T19:18:38.848149',\n",
+      " 'name': 'agent-tool-call-target',\n",
+      " 'namespace': 'nemo-eval-tutorial',\n",
+      " 'type': 'cached_outputs',\n",
+      " 'cached_outputs': {'files_url': 'hf://datasets/nemo-eval-tutorial/agent_eval/agent_tool_call_data.jsonl'},\n",
+      " 'id': 'eval-target-HBLMJtD4MhBzV6S3AGor2g',\n",
+      " 'custom_fields': {}}\n"
+     ]
+    }
+   ],
+   "source": [
+    "payload = {\n",
+    "    \"type\": \"cached_outputs\",\n",
+    "    \"name\": \"agent-tool-call-target\",\n",
+    "    \"namespace\": NMS_NAMESPACE,\n",
+    "    \"cached_outputs\": {\n",
+    "        \"files_url\": f\"hf://datasets/{NMS_NAMESPACE}/{DATASET_NAME}/agent_tool_call_data.jsonl\",\n",
+    "        }\n",
+    "}\n",
+    "\n",
+    "resp = requests.post(target_url, json=payload, headers=headers)\n",
+    "pp(resp.json())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "30cf1d52-6940-43ff-bf76-e11d258fa2e4",
+   "metadata": {},
+   "source": [
+    "### 2.3 Create Evaluation Configs\n",
+    "\n",
+    "Similarly, we will create two evaluation configs: one for agent goal accuracy and one for tool calling accuracy."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 17,
+   "id": "49aae7e8-4718-413b-91ae-f7ba6269880e",
+   "metadata": {
+    "collapsed": true,
+    "jupyter": {
+     "outputs_hidden": true
+    }
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'created_at': '2025-07-08T19:19:13.374904',\n",
+      " 'updated_at': '2025-07-08T19:19:13.374906',\n",
+      " 'name': 'agentic-goal-accuracy',\n",
+      " 'namespace': 'nemo-eval-tutorial',\n",
+      " 'type': 'agentic',\n",
+      " 'tasks': {'goal-accuracy': {'type': 'agent_goal_accuracy_with_reference',\n",
+      "                             'params': {'judge': {'model': {'url': 'https://integrate.api.nvidia.com/v1',\n",
+      "                                                            'model_id': 'meta/llama-3.3-70b-instruct',\n",
+      "                                                            'api_key': '******'},\n",
+      "                                                  'inference_params': {'max_new_tokens': 4024,\n",
+      "                                                                       'max_retries': 10,\n",
+      "                                                                       'request_timeout': 10,\n",
+      "                                                                       'temperature': 0.1}}}}},\n",
+      " 'id': 'eval-config-MCeuvELVrXYuZQLZWE2tkV',\n",
+      " 'custom_fields': {}}\n"
+     ]
+    }
+   ],
+   "source": [
+    "payload = {\n",
+    "  \"type\": \"agentic\",\n",
+    "  \"name\": \"agentic-goal-accuracy\",\n",
+    "  \"namespace\": NMS_NAMESPACE,\n",
+    "  \"tasks\": {\n",
+    "    \"goal-accuracy\": {\n",
+    "      \"type\": \"agent_goal_accuracy_with_reference\",\n",
+    "      \"params\": {\n",
+    "        \"judge\": {\n",
+    "          \"model\": {\n",
+    "            \"url\": \"https://integrate.api.nvidia.com/v1\",\n",
+    "            \"model_id\": \"meta/llama-3.3-70b-instruct\",\n",
+    "            \"api_key\": os.environ[\"NVIDIA_API_KEY\"]\n",
+    "          },\n",
+    "          \"inference_params\": {\n",
+    "            \"max_new_tokens\": 4024,\n",
+    "            \"max_retries\": 10,\n",
+    "            \"request_timeout\": 10,\n",
+    "            \"temperature\": 0.1\n",
+    "          }\n",
+    "        }\n",
+    "      }\n",
+    "    }\n",
+    "  }\n",
+    "}\n",
+    "headers = {\n",
+    "    \"accept\": \"application/json\",\n",
+    "    \"Content-Type\": \"application/json\"\n",
+    "}\n",
+    "\n",
+    "resp = requests.post(config_url, json=payload, headers=headers)\n",
+    "pp(resp.json())"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 18,
+   "id": "0e6f59d8-be94-4d61-956e-6967c6828c62",
+   "metadata": {
+    "collapsed": true,
+    "jupyter": {
+     "outputs_hidden": true
+    }
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'created_at': '2025-07-08T19:19:30.170735',\n",
+      " 'updated_at': '2025-07-08T19:19:30.170738',\n",
+      " 'name': 'agentic-tool-call-accuracy',\n",
+      " 'namespace': 'nemo-eval-tutorial',\n",
+      " 'type': 'agentic',\n",
+      " 'tasks': {'tool-call-accuracy': {'type': 'tool_call_accuracy'}},\n",
+      " 'id': 'eval-config-Jr4uE5uUonn3YyfwUHKKSa',\n",
+      " 'custom_fields': {}}\n"
+     ]
+    }
+   ],
+   "source": [
+    "payload = {\n",
+    "  \"type\": \"agentic\",\n",
+    "  \"name\": \"agentic-tool-call-accuracy\",\n",
+    "  \"namespace\": NMS_NAMESPACE,\n",
+    "  \"tasks\": {\n",
+    "    \"tool-call-accuracy\": {\n",
+    "      \"type\": \"tool_call_accuracy\",\n",
+    "    }\n",
+    "  }\n",
+    "}\n",
+    "\n",
+    "resp = requests.post(config_url, json=payload, headers=headers)\n",
+    "pp(resp.json())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "73806bf9-8caf-4eb9-9bfb-77222637869e",
+   "metadata": {},
+   "source": [
+    "### 2.4 Submit Evaluation Job\n",
+    "\n",
+    "To launch the evaluation job, we simply send a request with the previously created evaluation targets and configs to the `/jobs` API endpoint."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 28,
+   "id": "783386ae-b5c1-4c29-bfb3-9b94df32bdd4",
+   "metadata": {
+    "collapsed": true,
+    "jupyter": {
+     "outputs_hidden": true
+    }
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'created_at': '2025-07-08T19:28:46.640661',\n",
+      " 'updated_at': '2025-07-08T19:28:46.640663',\n",
+      " 'id': 'eval-PLUWikEJpyH7WkcYj6sWZb',\n",
+      " 'namespace': 'nemo-eval-tutorial',\n",
+      " 'description': None,\n",
+      " 'target': {'schema_version': '1.0',\n",
+      "            'id': 'eval-target-GDJ771rB6W3xaowbpE4Bsh',\n",
+      "            'description': None,\n",
+      "            'type_prefix': 'eval-target',\n",
+      "            'namespace': 'nemo-eval-tutorial',\n",
+      "            'project': None,\n",
+      "            'created_at': '2025-07-08T19:18:37.063092',\n",
+      "            'updated_at': '2025-07-08T19:18:37.063095',\n",
+      "            'custom_fields': {},\n",
+      "            'ownership': None,\n",
+      "            'name': 'agent-goal-target',\n",
+      "            'type': 'cached_outputs',\n",
+      "            'cached_outputs': {'files_url': 'hf://datasets/nemo-eval-tutorial/agent_eval/agent_goal_data.jsonl'},\n",
+      "            'model': None,\n",
+      "            'retriever': None,\n",
+      "            'rag': None,\n",
+      "            'rows': None,\n",
+      "            'dataset': None},\n",
+      " 'config': {'schema_version': '1.0',\n",
+      "            'id': 'eval-config-MCeuvELVrXYuZQLZWE2tkV',\n",
+      "            'description': None,\n",
+      "            'type_prefix': 'eval-config',\n",
+      "            'namespace': 'nemo-eval-tutorial',\n",
+      "            'project': None,\n",
+      "            'created_at': '2025-07-08T19:19:13.374904',\n",
+      "            'updated_at': '2025-07-08T19:19:13.374906',\n",
+      "            'custom_fields': {},\n",
+      "            'ownership': None,\n",
+      "            'name': 'agentic-goal-accuracy',\n",
+      "            'type': 'agentic',\n",
+      "            'params': None,\n",
+      "            'tasks': {'goal-accuracy': {'type': 'agent_goal_accuracy_with_reference',\n",
+      "                                        'params': {'judge': {'model': {'url': 'https://integrate.api.nvidia.com/v1',\n",
+      "                                                                       'model_id': 'meta/llama-3.3-70b-instruct',\n",
+      "                                                                       'api_key': '******'},\n",
+      "                                                             'inference_params': {'max_new_tokens': 4024,\n",
+      "                                                                                  'max_retries': 10,\n",
+      "                                                                                  'request_timeout': 10,\n",
+      "                                                                                  'temperature': 0.1}}},\n",
+      "                                        'metrics': None,\n",
+      "                                        'dataset': None}},\n",
+      "            'groups': None},\n",
+      " 'result': None,\n",
+      " 'output_files_url': None,\n",
+      " 'status_details': {'message': None, 'task_status': {}, 'progress': None},\n",
+      " 'status': 'created',\n",
+      " 'project': None,\n",
+      " 'custom_fields': {},\n",
+      " 'ownership': None}\n"
+     ]
+    }
+   ],
+   "source": [
+    "payload = {\n",
+    "    \"namespace\": NMS_NAMESPACE,\n",
+    "    \"target\": f\"{NMS_NAMESPACE}/agent-goal-target\",\n",
+    "    \"config\": f\"{NMS_NAMESPACE}/agentic-goal-accuracy\"\n",
+    "}\n",
+    "\n",
+    "resp = requests.post(job_url, json=payload, headers=headers)\n",
+    "agent_goal_eval_job_id = resp.json()[\"id\"]\n",
+    "pp(resp.json())"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 24,
+   "id": "48683921-cb41-4668-bba5-485f2df43831",
+   "metadata": {
+    "collapsed": true,
+    "jupyter": {
+     "outputs_hidden": true
+    }
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'created_at': '2025-07-08T19:27:35.584648',\n",
+      " 'updated_at': '2025-07-08T19:27:35.584650',\n",
+      " 'id': 'eval-N9HsrJbRXjWKefR6ZSi3oo',\n",
+      " 'namespace': 'nemo-eval-tutorial',\n",
+      " 'description': None,\n",
+      " 'target': {'schema_version': '1.0',\n",
+      "            'id': 'eval-target-HBLMJtD4MhBzV6S3AGor2g',\n",
+      "            'description': None,\n",
+      "            'type_prefix': 'eval-target',\n",
+      "            'namespace': 'nemo-eval-tutorial',\n",
+      "            'project': None,\n",
+      "            'created_at': '2025-07-08T19:18:38.848137',\n",
+      "            'updated_at': '2025-07-08T19:18:38.848149',\n",
+      "            'custom_fields': {},\n",
+      "            'ownership': None,\n",
+      "            'name': 'agent-tool-call-target',\n",
+      "            'type': 'cached_outputs',\n",
+      "            'cached_outputs': {'files_url': 'hf://datasets/nemo-eval-tutorial/agent_eval/agent_tool_call_data.jsonl'},\n",
+      "            'model': None,\n",
+      "            'retriever': None,\n",
+      "            'rag': None,\n",
+      "            'rows': None,\n",
+      "            'dataset': None},\n",
+      " 'config': {'schema_version': '1.0',\n",
+      "            'id': 'eval-config-Jr4uE5uUonn3YyfwUHKKSa',\n",
+      "            'description': None,\n",
+      "            'type_prefix': 'eval-config',\n",
+      "            'namespace': 'nemo-eval-tutorial',\n",
+      "            'project': None,\n",
+      "            'created_at': '2025-07-08T19:19:30.170735',\n",
+      "            'updated_at': '2025-07-08T19:19:30.170738',\n",
+      "            'custom_fields': {},\n",
+      "            'ownership': None,\n",
+      "            'name': 'agentic-tool-call-accuracy',\n",
+      "            'type': 'agentic',\n",
+      "            'params': None,\n",
+      "            'tasks': {'tool-call-accuracy': {'type': 'tool_call_accuracy',\n",
+      "                                             'params': None,\n",
+      "                                             'metrics': None,\n",
+      "                                             'dataset': None}},\n",
+      "            'groups': None},\n",
+      " 'result': None,\n",
+      " 'output_files_url': None,\n",
+      " 'status_details': {'message': None, 'task_status': {}, 'progress': None},\n",
+      " 'status': 'created',\n",
+      " 'project': None,\n",
+      " 'custom_fields': {},\n",
+      " 'ownership': None}\n"
+     ]
+    }
+   ],
+   "source": [
+    "payload = {\n",
+    "    \"namespace\": NMS_NAMESPACE,\n",
+    "    \"target\": f\"{NMS_NAMESPACE}/agent-tool-call-target\",\n",
+    "    \"config\": f\"{NMS_NAMESPACE}/agentic-tool-call-accuracy\"\n",
+    "}\n",
+    "\n",
+    "resp = requests.post(job_url, json=payload, headers=headers)\n",
+    "agent_tool_call_eval_job_id = resp.json()[\"id\"]\n",
+    "pp(resp.json())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6844ba1d-a491-43c4-8b64-947eb88f72f3",
+   "metadata": {},
+   "source": [
+    "### 2.5 Monitor Job Status and Get Evaluation Results\n",
+    "\n",
+    "We can monitor the job status and get the evaluation results by sending requests with the job ID to the `/status` and `/results` API endpoints, respectively."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 33,
+   "id": "192c9156-8df3-4611-9687-7356937fe389",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'message': None, 'task_status': {}, 'progress': 100.0}\n"
+     ]
+    }
+   ],
+   "source": [
+    "resp = requests.get(f\"{NEMO_URL}/v1/evaluation/jobs/{agent_goal_eval_job_id}/status\")\n",
+    "pp(resp.json())"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 34,
+   "id": "20af57a2-7fb6-49db-a3a5-a8fecf75ce37",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'goal-accuracy': {'metrics': {'agent_goal_accuracy': {'scores': {'agent_goal_accuracy': {'value': 1.0}}}}}}\n"
+     ]
+    }
+   ],
+   "source": [
+    "resp = requests.get(f\"{NEMO_URL}/v1/evaluation/jobs/{agent_goal_eval_job_id}/results\")\n",
+    "pp(resp.json()['tasks'])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 25,
+   "id": "a942d665-4f3e-4b10-8efe-e4bf02f81b67",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'message': None, 'task_status': {}, 'progress': 100.0}\n"
+     ]
+    }
+   ],
+   "source": [
+    "resp = requests.get(f\"{NEMO_URL}/v1/evaluation/jobs/{agent_tool_call_eval_job_id}/status\")\n",
+    "pp(resp.json())"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 27,
+   "id": "366eba53-1b82-4d5e-85a2-6fb2c923f113",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'tool-call-accuracy': {'metrics': {'tool_call_accuracy': {'scores': {'tool_call_accuracy': {'value': 1.0}}}}}}\n"
+     ]
+    }
+   ],
+   "source": [
+    "resp = requests.get(f\"{NEMO_URL}/v1/evaluation/jobs/{agent_tool_call_eval_job_id}/results\")\n",
+    "pp(resp.json()['tasks'])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "87aaef5c-5d97-4107-8339-833faaeaaeef",
+   "metadata": {},
+   "source": [
+    "**Download Evaluation Results**: the following request downloads a directory that contains the configuration files, logs, and evaluation results for a specific evaluation job."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b5e9abd9-104c-4a8b-ad91-982d0070ec17",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!curl -X \"GET\" \"{NEMO_URL}/v1/evaluation/jobs/{agent_tool_call_eval_job_id}/download-results\" \\\n",
+    "-H 'accept: application/json' \\\n",
+    "-o result.zip"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cb03af9c-d330-4579-9b71-2883c32ad80a",
+   "metadata": {},
+   "source": [
+    "## 3. LLM Evaluation on Academic Benchmarks"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0b22d8f7-acb4-4782-8b6b-794a808da8a5",
+   "metadata": {},
+   "source": [
+    "**Create Evaluation Config**"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 36,
+   "id": "86a9e0ce-1a6c-4964-8425-ef0427a7ecc2",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'created_at': '2025-07-08T19:39:55.734565',\n",
+      " 'updated_at': '2025-07-08T19:39:55.734568',\n",
+      " 'name': 'gsm8k-chat-config',\n",
+      " 'namespace': 'nemo-eval-tutorial',\n",
+      " 'type': 'gsm8k',\n",
+      " 'params': {'max_tokens': 256,\n",
+      "            'temperature': 1e-05,\n",
+      "            'top_p': 1e-05,\n",
+      "            'stop': ['<|eot|>'],\n",
+      "            'extra': {'num_fewshot': 8,\n",
+      "                      'batch_size': 16,\n",
+      "                      'bootstrap_iters': 100000,\n",
+      "                      'dataset_seed': 42,\n",
+      "                      'use_greedy': True,\n",
+      "                      'top_k': 1,\n",
+      "                      'hf_token': '******',\n",
+      "                      'tokenizer_backend': 'hf',\n",
+      "                      'tokenizer': 'meta-llama/llama-3.1-8B-Instruct',\n",
+      "                      'apply_chat_template': True,\n",
+      "                      'fewshot_as_multiturn': True}},\n",
+      " 'id': 'eval-config-6pTREGQfYHFQzq7oWHATgt',\n",
+      " 'custom_fields': {}}\n"
+     ]
+    }
+   ],
+   "source": [
+    "payload = {\n",
+    "    \"type\": \"gsm8k\",\n",
+    "    \"name\": \"gsm8k-chat-config\",\n",
+    "    \"namespace\": NMS_NAMESPACE,\n",
+    "    \"params\": {\n",
+    "        \"temperature\": 0.00001,\n",
+    "        \"top_p\": 0.00001,\n",
+    "        \"max_tokens\": 256,\n",
+    "        \"stop\": [\"<|eot|>\"],\n",
+    "        \"extra\": {\n",
+    "            \"num_fewshot\": 8,\n",
+    "            \"batch_size\": 16,\n",
+    "            \"bootstrap_iters\": 100000,\n",
+    "            \"dataset_seed\": 42,\n",
+    "            \"use_greedy\": True,\n",
+    "            \"top_k\": 1,\n",
+    "            \"hf_token\": os.environ['HF_Token'],\n",
+    "            \"tokenizer_backend\": \"hf\",\n",
+    "            \"tokenizer\": \"meta-llama/llama-3.1-8B-Instruct\",\n",
+    "            \"apply_chat_template\": True,\n",
+    "            \"fewshot_as_multiturn\": True\n",
+    "        }\n",
+    "    }\n",
+    "}\n",
+    "headers = {\n",
+    "    \"accept\": \"application/json\",\n",
+    "    \"Content-Type\": \"application/json\"\n",
+    "}\n",
+    "\n",
+    "resp = requests.post(config_url, json=payload, headers=headers)\n",
+    "pp(resp.json())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f1c1856a-b816-49f4-900a-7de0b38506e6",
+   "metadata": {},
+   "source": [
+    "**Create Evaluation Target**"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 37,
+   "id": "3bd2ace9-dbda-4120-bba9-644a7651b75d",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'created_at': '2025-07-08T19:40:13.061228',\n",
+      " 'updated_at': '2025-07-08T19:40:13.061229',\n",
+      " 'name': 'llama-chat-target',\n",
+      " 'namespace': 'nemo-eval-tutorial',\n",
+      " 'type': 'model',\n",
+      " 'model': {'schema_version': '1.0',\n",
+      "           'id': 'model-EXZQbF8ZXbjoogTvF9UFKr',\n",
+      "           'type_prefix': 'model',\n",
+      "           'namespace': 'default',\n",
+      "           'created_at': '2025-07-08T19:40:13.061012',\n",
+      "           'updated_at': '2025-07-08T19:40:13.061015',\n",
+      "           'custom_fields': {},\n",
+      "           'name': 'model-EXZQbF8ZXbjoogTvF9UFKr',\n",
+      "           'version_id': 'main',\n",
+      "           'version_tags': [],\n",
+      "           'api_endpoint': {'url': 'http://nim.test/v1/chat/completions',\n",
+      "                            'model_id': 'meta/llama-3.1-8b-instruct',\n",
+      "                            'format': 'openai'}},\n",
+      " 'id': 'eval-target-GaJYgnmDA4Ta2TN23sV9Wo',\n",
+      " 'custom_fields': {}}\n"
+     ]
+    }
+   ],
+   "source": [
+    "payload = {\n",
+    "    \"type\": \"model\",\n",
+    "    \"name\": \"llama-chat-target\",\n",
+    "    \"namespace\": NMS_NAMESPACE,\n",
+    "    \"model\": {\n",
+    "        \"api_endpoint\": {\n",
+    "            \"url\": llm_chat_completion_url,\n",
+    "            \"model_id\": \"meta/llama-3.1-8b-instruct\",\n",
+    "            \"format\": \"openai\"\n",
+    "        }\n",
+    "    }\n",
+    "}\n",
+    "\n",
+    "resp = requests.post(target_url, json=payload, headers=headers)\n",
+    "pp(resp.json())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4f8b84ac-ff4e-4237-aace-3e6756de4b9a",
+   "metadata": {},
+   "source": [
+    "**Submit Evaluation Job**"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 38,
+   "id": "2db8db92-4040-455d-abb1-a6be1d6d9ec3",
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'created_at': '2025-07-08T19:40:19.984771',\n",
+      " 'updated_at': '2025-07-08T19:40:19.984774',\n",
+      " 'id': 'eval-WF5xprkaQpjTNZLpsNb1Y1',\n",
+      " 'namespace': 'nemo-eval-tutorial',\n",
+      " 'description': None,\n",
+      " 'target': {'schema_version': '1.0',\n",
+      "            'id': 'eval-target-GaJYgnmDA4Ta2TN23sV9Wo',\n",
+      "            'description': None,\n",
+      "            'type_prefix': 'eval-target',\n",
+      "            'namespace': 'nemo-eval-tutorial',\n",
+      "            'project': None,\n",
+      "            'created_at': '2025-07-08T19:40:13.061228',\n",
+      "            'updated_at': '2025-07-08T19:40:13.061229',\n",
+      "            'custom_fields': {},\n",
+      "            'ownership': None,\n",
+      "            'name': 'llama-chat-target',\n",
+      "            'type': 'model',\n",
+      "            'cached_outputs': None,\n",
+      "            'model': {'schema_version': '1.0',\n",
+      "                      'id': 'model-EXZQbF8ZXbjoogTvF9UFKr',\n",
+      "                      'description': None,\n",
+      "                      'type_prefix': 'model',\n",
+      "                      'namespace': 'default',\n",
+      "                      'project': None,\n",
+      "                      'created_at': '2025-07-08T19:40:13.061012',\n",
+      "                      'updated_at': '2025-07-08T19:40:13.061015',\n",
+      "                      'custom_fields': {},\n",
+      "                      'ownership': None,\n",
+      "                      'name': 'model-EXZQbF8ZXbjoogTvF9UFKr',\n",
+      "                      'version_id': 'main',\n",
+      "                      'version_tags': [],\n",
+      "                      'spec': None,\n",
+      "                      'artifact': None,\n",
+      "                      'base_model': None,\n",
+      "                      'api_endpoint': {'url': 'http://nim.test/v1/chat/completions',\n",
+      "                                       'model_id': 'meta/llama-3.1-8b-instruct',\n",
+      "                                       'api_key': None,\n",
+      "                                       'format': 'openai'},\n",
+      "                      'peft': None,\n",
+      "                      'prompt': None,\n",
+      "                      'guardrails': None},\n",
+      "            'retriever': None,\n",
+      "            'rag': None,\n",
+      "            'rows': None,\n",
+      "            'dataset': None},\n",
+      " 'config': {'schema_version': '1.0',\n",
+      "            'id': 'eval-config-6pTREGQfYHFQzq7oWHATgt',\n",
+      "            'description': None,\n",
+      "            'type_prefix': 'eval-config',\n",
+      "            'namespace': 'nemo-eval-tutorial',\n",
+      "            'project': None,\n",
+      "            'created_at': '2025-07-08T19:39:55.734565',\n",
+      "            'updated_at': '2025-07-08T19:39:55.734568',\n",
+      "            'custom_fields': {},\n",
+      "            'ownership': None,\n",
+      "            'name': 'gsm8k-chat-config',\n",
+      "            'type': 'gsm8k',\n",
+      "            'params': {'parallelism': None,\n",
+      "                       'request_timeout': None,\n",
+      "                       'max_retries': None,\n",
+      "                       'limit_samples': None,\n",
+      "                       'max_tokens': 256,\n",
+      "                       'temperature': 1e-05,\n",
+      "                       'top_p': 1e-05,\n",
+      "                       'stop': ['<|eot|>'],\n",
+      "                       'extra': {'num_fewshot': 8,\n",
+      "                                 'batch_size': 16,\n",
+      "                                 'bootstrap_iters': 100000,\n",
+      "                                 'dataset_seed': 42,\n",
+      "                                 'use_greedy': True,\n",
+      "                                 'top_k': 1,\n",
+      "                                 'hf_token': '******',\n",
+      "                                 'tokenizer_backend': 'hf',\n",
+      "                                 'tokenizer': 'meta-llama/llama-3.1-8B-Instruct',\n",
+      "                                 'apply_chat_template': True,\n",
+      "                                 'fewshot_as_multiturn': True}},\n",
+      "            'tasks': None,\n",
+      "            'groups': None},\n",
+      " 'result': None,\n",
+      " 'output_files_url': None,\n",
+      " 'status_details': {'message': None, 'task_status': {}, 'progress': None},\n",
+      " 'status': 'created',\n",
+      " 'project': None,\n",
+      " 'custom_fields': {},\n",
+      " 'ownership': None}\n"
+     ]
+    }
+   ],
+   "source": [
+    "payload = {\n",
+    "    \"namespace\": NMS_NAMESPACE,\n",
+    "    \"target\": f\"{NMS_NAMESPACE}/llama-chat-target\",\n",
+    "    \"config\": f\"{NMS_NAMESPACE}/gsm8k-chat-config\"\n",
+    "}\n",
+    "\n",
+    "resp = requests.post(job_url, json=payload, headers=headers)\n",
+    "gsm8k_eval_job_id = resp.json()[\"id\"]\n",
+    "pp(resp.json())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6b27d234-77dd-4505-bae7-d4ed2e2a5348",
+   "metadata": {},
+   "source": [
+    "We can check the status of the job using the status API. **Note that the status API reports regular progress updates only for custom evaluations. For other evaluation types, such as this benchmark, the job is still running as long as its status says running, even if no progress is reported.**"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 41,
+   "id": "29a416f7-a18a-4c40-b6ea-be351d483f07",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'message': 'Job completed successfully', 'task_status': {}, 'progress': 100.0}\n"
+     ]
+    }
+   ],
+   "source": [
+    "resp = requests.get(f\"{NEMO_URL}/v1/evaluation/jobs/{gsm8k_eval_job_id}/status\")\n",
+    "pp(resp.json())"
+   ]
+  },
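+  {
+   "cell_type": "markdown",
+   "id": "sketch-poll-job-status-md",
+   "metadata": {},
+   "source": [
+    "Rather than re-running the status cell by hand, we can poll the endpoint in a loop. The cell below is a minimal sketch, not part of the Evaluator API: `wait_for_job` and its parameters are hypothetical names, and it assumes that `progress` reaching `100.0` signals completion, as in the outputs above. A failed job may never report that value, so the helper gives up after a timeout."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "sketch-poll-job-status-code",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import time\n",
+    "\n",
+    "import requests\n",
+    "\n",
+    "\n",
+    "def wait_for_job(base_url, job_id, poll_seconds=30, timeout_seconds=3600):\n",
+    "    \"\"\"Poll the status endpoint until progress reaches 100 or the timeout expires.\"\"\"\n",
+    "    status_url = f\"{base_url}/v1/evaluation/jobs/{job_id}/status\"\n",
+    "    deadline = time.monotonic() + timeout_seconds\n",
+    "    while True:\n",
+    "        resp = requests.get(status_url)\n",
+    "        resp.raise_for_status()\n",
+    "        details = resp.json()\n",
+    "        # Assumption: progress >= 100 means the job finished; it is None right after creation.\n",
+    "        if (details.get(\"progress\") or 0) >= 100:\n",
+    "            return details\n",
+    "        if time.monotonic() >= deadline:\n",
+    "            raise TimeoutError(f\"Job {job_id} did not finish within {timeout_seconds} seconds\")\n",
+    "        time.sleep(poll_seconds)\n",
+    "\n",
+    "\n",
+    "# Hypothetical usage with the job submitted above:\n",
+    "# pp(wait_for_job(NEMO_URL, gsm8k_eval_job_id))"
+   ]
+  },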
+  {
+   "cell_type": "markdown",
+   "id": "2cfcc3b0-8d3c-4498-9384-4b1213977a7e",
+   "metadata": {},
+   "source": [
+    "Once the job is completed, we can check the evaluation results using the results endpoint."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 42,
+   "id": "49938fe8-bb24-4498-a152-84ccc944bcd2",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'metrics': {'exact_match': {'scores': {'gsm8k-metric_ranking-1': {'value': 0.7664897649734648},\n",
+      "                                        'gsm8k-metric_ranking-3': {'value': 0.821076573161486}}}}}\n",
+      "{'metrics': {'exact_match_stderr': {'scores': {'gsm8k-metric_ranking-2': {'value': 0.011653286808791036},\n",
+      "                                               'gsm8k-metric_ranking-4': {'value': 0.010557661392901296}}}}}\n"
+     ]
+    }
+   ],
+   "source": [
+    "resp = requests.get(f\"{NEMO_URL}/v1/evaluation/jobs/{gsm8k_eval_job_id}/results\")\n",
+    "pp(resp.json()['tasks']['exact_match'])\n",
+    "pp(resp.json()['tasks']['exact_match_stderr'])"
+   ]
+  },
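+  {
+   "cell_type": "markdown",
+   "id": "sketch-flatten-scores-md",
+   "metadata": {},
+   "source": [
+    "The results are nested as task, then metric, then score. To compare runs more easily, we can flatten that structure into a single dictionary. The cell below is a sketch under the assumption that every task follows the `metrics` / `scores` / `value` layout shown in the outputs above; `flatten_scores` is a hypothetical helper name, and the sample values are rounded copies of the GSM8K output."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "sketch-flatten-scores-code",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def flatten_scores(tasks):\n",
+    "    \"\"\"Map 'task/metric/score' keys to their numeric values.\"\"\"\n",
+    "    flat = {}\n",
+    "    for task_name, task in tasks.items():\n",
+    "        for metric_name, metric in task[\"metrics\"].items():\n",
+    "            for score_name, score in metric[\"scores\"].items():\n",
+    "                flat[f\"{task_name}/{metric_name}/{score_name}\"] = score[\"value\"]\n",
+    "    return flat\n",
+    "\n",
+    "\n",
+    "# Rounded sample in the same shape as resp.json()['tasks'] above.\n",
+    "sample_tasks = {\n",
+    "    \"exact_match\": {\"metrics\": {\"exact_match\": {\"scores\": {\n",
+    "        \"gsm8k-metric_ranking-1\": {\"value\": 0.766},\n",
+    "        \"gsm8k-metric_ranking-3\": {\"value\": 0.821},\n",
+    "    }}}},\n",
+    "}\n",
+    "pp(flatten_scores(sample_tasks))\n",
+    "\n",
+    "# With a live job: pp(flatten_scores(resp.json()['tasks']))"
+   ]
+  },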
+  {
+   "cell_type": "markdown",
+   "id": "90262e05-61a9-4dfa-9dee-5a8d579d531e",
+   "metadata": {},
+   "source": [
+    "## 4. Custom Evaluations"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "325349e3-c240-4c6e-9cf6-913039b92735",
+   "metadata": {},
+   "source": [
+    "### 4.1 Similarity Metrics Evaluation"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f409363c-99ed-4e2b-b5b2-1d50c2790d14",
+   "metadata": {},
+   "source": [
+    "**Upload Custom Data to NeMo Data Store**"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 43,
+   "id": "b4b4bd96-9e35-41b2-8dce-95a88d8eff1a",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "inputs.jsonl: 100%|██████████| 149k/149k [00:00<00:00, 28.9MB/s]\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "CommitInfo(commit_url='', commit_message='Upload similarity_metrics/inputs.jsonl with huggingface_hub', commit_description='', oid='6220905767521b90b36e77360dc7213c3dbbb99e', pr_url=None, repo_url=RepoUrl('', endpoint='https://huggingface.co', repo_type='model', repo_id=''), pr_revision=None, pr_num=None)"
+      ]
+     },
+     "execution_count": 43,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# set up dataset repo\n",
+    "DATASET_NAME = \"similarity_eval\"\n",
+    "HF_API = HfApi(endpoint=f\"{NDS_URL}/v1/hf\", token=\"\")\n",
+    "repo_id = setup_dataset_repo(HF_API, NMS_NAMESPACE, DATASET_NAME, NEMO_URL)\n",
+    "# upload dataset\n",
+    "HF_API.upload_file(path_or_fileobj=os.path.join(\"./eval_dataset/similarity_metrics_data\", \"inputs.jsonl\"),\n",
+    "    path_in_repo=\"similarity_metrics/inputs.jsonl\",\n",
+    "    repo_id=repo_id,\n",
+    "    repo_type='dataset',\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "733118e3-7cdf-4b0c-9d57-51e5c7baf91e",
+   "metadata": {},
+   "source": [
+    "**Create Evaluation Config**"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 44,
+   "id": "7195efda-6785-404a-b910-9d5826176692",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'created_at': '2025-07-08T20:31:57.416755',\n",
+      " 'updated_at': '2025-07-08T20:31:57.416756',\n",
+      " 'name': 'similarity-configuration',\n",
+      " 'namespace': 'nemo-eval-tutorial',\n",
+      " 'type': 'similarity_metrics',\n",
+      " 'params': {'max_tokens': 200, 'temperature': 0.7, 'extra': {'top_k': 20}},\n",
+      " 'tasks': {'my-similarity-metrics-task': {'type': 'default',\n",
+      "                                          'metrics': {'accuracy': {'type': 'accuracy'},\n",
+      "                                                      'bleu': {'type': 'bleu'},\n",
+      "                                                      'rouge': {'type': 'rouge'},\n",
+      "                                                      'em': {'type': 'em'},\n",
+      "                                                      'f1': {'type': 'f1'}},\n",
+      "                                          'dataset': {'schema_version': '1.0',\n",
+      "                                                      'id': 'dataset-DALY8efJ78UgkXHcrfACGZ',\n",
+      "                                                      'namespace': 'default',\n",
+      "                                                      'created_at': '2025-07-08T20:31:57.416530',\n",
+      "                                                      'updated_at': '2025-07-08T20:31:57.416532',\n",
+      "                                                      'custom_fields': {},\n",
+      "                                                      'name': 'dataset-DALY8efJ78UgkXHcrfACGZ',\n",
+      "                                                      'version_id': 'main',\n",
+      "                                                      'version_tags': [],\n",
+      "                                                      'files_url': 'hf://datasets/nemo-eval-tutorial/similarity_eval/similarity_metrics/inputs.jsonl'}}},\n",
+      " 'id': 'eval-config-XQpRHrKVucSQzLDXFdZqjq',\n",
+      " 'custom_fields': {}}\n"
+     ]
+    }
+   ],
+   "source": [
+    "payload = {\n",
+    "    \"type\": \"similarity_metrics\",\n",
+    "    \"name\": \"similarity-configuration\",\n",
+    "    \"namespace\": NMS_NAMESPACE,\n",
+    "    \"params\": {\n",
+    "        \"max_tokens\": 200,\n",
+    "        \"temperature\": 0.7,\n",
+    "        \"extra\": {\n",
+    "            \"top_k\": 20\n",
+    "        }\n",
+    "    },\n",
+    "    \"tasks\": {\n",
+    "        \"my-similarity-metrics-task\": {\n",
+    "            \"type\": \"default\",\n",
+    "            \"dataset\": {\n",
+    "                \"files_url\": f\"hf://datasets/{NMS_NAMESPACE}/{DATASET_NAME}/similarity_metrics/inputs.jsonl\",\n",
+    "            },\n",
+    "            \"metrics\": {\n",
+    "                \"accuracy\": {\"type\": \"accuracy\"},\n",
+    "                \"bleu\": {\"type\": \"bleu\"},\n",
+    "                \"rouge\": {\"type\": \"rouge\"},\n",
+    "                \"em\": {\"type\": \"em\"},\n",
+    "                \"f1\": {\"type\": \"f1\"}\n",
+    "            }\n",
+    "        }\n",
+    "    }\n",
+    "}\n",
+    "\n",
+    "resp = requests.post(config_url, json=payload, headers=headers)\n",
+    "pp(resp.json())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c1571c3e-fbe5-4374-accf-7bbb1c36b63d",
+   "metadata": {},
+   "source": [
+    "**Launch Evaluation Job**"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 45,
+   "id": "261f2366-8e90-48d4-90ff-08275b5f0ac4",
+   "metadata": {
+    "collapsed": true,
+    "jupyter": {
+     "outputs_hidden": true
+    }
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'created_at': '2025-07-08T20:32:03.445393',\n",
+      " 'updated_at': '2025-07-08T20:32:03.445396',\n",
+      " 'id': 'eval-TVYBnUrD4XhjDV4UCdXJ6y',\n",
+      " 'namespace': 'nemo-eval-tutorial',\n",
+      " 'description': None,\n",
+      " 'target': {'schema_version': '1.0',\n",
+      "            'id': 'eval-target-GaJYgnmDA4Ta2TN23sV9Wo',\n",
+      "            'description': None,\n",
+      "            'type_prefix': 'eval-target',\n",
+      "            'namespace': 'nemo-eval-tutorial',\n",
+      "            'project': None,\n",
+      "            'created_at': '2025-07-08T19:40:13.061228',\n",
+      "            'updated_at': '2025-07-08T19:40:13.061229',\n",
+      "            'custom_fields': {},\n",
+      "            'ownership': None,\n",
+      "            'name': 'llama-chat-target',\n",
+      "            'type': 'model',\n",
+      "            'cached_outputs': None,\n",
+      "            'model': {'schema_version': '1.0',\n",
+      "                      'id': 'model-EXZQbF8ZXbjoogTvF9UFKr',\n",
+      "                      'description': None,\n",
+      "                      'type_prefix': 'model',\n",
+      "                      'namespace': 'default',\n",
+      "                      'project': None,\n",
+      "                      'created_at': '2025-07-08T19:40:13.061012',\n",
+      "                      'updated_at': '2025-07-08T19:40:13.061015',\n",
+      "                      'custom_fields': {},\n",
+      "                      'ownership': None,\n",
+      "                      'name': 'model-EXZQbF8ZXbjoogTvF9UFKr',\n",
+      "                      'version_id': 'main',\n",
+      "                      'version_tags': [],\n",
+      "                      'spec': None,\n",
+      "                      'artifact': None,\n",
+      "                      'base_model': None,\n",
+      "                      'api_endpoint': {'url': 'http://nim.test/v1/chat/completions',\n",
+      "                                       'model_id': 'meta/llama-3.1-8b-instruct',\n",
+      "                                       'api_key': None,\n",
+      "                                       'format': 'openai'},\n",
+      "                      'peft': None,\n",
+      "                      'prompt': None,\n",
+      "                      'guardrails': None},\n",
+      "            'retriever': None,\n",
+      "            'rag': None,\n",
+      "            'rows': None,\n",
+      "            'dataset': None},\n",
+      " 'config': {'schema_version': '1.0',\n",
+      "            'id': 'eval-config-XQpRHrKVucSQzLDXFdZqjq',\n",
+      "            'description': None,\n",
+      "            'type_prefix': 'eval-config',\n",
+      "            'namespace': 'nemo-eval-tutorial',\n",
+      "            'project': None,\n",
+      "            'created_at': '2025-07-08T20:31:57.416755',\n",
+      "            'updated_at': '2025-07-08T20:31:57.416756',\n",
+      "            'custom_fields': {},\n",
+      "            'ownership': None,\n",
+      "            'name': 'similarity-configuration',\n",
+      "            'type': 'similarity_metrics',\n",
+      "            'params': {'parallelism': None,\n",
+      "                       'request_timeout': None,\n",
+      "                       'max_retries': None,\n",
+      "                       'limit_samples': None,\n",
+      "                       'max_tokens': 200,\n",
+      "                       'temperature': 0.7,\n",
+      "                       'top_p': None,\n",
+      "                       'stop': None,\n",
+      "                       'extra': {'top_k': 20}},\n",
+      "            'tasks': {'my-similarity-metrics-task': {'type': 'default',\n",
+      "                                                     'params': None,\n",
+      "                                                     'metrics': {'accuracy': {'type': 'accuracy',\n",
+      "                                                                              'params': None},\n",
+      "                                                                 'bleu': {'type': 'bleu',\n",
+      "                                                                          'params': None},\n",
+      "                                                                 'rouge': {'type': 'rouge',\n",
+      "                                                                           'params': None},\n",
+      "                                                                 'em': {'type': 'em',\n",
+      "                                                                        'params': None},\n",
+      "                                                                 'f1': {'type': 'f1',\n",
+      "                                                                        'params': None}},\n",
+      "                                                     'dataset': {'schema_version': '1.0',\n",
+      "                                                                 'id': 'dataset-DALY8efJ78UgkXHcrfACGZ',\n",
+      "                                                                 'description': None,\n",
+      "                                                                 'type_prefix': None,\n",
+      "                                                                 'namespace': 'default',\n",
+      "                                                                 'project': None,\n",
+      "                                                                 'created_at': '2025-07-08T20:31:57.416530',\n",
+      "                                                                 'updated_at': '2025-07-08T20:31:57.416532',\n",
+      "                                                                 'custom_fields': {},\n",
+      "                                                                 'ownership': None,\n",
+      "                                                                 'name': 'dataset-DALY8efJ78UgkXHcrfACGZ',\n",
+      "                                                                 'version_id': 'main',\n",
+      "                                                                 'version_tags': [],\n",
+      "                                                                 'format': None,\n",
+      "                                                                 'files_url': 'hf://datasets/nemo-eval-tutorial/similarity_eval/similarity_metrics/inputs.jsonl',\n",
+      "                                                                 'hf_endpoint': None,\n",
+      "                                                                 'split': None,\n",
+      "                                                                 'limit': None}}},\n",
+      "            'groups': None},\n",
+      " 'result': None,\n",
+      " 'output_files_url': None,\n",
+      " 'status_details': {'message': None, 'task_status': {}, 'progress': None},\n",
+      " 'status': 'created',\n",
+      " 'project': None,\n",
+      " 'custom_fields': {},\n",
+      " 'ownership': None}\n"
+     ]
+    }
+   ],
+   "source": [
+    "payload = {\n",
+    "    \"namespace\": NMS_NAMESPACE,\n",
+    "    \"target\": f\"{NMS_NAMESPACE}/llama-chat-target\",\n",
+    "    \"config\": f\"{NMS_NAMESPACE}/similarity-configuration\"\n",
+    "}\n",
+    "headers = {\n",
+    "    \"accept\": \"application/json\",\n",
+    "    \"Content-Type\": \"application/json\"\n",
+    "}\n",
+    "\n",
+    "resp = requests.post(job_url, json=payload, headers=headers)\n",
+    "similarity_eval_job_id = resp.json()[\"id\"]\n",
+    "pp(resp.json())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "803ab48f-cba7-4285-9267-ab513b2862e2",
+   "metadata": {},
+   "source": [
+    "**Monitor Job Status and Results**"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 50,
+   "id": "78c6b851-4c6f-4a11-9c21-8c4a26da0950",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'message': 'Job completed successfully',\n",
+      " 'task_status': {'my-similarity-metrics-task': 'completed'},\n",
+      " 'progress': 100.0}\n"
+     ]
+    }
+   ],
+   "source": [
+    "resp = requests.get(f\"{NEMO_URL}/v1/evaluation/jobs/{similarity_eval_job_id}/status\")\n",
+    "pp(resp.json())"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 51,
+   "id": "83676ffb-2ef1-493a-a2a1-3dc1b41a72d0",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'metrics': {'accuracy': {'scores': {'accuracy': {'value': 0.0}}},\n",
+      "             'bleu': {'scores': {'bleu_score': {'value': 0.015511131876432806}}},\n",
+      "             'em': {'scores': {'em': {'value': 0.0}}},\n",
+      "             'f1': {'scores': {'f1': {'value': 0.10128911130270025}}},\n",
+      "             'rouge': {'scores': {'rouge_1_score': {'value': 0.1166731565559731},\n",
+      "                                  'rouge_2_score': {'value': 0.03311328362331498},\n",
+      "                                  'rouge_3_score': {'value': 0.01193015710048473},\n",
+      "                                  'rouge_L_score': {'value': 0.09285834070191781}}}}}\n"
+     ]
+    }
+   ],
+   "source": [
+    "resp = requests.get(f\"{NEMO_URL}/v1/evaluation/jobs/{similarity_eval_job_id}/results\")\n",
+    "pp(resp.json()['tasks']['my-similarity-metrics-task'])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b38981ff-5cae-4510-af60-f3458f2fc933",
+   "metadata": {},
+   "source": [
+    "### 4.2 LLM-as-Judge Evaluation"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6dc49476-b5a9-48a7-8453-c3f2cf0c4942",
+   "metadata": {},
+   "source": [
+    "**Upload Custom Dataset to Data Store**"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 52,
+   "id": "39f1d4b2-e93e-433c-95fa-3648c3139229",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "math_dataset.csv: 100%|██████████| 449/449 [00:00<00:00, 108kB/s]\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "CommitInfo(commit_url='', commit_message='Upload llm_as_judge/math_dataset.csv with huggingface_hub', commit_description='', oid='32e22ec06ea372c6cfddf52cb733e86fd1b135ed', pr_url=None, repo_url=RepoUrl('', endpoint='https://huggingface.co', repo_type='model', repo_id=''), pr_revision=None, pr_num=None)"
+      ]
+     },
+     "execution_count": 52,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "DATASET_NAME = \"llm_as_judge_data\"\n",
+    "repo_id = setup_dataset_repo(HF_API, NMS_NAMESPACE, DATASET_NAME, NEMO_URL)\n",
+    "HF_API.upload_file(\n",
+    "    path_or_fileobj='./eval_dataset/llm_judge_data/math_dataset.csv',\n",
+    "    path_in_repo=\"llm_as_judge/math_dataset.csv\",\n",
+    "    repo_id=repo_id,\n",
+    "    repo_type='dataset',\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2868d978-17a8-4b94-bc46-007613d74ff4",
+   "metadata": {},
+   "source": [
+    "Templates in an evaluation config can reference two objects:\n",
+    "\n",
+    "- **Item**: The current item from the dataset.\n",
+    "- **Sample**: The model's output for that item. `sample.output_text` holds the completion text for completion models and the content of the first message for chat models.\n",
+    "\n",
+    "The properties on the `item` object are derived from the dataset's column names (for CSV files) or keys (for JSON files). The following rules apply to these properties:\n",
+    "\n",
+    "- All non-alphanumeric characters are replaced with underscores.\n",
+    "- Column names are converted to lowercase.\n",
+    "- In case of conflicts, suffixes (`_1`, `_2`, and so on) are appended to the property names.\n",
+    "\n",
+    "**Templates for Chat Models**\n",
+    "\n",
+    "Prompt templates define the messages sent to the model and follow the NIM/OpenAI chat-completion format. Templates use Jinja2 syntax, with variables written in double curly braces, for example `{{item.question}}`."
+   ]
+  },
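+  {
+   "cell_type": "markdown",
+   "id": "7e1c0a52-3b9d-4f6e-8a21-5d4c9e0b1f33",
+   "metadata": {},
+   "source": [
+    "The naming rules for `item` properties described above can be sketched in a few lines of Python. This is only an illustration: `normalize_columns` is a hypothetical helper, not part of NeMo Evaluator, and the exact suffix numbering the service applies on conflicts is an assumption.\n",
+    "\n",
+    "```python\n",
+    "import re\n",
+    "\n",
+    "\n",
+    "def normalize_columns(columns):\n",
+    "    # Hypothetical helper: mimics the documented rules, not the service's code.\n",
+    "    seen = {}\n",
+    "    names = []\n",
+    "    for column in columns:\n",
+    "        # Replace every non-alphanumeric character with an underscore, then lowercase.\n",
+    "        name = re.sub('[^0-9a-zA-Z]', '_', column).lower()\n",
+    "        count = seen.get(name, 0)\n",
+    "        seen[name] = count + 1\n",
+    "        # Assumption: the first occurrence keeps its name, later ones get _1, _2, ...\n",
+    "        names.append(name if count == 0 else f'{name}_{count}')\n",
+    "    return names\n",
+    "\n",
+    "\n",
+    "print(normalize_columns(['Question', 'Reference Answer', 'reference-answer']))\n",
+    "# ['question', 'reference_answer', 'reference_answer_1']\n",
+    "```\n",
+    "\n",
+    "Under these rules, a CSV column named `Reference Answer` is addressed in a template as `{{item.reference_answer}}`."
+   ]
+  },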
+  {
+   "cell_type": "markdown",
+   "id": "d0455449-058b-4c8b-b675-23cf5aed1c4d",
+   "metadata": {},
+   "source": [
+    "**Create Eval Config**"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0724c9e9-1271-4601-811b-e027183ec6ac",
+   "metadata": {},
+   "source": [
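+    "We will use `meta/llama-3.3-70b-instruct`, hosted on build.nvidia.com, as the judge model. The config reads the API key from the `NVIDIA_API_KEY` environment variable, so set it before running the next cell. Each model answer is scored with three metrics: a `string-check` for the expected answer, `bleu` against the reference answer, and an `llm-judge` similarity score."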
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 53,
+   "id": "edd8f1d8-eb6a-483f-b133-3663a1bf8977",
+   "metadata": {
+    "collapsed": true,
+    "jupyter": {
+     "outputs_hidden": true
+    },
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'created_at': '2025-07-08T20:37:58.674883',\n",
+      " 'updated_at': '2025-07-08T20:37:58.674884',\n",
+      " 'name': 'custom_llm_as_judge_config',\n",
+      " 'namespace': 'nemo-eval-tutorial',\n",
+      " 'type': 'custom',\n",
+      " 'tasks': {'qa': {'type': 'completion',\n",
+      "                  'params': {'template': {'messages': [{'role': 'system',\n",
+      "                                                        'content': 'You are a '\n",
+      "                                                                   'helpful, '\n",
+      "                                                                   'respectful '\n",
+      "                                                                   'and honest '\n",
+      "                                                                   'assistant. \\n'\n",
+      "                                                                   'Answers '\n",
+      "                                                                   'the '\n",
+      "                                                                   'following '\n",
+      "                                                                   'question '\n",
+      "                                                                   'as briefly '\n",
+      "                                                                   'as you '\n",
+      "                                                                   'can.\\n'\n",
+      "                                                                   '.'},\n",
+      "                                                       {'role': 'user',\n",
+      "                                                        'content': 'Answer '\n",
+      "                                                                   'very '\n",
+      "                                                                   'briefly '\n",
+      "                                                                   '(no '\n",
+      "                                                                   'explanation) '\n",
+      "                                                                   'this '\n",
+      "                                                                   'question: '\n",
+      "                                                                   '{{item.question}}'}]}},\n",
+      "                  'metrics': {'accuracy': {'type': 'string-check',\n",
+      "                                           'params': {'check': ['{{sample.output_text}}',\n",
+      "                                                                'contains',\n",
+      "                                                                '{{item.answer}}']}},\n",
+      "                              'bleu': {'type': 'bleu',\n",
+      "                                       'params': {'references': ['{{item.reference_answer}}']}},\n",
+      "                              'accuracy-llm-judge': {'type': 'llm-judge',\n",
+      "                                                     'params': {'model': {'api_endpoint': {'url': 'https://integrate.api.nvidia.com/v1/chat/completions',\n",
+      "                                                                                           'model_id': 'meta/llama-3.3-70b-instruct',\n",
+      "                                                                                           'api_key': '******'}},\n",
+      "                                                                'template': {'messages': [{'role': 'system',\n",
+      "                                                                                           'content': 'Your '\n",
+      "                                                                                                      'task '\n",
+      "                                                                                                      'is '\n",
+      "                                                                                                      'to '\n",
+      "                                                                                                      'evaluate '\n",
+      "                                                                                                      'the '\n",
+      "                                                                                                      'semantic '\n",
+      "                                                                                                      'similarity '\n",
+      "                                                                                                      'between '\n",
+      "                                                                                                      'two '\n",
+      "                                                                                                      'responses.'},\n",
+      "                                                                                          {'role': 'user',\n",
+      "                                                                                           'content': 'Respond '\n",
+      "                                                                                                      'in '\n",
+      "                                                                                                      'the '\n",
+      "                                                                                                      'following '\n",
+      "                                                                                                      'format '\n",
+      "                                                                                                      'SIMILARITY: '\n",
+      "                                                                                                      '4. '\n",
+      "                                                                                                      'The '\n",
+      "                                                                                                      'similarity '\n",
+      "                                                                                                      'should '\n",
+      "                                                                                                      'be '\n",
+      "                                                                                                      'a '\n",
+      "                                                                                                      'score '\n",
+      "                                                                                                      'between '\n",
+      "                                                                                                      '0 '\n",
+      "                                                                                                      'and '\n",
+      "                                                                                                      '10.\\n'\n",
+      "                                                                                                      '\\n'\n",
+      "                                                                                                      'RESPONSE '\n",
+      "                                                                                                      '1: '\n",
+      "                                                                                                      '{{item.reference_answer}}\\n'\n",
+      "                                                                                                      '\\n'\n",
+      "                                                                                                      'RESPONSE '\n",
+      "                                                                                                      '2: '\n",
+      "                                                                                                      '{{sample.output_text}}.\\n'\n",
+      "                                                                                                      '\\n'}]},\n",
+      "                                                                'scores': {'similarity': {'type': 'int',\n",
+      "                                                                                          'parser': {'type': 'regex',\n",
+      "                                                                                                     'pattern': 'SIMILARITY: '\n",
+      "                                                                                                                '(\\\\d)'}}}}}},\n",
+      "                  'dataset': {'schema_version': '1.0',\n",
+      "                              'id': 'dataset-4cBEc7vpZh8aBPpuGpNoFD',\n",
+      "                              'namespace': 'default',\n",
+      "                              'created_at': '2025-07-08T20:37:58.674690',\n",
+      "                              'updated_at': '2025-07-08T20:37:58.674693',\n",
+      "                              'custom_fields': {},\n",
+      "                              'name': 'dataset-4cBEc7vpZh8aBPpuGpNoFD',\n",
+      "                              'version_id': 'main',\n",
+      "                              'version_tags': [],\n",
+      "                              'files_url': 'hf://datasets/nemo-eval-tutorial/llm_as_judge_data/llm_as_judge/math_dataset.csv'}}},\n",
+      " 'id': 'eval-config-J9aERJok5jacrTdhYcyM1H',\n",
+      " 'custom_fields': {}}\n"
+     ]
+    }
+   ],
+   "source": [
+    "payload = {\n",
+    "  \"type\": \"custom\",\n",
+    "  \"namespace\": NMS_NAMESPACE,\n",
+    "  \"name\": \"custom_llm_as_judge_config\",\n",
+    "  \"tasks\": {\n",
+    "    \"qa\": {\n",
+    "      \"type\": \"completion\",\n",
+    "      \"params\": {\n",
+    "        \"template\": {\n",
+    "          \"messages\": [{\n",
+    "            \"role\": \"system\",\n",
+    "            \"content\": \"You are a helpful, respectful and honest assistant.\\nAnswer the following question as briefly as you can.\"\n",
+    "            }, \n",
+    "            { \n",
+    "            \"role\": \"user\",\n",
+    "            \"content\": \"Answer very briefly (no explanation) this question: {{item.question}}\"\n",
+    "            }]\n",
+    "        }\n",
+    "      },\n",
+    "      \"metrics\": {\n",
+    "        \"accuracy\": {\n",
+    "          \"type\": \"string-check\",\n",
+    "          \"params\": {\n",
+    "            \"check\": [\n",
+    "              \"{{sample.output_text}}\",\n",
+    "              \"contains\",\n",
+    "              \"{{item.answer}}\"\n",
+    "            ]\n",
+    "          }\n",
+    "        },\n",
+    "        \"bleu\": {\n",
+    "          \"type\": \"bleu\",\n",
+    "          \"params\": {\n",
+    "            \"references\": [\n",
+    "              \"{{item.reference_answer}}\"\n",
+    "            ]\n",
+    "          }\n",
+    "        },\n",
+    "        \"accuracy-llm-judge\": {\n",
+    "          \"type\": \"llm-judge\",\n",
+    "          \"params\": {\n",
+    "            \"model\": {\n",
+    "              \"api_endpoint\": {\n",
+    "                \"url\": \"https://integrate.api.nvidia.com/v1/chat/completions\",\n",
+    "                \"model_id\": \"meta/llama-3.3-70b-instruct\",\n",
+    "                \"api_key\": os.environ[\"NVIDIA_API_KEY\"]\n",
+    "              }\n",
+    "            },\n",
+    "            \"template\": {\n",
+    "              \"messages\": [\n",
+    "                {\n",
+    "                  \"role\": \"system\",\n",
+    "                  \"content\": \"Your task is to evaluate the semantic similarity between two responses.\"\n",
+    "                },\n",
+    "                {\n",
+    "                  \"role\": \"user\",\n",
+    "                  \"content\": (\n",
+    "                    \"Respond in the following format SIMILARITY: 4. \"\n",
+    "                    \"The similarity should be a score between 0 and 10.\\n\\n\"\n",
+    "                    \"RESPONSE 1: {{item.reference_answer}}\\n\\n\"\n",
+    "                    \"RESPONSE 2: {{sample.output_text}}.\\n\\n\"\n",
+    "                  )\n",
+    "                }\n",
+    "              ]\n",
+    "            },\n",
+    "            \"scores\": {\n",
+    "              \"similarity\": {\n",
+    "                \"type\": \"int\",\n",
+    "                \"parser\": {\n",
+    "                  \"type\": \"regex\",\n",
+    "                  \"pattern\": \"SIMILARITY: (\\\\d+)\"\n",
+    "                }\n",
+    "              }\n",
+    "            }\n",
+    "          }\n",
+    "        }\n",
+    "      },\n",
+    "      \"dataset\": {\n",
+    "        \"files_url\": f\"hf://datasets/{NMS_NAMESPACE}/{DATASET_NAME}/llm_as_judge/math_dataset.csv\"\n",
+    "      }\n",
+    "    }\n",
+    "  }\n",
+    "}\n",
+    "headers = {\n",
+    "    \"accept\": \"application/json\",\n",
+    "    \"Content-Type\": \"application/json\"\n",
+    "}\n",
+    "\n",
+    "resp = requests.post(config_url, json=payload, headers=headers)\n",
+    "pp(resp.json())"
+   ]
+  },
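+  {
+   "cell_type": "markdown",
+   "id": "c4b8e2f1-6a3d-4e57-9b10-2f7d8a6c5e94",
+   "metadata": {},
+   "source": [
+    "The `regex` parser in the `llm-judge` metric turns the judge's free-text reply into an integer score by reading the first capture group. The sketch below mimics that step with Python's `re` module so the effect of the pattern can be checked locally. `parse_score` is a hypothetical stand-in, and returning `None` for a reply that does not match is an assumption about behavior, not the service's documented handling. Digit classes are written as `[0-9]` here.\n",
+    "\n",
+    "```python\n",
+    "import re\n",
+    "\n",
+    "\n",
+    "def parse_score(judge_reply, pattern):\n",
+    "    # Hypothetical stand-in for the regex score parser.\n",
+    "    match = re.search(pattern, judge_reply)\n",
+    "    # Assumption: a reply without a match yields no score.\n",
+    "    return int(match.group(1)) if match else None\n",
+    "\n",
+    "\n",
+    "single_digit = 'SIMILARITY: ([0-9])'\n",
+    "multi_digit = 'SIMILARITY: ([0-9]+)'\n",
+    "\n",
+    "print(parse_score('SIMILARITY: 7', single_digit))   # 7\n",
+    "print(parse_score('SIMILARITY: 10', single_digit))  # 1\n",
+    "print(parse_score('SIMILARITY: 10', multi_digit))   # 10\n",
+    "print(parse_score('no score here', multi_digit))    # None\n",
+    "```\n",
+    "\n",
+    "Because the judge is asked for a score between 0 and 10, a capture group that accepts only one digit reads a reply of 10 as 1, while a group that accepts one or more digits keeps the full value."
+   ]
+  },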
+  {
+   "cell_type": "markdown",
+   "id": "f8b7b997-0a5b-489d-94b0-531be7d6270d",
+   "metadata": {},
+   "source": [
+    "**Launch Eval Job**"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 54,
+   "id": "696c6ff9-f725-4990-81c1-775b6b662fa0",
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'created_at': '2025-07-08T20:38:04.820220',\n",
+      " 'updated_at': '2025-07-08T20:38:04.820223',\n",
+      " 'id': 'eval-MoXUyarWP3jBR8h6LqyhWU',\n",
+      " 'namespace': 'nemo-eval-tutorial',\n",
+      " 'description': None,\n",
+      " 'target': {'schema_version': '1.0',\n",
+      "            'id': 'eval-target-GaJYgnmDA4Ta2TN23sV9Wo',\n",
+      "            'description': None,\n",
+      "            'type_prefix': 'eval-target',\n",
+      "            'namespace': 'nemo-eval-tutorial',\n",
+      "            'project': None,\n",
+      "            'created_at': '2025-07-08T19:40:13.061228',\n",
+      "            'updated_at': '2025-07-08T19:40:13.061229',\n",
+      "            'custom_fields': {},\n",
+      "            'ownership': None,\n",
+      "            'name': 'llama-chat-target',\n",
+      "            'type': 'model',\n",
+      "            'cached_outputs': None,\n",
+      "            'model': {'schema_version': '1.0',\n",
+      "                      'id': 'model-EXZQbF8ZXbjoogTvF9UFKr',\n",
+      "                      'description': None,\n",
+      "                      'type_prefix': 'model',\n",
+      "                      'namespace': 'default',\n",
+      "                      'project': None,\n",
+      "                      'created_at': '2025-07-08T19:40:13.061012',\n",
+      "                      'updated_at': '2025-07-08T19:40:13.061015',\n",
+      "                      'custom_fields': {},\n",
+      "                      'ownership': None,\n",
+      "                      'name': 'model-EXZQbF8ZXbjoogTvF9UFKr',\n",
+      "                      'version_id': 'main',\n",
+      "                      'version_tags': [],\n",
+      "                      'spec': None,\n",
+      "                      'artifact': None,\n",
+      "                      'base_model': None,\n",
+      "                      'api_endpoint': {'url': 'http://nim.test/v1/chat/completions',\n",
+      "                                       'model_id': 'meta/llama-3.1-8b-instruct',\n",
+      "                                       'api_key': None,\n",
+      "                                       'format': 'openai'},\n",
+      "                      'peft': None,\n",
+      "                      'prompt': None,\n",
+      "                      'guardrails': None},\n",
+      "            'retriever': None,\n",
+      "            'rag': None,\n",
+      "            'rows': None,\n",
+      "            'dataset': None},\n",
+      " 'config': {'schema_version': '1.0',\n",
+      "            'id': 'eval-config-J9aERJok5jacrTdhYcyM1H',\n",
+      "            'description': None,\n",
+      "            'type_prefix': 'eval-config',\n",
+      "            'namespace': 'nemo-eval-tutorial',\n",
+      "            'project': None,\n",
+      "            'created_at': '2025-07-08T20:37:58.674883',\n",
+      "            'updated_at': '2025-07-08T20:37:58.674884',\n",
+      "            'custom_fields': {},\n",
+      "            'ownership': None,\n",
+      "            'name': 'custom_llm_as_judge_config',\n",
+      "            'type': 'custom',\n",
+      "            'params': None,\n",
+      "            'tasks': {'qa': {'type': 'completion',\n",
+      "                             'params': {'template': {'messages': [{'role': 'system',\n",
+      "                                                                   'content': 'You '\n",
+      "                                                                              'are '\n",
+      "                                                                              'a '\n",
+      "                                                                              'helpful, '\n",
+      "                                                                              'respectful '\n",
+      "                                                                              'and '\n",
+      "                                                                              'honest '\n",
+      "                                                                              'assistant. \\n'\n",
+      "                                                                              'Answers '\n",
+      "                                                                              'the '\n",
+      "                                                                              'following '\n",
+      "                                                                              'question '\n",
+      "                                                                              'as '\n",
+      "                                                                              'briefly '\n",
+      "                                                                              'as '\n",
+      "                                                                              'you '\n",
+      "                                                                              'can.\\n'\n",
+      "                                                                              '.'},\n",
+      "                                                                  {'role': 'user',\n",
+      "                                                                   'content': 'Answer '\n",
+      "                                                                              'very '\n",
+      "                                                                              'briefly '\n",
+      "                                                                              '(no '\n",
+      "                                                                              'explanation) '\n",
+      "                                                                              'this '\n",
+      "                                                                              'question: '\n",
+      "                                                                              '{{item.question}}'}]}},\n",
+      "                             'metrics': {'accuracy': {'type': 'string-check',\n",
+      "                                                      'params': {'check': ['{{sample.output_text}}',\n",
+      "                                                                           'contains',\n",
+      "                                                                           '{{item.answer}}']}},\n",
+      "                                         'bleu': {'type': 'bleu',\n",
+      "                                                  'params': {'references': ['{{item.reference_answer}}']}},\n",
+      "                                         'accuracy-llm-judge': {'type': 'llm-judge',\n",
+      "                                                                'params': {'model': {'api_endpoint': {'url': 'https://integrate.api.nvidia.com/v1/chat/completions',\n",
+      "                                                                                                      'model_id': 'meta/llama-3.3-70b-instruct',\n",
+      "                                                                                                      'api_key': '******'}},\n",
+      "                                                                           'template': {'messages': [{'role': 'system',\n",
+      "                                                                                                      'content': 'Your '\n",
+      "                                                                                                                 'task '\n",
+      "                                                                                                                 'is '\n",
+      "                                                                                                                 'to '\n",
+      "                                                                                                                 'evaluate '\n",
+      "                                                                                                                 'the '\n",
+      "                                                                                                                 'semantic '\n",
+      "                                                                                                                 'similarity '\n",
+      "                                                                                                                 'between '\n",
+      "                                                                                                                 'two '\n",
+      "                                                                                                                 'responses.'},\n",
+      "                                                                                                     {'role': 'user',\n",
+      "                                                                                                      'content': 'Respond '\n",
+      "                                                                                                                 'in '\n",
+      "                                                                                                                 'the '\n",
+      "                                                                                                                 'following '\n",
+      "                                                                                                                 'format '\n",
+      "                                                                                                                 'SIMILARITY: '\n",
+      "                                                                                                                 '4. '\n",
+      "                                                                                                                 'The '\n",
+      "                                                                                                                 'similarity '\n",
+      "                                                                                                                 'should '\n",
+      "                                                                                                                 'be '\n",
+      "                                                                                                                 'a '\n",
+      "                                                                                                                 'score '\n",
+      "                                                                                                                 'between '\n",
+      "                                                                                                                 '0 '\n",
+      "                                                                                                                 'and '\n",
+      "                                                                                                                 '10.\\n'\n",
+      "                                                                                                                 '\\n'\n",
+      "                                                                                                                 'RESPONSE '\n",
+      "                                                                                                                 '1: '\n",
+      "                                                                                                                 '{{item.reference_answer}}\\n'\n",
+      "                                                                                                                 '\\n'\n",
+      "                                                                                                                 'RESPONSE '\n",
+      "                                                                                                                 '2: '\n",
+      "                                                                                                                 '{{sample.output_text}}.\\n'\n",
+      "                                                                                                                 '\\n'}]},\n",
+      "                                                                           'scores': {'similarity': {'type': 'int',\n",
+      "                                                                                                     'parser': {'type': 'regex',\n",
+      "                                                                                                                'pattern': 'SIMILARITY: '\n",
+      "                                                                                                                           '(\\\\d)'}}}}}},\n",
+      "                             'dataset': {'schema_version': '1.0',\n",
+      "                                         'id': 'dataset-4cBEc7vpZh8aBPpuGpNoFD',\n",
+      "                                         'description': None,\n",
+      "                                         'type_prefix': None,\n",
+      "                                         'namespace': 'default',\n",
+      "                                         'project': None,\n",
+      "                                         'created_at': '2025-07-08T20:37:58.674690',\n",
+      "                                         'updated_at': '2025-07-08T20:37:58.674693',\n",
+      "                                         'custom_fields': {},\n",
+      "                                         'ownership': None,\n",
+      "                                         'name': 'dataset-4cBEc7vpZh8aBPpuGpNoFD',\n",
+      "                                         'version_id': 'main',\n",
+      "                                         'version_tags': [],\n",
+      "                                         'format': None,\n",
+      "                                         'files_url': 'hf://datasets/nemo-eval-tutorial/llm_as_judge_data/llm_as_judge/math_dataset.csv',\n",
+      "                                         'hf_endpoint': None,\n",
+      "                                         'split': None,\n",
+      "                                         'limit': None}}},\n",
+      "            'groups': None},\n",
+      " 'result': None,\n",
+      " 'output_files_url': None,\n",
+      " 'status_details': {'message': None, 'task_status': {}, 'progress': None},\n",
+      " 'status': 'created',\n",
+      " 'project': None,\n",
+      " 'custom_fields': {},\n",
+      " 'ownership': None}\n"
+     ]
+    }
+   ],
+   "source": [
+    "payload = {\n",
+    "    \"namespace\": NMS_NAMESPACE,\n",
+    "    \"target\": f\"{NMS_NAMESPACE}/llama-chat-target\",\n",
+    "    \"config\": f\"{NMS_NAMESPACE}/custom_llm_as_judge_config\"\n",
+    "}\n",
+    "headers = {\n",
+    "    \"accept\": \"application/json\",\n",
+    "    \"Content-Type\": \"application/json\"\n",
+    "}\n",
+    "\n",
+    "resp = requests.post(job_url, json=payload, headers=headers)\n",
+    "llm_judge_eval_job_id = resp.json()[\"id\"]\n",
+    "pp(resp.json())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6a53bbd6-0878-4a54-833b-50f928930943",
+   "metadata": {},
+   "source": [
+    "**Monitor Job Status and Results**"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 61,
+   "id": "8c91d6a2-3bf8-46ca-a169-4aad5289bd0a",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'message': 'Job completed successfully.',\n",
+      " 'task_status': {'qa': 'completed'},\n",
+      " 'progress': 100.0}\n"
+     ]
+    }
+   ],
+   "source": [
+    "resp = requests.get(f\"{NEMO_URL}/v1/evaluation/jobs/{llm_judge_eval_job_id}/status\")\n",
+    "pp(resp.json())"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 62,
+   "id": "210dff5c-53e0-4650-96b1-aa0f509cc8da",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'metrics': {'accuracy': {'scores': {'string-check': {'value': 1.0,\n",
+      "                                                      'stats': {'count': 10,\n",
+      "                                                                'sum': 10.0,\n",
+      "                                                                'mean': 1.0}}}},\n",
+      "             'bleu': {'scores': {'sentence': {'value': 9.135501080023044,\n",
+      "                                              'stats': {'count': 10,\n",
+      "                                                        'sum': 91.35501080023045,\n",
+      "                                                        'mean': 9.135501080023044}},\n",
+      "                                 'corpus': {'value': 0.0}}},\n",
+      "             'accuracy-llm-judge': {'scores': {'similarity': {'value': 1.7,\n",
+      "                                                              'stats': {'count': 10,\n",
+      "                                                                        'sum': 17.0,\n",
+      "                                                                        'mean': 1.7}}}}}}\n"
+     ]
+    }
+   ],
+   "source": [
+    "resp = requests.get(f\"{NEMO_URL}/v1/evaluation/jobs/{llm_judge_eval_job_id}/results\")\n",
+    "pp(resp.json()['tasks']['qa'])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c56efd95-df06-4ab1-b5ed-482d91587bca",
+   "metadata": {},
+   "source": [
+    "### 4.3 Tool Calling Evaluation"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "12881c51-76a7-45c3-b634-91c112f480d1",
+   "metadata": {
+    "jupyter": {
+     "source_hidden": true
+    }
+   },
+   "source": [
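+    "A custom tool-calling evaluation requires a dataset in the following format, where each entry provides the input `messages`, the available `tools`, and the ground-truth `tool_calls`:\n",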
+    "\n",
+    "```json\n",
+    "[\n",
+    "    {\n",
+    "        \"messages\": [\n",
+    "            {\n",
+    "                \"role\": \"user\",\n",
+    "                \"content\": \"Find the area of a triangle with a base of 10 units and height of 5 units.\"\n",
+    "            }\n",
+    "        ],\n",
+    "        \"tools\": [\n",
+    "            {\n",
+    "                \"type\": \"function\",\n",
+    "                \"function\": {\n",
+    "                    \"name\": \"calculate_triangle_area\",\n",
+    "                    \"description\": \"Calculate the area of a triangle given its base and height.\",\n",
+    "                    \"parameters\": {\n",
+    "                        \"type\": \"object\",\n",
+    "                        \"properties\": {\n",
+    "                            \"base\": {\n",
+    "                                \"type\": \"integer\",\n",
+    "                                \"description\": \"The base of the triangle.\"\n",
+    "                            },\n",
+    "                            \"height\": {\n",
+    "                                \"type\": \"integer\",\n",
+    "                                \"description\": \"The height of the triangle.\"\n",
+    "                            },\n",
+    "                            \"unit\": {\n",
+    "                                \"type\": \"string\",\n",
+    "                                \"description\": \"The unit of measure (defaults to \\\"units\\\" if not specified)\"\n",
+    "                            }\n",
+    "                        },\n",
+    "                        \"required\": [\n",
+    "                            \"base\",\n",
+    "                            \"height\"\n",
+    "                        ]\n",
+    "                    }\n",
+    "                }\n",
+    "            }\n",
+    "        ],\n",
+    "        \"tool_calls\": [\n",
+    "            {\n",
+    "                \"function\": {\n",
+    "                    \"name\": \"calculate_triangle_area\",\n",
+    "                    \"arguments\": {\n",
+    "                        \"base\": 10,\n",
+    "                        \"height\": 5,\n",
+    "                        \"unit\": \"units\"\n",
+    "                    }\n",
+    "                }\n",
+    "            }\n",
+    "        ]\n",
+    "    }\n",
+    "]\n",
+    "```\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "92321a60-1fc7-4cde-b0cb-4e35e5d9400e",
+   "metadata": {},
+   "source": [
+    "**Upload Custom Tool Calling Dataset to Data Store**"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 95,
+   "id": "125aee37-6df4-4504-984a-25be17e70f67",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "CommitInfo(commit_url='', commit_message='Upload aiva_tool_call.jsonl with huggingface_hub', commit_description='', oid='8ba99b4e0b4b2224b059c6cacda3b86d724c406e', pr_url=None, repo_url=RepoUrl('', endpoint='https://huggingface.co', repo_type='model', repo_id=''), pr_revision=None, pr_num=None)"
+      ]
+     },
+     "execution_count": 95,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "DATASET_NAME = \"tool_call_data\"\n",
+    "repo_id = setup_dataset_repo(HF_API, NMS_NAMESPACE, DATASET_NAME, NEMO_URL)\n",
+    "HF_API.upload_file(\n",
+    "    path_or_fileobj='./eval_dataset/tool_call_data/aiva_tool_call.jsonl',\n",
+    "    path_in_repo=\"aiva_tool_call.jsonl\",\n",
+    "    repo_id=repo_id,\n",
+    "    repo_type='dataset',\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2414cd57-1ad8-4722-b8e2-ff15186db0dc",
+   "metadata": {},
+   "source": [
+    "**Create Eval Config**"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 96,
+   "id": "16ce4448-b3c6-4b4f-9cb2-7811ce9533f9",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'created_at': '2025-07-09T02:18:55.955723',\n",
+      " 'updated_at': '2025-07-09T02:18:55.955724',\n",
+      " 'name': 'tool-call-eval-config',\n",
+      " 'namespace': 'nemo-eval-tutorial',\n",
+      " 'type': 'custom',\n",
+      " 'tasks': {'custom-tool-calling': {'type': 'chat-completion',\n",
+      "                                   'params': {'template': {'messages': '{{ '\n",
+      "                                                                       'item.messages '\n",
+      "                                                                       '| '\n",
+      "                                                                       'tojson}}',\n",
+      "                                                           'tools': '{{ '\n",
+      "                                                                    'item.tools '\n",
+      "                                                                    '| tojson '\n",
+      "                                                                    '}}',\n",
+      "                                                           'tool_choice': 'auto'}},\n",
+      "                                   'metrics': {'tool-calling-accuracy': {'type': 'tool-calling',\n",
+      "                                                                         'params': {'tool_calls_ground_truth': '{{ '\n",
+      "                                                                                                               'item.tool_calls '\n",
+      "                                                                                                               '| '\n",
+      "                                                                                                               'tojson '\n",
+      "                                                                                                               '}}'}}},\n",
+      "                                   'dataset': {'schema_version': '1.0',\n",
+      "                                               'id': 'dataset-FHLbB4tDWWXwkeQShawtGg',\n",
+      "                                               'namespace': 'default',\n",
+      "                                               'created_at': '2025-07-09T02:18:55.955557',\n",
+      "                                               'updated_at': '2025-07-09T02:18:55.955560',\n",
+      "                                               'custom_fields': {},\n",
+      "                                               'name': 'dataset-FHLbB4tDWWXwkeQShawtGg',\n",
+      "                                               'version_id': 'main',\n",
+      "                                               'version_tags': [],\n",
+      "                                               'files_url': 'hf://datasets/nemo-eval-tutorial/tool_call_data/aiva_tool_call.jsonl'}}},\n",
+      " 'id': 'eval-config-N6PkTGVcZNP8ho4LwHYxqg',\n",
+      " 'custom_fields': {}}\n"
+     ]
+    }
+   ],
+   "source": [
+    "config_payload = {\n",
+    "    \"type\": \"custom\",\n",
+    "    \"namespace\": NMS_NAMESPACE,\n",
+    "    \"name\": \"tool-call-eval-config\",\n",
+    "    \"tasks\": {\n",
+    "        \"custom-tool-calling\": {\n",
+    "            \"type\": \"chat-completion\",\n",
+    "            \"dataset\": {\n",
+    "                \"files_url\": f\"hf://datasets/{NMS_NAMESPACE}/{DATASET_NAME}/aiva_tool_call.jsonl\",\n",
+    "            },\n",
+    "            \"params\": {\n",
+    "                \"template\": {\n",
+    "                    \"messages\": \"{{ item.messages | tojson}}\",\n",
+    "                    \"tools\": \"{{ item.tools | tojson }}\",\n",
+    "                    \"tool_choice\": \"auto\"\n",
+    "                }\n",
+    "            },\n",
+    "            \"metrics\": {\n",
+    "                \"tool-calling-accuracy\": {\n",
+    "                    \"type\": \"tool-calling\",\n",
+    "                    \"params\": {\"tool_calls_ground_truth\": \"{{ item.tool_calls | tojson }}\"}\n",
+    "                }\n",
+    "            }\n",
+    "        }\n",
+    "    }\n",
+    "}\n",
+    "resp = requests.post(config_url, json=config_payload, headers=headers)\n",
+    "pp(resp.json())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "de5b228c-7502-48eb-a652-0c98f905b88f",
+   "metadata": {},
+   "source": [
+    "**Launch Eval Job**"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 97,
+   "id": "d8db6b64-9a2e-4da8-bcc1-c7fe9f4651a3",
+   "metadata": {
+    "collapsed": true,
+    "jupyter": {
+     "outputs_hidden": true
+    }
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'created_at': '2025-07-09T02:19:10.833300',\n",
+      " 'updated_at': '2025-07-09T02:19:10.833303',\n",
+      " 'id': 'eval-P6pSKQ2MhjQGFdT85yBBqm',\n",
+      " 'namespace': 'nemo-eval-tutorial',\n",
+      " 'description': None,\n",
+      " 'target': {'schema_version': '1.0',\n",
+      "            'id': 'eval-target-GaJYgnmDA4Ta2TN23sV9Wo',\n",
+      "            'description': None,\n",
+      "            'type_prefix': 'eval-target',\n",
+      "            'namespace': 'nemo-eval-tutorial',\n",
+      "            'project': None,\n",
+      "            'created_at': '2025-07-08T19:40:13.061228',\n",
+      "            'updated_at': '2025-07-08T19:40:13.061229',\n",
+      "            'custom_fields': {},\n",
+      "            'ownership': None,\n",
+      "            'name': 'llama-chat-target',\n",
+      "            'type': 'model',\n",
+      "            'cached_outputs': None,\n",
+      "            'model': {'schema_version': '1.0',\n",
+      "                      'id': 'model-EXZQbF8ZXbjoogTvF9UFKr',\n",
+      "                      'description': None,\n",
+      "                      'type_prefix': 'model',\n",
+      "                      'namespace': 'default',\n",
+      "                      'project': None,\n",
+      "                      'created_at': '2025-07-08T19:40:13.061012',\n",
+      "                      'updated_at': '2025-07-08T19:40:13.061015',\n",
+      "                      'custom_fields': {},\n",
+      "                      'ownership': None,\n",
+      "                      'name': 'model-EXZQbF8ZXbjoogTvF9UFKr',\n",
+      "                      'version_id': 'main',\n",
+      "                      'version_tags': [],\n",
+      "                      'spec': None,\n",
+      "                      'artifact': None,\n",
+      "                      'base_model': None,\n",
+      "                      'api_endpoint': {'url': 'http://nim.test/v1/chat/completions',\n",
+      "                                       'model_id': 'meta/llama-3.1-8b-instruct',\n",
+      "                                       'api_key': None,\n",
+      "                                       'format': 'openai'},\n",
+      "                      'peft': None,\n",
+      "                      'prompt': None,\n",
+      "                      'guardrails': None},\n",
+      "            'retriever': None,\n",
+      "            'rag': None,\n",
+      "            'rows': None,\n",
+      "            'dataset': None},\n",
+      " 'config': {'schema_version': '1.0',\n",
+      "            'id': 'eval-config-N6PkTGVcZNP8ho4LwHYxqg',\n",
+      "            'description': None,\n",
+      "            'type_prefix': 'eval-config',\n",
+      "            'namespace': 'nemo-eval-tutorial',\n",
+      "            'project': None,\n",
+      "            'created_at': '2025-07-09T02:18:55.955723',\n",
+      "            'updated_at': '2025-07-09T02:18:55.955724',\n",
+      "            'custom_fields': {},\n",
+      "            'ownership': None,\n",
+      "            'name': 'tool-call-eval-config',\n",
+      "            'type': 'custom',\n",
+      "            'params': None,\n",
+      "            'tasks': {'custom-tool-calling': {'type': 'chat-completion',\n",
+      "                                              'params': {'template': {'messages': '{{ '\n",
+      "                                                                                  'item.messages '\n",
+      "                                                                                  '| '\n",
+      "                                                                                  'tojson}}',\n",
+      "                                                                      'tools': '{{ '\n",
+      "                                                                               'item.tools '\n",
+      "                                                                               '| '\n",
+      "                                                                               'tojson '\n",
+      "                                                                               '}}',\n",
+      "                                                                      'tool_choice': 'auto'}},\n",
+      "                                              'metrics': {'tool-calling-accuracy': {'type': 'tool-calling',\n",
+      "                                                                                    'params': {'tool_calls_ground_truth': '{{ '\n",
+      "                                                                                                                          'item.tool_calls '\n",
+      "                                                                                                                          '| '\n",
+      "                                                                                                                          'tojson '\n",
+      "                                                                                                                          '}}'}}},\n",
+      "                                              'dataset': {'schema_version': '1.0',\n",
+      "                                                          'id': 'dataset-FHLbB4tDWWXwkeQShawtGg',\n",
+      "                                                          'description': None,\n",
+      "                                                          'type_prefix': None,\n",
+      "                                                          'namespace': 'default',\n",
+      "                                                          'project': None,\n",
+      "                                                          'created_at': '2025-07-09T02:18:55.955557',\n",
+      "                                                          'updated_at': '2025-07-09T02:18:55.955560',\n",
+      "                                                          'custom_fields': {},\n",
+      "                                                          'ownership': None,\n",
+      "                                                          'name': 'dataset-FHLbB4tDWWXwkeQShawtGg',\n",
+      "                                                          'version_id': 'main',\n",
+      "                                                          'version_tags': [],\n",
+      "                                                          'format': None,\n",
+      "                                                          'files_url': 'hf://datasets/nemo-eval-tutorial/tool_call_data/aiva_tool_call.jsonl',\n",
+      "                                                          'hf_endpoint': None,\n",
+      "                                                          'split': None,\n",
+      "                                                          'limit': None}}},\n",
+      "            'groups': None},\n",
+      " 'result': None,\n",
+      " 'output_files_url': None,\n",
+      " 'status_details': {'message': None, 'task_status': {}, 'progress': None},\n",
+      " 'status': 'created',\n",
+      " 'project': None,\n",
+      " 'custom_fields': {},\n",
+      " 'ownership': None}\n"
+     ]
+    }
+   ],
+   "source": [
+    "payload = {\n",
+    "    \"namespace\": NMS_NAMESPACE,\n",
+    "    \"target\": f\"{NMS_NAMESPACE}/llama-chat-target\",\n",
+    "    \"config\": f\"{NMS_NAMESPACE}/tool-call-eval-config\"\n",
+    "}\n",
+    "headers = {\n",
+    "    \"accept\": \"application/json\",\n",
+    "    \"Content-Type\": \"application/json\"\n",
+    "}\n",
+    "\n",
+    "resp = requests.post(job_url, json=payload, headers=headers)\n",
+    "tool_call_eval_job_id = resp.json()[\"id\"]\n",
+    "pp(resp.json())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e023fe25-407f-4f9d-a699-7c28bbcb7324",
+   "metadata": {},
+   "source": [
+    "**Monitoring job status and results**"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 99,
+   "id": "b59180c3-a1ec-4a9c-abc0-ca73b1a5150d",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'message': 'Job completed successfully.',\n",
+      " 'task_status': {'custom-tool-calling': 'completed'},\n",
+      " 'progress': 100.0}\n"
+     ]
+    }
+   ],
+   "source": [
+    "resp = requests.get(f\"{NEMO_URL}/v1/evaluation/jobs/{tool_call_eval_job_id}/status\")\n",
+    "pp(resp.json())"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 100,
+   "id": "35e63334-6702-4ec0-9188-45b6bf722694",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'tool-calling-accuracy': {'scores': {'function_name_accuracy': {'value': 0.9,\n",
+      "                                                                 'stats': {'count': 10,\n",
+      "                                                                           'sum': 9.0,\n",
+      "                                                                           'mean': 0.9}},\n",
+      "                                      'function_name_and_args_accuracy': {'value': 0.0,\n",
+      "                                                                          'stats': {'count': 10,\n",
+      "                                                                                    'sum': 0.0,\n",
+      "                                                                                    'mean': 0.0}}}}}\n"
+     ]
+    }
+   ],
+   "source": [
+    "resp = requests.get(f\"{NEMO_URL}/v1/evaluation/jobs/{tool_call_eval_job_id}/results\")\n",
+    "pp(resp.json()['tasks']['custom-tool-calling']['metrics'])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "feb170f5-3c4e-42a1-9262-878ad7d8788d",
+   "metadata": {},
+   "source": [
+    "## 5. Retriever Pipeline Evaluation"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4d19c48e-9da1-45c0-a2b1-2b97e1c515cc",
+   "metadata": {},
+   "source": [
+    "### 5.1 Deploy Retriever Models"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "61950b31-112d-404c-a0f1-3a795d71f3da",
+   "metadata": {},
+   "source": [
+    "To evaluate retriever pipelines, retriever models must be deployed locally. For this tutorial, we will create a retriever pipeline with both embedding and reranking models. Specifically, we will use Docker to deploy two retriever NIMs: `llama-3.2-nv-embedqa-1b-v2` for embedding and `llama-3.2-nv-rerankqa-1b-v2` for reranking."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f073bbc8-756b-472c-b19c-dadb870f668a",
+   "metadata": {},
+   "source": [
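+    "First, let's identify a free GPU for the retriever models. In the output below, GPU 0 is idle while GPU 1 is already occupied by the LLM NIM, so the retriever NIMs are deployed on GPU 0."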
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "id": "ae46e61b-ebf8-48b8-ab2e-365b7c346167",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Wed Jul  9 06:27:12 2025       \n",
+      "+---------------------------------------------------------------------------------------+\n",
+      "| NVIDIA-SMI 535.183.06             Driver Version: 535.183.06   CUDA Version: 12.2     |\n",
+      "|-----------------------------------------+----------------------+----------------------+\n",
+      "| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |\n",
+      "| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |\n",
+      "|                                         |                      |               MIG M. |\n",
+      "|=========================================+======================+======================|\n",
+      "|   0  NVIDIA A100 80GB PCIe          On  | 00000002:00:01.0 Off |                    0 |\n",
+      "| N/A   34C    P0              43W / 300W |      0MiB / 81920MiB |      0%      Default |\n",
+      "|                                         |                      |             Disabled |\n",
+      "+-----------------------------------------+----------------------+----------------------+\n",
+      "|   1  NVIDIA A100 80GB PCIe          On  | 00000002:00:02.0 Off |                    0 |\n",
+      "| N/A   36C    P0              62W / 300W |  70969MiB / 81920MiB |      0%      Default |\n",
+      "|                                         |                      |             Disabled |\n",
+      "+-----------------------------------------+----------------------+----------------------+\n",
+      "                                                                                         \n",
+      "+---------------------------------------------------------------------------------------+\n",
+      "| Processes:                                                                            |\n",
+      "|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |\n",
+      "|        ID   ID                                                             Usage      |\n",
+      "|=======================================================================================|\n",
+      "|    1   N/A  N/A     79075      C   /opt/nim/llm/.venv/bin/python3            70958MiB |\n",
+      "+---------------------------------------------------------------------------------------+\n"
+     ]
+    }
+   ],
+   "source": [
+    "!nvidia-smi"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4c8fed34-e457-4c76-ba20-a6a55d7334c5",
+   "metadata": {},
+   "source": [
+    "Then, we can deploy the `llama-3.2-nv-embedqa-1b-v2` embedding NIM (replace `<your-api-key>` with your NGC API Key below):"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ec69ee62-b42f-4d12-97a6-0c88ad0cfd2a",
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "%%bash\n",
+    "export NGC_API_KEY=\"<your-api-key>\"\n",
+    "echo \"$NGC_API_KEY\" | docker login nvcr.io --username '$oauthtoken' --password-stdin\n",
+    "export NIM_MODEL_NAME=nvidia/llama-3.2-nv-embedqa-1b-v2\n",
+    "export CONTAINER_NAME=$(basename $NIM_MODEL_NAME)\n",
+    "\n",
+    "# Choose a NIM Image from NGC\n",
+    "export IMG_NAME=\"nvcr.io/nim/$NIM_MODEL_NAME:1.5.0\"\n",
+    "\n",
+    "# Choose a path on your system to cache the downloaded models\n",
+    "export LOCAL_NIM_CACHE=~/.cache/nim\n",
+    "mkdir -p \"$LOCAL_NIM_CACHE\"\n",
+    "\n",
+    "# Start the NIM\n",
+    "docker run -d --rm --name=$CONTAINER_NAME \\\n",
+    "  --runtime=nvidia \\\n",
+    "  --gpus '\"device=0\"' \\\n",
+    "  --shm-size=16GB \\\n",
+    "  -e NGC_API_KEY \\\n",
+    "  -v \"$LOCAL_NIM_CACHE:/opt/nim/.cache\" \\\n",
+    "  -u $(id -u) \\\n",
+    "  -p 8000:8000 \\\n",
+    "  $IMG_NAME"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "48cda92d-4b0b-4994-a962-95d89e1caa47",
+   "metadata": {},
+   "source": [
+    "Next, we will deploy the `llama-3.2-nv-rerankqa-1b-v2` reranking NIM:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "212ef05a-1446-4e7b-bbeb-2ac96c3f3d54",
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "%%bash\n",
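+    "# Each %%bash cell runs in a fresh shell, so export NGC_API_KEY again before it is passed to the container\n",
+    "export NGC_API_KEY=\"<your-api-key>\"\n",
+    "export NIM_MODEL_NAME=nvidia/llama-3.2-nv-rerankqa-1b-v2\n",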
+    "export CONTAINER_NAME=$(basename $NIM_MODEL_NAME)\n",
+    "\n",
+    "# Choose a NIM Image from NGC\n",
+    "export IMG_NAME=\"nvcr.io/nim/$NIM_MODEL_NAME:1.3.0\"\n",
+    "\n",
+    "# Choose a path on your system to cache the downloaded models\n",
+    "export LOCAL_NIM_CACHE=~/.cache/nim\n",
+    "mkdir -p \"$LOCAL_NIM_CACHE\"\n",
+    "\n",
+    "# Start the NIM\n",
+    "docker run -d --rm --name=$CONTAINER_NAME \\\n",
+    "  --runtime=nvidia \\\n",
+    "  --gpus '\"device=0\"' \\\n",
+    "  --shm-size=16GB \\\n",
+    "  -e NGC_API_KEY \\\n",
+    "  -v \"$LOCAL_NIM_CACHE:/opt/nim/.cache\" \\\n",
+    "  -u $(id -u) \\\n",
+    "  -p 8001:8000 \\\n",
+    "  $IMG_NAME"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e783806e-beac-4133-8027-6e78266c440b",
+   "metadata": {},
+   "source": [
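+    "Get the host IP address (the `src` field in the output below) and use it to specify the API endpoints for the embedding and reranking NIMs. Replace `172.27.20.120` in the endpoint URLs with your own host IP:"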
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "id": "a717b839-9db9-4090-8478-6b1dff70bf94",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "1.0.0.0 via 172.27.16.1 dev ens3 src 172.27.20.120 uid 1000 \n"
+     ]
+    }
+   ],
+   "source": [
+    "! ip route get 1 | grep src"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 20,
+   "id": "7710cc1a-b377-4f78-9d47-30be94361d65",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "embed_url = \"http://172.27.20.120:8000/v1/embeddings\"\n",
+    "embed_model_name = \"nvidia/llama-3.2-nv-embedqa-1b-v2\"\n",
+    "rerank_url = \"http://172.27.20.120:8001/v1/ranking\"\n",
+    "rerank_model_name = \"nvidia/llama-3.2-nv-rerankqa-1b-v2\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7aaa6fcf-9941-4e72-8e32-a88480036bb8",
+   "metadata": {},
+   "source": [
+    "### 5.2 Set up Milvus Vector Database"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "504ddb82-b4eb-4041-b95c-52d3981dbbd5",
+   "metadata": {},
+   "source": [
+    "To run retriever or RAG evaluations, you must first enable the Milvus document store by setting `evaluator.milvus.enabled: true` in `values.yaml`, and then upgrade your Helm deployment to apply the change."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3833f8ae-acc4-4e5c-a468-65bdd579db95",
+   "metadata": {},
+   "source": [
+    "```yaml\n",
+    "evaluator:\n",
+    "  enabled: true\n",
+    "  milvus:\n",
+    "    enabled: true\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e4624b0a-8478-4ca9-8f9e-48295b517916",
+   "metadata": {},
+   "source": [
+    "After the upgrade, you should see a Milvus pod (`nemo-milvus-standalone-*`) running:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 21,
+   "id": "a47709ee-858c-4feb-83c8-a28f15a741cd",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "NAME                                                          READY   STATUS      RESTARTS      AGE\n",
+      "model-downloader-meta-llama-3-1-8b-instruct-2-0-28trx         0/1     Completed   0             29m\n",
+      "model-downloader-meta-llama-3-2-1b-instruct-2-0-b6scn         0/1     Completed   0             29m\n",
+      "modeldeployment-meta-llama-3-1-8b-instruct-6b64d56fdc-slctn   1/1     Running     0             28m\n",
+      "nemo-argo-workflows-server-655f8d755-svgn2                    1/1     Running     0             30m\n",
+      "nemo-argo-workflows-workflow-controller-8f8877cd4-8t2tf       1/1     Running     0             30m\n",
+      "nemo-customizer-5d8554fcf6-rhwfp                              1/1     Running     2 (29m ago)   30m\n",
+      "nemo-customizerdb-0                                           1/1     Running     0             30m\n",
+      "nemo-data-store-795ccbb97b-nwcf2                              1/1     Running     0             30m\n",
+      "nemo-deployment-management-646cc67c-l67lq                     1/1     Running     0             30m\n",
+      "nemo-entity-store-6bfd7c4bd9-p4s4b                            1/1     Running     0             30m\n",
+      "nemo-entity-storedb-0                                         1/1     Running     0             30m\n",
+      "nemo-evaluator-68c67f6798-f4sqh                               1/1     Running     0             30m\n",
+      "nemo-evaluatordb-0                                            1/1     Running     0             30m\n",
+      "nemo-guardrails-5985f8c6d4-bkkq2                              1/1     Running     0             30m\n",
+      "nemo-milvus-standalone-57cc6c46fc-5rctk                       1/1     Running     0             30m\n",
+      "nemo-nemo-operator-controller-manager-59c98d97f9-7gfsw        2/2     Running     0             30m\n",
+      "nemo-nim-operator-765cbf686b-w9n6c                            1/1     Running     0             30m\n",
+      "nemo-nim-proxy-6fbccbfb45-8qvmz                               1/1     Running     0             30m\n",
+      "nemo-opentelemetry-collector-6f46646fd9-5tnrf                 1/1     Running     0             30m\n",
+      "nemo-postgresql-0                                             1/1     Running     0             30m\n"
+     ]
+    }
+   ],
+   "source": [
+    "! kubectl get pod"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "94301926-71c9-4590-b5fd-1208f0917422",
+   "metadata": {},
+   "source": [
+    "### 5.3 Evaluate Embedding Pipeline on FIQA Dataset"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 26,
+   "id": "3fb1b610-d5be-4f1f-8536-0aa540646096",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "target_payload = {\n",
+    "    \"type\": \"retriever\",\n",
+    "    \"name\": \"embed-target\",\n",
+    "    \"namespace\": NMS_NAMESPACE,\n",
+    "    \"retriever\": {\n",
+    "        \"pipeline\": {\n",
+    "            \"query_embedding_model\": {\n",
+    "                \"api_endpoint\": {\n",
+    "                    \"url\": embed_url,\n",
+    "                    \"model_id\": embed_model_name,\n",
+    "                }\n",
+    "            },\n",
+    "            \"index_embedding_model\": {\n",
+    "                \"api_endpoint\": {\n",
+    "                    \"url\": embed_url,\n",
+    "                    \"model_id\": embed_model_name,\n",
+    "                }\n",
+    "            },\n",
+    "            \"top_k\": 10\n",
+    "        }\n",
+    "    }\n",
+    "}\n",
+    "\n",
+    "config_payload = {\n",
+    "    \"type\": \"retriever\",\n",
+    "    \"name\": \"fiqa-config\",\n",
+    "    \"namespace\": NMS_NAMESPACE,\n",
+    "    \"tasks\": {\n",
+    "        \"my-beir-task\": {\n",
+    "            \"type\": \"beir\",\n",
+    "            \"dataset\": {\n",
+    "                \"files_url\": \"file://fiqa/\"\n",
+    "            },\n",
+    "            \"metrics\": {\n",
+    "                \"recall_5\": {\"type\": \"recall_5\"},\n",
+    "                \"ndcg_cut_5\": {\"type\": \"ndcg_cut_5\"},\n",
+    "                \"recall_10\": {\"type\": \"recall_10\"},\n",
+    "                \"ndcg_cut_10\": {\"type\": \"ndcg_cut_10\"}\n",
+    "            }\n",
+    "        }\n",
+    "    }\n",
+    "}\n",
+    "\n",
+    "resp1 = requests.post(target_url, json=target_payload, headers=headers)\n",
+    "resp2 = requests.post(config_url, json=config_payload, headers=headers)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 27,
+   "id": "7fc2e8c6-abc0-4197-a1c6-ff9b53c06f98",
+   "metadata": {
+    "collapsed": true,
+    "jupyter": {
+     "outputs_hidden": true
+    }
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'created_at': '2025-07-09T06:45:27.786401',\n",
+      " 'updated_at': '2025-07-09T06:45:27.786403',\n",
+      " 'id': 'eval-5icjZHn7q6YERCtUCA19bW',\n",
+      " 'namespace': 'nemo-eval-tutorial',\n",
+      " 'description': None,\n",
+      " 'target': {'schema_version': '1.0',\n",
+      "            'id': 'eval-target-5Upwjn3JL3NPV8sGacQyWW',\n",
+      "            'description': None,\n",
+      "            'type_prefix': 'eval-target',\n",
+      "            'namespace': 'nemo-eval-tutorial',\n",
+      "            'project': None,\n",
+      "            'created_at': '2025-07-09T06:44:21.108017',\n",
+      "            'updated_at': '2025-07-09T06:44:21.108018',\n",
+      "            'custom_fields': {},\n",
+      "            'ownership': None,\n",
+      "            'name': 'embed-target',\n",
+      "            'type': 'retriever',\n",
+      "            'cached_outputs': None,\n",
+      "            'model': None,\n",
+      "            'retriever': {'pipeline': {'query_embedding_model': {'schema_version': '1.0',\n",
+      "                                                                 'id': 'model-JsfNnspcixm9UusYGSD1iU',\n",
+      "                                                                 'description': None,\n",
+      "                                                                 'type_prefix': 'model',\n",
+      "                                                                 'namespace': 'default',\n",
+      "                                                                 'project': None,\n",
+      "                                                                 'created_at': '2025-07-09T06:44:21.107756',\n",
+      "                                                                 'updated_at': '2025-07-09T06:44:21.107758',\n",
+      "                                                                 'custom_fields': {},\n",
+      "                                                                 'ownership': None,\n",
+      "                                                                 'name': 'model-JsfNnspcixm9UusYGSD1iU',\n",
+      "                                                                 'version_id': 'main',\n",
+      "                                                                 'version_tags': [],\n",
+      "                                                                 'spec': None,\n",
+      "                                                                 'artifact': None,\n",
+      "                                                                 'base_model': None,\n",
+      "                                                                 'api_endpoint': {'url': 'http://172.27.20.120:8000/v1/embeddings',\n",
+      "                                                                                  'model_id': 'nvidia/llama-3.2-nv-embedqa-1b-v2',\n",
+      "                                                                                  'api_key': None,\n",
+      "                                                                                  'format': 'nim'},\n",
+      "                                                                 'peft': None,\n",
+      "                                                                 'prompt': None,\n",
+      "                                                                 'guardrails': None},\n",
+      "                                       'index_embedding_model': {'schema_version': '1.0',\n",
+      "                                                                 'id': 'model-HKNVJY2a6K6nwJAaZ6VnzR',\n",
+      "                                                                 'description': None,\n",
+      "                                                                 'type_prefix': 'model',\n",
+      "                                                                 'namespace': 'default',\n",
+      "                                                                 'project': None,\n",
+      "                                                                 'created_at': '2025-07-09T06:44:21.107825',\n",
+      "                                                                 'updated_at': '2025-07-09T06:44:21.107825',\n",
+      "                                                                 'custom_fields': {},\n",
+      "                                                                 'ownership': None,\n",
+      "                                                                 'name': 'model-HKNVJY2a6K6nwJAaZ6VnzR',\n",
+      "                                                                 'version_id': 'main',\n",
+      "                                                                 'version_tags': [],\n",
+      "                                                                 'spec': None,\n",
+      "                                                                 'artifact': None,\n",
+      "                                                                 'base_model': None,\n",
+      "                                                                 'api_endpoint': {'url': 'http://172.27.20.120:8000/v1/embeddings',\n",
+      "                                                                                  'model_id': 'nvidia/llama-3.2-nv-embedqa-1b-v2',\n",
+      "                                                                                  'api_key': None,\n",
+      "                                                                                  'format': 'nim'},\n",
+      "                                                                 'peft': None,\n",
+      "                                                                 'prompt': None,\n",
+      "                                                                 'guardrails': None},\n",
+      "                                       'reranker_model': None,\n",
+      "                                       'top_k': 10},\n",
+      "                          'cached_outputs': None},\n",
+      "            'rag': None,\n",
+      "            'rows': None,\n",
+      "            'dataset': None},\n",
+      " 'config': {'schema_version': '1.0',\n",
+      "            'id': 'eval-config-KsKTD5g5GbS4SGYS3VeGJJ',\n",
+      "            'description': None,\n",
+      "            'type_prefix': 'eval-config',\n",
+      "            'namespace': 'nemo-eval-tutorial',\n",
+      "            'project': None,\n",
+      "            'created_at': '2025-07-09T06:45:23.013022',\n",
+      "            'updated_at': '2025-07-09T06:45:23.013023',\n",
+      "            'custom_fields': {},\n",
+      "            'ownership': None,\n",
+      "            'name': 'fiqa-config',\n",
+      "            'type': 'retriever',\n",
+      "            'params': None,\n",
+      "            'tasks': {'my-beir-task': {'type': 'beir',\n",
+      "                                       'params': None,\n",
+      "                                       'metrics': {'recall_5': {'type': 'recall_5',\n",
+      "                                                                'params': None},\n",
+      "                                                   'ndcg_cut_5': {'type': 'ndcg_cut_5',\n",
+      "                                                                  'params': None},\n",
+      "                                                   'recall_10': {'type': 'recall_10',\n",
+      "                                                                 'params': None},\n",
+      "                                                   'ndcg_cut_10': {'type': 'ndcg_cut_10',\n",
+      "                                                                   'params': None}},\n",
+      "                                       'dataset': {'schema_version': '1.0',\n",
+      "                                                   'id': 'dataset-WueQfPh9cr3WpU2ySmpf1k',\n",
+      "                                                   'description': None,\n",
+      "                                                   'type_prefix': None,\n",
+      "                                                   'namespace': 'default',\n",
+      "                                                   'project': None,\n",
+      "                                                   'created_at': '2025-07-09T06:45:23.012836',\n",
+      "                                                   'updated_at': '2025-07-09T06:45:23.012837',\n",
+      "                                                   'custom_fields': {},\n",
+      "                                                   'ownership': None,\n",
+      "                                                   'name': 'dataset-WueQfPh9cr3WpU2ySmpf1k',\n",
+      "                                                   'version_id': 'main',\n",
+      "                                                   'version_tags': [],\n",
+      "                                                   'format': None,\n",
+      "                                                   'files_url': 'file://fiqa/',\n",
+      "                                                   'hf_endpoint': None,\n",
+      "                                                   'split': None,\n",
+      "                                                   'limit': None}}},\n",
+      "            'groups': None},\n",
+      " 'result': None,\n",
+      " 'output_files_url': None,\n",
+      " 'status_details': {'message': None, 'task_status': {}, 'progress': None},\n",
+      " 'status': 'created',\n",
+      " 'project': None,\n",
+      " 'custom_fields': {},\n",
+      " 'ownership': None}\n"
+     ]
+    }
+   ],
+   "source": [
+    "payload = {\n",
+    "    \"namespace\": NMS_NAMESPACE,\n",
+    "    \"target\": f\"{NMS_NAMESPACE}/embed-target\",\n",
+    "    \"config\": f\"{NMS_NAMESPACE}/fiqa-config\"\n",
+    "}\n",
+    "resp = requests.post(job_url, json=payload, headers=headers)\n",
+    "embed_eval_job_id = resp.json()[\"id\"]\n",
+    "pp(resp.json())"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 33,
+   "id": "40676500-2842-4646-aa96-a10d59fd0183",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'message': 'Job completed successfully',\n",
+      " 'task_status': {'my-beir-task': 'completed'},\n",
+      " 'progress': 100.0}\n"
+     ]
+    }
+   ],
+   "source": [
+    "resp = requests.get(f\"{NEMO_URL}/v1/evaluation/jobs/{embed_eval_job_id}/status\")\n",
+    "pp(resp.json())"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 34,
+   "id": "1cef2186-f3b7-4a97-ac39-8fa42441fa39",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'evaluation': {'scores': {'recall_10': {'value': 0.5984280594234299},\n",
+      "                           'ndcg_cut_10': {'value': 0.5280203494315917},\n",
+      "                           'ndcg_cut_5': {'value': 0.5054691655963462},\n",
+      "                           'recall_5': {'value': 0.5225860130952724}}}}\n"
+     ]
+    }
+   ],
+   "source": [
+    "resp = requests.get(f\"{NEMO_URL}/v1/evaluation/jobs/{embed_eval_job_id}/results\")\n",
+    "pp(resp.json()['groups']['evaluation']['metrics'])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a06bb86f-7bed-4eb4-af73-d1d5c487a53d",
+   "metadata": {},
+   "source": [
+    "### 5.4 Evaluate Embedding + Reranking Pipeline on FIQA Dataset"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "971b72ae-b8d1-473f-8e9f-13778b1958cf",
+   "metadata": {},
+   "source": [
+    "First, let's create an evaluation target for the embedding + reranking pipeline:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 35,
+   "id": "1ac9bde2-ca1c-49a9-9b85-d373a72e2c97",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "target_payload = {\n",
+    "    \"type\": \"retriever\",\n",
+    "    \"name\": \"embed-rerank-target\",\n",
+    "    \"namespace\": NMS_NAMESPACE,\n",
+    "    \"retriever\": {\n",
+    "        \"pipeline\": {\n",
+    "            \"query_embedding_model\": {\n",
+    "                \"api_endpoint\": {\n",
+    "                    \"url\": embed_url,\n",
+    "                    \"model_id\": embed_model_name,\n",
+    "                }\n",
+    "            },\n",
+    "            \"index_embedding_model\": {\n",
+    "                \"api_endpoint\": {\n",
+    "                    \"url\": embed_url,\n",
+    "                    \"model_id\": embed_model_name,\n",
+    "                }\n",
+    "            },\n",
+    "            \"reranker_model\": {\n",
+    "                \"api_endpoint\": {\n",
+    "                    \"url\": rerank_url,\n",
+    "                    \"model_id\": rerank_model_name,\n",
+    "                }\n",
+    "            },\n",
+    "            \"top_k\": 10\n",
+    "        }\n",
+    "    }\n",
+    "}\n",
+    "resp = requests.post(target_url, json=target_payload, headers=headers)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d872a20b-6531-44f2-bf75-b148976f0eb9",
+   "metadata": {},
+   "source": [
+    "Then launch the evaluation job for the embedding + reranking pipeline on the FIQA dataset:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 36,
+   "id": "5db765cf-1f83-46e1-b9e3-c33f0cc3de18",
+   "metadata": {
+    "collapsed": true,
+    "jupyter": {
+     "outputs_hidden": true
+    }
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'created_at': '2025-07-09T07:06:34.280669',\n",
+      " 'updated_at': '2025-07-09T07:06:34.280671',\n",
+      " 'id': 'eval-2FJVarertHH2SNA3mYaqmD',\n",
+      " 'namespace': 'nemo-eval-tutorial',\n",
+      " 'description': None,\n",
+      " 'target': {'schema_version': '1.0',\n",
+      "            'id': 'eval-target-T4fCAFx1yhgGP3kBfEBzxb',\n",
+      "            'description': None,\n",
+      "            'type_prefix': 'eval-target',\n",
+      "            'namespace': 'nemo-eval-tutorial',\n",
+      "            'project': None,\n",
+      "            'created_at': '2025-07-09T07:06:31.817357',\n",
+      "            'updated_at': '2025-07-09T07:06:31.817358',\n",
+      "            'custom_fields': {},\n",
+      "            'ownership': None,\n",
+      "            'name': 'embed-rerank-target',\n",
+      "            'type': 'retriever',\n",
+      "            'cached_outputs': None,\n",
+      "            'model': None,\n",
+      "            'retriever': {'pipeline': {'query_embedding_model': {'schema_version': '1.0',\n",
+      "                                                                 'id': 'model-AEyhZfkQZdQUw938n7Gw4q',\n",
+      "                                                                 'description': None,\n",
+      "                                                                 'type_prefix': 'model',\n",
+      "                                                                 'namespace': 'default',\n",
+      "                                                                 'project': None,\n",
+      "                                                                 'created_at': '2025-07-09T07:06:31.816947',\n",
+      "                                                                 'updated_at': '2025-07-09T07:06:31.816949',\n",
+      "                                                                 'custom_fields': {},\n",
+      "                                                                 'ownership': None,\n",
+      "                                                                 'name': 'model-AEyhZfkQZdQUw938n7Gw4q',\n",
+      "                                                                 'version_id': 'main',\n",
+      "                                                                 'version_tags': [],\n",
+      "                                                                 'spec': None,\n",
+      "                                                                 'artifact': None,\n",
+      "                                                                 'base_model': None,\n",
+      "                                                                 'api_endpoint': {'url': 'http://172.27.20.120:8000/v1/embeddings',\n",
+      "                                                                                  'model_id': 'nvidia/llama-3.2-nv-embedqa-1b-v2',\n",
+      "                                                                                  'api_key': None,\n",
+      "                                                                                  'format': 'nim'},\n",
+      "                                                                 'peft': None,\n",
+      "                                                                 'prompt': None,\n",
+      "                                                                 'guardrails': None},\n",
+      "                                       'index_embedding_model': {'schema_version': '1.0',\n",
+      "                                                                 'id': 'model-C5tmcE6tZECDRHN3PuvEfk',\n",
+      "                                                                 'description': None,\n",
+      "                                                                 'type_prefix': 'model',\n",
+      "                                                                 'namespace': 'default',\n",
+      "                                                                 'project': None,\n",
+      "                                                                 'created_at': '2025-07-09T07:06:31.816996',\n",
+      "                                                                 'updated_at': '2025-07-09T07:06:31.816996',\n",
+      "                                                                 'custom_fields': {},\n",
+      "                                                                 'ownership': None,\n",
+      "                                                                 'name': 'model-C5tmcE6tZECDRHN3PuvEfk',\n",
+      "                                                                 'version_id': 'main',\n",
+      "                                                                 'version_tags': [],\n",
+      "                                                                 'spec': None,\n",
+      "                                                                 'artifact': None,\n",
+      "                                                                 'base_model': None,\n",
+      "                                                                 'api_endpoint': {'url': 'http://172.27.20.120:8000/v1/embeddings',\n",
+      "                                                                                  'model_id': 'nvidia/llama-3.2-nv-embedqa-1b-v2',\n",
+      "                                                                                  'api_key': None,\n",
+      "                                                                                  'format': 'nim'},\n",
+      "                                                                 'peft': None,\n",
+      "                                                                 'prompt': None,\n",
+      "                                                                 'guardrails': None},\n",
+      "                                       'reranker_model': {'schema_version': '1.0',\n",
+      "                                                          'id': 'model-LzMLhQFk6bzv4vFAdRNMok',\n",
+      "                                                          'description': None,\n",
+      "                                                          'type_prefix': 'model',\n",
+      "                                                          'namespace': 'default',\n",
+      "                                                          'project': None,\n",
+      "                                                          'created_at': '2025-07-09T07:06:31.817016',\n",
+      "                                                          'updated_at': '2025-07-09T07:06:31.817016',\n",
+      "                                                          'custom_fields': {},\n",
+      "                                                          'ownership': None,\n",
+      "                                                          'name': 'model-LzMLhQFk6bzv4vFAdRNMok',\n",
+      "                                                          'version_id': 'main',\n",
+      "                                                          'version_tags': [],\n",
+      "                                                          'spec': None,\n",
+      "                                                          'artifact': None,\n",
+      "                                                          'base_model': None,\n",
+      "                                                          'api_endpoint': {'url': 'http://172.27.20.120:8001/v1/ranking',\n",
+      "                                                                           'model_id': 'nvidia/llama-3.2-nv-rerankqa-1b-v2',\n",
+      "                                                                           'api_key': None,\n",
+      "                                                                           'format': 'nim'},\n",
+      "                                                          'peft': None,\n",
+      "                                                          'prompt': None,\n",
+      "                                                          'guardrails': None},\n",
+      "                                       'top_k': 10},\n",
+      "                          'cached_outputs': None},\n",
+      "            'rag': None,\n",
+      "            'rows': None,\n",
+      "            'dataset': None},\n",
+      " 'config': {'schema_version': '1.0',\n",
+      "            'id': 'eval-config-KsKTD5g5GbS4SGYS3VeGJJ',\n",
+      "            'description': None,\n",
+      "            'type_prefix': 'eval-config',\n",
+      "            'namespace': 'nemo-eval-tutorial',\n",
+      "            'project': None,\n",
+      "            'created_at': '2025-07-09T06:45:23.013022',\n",
+      "            'updated_at': '2025-07-09T06:45:23.013023',\n",
+      "            'custom_fields': {},\n",
+      "            'ownership': None,\n",
+      "            'name': 'fiqa-config',\n",
+      "            'type': 'retriever',\n",
+      "            'params': None,\n",
+      "            'tasks': {'my-beir-task': {'type': 'beir',\n",
+      "                                       'params': None,\n",
+      "                                       'metrics': {'recall_5': {'type': 'recall_5',\n",
+      "                                                                'params': None},\n",
+      "                                                   'ndcg_cut_5': {'type': 'ndcg_cut_5',\n",
+      "                                                                  'params': None},\n",
+      "                                                   'recall_10': {'type': 'recall_10',\n",
+      "                                                                 'params': None},\n",
+      "                                                   'ndcg_cut_10': {'type': 'ndcg_cut_10',\n",
+      "                                                                   'params': None}},\n",
+      "                                       'dataset': {'schema_version': '1.0',\n",
+      "                                                   'id': 'dataset-WueQfPh9cr3WpU2ySmpf1k',\n",
+      "                                                   'description': None,\n",
+      "                                                   'type_prefix': None,\n",
+      "                                                   'namespace': 'default',\n",
+      "                                                   'project': None,\n",
+      "                                                   'created_at': '2025-07-09T06:45:23.012836',\n",
+      "                                                   'updated_at': '2025-07-09T06:45:23.012837',\n",
+      "                                                   'custom_fields': {},\n",
+      "                                                   'ownership': None,\n",
+      "                                                   'name': 'dataset-WueQfPh9cr3WpU2ySmpf1k',\n",
+      "                                                   'version_id': 'main',\n",
+      "                                                   'version_tags': [],\n",
+      "                                                   'format': None,\n",
+      "                                                   'files_url': 'file://fiqa/',\n",
+      "                                                   'hf_endpoint': None,\n",
+      "                                                   'split': None,\n",
+      "                                                   'limit': None}}},\n",
+      "            'groups': None},\n",
+      " 'result': None,\n",
+      " 'output_files_url': None,\n",
+      " 'status_details': {'message': None, 'task_status': {}, 'progress': None},\n",
+      " 'status': 'created',\n",
+      " 'project': None,\n",
+      " 'custom_fields': {},\n",
+      " 'ownership': None}\n"
+     ]
+    }
+   ],
+   "source": [
+    "payload = {\n",
+    "    \"namespace\": NMS_NAMESPACE,\n",
+    "    \"target\": f\"{NMS_NAMESPACE}/embed-rerank-target\",\n",
+    "    \"config\": f\"{NMS_NAMESPACE}/fiqa-config\"\n",
+    "}\n",
+    "resp = requests.post(job_url, json=payload, headers=headers)\n",
+    "embed_rerank_eval_job_id = resp.json()[\"id\"]\n",
+    "pp(resp.json())"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 38,
+   "id": "e136f734-a435-4739-812f-5c4f129ca800",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'message': 'Job completed successfully',\n",
+      " 'task_status': {'my-beir-task': 'completed'},\n",
+      " 'progress': 100.0}\n"
+     ]
+    }
+   ],
+   "source": [
+    "resp = requests.get(f\"{NEMO_URL}/v1/evaluation/jobs/{embed_rerank_eval_job_id}/status\")\n",
+    "pp(resp.json())"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 39,
+   "id": "886d5ee8-148b-4b74-9f18-d541929829e4",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'evaluation': {'scores': {'recall_10': {'value': 0.5678767905619758},\n",
+      "                           'ndcg_cut_10': {'value': 0.5134364317202811},\n",
+      "                           'recall_5': {'value': 0.5151632575243686},\n",
+      "                           'ndcg_cut_5': {'value': 0.5011430756367373}}}}\n"
+     ]
+    }
+   ],
+   "source": [
+    "resp = requests.get(f\"{NEMO_URL}/v1/evaluation/jobs/{embed_rerank_eval_job_id}/results\")\n",
+    "pp(resp.json()['groups']['evaluation']['metrics'])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d4d8c9f4-7800-4aa2-8cae-034521487348",
+   "metadata": {},
+   "source": [
+    "### 5.5 Evaluate Embedding + Reranking Pipeline on Custom Dataset"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "fedb58f2-c57f-4e01-8c35-acb63f33a61a",
+   "metadata": {},
+   "source": [
+    "Upload the custom dataset:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 40,
+   "id": "00910388-e963-4185-b2f2-234cda7af55c",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "DATASET_NAME = \"rag_custom_data\"\n",
+    "repo_id = setup_dataset_repo(HF_API, NMS_NAMESPACE, DATASET_NAME, NEMO_URL)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 41,
+   "id": "95295acc-3106-4ad6-a1e6-7e150f501c8a",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "queries.jsonl: 100%|██████████| 16.7k/16.7k [00:00<00:00, 3.90MB/s]\n",
+      "corpus.jsonl: 100%|██████████| 11.0k/11.0k [00:00<00:00, 3.04MB/s]\n",
+      "test.tsv: 100%|██████████| 7.28k/7.28k [00:00<00:00, 1.79MB/s]\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "CommitInfo(commit_url='', commit_message='Upload rag_data/qrels/test.tsv with huggingface_hub', commit_description='', oid='c901d2bb70290aa2d898d3f8102d732f774baaec', pr_url=None, repo_url=RepoUrl('', endpoint='https://huggingface.co', repo_type='model', repo_id=''), pr_revision=None, pr_num=None)"
+      ]
+     },
+     "execution_count": 41,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "HF_API.upload_file(\n",
+    "    path_or_fileobj=\"./eval_dataset/retriever_and_rag/queries.jsonl\",\n",
+    "    path_in_repo=\"rag_data/queries.jsonl\",\n",
+    "    repo_id=repo_id,\n",
+    "    repo_type='dataset',\n",
+    ")\n",
+    "\n",
+    "HF_API.upload_file(\n",
+    "    path_or_fileobj=\"./eval_dataset/retriever_and_rag/corpus.jsonl\",\n",
+    "    path_in_repo=\"rag_data/corpus.jsonl\",\n",
+    "    repo_id=repo_id,\n",
+    "    repo_type='dataset',\n",
+    ")\n",
+    "\n",
+    "HF_API.upload_file(\n",
+    "    path_or_fileobj=\"./eval_dataset/retriever_and_rag/qrels/test.tsv\",\n",
+    "    path_in_repo=\"rag_data/qrels/test.tsv\",\n",
+    "    repo_id=repo_id,\n",
+    "    repo_type='dataset',\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d68add56-cdf4-4a49-85ef-84c1ee75cc31",
+   "metadata": {},
+   "source": [
+    "Create a retriever evaluation config that points to the custom dataset:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 42,
+   "id": "fa0dabef-bd41-4d5b-a75f-3170880a2c2e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "config_payload = {\n",
+    "    \"type\": \"retriever\",\n",
+    "    \"name\": \"custom-retriever-config\",\n",
+    "    \"namespace\": NMS_NAMESPACE,\n",
+    "    \"tasks\": {\n",
+    "        \"my-beir-task\": {\n",
+    "            \"type\": \"beir\",\n",
+    "            \"dataset\": {\n",
+    "                \"files_url\": f\"hf://datasets/{NMS_NAMESPACE}/{DATASET_NAME}/rag_data\"\n",
+    "            },\n",
+    "            \"metrics\": {\n",
+    "                \"recall_5\": {\"type\": \"recall_5\"},\n",
+    "                \"ndcg_cut_5\": {\"type\": \"ndcg_cut_5\"},\n",
+    "                \"recall_10\": {\"type\": \"recall_10\"},\n",
+    "                \"ndcg_cut_10\": {\"type\": \"ndcg_cut_10\"}\n",
+    "            }\n",
+    "        }\n",
+    "    }\n",
+    "}\n",
+    "resp = requests.post(config_url, json=config_payload, headers=headers)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "44efecd5-573e-43d3-9308-1ef14f50890d",
+   "metadata": {},
+   "source": [
+    "Launch the evaluation job, pairing the embedding + reranking target with the custom config:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 43,
+   "id": "4548459b-d9de-42c2-b32a-8e4ab116cbe8",
+   "metadata": {
+    "collapsed": true,
+    "jupyter": {
+     "outputs_hidden": true
+    }
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'created_at': '2025-07-09T16:16:07.734774',\n",
+      " 'updated_at': '2025-07-09T16:16:07.734776',\n",
+      " 'id': 'eval-PeDSNf6jrxTyJnovR9ybd7',\n",
+      " 'namespace': 'nemo-eval-tutorial',\n",
+      " 'description': None,\n",
+      " 'target': {'schema_version': '1.0',\n",
+      "            'id': 'eval-target-T4fCAFx1yhgGP3kBfEBzxb',\n",
+      "            'description': None,\n",
+      "            'type_prefix': 'eval-target',\n",
+      "            'namespace': 'nemo-eval-tutorial',\n",
+      "            'project': None,\n",
+      "            'created_at': '2025-07-09T07:06:31.817357',\n",
+      "            'updated_at': '2025-07-09T07:06:31.817358',\n",
+      "            'custom_fields': {},\n",
+      "            'ownership': None,\n",
+      "            'name': 'embed-rerank-target',\n",
+      "            'type': 'retriever',\n",
+      "            'cached_outputs': None,\n",
+      "            'model': None,\n",
+      "            'retriever': {'pipeline': {'query_embedding_model': {'schema_version': '1.0',\n",
+      "                                                                 'id': 'model-AEyhZfkQZdQUw938n7Gw4q',\n",
+      "                                                                 'description': None,\n",
+      "                                                                 'type_prefix': 'model',\n",
+      "                                                                 'namespace': 'default',\n",
+      "                                                                 'project': None,\n",
+      "                                                                 'created_at': '2025-07-09T07:06:31.816947',\n",
+      "                                                                 'updated_at': '2025-07-09T07:06:31.816949',\n",
+      "                                                                 'custom_fields': {},\n",
+      "                                                                 'ownership': None,\n",
+      "                                                                 'name': 'model-AEyhZfkQZdQUw938n7Gw4q',\n",
+      "                                                                 'version_id': 'main',\n",
+      "                                                                 'version_tags': [],\n",
+      "                                                                 'spec': None,\n",
+      "                                                                 'artifact': None,\n",
+      "                                                                 'base_model': None,\n",
+      "                                                                 'api_endpoint': {'url': 'http://172.27.20.120:8000/v1/embeddings',\n",
+      "                                                                                  'model_id': 'nvidia/llama-3.2-nv-embedqa-1b-v2',\n",
+      "                                                                                  'api_key': None,\n",
+      "                                                                                  'format': 'nim'},\n",
+      "                                                                 'peft': None,\n",
+      "                                                                 'prompt': None,\n",
+      "                                                                 'guardrails': None},\n",
+      "                                       'index_embedding_model': {'schema_version': '1.0',\n",
+      "                                                                 'id': 'model-C5tmcE6tZECDRHN3PuvEfk',\n",
+      "                                                                 'description': None,\n",
+      "                                                                 'type_prefix': 'model',\n",
+      "                                                                 'namespace': 'default',\n",
+      "                                                                 'project': None,\n",
+      "                                                                 'created_at': '2025-07-09T07:06:31.816996',\n",
+      "                                                                 'updated_at': '2025-07-09T07:06:31.816996',\n",
+      "                                                                 'custom_fields': {},\n",
+      "                                                                 'ownership': None,\n",
+      "                                                                 'name': 'model-C5tmcE6tZECDRHN3PuvEfk',\n",
+      "                                                                 'version_id': 'main',\n",
+      "                                                                 'version_tags': [],\n",
+      "                                                                 'spec': None,\n",
+      "                                                                 'artifact': None,\n",
+      "                                                                 'base_model': None,\n",
+      "                                                                 'api_endpoint': {'url': 'http://172.27.20.120:8000/v1/embeddings',\n",
+      "                                                                                  'model_id': 'nvidia/llama-3.2-nv-embedqa-1b-v2',\n",
+      "                                                                                  'api_key': None,\n",
+      "                                                                                  'format': 'nim'},\n",
+      "                                                                 'peft': None,\n",
+      "                                                                 'prompt': None,\n",
+      "                                                                 'guardrails': None},\n",
+      "                                       'reranker_model': {'schema_version': '1.0',\n",
+      "                                                          'id': 'model-LzMLhQFk6bzv4vFAdRNMok',\n",
+      "                                                          'description': None,\n",
+      "                                                          'type_prefix': 'model',\n",
+      "                                                          'namespace': 'default',\n",
+      "                                                          'project': None,\n",
+      "                                                          'created_at': '2025-07-09T07:06:31.817016',\n",
+      "                                                          'updated_at': '2025-07-09T07:06:31.817016',\n",
+      "                                                          'custom_fields': {},\n",
+      "                                                          'ownership': None,\n",
+      "                                                          'name': 'model-LzMLhQFk6bzv4vFAdRNMok',\n",
+      "                                                          'version_id': 'main',\n",
+      "                                                          'version_tags': [],\n",
+      "                                                          'spec': None,\n",
+      "                                                          'artifact': None,\n",
+      "                                                          'base_model': None,\n",
+      "                                                          'api_endpoint': {'url': 'http://172.27.20.120:8001/v1/ranking',\n",
+      "                                                                           'model_id': 'nvidia/llama-3.2-nv-rerankqa-1b-v2',\n",
+      "                                                                           'api_key': None,\n",
+      "                                                                           'format': 'nim'},\n",
+      "                                                          'peft': None,\n",
+      "                                                          'prompt': None,\n",
+      "                                                          'guardrails': None},\n",
+      "                                       'top_k': 10},\n",
+      "                          'cached_outputs': None},\n",
+      "            'rag': None,\n",
+      "            'rows': None,\n",
+      "            'dataset': None},\n",
+      " 'config': {'schema_version': '1.0',\n",
+      "            'id': 'eval-config-NeZxti79VXY5KfXeFRsoB5',\n",
+      "            'description': None,\n",
+      "            'type_prefix': 'eval-config',\n",
+      "            'namespace': 'nemo-eval-tutorial',\n",
+      "            'project': None,\n",
+      "            'created_at': '2025-07-09T16:15:48.205922',\n",
+      "            'updated_at': '2025-07-09T16:15:48.205923',\n",
+      "            'custom_fields': {},\n",
+      "            'ownership': None,\n",
+      "            'name': 'custom-retriever-config',\n",
+      "            'type': 'retriever',\n",
+      "            'params': None,\n",
+      "            'tasks': {'my-beir-task': {'type': 'beir',\n",
+      "                                       'params': None,\n",
+      "                                       'metrics': {'recall_5': {'type': 'recall_5',\n",
+      "                                                                'params': None},\n",
+      "                                                   'ndcg_cut_5': {'type': 'ndcg_cut_5',\n",
+      "                                                                  'params': None},\n",
+      "                                                   'recall_10': {'type': 'recall_10',\n",
+      "                                                                 'params': None},\n",
+      "                                                   'ndcg_cut_10': {'type': 'ndcg_cut_10',\n",
+      "                                                                   'params': None}},\n",
+      "                                       'dataset': {'schema_version': '1.0',\n",
+      "                                                   'id': 'dataset-RddzBeXffRmVKTxTvRWkjF',\n",
+      "                                                   'description': None,\n",
+      "                                                   'type_prefix': None,\n",
+      "                                                   'namespace': 'default',\n",
+      "                                                   'project': None,\n",
+      "                                                   'created_at': '2025-07-09T16:15:48.205720',\n",
+      "                                                   'updated_at': '2025-07-09T16:15:48.205722',\n",
+      "                                                   'custom_fields': {},\n",
+      "                                                   'ownership': None,\n",
+      "                                                   'name': 'dataset-RddzBeXffRmVKTxTvRWkjF',\n",
+      "                                                   'version_id': 'main',\n",
+      "                                                   'version_tags': [],\n",
+      "                                                   'format': None,\n",
+      "                                                   'files_url': 'hf://datasets/nemo-eval-tutorial/rag_custom_data/rag_data',\n",
+      "                                                   'hf_endpoint': None,\n",
+      "                                                   'split': None,\n",
+      "                                                   'limit': None}}},\n",
+      "            'groups': None},\n",
+      " 'result': None,\n",
+      " 'output_files_url': None,\n",
+      " 'status_details': {'message': None, 'task_status': {}, 'progress': None},\n",
+      " 'status': 'created',\n",
+      " 'project': None,\n",
+      " 'custom_fields': {},\n",
+      " 'ownership': None}\n"
+     ]
+    }
+   ],
+   "source": [
+    "payload = {\n",
+    "    \"namespace\": NMS_NAMESPACE,\n",
+    "    \"target\": f\"{NMS_NAMESPACE}/embed-rerank-target\",\n",
+    "    \"config\": f\"{NMS_NAMESPACE}/custom-retriever-config\"\n",
+    "}\n",
+    "resp = requests.post(job_url, json=payload, headers=headers)\n",
+    "embed_eval_job_id = resp.json()[\"id\"]\n",
+    "pp(resp.json())"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 46,
+   "id": "01dec2a3-9834-4454-892a-fe76bd45bcac",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'message': 'Job completed successfully',\n",
+      " 'task_status': {'my-beir-task': 'completed'},\n",
+      " 'progress': 100.0}\n"
+     ]
+    }
+   ],
+   "source": [
+    "resp = requests.get(f\"{NEMO_URL}/v1/evaluation/jobs/{embed_eval_job_id}/status\")\n",
+    "pp(resp.json())"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 47,
+   "id": "6ac12f45-6730-41e2-8ba8-914a58d2f0d1",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'evaluation': {'scores': {'ndcg_cut_5': {'value': 1.0},\n",
+      "                           'recall_10': {'value': 1.0},\n",
+      "                           'recall_5': {'value': 1.0},\n",
+      "                           'ndcg_cut_10': {'value': 1.0}}}}\n"
+     ]
+    }
+   ],
+   "source": [
+    "resp = requests.get(f\"{NEMO_URL}/v1/evaluation/jobs/{embed_eval_job_id}/results\")\n",
+    "pp(resp.json()['groups']['evaluation']['metrics'])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8f03f72f-d8a2-4b5d-8f24-b2518e65c90a",
+   "metadata": {},
+   "source": [
+    "## 6. RAG Pipeline Evaluation"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "118064cf-10db-4054-9014-bf7698de2531",
+   "metadata": {},
+   "source": [
+    "### 6.1 Evaluate RAG Pipeline on NFCorpus Dataset"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 48,
+   "id": "e8b6f569-e8a8-410f-9fad-45b60e072dc9",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "target_payload = {\n",
+    "    \"type\": \"rag\",\n",
+    "    \"name\": \"rag-target\",\n",
+    "    \"namespace\": NMS_NAMESPACE,\n",
+    "    \"rag\": {\n",
+    "        \"pipeline\": {\n",
+    "            \"retriever\": {\n",
+    "                \"pipeline\": {\n",
+    "                    \"query_embedding_model\": {\n",
+    "                        \"api_endpoint\": {\n",
+    "                            \"url\": embed_url,\n",
+    "                            \"model_id\": embed_model_name\n",
+    "                        }\n",
+    "                    },\n",
+    "                    \"index_embedding_model\": {\n",
+    "                        \"api_endpoint\": {\n",
+    "                            \"url\": embed_url,\n",
+    "                            \"model_id\": embed_model_name\n",
+    "                        }\n",
+    "                    },\n",
+    "                    \"reranker_model\": {\n",
+    "                        \"api_endpoint\": {\n",
+    "                            \"url\": rerank_url,\n",
+    "                            \"model_id\": rerank_model_name\n",
+    "                        }\n",
+    "                    },\n",
+    "                    \"top_k\": 3\n",
+    "                }\n",
+    "            },\n",
+    "            \"model\": {\n",
+    "                \"api_endpoint\": {\n",
+    "                    \"url\": llm_chat_completion_url,\n",
+    "                    \"model_id\": \"meta/llama-3.1-8b-instruct\"\n",
+    "                }\n",
+    "            }\n",
+    "        }\n",
+    "    }\n",
+    "}\n",
+    "resp = requests.post(target_url, json=target_payload, headers=headers)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 49,
+   "id": "404c3826-3388-4c59-b1a7-c284accdeedf",
+   "metadata": {
+    "collapsed": true,
+    "jupyter": {
+     "outputs_hidden": true
+    }
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'created_at': '2025-07-09T16:30:31.105191',\n",
+      " 'updated_at': '2025-07-09T16:30:31.105192',\n",
+      " 'name': 'rag-nfcorpus-config',\n",
+      " 'namespace': 'nemo-eval-tutorial',\n",
+      " 'type': 'rag',\n",
+      " 'tasks': {'my-beir-task': {'type': 'beir',\n",
+      "                            'params': {'judge_llm': {'api_endpoint': {'url': 'https://integrate.api.nvidia.com/v1/chat/completions',\n",
+      "                                                                      'model_id': 'meta/llama-3.3-70b-instruct',\n",
+      "                                                                      'api_key': '******'}},\n",
+      "                                       'judge_embeddings': {'api_endpoint': {'url': 'https://integrate.api.nvidia.com/v1/embeddings',\n",
+      "                                                                             'model_id': 'nvidia/nv-embedqa-e5-v5',\n",
+      "                                                                             'api_key': '******'}},\n",
+      "                                       'judge_timeout': 300,\n",
+      "                                       'judge_max_retries': 5,\n",
+      "                                       'judge_max_workers': 16},\n",
+      "                            'metrics': {'recall_5': {'type': 'recall_5'},\n",
+      "                                        'ndcg_cut_5': {'type': 'ndcg_cut_5'},\n",
+      "                                        'recall_10': {'type': 'recall_10'},\n",
+      "                                        'ndcg_cut_10': {'type': 'ndcg_cut_10'},\n",
+      "                                        'faithfulness': {'type': 'faithfulness'},\n",
+      "                                        'answer_relevancy': {'type': 'answer_relevancy'}},\n",
+      "                            'dataset': {'schema_version': '1.0',\n",
+      "                                        'id': 'dataset-3Tgpmmzu7u4AUUrfppFhsN',\n",
+      "                                        'namespace': 'default',\n",
+      "                                        'created_at': '2025-07-09T16:30:31.105030',\n",
+      "                                        'updated_at': '2025-07-09T16:30:31.105032',\n",
+      "                                        'custom_fields': {},\n",
+      "                                        'name': 'dataset-3Tgpmmzu7u4AUUrfppFhsN',\n",
+      "                                        'version_id': 'main',\n",
+      "                                        'version_tags': [],\n",
+      "                                        'files_url': 'file://nfcorpus/'}}},\n",
+      " 'id': 'eval-config-Lsvsipxj1cEntDWsDbzJxL',\n",
+      " 'custom_fields': {}}\n"
+     ]
+    }
+   ],
+   "source": [
+    "config_payload = {\n",
+    "    \"type\": \"rag\",\n",
+    "    \"name\": \"rag-nfcorpus-config\",\n",
+    "    \"namespace\": NMS_NAMESPACE,\n",
+    "    \"tasks\": {\n",
+    "        \"my-beir-task\": {\n",
+    "            \"type\": \"beir\",\n",
+    "            \"dataset\": {\n",
+    "                \"files_url\": \"file://nfcorpus/\"\n",
+    "            },\n",
+    "            \"params\": {\n",
+    "                \"judge_llm\": {\n",
+    "                    \"api_endpoint\": {\n",
+    "                        \"url\": \"https://integrate.api.nvidia.com/v1/chat/completions\",\n",
+    "                        \"model_id\": \"meta/llama-3.3-70b-instruct\",\n",
+    "                        \"api_key\": os.environ['NVIDIA_API_KEY'],\n",
+    "                    }\n",
+    "                },\n",
+    "                \"judge_embeddings\": {\n",
+    "                    \"api_endpoint\": {\n",
+    "                        \"url\": \"https://integrate.api.nvidia.com/v1/embeddings\",\n",
+    "                        \"model_id\": \"nvidia/nv-embedqa-e5-v5\",\n",
+    "                        \"api_key\": os.environ['NVIDIA_API_KEY'],\n",
+    "                    }\n",
+    "                },\n",
+    "                \"judge_timeout\": 300,\n",
+    "                \"judge_max_retries\": 5,\n",
+    "                \"judge_max_workers\": 16\n",
+    "            },\n",
+    "            \"metrics\": {\n",
+    "                \"recall_5\": {\"type\": \"recall_5\"},\n",
+    "                \"ndcg_cut_5\": {\"type\": \"ndcg_cut_5\"},\n",
+    "                \"recall_10\": {\"type\": \"recall_10\"},\n",
+    "                \"ndcg_cut_10\": {\"type\": \"ndcg_cut_10\"},\n",
+    "                \"faithfulness\": {\"type\": \"faithfulness\"},\n",
+    "                \"answer_relevancy\": {\"type\": \"answer_relevancy\"}\n",
+    "            }\n",
+    "        }\n",
+    "    }\n",
+    "}\n",
+    "resp = requests.post(config_url, json=config_payload, headers=headers)\n",
+    "pp(resp.json())"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 50,
+   "id": "12783fe8-93d3-4d62-b802-cd859398d4b2",
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'created_at': '2025-07-09T16:30:43.855021',\n",
+      " 'updated_at': '2025-07-09T16:30:43.855024',\n",
+      " 'id': 'eval-VrFzxRNEP1hhjrnUtP8Ug1',\n",
+      " 'namespace': 'nemo-eval-tutorial',\n",
+      " 'description': None,\n",
+      " 'target': {'schema_version': '1.0',\n",
+      "            'id': 'eval-target-VTApJbSpoJBj8xQhgzxB4',\n",
+      "            'description': None,\n",
+      "            'type_prefix': 'eval-target',\n",
+      "            'namespace': 'nemo-eval-tutorial',\n",
+      "            'project': None,\n",
+      "            'created_at': '2025-07-09T16:29:23.468547',\n",
+      "            'updated_at': '2025-07-09T16:29:23.468547',\n",
+      "            'custom_fields': {},\n",
+      "            'ownership': None,\n",
+      "            'name': 'rag-target',\n",
+      "            'type': 'rag',\n",
+      "            'cached_outputs': None,\n",
+      "            'model': None,\n",
+      "            'retriever': None,\n",
+      "            'rag': {'pipeline': {'retriever': {'pipeline': {'query_embedding_model': {'schema_version': '1.0',\n",
+      "                                                                                      'id': 'model-FbFHZ6sEf7pBhmSZj2eTa5',\n",
+      "                                                                                      'description': None,\n",
+      "                                                                                      'type_prefix': 'model',\n",
+      "                                                                                      'namespace': 'default',\n",
+      "                                                                                      'project': None,\n",
+      "                                                                                      'created_at': '2025-07-09T16:29:23.468212',\n",
+      "                                                                                      'updated_at': '2025-07-09T16:29:23.468213',\n",
+      "                                                                                      'custom_fields': {},\n",
+      "                                                                                      'ownership': None,\n",
+      "                                                                                      'name': 'model-FbFHZ6sEf7pBhmSZj2eTa5',\n",
+      "                                                                                      'version_id': 'main',\n",
+      "                                                                                      'version_tags': [],\n",
+      "                                                                                      'spec': None,\n",
+      "                                                                                      'artifact': None,\n",
+      "                                                                                      'base_model': None,\n",
+      "                                                                                      'api_endpoint': {'url': 'http://172.27.20.120:8000/v1/embeddings',\n",
+      "                                                                                                       'model_id': 'nvidia/llama-3.2-nv-embedqa-1b-v2',\n",
+      "                                                                                                       'api_key': None,\n",
+      "                                                                                                       'format': 'nim'},\n",
+      "                                                                                      'peft': None,\n",
+      "                                                                                      'prompt': None,\n",
+      "                                                                                      'guardrails': None},\n",
+      "                                                            'index_embedding_model': {'schema_version': '1.0',\n",
+      "                                                                                      'id': 'model-BMxFNMbexdzBy1PgDyMB2f',\n",
+      "                                                                                      'description': None,\n",
+      "                                                                                      'type_prefix': 'model',\n",
+      "                                                                                      'namespace': 'default',\n",
+      "                                                                                      'project': None,\n",
+      "                                                                                      'created_at': '2025-07-09T16:29:23.468261',\n",
+      "                                                                                      'updated_at': '2025-07-09T16:29:23.468261',\n",
+      "                                                                                      'custom_fields': {},\n",
+      "                                                                                      'ownership': None,\n",
+      "                                                                                      'name': 'model-BMxFNMbexdzBy1PgDyMB2f',\n",
+      "                                                                                      'version_id': 'main',\n",
+      "                                                                                      'version_tags': [],\n",
+      "                                                                                      'spec': None,\n",
+      "                                                                                      'artifact': None,\n",
+      "                                                                                      'base_model': None,\n",
+      "                                                                                      'api_endpoint': {'url': 'http://172.27.20.120:8000/v1/embeddings',\n",
+      "                                                                                                       'model_id': 'nvidia/llama-3.2-nv-embedqa-1b-v2',\n",
+      "                                                                                                       'api_key': None,\n",
+      "                                                                                                       'format': 'nim'},\n",
+      "                                                                                      'peft': None,\n",
+      "                                                                                      'prompt': None,\n",
+      "                                                                                      'guardrails': None},\n",
+      "                                                            'reranker_model': {'schema_version': '1.0',\n",
+      "                                                                               'id': 'model-JVYRG5Nxwu24cxXevSD1P',\n",
+      "                                                                               'description': None,\n",
+      "                                                                               'type_prefix': 'model',\n",
+      "                                                                               'namespace': 'default',\n",
+      "                                                                               'project': None,\n",
+      "                                                                               'created_at': '2025-07-09T16:29:23.468282',\n",
+      "                                                                               'updated_at': '2025-07-09T16:29:23.468282',\n",
+      "                                                                               'custom_fields': {},\n",
+      "                                                                               'ownership': None,\n",
+      "                                                                               'name': 'model-JVYRG5Nxwu24cxXevSD1P',\n",
+      "                                                                               'version_id': 'main',\n",
+      "                                                                               'version_tags': [],\n",
+      "                                                                               'spec': None,\n",
+      "                                                                               'artifact': None,\n",
+      "                                                                               'base_model': None,\n",
+      "                                                                               'api_endpoint': {'url': 'http://172.27.20.120:8001/v1/ranking',\n",
+      "                                                                                                'model_id': 'nvidia/llama-3.2-nv-rerankqa-1b-v2',\n",
+      "                                                                                                'api_key': None,\n",
+      "                                                                                                'format': 'nim'},\n",
+      "                                                                               'peft': None,\n",
+      "                                                                               'prompt': None,\n",
+      "                                                                               'guardrails': None},\n",
+      "                                                            'top_k': 3},\n",
+      "                                               'cached_outputs': None},\n",
+      "                                 'model': {'schema_version': '1.0',\n",
+      "                                           'id': 'model-fWQ8215bWrwq9sdmSGnno',\n",
+      "                                           'description': None,\n",
+      "                                           'type_prefix': 'model',\n",
+      "                                           'namespace': 'default',\n",
+      "                                           'project': None,\n",
+      "                                           'created_at': '2025-07-09T16:29:23.468307',\n",
+      "                                           'updated_at': '2025-07-09T16:29:23.468307',\n",
+      "                                           'custom_fields': {},\n",
+      "                                           'ownership': None,\n",
+      "                                           'name': 'model-fWQ8215bWrwq9sdmSGnno',\n",
+      "                                           'version_id': 'main',\n",
+      "                                           'version_tags': [],\n",
+      "                                           'spec': None,\n",
+      "                                           'artifact': None,\n",
+      "                                           'base_model': None,\n",
+      "                                           'api_endpoint': {'url': 'http://nim.test/v1/chat/completions',\n",
+      "                                                            'model_id': 'meta/llama-3.1-8b-instruct',\n",
+      "                                                            'api_key': None,\n",
+      "                                                            'format': 'nim'},\n",
+      "                                           'peft': None,\n",
+      "                                           'prompt': None,\n",
+      "                                           'guardrails': None},\n",
+      "                                 'context_ordering': 'desc'},\n",
+      "                    'cached_outputs': None},\n",
+      "            'rows': None,\n",
+      "            'dataset': None},\n",
+      " 'config': {'schema_version': '1.0',\n",
+      "            'id': 'eval-config-Lsvsipxj1cEntDWsDbzJxL',\n",
+      "            'description': None,\n",
+      "            'type_prefix': 'eval-config',\n",
+      "            'namespace': 'nemo-eval-tutorial',\n",
+      "            'project': None,\n",
+      "            'created_at': '2025-07-09T16:30:31.105191',\n",
+      "            'updated_at': '2025-07-09T16:30:31.105192',\n",
+      "            'custom_fields': {},\n",
+      "            'ownership': None,\n",
+      "            'name': 'rag-nfcorpus-config',\n",
+      "            'type': 'rag',\n",
+      "            'params': None,\n",
+      "            'tasks': {'my-beir-task': {'type': 'beir',\n",
+      "                                       'params': {'judge_llm': {'api_endpoint': {'url': 'https://integrate.api.nvidia.com/v1/chat/completions',\n",
+      "                                                                                 'model_id': 'meta/llama-3.3-70b-instruct',\n",
+      "                                                                                 'api_key': '******'}},\n",
+      "                                                  'judge_embeddings': {'api_endpoint': {'url': 'https://integrate.api.nvidia.com/v1/embeddings',\n",
+      "                                                                                        'model_id': 'nvidia/nv-embedqa-e5-v5',\n",
+      "                                                                                        'api_key': '******'}},\n",
+      "                                                  'judge_timeout': 300,\n",
+      "                                                  'judge_max_retries': 5,\n",
+      "                                                  'judge_max_workers': 16},\n",
+      "                                       'metrics': {'recall_5': {'type': 'recall_5',\n",
+      "                                                                'params': None},\n",
+      "                                                   'ndcg_cut_5': {'type': 'ndcg_cut_5',\n",
+      "                                                                  'params': None},\n",
+      "                                                   'recall_10': {'type': 'recall_10',\n",
+      "                                                                 'params': None},\n",
+      "                                                   'ndcg_cut_10': {'type': 'ndcg_cut_10',\n",
+      "                                                                   'params': None},\n",
+      "                                                   'faithfulness': {'type': 'faithfulness',\n",
+      "                                                                    'params': None},\n",
+      "                                                   'answer_relevancy': {'type': 'answer_relevancy',\n",
+      "                                                                        'params': None}},\n",
+      "                                       'dataset': {'schema_version': '1.0',\n",
+      "                                                   'id': 'dataset-3Tgpmmzu7u4AUUrfppFhsN',\n",
+      "                                                   'description': None,\n",
+      "                                                   'type_prefix': None,\n",
+      "                                                   'namespace': 'default',\n",
+      "                                                   'project': None,\n",
+      "                                                   'created_at': '2025-07-09T16:30:31.105030',\n",
+      "                                                   'updated_at': '2025-07-09T16:30:31.105032',\n",
+      "                                                   'custom_fields': {},\n",
+      "                                                   'ownership': None,\n",
+      "                                                   'name': 'dataset-3Tgpmmzu7u4AUUrfppFhsN',\n",
+      "                                                   'version_id': 'main',\n",
+      "                                                   'version_tags': [],\n",
+      "                                                   'format': None,\n",
+      "                                                   'files_url': 'file://nfcorpus/',\n",
+      "                                                   'hf_endpoint': None,\n",
+      "                                                   'split': None,\n",
+      "                                                   'limit': None}}},\n",
+      "            'groups': None},\n",
+      " 'result': None,\n",
+      " 'output_files_url': None,\n",
+      " 'status_details': {'message': None, 'task_status': {}, 'progress': None},\n",
+      " 'status': 'created',\n",
+      " 'project': None,\n",
+      " 'custom_fields': {},\n",
+      " 'ownership': None}\n"
+     ]
+    }
+   ],
+   "source": [
+    "payload = {\n",
+    "    \"namespace\": NMS_NAMESPACE,\n",
+    "    \"target\": f\"{NMS_NAMESPACE}/rag-target\",\n",
+    "    \"config\": f\"{NMS_NAMESPACE}/rag-nfcorpus-config\"\n",
+    "}\n",
+    "resp = requests.post(job_url, json=payload, headers=headers)\n",
+    "resp.raise_for_status()  # fail early if the job could not be created\n",
+    "rag_nfcorpus_eval_job_id = resp.json()[\"id\"]\n",
+    "pp(resp.json())"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 54,
+   "id": "ed65bf45-83b7-447d-96da-998906b54c47",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'message': 'Job completed successfully',\n",
+      " 'task_status': {'my-beir-task': 'completed'},\n",
+      " 'progress': 100.0}\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Check the job status; re-run this cell until the task reports 'completed'\n",
+    "resp = requests.get(f\"{NEMO_URL}/v1/evaluation/jobs/{rag_nfcorpus_eval_job_id}/status\")\n",
+    "pp(resp.json())"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 55,
+   "id": "a97a457a-2739-4903-a696-5a1ad66938bc",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'evaluation': {'scores': {'ndcg_cut_10': {'value': 0.2679049189207276},\n",
+      "                           'recall_5': {'value': 0.11898776243879489},\n",
+      "                           'recall_10': {'value': 0.11898776243879489},\n",
+      "                           'ndcg_cut_5': {'value': 0.35399013412606123},\n",
+      "                           'faithfulness': {'value': 0.798750415118147},\n",
+      "                           'answer_relevancy': {'value': 0.38691755209414236}}}}\n"
+     ]
+    }
+   ],
+   "source": [
+    "resp = requests.get(f\"{NEMO_URL}/v1/evaluation/jobs/{rag_nfcorpus_eval_job_id}/results\")\n",
+    "pp(resp.json()['groups']['evaluation']['metrics'])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f179b501-79e0-4261-af70-532b22b0b0d5",
+   "metadata": {},
+   "source": [
+    "### 6.2 Evaluate RAG Pipeline on Custom Dataset"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "075d2471-1cfc-41ba-a43a-affe2844b407",
+   "metadata": {},
+   "source": [
+    "Create an evaluation config for RAG on a custom dataset. We will use a remote judge LLM and a remote judge embedding model."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 56,
+   "id": "a6a9d68c-3cdd-4faa-ac01-460ea37cdad0",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'created_at': '2025-07-09T17:35:11.033333',\n",
+      " 'updated_at': '2025-07-09T17:35:11.033334',\n",
+      " 'name': 'custom-rag-config',\n",
+      " 'namespace': 'nemo-eval-tutorial',\n",
+      " 'type': 'rag',\n",
+      " 'tasks': {'rag-beir-task': {'type': 'beir',\n",
+      "                             'params': {'judge_llm': {'api_endpoint': {'url': 'https://integrate.api.nvidia.com/v1/chat/completions',\n",
+      "                                                                       'model_id': 'meta/llama-3.3-70b-instruct',\n",
+      "                                                                       'api_key': '******'}},\n",
+      "                                        'judge_embeddings': {'api_endpoint': {'url': 'https://integrate.api.nvidia.com/v1/embeddings',\n",
+      "                                                                              'model_id': 'nvidia/nv-embedqa-e5-v5',\n",
+      "                                                                              'api_key': '******'}},\n",
+      "                                        'judge_timeout': 300,\n",
+      "                                        'judge_max_retries': 5,\n",
+      "                                        'judge_max_workers': 16},\n",
+      "                             'metrics': {'recall_5': {'type': 'recall_5'},\n",
+      "                                         'ndcg_cut_5': {'type': 'ndcg_cut_5'},\n",
+      "                                         'recall_10': {'type': 'recall_10'},\n",
+      "                                         'ndcg_cut_10': {'type': 'ndcg_cut_10'},\n",
+      "                                         'faithfulness': {'type': 'faithfulness'},\n",
+      "                                         'answer_relevancy': {'type': 'answer_relevancy'}},\n",
+      "                             'dataset': {'schema_version': '1.0',\n",
+      "                                         'id': 'dataset-4LzRVXqd494p5rSG5BeULA',\n",
+      "                                         'namespace': 'default',\n",
+      "                                         'created_at': '2025-07-09T17:35:11.033156',\n",
+      "                                         'updated_at': '2025-07-09T17:35:11.033158',\n",
+      "                                         'custom_fields': {},\n",
+      "                                         'name': 'dataset-4LzRVXqd494p5rSG5BeULA',\n",
+      "                                         'version_id': 'main',\n",
+      "                                         'version_tags': [],\n",
+      "                                         'files_url': 'hf://datasets/nemo-eval-tutorial/rag_custom_data/rag_data'}}},\n",
+      " 'id': 'eval-config-7NBFBYfrYxGgcdqo5M7agU',\n",
+      " 'custom_fields': {}}\n"
+     ]
+    }
+   ],
+   "source": [
+    "config_payload = {\n",
+    "    \"type\": \"rag\",\n",
+    "    \"name\": \"custom-rag-config\",\n",
+    "    \"namespace\": NMS_NAMESPACE,\n",
+    "    \"tasks\": {\n",
+    "        \"rag-beir-task\": {\n",
+    "            \"type\": \"beir\",\n",
+    "            \"dataset\": {\n",
+    "                \"files_url\": f\"hf://datasets/{NMS_NAMESPACE}/{DATASET_NAME}/rag_data\"\n",
+    "            },\n",
+    "            \"params\": {\n",
+    "                \"judge_llm\": {\n",
+    "                    \"api_endpoint\": {\n",
+    "                        \"url\": \"https://integrate.api.nvidia.com/v1/chat/completions\",\n",
+    "                        \"model_id\": \"meta/llama-3.3-70b-instruct\",\n",
+    "                        \"api_key\": os.environ['NVIDIA_API_KEY'],\n",
+    "                    }\n",
+    "                },\n",
+    "                \"judge_embeddings\": {\n",
+    "                    \"api_endpoint\": {\n",
+    "                        \"url\": \"https://integrate.api.nvidia.com/v1/embeddings\",\n",
+    "                        \"model_id\": \"nvidia/nv-embedqa-e5-v5\",\n",
+    "                        \"api_key\": os.environ['NVIDIA_API_KEY'],\n",
+    "                    }\n",
+    "                },\n",
+    "                \"judge_timeout\": 300,\n",
+    "                \"judge_max_retries\": 5,\n",
+    "                \"judge_max_workers\": 16\n",
+    "            },\n",
+    "            \"metrics\": {\n",
+    "                \"recall_5\": {\"type\": \"recall_5\"},\n",
+    "                \"ndcg_cut_5\": {\"type\": \"ndcg_cut_5\"},\n",
+    "                \"recall_10\": {\"type\": \"recall_10\"},\n",
+    "                \"ndcg_cut_10\": {\"type\": \"ndcg_cut_10\"},\n",
+    "                \"faithfulness\": {\"type\": \"faithfulness\"},\n",
+    "                \"answer_relevancy\": {\"type\": \"answer_relevancy\"}\n",
+    "            }\n",
+    "        }\n",
+    "    }\n",
+    "}\n",
+    "\n",
+    "resp = requests.post(config_url, json=config_payload, headers=headers)\n",
+    "pp(resp.json())"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 57,
+   "id": "8eab031f-155c-4ad5-a44f-a1369fe00f28",
+   "metadata": {
+    "collapsed": true,
+    "jupyter": {
+     "outputs_hidden": true
+    }
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'created_at': '2025-07-09T17:35:43.792368',\n",
+      " 'updated_at': '2025-07-09T17:35:43.792370',\n",
+      " 'id': 'eval-NYudxKxqnNzbRk5k7CpJNr',\n",
+      " 'namespace': 'nemo-eval-tutorial',\n",
+      " 'description': None,\n",
+      " 'target': {'schema_version': '1.0',\n",
+      "            'id': 'eval-target-VTApJbSpoJBj8xQhgzxB4',\n",
+      "            'description': None,\n",
+      "            'type_prefix': 'eval-target',\n",
+      "            'namespace': 'nemo-eval-tutorial',\n",
+      "            'project': None,\n",
+      "            'created_at': '2025-07-09T16:29:23.468547',\n",
+      "            'updated_at': '2025-07-09T16:29:23.468547',\n",
+      "            'custom_fields': {},\n",
+      "            'ownership': None,\n",
+      "            'name': 'rag-target',\n",
+      "            'type': 'rag',\n",
+      "            'cached_outputs': None,\n",
+      "            'model': None,\n",
+      "            'retriever': None,\n",
+      "            'rag': {'pipeline': {'retriever': {'pipeline': {'query_embedding_model': {'schema_version': '1.0',\n",
+      "                                                                                      'id': 'model-FbFHZ6sEf7pBhmSZj2eTa5',\n",
+      "                                                                                      'description': None,\n",
+      "                                                                                      'type_prefix': 'model',\n",
+      "                                                                                      'namespace': 'default',\n",
+      "                                                                                      'project': None,\n",
+      "                                                                                      'created_at': '2025-07-09T16:29:23.468212',\n",
+      "                                                                                      'updated_at': '2025-07-09T16:29:23.468213',\n",
+      "                                                                                      'custom_fields': {},\n",
+      "                                                                                      'ownership': None,\n",
+      "                                                                                      'name': 'model-FbFHZ6sEf7pBhmSZj2eTa5',\n",
+      "                                                                                      'version_id': 'main',\n",
+      "                                                                                      'version_tags': [],\n",
+      "                                                                                      'spec': None,\n",
+      "                                                                                      'artifact': None,\n",
+      "                                                                                      'base_model': None,\n",
+      "                                                                                      'api_endpoint': {'url': 'http://172.27.20.120:8000/v1/embeddings',\n",
+      "                                                                                                       'model_id': 'nvidia/llama-3.2-nv-embedqa-1b-v2',\n",
+      "                                                                                                       'api_key': None,\n",
+      "                                                                                                       'format': 'nim'},\n",
+      "                                                                                      'peft': None,\n",
+      "                                                                                      'prompt': None,\n",
+      "                                                                                      'guardrails': None},\n",
+      "                                                            'index_embedding_model': {'schema_version': '1.0',\n",
+      "                                                                                      'id': 'model-BMxFNMbexdzBy1PgDyMB2f',\n",
+      "                                                                                      'description': None,\n",
+      "                                                                                      'type_prefix': 'model',\n",
+      "                                                                                      'namespace': 'default',\n",
+      "                                                                                      'project': None,\n",
+      "                                                                                      'created_at': '2025-07-09T16:29:23.468261',\n",
+      "                                                                                      'updated_at': '2025-07-09T16:29:23.468261',\n",
+      "                                                                                      'custom_fields': {},\n",
+      "                                                                                      'ownership': None,\n",
+      "                                                                                      'name': 'model-BMxFNMbexdzBy1PgDyMB2f',\n",
+      "                                                                                      'version_id': 'main',\n",
+      "                                                                                      'version_tags': [],\n",
+      "                                                                                      'spec': None,\n",
+      "                                                                                      'artifact': None,\n",
+      "                                                                                      'base_model': None,\n",
+      "                                                                                      'api_endpoint': {'url': 'http://172.27.20.120:8000/v1/embeddings',\n",
+      "                                                                                                       'model_id': 'nvidia/llama-3.2-nv-embedqa-1b-v2',\n",
+      "                                                                                                       'api_key': None,\n",
+      "                                                                                                       'format': 'nim'},\n",
+      "                                                                                      'peft': None,\n",
+      "                                                                                      'prompt': None,\n",
+      "                                                                                      'guardrails': None},\n",
+      "                                                            'reranker_model': {'schema_version': '1.0',\n",
+      "                                                                               'id': 'model-JVYRG5Nxwu24cxXevSD1P',\n",
+      "                                                                               'description': None,\n",
+      "                                                                               'type_prefix': 'model',\n",
+      "                                                                               'namespace': 'default',\n",
+      "                                                                               'project': None,\n",
+      "                                                                               'created_at': '2025-07-09T16:29:23.468282',\n",
+      "                                                                               'updated_at': '2025-07-09T16:29:23.468282',\n",
+      "                                                                               'custom_fields': {},\n",
+      "                                                                               'ownership': None,\n",
+      "                                                                               'name': 'model-JVYRG5Nxwu24cxXevSD1P',\n",
+      "                                                                               'version_id': 'main',\n",
+      "                                                                               'version_tags': [],\n",
+      "                                                                               'spec': None,\n",
+      "                                                                               'artifact': None,\n",
+      "                                                                               'base_model': None,\n",
+      "                                                                               'api_endpoint': {'url': 'http://172.27.20.120:8001/v1/ranking',\n",
+      "                                                                                                'model_id': 'nvidia/llama-3.2-nv-rerankqa-1b-v2',\n",
+      "                                                                                                'api_key': None,\n",
+      "                                                                                                'format': 'nim'},\n",
+      "                                                                               'peft': None,\n",
+      "                                                                               'prompt': None,\n",
+      "                                                                               'guardrails': None},\n",
+      "                                                            'top_k': 3},\n",
+      "                                               'cached_outputs': None},\n",
+      "                                 'model': {'schema_version': '1.0',\n",
+      "                                           'id': 'model-fWQ8215bWrwq9sdmSGnno',\n",
+      "                                           'description': None,\n",
+      "                                           'type_prefix': 'model',\n",
+      "                                           'namespace': 'default',\n",
+      "                                           'project': None,\n",
+      "                                           'created_at': '2025-07-09T16:29:23.468307',\n",
+      "                                           'updated_at': '2025-07-09T16:29:23.468307',\n",
+      "                                           'custom_fields': {},\n",
+      "                                           'ownership': None,\n",
+      "                                           'name': 'model-fWQ8215bWrwq9sdmSGnno',\n",
+      "                                           'version_id': 'main',\n",
+      "                                           'version_tags': [],\n",
+      "                                           'spec': None,\n",
+      "                                           'artifact': None,\n",
+      "                                           'base_model': None,\n",
+      "                                           'api_endpoint': {'url': 'http://nim.test/v1/chat/completions',\n",
+      "                                                            'model_id': 'meta/llama-3.1-8b-instruct',\n",
+      "                                                            'api_key': None,\n",
+      "                                                            'format': 'nim'},\n",
+      "                                           'peft': None,\n",
+      "                                           'prompt': None,\n",
+      "                                           'guardrails': None},\n",
+      "                                 'context_ordering': 'desc'},\n",
+      "                    'cached_outputs': None},\n",
+      "            'rows': None,\n",
+      "            'dataset': None},\n",
+      " 'config': {'schema_version': '1.0',\n",
+      "            'id': 'eval-config-7NBFBYfrYxGgcdqo5M7agU',\n",
+      "            'description': None,\n",
+      "            'type_prefix': 'eval-config',\n",
+      "            'namespace': 'nemo-eval-tutorial',\n",
+      "            'project': None,\n",
+      "            'created_at': '2025-07-09T17:35:11.033333',\n",
+      "            'updated_at': '2025-07-09T17:35:11.033334',\n",
+      "            'custom_fields': {},\n",
+      "            'ownership': None,\n",
+      "            'name': 'custom-rag-config',\n",
+      "            'type': 'rag',\n",
+      "            'params': None,\n",
+      "            'tasks': {'rag-beir-task': {'type': 'beir',\n",
+      "                                        'params': {'judge_llm': {'api_endpoint': {'url': 'https://integrate.api.nvidia.com/v1/chat/completions',\n",
+      "                                                                                  'model_id': 'meta/llama-3.3-70b-instruct',\n",
+      "                                                                                  'api_key': '******'}},\n",
+      "                                                   'judge_embeddings': {'api_endpoint': {'url': 'https://integrate.api.nvidia.com/v1/embeddings',\n",
+      "                                                                                         'model_id': 'nvidia/nv-embedqa-e5-v5',\n",
+      "                                                                                         'api_key': '******'}},\n",
+      "                                                   'judge_timeout': 300,\n",
+      "                                                   'judge_max_retries': 5,\n",
+      "                                                   'judge_max_workers': 16},\n",
+      "                                        'metrics': {'recall_5': {'type': 'recall_5',\n",
+      "                                                                 'params': None},\n",
+      "                                                    'ndcg_cut_5': {'type': 'ndcg_cut_5',\n",
+      "                                                                   'params': None},\n",
+      "                                                    'recall_10': {'type': 'recall_10',\n",
+      "                                                                  'params': None},\n",
+      "                                                    'ndcg_cut_10': {'type': 'ndcg_cut_10',\n",
+      "                                                                    'params': None},\n",
+      "                                                    'faithfulness': {'type': 'faithfulness',\n",
+      "                                                                     'params': None},\n",
+      "                                                    'answer_relevancy': {'type': 'answer_relevancy',\n",
+      "                                                                         'params': None}},\n",
+      "                                        'dataset': {'schema_version': '1.0',\n",
+      "                                                    'id': 'dataset-4LzRVXqd494p5rSG5BeULA',\n",
+      "                                                    'description': None,\n",
+      "                                                    'type_prefix': None,\n",
+      "                                                    'namespace': 'default',\n",
+      "                                                    'project': None,\n",
+      "                                                    'created_at': '2025-07-09T17:35:11.033156',\n",
+      "                                                    'updated_at': '2025-07-09T17:35:11.033158',\n",
+      "                                                    'custom_fields': {},\n",
+      "                                                    'ownership': None,\n",
+      "                                                    'name': 'dataset-4LzRVXqd494p5rSG5BeULA',\n",
+      "                                                    'version_id': 'main',\n",
+      "                                                    'version_tags': [],\n",
+      "                                                    'format': None,\n",
+      "                                                    'files_url': 'hf://datasets/nemo-eval-tutorial/rag_custom_data/rag_data',\n",
+      "                                                    'hf_endpoint': None,\n",
+      "                                                    'split': None,\n",
+      "                                                    'limit': None}}},\n",
+      "            'groups': None},\n",
+      " 'result': None,\n",
+      " 'output_files_url': None,\n",
+      " 'status_details': {'message': None, 'task_status': {}, 'progress': None},\n",
+      " 'status': 'created',\n",
+      " 'project': None,\n",
+      " 'custom_fields': {},\n",
+      " 'ownership': None}\n"
+     ]
+    }
+   ],
+   "source": [
+    "payload = {\n",
+    "    \"namespace\": NMS_NAMESPACE,\n",
+    "    \"target\": f\"{NMS_NAMESPACE}/rag-target\",\n",
+    "    \"config\": f\"{NMS_NAMESPACE}/custom-rag-config\"\n",
+    "}\n",
+    "resp = requests.post(job_url, json=payload, headers=headers)\n",
+    "custom_rag_eval_job_id = resp.json()[\"id\"]\n",
+    "pp(resp.json())"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 60,
+   "id": "bbff7c0c-b4a1-4255-bd75-0344c993c902",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'message': 'Job completed successfully',\n",
+      " 'task_status': {'rag-beir-task': 'completed'},\n",
+      " 'progress': 100.0}\n"
+     ]
+    }
+   ],
+   "source": [
+    "resp = requests.get(f\"{NEMO_URL}/v1/evaluation/jobs/{custom_rag_eval_job_id}/status\")\n",
+    "pp(resp.json())"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 61,
+   "id": "93df7226-f474-4b05-8203-b886db01197a",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'evaluation': {'scores': {'ndcg_cut_10': {'value': 1.0},\n",
+      "                           'recall_10': {'value': 1.0},\n",
+      "                           'ndcg_cut_5': {'value': 1.0},\n",
+      "                           'recall_5': {'value': 1.0},\n",
+      "                           'faithfulness': {'value': 0.804586038961039},\n",
+      "                           'answer_relevancy': {'value': 0.5430273571543547}}}}\n"
+     ]
+    }
+   ],
+   "source": [
+    "resp = requests.get(f\"{NEMO_URL}/v1/evaluation/jobs/{custom_rag_eval_job_id}/results\")\n",
+    "pp(resp.json()['groups']['evaluation']['metrics'])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b8ba9c56-5187-4b33-9b9b-284c9ddfcfa0",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.12"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/nemo/Evaluator/eval_dataset/agent_data/agent_goal_data.jsonl b/nemo/Evaluator/eval_dataset/agent_data/agent_goal_data.jsonl
new file mode 100644
index 000000000..f1c8d88dc
--- /dev/null
+++ b/nemo/Evaluator/eval_dataset/agent_data/agent_goal_data.jsonl
@@ -0,0 +1,2 @@
+{"user_input":[{"content":"Hey, book a table at the nearest best Chinese restaurant for 8:00pm","role":"user"},{"content":"Sure, let me find the best options for you.","role":"assistant","tool_calls":[{"name":"restaurant_search","args":{"cuisine":"Chinese","time":"8:00pm"}}]},{"content":"Found a few options: 1. Golden Dragon, 2. Jade Palace","role":"tool"},{"content":"I found some great options: Golden Dragon and Jade Palace. Which one would you prefer?","role":"assistant"},{"content":"Let's go with Golden Dragon.","role":"user"},{"content":"Great choice! I'll book a table for 8:00pm at Golden Dragon.","role":"assistant","tool_calls":[{"name":"restaurant_book","args":{"name":"Golden Dragon","time":"8:00pm"}}]},{"content":"Table booked at Golden Dragon for 8:00pm.","role":"tool"},{"content":"Your table at Golden Dragon is booked for 8:00pm. Enjoy your meal!","role":"assistant"},{"content":"thanks","role":"user"}],"reference":"Table booked at one of the Chinese restaurants at 8 pm"}
+{"user_input":[{"content":"Hey, book a table at the nearest Indian restaurant for 8:00pm","role":"user"},{"content":"Sure, let me find the best options for you.","role":"assistant","tool_calls":[{"name":"restaurant_search","args":{"cuisine":"Indian","time":"8:00pm"}}]},{"content":"Found a few options: 1. Icca, 2. Sagum","role":"tool"},{"content":"I found some great options: Icca and Sagum. Which one would you prefer?","role":"assistant"},{"content":"Let's go with Icca.","role":"user"},{"content":"Great choice! I'll book a table for 8:00pm at Icca.","role":"assistant","tool_calls":[{"name":"restaurant_book","args":{"name":"Icca","time":"8:00pm"}}]},{"content":"Table booked at Icca for 8:00pm.","role":"tool"},{"content":"Your table at Icca is booked for 8:00pm. Enjoy your meal!","role":"assistant"},{"content":"thanks","role":"user"}],"reference":"Table booked at one of the Indian restaurants at 8 pm"}
\ No newline at end of file
diff --git a/nemo/Evaluator/eval_dataset/agent_data/agent_tool_call_data.jsonl b/nemo/Evaluator/eval_dataset/agent_data/agent_tool_call_data.jsonl
new file mode 100644
index 000000000..9d450cd83
--- /dev/null
+++ b/nemo/Evaluator/eval_dataset/agent_data/agent_tool_call_data.jsonl
@@ -0,0 +1,2 @@
+{"user_input":[{"content":"What's the weather like in New York right now?","type":"human"},{"content":"The current temperature in New York is 75°F and it's partly cloudy.","type":"ai","tool_calls":[{"name":"weather_check","args":{"location":"New York"}}]},{"content":"Can you translate that to Celsius?","type":"human"},{"content":"Let me convert that to Celsius for you.","type":"ai","tool_calls":[{"name":"temperature_conversion","args":{"temperature_fahrenheit":75}}]},{"content":"75°F is approximately 23.9°C.","type":"tool"},{"content":"75°F is approximately 23.9°C.","type":"ai"}],"reference_tool_calls":[{"name":"weather_check","args":{"location":"New York"}},{"name":"temperature_conversion","args":{"temperature_fahrenheit":75}}]}
+{"user_input":[{"content":"What's the weather like in Chicago right now?","type":"human"},{"content":"Let me check the current weather in Chicago.","type":"ai","tool_calls":[{"name":"weather_check","args":{"location":"Chicago"}}]},{"content":"Yes, it's currently raining in Chicago with a temperature of 60°F.","type":"tool"},{"content":"It's currently raining in Chicago with a temperature of 60°F.","type":"ai"},{"content":"How much is that in Celsius?","type":"human"},{"content":"Converting 60°F to Celsius...","type":"ai","tool_calls":[{"name":"temperature_conversion","args":{"temperature_fahrenheit":60}}]},{"content":"60°F is about 15.6°C.","type":"tool"},{"content":"60°F is about 15.6°C.","type":"ai"}],"reference_tool_calls":[{"name":"weather_check","args":{"location":"Chicago"}},{"name":"temperature_conversion","args":{"temperature_fahrenheit":60}}]}
\ No newline at end of file
diff --git a/nemo/Evaluator/eval_dataset/llm_judge_data/math_dataset.csv b/nemo/Evaluator/eval_dataset/llm_judge_data/math_dataset.csv
new file mode 100644
index 000000000..4e4f69a21
--- /dev/null
+++ b/nemo/Evaluator/eval_dataset/llm_judge_data/math_dataset.csv
@@ -0,0 +1,5 @@
+id,question,answer,reference_answer
+0,What is 4+8?,12,The answer is 12
+1,Square root of 144?,12,The answer is 12
+2,Cube of 2?,8,The answer is 8
+3,What is 6*7?,42,The answer is 42
\ No newline at end of file
diff --git a/nemo/Evaluator/eval_dataset/retriever_and_rag/corpus.jsonl b/nemo/Evaluator/eval_dataset/retriever_and_rag/corpus.jsonl
new file mode 100644
index 000000000..e43cba7f5
--- /dev/null
+++ b/nemo/Evaluator/eval_dataset/retriever_and_rag/corpus.jsonl
@@ -0,0 +1,20 @@
+{"_id": "doc1", "title": "Eiffel Tower - A French Icon", "text": "The Eiffel Tower is an iconic landmark of Paris, France. It was designed by the engineer Gustave Eiffel and built for the 1889 Exposition Universelle (World's Fair) to celebrate the 100th anniversary of the French Revolution.", "metadata": {}}
+{"_id": "doc2", "title": "The Great Wall of China - Ancient Protection", "text": "The Great Wall of China is a series of fortifications made of stone, brick, tamped earth, wood, and other materials, generally built along an east-to-west line across the historical northern borders of China to protect the Chinese states and empires against raids and invasions from various nomadic groups.", "metadata": {}}
+{"_id": "doc3", "title": "Taj Mahal - A Symbol of Love", "text": "The Taj Mahal is an ivory-white marble mausoleum on the south bank of the Yamuna river in the Indian city of Agra. It was commissioned in 1632 by the Mughal emperor, Shah Jahan (reigned 1628\u20131658), to house the tomb of his favourite wife, Mumtaz Mahal.", "metadata": {}}
+{"_id": "doc4", "title": "Machu Picchu - Lost City of the Incas", "text": "Machu Picchu is a 15th-century Inca citadel situated on a mountain ridge above the Sacred Valley in Peru. It is the most famous icon of Inca civilization, known for its sophisticated dry-stone walls that fuse huge blocks without the use of mortar.", "metadata": {}}
+{"_id": "doc5", "title": "The Colosseum - Ancient Roman Architecture", "text": "The Colosseum, also known as the Flavian Amphitheatre, is an oval amphitheatre located in the centre of the city of Rome, Italy. Built of concrete and sand, it is the largest amphitheatre ever built and is considered one of the greatest works of Roman architecture and engineering.", "metadata": {}}
+{"_id": "doc6", "title": "Petra - Ancient City of Jordan", "text": "Petra is a historical and archaeological city in southern Jordan. The city is famous for its rock-cut architecture and water conduit system. Another name for Petra is the Rose City due to the color of the stone out of which it is carved.", "metadata": {}}
+{"_id": "doc7", "title": "Chichen Itza - Ancient Mayan City", "text": "The Chichen Itza is a large pre-Columbian city built by the Maya civilization. The ruins of Chichen Itza are located in the northern center of the Yucatan Peninsula, in the Mexican state of Yucatan.", "metadata": {}}
+{"_id": "doc8", "title": "Pyramids of Giza - Ancient Egyptian Wonders", "text": "The Pyramids of Giza are the largest Egyptian pyramids, located on the Giza plateau on the outskirts of Cairo, Egypt. The pyramids, built as tombs for Pharaohs, are the only surviving structures of the Seven Wonders of the Ancient World.", "metadata": {}}
+{"_id": "doc9", "title": "Sistine Chapel - Renaissance Masterpiece", "text": "The Sistine Chapel is a chapel in the Apostolic Palace, the official residence of the Pope, in Vatican City. It is famous for its architecture and its decoration, which was painted by several of the most famous artists of the High Renaissance, including Michelangelo.", "metadata": {}}
+{"_id": "doc10", "title": "Grand Canyon - Natural Wonder of the World", "text": "The Grand Canyon is a steep-sided canyon carved by the Colorado River in Arizona, United States. The Grand Canyon is 277 miles (446 km) long, up to 18 miles (29 km) wide and over a mile (1.6 km) deep.", "metadata": {}}
+{"_id": "doc11", "title": "Christopher Lloyd", "text": "Christopher Allen Lloyd (born October 22, 1938)[1] is an American actor, voice actor, and comedian. He is best known for his roles as Emmett \"Doc\" Brown in the Back to the Future trilogy, Judge Doom in Who Framed Roger Rabbit (1988), Merlock the Magician in DuckTales the Movie: Treasure of the Lost Lamp (1990), Uncle Fester in The Addams Family (1991) and its sequel Addams Family Values (1993), and Grigori Rasputin in Anastasia (1997).", "metadata": {}}
+{"_id": "doc12", "title": "Tobacco advertising", "text": "After World War II, cigarette companies advertised frequently on television programs. To combat this move by the cigarette companies, the Federal Communications Commission required television stations to air anti-smoking advertisements at no cost to the organizations providing such advertisements. In 1970, Congress took their anti-smoking initiative one step further and passed the Public Health Cigarette Smoking Act, banning the advertising of cigarettes on television and radio starting on January 2, 1971. After the television ban, most cigarette advertising took place in magazines, newspapers, and on billboards. However, in 1999 all cigarette billboard advertisements were replaced with anti-smoking messages, with some of these anti-smoking messages playing parodies of cigarette companies advertising figures and slogans. Since 1984, cigarette companies have also been forced to place Surgeon's General warnings on all cigarette packs and advertisements because of the passing of the Federal Cigarette Labeling and Advertising Act.[8] Restrictions on cigarette companies became even tighter in 2010 with the Family Smoking Prevention and Tobacco Control Act. The act prohibits tobacco companies from sponsoring sports, music, and other cultural events and also prevents the display of their logos or products on T-shirts, hats, or other apparel.[9] The constitutionality of both this act and the Food and Drug Administration's new graphic cigarette warning labels are being questioned under cigarette companies' first amendment rights.[10]", "metadata": {}}
+{"_id": "doc13", "title": "Beti Bachao, Beti Padhao Yojana", "text": "On 26 August 2016, Olympics 2016 bronze medallist Sakshi Malik was made brand ambassador for BBBP.[7]", "metadata": {}}
+{"_id": "doc14", "title": "List of Prison Break episodes", "text": "The first five seasons of Prison Break have been released on DVD and Blu-ray in Regions 1, 2, and 4. Each DVD boxed set includes all of the broadcast episodes from that season, the associated special episode, commentary from cast and crew, and profiles of various parts of Prison Break, such as Fox River State Penitentiary or the tattoo.[4][5][6][7] Prison Break is also available online, including iTunes, Amazon Video, and Netflix.[8][9] After the premiere of the second season of Prison Break, Fox began online streaming of the prior week's episode, though it originally restricted viewing to the United States.[10]", "metadata": {}}
+{"_id": "doc15", "title": "Atomic bombings of Hiroshima and Nagasaki", "text": "By August 1945, the Allies' Manhattan Project had produced two types of atomic bomb, and the 509th Composite Group of the United States Army Air Forces (USAAF) was equipped with the specialized Silverplate version of the Boeing B-29 Superfortress that could deliver them from Tinian in the Mariana Islands. Orders for atomic bombs to be used on four Japanese cities were issued on July 25. On August 6, one of its B-29s dropped a Little Boy uranium gun-type bomb on Hiroshima. Three days later, on August 9, a Fat Man plutonium implosion-type bomb was dropped by another B-29 on Nagasaki. The bombs immediately devastated their targets. Over the next two to four months, the acute effects of the atomic bombings killed 90,000\u2013146,000 people in Hiroshima and 39,000\u201380,000 people in Nagasaki; roughly half of the deaths in each city occurred on the first day. Large numbers of people continued to die from the effects of burns, radiation sickness, and other injuries, compounded by illness and malnutrition, for many months afterward. In both cities, most of the dead were civilians, although Hiroshima had a sizable military garrison.", "metadata": {}}
+{"_id": "doc16", "title": "Two-party system", "text": "There is general agreement that the United States has a two-party system; historically, there have been few instances in which third party candidates won an election. In the First Party System, only Alexander Hamilton's Federalist Party and Thomas Jefferson's Democratic-Republican Party were significant political parties. Toward the end of the First Party System, the Republicans dominated a one-party system (primarily under the Presidency of James Monroe). Under the Second Party System, the Democratic-Republican Party split during the election of 1824 into Adams' Men and Jackson's Men. In 1828, the modern Democratic Party formed in support of Andrew Jackson. The National Republicans were formed in support of John Quincy Adams. After the National Republicans collapsed, the Whig Party and the Free Soil Party quickly formed and collapsed. In 1854, the modern Republican Party formed from a loose coalition of former Whigs, Free Soilers and other anti-slavery activists. Abraham Lincoln became the first Republican president in 1860.", "metadata": {}}
+{"_id": "doc17", "title": "Equilateral triangle", "text": "An equilateral triangle is the most symmetrical triangle, having 3 lines of reflection and rotational symmetry of order 3 about its center. Its symmetry group is the dihedral group of order 6 D3.", "metadata": {}}
+{"_id": "doc18", "title": "The O.C. (season 4)", "text": "The fourth and final season of The O.C., an American teen drama television series, aired in the United States from November 2, 2006 to February 22, 2007 and consisted of sixteen episodes. The O.C's final season aired Thursdays at 9:00\u00a0p.m. ET in the United States on Fox, a terrestrial television network.[1] Fox tested a new timeslot by airing the second episode on Wednesday at 9:00\u00a0p.m. ET against ABC's Lost and CBS's Criminal Minds.[2] With Lost about to enter a three-month hiatus, Fox hoped that changing the timeslot for The O.C. would attract back viewers it had lost since the end of the previous season; however, the move was unsuccessful and the show returned to its Thursday timeslot.[3]", "metadata": {}}
+{"_id": "doc19", "title": "Crossing the Bar", "text": "The extended metaphor of \"crossing of bar\" represents traveling serenely and securely from life through death. The Pilot is a metaphor for God, whom the speaker hopes to meet face to face. Tennyson explained, \"The Pilot has been on board all the while, but in the dark I have not seen him\u2026[He is] that Divine and Unseen Who is always guiding us.\"[1]", "metadata": {}}
+{"_id": "doc20", "title": "Protectionism", "text": "Protectionism is the economic policy of restricting imports from other countries through methods such as tariffs on imported goods, import quotas, and a variety of other government regulations. Proponents claim that protectionist policies shield the producers, businesses, and workers of the import-competing sector in the country from foreign competitors. However, they also reduce trade and adversely affect consumers in general (by raising the cost of imported goods), and harm the producers and workers in export sectors, both in the country implementing protectionist policies, and in the countries protected against.", "metadata": {}}
\ No newline at end of file
diff --git a/nemo/Evaluator/eval_dataset/retriever_and_rag/qrels/test.tsv b/nemo/Evaluator/eval_dataset/retriever_and_rag/qrels/test.tsv
new file mode 100644
index 000000000..5be0bac96
--- /dev/null
+++ b/nemo/Evaluator/eval_dataset/retriever_and_rag/qrels/test.tsv
@@ -0,0 +1,101 @@
+query-id	corpus-id	score
+d4791e95319ed5799fe2e42e52bc947608626cf54828725cc98e19a490e31bb8	doc1	1
+c6329f64c45dceca6c4bd99dcd9d12e41dfe1cd8f654616ab319e9e9e31b8e99	doc1	1
+4d7474ae36a0a7a8555dd42c863e51e2f2c25614391876a65695cbfcfa36637c	doc1	1
+ba9889ab0567d3a7f477d12370efaaced2f77282fb321198eb0f4eadb13f1b27	doc1	1
+9e4a4c17c17223eae9b10749486319eca51fe04364409454c147a5613e2c3e98	doc1	1
+9988a93f1d281f8a4f109a63f08f4678ec4625375250901e14761bd394e5381a	doc2	1
+239dec3fb00a7d424e21b55a3b55b5877951c7ef6450d2d4de8fdc5cf39424fd	doc2	1
+672ec0ce4c048890428b405cf5a5cf2fb62c37da11eb32a5a5666786c9848e2f	doc2	1
+bc62643829abe4184f204299034d98ae936a3711b8fbb93f607f7eb44e4962b8	doc2	1
+f91cd8f9315b982471907638dc60f846bd54fbdc34db9ae6750395aedae24352	doc2	1
+4544bce08ed71d23edebf19ef59a86ce3db295cb9546291e262495944aedf616	doc3	1
+611bb43ce3321561149d1f513cca46d8a7bdb8c11c32e3cbdbaa6ad373592ebf	doc3	1
+9d476a2f512e7b758fafe9504dfcf2bc6e6794ee85b6fca2c530a6a9c60f0832	doc3	1
+801e8214b018bd8e0aac7e8631d7a82face81d483fbe3d6009d9278a946d7f68	doc3	1
+ea27f5a45db1ba616513dc69654d1975fcd60831eb87629b068c2333b2d4170a	doc3	1
+719cb39d8508cecd912286496cacf63c4e40b92fb19514738b591dcfc00de93f	doc4	1
+c301a19a6e44e7dbcf4199b795e268c12cb03980ba7dd0c1b0987e4124273fc8	doc4	1
+861cbf8c481828629aece860b032d73b15fa24ec32c7ca56f50463ed361fc226	doc4	1
+a8beced7938da062f56f99cb5aa0f591653f5604a45460978d79a3c7c4c4d1af	doc4	1
+8439305d8e992d0382bfd844ec1112f8e6fa74a06c6675a3a5df8ec283768558	doc4	1
+463d8b1341734939d23575b487076168f9d3189a3c8caf71d2102b3b260c2f35	doc5	1
+98968d1bf294786eb1b16249ae84e9c5bd61f8b2d198ffaece756c040c6fb920	doc5	1
+cef3453c377c4212cfe324f88c03d10a0f9adf39ef5414c9010e3fee616d2e16	doc5	1
+b054567594dc05b28e7a9285874a42b34d4975b3faa11831576223fa7acdeda8	doc5	1
+8fa174b629b8c6e50f3a76d8fdfcc7902f575ffa12624be813d5bfbd9f767831	doc5	1
+100507878129d6b20eaa8150a72bce36d4a60d08d5fc1aedf9d59054bce20da6	doc6	1
+903e62c8d08b3f58252ccda641d90ccb80bea056c56489bf8f0bb8e800fd13d0	doc6	1
+abd53764de3136706f539beb77d0d39c69e6e1617affd6822d1d61e5cb7ccc69	doc6	1
+482d31f3d23498a496719d5d461f86407a93ed344d922d045afb327dee0e2800	doc6	1
+bc29052d954a0821ab6221dae7cdadb471d2f02f5672aca529c5e077e4620857	doc6	1
+1f2cac557fd69325838a58f0402d7668092c444a2c5afef49e169db144668739	doc7	1
+32fe21d4640076008527e1318c3760e30317ba2d6f5196a00a738c7ddaaf637e	doc7	1
+4d951731a1df580f54afbb4ec2f9095c0e4133d490b061630e8c82a6baa457f2	doc7	1
+63697d4c54f7c5c6e2aa9eefbf561354f28cf842287550ce7e6b80a23b44c508	doc7	1
+383641b56f501cf1b4bca7512dc78e9df4b6208c776bb64cd910acaaa05a03fb	doc7	1
+4ab9b1148552aa354a95712e3bb78fa7625151e9727c22a9716eee55c8cd1ca5	doc8	1
+8a0f32e2bdda9140025f64fb77d82b8ed2c67b971cb4c44432ca1847a3e33ebc	doc8	1
+f3bf2d4d988d35bcd34440f4df27d81e705aa1e4250a44e1573f2426e4c672c9	doc8	1
+5f87ea96710eeadd6b9fce4b8fa0a31c4e4ad0c4deefa1d4f03c003b812ca35c	doc8	1
+4006d3300b1e5e882bde97c7a38cd57f11e3bea840c5d02c5106b2877eada5af	doc8	1
+0492bed59f5f2f477031319746855c36d341367dd0981cd1def258d4f6f233b2	doc9	1
+e24a3dc0487600b38b9224ff5eab454a3434cab46be17557967c63bf1d71bc41	doc9	1
+1bfc6624fca71593956762639a30f954c022e45161a59d7aa2c387451fb2628d	doc9	1
+1ca5ffac6744e0320a1d7d7d5ab6c106321a2a5ca950072faa330ea8849c5323	doc9	1
+5c4bc852ee60333f3dbe85c4a973da86767e84f3a696aba84ce093ef7ae1db07	doc9	1
+f8744a94668898ea7a9af97d04c870d73f9718a569f5c4ba0a0b46387c132924	doc10	1
+668a0071474dae0a9a07cc97322c8c43666c3d1dbda400527842d5313f85f81d	doc10	1
+212976ea90e81e91a3eb1032dbad007ae21ab3e36d3a197f470191651d8554a1	doc10	1
+ab31a5f5da3eb9b30e4597c8f2eeded3f4403c1372b6f8a74f7fd14f7b4ecd61	doc10	1
+e5dbb3daaf04133c91aee4ae5c23ad3eb1757a09bb476813ba1b8d744ec86853	doc10	1
+d6006420dc385e72a2e8eda25542df449150578a282fda6663269b021b66098c	doc11	1
+5a73841ef94dfaf35f5ac1b3fb6ba819cff3adeba1f45c9e1e3bdb683858577c	doc11	1
+8b861544082204ba43b1a68d3e1fc9d59ad248b728574706d74afbd40192f040	doc11	1
+c4b0c28cc818b06c14fd1135df72379ce5cf2631a490fcfa219935191ba08b84	doc11	1
+42a3f8e0d64feee3603b181022a106a97d71344720b923bc1f90af78964dd493	doc11	1
+e2836ccb6afb375121364f988119e252fae23780ee137f145de6da21ab53dafb	doc12	1
+d1e3c12606ef7743f57751e47e8e404ff2763f290121e393758d71f4a3aa9ca7	doc12	1
+1e5c00b6dc0b3efd1bcf172c4e310142ea3bdc44adf9855c7e6016c5e290fbdf	doc12	1
+b753b1f7764f19fb207c491283bfeb26c47d3c55ec946428cf76772ae36506d2	doc12	1
+af1aae944e5f4ccf2a95101c2cdf2ac638a72bda7fe88872cd5955c56996c48e	doc12	1
+c15597cc9a8f95e6e1353309e69590cc2d9ebcab1a999ce2b4e7e0a2493d38bf	doc13	1
+9ba6e043d95e03e67bfbf5099c7de9e6c4089679d3a72ae025c895e84b7ce45d	doc13	1
+5e546550dbec16a3f4cbe2ac034e5a71cbe80dad360091456d0b344ecc2d3153	doc13	1
+4dcbc0c9666769faf634cb3c5a8539f09d09774a702f70662b72ee84260c6a9a	doc13	1
+9e97131fc58c473f7ac7d3e0ed0f25212e7e5d773d02641159eab064b46a209f	doc13	1
+3512d9923d2f81059aa64d569c272fe33ced6f931c8c01fad46fd098a54c0e09	doc14	1
+6d7af57205637b8fa8cb7665619f5d36af09f8f56b06515aceee0813187404f7	doc14	1
+7e111e169cfef0f5c99cbabb522d72df2ed55134641aed68bbe9198a017a3f67	doc14	1
+28c42b4f42ca39358042db96a01aec297fa886e2d9a36cc84870fda089a6870b	doc14	1
+77ec725f54837f1d86a1dd2c52f433c2b83290aa4339593b7c775c131e6c5b06	doc14	1
+a95b549912c99a328763a45f0511b4344fc6549e8e90caf6bd9f5210e5f0661f	doc15	1
+f2b36e091bb21ef5903d17a80534859061406e40af0de87db33e13ec2387b832	doc15	1
+bfd509121ad8a90e0533026c08d452e1bd56f83db167e8157f221f29c152ee10	doc15	1
+500924f95b3ae6011ae7cf060eb40e93b9d2d62944760e7dbd407d6231c67523	doc15	1
+a4b087bdf981463fb0c9339414fbcb1af5a0962cb300a20bd58c945cc3e34dec	doc15	1
+62c6de75339d64b3c5b4d5f434f6b4e9d8b2625d88a39508d9aedfc3a1b39147	doc16	1
+a13a16a9168d8f18c6c8e636649797aa6067d1c870f6f7b57e9ea01ef7b7ec85	doc16	1
+57d13fd5c2ced7da0b4a93b52e0dc238564dacf567b8016238f484dc628e3fbb	doc16	1
+4fb124be5d39851dc01336dde383ed082ca2a2e0e333cd92e857921c7022f2b3	doc16	1
+e79e0a2a9de22158daf2c5037fcbc4ab3ca71bcddcb4818980e3d7ae9989bfa4	doc16	1
+9ce5f6e8874f507ee6c4df3a7b56ad620e3403b74b0e5bada827f5a79a2905b9	doc17	1
+555c5b1a2e9f6549d0fcdfb90b044d3760ac7f52e29ca6a23d071549b24cbaea	doc17	1
+b7e4e3a74de3a4ebd3cc5274dad55b64c304425212d9069d3ccaa24d8f30fdb0	doc17	1
+9499207380debdffa8c33f2ce6fa1a8189ff578707f62dfd916dc7d1b283eaa1	doc17	1
+90d4377aa058d6ef0124fbc61edde6ea9edb0efb7af5758322f9720e45434f12	doc17	1
+bb01d6a1a44ecc88fc81b47963eb585ce41d8cf3d9130b7af87c0fa16afc58fc	doc18	1
+d10bec3ed0b1de2d9e2d34d5f1b3f9a064f0c55ad94f614d2ea017386b9a170d	doc18	1
+1b16d1d166643ba00ec3e7a52ae5dfd17d7fd266817e0559f1f8e7ec1d91eefc	doc18	1
+12ddbc559c471313e736b6a136d2b40239200b97169979613aa5e13bd93e49a3	doc18	1
+227372a9d23586b741e7c969719d3e46a3845438ba9c71bab6777b1623c95c6d	doc18	1
+c78d6e1e3b459f5e484a61bfb174f482f5106aedabd41eab2321521ff98f5901	doc19	1
+e8eee05c41c2b11ca21e911a6efe892e4fb2423a888d31292a3db7d0ac4e0b5f	doc19	1
+5ddd182273afaad1041cce9ac0acd222b7e3f4e5f6d32a53dbc8dcb10006174e	doc19	1
+f77123640adc18f3c20d4545ca4efd137ff488d3c00e23edaeaf6047477d0454	doc19	1
+4934a02194e60bef81ae48e268589cb25a5a1d77a945fa8ab5db484f7901ed68	doc19	1
+a6cf02c4aac83a7ace43a93d3d65269666ec4d78a1e47f043b711f289e2b5a0a	doc20	1
+05114cb50594dadce7fffb81f84b82736cd1b9a945fbf1d0206ba4d6f9383b5f	doc20	1
+0ffed93e17e7077c9e7c31ed78cfbd9b49b3a015e81daccb239bc9f76f95131f	doc20	1
+74d7b7cbea4794a880cbfbf9f0d212b7c3e3f9f00657bac9ce6f363ac6947560	doc20	1
+e9973a6255565aee321cdc29ceea47a84aa5c392a49c017cad5c704f44d1ecc0	doc20	1
diff --git a/nemo/Evaluator/eval_dataset/retriever_and_rag/queries.jsonl b/nemo/Evaluator/eval_dataset/retriever_and_rag/queries.jsonl
new file mode 100644
index 000000000..66709ffbd
--- /dev/null
+++ b/nemo/Evaluator/eval_dataset/retriever_and_rag/queries.jsonl
@@ -0,0 +1,100 @@
+{"_id": "d4791e95319ed5799fe2e42e52bc947608626cf54828725cc98e19a490e31bb8", "text": "Who was the designer of the Eiffel Tower?", "metadata": {}}
+{"_id": "c6329f64c45dceca6c4bd99dcd9d12e41dfe1cd8f654616ab319e9e9e31b8e99", "text": "When was the Eiffel Tower constructed and for what purpose?", "metadata": {}}
+{"_id": "4d7474ae36a0a7a8555dd42c863e51e2f2c25614391876a65695cbfcfa36637c", "text": "In what year did the Eiffel Tower celebrate its 100th anniversary?", "metadata": {}}
+{"_id": "ba9889ab0567d3a7f477d12370efaaced2f77282fb321198eb0f4eadb13f1b27", "text": "Which event was the Eiffel Tower built for?", "metadata": {}}
+{"_id": "9e4a4c17c17223eae9b10749486319eca51fe04364409454c147a5613e2c3e98", "text": "Who is the engineer behind the Eiffel Tower?", "metadata": {}}
+{"_id": "9988a93f1d281f8a4f109a63f08f4678ec4625375250901e14761bd394e5381a", "text": "What are the primary materials used in the construction of the Great Wall of China?", "metadata": {}}
+{"_id": "239dec3fb00a7d424e21b55a3b55b5877951c7ef6450d2d4de8fdc5cf39424fd", "text": "What was the main purpose of building the Great Wall of China?", "metadata": {}}
+{"_id": "672ec0ce4c048890428b405cf5a5cf2fb62c37da11eb32a5a5666786c9848e2f", "text": "Where is the Great Wall of China located?", "metadata": {}}
+{"_id": "bc62643829abe4184f204299034d98ae936a3711b8fbb93f607f7eb44e4962b8", "text": "What kind of structure is the Great Wall of China?", "metadata": {}}
+{"_id": "f91cd8f9315b982471907638dc60f846bd54fbdc34db9ae6750395aedae24352", "text": "What historical periods does the Great Wall of China cover?", "metadata": {}}
+{"_id": "4544bce08ed71d23edebf19ef59a86ce3db295cb9546291e262495944aedf616", "text": "Who was the Taj Mahal built for?", "metadata": {}}
+{"_id": "611bb43ce3321561149d1f513cca46d8a7bdb8c11c32e3cbdbaa6ad373592ebf", "text": "When did Shah Jahan commission the building of the Taj Mahal?", "metadata": {}}
+{"_id": "9d476a2f512e7b758fafe9504dfcf2bc6e6794ee85b6fca2c530a6a9c60f0832", "text": "What type of building is the Taj Mahal?", "metadata": {}}
+{"_id": "801e8214b018bd8e0aac7e8631d7a82face81d483fbe3d6009d9278a946d7f68", "text": "On which river is the Taj Mahal located?", "metadata": {}}
+{"_id": "ea27f5a45db1ba616513dc69654d1975fcd60831eb87629b068c2333b2d4170a", "text": "Which emperor commissioned the construction of the Taj Mahal?", "metadata": {}}
+{"_id": "719cb39d8508cecd912286496cacf63c4e40b92fb19514738b591dcfc00de93f", "text": "What type of structure is Machu Picchu?", "metadata": {}}
+{"_id": "c301a19a6e44e7dbcf4199b795e268c12cb03980ba7dd0c1b0987e4124273fc8", "text": "On which mountain ridge is Machu Picchu located?", "metadata": {}}
+{"_id": "861cbf8c481828629aece860b032d73b15fa24ec32c7ca56f50463ed361fc226", "text": "What is special about the walls of Machu Picchu?", "metadata": {}}
+{"_id": "a8beced7938da062f56f99cb5aa0f591653f5604a45460978d79a3c7c4c4d1af", "text": "When was Machu Picchu built?", "metadata": {}}
+{"_id": "8439305d8e992d0382bfd844ec1112f8e6fa74a06c6675a3a5df8ec283768558", "text": "Who built Machu Picchu?", "metadata": {}}
+{"_id": "463d8b1341734939d23575b487076168f9d3189a3c8caf71d2102b3b260c2f35", "text": "What is the alternative name for the famous amphitheater located in Rome, Italy?", "metadata": {}}
+{"_id": "98968d1bf294786eb1b16249ae84e9c5bd61f8b2d198ffaece756c040c6fb920", "text": "In which city is the largest amphitheater ever built located?", "metadata": {}}
+{"_id": "cef3453c377c4212cfe324f88c03d10a0f9adf39ef5414c9010e3fee616d2e16", "text": "What is the Colosseum built of?", "metadata": {}}
+{"_id": "b054567594dc05b28e7a9285874a42b34d4975b3faa11831576223fa7acdeda8", "text": "Why is the Colosseum considered one of the greatest works of Roman architecture and engineering?", "metadata": {}}
+{"_id": "8fa174b629b8c6e50f3a76d8fdfcc7902f575ffa12624be813d5bfbd9f767831", "text": "What type of structure is the Colosseum?", "metadata": {}}
+{"_id": "100507878129d6b20eaa8150a72bce36d4a60d08d5fc1aedf9d59054bce20da6", "text": "What is another name for the historical city known for its rock-cut architecture?", "metadata": {}}
+{"_id": "903e62c8d08b3f58252ccda641d90ccb80bea056c56489bf8f0bb8e800fd13d0", "text": "What is Petra famous for, in terms of its physical structures?", "metadata": {}}
+{"_id": "abd53764de3136706f539beb77d0d39c69e6e1617affd6822d1d61e5cb7ccc69", "text": "In what country is the archaeological site of Petra located?", "metadata": {}}
+{"_id": "482d31f3d23498a496719d5d461f86407a93ed344d922d045afb327dee0e2800", "text": "Which factor gives Petra its other name, the Rose City?", "metadata": {}}
+{"_id": "bc29052d954a0821ab6221dae7cdadb471d2f02f5672aca529c5e077e4620857", "text": "What is the primary reason for Petra's historical significance?", "metadata": {}}
+{"_id": "1f2cac557fd69325838a58f0402d7668092c444a2c5afef49e169db144668739", "text": "What civilization is responsible for building the Chichen Itza?", "metadata": {}}
+{"_id": "32fe21d4640076008527e1318c3760e30317ba2d6f5196a00a738c7ddaaf637e", "text": "In which Mexican state are the ruins of Chichen Itza located?", "metadata": {}}
+{"_id": "4d951731a1df580f54afbb4ec2f9095c0e4133d490b061630e8c82a6baa457f2", "text": "Is Chichen Itza a modern city?", "metadata": {}}
+{"_id": "63697d4c54f7c5c6e2aa9eefbf561354f28cf842287550ce7e6b80a23b44c508", "text": "Where is the exact location of the Chichen Itza ruins?", "metadata": {}}
+{"_id": "383641b56f501cf1b4bca7512dc78e9df4b6208c776bb64cd910acaaa05a03fb", "text": "Who currently maintains the Chichen Itza ruins?", "metadata": {}}
+{"_id": "4ab9b1148552aa354a95712e3bb78fa7625151e9727c22a9716eee55c8cd1ca5", "text": "What is the significance of the Pyramids of Giza in the context of the Seven Wonders of the Ancient World?", "metadata": {}}
+{"_id": "8a0f32e2bdda9140025f64fb77d82b8ed2c67b971cb4c44432ca1847a3e33ebc", "text": "Where are the Pyramids of Giza located?", "metadata": {}}
+{"_id": "f3bf2d4d988d35bcd34440f4df27d81e705aa1e4250a44e1573f2426e4c672c9", "text": "What is the purpose of the Pyramids of Giza?", "metadata": {}}
+{"_id": "5f87ea96710eeadd6b9fce4b8fa0a31c4e4ad0c4deefa1d4f03c003b812ca35c", "text": "How many of the Seven Wonders of the Ancient World are still standing?", "metadata": {}}
+{"_id": "4006d3300b1e5e882bde97c7a38cd57f11e3bea840c5d02c5106b2877eada5af", "text": "What is the historical significance of the Pyramids of Giza?", "metadata": {}}
+{"_id": "0492bed59f5f2f477031319746855c36d341367dd0981cd1def258d4f6f233b2", "text": "Who are some of the artists that contributed to the decoration of the Sistine Chapel?", "metadata": {}}
+{"_id": "e24a3dc0487600b38b9224ff5eab454a3434cab46be17557967c63bf1d71bc41", "text": "In which city is the Sistine Chapel located?", "metadata": {}}
+{"_id": "1bfc6624fca71593956762639a30f954c022e45161a59d7aa2c387451fb2628d", "text": "What is the significance of the Sistine Chapel?", "metadata": {}}
+{"_id": "1ca5ffac6744e0320a1d7d7d5ab6c106321a2a5ca950072faa330ea8849c5323", "text": "What is the residence of the Pope?", "metadata": {}}
+{"_id": "5c4bc852ee60333f3dbe85c4a973da86767e84f3a696aba84ce093ef7ae1db07", "text": "What is the style of the art in the Sistine Chapel?", "metadata": {}}
+{"_id": "f8744a94668898ea7a9af97d04c870d73f9718a569f5c4ba0a0b46387c132924", "text": "How long is the Grand Canyon in miles?", "metadata": {}}
+{"_id": "668a0071474dae0a9a07cc97322c8c43666c3d1dbda400527842d5313f85f81d", "text": "What has carved the Grand Canyon?", "metadata": {}}
+{"_id": "212976ea90e81e91a3eb1032dbad007ae21ab3e36d3a197f470191651d8554a1", "text": "How wide is the Grand Canyon at its widest point?", "metadata": {}}
+{"_id": "ab31a5f5da3eb9b30e4597c8f2eeded3f4403c1372b6f8a74f7fd14f7b4ecd61", "text": "How deep is the Grand Canyon?", "metadata": {}}
+{"_id": "e5dbb3daaf04133c91aee4ae5c23ad3eb1757a09bb476813ba1b8d744ec86853", "text": "In which state is the Grand Canyon located?", "metadata": {}}
+{"_id": "d6006420dc385e72a2e8eda25542df449150578a282fda6663269b021b66098c", "text": "In which well-known film did Christopher Lloyd play the role of a villainous toon?", "metadata": {}}
+{"_id": "5a73841ef94dfaf35f5ac1b3fb6ba819cff3adeba1f45c9e1e3bdb683858577c", "text": "What type of character did Christopher Lloyd voice in DuckTales the Movie?", "metadata": {}}
+{"_id": "8b861544082204ba43b1a68d3e1fc9d59ad248b728574706d74afbd40192f040", "text": "What role did Christopher Lloyd play in the Back to the Future trilogy?", "metadata": {}}
+{"_id": "c4b0c28cc818b06c14fd1135df72379ce5cf2631a490fcfa219935191ba08b84", "text": "In which two Addams Family movies did Christopher Lloyd play the role of Uncle Fester?", "metadata": {}}
+{"_id": "42a3f8e0d64feee3603b181022a106a97d71344720b923bc1f90af78964dd493", "text": "In which animated film did Christopher Lloyd voice a character based on Grigori Rasputin?", "metadata": {}}
+{"_id": "e2836ccb6afb375121364f988119e252fae23780ee137f145de6da21ab53dafb", "text": "What legislation was passed in 1970 to further combat cigarette advertising?", "metadata": {}}
+{"_id": "d1e3c12606ef7743f57751e47e8e404ff2763f290121e393758d71f4a3aa9ca7", "text": "What was the purpose of the Federal Communications Commission's requirement for anti-smoking advertisements on television?", "metadata": {}}
+{"_id": "1e5c00b6dc0b3efd1bcf172c4e310142ea3bdc44adf9855c7e6016c5e290fbdf", "text": "When were all cigarette billboard advertisements replaced with anti-smoking messages?", "metadata": {}}
+{"_id": "b753b1f7764f19fb207c491283bfeb26c47d3c55ec946428cf76772ae36506d2", "text": "What restrictions were placed on cigarette companies through the Family Smoking Prevention and Tobacco Control Act in 2010?", "metadata": {}}
+{"_id": "af1aae944e5f4ccf2a95101c2cdf2ac638a72bda7fe88872cd5955c56996c48e", "text": "What are the constitutionality of the Family Smoking Prevention and Tobacco Control Act and new graphic cigarette warning labels being questioned under?", "metadata": {}}
+{"_id": "c15597cc9a8f95e6e1353309e69590cc2d9ebcab1a999ce2b4e7e0a2493d38bf", "text": "Who was appointed as the brand ambassador for BBBP on August 26, 2016?", "metadata": {}}
+{"_id": "9ba6e043d95e03e67bfbf5099c7de9e6c4089679d3a72ae025c895e84b7ce45d", "text": "What is the significance of Sakshi Malik's appointment as brand ambassador for BBBP on August 26, 2016?", "metadata": {}}
+{"_id": "5e546550dbec16a3f4cbe2ac034e5a71cbe80dad360091456d0b344ecc2d3153", "text": "What is the full form of BBBP?", "metadata": {}}
+{"_id": "4dcbc0c9666769faf634cb3c5a8539f09d09774a702f70662b72ee84260c6a9a", "text": "When was Sakshi Malik appointed as the brand ambassador for BBBP?", "metadata": {}}
+{"_id": "9e97131fc58c473f7ac7d3e0ed0f25212e7e5d773d02641159eab064b46a209f", "text": "What is the role of Sakshi Malik as the brand ambassador for BBBP?", "metadata": {}}
+{"_id": "3512d9923d2f81059aa64d569c272fe33ced6f931c8c01fad46fd098a54c0e09", "text": "In which regions is the DVD boxed set of Prison Break available?", "metadata": {}}
+{"_id": "6d7af57205637b8fa8cb7665619f5d36af09f8f56b06515aceee0813187404f7", "text": "What streaming services offer Prison Break for online viewing?", "metadata": {}}
+{"_id": "7e111e169cfef0f5c99cbabb522d72df2ed55134641aed68bbe9198a017a3f67", "text": "What additional features are included in the Prison Break DVD boxed sets?", "metadata": {}}
+{"_id": "28c42b4f42ca39358042db96a01aec297fa886e2d9a36cc84870fda089a6870b", "text": "When did Fox begin streaming prior week's episodes of Prison Break online?", "metadata": {}}
+{"_id": "77ec725f54837f1d86a1dd2c52f433c2b83290aa4339593b7c775c131e6c5b06", "text": "Is access to online streaming of Prison Break restricted to the United States?", "metadata": {}}
+{"_id": "a95b549912c99a328763a45f0511b4344fc6549e8e90caf6bd9f5210e5f0661f", "text": "What were the types of atomic bombs produced by the Manhattan Project?", "metadata": {}}
+{"_id": "f2b36e091bb21ef5903d17a80534859061406e40af0de87db33e13ec2387b832", "text": "Which group of the USAAF dropped the atomic bombs on Hiroshima and Nagasaki?", "metadata": {}}
+{"_id": "bfd509121ad8a90e0533026c08d452e1bd56f83db167e8157f221f29c152ee10", "text": "When and where was the first atomic bomb dropped during World War II?", "metadata": {}}
+{"_id": "500924f95b3ae6011ae7cf060eb40e93b9d2d62944760e7dbd407d6231c67523", "text": "What was the immediate impact of the atomic bombings on Hiroshima and Nagasaki?", "metadata": {}}
+{"_id": "a4b087bdf981463fb0c9339414fbcb1af5a0962cb300a20bd58c945cc3e34dec", "text": "How many people died as a result of the atomic bombings in Hiroshima and Nagasaki?", "metadata": {}}
+{"_id": "62c6de75339d64b3c5b4d5f434f6b4e9d8b2625d88a39508d9aedfc3a1b39147", "text": "What political parties were significant during the First Party System in the United States?", "metadata": {}}
+{"_id": "a13a16a9168d8f18c6c8e636649797aa6067d1c870f6f7b57e9ea01ef7b7ec85", "text": "When did the modern Democratic Party form in the United States?", "metadata": {}}
+{"_id": "57d13fd5c2ced7da0b4a93b52e0dc238564dacf567b8016238f484dc628e3fbb", "text": "What were the two factions that the Democratic-Republican Party split into during the election of 1824?", "metadata": {}}
+{"_id": "4fb124be5d39851dc01336dde383ed082ca2a2e0e333cd92e857921c7022f2b3", "text": "Who was the first Republican president in the United States?", "metadata": {}}
+{"_id": "e79e0a2a9de22158daf2c5037fcbc4ab3ca71bcddcb4818980e3d7ae9989bfa4", "text": "What parties formed after the National Republicans collapsed in the United States?", "metadata": {}}
+{"_id": "9ce5f6e8874f507ee6c4df3a7b56ad620e3403b74b0e5bada827f5a79a2905b9", "text": "What is the order of rotational symmetry of an equilateral triangle?", "metadata": {}}
+{"_id": "555c5b1a2e9f6549d0fcdfb90b044d3760ac7f52e29ca6a23d071549b24cbaea", "text": "How many lines of reflection does an equilateral triangle have?", "metadata": {}}
+{"_id": "b7e4e3a74de3a4ebd3cc5274dad55b64c304425212d9069d3ccaa24d8f30fdb0", "text": "What is the symmetry group of an equilateral triangle?", "metadata": {}}
+{"_id": "9499207380debdffa8c33f2ce6fa1a8189ff578707f62dfd916dc7d1b283eaa1", "text": "What kind of symmetry does an equilateral triangle have?", "metadata": {}}
+{"_id": "90d4377aa058d6ef0124fbc61edde6ea9edb0efb7af5758322f9720e45434f12", "text": "What is the symmetry group of a regular triangle?", "metadata": {}}
+{"_id": "bb01d6a1a44ecc88fc81b47963eb585ce41d8cf3d9130b7af87c0fa16afc58fc", "text": "In what timeslot did The O.C. originally air during its final season?", "metadata": {}}
+{"_id": "d10bec3ed0b1de2d9e2d34d5f1b3f9a064f0c55ad94f614d2ea017386b9a170d", "text": "On what day and time did Fox test airing The O.C.'s second episode during its final season?", "metadata": {}}
+{"_id": "1b16d1d166643ba00ec3e7a52ae5dfd17d7fd266817e0559f1f8e7ec1d91eefc", "text": "How many episodes did The O.C.'s final season consist of?", "metadata": {}}
+{"_id": "12ddbc559c471313e736b6a136d2b40239200b97169979613aa5e13bd93e49a3", "text": "Against which shows did Fox air The O.C.'s second episode during its final season?", "metadata": {}}
+{"_id": "227372a9d23586b741e7c969719d3e46a3845438ba9c71bab6777b1623c95c6d", "text": "When did The O.C.'s final season air in the United States?", "metadata": {}}
+{"_id": "c78d6e1e3b459f5e484a61bfb174f482f5106aedabd41eab2321521ff98f5901", "text": "Who is the Pilot metaphorically representing in this poem?", "metadata": {}}
+{"_id": "e8eee05c41c2b11ca21e911a6efe892e4fb2423a888d31292a3db7d0ac4e0b5f", "text": "What does the \"crossing of bar\" symbolize in the poem?", "metadata": {}}
+{"_id": "5ddd182273afaad1041cce9ac0acd222b7e3f4e5f6d32a53dbc8dcb10006174e", "text": "Who does Tennyson hope to meet face to face according to the poem?", "metadata": {}}
+{"_id": "f77123640adc18f3c20d4545ca4efd137ff488d3c00e23edaeaf6047477d0454", "text": "Who is always guiding us according to Tennyson's explanation of the poem?", "metadata": {}}
+{"_id": "4934a02194e60bef81ae48e268589cb25a5a1d77a945fa8ab5db484f7901ed68", "text": "What is the role of the Pilot in the poem's metaphor?", "metadata": {}}
+{"_id": "a6cf02c4aac83a7ace43a93d3d65269666ec4d78a1e47f043b711f289e2b5a0a", "text": "What are some methods used in protectionist policies?", "metadata": {}}
+{"_id": "05114cb50594dadce7fffb81f84b82736cd1b9a945fbf1d0206ba4d6f9383b5f", "text": "Who do protectionist policies shield according to its proponents?", "metadata": {}}
+{"_id": "0ffed93e17e7077c9e7c31ed78cfbd9b49b3a015e81daccb239bc9f76f95131f", "text": "What are the negative effects of protectionist policies on consumers?", "metadata": {}}
+{"_id": "74d7b7cbea4794a880cbfbf9f0d212b7c3e3f9f00657bac9ce6f363ac6947560", "text": "Which sectors are harmed by protectionist policies?", "metadata": {}}
+{"_id": "e9973a6255565aee321cdc29ceea47a84aa5c392a49c017cad5c704f44d1ecc0", "text": "What is the general economic policy of protectionism?", "metadata": {}}
\ No newline at end of file
diff --git a/nemo/Evaluator/eval_dataset/similarity_metrics_data/inputs.jsonl b/nemo/Evaluator/eval_dataset/similarity_metrics_data/inputs.jsonl
new file mode 100644
index 000000000..89a79eb9f
--- /dev/null
+++ b/nemo/Evaluator/eval_dataset/similarity_metrics_data/inputs.jsonl
@@ -0,0 +1,128 @@
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: In order to be sued in a particular jurisdiction, say New York, a company must have a minimal business presence in the jurisdiction. What constitutes such a presence? Suppose the company engaged a New York-based Plaintiff, and its representatives signed the contract with the Plaintiff in New York City. Does this satisfy the minimum presence rule? Suppose, instead, the plaintiff and contract signing were in New Jersey, but the company hired a law firm with offices in New York City. Does this qualify? \nTITLE: ", "ideal_response": "What constitutes \"doing business in a jurisdiction?\"", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: I am currently watching the Depp/Heard trial, and there seems to be a lot of stuff that cannot be shown, or said, and a lot of people i would expect to make a statement are absent. My guess is that there are rulings in place that preclude certain witnesses from being called, certain evidence from being shown, and certain topics from being broached - but how, and why? And are those constraints known to the jury? For instance, there was a last minute witness that somehow offered herself up to testify, a very crucial character witness, that i would have thought the plaintiff would spare no expense in finding. There was a photo expert that was somehow prevented from talking about colors, there is a lot of confusion about the existence or non-existence of a wall mounted phone, but both parties have refrained from showing images of the wall that the phone should have hung on - although the existence of such photos is a near certainty, etc. So my questions: How are such no-show-no-tell boundaries established, does the jury get to know them, and is this also in the public record somehow? \nTITLE: ", "ideal_response": "What are the constraints on a civil suit (witnesses, evidence, topics) - and are they public", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: For example in medicine, you have requirements in many states for what constitutes a valid patient / client relationship and that this requires some in-person visits etc. does something exist like this in the legal profession? \nTITLE: ", "ideal_response": "Are there any legal restrictions to practicing law over video conference like there is with medicine?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: Suppose I am not a professional lawyer but I have standing and want to bring a case to court pro se. To make this widely applicable and more readily answerable: Suppose I want to file a 42 USC 1983 complaint in U.S. federal court for some violation of my fourth-amendment rights by a municipal police officer. I believe the first thing I would want to do is find and read similar cases that have been adjudicated on that law using similar claims in my district. In fact, if I'm reasonably literate I assume I could construct all necessary filings and arguments using examples from prior cases, in addition to getting a sense of my odds of success in court. But a person who is not a professional lawyer typically lacks ready access to Lexis, WestLaw, and other such professional resources for researching recent and applicable case law. (Let us assume that the Bar does not exist in principle as a barrier to citizens seeking redress of grievances through the judicial system. I.e., \"That's what lawyers are for, so pay up if you want justice\" may be the practical answer. But I want to know how practical it is for one to seek justice pro se.) How can a pro se litigant to find applicable case law and bootstrap his way through the judicial process? \nTITLE: ", "ideal_response": "What resources are available to a pro se litigant?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: Yesterday I had a discussion with my colleague. This year I am traveling rather large distances for work. My colleague asked me how I can bear the costs of travel. I told her it is ok, because the government allows me to deduct a large part of my travel costs from my taxes. My colleague warned me that she has heard that the government will revoke these rules, making it no longer possible to deduct as much of my travel costs from my taxes. Her argument was that since my taxes for 2019 will be done in april of 2020, the government could, at any point before that date, change the rules so that I may no longer be able to deduct my travel expenses. I argued that rules could only apply to travels/work that occurred after said rules were implemented, and could not apply retroactively. Norway has an \"ex post facto\" clause in its constitution against retroactive laws. Is my reasoning correct? \nTITLE: ", "ideal_response": "Would \"ex post facto\" apply to tax deduction laws?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: It's Halloween, and some municipalities are creating curfew laws intended to prevent people under 18 from being in public during the late night hours, for example South Brunswick, NJ. It seems like this apparently common sort of ordinance could be considered unconstitutional on multiple grounds, in that it is restricting free movement as well as targeting only a specific demographic. Here is the actual text of one such ordinance. So here are my questions: Are this and similar ordinances constitutionally valid? What defenses could a government make if challenged? Further, based on precedent, would they likely be successful? \nTITLE: ", "ideal_response": "What is the justification for curfew laws?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: If a tenant buys the house they were renting, does the landlord have to refund the security deposit to them? Alice is renting a house from Bob on a six-month lease. She then enters an agreement to purchase the house. At closing is Bob responsible to pay back her security deposit? I am in Oregon and could not find the answer easily. I would assume this is a basic, common-sense \"yes\", but I'd like to know the statutory basis, if possible. Thanks. \nTITLE: ", "ideal_response": "If a tenant buys the house they were renting, does the landlord have to refund the security deposit to them?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: The companies which operate coworking facilities are in a position to learn a lot about a person's business. Every facility I've seen is blanketed with cameras, potentially including microphones, the internet connection is provided and monitored by them, and their staff greets every visitor for a meeting. I've reviewed the contracts from a few of these companies and was surprised to discover they do not explicitly state they will limit their use of information they learn about their customers' businesses merely to the realm of providing the service to the customers. One would expect at least a simple and clear statement that customer trade secrets and intellectual property will not be used by nor shared by the coworking company. But no such statement is provided. Is there something I am not understanding here? For example, is there overarching legal precedent (which wouldn't be included in the contract) that protects a small business if they use one of these facilities? To be clear, I am not talking about protections from other users of the coworking facility. I am talking about protection from the owner of the coworking facility itself and its employees. \nTITLE: ", "ideal_response": "Are coworking companies bound by legal precedent not to use the intellectual property nor trade secrets of its customers?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: In India, religious figures have been given the status of legal person, and from the case law, I've read there doesn't seem to be a clear rationale for this. Corporations and organizations are also given the status of a legal person. Does this mean that if a significant amount of people associate with something it can gain the status of a legal person? By this logic, can say fictional characters get the status of a legal person as well? and would it effect copyright law in any way ? \nTITLE: ", "ideal_response": "Why are religious figures given the status of \"legal person\" in India?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: The Armenian assassin of the ex-Ottoman Grand Vizier Talat Pasha, Soghomon Tehlirian was acquitted of murder in his trial. His defence was that he was killing Talat Pasha in retaliation for his crimes in orchestrating the Ottoman Armenian Genocide. Tehlirian was tried for murder, but was eventually acquitted by the twelve-man jury. His trial was a rather sensationalized event at the time, taking place shortly after the establishment of the Weimar Republic, with Tehlirian being represented by three German defense attorneys, including Dr. Theodor Niemeyer, professor of law at Kiel University. Priest and Armenian Genocide survivor Grigoris Balakian, German activist Johannes Lepsius, and German commander of the Ottoman armed forces during the war General Liman von Sanders were among several of the prominent individuals called as witnesses to the trial. The trial examined not only Tehlirian\u2019s actions but also Tehlirian's conviction that Tal\u00e2t was the main author of the Armenian deportation and mass killings. The defense attorneys made no attempt to deny the fact that Tehlirian had killed a man, and instead focused on the influence of the Armenian Genocide on Tehlirian's mental state. Tehlirian claimed during the trial that he had been present in Erzincan in 1915 and had been deported along with his family and personally witnessed their murder. When asked by the judge if he felt any sort of guilt, Tehlirian remarked, \"I do not consider myself guilty because my conscience is clear\u2026I have killed a man. But I am not a murderer. However, murder in retaliation for crimes the victim committed is often not considered a legal defence to the act of murder. For example, this 2013 Baltimore case resulted in the sentencing of the killer to 30 years in jail. Similarly, the Lillehammer affair involves Israeli Mossad agents sent to assassinate a (mistakenly identified) mastermind of the Munich attacks on Israelis. The agents were arrested and convicted by Norwegian authorities. A final example was the case of Vitaly Kaloyev, who stabbed to death the air traffic controller responsible for the \u00dcberlingen mid-air collision in which his family was killed. He was also convicted for murder in that case and sentenced to prison. What differences in law in that time in Germany led to the acquittal of Soghomon Tehlirian for murder? \nTITLE: ", "ideal_response": "Why was the Armenian assassin of Talat Pasha acquitted for murder?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: Before 1964 the New Hampshire Constitution had two articles (12 and 13) in its Bill of Rights permitting citizens to pay money in lieu of compulsory military service. In World War 2 and the Korean War did the men of New Hampshire have the ability to pay to avoid being drafted? If not, what the legal pretext used to void this right? \nTITLE: ", "ideal_response": "Did citizens of New Hampshire have the ability to buy their way out of military service in WW2?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: I live in Germany and have a yearly contract for the Adobe Creative Cloud. This contract can only be canceled 30 days before the one year period ends. They say, that a termination notice prior to those 30 days cannot be noted in their system. Are companies allowed to restrict the period in which I can submit a termination request to cancel a contract? \nTITLE: ", "ideal_response": "Can a company restrict the period in which a contract termination can be submitted?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: Article II, Section 2, of the US Constitution (emphasis added) states: [The President] by and with the advice and consent of the Senate, shall appoint ambassadors, other public ministers and consuls, judges of the Supreme Court, and all other officers of the United States, whose appointments are not herein otherwise provided for, and which shall be established by law[.] Suppose the Senate were to pass a resolution advising the President to appoint a strict \"originalist\" who has been a long-time member and avid supporter of the Republican Party. To what extent must the President heed this advice? Taken further, does this mean the Senate can specifically identify who an appointee shall be? \nTITLE: ", "ideal_response": "How much does the President have to take the Senate's appointment \"advice?\"", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: This comes from an English Language Lerner's Stack Exchange question, the current answers to which are lacking in historical/legal analysis of the term. What is the origin of the legal phrase \"grand theft auto\"? In particular, is there any legal or historical rationale for the ordering of the terms? (E.g. why isn't it \"grand theft of auto\" or \"auto grand theft\" or even \"grand auto theft\".) There are some claims that it's simply an elision of a comma (\"grand theft, auto\") implying that it's a simplification of something like an entry in an (alphabetized) list of crimes. Others have indicated that there might be some Law French or Latin influence in putting the descriptor last (along the lines of \"courts martial\"). Are either of these accurate with the historical origin of the term, or is there some other reason? \nTITLE: ", "ideal_response": "What is the origin and history of the term \"grand theft auto\"?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: As we await the Rittenhouse verdict it is clear that this is a polarizing case and the governor has mobilized 500 National Guardsmen in anticipation of violence. The verdict is expected to attract polarized crowds that officials perceive as having a proclivity for violence / destruction. Pursuant to preventing violence: What tools are available to the Wisconson governor, besides having Guardsmen to strictly enforce a curfew? Ideally, tools would not infringe on any rights of assembly. \nTITLE: ", "ideal_response": "Anticipation of Violent Gatherings", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: HIPAA seems comparable but only applies to medical data, but is there anything more general than that? How does privacy act compare to data subject access rights under gdpr? Is it that it only applies to government entities but not private sector data controllers? \nTITLE: ", "ideal_response": "What is the most comparable American law to GDPR right to subject access? Privacy Act?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: Can a contractor refuse to pay a subcontractor until the contractor receives his payment from the company/client? Does it matter if it's in a written contract? Can the buck be passed like that? \"Payment is contingent on when Client pay Contractor. In the case where a Client uses net-30 payment schedules, the Contractor can expect to be paid up to 30 days after the first day of the month after Subcontractor submitted invoice.\" They slipped this one by me. Hypothetically if the Client doesn't pay Contractor does that mean I wouldn't get paid and there would be nothing I could do about it? Is there even a deadline? \nTITLE: ", "ideal_response": "Can a contractor refuse to pay a subcontractor if he doesn't get paid?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: The internet allows for some level of anonymity and occasionally you will find posts that give detailed descriptions of how to perform an activity that is illegal. Examples may include how to covertly grow marijuana, or swindle money. If I live in the US and make posts or provide answers to questions that explain how to perform an activity that is illegal in the US, can I be prosecuted? This question is about sharing the \"how to\" information only, not actually performing any illegal activity. \nTITLE: ", "ideal_response": "If I live in the US can I be prosecuted for answers describing illegal activities?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: I'm an engineer and I'm studying for the law and ethics exam to become a registered engineer in Canada. I'm studying a chapter on a book about Bonds where I see: A bond is a special form of contract, whereby one party, the surety, guarantees the performance by another party, the principal, of certain obligations. The party to whom the obligations are owed is called the obligee. Reading this link I find: Insurance: When a claim is paid the insurance company usually doesn\u2019t expect to be repaid by the insured. Surety Bond: A surety bond is a form of credit, so the principal is responsible to pay any claims. Typically, the surety requires the principal and its major shareholders to indemnify the surety against any loss. In fact, the indemnity given by the principal is one of the things that distinguishes a bond from an insurance contract My doubt is: If as a contractor(principal) you will be required to repay to the surety for any losses that the surety had to pay to the owner(obligee). What are the advantages of a bond over liability insurance where the insurance company pays and the contractor doesn't have to repay anything else that the usual monthly premiums? Why creating the figure of the surety if the contractor will end up having to pay the surety to compensate for what the surety paid to the obligee anyway? Why not using liability insurance for this? I don't understand very well the notion of face value of a bond which is \"the maximum potential liability of the surety\". Does the principal have to pay up to this to the surety ? \nTITLE: ", "ideal_response": "Performance bonds vs insurance", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: I\u2019m not well versed in economics or law by any stretch of the imagination. But I am confused with certain measures taken thus far by the government. There is currently a temporary shutdown of vast sectors of the economy now and workers affected are not receiving any income. To me, logic would dictate that here should also be a temporary cessation of payments for rental properties (for small businesses as well as individuals) and, to be fair, a temporary cessation of payments for loans for owners of those properties. In turn, banks would be given temporary cessation payments for interest to its customers since the banks would not receive any payments on loans including payments on mortgages from homeowners. That would essentially close the loop on money flow which again would be temporary and the impact to the greater economy would be limited. If undue burden is placed on any one of these points in the cycle, and right now, workers and small businesses are taking that brunt, then there could be long-standing damage to the economy from bankruptcies, foreclosures, etc. The Fed is already addressing the banking side of things by keeping rates close to 0. Does the government have the authority to push for a standstill in mortgage/interest/rent payments? I feel this temporary measure would alleviate economic hardship and prevent long-term damage to the economy. \nTITLE: ", "ideal_response": "Could government legally temporatiliy halt rent/mortgage/interest payments for the benefit of the greater economy?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: I just came across some US tax forms for foreign entities like W-8BEN-E and 1040NR which made me ask this question. Suppose a company called Cool Stuff Ltd. runs some sort of online business serving customers all over the world. This could be paid content subscriptions, dating, posting ads/classifieds, hosting, VPN \u2014 all sorts of stuff that is performed by software accessible by customers over the Internet. The company is not present in the US in any way: no incorporation, no offices, no employees, no web hosting, the shareholders and directors are not related to the US at all. People in the US can access the company's website/app and buy services in the company's home country currency. The company may not even care or take note of where they are from. Technically, if any single John Doe from the US decides to buy a subscription, he will make the company \"receive\" \"income\" \"from U.S. sources that consists of\" \"Compensation for, or in expectation of, services performed\" (W-8BEN-E): Does that really trigger the company's obligation to fill US tax forms and pay tax in the US? If so, is there any threshold i.e. minimum income that triggers the obligation? \nTITLE: ", "ideal_response": "Do non-US companies have to pay US tax just because they have US customers?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: I am relying on the ratings on Avvo to determine who I should choose. Is it reliable? \nTITLE: ", "ideal_response": "Are the attorney ratings on the website avvo.com reliable?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: I'm developing an application for mobile/web/desktop that will (in very watered down basic terms) store free text strings they've self-contributed and access them freely. As it stands, these applications can run without processing data and storing everything locally on the device. However, if I want to save the users data to a server in order for them to load the same information on multiple devices, I will be storing and processing data. The reason I'm hitting a brick wall in trying to track down the legal issues involved here are twofold: If the developer is based in the EU and the server is based in the US, which jurisdiction does this fall under? If I'm storing just an e-mail address and unique identifier key in the database for login, would this in itself, be identifiable information? Would a user storing the text string \"Mop floor tonight\" be considered personal data? I ask because as this is free text, I'm aware a user could essentially type \"My name is Dave Smith and I live at 742 Evergreen Terrace\" and suddenly that's very personal data. I'm just trying to get my head around the risks/efforts involved in having a simple \"sync to all devices\" command that can be manageable by a 1 person developer team. Are there perhaps exemptions for small businesses whose IP is the code itself and not the data (i.e., I'm just collecting a username/email for login, no other data, not writing to the e-mail or using that data to generate revenue etc), and the self-supplied data is essentially just there to enable a feature? Or is this a legal framework I'll have no choice but to consider globally for all users? \nTITLE: ", "ideal_response": "What is the law regarding web/mobile applications and \"personal\" data?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: Applications for search warrants are typically ex parte. In such a proceeding the person who owns the property to be searched and/or seized cannot challenge the unconstitutional laws that form the basis of an affidavit justifying that probable cause exists for a search. Does there exist a basis to challenge the admissibility of evidence obtained following this search? Particularly if such a search is used as pretext to find items that violate laws that are not unconstitutional. Would agents that engage in such a search be able to assert qualified immunity in a 1983 proceeding even though the statute was blatantly unconstitutional? \nTITLE: ", "ideal_response": "If a law is blatantly unconstitutional, is a search warrant based on that law valid?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: If an external company's logo includes a registered symbol, are you obligated to reproduce it (in a link to their website, for example) along with this symbol? What if the usage guidelines dictate the use of (R) -- can you omit it? \nTITLE: ", "ideal_response": "Are you legally obliged to show the registered symbol with a logo?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: If a person invented something new but didn't file a patent: Is it legal for other people to file a patent for it? Is it possible for other people to be granted a patent for it if they are not the original inventor? If other people file a patent and claim they invented it independently at about the same time, what evidence do they need to show? If the original inventor has clear evidence to prove his/her originality, is there a way to stop other people from patenting this invention? \nTITLE: ", "ideal_response": "Is it possible for one's own invention to be patented by others?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: As per the title, I am (as a non-lawyer) helping someone in a dispute where fraud has occurred. If I'm helping the person write a letter to one of the involved parties based on my advice, if I help her produce a letter which says she has \"taken advice\" (and I am the one advising her), would this be construed as a claim she has received advice specifically from a lawyer? Put another way, if someone claims to have \"taken advice\", and the advice they received was from a non-lawyer, is the claim misleading? \nTITLE: ", "ideal_response": "Do the words \"take advice\" / \"taken advice\" imply the advisor is a lawyer?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: I would like to request a copy of my contract from a previous employer, where most of the staff is juridically uneducated people. I'm afraid that if I introduce myself, HR would realize that I am a former employee and would decline to send the copy, stating that I should have my own version (which I have lost). There might be a legal debate, depending whether I signed a certain page where it says \"overtime hours are not paid\" or not. I know I can hire a lawyer who can demand the contract copy, but then if I really signed the page (it was a long time ago, I don't remember), the job of the lawyer is done and I still pay for their service. To save money I have considered this trick: Register a new email address with something like Person@FloridaLawGroup.com Call the HR office and impersonate an abstract lawyer who is defending an interest of the former employee. Request a copy of the contract be sent to the aforementioned email address. Is any part of this plan illegal? \nTITLE: ", "ideal_response": "Is it legal to impersonate an abstract lawyer?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: The Supremacy Clause of the U.S. Constitution states that the Constitution, laws and treaties are the supreme law of the land. Obviously, the Constitution supersedes both laws and treaties, but which is given more importance, laws or treaties? For example, if the Congress passed a law in violation of a treaty, would that law be valid? If the Senate ratified a treaty in violation of federal law, would that treaty be valid? \nTITLE: ", "ideal_response": "Which supersedes the other, laws or treaties?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: Is it a GPLv3 violation for a project to charge a fee for downloading binaries? Concretely I am wondering about this case: http://moodeaudio.org/ The project is clearly licenced under GPLv3 as stated on the website and uses open source software in it. However it is not distributed freely. A fee of 10$ is charged for downloading. Furthermore I can not find any links to the source code anywhere on the website. \nTITLE: ", "ideal_response": "Is it a GPLv3 violation for a project to charge a fee for downloading binaries?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: I have a judgment against a debtor with only their name and date of birth. How do I obtain their credit report? Just write to Experian with a court order? First question- is it legal to do this. Second question- do I need their ssn, and if so, how can I legally obtain it. \nTITLE: ", "ideal_response": "Pulling debtor credit report without SSN", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: In the US, it is typical for patients to sign contracts with a medical office assigning whatever benefits are applicable from their health insurance to the doctor's office and leaving the patient financially responsible for what is left unpaid. Is there a minimum standard for doctor's offices to actually complete the insurance process before passing on the rest of the bill, and if so, what kind of remedy is available if they don't? Example: A doctor's office sends to the insurance company details of a procedure to be paid, but left off one fact that the insurance company needed to complete the claim. The insurance company informs the doctor's office that in order to finish the claim, they have to complete the missing bit of paperwork. Instead of doing so, the doctor's office leaves benefits the client is entitled to unclaimed and proceeds to bill the patient for the unpaid amount. Is this a valid bill under that standard kind of contract? Is there some kind of minimum due diligence required on the doctor's part, and if so, is there some area of tort law that is available for redress? \nTITLE: ", "ideal_response": "What is the duty of a medical office to properly bill one's insurance? (USA)", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: 8 USC \u00a71324b prohibits employers from discriminating against employees or prospective employees on the basis of immigration status, if the individual in question is a citizen or national of the United States, a lawful permanent resident (with a proviso that I won't discuss here), or an asylee or refugee. I'm wondering whether the existence of 8 USC \u00a71324b rules out the possibility of any stronger state law. For example, suppose a state had a law that says employers can't discriminate against people with Temporary Protected Status (TPS). This doesn't directly conflict with federal law, since TPS people have an immigration status that allows them to work for any employer. However, I am not sure whether the concept of field preemption would apply here. \nTITLE: ", "ideal_response": "Can states and localities have stronger immigration status-based employment discrimination laws than the federal law?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: Does NJ law prohibit alcohol consumption in public parks? \nTITLE: ", "ideal_response": "Does NJ law prohibit alcohol consumption in public parks?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: My friends and I want to start selling merchandise of pictures of two celebrities. The pictures are of two celebrities. We want to make sure that we won't have a lawsuit on our hands if we start selling. The pictures we want to use are available and posted publicly by them. The pictures are not copyrighted. Is this legal? \nTITLE: ", "ideal_response": "Can I legally sell merchandise with the face of a celebrity without their permission?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: Kyle Rittenhouse has been found not guilty of Wisconsin state criminal charges. Is there any federal law that he violated? \nTITLE: ", "ideal_response": "Is Kyle Rittenhouse \"out of the woods\" or could he face Federal charges?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: I would like to know if it is legal to download the content of a website and display it in my own? There is one way to include a website inside another: <iframe> (w3schools description) However, some websites decide that they don't want to get iframed. For example Google doesn't allow other websites to frame their content. This has of course certain reasons. There is a very simple option, the X-Frame-Option that can tell a browser that it must not render a page withing a frame if it is not from the same origin as the page itself. So far so good. Just for the sake of completeness I want to say that it's still possible to display just the content of another website. The most trivial one would be to hit CTRL + S on my keyboard, save the website and host it on another server. A more sophisticated of course would be to write a program that does that for me. One could simply download the HTML page and embed the result into another. I can imagine that e.g. copyright could be an issue. But what if my site states that it does not own the content etc. and that it did not modify anything and so on. Is there a way that this could be legal? Example What is with this example? https://proxy-nl.hide.me allows you to enter any URL and e.g. specify that it should remove JavaScript code. This allows you to see even www.theindependent.com, which tries to take counter measures against embedding their site, under another domain. \nTITLE: ", "ideal_response": "Is it legal to download a website and display it on another that I own?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: This question comes out of reading this New York Times article (quoted below), about a bill that would allow officers to use a \"textalyzer\" to digitally search phones for evidence that the phones had been handled recently: The technology could determine whether a driver had used the phone to text, email or do anything else that is forbidden under New York\u2019s hands-free driving laws, which prohibit drivers from holding phones to their ear. Failure to hand over a phone could lead to the suspension of a driver\u2019s license, similar to the consequences for refusing a Breathalyzer. The intention is behavior change: \u201cWe need something on the books where people\u2019s behavior can change,\u201d said [F\u00e9lix W. Ortiz, a Democratic assemblyman who was a sponsor of the bipartisan Textalyzer bill], who pushed for the state\u2019s 2001 ban on hand-held devices by drivers. If the Textalyzer bill becomes law, he said, \u201cpeople are going to be more afraid to put their hands on the cellphone.\u201d The US Supreme Court has unanimously ruled that when somebody is arrested, their phone can't be searched without a warrant. But the bill\u2019s authors say they have based the Textalyzer concept on the same \u201cimplied consent\u201d legal theory that allows the police to use the Breathalyzer: When drivers obtain a license, they are consenting in advance to a Breathalyzer, or else they will risk the suspension of their license. (Emphasis added.) The analogy to the Breathalyzer is based on comparability of dangers, too: Deborah Hersman, the president of the nonprofit National Safety Council and a former chairwoman of the National Transportation Safety Board, ... said the Textalyzer-Breathalyzer comparison was apt because looking at and using a phone can be as dangerous as driving drunk. \u201cWhy are we making a distinction between a substance you consume and one that consumes you?\u201d How far does this \"implied consent\" legal theory go? Can it be extended to mean that drivers' license applicants give \"implied consent\" to have their cars, homes, or body cavities searched in detail, or give \"implied consent\" to physical control by police (to enforce any failures to comply with an instruction), etc., if police request that power and find a suitably sympathetic audience in the General Assembly? It's an old argument that increasing police search powers increases safety, because it allows the police to search out and stamp out whatever behavior creates the danger to others. Where's the line, if any, which limits the extent of how far that can go? What do people really consent to when applying for a driver's license? Can \"implied consent\" be retroactive, such that if I have a New York driver's license today, that means I've given \"implied consent\" to whatever permissions the General Assembly grants police a couple years after I've gotten that license? If not, does a license renewal establish that \"consent?\" There is some precedent in that having gotten a NY license a few years ago does indicate consent to be bound by traffic laws including updates and changes that occur during the period of license validity. Is \"implied consent\" to searches one of the topics covered by that? This is tagged for New York, but as the article notes: If it were to pass in New York,... it could well spread in the same way that the hands-free rules did after New York adopted them. So feel free to answer for other jurisdictions, if they are a better fit for your expertise, and note that in your answer. \nTITLE: ", "ideal_response": "What are the limits of \"implied consent\" when obtaining a driver's license?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: I need advice on how I can get a refund on a defective ~$186 Sam's Club prescription. They were not only ineffective, but I experienced withdrawal symptoms taking those. What legal recourse do I have? Obviously, the attorney's fee will cost more than the Rx. Here are the details... Cutting the tablets in half, it was obvious that the previous tablets --that were effective-- looked different from the last Rx that was not. The tablets that were effective had a layer of dark gray on one side; the \"dud\" batch did not, but instead had an all-white interior. As soon as I took the replacement non-time-release alternative Rx prescribed by my physician, I experienced relief from the withdrawal symptoms as well as my medical condition. The M.D. told me that I needed to contact the pharmacy. Well, both the manager of the Sam's Club pharmacy in Springfield, MO and the lady I spoke with at the drug manufacturer have flatly stated (over the phone) that they will neither accept a return (or even samples for testing) nor refund the price we paid for these dud, ineffective tablets. The lady at the manufacturer told me that they would file a manufacturing defect report to the FDA. That was all. The pharmacy manager claimed that the tablets for the previous, effective Rx was from the same batch as the last, \"dud\" Rx refill. What to do? I've already contacted a local TV news station that has excellent consumer-advocacy spots (On Your Side Contact KY3). However, it now seems clear to me that she is not going to televise this. \nTITLE: ", "ideal_response": "Received defective Rx. Pharmacy and manufacturer refuses to refund me", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: I was 2 months late to renewing my license plate this year. I was given a fine (which was more than fair given my mistake), paid it without contesting, and then renewed my plates (with a late fee). The payments for my plate sticker went through and I was able to confirm that my plates were listed in the Secretary of State database as \"In Good Standing\". A week later I was still waiting for my sticker to come in the mail but parked my car on a city street. A Chicago officer issued another expired plates violation for my vehicle while I was away and didn't have an opportunity to explain that my plates were now in good standing for atleast the past week and I was merely waiting for the sticker in the mail. I immediately contested this ticket online, providing evidence of my receipt from the Secretary of State's office that I had renewed well before this ticket was issued (and worded it as formally and politely as I could have). I thought when my situation was reviewed it would be rather open and shut, and the fine would be dropped; however today I received a letter in the mail that a local judge had reviewed my case and determined the violation was still in affect. It then stated I would need to pay additional money in order to open a civil lawsuit to sue the city of Chicago if I wanted to appeal further; otherwise, I would need to pay my fine. It is my understanding they are probably within their right to press forward on charging me (I believe state law is that even if your plates are in good standing you can be fined for not displaying your sticker within 30 days of the original expiration date); however, I felt my situation would have been more understandable and that I had done my best to renew my plates prior to when this new violation was issued. In short, I have a 60 dollar fine (not for having expired plates but for not receiving my sticker quickly enough to display). Please let me know if a circuit court in Chicago would be even be willing to see things from my perspective or if I should just hand over more money. Lastly, If I were to file a civil lawsuit how much would I expect to pay to even try appealing this? Would I effectively be losing more money than if I just paid this fine? I tried to get this answered by calling the circuit court's financial department yesterday but they just transferred me to different branches for over an hour. \nTITLE: ", "ideal_response": "Do I have grounds to file a civil lawsuit against the City of Chicago for my Expired Plates Violation? Is it worth it?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: For example, what medium are they recorded/stored in? Is it a hybrid/mix as technology has evolved? And, does a court automatically receive copies of all other regional courts of the same and higher degree of superiority's decisions periodically? Or would a Willesden County Court archive only have case decisions that were issued in that facility / court division? And how far back would they go? \nTITLE: ", "ideal_response": "How are courts' official case law archives distributed and stored?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: In the UK, if someone makes a statement that \"he is a cripple,\" or \"she has problems holding her water at night,\" those are potential causes of action if they cause distress, even if the statement is true. (Truth is not an absolute defense.) In the U.S. if I break your antique vase, I could be liable for at least the market value of the vase. But if it has sentimental value because you inherited it from your grandmother, I could be liable for even more, based on pain and suffering. But I couldn't be sued for libelling your dead grandmother. How do these rules differ? Is it that in the UK, there could be \"pain and suffering\" independently of another tort, while in the U.S. it could take place only in connection with some other offense (e.g. the broken vase)? \nTITLE: ", "ideal_response": "How does the law on \"pain and suffering\" differ in the UK and US?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: According to some information, a person will need to be a resident in an EU country in order to be protected by the GDPR regulation. Are there jurisdictions in North or South America where residency similarly conveys GDPR protection for EU citizens? \nTITLE: ", "ideal_response": "Which EU-related jurisdictions in Americas apply the GDPR regulation?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: I often receive emails asking me to do something in exchange of potentially getting some prize (e.g. Amazon Gift Card). Example 1: A great bonus is that if you download Riffiti in the next 48 hours from receiving this email, you could win a $100 Amazon Gift Card. Example 2: Attend the session this Thursday 3pm-5pm in 32-124. Refreshments will be served. You can also win a couple giftcards by responding via their google doc! Example 3: The study involves completing a 20-45 minute online questionnaire. You will have the opportunity to enter a lottery to win a $250 Amazon.com gift card. Example 4: Teams that come up with the best plan will be entered into a lottery to win an extra $20 Amazon gift card. Example 5: Bring Your Own Mug and Win a $10 Gift Card from Flour - Everyone who brings their own mug and helps to reduce waste can enter a raffle for a $10 flour gift card. Example 6: If you fill this form, we'll enter you in raffle for several $50 Amazon gift cards. Are senders obliged to actually gave the announced prize to someone? Or can they simply have no prize at all? Assumes that everything happens in the United States, and more specifically in California or in Massachusetts. \nTITLE: ", "ideal_response": "Implicit lottery: what if there is no prize?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: If I am out somewhere, and say my door is open and there are drugs in plain view, if a cop was in the house could I be arrested even though I'm not there? \nTITLE: ", "ideal_response": "Can I be arrested if I'm not home and there are drugs in my room?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: Suppose some sentient aliens figure out how to receive our radio and TV broadcasts. They eventually decode the English language, and a few of them study it enough to be fluent. They land somewhere in the United States, perfectly capable of conversing with humans. Are they legal persons? Since they obviously aren't citizens, are they illegal \"aliens\" (in the legal sense of the word \"alien\")? If a person stole from one of the aliens, could the alien sue? \nTITLE: ", "ideal_response": "Would sentient aliens be legal persons?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: Let's say you use encryption on your computer, and you've been arrested. The police believe there is evidence on your computer, and have seized the computer via a warrant. The police cannot extract any information because your hard drive is encrypted. The police were unable to find the password via other means (such as written notes). You are aware that there is incriminating evidence on your hard drive, so you do not give up the password willingly. Can the police get a court order or otherwise legally compel you to provide the password to decrypt your hard drive? (I imagine this would implicate your right to refuse to answer questions that could incriminate yourself.) \nTITLE: ", "ideal_response": "Can you be legally compelled to disclose your password in a criminal investigation?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: Hypothetical Question: I am using an adult app (18+) for roleplaying. A girl from South Africa joined. She told her age is 14. I asked \"wanna do roleplay?\" She also kept her category in Roleplay. Then she said I'm underage you creep\" and then closed the chat. We didn't say anything nor did we send any pictures. Is it illegal? Will I be jailed over just saying \"wanna do roleplay\". I am 19 year old. Is this chatting with a minor is illegal? \nTITLE: ", "ideal_response": "Is using the word Roleplay with overseas minor can land me in jail", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: The State of Florida seems to be in a protracted battle with Disney. The state has legislated in the direction of Disney, and continues to do so (https://www.wfla.com/news/florida/desantis-speaking-at-reedy-creek-administration-building/). Despite the fact that there might be real arguments that such legislation is for the well-being of the state, the dialog surrounding Reedy Creek issues, including statements coming out of the State, seems to provide a solid argument that the motivation for such legislation is to punish Disney for voicing objection to recent policy decisions in the State. Assuming that as a given, for the moment (though that's, at the least, arguable), doesn't that provide Disney with the argument that these are Bills of Attainder? \nTITLE: ", "ideal_response": "Can Disney claim Florida has passed and is pursuing Bills of Attainder?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: I am at the stage where I want to expand my software startup, however, I do not know the proper procedure to do so. I was wondering how to create a non disclosure agreement and more importantly if you need a lawyer in order to legitimize it. I also would like to know what you would do if someone broke your NDA. \nTITLE: ", "ideal_response": "How do you draft an NDA for sharing a startup idea and do you need a lawyer to legitimize it?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: Lately I have been wondering why in the UK public buses do not require that the passengers wear a seat belt? In-fact, in the UK it is my understanding that no public buses actually have seat belts. I also have the same understanding in regards to private coaches. While most private coaches do come with seat belts as far as I'm aware you're not required to wear one. I find it strange since in a car you're required to wear a seat belt by law. It strikes me as odd that buses do not have the same requirements since this seems to imply that the bus is somehow safer or drivers are more qualified, yet bus accidents do happen! Is there any specific mention of this in any UK legislation? \nTITLE: ", "ideal_response": "Why do buses not require passengers to wear seatbelts?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: Signing a document to show your agreement is very common, but signatures seem very easy to forge. Nowadays, with electronic signatures it is trivial - some software doesn't even include your actual signature but pastes in your name in a cursive font. What good does it do to sign something, then? Couldn't you just claim you didn't sign anything afterwards? \nTITLE: ", "ideal_response": "What is the point of signatures?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: Would these 2 invoices sent from the same limited company be classed as the same invoice (my) Company name: XYZ Invoice number: 185 date sent: middle of july in the footer was XYZ trading as ABC and then (My) Company Name: ABC date sent: end of October Invoice Number: 185 client is saying that since invoice number 185 to XYZ trading as ABC was paid that makes invoice number 185 from ABC null and void, as they have proof of paying the invoice number 185 \nTITLE: ", "ideal_response": "Would these two invoices be classed as the same?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: I bought four tickets to a concert on craigslist. Then I sold two of them on stubhub. I later learned the tickets might be fake. I immediately notified stubhub. I was told seller's agreement is I am liable for 40% of what I earned on sale or full replacement cost (based on market rates) for new tickets, which would be double what I made on sale. I sold in good faith, not knowing they might be fakes, so is there anything that protects me from having to pay more money than I earned? I will happily refund the entire amount I was paid. Thanks. \nTITLE: ", "ideal_response": "Ticket Seller's Liability", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: I want to understand what the implications are for user emails with the onset of GDPR. It's a bit of a strange question but here is the scenario. We run a Software as a Service platform when a user can invite other users to a team. They do this by email, at which point we have a user's \"personal information\" in the form of their email and we have issued them an email invite. We have to keep the email address on record for a period of time, say 7 days, for that invite to be accepted at which point we delete it if no action has been taken. I understand the restrictions on me in terms of not marketing to this email address etc. I want to understand if I am even allowed to send the invite email in the first place. That is to say, if someone hasn't explicitly given me their email address, am I allowed to use it. Logic would indicate that this should be fine as it really makes the user experience of inviting people very poor, however, the question has been asked by a member of my team and so I thought I'd seek some clarity. \nTITLE: ", "ideal_response": "GDPR Legislation and user invites by email", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: Florida enables recording of HOA meetings: FLORIDA STATUTE 718.112(2)(C) states: Board of administration meetings.\u2014Meetings of the board of administration at which a quorum of the members is present are open to all unit owners. Members of the board of administration may use e-mail as a means of communication but may not cast a vote on an association matter via e-mail. A unit owner may tape record or videotape the meetings. The right to attend such meetings includes the right to speak at such meetings with reference to all designated agenda items. The division shall adopt reasonable rules governing the tape recording and videotaping of the meeting. The association may adopt written reasonable rules governing the frequency, duration, and manner of unit owner statements. If the meeting is done through the phone, to avoid problems caused by COVID-19. Does recording over the phone create a conflict with Florida wire-tapping laws? 934.03 Interception and disclosure of wire, oral, or electronic communications prohibited. Because Florida requires all parties to consent, if any party did not consent then this would veto the right of other owners to record. Which statute would take precedent and why? Ideally there would be relevant cases to provide precedent. Florida Statues: 934.03 Interception and disclosure of wire, oral, or electronic communications excerpt: (h) It shall **not** be unlawful under this section and ss. 934.04-934.09 for any person: 1. To intercept or access an electronic communication made through an electronic communication system that is configured so that such electronic communication is readily accessible to the general public. 2. To intercept any radio communication which is transmitted: a. By any station for the use of the general public, or that relates to ships, aircraft, vehicles, or persons in distress; b. By any governmental, law enforcement, civil defense, private land mobile, or public safety communications system, including any police or fire communications system, readily accessible to the general public; c. By a station operating on an authorized frequency within the bands allocated to the amateur, citizens band, or general mobile radio services; or d. By any marine or aeronautical communications system. 3. To engage in any conduct which: a. Is prohibited by s. 633 of the Communications Act of 1934; or b. Is excepted from the application of s. 705(a) of the Communications Act of 1934 by s. 705(b) of that act. \nTITLE: ", "ideal_response": "HOA meetings transcripts", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: What does it mean, and what are the implications of having a trial (or other hearing) vacated? Does it more strongly suggest that it will have been dispensed with finally, or that it will be rescheduled/relisted for another date? Importantly, I would like to know whether this means that the current case is ditched and the prosecution has to lay down charges to the court again from scratch? \nTITLE: ", "ideal_response": "What does it mean for a trial, or other hearing, that has been listed to be \u201cvacated\u201d?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: It seems arbitrary that it seems to conventionally refer to one rather than the other, but is actually even more confusing that it refers to barristers rather than solicitors when one typically seeks advice from a solicitor by consulting them for their opinion of the legal position, and not from a barrister. By the time to instruct a barrister to appear in court on one's behalf, it seems that one is largely clear on one's position already and has rather to instruct the barrister on what they wish for them to represent to the court, rather than to seek their counsel. So, what is the basis for this nomenclatural convention? \nTITLE: ", "ideal_response": "Why does the term \"counsel\" typically refer to barristers rather than solicitors?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: An European country Slovenia (That small thing east of Italy) has a law that basically forces each household to pay additional tax each month for national radio-and-tv station. The amount is fixed to nearly 15\u20ac per month regardless of household financial position. This is not about companies or public-like places such as pubs. By law a person or a family needs to pay this monthly fee regardless of actual usage of this service. A person or family may opt out of this by ensuring that they have no way of connecting to the national radio or television. So already owning a car or even a phone with internet capabilities is out of question. Is this common/normal? Does every (or most) country have such a law? This kind of law to me seems very dangerous and scary. In a way I feel like every year a new such law could be created and suddenly after 10 laws you pay 150\u20ac per month additional tax regardless of your financial status and/or usage of services you are charged with extra. Just for more info (though I feel a bit weak with my English at this point): The law basically states that the institute for national radio and television may gather data from those paying for electricity. They are to use this list to demand payment. So basically each household needs to pay once per month. And to opt out there should be no technically possible solution to connect to radio or television at all. \nTITLE: ", "ideal_response": "Is it common that a country has a law that supports additional taxing for national media?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: Assume that we have a private investment company (PIC) called \"C\". At time t, \"C\" has given a power of attorney (PoA) over one of its accounts to another entity \"L\", without power of substitution. If before time t, \"L\" had itself granted a PoA to a natural person \"P\", can \"P\" avail itself of the PoA given by \"C\" to \"L\"? \nTITLE: ", "ideal_response": "Validity of a Power of Attorney (PoA)", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: There was a recent altercation recently between media representatives and protesters at the University of Missouri. The protesters formed a human barrier around the camp, impeding free access to anyone. They also went as far as physically pushing away media representatives. The rationale, according to one Twitter account associated with the protest: \u201cWe ask for no media in the parameters so the place where people live, fellowship, and sleep can be protected from twisted insincere narratives,\u201d a Twitter account associated with the activists later declared, adding that \u201cit\u2019s typically white media who don\u2019t understand the importance of respecting black spaces.\u201d Did they have any legal grounds to block access to the protest site, which is ostensibly a public space? Video of the altercation here: https://www.youtube.com/watch?v=1S3yMzEee18 \nTITLE: ", "ideal_response": "Did the University of Missouri protesters have a legal right to privacy on their protest site?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: This is a disclaimer found on the Qt website. The \"Read More\" link doesn't contain any information about what exactly \"using\" the website involves, either. What do I have to do in order for my consent to the use of cookies on this website to be confirmed? \nTITLE: ", "ideal_response": "What constitutes \"using a website\" in Canada?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: Following on this question: Am I allowed to kill a person threatening me? CA, USA What if you are the target of lethal force? In this case your own life is threatened, even though you created the situation that warranted the use of lethal force against you. So what if now the hostage-taker kills the guy who tried to stop him with lethal force. And after that the hostage-taker gives up. Would he be guilty \"only\" of the crime of hostage taking or would he be guilty of murder as well? \nTITLE: ", "ideal_response": "Am I allowed to use lethal force against some one who is lawfully threatening my life?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: An administrator over the Google Apps (Gmail) account within my organization had some concerns about a user and discovered some problematic emails. The user was suspended and Google Apps access revoked. The user didn't really do anything illegal, but it was against organizational policy, etc. There is some question of legal liability that has arisen. The user who feels \"violated\" is now threatening that their account was accessed illegally and is saying he's looking into suing the administrator for getting into his email. Everything we've read seems to indicate that any kind of email server (even a service like Google Apps/Gmail) is not considered private and that you can't do anything (legal or otherwise) about someone like an administrator or account owner accessing it. Is this true? \nTITLE: ", "ideal_response": "Organizational rights to access users' emails?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: June 17, 2019 - My uncle's daughter is the petitioner in Beaumont California (father and daughter relationship). Before my uncle left the daughter's house, He asks for his Philippine passport and U.S. alien-green-card but the daughter didn't give my uncle's IDs. So my uncle left together with his brother to live permanently in Vallejo, California. How can my uncle get his passport and green card back? Update - July 06, 2019 My uncle and his brother went back to Beaumont and called a Police Officer to help out to get my uncle's personal ID's but my uncle's daughter didn't give back his ID's (Philippine passport, & Alien/green-card). Her reason to the police officer was since she was the petitioner, she is responsible for his dad (my uncle) and the only he can get it back his ID's is when he return's to the Philippines for good. Can we file in the city of Beaumont for small claims court to get his personal property back? What are the procedures to file? What forms do we need to use? My uncle is not fluent in English speaking, can we request a translator? Thank you in advance for all your help. \nTITLE: ", "ideal_response": "Is it illegal to withhold someone's passport and green card in California?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: Some manufacturers are not putting their names on their products: no contact numbers nor e-mail address for support to keep in touch with the manufacturer if something happens to your bought item(s). Is there any legal requirement for the manufacturer to make themself known? \nTITLE: ", "ideal_response": "Anonymous manufacturers", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: Unlike a normal GmbH, a founder/partner (Gesellschafter) of a gemeinn\u00fctzige GmbH (or a UG) may, according to the tax code, not receive any profits: Die Mitglieder oder Gesellschafter (Mitglieder im Sinne dieser Vorschriften) d\u00fcrfen keine Gewinnanteile und in ihrer Eigenschaft als Mitglieder auch keine sonstigen Zuwendungen aus Mitteln der K\u00f6rperschaft erhalten. -- \u00a7 55 Abs. 1 Nr. 1 Satz 2 AO Roughly translated: The members or partners (members within the meaning of these regulations) may not receive any portion of the profits and in their capacity as members, may not receive any other benefits from the funds of the corporation. However, people can be hired by a non-profit company and some websites indicate that Gesch\u00e4ftsf\u00fchrer (CEOs) can be paid, and even a Gesellschafter (partner) may be a CEO, for instance: W\u00fcrde sich nun ein gesch\u00e4ftsf\u00fchrender Gesellschafter ein zu hohes Gehalt auszahlen, k\u00f6nnte das als verdeckte Gewinnaussch\u00fcttung gewertet werden -- firma.de - Der Gesch\u00e4ftsf\u00fchrer einer gGmbH: Das m\u00fcssen Sie wissen If a managing partner were to receive too high a salary, this could be interpreted as hidden profit distribution. Does the tax code above not explicitly exclude a partner from receiving a salary, even as a \"CEO\"? \nTITLE: ", "ideal_response": "Can a founder of a non-profit company in Germany (gGmbH) receive a salary?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: As I understand it the EPC (Europe) requires 'inventive step' and the USPTO require claims to be 'non obvious'. An electronic measurement was designed and tested. What are the questions that one should ask when determining: How to identify the 'inventive step' of my invention? How do I determine if the invention is 'obvious' (litmus test) I have read the section below: said invention requires skills from electrical engineering, system engineering, computer science and industrial engineering. 35 U.S.C. 103 Conditions for patentability; non-obvious subject matter A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made. \nTITLE: ", "ideal_response": "Patent law: What is the litmus test for 'inventive step' \\'non obvious'?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: The living room in my house is monitored by CCTV with audio. Will that be admissible in court? \nTITLE: ", "ideal_response": "Can I get around the law of not recording a phone call unless the other end knows by making the call on a speaker in my house?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: A website owner is data controller. Most website owners are using web hosting companies, i.e. data processors. Each website owner must have a GDPR compliant privacy policy. Must each website owner disclose which data processors are being used? Must each website owner list in its privacy policy which hosting provider is in use and where data is located? \nTITLE: ", "ideal_response": "Does a website owner (data controller) have to disclose its data processors in its privacy policy?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: So I have had a short term lease (Dec-August) that costs me $X 2 months ago, I was offered, and accepted (signed a new lease), to renew the lease for a full year for the same price. However I was told few days ago that the property manager has made a mistake and that they shouldn't have renewed it because the owner needs it from August till December. I demanded compensation because all rentals around me cost higher, and because good deals are now off market (university area). However, the Leasing office is not willing to pay anything, and they are arguing that this is because I shouldn't have been offered this lease in the first place, and that the reason why my lease is actually just $X is that it is a short term lease. So the fact that I am going to pay $200+X is just because $X is too low for a comparable apartment. Does their argument make any sense? is there a chance that I would lose the case if I brought it to court? \nTITLE: ", "ideal_response": "Agent breaking the lease", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: This might sound insane, because it is. My 16 year old friend living in Chicago, IL got stabbed by her drug-using brother, and her mother doesn't seem to care. She just woke up in the hospital. She's in a difficult social situation and I was the first person she texted. I can't help anyhow, since I'm a Slovak, 7500km away, and not even an adult. She obviously can't return home to her insane family. She has been on the verge of suicide. I'm trying to help anyhow I can - to prevent her from doing something everyone would regret. I got a number for a child abuse centre in Illinois (1-800-252-2873), only to get a response, \"we can't help you at the moment, we'll get back to you in 2-3 weeks\". I feel like suing the US government right now, as a EU citizen. Anyways, I don't know US law but if there's anyone who does, please help us. What can we do? She will be let out of the hospital in a couple of hours and if we don't figure something out, she'll have to go homeless. Where can I send her? I suppose she'll have to go to court eventually, but meanwhile, is there any shelter for people in her situation where I won't get a response telling me to wait three weeks? Thank you so much. She is a beautiful person whom I care about and I am, we are, in desperate need of help. \nTITLE: ", "ideal_response": "My friend got stabbed by her brother and can't return home", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: It's my understanding that the role of taxation is to maintain a functional government. At this time, I suspect that the U.S. government is saving quite a bit of money by not paying some subset of its employees, etc., per the partial government shutdown. What I've heard in the news leads me to believe that many of those employees will not receive back-pay. Can I pay less taxes this year, prorated against the duration of the shutdown and the proportion of the government which was not operational during that time? \nTITLE: ", "ideal_response": "Can I pay less taxes because of the government shutdown?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: I am planning to build a Facebook-related webapp. Judging from Quora: Can I use Facebook word in my domain? and the list of WIPO cases of domains containing \"facebook\" I'm not allowed to use the word \"facebook\" or \"FB\" in the domain name I'll register for this app. But when I launch this app I want to make sure that search engines like Google understand that this web app is about Facebook. To make that happen I plan to use the word \"Facebook\" in my website description and in the keywords list of my website. My question is: Am I allowed to do this or would I be infringing on the Facebook trademark? \nTITLE: ", "ideal_response": "Am I allowed to use the term \"facebook\" as a keyword in the <head> section of my website?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: I called NVC and they told me that only certain types of visas are going to be affected. I have read the presidential proclamation but it is still not clear for me if everyone is going to be affected by this proclamation or not as I only see this: (a) an H-1B or H-2B visa, and any alien accompanying or following to join such alien; (b) a J visa, to the extent the alien is participating in an intern, trainee, teacher, camp counselor, au pair, or summer work travel program, and any alien accompanying or following to join such alien; and (c) an L visa, and any alien accompanying or following to join such alien. Sorry for the question, I was supposed to migrate this year. Greetings. \nTITLE: ", "ideal_response": "Are F3 visas going to be affected by the new presidential proclamation (June 22, 2020)?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: What happens to someone who\u2019s committed a murder in the UK? Assume that the person will be caught. \nTITLE: ", "ideal_response": "What happens to someone who\u2019s committed a murder in the UK?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: According to GDPR, any person has 'Right to Erasure (Recital)' and have his/hers sensitive information deleted. Does that mean that an organization can keep all non-sensitive data for a user? For an example, if there is a Database record: Phone| Option| Actived| Deactivated| ... (555)-123-4567| A| 2017-04-12| 2018-01-10| ... If all sensitive information is deleted and information is reduced, such as: Phone| Option| Actived| Deactivated| ... (XXX)-XXX-XXXX| A| 2017| 2018| ... This process of hiding users data is considered anonymization and as such should not be applicable. On the other hand, this approach will ensure that information about users who had option A activated last year will be kept without storing the users personal information. However, by matching multiple sources, someone COULD potentially still find out what this number was. Does this mean that reasonable effort was put into securing the users sensitive data or does 'Right to Erasure' mean just that, delete everything about that user? That seems like it would seriously harm a lot of organization analytic. \nTITLE: ", "ideal_response": "What does 'Right to Erasure' actually cover?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: This is a pretty USB specific question, I was redirected here from the technical stack exchange. I am trying to figure out what the legislation is around the development by company A of a USB peripheral for use by hardware/software developed by company B, where company B developped a peripheral that A's device is a clone of (in term of functionality - they're not stealing the implementation/blueprints). Company B did not consent to anything. Company B is having its hardware/software check the USB descriptor of the peripheral in various ways before accepting to use it. Vendor ID, Product ID, iManufacturer... Long story short, let's say the product will only function if its USB descriptor entirely mimicks company A's product's USB descriptor. What does the law say about mimicking USB descriptors ? Is there a uniform law over countries on this subject ? To give a concrete example, let's consider third party controllers for a video game console. Company B sells consoles, and special USB controllers that use a custom protocol to communicate with their console, which looks for a given VID/PID to try to communicate with the controllers. Company A wants to manufacture third party controllers and sell them. Doing so requires using company A's product's VID/PID, or they can't work. Can they proceed ? What is there to consider ? Are all companies currently doing this breaking the law ? \nTITLE: ", "ideal_response": "Is it forbidden to sell a device that mimicks a third party's device USB descriptor?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: Large swaths of land have been recovered from the sea throughout history. Modern technology has allowed for bigger projects even into open sea. For example, Palm Jumeirah sticks 5 km into the sea, and Palm Deira (which is under construction) will stick out 14 km into the Persian Gulf. I don't think these affect Dubai's territorial waters, but one can easily imagine similar projects in other countries which would have an effect on them. Singapore, for example, lies very close to both Indonesian and Malaysian territories and is doing a lot of land reclamation. With technological advances and increasing populations, need for land may become larger, and bigger projects may be undertaken in the future. It is easy to see how such projects could cause international tensions. So what happens to the territorial waters (or other relevant borders), when a country reclaims sea land that protrudes out of its coast line? Are there any rules in international law that deal with this kind of situation? \nTITLE: ", "ideal_response": "Land reclamation and territorial waters", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: I have three doubts about GDPR compliance in modern systems, specifically about the \"Right to be forgotten\" section in the GDPR. What I would really like to understand is whether the right to be forgotten section in GDPR means: \"The person is forgotten about\" or \"all the person's data is erased\". Can anyone please share a reputable source that either explains how can we still comply to GDPR given when the data is erased in the application layer, it is actually not erased in reality and may still be recoverable, or a source that explains that the data is permitted to still be recoverable. I give three examples below for cases where even though the data may be erased in the application layer, it may still be recoverable in reality. Some databases have journaling, redo and undo spaces. Hence, if you delete data from the database, it might still be recoverable. Is this a problem for GDPR compliance? For disaster recovery purposes, data is often backed up, and sometimes on sequential append only media devices such as tape drives. If you need to erase data to comply to GDPR, do you still need to take care to delete the data from the backup? How is this done? In some cases you'd need to re-write the entire storage unit again to a new storage unit to accommodate for this. In SSD when you change data it may be written in a new place, and the new data might still be present on the disk. Is this a problem for GDPR compliance? Are we not allowed to use SSDs? \nTITLE: ", "ideal_response": "The scope of GDPR and the right to be forgotten", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: Insurance typically follows the car and not the driver, which means other licensed drivers can use your car. However, and I'm aware this is policy dependent, insurers generally do not allow for repeated use of a car by other drivers, especially if they live in the same household without being added to the policy. However, imagine a scenario where one is the title holder and allows someone else that has been added to the policy to use a car. However, the person added to the policy lives with someone else and repeatedly allows that person to use the car against your wishes, which means the insurance company could potentially deny coverage. Can you be held liable if the person (roommate) not on the policy gets in an accident even if the person that is on the policy didn't honor your request (you are the title holder)? In such a situation, should one retake possession of the car in order to avoid liability issues? \nTITLE: ", "ideal_response": "Car insurance and liability in Michigan", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: I have received an NDA so that I can become part of a software project as a developer. Most of the clauses seems reasonable except for two parts that I'm slightly worried about. Could you give an example of something that would breach each clause? I have no idea what these clauses mean, practically speaking. 1. Non-Solicitation The Recipient, its affiliates, subsidiaries and representatives will not, from the date of this agreement until day of, December 31, 2025, directly or indirectly solicit for employment or employ any person who is now employed or retained by the Information Provider or any affiliate of the Information Provider without the prior written consent of the Information Provider. 2025 seems like a long time, is this a standard practice in these kinds of agreements? It seems that a lot of the clauses of a similar nature are expressed in months, not years (nearly 10 in this case). 2. Non-Competition Other than with the express written consent of the Information Provider, which consent may not be unreasonably withheld, the Recipient will not, from the date of this Agreement until day of ____ be directly or indirectly involved with a business which is in direct competition with the business lines of the Information Provider that are the subject of this Agreement. From the date of this Agreement until day of ____ the Recipient will not divert or attempt to divert from the Information Provider any business the Information Provider had enjoyed, solicited, or attempted to solicit, from their customers, at the time the parties entered into this Agreement. It worries me to be signing a contract which doesn't have dates explicitly outlined. Is this a valid concern in this case? If it is, how should I respond in terms of possible modifications? \nTITLE: ", "ideal_response": "Examples of date interpretation in NDA", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: I've been a customer of NewWave Communications as my home internet provider in Sour Lake, Texas for 10+ years. We are in a rural area and don't have many options in providers, so I'm pretty much stuck with who I have. I'm set up on auto-draft and have never missed a payment, and we have never had an interruption in service. We are happy with our internet speed, and do not need anything faster. About a year and a half ago, they tried to tell me the plan I'm on is no longer available and they wanted to charge me slightly more and move me to one of their new plans. I declined their \"faster and more reliable internet\" (because it already is fast and reliable) and they told me they would keep my service and my bill the same ($42.74/mo). They said that as long as I don't change my plan, I'm grandfathered-in at the price and service I'm getting, and that there was nothing they could do about it. Yesterday, I received a letter in the mail (from NewWave) stating, \"The internet service I subscribe to is being discontinued, and my service is being moved to a comparable service or plan.\" Same thing they tried to tell me before. That comparable plan takes me from 1000 GB Data Plan ($42.74/mo) to a lower 300GB plan ($55.00/mo) and if I go over the 300GB, they will charge me an EXTRA $10/mo for every 100GB I go over. They note that my actual monthly data usage for the Oct-Dec months has been 701GB. In what world is that a comparable plan? I called the number on the letter, and they said that NewWave Communications is being bought out by Cable ONE, and that this new company is going to charge me TWICE the amount for a \"new\" plan that I've always had because this new company says my plan doesn't exist. Look, my internet is just fine the way it is. Nobody is coming out to my house and re-wiring anything, and I'm not asking for anything more other than to leave me and my bill alone. Shouldn't this \"new\" company honor the price that I've had for years? I mean, AT&T and Verizon don't offer unlimited plans to people anymore, but the people who got on those plans before they changed what they offer weren't forced to get another plan, right? They are grandfathered-in. When Cable ONE acquires NewWave, don't they have to honor the previous agreements of the people that had them before they bought the company? I understand that this new company structures their plans differently. Fine. When they get new customers, that's what they'll have to pay, but I am NOT a new customer. I am the same customer, paying the same price (with apparently slower service than the rest of the world even though we can't tell and don't care to to know the difference), and I'm getting the same service I have always gotten. Just because Company A sold to Company B, what gives them the right to all of a sudden charge me double? I didn't ask for any of this, and as an honorable customer in good standing, I do not appreciate being treated like this. I feel like they are trying to steal from me and telling me I don't have a choice. What gives them the right to do this to me? \nTITLE: ", "ideal_response": "New Internet Company is trying to make me pay double for the same service", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: It it lawful for someone to publish on social media an audio-video clip of a private conversation without the other party's consent? Does the fact that the clip is accompanied by \"nasty\" comments make a difference? Similarly, is it lawful to post images showing the inside of a person's private bedroom and/or home? \nTITLE: ", "ideal_response": "is it illegal to publish videos or photos of someone without their consent?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: These seem to be ways of bringing actions to a court. How do they differ and compare? \nTITLE: ", "ideal_response": "What are the differences between petitions, complaints, and indictments?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: In the United States, suppose a warrant was required for a seizure and the police properly obtained one. Do the law enforcement officials who execute the warrant need to present that warrant prior to seizing the evidence? Or is it okay to present the warrant after the item(s) have been seized? If it makes a difference, I'm specifically wondering in the scenario that the item is on your person. \nTITLE: ", "ideal_response": "Do police need to present warrant prior to seizure?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: Federal law -- and even the regulations written up by Federal agencies -- are no less (perhaps even more) binding, than the Constitution. But they are far easier to pass and alter. Are there any standards on what can only be prohibited (or mandated) by a change in the Constitution vs. a lesser law? For example, the ban on alcohol (known as \"Prohibition\") was a Constitutional Amendment (the 18th), whereas the very similar (if not identical) in nature bans on narcotics were simply Federal (and State) laws. My own \"gut feeling\" -- without any formal legal education -- is that the Constitution proclaims something in general terms, and the laws then implement the details: banning actual particular activities, assigning punishments for different related violations, etc\u00e6tera. This explanation would make sense, but it does not work for many cases, such as, for example, the ban on narcotics... \nTITLE: ", "ideal_response": "Constitutional Amdendment vs. Federal Law", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: I found this on Amazon TOS: ... You may not incorporate any portion of the Amazon Software into your own programs or compile any portion of it in combination with your own programs .. Now I'm wondering what this company is doing or Google/Bing as well. There are plenty of companies who scrape data from Amazon, eBay, Etsy or news websites, Stack Overflow, etc. and do something with it (either republish or analyse or similar). So in the case of Amazon TOS: Would indix.com be doing something legal if they put the products data into their database? How about Google? \nTITLE: ", "ideal_response": "What am I allowed to do according to Amazon TOS?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: Seller said no assessments and within 6 mo I had an 8700.00 roofing assessment. I tried calling the Management company but she never returned my call . Do I have a legal recourse to go after them? My realtor is no help and would not help in looking at the HOA data. Patty \nTITLE: ", "ideal_response": "Assessments for Condo", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: European Union law is supreme over domestic legislation. So European Union regulations and directives take precedence over domestic legislation. But does this also mean the law contained within EU treaties (TEU, TFEU) is supreme over domestic legislation (eg. acts of parliament)? In the Costa judgement this appears relevant: It follows from all these observations that the law stemming from the treaty, an independent source of law, could not, because of its special and original nature, be overridden by domestic legal provisions, however framed, without being deprived of its character as community law and without the legal basis of the community itself being called into question This implies EU treaty law is supreme? \nTITLE: ", "ideal_response": "Is EU treaty law supreme over domestic legislation?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: Source: p 200, Thinking Like a Lawyer: An Introduction to Legal Reasoning (2010, 2 ed) by Kenneth J. Vandevelde This \u201cbut for\u201d test has proved inadequate where two defendants jointly cause an injury that would have occurred even if only one of them had acted. For example, if two men negligently discharge firearms, both of which fire fatal shots into the victim, in a lay sense of the term both men \u201ccaused\u201d the injury. Yet by applying the \u201cbut for\u201d test, neither man can be shown to have caused it. If the first man had not fired his weapon, the victim would still have been killed by the second shot. Thus, one cannot say that but for the first man\u2019s negligence, the injury would not have occurred. The same reasoning exonerates the second man as well. [1.] The \u201cbut for\u201d test, in other words, would absolve both gunmen of liability for the shooting. Accordingly, many courts have adopted a rule that provides an alternative definition of actual causation. Under this rule, a defendant\u2019s breach of duty is considered the actual cause of an injury if it was a substantial factor in bringing about the injury, even though the breach may not have been the \u201cbut for\u201d cause. The outcome of applying this test to the shooting circumstances would be that either of the two gunmen would be considered the cause of the death. Why is 1 true? Why cannot both shooters be found liable for the shooting? \nTITLE: ", "ideal_response": "How has the \u201cbut for\u201d test proven inadequate where two defendants jointly cause an injury that would have occurred even if only one of them had acted?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: Let's say a person(18yo) exchanges nudes with a 17yo (not a crime where I live) who is living in Britain where it would be a crime. Could the UK arrest and prosecute the person? \nTITLE: ", "ideal_response": "Can one be prosecuted for sexting a minor overseas?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: Does following part in my contract agreement mean, if I make a mistake, and so breach the contract, company can take back all my shares on 100 USD? Do you think is it fair, proportional? Part of my compensation are stock option, but I would not work for 100 USD by mistake. Am I understand well this legal text? Remedies. Consultant agrees that a breach of the provisions of this Section 7 will cause Company irreparable injury and damage. The parties expressly agree that Company shall be entitled to injunctive and other equitable relief to prevent such a breach, and to an accounting and payment of all profits, compensation, commissions, remunerations, or benefits that Consultant directly or indirectly realizes in connection with such a breach, in addition to any other remedy to which Company might be entitled. The parties waive the posting of any bond or surety prior to the issuance of an injunction hereunder. In the event a court refuses to honor the waiver of bond hereunder, the parties expressly agree to a bond in the amount of one hundred dollars ($100.00). All remedies for such a breach shall be cumulative and the pursuit of one remedy shall not be deemed to exclude any other remedy with respect to the subject matter hereof. \nTITLE: ", "ideal_response": "Remedies: Take back shares for 100 USD?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: I am making a Deku scrub figurine from The Legend of Zelda series to 3d print. I decided that I could sell these figurines on Etsy. Is there any legal requirements for the UK where I have to say something or I cannot sell this at all? Thank you from the UK, Domford \nTITLE: ", "ideal_response": "Can I sell a figurine of my own design, despite it being a game character?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: In 1792, in France even men without property were enfranchised. How was it before exactly? What is meant by \u2018without property\u2019? Only those who didn\u2019t own a home or a piece of land? Or any kind of property? Could other kind of property, like owning a couple of horses, give the right to vote? \nTITLE: ", "ideal_response": "The right to vote and property ownership?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: Real property seems to have peculiar legal customs. For example, if somebody has leased property like a vehicle and defaults on their lease the most expeditious way for the true owner to recover the property seems to be repossession, which is essentially physical recovery by \"reasonable\" force. But when it comes to a living space the old saying, \"Possession is nine-tenths of the law,\" seems almost literally applicable: To my knowledge, in the United States at least, there are state and local laws that make it extremely cumbersome for a property owner to evict a tenant in default (and thereby recover the use of their rightful property). Is this accurate? Or are there conditions or methods whereby a property owner can legally use private force (i.e., not the force of agents of the state) and occupation to repossess living spaces being held by \"squatters?\" Amendment: If someone trespasses on my property I can not only arrest them myself, but I can also immediately summon police and ask for their arrest and criminal prosecution. At what point does trespass become \"squatting?\" (I presume that the difference between \"squatters\" and \"non-paying tenants\" is the fact that the latter party can show that at some point they had legal rights to the residence?) \nTITLE: ", "ideal_response": "Can residences be repossessed as with other real property?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: I parked on a street that has 2 parking zones: all-day (with bi-weekly street sweeping) and two-hour. Painted on the street is a single six-inch-wide six-foot-long transverse white line that, I believe, separates the zones. I parked on the \"all day\" side of this line, but received a ticket for breaching the two-hour conditions, citing \"parking within 100 feet of the two-hour zone sign\". I have previously appealed a similar ticket for parking in exactly the same place, and succeeded in obtaining a statement that this parking spot was part of the all-day zone. What is the name of the transverse white line so I can talk with intelligence? Is it legally defined anywhere? I couldn't find it in the CA MUTCD or Google. \nTITLE: ", "ideal_response": "What is the name of the transverse white line that separates different time-limited parking zones?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: Is there any law in France that protects medical patients? For example, medication that excessively uses drugs as pain killers may result in bad effects. Can the family/relatives complain about it? How is the procedure? I want to know whether it's possible to refuse the use of excessive drugs, on the grounds that it doesn't heal the root of the problem but only to suffer more. \nTITLE: ", "ideal_response": "Is there any law in France that gives medical patients a right to refuse treatment?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: Elon Musk made an offer to Twitter to buy the company. Twitter's board accepted. Apparently Twitter had some obligations in the fine print to disclose certain aspects of their userbase (there is a question about how many accounts are fake, aka robots or \"bots\"). Mr. Musk asserts Twitter won't provide the data and has walked away from the deal. Twitter is suing him to enforce the agreement. Elon Musk responded with a tweet of this image What caught my eye is the last part. Mr. Musk is asserting that the data he was seeking will now come out as part of the discovery phase. Is there anything Twitter can do to avoid disclosing the data, or would it be considered too material to the case to avoid? \nTITLE: ", "ideal_response": "Will Twitter's lawsuit against Elon Musk allow him to force disclosure of the bot data he was seeking from Twitter?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: When can physical force be used outside of personal self defense? If you see someone repeatedly kicking your car can you chase him away? Say someone is at your door step and pushes past you inside, can you use force on them? What if you're carrying your laptop and some nut keeps following you and kicking it? On another note, if you see someone getting attack, is it legal to intervene by using force e.g. tackling or hitting the attacker? \nTITLE: ", "ideal_response": "Can physical force be used to protect your property?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: One of those things that irks me about the FCC regulations is that manufacturers are responsible for preventing, and can apparently be held liable for, end-user modifications to wireless radio devices under 47 CFR Part 15, Subpart C: 47 CFR \u00a7 15.203: An intentional radiator shall be designed to ensure that no antenna other than that furnished by the responsible party shall be used with the device. The use of a permanently attached antenna or of an antenna that uses a unique coupling to the intentional radiator shall be considered sufficient to comply with the provisions of this section. The manufacturer may design the unit so that a broken antenna can be replaced by the user, but the use of a standard antenna jack or electrical connector is prohibited. [...] 47 CFR \u00a7 15.212(a)(2)(iv): Manufacturers must ensure that only transmitter control elements and radio front end components that have been approved together are capable of operating together. The transmitter module must not operate unless it has verified that the installed transmitter control elements and radio front end have been authorized together. Manufacturers may use means including, but not limited to, coding in hardware and electronic signatures in software to meet these requirements, and must describe the methods in their application for equipment authorization. As a result, manufacturers are forced to actively design wireless radio devices to prevent unauthorized end-user modifications: The RP-TNC connector used for Wi-Fi antennas was originally meant to be a nonstandard connector to prevent consumers from replacing antennas with those not approved by the manufacturer. The connector ultimately gained widespread use; however, the FCC has not taken action against device manufacturers. Major laptop manufacturers, including Dell, HP, and Lenovo, restrict the specific Wi-Fi cards which may be installed into a particular system through the use of \"whitelists\" in system firmware that specify which card models are allowed. If a Wi-Fi card that isn't on the whitelist is installed, the system will not boot. Why hold the manufacturer responsible for preventing end-users from making modifications? What is the philosophy or historical basis for these rules? (To prevent this question from being too opinion-based, I'd like to see answers that cite relevant documentation such as rulemaking reports.) \nTITLE: ", "ideal_response": "Why is the manufacturer of a wireless radio device responsible for preventing end-user modifications?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: Since COVID-19 has senior centers shut down, I'm thinking of holding a virtual game night at these centers. I'd like to allow the seniors to play games like Risk or Monopoly against each other on laptops. Board game manufacturers often don't have very good digital versions of their games, so I was thinking of using Tabletop Simulator to recreate the game virtually. However, a virtual version of the game made with copyrighted artwork downloaded from the internet isn't legal. I know you can make a digital copy of a book or CD you own. If I owned a board game and used digital copies of the game's artwork I made to create a virtual version of the game, would this be legal? The virtual game would only be used for this project, and I wouldn't distribute the artwork to anyone else. Thank you! \nTITLE: ", "ideal_response": "Is It Legal to Create A Virtual Copy of a Physical Board Game If I Own A Copy of the Physical Game?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: Is it legal to have an \"anti-exclusive\" contract? By anti-exclusive, I mean a contract that says, for example: \"This software can be used on any platform EXCEPT for X\" For a real life example, would it be legal to create software and say it is available for any platform EXCEPT for, say, a specific Linux distribution? In a broader sense, can somebody sell a product/service to everybody except for a particular person/group of people? I'm thinking this would be illegal due to discrimination laws, so wouldn't this be the case for the above example too? \nTITLE: ", "ideal_response": "Are \"anti-exclusive\" contracts legal?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: Suppose I create my own AI based program which is trained by input containing a set of similar images and then it spits out a cartoonified SVG representation of the thing in the input images. For example: I feed the program with several random pictures of Eiffel tower from the internet, without caring if they were copyrighted or not. Some may be copyrighted, some may not. As a result I get the SVG image, say something like this: link (to Wikimedia's SVG of Eiffel tower). Is this copyright infringement? In this case it is obvious that the final product represents the Eiffel tower, which is public architecture and as far as I know it should be okay, but it really isn't at all obvious whether I used some copyrighted picture in the generation process or not. If I limited myself to generating only SVG's representing objects everyone could take a picture of in public (or drawn it for themself like: a tree, a lake, skyline of a city, public building, public monument etc.), how likely am I to end up being charged with copyright infringement? \nTITLE: ", "ideal_response": "Is AI generated picture violating copyright if it's fed copyrighted material?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: Let's say you are laid off with a month of severance. In the second week of the severance period, you take up a temp job hanging drywall or something like that, that pays you 1099 with no benefits, and do it that week only. This job is not proper employment with all the strings attached, just freelancing. Are you obligated to report that income when filing for unemployment towards the end of the severance period, when you become eligible for it once the severance runs out? I understand if it were during unemployment, you would have to report it and your weekly claim would be adjusted. You also have to report severance so the unemployment agency can calculate your eligibility date. But this is 1099 income earned during the severance period. \nTITLE: ", "ideal_response": "1099 income during severance period before unemployment", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: This is specifically in regards to parking sign law in Mesa, AZ. The legal wording is The owner or the agent of the owner of any private parking area shall be deemed to have given consent to unrestricted parking by the general public in such parking area unless such parking area is posted with signs as prescribed by this Section which are clearly visible and readable from any point within the parking area and at each entrance thereto. I could not find a clarification of what \"at the entrance\" means. I can see two interpretations: Physically situated at the entrance. Clearly visible and readable from the entrance. \nTITLE: ", "ideal_response": "What is the legal definition of at the entrance (parking sign law)", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: Consider the following scenario: \"Aisha\" is the registered keeper of a car she bought on credit. \"Benin\" has had control of the car for the last 2 years following his divorce from Aisha. A few months ago Aisha, as the registered keeper, received reminders that the MOT1 was due to expire which she forwarded to Benin. Benin said that he tried to get an MOT test but the garage cancelled the appointments. Despite the MOT expiring, Benin is still driving the car on the road. If Benin is stopped by the police, what are the legal implications for him and Aisha? And what options are available to Aisha to protect herself, e.g. forcing a sale of the car? 1In the UK, an MOT is a legally required annual vehicle safety test of road worthiness. Without one you are legally only allowed to drive to one place: An MOT garage to obtain one. \nTITLE: ", "ideal_response": "Driving a car without an MOT", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: Is manslaughter a potential legal consequence of treating a patient without obtaining a valid consent? i.e. A surgeon does not inform all the risks involved in the surgery to a patient, causing a death. Because it is an unlawful killing and does not involve malice aforethought, I think it is manslaughter. Am I correct? Or is it just negligence? \nTITLE: ", "ideal_response": "Manslaughter or Negligence -does not inform all risks in surgery which results in death of a patient", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: A friend of mine described this situation to me recently: The scene is a hotel lobby in the middle of the night. Three people, two men and a woman, enter. The woman appears to be drunk and unconscious, and is being supported by one of the men. The other man smells of alcohol, but is otherwise sober. They try to get a room. The receptionist, aware that the hotel is situated in the vicinity of a red-light district, is suspicious and lies that there's no room available. The trio say okay, they'll try another hotel, and leave. The receptionist then calls the police. Can the police act? On the one hand, the circumstances can be interpreted as a rape has happened or is about to happen, e.g. a date rape drug that knocks the woman out has been administered; on the other hand the receptionist can't be sure of that and the three could e.g. be blood relatives or even good Samaritans. If this does meet the threshold for \"reasonable suspicion\", does it further meet the threshold for \"probable cause\"? If this does not meet the threshold for \"reasonable suspicion\", is it worth calling the police in this situation? If it matters, the country is Singapore. \nTITLE: ", "ideal_response": "Does this situation qualify as \"reasonable suspicion\"?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: I took an SVG image from a set which is distributed under GPL-3.0 and used it as (a large) part of an icon I made to a software of mine, which is also distributed under GPL-3.0. So far, everything is fine, I suppose. I am, however, intending to distribute this icon I made (together with my whole software) as a PNG image, for it is smaller and I'm not used to Create SVG images. Is redistributing this altered version of the SVG image as a PNG an infringement of the following GPL-3.0's excerpt: if you distribute copies of such a program, whether gratis or for a fee, you must pass on to the recipients the same freedoms that you received. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights.` given that an SVG, like a script, comes with the \"source code\", as it IS the very source code itself, while a PNG \"hides\" this, being binary non-readable data? \nTITLE: ", "ideal_response": "GPL-3.0's \"Open-Sourceness\", Images and Image Formats", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: I have written an android app The user does some HTTP requests do an endpoint and gets data, also the user can download images to his own storage. But I don't collect any data from the user? Is is mandatory for the dev to include the GDPR although he doesn't collect data? If so, what? \nTITLE: ", "ideal_response": "Do I have to include a GDPR in my app even tho I don't collect data from users?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: I would like to use sombodys content from a youtube channel which has a copyright. The general advise is to \"ask for permission\" to do so. I wonder now what the implications are if the person says \"yes\". Is this now like a legal contract depending on the juristication? Can the permissions later be revoked? Since there are many details that are not clear like on which platforms I gonna publish, what kind of content I do, etc etc, what does this all actually mean what I can and cannot do? \nTITLE: ", "ideal_response": "Implications of asking somebody to use their copyrighted content", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: Is it legal to use a scanner or camera to digitize a copyrighted book if it's only for personal use? I own the book, and I have no plans to sell it or give it away. If so, once the copyright expires, can I share my copy, or do I have to scan it again? \nTITLE: ", "ideal_response": "Is it legal to scan a copyrighted book for personal use?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: A woman is thought to have been murdered (corrected this first sentence from a previous comment). A man is arrested for it. He is offered a plea bargain deal that will get him 15 years or he can take his chances in court but would get life if found guilty. All they have is circumstantial evidence but the arrested man is convinced by his lawyer to plea bargain due to the overwhelming circumstantial evidence against him. (side-note: For this scenario, the man did NOT actually murder this woman as we find out later in this story but he just didn't want to take the chance of getting sentenced to life in prison). Man serves his 15 years and is released...no probation, he served his entire sentence. Knowing he did not murder this woman he looks for her after he is released and finds her. Ends up she wanted to disappear and start a new life and set this man up to be framed for her murder. He is obviously mad at this woman and after he finds her, he kills her for revenge of sending him to prison and losing 15 years of his life for something he didn't do. He is arrested again and admits to the police to killing her but does not worry about being charged for murder because he has already been convicted of her murder and served his sentence. Can he be charged, convicted and sent to prison again? \nTITLE: ", "ideal_response": "Double Jeopardy?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: So I forgot to turn my headlights on one night and got pulled over. He gave me a warning for the headlights but I didn't have my proof of insurance so I got a traffic citation for that and I went to court on that date that was on it and I'm 16 so the judge said I need to go to juvenile court for it. So he said the court would send me a date to go to that in about 1-3 weeks well that was 4 months ago and I still haven't got anything. So what do I do? Also if it helps I live in Savannah Georgia \nTITLE: ", "ideal_response": "Traffic citation", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: According to the article by the Electronic Frontier Foundation (EFF) \"Section 230 is Good, Actually\" Section 230 says that any site that hosts the content of other \u201cspeakers\u201d\u2014from writing, to videos, to pictures, to code that others write or upload\u2014is not liable for that content, except for some important exceptions for violations of federal criminal law and intellectual property claims. They go on to say, Basically, Section 230 means that if you break the law online, you should be the only one held responsible, not the website, app, or forum where you said the unlawful thing. Just to be clear, that isn't strictly correct, right? A quick wikipedia search shows that in Cubby, Inc. v. CompuServe Inc. As a distributor, CompuServe could only be held liable for defamation if it knew, or had reason to know, of the defamatory nature of the content. As CompuServe had made no effort to review the large volume of content on its forums, it could not be held liable for the defamatory content. In one source online, I even saw Section 230 described as Congress enacted \u00a7 230 to remove the disincentives to self-regulation So which is it? Does the removal Section 230 mean a site will be held liable for the content created by others? Or, does it mean that a site will only be held liable if they engage in optional moderation or acts of \"publishing\" (such as filtering and removal of offensive material). It seems like it's still the creator that is liable. And you can still have a free and open moderation-free internet in a post-Sec 230 world? Handling spam Update: On spam, the same case above Cubby, Inc. v. CompuServe Inc., 776 F. Supp. 135 (S.D.N.Y. 1991) states that, CompuServe's ultimate right under the contract to remove text from its system for noncompliance with its standards merely constitutes control over the result of CCI's independent work. This level of control over the Journalism Forum is insufficient to rise to the level of an agency relationship. With this text, I don't think it's sufficient to say that in a post-Section 230 world removal of content would be defamation per se, it seems it has satisfy some standard. \nTITLE: ", "ideal_response": "If Section 230 is repealed, are aggregators merely forced into a role of distributors rather than indemnified publishers?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: If other tenants in a rental home (roommates) are breaking the law and the landlord has been informed but refuses to do anything about it, is the landlord breaking the law or does a tenant (who's not involved in the illegal activity) have any special power? In particular if you move into a house and find the other tenants do a lot of drugs (and have photos/videos proving it) but most of the drugs are \"light\" such as marijuana. However they trash the house and make loud noise at 4am, house is filled with smoke, always intoxicated etc. If relevant the tenants signed a lease saying they wouldn't do anything illegal. \nTITLE: ", "ideal_response": "Landlord refuses to do anything about tenants who do a lot of drugs", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: I've heard people use the terms libel and slander interchangeably, but I know that they are different in the eyes of the law. What are the defining characteristics of each? \nTITLE: ", "ideal_response": "The difference between libel and slander", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: A company that's using the contracting firm said they'd pay me $X per hour, and the same amount for hours over 40 in a given week. Apparently there's an exemption from the 1.5x pay for certain engineering positions, which I was told this was. I was (and am) happy with the pay we agreed on. So the contracting firm sent me a copy of the standard contract to read and sign. But the contract said I was to be paid 1.5x the amount of my normal rate for overtime. Before signing, I sent them a message saying, \"I think this is a mistake.\" And they came back and said, \"You're right, it's supposed to be 1x for overtime.\" They sent me another copy, but it had the same mistake in it. At that point, I signed it since we were on the same page regarding the terms even though it was still wrong. My thinking was that it was just a piece of (digital) paper that was wrong, and in my favor anyway. The actual contract (the intangible agreement between the two parties) was correct and mutually agreed upon. My question is, with the contract as it stands, is there any risk that I am taking by not fixing it? I'm getting paid the amount I expect (1x for overtime) and have no intentions to pull more out of them just because the written contract is wrong. I don't want to be in a bad situation, but I don't want to go through the hassle of changing something that doesn't matter either. Note: I live/work in New York state \nTITLE: ", "ideal_response": "There's a (potentially large) mistake in my contract, is there a need to address it?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: Suppose, for example, that the Senate is considering the passage of a law. A Senator has decided to open a filibuster. After a half hour, the majority leader tires of hearing this Senator speak. He motions for an end to the filibuster, but is only able to get 51% of the vote. According to Senate rules, I believe, you need a 2/3 vote to end a filibuster. Over said Senator's objections, the majority leader begins the voting process and the bill gets 51% yeas. \nTITLE: ", "ideal_response": "Are US Senate/House rules legally enforcible?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: In World War II, Executive Order 9066 suspended basic rights of Japanese Americans - even people whose loyalty to the United States was never questioned, though the government feared the group as a whole might contain some spies - and led to a lot of people (>100K) being rounded up and kept at internment camps. Korematsu v. United States ruled this was constitutional. Today, in the \"War on Terror,\" at least one politician can make surprisingly strong statements about plans for negative treatment against large groups of people (e.g. Muslims) because of concerns that a small minority of them might be radical enough to kill a bunch of people, and [the politician can] remain surprisingly popular. What, if any, modern-day obstacles are in place that would nullify the same or materially same executive order? This link indicates: \"The Court would then have to adhere to, reconcile with, or overrule Korematsu & Hirabayashi.\" However, it does not state that there this is the only obstacle and hence the question \nTITLE: ", "ideal_response": "Executive Order 9066", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: This arises out of Can state lawmakers who participated in the Capitol riot be removed through the 14th Amendment? The 14th Amendment, section 3, says: No person shall be a Senator or Representative in Congress, or elector of President and Vice President, or hold any office, civil or military, under the United States, or under any State, who, having previously taken an oath, as a member of Congress, or as an officer of the United States, or as a member of any State legislature, or as an executive or judicial officer of any State, to support the Constitution of the United States, shall have engaged in insurrection or rebellion against the same, or given aid or comfort to the enemies thereof. But Congress may, by a vote of two-thirds of each House, remove such disability. Is a separate law needed to prohibit people to whom Section 3 applies from taking office, or to remove such people from office? What proceedings would be needed to establish that a person had \"engaged in insurrection or rebellion\"? Can a Congressional resolution or a Federal statute declare that specific people have so engaged? Is a criminal conviction required? Can a private person take steps to enforce this section? \nTITLE: ", "ideal_response": "How can section 3 of the 14th Amendment to the US constitution be enforced?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: I understand the law about private communications in person and over the phone not allowing recording. However, in cases where a consumer records a conversation between him and a company offering them service (e.g. Comcast Cable), and before the human conversation begins there is a recorded notification that the call will be recorded for quality assurance, doesn't that count as consent? If anything, it proves intention for the business to record the call! Thus, in a two-consent state such as California, if I were to record my call to Comcast, and that notice was heard over the phone, both parties (myself and the employee speaking with me) have consented by taking the call. Would this be legal then? \nTITLE: ", "ideal_response": "Recording a phone conversation with customer support/sales", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: Now of days it's quite easy to have a computer in a completely different part of the world from where I'm located that I'm working on. This makes me wonder how 'criminality' is defined if the laws of my home country differ from the country the computer is in. I'm wondering at what point the location of a server/computer decide the laws that apply to it and at what point it's the location of the user that matters. To give some examples, not so much expecting answers to all of the below so much as explaining my confusion of distinction in case it informs an answer. For example, lets say I choose to view material that is not legal for me to view in my home state (copyright material I haven't payed for, some type of outlawed pornography, some sort of state secret I'm not suppose to see etc), but which is legal to view in the country in which some server I connect to is located. I'm pretty sure I've committed a crime, even if the computer providing the data is illegal, but is this only because the data was copied to my computer? If I use some VNC technology that shows me the screen of another computer located outside of the US to view this data without any of it stored or cached on my computer, am I still committing a crime (I think so?) On the other hand I see servers located outside of the US doing things that would be illegal in the US, online gambling, 'spamming' people, selling things without paying US taxes etc. Can I connect to a server for any reason and still be legal? If amazon want's to have a server in some country X and pay that countries taxes instead of US taxes can someone from the US still connect to that server to do anything at all with it without suddenly being guilty of violating a law because that server isn't abiding by US law? If it's the location of the user that matters then does that mean I can use a computer outside the US to do something that is illegal in that country? If I'm in the US and using a cloud computer hosted in say the UK and for some reason I end up doing something with that computer that is 100% legal in the US but isn't legal in UK (maybe some data is allowed to view here that isn't there?) have I broken the law in UK? If I happen to travel to UK at some later date could a zealous prosecute arrest me because of the location of my computer? (I realize this last question is probably comes down to \"it depends on the country\", but generally speaking for countries that practice common law is there a consensus that most countries agree on?) \nTITLE: ", "ideal_response": "Does legality of an computerized action depend on user location or computer location?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: Person was named \"A\" prior to year 2000. The name was legally changed from \"A\" to \"B\" in 2000. When creating a legal document, is there a standard of how that person will be called when specifically describing past events before 2000? For example, In the summer of 1998, A (or B?) did XYZ. assuming that the lawyer composing the document doesn't do the smart thing and say \"Person 'B', formerly known as 'A' before 2000, and hereafter called \"Person\" in this document\" which avoids the issue in the first place. \nTITLE: ", "ideal_response": "Do legal documents use the person's current or former legal name when referring to the past?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: In the case where an Illinois resident holds two full-time jobs (e.g.remote jobs) without the knowledge of either company and Company A becomes aware of this. Surely, Company A will rightfully terminate the employee. Can Company A legally contact Company B to inform them that the employee has been working at both companies, resulting in dual termination? Any laws that would prevent a company from doing this? Does it changed anything if the companies are headquartered in a different state? \nTITLE: ", "ideal_response": "In Illinois, can a company contact their employee's other company?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: \"Like Facebook, only better\" - Can I legally use this in advertising for my website. Is it possible to do so without getting sued? \nTITLE: ", "ideal_response": "\"Like Facebook, only better\" - Can I legally use this in marketing?", "category": "summarization"}
+{"prompt": "Generate a concise, engaging title for the following legal question on an internet forum. The title should be legally relevant, capture key aspects of the issue, and entice readers to learn more. \nQUESTION: There was an incident where forbes.com was delivering viruses through Javascript exploits found in their ads. This happened just by clicking a link (before you can \"agree\" to anything). Are they liable under anti-hacking regulations? What about ad networks? \nTITLE: ", "ideal_response": "Is a \"drive-by-download\" site liable for their content", "category": "summarization"}
diff --git a/nemo/Evaluator/eval_dataset/tool_call_data/aiva_tool_call.jsonl b/nemo/Evaluator/eval_dataset/tool_call_data/aiva_tool_call.jsonl
new file mode 100644
index 000000000..1b82e7909
--- /dev/null
+++ b/nemo/Evaluator/eval_dataset/tool_call_data/aiva_tool_call.jsonl
@@ -0,0 +1,10 @@
+{"messages": [{"role": "user", "content": "Where can I find some comparison information between RTX 5080 and RTX 5070i?"}], "tools": [{"type": "function", "function": {"name": "HandleOtherTalk", "description": "Handles greetings and other absurd queries by offering polite redirection and clearly explaining the limitations of the chatbot.", "parameters": {"properties": {"message": {"type": "string", "description": "The message parameter"}}, "type": "object"}}}, {"type": "function", "function": {"name": "ToProductQAAssistant", "description": "Transfers work to a specialized assistant to handle Product QA. \nAnswers generic queries about products, including their descriptions, specifications, warranties, usage instructions, and troubleshooting issues.\nCan also address queries based on product manuals, product catalogs, FAQs, policy documents, and general product-related inquiries.\nCan also answer queries about the NVIDIA Gear Store's product offerings, policies, order management, shipping information, payment methods, returns, and customer service contacts.", "parameters": {"properties": {"query": {"description": "The question or issue related to the product. This can involve asking about product specifications, usage guidelines, troubleshooting, warranty details, or other product-related concerns.", "type": "string"}}, "type": "object"}}}, {"type": "function", "function": {"name": "ToOrderStatusAssistant", "description": "Delegates queries specifically related to orders or purchase history to a specialized assistant.\nThis assistant handles inquiries regarding Order ID, Order Date, Quantity, Order Amount, Order Status, \nand any other questions related to the user's purchase history.", "parameters": {"properties": {"query": {"description": "The specific query regarding the order or purchase history, such as order status, delivery updates, or historical purchase information.", "type": "string"}, "user_id": {"description": "The unique identifier of the user.", "type": "string"}}, "type": "object"}}}, {"type": "function", "function": {"name": "ToReturnProcessing", "description": "Transfers work to a specialized assistant which handles processing of a product return request.\nThis assistant handles inquiries regarding return transactions, including return status, relevant dates, \nreasons for return, notes, and any other questions related to return processing.", "parameters": {"properties": {"query": {"description": "The specific return-related query, such as the status of the return, refund details, or return policy.", "type": "string"}, "user_id": {"description": "The unique identifier of the user requesting the return.", "type": "string"}}, "type": "object"}}}], "tool_calls": [{"function": {"name": "ToProductQAAssistant", "arguments": {"query": "Comparison between RTX 5080 and RTX 5070i?"}}}]}
+{"messages": [{"role": "user", "content": "Why are the new Jetson Nano so much more expensive than the previous ones?"}], "tools": [{"type": "function", "function": {"name": "HandleOtherTalk", "description": "Handles greetings and other absurd queries by offering polite redirection and clearly explaining the limitations of the chatbot.", "parameters": {"properties": {"message": {"type": "string", "description": "The message parameter"}}, "type": "object"}}}, {"type": "function", "function": {"name": "ToProductQAAssistant", "description": "Transfers work to a specialized assistant to handle Product QA. \nAnswers generic queries about products, including their descriptions, specifications, warranties, usage instructions, and troubleshooting issues.\nCan also address queries based on product manuals, product catalogs, FAQs, policy documents, and general product-related inquiries.\nCan also answer queries about the NVIDIA Gear Store's product offerings, policies, order management, shipping information, payment methods, returns, and customer service contacts.", "parameters": {"properties": {"query": {"description": "The question or issue related to the product. 
This can involve asking about product specifications, usage guidelines, troubleshooting, warranty details, or other product-related concerns.", "type": "string"}}, "type": "object"}}}, {"type": "function", "function": {"name": "ToOrderStatusAssistant", "description": "Delegates queries specifically related to orders or purchase history to a specialized assistant.\nThis assistant handles inquiries regarding Order ID, Order Date, Quantity, Order Amount, Order Status, \nand any other questions related to the user's purchase history.", "parameters": {"properties": {"query": {"description": "The specific query regarding the order or purchase history, such as order status, delivery updates, or historical purchase information.", "type": "string"}, "user_id": {"description": "The unique identifier of the user.", "type": "string"}}, "type": "object"}}}, {"type": "function", "function": {"name": "ToReturnProcessing", "description": "Transfers work to a specialized assistant which handles processing of a product return request.\nThis assistant handles inquiries regarding return transactions, including return status, relevant dates, \nreasons for return, notes, and any other questions related to return processing.", "parameters": {"properties": {"query": {"description": "The specific return-related query, such as the status of the return, refund details, or return policy.", "type": "string"}, "user_id": {"description": "The unique identifier of the user requesting the return.", "type": "string"}}, "type": "object"}}}], "tool_calls": [{"function": {"name": "ToProductQAAssistant", "arguments": {"query": "Why are the new Jetson Nano so much more expensive than the previous ones?"}}}]}
+{"messages": [{"role": "user", "content": "Can you compare Jetson Nano and Raspberry Pi which is a lot more cheaper?"}], "tools": [{"type": "function", "function": {"name": "HandleOtherTalk", "description": "Handles greetings and other absurd queries by offering polite redirection and clearly explaining the limitations of the chatbot.", "parameters": {"properties": {"message": {"type": "string", "description": "The message parameter"}}, "type": "object"}}}, {"type": "function", "function": {"name": "ToProductQAAssistant", "description": "Transfers work to a specialized assistant to handle Product QA. \nAnswers generic queries about products, including their descriptions, specifications, warranties, usage instructions, and troubleshooting issues.\nCan also address queries based on product manuals, product catalogs, FAQs, policy documents, and general product-related inquiries.\nCan also answer queries about the NVIDIA Gear Store's product offerings, policies, order management, shipping information, payment methods, returns, and customer service contacts.", "parameters": {"properties": {"query": {"description": "The question or issue related to the product. 
This can involve asking about product specifications, usage guidelines, troubleshooting, warranty details, or other product-related concerns.", "type": "string"}}, "type": "object"}}}, {"type": "function", "function": {"name": "ToOrderStatusAssistant", "description": "Delegates queries specifically related to orders or purchase history to a specialized assistant.\nThis assistant handles inquiries regarding Order ID, Order Date, Quantity, Order Amount, Order Status, \nand any other questions related to the user's purchase history.", "parameters": {"properties": {"query": {"description": "The specific query regarding the order or purchase history, such as order status, delivery updates, or historical purchase information.", "type": "string"}, "user_id": {"description": "The unique identifier of the user.", "type": "string"}}, "type": "object"}}}, {"type": "function", "function": {"name": "ToReturnProcessing", "description": "Transfers work to a specialized assistant which handles processing of a product return request.\nThis assistant handles inquiries regarding return transactions, including return status, relevant dates, \nreasons for return, notes, and any other questions related to return processing.", "parameters": {"properties": {"query": {"description": "The specific return-related query, such as the status of the return, refund details, or return policy.", "type": "string"}, "user_id": {"description": "The unique identifier of the user requesting the return.", "type": "string"}}, "type": "object"}}}], "tool_calls": [{"function": {"name": "ToProductQAAssistant", "arguments": {"query": "Compare Jetson Nano and Raspberry Pi in terms of price and performance."}}}]}
+{"messages": [{"role": "user", "content": "When will I get refunded for my returned Jetson Nano?"}], "tools": [{"type": "function", "function": {"name": "HandleOtherTalk", "description": "Handles greetings and other absurd queries by offering polite redirection and clearly explaining the limitations of the chatbot.", "parameters": {"properties": {"message": {"type": "string", "description": "The message parameter"}}, "type": "object"}}}, {"type": "function", "function": {"name": "ToProductQAAssistant", "description": "Transfers work to a specialized assistant to handle Product QA. \nAnswers generic queries about products, including their descriptions, specifications, warranties, usage instructions, and troubleshooting issues.\nCan also address queries based on product manuals, product catalogs, FAQs, policy documents, and general product-related inquiries.\nCan also answer queries about the NVIDIA Gear Store's product offerings, policies, order management, shipping information, payment methods, returns, and customer service contacts.", "parameters": {"properties": {"query": {"description": "The question or issue related to the product. 
This can involve asking about product specifications, usage guidelines, troubleshooting, warranty details, or other product-related concerns.", "type": "string"}}, "type": "object"}}}, {"type": "function", "function": {"name": "ToOrderStatusAssistant", "description": "Delegates queries specifically related to orders or purchase history to a specialized assistant.\nThis assistant handles inquiries regarding Order ID, Order Date, Quantity, Order Amount, Order Status, \nand any other questions related to the user's purchase history.", "parameters": {"properties": {"query": {"description": "The specific query regarding the order or purchase history, such as order status, delivery updates, or historical purchase information.", "type": "string"}, "user_id": {"description": "The unique identifier of the user.", "type": "string"}}, "type": "object"}}}, {"type": "function", "function": {"name": "ToReturnProcessing", "description": "Transfers work to a specialized assistant which handles processing of a product return request.\nThis assistant handles inquiries regarding return transactions, including return status, relevant dates, \nreasons for return, notes, and any other questions related to return processing.", "parameters": {"properties": {"query": {"description": "The specific return-related query, such as the status of the return, refund details, or return policy.", "type": "string"}, "user_id": {"description": "The unique identifier of the user requesting the return.", "type": "string"}}, "type": "object"}}}], "tool_calls": [{"function": {"name": "ToReturnProcessing", "arguments": {"query": "refund for returned Jetson Nano", "user_id": "4165"}}}]}
+{"messages": [{"role": "user", "content": "My shield remote return is still pending. when will I recieve an update?"}], "tools": [{"type": "function", "function": {"name": "HandleOtherTalk", "description": "Handles greetings and other absurd queries by offering polite redirection and clearly explaining the limitations of the chatbot.", "parameters": {"properties": {"message": {"type": "string", "description": "The message parameter"}}, "type": "object"}}}, {"type": "function", "function": {"name": "ToProductQAAssistant", "description": "Transfers work to a specialized assistant to handle Product QA. \nAnswers generic queries about products, including their descriptions, specifications, warranties, usage instructions, and troubleshooting issues.\nCan also address queries based on product manuals, product catalogs, FAQs, policy documents, and general product-related inquiries.\nCan also answer queries about the NVIDIA Gear Store's product offerings, policies, order management, shipping information, payment methods, returns, and customer service contacts.", "parameters": {"properties": {"query": {"description": "The question or issue related to the product. 
This can involve asking about product specifications, usage guidelines, troubleshooting, warranty details, or other product-related concerns.", "type": "string"}}, "type": "object"}}}, {"type": "function", "function": {"name": "ToOrderStatusAssistant", "description": "Delegates queries specifically related to orders or purchase history to a specialized assistant.\nThis assistant handles inquiries regarding Order ID, Order Date, Quantity, Order Amount, Order Status, \nand any other questions related to the user's purchase history.", "parameters": {"properties": {"query": {"description": "The specific query regarding the order or purchase history, such as order status, delivery updates, or historical purchase information.", "type": "string"}, "user_id": {"description": "The unique identifier of the user.", "type": "string"}}, "type": "object"}}}, {"type": "function", "function": {"name": "ToReturnProcessing", "description": "Transfers work to a specialized assistant which handles processing of a product return request.\nThis assistant handles inquiries regarding return transactions, including return status, relevant dates, \nreasons for return, notes, and any other questions related to return processing.", "parameters": {"properties": {"query": {"description": "The specific return-related query, such as the status of the return, refund details, or return policy.", "type": "string"}, "user_id": {"description": "The unique identifier of the user requesting the return.", "type": "string"}}, "type": "object"}}}], "tool_calls": [{"function": {"name": "ToReturnProcessing", "arguments": {"query": "Shield remote return pending update", "user_id": "4165"}}}]}
+{"messages": [{"role": "user", "content": "I remember last year there were some Lululemon clothes in the gear store. will those be restocked?"}], "tools": [{"type": "function", "function": {"name": "HandleOtherTalk", "description": "Handles greetings and other absurd queries by offering polite redirection and clearly explaining the limitations of the chatbot.", "parameters": {"properties": {"message": {"type": "string", "description": "The message parameter"}}, "type": "object"}}}, {"type": "function", "function": {"name": "ToProductQAAssistant", "description": "Transfers work to a specialized assistant to handle Product QA. \nAnswers generic queries about products, including their descriptions, specifications, warranties, usage instructions, and troubleshooting issues.\nCan also address queries based on product manuals, product catalogs, FAQs, policy documents, and general product-related inquiries.\nCan also answer queries about the NVIDIA Gear Store's product offerings, policies, order management, shipping information, payment methods, returns, and customer service contacts.", "parameters": {"properties": {"query": {"description": "The question or issue related to the product. 
This can involve asking about product specifications, usage guidelines, troubleshooting, warranty details, or other product-related concerns.", "type": "string"}}, "type": "object"}}}, {"type": "function", "function": {"name": "ToOrderStatusAssistant", "description": "Delegates queries specifically related to orders or purchase history to a specialized assistant.\nThis assistant handles inquiries regarding Order ID, Order Date, Quantity, Order Amount, Order Status, \nand any other questions related to the user's purchase history.", "parameters": {"properties": {"query": {"description": "The specific query regarding the order or purchase history, such as order status, delivery updates, or historical purchase information.", "type": "string"}, "user_id": {"description": "The unique identifier of the user.", "type": "string"}}, "type": "object"}}}, {"type": "function", "function": {"name": "ToReturnProcessing", "description": "Transfers work to a specialized assistant which handles processing of a product return request.\nThis assistant handles inquiries regarding return transactions, including return status, relevant dates, \nreasons for return, notes, and any other questions related to return processing.", "parameters": {"properties": {"query": {"description": "The specific return-related query, such as the status of the return, refund details, or return policy.", "type": "string"}, "user_id": {"description": "The unique identifier of the user requesting the return.", "type": "string"}}, "type": "object"}}}], "tool_calls": [{"function": {"name": "ToProductQAAssistant", "arguments": {"query": "Will Lululemon clothes be restocked in the NVIDIA Gear Store?"}}}]}
+{"messages": [{"role": "user", "content": "what tshirts did I over the past year?"}], "tools": [{"type": "function", "function": {"name": "HandleOtherTalk", "description": "Handles greetings and other absurd queries by offering polite redirection and clearly explaining the limitations of the chatbot.", "parameters": {"properties": {"message": {"type": "string", "description": "The message parameter"}}, "type": "object"}}}, {"type": "function", "function": {"name": "ToProductQAAssistant", "description": "Transfers work to a specialized assistant to handle Product QA. \nAnswers generic queries about products, including their descriptions, specifications, warranties, usage instructions, and troubleshooting issues.\nCan also address queries based on product manuals, product catalogs, FAQs, policy documents, and general product-related inquiries.\nCan also answer queries about the NVIDIA Gear Store's product offerings, policies, order management, shipping information, payment methods, returns, and customer service contacts.", "parameters": {"properties": {"query": {"description": "The question or issue related to the product. 
This can involve asking about product specifications, usage guidelines, troubleshooting, warranty details, or other product-related concerns.", "type": "string"}}, "type": "object"}}}, {"type": "function", "function": {"name": "ToOrderStatusAssistant", "description": "Delegates queries specifically related to orders or purchase history to a specialized assistant.\nThis assistant handles inquiries regarding Order ID, Order Date, Quantity, Order Amount, Order Status, \nand any other questions related to the user's purchase history.", "parameters": {"properties": {"query": {"description": "The specific query regarding the order or purchase history, such as order status, delivery updates, or historical purchase information.", "type": "string"}, "user_id": {"description": "The unique identifier of the user.", "type": "string"}}, "type": "object"}}}, {"type": "function", "function": {"name": "ToReturnProcessing", "description": "Transfers work to a specialized assistant which handles processing of a product return request.\nThis assistant handles inquiries regarding return transactions, including return status, relevant dates, \nreasons for return, notes, and any other questions related to return processing.", "parameters": {"properties": {"query": {"description": "The specific return-related query, such as the status of the return, refund details, or return policy.", "type": "string"}, "user_id": {"description": "The unique identifier of the user requesting the return.", "type": "string"}}, "type": "object"}}}], "tool_calls": [{"function": {"name": "ToProductQAAssistant", "arguments": {"query": "tshirts purchased over the past year by user 4165"}}}]}
+{"messages": [{"role": "user", "content": "what's the difference between the two rulers available?"}], "tools": [{"type": "function", "function": {"name": "HandleOtherTalk", "description": "Handles greetings and other absurd queries by offering polite redirection and clearly explaining the limitations of the chatbot.", "parameters": {"properties": {"message": {"type": "string", "description": "The message parameter"}}, "type": "object"}}}, {"type": "function", "function": {"name": "ToProductQAAssistant", "description": "Transfers work to a specialized assistant to handle Product QA. \nAnswers generic queries about products, including their descriptions, specifications, warranties, usage instructions, and troubleshooting issues.\nCan also address queries based on product manuals, product catalogs, FAQs, policy documents, and general product-related inquiries.\nCan also answer queries about the NVIDIA Gear Store's product offerings, policies, order management, shipping information, payment methods, returns, and customer service contacts.", "parameters": {"properties": {"query": {"description": "The question or issue related to the product. 
This can involve asking about product specifications, usage guidelines, troubleshooting, warranty details, or other product-related concerns.", "type": "string"}}, "type": "object"}}}, {"type": "function", "function": {"name": "ToOrderStatusAssistant", "description": "Delegates queries specifically related to orders or purchase history to a specialized assistant.\nThis assistant handles inquiries regarding Order ID, Order Date, Quantity, Order Amount, Order Status, \nand any other questions related to the user's purchase history.", "parameters": {"properties": {"query": {"description": "The specific query regarding the order or purchase history, such as order status, delivery updates, or historical purchase information.", "type": "string"}, "user_id": {"description": "The unique identifier of the user.", "type": "string"}}, "type": "object"}}}, {"type": "function", "function": {"name": "ToReturnProcessing", "description": "Transfers work to a specialized assistant which handles processing of a product return request.\nThis assistant handles inquiries regarding return transactions, including return status, relevant dates, \nreasons for return, notes, and any other questions related to return processing.", "parameters": {"properties": {"query": {"description": "The specific return-related query, such as the status of the return, refund details, or return policy.", "type": "string"}, "user_id": {"description": "The unique identifier of the user requesting the return.", "type": "string"}}, "type": "object"}}}], "tool_calls": [{"function": {"name": "ToProductQAAssistant", "arguments": {"query": "What is the difference between the two rulers available?"}}}]}
+{"messages": [{"role": "user", "content": "Are there any purple women's tshirts?"}], "tools": [{"type": "function", "function": {"name": "HandleOtherTalk", "description": "Handles greetings and other absurd queries by offering polite redirection and clearly explaining the limitations of the chatbot.", "parameters": {"properties": {"message": {"type": "string", "description": "The message parameter"}}, "type": "object"}}}, {"type": "function", "function": {"name": "ToProductQAAssistant", "description": "Transfers work to a specialized assistant to handle Product QA. \nAnswers generic queries about products, including their descriptions, specifications, warranties, usage instructions, and troubleshooting issues.\nCan also address queries based on product manuals, product catalogs, FAQs, policy documents, and general product-related inquiries.\nCan also answer queries about the NVIDIA Gear Store's product offerings, policies, order management, shipping information, payment methods, returns, and customer service contacts.", "parameters": {"properties": {"query": {"description": "The question or issue related to the product. 
This can involve asking about product specifications, usage guidelines, troubleshooting, warranty details, or other product-related concerns.", "type": "string"}}, "type": "object"}}}, {"type": "function", "function": {"name": "ToOrderStatusAssistant", "description": "Delegates queries specifically related to orders or purchase history to a specialized assistant.\nThis assistant handles inquiries regarding Order ID, Order Date, Quantity, Order Amount, Order Status, \nand any other questions related to the user's purchase history.", "parameters": {"properties": {"query": {"description": "The specific query regarding the order or purchase history, such as order status, delivery updates, or historical purchase information.", "type": "string"}, "user_id": {"description": "The unique identifier of the user.", "type": "string"}}, "type": "object"}}}, {"type": "function", "function": {"name": "ToReturnProcessing", "description": "Transfers work to a specialized assistant which handles processing of a product return request.\nThis assistant handles inquiries regarding return transactions, including return status, relevant dates, \nreasons for return, notes, and any other questions related to return processing.", "parameters": {"properties": {"query": {"description": "The specific return-related query, such as the status of the return, refund details, or return policy.", "type": "string"}, "user_id": {"description": "The unique identifier of the user requesting the return.", "type": "string"}}, "type": "object"}}}], "tool_calls": [{"function": {"name": "ToProductQAAssistant", "arguments": {"query": "Are there any purple women's t-shirts?"}}}]}
+{"messages": [{"role": "user", "content": "What is the order status for NVIDIA GEFORCE RTX 4070 SUPER"}], "tools": [{"type": "function", "function": {"name": "HandleOtherTalk", "description": "Handles greetings and other absurd queries by offering polite redirection and clearly explaining the limitations of the chatbot.", "parameters": {"properties": {"message": {"type": "string", "description": "The message parameter"}}, "type": "object"}}}, {"type": "function", "function": {"name": "ToProductQAAssistant", "description": "Transfers work to a specialized assistant to handle Product QA. \nAnswers generic queries about products, including their descriptions, specifications, warranties, usage instructions, and troubleshooting issues.\nCan also address queries based on product manuals, product catalogs, FAQs, policy documents, and general product-related inquiries.\nCan also answer queries about the NVIDIA Gear Store's product offerings, policies, order management, shipping information, payment methods, returns, and customer service contacts.", "parameters": {"properties": {"query": {"description": "The question or issue related to the product. 
This can involve asking about product specifications, usage guidelines, troubleshooting, warranty details, or other product-related concerns.", "type": "string"}}, "type": "object"}}}, {"type": "function", "function": {"name": "ToOrderStatusAssistant", "description": "Delegates queries specifically related to orders or purchase history to a specialized assistant.\nThis assistant handles inquiries regarding Order ID, Order Date, Quantity, Order Amount, Order Status, \nand any other questions related to the user's purchase history.", "parameters": {"properties": {"query": {"description": "The specific query regarding the order or purchase history, such as order status, delivery updates, or historical purchase information.", "type": "string"}, "user_id": {"description": "The unique identifier of the user.", "type": "string"}}, "type": "object"}}}, {"type": "function", "function": {"name": "ToReturnProcessing", "description": "Transfers work to a specialized assistant which handles processing of a product return request.\nThis assistant handles inquiries regarding return transactions, including return status, relevant dates, \nreasons for return, notes, and any other questions related to return processing.", "parameters": {"properties": {"query": {"description": "The specific return-related query, such as the status of the return, refund details, or return policy.", "type": "string"}, "user_id": {"description": "The unique identifier of the user requesting the return.", "type": "string"}}, "type": "object"}}}], "tool_calls": [{"function": {"name": "ToOrderStatusAssistant", "arguments": {"query": "What is the order status for NVIDIA GEFORCE RTX 4070 SUPER", "user_id": "4165"}}}]}
diff --git a/nemo/Evaluator/helpers.py b/nemo/Evaluator/helpers.py
new file mode 100644
index 000000000..26501798c
--- /dev/null
+++ b/nemo/Evaluator/helpers.py
@@ -0,0 +1,33 @@
+import requests
+from huggingface_hub import HfApi
+
+
+def create_namespaces(entity_host, ds_host, namespace):
+    # Create the namespace in Entity Store
+    entity_store_url = f"{entity_host}/v1/namespaces"
+    resp = requests.post(entity_store_url, json={"id": namespace}, timeout=30)
+    assert resp.status_code in (200, 201, 409, 422), \
+        f"Unexpected response from Entity Store during Namespace creation: {resp.status_code}"
+
+    # Create the namespace in the datastore (sent as form data, not JSON)
+    nds_url = f"{ds_host}/v1/datastore/namespaces"
+    resp = requests.post(nds_url, data={"namespace": namespace}, timeout=30)
+    assert resp.status_code in (200, 201, 409, 422), \
+        f"Unexpected response from datastore during Namespace creation: {resp.status_code}"
+
+
+def setup_dataset_repo(hf_api, namespace, dataset_name, entity_host):
+    repo_id = f"{namespace}/{dataset_name}"
+    # Create the dataset repo in the datastore (no error if it already exists)
+    hf_api.create_repo(repo_id, repo_type="dataset", exist_ok=True)
+    # Register the dataset in Entity Store
+    entity_store_url = f"{entity_host}/v1/datasets"
+    payload = {
+        "name": dataset_name,
+        "namespace": namespace,
+        "files_url": f"hf://datasets/{repo_id}",
+    }
+    resp = requests.post(entity_store_url, json=payload, timeout=30)
+    assert resp.status_code in (200, 201, 409, 422), \
+        f"Unexpected response from Entity Store creating dataset: {resp.status_code}"
+    return repo_id
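
Review note on `helpers.py`: all three status checks use `assert`, which Python strips when run with `-O`, so an unexpected response would then pass silently. A minimal sketch of an explicit check follows. It is not part of the patch: the name `check_status` and the choice of `RuntimeError` are assumptions for illustration, and the accepted codes are copied from the helpers above.

```python
from types import SimpleNamespace

# Same status codes the helpers in this patch accept.
ACCEPTED_STATUS_CODES = (200, 201, 409, 422)


def check_status(resp, context, accepted=ACCEPTED_STATUS_CODES):
    """Return resp if its status code is accepted, else raise RuntimeError.

    Hypothetical helper: works with any object exposing `status_code`,
    such as a `requests.Response`.
    """
    if resp.status_code not in accepted:
        raise RuntimeError(
            f"Unexpected response from {context}: {resp.status_code}"
        )
    return resp


# Stand-in for a requests.Response so the sketch runs without a live service.
already_exists = SimpleNamespace(status_code=409)
check_status(already_exists, "Entity Store during Namespace creation")
```

Each `assert resp.status_code in (...)` could then become a single `check_status(resp, "...")` call, which keeps the error text but still fires under `python -O`.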