fixed grammatical mistakes

giriraj-singh-couchbase · giriraj-singh-couchbase · commit 69fe694e129b · 2025-09-24T23:28:16.000+05:30
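
The notebook's "How the Ranking Works" section says results are sorted by the vector similarity between the query embedding and the stored document embeddings. Below is a minimal plain-Python sketch of that ranking idea. It makes three assumptions. First, it uses cosine similarity as the metric; Couchbase actually uses whichever similarity metric the vector index was configured with. Second, the document IDs (`hotel_a` and so on) are hypothetical. Third, it uses toy 3-dimensional vectors. It is not how `similarity_search` is implemented internally.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], docs: dict[str, list[float]], k: int = 3) -> list[tuple[int, str, float]]:
    """Return (rank, doc_id, score) for the k documents most similar to the query."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in docs.items()]
    # Higher similarity = closer semantic meaning, so sort in descending order.
    scored.sort(key=lambda item: item[1], reverse=True)
    # Rank 1 is the most similar document, mirroring the notebook's printed output.
    return [(rank, doc_id, score) for rank, (doc_id, score) in enumerate(scored[:k], start=1)]


# Hypothetical toy embeddings; real nvidia/nv-embedqa-e5-v5 vectors are much longer.
docs = {
    "hotel_a": [0.9, 0.1, 0.0],
    "hotel_b": [0.1, 0.9, 0.0],
    "hotel_c": [0.7, 0.7, 0.1],
    "hotel_d": [0.0, 0.0, 1.0],
}
query_vec = [1.0, 0.0, 0.0]  # Stand-in for the embedded query text

for rank, doc_id, score in top_k(query_vec, docs, k=3):
    print(f"{rank}. {doc_id} (score={score:.3f})")
```

With these toy vectors, `hotel_a` ranks first, followed by `hotel_c` and then `hotel_b`. `hotel_d` falls outside the top 3 because it points in an unrelated direction.
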
diff --git a/autovec-tutorial/autovec_langchain.ipynb b/autovec-tutorial/autovec_langchain.ipynb
@@ -27,11 +27,10 @@
     "jp-MarkdownHeadingCollapsed": true
    },
    "source": [
-    "\n",
-    "# 1. Create and Deploy Your Free Tier Operational cluster on Capella,\n",
+    "# 1. Create and Deploy Your Free Tier Operational Cluster on Capella\n",
     " To get started with Couchbase Capella, create an account and use it to deploy a cluster. To know more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html).\n",
-    " ### Couchbase Capella Configuration,\n",
-    " When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met.\n",
+    " ### Couchbase Capella Configuration\n",
+    " When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met:\n",
     "   * Create the [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) to access the travel-sample bucket (Read and Write) used in the application.\n",
     "   * [Allow access](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) to the Cluster from the IP on which the application is running."
    ]
@@ -41,12 +40,11 @@
    "id": "4369c925-adbc-4c7d-9ea6-04ff020cb1a6",
    "metadata": {},
    "source": [
-    "\n",
     "# 2. Data Upload and Preparation\n",
     "\n",
-    "There are various techniques which exists to insert the data in the cluster, to read about the techniques please follow the [sample-data import](https://docs.couchbase.com/cloud/clusters/data-service/import-data-documents.html#import-sample-data) guide.\n",
+    "There are various ways to insert data into the cluster. To learn about them, please follow the [sample-data import](https://docs.couchbase.com/cloud/clusters/data-service/import-data-documents.html#import-sample-data) guide.\n",
     "\n",
-    "After data upload is comlete, follow the next steps to achieve vectorization for your required fields.\n"
+    "After data upload is complete, follow the next steps to achieve vectorization for your required fields."
    ]
   },
   {
@@ -55,27 +53,27 @@
    "metadata": {},
    "source": [
     "# 3. Deploying the Model\n",
-    "Now, before we actually create embedding for the documents we need to deploy a model which will create the embedding for us.\n",
-    "## 3.1: Selecting the model \n",
-    "1. To select the model, you first need to navigate to the \"<B>AI Services</B>\" tab, then selecting \"<B>Models</B>\" and clicking on \"<B>Deploy New Model</B>\"\n",
+    "Now, before we actually create embeddings for the documents, we need to deploy a model that will create the embeddings for us.\n",
+    "## 3.1 Selecting the Model\n",
+    "1. To select the model, you first need to navigate to the \"<B>AI Services</B>\" tab, then select \"<B>Models</B>\" and click on \"<B>Deploy New Model</B>\".\n",
     "   \n",
     "   <img src=\"./img/importing_model.png\" width=\"950px\" height=\"500px\" style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
     "\n",
-    "3. Enter the <B>model name</B>, and choose the model that you want to deploy. After Selecting your model, choose the <B>model infrastructure</B> and <B>region</B> where the model will be deployed.\n",
+    "2. Enter the <B>model name</B>, and choose the model that you want to deploy. After selecting your model, choose the <B>model infrastructure</B> and <B>region</B> where the model will be deployed.\n",
     "   \n",
     "   <img src=\"./img/deploying_model.png\" width=\"800px\" height=\"800px\" style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
     "\n",
-    "## 3.2 Access control to the model\n",
+    "## 3.2 Access Control to the Model\n",
     "\n",
-    "1. After deploying the model, go to the \"<B>Models</B>\" tab in the <B>AI-services</B> and click on \"<B>setup access</B>\".\n",
+    "1. After deploying the model, go to the \"<B>Models</B>\" tab under <B>AI Services</B> and click on \"<B>Setup Access</B>\".\n",
     "\n",
     "    <img src=\"./img/model_setup_access.png\" width=\"1100px\" height=\"400px\" style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
     "\n",
-    "3. Enter your <B>api_key_name</B>, <B>expiration time</B> and the <B>IP-address</B> from which you will be accessing the model.\n",
+    "2. Enter your <B>API key name</B>, <B>expiration time</B> and the <B>IP address</B> from which you will be accessing the model.\n",
     "\n",
     "    <img src=\"./img/model_api_key_form.png\" width=\"1100px\" height=\"600px\" style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
     "\n",
-    "4. Download your API key\n",
+    "3. Download your API key.\n",
     "\n",
     "   <img src=\"./img/download_api_key_details.png\" width=\"1200px\" height=\"800px\" style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">"
    ]
@@ -87,42 +85,42 @@
    "source": [
     "# 4. Deploying AutoVectorization Workflow\n",
     "\n",
-    "Now, we are in the step which will help us create the embeddings/vectors. To proceed with the process of craetion of vectorization please follow the steps below:\n",
+    "Now, we are at the step that will help us create the embeddings/vectors. To proceed with the vectorization process, please follow the steps below:\n",
     "\n",
-    "1. For deploying the autovectorization, you need to go to the <B>`ai-services`</B> tab, then click on the <B>`workflows`</B>, and then click on <B>`Create New Workflow`</B>.\n",
+    "1. To deploy the AutoVectorization workflow, go to the <B>`AI Services`</B> tab, click on <B>`Workflows`</B>, and then click on <B>`Create New Workflow`</B>.\n",
     "\n",
     "   <img src=\"./img/workflow.png\" width=\"1000px\" height=\"500px\" style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
     "   \n",
-    "2. Start your workflow deployment by giving it a name, and selecting from where your data will be provided to the auto-vectorization service. There are currently 3 options, <B>`pre-processed data(JSON format) from capella`</B>, <B>`pre-processed data(JSON format) from external sources(S3 buckets)`</B> and <B>`unstructured data from external sources (S3 buckets)`</B>. For this tutorial we will be choosing first option which is pre-processed data from capella.\n",
+    "2. Start your workflow deployment by giving it a name and selecting where your data will be provided to the auto-vectorization service. There are currently 3 options: <B>`pre-processed data (JSON format) from Capella`</B>, <B>`pre-processed data (JSON format) from external sources (S3 buckets)`</B> and <B>`unstructured data from external sources (S3 buckets)`</B>. For this tutorial, we will choose the first option, which is pre-processed data from Capella.\n",
     "\n",
     "   <img src=\"./img/start_workflow.png\" width=\"1000px\" height=\"500px\" style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
     "\n",
     "3. Now, select the <B>`cluster`</B>, <B>`bucket`</B>, <B>`scope`</B> and <B>`collection`</B> from which you want to select the documents and get the data vectorized.\n",
     "\n",
     "   <img src=\"./img/vector_data_source.png\" width=\"1000px\" height=\"500px\" style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
     "\n",
-    "4. <B>Field Mapping</B> will be used to tell the AutoVectorize service that which data will be converted to embeddings.\n",
+    "4. <B>Field Mapping</B> tells the AutoVectorize service which fields will be converted to embeddings.\n",
     "\n",
-    "   There are two options:-\n",
+    "   There are two options:\n",
     "\n",
     "   - <B>All source fields</B> - This feature will convert all your fields inside the document to a single vector field.\n",
     "   \n",
     "     <img src=\"./img/vector_all_field_mapping.png\" width=\"900px\" height=\"400px\" style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
     "\n",
     "\n",
-    "   - <B>Custom source fields</B> - This feature will convert specific fields which are chosen by the user to a single vector field, in the image below we have chosen <B>`address`</B>, <B>`description`</B> and <B>`id`</B> as the fields to be converted to a vector having the name as <B>`vec_addr_decr_id_mapping`</B>.\n",
+    "   - <B>Custom source fields</B> - This feature will convert specific fields chosen by the user to a single vector field. In the image below, we have chosen <B>`address`</B>, <B>`description`</B> and <B>`id`</B> as the fields to be converted to a vector with the name <B>`vec_addr_decr_id_mapping`</B>.\n",
     "  \n",
     "       <img src=\"./img/vector_custom_field_mapping.png\" width=\"900px\" height=\"400px\" style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
     "  \n",
-    "5. After choosing your type of mapping, you will be required to either have an index on the new vector_embedding field or you can skip the creation of vector index which is not recommended as you will be losing out the functionality of vector searching.\n",
+    "5. After choosing your type of mapping, you can either create a vector index on the new vector_embedding field or skip creating one. Skipping is not recommended, as without a vector index you lose vector search functionality.\n",
     "\n",
     "   <img src=\"./img/vector_index.png\" width=\"1200px\" height=\"500px\" style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
     "\n",
-    "6. All the steps mentioned above.\n",
+    "6. Review all the steps above. The screenshot below summarizes the whole process of deploying the AutoVectorization workflow.\n",
     "\n",
     "   <img src=\"./img/vector_index_page.png\" width=\"1200px\" height=\"1200px\" style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
     "   \n",
-    "- After this step your vector-embedding for the selected fields must be ready and you can check it out in the capella UI. Now, in the next step we will demonsterate how we can use the generated vectors to do the vector search."
+    "After this step, your vector embeddings for the selected fields should be ready, and you can check them out in the Capella UI. In the next step, we will demonstrate how we can use the generated vectors to perform vector search."
    ]
   },
   {
@@ -169,11 +167,11 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "endpoint = \"couchbases://cb.xyz.com\" # Replace this with Connection String\n",
+    "endpoint = \"couchbases://cb.xyz.com\"  # Replace this with Connection String\n",
     "username = \"YOUR_USERNAME\"  # Replace this with your username\n",
     "password = \"YOUR_PASSWORD\"  # Replace this with your password\n",
     "auth = PasswordAuthenticator(username, password)\n",
-    "# Configure cluster options with SSL verification disabled for testing, in production you should enable it\n",
+    "# Configure cluster options with SSL verification disabled for testing; in production you should enable it\n",
     "options = ClusterOptions(auth, tls_verify='none')\n",
     "options.apply_profile(\"wan_development\")\n",
     "cluster = Cluster(endpoint, options)"
@@ -200,12 +198,12 @@
     "bucket_name = \"travel-sample\"\n",
     "scope_name = \"inventory\"\n",
     "collection_name = \"hotel\"\n",
-    "index_name = \"hybrid_autovec_workflow_vec_addr_descr_id\"  # This is the name of the search index which was created in the step 4.5 and can also be seen in the search tab of the cluster.\n",
-    "                                                          # It shall be noted that hybrid_workflow_name_index_fieldname is the naming convention for the index created by AutoVectorization workflow where\n",
+    "index_name = \"hybrid_autovec_workflow_vec_addr_descr_id\"  # This is the name of the search index that was created in step 4.5 and can also be seen in the search tab of the cluster.\n",
+    "                                                          # Note that hybrid_workflow_name_index_fieldname is the naming convention for the index created by the AutoVectorization workflow, where\n",
     "                                                          # fieldname is the name of the field being indexed.\n",
     "embedder = NVIDIAEmbeddings(\n",
-    "    model=\"nvidia/nv-embedqa-e5-v5\",                      # This is the model which will be used to create the embedding of the query.\n",
-    "    api_key=\"nvapi-xyz\"                                   # This is the api key using which your model will be accessed.\n",
+    "    model=\"nvidia/nv-embedqa-e5-v5\",                      # This is the model that will be used to create the embedding of the query.\n",
+    "    api_key=\"nvapi-xyz\"                                   # This is the API key that will be used to access your model.\n",
     ")"
    ]
   },
@@ -237,8 +235,8 @@
     "    collection_name=collection_name,\n",
     "    embedding=embedder,\n",
     "    index_name=index_name,\n",
-    "    text_key=\"address\",                  # your document's text field\n",
-    "    embedding_key=\"vec_addr_descr_id\"    # this is the field in which your vector(embedding) is stored in the cluster.\n",
+    "    text_key=\"address\",                  # Your document's text field\n",
+    "    embedding_key=\"vec_addr_descr_id\"    # This is the field in which your vector (embedding) is stored in the cluster.\n",
     ")"
    ]
   },
@@ -257,7 +255,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 18,
+   "execution_count": null,
    "id": "177fd6d5",
    "metadata": {},
    "outputs": [
@@ -275,7 +273,7 @@
     "query = \"Woodhead Road\"\n",
     "results = vector_store.similarity_search(query, k=3)\n",
     "\n",
-    "# Printing out the top-k results\n",
+    "# Print out the top-k results\n",
     "for rank, doc in enumerate(results, start=1):\n",
     "    title = doc.metadata.get(\"title\", \"<no title>\")\n",
     "    address_text = doc.page_content\n",
@@ -289,20 +287,20 @@
    "source": [
     "## 6. Results and Interpretation\n",
     "\n",
-    "As we can see 3 (or `k`) ranked results printed in the output.\n",
+    "As we can see, 3 (or `k`) ranked results are printed in the output.\n",
     "\n",
     "### What Each Part Means\n",
-    "- Leading number (1,2,3): The result rank (1 = most similar to your query).\n",
+    "- Leading number (1, 2, 3): The result rank (1 = most similar to your query).\n",
     "- Title: Pulled from `doc.metadata.get(\"title\", \"<no title>\")`. If your documents don't contain a `title` field, you will see `<no title>`.\n",
     "- Address text: This is the value of the field you configured as `text_key` (in this tutorial: `address`). It represents the human-readable content we chose to display.\n",
     "\n",
-    "### How The Ranking Works\n",
+    "### How the Ranking Works\n",
     "1. Your natural language query (e.g., `\"Woodhead Road\"`) is embedded using the NVIDIA model (`nvidia/nv-embedqa-e5-v5`).\n",
     "2. The vector store compares the query embedding to stored document embeddings in the field you configured (`embedding_key = \"vec_addr_descr_id\"`).\n",
-    "3. Results are sorted by vector similarity. Higher similarity = nearer semantic meaning.\n",
+    "3. Results are sorted by vector similarity. Higher similarity = closer semantic meaning.\n",
     "\n",
     "\n",
-    "> Your vector search pipeline is working if the returned documents feel meaningfully related to your natural language query—even when exact keywords do not match. Feel free to experiment with increasingly descriptive queries to observe the semantic power of the embeddings.\n"
+    "> Your vector search pipeline is working if the returned documents feel meaningfully related to your natural language query—even when exact keywords do not match. Feel free to experiment with increasingly descriptive queries to observe the semantic power of the embeddings."
    ]
   }
  ],