tutorial/markdown/python/python-haystack-pdf-chat/query_based/python-haystack-pdf-chat.md
+6 -3 (6 additions & 3 deletions)
@@ -257,11 +257,14 @@ Haystack is a powerful library that simplifies the process of building applicati
 In the PDF Chat app, Haystack is used for several tasks:
 
 - **Loading and processing PDF documents**: Haystack's [_PyPDFToDocument_](https://docs.haystack.deepset.ai/docs/pypdftodocument) component can convert PDF files into Haystack Document objects, which can hold various types of content, including text, metadata, and embeddings.
-- **Text splitting**: Haystack's [_DocumentSplitter_](https://docs.haystack.deepset.ai/docs/documentsplitter) is used to split the text from the PDF documents into smaller chunks or passages, which are more suitable for embedding and retrieval.
+- **Text preprocessing**: Haystack's [_DocumentCleaner_](https://docs.haystack.deepset.ai/docs/documentcleaner) removes unwanted characters and formatting from the extracted text, while [_DocumentSplitter_](https://docs.haystack.deepset.ai/docs/documentsplitter) splits the text from the PDF documents into smaller chunks or passages, which are more suitable for embedding and retrieval.
+- **Embedding generation**: Haystack's [_OpenAIDocumentEmbedder_](https://docs.haystack.deepset.ai/docs/openaidocumentembedder) and [_OpenAITextEmbedder_](https://docs.haystack.deepset.ai/docs/openaitextembedder) components convert text into vector embeddings using OpenAI's embedding models, enabling semantic search capabilities.
+- **Text generation**: Haystack's [_OpenAIGenerator_](https://docs.haystack.deepset.ai/docs/openaigenerator) component interfaces with OpenAI's language models to generate human-like responses based on the retrieved context and user queries.
+- **Document storage**: Haystack's [_DocumentWriter_](https://docs.haystack.deepset.ai/docs/documentwriter) component handles the storage of processed documents and their embeddings into the vector store.
 - **Vector store integration**: Haystack provides a [CouchbaseDocumentStore](https://haystack.deepset.ai/integrations/couchbase-document-store) class that seamlessly integrates with Couchbase's Vector Search, allowing the app to store and search through the embeddings and their corresponding text.
 - **Pipelines**: Haystack uses [Pipelines](https://docs.haystack.deepset.ai/docs/pipelines) to combine different components for various tasks. In this app, we have an indexing pipeline for processing and storing documents, and a RAG pipeline for retrieval and generation.
-- **Prompt Building**: Haystack's [PromptBuilder](https://docs.haystack.deepset.ai/docs/promptbuilder) component allows you to create custom prompts that guide the language model's behavior and output.
-- **Streaming Output**: LangChain supports [streaming](https://docs.langchain.com/oss/python/langchain/streaming), allowing the app to stream the generated answer to the client in real-time.
+- **Prompt Building**: Haystack's [_PromptBuilder_](https://docs.haystack.deepset.ai/docs/promptbuilder) component allows you to create custom prompts that guide the language model's behavior and output.
+- **Answer Building**: Haystack's [_AnswerBuilder_](https://docs.haystack.deepset.ai/docs/answerbuilder) component structures the final response by combining the generated text with metadata and source information.
 
 By combining Vector Search with Couchbase, RAG, and Haystack, the PDF Chat app can efficiently ingest PDF documents, convert their content into searchable embeddings, retrieve relevant information based on user queries and conversation context, and generate context-aware and informative responses using large language models.
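
As an illustration of the text preprocessing step described in this change, the following is a minimal sketch in plain Python of cleaning extracted text and splitting it into overlapping word chunks. It is a conceptual stand-in only and does not use Haystack's DocumentCleaner or DocumentSplitter; the function names `clean_text` and `split_into_chunks` and the `chunk_size` and `overlap` parameters are hypothetical and chosen for illustration.

```python
def clean_text(text: str) -> str:
    """Collapse runs of whitespace (spaces, tabs, newlines) into single spaces."""
    return " ".join(text.split())


def split_into_chunks(text: str, chunk_size: int = 5, overlap: int = 2) -> list[str]:
    """Split text into chunks of `chunk_size` words, sharing `overlap` words.

    Hypothetical helper: it sketches the idea of passage splitting and is
    not the Haystack implementation.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    raw = "  PDF   text often\n\ncontains  stray\twhitespace and line breaks  "
    for chunk in split_into_chunks(clean_text(raw), chunk_size=5, overlap=2):
        print(chunk)
```

The overlap keeps a few words of shared context between neighbouring chunks, so a sentence that straddles a chunk boundary can still be matched at retrieval time.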
tutorial/markdown/python/python-haystack-pdf-chat/search_based/python-haystack-pdf-chat.md
+6 -3 (6 additions & 3 deletions)
@@ -288,11 +288,14 @@ Haystack is a powerful library that simplifies the process of building applicati
 In the PDF Chat app, Haystack is used for several tasks:
 
 - **Loading and processing PDF documents**: Haystack's [_PyPDFToDocument_](https://docs.haystack.deepset.ai/docs/pypdftodocument) component can convert PDF files into Haystack Document objects, which can hold various types of content, including text, metadata, and embeddings.
-- **Text splitting**: Haystack's [_DocumentSplitter_](https://docs.haystack.deepset.ai/docs/documentsplitter) is used to split the text from the PDF documents into smaller chunks or passages, which are more suitable for embedding and retrieval.
+- **Text preprocessing**: Haystack's [_DocumentCleaner_](https://docs.haystack.deepset.ai/docs/documentcleaner) removes unwanted characters and formatting from the extracted text, while [_DocumentSplitter_](https://docs.haystack.deepset.ai/docs/documentsplitter) splits the text from the PDF documents into smaller chunks or passages, which are more suitable for embedding and retrieval.
+- **Embedding generation**: Haystack's [_OpenAIDocumentEmbedder_](https://docs.haystack.deepset.ai/docs/openaidocumentembedder) and [_OpenAITextEmbedder_](https://docs.haystack.deepset.ai/docs/openaitextembedder) components convert text into vector embeddings using OpenAI's embedding models, enabling semantic search capabilities.
+- **Text generation**: Haystack's [_OpenAIGenerator_](https://docs.haystack.deepset.ai/docs/openaigenerator) component interfaces with OpenAI's language models to generate human-like responses based on the retrieved context and user queries.
+- **Document storage**: Haystack's [_DocumentWriter_](https://docs.haystack.deepset.ai/docs/documentwriter) component handles the storage of processed documents and their embeddings into the vector store.
 - **Vector store integration**: Haystack provides a [CouchbaseDocumentStore](https://haystack.deepset.ai/integrations/couchbase-document-store) class that seamlessly integrates with Couchbase's Vector Search, allowing the app to store and search through the embeddings and their corresponding text.
 - **Pipelines**: Haystack uses [Pipelines](https://docs.haystack.deepset.ai/docs/pipelines) to combine different components for various tasks. In this app, we have an indexing pipeline for processing and storing documents, and a RAG pipeline for retrieval and generation.
-- **Prompt Building**: Haystack's [PromptBuilder](https://docs.haystack.deepset.ai/docs/promptbuilder) component allows you to create custom prompts that guide the language model's behavior and output.
-- **Streaming Output**: LangChain supports [streaming](https://python.langchain.com/docs/expression_language/streaming/), allowing the app to stream the generated answer to the client in real-time.
+- **Prompt Building**: Haystack's [_PromptBuilder_](https://docs.haystack.deepset.ai/docs/promptbuilder) component allows you to create custom prompts that guide the language model's behavior and output.
+- **Answer Building**: Haystack's [_AnswerBuilder_](https://docs.haystack.deepset.ai/docs/answerbuilder) component structures the final response by combining the generated text with metadata and source information.
 
 By combining Vector Search with Couchbase, RAG, and Haystack, the PDF Chat app can efficiently ingest PDF documents, convert their content into searchable embeddings, retrieve relevant information based on user queries and conversation context, and generate context-aware and informative responses using large language models.
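
The retrieval and prompt-building flow summarized in the closing paragraph can be sketched in plain Python. This is an illustrative toy under stated assumptions, not the app's code: the hand-written two-dimensional vectors stand in for OpenAI embeddings, the in-memory list stands in for the Couchbase vector store, and `cosine_similarity`, `retrieve`, and `build_prompt` are hypothetical helpers rather than Haystack components.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec: list[float],
             store: list[tuple[str, list[float]]],
             top_k: int = 2) -> list[str]:
    """Return the texts of the `top_k` stored passages most similar to the query."""
    ranked = sorted(store,
                    key=lambda item: cosine_similarity(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:top_k]]


def build_prompt(question: str, passages: list[str]) -> str:
    """Assemble a prompt that asks the model to answer from the retrieved context."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    # Toy store: (passage text, made-up embedding vector).
    store = [
        ("cats purr", [1.0, 0.0]),
        ("dogs bark", [0.0, 1.0]),
        ("kittens meow", [0.9, 0.1]),
    ]
    passages = retrieve([1.0, 0.0], store, top_k=2)
    print(build_prompt("What sound do cats make?", passages))
```

In the real app the generated prompt would be passed to the language model; here it is only printed so the shape of the context block is visible.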