@@ -24,7 +24,7 @@ Not an exhaustive list, but these are some of the core team’s biggest prioriti

- **Workflow**: Building tools for answer questions like: what embedding model should I use? And how should I chunk up my documents?
- **Visualization**: Building visualization tool to give developers greater intuition embedding spaces
-- **Query Planner**: Building tools to enable per-query and post-query transforms
+- **Query Planner**: Building tools to enable pre-query and post-query transforms
- **Developer experience**: Adding more features to our CLI
- **Easier Data Sharing**: Working on formats for serialization and easier data sharing of embedding Collections
- **Improving recall**: Fine-tuning embedding transforms through human feedback
@@ -24,7 +24,7 @@ Not an exhaustive list, but these are some of the core team’s biggest prioriti

- **Workflow**: Building tools for answer questions like: what embedding model should I use? And how should I chunk up my documents?
- **Visualization**: Building visualization tool to give developers greater intuition embedding spaces
-- **Query Planner**: Building tools to enable per-query and post-query transforms
+- **Query Planner**: Building tools to enable pre-query and post-query transforms
- **Developer experience**: Adding more features to our CLI
- **Easier Data Sharing**: Working on formats for serialization and easier data sharing of embedding Collections
- **Improving recall**: Fine-tuning embedding transforms through human feedback
2 changes: 1 addition & 1 deletion docs/docs.trychroma.com/public/llms-full.text
@@ -2891,7 +2891,7 @@ Not an exhaustive list, but these are some of the core team’s biggest prioriti

- **Workflow**: Building tools for answer questions like: what embedding model should I use? And how should I chunk up my documents?
- **Visualization**: Building visualization tool to give developers greater intuition embedding spaces
-- **Query Planner**: Building tools to enable per-query and post-query transforms
+- **Query Planner**: Building tools to enable pre-query and post-query transforms
- **Developer experience**: Adding more features to our CLI
- **Easier Data Sharing**: Working on formats for serialization and easier data sharing of embedding Collections
- **Improving recall**: Fine-tuning embedding transforms through human feedback
2 changes: 1 addition & 1 deletion sample_apps/generative_benchmarking/data/chroma_docs.json
@@ -26,7 +26,7 @@
"0415ae0b-e74a-484c-91a6-ed5aeba83fd9": "-started).\n\n\n***\n\n## Language Clients\n\n| Language | Client |\n|---------------|--------------------------------------------------------------------------------------------------------------------------|\n| Python | [`chromadb`](https://pypistats.org/packages/chromadb) (by Chroma) |\n| Javascript | [`chromadb`](https://www.npmjs.com/package/chromadb) (by Chroma) |\n| Ruby | [from @mariochavez](https://github.com/mariochavez/chroma) |\n| Java | [from @t_azarov](https://github.com/amikos-tech/chromadb-java-client) |\n| Go | [from @t_azarov](https://github.com/amikos-tech/chroma-go) |\n| C# | [from @microsoft](https://github.com/microsoft/semantic-kernel/tree/main/dotnet/src/Connectors/Connectors.Memory.Chroma) |\n| Rust | [from @Anush008](https://crates.io/crates/chromadb) |\n| Elixir | [from @3zcurdia](https://hex.pm/packages/chroma/) |\n| Dart | [from @david",
"d6227cb9-757c-4f65-8da1-9b9aa4344da9": "migloz](https://pub.dev/packages/chromadb) |\n| PHP | [from @CodeWithKyrian](https://github.com/CodeWithKyrian/chromadb-php) |\n| PHP (Laravel) | [from @HelgeSverre](https://github.com/helgeSverre/chromadb) |\n| Clojure | [from @levand](https://github.com/levand/clojure-chroma-client) |\n| R | [from @cynkra](https://cynkra.github.io/rchroma/) |\n| C++ | [from @BlackyDrum](https://github.com/BlackyDrum/chromadb-cpp) |\n\n\n{% br %}{% /br %}\n\nWe welcome [contributions](/markdoc/content/docs/overview/contributing.md) for other languages!\n\n\n# Roadmap\n\nThe goal of this doc is to align *core* and *community* efforts for the project and to share what's in store for this year!\n\n**Sections**\n- What is the core Chroma team working on right now?\n- What will Chroma prioritize over the next",
"f36b5ea9-6c0e-496d-af3d-f5fdf13be8d2": " 6mo?\n- What areas are great for community contributions?\n\n## What is the core Chroma team working on right now?\n\n- Standing up that distributed system as a managed service (aka \"Hosted Chroma\" - [sign up for waitlist](https://trychroma.com/signup)!)\n\n## What did the Chroma team just complete?\n\nFeatures like:\n- *New* - [Chroma 0.4](https://www.trychroma.com/blog/chroma_0.4.0) - our first production-oriented release\n- A more minimal python-client only build target\n- Google PaLM embedding support\n- OpenAI ChatGPT Retrieval Plugin\n\n## What will Chroma prioritize over the next 6mo?\n\n**Next Milestone: \u2601\ufe0f Launch Hosted Chroma**\n\n**Areas we will invest in**\n\nNot an exhaustive list, but these are some of the core team\u2019s biggest priorities over the coming few months. Use caution when contributing in these areas and please check-in with the core team first.\n\n- **Workflow**: Building tools for answer questions like: what embedding model should I use? And how should I chunk up my documents?\n- **Visualization**: Building visualization tool to give developers greater intuition",
-"158433cc-2181-4a53-b2c5-9fb86d74e590": " embedding spaces\n- **Query Planner**: Building tools to enable per-query and post-query transforms\n- **Developer experience**: Extending Chroma into a CLI\n- **Easier Data Sharing**: Working on formats for serialization and easier data sharing of embedding Collections\n- **Improving recall**: Fine-tuning embedding transforms through human feedback\n- **Analytical horsepower**: Clustering, deduplication, classification and more\n\n## What areas are great for community contributions?\n\nThis is where you have a lot more free reign to contribute (without having to sync with us first)!\n\nIf you're unsure about your contribution idea, feel free to chat with us (@chroma) in the `#general` channel in [our Discord](https://discord.gg/rahcMUU5XV)! We'd love to support you however we can.\n\n### Example Templates\n\nWe can always use [more integrations](../../integrations/chroma-integrations) with the rest of the AI ecosystem. Please let us know if you're working on one and need help!\n\nOther great starting points for Chroma (please send PRs for more [here](https://github.com/chroma-core/docs/tree/swyx/addRoadmap/docs)):\n- [Google Col",
+"158433cc-2181-4a53-b2c5-9fb86d74e590": " embedding spaces\n- **Query Planner**: Building tools to enable pre-query and post-query transforms\n- **Developer experience**: Extending Chroma into a CLI\n- **Easier Data Sharing**: Working on formats for serialization and easier data sharing of embedding Collections\n- **Improving recall**: Fine-tuning embedding transforms through human feedback\n- **Analytical horsepower**: Clustering, deduplication, classification and more\n\n## What areas are great for community contributions?\n\nThis is where you have a lot more free reign to contribute (without having to sync with us first)!\n\nIf you're unsure about your contribution idea, feel free to chat with us (@chroma) in the `#general` channel in [our Discord](https://discord.gg/rahcMUU5XV)! We'd love to support you however we can.\n\n### Example Templates\n\nWe can always use [more integrations](../../integrations/chroma-integrations) with the rest of the AI ecosystem. Please let us know if you're working on one and need help!\n\nOther great starting points for Chroma (please send PRs for more [here](https://github.com/chroma-core/docs/tree/swyx/addRoadmap/docs)):\n- [Google Col",
"9eb60ecf-5dec-45df-afa2-2c27743d07da": "ab](https://colab.research.google.com/drive/1QEzFyqnoFxq7LUGyP1vzR4iLt9PpCDXv?usp=sharing)\n- [Replit Template](https://replit.com/@swyx/BasicChromaStarter?v=1)\n\nFor those integrations we do have, like LangChain and LlamaIndex, we do always want more tutorials, demos, workshops, videos, and podcasts (we've done some pods [on our blog](https://trychroma.com/interviews)).\n\n### Example Datasets\n\nIt doesn\u2019t make sense for developers to embed the same information over and over again with the same embedding model.\n\nWe'd like suggestions for:\n\n- \"small\" (<100 rows)\n- \"medium\" (<5MB)\n- \"large\" (>1GB)\n\ndatasets for people to stress test Chroma in a variety of scenarios.\n\n### Embeddings Comparison\n\nChroma does ship with Sentence Transformers by default for embeddings, but we are otherwise unopinionated about what embeddings you use. Having a library of information that has been embedded with many models, alongside example query sets would make it much easier for empirical work to",
"bfc3cf9c-3293-4003-a3a2-2cd4739d568d": " be done on the effectiveness of various models across different domains.\n\n- [Preliminary reading on Embeddings](https://towardsdatascience.com/neural-network-embeddings-explained-4d028e6f0526?gi=ee46baab0d8f)\n- [Huggingface Benchmark of a bunch of Embeddings](https://huggingface.co/blog/mteb)\n- [notable issues with GPT3 Embeddings](https://twitter.com/Nils_Reimers/status/1487014195568775173) and alternatives to consider\n\n### Experimental Algorithms\n\nIf you have a research background, please consider adding to our `ExperimentalAPI`s. For example:\n\n- Projections (t-sne, UMAP, the new hotness, the one you just wrote) and Lightweight visualization\n- Clustering (HDBSCAN, PCA)\n- Deduplication\n- Multimodal (CLIP)\n- Fine-tuning manifold with human feedback [eg](https://github.com/openai/openai-cookbook/blob/main/examples/Customizing_embeddings.ipynb)\n- Expanded vector search (MMR, Polytope)\n- Your research\n\nYou can find the REST OpenAPI spec at",
"14b13c9e-2680-4030-82f0-46833f424700": " `localhost:8000/openapi.json` when the backend is running.\n\nPlease [reach out](https://discord.gg/MMeYNTmh3x) and talk to us before you get too far in your projects so that we can offer technical guidance/align on roadmap.\n\n# Telemetry\n\nChroma contains a telemetry feature that collects **anonymous** usage information.\n\n### Why?\n\nWe use this information to help us understand how Chroma is used, to help us prioritize work on new features and bug fixes, and to help us improve Chroma\u2019s performance and stability.\n\n### Opting out\n\nIf you prefer to opt out of telemetry, you can do this in two ways.\n\n#### In Client Code\n\n{% Tabs %}\n\n{% Tab label=\"python\" %}\n\nSet `anonymized_telemetry` to `False` in your client's settings:\n\n```python\nfrom chromadb.config import Settings\nclient = chromadb.Client(Settings(anonymized_telemetry=False))\n# or if using PersistentClient\nclient = chromadb.PersistentClient(path=\"/path/to/save/to\", settings=Settings(anonymized_telemetry=False))\n```\n\n{% /Tab %}\n\n{% Tab label=\"typescript\" %}\n\nDisable telemetry on you Chroma server",