Skip to content

Multi-Language Support for Spec-Insert -> Python #10187

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

ReveristRealm
Copy link

@ReveristRealm ReveristRealm commented Jul 3, 2025

Description

We are expanding the spec insert tool to translate example OpenSearch HTTP request to different languages. This PR contains the ability to automatically translate examples with spec insert start and end tags to python. This solution can be extended to other languages in the future.

New examples should be formatted like below:

<!-- spec_insert_start
component: example_code
rest: PUT /_snapshot/my_backup
body: |
{
    "type": "fs",
    "settings": {
        "location": "/mnt/backups",
        "compress": true
    }
}
-->

<!-- spec_insert_end -->

Generated example

Everything in between spec_insert_start and spec_insert_end tags is generated by the spec-insert tool:

<!-- spec_insert_start
component: example_code
rest: PUT /_snapshot/my_backup
body: |
{
    "type": "fs",
    "settings": {
        "location": "/mnt/backups",
        "compress": true
    }
}
-->
{% capture step1_rest %}
PUT /_snapshot/my_backup
{
  "type": "fs",
  "settings": {
    "location": "/mnt/backups",
    "compress": true
  }
}
{% endcapture %}

{% capture step1_python %}
response = client.snapshot.create_repository(
  repository = "my_backup",
  body =   {
    "type": "fs",
    "settings": {
      "location": "/mnt/backups",
      "compress": true
    }
  }
)
{% endcapture %}

{% include code-block.html
rest=step1_rest
python=step1_python
%}
<!-- spec_insert_end -->
<!-- spec_insert_start
component: example_code
rest: POST /my-index/_analyze
body: |
{
    "analyzer": "standard",
    "text": "hello world"
}
include_client_setup: true
-->
{% capture step1_rest %}
POST /my-index/_analyze
{
  "analyzer": "standard",
  "text": "hello world"
}
{% endcapture %}

{% capture step1_python %}

from opensearchpy import OpenSearch

host = 'localhost'
port = 9200
auth = ('admin', 'admin') # For testing only. Don't store credentials in code.
ca_certs_path = '/full/path/to/root-ca.pem' # Provide a CA bundle if you use intermediate CAs with your root CA.

# Create the client with SSL/TLS enabled, but hostname verification disabled.
client = OpenSearch(
    hosts = [{'host': host, 'port': port}],
    http_compress = True, # enables gzip compression for request bodies
    http_auth = auth,
    use_ssl = True,
    verify_certs = True,
    ssl_assert_hostname = False,
    ssl_show_warn = False,
    ca_certs = ca_certs_path
)


response = client.indices.analyze(
  index = "my-index",
  body =   {
    "analyzer": "standard",
    "text": "hello world"
  }
)

{% endcapture %}

{% include code-block.html
rest=step1_rest
python=step1_python
%}
<!-- spec_insert_end -->
<!-- spec_insert_start
skip: true
component: example_code
rest: GET /{index}/_search?analyzer=standard&expand_wildcards=all
-->
NOTHING
   WILL
      RENDER
USE
   FOR
     MANUAL
        CONVERSION
<!-- spec_insert_end -->

Test Cases

Included these test cases in the test file to verify correct format

<!-- spec_insert_start
component: example_code
rest: GET /_cat/health?pretty=true&human=false
include_client_setup: true
-->
<!-- spec_insert_end -->

<!-- spec_insert_start
component: example_code
rest: GET /{index}/_search?analyzer=standard&expand_wildcards=all
-->
<!-- spec_insert_end -->

<!-- spec_insert_start
skip: true
component: example_code
rest: GET /{index}/_search?analyzer=standard&expand_wildcards=all
-->
<!-- spec_insert_end -->

<!-- spec_insert_start
component: example_code
rest: PUT /_settings?expand_wildcards=all
body: |
  {
    "index": {
      "number_of_replicas": 2
    }
  }
-->
<!-- spec_insert_end -->

<!-- spec_insert_start
component: example_code
rest: POST /_bulk?expand_wildcards=all
body: |
  {"index":{"_index":"test","_id":"1"}}
  {"field1":"value1"}
  {"delete":{"_index":"test","_id":"2"}}
include_client_setup: true
-->
<!-- spec_insert_end -->
Screenshot 2025-08-08 at 1 54 12 AM

Visual Representation

Screenshot 2025-07-31 at 12 42 32 PM

Version

List the OpenSearch version to which this PR applies, e.g. 2.14, 2.12--2.14, or all.

Frontend features

If you're submitting documentation for an OpenSearch Dashboards feature, add a video that shows how a user will interact with the UI step by step. A voiceover is optional.

Checklist

  • By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and subject to the Developers Certificate of Origin.
    For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link

github-actions bot commented Jul 3, 2025

Thank you for submitting your PR. The PR states are In progress (or Draft) -> Tech review -> Doc review -> Editorial review -> Merged.

Before you submit your PR for doc review, make sure the content is technically accurate. If you need help finding a tech reviewer, tag a maintainer.

When you're ready for doc review, tag the assignee of this PR. The doc reviewer may push edits to the PR directly or leave comments and editorial suggestions for you to address (let us know in a comment if you have a preference). The doc reviewer will arrange for an editorial review.

@kolchfa-aws kolchfa-aws added the In progress Issue/PR: The issue or PR is in progress. label Jul 7, 2025
@dzane17
Copy link
Member

dzane17 commented Jul 7, 2025

  • Update all commits with --signoff to fix GitHub DCO check
  • Add a testing section in PR description. Show an example of spec start/end tags with generated content inside

@kolchfa-aws
Copy link
Collaborator

@nhtruong Would be fantastic if you could review this PR at your convenience. Thanks!

<!-- spec_insert_start
api: snapshot.restore
component: example_code
rest: GET _nodes/stats/indices
Copy link
Contributor

@nhtruong nhtruong Jul 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(The comment below is not about this PR specifically but about the direction of this project)

This rest arg implies that the person editing this file must know how to use this API as they are also responsible for filling in this arg. This will be a tall order especially for more complex APIs with large request bodies (e.g. the search API). This will also cause other issues that this very project is meant to solve:

  • Outdated docs: When an API is updated, say renaming a query string param that appears in the rest arg, the change in the spec will not be propagated to the doc due to the hardcoding of the rest arg.
  • Human errors: There's no mechanism to make sure that human-input rest arg is without error. For example, GET _nodes/stats/indices is used in this cat-allocation.md :))
  • Automation: In the final form of this project, when a new API is added, it will automatically generate an md file for that API through the spec, including all spec-insert components and their args. That means the example_code component should depend on args like rest

The OpenSearch API spec repo has very comprehensive tests for most APIs. These tests are run against actual OS clusters and reviewed by experts. They are also updated along with the spec. This project should construct code examples from those tests to resolve the problems mentioned above. This approach shifts most of the burdens onto the engineer(s) maintaining spec-insert, but I'd argue that it's a much more realistic option in the long run. I doubt the docs team will have the time to construct the rest arg for every API in OpenSearch, and even then they will still have to make sure that updates on the spec will be reflected in the APIs with code_example components.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dblock what do you think?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the feedback, i will take these into consideration moving forward and work towards making the improvements.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nhtruong I understand your concern about the rest arg being prone to errors. However our main goal here is to translate existing examples into client code rather than automating their creation. Currently all code examples are manually input so changing the format does not affect the error probability or maintainability.

Also, examples have custom info like request body, index names and query params. These cannot be supplied by api-spec. Are you recommending we expand api-spec to contain all info needed to construct documentation examples?

GET /testindex/_search?human=true
{
  "profile": true,
  "query" : {
    "match" : { "title" : "wind" }
  }
}

https://docs.opensearch.org/docs/latest/api-reference/search-apis/profile/#example-requests

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dzane17 I'm not referring to the spec itself but the tests of the spec. For example, here's the test for the search API: https://github.com/opensearch-project/opensearch-api-specification/blob/main/tests/default/search/template.yaml

That test also includes the setup and teardown steps (the prologues and epilogues), which we can ignore. If the tests for an API doesn't yet exist there, we should make them in the spec repo: the tests will assure you that you got the correct params for the endpoint (else it would fail) and the spec got a new test that it missed. WIN-WIN :)

call_code
end
end
end
Copy link
Contributor

@nhtruong nhtruong Jul 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some advice regarding good coding practice (not specific to Ruby):

  1. Keep each method short and simple: You can't see the beginning and the end of a method on the same screen (i.e. you have to scroll), the method is too long!
  2. Limit the number of methods in a class/file: The human memory's buffer is very limited, if you have 20 methods that call one another, we start losing track of what your code is doing (And your AI agents will start to hallucinate)

These seems to contradict each other since breaking a big method into smaller ones will increase the number of methods. There's an art in finding a balance for this that you will mater overtime. These go hand in hand with the isolation of concerns and do not repeat yourself (DRY) principles: You have less methods and shorter methods when you apply these principles properly :)

@kolchfa-aws
Copy link
Collaborator

@ReveristRealm Please consider matching the example requests to their corresponding API specs using regular expressions. This will greatly simplify this task: there would be no human intervention required to put in tags manually. Thanks!

@kolchfa-aws kolchfa-aws added Tech review PR: Tech review in progress and removed In progress Issue/PR: The issue or PR is in progress. labels Jul 21, 2025
@ReveristRealm ReveristRealm force-pushed the Spec-Insert->HTTP-to-ProLang_Start branch from acb7eb8 to 75837ab Compare July 22, 2025 22:37
Daniel Jackson added 2 commits July 22, 2025 15:38
@ReveristRealm ReveristRealm force-pushed the Spec-Insert->HTTP-to-ProLang_Start branch from 75837ab to 46b0106 Compare July 22, 2025 22:43
@ReveristRealm ReveristRealm marked this pull request as ready for review July 22, 2025 22:47
@ReveristRealm
Copy link
Author

Hi. @Xtansia. Please review this PR when you are available.

Best, Daniel

Daniel Jackson and others added 3 commits July 29, 2025 10:48
…n of API calls into Language SDK.

Refactored code base to make "api" tag optional for quicker tag placements

Examples:

<!-- spec_insert_start
component: example_code
rest: GET /_cat/allocation?v=true&format=json&bytes=mb
-->
<!-- spec_insert_end -->

<!-- spec_insert_start
component: example_code
rest: GET /users/_doc/42?_source=name,email
include_client_setup: T
-->

Signed-off-by: Daniel Jackson <[email protected]>
…rt' into Spec-Insert->HTTP-to-ProLang_Start
base = rest_lines.join("\n")
body = @args.rest.body
if body
body.is_a?(String) ? base + "\n" + body : base + "\n" + JSON.pretty_generate(body)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the condition here necessary? Can you put JSON.pretty_generate in a try/catch to guard against exceptions?

- Added test examples to test files.

Signed-off-by: Daniel Jackson <[email protected]>
- Added more test examples to fixtures file.
  - Included test for NDJSON files.
  - Included test to showcase multi-line for body tag

Signed-off-by: Daniel Jackson <[email protected]>
Daniel Jackson and others added 3 commits August 8, 2025 02:13
- Changed "include_client_setup" to be called from "insert_arguments.rb" for scalability and clarity. Also changed it so that you have to state "true" or "false".

- Included "skip" tag. This is for when you come across a spec-insert that cant be converted to a client.You can do it manually, and it won't mess with future renders

- Allowed for the body tag to be in multiple lines instead of one.

Signed-off-by: Daniel Jackson <[email protected]>
…rt' into Spec-Insert->HTTP-to-ProLang_Start
@kolchfa-aws
Copy link
Collaborator

@ReveristRealm Thank you for making this functionality! A couple of requests/questions:

  1. Please add instructions on using this functionality to the DEVELOPER_GUIDE.md. Assume that the person executing the instructions is non-technical.
  2. Are there any restrictions on the requests that can be turned into code examples? I see that multi-line requests are supported. I think Bulk is not supported, correct? Please add any restrictions to the DEVELOPER_GUIDE
  3. Could you add the script that you wrote to automatically insert tags to this PR so it's part of the repo? We'd want to use this script to automatically turn requests into the correct format to be converted into the right format (body, etc.) for the spec tag.

- Fixed minor query params issue.

- Implemented steps in the developer guide.

- Removed un-reachable code  logic.

- Included example with NDJSON body in test files.

Signed-off-by: Daniel Jackson <[email protected]>
@ReveristRealm
Copy link
Author

For number 2 , i was able to get NDJSON request supported, so to my knowledge everything should be able to convert. But i still included skip just incase something comes up later down the line.

The insert-script will come up in the next PR.

@ReveristRealm ReveristRealm changed the title Spec insert >http to pro lang start Multi-Language Support for Spec-Insert -> Python Aug 14, 2025
@ReveristRealm
Copy link
Author

@nhtruong Hi, can you review this PR when you get the chance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Tech review PR: Tech review in progress
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants