@AntonEliatra
Contributor

Description

Adds a `delete_source` example to the Data Prepper `csv` processor documentation.

Issues Resolved

Closes #7973

Version

all

Checklist

  • By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and subject to the Developers Certificate of Origin.
    For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Anton Rubin <[email protected]>
@github-actions

Thank you for submitting your PR. The PR states are In progress (or Draft) -> Tech review -> Doc review -> Editorial review -> Merged.

Before you submit your PR for doc review, make sure the content is technically accurate. If you need help finding a tech reviewer, tag a maintainer.

When you're ready for doc review, tag the assignee of this PR. The doc reviewer may push edits to the PR directly or leave comments and editorial suggestions for you to address (let us know in a comment if you have a preference). The doc reviewer will arrange for an editorial review.

@AntonEliatra
Contributor Author

@dlvenable could you please review this PR?

Member

@dlvenable left a comment

Overall this looks good. I have a few comments and questions.

processor:
  - csv:
      column_names: ["col1", "col2"]
      delete_source: true # default is false
Member

I don't know if we need the comment here.

`delete_header` | No | Boolean | If specified, the event header (`column_names_source_key`) is deleted after the event is parsed. If there is no event header, no action is taken. Default value is true.
`column_names_source_key` | No | String | The field in the event that specifies the CSV column names, which will be automatically detected. If there need to be extra column names, the column names are automatically generated according to their index. If `column_names` is also defined, the header in `column_names_source_key` can also be used to generate the event fields. If too few columns are specified in this field, the remaining column names are automatically generated. If too many column names are specified in this field, the CSV processor omits the extra column names.
`column_names` | No | List | User-specified names for the CSV columns. Default value is `[column1, column2, ..., columnN]` if there are no columns of data in the CSV record and `column_names_source_key` is not defined. If `column_names_source_key` is defined, the header in `column_names_source_key` generates the event fields. If too few columns are specified in this field, the remaining column names are automatically generated. If too many column names are specified in this field, the CSV processor omits the extra column names.
`delete_source` | No | Boolean | If `true`, deletes the configured `source` field (default `message`) after CSV parsing. Default is `false`.
Member

Maybe we should also elaborate on the value here. We added this option to help reduce memory pressure. If you know you are going to drop the source field after this processor anyway, setting `delete_source` gives you better memory usage, since processing happens in batches.
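
For illustration, a minimal pipeline sketch of the case described above. The pipeline name `csv-pipeline`, the file path, and the `file` source and `stdout` sink are assumptions chosen for the example and are not part of this PR; only the `csv` processor options come from the diff under review.

```yaml
# Hypothetical pipeline: name, source path, and sink are illustrative assumptions.
csv-pipeline:
  source:
    file:
      path: "/full/path/to/ingest.csv"   # placeholder path
      record_type: "event"
  processor:
    - csv:
        # Parse the default source field (message) into col1 and col2.
        column_names: ["col1", "col2"]
        # Drop the raw source field as part of CSV parsing, instead of
        # removing it later with a separate processor such as delete_entries.
        delete_source: true
  sink:
    - stdout:
```

With this configuration, an event whose `message` is `1,2` would be expected to leave the processor with `col1` and `col2` but without `message`, so the raw line is not held for the rest of the batch.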

@AntonEliatra
Contributor Author

@dlvenable that's updated now.

Labels

backport 3.3 Tech review PR: Tech review in progress

Development

Successfully merging this pull request may close these issues.

[DOC] Add delete_source option to Data Prepper's csv processor

3 participants