enh(blog): Add blog post on generative AI peer review policy #734
---
layout: single
title: "Navigating LLMs in Open Source: pyOpenSci's New Peer Review Policy"
excerpt: "Generative AI products are reducing the effort and skill necessary to generate large amounts of code. In some cases, this strains volunteer peer review programs like ours. Learn about pyOpenSci's approach to developing a Generative AI policy for our software peer review program."
author: "pyopensci"
permalink: /blog/generative-ai-peer-review-policy.html
header:
  overlay_image: images/headers/pyopensci-floral.png
categories:
  - blog-post
  - community
classes: wide
toc: true
comments: true
last_modified: 2025-09-16
---

authors: Leah Wasser, Jed Brown, Carter Rhea, Ellie Abrahams

## Generative AI meets scientific open source

Some developers believe that using AI products increases efficiency. However, in scientific open source, speed isn't everything: transparency, quality, and community trust matter just as much, as does understanding the environmental impact of using large language models in our everyday work. Similarly, ethical questions arise when tools may benefit some communities while harming others.

## Why we need guidelines

At pyOpenSci, we’ve drafted a new policy for our peer review process to set clear expectations for disclosing the use of LLMs in scientific open-source software.

Our goal is transparency and fostering reproducible research. For scientific rigor, we want maintainers to **disclose when and how they’ve used LLMs** so editors and reviewers can fairly and efficiently evaluate submissions. Further, we want to avoid burdening our volunteer editorial and reviewer team with being the initial reviewers of generated code.

## A complex topic: Benefits and concerns

LLMs are perceived as helping developers:

* Explain complex codebases
* Generate unit tests and docstrings
* Lower language barriers, in some cases, for participants in open source around the world
* Speed up everyday workflows

Some contributors also perceive these products as making open source more accessible. However, LLMs also present unprecedented social and environmental challenges.

### Incorrectness of LLMs and misleading time benefits

Although it is commonly stated that LLMs improve the productivity of experienced developers, recent scientific explorations of this hypothesis [indicate the contrary](https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/). What's more, LLM responses to complex coding tasks [tend to be incorrect](https://arxiv.org/html/2407.06153v1), overly verbose, or inefficient. If you use an LLM to help produce code, it is crucial that you independently evaluate its correctness and efficiency.

### Environmental impacts

Training and running LLMs [requires massive energy consumption](https://www.unep.org/news-and-stories/story/ai-has-environmental-problem-heres-what-world-can-do-about), raising sustainability concerns that sit uncomfortably alongside much of the global-scale scientific research that our community supports.

### Impact on learning

Heavy reliance on LLMs risks producing developers who can prompt, but not debug, maintain, or secure production code. This risk undermines long-term project sustainability and growth. In the long run, it will make it [harder for young developers to learn how to code and troubleshoot independently](https://knowledge.wharton.upenn.edu/article/without-guardrails-generative-ai-can-harm-education/).

> We’re really worried that if humans don’t learn, if they start using these tools as a crutch and rely on it, then they won’t actually build those fundamental skills to be able to use these tools effectively in the future.
>
> *Hamsa Bastani*

### Ethics and inclusion

LLM outputs can reflect and amplify bias in training data. In documentation and tutorials, that bias can harm the very communities we want to support.

## Our Approach: Transparency and Disclosure

We acknowledge that social and ethical norms, as well as concerns about environmental and societal impacts, vary widely across the community. We are not here to judge anyone who uses or doesn't use LLMs. Our focus centers on supporting informed decision-making and consent regarding LLM use in the pyOpenSci software submission, review, and editorial process.

Our community’s expectation is simple: **be open and disclose any Generative AI use in your package** when you submit it to our open software review process.

* Disclose LLM use in your README and at the top of relevant modules (one possible format is sketched below).
* Describe how the Generative AI tools were used in your package's development.
* Be clear about what human review you performed on Generative AI outputs before submitting the package to our open peer review process.

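As one minimal, hypothetical sketch (the module name, docstring wording, and `read_spectrum` function are invented for illustration and are not required pyOpenSci text), a module-level note might look like this:

```python
"""Utilities for reading spectrum files (hypothetical example module).

Generative AI disclosure: portions of this module were drafted with an
LLM coding assistant. All generated code was reviewed, edited, and
tested by the maintainers before inclusion; see the project README for
details on which tools were used and how.
"""


def read_spectrum(path):
    """Read a spectrum file and return its raw text contents."""
    with open(path, encoding="utf-8") as f:
        return f.read()
```

A similar short note in your README, describing which tools you used and what you reviewed, gives editors the same context at a glance.
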
Transparency helps reviewers understand context, trace decisions, and focus their time where it matters most.

### Human oversight

LLM-assisted code must be **reviewed, edited, and tested by humans** before submission.

* Run your tests and confirm the correctness of the code you submit (one possible approach is sketched after this list).
* Check for security and quality issues.
* Ensure style, readability, and concise docstrings.
* Explain your review process in your software submission to pyOpenSci.

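What this human review looks like will vary by project, but the core idea is that you, not the LLM, confirm the behavior you submit. As a minimal, hypothetical sketch (the `slugify` helper and its tests are invented for illustration), maintainer-written tests are one concrete way to do that:

```python
# test_slugify.py -- human-written tests that pin down the behavior
# expected from an LLM-assisted helper before it is submitted for review.
import re


def slugify(text: str) -> str:
    """Convert text to a lowercase, hyphen-separated slug.

    Drafted with LLM assistance, then reviewed, simplified, and tested
    by a maintainer.
    """
    text = text.strip().lower()
    text = re.sub(r"[^a-z0-9]+", "-", text)
    return text.strip("-")


def test_slugify_handles_punctuation_and_spaces():
    assert slugify("  Hello, World! ") == "hello-world"


def test_slugify_collapses_repeated_separators():
    assert slugify("a -- b") == "a-b"
```

Running checks like these yourself, for example with `pytest` plus a linter and a security scanner of your choice, is the kind of review process we'd like you to describe in your submission.
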
Please **don’t offload vetting of generative AI content to volunteer reviewers**. Arrive with human-reviewed code that you understand, have tested, and can maintain.

### Watch out for licensing issues

> **Review comment:** This section may benefit from another round of editing. I think it's useful to think of the types of content that may or may not be copyrighted. E.g., refactoring your test suite is unlikely to get you in trouble, but implementing a new algorithm is. But, in the latter case you probably wouldn't use an LLM wholesale, because as you say above you'd need to understand the algorithm to vet it, and then it'd probably be easier to construct it yourself.

LLMs are trained on large amounts of open source code; most of that code has licenses that require attribution. The problem? LLMs sometimes spit out near-exact copies of that training data, but without any attribution or copyright notices.

> **Comment on lines +85 to +86 (author):** Trying to include more than just verbatim ... that fundamentally, the patterns as well are licensed. Also wondering here: let's say that I produce some code totally on my own that happens to match a pattern of some code with a license that requires attribution. What happens there, if my production code is legitimately developed on my own and the pattern just happens to be a great one that others use too, and maybe I've even seen it before, but I'm not intentionally copying?
>
> **Reply:** As far as copyright law is concerned, that's exactly the scenario where the substantial similarity standard would be applied. The more substantial the copying and the more closely in time you would have observed the original, the more likely your work would be found to have substantial similarity and to be infringing. Protecting against that ambiguity is why clean-room design exists.

Why this matters:

* Using LLM output verbatim could violate the original code's license
* You might accidentally commit plagiarism or copyright infringement by using that output verbatim in your code
* Due diligence is nearly impossible since you can't trace what the LLM "learned from" (most LLMs are black boxes)

> **Review comment:** I think "verbatim" is being leaned on too much here. An LLM can produce verbatim copies of its corpus, but the standard in copyright law is not limited to verbatim copies. If the process involved copying at any stage, refactoring can only obfuscate. The "substantial similarity" standards in copyright law are used as circumstantial evidence of process. Modifying the result by paraphrasing/refactoring is concealing the evidence (and thus reduces the likelihood of being caught), but does not make the process legal. I think we should be careful to not spread that misconception to readers.

> **Suggested change:** `* Open an issue first before submitting a pull request to ensure it's welcome and needed` → `* Open an issue first before submitting a pull request to ensure it's welcome and needed.`

> **Review comment:** I quite like @choldgraf's framing of this, which resonates with my own: we don't care as much how the code came about, but what we want is to have a conversation with a human. Hence the focus on you understanding your PR, being able to respond to PR feedback (without needing to bring an LLM to interpret for you, etc.).
>
> I've now come across the situation where, when I ask people something, they feed it into an LLM and paste the answer back to me 😅 Drives me nuts.