-
Notifications
You must be signed in to change notification settings - Fork 30
fix(repository): replace Pull with Fetch+Reset to avoid go-git non-fast-forward bug #783
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix(repository): replace Pull with Fetch+Reset to avoid go-git non-fast-forward bug #783
Conversation
|
@AlanLonguet and @corrieriluca, let me know what you think |
|
Sorry for introducing this bug with 0.9.0, and thank you for proposing a fix so quickly.
I'm surprised because I thought I handled the case where fast-forward is not possible (e.g. after a forced push) by deleting and re-cloning the repo, is that not enough? 🤔 Before merging this fix can you provide an example on how to reproduce the bug? Thanks |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #783 +/- ##
==========================================
- Coverage 39.79% 39.73% -0.07%
==========================================
Files 94 94
Lines 5465 5474 +9
==========================================
Hits 2175 2175
- Misses 3093 3102 +9
Partials 197 197 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Hi @corrieriluca, Thanks for looking at this! The issue isn't actually about force-push - let me explain: Root CauseWhen Burrito needs to work with a non-default branch (e.g., go-git's Why delete-and-re-clone doesn't help
ReproductionSimply create a TerraformLayer pointing to any non-default branch. No force-push required - the bug happens on the first reconciliation. apiVersion: config.terraform.padok.cloud/v1alpha1
kind: TerraformLayer
metadata:
name: my-layer
spec:
repository: my-repo
branch: "develop" # Any non-default branch
path: "terraform/"You'll see in the logs: This repeats infinitely because re-cloning doesn't fix the lack of tracking. The FixReplace See: go-git/go-git#358 |
@nerdeveloper I failed to reproduce the issue on 0.9.0 🤔 Is there something specific about your repo? (branch setup, git hosting service, anything?...) I tested with a sample GitHub private repo with 2 branches:
I tested multiple orders of resource creation and everything works. 1. When I first create the layer with 2. When I first create the layer for No trace of |
|
Hi @corrieriluca! Thanks for taking the time to review this. Bug Reproduction EvidenceI was able to reproduce this bug. The key insight is that the bug only triggers when there are NEW commits on the remote branch that need to be pulled - not on initial sync. Why Your Test PassedWhen you tested with a fresh setup:
Both local and remote are at the same commit, so Pull() succeeds without actually merging. When The Bug Triggers
Exact Reproduction StepsCaptured Logs - v0.9.0 (Bug - Infinite Loop)Captured Logs - With Fix (Success)Summary
|
|
I tested again but it succeeds to fetch the changes without re-cloning. TBH I don't understand why I cannot reproduce the bug. I did exactly as you said:
What surprises me the most is that, even after the delete, you cannot clone the repository again. That's strange because Burrito starts from scratch: it I'm gonna test your fix and come back to you, we need to ensure it does not create unintended side effects |
|
Sure, we use GitLab so that it may be a platform-specific problem, too. I don't know, but it's one of those edge cases that is hard to reproduce from what we can see |
|
Okay, I'm gonna test with a test repo on gitlab.com to see if I can reproduce the issue |
Additional Context: Multiple TerraformRepositories Sharing Same Git URLOne additional factor in our setup that might help reproduce this: we have multiple TerraformRepository CRDs pointing to the same Git repository URL, with TerraformLayers referencing different branches. Our Configuration# TerraformRepository 1
apiVersion: config.terraform.padok.cloud/v1alpha1
kind: TerraformRepository
metadata:
name: my-app-repo-1
namespace: burrito-demo
spec:
repository:
url: https://gitlab.com/my-org/my-infrastructure # Same URL
---
# TerraformRepository 2 (SAME Git URL, different name)
apiVersion: config.terraform.padok.cloud/v1alpha1
kind: TerraformRepository
metadata:
name: my-app-repo-2
namespace: burrito-demo
spec:
repository:
url: https://gitlab.com/my-org/my-infrastructure # Same URL!
---
# TerraformLayer 1 - references repo-1, branch A
apiVersion: config.terraform.padok.cloud/v1alpha1
kind: TerraformLayer
metadata:
name: layer-1
namespace: burrito-demo
spec:
repository:
name: my-app-repo-1
namespace: burrito-demo
branch: feature-branch-a
path: terraform/stacks/app-1
---
# TerraformLayer 2 - references repo-2, branch B
apiVersion: config.terraform.padok.cloud/v1alpha1
kind: TerraformLayer
metadata:
name: layer-2
namespace: burrito-demo
spec:
repository:
name: my-app-repo-2
namespace: burrito-demo
branch: feature-branch-b
path: terraform/stacks/app-2Why This MattersSince both TerraformRepositories have the same Git URL, they share the same cached directory (URL hash): hash := fmt.Sprintf("%x", sha256.Sum256([]byte(p.RepoURL)))
p.workingDir = filepath.Join(repositoryDir, hash)When both reconcile concurrently with different branches:
This multi-repository scenario might explain why the bug is harder to reproduce with a single TerraformRepository. |
|
@nerdeveloper Okay, a race condition between the two repositories sharing the same folder is likely to be the root cause 🤔 Why do you create multiple TerraformRepositories pointing to the same repo? Are they created in the same tenant/namespace or in different ones? Either way, I don't think the right fix for this issue is to change the git pull + set reference logic. A better fix would be to compute a better hash, to ensure a 1-to-1 mapping between TerraformRepositories resources and repository folders in the controller's pod. WDYT? |
|
Hi @corrieriluca, Thanks for looking deeper into this! I respectfully disagree that changing the hash is the right fix. We are a very large org. Here's why: The Hash Fix Is IncompleteEven with a unique hash per TerraformRepository, the bug still occurs with one TerraformRepository and multiple TerraformLayers on different branches: TerraformRepository: my-infra-repo
└── TerraformLayer 1: branch main, path /stacks/prod
└── TerraformLayer 2: branch develop, path /stacks/stagingWhen the controller reconciles
No race condition. No shared cache. The bug is in the Multi-Tenant Use Cases Should Be SupportedWe shouldn't restrict users to one-repo-one-branch-one-layer. Common patterns include:
Even ArgoCD supports one Git repository serving multiple Applications with different branches/paths. Burrito should too. Why Fetch + Reset Is The Right Fix
The current Happy to discuss further or add tests if needed! |
I am not sure of this assertion because this is the specific thing I tried to reproduce and everytime I tried no bug occured. However, okay I hear your case on supporting the pattern of multiple TerraformRepo for same URL. Burrito should allow this. I'm gonna test your PR in my sandbox environment before merging. |
|
fwiw I worked with @nerdeveloper and I just had a situation where I had a working branch going, a cd process pushed to the branch I resolved the conflict and pushed and that gave me the error. We can see if there is a good way to reproduce it / narrow it down, but just figured it was worth mentioning it came up again and I didn't do a push force, every thing I did was a standard push / work flow. Specifically I think when I got this error that this issue came up, but I can't say for sure.... I replaced the / company name, we'll have to see if we can reproduce the issue or not though. |
…st-forward bug The go-git library's Pull() function has a known issue (padok-team#358) where it returns 'non-fast-forward update' errors even when a fast-forward merge is possible. This happens when local branches don't have proper upstream tracking configured. This commit replaces Pull() with an explicit Fetch() + Hard Reset approach: 1. Fetch latest changes from remote with Force=true 2. Get the remote reference for the target branch 3. Hard reset the worktree to the remote ref 4. Update the local branch reference This approach: - Avoids the go-git Pull() bug entirely - Is more explicit about what we want (always match remote) - Handles force-push scenarios correctly - Works regardless of tracking configuration Fixes: go-git/go-git#358
|
Hey Braden, I looked into this and noticed the deployment was using the wrong image tag - v0.8.1-beta was pushed on Dec 3rd, before this PR existed. Rebuilt with this fix as v0.9.2-beta and it's working now. TerraformRepositories sync correctly, no more non-fast-forward errors. Will share more details internally. |
19ac52a to
b0e6b6f
Compare
|
In the meantime, @corrieriluca, this is another issue @bradenwright faced without my fix, which is this PR. I also tested with the latest version of Burrito and still got the same error. Hopefully, this can be merged into the upstream. Happy holidays, folks. |
Summary
This PR fixes a critical bug where all TerraformRepositories fail with "non-fast-forward update" error, causing the controller to enter an infinite clone/delete loop.
Problem
The go-git library has a known issue (#358) where
worktree.Pull()returns "non-fast-forward update" errors even when a fast-forward merge is possible.This happens because Burrito creates local branches from remote refs using
SetReference()without configuring upstream tracking. WhenPull()is called, go-git can't determine the merge strategy and incorrectly reports a non-fast-forward error.Symptoms
failed to pull latest changes for ref refs/heads/...: non-fast-forward update, deleting local repositorySyncNeededstateSolution
Replace
worktree.Pull()with an explicitFetch()+Hard Resetapproach:Force=trueThis approach:
Pull()bug entirelyTesting
fetching latest changes...→resetting to remote ref...Files Changed
internal/repository/providers/standard/repository.go- Replace Pull with Fetch+Reset inBundle()function