@nerdeveloper
Contributor

Summary

This PR fixes a critical bug where all TerraformRepositories fail with a "non-fast-forward update" error, causing the controller to enter an infinite clone/delete loop.

Problem

The go-git library has a known issue (go-git/go-git#358) where worktree.Pull() returns "non-fast-forward update" errors even when a fast-forward merge is possible.

This happens because Burrito creates local branches from remote refs using SetReference() without configuring upstream tracking. When Pull() is called, go-git can't determine the merge strategy and incorrectly reports a non-fast-forward error.

Symptoms

  • Controller logs show: failed to pull latest changes for ref refs/heads/...: non-fast-forward update, deleting local repository
  • TerraformRepositories stuck in SyncNeeded state
  • Infinite loop: clone → pull fails → delete → clone again

Solution

Replace worktree.Pull() with an explicit Fetch() + Hard Reset approach:

  1. Fetch latest changes from remote with Force=true
  2. Get the remote reference for the target branch
  3. Hard Reset the worktree to the remote ref
  4. Update the local branch reference to match

This approach:

  • Avoids the go-git Pull() bug entirely
  • Is more explicit about intent (always sync to remote)
  • Handles force-push scenarios correctly
  • Works regardless of tracking configuration
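
For readers following along, here is a minimal sketch of the four steps above. It uses a toy in-memory model, not the actual change in repository.go (which is not shown in this conversation); the `repoModel` type, its fields, and the commit names are hypothetical. In go-git itself the steps would roughly map to `Repository.Fetch` with `FetchOptions.Force`, `Repository.Reference`, `Worktree.Reset` with `git.HardReset`, and `Storer.SetReference`.

```go
package main

import (
	"errors"
	"fmt"
)

// repoModel is a toy stand-in for a local clone. It is NOT the go-git API:
// it only records which commit each ref points to.
type repoModel struct {
	server   map[string]string // branch name -> commit on the remote server
	refs     map[string]string // refs stored in the local clone
	worktree string            // commit currently checked out on disk
}

var errRemoteRefNotFound = errors.New("remote ref not found")

// fetch overwrites every remote-tracking ref with the server's state,
// which is what a forced fetch does even after history was rewritten.
func (r *repoModel) fetch() {
	for branch, commit := range r.server {
		r.refs["refs/remotes/origin/"+branch] = commit
	}
}

// syncToRemote makes the local branch and worktree match the remote branch.
// No merge is attempted, so no upstream tracking config is consulted.
func (r *repoModel) syncToRemote(branch string) (string, error) {
	r.fetch() // 1. fetch latest changes

	commit, ok := r.refs["refs/remotes/origin/"+branch] // 2. resolve the remote ref
	if !ok {
		return "", fmt.Errorf("%w: %s", errRemoteRefNotFound, branch)
	}

	r.worktree = commit                   // 3. hard reset the worktree
	r.refs["refs/heads/"+branch] = commit // 4. update the local branch ref
	return commit, nil
}

func main() {
	r := &repoModel{
		server: map[string]string{"develop": "X"},
		refs:   map[string]string{},
	}
	if _, err := r.syncToRemote("develop"); err != nil { // first sync: develop at X
		panic(err)
	}

	r.server["develop"] = "Y" // a new commit is pushed to the remote
	commit, err := r.syncToRemote("develop")
	fmt.Println(commit, err, r.refs["refs/heads/develop"], r.worktree) // prints: Y <nil> Y Y
}
```

The trade-off in this sketch is the same as in the description: any local-only state on the branch is discarded, which is acceptable only because the clone is treated as a read-only mirror of the remote.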

Testing

  • Code compiles successfully
  • Deployed to production cluster
  • Verified TerraformRepositories now sync successfully
  • Logs show correct behavior: fetching latest changes...resetting to remote ref...

Files Changed

  • internal/repository/providers/standard/repository.go - Replace Pull with Fetch+Reset in Bundle() function

@github-project-automation github-project-automation bot moved this to 📋 Backlog in burrito Dec 6, 2025
@nerdeveloper
Contributor Author

nerdeveloper commented Dec 6, 2025

@AlanLonguet and @corrieriluca, let me know what you think

@github-project-automation github-project-automation bot moved this from 📋 Backlog to ✅ Done in burrito Dec 6, 2025
@nerdeveloper nerdeveloper reopened this Dec 6, 2025
@corrieriluca
Member

Hi @nerdeveloper

Sorry for introducing this bug with 0.9.0, and thank you for proposing a fix so quickly.

> Infinite loop: clone → pull fails → delete → clone again

I'm surprised because I thought I handled the case where fast-forward is not possible (e.g. after a forced push) by deleting and re-cloning the repo, is that not enough? 🤔

Before merging this fix, can you provide an example of how to reproduce the bug?

Thanks

@codecov

codecov bot commented Dec 7, 2025

Codecov Report

❌ Patch coverage is 0% with 20 lines in your changes missing coverage. Please review.
✅ Project coverage is 39.73%. Comparing base (05349b5) to head (19ac52a).

Files with missing lines                                Patch %   Lines
...ternal/repository/providers/standard/repository.go   0.00%     20 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #783      +/-   ##
==========================================
- Coverage   39.79%   39.73%   -0.07%     
==========================================
  Files          94       94              
  Lines        5465     5474       +9     
==========================================
  Hits         2175     2175              
- Misses       3093     3102       +9     
  Partials      197      197              

@nerdeveloper
Contributor Author

Hi @corrieriluca,

Thanks for looking at this! The issue isn't actually about force-push - let me explain:

Root Cause

When Burrito needs to work with a non-default branch (e.g., feature/xyz), it creates a local branch using SetReference() (lines 104-111). This creates the branch but doesn't configure upstream tracking.

go-git's Pull() requires tracking to determine which remote branch to merge from. Without it, Pull() returns "non-fast-forward update" - but this is a misleading error message. It's not actually a non-fast-forward situation; go-git just can't figure out what to do.

Why delete-and-re-clone doesn't help

  1. Clone repo → default branch has tracking ✓
  2. Create local branch feature/xyz via SetReference() → no tracking ✗
  3. Pull() fails → delete repo
  4. Clone again → same thing happens → infinite loop

Reproduction

Simply create a TerraformLayer pointing to any non-default branch. No force-push required - the bug happens on the first reconciliation.

apiVersion: config.terraform.padok.cloud/v1alpha1
kind: TerraformLayer
metadata:
  name: my-layer
spec:
  repository:
    name: my-repo
    namespace: my-namespace  # placeholder: namespace of the TerraformRepository
  branch: "develop"  # Any non-default branch
  path: "terraform/"

You'll see in the logs:

failed to pull latest changes for ref refs/heads/develop: non-fast-forward update, deleting local repository

This repeats infinitely because re-cloning doesn't fix the lack of tracking.

The Fix

Replace Pull() with explicit Fetch() + Hard Reset. This bypasses go-git's broken merge logic entirely and works regardless of tracking configuration.

See: go-git/go-git#358

@corrieriluca
Member

corrieriluca commented Dec 8, 2025

> Simply create a TerraformLayer pointing to any non-default branch. No force-push required - the bug happens on the first reconciliation.

@nerdeveloper I failed to reproduce the issue on 0.9.0 🤔 Is there something specific about your repo? (branch setup, git hosting service, anything?...)

I tested with a sample GitHub private repo with 2 branches:

  • main (default branch)
  • feat/test (1 commit ahead)

I tested multiple orders of resource creation and everything works.

1. When I first create the layer with feat/test, the clone works fine in the controller's logs:

time="2025-12-08T13:15:36Z" level=info msg="repository burrito-project/burrito-demo is out of sync with remote for ref feat/test. Syncing..."
time="2025-12-08T13:15:36Z" level=info msg="cloning repository https://github.com/burrito-demo-org/website-infra.git to /var/run/burrito/repositories/c34920efd7a71adeafa43acd204510347fd934f9e70e909a9878885e0bcc8938/repository"
time="2025-12-08T13:15:37Z" level=info msg="creating local branch refs/heads/feat/test from remote refs/remotes/origin/feat/test"
time="2025-12-08T13:15:37Z" level=info msg="checking out branch refs/heads/feat/test"
time="2025-12-08T13:15:37Z" level=info msg="pulling latest changes for repo https://github.com/burrito-demo-org/website-infra.git on branch refs/heads/feat/test"
time="2025-12-08T13:15:37Z" level=info msg="repository is already up-to-date"
time="2025-12-08T13:15:37Z" level=info msg="stored new bundle for repository burrito-project/burrito-demo ref:feat/test revision:8d848d86950cabcddd69b04d025daccadf3baffe"
time="2025-12-08T13:15:37Z" level=info msg="layer burrito-project/static-website annotated with new revision 8d848d86950cabcddd69b04d025daccadf3baffe"
time="2025-12-08T13:15:37Z" level=info msg="starting reconciliation for layer burrito-project/static-website ..."

2. When I first create the layer for the main branch and then the layer for feat/test 1 minute later, it also works (the repo is not cloned again, only pulled):

time="2025-12-08T13:29:28Z" level=info msg="skipping sync for repository burrito-project/burrito-demo ref main: last sync was at Mon Dec  8 13:29:12 UTC 2025 and no new layer for this branch"
time="2025-12-08T13:29:28Z" level=info msg="latest revision for repository burrito-project/burrito-demo ref feat/test is 8d848d86950cabcddd69b04d025daccadf3baffe"
time="2025-12-08T13:29:28Z" level=info msg="repository burrito-project/burrito-demo is out of sync with remote for ref feat/test. Syncing..."
time="2025-12-08T13:29:28Z" level=info msg="repository already exists at /var/run/burrito/repositories/c34920efd7a71adeafa43acd204510347fd934f9e70e909a9878885e0bcc8938/repository, opening existing clone"
time="2025-12-08T13:29:28Z" level=info msg="fetching latest changes from remote"
time="2025-12-08T13:29:28Z" level=info msg="creating local branch refs/heads/feat/test from remote refs/remotes/origin/feat/test"
time="2025-12-08T13:29:28Z" level=info msg="checking out branch refs/heads/feat/test"
time="2025-12-08T13:29:28Z" level=info msg="pulling latest changes for repo https://github.com/burrito-demo-org/website-infra.git on branch refs/heads/feat/test"
time="2025-12-08T13:29:28Z" level=info msg="repository is already up-to-date"
time="2025-12-08T13:29:28Z" level=info msg="stored new bundle for repository burrito-project/burrito-demo ref:feat/test revision:8d848d86950cabcddd69b04d025daccadf3baffe"
time="2025-12-08T13:29:28Z" level=info msg="layer burrito-project/static-website annotated with new revision 8d848d86950cabcddd69b04d025daccadf3baffe"

No trace of non-fast-forward update, deleting local repository in the logs 🤔

@nerdeveloper
Contributor Author

nerdeveloper commented Dec 8, 2025

Hi @corrieriluca! Thanks for taking the time to review this.

Bug Reproduction Evidence

I was able to reproduce this bug. The key insight is that the bug only triggers when there are NEW commits on the remote branch that need to be pulled - not on initial sync.

Why Your Test Passed

When you tested with a fresh setup:

  1. Clone repo → Fetch gets refs/remotes/origin/dev at commit X
  2. Create local dev branch from remote ref using SetReference() → points to X
  3. Checkout dev
  4. Pull() → Fetch (nothing new), merge X→X → "already up-to-date"

Both local and remote are at the same commit, so Pull() succeeds without actually merging.

When The Bug Triggers

  1. Clone repo → local dev created at commit X
  2. Push a NEW commit Y to dev branch on remote
  3. Next reconciliation: Fetch gets commit Y
  4. Pull() tries to merge X→Y
  5. Fails with "non-fast-forward" because SetReference() creates branches without upstream tracking config
  6. Deletes repo, re-clones → infinite loop
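
To make "fast-forward" concrete: moving a branch from X to Y is a fast-forward when X is an ancestor of Y. Below is a minimal sketch of that check on a hypothetical in-memory commit graph. It is not go-git's implementation, and the `commitGraph` type and commit names are made up for illustration.

```go
package main

import "fmt"

// commitGraph maps each commit to its parent commits (toy data, not go-git types).
type commitGraph map[string][]string

// isFastForward reports whether oldHead is reachable from newHead by
// following parent links, i.e. whether newHead only adds commits on top.
func isFastForward(g commitGraph, oldHead, newHead string) bool {
	seen := map[string]bool{}
	stack := []string{newHead}
	for len(stack) > 0 {
		c := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if c == oldHead {
			return true
		}
		if seen[c] {
			continue
		}
		seen[c] = true
		stack = append(stack, g[c]...)
	}
	return false
}

func main() {
	g := commitGraph{
		"Y": {"X"}, // normal push: Y is built on top of X
		"X": {"W"},
		"Z": {"W"}, // rewritten history: Z replaces X instead of extending it
	}
	fmt.Println(isFastForward(g, "X", "Y")) // true
	fmt.Println(isFastForward(g, "X", "Z")) // false
}
```

In the scenario described above, X → Y is the first case, so a "non-fast-forward" error there would not reflect the actual commit history.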

Exact Reproduction Steps

1. Deploy Burrito v0.9.0
2. Create TerraformRepository + TerraformLayer pointing to a NON-DEFAULT branch (e.g., "dev")
3. Let it sync once (will succeed - "already up-to-date")
4. Push a NEW commit to that branch
5. Wait for next reconciliation (or restart controller)
6. Bug appears - infinite clone-delete-clone loop

Captured Logs - v0.9.0 (Bug - Infinite Loop)

time="2025-12-08T14:55:39Z" level=info msg="cloning repository https://gitlab.com/.../starwatch-platform to /var/run/burrito/repositories/.../repository"
time="2025-12-08T14:55:41Z" level=info msg="pulling latest changes for repo ... on branch refs/heads/SWP-4205/burrito-for-tenant-manager-infra"
time="2025-12-08T14:55:42Z" level=warning msg="failed to pull latest changes for ref refs/heads/SWP-4205/burrito-for-tenant-manager-infra: non-fast-forward update, deleting local repository"
time="2025-12-08T14:55:42Z" level=error msg="failed to get revision bundle: non-fast-forward update, likely because of force-push, next run will re-clone"

# Loop repeats ~10 seconds later:
time="2025-12-08T14:55:52Z" level=info msg="cloning repository..."
time="2025-12-08T14:55:55Z" level=info msg="pulling latest changes..."
time="2025-12-08T14:55:55Z" level=warning msg="failed to pull... non-fast-forward update, deleting local repository"
# (continues forever...)

Captured Logs - With Fix (Success)

time="2025-12-08T14:57:04Z" level=info msg="cloning repository..."
time="2025-12-08T14:57:06Z" level=info msg="fetching latest changes for repo..."
time="2025-12-08T14:57:07Z" level=info msg="resetting to remote ref refs/remotes/origin/SWP-4205/burrito-for-tenant-manager-infra (871cdd5...)"
time="2025-12-08T14:57:07Z" level=info msg="stored new bundle..."
time="2025-12-08T14:57:07Z" level=debug msg="Repository sync completed"

Summary

Version    New commits on remote   Result
v0.9.0     No                      ✓ "already up-to-date"
v0.9.0     Yes                     ✗ Infinite clone-delete loop
With fix   Yes                     ✓ Fetch + Reset works

@corrieriluca
Member

I tested again, but it successfully fetches the changes without re-cloning.

TBH I don't understand why I cannot reproduce the bug. I did exactly as you said:

  1. Create a layer pointing to feat/test (commit X)
  2. Push a commit to feat/test, last revision is now commit Y
  3. The TerraformRepository syncs a few minutes later and successfully pulls the new commit Y:
time="2025-12-08T16:34:54Z" level=info msg="latest revision for repository burrito-project/burrito-demo ref feat/test is 2da3fe8c260867e60a443bc0512c1532f96117a3"
time="2025-12-08T16:34:54Z" level=info msg="repository burrito-project/burrito-demo is out of sync with remote for ref feat/test. Syncing..."
time="2025-12-08T16:34:54Z" level=info msg="repository already exists at /var/run/burrito/repositories/c34920efd7a71adeafa43acd204510347fd934f9e70e909a9878885e0bcc8938/repository, opening existing clone"
time="2025-12-08T16:34:54Z" level=info msg="fetching latest changes from remote"
time="2025-12-08T16:34:54Z" level=info msg="checking out branch refs/heads/feat/test"
time="2025-12-08T16:34:54Z" level=info msg="pulling latest changes for repo https://github.com/burrito-demo-org/website-infra.git on branch refs/heads/feat/test"
time="2025-12-08T16:34:54Z" level=info msg="repository is already up-to-date"
time="2025-12-08T16:34:54Z" level=info msg="stored new bundle for repository burrito-project/burrito-demo ref:feat/test revision:2da3fe8c260867e60a443bc0512c1532f96117a3"
time="2025-12-08T16:34:54Z" level=info msg="layer burrito-project/static-website annotated with new revision 2da3fe8c260867e60a443bc0512c1532f96117a3"

What surprises me the most is that, even after the delete, you cannot clone the repository again. That's strange because Burrito starts from scratch: it runs rm -rf on the repo and clones it again 🤔

I'm gonna test your fix and come back to you, we need to ensure it does not create unintended side effects

@nerdeveloper
Contributor Author

Sure. We use GitLab, so it may be a platform-specific problem too. I don't know, but from what we can see it's one of those edge cases that is hard to reproduce.

@corrieriluca
Member

Okay, I'm gonna test with a test repo on gitlab.com to see if I can reproduce the issue

@nerdeveloper
Contributor Author

nerdeveloper commented Dec 8, 2025

Additional Context: Multiple TerraformRepositories Sharing Same Git URL

One additional factor in our setup that might help reproduce this: we have multiple TerraformRepository CRDs pointing to the same Git repository URL, with TerraformLayers referencing different branches.

Our Configuration

# TerraformRepository 1
apiVersion: config.terraform.padok.cloud/v1alpha1
kind: TerraformRepository
metadata:
  name: my-app-repo-1
  namespace: burrito-demo
spec:
  repository:
    url: https://gitlab.com/my-org/my-infrastructure  # Same URL

---
# TerraformRepository 2 (SAME Git URL, different name)
apiVersion: config.terraform.padok.cloud/v1alpha1
kind: TerraformRepository
metadata:
  name: my-app-repo-2
  namespace: burrito-demo
spec:
  repository:
    url: https://gitlab.com/my-org/my-infrastructure  # Same URL!

---
# TerraformLayer 1 - references repo-1, branch A
apiVersion: config.terraform.padok.cloud/v1alpha1
kind: TerraformLayer
metadata:
  name: layer-1
  namespace: burrito-demo
spec:
  repository:
    name: my-app-repo-1
    namespace: burrito-demo
  branch: feature-branch-a
  path: terraform/stacks/app-1

---
# TerraformLayer 2 - references repo-2, branch B
apiVersion: config.terraform.padok.cloud/v1alpha1
kind: TerraformLayer
metadata:
  name: layer-2
  namespace: burrito-demo
spec:
  repository:
    name: my-app-repo-2
    namespace: burrito-demo
  branch: feature-branch-b
  path: terraform/stacks/app-2

Why This Matters

Since both TerraformRepositories have the same Git URL, they share the same cached directory (URL hash):

hash := fmt.Sprintf("%x", sha256.Sum256([]byte(p.RepoURL)))
p.workingDir = filepath.Join(repositoryDir, hash)

When both reconcile concurrently with different branches:

  1. Repo1 reconciles, checks out feature-branch-a, pulls
  2. Repo2 reconciles concurrently, tries to checkout feature-branch-b
  3. The SetReference() call creates a local branch without upstream tracking
  4. Subsequent Pull() fails with "non-fast-forward" because go-git cannot determine the merge strategy

This multi-repository scenario might explain why the bug is harder to reproduce with a single TerraformRepository.
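
A small self-contained check of the shared-directory claim above, wrapping the two quoted lines in a hypothetical helper. The `workingDirFor` name is an assumption for illustration, and the base directory is taken from the log paths earlier in this thread.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"path/filepath"
)

// workingDirFor mirrors the two quoted lines: the directory depends only on
// the repository URL, not on which TerraformRepository resource asked for it.
func workingDirFor(repositoryDir, repoURL string) string {
	hash := fmt.Sprintf("%x", sha256.Sum256([]byte(repoURL)))
	return filepath.Join(repositoryDir, hash)
}

func main() {
	const base = "/var/run/burrito/repositories"
	const url = "https://gitlab.com/my-org/my-infrastructure"

	dir1 := workingDirFor(base, url) // requested on behalf of my-app-repo-1
	dir2 := workingDirFor(base, url) // requested on behalf of my-app-repo-2

	// Same URL, same hash, same directory: both resources operate on one clone.
	fmt.Println(dir1 == dir2) // true
}
```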

@corrieriluca
Member

@nerdeveloper Okay, a race condition between the two repositories sharing the same folder is likely to be the root cause 🤔

Why do you create multiple TerraformRepositories pointing to the same repo? Are they created in the same tenant/namespace or in different ones?

Either way, I don't think the right fix for this issue is to change the git pull + set reference logic.

A better fix would be to compute a better hash, to ensure a 1-to-1 mapping between TerraformRepositories resources and repository folders in the controller's pod. WDYT?
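
One way such a hash could look, as a rough sketch only: include the resource's namespace and name in the hashed key. The helper name and the key layout below are hypothetical, not something Burrito currently implements.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"path/filepath"
)

// workingDirForResource derives the clone directory from the resource
// identity as well as the URL. Hypothetical helper, not Burrito's code.
func workingDirForResource(repositoryDir, namespace, name, repoURL string) string {
	// Kubernetes namespaces and object names cannot contain "/" or "@",
	// so the separators keep the key unambiguous.
	key := namespace + "/" + name + "@" + repoURL
	hash := fmt.Sprintf("%x", sha256.Sum256([]byte(key)))
	return filepath.Join(repositoryDir, hash)
}

func main() {
	const base = "/var/run/burrito/repositories"
	const url = "https://gitlab.com/my-org/my-infrastructure"

	dir1 := workingDirForResource(base, "burrito-demo", "my-app-repo-1", url)
	dir2 := workingDirForResource(base, "burrito-demo", "my-app-repo-2", url)

	// Same URL, different resources: each gets its own directory.
	fmt.Println(dir1 != dir2) // true
}
```

The cost of this layout is that the same Git repository is cloned once per TerraformRepository resource instead of being shared.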

@nerdeveloper
Contributor Author

nerdeveloper commented Dec 9, 2025

Hi @corrieriluca,

Thanks for looking deeper into this! I respectfully disagree that changing the hash is the right fix. We are a very large org. Here's why:

The Hash Fix Is Incomplete

Even with a unique hash per TerraformRepository, the bug still occurs with one TerraformRepository and multiple TerraformLayers on different branches:

TerraformRepository: my-infra-repo
  └── TerraformLayer 1: branch main,    path /stacks/prod
  └── TerraformLayer 2: branch develop, path /stacks/staging

When the controller reconciles my-infra-repo, it processes branches sequentially:

  1. Process main: checkout, pull → ✓ works
  2. Process develop: SetReference() creates branch without tracking → Pull() fails → ✗ BUG

No race condition. No shared cache. The bug is in the SetReference() + Pull() pattern itself.

Multi-Tenant Use Cases Should Be Supported

We shouldn't restrict users to one-repo-one-branch-one-layer. Common patterns include:

  • Mono-repo with multiple environments: One repo, layers for main (prod), develop (staging), feature/* (preview)
  • Shared infrastructure repo: Multiple teams/tenants using the same repo with different branches
  • GitOps with feature branches: Temporary layers for PR previews

Even ArgoCD supports one Git repository serving multiple Applications with different branches/paths. Burrito should too.

Why Fetch + Reset Is The Right Fix

  1. Solves both scenarios: Multi-repo race condition AND multi-branch within a single repo
  2. More robust: Bypasses go-git's problematic Pull() merge logic entirely
  3. Deterministic: Always syncs to the exact remote state, no merge strategy guessing
  4. Efficient: Still allows cache sharing when appropriate

The current Pull() approach assumes a tracking configuration exists, but SetReference() doesn't provide one. Rather than working around this with isolation, we should fix the sync logic itself.

Happy to discuss further or add tests if needed!

@corrieriluca
Member

> Even with a unique hash per TerraformRepository, the bug still occurs with one TerraformRepository and multiple TerraformLayers on different branches

I am not sure about this assertion, because this is the specific thing I tried to reproduce, and every time I tried, no bug occurred.

However, okay, I hear your case for supporting the pattern of multiple TerraformRepositories for the same URL. Burrito should allow this.

I'm gonna test your PR in my sandbox environment before merging.

@bradenwright

bradenwright commented Dec 24, 2025

FWIW, I worked with @nerdeveloper and I just ran into a situation that gave me the error: I had a working branch going, a CD process pushed to the same branch, and I resolved the conflict and pushed. We can see if there is a good way to reproduce it or narrow it down, but I figured it was worth mentioning that it came up again. I didn't force-push; everything I did was a standard push workflow. Specifically, I think this issue came up at the point where I got the git error below, but I can't say for sure.

braden@bradens-MacBook-Pro helm % git push
Enumerating objects: 24, done.
Counting objects: 100% (24/24), done.
Delta compression using up to 12 threads
Compressing objects: 100% (14/14), done.
Writing objects: 100% (14/14), 3.75 KiB | 3.75 MiB/s, done.
Total 14 (delta 10), reused 0 (delta 0), pack-reused 0
remote: 
remote: To create a merge request for braden-test, visit:
remote:   https://gitlab.com/co/cloud/my/my-platform/-/merge_requests/new?merge_request%5Bsource_branch%5D=braden-test
remote: 
To gitlab.com:co/cloud/my/my-platform.git
   48c6ae1..8187b79  braden-test -> braden-test
braden@bradens-MacBook-Pro helm % git push
To gitlab.com:co/cloud/my/my-platform.git
 ! [rejected]        braden-test -> braden-test (fetch first)
error: failed to push some refs to 'gitlab.com:co/cloud/my/my-platform.git'
hint: Updates were rejected because the remote contains work that you do
hint: not have locally. This is usually caused by another repository pushing
hint: to the same ref. You may want to first integrate the remote changes
hint: (e.g., 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.

I replaced the company name in the output above. We'll have to see whether we can reproduce the issue, though.

…st-forward bug

The go-git library's Pull() function has a known issue (go-git/go-git#358) where it
returns 'non-fast-forward update' errors even when a fast-forward merge
is possible. This happens when local branches don't have proper upstream
tracking configured.

This commit replaces Pull() with an explicit Fetch() + Hard Reset approach:
1. Fetch latest changes from remote with Force=true
2. Get the remote reference for the target branch
3. Hard reset the worktree to the remote ref
4. Update the local branch reference

This approach:
- Avoids the go-git Pull() bug entirely
- Is more explicit about what we want (always match remote)
- Handles force-push scenarios correctly
- Works regardless of tracking configuration

Fixes: go-git/go-git#358
@nerdeveloper
Contributor Author

Hey Braden, I looked into this and noticed the deployment was using the wrong image tag - v0.8.1-beta was pushed on Dec 3rd, before this PR existed. Rebuilt with this fix as v0.9.2-beta and it's working now. TerraformRepositories sync correctly, no more non-fast-forward errors. Will share more details internally.

@nerdeveloper nerdeveloper force-pushed the fix/replace-pull-with-fetch-reset branch from 19ac52a to b0e6b6f Compare December 29, 2025 19:19
@nerdeveloper
Contributor Author

In the meantime, @corrieriluca, this is another occurrence that @bradenwright hit without the fix from this PR. I also tested with the latest version of Burrito and still got the same error. Hopefully this can be merged upstream. Happy holidays, folks.

Projects

Status: Done

3 participants