
Conversation

@dannon
Member

@dannon dannon commented Oct 6, 2025

Summary

  • FastAPI backend with Docker and nginx reverse proxy
  • Redis caching infrastructure with async operations
  • Health check and cache management endpoints
  • Version syncing from package.json via Docker build args
  • Frontend version display component with graceful fallback
  • E2E tests for API health and version display

@d-callan
Collaborator

I just want to make sure I understand where we are here: this is effectively a prerequisite to meaningfully incorporating AI and whatever John produces for an SRA metadata DB, and it makes it easy to support the endpoints we need for NCBI linkouts. Yes?

What makes this a draft?

@dannon dannon marked this pull request as ready for review October 22, 2025 15:42
Collaborator

@NoopDog NoopDog left a comment


Hi @dannon, thanks so much for creating this. I would really love to get a minimal Python backend up and running. Looking at this, I see a few changes I would like to request. I will outline them here as they are a bit structural rather than line by line. So, in the order of the changed-file outline:

  1. Can you remove the VersionDisplay component? It looks like this duplicates or can be merged with the existing version display, and can be done in a separate PR.

  2. In the /backend directory, can you move the contents one level down in the folder hierarchy to make room for other backend applications, please? For example, I can see us implementing an "ai-service", an "api-service", and a "next-js-service" as discussed. Looks like this one is the api-service?

  3. In /scripts, please co-locate docker-build.sh with the Docker app that it will build.

  4. In the top-level directory, can you

    1. Move docker-compose.yml into the backend directory
    2. For nginx.conf, please move this to the backend directory and remove the section about hosting the Next.js static site content. CloudFront currently serves the static site via S3. No need to change this to get an API server running.
    3. Remove Redis from this PR. Redis could always come in later if needed, but it doesn't seem to be required now, and it could be excluded from a minimal PR. Additionally, CloudFront caches items, depending on the cache control and TTL settings.

Thanks for adding Playwright and updating the Jest config.

Thanks again for preparing this.

D

@NoopDog
Collaborator

NoopDog commented Oct 23, 2025

OK, I updated the requested changes. I removed the request to delete Redis from this PR.

@dannon
Member Author

dannon commented Oct 23, 2025

I want to push back again on the multi-service architecture suggestion, at least for now. The functionality here—LLM search/discovery, Galaxy tool execution, ENA data retrieval, NCBI linkouts—belongs behind a single API:

User describes problem (LLM) → Gets workflow suggestions (LLM + catalog) 
  → Retrieves sequencing data (ENA) → Executes Galaxy tools (BioBlend) 
    → Links to NCBI resources

These aren't independent services—splitting them would add network hops and API boundaries to what should be a seamless user experience. These features share:

  • User identity and authentication (single auth boundary, shared user context across all operations)
  • Redis caching (LLM responses, catalog metadata, job status)
  • Configuration and workflow catalog logic
  • Deployment lifecycle

Distributing auth and user context across multiple services would require JWT validation in each service, shared auth middleware, and complex session management—significant overhead for features that naturally operate on the same user request.

FastAPI handles this elegantly through routers:

backend/app/api/v1/
├── llm.py          # Natural language dataset search
├── galaxy.py       # Galaxy tool execution and monitoring
├── ena.py          # Data retrieval
└── more.py         # the next feature

The frontend stays static (Next.js → S3 → CloudFront). There's no requirement for SSR/ISR that would justify a "next-js-service" runtime.

As a likely responsible party for deploying and managing this infrastructure, I see a huge benefit in a single-service architecture that's straightforward to deploy, monitor, and debug. If we hit actual technical constraints requiring service separation, I'm happy to do the refactoring then. YAGNI: we can always extract a service later if needed, but we can't easily "un-microservice" once we've committed to that complexity. Start simple.

nginx and Static Files

I'd like nginx to serve static files for local dev/E2E testing while production uses CloudFront → S3. This provides a single URL locally (http://localhost:3000), enables E2E tests via docker-compose, and maintains dev/prod parity. The nginx.conf supports both scenarios—static serving only happens when files are volume-mounted for local dev.
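A hypothetical nginx.conf fragment illustrating this dual mode (the upstream name, port, and paths are assumptions, not the PR's actual config):

```nginx
# Hypothetical fragment: always proxy /api/ to the FastAPI backend;
# serve static files only when they are volume-mounted for local dev.
server {
    listen 3000;

    location /api/ {
        proxy_pass http://backend:8000;
        proxy_set_header Host $host;
    }

    location / {
        root /usr/share/nginx/html;   # volume-mounted only in local dev
        try_files $uri $uri/ =404;    # 404s harmlessly when not mounted
    }
}
```

In production, CloudFront → S3 handles the `/` routes and only `/api/` traffic reaches this proxy, so the static block is inert there.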

VersionDisplay Component

You're right about duplication with the existing version chip. I added it to demonstrate backend connectivity and show server version (critical for debugging version mismatches), but the cleaner solution is enhancing the existing version chip's tooltip. I'll split that into a separate PR to keep this focused on backend infrastructure.

Proposed Structure

/backend/
  ├── app/
  │   ├── api/v1/     # Feature routers
  │   ├── core/       # Shared config, cache, dependencies, auth
  │   └── main.py
  ├── Dockerfile
  ├── docker-compose.yml
  ├── nginx.conf
  └── README.md

I'm happy to accommodate other structural preferences (directory naming, moving docker-compose/nginx.conf into backend/, etc.), but I'd like us to more completely understand technical requirements driving the need for service separation before committing to that complexity.

@d-callan
Collaborator

Thanks for the detail, dannon. I'm OK with leaving decisions about additional services to a future ticket/PR. Agreed that it can be refactored as needed, once we've had time to discuss those ideas separately.

Creates the base infrastructure layer for BRC Analytics backend:
- FastAPI application with health and cache endpoints
- Redis caching service with TTL management
- Docker Compose setup (backend + redis + nginx)
- nginx reverse proxy configuration
- uv for dependency management
- Ruff formatting for Python code
- E2E health check tests

This branch serves as the foundation for feature branches to build upon.

Adds GET /api/v1/version endpoint that returns:
- API version (from APP_VERSION env var)
- Environment (development/production)
- Service name

This provides a simple way for the frontend to display version info
and demonstrates the backend infrastructure is working.

Use APP_VERSION build arg to inject version from package.json into
backend container. Adds docker-build.sh script to automate this.
All endpoints now use settings.APP_VERSION instead of hardcoded values.

Remove VersionDisplay component from footer branding as it duplicates
the existing version chip functionality. The version chip already displays
build date, version, and git commit info. Backend version display will
be added to the existing version chip in a separate PR to avoid duplication.

Addresses PR galaxyproject#892 feedback.

Move Docker-related files into the backend directory to co-locate
infrastructure with the backend code:
- docker-compose.yml → backend/docker-compose.yml
- nginx.conf → backend/nginx.conf
- scripts/docker-build.sh → backend/docker-build.sh

Update all path references in:
- backend/README.md: Update build script path
- backend/docker-compose.yml: Update context and volume paths
- backend/docker-build.sh: Update package.json reference

This keeps backend infrastructure self-contained and makes the
deployment setup clearer.

Addresses PR galaxyproject#892 feedback.
@dannon dannon force-pushed the feature/backend-infrastructure branch from a5f4d11 to fcb8d9a on October 23, 2025 18:26
@NoopDog
Collaborator

NoopDog commented Oct 23, 2025

Hi @dannon, thanks for the update. Yes, happy to discuss splitting the current backend app into AI- and API-focused apps (running on the same machine and isolated Docker network) in a future PR as @d-callan suggests.

  1. Version display - thanks

  2. Thanks - site hosting in nginx is fine locally as long as it won't affect devs who will run the Next.js server locally on port 3000.

  3. Regarding the backend folder structure: I think it makes sense to allow for separate apps here (that could be deployed on the same virtual machine using Compose). For example, I will want to add a Next.js server shortly to support faster builds, image optimization, and NextAuth login via Google. The next-js-service will require its own folder in the backend directory, and I would not want to have to refactor the backend folder to fit this in. Additionally, as the app grows, there may be other applications/processes that we would like to isolate into their own applications with their own lifecycle, package dependencies, security requirements, etc. We can decide then if we want the same server (VM), a different server, etc. For now, I am requesting we make room for the eventuality.

I am requesting that we support a folder structure like:

backend/
├── docker-compose.yml            # orchestrates api + web, redis
├── nginx.conf                    # reverse proxy routes for api/web
├── README.md                     # documentation for the backend stack
│
├── api/                          # FastAPI backend
│   ├── app/
│   │   ├── api/v1/               # routers
│   │   ├── core/                 # config, dependencies, auth, etc.
│   │   └── main.py
│   ├── pyproject.toml            # or requirements.txt
│   ├── Dockerfile                # builds & runs FastAPI
│   └── README.md
│
└── web/                          # Next.js backend server (NextAuth / ISR / image optimization)
    ├── src/                      
    ├── public/
    ├── package.json
    ├── next.config.ts
    ├── Dockerfile                # builds & runs the Next.js backend
    └── README.md

For this PR, we would only need something like:

backend/
├── docker-compose.yml            # orchestrates api + web, redis
├── nginx.conf                    # reverse proxy routes for api/web
├── README.md                     # documentation for the backend stack
│
├── api/                          # FastAPI backend
│   ├── app/
│   │   ├── api/v1/               # routers
│   │   ├── core/                 # config, dependencies, auth, etc.
│   │   └── main.py
│   ├── pyproject.toml            # or requirements.txt
│   ├── Dockerfile                # builds & runs FastAPI
│   └── README.md

@d-callan
Collaborator

I think the Next.js server needs a separate issue and PR; I don't want to hold this up for that. More important, in my mind, to move on to talking about NCBI linkout stuff.

As for the dir tree, I see it as six of one, half a dozen of the other. If we assume everyone is willing to discuss and iterate, it doesn't seem like a blocker?

Let's try to get this done please.

@NoopDog NoopDog requested a review from Copilot October 23, 2025 23:43
Contributor

Copilot AI left a comment


Pull Request Overview

This PR establishes the foundational backend infrastructure for BRC Analytics, introducing a FastAPI-based REST API with Redis caching, Docker containerization with nginx reverse proxy, version syncing from package.json, and comprehensive E2E testing.

Key Changes:

  • FastAPI backend with health check, cache management, and version endpoints
  • Redis-based caching service with TTL support and async operations
  • Docker Compose setup with nginx reverse proxy, backend service, and Redis
  • E2E tests for API health checks and version display functionality

Reviewed Changes

Copilot reviewed 27 out of 30 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| backend/app/main.py | FastAPI application setup with CORS middleware and API router registration |
| backend/app/core/config.py | Application settings management with environment variable loading |
| backend/app/core/cache.py | Redis cache service implementation with async operations and TTL management |
| backend/app/core/dependencies.py | Dependency injection for cache service singleton |
| backend/app/api/v1/health.py | Health check endpoint for service monitoring |
| backend/app/api/v1/cache.py | Cache health endpoint with connectivity verification |
| backend/app/api/v1/version.py | Version information endpoint |
| backend/Dockerfile | Multi-stage Docker build using uv package manager |
| backend/docker-compose.yml | Service orchestration for nginx, backend, and Redis |
| backend/nginx.conf | Nginx configuration for reverse proxy and static file serving |
| backend/docker-build.sh | Build script to sync version from package.json |
| backend/pyproject.toml | Python project dependencies and configuration |
| backend/ruff.toml | Ruff linter configuration |
| tests/e2e/03-api-health.spec.ts | E2E tests for API health and documentation endpoints |
| playwright.config.ts | Playwright test configuration with dual web server setup |
| pyproject.toml | Root project configuration excluding backend directory |
| scripts/format-python.sh | Python formatting script updated to include backend |
| package.json | Updated test script to pass with no tests |
| site-config/brc-analytics/local/.env | Added backend URL and Galaxy environment variables |


"""Generate a cache key from prefix and parameters"""
# Sort parameters for consistent keys
param_str = json.dumps(params, sort_keys=True, default=str)
hash_val = hashlib.md5(param_str.encode()).hexdigest()[:16]

Copilot AI Oct 23, 2025


Using MD5 for hashing cache keys is not secure for cryptographic purposes. While acceptable for cache key generation, consider using a more secure hash function like SHA-256 or explicitly document that this is only for cache key generation, not security.

Suggested change:

```diff
-hash_val = hashlib.md5(param_str.encode()).hexdigest()[:16]
+hash_val = hashlib.sha256(param_str.encode()).hexdigest()[:16]
```
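For context, a self-contained sketch of a cache-key helper applying the suggested change (SHA-256 instead of MD5). The function name, prefix format, and 16-character truncation are assumptions extrapolated from the quoted snippet:

```python
# Hypothetical cache-key helper using SHA-256 as suggested; the
# function name and key format are assumptions, not the PR's code.
import hashlib
import json


def make_cache_key(prefix: str, params: dict) -> str:
    """Generate a deterministic, non-cryptographic cache key."""
    # Sort parameters so logically equal dicts produce identical keys
    param_str = json.dumps(params, sort_keys=True, default=str)
    hash_val = hashlib.sha256(param_str.encode()).hexdigest()[:16]
    return f"{prefix}:{hash_val}"
```

Truncating the digest keeps keys short for Redis; since the hash is only used for key uniqueness, not security, the truncation (and even MD5 itself) is acceptable, which is the nuance the Copilot comment raises.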


Collaborator

@NoopDog NoopDog left a comment


Ok then, in the interest of moving things along, I am happy to deal with the directory structure whenever we add a new service. I will go ahead and approve this PR.

One more thing to note: it seems like a security issue to expose Redis and the backend outside the host. Is that happening here? It seems only nginx should be accessible outside the host. Should this be fixed here, or would it be addressed by additional networking config on actual deployment?
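One way to address the exposure question above is to publish host ports only on nginx and keep the other services on the internal Compose network. A hypothetical docker-compose.yml fragment (service names and ports assumed from the PR description):

```yaml
# Hypothetical fragment: only nginx publishes a host port; backend and
# redis are reachable solely over the internal Compose network.
services:
  nginx:
    image: nginx:alpine
    ports:
      - "3000:80"          # the only host-exposed entry point
    depends_on:
      - backend

  backend:
    build: ./api
    expose:
      - "8000"             # internal-only; no host port mapping

  redis:
    image: redis:7-alpine
    expose:
      - "6379"             # internal-only
```

`expose` is documentation-only in Compose; the key point is the absence of `ports:` mappings on backend and redis, so nothing binds them to host interfaces.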

@d-callan
Collaborator

Thank you! Much appreciated 👍 Let's have a few minutes next week, maybe Tuesday, to discuss and ticket any remaining TODOs.

@d-callan d-callan linked an issue Oct 24, 2025 that may be closed by this pull request
@d-callan
Collaborator

OK, I added an attached ticket for our future selves' sake. When I say that, I mean I pointed Claude at the PR and told it to draft content for an accompanying ticket that summarizes where we landed and makes it sound like reqs lol. Then I just added the open-questions section and a note to the dir-structure section 😂 Hopefully that makes it a little easier to parse if needed later. But also, maybe, it can serve as a thing we can use to facilitate conversation about what our process should look like going forward. We can talk about that later. So..

@dannon please look at the ticket, and if Claude did a bad job of anything, feel free to update it to better reflect the current state of this PR, then merge when you're ready. 🎉

thanks again everyone 💐


Successfully merging this pull request may close these issues.

Backend Infrastructure Foundation
