Skip to content

Conversation

@ltetrel
Copy link

@ltetrel ltetrel commented Mar 26, 2020

Hi everyone,

We are reaching out regarding a potential contribution from the Neurolibre team:

The contribution concerns allowing some actions before building a user image (in our case pulling databases on our server using repo2data). For now we were using a pre_span_hook inside the hub. However, this leads to http timeout for big datasets Increasing the http timeout would be far from ideal. Moreover, running repo2data every time a user is assigned to a pod is not optimal either (when it could be done just once every time the repo changes).

I first thought we could use a pod_preset that would inject data like env or init_container into the build pod (check our conversation here), but it is not possible easily. So I started working on a fork to modify directly the build pod spawn. And instead of hard coding stuff, I fought it could be usefull for the community to have additional volumes in the build pod, and init_container before repo2docker container is called (and also making the REPO_URL accessible to containers inside the build pod).

So I added three options to binderhub:

  • extra_volume_build: specify the folder to be mounted (name and path)
  • extra_volume_mount_build: specify where to mount the folder in the container (name and mountPath)
  • init_container_build: defines the image and args for the init container (run before repo2docker)

Here is an example of a working conf.py:

c.BinderHub.extra_volume_build = {'name': "extra-volume" , 'path': "/DATA"}
c.BinderHub.extra_volume_mount_build = {'name': "extra-volume" , 'mountPath': "/data"}
c.BinderHub.init_container_build = {'image': 'binder-registry.conp.cloud/repo2data:v2.2',
                                    'args': ['-r', '$(REPO_URL)']
                                   }

I still need help to properly log the content of the init container (nothing is logged into the webpage).

We'd be glad to hear your opinion to improve our solution and translate it to a contribution, if the community regards this as a useful addition.

… config; REPO_URL env accesible to pod

linting

adding optionnal volumes and init_containers to binderhub config
@betatim betatim changed the title Optionnal build volumes and build init_containers, REPO_URL accessible to pod Optional build volumes and build init_containers, REPO_URL accessible to pod Mar 27, 2020
@betatim
Copy link
Member

betatim commented Mar 27, 2020

First reaction: sounds reasonable and like a good idea.

I think the most controversial things in this PR are:

  • variable naming
  • can we expose this via the helm chart so that the format there looks like what people expect
  • some documentation for use cases so others know when to use this

So you see, really not a lot of controversy :) Maybe we can start with the last bullet point as it would also help me to think this through.

@ltetrel
Copy link
Author

ltetrel commented Mar 27, 2020

Hi,

* some documentation for use cases so others know when to use this

I can start draft something for the documentation, will add a commit for doc changes.

* can we expose this via the helm chart so that the format there looks like what people expect

Oh I thought the parameters were already somehow exposed to the helm chart.. I don't know how to do that but maybe @agahkarakuzu you could help?

ltetrel and others added 3 commits March 30, 2020 17:55
* add pre-build steps doc

* Update customizing.rst

* Update customizing.rst

* Update customizing.rst
@ltetrel
Copy link
Author

ltetrel commented Apr 6, 2020

We finished drafting a doc with @agahkarakuzu, we modified doc/customizing.rst.
What do you think ?

@ltetrel
Copy link
Author

ltetrel commented Apr 27, 2020

Hello,
Just a friendly remainder about this current PR :)

@pfisterer
Copy link

This would also be really useful to inject custom CA certs into the build pod. I'm trying to use a cluster-internal docker registry with a private CA cert.

I've managed to inject the CA's cert into binderhub (cf. https://github.com/pfisterer/edsc-k8s-playbook/blob/54ec18834b19ef10b909c3a89f245047f12262d6/files/binderhub.yaml#L69). However, injecting them into the build pod didn't work.

Maybe I could also use PodPresets, but they are not enabled in my current k8s environment.

I guess being able to add initContainers, extraVolumeMounts, and extraVolumes should be available for all pods being created by Helm or by the KubeSpawner.

@meeseeksmachine
Copy link

This pull request has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/introduce-yourself/17/194

@meeseeksmachine
Copy link

This pull request has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/feature-idea-jupyterhub-binderhub-jupyter-book-as-a-publishing-platform/8359/3

@ltetrel
Copy link
Author

ltetrel commented Nov 10, 2021

FYI: we are relying mostly on jupyterhub customization.
Running our process just before user session starts:
https://github.com/neurolibre/terraform-binderhub/blob/99c253a06d21fb05be2335a5ad8a2e2101f0f056/terraform-modules/binderhub/assets/config.yaml#L109-L116
https://github.com/neurolibre/terraform-binderhub/blob/master/terraform-modules/binderhub/assets/jb_build.bash
Because our data dir can be exposed (this causes security concerns) we are also using an extraConfig (still jupyterhub):
https://github.com/neurolibre/terraform-binderhub/blob/99c253a06d21fb05be2335a5ad8a2e2101f0f056/terraform-modules/binderhub/assets/config.yaml#L26-L77

Still this raise the following issues:

  1. Security concerns of exposing our backend storage
  2. Bad optimization since our processes are run each time a user starts his repo, where with this PR it could be run just once during build

@meeseeksmachine
Copy link

This pull request has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/feature-idea-jupyterhub-binderhub-jupyter-book-as-a-publishing-platform/8359/8

@manics
Copy link
Member

manics commented Feb 17, 2022

@ltretrel We're planning to merge an auto-formatting PR in two weeks #1381

Since the auto-formatting will affect most files we can give you some help with rebasing this PR after the merge if necessary.

@ltetrel
Copy link
Author

ltetrel commented Feb 17, 2022

Hey @manics,
So for now we found a way to do what we want without the need for this PR thanks to recent changes on jupyterhub code (as explained in my last message) .
It is also a very old development that needed help and good knowledge on helm variable exposing, binderhub source code, that I don't really have, so I am okay with closing it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

code:python Python changes. new

Projects

Status: Backlog

Development

Successfully merging this pull request may close these issues.

7 participants