How to build and run in Docker
This document describes how to build OpenWayback from source and run it, entirely within Docker. This is handy for development and for testing in different environments. The OpenWayback source code includes a Dockerfile. The generated Docker image is kept minimal, which also makes it suitable for running in production.
Prerequisite: Docker (version 17.05 or later is required for building the image).
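A quick way to confirm that the installed Docker meets this requirement is to compare its reported version against 17.05. The following is a minimal sketch, not part of OpenWayback: the function names `version_at_least` and `check_docker` are hypothetical, and only the major and minor version components are compared.

```shell
# Succeed if version $1 (e.g. "17.05.0-ce") is at least version $2 (e.g. "17.05").
# Hypothetical helper: compares only the major and minor components.
version_at_least() {
    have_major=${1%%.*}
    have_minor=${1#*.}
    have_minor=${have_minor%%[!0-9]*}
    want_major=${2%%.*}
    want_minor=${2#*.}
    want_minor=${want_minor%%[!0-9]*}
    # Drop one leading zero so "05" and "09" compare as plain decimals.
    have_minor=${have_minor#0}
    want_minor=${want_minor#0}
    [ "$have_major" -gt "$want_major" ] ||
        { [ "$have_major" -eq "$want_major" ] &&
          [ "${have_minor:-0}" -ge "${want_minor:-0}" ]; }
}

# Report whether the Docker daemon is recent enough to build the image.
check_docker() {
    server_version=$(docker version --format '{{.Server.Version}}') || return 1
    if version_at_least "$server_version" 17.05; then
        echo "Docker $server_version is recent enough"
    else
        echo "Docker $server_version is too old, 17.05 or later is required" >&2
        return 1
    fi
}
```

Running `check_docker` after sourcing these functions queries the Docker daemon, so it needs a running Docker installation.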
Acquire the source code.
$ git clone https://github.com/iipc/openwayback.git
$ cd openwayback
Make any changes to the source code if needed, then build the Docker image.
$ docker image build -t openwayback .
This will download dependencies, compile the code, run tests, package, and place necessary components in appropriate places to build a minimal Docker image with the name openwayback.
This process may take a while (depending on the network bandwidth and processor speed).
It uses Docker's multi-stage build feature to exclude the compile-time environment and dependencies from the final image, which makes the image both more secure and smaller.
By default, the source is built using the latest versions of Maven and JDK, and the image is then packaged with the latest versions of Tomcat and JRE.
However, it is possible to build and package with custom combinations of these dependencies using the MAVEN_TAG and TOMCAT_TAG build arguments.
These variations can be helpful for both testing and production needs without making any changes in the Dockerfile.
$ docker image build \
--build-arg=MAVEN_TAG=3.5-jdk-7 \
--build-arg=TOMCAT_TAG=7-jre7-alpine \
-t openwayback:custom .
The above command builds an image named openwayback with the tag custom: the source code is built using Maven 3.5 with JDK 7, and the built artifacts are then packaged in a small Alpine Linux image with Tomcat 7 and JRE 7.
See the available values of the MAVEN_TAG and TOMCAT_TAG build arguments.
Another build argument, SKIP_TEST, is available and is set to false by default.
To skip the tests, pass --build-arg=SKIP_TEST=true to the Docker build command.
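When several Maven and Tomcat combinations need to be built, the three build arguments can be wrapped in a small helper. The sketch below is only an illustration: `build_variant`, the `DRY_RUN` switch, and the scheme used to derive the image tag are hypothetical and not part of OpenWayback. With `DRY_RUN=1` the command is printed rather than executed.

```shell
# Print (DRY_RUN=1) or run a build for one Maven/Tomcat tag combination.
# build_variant, DRY_RUN, and the derived image tag are hypothetical names.
build_variant() {
    maven_tag=$1
    tomcat_tag=$2
    skip_test=${3:-false}
    image="openwayback:${maven_tag}_${tomcat_tag}"
    # Reuse the positional parameters to hold the full command line.
    set -- docker image build \
        --build-arg "MAVEN_TAG=$maven_tag" \
        --build-arg "TOMCAT_TAG=$tomcat_tag" \
        --build-arg "SKIP_TEST=$skip_test" \
        -t "$image" .
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Preview the command for the combination used above, skipping tests.
DRY_RUN=1 build_variant 3.5-jdk-7 7-jre7-alpine true
```

Without `DRY_RUN=1`, the function must be called from the root of the source tree, because the build context is the current directory.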
The default configuration of OpenWayback uses the automatic BDB Indexer and expects WARC files at ${WAYBACK_BASEDIR}/files1/ or ${WAYBACK_BASEDIR}/files2/.
By default, WAYBACK_BASEDIR is set to the /data volume in the Docker image.
Create the necessary directory structure on the host machine for testing and populate it with some test files.
$ mkdir -p /tmp/owb/files1
$ cp /path/to/sample/*.warc /tmp/owb/files1/
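Before starting a container, it can help to confirm that the host directory actually contains WARC files where the default configuration looks for them. The following sketch counts them; `count_warcs` is a hypothetical helper, and matching compressed `*.warc.gz` files in addition to `*.warc` is an assumption. The demonstration uses a throwaway directory rather than /tmp/owb.

```shell
# Count WARC files under the files1/ and files2/ subdirectories of a base directory.
# count_warcs is a hypothetical helper; matching *.warc.gz is an assumption.
count_warcs() {
    basedir=$1
    count=0
    for f in "$basedir"/files1/*.warc "$basedir"/files1/*.warc.gz \
             "$basedir"/files2/*.warc "$basedir"/files2/*.warc.gz; do
        # An unmatched glob stays literal, so keep only real files.
        [ -f "$f" ] && count=$((count + 1))
    done
    echo "$count"
}

# Demonstrate on a throwaway directory so nothing on the host is modified.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/files1"
: > "$demo_dir/files1/sample1.warc"
: > "$demo_dir/files1/sample2.warc.gz"
count_warcs "$demo_dir"   # prints 2
rm -rf "$demo_dir"
```

For the directory created above, the call would be `count_warcs /tmp/owb`.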
Run a Docker container with appropriately mounted volumes and port mapping. By default, the container runs the Tomcat server.
$ docker container run -it --rm -v /tmp/owb:/data -p 8080:8080 openwayback
Once the WARC files are indexed, they should be ready for lookup at http://localhost:8080/.
OpenWayback allows certain configuration overrides through environment variables that can be set when running a container, but these customizations are very limited. The supported variables, shown with their default values, are:
WAYBACK_HOME=/usr/local/tomcat/webapps/ROOT/WEB-INF
WAYBACK_BASEDIR=/data
WAYBACK_URL_SCHEME=http
WAYBACK_URL_HOST=localhost
WAYBACK_URL_PORT=8080
WAYBACK_URL_PREFIX=http://localhost:8080
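These variables can be overridden with the `-e` option of `docker container run`. In the values listed above, WAYBACK_URL_PREFIX equals the scheme, host, and port joined together, so the sketch below sets all four consistently. Whether the prefix has to be set separately is an assumption. The helper name `url_env_args` and the host name `wayback.example.org` are hypothetical.

```shell
# Print the -e options that describe one public URL consistently.
# url_env_args is a hypothetical helper name.
url_env_args() {
    scheme=$1
    host=$2
    port=$3
    printf '%s\n' \
        "-e WAYBACK_URL_SCHEME=${scheme}" \
        "-e WAYBACK_URL_HOST=${host}" \
        "-e WAYBACK_URL_PORT=${port}" \
        "-e WAYBACK_URL_PREFIX=${scheme}://${host}:${port}"
}

# Example: options for a hypothetical host name on port 8080.
url_env_args http wayback.example.org 8080
```

The output can be spliced into the run command with an unquoted command substitution, for example `docker container run -it --rm -v /tmp/owb:/data -p 8080:8080 $(url_env_args http wayback.example.org 8080) openwayback`. This relies on the values containing no whitespace.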
However, by strategically mounting certain volumes, it is possible to run the OpenWayback server with custom configuration files.
$ docker container run -it --rm -p 8080:8080 \
-v /tmp/owb:/data \
-v /path/to/custom/wayback.xml:/usr/local/tomcat/webapps/ROOT/WEB-INF/wayback.xml \
-v /path/to/custom/CDXCollection.xml:/usr/local/tomcat/webapps/ROOT/WEB-INF/CDXCollection.xml \
openwayback
Mounting configuration files this way can be handy for testing. For production, however, it is better to create a derived image and override the configuration files with custom ones.
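A derived image could be described by a short Dockerfile that starts from the openwayback image and copies the custom files over the defaults. The following is a minimal sketch that generates such a Dockerfile. The function name `write_derived_dockerfile` and the build directory layout are hypothetical, and it assumes the custom wayback.xml and CDXCollection.xml are placed next to the generated Dockerfile.

```shell
# Write a Dockerfile for a derived image that bakes in custom configuration.
# write_derived_dockerfile and the build directory layout are hypothetical.
write_derived_dockerfile() {
    build_dir=$1
    mkdir -p "$build_dir"
    # The target paths match the ones used for the volume mounts above.
    cat > "$build_dir/Dockerfile" <<'EOF'
FROM openwayback
COPY wayback.xml /usr/local/tomcat/webapps/ROOT/WEB-INF/wayback.xml
COPY CDXCollection.xml /usr/local/tomcat/webapps/ROOT/WEB-INF/CDXCollection.xml
EOF
}
```

After calling the function with a directory of your choice and copying the two configuration files into it, the derived image can be built with `docker image build -t openwayback:configured` followed by that directory, where the tag `configured` is only an example.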
The Docker image contains various executable utilities with their necessary dependencies that can be used in one-off mode.
The following command illustrates one possible usage of the cdx-indexer to index WARC files into CDX files on the host machine with appropriate volume mounting while utilizing a one-off container.
$ docker container run -it --rm -v /tmp/owb:/data openwayback cdx-indexer /data/files1/sample1.warc > /tmp/owb/index1.cdx
Alternatively, access the bash prompt of the container to run utility scripts inside or perform debugging.
$ docker container run -it --rm -v /tmp/owb:/data openwayback bash
[CONTAINER ID]# cdx-indexer /data/files1/sample1.warc > /data/index1.cdx
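The one-off indexing command shown earlier can be repeated for every WARC file in a directory. The sketch below is an illustration only: `index_all` and the `DRY_RUN` switch are hypothetical names, and with `DRY_RUN=1` the commands are printed instead of executed. It omits `-it` because no interactive terminal is needed when the output is redirected, and allocating a terminal may add carriage returns to the redirected output.

```shell
# Index every WARC file in files1/ into its own CDX file on the host.
# index_all and DRY_RUN are hypothetical names.
index_all() {
    host_dir=$1
    for warc in "$host_dir"/files1/*.warc; do
        # Skip the literal pattern when no file matches.
        [ -f "$warc" ] || continue
        name=$(basename "$warc" .warc)
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "docker container run --rm -v $host_dir:/data openwayback cdx-indexer /data/files1/$name.warc > $host_dir/$name.cdx"
        else
            # One container per file; the CDX output is written on the host.
            docker container run --rm -v "$host_dir":/data openwayback \
                cdx-indexer "/data/files1/$name.warc" > "$host_dir/$name.cdx"
        fi
    done
}
```

For the directory used in this document, `DRY_RUN=1 index_all /tmp/owb` previews the commands and `index_all /tmp/owb` runs them.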
Copyright © 2005-2022 [tonazol](http://netpreserve.org/). CC-BY. https://github.com/iipc/openwayback.wiki.git