Problem Description
The TeX Live installation in install_pdf_deps.sh is experiencing intermittent network failures that cause Docker builds to fail, as seen in CI runs.
Recent Failure Example:
TLPDB::_install_data: downloading did not succeed (download_file failed) for https://us.mirrors.cicku.me/ctan/systems/texlive/tlnet/archive/texlive-scripts.tar.xz
Installation failed.
Rerunning the installer will try to restart the installation.
Or you can restart by running the installer with:
install-tl --profile installation.profile [YOUR-EXTRA-ARGS]
error: build error: building at STEP "RUN ./utils/install_pdf_deps.sh": while running runtime: exit status 1
Root Cause Analysis
The current TeX Live installation process lacks:
- Network Resilience: No retry mechanisms for failed downloads
- Mirror Fallback: Single point of failure on specific CTAN mirrors
- Timeout Handling: No protection against hanging downloads
- Error Recovery: Limited ability to resume failed installations
- Dependency Caching: No mechanism to avoid repeated downloads
Current Implementation Issues
Based on the error message, the installation process:
- Relies on external mirrors that may be temporarily unavailable
- Lacks robust download retry mechanisms
- Has no fallback strategies for mirror failures
- Provides limited error diagnostics for network issues
Solution Options
Option 1: Enhanced Network Resilience (Recommended)
Implement comprehensive retry and fallback mechanisms:
# Enhanced installation with retry logic
install_texlive_with_retries() {
    local max_retries=3
    local retry_delay=30
    local mirrors=(
        "https://mirror.ctan.org/systems/texlive/tlnet"
        "https://ctan.math.utah.edu/ctan/tex-archive/systems/texlive/tlnet"
        "https://mirrors.rit.edu/CTAN/systems/texlive/tlnet"
        "https://us.mirrors.cicku.me/ctan/systems/texlive/tlnet"
    )
    local mirror attempt
    # Outer loop: mirror fallback. Inner loop: retries against one mirror.
    for mirror in "${mirrors[@]}"; do
        for attempt in $(seq 1 "$max_retries"); do
            echo "Attempting TeX Live installation from $mirror (attempt $attempt/$max_retries)"
            # Quote the URL so it is always passed as a single argument
            if install-tl --location "$mirror" --profile installation.profile; then
                echo "TeX Live installation successful from $mirror"
                return 0
            fi
            if [ "$attempt" -lt "$max_retries" ]; then
                echo "Installation failed, retrying in ${retry_delay}s..."
                sleep "$retry_delay"
            fi
        done
        echo "All attempts failed for $mirror, trying next mirror..."
    done
    echo "ERROR: TeX Live installation failed from all mirrors" >&2
    return 1
}
Option 2: Container-Based TeX Live Installation
Use pre-built TeX Live containers or packages:
# Alternative: Use system TeX Live packages
RUN dnf install -y texlive-scheme-medium texlive-collection-latexextra \
&& dnf clean all
Option 3: Cached Installation Approach
Implement local caching and validation:
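# Cache-aware installation
TEXLIVE_CACHE_DIR="/tmp/texlive-cache"
TEXLIVE_INSTALLER_URL="https://mirror.ctan.org/systems/texlive/tlnet/install-tl-unx.tar.gz"
download_with_cache() {
    local url="$1"
    local cache_file="$2"
    local max_retries=3
    local attempt
    # Only a non-empty file counts as a cache hit
    if [ -s "$cache_file" ]; then
        echo "Using cached file: $cache_file"
        return 0
    fi
    mkdir -p "$(dirname "$cache_file")"
    for attempt in $(seq 1 "$max_retries"); do
        # Download to a temporary name: wget -O creates the output file even
        # when the download fails, which would otherwise poison the cache
        if wget --timeout=300 --tries=1 "$url" -O "${cache_file}.part"; then
            mv "${cache_file}.part" "$cache_file"
            return 0
        fi
        rm -f "${cache_file}.part"
        if [ "$attempt" -lt "$max_retries" ]; then
            sleep 30
        fi
    done
    return 1
}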
Option 4: Hybrid Installation with Validation
Combine multiple approaches with comprehensive validation:
# Hybrid approach with validation
install_texlive_hybrid() {
    # Try system packages first
    if command -v dnf &> /dev/null; then
        if dnf install -y texlive-scheme-basic; then
            echo "System TeX Live packages installed successfully"
            return 0
        fi
    fi
    # Fallback to network installation with retries
    install_texlive_with_retries
}
Acceptance Criteria
Core Requirements
- TeX Live installation succeeds consistently across different network conditions
- Automatic retry mechanisms for failed downloads
- Multiple mirror fallback support
- Comprehensive error logging and diagnostics
- Installation time optimization through caching
Robustness Features
- Timeout protection for hanging downloads
- Partial download recovery capabilities
- Network connectivity validation before installation
- Graceful degradation when mirrors are unavailable
Monitoring and Diagnostics
- Detailed logging of installation attempts and failures
- Mirror response time tracking
- Installation success/failure metrics
- Clear error messages for troubleshooting
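The logging and timing items above could be covered by a thin wrapper around each installation step. This is only a sketch: `log`, `timed`, and the `[texlive-install]` tag are hypothetical names chosen for illustration, and the elapsed time is measured in whole seconds.

```shell
# Hypothetical helpers (not part of install_pdf_deps.sh).

# Write a UTC-timestamped message to stderr.
log() {
    printf '%s [texlive-install] %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$*" >&2
}

# timed <label> <command> [args...]
# Run the command, log its exit code and wall-clock duration,
# and return the command's own exit code.
timed() {
    local label="$1" start end rc=0
    shift
    start=$(date +%s)
    "$@" || rc=$?
    end=$(date +%s)
    log "${label}: exit=${rc} elapsed=$((end - start))s"
    return "$rc"
}
```

For example, `timed "mirror.ctan.org" install-tl --location "$mirror" --profile installation.profile` would leave one log line per attempt that can later be grepped for per-mirror durations and failure counts.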
Implementation Guidance
Phase 1: Basic Resilience
- Add retry logic to existing installation process
- Implement timeout protection for downloads
- Add basic error handling and logging
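The Phase 1 items could be combined into one small wrapper. The following is a minimal sketch rather than the script's current behavior: `retry_with_timeout` and `RETRY_DELAY` are hypothetical names, and it assumes GNU coreutils `timeout` is available in the image. Because `timeout` executes a program rather than a shell function, the wrapped command has to be an external binary such as `wget` or `install-tl`.

```shell
# Hypothetical helper (not part of install_pdf_deps.sh):
# retry_with_timeout <max_retries> <time_limit> <command> [args...]
retry_with_timeout() {
    local max_retries="$1" limit="$2" attempt=1
    shift 2
    while [ "$attempt" -le "$max_retries" ]; do
        # timeout(1) stops the command and returns 124 once the limit is exceeded
        if timeout "$limit" "$@"; then
            return 0
        fi
        echo "Attempt ${attempt}/${max_retries} failed: $*" >&2
        if [ "$attempt" -lt "$max_retries" ]; then
            # RETRY_DELAY is an assumed override, in seconds (default 30)
            sleep "${RETRY_DELAY:-30}"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}
```

A call such as `retry_with_timeout 3 30m install-tl --profile installation.profile` would bound each attempt to 30 minutes and give up after three failures.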
Phase 2: Mirror Fallback
- Configure multiple CTAN mirrors
- Implement automatic failover between mirrors
- Add mirror health checking
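Mirror health checking could be done with a cheap probe before the installer is started. The sketch below is an assumption-laden illustration: the function names and the `MIRROR_PROBE` override are hypothetical, and probing `tlpkg/texlive.tlpdb` with a HEAD request assumes the mirror serves the standard tlnet layout and answers HEAD requests.

```shell
# Hypothetical helpers; the names and the probe path are assumptions.

# Return 0 if the mirror answers a HEAD request for the package database.
mirror_is_healthy() {
    curl --fail --silent --head --location --max-time 10 \
        --output /dev/null "$1/tlpkg/texlive.tlpdb"
}

# Print the first mirror (taken from the arguments) that passes the probe.
# MIRROR_PROBE may name a different probe command, for example a stub in tests.
first_healthy_mirror() {
    local mirror
    for mirror in "$@"; do
        if "${MIRROR_PROBE:-mirror_is_healthy}" "$mirror"; then
            echo "$mirror"
            return 0
        fi
        echo "Mirror unavailable, skipping: $mirror" >&2
    done
    return 1
}
```

The result can feed the installer directly, for instance `mirror=$(first_healthy_mirror "${mirrors[@]}") && install-tl --location "$mirror" --profile installation.profile`. A passing probe only shows the mirror was reachable at that moment, so retries during the install itself are still needed.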
Phase 3: Caching and Optimization
- Implement local caching for downloaded packages
- Add checksum validation for cached files
- Optimize installation profile for required packages only
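Checksum validation for cached files might look like the sketch below. `verify_sha512` is a hypothetical name, and the approach assumes a checksum file in the usual `sha512sum` format (hash first, then filename) has been downloaded next to the archive; CTAN publishes such a file as `install-tl-unx.tar.gz.sha512`, but the exact name should be confirmed against the mirror in use. Only the hash field is compared, so the path recorded in the checksum file does not need to match the cache location.

```shell
# Hypothetical helper: verify_sha512 <file> <checksum_file>
# Returns 0 only if the file's SHA-512 matches the first field of the checksum file.
verify_sha512() {
    local file="$1" sum_file="$2" expected actual
    expected=$(awk '{print $1; exit}' "$sum_file") || return 1
    actual=$(sha512sum "$file" | awk '{print $1}') || return 1
    # An empty expected value (missing or blank checksum file) counts as a failure
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
}
```

A cache lookup could then treat a failed check as a miss, for example `verify_sha512 "$cache_file" "$cache_file.sha512" || rm -f "$cache_file"`, so that a corrupted entry is downloaded again instead of being reused.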
Phase 4: Alternative Approaches
- Evaluate system package manager alternatives
- Consider container-based TeX Live distributions
- Implement hybrid installation strategies
Testing Approach
Network Resilience Testing
- Test installation with simulated network failures
- Verify retry mechanisms work correctly
- Test timeout handling for slow mirrors
Mirror Fallback Testing
- Test with individual mirrors disabled
- Verify automatic failover functionality
- Test with all mirrors temporarily unavailable
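The mirror fallback scenarios above can be exercised without touching the network by putting a stand-in `install-tl` ahead of the real one on `PATH`. This is a sketch under stated assumptions: `make_fake_install_tl` and the `FAIL_MIRRORS` variable are hypothetical, and the stub only understands the `--location` argument used in Option 1.

```shell
# Hypothetical test helper: make_fake_install_tl <dir>
# Writes a stand-in install-tl into <dir>. The stub fails whenever its
# --location argument equals one of the space-separated URLs in $FAIL_MIRRORS.
make_fake_install_tl() {
    local dir="$1"
    mkdir -p "$dir"
    cat > "$dir/install-tl" <<'EOF'
#!/bin/sh
mirror=""
while [ $# -gt 0 ]; do
    if [ "$1" = "--location" ] && [ $# -ge 2 ]; then
        mirror="$2"
    fi
    shift
done
for bad in $FAIL_MIRRORS; do
    if [ "$mirror" = "$bad" ]; then
        echo "simulated download failure for $mirror" >&2
        exit 1
    fi
done
echo "simulated install from $mirror"
EOF
    chmod +x "$dir/install-tl"
}
```

With the stub directory first on `PATH`, exporting `FAIL_MIRRORS` with one URL simulates a single disabled mirror, and listing every configured mirror simulates a full outage, in which case the retry function is expected to return a non-zero status.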
Performance Testing
- Measure installation time improvements
- Test caching effectiveness
- Verify resource usage optimization
Related Issues
- Previous install_pdf_deps.sh improvement: #1291, "Investigate cleaner alternatives to directory move operation in install_pdf_deps.sh"
- Build robustness patterns: similar to #1337, "🐛 Docker build robustness: rm glob patterns fail on empty matches in Jupyter images"
Context
- PR: #1357, "RHOAIENG-28774: update Python 3.12 codeserver image Dockerfile for using TARGETARCH"
- Reporter: @jiridanek
- CI Failure: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/opendatahub-io_notebooks/1357/pull-ci-opendatahub-io-notebooks-main-images/1942865679711473664
- Affected Script: jupyter/utils/install_pdf_deps.sh
Impact
Network-related TeX Live installation failures cause:
- CI pipeline interruptions and delays
- Inconsistent Docker build success rates
- Developer productivity loss due to flaky builds
- Potential production deployment risks for PDF functionality