eminaruk
Video output saving on ultralytics_file_example.py & draw window / zone -> JSON saving in draw_zones.py

Description

Summary / change:
This change contains two related enhancements:

  1. draw_zones.py: The drawing UI/window now adapts to the input video resolution so the drawing canvas never overflows the physical screen. The code computes a scale factor from the video resolution and the detected screen area, and displays a scaled canvas on which the user draws zones (polygons). After drawing, it rescales the drawn coordinates back to the original video resolution and writes them to a JSON file. This keeps the saved zone coordinates accurate for the original video size while ensuring the interactive window is always fully visible on-screen.

  2. ultralytics_file_example.py: Added video output saving capability. The detection/rendering loop now initializes an OpenCV VideoWriter (configurable codec, fps and output path), writes processed frames to a file, and cleanly finalizes the writer on exit. This allows generated/annotated video output to be persisted as a standard video file (MP4) for review or downstream pipelines.

Motivation & context:

  • Prevent user frustration where draw UI extends off-screen for high-resolution videos (UX improvement).
  • Persist annotated output so users can review detection overlays and created zones offline and share results.
  • Useful when integrating with pipelines that expect zone coordinates in video-native coordinates (not canvas-scaled).
  • Implementation follows OpenCV common patterns for namedWindow + resizeWindow and VideoWriter usage.

(References: OpenCV resizeWindow/namedWindow and VideoWriter usage patterns; ultralytics inference loop that yields frames to be rendered and saved.)

What is saved in the JSON:
Each zone is stored as a polygon array in video native coordinates (pixels) so the JSON can be immediately used for runtime logic (counting, alerts, cropping). Example schema below.

List any dependencies that are required for this change

  • opencv-python (cv2) — for windowing, drawing and VideoWriter.
  • numpy — coordinate arrays / transforms.
  • pathlib / json (stdlib) — path handling and JSON writing.
  • ultralytics (or whichever detection package was already in use) — unchanged; only integrated with the writer.

Minimum recommendations:

  • opencv-python >= 4.5
  • numpy >= 1.19

Type of change


  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

How has this change been tested? Please provide a testcase or example of how you tested the change.

Manual test (draw_zones):

  1. Run with a sample video:

```shell
python scripts/draw_zones.py --source_path tests/assets/sample_1280x720.mp4 --zone_configuration_path zones_1280x720.json
```

  2. The script should:

    • Detect the input video resolution (e.g. 1280×720).
    • Compute a scaling factor so the displayed drawing window fits the screen (e.g. scale down to 960×540 if needed).
    • Open the interactive drawing window where you can draw polygons.
    • After you finish and press the configured save key (s), rescale the drawn coordinates back to 1280×720 and write them to zones_1280x720.json.
  3. Verify the JSON content (example below) and validate visually by running the small overlay script below, which loads the JSON and draws the zones on the original video frames.

Manual test (ultralytics_file_example):

  1. Run detection + save:

```shell
python ultralytics_file_example.py --source_video_path tests/assets/sample_1280x720.mp4 --zone_configuration_path zones_1280x720.json --output_video_path output/annotated_1280x720.mp4
```

  2. The script should:

    • Initialize a cv2.VideoWriter with matching size (1280×720), fps inferred from the source, and codec (H264 with XVID fallback).
    • Write each annotated frame to the writer.
    • Release the writer gracefully on completion or interruption.
  3. Play output/annotated_1280x720.mp4 and check that annotations and zones align with the original video.

Quick validation overlay script (example):

```python
# validate_overlay.py (run after draw_zones has saved the zones JSON)
import json
from pathlib import Path

import cv2
import numpy as np

video_path = "tests/assets/sample_1280x720.mp4"
zones_json = "zones_1280x720.json"

with open(zones_json, "r") as f:
    zones = json.load(f)  # expect a list of polygon arrays

cap = cv2.VideoCapture(video_path)
fourcc = cv2.VideoWriter_fourcc(*"H264")
fps = cap.get(cv2.CAP_PROP_FPS) or 25
w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
Path("output").mkdir(exist_ok=True)
out = cv2.VideoWriter("output/overlay_check.mp4", fourcc, fps, (w, h))

while True:
    ret, frame = cap.read()
    if not ret:
        break
    for polygon in zones:
        # each polygon is an array of [x, y] coordinate pairs
        if len(polygon) > 2:
            pts = np.array(polygon, np.int32)
            cv2.polylines(frame, [pts], True, (0, 255, 0), 2)
    out.write(frame)

cap.release()
out.release()
```

Automated / unit test ideas:

  • Create a synthetic frame (e.g., 320×240) and a mock draw input (predefined scaled coordinates). Run the rescaling function and assert that computed native coordinates equal expected values.
  • Validate that writing frames using VideoWriter results in a readable mp4 file (use ffprobe or cv2 to open it in a test).
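The first test idea above could look roughly like this, assuming the rescaling step is factored into its own function; `rescale_polygon` is an illustrative name, not necessarily the one in the PR.

```python
def rescale_polygon(polygon, scale):
    """Map canvas (scaled) coordinates back to native video pixels.

    `scale` is the shrink factor applied for display (e.g. 0.5 for a
    640x480 canvas over a 1280x960 video), so dividing recovers the
    native coordinates. Illustrative stand-in for the real function.
    """
    return [[round(x / scale), round(y / scale)] for x, y in polygon]


def test_rescale_round_trip():
    # Simulate drawing on a half-size canvas and assert we recover
    # the expected native-resolution coordinates exactly.
    native = [[100, 200], [400, 200], [400, 350]]
    scale = 0.5
    canvas = [[x * scale, y * scale] for x, y in native]
    assert rescale_polygon(canvas, scale) == native
```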

Any specific deployment considerations

  • Screen / DPI / multi-monitor: Determining the "maximum allowed window size" must handle OS DPI scaling and multi-monitor setups. On some high-DPI systems, logical pixels vs physical pixels cause differences — consider using a platform DPI-aware API or a conservative scale factor.
  • Video codecs: cv2.VideoWriter requires available codecs on the host. H264 is commonly available, but some environments require ffmpeg or platform-specific codecs. The implementation includes XVID fallback for better compatibility.
  • Large video resolutions: For very high resolutions (4K+), the drawing window is scaled down substantially. Coordinates are still saved in native resolution, but warn users that drawing on a heavily scaled canvas reduces precision.
  • Permissions & disk space: Writing the output video and JSON requires write access, and large video files consume disk space; document expected sizes and disk requirements.
  • Threading / performance: If detection and writing are done in the same thread, writing may add latency. For heavy loads consider an async writer queue or background thread.
  • Cross-platform behavior: Window behavior (resizing, always-on-top, key events) slightly differs across Windows/Linux/macOS — test on target platforms.
  • Security / secrets: No new secrets introduced. Output paths should be sanitized if provided by user input.
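The background-thread writer suggested above could be sketched like this. It is illustrative, not part of the PR; the wrapped writer only needs write() and release() methods, which matches cv2.VideoWriter.

```python
import queue
import threading


class AsyncWriter:
    """Write frames on a background thread so detection is not blocked."""

    def __init__(self, writer, maxsize=64):
        self._writer = writer
        self._queue = queue.Queue(maxsize=maxsize)
        self._thread = threading.Thread(target=self._drain, daemon=True)
        self._thread.start()

    def _drain(self):
        # Consume frames until the None sentinel arrives.
        while True:
            frame = self._queue.get()
            if frame is None:
                break
            self._writer.write(frame)

    def write(self, frame):
        # Blocks when the queue is full, applying back-pressure instead
        # of growing memory without limit.
        self._queue.put(frame)

    def close(self):
        self._queue.put(None)   # sentinel: stop the drain loop
        self._thread.join()
        self._writer.release()
```

The detection loop then calls the async wrapper's write() instead of the raw writer's, and close() on exit guarantees all queued frames are flushed before the file is finalized.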

Docs

  • Docs updated? What were the changes:

    • Added a short section in docs/ (or README) describing:

      • how the draw window scales to screen size, how to draw & save;
      • saved JSON schema and example;
      • how to enable/disable video saving in ultralytics_file_example.py and codec/fps options;
      • platform notes about codecs and DPI.

Example JSON schema (actual output format)

```json
[
  [[100, 200], [400, 200], [400, 350], [100, 350]],
  [[500, 150], [600, 150], [600, 250], [500, 250]]
]
```

Coordinates above are in video native pixels (width × height of the source video). Each polygon is an array of [x,y] coordinate pairs.


Usage Examples

Drawing Zones

```shell
# Draw zones on a video
python scripts/draw_zones.py --source_path video.mp4 --zone_configuration_path zones.json

# Draw zones on an image
python scripts/draw_zones.py --source_path image.jpg --zone_configuration_path zones.json
```

Running Detection with Video Output

```shell
# Basic detection with video output
python ultralytics_file_example.py \
    --source_video_path input.mp4 \
    --zone_configuration_path zones.json \
    --output_video_path output.mp4 \
    --weights yolov8s.pt \
    --confidence_threshold 0.3

# With custom classes (person detection only)
python ultralytics_file_example.py \
    --source_video_path input.mp4 \
    --zone_configuration_path zones.json \
    --output_video_path output.mp4 \
    --classes 0 \
    --confidence_threshold 0.5
```

Key Controls

Drawing Interface:

  • Left click: Add point to current polygon
  • Enter: Complete current polygon
  • Escape: Cancel current polygon
  • 's': Save all polygons and exit
  • 'q': Quit without saving

Video Processing:

  • 'q': Quit video processing
  • Video automatically saves if --output_video_path is provided

Technical Details

Coordinate Scaling

The drawing interface automatically scales the video to fit your screen while maintaining aspect ratio. The scaling logic:

  1. Calculates scale factors for both width and height
  2. Uses the smaller scale factor to ensure the entire video fits
  3. Never upscales (scale factor ≤ 1.0)
  4. Converts drawn coordinates back to original video resolution when saving
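The four steps above amount to only a few lines; a minimal sketch with illustrative function names:

```python
def compute_scale(video_w, video_h, max_w, max_h):
    """Steps 1-3: per-axis scale factors, keep the smaller, never upscale."""
    return min(max_w / video_w, max_h / video_h, 1.0)


def to_native(x, y, scale):
    """Step 4: map a point drawn on the scaled canvas back to video pixels."""
    return (round(x / scale), round(y / scale))
```

For a 1280×720 video and a 960×1080 allowed window area, compute_scale returns 0.75, giving a 960×540 canvas; a point drawn at (720, 405) on that canvas maps back to (960, 540) in native pixels.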

Video Codec Support

The video writer tries multiple codecs in order of preference:

  1. H264 - Best compatibility with modern players and social media
  2. XVID - Fallback for systems without H264 support

JSON Format

The saved JSON contains an array of polygons, where each polygon is an array of [x, y] coordinate pairs representing the vertices of the zone in the original video resolution.

```
[
  [[x1, y1], [x2, y2], [x3, y3], [x4, y4]],  // first polygon
  [[x5, y5], [x6, y6], [x7, y7]]             // second polygon
]
```

This format is directly compatible with the load_zones_config() function used by the detection pipeline.

@eminaruk eminaruk requested a review from SkalskiP as a code owner September 19, 2025 20:35
CLAassistant commented Sep 19, 2025

CLA assistant check
All committers have signed the CLA.
