Perception Pipeline

Ball Toss Analysis

Motion detection results from a 2-minute D455 camera recording. Frame differencing reveals 8 distinct ball toss events in extremely low-light lab footage.

6,457
Frames Scanned
8
Events Detected
27,902
Peak Motion (px)
4,400
Baseline Noise
107 Seconds of Frame Differencing
Each bar represents 3 consecutive frames. Height = number of pixels that changed by more than 15 intensity levels. Orange bars exceed the detection threshold.

Pixel motion intensity over time

Legend: below threshold vs. ball toss detected (orange). Timeline: 0:00–1:47.
8 Ball Toss Events
Each event is a cluster of consecutive high-motion frames — corresponding to a ball being thrown across the table.
Captured at the Moment of Impact
Frames extracted at peak motion intensity. The lab was nearly dark — hover to see the brightness-enhanced version.
Where the Motion Happened
Frame-to-frame pixel difference, amplified 8× and color-mapped. Bright regions show where the ball (and thrower) moved between consecutive frames.
15 Seconds Around the Action
Extracted from 42–57s in the recovered recording. Raw on the left, brightness-enhanced on the right.
Raw D455 Footage 848×480 · 60fps
Enhanced (γ=3.0) brightness +30%
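The enhancement labeled above (γ=3.0, brightness +30%) can be sketched in NumPy. The function name is illustrative, and the order of operations (gamma curve first, then linear gain) is an assumption, not a detail confirmed by the pipeline:

```python
import numpy as np

def enhance(frame, gamma=3.0, brightness=1.3):
    """Brighten a dark 8-bit frame: gamma curve, then a +30% linear gain.
    gamma=3.0 and brightness=1.3 match the labels on the enhanced panel;
    the order of the two steps is an assumption."""
    x = frame.astype(np.float32) / 255.0
    x = np.power(x, 1.0 / gamma)   # gamma > 1 lifts shadows
    x = x * brightness             # +30% linear brightness
    return np.clip(x * 255.0, 0.0, 255.0).astype(np.uint8)

# A dim pixel (64) is lifted far above its raw value; black stays black.
dark = np.full((4, 4), 64, dtype=np.uint8)
print(int(enhance(dark)[0, 0]))
```

Gamma lifting before the gain keeps the near-black lab footage from clipping: the curve compresses highlights while expanding shadows, so the +30% gain mostly affects mid-tones.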
We Were Detecting the Wrong End
Peak motion = ball closest to camera = too late. The robot needs to detect the ball at release — 1 second earlier, when it's small and far. These early-flight frames are the real training targets.
🏓 Ball Released: far from camera · small · ~9K px motion · ✅ DETECT HERE
→ → →
📸 Ball Arrives: close to camera · big · ~28K px motion · ❌ TOO LATE
How Motion Grows Along the Trajectory
Each bar shows motion (pixels changed) at a specific offset from the peak, measured in frames; negative = before impact, and 60 frames ≈ 1 second at 60 fps. The ball is detectable a full second before it arrives.
Event 2 (peak @ 47.6s): bars at −60, −50, −30, −20, −5, and 0 frames from the peak (−60 ≈ 1 s before the peak).
Event 8 (peak @ 107.5s): same offsets; again sampled out to 1 s before the peak.
The Ball at Release — What the Robot Needs to See
CLAHE (Adaptive Histogram Equalization) brings out local contrast without blowing highlights. Scroll through each event's trajectory — left = ball far (early), right = ball close (peak).
How It Works
📹

Recording

Intel RealSense D455 at 848×480, 60 fps. MJPEG → MP4 via OpenCV. The recording was corrupted and was recovered with untrunc.

🔍

Frame Differencing

Every consecutive pair compared. Pixels with Δ > 15 counted as motion. Threshold: baseline + 3σ.
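The differencing step above can be sketched in a few lines of NumPy (function names are illustrative, not from the pipeline):

```python
import numpy as np

def motion_pixels(prev, curr, delta=15):
    """Count pixels whose intensity changed by more than `delta`
    between consecutive grayscale frames -- the pipeline's motion metric."""
    diff = np.abs(prev.astype(np.int16) - curr.astype(np.int16))
    return int(np.count_nonzero(diff > delta))

def detection_threshold(baseline_counts):
    """Spike threshold: baseline mean + 3 standard deviations."""
    b = np.asarray(baseline_counts, dtype=np.float64)
    return float(b.mean() + 3.0 * b.std())

# Synthetic 848x480 frames: a 100x100 patch jumps by 50 intensity levels.
prev = np.zeros((480, 848), dtype=np.uint8)
curr = prev.copy()
curr[200:300, 300:400] = 50
print(motion_pixels(prev, curr))  # → 10000
```

Casting to int16 before subtracting avoids uint8 wraparound, which would otherwise turn a small decrease in brightness into a huge positive difference.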

📊

Event Clustering

Consecutive spike frames within 15 frames (250ms) grouped into single events. 24 spike frames → 8 events.
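A minimal sketch of the clustering rule, taking spike frame indices as input (the function name is illustrative):

```python
def cluster_spikes(spike_frames, gap=15):
    """Group spike frame indices into events: spikes within `gap` frames
    (250 ms at 60 fps) of the previous spike join the same event."""
    events = []
    for f in sorted(spike_frames):
        if events and f - events[-1][-1] <= gap:
            events[-1].append(f)  # close enough: extend current event
        else:
            events.append([f])    # too far apart: start a new event
    return events

# Three nearby spikes and two much later ones → two events.
print(cluster_spikes([100, 105, 112, 900, 910]))
# → [[100, 105, 112], [900, 910]]
```

This is how 24 spike frames collapse into 8 events: each toss produces a short burst of consecutive high-motion frames, and the 15-frame gap merges the burst without bridging separate tosses.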

🌡️

Heatmaps

Absolute frame diff, amplified 8× and mapped through COLORMAP_HOT. Shows the spatial distribution of motion.

🤖

YOLO v6 Result

Synthetic-trained model found 0 balls — only a fixed false positive at (627, 245). Domain gap too large.

💡

Next Steps

Label real frames from this video, fine-tune YOLO on real data, or use classical frame differencing for ball tracking.