Video compression standards rely heavily on eliminating spatial and temporal redundancy within and across video frames. Intra-frame coding targets redundancy within blocks of a single frame, whereas inter-frame coding removes redundancy between the current frame and its reference frames. The degree of spatial and temporal redundancy, or content complexity, is a crucial factor in video compression: in general, videos with higher complexity require a greater bitrate to maintain a given quality level. Knowing a video's complexity in advance can therefore significantly improve the optimization of video coding and streaming workflows. While Spatial Information (SI) and Temporal Information (TI), as defined in ITU-T Rec. P.910, are traditionally used to represent video complexity, they often exhibit low correlation with actual video coding performance. The goal of this challenge is to find innovative methods that quickly and accurately predict the spatial and temporal complexity of a video, with a high correlation to actual coding performance. These methods should be efficient enough for live video streaming scenarios, ensuring real-time adaptability and optimization.
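For reference, P.910 defines SI and TI as per-frame statistics maximized over time: SI is the spatial standard deviation of the Sobel-filtered luminance plane, and TI is the standard deviation of the pixel-wise difference between consecutive luminance frames. The sketch below implements these definitions with OpenCV and NumPy; the function name `compute_si_ti` is our own, and reading frames sequentially is the simplest (not the fastest) approach.

```python
import cv2
import numpy as np

def compute_si_ti(video_path: str) -> tuple[float, float]:
    """Compute ITU-T P.910 Spatial Information (SI) and Temporal
    Information (TI) for a video file.

    SI = max over frames of std(Sobel(luma));
    TI = max over frames of std(luma_n - luma_{n-1}).
    """
    cap = cv2.VideoCapture(video_path)
    si_values, ti_values = [], []
    prev_luma = None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # P.910 operates on the luminance plane only.
        luma = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float64)
        # SI: standard deviation of the Sobel gradient magnitude.
        gx = cv2.Sobel(luma, cv2.CV_64F, 1, 0, ksize=3)
        gy = cv2.Sobel(luma, cv2.CV_64F, 0, 1, ksize=3)
        si_values.append(float(np.hypot(gx, gy).std()))
        # TI: standard deviation of the frame-to-frame difference.
        if prev_luma is not None:
            ti_values.append(float((luma - prev_luma).std()))
        prev_luma = luma
    cap.release()
    si = max(si_values) if si_values else 0.0
    ti = max(ti_values) if ti_values else 0.0
    return si, ti
```

Even this straightforward computation requires decoding and filtering every frame, which hints at why faster and more predictive complexity features are the subject of the challenge.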
The efficiency of video streaming workflows is closely tied to the content being streamed. Understanding a video's encoding complexity is crucial for choosing effective encoding parameters: achieving a given quality level requires different bitrates depending on the content, and a low-complexity video needs significantly less bitrate than a high-complexity one. Once the complexity is known, other encoding parameters can be optimized as well, such as the most suitable resolution and frame rate for a given bitrate, as the sketch below illustrates. This knowledge of video complexity enables more precise and efficient streaming solutions.
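To make the bitrate-complexity relationship concrete, here is a minimal, purely illustrative sketch of how a precomputed complexity score might drive bitrate selection for a fixed quality target. The helper name `target_bitrate_kbps`, the normalized score in [0, 1], and all bitrate values are hypothetical assumptions, not values from the challenge.

```python
def target_bitrate_kbps(complexity: float, resolution: str = "1080p") -> int:
    """Interpolate a target bitrate from a normalized complexity score.

    The score, the per-resolution ranges, and this function itself are
    hypothetical placeholders used only to illustrate the idea.
    """
    # (low-complexity bitrate, high-complexity bitrate) per resolution.
    ladder = {
        "1080p": (2000, 8000),
        "720p": (1200, 4500),
    }
    low, high = ladder[resolution]
    score = min(max(complexity, 0.0), 1.0)  # clamp to [0, 1]
    # Simple content lands near the low end; complex content near the high end.
    return int(low + (high - low) * score)
```

The same idea extends beyond bitrate: a complexity estimate could equally select a rung on a resolution or frame-rate ladder, which is exactly where fast, accurate complexity prediction pays off.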