A Better 360° View
Entertainment and news operations increasingly offer 360° videos as a way to let viewers navigate through a space virtually. Whether viewed with a virtual reality headset or by moving controls on a flat screen, such as a computer or smartphone, the experience puts content in every direction. That can make it easy for a viewer to miss what’s most important, particularly if the action occurs outside the current field of view.
Kristen Grauman, who holds a professorship in computer sciences, has helped solve this problem by training a system to steer a viewing angle automatically through a 360° environment to capture the most important information, mimicking how human eyes seek out and home in on key visual content.
To train the system, Grauman and her team had it process a large collection of unlabeled YouTube videos shot with traditional field-of-view cameras. The system learned what people tend to focus on when they film videos and how shots are composed. Applying these lessons to a 360° video, the program divides the footage into all possible viewing angles and zoom levels for each chunk of time and picks the glimpses that best match the properties human video-makers prefer.
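In rough terms, that step can be pictured as scoring a grid of candidate glimpses. The short Python sketch below is purely illustrative: the pan, tilt, and zoom grid and the capture_worthiness stand-in are assumptions made for the example, not Grauman’s actual learned model, which is trained on the YouTube footage described above.

```python
import itertools
import random

# Illustrative discretization of candidate glimpses (not the real system's grid).
PAN_ANGLES = range(0, 360, 30)   # 12 horizontal viewing directions, in degrees
TILT_ANGLES = (-30, 0, 30)       # look down, straight ahead, or up
ZOOM_LEVELS = (1.0, 1.5, 2.0)

def capture_worthiness(chunk_index, pan, tilt, zoom):
    """Stand-in for a learned scorer: a real model would rate how closely a
    glimpse resembles the shots human videographers choose to record."""
    rng = random.Random(hash((chunk_index, pan, tilt, zoom)))
    return rng.random()

def score_glimpses(num_chunks):
    """Score every candidate viewing angle and zoom level for each time chunk."""
    candidates = list(itertools.product(PAN_ANGLES, TILT_ANGLES, ZOOM_LEVELS))
    return candidates, [
        {c: capture_worthiness(t, *c) for c in candidates}
        for t in range(num_chunks)
    ]
```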
“Now, we optimize for a path through those glimpses that tries to keep the most capture-worthy parts, while also preserving a smooth camera motion. The system learns to frame the shot the way a human photographer would,” Grauman said. “It’s essentially learning to record its own video by observing how people capture videos.”
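That optimization can be pictured as a dynamic-programming search over the scored glimpses: choose one glimpse per time chunk so that the total capture-worthiness stays high while consecutive choices do not jump around. Continuing the illustrative sketch above, with a made-up smoothness penalty standing in for the real system’s camera-motion term:

```python
def smoothness_penalty(a, b, weight=0.01):
    """Illustrative cost of jumping between glimpses a and b = (pan, tilt, zoom)."""
    pan_jump = min(abs(a[0] - b[0]), 360 - abs(a[0] - b[0]))  # pan wraps around
    return weight * (pan_jump + abs(a[1] - b[1]) + 10 * abs(a[2] - b[2]))

def best_camera_path(candidates, scores):
    """Viterbi-style search: maximize summed glimpse scores minus motion cost."""
    best = {c: (scores[0][c], [c]) for c in candidates}
    for chunk_scores in scores[1:]:
        new_best = {}
        for c in candidates:
            # Best predecessor for glimpse c, accounting for the motion penalty.
            prev_val, prev_path = max(
                ((best[p][0] - smoothness_penalty(p, c), best[p][1])
                 for p in candidates),
                key=lambda v: v[0],
            )
            new_best[c] = (prev_val + chunk_scores[c], prev_path + [c])
        best = new_best
    return max(best.values(), key=lambda v: v[0])[1]

# Example: pick a smooth, capture-worthy path over 20 time chunks.
candidates, scores = score_glimpses(20)
print(best_camera_path(candidates, scores)[:5])
```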
More generally, Grauman’s research on training systems to “look around” has implications not just for human video consumption but also for robotics. Mobile robots need to be able to enter a new environment and quickly gain an understanding of it without a lot of superfluous motion. For example, autonomous search-and-rescue robots need to intelligently direct their cameras in the rubble following a disaster.