Press X To Jump: It's Time For Us To Remove The Scene From The Video Game.

At the end of Metal Gear Solid 4, right after Snake pulverizes Liquid Ocelot, there is a series of cutscenes that never ends. Well, that’s not strictly true. It does end, after 71 minutes. I’ve never seen that game. I understand that the game’s director, Hideo Kojima, is a committed film buff who has been greatly inspired by cinema, but I don’t care. These are minutes of my life that I will never get back.

I also don’t like the 20-minute cinematics that pepper Xenoblade Chronicles or Final Fantasy, or the hundreds of non-interactive cutscenes that detail every single plot point in Assassin’s Creed adventures. It’s unnecessarily aggressive to deprive the player of their agency and then bully them into paying attention for extended periods. I think it’s time we retired the entire convention.

The origins of video game cinematics are both technical and situational: in the 1990s, games simply couldn’t play out real-time cutscenes, and besides, much of the narrative talent in video games came from cinema, using the tools they knew. This interestingly mirrors the evolution of cinema: from the 1920s to the early 1930s, narrative cinema was largely inspired by the theatre. This made sense because the early film industry drew most of its talent (actors, directors, screenwriters, crew) from the theatre, and these people brought their techniques with them.

From stage to screen… Greta Garbo in the 1930 adaptation of Eugene O’Neill’s Anna Christie. Photograph: MGM/Sportsphoto/Allstar

Cameras tended to be static, with long takes between cuts, so that you could watch the action as an audience member; filming was done on specially constructed sets rather than outdoors; acting was somewhat mannered and histrionic, because performers were accustomed to exaggerating their movements and emotions so that people 18 rows back could see them. Early movie viewers were also familiar with stage conventions, so their use helped them adjust to the cinematic experience.

But as film developed as a medium of its own, new and intimate methods of storytelling emerged. Thanks in part to the invention of the dolly and the crane, the camera was transformed from a member of the audience into an observer moving within the world. Actors found they could communicate with small gestures and facial expressions. From German Expressionism to the French New Wave to the American arthouse cinema of the 1970s, new storytelling techniques emerged, and at the same time a wealth of film-specific conventions developed in lighting, direction, design and special effects. The medium came into its own.

This process is happening in games, too: we see it in the increasingly sophisticated disciplines of environmental storytelling, user experience and user interface, and narrative design. Yet in a medium that relies on interactivity and immersion, we’re still wedded to cinematics. If you look at some of the biggest, most moving narrative games of the past five years (The Last of Us, God of War, Marvel’s Spider-Man), most of the emotional moments happen in non-interactive cinematic sequences, where the controls are taken away from us. Like children, we can’t be trusted to participate. We’re required to just sit back and watch the spectacle.

No time to talk… Half-Life. Photography: Valve

The argument is that sometimes the emotional arc of a scene needs to be precisely timed and crafted in order to convey its emotional charge. In that case, we’re making the wrong kind of scenes. If a mature interactive medium can only tell emotional stories through non-interactive sequences, something’s wrong. It’s frustrating because Valve made great strides in this area 25 years ago: the narrative-driven sci-fi shooter Half-Life contained no cutscenes or cinematic sequences at all. Characters (the scientists and security guards at the Black Mesa facility) offered in-game exposition as you explored, and at the same time the increasingly unstable environment told its own story of destruction and suspense. Valve did it again a decade later with the Portal games, combining an entertaining, chatty robot antagonist with a world in which signs, symbols, and audio announcements communicated all the rules and background details the player needed to know to become intellectually and emotionally engaged.

Game designer Fumito Ueda made very sparing use of cinematics in his classic adventures Ico and Shadow of the Colossus, instead taking us into mysterious, oblique worlds where a lack of information inspired players to create their own mythologies. Indie studio thatgamecompany’s 2012 masterpiece Journey showed us mute characters in a desert, but still moved thousands of players to tears. Campo Santo’s Firewatch created rich mystery out of the Wyoming wilderness and a disembodied voice on a walkie-talkie.

In our era of near-photorealism in gaming, the reliance on cinematics for dramatic, cathartic effect feels even more jarring and alienating. We can explore and exist in worlds of great clarity, surrounded by characters capable of communicating a range of emotions through a combination of motion capture, cutting-edge AI, and physics; that’s more than enough. These are dynamic, immersive worlds: if we, as players, can control highly sophisticated weapons, vehicles, and progression systems, we can participate in the stories.

A voice in the wilderness… Firewatch. Photograph: Campo Santo

Or we can simply allow the narrative to exist in the background as something we live or experience vicariously: the interactive version of direct cinema. FromSoftware’s works are a prime example of this. There are cutscenes, but they are brief, usually serving to introduce a new enemy or show the player a moment when the world has reacted to them. Otherwise, the narrative is conjured simply by moving through these wild gothic landscapes. As writer and historian Holly Nielsen put it on X recently: “I put about 300 hours into Elden Ring. I couldn’t tell you anything about the world, the characters, or the story beyond some vague feelings.”

A few years ago I interviewed Todd Howard, the head of Bethesda Game Studios, and asked him what he thought was most important when telling a story in a video game. “You have to find the tone,” he said after a long pause. “We looked at a lot of old John Ford films, studying how to capture a space. Ford’s shots put you in a certain mood. There’s a tone. As a designer, you have to know how you want the player to feel. Find things outside of games that have that tone and just look at them.” Yes, this is an example taken from a film once again, but Howard isn’t talking about the story of The Searchers or Rio Grande, he’s talking about the feel of the spaces Ford created.

Tone. Vibrations. Sensation. These are different words for the same concept, and perhaps they are the foundations of a post-cinematic theory of mainstream video game narrative. In an immersive environment, story is something the player enters rather than looks into, a space of discovery rather than performance, a playground rather than a theater. It should be widely (and wildly) interpretive, and perhaps even entirely optional or subliminal. If control is taken away from the player, it should be a radical moment employed sparingly, such as turning the camera or plunging the setting into darkness.

The cinematic is a tyrannical impostor. It’s time to kick it out.

