How Your Eyes Actually Move During Vertical Video Scrolling
By Viral Roast Research Team — Content Intelligence

The neural circuits governing saccades, smooth pursuit, and predictive tracking evolved for horizontal landscapes — not portrait-mode feeds. Understanding this mismatch unlocks a neuroscience-grounded approach to content design that works with the oculomotor system rather than against it.
Neural Systems Governing Eye Movements During Vertical Scrolling
Every scroll through a vertical video feed activates a cascade of neural computations spanning the brainstem, midbrain, cerebellum, and cortex. The superior colliculus (SC), a layered midbrain structure, serves as the primary integration hub for saccadic target selection during scrolling. Its intermediate layers contain a topographic motor map that translates visual salience signals — received from both the retina directly and cortical areas like V1 and the frontal eye fields — into the precise burst signals that drive saccadic eye movements. When a viewer scrolls past content, the SC must rapidly compute new saccadic vectors to orient the fovea toward emerging areas of interest. Simultaneously, the lateral intraparietal area (LIP) in the posterior parietal cortex maintains a dynamic priority map that ranks competing visual targets based on a combination of bottom-up salience (contrast, motion onset, color pop-out) and top-down relevance (task goals, learned reward associations). In the context of scrolling, LIP activity reflects the competitive interaction between the content currently being viewed and the new content entering the visual field from below. This competition determines saccadic latency — the 150 to 250 millisecond delay before the eyes jump to a new target — and directly influences whether a viewer's gaze lingers on current content or is captured by incoming stimuli. The middle temporal area (MT/V5), specialized for motion processing, handles the smooth pursuit component: tracking the apparent motion of content as it slides through the viewport during an active scroll gesture.
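The priority-map competition described above can be caricatured as a weighted sum of bottom-up salience and top-down relevance, with the next saccade going to the highest-priority target. This is a toy sketch, not a neural model: the weighting, the 0-to-1 scores, and the target names are all illustrative assumptions.

```python
# Toy sketch of LIP-style priority-map competition between currently viewed
# content and new content entering from below. All weights and scores are
# illustrative assumptions, not measured parameters.

def priority(bottom_up: float, top_down: float, w_salience: float = 0.6) -> float:
    """Combine bottom-up salience and top-down relevance (both scored 0..1)."""
    return w_salience * bottom_up + (1.0 - w_salience) * top_down

def next_saccade_target(targets: dict) -> str:
    """Return the target whose (bottom_up, top_down) pair wins the competition."""
    return max(targets, key=lambda name: priority(*targets[name]))

targets = {
    "current_content": (0.3, 0.7),   # familiar, still task-relevant
    "incoming_content": (0.9, 0.2),  # strong motion-onset salience, low relevance
}
winner = next_saccade_target(targets)
```

With these illustrative scores, the motion-onset salience of incoming content outweighs the task relevance of the current content, which is exactly the gaze-capture scenario the paragraph describes.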
The cerebellum plays a critical and often underappreciated role in scrolling behavior through its contribution to predictive eye movements. The flocculus and paraflocculus of the vestibulocerebellum continuously calibrate the gain and timing of both smooth pursuit and the vestibulo-ocular reflex, enabling the visual system to anticipate where content will appear rather than merely reacting to it. Experienced scrollers develop cerebellar-mediated internal models of feed behavior — predicting content block heights, anticipating transition boundaries, and pre-programming saccadic amplitudes based on prior scrolling sessions. This predictive capacity explains why habitual users of platforms like TikTok, Instagram Reels, and YouTube Shorts exhibit measurably different scanning patterns than novice users: their oculomotor systems have literally been calibrated by repetitive exposure to consistent content formats. The smooth pursuit system, driven primarily by MT/V5 projections through the dorsolateral pontine nuclei to the cerebellum and then to ocular motor nuclei, must continuously match eye velocity to content velocity during active scrolling. When the scroll velocity exceeds the smooth pursuit system's maximum tracking speed (approximately 30 to 40 degrees per second for most individuals), the system switches to a catch-up saccade strategy, creating brief periods of retinal blur that degrade content perception. This velocity threshold has direct implications for feed scroll speed defaults and for how quickly content must establish its visual hook.
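The 30-to-40-degrees-per-second pursuit ceiling can be checked against a feed's scroll speed if pixel velocity is converted to degrees of visual angle per second. The sketch below makes that conversion for an assumed phone pixel density and viewing distance; the 460 ppi and 30 cm defaults are illustrative assumptions, not platform values.

```python
import math

PURSUIT_LIMIT_DEG_S = 35.0  # midpoint of the ~30-40 deg/s range cited above

def scroll_velocity_deg_per_s(px_per_s: float, ppi: float, view_cm: float) -> float:
    """Convert a scroll speed in pixels/second to degrees of visual angle/second."""
    px_cm = 2.54 / ppi  # physical size of one pixel in centimeters
    deg_per_px = math.degrees(2 * math.atan(px_cm / (2 * view_cm)))
    return px_per_s * deg_per_px

def exceeds_pursuit_limit(px_per_s: float, ppi: float = 460.0,
                          view_cm: float = 30.0) -> bool:
    """Would this scroll speed force catch-up saccades and retinal blur?"""
    return scroll_velocity_deg_per_s(px_per_s, ppi, view_cm) > PURSUIT_LIMIT_DEG_S
```

At the assumed density and distance, a fast flick of several thousand pixels per second crosses the pursuit ceiling, while a slow browse stays comfortably under it.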
The fundamental challenge of vertical scrolling lies in an evolutionary mismatch between the human visual system's design parameters and the demands of portrait-mode content consumption. Human oculomotor control evolved primarily for horizontal environmental scanning — our eyes sit side by side in a horizontally oriented head, the horizontal rectus muscles are stronger and faster than the vertical recti and obliques, and the useful visual field extends roughly 200 degrees horizontally but only about 130 degrees vertically. Saccadic peak velocities are consistently 10 to 15 percent faster along the horizontal meridian compared to equivalent-amplitude vertical saccades, a difference attributable to the biomechanical properties of the extraocular muscles and their innervation patterns. Furthermore, many individuals exhibit significant vertical visual field asymmetries: the lower visual field typically receives preferential processing for near-space and manual tasks (a legacy of our tool-using ancestry), while the upper visual field is biased toward far-space and navigation-relevant processing. Vertical scrolling forces constant large-amplitude saccades along this biomechanically disadvantaged axis, creating higher oculomotor fatigue and cognitive load compared to equivalent horizontal scanning tasks. The inferior oblique and superior rectus muscles, which execute upward saccades, fatigue more rapidly than the lateral rectus driving horizontal movements. This asymmetry means that content positioned in the upper portion of a vertical frame demands more effortful oculomotor targeting, and creators who concentrate key visual information in the lower two-thirds of the frame are unconsciously aligning with the natural bias of the vertical saccadic system.
Content Design Optimization for Natural Oculomotor Control
Optimizing content for vertical scrolling requires understanding the distinction between two fundamentally different oculomotor modes that alternate during feed consumption: the smooth pursuit tracking mode and the saccadic exploration mode. During an active scroll gesture, the smooth pursuit system attempts to stabilize retinal images by matching eye velocity to the upward flow of content — this is the tracking mode, and it places continuous demand on the MT/V5 to cerebellum to ocular motor nuclei pathway. During pauses between scrolls, the saccadic system takes over, executing rapid ballistic eye movements to explore regions of interest within the currently visible content — this is the exploration mode, driven by the SC and frontal eye field circuitry. The critical design insight is that these two modes cannot operate simultaneously with full efficiency. Smooth pursuit suppresses saccadic initiation (and vice versa), meaning content that demands both tracking and exploration simultaneously creates oculomotor conflict. The practical application: use discrete content blocks with clear spatial boundaries rather than continuous flowing motion during transitions. When content scrolls into view, the smooth pursuit system handles the initial tracking, but the content should resolve into a stable configuration quickly — ideally within 300 to 500 milliseconds — to allow the saccadic exploration system to engage. Content that continuously drifts, pans, or auto-scrolls text forces the viewer into sustained pursuit mode, preventing the foveal fixations necessary for detailed feature extraction and text reading. This is why static text overlays with hard cuts outperform scrolling text tickers in retention metrics: they respect the saccadic exploration mode that enables actual information extraction.
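The alternation between tracking and exploration modes can be made concrete with a crude eye-velocity classifier: fixations sit near zero velocity, pursuit occupies the range up to the ~40 deg/s ceiling, and saccades reach hundreds of degrees per second. The thresholds below are illustrative assumptions chosen to match the figures in this section, not calibrated eye-tracking parameters.

```python
def oculomotor_mode(eye_velocity_deg_s: float) -> str:
    """Crudely classify the current oculomotor mode from gaze velocity.

    Illustrative thresholds: below ~5 deg/s treat gaze as a stable fixation,
    up to ~40 deg/s as smooth pursuit, and anything faster as a saccade.
    """
    if eye_velocity_deg_s < 5:
        return "fixation"  # stable gaze: detailed feature extraction possible
    if eye_velocity_deg_s <= 40:
        return "pursuit"   # tracking mode: saccadic exploration suppressed
    return "saccade"       # ballistic jump: vision largely suppressed in flight
```

Only samples classified as fixations contribute to text reading and fine detail extraction, which is the practical reason content should settle into a stable layout within 300 to 500 milliseconds of entering view.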
The concept of foveal capture within a single fixation represents one of the most actionable principles from oculomotor research for 2026 content creators. The fovea — the 1.5-degree central region of highest acuity — must land directly on a visual element for detailed recognition to occur. Each fixation lasts approximately 200 to 300 milliseconds, and the useful information extraction window is even shorter, roughly 100 to 150 milliseconds after the saccade lands and before the next saccade is programmed. An Area of Interest (AOI) that fits within approximately 3 to 4 degrees of visual angle (about the size of a face or text block at typical phone viewing distances of 25 to 35 centimeters) can be fully processed in a single fixation. AOIs larger than this require multiple fixations and inter-saccadic integration, which roughly doubles processing time and introduces the risk of saccadic interruption by competing stimuli. For vertical video specifically, this means key visual elements — faces, text hooks, product shots, reaction triggers — should be sized and positioned to allow single-fixation capture. Clustering multiple small AOIs forces rapid saccadic scanning sequences that increase cognitive load and reduce the probability of complete information extraction. The optimal approach is a clear visual hierarchy with one primary AOI per content beat, positioned within the central 60 percent of the frame width and the lower 65 percent of the frame height, matching both the foveal capture constraint and the vertical visual field bias discussed earlier. Understanding saccadic latency — the 150 to 250 millisecond delay between stimulus onset and the initiation of a targeting saccade — allows creators to predict when viewers will discover key elements and to time visual changes accordingly, avoiding the introduction of new information during the latency window when the previous saccade is still being programmed.
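The single-fixation rule above reduces to simple geometry: an element's visual angle is 2·atan(size / 2·distance), and it fits one fixation if that angle stays within the roughly 3-to-4-degree capture zone. The sketch below applies that formula; the 30 cm default distance and the 4-degree cutoff are assumptions drawn from the ranges in this paragraph.

```python
import math

def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Visual angle subtended by an element of a given physical size."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

def fits_single_fixation(size_cm: float, distance_cm: float = 30.0,
                         foveal_capture_deg: float = 4.0) -> bool:
    """Can the element be processed in one fixation (~3-4 deg capture zone)?"""
    return visual_angle_deg(size_cm, distance_cm) <= foveal_capture_deg
```

At a 30 cm viewing distance, a 2 cm element subtends about 3.8 degrees and can be captured in a single fixation, while a 3 cm element exceeds the zone and will need multiple fixations and inter-saccadic integration.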
The vertical-specific challenges of oculomotor control create a powerful argument for portrait-mode content optimization that transcends simple aspect ratio matching. Most discourse around vertical video focuses on filling the screen and avoiding letterboxing, but the deeper optimization involves matching content pacing and spatial layout to the inherent constraints of the vertical oculomotor system. Because vertical saccades are slower, less accurate, and more fatiguing than horizontal ones, content that requires extensive vertical scanning — tall text blocks, vertically distributed multi-element compositions, top-to-bottom visual narratives — imposes disproportionate oculomotor cost compared to content that concentrates its information in a horizontally compact zone within the vertical frame. The pacing implications are equally important: given that vertical saccadic latencies run 10 to 20 milliseconds longer than horizontal ones on average, and that vertical saccadic accuracy shows greater endpoint scatter (landing position errors), content beats in vertical video should allow slightly longer dwell times than equivalent horizontal content to accommodate the less efficient vertical scanning. A 2-second beat in horizontal video might need 2.2 to 2.4 seconds in vertical format to achieve the same level of information extraction. This is not about slowing content down — it is about matching temporal structure to oculomotor reality. Creators who intuitively pace their content to allow complete oculomotor acquisition of each visual beat see higher completion rates not because their content is necessarily more powerful, but because the viewer's visual system can actually keep up. 
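The pacing adjustment above amounts to multiplying horizontal beat durations by a vertical factor of roughly 1.1 to 1.2. A minimal sketch, using the upper end of that range as an assumed default:

```python
VERTICAL_PACING_FACTOR = 1.2  # upper end of the 1.1-1.2x range discussed above

def vertical_beat_s(horizontal_beat_s: float,
                    factor: float = VERTICAL_PACING_FACTOR) -> float:
    """Stretch a horizontal-format content beat for vertical consumption."""
    return round(horizontal_beat_s * factor, 2)
```

This reproduces the example in the text: a 2-second horizontal beat becomes a 2.4-second vertical beat, giving the slower, less accurate vertical saccade system time to complete acquisition.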
The interaction between oculomotor capability and perceived content quality is bidirectional: content that overtaxes the oculomotor system feels confusing and low-quality even when the underlying production value is high, while content that respects oculomotor constraints feels clear and professional even with minimal production resources.
Saccadic Load Mapping for Vertical Compositions
Analyzing the saccadic demand profile of a vertical video frame involves computing the expected number of fixations required to fully process all salient AOIs, the total saccadic path length across the frame, and the proportion of vertical versus horizontal saccade vectors. Frames with high vertical saccadic load — requiring frequent large-amplitude up-down eye movements to connect spatially distributed elements — generate measurably higher oculomotor fatigue and correlate with earlier scroll-away behavior. Optimal compositions minimize total saccadic path length by clustering related AOIs within a single fixation zone or arranging them along a compact horizontal band, reducing the need for the biomechanically disadvantaged vertical saccade system to carry the perceptual workload.
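Two of the metrics named above, total saccadic path length and the proportion of vertical travel, can be computed directly from AOI centers visited in scan order. This is an independent sketch of those metrics, not the Viral Roast engine; coordinates are assumed to be in degrees of visual angle.

```python
import math

def scanpath_metrics(aoi_centers: list) -> dict:
    """Total saccadic path length across AOI centers (in degrees of visual
    angle) and the fraction of axis-aligned travel carried by vertical motion."""
    total = horizontal = vertical = 0.0
    for (x0, y0), (x1, y1) in zip(aoi_centers, aoi_centers[1:]):
        total += math.hypot(x1 - x0, y1 - y0)   # straight-line saccade amplitude
        horizontal += abs(x1 - x0)
        vertical += abs(y1 - y0)
    axis_sum = horizontal + vertical
    return {
        "path_length_deg": total,
        "vertical_fraction": vertical / axis_sum if axis_sum else 0.0,
    }
```

A composition that scores a high vertical fraction is leaning on the biomechanically disadvantaged up-down saccade system; clustering related AOIs into a horizontal band drives both numbers down.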
Smooth Pursuit Velocity Matching for Scroll-Stop Transitions
The transition from active scrolling to content viewing creates a critical oculomotor handoff between the smooth pursuit and saccadic systems. Content that enters the viewport during a scroll gesture must present an initial visual anchor — typically high-contrast, foveally sized, and positioned near the vertical center of the frame — that enables the smooth pursuit system to decelerate tracking and hand off to a stabilizing fixation within 300 milliseconds. Designs that distribute visual weight uniformly across the frame fail to provide this anchor, resulting in an extended pursuit-to-fixation transition that delays content engagement. Measuring the pursuit deceleration profile and first-fixation latency for content openings reveals whether the visual anchor is functioning effectively or whether the viewer's oculomotor system is searching for a stable target.
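The first-fixation latency mentioned above can be estimated from a gaze-velocity trace recorded after scroll-stop: the handoff is complete once velocity stays below a fixation threshold for a short stable run. The sketch below assumes a hypothetical eye tracker sampling every 4 ms and an illustrative 5 deg/s fixation threshold.

```python
from typing import Optional

def first_fixation_latency_ms(velocities_deg_s: list,
                              sample_ms: float = 4.0,
                              fixation_thresh: float = 5.0,
                              min_stable_samples: int = 5) -> Optional[float]:
    """Milliseconds after scroll-stop until gaze velocity first stays below
    the fixation threshold for a sustained run; None if no fixation occurs."""
    run = 0
    for i, v in enumerate(velocities_deg_s):
        run = run + 1 if v < fixation_thresh else 0
        if run == min_stable_samples:
            # Latency is measured from the start of the stable run.
            return (i - min_stable_samples + 1) * sample_ms
    return None
```

A well-anchored opening should yield a short latency; a trace that never settles below threshold (returning None) suggests the viewer's oculomotor system is still searching for a stable target.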
Oculomotor Load Analysis and Vertical Video Optimization with Viral Roast
Viral Roast's analysis engine evaluates oculomotor load across each frame of a vertical video by modeling expected fixation sequences, saccadic amplitude distributions, and pursuit demand during transitions. The system identifies frames where AOI placement forces excessive vertical saccadic travel, flags text elements that exceed single-fixation capture thresholds, and detects temporal pacing mismatches where new visual information arrives during predicted saccadic latency windows. By surfacing these oculomotor bottlenecks alongside standard engagement metrics, creators can see which moments in their content are literally difficult for the human visual system to process, connecting frame-level design decisions to the neuroscience of how eyes actually move through vertical content.
Predictive Saccade Programming and Content Beat Timing
Experienced viewers develop cerebellar-mediated predictive models of content timing, pre-programming saccades to anticipated target locations before visual information fully arrives. Content that establishes a consistent temporal rhythm — regular beat intervals between scene changes, text appearances, or focal point shifts — enables this predictive saccade system to reduce effective saccadic latency from the typical 150-250 millisecond reactive range down to near-zero for anticipated transitions. Analyzing beat timing regularity and spatial consistency of focal transitions reveals whether content supports or disrupts predictive oculomotor control. Irregular timing and spatial unpredictability force the viewer back into reactive saccadic mode, increasing cognitive load and reducing the sense of effortless viewing flow that characterizes high-retention content.
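Beat timing regularity can be summarized with the coefficient of variation of inter-beat intervals: a perfectly regular rhythm scores zero, and higher scores mean the viewer is forced back into reactive saccadic mode. A minimal sketch, assuming beat onsets are supplied as timestamps in seconds:

```python
import statistics

def beat_regularity(beat_times_s: list) -> float:
    """Coefficient of variation of inter-beat intervals. Lower values mean a
    more predictable rhythm and better support for anticipatory saccades."""
    intervals = [b - a for a, b in zip(beat_times_s, beat_times_s[1:])]
    return statistics.pstdev(intervals) / statistics.mean(intervals)
```

Comparing a metronomic cut pattern against an erratic one makes the difference visible: equal 2-second intervals score 0.0, while uneven spacing pushes the score well above zero.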
How does oculomotor control differ between vertical scrolling and horizontal scrolling?
Vertical scrolling imposes greater oculomotor demand because the human visual system is biomechanically optimized for horizontal eye movements. The lateral rectus muscles that drive horizontal saccades generate higher peak velocities (up to 15% faster) and more accurate landing positions than the superior rectus and inferior oblique muscles responsible for vertical saccades. Vertical saccadic latencies run 10 to 20 milliseconds longer on average, and endpoint scatter is greater in the vertical meridian. Additionally, the useful vertical visual field is asymmetric — the lower field is preferentially processed for near-space tasks while the upper field biases toward far-space navigation — creating uneven perceptual processing across a vertically scrolling feed. This means vertical content demands more oculomotor effort per unit of information extracted, and creators must account for this by optimizing AOI placement, reducing vertical saccadic path length, and allowing slightly longer dwell times per content beat.
What is the role of the superior colliculus in scrolling behavior?
The superior colliculus (SC) serves as the primary midbrain hub for saccadic target selection during scrolling. Its intermediate layers contain a retinotopic motor map that converts visual salience signals into precise burst commands for saccadic eye movements. During scrolling, the SC continuously receives competing inputs — from the retina (bottom-up salience like contrast edges and motion onset), from cortical areas V1 and V4 (feature-based salience), and from the frontal eye fields (top-down intentional targeting). The SC integrates these signals to determine which emerging content element wins the competition for the next saccade. When new content enters the visual field from below during a downward scroll, the SC must rapidly compute a saccadic vector to orient the fovea toward the most salient feature of the incoming content. The speed and accuracy of this computation determine how quickly a viewer's eyes lock onto a new video's visual hook, making SC-level target competition the neural substrate of the first-impression effect in scrolling feeds.
Why does saccadic latency matter for content design in vertical video?
Saccadic latency — the 150 to 250 millisecond delay between a visual stimulus appearing and the eyes initiating a movement toward it — creates a critical temporal blind spot in content perception. During this latency period, the oculomotor system is programming the next saccade: computing target coordinates, calculating the required amplitude and direction, and preparing the muscle activation sequence. Any new visual information introduced during this programming window (such as a text change, scene cut, or graphic appearance) may not be registered as a saccadic target until the current saccade completes and a new programming cycle begins, effectively doubling the time before the viewer's fovea reaches the new element. For creators, this means visual transitions should be spaced at least 250 to 350 milliseconds apart to avoid landing in each other's latency windows. Rapid-fire visual changes that seem energetic in editing may actually create perceptual gaps where key information is never foveated at all, explaining why some high-energy edits paradoxically reduce information retention.
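The spacing rule above is mechanical enough to lint for: sort the transition timestamps in an edit and flag any consecutive pair closer than the minimum gap. The sketch below uses the 250 ms lower bound from this paragraph as its assumed default.

```python
MIN_TRANSITION_GAP_MS = 250  # lower bound of the 250-350 ms spacing above

def latency_window_collisions(transition_ms: list,
                              min_gap_ms: float = MIN_TRANSITION_GAP_MS) -> list:
    """Pairs of consecutive visual transitions that land inside each other's
    saccadic-latency window and risk never being foveated."""
    times = sorted(transition_ms)
    return [(a, b) for a, b in zip(times, times[1:]) if b - a < min_gap_ms]
```

Running this over an edit's cut list surfaces exactly the rapid-fire changes the paragraph warns about: any pair it returns marks information the viewer's fovea may never reach.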
How does smooth pursuit tracking interact with content comprehension during scrolling?
Smooth pursuit and saccadic exploration are functionally antagonistic: activating the smooth pursuit system to track moving content suppresses saccadic initiation, and vice versa. During an active scroll gesture, the pursuit system engages to stabilize the retinal image of upward-flowing content, matching eye velocity to content velocity via the MT/V5 to cerebellum to brainstem pathway. While pursuit is active, the viewer cannot execute the saccadic fixations necessary for detailed feature extraction — text reading, face recognition, and fine detail processing all require stable foveal fixation, which is incompatible with pursuit tracking. This means that actual content comprehension only begins after scrolling stops and the oculomotor system transitions from pursuit to fixation mode. Content that relies on motion elements (scrolling text, continuous panning, parallax effects) during the critical first 500 milliseconds after scroll-stop forces the pursuit system to remain engaged, delaying the transition to saccadic exploration and reducing the effective window for information extraction. Discrete, stable visual elements allow immediate saccadic engagement and faster content comprehension.
Does Instagram's Originality Score affect my content's reach?
Yes. Instagram introduced an Originality Score in 2026 that fingerprints every video. Content sharing 70% or more visual similarity with existing posts on the platform gets suppressed in distribution. Aggregator accounts saw 60-80% reach drops when this rolled out, while original creators gained 40-60% more reach. If you cross-post from TikTok, strip watermarks and re-edit with different text styling, color grading, or crop framing so the visual fingerprint feels native to Instagram.