MotionPyramid: Controllable Motion Synthesis via Stylized Phase Manifolds

Jingyuan Li¹, Peizhuo Li¹, Andreas Aristidou²,³, Olga Sorkine-Hornung¹

¹ETH Zurich, Switzerland; ²University of Cyprus, Cyprus; ³CYENS Centre of Excellence, Cyprus

Computer Graphics Forum (Proceedings of SCA 2026)

Our model supports synthesizing motions from multiple input signals. By manipulating the stylized phase manifold through phase warping, style shifting, and other operations, we enable intuitive and precise control over motion timing, semantics, and style.

Abstract

We introduce stylized phase manifolds, a compact, interpretable latent representation that disentangles motion content (e.g., "jumping", "walking"), temporal structure (e.g., motion cycle frequency, gait timing), and style (i.e., how the motion is performed). Learned in an unsupervised manner and inherently low-dimensional, the manifold offers intuitive and flexible editing. Building on this representation, we develop a diffusion-based motion generator that enables fine-grained control over the semantic, temporal, and stylistic aspects of motion. To connect high-level intent with low-level motion, we treat the stylized manifold as an intermediate representation: a structured bridge between natural language and motion. By first mapping text into this manifold, our two-stage pipeline improves control over text-based motion generation while producing high-quality, diverse motion outputs.

Stylized Manifold

Stylized phase manifold architecture: a motion sequence is encoded into a latent representation, split into timing and style components, and reconstructed through phase parameters estimated from frequency, pivot phase, and codebook amplitude.
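To make the role of the three phase parameters concrete, here is a minimal sketch of how a frequency, a pivot phase, and an amplitude could define a trajectory on a circle. It is not the authors' implementation: the linear phase model, the function name `phase_trajectory`, and the parameters `fps` and `pivot_frame` are assumptions for illustration, and the codebook lookup for the amplitude is omitted (the amplitude is passed in as a plain number).

```python
import math

def phase_trajectory(frequency_hz, pivot_phase, amplitude, num_frames,
                     fps=30.0, pivot_frame=0):
    """Return one 2D point per frame on a circle of radius `amplitude`.

    Assumption: the phase (in cycles) advances linearly with time, starting
    from `pivot_phase` at `pivot_frame`.
    """
    points = []
    for frame in range(num_frames):
        # Phase in cycles at this frame.
        phase = pivot_phase + frequency_hz * (frame - pivot_frame) / fps
        angle = 2.0 * math.pi * phase
        points.append((amplitude * math.cos(angle), amplitude * math.sin(angle)))
    return points

# Hypothetical values: a 1 Hz cycle, sampled for 2 seconds at 30 fps.
points = phase_trajectory(frequency_hz=1.0, pivot_phase=0.25,
                          amplitude=2.0, num_frames=60)
radii = [math.hypot(x, y) for x, y in points]
```

Under these assumptions every point lies at the same distance from the origin, so a fixed amplitude traces a single circle, while frequency and pivot phase only determine where on that circle each frame falls.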

Visualization of our stylized phase manifold.

Average poses on the stylized manifold. Our manifold groups semantically similar motions onto the same circle, i.e., the sub-manifold corresponding to a constant amplitude, while disentangling a continuous, temporally invariant style attribute in an unsupervised manner.

Diffusion Pipeline

The first stage, text to phase, maps a prompt to manifold embeddings and trajectories that support reconfiguration operations such as concatenation, repetition, deletion, and permutation. The second stage, phase to motion, then synthesizes motion from the edited embeddings and trajectory, aligning semantic, timing, stylistic, and spatial constraints.
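The reconfiguration operations named above can be sketched as plain sequence edits. This is only an illustration, assuming the manifold embeddings are available as an ordered Python list with one entry per segment. The helper names and the string placeholders such as "walk_0" are hypothetical and stand in for real embedding vectors.

```python
def concatenate(first, second):
    """Append one embedding sequence to another."""
    return list(first) + list(second)

def repeat(segment, times):
    """Repeat an embedding sequence `times` times."""
    return list(segment) * times

def delete(sequence, start, stop):
    """Remove the entries in the half-open index range [start, stop)."""
    return list(sequence[:start]) + list(sequence[stop:])

def permute(sequence, order):
    """Reorder entries; `order` lists source indices in the new order."""
    return [sequence[i] for i in order]

# Placeholder embeddings (strings stand in for real vectors).
walk = ["walk_0", "walk_1"]
climb = ["climb_0"]

edited = concatenate(repeat(walk, 2), climb)
shortened = delete(edited, 1, 3)
reordered = permute(shortened, [2, 0, 1])
```

In this sketch the edited list would then be handed to the second stage. How the actual system represents and consumes the edited embeddings is not specified here.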

Motion Editing and Control

Figure 5. Trajectory control: for the same manifold embedding, motions preserve content while following user-specified trajectories.

Figure 6. Reconfiguration: walking embeddings can be repeated, shortened, and concatenated with a climbing phase.

Figure 7. Phase warping: changing phase frequency compresses or stretches motion cycles while keeping motion plausible.
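The effect of changing the phase frequency can be illustrated with a small sketch. It assumes the phase (in cycles) is obtained by accumulating a per-frame frequency, which is a modeling assumption rather than a description of the actual method. The function names `integrate_phase` and `warp_frequencies` and the 30 fps frame rate are hypothetical.

```python
def integrate_phase(frequencies_hz, fps=30.0, start_phase=0.0):
    """Accumulate per-frame frequencies into an unwrapped phase (in cycles)."""
    phases = []
    phase = start_phase
    for frequency in frequencies_hz:
        phases.append(phase)
        phase += frequency / fps  # cycles advanced during one frame
    return phases

def warp_frequencies(frequencies_hz, scale):
    """Scale every frequency: scale > 1 compresses cycles, scale < 1 stretches them."""
    return [scale * frequency for frequency in frequencies_hz]

# Hypothetical 1 Hz motion cycle over 60 frames.
base_frequencies = [1.0] * 60
base_phase = integrate_phase(base_frequencies)
fast_phase = integrate_phase(warp_frequencies(base_frequencies, 2.0))
slow_phase = integrate_phase(warp_frequencies(base_frequencies, 0.5))
```

After the same 30 frames, the doubled frequency has completed two cycles where the original completed one, and the halved frequency has completed half a cycle. Because the scaling acts per frame, a warp could also be applied to only part of a sequence.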

Figure 8. Style shift: for the same phase embedding, the style code increases from left to right, changing how the motion is executed without changing its content.
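A sweep like the one in this figure can be set up as follows. This is a minimal sketch, assuming the style code is a single scalar and that each generation is conditioned on a (phase embedding, style code) pair. The function name `style_sweep`, the dictionary layout, and the numeric range are all hypothetical.

```python
def style_sweep(phase_embedding, style_min, style_max, steps):
    """Pair one fixed phase embedding with evenly spaced style codes."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    step = (style_max - style_min) / (steps - 1)
    return [
        {"phase": phase_embedding, "style": style_min + i * step}
        for i in range(steps)
    ]

# Placeholder phase embedding and an assumed style range of [-1, 1].
conditions = style_sweep(phase_embedding=(0.0, 2.0),
                         style_min=-1.0, style_max=1.0, steps=5)
styles = [condition["style"] for condition in conditions]
```

Each entry keeps the phase embedding unchanged, so only the style code varies across the sweep.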

Acknowledgements

This work was supported in part by the European Research Council (ERC) under the European Union's Horizon 2020 Research and Innovation Programme (ERC Consolidator Grant, agreement No. 101003104, MYCLOTH) and by the EU Commission's Horizon Europe program (grant No. 101178362). Open access publishing was facilitated by ETH Zurich, as part of the Wiley-ETH Zurich agreement via the Consortium Of Swiss Academic Libraries.