RA-L 2026  ·  IEEE Robotics and Automation Letters

Visual Sculpting

Visually-Aligned Planning Representations for Long-Horizon Robot Clay Sculpting

Peter Schaldenbrand    Jean Oh

Carnegie Mellon University


Abstract

Clay sculpting is a nuanced, artistic task involving dexterous manipulation with long-horizon planning to achieve high-level goals. As a robotics problem, we formulate clay sculpting as a shape-to-shape matching challenge.

Prior deformable object manipulation work either requires retraining a policy per goal or relies on dynamics models that represent state as sparse point clouds, which capture important clay features such as texture poorly.

We present a method for modeling the dynamics of deformable materials and planning for robotic sculpting in a representation that is visually-aligned, capturing lighting and texture features. With three different deformable materials and various end-effectors, our dynamics model is comparable to the state-of-the-art with the added benefit of being compatible with visual planning.

Our actions are parametrized pushes, which proved suitable for long-horizon (>100 actions) clay relief sculptures. We show the benefits of planning in a visually-aligned representation and analyze its challenges compared to 3D representations.

Robot sculpting a Celtic knot


Why Visual Representations?

Most works sample point clouds at one point per 6 mm. At this resolution you can perceive rough 3D shape, but you miss important texture and shading information. Instead, we represent state as dense depth maps at 512×512 pixels along with the spatial gradient, from which lighting and texture features can be derived.
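As a rough illustration of how lighting features can be derived from a depth map and its spatial gradient, the sketch below computes central-difference gradients and a simple Lambertian shading image. The function names, the overhead light direction, and the treatment of the map as a height field z = f(x, y) are assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def spatial_gradient(depth, pixel_size=1.0):
    """Return (dz/dx, dz/dy) of a depth map using finite differences."""
    # np.gradient returns derivatives along axis 0 (rows, y) then axis 1 (cols, x).
    dz_dy, dz_dx = np.gradient(depth, pixel_size)
    return dz_dx, dz_dy

def lambertian_shading(depth, light_dir=(0.0, 0.0, 1.0), pixel_size=1.0):
    """Shade a height field with a single directional light (assumed model)."""
    dz_dx, dz_dy = spatial_gradient(depth, pixel_size)
    # The normal of the surface z = f(x, y) is proportional to (-dz/dx, -dz/dy, 1).
    normals = np.stack([-dz_dx, -dz_dy, np.ones_like(dz_dx)], axis=-1)
    normals /= np.linalg.norm(normals, axis=-1, keepdims=True)
    light = np.asarray(light_dir, dtype=float)
    light /= np.linalg.norm(light)
    # Lambert's cosine law, clipped so surfaces facing away from the light are black.
    return np.clip(normals @ light, 0.0, 1.0)
```

Under these assumptions a flat surface lit from directly overhead is uniformly bright, while any slope darkens, which is the kind of cue a sparse point cloud does not expose.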

Figure panels: 3D point cloud; reference sculpture; depth map and spatial gradient.

Prior Robot Sculpting Work

Existing approaches to robotic clay manipulation, which our work builds on and extends.

Physical Setup

Dynamics Model

Given the action parameters and current state, our robot follows trajectories to make deformations along the surface of the material. We model these deformations by training a neural network, param2deform, to predict the changes in state at a constant pose.
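One way to read "at a constant pose" is that the network predicts the depth change in a canonical frame, and that change is then moved to wherever the action took place. The sketch below shows only that placement step, with a plain array standing in for the network output. The function name, the (x, y) center and rotation parameterization, and the nearest-neighbor sampling are all assumptions for illustration. The actual param2deform architecture is not described here.

```python
import numpy as np

def place_deformation(depth, delta_patch, center_xy, angle_rad):
    """Add a canonical-pose depth change to `depth` at a position and rotation.

    `delta_patch` is a hypothetical stand-in for a predicted change in depth.
    """
    h, w = depth.shape
    ph, pw = delta_patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Inverse mapping: express every state pixel in the patch's canonical frame.
    dx = xs - center_xy[0]
    dy = ys - center_xy[1]
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    u = c * dx + s * dy + (pw - 1) / 2.0
    v = -s * dx + c * dy + (ph - 1) / 2.0
    # Nearest-neighbor lookup; pixels that fall outside the patch are unchanged.
    ui = np.rint(u).astype(int)
    vi = np.rint(v).astype(int)
    inside = (ui >= 0) & (ui < pw) & (vi >= 0) & (vi < ph)
    out = depth.astype(float).copy()
    out[inside] += delta_patch[vi[inside], ui[inside]]
    return out
```

Predicting in a fixed frame means the model does not have to learn the same deformation separately for every position and orientation. The transform handles that part.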

Dynamics model diagram

Planning

A user specifies an image, which is converted to a depth map. The depth map is then altered to make it feasible for the robot given the current state, forming a target state. Our planning algorithm optimizes a set of randomly initialized actions so that the state predicted by the dynamics model matches the target in both the 3D and visual representations.
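An objective that compares states in both representations could look like the sketch below, which sums a depth error (the 3D term) and a spatial-gradient error (a proxy for the visual term). The function name, the mean-squared-error form, and the `visual_weight` parameter are assumptions for illustration. The loss actually used in the paper may differ.

```python
import numpy as np

def visual_3d_loss(pred_depth, target_depth, visual_weight=1.0):
    """Combine a depth error with a gradient error (assumed form)."""
    # 3D term: how far the predicted surface is from the target surface.
    depth_term = np.mean((pred_depth - target_depth) ** 2)
    # Visual term: mismatch in spatial gradients, which drive shading and texture.
    pgy, pgx = np.gradient(pred_depth)
    tgy, tgx = np.gradient(target_depth)
    visual_term = np.mean((pgx - tgx) ** 2 + (pgy - tgy) ** 2)
    return float(depth_term + visual_weight * visual_term)
```

With this form, a prediction that is uniformly offset from the target is penalized only by the depth term, while one with the same average depth error but a spurious ridge or step is penalized by both terms.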

Planning diagram

Results

Dynamics Model

The top row shows the true change in depth: red marks indentation and blue marks displaced material. The bottom row shows the change predicted by param2deform.
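A signed change map like the one described above can be split into its two components from a pair of before and after depth maps. The sketch below assumes depth increases away from an overhead camera, so an increase in depth is an indentation and a decrease is material pushed up. Both the sign convention and the function name are assumptions for this example.

```python
import numpy as np

def split_depth_change(before, after, tol=1e-6):
    """Split a depth change into indentation and displaced-material maps."""
    delta = after - before
    # Surface moved away from the camera: indentation (drawn red in the figure).
    indented = np.where(delta > tol, delta, 0.0)
    # Surface moved toward the camera: displaced material (drawn blue).
    displaced = np.where(delta < -tol, -delta, 0.0)
    return indented, displaced
```

Both outputs are non-negative magnitudes, so they can be rendered directly as two color channels or summed to compare how much material was pressed in against how much was pushed up.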

Dynamics model results

Below, we vary action parameters and show the dynamics model prediction. Small deviations in action parameters can lead to large changes in material state, especially in the visual representation.

Planning Results

Our model plans from target depth maps or images. Plans are randomly initialized and optimized using a combination of three algorithms, updatable online in a model predictive control framework.
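The model predictive control loop can be sketched as: optimize a plan, execute only its first action, observe the new state, then re-optimize. The example below uses a toy grid world and plain hill climbing as a stand-in optimizer. It does not reproduce the three algorithms mentioned above, and every name in it (`toy_dynamics`, `optimize_plan`, `mpc`) is hypothetical.

```python
import numpy as np

def toy_dynamics(state, action, depth_per_push=1.0):
    """Hypothetical stand-in for a learned model: a push deepens one cell."""
    nxt = state.copy()
    r, c = action
    nxt[r, c] += depth_per_push
    return nxt

def rollout(state, plan):
    """Predict the state after applying every action in the plan."""
    for action in plan:
        state = toy_dynamics(state, action)
    return state

def loss(state, target):
    return float(np.mean((state - target) ** 2))

def optimize_plan(state, target, plan, rng, iters=200):
    """Hill climbing: mutate one action, keep it if the loss does not rise."""
    best = loss(rollout(state, plan), target)
    for _ in range(iters):
        cand = list(plan)
        i = int(rng.integers(len(cand)))
        cand[i] = tuple(int(rng.integers(0, s)) for s in state.shape)
        cand_loss = loss(rollout(state, cand), target)
        if cand_loss <= best:
            plan, best = cand, cand_loss
    return plan

def mpc(state, target, horizon=4, max_steps=10, seed=0):
    """Replan after every executed action; stop when a step would not help."""
    rng = np.random.default_rng(seed)

    def random_action():
        return tuple(int(rng.integers(0, s)) for s in state.shape)

    plan = [random_action() for _ in range(horizon)]  # random initialization
    for _ in range(max_steps):
        plan = optimize_plan(state, target, plan, rng)
        # "Execute" the first action; here the real world is the toy model itself.
        nxt = toy_dynamics(state, plan[0])
        if loss(nxt, target) >= loss(state, target):
            break
        state = nxt
        # Warm start: shift the plan forward and refill the horizon.
        plan = plan[1:] + [random_action()]
    return state
```

On a real robot the executed action would be followed by a fresh depth observation rather than a model prediction, which is what lets replanning correct for dynamics-model error.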

Full Sculpting Results

More than 400 actions

Paper & Citation

Visual Sculpting: Visually-Aligned Planning Representations for Long-Horizon Robot Clay Sculpting

@article{schaldenbrand2026visualSculpting,
  title={Visual Sculpting: Visually-Aligned Planning Representations for Long-Horizon Robot Clay Sculpting},
  author={Schaldenbrand, Peter and Oh, Jean},
  journal={IEEE Robotics and Automation Letters},
  year={2026},
  publisher={IEEE}
}

Acknowledgements

Thank you to Alison Bartsch and Uksang Yoo for helpful advice, and to Hyun Parke for end-effector designs. This work was supported in part by NSF IIS-2112633 and the JPMorgan Chase Faculty Research Award.

This was my final work as a PhD student. I am incredibly grateful for the opportunity to work on projects that I was passionate about during my studies and to be able to dedicate my final year to this work. Thank you to everyone who helped me along the way, especially Jean Oh, for her support, creativity, and guidance. — Peter