Visually-Aligned Planning Representations for Long-Horizon Robot Clay Sculpting
Carnegie Mellon University
01 — Overview
Clay sculpting is a nuanced, artistic task that combines dexterous manipulation with long-horizon planning toward high-level goals. We formulate robotic clay sculpting as a shape-to-shape matching problem.
Prior work on deformable object manipulation either requires retraining a policy for each goal or relies on dynamics models that represent state as sparse point clouds, which capture important clay features such as texture poorly.
We present a method for modeling the dynamics of deformable materials and planning robotic sculpting in a visually-aligned representation, one that captures lighting and texture features. Across three deformable materials and various end-effectors, our dynamics model performs comparably to the state of the art, with the added benefit of being compatible with visual planning.
Our actions are parameterized pushes, which prove suitable for long-horizon clay relief sculptures (more than 100 actions). We show the benefits of planning in a visually-aligned representation and analyze its challenges compared to 3D representations.
Robot sculpting a Celtic knot
02 — Overview Video
03 — State Representation
Most prior works sample point clouds at one point per 6 mm. At this resolution the rough 3D shape is perceptible, but important texture and shading information is lost. Instead, we represent state as dense 512×512 depth maps together with their spatial gradients, from which lighting and texture features can be derived.
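As a rough illustration of how lighting and texture cues can be derived from a depth map, the sketch below treats the depth map as a height field, takes its spatial gradient with `np.gradient`, and renders Lambertian shading from the resulting surface normals. The function name `depth_features`, the light direction, the pixel size, and the height-field convention are illustrative assumptions, not details taken from this work.

```python
import numpy as np


def depth_features(depth: np.ndarray, pixel_size_mm: float = 1.0):
    """Spatial gradient and Lambertian shading of a depth map.

    Assumption (illustrative only): `depth` is a height field in mm on a
    regular grid; the light direction below is an arbitrary choice.
    """
    # Gradient along rows (y) and columns (x); np.gradient returns one array per axis.
    dz_dy, dz_dx = np.gradient(depth, pixel_size_mm)
    # Normals of the surface z = depth(x, y) are proportional to (-dz/dx, -dz/dy, 1).
    normals = np.stack([-dz_dx, -dz_dy, np.ones_like(depth)], axis=-1)
    normals /= np.linalg.norm(normals, axis=-1, keepdims=True)
    # Hypothetical directional light, tilted toward +x and +y.
    light = np.array([1.0, 1.0, 2.0])
    light /= np.linalg.norm(light)
    # Lambertian shading: cosine between normal and light, clipped at zero.
    shading = np.clip(normals @ light, 0.0, 1.0)
    return dz_dy, dz_dx, shading


# Toy data: a flat 512x512 surface with a 3 mm deep groove running along y.
_, xx = np.mgrid[0:512, 0:512]
depth = -3.0 * np.exp(-((xx - 256) ** 2) / (2 * 8.0**2))
dz_dy, dz_dx, shading = depth_features(depth)
```

The two walls of the groove receive different shading even though their depths mirror each other, which is the kind of cue a sparse point cloud at 6 mm spacing would not resolve.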
3D point cloud
Reference sculpture
Depth map & spatial gradient
04 — Context
Our work builds on and extends existing approaches to robotic clay manipulation.
05 — Hardware
06 — Method
Given the action parameters and the current state, the robot follows a trajectory that deforms the surface of the material. We model these deformations by training a neural network, param2deform, to predict the change in state at a constant pose.
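Because the dynamics model predicts a change in state for each action, a plan can be rolled out by adding predicted changes one push at a time. The sketch below assumes a hypothetical push parametrization (`Push`: start point, angle, length, depth) and replaces the learned param2deform network with a hypothetical analytic stand-in, `toy_param2deform`, that carves a trough flanked by berms of displaced material. Neither reflects the actual network or its action parameters.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Push:
    """Hypothetical push parametrization in pixel units (not this work's parameters)."""
    x: float       # push start, column
    y: float       # push start, row
    angle: float   # push direction in radians
    length: float  # push length in pixels
    depth: float   # indentation depth in mm


def toy_param2deform(push: Push, shape=(128, 128), width=3.0) -> np.ndarray:
    """Hypothetical analytic stand-in for the learned param2deform network.

    Returns a change in height: a trough along the push segment flanked by
    raised berms, with a zero-sum cross-section as a crude volume-conservation proxy.
    """
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]].astype(float)
    ux, uy = np.cos(push.angle), np.sin(push.angle)
    # Coordinates relative to the push start: along-track s, cross-track d.
    s = (cols - push.x) * ux + (rows - push.y) * uy
    d = -(cols - push.x) * uy + (rows - push.y) * ux
    on_segment = (s >= 0) & (s <= push.length)
    wide = 2.0 * width
    trough = -np.exp(-d**2 / (2 * width**2))
    # Berm amplitude chosen so the cross-section integrates to zero.
    berm = (width / wide) * np.exp(-d**2 / (2 * wide**2))
    return push.depth * (trough + berm) * on_segment


def rollout(state: np.ndarray, pushes: list[Push]) -> np.ndarray:
    """Apply predicted changes one action at a time: s_{t+1} = s_t + f(s_t, a_t)."""
    for push in pushes:
        state = state + toy_param2deform(push, state.shape)
    return state


state0 = np.zeros((128, 128))
plan = [Push(20, 64, 0.0, 60, 2.0), Push(64, 20, np.pi / 2, 60, 1.0)]
state1 = rollout(state0, plan)
```

Swapping `toy_param2deform` for a trained network would leave `rollout` unchanged, since it only relies on the model returning a per-pixel change in height.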
A user specifies an image, which is converted to a depth map. Based on the current state, this depth map is adjusted so that it is feasible for the robot to achieve, forming the target state. Our planning algorithm then optimizes a set of randomly initialized actions so that the state predicted by the dynamics model matches the target in both the 3D and visual representations.
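A minimal sketch of such a planning objective, assuming a toy dynamics model in which each action (x, y, depth) presses a Gaussian dent, and using the spatial gradient as a stand-in for the visual representation. The loss sums a 3D term on heights and a visual term on gradients. The optimizer is plain stochastic hill climbing from a random initialization, a deliberately simple substitute for the combination of algorithms used in this work; the feasibility adjustment of the target is omitted, and all function names are hypothetical.

```python
import numpy as np


def poke(params: np.ndarray, shape=(64, 64), width=3.0) -> np.ndarray:
    """Hypothetical toy dynamics: each action (x, y, depth) leaves a Gaussian dent."""
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]].astype(float)
    delta = np.zeros(shape)
    for x, y, depth in params:
        delta -= depth * np.exp(-((cols - x) ** 2 + (rows - y) ** 2) / (2 * width**2))
    return delta


def plan_loss(pred, target, w_depth=1.0, w_visual=1.0) -> float:
    """3D term on heights plus a 'visual' term on spatial gradients (a proxy)."""
    depth_err = np.mean((pred - target) ** 2)
    gy_p, gx_p = np.gradient(pred)
    gy_t, gx_t = np.gradient(target)
    visual_err = np.mean((gx_p - gx_t) ** 2 + (gy_p - gy_t) ** 2)
    return float(w_depth * depth_err + w_visual * visual_err)


def optimize_plan(state, target, n_actions=3, iters=500, seed=0):
    """Random initialization followed by stochastic hill climbing (a stand-in optimizer)."""
    rng = np.random.default_rng(seed)
    h, w = state.shape
    params = np.column_stack([
        rng.uniform(0, w, n_actions),
        rng.uniform(0, h, n_actions),
        rng.uniform(0.5, 2.0, n_actions),
    ])
    best = plan_loss(state + poke(params, state.shape), target)
    history = [best]
    for _ in range(iters):
        cand = params.copy()
        i = rng.integers(n_actions)
        # Perturb one action: position in pixels, depth in mm.
        cand[i] += rng.normal(0.0, [2.0, 2.0, 0.2])
        cand[i, 2] = max(cand[i, 2], 0.0)  # toy constraint: pokes only press in
        loss = plan_loss(state + poke(cand, state.shape), target)
        if loss < best:  # keep the candidate only if the combined loss improves
            params, best = cand, loss
        history.append(best)
    return params, history


state = np.zeros((64, 64))
true_plan = np.array([[20.0, 20.0, 1.5], [40.0, 30.0, 1.0], [30.0, 45.0, 2.0]])
target = state + poke(true_plan)
params, history = optimize_plan(state, target)
print(f"loss: {history[0]:.4f} -> {history[-1]:.4f}")
```

The weights `w_depth` and `w_visual` trade off geometric accuracy against agreement in surface detail; the values here are arbitrary.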
07 — Evaluation
Dynamics Model
The top row shows the true change in depth: red indicates indentation and blue indicates displaced material. The bottom row shows the change predicted by param2deform.
Below, we vary the action parameters and show the dynamics model's predictions. Small deviations in action parameters can lead to large changes in material state, especially in the visual representation.
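The sensitivity to small parameter changes can be illustrated numerically with a toy groove (not this work's data): shifting a groove by a single pixel changes the height map only modestly in relative terms, while its spatial gradient, used here as a proxy for the visual representation, changes noticeably more. The `dent` and `relative_change` helpers and all sizes are hypothetical.

```python
import numpy as np


def dent(cx: float, shape=(128, 128), width=6.0, depth=2.0) -> np.ndarray:
    """Hypothetical toy deformation: a straight groove centered at column cx (mm)."""
    cols = np.arange(shape[1], dtype=float)[None, :]
    return -depth * np.exp(-((cols - cx) ** 2) / (2 * width**2)) * np.ones(shape)


def relative_change(a: np.ndarray, b: np.ndarray) -> float:
    """Relative L2 difference of b with respect to a."""
    return float(np.linalg.norm(a - b) / np.linalg.norm(a))


base = dent(64.0)
shifted = dent(65.0)  # action parameter perturbed by one pixel

depth_change = relative_change(base, shifted)
grad_change = relative_change(np.gradient(base, axis=1), np.gradient(shifted, axis=1))
print(f"depth: {depth_change:.3f}, gradient: {grad_change:.3f}")
```

For a Gaussian cross-section of width σ, a small shift δ changes the profile by roughly δ/(σ√2) in relative L2 norm and its derivative by roughly √3 times that, so a comparison based on gradients reacts more strongly than one based on heights alone.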
Planning Results
Our model plans from target depth maps or images. Plans are randomly initialized, optimized using a combination of three algorithms, and can be updated online within a model predictive control (MPC) framework.
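One way to picture online updating is a receding-horizon loop: plan a short sequence of actions, execute only the first, observe the resulting state, and replan, warm-starting from the remainder of the previous plan. The sketch below uses random shooting as the planner and a toy Gaussian-dent model; the "real" clay is simulated as responding 20% more strongly than the model predicts, so each replanning step starts from the observed state rather than the predicted one. Every name and constant is hypothetical, and the sketch does not reproduce the combination of three algorithms used in this work.

```python
import numpy as np

SHAPE = (32, 32)
ROWS, COLS = np.mgrid[0:SHAPE[0], 0:SHAPE[1]].astype(float)


def dent(state, action, gain=1.0):
    """Hypothetical toy dynamics: action (x, y, depth) presses a Gaussian dent."""
    x, y, depth = action
    return state - gain * depth * np.exp(-((COLS - x) ** 2 + (ROWS - y) ** 2) / 8.0)


def cost(state, target):
    return float(np.mean((state - target) ** 2))


def predict(state, plan):
    """Roll out a plan with the (imperfect) model."""
    for action in plan:
        state = dent(state, action)
    return state


def sample_plans(rng, n, horizon):
    """Random action sequences, shape (n, horizon, 3)."""
    return np.stack([rng.uniform(0, SHAPE[1], (n, horizon)),
                     rng.uniform(0, SHAPE[0], (n, horizon)),
                     rng.uniform(0.0, 1.5, (n, horizon))], axis=-1)


def mpc(state, target, steps=8, horizon=3, n_samples=64, seed=0):
    """Receding-horizon control with random shooting and a warm start."""
    rng = np.random.default_rng(seed)
    plan = sample_plans(rng, 1, horizon)[0]
    costs = [cost(state, target)]
    for _ in range(steps):
        # Candidates: the warm-started previous plan plus fresh random samples.
        candidates = np.concatenate([plan[None], sample_plans(rng, n_samples, horizon)])
        scores = [cost(predict(state, c), target) for c in candidates]
        plan = candidates[int(np.argmin(scores))]
        # Execute only the first action on the simulated "real" clay, which
        # responds more strongly than the model (gain=1.2), then observe it.
        state = dent(state, plan[0], gain=1.2)
        costs.append(cost(state, target))
        # Warm start: shift the remaining actions and pad with a random one.
        plan = np.concatenate([plan[1:], sample_plans(rng, 1, 1)[0]])
    return state, costs


target = predict(np.zeros(SHAPE), np.array([[10.0, 10.0, 1.0],
                                            [20.0, 22.0, 1.0],
                                            [8.0, 25.0, 1.0]]))
final_state, costs = mpc(np.zeros(SHAPE), target)
print(f"cost: {costs[0]:.4f} -> {costs[-1]:.4f}")
```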
Full Sculpting Results
More than 400 actions
08 — Publication
Acknowledgements
Thank you to Alison Bartsch and Uksang Yoo for helpful advice, and to Hyun Parke for end-effector designs. This work was supported in part by NSF IIS-2112633 and the JPMorgan Chase Faculty Research Award.
This was my final work as a PhD student. I am incredibly grateful for the opportunity to work on projects that I was passionate about during my studies and to be able to dedicate my final year to this work. Thank you to everyone who helped me along the way, especially Jean Oh, for her support, creativity, and guidance. — Peter