What question did this study set out to answer?

The aim is to improve pose estimation for long-staple cotton harvesting by using growth-relation constraints and a vision-language model.

March 2, 2026 | Open Access

Pose estimation for long-staple cotton picking based on growth-relationship keypoint constraints and vision–language model reasoning

Key Points

Developed a YOLO model for fine-grained boll-stem segmentation.
Established mathematical models for the fractal characteristics of cotton plants.
Utilized a vision-language model for cotton boll orientation reasoning and keypoint extraction.
Formulated a cooperative localization method using spatial geometric reasoning.
Achieved an orientation success rate of 94.1% in complex conditions.
74% of depth keypoint errors were below 3 mm.
Reached an 80% grasping success rate.
Over 65% of picking attempts succeeded within a ±20° tolerance.
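
The tolerance figure above can be read as an angular-error statistic. The following is a minimal sketch, assuming that each orientation is represented as a 3D direction vector and that an attempt counts as within tolerance when the angle between the predicted and reference directions is at most 20°. The function names are hypothetical, and the paper's actual evaluation protocol is not described on this page.

```python
import math

def angle_between_deg(u, v):
    """Angle in degrees between two 3D direction vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("direction vectors must be non-zero")
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_theta))

def fraction_within_tolerance(predicted, reference, tol_deg=20.0):
    """Share of predictions whose angular error is at most tol_deg.

    tol_deg=20.0 mirrors the tolerance quoted in the key points; treating
    it as a threshold on the 3D angle is an assumption of this sketch.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    errors = [angle_between_deg(p, r) for p, r in zip(predicted, reference)]
    return sum(e <= tol_deg for e in errors) / len(errors)
```

Calling `fraction_within_tolerance(predicted_dirs, reference_dirs)` on paired lists of direction vectors returns a value between 0 and 1 that can be compared with the reported share.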

Abstract

To address the challenge of pose estimation in intelligent long-staple cotton harvesting caused by dense plant distribution and pronounced spatial heterogeneity of bolls, this study introduces a pose estimation method that integrates growth-relation keypoint constraints with vision–language model reasoning. A YOLO for cotton and stem segmentation (YOLO-CSS) model is developed to achieve fine-grained boll–stem segmentation under complex field conditions. Mathematical representations of the fractal characteristics and structural complexity of cotton plants are established, and growth structure modeling is used to analyze boll–stem spatial relationships, providing structural priors for subsequent orientation estimation. A cotton boll orientation reasoning method based on vision–language model understanding (VLM-OR) is developed, in which the model evaluates the reliability of stem-growth keypoint extraction, establishes growth direction priors from these keypoints, and incorporates boll attachment direction for rule-based reasoning, enabling orientation estimation under weak-texture conditions. Furthermore, a boll–stem cooperative localization method is formulated through spatial geometric reasoning, using stem-growth keypoints as spatial anchors to derive the 3D picking pose of bolls and compensate for depth-direction positioning errors of the end effector, thus supporting dynamic alignment between perceptual outputs and execution parameters. Experimental results show that the proposed VLM-OR achieves an orientation success rate of 94.1% in complex scenarios. Additionally, 74% of depth keypoint errors remain below 3 mm, the orientation-based grasping success rate reaches 80%, and more than 65% of picking attempts succeed within a ±20° tolerance range. These findings confirm the method’s accuracy and operational adaptability, offering strong methodological support for visual perception in long-staple cotton picking robots.
• A pose estimation method for long-staple cotton picking using growth-relation keypoints and vision-language model reasoning.
• A mathematical representation of the fractal characteristics and structural complexity of cotton plants.
• A cotton boll orientation reasoning method based on vision-language model understanding.
• A cotton boll-stem coordinated positioning method that compensates for depth positioning errors in actuators.
• 74% of depth localization errors fall within 3 mm, and the pose-grasping success rate reaches 80%.
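
The abstract describes using a stem-growth keypoint as a spatial anchor for the boll's 3D picking pose. The sketch below shows one simple geometric reading of that idea, not the authors' formulation. It assumes both points are given in a camera frame in millimetres with z as depth, takes the growth direction as the unit vector from the stem keypoint to the boll center, and applies depth compensation as a plain additive offset. `estimate_picking_pose`, `PickingPose`, and `depth_offset_mm` are hypothetical names introduced for illustration.

```python
import math
from typing import NamedTuple, Tuple

Vec3 = Tuple[float, float, float]

class PickingPose(NamedTuple):
    position: Vec3        # target point for the end effector (mm)
    direction: Vec3       # unit vector from stem keypoint to boll center
    azimuth_deg: float    # direction angle within the x-y plane
    elevation_deg: float  # direction angle out of the x-y plane (toward +z)

def estimate_picking_pose(stem_keypoint: Vec3,
                          boll_center: Vec3,
                          depth_offset_mm: float = 0.0) -> PickingPose:
    """Derive a picking pose from a stem anchor and a boll center.

    Assumption: depth_offset_mm is a calibration-style correction added to
    the boll's z coordinate; how the paper computes it is not stated here.
    """
    dx, dy, dz = (b - s for b, s in zip(boll_center, stem_keypoint))
    norm = math.sqrt(dx * dx + dy * dy + dz * dz)
    if norm == 0.0:
        raise ValueError("stem keypoint and boll center coincide")
    direction = (dx / norm, dy / norm, dz / norm)
    azimuth = math.degrees(math.atan2(dy, dx))
    # Clamp before asin to guard against floating-point drift.
    elevation = math.degrees(math.asin(max(-1.0, min(1.0, dz / norm))))
    position = (boll_center[0], boll_center[1], boll_center[2] + depth_offset_mm)
    return PickingPose(position, direction, azimuth, elevation)
```

A direction obtained this way depends only on two 3D points, which is one plausible reason a stem anchor could help when the boll surface itself offers little texture.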

Authors

Zhi Liang

Xiaojuan Li

Zhonglong Lin

Journals

Industrial Crops and Products

Institutions

Chinese Academy of Agricultural Sciences

Xinjiang University

Xinjiang Production and Construction Corps
