Generative Adversarial Networks (GANs) for Depth Images: Reducing the Simulation-to-Reality Gap in UAV Perception

Pablo José Salazar Villacis

doi:10.64424/rcu41202560

Authors

Pablo José Salazar Villacis Loughborough University https://orcid.org/0000-0001-7137-477X


Keywords:

Domain adaptation, Depth images, Generative Adversarial Networks, Simulation-to-reality gap

Abstract

Unmanned aerial vehicles (UAVs) rely on depth perception for autonomous navigation and obstacle avoidance. However, models trained in simulation struggle to generalize because of the gap between synthetic and real depth images. This gap stems from differences in sensor noise, environmental variability, and object textures, and it reduces the models' effectiveness in real-world applications. This study addresses domain adaptation using generative adversarial networks (GANs) to transform simulated depth images into more realistic representations. Two approaches are implemented: Pix2Pix, a supervised model that requires paired data, and CycleGAN, an unsupervised method that adapts images without direct correspondences. For a rigorous evaluation, an aligned dataset of synthetic and real images is constructed. The results show that Pix2Pix outperforms CycleGAN at replicating real-world depth characteristics by minimizing intensity errors, whereas CycleGAN, although it preserves geometry, struggles to model sensor noise. Adversarial adaptation significantly reduces the simulation-to-reality gap, improving depth image accuracy for UAV perception. To validate its applicability, the adapted images are integrated into the Robot Operating System (ROS), enabling real-time perception. The findings show that GAN-based domain adaptation improves depth-based robotic vision, enabling more reliable UAV navigation in complex environments.
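To make the idea of an "intensity error" on an aligned dataset concrete, the following is a minimal sketch of how an adapted depth image might be compared pixel by pixel with its real counterpart. The abstract does not state which metrics the study uses, so the choice of mean absolute error (MAE) and root mean square error (RMSE), the function name `depth_errors`, the `min_valid` threshold, and the convention that a depth of 0 marks a missing sensor reading are all assumptions made here for illustration.

```python
import numpy as np


def depth_errors(adapted, real, min_valid=0.0):
    """Return (MAE, RMSE) between two aligned depth images.

    Hypothetical helper: only pixels where the real depth is finite and
    greater than `min_valid` are compared, on the assumption that the
    sensor reports missing readings as 0 or NaN.
    """
    adapted = np.asarray(adapted, dtype=np.float64)
    real = np.asarray(real, dtype=np.float64)
    if adapted.shape != real.shape:
        raise ValueError("images must be aligned and have the same shape")

    # Mask out pixels with no usable ground-truth depth.
    valid = np.isfinite(real) & (real > min_valid)
    if not valid.any():
        raise ValueError("no valid pixels to compare")

    diff = adapted[valid] - real[valid]
    mae = float(np.mean(np.abs(diff)))
    rmse = float(np.sqrt(np.mean(diff ** 2)))
    return mae, rmse


# Toy 2x2 example (values in metres); the 0.0 entry is treated as missing.
real = np.array([[1.0, 2.0], [0.0, 4.0]])
adapted = np.array([[1.5, 2.0], [3.0, 3.0]])
mae, rmse = depth_errors(adapted, real)
print(f"MAE={mae:.3f} m, RMSE={rmse:.3f} m")
```

Under these assumptions, a lower MAE or RMSE for one model's output than for another's would indicate that its adapted images sit closer to the real sensor data in intensity. Such a comparison is only meaningful when the synthetic and real images are spatially aligned, as in the dataset described above.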
