A Hybrid GAN Architecture for Stable and High-Quality Image Generation: Integrating AdaptiveMix, DiffAugment, and EIGGAN-Inspired Techniques
Subject areas: Multimedia Processing, Communication Systems, Intelligent Systems
Negar Tahvili¹, Shahla Nemati², Mohammad Ehsan Basiri³
1 - MSc. Student, Department of Computer Engineering, Shahrekord University, Shahrekord, Iran
2 - Associate Professor, Department of Computer Engineering, Shahrekord University, Shahrekord, Iran
3 - Associate Professor, Department of Computer Engineering, Shahrekord University, Shahrekord, Iran
Keywords: Generative Adversarial Networks (GANs), AdaptiveMix, EIGGAN, DiffAugment, Exponential Moving Average (EMA), Inception Score
Abstract:
Generative Adversarial Networks (GANs) have become influential tools in unsupervised image generation, significantly impacting fields like computer vision and the creative arts. However, challenges such as training instability and mode collapse often hinder their performance and the quality of generated images. This study introduces a hybrid GAN architecture that combines techniques from AdaptiveMix, which enhances model stability, with insights from EIGGAN, known for its innovative image generation methods. The primary goal is to improve both training stability and the visual quality of generated images.
The generator incorporates differentiable data augmentation (DiffAugment) and Exponential Moving Average (EMA) updates. DiffAugment introduces dynamic transformations to training data, enhancing diversity and robustness, while EMA updates stabilize training by smoothing parameter changes, resulting in more consistent outputs. The discriminator is regularized using the R1 penalty, improving its ability to distinguish between real and generated images, and benefits from feature space shrinkage through AdaptiveMix to maintain compact feature representations.
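To make the EMA update concrete: the generator keeps a smoothed "shadow" copy of its weights that is blended toward the current weights after each step, and the shadow copy is used for sampling. A minimal pure-Python sketch with hypothetical names (`shadow`, `params`), assuming parameters are stored as a dict of scalars; real implementations apply the same rule to framework tensors:

```python
def ema_update(shadow, params, decay=0.999):
    """Blend current parameters into the EMA shadow copy in place.

    shadow: dict of smoothed parameter values (used for generation)
    params: dict of the latest trained parameter values
    decay:  smoothing factor; closer to 1.0 means slower, more stable tracking
    """
    for name, value in params.items():
        shadow[name] = decay * shadow[name] + (1.0 - decay) * value
    return shadow
```

With `decay=0.999`, each training step moves the shadow weights only 0.1% of the way toward the raw weights, which damps the oscillations typical of adversarial training.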
Trained on the CIFAR-10 dataset for 1000 epochs, the model achieved a peak Inception Score (IS) of 6.80 ± 0.22, indicating significant improvements in generative quality and diversity, along with a best Fréchet Inception Distance (FID) score of 20.90, reflecting high realism in generated samples. The discriminator's accuracy remained stable between 50% and 60%, suggesting a balanced adversarial relationship. These findings demonstrate the effectiveness of the hybrid model and open new directions for improving stability and image quality in GAN training.
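The Inception Score reported above is the standard metric of Salimans et al.: IS = exp(E_x KL(p(y|x) ‖ p(y))), where p(y|x) are per-image class probabilities and p(y) is their marginal. A minimal sketch that assumes the class probabilities are already given (in practice they come from a pretrained Inception network, which is omitted here):

```python
import math

def inception_score(probs):
    """Compute IS = exp(mean_x KL(p(y|x) || p(y))).

    probs: list of per-image class-probability lists (each row sums to 1).
    """
    n = len(probs)
    num_classes = len(probs[0])
    # Marginal class distribution p(y), averaged over all images.
    marginal = [sum(p[j] for p in probs) / n for j in range(num_classes)]
    # Per-image KL divergence KL(p(y|x) || p(y)); zero-probability terms contribute 0.
    kl = [sum(pj * math.log(pj / mj) for pj, mj in zip(p, marginal) if pj > 0)
          for p in probs]
    return math.exp(sum(kl) / n)
```

A higher score rewards both confident per-image predictions (sharp p(y|x)) and diversity across images (uniform p(y)); for example, images spread confidently over two classes give IS = 2, while identical uninformative predictions give IS = 1.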
[1] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., ... & Bengio, Y. (2014). Generative adversarial nets. In Advances in neural information processing systems (Vol. 27).
[2] Arjovsky, M., Chintala, S., & Bottou, L. (2017). Wasserstein generative adversarial networks. In International Conference on Machine Learning (pp. 214-223). PMLR.
[3] Liu, H., Zhang, W., Li, B., Wu, H., He, N., Huang, Y., ... & Zheng, Y. (2023). Improving GAN Training via Feature Space Shrinkage. arXiv preprint arXiv:2303.01559.
[4] Tian, C., Gao, H., Wang, P., & Zhang, B. (2024). An Enhanced GAN for Image Generation. Computers, Materials & Continua, 80(7), 852097.
[5] Zhang, H., Cisse, M., Dauphin, Y. N., & Lopez-Paz, D. (2017). mixup: Beyond empirical risk minimization. arXiv preprint arXiv:1710.09412.
[6] Mescheder, L., Nowozin, S., & Geiger, A. (2018). Which training methods for GANs do actually converge? In International Conference on Machine Learning (pp. 3470-3479). PMLR.
[7] Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., & Chen, X. (2016). Improved techniques for training GANs. Advances in Neural Information Processing Systems, 29.
[8] Zhao, S., Liu, D., Lin, C., Zhu, J. Y., & Han, S. (2020). Differentiable augmentation for data-efficient GAN training. Advances in Neural Information Processing Systems, 33, 16251-16263.
[9] Karras, T., Laine, S., & Aila, T. (2019). A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 4401-4410).
[10] Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., & Courville, A. (2017). Improved training of Wasserstein GANs. Advances in Neural Information Processing Systems, 30.
[11] Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167.
[12] Miyato, T., Kataoka, T., Koyama, M., & Yoshida, Y. (2018). Spectral normalization for generative adversarial networks. In International Conference on Learning Representations.
[13] Zhang, H., Goodfellow, I., Metaxas, D., & Odena, A. (2019). Self-attention generative adversarial networks. In International Conference on Machine Learning (pp. 7354-7363). PMLR.
[14] Krizhevsky, A., & Hinton, G. (2009). Learning multiple layers of features from tiny images (Technical report). University of Toronto.
[15] Kodali, N., Abernethy, J., Hays, J., & Kira, Z. (2017). On convergence and stability of GANs. arXiv preprint arXiv:1705.07215.
[16] Arora, S., Ge, R., Liang, Y., Ma, T., & Zhang, Y. (2017). Generalization and equilibrium in generative adversarial nets (GANs). arXiv preprint arXiv:1703.00573.
[17] Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., & Chen, X. (2016). Improved techniques for training GANs. In Advances in Neural Information Processing Systems (pp. 2226-2234).
[18] Dumoulin, V., Belghazi, I., Poole, B., Lamb, A., Arjovsky, M., Mastropietro, O., & Courville, A. (2016). Adversarially learned inference. arXiv preprint arXiv:1606.00704.
[19] Berthelot, D., Schumm, T., & Metz, L. (2017). BEGAN: Boundary equilibrium generative adversarial networks. arXiv preprint arXiv:1703.10717.
[20] Zhao, J., Mathieu, M., & LeCun, Y. (2017). Energy-based generative adversarial network. In International Conference on Learning Representations.
[22] Mao, X., Li, Q., Xie, H., Lau, R. Y. K., Wang, Z., & Smolley, S. P. (2017). Least squares generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2794-2802).
[23] Radford, A., Metz, L., & Chintala, S. (2015). Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434.
[24] Xiangli, Y., Deng, Y., Dai, B., Loy, C. C., & Lin, D. (2020). Real or not real, that is the question. In International Conference on Learning Representations.
