Photorealistic, attention-based, text-guided human image editing with a latent mapper for StyleGAN2.
This work is a reimplementation of the paper *FEAT: Face Editing with Attention*, with additional changes and improvements.
- Clone this repository:
  ```
  git clone https://github.com/Psarpei/GanVinci.git
  ```
- Change into the repository:
  ```
  cd GanVinci
  ```
- Create the conda environment from `environment.yml`:
  ```
  conda env create -f environment.yml
  ```
- Download the StyleGAN2 config-f weights from here
- Place the StyleGAN2 weights under `checkpoints/`
To train a text-guided image edit (e.g. `beard`, `smiling_person`, `open_mouth`, `blond_hair`, etc.), execute:

```
python3 train_FEAT.py
```

with the following parameters:
- `--clip_text`, edit text, e.g. `beard`, `smile` or `open_mouth`, type `str`
- `--batch_size`, batch size (needs to be 1 if `--male_only` or `--female_only` is activated), type `int`, default `1`
- `--lr`, learning rate, type `float`, default `0.0001`
- `--lambda_att`, latent attention regression loss factor, type `float`, default `0.005`
- `--lambda_tv`, total variation loss factor, type `float`, default `0.00001`
- `--lambda_l2`, L2 loss factor, type `float`, default `0.8`
- `--att_layer`, layer of the attention map, type `int`, default `8`
- `--att_channel`, number of channels of the attention map, type `int`, default `32`
- `--att_start`, start attention layer of the latent mapper, type `int`, default `0`
- `--lr_step_size`, learning-rate step size for the scheduler, type `int`, default `5000`
- `--lr_gamma`, gamma for the learning-rate scheduler, type `float`, default `0.5`
- `--alpha`, factor of the latent mapper, type `float`, default `0.5`
- `--clip_only_steps`, number of steps trained using only the CLIP loss, for better convergence on some edits, type `int`, default `0`
- `--size`, output image size of the generator, type `int`, default `1024`
- `--iterations`, number of training iterations, type `int`, default `20000`
- `--truncation`, truncation ratio, type `float`, default `1`
- `--truncation_mean`, number of vectors used to calculate the mean for truncation, type `int`, default `4096`
- `--stylegan2_ckpt`, path to the StyleGAN2 model checkpoint, type `str`, default `stylegan2-ffhq-config-f.pt`
- `--channel_multiplier`, channel multiplier of the generator (config-f = 2, else 1), type `int`, default `2`
- `--male_only`, flag that only uses images of male people
- `--female_only`, flag that only uses images of female people
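The `--lambda_*` flags weight the individual loss terms of the training objective. A minimal sketch of how they plausibly combine, assuming a FEAT-style objective (CLIP loss plus attention sparsity, total-variation and L2 regularizers); the function names and exact formulations here are illustrative assumptions, not the repository's actual code:

```python
import numpy as np

def total_variation(att):
    """Total variation of an attention map (H, W); encourages smooth masks."""
    dh = np.abs(np.diff(att, axis=0)).sum()
    dw = np.abs(np.diff(att, axis=1)).sum()
    return dh + dw

def feat_loss(clip_loss, att_map, latent_delta,
              lambda_att=0.005, lambda_tv=0.00001, lambda_l2=0.8):
    """Weighted sum matching the --lambda_* defaults listed above (sketch)."""
    l_att = np.mean(att_map)           # attention regression: keep the mask sparse
    l_tv = total_variation(att_map)    # smoothness of the mask
    l_l2 = np.sum(latent_delta ** 2)   # keep the edited latent close to the original
    return clip_loss + lambda_att * l_att + lambda_tv * l_tv + lambda_l2 * l_l2
```

The small defaults for `lambda_att` and `lambda_tv` suggest they act as gentle regularizers, while `lambda_l2` dominates to keep the edit localized in latent space.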
A few example invocations are provided in the `bash_examples/` folder.
For inference, the trained edit checkpoints must be placed in a folder structure like the following example:
```
edits/
├── 0-8/
│   ├── beard/
│   │   ├── checkpoints/
│   │   │   ├── 01000_beard.pt
│   │   │   ├── 02000_beard.pt
│   │   │   ├── ...
│   │   │   └── 20000_beard.pt
│   ├── ...
├── ...
```
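Given the layout above, an edit name and an iteration step together identify one checkpoint file. A small sketch of how such a path could be resolved (`edit_checkpoint` is a hypothetical helper, not part of the repository; the `0-8` directory name is assumed to reflect the attention-layer range):

```python
from pathlib import Path

def edit_checkpoint(edit: str, train_iter: str, att_range: str = "0-8",
                    root: str = "edits") -> Path:
    """Build a checkpoint path, e.g. edits/0-8/beard/checkpoints/20000_beard.pt."""
    return Path(root) / att_range / edit / "checkpoints" / f"{train_iter}_{edit}.pt"
```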
To apply a trained text-guided image edit, execute:

```
python3 generate.py
```

with the following parameters:
- `--clip_text`, name of the edit (e.g. `beard`, `smile`, etc.); if `""`, standard StyleGAN2 image generation is applied, type `str`, default `""`
- `--alpha`, factor of the latent mapper, type `float`, default `0.1`
- `--att_layer`, layer of the attention map, type `int`, default `8`
- `--att_channel`, number of channels of the attention map, type `int`, default `32`
- `--att_start`, start attention layer of the latent mapper, type `int`, default `0`
- `--mask_threshold`, threshold for applying the mask based on predicted pixels, type `float`, default `0.8`
- `--train_iter`, iteration step of the edit checkpoint, type `str`, default `""`
- `--size`, output image size of the generator, type `int`, default `1024`
- `--sample`, number of samples to be generated for each image, type `int`, default `1`
- `--pics`, number of images to be generated, type `int`, default `20`
- `--truncation`, truncation ratio, type `float`, default `1`
- `--truncation_mean`, number of vectors used to calculate the mean for truncation, type `int`, default `4096`
- `--ckpt`, path to the model checkpoint, type `str`, default `stylegan2-ffhq-config-f.pt`
- `--channel_multiplier`, channel multiplier of the generator (config-f = 2, else 1), type `int`, default `2`
- `--seed`, random seed for image generation, type `int`, default `0`
- `--male_only`, flag that only uses images of male people
- `--female_only`, flag that only uses images of female people
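The `--mask_threshold` flag suggests the final image is composited from the edited and original generations through the predicted attention mask. A minimal sketch of such thresholded blending (`apply_edit` is a hypothetical helper for illustration, not the repository's actual blending code):

```python
import numpy as np

def apply_edit(original, edited, att_mask, mask_threshold=0.8):
    """Copy edited pixels only where the attention mask is confident (sketch)."""
    mask = (att_mask >= mask_threshold).astype(original.dtype)
    return mask * edited + (1.0 - mask) * original
```

A higher threshold keeps more of the original image and confines the edit to the regions where the attention map is most certain.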
A few example inference invocations are provided in the `bash_examples/` folder.
You can download weights for some pre-trained edits here.
To apply a pre-trained edit, keep the folder structure as is and place everything under `edits/` as explained in the inference section.
This code borrows heavily from stylegan2-pytorch, and the model is based on the paper *FEAT: Face Editing with Attention*.
