Here I used Stable Diffusion through Hugging Face's diffusers library: I took one image as input, added new elements to it with the diffusion model, and repeated the process three or four times, each time adding a different kind of element described by a text prompt.
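The loop described above could look roughly like the sketch below. It is only an illustration under assumptions, not the exact code behind this post: it assumes the image-to-image pipeline (`StableDiffusionImg2ImgPipeline`) was the one used, and the model id, the `strength` and `guidance_scale` values, the file name, the prompts, and the helper names `load_img2img_pipeline` and `add_elements` are all hypothetical.

```python
from typing import Any, List


def load_img2img_pipeline(model_id: str = "CompVis/stable-diffusion-v1-4"):
    """Load an image-to-image pipeline. The model id is an assumption."""
    # Imported lazily so the loop below can be read and tried without a GPU.
    import torch
    from diffusers import StableDiffusionImg2ImgPipeline

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32
    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(model_id, torch_dtype=dtype)
    return pipe.to(device)


def add_elements(
    pipe: Any,
    image: Any,
    prompts: List[str],
    strength: float = 0.6,        # assumed value: how far each pass may drift from its input
    guidance_scale: float = 7.5,  # assumed value: how strongly the prompt steers the result
) -> List[Any]:
    """Run one diffusion pass per prompt, feeding each output into the next pass.

    Returns every stage, starting with the untouched input image.
    """
    stages = [image]
    for prompt in prompts:
        result = pipe(
            prompt=prompt,
            image=stages[-1],
            strength=strength,
            guidance_scale=guidance_scale,
        )
        stages.append(result.images[0])
    return stages


# Example usage (hypothetical file name and prompts; downloads the model weights):
#
#   from PIL import Image
#   pipe = load_img2img_pipeline()
#   start = Image.open("input.png").convert("RGB").resize((512, 512))
#   stages = add_elements(pipe, start, [
#       "the same scene with a red hot-air balloon in the sky",
#       "the same scene with a small wooden boat on the water",
#       "the same scene with a flock of birds at sunset",
#   ])
#   stages[-1].save("output.png")
```

Keeping every intermediate stage makes it easy to see what each prompt contributed. If the goal is to add an element only in one region of the picture, an inpainting pipeline with a mask may be a better fit than plain image-to-image.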
Credits: @1littlecoder, @StabilityAI, @huggingface