A curated list of the latest breakthroughs in AI by release date with a clear video explanation, link to a more in-depth article, and code.
With the creation of a whole new field called "Generative AI", whether you like the term or not, research hasn't slowed its frenetic pace, especially in industry, which has seen its biggest boom ever in the implementation of AI technologies. Artificial intelligence and our understanding of the human brain and its link to AI are constantly evolving, showing promising applications that could improve our quality of life in the near future. Still, we ought to be careful with which technology we choose to apply.
"Science cannot tell us what we ought to do, only what we can do."
- Jean-Paul Sartre, Being and Nothingness
Here's a curated list of the latest breakthroughs in AI and Data Science by release date with a clear video explanation, link to a more in-depth article, and code (if applicable). Enjoy the read!
The complete reference to each paper is listed at the end of this repository. Star this repository to stay up to date and stay tuned for next year! ⭐️
Maintainer: louisfb01
Subscribe to my newsletter - The latest updates in AI explained every week.
Feel free to message me any interesting paper I may have missed to add to this repository.
Tag me on Twitter @Whats_AI or LinkedIn @Louis (What's AI) Bouchard if you share the list! And come chat with us in our Learn AI Together Discord community!
👀 If you'd like to support my work, you can sponsor this repository or support me on Patreon. You can also support me by following my favorite daily AI newsletter to get frequent updates on new papers like these!
Or support me by wearing cool merch!
- Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers [1]
- InstructPix2Pix: Learning to Follow Image Editing Instructions [2]
- MusicLM: Generating Music From Text [3]
- Paper references
Last year we saw the rise of generative AI for both images and text, most recently with ChatGPT. Now, within the first week of 2023, researchers have already created a new system for audio data called VALL-E.
VALL-E can imitate someone’s voice from only a 3-second recording, with higher speaker similarity and speech naturalness than previous zero-shot text-to-speech systems. ChatGPT is able to imitate a human writer; VALL-E does the same for voice.
- Short Video Explanation:
- Short read: VALL-E: An AI Generating Voice from Text!
- Paper: Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
- Code
We know that AI can generate images; now, let’s edit them!
This new model called InstructPix2Pix does precisely that; it edits an image following a text-based instruction given by the user. Just look at those amazing results… and they are not from OpenAI or Google with an infinite budget.
It is a recent publication from Tim Brooks and collaborators at the University of California, Berkeley, including Prof. Alexei A. Efros, a well-known figure in the computer vision community. As you can see, the results are just incredible.
- Short Video Explanation:
- Short read: Image Editing from Text Instructions: InstructPix2Pix
- Paper: InstructPix2Pix: Learning to Follow Image Editing Instructions
- Code
- Demo
We recently covered VALL-E, a model able to imitate someone’s voice. Let’s jump a step further in the creative direction with this new AI called MusicLM. MusicLM allows you to generate music from a text description.
Let's not wait any longer and dive right into the results... what you will hear will blow you away!
- Short Video Explanation:
- Short read: Generating music with AI!
- Paper: MusicLM: Generating Music From Text
- Listen to some results
If you would like to read more papers and have a broader view, here is another great repository for you covering 2022: 2022: A Year Full of Amazing AI papers - A Review. Feel free to subscribe to my weekly newsletter to stay up-to-date with new publications in AI for 2023!
Tag me on Twitter @Whats_AI or LinkedIn @Louis (What's AI) Bouchard if you share the list!
[1] Wang, C., Chen, S., Wu, Y., Zhang, Z., Zhou, L., Liu, S., Chen, Z., Liu, Y., Wang, H., Li, J. and He, L., 2023. Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers. arXiv preprint arXiv:2301.02111.
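[2] Brooks, T., Holynski, A. and Efros, A.A., 2022. InstructPix2Pix: Learning to Follow Image Editing Instructions. arXiv preprint arXiv:2211.09800. https://arxiv.org/pdf/2211.09800.pdf
[3] Agostinelli, A. et al., 2023. MusicLM: Generating Music From Text. arXiv preprint arXiv:2301.11325. https://arxiv.org/pdf/2301.11325.pdf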