The race for AI music production has just become more interesting. At a recent launch event, Amazon revealed its new DeepComposer keyboard and demonstrated its music production capabilities.
According to Amazon, the generative AI technology it has developed gives “everybody access to an expert.” Using the DeepComposer keyboard and AI platform, users can transform a melody into a complete song with multiple instrumental tracks. Users can also rework a melody in the style of a particular composer, such as Bach.
In this article, I will examine Amazon’s DeepComposer in detail.
Summary of the Tool
Amazon’s DeepComposer is aimed at machine learning developers who are interested in building generative applications. It is not intended for a general audience of musicians, music producers, and songwriters.
Amazon’s DeepComposer is accessed through Amazon Web Services (AWS), so users need an AWS account to use the tool. DeepComposer is split into three parts: the music studio, model training, and learning capsules. The music studio is free to use and is a good way to test the tool's capabilities. Model training is for more advanced users, who can train up to four models for free.
Users can start on the AWS DeepComposer Getting Started page.
Within the music studio, users can perform melody transformations with DeepComposer. Users can select a sample melody, such as "Twinkle Twinkle Little Star," or record an original melody. The DeepComposer keyboard is not required; melodies can be entered with any MIDI keyboard or even a computer keyboard.
There are two AI architectures to choose from, and each produces different results. The autoregressive CNN (AR-CNN) architecture is used to enhance a melody in the style of a composer. With the preset model, which is trained on music by Bach, any melody can be transformed into the style of Bach.
The generative adversarial network (GAN) architecture is used to create accompaniment tracks for a melody. Users can select a model pre-trained on a specific genre, such as rock, pop, jazz, or symphony.
The music generated with the default parameters is passable, and the output of the AR-CNN sounds more musical than the output of the GAN architecture.
Within the model training section, users can train a MuseGAN or U-Net model, both of which are used by the GAN architecture in the music studio.
DeepComposer also comes with learning capsules, which explain the underlying technology in great detail.
I could not get the DeepComposer team to divulge the training dataset or pre-processing procedures for their MuseGAN architecture.
The training dataset for U-Net and AR-CNN is a collection of Bach chorales. The 229 MIDI files can be found in the DeepComposer GitHub repository.
The notes in the dataset are quantized to an eighth-note time grid.
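To make the quantization step concrete, here is a minimal Python sketch that snaps note onsets and durations to an eighth-note grid and writes them into a binary piano roll, where rows are time steps and columns are the 128 MIDI pitches. It assumes a quarter-note beat, so one beat equals two steps. The `(pitch, onset, duration)` note format and the `quantize` function are hypothetical and are not taken from the DeepComposer code.

```python
from typing import List, Tuple

# Hypothetical note format: (MIDI pitch, onset in quarter-note beats, duration in beats)
Note = Tuple[int, float, float]

STEPS_PER_BEAT = 2  # eighth-note resolution, assuming a quarter-note beat


def quantize(notes: List[Note], num_steps: int, num_pitches: int = 128) -> List[List[int]]:
    """Return a binary piano roll indexed as roll[time_step][pitch]."""
    roll = [[0] * num_pitches for _ in range(num_steps)]
    for pitch, onset, duration in notes:
        start = round(onset * STEPS_PER_BEAT)  # snap the onset to the nearest eighth note
        length = max(1, round(duration * STEPS_PER_BEAT))  # every note lasts at least one step
        for step in range(start, min(start + length, num_steps)):
            roll[step][pitch] = 1
    return roll


# Opening of "Twinkle Twinkle Little Star" in C major, with slightly off-grid timing
twinkle = [(60, 0.0, 1.0), (60, 1.02, 1.0), (67, 1.98, 1.0), (67, 3.0, 1.0),
           (69, 4.0, 1.0), (69, 5.0, 1.0), (67, 6.0, 2.0)]
roll = quantize(twinkle, num_steps=16)
active = [[p for p, on in enumerate(row) if on] for row in roll]
print(active)
```

One consequence of this representation is that a plain binary roll cannot distinguish two repeated notes from one held note. In the example, the two opening Cs merge into four consecutive active steps.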
Machine Learning Architectures
For generating music, most development teams choose an autoregressive machine learning architecture. DeepComposer, on the other hand, uses a CNN-based architecture of the kind typically used for image processing.
A CNN-based architecture allows generative applications to be built with a GAN model. In theory, GANs should be able to generate better results than an autoregressive model, because the generator is trained against a discriminator that learns to tell generated music from real music.
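An autoregressive model generates one step at a time and feeds each prediction back in as context for the next step. The sketch below is a deliberately tiny stand-in for such a model. The `TRANSITIONS` table of next-pitch probabilities is made up for illustration, and it uses only the previous note as context. A real model would be a trained network conditioned on a longer history.

```python
from typing import Dict, List

# Hypothetical next-pitch probabilities given the previous pitch (illustration only).
TRANSITIONS: Dict[int, Dict[int, float]] = {
    60: {62: 0.6, 67: 0.4},
    62: {64: 0.7, 60: 0.3},
    64: {65: 0.6, 62: 0.4},
    65: {67: 0.8, 64: 0.2},
    67: {60: 0.9, 65: 0.1},
}


def generate(seed: List[int], length: int) -> List[int]:
    """Greedy autoregressive generation: each new note depends on notes already generated."""
    melody = list(seed)
    while len(melody) < length:
        probs = TRANSITIONS[melody[-1]]  # condition on the previous output
        melody.append(max(probs, key=probs.get))  # pick the most likely next pitch
    return melody


print(generate([60], 6))
```

A CNN, by contrast, maps an entire piano roll to another piano roll in a single forward pass rather than step by step.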
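For reference, the standard GAN objective is shown below. Here G is the generator, D is the discriminator, x is a real training example, and z is random noise fed to the generator. The discriminator is trained to maximize this value and the generator to minimize it. This is the textbook formulation. The exact loss DeepComposer uses is not stated here, and variants such as the Wasserstein loss are also common.

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```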
However, CNNs perform worse than autoregressive models on sequential data. They have difficulty capturing information across long sequences because information is lost at each convolution layer. This loss can be mitigated with a U-Net architecture, whose skip connections pass encoder features directly to the corresponding decoder layers; DeepComposer provides U-Net as one of its trainable models.
DeepComposer also features an autoregressive CNN architecture, called AR-CNN. The AR-CNN works similarly to an encoder-decoder architecture: random noise is introduced into the training data, and the model learns to map this noisy data back to the ground-truth data.
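The following toy example shows why skip connections help. It is plain Python with no learned weights, so it is an illustration of the idea rather than a U-Net. Average pooling stands in for the encoder's downsampling, which discards detail, and nearest-neighbour upsampling stands in for the decoder. Two rhythms with the same note density collapse to the same bottleneck, so no decoder could tell them apart. Once the full-resolution features are concatenated in U-Net fashion, the two rhythms remain distinguishable. All function names are hypothetical.

```python
from typing import List


def avg_pool(x: List[float], k: int = 2) -> List[float]:
    """Encoder step: average non-overlapping windows of size k (assumes len(x) % k == 0)."""
    return [sum(x[i:i + k]) / len(x[i:i + k]) for i in range(0, len(x), k)]


def upsample(x: List[float], k: int = 2) -> List[float]:
    """Decoder step: nearest-neighbour upsampling by a factor of k."""
    return [v for v in x for _ in range(k)]


def decoder_input_plain(x: List[float]) -> List[float]:
    # Without a skip connection, the decoder only sees the upsampled bottleneck.
    return upsample(avg_pool(x))


def decoder_input_unet(x: List[float]) -> List[List[float]]:
    # U-Net style: full-resolution encoder features are concatenated with the
    # upsampled bottleneck as a second channel at each time step.
    return [[coarse, fine] for coarse, fine in zip(upsample(avg_pool(x)), x)]


a = [1.0, 0.0, 1.0, 0.0]  # note on, off, on, off
b = [0.0, 1.0, 0.0, 1.0]  # same density, shifted by one step
print(decoder_input_plain(a) == decoder_input_plain(b))  # timing detail is lost
print(decoder_input_unet(a) == decoder_input_unet(b))    # skip connection keeps it
```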
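The noising step can be sketched as follows. The sketch assumes a binary piano roll and assumes the noise consists of randomly toggling cells, which adds spurious notes or removes real ones. The function name `make_training_pair` and the number of flips are hypothetical, and DeepComposer's actual noise procedure may differ. A model would then be trained to predict the ground-truth roll from the noisy one.

```python
import random
from typing import List, Tuple

Roll = List[List[int]]  # binary piano roll: roll[time_step][pitch]


def make_training_pair(target: Roll, num_flips: int, rng: random.Random) -> Tuple[Roll, Roll]:
    """Corrupt a piano roll by toggling num_flips distinct cells.

    Toggling a 0 adds a spurious note; toggling a 1 removes a real note.
    Returns (noisy_input, ground_truth).
    """
    cells = [(t, p) for t in range(len(target)) for p in range(len(target[0]))]
    noisy = [row[:] for row in target]  # copy so the ground truth is left untouched
    for t, p in rng.sample(cells, num_flips):
        noisy[t][p] = 1 - noisy[t][p]
    return noisy, target


rng = random.Random(0)
clean = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]  # 4 steps x 3 pitches (toy size)
noisy, truth = make_training_pair(clean, num_flips=3, rng=rng)
```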
Amazon's DeepComposer is a great tool for introducing machine learning developers to the power of generative AI. Its real value lies in the learning capsules, which teach the fundamentals of data pre-processing and generative machine learning architectures.
Although DeepComposer is not aimed at a general audience, I expect a commercial version to appear in the near future.
About the Author
Wayne Cheng is the founder and machine learning engineer at Audoir. His focus is on the use of generative deep learning to build songwriting tools. Prior to starting Audoir, he worked as a hardware design engineer for Silicon Valley startups and as an audio engineer for creative organizations. He has an MSEE from UC Davis and a Music Technology degree from Foothill College.