We decided to take part in the AI Song Contest to showcase our AI music technology to the world.
In today’s music industry, over 90% of streams go to the top 1% of artists. This makes it very difficult for the remaining 99% of artists to gain an audience, unless they can consistently create hit songs.
Our mission is to bring equality to the music industry. We do this by training our AI with hit songs, and building songwriting tools to help artists create hit songs. Our suite of AI tools is called “SAM.”
Workflow and Collaboration with AI
We selected songs for the training data based on a music value metric: the Billboard Hot 100 chart, which lists the top 100 songs each week.
To curate the training data, we used songs for which MusicXML files are available. We found that the quality of the MusicXML files is higher than that of MIDI files, because they were created by reputable sources, such as the original song publisher. We used over 5,000 songs for our training data, segmented into over 40,000 data points.
The MusicXML files were engineered into a lead sheet format. The lead sheet format has been in use since the 1930s, and it is one of the most compact ways of representing musical ideas. The essential information of a song’s composition is distilled into the main melody and the chords of the harmony.
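A lead sheet can be modeled as a very small data structure: chord symbols per measure, plus the melody notes sung over them. The sketch below is illustrative only; the class and field names are our own assumptions, not SAM’s internal representation.

```python
from dataclasses import dataclass, field

@dataclass
class Note:
    pitch: int       # MIDI pitch number, e.g. 62 = D4
    duration: float  # length in quarter notes

@dataclass
class Measure:
    chord: str                  # harmony as a chord symbol, e.g. "D", "Bm"
    melody: list[Note] = field(default_factory=list)

@dataclass
class LeadSheet:
    key: str
    measures: list[Measure] = field(default_factory=list)

# A two-measure fragment: a melody over a D chord, then an A chord.
sheet = LeadSheet(key="D", measures=[
    Measure(chord="D", melody=[Note(62, 1.0), Note(66, 1.0), Note(69, 2.0)]),
    Measure(chord="A", melody=[Note(69, 2.0), Note(68, 2.0)]),
])
total_beats = sum(n.duration for m in sheet.measures for n in m.melody)
```

Because each song reduces to a short sequence of symbols like this, a corpus of 5,000 songs can yield tens of thousands of training segments.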
We used deep learning technologies to build our AI, specifically autoregressive, or time-series, models. In a music context, this means the AI predicts a note based on the past sequence of notes. We found that autoregressive models work better for sequential problems such as music than convolution-based deep learning models do.
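To make the autoregressive idea concrete, here is a deliberately tiny count-based next-note predictor. It is not SAM’s deep learning model, just a minimal sketch of the same prediction task: given the notes so far, predict the next one.

```python
from collections import Counter, defaultdict

def train_bigram(sequences):
    """Count how often each pitch follows each preceding pitch."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, context):
    """Autoregressive step: pick the most likely pitch after the last one."""
    last = context[-1]
    if last not in counts:
        return last  # unseen context: fall back to repeating the note
    return counts[last].most_common(1)[0][0]

# Toy training data: two short melodies as MIDI pitch sequences.
melodies = [[60, 62, 64, 62, 60], [60, 62, 64, 65, 64]]
model = train_bigram(melodies)
# The model has seen 64 follow 62 twice, and 60 follow 62 once.
print(predict_next(model, [60, 62]))  # → 64
```

A deep autoregressive model replaces the bigram counts with a learned network, but the generation loop is the same: predict a note, append it to the context, repeat.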
We used an autoencoder architecture to build our AI tools. This allowed us to create music from scratch, or from existing context. For this song, we used “SAM Agent C” to create music from a chord progression, and “SAM Agent N” to create music that is compatible with a given motif.
We observed that our generative AI technology excelled at creating a large quantity of music, but there were two challenges.
First, plagiarism was unavoidable. Plagiarism in this context occurs when a sequence of notes in the generated music exactly matches a sequence of notes in the training data. Depending on the tuning of the hyperparameters, the AI generated more or less plagiarized music. To ensure our AI tool can be used ethically, we designed a plagiarism checker, and songs with detected instances of plagiarism were discarded.
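Exact-match plagiarism of this kind can be detected with an n-gram overlap check: flag a generated piece if any run of n consecutive notes also appears verbatim in the training corpus. The sketch below is our own minimal illustration of that idea, with an assumed window of 8 notes; it is not SAM’s actual checker.

```python
def ngrams(seq, n):
    """All contiguous n-note runs in a pitch sequence."""
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}

def is_plagiarized(generated, training_corpus, n=8):
    """Flag the piece if any n-note run also appears verbatim in training data."""
    training_ngrams = set()
    for seq in training_corpus:
        training_ngrams |= ngrams(seq, n)
    return bool(ngrams(generated, n) & training_ngrams)

corpus = [[60, 62, 64, 65, 67, 69, 71, 72, 74]]
copied = [50, 60, 62, 64, 65, 67, 69, 71, 72]  # contains an 8-note run from the corpus
fresh = [60, 64, 67, 72, 67, 64, 60, 55, 60]

print(is_plagiarized(copied, corpus))  # → True
print(is_plagiarized(fresh, corpus))   # → False
```

The choice of n is the sensitivity knob: a smaller window flags common melodic fragments that any song might share, while a larger one only catches substantial verbatim copying.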
Second, the generated music had a wide range of quality. Because it was very tedious to evaluate the hundreds of generated songs, we designed an AI critic to rank the songs by quality. This way, we only had to evaluate the top songs of each batch.
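The critic’s job reduces to scoring each generated song and surfacing only the top of the batch. The sketch below uses a hand-written heuristic (penalizing large melodic leaps) as a stand-in for the learned critic; the scoring function and its criteria are our own assumptions for illustration.

```python
def critic_score(song):
    """Stand-in for a learned critic: penalize large melodic leaps (> 4 semitones)."""
    leaps = [abs(a - b) for a, b in zip(song, song[1:])]
    return -sum(leap for leap in leaps if leap > 4)

def top_songs(batch, k=2):
    """Rank a batch of generated songs and keep only the best k for human review."""
    return sorted(batch, key=critic_score, reverse=True)[:k]

batch = [
    [60, 62, 64, 65, 67],  # smooth stepwise melody
    [60, 72, 55, 70, 48],  # large leaps everywhere
    [60, 62, 64, 69, 67],  # one moderate leap
]
best = top_songs(batch, k=2)
```

With hundreds of generated songs per batch, ranking like this turns an exhaustive listening session into a quick review of a handful of candidates.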
To create our song “Start It Up,” we first used SAM Agent C to write music using a “1 5 6 4” chord progression. The generated music is the AI’s best interpretation of how to write a hit song using this chord progression.
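A “1 5 6 4” progression names chords by scale degree, so the same progression spells out differently in each key. The short helper below shows the standard diatonic spelling; we use the key of D here because the song is in D, though the helper itself is ours, not part of SAM.

```python
# Diatonic triads of a major key, indexed by scale degree (1-7).
MAJOR_SCALE_STEPS = [0, 2, 4, 5, 7, 9, 11]   # semitones above the key's root
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
QUALITIES = {1: "", 2: "m", 3: "m", 4: "", 5: "", 6: "m", 7: "dim"}

def progression(key_root, degrees):
    """Spell a scale-degree progression (e.g. 1 5 6 4) as chord symbols."""
    root_pc = NOTE_NAMES.index(key_root)
    chords = []
    for d in degrees:
        pc = (root_pc + MAJOR_SCALE_STEPS[d - 1]) % 12
        chords.append(NOTE_NAMES[pc] + QUALITIES[d])
    return chords

print(progression("D", [1, 5, 6, 4]))  # → ['D', 'A', 'Bm', 'G']
```

This D–A–Bm–G spelling is the concrete chord input that a tool like SAM Agent C would condition on.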
We evaluated each of the numerous songs the AI created, and selected one with happy and catchy motifs. We then manually refined the motifs, keeping some of the AI-generated ideas while adding our own. This human-AI collaboration is essential, given the current limitations of AI music technology. The refined motifs from SAM Agent C are used in the Verse of the song.
While using SAM, we noticed that in certain cases, it was difficult to stitch multiple motifs together into the same song. The motifs were incompatible with each other, and made the song incohesive. This problem led to the development of SAM Agent N, which creates music that is compatible with a given motif.
The motif that was used in the Verse is fed to SAM Agent N to generate compatible music. Similar to the process of creating the Verse, we evaluated each song and selected one with a happy and catchy motif. We then manually refined the motif, which is used in the Chorus.
Surprisingly, SAM is not limited to using notes in the Major scale, even though the majority of hit music is written in the Major scale. SAM also has the ability to perform key modulations. In the Verse, SAM created a melody with a b7 note, which gives the melody a bluesy sound. In the Chorus, SAM temporarily modulated to the key of F from the original key of D.
SAM is capable of creating lyrics that best fit the generated songs. These lyrics are intended to be used as placeholder lyrics for the generated music. Much like the motifs, we manually refined the lyrics, and then added to the lyrics as needed.
The emotion and story of this song was inspired by both the motifs and the lyrics generated by SAM. We started with a general idea of a happy pop song, and followed to where the AI led us.
Collaboration as a Team
After the song was written, we then brought the song to life with instrumentation. The vocalist provided her vocals, and all the elements were mixed together into the final music recording.
Because we are a small organization, certain team members needed to fill multiple roles of the music creation process. We are in the process of expanding our team.
Creative Use of AI tools for Musicality
We believe that the process of creating music is a sequential process. First, motifs are written, and then the motifs are arranged into a full song. The song is then taken to production, with added instrumentation, recording, mixing, and mastering.
We believe that tasks at the beginning of the sequence contribute more to the overall song value than tasks towards the end of the sequence. For example, if the motifs were not well written, then the production will not help to improve the song. This is why we chose to focus our AI development on symbolic music, and address tasks at the beginning of the music creation process.
When placing constraints on the AI, we observed a reduction in its capacity to create unique and high-quality music. Although SAM Agent C and Agent N require a set of user-defined constraints, these are merely suggestions for the AI. We allowed the AI to use its imagination to create the best possible song, and to override the constraints as necessary.
While using SAM Agent C, we wanted a happy pop song with a “1 5 6 4” chord progression. However, for the generated AI output that we chose, the AI did not follow the progression strictly. Similarly, we wanted to generate compatible music with SAM Agent N, and ended up choosing an output that modulated to a different key from the given motif.
Therefore, in our opinion, the optimal human-AI interaction involves a human providing suggestions to the AI while allowing it the freedom to explore out-of-the-box options. Once the AI generates an output, the human refines it.
Our team is made up of members with unique backgrounds, such as engineering, music technology, production, and recording.
Wayne Cheng has a background in computer and audio engineering. He has experience working in both the technology and creative industries. Robert Thomas has a background as a music artist and producer. He has experience playing in bands and producing music recordings. We were very happy to work with a professional session musician, Michelle Rescigno, who provided her wonderful vocal talents.
We had concerns about using copyrighted material to train our AI, but concluded that this practice does not violate US copyright law: using copyrighted material for “transformative uses,” such as generative AI, can fall under the Fair Use doctrine.
We put in a lot of effort (months in development time) to ensure that there are zero instances of plagiarism in the generated music. We believe that our plagiarism checker makes our AI songwriting tool one of the most ethical AI music tools ever developed.
As an organization focused on developing AI music technology, we only produce music to showcase our technology. This song, and all songs we have produced in the past, are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
We work very hard to build the most viable and ethical AI tool that music artists can use. We hope that with our technology, more artists will have an opportunity to make a living in an extremely competitive and unbalanced industry.
If you would like more information about our AI music technology, and music we have produced in the past, please visit our website.
Please connect with us on LinkedIn, and let’s grow the AI music community!
About the Author
Wayne Cheng is the founder and machine learning engineer at Audoir. His focus is on the use of generative deep learning to build songwriting tools. Prior to starting Audoir, he worked as a hardware design engineer for Silicon Valley startups, and as an audio engineer for creative organizations. He has an MSEE from UC Davis, and a Music Technology degree from Foothill College.