Generative AI Algorithms to Create Music

Computational Creativity (CC) aims to emulate or replicate human-like creativity using a computer. Its algorithms are used to generate novel and recombined music with aesthetic value. Machine-learning-based intelligence involves synthesizing novel ideas or insights from data. If intelligence can be learned, so can creativity.

Grimes uses AI to augment her music

The paradox in programming creativity is that creativity is open-ended, nonconformist, absurd, and infinitely complex, while AI algorithms instantiate deterministic rules that reduce the scope of the problem. So rules need to be put in place to manage that complexity and guide structure, efficiency, and quality, at the cost of variation, abstractness, and quantity.

Music metacreation means programming a system that creates a novel variety of music. There are many concepts and algorithms for creating music; I discuss some of them below.

A Markov chain is a sequence of states or events wherein the probability of the next state depends only on the current one. So you can compute transition probabilities from one state to another and consolidate them in a matrix. For example, you're playing Mario and the current instance leads to various next instances: you can jump, stay put, run, and so on. Each instance has a probability of occurrence given Mario's current position, player behavior, and game constraints. There would be one sequence of ideal actions, or many, that generates the highest reward (coins, for example).

Similarly, using notes as states, there are sequences of notes that tend to sound better. Transition probabilities learned from existing music identify notes that have a high probability of sounding good together, and this can be used to generate aesthetic melodies. The approach overlooks more intricate patterns between notes, as it only considers the previous note when choosing the current one. It's also prone to reusing patterns from the training data, leading to less original outputs.
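The idea can be sketched in a few lines of Python. The notes and transition probabilities below are invented for illustration, not learned from a real corpus:

```python
import random

# Toy first-order Markov chain over notes: the next note depends only on
# the current one. Notes and probabilities are invented for illustration.
TRANSITIONS = {
    "C": {"D": 0.5, "E": 0.3, "G": 0.2},
    "D": {"E": 0.6, "C": 0.4},
    "E": {"G": 0.5, "D": 0.3, "C": 0.2},
    "G": {"C": 0.7, "E": 0.3},
}

def generate_melody(start="C", length=8, seed=42):
    rng = random.Random(seed)
    melody = [start]
    for _ in range(length - 1):
        choices = TRANSITIONS[melody[-1]]
        notes = list(choices)
        weights = [choices[n] for n in notes]
        melody.append(rng.choices(notes, weights=weights)[0])
    return melody

print(generate_melody())
```

In a real system the transition matrix would be estimated by counting note-to-note transitions in a corpus of existing melodies.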

Generative grammar rules can be used to describe structures of music, just as grammar describes language. A grammar has terminal symbols and variables. The generated language is the set of all strings of terminal symbols that can be obtained by starting from a start symbol and applying any number of rewriting rules in sequence.

A grammar describing musical material can be customized for a particular genre or sound. If you want to create jazz, you may use Steedman's grammar for jazz chord sequences, or you may create your own. You can represent chord sequences as symbols and use AI to generate chord progressions that conform to the grammar rules.
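Here is a minimal sketch of grammar-based generation. The rewriting rules are invented for illustration, loosely in the spirit of a jazz chord grammar (not Steedman's actual rules):

```python
import random

# Uppercase keys in RULES are variables; chord names like "Cmaj7" are
# terminals. Expansion keeps rewriting until only terminals remain.
RULES = {
    "PROG": [["TONIC", "CADENCE"]],
    "TONIC": [["Cmaj7"], ["Cmaj7", "A7"]],
    "CADENCE": [["Dm7", "G7", "Cmaj7"], ["Dm7", "DOMINANT", "Cmaj7"]],
    "DOMINANT": [["G7"], ["Db7"]],  # tritone substitution as an alternative
}

def expand(symbol, rng):
    if symbol not in RULES:          # terminal: an actual chord
        return [symbol]
    production = rng.choice(RULES[symbol])
    out = []
    for sym in production:
        out.extend(expand(sym, rng))
    return out

rng = random.Random(0)
progression = expand("PROG", rng)
print(" ".join(progression))
```

Every string this produces is guaranteed to satisfy the grammar, so the rules act as a hard constraint on what the system can output.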

A related idea is to use pattern-matching algorithms to analyze “signatures,” short musical sequences that define the style being analyzed, and to determine when and how to use those signatures, perhaps embedding their symbolic representations into a grammar. A transition network can then be used to generate music in the style of the composer. This technique usually leads to the reuse of material from the learned corpus in a novel way, and the system can be further programmed to be more original.
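One simple way to find candidate signatures is to look for short note sequences that recur in a corpus, treating repetition as a marker of style. This is only a sketch of the idea, over an invented note sequence:

```python
from collections import Counter

# Count every n-gram of notes and keep the ones that recur: a crude
# stand-in for signature extraction. The corpus is invented.
corpus = ["C", "E", "G", "C", "E", "G", "A", "F", "C", "E", "G"]

def find_signatures(notes, n=3, min_count=2):
    grams = Counter(tuple(notes[i:i + n]) for i in range(len(notes) - n + 1))
    return {gram: count for gram, count in grams.items() if count >= min_count}

print(find_signatures(corpus))
```

Real signature-based systems, such as David Cope's work on style imitation, use far more sophisticated matching, but the core move is the same: isolate recurring fragments and redeploy them.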

Neural nets are data structures that facilitate machine and deep learning. Their design is inspired by the biological neural networks in animals. Neural nets are great at capturing archetypes of melodies and using that information to generate new ones.

Recurrent Neural Networks (RNNs) are used with sequential data where time is important and consequential. They perform the same function for every element of a sequence, with each result depending on previous computations. Timing is fundamental in music, and RNNs encode timing to inform and create new pieces.

Long Short-Term Memory (LSTM) networks are effective RNNs capable of learning long-term patterns. They employ special gates to decide how much information to take from the new input and how much to keep from older inputs. The conceptual space is complex and the outputs nuanced, though they take more time to compute. DeepBach is a creative AI program that generates new chorales in the style of Bach, with multiple simultaneous voices. It does a pretty good job.
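To make the gating concrete, here is a single LSTM cell step written out in plain Python with toy scalar weights. Real networks use trained weight matrices and vectors; the fixed values here are purely illustrative:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    # w maps gate name -> (weight for x, weight for h_prev, bias)
    f = sigmoid(w["f"][0] * x + w["f"][1] * h_prev + w["f"][2])   # forget gate
    i = sigmoid(w["i"][0] * x + w["i"][1] * h_prev + w["i"][2])   # input gate
    o = sigmoid(w["o"][0] * x + w["o"][1] * h_prev + w["o"][2])   # output gate
    g = math.tanh(w["g"][0] * x + w["g"][1] * h_prev + w["g"][2]) # candidate
    c = f * c_prev + i * g    # keep part of the old memory, add part of the new
    h = o * math.tanh(c)      # expose part of the memory as the output
    return h, c

weights = {"f": (0.5, 0.1, 0.0), "i": (0.6, 0.2, 0.0),
           "o": (0.4, 0.1, 0.0), "g": (1.0, 0.3, 0.0)}

h, c = 0.0, 0.0
for x in [0.5, -0.2, 0.8]:    # a tiny "melody" encoded as numbers
    h, c = lstm_step(x, h, c, weights)
print(h, c)
```

The forget and input gates are exactly the mechanism described above: they decide how much is maintained from older inputs and how much is taken from the new one.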

Generative Adversarial Networks (GANs) pit two neural networks against each other to create more robust outputs. One network generates sounds imitating what it has learned from real-world examples, and the other tries to discriminate between real and imitated sounds. As one gets better, the other must also improve to “beat” it (hence “adversarial”). The framework can also be applied to recurrent architectures, leading to more creative results. The black-box nature of neural nets, however, makes the AI's creative process hard to understand.
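The adversarial loop can be shown on a deliberately tiny problem. In this sketch the "real" data is the constant 5.0, the generator is a single parameter theta, and the discriminator is a one-feature logistic classifier; every value and learning rate is invented, and real GANs use deep networks trained with a framework:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

real, a, b, theta, lr = 5.0, 0.0, 0.0, 0.0, 0.1
for _ in range(2000):
    fake = theta
    p_real = sigmoid(a * real + b)   # discriminator's score for real data
    p_fake = sigmoid(a * fake + b)   # discriminator's score for the fake
    # Discriminator ascent on log d(real) + log(1 - d(fake))
    a += lr * ((1 - p_real) * real - p_fake * fake)
    b += lr * ((1 - p_real) - p_fake)
    # Generator ascent on log d(fake): move theta to fool the discriminator
    p_fake = sigmoid(a * fake + b)
    theta += lr * (1 - p_fake) * a

print(theta)
```

As the discriminator learns to separate real from fake, its gradient pulls the generator's output toward the real data, which is the dynamic the paragraph describes.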

Evolutionary algorithms start from a population of random candidate solutions to a problem. They combine these solutions to obtain new ones, selecting the solutions that better answer the problem, so you progressively get closer to an optimal solution. You need a way to represent solutions and a fitness function to gauge how good the generated music is. The music gets better with more iterations of combination, evaluation, and exploration.

Music, like art, is perceived subjectively, so it's hard to prefer one piece over another. Popularity or complexity can be used to approximate an evaluation framework for the created music, and music-theory rules can be used to design the fitness function. Having explicit control, or using humans to evaluate fitness, may yield more creative results. Having an ideal solution may itself be a limitation that constrains the pieces generated.
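A minimal evolutionary sketch, with a hand-written fitness function standing in for music-theory rules; the scale, ranges, and parameters are all illustrative:

```python
import random

rng = random.Random(0)
SCALE = {0, 2, 4, 5, 7, 9, 11}          # pitch classes of C major
LENGTH, POP, GENS = 8, 30, 60

def fitness(melody):
    # Reward notes in the scale and small melodic steps: a crude stand-in
    # for real music-theory rules or human evaluation.
    in_scale = sum(1 for n in melody if n % 12 in SCALE)
    smooth = sum(1 for a, b in zip(melody, melody[1:]) if abs(a - b) <= 2)
    return in_scale + smooth

def random_melody():
    return [rng.randint(60, 72) for _ in range(LENGTH)]   # MIDI pitches

def crossover(a, b):
    cut = rng.randint(1, LENGTH - 1)
    return a[:cut] + b[cut:]

def mutate(melody, rate=0.1):
    return [rng.randint(60, 72) if rng.random() < rate else n for n in melody]

population = [random_melody() for _ in range(POP)]
for _ in range(GENS):
    population.sort(key=fitness, reverse=True)
    parents = population[:POP // 2]                       # selection
    children = [mutate(crossover(rng.choice(parents), rng.choice(parents)))
                for _ in range(POP - len(parents))]
    population = parents + children

best = max(population, key=fitness)
print(best, fitness(best))
```

Swapping the fitness function is where the design choices in the paragraph above live: replace it with a popularity model, a complexity measure, or a human-in-the-loop judgment and the same loop evolves toward a different notion of "good."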

Multi-agent system (MAS): an agent is a representation of a person or entity in an environment. Voyager, for example, uses 64 player agents that generate melodies according to one of various pitch-generation algorithms written by the programmer according to his own taste. He then constructs a behaviour model that describes the general timbre, tempo, pitch range, and other features that regulate the development of the piece. This models a band where everybody improvises but still follows some general agreement.

MASs can be used to emulate social interactions with agents having different emotions, beliefs, idiosyncrasies, and the ability to express them through sound. An idea is to simulate two agents — a composer and an evaluator. The composer uses music theory and programmed beliefs to generate music. The intentions of the two agents are represented by the algorithms implemented to apply and verify the theoretic rules that form their beliefs. Intentional communication between the two results in the creative piece.

Similarly, agents can be programmed based on established beliefs and desires. An agent may channel a specific emotion through its singing and that may affect another agent with its own tastes and inform that agent’s singing. Subjective tastes can be programmed into the agents to generate music with more personality.
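The composer/evaluator idea can be sketched as two tiny agents negotiating a piece. Both rule sets below, a preferred scale for the composer and a leap limit for the evaluator, are invented stand-ins for programmed beliefs and tastes:

```python
import random

rng = random.Random(1)
SCALE = [60, 62, 64, 65, 67, 69, 71, 72]   # C major, one octave (MIDI)

class Composer:
    """Proposes phrases according to its 'beliefs': notes from one scale."""
    def propose(self):
        return [rng.choice(SCALE) for _ in range(4)]

class Evaluator:
    """Accepts or rejects phrases according to its 'taste': no big leaps."""
    def accepts(self, phrase):
        return all(abs(a - b) <= 7 for a, b in zip(phrase, phrase[1:]))

composer, evaluator = Composer(), Evaluator()
piece = []
while len(piece) < 16:                      # negotiate until the piece is done
    phrase = composer.propose()
    if evaluator.accepts(phrase):
        piece.extend(phrase)
print(piece)
```

The finished piece is the product of the interaction: neither agent alone determines it, which is the point of modeling composition as communication between agents.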

Some of the biggest challenges in the domain are the generated music's evaluation, structure, and creativity. It's hard to agree on an objective evaluation framework, the idea that certain music is always better than other music. Often a human is used to judge how creative the music is, which makes the evaluation subjective. It's hard to generate long pieces with varying sections in a musical form; structure could perhaps be programmed into grammar rules. It's hard to define and quantify creativity, and hard to compose pieces with developing narratives and emotions.

AI-generated music is only going to get better. Ideas from human songs can be used to influence AI music and make it sound more human. Understanding and controlling the creative process can generate better results, some of which are already remarkable. This is a song made by AI for Eurovision. I look forward to seeing AI hits on the charts.

More content at plainenglish.io