Text-to-Music Generative AI: Stable Audio, Google's MusicLM and More
Music, an art form that resonates with the human soul, has been a constant companion to us all. Creating music with artificial intelligence began several decades ago. The first attempts were simple, with basic algorithms producing monotonous tunes. As technology advanced, however, so did the complexity and capabilities of AI music generators, paving the way for deep learning and Natural Language Processing (NLP) to play pivotal roles in the field.
Today, platforms such as Spotify use AI to fine-tune their users' listening experiences. Their deep-learning algorithms analyze individual preferences based on musical elements such as tempo and mood to craft personalized song suggestions. They also analyze broader listening patterns and scour the internet for song-related discussions to build detailed song profiles.
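Spotify's production models are not described here. The following is therefore only a minimal sketch of the general idea of recommending tracks from musical features, and it uses a much simpler technique than deep learning: nearest neighbours on min-max-normalized features. The track names, the three features and their values are hypothetical.

```python
import numpy as np

# Hypothetical per-track features: [tempo in BPM, energy 0-1, valence ("mood") 0-1].
# Illustrative values only; not Spotify's data or method.
TRACKS = {
    "calm_piano": [70.0, 0.20, 0.40],
    "upbeat_pop": [120.0, 0.80, 0.90],
    "sad_ballad": [65.0, 0.25, 0.15],
    "dance_track": [128.0, 0.90, 0.80],
}

def normalize(features: np.ndarray) -> np.ndarray:
    """Scale each feature column to [0, 1] so tempo does not dominate the distance."""
    lo, hi = features.min(axis=0), features.max(axis=0)
    return (features - lo) / np.where(hi > lo, hi - lo, 1.0)

def recommend(liked: list[str], k: int = 2) -> list[str]:
    """Rank unheard tracks by Euclidean distance to the listener's average liked track."""
    names = list(TRACKS)
    feats = normalize(np.array([TRACKS[n] for n in names], dtype=float))
    profile = feats[[names.index(n) for n in liked]].mean(axis=0)  # taste profile
    dists = np.linalg.norm(feats - profile, axis=1)
    ranked = [names[i] for i in np.argsort(dists) if names[i] not in liked]
    return ranked[:k]

print(recommend(["calm_piano"], k=1))  # ['sad_ballad']
```

A real system would learn these features, and many more, from audio and listening data instead of hand-coding them.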
The Origin of AI in Music: A Journey from Algorithmic Composition to Generative Modeling
In the early days of AI in music, from the 1950s to the 1970s, the focus was primarily on algorithmic composition, a method in which computers follow a defined set of rules to create music. The first notable work of this period was the Illiac Suite for String Quartet (1957). It used the Monte Carlo method, which draws random numbers to propose pitches and rhythms and keeps those that fit the constraints of traditional music theory and statistical probabilities.
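The generate-and-test idea behind the Illiac Suite fits in a few lines: draw a random candidate note, keep it if it satisfies every rule, and otherwise draw again. The rules below (start on the tonic, no leap wider than a perfect fifth, no pitch three times in a row) and the C-major pitch set are toy assumptions for illustration. They are not the rules Hiller and Isaacson actually encoded.

```python
import random

# C-major scale from C4 to C5 as MIDI note numbers (illustrative pitch set).
C_MAJOR = [60, 62, 64, 65, 67, 69, 71, 72]

def acceptable(melody: list[int], candidate: int) -> bool:
    """Toy stand-ins for 'rules of music theory' (assumptions, not the Illiac Suite's rules)."""
    if not melody:
        return candidate == 60  # start on the tonic
    if abs(candidate - melody[-1]) > 7:
        return False  # no leap wider than a perfect fifth
    if len(melody) >= 2 and candidate == melody[-1] == melody[-2]:
        return False  # no pitch three times in a row
    return True

def generate(length: int = 8, seed: int = 0, max_tries: int = 1000) -> list[int]:
    rng = random.Random(seed)
    melody: list[int] = []
    while len(melody) < length:
        for _ in range(max_tries):
            candidate = rng.choice(C_MAJOR)  # random proposal (the Monte Carlo step)
            if acceptable(melody, candidate):  # keep it if it passes every rule
                melody.append(candidate)
                break
        else:
            raise RuntimeError("no acceptable note found")
    return melody

print(generate(8, seed=1))
```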
Around the same time, another pioneer, Iannis Xenakis, used stochastic processes, which are governed by random probability distributions, to compose music. Working with computers and the FORTRAN language, he linked multiple probability functions to create patterns in which different graphical representations corresponded to diverse sound spaces.
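Here is a rough illustration of composing with probability distributions. The sketch draws note onsets from a Poisson process (exponentially distributed gaps between events), pitches from a uniform distribution and durations from an exponential distribution. The distributions and parameters are assumptions chosen for clarity; this is not a reconstruction of Xenakis's FORTRAN programs.

```python
import random

def stochastic_events(duration_s: float, density: float, seed: int = 0):
    """Generate (onset_s, midi_pitch, length_s) events from probability distributions.

    Onsets form a Poisson process with `density` events per second, so the gaps
    between them are exponential with mean 1/density. Distribution choices are
    illustrative assumptions.
    """
    rng = random.Random(seed)
    events = []
    t = rng.expovariate(density)
    while t < duration_s:
        pitch = rng.randint(36, 84)  # uniform over MIDI notes C2..C6
        length = rng.expovariate(2.0)  # exponential note length, mean 0.5 s
        events.append((round(t, 3), pitch, round(length, 3)))
        t += rng.expovariate(density)  # exponential inter-onset gap
    return events

for event in stochastic_events(duration_s=5.0, density=3.0, seed=7):
    print(event)
```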
The Complexity of Translating Text into Music
Music is a rich, multi-dimensional form of data that encompasses melody, harmony, rhythm, and tempo, which makes translating text into music highly complex. A typical song is represented in a computer by millions of numbers, significantly more than most other data formats such as images or text.
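To put the size of raw audio in perspective, here is the arithmetic for uncompressed audio. It assumes CD-quality PCM (44.1 kHz sample rate, two channels) and, for comparison, an uncompressed 1024×1024 RGB image. The duration and image size are illustrative choices.

```python
def sample_count(seconds: float, sample_rate: int = 44_100, channels: int = 2) -> int:
    """Number of amplitude values needed to store uncompressed PCM audio."""
    return int(seconds * sample_rate * channels)

def image_values(width: int, height: int, channels: int = 3) -> int:
    """Number of values in an uncompressed RGB image."""
    return width * height * channels

song = sample_count(3 * 60)  # a three-minute stereo song at CD quality
photo = image_values(1024, 1024)  # a 1024x1024 RGB image
print(song, photo, round(song / photo, 1))  # 15876000 3145728 5.0
```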
The field of audio generation is seeing innovative approaches to the challenge of creating realistic sound. One method generates a spectrogram and then converts it back into audio.
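The spectrogram route has one well-known catch. A magnitude spectrogram discards phase, so turning it back into audio requires estimating that phase. A classic way to do this is the Griffin-Lim algorithm, which alternates between imposing the target magnitudes and re-estimating phase from a consistent signal. The NumPy sketch below assumes a Hann window, 512-sample frames and a 128-sample hop. It is a generic illustration, not the vocoder any particular product uses.

```python
import numpy as np

N_FFT = 512  # frame length in samples (illustrative choice)
HOP = 128  # hop size in samples (75% overlap)

def stft(x: np.ndarray) -> np.ndarray:
    """Complex STFT with a Hann window; shape (n_frames, N_FFT // 2 + 1)."""
    win = np.hanning(N_FFT)
    n_frames = 1 + (len(x) - N_FFT) // HOP
    frames = np.stack([x[i * HOP : i * HOP + N_FFT] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec: np.ndarray, length: int) -> np.ndarray:
    """Least-squares inverse STFT via windowed overlap-add."""
    win = np.hanning(N_FFT)
    frames = np.fft.irfft(spec, n=N_FFT, axis=1)
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        start = i * HOP
        out[start : start + N_FFT] += frame * win
        norm[start : start + N_FFT] += win ** 2
    covered = norm > 1e-8  # skip samples no window touches
    out[covered] /= norm[covered]
    return out

def griffin_lim(magnitude: np.ndarray, length: int, n_iter: int = 50, seed: int = 0) -> np.ndarray:
    """Estimate a waveform whose STFT magnitude approximates `magnitude`."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))  # random initial phase
    for _ in range(n_iter):
        audio = istft(magnitude * phase, length)  # impose the target magnitude
        phase = np.exp(1j * np.angle(stft(audio)))  # keep only the re-estimated phase
    return istft(magnitude * phase, length)

if __name__ == "__main__":
    sr = 16_000
    t = np.arange(sr) / sr
    tone = 0.5 * np.sin(2 * np.pi * 440 * t)  # one second of A4
    mag = np.abs(stft(tone))  # stands in for a model-generated spectrogram
    audio = griffin_lim(mag, len(tone))
```

Production systems often replace Griffin-Lim with a neural vocoder, because iterative phase estimation can leave audible artifacts.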
Another strategy uses a symbolic representation of music, such as sheet music, which musicians can read and play. This approach has been carried over successfully into the digital world: tools like Magenta's Chamber Ensemble Generator produce music in the MIDI format, a protocol that lets computers and electronic musical instruments communicate.
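To make the symbolic approach concrete, the sketch below writes a tiny Standard MIDI File by hand using only the Python standard library. The file has a format-0 header, one track of note-on/note-off events with variable-length delta times, and an end-of-track marker. The three-note arpeggio, the 480 ticks-per-beat resolution and the file name are illustrative choices.

```python
import struct

def vlq(value: int) -> bytes:
    """Encode a non-negative int as a MIDI variable-length quantity (7 bits per byte)."""
    out = [value & 0x7F]
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)  # continuation bit on all but the last byte
        value >>= 7
    return bytes(reversed(out))

def midi_file(notes, ticks_per_beat: int = 480, velocity: int = 64) -> bytes:
    """Build a single-track (format 0) Standard MIDI File.

    `notes` is a list of (midi_pitch, duration_in_beats), played one after another.
    """
    track = bytearray()
    for pitch, beats in notes:
        track += vlq(0) + bytes([0x90, pitch, velocity])  # note on, channel 1
        track += vlq(int(beats * ticks_per_beat)) + bytes([0x80, pitch, 0])  # note off
    track += vlq(0) + b"\xff\x2f\x00"  # end-of-track meta event
    header = b"MThd" + struct.pack(">IHHH", 6, 0, 1, ticks_per_beat)
    return header + b"MTrk" + struct.pack(">I", len(track)) + bytes(track)

data = midi_file([(60, 1), (64, 1), (67, 2)])  # C, E, G

if __name__ == "__main__":
    with open("arpeggio.mid", "wb") as f:
        f.write(data)
```

Because the file stores notes rather than sound, any synthesizer or notation program can render or edit it.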
While these approaches have advanced the field, they come with their own set of limitations, underscoring the complex nature of audio generation.
Transformer-based autoregressive models and U-Net-based diffusion models are at the forefront of the technology, producing state-of-the-art (SOTA) results in generating audio, text, music, and much more. OpenAI's GPT series and almost all other current LLMs are transformer-based.
Google's MusicLM
Google's MusicLM was released in May 2023. MusicLM can generate high-fidelity music that matches the sentiment described in a text prompt. Using hierarchical sequence-to-sequence modeling, it turns text descriptions into 24 kHz music that remains consistent over extended durations.
The model is not limited to text input; it can also be conditioned on melodies. This means it can take a hummed or whistled melody and transform it according to the style described in a text caption.
Technical Insights
MusicLM builds on AudioLM, a framework introduced in 2022 for audio generation. AudioLM treats audio synthesis as a language-modeling task within a discrete representation space, using a hierarchy of coarse-to-fine discrete audio units, also known as tokens. This approach delivers high fidelity and long-term coherence over substantial durations.
MusicLM extends AudioLM with text conditioning, which aligns the generated audio with the nuances of the input text. It does this through a shared embedding space created by MuLan, a joint music-text model trained to project music and its corresponding text descriptions close to each other. Because MuLan's audio embeddings can stand in for text during training, MusicLM needs no captions at that stage and can be trained on massive audio-only corpora.
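MuLan's joint embedding space is learned with a contrastive objective: matching audio and text should score higher than mismatched pairs in the same batch. The sketch below is a generic symmetric InfoNCE loss in NumPy. The temperature, the embedding sizes and the exact formulation are assumptions and may differ from what MuLan actually uses.

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def log_softmax(logits: np.ndarray, axis: int) -> np.ndarray:
    m = logits.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    return logits - m - np.log(np.exp(logits - m).sum(axis=axis, keepdims=True))

def contrastive_loss(audio_emb: np.ndarray, text_emb: np.ndarray, temperature: float = 0.1) -> float:
    """Symmetric InfoNCE: row i of each matrix is a matching audio/text pair."""
    a, t = l2_normalize(audio_emb), l2_normalize(text_emb)
    logits = a @ t.T / temperature  # scaled cosine similarity of every audio/text pair
    idx = np.arange(len(a))
    loss_a2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # each clip picks its caption
    loss_t2a = -log_softmax(logits, axis=0)[idx, idx].mean()  # each caption picks its clip
    return float((loss_a2t + loss_t2a) / 2)

rng = np.random.default_rng(0)
text = rng.normal(size=(8, 16))
audio = text + 0.05 * rng.normal(size=(8, 16))  # toy "well-aligned" embeddings
print(contrastive_loss(audio, text), contrastive_loss(np.roll(audio, 1, axis=0), text))
```

Once audio and its descriptions sit near each other, the text embedding of a prompt can replace the audio embedding that was used during training.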
MusicLM also uses SoundStream as its audio tokenizer. SoundStream can reconstruct 24 kHz music at 6 kbps with impressive fidelity, using residual vector quantization (RVQ) for efficient, high-quality audio compression.
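Residual vector quantization can be pictured as a stack of codebooks, where each stage quantizes whatever error the previous stages left behind. In the sketch below, the codebooks are fit with plain k-means on random data. SoundStream instead learns its codebooks jointly with a neural encoder and decoder, so treat this only as an illustration of the RVQ idea. The `bitrate` helper shows how the token budget translates into kbps. It assumes a 50 Hz frame rate, 12 quantizers and 1,024-entry codebooks, which is the configuration we understand the MusicLM paper to describe: 50 × 12 × log2(1024) = 6,000 bits per second, or 6 kbps.

```python
import numpy as np

def nearest(x: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Index of the closest codebook vector for each row of x."""
    return ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(-1).argmin(axis=1)

def kmeans(x: np.ndarray, k: int, iters: int = 20, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = nearest(x, centroids)  # assign vectors to closest centroid
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old centroid if the cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids

def train_rvq(x: np.ndarray, n_quantizers: int, codebook_size: int, seed: int = 0):
    codebooks, residual = [], x.copy()
    for q in range(n_quantizers):
        cb = kmeans(residual, codebook_size, seed=seed + q)
        codebooks.append(cb)
        residual = residual - cb[nearest(residual, cb)]  # next stage models what is left
    return codebooks

def rvq_encode(x: np.ndarray, codebooks) -> np.ndarray:
    codes, residual = [], x.copy()
    for cb in codebooks:
        idx = nearest(residual, cb)
        codes.append(idx)
        residual = residual - cb[idx]
    return np.stack(codes, axis=1)  # shape (n_vectors, n_quantizers)

def rvq_decode(codes: np.ndarray, codebooks) -> np.ndarray:
    return sum(cb[codes[:, q]] for q, cb in enumerate(codebooks))

def bitrate(frame_rate_hz: float, n_quantizers: int, codebook_size: int) -> float:
    return frame_rate_hz * n_quantizers * np.log2(codebook_size)

print(bitrate(50, 12, 1024))  # 6000.0
```

Dropping the later codebooks gives a coarser but still usable reconstruction. This coarse-to-fine structure is what hierarchical token models exploit.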
Melody conditioning extends these capabilities further: even a simple hummed tune can serve as the foundation of a piece rendered in the style described by the text.
The developers of MusicLM have also open-sourced MusicCaps, a dataset of about 5.5k music-text pairs, each with a rich text description written by human experts. MusicCaps is available on Hugging Face.
Ready to create AI soundtracks with Google's MusicLM? Here's how to get started:
Below are a few example prompts I experimented with:
"Meditative song, calming and soothing, with flutes and guitars. The music is slow, with a focus on creating a sense of peace and tranquility."
"jazz with saxophone"
In a qualitative evaluation against previous SOTA models such as Riffusion and Mubert, participants preferred MusicLM, rating its 10-second clips as a better match for the text captions.
Stable Audio
In September 2023, Stability AI introduced "Stable Audio", a latent diffusion model conditioned on text metadata as well as on audio file duration and start time. Like Google's MusicLM, this approach offers control over the content and length of the generated audio, allowing clips of a specified length up to the training window size.
Technical Insights
Stable Audio comprises several components, including a Variational Autoencoder (VAE) and a U-Net-based conditioned diffusion model, which work together with a text encoder.
The VAE speeds up generation and training by compressing stereo audio into a compact, noise-resistant, invertible (though lossy) latent encoding, so the model does not have to work with raw audio samples.
The text encoder, derived from a CLAP model, captures the relationships between words and sounds and provides an informative representation of the tokenized input text. Stable Audio takes text features from the penultimate layer of the CLAP text encoder and feeds them into the diffusion U-Net through cross-attention layers.
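Cross-attention is the mechanism that lets text features steer the U-Net. Positions inside the network act as queries, and the conditioning tokens provide keys and values. Below is a single-head NumPy sketch with hypothetical dimensions and random projection matrices. A real U-Net block uses learned weights, multiple heads, an output projection and a residual connection.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(latent_feats, cond_tokens, w_q, w_k, w_v):
    """Single-head cross-attention.

    latent_feats: (n_positions, d_model)  features inside the U-Net (queries)
    cond_tokens:  (n_tokens, d_cond)      text-encoder features (keys and values)
    """
    q = latent_feats @ w_q  # (n_positions, d_head)
    k = cond_tokens @ w_k  # (n_tokens, d_head)
    v = cond_tokens @ w_v  # (n_tokens, d_head)
    scores = q @ k.T / np.sqrt(q.shape[-1])  # relevance of each token to each position
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v, weights  # conditioned features and the attention map

rng = np.random.default_rng(0)
latent = rng.normal(size=(32, 16))  # hypothetical: 32 latent positions, width 16
text_tokens = rng.normal(size=(12, 24))  # hypothetical: 12 prompt tokens, width 24
out, attn = cross_attention(latent, text_tokens,
                            rng.normal(size=(16, 8)), rng.normal(size=(24, 8)), rng.normal(size=(24, 8)))
print(out.shape, attn.shape)  # (32, 8) (32, 12)
```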
Stable Audio also uses timing embeddings, which are calculated from two properties: the start second of the audio chunk and the total duration of the original audio file. These values are translated into per-second discrete learned embeddings, combined with the prompt tokens, and fed into the U-Net's cross-attention layers, which lets users specify the overall length of the output audio.
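The timing-conditioning idea can be sketched as two lookup tables, one indexed by the start second and one by the total duration. Their rows are appended to the prompt tokens before cross-attention. The table size, embedding width, random initialization and function names below are assumptions; in the real model these embeddings are learned.

```python
import numpy as np

MAX_SECONDS = 512  # hypothetical upper bound on representable seconds
EMBED_DIM = 64  # hypothetical embedding width

rng = np.random.default_rng(0)
# Learned lookup tables in a real model; random here for illustration.
start_table = rng.normal(size=(MAX_SECONDS, EMBED_DIM))
total_table = rng.normal(size=(MAX_SECONDS, EMBED_DIM))

def timing_tokens(seconds_start: float, seconds_total: float) -> np.ndarray:
    """Map the two timing values to discrete per-second embeddings, shape (2, EMBED_DIM)."""
    s = min(int(seconds_start), MAX_SECONDS - 1)
    t = min(int(seconds_total), MAX_SECONDS - 1)
    return np.stack([start_table[s], total_table[t]])

def conditioning_sequence(prompt_tokens: np.ndarray, seconds_start: float, seconds_total: float) -> np.ndarray:
    """Append the timing tokens to the prompt tokens that feed cross-attention."""
    return np.concatenate([prompt_tokens, timing_tokens(seconds_start, seconds_total)], axis=0)

prompt = rng.normal(size=(10, EMBED_DIM))  # stand-in for encoded prompt tokens
print(conditioning_sequence(prompt, 14.0, 95.0).shape)  # (12, 64)
```

During training, a chunk cut from 14 seconds into a 95-second file would receive the tokens for (14, 95). At generation time, requesting a 30-second clip would roughly correspond to (0, 30).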
Stable Audio was trained on a dataset of more than 800,000 audio files, provided through a collaboration with stock music provider AudioSparx.
Stable Audio offers a free version, allowing 20 generations of up to 20-second tracks per month, and a $12/month Pro plan, permitting 500 generations of up to 90-second tracks.
Below is the prompt for an audio clip I created using Stable Audio.
"Cinematic, Soundtrack Gentle Rainfall, Ambient, Soothing, Distant Dogs Barking, Calming Leaf Rustle, Subtle Wind, 40 BPM"
Such finely crafted audio has a wide range of applications. Filmmakers can use this technology to create rich, immersive soundscapes, and advertisers can use tailored audio tracks in their campaigns. The tool also opens up avenues for individual creators and artists to experiment and innovate, crafting sound pieces that tell stories, evoke emotions, and create atmospheres with a depth that was previously hard to achieve without a substantial budget or technical expertise.
Prompting Tips
Craft the perfect audio using text prompts. Here's a quick guide to get you started:
In this article, we have explored AI-generated music and audio, from early algorithmic compositions to today's sophisticated generative AI frameworks such as Google's MusicLM and Stability AI's Stable Audio. These technologies, built on deep learning and SOTA compression models, not only enhance music generation but also fine-tune listeners' experiences.
Yet the domain is still evolving. Hurdles such as maintaining long-term coherence, along with the ongoing debate over the authenticity of AI-crafted music, continue to challenge its pioneers. Just a week ago, the buzz was about an AI-crafted song channeling the styles of Drake and The Weeknd, which had first caught fire online earlier this year. The song was later removed from Grammy consideration, highlighting the ongoing debate over the legitimacy of AI-generated music in the industry. As AI continues to bridge the gap between music and listeners, it is fostering an ecosystem where technology coexists with art, encouraging innovation while respecting tradition.
AI: Is The Intelligence Artificial Or Amplified?
Mark Heymann, Managing Partner, Mark Heymann & Assoc. HFTP Hall of Fame; BA Economics, Brown Univ.; MS Business, Columbia Univ.
In today's environment, there's barely a day that goes by when there isn't some discussion or article written about the latest in artificial intelligence. It's a very exciting time as we look at what computers can accomplish with or without human intervention.
To take half a step back and make sure we share a common vocabulary for the discussion that follows, I will highlight four key areas of what is called artificial intelligence.
• Machine Learning: This is a process by which a system uses the information it gathers to learn how to parse data. Based on this historical information, it makes predictions about what is going to happen in the future.
• Deep Learning: This refers to a machine learning approach that utilizes artificial neural networks, employing multiple layers of processing to progressively extract more advanced features from data.
• Natural Language Processing: Natural language processing (NLP) employs machine learning techniques to unveil the underlying structure and significance within textual content. Through NLP applications, businesses can analyze text data and gain insights about individuals, locations and events, enabling a deeper comprehension of social media sentiment and customer interactions.
• Cognitive Computing: Cognitive computing pertains to technology frameworks that, in a general sense, draw from the scientific domains of artificial intelligence and signal processing. These frameworks encompass a range of technologies, including machine learning, logical reasoning, natural language processing, speech recognition, visual object recognition, human-computer interaction, as well as dialog and narrative generation, among other capabilities. There is currently no agreed-on definition for cognitive computing in the industry or academia.
Computers And Decision Making
My intent here is not to rehash a group of definitions. With this as a baseline, though, I want to turn specifically to decision making and how much involvement computers should have in this process.
I think where the final decision should lie depends not just on the impact of a decision on the business but also on the risk profile of its outcome. Further, when that decision is assessed and reviewed, who will be held accountable for the result? Discussions of artificial intelligence rarely focus on this question.
More than 40 years ago, we developed some initial technology to help hotels predict revenue center activity. These forecasts accounted not only for daily room occupancy but also for the anticipated number of guests at other facilities, such as restaurants and bars. The process resembled the familiar task of forecasting widget production to match demand while avoiding significant excess inventory.
The approach at that time was what we now commonly call machine learning. Over time, these technologies and algorithms have evolved to now fall more into the category of deep learning. But at the end of the day, regardless of any computer-generated predictions, it was still up to the manager of the specific revenue center or production environment to make the final decision on projected volume.
Once that decision was made, one of the key areas influenced by these projections was staffing levels. This pertained not only to daily staffing but, in the service industry, often extended to staffing levels in half-hour increments as needed.
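The forecasting-and-staffing workflow described above can be sketched as follows. As a deliberately simple stand-in for the machine learning models the author describes, the forecast is the average of the same half-hour slot over past comparable days. The covers-per-server standard and all numbers are hypothetical. The final step reflects the article's point that the manager, not the system, makes the final call.

```python
import math

def forecast(history: list[list[int]]) -> list[float]:
    """Average each half-hour slot across past comparable days (e.g., the last four Mondays)."""
    n_days = len(history)
    return [sum(day[slot] for day in history) / n_days for slot in range(len(history[0]))]

def staff_needed(forecast_covers: list[float], covers_per_server: int = 12, minimum: int = 1) -> list[int]:
    """Convert forecast guests per half-hour into server headcount.

    covers_per_server is a hypothetical productivity standard.
    """
    return [max(minimum, math.ceil(c / covers_per_server)) for c in forecast_covers]

def apply_manager_overrides(plan: list[int], overrides: dict[int, int]) -> list[int]:
    """The manager has the final say: replace any slot they choose to adjust."""
    return [overrides.get(i, n) for i, n in enumerate(plan)]

# Breakfast guests per half-hour (7:00-9:00) on the last four Mondays (made-up numbers).
past_mondays = [
    [10, 24, 38, 20],
    [12, 28, 40, 18],
    [8, 22, 36, 22],
    [10, 26, 42, 20],
]
covers = forecast(past_mondays)  # [10.0, 25.0, 39.0, 20.0]
plan = staff_needed(covers)  # [1, 3, 4, 2]
final = apply_manager_overrides(plan, {2: 5})  # e.g., a tour group is in house
print(final)  # [1, 3, 5, 2]
```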
As systems have advanced and the scope of data analysis has expanded, the accuracy of predictions has consistently improved. However, it remains a rarity for the manager overseeing this specific aspect of the operation to be fully removed from the final predictions, which encompass staffing and cost levels that will be incurred.
Where Human Intervention Is Needed
Turning now to the broader economy and the places where AI is being tried, we see examples of systems that run with no human intervention whatsoever. At times, it is clear that human intervention is absolutely needed.
Consider, for example, trading systems within the stock market. In such systems, human intervention has proven critical in preventing excessively wide market fluctuations. This is just one area, but I'm sure if you take a moment to sit back and think about other areas where computers are making decisions based on some level of AI, you'll find many more examples of where human intervention is still crucial.
The Business Impact Of Decisions
As we look at the application of what is broadly called artificial intelligence, it becomes more and more important to understand the risk impact of specific decisions on business results. Simply put, the larger the impact of a decision on an operation, the more important it is to ensure that the decision is not left completely to the computer.
If the decision to be made carries a very low risk of business failure and/or a very low cost of failure, then it's easy to turn to the computer for the determination.
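One simple way to formalize this rule is to compare the expected cost of a wrong automated decision (probability of failure times cost of failure) against a risk budget the business is willing to absorb. The decisions, probabilities, costs and budget below are hypothetical placeholders, not figures from the article.

```python
from dataclasses import dataclass

@dataclass
class Decision:
    name: str
    p_failure: float  # estimated probability that the automated call is wrong
    cost_of_failure: float  # estimated direct + indirect cost if it is wrong

def route(decision: Decision, risk_budget: float) -> str:
    """Decide who makes the call, based on the expected loss of a wrong decision."""
    expected_loss = decision.p_failure * decision.cost_of_failure
    return "automate" if expected_loss <= risk_budget else "manager review"

for d in [
    Decision("reorder napkins", p_failure=0.05, cost_of_failure=200.0),
    Decision("weekend breakfast staffing", p_failure=0.10, cost_of_failure=5_000.0),
]:
    print(d.name, "->", route(d, risk_budget=100.0))
# reorder napkins -> automate
# weekend breakfast staffing -> manager review
```

Accountability still sits with management in both cases; the rule only decides how much the computer's recommendation is trusted without review.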
We all remember when Deep Blue played chess and, at first, suffered defeat. As it was refined, however, it went on to win chess matches, sparking our excitement about the computer's capabilities. Nevertheless, it's important to recognize that winning a chess game, which has little real-world consequence, is quite different from decisions such as estimating the demand for breakfast service or predicting the number of travelers heading to Chicago.
The cost of getting that number wrong, including its direct and indirect impacts on other revenue centers, can be significant.
Therefore, I believe it benefits us to understand the consequences of the decisions being made, as well as the associated costs and risks of potential failures. This understanding can guide us in determining the appropriate level of management involvement in making the final decision. Final accountability for decision making in key areas needs to remain with management, especially when the cost of failure is high.
Over time, computer information and interpretation will become more important and enlightening. But as we look for accountability in management decisions, we may want to think more about AI being defined as "amplified" intelligence as compared to purely "artificial."
Forbes Technology Council is an invitation-only community for world-class CIOs, CTOs and technology executives.
10 Best AI Writing Tools 2023
Artificial intelligence writing tools are designed to help busy professionals produce high-quality content in a fraction of the time it would take to write manually.
These AI writing tools leverage artificial intelligence algorithms, natural language processing, and machine learning techniques to generate human-like text.
We compiled a list of the best AI writing software, including each tool's features, pricing, strengths and weaknesses.
Top AI writing tools: Comparison chart
Vendor | Best for | Built-in plagiarism checker | Grammar checker | Free plan | Starting price
Copy.ai | Beating writer's block | No | No | Yes | $49 per month, or $36 per month billed annually
Rytr | Copywriters | Yes | No | Yes | $9 per month, or $90 per year
QuillBot | Students and academics | Yes | Yes | Yes | $9.95 per month
Frase.io | SEO teams and content managers | No | Yes | No | $14.99 per user per month, or $12.66 per user per month billed annually
Anyword | Blog writing | Yes | Yes | No | $49 per user per month, or $39 per user per month billed yearly
Grammarly | Grammatical and punctuation error detection | Yes | Yes | Yes | $12 per month billed annually, or $30 billed monthly
Hemingway Editor | Content readability measurement | No | Yes | Yes | Free
Writesonic | Freelancers and social media marketers | No | No | Yes | $19 per user per month billed monthly, or $15 per user per month billed yearly
AI Writer | High-output bloggers | No | No | No | $19 per user per month
ContentatScale.ai | Creating long-form content | No | No | No | $250 per month
Copy.ai: Best for beating writer's block
Copy.ai is an artificial intelligence writing tool designed for freelance writers, marketers, business owners, and copywriters to create various forms of content, including website copy, sales landing pages, emails, social media bios, and blog sections.
The tool lets you define a brand voice, which helps it generate copy that aligns with your organization's persona.
Rytr: Best for copywriters
Rytr is an AI-powered writing assistant capable of producing content on various topics.
The platform supports 40 use cases, including blog ideas, emails, job descriptions, blog section writing and more. You can also create your own use cases by training Rytr for your specific needs, although only paid-plan users can create custom use cases.
QuillBot: Best for students and academics
QuillBot is an AI-powered writing assistant that paraphrases and summarizes texts and also works as a translator and citation generator.
If you need a quality paraphrasing tool, I recommend QuillBot, especially for content marketers. Students and academics, however, may be wary of the tool because its output doesn't consistently pass AI detection tools.
Frase.io: Best for SEO teams and content managers
Frase.io is an AI writing assistant designed to help you generate content, improve grammar and spelling, get suggestions for better writing, and even optimize for search engines (SEO). It can also help beginners brainstorm ideas and improve their writing skills.
Pricing
Unlike the other tools reviewed above, Frase doesn't offer a free plan, and its trial costs $1 for five days.
Anyword: Best for blog writing
Anyword is an AI writing tool that uses machine learning algorithms to generate content. It can assist with writing tasks such as creating ad copy, crafting social media posts, generating blog content, and more.
The tool's Copy Intelligence feature analyzes your previously published content to determine which messaging works best on your website, ads, social, and email channels.
Grammarly: Best for grammatical and punctuation error detection
Grammarly is a popular online writing assistant that can help you improve your written communication by checking for grammatical and spelling mistakes and offering suggestions for enhancing clarity, conciseness, and style.
It can be used in various contexts, such as writing emails, reports, essays, and social media posts. Grammarly is available as a browser extension, a desktop application, or a mobile app, and it can be used for free with limited features or through a premium subscription with additional functionality.
Hemingway Editor: Best for content readability measurement
Hemingway Editor is a writing tool that helps you enhance the clarity and readability of your written work. It analyzes text and provides various readability suggestions. It highlights lengthy, complex sentences, excessive adverbs, passive voice, and hard-to-read phrases.
It assigns a readability score based on the grade level required to understand the text. It is available both as a web-based application and as a desktop app.
Pricing
The platform is free to use.
Writesonic: Best for freelancers and social media marketers
Writesonic uses artificial intelligence technology, specifically natural language processing (NLP), to provide content generation services.
Writesonic's AI can generate text based on prompts and user input, making it useful for content marketing, copywriting, and other writing-related tasks. While it can be a helpful resource for content creation, the quality of generated content may vary, and human editing is often required to ensure accuracy and coherence.
AI Writer: Best for high-output bloggers
AI Writer is designed to generate full-length articles in minutes.
The platform lets you tailor the AI's writing to your specific needs, either by selecting from a long list of recommended keywords for your topic or by entering your own keywords. It also suggests sub-topics for your article and helps you structure your content with headings. It also cites its sources.
ContentatScale.ai: Best for creating long-form content
Those looking for an AI writing tool to create blog posts and other long-form content may find ContentatScale's features suitable.
The platform claims that its output passes AI detection tests, meaning the generated content mimics human writing and is not easily distinguishable from human-written content. ContentatScale also offers an AI detector, which ranks among the best in our review of AI detector tools.
If you'd like to check ContentatScale's claim that its content passes AI detection tests, see our review of AI detector tools.
The best AI writing tool for your business depends on your AI writing needs:
If you need a paraphraser, QuillBot and Writesonic may be the best for you.
Some factors to consider when choosing the best AI writing software for your business:
These top tools can be used together to produce better-quality content. For instance, you can use QuillBot to paraphrase Anyword's AI-generated text, use Grammarly to correct spelling and punctuation errors, and use the Hemingway App to improve readability.
How We Evaluated the Best AI Writing Tools
We weighed the tools across five categories; each category has subcategories that helped us evaluate and compare the AI writing tools.
Cost – 20%
We examined the different pricing plans offered by each AI writing tool. This included evaluating the cost of the tool on a monthly or annual basis, as well as any additional fees or hidden costs. We compared each tool's cost to its value, looking for tools that offer a high level of functionality for a reasonable price.
Feature set – 30%
We assessed the writing capabilities of each tool, including its grammar and spelling correction, sentence rephrasing, and content generation capabilities. We looked for tools that provided accurate and high-quality writing suggestions.
Ease of use – 10%
We looked for tools with an intuitive, user-friendly interface that lets users navigate and use the tool's features easily.
Quality of output – 25%
We evaluated the accuracy and coherence of the content generated by each AI writing tool. Tools that could generate clear, well-structured, and error-free content received higher scores.
Support – 15%
We assessed the availability and responsiveness of customer support channels, such as email, live chat, or phone support. Prompt and helpful customer support is essential for users who may encounter issues or need assistance with the tool. We also considered the availability of resources and documentation, such as user guides, tutorials, or knowledge bases.
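The category weights above add up to 100%, so a tool's overall score can be computed as a weighted average of its category scores. The 0-5 scale and the example scores below are hypothetical; the review does not publish per-tool numbers here.

```python
# Category weights from the review's methodology.
WEIGHTS = {
    "cost": 0.20,
    "feature_set": 0.30,
    "ease_of_use": 0.10,
    "output_quality": 0.25,
    "support": 0.15,
}
if abs(sum(WEIGHTS.values()) - 1.0) > 1e-9:
    raise ValueError("weights must sum to 100%")

def overall_score(scores: dict[str, float]) -> float:
    """Weighted average of per-category scores (hypothetical 0-5 scale)."""
    return sum(weight * scores[category] for category, weight in WEIGHTS.items())

example_tool = {"cost": 4, "feature_set": 5, "ease_of_use": 3, "output_quality": 4, "support": 2}
print(round(overall_score(example_tool), 2))  # 3.9
```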
Bottom Line: Top AI Writing Tools
Our review identified the best AI writing tools to help with your writing needs. While the above choices are excellent for now, please check back for updates, as this market is constantly changing.
Remember, too, that although these tools are solid, they may not fit everyone perfectly. Ultimately, even with a leading AI writing tool, the best results come from a skilled human writer working with the help of an AI writing tool. That's not likely to change anytime soon.