How to Turn Speech to Music

Erik L.A. Olsen

One of the most interesting potential applications for music perception theory is how it applies to the musicality of human speech. A number of researchers have examined the presence of 12-tone musical intervals in natural Western speech, and as a result we have a great script for the open-source sound analysis program Praat. The script is called Prosogram. It can do many things, but here I will focus on using it to generate a list of the pitches of every syllable in a given recording. From that list you can make music.

Download the software

Download and install Praat for Windows or Mac.

Download Prosogram and make sure you extract the folder to a place where you can access it easily. In order to read the data output from Prosogram, you'll need to download and install Ghostscript and GSview. Also, if you're using Windows and don't have an extraction program, 7-Zip will do the trick.

Download MuseScore.

If you don't have a recording/audio editing program, Audacity is good.

Next - Pick a speech sample

You could really use any recording of speech, but some will work better than others. As a general rule, avoid .mp3s. This is mostly because MP3 compression is lossy, so most .mp3s don't hold enough information to analyze properly; .wav or .flac (Free Lossless Audio Codec) files will be your best bet. Ideally, you should have a recording with a sampling rate of at least 44.1 kHz. For my project, I chose an old recording, Martin Luther King's "I Have a Dream" speech. The file is in .flac format, but the sample rate is lower than optimal since it was recorded in 1963 at a rally.

Processing the recording with Prosogram
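Before running Prosogram, it can help to confirm what sample rate a recording actually has. The sketch below is one way to check a .wav file with Python's standard `wave` module. The function names are made up for illustration, and it assumes an uncompressed PCM .wav file (a .flac file would first need to be converted, for example by exporting it from Audacity).

```python
import wave


def wav_info(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM .wav file."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        frames = w.getnframes()
        return rate, w.getnchannels(), frames / rate


def is_rate_ok(rate_hz, minimum_hz=44100):
    """True if the sample rate meets the 44.1 kHz guideline from the text."""
    return rate_hz >= minimum_hz


if __name__ == "__main__":
    import sys

    # Usage: python check_wav.py my_recording.wav
    if len(sys.argv) > 1:
        rate, channels, seconds = wav_info(sys.argv[1])
        print(f"{rate} Hz, {channels} channel(s), {seconds:.1f} s")
        if not is_rate_ok(rate):
            print("Below 44.1 kHz: it may still work, but expect rougher results.")
```

A recording below the guideline is not necessarily unusable; the check only tells you what you are working with.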

Open Praat.

Go to the Praat menu.

Select "Open Praat script". That should open a file browser.

Find the Prosogram folder that you placed in an accessible location.

Open it and select "prosogram.praat".

That should make a script window pop up.

Press Ctrl+R to run the script.

That should bring you to the settings page.

Click on the "Task:" button where it says "prosogram and prosodic profile", and select "interactive prosogram" from the list.

Click "Save intermediate data" so that the box is checked.

Clear the first text box at the top, where it says "C:/corpus/*.wav".

Click "OK".

That should open a file browser. From there, select your recording file and Prosogram will process it.

Processing the Data

Prosogram will have added a handful of objects to your Praat window.

From here you can alter the sound file in playback, record the playback using Audacity, view rough pitch contours, etc.

What you're going to want to focus on are "Pitch <filename>" and "PitchTier stylization".

If you select "Pitch <filename>" and then click "View & Edit", an editor window opens showing the pitch data. When played, this gives you the recorded pitches for the spoken portion of the entire recording in a series of little computerized snippets of voice. This is one base from which you can turn the speech to music.

You can also use "PitchTier stylization". When you select it, you will get the following options:

"Play pulses" plays the bare audio in a tinny, computerized whining sound.

"Hum" plays a legato version of the noises you heard from "Pitch <filename>", a sort of synthesized voice.

"Play sine" plays the pure sine wave sound of each pitch in the recording.

"Modify", "Synthesize", and "Convert" all change the sound in a range of different ways. Experiment to find out!
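To get a feel for what a sine-tone rendering of a list of pitches amounts to outside Praat, here is a minimal sketch that writes one pure sine tone per frequency to a .wav file using only Python's standard library. It is not how Praat implements "Play sine"; the function name, the fixed tone length, and the short fade are all assumptions made for illustration.

```python
import math
import struct
import wave


def write_sine_sequence(path, freqs_hz, seconds_each=0.25, rate=44100, volume=0.5):
    """Write one sine tone per frequency, back to back, to a 16-bit mono WAV.

    Returns the total number of audio frames written.
    """
    freqs = list(freqs_hz)
    n = int(rate * seconds_each)   # samples per tone
    fade = max(1, n // 20)         # short linear fade in/out to avoid clicks
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)          # 2 bytes = 16-bit samples
        w.setframerate(rate)
        for f in freqs:
            samples = []
            for i in range(n):
                env = min(1.0, i / fade, (n - 1 - i) / fade)
                value = volume * env * math.sin(2 * math.pi * f * i / rate)
                samples.append(int(value * 32767))
            w.writeframes(struct.pack("<%dh" % n, *samples))
    return n * len(freqs)


if __name__ == "__main__":
    # Three example pitches in Hz (A3, C#4, E4); replace with your own list.
    write_sine_sequence("sine_sketch.wav", [220.0, 277.18, 329.63])
```

Every tone here has the same length, so the rhythm of the speech is lost; the point is only to hear the pitches on their own.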

Whichever method of observing the pitch changes you choose, once you've decided which is best (probably by trying them all), you can record it through Audacity.

Once you've recorded it, you can work it into a piece of sheet music in MuseScore.

Method 2 (Harder but more accurate... and potentially cooler?)

The only initial difference is that instead of selecting "interactive prosogram", you select "prosogram and prosodic profile". You still check the box that says "Save intermediate data" and still click "OK". The difference this time around is that the processing time will be longer, and the folder where you kept the recording will now have a whole slew of analysis readouts in it. Among these readouts will be a text file spreadsheet that you can import into a program like Excel, R, or Calc (I used Calc). All you have to do is set it as separated by tabs. This spreadsheet has a lot of information in it, the full list of which is in the Prosogram user's guide. Among that information is the frequency of every syllable in the recording. With this info, you can find the 12-TET note of every syllable by using a reference list or an algorithm. I did not have the programming know-how to set up a frequency-to-note generator for a collection of frequencies, so I did it by hand: I went through a list of the frequencies of notes and translated every frequency to a note name.
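The frequency-to-note lookup described above can also be automated. The sketch below assumes standard tuning (A4 = 440 Hz) and uses the usual 12-TET relationship, where a frequency f lies 12 * log2(f / 440) semitones away from A4. The function names and the column name passed to `notes_from_table` are hypothetical: check the Prosogram user's guide for the real header of the column that holds each syllable's frequency in Hz.

```python
import csv
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def freq_to_note(freq_hz, a4_hz=440.0):
    """Return (note_name, cents_off) for the nearest 12-TET note."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    # Fractional MIDI note number; A4 is MIDI note 69.
    midi_exact = 69 + 12 * math.log2(freq_hz / a4_hz)
    midi = round(midi_exact)
    cents = round((midi_exact - midi) * 100)  # how sharp (+) or flat (-) it is
    name = NOTE_NAMES[midi % 12] + str(midi // 12 - 1)
    return name, cents


def notes_from_table(lines, column):
    """Yield (freq_hz, note_name, cents_off) for each row of a tab-separated table.

    `lines` is any iterable of text lines, such as an open file.
    `column` is the header of the frequency column (an assumption; see above).
    Rows without a positive, finite number in that column are skipped.
    """
    for row in csv.DictReader(lines, delimiter="\t"):
        try:
            freq = float(row[column])
        except (KeyError, TypeError, ValueError):
            continue
        if math.isfinite(freq) and freq > 0:
            yield (freq, *freq_to_note(freq))


if __name__ == "__main__":
    for f in (110.0, 196.0, 261.63):
        print(f, *freq_to_note(f))
```

To run it over a real readout, open the file with `open(path, newline="")` and pass the file object to `notes_from_table` along with the right column header. The cents value is worth keeping: speech rarely lands exactly on a 12-TET pitch, and it shows how far each syllable was rounded.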

Then, if you look in your base folder, there will be a series of PostScript files. If you save them as PDFs, you can annotate them while listening to the recording to weed out the false positives in the prosogram, and then combine them into a single document.

After that, you can attach a note to each syllable by referencing the spreadsheet, and either use the PDF as a graphical notation or transcribe it to MuseScore.
