Thursday, February 20, 2014

VexFlow Goodies

As announced on the VexFlow mailing list, here are some fun things you might enjoy.

VexTab Audio Player

The VexTab audio player, which My VexFlow uses to play notated music, is now available in the VexTab repository (player.coffee). The player is based on MIDI.js and requires a set of SoundFont instruments.

Tablature with Rhythm Stems

VexFlow now supports tablature with rhythm information thanks to Cyril Silverman.

The stems work well with modifiers such as dots and articulations (fermatas, strokes, etc.).

Vex Fretboard

The code to render fretboard diagrams is now available on GitHub. It lets you render guitar and bass fretboards like these:

VexWarp

VexWarp is a web audio experiment I've been working on that lets you slow down, pitch-shift, and loop your audio and video files, all in the browser. There's also a Chrome plugin for offline access. At its core is a JavaScript implementation of a phase vocoder variant.

That's all, folks!

Tuesday, January 15, 2013

Hear your Music!

My VexFlow now lets you hear what you write. Click the play button on the demo below to hear for yourself.

To enable playback, all you need to do is add the line options player=true to the beginning of your VexTab notation. Here's a bebop example:

See A Jazz Blues Progression for more musical examples.

To see the rest of the new features for this release, go to New in VexFlow.

Wednesday, January 09, 2013

New in VexFlow!

It's been a year since I've posted anything, so how about some news?

Lots of new features in VexFlow and VexTab.

My VexFlow

We launched My VexFlow, a content publishing platform for musicians based on VexFlow. The platform allows musicians, music teachers, and students to create, publish, and share content with beautiful music notation, tablature, chord charts, diagrams, etc.

Here are some strawmen that I created to show what My VexFlow can do:

You can also embed content from My VexFlow into your own sites and blogs, just like you would a YouTube video.

Here's an example.

The above notation was created on My VexFlow, and embedded into this blog post.

A New VexTab

VexTab was overhauled once again, this time rewritten in CoffeeScript and Jison. The code is much cleaner, more flexible, and more manageable.

The new VexTab is no longer just for guitarists -- it's got note/octave syntax in addition to the fret/string syntax. It also supports almost all VexFlow features, such as tuplets, rests, articulations, score symbols and modifiers, etc.

It's also moved to its own repository: github.com/0xfe/vextab

Guitar and Bass Fretboards

My VexFlow has support for guitar fretboards, which you can customize and use for descriptive diagrams.

Slash Notation

Render chord charts with rhythmic hints using slash notation. This is also handy for some drum notation.

Lots More!

This isn't all. There are score annotations, tuplets, stave annotations, etc. etc. See the complete list at: New in My VexFlow.

Experiment and enjoy!

Monday, January 02, 2012

More K-Means Clustering Experiments on Images

I spent a little more time experimenting with k-means clustering on images and realized that I could use these clusters to recolor the image in interesting ways.

I wrote the function save_recolor to replace pixels from the given clusters (replacements) with new ones of equal intensity, as specified by the rgb_factors vector. For example, the following code will convert pixels of the first two clusters to greyscale.

> save_recolor("baby.jpeg", "baby_new.jpg", replacements=c(1,2),
               rgb_factors=c(1/3, 1/3, 1/3))

It's greyscale because the rgb_factors vector distributes the pixel intensity evenly among the three channels. A factor of c(20/100, 60/100, 20/100) would instead assign 60% of each pixel's intensity to the green channel, tinting the cluster's pixels green.

Let's get to some examples. Here's an unprocessed image, alongside its color clusters. I picked k=10. You can set k by specifying the palette_size parameter to save_recolor.

Here's what happens when I remove the red (the first cluster).

> save_recolor("baby.jpeg", "baby_new.jpg", replacements=1)

In the next image, I keep the red, and remove everything else.

> save_recolor("baby.jpeg", "baby_new.jpg", replacements=2:10)

Below, I replace the red cluster pixels with green ones of corresponding intensity.

> save_recolor("baby.jpeg", "baby_new.jpg", replacements=1,
               rgb_factors=c(10/100, 80/100, 10/100))

And this is a fun one: Get rid of everything, keep just the grass.

> save_recolor("baby.jpeg", "baby_new.jpg", replacements=c(1,3:10))

I tried this on various images, using different cluster sizes, replacements, and RGB factors, with lots of interesting results. Anyhow, you should experiment with this yourselves and let me know what you find.

I should point out that nothing here is novel -- it's all well known in image processing circles. Still, it's pretty impressive what you can do when you apply simple machine learning algorithms to other areas.

Okay, as in all my posts, the code is available in my GitHub repository:

https://github.com/0xfe/experiments/blob/master/r/recolor.rscript

Happy new year!

Saturday, December 31, 2011

K-Means Clustering and Art

Cross posted from Google+.

My coworker at Google, Tony Rippy, has for a while been working on a fascinating problem. Take all the pixels of a photograph, and rearrange them so that the final image looks like an artist's palette -- something to which you can take a paintbrush and recreate the original image.

He's got some really good looking solutions which he might post if you ask him nicely. :-)

This turns out to be a tricky problem, and it's hard to come up with an objective measure of the quality of any given solution. In fact, the quality is very subjective.

Anyhow, while studying the K-means clustering algorithm from ml-class, it struck me that k-means could be used to help extract a small palette of colors from an image. For example, by using each of the RGB channels as features, and Euclidean distance as the similarity metric, one could run stock k-means to generate clusters of similar colors.

I coded up a quick R script to test this and got some interesting results. Here is an example of an image with its potential palette. Recall that the second image is simply the first image with the pixels rearranged.

I experimented with various values of k (number of clusters) for the different images. It turns out that it's pretty hard to algorithmically pre-determine this number (although various techniques do exist). The water villa pic above has 15 clusters, the nursery pic below has 20, and the cartoon has 6.

Note that this is only one subproblem of the original one; there is also the subproblem of placement, which I skirted around by simply arranging the colors in vertical bands across the final image. I'm pretty sure no artist's palette looks like this.

Also, these palettes aren't very "clean". Since the original pictures themselves are noisy, some of this noise arbitrarily creeps into the various clusters. Working with a filtered version of the picture would be cheating, so we won't do that. But we might be able to extract the noisy pixels, put them in a special cluster, and run k-means on the remaining pixels.

Okay, enough talk. Here's the code: https://github.com/0xfe/experiments/blob/master/r/palette.rscript

First, install the cclust and ReadImages packages from CRAN, then try out the algorithm in an R console:

> source('/path/to/palette.rscript')
> plot_palette('/path/to/some/image.jpg')

This will produce a plot with the original image and the transformed one next to each other, like the attached pics below. It uses 10 clusters by default, for a palette of 10 colors. You can change this by passing the cluster count as the second parameter to plot_palette.

> plot_palette('/path/to/some/image.jpg', 20)

That's all, folks!

Sunday, August 21, 2011

A Web Audio Spectrum Analyzer

In my last post, I went over some of the basics of the Web Audio API and showed you how to generate sine waves of various frequencies and amplitudes. We were introduced to some key Web Audio classes, such as AudioContext, AudioNode, and JavaScriptAudioNode.

This time, I'm going to take things a little further and build a realtime spectrum analyzer with Web Audio and HTML5 Canvas. The final product plays a remote music file, and displays the frequency spectrum overlaid with a time-domain graph.

The demo is here: JavaScript Spectrum Analyzer. The code for the demo is in my GitHub repository.

The New Classes

In this post we introduce three new Web Audio classes: AudioBuffer, AudioBufferSourceNode, and RealtimeAnalyserNode.

An AudioBuffer represents an in-memory audio asset. It is usually used to store short audio clips and can contain multiple channels.

An AudioBufferSourceNode is a specialization of AudioNode that serves audio from AudioBuffers.

A RealtimeAnalyserNode is an AudioNode that returns time- and frequency-domain analysis information in real time.

The Plumbing

To begin, we need to acquire some audio. The API supports a number of different formats, including MP3 and raw PCM-encoded audio. In our demo, we retrieve a remote audio asset (an MP3 file) using AJAX, and use it to populate a new AudioBuffer. This is implemented in the RemoteAudioPlayer class (js/remoteaudioplayer.js) like so:

RemoteAudioPlayer.prototype.load = function(callback) {
  var request = new XMLHttpRequest();
  var that = this;
  request.open("GET", this.url, true);
  request.responseType = "arraybuffer";
  request.onload = function() {
    that.buffer = that.context.createBuffer(request.response, true);
    that.reload();
    callback(request.response);
  }

  request.send();
}
Notice that jQuery's AJAX calls aren't used here. This is because jQuery does not support the arraybuffer response type, which is required for loading binary data from the server. The AudioBuffer is created with the AudioContext's createBuffer function. The second parameter, true, tells it to mix down all the channels to a single mono channel.

The AudioBuffer is then provided to an AudioBufferSourceNode, which will be the context's audio source. This source node is then connected to a RealtimeAnalyserNode, which in turn is connected to the context's destination, i.e., the computer's output device.

var source_node = context.createBufferSource();
source_node.buffer = audio_buffer;

var analyzer = context.createAnalyser();
analyzer.fftSize = 2048; // 2048-point FFT
source_node.connect(analyzer);
analyzer.connect(context.destination);
To start playing the music, call the noteOn method of the source node. noteOn takes one parameter: a timestamp indicating when to start playing. If set to 0, it plays immediately. To start playing the music 0.5 seconds from now, you can use context.currentTime to get the reference point.
// Play music 0.5 seconds from now
source_node.noteOn(context.currentTime + 0.5);
It's also worth noting that we set the granularity of the FFT to 2048 via the analyzer.fftSize property. For those unfamiliar with DSP theory, this breaks the frequency spectrum of the audio into discrete points, each representing the magnitude of one frequency bin.
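To get a feel for what that granularity means, here's a rough back-of-the-envelope sketch (assuming a 44100 Hz context; these variables are for illustration only and aren't part of the demo):

var sample_rate = 44100;                          // assumed sample rate of the context
var bin_width_hz = sample_rate / analyzer.fftSize; // 44100 / 2048 ~= 21.5 Hz per bin
var usable_bins = analyzer.frequencyBinCount;      // fftSize / 2 = 1024 usable bins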

The Pretty Graphs

Okay, it's now all wired up -- how do I get the pretty graphs? The general strategy is to poll the analyzer every few milliseconds (e.g., with window.setInterval), request the time- or frequency-domain data, and then render it onto an HTML5 Canvas element. The analyzer exports a few different methods to access the analysis data: getFloatFrequencyData, getByteFrequencyData, and getByteTimeDomainData. Each of these methods populates a given typed array with the appropriate analysis data.

In the snippet below, we schedule an update() function every 50ms. It breaks the frequency-domain data points into 30 bins and renders a bar representing the average magnitude of the points in each bin.

var canvas = document.getElementById(canvas_id);
var canvas_context = canvas.getContext("2d");

function update() {
  // This graph has 30 bars.
  var num_bars = 30;

  // Get the frequency-domain data.
  var data = new Uint8Array(analyzer.frequencyBinCount);
  analyzer.getByteFrequencyData(data);

  // Clear the canvas.
  canvas_context.clearRect(0, 0, canvas.width, canvas.height);

  // Break the samples up into bins.
  var bin_size = Math.floor(data.length / num_bars);
  for (var i = 0; i < num_bars; ++i) {
    var sum = 0;
    for (var j = 0; j < bin_size; ++j) {
      sum += data[(i * bin_size) + j];
    }

    // Calculate the average magnitude of the samples in the bin.
    var average = sum / bin_size;

    // Draw the bar for this bin on the canvas.
    var bar_width = canvas.width / num_bars;
    var scaled_average = (average / 256) * canvas.height;

    canvas_context.fillRect(i * bar_width, canvas.height, bar_width - 2,
                            -scaled_average);
  }
}

// Render every 50ms.
window.setInterval(update, 50);

// Start the music.
source_node.noteOn(0);
A similar strategy can be employed for time-domain data, except for a few minor differences: time-domain data is usually rendered as waves, so you might want to use a lot more bins and plot pixels instead of drawing bars. The code that renders the time- and frequency-domain graphs in the demo is encapsulated in the SpectrumBox class in js/spectrum.js.
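For illustration, a bare-bones time-domain loop might look something like this (a sketch, not the actual SpectrumBox code; it assumes the same analyzer, canvas, and canvas_context variables as above):

function update_wave() {
  // Get the time-domain samples; each byte is a sample centered around 128.
  var data = new Uint8Array(analyzer.frequencyBinCount);
  analyzer.getByteTimeDomainData(data);

  canvas_context.clearRect(0, 0, canvas.width, canvas.height);

  // Plot one pixel per sample instead of drawing bars.
  for (var i = 0; i < data.length; ++i) {
    var x = (i / data.length) * canvas.width;
    var y = (data[i] / 256) * canvas.height;
    canvas_context.fillRect(x, y, 1, 1);
  }
}

window.setInterval(update_wave, 50);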

The Minutiae

I glossed over a number of things in this post, mostly with respect to the details of the demo. You can learn it all from the source code, but here's a summary for the impatient:

The graphs are actually two HTML5 Canvas elements overlaid using CSS absolute positioning. Each element is driven by its own SpectrumBox instance: one displays the frequency spectrum, the other the time-domain wave.

The routing of the nodes is done in the onclick handler of the #play button -- it takes the AudioBufferSourceNode from the RemoteAudioPlayer, routes it to the node of the frequency analyzer, routes that to the node of the time-domain analyzer, and then finally to the destination.
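In code, that routing amounts to something like the sketch below. The variable names here are illustrative, not the actual ones in the demo (see js/spectrum.js for the real thing); freq_analyzer and time_analyzer are assumed to be two analyser nodes created with context.createAnalyser().

document.getElementById("play").onclick = function() {
  // Wire up: source -> frequency analyzer -> time-domain analyzer -> speakers.
  source_node.connect(freq_analyzer);
  freq_analyzer.connect(time_analyzer);
  time_analyzer.connect(context.destination);

  // Start playback immediately.
  source_node.noteOn(0);
};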

Bonus: Another Demo

That's all, folks! You now have the know-how to build yourself a fancy new graphical spectrum analyzer. If all you want to do is play with the waves and stare at the graphs, check out my other demo: The Web Audio Tone Analyzer (source). This is really just the same spectrum analyzer from the first demo, connected to the tone generator from the last post.

References

As a reminder, all the code for my posts is available at my GitHub repository: github.com/0xfe.

The audio track used in the demo is a discarded take of Who-Da-Man, which I recorded with my previous band Captain Starr many many years ago.

Finally, don't forget to read the Web Audio API draft specification for more information.

Enjoy!

Saturday, August 13, 2011

Generating Tones with the Web Audio API

The Web Audio API is a W3C draft standard interface for building in-browser audio applications. Although the draft is well specified, it is almost impossible to find useful documentation on building applications with it.

In my quest to better understand HTML5 audio, I spent some time figuring out how the API works, and decided to write up this quick tutorial on doing useful things with it.

We will build a sine wave tone-generator entirely in JavaScript. The final product looks like this: Web Audio Tone Generator.

The full code is available in my GitHub repository: https://github.com/0xfe/experiments/tree/master/www/tone

Caveats

The Web Audio API is a draft, is likely to change, and does not work on all browsers. Right now, only the latest versions of Chrome and Safari support it.

Onwards We Go

Getting started making sounds with the Web Audio API is straightforward so long as you take the time to study the plumbing, most of which exists to allow for real-time audio processing and synthesis. The complete specification is available on the W3C Web Audio API page, and I'd strongly recommend that you read it thoroughly if you're interested in building advanced applications with the API.

To produce any form of sound, you need an AudioContext and a few AudioNodes. The AudioContext is sort of like an environment for audio processing -- it's where various attributes such as the sample rate, the clock status, and other environment-global state reside. Most applications will need no more than a single instance of AudioContext.

The AudioNode is probably the most important component in the API, and is responsible for synthesizing or processing audio. An AudioNode instance can be an input source, an output destination, or a mid-stream processor. These nodes can be linked together to form processing pipelines to render a complete audio stream.

One kind of AudioNode is JavaScriptAudioNode, which is used to generate sounds in JavaScript. This is what we will use in this tutorial to build a tone generator.

Let us begin by instantiating an AudioContext and creating a JavaScriptAudioNode.

var context = new webkitAudioContext();
var node = context.createJavaScriptNode(1024, 1, 1);
The parameters to createJavaScriptNode refer to the buffer size, the number of input channels, and the number of output channels. The buffer size must be in units of sample frames, i.e., one of: 256, 512, 1024, 2048, 4096, 8192, or 16384. It controls the frequency of callbacks asking for a buffer refill. Smaller sizes allow for lower latency, and larger sizes for better overall quality.

We're going to use the JavaScriptNode as the source node along with a bit of code to create sine waves. To actually hear anything, it must be connected to an output node. It turns out that context.destination gives us just that -- a node that maps to the speaker on your machine.

The SineWave Class

To start off our tone generator, we create a SineWave class, which wraps the AudioNode and wave generation logic into one cohesive package. This class will be responsible for creating the JavaScriptNode instances, generating the sine waves, and managing the connection to the destination node.

SineWave = function(context) {
  var that = this;
  this.x = 0; // Initial sample number
  this.context = context;
  this.node = context.createJavaScriptNode(1024, 1, 1);
  this.node.onaudioprocess = function(e) { that.process(e) };
}

SineWave.prototype.process = function(e) {
  var data = e.outputBuffer.getChannelData(0);
  for (var i = 0; i < data.length; ++i) {
    data[i] = Math.sin(this.x++);
  }
}

SineWave.prototype.play = function() {
  this.node.connect(this.context.destination);
}

SineWave.prototype.pause = function() {
  this.node.disconnect();
}
Upon instantiation, this class creates a JavaScriptAudioNode and attaches an event handler to onaudioprocess for buffer refills. The event handler requests a reference to the output buffer for the first channel, and fills it with a sine wave. Notice that the handler does not know the buffer size in advance, and gets it from data.length.

The buffer is a Float32Array, one of the JavaScript Typed Arrays. These arrays allow for high-throughput processing of raw binary data. To learn more about Typed Arrays, check out the Mozilla Developer Documentation on Typed Arrays.
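As a quick illustration of typed arrays (this snippet is not part of the tone generator itself):

// A Float32Array is a fixed-length array of 32-bit floats backed by raw memory.
var samples = new Float32Array(1024); // 1024 samples, all initialized to 0
samples[0] = 0.5;                     // elements read and write as plain numbers
var count = samples.length;           // 1024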

To try out a quick demo of the SineWave class, add the following code to the onload handler for your page:

var context = new webkitAudioContext();
var sinewave = new SineWave(context);
sinewave.play();
Notice that sinewave.play() works by wiring up the node to the AudioContext's destination (the speakers). To stop the tone, call sinewave.pause(), which unplugs this connection.

Generating Specific Tones

So, now you have yourself a tone. Are we done yet?

Not quite. How does one know what the frequency of the generated wave is? How does one generate tones of arbitrary frequencies?

To answer these questions, we must find out the sample rate of the audio. Each data value we stuff into the buffer in our handler is a sample, and the sample rate is the number of samples processed per second. We can calculate the frequency of the tone by dividing the sample rate by the length (in samples) of a full wave cycle.

How do we get the sample rate? Via the getSampleRate() method of AudioContext. On my machine, the default sample rate is 44.1KHz, i.e., 44100 samples per second. This means that the frequency of the generated tone in our above code is:
freq = context.getSampleRate() / (2 * Math.PI)
That's about 7KHz. Ouch! Let's use our newfound knowledge to generate less spine-curdling tones. To generate a tone of a specific frequency, you can change SineWave.process to:
SineWave.prototype.process = function(e) {
  var data = e.outputBuffer.getChannelData(0);
  for (var i = 0; i < data.length; ++i) {
    data[i] = Math.sin(this.x++ / (this.sample_rate / (2 * Math.PI * this.frequency)));
  }
}
Also make sure you add the following two lines to SineWave's constructor:
this.sample_rate = this.context.getSampleRate();
this.frequency = 440;
This initializes the frequency to pitch standard A440, i.e., the A above middle C.
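As a quick sanity check on that divisor (a sketch, not part of the class):

// At a 44100 Hz sample rate, a 440 Hz wave completes one cycle every
// sample_rate / frequency ~= 100.2 samples. Dividing x by
// sample_rate / (2 * PI * frequency) makes the argument to sin() advance
// by a full 2 * PI over exactly that many samples.
var samples_per_cycle = 44100 / 440; // ~100.2 samples per cycle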

The Theremin Effect

Now that we can generate tones of arbitrary frequencies, it's only natural that we connect our class to some sort of slider widget so we can experience the entire spectrum right in our browsers. Turns out that jQuery UI already has such a slider, leaving us only a little plumbing to do.

We add a setter function to our SineWave class, and call it from our slider widget's change handler.

SineWave.prototype.setFrequency = function(freq) {
  this.frequency = freq;
}
A jQuery UI snippet would look like this:
$("#slider").slider({
    value: 440,
    min: 1,
    max: 2048,
    slide: function(event, ui) { sinewave.setFrequency(ui.value); }
});

Going up to Eleven

Adding support for volume is straightforward. Add an amplitude member to the SineWave constructor along with a setter method, just like we did for frequency, and change SineWave.process to:

SineWave.prototype.process = function(e) {
  var data = e.outputBuffer.getChannelData(0);
  for (var i = 0; i < data.length; ++i) {
    data[i] = this.amplitude * Math.sin(this.x++ / (this.sample_rate / (2 * Math.PI * this.frequency)));
  }
}
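For completeness, the constructor and setter additions mentioned above might look like this (a sketch; the member and method names are assumed to mirror the frequency ones):

// In SineWave's constructor:
this.amplitude = 1.0; // full volume

// And the setter, which a volume slider's change handler can call:
SineWave.prototype.setAmplitude = function(amplitude) {
  this.amplitude = amplitude;
}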
Folks, we now have a full-fledged sine wave generator!

Boo Hiss Crackle

But, we're not done yet. You've probably noticed that changing the frequency causes mildly annoying crackling sounds. This happens because when the frequency changes, a discontinuity occurs in the wave, causing a high-frequency pop in the audio stream.

Discontinuity when Changing Frequencies

We try to eliminate the discontinuity by only shifting frequencies when the cycle of the previous frequency completes, i.e., the sample value is (approximately) zero. (There are better ways to do this, e.g., windowing, LPFs, etc., but these techniques are out of the scope of this tutorial.)

Although this complicates the code a little bit, waiting for the cycle to end significantly reduces the noise upon frequency shifts.

SineWave.prototype.setFrequency = function(freq) {
  this.next_frequency = freq;
}

SineWave.prototype.process = function(e) {
  // Get a reference to the output buffer and fill it up.
  var data = e.outputBuffer.getChannelData(0);

  // We need to be careful about filling up the entire buffer and not
  // overflowing.
  for (var i = 0; i < data.length; ++i) {
    data[i] = this.amplitude * Math.sin(
        this.x++ / (this.sample_rate / (this.frequency * 2 * Math.PI)));

    // This reduces high-frequency blips while switching frequencies. It works
    // by waiting for the sine wave to hit 0 (on its way to positive territory)
    // before switching frequencies. (next_frequency is unset until the first
    // call to setFrequency.)
    if (this.next_frequency && this.next_frequency != this.frequency) {
      // Figure out what the next point is.
      var next_data = this.amplitude * Math.sin(
          this.x / (this.sample_rate / (this.frequency * 2 * Math.PI)));

      // If the current point approximates 0, and the direction is positive,
      // switch frequencies.
      if (data[i] < 0.001 && data[i] > -0.001 && data[i] < next_data) {
        this.frequency = this.next_frequency;
        this.x = 0;
      }
    }
  }
}

The End

As mentioned in the beginning of this tutorial, a demo of the full code is available at http://0xfe.muthanna.com/tone/ and the entire source code is available at https://github.com/0xfe/experiments/tree/master/www/tone.

Do check out the W3C Web Audio API specification, and the Mozilla documentation on JavaScript Typed Arrays.

Comments, criticism, and error reports welcome. Enjoy!

Thursday, June 30, 2011

I'm on Google+

And I'm quite digging it.

Yes, it isn't lost on me that as a Googler, I'm biased.

You can add me to your circles -- I'm right here.

To make up for this post's utter lack of content, I give you a haiku:

in my machine world,
all the digits of my toes,
are ones and zeroes.

I should stick to what I know.