Saturday, December 31, 2011

K-Means Clustering and Art

Cross-posted from Google+.

My coworker at Google, Tony Rippy, has for a while been working on a fascinating problem. Take all the pixels of a photograph, and rearrange them so that the final image looks like an artist's palette -- something to which you can take a paintbrush and recreate the original image.

He's got some really good looking solutions which he might post if you ask him nicely. :-)

This turns out to be a tricky problem, and it's hard to come up with an objective measure of the quality of any given solution. In fact, the quality is largely subjective.

Anyhow, while studying the k-means clustering algorithm from ml-class, it struck me that k-means could be used to help extract a small palette of colors from an image. For example, by using each of the RGB channels as features, and Euclidean distance as the similarity metric, one could run stock k-means to generate clusters of similar colors.
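If the mapping isn't obvious, here's a rough sketch of the idea in browser JavaScript -- just an illustration of the algorithm, not the R script linked below. The image element, the iteration cap, and the random initialization are all arbitrary choices:

// Sketch: extract a k-color palette from a loaded <img> element with plain k-means.
function extractPalette(img, k) {
  // Pull the raw RGB pixels out of the image via a scratch canvas.
  var canvas = document.createElement("canvas");
  canvas.width = img.width;
  canvas.height = img.height;
  var ctx = canvas.getContext("2d");
  ctx.drawImage(img, 0, 0);
  var data = ctx.getImageData(0, 0, img.width, img.height).data;

  var pixels = [];
  for (var i = 0; i < data.length; i += 4) {
    pixels.push([data[i], data[i + 1], data[i + 2]]); // the R, G, B features
  }

  // Initialize the k centroids with randomly chosen pixels.
  var centroids = [];
  for (var c = 0; c < k; ++c) {
    centroids.push(pixels[Math.floor(Math.random() * pixels.length)].slice());
  }

  // Standard k-means: assign each pixel to its nearest centroid (squared
  // Euclidean distance in RGB space), then move each centroid to the mean
  // of its cluster. A fixed iteration count keeps the sketch simple.
  for (var iter = 0; iter < 10; ++iter) {
    var sums = [];
    var counts = [];
    for (c = 0; c < k; ++c) { sums.push([0, 0, 0]); counts.push(0); }

    for (i = 0; i < pixels.length; ++i) {
      var best = 0, best_dist = Infinity;
      for (c = 0; c < k; ++c) {
        var dr = pixels[i][0] - centroids[c][0];
        var dg = pixels[i][1] - centroids[c][1];
        var db = pixels[i][2] - centroids[c][2];
        var dist = dr * dr + dg * dg + db * db;
        if (dist < best_dist) { best_dist = dist; best = c; }
      }
      sums[best][0] += pixels[i][0];
      sums[best][1] += pixels[i][1];
      sums[best][2] += pixels[i][2];
      counts[best]++;
    }

    for (c = 0; c < k; ++c) {
      if (counts[c] > 0) {
        centroids[c] = [sums[c][0] / counts[c],
                        sums[c][1] / counts[c],
                        sums[c][2] / counts[c]];
      }
    }
  }

  return centroids; // each centroid is one palette color
}

Each returned centroid is one palette color, and k plays the same role as the cluster count passed to plot_palette further down.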

I coded up a quick R script to test this and got some interesting results. Here is an example of an image with its potential palette. Recall that the second image is simply the first image with the pixels rearranged.

I experimented with various values of k (number of clusters) for the different images. It turns out that it's pretty hard to algorithmically pre-determine this number (although various techniques do exist). The water villa pic above has 15 clusters, the nursery pic below has 20, and the cartoon has 6.

Note that this is only one subproblem of the original one; there is also the subproblem of placement, which I skirted around by simply arranging the colors in vertical bands across the final image. I'm pretty sure no artist's palette looks like this.

Also, these palettes aren't very "clean". Since the original pictures themselves are noisy, some of this noise arbitrarily creeps into the various clusters. Working with a filtered version of the picture would be cheating, so we won't do that. But we might be able to extract the noisy pixels, put them in a special cluster, and run k-means on the remaining pixels.

Okay, enough talk. Here's the code: https://github.com/0xfe/experiments/blob/master/r/palette.rscript

First install the cclust and ReadImages packages from CRAN, then try out the algorithm in an R console:

> source('/path/to/palette.rscript')
> plot_palette('/path/to/some/image.jpg')

This will produce a plot with the original image and the transformed one next to each other, like the attached pics below. It uses 10 clusters by default, for a palette of 10 colors. You can change this by passing the cluster count as the second parameter to plot_palette.

> plot_palette('/path/to/some/image.jpg', 20)

That's all folks!

Sunday, August 21, 2011

A Web Audio Spectrum Analyzer

In my last post, I went over some of the basics of the Web Audio API and showed you how to generate sine waves of various frequencies and amplitudes. We were introduced to some key Web Audio classes, such as AudioContext, AudioNode, and JavaScriptAudioNode.

This time, I'm going to take things a little further and build a realtime spectrum analyzer with Web Audio and HTML5 Canvas. The final product plays a remote music file and displays the frequency spectrum overlaid with a time-domain graph.

The demo is here: JavaScript Spectrum Analyzer. The code for the demo is in my GitHub repository.

The New Classes

In this post we introduce three new Web Audio classes: AudioBuffer, AudioBufferSourceNode, and RealtimeAnalyserNode (note the spec's British spelling -- the factory method is createAnalyser()).

An AudioBuffer represents an in-memory audio asset. It is usually used to store short audio clips and can contain multiple channels.

An AudioBufferSourceNode is a specialization of AudioNode that serves audio from AudioBuffers.

A RealtimeAnalyserNode is an AudioNode that returns time- and frequency-domain analysis information in real time.

The Plumbing

To begin, we need to acquire some audio. The API supports a number of different formats, including MP3 and raw PCM-encoded audio. In our demo, we retrieve a remote audio asset (an MP3 file) using AJAX, and use it to populate a new AudioBuffer. This is implemented in the RemoteAudioPlayer class (js/remoteaudioplayer.js) like so:

RemoteAudioPlayer.prototype.load = function(callback) {
  var request = new XMLHttpRequest();
  var that = this;
  request.open("GET", this.url, true);
  request.responseType = "arraybuffer";
  request.onload = function() {
    that.buffer = that.context.createBuffer(request.response, true);
    that.reload();
    callback(request.response);
  };

  request.send();
};
Notice that jQuery's AJAX calls aren't used here. This is because jQuery does not support the arraybuffer response type, which is required for loading binary data from the server. The AudioBuffer is created with the AudioContext's createBuffer function. The second parameter, true, tells it to mix all the channels down to a single mono channel.

The AudioBuffer is then provided to an AudioBufferSourceNode, which will be the context's audio source. This source node is then connected to a RealtimeAnalyserNode, which in turn is connected to the context's destination, i.e., the computer's output device.

var source_node = context.createBufferSource();
source_node.buffer = audio_buffer;

var analyzer = context.createAnalyser();
analyzer.fftSize = 2048; // 2048-point FFT
source_node.connect(analyzer);
analyzer.connect(context.destination);
To start playing the music, call the noteOn method of the source node. noteOn takes one parameter: a timestamp indicating when to start playing. If set to 0, it plays immediately. To start playing the music 0.5 seconds from now, you can use context.currentTime to get the reference point.
// Play music 0.5 seconds from now
source_node.noteOn(context.currentTime + 0.5);
It's also worth noting that we set the granularity of the FFT to 2048 via the analyzer.fftSize property. For those unfamiliar with DSP theory: the analyzer computes a 2048-point FFT, which breaks the frequency spectrum of the audio into fftSize / 2 = 1024 bins (exposed as analyzer.frequencyBinCount), each bin representing the magnitude of a narrow slice of the spectrum.
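If you want to know which frequency a given data point corresponds to, the arithmetic is simple (this assumes the context's default 44.1KHz sample rate):

// Each bin covers sampleRate / fftSize Hz, and the analyzer exposes
// fftSize / 2 bins via analyzer.frequencyBinCount.
var bin_width = context.sampleRate / analyzer.fftSize;      // 44100 / 2048, ~21.5 Hz
var num_bins = analyzer.frequencyBinCount;                  // 2048 / 2 = 1024
var bin_frequency = function(n) { return n * bin_width; };  // center frequency of bin n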

The Pretty Graphs

Okay, it's now all wired up -- how do I get the pretty graphs? The general strategy is to poll the analyzer every few milliseconds (e.g., with window.setInterval), request the time- or frequency-domain data, and then render it onto an HTML5 Canvas element. The analyzer exports a few different methods to access the analysis data: getFloatFrequencyData, getByteFrequencyData, and getByteTimeDomainData. Each of these methods populates a given typed array with the appropriate analysis data.

In the below snippet, we schedule an update() function every 50ms, which breaks the frequency-domain data points into 30 bins, and renders a bar representing the average magnitude of the points in each bin.

canvas = document.getElementById(canvas_id);
canvas_context = canvas.getContext("2d");

function update() {
  // This graph has 30 bars.
  var num_bars = 30;

  // Get the frequency-domain data. The analyzer exposes fftSize / 2 bins.
  var data = new Uint8Array(analyzer.frequencyBinCount);
  analyzer.getByteFrequencyData(data);

  // Clear the canvas
  canvas_context.clearRect(0, 0, canvas.width, canvas.height);

  // Break the samples up into bins
  var bin_size = Math.floor(data.length / num_bars);
  for (var i = 0; i < num_bars; ++i) {
    var sum = 0;
    for (var j = 0; j < bin_size; ++j) {
      sum += data[(i * bin_size) + j];
    }

    // Calculate the average magnitude of the samples in the bin
    var average = sum / bin_size;

    // Draw the bars on the canvas
    var bar_width = canvas.width / num_bars;
    var scaled_average = (average / 256) * canvas.height;

    canvas_context.fillRect(i * bar_width, canvas.height, bar_width - 2,
                            -scaled_average);
  }
}

// Render every 50ms
window.setInterval(update, 50);

// Start the music
source_node.noteOn(0);
A similar strategy can be employed for time-domain data, with a few minor differences: time-domain data is usually rendered as a wave, so you might want to use a lot more bins and plot individual pixels instead of drawing bars -- see the sketch below. The code that renders the time- and frequency-domain graphs in the demo is encapsulated in the SpectrumBox class in js/spectrum.js.
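Here's a rough idea of what the time-domain version might look like (the names here are mine, not from the demo):

function update_wave() {
  // Grab the raw waveform instead of the spectrum.
  var data = new Uint8Array(analyzer.frequencyBinCount);
  analyzer.getByteTimeDomainData(data);

  canvas_context.clearRect(0, 0, canvas.width, canvas.height);
  for (var i = 0; i < data.length; ++i) {
    var x = (i / data.length) * canvas.width;
    var y = (data[i] / 256) * canvas.height; // byte samples are centered at 128
    canvas_context.fillRect(x, y, 1, 1);     // one pixel per sample
  }
}

window.setInterval(update_wave, 50);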

The Minutiae

I glossed over a number of things in this post, mostly with respect to the details of the demo. You can learn it all from the source code, but here's a summary for the impatient:

The graphs are actually two HTML5 Canvas elements overlaid using CSS absolute positioning. Each element is driven by its own SpectrumBox instance: one displays the frequency spectrum, the other the time-domain wave.

The routing of the nodes is done in the onclick handler of the #play button -- it takes the AudioBufferSourceNode from the RemoteAudioPlayer, routes it to the frequency analyzer's node, routes that to the time-domain analyzer's node, and finally connects that to the destination. A condensed sketch of the wiring follows.
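Something like this (the getter names on the RemoteAudioPlayer and SpectrumBox instances are illustrative -- check the demo source for the real ones):

player.getSourceNode().connect(freq_box.getAudioNode());   // source -> frequency analyzer
freq_box.getAudioNode().connect(wave_box.getAudioNode());  // -> time-domain analyzer
wave_box.getAudioNode().connect(context.destination);      // -> speakers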

Bonus: Another Demo

That's all folks! You now have the know-how to build yourself a fancy new graphical spectrum analyzer. If all you want to do is play with the waves and stare at the graphs, check out my other demo: The Web Audio Tone Analyzer (source). This is really just the same spectrum analyzer from the first demo, connected to the tone generator from the last post.

References

As a reminder, all the code for my posts is available at my GitHub repository: github.com/0xfe.

The audio track used in the demo is a discarded take of Who-Da-Man, which I recorded with my previous band Captain Starr many many years ago.

Finally, don't forget to read the Web Audio API draft specification for more information.

Enjoy!

Saturday, August 13, 2011

Generating Tones with the Web Audio API

The Web Audio API is a W3C draft standard interface for building in-browser audio applications. Although the draft is well specified, it is almost impossible to find useful documentation on building applications with it.

In my quest to understand HTML5 audio more deeply, I spent some time figuring out how the API works, and decided to write up this quick tutorial on doing useful things with it.

We will build a sine wave tone-generator entirely in JavaScript. The final product looks like this: Web Audio Tone Generator.

The full code is available in my GitHub repository: https://github.com/0xfe/experiments/tree/master/www/tone

Caveats

The Web Audio API is a draft, is likely to change, and does not work on all browsers. Right now, only the latest versions of Chrome and Safari support it.

Onwards We Go

Getting started making sounds with the Web Audio API is straightforward so long as you take the time to study the plumbing, most of which exists to allow for real-time audio processing and synthesis. The complete specification is available on the W3C Web Audio API page, and I'd strongly recommend that you read it thoroughly if you're interested in building advanced applications with the API.

To produce any form of sound, you need an AudioContext and a few AudioNodes. The AudioContext is sort of like an environment for audio processing -- it's where various attributes such as the sample rate, the clock status, and other environment-global state reside. Most applications will need no more than a single instance of AudioContext.

The AudioNode is probably the most important component in the API, and is responsible for synthesizing or processing audio. An AudioNode instance can be an input source, an output destination, or a mid-stream processor. These nodes can be linked together to form processing pipelines to render a complete audio stream.
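For example, assuming context is an AudioContext and source and effect are AudioNode instances you've already created, chaining them is just a matter of calling connect():

source.connect(effect);               // the source feeds the processor
effect.connect(context.destination);  // the processor feeds the speakers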

One kind of AudioNode is the JavaScriptAudioNode, which is used to generate sounds in JavaScript. This is what we will use in this tutorial to build a tone generator.

Let us begin by instantiating an AudioContext and creating a JavaScriptAudioNode.

var context = new webkitAudioContext();
var node = context.createJavaScriptNode(1024, 1, 1);
The parameters to createJavaScriptNode refer to the buffer size, the number of input channels, and the number of output channels. The buffer size must be in units of sample frames, i.e., one of: 256, 512, 1024, 2048, 4096, 8192, or 16384. It controls how frequently the node fires callbacks asking for a buffer refill: smaller sizes allow for lower latency, and larger sizes give better overall quality (fewer chances for audible glitches).
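To put rough numbers on that tradeoff (assuming the typical 44.1KHz sample rate):

// Each callback must produce bufferSize samples, so the gap between refills
// is roughly bufferSize / sampleRate seconds.
var latency_1024 = 1024 / 44100;   // ~23ms per buffer
var latency_16384 = 16384 / 44100; // ~370ms per buffer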

We're going to use the JavaScriptAudioNode as the source node along with a bit of code to create sine waves. To actually hear anything, it must be connected to an output node. It turns out that context.destination gives us just that -- a node that maps to the speakers on your machine.

The SineWave Class

To start off our tone generator, we create a SineWave class, which wraps the AudioNode and wave generation logic into one cohesive package. This class will be responsible for creating the JavaScriptNode instances, generating the sine waves, and managing the connection to the destination node.

SineWave = function(context) {
  var that = this;
  this.x = 0; // Initial sample number
  this.context = context;
  this.node = context.createJavaScriptNode(1024, 1, 1);
  this.node.onaudioprocess = function(e) { that.process(e) };
}

SineWave.prototype.process = function(e) {
  var data = e.outputBuffer.getChannelData(0);
  for (var i = 0; i < data.length; ++i) {
    data[i] = Math.sin(this.x++);
  }
}

SineWave.prototype.play = function() {
  this.node.connect(this.context.destination);
}

SineWave.prototype.pause = function() {
  this.node.disconnect();
}
Upon instantiation, this class creates a JavaScriptAudioNode and attaches an event handler to onaudioprocess for buffer refills. The event handler requests a reference to the output buffer for the first channel, and fills it with a sine wave. Notice that the handler does not know the buffer size in advance, and gets it from data.length.

The buffer returned by getChannelData is a Float32Array, one of JavaScript's Typed Arrays. These arrays allow for high-throughput processing of raw binary data. To learn more about Typed Arrays, check out the Mozilla Developer Documentation on Typed Arrays.
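For instance, a Float32Array stores fixed-size 32-bit floats in a contiguous buffer, which is what makes per-sample loops like the one above cheap:

var samples = new Float32Array(1024); // 1024 floats, all initialized to 0
samples[0] = Math.sin(0.5);           // reads and writes look like a plain array
var n = samples.length;               // 1024 -- the size is fixed at creation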

To try out a quick demo of the SineWave class, add the following code to the onload handler for your page:

var context = new webkitAudioContext();
var sinewave = new SineWave(context);
sinewave.play();
Notice that sinewave.play() works by wiring up the node to the AudioContext's destination (the speakers). To stop the tone, call sinewave.pause(), which unplugs this connection.

Generating Specific Tones

So, now you have yourself a tone. Are we done yet?

Not quite. How does one know what the frequency of the generated wave is? How does one generate tones of arbitrary frequencies?

To answer these questions, we must find out the sample rate of the audio. Each data value we stuff into the buffer in our handler is a sample, and the sample rate is the number of samples processed per second. We can calculate the frequency of the tone by dividing the sample rate by the length (in samples) of a full wave cycle.

How do we get the sample rate? Via the getSampleRate() method of AudioContext. On my machine, the default sample rate is 44KHz, i.e., 44100 samples per second. Since Math.sin completes a full cycle every 2π, and x increments by one per sample, each cycle of our wave spans about 6.28 samples. This means that the frequency of the generated tone in our above code is:
freq = context.getSampleRate() / (2 * Math.PI)
That's about 7KHz. Ouch! Let's use our newfound knowledge to generate less spine-curdling tones. To generate a tone of a specific frequency, you can change SineWave.process to:
SineWave.prototype.process = function(e) {
  var data = e.outputBuffer.getChannelData(0);
  for (var i = 0; i < data.length; ++i) {
    // sample_rate / (2 * PI * frequency) is the number of samples per radian.
    data[i] = Math.sin(this.x++ / (this.sample_rate / (2 * Math.PI * this.frequency)));
  }
}
Also make sure you add the following two lines to SineWave's constructor:
this.sample_rate = this.context.getSampleRate();
this.frequency = 440;
This initializes the frequency to pitch standard A440, i.e., the A above middle C.

The Theremin Effect

Now that we can generate tones of arbitrary frequencies, it's only natural that we connect our class to some sort of slider widget so we can experience the entire spectrum right in our browsers. It turns out that jQuery UI already has such a slider, leaving us only a little plumbing to do.

We add a setter function to our SineWave class, and call it from our slider widget's change handler.

SineWave.prototype.setFrequency = function(freq) {
  this.frequency = freq;
}
A jQuery UI snippet would look like this:
$("#slider").slider({
    value: 440,
    min: 1,
    max: 2048,
    slide: function(event, ui) { sinewave.setFrequency(ui.value); }
});

Going up to Eleven

Adding support for volume is straightforward. Add an amplitude member to the SineWave constructor along with a setter method, just like we did for frequency, and change SineWave.process to:

SineWave.prototype.process = function(e) {
  var data = e.outputBuffer.getChannelData(0);
  for (var i = 0; i < data.length; ++i) {
    data[i] = this.amplitude * Math.sin(
        this.x++ / (this.sample_rate / (2 * Math.PI * this.frequency)));
  }
}
Folks, we now have a full-fledged sine wave generator!

Boo Hiss Crackle

But we're not done yet. You've probably noticed that changing the frequency causes mildly annoying crackling sounds. This happens because when the frequency changes, a discontinuity occurs in the wave, causing a high-frequency pop in the audio stream.

Discontinuity when Changing Frequencies

We try to eliminate the discontinuity by shifting frequencies only when a cycle of the previous frequency completes, i.e., when the sample value is (approximately) zero. (There are better ways to do this, e.g., windowing, LPFs, etc., but those techniques are out of the scope of this tutorial.)

Although this complicates the code a little bit, waiting for the cycle to end significantly reduces the noise upon frequency shifts.

SineWave.prototype.setFrequency = function(freq) {
  // Note: this assumes next_frequency was initialized to the same value as
  // frequency in the constructor, so the very first buffer isn't treated as
  // a frequency switch.
  this.next_frequency = freq;
}

SineWave.prototype.process = function(e) {
  // Get a reference to the output buffer and fill it up.
  var data = e.outputBuffer.getChannelData(0);

  // We need to be careful about filling up the entire buffer and not
  // overflowing.
  for (var i = 0; i < data.length; ++i) {
    data[i] = this.amplitude * Math.sin(
        this.x++ / (this.sample_rate / (this.frequency * 2 * Math.PI)));

    // This reduces high-frequency blips while switching frequencies. It works
    // by waiting for the sine wave to hit 0 (on its way to positive territory)
    // before switching frequencies.
    if (this.next_frequency != this.frequency) {
      // Figure out what the next point is.
      var next_data = this.amplitude * Math.sin(
        this.x / (this.sample_rate / (this.frequency * 2 * Math.PI)));

      // If the current point approximates 0, and the direction is positive,
      // switch frequencies.
      if (data[i] < 0.001 && data[i] > -0.001 && data[i] < next_data) {
        this.frequency = this.next_frequency;
        this.x = 0;
      }
    }
  }
}

The End

As mentioned in the beginning of this tutorial, a demo of the full code is available at http://0xfe.muthanna.com/tone/ and the entire source code is available at https://github.com/0xfe/experiments/tree/master/www/tone.

Do check out the W3C Web Audio API specification, as well as the Mozilla documentation on JavaScript Typed Arrays.

Comments, criticism, and error reports welcome. Enjoy!

Thursday, March 31, 2011

A Music Theory API

Most of my work last week consisted of writing music theory code. VexFlow now has a neat little music theory API that gives you answers to questions like the following:

  • What note is a minor 3rd above a B?
  • What are the scale tones of a Gb Harmonic Minor?
  • What relation is the C# note to an A Major scale? (Major 3rd)
  • What accidentals should be displayed for the perfect 4th note of a G Major scale?
  • etc.

The API is part of VexFlow, and can be used independently of the rendering API. Take a look at music.js in the VexFlow GitHub repository for the complete reference. There's also a handy key management library for building scores in keymanager.js.

I'm currently working on updating the VexFlow Tutorial with a quickstart on the music theory API, but meanwhile, here are some teasers (pulled straight out of the tests).

// What does C note consist of?
var parts = music.getNoteParts("c");
equals(parts.root, "c");
equals(parts.accidental, null);

// What does C# note consist of?
var parts = music.getNoteParts("c#");
equals(parts.root, "c");
equals(parts.accidental, "#");

// What is a flat-5th above C?
var value = music.getRelativeNoteValue(music.getNoteValue("c"),
                                       music.getIntervalValue("b5"));
equals(value, music.getNoteValue("gb"));
equals(value, music.getNoteValue("f#"));

// What is the C quality of a Db?
equals(music.getRelativeNoteName("c", music.getNoteValue("db")), "c#");

// What are the tones of a C major scale?
var c_major = music.getScaleTones(
      music.getNoteValue("c"), Vex.Flow.Music.scales.major);
// result: ["c", "d", "e", "f", "g", "a", "b"]

// What is the interval between a C and a D?
equals(music.getCanonicalIntervalName(music.getIntervalBetween(
     music.getNoteValue("c"), music.getNoteValue("d"))), "M2");


Thanks to the theory support, we now have smarter Accidentals in the standard notation stave that VexTab generates.

Smarter Accidentals

Notice how the accidentals are correctly picked according to the rules of standard notation? Yep, so do I.

We also have lots more tests -- over 750 of them! Try running them in your browser and tell me how long it takes.

That's all folks!

Sunday, March 27, 2011

Prettier Tablature

Spot the difference:

Before

After
Still can't tell? Let me help you out. We have:
  • Slightly greater spacing between the tablature stave lines. This makes it more consistent in appearance with printed tablature.
  • Stave lines are cleared before fret numbers are rendered, vastly improving readability.
  • Font sizes for fret numbers and annotations are bigger.
  • Associated notation and tablature staves are connected with a vertical bar on the left.
  • Micro-changes in spacing between fret numbers, effects, annotations, etc.
All these changes have been incorporated into TabDiv, and pushed to GitHub. See more on the VexTab Tutorial page. Enjoy!

Thursday, March 24, 2011

The VexFlow Tutorial (...and other goodies)

Finally... finally... finally... we have the humble beginnings of what could be considered "documentation".

I present to you The VexFlow Tutorial.

Although still in its infancy, the tutorial covers everything you need to start using VexFlow in your own code. My plan for the next few weeks is to make this document as comprehensive as possible, and write up a separate API reference.

I hope that this tutorial will help developers understand VexFlow better, and enable them to build new and interesting libraries, parsers, and applications.

The entire tutorial is stored in the Git repo; feel free to send me your corrections or other updates.

About time, I know.

In other news, we have had a few contributions to both VexFlow and VexTab. A big thanks to airfrog, wiseleyb, and adamf for getting these done.

First, we have the ability to render dotted notes.

Dotted Notes

Then we have key signatures.

Key Signatures

We also have time signatures.

Time Signatures

This includes the really crazy time signatures too.

Whacky Time Signatures

Finally, we have support for different types of clefs.

Alternate Clefs

But wait... there's more. All of this is supported in VexTab by way of new tabstave parameters. Take a look at the updated VexTab Tutorial for the details.

Wednesday, March 23, 2011

Editing XML and HTML in Vim

I just discovered the Vim xmledit plugin.

With features like tag-completion, auto-wrapping and unwrapping, quick navigation, etc., it has, in a matter of minutes, measurably decreased my level of frustration while editing markup in Vim.

Installation

The quickest way to install the plugin is to download the latest .vba from the plugin site and run the following commands:

$ vim xmledit.vba
:so %

You also need to edit your .vimrc and add the following:

filetype plugin on

This installs the plugin into your .vim/ftplugin directory, and enables it for .xml files. To enable it for other file types, create a link to the file with the new extension name in the same directory. (Copying the file also works.)

$ cd ~/.vim/ftplugin
$ ln -s xml.vim html.vim
$ ln -s xml.vim xhtml.vim

Usage


The plugin supports the various Vim modes in interesting ways.

In insert mode, when you finish a tag (with the > character), it will be autocompleted and the cursor placed in-between the tags.

If you immediately type another >, it will close the tag on its own line, and place the cursor on a new line right in between.

The % key jumps between the start and end of a tag.

The \% combination jumps between opening and closing tags. (Note that backslash is the default key-prefix for scripts and plugins to use. You can change this prefix with the mapleader setting.)

If you select text (for example with v), and type \x, it will prompt you to wrap the text with a custom tag.

Typing in \d unwraps surrounding tags from the cursor.

Learning More


Type :help xml-plugin for help and more information.

Yay for another awesome Vim plugin. You can see my entire Vim profile in my Evil Tomato GitHub Repository.

Sunday, March 20, 2011

On Twitter

It turns out I'm on twitter. I have absolutely no idea what I'm going to do with it.

Maybe I'll tweet every time the compiler yells at me... or when my kernel panics... or when my browser frowns.

Whatever it is, I'm on twitter: twitter.com/11111110b

That's seven ones and a zero, followed by a bee.