Mosaic

A technique that imitates photo-mosaicing in the auditory domain. Given a visual template of the desired sound (essentially a sonogram) and some input sounds, the algorithm spreads chunks of the input sound across the time-frequency plane of the template.

This algorithm is inspired by the idea of Synthrumentation by Klarenz Barlow. It was developed for the piece "Achrostichon", dedicated to Folkmar Hein.

The template time-frequency plane is given by a black-and-white sonogram image (the Image input file). Note that frequency is considered logarithmically spaced. A small number of bands (e.g. 1 to 3 per octave) usually produces more interesting results than a literal sonogram. The sound material used to "paint" the template is given by the Audio input field. Note that this file must be an order of magnitude longer than the target sound file's duration; if necessary, use a sound editor to concatenate the sound material to itself a couple of times to produce the necessary length.

The target duration is given by Nominal duration. It is called "nominal" because the actual duration may differ slightly due to the randomization processes. The time resolution of the input image is automatically derived from the nominal duration: for example, when the image is 100 pixels wide and the nominal duration is 10 seconds, each pixel corresponds to 100 ms.
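The derivation of the per-pixel time resolution can be sketched as follows (a hypothetical helper for illustration, not the module's actual code):

```python
def pixel_duration(image_width: int, nominal_duration: float) -> float:
    """Duration in seconds covered by one pixel column of the template image."""
    return nominal_duration / image_width

# A 100-pixel-wide image at a nominal duration of 10 s:
# each pixel column covers 0.1 s (100 ms).
```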

The frequency span of the input image is given by Min freq. and Max freq.; the number of bands per octave is automatically derived from these settings and the input image's height.
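Since the frequency axis is logarithmic, the derivation amounts to dividing the image height by the number of octaves spanned. A minimal sketch (hypothetical helper, not the module's code):

```python
import math

def bands_per_octave(image_height: int, min_freq: float, max_freq: float) -> float:
    """Bands per octave implied by the image height and the logarithmic
    frequency span from min_freq to max_freq."""
    octaves = math.log2(max_freq / min_freq)
    return image_height / octaves

# A 24-pixel-tall image spanning 50 Hz to 12800 Hz covers 8 octaves,
# giving 3 bands per octave.
```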

In terms of the photo-mosaic metaphor, the algorithm reads chunks from the input sound file as "photos" and places them on the time-frequency canvas. The overlap in the time-frequency plane is specified by Time Spacing and Freq Spacing (100% means dense packing without overlap, 50% means 1x overlap, etc.). To produce less regular rhythmic and harmonic output, use some Time Jitter and Freq Jitter; these specify, as a percentage of the time chunk length and the frequency band coverage respectively, by how much a "photo" may be displaced from its nominal position.

The smoothness of the "grains" or "photos" depends on the fade-in and fade-out of the chunks, as given by Attack and Release.
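A fade of this kind can be sketched as a simple per-chunk amplitude envelope (a linear-ramp assumption for illustration; the module's actual fade shape is not specified here):

```python
def envelope(n: int, attack: int, release: int) -> list[float]:
    """Amplitude envelope for a chunk of n samples: linear fade-in over
    `attack` samples and linear fade-out over `release` samples."""
    env = []
    for i in range(n):
        g = 1.0
        if i < attack:
            g = min(g, i / attack)          # fade-in ramp
        if i >= n - release:
            g = min(g, (n - 1 - i) / release)  # fade-out ramp
        env.append(g)
    return env
```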

White pixels in the input image constitute the loud parts, black pixels the silent parts. The minimum volume, corresponding to black, is given by the Noise Floor parameter. Max Boost specifies the maximum gain that may be applied to a "photo" to make it match the template pixel's volume.
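One plausible reading of this mapping, sketched as code (the exact scaling used by the module is an assumption here):

```python
def target_level_db(brightness: float, noise_floor_db: float) -> float:
    """Map pixel brightness (0.0 = black, 1.0 = white) to a target level:
    white aims at full level (0 dB), black at the noise floor."""
    return noise_floor_db * (1.0 - brightness)

def chunk_gain_db(target_db: float, chunk_db: float, max_boost_db: float) -> float:
    """Gain applied to a chunk so that it matches the target level,
    limited by the maximum boost."""
    return min(target_db - chunk_db, max_boost_db)
```

For instance, a quiet chunk that would need a 24 dB lift toward a bright pixel is capped at the Max Boost setting.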

Finally, if your input sound material has already been segmented, you can use markers to define the photo boundaries instead of the fixed-size time windows resulting from the nominal duration divided by the image width. Each marker in the input sound file marks the beginning of the next photo chunk. To use this approach, check the Read Markers option.


last modified: 10-Jun-09