
Collapsing Shepard's Relative Pitch Circularity:
A Pitch Invariant Sound



David Van Brink
Apple Computer
1995

Background: Shepard's Tones

I was recently meditating on the subject of the paradoxical "musical staircase" effect, also known as "Shepard's Tones". This is a well-known aural illusion in which a scale (or a continuously rising tone) appears to ascend indefinitely. Its explanation is straightforward: start with a frequency F, and construct a tone containing F together with harmonics at 2F, 4F, 8F, ..., and also at F/2, F/4, F/8, ... . Small transpositions of this tone are perceived as a new note, but a shift of one octave yields a tone with all the same harmonics as the original tone. In practice, only a small number of harmonics within the range of human hearing are needed, if their intensities are adjusted appropriately to "fade in" at the low end, and "fade out" at the high end. [Shepard 1964]
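The construction can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Shepard's original procedure: the function names, the 20 Hz to 20 kHz band, and the raised-cosine fade are all choices made for this example.

```python
import math

def shepard_partials(base_hz, lo_hz=20.0, hi_hz=20000.0):
    """Return (frequency, amplitude) pairs for octave-spaced partials of base_hz.

    The band limits and the raised-cosine fade are assumptions for this sketch.
    """
    # Fold base_hz to its lowest octave-equivalent at or above lo_hz.
    f = base_hz
    while f / 2 >= lo_hz:
        f /= 2
    while f < lo_hz:
        f *= 2
    span = math.log2(hi_hz / lo_hz)
    partials = []
    while f <= hi_hz:
        # Position across the band on a log scale: 0 at lo_hz, 1 at hi_hz.
        x = math.log2(f / lo_hz) / span
        # Fade in at the low end and fade out at the high end.
        amp = 0.5 * (1.0 - math.cos(2.0 * math.pi * x))
        partials.append((f, amp))
        f *= 2
    return partials

def shepard_sample(base_hz, t):
    """Value of the summed tone at time t (seconds)."""
    return sum(a * math.sin(2.0 * math.pi * f * t)
               for f, a in shepard_partials(base_hz))
```

Because the partials depend only on the octave-equivalent of the base frequency, `shepard_partials(440.0)` and `shepard_partials(880.0)` describe the same tone.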

The tone described above is invariant to shifting up or down by a full octave. Similar tones can be constructed which are invariant over other transpositions, sometimes with interesting effects. If, for example, the harmonics are placed at intervals just greater than an octave, then the perceived pitch of the tone decreases when the tone is shifted up by one octave. [Schroeder 1986, Risset 1986] Similarly, a tone which sounds the same when shifted up by three semitones would contain harmonics at 2^(1/4)F, 2^(2/4)F, 2^(3/4)F, ... , and F/2^(1/4), F/2^(2/4), F/2^(3/4), ... .

By placing the harmonics at appropriate intervals, we can generate a tone which "sounds the same" for any desired pitch shift.
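The same idea for an arbitrary interval can be checked numerically. The sketch below is a hypothetical helper, not taken from the cited papers: it lists partials spaced by a chosen number of semitones inside an assumed 20 Hz to 20 kHz band, and it leaves out the amplitude fade.

```python
import math

def invariant_partials(base_hz, semitones, lo_hz=20.0, hi_hz=20000.0):
    """Partials at base_hz * 2**(k * semitones / 12) for integer k, within the band.

    The band limits are assumptions; amplitudes are ignored in this sketch.
    """
    step = semitones / 12.0  # spacing between partials, in octaves
    k_min = math.ceil(math.log2(lo_hz / base_hz) / step)
    k_max = math.floor(math.log2(hi_hz / base_hz) / step)
    return [base_hz * 2.0 ** (k * step) for k in range(k_min, k_max + 1)]

# Shifting the base up by three semitones (a factor of 2**(1/4))
# lands on the same set of partial frequencies.
original = invariant_partials(440.0, 3)
shifted = invariant_partials(440.0 * 2.0 ** 0.25, 3)
same = len(original) == len(shifted) and all(
    math.isclose(a, b, rel_tol=1e-9) for a, b in zip(original, shifted))
```

With `semitones=12` the helper reduces to the octave-spaced case described earlier.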

A Pitch Invariant Sound

The question occurred to me, "are there sounds which sound the same when pitch-shifted by any amount?" Two trivial solutions are immediately obvious. The first, which can be derived by extension from the above explanation, is the sound containing all harmonics, or white noise. The second is the sound which contains no harmonics: silence.

Are there any other such sounds?

We proceed with a simple equation. Let f(t) represent a signal over time, so that f(pt) is that same signal played at rate p. We want a signal where

f(t) = f(pt).

If this is to hold for every rate p, the only solution is a constant: setting t = 1 gives f(p) = f(1) for all p, so

f(t) = f(pt) = d.

This signal sounds like silence. However, our ears are insensitive to DC offsets, or the "zeroth harmonic". So we may look for a function which satisfies

f(t) = f(pt) + c.

This problem has a simple solution, namely, the logarithm function.

log t = log pt + c, where c = -log p.

The shape of the log function is invariant under rate change; stretching its graph by any amount horizontally merely shifts the function vertically. While this is a solution, and indeed produces a tone which "sounds the same at any pitch", it is not a very interesting one. First of all, the "sound" that a log function makes is mostly silence, as it is monotonically increasing. Second, the range of this function is -∞ to +∞, making it inconvenient to download into a keyboard sample player.
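A quick numerical check of the vertical-shift property, as a minimal sketch. The function name is made up for this example, and log is taken to be the natural logarithm.

```python
import math

def vertical_offset(p, t):
    """Difference log t - log(p*t); it should equal -log p for every t > 0."""
    return math.log(t) - math.log(p * t)

# For a fixed rate p the offset is the same at every time t.
offsets = [vertical_offset(3.0, t) for t in (0.1, 1.0, 7.5)]
```

Each entry of `offsets` agrees with `-math.log(3.0)` to within rounding error.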

Both of these issues can be addressed by taking the sine of the logarithm function. Consider

f(t) = sin log t.

Inasmuch as a signal may be said to have a frequency at a particular time, the "instantaneous pitch" of any function of the form f(t) = sin g(t), at time t, is the derivative of g(t) divided by the period of the sine function, 2π, giving F(t) = g'(t)/(2π). Letting F(t) stand for this interpretation of the instantaneous pitch of f(t), we have that

f(t) = sin log t
F(t) = (1/2π) d(log t)/dt = 1/(2πt).

The function shifted by pitch p becomes

f(pt) = sin (log pt) = sin (log t + log p), which is an inaudible phase shift,
F(pt) = (1/2π) d(log t + log p)/dt = 1/(2πt).

That is, the instantaneous pitch of sin log t, at time t, is the same as the instantaneous pitch of sin log pt! This is a sound which "sounds the same" played at any pitch. As can be seen from F(t) above, the signal is a falling tone that begins at infinite pitch and descends rapidly. It is illustrated in figure 1.
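The invariance can also be checked numerically. The sketch below estimates the instantaneous pitch as the phase derivative divided by 2π, using a central difference. The helper names and the step size h are assumptions for this example, and log is the natural logarithm.

```python
import math

def inst_freq(g, t, h=1e-6):
    """Central-difference estimate of g'(t) / (2*pi) for a signal sin(g(t))."""
    return (g(t + h) - g(t - h)) / (2.0 * h) / (2.0 * math.pi)

def phase(p):
    """Phase function of sin(log(p*t)), the signal played at rate p."""
    return lambda t: math.log(p * t)

t = 0.01
expected = 1.0 / (2.0 * math.pi * t)  # 1/(2*pi*t)
estimates = [inst_freq(phase(p), t) for p in (0.5, 1.0, 2.0, 10.0)]
```

Every entry of `estimates` matches `expected` up to the finite-difference error, whatever the rate p.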




For practical purposes, we will want to scale the function appropriately for a human listener. The function as given, if we take t in seconds, will have fallen from 20 kHz at time t=8e-6 to 20 Hz at time t=8e-3. The sound will have passed through the entire range of human hearing in less than a hundredth of a second. Again, this is uninteresting when downloaded to a sampling keyboard. Rescaling the function, so that the frequency at time t = 1 second is 100 Hz, gives

f(t) = sin (200π log t),
F(t) = 100/t.

One final equation shall be presented. Suppose we wish to construct this waveform from discrete samples, with a sampling rate of R. Let us call the samples S1, S2, ... , where Sn = f(n/R), giving

Sn = sin (200π log (n/R)) = sin (200π (log n - log R)).

The log R term provides only an inaudible phase shift of the sine function, and may be omitted without altering the perceived sound, leaving

Sn = sin (200π log n) for any sample rate R.

The sample rate does not matter; this sound is safe at any speed.
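A minimal sketch of generating the samples follows. It assumes the natural logarithm and a phase coefficient of 200π, which is the value whose phase derivative divided by 2π gives F(t) = 100/t. The function names and the `coeff` parameter are hypothetical.

```python
import math

def pitch_invariant_samples(count, coeff=200.0 * math.pi):
    """Return S_1 .. S_count with S_n = sin(coeff * ln n).

    Indexing starts at n = 1 because ln 0 is undefined.
    """
    return [math.sin(coeff * math.log(n)) for n in range(1, count + 1)]

def crossing_time(freq_hz, hz_at_one_second=100.0):
    """Time in seconds at which F(t) = hz_at_one_second / t equals freq_hz."""
    return hz_at_one_second / freq_hz

samples = pitch_invariant_samples(44100)
t_20k = crossing_time(20000.0)  # 0.005 s
t_20 = crossing_time(20.0)      # 5 s
```

With this scaling the sweep passes 20 kHz at 5 ms and reaches 20 Hz after 5 seconds. One caveat of this sketch: the phase advances by roughly 200π/n radians between consecutive samples, which exceeds π for n below about 200, so the first couple of hundred samples are aliased at any sample rate.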


Bibliography

Shepard, Roger. "Circularity in Judgments of Relative Pitch". Journal of the Acoustical Society of America V36n12 (December 1964) pp 2346-2353.

Schroeder, Manfred R. "Auditory Paradox based on fractal waveform". Journal of the Acoustical Society of America V79n1 (January 1986) pp 186-189.

Risset, Jean-Claude. "Pitch and rhythm paradoxes: Comments on 'Auditory paradox based on fractal waveform'". Journal of the Acoustical Society of America V80n3 (September 1986) pp 961-962.


4/14/96.18:59 - 7/19/96.16:53