pd implementation of 3D ambisonic

ritsch.sontacchi.zmölnig
institute for electronic music and acoustics

abstract

a modular three-dimensional spatialization engine that can easily be extended with room-simulation algorithms

ambisonic, being a spatialization method based on intensity panning, is well suited for spatial sound installations with more than one listener. moreover, the specific structure of ambisonic signals makes a good and easy-to-implement room simulation (reverberation) possible. the engine is described in five parts:
  1. coding a mono-signal to 8-channel ambisonic
  2. generation and addition of 3d-reverberation
  3. listener's attitude
  4. rendering to a speaker-array
  5. miscellaneous

generating spatial signals

an ambisonic signal has to be rendered from a mono source with its spatial position (r, phi, theta with respect to the listener's position) in a virtual room
sources are coded with 3rd-degree ambisonic (1+6 channels: W, X, Y, U, V, S, T) in the horizontal plane (xy-plane) and with 1st-degree ambisonic (1+1 channels: W, Z) in the vertical plane.

object slot

contains a mono source (object player~) and the actual mono-to-ambisonic coder (object ambisonic~)

object player~

a simple mono source; can be replaced by any other mono signal-generation unit

object ambisonic~

here the coding of a mono signal into 8 discrete ambisonic channels is done
the mono signal (inlet1) is weighted with respect to the virtual position of the sound source (inlet2) and a fadeaway-coefficient triple (inlet3) to obtain the ambisonic signals
the weights for each channel are calculated in object pos2weights
object pos2weights
calculates the weights for ambisonic coding as follows:
W_factor = 0.707
X_factor = (r0/r)^k1 * cos(phi) * sin(theta)
Y_factor = (r0/r)^k1 * sin(phi) * sin(theta)
Z_factor = (r0/r)^k1 * cos(theta)
U_factor = (r0/r)^k2 * cos(2*phi) * sin(theta)
V_factor = (r0/r)^k2 * sin(2*phi) * sin(theta)
S_factor = (r0/r)^k3 * cos(3*phi) * sin(theta)
T_factor = (r0/r)^k3 * sin(3*phi) * sin(theta)

r, phi and theta are the polar coordinates of the sound source with respect to the listener. note that the (phi=0)-axis is NOT equivalent to the viewing direction, but is a reference independent of the rotation of the listener's head (eg: the x-axis of the virtual room)
the fadeaway coefficients k1, k2 and k3 define how the localization decreases as the distance from source to ear increases
the reference distance r0 is the minimal distance between sound source and sound sink.
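as an illustration, the weighting computed by object pos2weights can be sketched in python (the function name, argument defaults and the returned dictionary are illustrative, not part of the pd patch):

```python
import math

def pos2weights(r, phi, theta, r0=1.0, k=(1.0, 1.0, 1.0)):
    """weighting factors for a source at polar coordinates (r, phi, theta)
    relative to the listener; r0 is the reference distance and
    k the fadeaway-coefficient triple (k1, k2, k3)."""
    k1, k2, k3 = k
    g1 = (r0 / r) ** k1      # fadeaway for 1st-degree channels
    g2 = (r0 / r) ** k2      # fadeaway for 2nd-degree channels
    g3 = (r0 / r) ** k3      # fadeaway for 3rd-degree channels
    st = math.sin(theta)
    return {
        'W': 0.707,
        'X': g1 * math.cos(phi) * st,
        'Y': g1 * math.sin(phi) * st,
        'Z': g1 * math.cos(theta),
        'U': g2 * math.cos(2 * phi) * st,
        'V': g2 * math.sin(2 * phi) * st,
        'S': g3 * math.cos(3 * phi) * st,
        'T': g3 * math.sin(3 * phi) * st,
    }
```

multiplying the mono signal with these eight factors yields the eight ambisonic channels.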

reverberation and room-simulation

a 3-dimensional reverberation signal has to be generated from the sum of all ambisonic-coded sources (generating reverberation signals for each source separately is highly computation-expensive and therefore has to be abandoned)
following is a description of a three-step reverberation algorithm for the generation of 3d reverberation:
  1. first reflections
  2. diffuse reflections
  3. distance-estimation

object 3dreverb~

the ambisonic reverberation signal is generated from the sum of all W channels (inlet1); this sum is a mono mixdown without any trace of spatial information
since an exact localization of virtual sources (arising from the sound reflections on surfaces) is useless (because of our normal hearing habits) and by now impossible (because of the summation of all sources), we apply the 3d reverberation only to 1st-degree ambisonic
some room-specific coefficients can be set (inlet2):
dBgain_refl gain of the first reflections in dB
dBgain_diff gain of the diffuse reflections in dB
roomX length of the simulated room
roomY width of the simulated room
roomZ height of the simulated room
alpha_refl lowpass-attenuation for first reflections [0..100]
feedback feedback gain for the diffuse soundfield in dB
alpha_diff lowpass-attenuation for diffuse reflections [0..100]

the ear position x=0..roomX, y=0..roomY and z=0..roomZ (inlet3) influences the delay of the first reflections as well as the distance estimation

object reflections~

first reflections are mainly responsible for the perception of the room size
because the positions of the separate sound sources are not known (any longer) and because only the sum of all signals is available by now, the position of the source of the sum is set to the listener's position. by doing so a good perception of the room size can be achieved at minimal computation cost
the delay of the first reflection of a specific boundary surface is therefore set to the time the sound needs from the ear position to the said wall and back again
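a sketch of this delay computation in python, assuming a speed of sound of 343 m/s and the wall ordering x=0, x=roomX, y=0, y=roomY, z=0, z=roomZ (the function and parameter names are hypothetical):

```python
def reflection_delays(ear, room, c=343.0):
    """delay (in seconds) of the first reflection for each of the six
    boundary surfaces: the sound travels from the ear position to the
    wall and back again. ear = (x, y, z), room = (roomX, roomY, roomZ)."""
    x, y, z = ear
    rx, ry, rz = room
    dists = [x, rx - x, y, ry - y, z, rz - z]   # distance of the ear to each wall
    return [2.0 * d / c for d in dists]
```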

object diffuse~

generates a 10-step mono diffuse-reverberation that is spread over the six boundary surfaces

object hallnearwall~

in the neighbourhood of a wall the reverberation coming from this wall drops; this is taken into account by weighting the reverberation with the relative distance to the specific surface:
g(wall_i) = (distance to wall_i) / (sum of all distances)
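this weighting could be sketched as follows (hypothetical names; the wall ordering x=0, x=roomX, y=0, y=roomY, z=0, z=roomZ is an assumption):

```python
def nearwall_gains(ear, room):
    """weight each wall's reverberation with the relative distance of the
    ear to that wall, so that reverberation from a nearby wall drops."""
    x, y, z = ear
    rx, ry, rz = room
    dists = [x, rx - x, y, ry - y, z, rz - z]   # distance of the ear to each wall
    total = sum(dists)
    return [d / total for d in dists]           # gains sum to 1
```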

listener's attitude

object rotation~

up to now the assumption of a non-rotatable sound sink was valid (that's why the (phi=0)-axis is not equivalent to the viewing direction). this was sensible because the reverberation is independent of the head's rotation and depends only on the surfaces and the position of reception
to realize a virtual turning of the head by psi degrees, the whole (reverberated) soundfield has to be rotated. luckily, this can easily be done by multiplication with a rotation matrix:
1           0            0           0   0            0           0            0
0           cos(1*psi)   sin(1*psi)  0   0            0           0            0
0          -sin(1*psi)   cos(1*psi)  0   0            0           0            0
0           0            0           1   0            0           0            0
0           0            0           0   cos(2*psi)   sin(2*psi)  0            0
0           0            0           0  -sin(2*psi)   cos(2*psi)  0            0
0           0            0           0   0            0           cos(3*psi)   sin(3*psi)
0           0            0           0   0            0          -sin(3*psi)   cos(3*psi)
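applying this rotation to one sample frame of the 8 channels (W, X, Y, Z, U, V, S, T) can be sketched in python (names are illustrative; psi in radians here):

```python
import math

def rotate_soundfield(channels, psi):
    """rotate the 8-channel ambisonic soundfield by psi around the
    vertical axis: only the sin/cos channel pair of each degree mixes;
    W and Z are invariant under head rotation."""
    W, X, Y, Z, U, V, S, T = channels
    c1, s1 = math.cos(psi), math.sin(psi)
    c2, s2 = math.cos(2 * psi), math.sin(2 * psi)
    c3, s3 = math.cos(3 * psi), math.sin(3 * psi)
    return (W,
            c1 * X + s1 * Y, -s1 * X + c1 * Y,   # 1st degree
            Z,
            c2 * U + s2 * V, -s2 * U + c2 * V,   # 2nd degree
            c3 * S + s3 * T, -s3 * S + c3 * T)   # 3rd degree
```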

rendering to a speaker-array

the speaker array used for reproduction should be arranged as symmetrically as possible, ideally on a spherical surface
of course, the speakers have to be calibrated to eliminate differences in level and delay; a spectral calibration is desirable too (by choosing appropriate speakers, equalizing)
generally speaking, the speaker signal LS for one speaker (azimuth alpha and elevation beta) is composed of the 8 ambisonic channels (W, X, Y, Z, U, V, S, T) as follows:
LS = G1*W*p1 + G2*[X*p2*cos(alpha)*cos(beta) + Y*p3*sin(alpha)*cos(beta) + Z*p4*sin(beta) + U*p5*cos(2*alpha)*cos(beta) + V*p6*sin(2*alpha)*cos(beta) + S*p7*cos(3*alpha)*cos(beta) + T*p8*sin(3*alpha)*cos(beta)]

the weights G1 and G2 depend on the number of speakers N; according to GERZON:
G1 = G2 = sqrt(8/(3*N))


thus resulting in a real-room-specific decoder matrix (eg: object a2s_coeff_wien)
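a python sketch of this decoding equation for a single speaker, with GERZON's gains built in (the function name, defaults and the tuning factors p1..p8 passed as a parameter are assumptions for illustration):

```python
import math

def speaker_signal(amb, alpha, beta, N=16, p=(1.0,) * 8):
    """decode the 8 ambisonic channels (W, X, Y, Z, U, V, S, T) to one
    speaker at azimuth alpha and elevation beta (radians), for an array
    of N speakers, with per-channel tuning factors p1..p8."""
    W, X, Y, Z, U, V, S, T = amb
    p1, p2, p3, p4, p5, p6, p7, p8 = p
    G1 = G2 = math.sqrt(8.0 / (3 * N))          # gains according to GERZON
    cb = math.cos(beta)
    return (G1 * W * p1
            + G2 * (X * p2 * math.cos(alpha) * cb
                    + Y * p3 * math.sin(alpha) * cb
                    + Z * p4 * math.sin(beta)
                    + U * p5 * math.cos(2 * alpha) * cb
                    + V * p6 * math.sin(2 * alpha) * cb
                    + S * p7 * math.cos(3 * alpha) * cb
                    + T * p8 * math.sin(3 * alpha) * cb))
```

evaluating this for every speaker of the array yields the rows of the decoder matrix.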

object ambisonic2speaker~

applies any decoder-matrix to decode the 8 ambisonic channels to 16 output channels

object chan2spk16~

decodes one channel to 16 output channels

object speaker~

outputs the output channels after applying level and delay calibration via a calibration matrix (eg: object spk_calibrate_wien)
a master gain is applied here too

miscellaneous


forum für umläute
Last modified: Thu Jul 27 14:45:27 CEST 2000