The surface normal, $N$, is defined as $$N = - {\nabla \tau \over |\nabla \tau|}$$
where $$\nabla \tau = ( {\partial\: \tau \over \partial\: x}, {\partial\: \tau \over \partial\: y}, {\partial\: \tau \over \partial\: z} )$$
The surface normal is usually pre-computed and stored in a texture for lookup by the fragment shader. As with the lookup of $\tau$, the lookup of $N$ uses tri-linear interpolation by the texture unit.
There are several ways in which the gradient can be computed. We'll look at computing the ${ \partial\: \tau \over \partial\: x}$ term of $\nabla\tau$, but the other two terms are similar.
Recall that $${ \partial\:\tau \over \partial\: x}(x,y,z) = \lim_{\Delta x \rightarrow 0} {\tau(x+\Delta x,\ y,\ z) - \tau(x,\ y,\ z) \over (x + \Delta x) - (x)}$$
This method uses a finite $\Delta x$: $${ \partial\:\tau \over \partial\: x}(x,y,z) = {\tau(x+\Delta x,\ y,\ z) - \tau(x,\ y,\ z) \over \Delta x}$$
The offset, $\Delta x$, is the distance between texels in the 3D texture, $\tau$.
"Staircasing" artifacts are quite visible where the surface is nearly parallel to one of the axes of the volume:
This method centres the difference around $(x,y,z)$ in the $x$ direction: $${ \partial\:\tau \over \partial\: x}(x,y,z) = {\tau(x+\Delta x,\ y,\ z) - \tau(x-\Delta x,\ y,\ z) \over 2\ \Delta x}$$
This results in less "staircasing" and slightly better accuracy at $(x,y,z)$:
This method centres the difference around $(x,y,z)$ and adds lower-weight contributions from around the line in direction $x$, essentially averaging around the $x$ direction: $${ \partial\:\tau \over \partial\: x}(x,y,z) = \displaystyle \sum_q \mathrm{weight}(q)\ \tau(q)$$
for 27 positions $q$ around $(x,y,z)$: $$q\ \ \in\ \ \{ x-\Delta x,\ \ x,\ \ x+\Delta x \} \times \{y-\Delta y,\ \ y,\ \ y+\Delta y\} \times \{z-\Delta z,\ \ z,\ \ z+\Delta z\}$$
Below, the position of $(x,y,z)$ is shown as the dot in the middle and the weights at each position, $q$, around $(x,y,z)$ are shown as $0$, $a$, $b$, $c$, and their negatives. The weight at $(x,y,z)$ is not shown, but it is zero.
For the Zucker-Hummel method, the weights are $$a = {1 \over \sqrt{3}},\ \ b = {1 \over \sqrt{2}},\ \ c = 1$$
This give good gradients:
The Sobel method also looks at the 27 positions around $(x,y,z)$, but uses the weights $$a = \frac16,\ \ b = \frac12,\ \ c = 1$$
This appears to give even smoother gradients while preserving details:
Note that the central difference method, above, uses the weights $$a = 0,\ \ b = 0,\ \ c = 1$$
The Sobel method seems to give the best appearance. It's expensive to compute (as is the Zucker-Hummel method) but that does not matter if the computation is done before rendering.