Volterra Series

Vito Volterra (1860-1940) was one of the founding fathers of functional analysis. At the turn of the twentieth century, he introduced the notion of functions of lines, which are defined over a space of functions, studied their derivatives, and obtained results on integral equations. In practice, a Volterra series is a polynomial functional expansion, similar to a Taylor series, that approximates weakly nonlinear systems. One of the first applications to nonlinear system analysis is due to Wiener in the 1940s, who developed a method for determining the nonlinear response to a white noise input. Nowadays, this approach is widely used for system identification in many domains, such as electrical engineering and the biological sciences.


Given an input space of signals X, an output space of signals Y, and a nonlinear input-output operator F : X \rightarrow Y, we look for a polynomial approximation of F. Under certain conditions, the Volterra series provides such an approximation:

\displaystyle \boxed{y(t) = \sum_{n = 1}^{\infty} \int_{- \infty}^{\infty}\cdots \int_{- \infty}^{\infty} h_n ( \tau_1 , \cdots , \tau_n ) x ( t - \tau_1 ) \cdots x ( t - \tau_n ) d\tau_1 \cdots d\tau_n}

where x ( t ) is the input, y ( t ) is the output, and each h_n ( \tau_1 , \cdots , \tau_n ) is called the n-th order Volterra kernel. This is equivalent to the operator formulation

\displaystyle y = F[x] = H_1 [x] + H_2 [x] + \cdots + H_n [x] + \cdots

where each H_n is called the n-th order Volterra operator

\displaystyle H_n [x] ( t ) = \int_{- \infty}^{\infty} \cdots \int_{- \infty}^{\infty} h_n ( \tau_1 , \cdots , \tau_n ) x ( t - \tau_1 ) \cdots x ( t - \tau_n ) d\tau_1 \cdots d\tau_n

The first order operator coincides with the standard convolution

\displaystyle H_1 [x](t) = \int_{- \infty}^{\infty} h_1 ( \tau ) x ( t - \tau ) d\tau

whereas the second order operator can be considered as a two-fold convolution

\displaystyle H_2 [x] ( t ) = \int_{- \infty}^{\infty} \int_{- \infty}^{\infty} h_2 ( \tau_1 , \tau_2 ) x ( t - \tau_1 ) x ( t - \tau_2 ) d\tau_1 d\tau_2

and so on for higher order operators.
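In discrete time, the integrals become sums over lags. The following sketch of a second-order truncated series assumes finite-memory kernels of length M, stored as plain Python lists h1[k] and h2[k1][k2], and an input that is zero before t = 0; the function name volterra2 is hypothetical.

```python
def volterra2(x, h1, h2):
    """Second-order truncated discrete Volterra series (illustrative sketch).

    y[t] = sum_k h1[k] x[t-k] + sum_{k1,k2} h2[k1][k2] x[t-k1] x[t-k2],
    assuming x[t] = 0 for t < 0 and kernels of common length M = len(h1).
    """
    M = len(h1)
    assert len(h2) == M and all(len(row) == M for row in h2)
    y = []
    for t in range(len(x)):
        # Lagged samples x[t], x[t-1], ..., x[t-M+1], zero before the signal starts.
        past = [x[t - k] if t - k >= 0 else 0.0 for k in range(M)]
        # First-order term: ordinary convolution.
        y1 = sum(h1[k] * past[k] for k in range(M))
        # Second-order term: two-fold convolution.
        y2 = sum(h2[k1][k2] * past[k1] * past[k2]
                 for k1 in range(M) for k2 in range(M))
        y.append(y1 + y2)
    return y
```

For example, volterra2([1.0, 2.0, 0.0], [1.0, 0.5], [[0.1, 0.0], [0.0, 0.0]]) gives approximately [1.1, 2.9, 1.0]. Higher-order terms would add one nested sum per order.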


The formulation of the Volterra kernels in the frequency domain is obtained by applying the Fourier transform, namely

\displaystyle \widehat{h_n}\left(\omega_1,\ldots,\omega_n\right)=\int h_n\left(\sigma_1,\ldots,\sigma_n\right) e^{-i \left(\omega_1 \sigma_1+\ldots+\omega_n \sigma_n\right)} d\sigma_1 \ldots d\sigma_n

The response to a multi-sinusoidal input x(t)=\sum_k x_k e^{i \omega_k t} is

\displaystyle H_n[x](t) = \sum_{i_1,\ldots,i_n}x_{i_1} \ldots x_{i_n} e^{i t\left(\omega_{i_1} + \ldots + \omega_{i_n}\right)} \int h_n\left(\sigma_1,\ldots,\sigma_n\right) e^{-i\left(\omega_{i_1}\sigma_1 + \ldots + \omega_{i_n} \sigma_n\right)} d\sigma_1 \ldots d\sigma_n

that is,

\displaystyle H_{n}[x](t)=\sum_{i_{1},\ldots,i_{n}}x_{i_{1}}\ldots x_{i_{n}}e^{i t\left(\omega_{i_{1}}+\ldots+\omega_{i_{n}}\right)}\widehat{h_n}\left(\omega_{i_{1}},\ldots,\omega_{i_{n}}\right)

We deduce the formulation in the frequency domain

\displaystyle \boxed{\widehat{y}(\omega)=\widehat{h_1}(\omega) \widehat{x}(\omega) + \sum_{\omega_i+\omega_j=\omega} \widehat{h_2}\left(\omega_i,\omega_j\right) \widehat{x}\left(\omega_i\right) \widehat{x}\left(\omega_j\right) + \cdots}
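The frequency-domain formula can be checked numerically. The sketch below is a discrete-time analogue, assuming a finite second-order kernel h2[k1][k2] (values chosen arbitrarily) and a two-sided multi-sinusoidal input, so that \widehat{h_2} becomes a finite sum; the helper names are hypothetical. The time-domain and frequency-domain routes should agree up to rounding, and the second-order output only contains the frequencies \omega_i+\omega_j.

```python
import cmath

def h2_hat(h2, w1, w2):
    """Discrete-time Fourier transform of a finite second-order kernel."""
    return sum(h2[k1][k2] * cmath.exp(-1j * (w1 * k1 + w2 * k2))
               for k1 in range(len(h2)) for k2 in range(len(h2[k1])))

def x_at(amps, freqs, s):
    """Multi-sinusoidal input x(s) = sum_k x_k exp(i w_k s)."""
    return sum(a * cmath.exp(1j * w * s) for a, w in zip(amps, freqs))

def h2_time(h2, amps, freqs, t):
    """H_2[x](t) from the lagged input samples (two-fold discrete convolution)."""
    return sum(h2[k1][k2] * x_at(amps, freqs, t - k1) * x_at(amps, freqs, t - k2)
               for k1 in range(len(h2)) for k2 in range(len(h2[k1])))

def h2_freq(h2, amps, freqs, t):
    """H_2[x](t) from the frequency-domain formula: one term per pair (i, j)."""
    n = len(freqs)
    return sum(amps[i] * amps[j] * cmath.exp(1j * t * (freqs[i] + freqs[j]))
               * h2_hat(h2, freqs[i], freqs[j])
               for i in range(n) for j in range(n))

h2 = [[0.5, 0.2], [0.1, -0.3]]
amps, freqs = [1.0, 0.5], [0.3, 1.1]
for t in range(5):
    print(t, h2_time(h2, amps, freqs, t), h2_freq(h2, amps, freqs, t))
```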


The mathematical foundations of the Volterra series can be established from the Stone-Weierstrass theorem. Below we outline the general principle, following the approach of Rugh's book.


The approximation of continuous functions by polynomials on a closed interval is guaranteed by the Weierstrass approximation theorem.

Weierstrass Theorem

Every continuous function f : [a,b] \rightarrow \mathbb{R} can be uniformly approximated by polynomials: there exists a sequence of polynomials \left(P_n\right) such that

\displaystyle \lim_{n \rightarrow \infty} \max_{x \in [a,b]} \left|P_{n}(x) - f(x)\right| = 0
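One classical constructive proof of the Weierstrass theorem, a technique not discussed above, uses Bernstein polynomials on [0,1] (a general interval [a,b] follows by an affine change of variable). The sketch below evaluates them with the standard library and estimates the uniform error on a sampling grid; the test function |x - 1/2| is an arbitrary choice.

```python
from math import comb

def bernstein(f, n, x):
    """Evaluate the degree-n Bernstein polynomial of f at x in [0, 1]."""
    return sum(f(k / n) * comb(n, k) * x**k * (1 - x)**(n - k)
               for k in range(n + 1))

def sup_error(f, n, grid=1001):
    """Approximate max |B_n f - f| on [0, 1] by sampling a uniform grid."""
    xs = [i / (grid - 1) for i in range(grid)]
    return max(abs(bernstein(f, n, x) - f(x)) for x in xs)

f = lambda x: abs(x - 0.5)   # continuous, but not differentiable at 0.5
for n in (4, 16, 64):
    print(n, sup_error(f, n))
```

For this f, the sampled error decreases roughly like 1/\sqrt{n}; convergence is slow but uniform, which is all the theorem asserts.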

The uniform topology ensures that the uniform limit of continuous functions is a continuous function. However, uniform approximation of continuous functions by polynomials is not always possible on a non-compact domain. For example, \sin(x) cannot be uniformly approximated by polynomials over \mathbb{R}: \sin(x) is bounded, whereas every non-constant polynomial is unbounded over \mathbb{R}, and every constant c satisfies \sup_{x \in \mathbb{R}} \left|\sin(x) - c\right| \geq 1. On a compact domain, by contrast, every polynomial is bounded, and the Weierstrass theorem guarantees uniform approximation.
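To illustrate why compactness matters, the sketch below fixes one polynomial, the degree-7 Taylor polynomial of \sin about 0 (an arbitrary choice), and measures its sampled uniform error on growing intervals [-L, L]. The error is tiny on [-1,1] but blows up as L grows, as it must for any non-constant polynomial.

```python
import math

def taylor_sin(x, degree=7):
    """Taylor polynomial of sin about 0, keeping odd powers up to `degree` (odd)."""
    return sum((-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
               for k in range(degree // 2 + 1))

def sup_error_on(L, degree=7, samples=2001):
    """Sampled max |sin(x) - P(x)| on [-L, L]."""
    xs = [-L + 2 * L * i / (samples - 1) for i in range(samples)]
    return max(abs(math.sin(x) - taylor_sin(x, degree)) for x in xs)

for L in (1, 5, 20):
    print(L, sup_error_on(L))
```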

In fact, the set \mathcal{P}\left([a,b]\right) of polynomials is a subalgebra of the Banach algebra \mathcal{C}\left([a,b]\right) of real-valued continuous functions over [a,b], equipped with the uniform norm. Thus, the Weierstrass theorem is equivalent to saying that \mathcal{P}\left([a,b]\right) is dense in \mathcal{C}\left([a,b]\right). More generally, we have the Stone-Weierstrass theorem.

Stone-Weierstrass theorem

Let X be a compact Hausdorff space, and \mathcal{A} a subalgebra of \mathcal{C}(X) which contains the constant function 1 and separates the points. Then \mathcal{A} is dense in \mathcal{C}(X).

The key aspect of this theorem is that \mathcal{A} must separate the points: if x_1,x_2 \in X are distinct, then there exists f \in \mathcal{A} such that f\left(x_1\right) \neq f\left(x_2\right). Otherwise, every function in the closure of \mathcal{A} takes the same value at x_1 and x_2, so continuous functions that distinguish these points cannot be approximated. For example, as soon as X has at least two points, the algebra of constant functions does not separate the points and is therefore not dense in \mathcal{C}(X). The polynomial algebra on [a,b], however, does separate the points, since p(x)=x already takes different values at distinct points.


In order to use the Stone-Weierstrass theorem, the operator F must be viewed as a real-valued continuous function on a compact space. We therefore choose X to be compact. Let X \subset L^2\left([0,T]\right) be a set of square integrable functions satisfying the following two properties:

(1) There exists a constant K > 0 such that for all x \in X

\displaystyle \int_{0}^{T} \left|x(t)\right|^2 dt \leq K

(2) For all \epsilon > 0, there exists \delta > 0 such that for all x \in X and |\tau| < \delta

\displaystyle \int_{0}^{T} \left|x\left(t+\tau\right)-x\left(t\right)\right|^2 dt < \epsilon

The resulting space X is compact (assuming it is also closed in L^2\left([0,T]\right)); the proof is given in Liusternik and Sobolev.
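Condition (2) is what excludes arbitrarily fast oscillations. The sketch below takes T = 1 and \tau = 0.01 (arbitrary values), extends each signal by zero outside [0,T] (an assumption, since the text does not specify this), and approximates both integrals with the midpoint rule for x_k(t) = \sin(2\pi k t / T). Every x_k satisfies (1) with K = T/2, but for fixed \tau the shift integral grows with k, so no single \delta works for the whole family: the set of all x_k is bounded but not compact.

```python
import math

def l2_sq(f, T, n=20000):
    """Midpoint-rule approximation of the integral of |f(t)|^2 over [0, T]."""
    dt = T / n
    return sum(f((i + 0.5) * dt) ** 2 for i in range(n)) * dt

def shift_gap(x, tau, T, n=20000):
    """Midpoint-rule approximation of the integral of |x(t+tau) - x(t)|^2 over [0, T].

    Assumption: x is extended by zero outside [0, T].
    """
    def xe(s):
        return x(s) if 0.0 <= s <= T else 0.0
    return l2_sq(lambda t: xe(t + tau) - xe(t), T, n)

T, tau = 1.0, 0.01
results = {}
for k in (1, 10, 50):
    xk = lambda t, k=k: math.sin(2 * math.pi * k * t / T)
    results[k] = (l2_sq(xk, T), shift_gap(xk, tau, T))
    print(k, results[k])
```

The first column stays near 0.5 = T/2, while the second grows from about 0.002 for k = 1 to nearly 2 for k = 50.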


Although F is not a real-valued function, it can be seen as one in the following way. Let Y=\mathcal{C}\left([0,T]\right) be the space of real-valued continuous functions over [0,T] with norm

\displaystyle \Vert y \Vert = \max_{t \in [0,T]} \left| y(t) \right|

Let F : X \rightarrow Y and P : X \rightarrow Y be two continuous, stationary and causal operators such that for all x \in X

\displaystyle \left| F(x) - P(x)\right|_{t=T} < \epsilon

There exists t_1 \in [0,T] such that

\displaystyle \max_{t \in [0,T]} \left|F(x)-P(x)\right| = \left|F(x)-P(x)\right|_{t=T-t_1}

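Hence, setting x_1(t)=0 on [0,t_1] and x_1(t)=x\left(t-t_1\right) on \left[t_1,T\right] (assuming x_1 \in X), stationarity and causality give F\left(x_1\right)(T)=F(x)\left(T-t_1\right) and P\left(x_1\right)(T)=P(x)\left(T-t_1\right), so that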

\displaystyle \Vert F(x)-P(x) \Vert = \left|F\left(x_1\right)-P\left(x_1\right)\right|_{t=T} < \epsilon

Therefore, F and P can be seen as real-valued functions X \rightarrow \mathbb{R}: it suffices to consider F(x) and P(x) at t=T, and the bound \left|F(x)-P(x)\right| < \epsilon then holds for all t \in [0,T], that is, \Vert F(x)-P(x) \Vert < \epsilon.
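The time-shift argument can be checked on sampled signals. The sketch assumes discrete time, takes the last sample as t = T, and uses two hypothetical causal, stationary operators, F(x)[t] = x[t] + 0.5\,x[t-1]^2 and its first-order part P(x)[t] = x[t]; the input samples are arbitrary. Delaying the input so that the worst error lands at t = T recovers the sup-norm error from a single instant.

```python
def F(x):
    """Hypothetical causal, stationary operator: y[t] = x[t] + 0.5 * x[t-1]**2."""
    return [x[t] + 0.5 * (x[t - 1] if t >= 1 else 0.0) ** 2 for t in range(len(x))]

def P(x):
    """Hypothetical first-order approximation of F: y[t] = x[t]."""
    return list(x)

def delay(x, t1):
    """x1[t] = 0 for t < t1 and x1[t] = x[t - t1] for t >= t1 (same length)."""
    return [0.0] * t1 + list(x[: len(x) - t1])

def err(x):
    """Pointwise error |F(x)[t] - P(x)[t]| over the window."""
    return [abs(f - p) for f, p in zip(F(x), P(x))]

x = [0.3, -1.2, 0.7, 2.0, -0.4, 1.1]
T = len(x) - 1                      # the last sample plays the role of t = T
e = err(x)
t1 = T - e.index(max(e))            # the sup norm is attained at T - t1
x1 = delay(x, t1)
print(max(e), err(x1)[T])           # sup over the window equals the error at T for x1
```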


The last step consists of defining the algebra \mathcal{A} of continuous, stationary and causal operators X \rightarrow Y. We take as generators the operator

\displaystyle P_1=1

and all operators

\displaystyle P_2(x)=\int_{0}^{t} h(\sigma) x(t-\sigma) d\sigma

The algebra \mathcal{A} is obtained by repeated addition, scalar multiplication and multiplication of the generators, which leads to operators

\displaystyle \begin{aligned} P(x) = {} & h_0 \\ & + \sum_{i=1}^{n_1} \int_{0}^{t} h_{1,i}\left(\sigma_1\right) x\left(t-\sigma_1\right) d\sigma_1 \\ & + \sum_{i=1}^{n_2} \sum_{j=1}^{n_3} \int_{0}^{t} \int_{0}^{t} h_{2,i}\left(\sigma_1\right) h_{3,j}\left(\sigma_2\right) x\left(t-\sigma_1\right) x\left(t-\sigma_2\right) d\sigma_1 d\sigma_2 \\ & + \cdots \end{aligned}

The stationarity and causality of these operators are immediate. Continuity is obtained by assuming that each h is square integrable over [0,T], and the algebra \mathcal{A} separates the points (see Rugh). The Stone-Weierstrass theorem then implies that any continuous, stationary, causal system F : X \rightarrow Y can be uniformly approximated on X by a continuous, stationary, causal polynomial system P : X \rightarrow Y. However, the compactness of X is quite restrictive in practice.
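Multiplying two first-order generators produces a second-order operator whose kernel is the product h_a(\sigma_1) h_b(\sigma_2), which is where the double integrals in P(x) come from. The sketch below checks this in discrete time, assuming finite causal kernels with arbitrary values and an input that is zero before t = 0; the helper names are hypothetical.

```python
def conv(h, x, t):
    """First-order (causal) operator: sum_k h[k] x[t-k], zero before t = 0."""
    return sum(h[k] * x[t - k] for k in range(len(h)) if t - k >= 0)

def second_order(h2, x, t):
    """Second-order operator: sum_{k1,k2} h2[k1][k2] x[t-k1] x[t-k2]."""
    M1, M2 = len(h2), len(h2[0])
    return sum(h2[k1][k2] * x[t - k1] * x[t - k2]
               for k1 in range(M1) for k2 in range(M2)
               if t - k1 >= 0 and t - k2 >= 0)

ha = [1.0, -0.5, 0.25]
hb = [0.2, 0.4]
# Separable kernel obtained from the product of the two generators.
h2 = [[a * b for b in hb] for a in ha]
x = [0.5, -1.0, 2.0, 0.3, -0.7]
for t in range(len(x)):
    product = conv(ha, x, t) * conv(hb, x, t)
    assert abs(product - second_order(h2, x, t)) < 1e-12
```

Sums of such separable kernels (with the constant and first-order terms) are exactly the elements of \mathcal{A} written out above.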


Consider the nonlinear system

\displaystyle H [ x ] = \frac{1}{10 + x}

for an input built as a superposition of pure frequencies 0.1, 0.4, 1, 1.7, 2.9, 5.2, 6.7, 8.9, 13.2, 16.4. The input is shown below.

The following figure shows the first-order approximation (in green) and the second-order approximation (in red) of the output (in blue); higher order contributions have been neglected. The accuracy improves as more Volterra kernels are included.
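Because H is memoryless, its Volterra kernels are products of Dirac deltas and the series reduces to the power series 1/(10+x) = \sum_{n \geq 0} (-1)^n x^n / 10^{n+1}, valid for |x| < 10; note the constant term h_0 = 1/10, as in the algebra above. The sketch below compares the first- and second-order truncations numerically. The tone amplitudes are not given in the text, so it assumes an amplitude of 0.5 per tone (hence |x| \leq 5), reads the frequencies as angular frequencies, and samples an arbitrary window; the numbers will therefore differ from the figures.

```python
import math

FREQS = [0.1, 0.4, 1, 1.7, 2.9, 5.2, 6.7, 8.9, 13.2, 16.4]  # frequencies from the text
AMP = 0.5  # hypothetical amplitude of each tone (not given in the text)

def x(t):
    """Hypothetical input: equal-amplitude sines, FREQS read as angular frequencies."""
    return sum(AMP * math.sin(w * t) for w in FREQS)

def H(u):
    """The memoryless system of the example."""
    return 1.0 / (10.0 + u)

def volterra(u, order):
    """Truncated series 1/(10+u) = sum_n (-1)^n u^n / 10^(n+1), n = 0..order."""
    return sum((-1) ** n * u ** n / 10.0 ** (n + 1) for n in range(order + 1))

def worst_error(order, t_max=50.0, steps=5000):
    """Sampled max |H(x(t)) - truncated series| over [0, t_max]."""
    ts = [t_max * i / steps for i in range(steps + 1)]
    return max(abs(H(x(t)) - volterra(x(t), order)) for t in ts)

for order in (1, 2):
    print(order, worst_error(order))
```

Since the remainder after order N is (1/10)(-x/10)^{N+1}/(1+x/10), each extra kernel multiplies the pointwise error by |x|/10, which is at most 0.5 under these assumptions.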


  1. The Volterra and Wiener Theories of Nonlinear Systems, Martin Schetzen, Krieger Publishing Company, 2006
  2. Theory of Functionals and of Integral and Integro-Differential Equations, Vito Volterra, Dover Publications, 2005
  3. Nonlinear System Theory, Wilson J. Rugh, The Johns Hopkins University Press, 1981, Web version, 2002
  4. Elements of Functional Analysis, Liusternik and Sobolev, 1961
