L.A. Smith. (1997) Proc International School of Physics ``Enrico Fermi", Course CXXXIII, pg 177--246, Societa Italiana di Fisica, Bologna, Italy.

All theorems are true. All models are wrong. And all data are inaccurate. What are we to do?

We must be sure to remain uncertain. In 1901, the year of Enrico Fermi's birth, it was well known that the sun could be only a few years old, inasmuch as a back-of-the-envelope calculation showed that even if the sun were made of the highest quality coal, its chemical energy and gravitational energy would both be exhausted well before the time-scales claimed by geologists. Newton's Laws had successfully prophesied the existence of Neptune from irregularities in Uranus's orbit, and the planet Vulcan had been observed (between Mercury and the sun) which might explain irregularities in Mercury's orbit. While Neptune is still with us, Vulcan was, perhaps, a misinterpreted sunspot. Throughout Fermi's lifetime, astrophysical phenomena and physical experiments, often by his hand, repeatedly did things which could not happen, at least according to the ``Laws of Physics" of the day. An unshakable belief in the applicability of those laws would have made progress impossible.

What has this to do with nonlinear dynamics and the analysis of time series? Nonlinear time-series analysis often resembles an experimental science: some technique is applied to a data set, an interesting observation is made, and a discussion ensues as to whether or not the observation is sound. Are we following Le Verrier in naming Vulcan in the hope of bringing Mercury's orbit closer into agreement with Newton's Laws, or are we following him in discovering Neptune? Is the uncertainty in the available data, or in our current understanding of the physics? In these lectures we will examine methods which aim to maintain our uncertainty rather than adopt unsubstantiated conclusions. Applications range from testing the reliability of an algorithm by analysing data of known origin, to propagating uncertainty in an initial condition under forecast models in order to examine the reliability of a particular forecast.

Nonlinearity plays a central role in data analysis, modelling, and predicting physical systems. We are often faced with questions like:

Are these two signals related?

Is there a deterministic/periodic component in this signal?

Did this data set originate from a strange attractor?

Is this system chaotic?

What is the ``limit of predictability" of this system?

Which is the better model for this system?

Our goal will be to examine the feasibility of answering these questions, rather than to demonstrate the current crop of algorithms for doing so.

Figure 1 shows two data sets with ``similar dynamics." Is there a causal connection between these two series? Most likely not. Is there a statistically significant relationship between just these two series? For almost any simple null hypothesis: yes. In these lectures, we will examine methods which attempt to quantify the significance of a variety of data analysis techniques in the context of nonlinear, perhaps chaotic, phenomena. There are limits, of course, to our ability to determine whether or not a given observation is significant. Sometimes we simply must require more data. The important thing is to remain uncertain!

One useful role for simple models is to help us maintain our uncertainty in the light of ``promising" results. The historical record of sunspots is one of the most studied time-series, and we will draw heavily from the work of Spiegel and Wolf, Weiss, and Casdagli et al. A wide-ranging report on the relationship between sunspots and a variety of phenomena can be found in Stetson, which includes a number of interesting (then) out-of-sample forecasts. Figure 2a shows the sunspot record while Figure 2b is a particular sample from the stochastic sunspot simulation of Barnes et al., which will be described in Section 4.1. How can we use this model to inform our uncertainty? Figure 3a shows a three-dimensional reconstruction of the sunspot data, produced with the techniques of Singular Spectrum Analysis (SSA), which is discussed in the references and the contribution of Ghil and Taricco to this volume. It has been observed that this view is reminiscent of the chaotic Rössler attractor: Is this observation evidence that the dynamics of sunspots are low-dimensional deterministic chaos? To try to find out, we may, for example, repeat the experiment with data from the Barnes model, which we know (by construction) is stochastic and hence does not display deterministic chaos. The result is shown in Figure 3b where again we recover structure reminiscent of the Rössler attractor. We conclude that such structure will occur in the analysis of any data set that ``looks like" those of Figure 2, whether it arises from a stochastic or from a deterministic process; hence this observation provides little additional information on the dynamical process governing sunspots. Our uncertainty is maintained.
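
The logic of this test can be sketched in a few lines. The example below is only an illustration under stated assumptions: it does not use the sunspot record or the Barnes model (neither is specified here), but a hypothetical noise-driven damped oscillator, `noisy_oscillator`, with a period of roughly 11 samples as a stand-in for a series known to be stochastic. The function `ssa_components` is a minimal form of SSA (a singular value decomposition of the matrix of delay windows); the window length of 20 is an arbitrary choice.

```python
import numpy as np

def ssa_components(series, window, n_components=3):
    """Project delay windows of a series onto its leading SSA directions."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    n_rows = len(x) - window + 1
    # Trajectory matrix: row i holds the window x[i], ..., x[i + window - 1].
    traj = np.stack([x[i:i + window] for i in range(n_rows)])
    # Singular values are returned in descending order.
    _, s, vt = np.linalg.svd(traj, full_matrices=False)
    variance_fraction = s**2 / np.sum(s**2)
    # Coordinates of each window along the leading right singular vectors.
    pcs = traj @ vt[:n_components].T
    return pcs, variance_fraction

def noisy_oscillator(n, period=11.0, damping=0.98, seed=0):
    """Hypothetical stochastic stand-in: a linear AR(2) process driven by noise."""
    rng = np.random.default_rng(seed)
    w = 2.0 * np.pi / period
    a1, a2 = 2.0 * damping * np.cos(w), -damping**2
    noise = rng.standard_normal(n)
    x = np.zeros(n)
    for t in range(2, n):
        x[t] = a1 * x[t - 1] + a2 * x[t - 2] + noise[t]
    return x

surrogate = noisy_oscillator(3000)
pcs, var_frac = ssa_components(surrogate, window=20)
print("variance captured by first three components:", var_frac[:3].sum())
```

Plotting the three columns of `pcs` against one another gives the analogue of a three-dimensional reconstruction. Any apparently ``attractor-like" structure in that plot arises from a process that is stochastic by construction, which is the point of repeating the analysis on data of known origin.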

In the following section, we introduce the basic framework for nonlinear dynamics. Dimensions and Lyapunov exponents are introduced and it is proven, by example, that chaos need not be difficult to predict. By chaos I shall mean deterministic chaos. A dynamical system is deterministic in the sense of Laplace when the future trajectory of the system is completely determined by the exact initial condition and the equations of motion. If the effective growth-rate of infinitesimal uncertainties is exponential in time, such a system is chaotic. This exponential-average-growth is reflected by positive Lyapunov exponents, but as illustrated in Section 2.2.1, positive Lyapunov exponents per se place no practical limits on predictability. Takens' Theorem is stated in Section 2.4, and the encouragement it provides for methods of reconstructing dynamics from data is discussed.
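
As a concrete illustration of a positive Lyapunov exponent, the sketch below estimates the average exponential growth rate of infinitesimal uncertainties for a one-dimensional map. The choice of the logistic map x -> 4x(1-x) is an assumption made here for illustration (it is not necessarily the example of Section 2.2.1), as are the function name `lyapunov_logistic`, the initial condition, and the run lengths. For this map the exponent is known analytically to be ln 2.

```python
import math

def lyapunov_logistic(r=4.0, x0=0.3, n_steps=100_000, n_transient=1_000):
    """Estimate the Lyapunov exponent of x -> r x (1 - x) along one trajectory."""
    x = x0
    # Discard a transient so the average is taken on the attractor.
    for _ in range(n_transient):
        x = r * x * (1.0 - x)
    total = 0.0
    for _ in range(n_steps):
        # |f'(x)| is the one-step growth factor of an infinitesimal uncertainty.
        slope = abs(r * (1.0 - 2.0 * x))
        # Guard against log(0) if the orbit lands exactly on x = 0.5.
        total += math.log(max(slope, 1e-300))
        x = r * x * (1.0 - x)
    # The time average of log|f'| is the Lyapunov exponent (natural-log units).
    return total / n_steps

estimate = lyapunov_logistic()
print("estimated exponent:", estimate, " ln 2 =", math.log(2.0))
```

The estimate is a global average along the trajectory. It says nothing about how uncertainty grows from a particular initial condition over a particular finite time, which is one reason a positive exponent alone places no practical limit on predictability.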

Section 3 contrasts the various meanings of ``prediction." In these lectures we are primarily concerned with forecasts either from data-driven models or from full simulations; the differences between developing the best model and extracting the best forecasts from a given model are explored, as are the extreme limitations of employing least-squares error criteria to define the best model. The initial condition is a different beast from an observation of the initial condition: observational data are never exact. Given the true initial condition of a chaotic system, the probability of an event is either zero or one. Determinism yields uniqueness. But given only an (inexact) observation, this probability may take on other values, even if our model is perfect. For this reason we are encouraged to make probabilistic forecasts even given good models of deterministic systems. If our models are not so good, the situation is even more interesting. Ensemble forecasts for perfect models, laboratory systems, and the Earth's atmosphere are discussed in Sections 3.5, 3.7 and 7.3 respectively. The relevant probability distribution functions (PDFs) often display complicated non-Gaussian structure. This makes model evaluation less trivial than taking the model with the least squared prediction error. Alternatives are discussed in Section 5.
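
The distinction between the initial condition and an observation of it can be made concrete with a minimal perfect-model ensemble. Everything specific in the sketch below is an assumption chosen for illustration: the logistic map as both system and model, Gaussian observational noise of standard deviation 0.001, the event ``x exceeds 0.5", and the names `iterate` and `ensemble_probability`. It is not the ensemble scheme of Sections 3.5, 3.7 or 7.3.

```python
import random

def logistic(x, r=4.0):
    return r * x * (1.0 - x)

def iterate(x, steps):
    """Evolve a single state forward under the (perfect) model."""
    for _ in range(steps):
        x = logistic(x)
    return x

def ensemble_probability(observation, obs_sigma, steps,
                         threshold=0.5, n_members=2000, seed=1):
    """Fraction of ensemble members for which the event x > threshold occurs."""
    rng = random.Random(seed)
    hits = 0
    members = 0
    while members < n_members:
        # Sample an initial state consistent with the inexact observation.
        x0 = rng.gauss(observation, obs_sigma)
        if not 0.0 < x0 < 1.0:
            continue  # reject states outside the map's domain
        members += 1
        if iterate(x0, steps) > threshold:
            hits += 1
    return hits / n_members

truth = 0.3                  # the true initial condition, unknown in practice
observation = truth + 0.001  # a hypothetical inexact observation of it
for steps in (1, 5, 20):
    outcome = iterate(truth, steps) > 0.5  # for the truth: either True or False
    p = ensemble_probability(observation, obs_sigma=0.001, steps=steps)
    print(steps, outcome, p)
```

At short lead times every member agrees and the forecast probability is zero or one. At longer lead times the ensemble spreads over the attractor and the probability takes intermediate values, even though the model is perfect and the system deterministic.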

In practice, it is often the case either that we do not understand the underlying physics of a system well enough to build first-principles models, or that such models would be too complex to be deployed. If we are lucky enough to have a great deal of data from such a system, the techniques of Section 3 can be used to reconstruct its dynamics directly from the data. But how can we know that we have ``a great deal of data"? Section 4.1 begins with the presentation of tests for data sufficiency and the robustness of scaling exponent estimates, and concludes by suggesting tests for the self-consistency of dynamical models.

There are many applications we pass over without comment, and the nonlinear filtering of signals [9,10] was almost one of these. The study of nonlinear systems, like the systems themselves, has too many interesting degrees of freedom. It is important to keep the driving question in sight, and distinguish between the distinct goals of studying a phenomenon, testing an algorithm, analysing a data set, and making the best forecast given the current state of the art(s).

For those who read only introductions while scanning figure captions, the gist of these lectures is (1) that statistics play an important role in helping us recognise the shortcomings of data analysis, and a dubious role in locating strengths; (2) that algorithms should be tested to destruction, so that at least some of their weaknesses are learned; (3) that tests of self-consistency are more accessible than tests of absolute truth, which is unsurprising if we consider even the ``Laws of Physics" as the analogies of physics while we probe their limitations; and (4) that truly deep insights can only be supported by data not considered in the analysis. Until such data are obtained, we must remain uncertain, if hopeful. Regardless of the level of statistical skill and physical insight at hand, and regardless of the high level of statistical significance at which, for example, two data sets can be shown unlikely to be unrelated, promising results often evaporate given a glimpse of out-of-sample data. In the case of sunspots and the number of Republicans in the Senate, additional data can be obtained; contrast Figures 2 and 22. As noted by Robert Boyle in the quotation that introduces Section 4, this was the case 300 years ago. And it will most likely be the case 300 years hence. Yet we may hope to reduce, in both magnitude and number, the disappointment of our expectations through the careful maintenance of our uncertainty.

E-mail: lenny@maths.ox.ac.uk

Last updated: 14 Feb 2001