### MSc project ideas - to be supervised by Keith Briggs 2008

I work at BT Research in Martlesham. The ideal arrangement is for you, the MSc student, to work here for all or most of your project. We are at a very lively technology park, and you would have at least daily meetings with me and very good computing facilities (i.e. linux). But if this is not feasible, we can negotiate another arrangement.

These are only rough sketches for projects. I find that it is better to start this way, and then fine-tune the detail to suit the student's abilities and interests. I prefer to leave some open-endedness, to provide a research challenge. A good MSc project can result in a publication.

I like mathematical computing. Most of these projects have that theme.
The emphasis is on the concrete. This way of working often involves
*experimental mathematics*, in which we treat the computer as a
laboratory for developing and testing conjectures. This need not mean the
work lacks rigour - some algorithms on which we would work would involve only
integer arithmetic, and thus the results are exact. Even when floating-point
arithmetic is needed, there are more reliable ways than hardware arithmetic
available. So, these projects do not generally involve conventional
numerical analysis.

I can probably only supervise 2 (maybe 3) of this projects at one time. So if you are interested, contact me asap!

For past projects I have supervised, see here.

#### York presentation 2008 Jan 30

pdf | pdf with pause points | pdf 3x3 single-sheet handout

#### Bath presentation 2007 Dec 12

pdf | pdf with pause points | pdf 3x3 single-sheet handout

#### 1. Semidefinite programming and graph theory

Semidefinite programming (SDP) is a kind of generalized linear
programming. Many practical optimization problems can be formulated
as SDPs. There has been rapid progress in the last 20 years on
understanding the theory of SDP, but the development of SDP software
that is convenient to use has lagged behind. Good software is now
available. This project would use sdpsol for small test problems, and
then move to DSDP for large problems. The aim of the project is to
see what performance one can achieve on real graph theory problems of
the types that come up in network modelling. Such problems include
the Lovasz theta number (a lower bound for the chromatic number), the
maximal stable set problem, and the maximum cut problem. An important
outcome of the project would be a determination of how the computation
time scales with problem size. Geometric optimization problems could
also be studied.
Skill in C programming is needed. This would be a good project for
someone wanting a career in practical optimization.

#### 2. Singular value decomposition updating

The Singular value decomposition (SVD) of a matrix is a basic computation in numerical linear algebra, with many applications such as in statistics and signal processing. Updating the SVD usually means recomputing the factors after adding a row or column, and algorithms for this have been much studied.

However, I have an application in signal processing, where only the approximate SVD is required, but it is required to rapidly update the factors after a small change in the matrix (perturbation theory, if you like). I have an idea how to do this, and the project would be to develop the idea and implement it in software.

#### 3. Singular value decomposition methods for Lyapunov exponents

Lyapunov exponents are basic quantities in dynamical systems theory; roughly speaking they quantify the average divergence of orbits. Standard methods use QR or SVD decompositions, and need the local linearization of the flow. However, a method proposed by Dieci and Elia shows how to get the singular values much more directly, without performing a full SVD on each time-step.

This project would improve on this by using *automatic differentiation* (AD) techniques to compute the local linearizations. I have much experience in AD and can supply software for this step.

Skill in C++ programming is needed.

#### 4. Automatic differentiation (AD) methods for ODE sensitivity analysis

ODEs are routinely solved numerically by Runge-Kutta and similar methods.
But often we would like to know the derivatives with respect to initial values
and with respect to parameters, of the solution. This means solving another
linear system of ODEs along the solution. It can be tedious to calculate this
linear system by hand, but fortunately it is possible to automate this with AD
techniques.

This project would develop software for this purpose and evaluate its
efficiency. C++ programming is needed (as operator overloading is necessary).

#### 5. Fast random selection

Consider the problem: a large number *N* of objects are presented to us one by one, and we wish to either:

Select exactly some specified number *n≤N* of them
Select each with some specified probability *p*
The problem becomes hard to solve efficiently when *p* is small,
because to generate a random number which usually results in a rejection is
inefficient. It would be better to skip over a block of items. Ways to do
these were proposed by Vitter in ACM Trans Math Soft 13, 58 (1987).

This project would investigate these methods and check their efficiency in
practice. There are applications to the generation of large random graphs.

#### 6. Random sampling of set partitions

A partition of [n]={0,1,...,n-1} is an assignment each element to a subset called a block, without regard to the labelling of the block, or order of elements within a block. The block structure of a partition may be indicated by listing the elements and separating blocks with a symbol such as |.
Thus for [5], 01|2|34 and 2|10|43 are the same partition.

If there are k blocks, we speak of a k-partition. The number of k-partitions of [n] is given by the Stirling number of the second kind S(n,k), which can be computed from the recursions S(n,k)=S(n-1,k-1) + kS(n-1,k), S(n,1)=S(n,n)=1.
The total number of partitions of [n] is given by the sum over k of S(n,k), and this is called a Bell number B(k).

There are algorithms for generating all partitions in a Gray code order, due to Ehrlich and Ruskey. In such a sequence, only one element moves to a new block on each step. Beyond about n=15, there are too many partitions to allow us to construct all of them in reasonable time.

Thus, for large sets, simulations must rely on a uniform random sampling procedure. This project would investigate such methods and produce a well-designed C library for use in other projects.

#### 7. Fast counting

This project is inspired by the paper HyperLoglog: the analysis of a near-optimal cardinality estimation algorithm of Flajolet et al. How do we estimate (rapidly, and with minimal storage) the number of distinct items in a very large set (or data stream)? In other words, the problem concerns estimating the cardinality of a multiset. Flajolet et al. previously developed their *loglog* algorithm, and I worked on an efficient C implementation of this. They have now improved this with the *hyperloglog* algorithm, and this project would be to implement this and compare its performance in practice with alternatives.

The algorithm should have practical applications in informatics; for
example, counting the number of different packet types in a network.

#### 8. Random sampling of unlabelled structures

This project is inspired by the paper Boltzmann
Sampling of Unlabelled Structures of Flajolet et al. I quote from their
abstract: "Boltzmann models from statistical physics, combined with methods
from analytic combinatorics, give rise to efficient algorithms for the
random generation of unlabelled objects. The resulting algorithms
generate in an unbiased manner discrete configurations that may have
nontrivial symmetries, and they do so by means of real-arithmetic
computations."

This project would develop software for some of these methods, and measure
its efficiency in practice. There are important applications in statistical
computing.

#### 9. Convex optimization in python

Convex optimization is
very important in many application areas, and is also theoretically very
elegant. But these techniques are not used as often as should be in
practice because of the lack of software which satisfies all the requirements
of being free and open-source (which rules out matlab-based software),
easy-to-use, powerful, and well designed. This seems to be changing with
the python-based CVXOPT and CVXMOD.

This project would survey the field of convex optimization and investigate
and evaluate this software. We would hope to try applications in the field
of radio technology.

#### 10. Geographical computations

In this project, I developed an algorithm for computing intervisibility of points on the earth's surface, using elevation data from a shuttle flight. This project would take this further, to compute the entire region visible from a given point. Computer graphics would be used to draw the region. There is actually some real maths here - we have to handle nonlinear 3d coordinate systems.

The application is to the historical study of ancient defence systems.

#### 11. Statistics of Roman roads in Britain

I have started some computations in this direction here.

The Romans built roads all over Britain, and I would like some statistical characterization of the network - the diameter of the graph, the distribution of distances to the nearest road, and so on.

I have a database, and a major part of this project would be to classify all the roads, probably as major, minor and doubtful. We would want separate statistical information for each category. The maths involved is interesting computational geometry, and fast algorithms are needed for things like the point-in-polygon test, and for distance to the nearest of a given finite set of straight line segments.

#### 12. Name-fashion dynamics

A half-baked idea - needs more development!

Given names follow fashion, and only fashion. While some names (John, Susan) stay popular for long periods, typically "new" names (Ryan, Grace) shoot up in popularity, and then die off after a year or two.

I have a database of the 100 most popular names in England from 1994 to 2006.The project would be to see if there is a "universal" popularity curve; that is, can the curves be linearly scaled to be the same shape?

Amazingly, data is also available for Suffolk in the 14th century (when for
example, Edmund suddenly became popular). Did fashions then follow the same
sort of trends?