Saturday, November 30, 2019

Special Event - Big Data Challenges in Physics

This month's meetup was a special event jointly organised with the Institute of Physics, which has been focussing on the theme of big data in 2019.


Dr Craig McNeile's slides are here: (pdf). A video of his talk is on the group's youtube channel: (video).

Prof Robert J Hicken's slides are here: (pdf). A video of his talk is on the group's youtube channel: (video).


Talk 1: Using Supercomputers to Search for the Breakdown of the Standard Model of Particle Physics

Dr Craig McNeile, a Lecturer in Theoretical Physics at the University of Plymouth, gave an introductory overview of some of the big open questions in particle physics.


Craig started by clearly setting the scene and the problem. Observations of how the universe moves suggest there is far more matter out there than the light we can detect would indicate.


In fact, the matter we can observe accounts for only about 5% of the universe's total mass-energy content. The invisible remainder includes dark matter - it doesn't interact with electromagnetic radiation, so we can't see it even with instruments that record X-rays or infrared.

That dark matter must be composed of sub-atomic particles that we haven't yet discovered, so the search for dark matter is, at heart, a search for these particles.

Scientists have, over the years, discovered many particles, most recently by smashing particles together at high energy to see if new particles are created from that energy. The higher the energy, the greater the chance of creating more massive particles. But higher-energy colliders are more expensive: the Large Hadron Collider at CERN cost just under £4 billion to construct.

Detecting particles sounds simple but is a complex process of deduction from the vast amounts of data generated by detectors around a collision. The calculations required to establish the presence of a particle with sufficient confidence demand supercomputing resources.

To guide the search, we need theories that predict how matter is composed and organised. The Standard Model describes all known particles, and all interactions (weak, strong, electromagnetic) except gravitation. Because it is a highly symmetric model, it describes particles that were not observed initially but were found years after they were predicted. The Higgs boson is a recent notable example.

The Standard Model doesn't predict dark matter - so new theories that extend it are needed, so-called "string theory" being one candidate. QCD, quantum chromodynamics, is the Standard Model's theory of the strong interaction, and simulating it is where the heavy computation in Craig's work comes in.

The search for new particles is complicated by the Heisenberg uncertainty principle: for a short period of time, particles can pop into existence and then disappear - fluctuations in the vacuum. These short-lived particles do have an effect on longer-lived particles, and that effect becomes significant when we're sifting through observations for the subtle signatures of new particles.


Modelling the theoretical dynamics of QCD, so they can be compared with observations, is mathematically involved, and in practice is done using Monte Carlo random sampling, which makes otherwise intractable calculations feasible.
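To give a flavour of how this works, here is a minimal Metropolis Monte Carlo sketch. It is emphatically not Craig's code: it samples the discretised path integral of a one-dimensional harmonic oscillator, a standard toy problem, rather than four-dimensional QCD.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy 1D lattice: the Euclidean action of a harmonic oscillator,
# standing in for the vastly larger four-dimensional QCD action.
N = 100              # lattice sites
a = 0.5              # lattice spacing
m, omega = 1.0, 1.0  # mass and frequency

def action(x):
    # Discretised Euclidean action: kinetic plus potential terms,
    # with periodic boundary conditions via np.roll.
    kinetic = m * (np.roll(x, -1) - x) ** 2 / (2 * a)
    potential = a * m * omega ** 2 * x ** 2 / 2
    return np.sum(kinetic + potential)

def metropolis_sweep(x, step=0.5):
    # Propose a local change at each site and accept it with
    # probability min(1, exp(-dS)): the Metropolis rule.
    for i in range(N):
        old, s_old = x[i], action(x)
        x[i] += rng.uniform(-step, step)
        if rng.random() >= np.exp(min(0.0, s_old - action(x))):
            x[i] = old  # rejected: restore the old value
    return x

x = np.zeros(N)
for _ in range(100):   # thermalisation sweeps
    x = metropolis_sweep(x)

samples = []
for _ in range(200):   # measurement sweeps
    x = metropolis_sweep(x)
    samples.append(np.mean(x ** 2))
print("<x^2> estimate:", np.mean(samples))
```

Real lattice-QCD programs use the same accept/reject logic, but over a four-dimensional lattice of matrix-valued variables, and evaluating the action is what consumes the supercomputer time.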

These discrete calculations are huge and require a distributed computing architecture. Craig indicated how GPUs are particularly effective, and how compute nodes use MPI and other message-passing protocols to coordinate the distributed computation. Computations spread over 2,000 cores are fairly common!
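As a hypothetical sketch of that message-passing style (using mpi4py, and not the project's actual code), each process can work on its own share of the sampling while a reduction combines the partial results:

```python
# Run with, for example: mpiexec -n 4 python sketch.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()   # this process's id
size = comm.Get_size()   # total number of processes

# Each rank draws its own samples, seeded differently...
rng = np.random.default_rng(seed=rank)
local_sum = rng.normal(size=1_000_000).sum()

# ...and a reduction gathers the partial sums on rank 0.
total = comm.reduce(local_sum, op=MPI.SUM, root=0)
if rank == 0:
    print("mean over all ranks:", total / (size * 1_000_000))
```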

Craig discussed how significant effort goes into optimising code to take advantage of the hardware. He suggested that a compute efficiency of 38% of the theoretical peak was achieved, which indicates how difficult it is to make full use of the available hardware's capabilities.
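As a back-of-envelope illustration of what a percentage of theoretical peak means, with made-up hardware numbers rather than figures from the talk:

```python
# Hypothetical hardware, purely to illustrate the calculation.
cores = 2000            # a job size like those Craig mentioned
clock_ghz = 2.5         # clock speed per core
flops_per_cycle = 32    # e.g. wide-vector fused multiply-adds

peak_tflops = cores * clock_ghz * flops_per_cycle / 1000
achieved_tflops = 0.38 * peak_tflops   # the 38% Craig quoted
print(f"theoretical peak: {peak_tflops:.0f} TFLOP/s")
print(f"achieved at 38%:  {achieved_tflops:.1f} TFLOP/s")
```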

A real problem is that optimising code for particular hardware is a human-intensive and expensive task, and the benefits don't last, as new and better hardware architectures emerge regularly. Portability of code across computers, particularly while preserving compute efficiency, is a hard problem.

Craig also gave us an insight into the challenges of data storage, organisation, and transport. The datasets are typically large, at petabyte scale, and computation needs the data to be local to the compute nodes. This means it is often being transferred across relatively slow networks.
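A quick calculation, with assumed link speeds rather than figures from the talk, shows why this hurts:

```python
# How long does it take to move 1 PB across a network?
petabyte_bits = 1e15 * 8                  # 1 PB in bits
for gbit_per_s in (10, 100):
    seconds = petabyte_bits / (gbit_per_s * 1e9)
    print(f"1 PB over {gbit_per_s} Gbit/s: ~{seconds / 86400:.1f} days")
```

Even a sustained 100 Gbit/s link needs the best part of a day to move a single petabyte, so placing data near the compute matters.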

Craig underlined the importance of visualisation in making sense of the data and computation results. He showed a visualisation of the effect on the vacuum between a quark and an anti-quark as they are separated, something which is not so easily understood merely by looking at the numerical results of the simulations.


Interestingly, Craig explained how the economics of high performance computing mean that building your own cluster is cheaper than trying to achieve similar performance with consumer clouds such as AWS, Azure or Google.

The DiRAC academic project is a collaboration between universities and includes systems with thousands of nodes, e.g. Cambridge with 1,152 Skylake nodes and Edinburgh with 4,116 Xeon processors!

Despite the huge theoretical effort and large sums of money invested in experiments and computational power, the search for dark matter continues!

One member of the audience asked an interesting question: why search for something for which the evidence and clues seem so thin and subtle? In my opinion, the search for truth is an inherent human motivation, particularly when that truth is mathematically beautiful.


Talk 2: The Challenge of Storing Ever Bigger Data

Prof Robert J Hicken, Professor of Condensed Matter Physics at the University of Exeter, gave an overview of the challenge of storing and retrieving ever larger volumes of data, and of the need for new technologies that can scale to billions of users while being faster and more energy efficient.


Rob set the scene with figures showing the huge amounts of data we store, on the scale of exabytes. Billions of people are uploading videos and photos constantly, and demand that these are stored for potential later retrieval!

The volume of data isn't the key problem on its own; the energy required to organise and retrieve that data is also significant.

"One Google search uses ~ 1 kJ of energy (equivalent to running a 60 W light bulb for 60s), and Google carries out more than 1 trillion searches per year"

This is a huge amount of energy when we consider how many Google searches happen globally in any time interval.
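Taking the quoted figures at face value, the annual total is easy to estimate:

```python
joules_per_search = 1e3          # ~1 kJ, as quoted
searches_per_year = 1e12         # >1 trillion, as quoted
total_joules = joules_per_search * searches_per_year   # 1e15 J
gwh = total_joules / 3.6e12      # 1 GWh = 3.6e12 J
megawatts = total_joules / 3.156e7 / 1e6   # averaged over a year
print(f"~{gwh:.0f} GWh per year, a continuous draw of ~{megawatts:.0f} MW")
```

On those numbers, search alone consumes roughly 280 GWh a year, equivalent to a continuous draw of around 32 MW.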

Perhaps surprisingly, spinning-platter hard disks (HDDs) maintain an edge over solid-state drives (SSDs) in terms of the density of data per unit area. This is important in large installations such as cloud-scale data centres. HDDs also remain cheaper per unit of data stored.


As demand for storage grows we need to explore how this technology can either evolve or change more drastically.

Rob's research has this motivation at heart - greater storage density, at lower energy consumption, and faster where possible.

Rob took us back to basics, explaining how traditional storage (tape and HDDs) uses materials which can be locally magnetised in one of two directions to denote the binary bits, 1 and 0.


Surely, the smaller these regions, the more data we can store per unit area?

The problem is that the smaller the 'grains', the more susceptible they become to thermal noise. This could be addressed by using materials that require more energy to switch between 1 and 0, but that means we haven't addressed the energy efficiency challenge.

Rob summarised the trade-offs as a triangle, a trilemma of storage density, writability and thermal stability: we can have only two of the three.
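To make the thermal-stability corner concrete: a common figure of merit is the ratio of a grain's anisotropy energy, K_u·V, to the thermal energy, k_B·T, with a rule of thumb that it should stay above roughly 60 for ten-year retention. A sketch with illustrative numbers, not values from Rob's slides:

```python
import math

k_B = 1.380649e-23   # Boltzmann constant, J/K
T = 300.0            # room temperature, K
K_u = 3.0e5          # anisotropy energy density, J/m^3 (illustrative)

for d_nm in (12, 9, 6):
    r = d_nm * 1e-9 / 2
    V = (4 / 3) * math.pi * r ** 3        # grain volume, m^3
    delta = K_u * V / (k_B * T)           # stability factor
    print(f"{d_nm} nm grain: stability factor ~ {delta:.0f}")
```

Because volume falls with the cube of the grain size, the ratio collapses quickly as grains shrink; compensating with a higher-anisotropy material restores stability but raises the field, and hence the energy, needed to write. That is the trilemma.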


Rob then explained that this limiting triangle can be addressed by new paradigms for the data recording process.

One approach is heat-assisted magnetic recording, HAMR, which lowers the energy barrier to writing data. Another, microwave-assisted magnetic recording, MAMR, uses microwave excitation to similar effect.

Rob explained that Seagate will be manufacturing HAMR heads in the UK, and showed a video illustrating how the required precision is achieved using a planar fibre-optic guided laser that focuses energy via a gold peg.


This is an example of new and relatively advanced techniques being industrialised and commercialised to meet demand.

Rob showed a roadmap chart with HAMR in use for the period 2019-2023, beyond which a heated-dot method is predicted until 2025. Beyond that is an open question.

One area of excitement is the ultrafast demagnetisation effect, where an ultrashort laser pulse of 100 femtoseconds caused a halving of the magnetisation in a nickel film in less than 1 picosecond.


This is far faster than any other observed change in magnetic state, and is a very promising basis for future storage technologies. Typical data storage and retrieval times today are about a nanosecond, even though data can be transmitted over fibre optics at gigabits per second.
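The gap between those timescales is easy to quantify:

```python
switching = 1e-9      # conventional magnetic switching, ~1 ns
demag = 1e-12         # ultrafast demagnetisation, < 1 ps
pulse = 100e-15       # the triggering laser pulse, 100 fs
print(f"demagnetisation: ~{switching / demag:.0f}x faster than ns switching")
print(f"laser pulse: ~{switching / pulse:.0f}x shorter than a nanosecond")
```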

Widening the view, Rob shared phenomena in which circularly polarised optical pulses also produced very fast changes in magnetisation in materials. Such light could lead to even more energy-efficient storage, and would also do away with the need to manage the heat introduced by the HAMR technique.


Thoughts

The evening was a special event with invited scientists, and this attracted a larger audience. The broader impact is that science is brought closer and more directly to the community, and this is always a good thing: not only for the engagement between academia, industry and community, but also because it makes science a more realistic career option for many.

For this reason I was particularly pleased to see younger members of secondary school age at the event!


As Toby of Headforwards also stated, any successful region needs a healthy, vibrant and active science and technology scene. It makes the region more externally visible, attracts investment, and provides pathways to high-skill careers.

More fundamentally, it allows people to meet and this real-life social network is the basis of all successful ecosystems and economies. And supporting this is the aim of Data Science Cornwall.

Personally I was very pleased to have strengthened links with the Institute of Physics, and hope to do more with them in the region.


Acknowledgements

This event was jointly organised with the Institute of Physics, the leading professional body and learned society for physics in the UK and Ireland.

The event was also supported by the Cornwall Science Community, formerly the Cornwall branch of the British Science Association.

Ongoing advice and refreshments sponsorship were kindly provided by Headforwards.

Thanks also to Falmouth University for continuing to host us.