Using big data and genetic analysis to improve plant breeding around the world

The software changing how we grow staples

A range of biological and physical stressors need to be considered, especially with the increasing impact of those associated with climate change. A team of University of Wollongong statisticians at the National Institute for Applied Statistics Research Australia (NIASRA), led by Senior Professor Brian Cullis, is developing a new software platform, DWReml, to produce accurate and reliable information from the analysis of large data sets from breed selection experiments conducted across a range of locations and spanning multiple years.

These data help farmers improve their crops by precisely pinpointing the best plant varieties that either thrive in a wide range of conditions or are specifically suited to certain environments that are crucial for farmers' success. 


Senior Professor Brian Cullis joined UOW in 2011 after a 30-year career with the NSW Department of Primary Industries.

“The whole goal of my life started when I was in my early 20s when I was lucky enough to go to a talk by a Nobel Peace Prize winner named Norman Borlaug. He really inspired me.” 

Borlaug developed a high-yield, disease-resistant variety of wheat that is credited with saving more than a billion people worldwide from starvation. In the 1960s he worked on a United Nations-sponsored breeding program in Mexico, Pakistan, and India that advanced agricultural techniques and greatly improved food security. 

“He spent his life working purely and simply to provide better wheat growing capabilities to developing countries so that they can feed themselves. He spent his life working for public good,” Professor Cullis said. 

One of the landmarks of Professor Cullis’ career is a software product called ASReml – A spatial REML program which fits the linear mixed model using the method of “Average Information Restricted Maximum Likelihood”.

What is ASReml? 

ASReml is a statistical software package commonly used for fitting linear mixed models (LMMs) and other related statistical analyses. It is particularly popular in the fields of genetics, agriculture, and biology for analysing complex data sets with hierarchical or nested structures.
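
For readers who want the notation, a linear mixed model is conventionally written as shown here. This is the standard textbook form, not a description of ASReml's internals, and the symbols are the usual conventions rather than anything taken from the interview.

```latex
\begin{aligned}
\mathbf{y} &= \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{u} + \mathbf{e}, \\
\mathbf{u} &\sim N(\mathbf{0}, \mathbf{G}), \qquad \mathbf{e} \sim N(\mathbf{0}, \mathbf{R}), \\
\operatorname{var}(\mathbf{y}) &= \mathbf{Z}\mathbf{G}\mathbf{Z}^{\top} + \mathbf{R}.
\end{aligned}
```

Here y holds the observations (for example, plot yields), Xβ the fixed effects such as trial means, Zu the random effects such as variety effects, and e the residuals, with u and e assumed independent. Restricted maximum likelihood (REML) estimates the variance parameters that make up G and R.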

The software is frequently used in genetic research to analyse traits influenced by both genetic and environmental factors. It can be employed in quantitative genetic studies to estimate genetic parameters such as heritability, genetic correlations, and breeding values.
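
As a rough illustration of what "estimating genetic parameters" means, the sketch below computes variance components and a line-mean heritability for a tiny balanced trial. It is not ASReml's or DWReml's algorithm: it uses the simple ANOVA (method-of-moments) estimator, which for a balanced one-way design gives the same answer as REML whenever the estimates are non-negative. The variety names, yields and function names are all hypothetical.

```python
from statistics import mean


def variance_components(yields_by_variety):
    """ANOVA estimates for a balanced one-way random-effects model.

    Returns (genetic variance, error variance, replicates per variety).
    """
    groups = list(yields_by_variety.values())
    n_varieties = len(groups)
    reps = len(groups[0])
    if reps < 2 or any(len(g) != reps for g in groups):
        raise ValueError("this sketch assumes a balanced, replicated design")

    grand_mean = mean(v for g in groups for v in g)
    group_means = [mean(g) for g in groups]

    # Mean square between varieties and mean square within varieties.
    ms_between = reps * sum((m - grand_mean) ** 2 for m in group_means) / (n_varieties - 1)
    ms_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    ) / (n_varieties * (reps - 1))

    sigma2_error = ms_within
    # Truncate at zero, since a variance cannot be negative.
    sigma2_genetic = max((ms_between - ms_within) / reps, 0.0)
    return sigma2_genetic, sigma2_error, reps


def line_mean_heritability(sigma2_genetic, sigma2_error, reps):
    """Broad-sense heritability on a variety-mean basis."""
    return sigma2_genetic / (sigma2_genetic + sigma2_error / reps)


# Hypothetical yields (t/ha) for three varieties, two plots each.
trial = {
    "variety_a": [4.0, 5.0],
    "variety_b": [6.0, 7.0],
    "variety_c": [8.0, 9.0],
}

sigma2_g, sigma2_e, reps = variance_components(trial)
h2 = line_mean_heritability(sigma2_g, sigma2_e, reps)
print(f"genetic variance = {sigma2_g}, error variance = {sigma2_e}, heritability = {h2}")
```

In this toy example most of the variation sits between varieties rather than between plots of the same variety, so the heritability is high. Real multi-environment trials are unbalanced and involve far richer variance models, which is where REML software is needed.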

In short, it helps farmers decide what varieties to plant, or which animals to breed, that will best match the specific conditions of their farm. 

Professor Cullis said that the goal of a statistician is to turn data into reliable, usable information.

“ASReml has stood the test of time, it still is a leading software product, 30 years after it was developed. It's the software of choice for breeders to estimate genetic variance or genetic parameters so that you can identify causal genes – genes that are controlling yield, quality or disease resistance.”

Senior Professor Brian Cullis brings more than 30 years of industry experience to the project. Photo: Paul Jones

However, as with many technological tools, ASReml is becoming obsolete over time.

“More recently what has been killing ASReml is that it was developed in the ’90s, and now we have much bigger data sets.

“The world has experienced what people call the genomic revolution whereby we can take a sample of DNA to then take a profile of the genome and uncover single nucleotide polymorphisms (SNPs).”

SNPs play a vital role in shaping the genetic diversity of plant populations and influencing traits important for plant growth, development, and adaptation. Understanding these genetic variations is valuable for plant breeding, conservation, and the development of crops with improved characteristics.

“There are millions and millions of SNPs. We're now getting the ability to map genotyping at a very, very low cost, but it’s leading to millions of data points. 

“Breeders are wanting to throw more and more genotypic data at the ASReml platform because they're collecting more data sets over many years and locations in plant breeding – it's assembling massive amounts of data.

“People are absolutely drowning in this genotype data, and ASReml, being a 30-year-old technology, cannot cope. The model was dying a slow death because the technology and the hardware and the way that we'd written it was 30 years old. It needed to be updated.”

Wheat – a worldwide staple food 

Adaptation of a particular variety of wheat is extremely specific to the environment that it is grown in. 

Australian Grain Technologies (AGT) is one of Australia’s largest plant breeding companies, and a market leader in wheat genetics. They have supported Professor Cullis’ DWReml project with $140,000 towards research and development. 

Dr Adam Norman, Bioinformatics Coordinator and Wheat Breeder at Australian Grain Technologies, said, “In order for plant breeding programs to make genetic gain they require accurate, relevant and representative phenotype data on large genetically diverse breeding populations. These data underpin effective selection.

“From a plant breeder’s perspective it is extremely frustrating to know that they are limited to substandard practices due to computational constraints within existing software, as such large gains could be made if these constraints were lifted.

“The new software platform currently under development has shown itself capable of lifting these constraints and subsequently providing a sizable benefit to plant breeding programs and AGT is proud to be able to support this open-source software development project,” Dr Norman said.

Enter DWReml 

As the quantities of genomic data have exploded, Professor Cullis’ team has embarked on a new project called DWReml - Dependent Wollongong Restricted Maximum Likelihood. 

The big innovation with DWReml will be its capability to put together huge sets of data and analyse them quickly. It will also create ambitious models that effectively handle the ongoing variations in crop growth conditions across different parts of Australia.

“We’ve come up with smarter ways to speed up the computing process effectively using parallel computing improvement, iterative schemes, and methods for increasing capacity,” Professor Cullis said.

“Breeders don’t have eight weeks to analyse data – they’ll miss the sowing season. They need the information immediately to be able to make selections for planting next year.   

“We’re putting together bigger data sets than I had ever thought were possible, providing much more accurate information and turning it around in real time.” 

The feedback from beta testing has been resoundingly positive – some are calling the new platform revolutionary. 

“I did a job for a company who would ordinarily send their data away to the US and it would take months to process so they would not get results in time. I did an analysis for them and they were just blown away - they just could not believe it.

“The data set was three times the size of what they would normally send away and DWReml processed it in three days. 

“This product will be a game changer. People will be able to do things now that they never thought they would be able to do with genomic data. They'll be able to make decisions much quicker and more accurately and they'll have much more confidence in what they're doing.”

Photo: Paul Jones

Pacific Seeds, Australia’s leading seed supplier, has trialled the beta version of DWReml and has been equally impressed with the technology.

“The accelerated ability to process incredibly large data sets that were not possible before now will provide better genomic predictions and modelling with greater accuracy and higher confidence, leading to the release of more adapted varieties to the marketplace. The speed in which DWReml runs the analysis is a game changer, providing outputs in days not weeks. This will revolutionise how we approach the analysis each year,” said Junior Plant Breeder, Willow Liddle. 

“In addition to this, DWReml will have positive flow-on effects throughout the business as varietal advancement decisions will be able to be made earlier with more precision. All these benefits combined provide better outcomes across the whole Australian agricultural industry, and potentially have positive global implications with increased germplasm adaptation.”

Open source – feed the world 

One of Professor Cullis’ biggest disappointments of the ASReml journey was that the platform was privatised in the 1990s.

Inspired by Norman Borlaug, Professor Cullis’ lifelong goal is to make his software free.

“Unfortunately, because it's not free, a lot of these people in developing countries just don't have access to ASReml technology, and so they can't make the progress with breeding programs that we can in wealthier countries, and they don’t have the tools to improve food security. It's really terrible.

“I would love to distribute a free version to low-income countries. It’s something that I've always wanted,” Professor Cullis said. 

One of the strengths of the new DWReml is that it will be open-source software available to farmers all around the world to contribute data to and, most importantly, to benefit from its insights. 

“Our objective with DWReml was to make this open source so that everybody in the world can at least access it and hopefully make a difference in their local research and breeding programs.”

Factoring in the impact of climate change 

Another huge benefit of the new software will be its ability to inform decisions that are being made more complex by climate change. 

As climate data point to changes from region to region, farmers will be able to use DWReml as a sound inferential framework, using data from one region to make an informed decision about another.

AGT has shared a large dataset that includes information from various settings to explore how factors like rainfall, temperature, evaporation, soil conditions, and the presence of diseases affect plant growth. This dataset will help us understand how different plant varieties interact with their surroundings and environmental conditions. By analysing these data, breeders can make informed decisions when creating new plant crosses, considering the important factors influenced by climate change. The goal is to improve the success of breeding efforts in adapting plants to changing environmental conditions. 

