Unpacking the genome


Every human cell contains more than two metres of twisted, tightly packed DNA, so switching on the right genes at the right time is a major challenge.

Virtually all the cells in your body share the same set of genetic instructions – around 20,000 genes, encoded in long strands of DNA called chromosomes. But not all your cells are the same. Different types of cells need to use specific sets of genes so they can carry out their particular functions in the body. For example, a liver cell needs to activate genes encoding digestive enzymes and switch off the instructions for making neurotransmitters, while a brain cell has to do the opposite.

What’s more, the DNA in every human cell is more than two metres long. It is coiled, twisted and stuffed into the nucleus – a structure smaller than the point of a pin – along with a multitude of proteins. Somehow, in amongst all this molecular confusion, the cell must find and activate the right genes at the right time.

This arrangement of DNA in the nucleus is similar to a tangled ball of knitting yarn. Some parts are tightly squeezed together, while others are loosely packed. Finding and activating a specific gene is like hunting for a specific short stretch of yarn in amongst the tangled mess, releasing it from any tight clusters and loosening the thread so it can be used.

It’s already been shown that active genes tend to be in more loosely packed, ‘open’ compartments of the nucleus compared with inactive genes, but little is known about how genes are organised into these different regions or how their location changes when they are switched on and off.

Understanding how this works at a molecular level is one of the most important challenges in biology, and it’s one that CRG senior group leader Thomas Graf wants to solve.

Into the fourth dimension

The story starts in 2014 when Graf and his colleagues at the CRG – Miguel Beato, Guillaume Filion and Marc A. Marti-Renom – began a major collaboration known as 4D Genome, an ERC Synergy Grant project investigating how the organisation of DNA changes as genes are switched on or off.

Not only has the team been mapping the organisation of DNA in the nucleus of ‘resting’ cells, the researchers have also been developing ground-breaking new techniques to track changes in the three-dimensional structure of chromosomes in the nucleus over time (the fourth dimension) as cells shift from one type to another, whether temporarily or permanently.

This kind of transition is seen in development, as multipurpose embryonic stem cells gradually become specialised into particular tissues in the developing embryo or fetus. But this time Graf was particularly curious to see what happens when specialised cells ‘reverse’ back into stem cells – a process known as reprogramming.

“People had already compared nuclear organisation in specialised cells and stem cells, but they did not know how these changes occur over time,” says Graf. “We wanted to catch them in action, asking whether the organisation of the genome changes before or after genes are switched on during reprogramming.”

To wind the clock back, the CRG researchers used a variation on a technique developed by Nobel prize-winning Japanese scientist Shinya Yamanaka, who discovered that a cocktail of four proteins (OCT4, SOX2, KLF4 and MYC) could turn specialised cells back into stem cells. These impressive molecules are transcription factors, which bind to particular sites in the DNA close to the start of stem cell-specific genes and switch them on, reprogramming the cell back into a stem cell state.

Unfortunately, the method isn’t very efficient for many cell types. For example, only a very small fraction of immune B cells can be reprogrammed with these so-called Yamanaka factors. However, Graf and his team discovered that adding in another protein, known as C/EBP alpha, before the Yamanaka factors led to at least 95 per cent of B cells being converted back into stem cells over the course of eight days.

By taking samples of these cells every two days, the researchers could use their 4D techniques to follow the changing organisation of DNA in the cells’ nuclei as they converted from B cells into stem cells.

Untangling the data

To find out how genes are re-arranged within the nucleus of the cells during reprogramming, Graf and his team used a method called Hi-C. This reveals whether specific regions of DNA are touching each other and reflects how loosely or tightly packed they are.
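
For readers who like to see the idea in code, here is a minimal sketch of how Hi-C results are commonly summarised: the genome is cut into equal-sized bins, and each pair of DNA positions found touching adds one count to a contact matrix. The function name, bin size and read pairs below are invented for illustration; this is not the team's actual pipeline or the TADbit interface.

```python
from collections import Counter


def contact_matrix(read_pairs, bin_size, n_bins):
    """Count contacts between equal-sized genomic bins on one chromosome.

    read_pairs: iterable of (pos_a, pos_b) base-pair positions that were
    found touching. Pairs falling outside the binned region are skipped.
    """
    counts = Counter()
    for pos_a, pos_b in read_pairs:
        i, j = pos_a // bin_size, pos_b // bin_size
        if i >= n_bins or j >= n_bins:
            continue
        counts[(i, j)] += 1
        if i != j:
            # Keep the matrix symmetric: a contact has no direction.
            counts[(j, i)] += 1
    return [[counts[(i, j)] for j in range(n_bins)] for i in range(n_bins)]


# Hypothetical read pairs (positions in base pairs), purely illustrative.
pairs = [(100, 250), (120, 2900), (2100, 2950), (150, 2800)]
matrix = contact_matrix(pairs, bin_size=1000, n_bins=3)
```

In this toy example the first and third bins share two contacts, hinting that those two stretches of DNA sit close together in the nucleus even though they are far apart along the chromosome.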

The team also gathered data on whether certain genes were switched on or off, as well as cataloguing the molecular marks (known as epigenetic modifications) that are associated with active or inactive genes. Much of the practical work was done by postdoctoral fellow Ralph Stadhouders, together with computational biologist Enrique Vidal.

The key to the project’s success was a new piece of software developed by Marti-Renom and his team, known as TADbit. It’s a bit like a ‘Google Earth’ for the nucleus, bringing together all the data to build a detailed map of how the DNA is organised in any part of the genome.

“In some ways generating the data is trivial – analysing it is the hard part and takes lots of time and computing power,” Marti-Renom says. “These experiments generate billions of pieces of data and need hundreds of thousands of hours of computing time, so our new software was absolutely key to make the analysis automatic and user-friendly.”

As might be expected, the researchers discovered that most of the genes that are turned on as the B cells become stem cells appear to move into more active compartments of the nucleus. Intriguingly, they found that this happens several days before the genes are actually switched on.

“The prevailing idea was that genes are switched on by the binding of transcription factors, such as the Yamanaka factors, and then they move into an active region of the nucleus,” explains Graf. “But we found that many genes moved first then were activated later. This was an unexpected but very exciting finding.”
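
The 'moved first, activated later' comparison can be pictured with a small sketch. For each gene, find the first sampled day on which it sits in an active compartment and the first day on which it is clearly switched on, then compare the two. All the numbers, thresholds and names below are made up for one hypothetical gene; they are not data from the study.

```python
def first_day(series, days, threshold):
    """Return the first sampled day on which the value exceeds threshold, else None."""
    for day, value in zip(days, series):
        if value > threshold:
            return day
    return None


# Samples taken every two days over the eight-day reprogramming time course.
days = [0, 2, 4, 6, 8]

# Invented values for one hypothetical gene.
compartment_score = [-0.6, -0.1, 0.3, 0.5, 0.7]  # assumed: above 0 means 'active' compartment
expression = [0.0, 0.1, 0.2, 1.5, 6.0]           # arbitrary units

moved = first_day(compartment_score, days, 0.0)
switched_on = first_day(expression, days, 1.0)
lag_days = switched_on - moved  # positive: the gene moved before it was activated
```

With these invented numbers the gene enters the active compartment on day 4 and is switched on by day 6, a lag of two days.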

Back to the beginning

Graf believes that these findings reveal a more important role for changing organisation in the nucleus than was previously thought, and also a potentially new function for transcription factors. Not only do they bind to DNA and switch genes on, he explains, but he thinks they may also have a separate, earlier part to play in unpacking the genome and moving genes into active regions of the nucleus.

“Once the transcription factors untangle the DNA and expose the genes, then it is easy to switch them on,” Graf says. “But now the big questions are how do they do it, who do they work with, and what is the engine that drives the reorganisation?”

He and his colleagues in the 4D Genome team are now searching for the molecules that work together with transcription factors to untangle and rearrange DNA. They also want to manipulate these interactions – whether by altering the DNA or by changing the proteins – to unpick the precise relationship between the four-dimensional changes they see in the nucleus and the resulting patterns of gene activity.

“We are learning the principles of cell fate decisions, and what we are seeing in our reprogramming system is a model for the processes that happen in an embryo,” Graf says. “I can’t wait to find out what is happening during the earliest days of life when the first pluripotent stem cells are born.”

Reference work

Ralph Stadhouders, Enrique Vidal, François Serra, Bruno Di Stefano, François Le Dily, Javier Quilez, Antonio Gomez, Samuel Collombet, Clara Berenguer, Yasmina Cuartero, Jochen Hecht, Guillaume J. Filion, Miguel Beato, Marc A. Marti-Renom & Thomas Graf.

“Transcription factors orchestrate dynamic interplay between genome topology and gene regulation during cell reprogramming”

Nature Genetics, 50:238–249 (2018), doi:10.1038/s41588-017-0030-7