Molly Shook is a researcher at the Center for Autoimmune Genomics and Etiology (CAGE) at Cincinnati Children’s Hospital.

She is a lead author of a recent publication on genetics and Eosinophilic Oesophagitis (EoE).

We spoke to Molly about her study of the genetic basis of allergic and autoimmune diseases, including Eosinophilic Oesophagitis.

About Molly Shook

I’m a member of a combined research group headed by two principal investigators, Leah Kottyan and Matt Weirauch.

We study the genetic basis of allergic and autoimmune diseases, including EoE. 

I joined the lab in 2022 as a research associate. My previous research background was studying epigenetic mechanisms in plants and honey bees. 

As it turns out, many of the genetic mechanisms underlying disease in humans are conserved in other organisms. So scientists use these organisms as “model systems” for understanding the basics of how DNA is packaged into a structure called chromatin that influences genome function, and how proteins called transcription factors bind to the DNA and regulate gene expression. 

When I joined the Kottyan lab, I had a background in studying these molecular processes that I wanted to apply to human disease. 

So, I am relatively new to the study of human allergic diseases, but I have been studying genomics for many years, going back to when I was an undergraduate in 2004. 

What inspired this study, and what are its principal objectives? 


Key takeaway: The study utilised a novel approach with "barcodes" to track how genetic variants affect gene expression, specifically targeting EoE. This method helped identify which genetic changes are likely contributing to the disease, providing a clearer understanding of EoE's genetic underpinnings.


Previous genome-wide association studies (GWAS) laid much of the foundation for this study.

A GWAS recruits cohorts of EoE patients and control individuals without EoE to tease out which differences in affected individuals’ genomes contribute to the risk of developing EoE. 

By genotyping numerous individuals with and without the disease, the association studies can identify statistical associations between genomic regions called “risk loci” and EoE.
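As a rough illustration of what a "statistical association" means here, the sketch below compares how often one allele appears in cases versus controls at a single position. The counts, the function name `allele_association`, and the choice of a simple Pearson chi-square test are assumptions made for illustration. Real GWAS pipelines use more elaborate models and correct for testing many positions at once.

```python
import math

def allele_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Return (odds_ratio, chi_square, p_value) for a 2x2 table of allele counts."""
    # Odds of carrying the alternative allele in cases relative to controls.
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)

    # Pearson chi-square statistic for a 2x2 table (1 degree of freedom).
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    numerator = n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2
    denominator = (
        (case_alt + case_ref)
        * (ctrl_alt + ctrl_ref)
        * (case_alt + ctrl_alt)
        * (case_ref + ctrl_ref)
    )
    chi_square = numerator / denominator

    # Upper-tail probability of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return odds_ratio, chi_square, p_value

# Made-up allele counts for one hypothetical position in the genome.
odds_ratio, chi_square, p_value = allele_association(
    case_alt=300, case_ref=700, ctrl_alt=200, ctrl_ref=800
)
print(f"odds ratio = {odds_ratio:.2f}, chi-square = {chi_square:.1f}, p = {p_value:.1e}")
```

An odds ratio above 1 with a small p-value would flag the position as associated with the disease in this toy setting, but it says nothing about which nearby DNA change is responsible.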

GWAS is a powerful approach, but it has a limitation: it can only associate stretches of DNA with a disease. It cannot pinpoint which specific differences within those DNA regions matter for the disease. 

When we think about mechanisms underlying disease, we’re interested in identifying precise positions in the genome that may affect the way regulatory molecules in the cell work. 

For example, we know that sometimes single-letter changes in the genetic code can change how proteins called transcription factors bind to DNA, affecting gene expression. 

Our study uses a cutting-edge tool called a massively parallel reporter assay to systematically interrogate the small differences in the DNA at EoE risk loci and identify those important for regulating gene expression. 

What the study is about  


Key takeaway: Out of 531 genetic variants examined, only about 32 were found to have a significant impact on gene expression related to EoE. This precise identification helps narrow down the focus to the most relevant genetic factors contributing to EoE, making future research and potential treatments more targeted.



A genome risk loci map for Eosinophilic Oesophagitis which sheds light on the genetics-dependent biology that increases a person’s disease risk. Cincinnati Children's Hospital, CAGE, 2023.

Our approach uses random DNA sequences 20 letters long, which we call “barcodes,” to track the effects of genetic variants on gene expression. These barcodes work very similarly to the ones you use regularly at the grocery store. 

When you go shopping and ring up a barcode on an item’s package, it tells the checkout system that you are buying a specific brand of crackers or a particular flavour of seltzer water. 

Our technology attaches molecular barcodes to a reporter gene, and each barcode is associated with a particular genetic variant. 

So when the cell expresses the reporter gene, each reporter gene RNA is tagged with a barcode that allows us to associate its expression with a particular variant. 

In this way, we can start with a long list of 531 variants spread over the EoE risk loci and identify specific ones that increase the expression of the reporter gene. 

And we’re even able to identify variants that do this in a genotype-dependent manner. So hypothetically, if you have a cytosine nucleotide in a specific position in your genome, that allele may drive much higher gene expression than an alternative base such as a guanine. 
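The barcode bookkeeping described above can be sketched in a few lines. This is a simplified illustration, not the analysis used in the study: the barcode sequences, read counts, variant name, and the choice to normalise RNA counts against DNA counts with a pseudocount are all assumptions.

```python
import math
from collections import defaultdict

# Hypothetical lookup: each 20-letter barcode tags one allele of one variant.
BARCODE_TO_ALLELE = {
    "ACGTACGTACGTACGTACGT": ("variant_1", "C"),
    "TTGCATTGCATTGCATTGCA": ("variant_1", "C"),
    "GGATCGGATCGGATCGGATC": ("variant_1", "G"),
    "CATGACATGACATGACATGA": ("variant_1", "G"),
}

def allele_activity(rna_counts, dna_counts, barcode_map, pseudocount=1):
    """Return log2(RNA/DNA) per (variant, allele), pooling that allele's barcodes."""
    rna_total = defaultdict(int)
    dna_total = defaultdict(int)
    for barcode, key in barcode_map.items():
        rna_total[key] += rna_counts.get(barcode, 0)
        dna_total[key] += dna_counts.get(barcode, 0)
    # The pseudocount avoids division by zero for barcodes that were never sequenced.
    return {
        key: math.log2((rna_total[key] + pseudocount) / (dna_total[key] + pseudocount))
        for key in dna_total
    }

def allelic_skew(activity, variant, allele_a, allele_b):
    """Difference in log2 activity between two alleles of the same variant."""
    return activity[(variant, allele_a)] - activity[(variant, allele_b)]

# Made-up read counts: RNA reflects reporter expression, DNA reflects input abundance.
rna = {
    "ACGTACGTACGTACGTACGT": 500,
    "TTGCATTGCATTGCATTGCA": 299,
    "GGATCGGATCGGATCGGATC": 99,
    "CATGACATGACATGACATGA": 100,
}
dna = {
    "ACGTACGTACGTACGTACGT": 50,
    "TTGCATTGCATTGCATTGCA": 49,
    "GGATCGGATCGGATCGGATC": 60,
    "CATGACATGACATGACATGA": 39,
}

activity = allele_activity(rna, dna, BARCODE_TO_ALLELE)
skew = allelic_skew(activity, "variant_1", "C", "G")
print(f"log2 activity difference (C vs G): {skew:.2f}")
```

A clearly positive difference for the C allele would correspond to the genotype-dependent behaviour described above, where one base drives higher reporter expression than the alternative.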

Overall, this study gives us a much clearer understanding of how each genetic variant contained within the EoE risk loci might contribute to the molecular mechanisms that play a role in the disease. 

What are the key findings of the study? 


Key takeaway: The research confirmed the involvement of known genes like TSLP, important for immune signalling in EoE, and discovered new genes for further investigation, expanding our understanding of the genetic landscape of EoE.


With this study, we were looking for needles in a haystack. Starting with a long list of genetic variants we thought could be important for EoE, we wanted to narrow it down to a much smaller set with genotype-dependent regulatory activity. 

One of the major findings is that out of the 531 variants spread across all of the EoE risk loci included in the study, only about 32 have alleles that we expect to regulate gene expression differently from one another. This conclusion comes from observing how these alleles distinctly influence the expression of a reporter gene in a massively parallel reporter assay.

That helps us zoom in on the specific variants that are most likely to contribute to the development of EoE.

We could also take those 32 variants and link them up with genes we thought they might be regulating. There are a few ways to do this, but one is by drawing on a large dataset of “expression quantitative trait loci,” or eQTLs. 

eQTLs are genetic variants that are statistically associated with the expression level of a gene. By looking at this existing eQTL dataset, we were able to annotate our set of 32 genotype-dependent variants with likely associated genes. 

This approach identified genes previously known to play critical roles in EoE, including TSLP, an important molecule in immune signalling pathways.

But it also uncovered some new genes that have potential for further investigation, including many genes at the HLA locus, which is also linked to other allergic diseases such as asthma. 
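The eQTL annotation step described above amounts to a lookup between two tables. The sketch below shows the idea under stated assumptions: the variant identifiers, gene names, p-values, significance threshold, and the function name `annotate_with_eqtls` are all hypothetical placeholders and do not come from the study or any real dataset.

```python
from collections import defaultdict

def annotate_with_eqtls(variants, eqtl_records, max_p=1e-5):
    """Map each variant to the sorted list of genes it is an eQTL for."""
    genes_by_variant = defaultdict(set)
    for record in eqtl_records:
        # Keep only associations that pass the (assumed) significance threshold.
        if record["p_value"] <= max_p:
            genes_by_variant[record["variant"]].add(record["gene"])
    # Variants with no qualifying eQTL record get an empty list.
    return {variant: sorted(genes_by_variant.get(variant, ())) for variant in variants}

# Placeholder prioritised variants and eQTL records.
prioritised_variants = ["var_1", "var_2", "var_3"]
eqtl_table = [
    {"variant": "var_1", "gene": "GENE_B", "p_value": 1e-7},
    {"variant": "var_1", "gene": "GENE_A", "p_value": 1e-9},
    {"variant": "var_2", "gene": "GENE_C", "p_value": 0.2},
    {"variant": "var_4", "gene": "GENE_D", "p_value": 1e-12},
]

annotations = annotate_with_eqtls(prioritised_variants, eqtl_table)
for variant, genes in annotations.items():
    print(variant, genes)
```

In this toy example one variant maps to two genes, which mirrors the situation where a single variant may regulate multiple target genes.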

Lastly, we used computational algorithms to predict which transcription factors (proteins that bind DNA and regulate gene expression) might bind at these variants. 

We found specific transcription factors that might be part of regulatory programs involving these 32 prioritised variants. 

One of these proteins is a factor called USF1, which is fascinating because it’s associated with a variant likely regulating multiple genes on chromosome 16. 

By manipulating proteins and DNA in a test tube, I was able to show that the USF1 protein specifically binds one allele of this variant and not the other allele. 

It’s great to start with a large dataset like a massively parallel reporter assay and gain mechanistic insights into specific proteins and variants.
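One common way to predict computationally whether a single-letter change alters transcription factor binding is to score both alleles against a motif model. The sketch below is a toy version of that idea and is not the algorithm used in the study: the motif matrix, the two sequences, and the uniform background frequency are invented for illustration, and only the forward DNA strand is scanned.

```python
import math

# Toy motif: probability of each base at six positions (hypothetical values).
TOY_MOTIF = [
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
]
BACKGROUND = 0.25  # Assumes all four bases are equally likely by chance.

def best_motif_score(sequence, motif):
    """Return the highest log-odds motif score over all windows of the sequence."""
    width = len(motif)
    best = float("-inf")
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(
            math.log2(motif[i][base] / BACKGROUND) for i, base in enumerate(window)
        )
        best = max(best, score)
    return best

# Two made-up alleles that differ by a single letter (G versus T).
reference_allele = "TTCACGTGAA"
alternative_allele = "TTCACGTTAA"

ref_score = best_motif_score(reference_allele, TOY_MOTIF)
alt_score = best_motif_score(alternative_allele, TOY_MOTIF)
print(f"reference: {ref_score:.2f}, alternative: {alt_score:.2f}")
```

A large drop in score for one allele is only a prediction that binding may differ. As the interview notes, such predictions still need experimental confirmation.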

What do these findings mean for patients and healthcare professionals now and in the future? 


Key takeaway: Understanding the genetic mechanisms opens up possibilities for developing more effective treatments targeting the molecular causes of EoE rather than just managing symptoms.


The GWAS isn’t a clinical study, so I think it’s essential to put it into context when discussing how it’s moving the field forward. 

The genetics of EoE are hugely complex; there’s no single gene that scientists can point to and say, “That’s causing EoE!” 

Instead, EoE arises from interactions between environmental and genomic risk factors that we’re still working to understand fully. 

GWAS are an important first step toward identifying genomic loci associated with the disease. This study takes us further toward understanding what is happening at those loci. 

Now, we can say that for many of the risk loci, we’ve identified single-letter changes in the DNA that are likely contributing to the disease by causing changes in gene expression. 

This is important because many current therapies for treating EoE, such as topical corticosteroids or proton pump inhibitors (PPIs), aim to reduce the inflammation associated with EoE. 

However, understanding the actual molecular pathways causing that inflammation may allow for the development of more effective therapies in the future. 

TSLP is a notable example of a potential target for future biologic therapies that is supported by an abundance of research.

We know, for example, that elevated TSLP expression is associated with changes in the expression of other genes in the oesophagus of people affected by EoE. 

Our study identifies a genetic variant that appears to be regulating TSLP expression in an oesophageal cell line. 

Building upon existing knowledge with further genetic research expands the pool of potential drug targets for future investigation. 

What questions remain unanswered? 


Key takeaway: The study lays the groundwork for further research, including the need to experimentally validate the computational predictions and explore how these genetic variants and their regulation of gene expression contribute to EoE. This could lead to a better understanding of EoE and the development of targeted therapies.


There is still more work to do to translate the new insights from our project into a better understanding of the mechanisms underlying EoE. 

We used computational approaches to predict which transcription factors are binding to these important regulatory variants we discovered. 

But we still haven’t experimentally validated many of those predictions. One follow-up approach is to “knock out,” or eliminate, these transcription factors in cells, then use the massively parallel reporter assay to see how many of our variants regulate gene expression in a way that depends on those factors. 

We can also test whether our regulated gene predictions are correct by using genome editing approaches to introduce a specific allele of a variant and then using RNA sequencing to see how that allele changes the expression of our predicted genes. 

This might be particularly interesting to try with some variants regulating multiple target genes. 

So, there is a lot that we can do to validate the conclusions of our paper experimentally. 
