Computational Genetics Summer Institute 2022

Los Angeles: place of palm trees, endless beaches, and Hollywood stars! But also the place of one of the most inspiring experiences a PhD student can wish for: CGSI 2022.

Over a total of 23 days in July 2022, CGSI (Computational Genetics Summer Institute) consisted of a retreat, two short programs and a middle week, all chock-full of science, networking and fun.

My colleague Lianyun and I had the pleasure of participating in the long program, which started with the three-day retreat at Big Bear Lake. Here, morning talks and afternoon social activities created the perfect relaxed atmosphere to network with fellow students and faculty.

The retreat created a close-knit group that was housed in a frat house on the UCLA campus for the remainder of the program. Paired up in double rooms, we got a glimpse of the college experience, and may have partied a bit.

Then the first short program started. This week-long program, filled with talks and networking, was intense: many additional students and faculty joined for the week, and we ourselves were invited to give chalk talks in daily affinity groups. You can listen to a voice recording of the essence of Jolien’s chalk talk at CGSI here, or have a look at the transcript.

Jolien Chalk Talk Audio

Jolien Chalk Talk Text

I work on dissecting the highly heterogeneous phenotype that is Major Depressive Disorder (MDD). As my colleague Lianyun explained yesterday, the current standard, the Diagnostic and Statistical Manual of Mental Disorders, fifth edition (DSM-5), contains a list of 9 symptoms for Major Depressive Disorder. To obtain the diagnosis, someone needs to have at least 5 of the 9 symptoms, with the restriction that these include symptom 1, symptom 2, or both.

Let’s imagine one person who has the following 5 symptoms:
1. depressed mood
2. sleeplessness
3. weight loss
4. fatigue
5. suicidal thoughts

Let’s imagine a second person who has:
1. a loss of interest in things they used to enjoy (anhedonia)
2. oversleeping
3. weight gain
4. psychomotor agitation
5. guilt

These two people are very different. In fact, they don’t share a single symptom. But they both get the same major depressive disorder diagnosis. In total, with the symptoms as they currently stand, there are more than 250 possible combinations that lead to the same diagnosis.
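To get a feel for where a number of this size comes from, the count can be checked by brute force. The sketch below is only an illustration and not the calculation behind the talk: it assumes each of the 9 criteria is simply present or absent, and the names `count_profiles` and `CORE` are made up for this example. The exact total depends on how symptoms are counted; treating opposite directions of a criterion (such as sleeplessness versus oversleeping) as separate symptoms, as in the two profiles above, pushes it higher.

```python
from itertools import combinations

N_SYMPTOMS = 9      # number of symptom criteria (assumed present/absent only)
MIN_SYMPTOMS = 5    # at least 5 symptoms are needed for a diagnosis
CORE = {1, 2}       # at least one of symptoms 1 and 2 must be present


def count_profiles(require_core: bool) -> int:
    """Count distinct symptom sets with at least MIN_SYMPTOMS symptoms."""
    total = 0
    for k in range(MIN_SYMPTOMS, N_SYMPTOMS + 1):
        for profile in combinations(range(1, N_SYMPTOMS + 1), k):
            # Skip profiles that contain neither core symptom, if required.
            if require_core and not (CORE & set(profile)):
                continue
            total += 1
    return total


print(count_profiles(require_core=False))  # 256
print(count_profiles(require_core=True))   # 227
```

Under these simplifying assumptions there are 256 ways to have at least 5 of 9 symptoms, and 227 of them include at least one of the two core symptoms.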

Clearly, major depressive disorder is a very heterogeneous phenotype to work with, and doing research on such a complex phenotype is very hard. Most importantly, we don’t have a way to tell these people apart. We have tried with subtypes like melancholic depression or early-onset depression, but the diagnosis of these is also highly heterogeneous. There is a lot of disagreement between practitioners on whether someone has depression or not, and this goes along with misdiagnoses and missed diagnoses. This means that people who should be getting treatment are not, and people who have a different disorder, say ADHD, are being treated for the wrong one. And even if you do get correctly diagnosed, because we can’t properly research such a complex disorder, patients are left to find a treatment on a trial-and-error basis. Even then, treatment might not work for you. There is a whole subtype in the literature called treatment-resistant MDD, where everything was tried but simply did not work.

This is why the search for subtypes in depression is so important. We need a diagnosis that is as clean-cut as possible to better understand treatment responses and to better help the affected individuals.

But if we don’t have any subtypes that work, where do we start? Well, in our case, we have to use what we have, so we use the current model even though it’s bad. We hypothesize that the MDD diagnosis from this manual is actually a mixture of symptomatic pathways leading to the diagnosis. So the individuals I mentioned before, with their different symptom profiles, we now see as following different pathways towards MDD.

How, then, do we find evidence for the existence of these pathways? I use a method by our collaborator Dr. Andy Dahl called Coordinated Epistasis (CE). The concept of CE is that, under the infinitesimal model assumption, we can partition the genome randomly and thereby partition it, somewhat correctly, into the pathways that we need. We then create polygenic risk scores (PRSs) for those pathway partitions. We can fit an additive model in which these PRSs don’t interact, or a model in which they do. We then compare the two models with a log-likelihood-ratio test to see whether the interaction model predicts the phenotype better. If the test is significant, we look at the interaction effect coefficient, which tells us whether the interaction is antagonistic or synergistic.
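The comparison of the additive and interaction models can be sketched in a few lines. This is a minimal illustration on simulated data, not the actual CE implementation: it assumes a continuous phenotype fitted by ordinary least squares (a real MDD analysis would more likely use a binary phenotype and logistic regression), and the names `fit_ols`, `interaction_lrt` and all simulated effect sizes are invented for the example.

```python
import math
import numpy as np


def fit_ols(X, y):
    """Least-squares fit; returns coefficients and residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, float(resid @ resid)


def interaction_lrt(prs_a, prs_b, y):
    """Likelihood-ratio test of y ~ a + b versus y ~ a + b + a*b."""
    n = len(y)
    ones = np.ones(n)
    X_add = np.column_stack([ones, prs_a, prs_b])
    X_int = np.column_stack([ones, prs_a, prs_b, prs_a * prs_b])
    _, rss_add = fit_ols(X_add, y)
    beta_int, rss_int = fit_ols(X_int, y)
    # Gaussian likelihood-ratio statistic; chi-square with 1 df under the null.
    stat = max(n * math.log(rss_add / rss_int), 0.0)
    # Survival function of a chi-square(1) variable, via the normal tail.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    gamma = float(beta_int[3])  # interaction coefficient
    return gamma, stat, p_value


# Simulated example (assumed values): two PRSs with a negative interaction.
rng = np.random.default_rng(0)
n = 5000
prs_a = rng.standard_normal(n)
prs_b = rng.standard_normal(n)
y = 0.3 * prs_a + 0.3 * prs_b - 0.2 * prs_a * prs_b + rng.standard_normal(n)

gamma, stat, p_value = interaction_lrt(prs_a, prs_b, y)
print(f"gamma = {gamma:.3f}, LRT statistic = {stat:.1f}, p = {p_value:.2e}")
```

In this toy setting a negative `gamma` corresponds to an antagonistic interaction and a positive one to a synergistic interaction.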

The way we use the interaction coefficient is that if we find a significant negative interaction, we conclude that subtypes exist, because the pathways reduce each other’s effect on the phenotype compared to when they act on it individually. First we search for any interaction in the phenotype, so we make per-chromosome PRSs and interact those with each other, and we find significant interaction. We do the same with the symptoms and find significant interactions there as well. What surprised us is that if you do the even-odd split, where you put the 11 odd chromosomes into one PRS and the 11 even chromosomes into the other and interact the two, you get different estimates of gamma, the interaction coefficient, than if you interact the chromosome-specific PRSs with each other.

To investigate this, we make a hundred random partitions like the even-odd partition, but with 11 randomly chosen chromosomes in each PRS, and fit an interaction model for each partition. Repeating that 100 times gives us a hundred gammas, which together form a distribution of gammas.

At the moment, we’re not quite certain what that means. So we are actively investigating what affects this distribution, whether the chromosome-specific PRSs have a distribution themselves, and how those distributions align with each other.
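The random-partition procedure described above can be sketched as follows. This is a hedged illustration on simulated placeholder data, not the real analysis: it assumes the two PRSs in each partition are complementary halves of the 22 autosomes, that a partition's PRS is the sum of its per-chromosome PRSs, and that the phenotype is continuous. The function names and simulated values are hypothetical.

```python
import numpy as np

N_CHROM = 22  # autosomes, split into two sets of 11


def interaction_coef(prs_a, prs_b, y):
    """Fit y ~ a + b + a*b by least squares and return the interaction term."""
    X = np.column_stack([np.ones(len(y)), prs_a, prs_b, prs_a * prs_b])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(beta[3])


def random_partition_gammas(chrom_prs, y, n_partitions=100, seed=0):
    """Estimate gamma for many random 11/11 chromosome partitions.

    chrom_prs is an (n_individuals, 22) array of per-chromosome PRSs.
    """
    rng = np.random.default_rng(seed)
    gammas = []
    for _ in range(n_partitions):
        perm = rng.permutation(N_CHROM)
        half_a, half_b = perm[:11], perm[11:]  # complementary halves (assumed)
        prs_a = chrom_prs[:, half_a].sum(axis=1)
        prs_b = chrom_prs[:, half_b].sum(axis=1)
        gammas.append(interaction_coef(prs_a, prs_b, y))
    return np.array(gammas)


# Simulated placeholder data: purely additive phenotype, no true interaction.
rng = np.random.default_rng(1)
chrom_prs = rng.standard_normal((2000, N_CHROM))
y = chrom_prs.sum(axis=1) + rng.standard_normal(2000)

gammas = random_partition_gammas(chrom_prs, y)
print(f"{len(gammas)} gammas, mean = {gammas.mean():.4f}, sd = {gammas.std():.4f}")
```

The even-odd split is one particular partition of this kind; passing the even and odd column indices instead of a random permutation would reproduce it within the same sketch.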

A great thing about CGSI is that the talks are available on YouTube afterwards! From the first-week talks, I highly recommend that interested readers listen to the three talks that were most memorable and interesting to me: John Novembre on Sardinian population history, Jonathan Flint on the genetics of depression, and Melissa Gymrek on analyzing tandem repeats.

John Novembre

Jonathan Flint

Melissa Gymrek

The second week, or middle week, slowed down a bit. There were fewer talks, and the Wednesday was free to let us recover, recharge and enjoy the city. One of the activities was a concert at the Hollywood Bowl, a yearly tradition.

From the talks given this week, I recommend listening to Dr. Na Cai speak about phenotype imputation to optimize power in biobanks, Dr. Nick Mancuso if you’re up for a mathematical challenge, and Rayan Chikhi for his engaging explanations and pretty graphs.

Na Cai

Nick Mancuso

Rayan Chikhi

Last but not least came the second short program: the last week of the summer school. This week included more networking and social events similar to those of the first week and, of course, a few more talks. My two top picks here are Dr. Eimear Kenny speaking on population genetics and our responsibility to make methods unbiased, and Bogdan Pasaniuc on using large-scale biobank data.

Eimear Kenny

Bogdan Pasaniuc

The exposure to so much science from different areas within computational genetics was enlightening. I remember reading a paper afterwards in record time, simply because I better understood how science is communicated. Making research friends through networking also made for lifelong connections with peers.

I can only highly recommend the experience to all PhD students within the field, and I thank the organizers for inviting Lianyun and me to participate.