Lyngby e05: Unstable variances

Just some scattered notes this week! Not really sticking to my plan of posting on Wednesdays, but I’m happy as long as I keep it more-or-less weekly.

Notes on variance stabilization

Data transformation is becoming a recurring nightmare on the blog (202401312031). I’m looking into a single-cell RNA-seq project and trying to figure out what typical preprocessing looks like. Not being particularly familiar with the terrain, I picked a random tutorial and started following the steps. The instructions suggest two separate transformations, both in the name of variance stabilization (sketched in the code below). It is not clear to me why I would want to focus on preemptive variance stabilization. Some half-baked thoughts/observations:

Naturally a tutorial is an outline of common techniques and you always have to think carefully about how it applies to your own case.
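For concreteness, the two transformations in a typical scanpy-style tutorial are a depth-normalized shifted log followed by per-gene z-scoring. I’m not naming the tutorial here, so take this as a sketch of the usual pipeline rather than the exact steps I followed; the input file name is hypothetical:

```python
import scanpy as sc

# Hypothetical input: an AnnData of raw UMI counts.
adata = sc.read_h5ad("counts.h5ad")

# Transformation 1: normalize sequencing depth per cell, then shifted log.
# log1p squashes the strong mean-variance coupling of count data.
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)

# Transformation 2: z-score each gene (with clipping), so that highly
# expressed genes don't dominate downstream PCA purely through their
# larger variance.
sc.pp.scale(adata, max_value=10)
```

Both steps get sold as variance stabilization, but they act at different levels: log1p reshapes each individual measurement, while scaling equalizes genes against one another.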

A stupid research assistant

Over lunch on campus we talked a bit about large language models, as you do. I’ve seen the observation somewhere online that ChatGPT is like having a very naive research assistant. This feels accurate to me: it just brings you whatever it has, completely uncritically, but it can be a nice starting point if you know what you’re doing.

My students don’t know what they are doing, or they wouldn’t be taking my course. I get the feeling that to some of them, the LLM is a knowledgeable research supervisor and not a stupid research assistant. This is bad for a variety of reasons, one of which is that the LLM has no doubt, no concept of being wrong. Probably because it was trained on Reddit comments.

this file last touched 2024.02.25