
Batch correction introduces negative numbers to adata.X

The batch correction methods mnn and combat modify adata.X in two ways. They convert it from a sparse matrix to a dense numpy array (a disk-space issue?), and they introduce negative values (a downstream issue). This can cause problems in downstream analysis. In my case, it broke group marker filtering, because group fractions require non-negative values, i.e. counts, to compute the fraction of cells expressing a gene. As a result, marker filtering silently failed to compute group fractions (treating every gene as not expressed), so every marker was filtered out, and the pipeline finally failed when it tried to draw a marker plot with no markers left.
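A toy illustration of both effects, assuming a per-batch mean-centering as a hypothetical stand-in for batch correction. This is not what combat or mnn actually compute; it is only the simplest correction with the same side effects (zeros turn into non-zero values, and values drop below zero).

```python
import numpy as np

# Toy counts: 6 cells x 4 genes, mostly zeros as in scRNA-seq data
counts = np.array([
    [0, 3, 0, 1],
    [0, 1, 3, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 4],
    [2, 1, 0, 0],
    [0, 0, 1, 0],
], dtype=float)
batch = np.array([0, 0, 0, 1, 1, 1])

# Hypothetical stand-in for batch correction (NOT combat or mnn):
# subtract each batch's per-gene mean from that batch's cells.
corrected = counts.copy()
for b in np.unique(batch):
    in_batch = batch == b
    corrected[in_batch] -= corrected[in_batch].mean(axis=0)

# Zeros become non-zero (mostly negative), so the matrix is no longer sparse ...
print("nonzero fraction:", np.count_nonzero(counts) / counts.size,
      "->", np.count_nonzero(corrected) / corrected.size)
print("minimum value:", counts.min(), "->", corrected.min())

# ... and "value > 0" no longer matches "has counts": the second cell has a
# count of 1 in gene index 1, which is below its batch mean, so it turns negative.
print("gene 1 fraction > 0:", (counts[:, 1] > 0).mean(),
      "->", (corrected[:, 1] > 0).mean())
```

Any threshold-based notion of "expressed" computed on the corrected matrix therefore no longer reflects the original counts.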

  1. We need checks, warnings, or errors that report when group fractions are not computed.
  2. How do we fix batch correction? Converting the data back to a sparse type shouldn't be a big issue, but what about the negative values? Can we shift everything to positive values without changing the embedding or anything downstream? Should we change all downstream functions to automatically look for a raw data layer (e.g. adata.layers["raw"]), or at least let them accept a different layer?
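A minimal sketch of what the check in point 1 and the layer option in point 2 could look like, assuming a plain numpy matrix. `group_fractions` and its `raw=` argument are hypothetical, not an existing scanpy or sctoolbox function. The second part tests the shift idea from point 2 on a toy matrix.

```python
import warnings

import numpy as np


def group_fractions(X, groups, threshold=0.0, raw=None):
    """Per-group fraction of cells with value > threshold, for every gene.

    Hypothetical helper for illustration only. `raw` stands in for a counts
    layer such as adata.layers["raw"]; when given, it is used instead of X.
    """
    M = np.asarray(X if raw is None else raw, dtype=float)
    if M.min() < 0:
        # Negative entries mean M is not counts (e.g. batch-corrected X),
        # so "> threshold" no longer means "expressed": fail loudly.
        raise ValueError("matrix has negative values; pass raw counts via raw=")
    groups = np.asarray(groups)
    labels = np.unique(groups)
    fracs = np.vstack([(M[groups == g] > threshold).mean(axis=0) for g in labels])
    if not fracs.any():
        # Warn instead of letting downstream filtering silently drop every marker.
        warnings.warn("no gene is expressed in any group")
    return labels, fracs


def pairwise_distances(A):
    """Euclidean distance between every pair of rows (cells)."""
    return np.linalg.norm(A[:, None, :] - A[None, :, :], axis=-1)


# Toy batch-corrected matrix (3 cells x 2 genes) and matching raw counts
corrected = np.array([[-0.5, 1.0], [0.5, -1.0], [0.0, 0.5]])
counts = np.array([[0, 2], [1, 0], [0, 1]])
groups = ["a", "a", "b"]

# Point 1: the corrected matrix is rejected ...
try:
    group_fractions(corrected, groups)
except ValueError as err:
    print("rejected:", err)

# ... while passing the raw counts layer works.
labels, fracs = group_fractions(corrected, groups, raw=counts)
print(labels, fracs)

# Point 2: shifting by the global minimum removes all negatives and leaves
# pairwise distances (and a PCA on column-centered data) unchanged, but every
# entry except the minimum becomes > 0, so "expressed" loses its meaning.
shifted = corrected - corrected.min()
print(np.allclose(pairwise_distances(corrected), pairwise_distances(shifted)))
print("fraction > 0:", (shifted > 0).mean(), "vs counts:", (counts > 0).mean())
```

On this toy data, the shift keeps the geometry intact but does not rescue group fractions, which suggests computing them from a raw counts layer rather than from the corrected adata.X.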

I'm surprised that this didn't come up earlier. Does no one use mnn or combat? Anyway, we need to discuss this.