Computation of all EMD between pairs of flowFrames belonging to a flowSet. This method provides two different input modes:

  • the user provides directly a flowSet loaded in memory (RAM).

  • the user provides (1.) a number of samples nSamples; (2.) an ad-hoc function that takes as input an index between 1 and nSamples, and codes the method to load the corresponding flowFrame in memory; Optional row and column ranges can be provided to limit the calculation to a specific rectangle of the matrix. These i.e. can be specified as a way to split heavy calculations of large distance matrices on several computation nodes.

pairwiseEMDDist(
  x,
  rowRange = c(1, nSamples),
  colRange = c(min(rowRange), nSamples),
  loadFlowFrameFUN = NULL,
  loadFlowFrameFUNArgs = NULL,
  channels = NULL,
  verbose = FALSE,
  BPPARAM = BiocParallel::SerialParam(),
  BPOPTIONS = BiocParallel::bpoptions(packages = c("flowCore")),
  binSize = 0.05,
  minRange = -10,
  maxRange = 10
)

Arguments

x

either a flowCore::flowSet, or the number of samples (integer >=1)

rowRange

the range of rows of the distance matrix to be calculated

colRange

the range of columns of the distance matrix to be calculated

loadFlowFrameFUN

the function used to translate a flowFrame index into a flowFrame. In other words, the function should code how to load a specific flowFrame into memory. Important: the flow Frame index should be the first function argument and should be named ffIndex.

loadFlowFrameFUNArgs

(optional) a named list containing additional input parameters of loadFlowFrameFUN()

channels

which channels (integer index(ices) or character(s)):

  • if it is a character vector, it can refer to either the channel names, or the marker names

  • if it is a numeric vector, it refers to the indexes of channels in fs

  • if NULL all scatter and fluorescent channels of fs #' will be selected

verbose

if TRUE, output a message after each single distance calculation

BPPARAM

sets the BPPARAM back-end to be used for the computation. If not provided, will use BiocParallel::SerialParam() (no task parallelization)

BPOPTIONS

sets the BPOPTIONS to be passed to bplapply() function. Note that if you use a SnowParams back-end, you need to specify all the packages that need to be loaded for the different CytoProcessingStep to work properly (visibility of functions). As a minimum, the flowCore package needs to be loaded. (hence the default BPOPTIONS = bpoptions(packages = c("flowCore")) )

binSize

size of equal bins to approximate the marginal distributions.

minRange

minimum value taken when approximating the marginal distributions

maxRange

maximum value taken when approximating the marginal distributions

Value

a distance matrix of pairwise distances (full symmetric with 0. diagonal)

Examples


library(CytoPipeline)

data(OMIP021Samples)

# estimate scale transformations
# and transform the whole OMIP021Samples

transList <- estimateScaleTransforms(
    ff = OMIP021Samples[[1]],
    fluoMethod = "estimateLogicle",
    scatterMethod = "linearQuantile",
    scatterRefMarker = "BV785 - CD3")

OMIP021Trans <- CytoPipeline::applyScaleTransforms(
    OMIP021Samples,
    transList)

# calculate pairwise distances using only FSC-A & SSC-A channels
pwDist <- pairwiseEMDDist(
    x = OMIP021Trans,
    channels = c("FSC-A", "SSC-A"))