GF-ICF

What is GF-ICF

An R implementation of the Gene Frequency - Inverse Cell Frequency method for single cell data normalization (Gambardella et al. 2019). The package also includes Phenograph clustering with the Louvain method, using the RcppAnnoy library from uwot. It also includes data reduction with either Principal Component Analysis (PCA) or Latent Semantic Analysis (LSA) before applying t-SNE or UMAP for single cell data visualization.
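As the name suggests, the normalization is analogous to TF-IDF from text mining, with genes playing the role of terms and cells the role of documents. The base R sketch below illustrates that idea on a toy matrix. The helper name gf_icf_sketch, the smoothing term inside the logarithm, and the final L2 scaling are assumptions made for illustration, so the numbers may not match what the package computes.

```r
# Toy count matrix: rows are genes, columns are cells (made-up values).
counts <- matrix(
  c(5, 0, 3,
    1, 1, 1,
    0, 4, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), c("cell1", "cell2", "cell3"))
)

# Hypothetical helper, not part of the gficf API.
gf_icf_sketch <- function(counts) {
  # Gene frequency: each count divided by the total counts of its cell.
  gf <- sweep(counts, 2, colSums(counts), "/")

  # Inverse cell frequency: genes detected in fewer cells get a larger weight.
  # The "+ 1" smoothing terms are an assumption of this sketch.
  n_cells <- ncol(counts)
  cells_expressing <- rowSums(counts > 0)
  icf <- log(1 + n_cells / (1 + cells_expressing))

  # Multiply every gene (row) by its ICF weight; the vector recycles by row.
  weighted <- gf * icf

  # Scale each cell (column) to unit L2 norm so cells are comparable.
  sweep(weighted, 2, sqrt(colSums(weighted^2)), "/")
}

m <- gf_icf_sketch(counts)
```

In this toy example geneB is detected in every cell, so it receives the smallest weight, while geneC, detected in a single cell, receives the largest.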

Installing

From github

gficf uses the Annoy library bundled with uwot, so, as with a uwot installation, you may need to carry out a few extra steps before you can build this package:

Windows: install Rtools and ensure C:\Rtools\bin is on your path.

Mac OS X: using a custom ~/.R/Makevars may cause linking errors. This sort of thing is a potential problem on all platforms but seems to bite Mac owners more. The R for Mac OS X FAQ may be helpful here to work out what you can get away with. To be on the safe side, I would advise building uwot without a custom Makevars.

if(!require(devtools)){
  install.packages("devtools") # If not already installed
}
devtools::install_github("dibbelab/gficf")

Phenograph Implementation Details

In the gficf package, the function clustcells implements the Phenograph algorithm, a clustering method designed for high-dimensional single-cell data analysis. It works by building a graph (“network”) that represents phenotypic similarities between cells, computed as the Jaccard coefficient between nearest-neighbor sets, and then identifying communities in this graph with the well-known Louvain method.
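The graph-building step can be illustrated with a small base R sketch. It assumes exact nearest neighbors computed from a full distance matrix and scores every pair of cells, which is only practical for tiny data sets; the variable names and the toy data are illustrative and do not come from the package.

```r
set.seed(1)

# Toy data: 6 cells in 2 dimensions, forming two well-separated groups of 3.
x <- rbind(
  matrix(rnorm(6, mean = 0, sd = 0.1), ncol = 2),
  matrix(rnorm(6, mean = 5, sd = 0.1), ncol = 2)
)
k <- 2  # number of nearest neighbors per cell

# Exact k nearest neighbors from the full Euclidean distance matrix.
d <- as.matrix(dist(x))
diag(d) <- Inf  # a cell is not its own neighbor
knn <- t(apply(d, 1, function(row) order(row)[seq_len(k)]))

# Jaccard coefficient between the neighbor sets of every pair of cells:
# |intersection| / |union|.
n <- nrow(x)
jaccard <- matrix(0, n, n)
for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {
    shared <- length(intersect(knn[i, ], knn[j, ]))
    total <- length(union(knn[i, ], knn[j, ]))
    jaccard[i, j] <- jaccard[j, i] <- shared / total
  }
}
```

Cells in the same group share a neighbor and get a positive weight, while cells in different groups share none and get zero. The resulting weighted matrix is the kind of input a community detection routine would then partition.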

In this implementation of Phenograph, we use approximate nearest neighbors found with the RcppAnnoy library present in the uwot package. The supported distance metrics for KNN (set by the dist.method parameter) are:

  • Euclidean (default)
  • Cosine
  • Manhattan
  • Hamming

Please note that Hamming support is a lot slower than the other metrics. It is not recommended if you have more than a few hundred features, and even then expect the index building phase to take several minutes in situations where the Euclidean metric would take only a few seconds.
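For reference, the four metrics can be written out for a pair of toy vectors in base R. This is only a sketch of the textbook definitions; the exact scaling applied inside Annoy (for the cosine metric in particular) may differ.

```r
# Two toy feature vectors (binary here so that Hamming is meaningful).
a <- c(1, 0, 1, 1)
b <- c(1, 1, 0, 1)

euclidean <- sqrt(sum((a - b)^2))  # straight-line distance
manhattan <- sum(abs(a - b))       # sum of absolute differences
cosine <- 1 - sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))  # 1 - cosine similarity
hamming <- sum(a != b)             # number of positions that differ
```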

After the Jaccard coefficients among cells have been computed, community detection is performed, by default with the igraph implementation of the Louvain method. The supported community detection algorithms (set by the community.algo parameter) are:

  • Louvain (default)
  • Louvain with modularity optimization (C++ function imported from Seurat)
  • Louvain algorithm with multilevel refinement (C++ function imported from Seurat)
  • Walktrap
  • Fastgreedy