geInteraction

Options

Option	Description	Argument
--samplesList	File with a column named "sample" listing the sample names. Additional TSV columns are used to annotate the output figures. "field"_COLOR columns are used to map colors to the additional fields [required]	[char]
--gipOut	GIP output directory [default gipOut]	[char]
--outName	Output name [default gipOut/sampleComparison/geInteraction]	[char]
--chrs	Chromosomes to use. If "NA" it uses the same chromosomes as GIP [default NA]	[char …]
--minMAPQ	Remove genes with MAPQ < --minMAPQ [default 0]	[int]
--minDelta	Min normalized coverage delta between samples [default 1]	[double]
--minMaxCov	Use only genes with normalized coverage >Value1 or <Value2 in at least one sample. If "NA" no filter is applied [default NA]	[num num]
--rmNotSigGenes	Use only genes with significant coverage in at least one of the samples
--heatmapType	Gene normalized coverage value transformation used for the CNV vs samples heatmap. [default scaled]	[scaled|log10|saturated|flatten]
--covSaturation	Gene normalized coverage saturation value. DEPENDENCY --heatmapType "saturated" or "flatten" [default 3]	[int]
--quantileSaturation	Provide two numbers. Saturate the colors of the gene CNV vs samples heatmap for quantiles < num1 or > num2. DEPENDENCY --heatmapType "scaled" or "log10" [default 0 1]	[double double]
--doNotClusterSamples	Do not cluster heatmap columns. Show the samples in the same order as in --samplesList
--clusteringMethod	Heatmaps clustering method [default complete]	[ward.D2|ward.D|single|complete|average|mcquitty|median|centroid]
--cutree_cnv	Based on the hierarchical clustering, divide the genes in this number of clusters [default 1]	[int]
--cutree_samp	Based on the hierarchical clustering, divide the samples in this number of clusters [default 1]	[int]
--show_geneNames	Show gene names in the heatmaps
--show_sampNames	Show sample names in the heatmaps
--cnvPlotDim	CNVs vs samples heatmap file height and width values [default 11 6]	[double double]
--corPlotDim	CNVs vs CNVs heatmap file height and width values [default 11 11]	[double double]
--lolPlotDim	Lollipop plot file height and width values [default 7 4]	[double double]
--kmeansClusters	Use this number of k-means clusters for clustering. If "NA" use mclust [default NA]	[int]
--MCLinflation	Use this MCL inflation value for clustering. Higher inflation values result in increased cluster granularity. If "NA" use mclust [default NA]	[int]
--MCLexpansion	MCL expansion value. DEPENDENCY --MCLinflation not "NA" [default 2]	[int]
--clMaxSDdist	Gene CNVs with distance from the cluster centroid > --clMaxSDdist standard deviations from the mean distance are removed from the cluster. High values make this filter ineffective. [default Inf]	[double]
--clMinSize	Min number of members in a cluster [default 2]	[int]
--edgesMeanCorFilter	NETWORK. Remove edges representing CNV correlation scores lower than the mean absolute CNV correlation
--edgesPvalueFilter	NETWORK. Remove edges with adjusted pvalue below this threshold [default 0.1]	[double]
--debug	Dump session and quit
-h, --help	Show help message

Description

The geInteraction module aims at detecting CNV genes across multiple samples and identifying gene interactions using a correlation-based network approach.

The algorithm steps:

Load the GIP files with the gene sequencing coverage values (.covPerGe.gz files) of all samples.
Select CNV genes. These are defined as the genes with a normalized coverage variation within the sample set greater than --minDelta.
Compute the all-VS-all gene coverage correlation.
Compute correlation clusters (cc) using one of the clustering algorithms: mclust (default), kmeans (--kmeansClusters), MCL (--MCLinflation).
Optionally remove CNV genes belonging to small cc (--clMinSize), or placed at a significant distance from the cluster centroid. To do that, for each cluster the module computes the centroid, the mean Euclidean distance of the members from the centroid, and the standard deviation of these distances. Cluster members whose distance from the centroid is greater than --clMaxSDdist standard deviations from the mean are removed.
Generate the gene normalized coverage heatmap (“.CNV.pdf”) and table (“.CNV.xlsx”). The --heatmapType parameter has 4 options. If "scaled", values are first centered by subtracting the mean gene normalized coverage across samples, then scaled by dividing by the standard deviation. If "log10", values are log10 transformed. If "saturated", values are saturated at --covSaturation. If "flatten", the minimum gene normalized coverage across samples is first subtracted from the values, which are then saturated at --covSaturation. The latter visualization option is useful to appreciate coverage variations of genes that are highly amplified in all samples.
Plot the all-VS-all correlation heatmap (“.corr.pdf”) and table (“.corr.xlsx”). The plot file also includes a line plot showing the scaled normalized gene coverage of the genes in each cc across samples.
Produce PCA scatterplots and standard deviation and entropy histograms as general descriptors of detected CNVs (“.overview.pdf”).
Compute static and interactive correlation networks based on the all-VS-all CNV absolute correlation. The network nodes represent gene CNVs and the edges the absolute correlation values. The higher the correlation, the closer the nodes. Edge colors indicate whether the correlation between gene pairs is positive or negative. The color of the nodes reflects the network clusters (nc) computed with one of the clustering algorithms. The same options used to select the cc clustering method (--kmeansClusters and --MCLinflation) and the cc filters (--clMinSize and --clMaxSDdist) also apply to nc. There are two important differences between cc and nc. The first is that cc are based on Pearson correlation values (i.e. including both positive and negative scores), while nc are based on the absolute correlation scores. The second is that cc quality filters remove CNV genes from all results, while nc filters affect only the network plot and tables.
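The CNV selection, correlation and centroid-distance filter steps listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the module's actual code: "coverage variation" is taken to mean the max minus the min normalized coverage across samples, the standard deviation of the distances is the population one, and all function names and gene labels are hypothetical.

```python
import math
from statistics import mean, pstdev

def select_cnv_genes(cov, min_delta=1.0):
    """Keep genes whose coverage range (max - min, an assumed definition
    of 'variation') across samples is greater than min_delta."""
    return {g: v for g, v in cov.items() if max(v) - min(v) > min_delta}

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(cov):
    """All-VS-all correlation as a nested dict: gene -> gene -> r."""
    genes = sorted(cov)
    return {a: {b: pearson(cov[a], cov[b]) for b in genes} for a in genes}

def filter_by_centroid_distance(members, max_sd_dist=math.inf):
    """Drop cluster members whose Euclidean distance from the centroid
    exceeds mean distance + max_sd_dist * sd of the distances."""
    if math.isinf(max_sd_dist):
        return dict(members)  # filter disabled, as with the default Inf
    centroid = [mean(col) for col in zip(*members.values())]
    dist = {g: math.dist(v, centroid) for g, v in members.items()}
    d = list(dist.values())
    limit = mean(d) + max_sd_dist * pstdev(d)
    return {g: v for g, v in members.items() if dist[g] <= limit}

# Hypothetical normalized coverage of four genes in four samples
cov = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 1.1, 0.9, 1.0],  # nearly constant: not a CNV gene
}
cnv = select_cnv_genes(cov, min_delta=1.0)
corr = correlation_matrix(cnv)
```

Here geneD is discarded because its coverage range is below the threshold, while geneA correlates positively with geneB and negatively with geneC.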

Example

From the GIP worked example folder execute

giptools geInteraction --samplesList samplesMetaData

This will generate the geInteraction output files in the gipOut/sampleComparison folder.

The geInteraction module requires the --samplesList parameter, which provides a tab-separated file where the first column lists the names of the samples to be processed. Optionally, additional columns can be passed with sample metadata (e.g. drug resistance, geographic origin, operator) and the colors to be assigned to each feature. If no color is provided, one is assigned randomly. In this example the metadata file is samplesMetaData. The output of this module consists of eight files.
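As an illustration of the file layout described above, the following Python sketch builds a small tab-separated table with a "sample" column, one metadata field and its matching "field"_COLOR column. The sample names, the "origin" field and the color values are invented for the example, and the helper name is hypothetical; the color columns are optional.

```python
import csv
import io

def build_samples_list(rows, fields):
    """Return TSV text with a 'sample' column followed, for each field,
    by the field itself and its '<field>_COLOR' column."""
    header = ["sample"]
    for field in fields:
        header += [field, field + "_COLOR"]
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        writer.writerow([row[col] for col in header])
    return buf.getvalue()

# Hypothetical samples and annotations
rows = [
    {"sample": "sampleA", "origin": "siteX", "origin_COLOR": "red"},
    {"sample": "sampleB", "origin": "siteY", "origin_COLOR": "blue"},
]
tsv_text = build_samples_list(rows, fields=["origin"])
```

The resulting text can be written to a file and passed to --samplesList.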

The geInteraction.CNV.pdf file includes a heatmap showing the normalized coverage of the detected CNV genes. By default the normalized coverage values are scaled, but other data transformations are possible (see above). The --cutree_samp and --cutree_cnv options can be used to split the heatmap at the sample (columns) and CNV (rows) levels respectively.
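The four --heatmapType transformations can be sketched as follows for the coverage values of a single gene across samples. This is an illustrative reading of the option descriptions, not the module's code: it assumes the sample standard deviation for "scaled" and that saturation caps values from above at --covSaturation. How zero coverage is handled under "log10" is not specified in this page, so the sketch expects strictly positive values there.

```python
import math
from statistics import mean, stdev

def transform(values, heatmap_type="scaled", cov_saturation=3):
    """Transform one gene's normalized coverage across samples."""
    if heatmap_type == "scaled":
        # center on the mean, then divide by the (sample) standard deviation
        m, s = mean(values), stdev(values)
        return [(v - m) / s for v in values]
    if heatmap_type == "log10":
        # assumes strictly positive coverage values
        return [math.log10(v) for v in values]
    if heatmap_type == "saturated":
        # cap values at the saturation level
        return [min(v, cov_saturation) for v in values]
    if heatmap_type == "flatten":
        # subtract the gene minimum, then cap at the saturation level
        low = min(values)
        return [min(v - low, cov_saturation) for v in values]
    raise ValueError("unknown heatmap type: " + heatmap_type)

# A gene amplified in all samples: "flatten" exposes the differences
flattened = transform([5, 6, 9], "flatten")
```

With "saturated" the three values above would all be capped at 3 and look identical, which is why "flatten" is the more informative choice for genes amplified in every sample.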

The figure produced in this example is the following:

The geInteraction.overview.pdf file includes multiple plots. The first plot represents the PCA analysis of the samples based on the detected gene CNVs. Supplementary plots are produced for each additional metadata field. In these plots the samples are colored by the metadata information. The last plot represents two histograms showing respectively the standard deviation and the entropy of the gene CNV normalized coverage. The PCA plot in this example is the following:

../_images/geInteraction.overview.PCA.png

The geInteraction.corr.pdf file reports the all vs all gene CNV correlation heatmap. The --cutree_cnv option can be used to split the CNVs (both on the columns and rows) in different groups.

The geInteraction.lolli.pdf file shows for each gene CNV (rows) the most negative correlation (left side, pink), the median correlation (black dot), and the most positive correlation (right side, green) values measured among the gene CNVs. The gene CNV order is the same as the one in the all vs all gene CNV heatmap.
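The three values drawn for each row of the lollipop plot can be derived from the all vs all correlation table, as in the Python sketch below. It assumes that the correlation of a gene with itself is excluded from the summary, which this page does not state; the function name and the gene labels are hypothetical.

```python
from statistics import median

def lollipop_summary(corr):
    """For each gene return (most negative, median, most positive)
    correlation with the other genes. corr is gene -> gene -> r."""
    summary = {}
    for gene, row in corr.items():
        # assumed: the self-correlation (always 1) is left out
        others = [r for partner, r in row.items() if partner != gene]
        summary[gene] = (min(others), median(others), max(others))
    return summary

# Hypothetical correlation values among four gene CNVs
corr = {
    "g1": {"g1": 1.0, "g2": 0.8, "g3": -0.5, "g4": 0.1},
    "g2": {"g1": 0.8, "g2": 1.0, "g3": -0.2, "g4": 0.3},
    "g3": {"g1": -0.5, "g2": -0.2, "g3": 1.0, "g4": -0.9},
    "g4": {"g1": 0.1, "g2": 0.3, "g3": -0.9, "g4": 1.0},
}
summary = lollipop_summary(corr)
```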

The geInteraction.network.pdf file reports the gene CNV correlation network, where the nodes represent the genes, the edges the correlation values, and the color of the edges the correlation direction (positive or negative). The nodes are colored according to the predicted clusters. Multiple clustering methods are offered. For instance, adding the option --kmeansClusters 3 to the command line returns the following plot:

The geInteraction.network.d3.html file is a D3 interactive visualization of the network. While the network layout may differ slightly from the static visualization (due to the differences between the tools used to generate the two), the node clusters and the overall shape are the same.

The geInteraction.CNV.xlsx includes three spreadsheets:

sampleInfo. This is a copy of the provided sample metadata showing the feature colors and reporting the sample branch group assignment in the geInteraction.CNV.pdf heatmap.
cnvInfo. This table includes the relevant statistics measured for the detected gene CNVs, including the most positively and negatively correlated gene partners, and the gene CNV branch group and cc assignment in the geInteraction.CNV.pdf and geInteraction.corr.pdf heatmaps.
normGeneCoverage. This table includes the normalized gene coverage across the samples of interest.

The data in each spreadsheet is sorted the same way as the geInteraction.CNV.pdf heatmap.

The geInteraction.corr.xlsx includes a different spreadsheet for each predicted correlation cluster (cc). Each of them reports the gene members, their functions (if available) and the all vs all correlation values. The latter are sorted to reflect the geInteraction.corr.pdf plot.

The geInteraction.network.xlsx includes a different spreadsheet for each predicted network cluster (nc). Each of them reports the gene members, their functions (if available) and the all vs all correlation values. The last spreadsheet reports the list of genes filtered from the network (if any).
