Introduction to GIP¶
Distribution¶
GIP is a tool for scientific investigation, compatible with Linux and OS X systems and distributed as a self-contained package.
GIP consists in 3 files:
gip: The Nextflow pipeline
gip.config: The configuration file
giptools: The Singularity image
All the (many) required software dependencies already are already embedded in the Singularity container.
Thanks to the Nextflow implementation GIP can be executed on a local machine (default), on cluster resource manager or the cloud.
Why using GIP?¶
GIP is a fully automated pipeline requiring minimal configuration.
While GIP can be used for batch computation of large WGS data sets, the minimum required input is just (i) a paired-end whole genome sequencing data set (.fastq files) and (ii) a reference genome assembly (FASTA file).
GIP is flexible by design, which means that it does not include any built-in hardcoded parametrization (e.g. number/names of chromosomes, centromer position..) limiting its use to higher eukaryotes (e.g. human, mouse) or model organisms in general.
GIP is particularily adapted to the genome analysis of non-model organisms such as Leishmaina.
GIP can be used to explore the genome instability of biological systems exploiting genome instability for adaptation (Leishmania, Candida, Cancer) through frequent DNA dosage variations (e.g. chromosome aneuploidy, gene CNVs) or SNVs.
GIP analyses overview¶
Prepare the genome
Map the reads
Evaluate chromosome copy number
Evaluate gene and bin copy number
Identify and visualize copy number variation wrt the reference genome
Identify and measure gene clusters
Identify SNVs and structural variants
Generate a report file providing summary statistics, tables in exel, figures
Once GIP execution completed, the user can execute giptools to compare samples sub-sets and highlight chromosome copy number, gene copy number and SNV differences.