Open Access Software

ChIP-seq Analysis in R (CSAR): An R package for the statistical detection of protein-bound genomic regions

Jose M Muiño1,2*, Kerstin Kaufmann3, Roeland CHJ van Ham1,2, Gerco C Angenent4 and Pawel Krajewski5

Author Affiliations

1 Applied Bioinformatics, Plant Research International, PO Box 619, 6700 AP Wageningen, The Netherlands

2 Netherlands Bioinformatics Centre, PO Box 619, 6700 AP Wageningen, The Netherlands

3 Laboratory of Molecular Biology, Wageningen University, PO Box 8128, 6700 ETPB Wageningen, The Netherlands

4 Bioscience, Plant Research International, PO Box 619, 6700 AP Wageningen, The Netherlands

5 Institute of Plant Genetics, Polish Academy of Sciences, 60-479 Poznań, Poland

Plant Methods 2011, 7:11 doi:10.1186/1746-4811-7-11

Published: 9 May 2011

Abstract

Background

In vivo detection of protein-bound genomic regions can be achieved by combining chromatin immunoprecipitation with next-generation sequencing technology (ChIP-seq). The large amount of sequence data produced by this method needs to be analyzed in a statistically sound and computationally efficient manner. The PCR step in ChIP-seq can generate high copy numbers of individual DNA fragments, and this artefact is an important source of bias in the method.

Results

We present an R package for the statistical analysis of ChIP-seq experiments. Taking into account the average size of the DNA fragments subjected to sequencing, the software calculates single-nucleotide read-enrichment values. After normalization, sample and control are compared using either a ratio-based test or a test based on the Poisson distribution. Test statistic thresholds that control the false discovery rate are obtained through random permutations. Computational efficiency is achieved by implementing the most time-consuming functions in C++ and integrating them into the R package. An analysis of simulated and experimental ChIP-seq data is presented to demonstrate the robustness of our method against PCR artefacts and its adequate control of the error rate.
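The steps described above can be illustrated with a small sketch. It is written in Python for illustration only and is not the CSAR API: every function name, the total-coverage scaling used for normalization, the floor on the Poisson mean, and the use of a quantile of permuted maxima as a threshold are assumptions standing in for the package's actual procedures.

```python
import math
import random

def coverage(reads, genome_len, frag_len):
    """Per-nucleotide count of reads, each extended to frag_len in its strand direction.

    reads: list of (start_position, strand) with strand '+' or '-'.
    """
    diff = [0] * (genome_len + 1)          # difference array, summed below
    for pos, strand in reads:
        if strand == "+":
            lo, hi = pos, pos + frag_len   # extend downstream
        else:
            lo, hi = pos - frag_len + 1, pos + 1  # extend upstream
        lo, hi = max(lo, 0), min(hi, genome_len)
        if lo < hi:
            diff[lo] += 1
            diff[hi] -= 1
    cov, running = [], 0
    for d in diff[:genome_len]:
        running += d
        cov.append(running)
    return cov

def poisson_score(sample, control, floor=1.0):
    """-log10 of P(X >= sample) for X ~ Poisson(max(control, floor)).

    The floor (an assumption here) avoids a zero mean where the control is empty.
    """
    lam = max(control, floor)
    term = math.exp(-lam)                  # P(X = 0)
    cdf = 0.0
    for i in range(int(sample)):           # accumulate P(X < sample)
        cdf += term
        term *= lam / (i + 1)
    return -math.log10(max(1.0 - cdf, 1e-300))

def scores(sample_reads, control_reads, genome_len, frag_len):
    """Poisson score at every nucleotide after scaling control to the sample total."""
    s = coverage(sample_reads, genome_len, frag_len)
    c = coverage(control_reads, genome_len, frag_len)
    scale = sum(s) / sum(c) if sum(c) else 1.0   # assumed normalization
    return [poisson_score(si, ci * scale) for si, ci in zip(s, c)]

def permutation_threshold(sample_reads, control_reads, genome_len, frag_len,
                          n_perm=20, quantile=0.95, seed=0):
    """Score threshold from label permutations (simplified stand-in for FDR control).

    Reads are pooled and randomly reassigned to 'sample' and 'control'; the
    maximum score of each permuted data set forms the null distribution.
    """
    rng = random.Random(seed)
    pooled = list(sample_reads) + list(control_reads)
    n = len(sample_reads)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(pooled)
        maxima.append(max(scores(pooled[:n], pooled[n:], genome_len, frag_len)))
    maxima.sort()
    return maxima[min(int(quantile * n_perm), n_perm - 1)]
```

Calling `scores(sample_reads, control_reads, genome_len, frag_len)` gives one value per nucleotide, and positions whose value exceeds the result of `permutation_threshold(...)` would be candidate bound regions under these simplifying assumptions.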

Conclusions

The software ChIP-seq Analysis in R (CSAR) enables fast and accurate detection of protein-bound genomic regions through the analysis of ChIP-seq experiments. We found that, compared to existing methods, our package shows greater robustness against PCR artefacts and better control of the error rate.