Example use cases of running DNENRICH
To calculate enrichment of nonsynonymous de novo mutations (from schizophrenia patients) in postsynaptic gene sets (mirroring the analysis performed in Table 1 of this paper), we will use:
- program: dnenrich
- Options: .
- Number of permutations: 1000
- Gene alias file: alias.txt
- Gene size matrix: refseq_gene_sizes.txt
- Gene set: kirov.set
- Mutation list: Fromer.denovos.NS.mut
- Background list (optional): NOT USED HERE

Note that Fromer.denovos.NS.mut can be extracted from Supplementary Table 1 of the above paper.
On the command line, this looks like:
dnenrich . 1000 alias.txt refseq_gene_sizes.txt kirov.set Fromer.denovos.NS.mut > res-kirov-SCZ-NS
Then format the results with the extractDnenrichResults.csh script (see here):
extractDnenrichResults.csh res-kirov-SCZ-NS > res-kirov-SCZ-NS.txt
The file res-kirov-SCZ-NS.txt contains assessments of enrichment for each of the 17 postsynaptic and neuronal gene sets defined in the Kirov et al. 2012 paper.
Since the p-values are often very close to 1 in 1000 (roughly the smallest value that 1000 permutations can resolve), we would in fact need to run more permutations to get a better estimate of the true p-value.
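As a minimal sketch of such a rerun, only the permutation count (the third argument) needs to change. The value 100000 and the output file name below are assumptions chosen for illustration, not recommendations from the DNENRICH authors; run time grows with the number of permutations.

```shell
# Assumed permutation count: with N permutations, the smallest nonzero
# empirical p-value is on the order of 1 in N.
NPERM=100000

# Hypothetical output name that records the permutation count used.
OUT="res-kirov-SCZ-NS-${NPERM}"

# Same arguments as the 1000-permutation run; only NPERM differs.
if command -v dnenrich >/dev/null 2>&1; then
    dnenrich . "$NPERM" alias.txt refseq_gene_sizes.txt kirov.set Fromer.denovos.NS.mut > "$OUT"
else
    echo "dnenrich not found in PATH; skipping run" >&2
fi
```

The output can then be formatted with extractDnenrichResults.csh exactly as before.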
To estimate the significance of recurrence of nonsynonymous mutations for the same schizophrenia mutations (both in aggregate and for each gene, mirroring the analysis performed in Extended Data Table 2 of this paper), we first need to extract all genes and build gene sets containing all genes as one set ("ALL") and each gene as its own set. This will allow us to test gene-level recurrence in aggregate (within the "ALL" set) and for each gene individually:
cat refseq_gene_sizes.txt | awk '{if (NR==1) {for (i=3; i<=NF; i++) {split($i,genes,"+"); for (g in genes) {GNS[genes[g]]++}}; for (g in GNS) {print g}} else {exit}}' | sort | awk 'BEGIN{OFS="\t"} {gene=$1; print gene,"ALL",1; print gene,gene,1}' > all_genes.set
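To see what this command produces, here is a minimal sketch that runs the same awk pipeline on a toy input. The toy file names, the placeholder gene names (GENE_A, GENE_B, GENE_C), and the contents of the first two header fields are assumptions for illustration. The only property relied on is the one the command itself reads: gene names in the first row from the third column onward, with multiple genes joined by "+".

```shell
# Work in a scratch directory so no real file is overwritten.
TMP=$(mktemp -d)

# Toy stand-in for refseq_gene_sizes.txt (hypothetical contents).
# Only the first row is read; fields 1-2 are placeholders.
printf 'X\tY\tGENE_B\tGENE_A+GENE_C\tGENE_B\n' > "$TMP/toy_gene_sizes.txt"
printf 'row2\tis\tnever\tread\there\n' >> "$TMP/toy_gene_sizes.txt"

# Same pipeline as above: collect the distinct gene names from the header,
# sort them, and emit one "ALL" row plus one single-gene row per gene.
awk '{if (NR==1) {for (i=3; i<=NF; i++) {split($i,genes,"+"); for (g in genes) {GNS[genes[g]]++}}; for (g in GNS) {print g}} else {exit}}' "$TMP/toy_gene_sizes.txt" \
  | sort \
  | awk 'BEGIN{OFS="\t"} {gene=$1; print gene,"ALL",1; print gene,gene,1}' > "$TMP/toy_all_genes.set"

cat "$TMP/toy_all_genes.set"
```

Each distinct gene yields two tab-separated rows, one placing it in "ALL" and one placing it in a set named after itself, so the three toy genes give six rows; the repeated GENE_B is counted once.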
Then, run dnenrich:
dnenrich . 1000 alias.txt refseq_gene_sizes.txt all_genes.set Fromer.denovos.NS.mut > res-genes-SCZ-NS
Then, format the results (using the "--recurrence 2" option to extract significance for recurrence of genes hit 2 or more times by mutation):
extractDnenrichResults.csh --recurrence 2 res-genes-SCZ-NS > res-genes-SCZ-NS.txt
The 20 output rows in res-genes-SCZ-NS.txt contain:
- File header (first row)
- Significance of recurrence for 18 genes with 2 (or more) NS mutations [e.g., SET is labeled "BAIAP2>=2"]
- Significance of aggregate recurrence for "ALL" genes with 2 (or more) NS mutations [set is "ALL>=2"]
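A quick sanity check of this layout can be sketched as follows. The helper name summarize_recurrence is hypothetical, and the sketch assumes only that the set labels (such as "BAIAP2>=2" and "ALL>=2") appear verbatim in their rows and that the header row does not contain ">=2"; the column layout of the file is not assumed.

```shell
# Hypothetical helper: summarize a formatted recurrence results file.
summarize_recurrence() {
    res="$1"
    # Total rows, header included (20 in the example above).
    echo "rows: $(wc -l < "$res" | tr -d ' ')"
    # Rows labeled ">=2", excluding the aggregate "ALL>=2" row.
    echo "gene-level rows: $(grep -F '>=2' "$res" | grep -vcwF 'ALL>=2')"
    # The aggregate recurrence row itself.
    grep -wF 'ALL>=2' "$res" || echo "(no ALL>=2 row found)"
}

# Run only if the results file from the previous step is present.
if [ -f res-genes-SCZ-NS.txt ]; then
    summarize_recurrence res-genes-SCZ-NS.txt
fi
```

For the example above, this should report 20 rows in total and 18 gene-level rows.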