R script for computation of G values

R script for computation of G values

This script allows one to take a set of pairwise 2D-SFS and compute G values for individual contigs from one pair of populations. All scripts are provided in their current state of development with no warranty that they will work in all cases, and only for reference purposes. They can be used and modified by citing the Author (see script header). Example files can be obtained from the Author upon request.

#this script was written by Ivan Scotti, INRA, URFM
#Site Agroparc, Domaine Saint-Paul
#84914 AVIGNON Cedex 9
#you are free to use and modify it, provided you quote the author
#for questions, please contact Ivan Scotti at
#ivan.scotti[at]paca.inra.fr
#15 July 2015
#
G2DcalcMultiPop<-function(sfs=multiPopSFS.list,allfreq=minAllFreq.df,firstPop,secondPop,sampleSize)
  {  
#allows one to comput 2D-SFS for a pair of populations out of a group
#of populations.
#NB this is for PAIRWISE SFS's, not for multi-pop SFS's
#the arguments are:
#allfreq, which defaults to minAllFreq.df
#sfs, which defaults to multiPopSFS.list
#both defaults are inherited from the fromArpToSFS() function
#firstPop and secondPop, the ordinal index of the populations
#sampleSize, the sample per pop
#
#EXTRACTING MULTINOMIAL PROBS BY CONTIG
#First, obtaining contingency tables by contig
#writing ad-hoc function
#NB coerce table to sampleSize by sampleSize
#2D-SFS tables are made from allele counts in each pop
#
#levels are coerced to 0:sampleSize (=nb of alleles)
#each table takes a vector form (?) so that the output is a matrix
#with each contig in a column
contigTable<-function(y){table(factor(y[,(firstPop+1)],levels=0:sampleSize),
    factor(y[,(secondPop+1)],levels=0:sampleSize))
  }
#applying by contig (using by())
#INDICES is the contig column in the MinAllFreq.df object:
#it must be written explicitly as an argument (this is normal)
contig2DSFS<-by(minAllFreq.df,FUN=contigTable,INDICES=as.factor(minAllFreq.df[,1]))
#
#implementing multinomial prob calculation for tables:
#the function here takes each element of the matrix-containing list,
#turns them into vectors and computes probabilities
#based on the expectation contained in the global (genome-wide) MAF 2DSFS
#this requires that the genome-wide SFS's name is HARDWIRED
#(I could change this, too)
#
#***IMPORTANT***
#the MAF2DSFS here must be obtained from the SIMULATED DATA
#that have been simulated based on the observed SFS, which
#is NOT USED ANYMORE here.
#first, the right element of multiPopSFS.list must be identified
pointer<-matrix(ncol=2)
for (i in 1:(nPop-1))
{
  for (j in (i+1):nPop)
  {
    pointer<-rbind(pointer,c(i,j))
  }
}
pointer<-pointer[-1,]
pointer<-cbind(pointer,1:nrow(pointer))
sfsIndex<-pointer[pointer[,1]==firstPop & pointer[,2]==secondPop,3]
multinomProbs<-function(X){dmultinom(x=as.vector(X),
                                     prob=as.vector(as.matrix(sfs[[sfsIndex]])))}
#notice that the function is run through a lapply()
#because the starting object is a LIST
ProbsGlobal.list<-lapply(X=contig2DSFS,FUN=multinomProbs)
#same as above, but computes probs relative to the
#LOCAL expectation
multinomProbsLocal<-function(X){dmultinom(x=as.vector(X),
                                          prob=as.vector(X))}
ProbsLocal.list<-lapply(X=contig2DSFS,FUN=multinomProbsLocal)
#computation of the statistics
Gvalues<<-2*(log(unlist(ProbsLocal.list)) - log(unlist(ProbsGlobal.list)))
#removing uninformative loci (must look at it more closely)
#GvaluesInformative<-Gvalues[which(CROPsProbsDenom.list!=1)]
hist(Gvalues)
# #not used: computation of theoretical P-values
# #P-values: the test must have Ne d.f.
# #(Ne=the number of non-empty cells in the genome-wide table)
# Ne<-length(sfs[sfs!=0])
# #provides the emirical P-value distribution
# #(for more precision, this must be cumulated with several other
# #simulations)
# #not used: vector of P values
# #as.vector(pchisq(q=Gvalues, df=Ne, ncp = 0, lower.tail = TRUE, log.p = FALSE))
as.vector(Gvalues)
}
#not used: test of G2calc
# firstSetOfGvalues<-G2Dcalc(allfreq=minAllFreq.df,sfs=MAF2DSFS)

Date de modification : 22 juin 2023 | Date de création : 16 juillet 2015 | Rédaction : Ivan Scotti