The VIRCOL website

=The LHAPDF home page= Theory = LHAPDF ready MC's = Downloads = online manual =

Contents

1.  A Mathematical framework
2.  Random Sampling Method
3.  Lagrange Multiplier Method
4.  Covariance Matrix Method
5.  Offset Method

1  A Mathematical framework

At first sight PDF fitting is rather straightforward. However, a more detailed look reveals many difficult issues. As the PDF uncertainties will affect all areas of phenomenology at hadron colliders, a clear mathematical framework of a PDF fit is essential [1]. From this formulation, all explicit methods can be derived. Also, the mathematical description will make explicit all assumptions needed before one can make a fit. These assumptions are crucial and do not find their origin in experimental results, but rather in theoretical prejudice. Such assumptions are unavoidable as we fit a system with an infinite number of degrees of freedom (the PDF functional) to a finite set of data points.

We want to calculate the probability density function POpdf which reflects the uncertainty in predicting the observable O due to the PDF uncertainties. The function POpdf(xe) gives the probability density to measure a value xe for observable O.

To calculate the PDF probability density function for observable O we have to integrate over the functional space of all possible PDFs V(F). The integration is weighted by three probability density functions: the prior probability density function, Pprior, the experimental response function of the observable, PexpO and the probability density function of the fitted experiments, Pexpinput. The resulting formula is given by
PpdfO(xe) = ó
õ


V(F) 
F Pprior(F)×Pexpinput(F)×PexpO(xe|xt(F)) .
(1)

The prior probability density function contains theoretical constraints on the PDF functionals such as sum rules and other potential fit constraints (e.g. an explicit aS(MZ) value). The most crucial property of the prior function is that it defines the functional integral by imposing smoothness constraints to make the number of degrees of freedom become finite. The simplest example is an explicit finite parametrization of the PDF functionals
PpdfO(xe) = ó
õ


V({l}) 
l1 d l2¼ d ln Pprior({l})×Pexpinput({l})×PexpO(xe|xt({l})) ,
(2)

where the PDF parameters are given by the list {l}. Note that through the functional parametrization choice we have restricted the integration to a specific subset of potential PDF functionals F. Such a choice is founded on physics assumptions with no a priori justification. The resulting phenomenology depends on this assumption.

The experimental response function forms the interface with the experimental measurement. It gives the probability density to measure a value xe given a prediction xt. The experimental response functions contain all information about the experiment needed to analyze the measurement. The perdiction xt is an approximation using a perturbative estimate of the observable amended with relevant nonperturbative corrections. A simple example would be
PexpO(xe|xt ) =  1

d
Ö

2p
exp æ
è
-  1

2

(xe-xt)2/d2 ö
ø
 ,

where we have a Gaussian response function with a one sigma uncertainty d. Note that the form of the response function depends on the actual experiment under consideration. It is sometimes convenient to get a result that is independent of an experiment and its particular detector: To obtain the theoretical prediction for the probability density function of the observable, one can simply replace the experimental response function by a delta function (i.e. assume a perfect detector)

Pexptheory(xe|xt) = d(xe-xt) .

Finally the probability function for the fitted experiments is simply a product of all the experimental response functions
Pexpinput(F) = Nexp
Õ
i 
Pexp(i)(xm(i)|xt(i)(F))  ,
(3)

where xm(i) denotes the set of measurements provided by experiment (i). This function measures the probability density that the theory prediction based on PDF F  describes the combined experimental measurements.

Often the experimental uncertainties can be approximated by a Gaussian form of the experimental reponse function, i.e. a c2 description of the uncertainties:
PpdfO(xe) µ ó
õ


V({l}) 
l1 d l2¼ d ln e-[ 1/2]åi ci2(xm(i)-xt(i)({l}))×e-[ 1/2]cO2(xe(i)-xt({l})) ,
(4)

where we have chosen a specific parametrization for the PDF functionals. This approximation leads to a more traditional approach for determining PDFs with uncertainties. These methods are outlined in sections 4. and 5.. To go beyond the Gaussian approximation more elaborate methods are needed. Sections 2. and 3. describe techniques to accomplish this.

2  Random Sampling Method

This method attempts to calculate Eq. (1) without any approximations by using random sampling, i.e. a Monte Carlo evaluation of the integral [1]. By generating a sufficiently large sample of PDFs Eq. (1) can be approximated by
PpdfO(xe) »  1

N

N
å
i=1 
Pprior(Fi)×Pexpinput(Fi)×PexpO(xe|xt(Fi)) .
(5)

The simple implementation of Eq. (5) would lead to a highly inefficient random sampling and an unreasonably large number of PDFs would be required. By using an optimization procedure, this problem can be solved. The optimization procedure on the functional level of Eq. (1) is simply redefining the PDF F to Fu such that the Jacobian of the transformation equals
 d Fu

F

= Pprior(F)×Pexpinput(F) ,
(6)

so that

PpdfO(xe) = ó
õ


V(Fu) 
Fu PexpO(xe|xt(Fu)) .
(7)

Applying this to the random sampling evaluation gives

PpdfO(xe) »  1

N

N
å
i=1 
PexpO(xe|xt(Fiu)) .
(8)

In the random sampling approximation the redefined PDFs Fiu are easily identified as the unweighted PDFs with respect to the combined probability density Pprior(F)×Pexpinput(F). That is, the density of Fiu is given by this combined probability density. As such, each of the unweighted PDFs is equally likely. This is reflected in Eq. (8), as the probability density function of the observable is the average of the response function over the unweighted PDFs. Finally, we have to generate the set {Fiu}. One method is to use the Gaussian approximation for Eq. (6) simplifying the generation of the set {Fiu} [2].

Another more general approach is to apply a Metropolis Algorithm on the combined probability density function Pprior(F)×Pexpinput(F). This approach will handle any probability function. Furthermore, in a Metropolis Algorithm approach convergence does not depend on the number of parameters used in the PDF parametrization. Also, complicated non-linear parameter subspaces which have constant probability will be modeled correctly. These properties make it possible to use large number of parameters and explore the issue of parametrization dependence of the PDFs.

Once the set {Fiu} is generated we can predict the probability density function for any observable by averaging over the experimental response function using Eq. (8).

3  Lagrange Multiplier Method

The values of the fit parameters that minimize c2 provide by definition the best fit to the global set of data. The dependence of c2 on those parameters in the neighborhood of the minimum can be characterized in quadratic approximation by the matrix of second derivatives, which is known as the Hessian matrix. The inverse of the Hessian matrix is the error matrix, and it forms the basis for traditional estimates of the uncertainties.

The traditional assumption of quadratic behavior of c2 in all directions in parameter space is not necessarily a very good one in the case of PDF fitting, because there are ``flat directions'' in which c2 changes very slowly, so large changes in certain combinations of parameters are not ruled out. This difficulty is always present in the case of PDF fitting, because as more data become available to pin down the PDFs better, more flexible parametrizations of them are introduced to allow ever-finer details to be determined.

To some extent, the flat directions can be allowed for by an iterative method [4,3], whereby the eigenvector directions of the error matrix and their eigenvalues are found using a convergent series of successive approximations. This iterative method is implemented as an extension to the standard minimization program minuit. The result is a collection of PDFs (currently 40 in number) that probe the allowed range of possibilities along each of the eigenvector directions. The PDF uncertainty on any physical quantity-or on some feature of the PDFs themselves-can easily be computed after that quantity is evaluated for each of the eigenvector sets. This method has been applied, for example, to find the uncertainty range for predictions of W and Z cross sections and their correlations [3].

To completely overcome the need to assume quadratic behavior for c2 as a function of the fitting parameters, one can use a Lagrange Multiplier method [4,5] to directly examine how c2 varies as a function of any particular variable that is of interest. The method is a classical mathematical technique that is more fully called Lagrange's ``method of undetermined multipliers.'' To explain it by example, suppose we want to find the effect of PDF uncertainties on the prediction for the Higgs boson production cross section. Instead of finding the PDF parameters that minimize c2 (which measures the quality of fit to the global data), one minimizes c2 + lsH, where sH is the predicted Higgs cross section-which is of course also a function of the PDF parameters. The minimization is carried out for a variety of values of the Lagrange Multiplier constant l. Each minimization provides one point on the curve of c2 versus predicted sH. Once that curve has been mapped out, the uncertainty range is defined by the region for which the increase in c2 above its minimum value is acceptable.

The essential feature of the Lagrange Multiplier method is that it finds the largest possible range for the predictions of a given physical quantity, such as sH, that are consistent with any given assumed maximum increase Dc2 above the best-fit value of c2. This method has been applied, for example, to study the possible range of variation of the rapidity distribution for W production, by extremizing various moments of that distribution [4,5].

4  Covariance Matrix Method

The covariance matrix method is based on the Bayesian approach to the treatment of the systematic errors, when the latter are considered as a random fluctuation like the statistical errors. Having no place for the detailed discussion of the advantages of this approach we refer reader to the introduction into this scope given in Ref. [6]; the only point we would like to underline here is that application of the Bayesian approach is especially justified in the analysis of the data sets with the numerous sources of independent systematic errors, which is the case for the extraction of PDFs from existing experimental data.

Let the data sample have one common additive systematic error1. In this case following the Bayesian approach the measured values yi are given by
yi=fi+mi si+lsi,
(9)

where fi(q0) is the theoretical model describing the data, q0 - the fitted parameter of this model, si - statistical errors, si - systematic errors for each point, mi and l - the independent random variables, and the index i runs through the data points from 1 to N. The only assumption we make is that the average and the dispersion of these variables are zero and unity respectively. It is natural to assume that the mi are Gaussian distributed when the data points are obtained from large statistical samples. Such an assumption is often not justified for the distribution of l

Within the covariance matrix approach the estimate of the fitted parameter [^(q)] is obtained from a minimization of the functional
c2(q)= N
å
i,j=1 
(fi(q)-yi) Eij (fj(q)-yj),
(10)

where

Eij=  1

si sj

æ
è
dij -  rirj

1+r2

ö
ø
 ,

is the inverse of the covariance matrix

Cij = si sj+dijsisj,
(11)

r is modulus of the vector [(r)\vec] with N components equal to ri=si/si and dij is the Kronecker symbol. We underline that with the data model given by Eq. (9) one need not to assume a specific form for distribution of yi in order to obtain the central value of this estimate. In a linear approximation for fi(q) one also does not need such assumptions to estimate the dispersion. In this approximation the dispersion reads [7]

DC(
^
q
 
)=  1

f2

é
ë
1+  r2 z2

1+r2(1-z2)

ù
û
,
(12)

where f is modulus of the vector [(f)\vec] with N components equal to fi=fi¢(q0)/si, the symbol ¢ denotes the derivative with respect to q, and z is the cosine of the angle between [(r)\vec] and [(f)\vec].

To obtain the distribution of [^(q)] one needs to know the complete set of its moments, which, in turn, requires similar information for the moments of yi. At the same time from the consideration similar to the proof of the Central Limit theorem of statistics (see. Ref. [8]) one can conclude that the distribution of [^(q)] is Gaussian, if all independent systematic errors are comparable in value and their number is large enough. More educated guess of the form of the distribution can be performed with the help of the general approach described in section 2..

The dispersion of fitted parameters obtained by the covariance method is different from the one obtained by the offset method described in section 5.. Indeed, the dispersion of the parameter estimate obtained by the offset method applied to the analysis of the data set given by Eq. (9) is equal [7]
DO(
^
q
 
)=  1

f2

(1+r2 z2).
(13)

One can see that DO is generally larger than DC. The difference is especially visible for the case N >> 1, when r >> 1, if the systematic errors are not negligible as compared to the statistical ones. In this case and if z ¹ 1

DC(
^
q
 
) »  1

f2(1-z2)

 ,
(14)

while

DO(
^
q
 
) »  r2 z2

f2

.
(15)

One can see that the standard deviation of the offset method estimator rises linearly with the increase of the systematics, while the covariance matrix dispersion saturates and the difference between them may be very large.

Some peculiarities arise in the case when the systematic errors are multiplicative, i.e. when the si in Eq. (11) are given by hi yi, where hi are constants. As it was noted in Ref. [9] in this case the covariance matrix estimator may be biased. The manifestation of this bias is that the fitted curve lays lower the data points on average, which is reflected by a distortion of the fitted parameters. In order to minimize such bias one is to calculate the covariance matrix of Eq. (11) using the relation si=hi fi(q). In this approach the covariance matrix depends on the fitted parameters and hence has to be iteratively re-calculated during the fit. This certainly makes calculation more time-consuming and difficult, but in this case the bias of the estimator is non-negligible as compared to the value of its standard deviation if the systematic error on the fitted parameter is an order of magnitude larger than the statistical error [7].

5  Offset Method

With the offset method [10], already mentioned above, the systematic errors are incorporated in the model prediction
ti(q,l) = fi(q) +
å
k 
lk  sik ,
(16)

where we allow for several sources of systematic error (k). The c2, to be minimized in a fit, is defined as

c2(q,l) =
å
i 
æ
è
 yi-ti(q,l)

si

ö
ø
2

 
+
å
k 
lk2.
(17)

It can be shown [5] that leaving both q and l free in the fit is mathematically equivalent to the covariance method described in the previous section. However, there is also the choice to fix the systematic parameters to their central values lk = 0 which results in minimizing

c2(q) =
å
i 
æ
è
 yi-fi(q)

si

ö
ø
2

 
 ,
(18)

where only statistical errors are taken into account to get the best value [^(q)] of the parameters. Because systematic errors are ignored in the c2 such a fit forces the theory prediction to be as close as possible to the data.

The systematic errors on q are estimated from fits where each systematic parameter lk is offset by its assumed error (±1) after which the resulting deviations Dq are added in quadrature. To first order this lengthy procedure can be replaced by a calculation of two Hessian matrices M and C, defined by
Mij =  1

2

 2 c2

¶qi ¶qj

       Cij =  1

2

 2 c2

¶qi ¶lj

 .
(19)

The statistical covariance matrix of the fitted parameters is then given by

Vstatq = M-1 ,
(20)

while a systematic covariance matrix can be defined by [11]

Vsystq = M-1 C CT M-1 ,
(21)

where CT is the transpose of C.

Having obtained the best values and the covariance matrix of the parameters, the covariance of any two functions F(q) and G(q) can be calculated with the standard formula for linear error propagation
VFG =
å
ij 
 F(q)

¶qi

   Vijq    G(q)

¶qj

 ,
(22)

where Vq is the statistical, systematic or, if the total error is to be calculated, the sum of both covariance matrices.

Comparing Eqs. (17) and Eqs. (18) it is clear that the parameter values obtained by the covariance and offset methods will, in general, be different. This difference is accounted for by the difference in the error estimates, those of the offset method being larger in most cases. In statistical language this means that the parameter estimation of the offset method is not efficient. The method has a further disadvantage that the goodness of fit cannot be directly judged from the c2 which is calculated from statistical errors only.

For a global QCD analysis of deep inelastic scattering data which uses the offset method to propagate the systematic errors, we refer to [12] (see http://www.nikhef.nl/user/h24/qcdnum for the corresponding PDF set with full error information).

References

[1]
W. Giele, S. Keller, and D. Kosower. hep-ph/0104052. 2001.

[2]
W. Giele and S. Keller. Phys. Rev., D58:094023, 1998.

[3]
J. Pumplin et al. Phys. Rev., D65:014013, 2002.

[4]
J. Pumplin, D. Stump, and W. Tung. Phys. Rev., D65:014011, 2002.

[5]
D. Stump et al. hep-ph/0101051. Phys. Rev., D65:014012, 2002.

[6]
G. D'Agostini. hep-ph/9512295. 1995.

[7]
S. Alekhin. hep-ex/0005042. 2000.

[8]
W. Eadie, D. Drijard, F. James, M. Roos, and B. Sadoulet. Statistical methods in experimental physics, north holland, 1971.

[9]
G. D'Agostini. Nucl. Instrum. Meth., A346:306-311, 1994.

[10]
M. Botje. hep-ph/0110123. 2001.

[11]
C. Pascaud and F. Zomer. preprint LAL-95-05.

[12]
M. Botje. Eur. Phys. J., C14:285-297, 2000.

[13]
M. Botje. QCDNUM version 16.12, preprint ZEUS-97-066.

Footnotes:

1We consider the case of one source of systematic errors, generalization on the many sources case is straightforward.


File translated from TEX by TTH, version 3.06.
On 6 Apr 2002, 07:47.


Comments and/or Problems
=The LHAPDF home page= Theory = LHAPDF ready MC's = Downloads = online manual =