The VIRCOL website=The LHAPDF home page= Theory = LHAPDF ready MC's = = online manual = |
Contents1. A Mathematical framework
1 A Mathematical frameworkAt first sight PDF fitting is rather straightforward. However, a more detailed look reveals many difficult issues. As the PDF uncertainties will affect all areas of phenomenology at hadron colliders, a clear mathematical framework of a PDF fit is essential [1]. From this formulation, all explicit methods can be derived. Also, the mathematical description will make explicit all assumptions needed before one can make a fit. These assumptions are crucial and do not find their origin in experimental results, but rather in theoretical prejudice. Such assumptions are unavoidable as we fit a system with an infinite number of degrees of freedom (the PDF functional) to a finite set of data points. We want to calculate the probability density function POpdf which reflects the uncertainty in predicting the observable O due to the PDF uncertainties. The function POpdf(xe) gives the probability density to measure a value xe for observable O.
To calculate the PDF probability density function for observable O
we have to integrate over the functional space of all possible PDFs V(F).
The integration is weighted by three probability density functions:
the prior probability density function, Pprior,
the experimental response function of the observable, PexpO
and the probability density function of the fitted experiments,
Pexpinput. The resulting formula is given by
The prior probability density function contains theoretical constraints
on the PDF functionals such as sum rules and other potential
fit constraints (e.g. an explicit aS(MZ) value).
The most crucial property of the prior function is that it
defines the functional integral by imposing smoothness constraints
to make the number of degrees of freedom become finite. The simplest example
is an explicit finite parametrization of the PDF functionals
where the PDF parameters are given by the list {l}. Note that through the functional parametrization choice we have restricted the integration to a specific subset of potential PDF functionals F. Such a choice is founded on physics assumptions with no a priori justification. The resulting phenomenology depends on this assumption.
The experimental response function forms the interface with the experimental
measurement. It gives the probability density to
measure a value xe given a prediction xt.
The experimental response functions contain all information about
the experiment needed to analyze the measurement.
The perdiction xt is an
approximation using a perturbative estimate of the observable
amended with relevant nonperturbative corrections.
A simple example would be
where we have a Gaussian response function
with a one sigma uncertainty d.
Note that the form of the response function depends on the actual
experiment under consideration. It is sometimes convenient to
get a result that is independent of an experiment and its particular
detector: To obtain the
theoretical prediction for the probability density function of the
observable, one can simply replace
the experimental response function by a delta function
(i.e. assume a perfect detector)
Finally the probability function for the fitted experiments is simply
a product of all the experimental response functions
where xm(i) denotes the set of measurements provided by experiment (i). This function measures the probability density that the theory prediction based on PDF F describes the combined experimental measurements.
Often the experimental uncertainties can be approximated by a Gaussian
form of the experimental reponse function, i.e. a c2
description of the uncertainties:
where we have chosen a specific parametrization for the PDF functionals. This approximation leads to a more traditional approach for determining PDFs with uncertainties. These methods are outlined in sections 4. and 5.. To go beyond the Gaussian approximation more elaborate methods are needed. Sections 2. and 3. describe techniques to accomplish this.
2 Random Sampling Method
This method attempts to calculate Eq. (1) without
any approximations by using random sampling, i.e. a Monte Carlo evaluation of
the integral [1]. By generating a sufficiently
large sample of PDFs Eq. (1)
can be approximated by
The simple implementation of Eq. (5)
would lead to a highly inefficient
random sampling and an unreasonably large number of PDFs would be
required. By using an optimization procedure, this problem can be
solved. The optimization procedure on the functional level of
Eq. (1) is simply redefining the PDF F to Fu
such that the Jacobian of the transformation equals
so that
Applying this to the random sampling evaluation gives
In the random sampling approximation the redefined PDFs Fiu are easily identified as the unweighted PDFs with respect to the combined probability density Pprior(F)×Pexpinput(F). That is, the density of Fiu is given by this combined probability density. As such, each of the unweighted PDFs is equally likely. This is reflected in Eq. (8), as the probability density function of the observable is the average of the response function over the unweighted PDFs. Finally, we have to generate the set {Fiu}. One method is to use the Gaussian approximation for Eq. (6) simplifying the generation of the set {Fiu} [2]. Another more general approach is to apply a Metropolis Algorithm on the combined probability density function Pprior(F)×Pexpinput(F). This approach will handle any probability function. Furthermore, in a Metropolis Algorithm approach convergence does not depend on the number of parameters used in the PDF parametrization. Also, complicated non-linear parameter subspaces which have constant probability will be modeled correctly. These properties make it possible to use large number of parameters and explore the issue of parametrization dependence of the PDFs. Once the set {Fiu} is generated we can predict the probability density function for any observable by averaging over the experimental response function using Eq. (8).
3 Lagrange Multiplier MethodThe values of the fit parameters that minimize c2 provide by definition the best fit to the global set of data. The dependence of c2 on those parameters in the neighborhood of the minimum can be characterized in quadratic approximation by the matrix of second derivatives, which is known as the Hessian matrix. The inverse of the Hessian matrix is the error matrix, and it forms the basis for traditional estimates of the uncertainties. The traditional assumption of quadratic behavior of c2 in all directions in parameter space is not necessarily a very good one in the case of PDF fitting, because there are ``flat directions'' in which c2 changes very slowly, so large changes in certain combinations of parameters are not ruled out. This difficulty is always present in the case of PDF fitting, because as more data become available to pin down the PDFs better, more flexible parametrizations of them are introduced to allow ever-finer details to be determined. To some extent, the flat directions can be allowed for by an iterative method [4,3], whereby the eigenvector directions of the error matrix and their eigenvalues are found using a convergent series of successive approximations. This iterative method is implemented as an extension to the standard minimization program minuit. The result is a collection of PDFs (currently 40 in number) that probe the allowed range of possibilities along each of the eigenvector directions. The PDF uncertainty on any physical quantity-or on some feature of the PDFs themselves-can easily be computed after that quantity is evaluated for each of the eigenvector sets. This method has been applied, for example, to find the uncertainty range for predictions of W and Z cross sections and their correlations [3]. To completely overcome the need to assume quadratic behavior for c2 as a function of the fitting parameters, one can use a Lagrange Multiplier method [4,5] to directly examine how c2 varies as a function of any particular variable that is of interest. The method is a classical mathematical technique that is more fully called Lagrange's ``method of undetermined multipliers.'' To explain it by example, suppose we want to find the effect of PDF uncertainties on the prediction for the Higgs boson production cross section. Instead of finding the PDF parameters that minimize c2 (which measures the quality of fit to the global data), one minimizes c2 + lsH, where sH is the predicted Higgs cross section-which is of course also a function of the PDF parameters. The minimization is carried out for a variety of values of the Lagrange Multiplier constant l. Each minimization provides one point on the curve of c2 versus predicted sH. Once that curve has been mapped out, the uncertainty range is defined by the region for which the increase in c2 above its minimum value is acceptable. The essential feature of the Lagrange Multiplier method is that it finds the largest possible range for the predictions of a given physical quantity, such as sH, that are consistent with any given assumed maximum increase Dc2 above the best-fit value of c2. This method has been applied, for example, to study the possible range of variation of the rapidity distribution for W production, by extremizing various moments of that distribution [4,5].
4 Covariance Matrix MethodThe covariance matrix method is based on the Bayesian approach to the treatment of the systematic errors, when the latter are considered as a random fluctuation like the statistical errors. Having no place for the detailed discussion of the advantages of this approach we refer reader to the introduction into this scope given in Ref. [6]; the only point we would like to underline here is that application of the Bayesian approach is especially justified in the analysis of the data sets with the numerous sources of independent systematic errors, which is the case for the extraction of PDFs from existing experimental data.
Let the data sample have one common
additive systematic error1.
In this case following the Bayesian approach the measured values
yi are given by
where fi(q0) is the theoretical model describing the data, q0 - the fitted parameter of this model, si - statistical errors, si - systematic errors for each point, mi and l - the independent random variables, and the index i runs through the data points from 1 to N. The only assumption we make is that the average and the dispersion of these variables are zero and unity respectively. It is natural to assume that the mi are Gaussian distributed when the data points are obtained from large statistical samples. Such an assumption is often not justified for the distribution of l
Within the covariance matrix approach the estimate of the
fitted parameter [^(q)] is obtained
from a minimization of the functional
where
is the inverse of the covariance matrix
r is modulus of the vector [(r)\vec] with N components equal to
ri=si/si
and dij is the Kronecker symbol.
We underline that with the data model given by Eq. (9)
one need not to assume a specific form for distribution of yi
in order to obtain the central value of this estimate.
In a linear approximation for fi(q) one also does not need such
assumptions to estimate the dispersion.
In this approximation the dispersion reads [7]
where f is modulus of the vector [(f)\vec] with N components equal to fi=fi¢(q0)/si, the symbol ¢ denotes the derivative with respect to q, and z is the cosine of the angle between [(r)\vec] and [(f)\vec]. To obtain the distribution of [^(q)] one needs to know the complete set of its moments, which, in turn, requires similar information for the moments of yi. At the same time from the consideration similar to the proof of the Central Limit theorem of statistics (see. Ref. [8]) one can conclude that the distribution of [^(q)] is Gaussian, if all independent systematic errors are comparable in value and their number is large enough. More educated guess of the form of the distribution can be performed with the help of the general approach described in section 2..
The dispersion of fitted parameters obtained by the
covariance method is
different from the one obtained by the offset method described
in section 5.. Indeed, the dispersion of the
parameter estimate obtained by the offset method
applied to the analysis of the data set given by
Eq. (9)
is equal [7]
One can see that DO is generally larger than DC.
The difference is especially visible for the case N >> 1, when
r >> 1, if the systematic errors are not negligible as compared
to the statistical ones. In this case and if z ¹ 1
One can see that the standard deviation of the offset method estimator rises linearly with the increase of the systematics, while the covariance matrix dispersion saturates and the difference between them may be very large. Some peculiarities arise in the case when the systematic errors are multiplicative, i.e. when the si in Eq. (11) are given by hi yi, where hi are constants. As it was noted in Ref. [9] in this case the covariance matrix estimator may be biased. The manifestation of this bias is that the fitted curve lays lower the data points on average, which is reflected by a distortion of the fitted parameters. In order to minimize such bias one is to calculate the covariance matrix of Eq. (11) using the relation si=hi fi(q). In this approach the covariance matrix depends on the fitted parameters and hence has to be iteratively re-calculated during the fit. This certainly makes calculation more time-consuming and difficult, but in this case the bias of the estimator is non-negligible as compared to the value of its standard deviation if the systematic error on the fitted parameter is an order of magnitude larger than the statistical error [7].
5 Offset Method
With the offset method [10], already mentioned above,
the systematic errors are incorporated in the model prediction
where we allow for several sources of systematic error (k). The
c2, to be minimized in a fit, is defined as
It can be shown [5] that leaving both q and
l free in the fit is mathematically equivalent to the
covariance method described in the previous section. However, there
is also the choice to fix the systematic parameters to their central
values lk = 0 which results in minimizing
where only statistical errors are taken into account to get the best value [^(q)] of the parameters. Because systematic errors are ignored in the c2 such a fit forces the theory prediction to be as close as possible to the data.
The systematic errors on q are estimated from fits where each
systematic parameter lk is offset by its assumed error
(±1) after which the resulting deviations Dq are
added in quadrature. To first order this lengthy procedure can be
replaced by a calculation of two Hessian matrices M and C, defined
by
The statistical covariance matrix of the fitted parameters is then
given by
while a systematic covariance matrix can be defined
by [11]
where CT is the transpose of C.
Having obtained the best values and the covariance matrix of the
parameters, the covariance of any two functions F(q) and
G(q) can be calculated with the standard formula for linear
error propagation
where Vq is the statistical, systematic or, if the total error is to be calculated, the sum of both covariance matrices. Comparing Eqs. (17) and Eqs. (18) it is clear that the parameter values obtained by the covariance and offset methods will, in general, be different. This difference is accounted for by the difference in the error estimates, those of the offset method being larger in most cases. In statistical language this means that the parameter estimation of the offset method is not efficient. The method has a further disadvantage that the goodness of fit cannot be directly judged from the c2 which is calculated from statistical errors only. For a global QCD analysis of deep inelastic scattering data which uses the offset method to propagate the systematic errors, we refer to [12] (see http://www.nikhef.nl/user/h24/qcdnum for the corresponding PDF set with full error information).
References
Footnotes:
1We consider the case of one source
of systematic errors, generalization
on the many sources case is straightforward.
File translated from
TEX
by
TTH,
version 3.06. |
Comments and/or Problems=The LHAPDF home page= Theory = LHAPDF ready MC's = Downloads = online manual = |