Constraints as factors reducing the entropy of distributions: an entropy-maximizing spatial interaction model as an example

The aim of this paper is to provide empirical evidence for the statement that the constraints imposed on an objective function are able to reduce the entropy of the corresponding distributions produced by entropy-maximizing models. This idea is evaluated via an application to an entropy-maximizing spatial interaction model, as a typical representative of the family of entropy-maximizing models used in geography. Eleven versions of this spatial interaction model are fitted separately to six sets of data concerning interregional migration in Slovakia. For each model, the predicted flow distribution is derived and the corresponding predicted entropy is calculated; the entropy values of all the models are then compared. The results obtained indicate very clearly that constraints imposed on an objective function reduce the initial maximum entropy successively, with this reduction depending on the number and nature of the constraints incorporated.


Introduction
As is well known, the constraints used in the derivation of entropy-maximizing models are regarded as restrictions imposed on an objective function. However, Jaynes (1957, 1979), the author of the maximum entropy principle, also regards them as available information. In that capacity, they are able to reduce the entropy of the distributions produced by entropy-maximizing models. It is this ability that is the main concern of this paper, and specifically the empirical evidence in support of it.
Przegląd Geograficzny, 2017, 89, 4, pp. 517-533

Conceptual background
Since the topic of this paper revolves around three closely related basic concepts, viz. information, entropy and uncertainty, we specify these briefly for the sake of convenience, even though they are rather well known in principle.
If the amount of information contained in an event is $I = -\log_2 p$, where $p$ is the probability that the event occurs, then the mean amount of information (mathematical expectation) contained in a set of mutually exclusive and exhaustive events having the discrete probability distribution $P = \{p_1, p_2, \ldots, p_n\}$ will be
$$H = -\sum_{i=1}^{n} p_i \log_2 p_i . \qquad [1]$$
Note, however, that relation [1] represents not merely a mean amount of information but also Shannon's entropy of the probability distribution, which is in fact known as a measure of uncertainty (Wilson, 1970, p. 7; Yaglom and Yaglom, 1983, pp. 46-47).
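As a minimal illustration of relation [1], the following sketch computes Shannon's entropy of a discrete distribution. It assumes base-2 logarithms, so that entropy is expressed in bits as in the empirical part of the paper; the function name and the two example distributions are hypothetical.

```python
import math


def shannon_entropy_bits(probs):
    """Shannon entropy H = -sum(p_i * log2(p_i)) in bits.

    Terms with p_i == 0 are skipped, following the convention 0 * log 0 = 0.
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical distributions over n = 4 events.
uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.7, 0.1, 0.1, 0.1]

h_uniform = shannon_entropy_bits(uniform)  # equals log2(4) = 2 bits
h_skewed = shannon_entropy_bits(skewed)    # smaller: less uncertainty

print(h_uniform, h_skewed)
```

The uniform distribution attains the largest possible value for four events, while any departure from uniformity lowers the entropy.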
The nature of relation [1] as a measure of uncertainty makes it possible to derive a further concept, viz. information gain. Information gain is removed uncertainty, with the amount of information gained equal to the amount of uncertainty removed, i.e.
$$I_g = H_r = H_i - H_o , \qquad [2]$$
where $H_r$ denotes the amount of removed uncertainty, i.e. the information gain $I_g$, $H_i$ denotes the initial uncertainty and $H_o$ is the observed (actual or remaining) uncertainty.
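It can be shown that maximum uncertainty, as measured in terms of maximum entropy, $H_{max}$, appears when $p_i = 1/n$ for all $i$, where $n$ denotes the number of mutually exclusive and exhaustive events (observations); in this case $H_{max} = \log_2 n$. If $H_i = H_{max}$, then relation [2] assumes the form
$$I_g = H_{max} - H_o . \qquad [3]$$
However, relation [3] represents a special case of the more general Kullback information gain, $I_g^{(k)}$, defined as
$$I_g^{(k)} = \sum_{i=1}^{n} p_i \log_2 \frac{p_i}{q_i} , \qquad [4]$$
where $q_i$ denotes the a priori probability and $p_i$ the a posteriori probability. If $q_i = 1/n$ for all $i$, relation [4] reduces to relation [3].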
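The link between the information gain measured as a difference of entropies and the Kullback information gain can be checked numerically. The sketch below assumes base-2 logarithms and a uniform a priori distribution; the function names and the example distribution are hypothetical.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits, skipping zero probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def kullback_gain_bits(posterior, prior):
    """Kullback information gain sum(p_i * log2(p_i / q_i)) in bits."""
    return sum(p * math.log2(p / q) for p, q in zip(posterior, prior) if p > 0)


# Hypothetical observed (a posteriori) distribution over n = 4 events.
observed = [0.5, 0.25, 0.125, 0.125]
n = len(observed)
uniform_prior = [1.0 / n] * n  # a priori distribution with q_i = 1/n

h_max = math.log2(n)          # maximum entropy, 2 bits
h_o = entropy_bits(observed)  # observed entropy, 1.75 bits

gain_from_entropies = h_max - h_o
gain_from_kullback = kullback_gain_bits(observed, uniform_prior)

print(gain_from_entropies, gain_from_kullback)  # both 0.25 bits
```

With a uniform a priori distribution the two quantities coincide, which is the sense in which the entropy difference is a special case of the Kullback measure.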

Application to entropy-maximizing models
Recall again that constraints, as items of available information, are able to reduce the uncertainty, i.e. the entropy as its measure, of the distributions produced by entropy-maximizing models. To support this view, let us quote Garrett, a physicist particularly concerned with the topic under discussion, who states: "The more constraints one has the smaller is the maximized entropy …" (Garrett, 1991, p. 155), which in other words means that more uncertainty is removed. Similar formulations are also known in geography and regional science. For example, a view identical to Garrett's was expressed some years ago by Batty (2010). For comparison, let us quote his statement: "This order-disorder continuum with respect to H is directly invoked if we consider that as we put more and more constraints on the form of a distribution we successively reduce the entropy" (Batty, 2010, pp. 397-398). An interesting assertion in line with the aim of our paper can also be found in the article by Roy and Thill (2004). These authors argue that, if maximization of entropy "is … constrained by inducing the model flows to conform to certain aggregate base period quantities, [then] such constraints … not only reduce the 'entropy' or 'uncertainty' of the final solution, but usually contribute to obtaining a … solution closer to that observed at the base period" (Roy and Thill, 2004, p. 345).
The basic objective of this paper is to provide empirical evidence for the statement that the gradual imposition of constraints on an objective function, i.e. the addition of ever more constraints, really does lead to a gradual reduction in the entropy of the corresponding distribution. Note also that this procedure ultimately allows the derivation of a model producing a distribution that is near to (ideally, as near as possible to) the one observed.

A spatial interaction model as an example
The objective stated above has been pursued by reference to an entropy-maximizing spatial interaction model, as a typical representative of the family of entropy-maximizing models used in geography. Recall that this kind of model was designed to predict or replicate the size of the interaction between a set of areal units. In the derivation of the spatial interaction model, either Shannon's entropy or the equivalent Boltzmann's entropy is taken as an objective function that is maximized subject to certain constraints. We stress again that these constraints play a fundamental role in the entropy-maximization procedure. They represent whatever convenient information is available to the spatial interaction modeller. The more information that is available, the greater will be the number and complexity of the constraints. Consequently, the optimal flow distribution in a particular case depends strongly on the number and nature of the constraints incorporated.
As is well known, different versions of the spatial interaction model have been derived with respect to different constraints. The set of model versions selected in this paper includes not only those considered standard and put to widespread use in interaction modelling, but also certain very simple versions used only rarely, if at all. Nevertheless, a reduction in the entropy of the corresponding distributions also arises with these models, and they are therefore included for the sake of completeness. These simple models will be referred to as elementary spatial interaction models.
We saw no purpose in writing down the complete derivation of all the models considered in this paper. The derivation is generally known, and is contained in many studies, including the original works by Wilson (1967, 1970 and 1974). We therefore decided to derive only the elementary interaction models, whose derivation is usually omitted. This work is offered in condensed form in the Appendix.
This paper examines eleven spatial interaction models, denoted by the Roman numerals I to XI. Model equations (including ones relating to balancing factors) and the constraints used in their derivation are presented in Table 1. Models I to IV are the elementary spatial interaction models derived in the Appendix. Models V and VI are two singly constrained spatial interaction models with the exponential distance function. Model V is the production-constrained model, whereas model VI is the attraction-constrained model. Exactly the same principle of model specification is applied to derive the two singly constrained spatial interaction models with the power distance function, denoted by the numerals VII and VIII. Model VII is related to model V and model VIII to model VI. Finally, three doubly constrained spatial interaction models are also considered. Model IX is the one with the exponential distance function, while the two other models are produced by replacing the exponential distance function with the power distance function (model X) or Tanner's distance function (model XI).
Table 1. Spatial interaction models used in the analysis
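The effect of the balancing constraints alone can be sketched for the elementary models I to IV. The closed forms used below are assumptions: they are the maximum-entropy solutions that follow from constraining only the total $T$ (model I), the outmigration totals $O_i$ (model II), the inmigration totals $D_j$ (model III), or both sets of totals (model IV), and they are not copied from Table 1. The 3 x 3 flow matrix is hypothetical.

```python
import math


def entropy_bits(flows):
    """Entropy in bits of the distribution P_ij = T_ij / T of a flow matrix."""
    total = sum(sum(row) for row in flows)
    return -sum((t / total) * math.log2(t / total)
                for row in flows for t in row if t > 0)


# Hypothetical migration matrix (rows: origins, columns: destinations).
observed = [[50, 30, 20],
            [10, 80, 10],
            [40, 20, 140]]
m, n = len(observed), len(observed[0])
O = [sum(row) for row in observed]                              # outmigration totals
D = [sum(observed[i][j] for i in range(m)) for j in range(n)]   # inmigration totals
T = sum(O)                                                      # total migrations

# Assumed maximum-entropy solutions for the elementary models.
models = {
    "I": [[T / (m * n) for j in range(n)] for i in range(m)],       # only T known
    "II": [[O[i] / n for j in range(n)] for i in range(m)],         # O_i known
    "III": [[D[j] / m for j in range(n)] for i in range(m)],        # D_j known
    "IV": [[O[i] * D[j] / T for j in range(n)] for i in range(m)],  # O_i and D_j known
    "OBS": observed,
}

H = {name: entropy_bits(flows) for name, flows in models.items()}
for name, value in H.items():
    print(f"{name:>3}: {value:.4f} bits")
```

On this toy matrix the entropy falls from model I through models II and III to model IV, and the observed distribution lies below all of them, which mirrors the comparison carried out in the paper on real migration data.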

Empirical evaluation
This section presents empirical evidence as to this paper's basic statement that the constraints imposed on an objective function are able to reduce the entropy of the corresponding distributions produced by the entropy-maximizing models.
The theoretical arguments presented in the preceding section were evaluated through application to interregional migration flows in Slovakia. The evaluation procedure was relatively straightforward and followed the above theoretical discussion. The spatial interaction models I to XI presented in Table 1 were fitted separately to each of the six sets of migration data. For each model the predicted $\{T_{ij}\}$ distribution was first derived, before the $T_{ij}$ values were converted to $P_{ij}$ values and the corresponding predicted entropy, $H_p$ (in bits), was calculated. The entropy of the observed distribution, $H_o$, expressed by the observed migration flows, was also computed. Finally, the entropy values corresponding to all the models were compared. The entire procedure was repeated for each of the six data-sets.
Notation in a migration context: $m$ is the number of origin areas; $n$ is the number of destination areas; $T_{ij}$ is the number of migrations from origin area $i$ to destination area $j$; $O_i$ is the known total outmigration from area $i$; $D_j$ is the known total inmigration into area $j$; $T$ is the known total number of migrations in the system; $c_{ij}$ is the distance between areas $i$ and $j$; $C$ (or $C'$) is the observed total migration length; the distance decay parameters are estimated in the model calibration; $A_i$ and $B_j$ are the balancing factors ensuring that the constraints on total outmigration and/or total inmigration are satisfied. All tables and figures in the paper are the authors' own elaborations.
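The role of the balancing factors can be sketched for a doubly constrained model with the exponential distance function. The functional form $T_{ij} = A_i O_i B_j D_j \exp(-\beta c_{ij})$ is the standard one from the literature and is assumed here rather than copied from Table 1; the symbol `beta`, its value and all the data are hypothetical.

```python
import math


def doubly_constrained(O, D, cost, beta, iterations=200):
    """Flows T_ij = A_i O_i B_j D_j exp(-beta c_ij).

    A_i and B_j are obtained by alternately rescaling so that row sums
    approach O_i and column sums approach D_j (iterative balancing).
    """
    m, n = len(O), len(D)
    f = [[math.exp(-beta * cost[i][j]) for j in range(n)] for i in range(m)]
    A = [1.0] * m
    B = [1.0] * n
    for _ in range(iterations):
        A = [1.0 / sum(B[j] * D[j] * f[i][j] for j in range(n)) for i in range(m)]
        B = [1.0 / sum(A[i] * O[i] * f[i][j] for i in range(m)) for j in range(n)]
    return [[A[i] * O[i] * B[j] * D[j] * f[i][j] for j in range(n)]
            for i in range(m)]


# Hypothetical totals and inter-area distances for three areas.
O = [100.0, 100.0, 200.0]
D = [100.0, 130.0, 170.0]
cost = [[1.0, 4.0, 6.0],
        [4.0, 1.0, 3.0],
        [6.0, 3.0, 1.0]]

T_pred = doubly_constrained(O, D, cost, beta=0.3)
row_sums = [sum(row) for row in T_pred]
col_sums = [sum(T_pred[i][j] for i in range(3)) for j in range(3)]
print(row_sums, col_sums)
```

After the iteration the predicted row and column totals reproduce the known outmigration and inmigration totals, which is exactly what the balancing factors are meant to ensure. The fixed number of iterations is a simplification; a convergence test on the change in $A_i$ and $B_j$ would be the more careful choice.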

Migration data
When a spatial interaction model is fitted, the basic data requirements are for a matrix containing flows between a set of origins and destinations in some specific period, and (in the case of models V to XI) a corresponding matrix of inter-area distances. However, in the present context, certain qualifications must be applied to the observed interaction matrix, if computation of the entropy measure is to be made possible. In the first place, all entries of the interaction matrix must be known. This is to say that flows within regions must be known and taken into consideration. Finally, the interaction matrix must comprise non-zero values only. Clearly, the practical problems of collecting accurate interaction data can be severe in this context. In this paper, we use data on internal migration in Slovakia as the only kind of spatial interaction data that are simultaneously suitable and available.
The migration data used in our analysis were obtained from the records of the current registration of population in Slovakia. They relate to changes of permanent residence between two communes (the smallest units of local administration) in the country. It follows from the definition that data on internal migration in Slovakia are counts of moves rather than of transitions. Data on migration before 2000 were taken from the basic source Pohyb obyvateľstva (Population Change), published annually by the Statistical Office of the Slovak Republic or the Federal Statistical Office of former Czechoslovakia, while all information concerning migration flows after 2000 was extracted from unpublished tabulations provided by the Statistical Office of the Slovak Republic. The annual migration data were consolidated into five-year sets to avoid the problems of sparse matrices which arise with data for single-year migration flows.
As already noted above, the evaluation procedure was applied to the six migration data-sets labelled A, B, C, D, E, and F, each with a different number of origin and destination regions. The data-sets A, B, and C relate to migration flows between functional urban regions (FURs) in Slovakia during the 2001-05 period. The FURs were derived by Bezák (2000a) using the journey-to-work data from the 1991 Census of Population and Housing. They were defined as spatially contiguous groups of communes which are cohesive internally and (relatively) self-contained in terms of daily commuting. To avoid the problem of sparse interaction matrices, it was decided to modify the original set of fifty-one functional urban regions as follows.
Data-set A relates to migration flows between the forty-four FURs modified by allocating each of seven original FURs with zero inflows and/or outflows to the adjacent FUR to which it sends the largest commuting flow. Data-set B includes only data on migration between the twenty-five largest FURs with population exceeding 80 thousand in 2001. To produce data-set C, the fifty-one original FURs were aggregated into twelve functional macro-regions, using the INTRAMAX regionalization procedure (Masser and Scheurwater, 1978) and interregional commuting data from the 2001 Population Census. In all three cases, distances between and within regions were measured as mean straight-line distances between population-weighted centroids of their constituent communes, using the procedure described by Bell et al. (2002).
Data-set D contains the most recent data on migration between FURs during the 2011-15 period. However, to avoid zero values, only moves between twenty randomly selected FURs with non-zero inflows and outflows were used. Distances between and within regions were again determined by applying the procedure referred to above.
Finally, data-sets E and F refer to migration flows between former administrative districts that were defined in Slovakia prior to the 1996 administrative reform. Data-set E includes migration data for the 1981-85 period, while data-set F relates to those for the 1991-95 period. In order to obtain comparable regional units, each of the two urban districts of Bratislava and Košice was amalgamated with its surrounding rural district into one metropolitan area, with the effect that the number of districts used in the analysis was reduced from thirty-eight to thirty-six. Distances between districts were measured as the road distances between district centres, while the average distances within districts were estimated as half of the radius of a circle of equal area, following the approach suggested by Rogerson (1990).
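The within-district distance rule stated above amounts to a one-line formula, $d_{ii} = \tfrac{1}{2}\sqrt{a_i/\pi}$, where $a_i$ is the area of district $i$. The short sketch below only restates that rule; the symbol $a_i$, the function name, the unit (square kilometres) and the example area are assumptions made for illustration.

```python
import math


def intra_district_distance(area_km2):
    """Half the radius of a circle whose area equals the district's area."""
    radius = math.sqrt(area_km2 / math.pi)
    return 0.5 * radius


# Hypothetical district whose equal-area circle has a radius of 10 km.
d = intra_district_distance(100.0 * math.pi)
print(d)  # about 5 km
```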
To complete the discussion, it should be noted that, in all six cases, migrations within regions (or districts) were defined as moves between their constituent communes.

Calibration and results
Spatial interaction models V to XI were calibrated using a maximum likelihood method based on the Newton-Raphson iterative routine for solving systems of non-linear equations (Batty and Mackie, 1972). The computer program used in this exercise was a modified version of one described in Stillwell (1983). The original program was updated, extended and rewritten in the C++ programming language by the second author. The entropy values, $H_p$ (in bits), obtained by fitting the spatial interaction models to the six sets of migration data are presented in Table 2.
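A one-parameter version of such a calibration can be sketched as follows. This is not the authors' C++ program: it assumes a production-constrained model of the form $T_{ij} = A_i O_i \exp(-\beta c_{ij})$ with $A_i = 1/\sum_j \exp(-\beta c_{ij})$, and it assumes that the maximum likelihood condition reduces to matching the observed total migration length $C$, so that Newton-Raphson is applied to a single equation in `beta`. All names and data are hypothetical, and the "observed" $C$ is generated synthetically from a known parameter value.

```python
import math


def predicted_cost_and_slope(O, cost, beta):
    """Total predicted migration length C(beta) and its derivative dC/dbeta.

    For each origin the destination shares are proportional to
    exp(-beta * c_ij), so dC/dbeta = -sum_i O_i * Var_i(c).
    """
    total, slope = 0.0, 0.0
    for o_i, row in zip(O, cost):
        w = [math.exp(-beta * c) for c in row]
        z = sum(w)
        mean = sum(wi * c for wi, c in zip(w, row)) / z
        mean_sq = sum(wi * c * c for wi, c in zip(w, row)) / z
        total += o_i * mean
        slope -= o_i * (mean_sq - mean * mean)
    return total, slope


def calibrate_beta(O, cost, observed_cost, beta=0.0, tol=1e-10, max_iter=50):
    """Newton-Raphson iteration on C(beta) - observed_cost = 0."""
    for _ in range(max_iter):
        c_pred, slope = predicted_cost_and_slope(O, cost, beta)
        step = (c_pred - observed_cost) / slope
        beta -= step
        if abs(step) < tol:
            return beta
    raise RuntimeError("Newton-Raphson did not converge")


# Hypothetical outmigration totals and distances for three areas.
O = [100.0, 200.0, 150.0]
cost = [[1.0, 4.0, 6.0],
        [4.0, 1.0, 3.0],
        [6.0, 3.0, 1.0]]

# Synthetic "observed" total migration length generated with beta = 0.5.
target_cost, _ = predicted_cost_and_slope(O, cost, 0.5)
beta_hat = calibrate_beta(O, cost, target_cost)
print(beta_hat)
```

Because the synthetic target was generated by the model itself, the iteration should recover the generating parameter. With real data, and with models that have several parameters or balancing factors on both sides, the same idea extends to a system of equations, and a safeguarded step would be advisable.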
Two points need stressing when the entropy values contained in Table 2 are interpreted. First, the order of the models in Table 2 is consistent with their sequence in Table 1 and, as a result, does not correspond exactly with the reduction in entropy values. Second, it needs to be emphasised that not every pair of consecutive models is comparable with respect to entropy reduction. For example, model IX is not comparable with model VIII on account of a different distance decay function resulting from a different migration-length constraint (see Table 1). In spite of this, Table 2 retains some value, as it allows us to compare the entropy values corresponding with a particular model across different data-sets.
To facilitate correct comparison of entropy values among models, the values for individual data-sets are shown separately in Figs. 1-3. In these figures, each model is represented by a small rectangle, which is marked with the associated Roman numeral and contains the corresponding entropy value. The elementary spatial interaction model I, with a single balancing constraint on total migration, is placed at the top of the figure and the rectangle related to the observed distribution (labelled OBS) is located at the bottom of the figure. Other models can be found at several intervening levels, each of them referring to a different category of spatial interaction model. Note that models placed at the same level are distinguished only by the nature (or complexity) of the constraints imposed on an objective function. On the other hand, models placed at different levels are also distinguished by the number (or amount) of constraints incorporated.
For example, elementary spatial interaction models II and III, each with one balancing constraint on total outmigration or total inmigration, are placed at the second level. However, elementary spatial interaction model IV, derived by using both constraints, is located at the third level. The fourth level includes four singly constrained spatial interaction models, also with balancing constraints on total outmigration or total inmigration. In addition, a migration-length constraint is imposed on the migrations in these models. It needs to be emphasised, however, that the modification of the migration-length constraint leads in this case to two versions of the spatial interaction models with different distance functions. It can be supposed, therefore, that the initial maximum entropy, $H_{max}$, corresponding with elementary spatial interaction model I, is successively reduced from the top to the bottom of the figure along the lines connecting the individual models. Moreover, it is clear that only the models connected by these lines are comparable with respect to entropy reduction.
The main results derived from these figures can be summarised as follows. In the first place, it is evident that constraints imposed on an objective function reduce the initial maximum entropy. As expected, the reduction depends on the number and nature of the constraints incorporated. All figures show clearly that the spatial interaction models with a distance decay function contribute far more to entropy reduction than do elementary interaction models I to IV. In turn, the doubly constrained models IX to XI, with the complete system of constraints, prove more successful in reducing the initial entropy than all comparable versions of the singly constrained model. Another interesting point is that the reductions in initial entropy achieved by attraction-constrained models VI and VIII are greater than those generated by the corresponding production-constrained models V and VII.
As regards the form of the distance decay function used, it is evident that the negative power function in every case provides for a greater reduction of the initial entropy than does the negative exponential function. This result is consistent with the broad consensus that the power function is more appropriate for analysing longer-distance interactions such as migration flows (cf. Fotheringham and O'Kelly, 1989, pp. 12-13), as also confirmed by the study on interregional migration in Slovakia (Bezák, 2000b). Furthermore, it is clear that the best approximation of the predicted entropy, $H_p$, to the entropy of the observed distribution, $H_o$, has been achieved in the case of doubly constrained model XI with Tanner's distance decay function. We can conclude, therefore, that from among the set of models considered, model XI is the most appropriate spatial interaction model according to the methodology discussed in this paper.

Conclusion
The principal conclusion to be drawn from this study is that the constraints imposed on an objective function are able to reduce the entropy of the corresponding distributions produced by the entropy-maximizing spatial interaction models.It should be emphasised that this reduction is gradual, and depends on the number and nature of the constraints imposed.In addition, we have arrived at certain conclusions concerning the kinds of constraints most successful in reducing initial maximum entropy.Finally, we have demonstrated that the doubly constrained spatial interaction model with Tanner's distance function can be regarded as the best model from among the set considered, as far as the suggested methodology is concerned.
We can thus conclude that the basic statement of this paper is now supported by some empirical evidence.However, we are far from any generalization based on these conclusions, because our examination was based on only a limited number of cases.For this reason, further examinations in the same direction would be welcome.

Fig. 1. Entropy values in bits for data-sets A and B

Fig. 3. Entropy values in bits for data-sets E and F

Table 2. Entropy values in bits for spatial interaction models fitted to the six sets of migration data
Note: The label OBS denotes the observed distribution of migration flows.