Re: [ncdnhc-discuss] [long] The NCDNHC's .org report is numerically inconsistent.

On 2002-08-20 15:18:42 -0400, Milton Mueller wrote:

>Annex 5 (an Excel file) was not formatted for public view, it was
>used purely for calculation purposes. That is why it appears
>confusing. We didn't sort the data by the final scores, which is
>what you complained about in your first message, because we didn't
>think it would show up in the final report.

Well, the table on page 49 is in fact a bit worse than just that:
The names column is sorted by the actual final score. The
responsiveness column is sorted in decreasing order (oops, not
precisely - neustar and fall out of that pattern - ok,
it's just not sorted in any sensible way at all ;). The support and
differentiation columns are once again sorted by the actual final
score. Consequently, the "total" column of that table is just
garbage, since it adds up scores coming from different applications.

The more I think about this, the more this looks like someone did
serious damage by trying to sort this table.

Mh. Looking more closely, it seems like the same numbers also made
it into the table on page 47, but in a different order. This entire
appendix 5 is extremely confusing and should be cleaned up and

>ISOC really is a 21.25, check your arithmetic. I think you made
>the arithmetic mistake this time ;-)

I don't think so (I hope this makes it to you in reasonably readable
form):

    ISOC    3    3     5    5    3    5    2
    weight  2.0  0.25  0.5  1.0  1.0  1.0  0.5

6 + 0.75 + 2.5 + 5 + 3 + 5 + 1 = 23.25

I'm attaching a spreadsheet for you. Column I has your results,
column J has a formula. They _should_ agree...
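
For anyone who would rather not open the spreadsheet, the same check
can be sketched in a few lines of Python. This is only an
illustration of the arithmetic above: the scores and weights are the
ones I quoted, while the function name weighted_total is my own and
does not come from the report.

```python
# ISOC scores and the per-criterion weights, as quoted above.
scores = [3, 3, 5, 5, 3, 5, 2]
weights = [2.0, 0.25, 0.5, 1.0, 1.0, 1.0, 0.5]


def weighted_total(scores, weights):
    """Return the sum of score * weight over all criteria."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(s * w for s, w in zip(scores, weights))


isoc_total = weighted_total(scores, weights)
print(isoc_total)  # 23.25
```

Every term is exactly representable as a binary float, so the
comparison against 23.25 is not affected by rounding.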

>I don't agree with your critique of averaging the
>rankings. The problem is that the different dimensions
>of evaluation - differentiation, public support, and
>responsiveness/governance - are not commensurable.
>In ranking applications on each of the three dimensions,
>we created distinct numerical scales - one for "public
>support" one for differentiation, and one for governance.
>Each of these scales is SOMEWHAT arbitrary, but
>does have internal consistency in measuring the specific
>thing it is measuring.

Agreed, with the exception of "public support". More about that
further below.

>But to then treat all 3 of those scales as if they could be
>measured against each other takes the arbitrariness well past the
>breaking point. We don't really know HOW a score of 21.75 in
>"responsiveness" relates to a score of 84 in "public support." And
>it is, in my opinion, bad practice to "normalize" or combine them
>in any way. So the ONLY useful measure of an overall ranking,
>imho, is to average the rankings themselves, or simply to look at
>the three rankings together.

Looking at the three rankings together - in the way the Gartner
people did this with the technical evaluation - is probably the best
approach: Give the respective scores in a matrix, and color-code the
fields according to the tiers in the individual categories (see
results.xls, attached). That makes for a very nice, graphical
presentation to the board which clearly, and at a glance,
demonstrates who has what strengths and weaknesses.

Of course, you are right that arbitrary normalizations are a
considerable problem. However, you are in that business anyway by
summing up scores for various aspects of governance and
responsiveness in table 2, for instance. Your judgement on the
relative importance of these aspects is encoded in the weights you

Pretending that you don't make that judgement by averaging ranks
just doesn't work: Because, by just averaging ranks, you give the
same weight to relatively unimportant, minor differences (for
instance, the difference between Unity and GNR in the responsiveness
scoring; same tier) as to major differences (like the one between
Unity and on the differentiation criteria scale;
different tiers).

On their respective scales, you even make the difference of
assigning these to different tiers, and in the final evaluation, you
just say that the differences are basically the same thing. That
doesn't make much sense to me.
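
To make the point concrete, here is a small sketch with entirely
hypothetical numbers (the applicants "A", "B", "C" and both score
tables are invented for illustration and are not taken from the
report). A narrow lead on one scale and a very wide lead on the other
cancel out exactly once only the ranks are averaged.

```python
def ranks(scores):
    """Rank applicants by score, 1 = best. Ties are not handled (sketch only)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position + 1 for position, name in enumerate(ordered)}


# Hypothetical scores, NOT from the report.
criterion_x = {"A": 20.0, "B": 19.5, "C": 10.0}  # A barely ahead of B
criterion_y = {"A": 40.0, "B": 90.0, "C": 30.0}  # B far ahead of A

rank_x = ranks(criterion_x)
rank_y = ranks(criterion_y)

# Averaging ranks throws away how large each lead was.
avg_rank = {name: (rank_x[name] + rank_y[name]) / 2 for name in criterion_x}
print(avg_rank)  # {'A': 1.5, 'B': 1.5, 'C': 3.0}
```

A and B end up tied, although A's lead is half a point and B's lead
is fifty points on their respective scales.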

Now, for the "public support" category (see my blog for a much more
- possibly too - polemic version of this): In this category, I have
the strongest reservations. To begin with, the other categories
may include some arbitrary judgement on your side, but there is
certainly a lot of valuable information in this, at the very least
on the "tier" level - in particular since tiers have a reasonable
safety margin between them in most cases.

The only source you can draw from for public support are the
postings to ICANN's .org forum. The problem with this is that it's
a classic, self-selected (or, even worse, orchestrated) survey.
What do these numbers really tell us? The answer is: They only tell
us how good the various proponents were at mobilizing their
respective "fan clubs". We just don't know how representative these
results may be for the population of .org domain name holders.
Further, we can be pretty sure that these are not informed comments
- who does actually have the time to read all that material?

(I notice that you tried to find out about this last point in your
verification e-mail, but I don't seem to be able to find the results
of this undertaking.)

Thus, this is, from the very beginning, the worst and most
insignificant input you have. To make it still worse, you have to
_estimate_ the number of ISOC class B responses on page 23, because
you can't reasonably make the distinction between class As and class
Bs for this application (which may indicate that the distinction was
the wrong approach to solve this particular problem). Also, in the
evaluation in that chapter, you assign the weight of one class A
response to 5 class Bs (you even write that this is arbitrary).
That's the starting point for the scores and values on page 22, and
for the "averaged rating" evaluation.

But for the "score-based" approach, you then use some kind of
(supposed to be) pseudo-logarithmic rating with additional weights,
described on page 43. There is no rationale for this, and it leads
to interesting results. Just look at the assignment of applications
to tiers: In the score-based approach, GNR (like Neustar) has a
score of 3, but it's in the C tier (Neustar is in B), while dotorg
foundation has 1, and is in the B tier. That doesn't make sense.
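
The inconsistency can be checked mechanically. The sketch below uses
only the three score/tier pairs I just cited. It assumes that a higher
score is better and that tier A is better than B, which is better
than C; the helper name tier_inversions is mine.

```python
# (score, tier) in the score-based "public support" rating, as cited above.
entries = {
    "GNR": (3, "C"),
    "Neustar": (3, "B"),
    "dotorg foundation": (1, "B"),
}

# Assumption: A is the best tier, C the worst.
TIER_ORDER = {"A": 0, "B": 1, "C": 2}


def tier_inversions(entries):
    """Return pairs (x, y) where x scores at least as high as y
    but is nevertheless placed in a worse tier."""
    found = []
    for x, (score_x, tier_x) in entries.items():
        for y, (score_y, tier_y) in entries.items():
            if x == y:
                continue
            if score_x >= score_y and TIER_ORDER[tier_x] > TIER_ORDER[tier_y]:
                found.append((x, y))
    return found


print(tier_inversions(entries))
# [('GNR', 'Neustar'), ('GNR', 'dotorg foundation')]
```

A consistent mapping from scores to tiers would yield an empty list
here.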

Now, what does this mess tell us, ultimately? That you have tried
to make sense out of numbers which don't make any. You are trying
to force some meaning onto these numbers which just may not be

My suggestion: Don't even try. Don't try to use these particular
numbers as a basis for anything - except, perhaps, in the case that
_all_ other scores are equal. And, in particular, please don't
intermingle them with the numbers from the evaluation of the other
criteria. Mixing these numbers in _any_ way has a strong smell of
manipulation around it. That smell doesn't make your evaluation
stronger - in fact, it even weakens the stronger points of your
evaluation, responsiveness and differentiation.

Finally, one question which doesn't have anything to do with
numbers: I have looked a bit at your evaluation of the probable
"winner" of the entire process, ISOC. In the "responsiveness"
category, you write:

>ISOC proposes a number of very innovative services designed to
>respond to the needs of noncommercial entities, not just
>registrants generally. ISOC therefore received a High rating in
>this category. Finally, the Committee notes that although it has
>made no commitment to support good works, profits from the
>registry will go to ISOC. On the arguable proposition that
>support for IAB/IETF standards processes constitutes good works we
>awarded ISOC a Low ranking in this category rather than a None.

I'm sorry, but I fail to find these services. I find services which
are generally useful for registrants, and services useful for IP
owners. But none specifically targeted at noncommercial entities.
Maybe you can shed some light on this?

I apologize for the length of this letter.

Kind regards,
Thomas Roessler

Attachment: sc.xls
Description: application/excel

Attachment: results.xls
Description: application/excel