Re: [ncdnhc-discuss] [long] The NCDNHC's .org report is numerically inconsistent.

To: Milton Mueller <Mueller@syr.edu>
Subject: Re: [ncdnhc-discuss] [long] The NCDNHC's .org report is numerically inconsistent.
From: Thomas Roessler <roessler@does-not-exist.org>
Date: Tue, 20 Aug 2002 22:48:28 +0200
Cc: discuss@icann-ncc.org, lynn@icann.org
In-reply-to: <sd625dec.084@gwia201.syr.edu>
References: <sd625dec.084@gwia201.syr.edu>
On 2002-08-20 15:18:42 -0400, Milton Mueller wrote:



>Annex 5 (an Excel file) was not formatted for public view, it was
>used purely for calculation purposes. That is why it appears
>confusing. We didn't sort the data by the final scores, which is
>what you complained about in your first message, because we didn't
>think it would show up in the final report.

Well, the table on page 49 is in fact a bit worse than just that:
The names column is sorted by the actual final score.  The
responsiveness column is sorted in decreasing order (oops, not
precisely - Neustar and register.org fall out of that pattern - ok,
it's just not sorted in any sensible way at all ;-).  The support and
differentiation columns are once again sorted by the actual final
score.  Consequently, the "total" column of that table is just
garbage, since it adds up scores that come from different
applications.

The more I think about this, the more this looks like someone did
serious damage by trying to sort this table.
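
To illustrate the failure mode, here is a minimal Python sketch.  The
applicants and numbers in it are made up and are NOT taken from the
report; it only shows what happens to a "total" column once a single
column has been sorted on its own.

```python
# Hypothetical rows, NOT the report's data:
# (applicant, responsiveness, support, differentiation)
rows = [
    ("A", 10.0, 3.0, 5.0),
    ("B", 20.0, 1.0, 2.0),
    ("C", 15.0, 2.0, 9.0),
]


def row_totals(rows):
    """Sum each applicant's own scores (the intended calculation)."""
    return {name: resp + sup + diff for name, resp, sup, diff in rows}


def totals_after_column_sort(rows):
    """Sort only the responsiveness column in decreasing order, leave the
    other columns where they are, then sum across each row."""
    sorted_resp = sorted((r[1] for r in rows), reverse=True)
    return {
        name: resp + sup + diff
        for (name, _, sup, diff), resp in zip(rows, sorted_resp)
    }


correct = row_totals(rows)
scrambled = totals_after_column_sort(rows)
print(correct)
print(scrambled)
```

The grand total over all rows is unchanged by the sort, which is
probably why this kind of damage is easy to overlook.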



Hm.  Looking more closely, it seems that the same numbers also made
it into the table on page 47, but in a different order.  This entire
Annex 5 is extremely confusing and should be cleaned up and
re-published.

>ISOC really is a 21.25, check your arithmetic. I think you made
>the arithmetic mistake this time ;-)

I don't think so (I hope this makes it to you in reasonably readable
form):

ISOC       3       3       5       5       3       5       2
weight     2.0     0.25    0.5     1.0     1.0     1.0     0.5

           6     + 0.75  + 2.5   + 5     + 3     + 5     + 1 = 23.25

I'm attaching a spreadsheet for you.  Column I has your results,
column J has a formula.  They _should_ agree...
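
In case the spreadsheet doesn't open for you, the same check can be
sketched in a few lines of Python.  The scores and weights are the
ones quoted above; the function and variable names are my own.

```python
# ISOC's scores and the weights, as quoted in the table above.
isoc_scores = [3, 3, 5, 5, 3, 5, 2]
weights = [2.0, 0.25, 0.5, 1.0, 1.0, 1.0, 0.5]


def weighted_total(scores, weights):
    """Return the sum of score * weight over all criteria."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(s * w for s, w in zip(scores, weights))


isoc_total = weighted_total(isoc_scores, weights)
print(isoc_total)  # 23.25
```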



>I don't agree with your critique of averaging the
>rankings. The problem is that the different dimensions
>of evaluation - differentiation, public support, and
>responsiveness/governance - are not commensurable.
>In ranking applications on each of the three dimensions,
>we created distinct numerical scales - one for "public
>support" one for differentiation, and one for governance.
>Each of these scales is SOMEWHAT arbitrary, but
>does have internal consistency in measuring the specific
>thing it is measuring.

Agreed, with the exception of "public support".  More about that
further below.

>But to then treat all 3 of those scales as if they could be
>measured against each other takes the arbitrariness well past the
>breaking point. We don't really know HOW a score of 21.75 in
>"responsiveness" relates to a score of 84 in "public support." And
>it is, in my opinion, bad practice to "normalize" or combine them
>in any way. So the ONLY useful measure of an overall ranking,
>imho, is to average the rankings themselves, or simply to look at
>the three rankings together.

Looking at the three rankings together - in the way the Gartner
people did this with the technical evaluation - is probably the best
approach: Give the respective scores in a matrix, and color-code the
fields according to the tiers in the individual categories (see
results.xls, attached).  That makes for a very nice, graphical
presentation to the board which clearly, and at a glance,
demonstrates who has what strengths and weaknesses.

Of course, you are right that arbitrary normalizations are a
considerable problem.  However, you are in that business anyway by
summing up scores for various aspects of governance and
responsiveness in table 2, for instance.  Your judgement on the
relative importance of these aspects is encoded in the weights you
apply.

Pretending that you avoid that judgement by averaging ranks just
doesn't work: by averaging ranks, you give the same weight to
relatively unimportant, minor differences (for instance, the
difference between Unity and GNR in the responsiveness scoring; same
tier) as to major differences (like the one between Unity and
register.org on the differentiation criteria scale; different
tiers).

On their respective scales, you even make this distinction yourself
by putting applications into the same tier or into different ones,
and in the final evaluation, you just say that the differences are
basically the same thing.  That doesn't make much sense to me.
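
A small Python sketch of the effect, under loudly stated assumptions:
the two applicants "X" and "Y" and their scores are invented for
illustration and are NOT the report's numbers, and ties are not
handled.

```python
def ranks(scores):
    """Map each applicant to its rank (1 = highest score).
    Simplification: ties are not treated specially."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position + 1 for position, name in enumerate(ordered)}


# Hypothetical scores, NOT the report's data.
scale_1 = {"X": 20.0, "Y": 19.5}  # near tie: X barely ahead
scale_2 = {"X": 5.0, "Y": 25.0}   # large gap: Y far ahead

ranks_1 = ranks(scale_1)
ranks_2 = ranks(scale_2)

# Averaging the ranks throws away the size of each gap.
average_rank = {
    name: (ranks_1[name] + ranks_2[name]) / 2 for name in scale_1
}
print(average_rank)
```

Both applicants end up with the same average rank, although one of
the two differences is marginal and the other one is not.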







Now, for the "public support" category (see my blog for a much more -
possibly too - polemical version of this): This is the category
where I have the strongest reservations.  To begin with, the other
categories may include some arbitrary judgement on your side, but
there is certainly a lot of valuable information in them, at the
very least on the "tier" level - in particular since tiers have a
reasonable safety margin between them in most cases.

The only source you can draw on for public support is the set of
postings to ICANN's .org forum.  The problem with this is that it's
a classic, self-selected (or, even worse, orchestrated) survey.
What do these numbers really tell us?  The answer is: They only tell
us how good the various proponents were at mobilizing their
respective "fan clubs".  We just don't know how representative these
results may be for the population of .org domain name holders.
Further, we can be pretty sure that these are not informed comments -
who actually has the time to read all that material?

(I notice that you tried to find out about this last point in your
verification e-mail, but I don't seem to be able to find the results
of this undertaking.)

Thus, this is, from the very beginning, the worst and least
significant input you have.  To make it still worse, you have to
_estimate_ the number of ISOC class B responses on page 23, because
you can't reasonably make the distinction between class As and class
Bs for this application (which may indicate that the distinction was
the wrong approach to solve this particular problem).  Also, in the
evaluation in that chapter, you give 5 class B responses the weight
of one class A response (you even write that this is arbitrary).
That's the starting point for the scores and values on page 22, and
for the "averaged rating" evaluation.
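
To show why an arbitrary conversion factor matters, here is a sketch
in Python.  Only the 5:1 ratio comes from the report; the applicants
"P" and "Q", their response counts, and the alternative ratio of 10
are assumptions I made up for illustration.

```python
def support_score(class_a, class_b, b_per_a):
    """Weighted count in which b_per_a class B responses
    equal one class A response."""
    return class_a + class_b / b_per_a


# Hypothetical response counts, NOT the report's data:
# applicant -> (class A responses, class B responses)
counts = {"P": (10, 100), "Q": (25, 20)}


def ranking(b_per_a):
    """Applicants ordered from highest to lowest support score."""
    scores = {
        name: support_score(a, b, b_per_a)
        for name, (a, b) in counts.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


print(ranking(5))   # the report's ratio
print(ranking(10))  # an equally arbitrary alternative
```

With these invented counts, the order of the two applicants flips
when the ratio changes from 5 to 10.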



But for the "score-based" approach, you then use some kind of
(supposedly) pseudo-logarithmic rating with additional weights,
described on page 43.  There is no rationale for this, and it leads
to interesting results.  Just look at the assignment of applications
to tiers: In the score-based approach, GNR (like Neustar) has a
score of 3, but it's in the C tier (Neustar is in B), while dotorg
foundation has 1, and is in the B tier.  That doesn't make sense.


Now, what does this mess tell us, ultimately?  That you have tried
to make sense out of numbers which don't make any.  You are trying
to force some meaning onto these numbers which just may not be
there.

My suggestion: Don't even try.  Don't try to use these particular
numbers as a basis for anything - except, perhaps, in the case that
_all_ other scores are equal.  And, in particular, please don't
intermingle them with the numbers from the evaluation of the other
criteria.  Mixing these numbers in _any_ way has a strong smell of
manipulation around it.  That smell doesn't make your evaluation
stronger - in fact, it even weakens the stronger points of your
evaluation, responsiveness and differentiation.



Finally, one question which doesn't have anything to do with
numbers: I have looked a bit at your evaluation of the probable
"winner" of the entire process, ISOC.  In the "responsiveness"
category, you write:

>ISOC proposes a number of very innovative services designed to
>respond to the needs of noncommercial entities, not just
>registrants generally. ISOC therefore received a High rating in
>this category.  Finally, the Committee notes that although it has
>made no commitment to support good works, profits from the
>registry will go to ISOC.  On the arguable proposition that
>support for IAB/IETF standards processes constitutes good works we
>awarded ISOC a Low ranking in this category rather than a None.

I'm sorry, but I fail to find these services.  I find services which
are generally useful for registrants, and services useful for IP
owners.  But none specifically targeted at noncommercial entities.
Maybe you can shed some light on this?


I apologize for the length of this letter.



Kind regards,

--

Thomas Roessler                        <roessler@does-not-exist.org>
Attachment: sc.xls
Description: application/excel
Attachment: results.xls
Description: application/excel