[foaf-protocols] survey: encoding of integers in foaf+ssl ontology

Story Henry henry.story at bblfish.net
Sat Dec 19 20:24:46 CET 2009


On 19 Dec 2009, at 16:59, Bruno Harbulot wrote:

> Hi Henry,
> 
> 
> Before going through a vote, I think we should state what the purpose of this choice is: there are both practical and theoretical matters to consider.

I would like to get some idea on where we are with this debate. There are comment boxes at the bottom of the vote to give opinions btw.

> I'll try to clarify some of the points I was trying to make in [1], my main concerns here being the consequences on the implementations and the possibility to broaden the approach to a more general model for certificates in RDF. (I also question why using hex notation at all to some extent.)

Yes your  points are very good, and the mistake on which they are building is subtle, so it is good that we are having this discussion now. I give my detailed response below. Please take time to read this carefully.

To summarise my response to your deeper point: The main reason you give of modelling X509 certificates does not in fact get us to a more general model, but rather brings us to a specific implementation of a format that uses an out of date ASN.1 format. To remain general, we should model the invariants across formats, namely the public/private crypt maths.

> In theory, -0x03 is fine, in practice I agree, but this notation is of no use for what we do. It's not because wikipedia mentions it that we should use it.

Or the fact that you can write -0x03 in Java? Hexadecimal notation is a reasonable notation, and for all accounts and purposes there is no difference between hexadecimal and complement 2 notation when one is limited to positive numbers. Well except this one: 
 - that the initial 00 might make a difference.
 - that complement 2 is even more difficult to understand than hex for most people

> In theory, the base in which we write the modulus doesn't matter. In practice, the hexadecimal notation is convenient:
> 
> 1. Tools such as Firefox and OpenSSL show this number in hexadecimal notation, so it's easier to debug when we (humans) look at something like <https://foaf.me/simpleLogin.php> to see if it matches what's in our certificate.

yes, if I look in firefox my public key modulus is:

e1 dc d5 e1 00 8f 21 5e d5 cc 7c 7e c4 9c ad 86 
64 aa dc 29 f2 8d d9 56 7f 31 b6 bd 1b fd b8 ee 
51 0d 3c 84 59 a2 45 d2 13 59 2a 14 82 1a 0f 6e 
d3 d1 4a 2d a9 4c 7e db 90 07 fc f1 8d a3 8e 38 
25 21 0a 32 c1 95 31 3c ba 56 cc 17 45 87 e1 eb 
fd 9f 0f 82 16 67 9f 67 fa 91 e4 0d 55 4e 52 c0 
66 64 2f fe 98 8f ae f8 96 21 5e ea 38 9e 5c 4f 
27 e2 48 ca ca f2 90 23 ad 99 4b cc 38 32 6d bf 

As you know this is in fact in complement 2, a negative number. So the firefox representation itself makes a 'mistake' at this point. But Firefox is not the only one. I think OpenSSO also removes the initial two bytes, at least on some platforms.

So to make this clear I wrote an little Java Program that would show this.


--------------8<-------------------------8<--------------------------

hjs at bblfish-2:0$ cat HexEncoding.java 

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.math.BigInteger;
import java.util.Arrays;

/*
 * Licenced under BSD licence, by Sun Microsystems.
 *
 * @author Henry Story, http://bblfish.net/
 */
public class HexEncoding {

   public static void main(String[] args) throws IOException {
      String line;
      BufferedReader win = new BufferedReader(new InputStreamReader(System.in));
      StringBuffer result = new StringBuffer();
      do {
         line = win.readLine();
         line = line.trim();
         result.append(line);
      } while (!"".equals(line));
      
      String inputStr = cleanHex(result.toString()); //remove newlines, spaces, etc... (but not newlines)
      
      System.out.println("the value of the above interpreted as a hex is:");
      System.out.println("(as an integer base 10)");
      BigInteger hexadec = new BigInteger(inputStr, 16);
      System.out.println(hexadec);

      System.out.println();
      System.out.println("the value of the above interpreted as a hex in complement 2 notation:");
      System.out.println("(as an integer base 10)");
      BigInteger hexaCompl2 = new BigInteger(asBytes(inputStr));
      System.out.println(hexaCompl2);
      
   }
   
   private static byte[] asBytes(String ofCleanHex) {
      int l =ofCleanHex.length();
      if (l%2!=0) { //make sure the string is length modulo 2 (ie a pair number of chars
          ofCleanHex = "0"+ofCleanHex;
      }  
      byte[] result = new byte[ofCleanHex.length()/2]; //two charaters form 1 byte      
      for(int i = 0, chari=0; i < result.length; i++, chari++) {
         byte b1 = Byte.parseByte(ofCleanHex.substring(chari,chari+1),16);
         byte b2 = Byte.parseByte(ofCleanHex.substring(++chari,chari+1),16);
         int r = b1 << 4;
         r = r | b2;
         result[i]= (byte)r;
      }
      return result;
   }

   private static String cleanHex(String strval) {
      StringBuffer cleanval = new StringBuffer();
      for (char c : strval.toCharArray()) {
         if (Arrays.binarySearch(hexchars, c) >= 0) {
            cleanval.append(c);
         }
      }
      return cleanval.toString();
   }

   static final private char[] hexchars = {'0', '1', '2', '3', '4', '5', '6',
       '7', '8', '9', 'A', 'a', 'B', 'b', 'C', 'c',  'D', 'd', 'E', 'e', 'F', 'f'};

   static {
      Arrays.sort(hexchars);
   }
}

hjs at bblfish-2:0$ javac HexEncoding.java

#here I entered the key as copied and pasted from Firefox
#notice how in complement 2 notation the number is negative
hjs at bblfish-2:0$ java HexEncoding
e1 dc d5 e1 00 8f 21 5e d5 cc 7c 7e c4 9c ad 86 
64 aa dc 29 f2 8d d9 56 7f 31 b6 bd 1b fd b8 ee 
51 0d 3c 84 59 a2 45 d2 13 59 2a 14 82 1a 0f 6e 
d3 d1 4a 2d a9 4c 7e db 90 07 fc f1 8d a3 8e 38 
25 21 0a 32 c1 95 31 3c ba 56 cc 17 45 87 e1 eb 
fd 9f 0f 82 16 67 9f 67 fa 91 e4 0d 55 4e 52 c0 
66 64 2f fe 98 8f ae f8 96 21 5e ea 38 9e 5c 4f 
27 e2 48 ca ca f2 90 23 ad 99 4b cc 38 32 6d bf

the value of the above interpreted as a hex is:
(as an integer base 10)
158606138559806377335550770106317005265577130125767819706255482868083267042258351998404052695475901152993815638688779448776648749241932566498859083238638256607175345248416308016576869882960807615893115291173813835869453079742387444699799905789529937781098194129830197452322509164212689523238793832514658135487

the value of the above interpreted as a hex in complement 2 notation:
(as an integer base 10)
-21163174926425213437379748972585468096220567768462837567174598289649408763242611134304424626931634868126298241182613908882141019572484055993988347400835867770592548176449177259725349718285286503559967660911191932968697602600075436774113204751297299382252316554756100787624736774267026781596562497109566001729


#here I entered the key as copied and pasted from Firefox with a couple of extra 00
hjs at bblfish-2:0$ java HexEncoding
00 e1 dc d5 e1 00 8f 21 5e d5 cc 7c 7e c4 9c ad 86 
64 aa dc 29 f2 8d d9 56 7f 31 b6 bd 1b fd b8 ee 
51 0d 3c 84 59 a2 45 d2 13 59 2a 14 82 1a 0f 6e 
d3 d1 4a 2d a9 4c 7e db 90 07 fc f1 8d a3 8e 38 
25 21 0a 32 c1 95 31 3c ba 56 cc 17 45 87 e1 eb 
fd 9f 0f 82 16 67 9f 67 fa 91 e4 0d 55 4e 52 c0 
66 64 2f fe 98 8f ae f8 96 21 5e ea 38 9e 5c 4f 
27 e2 48 ca ca f2 90 23 ad 99 4b cc 38 32 6d bf

the value of the above interpreted as a hex is:
(as an integer base 10)
158606138559806377335550770106317005265577130125767819706255482868083267042258351998404052695475901152993815638688779448776648749241932566498859083238638256607175345248416308016576869882960807615893115291173813835869453079742387444699799905789529937781098194129830197452322509164212689523238793832514658135487

the value of the above interpreted as a hex in complement 2 notation:
(as an integer base 10)
158606138559806377335550770106317005265577130125767819706255482868083267042258351998404052695475901152993815638688779448776648749241932566498859083238638256607175345248416308016576869882960807615893115291173813835869453079742387444699799905789529937781098194129830197452322509164212689523238793832514658135487

--------------8<-------------------------8<--------------------------

As you can see the program clearly shows that the value shown by Firefox is not in complement 2 notation in fact, but in simple hexadecimal.

> 2. I'm not sure how library support is for parsing big decimal numbers. BigInteger in Java works fine, but direct support for big numbers in programming languages is likely to be a problem: that's the reason we use sequences of bytes at this stage. The hexadecimal notation has the advantages that the algorithm for parsing such a sequence is fairly easy: each block of 2 characters make a byte, and more or less all languages can support that.

It does not matter how you decide internally to represent numbers. You just need to do it consistently. So for the moment you can do simple string comparison in hexadecimal if you feel like it. You just need to put your string in some canonical form such that for every number there is only one representation for it in that canonical format. So for example with the suggested hexadecimal format, remove all leading zeroes, and all other no hex characters, and uppercase all of the letters.

The java code shows how you can convert the hexadecimal notation to a byte stream in complement 2 notation. You could then compare the byte stream, for identity.

Or you use one of the BigInteger classes available in most programming languages.

> If you don't think these reasons matter when making this choice, then why use cert:hex at all? Decimal notation using xsd:integer would have been just fine and we can drop cert:hex and its variants altogether. After all, isn't it usually recommended on the semantic web not to re-invent new vocabularies, but reuse what's well established?

integers are a lot longer in decimal notation as you can see from the above examples. There is nothing stopping us from also allowing decimal and base64 notation in the future. That is part of what the survey is trying to test the waters for. My guess is that we should wait for some reason showing that Users want base64 notation before implementing it.

Also the hex notation allows people to cut and paste from Firefox, or usual OpenSSO output.

We could support normal decimal notation, and probably will in the longer term. No need to make things complicated right now.


> If you do think these reasons matters, then that's why it becomes a bit tricky. The main problems being (a) the ready-available implementations to parse big numbers (or arrays of bytes) and (b) what happens if we model more than the public key components, for example other attributes or the entire certificate.

Which other numbers in the certificate can be negative? Are they usually very long? do they need to be in hexadecimal notation? Do we need to model those?


> (a) The question here is do many languages support big integers more or less natively (like Java's BigInteger) or should we rather assume that we're going to have to compare the sequence of bytes (I suspect the latter). I know comparing the sequence of bytes instead of the big number doesn't sound very elegant from a semantic point of view, but that's what happens in practice. It's not our fault if RDF isn't well-suited for handling what's not text.

As I said above, as long as you find a way to compare the integers correctly you are fine. How your implementation does it is up to you.

For the moment our only trouble is I think that people are doing simple string comparison without doing canonicalisation first. This is bound to be problematic because if some put their hexadecimal characters in upper case, and others in lower case in their rdf then the will false negatives.

> (b) I think we should aim to have a vocabulary to model everything possible about an X.509 certificate, so as to be able to build a bridge with existing certificate-based mechanisms (e.g. what Peter has been investigating in the Windows world). This will be important to integrate FOAF+SSL with existing enterprise solutions.

We can do that at a later stage.
You are bringing up here the possibility of something more complicated that we may need, and yet we don't have a proof that we do need those.

Can you prove that the hex encoding issue that we are discussing here and now will have a relevance on the future ontology that you wish to build?

> I'm not saying we should model everything in an X.509 certificate right here, right now, but at least be able to write things. I'm not planning to chase up all the semantics for all the OIDs of the extensions that may be present in a certificate (associated with the fun of ASN.1 parsing), but there may be cases where it would be useful to store an extension in an RDF store without losing information, if only to let another tool that understand that extension use it at a later point.
> This will required storing the arrays of bytes as they are, not as integers. In this domain, zeros that are not significant in the arithmetic context are actually significant.

Can you tell me which of those numbers are negative?

> 
> This is also very important when it comes to modelling signatures: you have to remember that one never signs the semantics but the representations instead: how this information is encoded into a sequence of bytes really matters then.

yes, but then we are in a very different world. Signatures on X509 certificates are signatures on ASN.1 binary blobs. You can model those binary blobs, but unless you put them into the ASN.1  serialisation you will not get the correct signature. So you will not be able to transpose those signatures in any very interesting way into XML. In other words, you cannot take an X509 signature and add the same signature to an XML file or rdf file, and say that signs your file. It does not. To sign an XML serialisation you have to sign the XML.

This is why the signature is not that interesting to us as an identifier. Because there is no reason to tie us to the obsolete ASN.1 format. Why not take PGP signatures instead then? What is consistent across all these formats is the maths of public/private key cryptography. By sticking to that we can model the relations without being stuck with various syntax problems.

I think ontologically with foaf, cert, rsa ontolgies we have enough to get the same thing as an x509 certificate going. All that is needed is to sign the content in a XML format. XMLS-DSIG would be a good way do this. Note that the XML-DSIG format uses base64 notation for its signatures and not hex! 

> Best wishes,


Thanks for the detailed feeback.

But please let's move on! :-)

Henry



> Bruno.
> 
> 
> [1] http://lists.foaf-project.org/pipermail/foaf-protocols/2009-December/001210.html
> 
> Story Henry wrote:
>> I put a survey together for this question. Please take a bit of time to answer it. This will help us get a general feel for how the discussion has progressed.
>>   http://www.surveymonkey.com/s/WX7DPYY
>> PS. I habe kept Peter Williams' proposal, even though he no longer supports it, because it was part of the discussion. Henry Story
>> Social Web Architect
>> Sun Microsystems		
>> Blog: http://blogs.sun.com/bblfish
>> _______________________________________________
>> foaf-protocols mailing list
>> foaf-protocols at lists.foaf-project.org
>> http://lists.foaf-project.org/mailman/listinfo/foaf-protocols



More information about the foaf-protocols mailing list