regex - W3C: Can't read EBNF's SPARQL IRIREF specification?


(Specification: https://www.w3.org/TR/sparql11-query/#rIRIREF)

According to the specification, an IRIREF is defined like this:

[139]   IRIREF    ::=   '<' ([^<>"{}|^`\]-[#x00-#x20])* '>'

What is bothering me is this part of the expression:

\]-[ 

If I consider \ to be an escaping character in a bracketed character class (which is the case in Perl regular expressions), it means that \ alone is not a problem in an IRIREF, and this is valid: <http://hello\world>

But then there is a big problem with the range ]-[. The character ] has an ordinal value of 93, and [ has an ordinal value of 91. That means we have an invalid range: 93 to 91. This is not allowed in the regex engines I tested.
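The reversed range is easy to reproduce. The following is a minimal sketch using Python's `re` module, chosen only for illustration (the post does not say which engines were tested); the helper name `compiles` is hypothetical.

```python
import re

# Code points of the two brackets: ']' comes after '[' in the character table.
print(ord("]"), ord("["))


def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Read the Perl way, \] is an escaped ']' and ']-[' is a range from 93 down to 91.
reversed_range = r"[\]-[]"
# The same two endpoints in ascending order form an ordinary range.
ascending_range = r"[[-\]]"

print(compiles(reversed_range))
print(compiles(ascending_range))
```

Under this reading the pattern is rejected outright, which is why the Perl-style interpretation of the production cannot be the intended one.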

What does this mean?

  1. Should I consider - to be a regular character in the bracketed character class, so that this is an invalid IRIREF: <http://new-example.org>? That makes no sense.
  2. Should I consider the range ]-[ to be null, so that this IRIREF is valid: <http://hello[world]>?
  3. What I think is more likely is that the range is inverted, which is not a problem for W3C specifications, and it means that the characters [, \ and ] are invalid characters. That makes sense.

The SPARQL spec says that its grammar is written using the notation defined in the XML 1.1 specification.

In that notation, the right-hand side you quote,

'<' ([^<>"{}|^`\]-[#x00-#x20])* '>' 

denotes a sequence of

  • a '<' character
  • zero or more characters matching the expression [^<>"{}|^`\]-[#x00-#x20]; this is a set difference denoting

    • any character matched by [^<>"{}|^`\] = any character other than '<', '>', '"', '{', '}', '|', '^', '`', or '\'; N.B. '\' is not an escape character in this notation (which has no escape characters at all)
    • except those matched by [#x00-#x20] = the C0 control characters plus the space character

    This is an odd way to write the pattern; it could equally be written [^<>"{}|^`\#x00-#x20]; I'm not sure why the editors wrote it the way they did.

  • a '>' character

So, to answer your questions one by one:

Should I consider - to be a regular character in the bracketed character class, so that this is an invalid IRIREF: <http://new-example.org>? That makes no sense.

No. When A and B are expressions in this notation, A - B denotes any string that is in the language of A and is not in the language of B. Here A and B are each character-class expressions, one negative and one positive.
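To make the set difference concrete, here is a small Python sketch that models the two character classes as predicates. The function names `in_a`, `in_b` and `iriref_char` are made up for this illustration and do not come from either specification.

```python
# Characters listed in the negated class A = [^<>"{}|^`\].
# The backslash is an ordinary member of the list in the XML notation.
EXCLUDED_BY_A = set('<>"{}|^`\\')


def in_a(ch: str) -> bool:
    """A: any character that is not in the listed set."""
    return ch not in EXCLUDED_BY_A


def in_b(ch: str) -> bool:
    """B = [#x00-#x20]: control characters up to and including space."""
    return 0x00 <= ord(ch) <= 0x20


def iriref_char(ch: str) -> bool:
    """A - B: matched by A, but not matched by B."""
    return in_a(ch) and not in_b(ch)


for ch in ["-", "[", "]", "\\", " ", "<"]:
    print(repr(ch), iriref_char(ch))
```

The hyphen between the two classes never appears as a character in either set; it is only the difference operator of the notation.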

You are right that it would make no sense to prohibit hyphens in a grammar rule intended to accept IRIs bracketed by angle brackets.

Should I consider the range ]-[ to be null, so that this IRIREF is valid: <http://hello[world]>?

']-[' does not denote a range here, null or otherwise; the ] ends the first character-class expression and the [ begins the second.

What I think is more likely is that the range is inverted, which is not a problem for W3C specifications, and it means that the characters [, \ and ] are invalid characters. That makes sense.

If my parsing of the expression is correct, '[' and ']' are legal (they are not excluded by the first expression, and not excluded by the second); '\' is excluded by the first expression.
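For anyone who wants to check this with an ordinary regex engine, the production can be translated roughly as follows. This is a sketch in Python, assuming the reading given above; the name `IRIREF` for the compiled pattern and the helper `is_iriref` are illustrative only. Unlike the XML notation, Python's `re` does treat the backslash as an escape character, so it has to be doubled inside the class.

```python
import re

# One negated class covering both parts of the set difference:
# the listed punctuation, the backslash, and the range #x00-#x20.
IRIREF = re.compile(r'<[^<>"{}|^`\\\x00-\x20]*>')


def is_iriref(text: str) -> bool:
    """Return True if the whole string matches the IRIREF production."""
    return IRIREF.fullmatch(text) is not None


samples = [
    "<http://new-example.org>",   # hyphen is allowed
    "<http://hello[world]>",      # square brackets are allowed
    r"<http://hello\world>",      # backslash is excluded
    "<http://hello world>",       # space (#x20) is excluded
]
for sample in samples:
    print(sample, is_iriref(sample))
```

The first two samples are accepted and the last two are rejected, which matches the answers to the three questions above.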

