regex - W3C: Can't read EBNF's SPARQL IRIREF specification?
(Specification: https://www.w3.org/TR/sparql11-query/#rIRIREF)
According to the specification, an IRIREF is parsed like this:
[139] IRIREF ::= '<' ([^<>"{}|^`\]-[#x00-#x20])* '>'
What is bothering me is this part of the expression:
\]-[
If I consider \ to be an escaping character in a bracketed character class (which is the case in Perl regular expressions), it means that \ alone is not a problem in an IRIREF, so this is valid: <http://hello\world>
Then there is a big problem with the range ]-[. The character ] has an ordinal value of 93 and [ of 91. This means we have an invalid range: 93 to 91. This is not allowed in the regex engines I tested.
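The reversed-range complaint can be reproduced in at least one engine. The following is a minimal sketch that assumes Python's re module as a stand-in, since the question does not say which engines were tested; compiles_ok is a hypothetical helper name.

```python
import re

def compiles_ok(pattern: str) -> bool:
    # Return True if Python's re module accepts the pattern.
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Read Perl-style, \] is an escaped ']', so ']-[' becomes a range
# running from code point 93 down to 91, which the engine rejects.
print(ord("]"), ord("["))
print(compiles_ok(r"[\]-[]"))    # reversed range: 93 down to 91
print(compiles_ok(r"[\[-\]]"))   # forward range: 91 up to 93
```

Under this reading the first pattern fails to compile and the second compiles, which matches the observation that 93 to 91 is not a usable range.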
What does this mean?
- Should I consider - a regular character in the bracketed character class, so that this is an invalid IRIREF: <http://new-example.org>? That makes no sense.
- Should I consider the range ]-[ as null, so that this IRIREF is valid: <http://hello[world]>?
- What I think is more likely: the range is inverted, which is not a problem in W3C specifications, and it means the characters [, \, ] are invalid characters. That makes sense.
The SPARQL spec says that its grammar is written using the notation defined in the XML 1.1 specification.
In that notation, the right-hand side you quote,
'<' ([^<>"{}|^`\]-[#x00-#x20])* '>'
denotes a sequence of
- a '<' character
- zero or more characters matching the expression [^<>"{}|^`\]-[#x00-#x20]; this is a set difference denoting
  - any character matched by [^<>"{}|^`\] = any character other than '<', '>', '"', '{', '}', '|', '^', '`', or '\'; N.B. '\' is not an escape character in this notation (which has no escape characters at all)
  - except for those matched by [#x00-#x20] = the C0 control characters plus blank
  This is an odd way to write this pattern; it could equally well be written [^<>"{}|^`\#x00-#x20]; I'm not sure why the editors wrote it the way they did.
- a '>' character
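The reading above can be carried over to a conventional regex engine by merging the two classes into one negated class. This is a sketch under that assumption, using Python's re module; the names IRIREF and is_iriref are hypothetical, and the backslash has to be doubled because Python, unlike the XML notation, does treat it as an escape character.

```python
import re

# Assumed translation of production [139] into Python regex syntax.
# Excluded characters: < > " { } | ^ ` \ and U+0000 through U+0020.
IRIREF = re.compile(r'<[^<>"{}|^`\\\x00-\x20]*>')

def is_iriref(text: str) -> bool:
    # True if the whole string matches the translated production.
    return IRIREF.fullmatch(text) is not None

for candidate in (
    "<http://new-example.org>",   # contains a hyphen
    "<http://hello[world]>",      # contains square brackets
    "<http://hello\\world>",      # contains a backslash
    "<http://hello world>",       # contains a space (U+0020)
):
    print(candidate, is_iriref(candidate))
```

With this translation the hyphen and square-bracket examples are accepted, while the backslash and space examples are rejected.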
So, to answer your questions one by one:
"Should I consider - a regular character in the bracketed character class, so that this is an invalid IRIREF: <http://new-example.org>? That makes no sense."
No. When A and B are expressions in this notation, A - B denotes any string in the language of A that is not a string in the language of B. Here A and B are each character-class expressions, one negative and one positive.
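The A - B operator can be spelled out as two membership tests on single characters. The sketch below is illustrative only; the function names in_a, in_b, and in_a_minus_b are hypothetical and simply mirror the two character classes in the production.

```python
# Characters listed inside the negated class A = [^<>"{}|^`\]
EXCLUDED_BY_A = set('<>"{}|^`\\')

def in_a(ch: str) -> bool:
    # A matches any character that is NOT in the listed set.
    return ch not in EXCLUDED_BY_A

def in_b(ch: str) -> bool:
    # B = [#x00-#x20] matches code points U+0000 through U+0020.
    return 0x00 <= ord(ch) <= 0x20

def in_a_minus_b(ch: str) -> bool:
    # A - B: matched by A and not matched by B.
    return in_a(ch) and not in_b(ch)

for ch in "-[] \\":
    print(repr(ch), in_a_minus_b(ch))
```

Here the hyphen and both square brackets pass, the space is matched by A but removed by B, and the backslash is already rejected by A.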
You are right that it would make no sense to prohibit hyphens in a grammar rule intended to accept IRIs bracketed by angle brackets.
"Should I consider the range ]-[ as null, so that this IRIREF is valid: <http://hello[world]>?"
']-[' does not denote a range here, null or otherwise; ] ends the first character class expression, and [ begins the second.
"What I think is more likely: the range is inverted, which is not a problem in W3C specifications, and it means the characters [, \, ] are invalid characters. That makes sense."
If my parsing of the expression is correct, '[' and ']' are legal (they are not excluded by the first expression, and not excluded by the second); '\' is excluded by the first expression.