W3C: Can't read EBNF's SPARQL IRIREF specification?
(Specification: https://www.w3.org/TR/sparql11-query/#rIRIREF)
According to the specification, an IRIREF can be parsed like this:
[139] IRIREF ::= '<' ([^<>"{}|^`\]-[#x00-#x20])* '>'
What bothers me is this part of the expression:
\]-[
If I consider \ an escaping character in a bracketed character class (which is the case in Perl regular expressions), it means that \ alone is not a problem in an IRIREF, so this would be valid: <http://hello\world>
But then there is a big problem with the range ]-[. The character ] has an ordinal value of 93, and [ has 91. That makes it an invalid range, from 93 down to 91, which is not allowed in the regex engines I tested.
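A quick way to reproduce the problem is to hand the production's bracket expression, unchanged, to a Perl-style engine. The sketch below uses Python's `re` module as a stand-in for the engines mentioned above (an assumption; other engines word their errors differently). `try_compile` is a hypothetical helper: with `\]` read as an escaped `]`, the `-[` that follows forms the reversed range `]`..`[` (93 to 91), and compilation fails.

```python
import re

# Bracket expression from production [139], copied verbatim and (wrongly)
# treated as a Perl/Python-style regex character class.
SPEC_CLASS = r'[^<>"{}|^`\]-[#x00-#x20]'


def try_compile(pattern):
    """Hypothetical helper: return None if `pattern` compiles, else the error text."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None


# In Python's re, '\]' is an escaped ']' that stays inside the class, so the
# following '-[' is parsed as the range ']'..'[' (code points 93..91), which is reversed.
print(try_compile(SPEC_CLASS))  # prints a "bad character range ..." message
```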
What does this mean?
- Should I consider - a regular character in the bracketed character class? Then this would be an invalid IRIREF: <http://new-example.org>. That makes no sense.
- Should I consider the range ]-[ null? Then this IRIREF would be valid: <http://hello[world]>
- What I think is more likely: the range is inverted, but that is not a problem in the W3C specification, and it means the characters [, \ and ] are invalid characters. That makes sense.
The SPARQL spec says the grammar is written using the notation defined in the XML 1.1 specification. In that notation, the right-hand side of your quote,
'<' ([^<>"{}|^`\]-[#x00-#x20])* '>'
denotes a sequence of:
- a '<' character
- zero or more characters matching the expression [^<>"{}|^`\]-[#x00-#x20]; this is a set difference denoting
  - any character matched by [^<>"{}|^`\] = any character other than '<', '>', '"', '{', '}', '|', '^', '`', or '\' (n.b. '\' is not an escape character in this notation, which has no escape characters at all)
  - except those matched by [#x00-#x20] = the C0 area of control characters plus the space character
  This is an odd way to write the pattern; it could equally well be written [^<>"{}|^`\#x00-#x20]. I'm not sure why the editors wrote it the way they did.
- a '>' character
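To make the set-difference reading concrete, here is a minimal sketch that evaluates the expression the way the XML 1.1 notation defines it. It assumes only the subset of the notation this production uses (an optional leading `^`, literal characters, `#xN` code points and `a-b` ranges); `split_difference`, `parse_class`, `matches` and `in_difference` are hypothetical names for illustration. Because the notation has no escape character, the first `]` closes the first class. The sketch then checks, over the Basic Multilingual Plane, that A - B agrees with the single merged class written in Python's regex syntax.

```python
import re

# Production [139]'s character expression in XML 1.1 EBNF notation.
# In that notation '\' is an ordinary character, not an escape.
SPEC_EXPR = r'[^<>"{}|^`\]-[#x00-#x20]'


def split_difference(expr):
    """Split '[A]-[B]' into the bodies A and B (hypothetical helper).

    With no escapes in the notation, the first ']' must close class A.
    """
    close = expr.index("]")
    if not (expr.startswith("[") and expr[close:close + 3] == "]-[" and expr.endswith("]")):
        raise ValueError("expected the form [A]-[B]")
    return expr[1:close], expr[close + 3:-1]


def parse_class(body):
    """Parse a bracket-expression body into (negated, [(lo, hi), ...]).

    Supports only what [139] needs: leading '^', literal characters,
    '#xN' code points and 'a-b' ranges (hypothetical, not a full parser).
    """
    negated = body.startswith("^")
    if negated:
        body = body[1:]
    tokens = re.findall(r"#x[0-9A-Fa-f]+|.", body, flags=re.DOTALL)

    def code(tok):
        return int(tok[2:], 16) if tok.startswith("#x") else ord(tok)

    ranges, i = [], 0
    while i < len(tokens):
        lo = code(tokens[i])
        if i + 2 < len(tokens) and tokens[i + 1] == "-":
            ranges.append((lo, code(tokens[i + 2])))
            i += 3
        else:
            ranges.append((lo, lo))
            i += 1
    return negated, ranges


def matches(cls, cp):
    """True if code point `cp` is in the parsed character class."""
    negated, ranges = cls
    inside = any(lo <= cp <= hi for lo, hi in ranges)
    return inside != negated


A_BODY, B_BODY = split_difference(SPEC_EXPR)
A, B = parse_class(A_BODY), parse_class(B_BODY)


def in_difference(cp):
    """A - B: matched by the first class and not by the second."""
    return matches(A, cp) and not matches(B, cp)


# The single merged class from the answer, in Python re syntax ('\\' is a literal backslash).
MERGED = re.compile(r'[^<>"{}|^`\\\x00-\x20]')

# Check agreement over the Basic Multilingual Plane.
assert all(in_difference(cp) == bool(MERGED.fullmatch(chr(cp))) for cp in range(0x10000))

# '-', '[' and ']' are allowed; '\' is not.
for ch in "-[]\\":
    print(repr(ch), in_difference(ord(ch)))
```

Splitting at the first `]` is exactly the point the answer makes about ']-[': it is the boundary between two classes, not a range.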
So the answers to your questions, one by one:
Should I consider - a regular character in the bracketed character class? Then this would be an invalid IRIREF: <http://new-example.org>. That makes no sense.
No. When A and B are expressions in this notation, A - B denotes any string in the language of A that is not in the language of B. Here A and B are each character-class expressions, one negative and one positive.
You are right that it would make no sense to prohibit hyphens in a grammar rule intended to accept IRIs enclosed in angle brackets.
Should I consider the range ]-[ null? Then this IRIREF would be valid: <http://hello[world]>
']-[' does not denote a range here, null or otherwise; the ] ends the first character-class expression and the [ begins the second.
What I think is more likely: the range is inverted, but that is not a problem in the W3C specification, and it means the characters [, \ and ] are invalid characters. That makes sense.
If my parsing of the expression is correct, '[' and ']' are legal (they are not excluded by the first expression, and the second does not exclude them either); '\' is excluded by the first expression.
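Putting this together, a minimal sketch of production [139] as a Python regular expression (an assumption of this example, not text from the spec) collapses the set difference into one negated class and checks the three IRIs from the question. `IRIREF` and `is_iriref` are hypothetical names. It only checks which characters the production allows; it does not check that the content is a well-formed IRI.

```python
import re

# Sketch of production [139] IRIREF in Python re syntax: the set difference
# [^<>"{}|^`\]-[#x00-#x20] collapses into one negated class that also
# excludes '\' and #x00-#x20.
IRIREF = re.compile(r'<[^<>"{}|^`\\\x00-\x20]*>')


def is_iriref(token):
    """True if the whole token matches IRIREF (characters only; no IRI validation)."""
    return IRIREF.fullmatch(token) is not None


# The three IRIs discussed in the question, plus one containing a space.
print(is_iriref("<http://new-example.org>"))  # True: '-' is allowed
print(is_iriref("<http://hello[world]>"))     # True: '[' and ']' are allowed
print(is_iriref(r"<http://hello\world>"))     # False: '\' is excluded
print(is_iriref("<http://a b>"))              # False: space (#x20) is excluded
```

Merging the classes is needed here because Python's `re` has no character-class subtraction, so A - B has to be written out as a single class.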