regex - Why doesn't Perl v5.22 find all the sentence boundaries?


This is fixed in Perl 5.22.1. I write about it in "Perl v5.22 adds fancy Unicode word boundaries".


Perl v5.22 added the Unicode assertions from TR #29. I've been playing with the sentence boundary assertion, but it only seems to find the start and end of text:

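    use v5.22;

    # Test text with several sentence terminators and a paragraph separator
    $_ = "See Spot. (Spot is a dog.) See Spot run. Run Spot, run!\x{2029}New paragraph.";

    # Report every position where the sentence boundary assertion matches
    while( m/\b{sb}/g ) {
        say "Sentence boundary at ", pos;
        }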

The output notes the sentence boundaries at the start and end of text, but not after the full stops, the sentence terminators, or the parens:

    Sentence boundary at 0
    Sentence boundary at 70

The Unicode Breaks Tester shows them where I expect them based on TR #29.

I couldn't find any non-trivial tests in the perl source for this feature. I'm digesting the technical report to create the appropriate test cases, but so far it looks like an untested and broken feature.

Calle Dybedahl's comment gets it right (and when it turns into an answer I'll accept that). It's a broken feature in v5.22.0 and, as far as I can tell, untested. I had an issue compiling stuff with the latest perls last night and ended my day with this question.

The perl5.22.1 perldelta does not mention these particular changes (and "mention" might be too strong, since it merely alludes to possible things that were wrong without enumerating them). It mentions an incompatible change with 5.20.0 (a cut and paste error?) and a "single" exception, but then more than one issue. The reference to "sane" made me think of changes related to the panic issue in the next subsection. The mention of "several bugs" with one rt.perl.org reference made me think that the bugs were related to that panic issue.

=head1 Incompatible Changes

There are no changes intentionally incompatible with 5.20.0 other than the following single exception, which we deemed to be a sensible change to make in order to get the new C<\b{wb}> and (in particular) C<\b{sb}> features sane before people decided they're worthless because of bugs in their Perl 5.22.0 implementation and avoided them in the future. If any others exist, they are bugs, and we request that you submit a report. See L</Reporting Bugs> below.

=head2 Bounds Checking Constructs

Several bugs, including a segmentation fault, have been fixed with the bounds checking constructs (introduced in Perl 5.22) C<\b{gcb}>, C<\b{sb}>, C<\b{wb}>, C<\B{gcb}>, C<\B{sb}>, and C<\B{wb}>. All the C<\B{}> ones now match an empty string; none of the C<\b{}> ones do. L<[perl #126319]|https://rt.perl.org/Ticket/Display.html?id=126319>
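
The empty-string claim in that excerpt is easy to check directly. This is a minimal sketch, assuming a perl at v5.22.1 or later; the helper name `empty_string_matches` is made up for illustration and is not part of perl.

```perl
use v5.22;
use warnings;

# Hypothetical helper: report how each boundary construct treats
# the empty string (1 for a match, 0 for no match).
sub empty_string_matches {
    my $empty = '';
    return (
        'b{gcb}' => ( $empty =~ /\b{gcb}/ ? 1 : 0 ),
        'b{sb}'  => ( $empty =~ /\b{sb}/  ? 1 : 0 ),
        'b{wb}'  => ( $empty =~ /\b{wb}/  ? 1 : 0 ),
        'B{gcb}' => ( $empty =~ /\B{gcb}/ ? 1 : 0 ),
        'B{sb}'  => ( $empty =~ /\B{sb}/  ? 1 : 0 ),
        'B{wb}'  => ( $empty =~ /\B{wb}/  ? 1 : 0 ),
    );
}

my %result = empty_string_matches();
for my $construct ( sort keys %result ) {
    say "\\$construct matches the empty string: ",
        ( $result{$construct} ? 'yes' : 'no' );
}
```

If the perldelta is accurate, the three `\B{}` lines should say yes and the three `\b{}` lines should say no.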

Additionally, perlrebackslash, where the new boundaries are documented, doesn't mention that they don't work in v5.22.0.

I disregarded a possible fix because of the incongruities in the perldelta and the prior experience I've had with new features that aren't adequately (or at all) tested in the perl source. I prematurely cut off that line of investigation and could have saved myself a couple of hours. It's my fault for not getting my code running on the latest binaries, but I had become fixated on the idea that I was doing it wrong and that my code was the problem. Despite numerous past experiences to the contrary, I wasn't entertaining thoughts (other than updating the UCD) that perl was wrong.

Now that I'm at a different machine and have a working perl-5.22.1, I see that my program works as expected in the point release. The perldelta could have been better here.
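
To make the version dependence explicit, here is a minimal sketch that collects the boundary offsets and warns when run on a perl older than v5.22.1. The helper name `sentence_boundaries` is my own invention for illustration, not something from the perl distribution.

```perl
use v5.22;
use warnings;

# Hypothetical helper: return every offset where \b{sb} matches.
sub sentence_boundaries {
    my ($text) = @_;
    my @offsets;

    # With /g in scalar context, each iteration finds the next match
    while ( $text =~ m/\b{sb}/g ) {
        push @offsets, pos($text);
    }
    return @offsets;
}

# $] is the numeric perl version; 5.022001 corresponds to v5.22.1
warn "\\b{sb} is unreliable before v5.22.1\n" if $] < 5.022001;

my $text = "See Spot. (Spot is a dog.) See Spot run. Run Spot, run!\x{2029}New paragraph.";
say "Sentence boundary at $_" for sentence_boundaries($text);
```

On v5.22.0 this should print only the start and end offsets shown earlier; on v5.22.1 it should also report boundaries inside the text.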

