regex - Using /^\s*$/ to match and replace multiple blank lines fails and gives a confusing result


Perl version: 5.16.01

I am reading a book about regex that is based on Perl 5.8.

The book states that s/^\s*$/blabla/mg can match and replace multiple blank lines. But when I tried it, I got a confusing result.

Code:

  $text = "c\n\n\n\n\nb";
  $text =~ s/^\s*$/<p>/mg;
  print "$text";

Here is the result:

  C:\Users\Administrator\Desktop\regex>perl t2h.pl
  c
  <p><p>
  b


I want to know why there are two <p> between 'c' and 'b' rather than a single one. Has the behavior of /$/ changed since 5.8?

The lesson here is to beware of regular expressions that can match a zero-width pattern, because you can get unexpected results.

We can see what is happening by making the replacement display the prematch, match, and postmatch for each substitution:

  use strict;
  use warnings;

  my $text = "c\n\n\n\nb";
  $text =~ s{^\s*$}{
      # Show prematch, match and postmatch, with newlines made visible.
      printf qq{<"%s" - "%s" - "%s">\n}, map s/\n/\\n/gr, ($`, $&, $');
      "<p>"
  }emg;
  $text =~ s/\n/\\n/g;
  print qq{Result: "$text"};

Outputs <"Prematch" - "Match" - "Postmatch">:

  <"c\n" - "\n\n" - "\nb">
  <"c\n\n\n" - "" - "\nb">
  Result: "c\n<p><p>\nb"

Basically, the regex first matches positions 2 to 4, capturing 2 newline characters. After that replacement, it resumes searching at position 4 and matches a zero-width pattern there, which adds a second <p>.

One reason this is not intuitive is that our regex has already replaced the \n\n at positions 2 to 3 with a <p>. However, the lookbehind assertion (of which ^ is a special case) treats the string as it originally was, because a /g regex does not see the replacements made by its previous passes. Therefore, when matching at position 4, the regex sees c\n\n\n before it instead of c\n<p> (as shown in our output above), and so ^ matches again, immediately followed by $.
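The two match positions described above can be checked directly. This is a minimal sketch, not part of the original answer: it runs the same pattern with m//g in a loop instead of s///g and records the start and end offset of each match from the built-in @- and @+ arrays. The variable name @spans is only illustrative.

```perl
use strict;
use warnings;

my $text = "c\n\n\n\nb";
my @spans;

# In scalar context, m//g resumes where the previous match ended.
while ($text =~ /^\s*$/mg) {
    # $-[0] and $+[0] are the start and end offsets of the last match.
    push @spans, join("-", $-[0], $+[0]);
}

print join(", ", @spans), "\n";   # prints "2-4, 4-4"
```

The second span is empty (start equals end), which is the zero-width match that produces the extra <p>.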

The solution is to avoid the zero-width pattern, in this example by using + instead of *.
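As a sketch of that fix, applied to the string from the question: with \s+ the pattern must consume at least one whitespace character, so the empty match at the end of the blank run is no longer possible. The variable names $fixed and $shown are only illustrative.

```perl
use strict;
use warnings;

my $text = "c\n\n\n\n\nb";

# \s+ needs at least one character, which rules out the zero-width match.
(my $fixed = $text) =~ s/^\s+$/<p>/mg;

# Make the newlines visible before printing.
(my $shown = $fixed) =~ s/\n/\\n/g;
print qq{Result: "$shown"\n};   # prints: Result: "c\n<p>\nb"
```

Only one <p> is inserted, and the newlines that end the c line and precede b are left in place.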

Secondary Example

Another example of this is the following, simpler regex:

  my $text = "caab";
  $text =~ s/a*/<p>/g;
  print $text;

Output:

  <p>c<p><p>b<p>

The positional breakdown of this match is as follows:

  0 c - matches a zero-width pattern
  1 a - matches a 2-character pattern
  2 a
  3 b - matches a zero-width pattern
  4 $ - matches a zero-width pattern

Therefore, the final lesson is simply to beware of regexes that can match a zero-width pattern.
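The breakdown can be reproduced with a short sketch (an illustration, not part of the original answer). In list context m//g returns every match, so the empty matches show up as empty strings. The last lines apply the same + fix as in the blank-line case. The names @star and $plus are only illustrative.

```perl
use strict;
use warnings;

my $text = "caab";

# List-context m//g returns all matches, including the zero-width ones.
my @star = $text =~ /a*/g;
print scalar(@star), " matches: ", join("|", map { qq{"$_"} } @star), "\n";
# prints: 4 matches: ""|"aa"|""|""

# With a+ there is nothing empty to match, so only the run of a's is replaced.
(my $plus = $text) =~ s/a+/<p>/g;
print "$plus\n";   # prints "c<p>b"
```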
