Unescape url
#1
Hi,

I need help parsing thumbnail URLs in my scraper. How can I unescape this:

Code:
http:\/\/images.myurl.com\/medias\/nmedia\/18\/35\/35\/91\/18988018.jpg

I can't find a better way than using an '<expression repeat="yes">' to match between backslashes and append the results to a buffer. Is there a way to do it using only a regexp?

Regards,

Trois Six
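For comparison, outside the scraper a general-purpose regex engine with a substitution function can do this in a single call. Below is a minimal Python sketch. It assumes the escaped URL is available as an ordinary string, and `unescape_slashes` is a hypothetical helper name. It only handles the `\/` escape shown above:

```python
import re

def unescape_slashes(escaped: str) -> str:
    # Replace every backslash-escaped slash "\/" with a plain "/".
    return re.sub(r"\\/", "/", escaped)

url = r"http:\/\/images.myurl.com\/medias\/nmedia\/18\/35\/35\/91\/18988018.jpg"
print(unescape_slashes(url))
# prints http://images.myurl.com/medias/nmedia/18/35/35/91/18988018.jpg
```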
#2
Something like this?
Code:
<RegExp input="$$1" output="\1" dest="2">
    <expression repeat="yes">\\?(.)</expression>
</RegExp>

Or is that more-or-less what you're already using? I can't quite tell from your question.

That's the simplest approach I can think of, unless you want to tailor the expression to that specific URL format
(e.g. match (http:)\\/\\/(images\.myurl\.com)\\/(..)etc. and output "\1//\2/\3/\4etc.").
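The generic expression can be checked with a short Python emulation. It assumes that `repeat="yes"` appends each match's `\1` to the destination buffer, which is the behaviour described in the question. `re.findall` stands in for the scraper engine, and `generic_unescape` is a hypothetical name:

```python
import re

def generic_unescape(escaped: str) -> str:
    # Emulates <expression repeat="yes">\\?(.)</expression> with output="\1":
    # each match consumes an optional backslash plus one character, and only
    # that character (group 1) is appended to the buffer.
    return "".join(re.findall(r"\\?(.)", escaped))

url = r"http:\/\/images.myurl.com\/medias\/nmedia\/18\/35\/35\/91\/18988018.jpg"
print(generic_unescape(url))
# prints http://images.myurl.com/medias/nmedia/18/35/35/91/18988018.jpg
```

The pattern simply drops each backslash and keeps the character after it. An escaped backslash (`\\`) therefore becomes a single `\`, but an escape such as `\n` would turn into a plain `n`.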
#3
That's what I had in mind, but I just wanted to know if there was another way.
#4
In the end, I did it like this:

Code:
<RegExp input="$$6" output="\1" dest="8+">
   <expression noclean="1">([^\\]*)\\/</expression>
</RegExp>
<RegExp input="$$6" output="/\1" dest="8+">
   <expression noclean="1" repeat="yes">\\/([^\\]*)</expression>
</RegExp>

$$6 contains the escaped URL and $$8 is my destination buffer.
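The Python sketch below shows why the two passes reassemble the URL. It assumes the first `<RegExp>` (no `repeat`) outputs only its first match and the second (`repeat="yes"`) appends `/\1` for every match. `two_pass_unescape` is a hypothetical name, and the scraper's `noclean` handling has no counterpart here:

```python
import re

def two_pass_unescape(escaped: str) -> str:
    buffer = ""  # plays the role of $$8
    # Pass 1 (no repeat): first match only; output "\1" is the text before the first "\/".
    first = re.search(r"([^\\]*)\\/", escaped)
    if first:
        buffer += first.group(1)
    # Pass 2 (repeat="yes"): every "\/" plus the following run of non-backslash
    # characters; output "/\1" puts the slash back in front of each segment.
    for match in re.finditer(r"\\/([^\\]*)", escaped):
        buffer += "/" + match.group(1)
    return buffer

url = r"http:\/\/images.myurl.com\/medias\/nmedia\/18\/35\/35\/91\/18988018.jpg"
print(two_pass_unescape(url))
# prints http://images.myurl.com/medias/nmedia/18/35/35/91/18988018.jpg
```

The first pass yields `http:`. The second yields `/`, `/images.myurl.com`, `/medias` and so on, one segment per `\/`. In this emulation an input with no `\/` at all produces an empty string, so the approach relies on the URL containing at least one escaped slash.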