2010-08-18, 20:18
I'm trying to understand how the scrapers work so I'm looking at imdb.xml.
It contains the following code:
Code:
<CreateSearchUrl dest="3" SearchStringEncoding="iso-8859-1">
    <RegExp input="$$1" output="&lt;url&gt;http://akas.imdb.com/find?s=tt;q=\1$$4&lt;/url&gt;" dest="3">
        <RegExp input="$$2" output="%20(\1)" dest="4">
            <expression clear="yes">(.+)</expression>
        </RegExp>
        <expression noclean="1"/>
    </RegExp>
</CreateSearchUrl>
From what I've read, the inner RegExp executes before the outer RegExp, and siblings execute in document order, top-down.
What I don't understand is that the inner RegExp has an input of $$2. Where is that coming from? I thought $$1 was the only valid input and it contained the file name.
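To check my understanding of the evaluation order, here is a rough Python sketch of how I think the nesting plays out. It is only a model, not the real parser: `run`, `expand`, and the dict layout are names I made up, I'm assuming an empty expression behaves like `(.*)`, that `clear="yes"` empties the dest buffer when nothing matches, and the value I put in buffer 2 is just a placeholder since I don't know what actually goes there.

```python
import re

# Matches either a capture reference (\1) or a buffer reference ($$4).
TOKEN = re.compile(r"\\(\d)|\$\$(\d+)")


def expand(template, buffers, match):
    """Fill \\N from the regex match and $$N from the buffers."""
    def sub(m):
        if m.group(1) is not None:
            return match.group(int(m.group(1))) or ""
        return buffers.get(int(m.group(2)), "")
    return TOKEN.sub(sub, template)


def run(node, buffers):
    """Evaluate nested RegExp nodes: children first, then the node itself."""
    for child in node.get("children", []):  # document order, top-down
        run(child, buffers)
    text = buffers.get(node["input"], "")
    # Assumption: an empty <expression/> acts like (.*)
    match = re.search(node["expression"] or "(.*)", text)
    if match:
        buffers[node["dest"]] = expand(node["output"], buffers, match)
    elif node.get("clear"):
        # Assumption: clear="yes" empties dest when there is no match
        buffers[node["dest"]] = ""


inner = {"input": 2, "dest": 4, "output": r"%20(\1)",
         "expression": "(.+)", "clear": True}
outer = {"input": 1, "dest": 3,
         "output": r"<url>http://akas.imdb.com/find?s=tt;q=\1$$4</url>",
         "expression": "", "children": [inner]}

# Buffer 2 holds a made-up placeholder value for illustration only.
buffers = {1: "The Matrix", 2: "1999"}
run(outer, buffers)
print(buffers[4])
print(buffers[3])
```

If that model is right, buffer 4 has to be filled by the inner RegExp before the outer one can use `$$4` in its output, which is why I'd like to know what ends up in `$$2`.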
Thanks for any help!