Little help with GetSearchResults?

ultrabrutal Offline
Posting Freak
Posts: 952
Joined: Feb 2005
Reputation: 0
Location: South of Heaven
Post: #1
I'm no scraper developer or regex shark by far, so I need a little help :(

From the TMDB scraper:

Regex:
<movie>.*?<title>([^<]*)</title>.*?<id>([^<]*)</id>.*?</movie>

XML:
<results for="terminator" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
<moviematches>
<movie>
<score>1.0</score>
<popularity>45</popularity>
<title>Terminator 2: Judgment Day</title>
<alternative_title>Terminator 2</alternative_title>
<type>movie</type>
<id>280</id>
<imdb>tt0103064</imdb>
<url>http://www.themoviedb.org/movie/280</url>
<short_overview>It has been ten long years since a Terminator failed to kill Sarah Connor and her unborn son, John. Now, Skynet has sent back another Terminator. This one being more advanced than the last one. John Connor, who is now ten years old, is the target. The future John sends back a replica of the Terminator that tried to kill him back in time to 1995. It's Terminator vs. Terminator.</short_overview>
<release>1991-07-03</release>
</movie>
</moviematches>
</results>

Here is my XML:

<response>
<titles>
<title id="1c5f16f0-d5f3-4107-bcb6-007681e76be1" type="DVD" country="United Kingdom" barcode="5014138026448" title="Rosemary Conley - Shape Up And Salsacise" edition="" year="2005" thumbnail="http://fs.luckydata.com/Covers/2e92c6ce-378c-4535-a938-2c20d0b2b517.jpg" thumbnailwidth="97" thumbnailheight="140" completepercentage="100" />
</titles>
</response>

Now I need the "id" and the "title" for my search result, just like TMDB.

Here's what I have so far (non-working):

<title id="([^<]*)" type=.*?title="([^<]*)" edition=


Can anyone help me out here? :)
spiff Offline
Retired Developer
Posts: 12,386
Joined: Nov 2003
Post: #2
that's fine except you need to escape those quotes; i.e.

Code:
<title id=&quot;([^&quot;]*)&quot; type=.*?title=&quot;([^&quot;]*)&quot; edition=
ultrabrutal Offline
Post: #3
Thanks spiff! Scraper Tester still fails though:

System.Xml.XmlException: '=' is an unexpected token. The expected token is ';'. Line 1, position 111.

XBMC crashes when the scraper is used! It needs some exception handling somewhere in there ;)

Perhaps '=' needs escaping also?
ultrabrutal Offline
Post: #4
id="([^<]*)" type=.*?title="([^<]*)" edition

works in a regex tester and gives me the correct results. However, it still crashes both Scraper Tester and XBMC.

<title id=&quot;([^<]*)&quot; type=.*?title=&quot;([^<]*)&quot; edition=

does not give any results at all.
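For comparison, here is a minimal Python sketch of what a regex tester does with the unescaped pattern. It is only an illustration, not part of the scraper: the sample string is the `<title>` element from post #1 with some attributes trimmed, and `[^"]*` is used for the capture groups as in post #2.

```python
import re

# Sample element from post #1, trimmed to the attributes the pattern touches.
sample = (
    '<title id="1c5f16f0-d5f3-4107-bcb6-007681e76be1" type="DVD" '
    'country="United Kingdom" barcode="5014138026448" '
    'title="Rosemary Conley - Shape Up And Salsacise" edition="" year="2005" />'
)

# Plain regex, exactly as it would be typed into a regex tester (no XML escaping).
pattern = re.compile(r'<title id="([^"]*)" type=.*?title="([^"]*)" edition=')

# findall returns one (group 1, group 2) tuple per matching element.
matches = pattern.findall(sample)
for title_id, title in matches:
    print(title_id, "->", title)
```

Note the capture order: group 1 is the id and group 2 is the title, so an output template built on this pattern would put `\2` inside the title element and `\1` inside the id element.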
spiff Offline
Post: #5
well of course it works in the regexp tester - you're not writing xml there! i forgot to escape the first <
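The difference between the regex tester and the scraper file can be sketched in Python. This assumes the pattern ends up as the text of an `<expression>` element, as in a scraper file; the `wrap` helper is hypothetical and only exists for this illustration. The raw pattern is not well-formed XML because of the bare `<`, while the escaped form parses back to the original pattern.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The pattern as typed into a regex tester.
raw_pattern = r'<title id="([^"]*)" type=.*?title="([^"]*)" edition='


def wrap(body: str) -> str:
    """Hypothetical helper: place a pattern inside an <expression> element."""
    return '<expression repeat="yes">' + body + '</expression>'


# Pasting the raw pattern in directly fails: the bare '<' starts a tag.
try:
    ET.fromstring(wrap(raw_pattern))
    raw_ok = True
except ET.ParseError:
    raw_ok = False

# escape() handles &, < and >; the extra mapping also escapes double quotes.
escaped = escape(raw_pattern, {'"': '&quot;'})
print(escaped)

# The XML parser undoes the escaping, so the regex engine sees the raw pattern.
round_tripped = ET.fromstring(wrap(escaped)).text
print(round_tripped == raw_pattern)
```

In element text only `<` and `&` strictly need escaping; escaping the quotes as well is harmless and is required if the pattern is ever placed in an attribute value.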
ultrabrutal Offline
Post: #6
I'm using an XML editor, so I don't think I need to escape those quotes at all. Something else is wrong here.
ultrabrutal Offline
Post: #7
RegEx:
id="(.*?)" type=.*?title="(.*?)" edition

Against XML:

...
<title id="bd0798c5-9e9f-4fdf-86d4-d2bcaf936cf6" type="Blu-ray" country="United States" barcode="027616151285" title="Terminator" edition="" year="1984" thumbnail="http://fs.luckydata.com/Covers/19e4c5f1-cf41-41a1-a4a1-a7c77ea34737.jpg" thumbnailwidth="100" thumbnailheight="115" completepercentage="100" />
...

Results in:

System.Xml.XmlException: '=' is an unexpected token. The expected token is ';'. Line 1, position 111.
at System.Xml.XmlTextReaderImpl.Throw(Exception e)
at System.Xml.XmlTextReaderImpl.HandleEntityReference(Boolean isInAttributeValue, EntityExpandType expandType, Int32& charRefEndPos)
at System.Xml.XmlTextReaderImpl.ParseText(Int32& startPos, Int32& endPos, Int32& outOrChars)
at System.Xml.XmlTextReaderImpl.FinishPartialValue()
at System.Xml.XmlTextReaderImpl.get_Value()
at System.Xml.Linq.XContainer.ReadContentFrom(XmlReader r)
at System.Xml.Linq.XContainer.ReadContentFrom(XmlReader r, LoadOptions o)
at System.Xml.Linq.XDocument.Load(XmlReader reader, LoadOptions options)
at System.Xml.Linq.XDocument.Parse(String text, LoadOptions options)
at ScraperXML.ScraperParser.GetSearchResults(String strUrl)


Scraper XML:

...
<GetSearchResults dest="8">
<RegExp input="$$3" output="&lt;results&gt;\1&lt;/results&gt;" dest="8">
<RegExp input="$$1" output="&lt;entity&gt;&lt;title&gt;\1&lt;/title&gt;&lt;id&gt;\2&lt;/id&gt;&lt;url&gt;https://somewhere.com/Default.aspx?command=LoadTitle&amp;titleid=\2&amp;username=$INFO[username]&amp;password=$INFO[password]&amp;locale=1033&lt;/url&gt;&lt;/entity&gt;" dest="3">
<expression repeat="yes">id=&quot;(.*?)&quot; type=.*?title=&quot;(.*?)&quot; edition</expression>
</RegExp>
<expression noclean="1" />
</RegExp>
</GetSearchResults>
...

It's beyond me why it does not work in the tester. It no longer brings down XBMC, but it does not show any results either (I did a new SVN build, which might have helped, since it now saves settings as well).

Can anyone spot the problem? Tnx!
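One possible reading of the exception, not confirmed anywhere in this thread: the message "'=' is an unexpected token. The expected token is ';'" is what an XML parser reports when it meets a bare `&` followed by a name and `=`, as in `&titleid=`. The stack trace shows the generated result string being parsed as XML inside GetSearchResults, and by then the `&amp;` written in the scraper file has already been decoded to a plain `&`. The Python sketch below reproduces that kind of failure. The `build_entity` helper and the title id `123` are hypothetical, and the URL is shortened from the one in the scraper.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# URL as it looks after the scraper file has been parsed once (bare '&').
url = "https://somewhere.com/Default.aspx?command=LoadTitle&titleid=123&locale=1033"


def build_entity(title: str, title_id: str, url_text: str) -> str:
    """Hypothetical helper: assemble a result string like the scraper output."""
    return (
        "<results><entity><title>" + title + "</title><id>" + title_id
        + "</id><url>" + url_text + "</url></entity></results>"
    )


# A bare '&' makes the generated result string ill-formed XML.
try:
    ET.fromstring(build_entity("Terminator", "123", url))
    parsed_raw = True
except ET.ParseError as err:
    parsed_raw = False
    print("parse failed:", err)

# Escaping the URL text first lets the second parse succeed.
doc = ET.fromstring(build_entity("Terminator", "123", escape(url)))
print(doc.findtext("entity/url"))
```

If this is the cause, the ampersands in the output attribute would have to survive two rounds of XML parsing, which means writing them as `&amp;amp;` in the scraper file. Treat that as an assumption to test rather than a confirmed fix.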
ultrabrutal Offline
Post: #8
I'm pretty sure that the regex works now and that the problem is elsewhere, in the XML result or in my scraper. Sadly, I cannot debug the application as it's written in VB.

The scraper XML is valid and GetSearchResults seems correct to me :(
ultrabrutal Offline
Post: #9
Found the problem... I'm an XML noob :)

XBMC supports HTTPS, right? I ask because I do not get any results within XBMC, although the scraper now works fine in ScraperXMLTest.