Little help with GetSearchResults?
#1
I'm by no means a scraper developer or a regex shark, so I need a little help :(

From the TMDB scraper:

Regex:
<movie>.*?<title>([^<]*)</title>.*?<id>([^<]*)</id>.*?</movie>

XML:
<results for="terminator" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
<moviematches>
<movie>
<score>1.0</score>
<popularity>45</popularity>
<title>Terminator 2: Judgment Day</title>
<alternative_title>Terminator 2</alternative_title>
<type>movie</type>
<id>280</id>
<imdb>tt0103064</imdb>
<url>http://www.themoviedb.org/movie/280</url>
<short_overview>It has been ten long years since a Terminator failed to kill Sarah Connor and her unborn son, John. Now, Skynet has sent back another Terminator. This one being more advanced than the last one. John Connor, who is now ten years old, is the target. The future John sends back a replica of the Terminator that tried to kill him back in time to 1995. It's Terminator vs. Terminator.</short_overview>
<release>1991-07-03</release>
</movie>
</moviematches>
</results>
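For reference, the TMDB regex can be tried outside XBMC. Below is a minimal Python sketch, assuming Python's `re` treats this pattern the way XBMC's regex engine does (not verified). `re.DOTALL` is my addition so that `.*?` can cross the line breaks between elements. The XML is a trimmed copy of the response above.

```python
import re

# Regex copied from the TMDB scraper's GetSearchResults.
TMDB_RE = r"<movie>.*?<title>([^<]*)</title>.*?<id>([^<]*)</id>.*?</movie>"

# Trimmed copy of the TMDB search response quoted above.
TMDB_XML = """<results for="terminator">
<moviematches>
<movie>
<score>1.0</score>
<title>Terminator 2: Judgment Day</title>
<alternative_title>Terminator 2</alternative_title>
<id>280</id>
<release>1991-07-03</release>
</movie>
</moviematches>
</results>"""

# re.DOTALL lets ".*?" run across the newlines between elements.
matches = re.findall(TMDB_RE, TMDB_XML, re.DOTALL)
print(matches)  # [('Terminator 2: Judgment Day', '280')]
```

Group 1 is the title and group 2 is the id. `<moviematches>` is not mistaken for `<movie>` because the pattern requires `>` right after `movie`.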

Here is my XML:

<response>
<titles>
<title id="1c5f16f0-d5f3-4107-bcb6-007681e76be1" type="DVD" country="United Kingdom" barcode="5014138026448" title="Rosemary Conley - Shape Up And Salsacise" edition="" year="2005" thumbnail="http://fs.luckydata.com/Covers/2e92c6ce-378c-4535-a938-2c20d0b2b517.jpg" thumbnailwidth="97" thumbnailheight="140" completepercentage="100" />
</titles>
</response>

I need the "id" and "title" attributes for my search results, just as the TMDB scraper takes <id> and <title>.

Here's what I have so far (non-working):

<title id="([^<]*)" type=.*?title="([^<]*)" edition=
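For comparison, this is how a pattern like this behaves on the raw response, i.e. as it would be pasted into a regex tester, before any XML escaping. This is a minimal Python sketch. Swapping `[^<]*` for `[^"]*` is my own choice: each group then stops at the closing quote of its attribute instead of relying on backtracking. The input line is trimmed from the response above.

```python
import re

# One <title> element from the luckydata response (trimmed after "year").
LINE = ('<title id="1c5f16f0-d5f3-4107-bcb6-007681e76be1" type="DVD" '
        'country="United Kingdom" barcode="5014138026448" '
        'title="Rosemary Conley - Shape Up And Salsacise" edition="" year="2005" />')

# [^"]* stops at the closing quote, so each group holds exactly one value.
SEARCH_RE = r'<title id="([^"]*)" type=.*?title="([^"]*)" edition='

match = re.search(SEARCH_RE, LINE)
print(match.groups())
# ('1c5f16f0-d5f3-4107-bcb6-007681e76be1', 'Rosemary Conley - Shape Up And Salsacise')
```

Inside the scraper XML, the same pattern still has to be escaped, at least the `<` characters.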


Can anyone help me out here? :)
#2
That's fine, except you need to escape those quotes, i.e.:

Code:
<title id=&quot;([^&quot;]*)&quot; type=.*?title=&quot;([^&quot;]*)&quot; edition=
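To produce the fully escaped form without hand-editing, Python's standard `xml.sax.saxutils.escape` can do it. This is a minimal sketch. `escape()` always replaces `&`, `<` and `>`, and the extra mapping also turns `"` into `&quot;`. In element content, only `<` and `&` strictly have to be escaped, so the `&quot;` part is optional there.

```python
from xml.sax.saxutils import escape

# Raw regex, as it would be typed into a regex tester.
raw = '<title id="([^"]*)" type=.*?title="([^"]*)" edition='

# escape() handles &, < and >; the dict additionally maps " to &quot;.
escaped = escape(raw, {'"': "&quot;"})
print(escaped)
# &lt;title id=&quot;([^&quot;]*)&quot; type=.*?title=&quot;([^&quot;]*)&quot; edition=
```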
#3
Thanks spiff! Scraper Tester still fails though:

System.Xml.XmlException: '=' is an unexpected token. The expected token is ';'. Line 1, position 111.

XBMC crashes when the scraper is used! It needs some exception handling somewhere in there ;)

Perhaps '=' needs escaping too?
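The .NET message reads as if the parser met an `&`, took the following characters as an entity name, and then hit `=` where the closing `;` should be. `=` itself is ordinary character data. Below is a minimal Python sketch using the standard-library `xml.etree.ElementTree` parser (not the .NET parser the tester uses). `parses` is a hypothetical helper, and the URL fragments are made up for illustration.

```python
import xml.etree.ElementTree as ET

def parses(text):
    """Hypothetical helper: True if text is well-formed XML, else False."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# "=" is plain character data and never needs escaping.
print(parses("<url>Default.aspx?command=LoadTitle</url>"))                # True
# A bare "&" starts an entity reference that never gets its ";".
print(parses("<url>Default.aspx?command=LoadTitle&titleid=1</url>"))      # False
# Written as &amp; it is fine.
print(parses("<url>Default.aspx?command=LoadTitle&amp;titleid=1</url>"))  # True
```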
#4
id="([^<]*)" type=.*?title="([^<]*)" edition

This works in a regex tester and gives me the correct results. However, it still crashes both Scraper Tester and XBMC.

<title id=&quot;([^<]*)&quot; type=.*?title=&quot;([^<]*)&quot; edition=

does not give any results at all.
#5
Well, of course it works in the regexp tester: you're not writing XML there! I forgot to escape the first < (in the scraper XML it has to be written as &lt;).
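The difference is easy to show with any XML parser. Below is a minimal Python sketch using `xml.etree.ElementTree` as a stand-in for whatever parser reads the scraper file. An unescaped `<` inside `<expression>` is read as the start of a new tag. With `&lt;`, the element parses, and its text comes back as the plain regex.

```python
import xml.etree.ElementTree as ET

# The <expression> element with the first "<" left unescaped.
broken = '<expression repeat="yes"><title id=&quot;([^&quot;]*)&quot;</expression>'
# The same expression with "<" written as &lt;.
fixed = '<expression repeat="yes">&lt;title id=&quot;([^&quot;]*)&quot;</expression>'

try:
    ET.fromstring(broken)
except ET.ParseError as err:
    print("broken:", err)

# After parsing, the scraper sees the plain regex again.
print(ET.fromstring(fixed).text)  # <title id="([^"]*)"
```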
#6
I'm using an XML editor, so I don't think I need to escape those quotes at all. Something else is wrong here.
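On the quotes, the XML rules agree. In element content, a literal `"` is allowed and `&quot;` decodes to the same character. (Inside an attribute value delimited by `"`, escaping would be required.) Below is a minimal Python sketch with `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Quotes written literally inside element content...
plain = ET.fromstring('<expression>id="(.*?)" type=</expression>').text
# ...and the same quotes written as &quot; decode to identical text.
escaped = ET.fromstring('<expression>id=&quot;(.*?)&quot; type=</expression>').text

print(plain == escaped)  # True
print(plain)             # id="(.*?)" type=
```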
#7
RegEx:
id="(.*?)" type=.*?title="(.*?)" edition
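As a check on the regex itself, here it is applied to two `<title>` elements, trimmed from the responses quoted in this thread. This is a minimal Python sketch. I am assuming that `repeat="yes"` applies the expression once per match, roughly like `re.findall`; that is not confirmed here. Without `re.DOTALL`, `.` does not match newlines, so each match stays on its own line.

```python
import re

# Two <title> elements, trimmed from the responses quoted in this thread.
RESPONSE = """<titles>
<title id="1c5f16f0-d5f3-4107-bcb6-007681e76be1" type="DVD" title="Rosemary Conley - Shape Up And Salsacise" edition="" year="2005" />
<title id="bd0798c5-9e9f-4fdf-86d4-d2bcaf936cf6" type="Blu-ray" title="Terminator" edition="" year="1984" />
</titles>"""

# Same pattern as the post: lazy groups stop at the first closing quote.
SEARCH_RE = r'id="(.*?)" type=.*?title="(.*?)" edition'

# Assumption: repeat="yes" behaves roughly like findall over the whole input.
for title_id, title in re.findall(SEARCH_RE, RESPONSE):
    print(title_id, title)
```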

Against XML:

...
<title id="bd0798c5-9e9f-4fdf-86d4-d2bcaf936cf6" type="Blu-ray" country="United States" barcode="027616151285" title="Terminator" edition="" year="1984" thumbnail="http://fs.luckydata.com/Covers/19e4c5f1-cf41-41a1-a4a1-a7c77ea34737.jpg" thumbnailwidth="100" thumbnailheight="115" completepercentage="100" />
...

Results in:

System.Xml.XmlException: '=' is an unexpected token. The expected token is ';'. Line 1, position 111.
at System.Xml.XmlTextReaderImpl.Throw(Exception e)
at System.Xml.XmlTextReaderImpl.HandleEntityReference(Boolean isInAttributeValue, EntityExpandType expandType, Int32& charRefEndPos)
at System.Xml.XmlTextReaderImpl.ParseText(Int32& startPos, Int32& endPos, Int32& outOrChars)
at System.Xml.XmlTextReaderImpl.FinishPartialValue()
at System.Xml.XmlTextReaderImpl.get_Value()
at System.Xml.Linq.XContainer.ReadContentFrom(XmlReader r)
at System.Xml.Linq.XContainer.ReadContentFrom(XmlReader r, LoadOptions o)
at System.Xml.Linq.XDocument.Load(XmlReader reader, LoadOptions options)
at System.Xml.Linq.XDocument.Parse(String text, LoadOptions options)
at ScraperXML.ScraperParser.GetSearchResults(String strUrl)


Scraper XML:

...
<GetSearchResults dest="8">
<RegExp input="$$3" output="&lt;results&gt;\1&lt;/results&gt;" dest="8">
<RegExp input="$$1" output="&lt;entity&gt;&lt;title&gt;\1&lt;/title&gt;&lt;id&gt;\2&lt;/id&gt;&lt;url&gt;https://somewhere.com/Default.aspx?command=LoadTitle&amp;titleid=\2&amp;username=$INFO[username]&amp;password=$INFO[password]&amp;locale=1033&lt;/url&gt;&lt;/entity&gt;" dest="3">
<expression repeat="yes">id=&quot;(.*?)&quot; type=.*?title=&quot;(.*?)&quot; edition</expression>
</RegExp>
<expression noclean="1" />
</RegExp>
</GetSearchResults>
...
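Judging by the `HandleEntityReference` frame in the stack trace, a likely culprit is the URL in the `output` attribute rather than the expression. `&amp;` is decoded once when the scraper XML is read. The generated `<results>` string then contains a bare `&titleid=`, which matches "'=' is an unexpected token. The expected token is ';'". Escaping the ampersand twice (`&amp;amp;`) leaves a valid `&amp;` in the results.

Below is a minimal Python sketch of that two-step decoding. `expand` is a hypothetical stand-in for the scraper, not its real code, and the output template is trimmed from the post.

```python
import xml.etree.ElementTree as ET

def expand(output_attr, title, title_id):
    """Hypothetical scraper stand-in: decode the output attribute once,
    substitute \\1 and \\2, and wrap the result in <results>."""
    # Reading the scraper XML turns &lt; &gt; &amp; in the attribute into < > &.
    template = ET.fromstring(f'<RegExp output="{output_attr}" />').get("output")
    entity = template.replace("\\1", title).replace("\\2", title_id)
    return f"<results>{entity}</results>"

# Output attribute as in the post: a single "&amp;" between URL parameters.
SINGLE = ("&lt;entity&gt;&lt;title&gt;\\1&lt;/title&gt;&lt;id&gt;\\2&lt;/id&gt;"
          "&lt;url&gt;https://somewhere.com/Default.aspx?command=LoadTitle"
          "&amp;titleid=\\2&lt;/url&gt;&lt;/entity&gt;")
# Same attribute with the ampersand escaped twice ("&amp;amp;").
DOUBLE = SINGLE.replace("&amp;titleid", "&amp;amp;titleid")

for name, attr in (("single", SINGLE), ("double", DOUBLE)):
    results = expand(attr, "Terminator", "bd0798c5-9e9f-4fdf-86d4-d2bcaf936cf6")
    try:
        url = ET.fromstring(results).find("entity/url").text
        print(name, "OK:", url)
    except ET.ParseError as err:
        print(name, "fails:", err)
```

The same applies to the `&username=`, `&password=` and `&locale=` parameters in the real URL.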

It's beyond me why it doesn't work in the tester. XBMC no longer crashes, but it doesn't show any results either (I did a new SVN build, which may also have helped, since it now saves settings too).

Can anyone spot the problem? Tnx!
#8
I'm pretty sure the regex works now and the problem is elsewhere, either in the XML result or in my scraper. Sadly, I can't debug the application, as it's written in VB.

The scraper XML is valid and GetSearchResults looks correct to me :(
#9
Found the problem... I'm an XML noob :)

The scraper now works fine in ScraperXMLTest, but I don't get any results within XBMC. XBMC supports https, right?
