Need help with scraper, removing space

stacked
Skilled Python Coder
Posts: 787
Joined: Jun 2007
Reputation: 16
Post: #1
Code:
            <RegExp input="$$1" output="&lt;url function=&quot;GetTrailer1&quot;&gt;http://www.site.com/?q=\1&lt;/url&gt;" dest="5+">
                <expression>&lt;title&gt;([^&lt;]*)&lt;/title&gt;</expression>
            </RegExp>

This builds the URL from the matched title, e.g. "http://www.site.com/?q=The Movie". But curl returns an error for this URL, because a literal space isn't allowed in a URL. If the URL were "http://www.site.com/?q=The%20Movie", everything would work. How can I replace every space in the matched text with %20?

Btw, would trim help here?
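Outside the scraper, what is being asked for is ordinary URL percent-encoding of the query value. For comparison, a minimal Python sketch; `build_search_url` is a hypothetical helper and the base URL is just the placeholder from the post:

```python
from urllib.parse import quote

# Placeholder base URL from the post; only the title (query value) gets encoded.
BASE_URL = "http://www.site.com/?q="

def build_search_url(title: str) -> str:
    # quote() percent-encodes unsafe characters, so a space becomes %20.
    # safe="" makes it encode "/" as well, which suits a query value.
    return BASE_URL + quote(title, safe="")

print(build_search_url("The Movie"))  # http://www.site.com/?q=The%20Movie
```

`quote_plus()` would produce `The+Movie` instead, which many servers also accept in a query string, but `%20` is what the post asks for.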
spiff
Grumpy Bastard Developer
Posts: 12,187
Joined: Nov 2003
Reputation: 82
Post: #2
Code:
<RegExp input="$$1" output="\1%20\2" dest="5">
  <expression repeat="yes">(.*?) (.*)</expression>
</RegExp>
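A rough Python model of this first expression shows why it only handles two-word titles. It assumes that repeat="yes" applies the expression to each successive non-overlapping match of the input and concatenates the output template for every match; `apply_repeat` is a hypothetical helper, and Python's `re` stands in for XBMC's own regex engine, which may differ in edge cases. Because `(.*)` greedily takes the rest of the title, the first match consumes the whole string and only one space gets replaced:

```python
import re

def apply_repeat(pattern: str, output: str, text: str) -> str:
    """Approximate a scraper <RegExp> with repeat="yes".

    Assumption: the engine finds successive non-overlapping matches
    and concatenates the expanded output template for each one.
    """
    # Match.expand() fills in \1, \2 back-references like the output attribute.
    return "".join(m.expand(output) for m in re.finditer(pattern, text))

# First version: (.*) swallows everything after the first space.
print(apply_repeat(r"(.*?) (.*)", r"\1%20\2", "The Movie"))      # The%20Movie
print(apply_repeat(r"(.*?) (.*)", r"\1%20\2", "The Big Movie"))  # The%20Big Movie
```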

Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.
stacked
Post: #3
I tried that, but it only works on titles with two words. If there are more than two words, only the first space is replaced by %20; the rest are left as they are.

Btw, is it possible to scan a movie with the IMDb scraper, then scan it again with another scraper just to add a trailer, without replacing any other details?
(This post was last modified: 2009-07-08 20:16 by stacked.)
spiff
Post: #4
No, but you can add the trailer lookup to the IMDb scraper.

You need to massage the expression a bit to handle more than one space; it was just meant to give you the idea.
Code:
<RegExp input="$$1" output="\1%20\2" dest="5">
  <expression repeat="yes">(.*?) ([^ ]*)</expression>
</RegExp>
or thereabouts should do it.
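Modelled in Python under the same assumption as before (repeat="yes" concatenates the output for every non-overlapping match; `apply_repeat` is a hypothetical helper and Python's `re` stands in for XBMC's engine), the revised expression works because `([^ ]*)` stops at the next space, leaving the remaining words for later matches:

```python
import re

def apply_repeat(pattern: str, output: str, text: str) -> str:
    # Assumed repeat="yes" semantics: concatenate the expanded output
    # for every non-overlapping match, in order.
    return "".join(m.expand(output) for m in re.finditer(pattern, text))

REVISED = r"(.*?) ([^ ]*)"  # second group stops at the next space

print(apply_repeat(REVISED, r"\1%20\2", "The Big Movie"))  # The%20Big%20Movie
print(repr(apply_repeat(REVISED, r"\1%20\2", "Alien")))    # '' (no space, no match)
```

With this version every space becomes %20 (two consecutive spaces become %20%20). Under the same assumed semantics, a title with no space at all produces no match and therefore empty output, so single-word titles may need separate handling.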

stacked
Post: #5
Thanks. I changed it up a little and got it working.
AngryFarmer
Member
Posts: 55
Joined: Oct 2009
Reputation: 0
Post: #6
Hey, thanks! It helped me a lot!
UsagiYojimbo
Member
Posts: 83
Joined: Feb 2010
Reputation: 1
Location: Debrecen, Hungary
Post: #7
Isn't there an encode attribute for the RegExp tag that should do the trick?
(This post was last modified: 2010-06-21 07:12 by UsagiYojimbo.)
spiff
Post: #8
These days, yes, but not back when this topic was active.

UsagiYojimbo
Post: #9
spiff Wrote: these days yes. but not back when this topic was alive.
BTW, neither the \s construct nor the trim switch matches TAB characters...
spiff
Post: #10
I added \t to trim a few weeks back.
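For illustration, a Python sketch of what a trim that also strips tabs amounts to. The exact character set XBMC's trim removes isn't stated in the thread, so `TRIM_CHARS` and the `trim` helper below are assumptions; the last line shows how trimming only spaces leaves tabs in place, which is the problem raised in the previous post:

```python
# Assumed character set for a trim that includes \t (not taken from XBMC's source).
TRIM_CHARS = " \t\r\n"

def trim(value: str) -> str:
    # str.strip() with an explicit set removes those characters from both ends.
    return value.strip(TRIM_CHARS)

print(repr(trim("\tThe Movie \t")))          # 'The Movie'
print(repr("\tThe Movie \t".strip(" ")))     # '\tThe Movie \t' (tabs survive)
```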
