Hello.
I'm trying to improve the IMDB movie scraper. IMDB sometimes puts the MPAA rating at the top of the page, sometimes in the middle, sometimes in both places, and sometimes in neither.
The current scraper only collects the rating from the middle of the HTML.
So I modded metadata.common.imdb.com/imdb.xml
Code:
<ParseIMDBUSACert dest="5">
    <RegExp input="$$2" output="&lt;details&gt;\1&lt;/details&gt;" dest="5">
        <RegExp input="$$1" output="&lt;mpaa&gt;$INFO[certprefix]\1&lt;/mpaa&gt;" dest="2">
            <expression>class="absmiddle" title="([^"]*)"</expression>
        </RegExp>
        <RegExp input="$$1" output="&lt;mpaa&gt;$INFO[certprefix]\1&lt;/mpaa&gt;" dest="2">
            <expression>MPAA&lt;/a&gt;\)&lt;/h4&gt;\n?&lt;span itemprop="contentRating"&gt;Rated\s([^&lt;]*)</expression>
        </RegExp>
        <expression noclean="1" />
    </RegExp>
</ParseIMDBUSACert>
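To show what each of the two expressions picks up, here is a minimal Python sketch. It assumes Python's `re` behaves like XBMC's regex engine for these patterns, uses the patterns with any XML entities decoded, and runs them against hypothetical HTML fragments shaped to match (not copied from IMDB). The helper `capture` is hypothetical.

```python
import re

# The two <expression> patterns from the scraper, XML entities decoded.
TOP_CERT = r'class="absmiddle" title="([^"]*)"'
MIDDLE_CERT = r'MPAA</a>\)</h4>\n?<span itemprop="contentRating">Rated\s([^<]*)'

# Hypothetical HTML fragments shaped to match each pattern (not real IMDB markup).
top_html = '<img class="absmiddle" title="PG_13" alt="PG_13">'
middle_html = 'MPAA</a>)</h4>\n<span itemprop="contentRating">Rated PG-13</span>'


def capture(pattern, html):
    """Return group 1 of the first match, or None when the pattern finds nothing."""
    m = re.search(pattern, html)
    return m.group(1) if m else None


print(capture(TOP_CERT, top_html))        # PG_13
print(capture(MIDDLE_CERT, middle_html))  # PG-13
print(capture(MIDDLE_CERT, top_html))     # None
```

The top expression returns the raw title attribute, underscore included, while the middle one returns the text after "Rated".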
This works OK, except for ratings like PG-13, where the top HTML always uses an underscore instead of a dash ("PG_13"). To keep things consistent, I would like to change the underscore to a dash.
Example movie: Bird on a Wire (1990)
http://www.imdb.com/title/tt0099141/
A regex search-and-replace such as s/_/-/ would be easy enough.
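For reference, a minimal Python sketch of that substitution (Python, not XBMC syntax; `dash_rating` is a hypothetical name). Like sed's s/_/-/ without the g flag, it replaces only the first underscore. Ratings that contain no underscore pass through unchanged, which is exactly the behavior I'm after:

```python
import re


def dash_rating(title):
    """Mimic sed's s/_/-/: replace the first underscore with a dash, if any."""
    return re.sub("_", "-", title, count=1)


for t in ["PG_13", "NC_17", "PG", "R"]:
    print(t, "->", dash_rating(t))
```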
Is it possible to do this in XBMC?
If not, I can split the input using the regex below (changed for readability), but I am not sure how to reassemble it without a conditional.
Code:
class="absmiddle" title="([^_"]*)(_*)([^"]*)"
PG_13 matches as (PG)(_)(13), so I could use \1-\3 to output PG-13, but PG matches as (PG)()() and would end up as PG-.
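The reassembly problem reproduced in Python as a stand-in for the scraper's regex engine. `Match.expand` plays the role of the output attribute, and `reassemble` is a hypothetical helper:

```python
import re

# The split pattern: text before the underscore, the underscore(s), the rest.
SPLIT = r'class="absmiddle" title="([^_"]*)(_*)([^"]*)"'


def reassemble(html):
    m = re.search(SPLIT, html)
    # Equivalent of output="\1-\3": the dash is emitted whether or not
    # group 2 captured an underscore.
    return m.expand(r"\1-\3")


print(reassemble('class="absmiddle" title="PG_13"'))  # PG-13
print(reassemble('class="absmiddle" title="PG"'))     # PG-
```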
How can the <RegExp> <expression> be written to change the underscore to a dash in PG_13, NC_17, etc., but leave the others (G, PG, R, et al.) alone?
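One conditional-free direction, sketched in Python rather than scraper XML: split the job across two patterns that can never both match, so each can use a fixed output template (\1-\2 when the title has an underscore, \1 when it doesn't). In the scraper this would presumably mean two sibling <RegExp> elements writing to the same buffer. I'm assuming, not certain, that a <RegExp> whose expression fails to match leaves that buffer alone. `RULES` and `normalize_cert` are hypothetical names.

```python
import re

# Two mutually exclusive (pattern, output template) pairs.
# [^_"]* cannot consume an underscore, so the first pattern requires one
# and the second fails whenever one is present.
RULES = [
    (r'class="absmiddle" title="([^_"]*)_([^"]*)"', r"\1-\2"),  # PG_13 -> PG-13
    (r'class="absmiddle" title="([^_"]*)"', r"\1"),             # PG -> PG
]


def normalize_cert(html):
    """Apply every rule; for a single title attribute exactly one rule matches."""
    outputs = []
    for pattern, template in RULES:
        m = re.search(pattern, html)
        if m:
            outputs.append(m.expand(template))
    return outputs


for title in ["PG_13", "NC_17", "PG", "R"]:
    print(title, normalize_cert(f'class="absmiddle" title="{title}"'))
```

Each call returns a one-element list, which is the point: no branch has to pick a template, because only one pattern can ever fire for a given title.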
Any help would be appreciated.
At2010