How to use "if then else": URL1 = empty > fetch infos from URL2 for


How to use "if then else": URL1 = empty > fetch infos from URL2 for - Eisbahn - 2010-06-04

Hello,

IMDb is a great database, but sadly most older movies have no German translation there. So in some cases I would like to fetch the plot and plot summary tags from another URL. How can this be done in the scraper? Could you please give me a hint?

Regards,

Eisbahn

- spiff - 2010-06-07

for this you use a chain. you create a function to parse the other page;

Code:
<ParseOtherPage dest="3">
  <RegExp input="$$5" output="&lt;details&gt;\1&lt;/details&gt;" dest="3">
   ... stuff things into $$5
  </RegExp>
</ParseOtherPage>

then, in GetDetails (or whichever function you want to chain from):

Code:
<RegExp input="$$1" output="&lt;url function=&quot;ParseOtherPage&quot;&gt;someurlor\1orwhatever&lt;/url&gt;" dest="5+">
  ...
</RegExp>

- Eisbahn - 2010-06-07

Hi spiff,

I'm not sure we are talking about the same thing. What I want:
- always scrape the German IMDb and look for the plot
- if the plot is missing there (the German IMDb shows a text like "no plot available, please translate and insert it in our HP") and the user wants to use another URL, scrape that URL for the plot, but only when both of these conditions hold

=> OK, asking the user whether to scrape another URL is no problem; it can be done with the "conditional" flag
=> scraping another URL with a function is no problem either (I have already done this for other info)

My real question is: how do I make the decision "the info scraped from IMDb is not good, use the alternative (if the user wants to)"?

Eisbahn

- spiff - 2010-06-08

just grab the plot to a buffer and check if it's bad, if it is, chain.

- Eisbahn - 2010-06-08

Yes, but how can this be done?
The scraper code is, for example:

Code:
<?xml version="1.0" encoding="utf-8"?>
<scraper framework="11" date="2010-06-07" name="DE_IMDb" content="movies" thumb="imdb.png" language="de">
    <include>common/fetch_other_url.xml</include>

    <GetSettings dest="3">
        <RegExp input="$$5" output="&lt;settings&gt;\1&lt;/settings&gt;" dest="3">
            <RegExp input="$$1" output="&lt;setting label=&quot;if plot is empty, fetch other URL&quot; type=&quot;bool&quot; id=&quot;fetchurl&quot; default=&quot;true&quot;&gt;&lt;/setting&gt;" dest="5">
                <expression/>
            </RegExp>
        </RegExp>
    </GetSettings>

    <NfoUrl dest="3">
        [...] some RegEx [...]
    </NfoUrl>

    <CreateSearchUrl SearchStringEncoding="iso-8859-1" dest="3">
        [...] some RegEx [...]
    </CreateSearchUrl>

    <GetSearchResults dest="8">
        [...] some RegEx [...]
    </GetSearchResults>

    <GetDetails dest="3">
        <RegExp input="$$5" output="&lt;details&gt;\1&lt;/details&gt;" dest="3">
            <RegExp input="$$2" output="&lt;plot&gt;\1&lt;/plot&gt;" dest="5">
                <expression/>
            </RegExp>
            <RegExp conditional="fetchurl" input="$$2" output="&lt;url function=&quot;FetchPlotFromOtherURL&quot;&gt;$$3&lt;/url&gt;" dest="5+">
                <expression/>
            </RegExp>
            [...] some RegEx [...]
        </RegExp>
    </GetDetails>
</scraper>

But this fetches the other URL whenever the user sets "fetchurl" to true. How can I add the check "the info from the first URL is not good", so that the conditional RegExp with the url function only runs in that case?

Eisbahn

- spiff - 2010-06-08

Code:
<RegExp input="$$1" output="\1" dest="6">
  <expression clear="yes">somethingthatgrabstheplot</expression>
</RegExp>
<RegExp input="$$6" output="&lt;plot&gt;\1&lt;/plot&gt;" dest="5+">
  <expression>(.+)</expression>
</RegExp>
<RegExp input="$$6" output="&lt;url function=&quot;theotherone&quot;&gt;theotherurl&lt;/url&gt;" dest="5+">
  <expression>^$</expression>
</RegExp>

1) grab plot to a buffer
2) if buffer is nonempty, use as plot
3) if buffer is empty, do the chain.

elementary, dr watson.
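
A sketch of how these three steps might combine with the "fetchurl" setting from the scraper posted earlier, so that the chain only runs when the plot is missing and the user has switched the option on. It is untested and rests on a few assumptions: buffer 6 is free to use as scratch space, theotherone and theotherurl are the placeholders from the post above, the expression in step 1b is hypothetical (the real German wording of the IMDb notice has to be filled in), and step 1b assumes that a match with an empty output overwrites the buffer with nothing.

```xml
<!-- 1) grab the plot into scratch buffer 6; clear="yes" leaves it empty when nothing matches -->
<RegExp input="$$1" output="\1" dest="6">
  <expression clear="yes">somethingthatgrabstheplot</expression>
</RegExp>
<!-- 1b) hypothetical: if buffer 6 only holds the "no plot available" notice, blank it -->
<RegExp input="$$6" output="" dest="6">
  <expression>textoftheplaceholdernotice</expression>
</RegExp>
<!-- 2) buffer 6 is non-empty: use it as the plot -->
<RegExp input="$$6" output="&lt;plot&gt;\1&lt;/plot&gt;" dest="5+">
  <expression>(.+)</expression>
</RegExp>
<!-- 3) buffer 6 is empty and the fetchurl setting is on: chain to the other page -->
<RegExp conditional="fetchurl" input="$$6" output="&lt;url function=&quot;theotherone&quot;&gt;theotherurl&lt;/url&gt;" dest="5+">
  <expression>^$</expression>
</RegExp>
```

Steps 2 and 3 test the same buffer with complementary expressions, so only one of them should write to buffer 5. With "fetchurl" switched off, step 3 is skipped and the details come back without a plot.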

- mkortstiege - 2010-06-08

@Eisbahn, some of the german scrapers are already using something like this to determine if there's an imdb id or if we have to use google.

- Eisbahn - 2010-06-12

Hi Spiff,

not so easy for me, ouch...

In the GetDetails function of the scraper:

Code:
        <RegExp input="$$2" output="&lt;url function=&quot;GetIMDBPlot&quot;&gt;$$3plotsummary&lt;/url&gt;" dest="5+">
            <expression/>
        </RegExp>

this is running fine and calls my IMDB func (in common directory):

Code:
<?xml version="1.0" encoding="utf-8"?>
<scraper framework="1,1" date="2010-06-12" name="IMDB Functions" content="movies" language="de">
    <include>ofdb_de.xml</include>

    <GetIMDBPlot dest="5">
        <RegExp input="$$3" output="&lt;details&gt;\1&lt;/details&gt;" dest="5">
            <RegExp input="$$1" output="\1" dest="2">
                <expression clear="yes">&lt;div id=&quot;swiki.2.1&quot;&gt;\n\n([^\n]+)</expression>
            </RegExp>
            <RegExp input="$$2" output="&lt;plot&gt;\1&lt;/plot&gt;" dest="3">
                <expression>(.+)</expression>
            </RegExp>
            <RegExp conditional="getofdbplot" input="$$1" output="\1" dest="4">
                <expression>&lt;link rel=&quot;canonical&quot; href=&quot;http://www.imdb.de/title/([t0-9]*)</expression>
            </RegExp>
            <RegExp conditional="getofdbplot" input="$$2" output="&lt;url function=&quot;GetOFDBURL&quot;&gt;http://www.imdb.de/title/$$4/&lt;/url&gt;" dest="3">
                <expression>^$</expression>
            </RegExp>
            <expression noclean="1"/>
        </RegExp>
    </GetIMDBPlot>
</scraper>

Ok, if we do not find any plot, have a look at the OFDB site:

Code:
<?xml version="1.0" encoding="utf-8"?>
<scraper framework="11" date="2010-06-12" name="OFDB Functions" content="movies" language="de">
    <GetOFDBURL dest="5">
        <!--<url function="GetOFDBLink">http://www.ofdb.de/view.php?SText=\1&Kat=IMDb&page=suchergebnis</url>-->
        <RegExp input="$$1" output="&lt;plot&gt;OFDB Function&lt;/plot&gt;" dest="5">
            <expression>&lt;link rel=&quot;canonical&quot; href=&quot;http://www.imdb.de/title/([t0-9]*)</expression>
        </RegExp>
    </GetOFDBURL>

    <GetOFDBLink dest="5">
        <RegExp input="$$1" output="&lt;url function=&quot;GetOFDBOutTagline&quot;&gt;http://www.ofdb.de/\1&lt;/url&gt;" dest="5">
            <expression>&lt;br&gt;1. &lt;a href=&quot;.*?([^&quot;]+)</expression>
        </RegExp>
    </GetOFDBLink>

    <GetOFDBOutTagline dest="5">
        <RegExp input="$$1" output="&lt;details&gt;&lt;outline&gt;\1&lt;/outline&gt;&lt;tagline&gt;\1&lt;/tagline&gt;&lt;plot&gt;\1&lt;/plot&gt;&lt;/details&gt;" dest="5">
            <expression>&lt;b&gt;Inhalt:&lt;/b&gt;([^&lt;]+)</expression>
        </RegExp>
        <RegExp input="$$1" output="&lt;url function=&quot;GetOFDBPlot&quot;&gt;http://www.ofdb.de/plot/\1&lt;/url&gt;" dest="5+">
            <expression>&lt;a href=&quot;plot/([^&quot;]+)</expression>
        </RegExp>
    </GetOFDBOutTagline>

    <GetOFDBPlot dest="5">
        <RegExp input="$$3" output="&lt;details&gt;\1&lt;/details&gt;" dest="5+">
            <RegExp input="$$1" output="\1" dest="2">
                <expression noclean="1">Eine Inhaltsangabe von(.*)Zur &amp;Uuml;bersichtsseite des Films</expression>
            </RegExp>
            <RegExp input="$$2" output="&lt;plot&gt;\1&lt;/plot&gt;" dest="3">
                <expression noclean="1">&lt;br&gt;([^&lt;]+)(?:&lt;/font&gt;)</expression>
            </RegExp>
            <expression noclean="1"/>
        </RegExp>
    </GetOFDBPlot>
</scraper>

As far as I can see, the OFDB feature has a problem or is never used. It's not a typo in the conditional flags: if I delete them, the problem still exists.
On paper, and in my head, it works: one URL after the other is fetched and checked by the scraper.
What went wrong?

Regards,

Eisbahn

- Eisbahn - 2010-06-18

The error message in the log is always the same (with different IMDb tt IDs):

Code:
CIMDB::InternalGetDetails: Unable to parse web site [http://www.imdb.de/title/tt0499549/]

What I have tried: wrapping all return values in <detail> tags => the problem still exists