Search Url not shown
#1
I wanted to write a scraper for a site and read a bit about how it works. Then I wanted to get started and realised that the site hides the search URL and just shows http://www.url.com/SEARCH for every search.

Is it still possible to write a scraper for the site or does this already inhibit the possibility of implementing a scraper?
Reply
#2
as long as you can perform the search and each entry has a separate url, the actual search url is irrelevant. look at the html and the form..
Reply
#3
So I got GetSearchResults working as well; now I just can't figure out how to trigger the search without hitting the button on the page.

Wireshark shows me:
Code:
Hypertext Transfer Protocol
POST /film/list/dvd/SRCH.htm HTTP/1.1\r\n
Request Method: POST
Request URI: /film/list/dvd/SRCH.htm
Request Version: HTTP/1.1

and

Code:
Line-based text data: application/x-www-form-urlencoded
SRCHTYP=TITLE&SRCHSTR=my+Search

Is it possible to create a legit search with this information, or what else could I do? The official guides are quite short on this part and use trivial examples.

I would be really thankful if somebody had an idea so I could finally finish my first scraper :)
Reply
#4
yes, its just a post field. see e.g. the allmusic scraper
Reply
#5
Great, I found it. Thanks a lot!

Just had to append SRCHTYP=TITLE&SRCHSTR=my+Search, with a question mark in front, to the search URL.
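For anyone finding this later, here's roughly what that ends up looking like in the scraper XML. This is only a sketch: the host is a placeholder, the path and form fields are the ones from the Wireshark capture above, and I'm assuming the framework's post="yes" attribute, which sends everything after the question mark as the POST body:

```xml
<!-- sketch of a CreateSearchUrl function; www.url.com is a placeholder,
     path and form fields come from the Wireshark dump. post="yes" makes
     xbmc send the query part as application/x-www-form-urlencoded data. -->
<CreateSearchUrl dest="3">
    <RegExp input="$$1" output="&lt;url post=&quot;yes&quot;&gt;http://www.url.com/film/list/dvd/SRCH.htm?SRCHTYP=TITLE&amp;amp;SRCHSTR=\1&lt;/url&gt;" dest="3">
        <expression noclean="1"/>
    </RegExp>
</CreateSearchUrl>
```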
Reply
#6
Thanks a lot spiff for your help and thanks to Nicezia for his really nice Editor.

When I started yesterday I didn't really expect to get it up and running today.

It's really nice for beginners to work with.
Reply
#7
nice. lovely to see the editor being useful for folks :)
Reply
#8
There are still several fields in the editor where I couldn't figure out what they are for (Trim, Inverse from "Expression" and "PostData", "Cache as" from the Tester), but I just copied the settings from existing scrapers and it worked without any problems.

Really nice piece of work.
Reply
#9
trim removes whitespace on the boundaries of strings. no idea about that inverse. and cache is useful if you want to run several functions on one page. the parameter is a local file which we cache the url to, hence avoiding refetching it when you run the second function
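a sketch of the cache idea (the URL and file name here are made up):

```xml
<!-- a url emitted with a cache name is saved to a local file; a later
     function requesting the same url with the same cache name reuses
     the local copy instead of refetching the page -->
<url cache="film1234.html">http://www.url.com/film/1234.htm</url>
```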
Reply
#10
Ah, OK, thanks, maybe I'll use it for my next scraper :)

I have one last issue the page has small thumbnails and sometimes big ones.

Is it possible to select the big one but have a fallback to the smaller one if it doesn't exist?
Reply
#11
add as many <thumb> tags as you see fit. whatever pops up first gets the priority.
Reply
#12
OK, then I have to manually select the smaller one, because I have the bigger one first and want to leave it this way so I don't get the smaller image if a larger one is available.

Thanks so far, don't want to bother you anymore ;)
Reply
#13
why add the large one if it's not available?
Reply
#14
If there is no large image, I see it right away and can select the smaller one.

But if I set the priority to the smaller ones and a larger one is available, I might not realise it and stick with the one xbmc downloaded for me.
Reply
#15
i mean, if there isn't a large image for the movie available on the site, don't add it. simple as that. if you then process them as

1) large image expression
2) small image expression

large image always gets priority, yet the small image is used if there is no large available.
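as a sketch (the img classes and buffer numbers are placeholders, not from the actual site):

```xml
<!-- both expressions append into the same buffer; the large-image
     expression runs first, so its <thumb> comes first and wins.
     dest="5+" appends instead of overwriting. -->
<RegExp input="$$1" output="&lt;thumb&gt;\1&lt;/thumb&gt;" dest="5">
    <expression>&lt;img class="large" src="([^"]+)"</expression>
</RegExp>
<RegExp input="$$1" output="&lt;thumb&gt;\1&lt;/thumb&gt;" dest="5+">
    <expression>&lt;img class="small" src="([^"]+)"</expression>
</RegExp>
```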
Reply
