2012-11-18, 20:34
Hi.
I wrote the Filmdelta scraper some years ago. To be perfectly honest, I don't remember much about how it works, or about scraper development at all. It hasn't been fetching any images for some time now, and I thought it would be a good idea to fix it before the Frodo release. I hope there is a scraper guru in here who can explain to me what's going wrong...
What happens is that it first tries to fetch the image from themoviedb using the function GetTMDBThumbsById, sending it the Google search for the original title on IMDb. Has that function changed in any way, or is it still supposed to work the same way it did? The regexp for getting the original title from filmdelta still works.
The second thing it does is call GetFilmdeltaThumb, which is my fallback function in case GetTMDBThumbsById doesn't return anything. It tries to fetch the low-resolution image from filmdelta. That doesn't work either, and I suspect it's because filmdelta changed the inline style attribute on the image. The scraper expects style='width:px', but the page now says style='max-width: 240px; max-height: 240px;'. I don't know if that's true for all films, though. Would the best practice here be to use a wildcard instead? I'm thinking of something like style='[^']*'
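To show what I mean by the wildcard, here is a rough sketch in Python rather than in the scraper's own XML. The img markup, the image paths, the 120px width and the find_thumb helper are all made up for illustration; they are not taken from filmdelta or from the scraper itself.

```python
import re

# Hypothetical markup, loosely modelled on the two style values mentioned above.
OLD_HTML = "<img src='/images/old.jpg' style='width:120px' />"
NEW_HTML = "<img src='/images/new.jpg' style='max-width: 240px; max-height: 240px;' />"

# [^']* matches any run of characters that are not a single quote,
# so the pattern no longer depends on the exact CSS inside the attribute.
THUMB_RE = re.compile(r"<img src='([^']*)' style='[^']*'")


def find_thumb(html):
    """Return the image path from the first matching img tag, or None."""
    match = THUMB_RE.search(html)
    return match.group(1) if match else None


print(find_thumb(OLD_HTML))
print(find_thumb(NEW_HTML))
```

With a pattern like this, both the old and the new style value should match, at the cost of also matching any other img tag on the page that has a style attribute, so the rest of the expression would still need to be specific enough to pick the right image.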
/Daniel