[RELEASE] EPG Scraper for Switzerland, Germany, and Austria
#16
A new release, 0.9J, of the epg script is available:

http://code.google.com/p/epg-swiss/downl...z&can=2&q=

Note:
The ID has changed inside the script.
The script now produces 100% conformant XML.
#17
With the latest svn trunk, the script runs on:

- Ubuntu 10.04 LTS
- Debian 6.0
- almost all other Linux distributions
- Mac OS X Snow Leopard

Mac OS support was added by Thomas Oeding (thanks).

Regards
Hans
#18
This grabber is great.

I found a small bug concerning the program timezone (the EPG was off by 1 hour in tvtime and gtvg).

You set the timezone to +0100 in line 392, which is wrong because the times are currently in Central European Summer Time, which is UTC+0200:

Code:
filler="00 +0100" # CET

The timezone actually depends on summer/winter time for CET, as can be read in the output of the tv_grab_eu_epgdata grabber:

http://www.linuxforen.de/forums/showthread.php?t=254468 :

Quote: Enter the time offset from UTC here. Think of it as your time zone. For example: during winter in Germany, you should enter "+0100". During summer, use "+0200". (without quotation marks)

Maybe the timezone could be set to winter/summer time accordingly. Or, maybe even better, all program times could be converted to UTC (time->utc +0000) so that no summer/winter time conversion would be needed at all.
(I am currently working on the implementation.)
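
For illustration, here is a minimal sketch of the UTC idea, assuming GNU date. The helper name to_utc and its optional zone argument are invented for this example and are not part of epg-swiss.

```shell
# Hypothetical helper: convert a local broadcast time to a UTC XMLTV timestamp.
# $1 = local time, e.g. "2010-09-01 01:05:00"
# $2 = zone the time is given in (optional, defaults to Europe/Zurich)
to_utc() (
    zone=${2:-Europe/Zurich}

    # Interpret the local time in the given zone and get seconds since epoch.
    epoch=$(TZ="$zone" date -d "$1" +%s) || exit 1

    # Print the same instant in UTC with a fixed +0000 offset.
    date -u -d "@$epoch" '+%Y%m%d%H%M%S +0000'
)
```

With tzdata installed, `to_utc "2010-09-01 01:05:00"` should print `20100831230500 +0000`, since Zurich is at +0200 on that date. Note that the calendar date can change during the conversion.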

Another suggestion: it would be nice to have a way to manually set the channel ID and name for all channels, e.g. by using a channelname[:channelid] structure in setup.cfg (:channelid being optional).

The channel IDs in particular are very unusual right now (e.g. the channel ID for ZDF should be zdf.de, not 2.tv.search.ch). Right now I grep through the database to replace channel IDs.
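
A rough sketch of how such a channelname[:channelid] entry could be split, using only shell parameter expansion. The function name parse_channel and the tab-separated output are assumptions for this example; how setup.cfg is actually read by the script is not shown here.

```shell
# Hypothetical helper: split one "channelname[:channelid]" entry.
# Prints "name<TAB>id"; the id field is empty when no :channelid is given,
# so the caller can fall back to the generated ID.
parse_channel() (
    entry=$1
    name=${entry%%:*}            # everything before the first colon
    case $entry in
        *:*) id=${entry#*:} ;;   # everything after the first colon
        *)   id= ;;              # no override given
    esac
    printf '%s\t%s\n' "$name" "$id"
)
```

For example, `parse_channel "ZDF:zdf.de"` prints the name ZDF and the ID zdf.de separated by a tab. Channel names that themselves contain a colon are not handled by this sketch.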
#19
I found an easy solution to the timezone problem :

Replace

Code:
filler="00 +0100" # CET

with

Code:
filler="00 `TZ=Europe/Zurich date +%z`" # Current epg server timezone

This sets the timezone to the one currently in effect at the server location, which is the timezone of the program times on the website (+0100 in winter, +0200 in summer), and it works no matter where in the world you run it.
#20
Thanks for the feedback. I worked for over 6 weeks on the initial code release.

As soon as I have time I will integrate your little patch into svn. Yes, I know the generated
IDs are not the same IDs that xmltv provides, but I work with channel numbers from 1-270 and not with names.
Do you perhaps have a patch to make the IDs more xmltv-compatible?
I guess you come from the German-speaking part of Europe?

Regards from Switzerland
Hans
#21
It is done.
To get this version you have to check out trunk via svn.
Regards
Hans
#22
The timezone fix works but has 2 small disadvantages.
If you start 1 day before the time change, all subsequent data in the XML
has the wrong timezone entry.
Regards Hans
#23
Yes, I thought about that as well. Handling an update that crosses the winter/summer time change requires more complex behavior (since the offset in effect at run time is wrong for times after the change). I will see if I can find a good solution.

Greetings from Germany.
#24
OK, I think this will work :

Code:
filler="00 `TZ=Europe/Zurich date +%z -d $year$month$day`" # Server timezone at program date

It outputs the timezone in Zürich on the program date. I think it will be problematic anyway, since most xmltv viewer software probably will not respect the timezone change (it will apply the current timezone to times given after the change, which is wrong). Most software that shows the data before the timezone change will display the times of programs after the change incorrectly. But that's a bug in the xmltv viewer software (and would happen with times given in UTC as well).
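
One detail worth noting: a bare date is read as 00:00 on that day, so on the day of the change itself the offset can differ between early and late programs. A small sketch, assuming GNU date, that asks for the offset at the full start time instead (offset_at is a made-up name for this example):

```shell
# Hypothetical helper: UTC offset in effect at a given local date and time.
# $1 = local time, e.g. "2010-10-31 20:00"
# $2 = zone (optional, defaults to Europe/Zurich)
offset_at() (
    zone=${2:-Europe/Zurich}
    TZ="$zone" date -d "$1" +%z
)
```

On 31 October 2010, when the clocks went back at 03:00, `offset_at "2010-10-31 01:00"` should give `+0200` and `offset_at "2010-10-31 20:00"` should give `+0100`, provided tzdata is installed.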

A really complex topic considering the fact that DST is useless nowadays...
#25
OK, I made a new release, 0.9K:
http://epg-swiss.googlecode.com/files/ep....9K.tar.gz
It is no longer beta; it is RC 1.
#26
There is a serious bug in the script. Programs at night (after 0:00) have the wrong date, e.g.:

Code:
<programme start="20100831010500 +0200" channel="zdf.de">
    <title lang="de">Spiel mit der Angst</title>
    <category lang="de">thriller</category>
</programme>

actually should be

Code:
... start="20100901010500 +0200" ...

The data you parse shows the programs for a given date from 6:00 to 5:59, but you set the date of all programs to that same date, which gives a wrong date for programs from 0:00 to 5:59. You need to check the program time and adjust the date for programs after midnight. I made that change as follows (starting at line 469):

Code:
y=$($GNU_CAT f1 | $GNU_WC -l )
         z=0
         # ... filler= line removed ...
         while [ $z -lt "$y" ]
         do
            z=$(( $z + 1 ))

            # Extract title

            title=$($GNU_CAT f2 | $GNU_SED $z,$z!d | $GNU_SED 's/&/&amp;/g')

            # Extract time

            time1=$($GNU_CAT f1 | $GNU_SED $z,$z!d | $GNU_SED "s/:/ /g" | $GNU_SED "s/ //g")

            # BEGIN change
            # Set program date - a program with a start time before 06:00 is part of the next day

            if [ "$time1" -lt "0600" ]; then
                programdate=`TZ=Europe/Zurich $GNU_DATE +%Y%m%d --date="$year$month$day +1 day"`
            else
                programdate=`TZ=Europe/Zurich $GNU_DATE +%Y%m%d --date=$year$month$day`
            fi

            # Set server timezone at program date

            filler="00 `TZ=Europe/Zurich $GNU_DATE +%z --date=$programdate`"
            # END change

            # Extract description

            desc=$($GNU_CAT f3 | $GNU_SED $z,$z!d)

            echo '<programme start="'$programdate$time1$filler'" channel="'$channel_index'">' >> ../xml/epg.xml
            echo '     <title lang="de">'$title'</title>' >> ../xml/epg.xml
            echo '     <category lang="de">'$desc'</category>' >> ../xml/epg.xml
            echo "</programme>" >> ../xml/epg.xml
         done

Program dates after midnight are correct with these changes. I can supply a patch against 0.9K if you prefer.

BTW, the website itself also shows the wrong date for shows between 0:00 and 5:59. You can verify this by comparing some shows, e.g.:

"Spiel mit der Angst"

http://tv.search.ch/programm/detail/inde...9010105002
http://www.prisma.de/film/2006_spiel_mit...sehen.html

Actually it's not unusual in TV guides for a day to range from 06:00 to 06:00, but it's totally wrong within an xmltv database, where a day ranges from 0:00 to 0:00.
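
To make the mapping from the guide's 06:00 to 05:59 day onto calendar days easier to test on its own, the date and offset logic can be pulled into one function. This is only a sketch assuming GNU date; xmltv_start and its arguments are invented for the example and do not exist in epg-swiss.

```shell
# Hypothetical helper: build an XMLTV start value from a listing day and HHMM.
# $1 = listing day (YYYYMMDD), $2 = start time (HHMM), $3 = zone (optional).
xmltv_start() (
    day=$1
    hhmm=$2
    zone=${3:-Europe/Zurich}

    # The site lists a "day" from 06:00 to 05:59, so earlier start times
    # belong to the next calendar day.
    if [ "$hhmm" -lt 600 ]; then
        day=$(date -u -d "$day +1 day" +%Y%m%d)
    fi

    # Rebuild an unambiguous "YYYY-MM-DD HH:MM" string for date -d.
    rest=${day#????}
    stamp="${day%????}-${rest%??}-${rest#??} ${hhmm%??}:${hhmm#??}"

    # Offset in effect in the given zone at that local time.
    offset=$(TZ="$zone" date -d "$stamp" +%z)
    printf '%s%s00 %s\n' "$day" "$hhmm" "$offset"
)
```

For the example above, `xmltv_start 20100831 0105` should print `20100901010500 +0200` when tzdata is installed, while a program at 20:15 on the same listing day keeps the date 20100831.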
#27
Another bug shows up if the script is run at night (0:00 - 5:59). epg-swiss fails to grab all the data for the current day, since the data between 0:00 and 5:59 is part of the previous day's listing. This leads to missing data if the script is run at night. Maybe grabbing should always start at the day before the current day to ensure that the data is complete. E.g. if epg-swiss is run on 31.08. at 04:00, grabbing should start at 30.08., since the program for 31.08. 04:00 is part of that day.
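
A sketch of a slightly narrower variant that only steps back one day when the run time falls before 06:00, assuming GNU date. The name first_grab_day is made up here, and how the script selects the day to grab is not shown.

```shell
# Hypothetical helper: pick the first listing day to grab.
# $1 = current date (YYYYMMDD), $2 = current time (HHMM)
first_grab_day() (
    today=$1
    hhmm=$2
    if [ "$hhmm" -lt 600 ]; then
        # Before 06:00 the current programs still belong to yesterday's listing.
        date -u -d "$today -1 day" +%Y%m%d
    else
        printf '%s\n' "$today"
    fi
)
```

With the example above, `first_grab_day 20100831 0400` prints `20100830`, and `first_grab_day 20100831 0800` prints `20100831`.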
#28
OK, I always scrape at 08:00 am and have never had a problem.
Could you please send your diff against the current version to the
bug-tracking system?
Thanks anyway :-)
Hans
#29
I could add a warning to the script if it is run between 00:00 and 06:00, or start
grabbing at a day other than index 0.
I will release 0.9L soon ;-)
#30
As I wrote and explained in my earlier post (http://forum.xbmc.org/showpost.php?p=594...stcount=26), programs with start times between 0:00 and 5:59 all have the wrong date in your current version, no matter when you run the script. That bug is unrelated to the second one about running the script at night.