Duplicate Tvshow Library Entries -- Almost Fixed!
#1
Because there was a rash of reports about duplicate tvshow entries caused by the same show being found in several locations, including multiple path sources, I decided to try to remove the reliance on the path for a tvshow entry in the database.

Unfortunately, I spent a considerable amount of time on a dead end. It never really worked correctly. I had to keep adding more hacks to work around the cases where a path was required. So, I decided to dump all that work and rethink it.

My first thought was to add a hash field to the tvshow table and hash all the fields together into some unique id per show. The first issue was that I really didn't want to have to change the schema, as it's an annoyance. You need to update the database version and have it update itself on next load, etc. The other issue was that it's possible to have the same show in two locations and use two different scrapers, thus potentially making the hash value different. So, I abandoned that idea after getting it partially working.
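
To illustrate the scraper problem with the hash idea, here is a minimal Python sketch. The field names, the two scraper results, and the `show_hash` helper are made up for the example, and MD5 is an arbitrary choice of hash; none of this is XBMC code.

```python
import hashlib

def show_hash(fields):
    """Hash all fields of a show record into one id (hypothetical helper)."""
    # Sort the keys so the hash does not depend on dict ordering.
    joined = "\x1f".join(f"{key}={fields[key]}" for key in sorted(fields))
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

# The same show, found in two paths and scraped by two (made-up) scrapers
# that return slightly different plot text.
from_scraper_a = {"title": "Battlestar Galactica", "year": 2003,
                  "plot": "A fleet of survivors searches for Earth."}
from_scraper_b = {"title": "Battlestar Galactica", "year": 2003,
                  "plot": "Survivors of the Twelve Colonies flee the Cylons."}

same_hash = show_hash(from_scraper_a) == show_hash(from_scraper_b)
same_title = from_scraper_a["title"] == from_scraper_b["title"]
print(same_hash, same_title)
```

The two hashes differ because one field differs, while the titles still match, which is the motivation for keying off the title alone.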

Last night, while staring at the database in SQLiteSpy after a few beers, I thought, why not just key off the title?! It's simple enough. I expect that would always be the same, regardless of the scraper used. So, I spent a few hours and got that working!

Duplicate entries are "stacked" into a single show, using the show info from the first tv show entry in the database. (It's actually the lowest id, which is typically the first one added.) The episode counts and watch counts are just tallied up for the additional entries. Everything else remains the same. Very little code had to change. It was mostly just some nested sql queries. There were no changes to the scanner. The database still has duplicate entries by path, but they are just hidden from the display.
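
As a rough illustration of the stacking rule described above (lowest id supplies the show info, counts are summed), here is a sketch in Python with sqlite3. The schema is a simplified, hypothetical one where the counts live directly on the show row; the real XBMC video database is laid out differently, and this is not the query from the actual change.

```python
import sqlite3

# Hypothetical, simplified schema: one row per (show, path).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tvshow (idShow INTEGER PRIMARY KEY, title TEXT, path TEXT,
                     episodes INTEGER, watched INTEGER);
INSERT INTO tvshow (title, path, episodes, watched) VALUES
  ('Lost',   'smb://nas1/tv/Lost/',   10, 4),
  ('Heroes', 'smb://nas1/tv/Heroes/',  8, 8),
  ('Lost',   'smb://nas2/tv/Lost/',   15, 1);
""")

def get_stacked_shows(conn):
    """Return one row per title: info from the lowest idShow, counts summed."""
    query = """
        SELECT s.idShow, s.title, s.path, t.episodes, t.watched
        FROM tvshow AS s
        JOIN (SELECT MIN(idShow)   AS firstId,
                     SUM(episodes) AS episodes,
                     SUM(watched)  AS watched
              FROM tvshow
              GROUP BY title) AS t
          ON s.idShow = t.firstId
        ORDER BY s.idShow
    """
    return conn.execute(query).fetchall()

stacked = get_stacked_shows(conn)
for row in stacked:
    print(row)
```

The duplicate row for the second path stays in the table; it is only folded into the first entry when the list is built.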

So far, it works perfectly. I can navigate tvshows using any criteria. Seasons get stacked if episodes are split across paths. I need to do some more testing, but I don't expect anything to break since all the work is done in the database.

The one issue I thought of, because I encountered it myself, is that if the scanner incorrectly names a show, it could get stacked into the show that's correctly named. This didn't happen to me while working on this stacking solution, but before. I had episodes from the original Battlestar Galactica and the new BSG both in folders named Battlestar Galactica, but they were in different paths. The scanner thought they were the same show. It even skipped the episodes beyond season one for the new BSG, as there's only one season of the original. I had to manually refresh the duplicate which was incorrectly titled. (And the only way to tell which was which was to enter one of them and play an episode.) After that it found the rest of the episodes and all was good.

My idea to combat this issue is to make this a setting in the video library: "Stack tv shows by Title" or something like that. It'll default to off. The user can then correct any issues and turn it on.

What's the consensus? Does anyone see any other issues with keying the stacking only off the Title, other than the issue I mentioned?
Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.
#2
Very cool, kraqh3d. I don't know enough about the inner workings nor have I used Library Mode enough to predict where problems might lie... but look forward to testing it out at some point.
#3
thanks for the reply. i think i got this problem nailed but i want to make sure i didn't miss anything, especially something obvious :)

i still have to add the setting and test it some more before committing it. and, if no one raises any concerns, i'll probably commit it to the linuxport branch this weekend.
#4
Sounds like one of those elegantly simple solutions to the problem, kraqh3d. :) Might I suggest turning it on by default instead? If people have identically named series they're going to run into minor headaches with the scraper anyway. It might be better to just turn it on and have a wiki article saying that people should name their series according to the site they're scraping. So from my site they'd be "Battlestar Galactica" and "Battlestar Galactica (2003)". Not sure what the potential headaches are with support questions, though.
#5
kraqh3d:

Your solution sounds like one that should work, though I'm wondering if it could be generalizable to other stuff, but I suspect not.

For instance it's easy to "stack" down the directory retrieved after we get the list back from the db, but harder to then act on the resulting URLs for stuff like videoinfo lookup and so on - I presume this is why you have done it within the SQL?

Mind posting a patch when you get things cleaned up so I can take a look?

I guess the "ideal" is to allow the updating of the path field in the tvshow (i.e. when given a path, check whether there's already a show with a different path and use that, extending the path as necessary, and of course check within the multipaths for the current path), though that may be a little fiddly. If that was done, however, I think it would also fix the problem without having to play with the SQL for GetSeasonsNav and GetEpisodesNav?

Cheers,
Jonathan
#6
Yes. That's why I did it all within the sql queries. It was so much easier to leave everything else in place, and virtually stack the tvshows together as the items list is generated. It was too difficult to start unravelling all of spiff's work.

And honestly, modifying the queries in the "Nav" functions was rather easy. I just made a helper function which takes the idshow, does a nested query and returns a stringified list of "IN (a,b,c,d)". This also makes it easy to turn this into an option. If the option is enabled call the helper function, otherwise just use "idshow=a" and it'll work exactly like it did before.
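
A minimal sketch of what such a helper could look like, written in Python with sqlite3 rather than the C++ of the actual code. The function name `show_id_clause`, the `stack_by_title` flag, and the two-table schema are assumptions made for the example.

```python
import sqlite3

# Hypothetical, simplified schema for demonstration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tvshow (idShow INTEGER PRIMARY KEY, title TEXT, path TEXT);
CREATE TABLE episode (idEpisode INTEGER PRIMARY KEY, idShow INTEGER, season INTEGER);
INSERT INTO tvshow (title, path) VALUES
  ('Lost', 'smb://nas1/tv/Lost/'),
  ('Heroes', 'smb://nas1/tv/Heroes/'),
  ('Lost', 'smb://nas2/tv/Lost/');
INSERT INTO episode (idShow, season) VALUES (1, 1), (1, 1), (3, 1), (3, 2), (2, 1);
""")

def show_id_clause(conn, id_show, stack_by_title=True):
    """Build the WHERE fragment that selects one show or its whole stack."""
    if not stack_by_title:
        # Option disabled: behave exactly as before.
        return f"idShow={int(id_show)}"
    # Nested query: every show that shares a title with id_show.
    rows = conn.execute(
        "SELECT idShow FROM tvshow WHERE title = "
        "(SELECT title FROM tvshow WHERE idShow = ?) ORDER BY idShow",
        (id_show,),
    ).fetchall()
    ids = [str(row[0]) for row in rows] or [str(int(id_show))]
    return "idShow IN (" + ",".join(ids) + ")"

def count_episodes(conn, id_show, stack_by_title=True):
    clause = show_id_clause(conn, id_show, stack_by_title)
    return conn.execute("SELECT COUNT(*) FROM episode WHERE " + clause).fetchone()[0]

print(show_id_clause(conn, 1), count_episodes(conn, 1))
print(show_id_clause(conn, 1, False), count_episodes(conn, 1, False))
```

Because the ids come straight from the database as integers, joining them into the query string is safe here; anything user-supplied should still go through bound parameters.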

I thought about trying to "add" to an existing show of the same title, but then we still have those potential naming issues like I ran into. It's easier to correct them by un-stacking the shows, and doing a refresh.

Once I get the option rolled into this, I'll post a patch on sourceforge so you can take a peek.

** edit **

the genre parsing is annoying me (at least for tvshows). I'm winding up with things like "action and adventure". Stuff like this should be broken down into two genres. I have not encountered this problem with movies yet.
#7
That's true - it is certainly a lot easier to correct things if the shows are unstackable, though one presumes we could detect this (as we have the multipaths) on a refresh and prompt the user at that point.

I presume you stack also in GetTVShowsNav, or do you just return the one id (returning just one I guess is the way to go?)

Will discuss with cptspiff to see if he has any ideas on this front as well. My primary concern is that there may be a better fix available that could be applied elsewhere when duplicates arise, though as you say it would be a lot more work.

Cheers,
Jonathan
#8
GetTVShowsNav was a special case... I used a map to keep a mapping of title and path. The sql is returned in idshow order, so the first one populates the map. Then if another show title already exists in the map, it means it's a duplicate, so I find the previous item and update the episode counts and stuff like that.
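
The map-based pass can be sketched like this in Python. The row layout and the `stack_rows` name are assumptions for illustration, and the map here goes from title to list position rather than title to path, which is a simplification of what the post describes.

```python
def stack_rows(rows):
    """Fold duplicate titles into the first item seen.

    rows: iterable of (id_show, title, path, episodes, watched) tuples,
    assumed to be sorted by id_show so the lowest id wins.
    """
    items = []
    index_by_title = {}  # title -> position of the first item with that title
    for id_show, title, path, episodes, watched in rows:
        if title in index_by_title:
            # Duplicate: find the earlier item and tally the counts onto it.
            item = items[index_by_title[title]]
            item["episodes"] += episodes
            item["watched"] += watched
        else:
            index_by_title[title] = len(items)
            items.append({"id": id_show, "title": title, "path": path,
                          "episodes": episodes, "watched": watched})
    return items

rows = [
    (1, "Lost", "smb://nas1/tv/Lost/", 10, 4),
    (2, "Heroes", "smb://nas1/tv/Heroes/", 8, 8),
    (3, "Lost", "smb://nas2/tv/Lost/", 15, 1),
]
items = stack_rows(rows)
for item in items:
    print(item)
```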

It's funny how a few beers make you think more clearly :)
#9
http://wiki.xbmc.org/?title=TV_Shows_%28...ng_TV_Show
Quote: The scraper picks the wrong TV Show

Try adding the year in parentheses to the end of the TV Show folder name (this might be needed for some TV Shows such as "Doctor Who" and "Knight Rider", which have multiple entries in a TV metadata database). See the example below:

\TV Shows\Battlestar Galactica (1978)\Season 1\Battlestar Galactica - S01E01.avi
\TV Shows\Battlestar Galactica (2003)\Season 1\Battlestar Galactica - S01E01.avi
\TV Shows\Doctor Who (1963)\Season 1\Doctor Who - S01E01.avi
\TV Shows\Doctor Who (2005)\Season 1\Doctor Who - S01E01.avi
\TV Shows\Knight Rider (1982)\Season 1\Knight Rider - S01E01.avi
\TV Shows\Knight Rider (2008)\Season 1\Knight Rider - S01E01.avi
Doctor Who and Knight Rider are two more examples

:eek:
#10
I know. I did that afterward in my further testing. But the potential still exists for a poorly named folder to be matched against the wrong title.
#11
kraqh3d Wrote: I know. I did that afterward in my further testing. But the potential still exists for a poorly named folder to be matched against the wrong title.

I'd think that's not a big deal, so long as one can clean it up by renaming the folder(s) properly and rescanning. (i.e. something short of flushing your whole database and starting over)
#12
jmarshall... i'm confident this patch won't break anything. no logic was changed in the scanner. everything is pretty much self-contained in the "nav" functions. i cleaned up the code, and added a gui setting "stack tv shows by title" which defaults to true.

https://sourceforge.net/tracker/index.ph...tid=581840

plooger... if the items are stacked together, it may take a while to realize what's actually missing. if you see the duplicate, it stands out. you don't have to dump the database and start over. you don't even have to change the folder name. if you unstack, and refresh with a manual name, xbmc will correct everything. i just tested this :)
#13
Looking forward to your updates.
#14
Hi kraqh3d,

Nice work. I've taken a quick scan, and I think it's probably going to be the only sane way of handling duplicates for tvshow items at this point without getting rid of the path dependence altogether (or trying to hack in some multipath stuff). The patch looks fine to me in this regard - I'll ask spiff to have a look over it as well.

As for the show-level stacking, this could be done in a more generic manner by fetching the list and then doing the stacking, rather than doing it as we go (i.e. a simple variant of CFileItemList::Stack). That way the fanart etc. could be picked up from different paths as well, we can tag as "dupe" via a property, and we can easily adapt it for movies + musicvideos. Having an "IncrementProperty" function might be useful as well, by the looks of it :)

BTW: Check your mail regarding the trackers.

Cheers,
Jonathan
#15
Thanks for looking it over so quickly. I was considering something like CFileItemList::StackTvShows() but it just made more sense to do it right within the GetTvShowsNav() function. Though, this is because I first spent some time trying to work out some nested query to do it for me, but I couldn't. In the end, it was much easier to pull out all the rows.

Stacking movies and music videos by title is going to be more difficult. I was thinking about how this could be generalized and so I took a look at how they work. The difference between them and tv shows is those entries are not folders, they are actual playable items. While it's possible to build a stack:// url from the duplicate entries, I think it would take some sophisticated logic to try and figure out the order of the items.

In the meantime, do you see any issue with committing this as-is, for just TvShows?

The other things on my TODO list are some kind of TvShow stacking to handle multi-file episodes (i.e. s01e01-part1, s01e01-part2), and fixing the genre parsing.

My idea on stacking multi-file episodes is to sort the folder by filename when it's scanned, and create a stack within the library while they are added. As the scanner adds files, we can first check to see if the database contains a matching episode from the same path. If so, the file gets appended somehow. I'm thinking that strFileName can be mutated into a stack:// url, and GetEpisodeDetails() will just copy strFileName into strFileNameAndPath instead of building the full path like it does today.

This means that multi-file episodes have to exist in the SAME folder, and they must sort in-order by name, but the naming convention is wide open. And it will not handle duplicates that exist in two different paths.
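
The multi-file idea above could be sketched as follows, in Python rather than the scanner's C++. The `sXXeYY` regex is only a stand-in for whatever episode matching the scanner really does, and the `stack://` layout used here (paths joined by " , ", with commas inside a path doubled) is an assumption about the URL format, so treat it as illustrative.

```python
import re

# Stand-in for the scanner's episode matching (assumption).
EPISODE_RE = re.compile(r"s(\d+)e(\d+)", re.IGNORECASE)

def episode_key(filename):
    """Return (season, episode) parsed from a filename, or None."""
    match = EPISODE_RE.search(filename)
    return (int(match.group(1)), int(match.group(2))) if match else None

def stack_episodes(folder, filenames):
    """Map each (season, episode) to a plain path or a stack:// url."""
    parts = {}
    # Sorting by name is what puts part1 before part2.
    for name in sorted(filenames):
        key = episode_key(name)
        if key is not None:
            # Same episode already seen in this folder: append to its stack.
            parts.setdefault(key, []).append(folder + name)
    result = {}
    for key, paths in parts.items():
        if len(paths) == 1:
            result[key] = paths[0]
        else:
            # Assumed stack:// layout; commas in a path are doubled.
            result[key] = "stack://" + " , ".join(p.replace(",", ",,") for p in paths)
    return result

files = ["show.s01e01-part2.avi", "show.s01e02.avi", "show.s01e01-part1.avi"]
stacks = stack_episodes("smb://nas/tv/Show/", files)
for key in sorted(stacks):
    print(key, stacks[key])
```

As in the proposal, this only stacks files from one folder, and it relies entirely on the names sorting in playback order.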