Developing a Regex for Anime TV Series

Maxim
Fan
Posts: 706
Joined: Sep 2004
Reputation: 0
Post: #1
I thought this might be the best place to post this question.

I'm looking to develop a regex or a few for the anime TV series I have. I plan on doing one for movies and OVAs too, but that's for later.

Here is what I have now.
Code:
[-_ .]([0-9]+)[-_ \[\].v(]

I have excluded the "Season" part and am just focusing on the filename for now. As far as I understand, this expression will catch almost all of the files I have, but a few files are giving me trouble. I'm also not sure I'm building the expression properly, since it matches the characters immediately to the left and right of the number as well as the number itself. Here is a list of files that are giving me issues:
Code:
[Bleach-Society]Bleach_-_73-74[Xvid][C03A425E].avi
[Lunar] Bleach - 52-53 [B937F496].avi
[Lunar] Bleach - 67 v2 [A1C97A64].avi
[Lunar] Bleach - 68-69 [C23724B5].avi
[m.3.3.w] Chaos Head - 01v2 (H.264) [094A3E22].mkv
[a4e]Get_Backers_20[divx5.2.1].mkv
[a4e]Get_Backers_21v2[h.264].mkv
[a4e]Mahoromatic_Summer_Special[divx5.1.1].mkv
[B-G_&_w.0.0.f]_Shigofumi_Opening.DVD(H.264_DD2.0)_[91FDE4D2].mkv
[Exiled-Destiny]_Wolfs_Rain_Ep01_(6F7967EA).mkv
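A quick way to check the first expression against the problem files is Python's `re` module, used here only as a stand-in for the online testers (an assumption; the thread never mentions Python). The bracketed classes consume one delimiter on each side, so those characters show up in the full match (`group(0)`), while `group(1)` holds just the digits. The helper name `first_episode` is hypothetical.

```python
import re

# First attempt from this post.
PATTERN = re.compile(r"[-_ .]([0-9]+)[-_ \[\].v(]")

FILES = [
    "[Bleach-Society]Bleach_-_73-74[Xvid][C03A425E].avi",
    "[Lunar] Bleach - 52-53 [B937F496].avi",
    "[Lunar] Bleach - 67 v2 [A1C97A64].avi",
    "[m.3.3.w] Chaos Head - 01v2 (H.264) [094A3E22].mkv",
    "[a4e]Mahoromatic_Summer_Special[divx5.1.1].mkv",
    "[Exiled-Destiny]_Wolfs_Rain_Ep01_(6F7967EA).mkv",
]

def first_episode(name):
    """Hypothetical helper: (full match, captured digits) for the first hit, or None."""
    m = PATTERN.search(name)
    return (m.group(0), m.group(1)) if m else None

for name in FILES:
    print(first_episode(name), name)
```

The results line up with the problems described next: only the first number of a double episode is found, the digits in the "[m.3.3.w]" group name and the "divx5.1.1" codec tag are captured, and "Ep01" is missed because the character in front of "01" is "p", which is not in the leading class.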

To test the expression I've been using this site: http://regexpal.com/

The issues I'm having are: double episodes, where the second episode isn't picked up; group names abbreviated with the same characters that surround the episode numbers; codec names that look like episode numbers; and the "Ep" that appears in front of the Wolf's Rain episode number.

I'm very new to regular expressions, so please forgive my ignorance.
Maxim
Post: #2
Ok. This string here:
Code:
[-_ p.](\d{2})[-_ (v.\[]
gets almost all of the episodes except for these two:
Code:
[Bleach-Society]Bleach_-_73-74[Xvid][C03A425E].avi
[Lunar] Bleach - 52-53 [B937F496].avi
It can't get the second episode because the "-" between the two numbers has already been consumed as the trailing character of the first match, so the leading "[-_ p.]" has nothing left to match in front of the second number. I'm not sure where to go from here.
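The consumption problem can be seen directly with `re.finditer` in Python (used only as a test bed; an assumption, not what XBMC runs). Because the trailing class eats the "-", the scan for the next match starts at "53" with no delimiter in front of it. As a contrast, replacing the trailing class with a lookahead `(?=...)`, a standard regex feature that checks a character without consuming it, leaves the "-" available, so both numbers are found. The lookahead variant and the `episodes` helper are illustrative additions, not something proposed in the thread.

```python
import re

# Expression from this post: the trailing class consumes its character.
CONSUMING = re.compile(r"[-_ p.](\d{2})[-_ (v.\[]")
# Illustrative variant: the lookahead only checks the trailing character.
LOOKAHEAD = re.compile(r"[-_ p.](\d{2})(?=[-_ (v.\[])")

def episodes(pattern, name):
    """All group-1 captures found by scanning the name left to right."""
    return [m.group(1) for m in pattern.finditer(name)]

name = "[Lunar] Bleach - 52-53 [B937F496].avi"
print(episodes(CONSUMING, name))  # → ['52']  (the "-" before 53 is used up)
print(episodes(LOOKAHEAD, name))  # → ['52', '53']
```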
Maxim
Post: #3
Oh man. Totally got it. So exciting.

Here is the expression in all its glory:
Code:
[-_ p.](\d{2})[-_ (v.\[](\d{2})?

This regex finds the episode numbers in 930 files from various release groups with several different filename formats. I must say that regex is simply awesome.
(This post was last modified: 2009-02-12 21:38 by Maxim.)
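To see what the two groups hold, the expression can be run with Python's `re.search` (again only an assumed stand-in test bed; `episode_groups` is a hypothetical helper). Group 2 is filled for the ##-## files and stays `None` when the optional `(\d{2})?` has nothing to match; a file with no two-digit number in a delimited position, such as the Mahoromatic special, returns no match at all.

```python
import re

# Final expression from this post; group 2 is the optional second episode.
PATTERN = re.compile(r"[-_ p.](\d{2})[-_ (v.\[](\d{2})?")

def episode_groups(name):
    """Return (group 1, group 2) for the first match, or None if nothing matches."""
    m = PATTERN.search(name)
    return m.groups() if m else None

FILES = [
    "[Bleach-Society]Bleach_-_73-74[Xvid][C03A425E].avi",
    "[Lunar] Bleach - 67 v2 [A1C97A64].avi",
    "[m.3.3.w] Chaos Head - 01v2 (H.264) [094A3E22].mkv",
    "[Exiled-Destiny]_Wolfs_Rain_Ep01_(6F7967EA).mkv",
    "[a4e]Mahoromatic_Summer_Special[divx5.1.1].mkv",
]

for name in FILES:
    print(episode_groups(name), name)
```

The "p" in the leading class is what lets "Ep01" through: the "p" of "Ep" serves as the delimiter in front of the digits.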
althekiller
Team-XBMC Developer
Posts: 4,935
Joined: May 2004
Reputation: 12
Post: #4
That last expression group seems to be ignored? You probably want (?:expr) in that case.
Maxim
Post: #5
Well, I haven't tested it in XBMC yet. I was using this web page as my test bed:

http://www.fileformat.info/tool/regex.htm

It was able to isolate group 1, which is the first (or only) episode, and group 2, which is the second episode in a ##-## sequence.

I'll have to do some testing when I get a chance to see if it actually works in XBMC.

I also used this page here:

http://www.gskinner.com/RegExr/

It's by far the best regular expression page I've seen.

I'm not familiar with that (?:) sequence; I'll take a look at it. From what I understand, a ? makes the preceding token optional.
(This post was last modified: 2009-02-12 22:51 by Maxim.)
althekiller
Post: #6
(?:expr) groups expressions but doesn't create a capture group, so it produces no output field. Have a look at the wiki for how we currently handle multiple eps.
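Python's `re` is used below as an assumed stand-in to show the difference: both expressions match exactly the same text, but only the capturing version reports the second episode as its own field. In `(?:...)`, the `?` right after the opening parenthesis is part of the group syntax rather than the optional quantifier; the `?` after the closing parenthesis is what makes the group optional.

```python
import re

name = "[Lunar] Bleach - 52-53 [B937F496].avi"

# (\d{2})? creates a second output field for the second episode.
capturing = re.search(r"[-_ p.](\d{2})[-_ (v.\[](\d{2})?", name)
# (?:\d{2})? groups the same tokens so the trailing ? applies to both
# digits, but no output field is created for them.
non_capturing = re.search(r"[-_ p.](\d{2})[-_ (v.\[](?:\d{2})?", name)

print(capturing.groups())      # → ('52', '53')
print(non_capturing.groups())  # → ('52',)
print(capturing.group(0) == non_capturing.group(0))  # → True
```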
Maxim
Post: #7
Hmm, after looking at the wiki I've come away confused.

On this page:

http://wiki.xbmc.org/?title=Advancedsett...atching.3E

It has a note regarding multi-part episode files:

NOTE: for multi-episode matching to work, there needs to be a third set of parentheses on the end. This part is fed back into the regexp engine.

I wasn't really sure what that meant, but I added a second episode group so that, together with the Season group I've left out here, there are three sets of parentheses, with the second episode in the filename as the third group.

However, that approach is limited to two-episode files; with three-episode files it wouldn't work, or at least wouldn't return the same results.

Then there is this page here:

http://wiki.xbmc.org/index.php?title=TV_...t_Episodes

That page describes multi-part episodes and says the regex will be applied multiple times, but it isn't clear whether the entire expression is applied to the same string (the filename) again or only the episode portion is. If the latter, how does it tell the season portion apart from the episode portion?

The regex given in that section is this:
Code:
[-EeXx]+([0-9]+)
However, testing that expression against the examples given on that page gives unexpected results: the 201 is picked up as an episode when it should be a season. Then again, that expression has no Season part, just an episode part.

All in all I'm pretty confused about the matter, but I won't be able to do anything until I get home and poke around in advancedsettings.xml and the debug log.
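One way to read the wiki note, sketched in Python under loudly stated assumptions: the main expression captures season, episode, and a third group holding the rest of the filename (similar to the `([^\\/]*)$` tail in the DEBUG line quoted later in this thread), and only that leftover text is scanned again with the episode-only pattern `[-EeXx]+([0-9]+)` from the wiki. The `MAIN` expression, the `parse` helper, and the loop are hypothetical illustrations, not XBMC's actual code. If XBMC did work this way, the episode-only pattern would never be run over the season part of the name, which may be why testing it against a complete filename gives odd results.

```python
import re

# HYPOTHETICAL main expression: season, episode, and a third group holding
# everything after the episode number (the part "fed back", per the wiki note).
MAIN = re.compile(r"[Ss]([0-9]+)[Ee]([0-9]+)([^\\/]*)$")
# Episode-only expression quoted from the wiki's multi-part section.
MULTI = re.compile(r"[-EeXx]+([0-9]+)")

def parse(name):
    """Hypothetical: return (season, [episodes]) or None. Not XBMC's code."""
    m = MAIN.search(name)
    if not m:
        return None
    season, episodes, rest = int(m.group(1)), [int(m.group(2))], m.group(3)
    # Assumed feedback loop: match the episode-only pattern at the start of
    # the leftover text until it no longer matches.
    while (more := MULTI.match(rest)) is not None:
        episodes.append(int(more.group(1)))
        rest = rest[more.end():]
    return season, episodes

print(parse("show.s02e01e02e03.avi"))  # → (2, [1, 2, 3])
print(parse("show.s01e05.avi"))        # → (1, [5])
```

Using `match` rather than `search` on the leftover text keeps the loop from wandering into later digits such as a codec tag; that choice is also an assumption.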
Maxim
Post: #8
It seems that it's not picking up the regex at all.

I have a DEBUG log here:

http://pastebin.com/m7bb94dff

I can see in the log that advancedsettings.xml loads properly; however, when it gathers the files, it doesn't report checking them against the regex the way log files in other regex threads do. Here is an example of what appears to be missing from my log file:

DEBUG: running expression \[[Ss]([0-9]+)\]_\[[Ee]([0-9]+)\]?([^\\/]*)$ on label m:\tv\30 rock\season 01\1 - pilot.avi

Here are the contents of my advancedsettings.xml:
Code:
<advancedsettings>
  <loglevel>3</loglevel>

  <tvshowmatching>
    <regexp>Season ([0-9]+)[\\/][-_ p.](\d{2})[-_ (v.\[](\d{2})?[^\\/]*</regexp>
  </tvshowmatching>
</advancedsettings>
(This post was last modified: 2009-02-13 01:40 by Maxim.)
Maxim
Post: #9
It seems my regex was malformed yet again. The latest version, which works with XBMC, is:
Code:
Season ([0-9]{2}).*[\\/].*[-_ p.]([0-9]{2})[-_ (v.\[]([0-9]{2})?[^\]\\/]*
(This post was last modified: 2009-02-15 20:53 by Maxim.)
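For comparison, the expressions from posts #8 and #9 can be run against a full path in Python. The path below is hypothetical, and this sketch assumes case-sensitive matching against the path exactly as written; it doesn't model how XBMC builds or normalizes the label (the DEBUG line quoted above shows a lowercased one). The earlier expression fails because it expects a delimiter immediately after "Season 01\", while the filename starts with "[Lunar]". One side effect of the greedy `.*` in Python's engine: it backtracks from the end of the name, so for a ##-## file it settles on the last delimited number and reports 53 as the episode. A lazy `.*?` (an illustrative tweak, not tested in XBMC) prefers the leftmost position and recovers both 52 and 53. Whether XBMC's engine behaves identically isn't tested here.

```python
import re

PATH = "/tv/Bleach/Season 01/[Lunar] Bleach - 52-53 [B937F496].avi"  # hypothetical

# Expression from the advancedsettings.xml in post #8.
OLD = re.compile(r"Season ([0-9]+)[\\/][-_ p.](\d{2})[-_ (v.\[](\d{2})?[^\\/]*")
# Expression from post #9.
FINAL = re.compile(r"Season ([0-9]{2}).*[\\/].*[-_ p.]([0-9]{2})[-_ (v.\[]([0-9]{2})?[^\]\\/]*")
# Illustrative tweak: lazy .*? so the leftmost episode number wins.
LAZY = re.compile(r"Season ([0-9]{2}).*?[\\/].*?[-_ p.]([0-9]{2})[-_ (v.\[]([0-9]{2})?[^\]\\/]*")

def groups(pattern, path):
    """Return all captured groups for the first match, or None."""
    m = pattern.search(path)
    return m.groups() if m else None

print(groups(OLD, PATH))    # → None
print(groups(FINAL, PATH))  # → ('01', '53', None)
print(groups(LAZY, PATH))   # → ('01', '52', '53')
```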