Find on Page

mikey1234 Offline
Banned
Posts: 408
Joined: Nov 2011
Post: #1
How do I find something in a page? Basically I'm scraping an .nfoview for Easynews, which has all the information about language, DTS and so on.

But all the nfoviews are different. What I would like to do is:

search the page

find English, German, French and so on

if found, print the results. Basically it's just so people don't click on an MKV and find it's in the wrong language.


The info view looks similar to this:

Code:
ÞÛÛ²²         °  ÜÛÛ²²ßÜÛÛ²ÜÛÛÛÛÛ²° ÛÛÛÛ ÞÛÛÝÛÛ²²Ý ÛÛÛÛ ÞÛÛݲÛÛ²± ÜÜÞÜÜþ  °°°°°
ÛÛÛ²²   ßÜÞÛܱ ÞÛÛ²²Ý±²ÛßÛÛÛßÛÛÛ²± ÛÛÛÛÜÛ²ß ÛÛÛ²Ý ÛÛÛÛÜÛÛß  ÛÛÛ²Ü ßßÞ²²° ²²²²²
  ÛÛÛ²²  ÝÛß ß² ÞÛÛ²²Ý²ÛÛ ßßß ÞÛÛ²±°ÛÛ²²°  Ü ÞÛÛ²² ÛÛ²²ÛÛÛÛÛÜÜßÛÛ²²ÜÜ  ßß ÛÛÛÛß
   ÛÛÛ²²Ü ßßܲ²Ý ÛÛÛ²² ßß ÜÜÜ  ßßßß ßßßß ÞÛ²  ßß   ßßßß  ßßÛÛÛÛÛßßÛ²²²²ÛÜÜÛÛÛÝ
    ßÛÛÛ²²ÜÜ  ßß  ÛÛÛ²² °²ß ßÛÛÛÛÜÜÜÝÜÜÜÛ²²±ÛÜÜÜÜÜÛÛÛÜÜÜÜÜ   ßßÛ Ü  ßßß²²²²²²²Ü
      ßßÛÛ²²²²²ÜÜÜßÛÛÛ²² Ûß ßßßÝß  ßß    °      ß     ßß²ÛßÛÜÜÝ Ü²²ÜÜÞܱ      
      Ü  þßßßÛÛ²²²²²ÛÛ²²ÝÝ                               °    ß  ±     °Ü  
     Ü    °²ÜÜÜ  ßßßÛÛ²²ÝÝ  OBEY THE EMPiRE, UNDERLING.          °     ° Ü  
  Ü²ß      ÛßÞÜÛ²°  ÛÛÛÛÛÝ                                       °        ß²Ü
ÞÛÝÜ   Ü  ²  Û°°  ßßß                                                    ÜÞÛÝ
  ßÛÝ  ß²ß °  Û                  Real Steel (2011)                        ÞÛß
ܲÜÛß Ü       ²                                                         Ü ßÛܲÜ
ß ÛÛÜ   þ    °                                                      þ   ÜÛÛ ß
   ÛßÛ Ü           Release date .............: 27.03.2012              Ü ÛßÛ
   Û°ÛÛÝ           BluRay date ..............: 12.04.2012              ÞÛÛ°Û
   Û±Û߲ܠ         Cinema date ..............: 03.11.2011             ܲßÛ±Û
   Û²Û Ü ß         Runtime ..................: 127 minutes           ß Ü Û²Û
   Û²Û߲ߠ         Genre ....................: Action                 ß²ßÛ²Û
   ÛÛÛÜ            Subtitles.................: German, English VOB      ÜÛÛÛ
   ÛÛÛ ÜÜ          Source ...................: BluRay                 ÜÜ ÛÛÛ
   ÛÛÛ  ß²²Ü       Format ...................: x264                Ü²²ß  ÛÛÛ
   ÛÛÛ  ß²²Ü       Video ....................: [ ] Untouched       ܲ²ß  ÛÛÛ
   ÛÛÛ  ß²²Ü                                   [X] Reencoded       ܲ²ß  ÛÛÛ
   ÛÛÛ ßÜÞÛÛ       Language .................: [ ] German DD 5.1   ÛÛÝÜß ÛÛÛ
   ÛÛÛ  Þ²Ûß                                   [X] German DTS      ÛÛÝÜß ÛÛÛ
   ÛÛÛ  Þ²Ûß                                   [ ] Englisch DD 5.1 ÛÛÝÜß ÛÛÛ
   ÛÛÛ  Þ²Ûß                                   [X] Englisch DTS    ÛÛÝÜß ÛÛÛ
   ÛÛÛ  Þ²Ûß       Extras ...................: [ ] Untouched       ß۲ݠ ÛÛÛ
   ÛÛÛ  Þ²Ûß                                   [X] None            ß۲ݠ ÛÛÛ
   ÛÜÛ  þ          Disks ....................: 53 * 100 MB            þ  ÛÜÛ
   ÛÛßÜß                                                               ßÜßÛÛ
   ÛÛÛÝ   Ü        iMDB......................: 7.2/10 (81549)        Ü  ÞÛÛÛ
   ÛÛÛ   Ü         http://www.imdb.com/title/tt0433035/              Ü   ÛÛÛ
   ÞÛÛÜÛß Ü                                                         Ü ßÛÜÛÛÝ
    ß²²Ý ²Ý                                                         Þ² Þ²²ß
      ßÛÜßÛÜ           Ü                               Ü           ÜÛßÜÛß
    ÜßÞÛÝþ ²ßÜ        Ü    °°          Ü          °°    Ü        Üß² þÞÛÝßÜ
   ÞÝ  ß²Ü °       ܲߠ  ß ²²    ÜßÜ  ß±ß  ÜßÜ    ²² ß   ߲ܠ      ° ܲߠ ÞÝ
    ÛÜ    ß ÜÜ°   ÞÛÝÜ Ü²Ü ÞÛÝ    ßÜ   Ü   Üß    ÞÛÝ Ü²Ü ÜÞÛÝ   °ÜÜ ß    ÜÛ
    ²  ß þ     ßßßÞ߲ݠ ß Ü ß²ÜÜ   ÞÛÜÛÛ²ÜÛÝ   ÜÜ²ß Ü ß  Þ²ßÝßßß     þ ß  ²
    ±          Ü  Þ  ß Ü      ßß²²Û ÛÛ²²²²Û Û²²ßß      Ü ß  Ý  Ü          ±
    ° Ü      ÜÜ   ß       þ       ²ÜÜÜÜÜÜÜÜܲ       þ       ß   ÜÜ      Ü °
     Ü    Ü²²ß                                                   ß²²Ü    Ü
  Ü²ß    ÞÛÛÝÜß                                                 ßÜÞÛÛÝ    ß²Ü
ÞÛÝÜ     ß۲ݠ               M O V I E   P L O T                Þ²Ûß     ÜÞÛÝ
ÞßÛÝ  °°    ß Ü                                               Ü ß    °°  ÞÛßÝ
ÞÜ°ß Ü                                                                 Ü ß°ÜÝ
ßÛ߲ܠ  þ                                                           þ   ܲßÛß
  Ü ܲ Ü                                                               Ü ²Ü Ü
  ÛÛÛÛ Ü ß           http://www.cinefacts.de/blu-ray-film/           ß Ü ÛÛÛÛ
  ÛÛÛÛ߲ܠ           69522-real-steel.html                            Ü²ßÛÛÛÛ
  ÛÛÛÛ Ü ß                                                           ß Ü ÛÛÛÛ
  ÛÛÛÛ߲ߠ                                                            ß²ßÛÛÛÛ
  ÛÛÛÛ                                                                   ÛÛÛÛ
  ÛÛÛÛ                                                                   ÛÛÛÛ
  ÛÛÛÛ                              I.N.F.O                              ÛÛÛÛ
  ÛÛÛÛ                                                                   ÛÛÛÛ
  ÛÛÛÛ                       1280x544 @ crf20 (2533kbps)                 ÛÛÛÛ
  ÞÛÛÜ    Ü                  German DTS @ 1509 kbps                  Ü    ÜÛÛÝ
   ß²²Ý  Ü                   English DTS @ 1509 kbps                  Ü  Þ²²ß


It can also look like this:
Code:
*******************************************************************************
                                  Real Steel
*******************************************************************************

-------------------------------------------------------------------------------
                              General Information
-------------------------------------------------------------------------------
Type.................: Movie
Platform.............: windows vista
Part Size............: 200,000,000 bytes
Compression Format...: RAR
File Validation......: SFV

Year.................: 12
Type.................: German
Duration.............: 122
Cover(s) Included....: Yes

Audio Format.........: Dolby Digital
Encoder..............: AC3 5.1
Bitrate..............: 256
Hz...................: 48,000
Channels.............: 5,1
Source...............: DVDRip

Video Format.........: MKV   Xvid
Video Bitrate........: 2500Kbps
Resolution...........: 1280x720
FPS..................: 29,97
Source...............: DVD 16x9
Original Format......: PAL
Genre................: Action/Abenteuer
IMDb Rating..........: 9.5

-------------------------------------------------------------------------------
                               Post Information
-------------------------------------------------------------------------------
Posted by............: Gollum fuer usenet
Posted on............: 22.04.2012


-------------------------------------------------------------------------------
Generated with Cool NFO Creator - http://fly.to/coolbeans
-------------------------------------------------------------------------------
sphere Offline
Team-XBMC Member
Posts: 1,179
Joined: Jul 2009
Reputation: 49
Location: Germany
Post: #2
There are different ways. The best would be to use regular expressions, because then you can search case-insensitively.

Code:
import re

text = ' fooo asasa germAn baar'  # replace with your nfoview content

if re.search('german', text, re.IGNORECASE):
    print 'german found'
if re.search('english', text, re.IGNORECASE):
    print 'english found'

regards,
sphere
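
Building on the snippet above, here is a minimal sketch that checks several keywords in one go. The function name `find_keywords` and the sample text are made up for illustration; only `re.search` and `re.IGNORECASE` come from the post. It is written so that it should run on both Python 2 and Python 3.

```python
import re

def find_keywords(text, keywords):
    """Return the keywords that occur in text, ignoring case."""
    found = []
    for keyword in keywords:
        # re.escape() makes the keyword match literally, even if it contains
        # characters such as '.' that are special in a regular expression.
        if re.search(re.escape(keyword), text, re.IGNORECASE):
            found.append(keyword)
    return found

# Hypothetical sample standing in for the nfoview content.
sample = ' fooo asasa germAn baar'
print(find_keywords(sample, ['english', 'german', 'french']))
```

The returned list keeps the order of the keywords you pass in, so it can be printed or joined as it is.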

mikey1234 Offline
Banned
Posts: 408
Joined: Nov 2011
Post: #3
If I do this:

Code:
import re

text = ' fooo asasa germAn baar'  

try:
    if re.search('english', text, re.IGNORECASE):
        re.search= 'english found'
except:
        re.search='hello'
print re.search


I get this output:
Code:
<function search at 0x01DAFC70>

Obviously that's because it hasn't found it, but shouldn't it print 'hello'?

Because if I do this:
Code:
import re

text = ' fooo asasa germAn baar'  

try:
    if re.search('german', text, re.IGNORECASE):
        re.search= 'german found'
except:
        re.search='hello'
print re.search

it prints 'german found'.
(This post was last modified: 2012-08-10 17:01 by mikey1234.)
mikey1234 Offline
Banned
Posts: 408
Joined: Nov 2011
Post: #4
done it

Code:
import re

text = ' fooo asasa english baar'  


if re.search('german', text, re.IGNORECASE):
    print 'german found'
if not re.search('german', text, re.IGNORECASE):
    print 'hello'
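
A note on why the earlier try/except version never printed 'hello': `re.search` does not raise when nothing matches, it returns `None`, so the `except` branch is never reached. The two opposite tests above can also be written as a single if/else. A minimal sketch of that, where the function name `describe` is hypothetical:

```python
import re

def describe(text):
    """Return a message depending on whether 'german' occurs in text."""
    match = re.search('german', text, re.IGNORECASE)
    # match is a match object on success and None on failure; no exception
    # is raised either way, so a plain if/else is enough.
    if match:
        return 'german found'
    else:
        return 'hello'

print(describe(' fooo asasa english baar'))
print(describe(' fooo asasa germAn baar'))
```

This also searches the text once per keyword instead of twice.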
mikey1234 Offline
Banned
Posts: 408
Joined: Nov 2011
Post: #5
If I do this, everything is OK:

Code:
import re

link = ' fooo asasa english baar french 5.1 DtS'

if re.search('Engli', link, re.IGNORECASE):
     print 'English Audio'
if not re.search('Engli', link, re.IGNORECASE):
     print ''    
if re.search('german', link, re.IGNORECASE):
     print 'German Audio'
if not re.search('german', link, re.IGNORECASE):
     print ''    
if re.search('Deuts', link, re.IGNORECASE):
     print 'German Audio'
if not re.search('Deuts', link, re.IGNORECASE):
     print ''          
if re.search('french', link, re.IGNORECASE):
     print 'French Audio'
if not re.search('french', link, re.IGNORECASE):
     print ''    
if re.search('turk', link, re.IGNORECASE):
     print 'Turkish Audio'
if not re.search('turk', link, re.IGNORECASE):
     print ''          
if re.search('DTS', link, re.IGNORECASE):
     print 'DTS'
if not re.search('DTS', link, re.IGNORECASE):
     print ''    
if re.search('DD', link,):
     print 'DD'
if not re.search('DD', link,):
     print ''          
if re.search('AC3', link, re.IGNORECASE):
     print 'AC3'
if not re.search('AC3', link, re.IGNORECASE):
     print ''    
if re.search('5.1', link, re.IGNORECASE):
     print '5.1 Surround Sound'
if not re.search('5.1', link, re.IGNORECASE):
     print ''

But when I do this:
Code:
import re

link = ' fooo asasa english baar french 5.1 DtS'

if re.search('Engli', link, re.IGNORECASE):
     re.search= 'English Audio'
if not re.search('Engli', link, re.IGNORECASE):
     re.search= ''    
if re.search('german', link, re.IGNORECASE):
     re.search= 'German Audio'
if not re.search('german', link, re.IGNORECASE):
     re.search= ''    
if re.search('Deuts', link, re.IGNORECASE):
     re.search= 'German Audio'
if not re.search('Deuts', link, re.IGNORECASE):
     re.search= ''          
if re.search('french', link, re.IGNORECASE):
     re.search= 'French Audio'
if not re.search('french', link, re.IGNORECASE):
     re.search= ''    
if re.search('turk', link, re.IGNORECASE):
     re.search= 'Turkish Audio'
if not re.search('turk', link, re.IGNORECASE):
     re.search= ''          
if re.search('DTS', link, re.IGNORECASE):
     re.search= 'DTS'
if not re.search('DTS', link, re.IGNORECASE):
     re.search= ''    
if re.search('DD', link,):
     re.search= 'DD'
if not re.search('DD', link,):
     re.search= ''          
if re.search('AC3', link, re.IGNORECASE):
     re.search= 'AC3'
if not re.search('AC3', link, re.IGNORECASE):
     re.search= ''    
if re.search('5.1', link, re.IGNORECASE):
     re.search= '5.1 Surround Sound'
if not re.search('5.1', link, re.IGNORECASE):
     re.search= ''


it gives this error:
Code:
Traceback (most recent call last):
  File "C:\Users\Mike\Desktop\link.py", line 7, in <module>
    if not re.search('Engli', link, re.IGNORECASE):
TypeError: 'str' object is not callable
giftie Offline
Skilled Python Coder
Posts: 2,331
Joined: Mar 2010
Reputation: 53
Location: Calgary, Alberta
Post: #6
You can't assign a string to a regex function:

Code:
re.search()

You need to assign the results to string variables instead:

Code:
import re

audio_language = ""
audio_codec = ""
audio_channels = ""

link = ' fooo asasa english baar french 5.1 DtS'

if re.search('Engli', link, re.IGNORECASE):
     audio_language = 'English Audio'
if re.search('german', link, re.IGNORECASE):
     audio_language = 'German Audio'
if re.search('Deuts', link, re.IGNORECASE):
     audio_language = 'German Audio'
if re.search('french', link, re.IGNORECASE):
     audio_language = 'French Audio'
if re.search('turk', link, re.IGNORECASE):
     audio_language = 'Turkish Audio'
if re.search('DTS', link, re.IGNORECASE):
     audio_codec = 'DTS'
if re.search('DD', link,):
     audio_codec = 'DD'
if re.search('AC3', link, re.IGNORECASE):
     audio_codec = 'AC3'
if re.search(r'5\.1', link, re.IGNORECASE):  # escape the dot so it only matches a literal '.'
     audio_channels = '5.1 Surround Sound'

print audio_language
print audio_codec
print audio_channels

The problem you will have is with multiple language tracks: with what you have, each match overwrites the previous one. You might need to set up a dict for storing each track.
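
One way to avoid the overwriting is to collect every match instead of keeping a single string. The sketch below only illustrates that idea: the names `LANGUAGE_PATTERNS` and `detect_languages` are invented here, and it stores the results in a list rather than a dict per track, since a bare keyword search cannot tell which codec belongs to which track.

```python
import re

# (pattern, label) pairs; the patterns are taken from the code above.
LANGUAGE_PATTERNS = [
    ('Engli', 'English Audio'),
    ('german', 'German Audio'),
    ('Deuts', 'German Audio'),
    ('french', 'French Audio'),
    ('turk', 'Turkish Audio'),
]

def detect_languages(text):
    """Return every language label whose pattern occurs in text."""
    labels = []
    for pattern, label in LANGUAGE_PATTERNS:
        # Skip duplicates so 'german' and 'Deuts' do not both add German.
        if re.search(pattern, text, re.IGNORECASE) and label not in labels:
            labels.append(label)
    return labels

link = ' fooo asasa english baar french 5.1 DtS'
print(detect_languages(link))
```

With the sample string this reports both English and French, where the single-variable version would only keep the last one found.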

Bstrdsmkr Offline
Posting Freak
Posts: 802
Joined: Oct 2010
Reputation: 16
Post: #7
Just for posterity's sake, I'll mirror my solution posted at xbmchub.com:

Your error is because you're trying to set re.search (a function) equal to 'English Audio' (a string), then trying to use it as a function again in the next if statement. You'll want to make a new variable to hold the result. Something like this:

Code:
if re.search('Engli', link, re.IGNORECASE):
     lang = 'English Audio'


A couple of things to think about, though. In the first .nfo, they have the audio listed as check boxes. Your current method would trigger on those options even though they're not "checked".
I think what I would do is create a list of regexes to match for each language, then loop through them (you'll probably want to store them in a separate file and import the file for easy maintenance):

Code:
all_languages = {
    "English": [r"\[X\] Englisch DD 5\.1", r"\[X\] Englisch DTS", r"Type(?:.)+?: English"],
    "German": [r"\[X\] German DD 5\.1", r"\[X\] German DTS", r"Type(?:.)+?: German"],
}
available_languages = []
for language, regexes in all_languages.items():
    # Loop over the regexes stored for this language, not over the key itself.
    for regex in regexes:
        if re.search(regex, link, re.IGNORECASE):
            available_languages.append(language)
            break  # one match is enough; avoids duplicate entries

That should give you back a list of all languages that are indicated in the nfo. When you find a new nfo format that doesn't match any of the existing regexes, just add a new regex to the list for that language.
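
To illustrate the check-box point, the sketch below runs that kind of pattern against a few lines copied from the first .nfo, so that only boxes marked [X] count. The helper name `checked_languages`, the `SPELLINGS` table and the single combined pattern are assumptions made for this example, not part of the solution above.

```python
import re

# Lines copied from the first .nfo in this thread (ASCII art stripped).
nfo = """
Language .................: [ ] German DD 5.1
                            [X] German DTS
                            [ ] Englisch DD 5.1
                            [X] Englisch DTS
"""

# Spellings seen in the .nfo mapped to the language they stand for.
SPELLINGS = {'german': 'German', 'englisch': 'English', 'english': 'English'}

def checked_languages(text):
    """Return the languages whose check box is marked [X], in sorted order."""
    found = set()
    # \[X\] matches a ticked box only; '[ ]' entries are skipped.
    for name in re.findall(r'\[X\]\s*(\w+)', text, re.IGNORECASE):
        language = SPELLINGS.get(name.lower())
        if language:
            found.add(language)
    return sorted(found)

print(checked_languages(nfo))
```

Other ticked entries in the same .nfo, such as "[X] Reencoded", are ignored because they are not in the spelling table.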
mikey1234 Offline
Banned
Posts: 408
Joined: Nov 2011
Post: #8
Lol
mikey1234 Offline
Banned
Posts: 408
Joined: Nov 2011
Post: #9
I'm trying to do this because all I want is a quick dialog to come up:
Code:
def EasySearch(name,iconimage,fanart):
        search_entered = str(name).replace(' ','+') .replace(':','') .replace(', ','+').replace(',','+').replace('[','').replace(']',' ').replace('The','').replace('(','') .replace(')','') .replace('-','+')      
        theurl = 'http://members.easynews.com/global5/index.html?gps='+search_entered+'&sbj=&
        print theurl      
        username = ADDON.getSetting('easy_user')
        password = ADDON.getSetting('easy_pass')
        passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
        passman.add_password(None, theurl, username, password)
        authhandler = urllib2.HTTPBasicAuthHandler(passman)
        opener = urllib2.build_opener(authhandler)
        urllib2.install_opener(opener)
        pagehandle = urllib2.urlopen(theurl)
        link= pagehandle.read()      
        match=re.compile('<a href="(.+?)" target="subjTarget".+?<span class="autounrarlink">(.+?)</span></a>.+?class="fSize" nowrap>(.+?)</td>').findall(link)
        class MyClass():
            if re.search('alt="English"', link, re.IGNORECASE):
                    eng= 'Found English Audio'    
            if not re.search('alt="English"', link, re.IGNORECASE):
                    eng= ''          
            if re.search('alt="German"', link, re.IGNORECASE):
                    ger= 'Found German Audio'
            if not re.search('alt="German"', link, re.IGNORECASE):
                    ger= ''          
            if re.search('alt="French"', link, re.IGNORECASE):
                    fre= 'Found French Audio'
            if not re.search('alt="French"', link, re.IGNORECASE):
                    fre= ''    
            if re.search('alt="Turkish"', link, re.IGNORECASE):
                    tur= 'Found Turkish Audio'
            if not re.search('alt="Turkish"', link, re.IGNORECASE):
                    tur= ''    
            dialog = xbmcgui.Dialog()
            dialog.ok= (MyClass())

But I get this error:

Code:
NameError: free variable 'MyClass' referenced before assignment in enclosing scope
(This post was last modified: 2012-08-13 13:08 by mikey1234.)
mikey1234 Offline
Banned
Posts: 408
Joined: Nov 2011
Post: #10
OK, even if I do this:

Code:
        eng=''
        ger=''
        fre=''
        tur=''
        if re.search('alt="English"', link, re.IGNORECASE):
                eng= 'Found English Audio'    
        if not re.search('alt="English"', link, re.IGNORECASE):
                eng= ''          
        if re.search('alt="German"', link, re.IGNORECASE):
                ger= 'Found German Audio'
        if not re.search('alt="German"', link, re.IGNORECASE):
                ger= ''          
        if re.search('alt="French"', link, re.IGNORECASE):
                fre= 'Found French Audio'
        if not re.search('alt="French"', link, re.IGNORECASE):
                fre= ''    
        if re.search('alt="Turkish"', link, re.IGNORECASE):
                tur= 'Found Turkish Audio'
        if not re.search('alt="Turkish"', link, re.IGNORECASE):
                tur= ''
        dialog= xbmcgui.Dialog()
        dialog.ok= (eng,ger,fre,tur)


the error is:
Code:
AttributeError: 'xbmcgui.Dialog' object attribute 'ok' is read-only
mikey1234 Offline
Banned
Posts: 408
Joined: Nov 2011
Post: #11
Thanks, I have fixed it:
Code:
        eng=''
        ger=''
        fre=''
        tur=''
        if re.search('gb.png" alt="English"', link, re.IGNORECASE):
                eng= 'Found English Audio'    
        if not re.search('gb.png" alt="English"', link, re.IGNORECASE):
                eng= ''          
        if re.search('alt="German"', link, re.IGNORECASE):
                ger= 'Found German Audio'
        if not re.search('alt="German"', link, re.IGNORECASE):
                ger= ''          
        if re.search('alt="French"', link, re.IGNORECASE):
                fre= 'Found French Audio'
        if not re.search('alt="French"', link, re.IGNORECASE):
                fre= ''    
        if re.search('alt="Turkish"', link, re.IGNORECASE):
                tur= 'Found Turkish Audio'
        if not re.search('alt="Turkish"', link, re.IGNORECASE):
                tur= ''
        xbmcgui.Dialog().ok('Found These Audios',eng,ger,fre)



The only problem with the dialog is that it only takes 4 arguments. What other dialog or notification can I use to display more?
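
One possible way around the limit is to collect the messages in a list and join them, so that any number of results fits into a single dialog line. This is only a sketch: the names `CHECKS` and `found_messages` and the sample page fragment are invented for illustration, and the `xbmcgui` calls are left in comments because they only run inside XBMC and should be checked against your version's API documentation.

```python
import re

# (pattern, message) pairs; the patterns are copied from the code above.
CHECKS = [
    ('gb.png" alt="English"', 'Found English Audio'),
    ('alt="German"', 'Found German Audio'),
    ('alt="French"', 'Found French Audio'),
    ('alt="Turkish"', 'Found Turkish Audio'),
]

def found_messages(page):
    """Return the message for every pattern that occurs in page."""
    return [message for pattern, message in CHECKS
            if re.search(pattern, page, re.IGNORECASE)]

# Hypothetical page fragment for illustration.
page = '<img src="de.png" alt="German"> <img src="tr.png" alt="Turkish">'
messages = found_messages(page)
print(', '.join(messages))

# Inside XBMC the joined string fits into one line of the dialog, e.g.
#   xbmcgui.Dialog().ok('Found These Audios', ', '.join(messages))
# Alternatively xbmcgui.Dialog().select('Found These Audios', messages)
# shows the list one entry per row (check your XBMC version's API docs).
```

Adding another language then only needs one more entry in `CHECKS`, without changing the dialog call.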