Help with a new Korean Music Scraper?

kimp93
Aeon Group
Posts: 157
Joined: Mar 2004
Post: #1
This is my third scraper. This time, I'm trying to make a music scraper.
However, I don't even get search results.
Here is a debug log that may be relevant.

Code:
23:50:03 T:668 M:154071040   DEBUG: FileCurl::Open(0012D844) http://music.search.cyworld.com/cymusic/search.html?query=Gee%20%28The%20First%20Mini%20Album%29&v=1
23:50:03 T:668 M:154066944    INFO: XCURL::DllLibCurlGlobal::easy_aquire - Created session to http://music.search.cyworld.com
23:50:04 T:668 M:149823488   DEBUG: Curl::Debug About to connect() to music.search.cyworld.com port 80 (#0)
23:50:04 T:668 M:149823488   DEBUG: Curl::Debug   Trying 117.53.105.15...
23:50:04 T:668 M:174813184   DEBUG: Curl::Debug Connected to music.search.cyworld.com (117.53.105.15) port 80 (#0)
23:50:04 T:668 M:174813184   DEBUG: Curl::Debug GET /cymusic/search.html?query=Gee%20%28The%20First%20Mini%20Album%29&v=1 HTTP/1.1
23:50:04 T:668 M:174813184   DEBUG: Curl::Debug User-Agent: XBMC/pre-9.04 r18650 (Windows; Windows XP Professional Service Pack 2 build 2600; http://www.xbmc.org)
23:50:04 T:668 M:174813184   DEBUG: Curl::Debug Host: music.search.cyworld.com
23:50:04 T:668 M:174813184   DEBUG: Curl::Debug Accept: */*
23:50:04 T:668 M:174813184   DEBUG: Curl::Debug Connection: keep-alive
23:50:04 T:668 M:175108096   DEBUG: Curl::Debug HTTP/1.1 200 OK
23:50:04 T:668 M:175108096   DEBUG: Curl::Debug Date: Fri, 20 Mar 2009 03:50:02 GMT
23:50:04 T:668 M:175108096   DEBUG: Curl::Debug Server: Apache
23:50:04 T:668 M:175108096   DEBUG: Curl::Debug Connection: close
23:50:04 T:668 M:175108096   DEBUG: Curl::Debug Transfer-Encoding: chunked
23:50:04 T:668 M:175108096   DEBUG: Curl::Debug Content-Type: text/html
23:50:05 T:668 M:174272512   DEBUG: Curl::Debug Expire cleared
23:50:05 T:668 M:174272512   DEBUG: Curl::Debug Closing connection #0
23:50:05 T:668 M:174272512   DEBUG: FileCurl::Close(0012D844) http://music.search.cyworld.com/cymusic/search.html?query=Gee%20%28The%20First%20Mini%20Album%29&v=1

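The query string in the log is just the album title percent-encoded. A quick Python check (standard library only, not XBMC code) reproduces the exact URL from the log, so the URL encoding itself looks fine. The base URL and the `v=1` parameter are copied from the log.

```python
from urllib.parse import quote

# Album title used as the search string in the log above
title = "Gee (The First Mini Album)"

# quote() percent-encodes spaces and parentheses; letters are left alone
encoded = quote(title)

url = "http://music.search.cyworld.com/cymusic/search.html?query=" + encoded + "&v=1"
print(url)
```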
full debug log


link to music scraper

If I put the same address into a web browser, it works fine. XBMC doesn't seem to get anything after that.
I tried scrap.exe after changing a couple of tags so it would run, and it seems to work OK: it builds all the XML properly. I don't know why XBMC doesn't.

I don't think it's a bug in XBMC, since other music scrapers work. Please give me some ideas.
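One way to narrow down why the browser and scrap.exe get results while XBMC does not is to fetch the page with the same headers XBMC sent and save the raw response, then compare that file with what the browser shows. This is a minimal Python sketch, not part of XBMC; the User-Agent and Accept values are copied from the log, while `build_request`, `fetch_like_xbmc` and the output filename are hypothetical names made up for illustration.

```python
import urllib.request

# User-Agent copied verbatim from the XBMC debug log above
XBMC_UA = ("XBMC/pre-9.04 r18650 (Windows; Windows XP Professional "
           "Service Pack 2 build 2600; http://www.xbmc.org)")


def build_request(url):
    """Build a plain GET request carrying the headers XBMC sent (hypothetical helper)."""
    return urllib.request.Request(url, headers={"User-Agent": XBMC_UA, "Accept": "*/*"})


def fetch_like_xbmc(url, out_path="xbmc_response.html"):
    """Download url with XBMC's headers and save the raw bytes for inspection."""
    with urllib.request.urlopen(build_request(url), timeout=30) as resp:
        body = resp.read()
    with open(out_path, "wb") as f:
        f.write(body)
    return len(body)
```

Calling `fetch_like_xbmc()` with the search URL from the log writes the response to disk, where it can be diffed against the browser's "view source" or fed to scrap.exe.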

Thanks
Young-cho
spiff
Retired Developer
Posts: 12,386
Joined: Nov 2003
Post: #2
Hi,

Since I don't read Korean, it's hard for me to locate the source where you found the form. I suspect you might need to POST the form rather than submit it as a GET request.
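In HTTP terms, the difference being suggested looks like this: a GET puts the form fields in the URL, while a POST sends them in the request body. A minimal Python sketch using the standard library; the field name `query` and the action URL come from the thread, everything else is illustrative.

```python
from urllib.parse import urlencode
from urllib.request import Request

action = "http://music.search.cyworld.com/cymusic/search.html"
fields = {"query": "Gee (The First Mini Album)"}

# urlencode() writes spaces as '+', whereas the XBMC log shows %20;
# servers usually accept either form in a query string.
encoded = urlencode(fields)

# GET: fields are appended to the URL as a query string
get_req = Request(action + "?" + encoded)

# POST: the same fields are sent, form-encoded, in the request body
post_req = Request(action, data=encoded.encode("ascii"))

print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.data)
```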
kimp93
Post: #3
I really appreciate your comment; I was totally lost.
I did try "post" and it doesn't seem to work.
After your comment, I looked at the HTML more carefully.

From http://music.search.cyworld.com/cymusic/search.html
[HTML]<script type="text/javascript" src="http://music.cyworld.com/common/cybgm_snb_script.asp?query=&v=1"></script>
[/HTML]

From http://music.cyworld.com/common/cybgm_snb_script.asp

[HTML] + ' <form name="search" id="search" autocomplete="off" action="http://music.search.cyworld.com/cymusic/search.html">'
+ ' <p id="selectTxt" onclick="ct_toggle();">전체</p>'   // 전체 = "All"
+ " <input type=\"text\" class=\"text\" name=\"query\" id=\"query\" onclick=\"ac_toggle(this);\" maxlength=\"100\" onKeyPress=\"if( event.keyCode == 13 ) { go_search(); return false; }\" title=\"검색어 입력\" value=\"\" onblur=\"toggleSearchBar(0);\" onfocus=\"toggleSearchBar(1);\" />"   // 검색어 입력 = "Enter search term"
+ ' <input type="button" class="btn" title="검색" onclick="go_search();" />'   // 검색 = "Search"
+ ' </form>'[/HTML]

I don't know much about JavaScript, but based on this, they don't seem to use POST.
After the form there is more JavaScript, probably for AJAX auto-completion, though I'm not sure it matters.
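On the "they don't seem to use POST" point: when a `<form>` has no `method` attribute, HTML treats it as GET, so a normal submit would append `query=...` to the `action` URL. The button actually calls `go_search()`, whose source isn't shown, so that function could still build the request differently. Below is a small Python sketch using the standard `html.parser` module to read the form's attributes; the markup is a trimmed copy of the form above with the JavaScript string quoting removed, and `FormInspector` is a hypothetical helper.

```python
from html.parser import HTMLParser

# Trimmed copy of the search form from cybgm_snb_script.asp
FORM_HTML = (
    '<form name="search" id="search" autocomplete="off" '
    'action="http://music.search.cyworld.com/cymusic/search.html">'
    '<input type="text" class="text" name="query" id="query" maxlength="100" value="" />'
    '</form>'
)


class FormInspector(HTMLParser):
    """Collect the form's action/method and the names of its input fields."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.method = None
        self.inputs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.action = attrs.get("action")
            # No method attribute means the browser submits with GET
            self.method = (attrs.get("method") or "get").lower()
        elif tag == "input" and "name" in attrs:
            self.inputs.append(attrs["name"])


p = FormInspector()
p.feed(FORM_HTML)
print(p.method, p.action, p.inputs)
```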
kimp93
Post: #4
I've given up on the previous music scraper, so I made a new one for a site I'm familiar with.

I'm trying to scrape the same site (DAUM) that my earlier movie scraper uses.
However, the same thing is happening as before: I can't get any search results.

Here is part of the scraper:

Code:
    <CreateAlbumSearchUrl dest="3">
        <RegExp input="$$1" output="&lt;url&gt;http://music.daum.net/search/album.do?query=\1&lt;/url&gt;" dest="3">
            <expression noclean="1"></expression>
        </RegExp>
    </CreateAlbumSearchUrl>
    <GetAlbumSearchResults dest="8" SearchStringEncoding="UTF-8">
        <RegExp input="$$5" output="&lt;results&gt;\1&lt;/results&gt;" dest="8">
            <RegExp input="$$1" output="&lt;entity&gt;&lt;artist&gt;\3&lt;/artist&gt;&lt;title&gt;\2&lt;/title&gt;&lt;url&gt;\1&lt;/url&gt;&lt;/entity&gt;" dest="5">
                <expression repeat="yes">&lt;a href=&quot;(.[^&quot;]*)&quot; class=&quot;fl&quot;&gt;(.[^\n]*)\n[^\:]*\:[^\:]*\:[^&gt;]*&gt;(.[^&lt;]*)&lt;</expression>
            </RegExp>
            <expression noclean="1"></expression>
        </RegExp>        
    </GetAlbumSearchResults>

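To check the results regex outside XBMC, here is a rough Python emulation of the nested `<RegExp>` blocks: the inner one runs the repeating expression over buffer 1 (the downloaded page) and writes one `<entity>` per match into buffer 5, and the outer one wraps buffer 5 in `<results>`. Several assumptions apply: that XBMC's regex engine handles this pattern the same way Python's `re` does, that an empty `<expression>` captures the whole input as `\1`, and it ignores XBMC's default cleaning of captured fields. The sample HTML is invented to fit the pattern, not copied from Daum, so the real test is feeding in a saved copy of the actual results page. `album_search_results` is a hypothetical name.

```python
import html
import re

# Inner <expression>, copied verbatim from the scraper (still XML-escaped)
EXPR_XML = r'&lt;a href=&quot;(.[^&quot;]*)&quot; class=&quot;fl&quot;&gt;(.[^\n]*)\n[^\:]*\:[^\:]*\:[^&gt;]*&gt;(.[^&lt;]*)&lt;'

# The XML parser decodes the entities, so this is the regex actually applied
PATTERN = re.compile(html.unescape(EXPR_XML))

# Invented sample markup shaped to fit the pattern (NOT real Daum HTML)
SAMPLE_PAGE = (
    '<li>\n'
    '<a href="http://music.daum.net/album/example1" class="fl">Gee (The First Mini Album)\n'
    '<span>Artist</span>: <a href="http://music.daum.net/artist/example1">소녀시대</a>\n'
    '</li>\n<li>\n'
    '<a href="http://music.daum.net/album/example2" class="fl">Another Album\n'
    '<span>Artist</span>: <a href="http://music.daum.net/artist/example2">Another Artist</a>\n'
    '</li>\n'
)


def album_search_results(page):
    """Rough emulation of GetAlbumSearchResults: buffer 1 -> buffer 5 -> buffer 8."""
    buffers = {1: page}
    # Inner RegExp (repeat="yes"): one <entity> per match, \3=artist, \2=title, \1=url
    buffers[5] = "".join(
        "<entity><artist>%s</artist><title>%s</title><url>%s</url></entity>"
        % (m.group(3), m.group(2), m.group(1))
        for m in PATTERN.finditer(buffers[1])
    )
    # Outer RegExp: empty expression assumed to capture all of buffer 5 as \1
    buffers[8] = "<results>%s</results>" % buffers[5]
    return buffers[8]


print(album_search_results(SAMPLE_PAGE))
```

If the saved real page produces `<results></results>`, the pattern is not matching Daum's markup (line breaks, extra colons, or a different `class` value would all break it), which would explain the empty search.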

Here is a debug log


Code:
14:22:09 T:3828 M:232128512   DEBUG: thread start, auto delete: 0
14:22:09 T:3828 M:232022016   DEBUG: FileCurl::Open(0012D364) http://music.daum.net/search/album.do?query=Gee%20%28The%20First%20Mini%20Album%29
14:22:09 T:3828 M:232017920    INFO: XCURL::DllLibCurlGlobal::easy_aquire - Created session to http://music.daum.net
14:22:11 T:3828 M:231038976   DEBUG: FileCurl::Close(0012D364) http://music.daum.net/search/album.do?query=Gee%20%28The%20First%20Mini%20Album%29
14:22:11 T:3828 M:230985728   DEBUG: Thread 3828 terminating
14:22:11 T:3372 M:231096320    INFO: Loading skin file: DialogOK.xml
14:22:11 T:3372 M:231092224   DEBUG: Load DialogOK.xml: 2.17ms



Once again, I could get search results in Firefox using the URL from the debug log.

Since the movie scraper for the same site (daum.xml) works OK, I don't know why the same site with a similar search URL doesn't work in a music scraper.
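One difference worth ruling out between the two scrapers is the character encoding of the search string. The title in the log is plain ASCII, so it comes out the same in any encoding, but a Korean title does not. The sketch below shows the same query percent-encoded as UTF-8 and as EUC-KR; it assumes `SearchStringEncoding` controls which of these XBMC uses for `$$1`, and which encoding Daum's music search expects is left open here. `encode_query` is a hypothetical helper.

```python
from urllib.parse import quote


def encode_query(text, encoding):
    """Percent-encode a search string after converting it to the given encoding."""
    return quote(text, encoding=encoding)


for title in ["Gee (The First Mini Album)", "소녀시대"]:
    for enc in ["utf-8", "euc-kr"]:
        print(enc.ljust(7), encode_query(title, enc))
```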

http://music.daum.net/search/album.do?qu...20Album%29


If anything else is needed to solve the problem, please let me know.

Here is a link to the scraper and a sample.
download
(This post was last modified: 2009-05-15 20:58 by kimp93.)