Task force/China/Public recognition for top websites in China

    From Strategic Planning

    The Chinese Internet market differs from the US market, so a good understanding of it is helpful. To that end, we surveyed the public recognition of the top websites in China.

    Survey method

    The method is relatively cheap and easy: we use the number of pages in the Google index that mention a brand as a proxy for that brand's popularity.

    The brands considered come from the Alexa top 60 websites; some sites were merged or filtered out because Alexa's data for them was obviously duplicated or biased. The brands are: 百度, QQ, 谷歌, 新浪, 淘宝, 网易, 搜狐, 开心网, 优酷, 土豆网, soso, 雅虎, 天涯, 人人网, 搜房, 凤凰网, MSN, 迅雷, 搜狗, 猫扑, 我乐网, 新华网, 阿里巴巴, hao123, tom, 豆瓣, 我要啦, 人民网, 和讯网, 东方财富, 北青网, 天极网, 有道, IT168, VeryCD, CSDN, 51job, 维基百科, 百度百科, 互动百科.

    We then send a query for each brand to Google, restricted to Tianya, the largest online forum in China. The number of indexed pages returned reflects the popularity of that brand.
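
    The restriction to Tianya relies on Google's site: and inurl: operators. As a rough sketch of how one such query URL could be assembled (the helper name tianya_query_url is hypothetical; the forum path is the one used in this survey's queries):

```ruby
require 'cgi'

# Hypothetical helper: builds the Google search URL for one brand.
# The site: and inurl: restrictions limit results to Tianya's public forum.
def tianya_query_url(brand)
  terms = [
    'site:www.tianya.cn',
    'inurl:http://www.tianya.cn/publicforum/',
    %("#{brand.strip}") # quotes request an exact-phrase match
  ]
  # CGI.escape percent-encodes the operators and turns spaces into "+".
  'http://www.google.com/search?q=' + CGI.escape(terms.join(' '))
end

puts tianya_query_url('QQ')
```

    Stripping whitespace before quoting the brand keeps a stray space from ending up inside the exact-phrase query.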

    The program that performs the queries is shown below:

    require 'rubygems'
    require 'hpricot'
    require 'open-uri'
    require 'cgi'

    # Brand names to look up; each is sent to Google as an exact phrase.
    brands = ['百度','QQ','谷歌','新浪','淘宝','网易','搜狐','开心网','优酷','土豆网','soso',
              '雅虎','天涯','人人网','搜房','凤凰网','MSN','迅雷','搜狗','猫扑','我乐网','新华网','阿里巴巴',
              'hao123','tom','豆瓣','我要啦','人民网','和讯网','东方财富','北青网','天极网','有道',
              'it168','verycd','csdn','51job','维基百科','百度百科','互动百科']

    # Search prefix that restricts results to Tianya's public forum pages.
    base = "http://www.google.com/search?q=site%3Awww.tianya.cn+inurl%3Ahttp%3A%2F%2Fwww.tianya.cn%2Fpublicforum%2F+%22"

    puts "=================================="
    brands.each do |b|
      query = CGI::escape(b)
      # Fetch the result page via open-uri and parse it with Hpricot.
      doc = URI.parse(base + query + "%22").open { |f| Hpricot(f) }
      # The code assumes the third <b> inside p#resultStats holds the total count.
      stats = doc.search("//p[@id='resultStats']")[0]
      result = stats ? stats.search("b") : []
      if result.size > 2
        puts b + ": " + /(\d|,)+/.match(result[2].to_html)[0]
      else
        puts b + ": 0"
      end
      # Wait 15 to 24 seconds between requests to keep the request rate low.
      sleep(15 + rand(10))
    end
    puts "=================================="
    

    Result

    Brands with more than 10,000 indexed pages are listed below. Because the queries are restricted to Tianya, the count for Tianya itself is biased, so it has been removed from the results.

    Nov 8, 2009

    1. MSN 4620000
    2. QQ 3960000
    3. 百度 457000
    4. 淘宝 324000
    5. 新浪 200000
    6. 搜狐 171000
    7. 网易 104000
    8. 迅雷 85300
    9. 新华网 79000
    10. 土豆(tudou) 65500
    11. tom 61700
    12. 人民网 50300
    13. 谷歌 50100
    14. 猫扑 49300
    15. 阿里巴巴 32800
    16. 豆瓣 30300
    17. 雅虎 29500
    18. 优酷 27400
    19. 凤凰网 19100
    20. 搜狗 18700
    21. 百度百科 18700
    22. verycd 13300
    23. 开心网 13100
    24. 维基百科 11400
    25. 搜房 10800

    April 7, 2010

    1. QQ 7490000
    2. 百度 2830000
    3. MSN 2190000
    4. 淘宝 1750000
    5. 谷歌 1520000
    6. 新浪 1140000
    7. 猫扑 795000
    8. 新华网 755000
    9. verycd 764000
    10. 人民网 703000
    11. 网易 499000
    12. 搜房 475000
    13. 阿里巴巴 461000
    14. 搜狐 428000
    15. 凤凰网 410000
    16. 百度百科 365000
    17. Tom 271000
    18. 迅雷 244000
    19. 开心网 221000
    20. 雅虎 148000
    21. 土豆 117000
    22. 豆瓣 74100
    23. 优酷 56900
    24. 和讯网 56600
    25. 搜狗 39700
    26. 维基百科 37700
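
    The filtering applied to the lists above (more than 10,000 indexed pages, Tianya excluded) could be automated. A minimal sketch, assuming the query program's output is saved as lines of the form "brand: count"; the function name rank and the two placeholder lines are illustrative and are not survey data:

```ruby
# Hypothetical post-processing step: turns lines like "MSN: 4,620,000"
# into [brand, count] pairs, drops excluded or low-count brands, and
# sorts the remainder by count in descending order.
def rank(lines, threshold = 10_000, excluded = ['天涯'])
  pairs = lines.map do |line|
    brand, count = line.strip.split(': ', 2)
    [brand, count.to_s.delete(',').to_i] # "4,620,000" -> 4620000
  end
  kept = pairs.reject do |brand, count|
    excluded.include?(brand) || count <= threshold
  end
  kept.sort_by { |_, count| -count }
end

sample = [
  '搜房: 10,800',    # from the Nov 8, 2009 list
  'MSN: 4,620,000',  # from the Nov 8, 2009 list
  'QQ: 3,960,000',   # from the Nov 8, 2009 list
  '天涯: 1',         # placeholder value, not survey data
  'example: 0'       # placeholder for a brand with no hits
]

rank(sample).each_with_index do |(brand, count), i|
  puts "#{i + 1}. #{brand} #{count}"
end
```

    Passing the threshold and the exclusion list as parameters makes it easy to rerun the ranking with a different cut-off.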