release 2016.09.11

[tvplay] Remove unused import
[tube8] Extract categories and tags (Closes #10579 )
2026-03-28 02:18:49 +00:00 · 2016-09-11 23:20:09 +07:00 · 2016-09-11 22:51:12 +07:00 · 2016-09-11 22:50:36 +07:00 · 2016-09-11 19:22:51 +08:00 · 2016-09-11 18:54:24 +08:00
23 changed files with 543 additions and 114 deletions
--- a/.github/ISSUE_TEMPLATE.md
+++ b/.github/ISSUE_TEMPLATE.md
@@ -6,8 +6,8 @@

 ---

-### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.09.08*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.09.08**
+### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.09.11*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
+- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.09.11**

 ### Before submitting an *issue* make sure you have:
 - [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
 [debug] User config: []
 [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] youtube-dl version 2016.09.08
+[debug] youtube-dl version 2016.09.11
 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
 [debug] Proxy map: {}
--- a/9
+++ b/9
@@ -1,3 +1,12 @@
+version 2016.09.11
+
+Extractors
+ [pornhub] Extract categories and tags (#10499)
+ [foxnews] Support Fox News articles (#10598)
+* [iwara] Fix extraction after relaunch (#10462, #3215)
+* [newgrounds] Fix uploader extraction (#10584)
+
+
 version 2016.09.08

 Extractors
--- a/docs/supportedsites.md
+++ b/docs/supportedsites.md
@@ -247,7 +247,8 @@
 - **Formula1**
 - **FOX**
 - **Foxgay**
- - **FoxNews**: Fox News and Fox Business Video
+ - **foxnews**: Fox News and Fox Business Video
+ - **foxnews:article**
 - **foxnews:insider**
 - **FoxSports**
 - **france2.fr:generation-quoi**
@@ -326,6 +327,7 @@
 - **ivi**: ivi.ru
 - **ivi:compilation**: ivi.ru compilations
 - **ivideon**: Ivideon TV
+ - **Iwara**
 - **Izlesene**
 - **JeuxVideo**
 - **Jove**
@@ -339,6 +341,7 @@
 - **KarriereVideos**
 - **keek**
 - **KeezMovies**
+ - **Ketnet**
 - **KhanAcademy**
 - **KickStarter**
 - **KonserthusetPlay**
@@ -540,6 +543,7 @@
 - **podomatic**
 - **Pokemon**
 - **PolskieRadio**
+ - **PolskieRadioCategory**
 - **PornCom**
 - **PornHd**
 - **PornHub**: PornHub and Thumbzilla
@@ -701,9 +705,11 @@
 - **Telecinco**: telecinco.es, cuatro.com and mediaset.es
 - **Telegraaf**
 - **TeleMB**
+ - **TeleQuebec**
 - **TeleTask**
 - **Telewebion**
 - **TF1**
+ - **TFO**
 - **TheIntercept**
 - **ThePlatform**
 - **ThePlatformFeed**
@@ -725,7 +731,6 @@
 - **ToypicsUser**: Toypics user profile
 - **TrailerAddict** (Currently broken)
 - **Trilulilu**
- - **trollvids**
 - **TruTV**
 - **Tube8**
 - **TubiTv**
--- a/youtube_dl/extractor/abc.py
+++ b/youtube_dl/extractor/abc.py
@@ -100,6 +100,7 @@ class ABCIViewIE(InfoExtractor):
    IE_NAME = 'abc.net.au:iview'
    _VALID_URL = r'https?://iview\.abc\.net\.au/programs/[^/]+/(?P<id>[^/?#]+)'

+    # ABC iview programs are normally available for 14 days only.
    _TESTS = [{
        'url': 'http://iview.abc.net.au/programs/gardening-australia/FA1505V024S00',
        'md5': '979d10b2939101f0d27a06b79edad536',
@@ -112,6 +113,7 @@ class ABCIViewIE(InfoExtractor):
            'uploader_id': 'abc1',
            'timestamp': 1471719600,
        },
+        'skip': 'Video gone',
    }]

    def _real_extract(self, url):
--- a/youtube_dl/extractor/canalplus.py
+++ b/youtube_dl/extractor/canalplus.py
@@ -23,6 +23,7 @@ class CanalplusIE(InfoExtractor):
                                    (?:(?:www|m)\.)?canalplus\.fr|
                                    (?:www\.)?piwiplus\.fr|
                                    (?:www\.)?d8\.tv|
+                                    (?:www\.)?c8\.fr|
                                    (?:www\.)?d17\.tv|
                                    (?:www\.)?itele\.fr
                                )/(?:(?:[^/]+/)*(?P<display_id>[^/?#&]+))?(?:\?.*\bvid=(?P<vid>\d+))?|
@@ -35,6 +36,7 @@ class CanalplusIE(InfoExtractor):
        'canalplus': 'cplus',
        'piwiplus': 'teletoon',
        'd8': 'd8',
+        'c8': 'd8',
        'd17': 'd17',
        'itele': 'itele',
    }
--- a/youtube_dl/extractor/canvas.py
+++ b/youtube_dl/extractor/canvas.py
@@ -1,11 +1,13 @@
 from __future__ import unicode_literals

+import re
+
 from .common import InfoExtractor
 from ..utils import float_or_none


 class CanvasIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?canvas\.be/video/(?:[^/]+/)*(?P<id>[^/?#&]+)'
+    _VALID_URL = r'https?://(?:www\.)?(?P<site_id>canvas|een)\.be/(?:[^/]+/)*(?P<id>[^/?#&]+)'
    _TESTS = [{
        'url': 'http://www.canvas.be/video/de-afspraak/najaar-2015/de-afspraak-veilt-voor-de-warmste-week',
        'md5': 'ea838375a547ac787d4064d8c7860a6c',
@@ -38,22 +40,42 @@ class CanvasIE(InfoExtractor):
        'params': {
            'skip_download': True,
        }
+    }, {
+        'url': 'https://www.een.be/sorry-voor-alles/herbekijk-sorry-voor-alles',
+        'info_dict': {
+            'id': 'mz-ast-11a587f8-b921-4266-82e2-0bce3e80d07f',
+            'display_id': 'herbekijk-sorry-voor-alles',
+            'ext': 'mp4',
+            'title': 'Herbekijk Sorry voor alles',
+            'description': 'md5:8bb2805df8164e5eb95d6a7a29dc0dd3',
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'duration': 3788.06,
+        },
+        'params': {
+            'skip_download': True,
+        }
+    }, {
+        'url': 'https://www.canvas.be/check-point/najaar-2016/de-politie-uw-vriend',
+        'only_matching': True,
    }]

    def _real_extract(self, url):
-        display_id = self._match_id(url)
+        mobj = re.match(self._VALID_URL, url)
+        site_id, display_id = mobj.group('site_id'), mobj.group('id')

        webpage = self._download_webpage(url, display_id)

-        title = self._search_regex(
+        title = (self._search_regex(
            r'<h1[^>]+class="video__body__header__title"[^>]*>(.+?)</h1>',
-            webpage, 'title', default=None) or self._og_search_title(webpage)
+            webpage, 'title', default=None) or self._og_search_title(
+            webpage)).strip()

        video_id = self._html_search_regex(
            r'data-video=(["\'])(?P<id>.+?)\1', webpage, 'video id', group='id')

        data = self._download_json(
-            'https://mediazone.vrt.be/api/v1/canvas/assets/%s' % video_id, display_id)
+            'https://mediazone.vrt.be/api/v1/%s/assets/%s'
+            % (site_id, video_id), display_id)

        formats = []
        for target in data['targetUrls']:
--- a/youtube_dl/extractor/extractors.py
+++ b/youtube_dl/extractor/extractors.py
@@ -293,6 +293,7 @@ from .fox import FOXIE
 from .foxgay import FoxgayIE
 from .foxnews import (
    FoxNewsIE,
+    FoxNewsArticleIE,
    FoxNewsInsiderIE,
 )
 from .foxsports import FoxSportsIE
@@ -395,6 +396,7 @@ from .ivi import (
    IviCompilationIE
 )
 from .ivideon import IvideonIE
+from .iwara import IwaraIE
 from .izlesene import IzleseneIE
 from .jeuxvideo import JeuxVideoIE
 from .jove import JoveIE
@@ -407,6 +409,7 @@ from .kankan import KankanIE
 from .karaoketv import KaraoketvIE
 from .karrierevideos import KarriereVideosIE
 from .keezmovies import KeezMoviesIE
+from .ketnet import KetnetIE
 from .khanacademy import KhanAcademyIE
 from .kickstarter import KickStarterIE
 from .keek import KeekIE
@@ -670,7 +673,10 @@ from .pluralsight import (
 )
 from .podomatic import PodomaticIE
 from .pokemon import PokemonIE
-from .polskieradio import PolskieRadioIE
+from .polskieradio import (
+    PolskieRadioIE,
+    PolskieRadioCategoryIE,
+)
 from .porn91 import Porn91IE
 from .porncom import PornComIE
 from .pornhd import PornHdIE
@@ -861,10 +867,12 @@ from .telebruxelles import TeleBruxellesIE
 from .telecinco import TelecincoIE
 from .telegraaf import TelegraafIE
 from .telemb import TeleMBIE
+from .telequebec import TeleQuebecIE
 from .teletask import TeleTaskIE
 from .telewebion import TelewebionIE
 from .testurl import TestURLIE
 from .tf1 import TF1IE
+from .tfo import TFOIE
 from .theintercept import TheInterceptIE
 from .theplatform import (
    ThePlatformIE,
@@ -893,7 +901,6 @@ from .toutv import TouTvIE
 from .toypics import ToypicsUserIE, ToypicsIE
 from .traileraddict import TrailerAddictIE
 from .trilulilu import TriluliluIE
-from .trollvids import TrollvidsIE
 from .trutv import TruTVIE
 from .tube8 import Tube8IE
 from .tubitv import TubiTvIE
--- a/youtube_dl/extractor/foxnews.py
+++ b/youtube_dl/extractor/foxnews.py
@@ -7,6 +7,7 @@ from .common import InfoExtractor


 class FoxNewsIE(AMPIE):
+    IE_NAME = 'foxnews'
    IE_DESC = 'Fox News and Fox Business Video'
    _VALID_URL = r'https?://(?P<host>video\.(?:insider\.)?fox(?:news|business)\.com)/v/(?:video-embed\.html\?video_id=)?(?P<id>\d+)'
    _TESTS = [
@@ -66,6 +67,35 @@ class FoxNewsIE(AMPIE):
        return info


+class FoxNewsArticleIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?foxnews\.com/(?!v)([^/]+/)+(?P<id>[a-z-]+)'
+    IE_NAME = 'foxnews:article'
+
+    _TEST = {
+        'url': 'http://www.foxnews.com/politics/2016/09/08/buzz-about-bud-clinton-camp-denies-claims-wore-earpiece-at-forum.html',
+        'md5': '62aa5a781b308fdee212ebb6f33ae7ef',
+        'info_dict': {
+            'id': '5116295019001',
+            'ext': 'mp4',
+            'title': 'Trump and Clinton asked to defend positions on Iraq War',
+            'description': 'Veterans react on \'The Kelly File\'',
+            'timestamp': 1473299755,
+            'upload_date': '20160908',
+        },
+    }
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+        webpage = self._download_webpage(url, display_id)
+
+        video_id = self._html_search_regex(
+            r'data-video-id=([\'"])(?P<id>[^\'"]+)\1',
+            webpage, 'video ID', group='id')
+        return self.url_result(
+            'http://video.foxnews.com/v/' + video_id,
+            FoxNewsIE.ie_key())
+
+
 class FoxNewsInsiderIE(InfoExtractor):
    _VALID_URL = r'https?://insider\.foxnews\.com/([^/]+/)+(?P<id>[a-z-]+)'
    IE_NAME = 'foxnews:insider'
@@ -83,6 +113,10 @@ class FoxNewsInsiderIE(InfoExtractor):
            'upload_date': '20160825',
            'thumbnail': 're:^https?://.*\.jpg$',
        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
        'add_ie': [FoxNewsIE.ie_key()],
    }

--- a/youtube_dl/extractor/iwara.py
+++ b/youtube_dl/extractor/iwara.py
@@ -0,0 +1,77 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..compat import compat_urllib_parse_urlparse
+from ..utils import remove_end
+
+
+class IwaraIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.|ecchi\.)?iwara\.tv/videos/(?P<id>[a-zA-Z0-9]+)'
+    _TESTS = [{
+        'url': 'http://iwara.tv/videos/amVwUl1EHpAD9RD',
+        'md5': '1d53866b2c514b23ed69e4352fdc9839',
+        'info_dict': {
+            'id': 'amVwUl1EHpAD9RD',
+            'ext': 'mp4',
+            'title': '【MMD R-18】ガールフレンド carry_me_off',
+            'age_limit': 18,
+        },
+    }, {
+        'url': 'http://ecchi.iwara.tv/videos/Vb4yf2yZspkzkBO',
+        'md5': '7e5f1f359cd51a027ba4a7b7710a50f0',
+        'info_dict': {
+            'id': '0B1LvuHnL-sRFNXB1WHNqbGw4SXc',
+            'ext': 'mp4',
+            'title': '[3D Hentai] Kyonyu Ã\x97 Genkai Ã\x97 Emaki Shinobi Girls.mp4',
+            'age_limit': 18,
+        },
+        'add_ie': ['GoogleDrive'],
+    }, {
+        'url': 'http://www.iwara.tv/videos/nawkaumd6ilezzgq',
+        'md5': '1d85f1e5217d2791626cff5ec83bb189',
+        'info_dict': {
+            'id': '6liAP9s2Ojc',
+            'ext': 'mp4',
+            'age_limit': 0,
+            'title': '[MMD] Do It Again Ver.2 [1080p 60FPS] (Motion,Camera,Wav+DL)',
+            'description': 'md5:590c12c0df1443d833fbebe05da8c47a',
+            'upload_date': '20160910',
+            'uploader': 'aMMDsork',
+            'uploader_id': 'UCVOFyOSCyFkXTYYHITtqB7A',
+        },
+        'add_ie': ['Youtube'],
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage, urlh = self._download_webpage_handle(url, video_id)
+
+        hostname = compat_urllib_parse_urlparse(urlh.geturl()).hostname
+        # ecchi is 'sexy' in Japanese
+        age_limit = 18 if hostname.split('.')[0] == 'ecchi' else 0
+
+        entries = self._parse_html5_media_entries(url, webpage, video_id)
+
+        if not entries:
+            iframe_url = self._html_search_regex(
+                r'<iframe[^>]+src=([\'"])(?P<url>[^\'"]+)\1',
+                webpage, 'iframe URL', group='url')
+            return {
+                '_type': 'url_transparent',
+                'url': iframe_url,
+                'age_limit': age_limit,
+            }
+
+        title = remove_end(self._html_search_regex(
+            r'<title>([^<]+)</title>', webpage, 'title'), ' | Iwara')
+
+        info_dict = entries[0]
+        info_dict.update({
+            'id': video_id,
+            'title': title,
+            'age_limit': age_limit,
+        })
+
+        return info_dict
--- a/youtube_dl/extractor/ketnet.py
+++ b/youtube_dl/extractor/ketnet.py
@@ -0,0 +1,52 @@
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+
+
+class KetnetIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?ketnet\.be/(?:[^/]+/)*(?P<id>[^/?#&]+)'
+    _TESTS = [{
+        'url': 'https://www.ketnet.be/kijken/zomerse-filmpjes',
+        'md5': 'd907f7b1814ef0fa285c0475d9994ed7',
+        'info_dict': {
+            'id': 'zomerse-filmpjes',
+            'ext': 'mp4',
+            'title': 'Gluur mee op de filmset en op Pennenzakkenrock',
+            'description': 'Gluur mee met Ghost Rockers op de filmset',
+            'thumbnail': 're:^https?://.*\.jpg$',
+        }
+    }, {
+        'url': 'https://www.ketnet.be/kijken/karrewiet/uitzending-8-september-2016',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.ketnet.be/achter-de-schermen/sien-repeteert-voor-stars-for-life',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, video_id)
+
+        config = self._parse_json(
+            self._search_regex(
+                r'(?s)playerConfig\s*=\s*({.+?})\s*;', webpage,
+                'player config'),
+            video_id)
+
+        title = config['title']
+
+        formats = self._extract_m3u8_formats(
+            config['source']['hls'], video_id, 'mp4',
+            entry_protocol='m3u8_native', m3u8_id='hls')
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': config.get('description'),
+            'thumbnail': config.get('image'),
+            'series': config.get('program'),
+            'episode': config.get('episode'),
+            'formats': formats,
+        }
--- a/youtube_dl/extractor/lrt.py
+++ b/youtube_dl/extractor/lrt.py
@@ -1,8 +1,11 @@
 # coding: utf-8
 from __future__ import unicode_literals

+import re
+
 from .common import InfoExtractor
 from ..utils import (
+    determine_ext,
    int_or_none,
    parse_duration,
    remove_end,
@@ -12,8 +15,10 @@ from ..utils import (
 class LRTIE(InfoExtractor):
    IE_NAME = 'lrt.lt'
    _VALID_URL = r'https?://(?:www\.)?lrt\.lt/mediateka/irasas/(?P<id>[0-9]+)'
-    _TEST = {
+    _TESTS = [{
+        # m3u8 download
        'url': 'http://www.lrt.lt/mediateka/irasas/54391/',
+        'md5': 'fe44cf7e4ab3198055f2c598fc175cb0',
        'info_dict': {
            'id': '54391',
            'ext': 'mp4',
@@ -23,20 +28,45 @@ class LRTIE(InfoExtractor):
            'view_count': int,
            'like_count': int,
        },
-        'params': {
-            'skip_download': True,  # m3u8 download
+    }, {
+        # direct mp3 download
+        'url': 'http://www.lrt.lt/mediateka/irasas/1013074524/',
+        'md5': '389da8ca3cad0f51d12bed0c844f6a0a',
+        'info_dict': {
+            'id': '1013074524',
+            'ext': 'mp3',
+            'title': 'Kita tema 2016-09-05 15:05',
+            'description': 'md5:1b295a8fc7219ed0d543fc228c931fb5',
+            'duration': 3008,
+            'view_count': int,
+            'like_count': int,
        },
-    }
+    }]

    def _real_extract(self, url):
        video_id = self._match_id(url)
        webpage = self._download_webpage(url, video_id)

        title = remove_end(self._og_search_title(webpage), ' - LRT')
-        m3u8_url = self._search_regex(
-            r'file\s*:\s*(["\'])(?P<url>.+?)\1\s*\+\s*location\.hash\.substring\(1\)',
-            webpage, 'm3u8 url', group='url')
-        formats = self._extract_m3u8_formats(m3u8_url, video_id, 'mp4')
+
+        formats = []
+        for _, file_url in re.findall(
+                r'file\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage):
+            ext = determine_ext(file_url)
+            if ext not in ('m3u8', 'mp3'):
+                continue
+            # mp3 served as m3u8 produces stuttered media file
+            if ext == 'm3u8' and '.mp3' in file_url:
+                continue
+            if ext == 'm3u8':
+                formats.extend(self._extract_m3u8_formats(
+                    file_url, video_id, 'mp4', entry_protocol='m3u8_native',
+                    fatal=False))
+            elif ext == 'mp3':
+                formats.append({
+                    'url': file_url,
+                    'vcodec': 'none',
+                })
        self._sort_formats(formats)

        thumbnail = self._og_search_thumbnail(webpage)
--- a/youtube_dl/extractor/newgrounds.py
+++ b/youtube_dl/extractor/newgrounds.py
@@ -1,15 +1,12 @@
 from __future__ import unicode_literals

-import json
-import re
-
 from .common import InfoExtractor


 class NewgroundsIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?newgrounds\.com/(?:audio/listen|portal/view)/(?P<id>[0-9]+)'
    _TESTS = [{
-        'url': 'http://www.newgrounds.com/audio/listen/549479',
+        'url': 'https://www.newgrounds.com/audio/listen/549479',
        'md5': 'fe6033d297591288fa1c1f780386f07a',
        'info_dict': {
            'id': '549479',
@@ -18,7 +15,7 @@ class NewgroundsIE(InfoExtractor):
            'uploader': 'Burn7',
        }
    }, {
-        'url': 'http://www.newgrounds.com/portal/view/673111',
+        'url': 'https://www.newgrounds.com/portal/view/673111',
        'md5': '3394735822aab2478c31b1004fe5e5bc',
        'info_dict': {
            'id': '673111',
@@ -29,24 +26,20 @@ class NewgroundsIE(InfoExtractor):
    }]

    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        music_id = mobj.group('id')
-        webpage = self._download_webpage(url, music_id)
+        media_id = self._match_id(url)
+        webpage = self._download_webpage(url, media_id)

        title = self._html_search_regex(
            r'<title>([^>]+)</title>', webpage, 'title')

        uploader = self._html_search_regex(
-            [r',"artist":"([^"]+)",', r'[\'"]owner[\'"]\s*:\s*[\'"]([^\'"]+)[\'"],'],
-            webpage, 'uploader')
+            r'Author\s*<a[^>]+>([^<]+)', webpage, 'uploader', fatal=False)

-        music_url_json_string = self._html_search_regex(
-            r'({"url":"[^"]+"),', webpage, 'music url') + '}'
-        music_url_json = json.loads(music_url_json_string)
-        music_url = music_url_json['url']
+        music_url = self._parse_json(self._search_regex(
+            r'"url":("[^"]+"),', webpage, ''), media_id)

        return {
-            'id': music_id,
+            'id': media_id,
            'title': title,
            'url': music_url,
            'uploader': uploader,
--- a/youtube_dl/extractor/ninenow.py
+++ b/youtube_dl/extractor/ninenow.py
@@ -44,7 +44,20 @@ class NineNowIE(InfoExtractor):
        page_data = self._parse_json(self._search_regex(
            r'window\.__data\s*=\s*({.*?});', webpage,
            'page data'), display_id)
-        common_data = page_data.get('episode', {}).get('episode') or page_data.get('clip', {}).get('clip')
+
+        for kind in ('episode', 'clip'):
+            current_key = page_data.get(kind, {}).get(
+                'current%sKey' % kind.capitalize())
+            if not current_key:
+                continue
+            cache = page_data.get(kind, {}).get('%sCache' % kind, {})
+            if not cache:
+                continue
+            common_data = (cache.get(current_key) or list(cache.values())[0])[kind]
+            break
+        else:
+            raise ExtractorError('Unable to find video data')
+
        video_data = common_data['video']

        if video_data.get('drm'):
--- a/youtube_dl/extractor/openload.py
+++ b/youtube_dl/extractor/openload.py
@@ -60,7 +60,7 @@ class OpenloadIE(InfoExtractor):
            if j >= 33 and j <= 126:
                j = ((j + 14) % 94) + 33
            if idx == len(enc_data) - 1:
-                j += 1
+                j += 3
            video_url_chars += compat_chr(j)

        video_url = 'https://openload.co/stream/%s?mime=true' % ''.join(video_url_chars)
--- a/youtube_dl/extractor/parliamentliveuk.py
+++ b/youtube_dl/extractor/parliamentliveuk.py
@@ -1,53 +1,40 @@
 from __future__ import unicode_literals

-import re
-
 from .common import InfoExtractor


 class ParliamentLiveUKIE(InfoExtractor):
    IE_NAME = 'parliamentlive.tv'
    IE_DESC = 'UK parliament videos'
-    _VALID_URL = r'https?://www\.parliamentlive\.tv/Main/Player\.aspx\?(?:[^&]+&)*?meetingId=(?P<id>[0-9]+)'
+    _VALID_URL = r'https?://(?:www\.)?parliamentlive\.tv/Event/Index/(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'

    _TEST = {
-        'url': 'http://www.parliamentlive.tv/Main/Player.aspx?meetingId=15121&player=windowsmedia',
+        'url': 'http://parliamentlive.tv/Event/Index/c1e9d44d-fd6c-4263-b50f-97ed26cc998b',
        'info_dict': {
-            'id': '15121',
-            'ext': 'asf',
-            'title': 'hoc home affairs committee, 18 mar 2014.pm',
-            'description': 'md5:033b3acdf83304cd43946b2d5e5798d1',
+            'id': 'c1e9d44d-fd6c-4263-b50f-97ed26cc998b',
+            'ext': 'mp4',
+            'title': 'Home Affairs Committee',
+            'uploader_id': 'FFMPEG-01',
+            'timestamp': 1422696664,
+            'upload_date': '20150131',
        },
-        'params': {
-            'skip_download': True,  # Requires mplayer (mms)
-        }
    }

    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-        webpage = self._download_webpage(url, video_id)
-
-        asx_url = self._html_search_regex(
-            r'embed.*?src="([^"]+)" name="MediaPlayer"', webpage,
-            'metadata URL')
-        asx = self._download_xml(asx_url, video_id, 'Downloading ASX metadata')
-        video_url = asx.find('.//REF').attrib['HREF']
-
-        title = self._search_regex(
-            r'''(?x)player\.setClipDetails\(
-                (?:(?:[0-9]+|"[^"]+"),\s*){2}
-                "([^"]+",\s*"[^"]+)"
-                ''',
-            webpage, 'title').replace('", "', ', ')
-        description = self._html_search_regex(
-            r'(?s)<span id="MainContentPlaceHolder_CaptionsBlock_WitnessInfo">(.*?)</span>',
-            webpage, 'description')
-
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(
+            'http://vodplayer.parliamentlive.tv/?mid=' + video_id, video_id)
+        widget_config = self._parse_json(self._search_regex(
+            r'kWidgetConfig\s*=\s*({.+});',
+            webpage, 'kaltura widget config'), video_id)
+        kaltura_url = 'kaltura:%s:%s' % (widget_config['wid'][1:], widget_config['entry_id'])
+        event_title = self._download_json(
+            'http://parliamentlive.tv/Event/GetShareVideo/' + video_id, video_id)['event']['title']
        return {
+            '_type': 'url_transparent',
            'id': video_id,
-            'ext': 'asf',
-            'url': video_url,
-            'title': title,
-            'description': description,
+            'title': event_title,
+            'description': '',
+            'url': kaltura_url,
+            'ie_key': 'Kaltura',
        }
--- a/youtube_dl/extractor/polskieradio.py
+++ b/youtube_dl/extractor/polskieradio.py
@@ -1,14 +1,17 @@
 # coding: utf-8
 from __future__ import unicode_literals

+import itertools
 import re

 from .common import InfoExtractor
 from ..compat import (
    compat_str,
    compat_urllib_parse_unquote,
+    compat_urlparse
 )
 from ..utils import (
+    extract_attributes,
    int_or_none,
    strip_or_none,
    unified_timestamp,
@@ -97,3 +100,81 @@ class PolskieRadioIE(InfoExtractor):
        description = strip_or_none(self._og_search_description(webpage))

        return self.playlist_result(entries, playlist_id, title, description)
+
+
+class PolskieRadioCategoryIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?polskieradio\.pl/\d+(?:,[^/]+)?/(?P<id>\d+)'
+    _TESTS = [{
+        'url': 'http://www.polskieradio.pl/7/5102,HISTORIA-ZYWA',
+        'info_dict': {
+            'id': '5102',
+            'title': 'HISTORIA ŻYWA',
+        },
+        'playlist_mincount': 38,
+    }, {
+        'url': 'http://www.polskieradio.pl/7/4807',
+        'info_dict': {
+            'id': '4807',
+            'title': 'Vademecum 1050. rocznicy Chrztu Polski'
+        },
+        'playlist_mincount': 5
+    }, {
+        'url': 'http://www.polskieradio.pl/7/129,Sygnaly-dnia?ref=source',
+        'only_matching': True
+    }, {
+        'url': 'http://www.polskieradio.pl/37,RedakcjaKatolicka/4143,Kierunek-Krakow',
+        'info_dict': {
+            'id': '4143',
+            'title': 'Kierunek Kraków',
+        },
+        'playlist_mincount': 61
+    }, {
+        'url': 'http://www.polskieradio.pl/10,czworka/214,muzyka',
+        'info_dict': {
+            'id': '214',
+            'title': 'Muzyka',
+        },
+        'playlist_mincount': 61
+    }, {
+        'url': 'http://www.polskieradio.pl/7,Jedynka/5102,HISTORIA-ZYWA',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.polskieradio.pl/8,Dwojka/196,Publicystyka',
+        'only_matching': True,
+    }]
+
+    @classmethod
+    def suitable(cls, url):
+        return False if PolskieRadioIE.suitable(url) else super(PolskieRadioCategoryIE, cls).suitable(url)
+
+    def _entries(self, url, page, category_id):
+        content = page
+        for page_num in itertools.count(2):
+            for a_entry, entry_id in re.findall(
+                    r'(?s)<article[^>]+>.*?(<a[^>]+href=["\']/\d+/\d+/Artykul/(\d+)[^>]+>).*?</article>',
+                    content):
+                entry = extract_attributes(a_entry)
+                href = entry.get('href')
+                if not href:
+                    continue
+                yield self.url_result(
+                    compat_urlparse.urljoin(url, href), PolskieRadioIE.ie_key(),
+                    entry_id, entry.get('title'))
+            mobj = re.search(
+                r'<div[^>]+class=["\']next["\'][^>]*>\s*<a[^>]+href=(["\'])(?P<url>(?:(?!\1).)+)\1',
+                content)
+            if not mobj:
+                break
+            next_url = compat_urlparse.urljoin(url, mobj.group('url'))
+            content = self._download_webpage(
+                next_url, category_id, 'Downloading page %s' % page_num)
+
+    def _real_extract(self, url):
+        category_id = self._match_id(url)
+        webpage = self._download_webpage(url, category_id)
+        title = self._html_search_regex(
+            r'<title>([^<]+) - [^<]+ - [^<]+</title>',
+            webpage, 'title', fatal=False)
+        return self.playlist_result(
+            self._entries(url, webpage, category_id),
+            category_id, title)
--- a/youtube_dl/extractor/pornhub.py
+++ b/youtube_dl/extractor/pornhub.py
@@ -15,6 +15,7 @@ from ..compat import (
 from ..utils import (
    ExtractorError,
    int_or_none,
+    js_to_json,
    orderedSet,
    sanitized_Request,
    str_to_int,
@@ -48,6 +49,8 @@ class PornHubIE(InfoExtractor):
            'dislike_count': int,
            'comment_count': int,
            'age_limit': 18,
+            'tags': list,
+            'categories': list,
        },
    }, {
        # non-ASCII title
@@ -63,6 +66,8 @@ class PornHubIE(InfoExtractor):
            'dislike_count': int,
            'comment_count': int,
            'age_limit': 18,
+            'tags': list,
+            'categories': list,
        },
        'params': {
            'skip_download': True,
@@ -183,6 +188,15 @@ class PornHubIE(InfoExtractor):
            })
        self._sort_formats(formats)

+        page_params = self._parse_json(self._search_regex(
+            r'page_params\.zoneDetails\[([\'"])[^\'"]+\1\]\s*=\s*(?P<data>{[^}]+})',
+            webpage, 'page parameters', group='data', default='{}'),
+            video_id, transform_source=js_to_json, fatal=False)
+        tags = categories = None
+        if page_params:
+            tags = page_params.get('tags', '').split(',')
+            categories = page_params.get('categories', '').split(',')
+
        return {
            'id': video_id,
            'uploader': video_uploader,
@@ -195,6 +209,8 @@ class PornHubIE(InfoExtractor):
            'comment_count': comment_count,
            'formats': formats,
            'age_limit': 18,
+            'tags': tags,
+            'categories': categories,
        }


--- a/youtube_dl/extractor/telequebec.py
+++ b/youtube_dl/extractor/telequebec.py
@@ -0,0 +1,36 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import int_or_none
+
+
+class TeleQuebecIE(InfoExtractor):
+    _VALID_URL = r'https?://zonevideo\.telequebec\.tv/media/(?P<id>\d+)'
+    _TEST = {
+        'url': 'http://zonevideo.telequebec.tv/media/20984/le-couronnement-de-new-york/couronnement-de-new-york',
+        'md5': 'fe95a0957e5707b1b01f5013e725c90f',
+        'info_dict': {
+            'id': '20984',
+            'ext': 'mp4',
+            'title': 'Le couronnement de New York',
+            'description': 'md5:f5b3d27a689ec6c1486132b2d687d432',
+            'upload_date': '20160220',
+            'timestamp': 1455965438,
+        }
+    }
+
+    def _real_extract(self, url):
+        media_id = self._match_id(url)
+        media_data = self._download_json(
+            'https://mnmedias.api.telequebec.tv/api/v2/media/' + media_id,
+            media_id)['media']
+        return {
+            '_type': 'url_transparent',
+            'id': media_id,
+            'url': 'limelight:media:' + media_data['streamInfo']['sourceId'],
+            'title': media_data['title'],
+            'description': media_data.get('descriptions', [{'text': None}])[0].get('text'),
+            'duration': int_or_none(media_data.get('durationInMilliseconds'), 1000),
+            'ie_key': 'LimelightMedia',
+        }
--- a/youtube_dl/extractor/tfo.py
+++ b/youtube_dl/extractor/tfo.py
@@ -0,0 +1,53 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import json
+
+from .common import InfoExtractor
+from ..utils import (
+    HEADRequest,
+    ExtractorError,
+    int_or_none,
+)
+
+
+class TFOIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?tfo\.org/(?:en|fr)/(?:[^/]+/){2}(?P<id>\d+)'
+    _TEST = {
+        'url': 'http://www.tfo.org/en/universe/tfo-247/100463871/video-game-hackathon',
+        'md5': '47c987d0515561114cf03d1226a9d4c7',
+        'info_dict': {
+            'id': '100463871',
+            'ext': 'mp4',
+            'title': 'Video Game Hackathon',
+            'description': 'md5:558afeba217c6c8d96c60e5421795c07',
+            'upload_date': '20160212',
+            'timestamp': 1455310233,
+        }
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        self._request_webpage(HEADRequest('http://www.tfo.org/'), video_id)
+        infos = self._download_json(
+            'http://www.tfo.org/api/web/video/get_infos', video_id, data=json.dumps({
+                'product_id': video_id,
+            }).encode(), headers={
+                'X-tfo-session': self._get_cookies('http://www.tfo.org/')['tfo-session'].value,
+            })
+        if infos.get('success') == 0:
+            raise ExtractorError('%s said: %s' % (self.IE_NAME, infos['msg']), expected=True)
+        video_data = infos['data']
+
+        return {
+            '_type': 'url_transparent',
+            'id': video_id,
+            'url': 'limelight:media:' + video_data['llid'],
+            'title': video_data['title'],
+            'description': video_data.get('description'),
+            'series': video_data.get('collection'),
+            'season_number': int_or_none(video_data.get('season')),
+            'episode_number': int_or_none(video_data.get('episode')),
+            'duration': int_or_none(video_data.get('duration')),
+            'ie_key': 'LimelightMedia',
+        }
--- a/youtube_dl/extractor/trollvids.py
+++ b/youtube_dl/extractor/trollvids.py
@@ -1,36 +0,0 @@
-# encoding: utf-8
-from __future__ import unicode_literals
-
-import re
-
-from .nuevo import NuevoBaseIE
-
-
-class TrollvidsIE(NuevoBaseIE):
-    _VALID_URL = r'https?://(?:www\.)?trollvids\.com/video/(?P<id>\d+)/(?P<display_id>[^/?#&]+)'
-    IE_NAME = 'trollvids'
-    _TEST = {
-        'url': 'http://trollvids.com/video/2349002/%E3%80%90MMD-R-18%E3%80%91%E3%82%AC%E3%83%BC%E3%83%AB%E3%83%95%E3%83%AC%E3%83%B3%E3%83%89-carrymeoff',
-        'md5': '1d53866b2c514b23ed69e4352fdc9839',
-        'info_dict': {
-            'id': '2349002',
-            'ext': 'mp4',
-            'title': '【MMD R-18】ガールフレンド carry_me_off',
-            'age_limit': 18,
-            'duration': 216.78,
-        },
-    }
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-        display_id = mobj.group('display_id')
-
-        info = self._extract_nuevo(
-            'http://trollvids.com/nuevo/player/config.php?v=%s' % video_id,
-            video_id)
-        info.update({
-            'display_id': display_id,
-            'age_limit': 18
-        })
-        return info
--- a/youtube_dl/extractor/tube8.py
+++ b/youtube_dl/extractor/tube8.py
@@ -1,5 +1,7 @@
 from __future__ import unicode_literals

+import re
+
 from ..utils import (
    int_or_none,
    str_to_int,
@@ -21,7 +23,13 @@ class Tube8IE(KeezMoviesIE):
            'title': 'Kasia music video',
            'age_limit': 18,
            'duration': 230,
+            'categories': ['Teen'],
+            'tags': ['dancing'],
+        },
+        'params': {
+            'proxy': '127.0.0.1:8118',
        }
+
    }, {
        'url': 'http://www.tube8.com/shemale/teen/blonde-cd-gets-kidnapped-by-two-blacks-and-punished-for-being-a-slutty-girl/19569151/',
        'only_matching': True,
@@ -51,6 +59,17 @@ class Tube8IE(KeezMoviesIE):
            r'<span id="allCommentsCount">(\d+)</span>',
            webpage, 'comment count', fatal=False))

+        category = self._search_regex(
+            r'Category:\s*</strong>\s*<a[^>]+href=[^>]+>([^<]+)',
+            webpage, 'category', fatal=False)
+        categories = [category] if category else None
+
+        tags_str = self._search_regex(
+            r'(?s)Tags:\s*</strong>(.+?)</(?!a)',
+            webpage, 'tags', fatal=False)
+        tags = [t for t in re.findall(
+            r'<a[^>]+href=[^>]+>([^<]+)', tags_str)] if tags_str else None
+
        info.update({
            'description': description,
            'uploader': uploader,
@@ -58,6 +77,8 @@ class Tube8IE(KeezMoviesIE):
            'like_count': like_count,
            'dislike_count': dislike_count,
            'comment_count': comment_count,
+            'categories': categories,
+            'tags': tags,
        })

        return info
--- a/youtube_dl/extractor/tvplay.py
+++ b/youtube_dl/extractor/tvplay.py
@@ -367,6 +367,10 @@ class ViafreeIE(InfoExtractor):
            'skip_download': True,
        },
        'add_ie': [TVPlayIE.ie_key()],
+    }, {
+        # Different og:image URL schema
+        'url': 'www.viafree.se/program/reality/sommaren-med-youtube-stjarnorna/sasong-1/avsnitt-2',
+        'only_matching': True,
    }, {
        'url': 'http://www.viafree.no/programmer/underholdning/det-beste-vorspielet/sesong-2/episode-1',
        'only_matching': True,
@@ -384,14 +388,35 @@ class ViafreeIE(InfoExtractor):

        webpage = self._download_webpage(url, video_id)

+        data = self._parse_json(
+            self._search_regex(
+                r'(?s)window\.App\s*=\s*({.+?})\s*;\s*</script',
+                webpage, 'data', default='{}'),
+            video_id, transform_source=lambda x: re.sub(
+                r'(?s)function\s+[a-zA-Z_][\da-zA-Z_]*\s*\([^)]*\)\s*{[^}]*}\s*',
+                'null', x), fatal=False)
+
        video_id = None

-        thumbnail = self._og_search_thumbnail(webpage, default=None)
-        if thumbnail:
-            video_id = self._search_regex(
-                r'https?://[^/]+/imagecache/(?:[^/]+/)+seasons/\d+/(\d{6,})/',
-                thumbnail, 'video id', default=None)
+        if data:
+            video_id = try_get(
+                data, lambda x: x['context']['dispatcher']['stores'][
+                    'ContentPageProgramStore']['currentVideo']['id'],
+                compat_str)

+        # Fallback #1 (extract from og:image URL schema)
+        if not video_id:
+            thumbnail = self._og_search_thumbnail(webpage, default=None)
+            if thumbnail:
+                video_id = self._search_regex(
+                    # Patterns seen:
+                    #  http://cdn.playapi.mtgx.tv/imagecache/600x315/cloud/content-images/inbox/765166/a2e95e5f1d735bab9f309fa345cc3f25.jpg
+                    #  http://cdn.playapi.mtgx.tv/imagecache/600x315/cloud/content-images/seasons/15204/758770/4a5ba509ca8bc043e1ebd1a76131cdf2.jpg
+                    r'https?://[^/]+/imagecache/(?:[^/]+/)+(\d{6,})/',
+                    thumbnail, 'video id', default=None)
+
+        # Fallback #2. Extract from raw JSON string.
+        # May extract wrong video id if relatedClips is present.
        if not video_id:
            video_id = self._search_regex(
                r'currentVideo["\']\s*:\s*.+?["\']id["\']\s*:\s*["\'](\d{6,})',
--- a/youtube_dl/version.py
+++ b/youtube_dl/version.py
@@ -1,3 +1,3 @@
 from __future__ import unicode_literals

-__version__ = '2016.09.08'
+__version__ = '2016.09.11'
Author	SHA1	Message	Date
Sergey M․	1c81476cbb	release 2016.09.11	2016-09-11 23:20:09 +07:00
Sergey M․	bc9186c882	[tvplay] Remove unused import	2016-09-11 22:51:12 +07:00
Sergey M․	6599c72527	[tube8] Extract categories and tags (Closes #10579 )	2016-09-11 22:50:36 +07:00
Yen Chi Hsuan	6bb05b32a9	[pornhub] Extract categories and tags (closes #10499 )	2016-09-11 19:22:51 +08:00
Yen Chi Hsuan	fea74acad8	[foxnews] Revert to old extractor names	2016-09-11 18:54:24 +08:00
Yen Chi Hsuan	f01115c933	[openload] Temporary fix (#10408 )	2016-09-11 18:36:59 +08:00
Yen Chi Hsuan	2cdbc06a1f	[foxnews] Support Fox News Articles (closes #10598 )	2016-09-11 18:32:45 +08:00
Sergey M․	2cb93afcd8	[viafree] Improve video id extraction (Closes #10615 )	2016-09-11 14:59:14 +07:00
Yen Chi Hsuan	bfcda07a27	[abc:iview] Skip the test. They are removed soon	2016-09-11 04:06:00 +08:00
Yen Chi Hsuan	001a5fd3d7	[iwara] Fix extraction after relaunch Closes #10462, closes #3215	2016-09-11 03:02:00 +08:00
Remita Amine	1e35999c1e	[tfo] Add new extractor	2016-09-10 19:43:31 +01:00
Sergey M․	2512b17493	[lrt] Fix audio extraction (Closes #10566 )	2016-09-11 01:27:20 +07:00
Sergey M․	56c0ead4d3	[9now] Improve video data extraction (Closes #10561 )	2016-09-11 00:42:13 +07:00
Scott Leggett	7324243750	[9now] Fix extraction	2016-09-11 00:16:29 +07:00
Sergey M․	84a18e9b90	[polskieradio:category] Improve extraction	2016-09-10 22:01:49 +07:00
Sergey M․	b29f842e0e	[canalplus] Add support for c8.fr (Closes #10577 )	2016-09-10 20:46:45 +07:00
Sergey M․	f009fcac0d	Merge branch 'master' of github.com:rg3/youtube-dl	2016-09-10 19:21:03 +07:00
Yen Chi Hsuan	6c3affcb18	[newgrounds] Fix uploader extraction Closes #10584 Also change test URLs to HTTPS, as proposed by @stepshal in #10593. Closes #10593	2016-09-10 20:09:09 +08:00
Sergey M․	1e19ff2984	Merge branch 'polskie-radio-programme' of https://github.com/JakubAdamWieczorek/youtube-dl	2016-09-10 00:42:36 +07:00
Sergey M․	c6129feb7f	[ketnet] Add extractor (Closes #10343 )	2016-09-09 23:20:45 +07:00
Sergey M․	bb5ebd4453	[canvas] Add support for een.be (Closes #10605 )	2016-09-09 22:16:21 +07:00
Remita Amine	cb9cbd84ed	[extractors] add import for TeleQuebecIE	2016-09-08 22:55:27 +01:00
Remita Amine	4d5726b0d7	[telequebec] Add new extractor(closes #1999 )	2016-09-08 22:53:44 +01:00
Remita Amine	4614ad7b59	[parliamentliveuk] fix extraction(closes #9137 )	2016-09-08 20:46:12 +01:00
Jakub Adam Wieczorek	8d3737cda7	[polskieradio] Add support for downloading whole programmes. This extends the Polskie Radio (the Polish national radio) extractor to enable the user to download all the broadcasts of a single programme.	2016-09-06 21:34:44 +02:00