1
0
mirror of https://source.netsyms.com/Mirrors/youtube-dl synced 2026-03-25 10:24:44 +00:00

Compare commits

...

34 Commits

Author SHA1 Message Date
Sergey M․
b1ce2ba197 release 2016.08.10 2016-08-10 00:20:44 +07:00
Sergey M․
5c8411e968 [ChangeLog] Actualize 2016-08-10 00:18:28 +07:00
Sergey M․
cc9c8ce5df [devscripts/prepare_manpage] Fix description strings starting with dash (Closes #10273) 2016-08-09 22:24:58 +07:00
Remita Amine
20ef4123b9 [uol] remove unused import 2016-08-09 15:13:15 +01:00
Remita Amine
4e62d26aa2 [uol] Add new extractor(#4263) 2016-08-09 15:09:08 +01:00
Sergey M․
b657816684 Credit @singh-pratyush96 for #10223 2016-08-09 04:04:45 +07:00
Sergey M․
9778b3e7ee Credit @zvonicek for #10242 and #10253 2016-08-09 04:03:52 +07:00
Sergey M․
25dd58ca6a [metadatafromtitle] Remove unused exception class 2016-08-09 04:01:05 +07:00
nyorain
5e42f8a0ad Make --metadata-from-title non fatal
Output a warning if the metadata can't be parsed from the title (and don't write any metadata) instead of raising a critical error.
2016-08-09 03:56:22 +07:00
Sergey M․
1ad6b891b2 Add more checks for --min/max-sleep-interval arguments and use more idiomatic naming 2016-08-09 03:47:56 +07:00
Sergey M․
7aa589a5e1 Fix --min/max-sleep-interval wording 2016-08-09 03:46:52 +07:00
singh-pratyush96
065bc35489 Add --max-sleep-interval (Closes #9930) 2016-08-09 03:32:42 +07:00
Sergey M․
3a380766d1 [rbmaradio] Improve, simplify and extract all formats (Closes #10242) 2016-08-09 02:46:29 +07:00
Petr Zvoníček
affaea0688 [rbmaradio] Fixed extractor 2016-08-09 02:18:33 +07:00
Sergey M․
77426a087b [sonyliv] Improve (Closes #10258) 2016-08-09 02:16:28 +07:00
Sukhbir Singh
8991844ea2 [sonyliv] Add new extractor 2016-08-09 02:09:13 +07:00
Sergey M․
082395d0a0 [extractor/generic] Add proper default to _search_json_ld call 2016-08-08 22:48:33 +07:00
Sergey M․
e8ed7354e6 [flipagram] Add proper default to _search_json_ld call 2016-08-08 22:46:19 +07:00
Sergey M․
1e7f602e2a [condenast] Make _search_json_ld call non fatal 2016-08-08 22:45:49 +07:00
Sergey M․
522f6c066d [bbc] Add proper default to _search_json_ld call 2016-08-08 22:44:36 +07:00
Sergey M․
321b5e082a [extractor/common] Respect default in _search_json_ld 2016-08-08 22:36:18 +07:00
Sergey M․
3711fa1eb2 Revert "[flipagram] Make _search_json_ld non fatal"
This reverts commit d34995a9e3.
2016-08-08 21:49:45 +07:00
Sergey M․
395c74615c Revert "[extractor/generic] Make _search_json_ld non fatal"
This reverts commit 958849275f.
2016-08-08 21:49:27 +07:00
Yen Chi Hsuan
3dc240e8c6 [sohu] Update _TESTS (closes #10260) 2016-08-08 18:48:21 +08:00
Yen Chi Hsuan
a41a6c5094 [chaturbate] Skip the invalid test 2016-08-08 13:06:02 +08:00
Yen Chi Hsuan
d71207121d [biqle] Skip an invalid test 2016-08-08 12:59:55 +08:00
Yen Chi Hsuan
b1c6f21c74 [aparat] Fix extraction 2016-08-08 12:59:07 +08:00
Yen Chi Hsuan
412abb8760 [bilibili] Update _TESTS 2016-08-08 12:57:17 +08:00
Yen Chi Hsuan
f17d5f6d14 [features.aol.com] Fix _TESTS 2016-08-08 12:52:36 +08:00
Remita Amine
6bb801cfaf [cwtv] extract http formats 2016-08-07 22:58:12 +01:00
Sergey M․
de02d1f4e9 [rozhlas] Fix regexes and improve extraction (Closes #10253) 2016-08-08 04:58:02 +07:00
Petr Zvoníček
e1f93a0a76 [rozhlas] Add new extractor 2016-08-08 04:41:45 +07:00
Charlie Le
d21a661bb4 [README.md] Update Options Link
The link references a bad anchor. The updated link now references the correct anchor.
2016-08-08 03:46:42 +07:00
Yen Chi Hsuan
b2bd968f4b [kuwo:singer] Fix extraction 2016-08-07 22:59:34 +08:00
31 changed files with 427 additions and 149 deletions

View File

@@ -6,8 +6,8 @@
---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.08.07*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.08.07**
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.08.10*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.08.10**
### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2016.08.07
[debug] youtube-dl version 2016.08.10
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

View File

@@ -179,3 +179,5 @@ Jakub Adam Wieczorek
Aleksandar Topuzović
Nehal Patel
Rob van Bekkum
Petr Zvoníček
Pratyush Singh

View File

@@ -46,7 +46,7 @@ Make sure that someone has not already opened the issue you're trying to open. S
### Why are existing options not enough?
Before requesting a new feature, please have a quick peek at [the list of supported options](https://github.com/rg3/youtube-dl/blob/master/README.md#synopsis). Many feature requests are for features that actually exist already! Please, absolutely do show off your work in the issue report and detail how the existing similar options do *not* solve your problem.
Before requesting a new feature, please have a quick peek at [the list of supported options](https://github.com/rg3/youtube-dl/blob/master/README.md#options). Many feature requests are for features that actually exist already! Please, absolutely do show off your work in the issue report and detail how the existing similar options do *not* solve your problem.
### Is there enough context in your bug report?

View File

@@ -1,3 +1,21 @@
version 2016.08.10
Core
* Make --metadata-from-title non fatal when title does not match the pattern
* Introduce options for randomized sleep before each download
--min-sleep-interval and --max-sleep-interval (#9930)
* Respect default in _search_json_ld
Extractors
+ [uol] Add extractor for uol.com.br (#4263)
* [rbmaradio] Fix extraction and extract all formats (#10242)
+ [sonyliv] Add extractor for sonyliv.com (#10258)
* [aparat] Fix extraction
* [cwtv] Extract HTTP formats
+ [rozhlas] Add extractor for prehravac.rozhlas.cz (#10253)
* [kuwo:singer] Fix extraction
version 2016.08.07
Core

View File

@@ -330,7 +330,15 @@ which means you can modify it, redistribute it or use it however you like.
bidirectional text support. Requires bidiv
or fribidi executable in PATH
--sleep-interval SECONDS Number of seconds to sleep before each
download.
download when used alone or a lower bound
of a range for randomized sleep before each
download (minimum possible number of
seconds to sleep) when used along with
--max-sleep-interval.
--max-sleep-interval SECONDS Upper bound of a range for randomized sleep
before each download (maximum possible
number of seconds to sleep). Must only be
used along with --min-sleep-interval.
## Video Format Options:
-f, --format FORMAT Video format code, see the "FORMAT
@@ -1196,7 +1204,7 @@ Make sure that someone has not already opened the issue you're trying to open. S
### Why are existing options not enough?
Before requesting a new feature, please have a quick peek at [the list of supported options](https://github.com/rg3/youtube-dl/blob/master/README.md#synopsis). Many feature requests are for features that actually exist already! Please, absolutely do show off your work in the issue report and detail how the existing similar options do *not* solve your problem.
Before requesting a new feature, please have a quick peek at [the list of supported options](https://github.com/rg3/youtube-dl/blob/master/README.md#options). Many feature requests are for features that actually exist already! Please, absolutely do show off your work in the issue report and detail how the existing similar options do *not* solve your problem.
### Is there enough context in your bug report?

View File

@@ -54,17 +54,21 @@ def filter_options(readme):
if in_options:
if line.lstrip().startswith('-'):
option, description = re.split(r'\s{2,}', line.lstrip())
split_option = option.split(' ')
split = re.split(r'\s{2,}', line.lstrip())
# Description string may start with `-` as well. If there is
# only one piece then it's a description bit not an option.
if len(split) > 1:
option, description = split
split_option = option.split(' ')
if not split_option[-1].startswith('-'): # metavar
option = ' '.join(split_option[:-1] + ['*%s*' % split_option[-1]])
if not split_option[-1].startswith('-'): # metavar
option = ' '.join(split_option[:-1] + ['*%s*' % split_option[-1]])
# Pandoc's definition_lists. See http://pandoc.org/README.html
# for more information.
ret += '\n%s\n: %s\n' % (option, description)
else:
ret += line.lstrip() + '\n'
# Pandoc's definition_lists. See http://pandoc.org/README.html
# for more information.
ret += '\n%s\n: %s\n' % (option, description)
continue
ret += line.lstrip() + '\n'
else:
ret += line + '\n'

View File

@@ -564,6 +564,7 @@
- **RoosterTeeth**
- **RottenTomatoes**
- **Roxwel**
- **Rozhlas**
- **RTBF**
- **rte**: Raidió Teilifís Éireann TV
- **rte:radio**: Raidió Teilifís Éireann radio
@@ -621,6 +622,7 @@
- **smotri:user**: Smotri.com user videos
- **Snotr**
- **Sohu**
- **SonyLIV**
- **soundcloud**
- **soundcloud:playlist**
- **soundcloud:search**: Soundcloud search
@@ -747,6 +749,7 @@
- **udemy:course**
- **UDNEmbed**: 聯合影音
- **Unistra**
- **uol.com.br**
- **Urort**: NRK P3 Urørt
- **URPlay**
- **USAToday**

View File

@@ -249,7 +249,16 @@ class YoutubeDL(object):
source_address: (Experimental) Client-side IP address to bind to.
call_home: Boolean, true iff we are allowed to contact the
youtube-dl servers for debugging.
sleep_interval: Number of seconds to sleep before each download.
sleep_interval: Number of seconds to sleep before each download when
used alone or a lower bound of a range for randomized
sleep before each download (minimum possible number
of seconds to sleep) when used along with
max_sleep_interval.
max_sleep_interval:Upper bound of a range for randomized sleep before each
download (maximum possible number of seconds to sleep).
Must only be used along with sleep_interval.
Actual sleep time will be a random float from range
[sleep_interval; max_sleep_interval].
listformats: Print an overview of available video formats and exit.
list_thumbnails: Print a table of all thumbnails and exit.
match_filter: A function that gets called with the info_dict of

View File

@@ -145,6 +145,16 @@ def _real_main(argv=None):
if numeric_limit is None:
parser.error('invalid max_filesize specified')
opts.max_filesize = numeric_limit
if opts.sleep_interval is not None:
if opts.sleep_interval < 0:
parser.error('sleep interval must be positive or 0')
if opts.max_sleep_interval is not None:
if opts.max_sleep_interval < 0:
parser.error('max sleep interval must be positive or 0')
if opts.max_sleep_interval < opts.sleep_interval:
parser.error('max sleep interval must be greater than or equal to min sleep interval')
else:
opts.max_sleep_interval = opts.sleep_interval
def parse_retries(retries):
if retries in ('inf', 'infinite'):
@@ -370,6 +380,7 @@ def _real_main(argv=None):
'source_address': opts.source_address,
'call_home': opts.call_home,
'sleep_interval': opts.sleep_interval,
'max_sleep_interval': opts.max_sleep_interval,
'external_downloader': opts.external_downloader,
'list_thumbnails': opts.list_thumbnails,
'playlist_items': opts.playlist_items,

View File

@@ -4,6 +4,7 @@ import os
import re
import sys
import time
import random
from ..compat import compat_os_name
from ..utils import (
@@ -342,8 +343,11 @@ class FileDownloader(object):
})
return True
sleep_interval = self.params.get('sleep_interval')
if sleep_interval:
min_sleep_interval = self.params.get('sleep_interval')
if min_sleep_interval:
max_sleep_interval = self.params.get('max_sleep_interval', min_sleep_interval)
print(min_sleep_interval, max_sleep_interval)
sleep_interval = random.uniform(min_sleep_interval, max_sleep_interval)
self.to_screen('[download] Sleeping %s seconds...' % sleep_interval)
time.sleep(sleep_interval)

View File

@@ -123,6 +123,10 @@ class AolFeaturesIE(InfoExtractor):
'title': 'What To Watch - February 17, 2016',
},
'add_ie': ['FiveMin'],
'params': {
# encrypted m3u8 download
'skip_download': True,
},
}]
def _real_extract(self, url):

View File

@@ -1,8 +1,6 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
@@ -15,7 +13,7 @@ class AparatIE(InfoExtractor):
_TEST = {
'url': 'http://www.aparat.com/v/wP8On',
'md5': '6714e0af7e0d875c5a39c4dc4ab46ad1',
'md5': '131aca2e14fe7c4dcb3c4877ba300c89',
'info_dict': {
'id': 'wP8On',
'ext': 'mp4',
@@ -31,13 +29,13 @@ class AparatIE(InfoExtractor):
# Note: There is an easier-to-parse configuration at
# http://www.aparat.com/video/video/config/videohash/%video_id
# but the URL in there does not work
embed_url = ('http://www.aparat.com/video/video/embed/videohash/' +
video_id + '/vt/frame')
embed_url = 'http://www.aparat.com/video/video/embed/vt/frame/showvideo/yes/videohash/' + video_id
webpage = self._download_webpage(embed_url, video_id)
video_urls = [video_url.replace('\\/', '/') for video_url in re.findall(
r'(?:fileList\[[0-9]+\]\s*=|"file"\s*:)\s*"([^"]+)"', webpage)]
for i, video_url in enumerate(video_urls):
file_list = self._parse_json(self._search_regex(
r'fileList\s*=\s*JSON\.parse\(\'([^\']+)\'\)', webpage, 'file list'), video_id)
for i, item in enumerate(file_list[0]):
video_url = item['file']
req = HEADRequest(video_url)
res = self._request_webpage(
req, video_id, note='Testing video URL %d' % i, errnote=False)

View File

@@ -759,7 +759,7 @@ class BBCIE(BBCCoUkIE):
webpage = self._download_webpage(url, playlist_id)
json_ld_info = self._search_json_ld(webpage, playlist_id, default=None)
json_ld_info = self._search_json_ld(webpage, playlist_id, default={})
timestamp = json_ld_info.get('timestamp')
playlist_title = json_ld_info.get('title')

View File

@@ -25,13 +25,13 @@ class BiliBiliIE(InfoExtractor):
_TESTS = [{
'url': 'http://www.bilibili.tv/video/av1074402/',
'md5': '5f7d29e1a2872f3df0cf76b1f87d3788',
'md5': '9fa226fe2b8a9a4d5a69b4c6a183417e',
'info_dict': {
'id': '1554319',
'ext': 'flv',
'ext': 'mp4',
'title': '【金坷垃】金泡沫',
'description': 'md5:ce18c2a2d2193f0df2917d270f2e5923',
'duration': 308.067,
'duration': 308.315,
'timestamp': 1398012660,
'upload_date': '20140420',
'thumbnail': 're:^https?://.+\.jpg',
@@ -41,73 +41,33 @@ class BiliBiliIE(InfoExtractor):
}, {
'url': 'http://www.bilibili.com/video/av1041170/',
'info_dict': {
'id': '1041170',
'id': '1507019',
'ext': 'mp4',
'title': '【BD1080P】刀语【诸神&异域】',
'description': '这是个神奇的故事~每个人不留弹幕不给走哦~切利哦!~',
'timestamp': 1396530060,
'upload_date': '20140403',
'uploader': '枫叶逝去',
'uploader_id': '520116',
},
'playlist_count': 9,
}, {
'url': 'http://www.bilibili.com/video/av4808130/',
'info_dict': {
'id': '4808130',
'id': '7802182',
'ext': 'mp4',
'title': '【长篇】哆啦A梦443【钉铛】',
'description': '(2016.05.27)来组合客人的脸吧&amp;amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;amp;illust_id=56912929',
'timestamp': 1464564180,
'upload_date': '20160529',
'uploader': '喜欢拉面',
'uploader_id': '151066',
},
'playlist': [{
'md5': '55cdadedf3254caaa0d5d27cf20a8f9c',
'info_dict': {
'id': '4808130_part1',
'ext': 'flv',
'title': '【长篇】哆啦A梦443【钉铛】',
'description': '(2016.05.27)来组合客人的脸吧&amp;amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;amp;illust_id=56912929',
'timestamp': 1464564180,
'upload_date': '20160529',
'uploader': '喜欢拉面',
'uploader_id': '151066',
},
}, {
'md5': '926f9f67d0c482091872fbd8eca7ea3d',
'info_dict': {
'id': '4808130_part2',
'ext': 'flv',
'title': '【长篇】哆啦A梦443【钉铛】',
'description': '(2016.05.27)来组合客人的脸吧&amp;amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;amp;illust_id=56912929',
'timestamp': 1464564180,
'upload_date': '20160529',
'uploader': '喜欢拉面',
'uploader_id': '151066',
},
}, {
'md5': '4b7b225b968402d7c32348c646f1fd83',
'info_dict': {
'id': '4808130_part3',
'ext': 'flv',
'title': '【长篇】哆啦A梦443【钉铛】',
'description': '(2016.05.27)来组合客人的脸吧&amp;amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;amp;illust_id=56912929',
'timestamp': 1464564180,
'upload_date': '20160529',
'uploader': '喜欢拉面',
'uploader_id': '151066',
},
}, {
'md5': '7b795e214166501e9141139eea236e91',
'info_dict': {
'id': '4808130_part4',
'ext': 'flv',
'title': '【长篇】哆啦A梦443【钉铛】',
'description': '(2016.05.27)来组合客人的脸吧&amp;amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;amp;illust_id=56912929',
'timestamp': 1464564180,
'upload_date': '20160529',
'uploader': '喜欢拉面',
'uploader_id': '151066',
},
}],
}, {
# Missing upload time
'url': 'http://www.bilibili.com/video/av1867637/',
'info_dict': {
'id': '2880301',
'ext': 'flv',
'ext': 'mp4',
'title': '【HDTV】【喜剧】岳父岳母真难当 2014【法国票房冠军】',
'description': '一个信奉天主教的法国旧式传统资产阶级家庭中有四个女儿。三个女儿却分别找了阿拉伯、犹太、中国丈夫,老夫老妻唯独期盼剩下未嫁的小女儿能找一个信奉天主教的法国白人,结果没想到小女儿找了一位非裔黑人……【这次应该不会跳帧了】',
'uploader': '黑夜为猫',

View File

@@ -24,7 +24,8 @@ class BIQLEIE(InfoExtractor):
'ext': 'mp4',
'title': 'Ребенок в шоке от автоматической мойки',
'uploader': 'Dmitry Kotov',
}
},
'skip': ' This video was marked as adult. Embedding adult videos on external sites is prohibited.',
}]
def _real_extract(self, url):

View File

@@ -17,7 +17,8 @@ class ChaturbateIE(InfoExtractor):
},
'params': {
'skip_download': True,
}
},
'skip': 'Room is offline',
}, {
'url': 'https://en.chaturbate.com/siswet19/',
'only_matching': True,

View File

@@ -816,11 +816,14 @@ class InfoExtractor(object):
json_ld = self._search_regex(
r'(?s)<script[^>]+type=(["\'])application/ld\+json\1[^>]*>(?P<json_ld>.+?)</script>',
html, 'JSON-LD', group='json_ld', **kwargs)
default = kwargs.get('default', NO_DEFAULT)
if not json_ld:
return {}
return self._json_ld(
json_ld, video_id, fatal=kwargs.get('fatal', True),
expected_type=expected_type)
return default if default is not NO_DEFAULT else {}
# JSON-LD may be malformed and thus `fatal` should be respected.
# At the same time `default` may be passed that assumes `fatal=False`
# for _search_regex. Let's simulate the same behavior here as well.
fatal = kwargs.get('fatal', True) if default == NO_DEFAULT else False
return self._json_ld(json_ld, video_id, fatal=fatal, expected_type=expected_type)
def _json_ld(self, json_ld, video_id, fatal=True, expected_type=None):
if isinstance(json_ld, compat_str):

View File

@@ -143,7 +143,8 @@ class CondeNastIE(InfoExtractor):
})
self._sort_formats(formats)
info = self._search_json_ld(webpage, video_id) if url_type != 'embed' else {}
info = self._search_json_ld(
webpage, video_id, fatal=False) if url_type != 'embed' else {}
info.update({
'id': video_id,
'formats': formats,

View File

@@ -28,7 +28,8 @@ class CWTVIE(InfoExtractor):
'params': {
# m3u8 download
'skip_download': True,
}
},
'skip': 'redirect to http://cwtv.com/shows/arrow/',
}, {
'url': 'http://www.cwseed.com/shows/whose-line-is-it-anyway/jeff-davis-4/?play=24282b12-ead2-42f2-95ad-26770c2c6088',
'info_dict': {
@@ -44,10 +45,6 @@ class CWTVIE(InfoExtractor):
'upload_date': '20151006',
'timestamp': 1444107300,
},
'params': {
# m3u8 download
'skip_download': True,
}
}, {
'url': 'http://cwtv.com/thecw/chroniclesofcisco/?play=8adebe35-f447-465f-ab52-e863506ff6d6',
'only_matching': True,
@@ -61,11 +58,30 @@ class CWTVIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
video_data = self._download_json(
'http://metaframe.digitalsmiths.tv/v2/CWtv/assets/%s/partner/132?format=json' % video_id, video_id)
formats = self._extract_m3u8_formats(
video_data['videos']['variantplaylist']['uri'], video_id, 'mp4')
video_data = None
formats = []
for partner in (154, 213):
vdata = self._download_json(
'http://metaframe.digitalsmiths.tv/v2/CWtv/assets/%s/partner/%d?format=json' % (video_id, partner), video_id, fatal=False)
if not vdata:
continue
video_data = vdata
for quality, quality_data in vdata.get('videos', {}).items():
quality_url = quality_data.get('uri')
if not quality_url:
continue
if quality == 'variantplaylist':
formats.extend(self._extract_m3u8_formats(
quality_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
else:
tbr = int_or_none(quality_data.get('bitrate'))
format_id = 'http' + ('-%d' % tbr if tbr else '')
if self._is_valid_url(quality_url, video_id, format_id):
formats.append({
'format_id': format_id,
'url': quality_url,
'tbr': tbr,
})
self._sort_formats(formats)
thumbnails = [{

View File

@@ -696,6 +696,7 @@ from .rockstargames import RockstarGamesIE
from .roosterteeth import RoosterTeethIE
from .rottentomatoes import RottenTomatoesIE
from .roxwel import RoxwelIE
from .rozhlas import RozhlasIE
from .rtbf import RTBFIE
from .rte import RteIE, RteRadioIE
from .rtlnl import RtlNlIE
@@ -755,6 +756,7 @@ from .smotri import (
)
from .snotr import SnotrIE
from .sohu import SohuIE
from .sonyliv import SonyLIVIE
from .soundcloud import (
SoundcloudIE,
SoundcloudSetIE,
@@ -927,6 +929,7 @@ from .udemy import (
from .udn import UDNEmbedIE
from .digiteka import DigitekaIE
from .unistra import UnistraIE
from .uol import UOLIE
from .urort import UrortIE
from .urplay import URPlayIE
from .usatoday import USATodayIE

View File

@@ -48,7 +48,7 @@ class FlipagramIE(InfoExtractor):
flipagram = video_data['flipagram']
video = flipagram['video']
json_ld = self._search_json_ld(webpage, video_id, fatal=False)
json_ld = self._search_json_ld(webpage, video_id, default={})
title = json_ld.get('title') or flipagram['captionText']
description = json_ld.get('description') or flipagram.get('captionText')

View File

@@ -2241,8 +2241,8 @@ class GenericIE(InfoExtractor):
# Looking for http://schema.org/VideoObject
json_ld = self._search_json_ld(
webpage, video_id, fatal=False, expected_type='VideoObject')
if json_ld and json_ld.get('url'):
webpage, video_id, default={}, expected_type='VideoObject')
if json_ld.get('url'):
info_dict.update({
'title': video_title or info_dict['title'],
'description': video_description,

View File

@@ -4,6 +4,7 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_urlparse
from ..utils import (
get_element_by_id,
clean_html,
@@ -242,8 +243,9 @@ class KuwoSingerIE(InfoExtractor):
query={'artistId': artist_id, 'pn': page_num, 'rn': self.PAGE_SIZE})
return [
self.url_result(song_url, 'Kuwo') for song_url in re.findall(
r'<div[^>]+class="name"><a[^>]+href="(http://www\.kuwo\.cn/yinyue/\d+)',
self.url_result(compat_urlparse.urljoin(url, song_url), 'Kuwo')
for song_url in re.findall(
r'<div[^>]+class="name"><a[^>]+href="(/yinyue/\d+)',
webpage)
]

View File

@@ -1,55 +1,71 @@
# encoding: utf-8
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
ExtractorError,
clean_html,
int_or_none,
unified_timestamp,
update_url_query,
)
class RBMARadioIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?rbmaradio\.com/shows/(?P<videoID>[^/]+)$'
_VALID_URL = r'https?://(?:www\.)?rbmaradio\.com/shows/(?P<show_id>[^/]+)/episodes/(?P<id>[^/?#&]+)'
_TEST = {
'url': 'http://www.rbmaradio.com/shows/ford-lopatin-live-at-primavera-sound-2011',
'url': 'https://www.rbmaradio.com/shows/main-stage/episodes/ford-lopatin-live-at-primavera-sound-2011',
'md5': '6bc6f9bcb18994b4c983bc3bf4384d95',
'info_dict': {
'id': 'ford-lopatin-live-at-primavera-sound-2011',
'ext': 'mp3',
'uploader_id': 'ford-lopatin',
'location': 'Spain',
'description': 'Joel Ford and Daniel Oneohtrix Point Never Lopatin fly their midified pop extravaganza to Spain. Live at Primavera Sound 2011.',
'uploader': 'Ford & Lopatin',
'title': 'Live at Primavera Sound 2011',
'title': 'Main Stage - Ford & Lopatin',
'description': 'md5:4f340fb48426423530af5a9d87bd7b91',
'thumbnail': 're:^https?://.*\.jpg',
'duration': 2452,
'timestamp': 1307103164,
'upload_date': '20110603',
},
}
def _real_extract(self, url):
m = re.match(self._VALID_URL, url)
video_id = m.group('videoID')
mobj = re.match(self._VALID_URL, url)
show_id = mobj.group('show_id')
episode_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
webpage = self._download_webpage(url, episode_id)
json_data = self._search_regex(r'window\.gon.*?gon\.show=(.+?);$',
webpage, 'json data', flags=re.MULTILINE)
episode = self._parse_json(
self._search_regex(
r'__INITIAL_STATE__\s*=\s*({.+?})\s*</script>',
webpage, 'json data'),
episode_id)['episodes'][show_id][episode_id]
try:
data = json.loads(json_data)
except ValueError as e:
raise ExtractorError('Invalid JSON: ' + str(e))
title = episode['title']
video_url = data['akamai_url'] + '&cbr=256'
show_title = episode.get('showTitle')
if show_title:
title = '%s - %s' % (show_title, title)
formats = [{
'url': update_url_query(episode['audioURL'], query={'cbr': abr}),
'format_id': compat_str(abr),
'abr': abr,
'vcodec': 'none',
} for abr in (96, 128, 256)]
description = clean_html(episode.get('longTeaser'))
thumbnail = self._proto_relative_url(episode.get('imageURL', {}).get('landscape'))
duration = int_or_none(episode.get('duration'))
timestamp = unified_timestamp(episode.get('publishedAt'))
return {
'id': video_id,
'url': video_url,
'title': data['title'],
'description': data.get('teaser_text'),
'location': data.get('country_of_origin'),
'uploader': data.get('host', {}).get('name'),
'uploader_id': data.get('host', {}).get('slug'),
'thumbnail': data.get('image', {}).get('large_url_2x'),
'duration': data.get('duration'),
'id': episode_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'duration': duration,
'timestamp': timestamp,
'formats': formats,
}

View File

@@ -0,0 +1,50 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
int_or_none,
remove_start,
)
class RozhlasIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?prehravac\.rozhlas\.cz/audio/(?P<id>[0-9]+)'
_TESTS = [{
'url': 'http://prehravac.rozhlas.cz/audio/3421320',
'md5': '504c902dbc9e9a1fd50326eccf02a7e2',
'info_dict': {
'id': '3421320',
'ext': 'mp3',
'title': 'Echo Pavla Klusáka (30.06.2015 21:00)',
'description': 'Osmdesátiny Terryho Rileyho jsou skvělou příležitostí proletět se elektronickými i akustickými díly zakladatatele minimalismu, který je aktivní už přes padesát let'
}
}, {
'url': 'http://prehravac.rozhlas.cz/audio/3421320/embed',
'skip_download': True,
}]
def _real_extract(self, url):
audio_id = self._match_id(url)
webpage = self._download_webpage(
'http://prehravac.rozhlas.cz/audio/%s' % audio_id, audio_id)
title = self._html_search_regex(
r'<h3>(.+?)</h3>\s*<p[^>]*>.*?</p>\s*<div[^>]+id=["\']player-track',
webpage, 'title', default=None) or remove_start(
self._og_search_title(webpage), 'Radio Wave - ')
description = self._html_search_regex(
r'<p[^>]+title=(["\'])(?P<url>(?:(?!\1).)+)\1[^>]*>.*?</p>\s*<div[^>]+id=["\']player-track',
webpage, 'description', fatal=False, group='url')
duration = int_or_none(self._search_regex(
r'data-duration=["\'](\d+)', webpage, 'duration', default=None))
return {
'id': audio_id,
'url': 'http://media.rozhlas.cz/_audio/%s.mp3' % audio_id,
'title': title,
'description': description,
'duration': duration,
'vcodec': 'none',
}

View File

@@ -14,10 +14,10 @@ from ..utils import ExtractorError
class SohuIE(InfoExtractor):
_VALID_URL = r'https?://(?P<mytv>my\.)?tv\.sohu\.com/.+?/(?(mytv)|n)(?P<id>\d+)\.shtml.*?'
# Sohu videos give different MD5 sums on Travis CI and my machine
_TESTS = [{
'note': 'This video is available only in Mainland China',
'url': 'http://tv.sohu.com/20130724/n382479172.shtml#super',
'md5': '29175c8cadd8b5cc4055001e85d6b372',
'info_dict': {
'id': '382479172',
'ext': 'mp4',
@@ -26,7 +26,6 @@ class SohuIE(InfoExtractor):
'skip': 'On available in China',
}, {
'url': 'http://tv.sohu.com/20150305/n409385080.shtml',
'md5': '699060e75cf58858dd47fb9c03c42cfb',
'info_dict': {
'id': '409385080',
'ext': 'mp4',
@@ -34,7 +33,6 @@ class SohuIE(InfoExtractor):
}
}, {
'url': 'http://my.tv.sohu.com/us/232799889/78693464.shtml',
'md5': '9bf34be48f2f4dadcb226c74127e203c',
'info_dict': {
'id': '78693464',
'ext': 'mp4',
@@ -48,7 +46,6 @@ class SohuIE(InfoExtractor):
'title': '【神探苍实战秘籍】第13期 战争之影 赫卡里姆',
},
'playlist': [{
'md5': 'bdbfb8f39924725e6589c146bc1883ad',
'info_dict': {
'id': '78910339_part1',
'ext': 'mp4',
@@ -56,7 +53,6 @@ class SohuIE(InfoExtractor):
'title': '【神探苍实战秘籍】第13期 战争之影 赫卡里姆',
}
}, {
'md5': '3e1f46aaeb95354fd10e7fca9fc1804e',
'info_dict': {
'id': '78910339_part2',
'ext': 'mp4',
@@ -64,7 +60,6 @@ class SohuIE(InfoExtractor):
'title': '【神探苍实战秘籍】第13期 战争之影 赫卡里姆',
}
}, {
'md5': '8407e634175fdac706766481b9443450',
'info_dict': {
'id': '78910339_part3',
'ext': 'mp4',

View File

@@ -0,0 +1,34 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
class SonyLIVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?sonyliv\.com/details/[^/]+/(?P<id>\d+)'
_TESTS = [{
'url': "http://www.sonyliv.com/details/episodes/5024612095001/Ep.-1---Achaari-Cheese-Toast---Bachelor's-Delight",
'info_dict': {
'title': "Ep. 1 - Achaari Cheese Toast - Bachelor's Delight",
'id': '5024612095001',
'ext': 'mp4',
'upload_date': '20160707',
'description': 'md5:7f28509a148d5be9d0782b4d5106410d',
'uploader_id': '4338955589001',
'timestamp': 1467870968,
},
'params': {
'skip_download': True,
},
'add_ie': ['BrightcoveNew'],
}, {
'url': 'http://www.sonyliv.com/details/full%20movie/4951168986001/Sei-Raat-(Bangla)',
'only_matching': True,
}]
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/4338955589001/default_default/index.html?videoId=%s'
def _real_extract(self, url):
brightcove_id = self._match_id(url)
return self.url_result(
self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, 'BrightcoveNew', brightcove_id)

128
youtube_dl/extractor/uol.py Normal file
View File

@@ -0,0 +1,128 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
clean_html,
int_or_none,
parse_duration,
update_url_query,
str_or_none,
)
class UOLIE(InfoExtractor):
IE_NAME = 'uol.com.br'
_VALID_URL = r'https?://(?:.+?\.)?uol\.com\.br/.*?(?:(?:mediaId|v)=|view/(?:[a-z0-9]+/)?|video(?:=|/(?:\d{4}/\d{2}/\d{2}/)?))(?P<id>\d+|[\w-]+-[A-Z0-9]+)'
_TESTS = [{
'url': 'http://player.mais.uol.com.br/player_video_v3.swf?mediaId=15951931',
'md5': '25291da27dc45e0afb5718a8603d3816',
'info_dict': {
'id': '15951931',
'ext': 'mp4',
'title': 'Miss simpatia é encontrada morta',
'description': 'md5:3f8c11a0c0556d66daf7e5b45ef823b2',
}
}, {
'url': 'http://tvuol.uol.com.br/video/incendio-destroi-uma-das-maiores-casas-noturnas-de-londres-04024E9A3268D4C95326',
'md5': 'e41a2fb7b7398a3a46b6af37b15c00c9',
'info_dict': {
'id': '15954259',
'ext': 'mp4',
'title': 'Incêndio destrói uma das maiores casas noturnas de Londres',
'description': 'Em Londres, um incêndio destruiu uma das maiores boates da cidade. Não há informações sobre vítimas.',
}
}, {
'url': 'http://mais.uol.com.br/static/uolplayer/index.html?mediaId=15951931',
'only_matching': True,
}, {
'url': 'http://mais.uol.com.br/view/15954259',
'only_matching': True,
}, {
'url': 'http://noticias.band.uol.com.br/brasilurgente/video/2016/08/05/15951931/miss-simpatia-e-encontrada-morta.html',
'only_matching': True,
}, {
'url': 'http://videos.band.uol.com.br/programa.asp?e=noticias&pr=brasil-urgente&v=15951931&t=Policia-desmonte-base-do-PCC-na-Cracolandia',
'only_matching': True,
}, {
'url': 'http://mais.uol.com.br/view/cphaa0gl2x8r/incendio-destroi-uma-das-maiores-casas-noturnas-de-londres-04024E9A3268D4C95326',
'only_matching': True,
}, {
'url': 'http://noticias.uol.com.br//videos/assistir.htm?video=rafaela-silva-inspira-criancas-no-judo-04024D983968D4C95326',
'only_matching': True,
}, {
'url': 'http://mais.uol.com.br/view/e0qbgxid79uv/15275470',
'only_matching': True,
}]
_FORMATS = {
'2': {
'width': 640,
'height': 360,
},
'5': {
'width': 1080,
'height': 720,
},
'6': {
'width': 426,
'height': 240,
},
'7': {
'width': 1920,
'height': 1080,
},
'8': {
'width': 192,
'height': 144,
},
'9': {
'width': 568,
'height': 320,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
if not video_id.isdigit():
embed_page = self._download_webpage('https://jsuol.com.br/c/tv/uol/embed/?params=[embed,%s]' % video_id, video_id)
video_id = self._search_regex(r'mediaId=(\d+)', embed_page, 'media id')
video_data = self._download_json(
'http://mais.uol.com.br/apiuol/v3/player/getMedia/%s.json' % video_id,
video_id)['item']
title = video_data['title']
query = {
'ver': video_data.get('numRevision', 2),
'r': 'http://mais.uol.com.br',
}
formats = []
for f in video_data.get('formats', []):
f_url = f.get('url') or f.get('secureUrl')
if not f_url:
continue
format_id = str_or_none(f.get('id'))
fmt = {
'format_id': format_id,
'url': update_url_query(f_url, query),
}
fmt.update(self._FORMATS.get(format_id, {}))
formats.append(fmt)
self._sort_formats(formats)
tags = []
for tag in video_data.get('tags', []):
tag_description = tag.get('description')
if not tag_description:
continue
tags.append(tag_description)
return {
'id': video_id,
'title': title,
'description': clean_html(video_data.get('desMedia')),
'thumbnail': video_data.get('thumbnail'),
'duration': int_or_none(video_data.get('durationSeconds')) or parse_duration(video_data.get('duration')),
'tags': tags,
'formats': formats,
}

View File

@@ -499,9 +499,20 @@ def parseOpts(overrideArguments=None):
dest='bidi_workaround', action='store_true',
help='Work around terminals that lack bidirectional text support. Requires bidiv or fribidi executable in PATH')
workarounds.add_option(
'--sleep-interval', metavar='SECONDS',
'--sleep-interval', '--min-sleep-interval', metavar='SECONDS',
dest='sleep_interval', type=float,
help='Number of seconds to sleep before each download.')
help=(
'Number of seconds to sleep before each download when used alone '
'or a lower bound of a range for randomized sleep before each download '
'(minimum possible number of seconds to sleep) when used along with '
'--max-sleep-interval.'))
workarounds.add_option(
'--max-sleep-interval', metavar='SECONDS',
dest='max_sleep_interval', type=float,
help=(
'Upper bound of a range for randomized sleep before each download '
'(maximum possible number of seconds to sleep). Must only be used '
'along with --min-sleep-interval.'))
verbosity = optparse.OptionGroup(parser, 'Verbosity / Simulation Options')
verbosity.add_option(

View File

@@ -3,11 +3,6 @@ from __future__ import unicode_literals
import re
from .common import PostProcessor
from ..utils import PostProcessingError
class MetadataFromTitlePPError(PostProcessingError):
pass
class MetadataFromTitlePP(PostProcessor):
@@ -38,7 +33,8 @@ class MetadataFromTitlePP(PostProcessor):
title = info['title']
match = re.match(self._titleregex, title)
if match is None:
raise MetadataFromTitlePPError('Could not interpret title of video as "%s"' % self._titleformat)
self._downloader.to_screen('[fromtitle] Could not interpret title of video as "%s"' % self._titleformat)
return [], info
for attribute, value in match.groupdict().items():
value = match.group(attribute)
info[attribute] = value

View File

@@ -1,3 +1,3 @@
from __future__ import unicode_literals
__version__ = '2016.08.07'
__version__ = '2016.08.10'