1
0
mirror of https://source.netsyms.com/Mirrors/youtube-dl synced 2026-03-29 23:23:54 +00:00

Compare commits

...

92 Commits

Author SHA1 Message Date
Philipp Hagemeister
62cdb96f51 release 2016.03.26 2016-03-26 08:58:03 +01:00
Sergey M․
e289d6d62c [test_compat] Add tests for compat_urllib_parse_urlencode 2016-03-26 02:38:33 +06:00
Sergey M․
6e6bc8dae5 Use urlencode_postdata across the codebase 2016-03-26 02:19:24 +06:00
Sergey M․
15707c7e02 [compat] Add compat_urllib_parse_urlencode and eliminate encode_dict
encode_dict functionality has been improved and moved directly into compat_urllib_parse_urlencode
All occurrences of compat_urllib_parse.urlencode throughout the codebase have been replaced by compat_urllib_parse_urlencode

Closes #8974
2016-03-26 01:46:57 +06:00
Sergey M․
2156f16ca7 [thescene] Fix extraction and improve style (Closes #8978) 2016-03-25 20:14:34 +06:00
Sergey M․
4db441de72 [once] Relax _VALID_URL (Closes #8976) 2016-03-25 19:51:28 +06:00
Philipp Hagemeister
0be8314dc8 release 2016.03.25 2016-03-25 09:27:18 +01:00
Yen Chi Hsuan
d7f62b049a [iqiyi] Update enc_key 2016-03-25 15:45:40 +08:00
Yen Chi Hsuan
3bb3356812 [douyutv] Extend _VALID_URL 2016-03-25 15:43:29 +08:00
Sergey M․
3f15fec1d1 Credit @Kagami for mnet (#8958) 2016-03-25 03:56:27 +06:00
Sergey M․
98e68806fb [mnet] Improve (Closes #8958) 2016-03-25 03:26:29 +06:00
Kagami Hiiragi
e031768666 [mnet] Add new extractor 2016-03-25 02:32:06 +06:00
Sergey M․
5eb7db4ee9 [udemy] Add support for new URL schema 2016-03-25 02:28:39 +06:00
Sergey M․
f0e83681d9 [udemy] Extract formats from outputs 2016-03-25 02:27:13 +06:00
Sergey M․
ff9d5d0938 [udemy] Improve course enrolling 2016-03-25 02:26:46 +06:00
Sergey M․
d041a73674 [extractor/__init__] Add youtube:live and sort youtube extractors alphabetically 2016-03-25 01:39:25 +06:00
Sergey M․
f07e276a04 [youtube:live] Add extractor (Closes #8959) 2016-03-25 01:18:14 +06:00
Sergey M․
993271da0a [nytimes] Tolerate missing metadata (Closes #8952) 2016-03-24 23:28:24 +06:00
Sergey M․
369e7e3ff0 [iprima] Fix extraction (Closes #8953) 2016-03-24 22:54:26 +06:00
Sergey M․
5767b4eeae [mtv] Fix description extraction (Closes #8962) 2016-03-24 22:23:31 +06:00
Yen Chi Hsuan
622d19160b [utils] Clarify Python versions affected by buggy struct module 2016-03-24 18:06:15 +08:00
Yen Chi Hsuan
32d88410eb [tumblr] Add a test with Instagram embed
Closes #8817
2016-03-24 16:32:53 +08:00
Yen Chi Hsuan
5a51775a58 [generic] Extract Instagram embeds (#8817) 2016-03-24 16:32:27 +08:00
Yen Chi Hsuan
87696e78d7 [instagram] Unescape description (#8817) 2016-03-24 16:30:01 +08:00
Yen Chi Hsuan
c4096e8aea [instagram] Extract embed videos (#8817) 2016-03-24 16:29:33 +08:00
Yen Chi Hsuan
fc27ea9464 [tumblr] Support Vine embeds (#8817) 2016-03-23 23:55:52 +08:00
Yen Chi Hsuan
088e1aac59 [generic] Support Vine embeds (#8817) 2016-03-23 23:55:08 +08:00
Yen Chi Hsuan
81f36eba88 [test/test_utils] Update for escape_url change (again) 2016-03-23 23:23:26 +08:00
Yen Chi Hsuan
2d60465e44 [test/test_utils] Update for escape_url change 2016-03-23 23:20:28 +08:00
Sergey M
4333d56494 Merge pull request #8898 from dstftw/fragment-retries
Add --fragment-retries option (Fixes #8466)
2016-03-23 20:12:32 +05:00
Sergey M․
882c699296 [tunein] Fix stream data extraction (Closes #8899, closes #8924) 2016-03-23 20:45:39 +06:00
Yen Chi Hsuan
efbed08dc2 [utils] Encode hostnames before passing to urllib
With IDN (Internationalized Domain Name) and a proxy, non-ascii URLs
are passed down to urllib/urllib2, causing UnicodeEncodeError

Fixes #8890
2016-03-23 22:24:52 +08:00
Jaime Marquínez Ferrándiz
7da2c87119 Add extractor for thescene.com (closes #8929) 2016-03-22 22:17:59 +01:00
Sergey M․
c6ca11f1b3 [once] Prevent ads from embedding into m3u8 playlists (Closes #8893) 2016-03-22 23:48:05 +06:00
Sergey M․
2beeb286e1 [laola1tv] Add support for livestreams (Closes #8934) 2016-03-22 22:32:59 +06:00
Sergey M․
cc7397b04d [ceskatelevize] Make m3u8 formats extraction non fatal (Closes #8933) 2016-03-22 21:12:29 +06:00
Sergey M․
bc5d16b302 [animeondemand] Skip dash for now 2016-03-21 23:37:39 +06:00
Sergey M․
85c637b737 [animeondemand] Extract teaser when no full episode available (#8923) 2016-03-21 23:35:50 +06:00
Sergey M․
5c69f7a479 [animeondemand] Respect startvideo (Closes #8923) 2016-03-21 23:31:40 +06:00
Sergey M․
ff5873b72d [motherless] Detect friends only videos 2016-03-21 22:24:42 +06:00
Sergey M․
065c4b27bf [xhamster:embed] Extract vars (Closes #8912) 2016-03-21 22:07:34 +06:00
Sergey M․
1600ed1ff9 [rutv] Improve flash version pattern (Closes #8911) 2016-03-21 21:46:49 +06:00
Sergey M․
5886b38d73 Add support for https for all extractors as preventive and future-proof measure 2016-03-21 21:36:32 +06:00
Sergey M․
0cef27ad25 Add missing r prefix for _VALID_URLs 2016-03-21 21:22:37 +06:00
Sergey M․
12af4beb3e [mailru] Add support for https (Closes #8920) 2016-03-21 21:17:29 +06:00
Sergey M․
9016d76f71 [YoutubeDL] Improve _format_note 2016-03-20 22:01:45 +06:00
Sergey M․
3c5d183c19 [animeondemand] Extract all formats (Closes #8906) 2016-03-20 21:51:22 +06:00
Sergey M․
3e8bb9a972 [animeondemand] Detect geo restriction 2016-03-20 20:39:00 +06:00
Yen Chi Hsuan
daef04a4e7 [kwuo] Fix KuwoChartIE and KuwoSingerIE and accept new URL forms 2016-03-20 20:17:56 +08:00
Yen Chi Hsuan
7caae128a7 Credit @vitstradal for the key algorithm in OpenloadIE (#8489)
[ci skip]
2016-03-20 19:12:02 +08:00
Yen Chi Hsuan
2648918c81 [vlive] Fix creator extraction (closes #8814) 2016-03-20 18:15:53 +08:00
Jaime Marquínez Ferrándiz
920d318d3c README: document that BSD make is also supported (#8902) 2016-03-20 10:55:14 +01:00
Yen Chi Hsuan
9e3c2f1d74 [openload] Misc improvements
* Add thumbnail
* Detect errors (#6469)
* Match more (#6469, #8489)
2016-03-20 16:49:44 +08:00
Yen Chi Hsuan
2bfeee69b9 [openload] Add new extractor (closes #8489) 2016-03-20 15:54:58 +08:00
Yen Chi Hsuan
664bcd80b9 [tudou] Use InAdvancePagedList (closes #8884) 2016-03-20 15:45:31 +08:00
Sergey M․
3c20208eff [francetv] Improve formats extraction 2016-03-20 13:00:46 +06:00
Sergey M․
db264e3cc3 [francetvinfo] Add support for france3-regions and strip title (Closes #7673) 2016-03-20 12:44:04 +06:00
Sergey M
d396f30467 Merge pull request #8902 from jaimeMF/bmake
Makefile: make it compatible with bmake
2016-03-20 11:08:57 +05:00
Sergey M․
96a9f22d98 [discovery] Relax _VALID_URL (Closes #8903) 2016-03-20 10:26:58 +06:00
Sergey M․
40025ee2a3 [postprocessort/ffmpeg] Allow embedding webvtt into webm (Closes #8874) 2016-03-20 04:12:34 +06:00
Jaime Marquínez Ferrándiz
3ff63fb365 Makefile: make it compatible with bmake
It's the portable version of BSD make: http://crufty.net/help/sjg/bmake.html
The syntax for conditionals is different in GNU make and BSD make, so we use the shell
2016-03-19 21:51:13 +01:00
Jaime Marquínez Ferrándiz
5c7cd37ebd tox.ini: Exclude test_iqiyi_sdk_interpreter.py 2016-03-19 21:50:16 +01:00
Sergey M․
298c04b464 [91porn] Use common messages' wording 2016-03-20 02:35:48 +06:00
Sergey M․
d95114dd83 [91porn] Unquote final URL (Closes #8881) 2016-03-20 02:34:02 +06:00
Sergey M․
94dcade8f8 Credit @jjatria for biobiochiletv (#7314) 2016-03-20 01:36:20 +06:00
Sergey M․
fa023ccb2c [biobiochiletv] Fix extraction, extract m3u8 formats and overall improve (Closes #7314) 2016-03-20 01:31:55 +06:00
jjatria
e36f4aa72b [biobiotv] Add extractor 2016-03-20 01:29:08 +06:00
Sergey M․
9261e347cc Credit @kasper93 for cda (#8805) 2016-03-19 23:18:04 +06:00
Sergey M․
f1ced6df51 [cda] Improve and simplify (Closes #8805) 2016-03-19 23:17:14 +06:00
Kacper Michajłow
8b0d7a66ef [cda] Add new extractor for cda.pl
Fixes #8760
2016-03-19 22:42:40 +06:00
Sergey M․
3aec71766d [safari:api] Separate extractor (Closes #8871) 2016-03-19 22:30:48 +06:00
Sergey M․
16a8b7986b [downloader/fragment] Document fragment_retries 2016-03-19 20:54:21 +06:00
Sergey M․
617e58d850 [downloader/{common,fragment}] Fix total retries reporting on python 2.6 2016-03-19 20:51:30 +06:00
Sergey M․
e33baba0dd [downloader/dash] Add fragment retry capability
YouTube may often return 404 HTTP error for a fragment causing the
whole download to fail. However if the same fragment is immediately
retried with the same request data this usually succeeds (1-2 attemps
is usually enough) thus allowing to download the whole file successfully.
So, we will retry all fragments that fail with 404 HTTP error for now.
2016-03-19 20:42:23 +06:00
Sergey M․
721f26b821 [downloader/fragment] Add report_retry_fragment 2016-03-19 20:41:24 +06:00
Sergey M․
52bb437e41 [options] Add --fragment-retries option 2016-03-19 20:40:36 +06:00
Jaime Marquínez Ferrándiz
782b1b5bd1 [utils] lookup_unit_table: Match word boundary instead of end of string 2016-03-19 11:44:49 +01:00
Sergey M․
0d769bcb78 [extractor/generic] Fix missing byte literal prefix 2016-03-19 05:43:43 +06:00
remitamine
4cd70099ea [hbo] Add new extractor 2016-03-18 21:18:18 +01:00
Jaime Marquínez Ferrándiz
09fc33198a utils: lookup_unit_table: Use a stricter regex
In parse_count multiple units start with the same letter, so it would match different units depending on the order they were sorted when iterating over them.
2016-03-18 19:23:06 +01:00
Sergey M․
4c3b16d5d1 [test_YoutubeDL] Add test for format_id format selection 2016-03-19 00:04:26 +06:00
John Peel
d5aacf9a90 Added format_id to the filers on -f. 2016-03-18 23:59:24 +06:00
Sergey M․
19e2617a6f [commonprotocols] Add generic support for rtmp URLs (Closes #8488) 2016-03-18 23:42:15 +06:00
Sergey M․
edd9b71c2c [extractor/generic] Add a test for m3u playlist served without proper Content-Type 2016-03-18 22:49:11 +06:00
Sergey M․
5940862d5a [extractor/generic] Detect m3u playlists served without proper Content-Type 2016-03-18 22:45:28 +06:00
Sergey M․
de6c51e88e [extractor/generic] Fix direct link semantics 2016-03-18 22:43:07 +06:00
Sergey M․
303dcdb995 [extractor/generic] Simplify upload_date extraction 2016-03-18 22:41:16 +06:00
Sergey M․
20938f768b [extractor/generic] Add another test for generic m3u8 2016-03-18 21:54:33 +06:00
Sergey M․
955737b2d4 [extractor/generic] Force Content-Type to lowecase 2016-03-18 21:50:44 +06:00
Sergey M․
263eff9537 [extractor/generic] Properly extract format id from Content-Type
Fixes extraction for cases like: audio/x-mpegURL; charset=utf-8
2016-03-18 21:50:10 +06:00
Sergey M․
cae21032ab [theplatform] Improve geo restriction detection 2016-03-18 21:08:25 +06:00
remitamine
6187091532 [once] check http formats availability 2016-03-18 11:51:34 +01:00
235 changed files with 1681 additions and 641 deletions

View File

@@ -163,3 +163,7 @@ Patrick Griffis
Aidan Rowe
mutantmonkey
Ben Congdon
Kacper Michajłow
José Joaquín Atria
Viťas Strádal
Kagami Hiiragi

View File

@@ -85,7 +85,7 @@ To run the test, simply invoke your favorite test runner, or execute a test file
If you want to create a build of youtube-dl yourself, you'll need
* python
* make
* make (both GNU make and BSD make are supported)
* pandoc
* zip
* nosetests

View File

@@ -12,15 +12,7 @@ SHAREDIR ?= $(PREFIX)/share
PYTHON ?= /usr/bin/env python
# set SYSCONFDIR to /etc if PREFIX=/usr or PREFIX=/usr/local
ifeq ($(PREFIX),/usr)
SYSCONFDIR=/etc
else
ifeq ($(PREFIX),/usr/local)
SYSCONFDIR=/etc
else
SYSCONFDIR=$(PREFIX)/etc
endif
endif
SYSCONFDIR != if [ $(PREFIX) = /usr -o $(PREFIX) = /usr/local ]; then echo /etc; else echo $(PREFIX)/etc; fi
install: youtube-dl youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish
install -d $(DESTDIR)$(BINDIR)

View File

@@ -164,6 +164,8 @@ which means you can modify it, redistribute it or use it however you like.
(e.g. 50K or 4.2M)
-R, --retries RETRIES Number of retries (default is 10), or
"infinite".
--fragment-retries RETRIES Number of retries for a fragment (default
is 10), or "infinite" (DASH only)
--buffer-size SIZE Size of download buffer (e.g. 1024 or 16K)
(default is 1024)
--no-resize-buffer Do not automatically adjust the buffer
@@ -376,8 +378,8 @@ which means you can modify it, redistribute it or use it however you like.
--no-post-overwrites Do not overwrite post-processed files; the
post-processed files are overwritten by
default
--embed-subs Embed subtitles in the video (only for mkv
and mp4 videos)
--embed-subs Embed subtitles in the video (only for mp4,
webm and mkv videos)
--embed-thumbnail Embed thumbnail in the audio as cover art
--add-metadata Write metadata to the video file
--metadata-from-title FORMAT Parse additional metadata like song title /
@@ -831,7 +833,7 @@ To run the test, simply invoke your favorite test runner, or execute a test file
If you want to create a build of youtube-dl yourself, you'll need
* python
* make
* make (both GNU make and BSD make are supported)
* pandoc
* zip
* nosetests

View File

@@ -74,6 +74,7 @@
- **Bigflix**
- **Bild**: Bild.de
- **BiliBili**
- **BioBioChileTV**
- **BleacherReport**
- **BleacherReportCMS**
- **blinkx**
@@ -100,6 +101,7 @@
- **CBSNews**: CBS News
- **CBSNewsLiveVideo**: CBS News Live Videos
- **CBSSports**
- **CDA**
- **CeskaTelevize**
- **channel9**: Channel 9
- **Chaturbate**
@@ -244,6 +246,7 @@
- **GPUTechConf**
- **Groupon**
- **Hark**
- **HBO**
- **HearThisAt**
- **Heise**
- **HellPorno**
@@ -344,6 +347,7 @@
- **MiTele**: mitele.es
- **mixcloud**
- **MLB**
- **Mnet**
- **MoeVideo**: LetitBit video services: moevideo.net, playreplay.net and videochart.net
- **Mofosex**
- **Mojvideo**
@@ -440,6 +444,7 @@
- **OnionStudios**
- **Ooyala**
- **OoyalaExternal**
- **Openload**
- **OraTV**
- **orf:fm4**: radio FM4
- **orf:iptv**: iptv.ORF.at
@@ -525,6 +530,7 @@
- **RUTV**: RUTV.RU
- **Ruutu**
- **safari**: safaribooksonline.com online video
- **safari:api**
- **safari:course**: safaribooksonline.com online courses
- **Sandia**: Sandia National Laboratories
- **Sapo**: SAPO Vídeos
@@ -618,6 +624,7 @@
- **TheOnion**
- **ThePlatform**
- **ThePlatformFeed**
- **TheScene**
- **TheSixtyOne**
- **TheStar**
- **ThisAmericanLife**
@@ -786,6 +793,7 @@
- **youtube:channel**: YouTube.com channels
- **youtube:favorites**: YouTube.com favourite videos, ":ytfav" for short (requires authentication)
- **youtube:history**: Youtube watch history, ":ythistory" for short (requires authentication)
- **youtube:live**: YouTube.com live streams
- **youtube:playlist**: YouTube.com playlists
- **youtube:playlists**: YouTube.com user/channel playlists
- **youtube:recommended**: YouTube.com recommended videos, ":ytrec" for short (requires authentication)

View File

@@ -222,6 +222,11 @@ class TestFormatSelection(unittest.TestCase):
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'dash-video-low')
ydl = YDL({'format': 'bestvideo[format_id^=dash][format_id$=low]'})
ydl.process_ie_result(info_dict.copy())
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'dash-video-low')
formats = [
{'format_id': 'vid-vcodec-dot', 'ext': 'mp4', 'preference': 1, 'vcodec': 'avc1.123456', 'acodec': 'none', 'url': TEST_URL},
]

View File

@@ -19,6 +19,7 @@ from youtube_dl.compat import (
compat_str,
compat_urllib_parse_unquote,
compat_urllib_parse_unquote_plus,
compat_urllib_parse_urlencode,
)
@@ -70,6 +71,12 @@ class TestCompat(unittest.TestCase):
self.assertEqual(compat_urllib_parse_unquote_plus('abc%20def'), 'abc def')
self.assertEqual(compat_urllib_parse_unquote_plus('%7e/abc+def'), '~/abc def')
def test_compat_urllib_parse_urlencode(self):
self.assertEqual(compat_urllib_parse_urlencode({'abc': 'def'}), 'abc=def')
self.assertEqual(compat_urllib_parse_urlencode({'abc': b'def'}), 'abc=def')
self.assertEqual(compat_urllib_parse_urlencode({b'abc': 'def'}), 'abc=def')
self.assertEqual(compat_urllib_parse_urlencode({b'abc': b'def'}), 'abc=def')
def test_compat_shlex_split(self):
self.assertEqual(compat_shlex_split('-option "one two"'), ['-option', 'one two'])

View File

@@ -1,4 +1,5 @@
#!/usr/bin/env python
# coding: utf-8
from __future__ import unicode_literals
# Allow direct execution
@@ -120,5 +121,14 @@ class TestProxy(unittest.TestCase):
response = ydl.urlopen(req).read().decode('utf-8')
self.assertEqual(response, 'cn: {0}'.format(url))
def test_proxy_with_idn(self):
ydl = YoutubeDL({
'proxy': 'localhost:{0}'.format(self.port),
})
url = 'http://中文.tw/'
response = ydl.urlopen(url).read().decode('utf-8')
# b'xn--fiq228c' is '中文'.encode('idna')
self.assertEqual(response, 'normal: http://xn--fiq228c.tw/')
if __name__ == '__main__':
unittest.main()

View File

@@ -577,11 +577,11 @@ class TestUtil(unittest.TestCase):
)
self.assertEqual(
escape_url('http://тест.рф/фрагмент'),
'http://тест.рф/%D1%84%D1%80%D0%B0%D0%B3%D0%BC%D0%B5%D0%BD%D1%82'
'http://xn--e1aybc.xn--p1ai/%D1%84%D1%80%D0%B0%D0%B3%D0%BC%D0%B5%D0%BD%D1%82'
)
self.assertEqual(
escape_url('http://тест.рф/абв?абв=абв#абв'),
'http://тест.рф/%D0%B0%D0%B1%D0%B2?%D0%B0%D0%B1%D0%B2=%D0%B0%D0%B1%D0%B2#%D0%B0%D0%B1%D0%B2'
'http://xn--e1aybc.xn--p1ai/%D0%B0%D0%B1%D0%B2?%D0%B0%D0%B1%D0%B2=%D0%B0%D0%B1%D0%B2#%D0%B0%D0%B1%D0%B2'
)
self.assertEqual(escape_url('http://vimeo.com/56015672#at=0'), 'http://vimeo.com/56015672#at=0')
@@ -702,6 +702,8 @@ class TestUtil(unittest.TestCase):
self.assertEqual(parse_count('1.000'), 1000)
self.assertEqual(parse_count('1.1k'), 1100)
self.assertEqual(parse_count('1.1kk'), 1100000)
self.assertEqual(parse_count('1.1kk '), 1100000)
self.assertEqual(parse_count('1.1kk views'), 1100000)
def test_version_tuple(self):
self.assertEqual(version_tuple('1'), (1,))

View File

@@ -8,6 +8,6 @@ deps =
passenv = HOME
defaultargs = test --exclude test_download.py --exclude test_age_restriction.py
--exclude test_subtitles.py --exclude test_write_annotations.py
--exclude test_youtube_lists.py
--exclude test_youtube_lists.py --exclude test_iqiyi_sdk_interpreter.py
commands = nosetests --verbose {posargs:{[testenv]defaultargs}} # --with-coverage --cover-package=youtube_dl --cover-html
# test.test_download:TestDownload.test_NowVideo

View File

@@ -905,7 +905,7 @@ class YoutubeDL(object):
'*=': lambda attr, value: value in attr,
}
str_operator_rex = re.compile(r'''(?x)
\s*(?P<key>ext|acodec|vcodec|container|protocol)
\s*(?P<key>ext|acodec|vcodec|container|protocol|format_id)
\s*(?P<op>%s)(?P<none_inclusive>\s*\?)?
\s*(?P<value>[a-zA-Z0-9._-]+)
\s*$
@@ -1836,7 +1836,7 @@ class YoutubeDL(object):
if fdict.get('language'):
if res:
res += ' '
res += '[%s]' % fdict['language']
res += '[%s] ' % fdict['language']
if fdict.get('format_note') is not None:
res += fdict['format_note'] + ' '
if fdict.get('tbr') is not None:

View File

@@ -144,14 +144,20 @@ def _real_main(argv=None):
if numeric_limit is None:
parser.error('invalid max_filesize specified')
opts.max_filesize = numeric_limit
if opts.retries is not None:
if opts.retries in ('inf', 'infinite'):
opts_retries = float('inf')
def parse_retries(retries):
if retries in ('inf', 'infinite'):
parsed_retries = float('inf')
else:
try:
opts_retries = int(opts.retries)
parsed_retries = int(retries)
except (TypeError, ValueError):
parser.error('invalid retry count specified')
return parsed_retries
if opts.retries is not None:
opts.retries = parse_retries(opts.retries)
if opts.fragment_retries is not None:
opts.fragment_retries = parse_retries(opts.fragment_retries)
if opts.buffersize is not None:
numeric_buffersize = FileDownloader.parse_bytes(opts.buffersize)
if numeric_buffersize is None:
@@ -299,7 +305,8 @@ def _real_main(argv=None):
'force_generic_extractor': opts.force_generic_extractor,
'ratelimit': opts.ratelimit,
'nooverwrites': opts.nooverwrites,
'retries': opts_retries,
'retries': opts.retries,
'fragment_retries': opts.fragment_retries,
'buffersize': opts.buffersize,
'noresizebuffer': opts.noresizebuffer,
'continuedl': opts.continue_dl,

View File

@@ -169,6 +169,31 @@ except ImportError: # Python 2
string = string.replace('+', ' ')
return compat_urllib_parse_unquote(string, encoding, errors)
try:
from urllib.parse import urlencode as compat_urllib_parse_urlencode
except ImportError: # Python 2
# Python 2 will choke in urlencode on mixture of byte and unicode strings.
# Possible solutions are to either port it from python 3 with all
# the friends or manually ensure input query contains only byte strings.
# We will stick with latter thus recursively encoding the whole query.
def compat_urllib_parse_urlencode(query, doseq=0, encoding='utf-8'):
def encode_elem(e):
if isinstance(e, dict):
e = encode_dict(e)
elif isinstance(e, (list, tuple,)):
e = encode_list(e)
elif isinstance(e, compat_str):
e = e.encode(encoding)
return e
def encode_dict(d):
return dict((encode_elem(k), encode_elem(v)) for k, v in d.items())
def encode_list(l):
return [encode_elem(e) for e in l]
return compat_urllib_parse.urlencode(encode_elem(query), doseq=doseq)
try:
from urllib.request import DataHandler as compat_urllib_request_DataHandler
except ImportError: # Python < 3.4
@@ -588,6 +613,7 @@ __all__ = [
'compat_urllib_parse_unquote',
'compat_urllib_parse_unquote_plus',
'compat_urllib_parse_unquote_to_bytes',
'compat_urllib_parse_urlencode',
'compat_urllib_parse_urlparse',
'compat_urllib_request',
'compat_urllib_request_DataHandler',

View File

@@ -115,6 +115,10 @@ class FileDownloader(object):
return '%10s' % '---b/s'
return '%10s' % ('%s/s' % format_bytes(speed))
@staticmethod
def format_retries(retries):
return 'inf' if retries == float('inf') else '%.0f' % retries
@staticmethod
def best_block_size(elapsed_time, bytes):
new_min = max(bytes / 2.0, 1.0)
@@ -297,7 +301,9 @@ class FileDownloader(object):
def report_retry(self, count, retries):
"""Report retry in case of HTTP error 5xx"""
self.to_screen('[download] Got server HTTP error. Retrying (attempt %d of %.0f)...' % (count, retries))
self.to_screen(
'[download] Got server HTTP error. Retrying (attempt %d of %s)...'
% (count, self.format_retries(retries)))
def report_file_already_downloaded(self, file_name):
"""Report file has already been fully downloaded."""

View File

@@ -4,6 +4,7 @@ import os
import re
from .fragment import FragmentFD
from ..compat import compat_urllib_error
from ..utils import (
sanitize_open,
encodeFilename,
@@ -36,20 +37,41 @@ class DashSegmentsFD(FragmentFD):
segments_filenames = []
def append_url_to_file(target_url, target_filename):
success = ctx['dl'].download(target_filename, {'url': combine_url(base_url, target_url)})
if not success:
fragment_retries = self.params.get('fragment_retries', 0)
def append_url_to_file(target_url, tmp_filename, segment_name):
target_filename = '%s-%s' % (tmp_filename, segment_name)
count = 0
while count <= fragment_retries:
try:
success = ctx['dl'].download(target_filename, {'url': combine_url(base_url, target_url)})
if not success:
return False
down, target_sanitized = sanitize_open(target_filename, 'rb')
ctx['dest_stream'].write(down.read())
down.close()
segments_filenames.append(target_sanitized)
break
except (compat_urllib_error.HTTPError, ) as err:
# YouTube may often return 404 HTTP error for a fragment causing the
# whole download to fail. However if the same fragment is immediately
# retried with the same request data this usually succeeds (1-2 attemps
# is usually enough) thus allowing to download the whole file successfully.
# So, we will retry all fragments that fail with 404 HTTP error for now.
if err.code != 404:
raise
# Retry fragment
count += 1
if count <= fragment_retries:
self.report_retry_fragment(segment_name, count, fragment_retries)
if count > fragment_retries:
self.report_error('giving up after %s fragment retries' % fragment_retries)
return False
down, target_sanitized = sanitize_open(target_filename, 'rb')
ctx['dest_stream'].write(down.read())
down.close()
segments_filenames.append(target_sanitized)
if initialization_url:
append_url_to_file(initialization_url, ctx['tmpfilename'] + '-Init')
append_url_to_file(initialization_url, ctx['tmpfilename'], 'Init')
for i, segment_url in enumerate(segment_urls):
segment_filename = '%s-Seg%d' % (ctx['tmpfilename'], i)
append_url_to_file(segment_url, segment_filename)
append_url_to_file(segment_url, ctx['tmpfilename'], 'Seg%d' % i)
self._finish_frag_download(ctx)

View File

@@ -19,8 +19,17 @@ class HttpQuietDownloader(HttpFD):
class FragmentFD(FileDownloader):
"""
A base file downloader class for fragmented media (e.g. f4m/m3u8 manifests).
Available options:
fragment_retries: Number of times to retry a fragment for HTTP error (DASH only)
"""
def report_retry_fragment(self, fragment_name, count, retries):
self.to_screen(
'[download] Got server HTTP error. Retrying fragment %s (attempt %d of %s)...'
% (fragment_name, count, self.format_retries(retries)))
def _prepare_and_start_frag_download(self, ctx):
self._prepare_frag_download(ctx)
self._start_frag_download(ctx)

View File

@@ -72,6 +72,7 @@ from .bet import BetIE
from .bigflix import BigflixIE
from .bild import BildIE
from .bilibili import BiliBiliIE
from .biobiochiletv import BioBioChileTVIE
from .bleacherreport import (
BleacherReportIE,
BleacherReportCMSIE,
@@ -108,6 +109,7 @@ from .cbsnews import (
)
from .cbssports import CBSSportsIE
from .ccc import CCCIE
from .cda import CDAIE
from .ceskatelevize import CeskaTelevizeIE
from .channel9 import Channel9IE
from .chaturbate import ChaturbateIE
@@ -136,6 +138,7 @@ from .collegerama import CollegeRamaIE
from .comedycentral import ComedyCentralIE, ComedyCentralShowsIE
from .comcarcoff import ComCarCoffIE
from .commonmistakes import CommonMistakesIE, UnicodeBOMIE
from .commonprotocols import RtmpIE
from .condenast import CondeNastIE
from .cracked import CrackedIE
from .crackle import CrackleIE
@@ -283,6 +286,7 @@ from .goshgay import GoshgayIE
from .gputechconf import GPUTechConfIE
from .groupon import GrouponIE
from .hark import HarkIE
from .hbo import HBOIE
from .hearthisat import HearThisAtIE
from .heise import HeiseIE
from .hellporno import HellPornoIE
@@ -406,6 +410,7 @@ from .mit import TechTVMITIE, MITIE, OCWMITIE
from .mitele import MiTeleIE
from .mixcloud import MixcloudIE
from .mlb import MLBIE
from .mnet import MnetIE
from .mpora import MporaIE
from .moevideo import MoeVideoIE
from .mofosex import MofosexIE
@@ -531,6 +536,7 @@ from .ooyala import (
OoyalaIE,
OoyalaExternalIE,
)
from .openload import OpenloadIE
from .ora import OraTVIE
from .orf import (
ORFTVthekIE,
@@ -626,6 +632,7 @@ from .ruutu import RuutuIE
from .sandia import SandiaIE
from .safari import (
SafariIE,
SafariApiIE,
SafariCourseIE,
)
from .sapo import SapoIE
@@ -737,6 +744,7 @@ from .theplatform import (
ThePlatformIE,
ThePlatformFeedIE,
)
from .thescene import TheSceneIE
from .thesixtyone import TheSixtyOneIE
from .thestar import TheStarIE
from .thisamericanlife import ThisAmericanLifeIE
@@ -953,7 +961,9 @@ from .youtube import (
YoutubeChannelIE,
YoutubeFavouritesIE,
YoutubeHistoryIE,
YoutubeLiveIE,
YoutubePlaylistIE,
YoutubePlaylistsIE,
YoutubeRecommendedIE,
YoutubeSearchDateIE,
YoutubeSearchIE,
@@ -963,7 +973,6 @@ from .youtube import (
YoutubeTruncatedIDIE,
YoutubeTruncatedURLIE,
YoutubeUserIE,
YoutubePlaylistsIE,
YoutubeWatchLaterIE,
)
from .zapiks import ZapiksIE

View File

@@ -12,7 +12,7 @@ from ..utils import (
class ABCIE(InfoExtractor):
IE_NAME = 'abc.net.au'
_VALID_URL = r'http://www\.abc\.net\.au/news/(?:[^/]+/){1,2}(?P<id>\d+)'
_VALID_URL = r'https?://www\.abc\.net\.au/news/(?:[^/]+/){1,2}(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.abc.net.au/news/2014-11-05/australia-to-staff-ebola-treatment-centre-in-sierra-leone/5868334',

View File

@@ -6,7 +6,7 @@ from .common import InfoExtractor
from ..compat import (
compat_HTTPError,
compat_str,
compat_urllib_parse,
compat_urllib_parse_urlencode,
compat_urllib_parse_urlparse,
)
from ..utils import (
@@ -16,7 +16,7 @@ from ..utils import (
class AddAnimeIE(InfoExtractor):
_VALID_URL = r'http://(?:\w+\.)?add-anime\.net/(?:watch_video\.php\?(?:.*?)v=|video/)(?P<id>[\w_]+)'
_VALID_URL = r'https?://(?:\w+\.)?add-anime\.net/(?:watch_video\.php\?(?:.*?)v=|video/)(?P<id>[\w_]+)'
_TESTS = [{
'url': 'http://www.add-anime.net/watch_video.php?v=24MR3YO5SAS9',
'md5': '72954ea10bc979ab5e2eb288b21425a0',
@@ -60,7 +60,7 @@ class AddAnimeIE(InfoExtractor):
confirm_url = (
parsed_url.scheme + '://' + parsed_url.netloc +
action + '?' +
compat_urllib_parse.urlencode({
compat_urllib_parse_urlencode({
'jschl_vc': vc, 'jschl_answer': compat_str(av_val)}))
self._download_webpage(
confirm_url, video_id,

View File

@@ -6,7 +6,7 @@ from ..utils import int_or_none
class AftonbladetIE(InfoExtractor):
_VALID_URL = r'http://tv\.aftonbladet\.se/abtv/articles/(?P<id>[0-9]+)'
_VALID_URL = r'https?://tv\.aftonbladet\.se/abtv/articles/(?P<id>[0-9]+)'
_TEST = {
'url': 'http://tv.aftonbladet.se/abtv/articles/36015',
'info_dict': {

View File

@@ -4,7 +4,7 @@ from .common import InfoExtractor
class AlJazeeraIE(InfoExtractor):
_VALID_URL = r'http://www\.aljazeera\.com/programmes/.*?/(?P<id>[^/]+)\.html'
_VALID_URL = r'https?://www\.aljazeera\.com/programmes/.*?/(?P<id>[^/]+)\.html'
_TEST = {
'url': 'http://www.aljazeera.com/programmes/the-slum/2014/08/deliverance-201482883754237240.html',

View File

@@ -3,10 +3,13 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_urlparse
from ..compat import (
compat_urlparse,
compat_str,
)
from ..utils import (
determine_ext,
encode_dict,
extract_attributes,
ExtractorError,
sanitized_Request,
urlencode_postdata,
@@ -34,6 +37,10 @@ class AnimeOnDemandIE(InfoExtractor):
# Episodes without titles
'url': 'https://www.anime-on-demand.de/anime/162',
'only_matching': True,
}, {
# ger/jap, Dub/OmU, account required
'url': 'https://www.anime-on-demand.de/anime/169',
'only_matching': True,
}]
def _login(self):
@@ -44,6 +51,10 @@ class AnimeOnDemandIE(InfoExtractor):
login_page = self._download_webpage(
self._LOGIN_URL, None, 'Downloading login page')
if '>Our licensing terms allow the distribution of animes only to German-speaking countries of Europe' in login_page:
self.raise_geo_restricted(
'%s is only available in German-speaking countries of Europe' % self.IE_NAME)
login_form = self._form_hidden_inputs('new_user', login_page)
login_form.update({
@@ -59,7 +70,7 @@ class AnimeOnDemandIE(InfoExtractor):
post_url = compat_urlparse.urljoin(self._LOGIN_URL, post_url)
request = sanitized_Request(
post_url, urlencode_postdata(encode_dict(login_form)))
post_url, urlencode_postdata(login_form))
request.add_header('Referer', self._LOGIN_URL)
response = self._download_webpage(
@@ -126,33 +137,86 @@ class AnimeOnDemandIE(InfoExtractor):
formats = []
playlist_url = self._search_regex(
r'data-playlist=(["\'])(?P<url>.+?)\1',
episode_html, 'data playlist', default=None, group='url')
if playlist_url:
request = sanitized_Request(
compat_urlparse.urljoin(url, playlist_url),
headers={
'X-Requested-With': 'XMLHttpRequest',
'X-CSRF-Token': csrf_token,
'Referer': url,
'Accept': 'application/json, text/javascript, */*; q=0.01',
})
for input_ in re.findall(
r'<input[^>]+class=["\'].*?streamstarter_html5[^>]+>', episode_html):
attributes = extract_attributes(input_)
playlist_urls = []
for playlist_key in ('data-playlist', 'data-otherplaylist'):
playlist_url = attributes.get(playlist_key)
if isinstance(playlist_url, compat_str) and re.match(
r'/?[\da-zA-Z]+', playlist_url):
playlist_urls.append(attributes[playlist_key])
if not playlist_urls:
continue
playlist = self._download_json(
request, video_id, 'Downloading playlist JSON', fatal=False)
if playlist:
playlist = playlist['playlist'][0]
title = playlist['title']
lang = attributes.get('data-lang')
lang_note = attributes.get('value')
for playlist_url in playlist_urls:
kind = self._search_regex(
r'videomaterialurl/\d+/([^/]+)/',
playlist_url, 'media kind', default=None)
format_id_list = []
if lang:
format_id_list.append(lang)
if kind:
format_id_list.append(kind)
if not format_id_list:
format_id_list.append(compat_str(num))
format_id = '-'.join(format_id_list)
format_note = ', '.join(filter(None, (kind, lang_note)))
request = sanitized_Request(
compat_urlparse.urljoin(url, playlist_url),
headers={
'X-Requested-With': 'XMLHttpRequest',
'X-CSRF-Token': csrf_token,
'Referer': url,
'Accept': 'application/json, text/javascript, */*; q=0.01',
})
playlist = self._download_json(
request, video_id, 'Downloading %s playlist JSON' % format_id,
fatal=False)
if not playlist:
continue
start_video = playlist.get('startvideo', 0)
playlist = playlist.get('playlist')
if not playlist or not isinstance(playlist, list):
continue
playlist = playlist[start_video]
title = playlist.get('title')
if not title:
continue
description = playlist.get('description')
for source in playlist.get('sources', []):
file_ = source.get('file')
if file_ and determine_ext(file_) == 'm3u8':
formats = self._extract_m3u8_formats(
if not file_:
continue
ext = determine_ext(file_)
format_id_list = [lang, kind]
if ext == 'm3u8':
format_id_list.append('hls')
elif source.get('type') == 'video/dash' or ext == 'mpd':
format_id_list.append('dash')
format_id = '-'.join(filter(None, format_id_list))
if ext == 'm3u8':
file_formats = self._extract_m3u8_formats(
file_, video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls')
entry_protocol='m3u8_native', m3u8_id=format_id, fatal=False)
elif source.get('type') == 'video/dash' or ext == 'mpd':
continue
file_formats = self._extract_mpd_formats(
file_, video_id, mpd_id=format_id, fatal=False)
else:
continue
for f in file_formats:
f.update({
'language': lang,
'format_note': format_note,
})
formats.extend(file_formats)
if formats:
self._sort_formats(formats)
f = common_info.copy()
f.update({
'title': title,
@@ -161,16 +225,18 @@ class AnimeOnDemandIE(InfoExtractor):
})
entries.append(f)
m = re.search(
r'data-dialog-header=(["\'])(?P<title>.+?)\1[^>]+href=(["\'])(?P<href>.+?)\3[^>]*>Teaser<',
episode_html)
if m:
f = common_info.copy()
f.update({
'id': '%s-teaser' % f['id'],
'title': m.group('title'),
'url': compat_urlparse.urljoin(url, m.group('href')),
})
entries.append(f)
# Extract teaser only when full episode is not available
if not formats:
m = re.search(
r'data-dialog-header=(["\'])(?P<title>.+?)\1[^>]+href=(["\'])(?P<href>.+?)\3[^>]*>Teaser<',
episode_html)
if m:
f = common_info.copy()
f.update({
'id': '%s-teaser' % f['id'],
'title': m.group('title'),
'url': compat_urlparse.urljoin(url, m.group('href')),
})
entries.append(f)
return self.playlist_result(entries, anime_id, anime_title, anime_description)

View File

@@ -5,7 +5,7 @@ from .common import InfoExtractor
class AolIE(InfoExtractor):
IE_NAME = 'on.aol.com'
_VALID_URL = r'(?:aol-video:|http://on\.aol\.com/video/.*-)(?P<id>[0-9]+)(?:$|\?)'
_VALID_URL = r'(?:aol-video:|https?://on\.aol\.com/video/.*-)(?P<id>[0-9]+)(?:$|\?)'
_TESTS = [{
'url': 'http://on.aol.com/video/u-s--official-warns-of-largest-ever-irs-phone-scam-518167793?icid=OnHomepageC2Wide_MustSee_Img',
@@ -25,7 +25,7 @@ class AolIE(InfoExtractor):
class AolFeaturesIE(InfoExtractor):
IE_NAME = 'features.aol.com'
_VALID_URL = r'http://features\.aol\.com/video/(?P<id>[^/?#]+)'
_VALID_URL = r'https?://features\.aol\.com/video/(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'http://features.aol.com/video/behind-secret-second-careers-late-night-talk-show-hosts',

View File

@@ -23,7 +23,7 @@ from ..utils import (
class ArteTvIE(InfoExtractor):
_VALID_URL = r'http://videos\.arte\.tv/(?P<lang>fr|de|en|es)/.*-(?P<id>.*?)\.html'
_VALID_URL = r'https?://videos\.arte\.tv/(?P<lang>fr|de|en|es)/.*-(?P<id>.*?)\.html'
IE_NAME = 'arte.tv'
def _real_extract(self, url):

View File

@@ -6,16 +6,14 @@ import hashlib
import re
from .common import InfoExtractor
from ..compat import (
compat_str,
compat_urllib_parse,
)
from ..compat import compat_str
from ..utils import (
int_or_none,
float_or_none,
sanitized_Request,
xpath_text,
ExtractorError,
float_or_none,
int_or_none,
sanitized_Request,
urlencode_postdata,
xpath_text,
)
@@ -86,7 +84,7 @@ class AtresPlayerIE(InfoExtractor):
}
request = sanitized_Request(
self._LOGIN_URL, compat_urllib_parse.urlencode(login_form).encode('utf-8'))
self._LOGIN_URL, urlencode_postdata(login_form))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
response = self._download_webpage(
request, None, 'Logging in as %s' % username)

View File

@@ -98,7 +98,7 @@ class AzubuIE(InfoExtractor):
class AzubuLiveIE(InfoExtractor):
_VALID_URL = r'http://www.azubu.tv/(?P<id>[^/]+)$'
_VALID_URL = r'https?://www.azubu.tv/(?P<id>[^/]+)$'
_TEST = {
'url': 'http://www.azubu.tv/MarsTVMDLen',

View File

@@ -9,7 +9,7 @@ from ..utils import unescapeHTML
class BaiduVideoIE(InfoExtractor):
IE_DESC = '百度视频'
_VALID_URL = r'http://v\.baidu\.com/(?P<type>[a-z]+)/(?P<id>\d+)\.htm'
_VALID_URL = r'https?://v\.baidu\.com/(?P<type>[a-z]+)/(?P<id>\d+)\.htm'
_TESTS = [{
'url': 'http://v.baidu.com/comic/1069.htm?frp=bdbrand&q=%E4%B8%AD%E5%8D%8E%E5%B0%8F%E5%BD%93%E5%AE%B6',
'info_dict': {

View File

@@ -4,15 +4,13 @@ import re
import itertools
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse,
compat_str,
)
from ..compat import compat_str
from ..utils import (
ExtractorError,
int_or_none,
float_or_none,
int_or_none,
sanitized_Request,
urlencode_postdata,
)
@@ -58,7 +56,7 @@ class BambuserIE(InfoExtractor):
}
request = sanitized_Request(
self._LOGIN_URL, compat_urllib_parse.urlencode(login_form).encode('utf-8'))
self._LOGIN_URL, urlencode_postdata(login_form))
request.add_header('Referer', self._LOGIN_URL)
response = self._download_webpage(
request, None, 'Logging in as %s' % username)

View File

@@ -942,7 +942,7 @@ class BBCIE(BBCCoUkIE):
class BBCCoUkArticleIE(InfoExtractor):
_VALID_URL = 'http://www.bbc.co.uk/programmes/articles/(?P<id>[a-zA-Z0-9]+)'
_VALID_URL = r'https?://www.bbc.co.uk/programmes/articles/(?P<id>[a-zA-Z0-9]+)'
IE_NAME = 'bbc.co.uk:article'
IE_DESC = 'BBC articles'

View File

@@ -8,7 +8,7 @@ from ..utils import url_basename
class BehindKinkIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?behindkink\.com/(?P<year>[0-9]{4})/(?P<month>[0-9]{2})/(?P<day>[0-9]{2})/(?P<id>[^/#?_]+)'
_VALID_URL = r'https?://(?:www\.)?behindkink\.com/(?P<year>[0-9]{4})/(?P<month>[0-9]{2})/(?P<day>[0-9]{2})/(?P<id>[^/#?_]+)'
_TEST = {
'url': 'http://www.behindkink.com/2014/12/05/what-are-you-passionate-about-marley-blaze/',
'md5': '507b57d8fdcd75a41a9a7bdb7989c762',

View File

@@ -14,7 +14,7 @@ from ..utils import (
class BiliBiliIE(InfoExtractor):
_VALID_URL = r'http://www\.bilibili\.(?:tv|com)/video/av(?P<id>\d+)(?:/index_(?P<page_num>\d+).html)?'
_VALID_URL = r'https?://www\.bilibili\.(?:tv|com)/video/av(?P<id>\d+)(?:/index_(?P<page_num>\d+).html)?'
_TESTS = [{
'url': 'http://www.bilibili.tv/video/av1074402/',

View File

@@ -0,0 +1,86 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import remove_end
class BioBioChileTVIE(InfoExtractor):
_VALID_URL = r'https?://tv\.biobiochile\.cl/notas/(?:[^/]+/)+(?P<id>[^/]+)\.shtml'
_TESTS = [{
'url': 'http://tv.biobiochile.cl/notas/2015/10/21/sobre-camaras-y-camarillas-parlamentarias.shtml',
'md5': '26f51f03cf580265defefb4518faec09',
'info_dict': {
'id': 'sobre-camaras-y-camarillas-parlamentarias',
'ext': 'mp4',
'title': 'Sobre Cámaras y camarillas parlamentarias',
'thumbnail': 're:^https?://.*\.jpg$',
'uploader': 'Fernando Atria',
},
}, {
# different uploader layout
'url': 'http://tv.biobiochile.cl/notas/2016/03/18/natalia-valdebenito-repasa-a-diputado-hasbun-paso-a-la-categoria-de-hablar-brutalidades.shtml',
'md5': 'edc2e6b58974c46d5b047dea3c539ff3',
'info_dict': {
'id': 'natalia-valdebenito-repasa-a-diputado-hasbun-paso-a-la-categoria-de-hablar-brutalidades',
'ext': 'mp4',
'title': 'Natalia Valdebenito repasa a diputado Hasbún: Pasó a la categoría de hablar brutalidades',
'thumbnail': 're:^https?://.*\.jpg$',
'uploader': 'Piangella Obrador',
},
'params': {
'skip_download': True,
},
}, {
'url': 'http://tv.biobiochile.cl/notas/2015/10/22/ninos-transexuales-de-quien-es-la-decision.shtml',
'only_matching': True,
}, {
'url': 'http://tv.biobiochile.cl/notas/2015/10/21/exclusivo-hector-pinto-formador-de-chupete-revela-version-del-ex-delantero-albo.shtml',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = remove_end(self._og_search_title(webpage), ' - BioBioChile TV')
file_url = self._search_regex(
r'loadFWPlayerVideo\([^,]+,\s*(["\'])(?P<url>.+?)\1',
webpage, 'file url', group='url')
base_url = self._search_regex(
r'file\s*:\s*(["\'])(?P<url>.+?)\1\s*\+\s*fileURL', webpage,
'base url', default='http://unlimited2-cl.digitalproserver.com/bbtv/',
group='url')
formats = self._extract_m3u8_formats(
'%s%s/playlist.m3u8' % (base_url, file_url), video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls', fatal=False)
f = {
'url': '%s%s' % (base_url, file_url),
'format_id': 'http',
'protocol': 'http',
'preference': 1,
}
if formats:
f_copy = formats[-1].copy()
f_copy.update(f)
f = f_copy
formats.append(f)
self._sort_formats(formats)
thumbnail = self._og_search_thumbnail(webpage)
uploader = self._html_search_regex(
r'<a[^>]+href=["\']https?://busca\.biobiochile\.cl/author[^>]+>(.+?)</a>',
webpage, 'uploader', fatal=False)
return {
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'uploader': uploader,
'formats': formats,
}

View File

@@ -33,7 +33,7 @@ class BokeCCBaseIE(InfoExtractor):
class BokeCCIE(BokeCCBaseIE):
_IE_DESC = 'CC视频'
_VALID_URL = r'http://union\.bokecc\.com/playvideo\.bo\?(?P<query>.*)'
_VALID_URL = r'https?://union\.bokecc\.com/playvideo\.bo\?(?P<query>.*)'
_TESTS = [{
'url': 'http://union.bokecc.com/playvideo.bo?vid=E44D40C15E65EA30&uid=CD0C5D3C8614B28B',

View File

@@ -12,7 +12,7 @@ from ..utils import (
class BpbIE(InfoExtractor):
IE_DESC = 'Bundeszentrale für politische Bildung'
_VALID_URL = r'http://www\.bpb\.de/mediathek/(?P<id>[0-9]+)/'
_VALID_URL = r'https?://www\.bpb\.de/mediathek/(?P<id>[0-9]+)/'
_TEST = {
'url': 'http://www.bpb.de/mediathek/297/joachim-gauck-zu-1989-und-die-erinnerung-an-die-ddr',

View File

@@ -11,7 +11,7 @@ from ..utils import (
class BreakIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?break\.com/video/(?:[^/]+/)*.+-(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?break\.com/video/(?:[^/]+/)*.+-(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.break.com/video/when-girls-act-like-guys-2468056',
'info_dict': {

View File

@@ -6,7 +6,7 @@ import re
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse,
compat_urllib_parse_urlencode,
compat_urlparse,
)
from ..utils import (
@@ -16,7 +16,7 @@ from ..utils import (
class CamdemyIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?camdemy\.com/media/(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?camdemy\.com/media/(?P<id>\d+)'
_TESTS = [{
# single file
'url': 'http://www.camdemy.com/media/5181/',
@@ -104,7 +104,7 @@ class CamdemyIE(InfoExtractor):
class CamdemyFolderIE(InfoExtractor):
_VALID_URL = r'http://www.camdemy.com/folder/(?P<id>\d+)'
_VALID_URL = r'https?://www.camdemy.com/folder/(?P<id>\d+)'
_TESTS = [{
# links with trailing slash
'url': 'http://www.camdemy.com/folder/450',
@@ -139,7 +139,7 @@ class CamdemyFolderIE(InfoExtractor):
parsed_url = list(compat_urlparse.urlparse(url))
query = dict(compat_urlparse.parse_qsl(parsed_url[4]))
query.update({'displayMode': 'list'})
parsed_url[4] = compat_urllib_parse.urlencode(query)
parsed_url[4] = compat_urllib_parse_urlencode(query)
final_url = compat_urlparse.urlunparse(parsed_url)
page = self._download_webpage(final_url, folder_id)

View File

@@ -11,7 +11,7 @@ from ..utils import (
class CBSNewsIE(ThePlatformIE):
IE_DESC = 'CBS News'
_VALID_URL = r'http://(?:www\.)?cbsnews\.com/(?:news|videos)/(?P<id>[\da-z_-]+)'
_VALID_URL = r'https?://(?:www\.)?cbsnews\.com/(?:news|videos)/(?P<id>[\da-z_-]+)'
_TESTS = [
{
@@ -96,7 +96,7 @@ class CBSNewsIE(ThePlatformIE):
class CBSNewsLiveVideoIE(InfoExtractor):
IE_DESC = 'CBS News Live Videos'
_VALID_URL = r'http://(?:www\.)?cbsnews\.com/live/video/(?P<id>[\da-z_-]+)'
_VALID_URL = r'https?://(?:www\.)?cbsnews\.com/live/video/(?P<id>[\da-z_-]+)'
_TEST = {
'url': 'http://www.cbsnews.com/live/video/clinton-sanders-prepare-to-face-off-in-nh/',

View File

@@ -6,7 +6,7 @@ from .common import InfoExtractor
class CBSSportsIE(InfoExtractor):
_VALID_URL = r'http://www\.cbssports\.com/video/player/(?P<section>[^/]+)/(?P<id>[^/]+)'
_VALID_URL = r'https?://www\.cbssports\.com/video/player/(?P<section>[^/]+)/(?P<id>[^/]+)'
_TEST = {
'url': 'http://www.cbssports.com/video/player/tennis/318462531970/0/us-open-flashbacks-1990s',

96
youtube_dl/extractor/cda.py Executable file
View File

@@ -0,0 +1,96 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
decode_packed_codes,
ExtractorError,
parse_duration
)
class CDAIE(InfoExtractor):
_VALID_URL = r'https?://(?:(?:www\.)?cda\.pl/video|ebd\.cda\.pl/[0-9]+x[0-9]+)/(?P<id>[0-9a-z]+)'
_TESTS = [{
'url': 'http://www.cda.pl/video/5749950c',
'md5': '6f844bf51b15f31fae165365707ae970',
'info_dict': {
'id': '5749950c',
'ext': 'mp4',
'height': 720,
'title': 'Oto dlaczego przed zakrętem należy zwolnić.',
'duration': 39
}
}, {
'url': 'http://www.cda.pl/video/57413289',
'md5': 'a88828770a8310fc00be6c95faf7f4d5',
'info_dict': {
'id': '57413289',
'ext': 'mp4',
'title': 'Lądowanie na lotnisku na Maderze',
'duration': 137
}
}, {
'url': 'http://ebd.cda.pl/0x0/5749950c',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage('http://ebd.cda.pl/0x0/' + video_id, video_id)
if 'Ten film jest dostępny dla użytkowników premium' in webpage:
raise ExtractorError('This video is only available for premium users.', expected=True)
title = self._html_search_regex(r'<title>(.+?)</title>', webpage, 'title')
formats = []
info_dict = {
'id': video_id,
'title': title,
'formats': formats,
'duration': None,
}
def extract_format(page, version):
unpacked = decode_packed_codes(page)
format_url = self._search_regex(
r"url:\\'(.+?)\\'", unpacked, '%s url' % version, fatal=False)
if not format_url:
return
f = {
'url': format_url,
}
m = re.search(
r'<a[^>]+data-quality="(?P<format_id>[^"]+)"[^>]+href="[^"]+"[^>]+class="[^"]*quality-btn-active[^"]*">(?P<height>[0-9]+)p',
page)
if m:
f.update({
'format_id': m.group('format_id'),
'height': int(m.group('height')),
})
info_dict['formats'].append(f)
if not info_dict['duration']:
info_dict['duration'] = parse_duration(self._search_regex(
r"duration:\\'(.+?)\\'", unpacked, 'duration', fatal=False))
extract_format(webpage, 'default')
for href, resolution in re.findall(
r'<a[^>]+data-quality="[^"]+"[^>]+href="([^"]+)"[^>]+class="quality-btn"[^>]*>([0-9]+p)',
webpage):
webpage = self._download_webpage(
href, video_id, 'Downloading %s version information' % resolution, fatal=False)
if not webpage:
# Manually report warning because empty page is returned when
# invalid version is requested.
self.report_warning('Unable to download %s version information' % resolution)
continue
extract_format(webpage, resolution)
self._sort_formats(formats)
return info_dict

View File

@@ -5,7 +5,6 @@ import re
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse,
compat_urllib_parse_unquote,
compat_urllib_parse_urlparse,
)
@@ -13,6 +12,7 @@ from ..utils import (
ExtractorError,
float_or_none,
sanitized_Request,
urlencode_postdata,
)
@@ -102,7 +102,7 @@ class CeskaTelevizeIE(InfoExtractor):
req = sanitized_Request(
'http://www.ceskatelevize.cz/ivysilani/ajax/get-client-playlist',
data=compat_urllib_parse.urlencode(data))
data=urlencode_postdata(data))
req.add_header('Content-type', 'application/x-www-form-urlencoded')
req.add_header('x-addr', '127.0.0.1')
@@ -129,7 +129,8 @@ class CeskaTelevizeIE(InfoExtractor):
formats = []
for format_id, stream_url in item['streamUrls'].items():
formats.extend(self._extract_m3u8_formats(
stream_url, playlist_id, 'mp4', entry_protocol='m3u8_native'))
stream_url, playlist_id, 'mp4',
entry_protocol='m3u8_native', fatal=False))
self._sort_formats(formats)
item_id = item.get('id') or item['assetId']

View File

@@ -19,7 +19,7 @@ def _decode(s):
class CliphunterIE(InfoExtractor):
IE_NAME = 'cliphunter'
_VALID_URL = r'''(?x)http://(?:www\.)?cliphunter\.com/w/
_VALID_URL = r'''(?x)https?://(?:www\.)?cliphunter\.com/w/
(?P<id>[0-9]+)/
(?P<seo>.+?)(?:$|[#\?])
'''

View File

@@ -8,7 +8,7 @@ from ..utils import (
class ClipsyndicateIE(InfoExtractor):
_VALID_URL = r'http://(?:chic|www)\.clipsyndicate\.com/video/play(list/\d+)?/(?P<id>\d+)'
_VALID_URL = r'https?://(?:chic|www)\.clipsyndicate\.com/video/play(list/\d+)?/(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.clipsyndicate.com/video/play/4629301/brick_briscoe',

View File

@@ -6,7 +6,7 @@ import re
from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_urllib_parse,
compat_urllib_parse_urlencode,
compat_HTTPError,
)
from ..utils import (
@@ -64,7 +64,7 @@ class CloudyIE(InfoExtractor):
'errorUrl': error_url,
})
data_url = self._API_URL % (video_host, compat_urllib_parse.urlencode(form))
data_url = self._API_URL % (video_host, compat_urllib_parse_urlencode(form))
player_data = self._download_webpage(
data_url, video_id, 'Downloading player data')
data = compat_parse_qs(player_data)

View File

@@ -12,7 +12,7 @@ from ..utils import (
class ClubicIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?clubic\.com/video/(?:[^/]+/)*video.*-(?P<id>[0-9]+)\.html'
_VALID_URL = r'https?://(?:www\.)?clubic\.com/video/(?:[^/]+/)*video.*-(?P<id>[0-9]+)\.html'
_TESTS = [{
'url': 'http://www.clubic.com/video/clubic-week/video-clubic-week-2-0-le-fbi-se-lance-dans-la-photo-d-identite-448474.html',

View File

@@ -11,7 +11,7 @@ from ..utils import (
class ComCarCoffIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?comediansincarsgettingcoffee\.com/(?P<id>[a-z0-9\-]*)'
_VALID_URL = r'https?://(?:www\.)?comediansincarsgettingcoffee\.com/(?P<id>[a-z0-9\-]*)'
_TESTS = [{
'url': 'http://comediansincarsgettingcoffee.com/miranda-sings-happy-thanksgiving-miranda/',
'info_dict': {

View File

@@ -5,7 +5,7 @@ import re
from .mtv import MTVServicesInfoExtractor
from ..compat import (
compat_str,
compat_urllib_parse,
compat_urllib_parse_urlencode,
)
from ..utils import (
ExtractorError,
@@ -201,7 +201,7 @@ class ComedyCentralShowsIE(MTVServicesInfoExtractor):
# Correct cc.com in uri
uri = re.sub(r'(episode:[^.]+)(\.cc)?\.com', r'\1.com', uri)
index_url = 'http://%s.cc.com/feeds/mrss?%s' % (show_name, compat_urllib_parse.urlencode({'uri': uri}))
index_url = 'http://%s.cc.com/feeds/mrss?%s' % (show_name, compat_urllib_parse_urlencode({'uri': uri}))
idoc = self._download_xml(
index_url, epTitle,
'Downloading show index', 'Unable to download episode index')

View File

@@ -21,7 +21,7 @@ from ..compat import (
compat_os_name,
compat_str,
compat_urllib_error,
compat_urllib_parse,
compat_urllib_parse_urlencode,
compat_urlparse,
)
from ..utils import (
@@ -1300,7 +1300,7 @@ class InfoExtractor(object):
'plugin': 'flowplayer-3.2.0.1',
}
f4m_url += '&' if '?' in f4m_url else '?'
f4m_url += compat_urllib_parse.urlencode(f4m_params)
f4m_url += compat_urllib_parse_urlencode(f4m_params)
formats.extend(self._extract_f4m_formats(f4m_url, video_id, f4m_id='hds', fatal=False))
continue

View File

@@ -0,0 +1,36 @@
from __future__ import unicode_literals
import os
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse_unquote,
compat_urlparse,
)
from ..utils import url_basename
class RtmpIE(InfoExtractor):
IE_DESC = False # Do not list
_VALID_URL = r'(?i)rtmp[est]?://.+'
_TESTS = [{
'url': 'rtmp://cp44293.edgefcs.net/ondemand?auth=daEcTdydfdqcsb8cZcDbAaCbhamacbbawaS-bw7dBb-bWG-GqpGFqCpNCnGoyL&aifp=v001&slist=public/unsecure/audio/2c97899446428e4301471a8cb72b4b97--audio--pmg-20110908-0900a_flv_aac_med_int.mp4',
'only_matching': True,
}, {
'url': 'rtmp://edge.live.hitbox.tv/live/dimak',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = compat_urllib_parse_unquote(os.path.splitext(url.rstrip('/').split('/')[-1])[0])
title = compat_urllib_parse_unquote(os.path.splitext(url_basename(url))[0])
return {
'id': video_id,
'title': title,
'formats': [{
'url': url,
'ext': 'flv',
'format_id': compat_urlparse.urlparse(url).scheme,
}],
}

View File

@@ -5,7 +5,7 @@ import re
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse,
compat_urllib_parse_urlencode,
compat_urllib_parse_urlparse,
compat_urlparse,
)
@@ -45,7 +45,7 @@ class CondeNastIE(InfoExtractor):
'wmagazine': 'W Magazine',
}
_VALID_URL = r'http://(?:video|www|player)\.(?P<site>%s)\.com/(?P<type>watch|series|video|embed(?:js)?)/(?P<id>[^/?#]+)' % '|'.join(_SITES.keys())
_VALID_URL = r'https?://(?:video|www|player)\.(?P<site>%s)\.com/(?P<type>watch|series|video|embed(?:js)?)/(?P<id>[^/?#]+)' % '|'.join(_SITES.keys())
IE_DESC = 'Condé Nast media group: %s' % ', '.join(sorted(_SITES.values()))
EMBED_URL = r'(?:https?:)?//player\.(?P<site>%s)\.com/(?P<type>embed(?:js)?)/.+?' % '|'.join(_SITES.keys())
@@ -97,7 +97,7 @@ class CondeNastIE(InfoExtractor):
video_id = self._search_regex(r'videoId: [\'"](.+?)[\'"]', params, 'video id')
player_id = self._search_regex(r'playerId: [\'"](.+?)[\'"]', params, 'player id')
target = self._search_regex(r'target: [\'"](.+?)[\'"]', params, 'target')
data = compat_urllib_parse.urlencode({'videoId': video_id,
data = compat_urllib_parse_urlencode({'videoId': video_id,
'playerId': player_id,
'target': target,
})

View File

@@ -11,8 +11,8 @@ from math import pow, sqrt, floor
from .common import InfoExtractor
from ..compat import (
compat_etree_fromstring,
compat_urllib_parse,
compat_urllib_parse_unquote,
compat_urllib_parse_urlencode,
compat_urllib_request,
compat_urlparse,
)
@@ -78,7 +78,7 @@ class CrunchyrollBaseIE(InfoExtractor):
# See https://github.com/rg3/youtube-dl/issues/7202.
qs['skip_wall'] = ['1']
return compat_urlparse.urlunparse(
parsed_url._replace(query=compat_urllib_parse.urlencode(qs, True)))
parsed_url._replace(query=compat_urllib_parse_urlencode(qs, True)))
class CrunchyrollIE(CrunchyrollBaseIE):
@@ -308,7 +308,7 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
playerdata_url = compat_urllib_parse_unquote(self._html_search_regex(r'"config_url":"([^"]+)', webpage, 'playerdata_url'))
playerdata_req = sanitized_Request(playerdata_url)
playerdata_req.data = compat_urllib_parse.urlencode({'current_page': webpage_url})
playerdata_req.data = urlencode_postdata({'current_page': webpage_url})
playerdata_req.add_header('Content-Type', 'application/x-www-form-urlencoded')
playerdata = self._download_webpage(playerdata_req, video_id, note='Downloading media info')
@@ -322,7 +322,7 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
streamdata_req = sanitized_Request(
'http://www.crunchyroll.com/xml/?req=RpcApiVideoPlayer_GetStandardConfig&media_id=%s&video_format=%s&video_quality=%s'
% (stream_id, stream_format, stream_quality),
compat_urllib_parse.urlencode({'current_page': url}).encode('utf-8'))
compat_urllib_parse_urlencode({'current_page': url}).encode('utf-8'))
streamdata_req.add_header('Content-Type', 'application/x-www-form-urlencoded')
streamdata = self._download_xml(
streamdata_req, video_id,

View File

@@ -15,7 +15,7 @@ from .senateisvp import SenateISVPIE
class CSpanIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?c-span\.org/video/\?(?P<id>[0-9a-f]+)'
_VALID_URL = r'https?://(?:www\.)?c-span\.org/video/\?(?P<id>[0-9a-f]+)'
IE_DESC = 'C-SPAN'
_TESTS = [{
'url': 'http://www.c-span.org/video/?313572-1/HolderonV',

View File

@@ -8,7 +8,7 @@ from ..utils import parse_iso8601, ExtractorError
class CtsNewsIE(InfoExtractor):
IE_DESC = '華視新聞'
# https connection failed (Connection reset)
_VALID_URL = r'http://news\.cts\.com\.tw/[a-z]+/[a-z]+/\d+/(?P<id>\d+)\.html'
_VALID_URL = r'https?://news\.cts\.com\.tw/[a-z]+/[a-z]+/\d+/(?P<id>\d+)\.html'
_TESTS = [{
'url': 'http://news.cts.com.tw/cts/international/201501/201501291578109.html',
'md5': 'a9875cb790252b08431186d741beaabe',

View File

@@ -8,8 +8,8 @@ import itertools
from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_urllib_parse,
compat_urllib_parse_unquote,
compat_urllib_parse_urlencode,
compat_urlparse,
)
from ..utils import (
@@ -70,7 +70,7 @@ class DaumIE(InfoExtractor):
def _real_extract(self, url):
video_id = compat_urllib_parse_unquote(self._match_id(url))
query = compat_urllib_parse.urlencode({'vid': video_id})
query = compat_urllib_parse_urlencode({'vid': video_id})
movie_data = self._download_json(
'http://videofarm.daum.net/controller/api/closed/v1_2/IntegratedMovieData.json?' + query,
video_id, 'Downloading video formats info')
@@ -86,7 +86,7 @@ class DaumIE(InfoExtractor):
formats = []
for format_el in movie_data['output_list']['output_list']:
profile = format_el['profile']
format_query = compat_urllib_parse.urlencode({
format_query = compat_urllib_parse_urlencode({
'vid': video_id,
'profile': profile,
})

View File

@@ -6,7 +6,7 @@ import base64
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse,
compat_urllib_parse_urlencode,
compat_str,
)
from ..utils import (
@@ -15,6 +15,7 @@ from ..utils import (
sanitized_Request,
smuggle_url,
unsmuggle_url,
urlencode_postdata,
)
@@ -106,7 +107,7 @@ class DCNVideoIE(DCNBaseIE):
webpage = self._download_webpage(
'http://admin.mangomolo.com/analytics/index.php/customers/embed/video?' +
compat_urllib_parse.urlencode({
compat_urllib_parse_urlencode({
'id': video_data['id'],
'user_id': video_data['user_id'],
'signature': video_data['signature'],
@@ -133,7 +134,7 @@ class DCNLiveIE(DCNBaseIE):
webpage = self._download_webpage(
'http://admin.mangomolo.com/analytics/index.php/customers/embed/index?' +
compat_urllib_parse.urlencode({
compat_urllib_parse_urlencode({
'id': base64.b64encode(channel_data['user_id'].encode()).decode(),
'channelid': base64.b64encode(channel_data['id'].encode()).decode(),
'signature': channel_data['signature'],
@@ -174,7 +175,7 @@ class DCNSeasonIE(InfoExtractor):
data['show_id'] = show_id
request = sanitized_Request(
'http://admin.mangomolo.com/analytics/index.php/plus/show',
compat_urllib_parse.urlencode(data),
urlencode_postdata(data),
{
'Origin': 'http://www.dcndigital.ae',
'Content-Type': 'application/x-www-form-urlencoded'

View File

@@ -6,7 +6,7 @@ from ..compat import compat_str
class DctpTvIE(InfoExtractor):
_VALID_URL = r'http://www.dctp.tv/(#/)?filme/(?P<id>.+?)/$'
_VALID_URL = r'https?://www.dctp.tv/(#/)?filme/(?P<id>.+?)/$'
_TEST = {
'url': 'http://www.dctp.tv/filme/videoinstallation-fuer-eine-kaufhausfassade/',
'info_dict': {

View File

@@ -5,7 +5,7 @@ from .common import InfoExtractor
class DefenseGouvFrIE(InfoExtractor):
IE_NAME = 'defense.gouv.fr'
_VALID_URL = r'http://.*?\.defense\.gouv\.fr/layout/set/ligthboxvideo/base-de-medias/webtv/(?P<id>[^/?#]*)'
_VALID_URL = r'https?://.*?\.defense\.gouv\.fr/layout/set/ligthboxvideo/base-de-medias/webtv/(?P<id>[^/?#]*)'
_TEST = {
'url': 'http://www.defense.gouv.fr/layout/set/ligthboxvideo/base-de-medias/webtv/attaque-chimique-syrienne-du-21-aout-2013-1',

View File

@@ -9,7 +9,7 @@ from ..compat import compat_str
class DiscoveryIE(InfoExtractor):
_VALID_URL = r'''(?x)http://(?:www\.)?(?:
_VALID_URL = r'''(?x)https?://(?:www\.)?(?:
discovery|
investigationdiscovery|
discoverylife|

View File

@@ -10,7 +10,7 @@ from ..compat import (compat_str, compat_basestring)
class DouyuTVIE(InfoExtractor):
IE_DESC = '斗鱼'
_VALID_URL = r'http://(?:www\.)?douyutv\.com/(?P<id>[A-Za-z0-9]+)'
_VALID_URL = r'https?://(?:www\.)?douyu(?:tv)?\.com/(?P<id>[A-Za-z0-9]+)'
_TESTS = [{
'url': 'http://www.douyutv.com/iseven',
'info_dict': {
@@ -60,6 +60,9 @@ class DouyuTVIE(InfoExtractor):
'params': {
'skip_download': True,
},
}, {
'url': 'http://www.douyu.com/xiaocang',
'only_matching': True,
}]
def _real_extract(self, url):

View File

@@ -10,7 +10,7 @@ from ..utils import int_or_none
class DPlayIE(InfoExtractor):
_VALID_URL = r'http://(?P<domain>it\.dplay\.com|www\.dplay\.(?:dk|se|no))/[^/]+/(?P<id>[^/?#]+)'
_VALID_URL = r'https?://(?P<domain>it\.dplay\.com|www\.dplay\.(?:dk|se|no))/[^/]+/(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'http://it.dplay.com/take-me-out/stagione-1-episodio-25/',

View File

@@ -6,7 +6,6 @@ import itertools
from .amp import AMPIE
from ..compat import (
compat_HTTPError,
compat_urllib_parse,
compat_urlparse,
)
from ..utils import (
@@ -14,6 +13,7 @@ from ..utils import (
clean_html,
int_or_none,
sanitized_Request,
urlencode_postdata
)
@@ -50,7 +50,7 @@ class DramaFeverBaseIE(AMPIE):
}
request = sanitized_Request(
self._LOGIN_URL, compat_urllib_parse.urlencode(login_form).encode('utf-8'))
self._LOGIN_URL, urlencode_postdata(login_form))
response = self._download_webpage(
request, None, 'Logging in as %s' % username)

View File

@@ -7,7 +7,7 @@ from .zdf import ZDFIE
class DreiSatIE(ZDFIE):
IE_NAME = '3sat'
_VALID_URL = r'(?:http://)?(?:www\.)?3sat\.de/mediathek/(?:index\.php|mediathek\.php)?\?(?:(?:mode|display)=[^&]+&)*obj=(?P<id>[0-9]+)$'
_VALID_URL = r'(?:https?://)?(?:www\.)?3sat\.de/mediathek/(?:index\.php|mediathek\.php)?\?(?:(?:mode|display)=[^&]+&)*obj=(?P<id>[0-9]+)$'
_TESTS = [
{
'url': 'http://www.3sat.de/mediathek/index.php?mode=play&obj=45918',

View File

@@ -15,7 +15,7 @@ class DVTVIE(InfoExtractor):
IE_NAME = 'dvtv'
IE_DESC = 'http://video.aktualne.cz/'
_VALID_URL = r'http://video\.aktualne\.cz/(?:[^/]+/)+r~(?P<id>[0-9a-f]{32})'
_VALID_URL = r'https?://video\.aktualne\.cz/(?:[^/]+/)+r~(?P<id>[0-9a-f]{32})'
_TESTS = [{
'url': 'http://video.aktualne.cz/dvtv/vondra-o-ceskem-stoleti-pri-pohledu-na-havla-mi-bylo-trapne/r~e5efe9ca855511e4833a0025900fea04/',

View File

@@ -7,7 +7,7 @@ from .common import InfoExtractor
class EchoMskIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?echo\.msk\.ru/sounds/(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?echo\.msk\.ru/sounds/(?P<id>\d+)'
_TEST = {
'url': 'http://www.echo.msk.ru/sounds/1464134.html',
'md5': '2e44b3b78daff5b458e4dbc37f191f7c',

View File

@@ -3,7 +3,7 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_urllib_parse
from ..compat import compat_urllib_parse_urlencode
from ..utils import (
ExtractorError,
unescapeHTML
@@ -43,7 +43,7 @@ class EroProfileIE(InfoExtractor):
if username is None:
return
query = compat_urllib_parse.urlencode({
query = compat_urllib_parse_urlencode({
'username': username,
'password': password,
'url': 'http://www.eroprofile.com/',

View File

@@ -8,7 +8,7 @@ from .common import InfoExtractor
class ExfmIE(InfoExtractor):
IE_NAME = 'exfm'
IE_DESC = 'ex.fm'
_VALID_URL = r'http://(?:www\.)?ex\.fm/song/(?P<id>[^/]+)'
_VALID_URL = r'https?://(?:www\.)?ex\.fm/song/(?P<id>[^/]+)'
_SOUNDCLOUD_URL = r'http://(?:www\.)?api\.soundcloud\.com/tracks/([^/]+)/stream'
_TESTS = [
{

View File

@@ -5,19 +5,18 @@ import hashlib
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse,
compat_urllib_request,
compat_urlparse,
)
from ..utils import (
encode_dict,
ExtractorError,
sanitized_Request,
urlencode_postdata,
)
class FC2IE(InfoExtractor):
_VALID_URL = r'^http://video\.fc2\.com/(?:[^/]+/)*content/(?P<id>[^/]+)'
_VALID_URL = r'^https?://video\.fc2\.com/(?:[^/]+/)*content/(?P<id>[^/]+)'
IE_NAME = 'fc2'
_NETRC_MACHINE = 'fc2'
_TESTS = [{
@@ -57,7 +56,7 @@ class FC2IE(InfoExtractor):
'Submit': ' Login ',
}
login_data = compat_urllib_parse.urlencode(encode_dict(login_form_strs)).encode('utf-8')
login_data = urlencode_postdata(login_form_strs)
request = sanitized_Request(
'https://secure.id.fc2.com/index.php?mode=login&switch_language=en', login_data)

View File

@@ -4,7 +4,7 @@ from .common import InfoExtractor
class FirstpostIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?firstpost\.com/[^/]+/.*-(?P<id>[0-9]+)\.html'
_VALID_URL = r'https?://(?:www\.)?firstpost\.com/[^/]+/.*-(?P<id>[0-9]+)\.html'
_TEST = {
'url': 'http://www.firstpost.com/india/india-to-launch-indigenous-aircraft-carrier-monday-1025403.html',

View File

@@ -8,7 +8,7 @@ from ..utils import int_or_none
class FirstTVIE(InfoExtractor):
IE_NAME = '1tv'
IE_DESC = 'Первый канал'
_VALID_URL = r'http://(?:www\.)?1tv\.ru/(?:[^/]+/)+(?P<id>.+)'
_VALID_URL = r'https?://(?:www\.)?1tv\.ru/(?:[^/]+/)+(?P<id>.+)'
_TESTS = [{
'url': 'http://www.1tv.ru/videoarchive/73390',

View File

@@ -4,8 +4,8 @@ import re
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse,
compat_parse_qs,
compat_urllib_parse_urlencode,
compat_urllib_parse_urlparse,
compat_urlparse,
)
@@ -109,7 +109,7 @@ class FiveMinIE(InfoExtractor):
response = self._download_json(
'https://syn.5min.com/handlers/SenseHandler.ashx?' +
compat_urllib_parse.urlencode({
compat_urllib_parse_urlencode({
'func': 'GetResults',
'playlist': video_id,
'sid': sid,

View File

@@ -10,7 +10,7 @@ from ..utils import (
class FKTVIE(InfoExtractor):
IE_NAME = 'fernsehkritik.tv'
_VALID_URL = r'http://(?:www\.)?fernsehkritik\.tv/folge-(?P<id>[0-9]+)(?:/.*)?'
_VALID_URL = r'https?://(?:www\.)?fernsehkritik\.tv/folge-(?P<id>[0-9]+)(?:/.*)?'
_TEST = {
'url': 'http://fernsehkritik.tv/folge-1',

View File

@@ -1,7 +1,7 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_urllib_parse
from ..compat import compat_urllib_parse_urlencode
from ..utils import (
ExtractorError,
int_or_none,
@@ -42,7 +42,7 @@ class FlickrIE(InfoExtractor):
}
if secret:
query['secret'] = secret
data = self._download_json(self._API_BASE_URL + compat_urllib_parse.urlencode(query), video_id, note)
data = self._download_json(self._API_BASE_URL + compat_urllib_parse_urlencode(query), video_id, note)
if data['stat'] != 'ok':
raise ExtractorError(data['message'])
return data

View File

@@ -5,7 +5,7 @@ from .common import InfoExtractor
class FootyRoomIE(InfoExtractor):
_VALID_URL = r'http://footyroom\.com/(?P<id>[^/]+)'
_VALID_URL = r'https?://footyroom\.com/(?P<id>[^/]+)'
_TESTS = [{
'url': 'http://footyroom.com/schalke-04-0-2-real-madrid-2015-02/',
'info_dict': {

View File

@@ -4,7 +4,7 @@ from .common import InfoExtractor
class FoxgayIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?foxgay\.com/videos/(?:\S+-)?(?P<id>\d+)\.shtml'
_VALID_URL = r'https?://(?:www\.)?foxgay\.com/videos/(?:\S+-)?(?P<id>\d+)\.shtml'
_TEST = {
'url': 'http://foxgay.com/videos/fuck-turkish-style-2582.shtml',
'md5': '80d72beab5d04e1655a56ad37afe6841',

View File

@@ -6,7 +6,7 @@ from ..utils import int_or_none
class FranceInterIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?franceinter\.fr/player/reecouter\?play=(?P<id>[0-9]+)'
_VALID_URL = r'https?://(?:www\.)?franceinter\.fr/player/reecouter\?play=(?P<id>[0-9]+)'
_TEST = {
'url': 'http://www.franceinter.fr/player/reecouter?play=793962',
'md5': '4764932e466e6f6c79c317d2e74f6884',

View File

@@ -60,28 +60,31 @@ class FranceTVBaseInfoExtractor(InfoExtractor):
video_id, 'Downloading f4m manifest token', fatal=False)
if f4m_url:
formats.extend(self._extract_f4m_formats(
f4m_url + '&hdcore=3.7.0&plugin=aasp-3.7.0.39.44', video_id, 1, format_id))
f4m_url + '&hdcore=3.7.0&plugin=aasp-3.7.0.39.44',
video_id, f4m_id=format_id, fatal=False))
elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(video_url, video_id, 'mp4', m3u8_id=format_id))
formats.extend(self._extract_m3u8_formats(
video_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id=format_id, fatal=False))
elif video_url.startswith('rtmp'):
formats.append({
'url': video_url,
'format_id': 'rtmp-%s' % format_id,
'ext': 'flv',
'preference': 1,
})
else:
formats.append({
'url': video_url,
'format_id': format_id,
'preference': -1,
})
if self._is_valid_url(video_url, video_id, format_id):
formats.append({
'url': video_url,
'format_id': format_id,
})
self._sort_formats(formats)
title = info['titre']
subtitle = info.get('sous_titre')
if subtitle:
title += ' - %s' % subtitle
title = title.strip()
subtitles = {}
subtitles_list = [{
@@ -125,13 +128,13 @@ class PluzzIE(FranceTVBaseInfoExtractor):
class FranceTvInfoIE(FranceTVBaseInfoExtractor):
IE_NAME = 'francetvinfo.fr'
_VALID_URL = r'https?://(?:www|mobile)\.francetvinfo\.fr/.*/(?P<title>.+)\.html'
_VALID_URL = r'https?://(?:www|mobile|france3-regions)\.francetvinfo\.fr/.*/(?P<title>.+)\.html'
_TESTS = [{
'url': 'http://www.francetvinfo.fr/replay-jt/france-3/soir-3/jt-grand-soir-3-lundi-26-aout-2013_393427.html',
'info_dict': {
'id': '84981923',
'ext': 'flv',
'ext': 'mp4',
'title': 'Soir 3',
'upload_date': '20130826',
'timestamp': 1377548400,
@@ -139,6 +142,10 @@ class FranceTvInfoIE(FranceTVBaseInfoExtractor):
'fr': 'mincount:2',
},
},
'params': {
# m3u8 downloads
'skip_download': True,
},
}, {
'url': 'http://www.francetvinfo.fr/elections/europeennes/direct-europeennes-regardez-le-debat-entre-les-candidats-a-la-presidence-de-la-commission_600639.html',
'info_dict': {
@@ -155,11 +162,32 @@ class FranceTvInfoIE(FranceTVBaseInfoExtractor):
'url': 'http://www.francetvinfo.fr/economie/entreprises/les-entreprises-familiales-le-secret-de-la-reussite_933271.html',
'md5': 'f485bda6e185e7d15dbc69b72bae993e',
'info_dict': {
'id': '556e03339473995ee145930c',
'id': 'NI_173343',
'ext': 'mp4',
'title': 'Les entreprises familiales : le secret de la réussite',
'thumbnail': 're:^https?://.*\.jpe?g$',
}
'timestamp': 1433273139,
'upload_date': '20150602',
},
'params': {
# m3u8 downloads
'skip_download': True,
},
}, {
'url': 'http://france3-regions.francetvinfo.fr/bretagne/cotes-d-armor/thalassa-echappee-breizh-ce-venredi-dans-les-cotes-d-armor-954961.html',
'md5': 'f485bda6e185e7d15dbc69b72bae993e',
'info_dict': {
'id': 'NI_657393',
'ext': 'mp4',
'title': 'Olivier Monthus, réalisateur de "Bretagne, le choix de lArmor"',
'description': 'md5:a3264114c9d29aeca11ced113c37b16c',
'thumbnail': 're:^https?://.*\.jpe?g$',
'timestamp': 1458300695,
'upload_date': '20160318',
},
'params': {
'skip_download': True,
},
}]
def _real_extract(self, url):
@@ -172,7 +200,9 @@ class FranceTvInfoIE(FranceTVBaseInfoExtractor):
return self.url_result(dmcloud_url, 'DailymotionCloud')
video_id, catalogue = self._search_regex(
r'id-video=([^@]+@[^"]+)', webpage, 'video id').split('@')
(r'id-video=([^@]+@[^"]+)',
r'<a[^>]+href="(?:https?:)?//videos\.francetv\.fr/video/([^@]+@[^"]+)"'),
webpage, 'video id').split('@')
return self._extract_video(video_id, catalogue)

View File

@@ -5,7 +5,7 @@ from ..utils import ExtractorError
class FreeVideoIE(InfoExtractor):
_VALID_URL = r'^http://www.freevideo.cz/vase-videa/(?P<id>[^.]+)\.html(?:$|[?#])'
_VALID_URL = r'^https?://www.freevideo.cz/vase-videa/(?P<id>[^.]+)\.html(?:$|[?#])'
_TEST = {
'url': 'http://www.freevideo.cz/vase-videa/vysukany-zadecek-22033.html',

View File

@@ -5,7 +5,6 @@ from .common import InfoExtractor
from ..utils import (
clean_html,
determine_ext,
encode_dict,
int_or_none,
sanitized_Request,
ExtractorError,
@@ -54,10 +53,10 @@ class FunimationIE(InfoExtractor):
(username, password) = self._get_login_info()
if username is None:
return
data = urlencode_postdata(encode_dict({
data = urlencode_postdata({
'email_field': username,
'password_field': password,
}))
})
login_request = sanitized_Request('http://www.funimation.com/login', data, headers={
'User-Agent': 'Mozilla/5.0 (Windows NT 5.2; WOW64; rv:42.0) Gecko/20100101 Firefox/42.0',
'Content-Type': 'application/x-www-form-urlencoded'

View File

@@ -10,7 +10,7 @@ from .youtube import YoutubeIE
class GamekingsIE(InfoExtractor):
_VALID_URL = r'http://www\.gamekings\.nl/(?:videos|nieuws)/(?P<id>[^/]+)'
_VALID_URL = r'https?://www\.gamekings\.nl/(?:videos|nieuws)/(?P<id>[^/]+)'
_TESTS = [{
# YouTube embed video
'url': 'http://www.gamekings.nl/videos/phoenix-wright-ace-attorney-dual-destinies-review/',

View File

@@ -14,7 +14,7 @@ from ..utils import (
class GameSpotIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?gamespot\.com/.*-(?P<id>\d+)/?'
_VALID_URL = r'https?://(?:www\.)?gamespot\.com/.*-(?P<id>\d+)/?'
_TESTS = [{
'url': 'http://www.gamespot.com/videos/arma-3-community-guide-sitrep-i/2300-6410818/',
'md5': 'b2a30deaa8654fcccd43713a6b6a4825',

View File

@@ -13,7 +13,7 @@ from ..utils import (
class GameStarIE(InfoExtractor):
_VALID_URL = r'http://www\.gamestar\.de/videos/.*,(?P<id>[0-9]+)\.html'
_VALID_URL = r'https?://www\.gamestar\.de/videos/.*,(?P<id>[0-9]+)\.html'
_TEST = {
'url': 'http://www.gamestar.de/videos/trailer,3/hobbit-3-die-schlacht-der-fuenf-heere,76110.html',
'md5': '96974ecbb7fd8d0d20fca5a00810cea7',

View File

@@ -9,7 +9,7 @@ from ..utils import (
class GametrailersIE(InfoExtractor):
_VALID_URL = r'http://www\.gametrailers\.com/videos/view/[^/]+/(?P<id>.+)'
_VALID_URL = r'https?://www\.gametrailers\.com/videos/view/[^/]+/(?P<id>.+)'
_TEST = {
'url': 'http://www.gametrailers.com/videos/view/gametrailers-com/116437-Just-Cause-3-Review',

View File

@@ -3,11 +3,11 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_urllib_parse
from ..utils import (
remove_end,
HEADRequest,
sanitized_Request,
urlencode_postdata,
)
@@ -123,7 +123,7 @@ class GDCVaultIE(InfoExtractor):
'password': password,
}
request = sanitized_Request(login_url, compat_urllib_parse.urlencode(login_form))
request = sanitized_Request(login_url, urlencode_postdata(login_form))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
self._download_webpage(request, display_id, 'Logging in')
start_page = self._download_webpage(webpage_url, display_id, 'Getting authenticated video page')

View File

@@ -59,6 +59,7 @@ from .videomore import VideomoreIE
from .googledrive import GoogleDriveIE
from .jwplatform import JWPlatformIE
from .digiteka import DigitekaIE
from .instagram import InstagramIE
class GenericIE(InfoExtractor):
@@ -239,6 +240,35 @@ class GenericIE(InfoExtractor):
'format': 'bestvideo',
},
},
# m3u8 served with Content-Type: audio/x-mpegURL; charset=utf-8
{
'url': 'http://once.unicornmedia.com/now/master/playlist/bb0b18ba-64f5-4b1b-a29f-0ac252f06b68/77a785f3-5188-4806-b788-0893a61634ed/93677179-2d99-4ef4-9e17-fe70d49abfbf/content.m3u8',
'info_dict': {
'id': 'content',
'ext': 'mp4',
'title': 'content',
'formats': 'mincount:8',
},
'params': {
# m3u8 downloads
'skip_download': True,
}
},
# m3u8 served with Content-Type: text/plain
{
'url': 'http://www.nacentapps.com/m3u8/index.m3u8',
'info_dict': {
'id': 'index',
'ext': 'mp4',
'title': 'index',
'upload_date': '20140720',
'formats': 'mincount:11',
},
'params': {
# m3u8 downloads
'skip_download': True,
}
},
# google redirect
{
'url': 'http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CCUQtwIwAA&url=http%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DcmQHVoWB5FY&ei=F-sNU-LLCaXk4QT52ICQBQ&usg=AFQjCNEw4hL29zgOohLXvpJ-Bdh2bils1Q&bvm=bv.61965928,d.bGE',
@@ -1245,14 +1275,13 @@ class GenericIE(InfoExtractor):
info_dict = {
'id': video_id,
'title': compat_urllib_parse_unquote(os.path.splitext(url_basename(url))[0]),
'upload_date': unified_strdate(head_response.headers.get('Last-Modified'))
}
# Check for direct link to a video
content_type = head_response.headers.get('Content-Type', '')
m = re.match(r'^(?P<type>audio|video|application(?=/(?:ogg$|(?:vnd\.apple\.|x-)?mpegurl)))/(?P<format_id>.+)$', content_type)
content_type = head_response.headers.get('Content-Type', '').lower()
m = re.match(r'^(?P<type>audio|video|application(?=/(?:ogg$|(?:vnd\.apple\.|x-)?mpegurl)))/(?P<format_id>[^;\s]+)', content_type)
if m:
upload_date = unified_strdate(
head_response.headers.get('Last-Modified'))
format_id = m.group('format_id')
if format_id.endswith('mpegurl'):
formats = self._extract_m3u8_formats(url, video_id, 'mp4')
@@ -1264,11 +1293,8 @@ class GenericIE(InfoExtractor):
'url': url,
'vcodec': 'none' if m.group('type') == 'audio' else None
}]
info_dict.update({
'direct': True,
'formats': formats,
'upload_date': upload_date,
})
info_dict['direct'] = True
info_dict['formats'] = formats
return info_dict
if not self._downloader.params.get('test', False) and not is_intentional:
@@ -1289,18 +1315,21 @@ class GenericIE(InfoExtractor):
request.add_header('Accept-Encoding', '*')
full_response = self._request_webpage(request, video_id)
first_bytes = full_response.read(512)
# Is it an M3U playlist?
if first_bytes.startswith(b'#EXTM3U'):
info_dict['formats'] = self._extract_m3u8_formats(url, video_id, 'mp4')
return info_dict
# Maybe it's a direct link to a video?
# Be careful not to download the whole thing!
first_bytes = full_response.read(512)
if not is_html(first_bytes):
self._downloader.report_warning(
'URL could be a direct video link, returning it as such.')
upload_date = unified_strdate(
head_response.headers.get('Last-Modified'))
info_dict.update({
'direct': True,
'url': url,
'upload_date': upload_date,
})
return info_dict
@@ -1881,6 +1910,19 @@ class GenericIE(InfoExtractor):
self._proto_relative_url(unescapeHTML(mobj.group(1))),
'AdobeTVVideo')
# Look for Vine embeds
mobj = re.search(
r'<iframe[^>]+src=[\'"]((?:https?:)?//(?:www\.)?vine\.co/v/[^/]+/embed/(?:simple|postcard))',
webpage)
if mobj is not None:
return self.url_result(
self._proto_relative_url(unescapeHTML(mobj.group(1))), 'Vine')
# Look for Instagram embeds
instagram_embed_url = InstagramIE._extract_embed_url(webpage)
if instagram_embed_url is not None:
return self.url_result(instagram_embed_url, InstagramIE.ie_key())
def check_video(vurl):
if YoutubeIE.suitable(vurl):
return True

122
youtube_dl/extractor/hbo.py Normal file
View File

@@ -0,0 +1,122 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
xpath_text,
xpath_element,
int_or_none,
parse_duration,
)
class HBOIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?hbo\.com/video/video\.html\?.*vid=(?P<id>[0-9]+)'
_TEST = {
'url': 'http://www.hbo.com/video/video.html?autoplay=true&g=u&vid=1437839',
'md5': '1c33253f0c7782142c993c0ba62a8753',
'info_dict': {
'id': '1437839',
'ext': 'mp4',
'title': 'Ep. 64 Clip: Encryption',
}
}
_FORMATS_INFO = {
'1920': {
'width': 1280,
'height': 720,
},
'640': {
'width': 768,
'height': 432,
},
'highwifi': {
'width': 640,
'height': 360,
},
'high3g': {
'width': 640,
'height': 360,
},
'medwifi': {
'width': 400,
'height': 224,
},
'med3g': {
'width': 400,
'height': 224,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
video_data = self._download_xml(
'http://render.lv3.hbo.com/data/content/global/videos/data/%s.xml' % video_id, video_id)
title = xpath_text(video_data, 'title', 'title', True)
formats = []
for source in xpath_element(video_data, 'videos', 'sources', True):
if source.tag == 'size':
path = xpath_text(source, './/path')
if not path:
continue
width = source.attrib.get('width')
format_info = self._FORMATS_INFO.get(width, {})
height = format_info.get('height')
fmt = {
'url': path,
'format_id': 'http%s' % ('-%dp' % height if height else ''),
'width': format_info.get('width'),
'height': height,
}
rtmp = re.search(r'^(?P<url>rtmpe?://[^/]+/(?P<app>.+))/(?P<playpath>mp4:.+)$', path)
if rtmp:
fmt.update({
'url': rtmp.group('url'),
'play_path': rtmp.group('playpath'),
'app': rtmp.group('app'),
'ext': 'flv',
'format_id': fmt['format_id'].replace('http', 'rtmp'),
})
formats.append(fmt)
else:
video_url = source.text
if not video_url:
continue
if source.tag == 'tarball':
formats.extend(self._extract_m3u8_formats(
video_url.replace('.tar', '/base_index_w8.m3u8'),
video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
else:
format_info = self._FORMATS_INFO.get(source.tag, {})
formats.append({
'format_id': 'http-%s' % source.tag,
'url': video_url,
'width': format_info.get('width'),
'height': format_info.get('height'),
})
self._sort_formats(formats, ('width', 'height', 'tbr', 'format_id'))
thumbnails = []
card_sizes = xpath_element(video_data, 'titleCardSizes')
if card_sizes is not None:
for size in card_sizes:
path = xpath_text(size, 'path')
if not path:
continue
width = int_or_none(size.get('width'))
thumbnails.append({
'id': width,
'url': path,
'width': width,
})
return {
'id': video_id,
'title': title,
'duration': parse_duration(xpath_element(video_data, 'duration/tv14')),
'formats': formats,
'thumbnails': thumbnails,
}

View File

@@ -3,16 +3,16 @@ from __future__ import unicode_literals
import base64
from .common import InfoExtractor
from ..compat import compat_urllib_parse
from ..utils import (
ExtractorError,
HEADRequest,
sanitized_Request,
urlencode_postdata,
)
class HotNewHipHopIE(InfoExtractor):
_VALID_URL = r'http://www\.hotnewhiphop\.com/.*\.(?P<id>.*)\.html'
_VALID_URL = r'https?://www\.hotnewhiphop\.com/.*\.(?P<id>.*)\.html'
_TEST = {
'url': 'http://www.hotnewhiphop.com/freddie-gibbs-lay-it-down-song.1435540.html',
'md5': '2c2cd2f76ef11a9b3b581e8b232f3d96',
@@ -35,7 +35,7 @@ class HotNewHipHopIE(InfoExtractor):
r'"contentUrl" content="(.*?)"', webpage, 'content URL')
return self.url_result(video_url, ie='Youtube')
reqdata = compat_urllib_parse.urlencode([
reqdata = urlencode_postdata([
('mediaType', 's'),
('mediaId', video_id),
])

View File

@@ -4,7 +4,7 @@ import json
import time
from .common import InfoExtractor
from ..compat import compat_urllib_parse
from ..compat import compat_urllib_parse_urlencode
from ..utils import (
ExtractorError,
sanitized_Request,
@@ -12,7 +12,7 @@ from ..utils import (
class HypemIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?hypem\.com/track/(?P<id>[^/]+)/'
_VALID_URL = r'https?://(?:www\.)?hypem\.com/track/(?P<id>[^/]+)/'
_TEST = {
'url': 'http://hypem.com/track/1v6ga/BODYWORK+-+TAME',
'md5': 'b9cc91b5af8995e9f0c1cee04c575828',
@@ -28,7 +28,7 @@ class HypemIE(InfoExtractor):
track_id = self._match_id(url)
data = {'ax': 1, 'ts': time.time()}
request = sanitized_Request(url + '?' + compat_urllib_parse.urlencode(data))
request = sanitized_Request(url + '?' + compat_urllib_parse_urlencode(data))
response, urlh = self._download_webpage_handle(
request, track_id, 'Downloading webpage with the url')

View File

@@ -12,7 +12,7 @@ from ..utils import (
class ImdbIE(InfoExtractor):
IE_NAME = 'imdb'
IE_DESC = 'Internet Movie Database trailers'
_VALID_URL = r'http://(?:www|m)\.imdb\.com/video/imdb/vi(?P<id>\d+)'
_VALID_URL = r'https?://(?:www|m)\.imdb\.com/video/imdb/vi(?P<id>\d+)'
_TEST = {
'url': 'http://www.imdb.com/video/imdb/vi2524815897',
@@ -70,7 +70,7 @@ class ImdbIE(InfoExtractor):
class ImdbListIE(InfoExtractor):
IE_NAME = 'imdb:list'
IE_DESC = 'Internet Movie Database lists'
_VALID_URL = r'http://www\.imdb\.com/list/(?P<id>[\da-zA-Z_-]{11})'
_VALID_URL = r'https?://www\.imdb\.com/list/(?P<id>[\da-zA-Z_-]{11})'
_TEST = {
'url': 'http://www.imdb.com/list/JFs9NWw6XI0',
'info_dict': {

View File

@@ -4,8 +4,10 @@ import re
from .common import InfoExtractor
from ..utils import (
get_element_by_attribute,
int_or_none,
limit_length,
lowercase_escape,
)
@@ -38,6 +40,18 @@ class InstagramIE(InfoExtractor):
'only_matching': True,
}]
@staticmethod
def _extract_embed_url(webpage):
blockquote_el = get_element_by_attribute(
'class', 'instagram-media', webpage)
if blockquote_el is None:
return
mobj = re.search(
r'<a[^>]+href=([\'"])(?P<link>[^\'"]+)\1', blockquote_el)
if mobj:
return mobj.group('link')
def _real_extract(self, url):
video_id = self._match_id(url)
@@ -46,6 +60,8 @@ class InstagramIE(InfoExtractor):
webpage, 'uploader id', fatal=False)
desc = self._search_regex(
r'"caption":"(.+?)"', webpage, 'description', default=None)
if desc is not None:
desc = lowercase_escape(desc)
return {
'id': video_id,

View File

@@ -5,7 +5,7 @@ import re
from .common import InfoExtractor
from ..compat import (
compat_urlparse,
compat_urllib_parse,
compat_urllib_parse_urlencode,
)
from ..utils import (
xpath_with_ns,
@@ -38,7 +38,7 @@ class InternetVideoArchiveIE(InfoExtractor):
# Other player ids return m3u8 urls
cleaned_dic['playerid'] = '247'
cleaned_dic['videokbrate'] = '100000'
return compat_urllib_parse.urlencode(cleaned_dic)
return compat_urllib_parse_urlencode(cleaned_dic)
def _real_extract(self, url):
query = compat_urlparse.urlparse(url).query

View File

@@ -1,4 +1,4 @@
# -*- coding: utf-8 -*-
# coding: utf-8
from __future__ import unicode_literals
import re
@@ -6,6 +6,8 @@ import time
from .common import InfoExtractor
from ..utils import (
determine_ext,
js_to_json,
sanitized_Request,
)
@@ -30,8 +32,7 @@ class IPrimaIE(InfoExtractor):
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
@@ -43,9 +44,42 @@ class IPrimaIE(InfoExtractor):
req.add_header('Referer', url)
playerpage = self._download_webpage(req, video_id, note='Downloading player')
m3u8_url = self._search_regex(r"'src': '([^']+\.m3u8)'", playerpage, 'm3u8 url')
formats = []
formats = self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4')
def extract_formats(format_url, format_key=None, lang=None):
ext = determine_ext(format_url)
new_formats = []
if format_key == 'hls' or ext == 'm3u8':
new_formats = self._extract_m3u8_formats(
format_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False)
elif format_key == 'dash' or ext == 'mpd':
return
new_formats = self._extract_mpd_formats(
format_url, video_id, mpd_id='dash', fatal=False)
if lang:
for f in new_formats:
if not f.get('language'):
f['language'] = lang
formats.extend(new_formats)
options = self._parse_json(
self._search_regex(
r'(?s)var\s+playerOptions\s*=\s*({.+?});',
playerpage, 'player options', default='{}'),
video_id, transform_source=js_to_json, fatal=False)
if options:
for key, tracks in options.get('tracks', {}).items():
if not isinstance(tracks, list):
continue
for track in tracks:
src = track.get('src')
if src:
extract_formats(src, key.lower(), track.get('lang'))
if not formats:
for _, src in re.findall(r'src["\']\s*:\s*(["\'])(.+?)\1', playerpage):
extract_formats(src)
self._sort_formats(formats)

View File

@@ -14,7 +14,7 @@ from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_str,
compat_urllib_parse,
compat_urllib_parse_urlencode,
compat_urllib_parse_urlparse,
)
from ..utils import (
@@ -165,7 +165,7 @@ class IqiyiIE(InfoExtractor):
IE_NAME = 'iqiyi'
IE_DESC = '爱奇艺'
_VALID_URL = r'http://(?:[^.]+\.)?iqiyi\.com/.+\.html'
_VALID_URL = r'https?://(?:[^.]+\.)?iqiyi\.com/.+\.html'
_NETRC_MACHINE = 'iqiyi'
@@ -322,7 +322,7 @@ class IqiyiIE(InfoExtractor):
'bird_t': timestamp,
}
validation_result = self._download_json(
'http://kylin.iqiyi.com/validate?' + compat_urllib_parse.urlencode(validation_params), None,
'http://kylin.iqiyi.com/validate?' + compat_urllib_parse_urlencode(validation_params), None,
note='Validate credentials', errnote='Unable to validate credentials')
MSG_MAP = {
@@ -456,7 +456,7 @@ class IqiyiIE(InfoExtractor):
'QY00001': auth_result['data']['u'],
})
api_video_url += '?' if '?' not in api_video_url else '&'
api_video_url += compat_urllib_parse.urlencode(param)
api_video_url += compat_urllib_parse_urlencode(param)
js = self._download_json(
api_video_url, video_id,
note='Download video info of segment %d for format %s' % (segment_index + 1, format_id))
@@ -494,14 +494,14 @@ class IqiyiIE(InfoExtractor):
}
api_url = 'http://cache.video.qiyi.com/vms' + '?' + \
compat_urllib_parse.urlencode(param)
compat_urllib_parse_urlencode(param)
raw_data = self._download_json(api_url, video_id)
return raw_data
def get_enc_key(self, video_id):
# TODO: automatic key extraction
# last update at 2016-01-22 for Zombie::bite
enc_key = '8ed797d224d043e7ac23d95b70227d32'
enc_key = '4a1caba4b4465345366f28da7c117d20'
return enc_key
def _extract_playlist(self, webpage):

View File

@@ -5,7 +5,7 @@ import re
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse,
compat_urllib_parse_urlencode,
compat_urlparse,
)
from ..utils import qualities
@@ -62,7 +62,7 @@ class IvideonIE(InfoExtractor):
quality = qualities(self._QUALITIES)
formats = [{
'url': 'https://streaming.ivideon.com/flv/live?%s' % compat_urllib_parse.urlencode({
'url': 'https://streaming.ivideon.com/flv/live?%s' % compat_urllib_parse_urlencode({
'server': server_id,
'camera': camera_id,
'sessionId': 'demo',

View File

@@ -9,7 +9,7 @@ from .youtube import YoutubeIE
class JadoreCettePubIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?jadorecettepub\.com/[0-9]{4}/[0-9]{2}/(?P<id>.*?)\.html'
_VALID_URL = r'https?://(?:www\.)?jadorecettepub\.com/[0-9]{4}/[0-9]{2}/(?P<id>.*?)\.html'
_TEST = {
'url': 'http://www.jadorecettepub.com/2010/12/star-wars-massacre-par-les-japonais.html',

View File

@@ -8,7 +8,7 @@ from .common import InfoExtractor
class JeuxVideoIE(InfoExtractor):
_VALID_URL = r'http://.*?\.jeuxvideo\.com/.*/(.*?)\.htm'
_VALID_URL = r'https?://.*?\.jeuxvideo\.com/.*/(.*?)\.htm'
_TESTS = [{
'url': 'http://www.jeuxvideo.com/reportages-videos-jeux/0004/00046170/tearaway-playstation-vita-gc-2013-tearaway-nous-presente-ses-papiers-d-identite-00115182.htm',

View File

@@ -6,7 +6,7 @@ import base64
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse,
compat_urllib_parse_urlencode,
compat_urlparse,
compat_parse_qs,
)
@@ -71,7 +71,7 @@ class KalturaIE(InfoExtractor):
for k, v in a.items():
params['%d:%s' % (i, k)] = v
query = compat_urllib_parse.urlencode(params)
query = compat_urllib_parse_urlencode(params)
url = self._API_BASE + query
data = self._download_json(url, video_id, *args, **kwargs)

View File

@@ -9,7 +9,7 @@ from ..utils import (
class KaraoketvIE(InfoExtractor):
_VALID_URL = r'http://karaoketv\.co\.il/\?container=songs&id=(?P<id>[0-9]+)'
_VALID_URL = r'https?://karaoketv\.co\.il/\?container=songs&id=(?P<id>[0-9]+)'
_TEST = {
'url': 'http://karaoketv.co.il/?container=songs&id=171568',
'info_dict': {

View File

@@ -12,7 +12,7 @@ from ..utils import (
class KarriereVideosIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?karrierevideos\.at(?:/[^/]+)+/(?P<id>[^/]+)'
_VALID_URL = r'https?://(?:www\.)?karrierevideos\.at(?:/[^/]+)+/(?P<id>[^/]+)'
_TESTS = [{
'url': 'http://www.karrierevideos.at/berufsvideos/mittlere-hoehere-schulen/altenpflegerin',
'info_dict': {

View File

@@ -13,7 +13,7 @@ from ..utils import (
class KontrTubeIE(InfoExtractor):
IE_NAME = 'kontrtube'
IE_DESC = 'KontrTube.ru - Труба зовёт'
_VALID_URL = r'http://(?:www\.)?kontrtube\.ru/videos/(?P<id>\d+)/(?P<display_id>[^/]+)/'
_VALID_URL = r'https?://(?:www\.)?kontrtube\.ru/videos/(?P<id>\d+)/(?P<display_id>[^/]+)/'
_TEST = {
'url': 'http://www.kontrtube.ru/videos/2678/nad-olimpiyskoy-derevney-v-sochi-podnyat-rossiyskiy-flag/',

View File

@@ -4,7 +4,7 @@ from .common import InfoExtractor
class Ku6IE(InfoExtractor):
_VALID_URL = r'http://v\.ku6\.com/show/(?P<id>[a-zA-Z0-9\-\_]+)(?:\.)*html'
_VALID_URL = r'https?://v\.ku6\.com/show/(?P<id>[a-zA-Z0-9\-\_]+)(?:\.)*html'
_TEST = {
'url': 'http://v.ku6.com/show/JG-8yS14xzBr4bCn1pu0xw...html',
'md5': '01203549b9efbb45f4b87d55bdea1ed1',

View File

@@ -16,7 +16,7 @@ from ..utils import (
class KUSIIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?kusi\.com/(?P<path>story/.+|video\?clipId=(?P<clipId>\d+))'
_VALID_URL = r'https?://(?:www\.)?kusi\.com/(?P<path>story/.+|video\?clipId=(?P<clipId>\d+))'
_TESTS = [{
'url': 'http://www.kusi.com/story/31183873/turko-files-case-closed-put-on-hold',
'md5': 'f926e7684294cf8cb7bdf8858e1b3988',

Some files were not shown because too many files have changed in this diff Show More