1
0
mirror of https://source.netsyms.com/Mirrors/youtube-dl synced 2026-03-28 07:08:48 +00:00

Compare commits

...

306 Commits

Author SHA1 Message Date
Philipp Hagemeister
8b38f2ac40 release 2016.04.24 2016-04-24 17:06:46 +02:00
Yen Chi Hsuan
a82398bd72 [kwuo:song] Fix extraction and update the test 2016-04-24 22:20:45 +08:00
remitamine
c14dc00df3 [viewster] improve http formats extraction 2016-04-24 14:34:28 +01:00
Yen Chi Hsuan
03dd60ca41 [kuwo:category] Fix the test
Sometimes there are 24 songs and sometimes 30 lol
2016-04-24 21:16:06 +08:00
Yen Chi Hsuan
0738187f9b [ThePlatform] Fix tests failed since 79ba9140dc 2016-04-24 20:46:06 +08:00
Yen Chi Hsuan
a956cb6306 [onionstudios] Fix description extraction
\1 does not work in []. Fixes test_Generic_75
(http://www.clickhole.com/video/dont-understand-bitcoin-man-will-mumble-explanatio-2537)
2016-04-24 20:41:17 +08:00
Yen Chi Hsuan
a8062eabcd [mwave] Skip checking unstable MD5
On my PC the checksum is 02eda6d09fb63131a17a8d44e6237463, while a
recent Travis CI build
(https://travis-ci.org/rg3/youtube-dl/jobs/125341081) shows it's
c930e27b7720aaa3c9d0018dfc8ff6cc
2016-04-24 20:05:24 +08:00
Yen Chi Hsuan
2a7dee8cc5 [yahoo] Improve error detection and update tests 2016-04-24 18:12:16 +08:00
Yen Chi Hsuan
d9ed362116 [yahoo] Extract all <iframe>s
Fixes test_yahoo_6

(https://ca.finance.yahoo.com/news/hackers-sony-more-trouble-well-154609075.html)
2016-04-24 17:46:25 +08:00
Yen Chi Hsuan
4f54958097 [yahoo] Update some tests
One has new fields as ThePlatformIE changed, and others have changed
files.
2016-04-24 17:29:01 +08:00
Yen Chi Hsuan
2a7c38831c [yahoo] Extend _VALID_URL and fix extraction
Closes #9271
2016-04-24 17:01:18 +08:00
Yen Chi Hsuan
949b6497cc [generic] Unescape the video URL
Fixes #9279
2016-04-24 16:25:37 +08:00
Sergey M
2c21152ca7 [README.md] Document track metafields in output template 2016-04-24 12:22:18 +06:00
remitamine
fda9a1ca9e [viewster] simplify qualities_basename regex 2016-04-24 03:06:46 +01:00
remitamine
864d5e7231 [viewster] extract all http formats 2016-04-24 02:32:56 +01:00
Sergey M․
5448b781f6 [dplay] Sign unsigned final download hls URLs 2016-04-23 17:28:45 +06:00
Sergey M․
e239413fbc [dplay] Extract subtitles (Closes #9284) 2016-04-23 16:50:31 +06:00
Sergey M․
fd0ff8bad8 [dplay] Improve extraction and document workarounds and tests 2016-04-23 16:36:17 +06:00
Sergey M․
397ec446f3 [dplay] Try secure api for no tld (Closes #9282) 2016-04-23 15:59:30 +06:00
remitamine
29a7e8f6f8 [nhl] Add new extractor(closes #8419)(closes #8798) 2016-04-22 20:18:27 +01:00
Yen Chi Hsuan
eb01e97e10 [youku] Skip streams with channel_type=tail
Fixes #9275

These video segments look like ads and they don't appear in the web
player.
2016-04-23 02:54:09 +08:00
remitamine
cb7d4d0efd [nbc] add support for today.com(closes #2909) 2016-04-22 18:08:20 +01:00
Yen Chi Hsuan
c80037918b [iqiyi] Improve error detection (#9276) 2016-04-23 00:06:49 +08:00
remitamine
237a41108a [eagleplatform] extract all http formats 2016-04-22 14:32:38 +01:00
remitamine
e962ae15d3 [newstube] extract http formats(closes #9253) 2016-04-22 11:26:43 +01:00
remitamine
7c36ea7d54 [rtbf] improve extraction(fixes #9267) 2016-04-21 22:52:49 +01:00
remitamine
9260cf1d97 [tubitv] fix extraction(closes #8741) 2016-04-21 20:30:19 +01:00
Sergey M․
bdbb8530c7 [vimeo] Pass Referer for check-password request 2016-04-22 00:02:39 +06:00
Sergey M․
09a9fadb84 [dump] Remove extractor 2016-04-21 23:31:34 +06:00
Sergey M․
bf09af3acb Add --hls-prefer-ffmpeg 2016-04-21 23:02:17 +06:00
Sergey M․
88296ac326 [planetaplay] Remove remainings of extractor 2016-04-21 22:57:38 +06:00
Sergey M․
870d525848 [options] Remove experimental mark for --hls-prefer-native 2016-04-21 22:44:01 +06:00
Sergey M․
6577112890 [planetaplay] Remove extractor (Closes #9256) 2016-04-21 22:33:54 +06:00
Sergey M․
1988647dda [tvigle] Skip hls completely (#9259) 2016-04-21 22:15:20 +06:00
Yen Chi Hsuan
a292cba256 [mgtv] Fix _VALID_URL and add localized name 2016-04-22 00:07:43 +08:00
Yen Chi Hsuan
982e518a96 [dispeak] Rename DigitalSpeaking to DigitallySpeaking 2016-04-22 00:07:43 +08:00
Yen Chi Hsuan
748e730099 [dispeak] Several fixes 2016-04-22 00:07:43 +08:00
Sergey M
b6c0d4f431 Merge pull request #9110 from remitamine/parse_duration
[utils] imporove parse_duration to handle more formats
2016-04-21 22:53:16 +07:00
remitamine
acaff49575 [utils] imporove parse_duration to handle more formats 2016-04-21 16:34:54 +01:00
Yen Chi Hsuan
1da19488f9 [mgtv] Add new extractor (closes #9212) 2016-04-21 23:29:51 +08:00
Yen Chi Hsuan
442c4d361f [dispeak/gdcvault] Add the test case from #5784 2016-04-21 19:47:10 +08:00
Yen Chi Hsuan
ec59d657e7 [dispeak] Add new extractor
Both GDCVault and GPUTechConf uses the service of DigitalSpeaking.
2016-04-21 19:36:33 +08:00
Yen Chi Hsuan
99ef96f84c [gdcvault] Fix for videos with hard-coded hostnames
Fixes #9248
2016-04-21 18:07:03 +08:00
Yen Chi Hsuan
4dccea8ad0 [streetvoice] Fix extraction
The old API results in URLs with HTTP 403 from time to time.

Hopefully fixes #9219.
2016-04-21 13:07:53 +08:00
Yen Chi Hsuan
2c0d9c6217 [extractor/common] Allow empty post data 2016-04-21 13:06:06 +08:00
Sergey M․
12a5134596 [tvigle] Fix extraction (Closes #9259) 2016-04-20 23:52:41 +06:00
Sergey M․
16e633a5d7 [quickvid] Remove extractor (Closes #9258) 2016-04-20 23:29:02 +06:00
Sergey M․
494ab6db73 [youtube] Capture and output login error message 2016-04-20 22:14:32 +06:00
Sergey M․
107701fcfc [people] Remove bogus comment 2016-04-20 03:40:02 +06:00
Sergey M․
f77970765a [people] Add extractor 2016-04-20 03:37:23 +06:00
Philipp Hagemeister
81215d5652 release 2016.04.19 2016-04-19 03:03:52 +02:00
Sergey M․
241a318f27 [vimeo] Improve _VALID_URL (Closes #9229) 2016-04-18 21:40:28 +06:00
Sergey M․
4fdf082375 [theonion] Remove extractor (Closes #9220)
It now uses generic onionstudios embed
2016-04-17 23:12:23 +06:00
Jaime Marquínez Ferrándiz
1b6182d8f7 [youtube:playlist] Fetch all the videos in a mix (fixes #3837)
Since there doesn't seem to be any indication, it stops when there aren't new videos in the webpage.
2016-04-17 17:07:57 +02:00
remitamine
7bab22a402 [vice] remove unused import and variable 2016-04-17 14:06:19 +01:00
Yen Chi Hsuan
0f97fb4d00 [musicplayon] Relax _VALID_URL and improve metadata extraction
In r'pl=\d+&play=\d+' pages, several metadata items are missing

Closes #9222.
2016-04-17 17:24:33 +08:00
Yen Chi Hsuan
b1cf58f48f [musicplayon] Fix extraction (closes #9222) 2016-04-17 15:08:51 +08:00
remitamine
3014b0ae83 Merge pull request #9195 from remitamine/ffmpeg-pipe
[downloader/external] enable piping for FFmpegFD(closes #2124)
2016-04-16 22:00:49 +01:00
remitamine
b9f2fdd37f [ffmpeg] Clarify rationale for pipe(-) exclusion in _ffmpeg_filename_argument 2016-04-16 21:50:13 +01:00
remitamine
bbb3f730bb [onionstudios] extract m3u8 formats 2016-04-16 20:53:13 +01:00
remitamine
d868f43c58 [ffmpeg] check for - file name in _ffmpeg_filename_argument 2016-04-16 19:45:56 +01:00
Yen Chi Hsuan
21525bb8ca [kuwo:category] Update the test
Now the webpage says there are 24 songs.
2016-04-17 02:38:05 +08:00
Sergey M․
d8f103159f [nerdist] Remove extractor
It now uses brightcove
2016-04-17 00:16:31 +06:00
remitamine
663ee5f0a9 [vice] extract youtube embed 2016-04-16 17:49:39 +01:00
Sergey M․
b6b950bf58 [cbs] Remove unused import 2016-04-16 22:47:10 +06:00
Sergey M․
11e60fcad8 [extractor/generic] Improve instagram embeds (Closes #9213) 2016-04-16 22:39:20 +06:00
Sergey M․
c23533a100 [instagram] Add support for iframe embeds 2016-04-16 22:31:05 +06:00
Sergey M․
0dafea02e6 [instagram] Add support for embed URLs 2016-04-16 22:23:08 +06:00
Sergey M․
5d6360c3b7 [mooshare] Remove extractor 2016-04-16 21:31:50 +06:00
Yen Chi Hsuan
5e5c30c3fd [mdr] Fix extraction and update tests
It's strange that the date is changed. Anyway, new data matches what the
webpage says.
2016-04-16 21:57:28 +08:00
Yen Chi Hsuan
9154c87fc4 [huffpost] Fix a typo 2016-04-16 21:41:22 +08:00
Yen Chi Hsuan
ef0e4e7bc0 [generic] Fix test_Generic_2
Now a HEAD request returns 400 Bad Request
2016-04-16 19:44:45 +08:00
Yen Chi Hsuan
67d46a3f90 [ustream] Fix /embed/ URLs and add a test 2016-04-16 19:39:25 +08:00
Yen Chi Hsuan
bec47a0748 [tudou] Improve error detection (closes #9175) 2016-04-16 19:11:25 +08:00
Yen Chi Hsuan
36b7d9dbfa [twitter] Don't check /cards/ URLs
Fixes #9181

In this tweet, there are two cards:
1. https://twitter.com/i/cards/tfw/v1/719944006306701313
   This shows #TeamCap vs. #TeamIronMan
2. https://twitter.com/i/videos/tweet/719944021058060289
   This is the real video and can be handled by TwitterCardIE

In all current test_Twitter* tests, /videos/tweet/ approach works fine.
2016-04-16 18:57:50 +08:00
Yen Chi Hsuan
8c65e4a527 [bbc] Fix a test 2016-04-16 18:00:19 +08:00
Yen Chi Hsuan
6ad2ef8b7c [audiomack] Update the test
The original test raises 404
2016-04-16 17:54:39 +08:00
Yen Chi Hsuan
00b426d66d [varzesh3] Add md5 to the test 2016-04-16 17:41:56 +08:00
Yen Chi Hsuan
0de968b584 [newgrounds] Support videos (closes #9138) 2016-04-16 17:41:56 +08:00
remitamine
0841d5013c [cbs] do not catch Exceptions raised by by _extract_theplatform_smil 2016-04-16 10:25:59 +01:00
remitamine
a71fca8577 [theplatform] remove _sort_formats from _extract_theplatform_smil 2016-04-16 10:23:56 +01:00
Yen Chi Hsuan
ee94e7e66d [varzesh3] Fix metadata extraction (closes #9197) 2016-04-16 17:13:22 +08:00
Yen Chi Hsuan
759e37c9e6 [gazeta] Relax _VALID_URL and update tests
Closes #9196
2016-04-16 16:48:47 +08:00
Yen Chi Hsuan
ae65567102 [eagleplatform] Fix error handling 2016-04-16 16:47:16 +08:00
Yen Chi Hsuan
c394b4f4cb [puls4] Fix error detection (#9194) 2016-04-16 16:22:44 +08:00
Yen Chi Hsuan
260c7036ba [sportbox] Fix SportBoxEmbedIE
Also fixes test_Generic_29 (http://www.vestifinance.ru/articles/25753)
2016-04-16 16:13:14 +08:00
remitamine
f74197a074 [cbs] extract rtmp formats 2016-04-15 22:38:37 +01:00
remitamine
f3a58d46bf [youtube:user] check if the url didn't match only the other youtube extractors 2016-04-15 19:06:13 +01:00
Sergey M․
b6612c9b11 [karaoketv] Fix extraction 2016-04-15 21:26:54 +06:00
Yen Chi Hsuan
7e176effb2 [iqiyi] Also suuport pps.tv URLs
PPS is acquired by Baidu and merged with iQiyi in 2013 [1]. Now they
have the same page layouts.

[1] http://www.chinanews.com/it/2013/05-07/4792526.shtml
2016-04-15 22:39:18 +08:00
Yen Chi Hsuan
4a252cc2d2 [karaoketv] Update and mark as not _WORKING 2016-04-15 21:49:17 +08:00
Yen Chi Hsuan
f0ec61b525 [huffpost] Fix extraction 2016-04-15 20:55:56 +08:00
Yen Chi Hsuan
66d40ae3a5 Merge pull request #9041 from kasper93/master
[generic] Add support for LiveLeak embeds
2016-04-15 17:23:55 +08:00
Yen Chi Hsuan
e6da9240d4 [mixcloud:stream] Add new extractor
Closes #7633
2016-04-15 17:14:17 +08:00
Yen Chi Hsuan
dd91dfcd67 [mixcloud] Fix extraction by decrypting play info
Fixes #7521
2016-04-15 15:48:22 +08:00
Yen Chi Hsuan
c773082692 Merge branch 'Phaeilo-mixcloud' 2016-04-15 14:33:04 +08:00
Yen Chi Hsuan
9c250931f5 [mixcloud] Improve and simplify mixcloud:user and mixcloud:playlist 2016-04-15 14:32:02 +08:00
Yen Chi Hsuan
56f1750049 [tdslifeway] Use the new Brightcove API
Thanks for @remitamine's suggestion.
2016-04-15 04:28:54 +08:00
Yen Chi Hsuan
f2159c9815 [wayofthemaster] Remove extractor
Now it's using YouTube embeds.
2016-04-15 04:02:23 +08:00
Yen Chi Hsuan
b0cf2e7c1b [ubu] Remove extractor
1. Videos on ubu.com are now hosted on Vimeo
2. The duration is far from correct, and may not exist on other videos
   (For example http://ubu.com/film/hammons_king.html)
2016-04-15 03:48:23 +08:00
Yen Chi Hsuan
74b47d00c3 [xboxclips] Use http:// URL
xboxclips has misconfigured certificates
2016-04-15 03:30:38 +08:00
Yen Chi Hsuan
8cb57bab8e [ministrygrid] Fix extraction and modernize 2016-04-15 02:48:12 +08:00
Yen Chi Hsuan
e1bf277e19 [tdslifeway] Add TDSLifewayIE
Used by MinistryGridIE
2016-04-15 02:48:12 +08:00
remitamine
ce599d5a7e [downloader/external] enable piping for FFmpegFD(closes #2124) 2016-04-14 18:49:02 +01:00
Sergey M․
9e28538726 [arte:creative] Improve _VALID_URL 2016-04-14 21:54:41 +06:00
Sergey M․
404284132c [arte:info] Add extractor (Closes #9182) 2016-04-14 21:52:05 +06:00
remitamine
5565be9dd9 [aol] relex _VALID_URL regex 2016-04-14 08:47:55 +01:00
Yen Chi Hsuan
b3a9474ad1 Merge branch 'mixcloud' of https://github.com/Phaeilo/youtube-dl into Phaeilo-mixcloud 2016-04-14 15:31:58 +08:00
Yen Chi Hsuan
86475d59b1 [metacritic] Add a new valid test case 2016-04-14 15:12:59 +08:00
Yen Chi Hsuan
73d93f948e [lecture2go] Fix extraction
RTSP stream fails to download. Seems it's a mpv bug as direct playback
works well:

$ mpv --ytdl-format rtsp https://lecture2go.uni-hamburg.de/veranstaltungen/-/v/17473
2016-04-14 15:08:01 +08:00
Yen Chi Hsuan
f5d8743e0a [downloader/rtsp] Print the command 2016-04-14 15:07:31 +08:00
Yen Chi Hsuan
d1c4e4ba15 [laola1tv] Improve error detection and skip an invalid test 2016-04-14 14:11:28 +08:00
Yen Chi Hsuan
f141fefab7 [karrierevideos] Fix extraction
The server serves malformed header "Content Type: text/xml" for the XML
request (it should be Content-Type but not Content Type). Python 3.x,
which uses email.feedparser rejects such headers. As a result,
Content-Encoding header is not parsed, so the returned content is kept
not decompressed, and thus XML parsing error.
2016-04-14 14:06:05 +08:00
aystroganov@gmail.com
8334637f4a Make tbr field 'int' rather than 'tuple'
Closes #9180.
2016-04-13 14:29:34 +02:00
Philipp Hagemeister
b0ba11cc64 release 2016.04.13 2016-04-13 08:02:03 +02:00
Kacper Michajłow
b8f67449ec [generic] Add support for LiveLeak embeds 2016-04-13 01:54:19 +02:00
Yen Chi Hsuan
75af5d59ae [netease] Skip all tests: completely georestricted 2016-04-13 04:52:07 +08:00
Sergey M․
b969d12490 Credit @Phaeilo for presstv (#7113) 2016-04-13 01:52:50 +06:00
Philip Huppert
6d67169509 [mixcloud] improved extraction of user description 2016-04-12 21:18:13 +02:00
Philip Huppert
dcaf00fb3e [mixcloud] support older urllib versions 2016-04-12 21:18:13 +02:00
Philip Huppert
f896e1ccef [mixcloud] fixed some tests 2016-04-12 21:18:13 +02:00
Philip Huppert
c96eca426b [mixcloud] Added support for user uploads, playlists, favorites and listens.
Fixes #3750 and #5272
2016-04-12 21:18:13 +02:00
Sergey M․
466a614537 [youtube:playlist] Recognize popular uploads playlist as mix (Closes #9170) 2016-04-12 21:38:31 +06:00
Sergey M․
ffa2cecf72 [ard] Change subtitles extension to ttml (Closes #9169)
ttml is now served instead of srt
2016-04-12 21:20:31 +06:00
Yen Chi Hsuan
a837416025 [jadorecettepub] Remove extractor: website gone 2016-04-12 18:30:53 +08:00
Yen Chi Hsuan
c9d448876f [izlesene] Fix extraction
description may be absent
2016-04-12 18:29:28 +08:00
Yen Chi Hsuan
8865b8abfd [howstuffworks] Skip a broken test case 2016-04-12 17:30:14 +08:00
Yen Chi Hsuan
c77a0c01cb [groupon] Fix extraction 2016-04-12 17:26:09 +08:00
Yen Chi Hsuan
12355ac473 [goshgay] Fix extraction
isFamilyFriendly no longer exists in the webpage and I can't find
another indicator.
2016-04-12 17:23:00 +08:00
Sergey M․
49f523ca50 [mixcloud] Capture error message (#9156) 2016-04-11 20:45:58 +06:00
remitamine
4a903b93a9 Revert "[openclassroom] Add new extractor(closes #9147)"
This reverts commit 13267a2be3.
2016-04-11 14:44:35 +01:00
remitamine
13267a2be3 [openclassroom] Add new extractor(closes #9147) 2016-04-11 14:24:08 +01:00
Yen Chi Hsuan
134c207e3f [arte.tv:embed] Extended support (#2620) 2016-04-11 19:32:27 +08:00
Yen Chi Hsuan
0f56bd2178 Merge branch 'Phaeilo-presstv' 2016-04-11 16:17:05 +08:00
Yen Chi Hsuan
dfbc7f7f3f [presstv] Improve and simplify 2016-04-11 16:14:07 +08:00
Yen Chi Hsuan
7d58ea7c5b Merge branch 'presstv' of https://github.com/Phaeilo/youtube-dl into Phaeilo-presstv 2016-04-11 15:48:10 +08:00
Sergey M․
452908b257 [telebruxelles] Fix extraction (Closes #9142) 2016-04-11 00:06:05 +06:00
Sergey M․
5899e988d5 [glide] Improve extraction and extract upload info 2016-04-10 23:56:23 +06:00
Sergey M․
4a121d29bb [glide] Fix extraction (Closes #9141) 2016-04-10 23:45:17 +06:00
Sergey M․
7ebc36900d [jwplatform:base] Improve subtitles extraction 2016-04-10 22:55:07 +06:00
Sergey M․
d7eb052fa2 [screencastomatic] Add duration to test 2016-04-10 22:48:04 +06:00
Sergey M․
a6d6722c8f [jwplatform:base] Extract duration 2016-04-10 22:47:38 +06:00
Sergey M․
66fa495868 [screencastomatic] Fix extraction (Closes #9136) 2016-04-10 22:37:14 +06:00
Sergey M․
443285aabe [ebaumsworlds] Update _VALID_URL (Closes #9135) 2016-04-10 22:15:11 +06:00
Philip Huppert
de728757ad [presstv] Refactored extractor. 2016-04-10 16:36:44 +02:00
Sergey M․
f44c276842 [extractor/extractors] Remove non-existant imports 2016-04-10 19:21:58 +06:00
Sergey M․
a1fa60a934 [cliprs] Add extractor (Closes #9099) 2016-04-10 18:43:40 +06:00
Sergey M․
49caf3307f [extractor/common] Remove irrelevant comment 2016-04-10 17:10:27 +06:00
Jaime Marquínez Ferrándiz
6a801f4470 [test/InfoExtractors] add test for _download_json 2016-04-09 23:18:41 +02:00
Sergey M․
61dd350a04 [1tv] Fix extraction (Closes #9103) 2016-04-10 03:02:35 +06:00
Jaime Marquínez Ferrándiz
eb9c3edd5e [test/utils] Add test for date_from_str 2016-04-09 22:40:05 +02:00
Philip Huppert
95153a960d [presstv] updated extractor and tests to work with current PressTV website 2016-04-09 16:14:05 +02:00
Yen Chi Hsuan
6c4c7539f2 [test/helper] Check got values to be strings for md5: fields
Seen in PBSIE tests
2016-04-09 22:04:48 +08:00
Yen Chi Hsuan
c991106706 [videodetective] Adapt to InternetVideoArchiveIE 2016-04-09 21:47:35 +08:00
Yen Chi Hsuan
dae2a058de [rottentomatoes] Adapt to InternetVideoArchiveIE 2016-04-09 21:47:12 +08:00
Yen Chi Hsuan
c05025fdd7 [internetvideoarchive] Fix extraction and support json URLs 2016-04-09 21:46:51 +08:00
Philip Huppert
bfe96d7bea [presstv] Added extractor PressTV.
Fixes #7060
2016-04-09 14:55:54 +02:00
Yen Chi Hsuan
ab481b48e5 [funnyordie] Relax M3U8 URL matching
Also, m3u8_url extraction should be fatal as all formats depends
directly or indirectly on it.

This change fixes test_Generic_26 and TestFunnyOrDieSubtitles
2016-04-09 20:17:35 +08:00
Sergey M․
92c7f3157a [aol] Add coding cookie 2016-04-09 17:32:23 +06:00
Yen Chi Hsuan
cacd996662 [utils] Don't touch URLs if not necessary
Fix test_Generic_15 (Google redirect)
2016-04-09 19:27:54 +08:00
remitamine
bffb245a48 [aol] add support for videos with vidible IDs(closes #9124) 2016-04-09 10:51:23 +01:00
Yen Chi Hsuan
680efb6723 Merge pull request #8497 from jaimeMF/lazy-load
Add experimenta lazy loading of info extractors
2016-04-09 14:08:13 +08:00
Jaime Marquínez Ferrándiz
5a9858bfa9 setup.py: add command for building the lazy_extractors module 2016-04-08 21:50:54 +02:00
Jaime Marquínez Ferrándiz
8a5dc1c1e1 lazy extractors: Initialize the real info extractor
According to the docs '__init__' is only called automatically if '__new__' returns an instance of the original class.
2016-04-08 21:50:54 +02:00
Jaime Marquínez Ferrándiz
e0986e31cf lazy extractors: Output if it's enabled in the verbose log 2016-04-08 21:50:54 +02:00
Jaime Marquínez Ferrándiz
6b97ca96fc lazy extractors: Style fixes
* Sort extractors alphabetically
* Add newlines when needed (youtube_dl/extractors/lazy_extractors.py pass the flake8 test now)
2016-04-08 21:50:54 +02:00
Jaime Marquínez Ferrándiz
c1ce6acdd7 lazy extractors: Fix building with python2.6 2016-04-08 21:50:07 +02:00
Jaime Marquínez Ferrándiz
0d778b1db9 lazy extractors: specify the encoding
When building with python3 the unicode characters are not escaped, python2 needs to know the encoding.
2016-04-08 21:50:07 +02:00
Jaime Marquínez Ferrándiz
779822d945 Add experimental support for lazy loading the info extractors
'make lazy-extractors' creates the youtube_dl/extractor/lazy_extractors.py (imported by youtube_dl/extractor/__init__.py), which contains simplified classes that only have the 'suitable' class method and that load the appropiate class with the '__new__' method when a instance is created.
2016-04-08 21:50:07 +02:00
Jaime Marquínez Ferrándiz
1b3d5e05a8 Move the extreactors import to youtube_dl/extractor/extractors.py 2016-04-08 21:47:51 +02:00
Jaime Marquínez Ferrándiz
e52d7f85f2 Delay initialization of InfoExtractors until they are needed 2016-04-08 21:43:24 +02:00
Sergey M․
568d2f78d6 [tnaflix] Fix metadata extraction 2016-04-09 00:27:24 +06:00
Sergey M․
2f2fcf1a33 [tnaflix] Fix extraction (Closes #9074) 2016-04-08 23:34:59 +06:00
Sergey M․
bacec0397f [extractor/common] Relax _hidden_inputs 2016-04-08 23:33:45 +06:00
Sergey M․
3c6c7e7d7e [gdcvault] Fix extraction (Closes #9107, closes #9114) 2016-04-08 23:16:02 +06:00
Sergey M․
fb38aa8b53 [extractor/common] Support arbitrary format strings for template based identifiers in mpd manifests (Closes #9119, closes #9120) 2016-04-08 22:48:08 +06:00
Sergey M․
18da24634c [democracynow] Improve extraction 2016-04-08 22:27:27 +06:00
Sergey M․
a134426d61 [democracynow] Fix tests 2016-04-08 22:21:14 +06:00
Sergey M․
a64c0c9b06 [democracynow] Make description optional (Closes #9115) 2016-04-08 22:15:36 +06:00
Sergey M․
56019444cb [novamov] Improve _VALID_URL template (Closes #9116) 2016-04-08 21:26:42 +06:00
remitamine
a1ff3cd5f9 [acast] fix channel extraction(closes #9117) 2016-04-08 15:15:34 +01:00
remitamine
9a32e80477 [acast] fix extraction(#9117) 2016-04-08 14:51:00 +01:00
Sergey M․
536a55dabd [YoutubeDL] Sanitize single thumbnail URL 2016-04-08 00:17:47 +06:00
Sergey M․
ed6fb8b804 [vrt] Add support for direct hls playlists and YouTube (Closes #9108) 2016-04-07 23:22:43 +06:00
Sergey M․
3afef2e3fc [beeg] Improve extraction 2016-04-07 22:40:35 +06:00
Sergey M․
e90d175436 [yandexmusic] Extract music album metafields (Closes #7354) 2016-04-07 02:56:13 +06:00
Sergey M․
7a93ab5f3f [extractor/common] Introduce music album metafields 2016-04-07 02:53:53 +06:00
Philipp Hagemeister
c41cf65d4a release 2016.04.06 2016-04-06 15:13:08 +02:00
Jaime Marquínez Ferrándiz
ec4a4c6fcc Makefile: remove ISSUE_TEMPLATE.md from the 'all' target (fixes #9088)
It isn't included in the tar file, causing build failures.
Since it's only used for GitHub, I think we don't need to store it in the tar file.
2016-04-06 14:16:05 +02:00
Jaime Marquínez Ferrándiz
be0c7009fb Makefile: use full path for the ISSUE_TEMPLATE.md file 2016-04-06 14:09:31 +02:00
Yen Chi Hsuan
92d5477d84 [compat] Handle tuples properly in urlencode()
Fixes #9055
2016-04-06 18:29:54 +08:00
Yen Chi Hsuan
8790249c68 [iqiyi] Improve error detection for VIP-only videos
Closes #9071
2016-04-06 16:12:16 +08:00
Philipp Hagemeister
416930d450 release 2016.04.05 2016-04-05 18:36:24 +02:00
Sergey M․
65150b41bb [deezer] Fix extraction (Closes #9086) 2016-04-05 22:27:33 +06:00
Sergey M․
e42f413716 [rte] Improve thumbnail extraction (Closes #9085) 2016-04-05 22:23:20 +06:00
Sergey M․
40a056d85d [extractor/__init__] Remove novamov extractor and sort novamov based extractors alphabetically 2016-04-05 21:54:09 +06:00
Sergey M․
e7d77efb9d [auroravid] Add extractor (Closes #9070) 2016-04-05 21:52:07 +06:00
Sergey M․
995cf05c96 [novamov] Make title fatal 2016-04-05 21:40:43 +06:00
Jaime Marquínez Ferrándiz
5bf28d7864 [utils] dfxp2srt: add additional namespace
Used by the ZDF subtitles (#9081).
2016-04-04 20:46:35 +02:00
Jaime Marquínez Ferrándiz
8c7d6e8e22 [zdf] Extract subtitles (closes #9081) 2016-04-04 20:44:06 +02:00
Sergey M․
6d4fc66bfc [youtube] Add support for zwearz (Closes #9062) 2016-04-04 02:26:20 +06:00
remitamine
23576edbfc [brightcove:legacy] skip None value for uploader_id 2016-04-02 21:31:21 +01:00
remitamine
4d4cd35f48 [brightcove:legacy] extract uploader_id as a string 2016-04-02 20:55:44 +01:00
remitamine
3aac9b2fb1 [nowness] update tests 2016-04-02 18:57:15 +01:00
remitamine
e47d19e991 [brightcove:new] extract subtitles and strip video title 2016-04-02 18:57:15 +01:00
remitamine
41f5492fbc [brightcove:legacy] improve format extraction and extract uploader_id, duration and timestamp 2016-04-02 18:57:15 +01:00
Jaime Marquínez Ferrándiz
2defa7d75a [instagram:user] Fix extraction (fixes #9059)
The URL for the next page was incorrect and we always got the same page, therefore it got trapped in an infinite loop.
2016-04-02 18:03:56 +02:00
Sergey M․
bbc26c8a01 [bbc] Set vcodec to none for audio formats 2016-04-02 19:00:38 +06:00
Sergey M․
b507cc925b [extractor/common] Carry long line 2016-04-02 18:49:58 +06:00
Sergey M․
db8ee7ec05 [extractor/common] Fix numeric identifiers conversion in DASH URL templates 2016-04-02 18:48:05 +06:00
remitamine
08136dc138 [brightcove] fix format sorting 2016-04-02 10:57:57 +01:00
remitamine
fe7ef95e91 [cbsinteractive] Add support for ZDNet videos 2016-04-01 23:53:32 +01:00
remitamine
5f705baf5e [cnet] extract more formats 2016-04-01 20:42:15 +01:00
remitamine
0750b2491f [ffmpeg] try to convert tt subtitles usng dfxp2srt 2016-04-01 19:47:49 +01:00
remitamine
df634be2ed [common] prefer using mime type over ext for smil subtitle extraction
the subtitle ext for http://www.cnet.com/videos/download-amazon-prime-movies-and-tv/
is adb_xml while using the mime type it get tt(application/smptett+xml)
2016-04-01 19:47:49 +01:00
Jaime Marquínez Ferrándiz
6d628fafca [camwithher] Remove extra blank line 2016-04-01 20:45:21 +02:00
Jaime Marquínez Ferrándiz
0f28777f58 [cbsnews] Remove unused import 2016-04-01 20:43:14 +02:00
Jaime Marquínez Ferrándiz
329c1eae54 [aenetworks] Make pep8 happy 2016-04-01 20:42:19 +02:00
Sergey M․
9aaaf8e8e8 [camwithher] Improve extraction (Closes #8989) 2016-04-01 23:47:27 +06:00
theGeekPirate
04819db58e [camwithher] Add extractor
Corrected unnecessary test

Sane variable naming

RTMP all .flv & url_id for _download_webpage()

Corrected all outstanding issues, next up is a squash!
2016-04-01 23:44:25 +06:00
remitamine
79ba9140dc [theplatform] extract timestamp and uploader 2016-04-01 18:07:17 +01:00
Sergey M․
75d572e9fb [screencast] Improve title regexes (Closes #9025) 2016-04-01 23:01:55 +06:00
Martin Trigaux
791d6aaecc screencast.com: fallback on page title
When determining the title of the page, use the <title> tag of the page
2016-04-01 23:00:52 +06:00
Sergey M․
81de73e5b4 [screencast] Add test 2016-04-01 23:00:45 +06:00
Martin Trigaux
83cedc1cf2 screencast.com: support missing www
The "www." part of the URL is not mandatory
2016-04-01 22:58:16 +06:00
Sergey M․
244cd04237 [pluralsight] Remove unnecessary login/password encode 2016-04-01 22:46:46 +06:00
Sergey M․
fbdaced256 [lynda] Remove unnecessary login/password encode 2016-04-01 22:45:20 +06:00
Sergey M․
a3373823e1 [udemy] Remove unnecessary login/password encode
This is now covered by compat_urllib_parse_urlencode
2016-04-01 22:42:09 +06:00
Sergey M․
03caa463e7 [udemy:course] Skip non-video lectures 2016-04-01 22:38:56 +06:00
remitamine
3f64379eda [movieclips] fix extraction 2016-04-01 16:22:06 +01:00
remitamine
3e0c3d14d9 [cbs] add base extractor 2016-04-01 10:12:29 +01:00
remitamine
d8873d4def [aenetworks] improve format extraction 2016-04-01 09:58:02 +01:00
remitamine
db1c969da5 [theplatform] sign https urls 2016-04-01 09:58:02 +01:00
Philipp Hagemeister
1e02bc7ba2 release 2016.04.01 2016-04-01 09:07:40 +02:00
remitamine
63c55e9f22 [cbs] improve extraction(closes #6321) 2016-04-01 07:33:37 +01:00
remitamine
f9b1529af8 [generic] remove sbnation test(handled by VoxMediaIE) 2016-03-31 23:50:45 +01:00
remitamine
961fc024d2 [voxmedia] improve sbnation support 2016-03-31 23:33:36 +01:00
Sergey M․
b53a06e3b9 [udemy:course] Use new URL format 2016-04-01 02:24:22 +06:00
remitamine
4ecc1fc638 [howstuffworks] improve extraction 2016-03-31 21:11:58 +01:00
Yen Chi Hsuan
5b012dfce8 [tudou] Improve error handling (closes #8988) 2016-04-01 01:42:16 +08:00
remitamine
8369942773 [voxmedia] Add new extractor(closes #3182) 2016-03-31 18:36:41 +01:00
Sergey M․
86f3b66cec [udemy] Remove unused import 2016-03-31 23:00:11 +06:00
Sergey M․
6bb4600717 [udemy:course] Simplify course curriculum downloading 2016-03-31 22:59:19 +06:00
Sergey M․
41d06b0424 [extractor/common] Improve _request_webpage
* Do not ignore data, headers and query for Requests
* Default values for headers and query switched to dicts since these are used by urllib itself
2016-03-31 22:58:38 +06:00
Sergey M․
15d260ebaa [utils] Use update_Request in http_request 2016-03-31 22:55:49 +06:00
Sergey M․
ed0291d153 [utils] Add update_Request 2016-03-31 22:55:01 +06:00
Sergey M․
81da8cbc45 [udemy] Switch to api 2.0 (Closes #9035) 2016-03-31 22:05:25 +06:00
Sergey M․
5299bc3f91 [beeg] Switch to api v6 (Closes #9036) 2016-03-31 20:42:41 +06:00
remitamine
c9c39c22c5 [nationalgeographic] add support for channel.nationalgeographic.com urls 2016-03-31 13:47:38 +01:00
remitamine
d84b48e3f1 [nationalgeographic] improve extraction 2016-03-31 13:44:55 +01:00
remitamine
dd17041c82 [tenplay] remove extractor(fixes #6927) 2016-03-31 12:02:04 +01:00
remitamine
fea7295b14 [brightcove] relax embed_in_page regex 2016-03-31 10:48:22 +01:00
remitamine
9cf01f7f30 [nbc] add new extractor for csnne.com(#5432) 2016-03-31 00:26:42 +01:00
remitamine
ce548296fe [cnbc] fix test 2016-03-31 00:25:11 +01:00
remitamine
c02ec7d430 [cnbc] Add new extractor(closes #8012) 2016-03-30 23:18:31 +01:00
remitamine
6b820a2376 [myspace] improve extraction 2016-03-30 21:18:07 +01:00
Yen Chi Hsuan
e621a344e6 [kwuo] Port to new API and enable --cn-verification-proxy 2016-03-31 02:27:52 +08:00
Yen Chi Hsuan
3ae6f8fec1 [kwuo] Remove _sort_formats() from KuwoBaseIE._get_formats()
Following the idea proposed in 19dbaeece3
2016-03-31 02:11:21 +08:00
Yen Chi Hsuan
597d52fadb [kuwo:song] Correct song ID extraction (fixes #9033)
Bug introduced in daef04a4e7.
2016-03-31 02:00:50 +08:00
Sergey M․
afca767d19 [tumblr] Improve _VALID_URL (Closes #9027) 2016-03-30 22:26:43 +06:00
remitamine
6e359a1534 [comcarcoff] don not depend on crackle extractor(closes #8995)
previously extraction has been delegated to crackle to extract more info
and subtitles #6106 but some of the episodes can't be extracted using
crackle #8995.
2016-03-30 12:27:00 +01:00
Sergey M․
607619bc90 Add manually generated ISSUE_TEMPLATE.md
In order not to wait for the next release
2016-03-29 22:04:29 +06:00
Sergey M․
0b7bfc9422 Improve ISSUE_TEMPLATE_tmpl.md 2016-03-29 22:02:42 +06:00
Sergey M․
7168a6c874 [devscripts/make_issue_template] Fix __version__ again 2016-03-29 03:05:15 +06:00
Sergey M․
034947dd1e Rename ISSUE_TEMPLATE.tmpl in order not to be picked up by github 2016-03-29 02:48:04 +06:00
Sergey M․
3c0de33ad7 Remove ISSUE_TEMPLATE.md 2016-03-29 02:43:48 +06:00
Sergey M․
89924f8230 [devscripts/make_issue_template] Fix NameError under python3 2016-03-29 02:41:27 +06:00
Sergey M․
a39c68f7e5 Exclude make_issue_template.py from flake8 2016-03-29 02:19:24 +06:00
Sergey M․
4a5a67ca25 [devscripts/release.sh] Make ISSUE_TEMPLATE.md and commit it 2016-03-29 02:18:52 +06:00
Sergey M․
8751da85a7 [Makefile] Fix ISSUE_TEMPLATE.md target 2016-03-29 02:17:57 +06:00
Sergey M․
3bf1df51fd [devscripts/make_issue_template] Rework to use ISSUE_TEMPLATE.tmpl (Closes #8785) 2016-03-29 02:16:38 +06:00
Sergey M․
3842a3e652 Add ISSUE_TEMPLATE.tmpl as template for ISSUE_TEMPLATE.md 2016-03-29 02:15:26 +06:00
Sander van den Oever
7710bdf4e8 Add initial ISSUE_TEMPLATE
Add auto-updating of youtube-dl version in ISSUE_TEMPLATE

Move parts of template text and adopt makefile to new format

Moved the 'kind-of-issue' section and rephrased a bit

Rephrased and moved Example URL section upwards

Moved ISSUE_TEMPLATE inside .github folder.

Update makefile to match new folderstructure
2016-03-28 22:43:13 +06:00
Sergey M
8d9dd3c34b [README.md] Add format_id to the list of string meta fields available for use in format selection 2016-03-28 03:08:34 +05:00
Sergey M․
33f3040a3e [YoutubeDL] Fix sanitizing subtitles' url 2016-03-28 03:13:39 +06:00
Sergey M․
03442072c0 [pornhub] Fix typo (Closes #9008) 2016-03-28 01:21:44 +06:00
Sergey M․
c8b13fec02 [foxnews] Restore upload time fields in test 2016-03-28 01:14:12 +06:00
Sergey M․
87d105ac6c [amp] Fix upload timestamp extraction (Closes #9007) 2016-03-28 01:13:47 +06:00
Sergey M․
3454139576 [pornhub:uservideos] Add support for multipage videos (Closes #9006) 2016-03-28 00:50:46 +06:00
Sergey M․
3a23bae9cc [pornhub:playlistbase] Do not include videos not from playlist 2016-03-28 00:32:57 +06:00
Sergey M․
8f9a477e7f [pornhub:playlistbase] Use orderedSet 2016-03-28 00:21:08 +06:00
Sergey M․
a1cf3e38a3 [bbc] Extend vpid regex (Closes #9003) 2016-03-27 23:22:51 +06:00
Philipp Hagemeister
a122e7080b release 2016.03.27 2016-03-27 16:56:33 +02:00
Sergey M․
b22ca76204 [extractor/common] Filter out unsupported encrypted media for f4m formats (Closes #8573) 2016-03-27 07:42:38 +06:00
Sergey M․
f7df343b4a [downloader/f4m] Extract routine for removing unsupported encrypted media 2016-03-27 07:41:19 +06:00
Sergey M․
19dbaeece3 Remove _sort_formats from _extract_*_formats methods
Now _sort_formats should be called explicitly.
_sort_formats has been added to all the necessary places in code.

Closes #8051
2016-03-27 07:03:08 +06:00
Yen Chi Hsuan
395fd4b08a [twitter] Handle another form of embedded Vine
Fixes #8996
2016-03-27 04:36:02 +08:00
Sergey M․
8018028d0f [pluralsight] Extract chapter metadata (Closes #8993) 2016-03-27 02:10:52 +06:00
Sergey M․
00322ad4fd [lynda] Extract chapter metadata (#8993) 2016-03-27 02:00:36 +06:00
Sergey M․
4cf3489c6e [vevo] Update videoservice API URL (Closes #8900) 2016-03-27 01:11:11 +06:00
Sergey M․
b24ab3e341 [udemy] Improve paid course detection 2016-03-27 00:09:12 +06:00
Sergey M․
af4116f4f0 [udemy] Improve format_id 2016-03-27 00:02:52 +06:00
Sergey M․
f973e5d54e [udemy] Drop outputs' formats
Always results in 403
2016-03-26 23:55:07 +06:00
Sergey M․
62f55aa68a [udemy] Add outputs metadata to view_html formats 2016-03-26 23:54:12 +06:00
Sergey M․
02d7634d24 [udemy] Fix outputs' formats format_id 2016-03-26 23:43:25 +06:00
Sergey M․
48dce58ca9 [udemy] Use custom sorting 2016-03-26 23:42:46 +06:00
Sergey M․
efcba804f6 [udemy] Extract formats from view_html (Closes #8979) 2016-03-26 23:42:34 +06:00
Sergey M․
6dee688e6d [youtube:playlistsbase] Restrict playlist regex (Closes #8986) 2016-03-26 20:42:18 +06:00
Sergey M․
eedb7ba536 [YoutubeDL] Sort imports 2016-03-26 19:40:33 +06:00
Sergey M․
dcf77cf1a7 [YoutubeDL] Sanitize final URLs (Closes #8991) 2016-03-26 19:37:41 +06:00
Sergey M․
17bcc626bf [utils] Extract sanitize_url routine 2016-03-26 19:33:57 +06:00
Sergey M․
b5a5bbf376 [mailru] Extend _VALID_URL (Closes #8990) 2016-03-26 19:15:32 +06:00
Yen Chi Hsuan
e68d3a010f [twitter] Fix extraction (closes #8966)
HLS and DASH formats are no longer appeared in test cases. I keep them
for fear of triggering new errors.
2016-03-26 18:34:51 +08:00
Yen Chi Hsuan
d10fe8358c [generic] Add a test case for brightcove embed
Closes #8862
2016-03-26 18:30:43 +08:00
Yen Chi Hsuan
d6c340cae5 [brightcove] Extract more formats (#8862) 2016-03-26 18:21:07 +08:00
Yen Chi Hsuan
5964b598ff [brightcove] Support alternative BrightcoveExperience layout
The full URL lays in the `data` attribute of <object> (#8862)
2016-03-26 17:47:32 +08:00
173 changed files with 4668 additions and 2719 deletions

58
.github/ISSUE_TEMPLATE.md vendored Normal file
View File

@@ -0,0 +1,58 @@
## Please follow the guide below
- You will be asked some questions and requested to provide some information, please read them **carefully** and answer honestly
- Put an `x` into all the boxes [ ] relevant to your *issue* (like that [x])
- Use *Preview* tab to see how your issue will actually look like
---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.04.24*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.04.24**
### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
- [ ] [Searched](https://github.com/rg3/youtube-dl/search?type=Issues) the bugtracker for similar issues including closed ones
### What is the purpose of your *issue*?
- [ ] Bug report (encountered problems with youtube-dl)
- [ ] Site support request (request for adding support for a new site)
- [ ] Feature request (request for a new functionality)
- [ ] Question
- [ ] Other
---
### The following sections concretize particular purposed issues, you can erase any section (the contents between triple ---) not applicable to your *issue*
---
### If the purpose of this *issue* is a *bug report*, *site support request* or you are not completely sure provide the full verbose output as follows:
Add `-v` flag to **your command line** you run youtube-dl with, copy the **whole** output and insert it here. It should look similar to one below (replace it with **your** log inserted between triple ```):
```
$ youtube-dl -v <your command line>
[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2016.04.24
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}
...
<end of log>
```
---
### If the purpose of this *issue* is a *site support request* please provide all kinds of example URLs support for which should be included (replace following example URLs by **yours**):
- Single video: https://www.youtube.com/watch?v=BaW_jenozKc
- Single video: https://youtu.be/BaW_jenozKc
- Playlist: https://www.youtube.com/playlist?list=PL4lCao7KL_QFVb7Iudeipvc2BCavECqzc
---
### Description of your *issue*, suggested solution and other information
Explanation of your *issue* in arbitrary form goes here. Please make sure the [description is worded well enough to be understood](https://github.com/rg3/youtube-dl#is-the-description-of-the-issue-itself-sufficient). Provide as much context and examples as possible.
If work on your *issue* required an account credentials please provide them or explain how one can obtain them.

58
.github/ISSUE_TEMPLATE_tmpl.md vendored Normal file
View File

@@ -0,0 +1,58 @@
## Please follow the guide below
- You will be asked some questions and requested to provide some information, please read them **carefully** and answer honestly
- Put an `x` into all the boxes [ ] relevant to your *issue* (like that [x])
- Use *Preview* tab to see how your issue will actually look like
---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *%(version)s*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **%(version)s**
### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
- [ ] [Searched](https://github.com/rg3/youtube-dl/search?type=Issues) the bugtracker for similar issues including closed ones
### What is the purpose of your *issue*?
- [ ] Bug report (encountered problems with youtube-dl)
- [ ] Site support request (request for adding support for a new site)
- [ ] Feature request (request for a new functionality)
- [ ] Question
- [ ] Other
---
### The following sections concretize particular purposed issues, you can erase any section (the contents between triple ---) not applicable to your *issue*
---
### If the purpose of this *issue* is a *bug report*, *site support request* or you are not completely sure provide the full verbose output as follows:
Add `-v` flag to **your command line** you run youtube-dl with, copy the **whole** output and insert it here. It should look similar to one below (replace it with **your** log inserted between triple ```):
```
$ youtube-dl -v <your command line>
[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version %(version)s
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}
...
<end of log>
```
---
### If the purpose of this *issue* is a *site support request* please provide all kinds of example URLs support for which should be included (replace following example URLs by **yours**):
- Single video: https://www.youtube.com/watch?v=BaW_jenozKc
- Single video: https://youtu.be/BaW_jenozKc
- Playlist: https://www.youtube.com/playlist?list=PL4lCao7KL_QFVb7Iudeipvc2BCavECqzc
---
### Description of your *issue*, suggested solution and other information
Explanation of your *issue* in arbitrary form goes here. Please make sure the [description is worded well enough to be understood](https://github.com/rg3/youtube-dl#is-the-description-of-the-issue-itself-sufficient). Provide as much context and examples as possible.
If work on your *issue* required an account credentials please provide them or explain how one can obtain them.

1
.gitignore vendored
View File

@@ -13,6 +13,7 @@ README.txt
youtube-dl.1
youtube-dl.bash-completion
youtube-dl.fish
youtube_dl/extractor/lazy_extractors.py
youtube-dl
youtube-dl.exe
youtube-dl.tar.gz

View File

@@ -167,3 +167,4 @@ Kacper Michajłow
José Joaquín Atria
Viťas Strádal
Kagami Hiiragi
Philip Huppert

View File

@@ -140,14 +140,14 @@ After you have ensured this site is distributing it's content legally, you can f
# TODO more properties (see youtube_dl/extractor/common.py)
}
```
5. Add an import in [`youtube_dl/extractor/__init__.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/__init__.py).
5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc.
7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/58525c94d547be1c8167d16c298bdd75506db328/youtube_dl/extractor/common.py#L68-L226). Add tests and code for as many as you want.
8. Keep in mind that the only mandatory fields in info dict for successful extraction process are `id`, `title` and either `url` or `formats`, i.e. these are the critical data the extraction does not make any sense without. This means that [any field](https://github.com/rg3/youtube-dl/blob/58525c94d547be1c8167d16c298bdd75506db328/youtube_dl/extractor/common.py#L138-L226) apart from aforementioned mandatory ones should be treated **as optional** and extraction should be **tolerate** to situations when sources for these fields can potentially be unavailable (even if they always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields. For example, if you have some intermediate dict `meta` that is a source of metadata and it has a key `summary` that you want to extract and put into resulting info dict as `description`, you should be ready that this key may be missing from the `meta` dict, i.e. you should extract it as `meta.get('summary')` and not `meta['summary']`. Similarly, you should pass `fatal=False` when extracting data from a webpage with `_search_regex/_html_search_regex`.
9. Check the code with [flake8](https://pypi.python.org/pypi/flake8).
10. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:
$ git add youtube_dl/extractor/__init__.py
$ git add youtube_dl/extractor/extractors.py
$ git add youtube_dl/extractor/yourextractor.py
$ git commit -m '[yourextractor] Add new extractor'
$ git push origin yourextractor

View File

@@ -1,7 +1,7 @@
all: youtube-dl README.md CONTRIBUTING.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish supportedsites
clean:
rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish *.dump *.part *.info.json *.mp4 *.flv *.mp3 *.avi CONTRIBUTING.md.tmp youtube-dl youtube-dl.exe
rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish youtube_dl/extractor/lazy_extractors.py *.dump *.part *.info.json *.mp4 *.flv *.mp3 *.avi CONTRIBUTING.md.tmp ISSUE_TEMPLATE.md.tmp youtube-dl youtube-dl.exe
find . -name "*.pyc" -delete
find . -name "*.class" -delete
@@ -59,6 +59,9 @@ README.md: youtube_dl/*.py youtube_dl/*/*.py
CONTRIBUTING.md: README.md
$(PYTHON) devscripts/make_contributing.py README.md CONTRIBUTING.md
.github/ISSUE_TEMPLATE.md: devscripts/make_issue_template.py .github/ISSUE_TEMPLATE_tmpl.md youtube_dl/version.py
$(PYTHON) devscripts/make_issue_template.py .github/ISSUE_TEMPLATE_tmpl.md .github/ISSUE_TEMPLATE.md
supportedsites:
$(PYTHON) devscripts/make_supportedsites.py docs/supportedsites.md
@@ -85,6 +88,12 @@ youtube-dl.fish: youtube_dl/*.py youtube_dl/*/*.py devscripts/fish-completion.in
fish-completion: youtube-dl.fish
lazy-extractors: youtube_dl/extractor/lazy_extractors.py
_EXTRACTOR_FILES != find youtube_dl/extractor -iname '*.py' -and -not -iname 'lazy_extractors.py'
youtube_dl/extractor/lazy_extractors.py: devscripts/make_lazy_extractors.py devscripts/lazy_load_template.py $(_EXTRACTOR_FILES)
$(PYTHON) devscripts/make_lazy_extractors.py $@
youtube-dl.tar.gz: youtube-dl README.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish
@tar -czf youtube-dl.tar.gz --transform "s|^|youtube-dl/|" --owner 0 --group 0 \
--exclude '*.DS_Store' \

View File

@@ -176,7 +176,9 @@ which means you can modify it, redistribute it or use it however you like.
--xattr-set-filesize Set file xattribute ytdl.filesize with
expected filesize (experimental)
--hls-prefer-native Use the native HLS downloader instead of
ffmpeg (experimental)
ffmpeg
--hls-prefer-ffmpeg Use ffmpeg instead of the native HLS
downloader
--hls-use-mpegts Use the mpegts container for HLS videos,
allowing to play the video while
downloading (some players may not be able
@@ -515,6 +517,18 @@ Available for the video that is an episode of some series or programme:
- `episode_number`: Number of the video episode within a season
- `episode_id`: Id of the video episode
Available for the media that is a track or a part of a music album:
- `track`: Title of the track
- `track_number`: Number of the track within an album or a disc
- `track_id`: Id of the track
- `artist`: Artist(s) of the track
- `genre`: Genre(s) of the track
- `album`: Title of the album the track belongs to
- `album_type`: Type of the album
- `album_artist`: List of all artists appeared on the album
- `disc_number`: Number of the disc or other physical medium the track belongs to
- `release_year`: Year (YYYY) when the album was released
Each aforementioned sequence when referenced in output template will be replaced by the actual value corresponding to the sequence name. Note that some of the sequences are not guaranteed to be present since they depend on the metadata obtained by particular extractor, such sequences will be replaced with `NA`.
For example for `-o %(title)s-%(id)s.%(ext)s` and mp4 video with title `youtube-dl test video` and id `BaW_jenozKcj` this will result in a `youtube-dl test video-BaW_jenozKcj.mp4` file created in the current directory.
@@ -600,6 +614,7 @@ Also filtering work for comparisons `=` (equals), `!=` (not equals), `^=` (begin
- `vcodec`: Name of the video codec in use
- `container`: Name of the container format
- `protocol`: The protocol that will be used for the actual download, lower-case. `http`, `https`, `rtsp`, `rtmp`, `rtmpe`, `m3u8`, or `m3u8_native`
- `format_id`: A short description of the format
Note that none of the aforementioned meta fields are guaranteed to be present since this solely depends on the metadata obtained by particular extractor, i.e. the metadata offered by video hoster.
@@ -888,14 +903,14 @@ After you have ensured this site is distributing it's content legally, you can f
# TODO more properties (see youtube_dl/extractor/common.py)
}
```
5. Add an import in [`youtube_dl/extractor/__init__.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/__init__.py).
5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc.
7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/58525c94d547be1c8167d16c298bdd75506db328/youtube_dl/extractor/common.py#L68-L226). Add tests and code for as many as you want.
8. Keep in mind that the only mandatory fields in info dict for successful extraction process are `id`, `title` and either `url` or `formats`, i.e. these are the critical data the extraction does not make any sense without. This means that [any field](https://github.com/rg3/youtube-dl/blob/58525c94d547be1c8167d16c298bdd75506db328/youtube_dl/extractor/common.py#L138-L226) apart from aforementioned mandatory ones should be treated **as optional** and extraction should be **tolerate** to situations when sources for these fields can potentially be unavailable (even if they always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields. For example, if you have some intermediate dict `meta` that is a source of metadata and it has a key `summary` that you want to extract and put into resulting info dict as `description`, you should be ready that this key may be missing from the `meta` dict, i.e. you should extract it as `meta.get('summary')` and not `meta['summary']`. Similarly, you should pass `fatal=False` when extracting data from a webpage with `_search_regex/_html_search_regex`.
9. Check the code with [flake8](https://pypi.python.org/pypi/flake8).
10. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:
$ git add youtube_dl/extractor/__init__.py
$ git add youtube_dl/extractor/extractors.py
$ git add youtube_dl/extractor/yourextractor.py
$ git commit -m '[yourextractor] Add new extractor'
$ git push origin yourextractor

View File

@@ -0,0 +1,19 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
class LazyLoadExtractor(object):
_module = None
@classmethod
def ie_key(cls):
return cls.__name__[:-2]
def __new__(cls, *args, **kwargs):
mod = __import__(cls._module, fromlist=(cls.__name__,))
real_cls = getattr(mod, cls.__name__)
instance = real_cls.__new__(real_cls)
instance.__init__(*args, **kwargs)
return instance

View File

@@ -0,0 +1,29 @@
#!/usr/bin/env python
from __future__ import unicode_literals
import io
import optparse
def main():
parser = optparse.OptionParser(usage='%prog INFILE OUTFILE')
options, args = parser.parse_args()
if len(args) != 2:
parser.error('Expected an input and an output filename')
infile, outfile = args
with io.open(infile, encoding='utf-8') as inf:
issue_template_tmpl = inf.read()
# Get the version from youtube_dl/version.py without importing the package
exec(compile(open('youtube_dl/version.py').read(),
'youtube_dl/version.py', 'exec'))
out = issue_template_tmpl % {'version': locals()['__version__']}
with io.open(outfile, 'w', encoding='utf-8') as outf:
outf.write(out)
if __name__ == '__main__':
main()

View File

@@ -0,0 +1,63 @@
from __future__ import unicode_literals, print_function
from inspect import getsource
import os
from os.path import dirname as dirn
import sys
print('WARNING: Lazy loading extractors is an experimental feature that may not always work', file=sys.stderr)
sys.path.insert(0, dirn(dirn((os.path.abspath(__file__)))))
lazy_extractors_filename = sys.argv[1]
if os.path.exists(lazy_extractors_filename):
os.remove(lazy_extractors_filename)
from youtube_dl.extractor import _ALL_CLASSES
from youtube_dl.extractor.common import InfoExtractor
with open('devscripts/lazy_load_template.py', 'rt') as f:
module_template = f.read()
module_contents = [module_template + '\n' + getsource(InfoExtractor.suitable)]
ie_template = '''
class {name}(LazyLoadExtractor):
_VALID_URL = {valid_url!r}
_module = '{module}'
'''
make_valid_template = '''
@classmethod
def _make_valid_url(cls):
return {valid_url!r}
'''
def build_lazy_ie(ie, name):
valid_url = getattr(ie, '_VALID_URL', None)
s = ie_template.format(
name=name,
valid_url=valid_url,
module=ie.__module__)
if ie.suitable.__func__ is not InfoExtractor.suitable.__func__:
s += '\n' + getsource(ie.suitable)
if hasattr(ie, '_make_valid_url'):
# search extractors
s += make_valid_template.format(valid_url=ie._make_valid_url())
return s
names = []
for ie in list(sorted(_ALL_CLASSES[:-1], key=lambda cls: cls.ie_key())) + _ALL_CLASSES[-1:]:
name = ie.ie_key() + 'IE'
src = build_lazy_ie(ie, name)
module_contents.append(src)
names.append(name)
module_contents.append(
'_ALL_CLASSES = [{0}]'.format(', '.join(names)))
module_src = '\n'.join(module_contents) + '\n'
with open(lazy_extractors_filename, 'wt') as f:
f.write(module_src)

View File

@@ -45,9 +45,9 @@ fi
/bin/echo -e "\n### Changing version in version.py..."
sed -i "s/__version__ = '.*'/__version__ = '$version'/" youtube_dl/version.py
/bin/echo -e "\n### Committing documentation and youtube_dl/version.py..."
make README.md CONTRIBUTING.md supportedsites
git add README.md CONTRIBUTING.md docs/supportedsites.md youtube_dl/version.py
/bin/echo -e "\n### Committing documentation, templates and youtube_dl/version.py..."
make README.md CONTRIBUTING.md .github/ISSUE_TEMPLATE.md supportedsites
git add README.md CONTRIBUTING.md .github/ISSUE_TEMPLATE.md docs/supportedsites.md youtube_dl/version.py
git commit -m "release $version"
/bin/echo -e "\n### Now tagging, signing and pushing..."

View File

@@ -50,6 +50,7 @@
- **arte.tv:ddc**
- **arte.tv:embed**
- **arte.tv:future**
- **arte.tv:info**
- **arte.tv:magazine**
- **AtresPlayer**
- **ATTTechChannel**
@@ -57,6 +58,7 @@
- **AudioBoom**
- **audiomack**
- **audiomack:album**
- **auroravid**: AuroraVid
- **Azubu**
- **AzubuLive**
- **BaiduVideo**: 百度视频
@@ -92,12 +94,14 @@
- **BYUtv**
- **Camdemy**
- **CamdemyFolder**
- **CamWithHer**
- **canalc2.tv**
- **Canalplus**: canalplus.fr, piwiplus.fr and d8.tv
- **Canvas**
- **CBC**
- **CBCPlayer**
- **CBS**
- **CBSInteractive**
- **CBSNews**: CBS News
- **CBSNewsLiveVideo**: CBS News Live Videos
- **CBSSports**
@@ -112,13 +116,14 @@
- **Cinemassacre**
- **Clipfish**
- **cliphunter**
- **ClipRs**
- **Clipsyndicate**
- **cloudtime**: CloudTime
- **Cloudy**
- **Clubic**
- **Clyp**
- **cmt.com**
- **CNET**
- **CNBC**
- **CNN**
- **CNNArticle**
- **CNNBlogs**
@@ -134,6 +139,7 @@
- **CrooksAndLiars**
- **Crunchyroll**
- **crunchyroll:playlist**
- **CSNNE**
- **CSpan**: C-SPAN
- **CtsNews**: 華視新聞
- **culturebox.francetvinfo.fr**
@@ -157,6 +163,7 @@
- **defense.gouv.fr**
- **democracynow**
- **DHM**: Filmarchiv - Deutsches Historisches Museum
- **DigitallySpeaking**
- **Digiteka**
- **Discovery**
- **Dotsub**
@@ -168,7 +175,6 @@
- **Dropbox**
- **DrTuber**
- **DRTV**
- **Dump**
- **Dumpert**
- **dvtv**: http://video.aktualne.cz/
- **dw**
@@ -282,7 +288,6 @@
- **ivi:compilation**: ivi.ru compilations
- **ivideon**: Ivideon TV
- **Izlesene**
- **JadoreCettePub**
- **JeuxVideo**
- **Jove**
- **jpopsuki.tv**
@@ -340,19 +345,22 @@
- **metacafe**
- **Metacritic**
- **Mgoon**
- **MGTV**: 芒果TV
- **Minhateca**
- **MinistryGrid**
- **Minoto**
- **miomio.tv**
- **MiTele**: mitele.es
- **mixcloud**
- **mixcloud:playlist**
- **mixcloud:stream**
- **mixcloud:user**
- **MLB**
- **Mnet**
- **MoeVideo**: LetitBit video services: moevideo.net, playreplay.net and videochart.net
- **Mofosex**
- **Mojvideo**
- **Moniker**: allmyvideos.net and vidspot.net
- **mooshare**: Mooshare.biz
- **Morningstar**: morningstar.com
- **Motherless**
- **Motorsport**: motorsport.com
@@ -376,7 +384,8 @@
- **myvideo** (Currently broken)
- **MyVidster**
- **n-tv.de**
- **NationalGeographic**
- **natgeo**
- **natgeo:channel**
- **Naver**
- **NBA**
- **NBC**
@@ -388,7 +397,6 @@
- **ndr:embed:base**
- **NDTV**
- **NerdCubedFeed**
- **Nerdist**
- **netease:album**: 网易云音乐 - 专辑
- **netease:djradio**: 网易云音乐 - 电台
- **netease:mv**: 网易云音乐 - MV
@@ -406,7 +414,8 @@
- **nfl.com**
- **nhl.com**
- **nhl.com:news**: NHL news
- **nhl.com:videocenter**: NHL videocenter category
- **nhl.com:videocenter**
- **nhl.com:videocenter:category**: NHL videocenter category
- **nick.com**
- **niconico**: ニコニコ動画
- **NiconicoPlaylist**
@@ -416,7 +425,6 @@
- **Normalboots**
- **NosVideo**
- **Nova**: TN.cz, Prásk.tv, Nova.cz, Novaplus.cz, FANDA.tv, Krásná.cz and Doma.cz
- **novamov**: NovaMov
- **nowness**
- **nowness:playlist**
- **nowness:series**
@@ -455,13 +463,13 @@
- **Patreon**
- **pbs**: Public Broadcasting Service (PBS) and member stations: PBS: Public Broadcasting Service, APT - Alabama Public Television (WBIQ), GPB/Georgia Public Broadcasting (WGTV), Mississippi Public Broadcasting (WMPN), Nashville Public Television (WNPT), WFSU-TV (WFSU), WSRE (WSRE), WTCI (WTCI), WPBA/Channel 30 (WPBA), Alaska Public Media (KAKM), Arizona PBS (KAET), KNME-TV/Channel 5 (KNME), Vegas PBS (KLVX), AETN/ARKANSAS ETV NETWORK (KETS), KET (WKLE), WKNO/Channel 10 (WKNO), LPB/LOUISIANA PUBLIC BROADCASTING (WLPB), OETA (KETA), Ozarks Public Television (KOZK), WSIU Public Broadcasting (WSIU), KEET TV (KEET), KIXE/Channel 9 (KIXE), KPBS San Diego (KPBS), KQED (KQED), KVIE Public Television (KVIE), PBS SoCal/KOCE (KOCE), ValleyPBS (KVPT), CONNECTICUT PUBLIC TELEVISION (WEDH), KNPB Channel 5 (KNPB), SOPTV (KSYS), Rocky Mountain PBS (KRMA), KENW-TV3 (KENW), KUED Channel 7 (KUED), Wyoming PBS (KCWC), Colorado Public Television / KBDI 12 (KBDI), KBYU-TV (KBYU), Thirteen/WNET New York (WNET), WGBH/Channel 2 (WGBH), WGBY (WGBY), NJTV Public Media NJ (WNJT), WLIW21 (WLIW), mpt/Maryland Public Television (WMPB), WETA Television and Radio (WETA), WHYY (WHYY), PBS 39 (WLVT), WVPT - Your Source for PBS and More! (WVPT), Howard University Television (WHUT), WEDU PBS (WEDU), WGCU Public Media (WGCU), WPBT2 (WPBT), WUCF TV (WUCF), WUFT/Channel 5 (WUFT), WXEL/Channel 42 (WXEL), WLRN/Channel 17 (WLRN), WUSF Public Broadcasting (WUSF), ETV (WRLK), UNC-TV (WUNC), PBS Hawaii - Oceanic Cable Channel 10 (KHET), Idaho Public Television (KAID), KSPS (KSPS), OPB (KOPB), KWSU/Channel 10 & KTNW/Channel 31 (KWSU), WILL-TV (WILL), Network Knowledge - WSEC/Springfield (WSEC), WTTW11 (WTTW), Iowa Public Television/IPTV (KDIN), Nine Network (KETC), PBS39 Fort Wayne (WFWA), WFYI Indianapolis (WFYI), Milwaukee Public Television (WMVS), WNIN (WNIN), WNIT Public Television (WNIT), WPT (WPNE), WVUT/Channel 22 (WVUT), WEIU/Channel 51 (WEIU), WQPT-TV (WQPT), WYCC PBS Chicago (WYCC), WIPB-TV (WIPB), WTIU (WTIU), CET (WCET), ThinkTVNetwork (WPTD), WBGU-TV (WBGU), WGVU TV (WGVU), NET1 (KUON), Pioneer Public Television (KWCM), SDPB Television (KUSD), TPT (KTCA), KSMQ (KSMQ), KPTS/Channel 8 (KPTS), KTWU/Channel 11 (KTWU), East Tennessee PBS (WSJK), WCTE-TV (WCTE), WLJT, Channel 11 (WLJT), WOSU TV (WOSU), WOUB/WOUC (WOUB), WVPB (WVPB), WKYU-PBS (WKYU), KERA 13 (KERA), MPBN (WCBB), Mountain Lake PBS (WCFE), NHPTV (WENH), Vermont PBS (WETK), witf (WITF), WQED Multimedia (WQED), WMHT Educational Telecommunications (WMHT), Q-TV (WDCQ), WTVS Detroit Public TV (WTVS), CMU Public Television (WCMU), WKAR-TV (WKAR), WNMU-TV Public TV 13 (WNMU), WDSE - WRPT (WDSE), WGTE TV (WGTE), Lakeland Public Television (KAWE), KMOS-TV - Channels 6.1, 6.2 and 6.3 (KMOS), MontanaPBS (KUSM), KRWG/Channel 22 (KRWG), KACV (KACV), KCOS/Channel 13 (KCOS), WCNY/Channel 24 (WCNY), WNED (WNED), WPBS (WPBS), WSKG Public TV (WSKG), WXXI (WXXI), WPSU (WPSU), WVIA Public Media Studios (WVIA), WTVI (WTVI), Western Reserve PBS (WNEO), WVIZ/PBS ideastream (WVIZ), KCTS 9 (KCTS), Basin PBS (KPBT), KUHT / Channel 8 (KUHT), KLRN (KLRN), KLRU (KLRU), WTJX Channel 12 (WTJX), WCVE PBS (WCVE), KBTC Public Television (KBTC)
- **pcmag**
- **People**
- **Periscope**: Periscope
- **PhilharmonieDeParis**: Philharmonie de Paris
- **phoenix.de**
- **Photobucket**
- **Pinkbike**
- **Pladform**
- **PlanetaPlay**
- **play.fm**
- **played.to**
- **PlaysTV**
@@ -480,6 +488,7 @@
- **Pornotube**
- **PornoVoisines**
- **PornoXO**
- **PressTV**
- **PrimeShareTV**
- **PromptFile**
- **prosiebensat1**: ProSiebenSat.1 Digital
@@ -490,7 +499,6 @@
- **qqmusic:playlist**: QQ音乐 - 歌单
- **qqmusic:singer**: QQ音乐 - 歌手
- **qqmusic:toplist**: QQ音乐 - 排行榜
- **QuickVid**
- **R7**
- **radio.de**
- **radiobremen**
@@ -604,6 +612,7 @@
- **Tagesschau**
- **Tapely**
- **Tass**
- **TDSLifeway**
- **teachertube**: teachertube.com videos
- **teachertube:user:collection**: teachertube.com user and collection videos
- **TeachingChannel**
@@ -618,10 +627,8 @@
- **Telegraaf**
- **TeleMB**
- **TeleTask**
- **TenPlay**
- **TF1**
- **TheIntercept**
- **TheOnion**
- **ThePlatform**
- **ThePlatformFeed**
- **TheScene**
@@ -680,7 +687,6 @@
- **twitter**
- **twitter:amplify**
- **twitter:card**
- **Ubu**
- **udemy**
- **udemy:course**
- **UDNEmbed**: 聯合影音
@@ -740,6 +746,7 @@
- **vlive**
- **Vodlocker**
- **VoiceRepublic**
- **VoxMedia**
- **Vporn**
- **vpro**: npo.nl and ntr.nl
- **VRT**
@@ -749,7 +756,6 @@
- **Walla**
- **WashingtonPost**
- **wat.tv**
- **WayOfTheMaster**
- **WDR**
- **wdr:mobile**
- **WDRMaus**: Sendung mit der Maus

View File

@@ -2,5 +2,5 @@
universal = True
[flake8]
exclude = youtube_dl/extractor/__init__.py,devscripts/buildserver.py,setup.py,build,.git
exclude = youtube_dl/extractor/__init__.py,devscripts/buildserver.py,devscripts/lazy_load_template.py,devscripts/make_issue_template.py,setup.py,build,.git
ignore = E402,E501,E731

View File

@@ -8,11 +8,12 @@ import warnings
import sys
try:
from setuptools import setup
from setuptools import setup, Command
setuptools_available = True
except ImportError:
from distutils.core import setup
from distutils.core import setup, Command
setuptools_available = False
from distutils.spawn import spawn
try:
# This will create an exe that needs Microsoft Visual C++ 2008
@@ -70,6 +71,22 @@ else:
else:
params['scripts'] = ['bin/youtube-dl']
class build_lazy_extractors(Command):
description = "Build the extractor lazy loading module"
user_options = []
def initialize_options(self):
pass
def finalize_options(self):
pass
def run(self):
spawn(
[sys.executable, 'devscripts/make_lazy_extractors.py', 'youtube_dl/extractor/lazy_extractors.py'],
dry_run=self.dry_run,
)
# Get the version from youtube_dl/version.py without importing the package
exec(compile(open('youtube_dl/version.py').read(),
'youtube_dl/version.py', 'exec'))
@@ -107,5 +124,6 @@ setup(
"Programming Language :: Python :: 3.4",
],
cmdclass={'build_lazy_extractors': build_lazy_extractors},
**params
)

View File

@@ -143,6 +143,9 @@ def expect_value(self, got, expected, field):
expect_value(self, item_got, item_expected, field)
else:
if isinstance(expected, compat_str) and expected.startswith('md5:'):
self.assertTrue(
isinstance(got, compat_str),
'Expected field %s to be a unicode object, but got value %r of type %r' % (field, got, type(got)))
got = 'md5:' + md5(got)
elif isinstance(expected, compat_str) and expected.startswith('mincount:'):
self.assertTrue(

View File

@@ -11,6 +11,7 @@ sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import FakeYDL
from youtube_dl.extractor.common import InfoExtractor
from youtube_dl.extractor import YoutubeIE, get_info_extractor
from youtube_dl.utils import encode_data_uri, strip_jsonp, ExtractorError
class TestIE(InfoExtractor):
@@ -66,5 +67,14 @@ class TestInfoExtractor(unittest.TestCase):
self.assertEqual(ie._html_search_meta('e', html), '5')
self.assertEqual(ie._html_search_meta('f', html), '6')
def test_download_json(self):
uri = encode_data_uri(b'{"foo": "blah"}', 'application/json')
self.assertEqual(self.ie._download_json(uri, None), {'foo': 'blah'})
uri = encode_data_uri(b'callback({"foo": "blah"})', 'application/javascript')
self.assertEqual(self.ie._download_json(uri, None, transform_source=strip_jsonp), {'foo': 'blah'})
uri = encode_data_uri(b'{"foo": invalid}', 'application/json')
self.assertRaises(ExtractorError, self.ie._download_json, uri, None)
self.assertEqual(self.ie._download_json(uri, None, fatal=False), None)
if __name__ == '__main__':
unittest.main()

View File

@@ -76,6 +76,10 @@ class TestCompat(unittest.TestCase):
self.assertEqual(compat_urllib_parse_urlencode({'abc': b'def'}), 'abc=def')
self.assertEqual(compat_urllib_parse_urlencode({b'abc': 'def'}), 'abc=def')
self.assertEqual(compat_urllib_parse_urlencode({b'abc': b'def'}), 'abc=def')
self.assertEqual(compat_urllib_parse_urlencode([('abc', 'def')]), 'abc=def')
self.assertEqual(compat_urllib_parse_urlencode([('abc', b'def')]), 'abc=def')
self.assertEqual(compat_urllib_parse_urlencode([(b'abc', 'def')]), 'abc=def')
self.assertEqual(compat_urllib_parse_urlencode([(b'abc', b'def')]), 'abc=def')
def test_compat_shlex_split(self):
self.assertEqual(compat_shlex_split('-option "one two"'), ['-option', 'one two'])

View File

@@ -20,6 +20,7 @@ from youtube_dl.utils import (
args_to_str,
encode_base_n,
clean_html,
date_from_str,
DateRange,
detect_exe_version,
determine_ext,
@@ -234,6 +235,13 @@ class TestUtil(unittest.TestCase):
self.assertEqual(unescapeHTML('&eacute;'), 'é')
self.assertEqual(unescapeHTML('&#2013266066;'), '&#2013266066;')
def test_date_from_str(self):
self.assertEqual(date_from_str('yesterday'), date_from_str('now-1day'))
self.assertEqual(date_from_str('now+7day'), date_from_str('now+1week'))
self.assertEqual(date_from_str('now+14day'), date_from_str('now+2week'))
self.assertEqual(date_from_str('now+365day'), date_from_str('now+1year'))
self.assertEqual(date_from_str('now+30day'), date_from_str('now+1month'))
def test_daterange(self):
_20century = DateRange("19000101", "20000101")
self.assertFalse("17890714" in _20century)
@@ -405,6 +413,7 @@ class TestUtil(unittest.TestCase):
self.assertEqual(parse_duration('01:02:03:04'), 93784)
self.assertEqual(parse_duration('1 hour 3 minutes'), 3780)
self.assertEqual(parse_duration('87 Min.'), 5220)
self.assertEqual(parse_duration('PT1H0.040S'), 3600.04)
def test_fix_xml_ampersands(self):
self.assertEqual(

View File

@@ -44,7 +44,7 @@ class TestYoutubeLists(unittest.TestCase):
ie = YoutubePlaylistIE(dl)
result = ie.extract('https://www.youtube.com/watch?v=W01L70IGBgE&index=2&list=RDOQpdSVF_k_w')
entries = result['entries']
self.assertTrue(len(entries) >= 20)
self.assertTrue(len(entries) >= 50)
original_video = entries[0]
self.assertEqual(original_video['id'], 'OQpdSVF_k_w')

View File

@@ -39,6 +39,8 @@ from .compat import (
compat_urllib_request_DataHandler,
)
from .utils import (
age_restricted,
args_to_str,
ContentTooShortError,
date_from_str,
DateRange,
@@ -58,13 +60,16 @@ from .utils import (
PagedList,
parse_filesize,
PerRequestProxyHandler,
PostProcessingError,
platform_name,
PostProcessingError,
preferredencoding,
prepend_extension,
render_table,
replace_extension,
SameFileError,
sanitize_filename,
sanitize_path,
sanitize_url,
sanitized_Request,
std_headers,
subtitles_filename,
@@ -75,13 +80,9 @@ from .utils import (
write_string,
YoutubeDLCookieProcessor,
YoutubeDLHandler,
prepend_extension,
replace_extension,
args_to_str,
age_restricted,
)
from .cache import Cache
from .extractor import get_info_extractor, gen_extractors
from .extractor import get_info_extractor, gen_extractor_classes, _LAZY_LOADER
from .downloader import get_suitable_downloader
from .downloader.rtmp import rtmpdump_version
from .postprocessor import (
@@ -259,7 +260,9 @@ class YoutubeDL(object):
The following options determine which downloader is picked:
external_downloader: Executable of the external downloader to call.
None or unset for standard (built-in) downloader.
hls_prefer_native: Use the native HLS downloader instead of ffmpeg/avconv.
hls_prefer_native: Use the native HLS downloader instead of ffmpeg/avconv
if True, otherwise use ffmpeg/avconv if False, otherwise
use downloader suggested by extractor if None.
The following parameters are not used by YoutubeDL itself, they are used by
the downloader (see youtube_dl/downloader/common.py):
@@ -377,8 +380,9 @@ class YoutubeDL(object):
def add_info_extractor(self, ie):
"""Add an InfoExtractor object to the end of the list."""
self._ies.append(ie)
self._ies_instances[ie.ie_key()] = ie
ie.set_downloader(self)
if not isinstance(ie, type):
self._ies_instances[ie.ie_key()] = ie
ie.set_downloader(self)
def get_info_extractor(self, ie_key):
"""
@@ -396,7 +400,7 @@ class YoutubeDL(object):
"""
Add the InfoExtractors returned by gen_extractors to the end of the list
"""
for ie in gen_extractors():
for ie in gen_extractor_classes():
self.add_info_extractor(ie)
def add_post_processor(self, pp):
@@ -660,6 +664,7 @@ class YoutubeDL(object):
if not ie.suitable(url):
continue
ie = self.get_info_extractor(ie.ie_key())
if not ie.working():
self.report_warning('The program functionality for this site has been marked as broken, '
'and will probably not work.')
@@ -1229,6 +1234,7 @@ class YoutubeDL(object):
t.get('preference'), t.get('width'), t.get('height'),
t.get('id'), t.get('url')))
for i, t in enumerate(thumbnails):
t['url'] = sanitize_url(t['url'])
if t.get('width') and t.get('height'):
t['resolution'] = '%dx%d' % (t['width'], t['height'])
if t.get('id') is None:
@@ -1238,7 +1244,10 @@ class YoutubeDL(object):
self.list_thumbnails(info_dict)
return
if thumbnails and 'thumbnail' not in info_dict:
thumbnail = info_dict.get('thumbnail')
if thumbnail:
info_dict['thumbnail'] = sanitize_url(thumbnail)
elif thumbnails:
info_dict['thumbnail'] = thumbnails[-1]['url']
if 'display_id' not in info_dict and 'id' in info_dict:
@@ -1263,6 +1272,8 @@ class YoutubeDL(object):
if subtitles:
for _, subtitle in subtitles.items():
for subtitle_format in subtitle:
if subtitle_format.get('url'):
subtitle_format['url'] = sanitize_url(subtitle_format['url'])
if 'ext' not in subtitle_format:
subtitle_format['ext'] = determine_ext(subtitle_format['url']).lower()
@@ -1292,6 +1303,8 @@ class YoutubeDL(object):
if 'url' not in format:
raise ExtractorError('Missing "url" key in result (index %d)' % i)
format['url'] = sanitize_url(format['url'])
if format.get('format_id') is None:
format['format_id'] = compat_str(i)
else:
@@ -1948,6 +1961,8 @@ class YoutubeDL(object):
write_string(encoding_str, encoding=None)
self._write_string('[debug] youtube-dl version ' + __version__ + '\n')
if _LAZY_LOADER:
self._write_string('[debug] Lazy loading extractors enabled' + '\n')
try:
sp = subprocess.Popen(
['git', 'rev-parse', '--short', 'HEAD'],

View File

@@ -181,7 +181,8 @@ except ImportError: # Python 2
if isinstance(e, dict):
e = encode_dict(e)
elif isinstance(e, (list, tuple,)):
e = encode_list(e)
list_e = encode_list(e)
e = tuple(list_e) if isinstance(e, tuple) else list_e
elif isinstance(e, compat_str):
e = e.encode(encoding)
return e

View File

@@ -41,9 +41,12 @@ def get_suitable_downloader(info_dict, params={}):
if ed.can_download(info_dict):
return ed
if protocol == 'm3u8' and params.get('hls_prefer_native'):
if protocol == 'm3u8' and params.get('hls_prefer_native') is True:
return HlsFD
if protocol == 'm3u8_native' and params.get('hls_prefer_native') is False:
return FFmpegFD
return PROTOCOL_MAP.get(protocol, HttpFD)

View File

@@ -225,7 +225,7 @@ class FFmpegFD(ExternalFD):
args += ['-i', url, '-c', 'copy']
if protocol == 'm3u8':
if self.params.get('hls_use_mpegts', False):
if self.params.get('hls_use_mpegts', False) or tmpfilename == '-':
args += ['-f', 'mpegts']
else:
args += ['-f', 'mp4', '-bsf:a', 'aac_adtstoasc']

View File

@@ -223,6 +223,12 @@ def write_metadata_tag(stream, metadata):
write_unsigned_int(stream, FLV_TAG_HEADER_LEN + len(metadata))
def remove_encrypted_media(media):
return list(filter(lambda e: 'drmAdditionalHeaderId' not in e.attrib and
'drmAdditionalHeaderSetId' not in e.attrib,
media))
def _add_ns(prop):
return '{http://ns.adobe.com/f4m/1.0}%s' % prop
@@ -244,9 +250,7 @@ class F4mFD(FragmentFD):
# without drmAdditionalHeaderId or drmAdditionalHeaderSetId attribute
if 'id' not in e.attrib:
self.report_error('Missing ID in f4m DRM')
media = list(filter(lambda e: 'drmAdditionalHeaderId' not in e.attrib and
'drmAdditionalHeaderSetId' not in e.attrib,
media))
media = remove_encrypted_media(media)
if not media:
self.report_error('Unsupported DRM')
return media

View File

@@ -27,6 +27,8 @@ class RtspFD(FileDownloader):
self.report_error('MMS or RTSP download detected but neither "mplayer" nor "mpv" could be run. Please install any.')
return False
self._debug_cmd(args)
retval = subprocess.call(args)
if retval == 0:
fsize = os.path.getsize(encodeFilename(tmpfilename))

File diff suppressed because it is too large Load Diff

View File

@@ -44,6 +44,7 @@ class Abc7NewsIE(InfoExtractor):
'contentURL', webpage, 'm3u8 url', fatal=True)
formats = self._extract_m3u8_formats(m3u8, display_id, 'mp4')
self._sort_formats(formats)
title = self._og_search_title(webpage).strip()
description = self._og_search_description(webpage).strip()

View File

@@ -2,10 +2,14 @@
from __future__ import unicode_literals
import re
import functools
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import int_or_none
from ..utils import (
int_or_none,
OnDemandPagedList,
)
class ACastIE(InfoExtractor):
@@ -26,13 +30,8 @@ class ACastIE(InfoExtractor):
def _real_extract(self, url):
channel, display_id = re.match(self._VALID_URL, url).groups()
embed_page = self._download_webpage(
re.sub('(?:www\.)?acast\.com', 'embedcdn.acast.com', url), display_id)
cast_data = self._parse_json(self._search_regex(
r'window\[\'acast/queries\'\]\s*=\s*([^;]+);', embed_page, 'acast data'),
display_id)['GetAcast/%s/%s' % (channel, display_id)]
cast_data = self._download_json(
'https://embed.acast.com/api/acasts/%s/%s' % (channel, display_id), display_id)
return {
'id': compat_str(cast_data['id']),
'display_id': display_id,
@@ -58,15 +57,26 @@ class ACastChannelIE(InfoExtractor):
'playlist_mincount': 20,
}
_API_BASE_URL = 'https://www.acast.com/api/'
_PAGE_SIZE = 10
@classmethod
def suitable(cls, url):
return False if ACastIE.suitable(url) else super(ACastChannelIE, cls).suitable(url)
def _real_extract(self, url):
display_id = self._match_id(url)
channel_data = self._download_json(self._API_BASE_URL + 'channels/%s' % display_id, display_id)
casts = self._download_json(self._API_BASE_URL + 'channels/%s/acasts' % display_id, display_id)
entries = [self.url_result('https://www.acast.com/%s/%s' % (display_id, cast['url']), 'ACast') for cast in casts]
def _fetch_page(self, channel_slug, page):
casts = self._download_json(
self._API_BASE_URL + 'channels/%s/acasts?page=%s' % (channel_slug, page),
channel_slug, note='Download page %d of channel data' % page)
for cast in casts:
yield self.url_result(
'https://www.acast.com/%s/%s' % (channel_slug, cast['url']),
'ACast', cast['id'])
return self.playlist_result(entries, compat_str(channel_data['id']), channel_data['name'], channel_data.get('description'))
def _real_extract(self, url):
channel_slug = self._match_id(url)
channel_data = self._download_json(
self._API_BASE_URL + 'channels/%s' % channel_slug, channel_slug)
entries = OnDemandPagedList(functools.partial(
self._fetch_page, channel_slug), self._PAGE_SIZE)
return self.playlist_result(entries, compat_str(
channel_data['id']), channel_data['name'], channel_data.get('description'))

View File

@@ -1,13 +1,19 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import smuggle_url
from ..utils import (
smuggle_url,
update_url_query,
unescapeHTML,
)
class AENetworksIE(InfoExtractor):
IE_NAME = 'aenetworks'
IE_DESC = 'A+E Networks: A&E, Lifetime, History.com, FYI Network'
_VALID_URL = r'https?://(?:www\.)?(?:(?:history|aetv|mylifetime)\.com|fyi\.tv)/(?:[^/]+/)+(?P<id>[^/]+?)(?:$|[?#])'
_VALID_URL = r'https?://(?:www\.)?(?:(?:history|aetv|mylifetime)\.com|fyi\.tv)/(?P<type>[^/]+)/(?:[^/]+/)+(?P<id>[^/]+?)(?:$|[?#])'
_TESTS = [{
'url': 'http://www.history.com/topics/valentines-day/history-of-valentines-day/videos/bet-you-didnt-know-valentines-day?m=528e394da93ae&s=undefined&f=1&free=false',
@@ -16,6 +22,9 @@ class AENetworksIE(InfoExtractor):
'ext': 'mp4',
'title': "Bet You Didn't Know: Valentine's Day",
'description': 'md5:7b57ea4829b391995b405fa60bd7b5f7',
'timestamp': 1375819729,
'upload_date': '20130806',
'uploader': 'AENE-NEW',
},
'params': {
# m3u8 download
@@ -25,15 +34,15 @@ class AENetworksIE(InfoExtractor):
'expected_warnings': ['JSON-LD'],
}, {
'url': 'http://www.history.com/shows/mountain-men/season-1/episode-1',
'md5': '8ff93eb073449f151d6b90c0ae1ef0c7',
'info_dict': {
'id': 'eg47EERs_JsZ',
'ext': 'mp4',
'title': 'Winter Is Coming',
'description': 'md5:641f424b7a19d8e24f26dea22cf59d74',
},
'params': {
# m3u8 download
'skip_download': True,
'timestamp': 1338306241,
'upload_date': '20120529',
'uploader': 'AENE-NEW',
},
'add_ie': ['ThePlatform'],
}, {
@@ -48,7 +57,7 @@ class AENetworksIE(InfoExtractor):
}]
def _real_extract(self, url):
video_id = self._match_id(url)
page_type, video_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, video_id)
@@ -56,11 +65,23 @@ class AENetworksIE(InfoExtractor):
r'data-href="[^"]*/%s"[^>]+data-release-url="([^"]+)"' % video_id,
r"media_url\s*=\s*'([^']+)'"
]
video_url = self._search_regex(video_url_re, webpage, 'video url')
video_url = unescapeHTML(self._search_regex(video_url_re, webpage, 'video url'))
query = {'mbr': 'true'}
if page_type == 'shows':
query['assetTypes'] = 'medium_video_s3'
if 'switch=hds' in video_url:
query['switch'] = 'hls'
info = self._search_json_ld(webpage, video_id, fatal=False)
info.update({
'_type': 'url_transparent',
'url': smuggle_url(video_url, {'sig': {'key': 'crazyjava', 'secret': 's3cr3t'}}),
'url': smuggle_url(
update_url_query(video_url, query),
{
'sig': {
'key': 'crazyjava',
'secret': 's3cr3t'},
'force_smil_url': True
}),
})
return info

View File

@@ -69,12 +69,14 @@ class AMPIE(InfoExtractor):
self._sort_formats(formats)
timestamp = parse_iso8601(item.get('pubDate'), ' ') or parse_iso8601(item.get('dc-date'))
return {
'id': video_id,
'title': get_media_node('title'),
'description': get_media_node('description'),
'thumbnails': thumbnails,
'timestamp': parse_iso8601(item.get('pubDate'), ' '),
'timestamp': timestamp,
'duration': int_or_none(media_content[0].get('@attributes', {}).get('duration')),
'subtitles': subtitles,
'formats': formats,

View File

@@ -1,26 +1,107 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
int_or_none,
)
class AolIE(InfoExtractor):
IE_NAME = 'on.aol.com'
_VALID_URL = r'(?:aol-video:|https?://on\.aol\.com/video/.*-)(?P<id>[0-9]+)(?:$|\?)'
_VALID_URL = r'(?:aol-video:|https?://on\.aol\.com/.*-)(?P<id>[^/?-]+)'
_TESTS = [{
# video with 5min ID
'url': 'http://on.aol.com/video/u-s--official-warns-of-largest-ever-irs-phone-scam-518167793?icid=OnHomepageC2Wide_MustSee_Img',
'md5': '18ef68f48740e86ae94b98da815eec42',
'info_dict': {
'id': '518167793',
'ext': 'mp4',
'title': 'U.S. Official Warns Of \'Largest Ever\' IRS Phone Scam',
'description': 'A major phone scam has cost thousands of taxpayers more than $1 million, with less than a month until income tax returns are due to the IRS.',
'timestamp': 1395405060,
'upload_date': '20140321',
'uploader': 'Newsy Studio',
},
'add_ie': ['FiveMin'],
'params': {
# m3u8 download
'skip_download': True,
}
}, {
# video with vidible ID
'url': 'http://on.aol.com/video/netflix-is-raising-rates-5707d6b8e4b090497b04f706?context=PC:homepage:PL1944:1460189336183',
'info_dict': {
'id': '5707d6b8e4b090497b04f706',
'ext': 'mp4',
'title': 'Netflix is Raising Rates',
'description': 'Netflix is rewarding millions of its long-standing members with an increase in cost. Veuers Carly Figueroa has more.',
'upload_date': '20160408',
'timestamp': 1460123280,
'uploader': 'Veuer',
},
'params': {
# m3u8 download
'skip_download': True,
}
}, {
'url': 'http://on.aol.com/partners/abc-551438d309eab105804dbfe8/sneak-peek-was-haley-really-framed-570eaebee4b0448640a5c944',
'only_matching': True,
}, {
'url': 'http://on.aol.com/shows/park-bench-shw518173474-559a1b9be4b0c3bfad3357a7?context=SH:SHW518173474:PL4327:1460619712763',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
return self.url_result('5min:%s' % video_id)
response = self._download_json(
'https://feedapi.b2c.on.aol.com/v1.0/app/videos/aolon/%s/details' % video_id,
video_id)['response']
if response['statusText'] != 'Ok':
raise ExtractorError('%s said: %s' % (self.IE_NAME, response['statusText']), expected=True)
video_data = response['data']
formats = []
m3u8_url = video_data.get('videoMasterPlaylist')
if m3u8_url:
formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
for rendition in video_data.get('renditions', []):
video_url = rendition.get('url')
if not video_url:
continue
ext = rendition.get('format')
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
video_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
else:
f = {
'url': video_url,
'format_id': rendition.get('quality'),
}
mobj = re.search(r'(\d+)x(\d+)', video_url)
if mobj:
f.update({
'width': int(mobj.group(1)),
'height': int(mobj.group(2)),
})
formats.append(f)
self._sort_formats(formats, ('width', 'height', 'tbr', 'format_id'))
return {
'id': video_id,
'title': video_data['title'],
'duration': int_or_none(video_data.get('duration')),
'timestamp': int_or_none(video_data.get('publishDate')),
'view_count': int_or_none(video_data.get('views')),
'description': video_data.get('description'),
'uploader': video_data.get('videoOwner'),
'formats': formats,
}
class AolFeaturesIE(InfoExtractor):

View File

@@ -83,7 +83,7 @@ class ARDMediathekIE(InfoExtractor):
subtitle_url = media_info.get('_subtitleUrl')
if subtitle_url:
subtitles['de'] = [{
'ext': 'srt',
'ext': 'ttml',
'url': subtitle_url,
}]

View File

@@ -210,7 +210,7 @@ class ArteTVPlus7IE(InfoExtractor):
# It also uses the arte_vp_url url from the webpage to extract the information
class ArteTVCreativeIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:creative'
_VALID_URL = r'https?://creative\.arte\.tv/(?P<lang>fr|de|en|es)/(?:magazine?/)?(?P<id>[^/?#&]+)'
_VALID_URL = r'https?://creative\.arte\.tv/(?P<lang>fr|de|en|es)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://creative.arte.tv/de/magazin/agentur-amateur-corporate-design',
@@ -229,9 +229,27 @@ class ArteTVCreativeIE(ArteTVPlus7IE):
'description': 'Événement ! Quarante-cinq ans après leurs premiers succès, les légendaires Monty Python remontent sur scène.\n',
'upload_date': '20140805',
}
}, {
'url': 'http://creative.arte.tv/de/episode/agentur-amateur-4-der-erste-kunde',
'only_matching': True,
}]
class ArteTVInfoIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:info'
_VALID_URL = r'https?://info\.arte\.tv/(?P<lang>fr|de|en|es)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TEST = {
'url': 'http://info.arte.tv/fr/service-civique-un-cache-misere',
'info_dict': {
'id': '067528-000-A',
'ext': 'mp4',
'title': 'Service civique, un cache misère ?',
'upload_date': '20160403',
},
}
class ArteTVFutureIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:future'
_VALID_URL = r'https?://future\.arte\.tv/(?P<lang>fr|de|en|es)/(?P<id>[^/?#&]+)'
@@ -337,7 +355,7 @@ class ArteTVEmbedIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:embed'
_VALID_URL = r'''(?x)
http://www\.arte\.tv
/playerv2/embed\.php\?json_url=
/(?:playerv2/embed|arte_vp/index)\.php\?json_url=
(?P<json_url>
http://arte\.tv/papi/tvguide/videos/stream/player/
(?P<lang>[^/]+)/(?P<id>[^/]+)[^&]*

View File

@@ -30,14 +30,14 @@ class AudiomackIE(InfoExtractor):
# audiomack wrapper around soundcloud song
{
'add_ie': ['Soundcloud'],
'url': 'http://www.audiomack.com/song/xclusiveszone/take-kare',
'url': 'http://www.audiomack.com/song/hip-hop-daily/black-mamba-freestyle',
'info_dict': {
'id': '172419696',
'id': '258901379',
'ext': 'mp3',
'description': 'md5:1fc3272ed7a635cce5be1568c2822997',
'title': 'Young Thug ft Lil Wayne - Take Kare',
'uploader': 'Young Thug World',
'upload_date': '20141016',
'description': 'mamba day freestyle for the legend Kobe Bryant ',
'title': 'Black Mamba Freestyle [Prod. By Danny Wolf]',
'uploader': 'ILOVEMAKONNEN',
'upload_date': '20160414',
}
},
]

View File

@@ -120,6 +120,7 @@ class AzubuLiveIE(InfoExtractor):
bc_info = self._download_json(req, user)
m3u8_url = next(source['src'] for source in bc_info['sources'] if source['container'] == 'M2TS')
formats = self._extract_m3u8_formats(m3u8_url, user, ext='mp4')
self._sort_formats(formats)
return {
'id': info['id'],

View File

@@ -328,6 +328,7 @@ class BBCCoUkIE(InfoExtractor):
'format_id': '%s_%s' % (service, format['format_id']),
'abr': abr,
'acodec': acodec,
'vcodec': 'none',
})
formats.extend(conn_formats)
return formats
@@ -670,6 +671,7 @@ class BBCIE(BBCCoUkIE):
'info_dict': {
'id': '34475836',
'title': 'Jurgen Klopp: Furious football from a witty and winning coach',
'description': 'Fast-paced football, wit, wisdom and a ready smile - why Liverpool fans should come to love new boss Jurgen Klopp.',
},
'playlist_count': 3,
}, {
@@ -688,6 +690,10 @@ class BBCIE(BBCCoUkIE):
# custom redirection to www.bbc.com
'url': 'http://www.bbc.co.uk/news/science-environment-33661876',
'only_matching': True,
}, {
# single video article embedded with data-media-vpid
'url': 'http://www.bbc.co.uk/sport/rowing/35908187',
'only_matching': True,
}]
@classmethod
@@ -817,7 +823,7 @@ class BBCIE(BBCCoUkIE):
# single video story (e.g. http://www.bbc.com/travel/story/20150625-sri-lankas-spicy-secret)
programme_id = self._search_regex(
[r'data-video-player-vpid="(%s)"' % self._ID_REGEX,
[r'data-(?:video-player|media)-vpid="(%s)"' % self._ID_REGEX,
r'<param[^>]+name="externalIdentifier"[^>]+value="(%s)"' % self._ID_REGEX,
r'videoId\s*:\s*["\'](%s)["\']' % self._ID_REGEX],
webpage, 'vpid', default=None)

View File

@@ -33,8 +33,33 @@ class BeegIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
cpl_url = self._search_regex(
r'<script[^>]+src=(["\'])(?P<url>(?:https?:)?//static\.beeg\.com/cpl/\d+\.js.*?)\1',
webpage, 'cpl', default=None, group='url')
beeg_version, beeg_salt = [None] * 2
if cpl_url:
cpl = self._download_webpage(
self._proto_relative_url(cpl_url), video_id,
'Downloading cpl JS', fatal=False)
if cpl:
beeg_version = self._search_regex(
r'beeg_version\s*=\s*(\d+)', cpl,
'beeg version', default=None) or self._search_regex(
r'/(\d+)\.js', cpl_url, 'beeg version', default=None)
beeg_salt = self._search_regex(
r'beeg_salt\s*=\s*(["\'])(?P<beeg_salt>.+?)\1', cpl, 'beeg beeg_salt',
default=None, group='beeg_salt')
beeg_version = beeg_version or '1750'
beeg_salt = beeg_salt or 'MIDtGaw96f0N1kMMAM1DE46EC9pmFr'
video = self._download_json(
'https://api.beeg.com/api/v5/video/%s' % video_id, video_id)
'http://api.beeg.com/api/v6/%s/video/%s' % (beeg_version, video_id),
video_id)
def split(o, e):
def cut(s, x):
@@ -50,8 +75,8 @@ class BeegIE(InfoExtractor):
return n
def decrypt_key(key):
# Reverse engineered from http://static.beeg.com/cpl/1105.js
a = '5ShMcIQlssOd7zChAIOlmeTZDaUxULbJRnywYaiB'
# Reverse engineered from http://static.beeg.com/cpl/1738.js
a = beeg_salt
e = compat_urllib_parse_unquote(key)
o = ''.join([
compat_chr(compat_ord(e[n]) - compat_ord(a[n % len(a)]) % 21)
@@ -101,5 +126,5 @@ class BeegIE(InfoExtractor):
'duration': duration,
'tags': tags,
'formats': formats,
'age_limit': 18,
'age_limit': self._rta_search(webpage),
}

View File

@@ -94,6 +94,7 @@ class BetIE(InfoExtractor):
xpath_with_ns('./media:thumbnail', NS_MAP)).get('url')
formats = self._extract_smil_formats(smil_url, display_id)
self._sort_formats(formats)
return {
'id': video_id,

View File

@@ -15,6 +15,9 @@ class BravoTVIE(InfoExtractor):
'ext': 'mp4',
'title': 'Last Chance Kitchen Returns',
'description': 'S13: Last Chance Kitchen Returns for Top Chef Season 13',
'timestamp': 1448926740,
'upload_date': '20151130',
'uploader': 'NBCU-BRAV',
}
}

View File

@@ -46,6 +46,9 @@ class BrightcoveLegacyIE(InfoExtractor):
'title': 'Xavier Sala i Martín: “Un banc que no presta és un banc zombi que no serveix per a res”',
'uploader': '8TV',
'description': 'md5:a950cc4285c43e44d763d036710cd9cd',
'timestamp': 1368213670,
'upload_date': '20130510',
'uploader_id': '1589608506001',
}
},
{
@@ -57,6 +60,9 @@ class BrightcoveLegacyIE(InfoExtractor):
'title': 'JVMLS 2012: Arrays 2.0 - Opportunities and Challenges',
'description': 'John Rose speaks at the JVM Language Summit, August 1, 2012.',
'uploader': 'Oracle',
'timestamp': 1344975024,
'upload_date': '20120814',
'uploader_id': '1460825906',
},
},
{
@@ -68,6 +74,9 @@ class BrightcoveLegacyIE(InfoExtractor):
'title': 'This Bracelet Acts as a Personal Thermostat',
'description': 'md5:547b78c64f4112766ccf4e151c20b6a0',
'uploader': 'Mashable',
'timestamp': 1382041798,
'upload_date': '20131017',
'uploader_id': '1130468786001',
},
},
{
@@ -85,14 +94,17 @@ class BrightcoveLegacyIE(InfoExtractor):
{
# test flv videos served by akamaihd.net
# From http://www.redbull.com/en/bike/stories/1331655643987/replay-uci-dh-world-cup-2014-from-fort-william
'url': 'http://c.brightcove.com/services/viewer/htmlFederated?%40videoPlayer=ref%3ABC2996102916001&linkBaseURL=http%3A%2F%2Fwww.redbull.com%2Fen%2Fbike%2Fvideos%2F1331655630249%2Freplay-uci-fort-william-2014-dh&playerKey=AQ%7E%7E%2CAAAApYJ7UqE%7E%2Cxqr_zXk0I-zzNndy8NlHogrCb5QdyZRf&playerID=1398061561001#__youtubedl_smuggle=%7B%22Referer%22%3A+%22http%3A%2F%2Fwww.redbull.com%2Fen%2Fbike%2Fstories%2F1331655643987%2Freplay-uci-dh-world-cup-2014-from-fort-william%22%7D',
'url': 'http://c.brightcove.com/services/viewer/htmlFederated?%40videoPlayer=ref%3Aevent-stream-356&linkBaseURL=http%3A%2F%2Fwww.redbull.com%2Fen%2Fbike%2Fvideos%2F1331655630249%2Freplay-uci-fort-william-2014-dh&playerKey=AQ%7E%7E%2CAAAApYJ7UqE%7E%2Cxqr_zXk0I-zzNndy8NlHogrCb5QdyZRf&playerID=1398061561001#__youtubedl_smuggle=%7B%22Referer%22%3A+%22http%3A%2F%2Fwww.redbull.com%2Fen%2Fbike%2Fstories%2F1331655643987%2Freplay-uci-dh-world-cup-2014-from-fort-william%22%7D',
# The md5 checksum changes on each download
'info_dict': {
'id': '2996102916001',
'id': '3750436379001',
'ext': 'flv',
'title': 'UCI MTB World Cup 2014: Fort William, UK - Downhill Finals',
'uploader': 'Red Bull TV',
'uploader': 'RBTV Old (do not use)',
'description': 'UCI MTB World Cup 2014: Fort William, UK - Downhill Finals',
'timestamp': 1409122195,
'upload_date': '20140827',
'uploader_id': '710858724001',
},
},
{
@@ -106,6 +118,12 @@ class BrightcoveLegacyIE(InfoExtractor):
'playlist_mincount': 7,
},
]
FLV_VCODECS = {
1: 'SORENSON',
2: 'ON2',
3: 'H264',
4: 'VP8',
}
@classmethod
def _build_brighcove_url(cls, object_str):
@@ -136,13 +154,16 @@ class BrightcoveLegacyIE(InfoExtractor):
else:
flashvars = {}
data_url = object_doc.attrib.get('data', '')
data_url_params = compat_parse_qs(compat_urllib_parse_urlparse(data_url).query)
def find_param(name):
if name in flashvars:
return flashvars[name]
node = find_xpath_attr(object_doc, './param', 'name', name)
if node is not None:
return node.attrib['value']
return None
return data_url_params.get(name)
params = {}
@@ -286,15 +307,19 @@ class BrightcoveLegacyIE(InfoExtractor):
playlist_title=playlist_info['mediaCollectionDTO']['displayName'])
def _extract_video_info(self, video_info):
publisher_id = video_info.get('publisherId')
info = {
'id': compat_str(video_info['id']),
'title': video_info['displayName'].strip(),
'description': video_info.get('shortDescription'),
'thumbnail': video_info.get('videoStillURL') or video_info.get('thumbnailURL'),
'uploader': video_info.get('publisherName'),
'uploader_id': compat_str(publisher_id) if publisher_id else None,
'duration': float_or_none(video_info.get('length'), 1000),
'timestamp': int_or_none(video_info.get('creationDate'), 1000),
}
renditions = video_info.get('renditions')
renditions = video_info.get('renditions', []) + video_info.get('IOSRenditions', [])
if renditions:
formats = []
for rend in renditions:
@@ -315,19 +340,42 @@ class BrightcoveLegacyIE(InfoExtractor):
ext = 'flv'
if ext is None:
ext = determine_ext(url)
size = rend.get('size')
formats.append({
tbr = int_or_none(rend.get('encodingRate'), 1000)
a_format = {
'format_id': 'http%s' % ('-%s' % tbr if tbr else ''),
'url': url,
'ext': ext,
'height': rend.get('frameHeight'),
'width': rend.get('frameWidth'),
'filesize': size if size != 0 else None,
})
'filesize': int_or_none(rend.get('size')) or None,
'tbr': tbr,
}
if rend.get('audioOnly'):
a_format.update({
'vcodec': 'none',
})
else:
a_format.update({
'height': int_or_none(rend.get('frameHeight')),
'width': int_or_none(rend.get('frameWidth')),
'vcodec': rend.get('videoCodec'),
})
# m3u8 manifests with remote == false are media playlists
# Not calling _extract_m3u8_formats here to save network traffic
if ext == 'm3u8':
a_format.update({
'format_id': 'hls%s' % ('-%s' % tbr if tbr else ''),
'ext': 'mp4',
'protocol': 'm3u8',
})
formats.append(a_format)
self._sort_formats(formats)
info['formats'] = formats
elif video_info.get('FLVFullLengthURL') is not None:
info.update({
'url': video_info['FLVFullLengthURL'],
'vcodec': self.FLV_VCODECS.get(video_info.get('FLVFullCodec')),
'filesize': int_or_none(video_info.get('FLVFullSize')),
})
if self._downloader.params.get('include_ads', False):
@@ -383,6 +431,7 @@ class BrightcoveNewIE(InfoExtractor):
'formats': 'mincount:41',
},
'params': {
# m3u8 download
'skip_download': True,
}
}, {
@@ -426,7 +475,7 @@ class BrightcoveNewIE(InfoExtractor):
</video>.*?
<script[^>]+
src=["\'](?:https?:)?//players\.brightcove\.net/
(\d+)/([\da-f-]+)_([^/]+)/index(?:\.min)?\.js
(\d+)/([^/]+)_([^/]+)/index(?:\.min)?\.js
''', webpage):
entries.append(
'http://players.brightcove.net/%s/%s_%s/index.html?videoId=%s'
@@ -467,7 +516,7 @@ class BrightcoveNewIE(InfoExtractor):
raise ExtractorError(json_data[0]['message'], expected=True)
raise
title = json_data['name']
title = json_data['name'].strip()
formats = []
for source in json_data.get('sources', []):
@@ -520,7 +569,7 @@ class BrightcoveNewIE(InfoExtractor):
f.update({
'url': src or streaming_src,
'format_id': build_format_id('http' if src else 'http-streaming'),
'preference': 2 if src else 1,
'source_preference': 0 if src else -1,
})
else:
f.update({
@@ -531,20 +580,22 @@ class BrightcoveNewIE(InfoExtractor):
formats.append(f)
self._sort_formats(formats)
description = json_data.get('description')
thumbnail = json_data.get('thumbnail')
timestamp = parse_iso8601(json_data.get('published_at'))
duration = float_or_none(json_data.get('duration'), 1000)
tags = json_data.get('tags', [])
subtitles = {}
for text_track in json_data.get('text_tracks', []):
if text_track.get('src'):
subtitles.setdefault(text_track.get('srclang'), []).append({
'url': text_track['src'],
})
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'duration': duration,
'timestamp': timestamp,
'description': json_data.get('description'),
'thumbnail': json_data.get('thumbnail') or json_data.get('poster'),
'duration': float_or_none(json_data.get('duration'), 1000),
'timestamp': parse_iso8601(json_data.get('published_at')),
'uploader_id': account_id,
'formats': formats,
'tags': tags,
'subtitles': subtitles,
'tags': json_data.get('tags', []),
}

View File

@@ -0,0 +1,87 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
parse_duration,
unified_strdate,
)
class CamWithHerIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?camwithher\.tv/view_video\.php\?.*\bviewkey=(?P<id>\w+)'
_TESTS = [{
'url': 'http://camwithher.tv/view_video.php?viewkey=6e9a24e2c0e842e1f177&page=&viewtype=&category=',
'info_dict': {
'id': '5644',
'ext': 'flv',
'title': 'Periscope Tease',
'description': 'In the clouds teasing on periscope to my favorite song',
'duration': 240,
'view_count': int,
'comment_count': int,
'uploader': 'MileenaK',
'upload_date': '20160322',
},
'params': {
'skip_download': True,
}
}, {
'url': 'http://camwithher.tv/view_video.php?viewkey=6dfd8b7c97531a459937',
'only_matching': True,
}, {
'url': 'http://camwithher.tv/view_video.php?page=&viewkey=6e9a24e2c0e842e1f177&viewtype=&category=',
'only_matching': True,
}, {
'url': 'http://camwithher.tv/view_video.php?viewkey=b6c3b5bea9515d1a1fc4&page=&viewtype=&category=mv',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
flv_id = self._html_search_regex(
r'<a[^>]+href=["\']/download/\?v=(\d+)', webpage, 'video id')
# Video URL construction algorithm is reverse-engineered from cwhplayer.swf
rtmp_url = 'rtmp://camwithher.tv/clipshare/%s' % (
('mp4:%s.mp4' % flv_id) if int(flv_id) > 2010 else flv_id)
title = self._html_search_regex(
r'<div[^>]+style="float:left"[^>]*>\s*<h2>(.+?)</h2>', webpage, 'title')
description = self._html_search_regex(
r'>Description:</span>(.+?)</div>', webpage, 'description', default=None)
runtime = self._search_regex(
r'Runtime\s*:\s*(.+?) \|', webpage, 'duration', default=None)
if runtime:
runtime = re.sub(r'[\s-]', '', runtime)
duration = parse_duration(runtime)
view_count = int_or_none(self._search_regex(
r'Views\s*:\s*(\d+)', webpage, 'view count', default=None))
comment_count = int_or_none(self._search_regex(
r'Comments\s*:\s*(\d+)', webpage, 'comment count', default=None))
uploader = self._search_regex(
r'Added by\s*:\s*<a[^>]+>([^<]+)</a>', webpage, 'uploader', default=None)
upload_date = unified_strdate(self._search_regex(
r'Added on\s*:\s*([\d-]+)', webpage, 'upload date', default=None))
return {
'id': flv_id,
'url': rtmp_url,
'ext': 'flv',
'no_resume': True,
'title': title,
'description': description,
'duration': duration,
'view_count': view_count,
'comment_count': comment_count,
'uploader': uploader,
'upload_date': upload_date,
}

View File

@@ -33,6 +33,7 @@ class CBCIE(InfoExtractor):
'title': 'Robin Williams freestyles on 90 Minutes Live',
'description': 'Wacky American comedian Robin Williams shows off his infamous "freestyle" comedic talents while being interviewed on CBC\'s 90 Minutes Live.',
'upload_date': '19700101',
'uploader': 'CBCC-NEW',
},
'params': {
# rtmp download

View File

@@ -1,24 +1,40 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from .theplatform import ThePlatformIE
from ..utils import (
sanitized_Request,
smuggle_url,
xpath_text,
xpath_element,
int_or_none,
find_xpath_attr,
)
class CBSIE(InfoExtractor):
class CBSBaseIE(ThePlatformIE):
def _parse_smil_subtitles(self, smil, namespace=None, subtitles_lang='en'):
closed_caption_e = find_xpath_attr(smil, self._xpath_ns('.//param', namespace), 'name', 'ClosedCaptionURL')
return {
'en': [{
'ext': 'ttml',
'url': closed_caption_e.attrib['value'],
}]
} if closed_caption_e is not None and closed_caption_e.attrib.get('value') else []
class CBSIE(CBSBaseIE):
_VALID_URL = r'https?://(?:www\.)?(?:cbs\.com/shows/[^/]+/(?:video|artist)|colbertlateshow\.com/(?:video|podcasts))/[^/]+/(?P<id>[^/]+)'
_TESTS = [{
'url': 'http://www.cbs.com/shows/garth-brooks/video/_u7W953k6la293J7EPTd9oHkSPs6Xn6_/connect-chat-feat-garth-brooks/',
'info_dict': {
'id': '4JUVEwq3wUT7',
'id': '_u7W953k6la293J7EPTd9oHkSPs6Xn6_',
'display_id': 'connect-chat-feat-garth-brooks',
'ext': 'flv',
'ext': 'mp4',
'title': 'Connect Chat feat. Garth Brooks',
'description': 'Connect with country music singer Garth Brooks, as he chats with fans on Wednesday November 27, 2013. Be sure to tune in to Garth Brooks: Live from Las Vegas, Friday November 29, at 9/8c on CBS!',
'duration': 1495,
'timestamp': 1385585425,
'upload_date': '20131127',
'uploader': 'CBSI-NEW',
},
'params': {
# rtmp download
@@ -47,22 +63,46 @@ class CBSIE(InfoExtractor):
'url': 'http://www.colbertlateshow.com/podcasts/dYSwjqPs_X1tvbV_P2FcPWRa_qT6akTC/in-the-bad-room-with-stephen/',
'only_matching': True,
}]
TP_RELEASE_URL_TEMPLATE = 'http://link.theplatform.com/s/dJ5BDC/%s?mbr=true'
def _real_extract(self, url):
display_id = self._match_id(url)
request = sanitized_Request(url)
# Android UA is served with higher quality (720p) streams (see
# https://github.com/rg3/youtube-dl/issues/7490)
request.add_header('User-Agent', 'Mozilla/5.0 (Linux; Android 4.4; Nexus 5)')
webpage = self._download_webpage(request, display_id)
real_id = self._search_regex(
[r"video\.settings\.pid\s*=\s*'([^']+)';", r"cbsplayer\.pid\s*=\s*'([^']+)';"],
webpage, 'real video ID')
return {
'_type': 'url_transparent',
'ie_key': 'ThePlatform',
'url': smuggle_url(
'http://link.theplatform.com/s/dJ5BDC/%s?mbr=true&manifest=m3u' % real_id,
{'force_smil_url': True}),
webpage = self._download_webpage(url, display_id)
content_id = self._search_regex(
[r"video\.settings\.content_id\s*=\s*'([^']+)';", r"cbsplayer\.contentId\s*=\s*'([^']+)';"],
webpage, 'content id')
items_data = self._download_xml(
'http://can.cbs.com/thunder/player/videoPlayerService.php',
content_id, query={'partner': 'cbs', 'contentId': content_id})
video_data = xpath_element(items_data, './/item')
title = xpath_text(video_data, 'videoTitle', 'title', True)
subtitles = {}
formats = []
for item in items_data.findall('.//item'):
pid = xpath_text(item, 'pid')
if not pid:
continue
tp_release_url = self.TP_RELEASE_URL_TEMPLATE % pid
if '.m3u8' in xpath_text(item, 'contentUrl', default=''):
tp_release_url += '&manifest=m3u'
tp_formats, tp_subtitles = self._extract_theplatform_smil(
tp_release_url, content_id, 'Downloading %s SMIL data' % pid)
formats.extend(tp_formats)
subtitles = self._merge_subtitles(subtitles, tp_subtitles)
self._sort_formats(formats)
info = self.get_metadata('dJ5BDC/media/guid/2198311517/%s' % content_id, content_id)
info.update({
'id': content_id,
'display_id': display_id,
}
'title': title,
'series': xpath_text(video_data, 'seriesTitle'),
'season_number': int_or_none(xpath_text(video_data, 'seasonNumber')),
'episode_number': int_or_none(xpath_text(video_data, 'episodeNumber')),
'duration': int_or_none(xpath_text(video_data, 'videoLength'), 1000),
'thumbnail': xpath_text(video_data, 'previewImageURL'),
'formats': formats,
'subtitles': subtitles,
})
return info

View File

@@ -1,12 +1,14 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .theplatform import ThePlatformIE
from ..utils import int_or_none
class CNETIE(ThePlatformIE):
_VALID_URL = r'https?://(?:www\.)?cnet\.com/videos/(?P<id>[^/]+)/'
class CBSInteractiveIE(ThePlatformIE):
_VALID_URL = r'https?://(?:www\.)?(?P<site>cnet|zdnet)\.com/(?:videos|video/share)/(?P<id>[^/?]+)'
_TESTS = [{
'url': 'http://www.cnet.com/videos/hands-on-with-microsofts-windows-8-1-update/',
'info_dict': {
@@ -17,6 +19,8 @@ class CNETIE(ThePlatformIE):
'uploader_id': '6085384d-619e-11e3-b231-14feb5ca9861',
'uploader': 'Sarah Mitroff',
'duration': 70,
'timestamp': 1396479627,
'upload_date': '20140402',
},
}, {
'url': 'http://www.cnet.com/videos/whiny-pothole-tweets-at-local-government-when-hit-by-cars-tomorrow-daily-187/',
@@ -28,15 +32,38 @@ class CNETIE(ThePlatformIE):
'uploader_id': 'b163284d-6b73-44fc-b3e6-3da66c392d40',
'uploader': 'Ashley Esqueda',
'duration': 1482,
'timestamp': 1433289889,
'upload_date': '20150603',
},
}, {
'url': 'http://www.zdnet.com/video/share/video-keeping-android-smartphones-and-tablets-secure/',
'info_dict': {
'id': 'bc1af9f0-a2b5-4e54-880d-0d95525781c0',
'ext': 'mp4',
'title': 'Video: Keeping Android smartphones and tablets secure',
'description': 'Here\'s the best way to keep Android devices secure, and what you do when they\'ve come to the end of their lives.',
'uploader_id': 'f2d97ea2-8175-11e2-9d12-0018fe8a00b0',
'uploader': 'Adrian Kingsley-Hughes',
'timestamp': 1448961720,
'upload_date': '20151201',
},
'params': {
# m3u8 download
'skip_download': True,
}
}]
TP_RELEASE_URL_TEMPLATE = 'http://link.theplatform.com/s/kYEXFC/%s?mbr=true'
MPX_ACCOUNTS = {
'cnet': 2288573011,
'zdnet': 2387448114,
}
def _real_extract(self, url):
display_id = self._match_id(url)
site, display_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, display_id)
data_json = self._html_search_regex(
r"data-cnet-video(?:-uvp)?-options='([^']+)'",
r"data-(?:cnet|zdnet)-video(?:-uvp)?-options='([^']+)'",
webpage, 'data json')
data = self._parse_json(data_json, display_id)
vdata = data.get('video') or data['videos'][0]
@@ -51,16 +78,15 @@ class CNETIE(ThePlatformIE):
uploader = None
uploader_id = None
metadata = self.get_metadata('kYEXFC/%s' % list(vdata['files'].values())[0], video_id)
description = vdata.get('description') or metadata.get('description')
duration = int_or_none(vdata.get('duration')) or metadata.get('duration')
formats = []
subtitles = {}
media_guid_path = 'media/guid/%d/%s' % (self.MPX_ACCOUNTS[site], vdata['mpxRefId'])
formats, subtitles = [], {}
if site == 'cnet':
formats, subtitles = self._extract_theplatform_smil(
self.TP_RELEASE_URL_TEMPLATE % media_guid_path, video_id)
for (fkey, vid) in vdata['files'].items():
if fkey == 'hls_phone' and 'hls_tablet' in vdata['files']:
continue
release_url = 'http://link.theplatform.com/s/kYEXFC/%s?mbr=true' % vid
release_url = self.TP_RELEASE_URL_TEMPLATE % vid
if fkey == 'hds':
release_url += '&manifest=f4m'
tp_formats, tp_subtitles = self._extract_theplatform_smil(release_url, video_id, 'Downloading %s SMIL data' % fkey)
@@ -68,15 +94,15 @@ class CNETIE(ThePlatformIE):
subtitles = self._merge_subtitles(subtitles, tp_subtitles)
self._sort_formats(formats)
return {
info = self.get_metadata('kYEXFC/%s' % media_guid_path, video_id)
info.update({
'id': video_id,
'display_id': display_id,
'title': title,
'description': description,
'thumbnail': metadata.get('thumbnail'),
'duration': duration,
'duration': int_or_none(vdata.get('duration')),
'uploader': uploader,
'uploader_id': uploader_id,
'subtitles': subtitles,
'formats': formats,
}
})
return info

View File

@@ -2,14 +2,13 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from .theplatform import ThePlatformIE
from .cbs import CBSBaseIE
from ..utils import (
parse_duration,
find_xpath_attr,
)
class CBSNewsIE(ThePlatformIE):
class CBSNewsIE(CBSBaseIE):
IE_DESC = 'CBS News'
_VALID_URL = r'https?://(?:www\.)?cbsnews\.com/(?:news|videos)/(?P<id>[\da-z_-]+)'
@@ -49,15 +48,6 @@ class CBSNewsIE(ThePlatformIE):
},
]
def _parse_smil_subtitles(self, smil, namespace=None, subtitles_lang='en'):
closed_caption_e = find_xpath_attr(smil, self._xpath_ns('.//param', namespace), 'name', 'ClosedCaptionURL')
return {
'en': [{
'ext': 'ttml',
'url': closed_caption_e.attrib['value'],
}]
} if closed_caption_e is not None and closed_caption_e.attrib.get('value') else []
def _real_extract(self, url):
video_id = self._match_id(url)
@@ -122,6 +112,7 @@ class CBSNewsLiveVideoIE(InfoExtractor):
for entry in f4m_formats:
# URLs without the extra param induce an 404 error
entry.update({'extra_param_to_segment_url': hdcore_sign})
self._sort_formats(f4m_formats)
return {
'id': video_id,

View File

@@ -48,6 +48,7 @@ class ChaturbateIE(InfoExtractor):
raise ExtractorError('Unable to find stream URL')
formats = self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4')
self._sort_formats(formats)
return {
'id': video_id,

View File

@@ -0,0 +1,90 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
ExtractorError,
float_or_none,
int_or_none,
parse_iso8601,
)
class ClipRsIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?clip\.rs/(?P<id>[^/]+)/\d+'
_TEST = {
'url': 'http://www.clip.rs/premijera-frajle-predstavljaju-novi-spot-za-pesmu-moli-me-moli/3732',
'md5': 'c412d57815ba07b56f9edc7b5d6a14e5',
'info_dict': {
'id': '1488842.1399140381',
'ext': 'mp4',
'title': 'PREMIJERA Frajle predstavljaju novi spot za pesmu Moli me, moli',
'description': 'md5:56ce2c3b4ab31c5a2e0b17cb9a453026',
'duration': 229,
'timestamp': 1459850243,
'upload_date': '20160405',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_id = self._search_regex(
r'id=(["\'])mvp:(?P<id>.+?)\1', webpage, 'mvp id', group='id')
response = self._download_json(
'http://qi.ckm.onetapi.pl/', video_id,
query={
'body[id]': video_id,
'body[jsonrpc]': '2.0',
'body[method]': 'get_asset_detail',
'body[params][ID_Publikacji]': video_id,
'body[params][Service]': 'www.onet.pl',
'content-type': 'application/jsonp',
'x-onet-app': 'player.front.onetapi.pl',
})
error = response.get('error')
if error:
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, error['message']), expected=True)
video = response['result'].get('0')
formats = []
for _, formats_dict in video['formats'].items():
if not isinstance(formats_dict, dict):
continue
for format_id, format_list in formats_dict.items():
if not isinstance(format_list, list):
continue
for f in format_list:
if not f.get('url'):
continue
formats.append({
'url': f['url'],
'format_id': format_id,
'height': int_or_none(f.get('vertical_resolution')),
'width': int_or_none(f.get('horizontal_resolution')),
'abr': float_or_none(f.get('audio_bitrate')),
'vbr': float_or_none(f.get('video_bitrate')),
})
self._sort_formats(formats)
meta = video.get('meta', {})
title = self._og_search_title(webpage, default=None) or meta['title']
description = self._og_search_description(webpage, default=None) or meta.get('description')
duration = meta.get('length') or meta.get('lenght')
timestamp = parse_iso8601(meta.get('addDate'), ' ')
return {
'id': video_id,
'title': title,
'description': description,
'duration': duration,
'timestamp': timestamp,
'formats': formats,
}

View File

@@ -0,0 +1,36 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import smuggle_url
class CNBCIE(InfoExtractor):
_VALID_URL = r'https?://video\.cnbc\.com/gallery/\?video=(?P<id>[0-9]+)'
_TEST = {
'url': 'http://video.cnbc.com/gallery/?video=3000503714',
'info_dict': {
'id': '3000503714',
'ext': 'mp4',
'title': 'Fighting zombies is big business',
'description': 'md5:0c100d8e1a7947bd2feec9a5550e519e',
'timestamp': 1459332000,
'upload_date': '20160330',
'uploader': 'NBCU-CNBC',
},
'params': {
# m3u8 download
'skip_download': True,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
return {
'_type': 'url_transparent',
'ie_key': 'ThePlatform',
'url': smuggle_url(
'http://link.theplatform.com/s/gZWlPC/media/guid/2408950221/%s?mbr=true&manifest=m3u' % video_id,
{'force_smil_url': True}),
'id': video_id,
}

View File

@@ -41,7 +41,13 @@ class ComCarCoffIE(InfoExtractor):
display_id = full_data['activeVideo']['video']
video_data = full_data.get('videos', {}).get(display_id) or full_data['singleshots'][display_id]
video_id = compat_str(video_data['mediaId'])
title = video_data['title']
formats = self._extract_m3u8_formats(
video_data['mediaUrl'], video_id, 'mp4')
self._sort_formats(formats)
thumbnails = [{
'url': video_data['images']['thumb'],
}, {
@@ -54,15 +60,14 @@ class ComCarCoffIE(InfoExtractor):
video_data.get('duration'))
return {
'_type': 'url_transparent',
'url': 'crackle:%s' % video_id,
'id': video_id,
'display_id': display_id,
'title': video_data['title'],
'title': title,
'description': video_data.get('description'),
'timestamp': timestamp,
'duration': duration,
'thumbnails': thumbnails,
'formats': formats,
'season_number': int_or_none(video_data.get('season')),
'episode_number': int_or_none(video_data.get('episode')),
'webpage_url': 'http://comediansincarsgettingcoffee.com/%s' % (video_data.get('urlSlug', video_data.get('slug'))),

View File

@@ -22,8 +22,10 @@ from ..compat import (
compat_str,
compat_urllib_error,
compat_urllib_parse_urlencode,
compat_urllib_request,
compat_urlparse,
)
from ..downloader.f4m import remove_encrypted_media
from ..utils import (
NO_DEFAULT,
age_restricted,
@@ -48,6 +50,7 @@ from ..utils import (
determine_protocol,
parse_duration,
mimetype2ext,
update_Request,
update_url_query,
)
@@ -229,6 +232,24 @@ class InfoExtractor(object):
episode_number: Number of the video episode within a season, as an integer.
episode_id: Id of the video episode, as a unicode string.
The following fields should only be used when the media is a track or a part of
a music album:
track: Title of the track.
track_number: Number of the track within an album or a disc, as an integer.
track_id: Id of the track (useful in case of custom indexing, e.g. 6.iii),
as a unicode string.
artist: Artist(s) of the track.
genre: Genre(s) of the track.
album: Title of the album the track belongs to.
album_type: Type of the album (e.g. "Demo", "Full-length", "Split", "Compilation", etc).
album_artist: List of all artists appeared on the album (e.g.
"Ash Borer / Fell Voices" or "Various Artists", useful for splits
and compilations).
disc_number: Number of the disc or other physical medium the track belongs to,
as an integer.
release_year: Year (YYYY) when the album was released.
Unless mentioned otherwise, the fields should be Unicode strings.
Unless mentioned otherwise, None is equivalent to absence of information.
@@ -346,7 +367,7 @@ class InfoExtractor(object):
def IE_NAME(self):
return compat_str(type(self).__name__[:-2])
def _request_webpage(self, url_or_request, video_id, note=None, errnote=None, fatal=True, data=None, headers=None, query=None):
def _request_webpage(self, url_or_request, video_id, note=None, errnote=None, fatal=True, data=None, headers={}, query={}):
""" Returns the response handle """
if note is None:
self.report_download_webpage(video_id)
@@ -355,12 +376,14 @@ class InfoExtractor(object):
self.to_screen('%s' % (note,))
else:
self.to_screen('%s: %s' % (video_id, note))
# data, headers and query params will be ignored for `Request` objects
if isinstance(url_or_request, compat_str):
if isinstance(url_or_request, compat_urllib_request.Request):
url_or_request = update_Request(
url_or_request, data=data, headers=headers, query=query)
else:
if query:
url_or_request = update_url_query(url_or_request, query)
if data or headers:
url_or_request = sanitized_Request(url_or_request, data, headers or {})
if data is not None or headers:
url_or_request = sanitized_Request(url_or_request, data, headers)
try:
return self._downloader.urlopen(url_or_request)
except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
@@ -376,7 +399,7 @@ class InfoExtractor(object):
self._downloader.report_warning(errmsg)
return False
def _download_webpage_handle(self, url_or_request, video_id, note=None, errnote=None, fatal=True, encoding=None, data=None, headers=None, query=None):
def _download_webpage_handle(self, url_or_request, video_id, note=None, errnote=None, fatal=True, encoding=None, data=None, headers={}, query={}):
""" Returns a tuple (page content as string, URL handle) """
# Strip hashes from the URL (#1038)
if isinstance(url_or_request, (compat_str, str)):
@@ -469,7 +492,7 @@ class InfoExtractor(object):
return content
def _download_webpage(self, url_or_request, video_id, note=None, errnote=None, fatal=True, tries=1, timeout=5, encoding=None, data=None, headers=None, query=None):
def _download_webpage(self, url_or_request, video_id, note=None, errnote=None, fatal=True, tries=1, timeout=5, encoding=None, data=None, headers={}, query={}):
""" Returns the data of the page as a string """
success = False
try_count = 0
@@ -490,7 +513,7 @@ class InfoExtractor(object):
def _download_xml(self, url_or_request, video_id,
note='Downloading XML', errnote='Unable to download XML',
transform_source=None, fatal=True, encoding=None, data=None, headers=None, query=None):
transform_source=None, fatal=True, encoding=None, data=None, headers={}, query={}):
"""Return the xml as an xml.etree.ElementTree.Element"""
xml_string = self._download_webpage(
url_or_request, video_id, note, errnote, fatal=fatal, encoding=encoding, data=data, headers=headers, query=query)
@@ -504,7 +527,7 @@ class InfoExtractor(object):
note='Downloading JSON metadata',
errnote='Unable to download JSON metadata',
transform_source=None,
fatal=True, encoding=None, data=None, headers=None, query=None):
fatal=True, encoding=None, data=None, headers={}, query={}):
json_string = self._download_webpage(
url_or_request, video_id, note, errnote, fatal=fatal,
encoding=encoding, data=data, headers=headers, query=query)
@@ -819,7 +842,7 @@ class InfoExtractor(object):
for input in re.findall(r'(?i)<input([^>]+)>', html):
if not re.search(r'type=(["\'])(?:hidden|submit)\1', input):
continue
name = re.search(r'name=(["\'])(?P<value>.+?)\1', input)
name = re.search(r'(?:name|id)=(["\'])(?P<value>.+?)\1', input)
if not name:
continue
value = re.search(r'value=(["\'])(?P<value>.*?)\1', input)
@@ -989,6 +1012,11 @@ class InfoExtractor(object):
if not media_nodes:
manifest_version = '2.0'
media_nodes = manifest.findall('{http://ns.adobe.com/f4m/2.0}media')
# Remove unsupported DRM protected media from final formats
# rendition (see https://github.com/rg3/youtube-dl/issues/8573).
media_nodes = remove_encrypted_media(media_nodes)
if not media_nodes:
return formats
base_url = xpath_text(
manifest, ['{http://ns.adobe.com/f4m/1.0}baseURL', '{http://ns.adobe.com/f4m/2.0}baseURL'],
'base URL', default=None)
@@ -1021,8 +1049,6 @@ class InfoExtractor(object):
'height': int_or_none(media_el.attrib.get('height')),
'preference': preference,
})
self._sort_formats(formats)
return formats
def _extract_m3u8_formats(self, m3u8_url, video_id, ext=None,
@@ -1143,7 +1169,6 @@ class InfoExtractor(object):
last_media = None
formats.append(f)
last_info = {}
self._sort_formats(formats)
return formats
@staticmethod
@@ -1317,8 +1342,6 @@ class InfoExtractor(object):
})
continue
self._sort_formats(formats)
return formats
def _parse_smil_subtitles(self, smil, namespace=None, subtitles_lang='en'):
@@ -1329,7 +1352,7 @@ class InfoExtractor(object):
if not src or src in urls:
continue
urls.append(src)
ext = textstream.get('ext') or determine_ext(src) or mimetype2ext(textstream.get('type'))
ext = textstream.get('ext') or mimetype2ext(textstream.get('type')) or determine_ext(src)
lang = textstream.get('systemLanguage') or textstream.get('systemLanguageName') or textstream.get('lang') or subtitles_lang
subtitles.setdefault(lang, []).append({
'url': src,
@@ -1509,9 +1532,16 @@ class InfoExtractor(object):
representation_ms_info['total_number'] = int(math.ceil(float(period_duration) / segment_duration))
media_template = representation_ms_info['media_template']
media_template = media_template.replace('$RepresentationID$', representation_id)
media_template = re.sub(r'\$(Number|Bandwidth)(?:%(0\d+)d)?\$', r'%(\1)\2d', media_template)
media_template = re.sub(r'\$(Number|Bandwidth)\$', r'%(\1)d', media_template)
media_template = re.sub(r'\$(Number|Bandwidth)%([^$]+)\$', r'%(\1)\2', media_template)
media_template.replace('$$', '$')
representation_ms_info['segment_urls'] = [media_template % {'Number': segment_number, 'Bandwidth': representation_attrib.get('bandwidth')} for segment_number in range(representation_ms_info['start_number'], representation_ms_info['total_number'] + representation_ms_info['start_number'])]
representation_ms_info['segment_urls'] = [
media_template % {
'Number': segment_number,
'Bandwidth': representation_attrib.get('bandwidth')}
for segment_number in range(
representation_ms_info['start_number'],
representation_ms_info['total_number'] + representation_ms_info['start_number'])]
if 'segment_urls' in representation_ms_info:
f.update({
'segment_urls': representation_ms_info['segment_urls'],
@@ -1536,7 +1566,6 @@ class InfoExtractor(object):
existing_format.update(f)
else:
self.report_warning('Unknown MIME type %s in DASH manifest' % mime_type)
self._sort_formats(formats)
return formats
def _live_title(self, name):

View File

@@ -57,6 +57,7 @@ class CWTVIE(InfoExtractor):
formats = self._extract_m3u8_formats(
video_data['videos']['variantplaylist']['uri'], video_id, 'mp4')
self._sort_formats(formats)
thumbnails = [{
'url': image['uri'],

View File

@@ -41,7 +41,9 @@ class DeezerPlaylistIE(InfoExtractor):
'Deezer said: %s' % geoblocking_msg, expected=True)
data_json = self._search_regex(
r'naboo\.display\(\'[^\']+\',\s*(.*?)\);\n', webpage, 'data JSON')
(r'__DZR_APP_STATE__\s*=\s*({.+?})\s*</script>',
r'naboo\.display\(\'[^\']+\',\s*(.*?)\);\n'),
webpage, 'data JSON')
data = json.loads(data_json)
playlist_title = data.get('DATA', {}).get('TITLE')

View File

@@ -17,37 +17,53 @@ class DemocracynowIE(InfoExtractor):
IE_NAME = 'democracynow'
_TESTS = [{
'url': 'http://www.democracynow.org/shows/2015/7/3',
'md5': 'fbb8fe3d7a56a5e12431ce2f9b2fab0d',
'md5': '3757c182d3d84da68f5c8f506c18c196',
'info_dict': {
'id': '2015-0703-001',
'ext': 'mp4',
'title': 'July 03, 2015 - Democracy Now!',
'description': 'A daily independent global news hour with Amy Goodman & Juan González "What to the Slave is 4th of July?": James Earl Jones Reads Frederick Douglass\u2019 Historic Speech : "This Flag Comes Down Today": Bree Newsome Scales SC Capitol Flagpole, Takes Down Confederate Flag : "We Shall Overcome": Remembering Folk Icon, Activist Pete Seeger in His Own Words & Songs',
'title': 'Daily Show',
},
}, {
'url': 'http://www.democracynow.org/2015/7/3/this_flag_comes_down_today_bree',
'md5': 'fbb8fe3d7a56a5e12431ce2f9b2fab0d',
'info_dict': {
'id': '2015-0703-001',
'ext': 'mp4',
'title': '"This Flag Comes Down Today": Bree Newsome Scales SC Capitol Flagpole, Takes Down Confederate Flag',
'description': 'md5:4d2bc4f0d29f5553c2210a4bc7761a21',
},
'params': {
'skip_download': True,
},
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
description = self._og_search_description(webpage)
json_data = self._parse_json(self._search_regex(
r'<script[^>]+type="text/json"[^>]*>\s*({[^>]+})', webpage, 'json'),
display_id)
video_id = None
title = json_data['title']
formats = []
default_lang = 'en'
video_id = None
for key in ('file', 'audio', 'video', 'high_res_video'):
media_url = json_data.get(key, '')
if not media_url:
continue
media_url = re.sub(r'\?.*', '', compat_urlparse.urljoin(url, media_url))
video_id = video_id or remove_start(os.path.splitext(url_basename(media_url))[0], 'dn')
formats.append({
'url': media_url,
'vcodec': 'none' if key == 'audio' else None,
})
self._sort_formats(formats)
default_lang = 'en'
subtitles = {}
def add_subtitle_item(lang, info_dict):
@@ -67,22 +83,13 @@ class DemocracynowIE(InfoExtractor):
'url': compat_urlparse.urljoin(url, subtitle_item['url']),
})
for key in ('file', 'audio', 'video'):
media_url = json_data.get(key, '')
if not media_url:
continue
media_url = re.sub(r'\?.*', '', compat_urlparse.urljoin(url, media_url))
video_id = video_id or remove_start(os.path.splitext(url_basename(media_url))[0], 'dn')
formats.append({
'url': media_url,
})
self._sort_formats(formats)
description = self._og_search_description(webpage, default=None)
return {
'id': video_id or display_id,
'title': json_data['title'],
'title': title,
'description': description,
'thumbnail': json_data.get('image'),
'subtitles': subtitles,
'formats': formats,
}

View File

@@ -38,6 +38,7 @@ class DFBIE(InfoExtractor):
token_el = f4m_info.find('token')
manifest_url = token_el.attrib['url'] + '?' + 'hdnea=' + token_el.attrib['auth'] + '&hdcore=3.2.0'
formats = self._extract_f4m_formats(manifest_url, display_id)
self._sort_formats(formats)
return {
'id': video_id,

View File

@@ -63,18 +63,23 @@ class DiscoveryIE(InfoExtractor):
video_title = info.get('playlist_title') or info.get('video_title')
entries = [{
'id': compat_str(video_info['id']),
'formats': self._extract_m3u8_formats(
entries = []
for idx, video_info in enumerate(info['playlist']):
formats = self._extract_m3u8_formats(
video_info['src'], display_id, 'mp4', 'm3u8_native', m3u8_id='hls',
note='Download m3u8 information for video %d' % (idx + 1)),
'title': video_info['title'],
'description': video_info.get('description'),
'duration': parse_duration(video_info.get('video_length')),
'webpage_url': video_info.get('href') or video_info.get('url'),
'thumbnail': video_info.get('thumbnailURL'),
'alt_title': video_info.get('secondary_title'),
'timestamp': parse_iso8601(video_info.get('publishedDate')),
} for idx, video_info in enumerate(info['playlist'])]
note='Download m3u8 information for video %d' % (idx + 1))
self._sort_formats(formats)
entries.append({
'id': compat_str(video_info['id']),
'formats': formats,
'title': video_info['title'],
'description': video_info.get('description'),
'duration': parse_duration(video_info.get('video_length')),
'webpage_url': video_info.get('href') or video_info.get('url'),
'thumbnail': video_info.get('thumbnailURL'),
'alt_title': video_info.get('secondary_title'),
'timestamp': parse_iso8601(video_info.get('publishedDate')),
})
return self.playlist_result(entries, display_id, video_title)

View File

@@ -0,0 +1,114 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
parse_duration,
remove_end,
xpath_element,
xpath_text,
)
class DigitallySpeakingIE(InfoExtractor):
_VALID_URL = r'https?://(?:evt\.dispeak|events\.digitallyspeaking)\.com/(?:[^/]+/)+xml/(?P<id>[^.]+)\.xml'
_TESTS = [{
# From http://gdcvault.com/play/1023460/Tenacious-Design-and-The-Interface
'url': 'http://evt.dispeak.com/ubm/gdc/sf16/xml/840376_BQRC.xml',
'md5': 'a8efb6c31ed06ca8739294960b2dbabd',
'info_dict': {
'id': '840376_BQRC',
'ext': 'mp4',
'title': 'Tenacious Design and The Interface of \'Destiny\'',
},
}, {
# From http://www.gdcvault.com/play/1014631/Classic-Game-Postmortem-PAC
'url': 'http://events.digitallyspeaking.com/gdc/sf11/xml/12396_1299111843500GMPX.xml',
'only_matching': True,
}]
def _parse_mp4(self, metadata):
video_formats = []
video_root = None
mp4_video = xpath_text(metadata, './mp4video', default=None)
if mp4_video is not None:
mobj = re.match(r'(?P<root>https?://.*?/).*', mp4_video)
video_root = mobj.group('root')
if video_root is None:
http_host = xpath_text(metadata, 'httpHost', default=None)
if http_host:
video_root = 'http://%s/' % http_host
if video_root is None:
# Hard-coded in http://evt.dispeak.com/ubm/gdc/sf16/custom/player2.js
# Works for GPUTechConf, too
video_root = 'http://s3-2u.digitallyspeaking.com/'
formats = metadata.findall('./MBRVideos/MBRVideo')
if not formats:
return None
for a_format in formats:
stream_name = xpath_text(a_format, 'streamName', fatal=True)
video_path = re.match(r'mp4\:(?P<path>.*)', stream_name).group('path')
url = video_root + video_path
vbr = xpath_text(a_format, 'bitrate')
video_formats.append({
'url': url,
'vbr': int_or_none(vbr),
})
return video_formats
def _parse_flv(self, metadata):
formats = []
akamai_url = xpath_text(metadata, './akamaiHost', fatal=True)
audios = metadata.findall('./audios/audio')
for audio in audios:
formats.append({
'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
'play_path': remove_end(audio.get('url'), '.flv'),
'ext': 'flv',
'vcodec': 'none',
'format_id': audio.get('code'),
})
slide_video_path = xpath_text(metadata, './slideVideo', fatal=True)
formats.append({
'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
'play_path': remove_end(slide_video_path, '.flv'),
'ext': 'flv',
'format_note': 'slide deck video',
'quality': -2,
'preference': -2,
'format_id': 'slides',
})
speaker_video_path = xpath_text(metadata, './speakerVideo', fatal=True)
formats.append({
'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
'play_path': remove_end(speaker_video_path, '.flv'),
'ext': 'flv',
'format_note': 'speaker video',
'quality': -1,
'preference': -1,
'format_id': 'speaker',
})
return formats
def _real_extract(self, url):
video_id = self._match_id(url)
xml_description = self._download_xml(url, video_id)
metadata = xpath_element(xml_description, 'metadata')
video_formats = self._parse_mp4(metadata)
if video_formats is None:
video_formats = self._parse_flv(metadata)
return {
'id': video_id,
'formats': video_formats,
'title': xpath_text(metadata, 'title', fatal=True),
'duration': parse_duration(xpath_text(metadata, 'endTime')),
'creator': xpath_text(metadata, 'speaker'),
}

View File

@@ -6,13 +6,18 @@ import re
import time
from .common import InfoExtractor
from ..utils import int_or_none
from ..compat import compat_urlparse
from ..utils import (
int_or_none,
update_url_query,
)
class DPlayIE(InfoExtractor):
_VALID_URL = r'https?://(?P<domain>it\.dplay\.com|www\.dplay\.(?:dk|se|no))/[^/]+/(?P<id>[^/?#]+)'
_TESTS = [{
# geo restricted, via direct unsigned hls URL
'url': 'http://it.dplay.com/take-me-out/stagione-1-episodio-25/',
'info_dict': {
'id': '1255600',
@@ -31,11 +36,12 @@ class DPlayIE(InfoExtractor):
},
'expected_warnings': ['Unable to download f4m manifest'],
}, {
# non geo restricted, via secure api, unsigned download hls URL
'url': 'http://www.dplay.se/nugammalt-77-handelser-som-format-sverige/season-1-svensken-lar-sig-njuta-av-livet/',
'info_dict': {
'id': '3172',
'display_id': 'season-1-svensken-lar-sig-njuta-av-livet',
'ext': 'flv',
'ext': 'mp4',
'title': 'Svensken lär sig njuta av livet',
'description': 'md5:d3819c9bccffd0fe458ca42451dd50d8',
'duration': 2650,
@@ -48,23 +54,25 @@ class DPlayIE(InfoExtractor):
'age_limit': 0,
},
}, {
# geo restricted, via secure api, unsigned download hls URL
'url': 'http://www.dplay.dk/mig-og-min-mor/season-6-episode-12/',
'info_dict': {
'id': '70816',
'display_id': 'season-6-episode-12',
'ext': 'flv',
'ext': 'mp4',
'title': 'Episode 12',
'description': 'md5:9c86e51a93f8a4401fc9641ef9894c90',
'duration': 2563,
'timestamp': 1429696800,
'upload_date': '20150422',
'creator': 'Kanal 4',
'creator': 'Kanal 4 (Home)',
'series': 'Mig og min mor',
'season_number': 6,
'episode_number': 12,
'age_limit': 0,
},
}, {
# geo restricted, via direct unsigned hls URL
'url': 'http://www.dplay.no/pga-tour/season-1-hoydepunkter-18-21-februar/',
'only_matching': True,
}]
@@ -90,17 +98,24 @@ class DPlayIE(InfoExtractor):
def extract_formats(protocol, manifest_url):
if protocol == 'hls':
formats.extend(self._extract_m3u8_formats(
m3u8_formats = self._extract_m3u8_formats(
manifest_url, video_id, ext='mp4',
entry_protocol='m3u8_native', m3u8_id=protocol, fatal=False))
entry_protocol='m3u8_native', m3u8_id=protocol, fatal=False)
# Sometimes final URLs inside m3u8 are unsigned, let's fix this
# ourselves
query = compat_urlparse.parse_qs(compat_urlparse.urlparse(manifest_url).query)
for m3u8_format in m3u8_formats:
m3u8_format['url'] = update_url_query(m3u8_format['url'], query)
formats.extend(m3u8_formats)
elif protocol == 'hds':
formats.extend(self._extract_f4m_formats(
manifest_url + '&hdcore=3.8.0&plugin=flowplayer-3.8.0.0',
video_id, f4m_id=protocol, fatal=False))
domain_tld = domain.split('.')[-1]
if domain_tld in ('se', 'dk'):
if domain_tld in ('se', 'dk', 'no'):
for protocol in PROTOCOLS:
# Providing dsc-geo allows to bypass geo restriction in some cases
self._set_cookie(
'secure.dplay.%s' % domain_tld, 'dsc-geo',
json.dumps({
@@ -113,11 +128,24 @@ class DPlayIE(InfoExtractor):
'Downloading %s stream JSON' % protocol, fatal=False)
if stream and stream.get(protocol):
extract_formats(protocol, stream[protocol])
else:
# The last resort is to try direct unsigned hls/hds URLs from info dictionary.
# Sometimes this does work even when secure API with dsc-geo has failed (e.g.
# http://www.dplay.no/pga-tour/season-1-hoydepunkter-18-21-februar/).
if not formats:
for protocol in PROTOCOLS:
if info.get(protocol):
extract_formats(protocol, info[protocol])
self._sort_formats(formats)
subtitles = {}
for lang in ('se', 'sv', 'da', 'nl', 'no'):
for format_id in ('web_vtt', 'vtt', 'srt'):
subtitle_url = info.get('subtitles_%s_%s' % (lang, format_id))
if subtitle_url:
subtitles.setdefault(lang, []).append({'url': subtitle_url})
return {
'id': video_id,
'display_id': display_id,
@@ -131,4 +159,5 @@ class DPlayIE(InfoExtractor):
'episode_number': int_or_none(info.get('episode')),
'age_limit': int_or_none(info.get('minimum_age')),
'formats': formats,
'subtitles': subtitles,
}

View File

@@ -1,39 +0,0 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class DumpIE(InfoExtractor):
_VALID_URL = r'^https?://(?:www\.)?dump\.com/(?P<id>[a-zA-Z0-9]+)/'
_TEST = {
'url': 'http://www.dump.com/oneus/',
'md5': 'ad71704d1e67dfd9e81e3e8b42d69d99',
'info_dict': {
'id': 'oneus',
'ext': 'flv',
'title': "He's one of us.",
'thumbnail': 're:^https?://.*\.jpg$',
},
}
def _real_extract(self, url):
m = re.match(self._VALID_URL, url)
video_id = m.group('id')
webpage = self._download_webpage(url, video_id)
video_url = self._search_regex(
r's1.addVariable\("file",\s*"([^"]+)"', webpage, 'video URL')
title = self._og_search_title(webpage)
thumbnail = self._og_search_thumbnail(webpage)
return {
'id': video_id,
'title': title,
'url': video_url,
'thumbnail': thumbnail,
}

View File

@@ -39,13 +39,13 @@ class DWIE(InfoExtractor):
hidden_inputs = self._hidden_inputs(webpage)
title = hidden_inputs['media_title']
formats = []
if hidden_inputs.get('player_type') == 'video' and hidden_inputs.get('stream_file') == '1':
formats = self._extract_smil_formats(
'http://www.dw.com/smil/v-%s' % media_id, media_id,
transform_source=lambda s: s.replace(
'rtmp://tv-od.dw.de/flash/',
'http://tv-download.dw.de/dwtv_video/flv/'))
self._sort_formats(formats)
else:
formats = [{'url': hidden_inputs['file_name']}]

View File

@@ -4,9 +4,11 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_HTTPError
from ..utils import (
ExtractorError,
int_or_none,
url_basename,
)
@@ -21,7 +23,7 @@ class EaglePlatformIE(InfoExtractor):
_TESTS = [{
# http://lenta.ru/news/2015/03/06/navalny/
'url': 'http://lentaru.media.eagleplatform.com/index/player?player=new&record_id=227304&player_template_id=5201',
'md5': '70f5187fb620f2c1d503b3b22fd4efe3',
'md5': '881ee8460e1b7735a8be938e2ffb362b',
'info_dict': {
'id': '227304',
'ext': 'mp4',
@@ -36,7 +38,7 @@ class EaglePlatformIE(InfoExtractor):
# http://muz-tv.ru/play/7129/
# http://media.clipyou.ru/index/player?record_id=12820&width=730&height=415&autoplay=true
'url': 'eagleplatform:media.clipyou.ru:12820',
'md5': '90b26344ba442c8e44aa4cf8f301164a',
'md5': '358597369cf8ba56675c1df15e7af624',
'info_dict': {
'id': '12820',
'ext': 'mp4',
@@ -55,8 +57,13 @@ class EaglePlatformIE(InfoExtractor):
raise ExtractorError(' '.join(response['errors']), expected=True)
def _download_json(self, url_or_request, video_id, note='Downloading JSON metadata'):
response = super(EaglePlatformIE, self)._download_json(url_or_request, video_id, note)
self._handle_error(response)
try:
response = super(EaglePlatformIE, self)._download_json(url_or_request, video_id, note)
except ExtractorError as ee:
if isinstance(ee.cause, compat_HTTPError):
response = self._parse_json(ee.cause.read().decode('utf-8'), video_id)
self._handle_error(response)
raise
return response
def _get_video_url(self, url_or_request, video_id, note='Downloading JSON metadata'):
@@ -84,17 +91,30 @@ class EaglePlatformIE(InfoExtractor):
secure_m3u8 = self._proto_relative_url(media['sources']['secure_m3u8']['auto'], 'http:')
formats = []
m3u8_url = self._get_video_url(secure_m3u8, video_id, 'Downloading m3u8 JSON')
formats = self._extract_m3u8_formats(
m3u8_formats = self._extract_m3u8_formats(
m3u8_url, video_id,
'mp4', entry_protocol='m3u8_native', m3u8_id='hls')
formats.extend(m3u8_formats)
mp4_url = self._get_video_url(
# Secure mp4 URL is constructed according to Player.prototype.mp4 from
# http://lentaru.media.eagleplatform.com/player/player.js
re.sub(r'm3u8|hlsvod|hls|f4m', 'mp4', secure_m3u8),
video_id, 'Downloading mp4 JSON')
formats.append({'url': mp4_url, 'format_id': 'mp4'})
mp4_url_basename = url_basename(mp4_url)
for m3u8_format in m3u8_formats:
mobj = re.search('/([^/]+)/index\.m3u8', m3u8_format['url'])
if mobj:
http_format = m3u8_format.copy()
http_format.update({
'url': mp4_url.replace(mp4_url_basename, mobj.group(1)),
'format_id': m3u8_format['format_id'].replace('hls', 'http'),
'protocol': 'http',
})
formats.append(http_format)
self._sort_formats(formats)

View File

@@ -4,10 +4,10 @@ from .common import InfoExtractor
class EbaumsWorldIE(InfoExtractor):
_VALID_URL = r'https?://www\.ebaumsworld\.com/video/watch/(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?ebaumsworld\.com/videos/[^/]+/(?P<id>\d+)'
_TEST = {
'url': 'http://www.ebaumsworld.com/video/watch/83367677/',
'url': 'http://www.ebaumsworld.com/videos/a-giant-python-opens-the-door/83367677/',
'info_dict': {
'id': '83367677',
'ext': 'mp4',

View File

@@ -0,0 +1,995 @@
# flake8: noqa
from __future__ import unicode_literals
from .abc import ABCIE
from .abc7news import Abc7NewsIE
from .academicearth import AcademicEarthCourseIE
from .acast import (
ACastIE,
ACastChannelIE,
)
from .addanime import AddAnimeIE
from .adobetv import (
AdobeTVIE,
AdobeTVShowIE,
AdobeTVChannelIE,
AdobeTVVideoIE,
)
from .adultswim import AdultSwimIE
from .aenetworks import AENetworksIE
from .aftonbladet import AftonbladetIE
from .airmozilla import AirMozillaIE
from .aljazeera import AlJazeeraIE
from .alphaporno import AlphaPornoIE
from .animeondemand import AnimeOnDemandIE
from .anitube import AnitubeIE
from .anysex import AnySexIE
from .aol import (
AolIE,
AolFeaturesIE,
)
from .allocine import AllocineIE
from .aparat import AparatIE
from .appleconnect import AppleConnectIE
from .appletrailers import (
AppleTrailersIE,
AppleTrailersSectionIE,
)
from .archiveorg import ArchiveOrgIE
from .ard import (
ARDIE,
ARDMediathekIE,
SportschauIE,
)
from .arte import (
ArteTvIE,
ArteTVPlus7IE,
ArteTVCreativeIE,
ArteTVConcertIE,
ArteTVInfoIE,
ArteTVFutureIE,
ArteTVCinemaIE,
ArteTVDDCIE,
ArteTVMagazineIE,
ArteTVEmbedIE,
)
from .atresplayer import AtresPlayerIE
from .atttechchannel import ATTTechChannelIE
from .audimedia import AudiMediaIE
from .audioboom import AudioBoomIE
from .audiomack import AudiomackIE, AudiomackAlbumIE
from .azubu import AzubuIE, AzubuLiveIE
from .baidu import BaiduVideoIE
from .bambuser import BambuserIE, BambuserChannelIE
from .bandcamp import BandcampIE, BandcampAlbumIE
from .bbc import (
BBCCoUkIE,
BBCCoUkArticleIE,
BBCIE,
)
from .beeg import BeegIE
from .behindkink import BehindKinkIE
from .beatportpro import BeatportProIE
from .bet import BetIE
from .bigflix import BigflixIE
from .bild import BildIE
from .bilibili import BiliBiliIE
from .biobiochiletv import BioBioChileTVIE
from .bleacherreport import (
BleacherReportIE,
BleacherReportCMSIE,
)
from .blinkx import BlinkxIE
from .bloomberg import BloombergIE
from .bokecc import BokeCCIE
from .bpb import BpbIE
from .br import BRIE
from .bravotv import BravoTVIE
from .breakcom import BreakIE
from .brightcove import (
BrightcoveLegacyIE,
BrightcoveNewIE,
)
from .buzzfeed import BuzzFeedIE
from .byutv import BYUtvIE
from .c56 import C56IE
from .camdemy import (
CamdemyIE,
CamdemyFolderIE
)
from .camwithher import CamWithHerIE
from .canalplus import CanalplusIE
from .canalc2 import Canalc2IE
from .canvas import CanvasIE
from .cbc import (
CBCIE,
CBCPlayerIE,
)
from .cbs import CBSIE
from .cbsinteractive import CBSInteractiveIE
from .cbsnews import (
CBSNewsIE,
CBSNewsLiveVideoIE,
)
from .cbssports import CBSSportsIE
from .ccc import CCCIE
from .cda import CDAIE
from .ceskatelevize import CeskaTelevizeIE
from .channel9 import Channel9IE
from .chaturbate import ChaturbateIE
from .chilloutzone import ChilloutzoneIE
from .chirbit import (
ChirbitIE,
ChirbitProfileIE,
)
from .cinchcast import CinchcastIE
from .cinemassacre import CinemassacreIE
from .cliprs import ClipRsIE
from .clipfish import ClipfishIE
from .cliphunter import CliphunterIE
from .clipsyndicate import ClipsyndicateIE
from .cloudy import CloudyIE
from .clubic import ClubicIE
from .clyp import ClypIE
from .cmt import CMTIE
from .cnbc import CNBCIE
from .cnn import (
CNNIE,
CNNBlogsIE,
CNNArticleIE,
)
from .collegehumor import CollegeHumorIE
from .collegerama import CollegeRamaIE
from .comedycentral import ComedyCentralIE, ComedyCentralShowsIE
from .comcarcoff import ComCarCoffIE
from .commonmistakes import CommonMistakesIE, UnicodeBOMIE
from .commonprotocols import RtmpIE
from .condenast import CondeNastIE
from .cracked import CrackedIE
from .crackle import CrackleIE
from .criterion import CriterionIE
from .crooksandliars import CrooksAndLiarsIE
from .crunchyroll import (
CrunchyrollIE,
CrunchyrollShowPlaylistIE
)
from .cspan import CSpanIE
from .ctsnews import CtsNewsIE
from .cultureunplugged import CultureUnpluggedIE
from .cwtv import CWTVIE
from .dailymotion import (
DailymotionIE,
DailymotionPlaylistIE,
DailymotionUserIE,
DailymotionCloudIE,
)
from .daum import (
DaumIE,
DaumClipIE,
DaumPlaylistIE,
DaumUserIE,
)
from .dbtv import DBTVIE
from .dcn import (
DCNIE,
DCNVideoIE,
DCNLiveIE,
DCNSeasonIE,
)
from .dctp import DctpTvIE
from .deezer import DeezerPlaylistIE
from .democracynow import DemocracynowIE
from .dfb import DFBIE
from .dhm import DHMIE
from .dotsub import DotsubIE
from .douyutv import DouyuTVIE
from .dplay import DPlayIE
from .dramafever import (
DramaFeverIE,
DramaFeverSeriesIE,
)
from .dreisat import DreiSatIE
from .drbonanza import DRBonanzaIE
from .drtuber import DrTuberIE
from .drtv import DRTVIE
from .dvtv import DVTVIE
from .dumpert import DumpertIE
from .defense import DefenseGouvFrIE
from .discovery import DiscoveryIE
from .dispeak import DigitallySpeakingIE
from .dropbox import DropboxIE
from .dw import (
DWIE,
DWArticleIE,
)
from .eagleplatform import EaglePlatformIE
from .ebaumsworld import EbaumsWorldIE
from .echomsk import EchoMskIE
from .ehow import EHowIE
from .eighttracks import EightTracksIE
from .einthusan import EinthusanIE
from .eitb import EitbIE
from .ellentv import (
EllenTVIE,
EllenTVClipsIE,
)
from .elpais import ElPaisIE
from .embedly import EmbedlyIE
from .engadget import EngadgetIE
from .eporner import EpornerIE
from .eroprofile import EroProfileIE
from .escapist import EscapistIE
from .espn import ESPNIE
from .esri import EsriVideoIE
from .europa import EuropaIE
from .everyonesmixtape import EveryonesMixtapeIE
from .exfm import ExfmIE
from .expotv import ExpoTVIE
from .extremetube import ExtremeTubeIE
from .facebook import FacebookIE
from .faz import FazIE
from .fc2 import FC2IE
from .fczenit import FczenitIE
from .firstpost import FirstpostIE
from .firsttv import FirstTVIE
from .fivemin import FiveMinIE
from .fivetv import FiveTVIE
from .fktv import FKTVIE
from .flickr import FlickrIE
from .folketinget import FolketingetIE
from .footyroom import FootyRoomIE
from .fourtube import FourTubeIE
from .fox import FOXIE
from .foxgay import FoxgayIE
from .foxnews import FoxNewsIE
from .foxsports import FoxSportsIE
from .franceculture import (
FranceCultureIE,
FranceCultureEmissionIE,
)
from .franceinter import FranceInterIE
from .francetv import (
PluzzIE,
FranceTvInfoIE,
FranceTVIE,
GenerationQuoiIE,
CultureboxIE,
)
from .freesound import FreesoundIE
from .freespeech import FreespeechIE
from .freevideo import FreeVideoIE
from .funimation import FunimationIE
from .funnyordie import FunnyOrDieIE
from .gameinformer import GameInformerIE
from .gamekings import GamekingsIE
from .gameone import (
GameOneIE,
GameOnePlaylistIE,
)
from .gamersyde import GamersydeIE
from .gamespot import GameSpotIE
from .gamestar import GameStarIE
from .gametrailers import GametrailersIE
from .gazeta import GazetaIE
from .gdcvault import GDCVaultIE
from .generic import GenericIE
from .gfycat import GfycatIE
from .giantbomb import GiantBombIE
from .giga import GigaIE
from .glide import GlideIE
from .globo import (
GloboIE,
GloboArticleIE,
)
from .godtube import GodTubeIE
from .goldenmoustache import GoldenMoustacheIE
from .golem import GolemIE
from .googledrive import GoogleDriveIE
from .googleplus import GooglePlusIE
from .googlesearch import GoogleSearchIE
from .goshgay import GoshgayIE
from .gputechconf import GPUTechConfIE
from .groupon import GrouponIE
from .hark import HarkIE
from .hbo import HBOIE
from .hearthisat import HearThisAtIE
from .heise import HeiseIE
from .hellporno import HellPornoIE
from .helsinki import HelsinkiIE
from .hentaistigma import HentaiStigmaIE
from .historicfilms import HistoricFilmsIE
from .hitbox import HitboxIE, HitboxLiveIE
from .hornbunny import HornBunnyIE
from .hotnewhiphop import HotNewHipHopIE
from .hotstar import HotStarIE
from .howcast import HowcastIE
from .howstuffworks import HowStuffWorksIE
from .huffpost import HuffPostIE
from .hypem import HypemIE
from .iconosquare import IconosquareIE
from .ign import (
IGNIE,
OneUPIE,
PCMagIE,
)
from .imdb import (
ImdbIE,
ImdbListIE
)
from .imgur import (
ImgurIE,
ImgurAlbumIE,
)
from .ina import InaIE
from .indavideo import (
IndavideoIE,
IndavideoEmbedIE,
)
from .infoq import InfoQIE
from .instagram import InstagramIE, InstagramUserIE
from .internetvideoarchive import InternetVideoArchiveIE
from .iprima import IPrimaIE
from .iqiyi import IqiyiIE
from .ir90tv import Ir90TvIE
from .ivi import (
IviIE,
IviCompilationIE
)
from .ivideon import IvideonIE
from .izlesene import IzleseneIE
from .jeuxvideo import JeuxVideoIE
from .jove import JoveIE
from .jwplatform import JWPlatformIE
from .jpopsukitv import JpopsukiIE
from .kaltura import KalturaIE
from .kanalplay import KanalPlayIE
from .kankan import KankanIE
from .karaoketv import KaraoketvIE
from .karrierevideos import KarriereVideosIE
from .keezmovies import KeezMoviesIE
from .khanacademy import KhanAcademyIE
from .kickstarter import KickStarterIE
from .keek import KeekIE
from .konserthusetplay import KonserthusetPlayIE
from .kontrtube import KontrTubeIE
from .krasview import KrasViewIE
from .ku6 import Ku6IE
from .kusi import KUSIIE
from .kuwo import (
KuwoIE,
KuwoAlbumIE,
KuwoChartIE,
KuwoSingerIE,
KuwoCategoryIE,
KuwoMvIE,
)
from .la7 import LA7IE
from .laola1tv import Laola1TvIE
from .lecture2go import Lecture2GoIE
from .lemonde import LemondeIE
from .leeco import (
LeIE,
LePlaylistIE,
LetvCloudIE,
)
from .libsyn import LibsynIE
from .lifenews import (
LifeNewsIE,
LifeEmbedIE,
)
from .limelight import (
LimelightMediaIE,
LimelightChannelIE,
LimelightChannelListIE,
)
from .liveleak import LiveLeakIE
from .livestream import (
LivestreamIE,
LivestreamOriginalIE,
LivestreamShortenerIE,
)
from .lnkgo import LnkGoIE
from .lovehomeporn import LoveHomePornIE
from .lrt import LRTIE
from .lynda import (
LyndaIE,
LyndaCourseIE
)
from .m6 import M6IE
from .macgamestore import MacGameStoreIE
from .mailru import MailRuIE
from .makerschannel import MakersChannelIE
from .makertv import MakerTVIE
from .malemotion import MalemotionIE
from .matchtv import MatchTVIE
from .mdr import MDRIE
from .metacafe import MetacafeIE
from .metacritic import MetacriticIE
from .mgoon import MgoonIE
from .mgtv import MGTVIE
from .minhateca import MinhatecaIE
from .ministrygrid import MinistryGridIE
from .minoto import MinotoIE
from .miomio import MioMioIE
from .mit import TechTVMITIE, MITIE, OCWMITIE
from .mitele import MiTeleIE
from .mixcloud import (
MixcloudIE,
MixcloudUserIE,
MixcloudPlaylistIE,
MixcloudStreamIE,
)
from .mlb import MLBIE
from .mnet import MnetIE
from .mpora import MporaIE
from .moevideo import MoeVideoIE
from .mofosex import MofosexIE
from .mojvideo import MojvideoIE
from .moniker import MonikerIE
from .morningstar import MorningstarIE
from .motherless import MotherlessIE
from .motorsport import MotorsportIE
from .movieclips import MovieClipsIE
from .moviezine import MoviezineIE
from .mtv import (
MTVIE,
MTVServicesEmbeddedIE,
MTVIggyIE,
MTVDEIE,
)
from .muenchentv import MuenchenTVIE
from .musicplayon import MusicPlayOnIE
from .muzu import MuzuTVIE
from .mwave import MwaveIE
from .myspace import MySpaceIE, MySpaceAlbumIE
from .myspass import MySpassIE
from .myvi import MyviIE
from .myvideo import MyVideoIE
from .myvidster import MyVidsterIE
from .nationalgeographic import (
NationalGeographicIE,
NationalGeographicChannelIE,
)
from .naver import NaverIE
from .nba import NBAIE
from .nbc import (
CSNNEIE,
NBCIE,
NBCNewsIE,
NBCSportsIE,
NBCSportsVPlayerIE,
MSNBCIE,
)
from .ndr import (
NDRIE,
NJoyIE,
NDREmbedBaseIE,
NDREmbedIE,
NJoyEmbedIE,
)
from .ndtv import NDTVIE
from .netzkino import NetzkinoIE
from .nerdcubed import NerdCubedFeedIE
from .neteasemusic import (
NetEaseMusicIE,
NetEaseMusicAlbumIE,
NetEaseMusicSingerIE,
NetEaseMusicListIE,
NetEaseMusicMvIE,
NetEaseMusicProgramIE,
NetEaseMusicDjRadioIE,
)
from .newgrounds import NewgroundsIE
from .newstube import NewstubeIE
from .nextmedia import (
NextMediaIE,
NextMediaActionNewsIE,
AppleDailyIE,
)
from .nextmovie import NextMovieIE
from .nfb import NFBIE
from .nfl import NFLIE
from .nhl import (
NHLVideocenterIE,
NHLNewsIE,
NHLVideocenterCategoryIE,
NHLIE,
)
from .nick import NickIE
from .niconico import NiconicoIE, NiconicoPlaylistIE
from .ninegag import NineGagIE
from .noco import NocoIE
from .normalboots import NormalbootsIE
from .nosvideo import NosVideoIE
from .nova import NovaIE
from .novamov import (
AuroraVidIE,
CloudTimeIE,
NowVideoIE,
VideoWeedIE,
WholeCloudIE,
)
from .nowness import (
NownessIE,
NownessPlaylistIE,
NownessSeriesIE,
)
from .nowtv import (
NowTVIE,
NowTVListIE,
)
from .noz import NozIE
from .npo import (
NPOIE,
NPOLiveIE,
NPORadioIE,
NPORadioFragmentIE,
SchoolTVIE,
VPROIE,
WNLIE
)
from .npr import NprIE
from .nrk import (
NRKIE,
NRKPlaylistIE,
NRKSkoleIE,
NRKTVIE,
)
from .ntvde import NTVDeIE
from .ntvru import NTVRuIE
from .nytimes import (
NYTimesIE,
NYTimesArticleIE,
)
from .nuvid import NuvidIE
from .odnoklassniki import OdnoklassnikiIE
from .oktoberfesttv import OktoberfestTVIE
from .onionstudios import OnionStudiosIE
from .ooyala import (
OoyalaIE,
OoyalaExternalIE,
)
from .openload import OpenloadIE
from .ora import OraTVIE
from .orf import (
ORFTVthekIE,
ORFOE1IE,
ORFFM4IE,
ORFIPTVIE,
)
from .pandoratv import PandoraTVIE
from .parliamentliveuk import ParliamentLiveUKIE
from .patreon import PatreonIE
from .pbs import PBSIE
from .people import PeopleIE
from .periscope import PeriscopeIE
from .philharmoniedeparis import PhilharmonieDeParisIE
from .phoenix import PhoenixIE
from .photobucket import PhotobucketIE
from .pinkbike import PinkbikeIE
from .pladform import PladformIE
from .played import PlayedIE
from .playfm import PlayFMIE
from .plays import PlaysTVIE
from .playtvak import PlaytvakIE
from .playvid import PlayvidIE
from .playwire import PlaywireIE
from .pluralsight import (
PluralsightIE,
PluralsightCourseIE,
)
from .podomatic import PodomaticIE
from .porn91 import Porn91IE
from .pornhd import PornHdIE
from .pornhub import (
PornHubIE,
PornHubPlaylistIE,
PornHubUserVideosIE,
)
from .pornotube import PornotubeIE
from .pornovoisines import PornoVoisinesIE
from .pornoxo import PornoXOIE
from .presstv import PressTVIE
from .primesharetv import PrimeShareTVIE
from .promptfile import PromptFileIE
from .prosiebensat1 import ProSiebenSat1IE
from .puls4 import Puls4IE
from .pyvideo import PyvideoIE
from .qqmusic import (
QQMusicIE,
QQMusicSingerIE,
QQMusicAlbumIE,
QQMusicToplistIE,
QQMusicPlaylistIE,
)
from .r7 import R7IE
from .radiode import RadioDeIE
from .radiojavan import RadioJavanIE
from .radiobremen import RadioBremenIE
from .radiofrance import RadioFranceIE
from .rai import (
RaiTVIE,
RaiIE,
)
from .rbmaradio import RBMARadioIE
from .rds import RDSIE
from .redtube import RedTubeIE
from .regiotv import RegioTVIE
from .restudy import RestudyIE
from .reverbnation import ReverbNationIE
from .revision3 import Revision3IE
from .rice import RICEIE
from .ringtv import RingTVIE
from .ro220 import Ro220IE
from .rottentomatoes import RottenTomatoesIE
from .roxwel import RoxwelIE
from .rtbf import RTBFIE
from .rte import RteIE, RteRadioIE
from .rtlnl import RtlNlIE
from .rtl2 import RTL2IE
from .rtp import RTPIE
from .rts import RTSIE
from .rtve import RTVEALaCartaIE, RTVELiveIE, RTVEInfantilIE
from .rtvnh import RTVNHIE
from .ruhd import RUHDIE
from .ruleporn import RulePornIE
from .rutube import (
RutubeIE,
RutubeChannelIE,
RutubeEmbedIE,
RutubeMovieIE,
RutubePersonIE,
)
from .rutv import RUTVIE
from .ruutu import RuutuIE
from .sandia import SandiaIE
from .safari import (
SafariIE,
SafariApiIE,
SafariCourseIE,
)
from .sapo import SapoIE
from .savefrom import SaveFromIE
from .sbs import SBSIE
from .scivee import SciVeeIE
from .screencast import ScreencastIE
from .screencastomatic import ScreencastOMaticIE
from .screenjunkies import ScreenJunkiesIE
from .screenwavemedia import ScreenwaveMediaIE, TeamFourIE
from .senateisvp import SenateISVPIE
from .servingsys import ServingSysIE
from .sexu import SexuIE
from .sexykarma import SexyKarmaIE
from .shahid import ShahidIE
from .shared import SharedIE
from .sharesix import ShareSixIE
from .sina import SinaIE
from .skynewsarabia import (
SkyNewsArabiaIE,
SkyNewsArabiaArticleIE,
)
from .slideshare import SlideshareIE
from .slutload import SlutloadIE
from .smotri import (
SmotriIE,
SmotriCommunityIE,
SmotriUserIE,
SmotriBroadcastIE,
)
from .snagfilms import (
SnagFilmsIE,
SnagFilmsEmbedIE,
)
from .snotr import SnotrIE
from .sohu import SohuIE
from .soundcloud import (
SoundcloudIE,
SoundcloudSetIE,
SoundcloudUserIE,
SoundcloudPlaylistIE,
SoundcloudSearchIE
)
from .soundgasm import (
SoundgasmIE,
SoundgasmProfileIE
)
from .southpark import (
SouthParkIE,
SouthParkDeIE,
SouthParkDkIE,
SouthParkEsIE,
SouthParkNlIE
)
from .spankbang import SpankBangIE
from .spankwire import SpankwireIE
from .spiegel import SpiegelIE, SpiegelArticleIE
from .spiegeltv import SpiegeltvIE
from .spike import SpikeIE
from .stitcher import StitcherIE
from .sport5 import Sport5IE
from .sportbox import (
SportBoxIE,
SportBoxEmbedIE,
)
from .sportdeutschland import SportDeutschlandIE
from .srgssr import (
SRGSSRIE,
SRGSSRPlayIE,
)
from .srmediathek import SRMediathekIE
from .ssa import SSAIE
from .stanfordoc import StanfordOpenClassroomIE
from .steam import SteamIE
from .streamcloud import StreamcloudIE
from .streamcz import StreamCZIE
from .streetvoice import StreetVoiceIE
from .sunporno import SunPornoIE
from .svt import (
SVTIE,
SVTPlayIE,
)
from .swrmediathek import SWRMediathekIE
from .syfy import SyfyIE
from .sztvhu import SztvHuIE
from .tagesschau import TagesschauIE
from .tapely import TapelyIE
from .tass import TassIE
from .tdslifeway import TDSLifewayIE
from .teachertube import (
TeacherTubeIE,
TeacherTubeUserIE,
)
from .teachingchannel import TeachingChannelIE
from .teamcoco import TeamcocoIE
from .techtalks import TechTalksIE
from .ted import TEDIE
from .tele13 import Tele13IE
from .telebruxelles import TeleBruxellesIE
from .telecinco import TelecincoIE
from .telegraaf import TelegraafIE
from .telemb import TeleMBIE
from .teletask import TeleTaskIE
from .testurl import TestURLIE
from .tf1 import TF1IE
from .theintercept import TheInterceptIE
from .theplatform import (
ThePlatformIE,
ThePlatformFeedIE,
)
from .thescene import TheSceneIE
from .thesixtyone import TheSixtyOneIE
from .thestar import TheStarIE
from .thisamericanlife import ThisAmericanLifeIE
from .thisav import ThisAVIE
from .tinypic import TinyPicIE
from .tlc import TlcDeIE
from .tmz import (
TMZIE,
TMZArticleIE,
)
from .tnaflix import (
TNAFlixNetworkEmbedIE,
TNAFlixIE,
EMPFlixIE,
MovieFapIE,
)
from .toggle import ToggleIE
from .thvideo import (
THVideoIE,
THVideoPlaylistIE
)
from .toutv import TouTvIE
from .toypics import ToypicsUserIE, ToypicsIE
from .traileraddict import TrailerAddictIE
from .trilulilu import TriluliluIE
from .trollvids import TrollvidsIE
from .trutube import TruTubeIE
from .tube8 import Tube8IE
from .tubitv import TubiTvIE
from .tudou import (
TudouIE,
TudouPlaylistIE,
TudouAlbumIE,
)
from .tumblr import TumblrIE
from .tunein import (
TuneInClipIE,
TuneInStationIE,
TuneInProgramIE,
TuneInTopicIE,
TuneInShortenerIE,
)
from .turbo import TurboIE
from .tutv import TutvIE
from .tv2 import (
TV2IE,
TV2ArticleIE,
)
from .tv3 import TV3IE
from .tv4 import TV4IE
from .tvc import (
TVCIE,
TVCArticleIE,
)
from .tvigle import TvigleIE
from .tvland import TVLandIE
from .tvp import TvpIE, TvpSeriesIE
from .tvplay import TVPlayIE
from .tweakers import TweakersIE
from .twentyfourvideo import TwentyFourVideoIE
from .twentymin import TwentyMinutenIE
from .twentytwotracks import (
TwentyTwoTracksIE,
TwentyTwoTracksGenreIE
)
from .twitch import (
TwitchVideoIE,
TwitchChapterIE,
TwitchVodIE,
TwitchProfileIE,
TwitchPastBroadcastsIE,
TwitchBookmarksIE,
TwitchStreamIE,
)
from .twitter import (
TwitterCardIE,
TwitterIE,
TwitterAmplifyIE,
)
from .udemy import (
UdemyIE,
UdemyCourseIE
)
from .udn import UDNEmbedIE
from .digiteka import DigitekaIE
from .unistra import UnistraIE
from .urort import UrortIE
from .usatoday import USATodayIE
from .ustream import UstreamIE, UstreamChannelIE
from .ustudio import UstudioIE
from .varzesh3 import Varzesh3IE
from .vbox7 import Vbox7IE
from .veehd import VeeHDIE
from .veoh import VeohIE
from .vessel import VesselIE
from .vesti import VestiIE
from .vevo import VevoIE
from .vgtv import (
BTArticleIE,
BTVestlendingenIE,
VGTVIE,
)
from .vh1 import VH1IE
from .vice import (
ViceIE,
ViceShowIE,
)
from .viddler import ViddlerIE
from .videodetective import VideoDetectiveIE
from .videofyme import VideofyMeIE
from .videomega import VideoMegaIE
from .videomore import (
VideomoreIE,
VideomoreVideoIE,
VideomoreSeasonIE,
)
from .videopremium import VideoPremiumIE
from .videott import VideoTtIE
from .vidme import (
VidmeIE,
VidmeUserIE,
VidmeUserLikesIE,
)
from .vidzi import VidziIE
from .vier import VierIE, VierVideosIE
from .viewster import ViewsterIE
from .viidea import ViideaIE
from .vimeo import (
VimeoIE,
VimeoAlbumIE,
VimeoChannelIE,
VimeoGroupsIE,
VimeoLikesIE,
VimeoOndemandIE,
VimeoReviewIE,
VimeoUserIE,
VimeoWatchLaterIE,
)
from .vimple import VimpleIE
from .vine import (
VineIE,
VineUserIE,
)
from .viki import (
VikiIE,
VikiChannelIE,
)
from .vk import (
VKIE,
VKUserVideosIE,
)
from .vlive import VLiveIE
from .vodlocker import VodlockerIE
from .voicerepublic import VoiceRepublicIE
from .voxmedia import VoxMediaIE
from .vporn import VpornIE
from .vrt import VRTIE
from .vube import VubeIE
from .vuclip import VuClipIE
from .vulture import VultureIE
from .walla import WallaIE
from .washingtonpost import WashingtonPostIE
from .wat import WatIE
from .wdr import (
WDRIE,
WDRMobileIE,
WDRMausIE,
)
from .webofstories import (
WebOfStoriesIE,
WebOfStoriesPlaylistIE,
)
from .weibo import WeiboIE
from .weiqitv import WeiqiTVIE
from .wimp import WimpIE
from .wistia import WistiaIE
from .worldstarhiphop import WorldStarHipHopIE
from .wrzuta import WrzutaIE
from .wsj import WSJIE
from .xbef import XBefIE
from .xboxclips import XboxClipsIE
from .xfileshare import XFileShareIE
from .xhamster import (
XHamsterIE,
XHamsterEmbedIE,
)
from .xminus import XMinusIE
from .xnxx import XNXXIE
from .xstream import XstreamIE
from .xtube import XTubeUserIE, XTubeIE
from .xuite import XuiteIE
from .xvideos import XVideosIE
from .xxxymovies import XXXYMoviesIE
from .yahoo import (
YahooIE,
YahooSearchIE,
)
from .yam import YamIE
from .yandexmusic import (
YandexMusicTrackIE,
YandexMusicAlbumIE,
YandexMusicPlaylistIE,
)
from .yesjapan import YesJapanIE
from .yinyuetai import YinYueTaiIE
from .ynet import YnetIE
from .youjizz import YouJizzIE
from .youku import YoukuIE
from .youporn import YouPornIE
from .yourupload import YourUploadIE
from .youtube import (
YoutubeIE,
YoutubeChannelIE,
YoutubeFavouritesIE,
YoutubeHistoryIE,
YoutubeLiveIE,
YoutubePlaylistIE,
YoutubePlaylistsIE,
YoutubeRecommendedIE,
YoutubeSearchDateIE,
YoutubeSearchIE,
YoutubeSearchURLIE,
YoutubeShowIE,
YoutubeSubscriptionsIE,
YoutubeTruncatedIDIE,
YoutubeTruncatedURLIE,
YoutubeUserIE,
YoutubeWatchLaterIE,
)
from .zapiks import ZapiksIE
from .zdf import ZDFIE, ZDFChannelIE
from .zingmp3 import (
ZingMp3SongIE,
ZingMp3AlbumIE,
)
from .zippcast import ZippCastIE

View File

@@ -2,78 +2,133 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import int_or_none
from ..compat import compat_xpath
from ..utils import (
int_or_none,
qualities,
unified_strdate,
xpath_attr,
xpath_element,
xpath_text,
xpath_with_ns,
)
class FirstTVIE(InfoExtractor):
IE_NAME = '1tv'
IE_DESC = 'Первый канал'
_VALID_URL = r'https?://(?:www\.)?1tv\.ru/(?:[^/]+/)+(?P<id>.+)'
_VALID_URL = r'https?://(?:www\.)?1tv\.ru/(?:[^/]+/)+p?(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.1tv.ru/videoarchive/73390',
'md5': '777f525feeec4806130f4f764bc18a4f',
'info_dict': {
'id': '73390',
'ext': 'mp4',
'title': 'Олимпийские канатные дороги',
'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
'thumbnail': 're:^https?://.*\.(?:jpg|JPG)$',
'duration': 149,
'like_count': int,
'dislike_count': int,
},
'skip': 'Only works from Russia',
}, {
# single format via video_materials.json API
'url': 'http://www.1tv.ru/prj/inprivate/vypusk/35930',
'md5': 'a1b6b60d530ebcf8daacf4565762bbaf',
'md5': '82a2777648acae812d58b3f5bd42882b',
'info_dict': {
'id': '35930',
'ext': 'mp4',
'title': 'Наедине со всеми. Людмила Сенчина',
'description': 'md5:89553aed1d641416001fe8d450f06cb9',
'title': 'Гость Людмила Сенчина. Наедине со всеми. Выпуск от 12.02.2015',
'description': 'md5:357933adeede13b202c7c21f91b871b2',
'thumbnail': 're:^https?://.*\.(?:jpg|JPG)$',
'upload_date': '20150212',
'duration': 2694,
},
'skip': 'Only works from Russia',
}, {
# multiple formats via video_materials.json API
'url': 'http://www.1tv.ru/video_archive/projects/dobroeutro/p113641',
'info_dict': {
'id': '113641',
'ext': 'mp4',
'title': 'Весенняя аллергия. Доброе утро. Фрагмент выпуска от 07.04.2016',
'description': 'md5:8dcebb3dded0ff20fade39087fd1fee2',
'thumbnail': 're:^https?://.*\.(?:jpg|JPG)$',
'upload_date': '20160407',
'duration': 179,
'formats': 'mincount:3',
},
'params': {
'skip_download': True,
},
}, {
# single format only available via ONE_ONLINE_VIDEOS.archive_single_xml API
'url': 'http://www.1tv.ru/video_archive/series/f7552/p47038',
'md5': '519d306c5b5669761fd8906c39dbee23',
'info_dict': {
'id': '47038',
'ext': 'mp4',
'title': '"Побег". Второй сезон. 3 серия',
'description': 'md5:3abf8f6b9bce88201c33e9a3d794a00b',
'thumbnail': 're:^https?://.*\.(?:jpg|JPG)$',
'upload_date': '20120516',
'duration': 3080,
},
}, {
'url': 'http://www.1tv.ru/videoarchive/9967',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id, 'Downloading page')
# Videos with multiple formats only available via this API
video = self._download_json(
'http://www.1tv.ru/video_materials.json?legacy_id=%s' % video_id,
video_id, fatal=False)
video_url = self._html_search_regex(
r'''(?s)(?:jwplayer\('flashvideoportal_1'\)\.setup\({|var\s+playlistObj\s*=).*?'file'\s*:\s*'([^']+)'.*?}\);''',
webpage, 'video URL')
description, thumbnail, upload_date, duration = [None] * 4
title = self._html_search_regex(
[r'<div class="tv_translation">\s*<h1><a href="[^"]+">([^<]*)</a>',
r"'title'\s*:\s*'([^']+)'"], webpage, 'title')
description = self._html_search_regex(
r'<div class="descr">\s*<div>&nbsp;</div>\s*<p>([^<]*)</p></div>',
webpage, 'description', default=None) or self._html_search_meta(
if video:
item = video[0]
title = item['title']
quality = qualities(('ld', 'sd', 'hd', ))
formats = [{
'url': f['src'],
'format_id': f.get('name'),
'quality': quality(f.get('name')),
} for f in item['mbr'] if f.get('src')]
thumbnail = item.get('poster')
else:
# Some videos are not available via video_materials.json
video = self._download_xml(
'http://www.1tv.ru/owa/win/ONE_ONLINE_VIDEOS.archive_single_xml?pid=%s' % video_id,
video_id)
NS_MAP = {
'media': 'http://search.yahoo.com/mrss/',
}
item = xpath_element(video, './channel/item', fatal=True)
title = xpath_text(item, './title', fatal=True)
formats = [{
'url': content.attrib['url'],
} for content in item.findall(
compat_xpath(xpath_with_ns('./media:content', NS_MAP))) if content.attrib.get('url')]
thumbnail = xpath_attr(
item, xpath_with_ns('./media:thumbnail', NS_MAP), 'url')
self._sort_formats(formats)
webpage = self._download_webpage(url, video_id, 'Downloading page', fatal=False)
if webpage:
title = self._html_search_regex(
(r'<div class="tv_translation">\s*<h1><a href="[^"]+">([^<]*)</a>',
r"'title'\s*:\s*'([^']+)'"),
webpage, 'title', default=None) or title
description = self._html_search_regex(
r'<div class="descr">\s*<div>&nbsp;</div>\s*<p>([^<]*)</p></div>',
webpage, 'description', default=None) or self._html_search_meta(
'description', webpage, 'description')
thumbnail = self._og_search_thumbnail(webpage)
duration = self._og_search_property(
'video:duration', webpage,
'video duration', fatal=False)
like_count = self._html_search_regex(
r'title="Понравилось".*?/></label> \[(\d+)\]',
webpage, 'like count', default=None)
dislike_count = self._html_search_regex(
r'title="Не понравилось".*?/></label> \[(\d+)\]',
webpage, 'dislike count', default=None)
thumbnail = thumbnail or self._og_search_thumbnail(webpage)
duration = int_or_none(self._html_search_meta(
'video:duration', webpage, 'video duration', fatal=False))
upload_date = unified_strdate(self._html_search_meta(
'ya:ovs:upload_date', webpage, 'upload date', fatal=False))
return {
'id': video_id,
'url': video_url,
'thumbnail': thumbnail,
'title': title,
'description': description,
'upload_date': upload_date,
'duration': int_or_none(duration),
'like_count': int_or_none(like_count),
'dislike_count': int_or_none(dislike_count),
'formats': formats
}

View File

@@ -16,6 +16,9 @@ class FOXIE(InfoExtractor):
'title': 'Official Trailer: Gotham',
'description': 'Tracing the rise of the great DC Comics Super-Villains and vigilantes, Gotham reveals an entirely new chapter that has never been told.',
'duration': 129,
'timestamp': 1400020798,
'upload_date': '20140513',
'uploader': 'NEWA-FNG-FOXCOM',
},
'add_ie': ['ThePlatform'],
}

View File

@@ -18,8 +18,8 @@ class FoxNewsIE(AMPIE):
'title': 'Frozen in Time',
'description': '16-year-old girl is size of toddler',
'duration': 265,
# 'timestamp': 1304411491,
# 'upload_date': '20110503',
'timestamp': 1304411491,
'upload_date': '20110503',
'thumbnail': 're:^https?://.*\.jpg$',
},
},
@@ -32,8 +32,8 @@ class FoxNewsIE(AMPIE):
'title': "Rep. Luis Gutierrez on if Obama's immigration plan is legal",
'description': "Congressman discusses president's plan",
'duration': 292,
# 'timestamp': 1417662047,
# 'upload_date': '20141204',
'timestamp': 1417662047,
'upload_date': '20141204',
'thumbnail': 're:^https?://.*\.jpg$',
},
'params': {

View File

@@ -46,8 +46,8 @@ class FunnyOrDieIE(InfoExtractor):
links.sort(key=lambda link: 1 if link[1] == 'mp4' else 0)
m3u8_url = self._search_regex(
r'<source[^>]+src=(["\'])(?P<url>.+?/master\.m3u8)\1',
webpage, 'm3u8 url', default=None, group='url')
r'<source[^>]+src=(["\'])(?P<url>.+?/master\.m3u8[^"\']*)\1',
webpage, 'm3u8 url', group='url')
formats = []

View File

@@ -7,7 +7,7 @@ from .common import InfoExtractor
class GazetaIE(InfoExtractor):
_VALID_URL = r'(?P<url>https?://(?:www\.)?gazeta\.ru/(?:[^/]+/)?video/(?:(?:main|\d{4}/\d{2}/\d{2})/)?(?P<id>[A-Za-z0-9-_.]+)\.s?html)'
_VALID_URL = r'(?P<url>https?://(?:www\.)?gazeta\.ru/(?:[^/]+/)?video/(?:main/)*(?:\d{4}/\d{2}/\d{2}/)?(?P<id>[A-Za-z0-9-_.]+)\.s?html)'
_TESTS = [{
'url': 'http://www.gazeta.ru/video/main/zadaite_vopros_vladislavu_yurevichu.shtml',
'md5': 'd49c9bdc6e5a7888f27475dc215ee789',
@@ -18,9 +18,19 @@ class GazetaIE(InfoExtractor):
'description': 'md5:38617526050bd17b234728e7f9620a71',
'thumbnail': 're:^https?://.*\.jpg',
},
'skip': 'video not found',
}, {
'url': 'http://www.gazeta.ru/lifestyle/video/2015/03/08/master-klass_krasivoi_byt._delaem_vesennii_makiyazh.shtml',
'only_matching': True,
}, {
'url': 'http://www.gazeta.ru/video/main/main/2015/06/22/platit_ili_ne_platit_po_isku_yukosa.shtml',
'md5': '37f19f78355eb2f4256ee1688359f24c',
'info_dict': {
'id': '252048',
'ext': 'mp4',
'title': '"Если по иску ЮКОСа придется платить, это будет большой удар по бюджету"',
},
'add_ie': ['EaglePlatform'],
}]
def _real_extract(self, url):

View File

@@ -4,7 +4,6 @@ import re
from .common import InfoExtractor
from ..utils import (
remove_end,
HEADRequest,
sanitized_Request,
urlencode_postdata,
@@ -51,63 +50,33 @@ class GDCVaultIE(InfoExtractor):
{
'url': 'http://gdcvault.com/play/1020791/',
'only_matching': True,
}
},
{
# Hard-coded hostname
'url': 'http://gdcvault.com/play/1023460/Tenacious-Design-and-The-Interface',
'md5': 'a8efb6c31ed06ca8739294960b2dbabd',
'info_dict': {
'id': '1023460',
'ext': 'mp4',
'display_id': 'Tenacious-Design-and-The-Interface',
'title': 'Tenacious Design and The Interface of \'Destiny\'',
},
},
{
# Multiple audios
'url': 'http://www.gdcvault.com/play/1014631/Classic-Game-Postmortem-PAC',
'info_dict': {
'id': '1014631',
'ext': 'flv',
'title': 'How to Create a Good Game - From My Experience of Designing Pac-Man',
},
'params': {
'skip_download': True, # Requires rtmpdump
'format': 'jp', # The japanese audio
}
},
]
def _parse_mp4(self, xml_description):
video_formats = []
mp4_video = xml_description.find('./metadata/mp4video')
if mp4_video is None:
return None
mobj = re.match(r'(?P<root>https?://.*?/).*', mp4_video.text)
video_root = mobj.group('root')
formats = xml_description.findall('./metadata/MBRVideos/MBRVideo')
for format in formats:
mobj = re.match(r'mp4\:(?P<path>.*)', format.find('streamName').text)
url = video_root + mobj.group('path')
vbr = format.find('bitrate').text
video_formats.append({
'url': url,
'vbr': int(vbr),
})
return video_formats
def _parse_flv(self, xml_description):
formats = []
akamai_url = xml_description.find('./metadata/akamaiHost').text
audios = xml_description.find('./metadata/audios')
if audios is not None:
for audio in audios:
formats.append({
'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
'play_path': remove_end(audio.get('url'), '.flv'),
'ext': 'flv',
'vcodec': 'none',
'format_id': audio.get('code'),
})
slide_video_path = xml_description.find('./metadata/slideVideo').text
formats.append({
'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
'play_path': remove_end(slide_video_path, '.flv'),
'ext': 'flv',
'format_note': 'slide deck video',
'quality': -2,
'preference': -2,
'format_id': 'slides',
})
speaker_video_path = xml_description.find('./metadata/speakerVideo').text
formats.append({
'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
'play_path': remove_end(speaker_video_path, '.flv'),
'ext': 'flv',
'format_note': 'speaker video',
'quality': -1,
'preference': -1,
'format_id': 'speaker',
})
return formats
def _login(self, webpage_url, display_id):
(username, password) = self._get_login_info()
if username is None or password is None:
@@ -159,9 +128,10 @@ class GDCVaultIE(InfoExtractor):
'title': title,
}
PLAYER_REGEX = r'<iframe src="(?P<xml_root>.+?)/player.*?\.html.*?".*?</iframe>'
xml_root = self._html_search_regex(
r'<iframe src="(?P<xml_root>.*?)player.html.*?".*?</iframe>',
start_page, 'xml root', default=None)
PLAYER_REGEX, start_page, 'xml root', default=None)
if xml_root is None:
# Probably need to authenticate
login_res = self._login(webpage_url, display_id)
@@ -171,27 +141,21 @@ class GDCVaultIE(InfoExtractor):
start_page = login_res
# Grab the url from the authenticated page
xml_root = self._html_search_regex(
r'<iframe src="(.*?)player.html.*?".*?</iframe>',
start_page, 'xml root')
PLAYER_REGEX, start_page, 'xml root')
xml_name = self._html_search_regex(
r'<iframe src=".*?\?xml=(.+?\.xml).*?".*?</iframe>',
start_page, 'xml filename', default=None)
if xml_name is None:
# Fallback to the older format
xml_name = self._html_search_regex(r'<iframe src=".*?\?xmlURL=xml/(?P<xml_file>.+?\.xml).*?".*?</iframe>', start_page, 'xml filename')
xml_description_url = xml_root + 'xml/' + xml_name
xml_description = self._download_xml(xml_description_url, display_id)
video_title = xml_description.find('./metadata/title').text
video_formats = self._parse_mp4(xml_description)
if video_formats is None:
video_formats = self._parse_flv(xml_description)
xml_name = self._html_search_regex(
r'<iframe src=".*?\?xmlURL=xml/(?P<xml_file>.+?\.xml).*?".*?</iframe>',
start_page, 'xml filename')
return {
'_type': 'url_transparent',
'id': video_id,
'display_id': display_id,
'title': video_title,
'formats': video_formats,
'url': '%s/xml/%s' % (xml_root, xml_name),
'ie_key': 'DigitallySpeaking',
}

View File

@@ -60,6 +60,7 @@ from .googledrive import GoogleDriveIE
from .jwplatform import JWPlatformIE
from .digiteka import DigitekaIE
from .instagram import InstagramIE
from .liveleak import LiveLeakIE
class GenericIE(InfoExtractor):
@@ -104,7 +105,8 @@ class GenericIE(InfoExtractor):
'skip_download': True, # infinite live stream
},
'expected_warnings': [
r'501.*Not Implemented'
r'501.*Not Implemented',
r'400.*Bad Request',
],
},
# Direct link with incorrect MIME type
@@ -235,6 +237,7 @@ class GenericIE(InfoExtractor):
'ext': 'mp4',
'title': 'car-20120827-manifest',
'formats': 'mincount:9',
'upload_date': '20130904',
},
'params': {
'format': 'bestvideo',
@@ -406,19 +409,6 @@ class GenericIE(InfoExtractor):
'skip_download': True,
},
},
# multiple ooyala embeds on SBN network websites
{
'url': 'http://www.sbnation.com/college-football-recruiting/2015/2/3/7970291/national-signing-day-rationalizations-itll-be-ok-itll-be-ok',
'info_dict': {
'id': 'national-signing-day-rationalizations-itll-be-ok-itll-be-ok',
'title': '25 lies you will tell yourself on National Signing Day - SBNation.com',
},
'playlist_mincount': 3,
'params': {
'skip_download': True,
},
'add_ie': ['Ooyala'],
},
# embed.ly video
{
'url': 'http://www.tested.com/science/weird/460206-tested-grinding-coffee-2000-frames-second/',
@@ -607,7 +597,11 @@ class GenericIE(InfoExtractor):
'id': 'k2mm4bCdJ6CQ2i7c8o2',
'ext': 'mp4',
'title': 'Le Zap de Spi0n n°216 - Zapping du Web',
'description': 'md5:faf028e48a461b8b7fad38f1e104b119',
'uploader': 'Spi0n',
'uploader_id': 'xgditw',
'upload_date': '20140425',
'timestamp': 1398441542,
},
'add_ie': ['Dailymotion'],
},
@@ -740,8 +734,11 @@ class GenericIE(InfoExtractor):
'id': 'uxjb0lwrcz',
'ext': 'mp4',
'title': 'Conversation about Hexagonal Rails Part 1 - ThoughtWorks',
'description': 'a Martin Fowler video from ThoughtWorks',
'duration': 1715.0,
'uploader': 'thoughtworks.wistia.com',
'upload_date': '20140603',
'timestamp': 1401832161,
},
},
# Soundcloud embed
@@ -992,6 +989,9 @@ class GenericIE(InfoExtractor):
'ext': 'flv',
'title': "PFT Live: New leader in the 'new-look' defense",
'description': 'md5:65a19b4bbfb3b0c0c5768bed1dfad74e',
'uploader': 'NBCU-SPORTS',
'upload_date': '20140107',
'timestamp': 1389118457,
},
},
# UDN embed
@@ -1044,6 +1044,9 @@ class GenericIE(InfoExtractor):
'title': 'SN Presents: Russell Martin, World Citizen',
'description': 'To understand why he was the Toronto Blue Jays top off-season priority is to appreciate his background and upbringing in Montreal, where he first developed his baseball skills. Written and narrated by Stephen Brunt.',
'uploader': 'Rogers Sportsnet',
'uploader_id': '1704050871',
'upload_date': '20150525',
'timestamp': 1432570283,
},
},
# Dailymotion Cloud video
@@ -1124,7 +1127,50 @@ class GenericIE(InfoExtractor):
# m3u8 downloads
'skip_download': True,
}
}
},
# Brightcove embed, with no valid 'renditions' but valid 'IOSRenditions'
# This video can't be played in browsers if Flash disabled and UA set to iPhone, which is actually a false alarm
{
'url': 'https://dl.dropboxusercontent.com/u/29092637/interview.html',
'info_dict': {
'id': '4785848093001',
'ext': 'mp4',
'title': 'The Cardinal Pell Interview',
'description': 'Sky News Contributor Andrew Bolt interviews George Pell in Rome, following the Cardinal\'s evidence before the Royal Commission into Child Abuse. ',
'uploader': 'GlobeCast Australia - GlobeStream',
'uploader_id': '2733773828001',
'upload_date': '20160304',
'timestamp': 1457083087,
},
'params': {
# m3u8 downloads
'skip_download': True,
},
},
# Another form of arte.tv embed
{
'url': 'http://www.tv-replay.fr/redirection/09-04-16/arte-reportage-arte-11508975.html',
'md5': '850bfe45417ddf221288c88a0cffe2e2',
'info_dict': {
'id': '030273-562_PLUS7-F',
'ext': 'mp4',
'title': 'ARTE Reportage - Nulle part, en France',
'description': 'md5:e3a0e8868ed7303ed509b9e3af2b870d',
'upload_date': '20160409',
},
},
# LiveLeak embed
{
'url': 'http://www.wykop.pl/link/3088787/',
'md5': 'ace83b9ed19b21f68e1b50e844fdf95d',
'info_dict': {
'id': '874_1459135191',
'ext': 'mp4',
'title': 'Man shows poor quality of new apartment building',
'description': 'The wall is like a sand pile.',
'uploader': 'Lake8737',
}
},
]
def report_following_redirect(self, new_url):
@@ -1294,6 +1340,7 @@ class GenericIE(InfoExtractor):
'vcodec': 'none' if m.group('type') == 'audio' else None
}]
info_dict['direct'] = True
self._sort_formats(formats)
info_dict['formats'] = formats
return info_dict
@@ -1320,6 +1367,7 @@ class GenericIE(InfoExtractor):
# Is it an M3U playlist?
if first_bytes.startswith(b'#EXTM3U'):
info_dict['formats'] = self._extract_m3u8_formats(url, video_id, 'mp4')
self._sort_formats(info_dict['formats'])
return info_dict
# Maybe it's a direct link to a video?
@@ -1344,15 +1392,19 @@ class GenericIE(InfoExtractor):
if doc.tag == 'rss':
return self._extract_rss(url, video_id, doc)
elif re.match(r'^(?:{[^}]+})?smil$', doc.tag):
return self._parse_smil(doc, url, video_id)
smil = self._parse_smil(doc, url, video_id)
self._sort_formats(smil['formats'])
return smil
elif doc.tag == '{http://xspf.org/ns/0/}playlist':
return self.playlist_result(self._parse_xspf(doc, video_id), video_id)
elif re.match(r'(?i)^(?:{[^}]+})?MPD$', doc.tag):
info_dict['formats'] = self._parse_mpd_formats(
doc, video_id, mpd_base_url=url.rpartition('/')[0])
self._sort_formats(info_dict['formats'])
return info_dict
elif re.match(r'^{http://ns\.adobe\.com/f4m/[12]\.0}manifest$', doc.tag):
info_dict['formats'] = self._parse_f4m_formats(doc, url, video_id)
self._sort_formats(info_dict['formats'])
return info_dict
except compat_xml_parse_error:
pass
@@ -1693,7 +1745,7 @@ class GenericIE(InfoExtractor):
# Look for embedded arte.tv player
mobj = re.search(
r'<script [^>]*?src="(?P<url>http://www\.arte\.tv/playerv2/embed[^"]+)"',
r'<(?:script|iframe) [^>]*?src="(?P<url>http://www\.arte\.tv/(?:playerv2/embed|arte_vp/index)[^"]+)"',
webpage)
if mobj is not None:
return self.url_result(mobj.group('url'), 'ArteTVEmbed')
@@ -1921,7 +1973,13 @@ class GenericIE(InfoExtractor):
# Look for Instagram embeds
instagram_embed_url = InstagramIE._extract_embed_url(webpage)
if instagram_embed_url is not None:
return self.url_result(instagram_embed_url, InstagramIE.ie_key())
return self.url_result(
self._proto_relative_url(instagram_embed_url), InstagramIE.ie_key())
# Look for LiveLeak embeds
liveleak_url = LiveLeakIE._extract_url(webpage)
if liveleak_url:
return self.url_result(liveleak_url, 'LiveLeak')
def check_video(vurl):
if YoutubeIE.suitable(vurl):
@@ -2004,6 +2062,7 @@ class GenericIE(InfoExtractor):
entries = []
for video_url in found:
video_url = unescapeHTML(video_url)
video_url = video_url.replace('\\/', '/')
video_url = compat_urlparse.urljoin(url, video_url)
video_id = compat_urllib_parse_unquote(os.path.basename(video_url))
@@ -2037,6 +2096,9 @@ class GenericIE(InfoExtractor):
else:
entry_info_dict['url'] = video_url
if entry_info_dict.get('formats'):
self._sort_formats(entry_info_dict['formats'])
entries.append(entry_info_dict)
if len(entries) == 1:

View File

@@ -2,6 +2,7 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import unified_strdate
class GlideIE(InfoExtractor):
@@ -15,26 +16,38 @@ class GlideIE(InfoExtractor):
'ext': 'mp4',
'title': 'Damon Timm\'s Glide message',
'thumbnail': 're:^https?://.*?\.cloudfront\.net/.*\.jpg$',
'uploader': 'Damon Timm',
'upload_date': '20140919',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = self._html_search_regex(
r'<title>(.*?)</title>', webpage, 'title')
video_url = self.http_scheme() + self._search_regex(
r'<source src="(.*?)" type="video/mp4">', webpage, 'video URL')
thumbnail_url = self._search_regex(
r'<img id="video-thumbnail" src="(.*?)"',
webpage, 'thumbnail url', fatal=False)
thumbnail = (
thumbnail_url if thumbnail_url is None
else self.http_scheme() + thumbnail_url)
r'<title>(.+?)</title>', webpage, 'title')
video_url = self._proto_relative_url(self._search_regex(
r'<source[^>]+src=(["\'])(?P<url>.+?)\1',
webpage, 'video URL', default=None,
group='url')) or self._og_search_video_url(webpage)
thumbnail = self._proto_relative_url(self._search_regex(
r'<img[^>]+id=["\']video-thumbnail["\'][^>]+src=(["\'])(?P<url>.+?)\1',
webpage, 'thumbnail url', default=None,
group='url')) or self._og_search_thumbnail(webpage)
uploader = self._search_regex(
r'<div[^>]+class=["\']info-name["\'][^>]*>([^<]+)',
webpage, 'uploader', fatal=False)
upload_date = unified_strdate(self._search_regex(
r'<div[^>]+class="info-date"[^>]*>([^<]+)',
webpage, 'upload date', fatal=False))
return {
'id': video_id,
'title': title,
'url': video_url,
'thumbnail': thumbnail,
'uploader': uploader,
'upload_date': upload_date,
}

View File

@@ -14,13 +14,13 @@ class GoshgayIE(InfoExtractor):
_VALID_URL = r'https?://www\.goshgay\.com/video(?P<id>\d+?)($|/)'
_TEST = {
'url': 'http://www.goshgay.com/video299069/diesel_sfw_xxx_video',
'md5': '027fcc54459dff0feb0bc06a7aeda680',
'md5': '4b6db9a0a333142eb9f15913142b0ed1',
'info_dict': {
'id': '299069',
'ext': 'flv',
'title': 'DIESEL SFW XXX Video',
'thumbnail': 're:^http://.*\.jpg$',
'duration': 79,
'duration': 80,
'age_limit': 18,
}
}
@@ -47,5 +47,5 @@ class GoshgayIE(InfoExtractor):
'title': title,
'thumbnail': thumbnail,
'duration': duration,
'age_limit': self._family_friendly_search(webpage),
'age_limit': 18,
}

View File

@@ -2,12 +2,6 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
xpath_element,
xpath_text,
int_or_none,
parse_duration,
)
class GPUTechConfIE(InfoExtractor):
@@ -27,29 +21,15 @@ class GPUTechConfIE(InfoExtractor):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
root_path = self._search_regex(r'var\s+rootPath\s*=\s*"([^"]+)', webpage, 'root path', 'http://evt.dispeak.com/nvidia/events/gtc15/')
xml_file_id = self._search_regex(r'var\s+xmlFileId\s*=\s*"([^"]+)', webpage, 'xml file id')
doc = self._download_xml('%sxml/%s.xml' % (root_path, xml_file_id), video_id)
metadata = xpath_element(doc, 'metadata')
http_host = xpath_text(metadata, 'httpHost', 'http host', True)
mbr_videos = xpath_element(metadata, 'MBRVideos')
formats = []
for mbr_video in mbr_videos.findall('MBRVideo'):
stream_name = xpath_text(mbr_video, 'streamName')
if stream_name:
formats.append({
'url': 'http://%s/%s' % (http_host, stream_name.replace('mp4:', '')),
'tbr': int_or_none(xpath_text(mbr_video, 'bitrate')),
})
self._sort_formats(formats)
root_path = self._search_regex(
r'var\s+rootPath\s*=\s*"([^"]+)', webpage, 'root path',
default='http://evt.dispeak.com/nvidia/events/gtc15/')
xml_file_id = self._search_regex(
r'var\s+xmlFileId\s*=\s*"([^"]+)', webpage, 'xml file id')
return {
'_type': 'url_transparent',
'id': video_id,
'title': xpath_text(metadata, 'title'),
'duration': parse_duration(xpath_text(metadata, 'endTime')),
'creator': xpath_text(metadata, 'speaker'),
'formats': formats,
'url': '%sxml/%s.xml' % (root_path, xml_file_id),
'ie_key': 'DigitallySpeaking',
}

View File

@@ -16,14 +16,14 @@ class GrouponIE(InfoExtractor):
'playlist': [{
'info_dict': {
'id': 'tubGNycTo_9Uxg82uESj4i61EYX8nyuf',
'ext': 'mp4',
'ext': 'flv',
'title': 'Bikram Yoga Huntington Beach | Orange County',
'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
'duration': 44.961,
},
}],
'params': {
'skip_download': 'HLS',
'skip_download': 'HDS',
}
}
@@ -32,7 +32,7 @@ class GrouponIE(InfoExtractor):
webpage = self._download_webpage(url, playlist_id)
payload = self._parse_json(self._search_regex(
r'var\s+payload\s*=\s*(.*?);\n', webpage, 'payload'), playlist_id)
r'(?:var\s+|window\.)payload\s*=\s*(.*?);\n', webpage, 'payload'), playlist_id)
videos = payload['carousel'].get('dealVideos', [])
entries = []
for v in videos:

View File

@@ -6,6 +6,7 @@ from ..utils import (
int_or_none,
js_to_json,
unescapeHTML,
determine_ext,
)
@@ -23,6 +24,7 @@ class HowStuffWorksIE(InfoExtractor):
'thumbnail': 're:^https?://.*\.jpg$',
'duration': 161,
},
'skip': 'Video broken',
},
{
'url': 'http://adventure.howstuffworks.com/7199-survival-zone-food-and-water-in-the-savanna-video.htm',
@@ -39,7 +41,7 @@ class HowStuffWorksIE(InfoExtractor):
'url': 'http://entertainment.howstuffworks.com/arts/2706-sword-swallowing-1-by-dan-meyer-video.htm',
'info_dict': {
'id': '440011',
'ext': 'flv',
'ext': 'mp4',
'title': 'Sword Swallowing #1 by Dan Meyer',
'description': 'Video footage (1 of 3) used by permission of the owner Dan Meyer through Sword Swallowers Association International <www.swordswallow.org>',
'display_id': 'sword-swallowing-1-by-dan-meyer',
@@ -63,13 +65,19 @@ class HowStuffWorksIE(InfoExtractor):
video_id = clip_info['content_id']
formats = []
m3u8_url = clip_info.get('m3u8')
if m3u8_url:
formats += self._extract_m3u8_formats(m3u8_url, video_id, 'mp4')
if m3u8_url and determine_ext(m3u8_url) == 'm3u8':
formats.extend(self._extract_m3u8_formats(m3u8_url, video_id, 'mp4', format_id='hls', fatal=True))
flv_url = clip_info.get('flv_url')
if flv_url:
formats.append({
'url': flv_url,
'format_id': 'flv',
})
for video in clip_info.get('mp4', []):
formats.append({
'url': video['src'],
'format_id': video['bitrate'],
'vbr': int(video['bitrate'].rstrip('k')),
'format_id': 'mp4-%s' % video['bitrate'],
'vbr': int_or_none(video['bitrate'].rstrip('k')),
})
if not formats:
@@ -102,6 +110,6 @@ class HowStuffWorksIE(InfoExtractor):
'title': unescapeHTML(clip_info['clip_title']),
'description': unescapeHTML(clip_info.get('caption')),
'thumbnail': clip_info.get('video_still_url'),
'duration': clip_info.get('duration'),
'duration': int_or_none(clip_info.get('duration')),
'formats': formats,
}

View File

@@ -4,6 +4,7 @@ import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
parse_duration,
unified_strdate,
)
@@ -29,7 +30,12 @@ class HuffPostIE(InfoExtractor):
'description': 'This week on Legalese It, Mike talks to David Bosco about his new book on the ICC, "Rough Justice," he also discusses the Virginia AG\'s historic stance on gay marriage, the execution of Edgar Tamayo, the ICC\'s delay of Kenya\'s President and more. ',
'duration': 1549,
'upload_date': '20140124',
}
},
'params': {
# m3u8 download
'skip_download': True,
},
'expected_warnings': ['HTTP Error 404: Not Found'],
}
def _real_extract(self, url):
@@ -45,7 +51,7 @@ class HuffPostIE(InfoExtractor):
description = data.get('description')
thumbnails = []
for url in data['images'].values():
for url in filter(None, data['images'].values()):
m = re.match('.*-([0-9]+x[0-9]+)\.', url)
if not m:
continue
@@ -54,13 +60,25 @@ class HuffPostIE(InfoExtractor):
'resolution': m.group(1),
})
formats = [{
'format': key,
'format_id': key.replace('/', '.'),
'ext': 'mp4',
'url': url,
'vcodec': 'none' if key.startswith('audio/') else None,
} for key, url in data.get('sources', {}).get('live', {}).items()]
formats = []
sources = data.get('sources', {})
live_sources = list(sources.get('live', {}).items()) + list(sources.get('live_again', {}).items())
for key, url in live_sources:
ext = determine_ext(url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
url, video_id, ext='mp4', m3u8_id='hls', fatal=False))
elif ext == 'f4m':
formats.extend(self._extract_f4m_formats(
url + '?hdcore=2.9.5', video_id, f4m_id='hds', fatal=False))
else:
formats.append({
'format': key,
'format_id': key.replace('/', '.'),
'ext': 'mp4',
'url': url,
'vcodec': 'none' if key.startswith('audio/') else None,
})
if not formats and data.get('fivemin_id'):
return self.url_result('5min:%s' % data['fivemin_id'])

View File

@@ -12,7 +12,7 @@ from ..utils import (
class InstagramIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?instagram\.com/p/(?P<id>[^/?#&]+)'
_VALID_URL = r'(?P<url>https?://(?:www\.)?instagram\.com/p/(?P<id>[^/?#&]+))'
_TESTS = [{
'url': 'https://instagram.com/p/aye83DjauH/?foo=bar#abc',
'md5': '0d2da106a9d2631273e192b372806516',
@@ -38,10 +38,19 @@ class InstagramIE(InfoExtractor):
}, {
'url': 'https://instagram.com/p/-Cmh1cukG2/',
'only_matching': True,
}, {
'url': 'http://instagram.com/p/9o6LshA7zy/embed/',
'only_matching': True,
}]
@staticmethod
def _extract_embed_url(webpage):
mobj = re.search(
r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?instagram\.com/p/[^/]+/embed.*?)\1',
webpage)
if mobj:
return mobj.group('url')
blockquote_el = get_element_by_attribute(
'class', 'instagram-media', webpage)
if blockquote_el is None:
@@ -53,7 +62,9 @@ class InstagramIE(InfoExtractor):
return mobj.group('link')
def _real_extract(self, url):
video_id = self._match_id(url)
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
url = mobj.group('url')
webpage = self._download_webpage(url, video_id)
uploader_id = self._search_regex(r'"owner":{"username":"(.+?)"',
@@ -152,7 +163,7 @@ class InstagramUserIE(InfoExtractor):
if not page['items']:
break
max_id = page['items'][-1]['id']
max_id = page['items'][-1]['id'].split('_')[0]
media_url = (
'http://instagram.com/%s/media?max_id=%s' % (
uploader_id, max_id))

View File

@@ -1,93 +1,91 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_urlparse,
compat_urllib_parse_urlencode,
)
from ..utils import (
xpath_with_ns,
determine_ext,
int_or_none,
xpath_text,
)
class InternetVideoArchiveIE(InfoExtractor):
_VALID_URL = r'https?://video\.internetvideoarchive\.net/flash/players/.*?\?.*?publishedid.*?'
_VALID_URL = r'https?://video\.internetvideoarchive\.net/(?:player|flash/players)/.*?\?.*?publishedid.*?'
_TEST = {
'url': 'http://video.internetvideoarchive.net/flash/players/flashconfiguration.aspx?customerid=69249&publishedid=452693&playerid=247',
'url': 'http://video.internetvideoarchive.net/player/6/configuration.ashx?customerid=69249&publishedid=194487&reporttag=vdbetatitle&playerid=641&autolist=0&domain=www.videodetective.com&maxrate=high&minrate=low&socialplayer=false',
'info_dict': {
'id': '452693',
'id': '194487',
'ext': 'mp4',
'title': 'SKYFALL',
'description': 'In SKYFALL, Bond\'s loyalty to M is tested as her past comes back to haunt her. As MI6 comes under attack, 007 must track down and destroy the threat, no matter how personal the cost.',
'duration': 152,
'title': 'KICK-ASS 2',
'description': 'md5:c189d5b7280400630a1d3dd17eaa8d8a',
},
'params': {
# m3u8 download
'skip_download': True,
},
}
@staticmethod
def _build_url(query):
return 'http://video.internetvideoarchive.net/flash/players/flashconfiguration.aspx?' + query
def _build_json_url(query):
return 'http://video.internetvideoarchive.net/player/6/configuration.ashx?' + query
@staticmethod
def _clean_query(query):
NEEDED_ARGS = ['publishedid', 'customerid']
query_dic = compat_urlparse.parse_qs(query)
cleaned_dic = dict((k, v[0]) for (k, v) in query_dic.items() if k in NEEDED_ARGS)
# Other player ids return m3u8 urls
cleaned_dic['playerid'] = '247'
cleaned_dic['videokbrate'] = '100000'
return compat_urllib_parse_urlencode(cleaned_dic)
def _build_xml_url(query):
return 'http://video.internetvideoarchive.net/flash/players/flashconfiguration.aspx?' + query
def _real_extract(self, url):
query = compat_urlparse.urlparse(url).query
query_dic = compat_urlparse.parse_qs(query)
query_dic = compat_parse_qs(query)
video_id = query_dic['publishedid'][0]
url = self._build_url(query)
flashconfiguration = self._download_xml(url, video_id,
'Downloading flash configuration')
file_url = flashconfiguration.find('file').text
file_url = file_url.replace('/playlist.aspx', '/mrssplaylist.aspx')
# Replace some of the parameters in the query to get the best quality
# and http links (no m3u8 manifests)
file_url = re.sub(r'(?<=\?)(.+)$',
lambda m: self._clean_query(m.group()),
file_url)
info = self._download_xml(file_url, video_id,
'Downloading video info')
item = info.find('channel/item')
if '/player/' in url:
configuration = self._download_json(url, video_id)
def _bp(p):
return xpath_with_ns(
p,
{
'media': 'http://search.yahoo.com/mrss/',
'jwplayer': 'http://developer.longtailvideo.com/trac/wiki/FlashFormats',
}
)
formats = []
for content in item.findall(_bp('media:group/media:content')):
attr = content.attrib
f_url = attr['url']
width = int(attr['width'])
bitrate = int(attr['bitrate'])
format_id = '%d-%dk' % (width, bitrate)
formats.append({
'format_id': format_id,
'url': f_url,
'width': width,
'tbr': bitrate,
})
# There are multiple videos in the playlist whlie only the first one
# matches the video played in browsers
video_info = configuration['playlist'][0]
self._sort_formats(formats)
formats = []
for source in video_info['sources']:
file_url = source['file']
if determine_ext(file_url) == 'm3u8':
formats.extend(self._extract_m3u8_formats(
file_url, video_id, ext='mp4', m3u8_id='hls'))
else:
a_format = {
'url': file_url,
}
if source.get('label') and source['label'][-4:] == ' kbs':
tbr = int_or_none(source['label'][:-4])
a_format.update({
'tbr': tbr,
'format_id': 'http-%d' % tbr,
})
formats.append(a_format)
self._sort_formats(formats)
title = video_info['title']
description = video_info.get('description')
thumbnail = video_info.get('image')
else:
configuration = self._download_xml(url, video_id)
formats = [{
'url': xpath_text(configuration, './file', 'file URL', fatal=True),
}]
thumbnail = xpath_text(configuration, './image', 'thumbnail')
title = 'InternetVideoArchive video %s' % video_id
description = None
return {
'id': video_id,
'title': item.find('title').text,
'title': title,
'formats': formats,
'thumbnail': item.find(_bp('media:thumbnail')).attrib['url'],
'description': item.find('description').text,
'duration': int(attr['duration']),
'thumbnail': thumbnail,
'description': description,
}

View File

@@ -165,7 +165,7 @@ class IqiyiIE(InfoExtractor):
IE_NAME = 'iqiyi'
IE_DESC = '爱奇艺'
_VALID_URL = r'https?://(?:[^.]+\.)?iqiyi\.com/.+\.html'
_VALID_URL = r'https?://(?:(?:[^.]+\.)?iqiyi\.com|www\.pps\.tv)/.+\.html'
_NETRC_MACHINE = 'iqiyi'
@@ -273,6 +273,9 @@ class IqiyiIE(InfoExtractor):
'title': '灌篮高手 国语版',
},
'playlist_count': 101,
}, {
'url': 'http://www.pps.tv/w_19rrbav0ph.html',
'only_matching': True,
}]
_FORMATS_MAP = [
@@ -284,6 +287,13 @@ class IqiyiIE(InfoExtractor):
('10', 'h1'),
]
AUTH_API_ERRORS = {
# No preview available (不允许试看鉴权失败)
'Q00505': 'This video requires a VIP account',
# End of preview time (试看结束鉴权失败)
'Q00506': 'Needs a VIP account for full video',
}
def _real_initialize(self):
self._login()
@@ -368,12 +378,19 @@ class IqiyiIE(InfoExtractor):
auth_req, video_id,
note='Downloading video authentication JSON',
errnote='Unable to download video authentication JSON')
if auth_result['code'] == 'Q00506': # requires a VIP account
if do_report_warning:
self.report_warning('Needs a VIP account for full video')
return False
return auth_result
code = auth_result.get('code')
msg = self.AUTH_API_ERRORS.get(code) or auth_result.get('msg') or code
if code == 'Q00506':
if do_report_warning:
self.report_warning(msg)
return False
if 'data' not in auth_result:
if msg is not None:
raise ExtractorError('%s said: %s' % (self.IE_NAME, msg), expected=True)
raise ExtractorError('Unexpected error from Iqiyi auth API')
return auth_result['data']
def construct_video_urls(self, data, video_id, _uuid, tvid):
def do_xor(x, y):
@@ -449,11 +466,11 @@ class IqiyiIE(InfoExtractor):
need_vip_warning_report = False
break
param.update({
't': auth_result['data']['t'],
't': auth_result['t'],
# cid is hard-coded in com/qiyi/player/core/player/RuntimeData.as
'cid': 'afbe8fd3d73448c9',
'vid': video_id,
'QY00001': auth_result['data']['u'],
'QY00001': auth_result['u'],
})
api_video_url += '?' if '?' not in api_video_url else '&'
api_video_url += compat_urllib_parse_urlencode(param)

View File

@@ -29,7 +29,7 @@ class IzleseneIE(InfoExtractor):
'ext': 'mp4',
'title': 'Sevinçten Çıldırtan Doğum Günü Hediyesi',
'description': 'md5:253753e2655dde93f59f74b572454f6d',
'thumbnail': 're:^http://.*\.jpg',
'thumbnail': 're:^https?://.*\.jpg',
'uploader_id': 'pelikzzle',
'timestamp': int,
'upload_date': '20140702',
@@ -44,8 +44,7 @@ class IzleseneIE(InfoExtractor):
'id': '17997',
'ext': 'mp4',
'title': 'Tarkan Dortmund 2006 Konseri',
'description': 'Tarkan Dortmund 2006 Konseri',
'thumbnail': 're:^http://.*\.jpg',
'thumbnail': 're:^https://.*\.jpg',
'uploader_id': 'parlayankiz',
'timestamp': int,
'upload_date': '20061112',
@@ -62,7 +61,7 @@ class IzleseneIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
title = self._og_search_title(webpage)
description = self._og_search_description(webpage)
description = self._og_search_description(webpage, default=None)
thumbnail = self._proto_relative_url(
self._og_search_thumbnail(webpage), scheme='http:')

View File

@@ -1,47 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from .youtube import YoutubeIE
class JadoreCettePubIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?jadorecettepub\.com/[0-9]{4}/[0-9]{2}/(?P<id>.*?)\.html'
_TEST = {
'url': 'http://www.jadorecettepub.com/2010/12/star-wars-massacre-par-les-japonais.html',
'md5': '401286a06067c70b44076044b66515de',
'info_dict': {
'id': 'jLMja3tr7a4',
'ext': 'mp4',
'title': 'La pire utilisation de Star Wars',
'description': "Jadorecettepub.com vous a gratifié de plusieurs pubs géniales utilisant Star Wars et Dark Vador plus particulièrement... Mais l'heure est venue de vous proposer une version totalement massacrée, venue du Japon. Quand les Japonais détruisent l'image de Star Wars pour vendre du thon en boite, ça promet...",
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('id')
webpage = self._download_webpage(url, display_id)
title = self._html_search_regex(
r'<span style="font-size: x-large;"><b>(.*?)</b></span>',
webpage, 'title')
description = self._html_search_regex(
r'(?s)<div id="fb-root">(.*?)<script>', webpage, 'description',
fatal=False)
real_url = self._search_regex(
r'\[/postlink\](.*)endofvid', webpage, 'video URL')
video_id = YoutubeIE.extract_id(real_url)
return {
'_type': 'url_transparent',
'url': real_url,
'id': video_id,
'title': title,
'description': description,
}

View File

@@ -4,16 +4,15 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import int_or_none
from ..utils import (
float_or_none,
int_or_none,
)
class JWPlatformBaseIE(InfoExtractor):
def _parse_jwplayer_data(self, jwplayer_data, video_id, require_title=True):
video_data = jwplayer_data['playlist'][0]
subtitles = {}
for track in video_data['tracks']:
if track['kind'] == 'captions':
subtitles[track['label']] = [{'url': self._proto_relative_url(track['file'])}]
formats = []
for source in video_data['sources']:
@@ -35,12 +34,22 @@ class JWPlatformBaseIE(InfoExtractor):
})
self._sort_formats(formats)
subtitles = {}
tracks = video_data.get('tracks')
if tracks and isinstance(tracks, list):
for track in tracks:
if track.get('file') and track.get('kind') == 'captions':
subtitles.setdefault(track.get('label') or 'en', []).append({
'url': self._proto_relative_url(track['file'])
})
return {
'id': video_id,
'title': video_data['title'] if require_title else video_data.get('title'),
'description': video_data.get('description'),
'thumbnail': self._proto_relative_url(video_data.get('image')),
'timestamp': int_or_none(video_data.get('pubdate')),
'duration': float_or_none(jwplayer_data.get('duration')),
'subtitles': subtitles,
'formats': formats,
}

View File

@@ -2,39 +2,63 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_urllib_parse_unquote_plus
from ..utils import (
js_to_json,
)
class KaraoketvIE(InfoExtractor):
_VALID_URL = r'https?://karaoketv\.co\.il/\?container=songs&id=(?P<id>[0-9]+)'
_VALID_URL = r'http://www.karaoketv.co.il/[^/]+/(?P<id>\d+)'
_TEST = {
'url': 'http://karaoketv.co.il/?container=songs&id=171568',
'url': 'http://www.karaoketv.co.il/%D7%A9%D7%99%D7%A8%D7%99_%D7%A7%D7%A8%D7%99%D7%95%D7%A7%D7%99/58356/%D7%90%D7%99%D7%96%D7%95%D7%9F',
'info_dict': {
'id': '171568',
'ext': 'mp4',
'title': 'אל העולם שלך - רותם כהן - שרים קריוקי',
'id': '58356',
'ext': 'flv',
'title': 'קריוקי של איזון',
},
'params': {
# rtmp download
'skip_download': True,
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
api_page_url = self._search_regex(
r'<iframe[^>]+src=(["\'])(?P<url>https?://www\.karaoke\.co\.il/api_play\.php\?.+?)\1',
webpage, 'API play URL', group='url')
page_video_url = self._og_search_video_url(webpage, video_id)
config_json = compat_urllib_parse_unquote_plus(self._search_regex(
r'config=(.*)', page_video_url, 'configuration'))
api_page = self._download_webpage(api_page_url, video_id)
video_cdn_url = self._search_regex(
r'<iframe[^>]+src=(["\'])(?P<url>https?://www\.video-cdn\.com/embed/iframe/.+?)\1',
api_page, 'video cdn URL', group='url')
urls_info_json = self._download_json(
config_json, video_id, 'Downloading configuration',
transform_source=js_to_json)
video_cdn = self._download_webpage(video_cdn_url, video_id)
play_path = self._parse_json(
self._search_regex(
r'var\s+options\s*=\s*({.+?});', video_cdn, 'options'),
video_id)['clip']['url']
url = urls_info_json['playlist'][0]['url']
settings = self._parse_json(
self._search_regex(
r'var\s+settings\s*=\s*({.+?});', video_cdn, 'servers', default='{}'),
video_id, fatal=False) or {}
servers = settings.get('servers')
if not servers or not isinstance(servers, list):
servers = ('wowzail.video-cdn.com:80/vodcdn', )
formats = [{
'url': 'rtmp://%s' % server if not server.startswith('rtmp') else server,
'play_path': play_path,
'app': 'vodcdn',
'page_url': video_cdn_url,
'player_url': 'http://www.video-cdn.com/assets/flowplayer/flowplayer.commercial-3.2.18.swf',
'rtmp_real_time': True,
'ext': 'flv',
} for server in servers]
return {
'id': video_id,
'title': self._og_search_title(webpage),
'url': url,
'formats': formats,
}

View File

@@ -52,9 +52,12 @@ class KarriereVideosIE(InfoExtractor):
video_id = self._search_regex(
r'/config/video/(.+?)\.xml', webpage, 'video id')
# Server returns malformed headers
# Force Accept-Encoding: * to prevent gzipped results
playlist = self._download_xml(
'http://www.karrierevideos.at/player-playlist.xml.php?p=%s' % video_id,
video_id, transform_source=fix_xml_ampersands)
video_id, transform_source=fix_xml_ampersands,
headers={'Accept-Encoding': '*'})
NS_MAP = {
'jwplayer': 'http://developer.longtailvideo.com/trac/wiki/FlashFormats'

View File

@@ -26,10 +26,23 @@ class KuwoBaseIE(InfoExtractor):
def _get_formats(self, song_id, tolerate_ip_deny=False):
formats = []
for file_format in self._FORMATS:
headers = {}
cn_verification_proxy = self._downloader.params.get('cn_verification_proxy')
if cn_verification_proxy:
headers['Ytdl-request-proxy'] = cn_verification_proxy
query = {
'format': file_format['ext'],
'br': file_format.get('br', ''),
'rid': 'MUSIC_%s' % song_id,
'type': 'convert_url',
'response': 'url'
}
song_url = self._download_webpage(
'http://antiserver.kuwo.cn/anti.s?format=%s&br=%s&rid=MUSIC_%s&type=convert_url&response=url' %
(file_format['ext'], file_format.get('br', ''), song_id),
'http://antiserver.kuwo.cn/anti.s',
song_id, note='Download %s url info' % file_format['format'],
query=query, headers=headers,
)
if song_url == 'IPDeny' and not tolerate_ip_deny:
@@ -44,18 +57,13 @@ class KuwoBaseIE(InfoExtractor):
'abr': file_format.get('abr'),
})
# XXX _sort_formats fails if there are not formats, while it's not the
# desired behavior if 'IPDeny' is ignored
# This check can be removed if https://github.com/rg3/youtube-dl/pull/8051 is merged
if not tolerate_ip_deny:
self._sort_formats(formats)
return formats
class KuwoIE(KuwoBaseIE):
IE_NAME = 'kuwo:song'
IE_DESC = '酷我音乐'
_VALID_URL = r'https?://www\.kuwo\.cn/yinyue/(?P<id>\d+?)'
_VALID_URL = r'https?://www\.kuwo\.cn/yinyue/(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.kuwo.cn/yinyue/635632/',
'info_dict': {
@@ -73,7 +81,7 @@ class KuwoIE(KuwoBaseIE):
'id': '6446136',
'ext': 'mp3',
'title': '',
'description': 'md5:b2ab6295d014005bfc607525bfc1e38a',
'description': 'md5:5d0e947b242c35dc0eb1d2fce9fbf02c',
'creator': 'IU',
'upload_date': '20150518',
},
@@ -94,18 +102,19 @@ class KuwoIE(KuwoBaseIE):
raise ExtractorError('this song has been offline because of copyright issues', expected=True)
song_name = self._html_search_regex(
r'(?s)class="(?:[^"\s]+\s+)*title(?:\s+[^"\s]+)*".*?<h1[^>]+title="([^"]+)"', webpage, 'song name')
singer_name = self._html_search_regex(
r'<div[^>]+class="s_img">\s*<a[^>]+title="([^>]+)"',
webpage, 'singer name', fatal=False)
r'<p[^>]+id="lrcName">([^<]+)</p>', webpage, 'song name')
singer_name = remove_start(self._html_search_regex(
r'<a[^>]+href="http://www\.kuwo\.cn/artist/content\?name=([^"]+)">',
webpage, 'singer name', fatal=False), '歌手')
lrc_content = clean_html(get_element_by_id('lrcContent', webpage))
if lrc_content == '暂无': # indicates no lyrics
lrc_content = None
formats = self._get_formats(song_id)
self._sort_formats(formats)
album_id = self._html_search_regex(
r'<p[^>]+class="album"[^<]+<a[^>]+href="http://www\.kuwo\.cn/album/(\d+)/"',
r'<a[^>]+href="http://www\.kuwo\.cn/album/(\d+)/"',
webpage, 'album id', fatal=False)
publish_time = None
@@ -259,7 +268,7 @@ class KuwoCategoryIE(InfoExtractor):
'title': '八十年代精选',
'description': '这些都是属于八十年代的回忆!',
},
'playlist_count': 30,
'playlist_mincount': 24,
}
def _real_extract(self, url):

View File

@@ -63,6 +63,7 @@ class Laola1TvIE(InfoExtractor):
'params': {
'skip_download': True,
},
'skip': 'This live stream has already finished.',
}]
def _real_extract(self, url):
@@ -74,6 +75,9 @@ class Laola1TvIE(InfoExtractor):
webpage = self._download_webpage(url, display_id)
if 'Dieser Livestream ist bereits beendet.' in webpage:
raise ExtractorError('This live stream has already finished.', expected=True)
iframe_url = self._search_regex(
r'<iframe[^>]*?id="videoplayer"[^>]*?src="([^"]+)"',
webpage, 'iframe url')
@@ -130,6 +134,7 @@ class Laola1TvIE(InfoExtractor):
formats = self._extract_f4m_formats(
'%s?hdnea=%s&hdcore=3.2.0' % (token_attrib['url'], token_auth),
video_id, f4m_id='hds')
self._sort_formats(formats)
categories_str = _v('meta_sports')
categories = categories_str.split(',') if categories_str else []

View File

@@ -6,6 +6,7 @@ import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
determine_protocol,
parse_duration,
int_or_none,
)
@@ -18,10 +19,14 @@ class Lecture2GoIE(InfoExtractor):
'md5': 'ac02b570883020d208d405d5a3fd2f7f',
'info_dict': {
'id': '17473',
'ext': 'flv',
'ext': 'mp4',
'title': '2 - Endliche Automaten und reguläre Sprachen',
'creator': 'Frank Heitmann',
'duration': 5220,
},
'params': {
# m3u8 download
'skip_download': True,
}
}
@@ -32,14 +37,18 @@ class Lecture2GoIE(InfoExtractor):
title = self._html_search_regex(r'<em[^>]+class="title">(.+)</em>', webpage, 'title')
formats = []
for url in set(re.findall(r'"src","([^"]+)"', webpage)):
for url in set(re.findall(r'var\s+playerUri\d+\s*=\s*"([^"]+)"', webpage)):
ext = determine_ext(url)
protocol = determine_protocol({'url': url})
if ext == 'f4m':
formats.extend(self._extract_f4m_formats(url, video_id))
formats.extend(self._extract_f4m_formats(url, video_id, f4m_id='hds'))
elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(url, video_id))
formats.extend(self._extract_m3u8_formats(url, video_id, ext='mp4', m3u8_id='hls'))
else:
if protocol == 'rtmp':
continue # XXX: currently broken
formats.append({
'format_id': protocol,
'url': url,
})

View File

@@ -53,6 +53,14 @@ class LiveLeakIE(InfoExtractor):
}
}]
@staticmethod
def _extract_url(webpage):
mobj = re.search(
r'<iframe[^>]+src="https?://(?:\w+\.)?liveleak\.com/ll_embed\?(?:.*?)i=(?P<id>[\w_]+)(?:.*)',
webpage)
if mobj:
return 'http://www.liveleak.com/view?i=%s' % mobj.group('id')
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)

View File

@@ -37,6 +37,7 @@ class LRTIE(InfoExtractor):
r'file\s*:\s*(["\'])(?P<url>.+?)\1\s*\+\s*location\.hash\.substring\(1\)',
webpage, 'm3u8 url', group='url')
formats = self._extract_m3u8_formats(m3u8_url, video_id, 'mp4')
self._sort_formats(formats)
thumbnail = self._og_search_thumbnail(webpage)
description = self._og_search_description(webpage)

View File

@@ -28,8 +28,8 @@ class LyndaBaseIE(InfoExtractor):
return
login_form = {
'username': username.encode('utf-8'),
'password': password.encode('utf-8'),
'username': username,
'password': password,
'remember': 'false',
'stayPut': 'false'
}
@@ -219,7 +219,7 @@ class LyndaCourseIE(LyndaBaseIE):
'Course %s does not exist' % course_id, expected=True)
unaccessible_videos = 0
videos = []
entries = []
# Might want to extract videos right here from video['Formats'] as it seems 'Formats' is not provided
# by single video API anymore
@@ -229,20 +229,22 @@ class LyndaCourseIE(LyndaBaseIE):
if video.get('HasAccess') is False:
unaccessible_videos += 1
continue
if video.get('ID'):
videos.append(video['ID'])
video_id = video.get('ID')
if video_id:
entries.append({
'_type': 'url_transparent',
'url': 'http://www.lynda.com/%s/%s-4.html' % (course_path, video_id),
'ie_key': LyndaIE.ie_key(),
'chapter': chapter.get('Title'),
'chapter_number': int_or_none(chapter.get('ChapterIndex')),
'chapter_id': compat_str(chapter.get('ID')),
})
if unaccessible_videos > 0:
self._downloader.report_warning(
'%s videos are only available for members (or paid members) and will not be downloaded. '
% unaccessible_videos + self._ACCOUNT_CREDENTIALS_HINT)
entries = [
self.url_result(
'http://www.lynda.com/%s/%s-4.html' % (course_path, video_id),
'Lynda')
for video_id in videos]
course_title = course.get('Title')
return self.playlist_result(entries, course_id, course_title)

View File

@@ -13,7 +13,7 @@ from ..utils import (
class MailRuIE(InfoExtractor):
IE_NAME = 'mailru'
IE_DESC = 'Видео@Mail.Ru'
_VALID_URL = r'https?://(?:www\.)?my\.mail\.ru/(?:video/.*#video=/?(?P<idv1>(?:[^/]+/){3}\d+)|(?:(?P<idv2prefix>(?:[^/]+/){2})video/(?P<idv2suffix>[^/]+/\d+))\.html)'
_VALID_URL = r'https?://(?:(?:www|m)\.)?my\.mail\.ru/(?:video/.*#video=/?(?P<idv1>(?:[^/]+/){3}\d+)|(?:(?P<idv2prefix>(?:[^/]+/){2})video/(?P<idv2suffix>[^/]+/\d+))\.html)'
_TESTS = [
{
@@ -61,6 +61,10 @@ class MailRuIE(InfoExtractor):
'duration': 6001,
},
'skip': 'Not accessible from Travis CI server',
},
{
'url': 'http://m.my.mail.ru/mail/3sktvtr/video/_myvideo/138.html',
'only_matching': True,
}
]

View File

@@ -47,6 +47,7 @@ class MatchTVIE(InfoExtractor):
video_url = self._download_json(request, video_id)['data']['videoUrl']
f4m_url = xpath_text(self._download_xml(video_url, video_id), './to')
formats = self._extract_f4m_formats(f4m_url, video_id)
self._sort_formats(formats)
return {
'id': video_id,
'title': self._live_title('Матч ТВ - Прямой эфир'),

View File

@@ -49,8 +49,8 @@ class MDRIE(InfoExtractor):
'ext': 'mp4',
'title': 'Beutolomäus und der geheime Weihnachtswunsch',
'description': 'md5:b69d32d7b2c55cbe86945ab309d39bbd',
'timestamp': 1419047100,
'upload_date': '20141220',
'timestamp': 1450950000,
'upload_date': '20151224',
'duration': 4628,
'uploader': 'KIKA',
},
@@ -71,8 +71,8 @@ class MDRIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
data_url = self._search_regex(
r'(?:dataURL|playerXml(?:["\'])?)\s*:\s*(["\'])(?P<url>\\?/.+/(?:video|audio)-?[0-9]+-avCustom\.xml)\1',
webpage, 'data url', default=None, group='url').replace('\/', '/')
r'(?:dataURL|playerXml(?:["\'])?)\s*:\s*(["\'])(?P<url>.+/(?:video|audio)-?[0-9]+-avCustom\.xml)\1',
webpage, 'data url', group='url').replace('\/', '/')
doc = self._download_xml(
compat_urlparse.urljoin(url, data_url), video_id)

View File

@@ -81,6 +81,9 @@ class MetacafeIE(InfoExtractor):
'title': 'Open: This is Face the Nation, February 9',
'description': 'md5:8a9ceec26d1f7ed6eab610834cc1a476',
'duration': 96,
'uploader': 'CBSI-NEW',
'upload_date': '20140209',
'timestamp': 1391959800,
},
'params': {
# rtmp download

View File

@@ -11,7 +11,7 @@ from ..utils import (
class MetacriticIE(InfoExtractor):
_VALID_URL = r'https?://www\.metacritic\.com/.+?/trailers/(?P<id>\d+)'
_TEST = {
_TESTS = [{
'url': 'http://www.metacritic.com/game/playstation-4/infamous-second-son/trailers/3698222',
'info_dict': {
'id': '3698222',
@@ -20,7 +20,17 @@ class MetacriticIE(InfoExtractor):
'description': 'Take a peak behind-the-scenes to see how Sucker Punch brings smoke into the universe of inFAMOUS Second Son on the PS4.',
'duration': 221,
},
}
'skip': 'Not providing trailers anymore',
}, {
'url': 'http://www.metacritic.com/game/playstation-4/tales-from-the-borderlands-a-telltale-game-series/trailers/5740315',
'info_dict': {
'id': '5740315',
'ext': 'mp4',
'title': 'Tales from the Borderlands - Finale: The Vault of the Traveler',
'description': 'In the final episode of the season, all hell breaks loose. Jack is now in control of Helios\' systems, and he\'s ready to reclaim his rightful place as king of Hyperion (with or without you).',
'duration': 114,
},
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)

View File

@@ -0,0 +1,63 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import int_or_none
class MGTVIE(InfoExtractor):
_VALID_URL = r'https?://www\.mgtv\.com/v/(?:[^/]+/)*(?P<id>\d+)\.html'
IE_DESC = '芒果TV'
_TEST = {
'url': 'http://www.mgtv.com/v/1/290525/f/3116640.html',
'md5': '',
'info_dict': {
'id': '3116640',
'ext': 'mp4',
'title': '我是歌手第四季双年巅峰会:韩红李玟“双王”领军对抗',
'description': '我是歌手第四季双年巅峰会',
'duration': 7461,
'thumbnail': 're:^https?://.*\.jpg$',
},
'params': {
'skip_download': True, # m3u8 download
},
}
_FORMAT_MAP = {
'标清': ('Standard', 0),
'高清': ('High', 1),
'超清': ('SuperHigh', 2),
}
def _real_extract(self, url):
video_id = self._match_id(url)
api_data = self._download_json(
'http://v.api.mgtv.com/player/video', video_id,
query={'video_id': video_id})['data']
info = api_data['info']
formats = []
for idx, stream in enumerate(api_data['stream']):
format_name = stream.get('name')
format_id, preference = self._FORMAT_MAP.get(format_name, (None, None))
format_info = self._download_json(
stream['url'], video_id,
note='Download video info for format %s' % format_id or '#%d' % idx)
formats.append({
'format_id': format_id,
'url': format_info['info'],
'ext': 'mp4', # These are m3u8 playlists
'preference': preference,
})
self._sort_formats(formats)
return {
'id': video_id,
'title': info['title'].strip(),
'formats': formats,
'description': info.get('desc'),
'duration': int_or_none(info.get('duration')),
'thumbnail': info.get('thumb'),
}

View File

@@ -1,8 +1,5 @@
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
@@ -20,21 +17,28 @@ class MinistryGridIE(InfoExtractor):
'id': '3453494717001',
'ext': 'mp4',
'title': 'The Gospel by Numbers',
'thumbnail': 're:^https?://.*\.jpg',
'upload_date': '20140410',
'description': 'Coming soon from T4G 2014!',
'uploader': 'LifeWay Christian Resources (MG)',
'uploader_id': '2034960640001',
'timestamp': 1397145591,
},
'params': {
# m3u8 download
'skip_download': True,
},
'add_ie': ['TDSLifeway'],
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
portlets_json = self._search_regex(
r'Liferay\.Portlet\.list=(\[.+?\])', webpage, 'portlet list')
portlets = json.loads(portlets_json)
portlets = self._parse_json(self._search_regex(
r'Liferay\.Portlet\.list=(\[.+?\])', webpage, 'portlet list'),
video_id)
pl_id = self._search_regex(
r'<!--\s*p_l_id - ([0-9]+)<br>', webpage, 'p_l_id')
r'getPlid:function\(\){return"(\d+)"}', webpage, 'p_l_id')
for i, portlet in enumerate(portlets):
portlet_url = 'http://www.ministrygrid.com/c/portal/render_portlet?p_l_id=%s&p_p_id=%s' % (pl_id, portlet)
@@ -46,12 +50,8 @@ class MinistryGridIE(InfoExtractor):
r'<iframe.*?src="([^"]+)"', portlet_code, 'video iframe',
default=None)
if video_iframe_url:
surl = smuggle_url(
video_iframe_url, {'force_videoid': video_id})
return {
'_type': 'url',
'id': video_id,
'url': surl,
}
return self.url_result(
smuggle_url(video_iframe_url, {'force_videoid': video_id}),
video_id=video_id)
raise ExtractorError('Could not find video iframe in any portlets')

View File

@@ -67,6 +67,7 @@ class MiTeleIE(InfoExtractor):
formats.extend(self._extract_f4m_formats(
file_ + '&hdcore=3.2.0&plugin=aasp-3.2.0.77.18',
display_id, f4m_id=loc))
self._sort_formats(formats)
title = self._search_regex(
r'class="Destacado-text"[^>]*>\s*<strong>([^<]+)</strong>', webpage, 'title')

View File

@@ -1,26 +1,35 @@
from __future__ import unicode_literals
import base64
import functools
import itertools
import re
from .common import InfoExtractor
from ..compat import compat_urllib_parse_unquote
from ..compat import (
compat_chr,
compat_ord,
compat_urllib_parse_unquote,
compat_urlparse,
)
from ..utils import (
clean_html,
ExtractorError,
HEADRequest,
OnDemandPagedList,
parse_count,
str_to_int,
)
class MixcloudIE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?(?:www\.)?mixcloud\.com/([^/]+)/([^/]+)'
_VALID_URL = r'^(?:https?://)?(?:www\.)?mixcloud\.com/([^/]+)/(?!stream|uploads|favorites|listens|playlists)([^/]+)'
IE_NAME = 'mixcloud'
_TESTS = [{
'url': 'http://www.mixcloud.com/dholbach/cryptkeeper/',
'info_dict': {
'id': 'dholbach-cryptkeeper',
'ext': 'mp3',
'ext': 'm4a',
'title': 'Cryptkeeper',
'description': 'After quite a long silence from myself, finally another Drum\'n\'Bass mix with my favourite current dance floor bangers.',
'uploader': 'Daniel Holbach',
@@ -38,22 +47,22 @@ class MixcloudIE(InfoExtractor):
'description': 'md5:2b8aec6adce69f9d41724647c65875e8',
'uploader': 'Gilles Peterson Worldwide',
'uploader_id': 'gillespeterson',
'thumbnail': 're:https?://.*/images/',
'thumbnail': 're:https?://.*',
'view_count': int,
'like_count': int,
},
}]
def _check_url(self, url, track_id, ext):
try:
# We only want to know if the request succeed
# don't download the whole file
self._request_webpage(
HEADRequest(url), track_id,
'Trying %s URL' % ext)
return True
except ExtractorError:
return False
# See https://www.mixcloud.com/media/js2/www_js_2.9e23256562c080482435196ca3975ab5.js
@staticmethod
def _decrypt_play_info(play_info):
KEY = 'pleasedontdownloadourmusictheartistswontgetpaid'
play_info = base64.b64decode(play_info.encode('ascii'))
return ''.join([
compat_chr(compat_ord(ch) ^ compat_ord(KEY[idx % len(KEY)]))
for idx, ch in enumerate(play_info)])
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
@@ -63,14 +72,19 @@ class MixcloudIE(InfoExtractor):
webpage = self._download_webpage(url, track_id)
preview_url = self._search_regex(
r'\s(?:data-preview-url|m-preview)="([^"]+)"', webpage, 'preview url')
song_url = re.sub(r'audiocdn(\d+)', r'stream\1', preview_url)
song_url = song_url.replace('/previews/', '/c/originals/')
if not self._check_url(song_url, track_id, 'mp3'):
song_url = song_url.replace('.mp3', '.m4a').replace('originals/', 'm4a/64/')
if not self._check_url(song_url, track_id, 'm4a'):
raise ExtractorError('Unable to extract track url')
message = self._html_search_regex(
r'(?s)<div[^>]+class="global-message cloudcast-disabled-notice-light"[^>]*>(.+?)<(?:a|/div)',
webpage, 'error message', default=None)
encrypted_play_info = self._search_regex(
r'm-play-info="([^"]+)"', webpage, 'play info')
play_info = self._parse_json(
self._decrypt_play_info(encrypted_play_info), track_id)
if message and 'stream_url' not in play_info:
raise ExtractorError('%s said: %s' % (self.IE_NAME, message), expected=True)
song_url = play_info['stream_url']
PREFIX = (
r'm-play-on-spacebar[^>]+'
@@ -105,3 +119,201 @@ class MixcloudIE(InfoExtractor):
'view_count': view_count,
'like_count': like_count,
}
class MixcloudPlaylistBaseIE(InfoExtractor):
_PAGE_SIZE = 24
def _find_urls_in_page(self, page):
for url in re.findall(r'm-play-button m-url="(?P<url>[^"]+)"', page):
yield self.url_result(
compat_urlparse.urljoin('https://www.mixcloud.com', clean_html(url)),
MixcloudIE.ie_key())
def _fetch_tracks_page(self, path, video_id, page_name, current_page, real_page_number=None):
real_page_number = real_page_number or current_page + 1
return self._download_webpage(
'https://www.mixcloud.com/%s/' % path, video_id,
note='Download %s (page %d)' % (page_name, current_page + 1),
errnote='Unable to download %s' % page_name,
query={'page': real_page_number, 'list': 'main', '_ajax': '1'},
headers={'X-Requested-With': 'XMLHttpRequest'})
def _tracks_page_func(self, page, video_id, page_name, current_page):
resp = self._fetch_tracks_page(page, video_id, page_name, current_page)
for item in self._find_urls_in_page(resp):
yield item
def _get_user_description(self, page_content):
return self._html_search_regex(
r'<div[^>]+class="description-text"[^>]*>(.+?)</div>',
page_content, 'user description', fatal=False)
class MixcloudUserIE(MixcloudPlaylistBaseIE):
_VALID_URL = r'^(?:https?://)?(?:www\.)?mixcloud\.com/(?P<user>[^/]+)/(?P<type>uploads|favorites|listens)?/?$'
IE_NAME = 'mixcloud:user'
_TESTS = [{
'url': 'http://www.mixcloud.com/dholbach/',
'info_dict': {
'id': 'dholbach_uploads',
'title': 'Daniel Holbach (uploads)',
'description': 'md5:327af72d1efeb404a8216c27240d1370',
},
'playlist_mincount': 11,
}, {
'url': 'http://www.mixcloud.com/dholbach/uploads/',
'info_dict': {
'id': 'dholbach_uploads',
'title': 'Daniel Holbach (uploads)',
'description': 'md5:327af72d1efeb404a8216c27240d1370',
},
'playlist_mincount': 11,
}, {
'url': 'http://www.mixcloud.com/dholbach/favorites/',
'info_dict': {
'id': 'dholbach_favorites',
'title': 'Daniel Holbach (favorites)',
'description': 'md5:327af72d1efeb404a8216c27240d1370',
},
'params': {
'playlist_items': '1-100',
},
'playlist_mincount': 100,
}, {
'url': 'http://www.mixcloud.com/dholbach/listens/',
'info_dict': {
'id': 'dholbach_listens',
'title': 'Daniel Holbach (listens)',
'description': 'md5:327af72d1efeb404a8216c27240d1370',
},
'params': {
'playlist_items': '1-100',
},
'playlist_mincount': 100,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
user_id = mobj.group('user')
list_type = mobj.group('type')
# if only a profile URL was supplied, default to download all uploads
if list_type is None:
list_type = 'uploads'
video_id = '%s_%s' % (user_id, list_type)
profile = self._download_webpage(
'https://www.mixcloud.com/%s/' % user_id, video_id,
note='Downloading user profile',
errnote='Unable to download user profile')
username = self._og_search_title(profile)
description = self._get_user_description(profile)
entries = OnDemandPagedList(
functools.partial(
self._tracks_page_func,
'%s/%s' % (user_id, list_type), video_id, 'list of %s' % list_type),
self._PAGE_SIZE, use_cache=True)
return self.playlist_result(
entries, video_id, '%s (%s)' % (username, list_type), description)
class MixcloudPlaylistIE(MixcloudPlaylistBaseIE):
_VALID_URL = r'^(?:https?://)?(?:www\.)?mixcloud\.com/(?P<user>[^/]+)/playlists/(?P<playlist>[^/]+)/?$'
IE_NAME = 'mixcloud:playlist'
_TESTS = [{
'url': 'https://www.mixcloud.com/RedBullThre3style/playlists/tokyo-finalists-2015/',
'info_dict': {
'id': 'RedBullThre3style_tokyo-finalists-2015',
'title': 'National Champions 2015',
'description': 'md5:6ff5fb01ac76a31abc9b3939c16243a3',
},
'playlist_mincount': 16,
}, {
'url': 'https://www.mixcloud.com/maxvibes/playlists/jazzcat-on-ness-radio/',
'info_dict': {
'id': 'maxvibes_jazzcat-on-ness-radio',
'title': 'Jazzcat on Ness Radio',
'description': 'md5:7bbbf0d6359a0b8cda85224be0f8f263',
},
'playlist_mincount': 23
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
user_id = mobj.group('user')
playlist_id = mobj.group('playlist')
video_id = '%s_%s' % (user_id, playlist_id)
profile = self._download_webpage(
url, user_id,
note='Downloading playlist page',
errnote='Unable to download playlist page')
description = self._get_user_description(profile)
playlist_title = self._html_search_regex(
r'<span[^>]+class="[^"]*list-playlist-title[^"]*"[^>]*>(.*?)</span>',
profile, 'playlist title')
entries = OnDemandPagedList(
functools.partial(
self._tracks_page_func,
'%s/playlists/%s' % (user_id, playlist_id), video_id, 'tracklist'),
self._PAGE_SIZE)
return self.playlist_result(entries, video_id, playlist_title, description)
class MixcloudStreamIE(MixcloudPlaylistBaseIE):
_VALID_URL = r'^(?:https?://)?(?:www\.)?mixcloud\.com/(?P<id>[^/]+)/stream/?$'
IE_NAME = 'mixcloud:stream'
_TEST = {
'url': 'https://www.mixcloud.com/FirstEar/stream/',
'info_dict': {
'id': 'FirstEar',
'title': 'First Ear',
'description': 'Curators of good music\nfirstearmusic.com',
},
'playlist_mincount': 192,
}
def _real_extract(self, url):
user_id = self._match_id(url)
webpage = self._download_webpage(url, user_id)
entries = []
prev_page_url = None
def _handle_page(page):
entries.extend(self._find_urls_in_page(page))
return self._search_regex(
r'm-next-page-url="([^"]+)"', page,
'next page URL', default=None)
next_page_url = _handle_page(webpage)
for idx in itertools.count(0):
if not next_page_url or prev_page_url == next_page_url:
break
prev_page_url = next_page_url
current_page = int(self._search_regex(
r'\?page=(\d+)', next_page_url, 'next page number'))
next_page_url = _handle_page(self._fetch_tracks_page(
'%s/stream' % user_id, user_id, 'stream', idx,
real_page_number=current_page))
username = self._og_search_title(webpage)
description = self._get_user_description(webpage)
return self.playlist_result(entries, user_id, username, description)

View File

@@ -1,110 +0,0 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
sanitized_Request,
urlencode_postdata,
)
class MooshareIE(InfoExtractor):
IE_NAME = 'mooshare'
IE_DESC = 'Mooshare.biz'
_VALID_URL = r'https?://(?:www\.)?mooshare\.biz/(?P<id>[\da-z]{12})'
_TESTS = [
{
'url': 'http://mooshare.biz/8dqtk4bjbp8g',
'md5': '4e14f9562928aecd2e42c6f341c8feba',
'info_dict': {
'id': '8dqtk4bjbp8g',
'ext': 'mp4',
'title': 'Comedy Football 2011 - (part 1-2)',
'duration': 893,
},
},
{
'url': 'http://mooshare.biz/aipjtoc4g95j',
'info_dict': {
'id': 'aipjtoc4g95j',
'ext': 'mp4',
'title': 'Orange Caramel Dashing Through the Snow',
'duration': 212,
},
'params': {
# rtmp download
'skip_download': True,
}
}
]
def _real_extract(self, url):
video_id = self._match_id(url)
page = self._download_webpage(url, video_id, 'Downloading page')
if re.search(r'>Video Not Found or Deleted<', page) is not None:
raise ExtractorError('Video %s does not exist' % video_id, expected=True)
hash_key = self._html_search_regex(r'<input type="hidden" name="hash" value="([^"]+)">', page, 'hash')
title = self._html_search_regex(r'(?m)<div class="blockTitle">\s*<h2>Watch ([^<]+)</h2>', page, 'title')
download_form = {
'op': 'download1',
'id': video_id,
'hash': hash_key,
}
request = sanitized_Request(
'http://mooshare.biz/%s' % video_id, urlencode_postdata(download_form))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
self._sleep(5, video_id)
video_page = self._download_webpage(request, video_id, 'Downloading video page')
thumbnail = self._html_search_regex(r'image:\s*"([^"]+)",', video_page, 'thumbnail', fatal=False)
duration_str = self._html_search_regex(r'duration:\s*"(\d+)",', video_page, 'duration', fatal=False)
duration = int(duration_str) if duration_str is not None else None
formats = []
# SD video
mobj = re.search(r'(?m)file:\s*"(?P<url>[^"]+)",\s*provider:', video_page)
if mobj is not None:
formats.append({
'url': mobj.group('url'),
'format_id': 'sd',
'format': 'SD',
})
# HD video
mobj = re.search(r'\'hd-2\': { file: \'(?P<url>[^\']+)\' },', video_page)
if mobj is not None:
formats.append({
'url': mobj.group('url'),
'format_id': 'hd',
'format': 'HD',
})
# rtmp video
mobj = re.search(r'(?m)file: "(?P<playpath>[^"]+)",\s*streamer: "(?P<rtmpurl>rtmp://[^"]+)",', video_page)
if mobj is not None:
formats.append({
'url': mobj.group('rtmpurl'),
'play_path': mobj.group('playpath'),
'rtmp_live': False,
'ext': 'mp4',
'format_id': 'rtmp',
'format': 'HD',
})
return {
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'duration': duration,
'formats': formats,
}

View File

@@ -2,39 +2,48 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import sanitized_Request
from ..utils import (
smuggle_url,
float_or_none,
parse_iso8601,
update_url_query,
)
class MovieClipsIE(InfoExtractor):
_VALID_URL = r'https?://(?:www.)?movieclips\.com/videos/(?P<id>[^/?#]+)'
_VALID_URL = r'https?://(?:www.)?movieclips\.com/videos/.+-(?P<id>\d+)(?:\?|$)'
_TEST = {
'url': 'http://www.movieclips.com/videos/warcraft-trailer-1-561180739597?autoPlay=true&playlistId=5',
'url': 'http://www.movieclips.com/videos/warcraft-trailer-1-561180739597',
'md5': '42b5a0352d4933a7bd54f2104f481244',
'info_dict': {
'id': 'pKIGmG83AqD9',
'display_id': 'warcraft-trailer-1-561180739597',
'ext': 'mp4',
'title': 'Warcraft Trailer 1',
'description': 'Watch Trailer 1 from Warcraft (2016). Legendarys WARCRAFT is a 3D epic adventure of world-colliding conflict based.',
'thumbnail': 're:^https?://.*\.jpg$',
'timestamp': 1446843055,
'upload_date': '20151106',
'uploader': 'Movieclips',
},
'add_ie': ['ThePlatform'],
}
def _real_extract(self, url):
display_id = self._match_id(url)
req = sanitized_Request(url)
# it doesn't work if it thinks the browser it's too old
req.add_header('User-Agent', 'Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20150101 Firefox/43.0 (Chrome)')
webpage = self._download_webpage(req, display_id)
theplatform_link = self._html_search_regex(r'src="(http://player.theplatform.com/p/.*?)"', webpage, 'theplatform link')
title = self._html_search_regex(r'<title[^>]*>([^>]+)-\s*\d+\s*|\s*Movieclips.com</title>', webpage, 'title')
description = self._html_search_meta('description', webpage)
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video = next(v for v in self._parse_json(self._search_regex(
r'var\s+__REACT_ENGINE__\s*=\s*({.+});',
webpage, 'react engine'), video_id)['playlist']['videos'] if v['id'] == video_id)
return {
'_type': 'url_transparent',
'url': theplatform_link,
'title': title,
'display_id': display_id,
'description': description,
'ie_key': 'ThePlatform',
'url': smuggle_url(update_url_query(
video['contentUrl'], {'mbr': 'true'}), {'force_smil_url': True}),
'title': self._og_search_title(webpage),
'description': self._html_search_meta('description', webpage),
'duration': float_or_none(video.get('duration')),
'timestamp': parse_iso8601(video.get('dateCreated')),
'thumbnail': video.get('defaultImage'),
'uploader': video.get('provider'),
}

Some files were not shown because too many files have changed in this diff Show More