
coursera-dl / coursera-dl


Script for downloading Coursera.org videos and naming them.

License: GNU Lesser General Public License v3.0

Python 94.41% Batchfile 1.96% PowerShell 3.45% Dockerfile 0.18%
coursera-dl video downloader lectures python coursera video-downloader storage archival

coursera-dl's Introduction

Coursera Downloader


Introduction

Coursera is arguably the leader in massive open online courses (MOOC) with a selection of more than 300 classes from 62 different institutions as of February 2013. Generous contributions by educators and institutions are making excellent education available to many who could not afford it otherwise. There are even non-profits with "feet on the ground" in remote areas of the world who are helping spread the wealth (see the feedback below from Tunapanda).

This script makes it easier to batch download lecture resources (e.g., videos, ppt, etc) for Coursera classes. Given one or more class names and account credentials, it obtains week and class names from the lectures page, and then downloads the related materials into appropriately named files and directories.

Why is this helpful? A utility like wget can work, but has the following limitations:

  1. Video names have numbers in them, but these do not correspond to the actual order. Manually renaming them is a pain that is best left to computers.
  2. Using names from the syllabus page provides more informative names.
  3. Using wget in a for loop picks up extra videos which are not posted/linked, and these are sometimes duplicates.
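Point 1 is really a sorting problem. File managers and players sort names as text, so "10_..." lands before "2_..." unless the indices are zero-padded. The Python sketch below shows one way to build sortable names from syllabus titles. The helpers safe_name and lecture_path, and the exact directory layout, are hypothetical illustrations and not coursera-dl's actual code.

```python
import re


def safe_name(title):
    """Turn a human-readable title into a filesystem-friendly token."""
    # Collapse whitespace runs into underscores, then drop characters that
    # are illegal on common filesystems (Windows is the strictest).
    title = re.sub(r"\s+", "_", title.strip())
    return re.sub(r'[<>:"/\\|?*]', "", title)


def lecture_path(class_name, section_idx, section_title,
                 lecture_idx, lecture_title, ext):
    """Build <CLASS>_<NN>_<section>/<NN>_<lecture>.<ext> (hypothetical layout)."""
    # Zero-padded indices make plain text sorting follow the syllabus order.
    section_dir = "%s_%02d_%s" % (class_name.upper(), section_idx,
                                  safe_name(section_title))
    filename = "%02d_%s.%s" % (lecture_idx, safe_name(lecture_title), ext)
    # Forward slash used for illustration; os.path.join would be OS-specific.
    return section_dir + "/" + filename
```

For example, lecture_path("sciwrite-2012-001", 4, "Unit 4", 7, "4.7- Upcoming Writing and Editing Assignment", "mp4") gives SCIWRITE-2012-001_04_Unit_4/07_4.7-_Upcoming_Writing_and_Editing_Assignment.mp4.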

Browser extensions like DownloadThemAll are another possibility, but coursera-dl provides more features, such as appropriately named files.

This work was originally inspired in part by youtube-dl, with which I've downloaded many other good videos, such as those from Khan Academy.

Features

  • Support for all kinds of courses (i.e., "Old Platform"/time-based as well as "New Platform"/on-demand courses).
  • Intentionally detailed names, so that it will display and sort properly on most interfaces (e.g., VLC or MX Video on Android devices).
  • Regex-based section (week) and lecture name filters to download only certain resources.
  • File format extension filter to grab resource types you want.
  • Login credentials accepted on command-line or from .netrc file.
  • Default arguments loaded from coursera-dl.conf file.
  • Core functionality tested on Linux, Mac and Windows.
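The section, lecture and format filters work like three successive checks over the parsed syllabus. The sketch below assumes a simple in-memory structure: a list of (section title, lectures) pairs, where each lecture maps a file extension to a URL. That structure and the function name select_resources are hypothetical, and coursera-dl's own data model may differ. The sketch uses re.search, so a pattern may match anywhere in a title.

```python
import re


def select_resources(sections, section_filter=None, lecture_filter=None,
                     formats=None):
    """Yield (section, lecture, url) triples that pass every filter.

    `sections` is assumed to be a list of (section_title, lectures) pairs,
    where `lectures` is a list of (lecture_title, {ext: url}) pairs.
    """
    wanted = set(formats) if formats else None
    for section_title, lectures in sections:
        # Section regex, in the spirit of -sf.
        if section_filter and not re.search(section_filter, section_title):
            continue
        for lecture_title, resources in lectures:
            # Lecture regex, in the spirit of -lf.
            if lecture_filter and not re.search(lecture_filter, lecture_title):
                continue
            for ext, url in resources.items():
                # Extension whitelist, in the spirit of -f.
                if wanted is None or ext in wanted:
                    yield section_title, lecture_title, url
```

Calling it with section_filter="Chapter_Four" and formats=["ppt"] corresponds roughly to combining -sf "Chapter_Four" with -f "ppt".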

Disclaimer

coursera-dl is meant to be used only for material that Coursera gives you access to download.

We do not encourage any use that violates their Terms Of Use. A relevant excerpt:

"[...] Coursera grants you a personal, non-exclusive, non-transferable license to access and use the Sites. You may download material from the Sites only for your own personal, non-commercial use. You may not otherwise copy, reproduce, retransmit, distribute, publish, commercially exploit or otherwise transfer any material, nor may you modify or create derivatives works of the material."

Installation instructions

coursera-dl requires Python 2 or Python 3 and a free Coursera account enrolled in the class of interest. (As of February 2020, we automatically test the program with Python 2.7, PyPy, 3.6, 3.7, 3.8, and 3.9.)

Note: We strongly recommend that you use a Python 3 interpreter (3.9 or later).

On any operating system, ensure that the Python executable location is added to your PATH environment variable. Once you have the dependencies installed (see next section), basic usage only requires invoking the script from the main directory of the project, prefixed with the word python. More advanced features of the program are described in the "Running the script" section of this document.

Note: You must already have (manually) agreed to the Honor Code of the particular courses that you want to use with coursera-dl.

Recommended installation method for all Operating Systems

From a command line (preferably, from a virtual environment), simply issue the command:

pip install coursera-dl

This will download the latest released version of the program from the Python Package Index (PyPI) along with all the necessary dependencies. At this point, you should be ready to start using it.

If this does not work because your Python 2 version is too old (e.g. 2.7.5 on Ubuntu 14.04), try:

apt-get install python3 python3-pip
pip3 install coursera-dl

instead.

Note 1: We strongly recommend that you don't install the package globally on your machine (i.e., with root/administrator privileges), as the installed modules may conflict with other Python applications installed on your system (or they can interfere with coursera-dl). If you must install outside a virtual environment, prefer the --user option of pip install.

Note 2: As already mentioned, we strongly recommend that you use a new Python 3 interpreter (e.g., 3.9 or later), since Python 3 has better support for SSL/TLS (for secure connections) than earlier versions.
If you must use Python 2, be sure that you have at least Python 2.7.9 (later versions are OK).
Otherwise, you can still use coursera-dl, but you will have to install the extra package ndg-httpsclient, which may involve compilation (at least on Linux systems).

Alternative ways of installing missing dependencies

We strongly recommend installing Python packages with pip, as it is the currently preferred method, unless directed otherwise by one of the project members (for instance, when testing or debugging a new feature or using the source code directly from our git repository). With pip, you can install all the dependencies from the requirements file using pip install -r requirements.txt.

Alternative installation method for Unix systems

We strongly recommend that you install coursera-dl and all its dependencies in a way that does not interfere with the rest of your Python installation. This is accomplished by the creation of a virtual environment, or "virtualenv".

For the initial setup in a Unix-like operating system, please use the following steps (first create or adapt the directory /directory/where/I/want/my/courses):

cd /directory/where/I/want/my/courses
virtualenv my-coursera
cd my-coursera
source bin/activate
git clone https://github.com/coursera-dl/coursera-dl
cd coursera-dl
pip install -r requirements.txt
./coursera-dl ...

To further download new videos from your classes, simply perform:

cd /directory/where/I/want/my/courses/my-coursera
source bin/activate
cd coursera-dl
./coursera-dl ...

We are working on streamlining this whole process so that it is as simple as possible, but to support older versions of Python and to cope with Coursera disabling SSLv3, we have to take a few extra steps. In any case, it is highly recommended that you always install the latest version of the Python interpreter that you can.

ArchLinux

AUR package: coursera-dl

Installing dependencies on your own

Warning: This method is not recommended unless you have experience working with multiple Python environments.

You can use the pip program to install the dependencies on your own. They are all listed in the requirements.txt file (and the extra dependencies needed for development are listed in the requirements-dev.txt file).

To use this method, you would proceed as:

pip install -r requirements.txt
pip install -r requirements-dev.txt

The second line above should only be needed if you intend to help with development (and help is always welcome) or if a maintainer of the project asks you to install extra packages for debugging purposes.

Once again, before filing bug reports, if you installed the dependencies on your own, please check that the versions of your modules are at least those listed in the requirements.txt file (and, requirements-dev.txt file, if applicable).

Docker

If you prefer, you can run this software inside Docker:

docker run --rm -it -v \
    "$(pwd):/courses" \
    courseradl/courseradl -u <USER> -p <PASSWORD>

Or, using a netrc file:

docker run --rm -it \
    -v "$(pwd):/courses" -v "$HOME/.netrc:/netrc" \
    courseradl/courseradl -n /netrc

The working directory for coursera-dl inside the container is /courses, and all courses are downloaded there unless you specify otherwise.

Windows

python -m pip install coursera-dl

Be sure that the Python install path is added to the PATH system environment variables. This can be found in Control Panel > System > Advanced System Settings > Environment Variables.

Example:
C:\Python39\Scripts\;C:\Python39\;

Or if you have restricted installation permissions and you've installed Python under AppData, add this to your PATH.

Example:
C:\Users\<user>\AppData\Local\Programs\Python\Python39-32\Scripts;C:\Users\<user>\AppData\Local\Programs\Python\Python39-32;

Coursera-dl can now be run from the command line or PowerShell.

Create an account with Coursera

If you don't already have one, create a Coursera account and enroll in a class. See https://www.coursera.org/courses for the list of classes.

Running the script

Refer to coursera-dl --help for a complete, up-to-date reference on the runtime options supported by this utility.

Run the script to download the materials by providing your Coursera account credentials (e.g. email address and password or a ~/.netrc file), the class names, as well as any additional parameters:

    General:                     coursera-dl -u <user> -p <pass> modelthinking-004

    With CAUTH parameter:        coursera-dl -ca 'some-ca-value-from-browser' modelthinking-004
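The -ca value is the CAUTH cookie from a logged-in browser session. If you have already exported a Netscape-style cookies file (the kind the -c option takes), Python's standard http.cookiejar.MozillaCookieJar can read it. The sketch below pulls out CAUTH, assuming the cookie is stored for a coursera.org domain. read_cauth is a hypothetical helper, not part of coursera-dl.

```python
from http.cookiejar import MozillaCookieJar


def read_cauth(cookies_path):
    """Return the CAUTH cookie value from a Netscape-format cookies file, or None."""
    jar = MozillaCookieJar(cookies_path)
    # Keep session cookies and ignore expiry dates so the exported file is
    # read exactly as the browser wrote it.
    jar.load(ignore_discard=True, ignore_expires=True)
    for cookie in jar:
        if cookie.name == "CAUTH" and cookie.domain.endswith("coursera.org"):
            return cookie.value
    return None
```

Treat the returned value like a password, because it stands in for your logged-in session.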

If you don't want to type your password on the command line as plain text, you can run the script without the -p option. In this case, you will be prompted for your password when the script runs.

Here are some examples of how to invoke coursera-dl from the command line:

    Without -p field:            coursera-dl -u <user> modelthinking-004
    Multiple classes:            coursera-dl -u <user> -p <pass> saas historyofrock1-001 algo-2012-002
    Filter by section name:      coursera-dl -u <user> -p <pass> -sf "Chapter_Four" crypto-004
    Filter by lecture name:      coursera-dl -u <user> -p <pass> -lf "3.1_" ml-2012-002
    Download only ppt files:     coursera-dl -u <user> -p <pass> -f "ppt" qcomp-2012-001
    Use a ~/.netrc file:         coursera-dl -n -- matrix-001
    Get the preview classes:     coursera-dl -n -b ni-001
    Download videos at 720p:     coursera-dl -n --video-resolution 720p ni-001
    Specify download path:       coursera-dl -n --path=C:\Coursera\Classes\ comnetworks-002
    Display help:                coursera-dl --help

    Maintain a list of classes in a dir:
      Initialize:              mkdir -p CURRENT/{class1,class2,..classN}
      Update:                  coursera-dl -n --path CURRENT `\ls CURRENT`

Note: If your ls command is aliased to display colorized output, you may experience problems. Be sure to escape the ls command (use \ls) to ensure that no special characters get sent to the script.
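From Python, the same "update every class in a directory" workflow can avoid the ls pitfall entirely by building the argument list directly instead of going through the shell. The sketch below is minimal. It assumes coursera-dl is on your PATH and that every sub-directory of CURRENT is named after a class. update_command is a hypothetical helper.

```python
import os


def class_dirs(root):
    """Return the sorted names of the sub-directories of `root`."""
    return sorted(entry.name for entry in os.scandir(root) if entry.is_dir())


def update_command(root):
    """Build the argv for re-downloading every class kept under `root`."""
    # A list (not a shell string) means no ls alias, color codes or
    # word splitting can leak into the class names.
    return ["coursera-dl", "-n", "--path", root] + class_dirs(root)
```

You would then run it with subprocess.run(update_command("CURRENT"), check=True).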

Note that we do support the New Platform ("on-demand") courses.

By default, videos are downloaded at 540p resolution. For on-demand courses, the --video-resolution flag accepts 360p, 540p, and 720p values.

To download just the .txt and/or .srt subtitle files instead of the videos, use --ignore-formats mp4 --subtitle-language en, substituting the format the videos are encoded in and the subtitle languages you want.

On *nix platforms, a ~/.netrc file is a good alternative to specifying your username (i.e., your email address) and password on the command line every time. To use it, add a line like the one below to a file named .netrc in your home directory (or the equivalent location, if you are using Windows):

    machine coursera-dl login <user> password <pass>

Create the file if it doesn't exist yet. From then on, you can replace -u and -p with the single option -n. This is especially convenient, as typing usernames (email addresses) and passwords on the command line gets tiresome (even more so if you chose a "strong" password).
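Python's standard netrc module reads exactly this format. The sketch below looks up the coursera-dl machine entry. coursera_credentials is a hypothetical helper, not coursera-dl's code. With no path it reads ~/.netrc, and on POSIX systems Python rejects that default file if other users can access it, so keep it private (for example, chmod 600 ~/.netrc).

```python
import netrc


def coursera_credentials(path=None):
    """Return (login, password) from the `coursera-dl` netrc entry, or None."""
    # netrc.netrc(None) reads ~/.netrc and applies Python's permission check.
    auth = netrc.netrc(path).authenticators("coursera-dl")
    if auth is None:
        return None
    login, _account, password = auth
    return login, password
```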

Alternatively, if you want to store your preferred parameters (which might also include your username and password), create a file named coursera-dl.conf in the directory where you run the script, with the following format:

    --username <user>
    --password <pass>
    --subtitle-language en,zh-CN|zh-TW
    --download-quizzes
    #--mathjax-cdn https://cdn.bootcss.com/mathjax/2.7.1/MathJax.js
    # more other parameters

Parameters specified in the file are overridden if they are provided again on the command line.

Note: In coursera-dl.conf, parameter values should not be wrapped in quotes.
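The behavior described above (one option per line, # comments, command line wins) can be reproduced with plain argparse. Turn the file into argument tokens and place them before the real command-line arguments; argparse keeps the last value it sees for an ordinary option. In the sketch below, read_conf, build_parser and parse_with_conf are hypothetical names, only a few options are declared, and coursera-dl's own loader may work differently. Splitting each line only once also shows why quotes are unnecessary: they would simply become part of the value.

```python
import argparse


def read_conf(text):
    """Turn coursera-dl.conf-style text into a list of argv tokens."""
    tokens = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split on the first whitespace run only: the value stays intact,
        # and any quotes would be kept as literal characters.
        tokens.extend(line.split(None, 1))
    return tokens


def build_parser():
    parser = argparse.ArgumentParser(prog="coursera-dl")
    parser.add_argument("--username")
    parser.add_argument("--password")
    parser.add_argument("--subtitle-language")
    parser.add_argument("--download-quizzes", action="store_true")
    parser.add_argument("class_names", nargs="*")
    return parser


def parse_with_conf(conf_text, argv):
    # File tokens go first, so an option repeated on the command line wins.
    return build_parser().parse_args(read_conf(conf_text) + list(argv))
```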

Resuming downloads

By default, when you interrupt the download process by pressing CTRL+C, partially downloaded files are deleted from your disk and you have to start the download process from the beginning. If your download is interrupted by something other than KeyboardInterrupt (CTRL+C), such as a sudden system crash, partially downloaded files remain on your disk. The next time you start the process, these files are skipped as if they were complete, so you have to delete them manually before restarting. For this reason, we added an option called --resume, which continues your downloads from where they stopped:

coursera-dl -u <user> -p <pass> --resume sdn1-001

This option can also be used with external downloaders:

coursera-dl --wget -u <user> -p <pass> --resume sdn1-001

Note 1: Some external downloaders use their own built-in resume feature which may not be compatible with others, so use them at your own risk.

Note 2: Remember that in resume mode, interrupted files WON'T be deleted from your disk.
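Resuming over HTTP usually relies on the Range request header. The client asks only for the bytes it does not have yet, and a server that supports this answers 206 Partial Content. The sketch below illustrates that general technique with the standard urllib. It is not coursera-dl's --resume implementation: resume_download is a hypothetical name, and the opener parameter exists only so the function can be tried without a network. A 416 reply (file already complete) surfaces as urllib.error.HTTPError and is not handled here.

```python
import os
import urllib.request


def resume_download(url, dest, opener=urllib.request.urlopen,
                    chunk_size=64 * 1024):
    """Fetch only the missing tail of `dest`, restarting if Range is ignored."""
    offset = os.path.getsize(dest) if os.path.exists(dest) else 0
    request = urllib.request.Request(url)
    if offset:
        # Ask for the bytes from `offset` to the end of the resource.
        request.add_header("Range", "bytes=%d-" % offset)
    with opener(request) as response:
        # 206 Partial Content: the server honored Range, so append.
        # Any other success (normally 200) is the full body: overwrite.
        status = getattr(response, "status", None)
        mode = "ab" if offset and status == 206 else "wb"
        with open(dest, mode) as out:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
    return os.path.getsize(dest)
```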

NOTE: If your password contains punctuation, quotes or other "funny characters" (e.g., <, >, #, &, | and so on), then you may have to escape them from your shell. With bash or other Bourne-shell clones (and probably with many other shells) one of the better ways to do so is to enclose your password in single quotes, so that you don't run into problems. See issue #213 for more information.
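If you generate coursera-dl command lines from a script, Python's shlex.quote applies the single-quote rule for you. This includes the awkward case of a password that itself contains a single quote. command_line is a hypothetical helper, and the quoting targets POSIX shells such as bash, not cmd.exe or PowerShell.

```python
import shlex


def command_line(user, password, class_name):
    """Render a coursera-dl command line that a POSIX shell reads back verbatim."""
    argv = ["coursera-dl", "-u", user, "-p", password, class_name]
    # shlex.quote wraps unsafe arguments in single quotes and escapes any
    # single quote inside them.
    return " ".join(shlex.quote(arg) for arg in argv)
```

When you start coursera-dl from Python itself, passing the argument list to subprocess.run without shell=True avoids shell quoting altogether.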

Troubleshooting

If you have problems when downloading class materials, please check whether one of the following actions solves your problem:

  • Make sure the class name you are using corresponds to the resource name used in the URL for that class: https://www.coursera.org/learn/<CLASS_NAME>/home/welcome

  • Have you tried to clean the cached cookies/credentials with the --clear-cache option?

  • Note that many courses (perhaps most) may remove the materials some time after the course is completed, while other courses may retain the materials until the next session/offering of the same course (apparently to avoid problems with academic dishonesty).

    In short, it is not guaranteed that you will be able to download after the course is finished and this is, unfortunately, nothing that we can help you with.

  • Make sure you have installed and/or updated all of your dependencies according to the requirements.txt file as described above.

  • You can export a Netscape-style cookies file with a browser extension and use it with the -c option. This comes in handy when authentication via password is not working (the authentication process changes now and then).

  • If results show 0 sections, you most likely have provided invalid credentials (username and/or password in the command line or in your .netrc file or in your coursera-dl.conf file).

  • For courses that have not started yet but have had a previous iteration, a preview is sometimes available, containing all the classes from the previous iteration. These files can be downloaded by passing the --preview parameter.

  • If you get an error like Could not find class: <CLASS_NAME>, then:

    • Verify that the name of the course is correct. Current class names in Coursera are composed of a short course name (e.g. class) and the current version of the course (a number). For example, for a class named class, you would have to use class-001, class-002, etc.
    • Second, verify that you are enrolled in the course. You won't be able to access the course materials if you are not officially enrolled and have not agreed to the honor code via the website.
  • If:

    • You get an error when using -n to specify that you want to use a .netrc file and,

    • You want the script to use your default netrc file and,

    • You get a message saying coursera-dl: error: too few arguments

      Then you should specify -- as an argument after -n (that is, -n --), or change the order in which you pass the arguments to the script so that the argument after -n begins with a hyphen (-). Otherwise, Python's argparse module will think that what you are passing is the name of the netrc file that you want to use. See issue #162.

  • If your password has spaces, don't forget to write it using quotes.

  • Have you installed the right project?

    Warning: If you installed the script using PyPI (pip), please verify that you installed the correct project. We had to use a different name in pip because our original name was already taken. Remember to install it using:

        pip install coursera-dl
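The -n pitfall described in this list can be reproduced with a cut-down parser. The sketch assumes -n is declared with nargs="?" (an optional file name), which matches the symptom described. make_parser is a hypothetical reduction, not coursera-dl's actual parser.

```python
import argparse


def make_parser():
    """A hypothetical cut-down parser with a coursera-dl-like -n option."""
    parser = argparse.ArgumentParser(prog="coursera-dl")
    # nargs="?" makes the netrc path optional: "-n" alone stores True,
    # while "-n FILE" stores FILE, which is why -n can swallow a class name.
    parser.add_argument("-n", "--netrc", nargs="?", const=True, default=False)
    parser.add_argument("class_names", nargs="+")
    return parser
```

With this parser, parse_args(["-n", "matrix-001"]) fails because -n takes the class name as its file. Both ["-n", "--", "matrix-001"] and ["matrix-001", "-n"] give netrc=True and class_names=["matrix-001"].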
    

China issues

If you are in China and you're having problems downloading videos, adding "52.84.167.78 d3c33hcgiwev3.cloudfront.net" to the hosts file (/etc/hosts) and flushing the DNS cache with "ipconfig /flushdns" may work (see https://github.com/googlehosts/hosts for more info).

Found 0 sections and 0 lectures on this page

First of all, make sure you are enrolled in the course you want to download.

Many old courses have already closed enrollment, so that is often not an option. In this case, try downloading with the --preview option. Some courses allow you to download lecture materials without enrolling, but this is not common and is not guaranteed to work for every course.

Finally, you can download the videos if you have, at least, the index file that lists all the course materials. Maybe your friend who is enrolled could save that course page for you. In that case use the --process_local_page option.

Alternatively, you may want to try one of the various browser extensions designed for this problem.

If none of the above works for you, there is nothing we can do.

Download timeouts

Coursera-dl supports external downloaders, but note that they are only used to download materials after the syllabus has been parsed, e.g. videos, PDFs, some handouts and additional files (the syllabus itself is always downloaded using the internal downloader). If you experience problems downloading such materials, you may want to switch to an external downloader and configure its timeout values. For example, you can use the aria2c downloader by passing the --aria2 option:

coursera-dl -n --path . --aria2  <course-name>

And put this into aria2c's configuration file ~/.aria2/aria2.conf to reduce timeouts:

connect-timeout=2
timeout=2
bt-stop-timeout=1

Timeout configuration for internal downloader is not supported.
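You may want to try these timeout values on a single file before editing the config. aria2c also accepts them as command-line flags, because config-file keys are the long option names without the leading --. The sketch below only builds such an argument list. aria2c_command is a hypothetical helper that assumes aria2c is installed, and it does not change how coursera-dl itself invokes aria2c.

```python
import shutil


def aria2c_command(url, out_dir, out_name, connect_timeout=2, timeout=2):
    """Build an aria2c argv carrying the timeout settings as flags."""
    # These flags mirror the connect-timeout/timeout config lines.
    return [
        shutil.which("aria2c") or "aria2c",
        "--connect-timeout=%d" % connect_timeout,
        "--timeout=%d" % timeout,
        "--dir=%s" % out_dir,
        "--out=%s" % out_name,
        url,
    ]
```

You could then run the result with subprocess.run(..., check=True) to see whether the download completes with those timeouts.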

Windows: proxy support

If you're on Windows behind a proxy, set up the environment variables before running the script as follows:

set HTTP_PROXY=http://host:port
set HTTPS_PROXY=http://host:port

Related discussion: #205
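Python's standard urllib (and the widely used requests package) read these variables from the environment, so setting them in the same console session is usually enough. The sketch below shows what urllib picks up. proxy_settings is a hypothetical helper for checking your setup, and host:port stands for your actual proxy.

```python
import urllib.request


def proxy_settings():
    """Return the proxies urllib would use, plus an opener that applies them."""
    # getproxies() reads the <scheme>_proxy environment variables; on Windows
    # the registry is consulted only when none of them are set.
    proxies = urllib.request.getproxies()
    opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
    return proxies, opener
```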

Windows: Failed to create process

In C:\Users\<user>\AppData\Local\Programs\Python\Python39-32\Scripts, or wherever Python is installed (the path above is the Windows default), edit the following file in IDLE (right-click the script name and select "Edit with IDLE" from the menu):

coursera-dl-script

from

#!c:\users\<user>\appdata\local\programs\python\python39-32\python.exe

to

#!"c:\users\<user>\appdata\local\programs\python\python39-32\python.exe"

(add quotes). This is a known pip bug.

Source: issue #500 and StackOverflow
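If you have several such scripts, the edit can be automated. The sketch below rewrites the first line so that the interpreter path after #! is wrapped in double quotes, assuming that is the form your launcher accepts. quote_shebang and fix_script are hypothetical helpers, so back up the file before running them.

```python
import re

# A "#!" line whose interpreter path does not already start with a quote.
_UNQUOTED = re.compile(r'#!\s*([^"\s].*?)\s*$')


def quote_shebang(line):
    """Return `line` with its interpreter path wrapped in double quotes."""
    match = _UNQUOTED.match(line)
    if match is None:
        return line  # not a shebang, or the path is already quoted
    return '#!"%s"' % match.group(1)


def fix_script(path):
    """Rewrite the first line of the script at `path` in place."""
    with open(path, encoding="utf-8") as fh:
        lines = fh.read().split("\n")
    lines[0] = quote_shebang(lines[0])
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(lines))
```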

SSLError: [Errno 1] _ssl.c:504: error:14094410:SSL routines:SSL3_READ_BYTES:sslv3 alert handshake failure

This is a known error; please do not report it! The problem is in YOUR environment. To fix it, do the following:

sudo apt-get install build-essential python-dev libssl-dev libffi-dev
pip install --user urllib3 pyasn1 ndg-httpsclient pyOpenSSL

If the error remains, try installing coursera-dl from GitHub following these instructions: https://github.com/coursera-dl/coursera-dl#alternative-installation-method-for-unix-systems

If you still have the problem, please read the following issues for more ideas on how to fix it: #330 #377 #329

This is also worth reading: https://urllib3.readthedocs.io/en/latest/security.html#insecureplatformwarning
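The packages above (pyOpenSSL, ndg-httpsclient, pyasn1) are the usual way to give Python 2 interpreters older than 2.7.9 SNI support, which many HTTPS servers require. A quick check of what your interpreter reports can help before and after applying the fix. tls_report is a hypothetical diagnostic that uses only the standard ssl and sys modules, and it does not detect the pyOpenSSL-based workaround.

```python
import ssl
import sys


def tls_report():
    """Collect the interpreter/OpenSSL facts relevant to TLS handshake errors."""
    return {
        "python": "%d.%d.%d" % sys.version_info[:3],
        "openssl": ssl.OPENSSL_VERSION,
        # Server Name Indication; available on the 2.7 line from 2.7.9.
        "has_sni": ssl.HAS_SNI,
        "at_least_2_7_9": sys.version_info >= (2, 7, 9),
    }


if __name__ == "__main__":
    for key, value in sorted(tls_report().items()):
        print("%-15s %s" % (key, value))
```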

Alternative CDN for MathJax.js

When saving a course page, we enable MathJax rendering of math equations by injecting MathJax.js into the page header. By default, the script uses a CDN service provided by mathjax.org. However, that URL is not accessible in some countries/regions. In that case, you can pass the --mathjax-cdn <MATHJAX_CDN> parameter to specify a MathJax.js URL that is accessible in your region.
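The injection step amounts to inserting one script tag into the page head. The sketch below is minimal and assumes the saved page is an HTML string. inject_mathjax is a hypothetical helper, not coursera-dl's implementation. The example URL is the alternative CDN shown in the sample coursera-dl.conf.

```python
import html
import re


def inject_mathjax(page, cdn_url):
    """Insert a MathJax <script> tag just before </head> (hypothetical helper)."""
    tag = '<script type="text/javascript" src="%s"></script>' % html.escape(
        cdn_url, quote=True)
    # Case-insensitive match so pages using <HEAD> are handled too.
    match = re.search(r"</head\s*>", page, flags=re.IGNORECASE)
    if match is None:
        return tag + page  # no </head>: prepend rather than drop the tag
    return page[:match.start()] + tag + page[match.start():]
```

For example: inject_mathjax(saved_html, "https://cdn.bootcss.com/mathjax/2.7.1/MathJax.js").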

Reporting issues

Before reporting any issue please follow the steps below:

  1. Verify that you are running the latest version of the script, and the recommended versions of its dependencies, see them in the file requirements.txt. Use the following command if in doubt:

     pip install --upgrade coursera-dl
    
  2. If the problem persists, feel free to open an issue in our bug tracker, and please fill in the issue template with as much information as possible.

Filing an issue/Reporting a bug

When reporting bugs against coursera-dl, please don't forget to include enough information so that you can help us help you:

  • Is the problem happening with the latest version of the script?
  • What operating system are you using?
  • Do you have all the recommended versions of the modules? See them in the file requirements.txt.
  • What is the course that you are trying to access?
  • What is the precise command line that you are using? (Feel free to hide your username and password with asterisks, but leave all other information untouched.)
  • What are the precise messages that you get? Please run the command with the --debug option before posting the messages in a bug report, and copy and paste them. Don't reword/paraphrase the messages.
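Several of these questions can be answered automatically. The sketch below gathers the interpreter, OS and installed coursera-dl version using only the standard library. bug_report_header is a hypothetical helper. The version lookup assumes coursera-dl was installed with pip under the distribution name coursera-dl, and it needs Python 3.8+ for importlib.metadata (otherwise it reports "unknown").

```python
import platform
import sys

try:
    from importlib.metadata import PackageNotFoundError, version
except ImportError:  # Python < 3.8 has no importlib.metadata
    version = None


def bug_report_header():
    """Gather version facts worth pasting at the top of an issue."""
    installed = "unknown"
    if version is not None:
        try:
            installed = version("coursera-dl")
        except PackageNotFoundError:
            installed = "not found"
    return {
        "coursera-dl": installed,
        "python": platform.python_version(),
        "os": platform.platform(),
        "executable": sys.executable,
    }
```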

Feedback

I enjoy getting feedback. Here are a few of the comments I've received:

  • "Thanks for the good job! Knowledge will flood the World a little more thanks to your script!"
    Guillaume V. 11/8/2012

  • "Just wanted to send you props for your Python script to download Coursera courses. I've been using it in Kenya for my non-profit to get online courses to places where internet is really expensive and unreliable. Mostly kids here can't afford high school, and downloading one of these classes by the usual means would cost more than the average family earns in one week. Thanks!"
    Jay L., Tunapanda 3/20/2013

  • "I am a big fan of Coursera and attend lots of different courses. Time constraints don't allow me to attend all the courses I want at the same time. I came across your script, and I am very happily using it! Great stuff and thanks for making this available on Github - well done!"
    William G. 2/18/2013

  • "This script is awesome! I was painstakingly downloading each and every video and ppt by hand -- looked into wget but ran into wildcard issues with HTML, and then.. I came across your script. Can't tell you how many hours you've just saved me :) If you're ever in Paris / Stockholm, it is absolutely mandatory that I buy you a beer :)"
    Razvan T. 11/26/2012

  • "Thanks a lot! :)"
    Viktor V. 24/04/2013

Contact

Please, post bugs and issues on github. Please, DON'T send support requests privately to the maintainers! We are quite swamped with day-to-day activities. If you have problems, PLEASE, file them on the issue tracker.

coursera-dl's People

Contributors

arlimus, asiviero, azizmb, balta2ar, charlesbickel, federicoceratto, felker, filosottile, github-john-doe, iemejia, jonasdt, jplehmann, leonidvasilyev, mcarpenter, meejah, miguelmalvarez, moiseslodeiro, mxamin, opsxcq, rbrito, rranelli, rsdcastro, rteslaru, santosh, swechha, thegoddessinari, victorwestmann, vladistan, vojnovski, wiedi


coursera-dl's Issues

Download failing when video is missing

John,

One of the Science Writing videos cannot be loaded. When the script hits this video it errors and stops and does not download the later videos. Here is the error.

Great tool. I use it all the time.

Best,
Vivek

SCIWRITE-2012-001_04_Unit_4/07_4.7-_Upcoming_Writing_and_Editing_Assignment.mp4
Downloading https://class.coursera.org/sciwrite-2012-001/lecture/download.mp4?lecture_id=59 -> SCIWRITE-2012-001_04_Unit_4/07_4.7-_Upcoming_Writing_and_Editing_Assignment.mp4
Traceback (most recent call last):
  File "coursera-dl", line 308, in <module>
    main()
  File "coursera-dl", line 302, in main
    args.lecture_filter
  File "coursera-dl", line 193, in download_lectures
    download_file(url, lecfn, cookies_file, wget_bin)
  File "coursera-dl", line 203, in download_file
    download_file_nowget(url, fn, cookies_file)
  File "coursera-dl", line 219, in download_file_nowget
    urlfile = get_opener(cookies_file).open(url)
  File "/usr/lib/python2.6/urllib2.py", line 397, in open
    response = meth(req, response)
  File "/usr/lib/python2.6/urllib2.py", line 510, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.6/urllib2.py", line 435, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.6/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.6/urllib2.py", line 518, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 500: Internal Server Error

HTTP Error 400 for new-courses.

I was trying to download new course 'Internet History, Technology and Security' and got this error:

Traceback (most recent call last):
  File "./coursera-dl", line 235, in <module>
    main()
  File "./coursera-dl", line 220, in main
    page = get_syllabus(args.class_name, args.cookies_file, args.local_page)
  File "./coursera-dl", line 56, in get_syllabus
    page = get_page(url, cookies_file)
  File "./coursera-dl", line 50, in get_page
    return opener.open(url).read()
  File "/usr/lib/python2.7/urllib2.py", line 406, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 519, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 444, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 527, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 400: Bad Request

Problem downloading course files, suspect not making use of http_proxy

Using the latest version of the script to access the dataanalysis-001 lectures, I get:
searlernz:~/coursera/data_analysis/lectures$ python ../../coursera-master/coursera/coursera_dl.py -u username -p pass --curl_bin /usr/bin/curl --debug dataanalysis-001
root[main] Downloading class: dataanalysis-001
Traceback (most recent call last):
  File "../../coursera-master/coursera/coursera_dl.py", line 709, in <module>
    main()
  File "../../coursera-master/coursera/coursera_dl.py", line 703, in main
    download_class(args, class_name)
  File "../../coursera-master/coursera/coursera_dl.py", line 667, in download_class
    or tmp_cookie_file, args.local_page)
  File "../../coursera-master/coursera/coursera_dl.py", line 225, in get_syllabus
    page = get_page(url, cookies_file)
  File "../../coursera-master/coursera/coursera_dl.py", line 201, in get_page
    ret = opener.open(url).read()
  File "/usr/lib/python2.7/urllib2.py", line 406, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 519, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 438, in error
    result = self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 625, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/lib/python2.7/urllib2.py", line 400, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 418, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1215, in https_open
    return self.do_open(httplib.HTTPSConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1177, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [Errno 111] Connection refused>

My http_proxy environment variable is set and I can access the course index URL from firefox without difficulty.
Fails with or without --curl_bin option.

Error in downloading!

$ coursera-dl algo2-2012-001 -u sarthaksahu****@gmail.com -p xxx
usage: coursera_dl.py [-h] (-c COOKIES_FILE | -u USERNAME | -n) [-p PASSWORD]
                      [-f FILE_FORMATS] [-sf SECTION_FILTER]
                      [-lf LECTURE_FILTER] [-w WGET_BIN] [--curl_bin CURL_BIN]
                      [--aria2_bin ARIA2_BIN] [-o] [-l LOCAL_PAGE]
                      [--skip-download] [--path PATH] [--verbose-dirs]
                      [--debug] [--quiet] [--add-class ADD_CLASS]
                      class_names [class_names ...]
coursera_dl.py: error: too few arguments

Newbie problem

This is surely exposing my extreme lack of experience with such things, but I have two problems with running this wonderful script:

  1. I get this error message when running the module... have I missed where to input my login info/ download command?

"usage: Python batch downloader.py [-h](-c COOKIES_FILE | -u USERNAME | -n)
[-p PASSWORD] [-f FILE_FORMATS]
[-sf SECTION_FILTER] [-lf LECTURE_FILTER]
[-w WGET_BIN] [-o] [-l LOCAL_PAGE]
[--skip-download]
class_name
Python batch downloader.py: error: too few arguments

Traceback (most recent call last):
  File "D:/Documents/Desktop/Coursera/Python batch downloader.py", line 309, in <module>
    main()
  File "D:/Documents/Desktop/Coursera/Python batch downloader.py", line 289, in main
    args = parseArgs()
  File "D:/Documents/Desktop/Coursera/Python batch downloader.py", line 272, in parseArgs
    args = parser.parse_args()
  File "C:\Python27\lib\argparse.py", line 1688, in parse_args
    args, argv = self.parse_known_args(args, namespace)
  File "C:\Python27\lib\argparse.py", line 1720, in parse_known_args
    namespace, args = self._parse_known_args(args, namespace)
  File "C:\Python27\lib\argparse.py", line 1937, in _parse_known_args
    self.error(_('too few arguments'))
  File "C:\Python27\lib\argparse.py", line 2347, in error
    self.exit(2, _('%s: error: %s\n') % (self.prog, message))
  File "C:\Python27\lib\argparse.py", line 2335, in exit
    _sys.exit(status)
SystemExit: 2"

  2. When typing the command in the Python shell (after running the module):
    coursera-dl -u <(with my email)> -p <(with my password)> progfun-2012-001

it reports a syntax error at the "@" in my email.

I'm sure I'm missing something obvious, but I have nonetheless spent too much time (admittedly randomly) trying different ways to make this work.

Thank you so much for your help!

Tashi

Probably bad cookies file (or wrong class name)

A related issue is here: https://github.com/jplehmann/coursera/issues/74. Sorry for the duplicate.

Coursera-dl previously worked, and I've already downloaded part of the Data Analysis course. I tried to continue downloading today and got the following error:
sudo python coursera-dl dataanalysis-001 -u ***** -p *****
Downloading class: dataanalysis-001
Downloaded http://class.coursera.org/dataanalysis-001/lecture/index (5332 bytes)
Found 0 sections and 0 lectures on this page
Probably bad cookies file (or wrong class name)

I'm running the newest coursera-dl script, and I also tried to download the innovation-001 course; I get the same error.

Please help.

Not all resources downloaded: directory name files are skipped

Steps to reproduce:

  1. Example: Go to the Electrical Engineering course by Professor Don H. Johnson at Rice University

    https://class.coursera.org/eefun-001/lecture/index

  2. You will see that there are several files to download without a file name

For example, in week 1 these 8 files are not downloaded:

http://cnx.org/content/m0000/latest/

http://cnx.org/content/m0001/latest/

http://cnx.org/content/m0003/latest/

http://cnx.org/content/m0004/latest/

http://cnx.org/content/m0008/latest/

http://cnx.org/content/m0081/latest/

http://cnx.org/content/m0005/latest/

http://cnx.org/content/m0006/latest/

  3. These are valid HTML web pages, which can be downloaded
    (e.g. open any of these 8 URLs in your browser, such as Firefox or Microsoft Internet Explorer,
    and the HTML page opens successfully),
  4. but the latest downloaded version (Sunday 10 March 2013) of coursera-dl.py
    does not download them.
  5. Command line similar to:

python.exe coursera_dl.py -u yourusername -p yourpassword eefun-001

  6. Result: in week 1, for example, there are 24 files to download, but it downloads only 16, skipping exactly these 8 files, which have no filename but only a directory.
  7. Expected: these 8 files should also be downloaded.

Thanks
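A minimal sketch of one way such directory-style URLs could still get a local filename. It assumes, hypothetically, that flattening the URL path into one name with an `.html` extension is acceptable; `filename_for_url` is an illustrative helper, not coursera-dl's actual naming code.

```python
from urllib.parse import urlsplit


def filename_for_url(url, default_ext="html"):
    """Derive a local filename, even when the URL path ends in '/'."""
    parts = urlsplit(url)
    path = parts.path
    if path and not path.endswith("/"):
        # Ordinary file URL: use the last path segment as-is.
        return path.rsplit("/", 1)[-1]
    # Directory-style URL: join the non-empty segments so that
    # .../m0000/latest/ and .../m0001/latest/ get distinct names.
    segments = [s for s in path.split("/") if s]
    stem = "_".join(segments) or parts.netloc
    return "%s.%s" % (stem, default_ext)
```

For `http://cnx.org/content/m0000/latest/` this yields `content_m0000_latest.html`.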

Password complexity issue

When I first tried this script with the -u and -p args I get an error:

bash: !an < rest of the password >: event not found

When I tried with the .netrc file, I got a bad cookies or wrong credentials error, even though the credentials were right. I was about to post this issue, but I went through the code and found these lines:

if args.username and not args.password and not args.netrc:
  args.password = getpass.getpass("Coursera password for %s: " % args.username)

So I entered just my username via -u and the name of the class, without my password. I got the password prompt, and after I entered the password the download started normally. So I guess there is a problem with parsing complex passwords. I'm not much of a coder, so I'm not sure exactly what the issue is; hopefully you will know and can improve this script!

Good script, it's a lifesaver! Cheers! :)
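The `event not found` message comes from bash rather than from the script: in an interactive bash shell, `!` triggers history expansion when it is unquoted or inside double quotes, but not inside single quotes. A small Python 3 sketch (`shell_command` is a hypothetical helper) showing how `shlex.quote` produces a safely quoted command line for a password containing `!`:

```python
import shlex


def shell_command(class_name, username, password):
    """Return a coursera-dl command line that bash can run without
    history expansion on characters such as '!' in the password."""
    argv = ["coursera-dl", "-u", username, "-p", password, class_name]
    # shlex.quote() wraps any argument containing shell metacharacters in
    # single quotes; bash performs no history expansion inside them.
    return " ".join(shlex.quote(arg) for arg in argv)


print(shell_command("saas", "me@example.com", "s3cret!an"))
# coursera-dl -u me@example.com -p 's3cret!an' saas
```

Leaving out `-p` and typing the password at the prompt, as described above, avoids the quoting problem altogether.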

Cookies Load error with Mac/Chrome

Got this error on mac:

$ ./coursera-dl saas -c cookies.txt
/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/_MozillaCookieJar.py:109: UserWarning: cookielib bug!
Traceback (most recent call last):
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/_MozillaCookieJar.py", line 99, in _really_load
    {})
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/cookielib.py", line 739, in __init__
    if expires is not None: expires = int(expires)
ValueError: invalid literal for int() with base 10: '1334338629.053531'

  _warn_unhandled_exception()
Traceback (most recent call last):
  File "./coursera-dl", line 235, in <module>
    main()
  File "./coursera-dl", line 220, in main
    page = get_syllabus(args.class_name, args.cookies_file, args.local_page)
  File "./coursera-dl", line 56, in get_syllabus
    page = get_page(url, cookies_file)
  File "./coursera-dl", line 49, in get_page
    opener = get_opener(cookies_file)
  File "./coursera-dl", line 44, in get_opener
    cj._really_load(cookies, "StringIO.cookies", False, False)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/_MozillaCookieJar.py", line 111, in _really_load
    (filename, line))
cookielib.LoadError: invalid Netscape format cookies file 'StringIO.cookies': 'developers.google.com\tFALSE\t/\tFALSE\t1334338629.053531\tsessionid\t60b9268cf45b27ee8af338942880ef36'
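The traceback shows the Python 2 `cookielib` calling `int()` on a fractional expiry time such as `1334338629.053531`. Below is a sketch, assuming the exported file otherwise holds one cookie per line (tab- or space-separated), that rewrites it as a strict Netscape cookies file (standard header, tab-separated fields, integer expiry) and loads it with Python 3's `http.cookiejar.MozillaCookieJar`. The helper names are hypothetical.

```python
import http.cookiejar

NETSCAPE_HEADER = "# Netscape HTTP Cookie File\n"


def normalize_cookie_line(line):
    """Return one cookie line as 7 tab-separated fields with an integer
    expiry; blank lines and '#' comments are returned unchanged."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return line
    # Split on any run of whitespace; maxsplit=6 keeps the value intact.
    fields = stripped.split(None, 6)
    if len(fields) != 7:
        raise ValueError("expected 7 fields in cookie line: %r" % line)
    # Truncate fractional expiry times such as '1334338629.053531'.
    fields[4] = str(int(float(fields[4])))
    return "\t".join(fields) + "\n"


def normalize_cookies_file(src_path, dst_path):
    """Write a strict Netscape-format copy of src_path to dst_path."""
    with open(src_path, encoding="utf-8") as src:
        lines = src.readlines()
    with open(dst_path, "w", encoding="utf-8") as dst:
        # MozillaCookieJar requires this header on the first line; an
        # existing header further down is then just a comment.
        dst.write(NETSCAPE_HEADER)
        for line in lines:
            dst.write(normalize_cookie_line(line))


def load_cookies(path):
    """Load a Netscape-format cookies file into a cookie jar."""
    jar = http.cookiejar.MozillaCookieJar()
    jar.load(path)
    return jar
```

Usage: `normalize_cookies_file("cookies.txt", "cookies-fixed.txt")`, then pass the fixed file with `-c`.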

No Lectures Found for Current Class Error

Hi, I get the following error when I run this command:

./coursera-dl pgm -c cookies.txt

Output:

Found 0 sections and 0 lectures on this page
Probably bad cookies file (or wrong class name)

The pgm i.e. Probabilistic Graphical Models class is currently going on and one can even preview some of the lectures here: https://class.coursera.org/pgm/lecture/preview

I have a valid Coursera account (however, I could not enroll in the class because I was too late; hence this business of downloading the videos). I am not sure about the cookie error and why I get it.

I used the firefox extension to create the cookies.txt file.

Please respond.

Thanks.

Crash if file not accessible

The Modelthinking course has file(s?) that can't be read. This gives an exception, and the whole download aborts.

I added a catch all clause after line 159 in method download_file(..) to change this.

159 sys.exit()
+160 except:
+161 print "\nXXXX Didnt work -- Removing partial file:", fn

Thanks for the downloader! This was a big help.
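A sketch of the same idea in Python 3, assuming the download goes through `urllib.request`: catch the failure, delete any partial file, and report failure so the caller can move on to the next resource. `download_file_safely` is a hypothetical name. Unlike the bare `except:` above, it catches only `OSError` (of which `urllib.error.URLError` and `HTTPError` are subclasses), so Ctrl-C still stops the run.

```python
import os
import shutil
import urllib.request


def download_file_safely(url, fn):
    """Download url to fn. On failure, remove any partial file and return
    False so the caller can continue with the next resource."""
    try:
        # urlopen runs before the output file is opened, so a failed
        # request never creates an empty file.
        with urllib.request.urlopen(url) as response, open(fn, "wb") as out:
            shutil.copyfileobj(response, out)
        return True
    except OSError as exc:  # URLError and HTTPError subclass OSError
        print("Could not download %s (%s); removing partial file %s" % (url, exc, fn))
        if os.path.exists(fn):
            os.remove(fn)
        return False
```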

on Windows: non-wget download creates bad files

On Windows 7, the default Python download code creates video files that are larger than they should be (and of course don't play).

Current workaround is to use a wget binary with the -w option.

Inconsistent output written to stdout/stderr

It used to be that everything was written to stdout. Now some things are written to stdout (like the number of bytes being downloaded), while the line with the filename of what is being downloaded is written to stderr. I'm not sure why the change was made. It seemed more consistent when everything went to stdout.
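A minimal sketch, assuming Python's standard `logging` module, of routing every progress message through one logger bound to a single stream, so that filenames and byte counts end up in the same place. `make_logger` is a hypothetical helper.

```python
import logging
import sys


def make_logger(stream=None, name="coursera-dl-sketch"):
    """Return a logger that writes plain messages to a single stream
    (sys.stdout unless another stream is given)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate output if called twice
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
    logger.propagate = False  # don't also emit via the root logger
    return logger
```

Passing `sys.stderr` instead would follow the other common convention (diagnostics on stderr); the point is that one handler decides for all messages.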

cookielib.LoadError: invalid Netscape format cookies file 'StringIO.cookies': 'Name: csrf_token'

vijayram@ubuntu:~/coursera/coursera-1$ ./coursera-dl.py -c ../class.coursera.org_csrf_token.txt nlp
/usr/lib/python2.7/_MozillaCookieJar.py:109: UserWarning: cookielib bug!
Traceback (most recent call last):
  File "/usr/lib/python2.7/_MozillaCookieJar.py", line 71, in _really_load
    line.split("\t")
ValueError: need more than 1 value to unpack

  _warn_unhandled_exception()
Traceback (most recent call last):
  File "./coursera-dl.py", line 235, in <module>
    main()
  File "./coursera-dl.py", line 220, in main
    page = get_syllabus(args.class_name, args.cookies_file, args.local_page)
  File "./coursera-dl.py", line 56, in get_syllabus
    page = get_page(url, cookies_file)
  File "./coursera-dl.py", line 49, in get_page
    opener = get_opener(cookies_file)
  File "./coursera-dl.py", line 44, in get_opener
    cj._really_load(cookies, "StringIO.cookies", False, False)
  File "/usr/lib/python2.7/_MozillaCookieJar.py", line 111, in _really_load
    (filename, line))
cookielib.LoadError: invalid Netscape format cookies file 'StringIO.cookies': 'Name: csrf_token'
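The line quoted in this error, `Name: csrf_token`, is not in Netscape format at all, which suggests the file came from a different kind of export. A small Python 3 checker (`find_bad_cookie_lines` is a hypothetical helper) that reports each line that is not a 7-field, tab-separated cookie line with an integer (or empty) expiry:

```python
def find_bad_cookie_lines(text):
    """Return (line_number, problem) pairs for lines that are not valid
    Netscape cookie lines. Blank lines and '#' comments are ignored."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 7:
            problems.append(
                (number, "expected 7 tab-separated fields, found %d" % len(fields)))
        elif fields[4] and not fields[4].isdigit():
            # An empty expiry (session cookie) is allowed.
            problems.append((number, "expiry %r is not an integer" % fields[4]))
    return problems
```

Usage: `print(find_bad_cookie_lines(open("cookies.txt").read()))`.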

Support downloading annotated PDF files.

Many lectures offer an annotated PDF file and a non-annotated one at the same time, so it would be very useful if the downloader could download both of them.

New feature - PDFs of quizzes and other materials

This is clearly a new feature (one that would be nice, I think). Currently, to keep copies of the quizzes, I open them in Chrome and print them with "save to PDF". I do the same with the class syllabus and other materials. It would be nice if this could be automated in this program. I know someone who did a version of this in a different Python Coursera downloader, adding functionality that uses wkhtmltopdf to convert HTML to PDF: it finds the quizzes, downloads them as HTML files, and then does the conversion. Unfortunately, I found that wkhtmltopdf blew up (threw an exception) on my Windows box. It would be nice if the syllabus and similar pages were also saved as PDF. One last thing to point out (should you decide to do this): the announcements (aka "home") page typically changes at least once per week, so it might be good to recreate it every time.

Not being able to download any video

I am facing problems downloading any video. The following is the error that I receive:
Traceback (most recent call last):
  File "coursera-dl", line 1, in <module>
    coursera/coursera_dl.py
NameError: name 'coursera' is not defined
I have downloaded the latest version of coursera-dl, but the problem does not go away. I am giving it the right username and password. Can someone tell me what I am doing wrong? Thank you.

Regards,

Ramana

Password on command line potentially insecure

A password on the command line may be visible system-wide in the process listing and may be written to the user's shell history.

It would be better to prompt for the password on the terminal rather than just exiting if it is not supplied.
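A minimal sketch of that behaviour with `argparse` and `getpass`, assuming a simplified subset of the real options (`-u`, `-p`, class names). The `prompt` parameter is a hypothetical addition that exists only so the function can be tested without a terminal.

```python
import argparse
import getpass


def parse_credentials(argv, prompt=getpass.getpass):
    """Parse a simplified coursera-dl style command line; if -u is given
    without -p, ask for the password instead of failing."""
    parser = argparse.ArgumentParser(prog="coursera-dl")
    parser.add_argument("-u", "--username")
    parser.add_argument("-p", "--password")
    parser.add_argument("class_names", nargs="+")
    args = parser.parse_args(argv)
    if args.username and not args.password:
        # getpass reads from the terminal without echoing, so the password
        # never appears in the process list or the shell history.
        args.password = prompt("Coursera password for %s: " % args.username)
    return args
```

Usage: `parse_credentials(["-u", "me@example.com", "saas"])` prompts for the password interactively.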

/usr/lib/python2.6/_MozillaCookieJar.py:109: UserWarning: cookielib bug!

log is

/usr/lib/python2.6/_MozillaCookieJar.py:109: UserWarning: cookielib bug!
Traceback (most recent call last):
  File "/usr/lib/python2.6/_MozillaCookieJar.py", line 99, in _really_load
    {})
  File "/usr/lib/python2.6/cookielib.py", line 738, in __init__
    if expires is not None: expires = int(expires)
ValueError: invalid literal for int() with base 10: '1349111445.24318'

  _warn_unhandled_exception()
Traceback (most recent call last):
  File "./coursera-dl", line 235, in <module>
    main()
  File "./coursera-dl", line 220, in main
    page = get_syllabus(args.class_name, args.cookies_file, args.local_page)
  File "./coursera-dl", line 56, in get_syllabus
    page = get_page(url, cookies_file)
  File "./coursera-dl", line 49, in get_page
    opener = get_opener(cookies_file)
  File "./coursera-dl", line 44, in get_opener
    cj._really_load(cookies, "StringIO.cookies", False, False)
  File "/usr/lib/python2.6/_MozillaCookieJar.py", line 111, in _really_load
    (filename, line))
cookielib.LoadError: invalid Netscape format cookies file 'StringIO.cookies': 'www.coursera.org\tFALSE\t/\tFALSE\t1349111445.24318\tsessionid\t80b3f5ab0bf5fe0e19c7383606de7072'

"IndexError: list index out of range" error running coursera-dl

The following error consistently arises when running the program for several courses:

python coursera/coursera-dl compfinance-002
Downloading class: compfinance-002
Downloaded http://class.coursera.org/compfinance-002/lecture/index (208676 bytes)
Introduction
  Welcome_to_Introduction_to_Computational_Finance_and_Financial_Econometrics
Week_1-_Time_Value_of_Money
  1.0_Week_1_Introduction
Week_1-_Simple_Returns
  1.1_Future_Value_Present_Value_and_Compounding
  1.2_Asset_Returns
  1.3_Portfolio_Returns
  1.4_Dividends
  1.5_Inflation
  1.6_Annualizing_Returns
Week_1-_Continuously_Compounded_Returns
  1.7_Continuously_Compounded_Returns
  1.8_CC_Portfolio_Returns_and_Inflation
Week_1-_Excel_Examples
  1.9_Simple_Returns
  1.10_Getting_Financial_Data_from_Yahoo
  1.11_Return_Calculations
  1.12_Growth_of_1
Week_2-_Probability_Review
  2.0_Week_2_Introduction
  2.1_Univariate_Random_Variables
  2.2_Cumulative_Distribution_Function
  2.3_Quantiles
  2.4_Standard_Normal_Distribution
  2.5_Expected_Value_and_Standard_Deviation
  2.6_General_Normal_Distribution
  2.7_Standard_Deviation_as_a_Measure_of_Risk
  2.8_Normal_Distribution-_Appropriate_for_simple_returns
  2.9_Skewness_and_Kurtosis
  2.10_Students-t_Distribution
  2.11_Linear_Functions_of_Random_Variables
Week_2-_Example
  2.12_Value_at_Risk
Traceback (most recent call last):
  File "coursera/coursera-dl", line 709, in <module>
    main()
  File "coursera/coursera-dl", line 703, in main
    download_class(args, class_name)
  File "coursera/coursera-dl", line 671, in download_class
    or tmp_cookie_file, args.reverse)
  File "coursera/coursera-dl", line 277, in parse_syllabus
    section_name = clean_filename(stag.contents[0].contents[1])
IndexError: list index out of range

not all resources downloaded

If multiple files in a resource have the same extension, they are not all downloaded; only the last one gets downloaded.
This is probably a bug in parse_syllabus.

Please look into it
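If, as suspected, resources are stored in a mapping keyed by file extension, each new link with the same extension overwrites the previous one. A sketch of the alternative under that assumption: keep a list per extension and number the filenames when there is more than one. `group_resources` and `resource_filenames` are hypothetical helpers, not functions from coursera-dl.

```python
from collections import defaultdict


def group_resources(links):
    """Group (extension, url) pairs by extension, keeping every URL."""
    grouped = defaultdict(list)
    for ext, url in links:
        grouped[ext].append(url)
    return dict(grouped)


def resource_filenames(base, grouped):
    """Return (filename, url) pairs; add _1, _2, ... only when an
    extension occurs more than once."""
    names = []
    for ext, urls in grouped.items():
        for i, url in enumerate(urls, start=1):
            suffix = "_%d" % i if len(urls) > 1 else ""
            names.append(("%s%s.%s" % (base, suffix, ext), url))
    return names
```

This also covers lectures that offer both an annotated and a plain PDF.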

A question about Python code for extracting videos from Coursera

I have written some Python code to extract videos from Coursera, but the code below does not work.
It raises the error "urllib.error.HTTPError: HTTP Error 403: FORBIDDEN".
I know jplehmann/coursera is a popular project for Coursera, and I hope you can help me.
Thank you very much!

import http.cookiejar
import urllib.parse
import urllib.request

login_page = "https://www.coursera.org/account/signin"
def set_cookie(username,password):
    cj = http.cookiejar.CookieJar()
    # Hand the jar to the processor so cookies set at login are kept and re-sent
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
    values = {"signin-email":username,
              "signin-password":password,
              "login:":"Login"}
    data = urllib.parse.urlencode(values)
    binary_data = data.encode(encoding='utf-8', errors='strict')
    headers = {"User-Agent":"Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6"}
    req = urllib.request.Request(login_page,binary_data,headers)
    opener.open(req)

    with open("1.txt",encoding='utf-8',mode='w') as record_file:
        op = opener.open("https://www.coursera.org")
        record_file.write(op.read().decode('utf-8'))

Index Out Of Range Error

I get the following error running the latest code to date with this command:
./coursera-dl compfinance-2012-001 -u -p
(where user and pw are filled in)

Downloaded http://class.coursera.org/compfinance-2012-001/lecture/index (162511 bytes)
Introduction
Welcome_to_Introduction_to_Computational_Finance_and_Financial_Econometrics
None https://class.coursera.org/compfinance-2012-001/lecture/31
Traceback (most recent call last):
  File "./coursera-dl", line 308, in <module>
    main()
  File "./coursera-dl", line 292, in main
    sections = parse_syllabus(page, args.cookies_file or tmp_cookie_file)
  File "./coursera-dl", line 145, in parse_syllabus
    href = grab_hidden_video_url(a['data-lecture-view-link'], cookies_file)
  File "./coursera-dl", line 87, in grab_hidden_video_url
    return l[0]['src']
IndexError: list index out of range
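`l[0]['src']` fails whenever the list of matching tags is empty, for example (an assumption) when the fetched page is a login or error page rather than the expected lecture view. A defensive sketch using only the standard library's `html.parser`, assuming the video URL sits in the `src` attribute of a `<source>` tag; it returns `None` instead of raising, so the caller can skip the lecture. `find_video_src` is a hypothetical helper.

```python
from html.parser import HTMLParser


class _SourceFinder(HTMLParser):
    """Collect the src attribute of every <source> tag."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "source":
            src = dict(attrs).get("src")
            if src:
                self.srcs.append(src)


def find_video_src(html):
    """Return the first <source src=...> URL in html, or None if absent."""
    finder = _SourceFinder()
    finder.feed(html)
    finder.close()
    return finder.srcs[0] if finder.srcs else None
```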

Chrome's `cookie.txt export` plugin does not produce usable cookie file.

I've solved my issue by using a different cookie export plugin in Firefox. Copy-and-pasting from the Chrome plugin does not produce a usable file, even when tabs are preserved.

mike@*****:/*****$ ./coursera-dl/coursera-dl -c cookies.txt ml
/usr/lib/python2.7/_MozillaCookieJar.py:109: UserWarning: cookielib bug!
Traceback (most recent call last):
  File "/usr/lib/python2.7/_MozillaCookieJar.py", line 99, in _really_load
    {})
  File "/usr/lib/python2.7/cookielib.py", line 739, in __init__
    if expires is not None: expires = int(expires)
ValueError: invalid literal for int() with base 10: '1344973754.473932'

  _warn_unhandled_exception()
Traceback (most recent call last):
  File "./coursera-dl/coursera-dl", line 235, in <module>
    main()
  File "./coursera-dl/coursera-dl", line 220, in main
    page = get_syllabus(args.class_name, args.cookies_file, args.local_page)
  File "./coursera-dl/coursera-dl", line 56, in get_syllabus
    page = get_page(url, cookies_file)
  File "./coursera-dl/coursera-dl", line 49, in get_page
    opener = get_opener(cookies_file)
  File "./coursera-dl/coursera-dl", line 44, in get_opener
    cj._really_load(cookies, "StringIO.cookies", False, False)
  File "/usr/lib/python2.7/_MozillaCookieJar.py", line 111, in _really_load
    (filename, line))
cookielib.LoadError: invalid Netscape format cookies file 'StringIO.cookies': 'www.coursera.org\tFALSE\t/\tFALSE\t1344973754.473932\tsessionid\t28eeaeea129425a90f0f08bfff38ea38'

-n broken on windows

I have a version from January 16 and it works fine. However, the current version generates the following error:

coursera_dl.py: error: argument -n/--netrc: expected one argument

I'm executing:

python coursera_dl.py somecourse -n

I don't know why it would expect an argument.

HOME is set to my user directory and there is a .netrc file there (which is why the January 16 version works). The only thing I'm changing is the version of coursera_dl.py. I don't know python, so I haven't looked at the issue.
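The message suggests that the newer version declares `-n` as an option that takes a value. One way to accept both `-n` alone and `-n PATH` is `nargs="?"` with a `const`. This is a sketch with a simplified option set modelled on the usage text quoted earlier in this thread, not coursera-dl's actual parser.

```python
import argparse


def build_parser():
    """Simplified parser: exactly one of -c, -u or -n, plus class names."""
    parser = argparse.ArgumentParser(prog="coursera_dl.py")
    auth = parser.add_mutually_exclusive_group(required=True)
    auth.add_argument("-c", "--cookies_file")
    auth.add_argument("-u", "--username")
    # nargs="?": "-n" alone yields const (True, meaning in this sketch
    # "use the default .netrc"); "-n PATH" yields PATH; no -n gives False.
    auth.add_argument("-n", "--netrc", nargs="?", const=True, default=False)
    parser.add_argument("class_names", nargs="+")
    return parser
```

With this declaration, `-n` should follow the class names (or be given a path); otherwise argparse takes the next word as the netrc path.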

[Question] Does the script overwrite already downloaded video?

Hi,

I might be using the script multiple times on the same course, for instance when a new week of videos is put online. Will the script skip already downloaded sections, or will it try to download everything from the start?

Script worked great to get me proglang videos :) Thanks a ton.
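A later log in this thread prints "already downloaded" for an existing file, which suggests newer versions skip files that are already present. A minimal sketch of such a check, assuming a file counts as done when it exists and is non-empty; `should_download` is a hypothetical helper.

```python
import os


def should_download(fn, overwrite=False):
    """Return True unless fn already exists with a non-zero size
    (and overwriting was not requested)."""
    if overwrite:
        return True
    return not (os.path.exists(fn) and os.path.getsize(fn) > 0)
```

A size check alone cannot tell a complete file from an interrupted download.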

Getting urllib2.URLError: <urlopen error [Errno 8] _ssl.c:504: EOF occurred in violation of protocol>

Traceback (most recent call last):
  File "./coursera-dl.py", line 235, in <module>
    main()
  File "./coursera-dl.py", line 231, in main
    args.lecture_filter
  File "./coursera-dl.py", line 145, in download_lectures
    download_file(url, lecfn, cookies_file, wget_bin)
  File "./coursera-dl.py", line 155, in download_file
    download_file_nowget(url, fn, cookies_file)
  File "./coursera-dl.py", line 171, in download_file_nowget
    urlfile = get_opener(cookies_file).open(url)
  File "/usr/lib/python2.7/urllib2.py", line 400, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 418, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1215, in https_open
    return self.do_open(httplib.HTTPSConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1177, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [Errno 8] _ssl.c:504: EOF occurred in violation of protocol>
vijayram@ubuntu:~/coursera/coursera-1$

Downloaded videos are all 7579 bytes

I'm running Python 2.7.3 on Arch Linux. Everything works fine, and I can download the other files (pdf, pptx), but the mp4 files are all unplayable.

Here's what the console is showing

ALGO_01_I._INTRODUCTION/01_Introduction_-_Why_Study_Algorithms_.mp4
Downloading https://class.coursera.org/algo/lecture/download.mp4?lecture_id=20 ->  ALGO_01_I._INTRODUCTION/01_Introduction_-_Why_Study_Algorithms_.mp4
7579 bytes read .
ALGO_01_I._INTRODUCTION/02_About_the_Course.mp4
Downloading https://class.coursera.org/algo/lecture/download.mp4?lecture_id=21 ->  ALGO_01_I._INTRODUCTION/02_About_the_Course.mp4
7579 bytes read .
ALGO_01_I._INTRODUCTION/03_Merge_Sort-_Motivation_and_Example.mp4
Downloading https://class.coursera.org/algo/lecture/download.mp4?lecture_id=1 ->   ALGO_01_I._INTRODUCTION/03_Merge_Sort-_Motivation_and_Example.mp4
7578 bytes read .
ALGO_01_I._INTRODUCTION/04_Merge_Sort-_Pseudocode.mp4
Downloading https://class.coursera.org/algo/lecture/download.mp4?lecture_id=2 ->     ALGO_01_I._INTRODUCTION/04_Merge_Sort-_Pseudocode.mp4
7578 bytes read .
ALGO_01_I._INTRODUCTION/05_Merge_Sort-_Analysis.mp4
Downloading https://class.coursera.org/algo/lecture/download.mp4?lecture_id=3 ->     ALGO_01_I._INTRODUCTION/05_Merge_Sort-_Analysis.mp4

They're all the same size, and I can't figure out why.
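Identical, very small sizes usually mean the server returned the same short page for every request, for example an HTML error or login page. A sketch for checking this, assuming that MP4 files start with an `ftyp` box whose type sits at byte offset 4, as is typical; `sniff_media` is a hypothetical helper that classifies the first bytes of a downloaded file.

```python
def sniff_media(head):
    """Classify the first bytes of a downloaded file as 'mp4', 'html'
    or 'unknown'."""
    # MP4 files normally begin with a box whose type, at offset 4, is b'ftyp'.
    if len(head) >= 8 and head[4:8] == b"ftyp":
        return "mp4"
    start = head.lstrip().lower()
    if start.startswith(b"<!doctype html") or start.startswith(b"<html"):
        return "html"
    return "unknown"
```

Usage: `with open(path, "rb") as f: print(sniff_media(f.read(64)))`.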

Downloading Issue

Following is the error I get when I download any course. Please let me know if anyone has any idea. I am using Python 2.6.6.

compmethods-2012-001\31_Week_10-Lecture_28-_Global_normal_forms_of_bifurcation_structures_in_PDEs\04_W10_L28_P4-reduction_of_a_neuro-sensory_systems.srt already downloaded
Traceback (most recent call last):
  File "./coursera_dl.py", line 820, in <module>
    main()
  File "./coursera_dl.py", line 810, in main
    if download_class(args, class_name):
  File "./coursera_dl.py", line 790, in download_class
    args.verbose_dirs,
  File "./coursera_dl.py", line 454, in download_lectures
    if time.time() - last_update > datetime.timedelta(days=30).total_seconds():
AttributeError: 'datetime.timedelta' object has no attribute 'total_seconds'
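`timedelta.total_seconds()` was added in Python 2.7, so it does not exist on the Python 2.6.6 reported here. A portable replacement, using the equivalent formula given in the Python 2.7 documentation (`total_seconds` here is a standalone helper, not a method):

```python
import datetime


def total_seconds(td):
    """Portable equivalent of timedelta.total_seconds() for Python < 2.7."""
    # 10.0 ** 6 forces float division on Python 2 as well.
    return (td.microseconds + (td.seconds + td.days * 24 * 3600) * 10 ** 6) / 10.0 ** 6


thirty_days = total_seconds(datetime.timedelta(days=30))  # 2592000.0
```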

downloads mp4, srt, txt, but doesn't download pdf

https://class.coursera.org/hetero-2012-001

H:\_learning>cd H:\_learning\hetero-2012-001
Downloaded http://class.coursera.org/hetero-2012-001/lecture/index (19738 bytes)
Week_1_Section_1
   Lecture_0-_Course_Overview
     None https://class.coursera.org/hetero-2012-001/lecture/3
     txt https://class.coursera.org/hetero-2012-001/lecture/subtitles?q=3_en&format=txt
     srt https://class.coursera.org/hetero-2012-001/lecture/subtitles?q=3_en&format=srt
     mp4 https://class.coursera.org/hetero-2012-001/lecture/download.mp4?lecture_id=3
   Lecture_1.1-_Introduction_to_Heterogeneous_Parallel_Programming
     None https://class.coursera.org/hetero-2012-001/lecture/9
     txt https://class.coursera.org/hetero-2012-001/lecture/subtitles?q=9_en&format=txt
     srt https://class.coursera.org/hetero-2012-001/lecture/subtitles?q=9_en&format=srt
     mp4 https://class.coursera.org/hetero-2012-001/lecture/download.mp4?lecture_id=9

and so on

SSL problem with downloading videos (only) from nlp-class

Hi,

I am unable to download videos from the nlp course website. I have tried recreating cookies, changing browsers but nothing worked. Pasting the backtrace below:

Downloaded http://class.coursera.org/nlp/lecture/index (174982 bytes)
Week_1_-_Course_Introduction
   Course_Introduction
     None https://class.coursera.org/nlp/lecture/view?lecture_id=124 
     pptx https://d19vezwu8eufl6.cloudfront.net/nlp/slides%2Fintro.pptx
     pdf https://d19vezwu8eufl6.cloudfront.net/nlp/slides%2Fintro.pdf
     txt https://class.coursera.org/nlp/lecture/subtitles?q=124_en&format=txt
     srt https://class.coursera.org/nlp/lecture/subtitles?q=124_en&format=srt
     mp4 https://class.coursera.org/nlp/lecture/download.mp4?lecture_id=124

(trimmed)

   Evaluating_Search_Engines
     None https://class.coursera.org/nlp/lecture/view?lecture_id=190 
     pptx https://d19vezwu8eufl6.cloudfront.net/nlp/slides%2F05-02-09-IR-EvalSearchEngines-abridged.pptx
     pdf https://d19vezwu8eufl6.cloudfront.net/nlp/slides%2F05-02-09-IR-EvalSearchEngines-abridged.pdf
     mp4 https://class.coursera.org/nlp/lecture/download.mp4?lecture_id=190
Found 19 sections and 87 lectures on this page
NLP_01_Week_1_-_Course_Introduction/01_Course_Introduction.pptx
Downloading https://d19vezwu8eufl6.cloudfront.net/nlp/slides%2Fintro.pptx -> NLP_01_Week_1_-_Course_Introduction/01_Course_Introduction.pptx
Traceback (most recent call last):
  File "/home/abhinav/development/coursera/coursera-dl", line 235, in <module>
    main()
  File "/home/abhinav/development/coursera/coursera-dl", line 231, in main
    args.lecture_filter
  File "/home/abhinav/development/coursera/coursera-dl", line 145, in download_lectures
    download_file(url, lecfn, cookies_file, wget_bin)
  File "/home/abhinav/development/coursera/coursera-dl", line 155, in download_file
    download_file_nowget(url, fn, cookies_file)
  File "/home/abhinav/development/coursera/coursera-dl", line 171, in download_file_nowget
    urlfile = get_opener(cookies_file).open(url)
  File "/usr/lib/python2.7/urllib2.py", line 400, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 418, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1215, in https_open
    return self.do_open(httplib.HTTPSConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1177, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [Errno 1] _ssl.c:504: error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure>

on Ubuntu: it doesn't like the cookies file produced by Export Cookies FF extension

Using the cookies.txt file saved by the Export Cookies FF extension I get:
$ ./coursera-dl saas -c ./cookies.txt
Downloaded http://class.coursera.org/saas/lecture/index (14530 bytes)
Found 0 sections and 0 lectures on this page
Probably bad cookies file (or wrong class name)

However, if I download the index with wget using the same cookies file and then pass the -w parameter to coursera-dl, it downloads happily, so I think something is wrong with the handling of cookies in coursera-dl.

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 11.10
Release: 11.10
Codename: oneiric

ii python 2.7.2-7ubuntu2
ii python-argparse 1.1-1ubuntu1
ii python-beautifulsoup 3.2.0-2

Authentication via cookies doesn't work anymore

Hi.

It seems that Coursera has changed its site to require a session cookie, so the trick of exporting cookies from the browser doesn't work anymore.

OTOH, using wiedi/coursera@38c92a2 makes things work again.

Well, I actually pulled all of @wiedi's patches, but reverted the ones that tweaked the naming of the files, as I prefer how things currently are. :)

BTW, have you considered getting the code in our youtube-dl tree?

Regards.

Unable to download anything (authentication failure)

It appears that the Coursera folks have changed where you access the videos again. The script seems to log in OK, but it finds "0 sections and 0 lectures". I tried both with the netrc file and with my username and password given explicitly.

invalid netscape format cookies file

Attempting to download the NLP videos with cookies.txt from the chrome extension, I get:

/usr/lib64/python2.7/_MozillaCookieJar.py:109: UserWarning: cookielib bug!
Traceback (most recent call last):
  File "/usr/lib64/python2.7/_MozillaCookieJar.py", line 71, in _really_load
    line.split("\t")
ValueError: need more than 1 value to unpack

  _warn_unhandled_exception()
Traceback (most recent call last):
  File "/home/andy/bin/coursera-dl", line 198, in <module>
    main()
  File "/home/andy/bin/coursera-dl", line 193, in main
    page = get_syllabus(args.class_name, args.cookies_file, args.local_page)
  File "/home/andy/bin/coursera-dl", line 56, in get_syllabus
    page = get_page(url, cookies_file)
  File "/home/andy/bin/coursera-dl", line 49, in get_page
    opener = get_opener(cookies_file)
  File "/home/andy/bin/coursera-dl", line 44, in get_opener
    cj._really_load(cookies, "StringIO.cookies", False, False)
  File "/usr/lib64/python2.7/_MozillaCookieJar.py", line 111, in _really_load
    (filename, line))
cookielib.LoadError: invalid Netscape format cookies file 'StringIO.cookies': 'www.coursera.org    FALSE   /nlp    FALSE   1335746269  csrf_token  Tdh8Cj1qQGZ4AD7N7VWZ'

It doesn't download videos

Hello,

I'm using openSUSE 11.4 to download the courses with the following arguments:
python coursera-dl compfinance-2012-001 -u -p

What works and what doesn't (for me)

  • Works: Successfully parses and downloads .srt and .txt files.
  • Doesn't work: It doesn't download the videos.

log:
Downloaded http://class.coursera.org/compfinance-2012-001/lecture/index (75353 bytes)
Introduction
Welcome_to_Introduction_to_Computational_Finance_and_Financial_Econometrics
None https://class.coursera.org/compfinance-2012-001/lecture/31
Week_1-_Time_Value_of_Money
1.0_Week_1_Introduction
None https://class.coursera.org/compfinance-2012-001/lecture/29
1.1_Future_Value_Present_Value_and_Compounding
None https://class.coursera.org/compfinance-2012-001/lecture/13
txt https://class.coursera.org/compfinance-2012-001/lecture/subtitles?q=13_en&format=txt
srt https://class.coursera.org/compfinance-2012-001/lecture/subtitles?q=13_en&format=srt
Week_1-_Simple_Returns
1.2_Asset_Returns
None https://class.coursera.org/compfinance-2012-001/lecture/3
txt https://class.coursera.org/compfinance-2012-001/lecture/subtitles?q=3_en&format=txt
srt https://class.coursera.org/compfinance-2012-001/lecture/subtitles?q=3_en&format=srt
1.3_Portfolio_Returns
None https://class.coursera.org/compfinance-2012-001/lecture/12
txt https://class.coursera.org/compfinance-2012-001/lecture/subtitles?q=12_en&format=txt
srt https://class.coursera.org/compfinance-2012-001/lecture/subtitles?q=12_en&format=srt
1.4_Dividends
None https://class.coursera.org/compfinance-2012-001/lecture/6
txt https://class.coursera.org/compfinance-2012-001/lecture/subtitles?q=6_en&format=txt
srt https://class.coursera.org/compfinance-2012-001/lecture/subtitles?q=6_en&format=srt
1.5_Inflation
None https://class.coursera.org/compfinance-2012-001/lecture/11
txt https://class.coursera.org/compfinance-2012-001/lecture/subtitles?q=11_en&format=txt
srt https://class.coursera.org/compfinance-2012-001/lecture/subtitles?q=11_en&format=srt
1.6_Annualizing_Returns
None https://class.coursera.org/compfinance-2012-001/lecture/2
txt https://class.coursera.org/compfinance-2012-001/lecture/subtitles?q=2_en&format=txt
srt https://class.coursera.org/compfinance-2012-001/lecture/subtitles?q=2_en&format=srt
Week_1-_Continuously_Compounded_Returns
1.7_Continuously_Compounded_Returns
None https://class.coursera.org/compfinance-2012-001/lecture/5
txt https://class.coursera.org/compfinance-2012-001/lecture/subtitles?q=5_en&format=txt
srt https://class.coursera.org/compfinance-2012-001/lecture/subtitles?q=5_en&format=srt
1.8_CC_Portfolio_Returns_and_Inflation
None https://class.coursera.org/compfinance-2012-001/lecture/4
txt https://class.coursera.org/compfinance-2012-001/lecture/subtitles?q=4_en&format=txt
srt https://class.coursera.org/compfinance-2012-001/lecture/subtitles?q=4_en&format=srt
... etc ...
Found 12 sections and 56 lectures on this page
COMPFINANCE-2012-001_02_Week_1-_Time_Value_of_Money/02_1.1_Future_Value_Present_Value_and_Compounding.txt
COMPFINANCE-2012-001_02_Week_1-_Time_Value_of_Money/02_1.1_Future_Value_Present_Value_and_Compounding.srt
COMPFINANCE-2012-001_03_Week_1-_Simple_Returns/01_1.2_Asset_Returns.txt
COMPFINANCE-2012-001_03_Week_1-_Simple_Returns/01_1.2_Asset_Returns.srt
COMPFINANCE-2012-001_03_Week_1-_Simple_Returns/02_1.3_Portfolio_Returns.txt
COMPFINANCE-2012-001_03_Week_1-_Simple_Returns/02_1.3_Portfolio_Returns.srt
COMPFINANCE-2012-001_03_Week_1-_Simple_Returns/03_1.4_Dividends.txt
COMPFINANCE-2012-001_03_Week_1-_Simple_Returns/03_1.4_Dividends.srt
COMPFINANCE-2012-001_03_Week_1-_Simple_Returns/04_1.5_Inflation.txt
COMPFINANCE-2012-001_03_Week_1-_Simple_Returns/04_1.5_Inflation.srt
COMPFINANCE-2012-001_03_Week_1-_Simple_Returns/05_1.6_Annualizing_Returns.txt
COMPFINANCE-2012-001_03_Week_1-_Simple_Returns/05_1.6_Annualizing_Returns.srt
COMPFINANCE-2012-001_04_Week_1-_Continuously_Compounded_Returns/01_1.7_Continuously_Compounded_Returns.txt
COMPFINANCE-2012-001_04_Week_1-_Continuously_Compounded_Returns/01_1.7_Continuously_Compounded_Returns.srt
COMPFINANCE-2012-001_04_Week_1-_Continuously_Compounded_Returns/02_1.8_CC_Portfolio_Returns_and_Inflation.txt
COMPFINANCE-2012-001_04_Week_1-_Continuously_Compounded_Returns/02_1.8_CC_Portfolio_Returns_and_Inflation.srt
COMPFINANCE-2012-001_05_Week_1-_Excel_Examples/01_1.9_Simple_Returns.txt
COMPFINANCE-2012-001_05_Week_1-_Excel_Examples/01_1.9_Simple_Returns.srt
COMPFINANCE-2012-001_05_Week_1-_Excel_Examples/02_1.10_Getting_Financial_Data_from_Yahoo.txt
COMPFINANCE-2012-001_05_Week_1-_Excel_Examples/02_1.10_Getting_Financial_Data_from_Yahoo.srt
COMPFINANCE-2012-001_05_Week_1-_Excel_Examples/03_1.11_Return_Calculations.txt
COMPFINANCE-2012-001_05_Week_1-_Excel_Examples/03_1.11_Return_Calculations.srt
COMPFINANCE-2012-001_05_Week_1-_Excel_Examples/04_1.12_Growth_of_1.txt
COMPFINANCE-2012-001_05_Week_1-_Excel_Examples/04_1.12_Growth_of_1.srt
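The log above shows a regular pattern in both the subtitle URLs and the local paths. The sketch below reproduces that pattern. It is inferred only from the output shown and is not coursera-dl's actual code. The helper names `subtitle_url`, `clean` and `lecture_path` are hypothetical, and it assumes that section and lecture indices are zero-padded to two digits and that spaces in titles become underscores.

```python
def subtitle_url(class_name: str, lecture_id: int, fmt: str, lang: str = "en") -> str:
    # Hypothetical helper: URL shape copied from the log lines above, not a documented API.
    return (f"https://class.coursera.org/{class_name}/lecture/subtitles"
            f"?q={lecture_id}_{lang}&format={fmt}")


def clean(name: str) -> str:
    # Assumption: spaces become underscores, e.g. "Week 1- Simple Returns" -> "Week_1-_Simple_Returns".
    return name.replace(" ", "_")


def lecture_path(class_name: str, sec_idx: int, sec_name: str,
                 lec_idx: int, lec_name: str, ext: str) -> str:
    # Hypothetical helper: <CLASS>_<NN>_<section>/<NN>_<lecture>.<ext>, as seen in the log.
    section_dir = f"{class_name.upper()}_{sec_idx:02d}_{clean(sec_name)}"
    return f"{section_dir}/{lec_idx:02d}_{clean(lec_name)}.{ext}"


if __name__ == "__main__":
    print(subtitle_url("compfinance-2012-001", 4, "srt"))
    print(lecture_path("compfinance-2012-001", 3, "Week 1- Simple Returns",
                       1, "1.2 Asset Returns", "txt"))
```

Comparing paths built this way with what actually lands on disk can help narrow down where a naming or ordering problem comes from.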

Any help would be much appreciated.

Thanks a lot, I think this idea is just great! Congrats!

Supply a template or txt file with course names for easy lookup

When I tried to use the otherwise awesome script, I had to go and look up the names of all the courses I wanted in the course list. So I made a small txt file with the URL handle and the name of each course, which I could then easily copy onto the command line.
Perhaps it would be a good idea to maintain a list of all the courses?

Past courses

  • neuralnets-2012-001 Neural Networks for Machine Learning
  • sciwrite-2012-001 Writing in the Sciences
  • progfun-2012-001 Functional Programming Principles in Scala
  • maththink-2012-001 Introduction to Mathematical Thinking
  • bigdata-2012-001 Web Intelligence and Big Data
  • healthpolicy-2012-001 Health Policy and the Affordable Care Act
  • intrologic Introduction to Logic
  • compilers Compilers
  • automata Automata
  • gametheory Game Theory
  • crypto Cryptography I

Current courses (possibly incomplete)

  • algo2-2012-001 Algorithms: Design and Analysis, Part 2
  • thinkagain-2012-001 Think Again: How to Reason and Argue
  • hetero-2012-001 Heterogeneous Parallel Programming
  • compmethods-2012-001 Computational Methods for Data Analysis
  • precalculus-001 Pre-Calculus
  • algebra-001 Algebra
  • proglang-2012-001 Programming Languages
  • calcsing-2012-001 Calculus in a Single Variable
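A lookup file like the one described above could be read with a few lines of Python. This is a minimal sketch that assumes one course per line, with the short name (URL handle) first, then whitespace, then the title. It also assumes that blank lines and lines starting with `#` are ignored. The functions `parse_course_list` and `find_courses` are hypothetical and not part of coursera-dl.

```python
from typing import Dict, List


def parse_course_list(text: str) -> Dict[str, str]:
    """Map short class names (URL handles) to course titles.

    Assumed format: '<short-name> <title>' per line; '#' starts a comment line.
    """
    courses: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split once on any run of whitespace
        courses[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return courses


def find_courses(courses: Dict[str, str], term: str) -> List[str]:
    # Case-insensitive substring search over titles; returns matching short names.
    needle = term.lower()
    return [name for name, title in courses.items() if needle in title.lower()]


SAMPLE = """
# Past courses
neuralnets-2012-001 Neural Networks for Machine Learning
crypto Cryptography I
# Current courses
proglang-2012-001 Programming Languages
"""

if __name__ == "__main__":
    courses = parse_course_list(SAMPLE)
    # Space-separated short names, ready to paste after the coursera-dl command.
    print(" ".join(find_courses(courses, "program")))
```

Keeping the file as plain text makes it easy to edit by hand and to share, for example alongside the script.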
