Setting up Coursera Downloader in a Python 3.4 virtual environment on OS X 10.9

I have been procrastinating about studying ... it's hard work. Especially when there is another Game of Thrones episode to watch.

I want to learn statistics and apply it to data analysis. The software I want to use for this is Python and R, running on OS X.

For me, the best way to do this right now is through online classes. The class I have chosen is offered through Coursera by Johns Hopkins University.

The course is called The Data Scientist’s Toolbox.

As an aside, I finished the course in less time than it took me to write this article. If only the future courses were as easy.

The first step is to satisfy my anal obsession with downloading all the course material. In the past I would download each file manually, which gave me a sense of achieving something without actually having to think. But since this is a data analysis course ... I should let the computer do all the heavy lifting (or clicking).

If you're lazy, just get the Chrome extension. I haven't used it, but most Chrome extensions are pretty good. If you like Python, then Coursera Downloader is the way to go.

Starting from a fresh install of OS X 10.9:

  1. Install Python 3.4.1 from python.org or use a package manager like Homebrew.

These two links are a good start if you don't want to use the official OS X binaries from python.org:

http://hackercodex.com/guide/mac-osx-mavericks-10.9-configuration/

http://hackercodex.com/guide/python-development-environment-on-mac-osx/

My current setup uses the official Python binaries, but when I buy a new machine I might give Homebrew a go.

  2. Open a terminal window and create a folder for your virtual environments. For me that will be ~/virtualenv:
pwd
/Users/stephen
mkdir virtualenv
cd virtualenv/
pwd
/Users/stephen/virtualenv

Create the virtual environment for coursera using the Python 3 utility pyvenv:

pyvenv coursera
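
For the curious, the pyvenv script is a thin wrapper around the standard library's venv module, so the same environment can be created from Python itself. This is a minimal sketch rather than a required step. The create_env helper name is made up for illustration, and the path is simply the one used in this article.

```python
import os
import venv


def create_env(env_dir, with_pip=True):
    """Create a virtual environment at env_dir, much like `pyvenv env_dir`."""
    env_dir = os.path.expanduser(env_dir)
    # with_pip=True asks venv to bootstrap pip into the new environment
    # (supported from Python 3.4 onwards).
    venv.create(env_dir, with_pip=with_pip)
    return env_dir


# Example, using the folder from this article:
# create_env("~/virtualenv/coursera")
```

On newer Python releases, where the pyvenv script is no longer shipped, `python3 -m venv coursera` does the same job from the shell.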

Activate the environment:

source ~/virtualenv/coursera/bin/activate
(coursera)
python -V
Python 3.4.1

If the environment is active:

  • The name of the environment in brackets, (coursera), appears at the start of each Bash prompt
  • python -V reports Python 3 and not Python 2.7
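
The same two checks can be made from inside Python. This is a small sketch, assuming the environment was created with pyvenv/venv (the older virtualenv tool exposes different attributes). The function names are my own.

```python
import sys


def is_python3():
    """True when the interpreter is Python 3 rather than Python 2.7."""
    return sys.version_info[0] == 3


def in_virtualenv():
    """True when running inside a pyvenv/venv environment."""
    # Inside a venv, sys.prefix points at the environment folder while
    # sys.base_prefix still points at the Python it was created from.
    return sys.prefix != sys.base_prefix


if is_python3():
    print("Python", sys.version.split()[0], "| environment active:", in_virtualenv())
else:
    print("This is Python 2; activate the environment first")
```

Run it with the python on your prompt; if the environment is not active, the last value printed will be False.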

Download the ZIP file for the Coursera Downloader from GitHub.

Extract the ZIP file into the virtual environment folder, ~/virtualenv/coursera, and you will see ~/virtualenv/coursera/coursera-master
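
If you would rather not double-click the archive, the extraction can be scripted with the standard zipfile module. This is a rough sketch. The extract_into_env name and the ~/Downloads/coursera-master.zip location are assumptions for illustration, so point it at wherever your browser saved the file.

```python
import os
import zipfile


def extract_into_env(zip_path, env_dir):
    """Extract zip_path into env_dir and return the top-level names it held."""
    env_dir = os.path.expanduser(env_dir)
    with zipfile.ZipFile(os.path.expanduser(zip_path)) as archive:
        archive.extractall(env_dir)
        # GitHub ZIP downloads wrap everything in a single folder,
        # which is where coursera-master comes from.
        top_level = sorted({name.split("/")[0] for name in archive.namelist()})
    return top_level


# Example (hypothetical download location):
# extract_into_env("~/Downloads/coursera-master.zip", "~/virtualenv/coursera")
```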

Now that the Python program is in the virtual environment folder, we just need to install the dependencies and we are ready to go.

cd ~/virtualenv/coursera/coursera-master
pip install -r requirements.txt

The pip command included with Python 3 installs all the Python modules listed in requirements.txt:

beautifulsoup4>=4.1.3
coverage>=3.7
html5lib>=1.0b2
nose>=1.3.0
requests>=2.2.1
six>=1.3.0
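
Each line names a package and the minimum version that pip will accept. To make the format concrete, here is a minimal sketch that reads such a file. It only understands the name>=version form shown above, not the full requirement syntax pip supports, and the function names are hypothetical.

```python
def parse_requirement(line):
    """Split a 'name>=version' line into (name, minimum_version)."""
    name, _, minimum = line.strip().partition(">=")
    return name, minimum


def read_requirements(path):
    """Return [(name, minimum_version), ...], skipping blanks and comments."""
    with open(path) as handle:
        lines = [line.strip() for line in handle]
    return [parse_requirement(line) for line in lines
            if line and not line.startswith("#")]


# Example, run from ~/virtualenv/coursera/coursera-master:
# for name, minimum in read_requirements("requirements.txt"):
#     print(name, "needs at least", minimum)
```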

Run the coursera-dl Python program with your username, password and the name of the course:

Coursera username: stephen@gmail.com

Coursera password: secret password

Course name: datascitoolbox-004

[59] 00:22:39--> ./coursera-dl -u stephen@gmail.com -p 'secret password' datascitoolbox-004

Note: if the password is a single word with no spaces or special shell characters, you don't have to enclose it in single quotes, i.e. password rather than 'password'
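
The quoting rule is easy to check with the standard library's shlex.quote, which adds single quotes only when the shell needs them. This sketch just builds the command line as text and does not run coursera-dl. The build_command helper is my own name, and the credentials are the placeholder ones from this article.

```python
import shlex


def build_command(username, password, course):
    """Return the coursera-dl command line as one shell-safe string."""
    args = ["./coursera-dl", "-u", username, "-p", password, course]
    # shlex.quote leaves plain words alone and wraps anything containing
    # spaces or other shell-special characters in single quotes.
    return " ".join(shlex.quote(arg) for arg in args)


print(build_command("stephen@gmail.com", "secret password", "datascitoolbox-004"))
# ./coursera-dl -u stephen@gmail.com -p 'secret password' datascitoolbox-004
print(build_command("stephen@gmail.com", "password", "datascitoolbox-004"))
# ./coursera-dl -u stephen@gmail.com -p password datascitoolbox-004
```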

Downloading class: datascitoolbox-004
Starting new HTTPS connection (1): class.coursera.org
Starting new HTTPS connection (1): accounts.coursera.org
Logged in on accounts.coursera.org.
Starting new HTTPS connection (1): class.coursera.org
Found authentication cookies.
Downloaded https://class.coursera.org/datascitoolbox-004/lecture/index (81474 bytes)
Week_1
    Series_Motivation
    The_Data_Scientists_Toolbox
    Getting_Help
    Finding_Answers
    R_Programming_Overview
    Getting_Data_Overview
    Exploratory_Data_Analysis_Overview
    Reproducible_Research_Overview
    Statistical_Inference_Overview
    Regression_Models_Overview
    Practical_Machine_Learning_Overview
    Building_Data_Products_Overview
    Installing_Rstudio
    Installing_Outside_Software_on_Mac
    Install_R_on_a_Mac
    Installing_R_on_Windows
Week_2
    Tips_from_Coursera_Users_-_Optional_Video
    Command_Line_Interface
    Introduction_to_Git
    Introduction_to_Github
    Creating_a_Github_Repository
    Basic_Git_Commands
    Basic_Markdown
    Installing_R_Packages
    Installing_Rtools
Week_3
    Types_of_Questions
    What_is_Data
    What_About_Big_Data
    Experimental_Design
Found 3 sections and 29 lectures on this page
Downloading: datascitoolbox-004/01_Week_1/01_Series_Motivation.mp4
Downloading https://class.coursera.org/datascitoolbox-004/lecture/download.mp4?lecture_id=5 -> datascitoolbox-004/01_Week_1/01_Series_Motivation.mp4
Starting new HTTPS connection (1): d28rh4a8wq0iu5.cloudfront.net
[##################################################] 100%           9.60MB at 917.60KB/s
Downloading: datascitoolbox-004/01_Week_1/01_Series_Motivation.txt
Downloading https://class.coursera.org/datascitoolbox-004/lecture/subtitles?q=5_en&format=txt -> datascitoolbox-004/01_Week_1/01_Series_Motivation.txt
[##################################################] 100%           13.14KB at 49.51KB/s

. .. ... lots more of this .. .

Downloading: datascitoolbox-004/03_Week_3/04_Experimental_Design.pdf
Downloading https://d396qusza40orc.cloudfront.net/datascitoolbox/lecture_slides/03_04_experimentalDesign.pdf -> datascitoolbox-004/03_Week_3/04_Experimental_Design.pdf
[##################################################] 100%           3.72MB at 630.55KB/s

Alternative:

./coursera-dl -n --path=/Volumes/training/Coursera getdata-004

The -n option reads the login and password from ~/.netrc

~/.netrc
machine coursera-dl login stephen@gmail.com password 'secret password'

The --path option specifies the output directory.
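
To check that the ~/.netrc entry is readable before handing it to the downloader, Python's own netrc module can parse it. This is a sketch under assumptions: the article does not show how coursera-dl reads the file internally, the read_coursera_login name is made up, and the test entry I used had a password without spaces to sidestep quoting rules.

```python
import netrc
import os


def read_coursera_login(path="~/.netrc", machine="coursera-dl"):
    """Return (login, password) for the machine entry, or None if absent."""
    entry = netrc.netrc(os.path.expanduser(path)).authenticators(machine)
    if entry is None:
        return None
    # authenticators() returns (login, account, password); account is unused here.
    login, _account, password = entry
    return login, password


# Example:
# print(read_coursera_login())
```

Since the file holds a password in plain text, restricting it with chmod 600 ~/.netrc is a sensible precaution.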