Data
This lesson uses several data sets. Download the zip archive containing all the files to your computer and unzip it in a location you can easily find again:
You can also download the files individually:
Once you click on a file, it should be automatically downloaded to your default download directory. Some browsers may require you to right click on the link to specify the download location.
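If you would rather unzip the archive from Python than from your file manager, the standard-library zipfile module can do it. The sketch below is only an illustration: the archive name workshop-data.zip and the destination folder data are hypothetical placeholders, so substitute the file you actually downloaded.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir):
    """Unzip zip_path into dest_dir and return the sorted list of member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the target folder if needed
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return sorted(archive.namelist())


# Hypothetical usage; replace both names with your own:
# extract_archive("workshop-data.zip", "data")
```

The returned list is a quick way to confirm that the files you expected are present in the archive.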
Software
Python is a popular language for scientific computing, and great for general-purpose programming as well. Installing all of its scientific packages individually can be a bit difficult, so we recommend an all-in-one installer.
For this workshop we use Python version 3.x.
For installing these packages we will use Anaconda. Anaconda is a Python distribution aimed at data science.
Download and install Anaconda. Remember to download and install the installer for Python 3.x for your platform.
You can download either the graphical or the command-line installer. If you download the command-line installer, you will need to run it using the sh command. For example, if you downloaded Anaconda3-4.4.0-MacOSX-x86_64.sh, you would need to run the command:
sh Anaconda3-4.4.0-MacOSX-x86_64.sh
It is usually necessary to restart your shell once you’ve installed Anaconda.
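Once the shell has been restarted, one way to confirm that the Python you are running is a 3.x interpreter is to ask Python itself. This is a minimal sketch using only the standard library; the helper name is_python3 is made up for illustration.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the interpreter's major version is 3."""
    return version_info[0] == 3


print(sys.version)    # full version string of the running interpreter
print(is_python3())   # whether that interpreter is Python 3.x
```

If the second line prints False, your shell is probably still picking up a different Python installation than the one Anaconda provides.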
Run the command:
conda install ggplot
In some cases, installing ggplot from conda may fail with an error like:
UnsatisfiableError:The following specifications were found to be in conflict:
- ggplot -> python3.4*
- python 3.6*
In that case, try installing ggplot with the pip that ships with Anaconda by running this command in your terminal:
pip install -U ggplot
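Whichever installer you used, you can check from Python whether the package is now visible to the interpreter. The sketch below uses importlib from the standard library; is_installed is a hypothetical helper name, and the check only tells you that the module can be found, not that it will import without errors.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


print(is_installed("ggplot"))
```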
After installing either Anaconda or Miniconda and the workshop packages, launch a Jupyter notebook by typing this command from the terminal:
jupyter notebook
The notebook should open automatically in your browser. If it does not or you wish to use a different browser, open this link: http://localhost:8888.
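If the page does not load, it can help to check whether anything is listening on that port at all. The following is a rough sketch using the standard socket module; port_is_open is a hypothetical helper, and it assumes the server is on port 8888 as in the link above. Jupyter may pick a different port when 8888 is already in use, in which case the address printed in the terminal is the one to try.

```python
import socket


def port_is_open(host="localhost", port=8888, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not reachable
        return False


print(port_is_open())
```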
Screenshot of a Jupyter Notebook on quantum mechanics by Robert Johansson
After typing the command jupyter notebook, the following happens:
The Jupyter Notebook server opens the Jupyter notebook client, also known as the notebook user interface, in your default web browser.
The Jupyter notebook file browser
To create a new Python notebook select the “New” dropdown on the upper right of the screen.
The Jupyter notebook file browser
When you create a new notebook and type code into it, the web browser and the Jupyter notebook server communicate with each other.
A new, blank Jupyter notebook
Under the “Help” menu, take a quick interactive tour of how to use the notebook. Help on Jupyter and key workshop packages is available there too.
User interface tour and Help
The web browser then displays the updated notebook to you.
For example, click in the first cell and type some Python code.
A Code cell
This is a Code cell (note the word Code in the cell type dropdown). To run the cell, press Shift-Enter.
A Code cell and its output
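If you are unsure what to type, any short piece of Python will do. The snippet below is one possibility; the numbers are made-up sample values and are not taken from the workshop data.

```python
# Hypothetical sample values, not taken from the workshop data
weights = [2.5, 3.0, 4.5]

# Plain Python works in a Code cell exactly as it does in a script
mean_weight = sum(weights) / len(weights)
print("Mean weight:", mean_weight)
```

After you run the cell, the printed text appears directly beneath it, and the variables stay available to cells you run afterwards.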
Let’s look at a Markdown cell. Markdown is a lightweight markup language that stays readable as plain text yet offers additional formatting. Don’t forget to select Markdown from the cell type dropdown. Click in the cell and enter the Markdown text.
A markdown input cell
To render the cell, press Shift-Enter.
A rendered markdown cell
This workflow has several advantages:
Notebooks are saved as files with the .ipynb extension.
The notebook has two modes of operation: Command and Edit. Command mode lets you edit notebook-level features, while Edit mode lets you change the contents of a notebook cell. Remember that a notebook is made up of a number of cells, which can contain code, Markdown, HTML, visualizations, and more.
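To make the idea of "a notebook is a list of cells" concrete: notebook files (.ipynb) are stored as JSON, so they can be inspected with the standard json module. The sketch below parses a tiny hand-written notebook in the version 4 layout and counts its cells by type. The notebook text and the helper count_cell_types are illustrative only; a real notebook saved by Jupyter carries more metadata.

```python
import json
from collections import Counter

# Minimal hand-written notebook in the version 4 JSON layout (illustrative only)
notebook_text = """
{
  "cells": [
    {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
    {"cell_type": "code", "metadata": {}, "execution_count": null,
     "outputs": [], "source": ["print(1 + 1)"]}
  ],
  "metadata": {},
  "nbformat": 4,
  "nbformat_minor": 4
}
"""


def count_cell_types(notebook):
    """Count how many cells of each type a parsed notebook contains."""
    return Counter(cell["cell_type"] for cell in notebook["cells"])


notebook = json.loads(notebook_text)
print(count_cell_types(notebook))
```

To try this on one of your own notebooks, read the file with open(...) and json.load instead of parsing the embedded string.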
Use the Help menu and its options when needed.