Advanced Pandas: Setup

Data

For this lesson, we will use a number of different data sets. Download these files to your computer and put them in a location that you can find again later. (Some browsers may require you to right click on the link to specify the download location.)

Note: this data set is 1.6 GB!

Software

For this workshop we use Python version 3.x.
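If you are unsure which interpreter is active, you can ask Python itself. The following is a minimal sketch; the helper name `is_python3` is purely illustrative and not part of the workshop materials.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the given version tuple belongs to Python 3.x."""
    return version_info[0] == 3


# Print the full version string (e.g. "3.x.y") and the result of the check.
print("Python version:", sys.version.split()[0])
print("Python 3.x:", is_python3())
```

If the second line prints `False`, the `python` command on your system points at an older interpreter, and the Anaconda installation described below should provide a Python 3.x one.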

Required Python Packages for this workshop

Install the workshop packages

To install these packages, we will use Anaconda, a Python distribution aimed at data science.

Anaconda installation

Download and install Anaconda. Be sure to choose the Python 3.x installer for your platform.

You can download either the graphical or the command-line installer. If you download the command-line installer, you will need to run it using the sh command. For example, if you downloaded Anaconda3-4.4.0-MacOSX-x86_64.sh, you would run the command:

sh Anaconda3-4.4.0-MacOSX-x86_64.sh

It is usually necessary to restart your shell once you’ve installed Anaconda.
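After restarting your shell, you may want to confirm that the new tools can be found. The sketch below assumes the installer added Anaconda to your PATH; the helper name `find_executable` is hypothetical, and the check only reports where (or whether) the `conda` and `jupyter` commands are visible.

```python
import shutil


def find_executable(name):
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


# Report the location of each tool, or note that it could not be found.
for tool in ("conda", "jupyter"):
    path = find_executable(tool)
    print(f"{tool}: {path if path else 'not found on PATH'}")
```

If a tool is reported as not found, restarting the shell again or re-checking the installer's PATH option is a reasonable first step.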

Editing Python Scripts

In addition to the Python packages, you will also need a text editor or a development environment for Python scripts. Nano is a good option for anyone not used to text editing. See these instructions for how to install Nano on Windows/Mac/Linux. If you would rather use another editor, that is fine too.

Another option is to use a Python development environment such as PyDev. Here are instructions on how to install PyDev, but its use is beyond the scope of this tutorial.

Launch a Jupyter notebook

After installing either Anaconda or Miniconda and the workshop packages, launch a Jupyter notebook by typing this command from the terminal:

jupyter notebook

The notebook should open automatically in your browser. If it does not, or if you wish to use a different browser, open this link: http://localhost:8888.
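If the page does not load, it can help to check whether anything is answering at that address. The following is a rough sketch using only the standard library; the function name `notebook_is_reachable` is hypothetical, and the default URL assumes the server is on port 8888 (Jupyter may pick a different port if that one is already in use, in which case the terminal output shows the actual address).

```python
import urllib.error
import urllib.request


def notebook_is_reachable(url="http://localhost:8888", timeout=2.0):
    """Return True if something answers HTTP requests at `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status code.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timed out, or host not found.
        return False


print("Notebook server reachable:", notebook_is_reachable())
```

Note that this only tells you that some HTTP server is listening at the address, not that it is specifically a Jupyter server.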


Overview of the Jupyter notebook (Optional)

Example Jupyter Notebook
Screenshot of a Jupyter Notebook on quantum mechanics by Robert Johansson

How the Jupyter notebook works

After typing the command jupyter notebook, the following happens:

This workflow has several advantages:

How the notebook is stored

Notebook modes: Command and Edit

The notebook has two modes of operation: Command and Edit. Command mode lets you edit notebook-level features, while Edit mode lets you change the contents of a notebook cell. Remember that a notebook is made up of a number of cells, which can contain code, Markdown, HTML, visualizations, and more.
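To make the idea of cells concrete, here is a simplified sketch of how a notebook can be represented as data. Notebook files (`.ipynb`) are JSON documents containing a list of cells; the structure below follows version 4 of that format in outline only, the helper `make_cell` is hypothetical, and real notebooks carry more metadata than shown here.

```python
import json


def make_cell(cell_type, source):
    """Build one notebook cell as a plain dictionary."""
    cell = {"cell_type": cell_type, "metadata": {}, "source": source}
    if cell_type == "code":
        # Code cells also record their outputs and execution order.
        cell["outputs"] = []
        cell["execution_count"] = None
    return cell


# A tiny notebook with one Markdown cell and one code cell.
notebook = {
    "cells": [
        make_cell("markdown", "# Advanced Pandas"),
        make_cell("code", "import pandas as pd"),
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

# Serialize to JSON text and read it back to list the cell types.
text = json.dumps(notebook, indent=1)
cell_types = [cell["cell_type"] for cell in json.loads(text)["cells"]]
print(cell_types)
```

In practice you would rarely build a notebook by hand like this; the point is only that each cell is a separate unit with its own type and content, which is what Edit mode operates on.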

Help and more information

Use the Help menu and its options when needed.