Digital Humanities Research Institute: Binghamton 2019

Advanced Python and NLTK


Description

This small-group session continues the work with Python and Jupyter Notebooks that we started in the morning session. We will explore Python packages available for Jupyter Notebooks and look at examples of how packages such as the Natural Language Toolkit (NLTK) can be put to use with data and corpora.

We will also go over what NLTK is, how to import it into a Jupyter Notebook, and how to use it for text analysis.

Instructions for Importing NLTK

*NOTE* I will also have you import matplotlib, since this is the package used to display the plots in the demo.

Open a new Python 3 Jupyter Notebook
In the first cell, type the following two lines, pressing Enter after the first to start the second. Package names are case-sensitive, so type nltk in lowercase:
import nltk
import matplotlib
Press Shift + Enter to run the cell

If nothing happens, they are installed and you are ready for the next step! If you get an error message, either you have a typo or they are not installed. If an error occurs, open a new Python 3 Jupyter Notebook and try the following (pressing Shift + Enter after each cell):
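One way to tell a typo from a missing package is to ask Python whether it can locate each package without actually importing it. The following is a minimal sketch using only the standard library, not necessarily the cell the workshop originally used; the helper name `missing_packages` is made up for illustration.

```python
import importlib.util


def missing_packages(names):
    """Return the names that Python cannot locate in this environment."""
    # find_spec returns None when no top-level package with that name is found.
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical check for the two packages this session needs.
missing = missing_packages(["nltk", "matplotlib"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("Both packages are installed; check the import lines for typos.")
```

If the cell reports a package as not installed, the command-line installation will add it.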

If that still does not work, open the command line and type:
conda install nltk -y
conda install matplotlib -y

Then, go back to Jupyter Notebook for the next steps.

Next, we need to download the NLTK data (the corpora and other resources that NLTK works with). The download is very large and may take a while, depending on the speed of your connection.
In the next cell, type:
nltk.download()
Press Shift + Enter to run the cell

The NLTK downloader should appear. Install all of the packages as shown here:

Yours may look a little different, but the interface is the same. Click on the 'All / All Packages' option and then 'Download'. Once all of the entries turn green, you can close the Downloader dialogue box.
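If the Downloader window does not open (for example, when the notebook runs on a remote server), the data can also be requested by name from code. The sketch below assumes the "book" collection, which is a smaller download than 'All Packages' and should cover what the `nltk.book` module needs; the function name `download_book_collection` is invented for this example.

```python
def download_book_collection(quiet=True):
    """Try to download NLTK's "book" collection; return True on success."""
    try:
        import nltk
    except ImportError:
        # NLTK itself is missing, so there is nothing to download with yet.
        print("NLTK is not installed yet; install it first.")
        return False
    # Passing a collection name skips the graphical Downloader window.
    return bool(nltk.download("book", quiet=quiet))


# download_book_collection()  # uncomment to start the download
```

Run the function in a cell of its own; a result of True means the data was downloaded and is ready to use.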

Return to your Jupyter Notebook and type:
from nltk.book import *
Press Shift + Enter to run the cell

A list of books (as shown below) should appear:
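Once that list has appeared, the loaded texts can be analyzed directly. As a small illustration of the kind of text analysis NLTK supports, the sketch below counts word frequencies with `FreqDist`. It uses a made-up sentence rather than one of the books, and it falls back to Python's built-in `Counter` (which `FreqDist` is built on) so that it still runs if NLTK is not installed.

```python
from collections import Counter

try:
    from nltk import FreqDist  # a Counter subclass with extra text-analysis helpers
except ImportError:
    FreqDist = Counter  # fallback so the sketch still runs without NLTK

# A made-up sentence standing in for a loaded text such as text1.
tokens = "the whale and the sea and the ship".split()

# Count how often each word occurs and keep the two most frequent.
freq = FreqDist(tokens)
top_two = freq.most_common(2)
print(top_two)
```

In the notebook, the same call can be made on a loaded book, for example `FreqDist(text1).most_common(10)`, and `text1.dispersion_plot(["whale", "ship"])` draws a plot of where those words occur, using matplotlib.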

Other Corpus Examples

Books from Project Gutenberg
Digitized Newspapers shared by Library of Congress
Donald Trump's Tweets from Trump Twitter Archive
Corpus of Contemporary American English

Additional Resources

Edward Loper & Steven Bird (May 2002) "NLTK: The Natural Language Toolkit"
(December 2017) "Installing Python Packages from a Jupyter Notebook"
NLTK Project (2019) "NLTK 3.4.1 documentation"
TechLessons[username] (April 2017) "How to Download Natural Language Toolkit NLTK for Python NLP Natural Language Processing"
Ehi Aigiomawu (May 2018) "Introduction to Matplotlib — Data Visualization in Python"
Zac Bedell (October 2018) "Writing Beautiful Code with NumPy"
