Wikipedia Tools at histori.city

This is a work in progress. Stay tuned for updates as we fill out the information available here.

To import so much data onto the histori.city map, we had to write a number of tools for working with such a large data set. We thought other people might find them useful, so we have tried to make them as easy to use as possible.

If you are looking for programmatic access to the entire set of articles from the English-language Wikipedia, you've come to the right place.

Using the histori.city tools, you can retrieve the contents of any Wikipedia article with a single HTTP request. You can navigate a structured object model of the article, with support for nested structures including infoboxes, plainlists, links, and more.

You can even write importers that transform Wikipedia pages into histori.city Nexuses, which you can then import onto your histori.city map.

Using the Tools

You will need a Java 7 JDK installed. Then download the code. Run commands with the run.sh (or run.bat) script, or use the library directly from your Java code.

You will also need an Amazon account and AWS access credentials to access the S3 bucket that contains the JSON files.

Retrieve an article

Using the command line
todo: add options for AWS authentication
curl http://enwiki-20160204-histori-json.s3-website-us-west-2.amazonaws.com/index/$(./run.sh path 'Title of article')
In Java
todo: reference example class in source repo
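The curl request above can be reproduced in plain Java 7 with HttpURLConnection. This is a minimal sketch, not the histori.city library's own API: the class FetchArticle and its methods are hypothetical, it expects the article path to have been computed beforehand with ./run.sh path 'Title of article', and, like the curl command, it sends no AWS credentials, so it only works if the endpoint allows anonymous reads. It also assumes the JSON is UTF-8 encoded.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; not part of the histori.city library.
public class FetchArticle {
    // S3 website endpoint used in the curl example above.
    static final String INDEX_URL =
        "http://enwiki-20160204-histori-json.s3-website-us-west-2.amazonaws.com/index/";

    // Append a path printed by `./run.sh path 'Title of article'` to the index URL,
    // exactly as the curl command does (no extra encoding).
    static String articleUrl(String path) {
        return INDEX_URL + path;
    }

    // Plain, unauthenticated HTTP GET; returns the body decoded as UTF-8 (assumed encoding).
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("HTTP " + status + " for " + url);
            }
            StringBuilder body = new StringBuilder();
            try (Reader in = new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8)) {
                char[] buf = new char[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    body.append(buf, 0, n);
                }
            }
            return body.toString();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java FetchArticle <path printed by ./run.sh path>");
            System.exit(1);
        }
        System.out.print(fetch(articleUrl(args[0])));
    }
}
```

For example, java FetchArticle "$(./run.sh path 'Title of article')" prints the article's JSON to standard output. If the bucket turns out to require signed requests, an AWS SDK client would be needed instead of a plain HTTP GET.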

Parse an article

Using the command line
todo: write WikiNode JSON-ify routine
In Java
todo: reference example class in source repo
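The structured object model described earlier is not documented here yet, so the following is only a sketch of the general idea under stated assumptions: Node, its type/value/children fields, and the type names "infobox", "plainlist" and "link" are hypothetical stand-ins, not the real WikiNode classes. It shows how a depth-first walk can collect every node of one type, however deeply it is nested (for example, links inside a plainlist inside an infobox).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class WalkArticle {
    // Hypothetical stand-in for a parsed article node; NOT the real WikiNode API.
    static final class Node {
        final String type;          // e.g. "infobox", "plainlist", "link", "text"
        final String value;         // text content or link target; may be empty
        final List<Node> children;  // nested nodes, in document order

        Node(String type, String value, Node... children) {
            this.type = type;
            this.value = value;
            this.children = Collections.unmodifiableList(Arrays.asList(children));
        }
    }

    // Depth-first, pre-order walk collecting the value of every node of the given type.
    static List<String> collect(Node root, String type) {
        List<String> out = new ArrayList<>();
        collectInto(root, type, out);
        return out;
    }

    private static void collectInto(Node node, String type, List<String> out) {
        if (node.type.equals(type)) {
            out.add(node.value);
        }
        for (Node child : node.children) {
            collectInto(child, type, out);
        }
    }

    public static void main(String[] args) {
        // A tiny hand-built tree: links nested inside a plainlist inside an infobox,
        // plus one link in the body text.
        Node article = new Node("article", "Paris",
            new Node("infobox", "",
                new Node("plainlist", "",
                    new Node("link", "France"),
                    new Node("link", "Europe"))),
            new Node("text", "Paris lies on the "),
            new Node("link", "Seine"));
        System.out.println(collect(article, "link"));
        // prints [France, Europe, Seine]
    }
}
```

A recursive walk keeps the code short; for very deeply nested pages, an explicit stack would avoid running out of call-stack depth.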