App/Import/mwad

From XOWA: the free, open-source, offline wiki application

mwad is a Python script / executable by Mattze96 that generates XML dumps using the MediaWiki API.

Overview: XML Dumps

XOWA is an offline wiki application for online wikis. It works by converting a MediaWiki XML dump into an .xowa sqlite3 database.
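
An .xowa file is an ordinary SQLite database, so it can be inspected with standard tools. A minimal sketch in Python, assuming a hypothetical path to a generated database (the actual file name depends on the wiki):

import sqlite3

# Hypothetical path; substitute the .xowa file XOWA generated for your wiki.
db_path = r"C:\xowa\wiki\freespeech.wikia.com\freespeech.wikia.com.xowa"

conn = sqlite3.connect(db_path)
# Listing the tables confirms this is a plain sqlite3 file.
for (name,) in conn.execute("SELECT name FROM sqlite_master WHERE type='table'"):
    print(name)
conn.close()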

XML dumps can be obtained from a few sources. For Wikimedia wikis, official dumps are published regularly at https://dumps.wikimedia.org.

For non-Wikimedia wikis (Wikia wikis and others), dumps may be unavailable or out of date. For example, the freespeech wikia has a dump dated 2013-12-26, which was over two and a half years old at the time of writing.

For Wikia wikis, one can request an XML dump by doing the following:

  • Logging in with a user account
  • Requesting a dump through the Special:Statistics page
  • Waiting for the dump to be generated

Other wikis may require emails to the wiki's admins.

An alternative to this process is to use Mattze96's mwad: the MediaWiki API dump.
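
Conceptually, such a dump is assembled from the wiki's own api.php: pages are enumerated and returned in MediaWiki's export XML format, which is then written to a file. A minimal sketch of the underlying request in Python (the endpoint and batch size are illustrative; mwad's own implementation may differ):

import requests

# Hypothetical target wiki; any MediaWiki install exposes the same api.php interface.
API = "http://freespeech.wikia.com/api.php"

params = {
    "action": "query",
    "generator": "allpages",  # walk through the wiki's pages
    "gaplimit": "50",         # pages per request
    "export": "1",            # return the pages in export XML format
    "exportnowrap": "1",      # raw <mediawiki> XML instead of a wrapped result
}
resp = requests.get(API, params=params, timeout=60)
print(resp.text[:400])  # start of the export XML for the first batch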

Usage

Currently mwad is available as a command-line executable and as a Python script.

Generating the dump

Executable

  • Open up a command prompt
cmd
  • Change to the mwad directory
cd C:\xowa\bin\windows\python\mwad
  • Run mwad with the following options
mediawiki_api_dump.win32.exe http://freespeech.wikia.com

Python script

  • Make sure you have Python 3 installed on your system
  • Open up a command prompt
cmd
  • Change to the mwad directory
cd C:\xowa\bin\any\python\mwad
  • Run mwad with the following options
python mediawiki_api_dump.py http://freespeech.wikia.com
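
On Windows, if python is not on your PATH, the py launcher installed alongside Python 3 can be used for the same step:
py -3 mediawiki_api_dump.py http://freespeech.wikia.com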

Both cases will generate an XML file called freespeech.wikia.com-20160710-pages-articles.xml (the date in the file name will reflect the day you run the dump).
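
As a quick sanity check, the dump can be streamed with Python's standard library to count the pages it contains. A minimal sketch, assuming the file name above and the standard MediaWiki export format:

import xml.etree.ElementTree as ET

dump = "freespeech.wikia.com-20160710-pages-articles.xml"  # file generated above

pages = 0
# Stream the file so that large dumps do not need to fit in memory.
for event, elem in ET.iterparse(dump, events=("end",)):
    if elem.tag.endswith("page"):  # tags carry the MediaWiki export namespace
        pages += 1
        elem.clear()  # free memory as we go
print(pages, "pages in dump")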

Importing the dump

  • Create a folder called C:\xowa\wiki\freespeech.wikia.com
  • Move the xml file to C:\xowa\wiki\freespeech.wikia.com
  • Rename the file to freespeech.wikia.com.xml
  • Choose "Main Menu" -> "Tools" -> "Import Offline"
  • Change "Wiki" to "Other wiki"
  • Change "Where to get the dump" to "read from file"
  • Select the XML file by clicking "..."
  • Press "Import Now"

Depending on the wiki, the Main_Page may not be available. You can use the XOWA search bar to look for pages in the wiki.

Other notes

  • Do not run this on Wikimedia wikis. Wikimedia already publishes official dumps and has strict web-crawling policies; if you run mwad against a Wikimedia wiki such as en.wikipedia.org, your IP address will probably be banned and you will lose access to Wikipedia.
  • Pay attention to the wiki's license. All Wikia wikis publish article text under a Creative Commons license[1]. Other wikis may follow similarly permissive licensing, but it is your responsibility to check. If a wiki has a strict copyright license, please do not run mwad on it.
  • Web-scraping policies may still get your IP banned. Different wikis set different limits on the number of articles that may be downloaded, even through their API. If you're downloading a large wiki, consult the wiki's admins first; otherwise your IP address may be flagged as an unauthorized web-crawler and banned (a throttling sketch follows this list).
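
If you do write your own download loop against api.php, a conservative delay between requests and an identifying User-Agent reduce the chance of being mistaken for an unauthorized crawler. A minimal sketch; the endpoint, delay, and contact address are placeholders rather than any wiki's actual policy, and the continuation keys may differ on older MediaWiki versions:

import time
import requests

API = "http://freespeech.wikia.com/api.php"  # hypothetical target wiki
DELAY = 2.0  # seconds between batches; agree on a rate with the wiki's admins

session = requests.Session()
# Identify yourself so admins can contact you instead of banning the IP.
session.headers.update({"User-Agent": "mwad-example/0.1 (contact: you@example.com)"})

apcontinue = None
while True:
    params = {"action": "query", "list": "allpages", "aplimit": "50", "format": "json"}
    if apcontinue:
        params["apcontinue"] = apcontinue
    data = session.get(API, params=params, timeout=60).json()
    for page in data["query"]["allpages"]:
        print(page["title"])
    cont = data.get("continue")
    if not cont:
        break
    apcontinue = cont["apcontinue"]
    time.sleep(DELAY)  # throttle before requesting the next batch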


mwad usage notes

usage: mediawiki_api_dump.py [-h] [-v] [-n NAME] [-l LOG] [-c] [-x] url

Create a wiki xml-dump via api.php

positional arguments:
  url                   download url

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         verbose level... repeat up to three times
  -n NAME, --name NAME  name of the wiki for filename etc.
  -l LOG, --log LOG     specify log-file.
  -c, --compress        compress output file with bz2
  -x, --xowa            special XOWA mode: xml to stdout, progress to stderr

Example:

./mediawiki_api_dump.py http://wiki.archlinux.org
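
The options above can be combined; for example, to write a bz2-compressed dump under an explicit name (using the -c and -n flags listed in the usage text):

./mediawiki_api_dump.py -c -n archlinux http://wiki.archlinux.org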

References

  1. ^ See Wikia's licensing terms.
