Dev/Command-line/Site meta

From XOWA: the free, open-source, offline wiki application

XOWA can download the metadata for Wikimedia wikis.

Background

Wikimedia exposes an API for accessing the metadata of a given wiki. For example, for English Wikipedia, the following URL returns most of the metadata about the wiki installation:

https://en.wikipedia.org/w/api.php?action=query&meta=siteinfo&siprop=general|namespaces|statistics|interwikimap|namespacealiases|specialpagealiases|libraries|extensions|skins|magicwords|functionhooks|showhooks|extensiontags|protocols|defaultoptions|languages
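The same request can be assembled programmatically for any wiki domain. The following is a minimal Python sketch, not part of XOWA: the function name `build_siteinfo_url` is illustrative, and the `format=json` parameter is an addition (it is not in the URL above) so that the response is machine-readable.

```python
from urllib.parse import urlencode

# siprop values copied from the URL above
SIPROPS = [
    "general", "namespaces", "statistics", "interwikimap",
    "namespacealiases", "specialpagealiases", "libraries", "extensions",
    "skins", "magicwords", "functionhooks", "showhooks", "extensiontags",
    "protocols", "defaultoptions", "languages",
]

def build_siteinfo_url(domain, siprops=SIPROPS):
    """Return the siteinfo API URL for a wiki domain such as 'en.wikipedia.org'."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "|".join(siprops),
        "format": "json",  # assumption: JSON output; not part of the URL above
    }
    # safe="|" keeps the pipe separators readable instead of encoding them as %7C
    return "https://" + domain + "/w/api.php?" + urlencode(params, safe="|")
```

For example, `build_siteinfo_url("en.wiktionary.org")` yields the equivalent request for English Wiktionary.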

XOWA can call this API to download the metadata for each wiki and save it in a database for data processing. XOWA currently uses this information to resolve namespaces, and future releases will also incorporate other metadata from this API.
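To illustrate what resolving namespaces from this metadata can look like, here is a hedged Python sketch. It is not XOWA's implementation. It assumes the classic JSON response layout, in which each entry under `query.namespaces` carries its local name under the `"*"` key and, where present, a `canonical` name, and `query.namespacealiases` is a list of `{"id", "*"}` entries. The function names are hypothetical.

```python
def namespace_map(siteinfo):
    """Map lower-cased namespace names and aliases to namespace ids."""
    query = siteinfo["query"]
    names = {}
    for ns in query.get("namespaces", {}).values():
        # "*" is the local name; "canonical" is absent for the main namespace
        for key in ("*", "canonical"):
            if key in ns:
                names[ns[key].lower()] = ns["id"]
    for alias in query.get("namespacealiases", []):
        names[alias["*"].lower()] = alias["id"]
    return names

def resolve_namespace(title, names):
    """Split 'Prefix:Page' into (namespace id, page name).

    Titles without a known prefix are treated as main namespace (id 0).
    """
    prefix, sep, rest = title.partition(":")
    key = prefix.strip().replace("_", " ").lower()
    if sep and key and key in names:
        return names[key], rest
    return 0, title
```

With the English Wikipedia metadata, a title such as `Talk:Earth` would resolve to namespace id 1 and page name `Earth`, while a title like `2001: A Space Odyssey` stays in the main namespace because `2001` is not a namespace name.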

Process

The following steps assume a Windows system with XOWA installed at C:\xowa:

  • Create a plain-text file called "C:\xowa\build_site_meta.gfs"
  • Save the following text to the file:
app.bldr.pause_at_end_('n');
app.scripts.run_file_by_type('xowa_cfg_app');
app.bldr.cmds {

  // NOTE: the wiki name here doesn't matter; use any wiki that is installed on your system
  add('simple.wikipedia.org', 'util.site_meta') {

    // path of the database to generate; default is C:\xowa\bin\any\xowa\cfg\wiki\site_meta.sqlite3
    db_url = 'C:\xowa\site_meta__enwiki.sqlite3';

    // skip any wikis which have been downloaded after this time; default is now() - 1 day
    // the purpose of this argument is to avoid calling the API again if it was already called recently.
    // for example, if the script runs for 800 wikis and fails for 3 of them,
    // you can rerun the script and it will only download the 3 failed ones, not all 800
    cutoff_time = '2015-07-01';

    // list of wikis to download, one wiki per line; default is all wikis listed in [[Dashboard/Import/Online]]
    wikis  = 
'en.wikipedia.org
en.wiktionary.org';
  }
}
app.bldr.run;
  • Run the file with the following command:
java -jar xowa_windows.jar --app_mode cmd --cmd_file C:\xowa\build_site_meta.gfs
  • Open C:\xowa\site_meta__enwiki.sqlite3 in a SQLite shell and run the following query:
SELECT * FROM site_statistic;
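If no SQLite shell is at hand, the same check can be done from a script. The sketch below uses Python's built-in `sqlite3` module. Only the table name `site_statistic` comes from the step above. The helper name `dump_table` is hypothetical, and no column names are assumed: they are read from the query result.

```python
import sqlite3

def dump_table(db_path, table="site_statistic", limit=10):
    """Return (column names, first rows) of a table in the generated database."""
    conn = sqlite3.connect(db_path)
    try:
        # confirm the table exists before querying it
        found = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' AND name = ?",
            (table,),
        ).fetchone()
        if found is None:
            raise ValueError("table not found: " + table)
        cur = conn.execute('SELECT * FROM "{}" LIMIT ?'.format(table), (limit,))
        columns = [d[0] for d in cur.description]
        return columns, cur.fetchall()
    finally:
        conn.close()
```

A call such as `columns, rows = dump_table(r"C:\xowa\site_meta__enwiki.sqlite3")` would then return the column names and the first few rows for inspection.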
