Dev/Command-line
XOWA can import a wiki from the command line, driven by a plain-text build script.
Import simple.wikipedia.org through the command-line
- Open a command prompt. For example, on Windows, run cmd
- Run the following: java -jar C:\000\200_dev\110_java\100_core\out\production\400_xowa\ --cmd_file C:\xowa_release\xowa_build.gfs --app_mode cmd
- Wait about 10 minutes for the script to complete
- Launch XOWA and enter simple.wikipedia.org in the URL bar
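The command in the steps above can also be assembled programmatically, for example to start the import from another tool. The following is a minimal Python sketch, not part of XOWA: the function names build_import_command and run_import are hypothetical, and the jar location is taken as a parameter because it depends on your installation. Only the --cmd_file and --app_mode flags shown above are used.

```python
import subprocess


def build_import_command(jar_path, cmd_file):
    """Return the argument list for a command-line import.

    jar_path and cmd_file are supplied by the caller; nothing is guessed here.
    """
    return ["java", "-jar", jar_path, "--cmd_file", cmd_file, "--app_mode", "cmd"]


def run_import(jar_path, cmd_file):
    """Run the import, wait for it to finish, and return java's exit code."""
    return subprocess.call(build_import_command(jar_path, cmd_file))
```

Passing the arguments as a list avoids quoting problems when a path contains spaces.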
Import a different wiki by editing the build script
- Open the following file in a text editor: C:\xowa_release\xowa_build.gfs. See Script below for the full text.
- Replace all instances of simple.wikipedia.org with the domain name of the wiki you want to import. For example, for English Wikipedia, use en.wikipedia.org
- Run the command-line import again.
- Launch XOWA and enter the domain name in the URL bar.
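The search-and-replace step above can be scripted instead of done by hand. This is a minimal Python sketch under stated assumptions: the function names retarget_script and retarget_file are hypothetical, and the build script is assumed to be readable as UTF-8 text.

```python
import io


def retarget_script(script_text, new_domain, old_domain="simple.wikipedia.org"):
    """Return script_text with every occurrence of old_domain replaced."""
    if old_domain not in script_text:
        # Fail loudly rather than silently returning an unchanged script.
        raise ValueError("domain not found in script: " + old_domain)
    return script_text.replace(old_domain, new_domain)


def retarget_file(path, new_domain, old_domain="simple.wikipedia.org"):
    """Rewrite the build script at path in place (UTF-8 is an assumption)."""
    with io.open(path, "r", encoding="utf-8") as f:
        text = f.read()
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(retarget_script(text, new_domain, old_domain))
```

For example, retarget_file(r"C:\xowa_release\xowa_build.gfs", "en.wikipedia.org") would point the script at English Wikipedia. Keep a copy of the original script before rewriting it.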
Import a wiki with a manual download
Download the wiki dump
- Navigate to https://dumps.wikimedia.org/enwiki
- Click on the latest directory
- Download the file just under "Articles, templates, media/file descriptions, and primary meta-pages.". It should read enwiki-latest-pages-articles.xml.bz2
- The download is 11+ GB and may take anywhere between 2 and 5 hours to complete.
- If you also want talk pages, you should download the "Recombine all pages, current versions only." version. It should read enwiki-latest-pages-meta-current.xml.bz2. Note that this dump is twice the size of the regular dump.
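The file names above follow a regular pattern, so the expected name and address can be derived from the wiki name and dump type. The sketch below is illustrative only: the function names are hypothetical, and the URL layout is inferred from the navigation steps above (the wiki directory, then the latest directory), so confirm the address in a browser before relying on it.

```python
DUMPS_ROOT = "https://dumps.wikimedia.org"


def dump_file_name(wiki, dump_type="pages-articles"):
    """Build a dump file name such as enwiki-latest-pages-articles.xml.bz2."""
    return "{0}-latest-{1}.xml.bz2".format(wiki, dump_type)


def dump_url(wiki, dump_type="pages-articles"):
    """Build the address of the file inside the wiki's latest directory."""
    return "{0}/{1}/latest/{2}".format(DUMPS_ROOT, wiki, dump_file_name(wiki, dump_type))
```

Use dump_type="pages-meta-current" for the larger dump that includes talk pages.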
Specify location of the wiki dump
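- In the build script, find the following line:
- add ('simple.wikipedia.org' , 'text.init');
- Replace it with the line below, setting src_bz2_fil to the path of the downloaded dump:
- add ('simple.wikipedia.org', 'text.init') {src_bz2_fil = '/your_directory/simplewiki-20130103-pages-articles.xml.bz2';}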
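The line replacement described above can be applied with a small script. This is a hedged Python sketch: set_dump_path is a hypothetical helper, it assumes the script contains exactly one plain text.init line for the given domain, and it does not address how gfs treats backslashes inside quoted strings, so forward-slash paths are the safer choice.

```python
import re


def set_dump_path(script_text, domain, bz2_path):
    """Replace the plain text.init line with one that sets src_bz2_fil."""
    pattern = re.compile(
        r"add\s*\(\s*'" + re.escape(domain) + r"'\s*,\s*'text\.init'\s*\)\s*;"
    )
    new_line = "add ('{0}', 'text.init') {{src_bz2_fil = '{1}';}}".format(domain, bz2_path)
    # A function replacement keeps any backslashes in bz2_path literal.
    new_text, count = pattern.subn(lambda match: new_line, script_text)
    if count != 1:
        raise ValueError("expected exactly one text.init line, found %d" % count)
    return new_text
```

The check on count guards against editing a script that has already been modified or that targets a different domain.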
Script
// do not show a "Press enter to continue" at the end of the script
app.bldr.pause_at_end = 'n';
// run xowa.gfs
app.scripts.run_file_by_type('xowa_cfg_app');
// import wiki; for more info see [[Dev/Command-line]]
app.bldr.cmds {
    // delete all files in directory; note that subdirectories and file databases ("-file.xowa") will not be deleted
    add ('simple.wikipedia.org' , 'util.cleanup') {delete_all = 'y';}
    // download main dump file; contains all articles
    add ('simple.wikipedia.org' , 'util.download') {dump_type = 'pages-articles';}
    // download categorylinks file; contains links from category to pages
    add ('simple.wikipedia.org' , 'util.download') {dump_type = 'categorylinks';}
    // download page_props file; contains information on hidden categories
    add ('simple.wikipedia.org' , 'util.download') {dump_type = 'page_props';}
    // start wiki import
    add ('simple.wikipedia.org' , 'text.init');
    // import articles
    add ('simple.wikipedia.org' , 'text.page');
    // generate search data
    add ('simple.wikipedia.org' , 'text.search');
    // end import
    add ('simple.wikipedia.org' , 'text.term');
    // import css into wiki
    add ('simple.wikipedia.org' , 'text.css');
    // create main category table (also mark hidden categories)
    add ('simple.wikipedia.org' , 'wiki.page_props');
    // create category links
    add ('simple.wikipedia.org' , 'wiki.categorylinks');
    // cleanup temp files; delete xml, sql, bz2, and gz files
    add ('simple.wikipedia.org' , 'util.cleanup') {delete_tmp = 'y'; delete_by_match('*.xml|*.sql|*.bz2|*.gz');}
}
// run cmds
app.bldr.run;
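After editing the script, it can help to list which commands it will run, and for which wiki, before starting a long import. The following Python sketch is an illustration rather than an XOWA tool: list_commands is a hypothetical helper that only recognises lines of the form add ('domain' , 'command') as they appear in the script above.

```python
import re

# Matches lines such as: add ('simple.wikipedia.org' , 'text.init');
ADD_LINE = re.compile(r"^\s*add\s*\(\s*'([^']+)'\s*,\s*'([^']+)'\s*\)")


def list_commands(script_text):
    """Return (domain, command) pairs in the order they appear in the script."""
    steps = []
    for line in script_text.splitlines():
        match = ADD_LINE.match(line)
        if match:  # comment lines start with // and are skipped
            steps.append((match.group(1), match.group(2)))
    return steps
```

A quick check such as len({domain for domain, _ in list_commands(text)}) == 1 confirms that every command targets the same wiki after a search and replace.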