Blog/2016-05

From XOWA: the free, open-source, offline wiki application

Dev: Handling the 1 MB limit for SQLite on Android (2016-05-30 21:15 Mon)

I rediscovered an unintuitive bug today, and decided to take some time to document it.

XOWA stores all its data in SQLite databases, including images and html. Image sizes can range from a few KB to hundreds of MB. Note that storing these large images in the database works with SQLite on all three major desktop platforms: Linux, Windows, and Mac OS X.

However, SQLite on Android has a 1 MB limit for row data in a SELECT. If a row has more than 1 MB, then reading from the cursor will fail with an error like IllegalStateException: Couldn't read row 0, col 0 from CursorWindow. Make sure the Cursor is initialize before accessing data from it.

For example, consider the following setup:

  • You have a database with a table defined like this:
CREATE TABLE blob_table
( blob_id      INTEGER
, blob_column  BLOB
)
  • You insert two rows: the first with 2 bytes of data and the second with 2 MB of data.
INSERT INTO blob_table (blob_id, blob_column) VALUES (0, '01');
INSERT INTO blob_table (blob_id, blob_column) VALUES (1, '0123456789...'); -- ... represents remaining 2 MB of data
  • You then run code to retrieve both rows
  SQLiteDatabase db = SQLiteDatabase.openDatabase("/Android/data/org.xowa/files/blob_database.sqlite", null, SQLiteDatabase.OPEN_READONLY | SQLiteDatabase.NO_LOCALIZED_COLLATORS);
  Cursor cursor = db.query("blob_table", new String[] {"blob_id", "blob_column"}, null, null, null, null, null, null);
  cursor.moveToFirst();

  // this line succeeds as blob_id + blob_column has a length of 6 bytes
  int row_0_id = cursor.getInt(0);

  cursor.moveToNext();

  // this line fails b/c blob_id + blob_column has a total length > 1 MB; note that it fails, even though just the INTEGER is being requested, not the BLOB.
  int row_1_id = cursor.getInt(0);

I believe this is because the transaction buffer is limited to 1 MB.[1] The official recommendation is not to store 1 MB BLOBs in the database. Instead, the BLOB should be stored on the filesystem, and only the BLOB's url should be stored in the database. This appears to be the recommendation of SQLite[2] as well as the general sentiment of most commenters on stackoverflow[3]. Note that SQLite defines the threshold for large BLOB as 100 KB.

To a certain extent, I understand why this is the recommendation:

  • Resource constraints: An Android device may have limited memory, and there is a risk in devouring untold MB.
  • BLOBs are not relational data: Databases are for storing and querying relational data. You can't do any type of meaningful queries with BLOBs except for SELECTs
  • Efficiency: Filesystem storage will be more efficient for storing these large BLOBs. Databases pages will just intervene another layer on top of filesystem pages.

Personally though, I disagree with this recommendation:

  • Unnecessary restriction: There is no restriction to allocating a byte array of 2 MB (byte[] array = new byte[2000000]). Or adding 2 million objects to a list. Why should there be a similar one for retrieving SQLite data?
  • BLOBs are data: A good deal of database code is just "get me the data for this ID". It shouldn't matter if the data is less than 1 MB or greater than 1 MB. Especially in the SQLite world, where there isn't even a concept of field lengths for strings (varchar(1) is the same as varchar(10000))
  • Distribution complication: XOWA distributes English Wikipedia as 80 database files. Storing these images as separate files would balloon the distribution to several thousand files.

At this point, there seem to be three options to work around the 1 MB limit:

  1. Access SQLite directly via C language: This is complicated and not really something I want to try now.
  2. Use sqlite4java which supports large BLOBs: This is the same as number (1), but the work has already been done by another. Although this is promising, I'm not prepared to replace the existing SQlite bridge in XOWA: Xerial SQLite JDBC
  3. Get the length of the blob data and read the BLOB in 1 MB increments: This basically requires 3+ SELECTs.
    1. SELECT length of the blob: SELECT blob_id, length(blob_column) FROM blob_table WHERE blob_id = 0
    2. SELECT the 1st MB of the blob: SELECT substr(blob_column, 1, 1000000) FROM blob_table WHERE blob_id = 0
    3. SELECT the 2nd MB of the blob: SELECT substr(blob_column, 1000001, 1000000) FROM blob_table WHERE blob_id = 0
    4. Keep SELECTing until all data is read: Note that the last SELECT needs to substr the exact remainder of the blob. For example, a BLOB of 2.3 MB should have a final select of 300 KB: SELECT substr(blob_column, 2000001, 300000) FROM blob_table WHERE blob_id = 0

For now, I went with option #3, as it is the easiest to implement. It is slower, but fortunately BLOBs > 1 MB are far and few between.


References

  1. ^ https://developer.android.com/reference/android/os/TransactionTooLargeException.html
    The Binder transaction buffer has a limited fixed size, currently 1Mb, which is shared by all transactions in progress for the process.
    
  2. ^ http://www.sqlite.org/intern-v-extern-blob.html
    For BLOBs smaller than 100KB, reads are faster when the BLOBs are stored directly in the database file. For BLOBs larger than 100KB, reads from a separate file are faster. 
    
  3. ^

Release: NONE (2016-05-29 20:00 Sun)

This week is also a no show for releases. I'm planning to make a release next week to handle a few parser issues. I'm also going to try to have the desktop app start reading the HTML databases.

As I'm still working on the wiki downloader, image dumps outside of English Wikipedia will be on hold for a while. If you want a copy, please drop me an email, or post an issue.

Thanks.

Release: NONE (2016-05-24 11:00 Tue)

The 2016-05 English Wikipedia HTML dumps for Android are up. See https://archive.org/details/Xowa_enwiki_latest

Release: NONE (2016-05-22 20:00 Sun)

I'm skipping the release again. I had to overhaul some of the internals of the downloader which cost me a couple of days. I'm not sure if I can get it ready for next week, so it may be a June release.

I did upload the images for English Wikipedia: https://archive.org/download/Xowa_enwiki_latest/Xowa_enwiki_2016-05-01_file_v2b.7z . I started uploading the Android html version, but ran into an issue with archive.org. You can start downloading the 2016-05 html files now: https://archive.org/download/Xowa_enwiki_latest . I'll upload the rest during the week.

Release: NONE (2016-05-15 19:45 Sun)

This week is another skipped release, as the work continues on the wiki download page. In the meantime, I'm generating 2016-May English Wikipedia and will have them uploaded this week.

Release: NONE (2016-05-08 19:45 Sun)

I'm skipping the release this week as I'm still working on the wiki download tool. Hopefully it will make it into the beta next weekend.

Release: v3.5.1.1 (2016-05-01 20:25 Sun)

The desktop app is a trivial release. It has a few minor parser fixes, primarily for English Wiktionary.

The Android app has no release.

(Desktop) Minor parser fixes for English Wiktionary

There are several minor parser fixes for English Wiktionary. The most major item is a simple implementation for {{categorytree}} which will show a link to the Category page, but not do the Javascript tree expansion.

Next release: v3.5.2

I'm still working on the wiki update tool. My goal is to have it for Android v3.5.2. Once that is done, I'll upload the next batch of wikis after it (English / French).

Also, on a related note, the English Wikipedia dumps for 2016-04-07 are done. Many thanks to Ariel Glenn for fixing the issue: https://lists.wikimedia.org/pipermail/xmldatadumps-l/2016-April/001301.html . I'm processing them now, and will try to post them this week.

Namespaces

XOWA

Getting started

Android

Help

Blog

Donate