Yet another reverse engineering blog

Friday, November 23, 2007

Embiid Publishing

Short History

Embiid Publishing was an early e-book publisher that started in 2000. They published midlist SF and romance titles, the most famous probably being the Liaden series by Sharon Lee and Steve Miller. They offered good prices ($5 on average) and free sampler bundles. At first their books were Windows-only; later they added the Rocket format and a book reader for Palm OS. In 2006 the company closed its doors, leaving customers with books they could not convert.

The Reader

The Windows reader program could read two formats: UBK and EBK. The former was only lightly scrambled and could be opened by any copy of the reader. The latter was encrypted with a personalized key and could only be opened by the personalized reader executable that customers downloaded with their first purchase.
The Reader was written in Delphi and had pretty basic functionality: changeable font, bookmarks, navigation.



File format details

A pseudo-C description of the file header looks like this:
struct EmbiidFile {
/* 00 */ int32 file_seed; //the seed for decrypting header fields
/* 04 */ char type[5]; //file type (encrypted w/ file_seed)

#define FTYPE_UBK "Valid" //non-personalized (text encrypted with file_seed)
#define FTYPE_EBK "EBook" //personalized (text encrypted with user_seed)

/* 09 */ uint32 cover_off; //offset of the cover image (jpeg image)
/* 0D */ uint32 cover_len; //length of the cover image data
/* 11 */ byte version; //format version (encrypted w/ file_seed)

#define CURRENT_VERSION 1

/* 12 */ char title[50]; //book title (encrypted w/ file_seed), space-padded
/* 44 */ char author[954]; //book author (encrypted w/ file_seed), space-padded
/* 3FE */ uint16 nchapters; //number of chapters
/* 400 */ uint32 chap_lens[256]; //chapter lengths
/* 800 */ char book_text[]; //text of the book. UBK: encrypted with file_seed, EBK: encrypted with user_seed
}
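As a rough illustration, the header layout above can be expressed with Python's struct module. This is only a sketch: little-endian byte order is an assumption (the reader is a Windows program), parse_header is a made-up name, and the fields marked as encrypted are returned raw, without decryption.

```python
import struct

# Layout taken from the pseudo-C header above. Little-endian byte order is an
# assumption (Windows/x86 program); "<" also disables alignment padding.
HEADER = struct.Struct("<i5sIIB50s954sH256I")
assert HEADER.size == 0x800  # book_text starts right after the header

def parse_header(data):
    """Split the first 0x800 bytes into named fields, without decrypting."""
    fields = HEADER.unpack_from(data, 0)
    return {
        "file_seed": fields[0],
        "type": fields[1],        # still encrypted with file_seed
        "cover_off": fields[2],
        "cover_len": fields[3],
        "version": fields[4],     # still encrypted with file_seed
        "title": fields[5],       # still encrypted with file_seed
        "author": fields[6],      # still encrypted with file_seed
        "nchapters": fields[7],
        "chap_lens": list(fields[8:]),
    }
```

The format string adds up to exactly 0x800 bytes, which matches the offset of book_text in the description.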


The encryption XORs the data with a 1024-byte array. The array is initialized from the seed using a pseudo-random number generator. Here's pseudocode for its generation:
float a = seed/1000.0;
for(int i=1;i<0x400;i++)
{
    float b = int(a/127773);
    float c = a - b*127773;
    a = c*16807 - b*2836;
    if (a<0) a+=2147483647;
    xor_buf[i] = int(a/2147483647*256)&0xFF;
}
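The same generator can be transcribed to Python almost line by line. The constants 16807, 127773, 2836 and 2147483647 are those of the well-known Park-Miller "minimal standard" generator computed with Schrage's method. Treat this as a sketch: make_xor_table is a made-up name, Python floats are doubles (the precision of the original may differ), and since the loop starts at 1, entry 0 is simply left at zero here, which is an assumption.

```python
def make_xor_table(seed):
    """Build the 1024-entry XOR table from a seed, following the pseudocode."""
    xor_buf = [0] * 0x400
    a = seed / 1000.0
    # The loop starts at 1 as in the pseudocode, so entry 0 keeps its initial
    # value; zero for that entry is an assumption of this sketch.
    for i in range(1, 0x400):
        b = float(int(a / 127773))   # integer quotient
        c = a - b * 127773           # remainder
        a = c * 16807 - b * 2836
        if a < 0:
            a += 2147483647
        xor_buf[i] = int(a / 2147483647 * 256) & 0xFF
    return xor_buf
```

The same seed always produces the same table, so the table only needs to be built once per seed.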


The decryption uses the file offset of the data to index the array, and it leaves unchanged any byte that would decrypt to 0x1A (the DOS end-of-file character):

xor_val = xor_buf[file_offset%0x400];
val_out = val_in^xor_val;
if (val_out==0x1A)
    val_out = val_in;
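Applied to a whole run of bytes, the per-byte rule might look like the following Python sketch. The function name and signature are made up for illustration; xor_buf is assumed to be the 1024-entry table described earlier, and file_offset is the position of the first byte of data within the file.

```python
def decrypt(data, file_offset, xor_buf):
    """XOR-decrypt bytes that start at file_offset, never emitting 0x1A."""
    out = bytearray()
    for pos, val_in in enumerate(data, start=file_offset):
        val_out = val_in ^ xor_buf[pos % 0x400]
        if val_out == 0x1A:      # would become the EOF character
            val_out = val_in     # pass the input byte through unchanged
        out.append(val_out)
    return bytes(out)
```

Because the table is indexed by absolute file offset, a field can be decrypted on its own as long as its offset in the file is known.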


While the file_seed is stored directly in the file, the user_seed is calculated as the Adler-32 checksum of a 128-byte user ID, which is stored directly in the personalized EmbiidReader.exe.

t1 = 1;
sum = 0;
i = 0;
do {
    t1 = (t1 + user_id[i++]) % 0xFFF1;
    sum = (t1 + sum) % 0xFFF1;
} while (i < 0x80);
user_seed = (sum<<16) | t1;
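In Python the loop above gives the same result as the standard zlib.adler32 function, which makes a quick cross-check possible. A small sketch, where user_seed_from_id is a made-up name and the sample ID is invented purely for illustration (a real ID has to come from the personalized executable):

```python
import zlib

def user_seed_from_id(user_id):
    """Adler-32 of the 128-byte user ID, written out as in the pseudocode."""
    t1, total = 1, 0
    for byte in user_id[:0x80]:
        t1 = (t1 + byte) % 0xFFF1
        total = (total + t1) % 0xFFF1
    return (total << 16) | t1

sample_id = bytes(range(0x80))  # made-up ID, for illustration only
assert user_seed_from_id(sample_id) == zlib.adler32(sample_id)
```

Note that zlib.adler32 returns an unsigned value in Python 3; whether the reader treats the seed as signed is not covered here.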


The text of the book uses a small subset of HTML tags for formatting, but the paragraphs are delimited by newlines, not <br> or <p> tags.
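One simple way to turn such text into a standalone HTML page is to wrap every non-empty line in a <p> element and leave the inline tags alone. This is just a sketch of the idea, not necessarily what any particular converter does: the function name is made up, and dropping empty lines is an assumption.

```python
def paragraphs_to_html(text, title="Untitled"):
    """Wrap newline-delimited paragraphs in <p> tags, keeping inline tags."""
    # Empty lines are dropped; that choice is an assumption of this sketch.
    paragraphs = [line.strip() for line in text.splitlines() if line.strip()]
    body = "\n".join("<p>%s</p>" % p for p in paragraphs)
    return ("<html><head><title>%s</title></head>\n<body>\n%s\n</body></html>"
            % (title, body))
```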

Here's a small Python script, hosted on Google Pages, that converts an Embiid book to HTML. A valid EmbiidReader.exe is necessary to decrypt personalized books.
You will need Python to run it.
Place your books, EmbiidReader.exe and embiid.py in the same directory and run from the command prompt:
embiid.py <book.ebk>
You should get a <book.html> file with the decoded text.

Thursday, November 22, 2007

Welcome

Hello, visitor.
I am a long-time reverse engineering hobbyist. Most of that time I've been "working for myself" but now I've decided to share some of my findings. My reverse engineering interests include decompilation, file formats and interoperability (and more). I've written a two-part article on Visual C++ reversing for OpenRCE.org: 1, 2.
Right now I'm "into" eBook readers, so the next few posts will probably be on eBook formats.