bchtx technical document bchtx version 1.44
An way to obtain reliable serial file transfer.
Takayuki Hosoda
This program is to experiment the efficiency of error detection and cor-
rection used on rather high speed serial communication link. To obtain
error correction capability, it's making use of BCH(255,239) code.
The features of this program are, two bit error correction and three bit
error detection capability for a block(39 byte), and object data consists
of only visible seven bit ASCII charactors.
[Acknowledgement]
BCH code, named from the initial of Bose, Chaudhuri, Hocquenghem, is a
wide class of cyclic code, which include Hamming code, and its expansion.
It's a most important one of the error correction code, it can be app-
lied to wide range of code length and error correction capability.
When used at code length of some thouthand or short, It's redundancy is
relatively small refered to the other error correction code. The more
important thing is that, it can be decoded by rather easy alithmatic way.
From these reason, BCH code is used in various applications as data transfer.
[BCH code]
To obtain practical decoding speed, I'm going to use error correction table.
How to correct is, at first calculate syndrome of received block, If it's
not zero, it indicates some error. Then addressing the table by the syndrome,
we can get error position of received data block. Flipping the bit at error
position, we can get the corrected data. Because of addressing table by a
syndrome, there are restriction from the size of available memory, I might
think using 64Kword for the table may be a good trade off for the personal
computers nowadays. The table size of 64Kword suit the requirement of BCH
(255,239), two bit error correction out of 255 bit. We'll go on with this
selection.
1. a positive integer m = 8
code length n = 2^m - 1 = 2^8 - 1 = 255
2. number of error correction bit t = 2
information bit k >= 2^m - 1 - m*t = 2^8 - 1 - 8*2 = 239
number of check bit n - k = 255 - 239 = 16
3. 1) 1..2t = 1, 2, 3, 4
2) 1, 3
3) degree 8
4. Generator polynomial = 0011011110110001(0x37b1)
[Parity]
The BCH code descrived above can detect and correct upto two bit errors,
however, there is a need to detect three bit errors to abort data conversion
process. An additional parity bit on BCH code enables it to detect three bit
errors. This means to increase 'hamming distance' one.
[Data format]
Because of it's making use of BCH(255,239), the maximun number
of information code is limited to 31 byte. I'll use 29 byte for
the convinience to have an encoded block size of 39 byte, which
is about half of a line length of a general terminal. Making it
double we get an output line.
The internal BCH encoded data format is shown below.(fig.1)
(fig.1)
|<----------------------------255------------------------------->|
| |<---------------------234(39*6bit)---------------------->|
| |<------------------232(29byte)------------------>| |
|<-21->|<----------216(27byte)------->|<-------16------->|-1-|-1-|
|______|______________________________|__________________|___|___|
X______X______________________________X__________________X___X___X
| | information data | check bit | | |
| |MSB | LSB| | |
| |data[0]...............data[28]|data[29]..data[30]| | |
| | |
+--- not used padding data '0' --+ |
parity bit ---+
The BCH encoded data is encoded again to visible seven bit ASCII
charactors. The reason is to make it possible to transfer a
binary file through seven bit serial link, or through MoDem.
[Binary to text conversion]
(fig.2)
MSB LSB
7 6 5 4 3 2 1 0
_______________________________
|data[0]................|data[1]| Each 3 byte of binary(original)
|_______________ _______|_______| data will be separated to four
|........data[1]|data[2]........| charactor of 6 bit length, and
|_______|_______|_______ _______| encoded to 7 bit ASCII charac-
|data[2]|data[3]................| tor. Conversion table is shown
|_______|_______________________| below(fig.3).
|data[4]................|data[5]|
| | |
The feature of this conversion table is its easiness to decode
back to original six bit, All you have to do is to clear upper
bit.
(fig.3) | l o w e r b i t
|____________________________________
| 1 1 1 1 1 1
| 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
________ _____|____________________________________
| 0 | @ A B C D E F G H I J K L M N O
U |_____|____________________________________
P | 1 | P Q R S T U V W X Y Z [ \ ] ^ _
P |_____|____________________________________
E | 2 | ` a b c d e f g h i j k l m n o
R |_____|____________________________________
| 3 | 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
________|_____|____________________________________
[Text to binary conversion]
(fig.4)
MSB LSB
7 6 5 4 3 2 1 0
_______ _______________________
|xxxxxxx|data[0]................| Each four lower 6 bit data will
|_______|_______ _______________| be concatinated to three 8 bit
|xxxxxxx|data[0]| data[1].......| binary data. Upper two bit will
|_______|_______|_______ _______| be neglected.
|xxxxxxx|........data[1]|data[2]|
|_______|_______________|_______|
|xxxxxxx|................data[2]|
|_______|_______________________|
|xxxxxxx|data[3]................|
| | |
[bchtx file format]
The BCH encoded output file is consist of 2 block/line. Each line is beggining
with 'p' that is used for start-of-heading code. The reason why beggining with
'p' is as follows, note that 'p' is explained as '1110000' in binary, and any
two bit error won't cause it to be a control code ('0000000' to '00011111'),
nor DEL code ('1111111'). Using 'p' as the SOH code, we can ignore all kind of
line terminator which varies by system, i.e. LF for UNIX, CRLF for MS-DOS, CR
for OS-9.
Thus we understood we can ignore line terminator code, however, there still be
a line terminator probrem. Imagine a error causes a charactor to be a control
code, for example, a flip of 7th bit of charactor 'J' (1001010) makes LF code
(0001010), then, on some system in which a line terminator LF will translated to
CRLF. This translation causes unrecoverable error to the data. With BCH code, we
can correct bit errors, however, don't know how to delete an inserted charactor.
For the worse, LF to CR(0001101) conversion brings it to out of reach of two
bit error correction capability.
Before the first block is a comment line that shows original file name and
total number of lines. The first block is used as syncronize word to find start
of BCH encoded text, in which contains the bchtx signiture. The second block is
used as header in which contains size of the original file, header extension
flag, number of extended header. The space not used in the header block is
reserved for future use. The third to (m-1)'th block is used for extended header,
in which contains the original file name so far, the block can be expanded to
contain the file name upto 255 charactors of maximum name length. The m'th to
(n-3)'th block is main body of encoded data. (n-2)'th to (n-1)'th two block is
to contain original file's 32 bit Cyclic Redundancy Check(CRC32). This is a
variation of CCITT's CRC32 (explaned later in this file).
(fig.5)
___________________________________________________
block number | comment
______________|____________________________________
______________|____________________________________
0 | comment field
______________|____________________________________
1 | signiture word
______________|____________________________________
2 | header
______________|____________________________________
3..m-1 | extended header
______________|____________________________________
m..n-3 | main body
______________|____________________________________
n-2..n-1| CRC
______________|____________________________________
n | footer comment [EOF]
______________|____________________________________
[CRC32]
Though BCH's error correction ability is powerful, however, it could not
check the error out of the block. For this reason the other way is required
to check the integrity of decoded file. I chose CRC32 for the answer.
For the CRC32 is a bit short for the maximum file size of 2Gbyte, consi-
dered to expand CRC32 pararelly to 8 bit by 32 lengh.
The original files' CRC is calculated and added at the end of encoded
file to check the integrity at the decode end.
(fig.6)
<- 8bit ->
+----------------+
____|____ | CRC is calculated through 32
X^31 |_________| | stages simulated linier feedback
X^30 |_________| | shift registor(LFSR).
X^29 |_________| | The generator polynomial is same
X^28 |_________| | as CCITT's CRC32.
X^27 |_________| |
X^26 |____X____|-<--+ | 'Exclusive or' (Half addition) is
X^25 |_________| | | done at 'X' position.
X^24 |_________| | |
X^23 |____X____|-<--+ |
X^22 |____X____|-<--+ |
X^21 |_________| | |
X^20 |_________| | |
X^19 |_________| | |
X^18 |_________| | |
X^17 |_________| | |
X^16 |____X____|-<--+ |
X^15 |_________| | |
X^14 |_________| | |
X^13 |_________| | |
X^12 |____X____|-<--+ |
X^11 |____X____|-<--+ |
X^10 |____X____|-<--+ |
X^ 9 |_________| | |
X^ 8 |____X____|-<--+ |
X^ 7 |____X____|-<--+ |
X^ 6 |_________| | |
X^ 5 |____X____|-<--+ |
X^ 4 |____X____|-<--+ |
X^ 3 |_________| | |
X^ 2 |____X____|-<--+ |
X^ 1 |____X____|-<--+ |
X^ 0 |____X____|-<--+ |
^ | |
| | |
+------->-+ |
| |
X-<--------------+
|
input data
[Reference]
1. Hashimoto Kiyoshi:"Jyouhou fugou riron nyumon",Morikita shuppan(1984)
ISBN4-627-82050-X
2. Miyagawa Hiroshi, Harajima Hiroshi, Imai Hideki:"Jyouhou to fugou no
riron",Iwanami shoten(1983) ISBN4-00-010154