Efficient data storage when recording GPS signals with the GN3S v.2 receiver
One issue with using gnuradio-companion to record samples from the (now discontinued) GN3S v2 dongle
is the size of the resulting file. The dongle outputs 1-bit samples (0 or 1, saved as -1 or +1
so that the average value is close to 0, which is better suited to cross-correlation), but
gnuradio-companion (using the driver available from gnss-sdr) wants to save floating-point
complex values (8 bytes for each complex I/Q sample). I tried reducing this to 4 bytes per I/Q
sample by converting to interleaved shorts (ishort), and then to 2 bytes by manually splitting
into two chars, but that is still 8 times the size needed for data storage. I want to back up
data on CDs (the 700 MB storage medium), so anything bigger than this is a hassle.
One minute's worth of samples recorded at 8.1838 MS/s takes 3.93 GB using floats, 982 MB using
interleaved chars, but only 123 MB when using single-bit data storage. I know disks are cheap, but
long-term backup is not, so why waste space?
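The sizes quoted above follow directly from the sample rate and the number of bits stored per complex sample. The short Python sketch below redoes that arithmetic; the variable names are illustrative only, and sizes are expressed in decimal megabytes (1 MB = 10^6 bytes), which is an assumption about how the figures were rounded.

```python
# Storage needed for one minute of complex (I/Q) samples at 8.1838 MS/s.
SAMPLE_RATE = 8.1838e6   # complex samples per second (value from the text)
DURATION_S = 60          # one minute

n_iq = SAMPLE_RATE * DURATION_S   # number of complex samples recorded

# Bits stored per complex sample for each format discussed above.
formats = {
    "complex float (2 x 32 bits)": 64,
    "interleaved short (2 x 16 bits)": 32,
    "interleaved char (2 x 8 bits)": 16,
    "interleaved single bit (2 x 1 bit)": 2,
}

# Size in decimal megabytes (assumed convention: 1 MB = 1e6 bytes).
sizes_mb = {name: n_iq * bits / 8 / 1e6 for name, bits in formats.items()}

for name, mb in sizes_mb.items():
    print(f"{name:36s} {mb:8.0f} MB")
```

Only the single-bit format fits on a 700 MB CD with room to spare.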
Matlab (unlike Octave as of version 3.8.2) is able to read single-bit values, so here are a couple
of very simple programs for converting interleaved shorts and interleaved chars to bit-streams
ready to be used in Matlab. The awkward conversion from char to bits, reading the imaginary
part first, is due to the layout of my flowchart: I have put the imaginary part on the top
position of the interleave block and the real part on the bottom, so that the real part is in
the even samples and the imaginary part in the odd ones.
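As a rough illustration of what such a conversion involves (this is not the code of the programs mentioned above), here is a minimal Python sketch that packs a stream of -1/+1 values into one bit each. The function name `pack_signs_to_bits` is hypothetical, and two choices are assumptions rather than facts from the text: +1 is stored as bit 1 and -1 as bit 0, and bits are filled starting from the least significant bit of each byte.

```python
def pack_signs_to_bits(samples):
    """Pack a sequence of -1/+1 values into bytes, one bit per value.

    Assumed conventions (not confirmed by the original programs):
    +1 -> bit 1, -1 -> bit 0, first sample in the least significant bit.
    """
    out = bytearray((len(samples) + 7) // 8)   # 8 samples per output byte
    for i, s in enumerate(samples):
        if s > 0:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


# Eight interleaved values (imaginary, real, imaginary, real, ...).
samples = [-1, -1, -1, -1, +1, -1, +1, -1]
packed = pack_signs_to_bits(samples)
print(len(samples), "samples ->", len(packed), "byte:", packed.hex())
```

Because the interleaving order is preserved bit for bit, whatever reads the packed file still has to know that the imaginary part comes first.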
One way of reading such values in GNU Octave might be through the bitget function after reading
single bytes, but I have not attempted to do so. The Octave documentation describes it as follows:
Function File: c = bitget (A, n)
Return the status of bit(s) n of the unsigned integers in A.
The least significant bit is n = 1.
bitget (100, 8:-1:1)
⇒ 0 1 1 0 0 1 0 0
See also: bitand, bitor, bitxor, bitset, bitcmp, bitshift, bitmax.
In Matlab, reading N-bit samples is as simple as
data=fread(fid,inf,'bitN') or data=fread(fid,inf,'ubitN')
where N is the number of bits per sample (bitN for signed values, ubitN for unsigned values).
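For readers without Matlab, the byte-then-bit approach suggested for Octave can be sketched in Python: read whole bytes, then extract each bit, where `(byte >> k) & 1` plays the role of `bitget(byte, k+1)`. The function names are hypothetical, and the bit order (least significant bit first) and the 0 -> -1, 1 -> +1 mapping are assumptions that must match whatever wrote the file. The imaginary-first ordering follows the flowchart layout described above.

```python
def unpack_bits_to_signs(packed):
    """Expand each byte into eight -1/+1 values, LSB first (assumed order)."""
    signs = []
    for byte in packed:
        for k in range(8):
            # (byte >> k) & 1 is the Python analogue of bitget(byte, k + 1)
            signs.append(1 if (byte >> k) & 1 else -1)
    return signs


def to_complex(signs):
    """Rebuild complex samples from an imaginary-first interleaved stream."""
    imag = signs[0::2]   # first value of each pair: imaginary part
    real = signs[1::2]   # second value of each pair: real part
    return [complex(r, i) for r, i in zip(real, imag)]


signs = unpack_bits_to_signs(bytes([0x50]))
iq = to_complex(signs)
print(signs)
print(iq)
```

With a real recording, `packed` would come from `open(filename, "rb").read()`, possibly in chunks given the size of the files.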
Of course, manipulating data at the bit level means hitting the endianness issue. By
generating a subset of the recorded samples
$ head -c 128 150719_gn3s_ishort.bin > test_ishort.bin
$ xxd test_ishort.bin
0000000: ffff ffff ffff ffff 0100 ffff 0100 ffff ................
0000010: 0100 0100 0100 0100 0100 0100 0100 0100 ................
0000020: 0100 0100 0100 0100 0100 0100 0100 0100 ................
0000030: 0100 0100 0100 0100 0100 0100 ffff 0100 ................
0000040: ffff 0100 ffff 0100 ffff 0100 0100 0100 ................
0000050: 0100 ffff 0100 ffff 0100 ffff 0100 ffff ................
0000060: 0100 0100 0100 0100 ffff 0100 ffff 0100 ................
0000070: ffff 0100 ffff ffff ffff ffff ffff ffff ................
we see that since the x86 PC I am using is little endian, all +1 values appear as 0100, while -1 is
always 0xFFFF, whatever the endianness.
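The same observation can be checked with Python's standard `struct` module, which packs 16-bit integers with an explicit byte order. The sketch below also decodes the first row of the hex dump above as little-endian shorts; the variable names are illustrative.

```python
import struct

# +1 as a 16-bit integer: byte order matters.
le_plus = struct.pack("<h", 1)    # little endian
be_plus = struct.pack(">h", 1)    # big endian

# -1 as a 16-bit integer: all bits set, so byte order is irrelevant.
le_minus = struct.pack("<h", -1)
be_minus = struct.pack(">h", -1)

print("+1:", le_plus.hex(), "(little endian)", be_plus.hex(), "(big endian)")
print("-1:", le_minus.hex(), "(little endian)", be_minus.hex(), "(big endian)")

# First 16 bytes of the xxd output, read as eight little-endian shorts.
first_row = bytes.fromhex("ffffffffffffffff0100ffff0100ffff")
decoded = struct.unpack("<8h", first_row)
print(decoded)
```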
Converting such an interleaved short dataset to bits shrinks the size 16-fold as seen below, and the
gain is (obviously) 8-fold from interleaved chars to bits:
$ ./short2bit test_ishort.bin
$ ls -l
total 3354736
-rw-r--r-- 1 jmfriedt jmfriedt 1053532160 Jul 19 10:54 150719_gn3s_ichar.bin
-rw------- 1 jmfriedt jmfriedt 131691520 Jul 19 20:51 150719_gn3s_ichar2ibit1.bin
-rw-r--r-- 1 jmfriedt jmfriedt 2273165312 Jul 19 10:44 150719_gn3s_ishort.bin
-rw------- 1 jmfriedt jmfriedt 142072832 Jul 19 20:22 150719_gn3s_ishort2ibit1.bin
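The ratios can be verified from the byte counts in the listing above. The Python sketch below does so and also estimates the duration of each record, assuming the 8.1838 MS/s rate quoted earlier and 2 bits per complex sample in the bit-packed files.

```python
# File sizes in bytes, copied from the directory listing.
sizes = {
    "150719_gn3s_ichar.bin": 1053532160,
    "150719_gn3s_ichar2ibit1.bin": 131691520,
    "150719_gn3s_ishort.bin": 2273165312,
    "150719_gn3s_ishort2ibit1.bin": 142072832,
}

# divmod returns (quotient, remainder); a zero remainder means an exact ratio.
ratio_short = divmod(sizes["150719_gn3s_ishort.bin"],
                     sizes["150719_gn3s_ishort2ibit1.bin"])
ratio_char = divmod(sizes["150719_gn3s_ichar.bin"],
                    sizes["150719_gn3s_ichar2ibit1.bin"])
print("ishort -> bits:", ratio_short)
print("ichar  -> bits:", ratio_char)

# Record duration, assuming 8.1838 MS/s and 2 bits (I and Q) per sample.
SAMPLE_RATE = 8.1838e6
durations = {
    name: sizes[name] * 8 / 2 / SAMPLE_RATE
    for name in ("150719_gn3s_ichar2ibit1.bin", "150719_gn3s_ishort2ibit1.bin")
}
for name, seconds in durations.items():
    print(f"{name}: {seconds:.1f} s")
```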
The gain in storage space is obvious, and close to optimum. But are the data correctly encoded? Running
the GPS acquisition program on such datasets gives a result that indeed matches the
result from processing ishort or ichar values.
These charts were generated using this script, assuming a copy
of cacode.m, as found on the Matlab Central repository, is located in the Matlab path.
The raw binary files needed to reproduce these charts, recorded on July 19th 2015, are
150719_gn3s_ichar2ibit1.bin (132 MB for a 1 minute long record)
and 150719_gn3s_ishort2ibit1.bin (142 MB).