cassandra - Running hadoop jobs over snappy compressed column families -
I am trying to dump a Pig relation built over a compressed column family. It has a single column whose value is a JSON blob; the column family is compressed with Snappy and the value validator is BytesType. After making a connection and doing a dump, I get garbage. Here is the describe output:
ColumnFamily: cf
  Key Validation Class: org.apache.cassandra.db.marshal.TimeUUIDType
  Default column value validator: org.apache.cassandra.db.marshal.BytesType
  Cells sorted by: org.apache.cassandra.db.marshal.UTF8Type
  GC grace seconds: 86400
  Compaction min/max thresholds: 2/32
  Read repair chance: 0.1
  DC Local Read repair chance: 0.0
  Populate IO Cache on flush: false
  Replicate on write: true
  Caching: KEYS_ONLY
  Bloom Filter FP chance: default
  Built indexes: []
  Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
  Compression Options:
    sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor

Then I:
grunt> rows = LOAD 'cql://keyspace/cf' USING CqlStorage();
I have also tried:
grunt> rows = LOAD 'cql://keyspace/cf' USING CassandraStorage() AS (key: chararray, col1: chararray, value: chararray);
But when I dump it, the value still looks like binary.
Shouldn't the compression be handled transparently, or have I missed something? I have done some googling but haven't found anything on this subject. Also, I am using DataStax Enterprise 3.1. Thanks in advance!
I was able to resolve this issue. There was another layer of compression happening in the DAO, which was applying java.util.zip.Deflater/Inflater on top of the Snappy compression defined on the CF. The solution was to extend CassandraStorage and override the getNext() method so that the new implementation calls super.getNext() and inflates the tuples where appropriate.
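Roughly, the override looks like the sketch below. This is illustrative, not my exact code: the class name InflatingCassandraStorage is made up, and it assumes the compressed value arrives as a flat DataByteArray field (a bag of (name, value) tuples would need a recursive walk instead).

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

import org.apache.cassandra.hadoop.pig.CassandraStorage;
import org.apache.pig.data.DataByteArray;
import org.apache.pig.data.Tuple;

// Hypothetical loader: reads tuples via CassandraStorage, then undoes the
// extra java.util.zip.Deflater layer the DAO applied to the values.
public class InflatingCassandraStorage extends CassandraStorage {

    @Override
    public Tuple getNext() throws IOException {
        Tuple tuple = super.getNext();
        if (tuple == null) {
            return null; // end of input
        }
        // Inflate every raw byte field in place.
        for (int i = 0; i < tuple.size(); i++) {
            Object field = tuple.get(i);
            if (field instanceof DataByteArray) {
                byte[] raw = ((DataByteArray) field).get();
                tuple.set(i, new DataByteArray(inflate(raw)));
            }
        }
        return tuple;
    }

    // Reverses java.util.zip.Deflater compression.
    private static byte[] inflate(byte[] compressed) throws IOException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream(compressed.length * 4);
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && inflater.needsInput()) {
                    break; // truncated or non-deflate input; keep what we have
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IOException("Value is not Deflater-compressed", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}

Used from grunt the same way as before (jar name is a placeholder):

grunt> REGISTER inflating-storage.jar;
grunt> rows = LOAD 'cql://keyspace/cf' USING InflatingCassandraStorage();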