Can I create sequence file using spark dataframes?

Question


Asked by mahan07 on 27 November 2016 at 17:54

I have a requirement to create a sequence file. Right now we have written a custom API on top of the Hadoop API, but since we are moving to Spark we have to achieve the same thing using Spark. Can this be achieved using Spark DataFrames?


**Ram Ghadiyaram** · Accepted Answer · 2016-11-27T18:30:05+00:00

AFAIK there is no native API for writing SequenceFiles directly from a DataFrame, apart from the approach below.

Please try something like the example below. It works on an RDD of (key, value) pairs rather than on the DataFrame itself, and is inspired by `SequenceFileRDDFunctions.scala` and its `saveAsSequenceFile` method:

> Extra functions available on RDDs of (key, value) pairs to create a Hadoop SequenceFile, through an implicit conversion.

```scala
import org.apache.hadoop.io.NullWritable
import org.apache.spark.{SparkConf, SparkContext}

// Spark recommends a main() method over extending scala.App.
object driver {

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("HDFS writable test")
    val sc = new SparkContext(conf)

    // Sample (key, value) records spread over 10 partitions; replace with your own data.
    val data = sc.parallelize(1 to 1000, numSlices = 10)
      .map(i => (NullWritable.get(), s"record-$i"))

    // saveAsSequenceFile is provided by SequenceFileRDDFunctions through an implicit
    // conversion on RDD[(K, V)] whose K and V are convertible to Hadoop Writables.
    // data.saveAsSequenceFile("/tmp/s1")

    data.saveAsSequenceFile(s"hdfs://localdomain/tmp/s1/${new scala.util.Random().nextInt()}")
    sc.stop()
  }
}
```
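Since the question is about DataFrames specifically, the same idea can be applied by dropping from the DataFrame to its underlying RDD with `df.rdd` and mapping each `Row` to a key/value pair. The following is a minimal sketch, assuming Spark 2.x (`SparkSession`), a `NullWritable` key and a comma-joined `String` value per row. The object name `DataFrameToSequenceFile`, the helper `saveAsSequenceFile(df, path)`, the sample data and the temporary output path are hypothetical, chosen only for illustration. The file is then read back with `SparkContext.sequenceFile` to check what was written.

```scala
import java.nio.file.Files

import org.apache.hadoop.io.{NullWritable, Text}
import org.apache.spark.sql.{DataFrame, SparkSession}

object DataFrameToSequenceFile {

  // Hypothetical helper: writes each row of `df` as a (NullWritable, String) record.
  // Assumption: rendering the row as comma-separated text is acceptable for the value.
  def saveAsSequenceFile(df: DataFrame, path: String): Unit =
    df.rdd
      .map(row => (NullWritable.get(), row.mkString(",")))
      .saveAsSequenceFile(path) // implicit conversion to SequenceFileRDDFunctions

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DataFrame to SequenceFile")
      .master("local[*]") // local mode, for illustration only
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1, "alice"), (2, "bob")).toDF("id", "name")
    // The "out" subdirectory must not exist yet; Spark creates it.
    val out = Files.createTempDirectory("seqfile").resolve("out").toString
    saveAsSequenceFile(df, out)

    // Read the SequenceFile back; Text values are converted to String before collect.
    val readBack = spark.sparkContext
      .sequenceFile(out, classOf[NullWritable], classOf[Text])
      .map { case (_, v) => v.toString }
      .collect()
      .sorted
    readBack.foreach(println)
    spark.stop()
  }
}
```

Turning each row into a single string is just one choice. Any key/value types Spark can convert to `Writable` (for example `Int`, `Long`, `String`, `Array[Byte]`, or `Writable` subclasses such as `Text`) should also work, so you could, say, use an `id` column as the key instead of `NullWritable`.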

For further information, please see ..
