IntelliJ and Scala Spark

  1. It's a pain compared to the Python way of doing things, and much more complicated than I expected. A few takeaways:

    1. Follow a guide for compiling a Scala application in IntelliJ.
    2. assembly.sbt: compilation method (the sbt-assembly plugin)
    3. build.sbt: dependency specification (see the sketch after this list)
    4. File –> Settings: under the sbt Project-level settings, tick "Use sbt shell for build and import"
    5. Use sbt to compile the project: View –> Tool Windows –> SBT Shell, then run compile, update, and assembly
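
    A minimal sketch of the two files, assuming sbt-assembly is used to build a fat JAR; the project name and all versions below are placeholders to be matched to your cluster:

    // build.sbt: dependency specification
    name := "spark-aerospike-job"
    scalaVersion := "2.11.12"

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "2.4.8" % "provided",
      "org.apache.spark" %% "spark-sql"  % "2.4.8" % "provided",
      "com.aerospike"    %  "aerospike-client" % "4.4.18"
    )

    // assembly.sbt (under project/): adds the plugin that provides the `assembly` task
    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")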

  2. I had both memory and performance issues using PySpark to write predictions to HDFS. As a result, I tried out Scala Spark with Aerospike instead, and it turned out to perform very well!! Imports I used:

    // Spark core and SQL
    import org.apache.spark.SparkConf
    import org.apache.spark.SparkContext
    import org.apache.spark.sql.{Row, SQLContext}

    // Configuration, logging, dates, and JSON parsing
    import com.typesafe.config.ConfigFactory
    import org.apache.log4j.BasicConfigurator
    import org.joda.time.format.DateTimeFormat
    import org.json4s._
    import org.json4s.jackson.JsonMethods._

    import java.io.File

    // Aerospike client
    import com.aerospike.client.{AerospikeClient, Key, Bin}
    import com.aerospike.client.Value.ListValue

    // Java interop, concurrency, and math helpers
    import scala.collection.JavaConverters._
    import scala.concurrent.{Await, ExecutionContext, Future, blocking}
    import scala.concurrent.duration._
    import scala.math.exp
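
    A condensed sketch of how these pieces might fit together, writing one Aerospike record per prediction; the host, namespace, set, bin, and column names are hypothetical:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.{Row, SQLContext}
    import com.aerospike.client.{AerospikeClient, Key, Bin}

    object PredictionsToAerospike {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("predictions-to-aerospike"))
        val sqlContext = new SQLContext(sc)
        import sqlContext.implicits._

        // Hypothetical predictions: (userId, score)
        val predictions = sc.parallelize(Seq(("u1", 0.91), ("u2", 0.37))).toDF("userId", "score")

        // Open one client per partition: AerospikeClient is not serializable,
        // so it has to be created on the executors, not on the driver
        predictions.rdd.foreachPartition { rows =>
          val client = new AerospikeClient("aerospike-host", 3000) // hypothetical host/port
          try {
            rows.foreach { case Row(userId: String, score: Double) =>
              val key = new Key("test", "predictions", userId) // namespace, set, user key
              client.put(null, key, new Bin("score", score))   // null = default write policy
            }
          } finally client.close()
        }
        sc.stop()
      }
    }

    If write throughput matters, tuning the client's WritePolicy would be a natural next step.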

  3. Learn to write recursive functions instead of using for loops! For example:
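
    A minimal illustration of the idiom (the function names are made up): summing a list with a mutable for loop versus a tail-recursive function, where @tailrec asks the compiler to verify the recursion is optimized into a loop:

    import scala.annotation.tailrec

    // Imperative style: a for loop with a mutable accumulator
    def sumLoop(xs: List[Int]): Int = {
      var total = 0
      for (x <- xs) total += x
      total
    }

    // Functional style: tail recursion with an explicit accumulator
    @tailrec
    def sumRec(xs: List[Int], acc: Int = 0): Int = xs match {
      case Nil          => acc
      case head :: tail => sumRec(tail, acc + head)
    }

    // sumLoop(List(1, 2, 3)) == 6
    // sumRec(List(1, 2, 3))  == 6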

 
