I am trying to implement the FP-Growth algorithm using Spark's MLlib but do not know how to proceed. I have seen several examples, but none of them include cross-validation, where the data set is split into training and test sets.

// The recommendation engine can be run per league:
// "Ligue 1"
// "Bundesliga " // the trailing space is required because of the switch/case implementation in data generation
// "La Liga"
// "Premier League"
// "Seria A"
var league = ""
if (args.length > 0) {
  league = args(0)
}

import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext

def generateRecommendations(hc: HiveContext, sc: SparkContext, leagueName: String) = {

  // Optional per-league filter. The closing quote is added here so that
  // the query stays valid when no league is given.
  var leagueCondition = ""
  if (leagueName != "") {
    leagueCondition = " and leagueName = '" + leagueName + "'"
  }

  println("\n\nAbout to get recommendationengine.sportsbookbets\n\n")

  // One row per customer, with the set of matches they bet on
  var queryToGroupCustomers = "SELECT yt.userid as player, " +
    " concat_ws(\",\", collect_set(match)) AS matchesPlayedOn" +
    " FROM recommendationengine.sportsbookbets_orc yt" +
    " where yt.userid is not null" + leagueCondition +
    " GROUP BY yt.userid limit 2"

  println("Executing query: \n\n" + queryToGroupCustomers)
  var results = hc.sql(queryToGroupCustomers)
  results
}

The query above selects the data for the chosen leagueName, and the algorithm should run on that result. I can run the normal algorithm on it, but I have no idea how to add cross-validation and save the results into tables per league and per split (train or test). I would appreciate any help and guidelines. Thanks.
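As a rough illustration of the split described above, the following sketch divides one league's baskets into training and test sets with `RDD.randomSplit` and writes each part to its own Hive table. It is a minimal sketch under assumptions, not a definitive implementation: the `LeagueSplit` object, the `fpg_<league>_<split>` table naming scheme, the 80/20 weights and the seed are all hypothetical, and the input is assumed to be an RDD of (player, matchesPlayedOn) string pairs.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.hive.HiveContext

object LeagueSplit {

  // Hypothetical naming scheme, e.g. ("La Liga", "train") -> "fpg_la_liga_train".
  // trim also removes the trailing space of "Bundesliga ".
  def splitTableName(league: String, split: String): String = {
    val slug = league.trim.toLowerCase.replaceAll("[^a-z0-9]+", "_")
    s"fpg_${slug}_$split"
  }

  // Randomly split one league's baskets (assumed 80/20) and save each part
  // as its own Hive table, overwriting any previous run.
  def splitAndSave(hc: HiveContext,
                   baskets: RDD[(String, String)],
                   league: String): (RDD[(String, String)], RDD[(String, String)]) = {
    import hc.implicits._ // enables rdd.toDF(...)

    val Array(train, test) = baskets.randomSplit(Array(0.8, 0.2), seed = 42L)

    train.toDF("player", "matchesPlayedOn")
      .write.mode("overwrite").saveAsTable(splitTableName(league, "train"))
    test.toDF("player", "matchesPlayedOn")
      .write.mode("overwrite").saveAsTable(splitTableName(league, "test"))

    (train, test)
  }
}
```

The pairs could be built from the query result with something like `results.rdd.map(r => (r.get(0).toString, r.get(1).toString))`. This is a single holdout split; for k-fold cross-validation, `MLUtils.kFold` in `org.apache.spark.mllib.util` returns an array of (training, validation) RDD pairs that could be processed the same way.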

What I have tried:

This is as far as I got, but this is just the normal algorithm.

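  import org.apache.spark.mllib.fpm.FPGrowth
  import org.apache.spark.rdd.RDD

  println("Executing query: \n\n" + queryToGroupCustomers)
  var results = hc.sql(queryToGroupCustomers)
  // matchesPlayedOn is the second column of the query result, so its index is 1
  val transactions: RDD[Array[String]] = results.rdd.map(row => row.get(1).toString.split(","))

  // Set configurations for FP-Growth (library defaults are used here)
  var fpg = new FPGrowth()

  // Generate model
  var model = fpg.run(transactions)

  println("\n\n Starting FPGrowth\n\n")

  // Print every frequent itemset with its frequency
  model.freqItemsets.collect().foreach { itemset =>
    println(itemset.items.mkString("[", ",", "]") + ", " + itemset.freq)
  }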
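One possible way to use a held-out test set with the code above is to fit FP-Growth on the training baskets only, derive association rules from that model, and then check how often those rules hold on the test baskets. The sketch below illustrates this idea under assumptions: the `HoldoutEval` object, the `hitRate` metric, and the thresholds `minSupport = 0.1` and `minConfidence = 0.5` are hypothetical choices, and both the rules and the test baskets are assumed to be small enough to collect to the driver.

```scala
import org.apache.spark.mllib.fpm.FPGrowth
import org.apache.spark.rdd.RDD

object HoldoutEval {

  // A rule "fires" on a basket when its whole antecedent is in the basket.
  // hitRate is the share of firings where the consequent is in the basket too.
  def hitRate(rules: Seq[(Set[String], Set[String])],
              baskets: Seq[Set[String]]): Double = {
    val outcomes = for {
      basket <- baskets
      (antecedent, consequent) <- rules
      if antecedent.subsetOf(basket)
    } yield consequent.subsetOf(basket)

    if (outcomes.isEmpty) 0.0
    else outcomes.count(identity).toDouble / outcomes.size
  }

  // Fit on the training baskets only, then score the resulting rules
  // against the unseen test baskets.
  def trainAndScore(train: RDD[Array[String]],
                    test: RDD[Array[String]],
                    minSupport: Double = 0.1,     // assumed value
                    minConfidence: Double = 0.5   // assumed value
                   ): Double = {
    val model = new FPGrowth().setMinSupport(minSupport).run(train)

    val rules = model.generateAssociationRules(minConfidence)
      .map(r => (r.antecedent.toSet, r.consequent.toSet))
      .collect()
      .toSeq

    hitRate(rules, test.map(_.toSet).collect().toSeq)
  }
}
```

With the `transactions` RDD from the snippet above, a call could look like `val Array(train, test) = transactions.randomSplit(Array(0.8, 0.2), seed = 42L)` followed by `HoldoutEval.trainAndScore(train, test)`. For larger data, the rules could be broadcast and the scoring done inside an RDD transformation instead of on the driver.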
Updated 26-Mar-17 22:48pm

