I am trying to implement the FP-Growth algorithm using Spark's MLlib, but I am not sure how to proceed. I have seen multiple examples, but none of them include cross-validation, where the data set is split into training and test sets.
// Recommendation engine can be per league
// "Ligue 1"
// "Bundesliga " // important to add space character after bundesliga due to switch case implementation in data generation
// "La Liga"
// "Premier League"
// "Seria A"
var league = ""
if (args.length > 0) {
  league = args(0) // args(0) is already a String, no toString needed
}
def generateRecommendations(hc: HiveContext, sc: SparkContext, leagueName: String) = {
// Recommendations per league filter
// Recommendations per league filter.
// Note: the closing quote belongs here, inside the condition; otherwise
// the query ends up with a dangling ' when leagueName is empty.
var leagueCondition = ""
if (leagueName != "") {
  leagueCondition = " and leagueName = '" + leagueName + "'"
}
println("\n\nAbout to get recommendationengine.sportsbookbets\n\n")
// Has all customers and their bets, one row per user with a
// comma-separated list of the matches they bet on
var queryToGroupCustomers = "SELECT yt.userid as player, " +
  " concat_ws(\",\", collect_set(match)) AS matchesPlayedOn" +
  " FROM recommendationengine.sportsbookbets_orc yt" +
  " where yt.userid is not null" + leagueCondition +
  " GROUP BY yt.userid limit 2"
println("Executing query: \n\n" + queryToGroupCustomers)
var results = hc.sql(queryToGroupCustomers)
I have the query above, which I would use to run the algorithm on the selected leagueName. I can run it with the plain algorithm, but I have no idea how to implement cross-validation, or how to save the results into tables per league and per training/test split. I would appreciate any help and guidelines...thanks
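For reference, my rough idea of a train/test split (just a sketch; I have not validated that this is the right way to cross-validate FP-Growth, and the 0.8/0.2 ratio and seed are arbitrary choices of mine) would be something like:

```scala
import org.apache.spark.mllib.fpm.FPGrowth
import org.apache.spark.rdd.RDD

// Hypothetical sketch: split the per-user transactions into a training
// and a held-out test set, then mine frequent itemsets on training only.
def trainTestFpGrowth(transactions: RDD[Array[String]]) = {
  val Array(training, test) =
    transactions.randomSplit(Array(0.8, 0.2), seed = 42L)
  training.cache() // FPGrowth makes multiple passes over the data

  val model = new FPGrowth()
    .setMinSupport(0.5)
    .setNumPartitions(10)
    .run(training)

  // Return the test set alongside the model so it can be used
  // later to check how well the mined itemsets generalize
  (model, test)
}
```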
What I have tried:
This is as far as I got, but this is just the plain algorithm.
import org.apache.spark.mllib.fpm.FPGrowth
import org.apache.spark.rdd.RDD

println("Executing query: \n\n" + queryToGroupCustomers)
val results = hc.sql(queryToGroupCustomers)
// The query returns two columns (player, matchesPlayedOn), so the
// comma-separated match list is at index 1, not 2
val transactions: RDD[Array[String]] =
  results.rdd.map(row => row.getString(1).split(","))
val fpg = new FPGrowth()
  .setMinSupport(0.5)
  .setNumPartitions(10)
println("\n\n Starting FPGrowth\n\n")
val model = fpg.run(transactions)
model.freqItemsets.collect().foreach { itemset =>
  println(itemset.items.mkString("[", ",", "]") + ", " + itemset.freq)
}
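In case it helps clarify what I mean by "save into tables as per league", this is roughly how I was thinking of persisting the frequent itemsets into a Hive table. The table name recommendationengine.freq_itemsets is just a placeholder I made up, and I am not sure append mode is the right choice here:

```scala
import hc.implicits._ // hc is the HiveContext, needed for toDF on an RDD

// Hypothetical sketch: tag each mined itemset with the league it was
// mined for, then append the rows to a (made-up) Hive table.
val itemsetsDF = model.freqItemsets
  .map(is => (leagueName, is.items.mkString(","), is.freq))
  .toDF("league", "items", "freq")

itemsetsDF.write
  .mode("append")
  .saveAsTable("recommendationengine.freq_itemsets")
```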