I have been working on a priority email inbox written in Python, with the ultimate aim of using a machine learning algorithm to label (or classify) a selection of emails as either important or unimportant. I will begin with some background information and then move on to my question.
 
So far I have written code that extracts data from emails and processes it to identify the most important ones. It uses the following email features:
 
Sender address frequency
Thread activity
Date received (time between replies)
Common words in the body/subject
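The post does not say how each feature is scored. As one possible reading of the first feature, the sketch below scores each sender address by how often it appears, relative to the most frequent sender. The function name `sender_frequency`, this normalisation and the sample inbox are all hypothetical.

```python
from collections import Counter


def sender_frequency(senders: list[str]) -> dict[str, float]:
    """Hypothetical scoring: count emails per sender address, then divide by
    the count of the most frequent sender, so that sender scores 1.0."""
    counts = Counter(address.lower() for address in senders)
    if not counts:
        return {}
    top = max(counts.values())
    return {address: count / top for address, count in counts.items()}


# Hypothetical inbox: one sender address per received email
inbox = ["test@test.com", "rest@test.com", "test@test.com", "best@test.com"]
scores = sender_frequency(inbox)
print(scores["test@test.com"], scores["rest@test.com"])  # 1.0 0.5
```

A score like this is already in the 0 to 1 range, so it could be combined with the other feature scores without further rescaling.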
 
My current code assigns each email a rank (or weight) between 0.1 and 1 based on its importance, and then labels it as either 'important' or 'unimportant' (stored as 1 or 0). An email is given priority status if its rank is greater than 0.5. This data is stored in a CSV file, as shown below.
 
From            Subject     Body        Date        Rank  Priority
test@test.com   HelloWorld  Body Words  10/10/2012  0.67  1
rest@test.com   ByeWorld    Body Words  10/10/2012  0.21  0
best@test.com   SayWorld    Body Words  10/10/2012  0.9   1
just@test.com   HeyWorld    Body Words  10/10/2012  0.48  0
...
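The rank-to-label rule described above can be checked against the sample rows. This sketch assumes the file is comma-separated with a header row (the delimiter is not shown in the post) and embeds the four sample rows as a string so it runs on its own; with a real file you would pass `open(path, newline="")` to `csv.DictReader` instead. `priority_from_rank` and `THRESHOLD` are hypothetical names.

```python
import csv
import io

# The four sample rows from the table above, written as comma-separated CSV
SAMPLE_CSV = """From,Subject,Body,Date,Rank,Priority
test@test.com,HelloWorld,Body Words,10/10/2012,0.67,1
rest@test.com,ByeWorld,Body Words,10/10/2012,0.21,0
best@test.com,SayWorld,Body Words,10/10/2012,0.9,1
just@test.com,HeyWorld,Body Words,10/10/2012,0.48,0
"""

THRESHOLD = 0.5  # an email is priority when its rank is strictly greater than this


def priority_from_rank(rank: float) -> int:
    """Return 1 (important) if rank > THRESHOLD, else 0 (unimportant)."""
    return 1 if rank > THRESHOLD else 0


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
labels = [priority_from_rank(float(row["Rank"])) for row in rows]

# Compare the recomputed labels with the stored Priority column
stored = [int(row["Priority"]) for row in rows]
print(labels)  # [1, 0, 1, 0]
```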
 
I have two sets of email data: one for training and one for testing. The above applies to my training data. I am now trying to train a learning algorithm so that I can predict the importance of the emails in the testing data.
 
To do this I have been looking at both scikit-learn and NLTK. However, I am having trouble transferring what I have learned in the tutorials to my own project. I have no particular requirements regarding which learning algorithm is used. Is this as simple as applying the following? And if so, how?
 
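from sklearn.svm import LinearSVC

# X: one row of numeric features per training email; y: the matching 0/1 Priority labels
X, y = email.data, email.target

clf = LinearSVC()
clf = clf.fit(X, y)

# X_new: the testing emails, encoded with the same features as X
X_new = [Testing Email Data]

clf.predict(X_new)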
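A minimal sketch of one way to fill in the placeholders with scikit-learn, assuming the subject and body text (the "common words" feature) are used as input. `LinearSVC` cannot consume raw strings, so a `TfidfVectorizer` first converts each email into a numeric row; `make_pipeline` chains the two so the testing emails are transformed with the vocabulary learned from the training emails. The training rows are the sample table; the two testing strings are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Training rows copied from the sample table: (Subject, Body, Priority)
train_rows = [
    ("HelloWorld", "Body Words", 1),
    ("ByeWorld", "Body Words", 0),
    ("SayWorld", "Body Words", 1),
    ("HeyWorld", "Body Words", 0),
]

# X is a list of raw strings here; the vectorizer turns it into a numeric matrix
X = [subject + " " + body for subject, body, _ in train_rows]
y = [priority for _, _, priority in train_rows]

# TF-IDF features followed by a linear SVM; another classifier such as
# MultinomialNB could be swapped in for LinearSVC
clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(X, y)

# Hypothetical testing emails, formatted the same way as the training text
X_new = ["HelloWorld Body Words", "GoodbyeWorld Body Words"]
predictions = clf.predict(X_new)
print(predictions)
```

The Rank column is deliberately left out of X: because Priority is defined as Rank > 0.5, using it as a feature would let the classifier read the answer directly. The other features (sender frequency, thread activity, reply times) could be added as extra numeric columns, for example with scikit-learn's `ColumnTransformer`.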
Posted 3-Feb-13 23:28

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



Last Updated 4 Feb 2013
Copyright © CodeProject, 1999-2014
All Rights Reserved.
