|
Thanks for your answer, but I'm afraid this is not possible. I'm just trying to write a spam filter for my website, so I can't keep users waiting that long. I thought about searching for all TLDs, but I don't think that's a good idea performance-wise. Do you know of any good spam filter that I can call from an ASP.NET application? That is, I send it a string and get back something like a boolean indicating whether it's spam. A percentage (how likely the post is to be spam) would be even better than a boolean. Thanks.
|
|
|
|
|
I think I just answered this in the ASP.NET forum.
There is no way of knowing whether a string is a *valid* URL (one that actually resolves) without posting to it. Telling whether a string is a well-formed URL is easy with regex, though.
Christian Graus
No longer a Microsoft MVP, but still happy to answer your questions.
|
|
|
|
|
OK, now I get your point. Actually, I don't care whether the URLs are valid or not. As I mentioned before, this is just for spam filtering, so checking validity isn't important.
Let me explain from the beginning (hopefully you have the time to read all this).
On my website, users will be adding a lot of posts in a short time, and I want the site to be as fast and responsive as possible when they do. So basically I'm looking for a spam filter that runs on my own machine (as opposed to spam filters that call a web service on another site, like Akismet, which can be fine for blogs and sites that don't receive many posts). Unfortunately I haven't been able to find such a thing so far, which is why I'm trying to write it myself, and it seems more complicated than I thought.
I thought of two approaches I can use to detect spam:
- Using naive Bayes (there's an article here on Code Project that talks about that, see http://www.codeproject.com/KB/recipes/BayesianCS.aspx[^])
- Using some rules that usually apply to spam, which is what I'm trying to do. Naive Bayes is very effective in most cases, but I prefer rules because of something specific to my app. Read on:
Due to the nature of my website, users wouldn't normally post any text that contains links (and I don't convert links that start with http:// into anchor tags). So it's reasonable to assume that posts containing links are most likely spam. Spammers spam a site for two reasons. The first is to get a higher page rank for some web page. That doesn't work in my case, since I don't convert links into anchor tags, and even if I did I could use rel="nofollow" as most people do, but the point is that such spam contains a URL. The second is to advertise something, and in that case they have to leave a URL, an email address or a phone number (if you can't reach the advertiser, the ad is useless, right?).
You're probably thinking that if I don't convert links into anchor tags they won't spam my site. I can assure you they are dumb enough to do it anyway: I have seen many other websites that don't convert links into anchors and are still heavily spammed. (Maybe it's not because they are dumb. It's rumored that Google detects any link that starts with http:// when crawling your site, even if it's not in an anchor tag. I have no idea whether this is true.)
Anyway, what I'm trying to do is find out whether a post contains any of these (URL, email address or phone number). Finding an email address or phone number is fairly easy with regex, and so is finding a URL that starts with http://. But there are two problems with this approach. First, looking at some spam ads, I noticed that some spammers don't start their URLs with http://. Second, if they learn that I only check for http://, they will post all their URLs without it.
Now the real complication is finding URLs that don't start with http://, because basically anything with a dot inside can be a URL. If a user doesn't leave a space between the full stop that ends a sentence and the next sentence, it will be detected as a URL (along with plenty of other text that has a dot between two strings yet isn't a URL). So I thought I could use the TLDs (generic and country codes), but then the regex becomes way too long, which will most probably affect performance. Even if you don't use regex (say, a loop that checks for every TLD), that will most probably affect performance too. To make things worse, some completely valid text, like asp.net, will be detected as a URL (and it even is a valid URL if you do post to it).
This is getting more complicated than needed. I think I'll either ignore URLs that have only two parts (like asp.net) or use naive Bayes.
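A minimal sketch of the "ignore two-part names" idea, for illustration only. It assumes a short, hypothetical TLD list (a real filter would load the full gTLD/ccTLD list) and invented names (`UrlSniffer`, `ContainsUrl`). A small regex only finds dot-separated candidates, and the TLD check is a hash-set lookup, so the pattern does not grow with the number of TLDs.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class UrlSniffer
{
    // Hypothetical, deliberately short TLD list (assumption, not from the thread).
    private static readonly HashSet<string> KnownTlds = new HashSet<string>(
        new[] { "com", "net", "org", "edu", "info", "biz", "ru", "cn", "uk", "de" },
        StringComparer.OrdinalIgnoreCase);

    // Candidate: optional http(s):// scheme, then two or more dot-separated labels.
    private static readonly Regex Candidate = new Regex(
        @"(?<scheme>https?://)?(?<host>[a-z0-9-]+(?:\.[a-z0-9-]+)+)",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static bool ContainsUrl(string text)
    {
        foreach (Match m in Candidate.Matches(text))
        {
            // An explicit scheme is treated as a URL regardless of the TLD.
            if (m.Groups["scheme"].Success) return true;

            string[] labels = m.Groups["host"].Value.Split('.');
            string tld = labels[labels.Length - 1];

            // Two-part names such as "asp.net" or "sentence.Next" are skipped;
            // three or more parts must end in a known TLD to count.
            if (labels.Length >= 3 && KnownTlds.Contains(tld)) return true;
        }
        return false;
    }
}
```

The trade-off is that a bare two-part spam domain with no scheme slips through, so this would be one signal among several rather than a complete check.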
And BTW, the reason I wanted the spam filter to return a percentage is that some posts are guaranteed to be spam. The spam filter keeps a database of spam ads; when it receives a post, it hashes it and compares it to the spam ads that have the same hash (SpamAssassin does this, I believe). In that case the spam filter should return 100%. If the post is not guaranteed to be spam, it returns a percentage below 100%, depending on how likely the post is to be spam. In my app, posts that are 100% spam will not be saved to the database. Posts with a percentage below 100% will be saved, but will be visible only to the users who posted them until a moderator checks them manually.
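The scoring scheme described above could be sketched roughly as follows. All names here (`SpamScorer`, `MarkAsSpam`, `Score`) are invented for illustration, the spam database is an in-memory set, and the sketch treats a hash match as an exact match instead of comparing the stored text afterwards, which is a simplification of what the post describes.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

public class SpamScorer
{
    // In-memory stand-in for the database of known spam ads (assumption).
    private readonly HashSet<string> knownSpamHashes = new HashSet<string>();

    private static string Fingerprint(string post)
    {
        // Normalize case and whitespace so trivial edits still hash the same.
        string normalized = string.Join(" ",
            post.ToLowerInvariant().Split((char[])null, StringSplitOptions.RemoveEmptyEntries));
        using (SHA256 sha = SHA256.Create())
        {
            return Convert.ToBase64String(sha.ComputeHash(Encoding.UTF8.GetBytes(normalized)));
        }
    }

    public void MarkAsSpam(string post)
    {
        knownSpamHashes.Add(Fingerprint(post));
    }

    // Returns 100 for a known spam ad; otherwise the caller-supplied
    // heuristic score, clamped to 0..99 so that only known spam reaches 100.
    public int Score(string post, int heuristicScore)
    {
        if (knownSpamHashes.Contains(Fingerprint(post))) return 100;
        return Math.Max(0, Math.Min(99, heuristicScore));
    }
}
```

With this shape, the calling code can discard posts that score 100 and hold everything else for moderation, as described in the post.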
Sorry for making this too long, just wanted to explain why I'm doing this ...
Have a great day ...
modified on Sunday, August 17, 2008 10:45 AM
|
|
|
|
|
Waleed Eissa wrote: Using naive bayesian
But again, the Naive Bayes algorithm doesn't have any intelligence of its own.
It has to be trained properly to produce results. The more you train it, the better the results it will yield.
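To make the training point concrete, here is a tiny naive Bayes sketch. It is not the code from the linked article: the class name `TinyBayes`, the tokenizer and the add-one smoothing are all assumptions chosen to keep the example short.

```csharp
using System;
using System.Collections.Generic;

public class TinyBayes
{
    private readonly Dictionary<string, int> spamCounts = new Dictionary<string, int>();
    private readonly Dictionary<string, int> hamCounts = new Dictionary<string, int>();
    private int spamTokens, hamTokens, spamDocs, hamDocs;

    private static string[] Tokenize(string text)
    {
        return text.ToLowerInvariant().Split(
            new[] { ' ', '\t', '\r', '\n', '.', ',', '!', '?' },
            StringSplitOptions.RemoveEmptyEntries);
    }

    // Feed one labeled example; results improve as more examples are added.
    public void Train(string text, bool isSpam)
    {
        Dictionary<string, int> counts = isSpam ? spamCounts : hamCounts;
        foreach (string token in Tokenize(text))
        {
            int n;
            counts.TryGetValue(token, out n);
            counts[token] = n + 1;
            if (isSpam) spamTokens++; else hamTokens++;
        }
        if (isSpam) spamDocs++; else hamDocs++;
    }

    private static int Count(Dictionary<string, int> counts, string token)
    {
        int n;
        return counts.TryGetValue(token, out n) ? n : 0;
    }

    private int VocabularySize()
    {
        var vocab = new HashSet<string>(spamCounts.Keys);
        vocab.UnionWith(hamCounts.Keys);
        return vocab.Count;
    }

    // Returns an estimate of P(spam | text) in [0, 1]; 0.5 when untrained.
    public double SpamProbability(string text)
    {
        if (spamDocs == 0 || hamDocs == 0) return 0.5;
        int vocab = VocabularySize();
        double logSpam = Math.Log((double)spamDocs / (spamDocs + hamDocs));
        double logHam = Math.Log((double)hamDocs / (spamDocs + hamDocs));
        foreach (string token in Tokenize(text))
        {
            // Add-one smoothing keeps unseen words from zeroing the product.
            logSpam += Math.Log((Count(spamCounts, token) + 1.0) / (spamTokens + vocab));
            logHam += Math.Log((Count(hamCounts, token) + 1.0) / (hamTokens + vocab));
        }
        // Turn the two log scores into a normalized probability.
        return 1.0 / (1.0 + Math.Exp(logHam - logSpam));
    }
}
```

Multiplying the result by 100 gives the kind of percentage asked for earlier in the thread, but the number is only as good as the training data.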
Please remember to rate helpful or unhelpful answers, it lets us and people reading the forums know if our answers are any good.
|
|
|
|
|
Actually, I'm not especially interested in the Naive Bayes algorithm or any other algorithm. I'm just trying to filter out the spam. Can you suggest a better way of doing this? And if you know of a good spam filter that I can use in my application, that would be even better.
Regards
|
|
|
|
|
Use regular expressions as you have already been told. Instead of checking for something like http://, why not just check for things like .com, .net, .edu, etc.
"The clue train passed his station without stopping." - John Simmons / outlaw programmer
"Real programmers just throw a bunch of 1s and 0s at the computer to see what sticks" - Pete O'Hanlon
|
|
|
|
|
Hi Paul, thanks for your answer. The problem with checking for domain names like .com, .net, etc. is that there are too many TLDs to check for (you have to include ccTLDs, which are very commonly used by spammers). There are some other problems too; please refer to my last post.
Regards
|
|
|
|
|
Hello,
My application lets the user download some files from my web site. I want to let him download one file that only my program can recognize and work with. Is that possible?
Thanks.
Dad
|
|
|
|
|
What would be the problem, registering the file extension on client?
|
|
|
|
|
The problem is how to structure a file that only my application can read. This file holds an image and the structure of an HTML document.
Dad
|
|
|
|
|
Sounds like a serialized XML document, optionally in binary format.
The file itself is readable by any application or user, but understanding its content is application-specific.
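As a rough sketch of the binary variant: the file below starts with a signature that the application checks, followed by the image bytes and the HTML text. The signature value, the layout and the names (`PackageFile`, `Write`, `TryRead`) are all made up for illustration. This only makes the file recognizable to the application; it does not stop anyone else from reading it.

```csharp
using System;
using System.IO;
using System.Text;

public static class PackageFile
{
    // Hypothetical 4-byte signature identifying the format.
    private static readonly byte[] Magic = { (byte)'D', (byte)'P', (byte)'K', (byte)'1' };

    public static void Write(Stream output, byte[] image, string html)
    {
        // The writer is not disposed so the caller keeps ownership of the stream.
        var writer = new BinaryWriter(output, Encoding.UTF8);
        writer.Write(Magic);
        writer.Write(image.Length); // 4-byte length prefix for the image
        writer.Write(image);
        writer.Write(html);         // BinaryWriter length-prefixes strings itself
        writer.Flush();
    }

    // Returns false when the signature does not match; a truncated file
    // with a valid signature will still throw EndOfStreamException.
    public static bool TryRead(Stream input, out byte[] image, out string html)
    {
        image = null;
        html = null;
        var reader = new BinaryReader(input, Encoding.UTF8);
        byte[] header = reader.ReadBytes(Magic.Length);
        if (header.Length != Magic.Length) return false;
        for (int i = 0; i < Magic.Length; i++)
        {
            if (header[i] != Magic[i]) return false;
        }
        int imageLength = reader.ReadInt32();
        image = reader.ReadBytes(imageLength);
        html = reader.ReadString();
        return true;
    }
}
```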
|
|
|
|
|
|
I posted this a few days ago, but it got kinda lost. I want to have the final array NOT contain the part before the equals sign (result, etc...)
Hey, I was wondering if anyone can help me with trimming a string array. I am calling an API that returns this (At bottom of post).
I am then using this to convert that into a string array
string[] apiresult = new string[25];
char[] splitter = { ';' };
apiresult = result.Split(splitter);
My question is, how do I trim the string array so it just contains
success
1
Bob
etc...
instead of
result=success
userid=1
firstname=Bob
etc...
result=success;userid=1;firstname=Bob;lastname=Smith;companyname=Smith Enterprises;email=bsmith@boostplatform.com;address1=1 Smith Drive;address2=;city=Bobtown;state=Bobstate;postcode=12345;country=US;phonenumber=419-123-4567;notes=TESTING ACCOUNT!!;password=bsmith;status=Active;credit=1010.00;taxexempt=;language=;lastlogin=No Login Logged;billingcid=0;domainemails=;generalemails=;invoiceemails=;productemails=;supportemails=;
|
|
|
|
|
This should work:
string[] finalResult = new string[apiresult.Length];
for (int i = 0; i < apiresult.Length; i++)
{
    // Keep only the text after the first '='.
    finalResult[i] = apiresult[i].Substring(apiresult[i].IndexOf('=') + 1);
}
Don't be afraid of loops.
Cheers
rotter
modified on Saturday, August 16, 2008 11:41 PM
|
|
|
|
|
Hi!
Try using this:
string[] tokens = myString.Split(new char[] { ';', '=' });
foreach (string s in tokens)
{
    // do something with s (names and values alternate)
}
Don't use a fixed-size string array!
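If the field names are worth keeping, another option is to parse the pairs into a dictionary instead of a flat array. This is only a sketch: the names `ApiResultParser` and `Parse` are made up, and it assumes field names never repeat in one response.

```csharp
using System;
using System.Collections.Generic;

public static class ApiResultParser
{
    public static Dictionary<string, string> Parse(string result)
    {
        var fields = new Dictionary<string, string>();
        // RemoveEmptyEntries drops the empty piece after the trailing ';'.
        foreach (string pair in result.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries))
        {
            // Split on the first '=' only, so a value may itself contain '='.
            string[] parts = pair.Split(new[] { '=' }, 2);
            fields[parts[0]] = parts.Length > 1 ? parts[1] : string.Empty;
        }
        return fields;
    }
}
```

A value can then be looked up by name, for example `ApiResultParser.Parse(result)["firstname"]`, and empty fields such as `address2` come back as empty strings.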
|
|
|
|
|
|
I have a program that opens a small input window which prompts the user to enter a name. I want to store the name entered in the input window in a string variable in the main window. How do I do that?
Jason
|
|
|
|
|
I think you have a few options. If the main form creates the sub form and displays it modally, the easiest solution is to put the value in a public variable on the sub form, so that once it closes the main form can read it.
If it's modeless, it gets a little trickier. I would make a class containing the string and a lock object. The main form's constructor would initialize an instance of that class to some sensible default value, and the sub form's constructor would accept that instance. When the sub form wants to change the value, it locks on the lock object (for thread safety) and then sets the string. The main form can then take the same lock and read the string.
Hope this helped,
-Kenmaz
|
|
|
|
|
A not very clean solution:
Add a member to your dialog window
public class AForm : Form
{
    private string _str;
    public string Str
    {
        get { return _str; }
        set { _str = value; }
    }
    // Call this from, for example, an OK button handler.
    private void DoSomethingToStringAndCloseWindow()
    {
        _str = "Hi there";
        this.Hide(); // hiding a modal form makes ShowDialog return
    }
}
public class CallingForm : Form
{
    void AMethod()
    {
        AForm frm = new AForm();
        frm.Str = "a string";
        frm.ShowDialog(this);
        string newString = frm.Str; // read the value once the dialog has returned
        frm.Close();
        frm.Dispose();
    }
}
A cleaner solution is to share the same object instance between forms...
|
|
|
|
|
The easy and standard way to do this is to define a delegate, so that your sub window can call a method on the main window to pass back the value.
Christian Graus
|
|
|
|
|
As CG said above, use a delegate. This is an example I gave somebody else a couple of weeks ago.
Dave
BTW, in software, hope and pray is not a viable strategy. (Luc Pattyn)
Expect everything to be hard and then enjoy the things that come easy. (code-frog)
|
|
|
|
|
Dave,
I have tried the example but for some reason I get the following error:
"Error 1 'System.Windows.Window' does not contain a definition for 'SendText' and no extension method 'SendText' accepting a first argument of type 'System.Windows.Window' could be found (are you missing a using directive or an assembly reference?)"
The line that I get the error on is as follows:
req.SendText = new getArtistName(changeArtistName);
I am using Windows Presentation Foundation instead of Windows Forms.
Jason
|
|
|
|
|
This works for me (WpfApplication1 : Window1 : Window2 with textBox1):
Window1.xaml.cs
using System.Windows;
namespace WpfApplication1
{
    public delegate void UpdateText(string text);
    public partial class Window1 : Window
    {
        public Window1() { InitializeComponent(); }
        private void Window_Loaded(object sender, RoutedEventArgs e)
        {
            // Declare the variable as Window2 (not Window) so SendText is visible.
            Window2 window2 = new Window2();
            window2.SendText = new UpdateText(Update);
            window2.Show();
        }
        private void Update(string text)
        {
            this.Title = text;
        }
    }
}
Window2.xaml.cs
using System.Windows;
using System.Windows.Controls;
namespace WpfApplication1
{
    public partial class Window2 : Window
    {
        public UpdateText SendText;
        public Window2() { InitializeComponent(); }
        private void textBox1_TextChanged(object sender, TextChangedEventArgs e)
        {
            // SendText is still null if the text changes before a handler is assigned.
            if (SendText != null)
            {
                SendText(textBox1.Text);
            }
        }
    }
}
Dave
|
|
|
|
|
This worked perfectly. Thank you very much.
Jason
|
|
|
|
|
Hi,
I'm trying to serialize a simple configuration class that contains a few methods and primitive types. It also contains one very simple class:
[XmlRootAttribute(ElementName = "WildAnimal", IsNullable = false)]
class Config
{
    public int a;
    public int b;
    public SimpleClass classinst;
    public Config() { }
}
class SimpleClass
{
    public int c;
    public int d;
    public SimpleClass() { }
}
Along with Serialize/Deserialize functions inside the Config class:
public void SerializeObject(Object pObject)
{
    try
    {
        FileStream fileStream = new FileStream(s_strBaseThemeDirectory +
            @"\" + m_strName + ".xml", FileMode.CreateNew, FileAccess.ReadWrite);
        XmlSerializer xs = new XmlSerializer(typeof(Config));
        XmlTextWriter xmlTextWriter = new XmlTextWriter(fileStream, Encoding.UTF8);
        xs.Serialize(xmlTextWriter, pObject);
        fileStream = (FileStream)xmlTextWriter.BaseStream;
        fileStream.Close();
    }
    catch (Exception e)
    {
        Utilities.Trace("Error Serializing the XML: " + e.ToString());
    }
}
public Config DeserializeObject(string strName)
{
    FileStream fileStream = new FileStream("filename.xml", FileMode.Open, FileAccess.ReadWrite);
    XmlSerializer xs = new XmlSerializer(typeof(Config));
    XmlTextWriter xmlTextWriter = new XmlTextWriter(fileStream, Encoding.UTF8);
    return (Config)xs.Deserialize(fileStream);
}
Everything works fine without the SimpleClass, but when that gets thrown into the mix I run into exceptions (the SimpleClass is actually in an ArrayList). Any idea how I should get around this issue?
Thanks!
-Ken
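The post doesn't show the exception, so the following is a guess at the cause. XmlSerializer only handles public types, and it has to be told which element types an untyped ArrayList may contain. A minimal sketch under those assumptions, with the field name `items` and the helper `ConfigStore` invented for the example:

```csharp
using System;
using System.Collections;
using System.IO;
using System.Xml.Serialization;

// XmlSerializer requires public types with a public parameterless constructor.
public class SimpleClass
{
    public int c;
    public int d;
}

[XmlRoot(ElementName = "WildAnimal", IsNullable = false)]
public class Config
{
    public int a;
    public int b;

    // Declares the element type stored in the otherwise untyped ArrayList.
    [XmlArrayItem(typeof(SimpleClass))]
    public ArrayList items = new ArrayList();
}

public static class ConfigStore
{
    public static void Save(Config config, Stream output)
    {
        new XmlSerializer(typeof(Config)).Serialize(output, config);
    }

    public static Config Load(Stream input)
    {
        return (Config)new XmlSerializer(typeof(Config)).Deserialize(input);
    }
}
```

Replacing the ArrayList with a `List<SimpleClass>` would avoid the attribute altogether, since the element type is then known statically.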
|
|
|
|
|