C# Discussion Boards - CodeProject

Waleed Eissa, 17-Aug-08 1:42

Thanks for your answer, but I'm afraid that isn't possible. I'm just trying to write a spam filter for my website, so I can't keep users waiting that long. I thought about searching for all TLDs, but I don't think that's a good idea performance-wise. Do you know of any good spam filter that I can call from an ASP.NET application? I.e., I send it a string and get back something like a boolean indicating whether it's spam or not. A percentage (how likely the post is to be spam) would be even better. Thanks.

Waleed Eissa
Software Developer Sydney

Re: How to know whether a string contains a URL?

Christian Graus, 17-Aug-08 2:06

I think I just answered this in the ASP.NET forum.

There is no way of knowing if a string is a *valid* URL without posting to it. Telling whether a string is *formatted* like a URL is easy with regex, though.

Christian Graus

No longer a Microsoft MVP, but still happy to answer your questions.

Waleed Eissa, 17-Aug-08 3:34

OK, now I get your point. Actually, I don't care whether the URLs are valid or not. As I mentioned before, this is just for spam filtering, so validity isn't important.

Let me explain from the beginning (hopefully you have the time to read all this :) ).

On my website, users will be adding a lot of posts in a short time, and I want the site to be as fast and responsive as possible when they do. So I'm looking for a spam filter that runs on my own machine (as opposed to spam filters that call a web service on another website, like Akismet, which can be fine for blogs and sites that don't receive many posts). Unfortunately, I haven't been able to find such a thing so far, which is why I'm trying to write it myself, and it seems more complicated than I thought.

Well, I thought of two approaches that I can use to detect spam:

- Using naive Bayes (there's an article here on CodeProject about that, see http://www.codeproject.com/KB/recipes/BayesianCS.aspx)

- Using some rules that usually apply to spam, which is what I'm trying to do now. Naive Bayes is very effective in most cases, but I'm considering rules because of something specific to my app. Read on:

Due to the nature of my website, users wouldn't normally post any text that contains links (and I don't convert links that start with http:// into anchor tags). So it's reasonable to assume that posts containing links are most likely spam. Spammers target a site for one of two reasons:

- To raise the page rank of some website, or more precisely some web page. That doesn't work on my site, since I don't convert links into anchor tags (and even if I did, I could use rel="nofollow" as most people do), but the point is that this kind of spam contains a URL.
- To advertise something, in which case they have to leave a URL, an email address or a phone number (if you can't reach the advertiser, the ad is useless, right?).

You're probably thinking that if I don't convert links into anchor tags, they won't spam my site. I can assure you they are dumb enough to do it: I have seen many websites that don't convert links into anchors and are still heavily spammed. (Maybe that's not because spammers are dumb; it's rumored that Google picks up any link starting with http:// when crawling your site, even if it isn't in an anchor tag, though I have no idea whether that's true.)

Anyway, what I'm trying to do is find out whether the post contains any of these: a URL, an email address or a phone number. Finding an email address or phone number is fairly easy with regex, and so is finding a URL that starts with http://. But this approach has two problems: first, looking at some spam ads, I noticed that some spammers don't start their URLs with http://; and second, if they learn that I only check for http://, they will post all their URLs without it.
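A minimal C# sketch of the "fairly easy with regex" part. The class name ContactDetector is hypothetical, and the patterns are deliberately loose assumptions (in particular, treating eight or more digits with optional separators as a phone number); they are not complete definitions of URLs, email addresses or phone numbers.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: flags posts containing an http(s):// or www. link,
// an email address, or something shaped like a phone number.
public static class ContactDetector
{
    // Links with an explicit scheme or a "www." prefix.
    private static readonly Regex Url = new Regex(
        @"\b(?:https?://|www\.)\S+",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Simplified email pattern; intentionally loose, not RFC 5322.
    private static readonly Regex Email = new Regex(
        @"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Assumption: 8 or more digits, optionally separated by spaces,
    // dots, dashes or brackets, count as a phone number.
    private static readonly Regex Phone = new Regex(
        @"(?:\+?\d[\s.()-]*){8,}",
        RegexOptions.Compiled);

    public static bool ContainsContactInfo(string post)
    {
        return Url.IsMatch(post) || Email.IsMatch(post) || Phone.IsMatch(post);
    }
}
```

For example, ContactDetector.ContainsContactInfo("call 419-123-4567") returns true, while "See you at 5 pm." does not match any of the three patterns.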
Now the real complication is finding URLs that don't start with http://, because basically anything with a dot inside it can be a URL. If a user doesn't leave a space between the period that ends one sentence and the start of the next, it will be detected as a URL (and plenty of other text contains a dot between two strings without being a URL). So I thought I could use the TLDs (generic and country-code), but then the regex would be far too long, which would most probably hurt performance; and even if you don't use a regex (say, a loop that checks every TLD), performance will most probably suffer too. To make things even worse, some completely legitimate text, like asp.net, will be detected as a URL (and it is even a valid URL :) ).

This is getting more complicated than it needs to be. I think I'll either ignore URLs that have only two parts (like asp.net) or use naive Bayes.
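One way to avoid a huge regex alternation is to extract dotted tokens once and look up only the last label of each in a hash set, which costs one lookup per token no matter how many TLDs are listed. A sketch, assuming a hypothetical BareUrlDetector class and an illustrative, incomplete TLD list (a real filter would load the full generic and country-code list). The minLabels parameter implements the idea of ignoring two-part names such as asp.net.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: detects scheme-less URLs such as "cheap-pills.co.uk"
// by checking the last label of each dotted token against a set of TLDs.
public static class BareUrlDetector
{
    // Assumption: a small illustrative subset of TLDs, not the full list.
    private static readonly HashSet<string> Tlds = new HashSet<string>(
        new[] { "com", "net", "org", "edu", "info", "biz", "uk", "au", "ru", "cn", "de" },
        StringComparer.OrdinalIgnoreCase);

    // Candidate host names: two or more labels separated by dots.
    private static readonly Regex Host = new Regex(
        @"\b[a-z0-9-]+(?:\.[a-z0-9-]+)+\b",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // minLabels = 3 ignores two-part names like "asp.net".
    public static bool ContainsBareUrl(string post, int minLabels = 2)
    {
        foreach (Match m in Host.Matches(post))
        {
            string[] labels = m.Value.Split('.');
            // "ended.Then" is rejected because "Then" is not a known TLD.
            if (labels.Length >= minLabels && Tlds.Contains(labels[labels.Length - 1]))
                return true;
        }
        return false;
    }
}
```

Because the TLD check happens on the last label only, a missing space after a full stop ("ended.Then") is not flagged unless the following word happens to be a TLD.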

And by the way, the reason I wanted the spam filter to return a percentage is that some posts are guaranteed to be spam: the filter keeps a database of spam ads, and when it receives a post it hashes it and compares it to the stored spam ads with the same hash (SpamAssassin does this, I believe). In that case the filter should return 100%; if the post is not guaranteed to be spam, it returns a percentage below 100%, depending on how likely the post is to be spam. In my app, I won't save posts that are 100% spam to the database; posts scoring below 100% will be saved, but they won't be visible to anyone except their authors until a moderator checks them manually.
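A sketch of the scoring scheme described above, assuming a hypothetical SpamScorer class. Known spam is matched by a SHA-256 hash of the normalized post text (exact matches only; this is not meant to reproduce how SpamAssassin works), and anything else gets a score from a caller-supplied fallback function, capped below 100%.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical sketch: exact-match lookup of known spam by hash, falling back
// to a probability from some other scorer (rules, naive Bayes, ...).
public class SpamScorer
{
    private readonly HashSet<string> _knownSpamHashes = new HashSet<string>();
    private readonly Func<string, double> _fallbackScore; // expected to return 0.0..1.0

    public SpamScorer(Func<string, double> fallbackScore)
    {
        _fallbackScore = fallbackScore;
    }

    // Normalize so trivial changes (case, extra whitespace) don't change the hash.
    private static string Normalize(string post)
    {
        return Regex.Replace(post.Trim().ToLowerInvariant(), @"\s+", " ");
    }

    private static string Hash(string post)
    {
        using (SHA256 sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(Normalize(post)));
            return Convert.ToBase64String(digest);
        }
    }

    public void AddKnownSpam(string post)
    {
        _knownSpamHashes.Add(Hash(post));
    }

    // 100 for a known spam post, otherwise the fallback score as a percentage.
    public double SpamPercentage(string post)
    {
        if (_knownSpamHashes.Contains(Hash(post)))
            return 100.0;
        // Cap below 100 so only exact matches count as certain spam.
        return Math.Min(99.0, _fallbackScore(post) * 100.0);
    }
}
```

A post returning 100 would be discarded; anything lower would be saved but held for a moderator, as described in the post.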

Sorry for making this so long; I just wanted to explain why I'm doing this...

Have a great day ...


Manas Bhardwaj, 18-Aug-08 6:15

Waleed Eissa wrote:
Using naive bayesian

But again, the naive Bayes algorithm has no intelligence of its own.
It has to be trained properly to produce results. The more you train it, the better the results it will yield.
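To illustrate the training point, here is a minimal multinomial naive Bayes sketch (a hypothetical NaiveBayesSpamFilter class, not the code from the CodeProject article). It counts words in labelled posts and, with Laplace smoothing and log-probabilities, turns those counts into a spam probability. Until it has seen both spam and non-spam examples it cannot score anything.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Minimal multinomial naive Bayes sketch: must be trained on labelled posts
// before its scores mean anything.
public class NaiveBayesSpamFilter
{
    private readonly Dictionary<string, int> _spamCounts = new Dictionary<string, int>();
    private readonly Dictionary<string, int> _hamCounts = new Dictionary<string, int>();
    private int _spamWords, _hamWords, _spamDocs, _hamDocs;

    private static IEnumerable<string> Tokenize(string text)
    {
        return Regex.Matches(text.ToLowerInvariant(), @"[a-z0-9']+")
                    .Cast<Match>()
                    .Select(m => m.Value);
    }

    public void Train(string post, bool isSpam)
    {
        var counts = isSpam ? _spamCounts : _hamCounts;
        foreach (string w in Tokenize(post))
        {
            int c;
            counts.TryGetValue(w, out c);
            counts[w] = c + 1;
            if (isSpam) _spamWords++; else _hamWords++;
        }
        if (isSpam) _spamDocs++; else _hamDocs++;
    }

    // Returns P(spam | post) in 0..1.
    public double SpamProbability(string post)
    {
        if (_spamDocs == 0 || _hamDocs == 0)
            throw new InvalidOperationException("Train with both spam and non-spam posts first.");

        int vocab = _spamCounts.Keys.Union(_hamCounts.Keys).Count();
        double logSpam = Math.Log((double)_spamDocs / (_spamDocs + _hamDocs));
        double logHam = Math.Log((double)_hamDocs / (_spamDocs + _hamDocs));

        foreach (string w in Tokenize(post))
        {
            int s, h;
            _spamCounts.TryGetValue(w, out s);
            _hamCounts.TryGetValue(w, out h);
            // Laplace (add-one) smoothing so unseen words don't zero the product.
            logSpam += Math.Log((s + 1.0) / (_spamWords + vocab));
            logHam += Math.Log((h + 1.0) / (_hamWords + vocab));
        }
        // Convert the two log scores back into a probability.
        return 1.0 / (1.0 + Math.Exp(logHam - logSpam));
    }
}
```

Multiplying SpamProbability by 100 gives the kind of percentage the original poster asked for; its quality depends entirely on how much labelled data it has been trained on.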

Please remember to rate helpful or unhelpful answers, it lets us and people reading the forums know if our answers are any good.

Waleed Eissa, 18-Aug-08 13:49

Actually, I'm not especially interested in naive Bayes or any other particular algorithm; I'm just trying to filter out spam. Can you suggest a better way of doing this? And if you know of a good spam filter that I can use in my application, that would be even better.

Regards


Paul Conrad, 17-Aug-08 8:12

Use regular expressions, as you have already been told. Instead of checking for something like http://, why not just check for things like .com, .net, .edu, etc.?
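A sketch of this suggestion as a single C# regex, using a hypothetical TldRegex class and an illustrative list of four TLDs. Every TLD you want to cover becomes another alternative in the pattern, which is the length concern raised in this thread; note that it also matches legitimate text such as asp.net.

```csharp
using System.Text.RegularExpressions;

// Sketch: match a dotted name ending in one of a few listed TLDs.
// The TLD list is illustrative only; ccTLDs would each need adding.
public static class TldRegex
{
    private static readonly Regex Pattern = new Regex(
        @"\b[a-z0-9-]+(?:\.[a-z0-9-]+)*\.(?:com|net|edu|org)\b",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static bool LooksLikeUrl(string text)
    {
        return Pattern.IsMatch(text);
    }
}
```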

"The clue train passed his station without stopping." - John Simmons / outlaw programmer

"Real programmers just throw a bunch of 1s and 0s at the computer to see what sticks" - Pete O'Hanlon

Waleed Eissa, 17-Aug-08 17:31

Hi Paul, thanks for your answer. The problem with checking for domain names like .com, .net, etc. is that there are too many TLDs to check for (you also have to check ccTLDs, which spammers use very commonly), along with some other problems; please refer to my last post.

Regards


New Extension

hadad, 16-Aug-08 22:15

Hello,
My application lets users download some files from my web site. I want them to be able to download a file that only my program can recognize and work with. Is that possible?
Thanks.

Dad

Wendelius, 16-Aug-08 23:12

What would be the problem? Registering the file extension on the client?

hadad, 16-Aug-08 23:19

The problem is how to structure a file that only my application can read. This file holds an image and the structure of an HTML document.

Dad

Wendelius, 16-Aug-08 23:30

Sounds like a serialized XML document, optionally in binary format.

The file itself is readable by any application or user, but understanding its content is application-specific.
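A sketch of the serialized-XML idea, assuming hypothetical Package and PackageFile types; the root element name and namespace are placeholders you would replace with your own. XmlSerializer stores the image as base64 text, and Load rejects files whose root element doesn't match. This doesn't stop other programs from opening the file; it only gives it a layout your application understands.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical document type: the HTML structure plus the image bytes.
// XmlSerializer writes byte[] as base64 text inside the XML.
[XmlRoot("MyAppPackage", Namespace = "urn:example:myapp:package")] // placeholder name/namespace
public class Package
{
    public string Html { get; set; }
    public byte[] Image { get; set; }
}

public static class PackageFile
{
    private static readonly XmlSerializer Serializer = new XmlSerializer(typeof(Package));

    public static void Save(Package package, string path)
    {
        using (FileStream stream = File.Create(path))
        {
            Serializer.Serialize(stream, package);
        }
    }

    public static Package Load(string path)
    {
        using (FileStream stream = File.OpenRead(path))
        {
            // Throws InvalidOperationException if the root element or namespace doesn't match.
            return (Package)Serializer.Deserialize(stream);
        }
    }
}
```

For the "optionally in binary format" variant, the same serializer can write through a compression stream instead of the raw FileStream.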

Communicating between windows

Giorgi Dalakishvili, 17-Aug-08 0:26

Have a look at this: Registering the Extension

Giorgi Dalakishvili

#region signature
my articles
#endregion

Trim String Array

Arcdigital, 16-Aug-08 15:04

I posted this a few days ago, but it got kind of lost. I want the final array NOT to contain the part before the equals sign (result, etc.).

Hey, I was wondering if anyone can help me with trimming a string array. I am calling an API that returns this (at the bottom of the post).

I am then using this to convert it into a string array:

string[] apiresult = new string[25];
char[] splitter = { ';' };
apiresult = result.Split(splitter);

My question is, how do I trim the string array so it just contains

success
1
Bob
etc...

instead of

result=success
userid=1
firstname=Bob
etc...

result=success;userid=1;firstname=Bob;lastname=Smith;companyname=Smith Enterprises;email=bsmith@boostplatform.com;address1=1 Smith Drive;address2=;city=Bobtown;state=Bobstate;postcode=12345;country=US;phonenumber=419-123-4567;notes=TESTING ACCOUNT!!;password=bsmith;status=Active;credit=1010.00;taxexempt=;language=;lastlogin=No Login Logged;billingcid=0;domainemails=;generalemails=;invoiceemails=;productemails=;supportemails=;
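If you also want to keep the field names, a sketch using a hypothetical ApiResultParser class parses the string into a dictionary, splitting each entry at its first '=', so you can look values up by name (fields["firstname"]) instead of relying on array positions. It assumes values never contain ';'.

```csharp
using System;
using System.Collections.Generic;

// Sketch: turn "name=value;name=value;..." into a name -> value dictionary.
public static class ApiResultParser
{
    public static Dictionary<string, string> Parse(string result)
    {
        var fields = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        // RemoveEmptyEntries drops the empty item after the trailing ';'.
        foreach (string pair in result.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries))
        {
            int eq = pair.IndexOf('=');
            if (eq < 0) continue; // skip anything that is not name=value
            // Split at the first '=' only, so a value may itself contain '='.
            fields[pair.Substring(0, eq)] = pair.Substring(eq + 1);
        }
        return fields;
    }
}
```

Empty values such as address2= come back as empty strings rather than being dropped.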

Re: Trim String Array

Dr. Emmett Brown, 16-Aug-08 15:15

This should work:

string[] finalResult = new string[apiresult.Length];
for (int i = 0; i < apiresult.Length; i++)
{
    // Keep only the text after the first '=' (the whole entry if there is no '=')
    finalResult[i] = apiresult[i].Substring(apiresult[i].IndexOf('=') + 1);
}

Don't be afraid of loops. :)

Cheers

rotter


Re: Trim String Array

lisan_al_ghaib, 17-Aug-08 0:06

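Hi!
Try using this:
string[] tokens = myString.Split(new char[] { ';', '=' });
// Tokens alternate name, value, name, value, ... so the values are at the odd indices
// (this breaks if a value itself contains ';' or '=')
for (int i = 1; i < tokens.Length; i += 2)
{
    string value = tokens[i];
    // do something with value
}

Don't use a fixed-size string array! Split already returns an array of the right size.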

Re: Trim String Array

PIEBALDconsult, 17-Aug-08 4:07

It didn't get lost; you just didn't look for it:

http://www.codeproject.com/script/Forums/View.aspx?fid=1649&msg=2677761

Jason Coggins, 16-Aug-08 13:08

I have a program that opens a small input window which prompts the user to enter a name. I want to store the name entered in the input window in a string variable in the main window. How do I do that?

Jason

Ken Mazaika, 16-Aug-08 13:19

I think you have a few options. If the main form creates the subform and displays it modally, the easiest solution is to put the value in a public property, so it can be read once the dialog closes.

If it's modeless, it gets a little more tricky. I would make a class containing the string and an object to lock on. The main form would initialize an instance to some sensible default and pass it to the subform's constructor. When the subform wants to change the value, it locks on the lock object (for thread safety) and then sets the string; the main form locks on the same object to read it.
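A sketch of the shared-object approach, assuming a hypothetical SharedText class that both forms hold a reference to (for example, passed to the subform's constructor). The lock only matters if the two sides can run on different threads; two forms on the same UI thread would not strictly need it.

```csharp
// Hypothetical holder shared between a main form and a modeless subform.
public class SharedText
{
    private readonly object _sync = new object(); // private lock object
    private string _value;

    public SharedText(string initialValue)
    {
        _value = initialValue;
    }

    // Every read and write goes through the same lock.
    public string Value
    {
        get { lock (_sync) { return _value; } }
        set { lock (_sync) { _value = value; } }
    }
}
```

The main form would create one instance with a default value, hand it to the subform, and read Value whenever it needs the latest text.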

Hope this helped,

-Kenmaz

lisan_al_ghaib, 16-Aug-08 13:33

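A not very clean solution:
add a property to your dialog window.

using System.Windows.Forms;

public class AForm : Form
{
    private string _str;
    public string Str
    {
        get { return _str; }
        set { _str = value; }
    }

    // Call this when the user is done (e.g. from an OK button's Click handler)
    private void DoSomethingToStringAndCloseWindow()
    {
        _str = "Hi there";
        // On a modal form this hides it and makes ShowDialog return; the form is not disposed
        this.DialogResult = DialogResult.OK;
    }
}

public class CallingForm : Form
{
    void AMethod()
    {
        // A modal dialog is hidden, not disposed, when it closes, so dispose it ourselves
        using (AForm frm = new AForm())
        {
            frm.Str = "a string";
            frm.ShowDialog(this);
            string newString = frm.Str; // still readable after ShowDialog returns
        }
    }
}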

A cleaner solution is to share the same object instance between forms...

Christian Graus, 16-Aug-08 14:05

The easy and standard way to do this is to define a delegate, so that your subwindow can call a method on the main window to pass back the value.


DaveyM69, 16-Aug-08 23:46

As CG said above, use a delegate. This is an example I gave somebody else a couple of weeks ago.

Dave
BTW, in software, hope and pray is not a viable strategy. (Luc Pattyn)
Expect everything to be hard and then enjoy the things that come easy. (code-frog)

Jason Coggins, 17-Aug-08 10:53

Dave,

I have tried the example, but for some reason I get the following error:

"Error 1 'System.Windows.Window' does not contain a definition for 'SendText' and no extension method 'SendText' accepting a first argument of type 'System.Windows.Window' could be found (are you missing a using directive or an assembly reference?)"

The line that I get the error on is as follows:

req.SendText = new getArtistName(changeArtistName);

I am using Windows Presentation Foundation instead of Windows Forms.

Jason

DaveyM69, 17-Aug-08 23:10

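The error means req is declared as Window rather than as your own window class, so the compiler can't see SendText; declare it with your derived window type. This works for me (project WpfApplication1 with windows Window1 and Window2, where Window2 has a TextBox named textBox1):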
Window1.xaml.cs

using System.Windows;
namespace WpfApplication1
{
    public delegate void UpdateText(string text);
    public partial class Window1 : Window
    {
        public Window1() { InitializeComponent(); }
        private void Window_Loaded(object sender, RoutedEventArgs e)
        {
            Window2 window2 = new Window2();
            window2.SendText = new UpdateText(Update);
            window2.Show();
        }
        private void Update(string text) 
        {
            this.Title = text; 
        }
    }
}

Window2.xaml.cs

using System.Windows;
using System.Windows.Controls;
namespace WpfApplication1
{
    public partial class Window2 : Window
    {
        public UpdateText SendText;
        public Window2() { InitializeComponent(); }
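        private void textBox1_TextChanged(object sender, TextChangedEventArgs e)
        {
            // SendText may not be assigned yet (e.g. if TextChanged fires during InitializeComponent)
            if (SendText != null)
                SendText(textBox1.Text);
        }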
    }
}
