Iron Web Analyzer

26 May 2010, CPOL, 5 min read
Analyze website content for search engine optimization (SEO) and technical problems, using IronPython
#$ironpythonanalyzer
#$NAME Title Analyzer
#$AUTHOR Iron Web Analyzer
#$CONTACT hamed.ji@gmail.com
#$URL http://IronWebAnalyzer.SourceForge.net/Analyzers/HTML_Title_Tag_Analyzer.html
#$DESCRIPTION Analyze the title tag of HTML pages for SEO
#$CONTENTTYPE text/html
#$TESTPATH http://www.coolbit-int.com
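def ChangeWord(word):
	"""Return the suggested capitalization for a single title word."""
	# Short function words that should stay lowercase inside a title.
	lowers = ["a", "as", "in", "or", "and", "by", "to", "their", "is", "your", "his", "her", "for", "then"]
	# Web addresses are written in lowercase.
	if word.startswith("www.") or word.startswith("http://") or word.startswith("https://"):
		return word.lower()
	if word.lower() in lowers:
		return word.lower()
	# Words that are entirely uppercase (acronyms such as "SEO") are left as they are.
	if word.isupper():
		return word
	return word.capitalize()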

if content.AsHtml is not None:
	L = content.AsHtml.GetElementByTag("title")

	if len(L) == 0:
		content.AddMessage("Page does not have any title tag", "e")
	elif len(L) > 1:
		content.AddMessage("Page has more than one title tag", "e")
	else: # exactly one title tag
		title = L[0]
		t = title.InnerText.strip()
		if len(t) == 0:
			content.AddMessage("The Title tag is empty", "e", title.Line, title.LinePosition)
		else:
			if len(t) > 95:
				content.AddMessage("Title length (" + str(len(t)) + ") exceeds the IE maximum title length (95)", "e",
					title.Line, title.LinePosition)
			elif len(t) > 66:
				content.AddMessage("Title length (" + str(len(t)) + ") exceeds the search engine limit (66)", "w",
					title.Line, title.LinePosition)

			BadTitles = ["notitle", "untitle", "page", "untitled page", "default", "noname", "default page",
				 "home", "homepage", "new", "newpage", "products", "about", "contact"]
			if t.lower() in BadTitles or t.lower().startswith("new page"):
				content.AddMessage("'" + t + "' is not a good title for a page", "e", title.Line, title.LinePosition)

			if t[0].islower():
				content.AddMessage("Title: '" + t + "' - Always start the title with a capital letter", "w",
					title.Line, title.LinePosition)

			# split() without arguments ignores repeated whitespace, so no empty words are produced.
			words = t.split()

			if len(words) < 3:
				content.AddMessage("'" + t + "' is only a " + str(len(words)) + "-word title. Did you use good keywords and the web site name in the title?", "w",
					title.Line, title.LinePosition)

			# Count the title words that also appear in the page body (case-sensitive substring match).
			Count = 0
			Body = content.AsHtml.Document.DocumentNode.InnerText
			for w in words:
				if w in Body:
					Count = Count + 1
			Percent = float(Count) / len(words)
			if Percent < 0.8:
				content.AddMessage("Less than 80% of title words found in body (" + str(int(round(Percent * 100))) + "%)", "w", title.Line, title.LinePosition)

			# Build the suggested capitalization and report it if it differs from the actual title.
			newTitle = " ".join([ChangeWord(w) for w in words])
			if newTitle != t:
				content.AddMessage("Title capitalization: use '" + newTitle + "' instead of '" + t + "' as your title", "w",
					title.Line, title.LinePosition)
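
The script above only runs inside Iron Web Analyzer, which supplies the `content` object. To try the title rules on their own, a minimal sketch in plain Python might look like the following. It is an illustration under stated assumptions, not part of the tool: `FakeContent`, `change_word`, `suggest_title`, `body_coverage`, and `check_length` are hypothetical names, and the `AddMessage` signature is inferred only from how the script calls it.

```python
# Hypothetical stand-alone sketch; none of these names belong to Iron Web Analyzer.
LOWERCASE_WORDS = set(["a", "as", "in", "or", "and", "by", "to", "their",
                       "is", "your", "his", "her", "for", "then"])


def change_word(word):
    """Apply the same per-word rules as the script's ChangeWord."""
    if word.startswith("www.") or word.startswith("http://"):
        return word.lower()
    if word.lower() in LOWERCASE_WORDS:
        return word.lower()
    if word.isupper():
        return word  # all-uppercase words such as "SEO" are kept
    return word.capitalize()


def suggest_title(title):
    """Return the capitalization the rules above would suggest."""
    return " ".join(change_word(w) for w in title.split())


def body_coverage(title, body):
    """Fraction of title words found in body (case-sensitive substring match)."""
    words = title.split()
    found = sum(1 for w in words if w in body)
    return float(found) / len(words)


class FakeContent(object):
    """Assumed stand-in for `content`: it just records (level, text) pairs."""

    def __init__(self):
        self.messages = []

    def AddMessage(self, text, level, line=None, position=None):
        self.messages.append((level, text))


def check_length(content, title):
    """Report titles longer than the script's 95 and 66 character limits."""
    if len(title) > 95:
        content.AddMessage("Title exceeds 95 characters", "e")
    elif len(title) > 66:
        content.AddMessage("Title exceeds 66 characters", "w")


if __name__ == "__main__":
    fake = FakeContent()
    check_length(fake, "x" * 70)
    print(fake.messages)
    print(suggest_title("the guide TO SEO"))
    print(body_coverage("Iron Web Analyzer", "Iron Web tools"))
```

Keeping the rules in small functions that take plain strings makes them easy to check without loading a page; the real analyzer would still report through `content.AddMessage` with the line and position of the title element.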
				
							


License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Web Developer
Iran (Islamic Republic of)