Articles
Copyright Tekcent Limited
Sat, 24 Aug 2013 16:10:00 Z

Block search engines from crawling the web server
https://www.tekcent.com/articles/2013/block-search-engines-from-crawling-the-web-server/

Say you have a web server hosting many websites that serve as testing and development sites for your clients. It has happened to the best of us: Google goes and indexes one of those sites, either because you forgot to add password protection or because you accidentally dropped in the LIVE robots.txt file.

As described on the Google blog (http://googleblog.blogspot.hk/2007/07/robots-exclusion-protocol-now-with-even.html), Google respects the X-Robots-Tag HTTP header.

You can add an X-Robots-Tag header with the value noindex at the IIS web server level, so that the server sends a noindex instruction with every page requested from it.

Now check that the response headers include the noindex directive.

Sat, 24 Aug 2013 16:10:00 Z

Running Code in Umbraco after Application_Start
https://www.tekcent.com/articles/2013/running-code-in-umbraco-after-application-start/

The WebActivator NuGet package allows us to execute startup code early in the ASP.NET pipeline. WebActivator was introduced with NuGet to solve the problem of running code without having to add it to Global.asax.

Running code after Application_Start

Install the WebActivatorEx package from NuGet; it provides the WebActivatorEx namespace used below.

Add a static class to your Visual Studio project.
This class will contain your startup code:

namespace MyUmbraco.Web
{
    public static class Init
    {
        public static void Run()
        {
            // Your startup code here
        }
    }
}

Next, wire up your startup code in AssemblyInfo.cs:

using System.Reflection;
using System.Runtime.InteropServices;
using MyUmbraco.Web;
using WebActivatorEx;

// Ask WebActivatorEx to call Init.Run() once Application_Start has completed
[assembly: PostApplicationStartMethod(typeof(Init), "Run")]

[Screenshot: the full AssemblyInfo.cs file]

As the name suggests, the PostApplicationStartMethod attribute runs your custom startup code after the Application_Start event.

Wed, 26 Jun 2013 18:29:00 Z

Getting Started with Umbraco and Visual Studio 2012
https://www.tekcent.com/articles/2013/getting-started-with-umbraco-and-visual-studio-2012/

NuGet is a Visual Studio extension that makes it easy to install and update third-party libraries and tools in Visual Studio. Make sure it's installed before getting started.

Create an empty ASP.NET Web Application in Visual Studio.

Right-click your empty ASP.NET project and select "Manage NuGet Packages...".

Search for and install the "Umbraco CMS" NuGet package.

Accept all the license agreements and click "Yes to All" when prompted. (It's safe to do this since we started from an empty ASP.NET project.)

You should see this screen after successful completion. NuGet installs any package dependencies, so you will see two packages installed on the confirmation screen.

[Screenshot: NuGet confirmation screen]

This step is optional. If you would like to use Umbraco with MVC and Razor, change the defaultRenderingEngine value in umbracoSettings.config from the default, WebForms:

<defaultRenderingEngine>WebForms</defaultRenderingEngine>

to Mvc:

<defaultRenderingEngine>Mvc</defaultRenderingEngine>

Build and run your project to complete the remaining installation steps. That's all!

Fri, 21 Jun 2013 18:08:00 Z

A Better Canonical Domain Name Rule for IIS
https://www.tekcent.com/articles/2013/a-better-canonical-domain-name-rule-for-iis/

The canonical domain name rule that IIS generates produces this code in the web.config file:
<rule name="Canonical Host Name" stopProcessing="true">
  <match url="(.*)" />
  <conditions>
    <add input="{HTTP_HOST}" negate="true" pattern="^www\.tekcent\.com$" />
  </conditions>
  <action type="Redirect" url="http://www.tekcent.com/{R:1}" redirectType="Permanent" />
</rule>

This rule doesn't work for us, since it redirects any request from localhost or staging.* to our production website.

Here's an improved version of the rule that works across localhost, development, staging and production:

<rule name="Canonical Host Name" stopProcessing="true">
  <match url="(.*)" />
  <conditions>
    <!-- Redirect only the bare production domain; the dot is escaped so it matches a literal "." -->
    <add input="{HTTP_HOST}" pattern="^tekcent\.com$" />
  </conditions>
  <action type="Redirect" url="http://www.tekcent.com/{R:1}" redirectType="Permanent" />
</rule>

This rule only matches the exact host "tekcent.com", so subdomains such as staging.tekcent.com, as well as localhost, are allowed through. This works for us since a canonical domain name is only required in production. An alternative is to use web.config transforms, but that's a lot more work for something that's only needed in production.

Mon, 17 Jun 2013 19:10:00 Z

Dynamic Robots Exclusion File in ASP.NET
https://www.tekcent.com/articles/2013/dynamic-robots-exclusion-file-in-aspnet/

There will be a day when your client wants to test the Facebook or Twitter sharing feature on your staging website, so off you go asking your IT administrator to "temporarily" disable the security. Boom, there it is: this is the link to the World Wide Web that triggers the search engines to index your stuff.

You can, of course, remind yourself to ask the IT administrator to re-enable the security. But people forget, and these things do slip through with unwanted consequences such as diluted search engine rankings, duplicate content issues, and real users making e-commerce purchases on staging!
Check out these Google search results that contain dev, test or staging:

[Screenshot: Development websites appearing in Google's search results]
[Screenshot: Testing websites appearing in Google's search results]
[Screenshot: Staging websites appearing in Google's search results]

Google Webmaster Tools can help you remove unwanted content from search results.

So how do you prevent search engines from indexing your non-production websites? By using a robots.txt file. This file tells search engine crawlers what they can or cannot crawl on your website. The goal is to allow the search bots to crawl everything on your production website and nothing on your non-production websites.

Create two plain (ANSI-encoded) text files, robots.test.txt and robots.live.txt. Do not use Visual Studio to create the text files, as they will be created with UTF-8 encoding and will cause problems down the line.

Edit robots.test.txt and add the following:

#DO NOT INDEX ANYTHING ON THIS WEBSITE
User-agent: *
Disallow: /

Edit robots.live.txt and add the following:

#INDEX EVERYTHING YOU CAN FIND ON THIS WEBSITE
User-agent: *
Disallow:

Did you spot the subtle difference between allowing and disallowing crawler access? "Disallow:" without the forward slash allows access to all directories. Get this wrong and your whole website will drop out of the search engine index! Take note.

Next, copy the files to your Visual Studio project.

At this point it's worth checking that the IIS URL Rewrite module is installed on the server.
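The difference between the two files, and the host-based choice that the rewrite rules make between them, can be checked outside IIS. The following is a minimal Python sketch, assuming Python 3 and its standard urllib.robotparser module; the function names crawler_allowed and robots_file_for are hypothetical, and the host pattern is the ^(www.)?tekcent.com pattern used in this article's rewrite rule conditions.

```python
import re
from urllib.robotparser import RobotFileParser

# The two files described above (lines starting with # are comments for humans).
ROBOTS_TEST = """#DO NOT INDEX ANYTHING ON THIS WEBSITE
User-agent: *
Disallow: /
"""

ROBOTS_LIVE = """#INDEX EVERYTHING YOU CAN FIND ON THIS WEBSITE
User-agent: *
Disallow:
"""

# Assumption: same pattern as the rewrite rule conditions. IIS evaluates
# condition patterns case-insensitively by default, hence re.IGNORECASE.
LIVE_HOST = re.compile(r"^(www.)?tekcent.com", re.IGNORECASE)


def crawler_allowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may crawl `url` according to the robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def robots_file_for(host: str) -> str:
    """Mimic the two rewrite rules: choose the file served as /robots.txt for a host."""
    return "robots.live.txt" if LIVE_HOST.match(host) else "robots.test.txt"


if __name__ == "__main__":
    files = {"robots.live.txt": ROBOTS_LIVE, "robots.test.txt": ROBOTS_TEST}
    for host in ("www.tekcent.com", "staging.tekcent.com", "localhost"):
        name = robots_file_for(host)
        allowed = crawler_allowed(files[name], f"http://{host}/")
        print(f"{host}: serves {name}, crawling allowed: {allowed}")
```

Because the pattern is anchored only at the start and its dots are unescaped, a host such as tekcent.com.example.org would also receive the live file. That is harmless for the hosts in this article but worth knowing if you reuse the pattern.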
Copy the following IIS rewrite rules into the system.webServer section of web.config:

<rewrite>
  <rules>
    <!-- Production host: serve the crawl-everything file -->
    <rule name="Rewrite LIVE robots.txt" enabled="true" stopProcessing="true">
      <match url="robots.txt" />
      <action type="Rewrite" url="/robots.live.txt" />
      <conditions>
        <add input="{HTTP_HOST}" pattern="^(www.)?tekcent.com" />
      </conditions>
    </rule>
    <!-- Any other host (localhost, dev, test, staging): serve the block-everything file -->
    <rule name="Rewrite TEST robots.txt" enabled="true" stopProcessing="true">
      <match url="robots.txt" />
      <action type="Rewrite" url="/robots.test.txt" />
      <conditions>
        <add input="{HTTP_HOST}" pattern="^(www.)?tekcent.com" negate="true" />
      </conditions>
    </rule>
  </rules>
</rewrite>

After deploying the changes, we can test the different versions of our website by requesting /robots.txt on each host.

Finally, this useful tool can check the validity of your robots.txt files.

Mon, 17 Jun 2013 01:56:00 Z

Republish Umbraco Content in Continuous Integration Server
https://www.tekcent.com/articles/2013/republish-umbraco-content-in-continuous-integration-server/

At Tekcent we develop Umbraco projects using a shared database, as this makes collaboration and working in teams easier, so the following steps are intended for that setup. I will cover the different collaboration options for teams in another blog post.

CI setups are a joy, since they can automate the build and deployment process for you. During development you will often make a bunch of changes to the content database. With a few lines of code you can easily automate publishing from your CI server, so you don't have to log in and click "Republish entire website".

Using Visual Studio, add a Generic Handler to your Umbraco project. The generic handler and its corresponding code files will be added to your Visual Studio project.

Edit the code file and add some code to call Umbraco's RefreshContent() method.
This will trigger the republishing routine on the server:

using System.Web;

namespace UmbracoRepublish
{
    /// <summary>
    /// Summary description for RefreshContent
    /// </summary>
    public class RefreshContent : IHttpHandler
    {
        public void ProcessRequest(HttpContext context)
        {
            // Trigger Umbraco's content republishing routine
            umbraco.library.RefreshContent();

            // Reply with a plain-text confirmation that the CI build can see
            context.Response.ContentType = "text/plain";
            context.Response.Write("Content Refreshed");
        }

        public bool IsReusable
        {
            get { return false; }
        }
    }
}

Compile and run your project. Open the link to your newly created handler in the web browser. You should see "Content Refreshed" displayed in the browser. It's worth noting that you could redirect back to the homepage at this point; I've chosen to display a message. What's important is that the RefreshContent() method is called: this is where the "magic" happens.

Add a post-build step to call your handler. We'll use a PowerShell script to do this:

(new-object net.webclient).DownloadString("http://localhost:56713/refreshcontent.ashx")

Make sure your handler is the last step in the build process.

Run the build... Remember the message that we send in the response?

Finally, the all-important green light.

Sat, 01 Jun 2013 00:13:00 Z
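The PowerShell post-build step in the republishing article calls the handler but does not check the body of the response. As a minimal sketch, assuming Python 3 is available on the build agent, the same call can verify that the handler replied with "Content Refreshed" and report failure otherwise. The URL and port are copied from the PowerShell example; the function names content_refreshed and refresh_content are hypothetical.

```python
import sys
import urllib.request

# Assumption: same handler URL as the PowerShell example; adjust host/port for your CI server.
HANDLER_URL = "http://localhost:56713/refreshcontent.ashx"
EXPECTED_MESSAGE = "Content Refreshed"


def content_refreshed(body: str) -> bool:
    """True if the body is the confirmation written by the RefreshContent handler."""
    return body.strip() == EXPECTED_MESSAGE


def refresh_content(url: str = HANDLER_URL, timeout: float = 120) -> int:
    """Call the handler and return a process exit code (0 = success, 1 = failure)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
    except OSError as error:
        # URLError, HTTPError (non-2xx status) and socket timeouts are all OSError subclasses.
        print(f"Could not refresh content via {url}: {error}", file=sys.stderr)
        return 1
    if not content_refreshed(body):
        print(f"Unexpected response from {url}: {body!r}", file=sys.stderr)
        return 1
    print(body)
    return 0
```

From the CI build step you would call sys.exit(refresh_content()), so that an unreachable handler or an unexpected response turns the build red instead of giving a false green light.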