Sydney Umbraco Meet-Up - Post CodeGarden Recap

by ElijahGlover 5. July 2011 06:01

Peter Gregory is back in Sydney once again for Level 1 and 2 certifications, so we thought it would be a great time to host a meet-up and fill in those who couldn’t attend CodeGarden.

Agenda

  • Shannon Deminick from the Umbraco HQ on V5
    Shannon will demo v5 and talk about the progress being made on v5
  • Aaron Powell from Readify on v5 hive and data
    Aaron will give an intro to working with data through Hive
  • Peter Gregory from the Umbraco HQ on v4 future
    Is it the end? - Find out about future plans for v4
  • Peter Gregory from the Umbraco HQ on Deli
    Find out how to commercialise your packages for Umbraco
  • Shannon Deminick from the Umbraco HQ on v5 packages
    Shannon will give an intro to building packages for V5 and a Jump start

When and where

Wednesday 20th July, 6pm – onwards.
TheFARM Digital, Suite 101, 4 – 14 Buckingham St Surry Hills
Please register your interest on (http://our.umbraco.org/events/sydney-umbraco-meet-up) so we can ensure that we have plenty of beer to go around.

Who is welcome?

Everyone!

  • If you’re taking the Level 1 and/ or Level 2 that week then come on down.
  • If you’re currently using Umbraco then come on down. 
  • If you’re just plain interested in what the hell this Umbraco thing is then come on down!


See you there!

Categories: Umbraco

Searching Umbraco using Razor and Examine

by Shannon Deminick 15. March 2011 05:51

Since Razor is really just C#, it’s super simple to run a search in Umbraco using Razor and Examine. In MVC the actual searching should be left to the controller, which hands the search results to your view, but in Umbraco 4.6+ Razor is used for macros, which actually ‘do stuff’. Here’s how incredibly simple it is to do a search:

@using Examine;
@* Get the search term from query string *@
@{ var searchTerm = Request.QueryString["search"]; }
<ul class="search-results">
    @foreach (var result in ExamineManager.Instance.Search(searchTerm, true))
    {
        <li>
            <span>@result.Score</span>
            <a href="@umbraco.library.NiceUrl(result.Id)">
                @result.Fields["nodeName"]
            </a>
        </li>
    }
</ul>

That’s it! Pretty darn easy.

And for all you sceptics who think there’s too much configuration involved to set up Examine: configuring Examine requires 3 lines of configuration. Yes, it’s true, 3 lines, that’s it. Here’s the bare minimum setup:

1. Create an indexer under the ExamineIndexProviders section:

<add name="MyIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"/>

2. Create a searcher under the ExamineSearchProviders section:

<add name="MySearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"/>

3. Create an index set under the ExamineLuceneIndexSets config section:

<IndexSet SetName="MyIndexSet" IndexPath="~/App_Data/TEMP/MyIndex" />

This will index all of your data in Umbraco and allow you to search against all of it. If you want to search on specific subsets, you can use the Fluent API, and if you want to modify your index, there’s much more you can do with the config.
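To give a feel for searching a subset, here is a minimal sketch of a Fluent API query. It assumes the "MySearcher" provider configured above; the "faqItem" document type alias, the FaqSearch class and the PrintMatches method are hypothetical names used only for illustration. It needs a reference to the Examine assembly, so treat it as a starting point rather than a definitive implementation.

```csharp
using System;
using Examine;
using Examine.SearchCriteria;

public static class FaqSearch
{
    // Hypothetical helper: "faqItem" is an assumed document type alias,
    // "MySearcher" is the searcher configured in the steps above.
    public static void PrintMatches(string term)
    {
        var searcher = ExamineManager.Instance.SearchProviderCollection["MySearcher"];

        // Only nodes of type "faqItem" whose nodeName field contains the term
        var criteria = searcher.CreateSearchCriteria();
        var query = criteria
            .NodeTypeAlias("faqItem")
            .And()
            .Field("nodeName", term)
            .Compile();

        foreach (var result in searcher.Search(query))
        {
            Console.WriteLine("{0} (score {1})", result.Id, result.Score);
        }
    }
}
```

Calling FaqSearch.PrintMatches("pricing") from a macro or user control would list the ids of matching FAQ nodes.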

With Examine the sky is the limit: you can have anything from an incredibly simple index and search mechanism up to an incredibly complex index with event handlers, etc… and a very complex search with fuzzy logic, proximity searches, etc… And no matter what flavour you choose, it will be VERY fast, regardless of how much data you’re searching against.

I also STRONGLY advise you to use the latest release on CodePlex: http://examine.codeplex.com/releases/view/50781 . There will also be version 1.1 coming out very soon.

Enjoy!

Categories: Umbraco | Examine | .Net

Linq2Umbraco driver for LINQPad

by SaschaWolter 24. February 2011 06:55

Having used Linq2Umbraco for a while now to retrieve Umbraco data in custom DALs, I really needed a way to build queries and execute them against the Umbraco web site’s data without compiling the application each time and maybe even firing up the debugger. Enter LINQPad by Joseph Albahari, a fantastic program which has pretty much replaced SQL Server Management Studio on my workstation for querying databases. Although you can query the Umbraco database of your web site, LINQPad can’t natively operate on an UmbracoDataContext, which is what Linq2Umbraco uses when you export the Umbraco document types to .Net and which you will be using when querying the data in the DAL. So it would be really helpful to use the UmbracoDataContext directly in LINQPad, as you could then use these queries 1:1 in your code. Thankfully LINQPad can be extended to work with custom DataContexts; all that needs to be done is to write a custom data context driver as described here. The documentation is exceptionally good and after a short while the first Linq2Umbraco driver for LINQPad was ready and is now free to download:

Download TheFarm Linq2Umbraco Driver for LINQPad 1.0 (16.08kb)

The driver has been compiled against LINQPad 2.31 using .Net 3.5, so it should work fine on both current versions of LINQPad (.Net 3.5 and 4). The umbraco.Linq.Core.dll, which the driver needs as it contains the UmbracoDataContext, will be dynamically loaded. The driver assumes that it exists in the same directory as the dll which contains the custom Linq2Umbraco DataContext. The dynamic loading has 2 advantages:

  • The size of the driver is only 16kb! No need to pack all the dlls in there as they are all present in your Umbraco installation anyway. And this way the driver isn’t bound to one specific Umbraco version.
  • LINQPad requires its driver assemblies to be strongly signed, and .Net requires every referenced assembly to be strongly signed as well. However, umbraco.Linq.Core is not strongly signed, and although I managed to strongly sign the assembly by using a little tool, it wreaked havoc on the Umbraco installation, so dynamically loading the assembly magically resolves all that headache!

Following is a quick demonstration of the driver in use. I’ve created a small test Umbraco installation with a couple of document types and content nodes:

demo_doctypes

demo_content

The doc types contain just a couple of standard fields like text string and date, plus one Ultimate picker instance on the Products page (checkbox list, saved as comma delimited string). Now I’ll export the document types to .Net by right clicking on ‘Document Types’; the popup window will be populated like this:

demo_export

It doesn’t matter if you choose POCO with or without abstractions. It is also very advisable to install Matt Brailsford’s fantastic AutoExport2DotNet package, which will perform an export to .Net every time you modify the document types.

After downloading and renaming the UmbracoDataContext files I’ll include them in the custom DAL like so:

demo_solution_explorer

As you can see at the moment the DAL project doesn’t contain anything else but the two automatically generated files plus the necessary link to umbraco.Linq.Core. The Umbraco 4.6.1 test installation is showing up right next to it, I need the /App_Data/umbraco.config file from it later.

Now let’s start up LINQPad and add a connection using the custom driver:

demo_add_connecton

Clicking on View more drivers… will reveal the following dialog, where Browse… lets you select the custom LINQPad driver:

demo_add_driver

Opening the driver will shortly lead to the message ‘Driver successfully loaded’, the driver will now appear in the list of drivers from above:

demo_driver

After hitting ‘Next’ the following dialog will appear:

demo_dialog

Here the properties for the Linq2Umbraco connection are set.

  • Path to custom assembly: This is the assembly that contains the exported UmbracoDataContext, in this case it’s TheFarm.DAL.dll
  • Full name of custom type: A popup window will present you a list of all UmbracoDataContexts in the assembly, in 99% of the cases there will only be one and in this case it’s the TheFarmDemoDataContext as specified above
  • /Appdata/umbraco.config: The umbraco.config file of the Umbraco installation found in /App_Data.
  • Surpress lazy secondary queries: UmbracoDataContext uses the AssociationTree class to list children, e.g. the class Products will contain a list of ProductCategories. When querying Products you will most likely not want to generate a list of all ProductCategories for each Product, Surpress lazy secondary queries will tell the program to stop evaluating these expressions as well.

After hitting OK the connection will be added to LINQPad and you can start querying the custom UmbracoDataContext:

demo_query1

Notice here the output of TopProducts (Ultimate picker), Introduction (Richtext editor) and PricesValidUntil (date). ProductCategorys and Productss are the above-mentioned children and grandchildren, which are suppressed here with ‘Surpress lazy secondary queries’. Another example:

demo_query2

You can also easily create custom Linq Extension methods e.g. in the DAL and use them in your queries like so:

demo_query3

[Credits for the ToCSV extension go to Muhammad Mosa.]
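For readers who cannot see the screenshot, here is a minimal sketch of what such an extension method could look like. This is not the original ToCSV implementation credited above; the CsvExtensions class and the ToCsv signature are assumptions made purely for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CsvExtensions
{
    // Hypothetical helper (not the original ToCSV extension):
    // joins the string form of each item with the given separator.
    public static string ToCsv<T>(this IEnumerable<T> source, string separator = ",")
    {
        if (source == null) throw new ArgumentNullException("source");
        return string.Join(separator, source.Select(item => Convert.ToString(item)).ToArray());
    }
}

public static class Program
{
    public static void Main()
    {
        var ids = new[] { 1062, 1065, 1070 };

        // Any LINQ query result can be piped straight into the extension
        Console.WriteLine(ids.Where(id => id > 1062).ToCsv()); // prints "1065,1070"
    }
}
```

Because the method lives in a static class in the DAL assembly, it becomes available in LINQPad as soon as that assembly is referenced by the connection.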

TheFarm Linq2UmbracoDataContextDriver 1.0.lpx (16.08 kb)

Categories: .Net | Umbraco

Creating code-behind files for Umbraco templates

by SaschaWolter 28. January 2011 12:04

I’ve always had this idea in my head that one of the downsides of using Umbraco when coming from standard ASP.Net web applications was the missing code-behind files. You know, when you create a new web application and add an .aspx page to it, it conveniently comes with a .cs and a designer.cs file. Most of the time I would even let the code-behind file inherit from my own custom Page/MasterPage implementation, e.g. a SecuredPage that comes with various properties and methods to handle authentication. Although Umbraco uses regular masterpages (if you haven’t turned it off in the web.config), all you get in the backoffice is the actual page template. Now, don’t get me wrong: I love the way Umbraco lets you edit all aspects of your site via the backend and gives you the utmost flexibility and 100% control over the output, presented in a refreshingly simple manner. Yet sometimes you need a bit more, and it’s just another clear plus for Umbraco that you are able to do the following without ever having to modify the core.

The 'aha' moment that it is actually quite easy to add code-behind files to Umbraco masterpages came to me when I had to port a fairly big ASP.Net website to Umbraco. The website had grown organically over the years with lots of custom templates, user controls, etc. The site also had multi-language support, all of which was handled in the code-behind files of the pages. The goal was to get it over to Umbraco as quickly as possible, then rework the functionality bit by bit. So I started by creating a new Umbraco site and ‘wrapped’ it in a web application project in Visual Studio.

 

1-28-2011 5-00-55 PM

[Please refer to the comments below to find more information on how to set this up in Visual Studio.]

After adding a couple of document types and templates in Umbraco the masterpages folder looks something like this:

1-28-2011 5-28-34 PM

The Root.master file is the main master page; Page1.master and Page2.master are nested master pages in Umbraco. I’ve included all three of them in the solution. Now it’s time to create the code-behind files: right-click on the masterpages folder, add three C# classes and name them Root.master.cs, Page1.master.cs and Page2.master.cs. The result should be something like this:

1-28-2011 5-29-38 PM

Visual Studio automatically groups them together, fantastic. Yet they are not really hooked up yet, VS does the grouping just based on file names. The master directive on Root.master currently looks like this:

<%@ Master Language="C#" MasterPageFile="~/umbraco/masterpages/default.master" AutoEventWireup="true" %>

To hook up the cs file we need to add the CodeBehind and Inherits attributes like so:

<%@ Master Language="C#" MasterPageFile="~/umbraco/masterpages/default.master" AutoEventWireup="true" CodeBehind="Root.master.cs" Inherits="Umbraco_4._6._1.masterpages.Root"%>

You should get an error at this point as the compiler complains that Root is not convertible to System.Web.UI.MasterPage, so we need to fix this in the cs file as well by making the class partial (necessary if you want to add designer files later) and inheriting from System.Web.UI.MasterPage. An empty Page_Load method can’t hurt either:

using System;

// the namespace must match the one used in the Inherits attribute above
namespace Umbraco_4._6._1.masterpages
{
    public partial class Root : System.Web.UI.MasterPage
    {
        protected void Page_Load(object sender, EventArgs e)
        {
        }
    }
}

You should now be able to switch between both files by pressing F7 in Visual Studio. Let’s try to add a Property and reference that from the template:

public string Message { get; set; }

protected void Page_Load(object sender, EventArgs e)
{
    Message = "All the best from your code-behind file!! :)";
}

and something like this on the template:

<div> <%= Message %> </div>

Now we just need to compile the project and navigate to a content page that uses the Root template to see the message.

 

Adding designer files

[As Simon Dingley pointed out below there is an even easier way to create the designer files: right-click on the master.aspx page and select "Convert to web application", which will create the .designer file for the selected item.]

We can also add a designer file to the duo to make things even better. After adding Root.master.designer.cs, Page1.master.designer.cs and Page2.master.designer.cs the solution looks like this:

1-28-2011 5-49-22 PM

Visual Studio is now rightfully complaining that it has duplicate definitions for the classes and even suggests adding the partial keyword, which we will quickly do. Once that is all working and compiling nicely, we need to give Visual Studio control over the designer files. That is easily accomplished by slightly modifying each .master file (e.g. by adding a single space to an empty line) and saving it; VS will do the rest for you. The most important thing this does is reference all the controls you add to the template so they are available for use in the code-behind file.

Now let’s try to modify the message value from the code-behind of Page1 by adding

protected void Page_Load(object sender, EventArgs e)
{
    ((Root) Master).Message = "Hello from the nested master page!";
}

to it. Browsing to any Umbraco page that uses the Page1 template will now show the new message.

Categories: .Net | Umbraco

uSnapshot… coming soon!

by Aaron Powell 21. January 2011 11:08

A while ago Shannon and I blogged about a project we were working on called Snapshot which was essentially a way to export flat ASP.Net websites from an Umbraco install.

Since then, TheFARM has launched a few websites using development versions of Snapshot and we’ve been working towards getting a version of Snapshot ready that anyone could purchase and use.
There have been some hurdles along the way such as legality around ownership, how to release it as a commercial product and my moving on from TheFARM. But finally … this has culminated in something exciting…

uSnapshot is coming!

That’s right sports fans, we’ve renamed Snapshot to uSnapshot and we’re going to be working on finishing v1.

But we realised that we don’t know how everyone wants to use uSnapshot, so we’re throwing open the doors. Today we’ve launched the uSnapshot Beta Program!

We’re looking for people who are interested in helping us beta test uSnapshot as we work towards a v1 release, help us find bugs and ultimately work on a final feature set for the release.

So if you think that uSnapshot may be something that you (or your company) may be interested in please sign up.

Have a look at the uSnapshot site for full details and benefits regarding the beta program.

Categories: Umbraco | uSnapshot

Presenting a new Umbraco data type: Embedded Content

by SaschaWolter 20. January 2011 14:51

 

Are you looking for a way to create content on a node that uses the built-in Umbraco data types, yet lets the content editor decide how many fields they actually need? Do you want your content editors to be able to order the fields as they see fit? Are you tired of having to create ‘data’ sub nodes just so you have the data in Xml format for easy transformation? Or are you just looking for a version of the Repeatable Custom Content that works with Umbraco 4.5+?

 

Then it is time to open the curtain for another Umbraco data type grown here at the FARM called Embedded Content (version 1). In short it allows you to add content to a node with pretty much the same outcome as if you were using data sub nodes. For example see this implementation of specifications for a product:

- Product

  - Product specifications

    - Package measurements (value: 200cm x 15cm x 150cm)

    - Package weight (value: 5.3 kg)

    - Available colors (value: yellow, red, blue)

Pretty much the same can be achieved with the Embedded Content data type by creating a new instance of it, adding 2 properties ‘name’ and ‘value’ to it and adding it to the product node. You will then be able to add as many ‘name-value-pairs’ as you like to the control (as you are able to create as many sub nodes as you want) and sort them to your liking (again same as with the sub nodes). Yet it is all embedded in the product node and it gets all saved as Xml in the backend. You can use a range of basic Umbraco data types for the Embedded Content properties, at the moment these are

  • Textstring
  • Textbox multiple
  • True/false
  • Content picker
  • Media picker
  • Simple editor
  • Date picker

After installing the package you will first have to create a new data type with a name of your choice, and then set it to use 'Embedded Content'. After saving you will be able to add custom properties to the data type; you can also edit, delete or re-order them. In the following example I have created a new Embedded Content data type with 5 properties:

EmbeddedContent-DatatypeList

Image 1: An Embedded Content data type instance with 5 properties.

 

EmbeddedContent-DatatypeCustomization

Image 2: Add a new property to the control.

 

Here is a quick overview of the individual options:

  • Name: the name of the property, just for the editor
  • Alias: this is the name of the Xml node when data for this schema gets saved (see below for an example)
  • Type: the type of this property
  • Description: a short description for the purposes of the editor
  • Show in title?: if ticked then this property will be displayed in the content list which the content editor sees

 

EmbeddedContent-DatatypeReorder

Image 3: Re-order the properties

 

Now it is time to use the Embedded Content data type on a document type, create a new node and add some content to the control:

EmbeddedContent-ContentAdd

Image 4: Add a new entry to the list

 

EmbeddedContent-ContentReorder

Image 5: Reorder the content list

 

Here is the Xml data which gets saved to the database:

<productSpecification>
  <data>
    <item id="1">
      <name propertyid="1">Package measurements</name>
      <value propertyid="2">200cm x 15cm x 150cm</value>
      <mediaItem propertyid="3" />
      <contentItem propertyid="4" />
      <validFrom propertyid="5" />
    </item>
    <item id="2">
      <name propertyid="1">Packaged weight</name>
      <value propertyid="2">5.3</value>
      <mediaItem propertyid="3" />
      <contentItem propertyid="4" />
      <validFrom propertyid="5" />
    </item>
    <item id="3">
      <name propertyid="1">Available colors</name>
      <value propertyid="2">yellow, green, blue</value>
      <mediaItem propertyid="3">1062</mediaItem>
      <contentItem propertyid="4">1065</contentItem>
      <validFrom propertyid="5">2011-01-14 00:00</validFrom>
    </item>
  </data>
</productSpecification>

 

You can now easily transform this e.g. with XSL:

EmbeddedContent-SampleXslt

Image 6: A simple xsl:for-each statement to loop through the properties
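If you would rather read the saved Xml from .Net than from XSLT, the same loop can be sketched with LINQ to XML. This is a minimal sketch, assuming the ‘name’ and ‘value’ aliases from the sample above; the EmbeddedContentReader class and ReadSpecifications method are hypothetical names, and how you get hold of the property’s Xml string is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class EmbeddedContentReader
{
    // Hypothetical helper: returns one (name, value) pair per <item> element.
    // "name" and "value" are the property aliases used in the sample Xml.
    public static List<KeyValuePair<string, string>> ReadSpecifications(string xml)
    {
        return XDocument.Parse(xml)
            .Descendants("item")
            .Select(item => new KeyValuePair<string, string>(
                (string)item.Element("name") ?? string.Empty,
                (string)item.Element("value") ?? string.Empty))
            .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        // Shortened copy of the sample data shown above
        const string xml = @"<productSpecification><data>
  <item id=""1""><name propertyid=""1"">Package measurements</name><value propertyid=""2"">200cm x 15cm x 150cm</value></item>
  <item id=""2""><name propertyid=""1"">Packaged weight</name><value propertyid=""2"">5.3</value></item>
</data></productSpecification>";

        foreach (var spec in EmbeddedContentReader.ReadSpecifications(xml))
        {
            Console.WriteLine("{0}: {1}", spec.Key, spec.Value);
        }
    }
}
```

The items come back in document order, so whatever order the content editor chose in the control is preserved.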

 

The output on a sample page would then be something like this:

EmbeddedContent-ContentOutput

Image 7: sample output

 

In conclusion this is a really flexible and highly customizable data type which allows you to quickly add a complex type (so to speak) to a content node, giving the content editor as much flexibility as possible concerning the number of items and ordering. As such I am sure it will come in handy for a broad range of tasks and I am really pleased that we can make this data type available for everyone working with Umbraco. Enjoy!

 

Last but not least a couple of things to note:

  • Please stay clear of ‘;’ in the description when defining properties, it will mess up the property definition string (there is also a warning on the control)
  • Although it is possible to change the properties of an Embedded Content type when it is already in use it should be handled with the utmost care. Changing the order of properties will be absolutely fine, changing the control type of the individual properties will pretty much work like it works with the fundamental Umbraco data types. Everything else is a bit risky to say the least…
Categories: .Net | Umbraco

Examine output indexing

by Shannon Deminick 1. November 2010 15:39

Last week Pete Gregory (@pgregorynz) and I were discussing different implementations of Examine, particularly those where you need to use Examine events to collate information from different nodes to put into the index for the page being rendered. An example of this is an FAQ engine where you might have an Umbraco content structure such as:

  • Site Container
    • Public
      • FAQs
        • FAQ Item 1
        • FAQ Item 2
        • FAQ Item 3

In this example, the page that is rendered to the end user is FAQs, but the data from all 4 nodes (FAQs, FAQ Item 1 –> 3) needs to be added to the index for the FAQs page. To do this you can use Examine events, either the GatheringNodeData event of the BaseIndexProvider or the DocumentWriting event of the UmbracoContentIndexer (I’ll write another post covering the difference between these two events and why they both exist). Though writing Examine event handlers to put the data from FAQ Item 1 –> 3 into the FAQs index isn’t very difficult, it would still be really cool if all of this could be done automatically.

Pete mentioned it would be cool if we could just index the output html of a page (sort of like Google) and suddenly the ideas started to flow. This concept is actually quite easy to do so within the next month or so we’ll probably release a beta of Examine Output Indexing. Here’s the way it’ll all get put together:

  • An HttpModule will be created to do 2 things:
    • Check if the current request is an Umbraco page request
      • If it is, we can easily get the current node being rendered since it’s already been added to the HttpContext items by Umbraco
      • Use the standard Examine handlers to enter the node’s data into the indexes based on the configuration you’ve specified in your Examine configuration files
    • Get the HTML output of the page before it is rendered to the end user, parse the html to get the relevant data and put it into the index for the current Umbraco page
  • We figured that it would also be cool to have an Examine node property that developers could define, called something like examineNoIndex, which we could check for when we determine that it’s an Umbraco page; if this property is set to true, we’ll not index this page.
    • This could give developers more control over which specific pages shouldn’t be indexed, directly from the CMS properties instead of writing custom events

With the above, a developer will simply need to put the HttpModule in their web.config, define an Examine index based on a new provider we create and that’s it. There will be no need to manually collate node data such as in the above FAQ example. However, please note that this will work for straightforward searching only, so if you have complex searching & indexing requirements, I would still recommend using events since you have far more control over what information is indexed.
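Since nothing has been built yet, the following is only a rough sketch of the "parse the html to get the relevant data" step, to make the idea concrete. The HtmlTextExtractor class and ExtractText method are hypothetical, the regular expressions are a deliberately naive way to strip markup, and the actual module may well work differently.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlTextExtractor
{
    // Hypothetical helper: reduces rendered HTML to plain text suitable for indexing.
    public static string ExtractText(string html)
    {
        if (string.IsNullOrEmpty(html)) return string.Empty;

        // Drop script and style blocks entirely, including their content
        var withoutScripts = Regex.Replace(
            html,
            @"<(script|style)\b[^>]*>.*?</\1>",
            " ",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        // Replace every remaining tag with a space so adjacent words stay separated
        var withoutTags = Regex.Replace(withoutScripts, @"<[^>]+>", " ");

        // Decode entities such as &amp; and collapse runs of whitespace
        var decoded = WebUtility.HtmlDecode(withoutTags);
        return Regex.Replace(decoded, @"\s+", " ").Trim();
    }
}

public static class Program
{
    public static void Main()
    {
        var html = "<h1>FAQs</h1><p>Item 1 &amp; 2</p><script>var x = 1;</script>";
        Console.WriteLine(HtmlTextExtractor.ExtractText(html)); // prints "FAQs Item 1 & 2"
    }
}
```

A real implementation would probably use a proper HTML parser and limit itself to the main content area of the page, so that navigation and footer text do not end up in every page’s index entry.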

Any feedback is much appreciated since we haven’t started developing this quite yet.

Categories: Examine | Umbraco | .Net

Examine v1.0 RTM

by Shannon Deminick 22. October 2010 04:46

We finally released Examine version 1.0 a week or so ago. You can find the latest download package from the CodePlex downloads page for Examine: http://examine.codeplex.com/releases/view/50781 

Here’s what you’ll need to know

  • There are some breaking changes from the version that is shipped with Umbraco 4.5 and also from the Examine RC3 release. The downloads tab on CodePlex has the Release Notes available for download, which contain all of the information on upgrading & breaking changes
  • READ THE RELEASE NOTES BEFORE UPGRADING
  • There’s a ton of bugs fixed in this release from the version shipped with Umbraco 4.5
  • Lots of new features have been added:
    • Indexing ANY type of data easily using the LuceneEngine index/search providers
    • PDF Indexing for Umbraco
    • XSLT extensions for Umbraco
    • Data Type declarations for indexed fields
    • Date & Number range searching
  • New documentation has been added to CodePlex

Using v1.0 RTM on Umbraco 4.5

The upgrade process from the Examine version shipped with 4.5 to v1.0 RTM should be pretty seamless (unless you are using some specific API calls as noted in the release notes). However, once you drop in the new DLLs you’ll probably notice that the internal search no longer works. This is due to a bug in the Umbraco 4.5 codebase and a non-optimal implementation of Examine which has to do with case sensitivity for application aliases (i.e. Content vs content). The work-around is simple though: all we need to do is change the Analyzer used for the internal searcher in the Examine configuration file to use the StandardAnalyzer instead of the WhitespaceAnalyzer. This is because the WhitespaceAnalyzer is case sensitive whereas the StandardAnalyzer is not. This issue is fixed in Umbraco Juno (4.6), which will continue to use the WhitespaceAnalyzer so that Examine doesn’t tokenize strings that contain punctuation. For more info on Analyzers, have a look at Aaron’s post.

Next Versions

There probably won’t be too many more changes coming for Examine v1.0 apart from any bug fixing that needs to be done and maybe some tweaks to the Fluent API. We will start working on v2.0 at some point this year or early next year, which will take Examine to the next level. It will be less focused on configuration files, have a smaller footprint and be much more configurable through code (similar to how ASP.Net MVC works).

Categories: Examine | Umbraco

Searching Multi-Node Tree Picker data (or any collection) with Examine

by Aaron Powell 22. September 2010 11:10

With the release of uComponents recently a lot of people are starting to work with a new data type called the MultiNodeTreePicker, and with this I’ve seen a few questions around searching the data it generates using Examine.

The problem is there is a catch, if you’re using the CSV storage (which you must if you’re working with Examine) you’ll hit a problem, the Examine index will have something like this:

1011,1231,1232,1225

But how do you search on that? Searching for ‘1231’ will not return anything, because it’s prefixed with ‘,’ and postfixed with ‘,’. So this brings a problem, how do you search?

Bring on Events

As Shannon spoke about at CodeGarden 10 Examine has a number of different events you can hook into to do different things (slides and code) and this is what we’re going to need to work with.

I’ve touched on events before but this time we’re going to look at a different event, we’re going to look at the GatheringNodeData event.

GatheringNodeData event

So this event in Examine is fired while Examine is scraping the data out of an XML element which it has received. This XML could be from Umbraco (in the scenario we’re looking at here) or it could be from your own data source, and the event is raised once Examine has turned the XML into a Key/Value representation of it.

The event that is raised has custom event arguments, which have a property called Fields. This Fields property is a dictionary which contains the full Key/Value representation of the data which will end up in Examine!

Now this dictionary is able to be manipulated, so you can add/remove data as you see fit (but that’s a topic for another blog); it also means you can change the data!

Changing the data for our needs

As I mentioned at the start of this, we end up with a comma-separated string from the datatype which isn’t useful for searching, so we can use an event handler to change what we’ve got. First we need to wire up an event handler:

using System;
using Examine;
using umbraco.BusinessLogic;

public class ExamineEvents : ApplicationBase
{
	public ExamineEvents()
	{
		//"MyIndexer" is the name of the index provider in your Examine config
		var indexer = ExamineManager.Instance.IndexProviderCollection["MyIndexer"];
		indexer.GatheringNodeData += new EventHandler<IndexingNodeDataEventArgs>(GatheringNodeDataHandler);
	}

	void GatheringNodeDataHandler(object sender, IndexingNodeDataEventArgs e)
	{
		//do stuff here
	}
}

So this is just a simple wire-up, using the ApplicationBase class in Umbraco so that it’ll be created on application start-up. Next we need to implement the event handler:

void GatheringNodeDataHandler(object sender, IndexingNodeDataEventArgs e)
{
	//grab the current data from the Fields collection,
	//skipping nodes which don't have the TreePicker property
	string mntp;
	if (!e.Fields.TryGetValue("TreePicker", out mntp))
	{
		return;
	}
	//let's get rid of those commas!
	mntp = mntp.Replace(",", " ");
	//now put it back into the Fields so we can pretend nothing happened!
	e.Fields["TreePicker"] = mntp;
}

And you’re done! Now the data will be written into the index with spaces rather than commas meaning that you can search on each ID without the need for wildcards or any other “hacks” to get it to work.

Note: This will work in the majority of cases, the only reason it’ll fail is if you’re using an analyzer that strips out numbers before indexing. For more information about Lucene analyzers take a look at this article: http://www.aaron-powell.com/lucene-analyzer
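To see why swapping commas for spaces helps, here is a small stand-alone sketch. The Tokenize method is only a simplified stand-in for a whitespace-based analyzer (it is not Lucene.Net code), and the TreePickerTokens class is a hypothetical name used for illustration.

```csharp
using System;

public static class TreePickerTokens
{
    // Simplified stand-in for a whitespace-based analyzer:
    // splits the field value on whitespace only, commas are left alone.
    public static string[] Tokenize(string fieldValue)
    {
        return fieldValue.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    }

    // An exact term search only matches a whole token
    public static bool Matches(string fieldValue, string term)
    {
        return Array.IndexOf(Tokenize(fieldValue), term) >= 0;
    }
}

public static class Program
{
    public static void Main()
    {
        var raw = "1011,1231,1232,1225";

        // The CSV value is one big token, so the single id is not found
        Console.WriteLine(TreePickerTokens.Matches(raw, "1231")); // prints "False"

        // After the replace each id is its own token
        Console.WriteLine(TreePickerTokens.Matches(raw.Replace(",", " "), "1231")); // prints "True"
    }
}
```

The real analyzers do more than this, but the principle is the same: an id can only be matched as a term if it ends up as a token of its own.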

Text casing and Examine

by Aaron Powell 23. August 2010 15:41

A few times I’ve seen questions posted on the Umbraco forums asking how to deal with case-insensitive text in Examine, and it’s also something that we’ve had to handle a few times within our own company.

Here’s a scenario:

  • You have a site search
  • You use Examine
  • You want to show the results looking exactly the same as it was before it went into Examine

If you’re running a standard install you’ll notice that the content always ends up lowercased!

This is a bit of a problem, page titles will be lowercase, body content will be lowercase, etc. Part of this will be due to a mistake in Examine, part of it is due to the design of Lucene.

In this article I’ll have a look at what you need to do to make it work as you’d expect.

First, some background

Before we dive directly into what to do to fix it you really should understand what is happening. If you don’t care feel free to skip over this bit though :P.

Searching is a tricky thing: as far as a string comparison is concerned, "Examine" == "examine" is false. To get around this, searching is best done in a case-insensitive manner. To make this work, Examine forced the content to lowercase before it was pushed into Lucene.Net, to ensure that everything was exactly the same when it was searched against.
In hindsight this was not a great idea; it really should be the responsibility of the Lucene Analyzer to handle this for you.

Many of the common Lucene.Net analyzers actually do automatic lowercasing of content, these analysers are:

  • StandardAnalyzer
  • StopAnalyzer
  • SimpleAnalyzer

So if you’re using the standard Examine config you’ll find yourself using the StandardAnalyzer and still have your content lowercased.

This means that there’s no need for Lucene to concern itself with case sensitivity when searching: everything is parsed by the analyzer (field terms and queries) and you’ll get more matches.
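The effect can be illustrated without Lucene.Net at all. In the sketch below, Analyze is a simplified stand-in for a lowercasing analyzer (split on whitespace, lowercase every term); the class and method names are hypothetical and the real analyzers do considerably more work.

```csharp
using System;
using System.Linq;

public static class LowercasingAnalyzerDemo
{
    // Simplified stand-in for a lowercasing analyzer:
    // split on whitespace and lowercase each term.
    public static string[] Analyze(string text)
    {
        return text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
                   .Select(term => term.ToLowerInvariant())
                   .ToArray();
    }

    // Both the indexed text and the query go through the same analysis,
    // so casing differences disappear before the terms are compared.
    public static bool IsMatch(string indexedText, string query)
    {
        var indexTerms = Analyze(indexedText);
        return Analyze(query).All(term => indexTerms.Contains(term));
    }
}

public static class Program
{
    public static void Main()
    {
        // A plain string comparison is case sensitive
        Console.WriteLine("Examine" == "examine"); // prints "False"

        // Running both sides through the analyzer makes the search match
        Console.WriteLine(LowercasingAnalyzerDemo.IsMatch("Examine makes searching easy", "EXAMINE")); // prints "True"
    }
}
```

The flip side, as described above, is that the terms in the index are lowercase, so the index alone can no longer give you back the original casing.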

So how do I get around this?

Now that we’ve seen why all your content is generally lower case, how can we work with it in the original format and display it back to the UI?

Well we need some way in which we can have the field data stored without the analyzer screwing around with it.

Note: This doesn’t need to be done if you’re using an analyzer which doesn’t have a LowerCaseTokenizer or LowerCaseFilter. If you’re using a different analyzer, like KeywordAnalyzer, then this post won’t cover what you’re after (the KeywordAnalyzer doesn’t lowercase, so if your content still comes back lowercased you’re using an out-dated version of Examine, and I recommend you grab the latest release :)). More information on Analyzers can be found at http://www.aaron-powell.com/lucene-analyzer

Luckily we’ve got some hooks into Examine to allow us to do what we need here, it’s in the form of an event on the Examine.LuceneEngine.Providers.LuceneIndexer, called DocumentWriting. Note that this event is on the LuceneIndexer, not the BaseIndexProvider. This event is Lucene.Net specific and not logical on the base class which is agnostic of any other framework.

What we can do with this event is interact directly with Lucene.Net while Examine is working with it.
You’ll need to have a bit of an understanding of how to work with a Lucene.Net Document (and for that I’d recommend having a read of this article from me: http://www.aaron-powell.com/documents-in-lucene-net), cuz what you’re able to do is play with Lucene.Net… Feel the power!

So we can attach the event handler the same way as you would for any other event in Umbraco, using a class that inherits from ApplicationBase:

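using System;
using Examine;
using Examine.LuceneEngine;
using Examine.LuceneEngine.Providers;
using umbraco.BusinessLogic;

public class UmbracoEvents : ApplicationBase
{
	public UmbracoEvents()
	{
		//cast to the Lucene-specific indexer, which exposes DocumentWriting
		var indexer = (LuceneIndexer)ExamineManager.Instance.IndexProviderCollection["DefaultIndexer"];

		indexer.DocumentWriting += new EventHandler<DocumentWritingEventArgs>(indexer_DocumentWriting);
	}
}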

To do this we’ve got to cast the indexer so we’ve got the Lucene version to work with, then we’re attaching to our event handler. Let’s have a look at the event handler

void indexer_DocumentWriting(object sender, DocumentWritingEventArgs e)
{
	//grab our Lucene document from the event arguments
	var doc = e.Document;

	//the e.Fields dictionary is all the fields which are about to be inserted into Lucene.Net
	//we'll grab out the "bodyContent" one, if there is one to be indexed
	if(e.Fields.ContainsKey("bodyContent")) 
	{
		string content = e.Fields["bodyContent"];
		//Give the field a name which you'll be able to easily remember
		//also, we're telling Lucene to just put this data in, nothing more
		doc.Add(new Field("__bodyContent", content, Field.Store.YES, Field.Index.NOT_ANALYZED));
	}
}

And that’s how you can push data in. I’d recommend that you do a conditional check to ensure that the property you’re looking for does exist in the Fields property of the event args, unless you’re 100% sure that it appears on all the objects which you’re indexing.

Lastly we need to display that on the UI. Well, it’s easy: rather than accessing the bodyContent field of the search result, use __bodyContent and you’ll get your unanalyzed version.
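A small helper can hide the double-underscore convention from your templates. This is a minimal sketch: the SearchResultDisplay class and GetDisplayValue method are hypothetical, and it assumes you hand it the Fields dictionary of a search result, falling back to the analyzed field when no unanalyzed copy was stored.

```csharp
using System;
using System.Collections.Generic;

public static class SearchResultDisplay
{
    // Hypothetical helper: prefers the unanalyzed "__alias" copy of a field
    // and falls back to the regular field, or an empty string if neither exists.
    public static string GetDisplayValue(IDictionary<string, string> fields, string alias)
    {
        string value;
        if (fields.TryGetValue("__" + alias, out value))
        {
            return value;
        }
        return fields.TryGetValue(alias, out value) ? value : string.Empty;
    }
}

public static class Program
{
    public static void Main()
    {
        // Stand-in for the Fields dictionary of a search result
        var fields = new Dictionary<string, string>
        {
            { "bodyContent", "hello world" },
            { "__bodyContent", "Hello World" },
            { "nodeName", "home" }
        };

        Console.WriteLine(SearchResultDisplay.GetDisplayValue(fields, "bodyContent")); // prints "Hello World"
        Console.WriteLine(SearchResultDisplay.GetDisplayValue(fields, "nodeName"));    // prints "home"
    }
}
```

In a Razor macro you would call it as SearchResultDisplay.GetDisplayValue(result.Fields, "bodyContent") inside the results loop.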

Conclusion

Here we’ve looked at how we can use the Examine events to interact with the Lucene.Net Document. We’ve decided that we want to push in unanalyzed text, but you could use this idea to really tweak your Lucene.Net document. But really playing with the Document is not recommended unless you *really* know what you’re doing ;).

Categories: .Net | Examine | Umbraco