Searching Umbraco using Razor and Examine

by Shannon Deminick 15. March 2011 05:51

Since Razor is really just c# it’s super simple to run a search in Umbraco using Razor and Examine.  In MVC the actual searching should be left up to the controller to give the search results to your view, but in Umbraco 4.6 + , Razor is used as macros which actually ‘do stuff’. Here’s how incredibly simple it is to do a search:

@using Examine; @* Get the search term from query string *@ @{var searchTerm = Request.QueryString["search"];} <ul class="search-results"> @foreach (var result in ExamineManager.Instance.Search(searchTerm, true)) { <li> <span>@result.Score</span> <a href="@umbraco.library.NiceUrl(result.Id)"> @result.Fields["nodeName"] </a> </li> } </ul>

That’s it! Pretty darn easy.

And for all you sceptics who think there’s too much configuration involved to setup Examine, configuring Examine requires 3 lines of code. Yes its true, 3 lines, that’s it. Here’s the bare minimum setup:

1. Create an indexer under the ExamineIndexProviders section:

<add name="MyIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"/>

2. Create a searcher under the ExamineSearchProviders section:

<add name="MySearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"/>

3. Create an index set under the ExamineLuceneIndexSets config section:

<IndexSet SetName="MyIndexSet" IndexPath="~/App_Data/TEMP/MyIndex" />

This will index all of your data in Umbraco and allow you to search against all of it. If you want to search on specific subsets, you can use the FluentAPI to search and of course if you want to modify your index, there’s much more you can do with the config if you like.

With Examine the sky is the limit, you can have an incredibly simple index and search mechanism up to an incredibly complex index with event handlers, etc… and a very complex search with fuzzy logic, proximity searches, etc…  And no matter what flavour you choose it is guaranteed to be VERY fast and doesn’t matter how much data you’re searching against.

I also STRONGLY advise you to use the latest release on CodePlex: http://examine.codeplex.com/releases/view/50781 . There will also be version 1.1 coming out very soon.

Enjoy!

Categories: Umbraco | Examine | .Net

Examine output indexing

by Shannon Deminick 1. November 2010 15:39

Last week Pete Gregory (@pgregorynz) and I were discussing different implementations of Examine. Particularly when you need to use Examine events to collate information from different nodes to put into the index for the page being rendered. An example of this is an FAQ engine where you might have an Umbraco content structure such as:

  • Site Container
    • Public
      • FAQs
        • FAQ Item 1
        • FAQ Item 2
        • FAQ Item 3

In this example, the page that is rendered to the end user is FAQs but the data from all 4 nodes (FAQs, FAQ Item 1 –> 4) needs to be added to the index for the FAQs page. To do this you can use Examine events, either using the GatheringNodeData of the BaseIndexProvider, or by using the DocumentWriting event of the UmbracoContentIndexer (I’ll write another post covering the difference between these two events and why they both exist). Though writing Examine event handlers to put the data from FAQ Item 1 –> 4 into the FAQs index isn’t very difficult, it would still be really cool if all of this could be done automatically.

Pete mentioned it would be cool if we could just index the output html of a page (sort of like Google) and suddenly the ideas started to flow. This concept is actually quite easy to do so within the next month or so we’ll probably release a beta of Examine Output Indexing. Here’s the way it’ll all get put together:

  • An HttpModule will be created to do 2 things:
    • Check if the current request is an Umbraco page request
      • If it is, we can easily get the current node being rendered since it’s already been added to the HttpContext items by Umbraco
      • Use the standard Examine handlers to enter the node’s data into the indexes based on the configuration you’ve specified in your Examine configuration files
    • Get the HTML output of the page before it is rendered to the end user, parse the html to get the relevant data and put it into the index for the current Umbraco page
  • We figured that it would also be cool to have an Examine node property that developers could defined called something like: examineNoIndex which we could check for when we determine that it’s an Umbraco page and if this property is set to true, we’ll not index this page.
    • This could give developers more control over what specific pages shouldn’t be indexed based directly from the CMS properties instead of writing custom events

With the above, a developer will simply need to put the HttpModule in their web.config, define an Examine index based on a new provider we create and that’s it. There will be no need to manually collate node data such as the above FAQ example. However, please note that this will work for straight forward searching so if you have complex searching & indexing requirements, I would still recommend using events since you have far more control over what information is indexed.

Any feedback is much appreciated since we haven’t started developing this quite yet.

Categories: Examine | Umbraco | .Net

Examine v1.0 RTM

by Shannon Deminick 22. October 2010 04:46

We finally released Examine version 1.0 a week or so ago. You can find the latest download package from the CodePlex downloads page for Examine: http://examine.codeplex.com/releases/view/50781 

Here’s what you’ll need to know

  • There are some breaking changes from the version that is shipped with Umbraco 4.5 and also from the Examine RC3 release. The downloads tab on CodePlex contains the Release Notes for download which contains all of the information on upgrading & breaking changes
  • READ THE RELEASE NOTES BEFORE UPGRADING
  • There’s a ton of bugs fixed in this release from the version shipped with Umbraco 4.5
  • Lots of new features have been added:
    • Indexing ANY type of data easily using the LuceneEngine index/search providers
    • PDF Indexing for Umbraco
    • XSLT extensions for Umbraco
    • Data Type declarations for indexed fields
    • Date & Number range searching
  • New documentation has been added to CodePlex

Using v1.0 RTM on Umbraco 4.5

The upgrade process from the Examine version shipped with 4.5 to v1.0 RTM should be pretty seamless (unless you are using some specific API calls as noted in the release notes). However, once you drop in the new DLLs you’ll probably notice that the internal search no longer works. This is due to a bug in the Umbraco 4.5. codebase and an non-optimal implementation of Examine which has to do with case sensitivity for application aliases (i.e. Content vs content ). The work-around is simple though: all we need to do is change the Analyzer used for the internal searcher in the Examine configuration file to use the StandardAnalyzer instead of the WhitespaceAnalyzer. This is because the WhitespaceAnalyzer is case sensitive whereas the StandardAnalyzer is not. This issue is fixed in Umbraco Juno (4.6) and will continue to use the WhitespaceAnalyzer so that Examine doesn’t tokenize strings that contain punctuation. For more info on Analyzers, have a look at Aaron’s post.

Next Versions

There probably won’t be too many more changes coming for Examine v1.0 apart from any bug fixing that needs to be done and maybe some tweaks to the Fluent API. We will start working on v2.0 at some point this year or early next year which will take Examine to the next level. It will be less focused on configuration, have a smaller foot print and be much more configurable through code (such as how ASP.Net MVC works).

Categories: Examine | Umbraco

Searching Multi-Node Tree Picker data (or any collection) with Examine

by Aaron Powell 22. September 2010 11:10

With the release of uComponents recently a lot of people are starting to work with a new data type called the MultiNodeTreePicker, and with this I’ve seen a few questions around searching the data it generates using Examine.

The problem is there is a catch, if you’re using the CSV storage (which you must if you’re working with Examine) you’ll hit a problem, the Examine index will have something like this:

1011,1231,1232,1225

But how do you search on that? Searching for ‘1231’ will not return anything, because it’s prefixed with ‘,’ and postfixed with ‘,’. So this brings a problem, how do you search?

Bring on Events

As Shannon spoke about at CodeGarden 10 Examine has a number of different events you can hook into to do different things (slides and code) and this is what we’re going to need to work with.

I’ve touched on events before but this time we’re going to look at a different event, we’re going to look at the GatheringNodeData event.

GatheringNodeData event

So this event in Examine is fired while Examine is scraping the data out of an XML element which it has received. This XML could be from Umbraco (in the scenario we’re looking at here) or it could be from your own data source, and the event is raised once Examine as turned the XML into a Key/ Value representation of it.

The event that raises has custom event arguments, which has a property called Fields. This Fields property is a dictionary which contains the full Key/ Value representation of the data which will end up in Examine!

Now this dictionary is able to be manipulated, so you can add/ remove data as you see if (but that’s a topic for another blog), it also means you can change the data!

Changing the data for our needs

As I mentioned at the start of this we end up with comma-separated string from the datatype which isn’t useful for searching, so we can use an event handler to change what we’ve got. First we need to tie an event handler

public class ExamineEvents : ApplicationBase 
{
	public ExamineEvents() 
	{
		var indexer = ExamineManager.Instance.IndexProviderCollection["MyIndexer"];
		indexer.GatheringNodeData += new EventHandler(GatheringNodeDataHandler);
	}

	void GatheringNodeDataHandler(object sender, IndexingNodeDataEventArgs e)
	{
		//do stuff here
	}
}

So this is just a simple wire-up, using the ApplicationBase class in Umbraco so that it’ll be created on application start-up. Next we need to implement the event handler:

void GatheringNodeDataHandler(object sender, IndexingNodeDataEventArgs e)
{
	//grab the current data from the Fields collection
	var mntp = e.Fields["TreePicker"];
	//let's get rid of those commas!
	mntp = mntp.Replace(",", " ");
	//now put it back into the Fields so we can pretend nothing happened!
	e.Fields["TreePicker"] = mntp;
}

And you’re done! Now the data will be written into the index with spaces rather than commas meaning that you can search on each ID without the need for wildcards or any other “hacks” to get it to work.

Note: This will work in the majority of cases, the only reason it’ll fail is if you’re using an analyzer that strips out numbers before indexing. For more information about Lucene analyzers take a look at this article: http://www.aaron-powell.com/lucene-analyzer

Text casing and Examine

by Aaron Powell 23. August 2010 15:41

A few times I’ve seen questions posted on the Umbraco forums which ask how to deal with case insensitivity text with Examine, and it’s also something that we’ve had to handle a few times within our own company.

Here’s a scenario:

  • You have a site search
  • You use examine
  • You want to show the results looking exactly the same as it was before it went into Examine

If you’re running a standard install you’ll notice that the content always ends up lowercased!

This is a bit of a problem, page titles will be lowercase, body content will be lowercase, etc. Part of this will be due to a mistake in Examine, part of it is due to the design of Lucene.

In this article I’ll have a look at what you need to do to make it work as you’d expect.

First, some background

Before we dive directly into what to do to fix it you really should understand what is happening. If you don’t care feel free to skip over this bit though :P.

Searching is a tricky thing, and when searching the statement Examine == examine = false; To get around this searching is best done in a case insensitive manner. To make this work Examine did a forced lowercase of the content before it was pushed into Lucene.Net. This was to ensure that everything was exactly the same when it was searched against.
In hindsight this is not really a great idea, it really should be the responsibility of the Lucene Analyzer to handle this for you.

Many of the common Lucene.Net analyzers actually do automatic lowercasing of content, these analysers are:

  • StandardAnalyzer
  • StopAnalyzer
  • SimpleAnalyzer

So if you’re using the standard Examine config you’ll find yourself using the StandardAnalyzer and still have your content lowercased.

This means that there’s no need to Lucene to concern itself about case sensitivity when searching, everything is parsed by the analyzer (field terms and queries) and you’ll get more matches.

So how do I get around this?

Now that we’ve seen why all your content is generally lower case, how can we work with it in the original format and display it back to the UI?

Well we need some way in which we can have the field data stored without the analyzer screwing around with it.

Note: This doesn’t need to be done if you’re using an analyzer which doesn’t have a LowerCaseTokenizer or LowercaseFilter. If you’re using a different analyzer, like KeywordAnalyzer then this post wont cover what you’re after (since the KeywordAnalyzer isn’t lowercasing, you’re actually using an out-dated version of Examine, I recommend you grab the latest release :)). More information on Analyzers can be found at http://www.aaron-powell.com/lucene-analyzer

Luckily we’ve got some hooks into Examine to allow us to do what we need here, it’s in the form of an event on the Examine.LuceneEngine.Providers.LuceneIndexer, called DocumentWriting. Note that this event is on the LuceneIndexer, not the BaseIndexProvider. This event is Lucene.Net specific and not logical on the base class which is agnostic of any other framework.

What we can do with this event is interact directly with Lucene.Net while Examine is working with it.
You’ll need to have a bit of an understanding of how to work with a Lucene.Net Document (and for that I’d recommend having a read of this article from me: http://www.aaron-powell.com/documents-in-lucene-net), cuz what you’re able to do is play with Lucene.Net… Feel the power!

So we can attach the event handler the same way as you would do any other event in Umbraco, using an Action Handler:

public class UmbracoEvents : ApplicationBase
{
	public UmbracoEvents()
        {
            var indexer = (LuceneIndexer)ExamineManager.Instance.IndexProviderCollection["DefaultIndexer"];

            indexer.DocumentWriting +=new System.EventHandler(indexer_DocumentWriting);
        }
}

To do this we’ve got to cast the indexer so we’ve got the Lucene version to work with, then we’re attaching to our event handler. Let’s have a look at the event handler

void indexer_DocumentWriting(object sender, DocumentWritingEventArgs e)
{
	//grab out lucene document from the event arguments
	var doc = e.Document;

	//the e.Fields dictionary is all the fields which are about to be inserted into Lucene.Net
	//we'll grab out the "bodyContent" one, if there is one to be indexed
	if(e.Fields.ContainsKey("bodyContent")) 
	{
		string content = e.Fields["bodyContent"];
		//Give the field a name which you'll be able to easily remember
		//also, we're telling Lucene to just put this data in, nothing more
		doc.Add(new Field("__bodyContent", content, Field.Store.YES, Field.Index.NOT_ANALYZED));
	}
}

And that’s how you can push data in. I’d recommend that you do a conditional check to ensure that the property you’re looking for does exist in the Fields property of the event args, unless you’re 100% sure that it appears on all the objects which you’re indexing.

Lastly we need to display that on the UI, well it’s easy, rather accessing the bodyContent property of the SearchResults, use the __bodyContent and you’ll get your unanalyzed version.

Conclusion

Here we’ve looked at how we can use the Examine events to interact with the Lucene.Net Document. We’ve decided that we want to push in unanalyzed text, but you could use this idea to really tweak your Lucene.Net document. But really playing with the Document is not recommended unless you *really* know what you’re doing ;).

Categories: .Net | Examine | Umbraco

Examine RC3 Released

by Shannon Deminick 18. August 2010 18:30

Hopefully this will be a quick RC! I’m really hoping to release v1.0 RTM by early next week (latest). If you are able to help out with some testing it would be amazing!!

Here's what's new:

  • PDF Indexing
  • Easily implement custom data indexing outside of Umbraco
  • More XSLT Extensions for Umbraco
  • Some framework refactoring so a new DLL: Examine.LuceneEngine.dll which contains all of the Lucene.Net implementation
    • Because of this refactoring, if you've built your own providers, you may need to update our code to work, otherwise it is backwards compatible for most people.
  • More unit tests
  • More documentation

Get it while it’s hot! And don’t forget to read the release notes.

DOWNLOAD FROM CODEPLEX HERE

Categories: Examine | Umbraco

Paging with Examine

by Aaron Powell 18. August 2010 13:12

I’ve been asked this question a few times, how do you implement pagination in your Examine search results.

Well a fun-fact is that Lucene doesn’t have really good way to do this, it just involves skipping to the point that you want to get the results from.

Thanks to .NET and LINQ we have extension methods which handles that nicely, Skip and Take, and since that the search results are IEnumerable underneath they can be used. But there’s one problem, doing this will result in a bit of a performance hit, as your initial Skip would hydrate a bunch of entities from the underlying Lucene store, which is where you loose performance.

So we implemented our own version of Skip! So you can just use Skip and Take as standard, and they’ll be evaluated without loosing performance.

Here’s the code:

int pageNumber = string.IsNullOrEmpty(Request.QueryString["pageNum"]) ? 0 : int.Parse(Request.QueryString["pageNum"]);
int pageSize = 10; //this could be a config value or something

/* serach snipped */

var results = searcher.Search(...);

//And we'll just set a repeater with the results
Repeater.DataSource = results.Skip(pageNumber * pageSize).Take(pageSize);
Repeater.DataBind();

It's just that easy.

Categories: .Net | Examine | Umbraco

How to build a search query in Examine

by Aaron Powell 12. August 2010 12:14

Now that Examine is able to be used by a wider audience than Umbraco understanding how search works is possibly a bit more important ;).

So today while answering a question on the Umbraco forum I thought that what I was going on about is something that more people might want to hear about. And really, I do like the sound of my own (virtual) voice…

Understanding Lucene.Net from Examine

For this I’m going to be looking at the Lucene.Net implementation of Examine, and this is agnostic of whether you’re using UmbracoExamine or Examine.LuceneEngine.

To get started you should familiarize yourself a bit with the Lucene Query Parser Syntax, as that’s what we’re using internally to get the data back from Lucene.Net.

Also, we’re going to be working with the Fluent API for Examine, so fell free to read up on that here and here. With Examine we’ve made it easy to see what the Lucene query you’re building up is as you’re working with the Fluent API, in fact if you to a .ToString() call on the ISearchCriteria instance you’ll be able to see what you’re search query looks like (we’ve also got some other information in the result of that method call too).

So you can see what’s been generated, let’s build a query and dissect it:

var criteria = searcher.CreateSearchCriteria(IndexTypes.Content);
criteria = criteria.NodeName("Hello").And().NodeTypeAlias("world").Compile();

Console.WriteLine(criteria.ToString()); //+(+nodeName:Hello +nodeTypeAlias:world) +__IndexType:content

So this is what we've got, a total of three conditions… But wait, there’s only two conditions that I entered right?

Not quite, Examine has some smarts built into it around what you’re searching on and it’ll add that restriction on your behalf. This is so you don’t get results back from a different index type in your query. Note: If you don’t specify the IndexType then this condition wont be added for your.

Also, to ensure that all your entered queries don’t get killed by the IndexType (a problem in earlier Examine builds) we combine everything you enter into a GroupedAnd statement.

Let’s have a look at the parts we did request, and how they are comprised:

+nodeName:Hello +nodeTypeAlias:world

Ok, so what we've got here is our two conditions which we've added using Examine. Each is an AND (or in Lucene terminology SHOULD) and this is denoted by the +. Next we have a field name, in this case either nodeName or nodeTypeAlias. Youl’ll notice back in our Fluent API query we actually generated all of that using the build in methods, rather than having to use more magic strings. Next there is a : to indicate where the field name ends. Lastly we have the term which we’re going to search against, either Hello or world.

So essentially it’s built up of BOOLEAN_OPERATION&FieldName:Term (the & is so it’s slightly readable).

Conclusion

This was a brief look at how the Fluent API for Examine will turn your typed query into a Lucene query that is then searching.

With this knowledge you should be better able to design complex queries, use mixed conditionals and just plain go crazy.

Categories: .Net | Examine | Umbraco

Using Examine to index & search with ANY data source

by Shannon Deminick 10. August 2010 10:38

During CodeGarden 2010 a few people were asking how to use Examine to index and search on data from any data source such as custom database tables, etc… Previously, the only way to do this was to override the Umbraco Examine indexing provider, remove the Umbraco functionality embedded in there, and then do a lot of coding yourself.  …But now there’s some great news! As of now you can use all of the Examine goodness with it’s embedded Lucene.Net with any data source and you can do it VERY easily.

Some things you need to know about the new version:

  1. I haven’t made a release version of this yet as it still needs some more testing, though we are putting this into a production site next week.
  2. If you want to try this, currently you’ll need to get the latest source from Examine @ CodePlex
  3. If you are using a previous version of Examine, there’s a few breaking changes as some of the class structures have been moved, however you config file should still work as is… HOWEVER, you should update your config file to reflect the new one with the new class names
  4. There is now 3 DLLs, not just 2:
    • Examine.DLL
      • Still pretty much the same… contains the abstraction layer
    • Examine.LuceneEngine.DLL
      • The new DLL to use to work with data that is not Umbraco specific
    • UmbracoExamine.DLL
      • The DLL that the Umbraco providers are in

Ok, now on to the good stuff. First, I’ve added a demo project to this post which you can download HERE. This project is a simple console app that contains a sample XML data file that has 5 records in it. Here’s what the app does:

  1. This re-indexes all data
  2. Searches the index for node id 1
  3. Ensures one record is found in the index
  4. Updates the dateUpdated time stamp for the data record
  5. Re-indexes the record with node id 1’

So assuming that you have some custom data like a custom database table, xml file, or whatever, there’s really only 3 things that you need to do to get Examine indexing your custom data:

  1. Create your own ISimpleDataService
    • There is only 1 method to implement: IEnumerable<SimpleDataSet> GetAllData(string indexType)
    • This is the method that Examine will call to re-index your data
    • A SimpleDataSet is a simple object containing a Dictionary<string, string> and a IndexedNode object (which consists of a Node Id and a Node Type)
    • For example, if you had a database row, your SimpleDataSet object for the row would be the dictionary of the rows values, it’s node id and type … easy.
  2. Use the ToExamineXml() extension method to re-index individual nodes/records
    • Examine relies on data being in the same XML structure as Umbraco (which we might change in version 2 sometime in the future… like next year) so we need to transform simple data into the XML structure. We’ve made this quite easy for you; all you have to do is get the data from your custom data source into a Dictionary<string, string> object and use this extension method to pass the xml structure in to Examine’s ReIndexNode method.
    • For example: ExamineManager.Instance.ReIndexNode(dataSet.ToExamineXml(dataSet["Id"], "CustomData"), "CustomData");  where dataSet is a Dictionary<string, string> .
  3. Update your Examine config to use the new SimpleDataIndexer index provider and the new LuceneSearcher search provider

If you’re not using Umbraco at all, then you’ll only need to have the 2 Examine DLLs which don’t reference the Umbraco DLLs whatsoever so everything is decoupled.

I’d recommend downloading the demo app and running it as it will show you everything you need to know on how to get Examine running with custom data. However, i know that people just like to see code in blog posts, so here’s the config for the demo app:

<?xml version="1.0" encoding="utf-8" ?> <configuration> <configSections> <section name="Examine" type="Examine.Config.ExamineSettings, Examine"/> <section name="ExamineLuceneIndexSets" type="Examine.LuceneEngine.Config.IndexSets, Examine.LuceneEngine"/> </configSections> <Examine> <ExamineIndexProviders> <providers> <!-- Define the indexer for our custom data. Since we're only indexing one type of data, there's only 1 indexType specified: 'CustomData', however if you have more than one type of index (i.e. Media, Content) then you just need to list them as a comma seperated list without spaces. The dataService is how Examine queries whatever data source you have, in this case it's a custom data service defined in this project. A custom data service only has to implement one method... very easy. --> <add name="CustomIndexer" type="Examine.LuceneEngine.Providers.SimpleDataIndexer, Examine.LuceneEngine" dataService="ExamineDemo.CustomDataService, ExamineDemo" indexTypes="CustomData" runAsync="false"/> </providers> </ExamineIndexProviders> <ExamineSearchProviders defaultProvider="CustomSearcher"> <providers> <!-- A search provider that can query a lucene index, no other work is required here --> <add name="CustomSearcher" type="Examine.LuceneEngine.Providers.LuceneSearcher, Examine.LuceneEngine" /> </providers> </ExamineSearchProviders> </Examine> <ExamineLuceneIndexSets> <!-- Create an index set to hold the data for our index --> <IndexSet SetName="CustomIndexSet" IndexPath="App_Data\CustomIndexSet"> <IndexUserFields> <add Name="name" /> <add Name="description" /> <add Name="dateUpdated" /> </IndexUserFields> </IndexSet> </ExamineLuceneIndexSets> </configuration>
Categories: .Net | Examine | Umbraco

Examine demo site source code from CodeGarden 2010

by Shannon Deminick 1. July 2010 17:49

A few people were asking for the source code from my Examine presentation at CodeGarden, so here it is. I’m not going to go in to all of the details of this site or the Examine config as it’s pretty simple. However, i will give you a very quick run down of it and if you attended CodeGarden and my presentation, you’d probably already know this.

The Umbraco config for this demo site is simple: A search form, a couple of search result pages with different templates (results using XSLT extensions, results to query media using the FluentAPI, custom results using the FluentAPI). Then there’s the content: 5 very simple nodes consisting of a text field and a numeric field and a miniature blog with some posts and comments.

I’ve included all of the source files and a backup of the database that it was running on. The source files are left in the same state as we left the demo during the presentation. So to get it up and running, just restore the database to your MS SQL server, update your web.config, and put the project files into IIS (or open the solution in Visual Studio).

Download here

Categories: .Net | Examine | Umbraco