Monday, March 25, 2013

Create Web Forms for Marketers Custom Save Action and Edit Screen

I was recently working on a project where I needed to create a custom save action for my Web Forms for Marketers module. The save action had to push the form data to Salesforce, and I also needed a custom edit screen so the author could set up some configuration values the action needed. Here are the details of what I did, starting with the save action.

Save Action

The first thing you need to do is create a new class that implements the “ISaveAction” interface (Sitecore.Form.Submit.ISaveAction) and its Execute method.

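using Sitecore.Data;                            // ID
using Sitecore.Form.Core.Client.Data.Submit;    // AdaptedResultList
using Sitecore.Form.Submit;                     // ISaveAction

public class SalesforceSaveAction : ISaveAction
{
    // Filled in from the values saved by the custom edit screen
    public string FormKey { get; set; }
    public string FieldsToSend { get; set; }
    void ISaveAction.Execute(ID formid, AdaptedResultList fields, params object[] data)
    {
        // Code to execute here
    }
}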

That is really all you need; the rest is custom code and configuration. To make the save action show up, go to Modules –> WFM –> Settings –> Save action in the master database. Right-click Save actions and select New save action. Give the item the name you want to see in the list of save actions. In the Assembly field, enter the name of the assembly your code is compiled into, and in the Class field, enter the class name including its namespace. Ignore the “Editor” field for now; we will come back to it.


Once this step is done, you can set up your save action in the Form Designer: select the new action in the drop-down and click Add.


That is it. You now have a new custom save action. You can use the AdaptedResultList parameter to access all the fields on the form and work with them. If all you want to do is work with the form values, you are set. However, if you also need to let the editor provide some information to the action, you need to create an edit screen as well.
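As an illustration of what Execute might do with those fields, here is a minimal sketch that builds a URL-encoded Salesforce Web-to-Lead request body. It assumes the field names and values have already been copied out of the AdaptedResultList into a plain dictionary, that FieldsToSend is a comma-separated list of field names, and that FormKey holds the Salesforce OID. The WebToLeadBody class is hypothetical, and the actual HTTP POST is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: turns submitted form values into a Web-to-Lead POST body.
public static class WebToLeadBody
{
    // formKey: the Salesforce OID entered in the edit screen.
    // fieldsToSend: comma-separated field names (assumed format).
    // values: field name -> submitted value, assumed to be copied out of AdaptedResultList.
    public static string Build(string formKey, string fieldsToSend, IDictionary<string, string> values)
    {
        IEnumerable<string> wanted = (fieldsToSend ?? string.Empty)
            .Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(name => name.Trim())
            .Where(name => name.Length > 0);

        var pairs = new List<string> { "oid=" + Uri.EscapeDataString(formKey ?? string.Empty) };
        foreach (string name in wanted)
        {
            string value;
            // Names that are configured but not present on the submitted form are skipped
            if (values.TryGetValue(name, out value))
            {
                pairs.Add(Uri.EscapeDataString(name) + "=" + Uri.EscapeDataString(value ?? string.Empty));
            }
        }
        return string.Join("&", pairs);
    }
}
```

Skipping missing names instead of sending empty values is a choice made for this sketch, not something WFFM or Salesforce requires.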

Edit Screen

Creating and configuring the edit screen enables the “Edit” button on the right when you select your custom save action. There are two parts to this: first you create the Sheer UI, and then you create the code that the Sheer UI calls. I started from Sitecore’s “My First Sheer UI Application” example but found it only somewhat helpful; since I am not really creating an application, it did not address what I wanted to do. The XML section was helpful for understanding which controls I can declare in the XML to build the UI. The first step is creating the XML file that holds the XAML that makes up your UI. I created a “SalesforceEditor.xml” file defined like this:

<?xml version="1.0" encoding="utf-8" ?>
<control xmlns:def="Definition" xmlns="http://schemas.sitecore.net/Visual-Studio-Intellisense">
  <SendToWebServiceSettings>
    <FormDialog ID="Dialog" Icon="Applications/32x32/gear.png"
      Header="Salesforce Web-to-Lead"
      Text="Define the OID for your Salesforce form and define your mappings.">
      <CodeBeside Type="Web.Common.SalesforceWTLEditor,Web.Common"/>
      <GridPanel Class="scfContent" ID="MainGrid" Columns="2" Rows="5"
        Margin="20 25 25 15" Width="100%" Border="1">
        <Literal Text="Form Key:" GridPanel.Align="left" GridPanel.Column="0" GridPanel.Row="1"/>
        <Edit ID="FormKey" GridPanel.Column="1" GridPanel.Row="1"></Edit>
        <!-- The remaining rows (the field mappings) are not shown -->
      </GridPanel>
    </FormDialog>
  </SendToWebServiceSettings>
</control>

All the fields are standard XAML controls. The key element to note is “CodeBeside”: it points the UI to the code-beside class that runs against the XAML. For our simple example we are just creating an edit screen that looks like this. It gives the editor a simple place to enter the Salesforce key associated with each form they generate.


Once we have the XML file, we need to tell Sitecore about it. This happens in a few places.

First, you need to tell Sitecore about the layout you just created (the XML file). In the Sitecore “core” database, under Sitecore –> layout –> layouts –> layouts, right-click and add a new item from the template “/sitecore/templates/System/Layout/Xml layout.” Name it whatever you want and set its path to the physical location of your XML file.

Second, in the Sitecore “core” database, under Sitecore –> content –> Applications –> Dialogs, right-click and add an application. As in the picture below, give it a name and an icon; these can be whatever you want. Once this is done, go to the “Presentation” ribbon and select the “Details” icon in the “Layouts” section. There, select “Edit” and choose the Sitecore layout you created in the previous step.


At this point you are done with the Sitecore configuration except for one thing. Earlier I told you to ignore one field on the custom save action item in Sitecore. Here is where it becomes meaningful. In the first image above there is a field called “Editor.” Entering “control:SendToWebServiceSettings” in it connects the Edit button of the custom save action to our new Sheer UI edit screen. NOTE: “SendToWebServiceSettings” is the name of the main node in our XML file.

Now the last thing: you need to create the code-beside class that does all the processing of the XAML form.

Create a class that inherits from DialogForm:

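using System;
using System.Collections.Specialized;   // NameValueCollection
using Sitecore;                         // Context
using Sitecore.Form.Core.Utility;       // ParametersUtil
using Sitecore.Web.UI.Pages;            // DialogForm
using Sitecore.Web.UI.Sheer;            // SheerResponse

// Public, and in the Web.Common namespace, to match the CodeBeside Type in the XML
public class SalesforceWTLEditor : DialogForm
{
   protected override void OnLoad(EventArgs e)
   {
       base.OnLoad(e);
       if (!Context.ClientPage.IsEvent)
       {
           // Execute your edit page onload code
       }
   }
   protected override void OnOK(object sender, EventArgs args)
   {
       // PersistentFormKeyValue and selectedFields are built by code not shown in this post
       SheerResponse.SetDialogValue(
           ParametersUtil.NameValueCollectionToXml(new NameValueCollection()
           {
               {"FormKey", PersistentFormKeyValue },
               {"FieldsToSend", string.Join(",", selectedFields)}
           }));
       base.OnOK(sender, args);
   }
}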

The OnLoad method, as you would expect, fires when the form is opened. OnOK fires when the user clicks the OK button at the bottom of the form. I don’t have all my code here, but I have listed the key parts. In the code-beside you can access every control ID declared in your XAML; for example, MainGrid.Controls returns all the controls inside the GridPanel defined in the XML above.

The SheerResponse.SetDialogValue call is the key to setting information in the edit UI that your custom save action can read. Note that I am building a NameValueCollection with “FormKey” and “FieldsToSend” entries; the keys and the values are just strings. If you look at the custom save action class at the top of the post, you will see two public properties named “FormKey” and “FieldsToSend.” The SheerResponse.SetDialogValue method pushes the name/value collection to those properties, so when my save action fires, FormKey and FieldsToSend hold the string values from the collection.
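To make that hand-off easier to picture, the sketch below serializes a name/value collection to XML, reads it back, and copies each entry onto a public string property with the same name, which is the effect described above. Everything here is hypothetical: the `<parameters>` XML shape and the ParameterMapper helper are stand-ins, and the real ParametersUtil output and WFFM internals may differ.

```csharp
using System;
using System.Collections.Specialized;
using System.Linq;
using System.Reflection;
using System.Xml.Linq;

// Hypothetical helper illustrating the idea; this is not WFFM's ParametersUtil.
public static class ParameterMapper
{
    // Illustrative shape only: <parameters><FormKey>...</FormKey>...</parameters>
    public static string ToXml(NameValueCollection values)
    {
        var root = new XElement("parameters",
            values.AllKeys.Select(key => new XElement(key, values[key])));
        return root.ToString(SaveOptions.DisableFormatting);
    }

    // Copies each element's text onto a public, writable string property with the same name.
    public static void Apply(string xml, object target)
    {
        foreach (XElement element in XElement.Parse(xml).Elements())
        {
            PropertyInfo property = target.GetType().GetProperty(
                element.Name.LocalName, BindingFlags.Public | BindingFlags.Instance);
            if (property != null && property.CanWrite && property.PropertyType == typeof(string))
            {
                property.SetValue(target, element.Value, null);
            }
        }
    }
}

// Stand-in with the same two properties as SalesforceSaveAction
public class SaveActionSettings
{
    public string FormKey { get; set; }
    public string FieldsToSend { get; set; }
}
```

Matching on property name is what lets the edit screen and the save action agree on nothing more than a pair of string keys.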

Tuesday, March 19, 2013

Custom Crawler for Parsing PDF files with Sitecore

I recently had to create a crawler in my Sitecore 6.5 project that looked at PDF files in the media library and called an external API to get a list of PDF files to parse and index. You can do this with Sitecore, but the examples for doing it are old and no longer work, so it was a bit painful to get everything working. It is actually not a hard process; the lack of working examples is what made it hard to put all the parts together. I will break this into two parts: 1) create a custom crawler and 2) set up PDF indexing. You need to do both to make PDF indexing happen, and for both, at least for me, I could find no working examples.

Create a Custom Crawler

For the crawler I started with Sitecore’s documentation (section 3.2). It did not work the way they have it set up, but it introduces what a crawler is and takes you most of the way there.

1) Create the new class

public class FileCrawler : BaseCrawler, ICrawler

2) Implement the two interface methods.

void ICrawler.Add(IndexUpdateContext context)
void ICrawler.Initialize(Index index)

3) Add some properties

public string Root { get; set; }
public string Database { get; set; }

These properties are set from the configuration file you will set up in a moment. Root defines where in the Sitecore tree your crawler starts working. Database lets you change, through your config, whether the crawler looks at the web or the master database.
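Sitecore's configuration factory is what copies those child elements onto the crawler. As a rough model of that behavior (a hypothetical simplification, not Sitecore's code), the sketch below sets each public property whose name matches a child element, converting the text to the property's type so that a float such as Boost works too.

```csharp
using System;
using System.Globalization;
using System.Reflection;
using System.Xml.Linq;

// Hypothetical model of config-to-property binding; Sitecore's factory is more involved.
public static class ConfigBinder
{
    public static T Bind<T>(XElement element) where T : new()
    {
        var target = new T();
        foreach (XElement child in element.Elements())
        {
            PropertyInfo property = typeof(T).GetProperty(
                child.Name.LocalName, BindingFlags.Public | BindingFlags.Instance);
            if (property == null || !property.CanWrite)
            {
                continue; // elements without a matching property are ignored in this sketch
            }
            // Invariant culture so "1.5" parses the same on every server locale
            object value = Convert.ChangeType(child.Value.Trim(), property.PropertyType, CultureInfo.InvariantCulture);
            property.SetValue(target, value, null);
        }
        return target;
    }
}

// Mirrors the crawler's configurable properties
public class CrawlerSettings
{
    public string Root { get; set; }
    public string Database { get; set; }
    public float Boost { get; set; }
}
```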

Add is the main entry point to your code. Initialize is also called, but you may or may not want to do anything in it; I did nothing other than a little logging. The Add method is where all my code starts. With this you have all the code you need for a custom crawler. You can put a breakpoint inside the Add method, and once you complete the next steps you should hit it (yes, that means you can attach the debugger as you normally would for Sitecore and debug the crawler).

Here is what my class now looks like:

using System.Diagnostics;       // Stopwatch
using Sitecore.Diagnostics;     // Log
using Sitecore.Search;
using Sitecore.Search.Crawlers;

public class FileCrawler : BaseCrawler, ICrawler
{
    // Set from the crawler's child elements in the config file
    public string Root { get; set; }
    public string Database { get; set; }
    public float Boost { get; set; }

    // Running totals reported in the summary log line
    long _totalProcessedSize;
    int _fileCount;
    int _successCount;
    int _failureCount;

    void ICrawler.Add(IndexUpdateContext context)
    {
        _fileCount = 0;
        _totalProcessedSize = 0;
        _successCount = 0;
        _failureCount = 0;
        Stopwatch watch = Stopwatch.StartNew();

        // Find the files under Root, parse them and call context.AddDocument(...) (not shown here)

        Log.Info(string.Format("Finished parsing files -- Total files:{0}(Errors:{1}-Success:{2}) -- {3}m:{4}s:{5}ms -- Total bytes {6}",
            _fileCount, _failureCount, _successCount, watch.Elapsed.Minutes, watch.Elapsed.Seconds,
            watch.Elapsed.Milliseconds, _totalProcessedSize), this);
    }

    void ICrawler.Initialize(Index index)
    {
        Log.Info("File Crawler Init", this);
    }
}

The spot in Add between starting the stopwatch and logging the totals is where the PDF part of this post comes in, but for now this is the code for my crawler.
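The PDF step inside Add could be driven by a loop along these lines. This is a hypothetical, self-contained sketch: MediaFile stands in for a media library item, and the parse delegate stands in for the PDF parsing described later. It only shows one way to maintain the counters that the summary log line reports.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical stand-in for a media library item
public class MediaFile
{
    public string Name { get; set; }
    public string Extension { get; set; }
    public long Size { get; set; }
}

public class CrawlStats
{
    public int FileCount, SuccessCount, FailureCount;
    public long TotalProcessedSize;
    public TimeSpan Elapsed;
}

public static class CrawlLoop
{
    // Calls parse for every PDF and tallies the results; parse returns false when a file fails.
    public static CrawlStats Run(IEnumerable<MediaFile> files, Func<MediaFile, bool> parse)
    {
        var stats = new CrawlStats();
        Stopwatch watch = Stopwatch.StartNew();
        foreach (MediaFile file in files)
        {
            if (!string.Equals(file.Extension, "pdf", StringComparison.OrdinalIgnoreCase))
            {
                continue; // only PDFs are counted and indexed
            }
            stats.FileCount++;
            stats.TotalProcessedSize += file.Size;
            if (parse(file)) stats.SuccessCount++;
            else stats.FailureCount++;
        }
        watch.Stop();
        stats.Elapsed = watch.Elapsed;
        return stats;
    }
}
```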

4) Set up your config file

Sitecore’s documentation tells you to create a FileCrawler.config file. If your project does not already have a file that holds your custom indexing configuration, you will need to set up this new config file. If you already have one, you can just add a new index, or a new location inside an existing index, to that file (these files live in <website>/App_Config/Include). Using the config setup from the documentation, I ran into all kinds of errors, such as “AddIndex method not found” or “Add method not found.” Here is what I set up to get it working.

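<!-- Goes inside <sitecore><search><configuration><indexes> in an App_Config/Include patch file -->
<index patch:after="index[@id='system']" id="MyIndexName" type="Sitecore.Search.Index, Sitecore.Kernel">
    <param desc="name">$(id)</param>
    <param desc="folder">MyIndexName</param>
    <Analyzer ref="search/analyzer" />
    <locations hint="list:AddCrawler">
        <tqsFiles type="MyNamespace.FileCrawler, MyNamespace">
            <Root>/sitecore/media library/files/resources</Root>
            <!-- Other crawler properties, such as Database, are set the same way -->
        </tqsFiles>
    </locations>
</index>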

Once you have this, you should be able to go to Sitecore –> control panel –> databases –> Rebuild Search Indexes and see your new index (“MyIndexName”). If you see it, attach to the w3wp process and put a breakpoint in the Add method. When your breakpoint is ready, make sure your new index is checked and click “Rebuild.” You should hit your breakpoint. That is it: you can now write whatever custom code you want in here, using your Database and Root properties to know where to look for the data. The context object passed into the Add method is where you add the new documents that go into the index. Just make sure you call context.AddDocument(document); without it, the index will never be updated with your information.

Setup PDF Indexing

Now let’s set up some code that grabs all the files in the media library and indexes any PDF file it finds. Again, I found an old Sitecore document on this subject (section 2.3 and chapter 5) that got me started, but it did not work on its own. Chapter 5 provides a link to some open-source libraries you will need, but they are old; updated libraries for PDFBox can be found here. I will not bore you with all my code here, just the important methods for PDF parsing.

First, download the PDFBox zip file from the link above. When you unzip it, make sure you unblock the files; otherwise you will get errors when you build. Add the DLLs as references to your project. The zip file comes with a lot of DLLs and I am not sure when each one is needed. Some are loaded at runtime even though they are not needed at build time, so add them to your bin folder.

Once you have the DLLs and references set up, you are ready for the main methods. I will touch on two of them here. ParsePDF does what you would think: it takes the stream from a media item and parses it. AddPDFContent takes a Lucene.Net Document object and adds the index fields to it.

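// Members of FileCrawler. Needs: System, System.IO, Lucene.Net.Documents, org.apache.pdfbox.pdmodel,
// org.apache.pdfbox.util (PDFBox 1.x), Sitecore.Data.Items, Sitecore.Diagnostics, Sitecore.Search
protected virtual void AddPDFContent(Document document, MediaItem media)
{
   _totalProcessedSize += media.Size;
   // Open the media stream once and dispose of it when done
   using (Stream stream = media.GetMediaStream())
   {
       if (stream != null && stream.CanRead)
       {
           document.Add(this.CreateTextField(BuiltinFields.Content, this.ParsePDF(stream, media.Name)));
           document.Add(this.CreateTextField(BuiltinFields.Name, media.Name));
       }
   }
}
private string ParsePDF(Stream mediaStream, string fileName)
{
   PDDocument doc = null;
   // Assumption: the wrapper type was lost from this listing; IKVM's InputStreamWrapper adapts
   // a .NET Stream to the java.io.InputStream that PDDocument.load expects
   ikvm.io.InputStreamWrapper wrapper = null;
   try
   {
       wrapper = new ikvm.io.InputStreamWrapper(mediaStream);
       doc = PDDocument.load(wrapper);
       PDFTextStripper stripper = new PDFTextStripper();
       return stripper.getText(doc);
   }
   catch (Exception ex)
   {
       Log.Error("Error parsing " + fileName + " for indexing", ex, this.GetType());
       return String.Empty;
   }
   finally
   {
       // Release PDF resources whether or not parsing succeeded
       if (doc != null) doc.close();
       if (wrapper != null) wrapper.close();
   }
}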

The ParsePDF method does the work of reading the stream from the PDF file and getting the text from it. It then returns that string to the AddPDFContent method, which puts it in the BuiltinFields.Content field (when looking at the index, this is the “_content” field).

You can add a text field (this.CreateTextField) or a data field (this.CreateDataField) to the document. Text fields are what the index uses to find hits, and data fields let you programmatically access information about a document once a hit is found. So if you want a piece of data to be both searchable and programmatically accessible, add it as both a text field and a data field.
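The distinction can be seen without Lucene.Net at all. The toy model below is a deliberate simplification, not the Lucene.Net or Sitecore API: text values are split into terms that a search can match, and data values are stored so they can be read back from a hit. A document that only has text fields is still found, but there is nothing to read back from it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model only; real Lucene fields carry separate store/index options.
public class ToyDocument
{
    public readonly HashSet<string> Terms = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
    public readonly Dictionary<string, string> Stored = new Dictionary<string, string>();

    // Like a text field: split into searchable terms, not kept for display
    public void AddTextField(string value)
    {
        foreach (string term in value.Split(new[] { ' ', ',', '.' }, StringSplitOptions.RemoveEmptyEntries))
        {
            Terms.Add(term);
        }
    }

    // Like a data field: kept as-is so it can be read back from a hit
    public void AddDataField(string name, string value)
    {
        Stored[name] = value;
    }
}

public static class ToyIndex
{
    // Finds documents whose text fields contain the term and returns one stored value per hit (null if absent)
    public static List<string> Search(IEnumerable<ToyDocument> docs, string term, string dataField)
    {
        return docs.Where(d => d.Terms.Contains(term))
                   .Select(d => d.Stored.ContainsKey(dataField) ? d.Stored[dataField] : null)
                   .ToList();
    }
}
```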

That is it. Now just pass in the root path where your PDF files live, get the media stream from each file, and call these methods.
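Before handing a file to PDFBox it can help to skip anything that is not actually a PDF. One option, offered here as an assumption rather than what my crawler did, is to check the file signature: a well-formed PDF starts with the bytes `%PDF-`. The PdfSniffer helper below is hypothetical; it reads only the first five bytes and rewinds seekable streams so they can still be parsed afterward.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: checks the PDF header signature without parsing the file.
public static class PdfSniffer
{
    static readonly byte[] Signature = Encoding.ASCII.GetBytes("%PDF-");

    public static bool LooksLikePdf(Stream stream)
    {
        if (stream == null || !stream.CanRead) return false;
        long start = stream.CanSeek ? stream.Position : 0;

        // Read may return fewer bytes than requested, so loop until the header is filled or the stream ends
        var header = new byte[Signature.Length];
        int read = 0;
        while (read < header.Length)
        {
            int n = stream.Read(header, read, header.Length - read);
            if (n == 0) break;
            read += n;
        }

        if (stream.CanSeek) stream.Position = start; // rewind so the caller can parse from the beginning
        if (read < header.Length) return false;
        for (int i = 0; i < header.Length; i++)
        {
            if (header[i] != Signature[i]) return false;
        }
        return true;
    }
}
```

Media streams that cannot seek would need to be copied into a MemoryStream first if you want to sniff and then parse the same data.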

After I had finished my coding, I finally found a good example on CodeProject. Hopefully between this post and that one you can get what you need.
