How to use Zend_Search_Lucene with the PHP framework CodeIgniter

If you’ve heard the buzz about Lucene, Apache’s open source search engine library, then you probably already know what a great search tool it is. It is fast, has been ported to several languages, and was designed so that the different Lucene ports can share the same search index.

The PHP version of Lucene is packaged in the Zend Framework and is called Zend_Search_Lucene. When it comes to PHP frameworks, I tend to prefer CodeIgniter over Zend. So, you might ask, how can you combine a favorite framework such as CodeIgniter with Lucene’s search capabilities?

Install CodeIgniter 1.7.1

I downloaded a copy of the latest version of CodeIgniter 1.7.1 and configured it to run the default welcome action.  Next, I made a copy of the welcome controller and view to test my indexer and search actions (which we’ll get to in just a minute).

Install Zend Framework 1.7.5

Next, I downloaded the latest version of the Zend Framework, 1.7.5. After extracting the zip file, copy the Zend folder from ZendFramework-1.7.5/library into the CodeIgniter installation under system/application/libraries.

Create A Zend Loader Class
The next thing to do is create a loader file that loads the Zend library classes in CodeIgniter. A separate tutorial also explains how to create this loader class. The file below is named Zend.php and should be located in the system/application/libraries folder of CodeIgniter.

<?php if (!defined('BASEPATH')) exit('No direct script access allowed');
class CI_Zend
{  
    function __construct($class = NULL)  
    {   
        // include path for Zend Framework   
        // alter it accordingly if you have put the 'Zend' folder elsewhere   
        ini_set('include_path',   ini_get('include_path') .
        PATH_SEPARATOR . APPPATH . 'libraries');

        if ($class)   
        {    
            require_once (string) $class . EXT;
            log_message('debug', "Zend Class $class Loaded");
        }
        else
        {    
            log_message('debug', "Zend Class Initialized");
        }  
    }

    function load($class)  
    {   
        require_once (string) $class . EXT;   
        log_message('debug', "Zend Class $class Loaded");  
    }
}
//End of File: Zend.php
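Zend Framework 1 follows the PEAR naming convention, where each underscore in a class name corresponds to a directory. For example, Zend_Search_Lucene lives in Zend/Search/Lucene.php. That is why load('Zend/Search/Lucene') works once the libraries folder is on the include path. Instead of calling load() for every class, you could register an autoloader. The following is a minimal sketch that assumes PHP 5.3.2 or later (for closures and stream_resolve_include_path()) and assumes the Zend folder is already on the include path, as the CI_Zend constructor arranges. The zend_class_to_path() helper is hypothetical.

```php
<?php
// Hypothetical helper: map a ZF1 class name to its file path,
// e.g. Zend_Search_Lucene -> Zend/Search/Lucene.php
function zend_class_to_path($class)
{
    return str_replace('_', '/', $class) . '.php';
}

// Sketch of an autoloader for Zend classes (requires PHP 5.3.2+).
spl_autoload_register(function ($class) {
    // Only handle Zend Framework classes; leave others to other loaders.
    if (strpos($class, 'Zend_') !== 0) {
        return;
    }
    $file = zend_class_to_path($class);
    // stream_resolve_include_path() returns false when the file is not
    // found anywhere on the include_path, so missing classes fail quietly.
    if (stream_resolve_include_path($file) !== false) {
        require_once $file;
    }
});
```

With this registered, a line such as `new Zend_Search_Lucene_Document()` would pull in its file automatically. The explicit load() calls in this article would still work alongside it.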

Creating An Indexer

Now we will create the search index. For demonstration purposes, I’m going to place the indexer and search functions in the same controller. In a real site, the indexer belongs in a separate controller that is protected so that ordinary visitors cannot run it.
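Here is a minimal sketch of one way to protect the indexer. It assumes the indexer should only run from the command line or when the request carries a secret token, for example one passed by a cron job that fetches the URL. The function name indexer_allowed(), its parameters and the token scheme are hypothetical and not part of CodeIgniter. Note that hash_equals() requires PHP 5.6 or later.

```php
<?php
// Hypothetical access check for the indexer action. The name, parameters
// and shared-secret scheme are illustrative assumptions only.
function indexer_allowed($sapi, $suppliedToken, $secretToken)
{
    // php_sapi_name() reports 'cli' when PHP runs from the command line.
    if ($sapi === 'cli') {
        return true;
    }

    // Over HTTP, refuse missing or empty tokens outright.
    if (!is_string($suppliedToken) || $suppliedToken === ''
        || !is_string($secretToken) || $secretToken === '') {
        return false;
    }

    // Constant-time string comparison (hash_equals() needs PHP 5.6+).
    return hash_equals($secretToken, $suppliedToken);
}
```

In the indexer action you could call `indexer_allowed(php_sapi_name(), $token, $secret)` and call CodeIgniter’s show_404() when it returns false. How `$token` is read from the request is left open here.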

We’ll start with the copy of the welcome controller, which I named home.php. After changing the class name and constructor from welcome to home, the contents of the file should look like this. Also, add the sanitize() function shown below.

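<?php
class Home extends Controller
{
    function Home()
    {
        parent::Controller();
    }

    function index()
    {
        $this->load->view('home_view');
    }

    // Strip HTML tags from feed text, then encode special characters
    // as HTML entities before the text is added to the index.
    // Note: without a leading underscore, CodeIgniter also exposes this
    // method as a URL (/home/sanitize).
    function sanitize($input)
    {
        return htmlentities(strip_tags($input));
    }
}
/* End of file home.php */
/* Location: ./system/application/controllers/home.php */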
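To see exactly what sanitize() does to feed text, here is a standalone copy that runs without CodeIgniter. One thing worth knowing is that htmlentities() re-encodes entities already present in a feed description, so entity names such as "amp" can end up in the indexed text. The clean_for_index() function is a hypothetical alternative for text that only goes into the index and is not part of the approach used in this article.

```php
<?php
// Standalone copy of the controller's sanitize() helper.
function sanitize($input)
{
    // Remove HTML tags, then convert special characters to HTML entities.
    return htmlentities(strip_tags($input));
}

// Hypothetical alternative: strip tags and *decode* entities so the index
// holds plain text. Escape the value again when you display it.
function clean_for_index($input)
{
    return html_entity_decode(strip_tags($input), ENT_QUOTES, 'UTF-8');
}

echo sanitize('<b>Tom & Jerry</b>'), "\n";             // Tom &amp; Jerry
echo sanitize('<p>Tom &amp; Jerry</p>'), "\n";         // Tom &amp;amp; Jerry
echo clean_for_index('<p>Tom &amp; Jerry</p>'), "\n";  // Tom & Jerry
```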

Now, we can just replace the contents of the index() function with the following.

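// Load the CI_Zend library once, then load each Zend class through it.
// (Passing a class as the second argument, e.g.
// $this->load->library('zend', 'Zend/Feed'), only works for the first
// class: CodeIgniter ignores repeat loads of the same library.)
$this->load->library('zend');
$this->zend->load('Zend/Feed');
$this->zend->load('Zend/Search/Lucene');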

//Create (or recreate) the index.
$index = Zend_Search_Lucene::create('c:\wamp\www\ci\tmp\feeds_index');
$feeds = array(    
    'http://www.cmjackson.net/feed/rss/',    
    'http://andrewmjackson.com/feed/rss');       

//grab each feed.   
foreach($feeds as $feed)   
{    
    $channel = Zend_Feed::import($feed);    
    echo $channel->title().'<br />';        

    //index each item.    
    foreach($channel->items as $item)    
    {     
        if ($item->link() && $item->title() && $item->description())     
        {      
            //create an index doc.      
            $doc = new Zend_Search_Lucene_Document();            
            $doc->addField(Zend_Search_Lucene_Field::Keyword(
                'link', $this->sanitize($item->link())));      
            $doc->addField(Zend_Search_Lucene_Field::Text(
                'title', $this->sanitize($item->title())));      
            $doc->addField(Zend_Search_Lucene_Field::Unstored(
                'contents', $this->sanitize($item->description())));            

            echo "\tAdding: ". $item->title() .'<br />';      
            $index->addDocument($doc);     
        }    
    }   
}      

$index->commit();      
echo $index->count() .' Documents indexed.<br />';

This indexer reads the RSS feeds from this website and my brother’s website and indexes the contents of each feed. When you create the index, you must specify a directory to store it in. Lucene writes the index there as a set of binary files, so no database is required. However, the web server (or whichever user runs the indexer) must be able to write to that directory.
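As a precaution, you could check the index directory before indexing starts. The following is a minimal sketch. prepare_index_dir() is a hypothetical helper, not something Zend_Search_Lucene requires, and the 0775 permission mode is an assumption you may want to change.

```php
<?php
// Hypothetical helper: make sure the index directory exists and is
// writable before the indexer runs. Returns true when it is usable.
function prepare_index_dir($path)
{
    // Create the directory (and any missing parents) if needed.
    if (!is_dir($path) && !mkdir($path, 0775, true)) {
        return false; // could not create the directory
    }
    // Lucene must be able to write its index files here.
    return is_writable($path);
}
```

Calling it with the same path you pass to the indexer lets you report a clear error instead of an exception halfway through indexing. Building that path from a configurable base directory also avoids hard-coding a Windows path.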

A separate article explains the fields of the index document in more detail and when each type should be used.
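For quick reference, here are the field types expressed as data. The flags follow the field-type table in the Zend Framework 1 manual. Stored means the value comes back with a hit, indexed means it is searchable, and tokenized means it is split into words. The describe_field_type() helper is hypothetical. The table also explains why the search code can print $hit->title and $hit->link (both stored) but not the UnStored contents field.

```php
<?php
// Zend_Search_Lucene_Field factory methods and their flags (ZF1).
$fieldTypes = array(
    'Keyword'   => array('stored' => true,  'indexed' => true,  'tokenized' => false),
    'UnIndexed' => array('stored' => true,  'indexed' => false, 'tokenized' => false),
    'Binary'    => array('stored' => true,  'indexed' => false, 'tokenized' => false),
    'Text'      => array('stored' => true,  'indexed' => true,  'tokenized' => true),
    'UnStored'  => array('stored' => false, 'indexed' => true,  'tokenized' => true),
);

// Hypothetical helper: describe a field type in words.
function describe_field_type(array $types, $name)
{
    $t = $types[$name];
    return sprintf('%s: %s, %s, %s', $name,
        $t['stored'] ? 'stored' : 'not stored',
        $t['indexed'] ? 'indexed' : 'not indexed',
        $t['tokenized'] ? 'tokenized' : 'not tokenized');
}

// The indexer uses Keyword for 'link', Text for 'title'
// and UnStored for 'contents'.
foreach (array('Keyword', 'Text', 'UnStored') as $name) {
    echo describe_field_type($fieldTypes, $name), "\n";
}
```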

Feeds are not the only resource that Lucene can index: web sites, databases, Microsoft Office documents and more can be indexed as well. Find out more about Zend_Search_Lucene in the Zend Framework Manual.

A Basic Search

After running the indexer, you are ready to try searching the indexed documents. For demonstration purposes, I’ve added another function called search() to the same controller as the indexer. Rather than reading the query from a form, it uses a hard-coded query string to simulate one.
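If the query did come from a form, you would probably want to tidy it up before passing it to find(). Here is a minimal sketch. The normalize_query() name and the 100-character limit are arbitrary assumptions.

```php
<?php
// Hypothetical helper: trim a raw form value, collapse runs of whitespace
// and cap its length before it is handed to $index->find().
function normalize_query($raw, $maxLength = 100)
{
    if (!is_string($raw)) {
        return '';
    }
    // Collapse tabs, newlines and repeated spaces into single spaces.
    $query = trim(preg_replace('/\s+/', ' ', $raw));
    // substr() counts bytes, so a multi-byte character at the cut could be
    // split; use mb_substr() instead if mbstring is available.
    return trim(substr($query, 0, $maxLength));
}
```

An empty result is a useful signal to skip the search entirely and show the form again.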

function search()  
{   
    $this->load->library('zend');
    $this->zend->load('Zend/Search/Lucene');

    // Open the index that the indexer created.
    $index = Zend_Search_Lucene::open('c:\wamp\www\ci\tmp\feeds_index');

    $query = 'new movie';      
    $hits = $index->find($query);      

    echo 'Index contains '. $index->count() .
        ' documents.<br /><br />';   
    echo 'Search for "'. $query .'" returned '. count($hits) .
        ' hits<br /><br />';      

    foreach($hits as $hit)   
    {    
        echo $hit->title .'<br />';    
        echo 'Score: '. sprintf('%.2f', $hit->score) .'<br />';    
        echo $hit->link .'<br /><br />';   
    }    
}

This function opens the index we created earlier and searches it for ‘new movie’. Without quotes, the query parser treats this as two separate terms, so documents containing either word can match. To search for the exact phrase, wrap the words in double quotes. The results come back sorted by score, highest first. To make the search results look more like Google’s, you could add styling and format each result entry, but this gives you a good idea of the basic functions of the search engine and how it works.
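As a starting point for that formatting, here is a minimal sketch that renders one page of results. Each hit is modelled as a plain array with title, link and score keys, standing in for the hit objects returned by find(). The render_results() name and the page-size default are assumptions. The sketch also assumes the stored values are plain text. Values indexed through sanitize() are already HTML-escaped, so in that case you would drop the htmlspecialchars() calls to avoid double-encoding.

```php
<?php
// Hypothetical renderer: one page of hits as simple HTML (PHP 5.3+).
function render_results(array $hits, $page = 1, $perPage = 10)
{
    // find() already sorts by score; sorting here keeps the sketch
    // correct for arbitrary input. Highest score first.
    usort($hits, function ($a, $b) {
        if ($a['score'] == $b['score']) {
            return 0;
        }
        return ($a['score'] < $b['score']) ? 1 : -1;
    });

    // Select the requested page with array_slice().
    $offset = max(0, ($page - 1) * $perPage);
    $html = '';
    foreach (array_slice($hits, $offset, $perPage) as $hit) {
        $title = htmlspecialchars($hit['title'], ENT_QUOTES, 'UTF-8');
        $link  = htmlspecialchars($hit['link'], ENT_QUOTES, 'UTF-8');
        $html .= '<p><a href="' . $link . '">' . $title . '</a><br />'
               . 'Score: ' . sprintf('%.2f', $hit['score']) . "</p>\n";
    }
    return $html;
}
```

Because paging is just array_slice() over the hit list, the same $page and $perPage values could be fed to CodeIgniter’s Pagination class to build the page links.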


33 Responses

  1. How soon will you update your blog? I’m interested in reading some more information on this issue.

  2. I try to update my blog when I have something interesting to share. If you have something specific you would like to know about, let me know and I will help if I can.

  3. yunus says:

    How do I update the index data?

  4. You would recreate the index each time you wish to update it. You could set up the indexing request to run on a schedule using cron jobs, or even run it manually. Another option would be to run indexing before searching (if it has been a while since the last indexing), but this may take some time and may hurt your users’ experience with your search.

  5. chalalaz says:

    Good job, thank you.

  6. Joe Song says:

    Nice. Very clear, easy to see how to expand on it.

  7. Can you post about using Zend_Ldap with CodeIgniter? I’m trying to do it, but it doesn’t work for me. Please help.
    Regards from Cuba,
    Dairon

  8. subbu says:

    Great. Thanks a lot.

  9. Sorry, I haven’t used Zend_Ldap yet. If I do find an opportunity to use it, I’ll be sure to post something. I’m sure there are plenty of tutorials to be found on Google.

  10. jon matt says:

    I am using the CodeIgniter framework on my site.
    I have only 3 fields, “website”, “data” and “tags”, in a MySQL database.

    Do you suggest using Zend_Search_Lucene for this? I read that it will slow down the page. Is there a better solution you can suggest?

  11. I’m not sure what would slow down the page. The indexer will run slowly, but your users shouldn’t run the indexer. Instead, set up a cron job or something similar to run it on a schedule. Users should only run the code that queries the search index (such as the code in the Basic Search section).

  12. Axelline says:

    Thank you so much. So there’s no need to worry about Lucene. I thought Lucene was only available in Java, and that I would have to connect Java and PHP with a Java bridge, but I didn’t know how to implement that. Finally I found Zend, and it makes Lucene easy to use. How silly of me…

  13. Brian says:

    Thank you, this has really helped me out!

    My next challenge is to paginate the results, ideally via the codeigniter Pagination class (as opposed to the Zend Pagination class). Any suggestions?

  14. @Brian
    Sorry, I have not messed with CodeIgniter, or PHP for that matter, in over a year now. I would search around and see what you can find on the subject.

    I have been working with a pagination class in C# that should be similar to what you need to do in PHP.

    Basically, you need to create a class that stores a collection of objects. You can set other properties on the class, such as Total_Items, Page_Index and Page_Size, to track how many records there are in total and which page you’re currently looking at.

    Then, just use SQL to read in only the records you want to display and count the total number of records.

  15. saad says:

    Nice article. I have to try it ASAP. I have been looking into creating a small search engine (which could grow larger) and saw Elasticsearch, which is built on Lucene. But first, I will work with plain Lucene. My site is built on CI. What I am interested in is this: my sentences are untagged (there is no metadata stored with them), like when you post some random words on Twitter. How will the indexing work in this case?
    Any thoughts?

  16. @saad
    You can pass your entire sentences into the indexer (either as separate fields or as one field; whichever makes more sense to you). Lucene will break up the words and be able to search any word within the sentence. So, you would want to index all fields in your database table that you want to be able to search.

  17. Great info thanks! What language is close to php as far as functionality and ease of use?

  18. @Mackenzie
    For me, there are two sides: Microsoft and Linux. If I’m working on a site in the Linux world, I prefer PHP. If I’m using Microsoft, I like using C# and ASP.NET MVC. CodeIgniter is a PHP MVC framework that I also like using. There are similar ideas between the two, but the libraries that come with each are different. Website documentation is pretty good for both, though.

  19. ChrisH says:

    Thanks Chris,

    I didn’t know it would be this easy to use Lucene on a no-java-available shared host!

    This is a great mini tut!

    :)

  20. cristik says:

    Thanks a lot for this. I was trying to add search capabilities to my GoDaddy shared hosting account, and not being able to run a full text search server over there limited my possibilities a lot. I too use CI so integrating the Zend Lucene with the steps provided here worked like a charm.

    Thanks again.

  21. Steve says:

    This looks like a great tutorial.

    How could it be used to search a site with ‘flat’ static site content?

    Cheers,
    Steve

  22. leomar says:

    Hello
    I’m trying to use Zend Lucene in an application I’m building with Zend Framework, using separate view and controller layers. Would it be possible to send me a zip of the code from this post that explains the use of Zend_Search_Lucene? I looked, but I have not yet been able to implement it in my project. I think I could use it as a model. If you can provide it, I’d be very grateful.

  23. @leomar,
    I’m sorry, I don’t know off-hand where these files would be. I’ve not messed with Lucene or PHP in quite a while now. Part of the problem may be due to new versions of the products used. I try to list the exact versions of products I use, so that viewers can recreate what I’ve done. I would suggest trying this post with the versions listed above and once it works there, you can try it by upgrading one product at a time until you find the one causing the issue. Then, you can determine how to resolve the issue.

  24. @Steve,
    I think you would need a PHP script to crawl the static content and generate RSS feed content from the crawled data. If the site is large enough, you may want to have the crawler save processed data to a database and then have another script generate an RSS feed from the database data.

    It’s been a while since I’ve messed with Lucene, but I believe you could also read directly from a database to populate the title and other fields for the search. The PHP code would be a little different though. Google should turn up some examples on how to do this.

  25. Chandra says:

    Terrific post, however I was wondering if you could write a little
    more on this topic? I’d be very grateful if you could elaborate a little bit more. Thanks!

  26. Imvu Credits Generator says:

    Good day! Do you know if they make any plugins to safeguard against hackers?
    I’m kinda paranoid about losing everything I’ve worked
    hard on. Any tips?

  27. Noor says:

    I downloaded the latest Zend Framework release, ‘ZendFramework-2.3.3.zip’, from the site http://framework.zend.com/downloads, but I did not find the “Search” folder in the path “Zend/Search/Lucene”. Where can I get it?

  28. Noor says:

    OK, I discovered the problem: ZF2 ships ZendSearch as an optional library, so I have to download it separately.

  29. Noor says:

    Following your steps, the search works with the previous version, ZF1, but now I want to work with ZF2.
    I put the ZendSearch folder under libraries/zend, and I have changed the loading code to look like this:
    $this->load->library('zend');
    $this->zend->load('Zend/ZendSearch/Lucene/Lucene');
    But unfortunately nothing works. What is wrong?

  30. @Noor,
    Sorry I cannot help more. When this article was written was probably the last time I had used Lucene with PHP.

  1. June 30, 2009

    […] If you are interested in using the Zend library of tools with CodeIgniter, please check out How to use Zend_Search_Lucene with CodeIgniter. […]

  2. March 23, 2013

    […] have also used this method described here http://www.cmjackson.net/2009/02/17/how-to-use-zend_search_lucene-with-the-php-framework-codeigniter… that uses loading a library, but doesn’t work either, I get the same […]

