How to use Zend_Search_Lucene with the PHP framework CodeIgniter
If you’ve heard the buzz about Apache’s open source search engine, Lucene, then you probably already know what a great search engine tool it is. The search engine is fast, has ports to various languages, and was written to be able to share the search index between the different Lucene ports.
The PHP version of Lucene is packaged in the Zend Framework and is called Zend_Search_Lucene. When it comes to PHP frameworks, I tend to prefer CodeIgniter over Zend. So, you might ask, how can you use a favored framework such as CodeIgniter with the power of Lucene’s search capabilities?
Install CodeIgniter 1.7.1
I downloaded a copy of the latest version of CodeIgniter 1.7.1 and configured it to run the default welcome action. Next, I made a copy of the welcome controller and view to test my indexer and search actions (which we’ll get to in just a minute).
Install Zend Framework 1.7
Next, I downloaded the latest version of the Zend Framework, 1.7.5. After extracting the zip file, copy the Zend folder from ZendFramework-1.7.5/library and paste it into the CodeIgniter framework under system/application/libraries.
Create A Zend Loader Class
The next step is to create a loader file that loads the Zend library classes in CodeIgniter. This tutorial also explains how to create this loader class. The file below is named Zend.php and should be located in the system/application/libraries folder of CodeIgniter.
<?php if (!defined('BASEPATH')) exit('No direct script access allowed');

class CI_Zend
{
    function __construct($class = NULL)
    {
        // include path for Zend Framework
        // alter it accordingly if you have put the 'Zend' folder elsewhere
        ini_set('include_path', ini_get('include_path') .
            PATH_SEPARATOR . APPPATH . 'libraries');

        if ($class)
        {
            require_once (string) $class . EXT;
            log_message('debug', "Zend Class $class Loaded");
        }
        else
        {
            log_message('debug', "Zend Class Initialized");
        }
    }

    function load($class)
    {
        require_once (string) $class . EXT;
        log_message('debug', "Zend Class $class Loaded");
    }
}
//End of File: Zend.php
Creating An Indexer
Now we will create the search index. For demonstration purposes, I’m going to place the indexer and search functions in the same controller. On a real site, you should put the indexer in a separate controller and restrict access to it so that not just anyone can run it.
We’ll start with the copy of the welcome controller, which I named home.php. After changing the class name and function calls to home instead of welcome, the contents of the file should look like this. Also, add the sanitize function below.
<?php
class Home extends Controller
{
    function Home()
    {
        parent::Controller();
    }

    function index()
    {
        $this->load->view('home_view');
    }

    function sanitize($input)
    {
        return htmlentities(strip_tags($input));
    }
}
/* End of file home.php */
/* Location: ./system/application/controllers/home.php */
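As a quick aside on the sanitize() function, here is a minimal standalone sketch of the same idea with an explicit character set. The function name sanitize_utf8 is hypothetical and not part of the controller above, and it assumes the feed text is UTF-8. It may be worth considering because htmlentities() defaulted to ISO-8859-1 before PHP 5.4.

```php
<?php
// Hypothetical standalone variant of the controller's sanitize() method.
// Assumption: the feed text is UTF-8 encoded.
function sanitize_utf8($input)
{
    // Remove markup first, then encode what is left as HTML entities.
    return htmlentities(strip_tags($input), ENT_QUOTES, 'UTF-8');
}

// The <b> tags are stripped and the ampersand is encoded.
echo sanitize_utf8('<b>Tom & Jerry</b>'), "\n"; // prints "Tom &amp; Jerry"
```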
Now, we can just replace the contents of the index() function with the following.
// Load the Zend loader library once, then pull in the classes we need.
$this->load->library('zend');
$this->zend->load('Zend/Feed');
$this->zend->load('Zend/Search/Lucene');
//Create index.
$index = new Zend_Search_Lucene('c:\wamp\www\ci\tmp\feeds_index', true);

$feeds = array(
    'http://www.cmjackson.net/feed/rss/',
    'http://andrewmjackson.com/feed/rss');

//grab each feed.
foreach($feeds as $feed)
{
    $channel = Zend_Feed::import($feed);
    echo $channel->title().'<br />';

    //index each item.
    foreach($channel->items as $item)
    {
        if ($item->link() && $item->title() && $item->description())
        {
            //create an index doc.
            $doc = new Zend_Search_Lucene_Document();
            $doc->addField(Zend_Search_Lucene_Field::Keyword(
                'link', $this->sanitize($item->link())));
            $doc->addField(Zend_Search_Lucene_Field::Text(
                'title', $this->sanitize($item->title())));
            $doc->addField(Zend_Search_Lucene_Field::Unstored(
                'contents', $this->sanitize($item->description())));

            echo "\tAdding: ". $item->title() .'<br />';
            $index->addDocument($doc);
        }
    }
}

$index->commit();
echo $index->count() .' Documents indexed.<br />';
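This indexer reads the RSS feeds from this website as well as my brother’s website and indexes the contents of each feed. When the index is created, you must specify a location to store it. Lucene stores the index as binary files on disk, so it does not require a database.
In this article, the author further explains the fields of the index document and when each should be used. In short, a Keyword field is stored and indexed but not tokenized, which suits exact values such as the link; a Text field is stored, indexed, and tokenized, which suits the title; and an UnStored field is tokenized and indexed but not stored, which suits large bodies of text such as the contents.
Feeds are not the only resource that Lucene can index; web sites, databases, Microsoft Office documents, and more can be indexed as well. Find out more about Zend Search Lucene in the Zend Framework Manual.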
A Basic Search
After running the indexer, you are ready to try searching the documents that are indexed. For demonstration purposes, I’ve added another function, search(), to the same controller as the indexer. This function does not read the results of a form; instead, it simulates a query string as if it had come from one.
function search()
{
    // Load the Zend loader library, then the Lucene classes.
    $this->load->library('zend');
    $this->zend->load('Zend/Search/Lucene');

    // Open the index created by the indexer.
    $index = new Zend_Search_Lucene('c:\wamp\www\ci\tmp\feeds_index');

    $query = 'new movie';
    $hits = $index->find($query);

    echo 'Index contains '. $index->count() .
        ' documents.<br /><br />';
    echo 'Search for "'. $query .'" returned '. count($hits) .
        ' hits<br /><br />';

    foreach($hits as $hit)
    {
        echo $hit->title .'<br />';
        echo 'Score: '. sprintf('%.2f', $hit->score) .'<br />';
        echo $hit->link .'<br /><br />';
    }
}
This function opens the index that we created earlier and searches for the terms ‘new movie’. The results are returned sorted by score. To make the search results look more like Google’s, you could add styling and format the result entries, but this gives you a good idea of the basic functions of the search engine and how it works.
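As a rough illustration of formatting the result entries, here is a minimal sketch of a helper that turns hits into simple HTML blocks. The function name format_hits, the array keys, and the CSS class names are all hypothetical; it assumes each hit has been copied into an array from the $hit->title, $hit->link, and $hit->score values used above.

```php
<?php
// Hypothetical helper: turn search hits into simple HTML result entries.
// Assumption: each hit is an array with 'title', 'link' and 'score' keys.
function format_hits(array $hits)
{
    $html = '';
    foreach ($hits as $hit) {
        // The indexed values were already passed through htmlentities(),
        // so the fourth argument (false) avoids encoding entities twice.
        $link  = htmlspecialchars($hit['link'], ENT_QUOTES, 'UTF-8', false);
        $title = htmlspecialchars($hit['title'], ENT_QUOTES, 'UTF-8', false);

        $html .= '<div class="result">'
               . '<a href="' . $link . '">' . $title . '</a><br />'
               . '<span class="score">Score: '
               . sprintf('%.2f', $hit['score']) . '</span>'
               . '</div>' . "\n";
    }
    return $html;
}

// Example with made-up data standing in for real hits.
echo format_hits(array(
    array('title' => 'A new movie review',
          'link'  => 'http://example.com/post',
          'score' => 0.5),
));
```

Inside search(), the array could be built in the foreach loop from each hit and the returned string passed to a view instead of being echoed directly.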
about 7 months ago
How soon will you update your blog? I’m interested in reading some more information on this issue.
about 7 months ago
I try to update my blog when I have something interesting to share. If you have something specific you would like to know about, let me know and I will help if I can.
about 5 months ago
How do you update the index data?
about 5 months ago
You would recreate the index each time you wish to update it. You could set up the indexing request on a schedule using cron jobs or even run it manually. Another option would be to run indexing before searching (if it has been a while since the last indexing), but this may take some time and may hurt your users’ experience with your search.
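The "has it been a while since the last indexing" check could be sketched roughly as follows. The function name index_is_stale is hypothetical, and it assumes that the newest modification time among the files in the index folder is a fair approximation of the last indexing run.

```php
<?php
// Hypothetical helper: decide whether the index should be rebuilt.
// Assumption: index files are rewritten on each indexing run, so the
// newest file modification time approximates the time of the last run.
function index_is_stale($indexDir, $maxAgeSeconds, $now = null)
{
    if ($now === null) {
        $now = time();
    }
    if (!is_dir($indexDir)) {
        return true; // no index yet, so it needs to be built
    }

    $newest = 0;
    foreach (scandir($indexDir) as $file) {
        if ($file === '.' || $file === '..') {
            continue;
        }
        $mtime = filemtime($indexDir . DIRECTORY_SEPARATOR . $file);
        if ($mtime !== false && $mtime > $newest) {
            $newest = $mtime;
        }
    }

    // An empty folder leaves $newest at 0, which counts as stale.
    return ($now - $newest) > $maxAgeSeconds;
}
```

A search action could call this with the index path and, say, 3600 seconds, and only trigger the indexer when it returns true.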
about 4 months ago
Good job. thank you.
about 3 months ago
Nice. Very clear, easy to see how to expand on it.