Finally, the sample project for my full text search implementation in Delphi is ready. The idea involves providing a user of my application the ability to do full text (Google like) searches on a set of documents by searching inside document content. For the task, I’ve selected SQLite for the database and FireDac for the database connectivity and operations.
For the sake of simplicity and the proof-of-concept approach this sample program is really simple: it will index all TXT files under some directory (including sub directories) and allow to do simple “match” type full text search queries. Certainly, in my real world application, the task is more complex and involves non-TXT files, some Windows service application and alike … but for the start, if you are up to implementing full text search, this sample project should give you a kick-start.
The database selected is SQLite. SQLite is a self-contained, serverless, zero-configuration, transactional SQL database engine, and has the FTS engine built inside.
To connect to and operate on the database I’ve picked FireDac. FireDac knows how to talk to SQlite (any many other databases) and has proven to be robust enough when working with SQLite.
Here’s how this sample program works:
- The user selects a folder containing TXT files (in any subfolders)
- If the full text index database has not been created (i.e. the .SDB file does not exist)
- the database is created
- TXT files and their content are imported into database
- Display the files in a tree (Virtual Tree View)
- Each file / folder node has a check box to allow selection
- FTS queries can be done against the range of selected files
The Database: Multi User Read-Write SQLite Connection
SQLite is primarily used in single connection access to the database scenarios. In my scenario I have to allow multiple access (from various application and application instances) to open, read and write to the database. The TADConnection (+TADPhysSQLiteDriverLink) component is used to connect and talk to the database. By default, SQLite driver settings are set for single connection, to allow multiple connections one has to alter some parameters. Here’s how my connection parameters look:
DriverID=SQLite Database= … OpenMode=CreateUTF16 Password=aes-256:12345 LockingMode=Normal JournalMode=WAL Synchronous=Full SQLiteAdvanced=temp_store=MEMORY SQLiteAdvanced=page_size=4096 SQLiteAdvanced=auto_vacuum=FULL
SQLite database will be created at a connection establishment if it does not exist. I would suggest to read above the above parameters used in FireDac help.
After the connection has been established and the database has been created with tables to support FTS queries (+ referential integrity), we fill in the database:
The Data: TXT files and Their Content
A simple TDirectory.GetFiles is used to get all the TXT files in a selected directory.
var subFiles : TStringDynArray; subFile : string; begin subFiles := TDirectory.GetFiles( main.MainForm.RootDirectory, TSearchOption.soAllDirectories, function(const Path: string; const SearchRec: TSearchRec): Boolean begin result := SameText(ExtractFileExt(SearchRec.Name), '.txt'); end); for subFile in subFiles do UpdateFileInDB(subFile); end;
Files are imported into the database one by one:
function TDM.UpdateFileInDB(const fileName: string; const removeOld : boolean = false): boolean; var sStream: TStringStream; documentRowID : integer; begin result := true; if removeOld then sqliteConnection.ExecSQL('DELETE FROM Document WHERE FileName = ' + QuotedStr(fileName)); sStream := TStringStream.Create; try sStream.LoadFromFile(fileName); try MainForm.LogInMemo(' +' + fileName); sqliteConnection.ExecSQL('INSERT INTO Document VALUES (NULL, :fileName)', [fileName]); documentRowID := sqliteConnection.ExecSQLScalar('SELECT LAST_INSERT_ROWID() AS rowid'); //1 page per document for the sake of simplicity sqliteConnection.ExecSQL('INSERT INTO FTSData VALUES (NULL, :id, :page, :txt)', [documentRowID, 1, sStream.DataString]); except on E: Exception do begin result := false; MessageDlg('Error writing to database:' + E.Message, mtError, [mbOk], -1); end; end; finally sStream.Free; end; end;
Now, all the files have been “indexed” and they are presented in the tree view control. I’ve used Virtual TreeView.
The FTS Search
Finally, specify what files to include in the FTS search, specify your token (word) you are looking for and hit “Search”:
procedure TDM.RetrieveSearchResults(const searchToken: string; const inFiles : TStringList = nil); var fn : string; st : integer; begin //get search results sqlQuery.SQL.Clear; //temp table for selected files sqlQuery.SQL.Add('CREATE TEMPORARY TABLE IF NOT EXISTS PartOfDocument (FileName STRING);'); sqlQuery.SQL.Add('DELETE FROM PartOfDocument;'); if Assigned(inFiles) then for fn in inFiles do sqlQuery.SQL.Add('INSERT INTO PartOfDocument VALUES (' + QuotedStr(fn) + ');'); //main fts query sqlQuery.SQL.Add('SELECT Document.Id, Document.FileName FROM PageContent'); sqlQuery.SQL.Add('INNER JOIN FilePages ON FilePages.rowid = PageContent.rowid'); sqlQuery.SQL.Add('INNER JOIN Document ON Document.rowid = FilePages.DocumentID'); if Assigned(inFiles) AND (inFiles.Count > 0) then sqlQuery.SQL.Add('INNER JOIN PartOfDocument ON PartOfDocument.FileName = Document.FileName'); sqlQuery.SQL.Add('WHERE PageContent MATCH ' + QuotedStr(searchToken)); sqlQuery.SQL.Add(';'); sqlQuery.OpenOrExecute; MainForm.SearchResults.Clear; while NOT sqlQuery.Eof do begin MainForm.SearchResults.Add( TSearchResult.Create( sqlQuery.Fields.AsInteger, //doc.id sqlQuery.Fields.AsString //file name )); sqlQuery.Next; end; sqlQuery.Close; end;
A temporary table is created (if it does not exist), filled in with selected file names. The actual search query is a JOIN between different tables but the most important part is the MATCH on the FTS table. Results are displayed in a list view.
That’s it. Questions? Comments? More than welcome!