Full Text TXT Search In Delphi (+FireDac +SQLite). Sample Working Project.

delphi-sqlite-firedac-fts
Finally, the sample project for my full text search implementation in Delphi is ready. The idea involves providing a user of my application the ability to do full text (Google like) searches on a set of documents by searching inside document content. For the task, I’ve selected SQLite for the database and FireDac for the database connectivity and operations.

For the sake of simplicity and the proof-of-concept approach this sample program is really simple: it will index all TXT files under some directory (including sub directories) and allow to do simple “match” type full text search queries. Certainly, in my real world application, the task is more complex and involves non-TXT files, some Windows service application and alike … but for the start, if you are up to implementing full text search, this sample project should give you a kick-start.

Download Sample Project Source Code. Download Executable Only.

The database selected is SQLite. SQLite is a self-contained, serverless, zero-configuration, transactional SQL database engine, and has the FTS engine built inside.
To connect to and operate on the database I’ve picked FireDac. FireDac knows how to talk to SQlite (any many other databases) and has proven to be robust enough when working with SQLite.

Here’s how this sample program works:

  1. The user selects a folder containing TXT files (in any subfolders)
  2. If the full text index database has not been created (i.e. the .SDB file does not exist)
    • the database is created
    • TXT files and their content are imported into database
  3. Display the files in a tree (Virtual Tree View)
  4. Each file / folder node has a check box to allow selection
  5. FTS queries can be done against the range of selected files

The Database: Multi User Read-Write SQLite Connection

SQLite is primarily used in single connection access to the database scenarios. In my scenario I have to allow multiple access (from various application and application instances) to open, read and write to the database. The TADConnection (+TADPhysSQLiteDriverLink) component is used to connect and talk to the database. By default, SQLite driver settings are set for single connection, to allow multiple connections one has to alter some parameters. Here’s how my connection parameters look:

DriverID=SQLite
Database= …
OpenMode=CreateUTF16
Password=aes-256:12345
LockingMode=Normal
JournalMode=WAL
Synchronous=Full
SQLiteAdvanced=temp_store=MEMORY
SQLiteAdvanced=page_size=4096
SQLiteAdvanced=auto_vacuum=FULL

SQLite database will be created at a connection establishment if it does not exist. I would suggest to read above the above parameters used in FireDac help.

After the connection has been established and the database has been created with tables to support FTS queries (+ referential integrity), we fill in the database:

The Data: TXT files and Their Content

A simple TDirectory.GetFiles is used to get all the TXT files in a selected directory.

var
  subFiles : TStringDynArray;
  subFile : string;
begin
  subFiles := TDirectory.GetFiles(
                main.MainForm.RootDirectory,
                TSearchOption.soAllDirectories,
                function(const Path: string; const SearchRec: TSearchRec): Boolean
                begin
                  result := SameText(ExtractFileExt(SearchRec.Name), '.txt');
                end);

  for subFile in subFiles do
    UpdateFileInDB(subFile);
end;

Files are imported into the database one by one:

function TDM.UpdateFileInDB(const fileName: string; const removeOld : boolean = false): boolean;
var
  sStream: TStringStream;
  documentRowID : integer;
begin
  result := true;

  if removeOld then
    sqliteConnection.ExecSQL('DELETE FROM Document WHERE FileName = ' + QuotedStr(fileName));

  sStream := TStringStream.Create;
  try
    sStream.LoadFromFile(fileName);

    try
      MainForm.LogInMemo(' +' + fileName);

      sqliteConnection.ExecSQL('INSERT INTO Document VALUES (NULL, :fileName)', [fileName]);

      documentRowID := sqliteConnection.ExecSQLScalar('SELECT LAST_INSERT_ROWID() AS rowid');

      //1 page per document for the sake of simplicity
      sqliteConnection.ExecSQL('INSERT INTO FTSData VALUES (NULL, :id, :page, :txt)', [documentRowID, 1, sStream.DataString]);
    except on E: Exception do
      begin
        result := false;
        MessageDlg('Error writing to database:' + E.Message, mtError, [mbOk], -1);
      end;
    end;
  finally
    sStream.Free;
  end;
end;

Now, all the files have been “indexed” and they are presented in the tree view control. I’ve used Virtual TreeView.

The FTS Search

Finally, specify what files to include in the FTS search, specify your token (word) you are looking for and hit “Search”:

procedure TDM.RetrieveSearchResults(const searchToken: string; const inFiles : TStringList = nil);
var
  fn : string;
  st : integer;
begin
  //get search results
  sqlQuery.SQL.Clear;

  //temp table for selected files
  sqlQuery.SQL.Add('CREATE TEMPORARY TABLE IF NOT EXISTS PartOfDocument (FileName STRING);');
  sqlQuery.SQL.Add('DELETE FROM PartOfDocument;');

  if Assigned(inFiles) then
    for fn in inFiles do
      sqlQuery.SQL.Add('INSERT INTO PartOfDocument VALUES (' + QuotedStr(fn) + ');');


  //main fts query
  sqlQuery.SQL.Add('SELECT Document.Id, Document.FileName FROM PageContent');
  sqlQuery.SQL.Add('INNER JOIN FilePages ON FilePages.rowid = PageContent.rowid');
  sqlQuery.SQL.Add('INNER JOIN Document ON Document.rowid = FilePages.DocumentID');
  if Assigned(inFiles) AND (inFiles.Count > 0) then
    sqlQuery.SQL.Add('INNER JOIN PartOfDocument ON PartOfDocument.FileName = Document.FileName');
  sqlQuery.SQL.Add('WHERE PageContent MATCH ' + QuotedStr(searchToken));
  sqlQuery.SQL.Add(';');

  sqlQuery.OpenOrExecute;

  MainForm.SearchResults.Clear;
  while NOT sqlQuery.Eof do
  begin
    MainForm.SearchResults.Add(
      TSearchResult.Create(
        sqlQuery.Fields[0].AsInteger, //doc.id
        sqlQuery.Fields[1].AsString  //file name
        ));
    sqlQuery.Next;
  end;

  sqlQuery.Close;
end;

A temporary table is created (if it does not exist), filled in with selected file names. The actual search query is a JOIN between different tables but the most important part is the MATCH on the FTS table. Results are displayed in a list view.

That’s it. Questions? Comments? More than welcome!

Part 1: Full Text Search Functionality – Implementation Idea
Part 2: SQLite: Referential Integrity With Full Text Search Virtual Tables

11 thoughts on “Full Text TXT Search In Delphi (+FireDac +SQLite). Sample Working Project.

  1. SilverWarior

    HI!
    Tried to compile your example but no luck (whole bunch of files missing). I gues this is probably due to me still using Delphi XE3.
    So could you please offer a link to already compiled example (binary).

    Reply
    1. zarkogajic Post author

      Hi, source code is XE3, but uses FireDac and VirtualTreView. I’ve added the link to download the exe only (however am not sure that running only the exe will be enough to understand how FTS is implemented).

      Reply
      1. SilverWarior

        Hi!
        Sorry for not replying sooner.
        Running only EXE did tell me enough to say that this implementation is absolutely TERRIBLE.

        Let me explain:
        At first it tried to use your example program to process about 160 MB of random TXT files. Test was unsucsessful as you application encountered somekinda error which caused it to use up all 1 GB heap memory and returned OutOfMemory error. After that it seemed that you applicatjion went into somekinda infinite loop.
        So I killed it and tried working on smaller amount of data. This time I gave your application about 2.6 MB of various TXT files (about 2500 TXT files). It managed to process them without the problem this time. But what surprised me greatly is that the size if the database your application generated for the provided data vas some more than 8 MB.
        Seriously ( MB database for 2.6 MB of imput data. That is terrible.
        Of course the reason for this lies in the fact that you are probably storing text inside the dabase just as regular text. So then later database creates dictionaries, word references and such which in the end means that your database is larger than your input text.

        With the approach I suggested you would already be compressing text as soon as you are saving it into database. Becouse before you will be saving text into database you would replace each word with the index number at which that word can be found in word dictionary. In other words you would be applying word compression before saving text into database. This should result in database actually being smaller than the input text.

        Reply
        1. zarko

          Hi,

          Thank you for the comments, Your tests are ok – and what we have also.

          Please note that the sample application provided in the article is really not meant to be used in real-life scenarios (hence: ” proof-of-concept”). This is, and as is said in the text, just a sample working project – to show how one can do FTS with Delphi and SQLite supporting referential integrity on virtual FTS tables.

          The source code is what you should be looking at – if you need something similar in your Delphi applications.

          To compress data in the FTS table I’m actually using the “compress” and “uncompress” options (http://www.sqlite.org/fts3.html#section_6_1) in the FTS table constructing DDL.

          Also, the actual processing of the files in done by a Windows service application and not like it is done in this sample project.

          Reply
  2. SilverWarior

    I would like to trst you application inside my delph but unfortunately I only own Proffesional version of delphi which doesn’t have FireDac components. So no way for me to compile your program.

    Also after deading some more about FTS table implementation now I see why you have such increase in database size.

    As for compression you have mentioned SQL lite only uses ZLIB compression which isn’t as good as good implemented word compression.
    ZLIB compression is compression agorithm which preformes average with all data types.
    But Word compression on the other hand is specifically designed to be as much efficient as posible in comressing raw text data.

    Anywhay since I plan on using Word compression in athleast two of my projects I’ll try to make a demonstrational application for my design so you can test it and see which one you like best.
    But I can not promise to you when sice I’m a bit short on time lately.

    Reply
  3. Robbi

    hi, i run your source code and get some error message and when i search the error i’ve got this

    uADStanIntf,
    uADStanOption, uADStanError, uADGUIxIntf, uADPhysIntf, uADStanDef,
    uADStanPool, uADStanAsync, uADPhysManager, uADStanExprFuncs,
    uADGUIxFormsWait, uADCompGUIx, uADPhysSQLite, Data.DB, uADCompClient,
    uADStanParam, uADDatSManager, uADDAptIntf, uADDAptManager, uADCompDataSet

    that’s the error, by the way im using XE5 and FireDAC installed, could u help me… i really interested with ur program

    Reply
    1. Francisco Dueñas

      The real problem of missed components and compilation errors is that Zarlos’s demo is made with a FireDAC versio that was named AnyDAC before, when embarcadero bought AnyDAC they renamed it to FireDAC, but the components, at least in first two versions of delphi had the same name, later they changed all TADXXX to TFDXXX, so what you need to no is to rename all components named TADXXX to TFDXXX and also rename some used units.

      Check this link that can help you migarte from AnyDAC to FireDAC:
      http://docwiki.embarcadero.com/RADStudio/Seattle/en/Migrating_AnyDAC_Applications_to_FireDAC

      Reply

Leave a Reply

Your email address will not be published. Required fields are marked *