Ah … real-world problems call for proof-of-concept, almost-finished solutions.
This time I’m trying to figure out the best way to introduce full text search in one of my Delphi applications.
There’s a Delphi application operating on files. The files are located in various sub-folders and, with 99% certainty, will not change – i.e. the provided folder and file structure is “read only”. Folders and files are referenced through an XML file that serves as the backbone – just imagine an XML file where tags map to folders and subfolders (in a tree-like fashion), and files carry some attributes (name, title, version, owner, etc.) mapped to attributes of the XML file tag.
The application, besides other things, offers search functionality so a user can locate / filter out those files “assigned to an owner” or “having a title” and the like – something you could call a “metadata search”.
We are now looking into an option to provide something like full text search to the user – to have the ability to locate a file where the file content matches some token (word or partial word).
Due to the nature of the application and of its target audience, any possible implementation has to meet some requirements: it must be light and easy to use, with no complex third-party engines and no fat (paid) databases; it should work the same way on all Windows versions; and it should be something existing users will not even be aware of. Also: preferably free (so the final price of the application for the end user does not go up) 🙂
For the sake of simplicity, let’s say we are talking about TXT files. In reality the files are not TXT files, but whatever the file type actually is, we already know how to grab the textual content from it.
To give you an idea, here’s a simplified XML structure – in the real file, all “file” tags carry all 3 attributes (name, title, owner):
<items root="E:\ftsTXTTest\A\">
  <folder name="a1">
    <file title="t" owner="o" name="n">a1\d14.txt</file>
    <file title="t" owner="o" name="n">a1\d17.txt</file>
  </folder>
  <folder name="a2">
    <file title="t" owner="o" name="n">a2\c6.txt</file>
    <file title="t" owner="o" name="n">a2\c6per.txt</file>
    <folder name="a21">
      <file>a2\a21\announce.txt</file>
      <file>a2\a21\announce_fr.txt</file>
    </folder>
    <folder name="a22">
      <file>a2\a22\c6.txt</file>
      <file>a2\a22\d6.txt</file>
      <folder name="a221">
        <file>a2\a22\a221\d11.txt</file>
        <file>a2\a22\a221\d12.txt</file>
      </folder>
    </folder>
  </folder>
</items>
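To make the “metadata search” side concrete, here is a minimal sketch of walking such an XML file and filtering by owner. It is written in Python rather than Delphi, purely to illustrate the idea; the function names `list_files` and `filter_by_owner` and the shortened sample are my own assumptions, not code from the application.

```python
import xml.etree.ElementTree as ET

# Shortened sample in the shape described above (attribute values are placeholders).
SAMPLE_XML = r"""
<items root="E:\ftsTXTTest\A\">
  <folder name="a1">
    <file title="t" owner="o" name="n">a1\d14.txt</file>
  </folder>
  <folder name="a2">
    <file title="t" owner="o" name="n">a2\c6.txt</file>
    <folder name="a21">
      <file>a2\a21\announce.txt</file>
    </folder>
  </folder>
</items>
"""


def list_files(xml_text):
    """Return one dict per <file> tag: full path plus its metadata attributes."""
    root = ET.fromstring(xml_text)
    base = root.get("root", "")
    records = []
    # iter() visits <file> tags at any nesting depth, so subfolders are covered.
    for node in root.iter("file"):
        records.append({
            "path": base + (node.text or "").strip(),
            "name": node.get("name"),    # None when the attribute is missing
            "title": node.get("title"),
            "owner": node.get("owner"),
        })
    return records


def filter_by_owner(records, owner):
    """Metadata search: keep only the files assigned to the given owner."""
    return [rec for rec in records if rec["owner"] == owner]


if __name__ == "__main__":
    for rec in filter_by_owner(list_files(SAMPLE_XML), "o"):
        print(rec["path"])
```

The same list of paths is what a text extraction step would later iterate over.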
“Light” Full Text Search in Delphi
My initial idea was to check out what Windows as an OS has to offer through its indexed Windows Search, but since each version of Windows brings something new, I very quickly decided not to go down that path.
I’ve spent some time investigating the available options, and it all seems to boil down to either using some fat FTS engine/framework like Rubicon, Lucene or Sphinx, or relying on a database with built-in support for full text search.
Following the requirements stated before, I would want to go for a database having support for FTS queries. The database should be free, “natively” supported by Delphi and embedded (with no restrictions).
An embedded database is a database that does not run in a separate process, but instead is directly linked (embedded or integrated) into the application requiring access to the stored data. An embedded database is hidden from the application’s end-user and requires little or no ongoing maintenance.
Out of those embedded databases that include support for FTS queries – SQLite seems like the best choice.
SQLite is a software library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine. SQLite is in the public domain and does not require a license.
Having picked SQLite, the natural choice for me is FireDAC (formerly AnyDAC from DA-SOFT).
FireDAC enables native high-speed direct access from Delphi to InterBase, SQLite, MySQL, SQL Server, Oracle, PostgreSQL, DB2, SQL Anywhere, Advantage DB, Firebird, Access, Informix, DataSnap and more.
To sum up the above:
- Task: provide full text search queries for a read-only folder-file structure (where file types are known and content can be programmatically extracted)
- At one time, in most cases, only 1 user will “attack” the folder (and the FTS database)
- Text extraction requires time – should be done out of the user’s sight
- Once extraction is done and content is stored, provide FTS type queries
- Requirements: light implementation, free, no fat, bulky engines, easy to set up and use from inside the existing application.
- Possible solution: SQLite + FireDAC (+Delphi)
- Possible “problems”: speed of initial extraction, size of the database, updates to the database (even if read-only, things happen), …
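To give a feel for what “FTS type queries” look like on the SQLite side, here is a minimal sketch. It uses Python’s built-in sqlite3 module instead of FireDAC, only because the SQL is the interesting part and it stays the same whichever host language runs it. It assumes the SQLite build has the FTS5 extension enabled; the table name `docs` and the helper names are made up for illustration.

```python
import sqlite3


def build_index(conn, documents):
    """Create an FTS5 table and store (path, content) pairs in it."""
    # path is kept alongside the text but excluded from the full-text index.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS docs USING fts5(path UNINDEXED, content)"
    )
    conn.executemany("INSERT INTO docs (path, content) VALUES (?, ?)", documents)
    conn.commit()


def search(conn, token, prefix=False):
    """Return paths of documents containing the token (or a word starting with it)."""
    # Wrap the token in double quotes so FTS5 treats it as a plain string,
    # doubling any embedded quotes.
    query = '"' + token.replace('"', '""') + '"'
    if prefix:
        query += "*"  # FTS5 prefix query: matches words that begin with the token
    rows = conn.execute(
        "SELECT path FROM docs WHERE docs MATCH ? ORDER BY rank", (query,)
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_index(connection, [
        ("a1\\d14.txt", "The quarterly report mentions Delphi"),
        ("a2\\c6.txt", "Announcement for all owners"),
        ("a2\\a21\\announce.txt", "We announce the new release"),
    ])
    print(search(connection, "announce"))
    print(search(connection, "announ", prefix=True))
```

One thing to keep in mind: with the default tokenizer a “partial word” match means a prefix match (the start of a word), not an arbitrary substring inside a word.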
Complex and Less Complex Tasks In Implementing FTS
The “complex” part is the text extraction as it can take some time to extract the content (text) from files and store it in the database for FTS retrieval.
Once a folder is processed, and since we are talking about read-only locations, the search functionality from within the application is not a complex task. Text extraction could run in threads, be implemented as a Windows service or something similar – that’s something I still have to decide (read: try out).
I’ve already done a proof-of-concept application using FireDAC and SQLite, and things seem to be aligning nicely with the final goal – of course on a small folder structure where text extraction takes a few seconds.
Next time I’ll share some code showing how to create the FTS-enabled SQLite database supporting referential integrity and how to use FireDAC.
As is always the case, I would not like to spend a few months only to figure out that everything leads to a dead end. 🙂
Do you have any thoughts to share, if you have had a similar task to implement in your Delphi applications?
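As a rough illustration of the “extraction in a thread” option, here is a sketch in Python (again, not Delphi, and not the approach I have settled on). It assumes an FTS5-enabled SQLite build; `extract_text` is a stand-in for the real per-file-type extraction, and the names `index_worker` and `start_indexing` are hypothetical.

```python
import os
import sqlite3
import threading


def extract_text(path):
    """Placeholder extractor: the real application would plug in its own logic."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return handle.read()


def index_worker(db_path, file_paths, done):
    """Extract text from each file and store it in the FTS table."""
    # The connection is opened inside the worker, because sqlite3 connections
    # are bound to the thread that created them by default.
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE VIRTUAL TABLE IF NOT EXISTS docs USING fts5(path UNINDEXED, content)"
        )
        for path in file_paths:
            conn.execute(
                "INSERT INTO docs (path, content) VALUES (?, ?)",
                (path, extract_text(path)),
            )
        conn.commit()
    finally:
        conn.close()
        done.set()  # signal completion (also set if extraction failed)


def start_indexing(db_path, file_paths):
    """Start extraction in the background; returns the thread and a 'done' event."""
    done = threading.Event()
    worker = threading.Thread(
        target=index_worker, args=(db_path, list(file_paths), done), daemon=True
    )
    worker.start()
    return worker, done
```

The application can keep serving metadata searches while the worker runs, and enable the full text search box once the `done` event is set.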
I would like to be as-sure-as-possible I’ve picked the right path.