As a second step in my goal to implement full text search functionality in my Delphi application I had to create a sample SQLite database supporting FTS queries.
Full text search models in databases allow to do what Google does with web documents – quickly search for documents containing some word (token).
FTS enabled tables in SQLite are special tables created using the virtual table paradigm. There’s lots of info on FTS tables and their construction and usage on the SQLite web site.
Reminder: my idea is to see how to have full test search implemented in a Delphi application processing files under some read-only folder structure.
Each file (to end up in my FTS table) has several pages and the content of a page should be stored in a FTS table.
Warning: from now on I will presume you have enough knowledge on SQL syntax and SQLite to be able to read and understand “CREATE TABLE…”, “INSERT INTO…” and alike.
Initial idea would be to create the database using the following model (simplified):
|Id (primary field)
|Id (primary field)
DocumentId (foreign key / ref integrity)
Therefore two tables. First table “Document” stores file names, this table is a parent/master table, while table “FileContent” is a child table storing several records (page number and page text content) for each entry in its parent table (each record in child is for one page of the file being processed).
In the above model the DocumentID field in FileContent table should reference Document.Id so that SQL queries like Delete and Update on the Document table would also delete/update the corresponding rows in the FileContent table.
If the above could work (and it does not as you will see) the DDL SQL would look like:
CREATE TABLE Document ( Id INTEGER PRIMARY KEY, FileName TEXT NOT NULL, ); /*SQLite Full Text Search Tables */ CREATE VIRTUAL TABLE FileContent USING FTS4 ( DocumentId INTEGER REFERENCES Document(Id) ON DELETE CASCADE ON UPDATE CASCADE, Page INTEGER, PageContent TEXT, UNIQUE(DocumentId, Page) );
The above unfortunately does NOT work.
Virtual tables in SQLite do NOT support any constraints (referential integrity, foreign keys)!
DDL for FTS virtual table allowing foreign keys
Luckily, SQLite is enough powerful to support views and triggers and even triggers on views! There are ways to have best of both worlds: referential integrity on FTS virtual tables.
After some testing, the structure of the database best matching my requirements is :
CREATE TABLE Document ( Id INTEGER PRIMARY KEY, FileName TEXT NOT NULL, ); CREATE TABLE FilePages( DocumentId INTEGER REFERENCES Document(Id) ON DELETE CASCADE ON UPDATE CASCADE, Page INTEGER, UNIQUE(DocumentId, Page) ); CREATE VIRTUAL TABLE PageContent USING fts4(); CREATE TRIGGER FilePages_DELETE AFTER DELETE ON FilePages BEGIN /* required when ref. integrity deletes rows in FilePages as a result of delete row(s) operation in Document */ DELETE FROM PageContent WHERE rowid = OLD.rowid; END; CREATE VIEW FTSData AS SELECT FilePages.rowid AS rowid, FilePages.DocumentId, FilePages.Page, PageContent.Content FROM FilePages JOIN PageContent ON FilePages.rowid = PageContent.rowid; CREATE TRIGGER FTSData_INSERT INSTEAD OF INSERT ON FTSData BEGIN INSERT INTO FilePages(DocumentId, Page) VALUES (NEW.DocumentId, NEW.Page); INSERT INTO PageContent (rowid, Content) VALUES (last_insert_rowid(), NEW.Content); END; CREATE TRIGGER FTSData_DELETE INSTEAD OF DELETE ON FTSData BEGIN DELETE FROM FilePages WHERE rowid = OLD.rowid; DELETE FROM PageContent WHERE rowid = OLD.rowid; END;
|Document [1 – ∞ FilePages]||FilePages [1 – 1 PageContent]||PageContent|
Id (primary field – alias to “rowid”)
DocumentId (foreign key / ref integrity)
In SQLite, every row of every table has a signed integer ROWID field. The ROWID for each row is unique among all rows in the same table. If a table contains a column of type INTEGER PRIMARY KEY, then that column becomes an alias for the ROWID. If no columns are specified when creating the FTS table, a single column “Content” table is created.
Recipe: 3 tables, 1 view, and 3 triggers. Triggers are database operations that are automatically performed when a specified database event occurs. Views are pre-packaged SELECT statements.
The view FTSData serves to join the FTS table PageContent with its “one-to-one” relation table “FilePages” (the relation is actually imposed by triggers). FilePages are in one-to-many relation with Document being the parent table.
insert into Document values (1, 'file 1'); insert into FTSData values(null, 1, 1, 'doc 1 page 1'); insert into FTSData values(null, 1, 2, 'doc 1 page 2'); insert into Document values (2, 'file 2'); insert into FTSData values(null, 2, 1, 'doc 2 page 1'); insert into FTSData values(null, 2, 2, 'doc 2 page 2'); /* and finally a simple FTS query */ SELECT FROM PageContent WHERE MATCH 'page'
Any comments / better ideas?
Part 1: Full Text Search Functionality – Implementation Idea
Part 3: Full Text TXT Search In Delphi (+FireDac +SQLite). Sample Working Project.
I am not really a big fan of online processing for Full Text search, usually I see it as a separate task that does the indexing process in background. especially when you scale the process up , you can get into real trouble doing it online. usually I create a status table with the document ID and a status which can be either TO BE INDEXED , INDEXED , TO BE DELETED. The background process picks up both TO BE INDEXES and TO BE DELETED to do the indexing process and clean up of documents that are deleted.
Gaetan, thanks for the comment. I actually have the indexing and processing of the files being done in a Windows Service application. The real-world database has a “status” field in the Document table (to process, processed, to update, failed, etc). I’ll write about the real-world implementation with more info in the near future.
Hi, Zarko, can you, please send me your project? Or tell me, what components do you use to connect to database? My problem is – when I try to access virtual table from Delphi via FireDac components, a got Access Violation error.