Building a Fast Full Text Search in TimescaleDB for Trading Instruments
When building our algorithmic trading platform, we faced an interesting challenge: how could our AI agent efficiently search and identify the correct financial instruments when researching trading opportunities? The solution needed to be both blazingly fast and highly accurate - two requirements that don't always go hand in hand.
The Problem: Finding Needles in the Haystack
The platform currently stores market data in TimescaleDB (a PostgreSQL extension optimized for time-series data), where we maintain tick-level data for thousands of instruments. While TimescaleDB excels at storing and retrieving time-series data efficiently, we needed a complementary search capability for our instruments metadata.
The AI agent needs to quickly find trading symbols with identifiers like NSE:RELIND:EQ, NSE:NIFTY50:OPT:26-Jun-2025:23500:CE, or NFO:IDFBAN:FUT:26-Jun-2025 based on partial information or even fuzzy search terms.
Here's a simplified version of our instruments table schema:
CREATE TABLE instruments (
    token VARCHAR(50),
    symbol VARCHAR(50),
    short_name VARCHAR(100),
    series VARCHAR(50),
    company_name VARCHAR(255),
    expiry_date VARCHAR(50),
    strike_price NUMERIC,
    option_type VARCHAR(10),
    lot_size INTEGER,
    tick_size NUMERIC,
    instrument_type VARCHAR(50),
    exchange VARCHAR(50),
    source_file VARCHAR(50),
    identifier VARCHAR(100),
    isec_token VARCHAR(50),
    isec_token_level1 VARCHAR(50),
    isec_token_level2 VARCHAR(50),
    PRIMARY KEY (token, symbol, series, exchange)
);
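For context, here is what a couple of rows might look like; the token, series, and lot size values below are purely illustrative placeholders, not real exchange data:
-- Illustrative sample rows (placeholder values only)
INSERT INTO instruments (token, symbol, short_name, company_name, instrument_type, exchange, identifier, series, lot_size)
VALUES
    ('2885', 'RELIND', 'RELIANCE IND', 'Reliance Industries Limited', 'EQ', 'NSE', 'NSE:RELIND:EQ', 'EQ', 1),
    ('43125', 'NIFTY50', 'NIFTY 50', 'Nifty 50 Index', 'OPT', 'NSE', 'NSE:NIFTY50:OPT:26-Jun-2025:23500:CE', 'OPT', 75);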
Why Full Text Search?
While PostgreSQL offers several ways to query text data (LIKE, ILIKE, regular expressions), these methods typically have to examine every row and become increasingly slow as the dataset grows. Full Text Search (FTS) provides several advantages (a quick comparison of the two query styles follows the list):
- Performance: FTS is optimized for searching through large volumes of text
- Relevance ranking: Results can be ranked by relevance
- Linguistic features: Support for stemming, stop words, and language-specific processing
- Partial matching: Finding terms even when the search word is incomplete (via prefix matching with the :* operator)
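Here is roughly what the two query styles look like side by side. The ILIKE version has to pattern-match every row, while the FTS version relies on the search_vector column and GIN index we set up in the following steps:
-- Pattern matching: simple, but scans the whole table
SELECT symbol, company_name
FROM instruments
WHERE company_name ILIKE '%reliance%';

-- Full text search: uses the tsvector column and GIN index built below
SELECT symbol, company_name
FROM instruments
WHERE search_vector @@ to_tsquery('english', 'reliance');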
Implementing Full Text Search for Instruments
Step 1: Create a Search Vector Column
First, we need to add a column to store the tsvector (text search vector) data:
ALTER TABLE instruments ADD COLUMN search_vector tsvector;
Step 2: Define Which Fields to Include in the Search
We want to make specific fields searchable. Each field might have different importance (weight) in the search:
UPDATE instruments
SET search_vector =
    setweight(to_tsvector('english', coalesce(symbol, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(company_name, '')), 'B') ||
    setweight(to_tsvector('english', coalesce(short_name, '')), 'B') ||
    setweight(to_tsvector('english', coalesce(instrument_type, '')), 'C') ||
    setweight(to_tsvector('english', coalesce(exchange, '')), 'C') ||
    setweight(to_tsvector('english', coalesce(identifier, '')), 'A');
This assigns different weights to different fields:
- Weight A (highest): symbol and identifier
- Weight B (medium): company_name and short_name
- Weight C (lower): instrument_type and exchange
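To see what this produces, you can inspect a weighted vector for sample values directly; the lexemes shown below are approximate and depend on the text search configuration:
-- Example: inspect a weighted tsvector for sample text
SELECT setweight(to_tsvector('english', 'RELIND'), 'A') ||
       setweight(to_tsvector('english', 'Reliance Industries Limited'), 'B') AS sample_vector;
-- sample_vector is approximately: 'industri':3B 'limit':4B 'relianc':2B 'relind':1A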
Step 3: Create a Trigger to Update the Search Vector
To keep the search vector updated whenever instruments data changes:
CREATE OR REPLACE FUNCTION instruments_search_vector_update() RETURNS trigger AS $$
BEGIN
    NEW.search_vector :=
        setweight(to_tsvector('english', coalesce(NEW.symbol, '')), 'A') ||
        setweight(to_tsvector('english', coalesce(NEW.company_name, '')), 'B') ||
        setweight(to_tsvector('english', coalesce(NEW.short_name, '')), 'B') ||
        setweight(to_tsvector('english', coalesce(NEW.instrument_type, '')), 'C') ||
        setweight(to_tsvector('english', coalesce(NEW.exchange, '')), 'C') ||
        setweight(to_tsvector('english', coalesce(NEW.identifier, '')), 'A');
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER instruments_search_vector_trigger
BEFORE INSERT OR UPDATE ON instruments
FOR EACH ROW EXECUTE FUNCTION instruments_search_vector_update();
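On PostgreSQL 12 and later, the same effect can also be achieved without a trigger by declaring search_vector as a stored generated column. This is a minimal sketch of that alternative rather than the trigger-based setup we use above; it re-creates the column, so the Step 3 trigger would be dropped:
-- Alternative on PostgreSQL 12+: a stored generated column (the Step 3 trigger is no longer needed)
DROP TRIGGER IF EXISTS instruments_search_vector_trigger ON instruments;
ALTER TABLE instruments DROP COLUMN IF EXISTS search_vector;
ALTER TABLE instruments ADD COLUMN search_vector tsvector
    GENERATED ALWAYS AS (
        setweight(to_tsvector('english', coalesce(symbol, '')), 'A') ||
        setweight(to_tsvector('english', coalesce(company_name, '')), 'B') ||
        setweight(to_tsvector('english', coalesce(short_name, '')), 'B') ||
        setweight(to_tsvector('english', coalesce(instrument_type, '')), 'C') ||
        setweight(to_tsvector('english', coalesce(exchange, '')), 'C') ||
        setweight(to_tsvector('english', coalesce(identifier, '')), 'A')
    ) STORED;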
Step 4: Create a GIN Index for Fast Searching
CREATE INDEX instruments_search_idx ON instruments USING GIN (search_vector);
GIN (Generalized Inverted Index) is designed for values that contain many components, such as the lexemes inside a tsvector, which makes it the standard choice for full-text search.
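A quick way to confirm the index is actually being used is to look at the query plan; the exact plan depends on table size and statistics, but you should see the GIN index rather than a sequential scan:
EXPLAIN ANALYZE
SELECT symbol, identifier
FROM instruments
WHERE search_vector @@ to_tsquery('english', 'reliance');
-- Expect a "Bitmap Index Scan on instruments_search_idx" node in the output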
Step 5: Implement the Search Query
Now we can create a function for the AI agent to search for instruments:
CREATE OR REPLACE FUNCTION search_instruments(search_term TEXT)
RETURNS TABLE (
    token VARCHAR,
    symbol VARCHAR,
    company_name VARCHAR,
    instrument_type VARCHAR,
    exchange VARCHAR,
    identifier VARCHAR,
    rank FLOAT
) AS $$
BEGIN
    RETURN QUERY
    SELECT
        i.token,
        i.symbol,
        i.company_name,
        i.instrument_type,
        i.exchange,
        i.identifier,
        -- ts_rank returns real, so cast it to match the declared FLOAT column
        ts_rank(i.search_vector, to_tsquery('english', search_term))::FLOAT AS rank
    FROM
        instruments i
    WHERE
        i.search_vector @@ to_tsquery('english', search_term)
    ORDER BY
        rank DESC
    LIMIT 10;
END;
$$ LANGUAGE plpgsql;
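Calling it is straightforward. One caveat: to_tsquery expects valid query syntax (terms joined with &, | or !) and raises an error on free-form text, so if the agent passes raw phrases, plainto_tsquery or websearch_to_tsquery are more forgiving alternatives. Typical calls look like this:
-- Search using explicit tsquery syntax
SELECT symbol, identifier, rank
FROM search_instruments('reliance & industries');

-- Prefix matching also works, e.g. instruments whose symbol starts with NIFTY
SELECT symbol, identifier, rank
FROM search_instruments('nifty:*');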
Optimizing the Search Experience
After implementing the basic search functionality, we made several optimizations:
1. Handle Partial Words with Custom Lexemes
The default PostgreSQL FTS configuration stems words into lexemes (root forms) and drops stop words, which might not work well for ticker-style symbols like "RELIND" (Reliance Industries). We created a custom configuration that keeps these tokens as-is:
-- Create a custom text search configuration
CREATE TEXT SEARCH CONFIGURATION trading_instruments (COPY = english);
-- Modify it to handle partial words and special cases
ALTER TEXT SEARCH CONFIGURATION trading_instruments
ALTER MAPPING FOR asciiword, word, numword, asciihword, hword, numhword
WITH simple;
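The difference is easy to see by running both configurations over the same text: the english configuration stems words, while the simple mapping in trading_instruments keeps ticker-style tokens and full words intact (output shown approximately):
SELECT to_tsvector('english', 'Reliance Industries RELIND');
-- approximately: 'industri':2 'relianc':1 'relind':3

SELECT to_tsvector('trading_instruments', 'Reliance Industries RELIND');
-- approximately: 'industries':2 'reliance':1 'relind':3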
2. Implement Fuzzy Matching for Typos
To handle typos and slight variations in search terms, we implemented trigram-based fuzzy matching:
-- Enable the pg_trgm extension
CREATE EXTENSION IF NOT EXISTS pg_trgm;
-- Add a trigram index on key fields (coalesce so rows with NULL fields are still indexed)
CREATE INDEX instruments_trigram_idx ON instruments
USING GIN ((coalesce(symbol, '') || ' ' || coalesce(company_name, '') || ' ' || coalesce(identifier, '')) gin_trgm_ops);
-- Create a function for fuzzy search
CREATE OR REPLACE FUNCTION fuzzy_search_instruments(search_term TEXT)
RETURNS TABLE (
    token VARCHAR,
    symbol VARCHAR,
    company_name VARCHAR,
    instrument_type VARCHAR,
    exchange VARCHAR,
    identifier VARCHAR,
    similarity FLOAT
) AS $$
BEGIN
    RETURN QUERY
    SELECT
        i.token,
        i.symbol,
        i.company_name,
        i.instrument_type,
        i.exchange,
        i.identifier,
        -- similarity() returns real, so cast to match the declared FLOAT column;
        -- the coalesce calls mirror the index expression so the trigram index can be used
        similarity(coalesce(i.symbol, '') || ' ' || coalesce(i.company_name, '') || ' ' || coalesce(i.identifier, ''), search_term)::FLOAT AS similarity
    FROM
        instruments i
    WHERE
        (coalesce(i.symbol, '') || ' ' || coalesce(i.company_name, '') || ' ' || coalesce(i.identifier, '')) % search_term
    ORDER BY
        similarity DESC
    LIMIT 10;
END;
$$ LANGUAGE plpgsql;
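This catches misspellings that exact full-text matching would miss. The % operator respects pg_trgm's similarity threshold (0.3 by default, adjustable via the pg_trgm.similarity_threshold setting); because our concatenated target strings are fairly long, the threshold may need to be lowered for short, garbled queries:
-- Lower the similarity threshold for this session, since long concatenated
-- target strings dilute the trigram overlap
SET pg_trgm.similarity_threshold = 0.1;

-- A typo in "Reliance" can still surface the right instrument
SELECT symbol, identifier, similarity
FROM fuzzy_search_instruments('relaince industries');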
3. Combining FTS with Time-Series Data Queries
Since our AI agent often needs to search for an instrument and then immediately query its historical data, we created a specialized function that combines both operations:
CREATE OR REPLACE FUNCTION search_and_get_recent_ticks(
    search_term TEXT,
    time_from TIMESTAMPTZ,
    time_to TIMESTAMPTZ
)
RETURNS TABLE (
    token VARCHAR,
    symbol VARCHAR,
    price NUMERIC,
    volume BIGINT,
    timestamp TIMESTAMPTZ
) AS $$
DECLARE
    instrument_token VARCHAR;
BEGIN
    -- First find the best-matching instrument
    SELECT i.token INTO instrument_token
    FROM instruments i
    WHERE i.search_vector @@ to_tsquery('english', search_term)
    ORDER BY ts_rank(i.search_vector, to_tsquery('english', search_term)) DESC
    LIMIT 1;

    -- Then return its recent tick data
    -- (ticks is our TimescaleDB hypertable with token, price, volume and timestamp columns)
    RETURN QUERY
    SELECT
        t.token,
        i.symbol,
        t.price,
        t.volume,
        t.timestamp
    FROM
        ticks t
    JOIN
        instruments i ON t.token = i.token
    WHERE
        t.token = instrument_token
        AND t.timestamp BETWEEN time_from AND time_to
    ORDER BY
        t.timestamp DESC;
END;
$$ LANGUAGE plpgsql;
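With this in place, the agent can go from a rough search term to recent tick data in a single call, assuming the ticks hypertable described above:
-- Last 24 hours of ticks for the best full-text match on "nifty"
SELECT *
FROM search_and_get_recent_ticks('nifty', now() - interval '1 day', now());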
How the AI Agent Uses the Full Text Search
The AI agent interacts with the search functionality through a simple API. When researching market opportunities, it follows this workflow:
- Initial research: The agent identifies potential instruments based on news, trends, or strategic analysis
- Symbol search: It performs a full-text search to find the exact instrument identifier
- Data retrieval: Once it has the identifier, it can retrieve historical tick data for backtesting
- Strategy creation: Based on the analysis, it generates trading strategy code
For example, if the agent reads news about Reliance Industries and wants to create a strategy, it might:
- Search for "Reliance Industries" using
search_instruments('reliance & industries')
- Get back
NSE:RELIND:EQ
as the top result - Retrieve historical data using the token
- Generate and test a strategy specific to this instrument
Future Improvements
As we expand to international markets and commodities, we'll need to enhance our search capabilities:
1. Multi-Language Support
For international markets, we plan to implement multilingual search support:
-- Create configurations for different languages
-- Note: 'german' ships with PostgreSQL, but 'japanese' is not a built-in configuration;
-- it requires a separate Japanese parser/dictionary extension to be installed first
CREATE TEXT SEARCH CONFIGURATION trading_jp (COPY = japanese);
CREATE TEXT SEARCH CONFIGURATION trading_de (COPY = german);
-- Add language-specific columns
ALTER TABLE instruments ADD COLUMN search_vector_en tsvector;
ALTER TABLE instruments ADD COLUMN search_vector_jp tsvector;
ALTER TABLE instruments ADD COLUMN search_vector_de tsvector;
2. Real-Time Instrument Updates
For markets with frequently changing instruments (like options with new expiries):
-- Create a hypertable for the instrument updates log
-- (assumes an instrument_updates table with an update_time column already exists)
SELECT create_hypertable('instrument_updates', 'update_time');
-- Create a materialized view for the latest instrument data
CREATE MATERIALIZED VIEW latest_instruments AS
SELECT DISTINCT ON (token, exchange) *
FROM instrument_updates
ORDER BY token, exchange, update_time DESC;
-- A unique index is required for REFRESH ... CONCURRENTLY
CREATE UNIQUE INDEX latest_instruments_token_exchange_idx
    ON latest_instruments (token, exchange);
-- Refresh function
CREATE OR REPLACE FUNCTION refresh_instruments_mv()
RETURNS void AS $$
BEGIN
    REFRESH MATERIALIZED VIEW CONCURRENTLY latest_instruments;
END;
$$ LANGUAGE plpgsql;
-- Run every hour (requires the pg_cron extension)
SELECT cron.schedule('0 * * * *', 'SELECT refresh_instruments_mv()');
3. Synonym Expansion
To handle different naming conventions across markets:
-- Create a synonyms table
CREATE TABLE instrument_synonyms (
word TEXT PRIMARY KEY,
synonyms TEXT[]
);
-- Example data
INSERT INTO instrument_synonyms VALUES
('stock', ARRAY['equity', 'share', 'scrip']),
('index', ARRAY['indices', 'benchmark']),
('futures', ARRAY['fut', 'forward', 'forwards']);
-- Function to expand search terms with synonyms
-- Note: the expanded string still needs operators (& / |) between the original
-- terms before it can be passed to to_tsquery
CREATE OR REPLACE FUNCTION expand_search_terms(search_term TEXT)
RETURNS TEXT AS $$
DECLARE
    result TEXT := search_term;
    term_record RECORD;
BEGIN
    FOR term_record IN
        SELECT word, synonyms FROM instrument_synonyms
        WHERE search_term ~* ('\m' || word || '\M')
    LOOP
        result := result || ' | ' || array_to_string(term_record.synonyms, ' | ');
    END LOOP;
    RETURN result;
END;
$$ LANGUAGE plpgsql;
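As a rough illustration of the intended behaviour (the exact output format will likely evolve once we wire this into the tsquery construction):
SELECT expand_search_terms('banking stock');
-- returns something like: 'banking stock | equity | share | scrip'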
Conclusion
Implementing full-text search in PostgreSQL with TimescaleDB has significantly improved our AI agent's ability to identify trading instruments quickly and accurately. This capability is crucial for our automated strategy creation workflow, enabling the agent to move seamlessly from market research to strategy implementation.
While the current implementation meets our needs for the Indian markets, we've laid the groundwork for expanding to international markets and commodities with minimal changes to the underlying architecture. The combination of TimescaleDB's time-series strengths with PostgreSQL's powerful text search capabilities gives us a robust foundation for our algorithmic trading platform.
As we continue to refine our AI agent's capabilities, having fast and accurate instrument search remains a cornerstone of the system's architecture, enabling more sophisticated trading strategies and faster response to market opportunities.