A scaled research, learning, and forecasting system. It acquires data from the internet and processes it with FinBERT sentiment scoring, SBERT embeddings, PCA feature extraction, and AI-model sentiment analysis. Training and forecasting use LSTM, XGBoost, and a committee meta-model, with backtesting.
Built on Python 3.14 with pytorch-cuda, Postgres with TimescaleDB, and VectorDB.
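As a rough illustration of what a committee over the LSTM and XGBoost forecasts could look like, the sketch below weights each base forecast by the inverse of its validation error. This is an assumption for illustration only: the project's actual meta-model may be a learned model rather than a weighted average, and the names `BaseForecast`, `committee_forecast`, and `validation_mae` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaseForecast:
    """One base model's forecast plus its validation error (hypothetical fields)."""
    model: str             # e.g. "lstm" or "xgboost"
    prediction: float      # forecast value, e.g. a next-period return
    validation_mae: float  # mean absolute error on held-out data


def committee_forecast(forecasts: list[BaseForecast], eps: float = 1e-9) -> float:
    """Combine base forecasts with weights proportional to 1 / validation MAE."""
    if not forecasts:
        raise ValueError("at least one base forecast is required")
    # eps guards against division by zero when a model has zero validation error.
    weights = [1.0 / (f.validation_mae + eps) for f in forecasts]
    total = sum(weights)
    return sum(w * f.prediction for w, f in zip(weights, forecasts)) / total


if __name__ == "__main__":
    combined = committee_forecast([
        BaseForecast("lstm", prediction=0.012, validation_mae=0.02),
        BaseForecast("xgboost", prediction=0.004, validation_mae=0.01),
    ])
    print(f"committee forecast: {combined:.4f}")
```

Inverse-error weighting is only one simple choice; it leans toward whichever base model validated better without requiring a separate training step for the committee.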
Data Flow
flowchart LR
subgraph CRAWL["Crawl"]
C[Crawler]
end
subgraph STORE["Storage"]
SD[Sentiment Data]
BS[Blob Store]
NE[News Embeddings]
CF[Context Features]
TF[Topic Features]
KG[Knowledge Graph]
PT[Market Data]
MP[Model Performance]
end
subgraph PROCESS["Process"]
SP[Sentiment Processor]
EMB[Embedding Manager]
FE[Feature Engineers]
end
subgraph TRAIN["Train"]
MOD[Model Trainers]
end
C -->|article text| BS
C -->|metadata row| SD
SD -->|blob_path FK| BS
SP -->|read content| BS
SP -->|scores + labels| SD
SP -->|embedding| NE
EMB -->|read content| BS
EMB -->|384-dim vector| NE
NE --> FE
FE --> CF
FE --> TF
FE --> KG
PT & CF & TF & KG & SD --> MOD
MOD -->|predictions| MP
MOD -->|model files| NAS[(NAS)]
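
The stage ordering implied by the flowchart can be sketched as a dependency map and resolved with a topological sort. This is a minimal sketch, not the project's scheduler: the stage names are illustrative, stores are folded into the stages that write to them, and market data is treated as an external input that is always available.

```python
from graphlib import TopologicalSorter

# Stage -> stages whose output it consumes, read off the flowchart.
# Stage names are illustrative, not identifiers from the project.
STAGE_DEPENDENCIES: dict[str, set[str]] = {
    "crawler": set(),
    # Both processors read article text that the crawler wrote to the blob store.
    "sentiment_processor": {"crawler"},
    "embedding_manager": {"crawler"},
    # Feature engineers read news embeddings, which both processors write.
    "feature_engineers": {"sentiment_processor", "embedding_manager"},
    # Trainers read the engineered features plus the sentiment data.
    "model_trainers": {"feature_engineers", "sentiment_processor"},
}


def run_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return one valid execution order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    print(" -> ".join(run_order(STAGE_DEPENDENCIES)))
```

The sentiment processor and embedding manager have no dependency on each other, so they can run in either order or in parallel once the crawler has stored an article.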
