
Personal Finance on Palantir Foundry

My family has a joint checking account, multiple credit cards, and the usual flow of subscriptions, groceries, Amazon orders, and irregular expenses that any household deals with. The problem isn’t tracking individual transactions — every bank app does that. The problem is awareness: understanding spending patterns over time, catching anomalies early, answering questions like “how much did we spend on eating out last month?” without logging into three different apps, and getting proactive alerts before problems develop.

My wife and I both needed visibility into our finances, but we interact with technology differently. I wanted something I could query conversationally through my AI agent. She wanted something that would surface relevant information without requiring her to go looking for it. A dashboard alone wouldn’t solve it — we needed a data layer smart enough to answer questions, detect patterns, and push alerts.

I work on Palantir Foundry every day as a Data Architect in aerospace and defense. I can’t share anything about that work, but I can bring the same architectural patterns — ontology modeling, pipeline design, data integration, API-first access — to a personal project that demonstrates the full stack.

Foundry is dramatic overkill for personal finance. That’s the point. A spreadsheet could track transactions. Mint could categorize them. But neither lets me model transactions as first-class ontology objects with 28 properties, build AIP Logic functions that translate natural language into structured queries, or expose the entire data layer through OSDK to external applications. This project exists to demonstrate what Foundry can do when you apply enterprise data architecture thinking to a well-understood domain.

Financial Data Pipeline

The system has four layers, each with distinct responsibilities:

Ingestion pulls data from external sources into Foundry. Plaid provides transaction data from joint bank accounts via a REST API connector. Gmail provides email receipts via Foundry’s native Gmail connector. Yahoo Mail receipts are forwarded to Gmail and piggyback on the existing connector — no custom code needed.

Pipeline transforms raw data through a series of datasets. Raw transactions land as snapshots, get cleaned and normalized in an incremental transform (deduplication, amount normalization, date parsing), then enriched by joining with receipt data for merchant detail that Plaid doesn’t provide.

Ontology models the cleaned data as PlaidTransaction objects — first-class entities in Foundry’s semantic layer that can be queried, linked, and exposed through APIs.

AIP Logic provides AI-powered functions that sit on top of the ontology, enabling natural language queries and pattern detection that would be impractical to implement as traditional code.

The PlaidTransaction object type lives in the project’s ontology with 28 properties. The property design reflects a deliberate choice: flatten aggressively, link sparingly.

Why 28 Properties Instead of Linked Objects


Plaid’s API returns nested JSON — location has sub-fields, payment_meta has sub-fields, personal_finance_category has primary and detailed levels. The typical relational instinct is to normalize: create a Location object, a PaymentMethod object, a Category object, and link them.

For this domain, that’s wrong. Transactions are the unit of analysis. Every query — “what did we spend at Target last month,” “show me all dining transactions over $50,” “what’s our grocery trend” — starts and ends with transactions. Creating linked objects would mean every query requires joins across the ontology, which adds complexity without adding analytical value. The location of a transaction is an attribute of that transaction, not an independent entity worth tracking.

The exceptions prove the rule: Account is modeled as a separate object type because accounts have an independent lifecycle (balances change, accounts get added and closed) and are genuinely a different entity. Category could go either way, but Plaid’s category taxonomy is stable enough that storing primary_category and detailed_category as properties (not links) keeps queries simple.
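The flattening step can be sketched as follows. This is a minimal illustration, not the actual transform: the nested keys follow the shape of Plaid's transaction JSON as described above, while `location_city` and `location_region` are hypothetical property names chosen for the example, and the sample values are made up.

```python
from typing import Any


def flatten_transaction(raw: dict[str, Any]) -> dict[str, Any]:
    """Flatten one Plaid-style transaction into a single-level record."""
    category = raw.get("personal_finance_category") or {}
    location = raw.get("location") or {}
    return {
        "transaction_id": raw["transaction_id"],
        "amount": raw["amount"],
        "date": raw["date"],
        "merchant_name": raw.get("merchant_name"),
        # Nested category levels become two plain string properties.
        "primary_category": category.get("primary"),
        "detailed_category": category.get("detailed"),
        # Kept as a plain ID; Account is the one separately modeled object.
        "account_id": raw["account_id"],
        "pending": raw.get("pending", False),
        "payment_channel": raw.get("payment_channel"),
        # Hypothetical flattened names for nested location sub-fields.
        "location_city": location.get("city"),
        "location_region": location.get("region"),
    }


# Illustrative input only; values are invented for the example.
sample = {
    "transaction_id": "t1",
    "amount": 54.12,
    "date": "2024-01-15",
    "merchant_name": "Target",
    "account_id": "acct_joint",
    "pending": False,
    "payment_channel": "in store",
    "personal_finance_category": {
        "primary": "GENERAL_MERCHANDISE",
        "detailed": "GENERAL_MERCHANDISE_SUPERSTORES",
    },
    "location": {"city": "Austin", "region": "TX"},
}
flat = flatten_transaction(sample)
```

Every value in `flat` is a scalar, so a filter such as "Target transactions in Austin" touches one record type and needs no link traversal.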

| Property | Type | Source | Purpose |
|---|---|---|---|
| transaction_id | String (PK) | Plaid | Unique identifier, deduplication key |
| amount | Double | Plaid | Transaction amount (positive = debit) |
| date | Date | Plaid | Transaction date |
| merchant_name | String | Plaid | Merchant name (cleaned) |
| primary_category | String | Plaid | Top-level category (Food, Shopping, etc.) |
| detailed_category | String | Plaid | Granular category (Groceries, Restaurants, etc.) |
| account_id | String | Plaid | Link to Account object |
| associated_user | Integer | Custom | 1 = Principal A, 2 = Principal B, 3 = Dependent |
| pending | Boolean | Plaid | Whether transaction is still pending |
| payment_channel | String | Plaid | In store, online, other |
| receipt_matched | Boolean | Pipeline | Whether an email receipt was matched |

The associated_user property is a custom addition — Plaid doesn’t know who made the purchase. This gets populated through a combination of account ownership (some cards are individual) and manual tagging for shared accounts.
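One way to express that rule is sketched below. The account IDs, the `manual_tags` lookup, and the precedence (account ownership first, manual tag second) are all assumptions made for illustration; the source does not specify how the two inputs are combined.

```python
from typing import Optional

# Hypothetical mapping from individually held cards to the user codes
# used by associated_user (1 = Principal A, 2 = Principal B).
INDIVIDUAL_ACCOUNT_OWNERS: dict[str, int] = {
    "acct_card_a": 1,
    "acct_card_b": 2,
}


def resolve_associated_user(
    account_id: str,
    transaction_id: str,
    manual_tags: dict[str, int],
) -> Optional[int]:
    """Return the user code for a transaction, or None if it is still untagged."""
    owner = INDIVIDUAL_ACCOUNT_OWNERS.get(account_id)
    if owner is not None:
        # Individually held card: ownership decides.
        return owner
    # Shared account: fall back to a manual tag keyed by transaction_id.
    return manual_tags.get(transaction_id)
```

Returning `None` for untagged shared-account transactions keeps the gap visible instead of guessing an owner.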

The Plaid connector is a REST API source configured in Foundry’s Data Connection framework. It authenticates with Plaid’s API using client credentials, pulls transactions for all linked accounts, and lands them as raw JSON in a snapshot dataset. A downstream transform parses, cleans, and deduplicates.

The key design decision: snapshot, not incremental, at the source level. Plaid’s /transactions/get endpoint returns a window of transactions, not a change feed. Trying to track incremental state at the connector level adds fragile complexity. Instead, the connector pulls the full window every sync, and the first transform handles deduplication using transaction_id as the natural key. Foundry’s incremental transforms then propagate only genuine changes downstream.
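The deduplication idea can be shown in plain Python. This is a sketch of the logic only, not Foundry transform code: it assumes rows arrive in sync order, so the most recently pulled copy of each transaction_id wins.

```python
from typing import Any


def deduplicate(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep one row per transaction_id, preferring the most recently pulled copy."""
    latest: dict[str, dict[str, Any]] = {}
    for row in rows:
        # Later rows overwrite earlier ones with the same natural key.
        latest[row["transaction_id"]] = row
    return list(latest.values())


# Two overlapping snapshot windows; t1 appears in both pulls.
first_pull = [
    {"transaction_id": "t1", "merchant_name": "TARGET 00123", "amount": 12.50},
    {"transaction_id": "t2", "merchant_name": "Netflix", "amount": 15.49},
]
second_pull = [
    {"transaction_id": "t1", "merchant_name": "Target", "amount": 12.50},
    {"transaction_id": "t3", "merchant_name": "Costco", "amount": 88.10},
]
unique = deduplicate(first_pull + second_pull)
```

Because the full window is re-pulled on every sync, overlap between pulls is expected, and the natural key absorbs it without any connector-side state.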

The Gmail connector pulls emails matching specific search queries (e.g., receipts from known merchants, order confirmations). A transform extracts structured data from email bodies — amounts, merchant names, order numbers — and joins this with Plaid transactions to enrich records with detail that Plaid’s API doesn’t provide.

This is particularly valuable for online purchases where Plaid might show “AMZN Mktp US” but the email receipt shows the specific items ordered. The receipt data doesn’t replace Plaid data — it supplements it with a receipt_details property and sets receipt_matched = true.
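The source does not say which key the receipt join uses. As an assumption for illustration, the sketch below matches on equal amount and a date within a few days, and the `details` field on a parsed receipt is a hypothetical name.

```python
from datetime import date
from typing import Any


def match_receipts(
    transactions: list[dict[str, Any]],
    receipts: list[dict[str, Any]],
    max_days: int = 3,
) -> list[dict[str, Any]]:
    """Attach at most one receipt to each transaction (assumed rule: same amount, nearby date)."""
    unused = list(receipts)
    enriched = []
    for txn in transactions:
        match = next(
            (
                r
                for r in unused
                if abs(r["amount"] - txn["amount"]) < 0.005
                and abs((r["date"] - txn["date"]).days) <= max_days
            ),
            None,
        )
        if match is not None:
            unused.remove(match)  # a receipt enriches only one transaction
        enriched.append(
            {
                **txn,  # Plaid fields are kept as-is; the receipt only adds detail
                "receipt_matched": match is not None,
                "receipt_details": match["details"] if match else None,
            }
        )
    return enriched


transactions = [
    {"transaction_id": "t1", "merchant_name": "AMZN Mktp US", "amount": 23.99, "date": date(2024, 1, 10)},
    {"transaction_id": "t2", "merchant_name": "Target", "amount": 51.20, "date": date(2024, 1, 11)},
]
receipts = [
    {"amount": 23.99, "date": date(2024, 1, 9), "details": "USB-C cable, 2-pack"},
]
enriched = match_receipts(transactions, receipts)
```

A real matcher would likely also compare merchant names or order numbers to break ties between same-priced purchases.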

Yahoo Mail receipts are handled by forwarding to Gmail, which then feeds through the existing Gmail connector. This was a deliberate simplicity choice — building a custom IMAP connector for a second email provider adds engineering effort for marginal benefit when auto-forwarding achieves the same result with zero code.

Three AIP Logic functions provide the AI-powered layer:

The spending analysis function takes no arguments. It examines recent transaction data and produces a structured analysis: spending trends by category, month-over-month comparisons, unusual patterns, and merchant frequency analysis. The Logic function has access to the full transaction ontology and can perform aggregations that would be tedious to express as traditional queries.

The natural language query function is the most powerful of the three. It accepts a natural language question, such as “How much did we spend on groceries in January?” or “What are our top 5 merchants by spend?”, and translates it into structured ontology queries. The function returns a natural language response with supporting data.

This is where Foundry’s AIP Logic shines. The function doesn’t just do text matching — it understands the ontology schema, knows which properties map to which concepts, and can compose multi-step queries (filter by date range, group by category, sort by amount) from a single English sentence.

The recurring charge function detects recurring transactions by analyzing merchant, amount, and frequency patterns. It returns identified subscriptions and bills with their detected cadence (monthly, weekly, annual) and amounts. This replaces manual subscription tracking: the system detects new recurring charges automatically and can alert when a known subscription amount changes.
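To make the merchant and frequency analysis concrete, here is a deterministic heuristic. It is a stand-in for illustration, not the AIP Logic function itself: the cadence windows, the three-occurrence minimum, and the use of the median gap are all assumptions.

```python
from collections import defaultdict
from datetime import date
from statistics import median
from typing import Any

# Assumed day-gap windows for each cadence.
CADENCES = {"weekly": (6, 8), "monthly": (27, 33), "annual": (360, 370)}


def detect_recurring(
    transactions: list[dict[str, Any]], min_occurrences: int = 3
) -> list[dict[str, Any]]:
    """Flag merchants whose charges arrive at a steady weekly, monthly, or annual gap."""
    by_merchant: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for txn in transactions:
        by_merchant[txn["merchant_name"]].append(txn)

    results = []
    for merchant, txns in by_merchant.items():
        if len(txns) < min_occurrences:
            continue
        txns.sort(key=lambda t: t["date"])
        gaps = [(b["date"] - a["date"]).days for a, b in zip(txns, txns[1:])]
        typical_gap = median(gaps)
        for cadence, (low, high) in CADENCES.items():
            if low <= typical_gap <= high:
                # Report the latest amount so a price change is visible.
                results.append(
                    {"merchant_name": merchant, "cadence": cadence, "amount": txns[-1]["amount"]}
                )
                break
    return results


history = [
    {"merchant_name": "Netflix", "amount": 15.49, "date": date(2024, 1, 5)},
    {"merchant_name": "Netflix", "amount": 15.49, "date": date(2024, 2, 5)},
    {"merchant_name": "Netflix", "amount": 15.49, "date": date(2024, 3, 5)},
    {"merchant_name": "Target", "amount": 40.00, "date": date(2024, 1, 3)},
    {"merchant_name": "Target", "amount": 62.10, "date": date(2024, 1, 10)},
    {"merchant_name": "Target", "amount": 18.75, "date": date(2024, 2, 20)},
]
recurring = detect_recurring(history)
```

In this sample only Netflix qualifies: its gaps of 31 and 29 days fall in the monthly window, while Target's irregular visits do not fit any cadence.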

External applications access Foundry through the Ontology SDK (OSDK). The MCP server — a TypeScript application running on Kubernetes — uses the OSDK client to call AIP Logic functions and query the ontology.

The OSDK client authenticates using an OAuth2 confidential client flow. The MCP server holds a client ID and secret, obtains access tokens from Foundry’s OAuth endpoint, and automatically refreshes them before expiry. This is a deliberate choice over user-delegated auth — the MCP server operates as a service principal with its own identity and permissions.
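The refresh-before-expiry behavior can be sketched independently of any SDK. The class below is a hypothetical illustration in Python rather than the TypeScript the MCP server uses: `fetch_token` stands in for the call to the OAuth token endpoint with the client ID and secret, and the 60-second refresh margin is an assumed value.

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Caches a client-credentials access token and refreshes it shortly before expiry."""

    def __init__(
        self,
        fetch_token: Callable[[], Tuple[str, float]],
        refresh_margin_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # fetch_token returns (access_token, lifetime_in_seconds).
        self._fetch_token = fetch_token
        self._refresh_margin_s = refresh_margin_s
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._refresh_margin_s:
            token, expires_in_s = self._fetch_token()
            self._token = token
            self._expires_at = now + expires_in_s
        return self._token


# Demo with a fake clock and a fake token endpoint (no network access).
fake_now = [0.0]
issued: list[str] = []


def fake_fetch() -> Tuple[str, float]:
    issued.append(f"token-{len(issued) + 1}")
    return issued[-1], 3600.0


cache = TokenCache(fake_fetch, refresh_margin_s=60.0, clock=lambda: fake_now[0])
first = cache.get()    # no token yet, so one is fetched
fake_now[0] = 1000.0
second = cache.get()   # still valid, so the cached token is reused
fake_now[0] = 3550.0
third = cache.get()    # inside the refresh margin, so a new token is fetched
```

Injecting the clock and the fetch function keeps the refresh logic testable without touching a real OAuth endpoint.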

The OSDK client is configured with read-only permissions. No Actions (Foundry’s write operations) are exposed. The MCP server can query transactions, call Logic functions, and read ontology objects, but it cannot create, modify, or delete anything in Foundry.

This is a security decision. The MCP server is accessed by AI agents that operate autonomously. Giving an autonomous agent write access to financial data is a risk that the benefit doesn’t justify. If data needs to be modified, it happens through Foundry’s UI or pipeline, never through the agent path.

The seven MCP tools fall into two categories:

REST API tools (structured queries, implemented in the MCP server):

  • get_balances — Current account balances
  • get_transactions — Filtered transaction lookups with date, merchant, category, amount filters
  • get_spending_summary — Aggregated spending grouped by category, merchant, day, week, or month
  • get_alerts — Low balance warnings, large transactions, pending charges

AIP Logic tools (AI-powered, via OSDK → Foundry):

  • get_recurring — Detected subscriptions and recurring charges
  • analyze_spending — Trend analysis and anomaly detection
  • query_finances — Natural language financial Q&A

The REST API tools handle structured queries where the parameters are known. The AIP Logic tools handle open-ended questions where the query structure itself needs to be inferred from natural language.
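As an example of the structured side, the grouping behind a tool like get_spending_summary might look like the sketch below. The function name, record shape, and key formats (ISO week labels, `YYYY-MM` months) are assumptions for illustration; the actual tool lives in the TypeScript MCP server.

```python
from collections import defaultdict
from datetime import date
from typing import Any


def _group_key(txn: dict[str, Any], group_by: str) -> str:
    d: date = txn["date"]
    if group_by == "category":
        return txn["primary_category"]
    if group_by == "merchant":
        return txn["merchant_name"]
    if group_by == "day":
        return d.isoformat()
    if group_by == "week":
        iso = d.isocalendar()  # (ISO year, ISO week number, weekday)
        return f"{iso[0]}-W{iso[1]:02d}"
    if group_by == "month":
        return f"{d.year}-{d.month:02d}"
    raise ValueError(f"unsupported group_by: {group_by}")


def spending_summary(transactions: list[dict[str, Any]], group_by: str) -> dict[str, float]:
    """Total spending per group; every parameter is known up front, so no AI is needed."""
    totals: dict[str, float] = defaultdict(float)
    for txn in transactions:
        totals[_group_key(txn, group_by)] += txn["amount"]
    return {key: round(total, 2) for key, total in totals.items()}


txns = [
    {"merchant_name": "Costco", "primary_category": "Food", "amount": 40.0, "date": date(2024, 1, 5)},
    {"merchant_name": "Costco", "primary_category": "Food", "amount": 20.5, "date": date(2024, 1, 12)},
    {"merchant_name": "Target", "primary_category": "Shopping", "amount": 15.0, "date": date(2024, 2, 2)},
]
by_month = spending_summary(txns, "month")
by_category = spending_summary(txns, "category")
```

A question like "what's our grocery trend" has no fixed `group_by` or filter, which is the gap the AIP Logic tools fill by inferring the query structure first.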

Full-stack Foundry expertise. This project touches every layer: Data Connection (Plaid, Gmail connectors), pipeline transforms (incremental, snapshot, joins), ontology modeling (object types, properties, links), AIP Logic (AI-powered functions), OSDK (external application access), and OAuth (service authentication). It’s a complete tour of the platform.

Ontology design thinking. The decision to flatten PlaidTransaction to 28 properties instead of normalizing into linked objects reflects real ontology design experience — knowing when to favor query simplicity over data normalization, and how that choice propagates through every downstream consumer.

Integration architecture. Three different integration patterns (REST API source, native email connector, OSDK client) demonstrate understanding of when to use which Foundry integration mechanism, and how to design pipelines that are resilient to source-side limitations (like the windowed /transactions/get endpoint, which is not a change feed).

Enterprise patterns at personal scale. The system uses the same architectural patterns — source-of-truth ontology, incremental transforms, service principal auth, read-only API access — that apply to enterprise deployments. The domain is personal finance; the engineering is production-grade.

AI-augmented data access. AIP Logic functions show how to layer AI capabilities on top of structured data, enabling natural language interaction with a formal ontology without sacrificing the precision of structured queries for cases where precision matters.