Co-founder, Sutura Genomics · YC Startup School 2026
Computational cancer biology · Katy, Texas
Sean Lee
I build computational models of cancer from patient genomic data and present them as first author. I started independent cancer research at 14, working under an associate professor at MD Anderson — writing the code, building the models, running the analysis.
01 The startup
Accepted to Y Combinator Startup School 2026 · Co-founder
Sutura Genomics
Automated non-rigid tissue alignment for spatial transcriptomics, built with graph deep learning to solve the one case where every existing method structurally fails: tissue tearing.
Sutura Genomics is a startup I co-founded with Rushil Maniar; it was accepted to Y Combinator's Startup School (San Francisco, July 25–26, 2026).
The problem
Spatial transcriptomics maps gene expression across tissue at or near cellular resolution, and it is now reshaping cancer research, neuroscience, and drug discovery. To build a 3D picture, tissue is cut into thin slices, and those slices warp, stretch, and tear during prep. Before any analysis, the slices have to be aligned back together, a step called tissue registration. Today's state-of-the-art tools handle smooth warps but break on tears: PASTE2 is built on optimal transport and STalign on diffeomorphic metric mapping, and neither formulation can represent the discontinuity a tear creates. It's a structural limitation, not a tuning problem. Tearing is also one of the most common failure modes in real experiments, and no existing tool handles it well.
What we built
- 01 Encode each slice as a graph: spots are nodes; edges encode spatial proximity and expression similarity. A graph transformer maps both slices into one feature space.
- 02 Match spots across slices: cross-attention between the two graphs finds which spots in slice A map to slice B.
- 03 Predict a deformation field: a displacement head outputs a per-spot vector, giving how far and which way to move each spot.
- 04 Regularize smoothness locally: smoothness holds only between graph neighbors, so spots across a tear are never forced to agree. This is the structural break from OT.
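The locality idea in step 04 can be illustrated with a minimal sketch. It is not Sutura's code: it assumes a plain radius graph on spot coordinates (the real edges also use expression similarity) and a simple squared-difference penalty, and every name, the radius, and the toy data are hypothetical. The point is only that a penalty summed over graph edges costs nothing when the two sides of a gap move in opposite directions, while a penalty over all spot pairs does.

```python
import math

def radius_edges(coords, radius):
    """Return index pairs (i, j), i < j, for spots within `radius` of each other."""
    edges = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= radius:
                edges.append((i, j))
    return edges

def smoothness_penalty(displacements, edges):
    """Mean squared difference between the displacement vectors on each edge."""
    if not edges:
        return 0.0
    total = 0.0
    for i, j in edges:
        total += sum((a - b) ** 2 for a, b in zip(displacements[i], displacements[j]))
    return total / len(edges)

# Toy slice: two runs of spots separated by a gap wider than the edge radius (the "tear").
left = [(float(x), 0.0) for x in range(4)]       # x = 0..3
right = [(float(x), 0.0) for x in range(6, 10)]  # x = 6..9
coords = left + right

# Each side moves rigidly but in opposite directions: discontinuous across the tear.
displacements = [(-1.0, 0.0)] * 4 + [(1.0, 0.0)] * 4

local_edges = radius_edges(coords, radius=1.5)
all_pairs = [(i, j) for i in range(len(coords)) for j in range(i + 1, len(coords))]

local_penalty = smoothness_penalty(displacements, local_edges)   # neighbors only
global_penalty = smoothness_penalty(displacements, all_pairs)    # every pair of spots
print(local_penalty, global_penalty)
```

In this toy case no edge crosses the gap, so the neighbor-only penalty leaves the discontinuous field unpenalized, whereas the all-pairs penalty pushes the two sides toward agreeing.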
What we've proven
Roughly 7× lower registration error than PASTE2 under tissue tearing — same warps, same scoring.
On the exact same torn-tissue warps PASTE2 was scored on (same severity grid, seed, ground truth, and scoring), our graph model holds roughly 7× lower registration error under tearing, and its error curve stays markedly flatter as severity climbs. This result is specific to the tearing regime; smooth warps are not the case we built for.
What we haven't proven yet
- Single evaluation seed so far — no error bars yet across multiple warp realizations; a multi-seed run is in progress.
- Trained and evaluated on the same tissue donor. Cross-donor generalization isn't shown yet; a leave-one-out test is running now.
- An error tail at severity 8: most spots stay sub-pitch, but the torn seam produces the harder cases in the tail.
- Not published or peer-reviewed — no preprint is up yet, and the GitHub repo isn't public.
Why it matters
Every multi-slice 3D tissue study needs alignment first, and spatial transcriptomics is on track to be a foundational technology of the next decade of biology — with datasets from 10x Genomics Visium, Slide-seq, and MERFISH growing fast across hundreds of labs. Alignment gates everything downstream, and tearing — one of the most common ways real tissue fails — is the case the field still can't handle. Fix the alignment layer and the analysis above it gets to run on data that currently has to be thrown away. That's the gap Sutura is built for.
- Cancer research — 3D tumor maps of invasion and immune evasion.
- Drug discovery — resolving which cells respond to a therapy versus resist it.
- Neuroscience — brain atlases assembled from thousands of sections.
- Developmental biology — mapping tissue formation across stages.
Where it's going
- Phase 1 (→12 months): the best open-source tissue-alignment tool, cited and used by 10+ labs, with the preprint published.
- Phase 2 (12–24 months): a managed cloud platform (upload data, align in the cloud, download results, no bioinformatics setup), priced per project or dataset.
- Phase 3 (24+ months): a full spatial-omics analysis platform spanning cell-type annotation, trajectory analysis, and multi-sample integration.
The bet: spatial transcriptomics becomes as standard as single-cell RNA-seq, every lab doing it needs alignment, and Sutura is the alignment layer.
Team
- Rushil Maniar · Co-founder: pipeline, model architecture, training
- Sean Lee · Co-founder: website, outreach, preprint, early user research
To be precise: Startup School is a two-day founder program, and this is acceptance to that program only; Sutura is not YC-funded and is not a YC-batch company.
- spatial transcriptomics
- graph deep learning
- tissue registration
- non-rigid alignment
- graph transformer
- cross-attention
- open source
02 Research
I build computational models of cancer from genomic data and present them as first author. I write the code, run the analysis, and own the claims down to the script.
I study pSTAT3 — a signaling state tied to inflammation in tumors — across six gastrointestinal cancers. The question I keep asking: what kind of tumor microenvironment does this state actually mark, and does it predict who lives longer?
- Manuscript · under submission · 2026
Tumor Phospho-STAT3 Marks a Shared Angiogenic–Fibrotic, Immune-Replete Microenvironment Across Gastrointestinal Cancers: A Multi-Omic Analysis of 1,274 Tumors
Molecular Oncology · FEBS Press / Wiley · gold open access
I asked whether tumor pSTAT3 is written into the genome or into the tissue around the cells. Across 1,274 primary tumors and six GI cancers, no somatic mutation reached significance in any cohort — and the cohorts were well-powered, so that null is informative. The phenotype is microenvironmental, not genetically encoded: a shared angiogenic–fibrotic core where three of four independent methods converge (cross-method ρ ≈ 0.5–0.6). It also reframes a field assumption — these tumors are immune-replete, not immune-excluded. Cytotoxic CD8 and NK cells are not depleted and are often increased; the immune compartment is reorganized, not absent. On its own the program is survival-neutral (pooled HR 0.86, P = .11); prognosis tracks the net stromal-over-immune balance (HR 1.16, P = .009), not total microenvironment burden (HR 0.99). I built it on a claim→script→output provenance ledger with falsification tests pre-committed before execution, and it cleared independent biostatistics, falsification red-team, and provenance-audit review.
- No somatic gene reached significance in any cohort — an informative null.
- A shared angiogenic–fibrotic core; cross-method ρ ≈ 0.5–0.6.
- Immune-replete, not immune-excluded — CD8/NK not depleted, often increased.
- Prognosis tracks the net stromal-over-immune balance (HR 1.16, P = .009).
- External validation held under non-transcriptomic purity control (liver: ρ +0.37, q = 1×10⁻⁵).
- Manuscript · under submission to Scientific Reports · 2026
An Identifiability Framework for Compartment-Resolved STAT3 Attribution from Bulk Tumors: the pSTAT3 "Immune-Enriched Fibrotic" Signal Is Stroma-Borne in Colorectal Cancer
Scientific Reports · sole author
A bulk pSTAT3 reading can't tell whether the signal comes from the cancer cell or the stroma around it — every tumor is a mixture of the two. I deconvolved TCGA GI tumors against single-cell references, recovered the STAT3 program separately in the malignant and stromal compartments, and asked which one carries the "immune-enriched fibrotic" signal. The catch: both compartment scores come from the same mixture, so they're collinear by construction, and the usual partial-correlation "fix" can sign-reverse into a fake suppression effect. So I gated every verdict on an explicit identifiability criterion (collinearity / VIF) and read attribution from raw associations and a variance partition. The contrast was identifiable in only the colorectal cohort — and there the immune signal was carried about twice as strongly by stromal STAT3, with measured pSTAT3 tracking cancer-associated-fibroblast abundance. Bulk pSTAT3 in GI cancers marks a STAT3-active microenvironment, not a tumor-cell-autonomous program.
- 1,116 tumors across 5 GI cohorts; deconvolution validated by pseudobulk recovery, an orthogonal NNLS method, and ESTIMATE purity.
- Malignant-vs-stromal STAT3 was identifiable in only 1 of 5 cohorts — collinearity (VIF up to 21) made the rest unidentifiable.
- In colorectal, the immune association was ~2× stronger for stromal than malignant STAT3 (raw ρ 0.46 vs 0.24).
- Partial correlations sign-reversed under collinearity (+0.24 → −0.31) — a net-suppression artifact, not biology.
- Robust to gene set, deconvolution setting, abundance residualization, and leave-one-out (all 462 colorectal patients).
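The sign reversal described above can be reproduced from the standard formulas alone. The sketch below is illustrative, not the manuscript's pipeline: it takes the two raw colorectal associations quoted in the list (0.46 and 0.24), then sweeps a hypothetical correlation `r12` between the stromal and malignant scores, which is not a reported value. For two predictors the VIF reduces to 1 / (1 - r12²), and the first-order partial correlation has a closed form.

```python
import math

def vif_two_predictors(r12):
    """VIF shared by two predictors whose mutual correlation is r12.

    With only two predictors, regressing one on the other gives R^2 = r12^2,
    so VIF = 1 / (1 - r12^2).
    """
    return 1.0 / (1.0 - r12 ** 2)

def partial_corr(r_yx, r_yz, r_xz):
    """Correlation of y with x after removing z from both (first-order partial)."""
    return (r_yx - r_yz * r_xz) / math.sqrt((1.0 - r_yz ** 2) * (1.0 - r_xz ** 2))

# Raw colorectal associations quoted above: immune signal vs. each STAT3 score.
R_IMMUNE_STROMAL = 0.46
R_IMMUNE_MALIGNANT = 0.24

# Hypothetical correlations between the two compartment scores (assumed, not reported).
rows = []
for r12 in (0.0, 0.5, 0.8, 0.95):
    rows.append((
        r12,
        vif_two_predictors(r12),
        partial_corr(R_IMMUNE_MALIGNANT, R_IMMUNE_STROMAL, r12),
    ))

for r12, vif, partial in rows:
    print(f"r12={r12:.2f}  VIF={vif:6.2f}  raw=+{R_IMMUNE_MALIGNANT:.2f}  partial={partial:+.2f}")
```

The raw association never changes sign in this sweep; only the partial correlation does, once `r12` exceeds 0.24 / 0.46 ≈ 0.52. That is the sense in which the flip is a property of the collinearity between the two scores rather than of the biology.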
- Conference poster · first author · 2026
A Novel 6-Gene pSTAT3 Transcriptomic Score Identifies an Immunosuppressive, Chemotherapy-Resistant Phenotype and Predicts Poor Survival in Biliary Tract Cancer
Cholangiocarcinoma Foundation 2026 Annual Conference
As first author, I led the development of a 6-gene pSTAT3 activity score, validated against protein-level STAT3 phosphorylation measured by RPPA (R = 0.498, p = 0.005). Applied to 198 biliary tract cancer patients, a high score predicted significantly worse progression-free and overall survival across all stages. High-score tumors showed an immunosuppressive, chemotherapy-resistant microenvironment, including reduced SLC29A1 (a plausible gemcitabine-resistance route), and first-line chemo-immunotherapy did not rescue prognosis.
Score genes
- SOCS3
- BCL2
- MYC
- MMP9
- HGF
- IL6
- 6-gene score validated against RPPA STAT3_pY705 (R = 0.498).
- High score predicts worse PFS and OS across all stages.
- Reduced SLC29A1 — a candidate gemcitabine-resistance mechanism.
- Microenvironment profiled with 29 Bagaev immune signatures.
The presented poster: CCF 2026 Annual Conference.
- Current work · in progress
FGF19–FGFR4 signaling axis
MD Anderson · research mentorship
Investigating the FGF19–FGFR4 axis to identify novel therapeutic targets in GI cancers. Same approach as before: build the model, run the analysis, and assume it's lying to me until it survives falsification.
Stack: R 4.6 / Bioconductor 3.23, Python, GSVA, MCP-counter, Spatial EcoTyper, Cox / Kaplan–Meier survival modeling. Discovery on the TCGA PanCancer Atlas; validation on CPTAC proteogenomic cohorts. The organ-conditional direction (adverse in luminal upper-GI, favorable in hepatobiliary) is a stated testable prediction, not yet demonstrated.
- multi-omics
- tumor microenvironment
- GI oncology
- survival modeling
- reproducible provenance
03 The record
The youngest presenter — and the only high schooler — in the documented history of the Cholangiocarcinoma Foundation Annual Conference.
The documented record shows no undergraduate or high-school presenters before me. I cleared that line as a first author. Verifiable from the conference's own records.
Class of 2028 · Cinco Ranch High School
- Research Intern · MD Anderson Cancer Center
Jan 2024 – present · Research mentorship · GI Medical Oncology
~2.5 years, remote. Transcriptomic profiling and molecular pathways in biliary tract cancer; lead and first author. Cross-institutional collaboration with Baylor College of Medicine.
- First-author poster · Cholangiocarcinoma Foundation Annual Conference
2026 · Youngest presenter on record · MD Anderson travel grant
A 6-gene pSTAT3 transcriptomic score predicting poor survival in biliary tract cancer (n = 198). The presented poster of record.
- Clinical Research Apprentice · MD Anderson Cancer Center
Dec 2025 – present · Shadowing an Associate Professor · GI Medical Oncology
Shadowing an associate professor during direct patient care, and connecting with specialized medical oncologists.
- Poster Presenter · ESMO Annual Congress
2027 · upcoming · Singapore · MD Anderson travel grant
Presenting "Tumor Phospho-STAT3 Marks a Shared Angiogenic–Fibrotic, Immune-Replete Microenvironment Across Gastrointestinal Cancers: A Multi-Omic Analysis of 1,274 Tumors."
- Returning Presenter · Cholangiocarcinoma Foundation Annual Conference
2027 · upcoming · FGF19–FGFR4 · hepatocellular carcinoma
Returning to CCF to present current work on the FGF19–FGFR4 signaling axis — identifying novel therapeutic targets in hepatocellular carcinoma and other GI cancers.
- DAVAonco Conference · Bermuda
By invitation · Shadowed the invited MD Anderson faculty
One MD Anderson faculty member is invited to present each year; I attended alongside the faculty member I shadow.
- first author
- biliary tract cancer
- computational oncology
04 A system I built
The Research Brain
I built the best research partner I've ever had by assuming it's lying to me. It's an AI research system layered over a ~1,300-note knowledge base on hepatobiliary and pan-GI oncology, genomics, and drug development. It carried one genomics manuscript to submission.
Everyone's racing to make AI smarter. In cancer research, smarter was never the problem — one fabricated number isn't a typo, it's a retraction. So I built mine to be accountable instead.
The accountability isn't a prompt. It's machinery: Claude Code hooks, custom agents, and slash commands that won't let a claim through unless it survives them. Built with custom agents — biostatistician, citation-verifier, falsification-red-team, provenance-auditor, manuscript-drafter, literature-scout — and commands /qc, /verify-claim, and /falsify.
- Provenance or it didn't happen
It can't state a result without pointing to the file that produced it, recorded in a claim→script→output ledger.
- Verified or stamped unverified
Every citation is web-verified or marked "unverified." No middle ground.
- Confidence is earned
Nothing is called "confident" until a separate red-team agent has tried, and failed, to falsify it.
- No hype, by construction
Effect sizes, confidence intervals, and multiple-testing correction enforced as I write.
- QC on every edit
A PostToolUse hook flags a missing result file, an unverified DOI, or drift between a note's numbers and the ledger as I type.
Could it still hallucinate? Of course. The difference is mine has to get past a system built to assume it will.
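To make the claim→script→output idea concrete, here is a minimal sketch of the kind of check such a ledger allows. It is an illustration under stated assumptions, not the actual hooks or agents: the entry fields, the `check_entry` helper, and the file names are all hypothetical, and the real system's ledger format is not described here. The check fails a claim if its script or output file is missing, or if the note no longer states the value the ledger records.

```python
import tempfile
from pathlib import Path

def check_entry(entry, note_text, root):
    """Return a list of problems for one ledger entry (empty list = passes)."""
    problems = []
    for key in ("script", "output"):
        if not (root / entry[key]).is_file():
            problems.append(f"missing {key} file: {entry[key]}")
    if entry["value"] not in note_text:
        problems.append(f"note does not state the ledger value {entry['value']!r}")
    return problems

# Hypothetical ledger entry; the file names are invented for illustration.
entry = {
    "claim": "stromal-over-immune balance tracks prognosis",
    "script": "scripts/survival_balance.R",
    "output": "results/survival_balance.csv",
    "value": "HR 1.16",
}

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Create placeholder files so the provenance check has something to find.
    for rel in (entry["script"], entry["output"]):
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("placeholder\n")

    ok = check_entry(entry, "Prognosis tracks the balance (HR 1.16, P = .009).", root)
    drift = check_entry(entry, "Prognosis tracks the balance (HR 1.61, P = .009).", root)

    # Delete the result file to simulate a claim whose output has gone missing.
    (root / entry["output"]).unlink()
    missing = check_entry(entry, "Prognosis tracks the balance (HR 1.16, P = .009).", root)

print(ok, drift, missing)
```

A plain substring match is the simplest possible drift test; a fuller version would parse numbers and compare them with a tolerance, but the pass/fail shape stays the same.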
- Claude Code
- provenance ledger
- falsification red-team
- citation verification
05 Built
Things I've built
A self-funded business, a nonprofit cutting costs for cancer patients, and a student org that helps members secure spots in real research positions.
- Self-funded business
Ongoing · Founder
I run a photography business and have turned reselling, photography, and building websites into $35,000+ in net profit — fully bootstrapped, no outside capital. Entrepreneurial from a young age.
$35,000+ net profit
- Throughway
Apr 2026 – present · CFO & Co-Founder
A nonprofit reducing financial toxicity and improving insurance access for cancer patients. We're mentored by the MD Anderson Financial Clearance Center. I'm using MD Anderson and Cholangiocarcinoma Foundation connections to expand partnerships and the donor network, with growth planned around the CCF 2027 Annual Conference.
Mentored by the MD Anderson Financial Clearance Center
- Cinco Ranch High Cancer Research Society
May 2026 – present · President & Founder
I founded and lead a student org — now 120+ members — that connects aspiring researchers with real cancer-research opportunities. I mentor members through the full process: developing ideas, cold-emailing professors and PIs, building relationships. 25+ have already secured research positions at Stanford, Yale, MIT, MD Anderson, Google, and UT — across psychology, neuroscience, cancer, and virology, from incoming freshmen to college undergraduates.
120+ members · 25+ placed in research · cancerresearchsociety.science
06 About
Who's building this
I'm a rising junior at Cinco Ranch High School, focused on translational oncology and computational biology. I started independent cancer research at 14 and have since been first author on cancer-genomics work, collaborating directly with an associate professor at MD Anderson Cancer Center, three researchers at Baylor College of Medicine, and collaborators at the University of Houston.
My work spans biliary tract cancer, hepatocellular carcinoma, and NSCLC radiomics. The work is real: I write the code, build the models, and run the analysis on patient genomic data — still in high school, still building.
I believe the gap between data and clinical impact should be smaller than it is. Everything I build is pointed at that gap.
- cancer genomics
- bioinformatics
- tumor immunology
- survival modeling
07 Skills & methods
The stack I use to take patient genomic data from raw matrix to first-author result.
Languages & tools
- Python
- R
- Bioconductor
- NumPy
- Pandas
- scikit-learn
- SimpleITK
Genomics & bioinformatics
- TCGA data analysis
- RPPA proteomics
- Transcriptomic analysis
- GSVA
- MCP-counter
- Spatial EcoTyper
Statistics & survival
- Survival modeling
- Kaplan–Meier
- Cox proportional hazards
- Biostatistics
- Multiple-testing correction
Imaging & radiomics
- PyRadiomics
- GLCM radiomics
- 3D Slicer
Domains
- Computational biology
- Cancer genomics
- Tumor immunology
Communication
- Research writing
- Manuscript preparation
- Public speaking
08 Beyond research
Research is most of my time, not all of it. Here is the rest.
- Student Volunteer
Jun – Jul 2025 · Texas Children's Hospital
Selected at a 4.3% acceptance rate (one of 13 accepted); I assisted nurses and staff, delivered supplies, and transported patients.
- Musician
May 2025 – present · Music for Medicine
I perform live music for elderly residents at care homes and assisted-living facilities.
- Teaching Assistant
Aug 2024 – present · KCPCH Korean School
- Volunteer
Jun – Aug 2025 · Hope Plus Project / Hope to the Future Association
I designed SDG-themed shoes supporting children's health rights in South Sudan, certified by the Undersecretary of the Ministry of Youth and Sports, Republic of South Sudan (ref. HFA-25020043).
09 Contact
Reach out directly — collaboration, a question about the work, or just to talk. Email is best.
seangplee@gmail.com
Sean GP Lee · Katy, Texas · 2026