Key Takeaways
- The integration of AI and materials science promises a transformative acceleration of materials discovery.
- The field is moving beyond isolated models toward agentic systems that plan, act, and learn from the entire discovery cycle.
- A "pipeline-centric" approach is crucial for optimizing the whole process from data acquisition to experimental validation.
- Large language models (LLMs) play a central role in pattern recognition, predictive analysis, and natural language processing in this domain.
- Automating experiments through AI-driven robotic laboratories offers the potential for accelerated materials development.
- Challenges remain in data scarcity, heterogeneity, and the need for accurate uncertainty quantification.
- Future progress requires tighter integration of humans and AI in scientific workflows to maximize efficiency and trustworthiness.
Agentic Intelligence in Materials Science: A Paradigm Shift in Discovery
The convergence of artificial intelligence (AI) and materials science is opening transformative opportunities for accelerating materials discovery. Traditional approaches built around isolated, fine-tuned models are increasingly giving way to agentic systems that can plan, execute, and learn from the entire discovery cycle. This article traces that shift toward agentic intelligence in materials science, emphasizing a "pipeline-centric" approach that optimizes the whole process from data acquisition to experimental validation.
The Evolution of AI in Materials Science
Materials science faces the challenge of integrating insights from an ever-growing literature and accelerating the autonomous discovery of novel, functional materials. Earlier applications of machine learning (ML) in this field, including the initial use of large language models (LLMs), often followed a static paradigm: models were trained on curated datasets to perform specific tasks such as property prediction or entity extraction. LLMs stand out for their text understanding and their ability to efficiently extract chemical notation, experimental protocols, and domain jargon from unstructured text, improving knowledge extraction and hypothesis generation from the literature.
The evolution from traditional machine learning to LLMs has turned AI from a pure computational tool into an intelligent research collaborator that can reason, generate hypotheses, orchestrate simulations, and design experiments. This development can be divided into four main phases:
- Predictive models on static datasets: Early work relied on feature engineering and established data-driven tools for materials science. Supervised and unsupervised learning techniques supported property prediction and pattern recognition.
- Foundation models as programmable priors: The emergence of foundation models, trained on broad datasets and applicable to a wide range of tasks, marks a turning point. Transformer architectures made it possible to process large volumes of data and capture complex, context-dependent relationships in scientific text.
- Post-training methods for goal shaping and steerability: Methods such as chain-of-thought prompting and memory-augmented architectures improve reasoning capabilities and enable retrieval of external knowledge.
- Agentic systems with tool use and long-horizon rewards: This represents the transition from passive models to active systems that interact with their environment, learn from it, and adapt to real-world tasks.
Reactive Tasks in Materials Science from an AI Perspective
The application of LLMs in materials science represents a significant paradigm shift, moving predictive modeling from structure-specific deep learning toward generalizable, knowledge-intensive representation learning. Most current implementations, however, still operate reactively and on a per-task basis.
Prediction
Prediction in this context covers both the quantitative forecasting of continuous properties (regression) and categorical assignment (classification) across different materials data modalities. The innovation in applying LLMs lies in their ability to process both symbolic structural inputs (e.g., formulas, space groups) and high-dimensional feature sets (e.g., DFT-derived features).
- Regression tasks: Predicting continuous physical properties such as thermodynamic stability, mechanical stiffness, or electrical and thermal properties. Here, LLMs improve predictive accuracy by exploiting global context and multimodal data.
- Classification tasks: Automatically categorizing materials based on their composition, structure, or other characteristic data. LLMs can cope with noisy experimental data and integrate multimodal inputs.
- Advanced methods: Hybrid architectures combining graph neural networks (GNNs) and Transformers exploit the strengths of both approaches. Multi-task learning (MTL) improves generalizability, while uncertainty quantification (UQ) is essential for trustworthy AI systems (see the sketch after this list).
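To make the UQ point concrete, below is a minimal sketch of ensemble-based uncertainty quantification for property regression. The descriptors and target are synthetic stand-ins, not from the original text; the spread across the trees of a random forest serves as a cheap proxy for epistemic uncertainty.

```python
# Minimal sketch: ensemble-based uncertainty quantification for property regression.
# Features and target are synthetic stand-ins for composition-derived descriptors.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                      # e.g. 8 composition descriptors
y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=200)  # toy property

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Per-tree predictions: their mean is the estimate, their spread a UQ proxy.
per_tree = np.stack([tree.predict(X[:5]) for tree in model.estimators_])
for mean, std in zip(per_tree.mean(axis=0), per_tree.std(axis=0)):
    print(f"prediction {mean:+.3f} ± {std:.3f}")
```

In a trustworthy pipeline, such spread estimates would gate whether a prediction is acted on directly or routed to a confirming experiment.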
Data Mining
Information extraction in materials science focuses on converting experimental descriptions and performance data from publications into structured formats for data-driven research. Early work used rule-based text-mining systems, while more recent research introduced statistical and representation-learning approaches. LLM-based approaches such as ChatExtract have substantially improved extraction accuracy and robustness. Extracted entities and relations are increasingly integrated into knowledge graphs to build structured knowledge representations for downstream reasoning.
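As an illustration, here is a minimal, ChatExtract-inspired sketch of prompting an LLM for structured records and validating the response. The prompt wording and the `call_llm` client are hypothetical stand-ins, not the published method.

```python
# Minimal sketch: LLM-based extraction of structured property records.
# `call_llm` is a hypothetical stand-in for any chat-model client.
import json

PROMPT = """Extract all (material, property, value, unit) tuples from the passage
below. Answer ONLY with a JSON list of objects; use [] if nothing is stated.

Passage: {passage}"""

def extract_records(passage: str, call_llm) -> list[dict]:
    raw = call_llm(PROMPT.format(passage=passage))
    try:
        records = json.loads(raw)
    except json.JSONDecodeError:
        return []  # a follow-up prompt could ask the model to repair its output
    required = {"material", "property", "value", "unit"}
    # Keep only complete records; these could then populate a knowledge graph.
    return [r for r in records if isinstance(r, dict) and required <= r.keys()]
```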
Generation
The generative paradigm has profoundly shaped AI in materials science. LLMs can be used to design novel materials and synthesis methods.
- Structure generation: The goal is to propose new candidate structures that are valid, stable, and potentially synthesizable. LLMs that represent crystal structures as text sequences can generate them autoregressively (see the sketch after this list).
- Inverse design: Here, materials are identified that exhibit specific desired properties. Agentic frameworks combining LLMs with diffusion models and property-prediction models enable the autonomous generation of materials with target properties.
- Synthesis-route generation: This refers to constructing feasible routes for synthesizing a target material. LLMs can assist in selecting suitable precursors, reaction pathways, and reaction conditions.
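To illustrate the "crystal structure as text" idea behind autoregressive generation, here is a minimal serializer turning a lattice and fractional coordinates into a token-friendly sequence. The format is illustrative, not the exact encoding of any published model.

```python
# Minimal sketch: serialize a crystal structure into text that an autoregressive
# language model could be trained to generate token by token.
def crystal_to_text(formula, lattice, sites):
    """lattice: (a, b, c, alpha, beta, gamma); sites: [(element, (x, y, z)), ...]"""
    lines = [formula, " ".join(f"{p:.3f}" for p in lattice)]
    for element, (x, y, z) in sites:
        lines.append(f"{element} {x:.4f} {y:.4f} {z:.4f}")
    return "\n".join(lines)

# Rock-salt NaCl as an example target sequence:
print(crystal_to_text(
    "NaCl",
    (5.640, 5.640, 5.640, 90.0, 90.0, 90.0),
    [("Na", (0.0, 0.0, 0.0)), ("Cl", (0.5, 0.5, 0.5))],
))
```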
Optimization and Verification
Process optimization in materials science is being driven by self-driving platforms such as Ada, which use Bayesian optimization to tune processes autonomously in a closed loop. Simulations and AI-based verification play a decisive role, with high-fidelity simulations able to act as virtual laboratories. Agent-based closed-loop laboratories integrate AI with automated experimentation, enabling an iterative cycle of design, execution, evaluation, and evolution.
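The closed loop described here can be sketched in a few lines: fit a surrogate to the data so far, pick the next condition by expected improvement, "run" the experiment, and repeat. The objective below is a toy stand-in for a real instrument, not Ada's actual implementation.

```python
# Minimal sketch: Bayesian-optimization closed loop over one process parameter.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def run_experiment(x):                       # toy stand-in for a real measurement
    return float(-(x - 0.3) ** 2 + 0.05 * np.sin(20 * x))

grid = np.linspace(0.0, 1.0, 201).reshape(-1, 1)
X, y = [[0.1], [0.9]], [run_experiment(0.1), run_experiment(0.9)]

for _ in range(8):
    gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
    mu, sigma = gp.predict(grid, return_std=True)
    z = (mu - max(y)) / np.maximum(sigma, 1e-9)
    ei = (mu - max(y)) * norm.cdf(z) + sigma * norm.pdf(z)  # expected improvement
    x_next = float(grid[int(np.argmax(ei))][0])
    X.append([x_next]); y.append(run_experiment(x_next))    # close the loop

print(f"best condition ≈ {X[int(np.argmax(y))][0]:.3f}, response {max(y):.4f}")
```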
Data and Knowledge
The scarcity and heterogeneity of materials data are long-standing challenges. Data augmentation techniques, multi-fidelity learning, and few-shot learning help mitigate these problems. Knowledge integration in AI4MatSci aims to incorporate relevant materials knowledge into the design and training of AI models to improve their accuracy, interpretability, and generalization. This is achieved through physics-informed neural networks (PINNs), structure-based knowledge integration, and the incorporation of ontologies and knowledge graphs.
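As one concrete example of such augmentation, the sketch below perturbs measured spectra with noise, small channel shifts, and intensity rescaling to multiply scarce training data. The parameter values are illustrative and would be tuned per instrument.

```python
# Minimal sketch: data augmentation for 1-D spectra under data scarcity.
import numpy as np

def augment_spectrum(intensity, rng, noise=0.01, max_shift=3):
    """Return a perturbed copy: small channel shift, additive noise, rescaling."""
    shifted = np.roll(intensity, rng.integers(-max_shift, max_shift + 1))
    noisy = shifted + rng.normal(scale=noise * intensity.max(), size=intensity.shape)
    return noisy * rng.uniform(0.95, 1.05)   # mimic calibration drift

rng = np.random.default_rng(42)
spectrum = np.exp(-0.5 * ((np.linspace(0, 10, 500) - 4.0) / 0.2) ** 2)  # toy peak
augmented = [augment_spectrum(spectrum, rng) for _ in range(10)]        # 10x data
```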
Multimodality
Agentic LLMs must process and understand multimodal data at every stage of the pipeline. This is a prerequisite for closing the loop, since each stage and component produces different modalities that must be interpreted jointly for decision-making. Advances in multimodal fusion enable the integration of text, structure diagrams, microscopy images, and spectroscopic profiles into a single representation space.
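A shared representation space can be sketched with a simple late-fusion scheme: project each modality's embedding into one space and combine them. Dimensions and the averaging rule are illustrative; in practice the projections are learned jointly.

```python
# Minimal sketch: late fusion of per-modality embeddings into one shared space.
import numpy as np

rng = np.random.default_rng(1)
dims = {"text": 768, "image": 512, "spectrum": 128}   # illustrative sizes
shared_dim = 256
# One (here untrained) linear projection per modality.
proj = {m: rng.normal(scale=d ** -0.5, size=(d, shared_dim)) for m, d in dims.items()}

def fuse(embeddings):
    """embeddings: {modality: 1-D vector} -> single vector in the shared space."""
    return np.mean([e @ proj[m] for m, e in embeddings.items()], axis=0)

sample = {m: rng.normal(size=d) for m, d in dims.items()}
print(fuse(sample).shape)   # (256,) — one representation for joint decision-making
```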
Explainability
Explainable AI is a key ingredient of AI-based applications in materials science, particularly with respect to autonomous agentic LLMs. Explanations serve as verification artifacts for high-stakes experimental actions. Explainability is assessed along three axes:
- Physical validity: Do the explanations agree with established scientific reasoning and physical laws?
- Faithfulness: Does the explanation accurately reflect the model's internal reasoning process?
- Stability: Do similar inputs lead to consistent explanations?
Approaches range from sparse and closed-form models through attention- and graph-based explainers to physics-informed interpretability methods.
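The stability axis in particular lends itself to a simple automated check: perturb an input slightly and measure whether the explanation's feature ranking stays put. The `explain` callable below is a hypothetical stand-in for any per-feature attribution method.

```python
# Minimal sketch: testing explanation stability under small input perturbations.
import numpy as np
from scipy.stats import spearmanr

def explanation_stability(x, explain, rng, n=20, eps=1e-3):
    """Mean Spearman rank correlation between attributions of x (a NumPy array)
    and of jittered copies; values near 1.0 indicate stable explanations."""
    base = explain(x)
    scores = []
    for _ in range(n):
        rho, _ = spearmanr(base, explain(x + rng.normal(scale=eps, size=x.shape)))
        scores.append(rho)
    return float(np.mean(scores))
```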
A Pipeline-Centric Perspective
Most current AI4MatSci formulations focus on narrowly scoped, mostly supervised tasks optimized in isolation from end-to-end discovery outcomes. This creates a gap between benchmark success and real experimental impact. A pipeline-centric perspective demands continuous re-evaluation of tasks, data, and evaluation criteria with respect to the ultimate discovery goal.
Agentic Systems for Materials Discovery
Genuine acceleration of discovery requires integrating AI capabilities into coherent, goal-directed loops that can plan, execute, and learn from experiments. Agentic systems in materials research operate on the principle of closed-loop discovery: they define a control strategy that selects actions based on the system state in order to maximize a long-term objective.
- From passive prediction to active cognition: AI's role has shifted from a passive analysis tool to an active participant in discovery.
- Autonomous loops: Self-driving labs (SDLs) combine robotic hardware with AI models to predict outcomes, assess uncertainty, and iteratively plan the next experiment.
- Safe exploration and future ecosystems: Agentic systems must balance exploration and exploitation to avoid local optima and use resources efficiently (see the sketch after this list). The future will include scientific multi-agent ecosystems in which autonomous entities collaborate.
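The exploration/exploitation trade-off can be made concrete with an epsilon-greedy loop over an ordered candidate list; `evaluate` stands in for a (simulated or robotic) experiment, and all names are illustrative.

```python
# Minimal sketch: epsilon-greedy exploration/exploitation over candidates that
# are assumed to be ordered along some composition or process axis.
import random

def discovery_loop(candidates, evaluate, budget=20, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    untested, scores = list(range(len(candidates))), {}
    for _ in range(min(budget, len(candidates))):
        if scores and rng.random() > epsilon:
            best = max(scores, key=scores.get)              # exploit: stay near best
            idx = min(untested, key=lambda i: abs(i - best))
        else:
            idx = rng.choice(untested)                      # explore: jump elsewhere
        untested.remove(idx)
        scores[idx] = evaluate(candidates[idx])             # run and record experiment
    best = max(scores, key=scores.get)
    return candidates[best], scores[best]

# Toy usage: the optimum sits at index 42 of a 100-candidate sweep.
print(discovery_loop(list(range(100)), evaluate=lambda c: -abs(c - 42)))
```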
Scientist AI: Beyond Data Fitting Toward Scientific Reasoning
The new phase of AI in materials science goes beyond accurate prediction and aims for AI itself to engage in scientific reasoning, meaning that it systematically acquires, tests, and revises beliefs about the physical world.
- Hypothesis generation and search in scientific spaces: Scientist AI uses priors, evidence, and uncertainty to recommend paths through materials space. Generative models and LLMs contribute to formulating hypotheses.
- Scientific critical thinking: Chemical reasoning must cope with uncertainty, biased and scarce datasets, and slow, costly experimental feedback.
- Experiment and simulation planning (decision-making under uncertainty): Planning means selecting the next actions that are most useful or informative given the current hypotheses, available data, and priors.
- Interpretation, explanation, and hypothesis revision: The interpretive part of the reasoning process is reconciling new data with old hypotheses and developing new hypotheses that better explain the facts (see the sketch after this list).
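Hypothesis revision can be read as Bayesian belief updating: each hypothesis assigns a likelihood to an observed outcome, and the posterior reweights beliefs after every experiment. All hypotheses and numbers below are toy values.

```python
# Minimal sketch: hypothesis revision as Bayesian belief updating.
def update_beliefs(priors, likelihoods):
    """priors: {hypothesis: P(h)}; likelihoods: {hypothesis: P(data | h)}."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}

beliefs = {"H1: dopant raises conductivity": 0.5, "H2: no effect": 0.5}
# An observed conductivity increase is far more probable under H1 than H2:
beliefs = update_beliefs(beliefs, {"H1: dopant raises conductivity": 0.8,
                                   "H2: no effect": 0.2})
print(beliefs)   # belief mass shifts toward H1
```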
Human-AI Collaboration in Scientific Workflows
Further progress in AI4MatSci will not come from algorithmic and computational power alone, but also from designing human-AI workflows that combine human intuition and domain knowledge with machine reasoning, data integration, and hypothesis generation. Interpretable models and large language systems enable a continuous feedback loop between human intuition and machine reasoning. As AI systems become more autonomous, the role of human scientists shifts toward high-level management, ethical oversight, and conceptual innovation.
A Pipeline-Centric Perspective
Agents must learn from experience, because hypotheses have to be tested through simulations and experiments. Current agentic systems remain incomplete: feedback from discovery successes and failures is not comprehensively propagated back into earlier process steps. Moreover, most agentic systems operate in simulation rather than in real experimental environments. True agentic systems for materials discovery must be tightly coupled to robotic laboratories and experimental feedback.
Discussion and Conclusion
The pipeline-centric view presented here integrates insights from corpus curation, pre-training, domain adaptation, and instruction tuning into goal-directed, agentic LLMs that operate in open experimental environments. By organizing the field around three guiding questions, it brings machine-learning and materials-science perspectives together in a single conceptual framework for autonomous materials discovery.
It is argued that the entire pipeline should be tunable, driven by the end goal of discovering novel materials. This means learning not only neural networks but also data and curation policies, adaptive pre-training corpora, retrieval and tool-calling strategies, experiment proposals, and even aspects of the experimental protocol itself, all guided by the ultimate reward signals.
Agentic LLMs can operate in both virtual and real environments. A practical route is to train them first in the virtual world, then transfer them to real materials-science settings and continue training there. A major bottleneck is the limited accessibility of laboratory automation; democratizing it is crucial for addressing the persistent data scarcity in materials science.
The convergence of AI and the life sciences carries biosecurity risks. Similarly, LLM-agent-driven materials discovery and autonomous laboratory workflows can accelerate beneficial research but can also facilitate misuse (e.g., designing materials with hazardous properties). Developing trustworthy and safe LLM agents for materials research is therefore critical.
The "empowerment-plasticity" view holds that an agent's plasticity mirrors the environment's empowerment, and vice versa. This means the agent must dynamically adjust its plasticity to limit adverse empowerment by the environment. To maximize the total returns of the whole system over time, the environment in which the agent acts should itself exhibit high plasticity, e.g., by offering more controllable components or tools.
Materials science should routinely audit every module in the workflow and ask whether it can be placed under learning or optimization pressure rather than treated as a fixed heuristic. The pipeline-centric, end-to-end perspective views the materials discovery cycle as a complex system in which as many components as possible are trainable, and in which signals from successful or failed discoveries can, over time, reshape both models and upstream pipeline decisions to better achieve the ultimate goal of finding novel, useful, and safe materials.
Bibliographie
* M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, et al. (2016) Tensorflow: large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467.
* H. Abdi and L. J. Williams (2010) Principal component analysis. Wiley interdisciplinary reviews: computational statistics 2(4), pp. 433–459.
* D. Abel, M. Bowling, A. Barreto, W. Dabney, S. Dong, S. Hansen, A. Harutyunyan, K. Khetarpal, C. Lyle, R. Pascanu, et al. (2025) Plasticity as the mirror of empowerment. arXiv preprint arXiv:2505.10361.
* M. Abolhasani and E. Kumacheva (2023) The rise of self-driving labs in chemical and materials sciences. Nature Synthesis 2(6), pp. 483–492.
* E. C. Acikgoz, C. Qian, H. Ji, D. Hakkani-Tür, and G. Tur (2025) Self-improving llm agents at test-time. In arxiv.
* D. Adak, Y. S. Rawat, and S. Vyas (2025) MolVision: molecular property prediction with vision language models. arXiv preprint arXiv:2507.03283.
* F. Adams, A. McDannald, I. Takeuchi, and A. G. Kusne (2024) Human-in-the-loop for bayesian autonomous materials phase mapping. Matter 7(2), pp. 697–709.
* C. Agarwal, O. Queen, H. Lakkaraju, and M. Zitnik (2023) Evaluating explainability for graph neural networks. Scientific Data 10(1), pp. 144.
* C. Agarwal, S. H. Tanneru, and H. Lakkaraju (2024) Faithfulness vs. plausibility: on the (un)reliability of explanations from large language models. External Links: 2402.04614.
* N. Alampara, I. Mandal, P. Khetarpal, H. S. Grover, M. Schilling-Wilhelmi, N. M. A. Krishnan, and K. M. Jablonka (2024a) MaCBench: a multimodal chemistry and materials science benchmark. In AI for Accelerated Materials Design - NeurIPS 2024.
* N. Alampara, S. Miret, and K. M. Jablonka (2024b) MatText: do language models need more than text & scale for materials modeling?. arXiv preprint arXiv:2406.17295.
* J. Alayrac, J. Donahue, P. Luc, A. Miech, I. Barr, Y. Hasson, K. Lenc, A. Mensch, K. Millican, M. Reynolds, R. Ring, E. Rutherford, S. Cabi, T. Han, Z. Gong, S. Samangooei, M. Monteiro, J. Menick, S. Borgeaud, A. Brock, A. Nematzadeh, S. Sharifzadeh, M. Binkowski, R. Barreira, O. Vinyals, K. Zisserman, and K. Simonyan (2022) Flamingo: a visual language model for few-shot learning. In Advances in Neural Information Processing Systems 35 (NeurIPS 2022).
* A. K. Alkan, S. Sourav, M. Jablonska, S. Astarita, R. Chakrabarty, N. Garuda, P. Khetarpal, M. Pióro, D. Tanoglidis, K. G. Iyer, M. S. Polimera, M. J. Smith, T. Ghosal, M. Huertas-Company, S. Kruk, K. Schawinski, and I. Ciucă (2025) A survey on hypothesis generation for scientific discovery in the era of large language models. External Links: 2504.05496.
* S. I. Allec and M. Ziatdinov (2025) Active and transfer learning with partially bayesian neural networks for materials and chemicals. Digital Discovery 4(5), pp. 1284–1297.
* D. Alvarez-Melis and T. S. Jaakkola (2018) Towards robust interpretability with self-explaining neural networks. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS’18, Red Hook, NY, USA, pp. 7786–7795.
* B. Amirian, A. S. Dale, S. Kalinin, and J. Hattrick-Simpers (2025a) Building trustworthy ai for materials discovery: from autonomous laboratories to z-scores. External Links: 2512.01080.
* B. Amirian, A. S. Dale, S. Kalinin, and J. Hattrick-Simpers (2025b) Building trustworthy ai for materials discovery: from autonomous laboratories to z-scores. External Links: 2512.01080.
* Y. An, J. Greenberg, A. Kalinowski, X. Zhao, X. Hu, F. J. Uribe-Romo, K. Langlois, J. Furst, and D. A. Gómez-Gualdrón (2023) Knowledge graph question answering for materials science (kgqa4mat): developing natural language interface for metal-organic frameworks knowledge graph (mof-kg) using llm. arXiv preprint arXiv:2309.11361.
* C. W. Andersen, R. Armiento, E. Blokhin, G. J. Conduit, S. Dwaraknath, M. L. Evans, Å. Åkesson, A. Fekete, A. Gopakumar, S. Gražulis, et al. (2021) OPTIMADE, an API for exchanging materials data. Scientific data 8(1), pp. 217.
* J. A. Anderson (1995) An introduction to neural networks. MIT press.
* N. H. Angello, D. M. Friday, C. Hwang, S. Yi, A. H. Cheng, T. C. Torres-Flores, E. R. Jira, W. Wang, A. Aspuru-Guzik, M. D. Burke, et al. (2024) Closed-loop transfer enables artificial intelligence to yield chemical knowledge. Nature 633(8029), pp. 351–358.
* M. Ansari and S. M. Moosavi (2024) Agent-based learning of materials datasets from the scientific literature. Digital Discovery 3(12), pp. 2607–2617.
* D. M. Anstine and O. Isayev (2023) Generative models as an emerging paradigm in the chemical sciences. Journal of the American Chemical Society 145(16), pp. 8736–8750.
* Anthropic (2025) Claude 3.7 sonnet and claude code. Note: Anthropic News Accessed November 2025.
* L. M. Antunes, K. T. Butler, and R. Grau-Crespo (2023) Crystal structure generation with autoregressive large language modeling. arXiv preprint arXiv:2307.04340.
* L. M. Antunes, K. T. Butler, and R. Grau-Crespo (2024) Crystal structure generation with autoregressive large language modeling. External Links: 2307.04340.
* R. Aoki, F. Tung, and G. L. Oliveira (2022) Heterogeneous multi-task learning with expert diversity. IEEE/ACM Transactions on Computational Biology and Bioinformatics 19(6), pp. 3093–3102.
* N. Author (2025) Agentic assistant for material scientists. ChemRxiv. Note: Preprint.
* Y. Bai, S. Kadavath, S. Kundu, A. Askell, J. Kernion, A. Jones, A. Chen, A. Goldie, A. Mirhoseini, C. McKinnon, et al. (2022) Constitutional ai: harmlessness from ai feedback. arXiv preprint arXiv:2212.08073.
* A. Barabási, H. Jeong, Z. Néda, E. Ravasz, A. Schubert, and T. Vicsek (2002) Evolution of the social network of scientific collaborations. Physica A: Statistical Mechanics and its Applications 311(3–4), pp. 590–614.
* A. G. Barto (2024) In the beginning ml was rl. Note: Video Reinforcement Learning Conference (RLC 2024), Oct 1, 2024. Edited by Gor Baghdasaryan.
* P. W. Battaglia, J. B. Hamrick, V. Bapst, A. Sanchez-Gonzalez, V. Zambaldi, M. Malinowski, A. Tacchetti, D. Raposo, A. Santoro, R. Faulkner, et al. (2018) Relational inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261.
* S. Batzner, A. Musaelian, L. Sun, M. Geiger, J. P. Mailoa, M. Kornbluth, N. Molinari, T. E. Smidt, and B. Kozinsky (2022a) E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials. Nature communications 13(1), pp. 2453.
* S. Batzner, A. Musaelian, L. Sun, M. Geiger, J. P. Mailoa, M. Kornbluth, N. Molinari, T. E. Smidt, and B. Kozinsky (2022b) E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials. Nature Communications 13(1), pp. 2453.
* C. Beeler, S. G. Subramanian, K. Sprague, N. Chatti, C. Bellinger, M. Shahen, N. Paquin, M. Baula, A. Dawit, Z. Yang, X. Li, M. Crowley, and I. Tamblyn (2023) ChemGymRL: an interactive framework for reinforcement learning for digital chemistry. External Links: 2305.14177.
* P. Belcak, G. Heinrich, S. Diao, Y. Fu, X. Dong, S. Muralidharan, Y. C. Lin, and P. Molchanov (2025) Small language models are the future of agentic AI. arXiv preprint arXiv:2506.02153.
* I. Beltagy, K. Lo, and A. Cohan (2019) SciBERT: a pretrained language model for scientific text. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, pp. 3615–3620.
* A. Bender and I. Cortés-Ciriano (2021) Artificial intelligence in drug discovery: what is realistic, what are illusions? part 1: ways to make an impact, and why we are not there yet. Drug discovery today 26(2), pp. 511–524.
* Y. Bengio et al. (2025) Statement on biosecurity risks at the convergence of ai and the life sciences. Note: [https://www.nti.org/analysis/articles/statement-on-biosecurity-risks-at-the-convergence-of-ai-and-the-life-sciences/] Nuclear Threat Initiative. Accessed: 2025-08-14.
* M. Bensberg and M. Reiher (2024) Uncertainty-aware first-principles exploration of chemical reaction networks. The Journal of Physical Chemistry A 128(22), pp. 4532–4547.
* J. Bi, Y. Xu, F. Conrad, H. Wiemer, and S. Ihlenfeldt (2025) A comprehensive benchmark of active learning strategies with automl for small-sample regression in materials science. Scientific Reports 15(1), pp. 37167.
* A. Biswas, J. Rade, N. Masud, M. H. H. Hasib, A. Balu, J. Zhang, S. Sarkar, A. Krishnamurthy, J. Ren, and A. Sarkar (2025) Conversational llm-based decision support for defect classification in afm images. IEEE Open Journal of Instrumentation and Measurement.
* A. Biswas, Y. Liu, N. Creange, Y. Liu, S. Jesse, J. Yang, S. V. Kalinin, M. A. Ziatdinov, and R. K. Vasudevan (2024) A dynamic bayesian optimized active recommender system for curiosity-driven partially human-in-the-loop automated experiments. npj Computational Materials 10(1), pp. 29.
* W. Blau, V. G. Cerf, J. Enriquez, J. S. Francisco, U. Gasser, M. L. Gray, M. Greaves, B. J. Grosz, K. H. Jamieson, G. H. Haug, et al. (2024) Protecting scientific integrity in an age of generative ai. Vol. 121, National Academy of Sciences.
* D. A. Boiko, R. MacKnight, and G. Gomes (2023a) Emergent autonomous scientific research capabilities of large language models. arXiv preprint arXiv:2304.05332.
* D. A. Boiko, R. MacKnight, B. Kline, and G. Gomes (2023b) Autonomous chemical research with large language models. Nature 624(7992), pp. 570–578.
* D. A. Boiko, R. MacKnight, B. Kline, and G. Gomes (2023c) Autonomous chemical research with large language models. Nature 624(7992), pp. 570–578.
* M. Bollaert, O. Augereau, and G. Coppin (2023) Measuring and calibrating trust in artificial intelligence. In IFIP Conference on Human-Computer Interaction, pp. 232–237.
* R. Bommasani, D. A. Hudson, E. Adeli, R. Altman, S. Arora, S. von Arx, M. S. Bernstein, J. Bohg, A. Bosselut, E. Brunskill, E. Brynjolfsson, S. Buch, D. Card, R. Castellon, N. Chatterji, A. Chen, K. Creel, J. Q. Davis, D. Demszky, C. Donahue, M. Doumbouya, E. Durmus, S. Ermon, J. Etchemendy, K. Ethayarajh, L. Fei-Fei, C. Finn, T. Gale, L. Gillespie, K. Goel, N. Goodman, S. Grossman, N. Guha, T. Hashimoto, P. Henderson, J. Hewitt, D. E. Ho, J. Hong, K. Hsu, J. Huang, T. Icard, S. Jain, D. Jurafsky, P. Kalluri, S. Karamcheti, G. Keeling, F. Khani, O. Khattab, P. W. Koh, M. Krass, R. Krishna, R. Kuditipudi, A. Kumar, F. Ladhak, M. Lee, T. Lee, J. Leskovec, I. Levent, X. L. Li, X. Li, T. Ma, A. Malik, C. D. Manning, S. Mirchandani, E. Mitchell, Z. Munyikwa, S. Nair, A. Narayan, D. Narayanan, B. Newman, A. Nie, J. C. Niebles, H. Nilforoshan, J. Nyarko, G. Ogut, L. Orr, I. Papadimitriou, J. S. Park, C. Piech, E. Portelance, C. Potts, A. Raghunathan, R. Reich, H. Ren, F. Rong, Y. Roohani, C. Ruiz, J. Ryan, C. Ré, D. Sadigh, S. Sagawa, K. Santhanam, A. Shih, K. Srinivasan, A. Tamkin, R. Taori, A. W. Thomas, F. Tramèr, R. E. Wang, W. Wang, B. Wu, J. Wu, Y. Wu, S. M. Xie, M. Yasunaga, J. You, M. Zaharia, M. Zhang, T. Zhang, X. Zhang, Y. Zhang, L. Zheng, K. Zhou, and P. Liang (2022a) On the opportunities and risks of foundation models. External Links: 2108.07258.
* R. Bommasani, D. A. Hudson, E. Adeli, R. Altman, S. Arora, S. von Arx, M. S. Bernstein, J. Bohg, A. Bosselut, E. Brunskill, E. Brynjolfsson, S. Buch, D. Card, R. Castellon, N. Chatterji, A. Chen, K. Creel, J. Q. Davis, D. Demszky, C. Donahue, M. Doumbouya, E. Durmus, S. Ermon, J. Etchemendy, K. Ethayarajh, L. Fei-Fei, C. Finn, T. Gale, L. Gillespie, K. Goel, N. Goodman, S. Grossman, N. Guha, T. Hashimoto, P. Henderson, J. Hewitt, D. E. Ho, J. Hong, K. Hsu, J. Huang, T. Icard, S. Jain, D. Jurafsky, P. Kalluri, S. Karamcheti, G. Keeling, F. Khani, O. Khattab, P. W. Koh, M. Krass, R. Krishna, R. Kuditipudi, A. Kumar, F. Ladhak, M. Lee, T. Lee, J. Leskovec, I. Levent, X. L. Li, X. Li, T. Ma, A. Malik, C. D. Manning, S. Mirchandani, E. Mitchell, Z. Munyikwa, S. Nair, A. Narayan, D. Narayanan, B. Newman, A. Nie, J. C. Niebles, H. Nilforoshan, J. Nyarko, G. Ogut, L. Orr, I. Papadimitriou, J. S. Park, C. Piech, E. Portelance, C. Potts, A. Raghunathan, R. Reich, H. Ren, F. Rong, Y. Roohani, C. Ruiz, J. Ryan, C. Ré, D. Sadigh, S. Sagawa, K. Santhanam, A. Shih, K. Srinivasan, A. Tamkin, R. Taori, A. W. Thomas, F. Tramèr, R. E. Wang, W. Wang, B. Wu, J. Wu, Y. Wu, S. M. Xie, M. Yasunaga, J. You, M. Zaharia, M. Zhang, T. Zhang, X. Zhang, Y. Zhang, L. Zheng, K. Zhou, and P. Liang (2022b) On the opportunities and risks of foundation models. External Links: 2108.07258.
* R. Bommasani, D. A. Hudson, E. Adeli, R. Altman, S. Arora, S. von Arx, M. S. Bernstein, J. Bohg, A. Bosselut, E. Brunskill, D. Byrd, …, and P. Liang (2021) On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258.
* S. Borgeaud, A. Mensch, J. Hoffmann, T. Cai, E. Rutherford, K. Millican, G. B. Van Den Driessche, J. Lespiau, B. Damoc, A. Guy, J. Menick, R. Ring, T. Hennigan, S. Huang, L. Maggiore, C. Jones, A. Cassirer, A. Brock, M. Paganini, G. Irving, O. Vinyals, S. Osindero, K. Simonyan, J. Rae, E. Elsen, and L. Sifre (2022) Improving language models by retrieving from trillions of tokens. In Proceedings of the 39th International Conference on Machine Learning, K. Chaudhuri, S. Jegelka, L. Song, C. Szepesvari, G. Niu, and S. Sabato (Eds.), Proceedings of Machine Learning Research, Vol. 162, pp. 2206–2240.
* C. J. Brabec, V. Dyakonov, J. Parisi, and N. S. Sariciftci (2003) Organic photovoltaics: concepts and realization. Vol. 60, Springer Science & Business Media.
* A. M. Bran, S. Cox, O. Schilter, C. Baldassari, A. D. White, and P. Schwaller (2024) ChemCrow: augmenting large-language models with chemistry tools. Nature Machine Intelligence 6, pp. 525–535.
* L. Breiman (2001) Random forests. Machine learning 45(1), pp. 5–32.
* T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, A. Saxe, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Amodei (2020) Language models are few-shot learners. In Advances in Neural Information Processing Systems, Vol. 33, pp. 1877–1901.
* B. Burger, P. M. Maffettone, V. V. Gusev, C. M. Aitchison, Y. Bai, X. Wang, X. Li, B. M. Alston, B. Li, R. Clowes, N. Rankin, B. Harris, R. S. Sprick, and A. I. Cooper (2020) A mobile robotic chemist. Nature 583(7815), pp. 237–241.
* K. T. Butler, D. W. Davies, H. Cartwright, O. Isayev, and A. Walsh (2018) Machine learning for molecular and materials science. Nature 559(7715), pp. 547–555.
* D. A. Calian, G. Farquhar, I. Kemaev, L. M. Zintgraf, M. Hessel, J. Shar, J. Oh, A. György, T. Schaul, J. Dean, H. van Hasselt, and D. Silver (2025) DataRater: meta-learned dataset curation. External Links: 2505.17895.
* R. Caruana (1997) Multitask learning. Machine learning 28(1), pp. 41–75.
* J. M. Cavanagh, K. Sun, A. Gritsevskiy, D. Bagni, Y. Wang, T. D. Bannister, and T. Head-Gordon (2025) SmileyLlama: modifying large language models for directed chemical space exploration. External Links: 2409.02231.
* H. Chaib, L. Mohammedi, L. Benmebrouk, A. Boukraa, B. Daoudi, and A. Achouri (2020) Effect of metal atom substitutions in li based hydrides for hydrogen storage. International Journal of Hydrogen Energy 45(53), pp. 28920–28929.
* I. Chalkidis, M. Fergadiotis, P. Malakasiotis, N. Aletras, and I. Androutsopoulos (2020) LEGAL-BERT: the muppets straight out of law school. In Findings of the Association for Computational Linguistics: EMNLP 2020, T. Cohn, Y. He, and Y. Liu (Eds.), Online, pp. 2898–2904.
* J. Chang, P. Nikolaev, J. Carpena-Núñez, R. Rao, K. Decker, A. E. Islam, J. Kim, M. A. Pitt, J. I. Myung, and B. Maruyama (2020) Efficient closed-loop maximization of carbon nanotube growth rate using bayesian optimization. Scientific reports 10(1), pp. 9040.
* L. Chanussot, A. Das, S. Goyal, T. Lavril, M. Shuaibi, M. Riviere, K. Tran, J. Heras-Domingo, C. Ho, W. Hu, et al. (2021a) The open catalyst 2020 (OC20) dataset and community challenges. ACS Catalysis 11(10), pp. 6059–6072.
* L. Chanussot, A. Das, S. Goyal, T. Lavril, M. Shuaibi, M. Riviere, K. Tran, J. Heras-Domingo, C. Ho, W. Hu, A. Palizhati, A. Sriram, B. Wood, J. Yoon, D. Parikh, C. L. Zitnick, and Z. Ulissi (2021b) Open catalyst 2020 (oc20) dataset and community challenges. ACS Catalysis 11(10), pp. 6059–6072.
* H. Chase (2022) LangChain.
* C. Chen and S. P. Ong (2022) A universal graph deep learning interatomic potential for the periodic table. Nature Computational Science 2, pp. 718–728.
* C. Chen, W. Ye, Y. Zuo, C. Zheng, and S. P. Ong (2019) Graph networks as a universal machine learning framework for molecules and crystals. Chemistry of materials 31(9), pp. 3564–3572.
* M. Chen, J. Tworek, H. Jun, Q. Yuan, H. Pondé, J. Kaplan, H. Edwards, Y. Burda, N. Joseph, G. Brockman, A. Ray, R. Puri, G. Krueger, M. Petrov, H. Khlaaf, G. Sastry, P. Mishkin, B. Chan, S. Gray, N. Ryder, M. Pavlov, A. Power, L. Kaiser, M. Bavarian, C. Winter, P. Tillet, F. P. Such, D. W. Cummings, M. Plappert, F. Chantzis, E. Barnes, A. Herbert-Voss, W. H. Guss, A. Nichol, I. Babuschkin, S. Balaji, S. Jain, A. Carr, J. Leike, J. Achiam, V. Misra, E. Morikawa, A. Radford, M. M. Knight, M. Brundage, M. Murati, K. Mayer, P. Welinder, B. McGrew, D. Amodei, S. McCandlish, I. Sutskever, and W. Zaremba (2021) Evaluating large language models trained on code. ArXiv abs/2107.03374.
* Y. Chen, L. Lu, G. E. Karniadakis, and L. Dal Negro (2020) Physics-informed neural networks for inverse problems in nano-optics and metamaterials. Optics express 28(8), pp. 11618–11633.
* Z. Chen, Y. Xie, Y. Wu, Y. Lin, S. Tomiya, and J. Lin (2024) An interpretable and transferrable vision transformer model for rapid materials spectra classification. Digital Discovery 3(2), pp. 369–380.
* Z. Chen, Y. Luo, and M. Sra (2025) Engaging with ai: how interface design shapes human-ai collaboration in high-stakes decision-making. External Links: 2501.16627.
* S. R. Chitturi, A. Ramdas, Y. Wu, B. Rohr, S. Ermon, J. Dionne, F. H. d. Jornada, M. Dunne, C. Tassone, W. Neiswanger, et al. (2024) Targeted materials discovery using bayesian algorithm execution. npj Computational Materials 10(1), pp. 156.
* K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio (2014) Learning phrase representations using rnn encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.
* J. Choi and B. Lee (2024) Accelerating materials language processing with large language models. Communications Materials 5(1), pp. 13.
* K. Choudhary, B. DeCost, C. Chen, et al. (2022) Recent advances and applications of deep learning methods in materials science. npj Computational Materials 8(59).
* K. Choudhary and B. DeCost (2021a) Atomistic line graph neural network for improved materials property predictions. npj Computational Materials 7(185). Note: Published 15 November 2021.
* K. Choudhary and B. DeCost (2021b) Atomistic line graph neural network for improved materials property predictions. npj Computational Materials 7(1), pp. 185.
* K. Choudhary (2025) MicroscopyGPT: generating atomic-structure captions from microscopy images of 2d materials with vision-language transformers. The Journal of Physical Chemistry Letters 16, pp. 7028–7035.
* J. Chung, J. Zhang, A. I. Saimon, Y. Liu, B. N. Johnson, and Z. Kong (2024) Imbalanced spectral data analysis using data augmentation based on the generative adversarial network. Scientific Reports 14(1), pp. 13230.
* C. Cleeton and L. Sarkisov (2025) Inverse design of metal-organic frameworks using deep dreaming approaches. Nature Communications 16(1), pp. 4806.
* K. Cobbe, V. Kosaraju, M. Bavarian, M. Chen, H. Jun, L. Kaiser, M. Plappert, J. Tworek, J. Hilton, R. Nakano, C. Hesse, and J. Schulman (2021) Training verifiers to solve math word problems. ArXiv abs/2110.14168.
* N. Computation (2016) Long short-term memory. Neural Comput 9, pp. 1735–1780.
* C. Cortes and V. Vapnik (1995) Support-vector networks. Machine learning 20(3), pp. 273–297.
* C. J. Court and J. M. Cole (2020) Magnetic and superconducting phase diagrams and transition temperatures predicted using text mining and machine learning. npj Computational Materials 6(1), pp. 18.
* H. Cui and T. Yasseri (2024) AI-enhanced collective intelligence. Patterns 5(11).
* V. T. da Silva, A. Rademaker, K. Lionti, R. Giro, G. Lima, S. Fiorini, M. Archanjo, B. W. Carvalho, R. Neumann, A. Souza, et al. (2024) Automated, llm enabled extraction of synthesis details for reticular materials from scientific literature. arXiv preprint arXiv:2411.03484.
* J. Dagdelen, A. Dunn, S. Lee, N. Walker, A. S. Rosen, G. Ceder, K. A. Persson, and A. Jain (2024) Structured information extraction from scientific text with large language models. Nature Communications 15(1), pp. 1418.
* Y. Dan, D. Jha, A. Gupta, and A. Allu (2024) Pretraining strategies for structure agnostic material property prediction. Journal of Chemical Information and Modeling 64(5), pp. 1625–1635.
* K. Das, B. Samanta, P. Goyal, S. Lee, S. Bhattacharjee, and N. Ganguly (2022) CrysXPP: an explainable property predictor for crystalline materials. npj Computational Materials 8(1), pp. 43.
* P. De Breuck, M. L. Evans, and G. Rignanese (2022) Accurate experimental band gap predictions with multifidelity correction learning. Journal of Materials Informatics 2(1), pp. 10.
* R. Dehghannasiri, D. Xue, P. V. Balachandran, M. R. Yousefi, L. A. Dalton, T. Lookman, and E. R. Dougherty (2017) Optimal experimental design for materials discovery. Computational Materials Science 129, pp. 311–322.
* B. Deng, P. Zhong, K. Jun, J. Riebesell, K. Han, C. J. Bartel, and G. Ceder (2023) CHGNet as a pretrained universal neural network potential for charge-informed atomistic modelling. Nature Machine Intelligence 5(9), pp. 1031–1041.
* J. Devlin, M. Chang, K. Lee, and K. Toutanova (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, Minnesota, pp. 4171–4186.
* Q. Ding, S. Miret, and B. Liu (2025) MatExpert: decomposing materials discovery by mimicking human experts. In The Thirteenth International Conference on Learning Representations.
* F. Dinic, Z. Wang, I. Neporozhnii, U. B. Salim, R. Bajpai, N. Rajiv, V. Chavda, V. Radhakrishnan, and O. Voznyy (2023) Strain data augmentation enables machine learning of inorganic crystal geometry optimization. Patterns 4(2).
* S. Dohare, J. F. Hernandez-Garcia, Q. Lan, P. Rahman, A. R. Mahmood, and R. S. Sutton (2024) Loss of plasticity in deep continual learning. Nature 632(8026), pp. 768–774.
* Z. Du, L. Jin, L. Shu, Y. Cen, Y. Xu, Y. Mei, and H. Zhang (2024) CTGNN: crystal transformer graph neural network for crystal material property prediction. arXiv preprint arXiv:2405.11502.
* A. Dunn, Q. Wang, A. Ganose, D. Dopp, and A. Jain (2020) Benchmarking materials property prediction methods: the Matbench test set and Automatminer reference algorithm. npj Computational Materials 6(1), pp. 138.
* N. Dziri, X. Lu, M. Sclar, X. L. Li, L. Jiang, B. Y. Lin, P. West, C. Bhagavatula, R. L. Bras, J. D. Hwang, S. Sanyal, S. Welleck, X. Ren, A. Ettinger, Z. Harchaoui, and Y. Choi (2023) Faith and fate: limits of transformers on compositionality. External Links: 2305.18654.
* C. Edwards, C. Han, G. Lee, T. Nguyen, S. Szymkuc, C. K. Prasad, B. Jin, J. Han, Y. Diao, G. Liu, H. Peng, B. A. Grzybowski, M. D. Burke, and H. Ji (2025) MCLM: a modular chemical language model that generates functional and makeable molecules. In arxiv.
* C. Edwards, T. Lai, K. Ros, G. Honke, K. Cho, and H. Ji (2022) Translation between molecules and natural language. In Proc. The 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP2022).
* R. W. Epps, M. S. Bowen, A. A. Volk, K. Abdel-Latif, S. Han, K. G. Reyes, A. Amassian, and M. Abolhasani (2020) Artificial chemist: an autonomous quantum dot synthesis bot. Advanced Materials 32(30), pp. 2001626.
* D. Eriksson, M. Pearce, J. R. Gardner, R. Turner, and M. Poloczek (2019) Scalable global optimization via local bayesian optimization. In Advances in Neural Information Processing Systems 32 (NeurIPS 2019).
* F. Faber, A. Lindmaa, O. A. Von Lilienfeld, and R. Armiento (2015) Crystal structure representations for machine learning models of formation energies. International Journal of Quantum Chemistry 115(16), pp. 1094–1101.
* Q. Fang, G. Xiong, M. Zhou, T. S. Tamir, C. Yan, H. Wu, Z. Shen, and F. Wang (2022a) Process monitoring, diagnosis and control of additive manufacturing. IEEE Transactions on Automation Science and Engineering 21(1), pp. 1041–1067.
* Y. Fang, Q. Zhang, H. Yang, X. Zhuang, S. Deng, W. Zhang, M. Qin, Z. Chen, X. Fan, and H. Chen (2022b) Molecular contrastive learning with chemical element knowledge graph. In Proceedings of the AAAI conference on artificial intelligence, Vol. 36, pp. 3968–3976.
* Y. Fang, Q. Zhang, N. Zhang, Z. Chen, X. Zhuang, X. Shao, X. Fan, and H. Chen (2023) Knowledge graph-enhanced molecular contrastive learning with functional prompt. Nature Machine Intelligence 5(5), pp. 542–553.
* Z. Fang and J. Zhan (2019) Deep physical informed neural networks for metamaterial design. Ieee Access 8, pp. 24506–24513.
* C. Fare, P. Fenner, M. Benatan, A. Varsi, and E. O. Pyzer-Knapp (2022) A multi-fidelity machine learning approach to high throughput materials screening. npj Computational Materials 8(1), pp. 257.
* C. Finn, P. Abbeel, and S. Levine (2017) Model-agnostic meta-learning for fast adaptation of deep networks. In International conference on machine learning, pp. 1126–1135.
* G. First Author, G. Second Author, et al. (2025) MatVQA: a visual question answering benchmark for materials science. External Links: 2505.18319.
* V. Fung, J. Zhang, G. Hu, P. Ganesh, and B. G. Sumpter (2021a) Inverse design of two-dimensional materials with invertible neural networks. External Links: 2106.03013.
* V. Fung, J. Zhang, E. Juarez, and B. G. Sumpter (2021b) Benchmarking graph neural networks for materials chemistry. npj Computational Materials 7(1), pp. 84.
* J. Gan, P. Zhong, Y. Du, Y. Zhu, C. Duan, H. Wang, D. Schwalbe-Koda, C. P. Gomes, K. Persson, and W. Wang (2025) Large language models are innate crystal structure generators. In AI for Accelerated Materials Design - ICLR 2025.
* A. M. Ganose, A. J. Jackson, and D. O. Scanlon (2019) Robocrystallographer: automated crystal structure text descriptions and analysis. MRS Communications 9(3), pp. 874–881.
* H. Gao, J. Geng, W. Hua, M. Hu, X. Juan, H. Liu, S. Liu, J. Qiu, X. Qi, Y. Wu, H. Wang, H. Xiao, Y. Zhou, S. Zhang, J. Zhang, J. Xiang, Y. Fang, Q. Zhao, D. Liu, Q. Ren, C. Qian, Z. Wang, M. Hu, H. Wang, Q. Wu, H. Ji, and M. Wang (2025) A survey of self-evolving agents: on path to artificial super intelligence. In arxiv.
* L. Gao, A. Madaan, S. Zhou, U. Alon, P. Liu, Y. Yang, J. Callan, and G. Neubig (2023) PAL: program-aided language models. In Proceedings of the 40th International Conference on Machine Learning, ICML’23.
* C. E. Garcia, D. M. Prett, and M. Morari (1989) Model predictive control: theory and practice—a survey. Automatica 25(3), pp. 335–348.
* S. Garg, D. Tsipras, P. Liang, and G. Valiant (2022) What can transformers learn in-context? a case study of simple function classes. In Proceedings of the 36th International Conference on Neural Information Processing Systems, NIPS ’22, Red Hook, NY, USA.
* Gemini Robotics Team, A. Abdolmaleki, S. Abeyruwan, J. Ainslie, J. Alayrac, M. G. Arenas, A. Balakrishna, N. Batchelor, A. Bewley, J. Bingham, M. Bloesch, K. Bousmalis, P. Brakel, A. Brohan, T. Buschmann, A. Byravan, S. Cabi, K. Caluwaerts, F. Casarini, C. Chan, O. Chang, L. Chappellet-Volpini, J. E. Chen, X. Chen, H. L. Chiang, K. Choromanski, A. Collister, D. B. D’Ambrosio, S. Dasari, T. Davchev, M. K. Dave, C. Devin, N. Di Palo, T. Ding, C. Doersch, A. Dostmohamed, Y. Du, D. Dwibedi, S. T. Egambaram, M. Elabd, T. Erez, X. Fang, C. Fantacci, C. Fong, E. Frey, C. Fu, R. Gao, M. Giustina, K. Gopalakrishnan, L. Graesser, O. Groth, A. Gupta, R. Hafner, S. Hansen, L. Hasenclever, S. Haves, N. Heess, B. Hernaez, A. Hofer, J. Hsu, L. Huang, S. H. Huang, A. Iscen, M. G. Jacob, D. Jain, S. Jesmonth, A. Jindal, R. Julian, D. Kalashnikov, M. E. Karagozler, S. Karp, M. Kecman, J. C. Kew, D. Kim, F. Kim, J. Kim, T. Kipf, S. Kirmani, K. Konyushkova, L. Y. Ku, Y. Kuang, T. Lampe, A. Laurens, T. A. Le, I. Leal, A. X. Lee, T. E. Lee, G. Lever, J. Liang, L. Lin, F. Liu, S. Long, C. Lu, S. Maddineni, A. Majumdar, K. Maninis, A. Marmon, S. Martinez, A. H. Michaely, N. Milonopoulos, J. Moore, R. Moreno, M. Neunert, F. Nori, J. Ortiz, K. Oslund, C. Parada, E. Parisotto, A. Paryag, A. Pooley, T. Power, A. Quaglino, H. Qureshi, R. V. Raju, H. Ran, D. Rao, K. Rao, I. Reid, D. Rendleman, K. Reymann, M. Rivas, F. Romano, Y. Rubanova, P. P. Sampedro, P. R. Sanketi, D. Shah, M. Sharma, K. Shea, M. Shridhar, C. Shu, V. Sindhwani, S. Singh, R. Soricut, R. Sterneck, I. Storz, R. Surdulescu, J. Tan, J. Tompson, S. Tunyasuvunakool, J. Varley, G. Vesom, G. Vezzani, M. B. Villalonga, O. Vinyals, R. Wagner, A. Wahid, S. Welker, P. Wohlhart, C. Wu, M. Wulfmeier, F. Xia, T. Xiao, A. Xie, J. Xie, P. Xu, S. Xu, Y. Xu, Z. Xu, J. Yan, S. Yang, S. Yang, Y. Yang, H. H. Yu, W. Yu, W. Yuan, Y. Yuan, J. Zhang, T. Zhang, Z. Zhang, A. Zhou, G. Zhou, and Y. Zhou (2025) Gemini robotics 1.5: pushing the frontier of generalist robots with advanced embodied reasoning, thinking, and motion transfer. External Links: 2510.03342.
* A. Ghafarollahi and M. J. Buehler (2024a) SciAgents: automating scientific discovery through bioinspired multi-agent intelligent graph reasoning. Advanced Materials 37(22), pp. e2413523.
* A. Ghafarollahi and M. J. Buehler (2024b) SciAgents: automating scientific discovery through multi-agent intelligent graph reasoning. External Links: 2409.05556.
* L. M. Ghiringhelli, C. Baldauf, T. Bereau, S. Brockhauser, C. Carbogno, J. Chamanara, S. Cozzini, S. Curtarolo, C. Draxl, S. Dwaraknath, et al. (2023) Shared metadata for data-centric materials science. Scientific data 10(1), pp. 626.
* S. Ghosh and A. Tewari (2025) Automated extraction of material properties using llm-based ai agents. arXiv preprint arXiv:2510.01235.
* J. Gibson, A. Hire, and R. G. Hennig (2022) Data-augmentation for graph neural network learning of the relaxed energies of unrelaxed structures. npj Computational Materials 8(1), pp. 211.
* J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl (2017) Neural message passing for quantum chemistry. In International conference on machine learning, pp. 1263–1272.
* A. Gladstone, G. Nanduru, M. M. Islam, P. Han, H. Ha, A. Chadha, Y. Du, H. Ji, J. Li, and T. Iqbal (2025) Energy-based transformers are scalable learners and thinkers. External Links: 2507.02092.
* R. E. Goodall and A. A. Lee (2020) Predicting materials properties without crystal structure: deep representation learning from stoichiometry. Nature communications 11(1), pp. 6280.
* R. E. Goodall and A. A. Lee (2024) Pretraining strategies for structure agnostic material property prediction. Journal of Chemical Information and Modeling.
* I. Goodfellow, Y. Bengio, and A. Courville (2016) Deep learning. MIT Press.
* I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial networks. External Links: 1406.2661.
* P. Govindarajan, M. Reymond, A. Clavaud, M. Phielipp, S. Miret, and S. Chandar (2025) CrystalGym: a new benchmark for materials discovery using reinforcement learning. arXiv preprint arXiv:2509.23156.
* S. Gravitas (2023) AutoGPT: Build, deploy, and run AI agents. Note: Latest release: October 22, 2025.
* R. Grosse, J. Bae, C. Anil, N. Elhage, A. Tamkin, A. Tajdini, B. Steiner, D. Li, E. Durmus, E. Perez, E. Hubinger, K. Lukošiūtė, K. Nguyen, N. Joseph, S. McCandlish, J. Kaplan, and S. R. Bowman (2023a) Studying large language model generalization with influence functions. External Links: 2308.03296.
* R. Grosse, J. Bae, C. Anil, N. Elhage, A. Tamkin, A. Tajdini, B. Steiner, D. Li, E. Durmus, E. Perez, E. Hubinger, K. Lukosiute, K. Nguyen, N. Joseph, S. McCandlish, J. Kaplan, and S. R. Bowman (2023b) Studying large language model generalization with influence functions. arXiv preprint arXiv:2308.03296.
* N. Gruver, A. Sriram, A. Madotto, A. G. Wilson, C. L. Zitnick, and Z. W. Ulissi (2024) Fine-tuned language models generate stable inorganic materials as text. In The Twelfth International Conference on Learning Representations.
* A. Gu and T. Dao (2024) Mamba: Linear-Time Sequence Modeling with Selective State Spaces. arXiv preprint arXiv:2312.00752.
* A. Gu, K. Goel, and C. Ré (2021) Efficiently modeling long sequences with structured state spaces. arXiv preprint arXiv:2111.00396.
* S. Guha, A. Mullick, J. Agrawal, S. Ram, S. Ghui, S. Lee, S. Bhattacharjee, and P. Goyal (2021) MatScIE: an automated tool for the generation of databases of methods and parameters used in the computational materials science literature. Computational Materials Science 192, pp. 110325.
* T. Guo, X. Chen, Y. Wang, R. Chang, S. Pei, N. V. Chawla, O. Wiest, and X. Zhang (2024) Large language model based multi-agents: a survey of progress and challenges. arXiv preprint arXiv:2402.01680.
* Z. Guo, C. Zhang, W. Yu, J. Herr, O. Wiest, M. Jiang, and N. V. Chawla (2021) Few-shot graph learning for molecular property prediction. In Proceedings of the web conference 2021, pp. 2559–2567.
* S. Gupta, A. Mahmood, P. Shetty, A. Adeboye, and R. Ramprasad (2024) Data extraction from polymer literature using large language models. Communications materials 5(1), pp. 269.
* T. Gupta, M. Zaki, N. A. Krishnan, and Mausam (2022) MatSciBERT: a materials domain language model for text mining and information extraction. npj Computational Materials 8(1), pp. 102.
* B. He, H. Li, Y. K. Jang, M. Jia, X. Cao, A. Shah, A. Shrivastava, and S. Lim (2024) Ma-lmm: memory-augmented large multimodal model for long-term video understanding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13504–13514.
* T. He, H. Huo, C. J. Bartel, Z. Wang, K. Cruse, and G. Ceder (2023) Precursor recommendation for inorganic synthesis by machine learning materials similarity from scientific literature. Science advances 9(23), pp. eadg8180.
* K. Hira, M. Zaki, D. Sheth, N. A. Krishnan, et al. (2024) Reconstructing the materials tetrahedron: challenges in materials information extraction. Digital Discovery 3(5), pp. 1021–1037.
* A. Holzinger, A. Carrington, and H. Müller (2020) Measuring the quality of explanations: the system causability scale (scs). KI - Künstliche Intelligenz 34(2), pp. 193–198.
* M. J. Hooshmand, C. Sakib-Uz-Zaman, and M. A. H. Khondoker (2023) Machine learning algorithms for predicting mechanical stiffness of lattice structure-based polymer foam. Materials 16(22), pp. 7173.
* G. Houchins, Z. Yuan, and R. Arroyave (2024) Formation energy prediction of material crystal structures using deep learning. arXiv preprint arXiv:2401.00859.
* C. Huang, C. Chen, L. Shi, and C. Chen (2024a) Material property prediction with element attribute knowledge graphs and multimodal representation learning. arXiv preprint arXiv:2411.08414.
* H. Huang, R. Magar, and A. Barati Farimani (2024b) Pretraining strategies for structure agnostic material property prediction. Journal of Chemical Information and Modeling 64(3), pp. 627–637.
* S. Huang and J. M. Cole (2020) A database of battery materials auto-generated using chemdataextractor. Scientific Data 7(1), pp. 260.
* S. Huang and J. M. Cole (2022) BatteryBERT: a pretrained language model for battery database enhancement. Journal of chemical information and modeling 62(24), pp. 6365–6377.
* H. Huo, C. J. Bartel, T. He, A. Trewartha, A. Dunn, B. Ouyang, A. Jain, and G. Ceder (2022) Machine-learning rationalization and prediction of solid-state synthesis conditions. Chemistry of Materials 34(16), pp. 7323–7336.
* S. Ioffe and C. Szegedy (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. In International conference on machine learning, pp. 448–456.
* T. V. Ivanisenko, P. S. Demenkov, and V. A. Ivanisenko (2024) An accurate and efficient approach to knowledge extraction from scientific publications using structured ontology models, graph neural networks, and large language models. International Journal of Molecular Sciences 25(21), pp. 11811.
* R. Jacobs, P. E. Goins, and D. Morgan (2023) Role of multifidelity data in sequential active learning materials discovery campaigns: case study of electronic bandgap. Machine Learning: Science and Technology 4(4), pp. 045060.
* A. Jain et al. (2013) The materials project: a materials genome approach to accelerating materials innovation. APL Materials 1(1), pp. 011002.
* S. Jain and B. C. Wallace (2019) Attention is not Explanation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), J. Burstein, C. Doran, and T. Solorio (Eds.), Minneapolis, Minnesota, pp. 3543–3556.
* D. Jha, L. Ward, A. Paul, W. Liao, A. Choudhary, C. Wolverton, and A. Agrawal (2018) ElemNet: deep learning the chemistry of materials from only elemental composition. Scientific reports 8(1), pp. 17593.
* Z. Ji, N. Lee, R. Frieske, T. Yu, D. Su, Y. Xu, E. Ishii, Y. J. Bang, A. Madotto, and P. Fung (2023) Survey of hallucination in natural language generation. ACM Comput. Surv. 55(12).
* Q. Jiang and G. Karniadakis (2025) AgenticSciML: collaborative multi-agent systems for emergent discovery in scientific machine learning. arXiv preprint arXiv:2511.07262.
* S. Jiang and M. A. Webb (2025) Generative active learning across polymer architectures and solvophobicities for targeted rheological behaviour. npj Computational Materials 11, pp. 90.
* X. Jiang, W. Wang, and Y. Su (2025a) Applications of natural language processing and large language models in materials discovery. npj Computational Materials 11(1), pp. 79.
* X. Jiang, W. Wang, S. Tian, H. Wang, T. Lookman, and Y. Su (2025b) Applications of natural language processing and large language models in materials discovery. npj Computational Materials 11(1), pp. 79.
* H. Jiequn, L. Zhang, C. Roberto, and E. Weinan (2018) Deep potential: a general representation of a many-body potential energy surface. Communications in Computational Physics 23(3), pp. 629–639.
* L. Jin, Z. Du, L. Shu, Y. Cen, Y. Xu, Y. Mei, and H. Zhang (2025) Transformer-generated atomic embeddings to enhance prediction accuracy of crystal properties with machine learning. Nature Communications 16(1), pp. 1210.
* L. Jin, Z. Du, L. Shu, Y. Mei, and H. Zhang (2024) Crystal transformer based universal atomic embedding for accurate and transferable prediction of materials properties. arXiv preprint arXiv:2401.09755.
* J. Jo, E. Choi, M. Kim, and K. Min (2021) Machine learning-aided materials design platform for predicting the mechanical properties of na-ion solid-state electrolytes. ACS Applied Energy Materials 4(8), pp. 7862–7869.
* A. Jolicoeur-Martineau (2025) Less is more: recursive reasoning with tiny networks. arXiv preprint arXiv:2510.04871.
* R. P. Joshi and N. Kumar (2021) Artificial intelligence based autonomous molecular design for medical therapeutic: a perspective. External Links: 2102.06045.
* J. Jumper, R. Evans, A. Pritzel, T. Green, M. Figurnov, O. Ronneberger, K. Tunyasuvunakool, R. Bates, A. Žídek, A. Potapenko, A. Bridgland, C. Meyer, S. A. A. Kohl, A. J. Ballard, A. Cowie, B. Romera-Paredes, S. Nikolov, R. Jain, J. Adler, T. Back, S. Petersen, D. Reiman, E. Clancy, M. Zielinski, M. Steinegger, M. Pacholska, T. Berghammer, S. Bodenstein, D. Silver, O. Vinyals, A. W. Senior, K. Kavukcuoglu, P. Kohli, and D. Hassabis (2021) Highly accurate protein structure prediction with AlphaFold. Nature 596(7873), pp. 583–589.
* D. Kahneman (2011) Thinking, fast and slow. Farrar, Straus and Giroux.
* B. Kailkhura, B. Gallagher, S. Kim, A. Hiszpanski, and T. Y. Han (2019) Reliable and explainable machine-learning methods for accelerated material discovery. npj Computational Materials 5(1), pp. 108.
* S. R. Kalidindi, M. Buzzy, B. L. Boyce, and R. Dingreville (2022) Digital twins for materials. Frontiers in Materials 9, pp. 818535.
* M. Kandavalli, A. Agarwal, A. Poonia, M. Kishor, and K. P. R. Ayyagari (2023) Design of high bulk moduli high entropy alloys using machine learning. Scientific Reports 13(1), pp. 20504.
* C. Kang, X. Liu, and F. Guo (2025) RetroInText: a multimodal large language model enhanced framework for retrosynthetic planning via in-context representation learning. In The Thirteenth International Conference on Learning Representations.
* J. Kaplan, S. McCandlish, T. J. Henighan, T. B. Brown, B. Chess, R. Child, S. Gray, A. Radford, J. Wu, and D. Amodei (2020) Scaling laws for neural language models. arXiv preprint arXiv:2001.08361.
* N. Kazeev, W. Nong, I. Romanov, R. Zhu, A. Ustyuzhanin, S. Yamazaki, and K. Hippalgaonkar (2025) Wyckoff transformer: generation of symmetric crystals. arXiv preprint arXiv:2503.02407.
* A. Kendall and Y. Gal (2017) What uncertainties do we need in Bayesian deep learning for computer vision? Advances in Neural Information Processing Systems 30.
* K. Khan, S. U. Rehman, K. Aziz, S. Fong, and S. Sarasvady (2014) DBSCAN: past, present and future. In The fifth international conference on the applications of digital information and web technologies (ICADIWT 2014), pp. 232–238.
* B. Kim, S. Lee, and J. Kim (2020) Inverse design of porous materials using artificial neural networks. Science Advances 6(1), pp. eaax9324.
* E. Kim, K. Huang, A. Tomala, S. Matthews, E. Strubell, A. Saunders, A. McCallum, and E. Olivetti (2017) Machine-learned and codified synthesis parameters of oxide materials. Scientific Data 4(1), pp. 1–9.
* H. Kim, H. Choi, D. Kang, W. B. Lee, and J. Na (2024a) Materials discovery with extreme properties via reinforcement learning-guided combinatorial chemistry. Chemical Science 15(21), pp. 7908–7925.
* H. Kim, J. Na, and W. B. Lee (2021) Generative chemical transformer: neural machine learning of molecular geometric structures from chemical language via attention. Journal of Chemical Information and Modeling 61(12), pp. 5804–5814.
* J. Kim, Y. Kim, J. Park, Y. Oh, S. Kim, and S. Lee (2024b) MELT: materials-aware continued pre-training for language model adaptation to materials science. arXiv preprint arXiv:2410.15126.
* R. D. King, J. Rowland, S. G. Oliver, M. Young, W. Aubrey, E. Byrne, M. Liakata, M. Markham, P. Pir, L. N. Soldatova, A. Sparkes, K. E. Whelan, and A. Clare (2009) The automation of science. Science 324(5923), pp. 85–89.
* D. P. Kingma and M. Welling (2013) Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
* V. Konda and J. Tsitsiklis (1999) Actor-critic algorithms. Advances in Neural Information Processing Systems 12.
* M. M. Korop and A. V. Prybyla (2025) Application of LLM to search and systematize the properties of thermoelectric materials in scientific literature. Journal of Thermoelectricity, pp. 16–25.
* A. Kotobi, K. Singh, D. Höche, S. Bari, R. H. Meißner, and A. Bande (2023) Integrating explainability into graph neural network models for the prediction of x-ray absorption spectra. Journal of the American Chemical Society 145(41), pp. 22584–22598.
* A. Krizhevsky, I. Sutskever, and G. Hinton (2012) ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems 25.
* C. Kuenneth and R. Ramprasad (2023) PolyBERT: a chemical language model to enable fully machine-driven ultrafast polymer informatics. Nature Communications 14(1), pp. 4099.
* M. Kulichenko, B. Nebgen, N. Lubbers, J. S. Smith, K. Barros, A. E. Allen, A. Habib, E. Shinkle, N. Fedik, Y. W. Li, et al. (2024) Data generation for machine learning interatomic potentials and beyond. Chemical Reviews 124(24), pp. 13681–13714.
* P. Kumar, S. Kabra, and J. M. Cole (2024) A database of stress-strain properties auto-generated from the scientific literature using chemdataextractor. Scientific Data 11(1), pp. 1273.
* P. Kumar, S. Kabra, and J. M. Cole (2025) MechBERT: language models for extracting chemical and property relationships about mechanical stress and strain. Journal of Chemical Information and Modeling 65(4), pp. 1873–1888.
* S. Kumbhar, V. Mishra, K. Coutinho, D. Handa, A. Iquebal, and C. Baral (2025) Hypothesis generation for materials discovery and design using goal-driven and constraint-guided llm agents. arXiv preprint arXiv:2501.13299.
* A. G. Kusne, H. Yu, C. Wu, H. Zhang, J. Hattrick-Simpers, B. DeCost, S. Sarker, C. Oses, C. Toher, S. Curtarolo, A. V. Davydov, R. Agarwal, L. A. Bendersky, M. Li, A. Mehta, and I. Takeuchi (2020) On-the-fly closed-loop materials discovery via Bayesian active learning. Nature Communications 11(1), pp. 5966.
* T. M. Lai, C. Zhai, and H. Ji (2023) Knowledge-enhanced biomedical language models. Journal of Biomedical Informatics.
* B. Lakshminarayanan, A. Pritzel, and C. Blundell (2017) Simple and scalable predictive uncertainty estimation using deep ensembles. In Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.), Vol. 30.
* T. Lan, H. Wang, and Q. An (2024) Enabling high throughput deep reinforcement learning with first principles to investigate catalytic reaction mechanisms. Nature Communications 15(1), pp. 6281.
* P. Langley (1987) Scientific discovery: computational explorations of the creative processes. MIT Press.
* Y. LeCun, Y. Bengio, and G. Hinton (2015) Deep learning. Nature 521(7553), pp. 436–444.
* J. Lee, D. Kim, S. Kim, S. Lee, and H. Kim (2022) Evaluation of principal features for predicting bulk and shear modulus of inorganic solids with machine learning. Scientific Reports 12(1), pp. 1–11.
* G. Lei, R. Docherty, and S. J. Cooper (2024) Materials science in the era of large language models: a perspective. Digital Discovery 3(7), pp. 1257–1272.
* P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W. Yih, T. Rocktäschel, S. Riedel, and D. Kiela (2020) Retrieval-augmented generation for knowledge-intensive NLP tasks. In Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS ’20, Red Hook, NY, USA.
* A. Lewkowycz, A. Andreassen, D. Dohan, E. Dyer, H. Michalewski, V. Ramasesh, A. Slone, C. Anil, I. Schlag, T. Gutman-Solo, Y. Wu, B. Neyshabur, G. Gur-Ari, and V. Misra (2022) Solving quantitative reasoning problems with language models. In Proceedings of the 36th International Conference on Neural Information Processing Systems, NIPS ’22, Red Hook, NY, USA.
* J. Li, D. Li, C. Xiong, and S. C. H. Hoi (2022a) BLIP: bootstrapping language-image pre-training for unified vision-language understanding and generation. In Proceedings of the 39th International Conference on Machine Learning (ICML), Vol. 162, pp. 12888–12900.
* J. Li, J. Tang, W. X. Zhao, J. Nie, and J. Wen (2021) M6: multi-modality to multi-modality multitask mega-transformer for unified pretraining. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pp. 3251–3261.
* Q. Li, N. Fu, S. S. Omee, and J. Hu (2024a) MD-HIT: machine learning for material property prediction with dataset redundancy control. npj Computational Materials 10(1), pp. 245.
* W. Li, W. Wu, M. Chen, J. Liu, X. Xiao, and H. Wu (2022b) Faithfulness in natural language generation: a systematic survey of analysis, evaluation and optimization methods. arXiv preprint arXiv:2203.05227.
* W. Li, Y. Chen, J. Qiu, and X. Wang (2025a) MatWheel: addressing data scarcity in materials science through synthetic data. arXiv preprint arXiv:2504.09152.
* X. Li, L. Wang, Y. Luo, C. Edwards, S. Gui, Y. Lin, H. Ji, and S. Ji (2024b) Geometry informed tokenization of molecules for language model generation. arXiv preprint arXiv:2408.10120.
* Y. Li, F. Cloutier, S. Wu, A. Parviz, B. Knyazev, Y. Zhang, G. Berseth, and B. Liu (2026) M⁴olGen: multi-agent, multi-stage molecular generation under precise multi-property constraints. arXiv preprint arXiv:2601.10131.
* Y. Li, V. Gupta, M. N. T. Kilic, K. Choudhary, D. Wines, W. Liao, A. Choudhary, and A. Agrawal (2025b) Hybrid-LLM-GNN: integrating large language models and graph neural networks for enhanced materials property prediction. Digital Discovery 4(2), pp. 376–383.
* Y. Li, K. Han, Z. Zhang, X. Chen, Y. Wang, Y. Rong, J. E. Gonzalez, and Y. You (2023) Materials informatics transformer: a language model for interpretable materials properties prediction. arXiv preprint arXiv:2308.16259.
* Z. Li, W. Zhao, Y. Li, and J. Sun (2024c) Do influence functions work on large language models? arXiv preprint arXiv:2409.19998.
* J. Lin, H. Yin, W. Ping, Y. Lu, P. Molchanov, A. Tao, H. Mao, J. Kautz, M. Shoeybi, and S. Han (2024) VILA: on pre-training for visual language models. In Proceedings of CVPR 2024.
* Z. Lipton (2018) The mythos of model interpretability: in machine learning, the concept of interpretability is both important and slippery. Queue 16.
* G. Liu, M. Sun, W. Matusik, M. Jiang, and J. Chen (2024a) Multimodal large language models for inverse molecular design with retrosynthetic planning. arXiv preprint arXiv:2410.04223.
* S. Liu, T. Wen, B. Ye, Z. Li, and D. J. Srolovitz (2024b) Large language models for material property predictions: elastic constant tensor prediction and materials design. arXiv preprint arXiv:2405.11975.
* Y. Liu, M. Checa, and R. K. Vasudevan (2024c) Synergizing human expertise and AI efficiency with language model for microscopy operation and automated experiment design. Machine Learning: Science and Technology 5(2), pp. 02LT01.
* LLaMP: large language model made powerful for high-fidelity materials knowledge retrieval and distillation. arXiv preprint arXiv:2401.17244.
* S. Lo Piano (2020) Ethical principles in machine learning and artificial intelligence: cases from the field and possible ways forward. Humanities and Social Sciences Communications 7(1), pp. 1–7.
* C. Lu, C. Lu, R. T. Lange, J. Foerster, J. Clune, and D. Ha (2024) The AI scientist: towards fully automated open-ended scientific discovery. arXiv preprint arXiv:2408.06292.
* S. Lu, Z. Wang, H. Zhang, Q. Wu, L. Gan, C. Zhuang, J. Gu, and T. Lin (2025) Don’t just fine-tune the agent, tune the environment. arXiv preprint arXiv:2510.10197.
* S. M. Lundberg and S. Lee (2017) A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.), Vol. 30.
* A. M. Bran, S. Cox, O. Schilter, C. Baldassari, A. D. White, and P. Schwaller (2024) Augmenting large language models with chemistry tools. Nature Machine Intelligence 6(5), pp. 525–535.
* L. v. d. Maaten and G. Hinton (2008) Visualizing data using t-SNE. Journal of Machine Learning Research 9(Nov), pp. 2579–2605.
* B. P. MacLeod, F. G. Parlane, C. C. Rupnow, K. E. Dettelbach, M. S. Elliott, T. D. Morrissey, T. H. Haley, O. Proskurin, M. B. Rooney, N. Taherimakhsousi, et al. (2022) A self-driving laboratory advances the Pareto front for material properties. Nature Communications 13(1), pp. 995.
* M. Madani, V. Lacivita, Y. Shin, and A. Tarakanova (2025) Accelerating materials property prediction via a hybrid transformer graph framework that leverages four body interactions. npj Computational Materials 11(1), pp. 15.
* B. Madika, A. Saha, C. Kang, B. Buyantogtokh, J. Agar, C. M. Wolverton, P. Voorhees, P. Littlewood, S. Kalinin, and S. Hong (2025) Artificial intelligence for materials discovery, development, and optimization. ACS Nano 19(30), pp. 27116–27158.
* S. A. Malik, T. Doherty, P. Tigas, M. Razzak, Y. Gal, and A. Walsh (2025) Towards dynamic benchmarks for autonomous materials discovery. In AI for Accelerated Materials Design - NeurIPS 2025.
* I. Mandal, J. Soni, M. Zaki, M. M. Smedskjaer, K. Wondraczek, L. Wondraczek, N. N. Gosvami, and N. M. A. Krishnan (2025) Evaluating large language model agents for automation of atomic force microscopy. Nature Communications 16(1), pp. 9104.
* D. Martonová, A. Goriely, and E. Kuhl (2025) Generalized invariants meet constitutive neural networks: a novel framework for hyperelastic materials. Journal of the Mechanics and Physics of Solids, pp. 106352.
* M. J. McDermott, S. S. Dwaraknath, and K. A. Persson (2021) A graph-based network for predicting chemical reaction pathways in solid-state materials synthesis. Nature Communications 12(1), pp. 3097.
* J. B. MacQueen (1967) Some methods for classification and analysis of multivariate observations. In Proc. of 5th Berkeley Symposium on Math. Stat. and Prob., pp. 281–297.
* J. Medina, A. W. Ziaullah, H. Park, I. E. Castelli, A. Shaon, H. Bensmail, and F. El-Mellouhi (2022) Accelerating the adoption of research data management strategies. Matter 5(11), pp. 3614–3642.
* A. Merchant, S. Batzner, S. S. Schoenholz, M. Aykol, G. Cheon, and E. D. Cubuk (2023) Scaling deep learning for materials discovery. Nature 624(7990), pp. 80–85.
* H. Metni, L. Ruple, L. N. Walters, L. Torresi, J. Teufel, H. Schopmans, J. Östreicher, Y. Zhang, M. Neubert, Y. Koide, K. Steiner, P. Link, L. Bär, M. Petrova, G. Ceder, and P. Friederich (2025) Generative models for crystalline materials. Note: v1 submitted 2025.
* METR (2025) Recent frontier models are reward hacking. Note: Blog post [https://metr.org/blog/2025-06-05-recent-reward-hacking/].
* A. Mirza, L. Yang, A. K. Chandran, J. Östreicher, S. Bompas, B. Kazimi, S. Kesselheim, P. Friederich, S. Sandfeld, and K. M. Jablonka (2025) MatBind: probing the multimodality of materials science with contrastive learning. In ICLR AI4Mat Workshop (ICLR 2025). Note: OpenReview preprint, CC BY 4.0.
* V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller (2013) Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.
* J. H. Montoya, K. T. Winther, R. A. Flores, T. Bligaard, J. S. Hummelshøj, and M. Aykol (2020) Autonomous intelligent agents for accelerated materials discovery. Chemical Science 11(32), pp. 8517–8532.
* V. Moro, C. Loh, R. Dangovski, A. Ghorashi, A. Ma, Z. Chen, S. Kim, P. Y. Lu, T. Christensen, and M. Soljačić (2025) Multimodal foundation models for material property prediction and discovery. Newton 1(1), pp. 100016.
* Multi-modal conditional diffusion model using signed distance functions for metal-organic frameworks generation. Nature Communications 16(1). Note: published 2025.
* A. Musaelian, S. Batzner, A. Johansson, L. Sun, C. J. Owen, M. Kornbluth, and B. Kozinsky (2023) Learning local equivariant representations for large-scale atomistic dynamics. Nature Communications 14(1), pp. 579.
* M. Muthyala, F. Sorourifar, and J. A. Paulson (2024) TorchSISSO: a PyTorch-based implementation of the sure independence screening and sparsifying operator for efficient and interpretable model discovery. Digital Chemical Engineering 13, pp. 100198.
* R. R. Naik, A. Tiihonen, J. Thapa, C. Batali, Z. Liu, S. Sun, and T. Buonassisi (2022) Discovering equations that govern experimental materials stability under environmental stress using scientific machine learning. npj Computational Materials 8(1), pp. 72.
* A. S. Nair, L. Foppa, and M. Scheffler (2025) Materials-discovery workflow guided by symbolic regression for identifying acid-stable oxides for electrocatalysis. npj Computational Materials 11(1), pp. 150.
* R. Nakano, J. Hilton, S. Balaji, J. Wu, O. Long, C. Kim, C. Hesse, S. Jain, V. Kosaraju, W. Saunders, X. Jiang, K. Cobbe, T. Eloundou, G. Krueger, K. Button, M. Knight, B. Chess, and J. Schulman (2021) WebGPT: browser-assisted question-answering with human feedback. arXiv preprint arXiv:2112.09332.
* T. Nguyen, T. Torres-Flores, C. Hwang, C. Edwards, Y. Diao, and H. Ji (2024) GLaD: synergizing molecular graphs and language descriptors for enhanced power conversion efficiency prediction in organic photovoltaic devices. In Proc. 33rd ACM International Conference on Information and Knowledge Management (CIKM 2024).
* A. Nigam, R. Pollice, M. Krenn, G. d. P. Gomes, and A. Aspuru-Guzik (2021) Beyond generative models: superfast traversal, optimization, novelty, exploration and discovery (STONED) algorithm for molecules using SELFIES. Chemical Science 12(20), pp. 7079–7090.
* E. Nijkamp, B. Pang, H. Hayashi, L. Tu, H. Wang, Y. Zhou, S. Savarese, and C. Xiong (2022) CodeGen: an open large language model for code with multi-turn program synthesis. In International Conference on Learning Representations.
* P. Nikolaev, D. Hooper, F. Webber, R. Rao, K. Decker, M. Krein, J. Poleski, R. Barto, and B. Maruyama (2016) Autonomy in materials research: a case study in carbon nanotube growth. npj Computational Materials 2(1), pp. 16031.
* C. Novelli, M. Taddeo, and L. Floridi (2024) Accountability in artificial intelligence: what it is and how it works. AI & Society 39(4), pp. 1871–1882.
* NVIDIA, A. Azzolini, H. Brandon, P. Chattopadhyay, H. Chen, J. Chu, Y. Cui, J. Diamond, Y. Ding, F. Ferroni, R. Govindaraju, J. Gu, S. Gururani, I. El Hanafi, Z. Hao, J. Huffman, J. Jin, B. Johnson, R. Khan, G. Kurian, E. Lantz, N. Lee, Z. Li, X. Li, T. Lin, Y. Lin, M. Liu, A. Mathau, Y. Ni, L. Pavao, W. Ping, D. W. Romero, M. Smelyanskiy, S. Song, L. Tchapmi, A. Z. Wang, B. Wang, H. Wang, F. Wei, J. Xu, Y. Xu, X. Yang, Z. Yang, X. Zeng, and Z. Zhang (2025) Cosmos-reason1: from physical common sense to embodied reasoning.
* J. Ock, J. Montoya, D. Schweigert, L. Hung, S. K. Suram, and W. Ye (2024) UniMat: unifying materials embeddings through multi-modal learning. arXiv preprint arXiv:2411.08664.
* R. Odobesku, K. Romanova, S. Mirzaeva, O. Zagorulko, R. Sim, R. Khakimullin, J. Razlivina, A. Dmitrenko, and V. Vinogradov (2025) NanoMINER: multimodal information extraction for nanomaterials. In AI for Accelerated Materials Design-ICLR 2025.
* M. O. Oduoye, B. Javed, N. Gupta, and C. M. V. Sih (2023) Algorithmic bias and research integrity; the role of nonhuman authors in shaping scientific knowledge with respect to artificial intelligence: a perspective. International Journal of Surgery 109(10), pp. 2987–2990.
* J. Oh, G. Farquhar, I. Kemaev, D. A. Calian, M. Hessel, L. Zintgraf, S. Singh, H. van Hasselt, and D. Silver (2025) Discovering state-of-the-art reinforcement learning algorithms. Nature.
* M. Omidvar, H. Zhang, A. A. Ihalage, T. G. Saunders, H. Giddens, M. Forrester, S. Haq, and Y. Hao (2024) Accelerated discovery of perovskite solid solutions through automated materials synthesis and characterization. Nature Communications 15(1), pp. 6554.
* K. Oono and T. Suzuki (2019) Graph neural networks exponentially lose expressive power for node classification. arXiv preprint arXiv:1905.10947.
* L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray, et al. (2022) Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems 35, pp. 27730–27744.
* R. Ouyang, S. Curtarolo, E. Ahmetcik, M. Scheffler, and L. M. Ghiringhelli (2018) SISSO: a compressed-sensing method for identifying the best low-dimensional descriptor in an immensity of offered candidates. Phys. Rev. Mater. 2, pp. 083802.
* F. Oviedo, J. L. Ferres, T. Buonassisi, and K. T. Butler (2022) Interpretable and explainable machine learning for materials science and chemistry. Accounts of Materials Research 3(6), pp. 597–607.
* C. Pai, H. Hsueh, and S. Hsu (2025) Reliability of deep learning models for scanning electron microscopy analysis. In AI for Accelerated Materials Design - ICLR 2025.
* P. Pak and A. Barati Farimani (2025) AdditiveLLM: large language models predict defects in metals additive manufacturing. Available at SSRN 5144227.
* E. Pan, C. Karpovich, and E. Olivetti (2022) Deep reinforcement learning for inverse inorganic materials design. arXiv preprint arXiv:2210.11931.
* G. Panapitiya, E. Saldanha, H. Job, and O. Hess (2025) AutoLabs: cognitive multi-agent systems with self-correction for autonomous chemical experimentation.
* A. Paruchuri, Y. Wang, X. Gu, and A. Jayaraman (2024) Machine learning for analyzing atomic force microscopy (AFM) images generated from polymer blends. Digital Discovery 3(12), pp. 2533–2550.
* A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al. (2019) PyTorch: an imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems 32.
* H. Paulheim (2016) Knowledge graph refinement: a survey of approaches and evaluation methods. Semantic Web 8(3), pp. 489–508.
* I. Peivaste, S. Belouettar, F. Mercuri, N. Fantuzzi, H. Dehghani, R. Izadi, H. Ibrahim, J. Lengiewicz, M. Belouettar-Mathis, K. Bendine, A. Makradi, M. Horsch, P. Klein, M. El-Hachemi, H. A. Preisig, et al. (2025) Artificial intelligence in materials science and engineering: current landscape, key challenges, and future trajectories. Composite Structures 372, pp. 119419.
* Z. Peng, W. Wang, L. Dong, Y. Hao, S. Huang, S. Ma, and F. Wei (2023) KOSMOS-2: grounding multimodal large language models to the world. arXiv preprint arXiv:2306.14824.
* G. Pilania, J. Gubernatis, and T. Lookman (2017) Multi-fidelity machine learning models for accurate bandgap predictions of solids. Computational Materials Science 129, pp. 156–163.
* A. Plaat, M. van Duijn, N. van Stein, M. Preuss, P. van der Putten, and K. J. Batenburg (2025) Agentic large language models, a survey. arXiv preprint arXiv:2503.23037.
* M. P. Polak and D. Morgan (2024) Extracting accurate materials data from research papers with conversational language models and prompt engineering. Nature Communications 15(1), pp. 1569.
* C. Polat, H. Kurban, E. Serpedin, and M. Kurban (2025) TDCM25: a multi-modal multi-task benchmark for temperature-dependent crystalline materials. In AI for Accelerated Materials Design - ICLR 2025.
* T. Prein, E. Pan, T. Doerr, E. Olivetti, and J. L. Rupp (2023) MTEncoder: a multi-task pretrained transformer encoder for materials representation learning. In AI for Accelerated Materials Design-NeurIPS 2023 Workshop.
* T. Prein, E. Pan, S. Haddouti, M. Lorenz, J. Jehkul, T. Wilk, C. Moran, M. P. Fotiadis, A. P. Toshev, E. Olivetti, et al. (2025) Retro-rank-in: a ranking-based approach for inorganic materials synthesis planning. arXiv preprint arXiv:2502.04289.
* M. H. Prince, H. Chan, A. Vriza, T. Zhou, V. K. Sastry, Y. Luo, M. T. Dearing, R. J. Harder, R. K. Vasudevan, and M. J. Cherukara (2024) Opportunities for retrieval and tool augmented large language models in scientific facilities. npj Computational Materials 10(1), pp. 251.
* E. O. Pyzer-Knapp, M. Manica, P. Staar, L. Morin, P. Ruch, T. Laino, J. R. Smith, and A. Curioni (2025) Foundation models for materials discovery – current state and future directions. npj Computational Materials 11(1), pp. 61.
* E. O. Pyzer-Knapp, J. W. Pitera, P. W. J. Staar, S. Takeda, T. Laino, D. P. Sanders, J. Sexton, J. R. Smith, and A. Curioni (2022) Accelerating materials discovery using artificial intelligence, high performance computing and robotics. npj Computational Materials 8(1), pp. 84.
* X. Qian, B. Ju, P. Shen, K. Yang, L. Li, and Q. Liu (2024) Meta learning with attention based FP-GNNs for few-shot molecular property prediction. ACS Omega 9(22), pp. 23940–23948.
* X. Qian, B. Yoon, R. Arróyave, X. Qian, and E. R. Dougherty (2023) Knowledge-driven learning, optimization, and experimental design under uncertainty for materials discovery. Patterns 4(11), pp. 100863.
* Y. Qin, S. Liang, Y. Ye, K. Zhu, L. Yan, Y. Lu, Y. Lin, X. Cong, X. Tang, B. Qian, S. Zhao, R. Tian, R. Xie, J. Zhou, M. Gerstein, D. Li, Z. Liu, and M. Sun (2023) ToolLLM: facilitating large language models to master 16000+ real-world APIs. arXiv preprint arXiv:2307.16789.
* H. Qiu and Z. Sun (2024) On-demand reverse design of polymers with PolyTAO. npj Computational Materials 10(1), pp. 273.
* A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, et al. (2021) Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, pp. 8748–8763.
* M. Ragone, R. Shahbazian-Yassar, F. Mashayek, and V. Yurkiv (2023) Deep learning modeling in microscopy imaging: a review of materials science applications. Progress in Materials Science 138, pp. 101165.
* R. Ramakrishnan, P. O. Dral, M. Rupp, and O. A. von Lilienfeld (2014) Quantum chemistry structures and properties of 134 kilo molecules. Scientific Data 1.
* A. Ramlaoui, M. Siron, I. Djafar, J. Musielewicz, A. Rossello, V. Schmidt, and A. Duval (2025) LeMat-Traj: a scalable and unified dataset of materials trajectories for atomistic modeling. arXiv preprint arXiv:2508.20875.
* G. Ramos, C. Meek, P. Simard, J. Suh, and S. Ghorashi (2020) Interactive machine teaching: a human-centered approach to building machine-learned models. Human–Computer Interaction 35(5-6), pp. 413–451.
* M. C. Ramos, C. J. Collison, and A. D. White (2025) A review of large language models and autonomous agents in chemistry. Chemical Science 16(6), pp. 2514–2572.
* R. Ramprasad, R. Batra, G. Pilania, A. Mannodi-Kanakkithodi, and C. Kim (2017) Machine learning in materials informatics: recent applications and prospects. npj Computational Materials 3(1), pp. 54.
* L. Regenwetter, Y. A. Obaideh, and F. Ahmed (2024) MCD: a model-agnostic counterfactual search method for multi-modal design modifications. arXiv preprint arXiv:2305.11308.
* M. Ribeiro, S. Singh, and C. Guestrin (2016) “Why should I trust you?”: explaining the predictions of any classifier. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations, J. DeNero, M. Finlayson, and S. Reddy (Eds.), San Diego, California, pp. 97–101.
* J. Riebesell, R. E. Goodall, P. Benner, Y. Chiang, B. Deng, G. Ceder, M. Asta, A. A. Lee, A. Jain, and K. A. Persson (2025) A framework to evaluate machine learning crystal stability predictions. Nature Machine Intelligence 7(6), pp. 836–847.
* L. M. Roch, F. Häse, and A. Aspuru-Guzik (2020) ChemOS: an orchestration software to democratize autonomous discovery. PLoS ONE 15(4), pp. e0229862.
* R. Roscher, B. Bohn, M. F. Duarte, and J. Garcke (2020) Explainable machine learning for scientific insights and discoveries. IEEE Access 8, pp. 42200–42216.
* A. N. Rubungo, C. Arnold, B. P. Rand, and A. B. Dieng (2023) LLM-Prop: predicting physical and electronic properties of crystalline solids from their text descriptions. arXiv preprint arXiv:2310.14029.
* S. Ruder (2017) An overview of multi-task learning in deep neural networks. arXiv preprint arXiv:1706.05098.
* S. Sadeghi, F. Bateni, T. Kim, D. Y. Son, J. A. Bennett, N. Orouji, V. S. Punati, C. Stark, T. D. Cerra, R. Awad, et al. (2024) Autonomous nanomanufacturing of lead-free metal halide perovskite nanocrystals using a self-driving fluidic lab. Nanoscale 16(2), pp. 580–591.
* S. Sanyal, J. Balachandran, N. Yadati, A. Kumar, P. Rajagopalan, S. Sanyal, and P. Talukdar (2018) MT-CGCNN: integrating crystal graph convolutional neural network with multitask learning for material property prediction. arXiv preprint arXiv:1811.05660.
* M. C. Scharber, D. Mühlbacher, M. Koppe, P. Denk, C. Waldauf, A. J. Heeger, and C. J. Brabec (2006) Design rules for donors in bulk-heterojunction solar cells—towards 10% energy-conversion efficiency. Advanced Materials 18(6), pp. 789–794.
* T. Schick, J. Dwivedi-Yu, R. Dessì, R. Raileanu, M. Lomeli, L. Zettlemoyer, N. Cancedda, and T. Scialom (2023) Toolformer: language models can teach themselves to use tools. arXiv preprint arXiv:2302.04761.
* M. Schilling-Wilhelmi, M. Ríos-García, S. Shabih, M. V. Gil, S. Miret, C. T. Koch, J. A. Márquez, and K. M. Jablonka (2025) From text to insight: large language models for chemical data extraction. Chemical Society Reviews.
* J. Schmidt, J. Shi, P. Borlido, L. Chen, S. Botti, and M. A. Marques (2017) Predicting the thermodynamic stability of solids combining density functional theory and machine learning. Chemistry of Materials 29(12), pp. 5090–5103.
* J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov (2017) Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
* K. T. Schütt, F. Arbabzadah, S. Chmiela, K. R. Müller, and A. Tkatchenko (2017) Quantum-chemical insights from deep tensor neural networks. Nature Communications 8(1), pp. 13890.
* K. T. Schütt, H. Glawe, F. Brockherde, A. Sanna, K. Müller, and E. K. Gross (2014) How to represent crystal structures for machine learning: towards fast prediction of electronic properties. Physical Review B 89(20), pp. 205118.
* M. Seifrid, R. Pollice, A. Aguilar-Granda, Z. Morgan Chan, K. Hotta, C. T. Ser, J. Vestfrid, T. C. Wu, and A. Aspuru-Guzik (2022) Autonomous chemical experiments: challenges and perspectives on establishing a self-driving lab. Accounts of Chemical Research 55(17), pp. 2454–2466.
* F. Shahhosseini, A. Marioriyad, A. Momen, M. S. Baghshah, M. H. Rohban, and S. H. Javanmard (2025) Large language models for scientific idea generation: a creativity-centered survey.
* F. X. Shaw (2025) Microsoft build 2025: the age of ai agents and building the open agentic web. Note: Blog post, Microsoft.
* P. Shetty, A. C. Rajan, C. Kuenneth, S. Gupta, L. P. Panchumarti, L. Holm, C. Zhang, and R. Ramprasad (2023) A general-purpose material property data extraction pipeline from large polymer corpora using natural language processing. npj Computational Materials 9(1), pp. 52.
* Y. Shi, Z. Zheng, Y. Wu, L. Gan, C. Zhuang, J. Gu, and T. Lin (2025) Graph attention neural networks for interpretable and generalizable prediction of Janus III–VI van der Waals heterostructures. Advanced Intelligent Discovery, pp. 202500061.
* B. Shneiderman (2020) Human-centered artificial intelligence: three fresh ideas. AIS Transactions on Human-Computer Interaction 12(3), pp. 109–124.
* N. Shoghi, A. Kolluru, J. R. Kitchin, Z. W. Ulissi, C. L. Zitnick, and B. M. Wood (2023) From molecules to materials: pre-training large generalizable models for atomic property prediction. arXiv preprint arXiv:2310.16802.
* O. Sierepeklis and J. M. Cole (2022) A thermoelectric materials database auto-generated from the scientific literature using chemdataextractor. Scientific Data 9(1), pp. 648.
* D. Silver, S. Singh, D. Precup, and R. S. Sutton (2021) Reward is enough. Artificial Intelligence 299, pp. 103535.
* D. Silver and R. S. Sutton (2025) Welcome to the era of experience. Preprint.
* A. Singh, R. Hu, V. Goswami, G. Couairon, W. Galuba, M. Rohrbach, and D. Kiela (2022) FLAVA: a foundational language and vision alignment model. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 15617–15629.
* S. Singh, K. Hindriks, D. Heylen, and K. Baraka (2025) A systematic review of human-AI co-creativity.
* J. Snoek, H. Larochelle, and R. P. Adams (2012) Practical Bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems, F. Pereira, C.J. Burges, L. Bottou, and K.Q. Weinberger (Eds.), Vol. 25.
* X. Song, X. Pan, X. Zhao, H. Ye, S. Zhang, J. Tang, and T. Yu (2025a) AOT*: efficient synthesis planning via llm-empowered and-or tree search. arXiv preprint arXiv:2509.20988.
* Y. Song, S. Miret, and B. Liu (2023) MatSci-NLP: evaluating scientific language models on materials science language tasks using text-to-schema modeling. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, Canada, pp. 3570–3595.
* Z. Song, S. Lu, M. Ju, Q. Zhou, and J. Wang (2025b) Accurate prediction of synthesizability and precursors of 3d crystal structures via large language models. Nature Communications 16(1), pp. 6530.
* A. Souly, J. Rando, E. Chapman, X. Davies, B. Hasircioglu, E. Shereen, C. Mougan, V. Mavroudis, E. Jones, C. Hicks, et al. (2025) Poisoning attacks on llms require a near-constant number of poison samples. arXiv preprint arXiv:2510.07192.
* J. M. Springer, S. Goyal, K. Wen, T. Kumar, X. Yue, S. Malladi, G. Neubig, and A. Raghunathan (2025) Overtrained language models are harder to fine-tune. arXiv preprint arXiv:2503.19206.
* S. S. Srinivas and V. Runkana (2024) Cross-modal learning for chemistry property prediction: large language models meet graph machine learning. arXiv preprint arXiv:2408.14964.
* N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov (2014) Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research 15(1), pp. 1929–1958.
* Structured information extraction from scientific text with large language models. Nature Communications.
* H. Su, R. Chen, S. Tang, Z. Yin, X. Zheng, J. Li, B. Qi, Q. Wu, H. Li, W. Ouyang, P. Torr, B. Zhou, and N. Dong (2025) Many heads are better than one: improved scientific idea generation by an LLM-based multi-agent system. arXiv preprint arXiv:2410.09403.
* M. Sundararajan, A. Taly, and Q. Yan (2017) Axiomatic attribution for deep networks. arXiv preprint arXiv:1703.01365.
* R. S. Sutton and A. G. Barto (1998) Reinforcement learning: an introduction. MIT Press, Cambridge, MA.
* M. C. Swain and J. M. Cole (2016) ChemDataExtractor: a toolkit for automated extraction of chemical information from the scientific literature. Journal of Chemical Information and Modeling 56(10), pp. 1894–1904.
* N. J. Szymanski, S. Fu, E. Persson, and G. Ceder (2024) Integrated analysis of x-ray diffraction patterns and pair distribution functions for machine-learned phase identification. npj Computational Materials 10(1), pp. 45.
* N. J. Szymanski, P. Nevatia, C. J. Bartel, Y. Zeng, and G. Ceder (2023a) Autonomous and dynamic precursor selection for solid-state materials synthesis. Nature Communications 14(1), pp. 6956.
* N. J. Szymanski, B. Rendy, Y. Fei, R. E. Kumar, T. He, D. Milsted, M. J. McDermott, M. Gallant, E. D. Cubuk, A. Merchant, H. Kim, A. Jain, C. J. Bartel, K. Persson, Y. Zeng, and G. Ceder (2023b) An autonomous laboratory for the accelerated synthesis of novel materials. Nature 624(7990), pp. 86–91.
* I. Takahara, T. Mizoguchi, and B. Liu (2025) Accelerated inorganic materials design with generative AI agents. arXiv preprint arXiv:2504.00741.
* S. Takeda, I. Priyadarsini, A. Kishimoto, H. Shinohara, L. Hamada, H. Masataka, J. Fuchiwaki, and D. Nakano (2023) Multi-modal foundation model for material design. In AI for Accelerated Materials Design-NeurIPS 2023 Workshop.
* A. Talapatra, S. Boluki, T. Duong, X. Qian, E. Dougherty, and R. Arróyave (2018) Autonomous efficient experiment design for materials discovery with Bayesian model averaging. Phys. Rev. Mater. 2, pp. 113803.
* Z. Tan, Q. Yang, and S. Luo (2025) AI molecular catalysis: where are we now? Organic Chemistry Frontiers 12(8), pp. 2759–2776.
* Y. Tang, W. Xu, J. Cao, W. Gao, S. Farrell, B. Erichson, M. W. Mahoney, A. Nonaka, and Z. Yao (2025) MatterChat: a multi-modal LLM for material science. arXiv preprint arXiv:2502.13107.
* F. Tavazza, B. DeCost, and K. Choudhary (2021) Uncertainty prediction for machine learning models of material properties. ACS Omega 6(48), pp. 32431–32440.
* S. Tian, X. Jiang, W. Wang, Z. Jing, C. Zhang, C. Zhang, T. Lookman, and Y. Su (2025) Steel design based on a large language model. Acta Materialia 285, pp. 120663.
* G. Tom, S. P. Schmid, S. G. Baird, Y. Cao, K. Darvish, H. Hao, S. Lo, S. Pablo-García, E. M. Rajaonson, M. Skreta, N. Yoshikawa, S. Corapi, G. D. Akkoc, F. Strieth-Kalthoff, M. Seifrid, and A. Aspuru-Guzik (2024) Self-driving laboratories for chemistry and materials science. Chemical Reviews 124(16), pp. 9633–9732.
* K. Tran, D. Dao, M. Nguyen, Q. Pham, B. O’Sullivan, and H. D. Nguyen (2025) Multi-agent collaboration mechanisms: a survey of LLMs. arXiv preprint arXiv:2501.06322.
* N. J. Treloar, N. Braniff, B. Ingalls, and C. P. Barnes (2022) Deep reinforcement learning for optimal experimental design in biology. PLOS Computational Biology 18(11), pp. e1010695.
* V. Tshitoyan, J. Dagdelen, L. Weston, A. Dunn, Z. Rong, O. Kononova, K. A. Persson, G. Ceder, and A. Jain (2019) Unsupervised word embeddings capture latent knowledge from materials science literature. Nature 571(7763), pp. 95–98.
* S. Udrescu and M. Tegmark (2020) AI Feynman: a physics-inspired method for symbolic regression. Science Advances 6(16), pp. eaay2631.
* M. Vaccaro, A. Almaatouq, and T. Malone (2024) When combinations of humans and AI are useful: a systematic review and meta-analysis. Nature Human Behaviour 8(12), pp. 2293–2303.
* J. Van Herck, M. V. Gil, K. M. Jablonka, A. Abrudan, A. S. Anker, M. Asgari, B. Blaiszik, A. Buffo, L. Choudhury, C. Corminboeuf, et al. (2025) Assessment of fine-tuned large language models for real-world chemistry and material science applications. Chemical Science 16(2), pp. 670–684.
* M. Van, P. Verma, C. Zhao, and X. Wu (2025) A survey of AI for materials science: foundation models, LLM agents, datasets, and tools.
* R. Vasu, P. Jansen, P. Siangliulue, C. Sarasua, A. Bernstein, P. Clark, and B. D. Mishra (2025) HARPA: a testability-driven, literature-grounded framework for research ideation.
* A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin (2017) Attention is all you need. In Advances in Neural Information Processing Systems 30 (NIPS 2017), pp. 5998–6008.
* V. Venugopal and E. Olivetti (2024) MatKG: an autonomously generated knowledge graph in material science. Scientific Data 11(1), pp. 217.
* A. A. Volk and M. Abolhasani (2024) Performance metrics to unleash the power of self-driving labs in chemistry and materials science. Nature Communications 15, pp. 1378.
* A. Y. Wang, S. K. Kauwe, R. J. Murdock, and T. D. Sparks (2021) Compositionally restricted attention-based network for materials property predictions. npj Computational Materials 7(1), pp. 77.
* C. Wang, Y. Zhang, C. Wen, M. Yang, T. Lookman, Y. Su, and T. Zhang (2022a) Symbolic regression in materials science via dimension-synchronous-computation. Journal of Materials Science & Technology 122, pp. 77–83.
* G. Wang, Y. Xie, Y. Jiang, A. Mandlekar, C. Xiao, Y. Zhu, L. Fan, and A. Anandkumar (2023a) Voyager: an open-ended embodied agent with large language models. arXiv preprint arXiv:2305.16291.
* H. Wang, T. Fu, Y. Du, W. Gao, K. Huang, Z. Liu, P. Chandak, S. Liu, P. Van Katwyk, A. Deac, A. Anandkumar, K. Bergen, C. P. Gomes, S. Ho, P. Kohli, J. Lasenby, J. Leskovec, T. Liu, A. Manrai, D. Marks, B. Ramsundar, L. Song, J. Sun, J. Tang, P. Veličković, M. Welling, L. Zhang, C. W. Coley, Y. Bengio, and M. Zitnik (2023b) Scientific discovery in the age of artificial intelligence. Nature 620(7972), pp. 47–60.
* H. Wang, J. Guo, L. Kong, R. Ramprasad, P. Schwaller, Y. Du, and C. Zhang (2025) LLM-augmented chemical synthesis and design decision programs. arXiv preprint arXiv:2505.07027.
* H. Wang, W. Li, X. Jin, K. Cho, H. Ji, J. Han, and M. Burke (2022b) Chemical-reaction-aware molecule representation learning. In Proc. The International Conference on Learning Representations (ICLR2022).
* K. Wang, V. Gupta, C. S. Lee, Y. Mao, M. N. T. Kilic, Y. Li, Z. Huang, W. Liao, A. Choudhary, and A. Agrawal (2024a) XElemNet: towards explainable ai for deep neural networks in materials science. Scientific Reports 14(1), pp. 25178.
* W. Wang, X. Jiang, S. Tian, P. Liu, T. Lookman, Y. Su, and J. Xie (2023c) Alloy synthesis and processing by semi-supervised text mining. npj Computational Materials 9(1), pp. 183.
* X. Wang, Y. Sheng, J. Ning, J. Xi, L. Xi, D. Qiu, J. Yang, and X. Ke (2023d) A critical review of machine learning techniques on thermoelectric materials. The Journal of Physical Chemistry Letters 14(7), pp. 1808–1822.
* Y. Wang, N. Wagner, and J. M. Rondinelli (2019) Symbolic regression in materials science. MRS Communications 9(3), pp. 793–805.
* Z. Wang, A. Chen, K. Tao, Y. Han, and J. Li (2024b) MatGPT: a vane of materials informatics from past, present, to future. Advanced Materials 36(6), pp. 2306733.
* Z. Wang, L. Hou, T. Lu, Y. Wu, Y. Li, H. Yu, and H. Ji (2024c) Enabling language models to implicitly learn self-improvement.
* Z. Wang, F. Hutter, M. Zoghi, D. Matheson, and N. de Freitas (2016) Bayesian optimization in a billion dimensions via random embeddings. arXiv preprint arXiv:1301.1942.
* L. Ward, A. Agrawal, A. Choudhary, and C. Wolverton (2016) A general-purpose machine learning framework for predicting properties of inorganic materials. npj Computational Materials 2, pp. 16028.
* L. Ward, A. Dunn, A. Faghaninia, N. E. Zimmermann, S. Bajaj, Q. Wang, J. H. Montoya, J. Chen, K. Bystrom, M. Dylla, et al. (2018) Matminer: an open source toolkit for materials data mining. Computational Materials Science 152, pp. 60–69.
* J. Wei, X. Wang, D. Schuurmans, M. Bosma, b. ichter, F. Xia, E. Chi, Q. V. Le, and D. Zhou (2022) Chain-of-thought prompting elicits reasoning in large language models. In Advances in Neural Information Processing Systems, S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh (Eds.), Vol. 35, pp. 24824–24837.
* L. Wei, Q. Li, Y. Song, S. Stefanov, R. Dong, N. Fu, E. M. Siriwardane, F. Chen, and J. Hu (2024) Crystal composition transformer: self-learning neural language model for generative and tinkering design of materials. Advanced Science 11(36), pp. 2304305.
* D. Weininger (1988) SMILES, a chemical language and information system. 1. introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 28(1), pp. 31–36.
* M. Wen, W. Huang, J. Dai, and S. Adhikari (2025) Cartesian atomic moment machine learning interatomic potentials. npj Computational Materials 11(1), pp. 128.
* B. Weng, Z. Song, R. Zhu, Q. Yan, Q. Sun, C. G. Grice, Y. Yan, and W. Yin (2020) Simple descriptor derived from symbolic regression accelerating the discovery of new perovskite catalysts. Nature Communications 11(1), pp. 3513.
* L. Weston, V. Tshitoyan, J. Dagdelen, O. Kononova, A. Trewartha, K. A. Persson, G. Ceder, and A. Jain (2019) Named entity recognition and normalization applied to large-scale information extraction from the materials science literature. Journal of Chemical Information and Modeling 59(9), pp. 3692–3702.
* H. Wijk, T. Lin, J. Becker, S. Jawhar, N. Parikh, T. Broadley, L. Chan, M. Chen, J. Clymer, J. Dhyani, E. Ericheva, K. Garcia, B. Goodrich, N. Jurkovic, H. Karnofsky, M. Kinniment, A. Lajko, S. Nix, L. Sato, W. Saunders, M. Taran, B. West, and E. Barnes (2024) RE-Bench: evaluating frontier AI R&D capabilities of language model agents against human experts. arXiv preprint arXiv:2411.15114.
* Y. Wu, M. Ding, H. He, Q. Wu, S. Jiang, P. Zhang, and J. Ji (2025) A versatile multimodal learning framework bridging multiscale knowledge for material design. npj Computational Materials 11(1), pp. xxx. Note: Article number pending pagination at time of citation.
* T. Xie, X. Fu, O. Ganea, R. Barzilay, and T. Jaakkola (2022) Crystal diffusion variational autoencoder for periodic material generation. arXiv preprint arXiv:2110.06197.
* T. Xie and J. C. Grossman (2018) Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties. Phys. Rev. Lett. 120, pp. 145301.
* Y. Xie, K. He, and A. Castellanos-Gomez (2025) Toward full autonomous laboratory instrumentation control with large language models. Small Structures, pp. 2500173.
* E. Ximendes, R. Marin, L. D. Carlos, et al. (2022) Less is more: dimensionality reduction as a general strategy for more precise luminescence thermometry. Light: Science & Applications 11, pp. 237.
* R. Xing, H. Yao, Z. Xi, M. Sun, Q. Li, J. Tian, H. Wang, D. Xu, Z. Ma, and L. Zhao (2025) Interpretable x-ray diffraction spectra analysis using confidence evaluated deep learning enhanced by template element replacement. npj Computational Materials 11(281).
* M. Xiong, A. Santilli, M. Kirchhof, A. Golinski, and S. Williamson (2024) Efficient and effective uncertainty quantification for LLMs. In NeurIPS Safe Generative AI Workshop 2024.
* P. Xu, X. Ji, M. Li, and W. Lu (2023) Small data machine learning in materials science. npj Computational Materials 9(1), pp. 42.
* W. Xu, K. Mei, H. Gao, J. Tan, Z. Liang, and Y. Zhang (2025) A-MEM: agentic memory for LLM agents. arXiv preprint arXiv:2502.12110.
* Y. Xu and Q. Qian (2022) I-sisso: mutual information-based improved sure independent screening and sparsifying operator algorithm. Engineering Applications of Artificial Intelligence 116, pp. 105442.
* R. Yan, X. Jiang, W. Wang, D. Dang, and Y. Su (2022) Materials information extraction via automatically generated corpus. Scientific Data 9(1), pp. 401.
* S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. Narasimhan, and Y. Cao (2023) ReAct: synergizing reasoning and acting in language models. In International Conference on Learning Representations.
* Y. Yao, J. Zhu, Y. Liu, G. Ren, X. Li, and P. Ou (2025) Large language models for heterogeneous catalysis. Wiley Interdisciplinary Reviews: Computational Molecular Science 15(5), pp. e70046.
* G. H. Yi, J. Choi, H. Song, O. Miano, J. Choi, K. Bang, B. Lee, S. S. Sohn, D. Buttler, A. Hiszpanski, et al. (2025) MaTableGPT: GPT-based table data extractor from materials science literature. Advanced Science 12(16), pp. 2408221.
* H. Yu, T. Chen, J. Feng, J. Chen, W. Dai, Q. Yu, Y. Zhang, W. Ma, J. Liu, M. Wang, et al. (2025) MemAgent: reshaping long-context LLM with multi-conv RL-based memory agent. arXiv preprint arXiv:2507.02259.
* S. Yu, N. Ran, and J. Liu (2024) Large language models: the game-changers for materials science research. Artificial Intelligence Chemistry 2(2), pp. 100076.
* L. Yuan, Y. Yu, Y. Wei, Y. Wang, Z. Wang, and F. Wu (2024) Active retrosynthetic planning aware of route quality. In The Twelfth International Conference on Learning Representations.
* S. Yun, M. Jeong, R. Kim, J. Kang, and H. J. Kim (2019) Graph transformer networks. Advances in Neural Information Processing Systems 32.
* C. Zeni, R. Pinsler, D. Zügner, A. Fowler, M. Horton, X. Fu, S. Shysheya, J. Crabbé, L. Sun, J. Smith, B. Nguyen, H. Schulz, S. Lewis, C. Huang, Z. Lu, Y. Zhou, H. Yang, H. Hao, J. Li, R. Tomioka, and T. Xie (2024) MatterGen: a generative model for inorganic materials design. arXiv preprint arXiv:2312.03687.
* E. Zhang, M. Dao, G. E. Karniadakis, and S. Suresh (2022) Analyses of internal structures and defects in materials using physics-informed neural networks. Science Advances 8(7), pp. eabk0644.
* J. Zhang, B. Liu, Z. Liu, J. Wu, S. Arnold, H. Shi, T. Osterrieder, J. A. Hauch, Z. Wu, J. Luo, et al. (2023a) Optimizing perovskite thin-film parameter spaces with machine learning-guided robotic platform for high-performance perovskite solar cells. Advanced Energy Materials 13(48), pp. 2302594.
* L. Zhang and M. Stricker (2023) MatNexus: a comprehensive text mining and analysis suite for materials discovery. arXiv preprint arXiv:2311.06303.
* R. Zhang, J. Zhang, Q. Chen, B. Wang, Y. Liu, Q. Qian, D. Pan, J. Xia, Y. Wang, and Y. Han (2023b) A literature-mining method of integrating text and table extraction for materials science publications. Computational Materials Science 230, pp. 112441.
* R. Zhang, C. Wu, Q. Yang, C. Liu, Y. Wang, K. Li, L. Huang, and F. Zhou (2024a) MolFeSCue: enhancing molecular property prediction in data-limited and imbalanced contexts using few-shot and contrastive learning. Bioinformatics 40(4), pp. btae118.
* T. Zhang, G. Cai, S. Liu, and A. J. Puppala (2017) Investigation on thermal characteristics and prediction models of soils. International Journal of Heat and Mass Transfer 106, pp. 1074–1086.
* T. Zhang and D. Yang (2025) Multimodal machine learning with large language embedding model for polymer property prediction. Chemistry of Materials 37(18), pp. 7002–7013.
* W. Zhang, Q. Wang, X. Kong, J. Xiong, S. Ni, D. Cao, B. Niu, M. Chen, Y. Li, R. Zhang, et al. (2024b) Fine-tuning large language models for chemical text mining. Chemical Science 15(27), pp. 10600–10611.
* Y. Zhang, X. He, S. Gao, A. Zhou, and H. Hao (2024c) Evolutionary retrosynthetic route planning [research frontier]. IEEE Computational Intelligence Magazine 19(3), pp. 58–72.
* Y. Zhang, S. A. Khan, A. Mahmud, H. Yang, A. Lavin, M. Levin, J. Frey, J. Dunnmon, J. Evans, A. Bundy, S. Dzeroski, J. Tegner, and H. Zenil (2025) Exploring the role of large language models in the scientific method: from hypothesis to discovery. npj Artificial Intelligence 1(1), pp. 14.
* Y. Zhang, F. Chen, Z. Liu, Y. Ju, D. Cui, J. Zhu, X. Jiang, X. Guo, J. He, L. Zhang, et al. (2024d) A materials terminology knowledge graph automatically constructed from text corpus. Scientific Data 11(1), pp. 600.
* D. Zhao, Z. Shi, Z. Liu, W. Gao, W. Wang, B. Jiang, J. Chen, M. Hu, J. Li, Y. Yang, X. Yuan, Z. Wang, Y. Zhang, N. Liu, Y. Zhao, Y. Li, Z. Wang, J. Li, Q. Zhang, S. Lu, H. Yu, Y. Wang, C. Zhang, J. Sun, X. Yang, Z. Ma, Y. Liu, W. Ye, Z. Chai, X. Li, L. Zhang, X. Zhu, G. Li, K. Song, P. Li, Y. Xiong, K. Xu, H. Li, W. Li, X. Li, and Z. Wang (2025a) DiffractGPT: atomic structure determination directly from powder x-ray diffraction patterns. The Journal of Physical Chemistry Letters 16(xx), pp. xxxx–xxxx. Note: Advance Article; volume/issue/page pending at time of citation.
* J. Zhao, S. Huang, and J. M. Cole (2023) OpticalBERT and OpticalTable-SQA: text- and table-based language models for the optical-materials domain. Journal of Chemical Information and Modeling 63(7), pp. 1961–1981.
* X. Zhao, J. Greenberg, S. McClellan, Y. Hu, S. Lopez, S. K. Saikin, X. Hu, and Y. An (2021) Knowledge graph-empowered materials discovery. In 2021 IEEE International Conference on Big Data (Big Data), pp. 4628–4632.
* Z. Zhao, D. Ma, L. Chen, L. Sun, Z. Li, Y. Xia, B. Chen, H. Xu, Z. Zhu, S. Zhu, et al. (2025b) Developing ChemDFM as a large language foundation model for chemistry. Cell Reports Physical Science 6(4).
* X. Zhong, B. Gallagher, S. Liu, B. Kailkhura, A. Hiszpanski, and T. Y. Han (2022) Explainable machine learning in materials science. npj Computational Materials 8(1), pp. 204.
* M. Zhou, Y. Fung, L. Chen, C. Thomas, H. Ji, and S. Chang (2023) Enhance chart understanding via visual language pre-training on plot table pairs. In Proc. The 61st Annual Meeting of the Association for Computational Linguistics (ACL2023).
* A. Ziletti, D. Kumar, M. Scheffler, and L. M. Ghiringhelli (2018) Insightful classification of crystal structures using deep learning. Nature Communications 9(1), pp. 2775.