Zhehao Zhang

I am a first-year Master's student in Computer Science at Dartmouth College. Currently, I am a research intern at the Stanford SALT Lab under the supervision of Diyi Yang. Previously, I worked as a Research Intern at Microsoft Research Lab – Asia. I received my bachelor's degree from the Artificial Intelligence Honor Class at Shanghai Jiao Tong University.

My research interests lie in Natural Language Processing (NLP).

Please feel free to contact me by email!

Mail / Résumé / Google Scholar / GitHub

News📢

  • 2024 June 25th: A new preprint is out! Please check out DARG, a dynamic evaluation framework that augments current reasoning benchmarks at the level of reasoning graphs.
  • 2024 Mar 13th: One first-author paper on LLMs for hierarchical table analysis was accepted to NAACL 2024. See you in Mexico City!
  • 2024 Mar: Honored to give a talk on Augmented Language Models at the TRIP Lab at Dartmouth, hosted by Prof. Yaoqing Yang. The recording and slides are available.
  • 2024 Feb: I will join Adobe Research as a Research Intern this summer. See you in San Jose and the Bay Area!
  • 2023 Dec 14th: One first-author paper from my undergraduate studies was accepted to ICASSP 2024.
  • 2023 Oct 27th: The paper "Can Large Language Models Transform Computational Social Science?" was accepted to Computational Linguistics.
  • 2023 Oct 7th: Two first-author papers from my undergraduate studies were accepted to EMNLP 2023. See you in Singapore!

Research🔍

My current research interests lie in multiple fields in NLP including:

  • Tool-augmented LLMs
  • Dynamic evaluation of LLMs
  • Computational social science (NLP for social good)
  • LLM-based agents
  • Vision-Language Models
I also have a broad interest in other topics in NLP and Machine Learning.

DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph
Zhehao Zhang, Jiaao Chen, Diyi Yang
arXiv preprint, 2024
Code / Project / Paper

TL;DR: We propose DARG, a dynamic evaluation framework that augments current reasoning benchmarks at the level of reasoning graphs. We evaluate 15 SOTA LLMs and observe a consistent performance decrease for all LLMs as the complexity level increases, along with increasing biases on some datasets.

E5: Zero-shot Hierarchical Table Analysis using Augmented LLMs via Explain, Extract, Execute, Exhibit, and Extrapolate
Zhehao Zhang, Yan Gao, Jian-Guang Lou
Proceedings of the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Code / Paper

TL;DR: We propose a tool-augmented LLM framework named $E^5$ with five stages for the challenging real-life hierarchical table analysis task, achieving an 85.08 exact match score and a 93.11 GPT-4-Eval score. We also introduce $F^3$, which significantly reduces token length while preserving the information needed to analyze such huge tables.

CRT-QA: A Dataset of Complex Reasoning Question Answering over Tabular Data
Zhehao Zhang, Xitao Li, Yan Gao, Jian-Guang Lou
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Code / Paper

TL;DR: We systematically evaluate LLMs' reasoning ability on tabular data and establish a comprehensive taxonomy of operation and reasoning types for table analysis. Then, we propose CRT-QA, a dataset of complex reasoning QA over tables. We also propose ARC, which effectively utilizes table analysis tools to solve table reasoning tasks without manually annotated exemplars.

Mitigating Biases in Hate Speech Detection from A Causal Perspective
Zhehao Zhang, Jiaao Chen, Diyi Yang
Findings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Code / Paper

TL;DR: We analyze the generation process of hate speech detection biases from a causal perspective and identify two confounders that cause the biases. We propose Multi-Task Intervention and Data-Specific Intervention to mitigate them.

Can Large Language Models Transform Computational Social Science?
Caleb Ziems, William Held, Omar Shaikh, Jiaao Chen, Zhehao Zhang, Diyi Yang
Computational Linguistics, 2023
Code / Paper

TL;DR: We provide a road map for using LLMs as CSS tools and contribute a set of prompting best practices and an extensive evaluation pipeline to measure the zero-shot performance of 13 language models on 24 representative CSS benchmarks.

Education🎓

Dartmouth College, Master of Science in Computer Science, 2023 - 2025 (expected)

Shanghai Jiao Tong University, B.Eng. in Artificial Intelligence (Honor Class), 2019 - 2023

Experiences🛠

Social and Language Technologies (SALT) lab at Stanford, Research Intern

Adobe Research, Research Intern

Data, Knowledge, and Intelligence group at Microsoft Research Lab – Asia, Research Intern

Selected Courses

AI courses: Natural language processing (94 points), Deep learning and application (92.95 points), Computer vision, Reinforcement learning (94 points), Machine learning, Machine learning project, Knowledge representation and reasoning (97 points), A practical course to intelligence perception and cognition (90 points), Brain-Inspired Intelligence (92 points), Artificial intelligence problem solving and practice (95 points), Intelligent speech recognition (92 points), Data mining (91 points), Game theory and multi-agent learning, Programming practices of artificial intelligence, Lab practice (A+)

Other CS courses: Data structure (Honor) (92 points), Thinking and approach of programming (C++) (Honor), Data structure (C++) (Honor), Design and analysis of algorithms, Computer architecture (91 points), Operating system (91 points), Internet of Things (95.5 points)

Math courses: Stochastic process (95 points), Mathematical analysis (Honor), Linear algebra (Honor), Discrete mathematics (Honor), Complex analysis (Honor), Probability and Statistics, Convex and linear optimization, Signals and Systems, Digital signal and image processing


Website template borrowed from Jon Barron's personal page, available here.