[U.S. patent]
Online Shopping with an AI Assistant
Wrong! It costs more than $30 and is polyester.
This happens whenever the answer must satisfy many constraints.
Goal: enable LLMs to generate correct answers,
even with many constraints
Formal solvers solve statements with many constraints.
For example finding \(x\), \(y\), \(z\) to solve:
\(x*z=y\land(z<0\lor y=0)\)
Used in applications such as automated theorem proving and software testing.
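For concreteness, the formula above can be handed to an off-the-shelf SMT solver; this is a minimal sketch using Z3's Python bindings (Z3 is our illustrative choice, not prescribed by this work):

```python
# Minimal sketch: solving x*z = y AND (z < 0 OR y = 0) with the Z3 SMT solver.
# Requires the `z3-solver` package; any SMT solver would do.
from z3 import Ints, Or, Solver, sat

x, y, z = Ints("x y z")

solver = Solver()
solver.add(x * z == y)          # x * z = y
solver.add(Or(z < 0, y == 0))   # z < 0  OR  y = 0

if solver.check() == sat:
    print(solver.model())       # one satisfying assignment, e.g. x = 0, y = 0, z = 0
```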
Many constraints: formal solvers
Query: "I want a pet. Hyena or cat?"
Needs a lot of additional information:
\(\Rightarrow\)needs an extensive ontology.
The point of the question is also fuzzy: answering it requires commonsense.
LLMs already have commonsense grounded in them.
Formal solvers for language are not scalable
Method idea: merge LLMs with formal solvers
1. The formal solver chooses which constraints to satisfy, e.g.:
[entree] and [side] are vegetarian. [entree] is heavy. [side] is light.
2. The LLM fills the slots:
[entree] is "Falafel".
[side] is "Salad".
First simplify, then solve
Find a vegetarian entree and side.
If the entree is heavy then the side is light.
Query:
Constraint-explicit:
[entree] and [side] are vegetarian.
IF [entree] is heavy THEN [side] is light.
\(\rightarrow \) a Constraint Satisfaction Problem in natural language (NL)
For evaluation:
The LLM replies by yielding \(n\) items from a list of items \(\mathcal{V}\).
Prompt: \(T_{1,\ldots,m} = T(c_1, \ldots, c_m)\), with constraints \(c_i(x_1,\ldots,x_n)\)
Human: [x_1] and [x_2] are vegetarian. [x_1] is high in calories.
LLM: \(\underset{x_1,x_2\in\mathcal{V}}{\arg\min}\;-\log p(x_1,x_2\mid T_{1,\ldots,m})\)
Answer: [x_1] is "rice".[x_2] is "salad".
Instead of:
Answer: [x_1] is "Falafel". [x_2] is "salad".
Failures can come from relying on spurious correlations between the input and the pre-training corpus (Token Bias).
\(\rightarrow\)can be caused by relying on position / value of tokens in the prompt or in the answer
\(\rightarrow\) instead of truly answering the question
Solution: to mitigate Token Bias, we reweight the answer scores \(-\log p(x_1,x_2\mid T_{1,\ldots,m})\)
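One plausible way to implement such a reweighting (an assumption on our part, in the spirit of calibrating against a content-free prompt; the actual scheme may differ), reusing `neg_log_prob` from the sketch above:

```python
# Hypothetical debiasing sketch: discount probability mass the model assigns to the
# answer regardless of the constraints, so constraint-irrelevant token biases cancel.
def debiased_score(prompt: str, answer: str, lam: float = 1.0,
                   content_free_prompt: str = "Answer: ") -> float:
    # -log [ p(answer | constraints) / p(answer | content-free prompt)^lam ]
    return neg_log_prob(prompt, answer) - lam * neg_log_prob(content_free_prompt, answer)
```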
Metric: accuracy (%) of the smallest-loss solution \((x_1, x_2)\) in satisfying the \(m\) prompt constraints.
Results:
Poor base performance \(\sim 50\%\).
We improve it by 10-20% in relative terms.
Rewriting as a constrained problem:
The sum of the prices for 7 people of [transportation_1], [breakfast_1], [lunch_1], [dinner_1], [accommodation_1], ..., [accommodation_3] does not exceed 30,200.
[accommodation_1], [accommodation_2], [accommodation_3] must be suited for 7 people.
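For illustration, the two explicit constraints above translate directly into programmatic checks; the field names below are assumptions about the plan format, not the benchmark's actual schema:

```python
# Illustrative checks for the two constraints above (field names are hypothetical).
def violated_constraints(plan: dict) -> list[str]:
    violations = []
    # Budget: the total price for 7 people must not exceed 30,200.
    total = sum(7 * item["price_per_person"] for item in plan["priced_items"])
    if total > 30_200:
        violations.append(f"budget exceeded: {total} > 30200")
    # Accommodations must be suited for 7 people.
    for acc in plan["accommodations"]:
        if acc["max_occupancy"] < 7:
            violations.append(f'{acc["name"]} cannot host 7 people')
    return violations
```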
TravelPlanner: A Benchmark for Real-World Planning with Language Agents
Query example (challenging):
Could you create a travel plan for 7 people from Ithaca to Charlotte spanning 3 days, from March 8th to March 14th, 2022, with a budget of $30,200?
\(\Rightarrow\) The LLM replies with a travel plan
Metric: pass rate (%) on satisfying all constraints.
Method | Pass rate (%) | LLM calls (avg.) |
---|---|---|
Direct prompting | 2.8 | 1 |
CoT | 2.8 | 1 |
ReAct | 7.8 | 7 |
Ours | 7.2 | 2.5 |
Method: explicit constraints + self-refinement scheme
\(\Rightarrow\) We improve Claude by more than 2×!
From 3% to 7%, with fewer LLM calls than ReAct
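A hedged sketch of the explicit-constraints + self-refinement loop (the `llm` helper is a placeholder and the stopping criteria are assumptions; `violated_constraints` is the illustrative checker sketched earlier):

```python
# Illustrative self-refinement loop: generate a plan, check the explicit constraints,
# and feed the violations back to the LLM until the plan passes or the call budget is spent.
def refine_plan(constraint_prompt: str, llm, violated_constraints, max_rounds: int = 3):
    plan = llm(constraint_prompt)                    # initial plan
    for _ in range(max_rounds):
        violations = violated_constraints(plan)      # e.g. ["budget exceeded: 31400 > 30200"]
        if not violations:
            return plan                              # all explicit constraints satisfied
        feedback = "Revise the plan. Violated constraints: " + "; ".join(violations)
        plan = llm(constraint_prompt + "\n" + feedback)
    return plan                                      # best effort after max_rounds
```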
We propose a method to use LLMs as theory solvers for constrained language tasks, in tandem with formal solvers.
Conclusions
Limitations and Future steps
\(x+y =3\)
IF \(x > 2\):
\(y = 0\)
1. The SAT solver chooses which statements to satisfy, e.g.:
\(x+y = 3\) AND \(x> 2\) AND \(y= 0\)
2. The theory solver finds values:
\(x=3\), \(y = 0\)
SMT solvers first simplify, then solve
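Spelling out the two steps above for the toy example (a restatement of the listed steps, with the atoms named \(A\), \(B\), \(C\) for clarity):

\[
\underbrace{A}_{x+y=3}\;\land\;\bigl(\underbrace{B}_{x>2}\rightarrow\underbrace{C}_{y=0}\bigr)
\;\xrightarrow{\ \text{SAT}\ }\; A\land B\land C
\;\xrightarrow{\ \text{theory}\ }\; x=3,\ y=0 .
\]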
"Find a first and second course without pork meat. The first course should be from Asian cuisine, and the second from Italian cuisine. If the first course is light in calories, the second should be heavy, and vice versa. If the first course is steamed, the second must also be steamed"
Failure:
"Vegetable dumplings (steamed)"
"Eggplant Parmigiana (baked)"
[Claude 3 Sonnet]
"[first_course] and [second_course] are dishes without pork meat. If the calories of [first_course] are light, then the calories of [second_course] should be heavy, and vice versa if the calories of [first_course] are heavy, the calories of [second_course]
should be light. [first_course] must be part of the Asian cuisine. [second_course] must be part of the Italian cuisine. If [first_course] is steamed, then [second_course] must also be steamed."
Answers more consistent with questions, increasing trust
Correct:
"Vegetable dumplings (steamed)"
"Steamed Cheese and Spinach Ravioli"
[Claude 3 Sonnet]
Three problems:
1. Use a critic aligned with the truth of language \(\Rightarrow\) an LLM.
2. How to combine the constraints.
3. How to sample.
"[entree] and [side] are vegetarian"
...
"[side] is heavy"
\( \left\{ \begin{aligned} L[c_1(x_1, \ldots, x_n)] \\ \,\\ L[c_m(x_1, \ldots, x_n)] \end{aligned} \right.\)
...
The sum of the prices for 7 people of [transportation_1], [breakfast_1], [lunch_1], [dinner_1], [accommodation_1], ..., [accommodation_3] does not exceed 30,200.
[accommodation_1], [accommodation_2], [accommodation_3] must be suited for 7 people.
Travel Plan:
Day 1:
[Current City_1]: from Ithaca to Charlotte
[Transportation_1]: Flight Number: F3633413...
[Breakfast_1]: Nagaland's Kitchen, Charlotte
....
LLM Replies (+ Self-refining procedure)
Results
Metric: pass rate (%) on satisfying commonsense and hard constraints with the correct format.
Baselines: CoT, ReAct
Results: from 3% to 7%, on par with ReAct but with fewer LLM calls
Curse of dimensionality: when learning structureless data in high dimension \(d\), the test error decays as \(\varepsilon\sim P^{-1/d}\), i.e. exponentially many samples \(P\) are needed
VS
observed in practice: \(\varepsilon\sim P^{-\beta}\), with an exponent \(\beta\) that does not vanish with \(d\).
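To make the gap concrete (illustrative numbers, not results from this work): with \(d=100\) and \(\varepsilon\sim P^{-1/d}\), reaching \(\varepsilon=0.1\) requires \(P\sim\varepsilon^{-d}=10^{100}\) samples, whereas with \(\varepsilon\sim P^{-\beta}\) and \(\beta=0.5\) the same error requires only \(P\sim\varepsilon^{-1/\beta}=10^{2}\).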
\(\Rightarrow\) Data must be structured and
Machine Learning should capture such structure.
Key questions motivating this thesis:
Reducing complexity with depth
Deep networks build increasingly abstract representations with depth (as also observed in the brain)
Intuition: this reduces the complexity of the task, ultimately beating the curse of dimensionality.
Two ways of losing information by learning invariances: discrete and continuous.
[Zeiler and Fergus 14, Yosinski 15, Olah 17, Doimo 20, Van Essen 83, Grill-Spector 04]
[Shwartz-Ziv and Tishby 17, Ansuini 19, Recanatesi 19]
[Bruna and Mallat 13, Mallat 16, Petrini 21]
Hierarchical structure
How many training points?
Quantitative predictions in a model of data
[Figure: hierarchical decomposition of a "sofa" image]
[Chomsky 1965]
[Grenander 1996]
Deep networks learn with a number of training data polynomial in the dimension \(d\):
\(P^*\sim n_c m^L\)
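For illustration (assumed values; reading \(n_c\) as the number of classes, \(m\) as the number of synonymic representations per feature, and \(L\) as the depth of the hierarchy): with \(n_c=m=10\) and \(L=3\), \(P^*\sim 10\cdot 10^{3}=10^{4}\) training points suffice, polynomial rather than exponential in the input dimension.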
Generative technique: