Inria, the French national research institute for the digital sciences
Organisation/Company: Inria, the French national research institute for the digital sciences
Research Field: Computer science
Researcher Profile: First Stage Researcher (R1)
Country: France
Application Deadline: 10 Dec 2024 - 23:00 (UTC)
Type of Contract: Temporary
Job Status: Full-time
Hours Per Week: 38.5
Offer Starting Date: 1 Jan 2025
Is the job funded through the EU Research Framework Programme? Not funded by an EU programme
Reference Number: 2024-08347
Is the Job related to staff position within a Research Infrastructure? No
Offer Description:
Large Language Models (LLMs) are trained to predict missing words in many contexts, which leads them to absorb knowledge, natural language structure, and some (brittle) algorithmic problem-solving capabilities.
By contrast, symbolic AI has developed efficient algorithms that reliably solve various narrow problems (first-order logic, modal logics, planning, constraint satisfaction...), but it is challenging to apply them to real-world problems that require natural language understanding and knowledge that is hard to formalize.
The goal of the Adada project is to construct reasoning examples to infuse symbolic AI into large language models. To do so, we will formalize a general problem generation framework and instantiate multiple types of symbolic problem generators. We will use existing symbolic solvers to obtain solutions and fine-tune language models to match the solver outputs.
This will enable adaptive dataset generation, which prevents dataset obsolescence and tailors datasets to specific applications or to specific models (newer or larger models need harder tasks). This PhD student position will be supported by the Adada ANR project (Adaptive datasets for LLM reasoning enhancement).
This PhD student will collaborate with Damien Sileo and the Adada consortium (engineers and interns).
The PhD student will work on designing new methods for steerable problem generation (this is related to data value generation: https://arxiv.org/abs/1909.11671).
The core problem is to steer a sampling process to produce data points that are different from each other, and that are also interesting (good level of difficulty, close to real world tasks).
For example, it is easy to generate logic problems that are hard for LLMs to solve, e.g. parity problems at scale (does ~~~~~~~p entail p?). Such problems are difficult for LLMs, but they are not very interesting.
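The parity example can be made concrete with a minimal Python sketch. It is an illustration only, not the project's framework: the function names (negation_chain, evaluate, entails, sample_problem) and the max_depth difficulty knob are hypothetical, and a brute-force truth-table check stands in for the "existing symbolic solvers" mentioned above.

```python
import itertools
import random


def negation_chain(depth: int, atom: str = "p") -> str:
    """Build a formula made of `depth` negations applied to one atom."""
    return "~" * depth + atom


def evaluate(formula: str, assignment: dict[str, bool]) -> bool:
    """Evaluate a negation-chain formula under a truth assignment."""
    atom = formula.lstrip("~")
    negations = len(formula) - len(atom)
    value = assignment[atom]
    # An even number of negations cancels out; an odd number flips the value.
    return value if negations % 2 == 0 else not value


def entails(premise: str, conclusion: str, atoms=("p",)) -> bool:
    """Truth-table check: every model of the premise satisfies the conclusion."""
    for values in itertools.product([False, True], repeat=len(atoms)):
        assignment = dict(zip(atoms, values))
        if evaluate(premise, assignment) and not evaluate(conclusion, assignment):
            return False
    return True


def sample_problem(rng: random.Random, max_depth: int) -> dict:
    """Sample one labelled problem; max_depth is a crude difficulty control."""
    depth = rng.randint(1, max_depth)
    premise = negation_chain(depth)
    return {
        "question": f"Does {premise} entail p?",
        "answer": entails(premise, "p"),
        "depth": depth,
    }


if __name__ == "__main__":
    print(entails(negation_chain(7), "p"))  # False: seven negations reduce to ~p
    print(sample_problem(random.Random(0), max_depth=10))
```

Raising max_depth makes the questions longer but not more varied, which is the limitation described above: difficulty alone is easy to scale, while diversity and closeness to real-world tasks are not.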
Responsibilities:
Survey existing research
Participate in the construction of formal synthetic problem generators (starting with context-free grammars, but also using language models for guidance, with efficiency considerations)
Formulate research questions, design, and conduct controlled experiments
Evaluate generation strategies on multiple external downstream tasks
Write articles and disseminate research results
Languages: English (French not mandatory)
Programming language: Python
Deep learning and statistics background
Knowledge of logic and symbolic AI is appreciated
Specific Requirements:
Strong knowledge of deep learning and, ideally, reinforcement learning
Autonomy, critical thinking, willingness to tackle hard problems
Interest in formal algorithms
Strong scientific background
Knowledge of NLP
Languages:
French Level: Basic
English Level: Good
Additional Information:
Partial reimbursement of public transport costs
Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
Possibility of teleworking and flexible organization of working hours
Professional equipment available (videoconferencing, loan of computer equipment, etc.)
Social, cultural and sports events and activities
Access to vocational training
Social security coverage
Salary:
1st and 2nd year: 2100 € (gross monthly salary)
3rd year: 2190 € (gross monthly salary)