Improving the Validity of Automatically Generated Feedback via Reinforcement Learning
| Published in: | arXiv.org |
|---|---|
| Main authors: | Alexander Scarlatos, Digory Smith, Simon Woodhead, Andrew Lan |
| Format: | Paper (working paper / pre-print) |
| Language: | English |
| Published: | Ithaca: Cornell University Library, arXiv.org, December 12, 2024 |
| Subjects: | Alignment; Annotations; Data augmentation; Distance learning; Error analysis; Feedback; Large language models; Machine learning; Tutoring |
| ISSN: | 2331-8422 |
| DOI: | 10.48550/arxiv.2403.01304 |
| Online access: | Full text: https://www.proquest.com/docview/2937453319 |
| Abstract: | Automatically generating feedback via large language models (LLMs) in intelligent tutoring systems and online learning platforms has the potential to improve the learning outcomes of many students. However, both feedback generation and evaluation are challenging: feedback content has to be valid, especially in subjects like math, which requires models to understand the problem, the solution, and where the student's error lies. Feedback also has to be pedagogically valid to reflect effective tutoring strategies, such as explaining possible misconceptions and encouraging the student, among other desirable features. In this work, we address both problems of automatically generating and evaluating feedback, considering both correctness and alignment. First, we propose a rubric for evaluating math feedback and show that GPT-4 is able to effectively use it to annotate human-written and LLM-generated feedback. Second, we propose a framework for feedback generation that optimizes both correctness and alignment using reinforcement learning (RL). Specifically, we use GPT-4's annotations to create preferences over feedback pairs in an augmented dataset for training via direct preference optimization (DPO). We show that our methods significantly increase the correctness and alignment of feedback generated with Llama 2, an open-source LLM, qualitatively analyze our generation and evaluation systems using case studies, and outline several areas for future work. |
| Copyright: | 2024. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
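For readers skimming the record, the core training step the abstract names (preferences built from GPT-4 rubric annotations, optimized with DPO) looks roughly like the sketch below. Everything here is an illustrative assumption, not the authors' implementation: the rubric fields, function names, and the beta value are made up for the example.

```python
# A minimal sketch of the training signal described in the abstract: GPT-4's
# rubric annotations rank pairs of feedback messages, and the resulting
# preferences feed the DPO loss. All names here are illustrative assumptions.
import torch
import torch.nn.functional as F


def make_preference_pair(feedback_a: str, feedback_b: str,
                         rubric_a: dict, rubric_b: dict):
    """Turn two rubric-annotated feedback messages into (chosen, rejected)."""
    # Hypothetical rubric: each dict maps criteria (e.g. "correctness",
    # "explains_misconception", "encouraging") to 0/1 annotations.
    score_a, score_b = sum(rubric_a.values()), sum(rubric_b.values())
    if score_a == score_b:
        return None  # no preference signal; skip this pair
    return (feedback_a, feedback_b) if score_a > score_b else (feedback_b, feedback_a)


def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss over a batch of preference pairs.

    Each tensor holds the summed token log-probabilities of one feedback
    message under the trained policy or the frozen reference model.
    """
    # Implicit rewards: how much more the policy likes a response than the reference does.
    chosen_margin = policy_chosen_logps - ref_chosen_logps
    rejected_margin = policy_rejected_logps - ref_rejected_logps
    # Push the policy to widen the gap between chosen and rejected feedback.
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()


# Toy usage with random log-probabilities for a batch of 4 pairs.
torch.manual_seed(0)
loss = dpo_loss(torch.randn(4), torch.randn(4), torch.randn(4), torch.randn(4))
print(f"DPO loss: {loss.item():.4f}")
```

In this reading of the abstract, the fine-tuned Llama 2 model would supply the policy log-probabilities and its frozen pre-DPO copy the reference ones; the paper itself should be consulted for the actual rubric and hyperparameters.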