UVM Theses and Dissertations

Format: Online

Author: Grindle, Ryan

Title: Perils and Pitfalls of Symbolic Regression

Dept./Program: Computer Science

Year: 2021

Degree: M.S.

Abstract:

The ever-growing accumulation of data makes automated distillation of understandable models from that data ever more desirable. Deriving equations directly from data using symbolic regression, as performed by genetic programming, remains appealing because of its algorithmic simplicity and lack of assumptions about equation form. However, few models have been shown capable of transfer learning: the ability to rapidly distill equations from new data in a previously unseen domain by drawing on experience performing this distillation in other domains. One model that has shown this ability is a sequence-to-sequence approach to symbolic regression, introduced in 2020, which we call y2eq. To improve this model, it is necessary to understand the key challenges associated with it. We have identified three: corpus, coefficient, and cost. The challenge of devising a training corpus stems from the hierarchical nature of the data: the corpus should be considered not as a collection of equations but as a collection of functional forms and instances of those functional forms. The challenge of choosing appropriate coefficients for functional forms compounds the corpus challenge and complicates the evaluation of trained models, because instances of different functional forms can be similar. The challenge with cost functions (used to train the model) is mainly the choice between numeric cost (which compares y-values) and symbolic cost (which compares written functional forms). In this work, we provide evidence for the existence of the corpus, coefficient, and cost challenges, explore why they exist in the model, and propose possible solutions. We hope that this work can be used to initiate improvements to this already promising symbolic regression model.
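The distinction between numeric and symbolic cost can be made concrete with a small sketch. The example below is illustrative only and is not taken from the thesis: `numeric_cost` (mean squared error over y-values) and `symbolic_cost` (token mismatch rate over written forms) are hypothetical stand-ins for whatever cost functions the y2eq work actually uses, and the token lists and sample interval are assumptions chosen for the demonstration. It compares two different functional forms, sin(x) and x, on a narrow interval around zero.

```python
import math


def numeric_cost(f, g, xs):
    """Hypothetical numeric cost: mean squared error between y-values."""
    return sum((f(x) - g(x)) ** 2 for x in xs) / len(xs)


def symbolic_cost(tokens_a, tokens_b):
    """Hypothetical symbolic cost: fraction of token positions that differ.

    The shorter sequence is padded so both have equal length.
    """
    n = max(len(tokens_a), len(tokens_b))
    pad_a = tokens_a + ["<pad>"] * (n - len(tokens_a))
    pad_b = tokens_b + ["<pad>"] * (n - len(tokens_b))
    return sum(a != b for a, b in zip(pad_a, pad_b)) / n


# Assumed sample points: a narrow interval around zero, where sin(x) ~ x.
xs = [i / 100 for i in range(-10, 11)]

# Two different functional forms with nearly identical y-values on xs.
target_tokens = ["sin", "(", "x", ")"]
candidate_tokens = ["x"]

num = numeric_cost(math.sin, lambda x: x, xs)
sym = symbolic_cost(target_tokens, candidate_tokens)

print(f"numeric cost:  {num:.2e}")   # very small: the curves nearly coincide
print(f"symbolic cost: {sym:.2f}")   # maximal: no token position matches
```

Under these assumptions the numeric cost is close to zero while the symbolic cost is at its maximum, which illustrates both the cost challenge and the coefficient challenge: instances of different functional forms can be numerically almost indistinguishable on the sampled range.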
