August 16th, 2024

Can Large Language Models Understand Symbolic Graphics Programs?

The study evaluates large language models' understanding of symbolic graphics programs, introducing a benchmark and Symbolic Instruction Tuning to enhance reasoning and instruction-following capabilities in visual content comprehension.

The paper "Can Large Language Models Understand Symbolic Graphics Programs?" explores how well large language models (LLMs) understand symbolic graphics programs, which procedurally generate visual data. The authors note that assessing LLMs is difficult because of their exposure to many tasks during training, and they propose a new benchmark that evaluates LLMs on their ability to answer questions about the graphics content produced by symbolic programs. The task is hard by design: the questions are difficult to answer from the symbolic program alone, yet easy to answer from the rendered image, so an LLM must in effect imagine the visual content in order to answer. Evaluating existing LLMs on the benchmark, the authors observe that models with stronger reasoning capabilities perform better. They also introduce Symbolic Instruction Tuning (SIT), which collects instruction-tuning data by querying a model with questions and images generated from symbolic programs and then fine-tuning on that data. The results indicate that SIT improves both reasoning about visual scenes and general instruction-following.

- The study assesses LLMs' understanding of symbolic graphics programs.

- A new benchmark is created to evaluate LLMs based on their reasoning about graphics content.

- The task is challenging as it requires imagining visual content from symbolic representations.

- Symbolic Instruction Tuning (SIT) is introduced to enhance LLMs' understanding.

- Better reasoning models perform more effectively on the proposed benchmark.
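
To make the benchmark setup concrete, here is a minimal sketch of the evaluation protocol as described above, using hypothetical placeholder names (`examples` as (program, question, answer) triples and an `ask_llm` callable); the paper's actual pipeline and prompts are not reproduced here:

    # Minimal sketch of the benchmark idea: the model sees only the
    # program text, never the rendered pixels; answers are graded against
    # ground truth derived from the rendered image.
    def evaluate(examples, ask_llm):
        """examples: iterable of (program_source, question, answer) triples."""
        correct = total = 0
        for program, question, answer in examples:
            prompt = (
                "Below is a symbolic graphics program (e.g. SVG source).\n"
                "Answer the question about the image it renders.\n\n"
                f"{program}\n\nQuestion: {question}"
            )
            prediction = ask_llm(prompt)  # no image is ever shown to the model
            correct += int(prediction.strip().lower() == answer.strip().lower())
            total += 1
        return correct / total

SIT then inverts this at training time: render each program, obtain question-answer pairs against the rendered image, and fine-tune the LLM on the resulting (program, question, answer) data.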

7 comments
By @abernard1 - 6 months
Since people are not reading the paper, here are the conclusions:

> Is it really easy to answer semantic reasoning questions over symbolic graphics programs? We provide an intriguing experiment to demonstrate that SVG programs can be quite difficult for LLMs to understand such that even if the corresponding rendered images are fairly easy for humans to recognize, all these powerful LLMs still fail dramatically, only reaching a chance-level accuracy.

They then fine-tune the model for this problem, and it achieves better results.
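
To see why, here is a toy example of my own (not taken from the paper). Rendered, the SVG below is instantly recognizable as a smiley face; from the source alone, answering "what does this depict?" requires mentally simulating three circles and a curve:

    <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">
      <circle cx="50" cy="50" r="45" fill="none" stroke="black" stroke-width="3"/>
      <circle cx="35" cy="40" r="5"/>
      <circle cx="65" cy="40" r="5"/>
      <path d="M30 62 Q50 80 70 62" fill="none" stroke="black" stroke-width="3"/>
    </svg>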

By @brap - 6 months
Crazy that LLMs are able to do that when they’re not even trained on visual data.
By @skybrian - 6 months
It’s difficult to tell what they’re proposing from the title or abstract, but asking questions about an SVG image, given only the source code, seems like an interesting challenge.

There are a lot of different ways to generate the same image, and slightly changing the source code can result in a very different image. This could be used to train a deeper understanding of program-like text.
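
A rough sketch of that data-generation idea, assuming the third-party cairosvg package for rasterization (the perturbation scheme is my own, purely illustrative):

    # Mutate one numeric attribute of an SVG and render before/after,
    # yielding pairs where a small source edit produces a visibly
    # different image. Assumes `pip install cairosvg`.
    import random
    import re

    import cairosvg

    def perturb_svg(svg_source: str) -> str:
        """Scale one randomly chosen numeric attribute value by +/-50%."""
        numbers = list(re.finditer(r'="(\d+(?:\.\d+)?)"', svg_source))
        if not numbers:
            return svg_source
        target = random.choice(numbers)
        start, end = target.span(1)
        new_value = float(target.group(1)) * random.uniform(0.5, 1.5)
        return svg_source[:start] + f"{new_value:.1f}" + svg_source[end:]

    def make_pair(svg_source: str, name: str) -> None:
        variant = perturb_svg(svg_source)
        cairosvg.svg2png(bytestring=svg_source.encode(), write_to=f"{name}_a.png")
        cairosvg.svg2png(bytestring=variant.encode(), write_to=f"{name}_b.png")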

By @slashdave - 6 months
The title and abstract propose that LLMs "understand", "reason", and "imagine". It's off-putting to me, since LLMs are not capable of any of these.
By @magicalhippo - 6 months
I was recently playing with some LLMs regarding a relay driver circuit. ChatGPT generated a circuit diagram for me in ASCII art style, but it contained some errors. For fun I tried to ask some others if they could identify the errors.

Here's the diagram (formatting errors were from ChatGPT):

         +5V
           |
           |
          ( ) Relay Coil
           |
           +----->|----+
           |      Diode |
           |            |
           |            |
           +------------+
           |
           |
         Drain (D) of MOSFET
            |
          Source (S)
            |
          -----
           GND
            |
         Gate (G)---[220Ω]--- GPIO (3.3V MCU)
            |
          [10kΩ]
            |
           GND
Here's what Llama 3.1 70B said:

After reviewing the diagram, I've found a few potential issues:

- The diode is not connected to the relay coil in a way that would prevent back-EMF from damaging the MOSFET. Typically, the diode is connected in parallel with the relay coil, with the anode connected to the positive terminal of the coil and the cathode connected to the negative terminal. In this diagram, the diode appears to be connected in series with the relay coil, which is not correct.

- The MOSFET is not properly biased. The gate of the MOSFET is connected to the GPIO pin through a 220Ω resistor, but there is no pull-down resistor to ensure the gate is at a known state when the GPIO pin is not driving it. The 10kΩ resistor connected to GND is likely intended to be a pull-down resistor, but it's not clear if it's connected to the gate or not.

- The relay coil is not specified, but it's likely that it requires a higher current than the MOSFET can handle. Typically, a relay driver circuit would include a current-limiting resistor or a transistor to handle the high current required by the relay coil.

So it correctly identified one of the main issues (the flyback diode is wrong), but failed to identify the issue with the MOSFET gate being directly tied to ground. I admit it's not a typical way to draw things, so fair enough; after all, it did mention there was something fishy with the pull-down.

Claude 3 Haiku provided a very similar answer, a bit more explicit and with more detail. For instance, it thought there might be an issue driving the MOSFET from 3.3V, though instead of suggesting a logic-level MOSFET it suggested a level shifter. It also suggested the 220Ω gate resistor might not be enough to protect the microcontroller. It tried to correct the diagram too, but failed: it changed the diode to ->|<- and didn't put it in parallel with the coil.

All in all, somewhat impressed.

However they get easily confused. I tried to improve the diagram, and asked again. Here's the updated diagram:

         +5V
           |
           |          
           |
           +----->|--------+
           |      Diode    |
           |               |
          ( ) Relay Coil   |
           |               |
           +---------------+
           |
           |
    Drain (D) of MOSFET
           |
           |]- Gate (G)-----+-[220Ω]--- GPIO (3.3V MCU)
           |                |
           |             [10kΩ]
       Source (S)           |
           |                |
           +----------------+
           |
         -----
          GND
And Llama 3.1 now says:

After examining the diagram, I notice a few potential issues:

- The diode is not connected to the relay coil in a way that would prevent back-EMF damage to the MOSFET. Typically, a flyback diode is connected in parallel with the relay coil, with the anode connected to the coil's positive terminal and the cathode connected to the coil's negative terminal. In this diagram, the diode appears to be connected in series with the relay coil, which is not correct.

- The MOSFET's source (S) is not connected to GND. In a typical relay driver circuit, the source of the MOSFET is connected to GND to provide a return path for the current.

- The 10kΩ resistor is not connected to anything. It's possible that it's intended to be a pull-down resistor for the MOSFET's gate, but it's not clear where it's supposed to be connected.

So it totally fails to "see" that the 10k is now correctly connected to ground, and that it shares the ground point (in the diagram) with the source of the MOSFET. At least it figures it's meant to be a pull-down.

However it did notice the diode is now correctly placed in parallel with the coil, and did again correctly see the diode was the wrong way around. So that's something.

edit: I changed the diode direction and asked if that fixed it. Llama 3.1 did not "see" that and suggested a change.

Claude 3 did much better and noticed the direction change and was happy.

As I said, I was just fooling around with this and thought I'd share. They're confident, but sometimes oh-so-wrong. Still, using a service where you can ask the same question to multiple models is nice. That way it can be easier to see when they're confabulating.
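
For completeness, here's the conventional topology the models were gesturing at. This is my own sketch, so take it with the same grain of salt:

         +5V
           |
           +-------------+
           |             |
          ( ) Relay     ---  Diode (cathode to +5V,
           |  Coil      /_\   anode to drain)
           |             |
           +-------------+
           |
        Drain (D)
           |
        Gate (G)---[220Ω]---+--- GPIO (3.3V MCU)
           |                |
        Source (S)       [10kΩ]
           |                |
          GND              GND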

By @Lichtso - 6 months
What an awful paper title, saying "Symbolic Graphics Programs" when they just mean "vector graphics". I don't understand why they cannot just use the established term instead. Also, there is no "program" here, in the same way that coding HTML is not programming, as vector graphics are not supposed to be Turing-complete. And where they pulled the "symbolic" from is completely beyond me.