33 Comments

Awesome article!

I've seen various reports that ChatGPT's logical thinking and problem-solving performance greatly improves when it's instructed to "think step by step and explain your reasoning", "describe your approach in great detail before tackling the problem", and the like. Maybe that would allow the user to "inject" a step it missed in its reasoning.

I think it would be very interesting to prime it in such a way, correcting and adjusting its "train of thought" (so to speak), and maybe ask for a pseudocode implementation of the algorithm before it even attempts to write a single line of actual code. I have no idea if it would matter, but from what I can recall, I've had better/more appropriate results when I allowed it more prior planning instead of asking for code right away. Specifically, I noticed it ends up in those "wrong solution loops" (I've had those too) much less often. That was on ChatGPT 3.5, though; I have no access to v4 ATM.


Perhaps I'm missing a detail, but why is the pathfinding problem here not a textbook variation on Dijkstra? To quote WP: https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm#Related_problems_and_algorithms

> The functionality of Dijkstra's original algorithm can be extended with a variety of modifications. For example, sometimes it is desirable to present solutions which are less than mathematically optimal. To obtain a ranked list of less-than-optimal solutions, the optimal solution is first calculated. A single edge appearing in the optimal solution is removed from the graph, and the optimum solution to this new graph is calculated. Each edge of the original solution is suppressed in turn and a new shortest-path calculated. The secondary solutions are then ranked and presented after the first optimal solution.

This would seem to cover your fire example and your different-requested-ranges example. You do Dijkstra to obtain the feasible path (taking into account movement penalties like water); then you calculate the next less optimal as described; you calculate the total damage of the feasible paths, sort by least damage to find the safest but still feasible path to show the player, and there you go - you get the safest feasible path first, then a modest amount of damage, then more damage, etc.

This also covers the tradeoff between range and damage: if you target a nearby tile, then there will almost certainly be a safe feasible path, and an intermediate range may be able to avoid most damage, while if you target one at the maximum range, then you will just have to incur whatever damage happens along the way, because that will be the only feasible path. This also naturally handles differently weighted damage if you have more hazards than just fire.

Plus, it sounds much more straightforward and debuggable than whatever it is you have, because it's factorized into first feasible movement and then preference over movements, so you can see where any bugs come from: did the movement planning erroneously think a path was impossible, or did the preference misrank it?
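For what it's worth, the edge-suppression recipe from the quote fits in a few lines of Python. This is just a sketch, not Mewgenics' code: the adjacency-dict graph encoding and the function names (`dijkstra`, `ranked_paths`) are my own.

```python
import heapq

def dijkstra(graph, start, goal, removed=frozenset()):
    # graph: {node: {neighbor: edge_cost}}; removed: directed edges to skip.
    dist, prev = {start: 0}, {}
    pq = [(0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:  # reconstruct the path back to the start
            path = [goal]
            while path[-1] != start:
                path.append(prev[path[-1]])
            return d, path[::-1]
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            if (u, v) in removed:
                continue
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    return None  # goal unreachable

def ranked_paths(graph, start, goal):
    # Optimal path first; then suppress each of its edges in turn,
    # re-solve, and rank the alternatives by cost, as in the WP quote.
    best = dijkstra(graph, start, goal)
    if best is None:
        return []
    results, seen = [best], {tuple(best[1])}
    for u, v in zip(best[1], best[1][1:]):
        alt = dijkstra(graph, start, goal, removed=frozenset({(u, v)}))
        if alt is not None and tuple(alt[1]) not in seen:
            seen.add(tuple(alt[1]))
            results.append(alt)
    return [results[0]] + sorted(results[1:])

# Tiny example graph: two routes from s to g with different costs.
graph = {"s": {"a": 1, "b": 2}, "a": {"g": 1}, "b": {"g": 1}, "g": {}}
```

Edge costs can already fold in movement penalties like water; the caller would then re-rank the returned feasible paths by total damage.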

> or if you think that it really could just be A* with a modified heuristic instead

I don't think that's equivalent to A* with a modified heuristic.


Nice article! One question: how would you correctly go about this? I imagine you have to calculate a specific angle range where the circles are allowed to touch for the crescents to overlap?


Thanks for a good article! The "cat and fire" pathfinding problem is interesting, and I'm pleased to report that - contrary to what the article says - the problem is solvable with a normal A* search.

It's just a matter of noticing that the search states (graph nodes, or whatever we call them) are not the [col, row] squares in the 2D plane. Instead, each search state is uniquely defined by the 3 properties [col, row, visited-fire-squares].

Furthermore, the cost function for each move adds a penalty of (1 + depthLimit) if the destination is a fire square. This ensures that the cat walks over fire squares only when it's necessary to meet the budget; the minimum number of fire squares is visited in that case.

That's all there is to it!
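A rough Python rendering of the same idea (my own grid encoding and helper name, not the Lisp listing below): the state is (position, frozenset of fire squares visited), and stepping onto a new fire square costs an extra penalty larger than any path that fits the depth budget.

```python
import heapq
from itertools import count

def find_path(grid, start, goal, depth_limit):
    # Penalty exceeds any legal path length, so fire is crossed only
    # when the depth budget forces it.
    penalty = 1 + (depth_limit if depth_limit != float("inf") else len(grid))

    def h(pos):  # Manhattan distance: admissible, every move costs >= 1
        return abs(pos[0] - goal[0]) + abs(pos[1] - goal[1])

    tie = count()  # tie-breaker so the heap never compares frozensets
    best = {(start, frozenset()): 0}
    pq = [(h(start), 0, 0, next(tie), start, frozenset(), [])]
    while pq:
        _, g, steps, _, pos, fires, path = heapq.heappop(pq)
        if pos == goal:
            return path  # list of squares after the start
        if steps >= depth_limit or g > best.get((pos, fires), float("inf")):
            continue  # depth budget exhausted, or stale heap entry
        for dc, dr in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (pos[0] + dc, pos[1] + dr)
            cell = grid.get(nxt)
            if cell in (None, "#"):
                continue
            new_fire = cell == "f" and nxt not in fires
            nfires = fires | {nxt} if new_fire else fires
            ng = g + 1 + (penalty if new_fire else 0)
            if ng < best.get((nxt, nfires), float("inf")):
                best[(nxt, nfires)] = ng
                heapq.heappush(pq, (ng + h(nxt), ng, steps + 1, next(tie),
                                    nxt, nfires, path + [nxt]))
    return None  # not solvable within the depth limit

# The test level from the commenter's listing, s/g mapped to empty squares:
level = ["----------",
         "---f--###-",
         "-s-ffffg--",
         "---f--###-",
         "----------"]
grid, start, goal = {}, None, None
for row, line in enumerate(level):
    for col, ch in enumerate(line):
        if ch == "s":
            start, ch = (col, row), "-"
        elif ch == "g":
            goal, ch = (col, row), "-"
        grid[(col, row)] = ch
```

With depth limit 6 this yields a 6-move path over 4 fire squares, and with no limit a 14-move detour over the back corridor with no fire at all, matching the results in the listing.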

Tyler wrote:

> (or if you think that it really could just be A* with a modified heuristic instead) (and if you are going to try and “suggest” to me ways to improve it, then please actually implement and try it in context first

I enjoyed implementing it in my home-made Lisp programming language. The code listing follows below, but first, to show that the program works correctly, here are the results for the test level with 6 fire squares presented in the article. I added a "back corridor" to the test level to show that with no depth limit, the A* algorithm finds the longer path around the obstacles and avoids walking over any of the fire squares.

Legend:
s: start
g: goal
f: fire
w: water (not used)
#: wall
-: empty
Uppercase move: move to a fire square
Lowercase move: move to an empty square

----------
---f--###-
-s-ffffg--
---f--###-
----------

Depth limit 5: Not solved (It's unsolvable with this depth limit)

Depth limit 6: rRRRRr

Depth limit 8: ruRrrDRr

Depth limit 10: ruurrdrDRr

Depth limit +inf: ruurrrrrrrddll

Tyler wrote:

> [...] and know I won’t use it because what I have works fine already, has been *thoroughly* tested on thousands of different cases, and doesn’t have any performance issues either)

That's right; there is no reason to change running code that works. That said, perhaps it's interesting to see how the task can be solved with a minimum of code using a standard A* search algorithm.

______________

Code listing

I always enjoy writing miniature programs like this one. It's only ca. 110 lines, including input and output functions. The code is heavily commented in a style that makes it imperative to read it in a text editor with word wrapping disabled.

Even though it's not a standard Lisp dialect, it should be easy enough to get the gist of what the program is doing, and also to convert the code to other programming languages.

[It turned out that the post was too long with the complete code listing. Instead, here is a link to download it from Dropbox.]

https://www.dropbox.com/s/y3jktdgwm5ytxj6/Mewgenics%20Pathfinding.lsp.txt?dl=1


I think the interesting aspect here is that the GPT "API" needs to be learned, but it is not formally defined. You have to coax it and use your own intuition to know how to guide it to the desired outcome.


Personally, I frequently bump into the character output limit of GPT-4, and my code output gets cut off in the middle. Any tricks/workarounds?

Mar 20, 2023·edited Mar 20, 2023

I think the tidy-and-efficient way to solve the original pathing problem is to include the number of fire tiles visited as a property of the _search space nodes_, and also make that fire-tile count the most significant element of the cost tuple. The A* heuristic can then just be zero fire tiles plus the Manhattan distance in the physical dimensions of the space, since admissibility only requires that the heuristic never exceed the actual cost. Search will then treat reaching the same tile after passing through 0, 1, etc. fire tiles as separate possibilities, but will still consider only the most cost-efficient way of reaching a given tile crossing a given number of fire tiles.
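A minimal sketch of that scheme in Python, under my own assumptions about the grid encoding ("#" wall, "f" fire, "-" empty, in a dict of cells): costs are lexicographic tuples (fire tiles crossed, steps) so the fire count dominates, and the heuristic is (0, Manhattan distance).

```python
import heapq

def safest_path(grid, start, goal):
    # States are (position, fire-tile count), so reaching the same tile
    # through 0, 1, ... fire tiles is kept as separate possibilities.
    def h(pos):  # (0, Manhattan) never exceeds the true remaining cost
        return (0, abs(pos[0] - goal[0]) + abs(pos[1] - goal[1]))

    best = {(start, 0): (0, 0)}
    pq = [(h(start), (0, 0), start, 0)]
    while pq:
        _, g, pos, fires = heapq.heappop(pq)
        if pos == goal:
            return g  # (fire tiles crossed, total steps)
        if g > best.get((pos, fires), (float("inf"), 0)):
            continue  # stale heap entry
        for dc, dr in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (pos[0] + dc, pos[1] + dr)
            cell = grid.get(nxt)
            if cell in (None, "#"):
                continue
            nf = fires + (1 if cell == "f" else 0)
            ng = (nf, g[1] + 1)  # tuple cost: fire count, then steps
            if ng < best.get((nxt, nf), (float("inf"), 0)):
                best[(nxt, nf)] = ng
                hx = h(nxt)
                heapq.heappush(pq, ((ng[0] + hx[0], ng[1] + hx[1]), ng, nxt, nf))
    return None  # goal unreachable

# Toy 5x3 level of my own: a fire wall that can be walked around below.
level = ["--f--",
         "--f--",
         "-----"]
grid = {(c, r): ch for r, line in enumerate(level) for c, ch in enumerate(line)}
```

Here the 4-step route straight through the fire loses to the 8-step detour, because the fire component of the cost tuple is compared first.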

I'm curious if any of the steps GPT-4 tried moved in this direction. If not, I think that's even more of an indictment of its limitations.


Not to be _that guy_ but I'm just gonna be that one guy that everyone loves. The half-assed unsolicited backseat proofreader, yaaay.

Really though, I suppose I'll start off by saying that this is quite an interesting read. I'm curious what correct solutions to the crescent collision look like and what it would take for GPT-like AI to reach that.

But alas, while reading this I kept finding instances of "its / it's" that are mixed up, and every time my brain stumbles and trips over these. Maybe I could be better at ignoring them, but you even got some wrong in both directions. So I took a closer look after I was done reading, and while I do always feel conflicted about "blasting" people for tiny mistakes, well, I got the numbers now either way, so:

22x "its" -- correct use: 10 -- wrong use: 12

CWWWWWWCCCWCWCWWCWWCCC

16x "it's" -- correct use: 13 -- wrong use: 3

CCCCCWCCCCCCWCCW

So it seems like you mostly just forget the apostrophes, but then sprinkle in a few in places they don't belong. Idk, hopefully this helps correct those typos rather than coming off as being rude. :)


Here is a site attempting to test outside GPT's training set:

https://BUGFIX-66.com

It was originally intended for Copilot, but it works for ChatGPT and the rest.

About 75% of the problems should be novel. They're easy enough that many dozens of Redditors have solved most of them.


Great post.

Regarding crescents - I’m not sure your example fails 2a - it states both inner circles are *inside* the other’s outer circle.

The bullshitting interpretation is clever - I’m astounded it kind of even plausibly bullshits - it seems to sort of understand circles! Also I kind of think plausible bullshitting could be like brainstorming - for the pathfinding, "could this work?" I can almost see myself coming up with "ok, how about this" - at least my younger self - I’m too slow these days!! Lol


I installed Copilot in IntelliJ as soon as GitHub launched it. Since Microsoft backs both GitHub and OpenAI, I assume that the AI behind Copilot is similar to ChatGPT.

Copilot is fantastic when you need a function that solves a known problem you don't know how to solve. You write the function's name, and it suggests the rest. You make some minor changes, and it's done.

The more I use it, the more Copilot writes code in my style. In other words, it is pretty useful.

However, when I must solve an original problem and I try to prompt a suggestion, it does nothing, confirming that — for now — AI is excellent at learning and showing what it has learned but cannot really create new solutions.


I tried ChatGPT with GPT-4 just today, and I think you nailed it. The solutions often _look_ good and reasonable, but ultimately fail. Very good for creating templates, sketches, and so on.


Really liked your article. Just my opinion about AI coding:

I don't think most people who claim that GPT-4 can code are saying it can handle difficult and very specific tasks. But it obviously can code simple and repetitive tasks, and how much of software engineering work is really that difficult?

Of course we still need people to check the code from LLMs, but we are just at the beginning. I truly believe that coding will move towards "low code" and text-based code generation, and in the next few years we will see systems that fix code from Jira ticket descriptions, semi-automating a lot of the annoying easy work for us. We need to somehow get a share of the productivity increase; that's the real problem here.


Well, two issues with this: your tile-game prompts are a little vague (except the crescent ones). I think they should be more descriptive and give more invariants.

Second, regarding the crescent task: I think tons of programmers would also get it wrong when it comes to edge cases.

Heck, most people can't even write FizzBuzz:

https://blog.codinghorror.com/why-cant-programmers-program/


"at least for now."

Threatening.
