Skip to content

clojure-mode locks up when font-locking strings with many escaped characters #589

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
plexus opened this issue Apr 16, 2021 · 3 comments · Fixed by #591
Closed

clojure-mode locks up when font-locking strings with many escaped characters #589

plexus opened this issue Apr 16, 2021 · 3 comments · Fixed by #591

Comments

@plexus
Copy link
Contributor

plexus commented Apr 16, 2021

Steps to reproduce the problem

Run cider-pprint-eval-last-sexp on something that returns a large string (mine was about ~140k characters), with multiple escaped characters.

This completely freezes up my Emacs, the only way to get out of it is with a killall -SIGUSR2 emacs, which then gives you a stracktrace of where it was at the time.

Debugger entered--entering a function:
* #f(compiled-function () #<bytecode 0x22a293b5c0344f>)()
  clojure-string-start(t)
  clojure--font-locked-as-string-p()
  clojure-font-lock-escaped-chars(149240)
  font-lock-fontify-keywords-region(1 149240 nil)
  font-lock-default-fontify-region(57 557 nil)
  apply(font-lock-default-fontify-region 57 557 nil)
  #f(compiled-function (beg end &rest rest) #<bytecode 0x1cea7771e9712d09>)(57 557 nil)
  font-lock-fontify-region(57 557)
  #f(compiled-function (fun) #<bytecode -0x154a37361c2bbd80>)(font-lock-fontify-region)
  run-hook-wrapped(#f(compiled-function (fun) #<bytecode -0x154a37361c2bbd80>) font-lock-fontify-region)
  jit-lock--run-functions(57 557)
  jit-lock-fontify-now(57 557)
  jit-lock-function(57)
  redisplay_internal\ \(C\ function\)()

Having a look at those top few functions, it seems that clojure-font-lock-escaped-chars walks through the string

(defun clojure-font-lock-escaped-chars ...
  ...
  (while (and (not found)
    (re-search-forward "\\\\." bound t))
    (setq found (clojure--font-locked-as-string-p)))

For each backslach it calls clojure--font-locked-as-string-p, which calls clojure-string-start (twice if it's not a regex), which does a backwards regex search

(defun clojure-string-start ...
  ...
        ;; Find a quote that appears immediately after whitespace,
        ;; beginning of line, hash, or an open paren, brace, or bracket
        (re-search-backward "\\(\\s-\\|^\\|#\\|(\\|\\[\\|{\\)\\(\"\\)")

I'm not familar enough with this code or with font-locking to judge exactly what it's doing, but it does look to be doing something rather inefficient. Maybe clojure-font-lock-escaped-chars can precompute the end of the string, instead of checking the font-locking to see if it's iterated past the end of the string? Or does clojure--font-locked-as-string-p need to clojure-string-start check? it seems to check if it's font locked as string AND inside a string, which seems a bit redundant.

Environment & Version information

clojure-mode version

Include here the version string displayed by M-x clojure-mode-display-version. Here's an example:

;; Version: 5.13.0-snapshot

Emacs version

GNU Emacs 28.0.50 (build 2, x86_64-pc-linux-gnu, GTK+ Version 3.24.20, cairo version 1.16.0) of 2021-02-19

Operating system

Description:    Ubuntu 20.04.2 LTS
@yuhan0
Copy link
Contributor

yuhan0 commented Apr 17, 2021

@plexus Can you see if this fixes your issue? The function was indeed written in a strange way, I couldn't figure out why from the commit history. Emacs has an issue in general with long strings, it might not even be a clojure-mode problem.

(defun clojure-string-start (&optional regex)
  "Return the position of the \" that begins the string at point.
If REGEX is non-nil, return the position of the # that begins the
regex at point.  If point is not inside a string or regex, return
nil."
  (when (nth 3 (syntax-ppss)) ;; Are we really in a string?
    (let* ((beg (nth 8 (syntax-ppss)))
           (hash (eq ?# (char-before beg))))
      (if regex
          (and hash (1- beg))
        (and (not hash) beg)))))

@plexus
Copy link
Contributor Author

plexus commented Apr 23, 2021

This does seem to help tremendously!

To try this at home, do a cider-pprint-eval-last-sexp on

(slurp "https://www.youtube.com/playlist?list=PLZdCLR02grLoeorpcxc4vZKWjWI_O2KXf")

With this patch it remains pretty usable, without it you have to kill or SIGUSR2 emacs.

@plexus
Copy link
Contributor Author

plexus commented May 2, 2021

Thanks a lot @yuhan0 for figuring out the fix and @bbatsov for reviewing and merging! This is something I would run into occasionally every few weeks or months, and every time it made me want to throw my computer into the ocean. Super happy to see this fixed!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants