9 comments

  • KetoManx64 1 day ago
    What's the performance like compared to tesseract? I don't see tesseract mentioned anywhere in the readme, which is surprising considering it's the number one tool most people go to for image-to-text OCR.
    • mrkn1 1 day ago
      No rigorous eval, and I love Tesseract. Here's the example that motivated me to build textsnap (it's in the GitHub README), parsed with Tesseract:

      https://imgur.com/a/i2eQra8

      • KetoManx64 1 day ago
        Very noticeable difference, and the exact issue I run into repeatedly with tesseract! Definitely going to try dropping textsnap into my scripts now. Thanks!!
  • lavaman131 1 day ago
    This is awesome! Been needing something like this for some research paper diagrams I've been indexing.
  • abstract257 2 days ago
    Curious how it does on multi-page scanned PDFs vs. single screenshots? The ORT vision/decoder split is the part that usually makes or breaks CPU VLM OCR...
    • krunck 2 days ago
      I had to extract the page images from the PDF for it to work, then run it on each extracted page image.
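A minimal sketch of that per-page workflow, under two assumptions: Poppler's `pdftoppm` is installed, and `textsnap` accepts a local image path and prints the recognized text to stdout (the thread only shows it being called with a URL). The helper names here are hypothetical and not part of textsnap.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, List, Sequence


def pdftoppm_command(pdf_path: str, out_prefix: str, dpi: int = 200) -> List[str]:
    """Build the pdftoppm call that renders every PDF page to a PNG file."""
    return ["pdftoppm", "-png", "-r", str(dpi), pdf_path, out_prefix]


def page_images(out_dir: str, prefix: str) -> List[Path]:
    """Return rendered pages in page order (pdftoppm names them <prefix>-<n>.png)."""
    def page_number(path: Path) -> int:
        # Sort numerically so page 10 comes after page 2, padded or not.
        return int(path.stem.rsplit("-", 1)[1])
    return sorted(Path(out_dir).glob(f"{prefix}-*.png"), key=page_number)


def run_textsnap(image_path: str) -> str:
    # Assumption: textsnap takes a local path and writes the text to stdout.
    result = subprocess.run(
        ["textsnap", image_path], capture_output=True, text=True, check=True
    )
    return result.stdout


def ocr_pages(images: Sequence[Path], ocr: Callable[[str], str] = run_textsnap) -> str:
    """OCR each page image in order and join the results with blank lines."""
    return "\n\n".join(ocr(str(path)).strip() for path in images)


def ocr_pdf(pdf_path: str, dpi: int = 200, ocr: Callable[[str], str] = run_textsnap) -> str:
    """Render a PDF to page images in a temporary directory, then OCR each page."""
    with tempfile.TemporaryDirectory() as work_dir:
        prefix = str(Path(work_dir) / "page")
        subprocess.run(pdftoppm_command(pdf_path, prefix, dpi), check=True)
        return ocr_pages(page_images(work_dir, "page"), ocr)
```

Calling `ocr_pdf("scan.pdf")` would then return the text of all pages in order. The OCR step is passed in as a function so a different tool can be swapped in for comparison.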
  • vivzkestrel 2 days ago
    How well do you think this'll work with code? I mean taking code screenshots and converting them into actual code for VS Code.
    • mrkn1 2 days ago
      Just ran

        textsnap "https://i.ytimg.com/vi/LBNDfxjEYlA/maxresdefault.jpg"
      
      and got this

        $('.count').each(function () {
        $('this').prop('Counter', 0).animate({
          Counter: $('this').text()
        }, {
            duration: 4000,
            easing: 'swing',
            step: 'function (now) {
                $('this").text(Math.ceil(now));
            }
          }); 
        });
  • monosma 2 days ago
    What was the reason for adopting PaddleOCR? Can other OCR models be used as well?
    • mrkn1 2 days ago
      No reason other than their Q4 model working reasonably well and fast on my CPU laptop. It should work with any ONNX VLM.
  • kouru225 2 days ago
    Roman alphabet only or does this work with other alphabets?
    • mrkn1 2 days ago
      109 languages, including other alphabets.
  • garrett2558 2 days ago
    Very cool, I'm building my own local-first product as well
    • mrkn1 2 days ago
      thank you! what is it about?
  • BIGFOOT_EXISTS 2 days ago
    Now this is legit cool, keep up the great work.
    • mrkn1 2 days ago
      thank you!
