Google Cloud Vision API full_text_annotation picking up "23" beside "#" symbol since March 7th

Since at least March 7th (the earliest date on which I could observe it), the Google Vision API has been picking up non-existent "23" symbols either to the left or to the right of "#" symbols. For example, in the image below, it returns the text "HIGHWAY # 236 NORTH" instead of just "HIGHWAY # 6 NORTH", which is what it returned before. The only connection between "#" and "23" I can think of is that the # sign is U+0023 in Unicode, but Unicode characters were never returned this way in the past.

enter image description here

The object returned by the google-cloud-vision Python client confirms this. I could reproduce the problem on both the library version I was using and the latest version, 3.7.2:

...
    symbols {
      bounding_box {
        vertices {
          x: 166
          y: 7
        }
        vertices {
          x: 178
          y: 7
        }
        vertices {
          x: 178
          y: 28
        }
        vertices {
          x: 166
          y: 28
        }
      }
      text: "#"
      confidence: 0.431674063
    }
    confidence: 0.431674063
  }
  words {
    property {
      detected_languages {
        language_code: "en"
        confidence: 1
      }
    }
    bounding_box {
      vertices {
        x: 165
        y: 7
      }
      vertices {
        x: 201
        y: 7
      }
      vertices {
        x: 201
        y: 28
      }
      vertices {
        x: 165
        y: 28
      }
    }
    symbols {
      bounding_box {
        vertices {
          x: 165
          y: 7
        }
        vertices {
          x: 178
          y: 7
        }
        vertices {
          x: 178
          y: 28
        }
        vertices {
          x: 165
          y: 28
        }
      }
      text: "2"
      confidence: 0.822502255
    }
    symbols {
      bounding_box {
        vertices {
          x: 165
          y: 7
        }
        vertices {
          x: 183
          y: 7
        }
        vertices {
          x: 183
          y: 28
        }
        vertices {
          x: 165
          y: 28
        }
      }
      text: "3"
      confidence: 0.882505536
    }
...
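
A minimal sketch for checking other responses for the same pattern. It walks the pages → blocks → paragraphs → words → symbols hierarchy of a `full_text_annotation` and lists digit symbols that sit within two symbols of a "#". The helper names (`iter_symbols`, `digits_near_hash`, `fetch_annotation`), the window size, and the stand-in data are assumptions made for illustration; only the client calls inside `fetch_annotation` come from the google-cloud-vision library, and that function needs the package installed and credentials configured.

```python
from types import SimpleNamespace


def iter_symbols(annotation):
    """Yield every symbol of a full_text_annotation in reading order."""
    for page in annotation.pages:
        for block in page.blocks:
            for paragraph in block.paragraphs:
                for word in paragraph.words:
                    yield from word.symbols


def digits_near_hash(annotation, window=2):
    """Return (text, confidence) for digit symbols within `window` symbols of a '#'."""
    symbols = list(iter_symbols(annotation))
    hits = []
    for i, sym in enumerate(symbols):
        if sym.text != "#":
            continue
        lo, hi = max(0, i - window), min(len(symbols), i + window + 1)
        for j in range(lo, hi):
            if j != i and symbols[j].text.isdigit():
                hits.append((symbols[j].text, symbols[j].confidence))
    return hits


def fetch_annotation(path):
    """Call the Vision API for a local image (needs google-cloud-vision and credentials)."""
    from google.cloud import vision  # imported lazily so the helpers above work offline

    client = vision.ImageAnnotatorClient()
    with open(path, "rb") as f:
        image = vision.Image(content=f.read())
    response = client.document_text_detection(image=image)
    return response.full_text_annotation


# Stand-in for a real response, using rounded confidences from the dump above.
def _sym(text, confidence):
    return SimpleNamespace(text=text, confidence=confidence)


demo = SimpleNamespace(pages=[SimpleNamespace(blocks=[SimpleNamespace(paragraphs=[
    SimpleNamespace(words=[
        SimpleNamespace(symbols=[_sym("#", 0.43)]),
        SimpleNamespace(symbols=[_sym("2", 0.82), _sym("3", 0.88), _sym("6", 0.90)]),
    ])
])])])

print(digits_near_hash(demo))
```

With a real image, `digits_near_hash(fetch_annotation("sign.jpg"))` would run the same check, where `sign.jpg` is a placeholder path. The check also flags legitimate digits that follow a "#", so it is only useful for spotting candidates to inspect by hand.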

According to the release notes (https://cloud.google.com/vision/docs/release-notes), Google announced a switch to a new OCR model 90 days after December 5th, which is around March 5th. I first noticed this problem on the 7th, so the model change may have something to do with it.
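
The 90-day arithmetic can be checked with a few lines of Python. The post does not state the year, so the two years below are assumptions chosen to show both cases: the window lands on March 5th in a non-leap year and on March 4th when it spans a February 29th. `cutover_date` is a hypothetical helper name.

```python
from datetime import date, timedelta


def cutover_date(announced, days=90):
    """Return the date `days` days after the announcement date."""
    return announced + timedelta(days=days)


# Assumed announcement years; the original post does not give one.
for year in (2022, 2023):
    print(year, cutover_date(date(year, 12, 5)))
```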

Does anyone else observe this problem?

There is 1 answer

Nestor

Aside from the model version change, another possibility is the quality of the image fed to the API. Either way, I would recommend reporting this issue. You can file it in the issue tracker so that engineers can see how the results changed after the update.

Issue tracker and FR Page: https://cloud.google.com/support/docs/issue-trackers#feature_requests

Create issue: https://issuetracker.google.com/issues/new?component=187174&template=0