OCR Benchmark Method

AIMultiple aims to help enterprises identify the right OCR product for their business. These enterprises should expect to process high volumes of documents and images (i.e., at least tens of thousands of pages per month).

What will be the guiding principles?

AIMultiple’s benchmark methodology explains the participation requirements and principles.

What will be benchmarked?

Extracting text in English from documents and images.

The dataset is expected to contain 500 pages:

Up to 300 pages of long-format PDF documents (such as technical manuals, whitepapers and contracts) that contain text in image form. PDFs of varying legibility will be used. The PDFs will be collected online.
100 pages of transactional documents (such as invoices and receipts). They will be collected online and selected from the documents of AIMultiple and its partners.
100 pages of handwritten documents (such as receipts, insurance claim forms). They will be collected online and selected from the documents of AIMultiple and its partners.

In some documents, parts of the content will be digitally altered to protect personally identifiable information (PII).

How will AIMultiple benchmark performance?

AIMultiple’s OCR benchmarks aim to closely match the preferences of OCR buyers. Buyers want a flexible, cost-effective solution. Therefore, AIMultiple will measure these metrics:

Accuracy

Accuracy will be measured by cosine similarity between each product's output and the reference text. The Levenshtein distance will not be used because different products output text in different orders, especially for multi-column text. The Levenshtein distance is sensitive to these positional differences, whereas the benchmark is concerned with how accurately the text is detected, not where it is located.
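The order-insensitivity argument can be illustrated with a minimal sketch. It assumes a simple bag-of-words representation (word counts, lowercased); the benchmark does not specify how text is vectorized, so the tokenizer and the sample strings here are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def cosine_similarity(reference: str, ocr_output: str) -> float:
    """Cosine similarity between the word-count vectors of two texts."""
    ref_counts = Counter(tokenize(reference))
    out_counts = Counter(tokenize(ocr_output))
    # Dot product over the words of the reference; missing words count as 0.
    dot = sum(ref_counts[w] * out_counts[w] for w in ref_counts)
    norm_ref = math.sqrt(sum(c * c for c in ref_counts.values()))
    norm_out = math.sqrt(sum(c * c for c in out_counts.values()))
    if norm_ref == 0 or norm_out == 0:
        return 0.0
    return dot / (norm_ref * norm_out)


# Hypothetical sample strings, not taken from the benchmark dataset.
ground_truth = "total due 42 dollars invoice number 1001"
reordered = "invoice number 1001 total due 42 dollars"  # same words, other order
misread = "total due 47 dollars invoice number 1001"    # one misrecognized word

print(round(cosine_similarity(ground_truth, reordered), 3))  # 1.0
print(round(cosine_similarity(ground_truth, misread), 3))    # 0.857
```

The reordered output scores the same as a perfect match, while a single misread word lowers the score. An edit-distance metric would penalize the reordered output even though every word was recognized correctly.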

Speed

Average response time and distribution of response times will be measured. A maximum of 5 seconds of data processing and transfer time will be allowed per page.
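As a rough sketch of how such measurements could be summarized, the snippet below computes the mean, the median and a nearest-rank 95th percentile, and counts pages over the 5-second limit. The choice of summary statistics, the function names and the sample latencies are assumptions for illustration; the benchmark only states that the average and the distribution will be measured.

```python
import math
import statistics


def percentile(sorted_values: list[float], pct: float) -> float:
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize(latencies_s: list[float], limit_s: float = 5.0) -> dict:
    """Summarize per-page response times given in seconds."""
    ordered = sorted(latencies_s)
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p95": percentile(ordered, 95),
        # Pages that exceeded the allowed processing and transfer time.
        "over_limit": sum(1 for t in ordered if t > limit_s),
    }


# Hypothetical per-page response times in seconds.
sample = [0.8, 1.2, 1.1, 0.9, 6.3, 1.0, 1.4, 2.2, 0.7, 1.3]
summary = summarize(sample)
print(summary)
```

Reporting a high percentile alongside the average matters here because a single slow page, like the 6.3-second one in the sample, barely moves the median but is exactly what the per-page limit is meant to catch.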

Scalability

The same metrics can be tested with a set number of simultaneous connections. The results may be the same for all providers (i.e., simultaneous connections may not slow down processing). In that case, AIMultiple may not publish the results for this metric.
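A test of this kind could be sketched as follows. The `fake_ocr_request` function is a hypothetical stand-in that only sleeps; a real run would call a vendor's API instead, and the connection counts are arbitrary assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_ocr_request(page_id: int) -> str:
    """Hypothetical stand-in for a vendor API call; it only sleeps briefly."""
    time.sleep(0.01)
    return f"text of page {page_id}"


def timed_request(page_id: int) -> float:
    """Return the wall-clock time of one request in seconds."""
    start = time.perf_counter()
    fake_ocr_request(page_id)
    return time.perf_counter() - start


def run_with_concurrency(page_ids, connections: int) -> list[float]:
    """Send all pages using a fixed number of simultaneous connections."""
    with ThreadPoolExecutor(max_workers=connections) as pool:
        return list(pool.map(timed_request, page_ids))


pages = list(range(20))
for connections in (1, 5, 10):
    latencies = run_with_concurrency(pages, connections)
    mean_latency = sum(latencies) / len(latencies)
    print(f"{connections} connections: mean {mean_latency:.4f} s per page")
```

If the mean per-page time stays flat as the number of connections grows, simultaneous connections are not slowing the provider down, which is the case in which the results may not be published.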

Cost

Public pricing data published by vendors will be used to calculate the cost of the benchmark. Vendors’ pricing models will also be shared to help buyers compare prices at different workloads.
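To show how a pricing model translates into prices at different workloads, here is a minimal sketch for a tiered per-page price list. The tiers and prices are invented for illustration and do not describe any vendor.

```python
def monthly_cost(pages: int, tiers: list[tuple]) -> float:
    """Cost of processing `pages` in a month under tiered per-page pricing.

    `tiers` is a list of (upper_bound_in_pages, price_per_page) pairs in
    ascending order; an upper bound of None means the tier has no limit.
    """
    cost = 0.0
    lower = 0
    for upper, price in tiers:
        if pages <= lower:
            break
        top = pages if upper is None else min(pages, upper)
        cost += (top - lower) * price
        lower = top
    return cost


# Hypothetical price list: first 1,000 pages free, then two paid tiers.
example_tiers = [(1_000, 0.0), (100_000, 0.0015), (None, 0.0006)]

for load in (500, 50_000, 200_000):
    print(f"{load} pages per month: {monthly_cost(load, example_tiers):.2f}")
```

Evaluating the same function at several monthly volumes makes it easy to see where one vendor's volume discounts start to outweigh another vendor's lower entry price.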

Customer service

Reviews on B2B review platforms will be analyzed to assess customer satisfaction.

How will the results be published?

The results will be published on AIMultiple.com and will include graphs that users can use to find the right vendors for their business. The top three vendors in each of the above categories will be presented.

Each participant will receive:

Detailed results for each document and page, with timestamps
The average result for each document and page
The dataset

Please note that AIMultiple is in the design phase of the benchmark and changes will be made once AIMultiple receives end user feedback and finalizes the benchmark.

If you would like to participate in the AIMultiple OCR benchmarks, reach out to the AIMultiple team at (email protected).

Cem has been a Principal Analyst at AIMultiple since 2017. AIMultiple informs hundreds of thousands of businesses (according to SimilarWeb) every month, including 55% of the Fortune 500.

Cem’s work has been cited by leading global publications including Business Insider, Forbes and the Washington Post, global firms such as Deloitte and HPE, NGOs such as the World Economic Forum, and supranational organizations such as the European Commission. You can check out more reputable companies and resources that refer to AIMultiple.

Throughout his career, Cem has served as a technology consultant, technology buyer, and technology entrepreneur. He spent over a decade advising enterprises at McKinsey & Company and Altman Solon on their technology decisions. He also published a McKinsey report on digitization.

He led the technology strategy and procurement of a telecom company, reporting to the CEO. He also led the business growth of deep tech company Hypatos, which reached a 7-digit annual recurring revenue and a 9-digit valuation from 0 within 2 years. Cem’s work at Hypatos was covered by major technology publications such as TechCrunch and Business Insider.

Cem regularly speaks at international technology conferences. He graduated as a computer engineer from Bogazici University and holds an MBA from Columbia Business School.
