Worldwide, Trillions of Dollars in Value Are Lost in PDFs

By Jesper Dramsch. Tags: business, hugging face, natural language processing, spacy. Read in 2 minutes.

We pay thousands for a 300-page report that is at best skimmed.

Every company does it, internally or externally!

Here's how you unlock that $$$ for free.

(With a sprinkle of Python, spaCy, and Hugging Face)

📚 PDFs are for archives

They're terrible for search or knowledge management.

Tables get buried, images get embedded, and if your consultant is extra salty, they send you a scanned PDF!

How do we go from that to asking an actual human question and getting an answer?

🔥 Deep learning magic to parse PDFs

No two PDFs look alike. Scientific PDFs especially can be super complex.

With Layout Parser you can leverage deep learning to extract text, images, and even tables from PDFs!

Worth it just for the tables!!

Layout Parser
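To make this less abstract, here is a minimal sketch of what pulling tables out of a page could look like with Layout Parser. It assumes the `layoutparser` package with its Detectron2 backend is installed and that you already have each page as an RGB image array. The PubLayNet model and label map follow the Layout Parser documentation; the function names `detect_layout` and `blocks_of_type` and the 0.8 score threshold are my own picks for illustration.

```python
def detect_layout(image, score_threshold=0.8):
    """Detect layout blocks (text, titles, lists, tables, figures) on one page image."""
    # Imported inside the function so the helper below also works
    # on machines without the heavy deep learning dependencies.
    import layoutparser as lp

    model = lp.Detectron2LayoutModel(
        "lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config",
        extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", score_threshold],
        label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"},
    )
    return model.detect(image)


def blocks_of_type(layout, block_type):
    """Keep only blocks with the given label, sorted from top to bottom of the page."""
    matches = [block for block in layout if block.type == block_type]
    # coordinates are (x_1, y_1, x_2, y_2); index 1 is the top edge.
    return sorted(matches, key=lambda block: block.coordinates[1])
```

Calling `blocks_of_type(detect_layout(page_image), "Table")` would then hand you the table regions of a page, ready to crop and pass on to OCR or a table reader. Scanned PDFs need an OCR step on top of this, which the sketch leaves out.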

🔬 Summarize that text!

We can use modern Natural Language Processing to summarize documents!

Going from a 300-page beefy boi to a nice skimmable one-pager? Sounds like a win to me!

Here's a great intro using spaCy on Analytics Vidhya:

Text Summarization using Spacy
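If you want a feel for it before reading the tutorial, here is a rough sketch of a simple extractive summary: score each sentence by how often its keywords show up in the whole document, then keep the top few. This is one basic technique, not necessarily the best one for your reports. It assumes spaCy v3; the names `top_sentences` and `summarize` are made up for this example, and the blank English pipeline with a rule-based sentencizer is chosen so that no model download is needed.

```python
import heapq
from collections import Counter


def top_sentences(sentences, n=3):
    """Pick the n highest scoring sentences and return them in document order.

    `sentences` is a list of (sentence_text, keywords) pairs.
    """
    frequencies = Counter(word for _, words in sentences for word in words)
    if not frequencies:
        return []
    highest = max(frequencies.values())
    # A sentence scores the sum of the normalized frequencies of its keywords.
    scores = [sum(frequencies[w] / highest for w in words) for _, words in sentences]
    best = heapq.nlargest(n, range(len(sentences)), key=scores.__getitem__)
    return [sentences[i][0] for i in sorted(best)]


def summarize(text, n=3):
    """Extractive summary of `text` using spaCy for sentence and word splitting."""
    import spacy  # imported here so top_sentences works without spaCy installed

    nlp = spacy.blank("en")
    nlp.add_pipe("sentencizer")  # rule-based sentence boundaries
    doc = nlp(text)
    sentences = [
        (sent.text.strip(), [t.lower_ for t in sent if t.is_alpha and not t.is_stop])
        for sent in doc.sents
    ]
    return " ".join(top_sentences(sentences, n))
```

Frequency scoring favours long sentences, so for real reports you may want to normalize by sentence length or swap in a trained summarization model.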

🦜 Ask an AI questions!

While summaries are great, how cool would the next step be?!

Ask a question and get the answer, maybe even corresponding images, data, or the source PDF?

Use 🤗 Hugging Face transformers to extract answers from documents!

Hugging Face Question Answering
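Here is a small sketch of how that could work on a long report. Question answering models only read a limited amount of text at once, so the idea is to split the document into overlapping chunks, ask the same question of every chunk, and keep the most confident answer. It assumes the `transformers` library with its default question-answering model; the helper names and the chunk sizes (200 words, 50 overlap) are arbitrary choices for illustration.

```python
def split_words(text, chunk_words=200, overlap=50):
    """Split text into overlapping chunks of at most `chunk_words` words."""
    if overlap >= chunk_words:
        raise ValueError("overlap must be smaller than chunk_words")
    words = text.split()
    if not words:
        return []
    step = chunk_words - overlap
    return [
        " ".join(words[start:start + chunk_words])
        for start in range(0, max(len(words) - overlap, 1), step)
    ]


def answer_question(question, text, qa, chunk_words=200, overlap=50):
    """Ask `qa` about every chunk and return the highest scoring result.

    `qa` is any callable taking question= and context= and returning a dict
    with at least "answer" and "score", like the Hugging Face pipeline does.
    """
    best = None
    for chunk in split_words(text, chunk_words, overlap):
        result = qa(question=question, context=chunk)
        if best is None or result["score"] > best["score"]:
            best = {**result, "context": chunk}  # keep the passage as a source
    return best


def load_qa():
    """Load the default Hugging Face question-answering pipeline (downloads a model)."""
    from transformers import pipeline

    return pipeline("question-answering")
```

With that in place, `answer_question("Who approved the budget?", report_text, load_qa())` returns the answer together with the passage it came from, which is a first step towards linking back to the source PDF. The question and `report_text` here are placeholders.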

🔀 Let computers figure some things out!

Natural Language Processing has a technique called Named Entity Recognition that extracts people, places, and even companies from text!

"🤗 give me the documents about our building site in New York!"

and skip the manual tagging!

Hugging Face Named Entity Recognition
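As a rough sketch, automatic tagging could look like this: run Named Entity Recognition over each document, store the entities it finds as tags, and look documents up by tag. It assumes the `transformers` library with its default NER model. The label names (such as `LOC` for locations) depend on the model you load, and the helper names and the 0.9 confidence cut-off are assumptions of mine.

```python
from collections import defaultdict


def extract_tags(text, ner, min_score=0.9):
    """Collect entity names per label, e.g. {"LOC": {"New York"}}.

    `ner` is any callable returning dicts with "entity_group", "word" and
    "score", like a Hugging Face pipeline with aggregation enabled.
    """
    tags = defaultdict(set)
    for entity in ner(text):
        if entity["score"] >= min_score:
            tags[entity["entity_group"]].add(entity["word"])
    return dict(tags)


def find_documents(tag_index, label, name):
    """Return the ids of all documents tagged with `name` under `label`."""
    return sorted(
        doc_id for doc_id, tags in tag_index.items() if name in tags.get(label, set())
    )


def load_ner():
    """Load the default Hugging Face NER pipeline (downloads a model)."""
    from transformers import pipeline

    # "simple" aggregation merges word pieces into whole entities.
    return pipeline("ner", aggregation_strategy="simple")
```

Build the index once with something like `{name: extract_tags(text, ner) for name, text in documents.items()}`, and the New York request becomes `find_documents(index, "LOC", "New York")`.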

🖥️ Then make it available!

Run a prototype on a few documents to see how it works. Then slowly expand it.

This type of knowledge extraction on reports, documents, and your company's PDFs is pure gold.

Then show people and host it for everyone!

spaCy Projects

Conclusion

Parse your dusty documents
Use spaCy or 🤗 Hugging Face for NLP
Summarize each 300-page doc
Assign tags to each document automagically
Build a Q&A AI for your colleagues

© 2024 Jesper Dramsch All rights reserved.

Ethics Statement ◆ Privacy Policy ◆ Terms & Conditions ◆ Affiliate Disclosure