Hire people to do this for you.
This is a really challenging task, not one for a "quick answers" kind of forum. There are commercial applications for such tasks, but in general it can't be performed with 100% accuracy.
What you need:
- an OCR engine (let's suppose that the quality of the images is good enough and there is no handwriting); some scanners already add an OCR text layer on top of the scanned image
- one or more patterns that map text to XML elements based on position or some metadata (supposing your documents are of a limited number of types)
- document type recognition logic
- content validation logic, so you have a clue how well the automatic process performed
- editing a PDF is something else. If the scanned image was not OCR-ed by the scanner, you cannot edit the image itself; you have to put the new text on top of the original
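
To make the pattern, type recognition and validation points a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the OCR output is hard-coded as `(text, x, y)` tuples (a real engine would give you something similar, but the format differs per engine), and the `TEMPLATES` table, the `invoice` type, its field names and the pixel boxes are all made up. It only shows the idea, not a production approach.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical OCR output: one (text, x, y) tuple per word, pixels from top-left.
WORDS = [
    ("INVOICE", 40, 20),
    ("2024-0042", 420, 20),
    ("Total:", 350, 700),
    ("118.50", 420, 700),
]

# Hypothetical templates, one per document type.
TEMPLATES = {
    "invoice": {
        "keyword": "INVOICE",
        "fields": {
            # element name: ((x0, y0, x1, y1), validation regex)
            "number": ((400, 0, 600, 40), r"\d{4}-\d{4}"),
            "total": ((400, 680, 600, 720), r"\d+\.\d{2}"),
        },
    },
}


def detect_type(words):
    """Return the first template whose keyword appears among the OCR words."""
    texts = {text for text, _, _ in words}
    for name, template in TEMPLATES.items():
        if template["keyword"] in texts:
            return name
    return None


def extract(words, doc_type):
    """Map words to XML elements by position; list fields that fail validation."""
    root = ET.Element(doc_type)
    failed = []
    for field, ((x0, y0, x1, y1), pattern) in TEMPLATES[doc_type]["fields"].items():
        # Keep the words inside the field's box, ordered top-to-bottom, left-to-right.
        inside = [(y, x, t) for t, x, y in words if x0 <= x <= x1 and y0 <= y <= y1]
        value = " ".join(t for _, _, t in sorted(inside))
        ET.SubElement(root, field).text = value
        if not re.fullmatch(pattern, value):
            failed.append(field)
    return root, failed


doc_type = detect_type(WORDS)
root, failed = extract(WORDS, doc_type)
print(ET.tostring(root, encoding="unicode"))
print("needs manual review:", failed)
```

The `failed` list is the "clue" about how well the automatic step worked: anything in it should go to a human instead of straight into the XML.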
But these are only the basic concepts. Such a task is really hard: expect many months of full-time work, and at the end you will still have special cases where the automatic handling does not work. For those you have to add some user interaction, so you will need a user interface too.