Quoting from EoM talk: "This project is based on an electronic version of the 'Encyclopaedia of Mathematics', published by Kluwer Academic Publishers until 2003, and by Springer after that. The encyclopaedia goes back to the Soviet Matematicheskaya entsiklopediya (1977), originally edited by Ivan Matveevich Vinogradov. The electronic version had its formulae written in TeX, which were saved as png images. On its way through the various publishers the original TeX source code was lost, therefore, to edit a formula in one of these original pages requires to retype the code for that formula from scratch.
For the project, it will be of big help to transcribe the old pages. To make this easy, it was decided to use MathJax, which allows to use Plain TeX or LaTeX for formulae encoding."
Currently, there are about 270'000 images of formulas whose LaTeX source code has been lost. Many of these images are duplicates (see User:Maximilian Janisch/latexlist/duplicates), which makes classification easier. Services such as Mathpix can automatically transform the images back into TeX code. However, these translations are not infallible (see User:Maximilian Janisch/latexlist/latex).
So far, I have classified all 270'125 images of this Encyclopedia into 103'285 classes of duplicates (some images appear hundreds of times, others just once). My goal is to translate one representative of each of the 103'285 classes back to TeX code with the help of Mathpix, and then to replace the images by that TeX code in the corresponding articles. (Remark: some of the images are not formulas but graphics of other types, so I will of course not re-encode those.)
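For illustration only, one simple way to form such classes of duplicates is sketched below. It is not necessarily the method used for this project: it assumes that duplicates are byte-identical png files and groups them by a SHA-256 hash of their contents. The function names `group_duplicates` and `summarize` and the directory layout are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def group_duplicates(image_dir):
    """Group the png files under image_dir into classes of identical files.

    Assumption (not from the project): two images count as duplicates
    only if their file contents are byte-for-byte identical.
    """
    classes = defaultdict(list)
    for path in sorted(Path(image_dir).rglob("*.png")):
        # Identical bytes give the same digest, so the digest names the class.
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        classes[digest].append(path)
    return dict(classes)


def summarize(classes):
    """Return (number of images, number of duplicate classes)."""
    total_images = sum(len(paths) for paths in classes.values())
    return total_images, len(classes)
```

With such a grouping, each class needs to be transcribed only once, and the resulting TeX code can be reused for every image in that class. Images that look the same but differ in their bytes (for example after re-encoding) would land in different classes under this assumption and would need a perceptual comparison instead.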
Ulf Rehmann found the original Nroff source for about four fifths of the non-texified articles. It can be translated to LaTeX automatically with few or no errors. So we are left with only 23'000 classes of images to translate back "manually".
Maximilian Janisch/latexlist. Encyclopedia of Mathematics. URL: http://www.encyclopediaofmath.org/index.php?title=Maximilian_Janisch/latexlist&oldid=44820