Enrique Nell and Joaquin Ferrero - Spanish Localization of the Perl Core Documentation
Title: Spanish Localization of the Perl Core Documentation
Name: Enrique Nell and Joaquin Ferrero
Grant Manager: not yet assigned
Duration: 6 months
Approved: May 2012
Amount Requested: $2,000
Spanish is the third most commonly used language on the Internet, after English and Mandarin. It is also the world's second most studied language and the second language in international communication, after English. Currently, there are 400 million native speakers, and Spanish is the official language of 21 countries. However, the number of contributions to CPAN from the Spanish-speaking community is much lower than these figures would suggest.
Our goal is to translate the Perl core documentation into Spanish, in order to make it available to a wider public through the POD2::ES distribution. In this process, we are using and developing sustainable procedures that reuse previous translations and provide a quick update for each new Perl release.
We are requesting a grant to boost our work on POD2::ES.
Benefits to the Perl Community
The availability of translated Perl documentation will bring more Perl programmers to the community and will increase the number of CPAN contributions.
The tools and procedures developed for this project can be used to translate the Perl documentation into other languages.
The resulting materials (e.g., translation memories, glossaries, style guides) can be used as a starting point for related projects, like the translation of Perl books, Perl 6 docs and the documentation of CPAN modules into Spanish.
- An increase in the percentage of translated and reviewed documents, targeting 60% translated and 25% reviewed (current figures are 43% translated and 3% reviewed).
- Documented procedures and tools that can be reused in other projects.
At the time of this writing, the latest version of Perl 5 is Perl v5.14.2. Its documentation comprises 189 documents, with a total word count of 924,435 words. This translation volume, at a typical freelance translation rate (much cheaper than that of a translation agency), would cost well over 120,000 EUR (not counting tasks like DTP, project management, etc.), and it would take approximately 2 man-years (including revision).
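As a rough sanity check of the figures above, the following sketch derives the per-word rate and the daily throughput they imply. The word count, the 120,000 EUR lower bound and the 2 man-years come from the text; the 220 working days per year is our own assumption, used only for illustration.

```python
# Figures quoted in the text above.
TOTAL_WORDS = 924_435
COST_LOWER_BOUND_EUR = 120_000   # "well over" this amount
EFFORT_YEARS = 2                 # including revision

# Assumption (not from the proposal): working days in one year.
WORKING_DAYS_PER_YEAR = 220

# Rate per word implied by the lower bound of the cost estimate.
implied_rate_eur = COST_LOWER_BOUND_EUR / TOTAL_WORDS

# Words per working day needed to finish in the stated time.
words_per_day = TOTAL_WORDS / (EFFORT_YEARS * WORKING_DAYS_PER_YEAR)

print(f"Implied rate: at least {implied_rate_eur:.3f} EUR per word")
print(f"Implied throughput: about {round(words_per_day)} words per day")
```

Under that assumption, the estimate corresponds to a rate of at least about 0.13 EUR per word and to roughly 2,100 words per working day for translation plus revision.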
Since this is volunteer work, it is not as fast-paced as would be desirable (after 16 months we have translated more than 40% of the total documentation), but it is the best option available while waiting for a real improvement in machine translation technology.
We have used Computer-Assisted Translation (i.e., translation memory) technology since the beginning of the project, with project sustainability and reusability in mind: each time a new Perl version is released, translators update the pod files and only have to work on new or changed strings. This reuse strategy ensures that the translated documentation will closely follow the English Perl documentation as it evolves. We use the Perl version numbering for each release, to state unambiguously which version of the original documents the translated documents correspond to.
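The reuse strategy described above can be illustrated with a minimal sketch. It is not the project's actual tooling: the function names `split_segments` and `apply_memory`, the paragraph-level segmentation and the exact-match lookup are assumptions made for the example (real CAT tools typically segment more finely and also offer fuzzy matches).

```python
def split_segments(pod_text):
    """Split POD text into segments, one per blank-line-separated paragraph."""
    return [p.strip() for p in pod_text.split("\n\n") if p.strip()]


def apply_memory(segments, memory):
    """Reuse stored translations; return (reused, pending).

    reused  -- dict mapping each source segment found in the memory
               to its stored translation
    pending -- list of new or changed segments that still need work
    """
    reused = {}
    pending = []
    for segment in segments:
        if segment in memory:
            reused[segment] = memory[segment]
        else:
            pending.append(segment)
    return reused, pending


# Hypothetical translation memory built from a previous Perl release.
memory = {
    "=head1 NAME": "=head1 NOMBRE",
    "perl - The Perl language interpreter":
        "perl - El intérprete del lenguaje Perl",
}

# Hypothetical document from a newer release: one paragraph has changed.
new_pod = "=head1 NAME\n\nperl - The Perl 5 language interpreter\n"

reused, pending = apply_memory(split_segments(new_pod), memory)
print("Reused:", reused)
print("Pending:", pending)
```

In this sketch the unchanged heading is translated automatically from the memory, and only the modified paragraph is left for the translator.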
After evaluating several tools, we finally decided to use OmegaT, a convenient CAT tool that is actively developed, but we also follow current industry standards (e.g. TMX, the standard translation memory format), so contributing to the project does not require using a particular tool.
We have split the documentation into core documents on one hand, and perldeltas and readmes on the other, to give priority to the most popular documents.
The published POD2::ES distributions only include fully revised documents, which can be viewed using the following command:
perldoc -L ES 'document'
Translated (and unpublished) documents are available in the project's GitHub repository, PerlDoc-ES: https://github.com/zipf/perldoc-es/
Back in 2006, Joaquín Ferrero was involved in a previous effort that was later abandoned, like many other attempts for different languages. During YAPC::EU 2009 in Lisbon, Enrique Nell proposed relaunching the project. The authors of the present grant application met with the goal of launching a translation project of the Perl core documentation, and kept discussing the idea for some time.
The first release (5.12.3.01) of POD2::ES was published on CPAN on February 4, 2011. On July 16 of that same year, we released the first 5.14.1 version of POD2::ES, one month after the release of Perl 5.14.1, after updating the translated documents to the new Perl version. The first 5.14.2 version was released on October 6, 2011, only 10 days after the release of Perl 5.14.2. For each version upgrade, we were able to easily reuse the work done for previous versions.
Current status: 42% translated. The statistics are available in the following public Google Docs spreadsheet: PerlDoc-ES.Traducción
As long as new Perl versions are released, the project will be alive. For each new Perl version, the corresponding translation percentage will be higher (after the update process, of course).
Reaching the percentages mentioned above will take ~6 months (rough estimate).
Currently, we are 12 to 24 months behind the source (English) documentation, but two new members joined the team recently and we expect to increase the speed in the coming months.
Enrique Nell (aka zipf, aka @blasgordon) has a degree in Physics from Universidad Autónoma de Madrid, but has been working in the software localization industry since 1994. His main interests are natural language processing, data mining and statistics. He has contributed several modules to CPAN and regularly attends Perl events. Enrique translated Act into Spanish and he is the current maintainer of the Spanish translations of Padre and Kephra. He also contributed to Google Code-in 2011 as a mentor for translation tasks issued by The Perl Foundation.
Joaquín Ferrero (aka explorer) studied Software Engineering at Universidad de Valladolid. He has been using Perl since 2003, while working as a programmer in companies and public organizations. During these years he has reported bugs in several CPAN modules. He regularly attends Madrid.pm meetings. Since 2005, Joaquín has been the main moderator of the PerlenEspanol.com website, a forum that provides support to the worldwide Spanish-speaking Perl community. Back in 2006 he took part in the second attempt to translate the Perl documentation into Spanish (the perlspanish project hosted on SourceForge, now abandoned). During YAPC::EU 2009, Joaquín joined Enrique Nell's BOF to kick off a new PerlDoc-ES project.
Manuel Gómez received his MS degree in Computer Science in 1991 from Universidad Politécnica de Madrid. After over 10 years of professional experience in Research and Development departments, he received a PhD in Computer Science in 2002 from Universidad Politécnica de Madrid. He is now an Associate Professor of Computer Science at Universidad de Granada. His research interests are probabilistic graphical models and decision analysis. Some of the journals where he has published his research papers are Computers and Operations Research, European Journal of Operational Research, Statistics & Computing, International Journal of Approximate Reasoning, Medical Decision Making, Decision Support Systems and Omega. His teaching interests include Programming Fundamentals, Simulation Systems, Data Mining and Bayesian Networks inference algorithms.