Statistical Machine Translation
DOWNLOAD
Download Statistical Machine Translation PDF/ePub or read online books in Mobi eBooks. Click Download or Read Online button to get Statistical Machine Translation book now. This website allows unlimited access to, at the time of writing, more than 1.5 million titles, including hundreds of thousands of titles in various foreign languages. If the content not found or just blank you must refresh this page
Statistical Machine Translation
DOWNLOAD
Author : Philipp Koehn
language : en
Publisher: Cambridge University Press
Release Date : 2009-12-17
Statistical Machine Translation written by Philipp Koehn and has been published by Cambridge University Press this book supported file pdf, txt, epub, kindle and other format this book has been release on 2009-12-17 with Computers categories.
The dream of automatic language translation is now closer thanks to recent advances in the techniques that underpin statistical machine translation. This class-tested textbook from an active researcher in the field, provides a clear and careful introduction to the latest methods and explains how to build machine translation systems for any two languages. It introduces the subject's building blocks from linguistics and probability, then covers the major models for machine translation: word-based, phrase-based, and tree-based, as well as machine translation evaluation, language modeling, discriminative training and advanced methods to integrate linguistic annotation. The book also reports the latest research, presents the major outstanding challenges, and enables novices as well as experienced researchers to make novel contributions to this exciting area. Ideal for students at undergraduate and graduate level, or for anyone interested in the latest developments in machine translation.
Syntax Based Statistical Machine Translation
DOWNLOAD
Author : Philip Williams
language : en
Publisher: Morgan & Claypool Publishers
Release Date : 2016-08-01
Syntax Based Statistical Machine Translation written by Philip Williams and has been published by Morgan & Claypool Publishers this book supported file pdf, txt, epub, kindle and other format this book has been release on 2016-08-01 with Computers categories.
This unique book provides a comprehensive introduction to the most popular syntax-based statistical machine translation models, filling a gap in the current literature for researchers and developers in human language technologies. While phrase-based models have previously dominated the field, syntax-based approaches have proved a popular alternative, as they elegantly solve many of the shortcomings of phrase-based models. The heart of this book is a detailed introduction to decoding for syntax-based models. The book begins with an overview of synchronous-context free grammar (SCFG) and synchronous tree-substitution grammar (STSG) along with their associated statistical models. It also describes how three popular instantiations (Hiero, SAMT, and GHKM) are learned from parallel corpora. It introduces and details hypergraphs and associated general algorithms, as well as algorithms for decoding with both tree and string input. Special attention is given to efficiency, including search approximations such as beam search and cube pruning, data structures, and parsing algorithms. The book consistently highlights the strengths (and limitations) of syntax-based approaches, including their ability to generalize phrase-based translation units, their modeling of specific linguistic phenomena, and their function of structuring the search space.
Linguistically Motivated Statistical Machine Translation
DOWNLOAD
Author : Deyi Xiong
language : en
Publisher: Springer
Release Date : 2015-02-11
Linguistically Motivated Statistical Machine Translation written by Deyi Xiong and has been published by Springer this book supported file pdf, txt, epub, kindle and other format this book has been release on 2015-02-11 with Language Arts & Disciplines categories.
This book provides a wide variety of algorithms and models to integrate linguistic knowledge into Statistical Machine Translation (SMT). It helps advance conventional SMT to linguistically motivated SMT by enhancing the following three essential components: translation, reordering and bracketing models. It also serves the purpose of promoting the in-depth study of the impacts of linguistic knowledge on machine translation. Finally it provides a systematic introduction of Bracketing Transduction Grammar (BTG) based SMT, one of the state-of-the-art SMT formalisms, as well as a case study of linguistically motivated SMT on a BTG-based platform.
Improving Statistical Machine Translation Through Adaptation And Learning
DOWNLOAD
Author : Carlos A. Henriquez Q.
language : en
Publisher:
Release Date : 2014
Improving Statistical Machine Translation Through Adaptation And Learning written by Carlos A. Henriquez Q. and has been published by this book supported file pdf, txt, epub, kindle and other format this book has been release on 2014 with categories.
With the arrival of free on-line machine translation (MT) systems, came the possibility to improve automatic translations with the help of daily users. One of the methods to achieve such improvements is to ask to users themselves for a better translation. It is possible that the system had made a mistake and if the user is able to detect it, it would be a valuable help to let the user teach the system where it made the mistake so it does not make it again if it finds a similar situation. Most of the translation systems you can find on-line provide a text area for users to suggest a better translation (like Google translator) or a ranking system for them to use (like Microsoft's). In 2009, as part of the Seventh Framework Programme of the European Commission, the FAUST project started with the goal of developing "machine translation (MT) systems which respond rapidly and intelligently to user feedback". Specifically, one of the project objective was to "develop mechanisms for instantaneously incorporating user feedback into the MT engines that are used in production environments, ...". As a member of the FAUST project, this thesis focused on developing one such mechanism. Formally, the general objective of this work was to design and implement a strategy to improve the translation quality of an already trained Statistical Machine Translation (SMT) system, using translations of input sentences that are corrections of the system's attempt to translate them. To address this problem we divided it in three specific objectives: 1. Define a relation between the words of a correction sentence and the words in the system's translation, in order to detect the errors that the former is aiming to solve. 2. Include the error corrections in the original system, so it learns how to solve them in case a similar situation occurs. 3. Test the strategy in different scenarios and with different data, in order to validate the applications of the proposed methodology. The main contributions made to the SMT field that can be found in this Ph.D. thesis are: - We defined a similarity function that compares an MT system output with a translation reference for that output and align the errors made by the system with the correct translations found in the reference. This information is then used to compute an alignment between the original input sentence and the reference. - We defined a method to perform domain adaptation based on the alignment mentioned before. Using this alignment with an in-domain parallel corpus, we extract new translation units that correspond both to units found in the system and were correctly chosen during translation and new units that include the correct translations found in the reference. These new units are then scored and combined with the units in the original system in order to improve its quality in terms of both human an automatic metrics. - We succesfully applied the method in a new task: to improve a SMT translation quality using post-editions provided by real users of the system. In this case, the alignment was computed over a parallel corpus build with post-editions, extracting translation units that correspond both to units found in the system and were correctly chosen during translation and new units that include the corrections found in the feedback provided. - The method proposed in this dissertation is able to achieve significant improvements in translation quality with a small learning material, corresponding to a 0.5% of the training material used to build the original system. Results from our evaluations also indicate that the improvement achieved with the domain adaptation strategy is measurable by both automatic a human-based evaluation metrics.
Statistical Machine Translation
DOWNLOAD
Author : Philipp Koehn
language : en
Publisher: Cambridge University Press
Release Date : 2010
Statistical Machine Translation written by Philipp Koehn and has been published by Cambridge University Press this book supported file pdf, txt, epub, kindle and other format this book has been release on 2010 with Computers categories.
The dream of automatic language translation is now closer thanks to recent advances in the techniques that underpin statistical machine translation. This class-tested textbook from an active researcher in the field, provides a clear and careful introduction to the latest methods and explains how to build machine translation systems for any two languages. It introduces the subject's building blocks from linguistics and probability, then covers the major models for machine translation: word-based, phrase-based, and tree-based, as well as machine translation evaluation, language modeling, discriminative training and advanced methods to integrate linguistic annotation. The book also reports the latest research, presents the major outstanding challenges, and enables novices as well as experienced researchers to make novel contributions to this exciting area. Ideal for students at undergraduate and graduate level, or for anyone interested in the latest developments in machine translation.
Data Selection For Statistical Machine Translation
DOWNLOAD
Author : Amittai Axelrod
language : en
Publisher:
Release Date : 2014
Data Selection For Statistical Machine Translation written by Amittai Axelrod and has been published by this book supported file pdf, txt, epub, kindle and other format this book has been release on 2014 with categories.
Machine translation, the computerized translation of one human language to another, could be used to communicate between the thousands of languages used around the world. Statistical machine translation (SMT) is an approach to building these translation engines without much human intervention, and large-scale implementations by Google, Microsoft, and Facebook in their products are used by millions daily. The quality of SMT systems depends on the example translations used to train the models. Data can come from a variety of sources, many of which are not optimal for common specific tasks. The goal is to be able to find the right data to use to train a model for a particular task. This work determines the most relevant subsets of these large datasets with respect to a translation task, enabling the construction of task-specific translation systems that are more accurate and easier to train than the large-scale models. Three methods are explored for identifying task-relevant translation training data from a general data pool. The first uses only a language model to score the training data according to lexical probabilities, improving on prior results by using a bilingual score that accounts for differences between the target domain and the general data. The second is a topic-based relevance score that is novel for SMT, using topic models to project texts into a latent semantic space. These semantic vectors are then used to compute similarity of sentences in the general pool to the target task. This work finds that what the automatic topic models capture for some tasks is actually the style of the language, rather than task-specific content words. This motivates the third approach, a novel style-based data selection method. Hybrid word and part-of-speech (POS) representations of the two corpora are constructed by retaining the discriminative words and using POS tags as a proxy for the stylistic content of the infrequent words. Language models based on these representations can be used to quantify the underlying stylistic relevance between two texts. Experiments show that style-based data selection can outperform the current state-of-the-art method for task-specific data selection, in terms of SMT system performance and vocabulary coverage. Taken together, the experimental results indicate that it is important to characterize corpus differences when selecting data for statistical machine translation.
Use Of Source Language Context In Statistical Machine Translation
DOWNLOAD
Author : Rejwanul Haque
language : en
Publisher: LAP Lambert Academic Publishing
Release Date : 2012-02
Use Of Source Language Context In Statistical Machine Translation written by Rejwanul Haque and has been published by LAP Lambert Academic Publishing this book supported file pdf, txt, epub, kindle and other format this book has been release on 2012-02 with categories.
The translation features typically used in state-of-the-art statistical machine translation (SMT) model dependencies between the source and target phrases, but not among the phrases in the source language themselves. A swathe of research has demonstrated that integrating source context modelling directly into log-linear phrase-based SMT (PB-SMT) and hierarchical PB-SMT (HPB-SMT), and can positively influence the weighting and selection of target phrases, and thus improve translation quality. In this book we present novel approaches to incorporate source-language contextual modelling into the state-of-the-art SMT models in order to enhance the quality of lexical selection. We investigate the effectiveness of use of a range of contextual features, including lexical features of neighbouring words, part-of-speech tags, supertags, sentence-similarity features, dependency information, and semantic roles. We explored a series of language pairs featuring typologically different languages, and examined the scalability of our research to larger amounts of training data.
Using Linguistic Knowledge In Statistical Machine Translation
DOWNLOAD
Author : Rabih Mohamed Zbib
language : en
Publisher:
Release Date : 2010
Using Linguistic Knowledge In Statistical Machine Translation written by Rabih Mohamed Zbib and has been published by this book supported file pdf, txt, epub, kindle and other format this book has been release on 2010 with categories.
In this thesis, we present methods for using linguistically motivated information to enhance the performance of statistical machine translation (SMT). One of the advantages of the statistical approach to machine translation is that it is largely language-agnostic. Machine learning models are used to automatically learn translation patterns from data. SMT can, however, be improved by using linguistic knowledge to address specific areas of the translation process, where translations would be hard to learn fully automatically. We present methods that use linguistic knowledge at various levels to improve statistical machine translation, focusing on Arabic-English translation as a case study. In the first part, morphological information is used to preprocess the Arabic text for Arabic-to-English and English-to-Arabic translation, which reduces the gap in the complexity of the morphology between Arabic and English. The second method addresses the issue of long-distance reordering in translation to account for the difference in the syntax of the two languages. In the third part, we show how additional local context information on the source side is incorporated, which helps reduce lexical ambiguity. Two methods are proposed for using binary decision trees to control the amount of context information introduced. These methods are successfully applied to the use of diacritized Arabic source in Arabic-to-English translation. The final method combines the outputs of an SMT system and a Rule-based MT (RBMT) system, taking advantage of the flexibility of the statistical approach and the rich linguistic knowledge embedded in the rule-based MT system.
Machine Translation With Minimal Reliance On Parallel Resources
DOWNLOAD
Author : George Tambouratzis
language : en
Publisher: Springer
Release Date : 2017-08-09
Machine Translation With Minimal Reliance On Parallel Resources written by George Tambouratzis and has been published by Springer this book supported file pdf, txt, epub, kindle and other format this book has been release on 2017-08-09 with Computers categories.
This book provides a unified view on a new methodology for Machine Translation (MT). This methodology extracts information from widely available resources (extensive monolingual corpora) while only assuming the existence of a very limited parallel corpus, thus having a unique starting point to Statistical Machine Translation (SMT). In this book, a detailed presentation of the methodology principles and system architecture is followed by a series of experiments, where the proposed system is compared to other MT systems using a set of established metrics including BLEU, NIST, Meteor and TER. Additionally, a free-to-use code is available, that allows the creation of new MT systems. The volume is addressed to both language professionals and researchers. Prerequisites for the readers are very limited and include a basic understanding of the machine translation as well as of the basic tools of natural language processing.
Creating A Strong Statistical Machine Translation System By Combining Different Decoders
DOWNLOAD
Author : Ayah ElMaghraby
language : en
Publisher:
Release Date : 2018
Creating A Strong Statistical Machine Translation System By Combining Different Decoders written by Ayah ElMaghraby and has been published by this book supported file pdf, txt, epub, kindle and other format this book has been release on 2018 with Neural networks (Computer science) categories.
Abstract: Machine translation is a very important field in Natural Language Processing. The need for machine translation arises due to the increasing amount of data available online. Most of our data now is digital and this is expected to increase over time. Since human manual translation takes a lot of time and effort, machine translation is needed to cover all of the languages available. A lot of research has been done to make machine translation faster and more reliable between different language pairs. Machine translation is now being coupled with deep learning and neural networks. New topics in machine translation are being studied and tested like applying neural machine translation as a replacement to the classical statistical machine translation. In this thesis, we also study the effect of data-preprocessing and decoder type on translation output. We then demonstrate two ways to enhance translation from English to Arabic. The first approach uses a two-decoder system; the first decoder translates from English to Arabic and the second is a post-processing decoder that retranslates the first Arabic output to Arabic again to fix some of the translation errors. We then study the results of different kinds of decoders and their contributions to the test set. The results of this study lead to the second approach which combines different decoders to create a stronger one. The second approach uses a classifier to categorize the English sentences based on their structure. The output of the classifier is the decoder that is suited best to translate the English sentence. Both approaches increased the BLEU score albeit with different ranges. The classifier showed an increase of ~0.1 BLEU points while the post-processing decoder showed an increase of between ~0.3~11 BLEU points on two different test sets. Eventually we compare our results to Google translate to know how well we are doing in comparison to a well-known translator. Our best translation machine system scored 5 absolute points compared to Google translate in ISI corpus test set and we were 9 absolute points lower in the case of the UN corpus test set..