|
Two Paradigms in Language Modeling o Language Understanding |
|
1985-1995 o Rule-based models were fragile - not readily extensible to new data |
|
Corpus Development o Language and speech corpora |
|
Corpus Attributes o Data - representative of the language being modeled, usually the standard
language of the native speaker (NS) |
|
Non-native speaker (NNS) corpora o Begun in 1992 |
|
Gaps in NNS Corpus Creation o No NNS corpus in America, so no corpus of English as a Second Language
(ESL) |
|
MELD Goals o Initial Goals |
|
MELD Overview o Data |
|
Annotation o Annotators "reconstruct" a grammatical form, e.g. "school systems {is/are}" o Agreement between annotators is an issue |
|
Error Classification from a Predetermined List o Benefit |
|
Error Identification & Reconstruction o Benefits |
|
Agreement Measures o Recall: How well does the performance of the "non-expert"
match that of the "expert"? (What did the non-expert miss?) |
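As a minimal sketch of the recall measure described above, assume each annotator's output for one text is reduced to a set of token positions tagged as errors. That representation, the function names, and the sample positions are illustrative assumptions, not part of the MELD annotation scheme.

```python
def recall(expert_errors, non_expert_errors):
    """Fraction of the expert's error tags that the non-expert also produced."""
    expert = set(expert_errors)
    if not expert:
        # Assumption: with no expert tags there is nothing to miss.
        return 1.0
    found = expert & set(non_expert_errors)
    return len(found) / len(expert)


def missed(expert_errors, non_expert_errors):
    """Error positions the expert tagged but the non-expert did not."""
    return sorted(set(expert_errors) - set(non_expert_errors))


# Hypothetical token positions tagged as errors in a single essay.
expert = {3, 7, 12, 20}
non_expert = {3, 12, 15}

print(recall(expert, non_expert))   # 0.5
print(missed(expert, non_expert))   # [7, 20]
```

Here the non-expert found two of the expert's four errors, so recall is 0.5, and positions 7 and 20 are what the non-expert missed. The extra tag at position 15 does not affect recall.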
|
Conclusions on Tagging Agreement o Unsatisfactory level of agreement as to what is an error |
|
The Future o Immediate |
|
Acknowledgments Jacqueline Cassidy |