The Montclair Electronic Language Learner Database (MELD)

 

Two Paradigms in Language Modeling
1960-1985

o Language Understanding
— based on deduction of models from grammaticality judgments
— theory-driven, rule-based
o Speech Recognition
— based on induction of models from performance data
— empirically-driven, statistically-based

1985-1995
Paradigm Shift in Language Understanding

o Rule-based models were fragile: not readily extensible to new data
o Statistical models were more robust
o Data storage capacity increased
o Electronic texts became available
o Awareness of writing and speaking as performance tasks developed
o Data began to be collected into statistically representative corpora

Corpus Development

o Language and speech corpora
— Text: Birmingham, LOB, BNC, . . .
— Speech: TIMIT, Resource Management, ATIS, Broadcast News, . . .
o Data collection agencies
— ECI, LDC
o Standardization efforts
— TEI, LDC
o Automatic annotation tools
— part-of-speech taggers, parsers, markup

Corpus Attributes

o Data: representative of the language being modeled, usually the standard language of the native speaker (NS)
o Size: up to 100 million words for a balanced corpus
o Annotation: based on tagging a representative subset of the corpus by hand
— part of speech (POS), syntactic structure, discourse markup, speech segments, prosodic features

Non-native speaker (NNS) corpora

o Begun in 1992
o Data
— written performance only
— essays of students of English as a foreign language
o Corpus development (academic)
— in Europe: Louvain, Lodz, Uppsala
— in Asia: Tokyo Gakugei University, Hong Kong University of Science and Technology (HKUST)
o Annotation
— Lodz: part of speech
— HKUST, Lodz: error tags

Gaps in NNS Corpus Creation

o No NNS corpus in America, and therefore no corpus of English as a Second Language (ESL)
o No NNS corpus is publicly available
o No NNS corpus annotates errors without a predetermined list of error types

MELD Goals

o Initial Goals
— Publicly available NNS data
— ESL student writing
— tagged for error
o These goals support
— 2nd language pedagogy
— Language acquisition research
— tool building (grammar checkers, student editing aids, parallel texts from NS and NNS)

MELD Overview

o Data
— 44,477 words of annotated text
— 53,826 additional words of raw (unannotated) data
— language and education background data for each student author
— upper-level ESL students
o Tools written to
— link essays to student background data
— produce an error-free version from tagged text
— allow fast entry of background data
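The first of these tools links each essay to its author's background data. The Python sketch below shows one way such a link could work, plus one use for it: error frequency per 100 words, grouped by a background attribute. It is a minimal sketch under stated assumptions, not MELD's tool: the record layouts, the field names (student_id, native_language, education), and the example data are hypothetical, and errors are counted as {error/reconstruction} tags of the kind described under Annotation.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List

# One annotation tag, {error/reconstruction}; assumes tags are not nested.
TAG = re.compile(r"\{([^{}/]*)/([^{}/]*)\}")


@dataclass
class Student:  # hypothetical background record
    student_id: str
    native_language: str
    education: str


@dataclass
class Essay:  # hypothetical essay record, linked to its author by student_id
    student_id: str
    tagged_text: str


def learner_word_count(text: str) -> int:
    """Count the learner's own words: each tag contributes its error side ("0" = nothing)."""
    def error_side(m: re.Match) -> str:
        side = m.group(1).strip()
        return "" if side == "0" else side
    return len(TAG.sub(error_side, text).split())


def errors_per_100_words(essays: List[Essay], students: List[Student],
                         attribute: str) -> Dict[str, float]:
    """Group essays by one background attribute of their author; return errors per 100 words."""
    authors = {s.student_id: s for s in students}  # the essay -> background link
    errors: Dict[str, int] = defaultdict(int)
    words: Dict[str, int] = defaultdict(int)
    for essay in essays:
        group = getattr(authors[essay.student_id], attribute)
        errors[group] += len(TAG.findall(essay.tagged_text))
        words[group] += learner_word_count(essay.tagged_text)
    return {g: 100 * errors[g] / words[g] for g in errors if words[g]}


# Hypothetical example data.
students = [Student("s01", "Spanish", "high school"),
            Student("s02", "Korean", "university")]
essays = [Essay("s01", "school systems {is/are} crowded"),
          Essay("s02", "they {is/are} becoming {a/0} good citizens")]
print(errors_per_100_words(essays, students, "native_language"))
```

Grouping by "education" instead of "native_language" needs no other change; a meaningful comparison would of course need many essays per group.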

Annotation

o Annotators "reconstruct" a grammatical form, written as
{error/reconstruction}; 0 marks an empty slot (a missing or an extraneous word)

school systems {is/are}
since children {0/are} usually inspired
becoming {a/0} good citizens

o Agreement between annotators is an issue
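Because every tag carries both sides, the error-free version listed among MELD's tools can be derived mechanically, and so can the learner's original wording. A minimal Python sketch, assuming the {error/reconstruction} syntax shown above, no nested tags, no "/" or braces inside a side, and 0 for an empty slot; the function names are hypothetical, not MELD's actual tool.

```python
import re

# One annotation tag, {error/reconstruction}. Assumes tags are not nested
# and that neither side contains "{", "}" or "/".
TAG = re.compile(r"\{([^{}/]*)/([^{}/]*)\}")
EMPTY = "0"  # an empty slot: a word that is missing or extraneous


def _choose_side(text: str, side: int) -> str:
    """Replace every tag with one side: 1 = learner's error, 2 = reconstruction."""
    def pick(m: re.Match) -> str:
        word = m.group(side).strip()
        return "" if word == EMPTY else word
    # Dropping an empty slot leaves a double space; normalizing whitespace
    # fixes that (and also joins lines, so apply it to one sentence or line).
    return " ".join(TAG.sub(pick, text).split())


def reconstructed(text: str) -> str:
    """Error-free version: keep the annotator's reconstruction."""
    return _choose_side(text, 2)


def original(text: str) -> str:
    """The learner's original wording: keep the error side."""
    return _choose_side(text, 1)


for tagged in ["school systems {is/are}",
               "since children {0/are} usually inspired",
               "becoming {a/0} good citizens"]:
    print(original(tagged), "->", reconstructed(tagged))
```

For the three examples this prints "school systems is -> school systems are", "since children usually inspired -> since children are usually inspired", and "becoming a good citizens -> becoming good citizens".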

Error Classification from a Predetermined List

o Benefit
— annotators agree on what an error is: only those items in the classification scheme
o Problems
— annotators have to learn a classification scheme
— having a classification scheme means annotators can misclassify errors
— errors not in the scheme will be missed

Error Identification & Reconstruction

o Benefits
— faster annotation, since there is no classification scheme to learn
— no chance of misclassifying
— less common errors will be captured
— a reconstructed text can be more easily parsed and tagged for part of speech
o Question
— How well can we agree on what is an error?

Agreement Measures

o Recall: How well does the performance of the "non-expert" match that of the "expert"? (What did the non-expert miss?)
o Precision: What percentage of the "non-expert's" tags are accurate?
o Reliability: What percentage of the errors do both taggers tag?
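The three measures can be computed from the sets of errors each annotator tagged. The Python sketch below is one reading, not the study's procedure: it identifies each tag by a hypothetical (essay, token position) key, counts two tags as agreeing only when their keys are equal, and takes "the errors" in the reliability question to mean every error tagged by either annotator (intersection over union). How tags were matched and counted for the reported figures is not stated, so this reading need not reproduce them.

```python
from typing import Dict, Hashable, Set


def agreement(expert: Set[Hashable], non_expert: Set[Hashable]) -> Dict[str, float]:
    """Recall, precision and reliability of a non-expert's error tags against an expert's.

    Each tag is a hashable key, e.g. a hypothetical (essay_id, token_index) pair;
    two tags agree only when their keys are equal.
    """
    both = expert & non_expert     # errors tagged by both annotators
    either = expert | non_expert   # errors tagged by at least one annotator
    return {
        # Share of the expert's tags the non-expert also made (what was missed).
        "recall": len(both) / len(expert) if expert else 0.0,
        # Share of the non-expert's tags that match an expert tag.
        "precision": len(both) / len(non_expert) if non_expert else 0.0,
        # Share of all tagged errors that both annotators tagged.
        "reliability": len(both) / len(either) if either else 0.0,
    }


expert = {(1, 4), (1, 9), (2, 3), (2, 17)}
non_expert = {(1, 4), (2, 3), (2, 8)}
print(agreement(expert, non_expert))  # recall 0.5, precision about 0.67, reliability 0.4
```

Under these definitions reliability can never exceed the smaller of recall and precision, because the union is at least as large as either annotator's own set.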

Agreement Measures by Annotator Pair

Pair   Essays   Recall   Precision   Reliability
J&L    1-10     .54      .58         .39
J&L    11-22    .57      .78         .49
J&N    1-10     .58      .48         .23
J&N    11-22    .37      .54         .27
L&N    1-10     .65      .70         .37
L&N    11-22    .60      .78         .36

Conclusions on Tagging Agreement

o Agreement on what counts as an error is unsatisfactory
o Disagreements are now resolved through regular meetings
o Tagging criteria (though not a list of error types) are worked out at these meetings
o Planned: a test of whether accuracy improves when taggers are shown partially cleaned text

The Future

o Immediate
— Internet access to data and tools
— an error concordancer
— statistical tool to correlate error frequency with student background
— automatic part of speech and syntactic markup
— data from different ESL skill levels
o Long Range
— student editing aid
— grammar checker
— NNS speech data
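An error concordancer would list each tagged error in its context, keyword-in-context (KWIC) style. A minimal Python sketch, assuming the {error/reconstruction} tag syntax from the Annotation slide, no nested tags, and a query on either side of the tag; the function name, its parameters, and the output layout are hypothetical.

```python
import re
from typing import Iterable, Iterator, Optional

# One annotation tag, {error/reconstruction}; assumes tags are not nested.
TAG = re.compile(r"\{([^{}/]*)/([^{}/]*)\}")


def concordance(texts: Iterable[str], width: int = 30,
                error: Optional[str] = None,
                reconstruction: Optional[str] = None) -> Iterator[str]:
    """Yield one keyword-in-context line per tag matching the query.

    A query side left as None matches anything; "0" matches an empty slot.
    Context is raw tagged text, so neighbouring tags appear as written.
    """
    for text in texts:
        flat = " ".join(text.split())  # put each essay on a single line
        for m in TAG.finditer(flat):
            err, rec = m.group(1).strip(), m.group(2).strip()
            if error is not None and err != error:
                continue
            if reconstruction is not None and rec != reconstruction:
                continue
            left = flat[max(0, m.start() - width):m.start()].strip()
            right = flat[m.end():m.end() + width].strip()
            yield f"{left:>{width}} [{err}/{rec}] {right}"


essays = ["Today school systems {is/are} crowded and children {0/are} tired.",
          "Students are becoming {a/0} good citizens."]
for line in concordance(essays, width=20, reconstruction="are"):
    print(line)
```

Querying reconstruction="0" lists every extraneous word the annotators removed, and error="0" every word they supplied as missing.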

Acknowledgments

Jacqueline Cassidy
Jennifer Higgins
Norma Pravec
Lenore Rosenbluth
Donna Samko
Jory Samkoff