E-Learning of Second Language Speaking Skills
Faculty of Education and Social Work,
The continual growth of information and communication technologies (ICT) has greatly facilitated online learning. Second language (L2) learners can easily access vast amounts of relevant online resources, both free and commercial. Within formal training programs, online second language teachers strive to develop and improve their students’ comprehensive skills in the target language. However, not all language skills can be taught online easily. When teaching languages at a distance, one of the main challenges is the development and practice of speaking skills (Hampel, 2003). On the other hand, learners often feel more confident and can take more risks when they practise speaking using computers in a private workspace than in a face-to-face setting such as a real classroom or a real-life communication situation (Gong, 2002; Kataoka, 2000). The gap between what online oral second language teaching affords and what learners demand still needs to be bridged.
Hence, under the mechanical-meaningful-communicative framework (Paulston, 1971a, 1971b), this paper reviews recent studies on online L2 speaking instruction.
The theoretical framework adopted in this study is based on a classification of language learning drills, namely the structural pattern drills for language teaching proposed by Paulston (1971a, 1971b): mechanical drills, meaningful drills and communicative drills. This framework helps language teachers organize their instruction according to different grades, stages and periods with corresponding objectives. For beginners, teachers are advised to use mechanical drills, in which there is complete control of the response and only one correct way of responding. The ability to perform mechanical drills without necessarily understanding them is an important criterion in distinguishing them from meaningful drills. In a meaningful drill, there is still control of the response, although it may be correctly expressed in more than one way. The teacher always knows what the student ought to answer. The main difference between a meaningful drill and a communicative drill is that in the latter the speaker adds new information about the real world. The expected terminal behavior in communicative drills is normal speech for communication or, if one prefers, the free transfer of learned language patterns to appropriate situations.
For the purposes of computer-assisted language learning (CALL), Pennington (1989, 1996) further characterizes spoken language competence in terms of a “mechanical aspect” and a “meaningful aspect”. The mechanical aspect of speech involves learning to discriminate and produce the sounds of a language (the articulation and decoding of individual sounds, or phonemes) and to tie these together prosodically into fluent strings of sounds comprising syllables, words, phrases and longer utterances. The meaningful aspect involves learning to build as well as to decompose grammatically coherent utterances and to tie these to communicative functions according to rules of pragmatic appropriateness in a given speech community.
Pennington’s framework focuses on speech itself, regardless of the sequencing structure of learning or teaching a language. The mechanical aspect and the meaningful aspect can be used separately to explain learners’ speaking levels. For example, some L2 learners may know how to communicate appropriately, but their pronunciation or fluency may be awkward. However, one characteristic of Paulston’s communicative drills, namely expecting speakers to add new information from their real world, is still worthwhile when reexamining online L2 speaking instruction. This is because Pennington’s framework “pays too little attention to the Internet” (Kisner, 1997, p. 13), whereas the development of information and communication technologies has increased the affordance of the online environment for learner-teacher and learner-learner communication similar to the real world. Hence, in this study, I add a “communicative aspect” to Pennington’s framework, which helps further articulate the recent empirical studies on online L2 speaking instruction.
Recent Studies on L2 Speaking Online Instruction
Pronunciation has dominated the mechanical aspect of computer-assisted L2 speaking instruction. Pennington (1995) reports that, in the last century, L2 learners mainly practiced speaking with multimedia products incorporating extensive texts, graphics, animation, audio, and digitized audio or video clips. With some software, computers can produce relatively natural speech from individual phonemes stored as digital codes that are strung together by rule as the user types on the keyboard. This ‘synthesis-by-rule’ technology has the advantage that it can convert any text to speech, thus enabling learners to gain exposure to a diversity and quantity of input. In such a mode, however, the computer keeps producing output but does not “listen” to the learners.
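The ‘synthesis-by-rule’ idea above can be illustrated with a minimal sketch: each letter or digraph maps to a stored phoneme code, and the codes are strung together by rule as the user types. The grapheme-to-phoneme table and function names here are invented for illustration and do not reflect any actual product’s data or implementation.

```python
# Toy grapheme-to-phoneme rules (two-letter digraphs tried first).
# The phoneme codes are illustrative placeholders.
G2P_RULES = {
    "ch": "tS", "sh": "S", "th": "T",
    "a": "{", "e": "E", "i": "I", "o": "Q", "u": "V",
    "b": "b", "c": "k", "d": "d", "f": "f", "g": "g",
    "h": "h", "j": "dZ", "k": "k", "l": "l", "m": "m",
    "n": "n", "p": "p", "r": "r", "s": "s", "t": "t",
    "v": "v", "w": "w", "y": "j", "z": "z",
}

def text_to_phonemes(text: str) -> list[str]:
    """Convert any typed text to a string of phoneme codes by rule."""
    phonemes = []
    for word in text.lower().split():
        i = 0
        while i < len(word):
            # Prefer two-letter rules (digraphs) over single letters.
            if word[i:i + 2] in G2P_RULES:
                phonemes.append(G2P_RULES[word[i:i + 2]])
                i += 2
            elif word[i] in G2P_RULES:
                phonemes.append(G2P_RULES[word[i]])
                i += 1
            else:
                i += 1  # skip characters with no rule
        phonemes.append("_")  # word boundary / pause
    return phonemes

print(text_to_phonemes("teach the chat"))
```

In a real synthesizer the resulting codes would index stored digital sound units that are concatenated into audible speech; the sketch stops at the rule-based conversion step, which is what gives the technique its ability to handle arbitrary text.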
Later, in the 1990s, automatic speech recognition (ASR) was developed to the point where it could be used in language learning applications, and students began to be able to “talk” with their computers. While the American company Syracuse and the French company Auralog both employed this technology to design software for computer-assisted pronunciation training (CAPT), Ordinate Corporation used ASR to evaluate students’ spoken English by means of its 10-minute PhonePass test, administered by computer over the telephone. However, the product was later shown to fail to reflect students’ real pronunciation level (Hincks, 2001).
On the other hand, speech-recognition-based language learning programs have also been evaluated. Hincks (2002) investigated whether such a program would improve the general goodness of pronunciation. Eleven students were given a copy of the program Talk to Me by Auralog as a supplement to a 200-hour course in Technical English and were encouraged to practice on their home computers. The result was that such pronunciation training using ASR-based language learning software did not demonstrably improve the mean pronunciation abilities of the students. However, results from the PhonePass test indicated that use of the program was beneficial for students who began the course with an ‘intrusive’ foreign accent.
Notably, the same researcher (Hincks, 2001, 2002) used a tool already shown to be questionable, PhonePass, to implement the pre- and post-tests, and then concluded that Talk to Me does not work for intermediate students. This conclusion is therefore itself open to question.
Although using ASR for evaluation carries a risk of inaccuracy, its evaluation function has continued to be developed. This time, however, evaluation and instruction have been integrated, with evaluation serving to assist instruction. MyET is an example of such development. It is a web-based program employing an automatic speech analysis system (ASAS) to identify the words spoken into the recording device, and it can analyze the speech for pronunciation, pitch, timing and emphasis. It then displays the spectrum and contour of the user’s utterance and provides a scoring mechanism with corrective feedback that helps users improve their pronunciation. MyET can explicitly pinpoint learners’ pronunciation errors by giving one-on-one feedback that compares the learner’s pronunciation with a model pronunciation (L-Labs, 2007).
Chen’s (2004) study of college students who used MyET found significant positive correlations between machine scores and human graders’ ratings. He suggested that subjects with different levels of language proficiency be invited to further test the scoring validity of MyET. Tsai (2006) accepted the suggestion and continued the study, finding that MyET can only distinguish between beginning and higher-level learners; little difference was found between the scores of intermediate and advanced learners. This conclusion is similar to Hincks’s (2002) finding on Talk to Me. To address this problem, Tell Me More’s “individual package”, a later edition of Talk to Me, now provides a three-level solution (beginner, intermediate and advanced), each level allowing learners to alter the various elements of the program to match their individual levels closely (Auralog, 2007).
ASR-based CAPT systems are now widely developed by institutes around the world, but in common their typical functions can be described as a sequence of five phases: speech recognition, scoring, error detection, error diagnosis and feedback presentation (Neri, Cucchiarini, & Strik, 2003). The first two, however, have drawn much criticism. Speech recognition accuracy is good only for native speakers (about 90%) and much lower for non-native speakers, so its application in L2 learning environments remains questionable (Coniam, 1999; Derwing, Munro, & Carbonaro, 2000), especially since such a mechanical drill demands high accuracy, as it supposes only one correct response (Paulston, 1971a, 1971b). The scoring system is based on a comparison between native speakers’ and learners’ utterances, but two utterances with the same content may both be very well pronounced and still have very different waveforms. The scores are therefore often confusing, especially when the results indicate that advanced learners receive even lower scores than lower-level learners (Reesner, 2002; Tsai, 2006). Doubts have thus been raised about the pedagogical value of these types of displays (Mackey & Choi, 1998; Neri, Cucchiarini, Strik, & Boves, 2002; Wildner, 2002).
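The five phases named above can be sketched as a single pipeline to make their sequence concrete. The data structures, the phoneme-matching score, and the diagnosis strings below are hypothetical placeholders of my own, not the workings of any real CAPT system.

```python
from dataclasses import dataclass, field

@dataclass
class CAPTResult:
    recognized: list[str]                                # phase 1: speech recognition output
    score: float = 0.0                                   # phase 2: scoring
    errors: list[int] = field(default_factory=list)      # phase 3: error detection
    diagnoses: list[str] = field(default_factory=list)   # phase 4: error diagnosis
    feedback: str = ""                                   # phase 5: feedback presentation

def run_capt(learner_phones: list[str], model_phones: list[str]) -> CAPTResult:
    """Run the five CAPT phases on an (already recognized) utterance."""
    result = CAPTResult(recognized=learner_phones)
    # Phase 2: score = proportion of phonemes matching the model utterance.
    pairs = list(zip(learner_phones, model_phones))
    result.score = sum(1 for l, m in pairs if l == m) / max(len(model_phones), 1)
    # Phase 3: flag positions where learner and model diverge.
    result.errors = [i for i, (l, m) in enumerate(pairs) if l != m]
    # Phase 4: attach a (placeholder) diagnosis to each flagged error.
    result.diagnoses = [f"expected /{model_phones[i]}/, heard /{learner_phones[i]}/"
                        for i in result.errors]
    # Phase 5: present the feedback to the learner.
    result.feedback = ("Well done!" if not result.errors
                       else "; ".join(result.diagnoses))
    return result

r = run_capt(["t", "i", "s"], ["th", "i", "s"])  # learner substitutes /t/ for /th/
print(round(r.score, 2), r.feedback)
```

The sketch also makes the criticism above easy to locate: if phase 1 misrecognizes a non-native learner’s phonemes, or the phase 2 comparison penalizes a well-pronounced but acoustically different utterance, every later phase inherits the error.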
It seems that the studies above are limited in scope to the computers and software themselves, either reporting how the programs work (e.g. Neri et al., 2003; Pennington, 1995) or evaluating whether those programs really work (e.g. Chen, 2004; Coniam, 1999; Derwing et al., 2000; Hincks, 2001, 2002; Mackey & Choi, 1998; Reesner, 2002; Tsai, 2006; Wildner, 2002). Since there are many problems with those programs, why not return to the original educational objective, improving learners’ L2 speaking skills? If some technologies consume much time and budget but fail to help learners achieve this objective, we could consider other approaches, perhaps going back to the traditional classroom for clues.
In terms of clues from the traditional classroom, Engwall and Bälter (2007) suggest that, since human teacher-learner interaction is vastly more effective than current CAPT pedagogy, pronunciation training software may be improved by studying how feedback is distributed in the real language classroom. They interviewed teachers and students and observed their classroom activities, focusing on four aspects: when pronunciation feedback should be given, for which errors, what kind of feedback should be used, and how to promote student motivation. After comparing feedback from the traditional classroom and current CAPT programs, they put forward a list of strategies that may be useful for CAPT and then created a virtual teacher to test those features. Results from a user questionnaire indicate that the virtual tutor with 3D computer animations successfully makes the learning environment more interesting and engaging, and provides more effective feedback. This study and its suggestions help compensate for the drawbacks of speech recognition’s inaccuracy and weak error detection. While those drawbacks cannot be overcome from a technical perspective, we may consider integrating real teachers into virtual pronunciation classrooms using CMC technologies, such as audio-conferencing (Lamy, 2004; Volle, 2005), voice chat (Jepson, 2005) and video-conferencing (McIntosh, Braul, & Chao, 2003). However, real teachers in online language teaching are arguably better suited to conducting higher-level drills, such as meaningful and communicative drills, rather than mechanical drills. Hence, further studies on this kind of integration are recommended.
From the meaningful aspect, online L2 learners are expected to respond correctly in more than one way, but they do not need to add new information to the “class” from the real world. They should be taught to understand grammatically coherent utterances and then to speak appropriately in an instructional environment (Paulston, 1971a, 1971b).
In terms of oral conversation for the meaningful aspect, a typical mode of conveying online L2 instruction is human-machine conversation (HMC). The ideal scenario of HMC would be for a learner to speak to the computer and for the computer to “understand” and respond in a sufficiently appropriate and native-like manner to provide good target language input. But this is not a realistic aim given the current state of natural language processing (Stewart & File, 2007). As Feigenbaum (2003) has observed, the real difficulty lies in managing “the ‘understand’ part: the semantics that attach real-world meaning to the word-symbols, then use those meanings for knowledge organization and inference” (p. 33). As a result, currently the most practical way to enable the computer to respond correctly is to pre-store corresponding utterances in a dialogue system. Since much natural language is formulaic, automatic, and rehearsed, rather than propositional, creative, or freely generated (Fillmore, 1976), pre-stored utterances used in L2 speaking instruction can aid learners’ speaking production by lightening the processing burden and thus facilitating fluency, and can increase their listening comprehension of the full message speakers wish to convey (Wray, 2000, 2002).
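The pre-stored-utterance approach can be sketched as follows: instead of attempting to ‘understand’ the learner, the system matches the learner’s choice against dialogue turns authored in advance and never generates language itself. The dialogue content and structure below are invented for illustration and do not reproduce any actual system.

```python
# Each node holds one pre-stored tutor utterance (prompt) and the
# authored learner responses that lead to the next node.
DIALOGUE = {
    "greeting": {
        "prompt": "Hello! How are you today?",
        "responses": {
            "I'm fine, thanks. And you?": "well_followup",
            "Not so great, actually.": "sympathy_followup",
        },
    },
    "well_followup": {
        "prompt": "I'm glad to hear that. What did you do this weekend?",
        "responses": {},
    },
    "sympathy_followup": {
        "prompt": "I'm sorry to hear that. What happened?",
        "responses": {},
    },
}

def next_turn(state: str, learner_choice: str) -> str:
    """Advance the dialogue using only pre-stored utterances.

    The learner selects (or, with ASR, speaks aloud) one of the
    authored responses; unrecognized input leaves the state unchanged.
    """
    return DIALOGUE[state]["responses"].get(learner_choice, state)

state = "greeting"
print(DIALOGUE[state]["prompt"])
state = next_turn(state, "I'm fine, thanks. And you?")
print(DIALOGUE[state]["prompt"])
```

Systems such as Let’s Chat and CandleTalk, discussed next, differ mainly in how the learner delivers the chosen response: by clicking a selection, or by speaking it aloud for ASR to recognize.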
In Let’s Chat by Stewart and File (2007), a learner hears and sees the virtual tutor’s question and then selects and submits a favorite from a list of responses. The virtual tutor then continues the dialogue with an elaboration prompt and a brief story. This system “offers a fertile environment for the acquisition and rehearsal of L2 social conversation skills”, and such practice can enhance learners’ “grasp of idiomatic, native-like modes of expression by ‘conversing’ with it, thereby achieving higher levels of confidence and fluency in subsequent natural language interactions with human partners” (p. 114). However, since Let’s Chat only prepares learners with information for speaking in real life, it does not provide learners with the opportunity to speak aloud. A similar web-based conversation environment, CandleTalk (Chiu, Liou, & Yeh, 2007), on the other hand, employs ASR to recognize a learner speaking aloud a selection from the suggested responses. This practice can improve learners’ sociocultural and sociolinguistic abilities, which help them select proper speech acts based on various sociocultural factors and control the language forms used to perform those speech acts (Cohen & Olshtain, 1994). The result of a comparative experiment shows a significant difference between pretest and posttest oral performance after learners used the system (Chiu et al., 2007). Because there is still no evidence that Let’s Chat improves learners’ L2 speaking performance after practice, L2 speaking instruction systems with ASR, such as CandleTalk, seem more recommendable.
While the meaningful aspect of L2 speaking instruction expects specific responses from learners, the communicative aspect emphasizes the free transfer of learned language patterns to appropriate situations (Paulston, 1971a, 1971b).
A live virtual classroom (LVC), based on audio-conferencing or video-conferencing, allows a structured training program to run in real time, with instructors and learners online at the same time over the Internet. Many platforms can facilitate it, such as Centra, WebEx, IBM/Lotus Sametime, and InterWise. The skills needed by instructors, the use of slides, the support for lecture-based instruction, and classroom-like metaphors of hand-raising, question posing and writing on a whiteboard are examples of traits that make it easy to bridge from the traditional classroom to the LVC (Driscoll & Carliner, 2005). The LVC requires much attention to the design of effective learning (Masie & Rinaldi, 2002), especially when L2 speaking learners need to adapt to a new type of oral interaction, because oral competence in a synchronous environment requires more content knowledge and procedural knowledge than in the traditional classroom (Lamy, 2004). Most L2 speaking learners believe that technical issues have a negative effect on the learning experience (Hampel, 2003). However, when they meet a difficulty and then try to negotiate it with teachers and peers, they produce more L2 output (Gass & Varonis, 1994; Kramsch, 1986; Varonis & Gass, 1985). Unfortunately, owing to the lack of non-verbal communication in the online environment, most L2 speaking learners, unless working in groups in “breakout rooms” of the LVC, may have little opportunity to engage in asides or spontaneous spoken chat during their tutorials (Heins, Duensing, Stickler, & Batstone, 2007). Furthermore, by spelling out words, repeating, and checking students’ comprehension, teachers tend to control and speak more in the LVC than in the traditional classroom during L2 speaking instruction (ibid.), which may further limit communicative opportunities.
Even outside the teacher’s speech, students still cannot experience satisfying interaction, because simultaneous speaking by users usually leads to a simultaneous stop with awkward silence (Hampel, 2003).
Given these drawbacks of the LVC for L2 speaking instruction, McIntosh, Braul, and Chao (2003) turned to an asynchronous approach: Wimba Voice Board, an asynchronous virtual classroom embedded in WebCT. In it, the teacher directs debates on different dilemmatic topics and students post spoken responses. The study indicates that students show the greatest enthusiasm for activities with a high level of peer-to-peer interaction and prefer interacting with classmates with whom they are socially comfortable. At the same time, however, they also suffer from technical issues such as poor sound quality and computer freezing.
Besides such structured instruction, unstructured L2 speaking practice, such as voice chat with peers, is also beneficial. Englishtown, an L2 distance education website, not only offers teacher-led conversation classes every hour all day but also hosts a virtual community comprising different voice chat rooms for learners’ further practice after “class”. It has been suggested, however, that such voice chat rooms be integrated into the context of unit study and based on the “homework” assigned in the conversation class, which may make the voice chatting more engaging and informative (He, 2007).
Reflection and Implication
The mechanical-meaningful-communicative framework (Paulston, 1971a, 1971b) has helped articulate the recent studies reviewed above, and it also frames the reflections and implications that follow.
Integration of the three aspects
Currently we seldom see any e-learning provider integrating the three aspects of L2 speaking instruction synthetically. But if learners’ different L2 speaking skills were developed separately and without any continual evaluation, learners would not become aware of their zones of proximal development (Vygotsky, 1978) by reflecting on what has actually been developed and what could potentially be developed. The internal relationships within the mechanical-meaningful-communicative framework for L2 speaking instruction should therefore be further explored.
Since many problems are reported in L2 speaking virtual classrooms, online L2 teachers need to improve their comprehensive ICT skills. Hu (2005) notes that “under supportive conditions teachers tend to shift toward student-centred instructional approaches as they increase their use of ICT” (p. 281). However, according to Heins et al. (2007), L2 teachers do tend to create a strongly controlled environment in the LVC, a teaching style quite different from that of their face-to-face classrooms. Is this because they lack the so-called “supportive conditions”? Kessler’s (2007) study suggests so: since L2 teacher education has not seen dramatic increases in perceived effectiveness as technology has become more readily available, most L2 teachers have to pursue informal study of ICT outside their degree programs, and they report that teaching L2 speaking skills with ICT is the most difficult area for them. We should therefore provide more support in teacher education to overcome this disadvantageous situation.
It appears that the mechanical, meaningful and communicative drills from traditional classrooms for L2 speaking instruction have become available in the online environment over the last decade through ASR-based pronunciation and conversation training programs and through synchronous and asynchronous virtual classrooms and communities. While the meaningful aspect seems well developed without much criticism, the mechanical and communicative aspects need further improvement, since many problems remain from both technological and pedagogical perspectives. Furthermore, the integration of the three aspects is recommended, and L2 teachers need more support for their speaking instruction using ICT.
Auralog. (2007). Learn a language with TELL ME MORE. Retrieved November 7, 2007, from http://www.auralog.com/us/individuals_home.htm
Chen, H. J. (2004). Automatic speech recognition and oral proficiency assessment. Paper presented at the International Conference on English Language Teaching Instruction and Assessment 2004, Taiwan.
Chiu, T.-L., Liou, H.-C., & Yeh, Y. (2007). A study of web-based oral activities enhanced by automatic speech recognition for EFL college learning. Computer Assisted Language Learning, 20(3), 209-233.
Cohen, A. D., & Olshtain, E. (1994). Researching the production of second-language speech acts. In E. E. Tarone, S. M. Gass & A. D. Cohen (Eds.), Research methodology in second-language acquisition (pp. 143-156). Mahwah, NJ: Lawrence Erlbaum.
Coniam, D. (1999). Voice recognition software accuracy with second language speakers of English. System, 27, 49-64.
Derwing, T. M., Munro, M. J., & Carbonaro, M. (2000). Does popular speech recognition software work with ESL speech? TESOL Quarterly, 34, 592-603.
Driscoll, M., & Carliner, S. (2005). Advanced web-based training strategies. San Francisco, CA: Pfeiffer.
Engwall, O., & Bälter, O. (2007). Pronunciation feedback from real and virtual language teachers. Computer Assisted Language Learning, 20(3), 235-262.
Feigenbaum, E. A. (2003). Some challenges and grand challenges for computational intelligence. Journal of the ACM, 50(1), 32-40.
Fillmore, C. J. (1976). The need for a frame semantics in linguistics. In Statistical methods in linguistics. Stockholm: Skriptor.
Gass, S. M. (1997). Input, interaction, and the second language learner. Mahwah, NJ: Erlbaum.
Gass, S. M., & Varonis, E. M. (1994). Input, interaction, and second language production. Studies in Second Language Acquisition 16(3), 283-302.
Gong, J. (2002). The employment of CALL in teaching second or foreign language speaking skills. Post-Script, 3(1).
Hampel, R. (2003). Theoretical perspectives and new practices in audio-graphic conferencing for language learning. ReCALL, 15(1), 21-36.
He, W. (2007). Englishtown.com’s Efekta system: could be further improved. Retrieved November 5, 2007, from http://www.hewenchao.com/Article_Show.asp?ArticleID=252
Heins, B., Duensing, A., Stickler, U., & Batstone, C. (2007). Spoken interaction in online and face-to-face language tutorials. Computer Assisted Language Learning, 20(3), 279-295.
Hincks, R. (2001). Using speech recognition to evaluate skills in spoken English. Working Papers, 49, 58-61.
Hincks, R. (2002). Speech recognition for language teaching and evaluating: A study of existing commercial products. Paper presented at the ICSLP 2002, Denver.
Hu, C. (2005). Teacher as designers? Rethinking of preservice teachers making multimedia learning packages. Paper presented at the ASCILITE 2005, Brisbane, Queensland.
Jepson, K. (2005). Conversations – and negotiated interaction – in text and voice chat rooms. Language Learning & Technology, 9(3), 79-98.
Kataoka, K. (2000). Computers for English Language Learning in Japanese Schools. Hokkaido, Japan: Hokkaido Sapporo Shin’ei Senior High School. (ERIC Document Reproduction Service No. ED 439 600)
Kessler, G. (2007). Formal and informal CALL preparation and teacher attitude toward technology. Computer Assisted Language Learning, 20(2), 173-188.
Kisner, C. (1997). Review of The Power of CALL. Language Learning & Technology, 1, 13-14.
Kramsch, C. (1986). From language proficiency to interactional competence. Modern Language Journal, 70(4), 366-372.
L-Labs. (2007). MyET – Learn English Online. Retrieved November 7, 2007, from http://www.myet.com/MyETWeb/SubPage.aspx?CultureName=en-US&fn=MyETIntro.htm
Lamy, M.-N. (2004). Oral conversations online: redefining oral competence in synchronous environments. ReCALL, 16(2), 520-538.
Long, M. H. (1996). The role of the linguistic environment in second language acquisition. In W. C. Ritchie & T. K. Bhatia (Eds.), Handbook of language acquisition (Vol. 2, pp. 413-468). New York: Academic Press.
Mackey, A., & Choi, J.-Y. (1998). “Review of TriplePlayPlus! English”. Language Learning & Technology, 2, 19-21.
Mackey, A., Perdue, S., & McDonough, K. (2000). How do learners perceive interactional feedback? Studies in Second Language Acquisition, 22(4), 471-497.
Masie, E., & Rinaldi, H. (2002). Virtual classroom technology scan. Saratoga, NY: The Masie Center Presents eLearning Consortium.
McIntosh, S., Braul, B., & Chao, T. (2003). A case study in asynchronous voice conferencing for language instruction. Educational Media International, 40(1), 63-74.
Neri, A., Cucchiarini, C., Strik, H., & Boves, L. (2002). The pedagogy-technology interface in computer assisted pronunciation training. Computer Assisted Language Learning, 15, 441-447.
Neri, A., Cucchiarini, C., & Strik, H. (2003). Automatic speech recognition for second language learning: how and why it actually works. Paper presented at the 15th ICPhS, Barcelona.
Paulston, C. B. (1971a). The Sequencing of Structural Pattern Drills. TESOL Quarterly, 5(3), 197-208.
Paulston, C. B. (1971b). Structural pattern drills: A classification. Foreign Language Annals, IV(2), 187-193.
Pennington, M. C. (1989). Applications of computers in the development of speaking and listening proficiency. In M. C. Pennington (Ed.), Teaching languages with computers: The state of the art (pp. 97-121). La Jolla, CA: Athelstan.
Pennington, M. C. (1995). The power of CALL. Houston, TX: Athelstan.
Pennington, M. C. (1996). Phonology in English language teaching : an international approach. London: Longman.
Pica, T. (1994). Research on negotiation: What does it reveal about second-language learning conditions, processes, and outcomes? Language Learning, 44, 493-527.
Reesner, T. (2002). "Tell Me More French", Software review. CALICO Journal, 19, 419-428.
Stewart, I. A. D., & File, P. (2007). Let’s Chat: A conversational dialogue system for second language practice. Computer Assisted Language Learning, 20(2), 97-116.
Tsai, P.-H. (2006). Bridging pedagogy and technology: User evaluation of pronunciation oriented CALL software. Australasian Journal of Educational Technology 22(3), 375-397.
Varonis, E. M., & Gass, S. M. (1985). Non-native/non-native conversations: A model for negotiation of meaning. Applied Linguistics, 6(1), 71-90.
Volle, L. M. (2005). Analyzing oral skills in voice e-mail and online interviews. Language Learning & Technology, 9(3), 146-163.
Vygotsky, L. S. (1978). Mind in Society. Cambridge: Harvard University Press.
Wildner, S. (2002). “Learn German Now! Version 8”, Software review. CALICO Journal, 20, 161-174.
Wray, A. (2000). Formulaic sequences in second language teaching: principles and practice. Applied Linguistics, 21(4), 463-489.
Wray, A. (2002). Formulaic language and the lexicon. Cambridge: Cambridge University Press.