Sound Recognition in Music Education: Promising New Developments

Background: The Evolution of Music Instruction Books

Contemporary music education can be divided into two major categories: formal and informal instruction. The former is mostly associated with educational institutions, and the latter with all sorts of self-learners. The second category is often referred to as MI (music instruction) and covers the instruments commonly used in popular music (guitars, drums, keyboards, and vocals). Both categories use printed books at different stages of tuition. However, while a student in a formal institution can always turn to a human teacher, an MI student depends heavily on instructional books and multimedia study materials, most of which, to date, provide no feedback.

Since music was first printed some 500 years ago, the music textbook has evolved from plain notes and text into rich multimedia packages delivered through electronic channels, with accompanying videos, sound examples, and backing tracks. This has been a necessary improvement: sheet music alone can be incomplete or, at worst, misleading, but when the same material is also provided in audio form, the likelihood of misinterpretation decreases markedly. While multimedia can make up for two functions of a missing human instructor, demonstrating and explaining, a third important function, feedback, is still lost. Most instructional materials in informal music education remain unidirectional: they only provide information, without responding or adapting to students’ individual needs.

In 2007, the author of this paper published a guitar textbook entitled ‘Guitar School – the Key to the Practical Guitar Playing’ (Käo, 2007). The book came bundled with a companion website with an innovative twist: students could submit recordings of themselves playing the exercises and receive feedback on them from a professional teacher. Over six years, approximately 12,000 self-learners studied this material, uploading a total of 2,601 audio recordings, all of which received written feedback.

Introducing Automatic Feedback into Music Education

Although the textbook enjoyed success in the local market and had decent educational outcomes, the model faced two challenges. The first was low cost-efficiency and poor scalability on the teacher’s side, which could, at least in theory, be fixed with more manpower. The second, more fundamental problem was the delay in feedback: students usually waited at least a day to receive it. This is a major problem, as numerous studies have shown that feedback has to be fairly immediate to have maximum impact (van der Kleij et al., 2012), especially when learning motor skills such as playing an instrument (Shute, 2008).

Fortunately, modern technology has come far enough to help alleviate this problem. The first educational tools capable of providing automatic feedback are already on the market, and the current state of the art in Music Information Retrieval (MIR) suggests that music analysis technologies will soon change music education fundamentally (Dittmar et al., 2012). MIR and sound recognition were first introduced in entertainment products and then gradually moved into the educational domain.

Music Education and Games

Generally, gamification is expected to aid learning at the school level (Lee, Hammer, 2011), at the classroom level (de Freitas, de Freitas, 2013), at the personal level (Betts, 2011), and as a bridge between formal and informal education (Cassidy, Paisley, 2013). Despite the scarcity of educational research on music games, several such games exist, and some have enjoyed great popularity. The first commercial music rhythm game, “PaRappa the Rapper”, dates back to 1996 (Dittmar et al., 2012). The genre became popular in the 2000s with series such as Guitar Hero and Rock Band, which were based on tapping buttons on plastic toy instruments. In the past few years, however, it has become possible to play music games on real musical instruments, with commercial titles such as Rocksmith, BandFuse, Guitar Bots, WildChords, Songs2See, and strumProfessor.

The literature on music games in education has consistently underlined their potential to engage students and make them practice more than they otherwise would. In many cases, games are a very good way of fostering learning, as they generally increase engagement with the material, which means more time spent working on the skills (Cassidy, Paisley, 2013; Muntean, 2011). In the context of music education, however, the gaming approach may not be optimal, for two distinct reasons.

The first is the timing of feedback. Games are based on real-time feedback, which has been shown to increase rates of learning. In music education, however, there seems to be an important distinction between real-time feedback (supplied during the playing) and immediate feedback (supplied immediately after it). Studies of real-time visual feedback in singing and tapping exercises have found a decrease in performance, which the authors suggest may be attributed to an increased information-processing load (Wilson et al., 2008; Sadakata et al., 2008); this is consistent with the split-attention effect in cognitive load theory (Kalyuga, Chandler, Sweller, 1999; Sweller, 2010). In games, this effect may be further exacerbated by the need to make them visually appealing and engaging through detailed but extraneous graphics.

The second potentially questionable aspect of games is the extent to which skills learned in-game transfer to useful real-life skills (see Hill, 2012; Jenson, de Castell, 2008; Peppler et al., 2011; Tanenbaum, Bizzocchi, 2009). In music games, for instance, optimizing the game score may lead to sloppy technique that later has to be re-learned for the playing to actually sound musical.

Dedicated Applications for Learning Music

Other products, such as SmartMusic, GarageBand, Weezic, and Tonara, also use sound recognition but are developed with a clear educational purpose in mind. Today, these systems are often limited to a specific instrument or platform. Moreover, they all rely on dedicated exercise banks, which suggests that adding each new exercise probably requires a fair amount of work.

MatchMySound System (MMS)

In 2013, Kristo Käo and Margus Niitsoo started a development project to create a web application that compares two audio files: a pre-recorded etalon (reference recording) provided by the teacher and a recording made by the student through the application. An algorithm finds the differences between the master audio file (the preferred term for the etalon) and the user’s audio input in two main dimensions: sound and timing. The former covers sound qualities such as pitch, intonation, and articulation, while the latter covers rhythmic accuracy, changes in tempo, and overall speed. The feedback is presented as two numerical scores, along with two graphs that provide a finer level of detail. When creating an exercise, the teacher specifies a goal for both the sound score and the timing score. If a student does not reach these goals, he or she is advised to try again.
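
The article does not publish the MMS matching algorithm, so the following is only a rough sketch of how a two-score comparison of this kind could work. It assumes that a pitch tracker has already turned each recording into a frame-by-frame list of MIDI pitch numbers, aligns the two lists with dynamic time warping (DTW, a standard technique swapped in here for illustration, not confirmed as the MMS method), and derives a hypothetical sound score (share of aligned frames within a pitch tolerance) and timing score (how far the alignment lags behind). The function names, the 0.5-semitone tolerance, and the 10-frame lag limit are all assumptions.

```python
import math


def dtw_path(ref, usr):
    """Align two pitch sequences (one MIDI pitch number per frame) with classic DTW.

    Returns the optimal alignment as a list of (ref_index, usr_index) pairs.
    """
    n, m = len(ref), len(usr)
    # cost[i][j] = cheapest cumulative cost of aligning ref[:i] with usr[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(ref[i - 1] - usr[j - 1])  # pitch distance in semitones
            cost[i][j] = local + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Walk back from the end; on equal cost the diagonal step is preferred.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    return path[::-1]


def match_scores(ref, usr, pitch_tol=0.5, max_lag=10):
    """Hypothetical (sound, timing) scores on a 0-100 scale."""
    if not ref or not usr:
        raise ValueError("both recordings need at least one frame")
    path = dtw_path(ref, usr)
    # Sound: share of aligned frames whose pitches agree within pitch_tol semitones.
    in_tune = sum(1 for i, j in path if abs(ref[i] - usr[j]) <= pitch_tol)
    sound = 100.0 * in_tune / len(path)
    # Timing: mean frame lag |i - j| along the path, assuming both recordings
    # start together and share a frame rate; rhythm slips and tempo drift both raise it.
    mean_lag = sum(abs(i - j) for i, j in path) / len(path)
    timing = 100.0 * max(0.0, 1.0 - mean_lag / max_lag)
    return round(sound, 1), round(timing, 1)


def feedback(sound, timing, sound_goal, timing_goal):
    """Compare both scores with the goals the teacher set for the exercise."""
    if sound >= sound_goal and timing >= timing_goal:
        return "goal reached"
    return "try again"
```

For example, a student who plays the right notes of `[60, 60, 62, 62, 64, 64]` but more slowly (`[60, 60, 60, 62, 62, 62, 64, 64, 64]`) keeps a sound score of 100 while the timing score drops, so a strict timing goal still yields "try again". Keeping the two dimensions separate mirrors the article's point that pitch and timing problems call for different practice.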

Feedback. During recording, the only thing shown to the student is a progress bar that roughly indicates how far through the track he or she has played. Its main function is to confirm that the computer is still listening. Feedback is shown only after the system determines that the student has finished playing. This feedback model is known as Knowledge of Results (KR), i.e., evaluative feedback from an external source, and is common in traditional classroom teaching (Welch et al., 2005). It also emulates classical teaching practice in music education, where the teacher usually lets the student finish playing before offering comments and suggestions. In MMS, withholding real-time feedback is a deliberate design choice intended to avoid the potential problems of real-time feedback discussed earlier.
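
The article does not say how MMS decides that the student has finished. One simple possibility, assumed here purely for illustration, is to track the loudness of the incoming audio and stop listening once a run of quiet frames follows some actual playing. The function names, the frame length, the threshold, and the number of silent frames are hypothetical and would need tuning for each instrument and microphone.

```python
import math


def frame_rms(samples, frame_len):
    """Root-mean-square level of consecutive, non-overlapping frames.

    A trailing partial frame is ignored.
    """
    return [math.sqrt(sum(x * x for x in samples[k:k + frame_len]) / frame_len)
            for k in range(0, len(samples) - frame_len + 1, frame_len)]


def finished_playing(rms, threshold=0.01, silent_frames=20):
    """True once something has been played and the last `silent_frames`
    frames have all stayed below `threshold` (hypothetical defaults)."""
    if not any(level >= threshold for level in rms):
        return False  # nothing played yet: keep listening
    tail = rms[-silent_frames:]
    return len(tail) == silent_frames and all(level < threshold for level in tail)
```

In a recorder, this check would run on each new block of microphone audio, and only when it returns True would the scores be computed and displayed, which keeps the feedback strictly after the performance as the KR model requires.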

One of the main design considerations was to avoid binary judgments of correct or incorrect in favor of a more continuous approach that stresses musicality. Music education has historically been based on apprenticeship, where a student learns to imitate her teacher before setting off to develop her own style. This tool follows the same tradition by trying to match what the student plays to what the teacher played, going beyond mere pitches and rhythms into the expressive qualities of a musical performance. Accordingly, it simply reports the musical differences between the two recordings.

Another design consideration was that the system should work with as many musical instruments as possible. We have so far tested it with recordings of guitar, piano, violin, flute, violoncello, lute, theorbo, Irish whistle, accordion, xylophone, and singing. Given the range of instruments tested, we have reason to believe it works with all pitched instruments.

Human Teachers Versus Automatic Feedback

We have tested the reliability of our matching algorithms by comparing the scores given by the application with those given by human teachers and plotting the two sets of scores against each other (Käo, Niitsoo, 2014).
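
The article does not state which agreement statistic was used. As an illustration, a common way to quantify how well two sets of scores line up is Pearson's correlation coefficient r, sketched below together with the mean absolute difference. Here `app` and `human` are hypothetical lists holding the application's and a teacher's scores for the same recordings.

```python
import math


def pearson_r(app, human):
    """Pearson correlation between two equally long lists of scores."""
    n = len(app)
    if n != len(human) or n < 2:
        raise ValueError("need two equally long score lists with at least two entries")
    mean_a, mean_h = sum(app) / n, sum(human) / n
    cov = sum((a - mean_a) * (h - mean_h) for a, h in zip(app, human))
    var_a = sum((a - mean_a) ** 2 for a in app)
    var_h = sum((h - mean_h) ** 2 for h in human)
    return cov / math.sqrt(var_a * var_h)


def mean_abs_diff(app, human):
    """Average size of the disagreement, in score points."""
    return sum(abs(a - h) for a, h in zip(app, human)) / len(app)
```

Correlation ignores a constant offset (an application that is always ten points stricter than the teacher still gets r = 1), so reporting the mean absolute difference alongside it is a reasonable complement.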

The scores align comparatively well. The main difference lies in how recurring mistakes are scored: human teachers penalize a repeated mistake only once, whereas the automatic feedback currently gives every occurrence equal weight. Differences in articulation and unwanted ringing of open strings were disregarded by the human experts as unimportant for novice guitarists. If the goal of this development project had been the best possible match between human and automatic feedback, having the teacher reproduce some common novice mistakes in the etalon recordings would probably have improved the agreement. However, this would have been didactically wrong, as it would have given a false impression of the desired musical result and downplayed the importance of learning by imitation (Criss, 2008; Kohut, 1985; Spencer, 2011).
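
The difference in how repeated mistakes are counted can be made concrete. The sketch below, with hypothetical mistake labels and a hypothetical `decay` parameter, covers both policies: `decay=1.0` counts every occurrence equally, as the automatic feedback currently does, and `decay=0.0` counts each distinct mistake once, as the human teachers did; values in between discount each repetition.

```python
from collections import Counter


def mistake_penalty(mistakes, decay=1.0):
    """Total penalty for a list of detected mistake labels.

    decay=1.0 -> every occurrence weighs 1 (equal weight, like the automatic scoring)
    decay=0.0 -> only the first occurrence of each mistake counts (like the teachers)
    0 < decay < 1 -> each repeat of the same mistake weighs `decay` times the previous one
    """
    counts = Counter(mistakes)
    # For a mistake seen c times: 1 + decay + decay**2 + ... + decay**(c - 1)
    return sum(sum(decay ** k for k in range(c)) for c in counts.values())
```

For a take containing the same wrong note three times and one late entry, the equal-weight policy gives a penalty of 4, the teacher-like policy gives 2, and `decay=0.5` gives 2.75. The sketch assumes the mistakes have already been detected and labelled consistently, which is the hard part in practice.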

Can Interactive Tools Help in a Classroom?

The automatic feedback tools I have described were primarily inspired by the needs of individual self-learning musicians. However, many of the same principles can be applied in formal education as well, provided that the applications accommodate the needs of the group lesson format.

The Estonian national curriculum for upper secondary schools (2011) requires every student to learn the basic guitar chords between grades 7 and 9. This is a fairly new addition to the program and has raised several practical questions. First, the teaching is done by school music teachers who have limited experience with the guitar. Second, the teaching has to take place with 24 students simultaneously, i.e., in group lessons. Third, because not all students have a guitar at home, assigning homework is impossible. To alleviate the lack of relevant didactic material for group lessons, a dedicated method book was written (Käo, 2013), which became the basis for teacher-training courses organized by the Estonian Academy of Music and Theatre. The problem of homework, however, remained.

Whether in medicine, chess, sports, or music, attaining expert skill takes thousands of hours of deliberate practice (Ericsson, Prietula, Cokely, 2007). Even without the ultimate goal of becoming an expert on an instrument, there is no substitute for a certain amount of repetition (Lehmann, Gruber, 2006). Since practicing at home is impossible for many students, the main question is how to get students aged 11 to 16 to practice effectively in the classroom, in a group lesson, for the entire 45 minutes.

This is clearly no easy task. Investigating the practice habits of 11- to 12-year-old pupils, Austin and Berg (2006) found a positive correlation between motivation and the amount of practice. After observing 7th- to 9th-grade students practicing, Oare (2012) concluded that their practice goals and criteria of success tended to be unclear. He also noticed that students with more clearly defined goals were able to stay focused on practicing for longer periods (up to 12 minutes).

Although the real-time feedback used by most games is reported to be suboptimal for learning music, certain game elements can help solve the problems that arise in a classroom environment, namely engagement, routine repetition of movements, and short-term achievable goals. As most pupils do not have a guitar at home, all practicing has to take place during lessons. Since music lessons take place only once a week, making the most of the 45-minute group lesson becomes a top priority.

In a paper submitted in December 2014, we report the results of building and testing two gamified e-learning applications that use machine-learning-based sound recognition: (1) interactive flashcards and (2) an arcade game. Both aim to facilitate drilling the basic chord shapes on the guitar, and they offer different levels of interactivity. Using a two-group pre/post-test design (N=61), we compared gains in chord-changing speed under the two interactive conditions during a single music lesson. Drawing on the cognitive theory of multimedia learning, we hypothesized that the group practicing under the less interactive condition would show better results. The hypothesis was not supported, as we were unable to detect any significant difference between the learning results of the two groups (Käo, Niitsoo, 2015).

Screenshots of the two interactive gamified conditions used in this experiment.
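
For readers unfamiliar with the two-group pre/post-test design, the sketch below shows one common way to analyse it: compute each student's gain (post-test minus pre-test chord-changing speed) and compare the two groups' mean gains with Welch's t statistic. The text does not specify which test Käo and Niitsoo (2015) used, so this is an assumed analysis, and the function names are hypothetical.

```python
import math
from statistics import mean, variance


def gains(pre, post):
    """Per-student improvement, e.g. chord changes per minute after minus before."""
    if len(pre) != len(post):
        raise ValueError("pre and post must list the same students")
    return [after - before for before, after in zip(pre, post)]


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for the difference in mean gain between groups a and b."""
    va = variance(a) / len(a)  # squared standard error of group a's mean
    vb = variance(b) / len(b)
    if va + vb == 0:
        raise ValueError("both groups have zero variance")
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

Welch's version does not assume equal variances in the two groups, which is a safer default for classroom data. With SciPy installed, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and also returns the p-value.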

Future Directions

Regardless of the quality of automated feedback, students often learn a great deal simply from recording and listening to their own playing. Experiments with self-analysis of recordings have shown it to be a strong alternative to an instructor’s feedback (Napoles, Bowers, 2010; Deniz, 2012). This implies that any interactive tool capable of giving feedback could be useful in teaching, even as a placebo. Moreover, current results suggest that MMS feedback is quite well aligned with that of human teachers, so the gains may be even larger. Since spring 2014, MMS has been tested with various groups at the Estonian Academy of Music and Theatre; since fall 2014, it has also been used with 5th-grade students at the Gustav Adolf High School in Tallinn and by numerous self-learners. In 2015, MMS automatic feedback is to be integrated into several teaching systems across the USA, and hundreds of MMS-enhanced interactive books are to be published. In many cases, automatic feedback will be the only way for music education to compete with other educational domains.

References

Austin, J. R., & Berg, M. H. (2006). Exploring Music Practice among Sixth-Grade Band and Orchestra Students. Psychology of Music, 34(4), 535-558.

Betts, B. (2011). The Four Pillars of Gamification. Learning Circuits, 1. Business Source Complete, EBSCOhost (accessed September 18, 2013).

Cassidy, G. G., & Paisley, A. M. (2013). Music-games: A case study of their impact. Research Studies in Music Education, 35(1), 119-138. doi:10.1177/1321103X13488032

Criss, E. (2008). The Natural Learning Process. Music Educators Journal, 95(2), 42-46.

Deniz, J. (2012). Video Recorded Feedback for Self Regulation of Prospective Music Teachers in Piano Lessons. Journal of Instructional Psychology, 39(1), 17-25.

Dittmar, C., Cano, E., Abeßer, J., & Grollmisch, S. (2012). Music Information Retrieval Meets Music Education. Available from OAIster, Ipswich, MA (accessed May 16, 2014). doi:10.4230/DFU.Vol3.11041.95

Ericsson, K. A., Prietula, M. J., & Cokely, E. T. (2007). The Making of an Expert. Harvard Business Review, July-Aug, 115-121.

Estonian National curriculum for upper secondary schools (2011), Appendix 6 ‘Arts’ (Effective 2011, English translation published Sept. 2014). Accessed Dec 16, 2014, URL:

de Freitas, A., & de Freitas, M. (2013). Classroom Live: A software-assisted gamification tool. Computer Science Education, 23(2), 186-206. doi:10.1080/08993408.2013.780449

Hill, L. (2012). Violin Virtuoso: A game for violin education (Master’s thesis). Rice University.

Jenson, J., & de Castell, S. (2008). From Simulation to Imitation: New Controllers, New Forms of Play. In Proceedings of the 2nd European Conference on Games-Based Learning (pp. 213-218). Barcelona, Spain, 16-17 October.

Käo, K. (2007). Guitar School – the Key to the Practical Guitar Playing. Tartu: Kitarrikool Publishing.

Käo, K. (2013). Kitarriõpik noortele (Guitar method for the young). Tartu: Kitarrikool Publishing.

Käo, K., & Niitsoo, M. (2015). Optimizing the interaction between a self-learning guitar student and a sound recognition-based educational game (in press).

Käo, K., & Niitsoo, M. (2014). MatchMySound: Introducing Feedback to Online Music Education. In Y. Cao, T. Väljataga, J. Tang, H. Leung, & M. Laanpere (Eds.), New Horizons in Web Based Learning ICWL 2014 (pp. 217-225). Springer-Verlag New York.

Kalyuga, S., Chandler, P., & Sweller, J. (1999). Managing split-attention and redundancy in multimedia instruction. Applied Cognitive Psychology, 13(4), 351-371.

van der Kleij, F. M., Eggen, T. M., Timmers, C. F., & Veldkamp, B. P. (2012). Effects of Feedback in a Computer-Based Assessment for Learning. Computers & Education, 58(1), 263-272.

Kohut, D. L. (1985). Basic Concepts of Perceptual-Motor Learning. In Musical Performance: Learning Theory and Pedagogy (pp. 5-7). Englewood Cliffs, NJ: Prentice Hall.

Lee, J. J., & Hammer, J. (2011). Gamification in education: What, how, why bother? Academic Exchange Quarterly, 15(2).

Lehmann, A., & Gruber, H. (2006). Music. In K. Ericsson, N. Charness, P. Feltovich, & R. Hoffmann (Eds.), The Cambridge handbook of expertise and expert performance (pp. 457-470). Cambridge: Cambridge University Press.

Muntean, C. I. (2011). Raising engagement in e-learning through gamification. In The 6th International Conference on Virtual Learning (ICVL 2011). Accessed Dec 16, 2014, URL:

Napoles, J., & Bowers, J. (2010). Differential Effects of Instructor Feedback vs. Self-Observation Analysis on Music Education Majors’ Increase of Specific Reinforcement in Choral Rehearsals. Bulletin of the Council for Research in Music Education, (183), 39.

Oare, S. (2012). Decisions Made in the Practice Room: A Qualitative Study of Middle School Students’ Thought Processes While Practicing. UPDATE: Applications of Research in Music Education, 30(2), 63-70. doi:10.1177/8755123312437051

Peppler, K., Downton, M., Lindsay, E., & Hay, K. (2011). The Nirvana effect: Tapping video games to mediate music learning and interest. International Journal of Learning and Media, 3(1), 41–59. 

Sadakata, M., Hoppe, D., Brandmeyer, A., Timmers, R., & Desain, P. (2008). Real-time visual feedback for learning to perform short rhythms with expressive variations in timing and loudness. Journal of New Music Research, 37(3), 207-220. doi:10.1080/09298210802322401

Shute, V. (2008). Focus on formative feedback. Review of Educational Research, 78(1), 153-189. doi:10.3102/0034654307313795

Spencer, P. (2011). Suzuki method. Oxford University Press. doi:10.1093/acref/9780199579037.013.6570

Sweller, J. (2010). Cognitive load theory: Recent advances in theory. In J. L. Plass, R. Moreno, & R. Brünken (Eds.), Cognitive Load Theory. Cambridge: Cambridge University Press.

Tanenbaum, J., & Bizzocchi, J. (2009). Rock Band: A case study in the design of embodied interface experience. In S. N. Spencer (Ed.), Proceedings of the 2009 ACM SIGGRAPH Symposium on Video Games (Sandbox ’09). doi:10.1145/1581073.1581093

Welch, G. F., Howard, D. M., Himonides, E., & Brereton, J. (2005). Real-time feedback in the singing studio: an innovatory action-research project using new voice technology. Music Education Research, 7(2), 225-249. doi:10.1080/14613800500169779

Wilson, P. H., Lee, K., Callaghan, J., & Thorpe, C. (2008). Learning to Sing in Tune: Does Real-Time Visual Feedback Help? Journal of Interdisciplinary Music Studies, 2(1/2), 157-172.
