
Two Way Wireless Data Communication and American Sign Language Translator Glove for Images, Text and Speech Display on Mobile Phone

Dhananjai Bajpai, SRMCEM, Lucknow, India ([email protected])
Uddaish Porov, SRMCEM, Lucknow, India ([email protected])
Gaurav Srivastav, AIT, Kanpur, India ([email protected])
Nitin Sachan, AIT, Kanpur, India ([email protected])

Abstract- Sign language is a communication skill that uses gestures and postures, rather than spoken or written words, to convey the speaker's thoughts. The market today offers many American Sign Language (ASL) teaching gadgets, both software and hardware based. These devices have two major drawbacks: first, none of them is a two-way communication device; second, they can only teach ASL and cannot communicate the corresponding meanings to remote locations. This paper presents an applicative hand glove architecture that records the gestures made by speech- and hearing-impaired users, converts them into meaningful text and transmits it to remote locations with the help of Bluetooth, GSM-CDMA and Internet modules. The glove has six flex sensors and three contact sensors that serve as the input channel, an AVR 2560 microcontroller that runs the gesture processing algorithm, transceiver modules for transmitting and receiving data, and a graphic user interface that displays all the information sent and received between the two users.

Keywords- American Sign Language, flex sensors, contact sensors, AVR 2560 microcontroller, transceiver module, graphic user interface.

I. INTRODUCTION

Sign language uses a combination of physical movements, facial expressions, gestures and postures to communicate. Unlike verbal or written communication, sign language is a manual form of communication: when two or more people do not share a common language, they communicate through signs, the most primitive and elementary form of communication. Different regions have their own native sign languages, but it became important to develop a standard international sign language for use across the globe. The need for an International Sign Language (ISL) can be traced to the formation of the World Deaf Congress in 1951, which used pidgins [1] as the building blocks of ISL. In 1970, Gestuno vocabulary and grammar were added to ISL to make it more flexible and user friendly. There are various sign languages, such as the American, Asia Pacific, European and Middle East sign languages. Speech- and hearing-impaired people use American Sign Language [2], which uses one-handed gestures to form the 26 English alphabets and has more than 500,000 speakers across the globe.

II. RECENT WORK

Some models capture gestures with the help of a camera [3] and have been deployed successfully in gaming for creating virtual 3D environments, for example the Sony PlayStation Eye [4]. However, using cameras as an input channel introduces background noise that is difficult to process. Tan Tian Swee et al. proposed Wireless Data Gloves [5] designed for Malaysian sign language. One 3D accelerometer and five flex sensors were used to capture the gestures made by a user, and their outputs were fed to a PIC18F4550 for digital conversion.
A computer was used for vector quantization of the 24 input signals from the six sensors, and the quantized data was mapped to the closest codeword predefined in the stored codebook. The codeword indices are then used as inputs to HMMs [5] for further evaluation, and a model score is generated that must match the predefined scores in order to produce the audio output of the corresponding text.

In 2011, an Assistive Body Sensor Network Glove [7] was proposed by Satjakarn Vutinuntakasame et al. and tested on the 44 characters of "The Quick Brown Fox Jumps Over a Lazy Dog", including full-stop and space gestures, by integrating five flex sensors, a 3D accelerometer and a speech synthesizer to produce audio outputs for the corresponding ASL gestures. The data acquired by these sensors is recognized with the help of probability density functions, and misclassified gestures are analyzed with the help of confused clusters, which are processed by a multivariate Gaussian model to produce a confusion matrix; this work produced 35 correct characters.

In 2012, Netchanok Tanyawiwat et al. proposed an Assistive Communication Glove using Combined Sensory Channels [8], which had five flex sensors, five contact sensors and one 3D accelerometer combined into the same sensory channel attached to a body sensor network. To reduce the installation size, all the input channels were combined into one input channel network. The maximum and minimum values of the flex sensors in the non-contact and contact states for all five fingers are collected and processed with median filtering. A multivariate Gaussian distribution covariance matrix combined with four different models of a multiobjective Bayesian framework is used for feature selection to enhance the recognition accuracy. Of the four models used for feature selection, Model 4 had the maximum accuracy of 77.9%, but the recognition accuracy of the letters A, D, E, F, G, H, K, L, P, Y and Z was reduced.

In 2012, Kunal Kadam et al. proposed an American Sign Language Interpreter [9] with five flex sensors serving as the input channel, an MCU to process these inputs, and an LCD screen to display English alphabets. The glove could be used in teaching or learning mode with the help of a switch matrix. In teaching mode the appropriate hand gesture for the corresponding alphabet is made and held for three seconds so that the MCU can average the five sensor values and allocate the gesture to that value. In learning mode the user makes a hand gesture and its values are checked five times; if they match the predefined values stored in the microcontroller, a perfect match is indicated on the LCD screen.

In 2014, S. Saivenkatesh et al. proposed an "Intelligible Speech Recognition and Translating Embedded Device for Deaf and Dumb Individuals" [10]. It converts an ASL gesture's image, drawn on a two-layer TFT touch screen, into the corresponding sound using an ARM sound converter. The vertical and horizontal resistive layers change their resistance as the fingers move along the x and y axes of the TFT screen; once the gesture is drawn, it is converted into sound. It also used an advanced speech-to-image converter for translating words into images. The major disadvantage was that only two-dimensional gestures could be drawn and decoded into sound, and the letters were not transmitted to other users.

The architectures discussed above use complicated hardware designs that are primarily focused on translating ASL; they do not transmit or receive the translated information to or from another person.
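As context for the vector quantization step described above for the data-glove system of Swee et al., the sketch below shows a minimal nearest-codeword lookup. The codebook contents, the vector length and the use of squared Euclidean distance are illustrative assumptions, not details taken from that paper.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// Nearest-codeword lookup: return the index of the codebook entry with the
// smallest squared Euclidean distance to the sample vector. This index is
// what a vector-quantization front end would pass on to a recognizer.
std::size_t nearestCodeword(const std::vector<double>& sample,
                            const std::vector<std::vector<double>>& codebook) {
    std::size_t best = 0;
    double bestDist = 1e300;
    for (std::size_t i = 0; i < codebook.size(); ++i) {
        double dist = 0.0;
        for (std::size_t k = 0; k < sample.size(); ++k) {
            const double diff = sample[k] - codebook[i][k];
            dist += diff * diff;
        }
        if (dist < bestDist) { bestDist = dist; best = i; }
    }
    return best;
}

int main() {
    // Hypothetical 6-dimensional sensor frame and a tiny two-entry codebook.
    const std::vector<std::vector<double>> codebook = {
        {0.1, 0.2, 0.9, 0.8, 0.1, 0.3},
        {0.7, 0.6, 0.2, 0.1, 0.9, 0.4}
    };
    const std::vector<double> frame = {0.68, 0.55, 0.25, 0.15, 0.85, 0.42};
    std::cout << "Quantized to codeword " << nearestCodeword(frame, codebook) << "\n";
    return 0;
}
```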
This paper presents an applicative hand glove architecture with a very simple hardware assembly that focuses on, firstly, translating ASL into English alphabets, their corresponding sounds, sentences or images; secondly, providing a platform for making one's own gestures, or selecting gestures already provided in the application, and writing their dedicated outputs; thirdly, transmitting the translated data to an authorized mobile phone user with the help of three different protocols; and fourthly, displaying the responses sent by the mobile user on the hand glove user interface.

III. TRANSLATION OF ASL

A. Extraction of Digital Values

Six 2.2″ flex sensors are placed on the wrist, thumb and four fingers, and three contact sensors are placed on the forefinger, middle finger and thumb; these sensors serve as the primary input channel to the AVR 2560 microcontroller. When the user makes a gesture, the flex sensors are subjected to bending stress, their resistance changes, and the sensed voltage varies from 4.1 V to 5.0 V. This 0.9 V deflection is amplified to a larger range with the help of the Darlington-pair transistor circuit shown in Fig. 1.

Figure 1. Conversion of analog resistance to voltage

The output of this circuit is filtered and then converted into digital form using the ADC pins of the AVR 2560 microcontroller provided on the Arduino Mega board. Fig. 2 shows the hardware assembly used for feature extraction.

Figure 2. Hardware assembly for feature extraction

The contact sensors generate an interrupt whenever contact is made between the thumb, forefinger or middle finger; this interrupt is acknowledged by the AVR 2560 microcontroller.

B. Creation of Database

Digital data for every gesture made by a user is acquired by the six flex sensors and three contact sensors; the microcontroller processes this data and converts each ASL gesture into a set of nine digital voltage values, as shown in Table I. A database is thus created that stores each ASL gesture as its corresponding digital voltage values and their respective outputs. When a gesture is made, the microcontroller reads the incoming data and compares it with the existing voltage values; if a correct gesture is made it initiates the output generation process, otherwise it reports that the gesture is not recognized. With the help of a switch the user can select the customization mode, in which he can make his own gesture. The system asks the user to repeat the gesture four times and stores the new values, and the output generation graphic user interface is then used to assign a particular output to the new gesture. The user can extend the database to 1300 gestures and outputs by making his own gestures and assigning outputs to them.

C. Output Generation GUI for Laptop and Mobile Phones

Once a correct gesture is made, the microcontroller commands, through the Bluetooth module, the graphic user interface installed on a laptop to display the corresponding English alphabet and sound, as shown in Fig. 3.

Figure 3. Bluetooth laptop GUI display

The hand glove's Bluetooth module, paired with any Android mobile phone, performs the same operation as discussed above, as shown in Fig. 4.

Figure 4. Bluetooth GUI displaying letters, images, sentences and sounds on the phone

Message mode can be selected with the help of a keypad, and in this mode the user can type and send messages to any number. Lastly, the user can select chat mode, in which he can chat with an authenticated mobile phone user over Internet protocols, as shown in Fig. 5.

Figure 5. Internet GUI displaying letters, images, sentences and sounds on the phone
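To make the acquisition and matching steps of Sections III.A and III.B concrete, the sketch below samples the six flex channels through the ADC, reads the three contact sensors as digital inputs, and compares the nine-value frame against a small gesture table within a fixed tolerance. The pin assignments, tolerance, baud rate and the two example table rows are assumptions for illustration, not the exact firmware running on the glove.

```cpp
// Illustrative Arduino Mega (ATmega2560) sketch: sample six flex sensors and
// three contact sensors, then look the resulting frame up in a gesture table.
// Pin numbers, thresholds and table entries are assumptions for illustration.

const int FLEX_PINS[6]    = {A0, A1, A2, A3, A4, A5}; // five fingers + wrist (assumed wiring)
const int CONTACT_PINS[3] = {2, 3, 4};                // contact sensors (assumed wiring)
const int TOLERANCE = 15;                             // allowed ADC deviation per channel (assumed)

struct Gesture {
  char label;
  int flex[6];      // expected ADC values (0-1023)
  int contact[3];   // expected contact states (0/1)
};

// Two example rows taken from Table I.
const Gesture GESTURES[] = {
  {'A', {900, 990, 990, 995, 995, 850}, {0, 0, 0}},
  {'B', {1000, 940, 930, 890, 890, 850}, {0, 0, 0}},
};

void setup() {
  Serial.begin(9600);                                 // serial link to the Bluetooth module (assumed)
  for (int i = 0; i < 3; ++i) pinMode(CONTACT_PINS[i], INPUT_PULLUP);
}

void loop() {
  int flex[6], contact[3];
  for (int i = 0; i < 6; ++i) flex[i] = analogRead(FLEX_PINS[i]);
  for (int i = 0; i < 3; ++i) contact[i] = (digitalRead(CONTACT_PINS[i]) == LOW) ? 1 : 0;

  for (const Gesture& g : GESTURES) {
    bool match = true;
    for (int i = 0; i < 6 && match; ++i)
      if (abs(flex[i] - g.flex[i]) > TOLERANCE) match = false;
    for (int i = 0; i < 3 && match; ++i)
      if (contact[i] != g.contact[i]) match = false;
    if (match) {
      Serial.println(g.label);                        // send the decoded letter to the paired GUI
      break;
    }
  }
  delay(200);                                         // simple pause between readings
}
```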
IV. OUTPUT CUSTOMIZATION AND DATA RECEIVING

Fig. 6 shows the free gestures provided in the application. If the user does not want to make his own gestures, he can select one of these and write the output corresponding to it, so that every time that gesture is made the same result is displayed to the mobile phone and laptop users. The user can select images or complete sets of sentences and phrases as the outputs of a particular gesture.

Figure 6. Extra gestures provided in the application

Multimedia messages sent by mobile phone or laptop users are displayed on the hand glove screen, as shown in Fig. 7.

V. RESULTS OBTAINED

Signal levels for each sensor used in making the ASL gesture 'B' are shown in Fig. 8. The letters A, B, C, D, E, F, I, K, L, O, V, W, X and Y are produced by combinations of the five flex sensors placed on the fingers. Letters such as G, H, J, P, Q and Z are produced by combinations of the flex sensor placed on the wrist and the other five flex sensors. Letters such as M, N, P, R, S, T and U are produced by combinations of the six flex sensors and the three contact sensors. Table I shows the digital values of all 26 alphabets, and Table II shows the digital values of eight sentences with their corresponding images.

The efficiency of the glove is calculated by generating the 42 characters of the dummy sentence THE QUICK BROWN FOX JUMPS OVER A LAZY DOG., which includes the space and full-stop gestures. Table III shows the efficiency of the hand glove over three iterations; the maximum efficiency of 88% is obtained in the third iteration, while the other iterations have lower efficiencies. The overall efficiency is the average of the three iterations, which is 83%. Fig. 7 shows the graphic user interface that displays the data transmitted to and received from a distant user.

Figure 7. GUI displaying data transmitted to and received from a distant user

Figure 8. Signal level of digital values attained by gesture 'B'

Table I. English alphabets and their corresponding digital values for each flex and contact sensor

Letter  F0    F1    F2    F3    F4    F5    C-0  C-1  C-2
A       900   990   990   995   995   850   0    0    0
B       1000  940   930   890   890   850   0    0    0
C       995   990   995   992   990   845   0    0    0
D       1000  940   995   990   990   850   0    0    0
E       1012  1015  1011  1010  1013  850   0    0    0
F       1000  1003  1008  1020  1002  850   0    0    0
G       930   930   995   995   995   890   0    0    0
H       1000  930   930   1000  1000  890   0    0    0
I       1000  1000  995   995   930   850   0    0    0
J       995   1000  995   995   930   995   0    0    0
K       930   930   935   1000  995   850   0    0    0
L       930   940   1000  995   995   850   0    0    0
M       1010  990   995   990   1014  850   0    1    1
N       1010  995   995   1013  1011  850   0    1    0
O       1001  1006  1009  1003  1003  850   0    0    0
P       930   920   940   1000  995   995   1    0    0
Q       915   930   1000  1001  1010  1021  0    0    0
R       996   930   920   990   996   840   0    1    0
S       1002  990   1014  1012  1003  840   1    1    1
T       945   984   1020  1020  1020  830   1    1    0
U       1019  930   925   990   980   830   0    1    0
V       1019  930   925   990   980   830   0    0    0
W       990   930   930   920   1000  850   0    0    0
X       1020  956   990   1021  1021  840   0    0    0
Y       930   1000  1000  995   930   850   0    0    0
Z       1020  925   1016  1009  1020  1020  0    0    0
FS      1021  976   951   971   964   1022  1    1    1
SP      970   965   968   969   975   1012  0    0    0

Table II. Digital values of eight gestures that are used to display images and sets of sentences

Sentence        FLEX0  FLEX1  FLEX2  FLEX3  FLEX4  FLEX5
Good morning    1020   970    950    950    966    995
Good evening    950    960    1010   1015   955    870
Good night      975    1021   970    950    976    871
Hi              963    950    1013   971    970    876
Hello           977    970    960    968    1021   876
Bye             941    1017   960    964    975    1003
                1008   950    975    965    1018   1023
                970    971    1021   1019   961    1016

Table III. Efficiency of the hand glove (test sentence: THE QUICK BROWN FOX JUMPS OVER A LAZY DOG.)

Iteration   Recognized output                             Efficiency
1st         THE QVICK BROWN FOX JVMPS OVEV A LAZY DEG     85%
2nd         THE QRICK BROWN FEX JVMPE OUER A LAZY DEG     77%
3rd         THO QUICK BUOWN FOX JUMPE ORER A LAZY DOG.    88%
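The efficiency figures in Table III are character-level scores over the 42-character test sentence. The sketch below shows one plausible way to compute such a score, a position-by-position character comparison; the paper does not spell out its exact scoring convention, and the recognized string in main() is a made-up example rather than a row of Table III.

```cpp
#include <algorithm>
#include <iostream>
#include <string>

// Percentage of positions in the reference sentence that the recognized
// sentence reproduces exactly (assumed scoring convention).
double efficiency(const std::string& reference, const std::string& recognized) {
    std::size_t n = std::min(reference.size(), recognized.size());
    std::size_t correct = 0;
    for (std::size_t i = 0; i < n; ++i)
        if (reference[i] == recognized[i]) ++correct;
    return 100.0 * static_cast<double>(correct) / static_cast<double>(reference.size());
}

int main() {
    const std::string reference  = "THE QUICK BROWN FOX JUMPS OVER A LAZY DOG.";
    const std::string recognized = "THE QUICK BROWN FOX JUMPS OVER A LAZY DEG."; // hypothetical output, one wrong letter
    std::cout << "Efficiency: " << efficiency(reference, recognized) << "%\n";    // about 97.6%
    return 0;
}
```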
VI. CONCLUSION

Unlike other hand glove architectures, which support only one-way communication, this paper presents a two-way communication applicative architecture of an ASL hand glove. The glove not only provides a graphic user interface that decodes 36 gestures into corresponding alphabets, sounds, images and sets of sentences on its screen, it also provides a graphic user interface that successfully transmits them to an authorized mobile phone and simultaneously displays the responses transmitted by that phone on the same screen, with the help of Bluetooth, Global System for Mobile communications (GSM) and Internet protocols. The device also offers the option of selecting free gestures, making one's own gestures and selecting their corresponding outputs. The use of a sixth flex sensor on the wrist eliminates the accelerometer, which simplifies the gesture recognition algorithm, and only three contact sensors are used instead of the five contact sensors of previous architectures, which complicated the hardware design; the smaller number of sensors therefore reduces the hardware complexity. Gesture translation has an efficiency of 83%. It can be concluded that this simplified hand glove applicative architecture will help its users to communicate with others from anywhere and at any time all over the world, and will thus dissolve the communication barrier between such people and others.

VII. APPLICATION

There are more than 500 million speech- and hearing-impaired people across the globe, and more than 25,000 newborns are added to this category every year. This architecture can be used to create an interactive classroom education system for such people, as shown in Fig. 9; it can be used to broadcast messages to near ones in case of emergency; and it can enhance their social integration by assisting them in their daily communication. The glove can accommodate more than 1300 gestures and provides a platform for making one's own gestures or using the free gestures for different scenarios.

Figure 9. Classroom teaching application
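For the emergency broadcast use case mentioned above, one plausible path is the glove's GSM module. The sketch below shows the standard text-mode AT command sequence (AT+CMGF, AT+CMGS) for sending an SMS, assuming a SIM900-class modem attached to Serial1 of the Arduino Mega; the wiring, baud rate, phone number and message text are placeholders rather than details taken from the paper.

```cpp
// Illustrative SMS broadcast over a SIM900-style GSM module on Serial1 of the
// Arduino Mega. The phone number, message and wiring are placeholders.

void sendSms(const char* number, const char* text) {
  Serial1.println("AT+CMGF=1");            // put the modem in SMS text mode
  delay(500);
  Serial1.print("AT+CMGS=\"");             // start an SMS to the given number
  Serial1.print(number);
  Serial1.println("\"");
  delay(500);
  Serial1.print(text);                     // message body
  Serial1.write(26);                       // Ctrl+Z terminates and sends the message
  delay(2000);
}

void setup() {
  Serial1.begin(9600);                     // GSM module baud rate (assumed)
  sendSms("+10000000000", "HELP NEEDED");  // placeholder number and emergency text
}

void loop() {}
```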
VIII. REFERENCES

1. R. McKee and J. Napier, "Interpreting in International Sign Pidgin: An Analysis," Journal of Sign Language Linguistics, vol. 5, no. 1, 2002.
2. T. Suggs, Alpha Teach Yourself American Sign Language in 24 Hours, December 2, 2003.
3. 5DT Fifth Dimension Technologies. Available: http://www.5dt.com (accessed 29 September 2011).
4. E. Blass, "Sony announces PlayStation Eye webcam for PS3," 2007. Available: http://www.engadget.com/2007/04/26/sony-announces-playstation-eye-webcam-for-ps (accessed 20 September 2011).
5. T. T. Swee, A. K. Ariff, S. H. Salleh, S. K. Seng and L. S. Huat, "Wireless Data Gloves Malay Sign Language Recognition System," Proc. International Conference on Information, Communications and Signal Processing, Singapore, 2007, pp. 1-4.
6. Q. Munib et al., "American Sign Language (ASL) recognition based on Hough Transform and Neural Networks," Expert Systems with Applications, vol. 32, pp. 24-27, 2007.
7. S. Vutinuntakasame, V. Jaijongrak and S. Thiemjarus, "An Assistive Body Sensor Network Glove for Speech- and Hearing-Impaired Disabilities," Proc. International Conference on Body Sensor Networks, Pathumthani, 2011, pp. 7-12.
8. N. Tanyawiwat and S. Thiemjarus, "Design of an Assistive Communication Glove using Combined Sensory Channels," Proc. 9th IEEE International Conference on Wearable and Implantable Body Sensor Networks, 2012, pp. 34-39.
9. K. Kadam, R. Ganu, A. Bhosekar and S. D. Joshi, "American Sign Language Interpreter," Proc. 4th International Conference on Technology for Education, 2012, pp. 157-159.
10. S. Saivenkatesh and K. Sutiya, "Intelligible Speech Recognition and Translating Embedded Device for Deaf and Dumb Individuals," Proc. 5th National Conference on VLSI, Embedded and Communication Networks, April 17, 2014.