I'm not sure whether you mean an exact match to the voice itself. For example, if a male voice is used when setting up the email and a woman's voice later says the password, do you want to make sure it was that exact male voice from setup?
If you just want a spoken password, it is possible, but I think you are over-engineering this. Two-factor authentication via a text or phone call that gives you a code to enter is much more common and, in my opinion, more user friendly. Also, I don't think it is wise to have people yelling their passwords into a phone when they get frustrated that the way they spoke it on setup isn't the way it's being transcribed when they try to log in. You are also going to have to deal with people saying "My password is D A V I D", so you will have to figure out how to strip the filler text (e.g. "My password is") and then treat the remaining frustrated/yelled text (which hopefully gets transcribed correctly) as the password.
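To give an idea of the parsing problem, here is a minimal sketch of cleaning up a transcript before comparing it. The function name and the list of filler phrases are my own assumptions, not anything Twilio or Watson provides, and a real list would need to be much longer.

```python
import re

# Hypothetical lead-in phrases a caller might say before the password itself.
# Written without punctuation because punctuation is stripped first.
FILLER_PREFIXES = ("my password is", "the password is", "password is")


def normalize_spoken_password(transcript: str) -> str:
    """Reduce a raw transcript to a canonical candidate password."""
    text = transcript.lower()
    # Drop punctuation the transcriber may add (periods, commas, "!").
    text = re.sub(r"[^a-z0-9\s]", "", text)
    # Collapse runs of whitespace to single spaces.
    text = re.sub(r"\s+", " ", text).strip()
    # Remove the first matching filler phrase, if any.
    for prefix in FILLER_PREFIXES:
        if text.startswith(prefix + " "):
            text = text[len(prefix):].strip()
            break
    # Join what remains so "d a v i d" and "david" compare equal.
    return text.replace(" ", "")
```

Note that lowercasing and removing spaces makes many different utterances map to the same password, which is part of why I think a spoken password is weaker than it sounds.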
However, if what you essentially need is speech-to-text, you have some options.
1) You can use Twilio's transcription service:
REST API: Transcriptions - Twilio
TwiML™ Voice: <Record> - Twilio
2) IBM Watson has speech-to-text capabilities:
Speech to Text | IBM Watson Developer Cloud
So with Twilio you would place an outbound call, record that call, and enable the option to transcribe the recording.
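As a rough sketch of the recording step, the TwiML your server returns for the call could be built like this. I am using only the Python standard library here; the function name, the prompt wording, and the callback URL are assumptions for illustration, while `transcribe`, `transcribeCallback`, `maxLength`, `finishOnKey` and `playBeep` are attributes of the `<Record>` verb.

```python
import xml.etree.ElementTree as ET


def build_record_twiml(transcribe_callback_url: str, max_length_seconds: int = 10) -> str:
    """Build a TwiML document that prompts the caller and records a transcribed reply."""
    response = ET.Element("Response")

    # Spoken prompt; the wording is only an example.
    say = ET.SubElement(response, "Say")
    say.text = "After the beep, please say your password, then press the pound key."

    # Record the caller and ask Twilio to transcribe the recording.
    ET.SubElement(
        response,
        "Record",
        {
            "transcribe": "true",
            # Assumed endpoint on your own server that receives the transcription.
            "transcribeCallback": transcribe_callback_url,
            "maxLength": str(max_length_seconds),
            "finishOnKey": "#",
            "playBeep": "true",
        },
    )
    return '<?xml version="1.0" encoding="UTF-8"?>' + ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical URL, replace with your own handler.
    print(build_record_twiml("https://example.com/transcription-done"))
```

You would serve this XML from the URL you give Twilio when creating the outbound call. The call creation itself and the handler behind the callback URL are not shown here.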
From that transcription you should be able to get the text, and then either decrypt the password stored in the database and compare it, or hash the transcription and compare it to the password hash in the database, in order to authenticate the user.
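The hash-and-compare variant might look roughly like the sketch below. This assumes you store a per-user salt and a PBKDF2 hash at setup; the function names, the iteration count, and the way the text is canonicalized are my assumptions, not something from Twilio.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor, tune for your hardware


def canonical(text: str) -> str:
    """Lowercase and remove all whitespace so "D A V I D" matches "david"."""
    return "".join(text.lower().split())


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a hash from the canonical form of the password."""
    return hashlib.pbkdf2_hmac(
        "sha256", canonical(password).encode("utf-8"), salt, ITERATIONS
    )


def enroll(password: str):
    """Return (salt, hash) to store in the database at setup time."""
    salt = os.urandom(16)
    return salt, hash_password(password, salt)


def verify_transcription(transcription: str, salt: bytes, stored_hash: bytes) -> bool:
    """Hash the transcribed text and compare it to the stored hash in constant time."""
    return hmac.compare_digest(hash_password(transcription, salt), stored_hash)
```

For example, after `salt, stored = enroll("david")`, a transcription of "D A V I D" would verify and "dave" would not. The same canonicalization has to be applied at setup and at login, otherwise the hashes will never match.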
I think this method is going to create a lot of headaches for you, but that would be one approach. If you want to look into two-factor authentication, Twilio has that capability as well (Authy Two-factor Authentication - Twilio).