On average, men and women speak roughly 15,000 words per day. We call our friends and family, log into Zoom for meetings with our colleagues, discuss our days with our loved ones, or, if you’re like me, argue with the ref about a bad call in the playoffs.
Hospitality, travel, IoT and the auto industry are all on the cusp of leveling up voice assistant adoption and the monetization of voice. The global voice and speech recognition market is expected to grow at a CAGR of 17.2% from 2019 to reach $26.8 billion by 2025, according to Meticulous Research. Companies like Amazon and Apple will accelerate this growth as they leverage ambient computing capabilities, which will continue to push voice forward as a primary interface.
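To put that forecast in perspective, here is a rough back-of-the-envelope sketch of what a 17.2% CAGR implies. It assumes six full years of compounding between 2019 and 2025, which the forecast as quoted does not spell out, and the function names are purely illustrative.

```python
def implied_base(final_value: float, cagr: float, years: int) -> float:
    """Back out the starting value implied by a compound annual growth rate."""
    return final_value / (1.0 + cagr) ** years


def project(base: float, cagr: float, years: int) -> float:
    """Compound a starting value forward at a constant annual growth rate."""
    return base * (1.0 + cagr) ** years


FINAL_2025 = 26.8  # billions of USD, the forecast figure cited above
CAGR = 0.172       # 17.2% per year, as cited above
YEARS = 6          # assumption: 2019 through 2025 counts as six compounding years

base_2019 = implied_base(FINAL_2025, CAGR, YEARS)
print(f"Implied 2019 market size: ${base_2019:.1f} billion")
print(f"Growth multiple over {YEARS} years: {(1.0 + CAGR) ** YEARS:.2f}x")
```

Under that assumption, the forecast implies a market that grows to roughly two and a half times its 2019 size. If the forecast actually compounds from a different base year, the implied starting figure shifts accordingly.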
As voice technologies become ubiquitous, companies are turning their focus to the value of the data latent in these new channels. Microsoft’s recent acquisition of Nuance is not just about achieving better NLP or voice assistant technology; it’s also about the trove of healthcare data that the conversational AI has collected.
Our voice technologies have not been engineered to confront the messiness of the real world or the cacophony of our actual lives.
Google has monetized every click of your mouse, and the same thing is now happening with voice. Advertisers have found that speak-through conversion rates are higher than click-through conversion rates. Brands need to begin developing voice strategies to reach customers, or risk being left behind.
Voice tech adoption was already on the rise, but with most of the world under lockdown protocols during the COVID-19 pandemic, adoption is set to skyrocket. Nearly 40% of internet users in the U.S. used smart speakers at least monthly in 2020, according to Insider Intelligence.
Yet, there are several fundamental technology barriers keeping us from reaching the full potential of the technology.
By the end of 2020, worldwide shipments of wearable devices had risen 27.2% from a year earlier, to 153.5 million. But despite all the progress made in voice technologies and their integration into a plethora of end-user devices, these technologies are still largely limited to simple tasks. That is finally starting to change as consumers demand more from these interactions and voice becomes a more essential interface.
In 2018, in-car shoppers spent $230 billion to order food, coffee, groceries or items to pick up at a store. The auto industry is one of the earliest adopters of voice AI, but to capture the technology’s true potential, in-car voice needs to become a more seamless, truly hands-free experience. Ambient car noise still muddies the signal enough to keep users tethered to their phones.
In the customer service industry, your accent dictates many aspects of your job. It shouldn’t be the case that there’s a “better” or “worse” accent, but in today’s global economy (though who knows about tomorrow’s) it’s valuable to sound American or British. While many undergo accent neutralization training, Sanas is a startup with another approach (and a $5.5 million seed round): using speech recognition and synthesis to change the speaker’s accent in near real time.
The company has trained a machine learning algorithm to quickly and locally (that is, without using the cloud) recognize a person’s speech on one end and, on the other, output the same words with an accent chosen from a list or automatically detected from the other person’s speech.
It slots right into the OS’s sound stack so it works out of the box with pretty much any audio or video calling tool. Right now the company is operating a pilot program with thousands of people in locations from the U.S. and U.K. to the Philippines, India, Latin America and others. Accents supported will include American, Spanish, British, Indian, Filipino and Australian by the end of the year.
To tell the truth, the idea of Sanas kind of bothered me at first. It felt like a concession to bigoted people who consider their accent superior and think others below them. Tech will fix it … by accommodating the bigots. Great!
But while I still have a little bit of that feeling, I can see there’s more to it than this. Fundamentally speaking, it is easier to understand someone when they speak in an accent similar to your own. But customer service and tech support is a huge industry, and one primarily performed by people outside the countries where the customers are. This basic disconnect can be remedied in a way that puts the onus on the entry-level worker, or one that puts it on technology. Either way, the difficulty of making oneself understood remains and must be addressed; an automated system just lets it be done more easily and allows more people to do their job.
It’s not magic: as a sample clip makes clear, the character and cadence of the person’s voice are only partly retained, and the result is considerably more artificial sounding.
But the technology is improving and, like any speech engine, the more it’s used, the better it gets. And for someone not used to the original speaker’s accent, the American-accented version may very well be more easily understood. For the person in the support role, this likely means better outcomes for their calls, so everyone wins. Sanas told me that the pilots are just starting, so there are no numbers available from this deployment yet, but testing has suggested a considerable reduction in error rates and an increase in call efficiency.
It’s good enough at any rate to attract a $5.5 million seed round, with participation from Human Capital, General Catalyst, Quiet Capital and DN Capital.
“Sanas is striving to make communication easy and free from friction, so people can speak confidently and understand each other, wherever they are and whoever they are trying to communicate with,” CEO Maxim Serebryakov said in the press release announcing the funding. It’s hard to disagree with that mission.
While the cultural and ethical questions of accents and power differentials are unlikely to ever go away, Sanas is trying something new that may be a powerful tool for the many people who must communicate professionally and find their speech patterns are an obstacle to that. It’s an approach worth exploring and discussing even if in a perfect world we would simply understand one another better.
A group of senators sent new Amazon CEO Andy Jassy a letter Friday pressing the company for more information about how it scans and stores customer palm prints for use in some of its retail stores.
The company rolled out the palm print scanners through a program it calls Amazon One, encouraging people to make contactless payments in its brick-and-mortar stores without the use of a card. Amazon introduced its Amazon One scanners late last year, and they can now be found in Amazon Go convenience and grocery stores, Amazon Books and Amazon four-star stores across the U.S. The scanners are also installed in eight Whole Foods locations in Washington state.
In the new letter, Senators Amy Klobuchar (D-MN), Bill Cassidy (R-LA) and Jon Ossoff (D-GA) press Jassy for details about how Amazon plans to expand its biometric payment system and whether the data collected will help the company target ads.
“Amazon’s expansion of biometric data collection through Amazon One raises serious questions about Amazon’s plans for this data and its respect for user privacy, including about how Amazon may use the data for advertising and tracking purposes,” the senators wrote in the letter.
The lawmakers also requested information on how many people have enrolled in Amazon One to date, how Amazon will secure the sensitive data and whether the company has ever paired the palm prints with facial recognition data it collects elsewhere.
“In contrast with biometric systems like Apple’s Face ID and Touch ID or Samsung Pass, which store biometric information on a user’s device, Amazon One reportedly uploads biometric information to the cloud, raising unique security risks,” the senators wrote. “… Data security is particularly important when it comes to immutable customer data, like palm prints.”
The company controversially introduced a $10 credit for new users who enroll their palm prints in the program, prompting an outcry from privacy advocates who see it as a cheap tactic to coerce people to hand over sensitive personal data.
There’s plenty of reason to be skeptical. Amazon has faced fierce criticism for its other big biometric data project, the AI facial recognition software known as Rekognition, which the company provided to U.S. law enforcement agencies before eventually backtracking with a moratorium on policing applications for the software last year.
Maine has joined a growing number of cities, counties and states that are rejecting dangerously biased surveillance technologies like facial recognition.
The new law, which is the strongest statewide facial recognition law in the country, not only received broad, bipartisan support, but it passed unanimously in both chambers of the state legislature. Lawmakers and advocates spanning the political spectrum — from the progressive lawmaker who sponsored the bill to the Republican members who voted it out of committee, from the ACLU of Maine to state law enforcement agencies — came together to secure this major victory for Mainers and anyone who cares about their right to privacy.
Maine is just the latest success story in the nationwide movement to ban or tightly regulate the use of facial recognition technology, an effort led by grassroots activists and organizations like the ACLU. From the Pine Tree State to the Golden State, national efforts to regulate facial recognition demonstrate a broad recognition that we can’t let technology determine the boundaries of our freedoms in the digital 21st century.
Facial recognition technology poses a profound threat to civil rights and civil liberties. Without democratic oversight, governments can use the technology as a tool for dragnet surveillance, threatening our freedoms of speech and association, due process rights, and right to be left alone. Democracy itself is at stake if this technology remains unregulated.
We know the burdens of facial recognition are not borne equally, as Black and brown communities — especially Muslim and immigrant communities — are already targets of discriminatory government surveillance. Making matters worse, face surveillance algorithms tend to have more difficulty accurately analyzing the faces of darker-skinned people, women, the elderly and children. Simply put: The technology is dangerous when it works — and when it doesn’t.
But not all approaches to regulating this technology are created equal. Maine is among the first in the nation to pass comprehensive statewide regulations. Washington was the first, passing a weak law in the face of strong opposition from civil rights, community and religious liberty organizations. The law passed in large part because of strong backing from Washington-based megacorporation Microsoft. Washington’s facial recognition law would still allow tech companies to sell their technology, worth millions of dollars, to every conceivable government agency.
In contrast, Maine’s law strikes a different path, putting the interests of ordinary Mainers above the profit motives of private companies.
Maine’s new law prohibits the use of facial recognition technology in most areas of government, including in public schools and for surveillance purposes. It creates carefully carved-out exceptions for law enforcement to use facial recognition, setting standards for its use and avoiding the potential for abuse we’ve seen in other parts of the country. Importantly, it prohibits the use of facial recognition technology to conduct surveillance of people as they go about their business in Maine, attending political meetings and protests, visiting friends and family, and seeking out healthcare.
In Maine, law enforcement must now — among other limitations — meet a probable cause standard before making a facial recognition request, and they cannot use a facial recognition match as the sole basis to arrest or search someone. Nor can local police departments buy, possess or use their own facial recognition software, ensuring shady technologies like Clearview AI will not be used by Maine’s government officials behind closed doors, as has happened in other states.
Maine’s law and others like it are crucial to preventing communities from being harmed by new, untested surveillance technologies like facial recognition. But we need a federal approach, not only a piecemeal local approach, to effectively protect Americans’ privacy from facial surveillance. That’s why it’s crucial for Americans to support the Facial Recognition and Biometric Technology Moratorium Act, a bill introduced by members of both houses of Congress last month.
The ACLU supports this federal legislation, which would protect all people in the United States from invasive surveillance. We urge all Americans to ask their members of Congress to join the movement to halt facial recognition technology and to support the bill, too.