Research papers come out far too rapidly for anyone to read them all, especially in the field of machine learning, which now affects (and produces papers in) practically every industry and company. This column aims to collect the most relevant recent discoveries and papers — particularly in but not limited to artificial intelligence — and explain why they matter.
This week, a startup that’s using drones for mapping forests, a look at how machine learning can map social media networks and predict Alzheimer’s, improving computer vision for space-based sensors and other news regarding recent technological advances.
Machine learning tools are being used to aid diagnosis in many ways, since they’re sensitive to patterns that humans find difficult to detect. IBM researchers have potentially found such patterns in speech that are predictive of the speaker developing Alzheimer’s disease.
The system needs only a couple of minutes of ordinary speech in a clinical setting. The team used a large set of data (the Framingham Heart Study) going back to 1948, allowing patterns of speech to be identified in people who would later develop Alzheimer’s. The accuracy rate is about 71%, or 0.74 area under the curve for those of you more statistically informed. That’s far from a sure thing, but current basic tests are barely better than a coin flip in predicting the disease this far ahead of time.
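To make the difference between those two figures concrete, here is a minimal sketch on made-up scores (not the IBM or Framingham data). Accuracy depends on a chosen cutoff, while area under the curve (AUC) measures how often a randomly chosen positive case scores higher than a randomly chosen negative one. The function names, the toy values and the 0.5 threshold are all illustrative assumptions.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of cases classified correctly at a fixed score cutoff."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data: 1 = later developed the condition, 0 = did not (invented values).
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.6, 0.4, 0.5, 0.3, 0.2, 0.7, 0.8]

print(f"accuracy at 0.5: {accuracy(labels, scores):.2f}")
print(f"AUC: {auc(labels, scores):.2f}")
```

The two numbers can diverge because moving the cutoff changes accuracy but leaves the ranking, and therefore the AUC, untouched.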
This is very important because the earlier Alzheimer’s can be detected, the better it can be managed. There’s no cure, but there are promising treatments and practices that can delay or mitigate the worst symptoms. A non-invasive, quick test of well people like this one could be a powerful new screening tool and is also, of course, an excellent demonstration of the usefulness of this field of tech.
(Don’t read the paper expecting to find exact symptoms or anything like that; the array of speech features isn’t really the kind of thing you can look out for in everyday life.)
Making sure your deep learning network generalizes to data outside its training environment is a key part of any serious ML research. But few attempt to set a model loose on data that’s completely foreign to it. Perhaps they should!
Researchers from Uppsala University in Sweden took a model used to identify groups and connections in social media, and applied it (not unmodified, of course) to tissue scans. The tissue had been treated so that the resultant images produced thousands of tiny dots representing mRNA.
Normally the different groups of cells, representing types and areas of tissue, would need to be manually identified and labeled. But the graph neural network, created to identify social groups based on similarities like common interests in a virtual space, proved it could perform a similar task on cells. (See the image at top.)
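For a rough sense of the idea, the sketch below shows only the neighborhood-averaging step that graph neural networks build on, with no learned weights. It is not the Uppsala model: the coordinates, the one-hot mRNA types, the choice of k and the function names are all invented for illustration.

```python
import math


def nearest_neighbors(points, k):
    """For each point, return the indices of its k closest other points."""
    neighbors = []
    for i, p in enumerate(points):
        others = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        others.sort()
        neighbors.append([j for _, j in others[:k]])
    return neighbors


def aggregate(features, neighbors):
    """One message-passing round: average each node's features with its neighbors'."""
    updated = []
    for i, own in enumerate(features):
        group = [own] + [features[j] for j in neighbors[i]]
        dims = len(own)
        updated.append([sum(vec[d] for vec in group) / len(group) for d in range(dims)])
    return updated


# Invented example: four dots in two spatial clusters, one-hot encoded by mRNA type.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
features = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]

graph = nearest_neighbors(points, k=1)
print(graph)
print(aggregate(features, graph))
```

After one round, dots that sit close together carry similar vectors, which is what lets a later step group them into regions without manual labeling.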
“We’re using the latest AI methods — specifically, graph neural networks, developed to analyze social networks — and adapting them to understand biological patterns and successive variation in tissue samples. The cells are comparable to social groupings that can be defined according to the activities they share in their social networks,” said Uppsala’s Carolina Wählby.
It’s an interesting illustration not just of the flexibility of neural networks, but of how structures and architectures repeat at all scales and in all contexts. As without, so within, if you will.
The vast forests of our national parks and timber farms have countless trees, but you can’t put “countless” on the paperwork. Someone has to make an actual estimate of how well various regions are growing, the density and types of trees, the range of disease or wildfire, and so on. This process is only partly automated, as aerial photography and scans only reveal so much, while on-the-ground observation is detailed but extremely slow and limited.
Treeswift aims to take a middle path by equipping drones with the sensors they need to both navigate and accurately measure the forest. By flying through much faster than a walking person, they can count trees, watch for problems and generally collect a ton of useful data. The company is still very early-stage, having spun out of the University of Pennsylvania and acquired an SBIR grant from the NSF.
“Companies are looking more and more to forest resources to combat climate change but you don’t have a supply of people who are growing to meet that need,” Steven Chen, co-founder and CEO of Treeswift and a doctoral student in Computer and Information Science (CIS) at Penn Engineering, said in a Penn news story. “I want to help make each forester do what they do with greater efficiency. These robots will not replace human jobs. Instead, they’re providing new tools to the people who have the insight and the passion to manage our forests.”
Another area where drones are making lots of interesting moves is underwater. Oceangoing autonomous submersibles are helping map the sea floor, track ice shelves and follow whales. But they all have a bit of an Achilles’ heel in that they need to periodically be picked up, charged and their data retrieved.
Purdue engineering professor Nina Mahmoudian has created a docking system by which submersibles can easily and automatically connect for power and data exchange.
A yellow marine robot (left, underwater) finds its way to a mobile docking station to recharge and upload data before continuing a task. (Purdue University photo/Jared Pike)
The craft needs a special nosecone, which can find and plug into a station that establishes a safe connection. The station can be an autonomous watercraft itself, or a permanent feature somewhere — what matters is that the smaller craft can make a pit stop to recharge and debrief before moving on. If it’s lost (a real danger at sea), its data won’t be lost with it.
Drones may soon become fixtures of city life as well, though we’re probably some ways from the automated private helicopters some seem to think are just around the corner. But living under a drone highway means constant noise — so people are always looking for ways to reduce turbulence and resultant sound from wings and propellers.
Researchers at the King Abdullah University of Science and Technology found a new, more efficient way to simulate the airflow in these situations; fluid dynamics is essentially as complex as you make it, so the trick is to apply your computing power to the right parts of the problem. They were able to render only the flow near the surface of the theoretical aircraft in high resolution, finding that past a certain distance there was little point in knowing exactly what was happening. Improvements to models of reality don’t always need to be better in every way; after all, the results are what matter.
Computer vision algorithms have come a long way, and as their efficiency improves they are beginning to be deployed at the edge rather than at data centers. In fact it’s become fairly common for camera-bearing objects like phones and IoT devices to do some local ML work on the image. But in space it’s another story.
Performing ML work in space was until fairly recently simply too expensive power-wise to even consider. That’s power that could be used to capture another image, transmit the data to the surface, etc. HyperScout 2 is exploring the possibility of ML work in space, and its satellite has begun applying computer vision techniques immediately to the images it collects before sending them down. (“Here’s a cloud — here’s Portugal — here’s a volcano…”)
For now there’s little practical benefit, but object detection can be combined with other functions easily to create new use cases, from saving power when no objects of interest are present, to passing metadata to other tools that may work better if informed.
Machine learning models are great at making educated guesses, and in disciplines where there’s a large backlog of unsorted or poorly documented data, it can be very useful to let an AI make a first pass so that graduate students can use their time more productively. The Library of Congress is doing it with old newspapers, and now Carnegie Mellon University’s libraries are getting into the spirit.
CMU’s million-item photo archive is in the process of being digitized, but to make it useful to historians and curious browsers it needs to be organized and tagged — so computer vision algorithms are being put to work grouping similar images, identifying objects and locations, and doing other valuable basic cataloguing tasks.
“Even a partly successful project would greatly improve the collection metadata, and could provide a possible solution for metadata generation if the archives were ever funded to digitize the entire collection,” said CMU’s Matt Lincoln.
A very different project, yet one that seems somehow connected, is this work by a student at the Escola Politécnica da Universidade de Pernambuco in Brazil, who had the bright idea to try sprucing up some old maps with machine learning.
The tool they used takes old line-drawing maps and attempts to create a sort of satellite image based on them using a Generative Adversarial Network; GANs essentially attempt to trick themselves into creating content they can’t tell apart from the real thing.
Well, the results aren’t what you might call completely convincing, but it’s still promising. Such maps are rarely accurate but that doesn’t mean they’re completely abstract — recreating them in the context of modern mapping techniques is a fun idea that might help these locations seem less distant.
In an overcrowded market of online fashion brands, consumers are spoilt for choice on what site to visit. They are generally forced to visit each brand one by one, manually filtering down to what they like. Most of the experience is not that great, and past purchase history and cookies aren’t much to go on to tailor user experience. If someone has bought an army-green military jacket, the e-commerce site is on a hiding to nothing if all it suggests is more army-green military jackets…
Instead, Psykhe (its brand name is ‘PSYKHE’) is an e-commerce startup that uses AI and psychology to make product recommendations based both on the user’s personality profile and the ‘personality’ of the products. Admittedly, a number of startups have come and gone claiming this, but it claims to have taken a unique approach to make the process of buying fashion easier by acting as an aggregator that pulls products from all leading fashion retailers. Each user sees a different storefront that, says the company, becomes increasingly personalized.
It has now raised $1.7 million in seed funding from a range of investors and is announcing new plans to scale its technology to other consumer verticals in the future in the B2B space.
The investors are Carmen Busquets – the largest founding investor in Net-a-Porter; SLS Journey – the new investment arm of the MadaLuxe Group, the North American distributor of luxury fashion; John Skipper – DAZN Chairman and former Co-chairman of Disney Media Networks and President of ESPN; and Lara Vanjak – Chief Operating Officer at Aser Ventures, formerly at MP & Silva and FC Inter-Milan.
So what does it do? As a B2C aggregator, it pools inventory from leading retailers. The platform then applies machine learning and personality-trait science, and tailors product recommendations to users based on a personality test taken on sign-up. The company says it has international patents pending and has secured affiliate partnerships with leading retailers that include Moda Operandi, MyTheresa, LVMH’s platform 24S, and 11 Honoré.
The business model is based around affiliate partnerships, where it makes between 5% and 25% of each sale. It also plans to expand into B2B for other consumer verticals in the future, providing a plug-in product that allows users to sort items by their personality.
How does this personality test help? Well, Psykhe has assigned an overall psychological profile to the actual products themselves: over 1 million products from commerce partners, using machine learning (based on training data).
So for example, if a leather boot had metal studs on it (thus looking more ‘rebellious’), it would get a moderate-low rating on the trait of ‘Agreeableness’. A pink floral dress would get a higher score on that trait. A conservative tweed blazer would get a lower score tag on the trait of ‘Openness’, as tweed blazers tend to indicate a more conservative style and thus nature.
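One way such trait scores could feed a recommendation is to rank products by how close their trait profile sits to the user’s. The company has not described its matching method, so the sketch below is purely an assumption: the 0 to 1 scale, the scores, the two-trait profile and the use of Euclidean distance are all hypothetical.

```python
import math

TRAITS = ("openness", "agreeableness")

# Hypothetical 0-1 trait scores; the real system's scale is not described.
products = {
    "studded leather boot": {"openness": 0.6, "agreeableness": 0.3},
    "pink floral dress": {"openness": 0.5, "agreeableness": 0.8},
    "tweed blazer": {"openness": 0.2, "agreeableness": 0.6},
}


def distance(profile_a, profile_b):
    """Euclidean distance between two trait profiles."""
    return math.sqrt(sum((profile_a[t] - profile_b[t]) ** 2 for t in TRAITS))


def recommend(user_profile, catalog):
    """Return product names ordered from closest to furthest trait match."""
    return sorted(catalog, key=lambda name: distance(user_profile, catalog[name]))


# Invented user profile, as might come from a sign-up personality test.
user = {"openness": 0.5, "agreeableness": 0.9}
print(recommend(user, products))
```

A user who scores high on agreeableness would see the floral dress ranked ahead of the studded boot under these invented numbers.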
Its competitors include The Yes and Lyst. However, Psykhe’s main point of differentiation is this personality scoring. Furthermore, The Yes is app-only, US-only, and only partners with monobrands, while Lyst is an aggregator with 1,000s of brands, but used as more of a search platform.
Psykhe is in a good position to take advantage of the ongoing effects of COVID-19, which continue to give a major boost to global ecommerce as people flood online amid lockdowns.
The startup is the brainchild of Anabel Maldonado, CEO & founder (along with founding team CTO Will Palmer and Lead Data Scientist Rene-Jean Corneille), who studied psychology in her hometown of Toronto but ended up working in the UK’s NHS in a specialist team that made developmental diagnoses for children under 5.
She made a pivot into fashion after winning a competition for an editorial mentorship at British Marie Claire. She later went to the press department of Christian Louboutin, followed by internships at the Mail on Sunday and Marie Claire, then spending several years in magazine publishing before moving into e-commerce at CoutureLab. Going freelance, she worked with a number of luxury brands and platforms as an editorial consultant. As a fashion journalist, she’s contributed industry op-eds to publications such as The Business of Fashion, T The New York Times Style, and Marie Claire.
As part of the fashion industry for 10 years, she says she became frustrated with the narratives which “made fashion seem more frivolous than it really is. I thought, this is a trillion-dollar industry, we all have such emotional, visceral reactions to an aesthetic based on who we are, but all we keep talking about is the ‘hot new color for fall’ and so-called blanket ‘must-haves.’”
But, she says, “there was no inquiry into individual differences. This world was really missing the level of depth it deserved, and I sought to demonstrate that we’re all sensitive to aesthetic in one way or another and that our clothing choices have a great psychological pay-off effect on us, based on our unique internal needs.” So she set about creating a startup to address this ‘fashion psychology’ or, as she says, “why we wear what we wear.”
Project management service Wrike today announced a major update to its platform at its user conference that includes a lot of new AI smarts for keeping individual projects on track and on time, as well as new solutions for marketers and project management offices in large corporations. In addition, the company also launched a new budgeting feature and tweaks to the overall user experience.
The highlight, though, is without doubt the launch of the new AI and machine learning capabilities in Wrike. With more than 20,000 customers and over 2 million users on the platform, Wrike has collected a trove of data about projects that it can use to power these machine learning models.
The way Wrike is now using AI falls into three categories: project risk prediction, task prioritization and tools for speeding up the overall project management workflow.
Figuring out the status of a project and knowing where delays could impact the overall project is often half the job. Wrike can now predict potential delays and alert project and team leaders when it sees events that signal potential issues. To do this, it uses basic information like start and end dates, but more importantly, it looks at the prior outcomes of similar projects to assess risks. Those predictions can then be fed into Wrike’s automation engine to trigger actions that could mitigate the risk to the project.
Task prioritization does what you would expect and helps you figure out what you should focus on right now to help a project move forward. No surprises there.
What is maybe more surprising is that the team is also launching voice commands (through Siri on iOS) and Gmail-like smart replies (in English for iOS and Android). Those aren’t exactly core features of a project management tool, but as the company notes, these features help remove the overall friction and reduce latencies. Another new feature that falls into this category is support for optical character recognition to allow you to scan printed and handwritten notes from your phone and attach them to tasks (iOS only).
“With more employees working from home, work and personal life are becoming intertwined,” the company argues. “As workers use AI in their personal lives, team managers and everyday users expect the smarts they’re accustomed to in consumer devices and apps to help them manage their work as well. Wrike Work Intelligence is the most comprehensive machine learning foundation that taps into tens of millions of work-related user engagements to power cross-functional collaboration to help organizations achieve operational efficiency, create new opportunities and accelerate digital transformation. Teams can focus on the work that matters most, predict and minimize delays, and cut communication latencies.”
The other major new feature — at least if you’re in digital marketing — is Wrike’s new ability to pull in data about your campaigns from about 50 advertising, marketing automation and social media tools, which is then displayed inside the Wrike experience. In a fast-moving field, having all that data at your fingertips and right inside the tool where you think about how to manage these projects seems like a smart idea.
Somewhat related, Wrike’s new budgeting feature also now makes it easier for teams to keep their projects within budget, using a new built-in rate card to manage project pricing and update their financials.
“We use Wrike for an extensive project management and performance metrics system,” said Shannon Buerk, the CEO of engage2learn, which tested this new budgeting tool. “We have tried other PM systems and have found Wrike to be the best of all worlds: easy to use for everyone and savvy enough to provide valuable reporting to inform our work. Converting all inefficiencies into productive time that moves your mission forward is one of the keys to a culture of engagement and ownership within an organization, even remotely. Wrike has helped us get there.”
As companies manufacture goods, human inspectors review them for defects. Think of a scratch on smartphone glass or a weakness in raw steel that could have an impact downstream when it gets turned into something else. Landing AI, the company started by former Google and Baidu AI guru Andrew Ng, wants to use AI technology to identify these defects, and today the company launched a new visual inspection platform called LandingLens.
“We’re announcing LandingLens, which is an end-to-end visual inspection platform to help manufacturers build and deploy visual inspection systems [using AI],” Ng told TechCrunch.
He says that the company’s goal is to bring AI to manufacturing companies, but he couldn’t simply repackage what he had learned at Google and Baidu, partly because it involved a different set of consumer use cases, and partly because there is just much less data to work with in a manufacturing setting.
Adding to the degree of difficulty here, each setting is unique, and there is no standard playbook you can necessarily apply across each vertical. This meant Landing AI had to come up with a general tool kit that each company could use for the unique requirements of their manufacturing process.
Ng says to put this advanced technology into the hands of these customers and apply AI to visual inspection, his company has created a visual interface where companies can work through a defined process to train models to understand each customer’s inspection needs.
The way it works is you take pictures of what a good finished product looks like, and what a defective product could look like. It’s not as easy as it might sound because human experts can disagree over what constitutes a defect.
The manufacturer creates what’s called a defect book where the inspector experts work together to determine what that defect looks like via a picture, and resolve disagreements when they happen. All this is done through the LandingLens interface.
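The agreement step can be pictured with a small sketch that separates images every inspector labeled the same way from those that need discussion. This is not LandingLens code; the file names, label values and function name are invented for illustration.

```python
from collections import Counter


def review_labels(labels_by_image):
    """Split images into agreed labels and those needing discussion."""
    agreed, disputed = {}, {}
    for image, votes in labels_by_image.items():
        counts = Counter(votes)
        if len(counts) == 1:
            # Every inspector chose the same label.
            agreed[image] = votes[0]
        else:
            # Keep the vote tally so experts can see how they split.
            disputed[image] = dict(counts)
    return agreed, disputed


# Invented votes from three inspectors per image.
votes = {
    "part_001.png": ["ok", "ok", "ok"],
    "part_002.png": ["scratch", "ok", "scratch"],
    "part_003.png": ["scratch", "scratch", "scratch"],
}

agreed, disputed = review_labels(votes)
print("agreed:", agreed)
print("needs review:", disputed)
```

Only the agreed set would be safe to train on directly; the disputed images are the ones a defect book is meant to settle.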
Once inspectors have agreed upon a set of labels, they can begin iterating on a model in the Model Iteration Module where the company can train and run models to get to a state of agreed upon success where the AI is picking up the defects on a regular basis. As customers run these experiments, the software generates a report on the state of the model, and customers can refine the models as needed based on the information in the report.
Ng says that his company is trying to bring in sophisticated software to help solve a big problem for manufacturing customers. “The bottleneck [for them] is building the deep learning algorithm, really the machine learning software. They can take the picture and render judgment as to whether this part is okay, or whether it is defective, and that’s what our platform helps with,” he said.
He thinks this technology could ultimately help recast how goods are manufactured in the future. “I think deep learning is poised to transform how inspection is done, which is really the key step. Inspection is really the last line of defense against quality defects in manufacturing. So I’m excited to release this platform to help manufacturers do inspections more accurately,” he said.
Every year at its MAX user conference, Adobe shows off a number of research projects that may or may not end up in its Creative Cloud apps over time. One new project that I hope we’ll soon see in its video apps is Project Sharp Shots, which will make its debut later today during the MAX Sneaks event. Powered by Adobe’s Sensei AI platform, Sharp Shots is a research project that uses AI to deblur videos.
Shubhi Gupta, the Adobe engineer behind the project, told me the idea here is to deblur a video — no matter whether it was blurred because of a shaky camera or fast movement — with a single click. In the demos she showed me, the effect was sometimes relatively subtle, as in a video of her playing ukulele, or quite dramatic, as in the example of a fast-moving motorcycle below.
“With Project Sharp Shots, there’s no parameter tuning and adjustment like we used to do in our traditional methods,” she told me. “This one is just a one-click thing. It’s not magic. This is simple deep learning and AI working in the background, extracting each frame, deblurring it and producing high-quality deblurred photos and videos.”
Gupta tells me the team looked at existing research on deblurring images and then optimized that process for moving images, and then optimized that for lower memory usage and speed.
It’s worth noting that After Effects already offers some of these capabilities for deblurring and removing camera shake, but that’s a very different algorithm with its own set of limitations.
This new system works best when the algorithm has access to multiple related frames before and after, but it can do its job with just a handful of frames in a video.
The pandemic has put stress on companies dealing with a workforce that is mostly — and sometimes suddenly — working from home. That has led to rising needs for security and governance tooling, something that Egnyte is looking to meet with new features aimed at helping companies cope with file management during the pandemic.
Egnyte is an enterprise file storage and sharing (EFSS) company, though it has added security services and other tools over the years.
“It’s no surprise that there’s been a rapid shift to remote work, which has I believe led to mass adoption of multiple applications running on multiple clouds, and tied to that has been a nonlinear reaction of exponential growth in data security and governance concerns,” Vineet Jain, co-founder and CEO at Egnyte, explained.
There’s a lot of data at stake.
Egnyte’s announcements today are in part a reaction to the changes that COVID has brought, a mix of net-new features and capabilities that were on its road map, but accelerated to meet the needs of the changing technology landscape.
The company is introducing a new feature called Smart Cache to make sure that content (wherever it lives) that an individual user accesses most will be ready whenever they need it.
“Smart Cache uses machine learning to predict the content most likely to be accessed at any given site, so administrators don’t have to anticipate usage patterns. The elegance of the solution lies in that it is invisible to the end users,” Jain said. The end result of this capability could be lower storage and bandwidth costs, because the system can make this content available in an automated way only when it’s needed.
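Egnyte has not published how Smart Cache makes its predictions, so the sketch below swaps in a plain frequency count as a stand-in for the machine learning model: it picks the files most often opened at a given site. The log format, site names, paths and function name are hypothetical.

```python
from collections import Counter


def files_to_cache(access_log, site, capacity):
    """Pick the most frequently accessed files at one site, up to capacity."""
    counts = Counter(path for s, path in access_log if s == site)
    return [path for path, _ in counts.most_common(capacity)]


# Invented access log of (site, file path) pairs.
log = [
    ("london", "/plans/q4.pptx"),
    ("london", "/plans/q4.pptx"),
    ("london", "/cad/bridge.dwg"),
    ("austin", "/cad/bridge.dwg"),
    ("london", "/hr/handbook.pdf"),
    ("london", "/cad/bridge.dwg"),
    ("london", "/plans/q4.pptx"),
]

print(files_to_cache(log, "london", capacity=2))
```

A real predictor would presumably weigh recency, file size and user roles as well, but the shape of the decision (rank content per site, cache the top of the list) is the same.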
Another new feature is email scanning and governance. As Jain points out, email is often a company’s largest data store, but it’s also a conduit for phishing attacks and malware. So Egnyte is introducing an email governance tool that keeps an eye on this content, scanning it for known malware and ransomware and blocking files from being put into distribution when it identifies something that could be harmful.
As companies move more files around it’s important that security and governance policies travel with the document, so that policies can be enforced on the file wherever it goes. This was true before COVID-19, but has only become more true as more folks work from home.
Finally, Egnyte is using machine learning for auto-classification of documents to apply policies to documents without humans having to touch them. By identifying the document type automatically, whether it has personally identifying information or it’s a budget or planning document, Egnyte can help customers auto-classify and apply policies about viewing and sharing to protect sensitive materials.
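The classify-then-apply-policy flow can be sketched with simple rules in place of a trained model. This is a rule-based stand-in, not Egnyte’s machine learning: the patterns, keywords, class names and policies below are all hypothetical.

```python
import re

# Hypothetical patterns and policies, for illustration only.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
EMAIL_PATTERN = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
BUDGET_WORDS = ("budget", "forecast", "headcount plan")

POLICIES = {
    "pii": "restrict sharing to named users",
    "financial": "internal only",
    "general": "default sharing rules",
}


def classify(text):
    """Assign a coarse document class based on simple content rules."""
    if SSN_PATTERN.search(text) or EMAIL_PATTERN.search(text):
        return "pii"
    lowered = text.lower()
    if any(word in lowered for word in BUDGET_WORDS):
        return "financial"
    return "general"


def policy_for(text):
    """Look up the sharing policy that follows from the document class."""
    return POLICIES[classify(text)]


print(policy_for("Employee SSN: 123-45-6789"))
print(policy_for("Q3 budget draft for review"))
```

The point of automating the first step is that the policy lookup then needs no human in the loop; a learned classifier would replace `classify` while the policy table stays the same.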
Egnyte is reacting to the market needs as it makes changes to the platform. While the pandemic has pushed this along, these are features that companies with documents spread out across various locations can benefit from regardless of the times.
The company is over $100 million ARR today, and grew 22% in the first half of 2020. Whether the company can accelerate that growth rate in H2 2020 is not yet clear. Regardless, Egnyte is a budding IPO candidate for 2021 if market conditions hold.
One big technology by-product of the Covid-19 pandemic has been a much stronger focus on online education solutions — providing the tools for students to continue learning when the public health situation is preventing them from going into physical classrooms. As it happens, that paradigm also applies to the business world.
Today, a startup out of Dublin called LearnUpon, which has been building e-learning solutions not for schools but corporates to use for development and training, has raised $56 million to feed a growth in demand for its tools, particularly in the U.S. market, which currently accounts for 70% of LearnUpon’s sales.
The funding is coming from a single investor, Summit Partners. LearnUpon’s CEO and co-founder Brendan Noud said the capital will be used in two areas. First, to add more people to the startup’s engineering and product teams (it has 180 employees currently) to continue expanding in areas like data analytics, providing more insights to its customers on how their training materials are used via its learning management system (commonly referred to as LMS in the industry). Second, to bring on more people to help sell the product to larger corporate clients, particularly in countries where it is currently growing fast, like the U.S.
LearnUpon already has some 1,000 customers globally, including Booking.com, Twilio, USA Football and Zendesk. And notably, eight-year-old LearnUpon was profitable and had only raised $1.5 million before now.
“We’ve been growing organically pretty fast since we started but especially for the last 4-5 years using a SaaS model, but now we’re at a scale where the opportunity is vast, especially with more people working from home,” he said. “We want to give ourselves firepower.”
Corporate learning has followed a similar but not identical trajectory to that of online education for K-12 and higher learning. In common, especially in the last eight months, has been a growing need to engage and connect with learners at a time when it’s been challenging, or in some cases impossible, to see each other in person.
What’s different is that corporate learning was already a very established market, with organizations widely investing in online tools to manage training and personal development for years before any pandemic necessitated it.
Areas like employee onboarding, personnel development, customer training, training on new products, partner training, sales development, compliance, and building training services that you then sell to third parties are all areas that count as corporate learning. One researcher estimated that the corporate learning market was valued at an eye-watering $64 billion in 2019, with LMS investments alone at over $9 billion that year, and both are growing.
That has been a boost for companies like LearnUpon, which provides services in all of those categories and says that annual recurring revenues have grown by more than 50% year-on-year for each of the last 12 quarters.
But that also underscores the challenge in the market.
“It’s definitely a very crowded space, with maybe over 1000 LMS’s out there,” said Noud, although he added that it only has about 10-15 actual direct competitors (which to me still sounds like quite a lot). They include the likes of Cornerstone, TalentLMS from the Greek startup Epignosis, the Canadian publicly traded Docebo, and 360Learning from France.
But also consider those that have moved into corporate learning from other directions. LinkedIn has made big moves into learning to complement its bigger recruitment and professional development profile; and companies originally built to target the education sector, such as Coursera and Kahoot, have also expanded into business training and education. Both represent further competitive fronts for companies like LearnUpon natively built to service the business market.
Noud said that one reason LearnUpon is finding some traction against the rest of the pack, and why it’s better, is that it’s a more comprehensive platform. Users can run live or asynchronous (on-demand) learning or training, and the SaaS LMS is designed to handle material and learning environments for multiple “students,” be they internal users, partners of the organization, or customers. In contrast, he said that many other solutions are narrower in scope, requiring organizations to manage multiple systems.
“And the legacy platforms are overly bloated, with bad customer support, which was a key area for us,” he said, recalling back to eight years ago when he and co-founder Des Anderson were first starting LearnUpon. “Our first hire was in customer support, and that has carried through to how we have grown.”
One area where LearnUpon is not doing anything right now is content development. It does offer tools to construct tests and surveys, but users can also import content created with other e-learning authoring tools, Noud said. Similarly, it’s not in the business of building its own live teaching platforms: you can import links from others like Zoom to provide the platform where people will teach and engage.
That’s not going to be a focus for now for the company, but given that others it competes with are providing a one-stop shop, for those that are looking to simplify procurement and have a more direct hand in building training as well as managing it, you can see how this might be an area that LearnUpon might develop down the line.
“In today’s knowledge economy, we believe corporate learning has become a key requirement for all organizations of scale – and the added challenge of remote working has only accelerated the importance of delivering learning digitally,” said Antony Clavel, a Principal with Summit Partners, in a statement. “With its modern, cloud-based learning management system, strong product development organization, demonstrated dedication to customer success and capital efficient go-to-market model, we believe LearnUpon is strongly positioned to serve this growing and increasingly critical market need. We are thrilled to support Brendan and the LearnUpon team in this next phase of growth.”
Clavel is joining the LearnUpon Board of Directors with this round. The startup is not disclosing its valuation.
Brighteye Ventures, the European edtech venture capital firm, recently announced the $54 million first close of its second fund, bringing total assets under management above $112 million. Out of the new fund, the 2017-founded VC will invest in 15-20 companies over the next three years at the seed and Series A stage, writing checks up to $5 million.
Described as a thesis-driven fund investing in startups that “enhance learning” within the context of automation and other new technologies, coupled with changes in the way we live, Brighteye plans to disrupt the $7 trillion global education sector “as educators and students are adapting to distance learning en masse and millions of displaced workers are seeking to upskill,” according to a press release.
The firm’s investments to date include Ornikar, an online driving school in France and Spain serving more than 1.6 million students; Tandem, a Berlin-based peer-to-peer language learning platform with over 10 million members; and Epic!, a reading platform said to be used in more than 90% of U.S. schools.
To dig deeper into Brighteye’s thesis and the edtech sector more broadly, I caught up with managing partner Alex Latsis. We also discussed some of the findings in the firm’s recent European edtech funding report and how more venture capital than ever is set to flow into educational technology.
TechCrunch: Brighteye Ventures backs seed and Series A startups across Europe and North America that “enhance learning.” Can you elaborate a bit more on the fund’s remit, such as subsectors or specific technologies and what you look for in founders and startups at such an early stage?
Alex Latsis: We invest in startups that use technology to directly enable learning, skills acquisition or research as well as companies whose products address structural needs in the education sector. For example, Zen Educate addresses the systemic issue of teacher supply shortages in the U.K. via an on-demand platform that saves schools money whilst allowing educators to earn more. Litigate is an AI-driven coach and workflow tool improving results for legal associates, while Ironhack, the largest tech bootcamp in Europe and Latin America, gives young professionals the skills needed to enter the innovation economy and connects them to employers with a 90% job placement rate.
As education is a complex field we always seek to establish a degree of founder market fit, but more importantly that the founding teams themselves are a good fit internally. No startup succeeds on the merits of a founder alone, even if they may be driving the momentum.
In “The European EdTech Funding Report 2020,” you note that Europe is gaining momentum with a healthy increase in VC investments in local edtech startups. Specifically, you say that edtech VC investment has experienced 9.2x growth between 2014 and 2019 in terms of money invested. What is driving this and how does Europe compare to other major tech regions for edtech, such as Silicon Valley/U.S. or China?
Both Europe and the U.S. saw about 2% of venture capital invested in edtech in 2019. Growth in edtech investment in these markets to date has been driven largely by increased willingness to pay for training that is unavailable, unengaging or too expensive in legacy institutions and to a lesser extent by increased digital penetration in schools and universities that has enabled SaaS products to scale.
Given the rapid evolution of online education in the face of the pandemic, we expect funding for edtech will trend closer to 3%-5% of venture funding in the coming years on both sides of the Atlantic. This will mean billions in incremental investment, hundreds of new promising companies and incredible learning opportunities, particularly for those looking to upskill/reskill. In countries like India and China where school and university student populations are growing more rapidly, we expect 5%+ of VC funding to go into edtech as there is more growth in core demand.
Oxford scientists working out of the school’s Department of Physics have developed a new type of COVID-19 test that can detect SARS-CoV-2 with a high degree of accuracy, directly in samples taken from patients, using a machine learning-based approach that could help sidestep test supply limitations, and that also offers advantages when it comes to detecting actual virus particles, instead of antibodies or other signs of the presence of the virus which don’t necessarily correlate to an active, transmissible case.
The test created by the Oxford researchers also offers significant advantages in terms of speed, providing results in under five minutes without any sample preparation required. That means it could be among the technologies that unlock mass testing, a crucial need not only for getting a handle on the current COVID-19 pandemic but also for helping us deal with potential future global viral outbreaks. Oxford’s method is well suited to that as well, since it can potentially be configured relatively easily to detect a number of viral threats.
The technology works by labelling any virus particles found in a sample collected from a patient with short, fluorescent DNA strands that act as markers. A microscope images the sample and the labelled viruses present, and machine learning software developed by the team then automatically identifies the virus from differences in the fluorescent light each one emits, which arise from differences in physical surface makeup, size and individual chemical composition.
This technology, including the sample collection equipment, the microscopic imager and the fluorescence insertion tools, as well as the compute capabilities, can be miniaturized to the point where it can be used just about anywhere, according to the researchers, including “businesses, music venues, airports,” and more. The focus now is to create a spinout company for the purposes of commercializing the device in a format that integrates all the components together.
The researchers anticipate being able to form the company and start product development by early next year, with the potential of having a device approved for use and ready for distribution around six months after that. It’s a tight timeline for development of a new diagnostic device, but timelines have already changed considerably in the face of this pandemic, and will continue to do so as we’re unlikely to see it fade away anytime in the near future.
Savana, a machine learning-based service that turns clinical notes into structured patient information for physicians and pharmacists, has raised $15 million to take its technology from Spain to the U.S., the company said.
The investment was led by Cathay Innovation with participation from the Spanish investment firm Seaya Ventures, which led the company’s previous round, and new investors like MACSF, a French insurance provider for doctors.
The company has already processed 400 million electronic medical records in English, Spanish, German, and French.
Founded in Madrid in 2014, the company is relocating to New York and is already working with the world’s largest pharmaceutical companies and over 100 healthcare facilities.
“Our mission is to predict the occurrence of disease at the patient level. This focuses our resources on discovering new ways of providing medical knowledge almost in real time — which is more urgent than ever in the context of the pandemic,” said Savana chief executive Jorge Tello. “Healthcare challenges are increasingly global, and we know that the application of AI across health data at scale is essential to accelerate health science.”
Company co-founder and chief medical officer, Dr. Ignacio Hernandez Medrano, also emphasized that while the company is collecting hundreds of millions of electronic records, it’s doing its best to keep that information private.
“One of our main value propositions is that the information remains controlled by the hospital, with privacy guaranteed by the de-identification of patient data before we process it,” he said.
If you ever doubted the hunger brands have for more and better information about consumers, you only need to look at Twilio buying customer data startup Segment this week for $3.2 billion. Google sees this the same as everyone else, and today it introduced updates to Google Analytics to help companies understand their customers better (especially in conjunction with related Google tools).
Vidhya Srinivasan, vice president of measurement, analytics and buying platforms at Google, wrote in a company blog post introducing the new features that the company sees this changing customer-brand dynamic due to COVID, and it wants to assist by adding new features that help marketers achieve their goals, whatever those may be.
One way to achieve this is by infusing Analytics with machine learning to help highlight data automatically that’s important to marketers using the platform. “[Google Analytics] has machine learning at its core to automatically surface helpful insights and gives you a complete understanding of your customers across devices and platforms,” Srinivasan wrote in the blog post.
The idea behind the update is to give marketers access to more of the information they care about most by using that machine learning to surface data like which groups of customers are most likely to buy and which are most likely to churn, the very types of information marketing (and sales) teams need to try to make proactive moves to keep customers from leaving or, conversely, to turn those ready to buy into sales.
If it works as described, it can give marketers a way to measure their performance with each customer or group of customers across their entire lifecycle, which is especially important during COVID when customer needs are constantly changing.
Of course, this being a Google product it’s designed to play nicely with Google Ads, YouTube and other tools like Gmail and Google Search, along with non-Google channels. As Srinivasan wrote:
The new approach also makes it possible to address longtime advertiser requests. Because the new Analytics can measure app and web interactions together, it can include conversions from YouTube engaged views that occur in-app and on the web in reports. Seeing conversions from YouTube video views alongside conversions from Google and non-Google paid channels, and organic channels like Google Search, social, and email, helps you understand the combined impact of all your marketing efforts.
All of this is designed to help marketers, caught in trying times with a shifting regulatory landscape, to better understand customer needs and deliver them what they want when they want it — when they’re just trying to keep the customers satisfied.
Atlassian has been offering collaboration tools, often favored by developers and IT, for some time, with such stalwarts as Jira for help desk tickets, Confluence to organize your work and BitBucket to organize your development deliverables. What it lacked was a machine learning layer across the platform to help users work smarter within and across the applications in the Atlassian family.
That changed today, when Atlassian announced it has been building that machine learning layer, called Atlassian Smarts, and is releasing several tools that take advantage of it. It’s worth noting that unlike Salesforce, which calls its intelligence layer Einstein, or Adobe, which calls its layer Sensei, Atlassian chose to forgo the cutesy marketing terms and just let the technology stand on its own.
Shihab Hamid, the founder of the Smarts and Machine Learning Team at Atlassian, who has been with the company 14 years, says that they avoided a marketing name by design. “I think one of the things that we’re trying to focus on is actually the user experience and so rather than packaging or branding the technology, we’re really about optimizing teamwork,” Hamid told TechCrunch.
Hamid says that the goal of the machine learning layer is to remove the complexity involved with organizing people and information across the platform.
“Simple tasks like finding the right person or the right document becomes a challenge, or at least they slow down productivity and take time away from the creative high-value work that everyone wants to be doing, and teamwork itself is super messy and collaboration is complicated. These are human challenges that don’t really have one right solution,” he said.
He says that Atlassian has decided to solve these problems using machine learning with the goal of speeding up repetitive, time-intensive tasks. Much like Adobe or Salesforce, Atlassian has built this underlying layer of machine smarts, for lack of a better term, that can be distributed across their platform to deliver this kind of machine learning-based functionality wherever it makes sense for the particular product or service.
“We’ve invested in building this functionality directly into the Atlassian platform to bring together IT and development teams to unify work, so the Atlassian flagship products like JIRA and Confluence sit on top of this common platform and benefit from that common functionality across products. And so the idea is if we can build that common predictive capability at the platform layer we can actually proliferate smarts and benefit from the data that we gather across our products,” Hamid said.
The first pieces fit into this vision. For starters, Atlassian is offering a smart search tool that helps users find content across Atlassian tools faster by understanding who you are and how you work. “So by knowing where users work and what they work on, we’re able to proactively provide access to the right documents and accelerate work,” he said.
The second piece is more about collaboration and building teams with the best personnel for a given task. A new tool called predictive user mentions helps Jira and Confluence users find the right people for the job.
“What we’ve done with the Atlassian platform is actually baked in that intelligence, because we know what you work on and who you collaborate with, so we can predict who should be involved and brought into the conversation,” Hamid explained.
Finally, the company announced a tool specifically for Jira users that bundles together similar sets of help requests, which should lead to faster resolution than handling them manually one at a time.
“We’re soon launching a feature in JIRA Service Desk that allows users to cluster similar tickets together, and operate on them to accelerate IT workflows, and this is done in the background using ML techniques to calculate the similarity of tickets, based on the summary and description, and so on.”
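To make the idea of grouping tickets by the similarity of their summary and description concrete, here is a minimal sketch in Python. It is not Atlassian's method, which the company has not detailed; it assumes a simple bag-of-words representation, cosine similarity, and a greedy grouping rule, and the function names, the sample tickets and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn text into a bag-of-words count vector (lowercased word tokens)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_tickets(tickets, threshold=0.5):
    """Greedy single-pass grouping (an assumed rule, chosen for brevity).

    Each ticket joins the first cluster whose first ticket is similar
    enough; otherwise it starts a new cluster.
    """
    clusters = []  # list of (representative_vector, [ticket indices])
    for index, ticket in enumerate(tickets):
        vector = vectorize(ticket["summary"] + " " + ticket["description"])
        for representative, members in clusters:
            if cosine(vector, representative) >= threshold:
                members.append(index)
                break
        else:
            clusters.append((vector, [index]))
    return [members for _, members in clusters]


# Hypothetical help desk tickets, invented for this example.
tickets = [
    {"summary": "VPN not connecting",
     "description": "VPN client fails to connect from home"},
    {"summary": "Cannot connect to VPN",
     "description": "VPN client fails to connect after update"},
    {"summary": "Printer out of toner",
     "description": "Office printer on floor 2 needs toner"},
]
groups = cluster_tickets(tickets)
print(groups)
```

With these sample tickets, the two VPN requests share enough words to land in one group while the printer request stands alone. A production system would likely use richer text representations than raw word counts, but the shape of the problem (score similarity, then group) is the same.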
All of this was made possible by the company’s previous shift from mostly on-premises to the cloud and the flexibility that gave them to build new tooling that crosses the entire platform.
Today’s announcements are just the start of what Atlassian hopes will be a slew of new machine learning-fueled features being added to the platform in the coming months and years.
Google is putting A.I. and machine learning technologies into the hands of journalists. The company this morning announced a suite of new tools, Journalist Studio, that will allow reporters to do their work more easily. At launch, the suite includes a host of existing tools as well as two new products aimed at helping reporters search across large documents and visualize data.
The first tool is called Pinpoint and is designed to help reporters work with large file sets — like those that contain hundreds of thousands of documents.
Pinpoint will work as an alternative to using the “Ctrl + F” function to manually seek out specific keywords in the documents. Instead, the tool takes advantage of Google Search and its A.I.-powered Knowledge Graph, along with optical character recognition and speech-to-text technologies.
It’s capable of sorting through scanned PDFs, images, handwritten notes, and audio files to automatically identify the key people, organizations, and locations that are mentioned. Pinpoint will highlight these terms and even their synonyms across the files for easy access to the key data.
The tool has already been put to use by journalists at USA Today, for its report on 40,600 COVID-19-related deaths tied to nursing homes. Reveal also used Pinpoint to look into the COVID-19 “testing disaster” in ICE detention centers. And The Washington Post used it for a piece about the opioid crisis.
Because it’s also useful for speeding up research, Google notes Pinpoint can be used for shorter-term projects, as well — like Philippines-based Rappler’s examination of CIA reports from the 1970s or Mexico-based Verificado MX’s fast fact checking of the government’s daily pandemic updates.
Pinpoint is available now to interested journalists, who can sign up to request access. The tool currently supports seven languages: English, French, German, Italian, Polish, Portuguese, and Spanish.
Google has also partnered with The Center for Public Integrity, Document Cloud, Stanford University’s Big Local News program and The Washington Post to create shared public collections that are available to all users.
The second new tool being introduced today is The Common Knowledge Project, still in beta.
The tool allows journalists to explore, visualize and share data about important issues in their local communities by creating their own interactive charts using thousands of data points in a matter of minutes, the company says.
These charts can then be embedded in reporters’ stories on the web or published to social media.
This particular tool was built by the visual journalism team at Polygraph, supported by the Google News Initiative. The data for use in The Common Knowledge Project comes from Data Commons, which includes thousands of public datasets from organizations like the U.S. Census and the CDC.
At launch, the tool offers U.S. data on issues including demographics, economy, housing, education, and crime.
As it’s still in beta testing, Google is asking journalists to submit their ideas for how it can be improved.
Google will demonstrate and discuss these new tools in more detail during a series of upcoming virtual events, including the Online News Association’s conference on Thursday, October 15. The Google News Initiative training will also soon host a six-part series focused on tools for reporters in seven different languages across nine regions, starting the week of October 20.
The new programs are available on the Journalist Studio website, which also organizes other tools and resources for reporters, including Google’s account security system, the Advanced Protection Program; direct access to the Data Commons; DataSet Search; a Fact Check Explorer; a tool for visualizing data using customizable templates, Flourish; the Google Data GIF Maker; Google Public Data Explorer; Google Trends; DIY VPN Outline; DDoS defense tool, Project Shield; and tiled cartogram maker Tilegrams.
The site additionally points to other services from Google, like Google Drive, Google Scholar, Google Earth, Google News, and others, as well as training resources.
Zoom, Microsoft Teams and Google Meet have become standard tools for teachers who have had to run lessons remotely since the start of the Covid-19 pandemic. But they’re not apps necessarily designed for classrooms, and that fact has opened a gap in the market for those looking to build something more fit to the purpose.
Today, a startup called Engageli is coming out of stealth with a service that it believes fills that need. A video conferencing tool designed from the ground up more as a digital learning platform, with its own unique take on virtual classrooms, Engageli is aiming first at higher education, and it is launching with $14.5 million in seed funding from a Benchmark partner and others.
If that sounds like a large seed round for a startup that is still only in pilot mode (you can contact the company by email to apply to join the pilot), it might be due in part to who is behind Engageli.
The startup is co-founded by Dan Avida, Serge Plotkin, Daphne Koller and Jamie Nacht Farrell. Avida is a general partner at Opus Capital who in the past co-founded (and sold, to NetApp) an enterprise startup called Decru with Plotkin, who himself is a Stanford emeritus professor. Koller is one of the co-founders of Coursera and also an adjunct professor at Stanford. And Farrell is a former executive from another pair of major online learning companies, Trilogy and 2U.
Avida and Koller, as it happens, are also married, and it was observing their kids in the last school year — when they were both in high school (the oldest is now in her first year at UC Berkeley) — that spurred them to start Engageli.
“The idea for this started in March when our two daughters found themselves in ‘Zoom School.’ One of them watched a lot of Netflix, and the other, well, she really improved her high scores in a lot of games,” he said wryly.
The problem, as he and Koller saw it, was that the format didn’t do a good enough job of connecting with individual students, checking in with them to make sure they were paying attention, understanding, and actually interested in what was being taught.
“The reason teachers and schools are using conferencing systems is because that was what was out there,” he said. But, based on the team’s collective experiences across past e-learning efforts at places like Coursera — which built infrastructure to run university courses for mass audiences online — and Trilogy and 2U (which are now one company that covers both online learning for universities and boot camps), “we thought we could build a better system from the ground up.”
Even though the idea was inspired by what the pair saw playing out with their high school-attending children, Engageli made the decision to focus first on higher education because that was where it was getting the most interest from would-be customers to pilot the service. But also, Avida believes that because higher ed already has a big market for remote learning, it represents a more significant opportunity.
“K-12 schools will eventually go back to normal,” he said, “but we’re of the opinion that higher education will be a blend with more and more online learning,” one of the reasons also for the founding of the likes of Coursera, Trilogy and 2U. “Younger kids need face-to-face contact, but in college, many students are now juggling work, family and studying, and online can be much more convenient.”
Also there is a very practical selling point to providing better tools to university classrooms: “People pay those tuitions to have access to professors and other students, and this is a way to provide that in a remote world,” he said.
As it appears now, Engageli lets teachers build and run both synchronous (live) and asynchronous (recorded) lessons, giving students and teachers the option to catch up or replace a live lesson if necessary.
The startup’s idea is also to make it as easy to integrate into existing workflows as possible: no need to install special desktop or mobile apps, as the platform works in all major browsers, and Avida notes that it’s also designed to integrate with the software systems that many universities are already using to organise their educational content and track students’ progress. (Making the barrier to entry low is not a bad idea also considering that many institutions are already using other products, making them more entrenched and increasing the challenges of getting them to migrate to something else.)
But perhaps Engageli’s most unique feature is how it views the virtual classroom.
The platform lets teachers create “tables” where students sit together in smaller groups, where they can work together. With tables, the idea is that either an instructor — or in the case of large classes as you might get with university seminars, teaching assistants assigned to tables — can engage with students in a more personalised way.
When a class is delivered asynchronously (that is, recorded), it means that students sitting at a table can still partly be involved in a “live” experience where they can talk about the work with others in their groups. The tables are also opened up before a class starts, and students can go from one to the other to chat with others before the class begins.
On top of the tools that Engageli has built to record and consume lessons, it’s also building a set of analytics that lets professors (or their assistants) monitor how well audio and video are working both for themselves and for their audience, and also collect other kinds of “engagement” information, which could come in the form of getting people to ask or answer questions or take polls and other interactive media.
Together, these features create better feedback to make sure that everyone is getting as much out of the remote experience as possible.
Education has not always been one of the buzziest areas in the world of startups — it’s been something of a boring cousin to more headline-grabbing segments like social media, or those taking on giants like Amazon and Google.
But the pandemic has thrown a spotlight on the opportunities in the field, both to fill a sudden surge of demand for remote learning tools, and to create more innovative approaches to doing so, as Engageli is doing here.
Just yesterday, Kahoot — a platform for building and using gamified learning apps — raised $215 million from SoftBank; and other recent rounds have included Outschool (which raised $45 million and is now profitable), Homer (raised $50 million from an impressive group of strategic backers), Unacademy (raised $150 million) and the Indian juggernaut Byju’s (most recently picking up $500 million from Silver Lake).
On top of the recent spotlight on education, it’s also been interesting to see the proliferation of startups that are also coming out of the woodwork to provide new takes on videoconferencing.
Last week, a startup called Headroom, also started by already-successful entrepreneurs, launched with an AI-driven alternative to Zoom and the rest, providing not just automatic transcriptions of conversations but also automatically generated highlights and insights into how engaging your webinars and other video content really were.
Apps like Headroom and Engageli are just the tip of the iceberg, with other innovative approaches also stepping out and raising significant funding. The big question will be whether they will get much attention and time from would-be customers who are already “happy enough” with what they already use.
But in a tech world that thrives on the concept of disruption and companies creating businesses out of simply being better approaches to entrenched markets, it’s a bet worth making.
“Dan, Serge and Daphne have repeatedly built fast-growing, extremely successful companies. I am so fortunate to be working with them again,” said Alex Balkanski, a partner at Benchmark who is investing individually, in a statement. “Investing in a company linked to education is incredibly important to me on a personal level, and Engageli has the potential to enable a truly transformative learning experience.”
Updated to clarify that Balkanski is investing privately, not through Benchmark.
Google launched version 4.1 of Android Studio, its IDE for developing Android apps, into its stable channel today. As usual for Android Studio, the minor uptick in version numbers doesn’t quite do the update justice. It includes a vast number of new and improved features that should make life a little bit easier for Android developers. The team also fixed a whopping 2370 bugs during this release cycle and closed 275 public issues.
The highlights of today’s release are a new database inspector and better support for on-device machine learning by allowing developers to bring TensorFlow Lite models to Android, as well as the ability to run the Android Emulator right inside of Android Studio and support for testing apps for foldable phones in the emulator as well. That’s in addition to various other changes the company has outlined here.
The one feature that will likely improve the quality of life for developers the most is the ability to run the Android Emulator right in Android Studio. That’s something the company announced earlier this summer, so it’s not a major surprise, but it’s a nice update for developers since they won’t have to switch back and forth between different windows and tools to test their apps.
Talking about testing, the other update is support for foldable devices in the Android Emulator, which now allows developers to simulate the hinge angle sensor and posture changes so their apps can react accordingly. That’s still a niche market, obviously, but more and more developers are now aiming to offer apps to actually support these devices.
Also new is improved support for TensorFlow Lite models in Android Studio, so that developers can bring those models to their apps, as well as a new database inspector that helps developers get easier insights into their queries and the data they return, and that lets them modify values while running their apps to see how their apps react to those changes.
Other updates include new templates in the New Project dialog that support Google’s Material Design Components, Dagger navigation support, System Trace UI improvements and new profilers to help developers optimize their apps’ performance and memory usage.
Grid AI, a startup founded by William Falcon, the inventor of the popular open-source PyTorch Lightning project, that aims to help machine learning engineers work more efficiently, today announced that it has raised an $18.6 million Series A funding round, which closed earlier this summer. The round was led by Index Ventures, with participation from Bain Capital Ventures and firstminute.
Falcon co-founded the company with Luis Capelo, who was previously the head of machine learning at Glossier. Unsurprisingly, the idea here is to take PyTorch Lightning, which launched about a year ago, and turn that into the core of Grid’s service. The main idea behind Lightning is to decouple the data science from the engineering.
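As a rough illustration of what decoupling the data science from the engineering can mean, the sketch below uses plain Python rather than Lightning itself. It is an analogy, not Lightning's API: the class names `LinearModel` and `TinyTrainer`, the method signatures and the toy data are all hypothetical. The model class holds only the "science" (what is computed and how one step updates it), while the trainer holds the reusable "engineering" (epochs and batching).

```python
class LinearModel:
    """The 'science': what the model computes and how one step updates it."""

    def __init__(self):
        self.weight = 0.0

    def training_step(self, batch, learning_rate):
        # Mean squared error for y = weight * x, plus its gradient.
        n = len(batch)
        loss = sum((self.weight * x - y) ** 2 for x, y in batch) / n
        grad = sum(2 * (self.weight * x - y) * x for x, y in batch) / n
        self.weight -= learning_rate * grad
        return loss


class TinyTrainer:
    """The 'engineering': epochs and batching, reusable for any model
    that exposes a training_step method."""

    def __init__(self, epochs=50, batch_size=2, learning_rate=0.05):
        self.epochs = epochs
        self.batch_size = batch_size
        self.learning_rate = learning_rate

    def fit(self, model, data):
        history = []  # loss recorded before each update
        for _ in range(self.epochs):
            for start in range(0, len(data), self.batch_size):
                batch = data[start:start + self.batch_size]
                history.append(model.training_step(batch, self.learning_rate))
        return history


# Toy data following y = 2x, invented for this example.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
model = LinearModel()
history = TinyTrainer().fit(model, data)
print(round(model.weight, 3))
```

The point of the split is that the loop in `TinyTrainer` never needs to change when the model does. In a real framework, that is the layer where concerns such as hardware placement and distributed training would live.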
The team argues that a few years ago, when data scientists tried to get started with deep learning, they didn’t always have the right expertise and it was hard for them to get everything right.
“Now the industry has an unhealthy aversion to deep learning because of this,” Falcon noted. “Lightning and Grid embed all those tricks into the workflow so you no longer need to be a PhD in AI nor [have] the resources of the major AI companies to get these things to work. This makes the opportunity cost of putting a simple model against a sophisticated neural network a few hours’ worth of effort instead of the months it used to take. When you use Lightning and Grid it’s hard to make mistakes. It’s like if you take a bad photo with your phone but we are the phone and make that photo look super professional AND teach you how to get there on your own.”
As Falcon noted, Grid is meant to help data scientists and other ML professionals “scale to match the workloads required for enterprise use cases.” Lightning itself can get them partially there, but Grid is meant to provide all of the services its users need to scale up their models to solve real-world problems.
What exactly that looks like isn’t quite clear yet, though. “Imagine you can find any GitHub repository out there. You get a local copy on your laptop and without making any code changes you spin up 400 GPUs on AWS — all from your laptop using either a web app or command-line-interface. That’s the Lightning “magic” applied to training and building models at scale,” Falcon said. “It is what we are already known for and has proven to be such a successful paradigm shift that all the other frameworks like Keras or TensorFlow, and companies have taken notice and have started to modify what they do to try to match what we do.”
The service is now in private beta.
With this new funding, Grid, which currently has 25 employees, plans to expand its team and strengthen its corporate offering via both Grid AI and the open-source project. Falcon tells me that he aims to build a diverse team, not least because he himself is an immigrant, born in Venezuela, and a U.S. military veteran.
“I have first-hand knowledge of the extent that unethical AI can have,” he said. “As a result, we have approached hiring our current 25 employees across many backgrounds and experiences. We might be the first AI company that is not all the same Silicon Valley prototype tech-bro.”
“Lightning’s open-source traction piqued my interest when I first learned about it a year ago,” Index Ventures’ Sarah Cannon told me. “So intrigued in fact I remember rushing into a closet in Helsinki while at a conference to have the privacy needed to hear exactly what Will and Luis had built. I promptly called my colleague Bryan Offutt who met Will and Luis in SF and was impressed by the ‘elegance’ of their code. We swiftly decided to participate in their seed round, days later. We feel very privileged to be part of Grid’s journey. After investing in seed, we spent a significant amount with the team, and the more time we spent with them the more conviction we developed. Less than a year later and pre-launch, we knew we wanted to lead their Series A.”
Language learning apps, like many educational technology platforms, soared when millions of students went home in response to safety concerns from the coronavirus pandemic. It makes sense: Everyone became an online learner in some capacity, and for non-frontline workers, each day became an opportunity to squeeze in a new skill (beyond sourdough).
So why not learn a new language in a low-lift way?
Language learning platforms, including Babbel, Drops and Duolingo, all have benefitted from quarantine boredom as shown by surges in their usage. However, success also depends on whether these same companies can turn that primetime interest into dollars and profit.
Von Ahn tells TechCrunch that Duolingo has hit 42 million monthly active users, up from 30 million in December 2019. The surge comes as new users are spending more time on the app in aggregate, for some of the reasons explained above. Duolingo has been steadily increasing in bookings over the past few years.
This year, Duolingo will hit $180 million in bookings, von Ahn estimates. The company discloses bookings as a proxy for revenue, because when someone purchases a subscription in the app, it is considered a “booking” until the completion of the subscription, when it becomes revenue.
“We’re more than breaking even,” von Ahn told TechCrunch.
While this growth is impressive, the most staggering metric that von Ahn revealed is that $180 million in bookings is only coming from 3% of its current users.
“Only 3% of our users pay us, yet we make more money than the apps where 100% of their users pay them,” he said.
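For a rough sense of scale, the figures quoted above can be combined in a back-of-the-envelope calculation. The sketch below assumes the 3% paying share applies to the 42 million monthly active users, which the article does not state explicitly, so the result is illustrative only. The function and constant names are made up for this example.

```python
# Back-of-the-envelope estimate from the figures quoted above.
# Assumption (not stated in the article): the 3% paying share applies
# to the 42 million monthly active users.
MONTHLY_ACTIVE_USERS = 42_000_000
PAYING_SHARE = 0.03
ANNUAL_BOOKINGS_USD = 180_000_000


def paying_users(active_users: int, paying_share: float) -> int:
    """Estimate how many users pay, rounded to a whole person."""
    return round(active_users * paying_share)


def bookings_per_paying_user(bookings: float, payers: int) -> float:
    """Average annual bookings attributed to each paying user."""
    return bookings / payers


payers = paying_users(MONTHLY_ACTIVE_USERS, PAYING_SHARE)
per_payer = bookings_per_paying_user(ANNUAL_BOOKINGS_USD, payers)
print(f"{payers:,} paying users, about ${per_payer:.0f} in bookings each per year")
```

Under that assumption, roughly 1.26 million subscribers would account for all of the estimated bookings, or about $143 each per year.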
As machine learning has grown, one of the major bottlenecks remains labeling things so the machine learning application understands the data it’s working with. Datasaur, a member of the Y Combinator Winter 2020 batch, announced a $3.9 million investment today to help solve that problem with a platform designed for machine learning labeling teams.
The funding announcement, which includes a pre-seed amount of $1.1 million from last year and $2.8 million seed right after it graduated from Y Combinator in March, included investments from Initialized Capital, Y Combinator and OpenAI CTO Greg Brockman.
Company founder Ivan Lee says that he has been working in various capacities involving AI for seven years, first at Yahoo!, which acquired his mobile gaming startup, Loki Studios, in 2013 and eventually moved him to its AI team, and most recently at Apple. Regardless of the company, he consistently saw a problem around organizing machine learning labeling teams, one that he felt he was uniquely situated to solve because of his experience.
“I have spent millions of dollars [in budget over the years] and spent countless hours gathering labeled data for my engineers. I came to recognize that this was something that was a problem across all the companies that I’ve been at. And they were just consistently reinventing the wheel and the process. So instead of reinventing that for the third time at Apple, my most recent company, I decided to solve it once and for all for the industry. And that’s why we started Datasaur last year,” Lee told TechCrunch.
He built a platform to speed up human data labeling with a dose of AI, while keeping humans involved. The platform consists of three parts: a labeling interface, the intelligence component, which can recognize basic things, so the labeler isn’t identifying the same thing over and over, and finally a team organizing component.
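The idea of an intelligence component that spares labelers from tagging the same thing repeatedly can be illustrated with a very simple pre-labeling step. This is a generic sketch, not Datasaur's implementation. The `prelabel` function, the lookup table and the label names are all hypothetical.

```python
def prelabel(tokens, known_labels):
    """Suggest labels for tokens seen before; leave the rest as None for a human."""
    return [(tok, known_labels.get(tok.lower())) for tok in tokens]


# Hypothetical labels that a team has already confirmed on earlier documents.
known_labels = {"apple": "ORG", "yahoo": "ORG"}

suggestions = prelabel(["Lee", "left", "Apple"], known_labels)
print(suggestions)
```

In this toy version the human labeler only has to review the suggestion for "Apple" and decide what, if anything, to do with the unlabeled tokens.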
He says the area is hot, but to this point has mostly involved labeling consulting solutions, which farm out labeling to contractors. He points to the sale of Figure Eight in March 2019 and to Scale, which snagged $100 million last year, as examples of other startups trying to solve this problem in this way, but he believes his company is doing something different by building a fully software-based solution.
The company currently offers a cloud and on-prem solution, depending on the customer’s requirements. It has 10 employees with plans to hire in the next year, although he didn’t share an exact number. As he does that, he says he has been working with a partner at investor Initialized on creating a positive and inclusive culture inside the organization, and that includes conversations about hiring a diverse workforce as he builds the company.
“I feel like this is just standard CEO speak but that is something that we absolutely value in our top of funnel for the hiring process,” he said.
As Lee builds out his platform, he has also worried about built-in bias in AI systems and the detrimental impact that could have on society. He says that he has spoken to clients about the role of labeling in bias and ways of combatting that.
“When I speak with our clients, I talk to them about the potential for bias from their labelers and built into our product itself is the ability to assign multiple people to the same project. And I explain to my clients that this can be more costly, but from personal experience I know that it can improve results dramatically to get multiple perspectives on the exact same data,” he said.
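One common way to use several labelers on the same data is to take a majority vote per item and flag items with low agreement for review. The sketch below is a generic illustration of that technique, not a description of Datasaur's product. The function name, the example labels and the 0.6 threshold are assumptions chosen for the example.

```python
from collections import Counter


def consolidate_labels(votes, min_agreement=0.6):
    """Return (label, agreement, needs_review) for one item's votes."""
    if not votes:
        raise ValueError("at least one vote is required")
    # Most frequent label; ties go to the label that was seen first.
    label, count = Counter(votes).most_common(1)[0]
    agreement = count / len(votes)
    return label, agreement, agreement < min_agreement


# Hypothetical votes from three labelers on two text snippets.
items = {
    "snippet-1": ["positive", "positive", "negative"],
    "snippet-2": ["positive", "negative", "neutral"],
}

results = {item: consolidate_labels(votes) for item, votes in items.items()}
for item, (label, agreement, needs_review) in results.items():
    print(item, label, f"{agreement:.2f}", "review" if needs_review else "ok")
```

Here the first snippet is accepted with two of three votes, while the second, where every labeler disagrees, is flagged for another look. That is the kind of case where a single labeler's bias would otherwise go unnoticed.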
Lee believes humans will continue to be involved in the labeling process in some way, even as parts of the process become more automated. “The very nature of our existence [as a company] will always require humans in the loop, […] and moving forward I do think it’s really important that as we get into more and more of the long tail use cases of AI, we will need humans to continue to educate and inform AI, and that’s going to be a critical part of how this technology develops.”
The topics in this week’s Deep Science column are a real grab bag that range from planetary science to whale tracking. There are also some interesting insights from tracking how social media is used and some work that attempts to shift computer vision systems closer to human perception (good luck with that).
One of machine learning’s most reliable use cases is training a model on a target pattern, say a particular shape or radio signal, and setting it loose on a huge body of noisy data to find possible hits that humans might struggle to perceive. This has proven useful in the medical field, where early indications of serious conditions can be spotted with enough confidence to recommend further testing.
This arthritis detection model looks at X-rays, same as doctors who do that kind of work. But by the time the disease is visible to human perception, the damage is already done. A long-running project tracking thousands of people for seven years made for a great training set, making the nearly imperceptible early signs of osteoarthritis visible to the AI model, which predicted it with 78% accuracy three years out.
The bad news is that knowing early doesn’t necessarily mean it can be avoided, as there’s no effective treatment. But that knowledge can be put to other uses — for example, much more effective testing of potential treatments. “Instead of recruiting 10,000 people and following them for 10 years, we can just enroll 50 people who we know are going to be getting osteoarthritis … Then we can give them the experimental drug and see whether it stops the disease from developing,” said co-author Kenneth Urish. The study appeared in PNAS.
It’s amazing to think that ships still collide with and kill large whales on a regular basis, but it’s true. Voluntary speed reductions haven’t been much help, but a smart, multisource system called Whale Safe is being put in play in the Santa Barbara Channel that could hopefully give everyone a better idea of where the creatures are in real time.
The system uses underwater acoustic monitoring, near-real-time forecasting of likely feeding areas, actual sightings and a dash of machine learning (to identify whale calls quickly) to produce a prediction for whale presence along a given course. Large container ships can then make small adjustments well ahead of time instead of trying to avoid a pod at the last minute.
“Predictive models like this give us a clue for what lies ahead, much like a daily weather forecast,” said Briana Abrahms, who led the effort from the University of Washington. “We’re harnessing the best and most current data to understand what habitats whales use in the ocean, and therefore where whales are most likely to be as their habitats shift on a daily basis.”
Incidentally, Salesforce founder Marc Benioff and his wife Lynne helped establish the UC Santa Barbara center that made this possible.