White Men Okay, Black Women Not: The Problem With Photo Algorithms

Let’s make sure algorithms don’t stay racist.

Facial recognition technology has come a long way from its early days.

It is now more pervasive than ever.

More and more consumer products along with law enforcement agencies are making use of it.

Increasingly powerful and robust machine learning algorithms are backing it for more accuracy.

However, a recent test of some commercially available facial-analysis services from companies such as Microsoft and IBM has raised some concerns.

These concerns center on the fact that the artificial intelligence systems scrutinizing people’s features are more accurate for people with white skin than for people with black skin.

When researchers tested face-analysis services from the likes of IBM and Microsoft that are supposed to identify the gender of a person in a photo, they saw some interesting results.

More precisely, the algorithms behind these services performed close to perfectly at tasks such as looking at an image and identifying a lighter-skinned man as male.

However, these same algorithms made significantly more errors when analyzing images of dark-skinned women.

What does this actually show?

According to some, this shows skewed accuracy.


And the likely reason for the fault is that the training data scientists used to build these face-analysis algorithms did not adequately represent darker skin tones.

But this isn’t something unique.

In fact, it is a disparity that researchers are starting to see across a growing collection of AI system failures.

Such AI systems seem to have absorbed the societal biases that exist around specific groups of people.

For example, Google’s photo-organizing service still censors search terms like monkey and gorilla nearly three years after an incident in which its algorithms tagged pictures of black people with the label gorilla.

This obviously raises the question of how AI researchers can make sure that the machine-learning systems companies deploy in consumer products, government programs, and commercial systems are free of these biases.

Such questions are slowly forming the major portion of discussions on the use of AI in services.

Georgetown published a report back in 2016 describing the extensive and largely unregulated official deployment of facial recognition technologies by organizations such as the FBI along with state and local police forces.

The report also found evidence that the AI systems in use were far less accurate for African Americans and other people with darker skin.

Now researchers Timnit Gebru (a researcher at Microsoft who did her graduate work at Stanford) and Joy Buolamwini of the MIT Media Lab have come out with a new study in which they fed facial-recognition AI systems around 1,300 photos of parliamentarians from African and European countries.

Researchers chose the photos so as to represent a wide spectrum of human skin tones, classified using the Fitzpatrick scale.

What is the Fitzpatrick scale?

It is a classification system from dermatology that sorts skin into six types, from lightest to darkest.
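For concreteness, here is a minimal sketch of how the study’s binary lighter/darker grouping maps onto the six Fitzpatrick types. The function name is illustrative; the I–III versus IV–VI split mirrors the convention used in the published study.

```python
# The Fitzpatrick scale classifies skin into six types, I-VI, from
# lightest (type I) to darkest (type VI). The study bins types I-III
# as "lighter" and IV-VI as "darker". Function name is illustrative.
def skin_tone_group(fitzpatrick_type: int) -> str:
    """Map a Fitzpatrick type (1-6) to the study's binary grouping."""
    if not 1 <= fitzpatrick_type <= 6:
        raise ValueError("Fitzpatrick types run from 1 (I) to 6 (VI)")
    return "lighter" if fitzpatrick_type <= 3 else "darker"

print(skin_tone_group(2))  # lighter
print(skin_tone_group(5))  # darker
```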

Researchers working on the report plan to present their findings at the FAT conference later this month (FAT stands for Fairness, Accountability, and Transparency in computer algorithmic systems).

Researchers used the image collection to test commercially available cloud services that look for faces in photos, offered by IBM, Microsoft, and Face++ (a division of Megvii, a startup based in Beijing).

The researchers’ analysis focused on the gender-detection features of the above-mentioned services.

What researchers found was that all three facial-recognition services worked more accurately on male faces than on female faces.

Moreover, the algorithms used in these facial-recognition services also performed worse on darker faces than lighter faces.

Researchers also found that the algorithms from all three services had particular trouble correctly recognizing darker-skinned women as women.

When researchers used Microsoft’s service to analyze an image set containing only the lightest-skinned male faces, the service identified every single man in those photos correctly.

IBM’s facial-recognition service, meanwhile, had an overall success rate of 99.7 percent on the same set, an error rate of just 0.3 percent.

When researchers asked Microsoft’s service to analyze pictures of darker-skinned women, it posted a 21 percent error rate.


The facial-recognition services from Megvii’s Face++ and IBM both posted error rates of around 35 percent.

To be fair to Microsoft, the company has said in an official statement that it planned to take steps to improve the accuracy of its facial-recognition technology.

Moreover, the company also announced that it had started to invest more resources in order to improve its machine-learning training datasets.

The statement said that Microsoft considers the fairness of artificial intelligence technologies a critical issue for the whole AI industry.

Microsoft also said that the company took such issues very seriously.

However, the company declined to answer questions about whether its face-analysis service had gone through rigorous testing for accurate performance across groups with different skin tones.

When reporters contacted IBM, the company’s spokesperson said that IBM planned to deploy an updated version of its facial-recognition service within a few weeks.

Media outlets have also found that IBM incorporated the audit’s findings into its existing upgrade plans.

IBM has also created a new dataset of its own to comprehensively test the accuracy of its machine-learning algorithms across different skin tones.

IBM recently published a white paper reporting tests that used the new dataset.

Those tests found that the gender-detection error rate of the company’s facial-recognition service had come down to around 3.5 percent for darker-skinned female faces.

That may sound great.

And it probably is.

But it is still far worse than the service’s error rate of around 0.3 percent for lighter-skinned male faces.

Compared to the 35 percent error rate measured in the original study, though, the new 3.5 percent figure is about one-tenth of the previous one.
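The arithmetic behind those comparisons can be checked directly. This is a small sketch using only the error rates quoted in this article; the variable names are illustrative.

```python
# Error rates quoted in the article, as fractions.
error_rates = {
    "lighter_male": 0.003,        # IBM, lighter-skinned men
    "darker_female_old": 0.35,    # IBM, darker-skinned women (original audit)
    "darker_female_new": 0.035,   # IBM, darker-skinned women (after retraining)
}

# Improvement after retraining: roughly 10x fewer errors.
improvement = error_rates["darker_female_old"] / error_rates["darker_female_new"]

# Remaining disparity versus lighter-skinned men: still more than 10x higher.
disparity = error_rates["darker_female_new"] / error_rates["lighter_male"]

print(f"Improvement: {improvement:.1f}x")        # Improvement: 10.0x
print(f"Remaining disparity: {disparity:.1f}x")  # Remaining disparity: 11.7x
```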

When reporters approached Megvii for a response on the issue, the company declined to comment.


But why is facial-recognition technology getting so much attention from the media lately?

Surely, there are other areas where researchers and companies are putting in an effort to offer AI-related services?

Well, as it turns out, on-demand services backed by machine-learning algorithms have become a red-hot area of competition, especially among large technology firms.

Technology giants such as Amazon, Google, IBM and Microsoft regularly pitch their cloud services for various tasks such as parsing the exact meaning of text or images as a feasible way for other industries like healthcare, sports and manufacturing to easily tap into advanced artificial intelligence capabilities.

Only technology companies themselves had access to such artificial intelligence capabilities before.

Of course, there is a flip side as well.

When customers buy these artificial intelligence capabilities, they also buy into the limitations that come with those services.

The worst part is that these limitations may not become apparent at first.

Pivothead, a startup that works on smart glasses to help visually impaired people, recently became a customer of Microsoft’s AI services.

The startup uses the company’s cloud vision services to let a synthetic voice describe the facial expressions and ages of people nearby.


Microsoft collaborated with Pivothead on a video showing how the glasses helped a visually impaired man understand his surroundings as he walked down a street in London with the help of a white cane.

At one point in the video, the glasses tell the man that they think someone in front of him is jumping in the air and doing tricks on a skateboard.

This happens just as a young white man on a skateboard zips past the man wearing the glasses.

The problem is that an audit of Microsoft’s vision services suggests such pronouncements could be less accurate if the kid on the skateboard had a darker skin color.

Microsoft’s technical documentation for the service does mention that gender detection, along with the other attributes the service can report for faces, such as age and emotion, is still experimental.

It also warns readers that these attributes may not be very accurate.

DJ Patil, the chief data scientist for the United States under the Obama administration, recently said that the findings of these latest studies highlight the immediate need for technology companies to make sure their machine-learning systems work equally well for all kinds of people.

Patil also suggested that purveyors of such artificial intelligence-enabled services should be more open about the limitations of the facial-recognition products they offer under the shiny banner of machine learning and artificial intelligence.

He also said that technology companies tend to slap the in-demand labels of artificial intelligence and machine learning on everything they offer.

But customers and users really had no way of knowing the boundaries of such systems or how well the systems worked within those boundaries.

He also said that the community needed a level of transparency where companies showed where their machine-learning systems worked best and where they did not work as intended.

Gebru and Buolamwini’s research paper also argues that if technology companies disclosed a full suite of accuracy numbers broken down by demographic group, that would give customers and users a true sense of the actual capabilities of the image-processing and facial-recognition software scrutinizing people.
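The kind of disaggregated audit the paper calls for can be sketched in a few lines: score a classifier’s errors separately per demographic subgroup rather than reporting one overall number. The tiny data here is a purely illustrative stand-in, not real results.

```python
from collections import defaultdict

# Minimal sketch of a disaggregated audit: compute a classifier's error
# rate separately for each subgroup. Each entry pairs an example's
# subgroup with whether the classifier got it right (illustrative data).
predictions = [
    ("lighter_male", True), ("lighter_male", True),
    ("darker_female", True), ("darker_female", False),
]

totals = defaultdict(int)
errors = defaultdict(int)
for group, correct in predictions:
    totals[group] += 1
    if not correct:
        errors[group] += 1

for group in totals:
    print(f"{group}: {errors[group] / totals[group]:.0%} error rate")
```

Reporting the per-group numbers side by side is what exposes a disparity that a single aggregate accuracy figure would hide.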

Another forthcoming white paper from IBM, covering the changes the company has made to its facial-analysis offerings, will reportedly include that kind of information for customers and users.

The researchers, whose work has already prompted responses from some technology companies, hope that others will learn from their paper and start audits of their own machine learning and artificial intelligence systems.

The researchers will also release the collection of images they used to test the companies’ cloud services, so that other researchers can study it and use it in their own projects.

Microsoft has made appreciable efforts to position itself not only as a cloud services company but also as a thought leader on machine learning and its ethics.

The company has hired many researchers who specialize in ethical machine learning and related topics.

The company even has an internal ethics panel.

Microsoft calls it Aether, which stands for Artificial Intelligence and Ethics in Engineering and Research.

About a year ago, in 2017, the company conducted a complete audit that discovered its cloud service for analyzing facial expressions did not perform up to standard on children under a certain age.

Further investigation revealed shortcomings in the data researchers had used to train the machine-learning algorithms.

After identifying the fault, researchers fixed the service.

More on detecting hidden bias in machine learning algorithms


As mentioned before, about three years ago Google Photos labeled pictures of black people as gorillas.

Now the service simply avoids using labels such as gorilla when analyzing photos.

Some say that is not really an improvement.

The incident took place in 2015, when a black software engineer embarrassed Google with a tweet showing that the company’s Google Photos service had labeled images of him and a black friend as gorillas.

The technology giant tried to save face by declaring that it was genuinely sorry and appalled at the results.

Another engineer, who later became the company’s public face for the extensive cleanup operation, stated that the Photos service would no longer apply the label gorilla to images or groups of images.

He also said that Google had already started work to fix some long-term issues with the service.

At the start of 2018, almost three years after the incident, the company’s fix turned out to be erasing the label gorillas from the service.

It also erased the labels for some other primates from Google Photos’ lexicon.

But one can’t ignore that this is an awkward workaround.

It illustrates the scale of difficulty that technology companies like Google face with advanced AI image-recognition technology.

Technology companies have big hopes for image-recognition technologies because they want to use them in everything from personal assistants to self-driving cars.

As it turns out, WIRED took the opportunity to test Google Photos with more than 40,000 images, stocking the collection with plenty of animal photos.

WIRED reported that Google Photos performed reasonably well when it came to finding various types of creatures.

The service impressed them by recognizing poodles and pandas.

However, WIRED also found that Google Photos now reported “no results” for certain search terms.

Those search terms included:

  • Monkey
  • Chimpanzee
  • Chimp
  • Gorilla

The issue could affect the just over 500 million users of Google Photos on desktop machines, mobile devices, and through the company’s web interface.


Zohair A. Zohair is currently a content crafter at Security Gladiators and has been involved in the technology industry for more than a decade. He is an engineer by training and, naturally, likes to help people solve their tech related problems. When he is not writing, he can usually be found practicing his free-kicks in the ground beside his house.