To the best of my knowledge, Twitter flattens all accented Latin letters and treats them as the same character, so a = á = â = à = ä = ā = ã = å.
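To illustrate what this kind of flattening means, here is a minimal Python sketch of accent folding using Unicode decomposition. I don't know how Twitter actually implements it, so treat this as an assumption about the general technique; `fold_accents` is a hypothetical helper name.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Strip combining marks so accented Latin letters compare equal."""
    # NFD splits a letter such as "â" into the base letter "a"
    # followed by a separate combining mark.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only the base characters and drop the combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


variants = ["a", "á", "â", "à", "ä", "ā", "ã", "å"]
# All eight variants fold to the same base letter.
print({fold_accents(v) for v in variants})
```

A search engine that folds both the indexed text and the query this way cannot tell these letters apart, which is why searching for one of them returns matches for all of them.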
One way to clean up your search results a little is to use Twitter's advanced search language operator lang:[xx] in its negated form, -lang:[xx], where [xx] is the two-letter ISO language code of a language that might use that particular letter and that you wish to filter out of the results.
In your example, the letter Ââ (a with circumflex) is used by the following languages: Sami, Romanian, Vietnamese, French, Frisian, Portuguese, Turkish, Walloon and Welsh. Assuming you wish to filter out results in these specific languages, your Twitter search query would include one -lang:[xx] term for each of them.
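As a rough sketch, the exclusion query described above could be assembled like this. The language codes are my own mapping of the listed languages to ISO 639-1 (with "se" standing for Northern Sami and "fy" for Western Frisian), I have not checked that Twitter recognizes every one of them, and `exclude_languages` and the search term "example" are hypothetical.

```python
from typing import Sequence

# ISO 639-1 codes for the languages listed above (assumed mapping:
# Sami, Romanian, Vietnamese, French, Frisian, Portuguese, Turkish,
# Walloon, Welsh). Twitter may not support all of them.
CIRCUMFLEX_LANGS = ["se", "ro", "vi", "fr", "fy", "pt", "tr", "wa", "cy"]


def exclude_languages(term: str, lang_codes: Sequence[str]) -> str:
    """Append one negated lang: operator per language code to a search term."""
    negations = " ".join(f"-lang:{code}" for code in lang_codes)
    return f"{term} {negations}".strip()


# "example" is a placeholder, not the search term from the question.
print(exclude_languages("example", CIRCUMFLEX_LANGS))
```

The resulting string can be pasted directly into the Twitter search box.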
Alternatively, you can use the same lang:[xx] operator, without the negation, to limit Twitter's search results to one specific language (for example, lang:en for English).
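A minimal sketch of the single-language variant, which also turns the query into a clickable search link. The helper names and the search term are hypothetical, and the base URL `https://twitter.com/search` is my assumption about where the search page lives.

```python
from urllib.parse import urlencode


def restrict_to_language(term: str, lang_code: str) -> str:
    """Limit a search term to tweets Twitter has tagged with one language."""
    return f"{term} lang:{lang_code}"


def search_url(query: str) -> str:
    """Percent-encode the query into a search link (base URL is assumed)."""
    return "https://twitter.com/search?" + urlencode({"q": query})


# "example" is a placeholder search term.
query = restrict_to_language("example", "en")
print(query)
print(search_url(query))
```

Restricting to one language is usually shorter than excluding many, but it also drops tweets in any other language you might have wanted to keep.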
This might not be a watertight solution, but it can eliminate a lot of false positives.
Finally, keep in mind that Twitter does not guarantee the accuracy of its machine identification of languages.