Finding Messages Explicitly Marked as Spam in Gmail
tl;dr: Search Gmail for “is:spam -label:^os” to find messages that you manually marked as spam (as opposed to ones that Gmail automatically marked for you).
Gmail recently had a bug where some emails were accidentally moved to the trash or marked as spam. Google “encouraged” users that might have been affected to check their trash and spam folders for any messages that didn't belong. Since I get a lot of spam (one of the perks of having the same email address since 1996), I didn't relish the thought of going through thousands of messages to see if any of them were mislabeled¹.
I figured that Gmail must keep track of which messages were explicitly marked as spam by the user versus ones that it classified automatically (though I get a lot of spam, almost all of it is caught by Gmail's filters). Gmail (like Google Reader) keeps track of per-message state via internal system labels. For example, others have discovered that Gmail's Smart Labels are represented as ^smartlabel_type labels while Superstars uses names like ^ss_sy. Indeed, if you try to use a caret in a label name, Gmail says that it is not allowed.
It therefore seemed like a reasonable assumption that there was a system label that would tell us how a message came to be marked as spam. The problem was to figure out what it was called.
Thinking back to Reader (where all label operations went through an edit-tag HTTP API call, which listed the labels to be added or removed), I figured I would see what the request was when marking a message as spam. Unfortunately, it looked like Gmail's requests were at a slightly higher level of abstraction, where marking a message as spam would send a request with an act=sp parameter (while marking as read uses act=rd, and so on).
I then figured I should look at the HTTP response when loading the spam folder. There appeared to be a bunch of system label names associated with each message. One that I explicitly marked as spam had the labels:
"^a", "^ad_1391126400000", "^all", "^bsm", "^clu_group", "^clu_unim", "^cob-processed-gmr", "^cob_pevent", "^oc_group", "^os_group", "^s", "^smartlabel_group", "^u"
Meanwhile, another that had been automatically marked as spam used:
"^ad_1391126400000", "^all", "^bsm", "^clu_notification", "^cob-processed-gmr", "^oc_notification", "^os", "^os_notification", "^s", "^smartlabel_notification", "^u"
^s was present on all of them, and indeed doing a search for label:^s shows all spam messages (and the UI rewrites the search to in:spam). Others could also be puzzled out based on name, for example ^u is for unread messages. The more mysterious ones like ^cob_pevent I figured I could ignore².
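The comparison between the two label lists can be made a bit more systematic with set operations. The following is only a sketch in Python: the variable names are mine, and the only data is the two label lists quoted above. A single pair of messages cannot separate "how it became spam" from unrelated differences (here, group versus notification mail), so the same diff would need to be repeated over many messages.

```python
# Label lists copied from the two example messages quoted above.
manual_labels = {
    "^a", "^ad_1391126400000", "^all", "^bsm", "^clu_group", "^clu_unim",
    "^cob-processed-gmr", "^cob_pevent", "^oc_group", "^os_group", "^s",
    "^smartlabel_group", "^u",
}
auto_labels = {
    "^ad_1391126400000", "^all", "^bsm", "^clu_notification",
    "^cob-processed-gmr", "^oc_notification", "^os", "^os_notification",
    "^s", "^smartlabel_notification", "^u",
}

# Labels on both messages say nothing about how a message became spam.
shared = manual_labels & auto_labels
# Labels on only one side are the candidates worth a closer look.
only_manual = manual_labels - auto_labels
only_auto = auto_labels - manual_labels

print("shared:     ", sorted(shared))
print("manual only:", sorted(only_manual))
print("auto only:  ", sorted(only_auto))
```

The shared set contains ^s and ^u, while the remaining labels split between the two sides and are the ones left to investigate.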
After looking at a bunch of messages, both automatically and manually marked as spam, ^os stood out. It only seemed to be present on messages that Gmail itself had decided were spam. Doing the search is:spam -label:^os seemed to show only messages that I had marked as spam. Indeed, each of the messages in the result displayed the header: "Why is this message in Spam? You clicked 'Report spam' for this message." Thus I was able to go through the much shorter list and see if any were mistakenly marked (they weren't).
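Expressed as code, the rule is simply "has ^s but not ^os". Here is a small Python sketch of that rule, plus a helper that turns the search into a link. The function names are hypothetical, and the URL shape (a #search/ fragment under /mail/u/0/) is my assumption about Gmail's web UI rather than anything documented; the inference about ^os itself is based only on what I observed in my own account.

```python
from urllib.parse import quote

SPAM_LABEL = "^s"        # present on every spam message I looked at
AUTO_SPAM_LABEL = "^os"  # only seen on messages Gmail classified itself

MANUAL_SPAM_QUERY = "is:spam -label:^os"


def is_manually_marked_spam(labels):
    """Return True if the labels suggest the user clicked 'Report spam'."""
    label_set = set(labels)
    return SPAM_LABEL in label_set and AUTO_SPAM_LABEL not in label_set


def gmail_search_url(query, account_index=0):
    """Build a link to a Gmail search.

    Assumption: the web UI accepts a percent-encoded query after '#search/'.
    """
    encoded = quote(query, safe="")
    return f"https://mail.google.com/mail/u/{account_index}/#search/{encoded}"


if __name__ == "__main__":
    print(is_manually_marked_spam(["^all", "^s", "^u"]))
    print(is_manually_marked_spam(["^all", "^os", "^s", "^u"]))
    print(gmail_search_url(MANUAL_SPAM_QUERY))
```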
Seeing the plethora of labels that were present on all messages, I got curious what other internal labels there were. Between examining HTTP responses, looking through Gmail's JavaScript for strings that start with ^ and a simple dictionary attack for two-letter names, here are some others that I've found (those that are marked as “unknown” are ones that match some messages in my account, but with no apparent pattern):
- ^a: archived conversations
- ^b: chat transcripts (equivalent to is:chat, presumably the “b” is for “Buzz”, Google Talk's codename)
- ^f: sent messages (equivalent to is:sent)
- ^g: muted conversations (equivalent to is:muted, the “g” is most likely for “ignore”)
- ^i: inbox (equivalent to in:inbox)
- ^k: trashed messages (equivalent to in:trash, unclear why “k” is the abbreviation)
- ^o: unknown
- ^p: messages that were marked as phishing attempts
- ^r: drafts (equivalent to is:draft)
- ^s: spam (equivalent to is:spam)
- ^t: starred messages (equivalent to is:starred, the “t” is most likely for “to do”)
- ^u: unread messages (equivalent to is:unread)
- ^ac: Google Buzz messages (equivalent to is:buzz)
- ^act: Google Buzz messages (unclear how it's different from ^ac)
- ^af: unknown
- ^bc: unknown subset of chat transcripts
- ^p_cc: another unknown subset of chat transcripts
- ^fs: unknown
- ^ia: unknown
- ^ii: unknown
- ^im: unknown
- ^iim: Priority Inbox (based on Android's documentation)
- ^mf: unknown
- ^np: unknown
- ^ns: unknown
- ^bsm: unknown
- ^op: messages that were automatically marked as phishing attempts
- ^os: messages that were automatically marked as spam
- ^vm: Google Voice voicemails (equivalent to is:voicemail)
- ^pop: unknown, seems to match some (very old) messages that I imported via POP
- ^ss_sy, ^ss_so, ^ss_sr, ^ss_sp, ^ss_sb, ^ss_sg, ^ss_cr, ^ss_co, ^ss_cy, ^ss_cg, ^ss_cb, ^ss_cp: Superstar stars
- ^sl_root, ^smartlabel_promo, _receipt, _travel, _event, _group, _newsletter, _notification, _personal, _social and _finance: Smart Labels
- ^io_im: important messages (equivalent to is:important)
- ^io_imc1 through ^io_imc5, ^io_lr: unknown, possibly more degrees of importance (“Info Overload” was the project that resulted in the importance filtering)
- ^clu_unim: unknown, possibly unimportant messages
- ^unsub and ^hunsub: messages where an unsubscribe link has been detected (when marking one as spam, the “In addition to marking this message as spam, you can unsubscribe...” dialog appears). ^unsub seems to be for messages where there's an unsubscribe link you have to click while ^hunsub is for ones where Gmail offers to unsubscribe on your behalf.
- ^cff: sender is in a Google+ circle (equivalent to has:circle)
- ^sps: unknown (no matches in my account, but it was referenced in the JavaScript next to ^p, if I had to guess I would say it's something related to spear phishing)
- ^p_esnotif: Google+ notifications ("es" presumably being "Emerald Sea", Google+'s code name)
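For anyone who wants to repeat the hunt, the two discovery techniques mentioned before the list (scanning JavaScript for strings starting with ^, and a dictionary attack on two-letter names) can be sketched in Python roughly as follows. The regular expression, the function names and the sample string are all illustrative assumptions of mine; the sketch only produces candidate names and search queries, and each query still has to be tried in Gmail's search box by hand.

```python
import re
from itertools import product
from string import ascii_lowercase

# A caret followed by a lowercase letter, then letters, digits, "_" or "-".
# Caveat: "^" is also an operator and a regex anchor in JavaScript, so a
# scan of real source will include false positives to weed out by hand.
LABEL_PATTERN = re.compile(r"\^[a-z][a-z0-9_-]*")


def labels_in_source(source):
    """Return the distinct caret-prefixed names found in a blob of text."""
    return sorted(set(LABEL_PATTERN.findall(source)))


def two_letter_candidates():
    """Generate every candidate from ^aa through ^zz."""
    return ["^" + a + b for a, b in product(ascii_lowercase, repeat=2)]


def search_queries(labels):
    """Turn label names into Gmail searches of the form label:^xx."""
    return [f"label:{label}" for label in labels]


if __name__ == "__main__":
    # Made-up snippet standing in for a chunk of downloaded JavaScript.
    sample = 'var l = ["^smartlabel_group", "^ss_sy", "^u"]; if (x == "^u") {}'
    print(labels_in_source(sample))

    candidates = two_letter_candidates()
    print(len(candidates), search_queries(candidates[:3]))
```

Any candidate whose search returns messages is worth a closer look; the ones with no obvious pattern are what ended up as "unknown" in the list above.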
- Of course, in deciding to automate this task, I doomed myself to spend more time than I would have if I'd just gone through the messages by hand.
- It's somewhat interesting to see how features that were developed later (like Smart Labels, with ^smartlabel_group) use longer system label names than ones of medium age (like Superstars, with ^ss_sy), which are in turn longer than the original system labels (^u for unread, etc.). Bytes 10 years ago were clearly more precious.
16 Comments
Since you're so skilled on the Gmail label topic... do you know a method to filter for messages without any label? I searched without success for months...
Ciao.
Ivano
^pop also finds a lot of messages in my account, including very recent ones. The sent messages with this label are those I sent through desktop Outlook. Not sure about the ones I received that have the label...
I'm making a spreadsheet: https://docs.google.com/spreadsheets/d/1BS8yazyPcfqbWMG2jQb8HvPvCvQsDNQ3tsp_pBh5P6Q/edit#gid=0
I hope you don't mind me using your discoveries this way?
I think I've discovered some more labels' meanings, I'll try to find some new labels too.
E.g. ^iim appears to be a Priority Inbox label.