Yet more Alien_Txtbase logs shared

DAK · April 27, 2025

TL;DR: There has been plenty of discussion of the stealer log data dump known as Alien_Txtbase, including an analysis by Specops Software on March 27, 2025; you can use that writeup as a point of comparison for the new data. Before Breach Forums was taken down yet again, a forum member offered a batch of new records totalling about 126m rows. The post did not explicitly describe this as more Alien_Txtbase data; however, the files followed the same naming convention and carried the same Alien_Txtbase header as previous releases. Below, we analyze the data to gauge how real the threat is, and discuss the records therein.

The offending Breach Forums post where the data was listed

The files carry the following header advertising the Telegram channel, as the intent of these datasets is to drive paying traffic toward closer-to-real-time log purchases:

|=====================================================================================|
|      ___   _     _____ _____ _   _   _______   _____________  ___   _____ _____     |
|     / _ \ | |   |_   _|  ___| \ | | |_   _\ \ / /_   _| ___ \/ _ \ /  ___|  ___|    |
|    / /_\ \| |     | | | |__ |  \| |   | |  \ V /  | | | |_/ / /_\ \\ `--.| |__      |
|    |  _  || |     | | |  __|| . ` |   | |  /   \  | | | ___ \  _  | `--. \  __|     |
|    | | | || |_____| |_| |___| |\  |   | | / /^\ \ | | | |_/ / | | |/\__/ / |___     |
|    \_| |_/\_____/\___/\____/\_| \_/   \_/ \/   \/ \_/ \____/\_| |_/\____/\____/     |
|                                                                                     |
|                JOIN TELEGRAM TXTBASE:                                               |
|                JOIN TELEGRAM TXTBASE:            SNIP                               |
|                JOIN TELEGRAM TXTBASE:                                               |
|          _________________________________________________________________          |
|               ▼ BUY PRIVATE SUBSCRIPTION ON OUR 7/24 ONLINE SHOP BOT ▼              |
|                                                                                     |
|                                                                                     |
                                         SNIP

So, uh, attribution isn’t overly hard to determine.

The Delta

As we are interested in the passwords and the patterns therein, we split the passwords off the records into their own file. Most records are in the format url:username:password, with some in username:password:url order; we simply discard the latter for speed of processing.
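A minimal Python sketch of that split, assuming a record is url:username:password when it begins with a URL scheme, and that the password is the last colon-separated field and contains no colon of its own. Anything else, including username:password:url records, is discarded. The function names and file paths are hypothetical, not the tooling used for this analysis.

```python
import re
from typing import Optional

# Assumption: url:username:password records start with a URL scheme such as
# "https://" or "android://"; username:password:url records do not, so they are skipped.
SCHEME = re.compile(r"^[a-z][a-z0-9+.-]*://", re.IGNORECASE)


def extract_password(record: str) -> Optional[str]:
    """Return the password from a url:username:password record, or None to discard it."""
    record = record.rstrip("\r\n")
    if not SCHEME.match(record):
        return None
    # Split from the right: the URL may itself contain colons (scheme, port),
    # but we assume the username and password do not.
    parts = record.rsplit(":", 2)
    if len(parts) != 3 or not parts[2]:
        return None
    return parts[2]


def split_passwords(in_path: str, out_path: str) -> int:
    """Write one password per line to out_path and return how many were kept."""
    kept = 0
    with open(in_path, encoding="utf-8", errors="replace") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for record in src:
            password = extract_password(record)
            if password is not None:
                dst.write(password + "\n")
                kept += 1
    return kept
```

For example, `split_passwords("new_records.txt", "new_passwords.txt")` (hypothetical paths). Splitting from the right trades correctness on passwords containing colons for speed, which is in keeping with discarding the oddly ordered records.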

Since the data is clearly from the same source, we took a delta between the previous Alien_Txtbase dataset and these new 126m records, resulting in a count of 51,571,780: roughly 51.6m passwords that were not in the previous release as ingested by HIBP. This lets us discuss only the new records.
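The delta can be sketched as a set-membership filter. This is a minimal sketch, assuming both releases have already been reduced to one password per line; the writeup doesn’t say whether duplicates within the new release were kept, so the sketch preserves them, and `len(set(...))` gives the unique count. The function names are hypothetical.

```python
from typing import Iterable, Iterator, List


def delta(previous: Iterable[str], current: Iterable[str]) -> List[str]:
    """Return passwords from `current` that never appear in `previous`.

    Order and duplicates within `current` are preserved; wrap the result in
    set() if only unique passwords should be counted.
    """
    seen_before = set(previous)  # holds the whole previous release in memory
    return [password for password in current if password not in seen_before]


def read_lines(path: str) -> Iterator[str]:
    """Yield one password per line, without the trailing newline."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            yield line.rstrip("\n")
```

At this scale an in-memory set needs a lot of RAM; sorting both files and running `comm -13 old_sorted.txt new_sorted.txt` (with the same `LC_ALL=C` locale for `sort` and `comm`) is the usual disk-based alternative, printing only lines unique to the second file.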

The Base Words

Base word     Count
qq.com        25129
guruku.id     12939
user          10666
gmail.com     6197
alex          5965
admin         5915
aruba.it      5310
ahmed         4919
daniel        4246
david         3951

We can disregard a couple of these entries, such as the email domains qq.com and gmail.com, as artifacts of the mangled formatting of credential lists posted on forums such as Breached; they’re always clobbered to shit. Once you disregard those, you see a pretty standard set of basewords (a baseword is what remains once the numbers and special characters are stripped off; i.e. a baseword of admin could come from the password admin123). Nothing exciting here: it’s stealer logs, people are garbo at generating their own memorable passwords, and you get what you see here.
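A minimal sketch of baseword extraction and counting, assuming a baseword is the lowercased password with any leading and trailing run of non-letters trimmed. Inner characters are kept, which would explain how a fragment such as 12345@qq.com yields the baseword qq.com. The three-character cutoff and the function names are assumptions, not necessarily what produced the table above.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Assumption: trim leading/trailing runs of anything that isn't ASCII a-z
# (after lowercasing); characters in the middle, such as "." or "@", are kept.
_LEADING = re.compile(r"^[^a-z]+")
_TRAILING = re.compile(r"[^a-z]+$")


def baseword(password: str) -> str:
    """Lowercase the password and strip leading/trailing digits and symbols."""
    word = password.lower()
    word = _LEADING.sub("", word)
    return _TRAILING.sub("", word)


def top_basewords(passwords: Iterable[str], n: int = 10, min_len: int = 3) -> List[Tuple[str, int]]:
    """Count basewords, ignoring ones shorter than min_len (an assumed cutoff)."""
    counts: Counter = Counter()
    for password in passwords:
        base = baseword(password)
        if len(base) >= min_len:
            counts[base] += 1
    return counts.most_common(n)
```

For example, `baseword("admin123")` returns `"admin"`, matching the example in the text.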

Password Lengths

Length   Count (percentage of new passwords)
10       4702923 (9.12%)
11       3612608 (7.01%)
8        3516211 (6.82%)
9        3399076 (6.59%)
12       2746051 (5.32%)
22       2246982 (4.36%)
23       2159239 (4.19%)
21       2142296 (4.15%)
13       2141544 (4.15%)
20       2041611 (3.96%)
24       2012335 (3.90%)
7        1948241 (3.78%)

Generally, everything longer than 20 characters results from the aforementioned clobbered formatting ramming email addresses into the dataset. So, as is tradition: as long as you’re using sufficiently long passwords and following NIST 800-63B, the risk of re-use for organizations is minimal.
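The percentages in the length table appear to be relative to the 51,571,780 new passwords; for example, 4,702,923 / 51,571,780 ≈ 0.0912, i.e. 9.12%. A minimal sketch of that computation, plus the share of passwords at or above a given minimum length, assuming length means Python characters (code points), not bytes. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def length_distribution(passwords: Iterable[str], top: int = 12) -> List[Tuple[int, int, float]]:
    """Return (length, count, percentage of all passwords) for the most common lengths."""
    counts = Counter(len(password) for password in passwords)
    total = sum(counts.values())
    if not total:
        return []
    return [
        (length, count, round(100 * count / total, 2))
        for length, count in counts.most_common(top)
    ]


def share_at_least(passwords: Iterable[str], min_length: int) -> float:
    """Percentage of passwords with at least min_length characters."""
    lengths = [len(password) for password in passwords]
    if not lengths:
        return 0.0
    long_enough = sum(1 for length in lengths if length >= min_length)
    return round(100 * long_enough / len(lengths), 2)
```

Because the 20+ character bucket is largely polluted by email addresses, `share_at_least` run on this raw data would overstate how many genuinely long passwords there are.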

Character Set Distribution

Character set            Count (percentage)
loweralphaspecialnum     11685903 (22.66%)
loweralphaspecial        10055146 (19.50%)
loweralphanum            7123899 (13.81%)
numeric                  5905319 (11.45%)
loweralpha               5322840 (10.32%)
mixedalphanum            2751838 (5.34%)
mixedalpha               2273913 (4.41%)
mixedalphaspecialnum     2147456 (4.16%)
mixedalphaspecial        1203950 (2.33%)
upperalphanum            1163896 (2.26%)
specialnum               902512 (1.75%)
upperalpha               420664 (0.82%)
upperalphaspecialnum     304816 (0.59%)
upperalphaspecial        155093 (0.30%)

The complexity distribution isn’t phenomenal. For example, loweralphaspecialnum covers a password such as password123!, with no mixed casing; combined with the length distribution, it’s not an amazing look. This is a natural side effect of human nature: people create simple, easy-to-remember passwords. This is why it’s so important to enforce commonly agreed-upon password complexity standards and lengths; see NIST 800-63B, while it still exists and before NIST gets completely dismantled.
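A minimal sketch of the character-set classification, reproducing the category names used in the table. It assumes "special" means any character that isn’t an ASCII letter or digit (punctuation, spaces, non-ASCII characters); the table has no special-only row, but the sketch still names that case. Function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def charset(password: str) -> str:
    """Name the character classes present, e.g. 'loweralphaspecialnum'."""
    has_lower = any("a" <= c <= "z" for c in password)
    has_upper = any("A" <= c <= "Z" for c in password)
    has_digit = any("0" <= c <= "9" for c in password)
    # Assumption: anything that isn't an ASCII letter or digit counts as special.
    has_special = any(
        not ("a" <= c <= "z" or "A" <= c <= "Z" or "0" <= c <= "9") for c in password
    )

    if has_lower and has_upper:
        alpha = "mixedalpha"
    elif has_lower:
        alpha = "loweralpha"
    elif has_upper:
        alpha = "upperalpha"
    else:
        alpha = ""

    if alpha:
        return alpha + ("special" if has_special else "") + ("num" if has_digit else "")
    if has_special and has_digit:
        return "specialnum"
    if has_digit:
        return "numeric"
    if has_special:
        return "special"
    return "empty"


def charset_distribution(passwords: Iterable[str]) -> List[Tuple[str, int, float]]:
    """Return (category, count, percentage) sorted from most to least common."""
    counts = Counter(charset(password) for password in passwords)
    total = sum(counts.values())
    if not total:
        return []
    return [(name, count, round(100 * count / total, 2)) for name, count in counts.most_common()]
```

For example, `charset("password123!")` returns `"loweralphaspecialnum"`, the example given in the text.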

Some Domain Samples

For curiosity’s sake, we’ll take some samples of the impacted domains to highlight which organizations should be concerned about rotating credentials.

Domain               Count
irs.gov              5378
pornhub.com          52313
proton.me            25004
onlyfans.com         33910
ashleymadison.com    7618

Since stealer logs pull their content from browsers’ saved credential stores, the involved domains obviously trend towards, erm, consumer-facing sites. You’ll see odd pockets of government, financial, and so on, where a user with poor security posture saved a work account’s credentials; but generally it’s consumer, which also makes it a little spicy. This does, however, limit most of the impact to reasonably inconsequential personal accounts, which is great news.
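A minimal sketch of counting records per domain from the url field of each url:username:password record, assuming a hostname counts toward a domain when it equals that domain or is a subdomain of it (so www.irs.gov counts toward irs.gov). The writeup doesn’t say exactly how its counts were matched; the watchlist argument and function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Sequence
from urllib.parse import urlsplit


def hostname(url: str) -> Optional[str]:
    """Lowercased hostname of a URL, or None if it can't be parsed.

    URLs without a scheme (e.g. "irs.gov/login") have no hostname here and are skipped.
    """
    try:
        return urlsplit(url.strip()).hostname
    except ValueError:  # e.g. malformed IPv6 literals
        return None


def belongs_to(host: str, domain: str) -> bool:
    # Match the domain itself or any subdomain, but not lookalikes such as "notirs.gov".
    return host == domain or host.endswith("." + domain)


def count_domains(urls: Iterable[str], watchlist: Sequence[str]) -> Counter:
    """Count how many URLs fall under each watched domain."""
    counts: Counter = Counter()
    for url in urls:
        host = hostname(url)
        if not host:
            continue
        for domain in watchlist:
            if belongs_to(host, domain):
                counts[domain] += 1
                break
    return counts
```

The suffix check with a leading dot is the design choice that matters: a plain `endswith(domain)` would wrongly count lookalike hosts toward the watched domain.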

Conclusion

This final drop of Alien_Txtbase Telegram data to Breach Forums follows the same patterns as the big release that hit the media: largely consumer-facing sites, simply by virtue of where the data comes from, and thankfully largely inconsequential to corporate environments.

It is worth noting that with the closure of Breach Forums the threat is not gone; it’s simply cast into the wind to find other forums on which to share data. More on that later.

For an organization or application that follows NIST 800-63B as it should, enforces complexity and a 12+ character minimum length, and (hopefully) screens against a breached-password corpus, re-use of these common passwords is simply dodged. Strong MFA (preferably multiple factors, à la Specops Authentication) should be enabled where possible, and users should be taught never to share a multi-factor code with another user or their service desk.
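The writeup doesn’t prescribe a particular breached corpus; as an illustrative stand-in, this sketch combines the 12-character minimum with the k-anonymity scheme of the HIBP Pwned Passwords range API. Only the first five hex characters of the password’s SHA-1 hash are sent (GET https://api.pwnedpasswords.com/range/{prefix}), and the response lists matching hash suffixes as SUFFIX:COUNT lines. The HTTP request is left out so the sketch runs offline; `MIN_LENGTH` and the function names are hypothetical, and this is not how Specops’ products work.

```python
import hashlib
from typing import Tuple

MIN_LENGTH = 12  # hypothetical constant: the 12+ character minimum recommended above


def sha1_split(password: str) -> Tuple[str, str]:
    """Split the uppercase SHA-1 hex digest into the 5-char prefix sent to the API
    and the 35-char suffix that stays local."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def suffix_in_range(suffix: str, range_body: str) -> bool:
    """True if `suffix` appears with a non-zero count in a range response body."""
    for line in range_body.splitlines():
        candidate, _, count = line.strip().partition(":")
        # Padded responses include fake suffixes with a count of 0; ignore those.
        if candidate.upper() == suffix and count.strip() not in ("", "0"):
            return True
    return False


def password_acceptable(password: str, range_body: str, min_length: int = MIN_LENGTH) -> bool:
    """Reject passwords that are too short or that appear in the breached corpus.

    `range_body` must be the range response for this password's own hash prefix.
    """
    if len(password) < min_length:
        return False
    _prefix, suffix = sha1_split(password)
    return not suffix_in_range(suffix, range_body)
```

In practice the caller fetches the range for `sha1_split(password)[0]` with any HTTP client and passes the response text in; the full password and full hash never leave the machine.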

Same old poorly processed crap scraped from personal machines. Stop using browser password stores, and use a password manager such as Bitwarden instead.
