Bug 917609 - app-text/calibre-6.29.0 phones home
Summary: app-text/calibre-6.29.0 phones home
Status: UNCONFIRMED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Zac Medico
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-11-20 02:21 UTC by Mark Harmstone
Modified: 2024-01-20 17:56 UTC
CC List: 4 users

See Also:
Package list:
Runtime testing required: ---


Description Mark Harmstone 2023-11-20 02:21:58 UTC
Calibre downloads https://code.calibre-ebook.com/latest on startup, ostensibly to check for the latest version, with no option to turn this off. I think this behaviour probably ought to be behind a "telemetry" USE flag, if not disabled entirely.

Reproducible: Always
Comment 1 John Helmert III archtester Gentoo Infrastructure gentoo-dev Security 2023-11-20 02:32:46 UTC
If it's supposed to go behind a use flag, and upstream doesn't have a way to turn it off, what is the USE supposed to do? Have you asked upstream about adding such a toggle?
Comment 2 Ionen Wolkens gentoo-dev 2023-11-20 02:41:29 UTC
From a quick grep there's:

    parser.add_option('--no-update-check', default=False, action='store_true',
            help=_('Do not check for updates'))

And the check is not really useful when updates are handled by the package manager. Aka it should ideally just be always disabled rather than a USE.

For kitty (same upstream) there *is* a build-time switch to disable the update checking, and I recall that a few months ago it was disabled by default for anything but official builds. Guess calibre has been handling this differently though.
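
For reference, a minimal sketch of how an option declared like the `--no-update-check` one quoted above behaves. This is not calibre's code. It uses the standard library `optparse` module, whose `add_option` call has the same shape as the grepped snippet. The `make_parser` helper and the stand-in `_` function are assumptions made for illustration, and nothing here says which calibre program accepts the flag.

```python
from optparse import OptionParser


def _(text):
    # Stand-in for the gettext translation function used in the quoted snippet.
    return text


def make_parser():
    # Hypothetical helper: builds a parser carrying only the quoted option.
    parser = OptionParser()
    parser.add_option('--no-update-check', default=False, action='store_true',
                      help=_('Do not check for updates'))
    return parser


parser = make_parser()

# optparse derives the attribute name from the long option:
# '--no-update-check' becomes opts.no_update_check.
opts, args = parser.parse_args(['--no-update-check'])
print(opts.no_update_check)

# Without the flag, the declared default (False) applies.
opts, args = parser.parse_args([])
print(opts.no_update_check)
```

Because the default is False, the check stays on unless the flag is passed on every launch, which is why a packaging-level default would be needed to turn it off for all users.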
Comment 3 Eli Schwartz 2023-11-20 02:45:07 UTC
Historically, the packaging has copied, or I should say, cargo-culted, a debian patch. The comment in the ebuild:

# Don't prompt the user for updates - they've installed via an ebuild.

Unfortunately, this change also prevented prompting the user for plugin updates, and failing to prompt for plugin updates is a robustness issue, leaves users prone to bugs, and is also, potentially, a security vulnerability depending on just what kind of outdated plugins you have. (Not sure how likely this actually is.)

Checking the latest version on startup isn't inherently a sketchy thing to do. It's not the most useful thing ever, when you have multiple channels to be notified of updates (the other being portage), but it's not particularly harmful. And the update notification includes a checkbox to not nag you about this ever again. (It will still let you know when plugins can be updated.)

I'm pretty opposed to turning off this check if it has *any* influence on plugins. The original patch, being copied from Debian, had no qualms about enforcing a negative user experience for plugins, mostly because the debian developer that wrote the original patch didn't like plugins and figured they were "security vulnerabilities" because they didn't use the https://www.pling.com/ protocol for downloading plugins (I do not know why on earth said debian developer felt so passionately in support of this).



> I think this behaviour probably ought to be behind a "telemetry" USE flag, if not disabled entirely.

Can you elaborate in greater detail why you rate this as a telemetry concern?
Comment 4 Mark Harmstone 2023-11-20 02:45:53 UTC
I sent Kovid a polite message about this, and he sent me an extremely unprofessional response a few minutes later, calling me a moron amongst other things. So he's not sympathetic (in either sense of the word).

From src/calibre/gui2/update.py:
    headers={
        'CALIBRE-VERSION':__version__,
        'CALIBRE-OS': ('win' if iswindows else 'osx' if ismacos else 'oth'),
        'CALIBRE-INSTALL-UUID': prefs['installation_uuid'],
        'CALIBRE-ICON-THEME': icon_theme_name,
    }

It does more than check for an update: it also sends the version, OS, icon theme, and a unique installation ID.
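
To make the header set above concrete, here is a minimal sketch that builds (but never sends) a request carrying the same four headers. It uses the standard library `urllib.request` purely for illustration and does not claim calibre issues the request this way. The `build_update_request` helper and all placeholder values are assumptions; only the URL and the header names come from this bug.

```python
import urllib.request
import uuid


def build_update_request(url, version, install_uuid, icon_theme, os_tag="oth"):
    # Hypothetical helper mirroring the header dict quoted from update.py.
    headers = {
        "CALIBRE-VERSION": version,
        "CALIBRE-OS": os_tag,
        "CALIBRE-INSTALL-UUID": install_uuid,
        "CALIBRE-ICON-THEME": icon_theme,
    }
    # Constructing a Request object does not open a network connection.
    return urllib.request.Request(url, headers=headers)


# Placeholder values: the real ones come from calibre's own state.
req = build_update_request(
    "https://code.calibre-ebook.com/latest",
    version="6.29.0",
    install_uuid=str(uuid.uuid4()),  # stands in for prefs['installation_uuid']
    icon_theme="default",            # stands in for icon_theme_name
)

# urllib normalises header capitalisation (e.g. "Calibre-version");
# HTTP header names are case-insensitive, so this is cosmetic.
for name, value in req.header_items():
    print(f"{name}: {value}")
```

The server also sees the client's IP address on any such request, independent of these headers, which is what the later discussion about linking IPs to the UUID turns on.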
Comment 5 Eli Schwartz 2023-11-20 02:47:38 UTC
And have you read the additional details at https://calibre-ebook.com/dynamic/calibre-usage ?
Comment 6 Ionen Wolkens gentoo-dev 2023-11-20 02:49:59 UTC
Ah, if it has an impact on plugins guess I'll take back what I said then.
Comment 7 Mark Harmstone 2023-11-20 02:53:18 UTC
(In reply to Eli Schwartz from comment #5)
> And have you read the additional details at
> https://calibre-ebook.com/dynamic/calibre-usage ?

What do you mean, that he promises not to do anything nefarious?

IIRC collecting even anonymized data without an opt-in breaches the GDPR in the UK and EU.
Comment 8 Jesse Adelman 2023-12-05 19:09:10 UTC
(In reply to Mark Harmstone from comment #7)
> (In reply to Eli Schwartz from comment #5)
> > And have you read the additional details at
> > https://calibre-ebook.com/dynamic/calibre-usage ?
> 
> What do you mean, that he promises not to do anything nefarious?
> 
> IIRC collecting even anonymized data without an opt-in breaches the GDPR in
> the UK and EU.

I believe Kovid Goyal and the other Calibre contributors have good intentions, but not having an opt-in for centralized data collection is not in the spirit (or the letter, perhaps, but IANAL) of the GDPR, or general privacy-focused practices.

I'd even question how much a "randomly generated ID" as the "calibre-usage" page says that *never expires* is really much of a protection at all. I mean, the threat model is relevant here - but if their database is exposed, then basically every installation of Calibre can be linked back to the data collected anyway if a person's computer is seized, if I'm understanding their model correctly.

Can Gentoo do anything about this without the upstream creating hooks to disable this without damaging plugin updates? Probably not without a lot of labor.

I think even Gentoo had some debate about this sort of data collection in its distant past - the "Gentoo Stats" project, IIRC?

Cheers.
Comment 9 Zac Medico gentoo-dev 2023-12-05 20:18:54 UTC
If we patched out the headers shown in comment #4, it seems like it would just request the version from https://code.calibre-ebook.com/latest and operate normally.

The headers seem pretty harmless since they only link an anonymous CALIBRE-INSTALL-UUID to some seemingly benign data. I suppose in the worst case it might be used to track a person's location though, if you were somehow able to decrypt the transmissions and also link a person to a CALIBRE-INSTALL-UUID. Maybe this is enough to convince Kovid to add an opt-in or opt-out for some or all of the headers.
Comment 10 Mark Harmstone 2023-12-05 20:31:07 UTC
> The headers seem pretty harmless since they only link an anonymous CALIBRE-INSTALL-UUID to some seemingly benign data. I suppose in the worst case it might be used to track a person's location though, if you were somehow able to decrypt the transmissions and also link a person to a CALIBRE-INSTALL-UUID. Maybe this is enough to convince Kovid to add an opt-in or opt-out for some or all of the headers.

Or if his servers get breached. I'm not comfortable with someone tracking which IP addresses I've been using. It was only by chance that I realized what he was doing.

I e-mailed Kovid, and he has zero interest in changing any aspect of this.
Comment 11 Eli Schwartz 2023-12-05 20:33:42 UTC
(In reply to Mark Harmstone from comment #7)
> What do you mean, that he promises not to do anything nefarious?


That is kind of the definition of every privacy policy, surely...


(In reply to Mark Harmstone from comment #7)
> IIRC collecting even anonymized data without an opt-in breaches the GDPR in
> the UK and EU.


I'm not sure what point you're trying to make here.

Is this GDPR violation something calibre upstream should be worried about? Is the GDPR concern, specifically, what you say you "sent Kovid a polite message about"?

Is this GDPR violation something *I*, as a distro maintainer for calibre, should be worried about? Personally, I'm not worried about it since a) I am not collecting your data, anonymized or otherwise, b) I don't live in the EU and don't plan on it in the future, so the local laws of a place that has no impact on my life do not matter to me, c) gentoo empowers you to place a patch in /etc/portage/patches if you would like to not get notifications for security updates in plugins.

...

Frankly, I feel like a decision about telemetry should be based on what's good for the package, not whether you can say "but the EU will charge you with a crime if you don't do like I say".

Which is exactly what it feels like to me when you start citing foreign laws at me because I *didn't* apply a non-default patch to software I didn't write.
Comment 12 Eli Schwartz 2023-12-05 20:34:52 UTC
(In reply to Jesse Adelman from comment #8)
> I'd even question how much a "randomly generated ID" as the "calibre-usage"
> page says that *never expires* is really much of a protection at all. I
> mean, the threat model is relevant here - but if their database is exposed,
> then basically every installation of Calibre can be linked back to the data
> collected anyway if a person's computer is seized, if I'm understanding
> their model correctly.


But that is I think the crux of the issue. Because calibre upstream claims that they don't store such information anyway. They only record the unique randomly generated IDs themselves, and even the IP addresses aren't actually stored, merely used to update the aggregate count of users per version/country/OS and then dropped. The unique ID is stored in order to prevent double-updating the stats.
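
A toy model of the policy as described above may help pin down what is and is not stored. This is only a sketch of the claim, not upstream's server code, which I have not seen. The `UsageStats` class, the `country_of` lookup, and the sample addresses (taken from the documentation-only IP ranges) are all assumptions.

```python
from collections import Counter


class UsageStats:
    """Toy model: keep bare UUIDs and aggregate counts, never IP addresses."""

    def __init__(self):
        self.seen_uuids = set()   # standalone IDs, with nothing attached to them
        self.counts = Counter()   # (version, country, os) -> number of installs

    def record(self, install_uuid, version, os_tag, ip_address, country_of):
        # The IP is used only for the country lookup and is not kept afterwards.
        country = country_of(ip_address)
        if install_uuid in self.seen_uuids:
            return False          # already counted: avoid double-updating
        self.seen_uuids.add(install_uuid)
        self.counts[(version, country, os_tag)] += 1
        return True


# Hypothetical geo lookup backed by a fixed table.
lookup = {"192.0.2.1": "GB", "198.51.100.7": "US"}.get

stats = UsageStats()
stats.record("uuid-a", "6.29.0", "oth", "192.0.2.1", lookup)
stats.record("uuid-a", "6.29.0", "oth", "198.51.100.7", lookup)  # same install, ignored
stats.record("uuid-b", "6.29.0", "win", "198.51.100.7", lookup)
print(stats.counts)
```

In this model a leaked database holds only the set of UUIDs and the counters, so there is nothing that ties a UUID to an address. A real server would also have to handle version upgrades, which this simplification ignores.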

If a person's computer is seized *and* the calibre server databases are seized, then someone could compare the randomly generated ID stored as a config key in ~/.config/calibre/global.py.json as "installation_uuid", to the server database. They can then find out... what? That it is one of 3 million UUIDs stored as floating, standalone records in the server database?
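
For anyone who wants to see the client-side value in question, a minimal sketch follows. It assumes the config file is plain JSON with a top-level "installation_uuid" key, as described above. The `read_installation_uuid` helper is hypothetical and not part of calibre.

```python
import json
from pathlib import Path


def read_installation_uuid(config_path):
    # Hypothetical helper: returns the stored ID, or None if the key is absent.
    path = Path(config_path).expanduser()
    with path.open(encoding="utf-8") as f:
        data = json.load(f)
    return data.get("installation_uuid")


if __name__ == "__main__":
    default = Path("~/.config/calibre/global.py.json").expanduser()
    if default.exists():
        print(read_installation_uuid(default))
    else:
        print("no calibre config found at", default)
```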


The only actual telemetry concern, IMO, is if the calibre usage statistics page is untruthful, and contrary to its claim the server is actually storing IP addresses per UUID. This does not matter if a person's computer is seized, because in that case whoever seizes it already knows much more about you than calibre can ever expose, but if only the server is seized then the IP addresses could be compared with other information leaks from, idk, social media accounts, to prove that a specific person uses calibre (and whether that person is a Windows, macOS, or Linux user).


(In reply to Jesse Adelman from comment #8)
> Can Gentoo do anything about this without the upstream creating hooks to
> disable this without damaging plugin updates? Probably not without a lot of
> labor.


Well, essentially it would involve creating those hooks and then not submitting them upstream as a patch.

For the record, I'm not interested in writing that patch myself but if upstream adds a hook to allow distros to disable that, then I'd be open to using the exposed build system option to do so.

It wouldn't actually stop calibre from connecting to the plugin index on every startup to check for updates (exposing)


(In reply to Jesse Adelman from comment #8)
> I think even Gentoo had some debate about this sort of data collection in
> its distant past - the "Gentoo Stats" project, IIRC?


Possibly, though I'm not familiar with the background.
Comment 13 Mark Harmstone 2023-12-05 20:46:21 UTC
Eli, I was saying "this is unethical, and also potentially illegal in some jurisdictions".

A leak of Kovid's database would mean that someone would be able to say which IP addresses were used by the same user. You don't need access to the user's home directory for this to be a problem.
Comment 14 Eli Schwartz 2023-12-06 13:30:53 UTC
(In reply to Mark Harmstone from comment #13)
> A leak of Kovid's database would mean that someone would be able to say
> which IP addresses were used by the same user. You don't need access to the
> user's home directory for this to be a problem.

It is important to qualify this statement.

After "would mean that", insert "if his privacy policy is lying about how much data is recorded in the database".

I acknowledge that your previously stated opinion on privacy policies of people that have been rude to you, is:

> [sarcastic tone of voice: vague and unprovable
> promises not to do anything nefarious?

I'm not (yet) convinced that that is enough data to conclusively prove there is a privacy policy violation problem here.

In the absence of a definite issue I would rather reduce (preferably to 0) the number of soft-forked local patches, rather than grow them.