We know the city where HIV first emerged (bbc.com)
185 points by Perados on Nov 20, 2015 | 47 comments


Simian AIDS was discovered after human AIDS. How do they know the jump didn't occur in reverse?

Here is the first report of simian AIDS: "Examination of the species-specific annual mortality rates of macaques at the center during the previous 4 yr showed a significant increase in deaths in 1980 and 1981" http://www.ncbi.nlm.nih.gov/pmc/articles/PMC393899/

Also, it is convenient that they do not share the MCMC traces. Based on these settings, I bet they were ugly:

>"For each data set, at least 3 MCMC chains of 250 million steps were computed. Parameters and trees were sampled every 50,000th step. Samples were combined with LogCombiner (77) and between 10 to 30% of each MCMC chain was discarded as burn-in. MCMC mixing was diagnosed using visual trace inspection and calculation of effective sample sizes in Tracer (77). We report the posterior mean and 95% Bayesian credible intervals for evolutionary parameters."

http://dx.doi.org/10.1126/science.1256739

Another thing, they write:

"Our estimated location of pandemic origin explains the observation that Kinshasa exhibits more contemporary HIV-1 genetic diversity than anywhere else"

This is totally circular. As shown in their figure S2, they used genetic diversity in a region as an indication of earlier presence.


My limited understanding is that, once SIV was discovered, scientists found families of simian viruses whose evolution and divergence cannot be explained by a jump in the other direction or within the timeframe of human HIV.

See http://news.nationalgeographic.com/news/2006/05/060525-aids-... http://www.ncbi.nlm.nih.gov/m/pubmed/8026477/#

There is a video documentary that I watched in the last couple years that I was really hoping to find again and link to.


Thanks. I wonder if this analysis will be reproducible though (they don't mention any parameters...):

>"Phylogenetic analysis: Phylogenetic relationships were estimated from comparisons of predicted protein sequences. Sequences were aligned using CLUSTAL (Higgins and Sharp, 1988, 1989). Evolutionary distances between all pairs of sequences were computed using Kimura's empirical method (Kimura, 1983; eqn. 4.8) to estimate the number of superimposed amino acid replacements; sites at which there was a gap in any sequence in the alignment were excluded from all comparisons. Phylogenetic relationships were estimated from these distances by the neighbor-joining method (Saitou and Nei, 1987). The reliability of branching orders was estimated by the bootstrap approach (Felsenstein, 1985). These methods were implemented using CLUSTAL V (Higgins et al., 1992).

Nucleotide accession numbers: All sequences were submitted to GenBank and are available under accession numbers U03994-U04018."

It looks like those sequences are a mixture of the various proteins (env, pol, etc.), so it would take some time to figure out which is which. I don't feel like it right now. It seems like getting the sequences from GenBank and then aligning them with ClustalW using default parameters would be closest to their method. If someone wants to do it: http://www.ncbi.nlm.nih.gov/genbank/ http://www.genome.jp/tools/clustalw/
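
For anyone attempting it, the distance step in the quoted methods can be sketched roughly as below. This is only an illustration: it assumes Kimura's empirical correction takes the commonly cited form d = -ln(1 - p - 0.2p^2), where p is the fraction of differing amino acid sites, and the function names and the toy alignment are made up for the example rather than taken from the paper.

```python
import math

def p_distance(seq_a, seq_b, gap_columns):
    """Fraction of differing sites, skipping columns gapped in any sequence."""
    sites = [(a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b))
             if i not in gap_columns]
    if not sites:
        raise ValueError("no ungapped sites to compare")
    return sum(a != b for a, b in sites) / len(sites)

def kimura_protein_distance(p):
    """Kimura's empirical correction (assumed form): d = -ln(1 - p - 0.2 p^2)."""
    inner = 1.0 - p - 0.2 * p * p
    if inner <= 0:
        raise ValueError("sequences too divergent for this correction")
    return -math.log(inner)

def distance_matrix(alignment):
    """Pairwise corrected distances for a dict of name -> aligned sequence."""
    names = sorted(alignment)
    length = len(alignment[names[0]])
    # Exclude every column that has a gap in any sequence, as the quote describes.
    gap_columns = {i for i in range(length)
                   if any(alignment[n][i] == "-" for n in names)}
    return {(x, y): kimura_protein_distance(
                p_distance(alignment[x], alignment[y], gap_columns))
            for x in names for y in names if x < y}

# Hypothetical toy alignment, not real sequence data.
toy = {"A": "MKV-LTA", "B": "MKVQLSA", "C": "MRV-LSA"}
matrix = distance_matrix(toy)
```

The resulting matrix is what a neighbor-joining step would take as input; the tree building and bootstrapping are left out here.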


> Simian AIDS was discovered after human AIDS. How do they know the jump didn't occur in reverse?

I'm not sure we (that is, humans) realized that SIV was a thing until HIV prompted us to look in those places. It's the classic example of not knowing what to look for until you've found it.


I agree, the time of discovery proves nothing. But I asked how they can tell either way since there is no data from earlier. However, I would think captive animals were monitored more closely for strange diseases than people in rural mid-1900s Africa.


You'd probably be wrong about that. Even today in most of these places, a captive animal that dies is just 'sick' and disposed of. Or it is often sold off as food if the symptoms aren't obvious, as would be the case with HIV. There is no functioning FDA in Central Africa.


This was at the "New England Regional Primate Research Center" though.


Thanks for the article link!

>"Our estimated location of pandemic origin explains the observation that Kinshasa exhibits more contemporary HIV-1 genetic diversity than anywhere else"

>This is totally circular. As shown in their figure S2, they used genetic diversity in a region as an indication of earlier presence.

It seems like they also used phylogenetic history. Diversity alone is not sufficient to identify the origin.

>A very high genetic diversity of HIV-1 has been reported, not only in Kinshasa and the north and south of the DRC (12, 13, 31, 32), but also in Brazzaville in the RC and, to a lesser extent, in the Mayombe area of RC near Pointe-Noire, all of which have been suggested as potential source locations of the pandemic (22, 33, 34). We therefore performed phylogeographic analyses of viruses collected in both the DRC and RC (table S1) and compared sequence sampling locations with phylogenetic history to formally test hypotheses concerning the location of ancestral viral lineages (30). Our analyses robustly place the spatial origin of the HIV-1 group M pandemic in Kinshasa [posterior probability (PP) = 0.99]

They could have used "consistent with" rather than "explains", though that may be a bit pedantic and debatable. And we wouldn't be in the sorry state of science education/trust if all researchers communicated like Neil deGrasse Tyson.


Well I know that information about genetic diversity was used to generate the phylogenetic tree since the same sequences are used to determine both, therefore there is data leakage. So I am sure it is circular, it is only a matter of how circular. I think totally circular.

Perhaps I am misunderstanding their model, but look at figure S2. It shows that the less closely related the sequences at a location are, the more likely that location is to be the originating region relative to the others. See how the leaves of the right tree corresponding to location B are more widely distributed (indicating diversity) than those for location A or C (which are more clustered within the tree)?


These seem to be fairly standard phylogeographic methods that are applied to a mostly uncontroversial dataset. It's basically ancestral state reconstruction where geographic locations are the traits being reconstructed: https://en.wikipedia.org/wiki/Ancestral_reconstruction#Trait... . Although there are concerns with inferring ancestral state using these sorts of methods, the fact that the authors had some historical data means that the inferred ancestral region is far more accurate than would be possible with extant data alone.

Several of the authors are also pretty hardcore Bayesian methods people in the field of phylogenetics, specifically as applied to the evolution of diseases. It's unlikely that their long MCMCs were due to some kind of coverup; it's possible that the large number of sequences made convergence more difficult, and/or that they wanted to be absolutely certain of their results by running their analyses longer than usual. These sorts of phylogeographic methods (especially with an asymmetric movement model and lots of different localities) tend to have likelihood surfaces with many ridges, making MCMC quite difficult for all but the most trivial datasets.


>"making MCMC quite difficult"

They thinned 250 million steps to 5,000 samples per chain (one every 50,000 steps), then it sounds like they dropped the first 500-1,500 on top of that post hoc (the 10-30% burn-in range gives an impression of manual tinkering).

Obviously this is going to bring up questions regarding convergence, so reviewers should have asked for the diagnostic charts. This would probably have been better suited to ABC:

https://en.wikipedia.org/wiki/Approximate_Bayesian_computati...
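
Taking the quoted settings at face value, the thinning arithmetic can be checked with a short sketch. The function name is made up for illustration, and treating burn-in as a fixed percentage of each chain's sampled states is an assumption, not something the paper spells out.

```python
def retained_samples(chain_length, sample_every, burn_in_percent):
    """Return (samples drawn, samples kept) for one chain.

    Assumes burn-in is discarded as a whole-number percentage of the
    thinned samples (an assumption made for this illustration).
    """
    total = chain_length // sample_every
    discarded = total * burn_in_percent // 100
    return total, total - discarded

CHAIN_LENGTH = 250_000_000   # steps per chain, from the quoted methods
SAMPLE_EVERY = 50_000        # thinning interval, from the quoted methods

for burn_in_percent in (10, 30):
    total, kept = retained_samples(CHAIN_LENGTH, SAMPLE_EVERY, burn_in_percent)
    print(f"burn-in {burn_in_percent}%: {total} sampled, {kept} kept per chain")
```

Under those assumptions each chain yields 5,000 sampled states, of which 3,500 to 4,500 survive burn-in, before the chains are combined.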


BEAST (which I assumed they used) is not very fast. 250 million generations on BEAST with the number of parameters they estimated and the size of their dataset would likely take about 2-4 weeks to run. Assuming they ran all 3 chains in parallel this is still a really long time to wait for an analysis. (edit: just checked, looks like they used 700+ sequences, so this is closer to 1-3 months of compute time)

The standard number of samples for phylogenetic analyses is 1,000-10,000 samples, trending towards the smaller end as the phylogenies get larger. Since they did 3 independent analyses and combined their samples, this is likely fine as long as they assessed convergence and ensured that all 3 chains had converged. Excluding burn-in is also quite typical, since the MCMC move operators for phylogenetics are not very good due to the high dimensionality of the search space. ABC is also challenging to work with due to the high dimensionality (though I admit I haven't worked much with these methods in phylogenetics).

Typically in this field it is not enough to say "you didn't run your MCMC chains for long enough" or anything like that. Of course the chains should be run for longer -- theoretically speaking they should be run for an infinite amount of time, but there are routine disagreements over whether 100 or 1000 effective samples are sufficient for phylogenetics. But unless there is a serious model inadequacy that the authors haven't addressed, there's typically no reason to nitpick about these sorts of things. The authors have covered a lot of their bases by trying a number of different models and priors, and I don't see any reason to doubt this particular study based on that.
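
To make "effective samples" concrete, here is a rough sketch of one simple autocorrelation-based estimate, ESS = N / (1 + 2 * sum of positive-lag autocorrelations), truncated at the first non-positive autocorrelation. This is a generic textbook-style estimator, not necessarily what Tracer computes, and the AR(1) toy chains and all function names are invented for the example.

```python
import random

def autocorrelation(xs, lag):
    """Sample autocorrelation of xs at the given lag."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    if var == 0:
        return 0.0
    cov = sum((xs[i] - mean) * (xs[i + lag] - mean)
              for i in range(n - lag)) / n
    return cov / var

def effective_sample_size(xs):
    """ESS = N / (1 + 2 * sum(rho_k)), stopping at the first rho_k <= 0."""
    n = len(xs)
    rho_sum = 0.0
    for lag in range(1, n // 2):
        rho = autocorrelation(xs, lag)
        if rho <= 0:
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)

def ar1_chain(n, phi, seed):
    """Toy stand-in for an MCMC trace: x_t = phi * x_{t-1} + noise."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0, 1)
        out.append(x)
    return out

well_mixed = ar1_chain(2000, 0.0, seed=1)   # independent draws
sticky = ar1_chain(2000, 0.95, seed=1)      # strongly autocorrelated draws

ess_well_mixed = effective_sample_size(well_mixed)
ess_sticky = effective_sample_size(sticky)
```

Both toy traces contain 2,000 samples, but the strongly autocorrelated one is worth far fewer independent draws, which is why the raw number of retained samples says little on its own.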


Disclaimer: I don't know anything about this stuff.

Would they be able to run these MCMC chains in a way that perhaps progressively renders results at greater and greater accuracy? Then they could get some initial "low resolution" results and they wouldn't necessarily have to wait for weeks, but the full data will eventually be available.


In principle, there would be nothing stopping them from looking at the output every step, or every n steps. I don't know if the software they used supports this though.
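
As a toy illustration of the idea (not of the software used in the paper), a sampler can report a running estimate every n steps while it keeps going. The sketch below uses a made-up one-dimensional Metropolis sampler targeting a standard normal; every name and setting in it is hypothetical.

```python
import math
import random

def log_target(x):
    """Log density of a standard normal, up to a constant (toy target)."""
    return -0.5 * x * x

def metropolis_running_mean(steps, report_every, step_size=1.0, seed=0):
    """Yield (step, running mean) every `report_every` steps."""
    rng = random.Random(seed)
    x = 5.0        # deliberately poor starting point
    total = 0.0
    for step in range(1, steps + 1):
        proposal = x + rng.gauss(0, step_size)
        log_ratio = log_target(proposal) - log_target(x)
        # Metropolis rule: accept with probability min(1, ratio).
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
        total += x
        if step % report_every == 0:
            yield step, total / step

for step, mean in metropolis_running_mean(20_000, 5_000):
    print(f"after {step} steps: running mean = {mean:.3f}")
```

The early reports are the "low resolution" results: they are available immediately but are still influenced by the starting point, and they tighten as the chain runs longer.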


It is not the number of retained samples being too low that concerns me; it is that they do not show the posteriors for the parameters used (which it sounds like may simply be the custom in this area). I would think that if people had such trouble using MCMC, common practice would be to publish diagnostic charts to reassure each other.


Radiolab has an episode that covers some of this. It's a pretty good listen: http://www.radiolab.org/story/169879-patient-zero/

As far as any circular reasoning, they're just reiterating the claim in a different way. It's saying "Based on X data, we deduce Y. Y would explain why X happened."

Direct download if you don't want the Flash audio player: http://50.31.154.42/radiolab/radiolab111411.mp3?downloadId=5...


Wikipedia links to an article stating that SIV has been present in monkeys and apes for at least 32000 years: http://www.nytimes.com/2010/09/17/health/17aids.html?_r=2&sr...


They get that date by assuming humans didn't bring it to the island, though; they don't do anything to rule that out:

>"Barring the possibility that humans introduced multiple species-specific SIV lineages to the wild monkey populations of Bioko, the mainland and island SIVdrl variants must have been evolving independently since Bioko became isolated ~10,000 yr B.P., and perhaps longer given the high levels of genetic diversity seen within local SIV populations."

http://www.sciencemag.org/cgi/content/abstract/329/5998/1487

Interesting stuff. I hadn't thought of this until now, but from these few papers I don't think anyone has tried very hard to rule out the human to simian transmission idea.


On the other hand, SIV isn't killing the monkeys or apes it infects, which makes a long coevolution likely.


Chimps and gorillas do get AIDS from SIV. IIRC they're relatively recent infections (though older than human HIV), and SIVgor actually descends from SIVcpz. Rhesus macaques also get AIDS from SIV (SIV was originally discovered in AIDS-suffering macaques).


>" longer given the high levels of genetic diversity seen within local SIV populations."

That is them ruling out that it came from humans.


> Simian AIDS was discovered after human AIDS. How do they know the jump didn't occur in reverse?

Obviously, just as humans are higher than monkeys on the Great Chain of Being, so must HIV be higher than SIV. ;)


There's an excellent Radiolab Podcast that covers the same study, I'd highly recommend listening to it:

* http://www.radiolab.org/story/169879-patient-zero/


There's also an updated version which briefly covers the recent Ebola outbreak: http://www.radiolab.org/story/patient-zero-updated/


Thanks! I didn't know about the updated one. The original is one of my favorite episodes.


Same here, although the description of chimps hunting was a bit disturbing.


It's a morbid irony that HIV emerged from "Leopoldville" considering the other atrocities unleashed on Africa by Leopold II.


https://en.wikipedia.org/wiki/Leopold_II_of_Belgium

If anyone else is curious like I was.


Wow, on the order of 10 million dead - half the population. That's Hitler-level genocide right there.


Workers who were short of the quotas might have their hands severed. Leopold II is a forgotten monster. Arguably it birthed the international human rights movement.


tl;dr, Kinshasa, Democratic Republic of the Congo (then known as Leopoldville)


There's actually more content in the article. It's far from click-bait.

After reading it I think that the exact city is the least interesting fact they have to offer.


Click-bait titles don't mean the article is crap. It's a good article with a clickbait title, plain as that. Clickbait is when some information is left out of the title in an obvious way such that readers will click in pursuit of that piece of information, rather than clicking it in pursuit of expanding on the headline.


I don't disagree. The story is interesting. I just realized that anyone who read the headline would want to know what city. I probably came across snarkier than intended.


I believe you, but the name of the city could still be included in the title, at least.


Thanks.


The book "The history of AIDS" was a fascinating and accessible read on the subject of the spread of HIV, from its origins to today.


Definitely, a highly recommended book. I learned of it from Quammen's "Spillover"[0] (a great, if somewhat terrifying, book in its own right), which mentions it as a source in the AIDS chapter, and it was a fantastic read. Note that Quammen released his own AIDS-history book this year, "The Chimp and the River"[1] (I haven't read it yet, though I intend to).

Direct link to Jacques Pépin's "The Origin of AIDS" on amazon: http://www.amazon.com/dp/0521186374

[0] http://www.amazon.com/dp/0393346617/

[1] http://www.amazon.com/dp/0393350843/


Is HN becoming "spammy"? In this case, would it hurt to include the name of the city in the title?


It's not really clear to me:

Are they saying that the jump happened in a large metropolis?

Or that Kinshasa was the first large city in which HIV became an epidemic?


The first large city.


Belgian control of the Congo was rife with pretty extreme abuse -- the obvious, often horrifying consequences of conquest. But the influx of ambitious businesspeople and capital, which they claim started the wider outbreak of HIV, is a curious case. Often people will hand-wring about vague cultural factors, but this seems like one of the most concrete major international disasters caused by rapid gentrification and international investment.

In many ways, whether you support e.g. more direct foreign investment areas in India and China or not, I'd prefer the conversations about costs to be as well formulated as this (I also suspect free trade and foreign investment in China and India has done a lot more good than HIV has done bad, even if you just count lives saved).


I think that any comparison of the Belgian rape of the Congo in the 1900s with rapid gentrification and international investment is completely unwarranted.


_rapid gentrification_ of Congo

whuhh???

Just 'cause it had a CIA-sponsored coup doesn't mean things got any better.


This would be the first time I've seen someone suggest gentrification makes things better.


Kinshasa, the capital of the Democratic Republic of Congo.


tl;dr - Kinshasa.



