
New Study Reveals Genetic Insights into Same-Sex Attraction

Author: Irina Zhuravleva, Soulmatcher
8 min read
Blog
October 6, 2025

Recommendation: within one week, make the polygenic score (PGS) a required covariate and a registered secondary outcome in active cohorts, including replication arms. Principal investigators should update consent language, genotype QC pipelines, and power calculations, targeting at least 80% power to detect effects that account for 1–3% of the observed variation.
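The 80% power target can be translated into an approximate sample size. The sketch below makes two assumptions. First, it assumes a roughly linear association between a standardized PGS and a continuous outcome. Second, it uses the Fisher z approximation for testing a single correlation at a two-sided α. The function name is hypothetical. A real power calculation should also account for covariates, ancestry strata, and any multiple-testing correction. Under these assumptions, R² = 1% (r = 0.1) at α = 0.05 needs roughly 780 participants, and larger effects need fewer.

```python
import math
from statistics import NormalDist


def n_for_variance_explained(r2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate N needed to detect a score-outcome correlation with R^2 = r2.

    Fisher z approximation for a two-sided test of a single correlation.
    """
    r = math.sqrt(r2)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    fisher_z = math.atanh(r)                       # Fisher transform of r
    return math.ceil(((z_alpha + z_beta) / fisher_z) ** 2 + 3)


# Effects accounting for 1-3% of observed variation, as in the recommendation.
for r2 in (0.01, 0.02, 0.03):
    print(f"R^2 = {r2:.0%}: N ~ {n_for_variance_explained(r2)}")
```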

In pooled analyses of several hundred thousand genotyped participants, heritable variation explained on the order of 1–3% of the variance in self-reported same-gender encounters. Individuals in the top decile of the score showed roughly 1.5–1.8 times higher odds of such encounters. This makes polygenic measures potentially informative for population-level models and for understanding developmental pathways. In some cohorts the effects emerged in adolescence and persisted across repeated assessments, but the signal never approaches deterministic levels for any individual.
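A top-decile odds ratio like the one above comes from comparing the top decile with everyone else in a 2×2 table. The sketch below computes the odds ratio with a Woolf (log-scale) confidence interval. The counts are invented for illustration and are not the study's data. The function applies no continuity correction, so a table with an empty cell raises an error.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, level=0.95):
    """Odds ratio with a Woolf (log-scale) confidence interval for a 2x2 table.

    a: top decile, reported encounters     b: top decile, none reported
    c: other deciles, reported encounters  d: other deciles, none reported
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of log(OR)
    z = NormalDist().inv_cdf(0.5 + level / 2)          # about 1.96 for a 95% interval
    lo = math.exp(math.log(or_) - z * se_log)
    hi = math.exp(math.log(or_) + z * se_log)
    return or_, lo, hi


# Invented counts for illustration only; not the study's data.
or_, lo, hi = odds_ratio_ci(a=600, b=9_400, c=3_500, d=86_500)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```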

When studying heritable variation, account explicitly for environmental confounders by including socioeconomic indices, sexual-history scores, and week-level diaries. Pre-register analytic plans, and report negative results and single-cohort failures in full. Apply ancestry-derived principal components, remove close relatives, and present sensitivity analyses showing how effect estimates change as covariates are added.
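Removing close relatives is commonly done by pruning pairs whose estimated kinship exceeds a cut-off. This minimal sketch assumes pairwise kinship coefficients have already been estimated, for example with a KING-style method. It uses 0.0884 as an illustrative threshold, which is the usual lower bound for second-degree relationships on that scale. It then greedily removes whoever appears in the most related pairs. The function name and data are hypothetical.

```python
from collections import Counter

# Illustrative KING-style cut-off: kinship above ~0.0884 is usually read as
# second-degree relatives or closer.
SECOND_DEGREE_CUTOFF = 0.0884


def prune_relatives(pairs, cutoff=SECOND_DEGREE_CUTOFF):
    """Return the set of IDs to remove so no retained pair exceeds `cutoff`.

    pairs: iterable of (id1, id2, kinship) tuples.
    """
    related = [(i, j) for i, j, k in pairs if k > cutoff]
    removed = set()
    while True:
        remaining = [(i, j) for i, j in related if i not in removed and j not in removed]
        if not remaining:
            return removed
        # Drop the person involved in the most remaining related pairs
        # (ties broken by ID so the result is deterministic).
        counts = Counter(x for pair in remaining for x in pair)
        worst, _ = max(counts.items(), key=lambda kv: (kv[1], kv[0]))
        removed.add(worst)


pairs = [("A", "B", 0.25), ("A", "C", 0.24), ("D", "E", 0.03)]
print(prune_relatives(pairs))
```

Greedy removal is a heuristic. It tends to retain more participants than dropping one member of each pair at random, but it is not guaranteed to find the smallest possible removal set.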

Operational steps: deploy harmonized pipelines (codenamed moses) for genotype processing, log encounter dates, compute per-participant score percentiles, and produce calibrated probability outputs that relate observed behaviour to polygenic variation. Teams that began without these steps report higher false-discovery rates. Adopt these measures to reduce bias and make cross-cohort comparisons interpretable.
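Per-participant score percentiles can be computed within a cohort before any calibration step. The sketch below uses mid-rank percentiles, which handle tied scores symmetrically, and maps them to deciles 1 to 10. This is one reasonable convention among several, and the scores are made up. Calibrated probability outputs would require a separate model fitted against observed outcomes.

```python
from bisect import bisect_left, bisect_right


def percentile_ranks(scores):
    """Mid-rank percentile (0-100) of each score within the cohort."""
    ordered = sorted(scores)
    n = len(ordered)
    ranks = []
    for s in scores:
        below = bisect_left(ordered, s)           # scores strictly lower
        ties = bisect_right(ordered, s) - below   # scores equal to s
        ranks.append(100 * (below + 0.5 * ties) / n)
    return ranks


def decile(pct):
    """Map a percentile in [0, 100] to a decile label 1..10."""
    return min(int(pct // 10) + 1, 10)


scores = [-1.2, 0.3, 0.8, -0.4, 2.1, 0.0, 1.5, -2.0, 0.6, 1.1]   # hypothetical PGS values
for s, p in zip(scores, percentile_ranks(scores)):
    print(f"score={s:+.1f} percentile={p:5.1f} decile={decile(p)}")
```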

Specific genetic associations reported and their measurable impact

Recommendation: limit use of polygenic scores to group-level research purposes and population assessment; do not use these metrics to predict individual attraction among adults.

Which DNA variants were linked and what are their reported effect sizes?

Recommendation: report per-variant odds ratios and variance explained, and present a polygenic score (PGS) rather than highlighting any single locus. This framing is useful to clinicians, researchers, and geneticists alike.

Reported lead signals were autosomal and sex-chromosome loci with very small per-allele effects. Most published top single-nucleotide polymorphisms (SNPs) had per-allele odds ratios of roughly 1.02–1.08. Individual SNPs explained 0.01%–0.05% of phenotypic variance (single-SNP R² on the observed scale), so genome-wide significant hits together account for 0.1%–1% of the variance in most cohorts. SNP-based heritability estimates varied across analyses, at approximately 8%–25% after liability-scale correction. PGS R² for continuous measures averaged roughly 1%–4%, depending on phenotype definition and sample.
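The liability-scale correction mentioned above can be written as a short function. This sketch uses the observed-to-liability transformation of Lee et al. (2011): h²_L = h²_O · K²(1 − K)² / (z² · P(1 − P)). Here K is the population prevalence, P is the proportion of cases in the sample, and z is the standard normal density at the threshold t with P(X > t) = K. The prevalence and case proportion in the example are placeholders, not values reported by the study.

```python
import math
from statistics import NormalDist


def observed_to_liability(h2_obs: float, K: float, P: float) -> float:
    """Convert observed-scale variance explained to the liability scale.

    h2_obs: variance explained on the observed 0/1 scale
    K: population prevalence of the binary trait
    P: proportion of cases in the analysed sample
    """
    std = NormalDist()
    t = std.inv_cdf(1 - K)   # liability threshold
    z = std.pdf(t)           # normal density at the threshold
    return h2_obs * (K * (1 - K) / z**2) * (K * (1 - K) / (P * (1 - P)))


# Placeholder inputs: 2% observed-scale R^2, 5% prevalence, unselected sample.
print(round(observed_to_liability(0.02, K=0.05, P=0.05), 4))
```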

Effects differed by cohort and ascertainment. Cohorts drawn from direct-to-consumer panels (larger and majority European) tended to yield more discoveries and slightly larger PGS R², while older population cohorts produced smaller per-SNP signals. Subgroup analyses of youths, Black participants, and Asian/Pacific Islander participants were underpowered and produced unstable point estimates. Differences in prevalence and social reporting point to non-genetic contributions to this variation in effects. Moses, Molnar, Newcomb, and Ryan are cited for cross-cohort comparisons. Others have shown that adjusting for routine covariates (age, principal components, and even trivial controls such as hair color in sensitivity checks) changes per-SNP estimates only marginally.

Practical reporting actions:
- Present the average per-allele OR and the total proportion of variance accounted for by multiple loci, reporting both observed-scale R² and liability-scale conversions.
- Include confidence intervals and a sum-of-squares decomposition showing how much of the total variance is explained.
- State explicitly that predictive utility for individuals is limited, and avoid deterministic language.
- Discuss ascertainment bias across populations.
- When possible, show PGS performance stratified by ancestry and age group to prevent misinterpretation.

Does the study sample reflect diverse ancestries and ages?

Recommendation: increase non-European recruitment to at least 30% of the analytic sample and implement age-stratified enrollment quotas so that each bin 18–29, 30–44, 45–59, 60+ contains a minimum 20% of participants.

The present sample (N = 35,412) shows heavy European overrepresentation: 82.1% European, 6.3% African, 4.8% East Asian, 3.9% Latinx, and 1.6% Indigenous/other. Median age is 33 years (IQR 26–45). Age bins: 18–29 = 42.0%, 30–44 = 34.1%, 45–59 = 15.9%, 60+ = 8.0%. Self-reported race/ethnicity categories were included as descriptive covariates. However, principal-component ancestry axes were not used to fully account for population structure across non-European groups.
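Checking the reported composition against the recruitment quotas recommended at the start of this subsection makes the shortfall concrete. The sketch below uses only the percentages reported above. The listed ancestry categories sum to 98.7%, so the computed non-European share (16.6%) counts only the listed non-European groups.

```python
# Sample composition reported above (percent of N = 35,412).
ancestry = {"European": 82.1, "African": 6.3, "East Asian": 4.8,
            "Latinx": 3.9, "Indigenous/other": 1.6}
age_bins = {"18-29": 42.0, "30-44": 34.1, "45-59": 15.9, "60+": 8.0}

NON_EUROPEAN_TARGET = 30.0   # recommended minimum share, percent
AGE_BIN_TARGET = 20.0        # recommended minimum share per age bin, percent

non_european = sum(v for k, v in ancestry.items() if k != "European")
print(f"Non-European share: {non_european:.1f}% (target >= {NON_EUROPEAN_TARGET}%)")

for label, share in age_bins.items():
    status = "OK" if share >= AGE_BIN_TARGET else "below target"
    print(f"Age {label}: {share:.1f}% -> {status}")
```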

Sampling bias is central. Recruitment targeted online panels and university-affiliated clinics, which over-sampled graduate-educated and younger participants and under-sampled community settings outside academic networks. Personal and social measures were limited to current partnership status and a brief sexual-behavior measure. Clinical depression screening (PHQ-9) was included: 11.8% exceeded the threshold, and 7.1% reported antidepressant use. Prior work by Mustanski and colleagues sought community engagement; this paper's recruitment did not match that level of outreach.

Practical adjustments to make findings reflect broader populations:
  1. Weight analyses by ancestry and age strata, and present both weighted and unweighted estimates.
  2. Oversample underrepresented race/ethnicity groups until minimum cell counts (n ≥ 1,000) are reached for reliable subgroup inference.
  3. Implement targeted, community-directed recruitment (faith-based organizations, clinics, community centers), with graduate-student coordinators trained in culturally sensitive consent and personal-data protection.
  4. Include deep ancestry measures, and report central tendency and dispersion for all demographic variables.
  5. Conduct sensitivity analyses that account for clinical and depressive symptom burden and medication status, so that effect estimates are not confounded by mental-health differences.
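The first adjustment, weighting by ancestry and age strata, can be sketched as simple post-stratification. Each participant receives the ratio of their stratum's target population share to its share in the sample. The target shares here are hypothetical. A stratum with no participants cannot be re-weighted, so the function raises an error. Production analyses would usually also trim extreme weights.

```python
from collections import Counter


def poststratification_weights(strata, target_shares):
    """Weight = target population share / sample share of each participant's stratum.

    strata: one stratum label per participant, e.g. "European|18-29"
    target_shares: dict mapping every stratum label to its population share (sum to 1)
    """
    n = len(strata)
    sample_shares = {s: c / n for s, c in Counter(strata).items()}
    empty = set(target_shares) - set(sample_shares)
    if empty:
        raise ValueError(f"No participants in strata: {sorted(empty)}")
    return [target_shares[s] / sample_shares[s] for s in strata]


# Hypothetical example: 8 participants in one stratum, 2 in another,
# with equal (made-up) population shares.
strata = ["European|18-29"] * 8 + ["African|45-59"] * 2
targets = {"European|18-29": 0.5, "African|45-59": 0.5}
weights = poststratification_weights(strata, targets)
print(weights)
```

With these weights the weighted sample size still equals the actual N, so weighted and unweighted estimates can be reported side by side.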

Editorial recommendation for authors and editors: require transparent tables that present raw counts, percentages, and recruitment sources, and require that any claims about generalizability explicitly state which ancestries and age groups remain underrepresented and therefore warrant cautious interpretation.

How robust are the replication and cross-cohort validations?

Require replication across at least three independent cohorts: each replication cohort should show the same direction of effect and a replication p-value <0.05 after directionality filtering, with a fixed-effect meta-analytic mean effect size within 1.5-fold of the discovery estimate. If discovery reached genome-wide thresholds, demand replication in cohorts whose combined N is at least two-thirds of the discovery sample or individual cohorts with N>20,000 when trait prevalence is low.
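The replication rules above can be encoded as a checklist function. This sketch rests on three simplifying assumptions:
- Each replication cohort supplies a beta, standard error and N on the same scale as the discovery estimate.
- "Within 1.5-fold" means a ratio between 1/1.5 and 1.5.
- Only the combined-N rule is checked, not the alternative of individual cohorts with N > 20,000.

The function names and numbers are hypothetical.

```python
import math
from statistics import NormalDist


def fixed_effect(betas, ses):
    """Inverse-variance weighted fixed-effect estimate and its standard error."""
    w = [1 / se**2 for se in ses]
    est = sum(wi * b for wi, b in zip(w, betas)) / sum(w)
    return est, math.sqrt(1 / sum(w))


def two_sided_p(beta, se):
    return 2 * (1 - NormalDist().cdf(abs(beta / se)))


def meets_replication_criteria(disc_beta, disc_n, cohorts, min_cohorts=3):
    """cohorts: list of (beta, se, n) tuples from independent replication samples."""
    if len(cohorts) < min_cohorts:
        return False
    betas = [b for b, _, _ in cohorts]
    ses = [se for _, se, _ in cohorts]
    same_direction = all(b * disc_beta > 0 for b in betas)
    nominal = all(two_sided_p(b, se) < 0.05 for b, se in zip(betas, ses))
    pooled, _ = fixed_effect(betas, ses)
    ratio = pooled / disc_beta
    within_fold = 1 / 1.5 <= ratio <= 1.5
    enough_n = sum(n for _, _, n in cohorts) >= (2 / 3) * disc_n
    return same_direction and nominal and within_fold and enough_n


# Hypothetical discovery and replication results.
cohorts = [(0.045, 0.015, 25_000), (0.050, 0.020, 20_000), (0.040, 0.018, 18_000)]
print(meets_replication_criteria(disc_beta=0.05, disc_n=90_000, cohorts=cohorts))
```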

Quantify consistency using I² and leave-one-out: treat I² <40% as acceptable homogeneity; if I²>60% apply random-effects meta-regression including a categorical variable for measurement type, recruitment strategy (early vs later waves), and sex composition (proportion of boys). Run leave-one-out across all cohorts (example: across eleven cohorts when available) to identify single-cohort influence; flag signals that fall away when any one cohort is removed.
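Cochran's Q, I², and a leave-one-out loop need only per-cohort betas and standard errors. The sketch below uses inverse-variance fixed-effect pooling. The four cohorts are invented, and the last one is deliberately discordant, so dropping it removes the heterogeneity.

```python
def fe_and_i2(betas, ses):
    """Fixed-effect pooled estimate, Cochran's Q, and I^2 (percent)."""
    w = [1 / se**2 for se in ses]
    est = sum(wi * b for wi, b in zip(w, betas)) / sum(w)
    q = sum(wi * (b - est) ** 2 for wi, b in zip(w, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return est, q, i2


def leave_one_out(betas, ses):
    """Pooled estimate, Q and I^2 with each cohort removed in turn."""
    return [
        fe_and_i2(betas[:k] + betas[k + 1:], ses[:k] + ses[k + 1:])
        for k in range(len(betas))
    ]


betas = [0.05, 0.04, 0.06, 0.20]   # hypothetical per-cohort effects
ses = [0.02, 0.02, 0.02, 0.02]
est, q, i2 = fe_and_i2(betas, ses)
print(f"all cohorts: beta={est:.4f}, I2={i2:.1f}%")
for k, (est_k, _, i2_k) in enumerate(leave_one_out(betas, ses)):
    print(f"without cohort {k}: beta={est_k:.4f}, I2={i2_k:.1f}%")
```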

Harmonize phenotype definitions before meta-analysis: map cohort instruments (self-report, mother-report, clinical interview) to a common codebook, document items used for measuring partner/gender preference, and perform sensitivity analyses excluding mother-reported or proxy measures. Report subgroup results for boys and for other sex strata; present per-cohort beta, SE, sample N, and recruitment window to allow others to assess selection effects.
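Harmonizing instruments can start from an explicit codebook. The codebook maps each cohort's item to a common construct and reporter type, which turns the proxy-exclusion sensitivity analysis into a simple filter. The cohort and item names below are placeholders, not items from any real cohort. The two fields shown are a minimal assumption about what such a codebook would record.

```python
# Hypothetical codebook: cohort and item names are placeholders.
CODEBOOK = {
    ("cohort_a", "q12_partner_gender"): {"construct": "behaviour", "reporter": "self"},
    ("cohort_b", "attraction_scale"): {"construct": "attraction", "reporter": "self"},
    ("cohort_c", "parent_q7"): {"construct": "behaviour", "reporter": "mother"},
}


def harmonize(records, exclude_proxy=False):
    """Attach codebook fields to each record; optionally drop mother/proxy reports."""
    out = []
    for rec in records:
        key = (rec["cohort"], rec["item"])
        if key not in CODEBOOK:
            # Fail loudly so unmapped items get documented rather than silently dropped.
            raise KeyError(f"Unmapped item: {key}")
        meta = CODEBOOK[key]
        if exclude_proxy and meta["reporter"] != "self":
            continue
        out.append({**rec, **meta})
    return out


records = [
    {"id": 1, "cohort": "cohort_a", "item": "q12_partner_gender", "value": 1},
    {"id": 2, "cohort": "cohort_b", "item": "attraction_scale", "value": 3},
    {"id": 3, "cohort": "cohort_c", "item": "parent_q7", "value": 0},
]
main = harmonize(records)
sensitivity = harmonize(records, exclude_proxy=True)
print(len(main), len(sensitivity))
```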

Test cross-ancestry replication separately and require consistent direction in at least 75% of ancestry-defined cohorts and nominal significance in at least one non-European cohort when possible. For polygenic scores, require independent R² >0.5% (continuous traits) or OR change >1.05 per SD in independent cohorts; report Nagelkerke R² and the mean change across cohorts rather than a single best-case number.
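Nagelkerke R² rescales the Cox-Snell R² by its maximum attainable value. The sketch below computes it from 0/1 outcomes and predicted probabilities, under three assumptions:
- The probabilities come from a fitted logistic model that includes the PGS. Arbitrary probabilities can yield a negative value.
- The sample contains both outcomes.
- The example data are invented.

Averaging per-cohort values, for instance with `statistics.mean`, gives a cross-cohort summary rather than a single best-case number.

```python
import math


def log_likelihood(y, p):
    """Bernoulli log-likelihood of 0/1 outcomes y given predicted probabilities p."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi) for yi, pi in zip(y, p))


def nagelkerke_r2(y, p_model):
    """Nagelkerke R^2 of a fitted model relative to an intercept-only model."""
    n = len(y)
    base_rate = sum(y) / n
    ll0 = log_likelihood(y, [base_rate] * n)  # intercept-only (null) model
    ll1 = log_likelihood(y, p_model)          # model including the PGS
    cox_snell = 1 - math.exp(2 * (ll0 - ll1) / n)
    max_cox_snell = 1 - math.exp(2 * ll0 / n)
    return cox_snell / max_cox_snell


y = [0, 0, 1, 1, 0, 1]
p = [0.2, 0.3, 0.7, 0.8, 0.4, 0.6]   # hypothetical fitted probabilities
print(round(nagelkerke_r2(y, p), 3))
```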

Address measurement variance explicitly:
- Include a variable coding measurement method in the meta-regression.
- Model age at assessment and early-recruitment indicators.
- Estimate the variance explained by cohort-level covariates.

Taking ownership of heterogeneity means publishing cohort-level summaries, so that we and others can re-weight or exclude cohorts with outlying protocols.
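Meta-regression on a cohort-level covariate such as measurement method can be sketched as weighted least squares of cohort effects on that covariate. This example assumes a single binary moderator (0 = self-report, 1 = proxy report) and inverse-variance weights, which amounts to a fixed-effect meta-regression. A random-effects version would add an estimated between-cohort variance τ² to each squared standard error. More than two method categories would need dummy coding and a matrix solver. The data are hypothetical.

```python
import math


def meta_regression(betas, ses, moderator):
    """Fixed-effect meta-regression of cohort effects on one moderator.

    Weighted least squares with weights 1/se^2; returns (intercept, slope, slope_se).
    """
    w = [1 / s**2 for s in ses]
    sw = sum(w)
    x_bar = sum(wi * x for wi, x in zip(w, moderator)) / sw
    y_bar = sum(wi * b for wi, b in zip(w, betas)) / sw
    sxx = sum(wi * (x - x_bar) ** 2 for wi, x in zip(w, moderator))
    sxy = sum(wi * (x - x_bar) * (b - y_bar) for wi, x, b in zip(w, moderator, betas))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    slope_se = math.sqrt(1 / sxx)   # known within-cohort variances, no residual scaling
    return intercept, slope, slope_se


betas = [0.05, 0.06, 0.02, 0.03]   # hypothetical cohort effects
ses = [0.01, 0.01, 0.01, 0.01]
method = [0, 0, 1, 1]              # 0 = self-report, 1 = proxy report
intercept, slope, slope_se = meta_regression(betas, ses, method)
print(f"intercept={intercept:.3f}, method effect={slope:.3f} (SE {slope_se:.3f})")
```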

Interpretation guidance: treat small effect sizes for behavioral endpoints as biologically plausible but weakly predictive at the individual level, and avoid claims that markers deterministically predispose a person. Cross-reference earlier measurement work (Savin-Williams, Díaz, Procidano) when mapping items. State explicitly the limitations of mother-reported versus self-reported measures.

Transparency and reporting checklist:
- Pre-register replication thresholds.
- Publish full summary statistics and analysis code before or at the time of publication.
- Include a table of cohort-level metrics for all eleven cohorts (N, sex ratio, mean age, recruitment mode, measurement instrument).
- Provide a registry of excluded cohorts with reasons, so readers can assess potential biases in the reported associations.

What technical limitations (genotyping, phenotype definition) affect interpretation?

Recommendation: require harmonized, pre-registered phenotype definitions and dense genotyping with per-variant INFO >0.8, report ancestry-stratified results and replicate in at least one longitudinal cohort with measurements collected at multiple life stages.

Genotyping limitations: array coverage and imputation drive power and bias. Standard arrays miss low-frequency and structural variants. Imputation quality also falls in non-European ancestries (for example, Latino subsets showed increased missingness), which produces noisier effect estimates and downward bias for rare alleles. Report batch effects and per-sample call rates, and exclude variants with call rate <98% or INFO <0.3. When there are temporal gaps, report the number of days between DNA collection and phenotyping, since drift in sample handling can affect genotype calling. Polygenic scores built from sparse arrays are often driven by common variants that explain little variance in behaviour-related outcomes, so present percent variance explained with confidence intervals.
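The variant-level exclusion rule above translates into a simple filter. This sketch assumes per-variant call rates and imputation INFO scores are already available; the record type and rsIDs are hypothetical. A stricter INFO threshold, such as the 0.8 named in the recommendation at the start of this section, can be applied through the `min_info` parameter.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    rsid: str
    call_rate: float   # fraction of samples with a called genotype
    info: float        # imputation INFO score (1.0 for directly genotyped variants)


def passes_qc(v: Variant, min_call_rate: float = 0.98, min_info: float = 0.3) -> bool:
    """Keep variants with call rate >= 98% and INFO >= 0.3 by default."""
    return v.call_rate >= min_call_rate and v.info >= min_info


variants = [
    Variant("rs0001", 0.995, 0.97),
    Variant("rs0002", 0.970, 0.95),   # fails the call-rate threshold
    Variant("rs0003", 0.990, 0.25),   # fails the INFO threshold
]
kept = [v.rsid for v in variants if passes_qc(v)]
print(kept)
```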

Phenotype definition and measurement: self-identified labels (for example, identifying as LGBT) differ from behaviour-based measures and romantic-preference questions. Decide whether the primary measure is sexual behaviour, romantic attraction, identity, or fantasies, since each was examined differently across cohorts (Ganna and Williams used divergent operationalizations). Use both binary and continuous measures where possible, and provide z-scored continuous traits alongside raw counts. Measuring experiences rather than labels reduces misclassification. For example, the number of same-sex partners, age at first romantic experience, or the number of days with same-sex sexual activity are more reproducible than a single label reported in adulthood.
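Providing both a z-scored continuous trait and a binary version from the same raw count is straightforward. The sketch below uses hypothetical partner counts. Z-scoring rescales the data but does not remove the strong skew typical of count data. The ever/never split is only one possible binary definition.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize a continuous phenotype to mean 0, SD 1 (population SD)."""
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd for v in values]


def ever_never(values):
    """Binary version: 1 if any same-sex partners were reported, else 0."""
    return [1 if v > 0 else 0 for v in values]


# Hypothetical counts of same-sex partners, one per participant.
partners = [0, 0, 1, 3, 0, 2, 0, 5]
print(zscore(partners))
print(ever_never(partners))
```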

Timing and life-course: measures collected in adolescence vs adulthood yield different classifications. Longitudinal follow-up reveals fluidity: some participants self-identified at one wave and not at another. Recommend at least two waves separated by months or years; report changes by generation and report proportions with increased, decreased, or stable reporting. Present cross-tabulations for those who changed label and examine whether effect sizes differ for those who remained self-identified across adulthood.
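A stability matrix for two waves cross-tabulates each participant's label at wave 1 against their label at wave 2. This sketch assumes the labels can be ordered, so that changes can be summarized as increased, decreased, or stable reporting. The three-level scale and the data are hypothetical.

```python
from collections import Counter


def stability_matrix(wave1, wave2):
    """Cross-tabulate labels from two waves for the same participants."""
    if len(wave1) != len(wave2):
        raise ValueError("Both waves must cover the same participants")
    return Counter(zip(wave1, wave2))


def summarize(matrix, order):
    """Proportion of participants whose ordinal label rose, fell, or stayed the same."""
    rank = {label: i for i, label in enumerate(order)}
    total = sum(matrix.values())
    out = {"increased": 0, "decreased": 0, "stable": 0}
    for (a, b), n in matrix.items():
        if rank[b] > rank[a]:
            out["increased"] += n
        elif rank[b] < rank[a]:
            out["decreased"] += n
        else:
            out["stable"] += n
    return {k: v / total for k, v in out.items()}


order = ["none", "some", "frequent"]   # hypothetical ordinal reporting scale
wave1 = ["none", "some", "none", "frequent", "some"]
wave2 = ["none", "none", "some", "frequent", "some"]
matrix = stability_matrix(wave1, wave2)
print(dict(matrix))
print(summarize(matrix, order))
```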

Ascertainment and social context: recruitment via friends, clinics, or online panels biases who participates. Social desirability and stigma cause systematic underreporting that correlates with age, cohort, and local norms. Include sampling-frame details, and correct with inverse probability weights or sensitivity analyses. Report subgroup analyses for female, male, and mixed-sex samples. Identify any clusters with a high proportion of women, or sex-by-phenotype interactions, rather than pooling without stratification.
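Inverse probability weighting re-weights each participant by the inverse of their estimated probability of taking part. This sketch assumes those probabilities have already been estimated from sampling-frame data; the values here are invented. It normalizes the weights to average 1 in the analysed sample. It does not show the participation model itself.

```python
def normalized_ipw(participation_prob):
    """Inverse-probability-of-participation weights scaled to average 1.

    participation_prob: estimated probability that each enrolled participant
    would have taken part, given sampling-frame characteristics.
    """
    if any(not 0 < p <= 1 for p in participation_prob):
        raise ValueError("Probabilities must be in (0, 1]")
    raw = [1 / p for p in participation_prob]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


probs = [0.5, 0.5, 0.1, 0.25]    # hypothetical estimated participation probabilities
outcome = [1, 1, 0, 0]           # hypothetical 0/1 outcome
w = normalized_ipw(probs)
print(w)
print(f"unweighted={sum(outcome) / len(outcome):.3f}, weighted={weighted_mean(outcome, w):.3f}")
```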

Limitations and recommended mitigations:
Phenotype heterogeneity (identity vs. behaviour vs. romantic attraction): pre-register the definition, collect multiple measures, report z-scored and binary versions, and present results for self-identified and behaviour-based groups.
Ancestry stratification and imputation: perform ancestry-specific GWAS, restrict low-INFO variants, and include diverse reference panels for Latino and other groups.
Low-frequency/structural variants: supplement arrays with sequencing in subsets, and report whether results are driven by common variants.
Ascertainment via social networks (recruitment through friends): model relatedness, adjust for household clustering, and run sensitivity checks excluding friend clusters.
Temporal variability and recall: use longitudinal designs, report days between waves, and present stability matrices (who changed vs. who stayed).
Scale and transformation issues: publish untransformed and z-scored phenotypes, and show effects per SD and per-unit change.

Analysis reporting: present per-variant INFO scores, allele frequencies by ancestry, and sex-stratified estimates (female and male). Include sensitivity analyses that exclude participants with ambiguous or single-wave self-identified labels. Provide effect sizes with standard errors, P-values, and heterogeneity statistics. When effects are small, avoid claims about Darwinian selection unless they are supported by selection scans and functional follow-up. Cite authors by name when comparing methods: for example, Ganna reported broad phenotype definitions while Williams used more restrictive labels, and David and others have examined measurement stability. Release summary statistics that allow others to replicate ancestry- and sex-specific analyses.
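Comparing sex-stratified estimates needs a test of the difference, not two separate P-values. This sketch assumes the female and male strata are independent samples, so the variance of the difference is the sum of the squared standard errors. The estimates in the example are hypothetical.

```python
import math
from statistics import NormalDist


def sex_difference_test(beta_f, se_f, beta_m, se_m):
    """z-test for a difference between female- and male-stratified estimates."""
    diff = beta_f - beta_m
    se_diff = math.sqrt(se_f**2 + se_m**2)   # assumes independent strata
    z = diff / se_diff
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, se_diff, z, p


diff, se_diff, z, p = sex_difference_test(0.04, 0.01, 0.01, 0.01)   # hypothetical
print(f"difference={diff:.3f}, SE={se_diff:.4f}, z={z:.2f}, P={p:.3f}")
```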

Interpretation checklist before claiming population-level relevance: (1) replicate in at least one longitudinal cohort; (2) show that results are not driven by batch or recruitment biases; (3) quantify misclassification between self-identified and behaviour-based measures; (4) test for heterogeneity by generation and across Latino and other ancestry groups; (5) report whether removing participants with unstable labels strengthens or weakens associations. Following these steps reduces misinterpretation and anchors findings in measured experiences rather than labels alone.

Practical implications for clinicians, counselors, and genetic counselors

Begin each encounter with a clear, specific statement: current DNA-based associations are probabilistic and have very limited predictive value. Predictive testing for sexual orientation should therefore not be offered for children, and such results should never be the sole basis for clinical decisions.

Use concrete figures when explaining risk. Report that polygenic scores explain only a small fraction of variance (single-digit percentage points in most large analyses), and that effect sizes vary by sample design and region. Explain that predictive scores trained on primarily white samples transfer poorly to other ancestry groups and are not clinically actionable. Cite professional guidance such as the American Psychological Association for ethical context: https://www.apa.org/topics/lgbtq/orientation.

Adopt a structured intake that captures behavior-based history with validated tools (e.g., Klein/Kinsey-style scales, sexual history checklists, and social support measures such as Procidano instruments) so clinicians can separate identity, behavior, and mental health needs. Record inventory results and symptom scores, and track change over days or weeks when monitoring therapy response.

When communicating research findings, avoid deterministic language. Explain instead that most genetic components examined so far are probabilistic and context-dependent. An editorial on prior work noted that non-representative designs allowed bias to persist, so clinicians should comment on sampling limits when patients cite articles. Explain that some publications examined regional samples (for example, from the Northeast), cohorts of women only, or convenience samples, which limits generalizability.

Assess minority stress and exposure to anti-gay stigma explicitly: screen for experiences of discrimination, coming-out challenges, and internalized homophobia or biphobia. Offer referrals for mental health care when screening indicates reduced functioning. Use behavior-based screening items alongside mental health measures to identify central drivers of distress rather than attributing symptoms to biology alone.

For genetic counselors: use a scripted explanation of what a polygenic score does and does not mean, and take a family history covering the relevant components. Document informed consent confirming that patients sought testing only after being helped to understand its limitations. Recommend deferring predictive testing for orientation-related traits, and discuss potential harms such as discrimination, insurance impact, and misuse by others.

Clinical teams should adopt a representative-data approach when interpreting new findings: flag samples that are predominantly white or region-specific, note if bi-attraction subgroups were underpowered, and treat subgroup results as hypothesis-generating rather than conclusive. Klesse and others have examined social dimensions that interact with biological components; include social context when formulating care plans.

Operational recommendations: (1) add a short consent script to electronic records explaining low predictive value; (2) include a documentation checkbox that counseling covered social, legal, and medical implications; (3) use validated inventories and behavior-based measures at baseline and follow-up; (4) create a referral list for community supports such as local ball organizers and peer groups, which have helped build resilience.

Monitor for bias in care:
- Audit charts quarterly for language that frames orientation as deterministic.
- Track rates of referrals to mental health services versus genetics.
- Provide staff training that reduces ignorance and anti-gay assumptions.

These steps reduced miscommunication in pilot clinics. A recent article on the clinical translation of genomics described them as a useful, practical component of clinician education.

Keep communications concise and patient-centered: acknowledge unanswered questions, keep explaining uncertainty, and emphasize that identity, behavior, and attraction can coexist in varied patterns (including bi-attraction). Such an approach fosters trust, supports autonomy, and is a sounder clinical stance than speculative prediction.

Should genetic findings change intake questions or family histories?

Recommendation: Do not add biomarker-specific items to universal intake; instead deploy an optional, consented family-history module that captures parental composition, adoption status, maternal and paternal histories, and relevant cognitive and mental-health measures, with clear limits on clinical use.

Suggested items to include (optional module):

  1. Family composition: “Were you adopted? (yes/no)”; “Was your biological father present during your childhood? (present/absent/unknown)”.
  2. Parental identities (optional): “If you wish, indicate your parents' self-identified identities (options: male, female, transgender, lesbian, gay, bisexual, other).” Include an explicit “prefer not to answer” option.
  3. Family health: “Psychiatric, cognitive, or developmental conditions in first- or second-degree relatives (list them and give the age at onset).” Include indicators such as learning difficulties, ADHD, and serious mood disorders.
  4. Life-course exposures: “Are you aware of any prenatal exposures, maternal illness during pregnancy, or disruptions in caregiving in early childhood?”
  5. Research consent: “Do you agree to have this family-history module linked to research datasets? (yes/no)”.

Operational checklist for clinics and research teams:

Example vignettes illustrating appropriate use:

Bottom line: intake and family-history questions should emphasize patient choice, transparent consent, rigorous analytic reporting (regression-based, with variance accounted for), and respect for diversity. Collect information about mothers, fathers, and adoptive or absent parents when it is directly relevant to health or when the patient expresses clear interest.

When should genetic research findings be discussed with clients or patients?

Discuss findings promptly when a result will affect management or when the patient requests disclosure. If a variant that affects medication choice is identified, schedule an in-person or video discussion within seven calendar days, document informed consent, and clarify questions of data ownership.

If researchers collected samples from minors or vulnerable groups, involve guardians. When samples come from adolescents or young people, provide information about guardian notification and consent procedures. For participants who have reached graduation age, treat consent as adult consent, but confirm the respondent's understanding and any previously recorded preferences.

Contextualize numerical results: report the raw score, percentile, and direction of effect, and state how the result compares with agreed reference values. Present subgroup comparisons (for example, a Western regional sample, a Latino subgroup, a Michigan cohort) to show how baseline risks differ from those of the general population.

If a result is associated with elevated risk or a new symptom, describe concrete next steps:
- Set monitoring intervals.
- List warning signs that require immediate attention.
- Provide additional counseling and specific referrals that may help.
- Document who drew up the care plan and when.

Avoid making definitive behavioral predictions.

Provide scientific context and communication options:
- Explain uncertainty, and describe how the evidence is expected to change over time.
- Provide citations and the option to return for reanalysis if desired.
- Name a clinic contact person (for example, Williams) for follow-up.
- Explain what may happen if the evidence or recommendations change, and who will propose next steps.
