Page 188 - Contributed Paper Session (CPS) - Volume 7
Dataset CPS2057 Ana C. M. Ciconelle et al.
to quantify and call CNVs from SNP platforms and to analyse such data
considering family based designs and characterize the patterns of the CNVs
detected in this population.
2. Methodology
Dataset
Due to multiple waves of immigration, Brazil has a highly admixed
population, in which several traits may be shaped by both genetic and
environmental influences. The Baependi Heart Study has been conducted by the
Heart Institute since 2005 as a longitudinal family‐based cohort study to
understand the variation of cardiovascular risk factors within the Brazilian
population and to disentangle its genetic and environmental components. The
data provide information on 105 families (1,666 individuals: 723 males and
943 females) living in the village of Baependi, in the state of Minas Gerais,
Brazil. Data from 631 nuclear families were available, with the number of
offspring ranging
from 1 to 14. The number of generations per family varied from 2 to 4 (54% of
the families had 3 generations, and 45% had 2 generations). Only individuals
aged 18 years or older were considered eligible for participating in the study.
The mean age was 44 years, with a range of 18 to 100 years.
For each participant a questionnaire was used to obtain information
regarding family relationships, demographic characteristics, medical history
and environmental risk factors. Anthropometric measures, physical and clinical
examination and electrocardiogram of the participants were performed by
trained medical students. Genomic DNA was extracted by standard procedures.
SNP array genotyping of the DNA samples was performed on the Affymetrix
Platform 6.0, yielding 1,120 CEL files. Each CEL file stores the intensity
values of the probes on the array for a single sample, along with other
information.
More details are described in Egan et al. (2016).
Overview
The methodology used in this work is summarized in Figure 1, which
describes the pre‐processing of SNP data, the CNV calling and the CNV
analysis. For the pre‐processing of SNP data and the CNV calling, the software
Affymetrix Power Tools (APT) (Affymetrix, 2017), PennCNV by Wang et al.
(2007) and packages from the R environment were used. Given the CEL files,
APT normalizes the probe signal intensities by quantile normalization and
then applies median polish to obtain the final cleaned intensity values for
alleles A and B of each SNP. Individual genotype calls are made using the
Birdseed algorithm. For each SNP in each sample, the genotype is coded as
0, 1 and 2 for AA, AB and BB, respectively, and -1 for missing values, each
with its corresponding confidence score. In addition, a final report gives
the inferred sex of each sample. PennCNV generates canonical genotype
clustering files based on the output files from APT. These files contain cluster
175 | ISI WSC 2019