{"id":147,"date":"2023-07-12T15:21:54","date_gmt":"2023-07-12T07:21:54","guid":{"rendered":"https:\/\/www.kz-hub.tech\/?p=147"},"modified":"2023-07-12T15:21:54","modified_gmt":"2023-07-12T07:21:54","slug":"convert-sequenza-segment-txt-to-gistic-input","status":"publish","type":"post","link":"https:\/\/www.kz-hub.tech\/index.php\/2023\/07\/12\/convert-sequenza-segment-txt-to-gistic-input\/","title":{"rendered":"Convert Sequenza *segments.txt to GISTIC input"},"content":{"rendered":"<p><a href=\"http:\/\/crazyhottommy.blogspot.com\/2017\/11\/run-gistic2-with-sequenza-segmentation.html\">Original post<\/a><\/p>\n<p>GISTIC was designed for SNP6 array data, but I have seen many papers use it for whole-exome sequencing data as well.<\/p>\n<h3>Input format for GISTIC:<\/h3>\n<h4>Segment file:<\/h4>\n<p>(1) Sample (sample name)<br \/>\n(2) Chromosome (chromosome number)<br \/>\n(3) Start Position (segment start position, in bases)<br \/>\n(4) End Position (segment end position, in bases)<br \/>\n(5) Num markers (number of markers in segment)<br \/>\n(6) Seg.CN (log2(copy number) - 1)<\/p>\n<p>The conversion is log2(copy number) - 1 (logarithm base 2), so that a copy number of 2 maps to 0.<br \/>\nEvery segment start and end in the segments file should appear in the markers file, but not every marker needs to be a segment boundary.<br \/>\nWhen the copy number is 0 (a homozygous deletion of both copies), log2(0) - 1 is undefined, so just put a small number instead, e.g. -5.<\/p>\n<h4>Marker file:<\/h4>\n<p><a href=\"https:\/\/groups.google.com\/a\/broadinstitute.org\/forum\/#!searchin\/gistic-forum\/marker$20file\/gistic-forum\/Vq9WWDiy7jU\/BSFg2zmBZ1EJ\">https:\/\/groups.google.com\/a\/broadinstitute.org\/forum\/#!searchin\/gistic-forum\/marker$20file\/gistic-forum\/Vq9WWDiy7jU\/BSFg2zmBZ1EJ<\/a><br \/>\n(1) Marker Name<br \/>\n(2) Chromosome<br \/>\n(3) Marker Position (in bases)<br \/>\nNote that GISTIC2 no longer requires a marker file.<\/p>\n<p>Sequenza outputs a segment file. 
The segmentation is done with the copynumber Bioconductor package.<br \/>\nThe *segments.txt file has 13 columns:<br \/>\n&quot;chromosome&quot; &quot;start.pos&quot; &quot;end.pos&quot; &quot;Bf&quot; &quot;N.BAF&quot; &quot;sd.BAF&quot; &quot;depth.ratio&quot; &quot;N.ratio&quot; &quot;sd.ratio&quot; &quot;CNt&quot; &quot;A&quot; &quot;B&quot; &quot;LPP&quot;<br \/>\nWe only need the chromosome, start.pos, end.pos, N.BAF and depth.ratio columns; N.BAF goes into the Num markers column.<br \/>\nThe depth.ratio column is the GC-content-normalized tumor\/normal depth ratio. A depth ratio of 1 means a copy number of 2 (the same as the normal blood control in my case).<br \/>\nSo the conversion is not log2(2^depth.ratio) - 1 but log2(2 * depth.ratio) - 1, which simplifies to log2(depth.ratio).<\/p>\n<p>I have a bunch of segment files in the same folder. The R code below reads them all, adds the sample name as a column and does the log2 math.<\/p>\n<pre><code class=\"language-R\">library(tidyverse)  # also loads readr (read_tsv, write_tsv) and dplyr\n\n# all Sequenza segment files in the current folder\nseg_files &lt;- list.files(&quot;.&quot;, pattern = &quot;segments[.]txt$&quot;, full.names = FALSE)\n\nseg_dat_list &lt;- lapply(seg_files, function(f) {\n        # read every column as character so the files bind cleanly\n        dat &lt;- read_tsv(f, col_names = TRUE, col_types = cols(.default = col_character()))\n        # sample name: strip everything from &quot;_vs_&quot; onward in the file name\n        dat$sample &lt;- gsub(&quot;_vs_.*segments.txt&quot;, &quot;&quot;, f)\n        return(dat)\n})\n\nseg_dat &lt;- bind_rows(seg_dat_list)\n\ngistic_input &lt;- seg_dat %&gt;%\n        select(sample, chromosome, start.pos, end.pos, N.BAF, depth.ratio) %&gt;%\n        mutate(depth.ratio = as.numeric(depth.ratio)) %&gt;%\n        # Seg.CN = log2(2 * depth.ratio) - 1; a ratio of 0 would give -Inf, so use -5 instead\n        mutate(depth.ratio = ifelse(depth.ratio == 0, -5, log2(2 * depth.ratio) - 1))\n\nwrite_tsv(gistic_input, &quot;all_segments.txt&quot;)<\/code><\/pre>\n<p>Back to bash to build the marker file from all segment starts and ends:<\/p>\n<pre><code class=\"language-bash\">## marker file: chromosome (column 2) with every segment start (column 3) and end (column 4);\n## sed &#039;1d&#039; drops the header line written by write_tsv\n\nsed &#039;1d&#039; all_segments.txt | cut -f2,3 &gt; markers.txt\nsed &#039;1d&#039; all_segments.txt | cut -f2,4 &gt;&gt; markers.txt\n\n## sort by chromosome (version sort, so chr2 comes before chr10) and then by position,\n## keep the unique positions and number them to get the marker names.\n\nsort -k1,1V -k2,2n markers.txt | uniq | nl &gt; 
markers_gistic.txt<\/code><\/pre>\n<p>Run GISTIC.<br \/>\nFirst modify the gistic2 launcher script a bit, e.g. change the MCR_ROOT path to wherever the MATLAB Compiler Runtime (MCR) is installed.<\/p>\n<pre><code class=\"language-bash\">#!\/bin\/sh\n## set MCR environment and launch GISTIC executable\n\n## NOTE: change the line below if you have installed the Matlab MCR in an alternative location\nMCR_ROOT=\/scratch\/genomic_med\/apps\/Matlab_Complier_runTime\nMCR_VER=v83\n\necho Setting Matlab MCR root to $MCR_ROOT\n\n## set up environment variables\nLD_LIBRARY_PATH=$MCR_ROOT\/$MCR_VER\/runtime\/glnxa64:$LD_LIBRARY_PATH\nLD_LIBRARY_PATH=$MCR_ROOT\/$MCR_VER\/bin\/glnxa64:$LD_LIBRARY_PATH\nLD_LIBRARY_PATH=$MCR_ROOT\/$MCR_VER\/sys\/os\/glnxa64:$LD_LIBRARY_PATH\nexport LD_LIBRARY_PATH\nXAPPLRESDIR=$MCR_ROOT\/$MCR_VER\/MATLAB_Component_Runtime\/v83\/X11\/app-defaults\nexport XAPPLRESDIR\n\n## launch GISTIC executable, passing all arguments through unchanged\n.\/gp_gistic2_from_seg \"$@\"<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Original post GISTIC was designed for SNP6 array data, but I have seen many&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-147","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/www.kz-hub.tech\/index.php\/wp-json\/wp\/v2\/posts\/147","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.kz-hub.tech\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.kz-hub.tech\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.kz-hub.tech\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.kz-hub.tech\/index.php\/wp-json\/wp\/v2\/comments?post=147"}],"version-history":[{"count":1,"href":"https:\/\/www.kz-hub.tech\/index.php\/wp-json\/wp\/v2\/posts\/147\/revisions"}],"predecessor-version":[{"id":148,"href
":"https:\/\/www.kz-hub.tech\/index.php\/wp-json\/wp\/v2\/posts\/147\/revisions\/148"}],"wp:attachment":[{"href":"https:\/\/www.kz-hub.tech\/index.php\/wp-json\/wp\/v2\/media?parent=147"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.kz-hub.tech\/index.php\/wp-json\/wp\/v2\/categories?post=147"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.kz-hub.tech\/index.php\/wp-json\/wp\/v2\/tags?post=147"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}