3 bookmaker.py is a helper for optimizing PDFs of books for the production of small self-printed, self-bound physical books. Toward this goal, it offers various PDF manipulation options that may also be used independently and for other purposes.
9 def fail_with_msg(msg):
15 fail_with_msg("Can't run without pypdf installed.")
17 from reportlab.lib.pagesizes import A4
19 fail_with_msg("Can't run without reportlab installed.")
22 A4_WIDTH, A4_HEIGHT = A4
23 POINTS_PER_CM = 10 * 72 / 25.4
24 CUT_DEPTH = 1.95 * POINTS_PER_CM
25 CUT_WIDTH = 1.05 * POINTS_PER_CM
26 MIDDLE_POINT_DEPTH = 0.4 * POINTS_PER_CM
27 SPINE_LIMIT = 1 * POINTS_PER_CM
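A brief illustration of the unit conversion behind these constants: PDF coordinates are measured in points (72 per inch, 25.4 mm per inch), so one centimetre corresponds to 10 * 72 / 25.4 points. The helper name `cm_to_points` is hypothetical and does not appear in bookmaker.py; this is only a minimal sketch of the arithmetic.

```python
# Same definition as in the script: 72 points per inch, 25.4 mm per inch.
POINTS_PER_CM = 10 * 72 / 25.4


def cm_to_points(cm):
    """Convert a length in centimetres to PDF points (hypothetical helper)."""
    return cm * POINTS_PER_CM


# The stencil constants are plain centimetre values scaled this way.
cut_depth = cm_to_points(1.95)   # corresponds to CUT_DEPTH
spine_limit = cm_to_points(1)    # corresponds to SPINE_LIMIT
print("cut depth: %.2f pt, spine limit: %.2f pt" % (cut_depth, spine_limit))
```

One inch (2.54 cm) should come out as 72 points, which is a convenient sanity check.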
31 Concatenate two PDFs A.pdf and B.pdf to COMBINED.pdf:
32 bookmaker.py --input_file A.pdf --input_file B.pdf --output_file COMBINED.pdf
34 Produce OUTPUT.pdf containing all pages of (inclusive) page number range 3-7 from INPUT.pdf:
35 bookmaker.py -i INPUT.pdf --page_range 3-7 -o OUTPUT.pdf
37 Produce COMBINED.pdf from A.pdf's first 7 pages, B.pdf's pages except its first two, and all pages of C.pdf:
38 bookmaker.py -i A.pdf -p start-7 -i B.pdf -p 3-end -i C.pdf -o COMBINED.pdf
40 Crop each page 5cm from the left, 10cm from the bottom, 2cm from the right, and 0cm from the top:
41 bookmaker.py -i INPUT.pdf -o OUTPUT.pdf --crops "5,10,2,0"
43 Include all pages from INPUT.pdf, but crop pages 10-20 by 5cm each from bottom and top:
44 bookmaker.py -i INPUT.pdf -c "10-20:0,5,0,5" -o OUTPUT.pdf
46 Same crops for pages 10-20, but also crop all pages 30 and later by 3cm each from left and right:
47 bookmaker.py -i INPUT.pdf -o OUTPUT.pdf -c "10-20:0,5,0,5" -c "30-end:3,0,3,0"
49 Rotate by 90° pages 3, 5, 7; rotate page 7 once more by 90° (i.e. 180° in total):
50 bookmaker.py -i INPUT.pdf -o OUTPUT.pdf --rotate 3 -r 5 -r 7 -r 7
52 Initially declare 5cm crop from the left and 1cm crop from right, but alternate direction between even and odd pages:
53 bookmaker.py -i INPUT.pdf -o OUTPUT.pdf -c "5,0,1,0" -s
55 Quarter each OUTPUT.pdf page to carry 4 pages from INPUT.pdf, draw stencils into inner margins for cuts to carry binding strings:
56 bookmaker.py -i INPUT.pdf -o OUTPUT.pdf --nup4
58 Same as --nup4, but define a printable-region margin of 1.3cm to limit the space for the INPUT.pdf pages in OUTPUT.pdf page quarters:
59 bookmaker.py -i INPUT.pdf -o OUTPUT.pdf -n --print_margin 1.3
61 Same as -n, but draw lines marking printable-region margins, page quarters, spine margins:
62 bookmaker.py -i INPUT.pdf -o OUTPUT.pdf -n --analyze
66 For arguments like -p, page numbers are assumed to start with 1 (not 0, which is treated as an invalid page number value).
68 The target page shape so far is assumed to be A4 in portrait orientation; bookmaker.py normalizes all pages to this format before applying crops, and removes any source PDF /Rotate commands (for their production of landscape orientations).
70 For --nup4, the -c cropping instructions do not so much erase content outside the cropped area as zoom into the page so that the cropped area fills as much as possible of the available per-page area between the printable-area margins and the borders to the other quartered pages. If the zoomed cropped area does not fit neatly into its per-page area, additional page content outside the crop is preserved.
72 The --nup4 quartering puts pages into a specific order optimized for no-tumble duplex print-outs that can easily be folded and cut into pages of a small A6 book. Each unit of 8 pages from the source PDF is mapped thus onto two subsequent pages (i.e. front and back of a printed A4 paper):
81 To facilitate this layout, --nup4 also pads the input PDF pages to a total number that is a multiple of 8, by adding empty pages if necessary.
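The padding and reordering described above can be sketched as follows. The slot indices are taken from the script's own `eight_pack` assignments (front: upper left, upper right, lower left, lower right; then the same four positions on the back). The function name `nup4_order` and the use of `None` for blank padding pages are assumptions made for this illustration only.

```python
def nup4_order(n_pages):
    """Return 1-based source page numbers in --nup4 placement order.

    None stands for a blank padding page (illustrative convention).
    """
    padded = list(range(1, n_pages + 1))
    remainder = len(padded) % 8
    if remainder:
        # pad up to the next multiple of 8 with blank pages
        padded += [None] * (8 - remainder)
    # index into each group of 8: four quarters of the front, then of the back
    slots = [3, 0, 7, 4, 1, 2, 5, 6]
    order = []
    for start in range(0, len(padded), 8):
        eight_pack = padded[start:start + 8]
        order += [eight_pack[s] for s in slots]
    return order


print(nup4_order(8))
```

For an 8-page input, the front of the sheet then carries source pages 4, 1, 8, 5 and the back carries 2, 3, 6, 7.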
83 (To turn the above double-sided example page into a tiny 8-page book: Cut the paper in two on its horizontal middle line. Fold the two halves by their vertical middle lines, with pages 3-2 and 7-6 on the folds' insides. This creates two 4-page books of pages 1-4 and pages 5-8. Fold them both closed and (counter-intuitively) put the book of pages 5-8 on top of the other one (creating a temporary page order of 5,6,7,8,1,2,3,4). A binding cut stencil should be visible on the top left of this stack – cut it out (with all pages folded together) to add the same inner-margin upper cut to each page. Turn around your 8-page stack to find the mirror image of the aforementioned stencil at the bottom of the stack's back, and cut that out too. Each page now has binding cuts on top and bottom of its inner margins. Swap the order of both books (back to the final page order of 1,2,3,4,5,6,7,8), and you now have an 8-page book that can be "bound" in its binding cuts through a rubber band or the like. Repeat with the next 8-page double-page, et cetera. (Actually, with just 8 pages, the paper may curl under the pressure of a rubber band – but go up to 32 pages or so, and the result will become quite stable.))
87 def validate_page_range(p_string, err_msg_prefix):
88 err_msg = "%s: invalid page range string: %s" % (err_msg_prefix, p_string)
89 if '-' not in p_string:
90 raise ValueError("%s: page range string lacks '-': %s" % (err_msg_prefix, p_string))
91 tokens = p_string.split("-")
93 raise ValueError("%s: page range string has too many '-': %s" % (err_msg_prefix, p_string))
94 for i, token in enumerate(tokens):
97 if i == 0 and token == "start":
99 if i == 1 and token == "end":
104 raise ValueError("%s: page range string carries values that are neither integer, nor 'start', nor 'end': %s" % (err_msg_prefix, p_string))
106 raise ValueError("%s: page range string may not carry page numbers <1: %s" % (err_msg_prefix, p_string))
110 start = int(tokens[0])
114 if start > 0 and end > 0 and start > end:
115 raise ValueError("%s: page range starts higher than it ends: %s" % (err_msg_prefix, p_string))
117 def split_crops_string(c_string):
118 initial_split = c_string.split(':')
119 if len(initial_split) > 1:
120 page_range = initial_split[0]
121 crops = initial_split[1]
124 crops = initial_split[0]
125 return page_range, crops
127 def parse_page_range(range_string, pages):
129 end_page = len(pages)
131 start, end = range_string.split('-')
132 if not (len(start) == 0 or start == "start"):
133 start_page = int(start) - 1
134 if not (len(end) == 0 or end == "end"):
136 return start_page, end_page
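To make the index convention explicit: the 1-based, inclusive range strings accepted by -p and -c are turned into a zero-based start and an exclusive end, suitable for `range(start_page, end_page)`. The sketch below mirrors the visible parts of `parse_page_range`; the default start of 0 and the handling of an empty range string are assumptions about lines not shown here, and the name `to_slice_bounds` is hypothetical.

```python
def to_slice_bounds(range_string, n_pages):
    """Map '3-7', 'start-7', '3-end' (1-based, inclusive) to (start, end) for range()."""
    start_page, end_page = 0, n_pages  # assumed defaults: the whole document
    if range_string:
        start, end = range_string.split("-")
        if start not in ("", "start"):
            start_page = int(start) - 1  # 1-based page number -> 0-based index
        if end not in ("", "end"):
            end_page = int(end)  # inclusive 1-based end == exclusive 0-based end
    return start_page, end_page


for example in ("3-7", "start-7", "3-end", None):
    print(example, to_slice_bounds(example, 10))
```

Because the end stays exclusive, a range such as 3-7 selects exactly five pages.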
139 parser = argparse.ArgumentParser(description=__doc__, epilog=help_epilogue, formatter_class=argparse.RawDescriptionHelpFormatter)
140 parser.add_argument("-i", "--input_file", action="append", required=True, help="input PDF file")
141 parser.add_argument("-o", "--output_file", required=True, help="output PDF file")
142 parser.add_argument("-p", "--page_range", action="append", help="page range, e.g., '2-9' or '3-end' or 'start-14'")
143 parser.add_argument("-c", "--crops", action="append", help="cm crops left, bottom, right, top – e.g., '10,10,10,10'; prefix with ':'-delimited page range to limit effect")
144 parser.add_argument("-r", "--rotate_page", type=int, action="append", help="rotate page of given number by 90° (usable multiple times on same page!)")
145 parser.add_argument("-s", "--symmetry", action="store_true", help="alternate horizontal crops between odd and even pages")
146 parser.add_argument("-n", "--nup4", action='store_true', help="puts 4 input pages onto 1 output page, adds binding cut stencil")
147 parser.add_argument("-a", "--analyze", action="store_true", help="in --nup4, print lines identifying spine, page borders")
148 parser.add_argument("-m", "--print_margin", type=float, default=0.43, help="print margin for --nup4 in cm (default 0.43)")
149 args = parser.parse_args()
151 # some basic input validation
152 for filename in args.input_file:
153 if not os.path.isfile(filename):
154 raise ValueError("-i: %s is not a file" % filename)
156 with open(filename, 'rb') as file:
157 pypdf.PdfReader(file)
158 except pypdf.errors.PdfStreamError:
159 raise ValueError("-i: cannot interpret %s as PDF file" % filename)
161 for p_string in args.page_range:
162 validate_page_range(p_string, "-p")
163 if len(args.page_range) > len(args.input_file):
164 raise ValueError("more -p arguments than -i arguments")
166 for c_string in args.crops:
167 initial_split = c_string.split(':')
168 if len(initial_split) > 2:
169 raise ValueError("-c: cropping string has multiple ':': %s" % c_string)
170 page_range, crops = split_crops_string(c_string)
171 crops = crops.split(",")
173 validate_page_range(page_range, "-c")
175 raise ValueError("-c: cropping should contain three ',': %s" % c_string)
180 raise ValueError("-c: non-number crop in %s" % c_string)
182 for r in args.rotate_page:
186 raise ValueError("-r: non-integer value: %s" % r)
188 raise ValueError("-r: value must not be <1: %s" % r)
190 float(args.print_margin)
192 raise ValueError("-m: non-float value: %s" % args.print_margin)
199 # select pages from input files
203 for i, input_file in enumerate(args.input_file):
204 file = open(input_file, 'rb')
205 opened_files += [file]
206 reader = pypdf.PdfReader(file)
208 if args.page_range and len(args.page_range) > i:
209 range_string = args.page_range[i]
210 start_page, end_page = parse_page_range(range_string, reader.pages)
211 if end_page > len(reader.pages): # no need to test start_page because start_page > end_page is checked above
212 raise ValueError("-p: page range goes beyond pages of input file: %s" % range_string)
213 for old_page_num in range(start_page, end_page):
215 page = reader.pages[old_page_num]
216 pages_to_add += [page]
217 print("-i, -p: read in %s page number %d as new page %d" % (input_file, old_page_num+1, new_page_num))
219 # we can do some more input validations now that we know how many pages output should have
221 for c_string in args.crops:
222 page_range, _ = split_crops_string(c_string)
224 start, end = parse_page_range(page_range, pages_to_add)
225 if end > len(pages_to_add):
226 raise ValueError("-c: page range goes beyond number of pages we're building: %s" % page_range)
228 for r in args.rotate_page:
229 if r > len(pages_to_add):
230 raise ValueError("-r: page number beyond number of pages we're building: %d" % r)
232 # rotate page canvas (as opposed to using PDF's /Rotate command)
234 for rotate_page in args.rotate_page:
235 page = pages_to_add[rotate_page - 1]
236 page.add_transformation(pypdf.Transformation().translate(tx=-A4_WIDTH/2, ty=-A4_HEIGHT/2))
237 page.add_transformation(pypdf.Transformation().rotate(-90))
238 page.add_transformation(pypdf.Transformation().translate(tx=A4_WIDTH/2, ty=A4_HEIGHT/2))
239 print("-r: rotating (by 90°) page", rotate_page)
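The three transformations above follow the usual pattern for rotating about a page's centre instead of the origin: translate the centre to (0, 0), rotate, translate back. A minimal pure-Python sketch of the same arithmetic on a single point follows, assuming the common convention that positive angles rotate counter-clockwise (so -90 turns content clockwise). The function name and the sample page size are illustrative only.

```python
import math


def rotate_about_center(x, y, width, height, degrees=-90):
    """Rotate point (x, y) about the centre of a width x height page."""
    # 1. move the page centre to the origin
    x, y = x - width / 2, y - height / 2
    # 2. rotate about the origin (counter-clockwise for positive angles)
    rad = math.radians(degrees)
    x, y = (x * math.cos(rad) - y * math.sin(rad),
            x * math.sin(rad) + y * math.cos(rad))
    # 3. move the centre back to where it was
    return x + width / 2, y + height / 2


# Approximate A4 size in points, used only for this example.
W, H = 595.0, 842.0
print(rotate_about_center(W / 2, H / 2 + 10, W, H))
```

The centre itself stays fixed, and a point 10 units above the centre ends up 10 units to its right, which is what a clockwise quarter turn should do.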
241 # if necessary, pad pages to multiple of 8
243 mod_to_8 = len(pages_to_add) % 8
245 print("-n: number of input pages %d not multiple of 8, padding to that" % len(pages_to_add))
246 for _ in range(8 - mod_to_8):
247 new_page = pypdf.PageObject.create_blank_page(width=A4_WIDTH, height=A4_HEIGHT)
248 pages_to_add += [new_page]
250 # normalize all pages to portrait A4
251 for page in pages_to_add:
252 if "/Rotate" in page:
253 page.rotate(360 - page["/Rotate"])
254 page.mediabox.left = 0
255 page.mediabox.bottom = 0
256 page.mediabox.top = A4_HEIGHT
257 page.mediabox.right = A4_WIDTH
258 page.cropbox = page.mediabox
260 # determine page crops, zooms, crop symmetry
261 crops_at_page = [(0,0,0,0)]*len(pages_to_add)
262 zoom_at_page = [1]*len(pages_to_add)
264 for c_string in args.crops:
265 page_range, crops = split_crops_string(c_string)
266 start_page, end_page = parse_page_range(page_range, pages_to_add)
267 crop_left_cm, crop_bottom_cm, crop_right_cm, crop_top_cm = [float(x) for x in crops.split(',')]
268 crop_left = crop_left_cm * POINTS_PER_CM
269 crop_bottom = crop_bottom_cm * POINTS_PER_CM
270 crop_right = crop_right_cm * POINTS_PER_CM
271 crop_top = crop_top_cm * POINTS_PER_CM
273 print("-c, -s: to pages %d to %d applying crops: left %.2fcm, bottom %.2fcm, right %.2fcm, top %.2fcm (but alternating left and right crop between even and odd pages)" % (start_page + 1, end_page, crop_left_cm, crop_bottom_cm, crop_right_cm, crop_top_cm))
275 print("-c: to pages %d to %d applying crops: left %.2fcm, bottom %.2fcm, right %.2fcm, top %.2fcm" % (start_page + 1, end_page, crop_left_cm, crop_bottom_cm, crop_right_cm, crop_top_cm))
276 cropped_width = A4_WIDTH - crop_left - crop_right
277 cropped_height = A4_HEIGHT - crop_bottom - crop_top
279 zoom_horizontal = A4_WIDTH / (A4_WIDTH - crop_left - crop_right)
280 zoom_vertical = A4_HEIGHT / (A4_HEIGHT - crop_bottom - crop_top)
281 if (zoom_horizontal > 1 and zoom_vertical < 1) or (zoom_horizontal < 1 and zoom_vertical > 1):
282 raise ValueError("crops would create opposing zoom directions")
283 elif zoom_horizontal + zoom_vertical > 2:
284 zoom = min(zoom_horizontal, zoom_vertical)
286 zoom = max(zoom_horizontal, zoom_vertical)
287 for page_num in range(start_page, end_page):
288 if args.symmetry and page_num % 2:
289 crops_at_page[page_num] = (crop_right, crop_bottom, crop_left, crop_top)
291 crops_at_page[page_num] = (crop_left, crop_bottom, crop_right, crop_top)
292 zoom_at_page[page_num] = zoom
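The zoom selection in the loop above can be isolated into a small function to make its behaviour easier to check. This is a sketch that reproduces the visible branches: it rejects crops that would zoom in on one axis and out on the other, and otherwise picks the zoom that keeps the whole cropped area visible. The function name `choose_zoom` and the small page size in the usage lines are hypothetical.

```python
def choose_zoom(page_w, page_h, crop_left, crop_bottom, crop_right, crop_top):
    """Pick one uniform zoom factor from horizontal and vertical crops."""
    zoom_h = page_w / (page_w - crop_left - crop_right)
    zoom_v = page_h / (page_h - crop_bottom - crop_top)
    if (zoom_h > 1 and zoom_v < 1) or (zoom_h < 1 and zoom_v > 1):
        raise ValueError("crops would create opposing zoom directions")
    if zoom_h + zoom_v > 2:
        # zooming in: the smaller factor keeps the cropped area fully visible
        return min(zoom_h, zoom_v)
    # zooming out (negative crops) or no zoom at all: mirrors the script's final branch
    return max(zoom_h, zoom_v)


print(choose_zoom(100, 200, 25, 0, 25, 0))    # horizontal crop only
print(choose_zoom(100, 200, 25, 50, 25, 50))  # both axes halved
```

With a purely horizontal crop, the vertical factor of 1 wins, so the page is not enlarged and content beside the cropped area is preserved, as described for --nup4.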
294 writer = pypdf.PdfWriter()
297 print("building 1-input-page-per-output-page book")
299 for i, page in enumerate(pages_to_add):
300 crop_left, crop_bottom, crop_right, crop_top = crops_at_page[i]
301 zoom = zoom_at_page[i]
302 page.add_transformation(pypdf.Transformation().translate(tx=-crop_left, ty=-crop_bottom))
303 page.add_transformation(pypdf.Transformation().scale(zoom, zoom))
304 cropped_width = A4_WIDTH - crop_left - crop_right
305 cropped_height = A4_HEIGHT - crop_bottom - crop_top
306 page.mediabox.right = cropped_width * zoom
307 page.mediabox.top = cropped_height * zoom
308 writer.add_page(page)
309 odd_page = not odd_page
310 print("built page number %d (of %d)" % (i+1, len(pages_to_add)))
313 print("-n: building 4-input-pages-per-output-page book")
314 print("-m: applying printable-area margin of %.2fcm" % args.print_margin)
316 print("-a: drawing page borders, spine limits")
318 printable_margin = args.print_margin * POINTS_PER_CM
319 printable_scale = (A4_WIDTH - 2*printable_margin)/A4_WIDTH
320 half_width = A4_WIDTH / n_pages_per_axis
321 half_height = A4_HEIGHT / n_pages_per_axis
322 section_scale_factor = 1 / n_pages_per_axis
323 spine_part_of_page = (SPINE_LIMIT / half_width) / printable_scale
324 bonus_shrink_factor = 1 - spine_part_of_page
330 for page in pages_to_add:
337 new_i_order += [8 * n_eights + 3,
346 new_page_order += [eight_pack[3]] # page front, upper left
347 new_page_order += [eight_pack[0]] # page front, upper right
348 new_page_order += [eight_pack[7]] # page front, lower left
349 new_page_order += [eight_pack[4]] # page front, lower right
350 new_page_order += [eight_pack[1]] # page back, upper left
351 new_page_order += [eight_pack[2]] # page back, upper right
352 new_page_order += [eight_pack[5]] # page back, lower left
353 new_page_order += [eight_pack[6]] # page back, lower right
357 for j, page in enumerate(new_page_order):
359 new_page = pypdf.PageObject.create_blank_page(width=A4_WIDTH, height=A4_HEIGHT)
361 # in-section transformations: align pages on top, left-hand pages to left, right-hand to right
362 new_i = new_i_order[j]
363 crop_left, crop_bottom, crop_right, crop_top = crops_at_page[new_i]
364 zoom = zoom_at_page[new_i]
365 page.add_transformation(pypdf.Transformation().translate(ty=(A4_HEIGHT / zoom - (A4_HEIGHT - crop_top))))
367 page.add_transformation(pypdf.Transformation().translate(tx=-crop_left))
368 elif i == 1 or i == 3:
369 page.add_transformation(pypdf.Transformation().translate(tx=(A4_WIDTH / zoom - (A4_WIDTH - crop_right))))
370 page.add_transformation(pypdf.Transformation().scale(zoom * bonus_shrink_factor, zoom * bonus_shrink_factor))
372 page.add_transformation(pypdf.Transformation().translate(ty=-2*printable_margin/printable_scale))
374 # outer section transformations
375 page.add_transformation(pypdf.Transformation().translate(ty=(1-bonus_shrink_factor)*A4_HEIGHT))
377 y_section = A4_HEIGHT
378 page.mediabox.bottom = half_height
379 page.mediabox.top = A4_HEIGHT
382 page.mediabox.bottom = 0
383 page.mediabox.top = half_height
386 page.mediabox.left = 0
387 page.mediabox.right = half_width
389 page.add_transformation(pypdf.Transformation().translate(tx=(1-bonus_shrink_factor)*A4_WIDTH))
391 page.mediabox.left = half_width
392 page.mediabox.right = A4_WIDTH
393 page.add_transformation(pypdf.Transformation().translate(tx=x_section, ty=y_section))
394 page.add_transformation(pypdf.Transformation().scale(section_scale_factor, section_scale_factor))
395 new_page.merge_page(page)
397 print("merged page number %d (of %d)" % (page_count, len(pages_to_add)))
400 from reportlab.pdfgen import canvas
403 packet = io.BytesIO()
404 c = canvas.Canvas(packet, pagesize=A4)
406 c.line(0, A4_HEIGHT, A4_WIDTH, A4_HEIGHT)
407 c.line(0, half_height, A4_WIDTH, half_height)
408 c.line(0, 0, A4_WIDTH, 0)
409 c.line(0, A4_HEIGHT, 0, 0)
410 c.line(half_width, A4_HEIGHT, half_width, 0)
411 c.line(A4_WIDTH, A4_HEIGHT, A4_WIDTH, 0)
413 new_pdf = pypdf.PdfReader(packet)
414 new_page.merge_page(new_pdf.pages[0])
415 printable_offset_x = printable_margin
416 printable_offset_y = printable_margin * A4_HEIGHT / A4_WIDTH
417 new_page.add_transformation(pypdf.Transformation().scale(printable_scale, printable_scale))
418 new_page.add_transformation(pypdf.Transformation().translate(tx=printable_offset_x, ty=printable_offset_y))
419 x_left_SPINE_LIMIT = half_width * bonus_shrink_factor
420 x_right_SPINE_LIMIT = A4_WIDTH - x_left_SPINE_LIMIT
421 if args.analyze or front_page:
422 packet = io.BytesIO()
423 c = canvas.Canvas(packet, pagesize=A4)
427 c.line(x_left_SPINE_LIMIT, A4_HEIGHT, x_left_SPINE_LIMIT, 0)
428 c.line(x_right_SPINE_LIMIT, A4_HEIGHT, x_right_SPINE_LIMIT, 0)
432 start_up_left_left_x = x_left_SPINE_LIMIT - 0.5 * CUT_WIDTH
433 start_up_left_right_x = x_left_SPINE_LIMIT + 0.5 * CUT_WIDTH
434 middle_point_up_left_y = half_height + MIDDLE_POINT_DEPTH
435 end_point_up_left_y = half_height + CUT_DEPTH
436 c.line(start_up_left_right_x, half_height, x_left_SPINE_LIMIT, end_point_up_left_y)
437 c.line(x_left_SPINE_LIMIT, end_point_up_left_y, x_left_SPINE_LIMIT, middle_point_up_left_y)
438 c.line(x_left_SPINE_LIMIT, middle_point_up_left_y, start_up_left_left_x, half_height)
440 start_down_right_left_x = x_right_SPINE_LIMIT - 0.5 * CUT_WIDTH
441 start_down_right_right_x = x_right_SPINE_LIMIT + 0.5 * CUT_WIDTH
442 middle_point_down_right_y = half_height - MIDDLE_POINT_DEPTH
443 end_point_down_right_y = half_height - CUT_DEPTH
444 c.line(start_down_right_left_x, half_height, x_right_SPINE_LIMIT, end_point_down_right_y)
445 c.line(x_right_SPINE_LIMIT, end_point_down_right_y, x_right_SPINE_LIMIT, middle_point_down_right_y)
446 c.line(x_right_SPINE_LIMIT, middle_point_down_right_y, start_down_right_right_x, half_height)
447 if args.analyze or front_page:
449 new_pdf = pypdf.PdfReader(packet)
450 new_page.merge_page(new_pdf.pages[0])
451 writer.add_page(new_page)
453 front_page = not front_page
456 for file in opened_files:
458 with open(args.output_file, 'wb') as output_file:
459 writer.write(output_file)
462 if __name__ == "__main__":
465 except ValueError as e: